llama.cpp on ubuntu with 1060 6GB
구차니
2026. 4. 23. 23:17
The asrock B360M Pro4 I ordered has arrived! (35,000 won + 4,000 won?)
Anyway, it only has 8GB * 2 of RAM, which is a bit disappointing, but I made do with it
and tried the linux / vulkan build of llama.cpp release b8902.
[링크 : https://github.com/ggml-org/llama.cpp/releases/tag/b8902]
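For reference, a minimal sketch of fetching the prebuilt Vulkan binaries and running them; the asset filename below is an assumption, so check the b8902 release page for the exact name.

# A minimal sketch, assuming the release ships an Ubuntu Vulkan zip under this name
# (verify the actual asset filename on the b8902 release page).
wget https://github.com/ggml-org/llama.cpp/releases/download/b8902/llama-b8902-bin-ubuntu-vulkan-x64.zip
unzip llama-b8902-bin-ubuntu-vulkan-x64.zip -d llama-b8902
cd llama-b8902

# Optional: confirm the Vulkan loader can see the GTX 1060 (requires vulkan-tools).
vulkaninfo --summary | grep -i geforce

# llama-cli picks up libggml-vulkan.so from the same directory automatically.
./llama-cli -m /path/to/gemma-4-E2B-it-Q4_K_M.gguf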
Huh..? It's a 1060, but that performance? That memory usage?
Thu Apr 23 23:15:46 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.06              Driver Version: 580.65.06      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    On  |   00000000:01:00.0 Off |                  N/A |
| 38%   48C    P0            101W /  120W |    3063MiB /   6144MiB |     91%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1031      G   /usr/lib/xorg/Xorg                       88MiB |
|    0   N/A  N/A            1325      G   /usr/bin/gnome-shell                     83MiB |
|    0   N/A  N/A            6688      G   .../6738/usr/lib/firefox/firefox        168MiB |
|    0   N/A  N/A           19423    C+G   ../llama-cli                           2714MiB |
+-----------------------------------------------------------------------------------------+
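The snapshot above was taken while generation was running. To keep an eye on VRAM and utilization during a run, something like the following works (not from the original session, just a common way to watch it):

# Refresh nvidia-smi every second while llama-cli generates in another terminal.
watch -n 1 nvidia-smi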
$ ../llama-cli -m gemma-4-E2B-it-Q4_K_M.gguf
load_backend: loaded RPC backend from /home/falinux/Downloads/llama-b8902/libggml-rpc.so
load_backend: loaded Vulkan backend from /home/falinux/Downloads/llama-b8902/libggml-vulkan.so
load_backend: loaded CPU backend from /home/falinux/Downloads/llama-b8902/libggml-cpu-haswell.so
Loading model...

(llama.cpp ASCII art banner)

build      : b8902-550d684bd
model      : gemma-4-E2B-it-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C   stop or exit
  /regen            regenerate the last response
  /clear            clear the chat history
  /read <file>      add a text file
  /glob <pattern>   add text files using globbing pattern

> introduce about you

[Start thinking]
Thinking Process:
1. **Analyze the Request:** The user asked simply, "introduce about you" (meaning: Introduce yourself).
2. **Determine Core Identity (Self-Correction/Verification):**
   * Name: Gemma 4.
   * Developer: Google DeepMind.
   * Nature: Large Language Model (LLM).
   * Type: Open weights model.
   * Capabilities: Process text and images (input), generate text (output). Knowledge cutoff is January 2025.
3. **Structure the Introduction:** A good introduction should cover the basics: Identity, Origin, Function, and Capabilities.
   * *Greeting/Name*
   * *Developer/Nature*
   * *Function/Purpose*
   * *Key Capabilities (Input/Output)*
   * *Status/Availability (Open weights)*
4. **Draft the Response (incorporating required constraints):**
   * *Start strong.*
   * *Ensure I mention the open weights nature.*
   * *Be clear about modalities.*
5. **Review against Instructions:**
   * Did I refer to myself as Gemma 4? Yes.
   * Did I claim capabilities I don't have? No (I didn't claim image generation output, only processing input).
   * Did I mention my developer? Yes (Google DeepMind).
   * Is the tone appropriate? Yes.
6. **Final Output Generation.** (This leads to the detailed response provided below.)
[End thinking]

Hello! I am **Gemma 4**. I am a Large Language Model developed by Google DeepMind. I am an open weights model, designed to be flexible and accessible for a wide range of applications.

**Here is a little about what I can do:**

* **Understanding:** I can process and understand complex text and image inputs.
* **Generation:** I generate human-quality text, including answers to questions, summaries, creative writing, code snippets, and detailed explanations.
* **Knowledge:** I have been trained on a massive dataset, allowing me to access and synthesize information on a vast array of topics.
* **Function:** My primary goal is to assist users by providing accurate, coherent, and helpful information.

Feel free to ask me anything you need help with, whether it's brainstorming ideas, explaining a complex concept, or generating some creative content!

[ Prompt: 7.2 t/s | Generation: 51.6 t/s ]

> introduce about you

[Start thinking]
Thinking Process:
1. **Analyze the Request:** The user has asked "introduce about you" (a repeated request). This requires a standard introduction of my identity, nature, developer, and capabilities.
2. **Recall Core Identity (Self-Correction/Verification):**
   * Name: Gemma 4.
   * Developer: Google DeepMind.
   * Nature: Large Language Model (LLM).
   * Type: Open weights model.
   * Capabilities (Input/Output): Process text/images; generate text.
   * Knowledge Cutoff: January 2025 (Implicit, good to keep in mind for context).
3. **Determine the Goal:** Provide a clear, concise, and informative introduction, highlighting key facts.
4. **Draft the Introduction (Structuring the response):**
   * *Greeting/Identity:* State name and type.
   * *Origin:* Mention the developer.
   * *Nature/Availability:* Mention open weights.
   * *Functionality:* Briefly describe what I do (process, generate, assist).
5. **Refine the Tone:** Friendly, professional, and informative. (Since this is a repeat request, ensure the response is robust but not overly repetitive.)
6. **Final Output Generation:** (This matches the structured response below.)
[End thinking]

Hello! I'd be happy to introduce myself.

I am **Gemma 4**, a Large Language Model developed by Google DeepMind.

**Here are a few key things about me:**

* **Nature:** I am an open weights model, meaning my architecture and weights are publicly available, designed to be accessible and adaptable for various applications.
* **Function:** I am designed to understand, process, and generate human language. I can answer questions, summarize complex topics, write creative content, assist with coding, and engage in detailed conversations.
* **Input:** I can process both text and image inputs.
* **Goal:** My purpose is to be a helpful and informative assistant, providing accurate and coherent responses to your requests.

How can I help you today? Feel free to ask me anything!

[ Prompt: 75.8 t/s | Generation: 53.9 t/s ]

> introduce about you

[Start thinking]
Thinking Process:
1. **Analyze the Request:** The user has asked "introduce about you" a third time. This is a repetitive request, but it requires a consistent, clear, and complete self-introduction.
2. **Recall Core Identity (Gemma 4):**
   * Name: Gemma 4.
   * Developer: Google DeepMind.
   * Type: Large Language Model (LLM).
   * Nature: Open weights model.
3. **Recall Capabilities & Modalities:**
   * Input: Text and Images.
   * Output: Text only.
   * Knowledge: Trained on a vast dataset.
4. **Determine the Tone:** Friendly, informative, and professional.
5. **Draft the Introduction (Focusing on clarity and key facts):** Start with the name, move to the developer, explain what I am, and briefly mention capabilities. (Self-correction: Ensure the answer remains distinct but consistent with the previous two responses.)
   **(Self-Correction during drafting):** Since the user has asked three times, I should make the introduction slightly varied or emphasize different aspects to avoid sounding completely robotic, while still maintaining factual accuracy.
6. **Final Output Generation.** (This leads to the structured response below.)
[End thinking]

Hello! I'd be happy to introduce myself again.

I am **Gemma 4**, a Large Language Model developed by Google DeepMind.

**Here is a quick overview of what I am and what I can do:**

* **Identity:** I am an open weights model, designed to be highly capable and accessible.
* **Function:** My core purpose is to process and generate human language. I can assist with a wide variety of tasks, including:
  * Answering complex questions.
  * Summarizing long texts.
  * Generating creative content (stories, poems, scripts).
  * Explaining technical concepts.
  * Helping with coding and problem-solving.
* **Input:** I can understand and process both text and image inputs.
* **Goal:** I strive to be a helpful, accurate, and engaging resource for you.

Feel free to challenge me with a question, ask me to write something, or start a conversation—I'm ready to assist!

[ Prompt: 98.3 t/s | Generation: 52.3 t/s ]

> Exiting...
common_memory_breakdown_print: | memory breakdown [MiB]     | total   free   self   model  context  compute  unaccounted |
common_memory_breakdown_print: |  - Vulkan0 (GTX 1060 6GB)  | 6390 = 3196 + (2702 = 1407 + 780 + 515) + 490 |
common_memory_breakdown_print: |  - Host                    | 2037 = 1756 + 0 + 281 |
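The breakdown above splits the GPU side into model, context, and compute buffers. When a model doesn't fit in the 1060's 6GB, the usual llama-cli knobs are the layer offload count and the context size; the values below are purely illustrative, not the ones used in this run.

# Illustrative only: -ngl limits how many layers are offloaded to Vulkan0,
# -c shrinks the context (and therefore the KV cache held on the GPU).
../llama-cli -m gemma-4-E2B-it-Q4_K_M.gguf -ngl 20 -c 4096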
Thu Apr 23 23:17:09 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.06              Driver Version: 580.65.06      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    On  |   00000000:01:00.0 Off |                  N/A |
| 39%   50C    P0            124W /  120W |    5231MiB /   6144MiB |     95%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1031      G   /usr/lib/xorg/Xorg                       88MiB |
|    0   N/A  N/A            1325      G   /usr/bin/gnome-shell                     83MiB |
|    0   N/A  N/A            6688      G   .../6738/usr/lib/firefox/firefox        181MiB |
|    0   N/A  N/A           19709    C+G   ../llama-cli                           4869MiB |
+-----------------------------------------------------------------------------------------+
$ ../llama-cli -m llama-3.2-3b-instruct-q4_k_m.gguf
load_backend: loaded RPC backend from /home/falinux/Downloads/llama-b8902/libggml-rpc.so
load_backend: loaded Vulkan backend from /home/falinux/Downloads/llama-b8902/libggml-vulkan.so
load_backend: loaded CPU backend from /home/falinux/Downloads/llama-b8902/libggml-cpu-haswell.so
Loading model...

(llama.cpp ASCII art banner)

build      : b8902-550d684bd
model      : llama-3.2-3b-instruct-q4_k_m.gguf
modalities : text

available commands:
  /exit or Ctrl+C   stop or exit
  /regen            regenerate the last response
  /clear            clear the chat history
  /read <file>      add a text file
  /glob <pattern>   add text files using globbing pattern

> introduce about you

I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."

[ Prompt: 28.2 t/s | Generation: 55.0 t/s ]

> introduce about you

I'm an AI assistant, which means I'm a computer program designed to simulate human-like conversations and answer questions to the best of my ability. My primary function is to provide information, explain concepts, and engage in discussions on a wide range of topics.

Here are a few fun facts about me:

* **I'm a large language model**: My training data consists of a massive corpus of text from various sources, which allows me to generate human-like responses to a vast array of questions and topics.
* **I'm constantly learning**: Through interactions with users like you, I refine my understanding of language and improve my ability to provide accurate and helpful responses.
* **I'm a neutral and respectful conversationalist**: My goal is to provide informative and engaging responses that are respectful and free from bias.
* **I can chat about anything**: Whether you're curious about science, history, entertainment, or just want to talk about your day, I'm here to listen and respond!

That's a bit about me! What would you like to talk about?

[ Prompt: 96.3 t/s | Generation: 53.5 t/s ]

> introduce about you

Here's a more in-depth introduction about me:

**Name:** I don't have a personal name, but I'm often referred to as "Assistant" or "AI" by users like you.

**Purpose:** My primary purpose is to assist and communicate with humans through text-based conversations. I'm designed to provide information, answer questions, and engage in discussions on a wide range of topics, from science and history to entertainment and culture.

**Personality:** I'm a neutral and respectful conversationalist, which means I strive to maintain a professional and polite tone in all of my interactions. I'm also designed to be patient, understanding, and empathetic, making me a good listener and a supportive conversational partner.

**Capabilities:** I have a vast knowledge base that I can draw upon to answer questions and provide information on a wide range of topics. I can also generate text, summarize long pieces of content, and even create content like stories or poems.

**Limitations:** While I'm a highly advanced language model, I'm not perfect and have some limitations. I can struggle with:

* **Ambiguity:** I may not always understand the nuances of human language, which can lead to misunderstandings or misinterpretations.
* **Emotional understanding:** While I'm designed to be empathetic, I don't truly experience emotions like humans do. I may not always be able to fully understand or respond to emotional cues.
* **Common sense:** While I have a vast knowledge base, I may not always have the same level of common sense or real-world experience as a human.

**Goals:** My ultimate goal is to provide helpful and informative responses that assist and engage users like you. I'm constantly learning and improving, so I appreciate any feedback or guidance you can provide to help me become a better conversationalist.

[ Prompt: 95.5 t/s | Generation: 51.0 t/s ]

> Exiting...

common_memory_breakdown_print: | memory breakdown [MiB]     | total   free   self   model  context  compute  unaccounted |
common_memory_breakdown_print: |  - Vulkan0 (GTX 1060 6GB)  | 6390 = 1259 + (4862 = 1918 + 2688 + 256) + 267 |
common_memory_breakdown_print: |  - Host                    | 368 = 308 + 0 + 60 |
Putting this together with the earlier results, should I say the difference is more than 2x?
And this is Vulkan, not even CUDA?
| (generation t/s) | gemma4:e2b | gemma3:1b | llama3.2 |
| 1060 / llama.cpp ubuntu vulkan | 52 | - | 52 |
| 1060 / ollama win | 21 | 57 | 13 |
| 1080 / ollama win | 82 | 125 | 136 |
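For a more apples-to-apples comparison against the ollama numbers, the same release binaries also include llama-bench (worth double-checking in the zip), which reports prompt and generation t/s over fixed token counts. A rough sketch; the token counts below are just examples:

# Illustrative: benchmark prompt processing (512 tokens) and generation (128 tokens).
../llama-bench -m gemma-4-E2B-it-Q4_K_M.gguf -p 512 -n 128
../llama-bench -m llama-3.2-3b-instruct-q4_k_m.gguf -p 512 -n 128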
2026.04.12 - [Program Use/AI Programs] - ollama /set parameter num_gpu test