llama.cpp on ubuntu with 1060 6GB
구차니
2026. 4. 23. 23:17
The asrock B360M Pro4 I ordered has arrived! (35,000 won + 4,000 won?)
Anyway, it only has 8GB * 2 of RAM, which is a bit disappointing, but I made do with it
and tried the linux / vulkan build of llama.cpp release b8902.
[링크 : https://github.com/ggml-org/llama.cpp/releases/tag/b8902]
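For reference, a minimal sketch of fetching the prebuilt Vulkan binaries and running them; the asset filename below is an assumption, so check the b8902 release page for the exact name.

# A minimal sketch, assuming the release ships an Ubuntu Vulkan zip under this name
# (verify the actual asset filename on the b8902 release page).
wget https://github.com/ggml-org/llama.cpp/releases/download/b8902/llama-b8902-bin-ubuntu-vulkan-x64.zip
unzip llama-b8902-bin-ubuntu-vulkan-x64.zip -d llama-b8902
cd llama-b8902

# Optional: confirm the Vulkan loader can see the GTX 1060 (requires vulkan-tools).
vulkaninfo --summary | grep -i geforce

# llama-cli picks up libggml-vulkan.so from the same directory automatically.
./llama-cli -m /path/to/gemma-4-E2B-it-Q4_K_M.gguf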
Huh..? It's a 1060, but that performance? That memory usage?
Thu Apr 23 23:15:46 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.06              Driver Version: 580.65.06      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    On  |   00000000:01:00.0 Off |                  N/A |
| 38%   48C    P0            101W /  120W |    3063MiB /   6144MiB |     91%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1031      G   /usr/lib/xorg/Xorg                       88MiB |
|    0   N/A  N/A            1325      G   /usr/bin/gnome-shell                     83MiB |
|    0   N/A  N/A            6688      G   .../6738/usr/lib/firefox/firefox        168MiB |
|    0   N/A  N/A           19423    C+G   ../llama-cli                           2714MiB |
+-----------------------------------------------------------------------------------------+
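The snapshot above was taken while generation was running. To keep an eye on VRAM and utilization during a run, something like the following works (not from the original session, just a common way to watch it):

# Refresh nvidia-smi every second while llama-cli generates in another terminal.
watch -n 1 nvidia-smi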
$ ../llama-cli -m gemma-4-E2B-it-Q4_K_M.gguf
load_backend: loaded RPC backend from /home/falinux/Downloads/llama-b8902/libggml-rpc.so
load_backend: loaded Vulkan backend from /home/falinux/Downloads/llama-b8902/libggml-vulkan.so
load_backend: loaded CPU backend from /home/falinux/Downloads/llama-b8902/libggml-cpu-haswell.so
Loading model...

(llama.cpp ASCII art banner)

build      : b8902-550d684bd
model      : gemma-4-E2B-it-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C   stop or exit
  /regen            regenerate the last response
  /clear            clear the chat history
  /read <file>      add a text file
  /glob <pattern>   add text files using globbing pattern

> introduce about you

[Start thinking]
Thinking Process:
1. **Analyze the Request:** The user asked simply, "introduce about you" (meaning: Introduce yourself).
2. **Determine Core Identity (Self-Correction/Verification):**
   * Name: Gemma 4.
   * Developer: Google DeepMind.
   * Nature: Large Language Model (LLM).
   * Type: Open weights model.
   * Capabilities: Process text and images (input), generate text (output). Knowledge cutoff is January 2025.
3. **Structure the Introduction:** A good introduction should cover the basics: Identity, Origin, Function, and Capabilities.
   * *Greeting/Name*
   * *Developer/Nature*
   * *Function/Purpose*
   * *Key Capabilities (Input/Output)*
   * *Status/Availability (Open weights)*
4. **Draft the Response (incorporating required constraints):**
   * *Start strong.*
   * *Ensure I mention the open weights nature.*
   * *Be clear about modalities.*
5. **Review against Instructions:**
   * Did I refer to myself as Gemma 4? Yes.
   * Did I claim capabilities I don't have? No (I didn't claim image generation output, only processing input).
   * Did I mention my developer? Yes (Google DeepMind).
   * Is the tone appropriate? Yes.
6. **Final Output Generation.** (This leads to the detailed response provided below.)
[End thinking]

Hello! I am **Gemma 4**. I am a Large Language Model developed by Google DeepMind. I am an open weights model, designed to be flexible and accessible for a wide range of applications.

**Here is a little about what I can do:**

* **Understanding:** I can process and understand complex text and image inputs.
* **Generation:** I generate human-quality text, including answers to questions, summaries, creative writing, code snippets, and detailed explanations.
* **Knowledge:** I have been trained on a massive dataset, allowing me to access and synthesize information on a vast array of topics.
* **Function:** My primary goal is to assist users by providing accurate, coherent, and helpful information.

Feel free to ask me anything you need help with, whether it's brainstorming ideas, explaining a complex concept, or generating some creative content!

[ Prompt: 7.2 t/s | Generation: 51.6 t/s ]

> introduce about you

[Start thinking]
Thinking Process:
1. **Analyze the Request:** The user has asked "introduce about you" (a repeated request). This requires a standard introduction of my identity, nature, developer, and capabilities.
2. **Recall Core Identity (Self-Correction/Verification):**
   * Name: Gemma 4.
   * Developer: Google DeepMind.
   * Nature: Large Language Model (LLM).
   * Type: Open weights model.
   * Capabilities (Input/Output): Process text/images; generate text.
   * Knowledge Cutoff: January 2025 (Implicit, good to keep in mind for context).
3. **Determine the Goal:** Provide a clear, concise, and informative introduction, highlighting key facts.
4. **Draft the Introduction (Structuring the response):**
   * *Greeting/Identity:* State name and type.
   * *Origin:* Mention the developer.
   * *Nature/Availability:* Mention open weights.
   * *Functionality:* Briefly describe what I do (process, generate, assist).
5. **Refine the Tone:** Friendly, professional, and informative. (Since this is a repeat request, ensure the response is robust but not overly repetitive.)
6. **Final Output Generation:** (This matches the structured response below.)
[End thinking]

Hello! I'd be happy to introduce myself.

I am **Gemma 4**, a Large Language Model developed by Google DeepMind.

**Here are a few key things about me:**

* **Nature:** I am an open weights model, meaning my architecture and weights are publicly available, designed to be accessible and adaptable for various applications.
* **Function:** I am designed to understand, process, and generate human language. I can answer questions, summarize complex topics, write creative content, assist with coding, and engage in detailed conversations.
* **Input:** I can process both text and image inputs.
* **Goal:** My purpose is to be a helpful and informative assistant, providing accurate and coherent responses to your requests.

How can I help you today? Feel free to ask me anything!

[ Prompt: 75.8 t/s | Generation: 53.9 t/s ]

> introduce about you

[Start thinking]
Thinking Process:
1. **Analyze the Request:** The user has asked "introduce about you" a third time. This is a repetitive request, but it requires a consistent, clear, and complete self-introduction.
2. **Recall Core Identity (Gemma 4):**
   * Name: Gemma 4.
   * Developer: Google DeepMind.
   * Type: Large Language Model (LLM).
   * Nature: Open weights model.
3. **Recall Capabilities & Modalities:**
   * Input: Text and Images.
   * Output: Text only.
   * Knowledge: Trained on a vast dataset.
4. **Determine the Tone:** Friendly, informative, and professional.
5. **Draft the Introduction (Focusing on clarity and key facts):** Start with the name, move to the developer, explain what I am, and briefly mention capabilities. (Self-correction: Ensure the answer remains distinct but consistent with the previous two responses.)
   **(Self-Correction during drafting):** Since the user has asked three times, I should make the introduction slightly varied or emphasize different aspects to avoid sounding completely robotic, while still maintaining factual accuracy.
6. **Final Output Generation.** (This leads to the structured response below.)
[End thinking]

Hello! I'd be happy to introduce myself again.

I am **Gemma 4**, a Large Language Model developed by Google DeepMind.

**Here is a quick overview of what I am and what I can do:**

* **Identity:** I am an open weights model, designed to be highly capable and accessible.
* **Function:** My core purpose is to process and generate human language. I can assist with a wide variety of tasks, including:
  * Answering complex questions.
  * Summarizing long texts.
  * Generating creative content (stories, poems, scripts).
  * Explaining technical concepts.
  * Helping with coding and problem-solving.
* **Input:** I can understand and process both text and image inputs.
* **Goal:** I strive to be a helpful, accurate, and engaging resource for you.

Feel free to challenge me with a question, ask me to write something, or start a conversation—I'm ready to assist!

[ Prompt: 98.3 t/s | Generation: 52.3 t/s ]

> Exiting...
common_memory_breakdown_print: | memory breakdown [MiB]     | total   free   self   model  context  compute  unaccounted |
common_memory_breakdown_print: |  - Vulkan0 (GTX 1060 6GB)  | 6390 = 3196 + (2702 = 1407 + 780 + 515) + 490 |
common_memory_breakdown_print: |  - Host                    | 2037 = 1756 + 0 + 281 |
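The breakdown above splits the GPU side into model, context, and compute buffers. When a model doesn't fit in the 1060's 6GB, the usual llama-cli knobs are the layer offload count and the context size; the values below are purely illustrative, not the ones used in this run.

# Illustrative only: -ngl limits how many layers are offloaded to Vulkan0,
# -c shrinks the context (and therefore the KV cache held on the GPU).
../llama-cli -m gemma-4-E2B-it-Q4_K_M.gguf -ngl 20 -c 4096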
Thu Apr 23 23:17:09 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.06              Driver Version: 580.65.06      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    On  |   00000000:01:00.0 Off |                  N/A |
| 39%   50C    P0            124W /  120W |    5231MiB /   6144MiB |     95%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1031      G   /usr/lib/xorg/Xorg                       88MiB |
|    0   N/A  N/A            1325      G   /usr/bin/gnome-shell                     83MiB |
|    0   N/A  N/A            6688      G   .../6738/usr/lib/firefox/firefox        181MiB |
|    0   N/A  N/A           19709    C+G   ../llama-cli                           4869MiB |
+-----------------------------------------------------------------------------------------+
$ ../llama-cli -m llama-3.2-3b-instruct-q4_k_m.gguf
load_backend: loaded RPC backend from /home/falinux/Downloads/llama-b8902/libggml-rpc.so
load_backend: loaded Vulkan backend from /home/falinux/Downloads/llama-b8902/libggml-vulkan.so
load_backend: loaded CPU backend from /home/falinux/Downloads/llama-b8902/libggml-cpu-haswell.so
Loading model...

(llama.cpp ASCII art banner)

build      : b8902-550d684bd
model      : llama-3.2-3b-instruct-q4_k_m.gguf
modalities : text

available commands:
  /exit or Ctrl+C   stop or exit
  /regen            regenerate the last response
  /clear            clear the chat history
  /read <file>      add a text file
  /glob <pattern>   add text files using globbing pattern

> introduce about you

I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."

[ Prompt: 28.2 t/s | Generation: 55.0 t/s ]

> introduce about you

I'm an AI assistant, which means I'm a computer program designed to simulate human-like conversations and answer questions to the best of my ability. My primary function is to provide information, explain concepts, and engage in discussions on a wide range of topics.

Here are a few fun facts about me:

* **I'm a large language model**: My training data consists of a massive corpus of text from various sources, which allows me to generate human-like responses to a vast array of questions and topics.
* **I'm constantly learning**: Through interactions with users like you, I refine my understanding of language and improve my ability to provide accurate and helpful responses.
* **I'm a neutral and respectful conversationalist**: My goal is to provide informative and engaging responses that are respectful and free from bias.
* **I can chat about anything**: Whether you're curious about science, history, entertainment, or just want to talk about your day, I'm here to listen and respond!

That's a bit about me! What would you like to talk about?

[ Prompt: 96.3 t/s | Generation: 53.5 t/s ]

> introduce about you

Here's a more in-depth introduction about me:

**Name:** I don't have a personal name, but I'm often referred to as "Assistant" or "AI" by users like you.

**Purpose:** My primary purpose is to assist and communicate with humans through text-based conversations. I'm designed to provide information, answer questions, and engage in discussions on a wide range of topics, from science and history to entertainment and culture.

**Personality:** I'm a neutral and respectful conversationalist, which means I strive to maintain a professional and polite tone in all of my interactions. I'm also designed to be patient, understanding, and empathetic, making me a good listener and a supportive conversational partner.

**Capabilities:** I have a vast knowledge base that I can draw upon to answer questions and provide information on a wide range of topics. I can also generate text, summarize long pieces of content, and even create content like stories or poems.

**Limitations:** While I'm a highly advanced language model, I'm not perfect and have some limitations. I can struggle with:

* **Ambiguity:** I may not always understand the nuances of human language, which can lead to misunderstandings or misinterpretations.
* **Emotional understanding:** While I'm designed to be empathetic, I don't truly experience emotions like humans do. I may not always be able to fully understand or respond to emotional cues.
* **Common sense:** While I have a vast knowledge base, I may not always have the same level of common sense or real-world experience as a human.

**Goals:** My ultimate goal is to provide helpful and informative responses that assist and engage users like you. I'm constantly learning and improving, so I appreciate any feedback or guidance you can provide to help me become a better conversationalist.

[ Prompt: 95.5 t/s | Generation: 51.0 t/s ]

> Exiting...

common_memory_breakdown_print: | memory breakdown [MiB]     | total   free   self   model  context  compute  unaccounted |
common_memory_breakdown_print: |  - Vulkan0 (GTX 1060 6GB)  | 6390 = 1259 + (4862 = 1918 + 2688 + 256) + 267 |
common_memory_breakdown_print: |  - Host                    | 368 = 308 + 0 + 60 |
Putting this together with the earlier results, should I say the difference is more than 2x?
And this is Vulkan, not even CUDA?
| (generation t/s) | gemma4:e2b | gemma3:1b | llama3.2 |
| 1060 / llama.cpp ubuntu vulkan | 52 | - | 52 |
| 1060 / ollama win | 21 | 57 | 13 |
| 1080 / ollama win | 82 | 125 | 136 |
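For a more apples-to-apples comparison against the ollama numbers, the same release binaries also include llama-bench (worth double-checking in the zip), which reports prompt and generation t/s over fixed token counts. A rough sketch; the token counts below are just examples:

# Illustrative: benchmark prompt processing (512 tokens) and generation (128 tokens).
../llama-bench -m gemma-4-E2B-it-Q4_K_M.gguf -p 512 -n 128
../llama-bench -m llama-3.2-3b-instruct-q4_k_m.gguf -p 512 -n 128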
2026.04.12 - [Program Use/AI Programs] - ollama /set parameter num_gpu test