좀 더 자세히 써있는 내용

어떻게 문장을 넣나 했더니

임베딩한 벡터를 1차원씩 넣어주고

결과로 나온 것과 입력 1차원씩 합쳐서 계속 넣어주고

뺄때는 최대 토큰 갯수 혹은

EOS(End of Sequence)가 나올때 까지 

출력 벡터가 나오면 다시 그걸 이어서 출력단에 다시 넣어주면

전체 계산하는게 아니라 출력단만 빠르게 계산되어 결과가 쭈욱 나오는듯.

 

참.. 마법같네

 

[링크 : https://tech.kakaopay.com/post/how-llm-works/]

Posted by 구차니

250W 짜리를 그대로 사용할 경우

 

125W로 제한할 경우

 

30토큰이나 35토큰이나 비슷하다고 생각하면 비슷한 수준이라

전기는 절반만 먹고 약간느려지는 수준이라면 나쁘지 않은 설정인듯

'프로그램 사용 > ai 프로그램' 카테고리의 다른 글

llama.cpp ubuntu vulkan 1080ti 11GB * 2 실패  (0) 2026.04.25
llm transformer  (0) 2026.04.25
llama-server web ui  (0) 2026.04.24
llama.cpp on windows / cuda vs vulkan  (0) 2026.04.24
vscode continue.dev  (0) 2026.04.24
Posted by 구차니

llama.cpp의 llama-server 로 구동하면 모델은 바꿀수 없지만

chatGPT 나 claude와 유사한 web ui로 사용할 수 있다.

 

간략하게 토큰 생성 속도도 나온다.

 

설정은 다음과 같다.

 

 

Posted by 구차니

두 개 성능 차이가 있을줄 알았는데(cuda가 높을줄..)

거의 오차범위 내에서 동일하게 나왔다.

그럼. 리눅스에서도 cuda 빌드 안해도 쓸만하다는걸려나?

 

근데 1080 ti 라서 그런가 cuda 13 버전은 오작동하고 12 버전으로 성공함

D:\study\llm\llama-b8916-bin-win-cuda-13.1-x64>llama-cli.exe -m ..\gemma-4-E2B-it-Q4_K_M.gguf
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 11263 MiB):
  Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes, VRAM: 11263 MiB
load_backend: loaded CUDA backend from D:\study\llm\llama-b8916-bin-win-cuda-13.1-x64\ggml-cuda.dll
load_backend: loaded RPC backend from D:\study\llm\llama-b8916-bin-win-cuda-13.1-x64\ggml-rpc.dll
load_backend: loaded CPU backend from D:\study\llm\llama-b8916-bin-win-cuda-13.1-x64\ggml-cpu-haswell.dll

Loading model... /ggml_cuda_compute_forward: SCALE failed
CUDA error: no kernel image is available for execution on the device
  current device: 0, in function ggml_cuda_compute_forward at D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2962
  err
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:97: CUDA error

 

gemma4 e2b 모델인데 2.89GB 짜리이고 제법 정직하게 용량을 먹는 느낌.

Fri Apr 24 22:57:19 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 582.28                 Driver Version: 582.28         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti   WDDM  |   00000000:01:00.0 Off |                  N/A |
| 27%   39C    P8             11W /  250W |    2936MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1400    C+G   ...-win-vulkan-x64\llama-cli.exe      N/A      |
|    0   N/A  N/A            1452      C   ...al\Programs\Ollama\ollama.exe      N/A      |
|    0   N/A  N/A            9728    C+G   ...em32\Kinect\KinectService.exe      N/A      |
|    0   N/A  N/A           13036    C+G   ...s\Win64\EpicGamesLauncher.exe      N/A      |
+-----------------------------------------------------------------------------------------+

 

D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64>llama-cli.exe -m ..\gemma-4-E2B-it-Q4_K_M.gguf
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 11263 MiB):
  Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes, VRAM: 11263 MiB
load_backend: loaded CUDA backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-cuda.dll
load_backend: loaded RPC backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-rpc.dll
load_backend: loaded CPU backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-cpu-haswell.dll

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8918-e583f3b4f
model      : gemma-4-E2B-it-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


> 안녕?

안녕하세요! 저는 사용자님을 돕기 위해 여기 있습니다. 😊

궁금한 점이 있으시거나 도움이 필요하시면 언제든지 말씀해 주세요!

[ Prompt: 182.3 t/s | Generation: 84.7 t/s ]

> 헬로?

[Start thinking]
Thinking Process:

1.  **Analyze the input:** The user input is "헬로?" (Hello?). It's a very informal greeting, essentially a variation of "Hello?".
2.  **Determine the context:** This is a casual interaction following a previous greeting ("안녕?" -> "안녕하세요!"). The user is acknowledging or testing the connection.
3.  **Formulate the response strategy:**
    *   Acknowledge the greeting warmly.
    *   Match the casual tone (but remain polite and helpful).
    *   Reiterate availability.
    *   Invite further interaction.
4.  **Draft potential responses (in Korean):**
    *   (Option 1 - Simple echo): 네, 안녕하세요. (Yes, hello.)
    *   (Option 2 - Enthusiastic): 네, 안녕하세요! 또 무엇을 도와드릴까요? (Yes, hello! What else can I help you with?)
    *   (Option 3 - Casual/Friendly): 네! 저는 준비되어 있습니다. 😊 (Yes! I am ready.)
5.  **Select the best response:** Option 2 is the most effective as it confirms the interaction and immediately prompts the user to state their need, moving the conversation forward.

6.  **Final Polish:** (Translate the chosen response into natural Korean.)
[End thinking]

네, 안녕하세요! 😊

저는 언제든지 사용자님의 질문에 답하거나 필요한 정보를 제공해 드릴 준비가 되어 있습니다.

궁금한 것이 있으신가요?

[ Prompt: 321.5 t/s | Generation: 84.8 t/s ]

> 으아앙

[Start thinking]
Thinking Process:

1.  **Analyze the input:** The user input is "으아앙" (Eu-aang).
2.  **Determine the nature of the input:** "으아앙" is an interjection or sound, often used to express various emotions (pain, excitement, frustration, distress, playful sounds, etc.). In this context, it's highly informal and emotional.
3.  **Assess the context:** The previous interactions were simple greetings ("안녕?", "헬로?"). The current input is a sudden, emotional expression.
4.  **Determine the appropriate response strategy:**
    *   Since the input is purely emotional and doesn't pose a question or request a task, the response should be empathetic, neutral, and open-ended.
    *   The tone should be gentle and non-judgmental.
    *   Acknowledge the emotion without demanding further explanation, allowing the user to guide the conversation.
5.  **Draft potential responses (in Korean):**
    *   *Option A (Direct inquiry):* 무슨 일 있으세요? (What's going on?) - A bit too direct.
    *   *Option B (Empathy + Open invitation):* 무슨 일이 있으신가요? 괜찮으세요? (Is something happening? Are you okay?) - Good balance.
    *   *Option C (Simple acknowledgment):* 네? (Yes?) / 어떠세요? (How are you?) - Too brief.
6.  **Select and refine the best response (B is the most helpful):** Acknowledge the sound/feeling and gently ask if the user needs assistance or wants to talk about it.

7.  **Final Check:** The response is safe, empathetic, and keeps the door open for the user to continue the conversation on their terms. (Self-correction: Avoid sounding too robotic. Keep the tone friendly.)
[End thinking]

무슨 일이 있으신가요? 😊

혹시 힘든 일이 있으시거나, 이야기하고 싶으신 것이 있다면 편하게 말씀해 주세요. 제가 들어드릴게요.

[ Prompt: 1014.9 t/s | Generation: 84.9 t/s ]

>

Exiting...
common_memory_breakdown_print: | memory breakdown [MiB]  | total   free    self   model   context   compute    unaccounted |
common_memory_breakdown_print: |   - CUDA0 (GTX 1080 Ti) | 11263 = 7192 + (2981 =  1407 +     780 +     793) +        1090 |
common_memory_breakdown_print: |   - Host                |                 2037 =  1756 +       0 +     281                |

 

D:\study\llm\llama-b8916-bin-win-vulkan-x64>llama-cli.exe -m ..\gemma-4-E2B-it-Q4_K_M.gguf
load_backend: loaded RPC backend from D:\study\llm\llama-b8916-bin-win-vulkan-x64\ggml-rpc.dll
load_backend: loaded Vulkan backend from D:\study\llm\llama-b8916-bin-win-vulkan-x64\ggml-vulkan.dll
load_backend: loaded CPU backend from D:\study\llm\llama-b8916-bin-win-vulkan-x64\ggml-cpu-haswell.dll

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8916-ffdd983fb
model      : gemma-4-E2B-it-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


> 안녕?

[Start thinking]
Thinking Process:

1.  **Analyze the input:** The input is "안녕?" (Annyeong?), which means "Hello?" or "Hi?". It is a casual greeting in Korean.
2.  **Determine the user's intent:** The user is initiating a conversation or simply saying hello.
3.  **Formulate an appropriate response (in Korean):**
    *   Acknowledge the greeting.
    *   Return the greeting.
    *   (Optional but good) Ask how the user is doing or what they need.
4.  **Draft potential responses:**
    *   *Simple:* 안녕하세요. (Annyeonghaseyo.) - Slightly formal.
    *   *Casual (matching the user's tone):* 안녕! / 안녕. (Annyeong!)
    *   *More engaging:* 안녕하세요! 저는 잘 지내요. (Hello! I am doing well.)
    *   *Standard AI response:* 안녕하세요! 무엇을 도와드릴까요? (Hello! What can I help you with?)
5.  **Select the best response:** Since the user used a casual greeting ("안녕?"), a friendly and equally approachable response is best.

6.  **Final Korean output generation:** 안녕하세요! 저는 준비되어 있습니다. (Hello! I am ready.) or simply 안녕하세요! (Hello!) combined with a question.

*(Self-Correction: Keep it simple and friendly.)*

*Selected response:* 안녕하세요! 😊 (Hello! 😊) or 안녕하세요! 반갑습니다. (Hello! Nice to meet you.)

*(Final output generation based on standard conversational flow.)*
[End thinking]

안녕하세요! 😊

[ Prompt: 132.7 t/s | Generation: 85.2 t/s ]

> 헬로?

[Start thinking]
Thinking Process:

1.  **Analyze the input:** The user input is "헬로?" (Hello?). This is a casual greeting, often used in a slightly informal or friendly manner.
2.  **Determine the user's intent:** The user is initiating a conversation or simply saying hello.
3.  **Formulate an appropriate response:**
    *   Acknowledge the greeting.
    *   Return a friendly greeting.
    *   (Optional but good practice) Ask how the user is doing or invite them to state their purpose.
4.  **Draft the response (in Korean):**
    *   A simple greeting is best. (e.g., "안녕하세요.")
    *   A slightly warmer response is good. (e.g., "네, 안녕하세요!")
    *   Ask what they need. (e.g., "무엇을 도와드릴까요?")
5.  **Final selection:** Combine these elements into a natural and polite conversational flow. (The response should match the casual tone set by the user.)
[End thinking]

네, 안녕하세요! 😊

저는 사용자님을 도와드릴 준비가 되어 있어요. 무엇을 도와드릴까요?

[ Prompt: 285.9 t/s | Generation: 84.6 t/s ]

> 으아앙

[Start thinking]
Thinking Process:

1.  **Analyze the input:** The user input is "으아앙" (Eu-aang).
2.  **Determine the meaning/context:** "으아앙" is an interjection or sound often used to express various emotions, such as:
    *   Crying/Whining (often cute or dramatic).
    *   Excitement/Joy.
    *   Pain/Distress (less common in casual chat unless context is provided).
    *   A general sound of emotion.
3.  **Identify the required response strategy:** Since the input is purely emotional/vocal, the response should be empathetic, gentle, and open-ended, inviting the user to share what they are feeling or what they need.
4.  **Draft potential responses (Korean):**
    *   *Option 1 (Direct empathy):* 무슨 일 있으세요? (Is something wrong?)
    *   *Option 2 (Gentle inquiry):* 무슨 기분이신가요? (How are you feeling?)
    *   *Option 3 (Simple acknowledgment):* 네, 무슨 일이에요? (Yes, what is it?)
    *   *Option 4 (Playful/Calm):* 괜찮으세요? (Are you okay?)
5.  **Select the best response:** Option 2 and 4 are the most supportive. Combining them makes the response warm and responsive.

6.  **Final Polish:** (Choose a friendly, open response.)
[End thinking]

무슨 일 있으신가요? 😊 혹시 힘든 일이 있으신가요?

편하게 이야기해 주세요. 제가 들어드릴게요.

[ Prompt: 18.7 t/s | Generation: 85.4 t/s ]

>
Posted by 구차니

설정이 기본을 벗어나면 좀 귀찮은.. 녀석

 

continue.dev 상단에 "Open Settings" 눌러서

 

Models

 

Chat이던 머던 Configure 누르면

 

config.yaml 가 열리는데 대충 저렇게 넣어준다.

provider는 openai, model은 local-model

apiBase는 llama-server가 실행된 경로에 v1 붙여주고

apiKey는 dummy로 넣어주면된다.

 

그러면 먼가 막 갈구는거 같은데

Fri Apr 24 16:26:03 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.211.01             Driver Version: 570.211.01     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3070        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   50C    P0            104W /  270W |    6788MiB /   8192MiB |     45%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2697    C+G   ../llama-server                        6771MiB |
+-----------------------------------------------------------------------------------------+

 

에라이 컨텍스트 더 늘려야겠네 -_-

 

[링크 : https://logmario.tistory.com/51]

'프로그램 사용 > ai 프로그램' 카테고리의 다른 글

llama-server web ui  (0) 2026.04.24
llama.cpp on windows / cuda vs vulkan  (0) 2026.04.24
RAG - Retrieval-Augmented Generation  (0) 2026.04.24
llama.cpp on ubuntu with 1060 6GB  (0) 2026.04.23
nvidia-smi 소비전력 제한  (0) 2026.04.22
Posted by 구차니
embeded/luckfox2026. 4. 24. 14:38

rockchip rv1106 에는 risc-v MCU가 있는데

luckfox wiki 에는 어떻게 쓰는지 잘 안보이는 느낌이라.. (luckfox pico ultra W 쪽..)

여기저기 뒤지는 중..

 

[링크 : https://www.reddit.com/r/RISCV/comments/181ldns/anyone_have_any_idea_on_how_the_riscv/?tl=ko]

[링크 : https://github.com/LuckfoxTECH/luckfox-pico/issues/99]

[링크 : https://github.com/luyi1888/rv1106-mcu] << mcutool 소스

[링크 : https://github.com/deerpi/arm-rockchip830-linux-uclibcgnueabihf] << riscv용 인진 모르겠음 툴체인

 

[링크 : https://github.com/LuckfoxTECH/luckfox-pico/issues/112]

'embeded > luckfox' 카테고리의 다른 글

luckfox rknn 예제  (0) 2026.04.27
luckfox 카메라 모션 디텍트 끄기  (0) 2026.04.27
luckfox rv1106 rockchip RNN  (0) 2026.04.24
luckfox csi 카메라 테스트  (0) 2026.04.23
luckfox pico ultra W 켜봄  (0) 2026.04.22
Posted by 구차니
embeded/luckfox2026. 4. 24. 14:30

'embeded > luckfox' 카테고리의 다른 글

luckfox rknn 예제  (0) 2026.04.27
luckfox 카메라 모션 디텍트 끄기  (0) 2026.04.27
luckfox rv1106 riscv  (0) 2026.04.24
luckfox csi 카메라 테스트  (0) 2026.04.23
luckfox pico ultra W 켜봄  (0) 2026.04.22
Posted by 구차니

RAG란 게 나와서 먼가 찾아보니, 아래 같은 내용이라고 한다. (완벽히 이해했어 짤)

Abstract

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory have so far been only investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) — models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations, one which conditions on the same retrieved passages across the whole generated sequence, and another which can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state of the art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.

[링크 : https://arxiv.org/html/2005.11401v4]

 

[링크 : https://medium.com/rate-labs/rag의-짧은-역사-훑어보기-첫-논문부터-최근-동향까지-53c07b9b3bee]

[링크 : https://everyday-log.tistory.com/entry/논문-리뷰-Retrieval-Augmented-Generation-for-Knowledge-Intensive-NLP-Tasks]

[링크 : https://brunch.co.kr/@acc9b16b9f0f430/73]

 

bm25

[링크 : https://wikidocs.net/289869]

Posted by 구차니

주문한 asrock B360M Pro4 도착! (3.5만 + 4천?)

아무튼 램은 8GB * 2 뿐이라 좀 아쉬운대로 대충 하고

llama.cpp 8902 릴리즈의 linux / vulkan 으로 해보는데

[링크 : https://github.com/ggml-org/llama.cpp/releases/tag/b8902]

 

어..? 1060 인데 저 성능? 저 메모리?

Thu Apr 23 23:15:46 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.06              Driver Version: 580.65.06      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    On  |   00000000:01:00.0 Off |                  N/A |
| 38%   48C    P0            101W /  120W |    3063MiB /   6144MiB |     91%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1031      G   /usr/lib/xorg/Xorg                       88MiB |
|    0   N/A  N/A            1325      G   /usr/bin/gnome-shell                     83MiB |
|    0   N/A  N/A            6688      G   .../6738/usr/lib/firefox/firefox        168MiB |
|    0   N/A  N/A           19423    C+G   ../llama-cli                           2714MiB |
+-----------------------------------------------------------------------------------------+
$ ../llama-cli -m gemma-4-E2B-it-Q4_K_M.gguf
load_backend: loaded RPC backend from /home/falinux/Downloads/llama-b8902/libggml-rpc.so
load_backend: loaded Vulkan backend from /home/falinux/Downloads/llama-b8902/libggml-vulkan.so
load_backend: loaded CPU backend from /home/falinux/Downloads/llama-b8902/libggml-cpu-haswell.so

Loading model...  


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8902-550d684bd
model      : gemma-4-E2B-it-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


> introduce about you  

[Start thinking]
Thinking Process:

1.  **Analyze the Request:** The user asked simply, "introduce about you" (meaning: Introduce yourself).
2.  **Determine Core Identity (Self-Correction/Verification):**
    *   Name: Gemma 4.
    *   Developer: Google DeepMind.
    *   Nature: Large Language Model (LLM).
    *   Type: Open weights model.
    *   Capabilities: Process text and images (input), generate text (output). Knowledge cutoff is January 2025.
3.  **Structure the Introduction:** A good introduction should cover the basics: Identity, Origin, Function, and Capabilities.
    *   *Greeting/Name*
    *   *Developer/Nature*
    *   *Function/Purpose*
    *   *Key Capabilities (Input/Output)*
    *   *Status/Availability (Open weights)*
4.  **Draft the Response (incorporating required constraints):**
    *   *Start strong.*
    *   *Ensure I mention the open weights nature.*
    *   *Be clear about modalities.*
5.  **Review against Instructions:**
    *   Did I refer to myself as Gemma 4? Yes.
    *   Did I claim capabilities I don't have? No (I didn't claim image generation output, only processing input).
    *   Did I mention my developer? Yes (Google DeepMind).
    *   Is the tone appropriate? Yes.

6.  **Final Output Generation.** (This leads to the detailed response provided below.)
[End thinking]

Hello! I am **Gemma 4**.

I am a Large Language Model developed by Google DeepMind. I am an open weights model, designed to be flexible and accessible for a wide range of applications.

**Here is a little about what I can do:**

*   **Understanding:** I can process and understand complex text and image inputs.
*   **Generation:** I generate human-quality text, including answers to questions, summaries, creative writing, code snippets, and detailed explanations.
*   **Knowledge:** I have been trained on a massive dataset, allowing me to access and synthesize information on a vast array of topics.
*   **Function:** My primary goal is to assist users by providing accurate, coherent, and helpful information.

Feel free to ask me anything you need help with, whether it's brainstorming ideas, explaining a complex concept, or generating some creative content!

[ Prompt: 7.2 t/s | Generation: 51.6 t/s ]

> introduce about you

[Start thinking]
Thinking Process:

1.  **Analyze the Request:** The user has asked "introduce about you" (a repeated request). This requires a standard introduction of my identity, nature, developer, and capabilities.
2.  **Recall Core Identity (Self-Correction/Verification):**
    *   Name: Gemma 4.
    *   Developer: Google DeepMind.
    *   Nature: Large Language Model (LLM).
    *   Type: Open weights model.
    *   Capabilities (Input/Output): Process text/images; generate text.
    *   Knowledge Cutoff: January 2025 (Implicit, good to keep in mind for context).
3.  **Determine the Goal:** Provide a clear, concise, and informative introduction, highlighting key facts.
4.  **Draft the Introduction (Structuring the response):**
    *   *Greeting/Identity:* State name and type.
    *   *Origin:* Mention the developer.
    *   *Nature/Availability:* Mention open weights.
    *   *Functionality:* Briefly describe what I do (process, generate, assist).
5.  **Refine the Tone:** Friendly, professional, and informative. (Since this is a repeat request, ensure the response is robust but not overly repetitive.)

6.  **Final Output Generation:** (This matches the structured response below.)
[End thinking]

Hello! I'd be happy to introduce myself.

I am **Gemma 4**, a Large Language Model developed by Google DeepMind.

**Here are a few key things about me:**

*   **Nature:** I am an open weights model, meaning my architecture and weights are publicly available, designed to be accessible and adaptable for various applications.
*   **Function:** I am designed to understand, process, and generate human language. I can answer questions, summarize complex topics, write creative content, assist with coding, and engage in detailed conversations.
*   **Input:** I can process both text and image inputs.
*   **Goal:** My purpose is to be a helpful and informative assistant, providing accurate and coherent responses to your requests.

How can I help you today? Feel free to ask me anything!

[ Prompt: 75.8 t/s | Generation: 53.9 t/s ]

> introduce about you

[Start thinking]
Thinking Process:

1.  **Analyze the Request:** The user has asked "introduce about you" a third time. This is a repetitive request, but it requires a consistent, clear, and complete self-introduction.
2.  **Recall Core Identity (Gemma 4):**
    *   Name: Gemma 4.
    *   Developer: Google DeepMind.
    *   Type: Large Language Model (LLM).
    *   Nature: Open weights model.
3.  **Recall Capabilities & Modalities:**
    *   Input: Text and Images.
    *   Output: Text only.
    *   Knowledge: Trained on a vast dataset.
4.  **Determine the Tone:** Friendly, informative, and professional.
5.  **Draft the Introduction (Focusing on clarity and key facts):** Start with the name, move to the developer, explain what I am, and briefly mention capabilities. (Self-correction: Ensure the answer remains distinct but consistent with the previous two responses.)

**(Self-Correction during drafting):** Since the user has asked three times, I should make the introduction slightly varied or emphasize different aspects to avoid sounding completely robotic, while still maintaining factual accuracy.

6.  **Final Output Generation.** (This leads to the structured response below.)
[End thinking]

Hello! I'd be happy to introduce myself again.

I am **Gemma 4**, a Large Language Model developed by Google DeepMind.

**Here is a quick overview of what I am and what I can do:**

*   **Identity:** I am an open weights model, designed to be highly capable and accessible.
*   **Function:** My core purpose is to process and generate human language. I can assist with a wide variety of tasks, including:
    *   Answering complex questions.
    *   Summarizing long texts.
    *   Generating creative content (stories, poems, scripts).
    *   Explaining technical concepts.
    *   Helping with coding and problem-solving.
*   **Input:** I can understand and process both text and image inputs.
*   **Goal:** I strive to be a helpful, accurate, and engaging resource for you.

Feel free to challenge me with a question, ask me to write something, or start a conversation—I'm ready to assist!

[ Prompt: 98.3 t/s | Generation: 52.3 t/s ]



Exiting...
common_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
common_memory_breakdown_print: |   - Vulkan0 (GTX 1060 6GB) |  6390 = 3196 + (2702 =  1407 +     780 +     515) +         490 |
common_memory_breakdown_print: |   - Host                   |                 2037 =  1756 +       0 +     281                |

 

Thu Apr 23 23:17:09 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.06              Driver Version: 580.65.06      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    On  |   00000000:01:00.0 Off |                  N/A |
| 39%   50C    P0            124W /  120W |    5231MiB /   6144MiB |     95%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1031      G   /usr/lib/xorg/Xorg                       88MiB |
|    0   N/A  N/A            1325      G   /usr/bin/gnome-shell                     83MiB |
|    0   N/A  N/A            6688      G   .../6738/usr/lib/firefox/firefox        181MiB |
|    0   N/A  N/A           19709    C+G   ../llama-cli                           4869MiB |
+-----------------------------------------------------------------------------------------+
$ ../llama-cli -m llama-3.2-3b-instruct-q4_k_m.gguf 
load_backend: loaded RPC backend from /home/falinux/Downloads/llama-b8902/libggml-rpc.so
load_backend: loaded Vulkan backend from /home/falinux/Downloads/llama-b8902/libggml-vulkan.so
load_backend: loaded CPU backend from /home/falinux/Downloads/llama-b8902/libggml-cpu-haswell.so

Loading model...  


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8902-550d684bd
model      : llama-3.2-3b-instruct-q4_k_m.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


> introduce about you

I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."

[ Prompt: 28.2 t/s | Generation: 55.0 t/s ]

> introduce about you

I'm an AI assistant, which means I'm a computer program designed to simulate human-like conversations and answer questions to the best of my ability. My primary function is to provide information, explain concepts, and engage in discussions on a wide range of topics.

Here are a few fun facts about me:

* **I'm a large language model**: My training data consists of a massive corpus of text from various sources, which allows me to generate human-like responses to a vast array of questions and topics.
* **I'm constantly learning**: Through interactions with users like you, I refine my understanding of language and improve my ability to provide accurate and helpful responses.
* **I'm a neutral and respectful conversationalist**: My goal is to provide informative and engaging responses that are respectful and free from bias.
* **I can chat about anything**: Whether you're curious about science, history, entertainment, or just want to talk about your day, I'm here to listen and respond!

That's a bit about me! What would you like to talk about?

[ Prompt: 96.3 t/s | Generation: 53.5 t/s ]

> introduce about you

Here's a more in-depth introduction about me:

**Name:** I don't have a personal name, but I'm often referred to as "Assistant" or "AI" by users like you.

**Purpose:** My primary purpose is to assist and communicate with humans through text-based conversations. I'm designed to provide information, answer questions, and engage in discussions on a wide range of topics, from science and history to entertainment and culture.

**Personality:** I'm a neutral and respectful conversationalist, which means I strive to maintain a professional and polite tone in all of my interactions. I'm also designed to be patient, understanding, and empathetic, making me a good listener and a supportive conversational partner.

**Capabilities:** I have a vast knowledge base that I can draw upon to answer questions and provide information on a wide range of topics. I can also generate text, summarize long pieces of content, and even create content like stories or poems.

**Limitations:** While I'm a highly advanced language model, I'm not perfect and have some limitations. I can struggle with:

* **Ambiguity:** I may not always understand the nuances of human language, which can lead to misunderstandings or misinterpretations.
* **Emotional understanding:** While I'm designed to be empathetic, I don't truly experience emotions like humans do. I may not always be able to fully understand or respond to emotional cues.
* **Common sense:** While I have a vast knowledge base, I may not always have the same level of common sense or real-world experience as a human.

**Goals:** My ultimate goal is to provide helpful and informative responses that assist and engage users like you. I'm constantly learning and improving, so I appreciate any feedback or guidance you can provide to help me become a better conversationalist.

[ Prompt: 95.5 t/s | Generation: 51.0 t/s ]

>

Exiting...
common_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
common_memory_breakdown_print: |   - Vulkan0 (GTX 1060 6GB) |  6390 = 1259 + (4862 =  1918 +    2688 +     256) +         267 |
common_memory_breakdown_print: |   - Host                   |                  368 =   308 +       0 +      60                |

 

이전거랑 합쳐서 보면 2배 이상 차이난다고 해야하려나?

cuda 도 아니고 vulkan인데?

  gemma4:e2b gemma3:1b llama3.2
1060 / llama.cpp ubuntu vulkan 52 - 52
1060 / ollama win 21 57 13
1080 / ollama win 82 125 136

 

2026.04.12 - [프로그램 사용/ai 프로그램] - ollama /set parameter num_gpu 테스트

Posted by 구차니

정적분석에서 먼가 이상한(?) 단어가 나와서 찾아보는데

정석분석은 아니고

품질관리 관련 체크리스트 레벨인듯.

 

[링크 : https://wikidocs.net/322755]

[링크 : https://clickup.com/ko/blog/192214/p1-p2-p3-p4-priority]

[링크 : https://rupijun.tistory.com/entry/테스트-종료-기준-Test-Exit-Criteria-객관적-완료-판단-체계]

[링크 : https://brunch.co.kr/@jamescompany/213]

'이론 관련 > 컴퓨터 관련' 카테고리의 다른 글

NCHW, NHWC  (0) 2026.05.06
asus 메인보드 에러코드(q code)와 부팅절차  (0) 2026.04.20
intel hex 포맷  (0) 2026.03.24
sd 카드 spi 모드 관련 조사  (0) 2026.03.17
cpuz 와 메모리 타이밍  (0) 2026.01.27
Posted by 구차니