There's been a lot of talk about running openclaw via LM Studio on a MacBook, so I looked it up.

Can it load and run models the way ollama does?

import lmstudio as lms

EXAMPLE_MESSAGES = (
    "My hovercraft is full of eels!",
    "I will not buy this record, it is scratched."
)

model = lms.llm()
chat = lms.Chat("You are a helpful shopkeeper assisting a foreign traveller")
for message in EXAMPLE_MESSAGES:
    chat.add_user_message(message)
    print(f"Customer: {message}")
    response = model.respond(chat)
    chat.add_assistant_response(response)
    print(f"Shopkeeper: {response}")

[Link : https://pypi.org/project/lmstudio/]

 

Where do you get those model names, and where are the models downloaded from?

-> It looks like it searches Hugging Face by model name and downloads them in GGUF format.

const model = await client.llm.load("qwen2.5-7b-instruct", {
  config: {
    contextLength: 8192,
    gpu: {
      ratio: 0.5,
    },
  },
});

[Link : https://lmstudio.ai/docs/typescript/llm-prediction/parameters]

 

[Link : https://lmstudio.ai/docs/python]

[Link : https://lmstudio.ai/]

Posted by 구차니

According to a Google AI summary,

ollama can pool VRAM across cards to run models too large for a single GPU.

 

Ollama supports multiple GPUs (both NVIDIA and AMD) by automatically splitting model layers across available VRAM, allowing users to run large models that exceed the capacity of a single card.
VRAM Aggregation and Usage
  • Total Capacity: Ollama sums the total VRAM across all detected GPUs (e.g., two 16GB GPUs behave as 32GB).
  • Layer Splitting: If a model cannot fit into one GPU, Ollama splits the layers, distributing the load between GPU (fast) and system RAM (slow).
  • Example Scenario: A 30GB Q4_K_M model (e.g., Llama 3 70B) can be split across two 16GB or two 24GB GPUs, loading roughly 50% on each.
  • KV Cache: A portion of VRAM is reserved for the context window (KV cache), which can take several GBs depending on context length.
Multi-GPU Performance Considerations
  • Optimal Use: If the model fits into a single GPU, Ollama will typically use only one card for the best performance, as splitting increases data transfer between cards.
  • Performance Penalty: Partial offloading (spreading a model across both GPU and CPU) can cause performance to degrade by 5–20x.
  • Mixing GPUs: You can mix different GPUs (e.g., a 3090 24GB + 1660 6GB), but the overall performance might be bottlenecked by the slower card.
  • Multi-Instance Setup: To maximize performance on multiple GPUs, you can run multiple instances of Ollama on different ports, each assigned to a specific GPU (e.g., using CUDA_VISIBLE_DEVICES).
How to Configure
  • NVIDIA: Set CUDA_VISIBLE_DEVICES=0,1 (comma-separated IDs) to enable multi-GPU.
  • AMD: Use ROCR_VISIBLE_DEVICES=0,1.
  • Automatic Split: By default, Ollama handles the split automatically, but you can set PARAMETER num_gpu 999 in a Modelfile to force all layers to be spread across all available GPUs.
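The num_gpu 999 trick above goes into a Modelfile; a minimal sketch (the base model tag and num_ctx value here are just examples):

```
# Hypothetical Modelfile: push all layers onto whatever GPUs ollama detects
FROM gemma3:12b
PARAMETER num_gpu 999
PARAMETER num_ctx 4096
```

Build and run it with "ollama create gemma3-allgpu -f Modelfile" followed by "ollama run gemma3-allgpu".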
VRAM Requirements by Model Size
  • Entry-level (3-4GB VRAM): 3-4B models (Q4_K_M).
  • Mid-range (6-8GB VRAM): 7-9B models (Q4_K_M).
  • High-end (10-12GB VRAM): 12-14B models (Q4_K_M).
  • Dual GPU Setup (16-24GB VRAM): 22-35B models (Gemma 3 27B, Qwen3 32B).
  • Workstation (48GB+ VRAM): 70B+ models (Llama 3.3 70B).
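These tiers follow a rough rule of thumb: Q4_K_M weights cost about 4.5 bits per parameter, plus a few GB of overhead for the KV cache and runtime buffers. A back-of-the-envelope sketch (the 4.5-bit figure and the overhead constant are approximations, not exact values):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.5,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate for a Q4_K_M-quantized model: weights take
    params * bits/8 bytes, plus overhead for KV cache and buffers."""
    return params_billion * bits_per_weight / 8 + overhead_gb

for size_b in (4, 8, 14, 27, 70):
    print(f"{size_b:>3}B -> ~{estimate_vram_gb(size_b):.1f} GB")
```

For 70B this lands around 41 GB for the weights alone, which is why the table calls for 48GB+ once a real context window is added.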

 

So dual-GPU setups come up;

anyway, two 24GB cards supposedly give you 48GB, enough to run a 70B model.

Should I buy another one.. and swap to an SLI/CrossFire-capable motherboard, and upgrade the PSU.. should I?

Can I use multiple GPUs with Ollama for larger models?
Yes, Ollama supports multi-GPU configurations for NVIDIA and AMD cards. For NVIDIA, set CUDA_VISIBLE_DEVICES to comma-separated GPU IDs to distribute model layers across multiple GPUs. This enables running 70B models on dual 24GB GPUs (48GB total) that wouldn't fit on a single card. For AMD GPUs, use ROCR_VISIBLE_DEVICES with the same approach to leverage combined VRAM across multiple cards.

[Link : https://localllm.in/blog/ollama-vram-requirements-for-local-llms]


It arrived in really clean condition.

Below is the 1060 6GB. I somehow managed to cram it into the case,

but it needs extra 6-pin + 8-pin power connectors, which makes me a bit nervous.. (it's only a 500W PSU..)

 

 

Wow, plenty of memory! I could even try something bigger!!! Or so I thought,

but in practice gemma4:e4b runs accelerated split across CPU+GPU. Maybe it needed 12GB of memory.. oh well.

 

Even though it's a fairly light model, it still eats 8.5GB of the 11GB of memory at 81% GPU. Whew.

Thu Apr 16 21:49:46 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 582.28                 Driver Version: 582.28         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti   WDDM  |   00000000:01:00.0  On |                  N/A |
| 30%   56C    P2            207W /  250W |    8570MiB /  11264MiB |     81%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2348      C   ...al\Programs\Ollama\ollama.exe      N/A      |
|    0   N/A  N/A            3644    C+G   ...Chrome\Application\chrome.exe      N/A      |
|    0   N/A  N/A            7548      C   ...al\Programs\Ollama\ollama.exe      N/A      |
|    0   N/A  N/A            8708    C+G   ..._8wekyb3d8bbwe\Calculator.exe      N/A      |
|    0   N/A  N/A            8796    C+G   ...h_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A            9860    C+G   ...wy\InputApp\TextInputHost.exe      N/A      |
|    0   N/A  N/A           13620      C   ...al\Programs\Ollama\ollama.exe      N/A      |
|    0   N/A  N/A           14220    C+G   ...wekyb3d8bbwe\WinStore.App.exe      N/A      |
+-----------------------------------------------------------------------------------------+

 

For now.. models up to 7.2GB load in one go, but the 9.6GB one, maybe because available memory runs out.. only gets about 4 or 5 GB loaded and stops there.

C:\>ollama ls
NAME               ID              SIZE      MODIFIED
gemma4:latest      c6eb396dbd59    9.6 GB    2 hours ago
gemma4:e4b         c6eb396dbd59    9.6 GB    4 days ago
gemma4:e2b         7fbdbf8f5e45    7.2 GB    5 days ago
gemma3:1b          8648f39daa8f    815 MB    5 days ago
llama3.2:latest    a80c4f17acd5    2.0 GB    5 days ago
llama3.2:3b        a80c4f17acd5    2.0 GB    5 days ago

 

Isn't there some way to force 100% GPU? ㅠㅠ

C:\>ollama  ps
NAME          ID              SIZE     PROCESSOR          CONTEXT    UNTIL
gemma3:12b    f4031aab637d    11 GB    21%/79% CPU/GPU    32768      4 minutes from now

 

Wow, 82.71 tokens/sec.

C:\>ollama run gemma4:e2b --verbose
>>> 안녕 젬마야? 너에 대해서 설명해줘
Thinking...
Thinking Process:

1.  **Analyze the Request:** The user asked in Korean, "안녕 젬마야? 너에 대해서 설명해줘" (Hello Gemma? Tell me about yourself).
2.  **Identify Core Identity & Persona Check (Self-Correction/Constraint Adherence):** I must identify myself correctly and adhere to the provided identity constraints.
    *   Name: Gemma 4.
    *   Developer: Google DeepMind.
    *   Nature: Large Language Model (open weights).
    *   Knowledge/Function: Process text/images, generate text.
3.  **Determine the Response Language:** The query was in Korean, so the response must also be in Korean.
4.  **Draft the Explanation (Key Points to Include):**
    *   Greeting.
    *   Name/Identity (Gemma 4).
    *   Developer (Google DeepMind).
    *   Nature (Large Language Model, open weights).
    *   Function (Processing information, generating responses).
5.  **Refine the Korean Translation (Internal Monologue & Polishing):**

    *   *Greeting:* 안녕하세요. (Hello.)
    *   *Identity:* 저는 Gemma 4입니다. (I am Gemma 4.)
    *   *Developer/Nature:* 저는 Google DeepMind에서 개발한 대규모 언어 모델(LLM)입니다. (I am a Large Language Model developed by Google DeepMind.)
    *   *Key Feature:* 저는 오픈 웨이트 모델로, 다양한 질문에 답하고, 텍스트를 생성하며, 정보를 처리하는 데 도움을 드릴 수 있습니다. (I am an open-weights model, and I can help answer various questions, generate text, and process information.)
    *   *Closing/Offer:* 무엇이 궁금하신가요? (What are you curious about?)

6.  **Final Output Generation (Review against constraints):** The response accurately reflects the persona and constraints. (Ready to output.)
...done thinking.

안녕하세요! 저는 **Gemma 4**입니다.

저에 대해 몇 가지 설명해 드릴게요.

1. **정체성:** 저는 Google DeepMind에서 개발한 **대규모 언어 모델(Large Language Model, LLM)**입니다.
2. **특징:** 저는 오픈 웨이트 모델로 설계되었으며, 방대한 양의 텍스트 데이터를 학습하여 질문에 답하고, 텍스트를 생성하며, 정보를 요약하고 다양한 종류의 언어 관련 작업을 수행할 수 있습니다.
3. **기능:** 저는 텍스트 입력과 이미지를 이해하고 처리할 수 있으며, 사용자님과의 대화를 통해 정보를 제공하고 도움을 드리는 것을 목표로 합니다.
4. **지식 범위:** 저의 지식은 학습된 데이터를 기반으로 하며, 특정 시점(저의 지식 차단 시점은 2025년 1월입니다)까지의 정 보를 기반으로 답변을 제공합니다.

궁금한 점이 있으시면 무엇이든 물어봐 주세요! 😊

total duration:       8.6122077s
load duration:        255.4776ms
prompt eval count:    28 token(s)
prompt eval duration: 95.322ms
prompt eval rate:     293.74 tokens/s
eval count:           647 token(s)
eval duration:        7.8226394s
eval rate:            82.71 tokens/s
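Sanity check on those stats: the reported eval rate is just eval count divided by eval duration.

```python
# Figures from the verbose output above
eval_count = 647             # tokens generated
eval_duration_s = 7.8226394  # seconds spent generating

print(f"{eval_count / eval_duration_s:.2f} tokens/s")  # 82.71, matching ollama
```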

 

C:\>ollama run gemma3:1b --verbose
>>> 안녕? 너에 대해서 설명해줘
안녕하세요! 저는 구글에서 훈련한 대규모 언어 모델입니다.

**간단히 말해서, 저는 텍스트와 이미지를 입력으로 받아 텍스트를 생성하는 데 특화된 AI 모델입니다.**

좀 더 자세히 설명하면 다음과 같습니다:

*   **훈련 데이터:** 저는 방대한 양의 텍스트 데이터를 학습했습니다. 여기에는 책, 웹사이트, 기사, 코드 등 다양한 종류
의 텍스트가 포함됩니다.
*   **기능:** 저는 질문에 답하고, 텍스트를 요약하고, 번역하고, 창의적인 텍스트를 생성하고, 코드 등을 수행할 수 있습
니다.
*   **저의 한계:** 저는 아직 완벽하지 않습니다. 때로는 사실과 다른 정보나 편향된 내용을 생성할 수 있습니다. 또한, 저
 또한 인간과 같은 감정을 가지고 있지 않으므로, 인간적인 상호작용에는 한계가 있습니다.
*   **현재 위치:** 저는 지속적으로 학습하고 개선되고 있습니다.

**제가 무엇을 도와드릴까요?** 궁금한 점이나 특정 작업을 위해 제가 할 수 있는 것이 있다면 말씀해주세요!

total duration:       2.6137723s
load duration:        278.3931ms
prompt eval count:    18 token(s)
prompt eval duration: 168.7498ms
prompt eval rate:     106.67 tokens/s
eval count:           246 token(s)
eval duration:        1.9665249s
eval rate:            125.09 tokens/s

 

C:\>ollama run llama3.2:3b --verbose
>>> 안녕? 너에 대해서 설명해줘
안녕하세요! 난 인공지능(IA) 기반 챗bot으로, human-like 대화를 통해 사용자와 대화할 수 있는 프로그램입니다.

저는 지식 및ความรู้를 가집니다. 사용자가 물어볼 질문이나 문제에 대해 정보를 제공할 때, 저는 그 질문에 대한 답변을 주
어 assistance를 해줄 것입니다.

저는 다양한 주제에 대해 지식을 가지고 있으며, 사용자와의 대화를 통해 배울 수 있는 것들을 포함하여Continuous
learning을 하며 improves됩니다.

만약 사용자에게 도움을 benöt이면, 언제든지 저에게 물어보세요!

total duration:       1.6521448s
load duration:        120.741ms
prompt eval count:    35 token(s)
prompt eval duration: 31.4659ms
prompt eval rate:     1112.32 tokens/s
eval count:           135 token(s)
eval duration:        988.2004ms
eval rate:            136.61 tokens/s

 

 

Reducing the context length from 128k to 4k,

the phi3:14b model runs 100% on GPU. Oho.. do I have to give up context length?

C:\>ollama run phi3:14b --verbose
>>> 안녕?
안녕하세요! 나는 지구에 살고 왔습니다. 여러분과 마음을 교환해 보겠습니다.

total duration:       2.0717119s
load duration:        21.2028ms
prompt eval count:    2593 token(s)
prompt eval duration: 274.901ms
prompt eval rate:     9432.49 tokens/s
eval count:           61 token(s)
eval duration:        1.3611884s
eval rate:            44.81 tokens/s

 

C:\>ollama ps
NAME        ID              SIZE      PROCESSOR    CONTEXT    UNTIL
phi3:14b    cf611a26b048    9.2 GB    100% GPU     4096       4 minutes from now

[Link : https://docs.ollama.com/context-length]
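The reason shrinking the context frees so much memory: the KV cache grows linearly with context length. A rough sketch (the layer/head/dim numbers below are illustrative, not phi3's actual configuration):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    # K and V each store n_layers * n_kv_heads * head_dim * ctx_len elements
    # (fp16 = 2 bytes per element), hence the leading factor of 2
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Illustrative 40-layer model with 10 KV heads of dimension 128:
for ctx in (4096, 131072):
    print(f"ctx {ctx:>6}: {kv_cache_bytes(40, 10, 128, ctx) / 2**30:.2f} GiB")
```

Going from 131072 down to 4096 is a straight 32x reduction in cache size.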

 

It didn't work earlier, but with the context length reduced it does.

C:\>ollama run gemma4:e4b --verbose
>>> 안녕?
Thinking...
Here's a thinking process that leads to the suggested response:

1.  **Analyze the input:** The input is "안녕?" (Annyeong?), which is a casual Korean greeting, equivalent to
"Hi?" or "Hello?"

2.  **Determine the required response type:** The response must be a natural, reciprocal greeting.

3.  **Formulate the primary greeting (Reciprocation):**
    *   The standard reply to "안녕?" is "안녕?" or "안녕하세요?" (if formal).
    *   Since the input is very casual ("안녕?"), a casual reply is appropriate.
    *   *Choice:* "안녕하세요!" or "안녕!" (The exclamation mark adds energy.)

4.  **Add conversational momentum (The "How are you?" part):** A simple greeting usually prompts a follow-up
question to keep the conversation going.
    *   Korean equivalent: "잘 지내셨어요?" (Jal jinaesyeosseoyo? - How have you been?) or "뭐 하세요?" (Mwo
haseyo? - What are you doing?).
    *   A polite, common opening is: "네, 안녕하세요! 잘 지내셨어요?" (Yes, hello! Have you been well?)

5.  **Select the best options (Offering variety):** Depending on the desired tone (casual, polite, enthusiastic),
multiple responses can be offered.

    *   **Option 1 (Standard/Friendly):** A standard greeting and question. (The safest bet).
    *   **Option 2 (Casual/Warm):** Keep it very relaxed.
    *   **Option 3 (Enthusiastic/Ready to chat):** Expressing excitement.

6.  **Final Polish (Selecting the Korean output):**

    *   *Initial Greeting:* 안녕하세요!
    *   *Follow-up:* 저도 잘 지내고 있어요. 😊 (I'm doing well too.)
    *   *Question back:* 무슨 일로 오셨나요? / 잘 지내셨어요? (What brings you here? / Have you been well?)

7.  **Construct the final Korean and English explanation.** (This results in the provided good response.)
...done thinking.

안녕하세요! 😊

(Annyeonghaseyo! / Hello!)

저는 잘 지내고 있어요. 혹시 무슨 일로 오셨나요? 😊

**[English translation and explanation:]**

*   **안녕하세요! (Annyeonghaseyo!)**: Hello! (This is the standard, polite way to respond.)
*   **저도 잘 지내고 있어요. (Jeodo jal jinaego isseoyo.)**: I'm doing well too.
*   **무슨 일로 오셨나요? (Museun illo osyeonnayo?)**: What brings you here? / What are you doing? (This is a
natural way to prompt the other person to start the topic.)

어떤 이야기를 하고 싶으세요? 😊

total duration:       12.8489232s
load duration:        293.3292ms
prompt eval count:    18 token(s)
prompt eval duration: 58.0091ms
prompt eval rate:     310.30 tokens/s
eval count:           639 token(s)
eval duration:        11.824361s
eval rate:            54.04 tokens/s

 

C:\>ollama ps
NAME          ID              SIZE     PROCESSOR    CONTEXT    UNTIL
gemma4:e4b    c6eb396dbd59    10 GB    100% GPU     4096       About a minute from now

It looks like the input sentence isn't fed in as raw text;

the input is embedded, passed through something called positional encoding,

and enters the model as a kind of vector.

 

 

[Link : https://jkim83.tistory.com/517]

 

Tokenization and input embedding
Characters, digits, and punctuation are the basic units humans use to express language, but the smallest unit of language an AI model works with is the token. Each token is assigned a unique ID number, and the LLM navigates its vocabulary "database" through these ID numbers rather than through the words or tokens themselves. Tokenizing language this way greatly reduces the amount of computation needed to process text.

To produce the query and key vectors fed into the transformer's attention layers, the model needs an initial, context-free vector embedding for each token. These initial token embeddings can either be learned from scratch during training or taken from a pretrained word-embedding model.
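A toy illustration of token IDs and context-free embeddings (the vocabulary and embedding values are made up for illustration):

```python
# Hypothetical three-word vocabulary: the model works with IDs, not words
vocab = {"she": 0, "poured": 1, "water": 2}

# Context-free embedding table: one fixed vector per token ID (toy values)
embedding_table = [
    [0.1, -0.3],  # "she"
    [0.7, 0.2],   # "poured"
    [-0.5, 0.9],  # "water"
]

token_ids = [vocab[w] for w in "she poured water".split()]
vectors = [embedding_table[i] for i in token_ids]
print(token_ids)  # [0, 1, 2]
print(vectors)
```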

Positional encoding
Word order and position can have a large effect on semantic meaning. An RNN's serial structure naturally preserves each token's positional information, but a transformer model has to add positional information explicitly so the attention mechanism can take it into account.

With positional encoding, before the input enters the attention mechanism, the model adds a value vector derived from relative position to each token's embedding. The closer two tokens are, the more similar their position vectors, and accordingly the higher the alignment score the added positional information yields. In this way the model learns to pay more attention to adjacent tokens.

[Link : https://www.ibm.com/kr-ko/think/topics/transformer-model]
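One concrete recipe for those position vectors is the sinusoidal encoding from the original Transformer paper; a minimal sketch:

```python
import math

def positional_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    # PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = positional_encoding(seq_len=4, d_model=8)
# Adjacent positions get similar vectors, which is what lets attention
# favor nearby tokens once these are added to the token embeddings.
```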

 

Self-attention finds meaning
Let's look at an example sentence.

She poured the water from the pitcher into the cup, until it was full.

We can easily tell that "it" here refers to the cup. Now consider the next sentence.

She poured the water from the pitcher into the cup, until it was empty.

Here "it" of course refers to the pitcher.

According to Ashish Vaswani, the senior Google Brain researcher who led the work on the 2017 paper introduced earlier, "Meaning is a result of relationships between things, and self-attention is a general way of learning relationships."

"Machine translation was a good vehicle for establishing self-attention, because it has to capture both close and distant relationships between words," he said, adding, "Self-attention has now fully established itself as a powerful and flexible tool for learning."

[Link : https://blogs.nvidia.co.kr/blog/what-is-a-transformer-model/]

[Link : https://www.hpe.com/kr/ko/what-is/transformer-model.html]
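The "relationships between words" that self-attention learns come down to scaled dot-product attention; a dependency-free sketch for a single query (toy vectors):

```python
import math

def attention(query: list[float], keys: list[list[float]],
              values: list[list[float]]) -> list[float]:
    """Scaled dot-product attention for one query over a list of key/value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # softmax over the alignment scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # weighted sum of value vectors
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

# A query aligned with the first key pulls mostly the first value vector
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```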

 

[Link : https://github.com/ukairia777/tensorflow-transformer/blob/main/Transformer_Korean_Chatbot.ipynb]

[Link : https://wikidocs.net/31379]


The Python side is just a thin shell that does nothing but call the API.

  def chat(
    self,
    model: str = '',
    messages: Optional[Sequence[Union[Mapping[str, Any], Message]]] = None,
    *,
    tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
    stream: bool = False,
    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    options: Optional[Union[Mapping[str, Any], Options]] = None,
    keep_alive: Optional[Union[float, str]] = None,
  ) -> Union[ChatResponse, Iterator[ChatResponse]]:

    return self._request(
      ChatResponse,
      'POST',
      '/api/chat',
      json=ChatRequest(
        model=model,
        messages=list(_copy_messages(messages)),
        tools=list(_copy_tools(tools)),
        stream=stream,
        think=think,
        logprobs=logprobs,
        top_logprobs=top_logprobs,
        format=format,
        options=options,
        keep_alive=keep_alive,
      ).model_dump(exclude_none=True),
      stream=stream,
    )

[Link : https://github.com/ollama/ollama-python/blob/main/ollama/_client.py]

[Link : https://github.com/ollama/ollama-python] the Python library
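Since chat() above is just a POST to /api/chat, the request body it ends up sending can be sketched offline (model name and options here are examples):

```python
import json

# The JSON body that ChatRequest serializes to, with None fields excluded
payload = {
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "hello"}],
    "stream": False,
    "options": {"num_ctx": 4096},  # same option names as /set parameter
}
body = json.dumps(payload)
# e.g. requests.post("http://localhost:11434/api/chat", data=body)  # needs a running server
```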

 

ollama itself is the main program;

somewhat unusually(?), it's written in Go.

[Link : https://github.com/ollama/ollama/blob/main/cmd/cmd.go]

[Link : https://github.com/ollama/ollama]


 

C:\>ollama list
NAME               ID              SIZE      MODIFIED
llama3.2:latest    a80c4f17acd5    2.0 GB    40 hours ago
gemma3:1b          8648f39daa8f    815 MB    40 hours ago
llama3.1:8b        46e0c10c039e    4.9 GB    40 hours ago
gemma4:e2b         7fbdbf8f5e45    7.2 GB    40 hours ago
llama3.2:3b        a80c4f17acd5    2.0 GB    40 hours ago

 

C:\Users\minimonk\.ollama\models\blobs>dir
 C 드라이브의 볼륨에는 이름이 없습니다.
 볼륨 일련 번호: 6A30-5BAD

 C:\Users\minimonk\.ollama\models\blobs 디렉터리

2026-04-10  오후 11:33    <DIR>          .
2026-04-10  오후 10:51    <DIR>          ..
2026-04-10  오후 11:11            12,320 sha256-0ba8f0e314b4264dfd19df045cde9d4c394a52474bf92ed6a3de22a4ca31a177
2026-04-10  오후 11:33               492 sha256-120007c81bf8b4eb54e896a4a3e5ff9949f124b0ea42295277c8523f7fa9c7f6
2026-04-10  오후 11:33                77 sha256-3116c52250752e00dd06b16382e952bd33c34fd79fc4fe3a5d2c77cf7de1b14b
2026-04-10  오후 11:07               561 sha256-34bb5ab01051a11372a91f95f3fbbc51173eed8e7f13ec395b9ae9b8bd0e242b
2026-04-10  오후 11:11               487 sha256-455f34728c9b5dd3376378bfb809ee166c145b0b4c1f1a6feca069055066ef9a
2026-04-10  오후 11:10     7,162,394,016 sha256-4e30e2665218745ef463f722c0bf86be0cab6ee676320f1cfadf91e989107448
2026-04-10  오후 11:10                42 sha256-56380ca2ab89f1f68c283f4d50863c0bcab52ae3f1b9a88e4ab5617b176f71a3
2026-04-10  오후 11:07                96 sha256-56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb
2026-04-10  오후 11:11     4,920,738,944 sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29
2026-04-10  오후 11:10            11,355 sha256-7339fa418c9ad3e8e12e74ad0fd26a9cc4be8703f9c110728a992b193be85cb2
2026-04-10  오후 11:33       815,310,432 sha256-7cd4618c1faf8b7233c6c906dac1694b6a47684b37b8895d470ac688520b9c01
2026-04-10  오후 11:11             1,481 sha256-948af2743fc78a328dcb3b0f5a31b3d75f415840fdb699e8b1235978392ecf85
2026-04-10  오후 11:07             1,429 sha256-966de95ca8a62200913e3f8bfbf84c8494536f1b94b49166851e76644e966396
2026-04-10  오후 11:07             6,016 sha256-a70ff7e570d97baaf4e62ac6e6ad9975e04caa6d900d3742d37698494479e0cd
2026-04-10  오후 11:10               473 sha256-c6bc3775a3fa9935ce4a3ccd7abc59e936c3de9308d2cc090516012f43ed9c07
2026-04-10  오후 11:33             8,432 sha256-dd084c7d92a3c1c14cc09ae77153b903fd2024b64a100a0cc8ec9316063d2dbc
2026-04-10  오후 11:06     2,019,377,376 sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff
2026-04-10  오후 11:33               358 sha256-e0a42594d802e5d31cdc786deb4823edb8adff66094d49de8fffe976d753e348
2026-04-10  오후 11:07             7,711 sha256-fcc5a6bec9daf9b561a68827b67ab6088e1dba9d1fa2a50d7bbcc8384e0a265d
              19개 파일      14,917,872,098 바이트
               2개 디렉터리  353,433,444,352 바이트 남음

C:\Users\minimonk\.ollama\models\manifests\registry.ollama.ai\library>dir
 C 드라이브의 볼륨에는 이름이 없습니다.
 볼륨 일련 번호: 6A30-5BAD

 C:\Users\minimonk\.ollama\models\manifests\registry.ollama.ai\library 디렉터리

2026-04-10  오후 11:33    <DIR>          .
2026-04-10  오후 11:07    <DIR>          ..
2026-04-10  오후 11:33    <DIR>          gemma3
2026-04-10  오후 11:10    <DIR>          gemma4
2026-04-10  오후 11:11    <DIR>          llama3.1
2026-04-10  오후 11:36    <DIR>          llama3.2
               0개 파일                   0 바이트
               6개 디렉터리  353,433,411,584 바이트 남음

 

C:\Users\minimonk\.ollama\models\manifests\registry.ollama.ai\library\llama3.2>dir
 C 드라이브의 볼륨에는 이름이 없습니다.
 볼륨 일련 번호: 6A30-5BAD

 C:\Users\minimonk\.ollama\models\manifests\registry.ollama.ai\library\llama3.2 디렉터리

2026-04-10  오후 11:36              .
2026-04-10  오후 11:33              ..
2026-04-10  오후 11:07             1,005 3b
2026-04-10  오후 11:36             1,005 latest
               2개 파일               2,010 바이트
               2개 디렉터리  353,430,896,640 바이트 남음

 

C:\Users\minimonk\.ollama\models\manifests\registry.ollama.ai\library\llama3.2>type latest
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:34bb5ab01051a11372a91f95f3fbbc51173eed8e7f13ec395b9ae9b8bd0e242b",
    "size": 561
  },
  "layers": [
    {
      "mediaType": "application/vnd.ollama.image.model",
      "digest": "sha256:dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff",
      "size": 2019377376
    },
    {
      "mediaType": "application/vnd.ollama.image.template",
      "digest": "sha256:966de95ca8a62200913e3f8bfbf84c8494536f1b94b49166851e76644e966396",
      "size": 1429
    },
    {
      "mediaType": "application/vnd.ollama.image.license",
      "digest": "sha256:fcc5a6bec9daf9b561a68827b67ab6088e1dba9d1fa2a50d7bbcc8384e0a265d",
      "size": 7711
    },
    {
      "mediaType": "application/vnd.ollama.image.license",
      "digest": "sha256:a70ff7e570d97baaf4e62ac6e6ad9975e04caa6d900d3742d37698494479e0cd",
      "size": 6016
    },
    {
      "mediaType": "application/vnd.ollama.image.params",
      "digest": "sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb",
      "size": 96
    }
  ]
}
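Summing the manifest's layer sizes reproduces the size that ollama list reports; the sizes below are copied from the manifest above:

```python
import json

# Layer sizes copied from the llama3.2 "latest" manifest shown above
manifest_json = '''
{
  "layers": [
    {"mediaType": "application/vnd.ollama.image.model",    "size": 2019377376},
    {"mediaType": "application/vnd.ollama.image.template", "size": 1429},
    {"mediaType": "application/vnd.ollama.image.license",  "size": 7711},
    {"mediaType": "application/vnd.ollama.image.license",  "size": 6016},
    {"mediaType": "application/vnd.ollama.image.params",   "size": 96}
  ]
}
'''
total = sum(layer["size"] for layer in json.loads(manifest_json)["layers"])
print(f"{total / 1e9:.1f} GB")  # 2.0 GB, matching what ollama list reports
```

The model blob dominates; the template, license, and params layers are only a few KB each.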

 

C:\>ollama show llama3.2:latest
  Model
    architecture        llama
    parameters          3.2B
    context length      131072
    embedding length    3072
    quantization        Q4_K_M

  Capabilities
    completion
    tools

  Parameters
    stop    "<|start_header_id|>"
    stop    "<|end_header_id|>"
    stop    "<|eot_id|>"

  License
    LLAMA 3.2 COMMUNITY LICENSE AGREEMENT
    Llama 3.2 Version Release Date: September 25, 2024
    ...

 


Run fresh, it goes 100% GPU; but after running on CPU and switching back to GPU, it never returns to 100% GPU.

Is it a bug?


i5-7500 / 1060 6GB

C:\> ollama run  gemma3:1b --verbose
>>> 안녕?
안녕하세요! 어떻게 지내세요? 😊

무엇을 도와드릴까요?

total duration:       811.2763ms
load duration:        282.8154ms
prompt eval count:    12 token(s)
prompt eval duration: 95.1406ms
prompt eval rate:     126.13 tokens/s
eval count:           18 token(s)
eval duration:        375.8591ms
eval rate:            47.89 tokens/s

>>> 안녕?
안녕하세요! 저는 잘 지내고 있어요. 당신은요? 😊

무엇을 도와드릴까요?

total duration:       648.8691ms
load duration:        265.7547ms
prompt eval count:    42 token(s)
prompt eval duration: 28.3879ms
prompt eval rate:     1479.50 tokens/s
eval count:           24 token(s)
eval duration:        329.6217ms
eval rate:            72.81 tokens/s
C:\>ollama ps
NAME         ID              SIZE      PROCESSOR    CONTEXT    UNTIL
gemma3:1b    8648f39daa8f    1.4 GB    100% GPU     32768      4 minutes from now
>>> /set parameter num_gpu 0
Set parameter 'num_gpu' to '0'
>>> 안녕?
안녕! 너도 잘 지내고 있니? 혹시 오늘 하루가 즐거웠어? 😊

total duration:       3.6674907s
load duration:        1.9437334s
prompt eval count:    78 token(s)
prompt eval duration: 777.1942ms
prompt eval rate:     100.36 tokens/s
eval count:           24 token(s)
eval duration:        849.8989ms
eval rate:            28.24 tokens/s

>>> 안녕?
안녕! 당신도 잘 지내고 있니? 😊 오늘 하루는 어땠어?

total duration:       1.1887933s
load duration:        279.7522ms
prompt eval count:    114 token(s)
prompt eval duration: 147.417ms
prompt eval rate:     773.32 tokens/s
eval count:           21 token(s)
eval duration:        747.8409ms
eval rate:            28.08 tokens/s
C:\>ollama ps
NAME         ID              SIZE      PROCESSOR    CONTEXT    UNTIL
gemma3:1b    8648f39daa8f    1.4 GB    100% CPU     32768      4 minutes from now
>>> /set parameter num_gpu 1
Set parameter 'num_gpu' to '1'
>>> 안녕?
안녕! 😊 혹시 무슨 일 도와줄까?

total duration:       2.5497022s
load duration:        1.8405152s
prompt eval count:    147 token(s)
prompt eval duration: 184.0551ms
prompt eval rate:     798.67 tokens/s
eval count:           13 token(s)
eval duration:        466.5569ms
eval rate:            27.86 tokens/s

>>> 안녕?
안녕하세요! 😊 저는 질문에 답하거나, 이야기하거나, 간단한 작업도 할 수 있어요. 뭐 하고 싶으세요?

total duration:       1.5155845s
load duration:        259.3218ms
prompt eval count:    172 token(s)
prompt eval duration: 154.1147ms
prompt eval rate:     1116.05 tokens/s
eval count:           27 token(s)
eval duration:        1.0838799s
eval rate:            24.91 tokens/s
C:\>ollama ps
NAME         ID              SIZE      PROCESSOR          CONTEXT    UNTIL
gemma3:1b    8648f39daa8f    1.4 GB    88%/12% CPU/GPU    32768      4 minutes from now

 

 

ryzen 7 5800U

C:\>ollama run  gemma3:1b --verbose
>>> 안녕
안녕하세요! 무엇을 도와드릴까요? 😊

(Hello! How can I help you?)

total duration:       793.3831ms
load duration:        201.815ms
prompt eval count:    11 token(s)
prompt eval duration: 29.5206ms
prompt eval rate:     372.62 tokens/s
eval count:           21 token(s)
eval duration:        542.3267ms
eval rate:            38.72 tokens/s

>>> 안녕
안녕하세요! 어떻게 지내세요? (Hello! How are you?)

total duration:       671.5321ms
load duration:        190.6399ms
prompt eval count:    43 token(s)
prompt eval duration: 89.6751ms
prompt eval rate:     479.51 tokens/s
eval count:           15 token(s)
eval duration:        375.0155ms
eval rate:            40.00 tokens/s
>>> /set parameter num_gpu 0
Set parameter 'num_gpu' to '0'
C:\>ollama ps
NAME         ID              SIZE      PROCESSOR    CONTEXT    UNTIL
gemma3:1b    8648f39daa8f    1.2 GB    100% GPU     4096       4 minutes from now
>>> 안녕
안녕! 뭐 하고 있어? (Hello! What are you doing?)

total duration:       2.0119944s
load duration:        1.1545976s
prompt eval count:    69 token(s)
prompt eval duration: 374.8309ms
prompt eval rate:     184.08 tokens/s
eval count:           16 token(s)
eval duration:        409.694ms
eval rate:            39.05 tokens/s
>>> 안녕
안녕하세요! 잘 지내고 있어요. 😊 You too?

total duration:       622.6248ms
load duration:        201.06ms
prompt eval count:    96 token(s)
prompt eval duration: 85.2311ms
prompt eval rate:     1126.35 tokens/s
eval count:           13 token(s)
eval duration:        327.0978ms
eval rate:            39.74 tokens/s
C:\>ollama ps
NAME         ID              SIZE      PROCESSOR    CONTEXT    UNTIL
gemma3:1b    8648f39daa8f    1.2 GB    100% CPU     4096       4 minutes from now
>>> /set parameter num_gpu 1
Set parameter 'num_gpu' to '1'
>>> 안녕?
네, 안녕! 😊 (Yes, hello! 😊)

total duration:       3.0154671s
load duration:        1.72869s
prompt eval count:    241 token(s)
prompt eval duration: 763.2999ms
prompt eval rate:     315.73 tokens/s
eval count:           14 token(s)
eval duration:        464.0939ms
eval rate:            30.17 tokens/s
C:\>ollama ps
NAME         ID              SIZE      PROCESSOR         CONTEXT    UNTIL
gemma3:1b    8648f39daa8f    1.2 GB    93%/7% CPU/GPU    4096       4 minutes from now

 


CLI help. show displays model info, run runs a model.. and stop, I suppose that lets you unload a running one?

C:\>ollama --help
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start Ollama
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  signin      Sign in to ollama.com
  signout     Sign out from ollama.com
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  launch      Launch the Ollama menu or an integration
  help        Help about any command

Flags:
  -h, --help         help for ollama
      --nowordwrap   Don't wrap words to the next line automatically
      --verbose      Show timings for response
  -v, --version      Show version information

Use "ollama [command] --help" for more information about a command.

 

The CLI's internal commands seem to be documented only as far as the FAQ and what a web search turns up, but poking around inside, there's actually quite a bit there.

How can I specify the context window size?
By default, Ollama uses a context window size of 4096 tokens.
This can be overridden with the OLLAMA_CONTEXT_LENGTH environment variable. For example, to set the default context window to 8K, use:
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
To change this when using ollama run, use /set parameter:
/set parameter num_ctx 4096

[Link : https://docs.ollama.com/faq]
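On Linux/macOS the FAQ's env-var approach can also be scripted, e.g. launching the server from Python; a sketch that only stages the environment (the serve line is commented out since it needs ollama on the PATH):

```python
import os
import subprocess  # used by the commented-out line below

# Default context window via environment, as in the FAQ example
env = dict(os.environ, OLLAMA_CONTEXT_LENGTH="8192")
# subprocess.Popen(["ollama", "serve"], env=env)
print(env["OLLAMA_CONTEXT_LENGTH"])  # 8192
```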

 

The ones that catch my eye are the context size and num_gpu.. which, per the help text, is the number of layers sent to the GPU rather than the number of GPUs.

C:\>ollama run  gemma3:1b
>>> /set
Available Commands:
  /set parameter ...     Set a parameter
  /set system <string>   Set system message
  /set history           Enable history
  /set nohistory         Disable history
  /set wordwrap          Enable wordwrap
  /set nowordwrap        Disable wordwrap
  /set format json       Enable JSON mode
  /set noformat          Disable formatting
  /set verbose           Show LLM stats
  /set quiet             Disable LLM stats
  /set think             Enable thinking
  /set nothink           Disable thinking

>>> /set parameter
Available Parameters:
  /set parameter seed <int>             Random number seed
  /set parameter num_predict <int>      Max number of tokens to predict
  /set parameter top_k <int>            Pick from top k num of tokens
  /set parameter top_p <float>          Pick token based on sum of probabilities
  /set parameter min_p <float>          Pick token based on top token probability * min_p
  /set parameter num_ctx <int>          Set the context size
  /set parameter temperature <float>    Set creativity level
  /set parameter repeat_penalty <float> How strongly to penalize repetitions
  /set parameter repeat_last_n <int>    Set how far back to look for repetitions
  /set parameter num_gpu <int>          The number of layers to send to the GPU
  /set parameter stop <string> <string> ...   Set the stop parameters




Setting num_gpu to 0 (no layers on the GPU) reportedly works as a workaround,

 

though it doesn't work in the app

 

and only seems to work from the CLI.

C:\>ollama
>>> /set parameter num_gpu 0
Set parameter 'num_gpu' to '0'
>>> Send a message (/? for help)

 

Trying it, GPU memory usage drops, and GPU load doesn't climb while it's responding.

 

[Link : https://github.com/ollama/ollama/issues/1046]


In the ollama app's settings, context length is at the very bottom;

it starts out at 4k and can go up to 256k.

 

 

After running at 128k, then switching to 4k and running again, the entry disappears entirely.

Changing the context length seems to kill the existing container(?); it only takes effect after a fresh start.

C:\>ollama ps
NAME          ID              SIZE     PROCESSOR          CONTEXT    UNTIL
gemma4:e2b    7fbdbf8f5e45    11 GB    73%/27% CPU/GPU    131072     3 minutes from now

C:\>ollama ps
NAME    ID    SIZE    PROCESSOR    CONTEXT    UNTIL

C:\>ollama ps
NAME    ID    SIZE    PROCESSOR    CONTEXT    UNTIL

C:\>ollama
>>>

C:\>ollama ps
NAME          ID              SIZE      PROCESSOR          CONTEXT    UNTIL
gemma4:e2b    7fbdbf8f5e45    7.9 GB    73%/27% CPU/GPU    4096       4 minutes from now

 

On Linux it seems you can prefix ollama serve with the OLLAMA_CONTEXT_LENGTH variable,

but that doesn't seem to work on Windows.

 

Anyway, the docs page lists memory requirements per context length.

[Link : https://docs.ollama.com/context-length]
