It's an Intel CPU, so I guess the integrated graphics really do go to waste..(!)

 

$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
[sudo] password for minimonk: 
>>> Downloading ollama-linux-amd64.tar.zst
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA/AMD GPU detected. Ollama will run in CPU-only mode.

 

$ ollama 
Ollama 0.20.3                                       
                                                    
  Chat with a model                                 
    Start an interactive chat with a model          
                                                    
  Launch Claude Code (not installed)                
    Anthropic's coding tool with subagents          
                                                    
  Launch Codex (not installed)                      
    OpenAI's open-source coding agent               
                                                    
  Launch OpenClaw (install)                         
    Personal AI with 100+ skills                    
                                                    
  Launch OpenCode (not installed)                   
    Anomaly's open-source coding agent              
                                                    
  Launch Droid (not installed)                      
    Factory's coding agent across terminal and IDEs 
                                                    
▸ Launch Pi (install)                               
    Press enter to install                          
                                                    
                                                    
↑/↓ navigate • enter launch • → configure • esc quit

 

$ ollama 
Select model to run: Type to filter...                                          
                                                                                
  Recommended                                                                   
  ▸ kimi-k2.5:cloud                                                             
      Multimodal reasoning with subagents                                       
    qwen3.5:cloud                                                               
      Reasoning, coding, and agentic tool use with vision                       
    glm-5:cloud                                                                 
      Reasoning and code generation                                             
    minimax-m2.7:cloud                                                          
      Fast, efficient coding and real-world productivity                        
    gemma4                                                                      
      Reasoning and code generation locally, ~16GB, (not downloaded)            
    qwen3.5                                                                     
      Reasoning, coding, and visual understanding locally, ~11GB, (not downloade
                                                                                
↑/↓ navigate • enter select • ← back                         
$ ollama 
Download gemma4?                         
                                         
   Yes    No                             
                                         
←/→ navigate • enter confirm • esc cancel

 

Memory is small (16GB) and this would run on the CPU anyway, so I cancelled the download!

pulling manifest 
pulling 4c27e0f5b5ad:  40% ▕███████           ▏ 3.9 GB/9.6 GB   11 MB/s   8m11s

 

[Link: https://goddaehee.tistory.com/]

[Link: https://ollama.com/download]

[Link: https://ollama.com/search]

 

+

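This is only a test run, so whether it's 1B or 3B doesn't matter. Let's start.

But why did the menu above show only gemma and no llama?

For now.. since the library page says "updated 1 year ago", maybe the menu only lists recent models?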

$ ollama run llama3.2
pulling manifest 
pulling dde5aa3fc5ff:  70% ▕████████████      ▏ 1.4 GB/2.0 GB   20 MB/s     29s

[Link: https://ollama.com/library/llama3.2]

 

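Aaah, my computer is melting lol

Still, it feels like about 2~3 words per second come out, and that's already roughly the pace of a conversation.

Any faster and the text would probably appear before I could finish reading it?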

 

Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz

 

Running it again, the model I downloaded now shows up in the list.

$ ollama 
Select model to run: Type to filter...                                            
                                                                                  
  Recommended                                                                     
    gemma4                                                                        
      Reasoning and code generation locally, ~16GB, (not downloaded)              
    qwen3.5                                                                       
      Reasoning, coding, and visual understanding locally, ~11GB, (not downloaded)
    kimi-k2.5:cloud                                                               
      Multimodal reasoning with subagents                                         
    qwen3.5:cloud                                                                 
      Reasoning, coding, and agentic tool use with vision                         
    glm-5:cloud                                                                   
      Reasoning and code generation                                               
    minimax-m2.7:cloud                                                            
      Fast, efficient coding and real-world productivity                          
                                                                                  
  More                                                                            
  ▸ llama3.2                                                                      
                                                                                  
↑/↓ navigate • enter select • ← back                  

 

 

$ ollama --help
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start Ollama
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  signin      Sign in to ollama.com
  signout     Sign out from ollama.com
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  launch      Launch the Ollama menu or an integration
  help        Help about any command

Flags:
  -h, --help         help for ollama
      --nowordwrap   Don't wrap words to the next line automatically
      --verbose      Show timings for response
  -v, --version      Show version information

Use "ollama [command] --help" for more information about a command.

 

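I wanted to ask a single one-shot question and look at the speed.

Hmm.. is the eval rate the speed I actually perceive? 12 tokens/s

It seems you need roughly this number for text to be generated at a reading pace that doesn't get too tedious.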

$ ollama run llama3.2:latest "안녕" --verbose
안녕하세요! (Hello!) I'm here to help you. How can I assist you today?

total duration:       2.282485591s
load duration:        143.825937ms
prompt eval count:    27 token(s)
prompt eval duration: 350.162008ms
prompt eval rate:     77.11 tokens/s
eval count:           21 token(s)
eval duration:        1.73838491s
eval rate:            12.08 tokens/s
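
To check what the eval rate actually measures, here is a minimal Python sketch that recomputes it from the numbers printed above. It assumes that eval rate is simply eval count divided by eval duration, and that durations are printed with only an `ms` or `s` suffix as in this output; the helper names are my own and not part of Ollama.

```python
import re

# Stats copied from the `ollama run llama3.2:latest "안녕" --verbose` output above.
SAMPLE = """\
total duration:       2.282485591s
load duration:        143.825937ms
prompt eval count:    27 token(s)
prompt eval duration: 350.162008ms
prompt eval rate:     77.11 tokens/s
eval count:           21 token(s)
eval duration:        1.73838491s
eval rate:            12.08 tokens/s
"""


def to_seconds(value: str) -> float:
    """Convert '350.162008ms' or '1.73838491s' to seconds (only ms/s handled)."""
    match = re.fullmatch(r"([\d.]+)(ms|s)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    return number / 1000.0 if unit == "ms" else number


def parse_stats(text: str) -> dict:
    """Turn the 'label: value' lines into a dict keyed by label."""
    stats = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            stats[key.strip()] = value.strip()
    return stats


def generation_rate(text: str) -> float:
    """Generated tokens per second = eval count / eval duration."""
    stats = parse_stats(text)
    tokens = int(stats["eval count"].split()[0])
    return tokens / to_seconds(stats["eval duration"])


print(f"{generation_rate(SAMPLE):.2f} tokens/s")
```

The recomputed value matches the printed 12.08 tokens/s, so the eval rate covers only the generation phase. Model load time and prompt processing are reported separately and are not part of it.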

 

gemma is the hot one right now, but gemma4 is large and probably won't run here, so I tried gemma3 instead.

[Link: https://ollama.com/library/gemma3]

[Link: https://ollama.com/library/gemma4]

 

Fairly fast, and it seems better than llama?

$ ollama run gemma3:1b --verbose
>>> 안녕?
안녕하세요! 잘 지내세요? 😊 저는 오늘 당신의 도움이 필요하신 건가요? 궁금한
 게 있으신가요?

total duration:       1.695837878s
load duration:        297.67234ms
prompt eval count:    12 token(s)
prompt eval duration: 164.822603ms
prompt eval rate:     72.81 tokens/s
eval count:           28 token(s)
eval duration:        1.132524529s
eval rate:            24.72 tokens/s
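
The same numbers should also be available over the HTTP API that the installer announced at 127.0.0.1:11434. Below is a rough sketch using only the Python standard library. It assumes the `/api/generate` endpoint with `"stream": false` returns `eval_count` and `eval_duration` (in nanoseconds) in its JSON reply; `build_request`, `tokens_per_second`, and `measure` are names I made up for illustration.

```python
import json
import urllib.request

# Address printed by the installer; the /api/generate path is an assumption
# based on the Ollama API documentation.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_per_second(response: dict) -> float:
    """eval_duration is assumed to be in nanoseconds."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def measure(model: str, prompt: str) -> float:
    """Send one prompt to a running server and return the generation rate."""
    with urllib.request.urlopen(build_request(model, prompt)) as reply:
        return tokens_per_second(json.load(reply))


# Offline check with the gemma3:1b numbers from the output above
# (28 tokens in 1.132524529 s).
offline = {"eval_count": 28, "eval_duration": 1_132_524_529}
print(f"{tokens_per_second(offline):.2f} tokens/s")
```

With the server running, something like `measure("gemma3:1b", "안녕")` and `measure("llama3.2", "안녕")` would let the two models be compared on the same prompt instead of eyeballing separate runs.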

 

Just in case, I tried gemma4 on this 16GB machine, but it failed for lack of memory.

$ ollama run gemma4:e2b --verbose
pulling manifest 
pulling 4e30e2665218: 100% ▕██████████████████▏ 7.2 GB                         
pulling 7339fa418c9a: 100% ▕██████████████████▏  11 KB                         
pulling 56380ca2ab89: 100% ▕██████████████████▏   42 B                         
pulling c6bc3775a3fa: 100% ▕██████████████████▏  473 B                         
verifying sha256 digest 
writing manifest 
success 
Error: 500 Internal Server Error: model requires more system memory (7.3 GiB) than is available (4.8 GiB)
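
To avoid pulling 7.2 GB only to hit this error, free memory can be checked beforehand. The sketch below reads `MemAvailable` from `/proc/meminfo` on Linux and compares it with the 7.3 GiB from the error message. I don't know exactly how Ollama computes its "available" figure, so treat this as an approximation; the sample text is a made-up value chosen to land near the 4.8 GiB reported above.

```python
def mem_available_gib(meminfo_text: str) -> float:
    """Return MemAvailable from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB (KiB)
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found")


def can_load(required_gib: float, meminfo_text: str) -> bool:
    """True if the available memory covers the model's stated requirement."""
    return mem_available_gib(meminfo_text) >= required_gib


def read_meminfo(path: str = "/proc/meminfo") -> str:
    """Read the real file on a Linux machine."""
    with open(path, encoding="ascii") as handle:
        return handle.read()


# Hypothetical sample, not captured from the machine in this post.
SAMPLE = "MemTotal:       16262772 kB\nMemAvailable:    5033164 kB\n"

print(f"available: {mem_available_gib(SAMPLE):.1f} GiB")
print("gemma4:e2b fits:", can_load(7.3, SAMPLE))
```

On the actual machine, `can_load(7.3, read_meminfo())` would be the call to make before running `ollama run gemma4:e2b`.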

 

After killing a bunch of processes and trying again, it runs at around 10 tokens/s.

$ ollama run gemma4:e2b --verbose
>>> 안녕
안녕하세요! 😊

total duration:       1.261211464s
load duration:        302.800476ms
prompt eval count:    17 token(s)
prompt eval duration: 551.024817ms
prompt eval rate:     30.85 tokens/s
eval count:           4 token(s)
eval duration:        226.924038ms
eval rate:            17.63 tokens/s
>>> 너에 대해서 소개해봐
Thinking...
Thinking Process:

1.  **Analyze the Request:** The user asked "너에 대해서 소개해봐" (Tell 
me about yourself).
2.  **Determine Identity & Core Facts (Self-Correction/Verification):**
    *   Name: Gemma 4.
    *   Developer: Google DeepMind.
    *   Nature: Large Language Model (LLM).
    *   Type: Open weights model.
    *   Capabilities: Understands text/images, generates text.
    *   Knowledge Cutoff: (Relevant for context, but usually stated 
implicitly)
3.  **Structure the Introduction (Korean Context):** A good introduction 
should cover:
    *   Greeting.
    *   Name/Identity.
    *   Developer/Origin.
    *   Nature/Function (What I do).
    *   Capabilities (What I can process).
4.  **Draft the Response (Internal Monologue & Refinement):**
    *   *Start warmly.* (안녕하세요!)
    *   *State Identity.* (저는 Gemma 4입니다.)
    *   *State Origin.* (Google DeepMind에서 개발했습니다.)
    *   *State Nature.* (저는 대규모 언어 모델(LLM)입니다.)
    *   *State Function/Role.* (사용자의 질문에 답하고, 정보를 제공하며, 다
양한 텍스트 기반 작업을 돕는 역할을 합니다.)
    *   *State Modality/Input.* (텍스트와 이미지를 이해할 수 있습니다.)
    *   *Concluding remark/Offer help.* (어떤 것을 도와드릴까요?)
5.  **Final Output Generation (Review against constraints):** The 
generated response must adhere to the core identity (Gemma 4, LLM, Google 
DeepMind, open weights). (The draft looks good and adheres to the required 
identity.)
...done thinking.

안녕하세요! 저는 **Gemma 4**입니다.

저는 Google DeepMind에서 개발한 **대규모 언어 모델(LLM)**이며, 오픈 웨이트
(open weights) 모델로 공개되어 있습니다.

**저의 주요 특징은 다음과 같습니다:**

1. **정체성:** 저는 방대한 데이터를 학습하여 인간의 언어를 이해하고 생성하며
, 다양한 질문에 답하고 정보를 제공하는 능력을 가지고 있습니다.
2. **기능:** 텍스트 기반의 대화, 요약, 번역, 작문, 코드 생성 등 다양한 언어
 관련 작업을 수행할 수 있습니다.
3. **입력 능력:** 저는 텍스트와 이미지를 이해하고 처리할 수 있습니다. (다만
, 출력은 텍스트로만 생성합니다.)
4. **목표:** 사용자분들이 필요한 정보를 얻고, 창의적인 작업을 수행하며, 복잡
한 문제를 해결하는 데 도움을 드릴 수 있도록 설계되었습니다.

궁금한 점이 있거나 도움이 필요하시면 언제든지 말씀해주세요! 😊

total duration:       57.596689687s
load duration:        297.105521ms
prompt eval count:    35 token(s)
prompt eval duration: 482.722245ms
prompt eval rate:     72.51 tokens/s
eval count:           614 token(s)
eval duration:        56.261795129s
eval rate:            10.91 tokens/s
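
In the transcript above, the model's reasoning sits between a `Thinking...` line and a `...done thinking.` line, and only what follows is the actual answer. If just the answer is wanted from captured output, a small sketch like the following could separate the two. The marker strings are taken from this transcript and may differ in other versions; `split_thinking` is a hypothetical helper, not an Ollama function.

```python
from typing import Tuple

START_MARKER = "Thinking..."
END_MARKER = "...done thinking."


def split_thinking(output: str) -> Tuple[str, str]:
    """Split captured CLI output into (thinking, answer).

    If the markers are missing, the whole text is treated as the answer.
    """
    start = output.find(START_MARKER)
    end = output.find(END_MARKER)
    if start == -1 or end == -1 or end < start:
        return "", output.strip()
    thinking = output[start + len(START_MARKER):end].strip()
    answer = (output[:start] + output[end + len(END_MARKER):]).strip()
    return thinking, answer


# Shortened sample modelled on the transcript above.
SAMPLE = (
    "Thinking...\n"
    "1. Analyze the request.\n"
    "...done thinking.\n"
    "\n"
    "안녕하세요! 저는 **Gemma 4**입니다.\n"
)

thinking, answer = split_thinking(SAMPLE)
print("thinking:", thinking)
print("answer:", answer)
```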

 

+

Do Intel integrated graphics (not Xe or Arc) not support OpenCL acceleration at all?

$ clinfo
Number of platforms                               0
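
A platform count of 0 only says that no OpenCL runtime is registered on this system; by itself it does not show whether the hardware lacks support. To repeat the check after installing a driver package, a small sketch could parse the `clinfo` output. It assumes the `Number of platforms` line format shown above, and the function names are my own.

```python
import re
import subprocess


def opencl_platform_count(clinfo_output: str) -> int:
    """Extract the platform count from `clinfo` output."""
    match = re.search(r"^Number of platforms\s+(\d+)", clinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Number of platforms' line found")
    return int(match.group(1))


def run_clinfo() -> str:
    """Run the real `clinfo` binary (must be installed) and return its output."""
    result = subprocess.run(["clinfo"], capture_output=True, text=True, check=False)
    return result.stdout


# Output copied from the run above.
SAMPLE = "Number of platforms                               0\n"

count = opencl_platform_count(SAMPLE)
print("OpenCL platforms:", count)
print("OpenCL runtime registered:", count > 0)
```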

[Link: https://docs.ollama.com/gpu]

[Link: https://dgpu-docs.intel.com/driver/client/overview.html#ubuntu-22.04]

Posted by 구차니