구차니's Odds and Ends

Using programs / AI programs | 2026. 4. 8. 14:09

Installing ollama

This is an Intel CPU, and it looks like the integrated graphics go unused after all..(!)

$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
[sudo] password for minimonk:
>>> Downloading ollama-linux-amd64.tar.zst
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA/AMD GPU detected. Ollama will run in CPU-only mode.

$ ollama
Ollama 0.20.3

  Chat with a model
    Start an interactive chat with a model

  Launch Claude Code (not installed)
    Anthropic's coding tool with subagents

  Launch Codex (not installed)
    OpenAI's open-source coding agent

  Launch OpenClaw (install)
    Personal AI with 100+ skills

  Launch OpenCode (not installed)
    Anomaly's open-source coding agent

  Launch Droid (not installed)
    Factory's coding agent across terminal and IDEs

▸ Launch Pi (install)
    Press enter to install


↑/↓ navigate • enter launch • → configure • esc quit

$ ollama
Select model to run: Type to filter...

  Recommended
  ▸ kimi-k2.5:cloud
      Multimodal reasoning with subagents
    qwen3.5:cloud
      Reasoning, coding, and agentic tool use with vision
    glm-5:cloud
      Reasoning and code generation
    minimax-m2.7:cloud
      Fast, efficient coding and real-world productivity
    gemma4
      Reasoning and code generation locally, ~16GB, (not downloaded)
    qwen3.5
      Reasoning, coding, and visual understanding locally, ~11GB, (not downloade

↑/↓ navigate • enter select • ← back

$ ollama
Download gemma4?

   Yes    No

←/→ navigate • enter confirm • esc cancel

Memory is small (16GB) and I'd be running on the CPU, so I cancelled!

pulling manifest
pulling 4c27e0f5b5ad: 40% ▕███████ ▏ 3.9 GB/9.6 GB 11 MB/s 8m11s

[Link : https://goddaehee.tistory.com/]

[Link : https://ollama.com/download]

[Link : https://ollama.com/search]

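This is just a test run, so whether it's 1B or 3B doesn't matter. Let's start.

But why did the menu above show only gemma, with llama nowhere to be seen?

For now.. since the page says "updated 1 year ago", maybe the menu only shows recent models?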

$ ollama run llama3.2
pulling manifest
pulling dde5aa3fc5ff: 70% ▕████████████ ▏ 1.4 GB/2.0 GB 20 MB/s 29s

[Link : https://ollama.com/library/llama3.2]

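Aaargh, the computer is melting, lol

Still, it feels like about 2~3 words per second come out, and that is already roughly conversational speed.

Any faster and the text would probably come out before I could read it?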

Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz

Running it again, the model I downloaded now shows up.

$ ollama
Select model to run: Type to filter...

  Recommended
    gemma4
      Reasoning and code generation locally, ~16GB, (not downloaded)
    qwen3.5
      Reasoning, coding, and visual understanding locally, ~11GB, (not downloaded)
    kimi-k2.5:cloud
      Multimodal reasoning with subagents
    qwen3.5:cloud
      Reasoning, coding, and agentic tool use with vision
    glm-5:cloud
      Reasoning and code generation
    minimax-m2.7:cloud
      Fast, efficient coding and real-world productivity

  More
  ▸ llama3.2

↑/↓ navigate • enter select • ← back
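
The same list of downloaded models should also be available from the local API that the installer announced at 127.0.0.1:11434. Below is a minimal sketch, assuming the `/api/tags` endpoint returns a `models` array whose entries carry `name` and `size` (in bytes). The helper names `fetch_models` and `summarize` are made up for illustration, and the sample entry is a hypothetical stand-in based on the 2.0 GB pull above, not a captured response.

```python
import json
import urllib.request

# Address printed by the installer; /api/tags is assumed to list local models.
TAGS_URL = "http://127.0.0.1:11434/api/tags"


def fetch_models(url: str = TAGS_URL) -> list:
    """Ask the local Ollama server for its downloaded models."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["models"]


def summarize(models: list) -> list:
    """Format each model as 'name  size' with the size in decimal GB."""
    return [f"{m['name']}  {m['size'] / 1e9:.1f} GB" for m in models]


# Hypothetical entry standing in for a real response (no server needed here).
sample = [{"name": "llama3.2:latest", "size": 2_000_000_000}]
for line in summarize(sample):
    print(line)
```

With the server running, `summarize(fetch_models())` would print the real list, which should match what `ollama list` shows.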

$ ollama --help
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start Ollama
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  signin      Sign in to ollama.com
  signout     Sign out from ollama.com
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  launch      Launch the Ollama menu or an integration
  help        Help about any command

Flags:
  -h, --help         help for ollama
      --nowordwrap   Don't wrap words to the next line automatically
      --verbose      Show timings for response
  -v, --version      Show version information

Use "ollama [command] --help" for more information about a command.

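I wanted to ask a one-shot question and check the speed.

Hmm.. is eval rate the speed you actually feel? 12 tokens/s

It seems the rate needs to be about this high for text to be generated at a reading pace that isn't too tedious.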

$ ollama run llama3.2:latest "안녕" --verbose
안녕하세요! (Hello!) I'm here to help you. How can I assist you today?

total duration:       2.282485591s
load duration:        143.825937ms
prompt eval count:    27 token(s)
prompt eval duration: 350.162008ms
prompt eval rate:     77.11 tokens/s
eval count:           21 token(s)
eval duration:        1.73838491s
eval rate:            12.08 tokens/s
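
As a sanity check on what eval rate means, the numbers above are consistent with a plain "token count divided by duration". Below is a minimal sketch that recomputes both rates from the printed values. It also includes a hypothetical `generate` helper for fetching the same counters from the local API, assuming `/api/generate` returns `prompt_eval_count`, `prompt_eval_duration`, `eval_count` and `eval_duration` with durations in nanoseconds. That field layout is my understanding of the API, not something shown in this post.

```python
import json
import urllib.request

# Address printed by the installer; the endpoint path is an assumption.
GENERATE_URL = "http://127.0.0.1:11434/api/generate"


def tokens_per_second(count: int, duration_ns: int) -> float:
    """Tokens per second, given a duration in nanoseconds."""
    return count / (duration_ns / 1e9)


def rates(resp: dict) -> tuple:
    """Return (prompt eval rate, eval rate) from a response-like dict."""
    return (
        tokens_per_second(resp["prompt_eval_count"], resp["prompt_eval_duration"]),
        tokens_per_second(resp["eval_count"], resp["eval_duration"]),
    )


def generate(model: str, prompt: str, url: str = GENERATE_URL) -> dict:
    """Hypothetical helper: send one non-streaming request to the server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Values copied from the --verbose output above, converted to nanoseconds.
sample = {
    "prompt_eval_count": 27,
    "prompt_eval_duration": 350_162_008,
    "eval_count": 21,
    "eval_duration": 1_738_384_910,
}
prompt_rate, eval_rate = rates(sample)
print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")
print(f"eval rate:        {eval_rate:.2f} tokens/s")
```

So eval rate covers only the generation phase. Load time and prompt processing are counted separately, which is why the total duration is longer than the eval duration.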

gemma is hot right now, but gemma4 is large and probably won't run here, so I tried gemma3 instead.

[Link : https://ollama.com/library/gemma3]

[Link : https://ollama.com/library/gemma4]

Fairly fast, and it seems better than llama?

$ ollama run gemma3:1b --verbose
>>> 안녕?
안녕하세요! 잘 지내세요? 😊 저는 오늘 당신의 도움이 필요하신 건가요? 궁금한
게 있으신가요?

total duration:       1.695837878s
load duration:        297.67234ms
prompt eval count:    12 token(s)
prompt eval duration: 164.822603ms
prompt eval rate:     72.81 tokens/s
eval count:           28 token(s)
eval duration:        1.132524529s
eval rate:            24.72 tokens/s

Just in case, I tried gemma4 on 16GB of memory anyway, and it failed for lack of memory.

$ ollama run gemma4:e2b --verbose
pulling manifest
pulling 4e30e2665218: 100% ▕██████████████████▏ 7.2 GB
pulling 7339fa418c9a: 100% ▕██████████████████▏  11 KB
pulling 56380ca2ab89: 100% ▕██████████████████▏   42 B
pulling c6bc3775a3fa: 100% ▕██████████████████▏  473 B
verifying sha256 digest
writing manifest
success
Error: 500 Internal Server Error: model requires more system memory (7.3 GiB) than is available (4.8 GiB)
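
The error compares the memory the model needs with what is currently free, so it can be worth checking before running. Below is a minimal sketch, assuming the "available" figure is close to `MemAvailable` in `/proc/meminfo`. That is my guess, not something confirmed by the error message. The function names are made up for illustration, and the sample text is a hypothetical snapshot chosen to mirror the 4.8 GiB above.

```python
def mem_available_gib(meminfo_text: str) -> float:
    """Return MemAvailable from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in KiB
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found")


def fits(required_gib: float, meminfo_text: str) -> bool:
    """True if the available memory covers the stated requirement."""
    return mem_available_gib(meminfo_text) >= required_gib


# Hypothetical snapshot, not captured from the machine in this post.
sample = "MemTotal:       16252928 kB\nMemAvailable:    5033164 kB\n"
print(f"{mem_available_gib(sample):.1f} GiB available")
print("fits 7.3 GiB:", fits(7.3, sample))
```

On Linux, passing `open("/proc/meminfo").read()` instead of `sample` checks the live system.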

After killing various processes and trying again, it seems to produce about 10 tokens/s.

$ ollama run gemma4:e2b --verbose
>>> 안녕
안녕하세요! 😊

total duration:       1.261211464s
load duration:        302.800476ms
prompt eval count:    17 token(s)
prompt eval duration: 551.024817ms
prompt eval rate:     30.85 tokens/s
eval count:           4 token(s)
eval duration:        226.924038ms
eval rate:            17.63 tokens/s
>>> 너에 대해서 소개해봐
Thinking...
Thinking Process:

1.  **Analyze the Request:** The user asked "너에 대해서 소개해봐" (Tell
me about yourself).
2.  **Determine Identity & Core Facts (Self-Correction/Verification):**
    *   Name: Gemma 4.
    *   Developer: Google DeepMind.
    *   Nature: Large Language Model (LLM).
    *   Type: Open weights model.
    *   Capabilities: Understands text/images, generates text.
    *   Knowledge Cutoff: (Relevant for context, but usually stated
implicitly)
3.  **Structure the Introduction (Korean Context):** A good introduction
should cover:
    *   Greeting.
    *   Name/Identity.
    *   Developer/Origin.
    *   Nature/Function (What I do).
    *   Capabilities (What I can process).
4.  **Draft the Response (Internal Monologue & Refinement):**
    *   *Start warmly.* (안녕하세요!)
    *   *State Identity.* (저는 Gemma 4입니다.)
    *   *State Origin.* (Google DeepMind에서 개발했습니다.)
    *   *State Nature.* (저는 대규모 언어 모델(LLM)입니다.)
    *   *State Function/Role.* (사용자의 질문에 답하고, 정보를 제공하며, 다
양한 텍스트 기반 작업을 돕는 역할을 합니다.)
    *   *State Modality/Input.* (텍스트와 이미지를 이해할 수 있습니다.)
    *   *Concluding remark/Offer help.* (어떤 것을 도와드릴까요?)
5.  **Final Output Generation (Review against constraints):** The
generated response must adhere to the core identity (Gemma 4, LLM, Google
DeepMind, open weights). (The draft looks good and adheres to the required
identity.)
...done thinking.

안녕하세요! 저는 **Gemma 4**입니다.

저는 Google DeepMind에서 개발한 **대규모 언어 모델(LLM)**이며, 오픈 웨이트
(open weights) 모델로 공개되어 있습니다.

**저의 주요 특징은 다음과 같습니다:**

1. **정체성:** 저는 방대한 데이터를 학습하여 인간의 언어를 이해하고 생성하며
, 다양한 질문에 답하고 정보를 제공하는 능력을 가지고 있습니다.
2. **기능:** 텍스트 기반의 대화, 요약, 번역, 작문, 코드 생성 등 다양한 언어
관련 작업을 수행할 수 있습니다.
3. **입력 능력:** 저는 텍스트와 이미지를 이해하고 처리할 수 있습니다. (다만
, 출력은 텍스트로만 생성합니다.)
4. **목표:** 사용자분들이 필요한 정보를 얻고, 창의적인 작업을 수행하며, 복잡
한 문제를 해결하는 데 도움을 드릴 수 있도록 설계되었습니다.

궁금한 점이 있거나 도움이 필요하시면 언제든지 말씀해주세요! 😊

total duration:       57.596689687s
load duration:        297.105521ms
prompt eval count:    35 token(s)
prompt eval duration: 482.722245ms
prompt eval rate:     72.51 tokens/s
eval count:           614 token(s)
eval duration:        56.261795129s
eval rate:            10.91 tokens/s

Do Intel integrated graphics (not Xe or Arc) not support OpenCL acceleration at all?

$ clinfo
Number of platforms 0

[Link : https://docs.ollama.com/gpu]

[Link : https://dgpu-docs.intel.com/driver/client/overview.html#ubuntu-22.04]

Posted by 구차니
