Guchani's Collection of Odds and Ends (구차니의 잡동사니 모음)

Program usage / AI programs · 2026. 4. 17. 16:54

llama.cpp

People say it performs better than ollama, so I should look up how to use it.

[Link: https://peekaboolabs.ai/blog/ollama-vs-llama-cpp-guide]

[Link: https://news.hada.io/topic?id=28622]

[Link: https://github.com/ggml-org/llama.cpp]

It's not official, but there does seem to be a pre-built binary for Windows.

[Link: https://github.com/HPUhushicheng/llama.cpp_windows]

First, the Python library (llama-cpp-python):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/7B/llama-model.gguf",  # path to a local GGUF model file
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # seed=1337,  # Uncomment to set a specific seed
    # n_ctx=2048,  # Uncomment to increase the context window
)
output = llm(
    "Q: Name the planets in the solar system? A: ",  # Prompt
    max_tokens=32,  # Generate up to 32 tokens; set to None to generate up to the end of the context window
    stop=["Q:", "\n"],  # Stop generating just before the model would generate a new question
    echo=True,  # Echo the prompt back in the output
)  # Generate a completion; can also call create_completion
print(output)
```

[Link: https://pypi.org/project/llama-cpp-python/]
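`print(output)` dumps the whole result dict. Below is a minimal sketch of pulling out just the generated text. It assumes the OpenAI-style completion layout that llama-cpp-python returns (`choices[0]["text"]`, with the prompt included at the front when `echo=True`). The `completion_text` helper and the `sample` dict are hypothetical, made up for illustration rather than taken from real model output.

```python
from typing import Any, Dict


def completion_text(output: Dict[str, Any], prompt: str = "", echo: bool = False) -> str:
    """Return the generated text from a completion dict (assumed OpenAI-style layout)."""
    # The generated text lives in the first choice.
    text = output["choices"][0]["text"]
    # With echo=True the prompt is repeated at the start, so strip it off.
    if echo and prompt and text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Hypothetical sample shaped like a completion result (not real model output).
sample = {
    "object": "text_completion",
    "choices": [
        {
            "text": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
}

prompt = "Q: Name the planets in the solar system? A: "
print(completion_text(sample, prompt=prompt, echo=True))
```

Checking `finish_reason` in the same choice tells you whether generation ended on a stop string or on the `max_tokens` limit.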

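After confirming the OpenCL driver is installed,

LLAMA_CLBLAST=1 make

you reportedly just compile it like this. (This is older advice: the CLBlast backend has since been removed from llama.cpp, and current versions build with CMake rather than make.)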

Since it uses make, this is presumably for building on Linux; on Windows you would probably have to use cmake.

[Link: https://arca.live/b/alpaca/76969814]
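The make-vs-cmake reasoning above as a small sketch. It only assembles the command lines and runs nothing. Assumptions: the option names are the old CLBlast-era ones (`LLAMA_CLBLAST` read from the environment by make, `-DLLAMA_CLBLAST=ON` for CMake), and `clblast_build_commands` is a hypothetical helper name. A Windows CLBlast build may also need CMake to be told where CLBlast is installed.

```python
from __future__ import annotations

import platform


def clblast_build_commands(system: str | None = None) -> tuple[list[list[str]], dict[str, str]]:
    """Return (commands, extra_env) for an old CLBlast-enabled llama.cpp build.

    Assumption: these are the older option names; current llama.cpp has
    dropped both the CLBlast backend and the Makefile build.
    """
    system = system or platform.system()
    if system == "Windows":
        # A typical Windows setup has no make, so configure and build with CMake.
        return [
            ["cmake", "-B", "build", "-DLLAMA_CLBLAST=ON"],
            ["cmake", "--build", "build", "--config", "Release"],
        ], {}
    # On Linux, make picks LLAMA_CLBLAST up from the environment (LLAMA_CLBLAST=1 make).
    return [["make"]], {"LLAMA_CLBLAST": "1"}


for name in ("Linux", "Windows"):
    commands, env = clblast_build_commands(name)
    print(name, env, [" ".join(cmd) for cmd in commands])
```

Returning the environment separately keeps the Linux case faithful to `LLAMA_CLBLAST=1 make`, where the flag is an environment variable rather than a command-line argument.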

Does it accelerate only token generation rather than the whole computation? The release is titled:

OpenCL Token Generation Acceleration

[Link: https://github.com/ggml-org/llama.cpp/releases/tag/master-2e6cd4b]
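One way to see whether a GPU backend speeds up prompt processing, token generation, or both is to time the two phases separately. A rough sketch, assuming one streamed chunk per generated token: the time until the first chunk approximates prompt processing, and the chunks after it give a generation rate. With llama-cpp-python you would pass in the iterator from `llm(prompt, stream=True)`; here a hypothetical `fake_stream` stands in so the sketch runs without a model.

```python
import time
from typing import Dict, Iterable, Iterator


def time_phases(chunks: Iterable[object]) -> Dict[str, float]:
    """Split a streamed generation into time-to-first-chunk and a generation rate.

    Assumes one chunk per generated token.
    """
    start = time.perf_counter()
    first = None
    count = 0
    for _ in chunks:
        count += 1
        if first is None:
            first = time.perf_counter()  # first chunk arrived: prompt processing done
    end = time.perf_counter()
    if first is None:
        return {"first_chunk_s": float("nan"), "tokens_per_s": 0.0, "chunks": 0.0}
    gen_time = end - first
    # Rate over the tokens that came after the first one.
    rate = (count - 1) / gen_time if count > 1 and gen_time > 0 else 0.0
    return {"first_chunk_s": first - start, "tokens_per_s": rate, "chunks": float(count)}


def fake_stream(n: int, prefill_s: float, per_token_s: float) -> Iterator[str]:
    """Hypothetical stand-in for llm(prompt, stream=True)."""
    time.sleep(prefill_s)  # pretend prompt processing
    for i in range(n):
        if i:
            time.sleep(per_token_s)  # pretend per-token generation
        yield f"tok{i}"


stats = time_phases(fake_stream(n=20, prefill_s=0.2, per_token_s=0.01))
print(stats)
```

If enabling the GPU shrinks `first_chunk_s` only a little but raises `tokens_per_s` a lot, that fits acceleration mainly on the generation side.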

To get this running on the XTX I had to install the latest 5.5 version of the AMD linux drivers, which are released but not available from the normal AMD download page yet. You can get the deb for the installer here. I installed with amdgpu-install --usecase=opencl,rocm and installed CLBlast after apt install libclblast-dev.

Confirm opencl is working with sudo clinfo (did not find the GPU device unless I run as root).

Build llama.cpp (with merged pull) using LLAMA_CLBLAST=1 make.

[Link: https://www.reddit.com/r/LocalLLaMA/comments/13m8li2/finally_got_a_model_running_on_my_xtx_using/]
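The quote notes that clinfo only found the GPU when run as root. A rough sketch for comparing the two cases: run it once as your normal user and once with sudo, then compare the device counts. It assumes clinfo's default output has one `Number of devices` line per platform ending in a count; `count_opencl_devices` and `run_clinfo` are hypothetical helper names, and the parsing is only as reliable as that format assumption.

```python
from __future__ import annotations

import shutil
import subprocess


def count_opencl_devices(clinfo_output: str) -> int:
    """Sum the 'Number of devices' lines in clinfo's default output (assumed format)."""
    total = 0
    for line in clinfo_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Number of devices"):
            last = stripped.split()[-1]
            if last.isdigit():
                total += int(last)
    return total


def run_clinfo() -> str | None:
    """Run clinfo as the current user; return its stdout, or None if clinfo is missing."""
    if shutil.which("clinfo") is None:
        return None
    result = subprocess.run(["clinfo"], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    output = run_clinfo()
    if output is None:
        print("clinfo is not installed")
    else:
        print("OpenCL devices visible to this user:", count_opencl_devices(output))
```

A count of 0 as a normal user but 1 or more under sudo points to a permissions problem rather than a missing driver.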


Posted by 구차니
