At a 128k context length, the KV cache gobbles up 2 GiB of VRAM.
| $ ./llama-b8925/llama-cli -m model/gemma4-e4b/gemma-4-E4B-it-Q4_K_M.gguf --verbose |
| llama_kv_cache: size = 2048.00 MiB (131072 cells, 4 layers, 1/1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB |
| llama_kv_cache: attn_rot_k = 0, n_embd_head_k_all = 512 |
| llama_kv_cache: attn_rot_v = 0, n_embd_head_k_all = 512 |
| llama_kv_cache_iswa: creating SWA KV cache, size = 1024 cells |
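Quantizing only K to q4_0 brings K down to 288 MiB! That is a bit more than the 256 MiB a pure 4-bit format would give, but either way K shrank to roughly 1/4.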
| $ ./llama-b8925/llama-cli -m model/gemma4-e4b/gemma-4-E4B-it-Q4_K_M.gguf --verbose -ctk q4_0 --ctx-size 131072 |
| llama_kv_cache: size = 1312.00 MiB (131072 cells, 4 layers, 1/1 seqs), K (q4_0): 288.00 MiB, V (f16): 1024.00 MiB |
| llama_kv_cache: attn_rot_k = 1, n_embd_head_k_all = 512 |
| llama_kv_cache: attn_rot_v = 0, n_embd_head_k_all = 512 |
| llama_kv_cache_iswa: creating SWA KV cache, size = 1024 cells |
With both K and V at q4_0, the total lands at 576 MiB, in the neighborhood of 512 MiB.
| $ ./llama-b8925/llama-cli -m model/gemma4-e4b/gemma-4-E4B-it-Q4_K_M.gguf --verbose -ctk q4_0 -ctv q4_0 --ctx-size 131072 |
| llama_kv_cache: size = 576.00 MiB (131072 cells, 4 layers, 1/1 seqs), K (q4_0): 288.00 MiB, V (q4_0): 288.00 MiB |
| llama_kv_cache: attn_rot_k = 1, n_embd_head_k_all = 512 |
| llama_kv_cache: attn_rot_v = 1, n_embd_head_k_all = 512 |
| llama_kv_cache_iswa: creating SWA KV cache, size = 1024 cells |
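Why 288 MiB and not 256 MiB: as far as I understand the ggml q4_0 layout, each block packs 32 values into 18 bytes (a 2-byte f16 scale plus 16 bytes of 4-bit values), so it costs 4.5 bits per value rather than 4. Here is a rough sketch that reproduces the three `llama_kv_cache: size` totals from the logs, starting from the 1024 MiB f16 baseline. The helper names are made up for illustration and are not part of llama.cpp, and the SWA cache is ignored.

```python
# (bytes per block, values per block) for each cache type.
# q4_0: 2-byte f16 scale + 32 four-bit values (16 bytes) = 18 bytes per 32 values.
BLOCK_LAYOUT = {
    "f16": (2, 1),
    "q4_0": (18, 32),
}

F16_MIB = 1024.0  # size of K (or V) at f16, taken from the first log


def cache_mib(cache_type: str, f16_mib: float = F16_MIB) -> float:
    """Scale an f16 cache size to another cache type (hypothetical helper)."""
    block_bytes, block_values = BLOCK_LAYOUT[cache_type]
    f16_bytes, _ = BLOCK_LAYOUT["f16"]
    return f16_mib * (block_bytes / block_values) / f16_bytes


def total_mib(k_type: str, v_type: str) -> float:
    """Combined K + V size in MiB for a given pair of cache types."""
    return cache_mib(k_type) + cache_mib(v_type)


if __name__ == "__main__":
    for k_type, v_type in [("f16", "f16"), ("q4_0", "f16"), ("q4_0", "q4_0")]:
        print(f"K={k_type:5s} V={v_type:5s} -> {total_mib(k_type, v_type):.2f} MiB")
```

The three rows correspond to the three runs above: no flags, `-ctk q4_0`, and `-ctk q4_0 -ctv q4_0`.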
