llama.cpp Windows CUDA 12 test: 1080 Ti 11GB + 1060 6GB

구차니 2026. 4. 25. 19:04

GPU 0: 1080 Ti 11GB

GPU 1: 1060 6GB

 

Sat Apr 25 18:45:09 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 582.28                 Driver Version: 582.28         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti   WDDM  |   00000000:01:00.0 Off |                  N/A |
| 24%   40C    P8             12W /  250W |    9328MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce GTX 1060 6GB  WDDM  |   00000000:02:00.0 Off |                  N/A |
| 39%   35C    P8              5W /  120W |    4111MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            3040      C   ...n-cuda-12.4-x64\llama-cli.exe      N/A      |
|    1   N/A  N/A            3040      C   ...n-cuda-12.4-x64\llama-cli.exe      N/A      |
+-----------------------------------------------------------------------------------------+

 

D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64>llama-cli --list-devices
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 17407 MiB):
  Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes, VRAM: 11263 MiB
  Device 1: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes, VRAM: 6143 MiB
load_backend: loaded CUDA backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-cuda.dll
load_backend: loaded RPC backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-rpc.dll
load_backend: loaded CPU backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-cpu-haswell.dll
Available devices:
  CUDA0: NVIDIA GeForce GTX 1080 Ti (11263 MiB, 10200 MiB free)
  CUDA1: NVIDIA GeForce GTX 1060 6GB (6143 MiB, 5197 MiB free)
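
The free-memory figures in the device list can be turned into a per-GPU share with a few lines of Python. This is only a sketch: the two device lines are pasted from the output above, the helper names `parse_devices` and `free_share` are made up for this post, and whether passing such a ratio to llama.cpp's `--tensor-split` option would change any of the results below was not tested here.

```python
import re

# Device lines copied from the `llama-cli --list-devices` output above.
DEVICE_LINES = """\
  CUDA0: NVIDIA GeForce GTX 1080 Ti (11263 MiB, 10200 MiB free)
  CUDA1: NVIDIA GeForce GTX 1060 6GB (6143 MiB, 5197 MiB free)
"""

# Matches "<id>: <model> (<total> MiB, <free> MiB free)".
PATTERN = re.compile(r"^\s*(\w+): (.+) \((\d+) MiB, (\d+) MiB free\)$")


def parse_devices(text):
    """Hypothetical helper: extract id, model, total and free MiB per device."""
    devices = []
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            dev_id, model, total, free = match.groups()
            devices.append({
                "id": dev_id,
                "model": model,
                "total_mib": int(total),
                "free_mib": int(free),
            })
    return devices


def free_share(devices):
    """Hypothetical helper: each device's fraction of the combined free VRAM."""
    total_free = sum(d["free_mib"] for d in devices)
    return {d["id"]: d["free_mib"] / total_free for d in devices}


devices = parse_devices(DEVICE_LINES)
shares = free_share(devices)
for dev in devices:
    print(f'{dev["id"]}: {dev["free_mib"]} MiB free, share {shares[dev["id"]]:.1%}')
```

The combined free memory across both cards is 15397 MiB, which is useful to keep in mind when reading the model sizes below.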

 

1080 Ti 11GB + 1060 6GB

gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf (15.9GB)

split mode (-sm)   speed
none               17 t/s with -dev CUDA0
                   10 t/s with -dev CUDA1
layer              19 t/s
row                -
tensor             -

 

row: failed to load
D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64>llama-cli.exe -m ..\gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf -sm row
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 17407 MiB):
  Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes, VRAM: 11263 MiB
  Device 1: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes, VRAM: 6143 MiB
load_backend: loaded CUDA backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-cuda.dll
load_backend: loaded RPC backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-rpc.dll
load_backend: loaded CPU backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-cpu-haswell.dll

Loading model... -D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:97: CUDA error
CUDA error: out of memory

 

tensor: failed to load
D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64>llama-cli.exe -m ..\gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf -sm tensor
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 17407 MiB):
  Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes, VRAM: 11263 MiB
  Device 1: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes, VRAM: 6143 MiB
load_backend: loaded CUDA backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-cuda.dll
load_backend: loaded RPC backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-rpc.dll
load_backend: loaded CPU backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-cpu-haswell.dll

Loading model... /D:/a/llama.cpp/llama.cpp/ggml/src/ggml-backend.cpp:119: GGML_ASSERT(buffer) failed
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 8537.78 MiB on device 1: cudaMalloc failed: out of memory
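
The tensor-mode error above can be checked with plain arithmetic: the failed request is larger than the whole 1060. A minimal sketch using only numbers from the logs in this post. `shortfall_mib` is a hypothetical helper, and the free figure comes from the earlier `--list-devices` run, so the value at the moment of failure may have differed.

```python
def shortfall_mib(request_mib, available_mib):
    """Return by how many MiB the request exceeds the available memory (0 if it fits)."""
    return max(0.0, request_mib - available_mib)


REQUEST_MIB = 8537.78  # "allocating 8537.78 MiB on device 1" from the error above
TOTAL_MIB = 6143       # Device 1 total VRAM reported by ggml_cuda_init
FREE_MIB = 5197        # Device 1 free VRAM from the earlier --list-devices run

over_total = shortfall_mib(REQUEST_MIB, TOTAL_MIB)
over_free = shortfall_mib(REQUEST_MIB, FREE_MIB)

print(f"over total VRAM: {over_total:.2f} MiB")
print(f"over free VRAM:  {over_free:.2f} MiB")
```

Since the request does not fit even in the card's total VRAM, freeing memory on device 1 alone would not have been enough for this allocation.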

 

Qwen3.6-27B-Q5_K_M.gguf (18.1GB)

split mode (-sm)   speed
none               < 2 t/s with -dev CUDA0
                   < 0.1 t/s with -dev CUDA1
layer              2 t/s
row                -
tensor             -

 

row: failed to load
D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64>llama-cli.exe -m ..\Qwen3.6-27B-Q5_K_M.gguf -sm row
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 17407 MiB):
  Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes, VRAM: 11263 MiB
  Device 1: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes, VRAM: 6143 MiB
load_backend: loaded CUDA backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-cuda.dll
load_backend: loaded RPC backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-rpc.dll
load_backend: loaded CPU backend from D:\study\llm\llama-b8918-bin-win-cuda-12.4-x64\ggml-cpu-haswell.dll

Loading model... /CUDA error: out of memory
  current device: 1, in function ggml_backend_cuda_split_buffer_init_tensor at D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:956
  ggml_cuda_device_malloc((void**)&buf, size, id)
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:97: CUDA error

 

1080 Ti 11GB * 2

gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf (15.9GB)

split mode (-sm)   speed
none               13 t/s
layer              43 t/s
row                -
tensor             -

 

Qwen3.6-27B-Q5_K_M.gguf (18.1GB)

split mode (-sm)   speed
none               < 2 t/s
layer              7 t/s
row                5 t/s
tensor             -
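
To compare the split modes at a glance, the measured t/s values can be reduced to ratios. This is a rough sketch over the numbers reported in this post only. The dictionary layout and the `ratio` helper are made up for illustration, and the shortened model names are just labels.

```python
# Measured tokens/s from this post. Entries reported as "< 2 t/s" or "-"
# are left out because they are bounds or missing results, not measurements.
RESULTS = {
    # For this setup, "none" is the run with -dev CUDA0 (the 1080 Ti).
    ("1080 Ti + 1060", "gemma-4-26B-A4B"): {"none": 17, "layer": 19},
    ("1080 Ti x 2", "gemma-4-26B-A4B"): {"none": 13, "layer": 43},
    ("1080 Ti x 2", "Qwen3.6-27B"): {"layer": 7, "row": 5},
}


def ratio(results, key, mode, baseline):
    """Hypothetical helper: speed of `mode` relative to `baseline` for one setup."""
    return results[key][mode] / results[key][baseline]


gemma_mixed = ratio(RESULTS, ("1080 Ti + 1060", "gemma-4-26B-A4B"), "layer", "none")
gemma_dual = ratio(RESULTS, ("1080 Ti x 2", "gemma-4-26B-A4B"), "layer", "none")
qwen_row = ratio(RESULTS, ("1080 Ti x 2", "Qwen3.6-27B"), "row", "layer")

print(f"gemma, 1080 Ti + 1060, layer vs none: {gemma_mixed:.2f}x")
print(f"gemma, 1080 Ti x 2, layer vs none:    {gemma_dual:.2f}x")
print(f"Qwen, 1080 Ti x 2, row vs layer:      {qwen_row:.2f}x")
```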