I should give this a try. By the way, if only F16 works on the 1080.. maybe converting the model myself is an option?
| /llama.cpp$ ls convert_*
convert_hf_to_gguf.py  convert_hf_to_gguf_update.py  convert_llama_ggml_to_gguf.py  convert_lora_to_gguf.py |
[Link: https://huggingface.co/google/gemma-4-E4B/tree/main]
[Link: https://velog.io/@choonsik_mom/llama.cpp로-gguf-모델-서빙하기-ul02hone]
+
Running the script with no arguments prints the following:
| $ /home/minimonk/src/llama.cpp/convert_hf_to_gguf.py
usage: convert_hf_to_gguf.py [-h] [--vocab-only] [--outfile OUTFILE]
                             [--outtype {f32,f16,bf16,q8_0,tq1_0,tq2_0,auto}]
                             [--bigendian] [--use-temp-file] [--no-lazy]
                             [--model-name MODEL_NAME] [--verbose]
                             [--split-max-tensors SPLIT_MAX_TENSORS]
                             [--split-max-size SPLIT_MAX_SIZE] [--dry-run]
                             [--no-tensor-first-split] [--metadata METADATA]
                             [--print-supported-models] [--remote] [--mmproj]
                             [--mistral-format]
                             [--disable-mistral-community-chat-template]
                             [--sentence-transformers-dense-modules]
                             [--fuse-gate-up-exps]
                             [model]
convert_hf_to_gguf.py: error: the following arguments are required: model |
You can download the model through Hugging Face, or just fetch the bare minimum: the three files below.
| /mnt/Downloads/model/gemma4-e4b/google$ ls
config.json  model.safetensors  tokenizer.json |
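Before kicking off a long conversion, it can help to confirm that the model directory really contains those three files. The following is only a minimal sketch: the helper name `missing_model_files` is made up for illustration, and the file list is simply copied from the `ls` output above (other models may need additional files).

```python
from pathlib import Path

# The three files listed in the `ls` output above. Treat this list as an
# assumption taken from this post, not a rule enforced by llama.cpp.
REQUIRED_FILES = ("config.json", "model.safetensors", "tokenizer.json")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the names from `required` that are not regular files in model_dir.

    `missing_model_files` is a hypothetical helper, not part of llama.cpp.
    The result keeps the order of `required`; an empty list means all present.
    """
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]
```

For example, `missing_model_files("./google/")` returns an empty list when all three files are in place, and the names of the absent ones otherwise.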
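It does not use the GPU, and maybe it loads everything into memory at once: I ran out of memory and had to reboot once.. ugh
On top of that, when no option is given it helpfully(!) sets the output type to BF16. But my hardware gets no BF16 acceleration.. sigh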
| $ /home/minimonk/src/llama.cpp/convert_hf_to_gguf.py ./google/ --outfile gguf.gguf
INFO:hf-to-gguf:Loading model: google
INFO:hf-to-gguf:Model architecture: Gemma4ForConditionalGeneration
INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'
INFO:hf-to-gguf:heuristics detected bfloat16 tensor dtype, setting --outtype bf16
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
Writing: 9%|██████▎ | 1.34G/15.0G [00:20<01:16, 180Mbyte/s]
Writing: 100%|███████████████████████████████████████████████████████████████████████| 15.0G/15.0G [01:54<00:00, 131Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to gguf.gguf |
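Since the heuristic falls back to BF16, the output type probably has to be requested explicitly. Below is a rough sketch of a wrapper that always passes `--outtype`; the function names `build_convert_cmd` and `convert` are hypothetical, the allowed values are copied from the usage text earlier in this post, and the default of `f16` is just my assumption about what I want on this hardware.

```python
import subprocess
import sys

# Choices copied from the --outtype list in the usage output earlier in the post.
OUTTYPES = ("f32", "f16", "bf16", "q8_0", "tq1_0", "tq2_0", "auto")


def build_convert_cmd(script, model_dir, outfile, outtype="f16"):
    """Build the argument list for convert_hf_to_gguf.py with an explicit --outtype.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    if outtype not in OUTTYPES:
        raise ValueError(f"unsupported outtype: {outtype!r}")
    return [
        sys.executable, str(script), str(model_dir),
        "--outfile", str(outfile),
        "--outtype", outtype,
    ]


def convert(script, model_dir, outfile, outtype="f16"):
    """Run the conversion and raise CalledProcessError if the script fails."""
    subprocess.run(build_convert_cmd(script, model_dir, outfile, outtype), check=True)
```

Calling `convert("/home/minimonk/src/llama.cpp/convert_hf_to_gguf.py", "./google/", "gguf-f16.gguf")` would then be the F16 counterpart of the run above; the output file name here is only an example.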
Log output between "Exporting model..." and "Writing":
INFO:hf-to-gguf:rope_freqs.weight, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> BF16, shape = {2560, 262144}
INFO:hf-to-gguf:per_layer_token_embd.weight, torch.bfloat16 --> BF16, shape = {10752, 262144}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.0.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.0.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.0.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.0.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.1.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.1.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.1.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.1.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.10.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.10.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.10.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.10.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.10.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.11.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.11.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.11.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.11.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.11.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.12.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.12.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.12.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.12.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.12.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.13.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.13.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.13.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.13.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.13.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.14.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.14.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.14.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.14.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.14.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.15.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.15.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.15.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.15.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.15.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.16.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.16.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.16.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.16.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.16.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.16.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.16.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.16.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.17.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.17.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.17.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.17.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.17.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.17.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.17.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.17.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.18.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.18.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.18.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.18.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.18.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.18.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.18.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.18.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.19.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.19.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.19.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.19.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.19.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.19.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.19.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.19.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.2.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.2.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.2.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.2.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.20.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.20.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.20.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.20.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.20.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.20.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.20.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.20.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.21.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.21.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.21.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.21.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.21.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.21.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.21.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.21.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.22.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.22.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.22.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.22.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.22.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.22.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.22.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.22.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.23.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.23.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.23.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.23.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.23.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.23.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.23.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.23.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.24.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.24.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.24.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.24.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.24.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.24.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.24.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.24.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.24.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.25.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.25.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.25.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.25.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.25.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.25.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.25.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.25.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.25.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.26.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.26.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.26.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.26.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.26.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.26.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.26.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.26.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.26.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.27.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.27.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.27.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.27.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.27.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.27.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.27.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.27.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.27.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.28.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.28.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.28.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.28.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.28.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.28.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.28.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.28.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.28.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.28.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.28.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.28.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.29.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.29.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.29.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.29.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.29.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.29.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.29.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.29.attn_output.weight, torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.29.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.29.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.3.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.3.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.3.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.3.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.30.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.30.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.30.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.30.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.30.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.30.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.30.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.30.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.30.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.30.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.30.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.30.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.31.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.31.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.31.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.31.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.31.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.31.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.31.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.31.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.31.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.31.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.31.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.31.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.32.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.32.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.32.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.32.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.32.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.32.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.32.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.32.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.32.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.32.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.32.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.32.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.33.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.33.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.33.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.33.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.33.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.33.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.33.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.33.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.33.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.33.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.33.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.33.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.34.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.34.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.34.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.34.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.34.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.34.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.34.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.34.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.34.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.34.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.34.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.34.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.35.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.35.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.35.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.35.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.35.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.35.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.35.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.35.attn_output.weight, torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.35.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.35.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.36.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.36.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.36.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.36.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.36.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.36.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.36.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.36.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.36.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.36.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.36.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.36.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.37.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.37.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.37.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.37.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.37.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.37.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.37.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.37.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.37.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.37.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.37.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.37.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.38.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.38.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.38.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.38.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.38.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.38.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.38.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.38.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.38.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.38.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.38.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.38.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.39.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.39.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.39.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.39.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.39.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.39.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.39.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.39.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.39.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.39.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.39.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.39.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.4.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.4.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.4.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.4.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.40.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.40.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.40.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.40.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.40.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.40.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.40.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.40.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.40.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.40.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.40.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.40.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.41.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.41.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.41.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.41.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.41.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.41.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.41.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.41.attn_output.weight, torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.41.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.41.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.5.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.5.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.5.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.5.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.6.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.6.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.6.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.6.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.7.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.7.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.7.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.7.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.8.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.8.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.8.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.8.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.8.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.9.inp_gate.weight, torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.9.proj.weight, torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.9.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_k.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.9.attn_output.weight, torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.9.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_q.weight, torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.9.attn_v.weight, torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:per_layer_model_proj.weight, torch.bfloat16 --> BF16, shape = {2560, 10752}
INFO:hf-to-gguf:per_layer_proj_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 2560
INFO:hf-to-gguf:gguf: feed forward length = 10240
INFO:hf-to-gguf:gguf: head count = 8
INFO:hf-to-gguf:gguf: key-value head count = 2
WARNING:hf-to-gguf:Unknown RoPE type: proportional
INFO:hf-to-gguf:gguf: rope scaling type = NONE
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rope theta swa = 10000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: file type = 32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.context_length', overwriting it with new value 131072 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.head_count', overwriting it with new value 8 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.layer_norm_rms_epsilon', overwriting it with new value 1e-06 of type FLOAT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.key_length', overwriting it with new value 256 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.value_length', overwriting it with new value 256 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.rope.freq_base', overwriting it with new value 1000000.0 of type FLOAT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.head_count_kv', overwriting it with new value 2 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.key_length', overwriting it with new value 512 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.value_length', overwriting it with new value 512 of type UINT32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:hf-to-gguf:Token '<|tool_call>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<tool_call|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|tool_response>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<tool_response|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|"|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|channel>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<channel|>' is set to USER_DEFINED
WARNING:gguf.vocab:Unknown separator token '<bos>' in TemplateProcessing<pair>
INFO:gguf.vocab:Adding 514906 merge(s).
INFO:gguf.vocab:Setting special token type bos to 2
INFO:gguf.vocab:Setting special token type eos to 1
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_sep_token to False
WARNING:gguf.gguf_writer:Duplicated key name 'tokenizer.ggml.add_bos_token', overwriting it with new value True of type BOOL
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:gguf.gguf: n_tensors = 720, total_size = 15.0G
Well, the original safetensors and the converted GGUF came out almost the same size. It didn't even shrink by 1 GB.
| drwxrwxr-x 2 minimonk minimonk 4096 6월 1 13:52 ./
drwxrwxr-x 3 minimonk minimonk 4096 6월 1 13:52 ../
-rw-rw-r-- 1 minimonk minimonk 5105 6월 1 13:51 config.json
-rw-rw-r-- 1 minimonk minimonk 15992595884 6월 1 13:03 model.safetensors
-rw-rw-r-- 1 minimonk minimonk 32170070 6월 1 13:51 tokenizer.json |
| -rw-rw-r-- 1 minimonk minimonk 15053078208 6월 1 14:01 gguf.gguf |
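To put a number on "not even 1 GB", here is a quick back-of-the-envelope sketch. It only plugs in the two byte counts from the `ls` output above; the variable names are mine.

```python
# Byte counts copied from the ls output above.
safetensors_bytes = 15_992_595_884  # model.safetensors
gguf_bytes = 15_053_078_208         # gguf.gguf

# How much smaller the GGUF file is than the original checkpoint.
saved = safetensors_bytes - gguf_bytes

print(f"saved: {saved} bytes")
print(f"saved: {saved / 1e9:.2f} GB ({saved / 2**30:.2f} GiB)")
print(f"gguf is {gguf_bytes / safetensors_bytes:.1%} of the original size")
```

The difference works out to 939,517,676 bytes, so a bit under 1 GB either way you count it.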
So I had it convert to F16 instead, and the BF16 and F16 files are identical in size, right down to the byte!
| -rw-rw-r-- 1 minimonk minimonk 15053078208 6월 1 14:05 gguf.gguf |
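Thinking about it, the identical size is not that surprising: BF16 and F16 both store one value in 16 bits and only differ in how those bits are split between exponent and mantissa. As a rough sanity check, the sketch below recomputes the payload size of a few tensors from the shapes printed in the conversion log. The helper `tensor_bytes` and the choice of tensors are my own, not something from llama.cpp, and it ignores GGUF metadata and alignment padding.

```python
from math import prod

# Bytes per element for the types that show up in the conversion log.
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2, "BF16": 2}

def tensor_bytes(shape, dtype):
    """Raw payload size of one tensor (no metadata, no alignment padding)."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]

# Shapes copied from the converter log.
shapes = {
    "token_embd.weight": (2560, 262144),
    "per_layer_token_embd.weight": (10752, 262144),
    "blk.0.ffn_down.weight": (10240, 2560),
}

for name, shape in shapes.items():
    f16 = tensor_bytes(shape, "F16")
    bf16 = tensor_bytes(shape, "BF16")
    print(f"{name}: F16={f16} bytes, BF16={bf16} bytes, same={f16 == bf16}")

# The two embedding tables together, as 16-bit values.
embeddings = sum(
    tensor_bytes(shapes[n], "F16")
    for n in ("token_embd.weight", "per_layer_token_embd.weight")
)
print(f"embedding tables: {embeddings / 1e9:.2f} GB")
```

By this estimate the two embedding tables alone take up roughly 7 GB of the 15.0 GB file, and that number is the same for either 16-bit type.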
For the record, the output type is selected with the `--outtype` option.
| $ /home/minimonk/src/llama.cpp/convert_hf_to_gguf.py ./google/ --outfile gguf.gguf --outtype f16
INFO:hf-to-gguf:Loading model: google
INFO:hf-to-gguf:Model architecture: Gemma4ForConditionalGeneration
INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
Writing: 9%|██████▎ | 1.34G/15.0G [00:02<00:27, 503Mbyte/s]
Writing: 100%|███████████████████████████████████████████████████████████████████████| 15.0G/15.0G [00:49<00:00, 305Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to gguf.gguf |
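Same size does not mean same numbers, though. With `--outtype f16` every bfloat16 weight has to be re-encoded as IEEE half precision, which has more mantissa bits but a much narrower exponent range. The sketch below is only my own illustration of that difference using Python's `struct` module; it is not how convert_hf_to_gguf.py does the conversion internally, and the bit patterns are hand-picked rather than taken from the model.

```python
import struct

def bf16_bits_to_float(bits: int) -> float:
    """Decode a 16-bit bfloat16 pattern: it is the upper half of a float32."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

def to_f16(x: float) -> float:
    """Round x to IEEE half precision and back via struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Hand-picked bfloat16 bit patterns (illustrative only).
samples = {
    "one": 0x3F80,    # 1.0
    "tiny": 0x3000,   # 2**-31, below the smallest F16 subnormal
    "large": 0x4780,  # 65536.0, above the largest finite F16 value (65504)
}

for name, bits in samples.items():
    value = bf16_bits_to_float(bits)
    try:
        print(f"{name}: bf16={value!r} -> f16={to_f16(value)!r}")
    except (OverflowError, struct.error):
        # struct refuses to pack values that exceed the half-precision range.
        print(f"{name}: bf16={value!r} does not fit in f16")
```

Ordinary values like 1.0 survive unchanged, very small values flush to zero, and anything past 65504 has no finite F16 representation. Whether any of the Gemma weights actually fall outside the F16 range is something I have not checked.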
Log output of the F16 run between "Exporting model..." and "Writing":
INFO:hf-to-gguf:rope_freqs.weight, torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {2560, 262144}
INFO:hf-to-gguf:per_layer_token_embd.weight, torch.bfloat16 --> F16, shape = {10752, 262144}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.0.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.0.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.0.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.0.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.1.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.1.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.1.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.1.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.10.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.10.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.10.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.10.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.10.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.11.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.11.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.11.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.11.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.11.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.12.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.12.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.12.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.12.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.12.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.13.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.13.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.13.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.13.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.13.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.14.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.14.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.14.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.14.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.14.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.15.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.15.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.15.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.15.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.15.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.16.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.16.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.16.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.16.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.16.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.16.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.16.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.16.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.17.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.17.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.17.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.17.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.17.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.17.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.17.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.17.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.18.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.18.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.18.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.18.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.18.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.18.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.18.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.18.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.19.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.19.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.19.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.19.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.19.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.19.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.19.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.19.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.2.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.2.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.2.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.2.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.20.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.20.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.20.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.20.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.20.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.20.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.20.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.20.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.21.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.21.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.21.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.21.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.21.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.21.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.21.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.21.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.22.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.22.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.22.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.22.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.22.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.22.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.22.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.22.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.23.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.23.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.23.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.23.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.23.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.23.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.23.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.23.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.24.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.24.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.24.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.24.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.24.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.24.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.24.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.24.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.24.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.25.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.25.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.25.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.25.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.25.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.25.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.25.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.25.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.25.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.26.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.26.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.26.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.26.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.26.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.26.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.26.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.26.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.26.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.27.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.27.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.27.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.27.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.27.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.27.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.27.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.27.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.27.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.28.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.28.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.28.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.28.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.28.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.28.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.28.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.28.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.28.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.28.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.28.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.28.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.29.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.29.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.29.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.29.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.29.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.29.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.29.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.29.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.29.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.29.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.3.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.3.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.3.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.3.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.30.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.30.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.30.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.30.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.30.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.30.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.30.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.30.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.30.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.30.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.30.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.30.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.31.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.31.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.31.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.31.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.31.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.31.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.31.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.31.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.31.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.31.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.31.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.31.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.32.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.32.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.32.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.32.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.32.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.32.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.32.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.32.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.32.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.32.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.32.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.32.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.33.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.33.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.33.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.33.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.33.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.33.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.33.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.33.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.33.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.33.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.33.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.33.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.34.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.34.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.34.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.34.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.34.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.34.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.34.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.34.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.34.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.34.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.34.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.34.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.35.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.35.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.35.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.35.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.35.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.35.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.35.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.35.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.35.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.35.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.36.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.36.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.36.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.36.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.36.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.36.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.36.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.36.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.36.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.36.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.36.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.36.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.37.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.37.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.37.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.37.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.37.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.37.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.37.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.37.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.37.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.37.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.37.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.37.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.38.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.38.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.38.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.38.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.38.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.38.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.38.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.38.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.38.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.38.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.38.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.38.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.39.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.39.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.39.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.39.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.39.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.39.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.39.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.39.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.39.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.39.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.39.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.39.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.4.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.4.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.4.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.4.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.40.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.40.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.40.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.40.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.40.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.40.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.40.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.40.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.40.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.40.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.40.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.40.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.41.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.41.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.41.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.41.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.41.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.41.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.41.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.41.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.41.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.41.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.5.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.5.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.5.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.5.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.6.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.6.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.6.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.6.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.7.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.7.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.7.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.7.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.8.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.8.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.8.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.8.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.8.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.layer_output_scale.weight, torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.9.inp_gate.weight, torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.9.proj.weight, torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.9.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.post_ffw_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.post_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_k.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.9.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.9.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_q.weight, torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.9.attn_v.weight, torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:per_layer_model_proj.weight, torch.bfloat16 --> F16, shape = {2560, 10752}
INFO:hf-to-gguf:per_layer_proj_norm.weight, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 2560
INFO:hf-to-gguf:gguf: feed forward length = 10240
INFO:hf-to-gguf:gguf: head count = 8
INFO:hf-to-gguf:gguf: key-value head count = 2
WARNING:hf-to-gguf:Unknown RoPE type: proportional
INFO:hf-to-gguf:gguf: rope scaling type = NONE
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rope theta swa = 10000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: file type = 1
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.context_length', overwriting it with new value 131072 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.head_count', overwriting it with new value 8 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.layer_norm_rms_epsilon', overwriting it with new value 1e-06 of type FLOAT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.key_length', overwriting it with new value 256 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.value_length', overwriting it with new value 256 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.rope.freq_base', overwriting it with new value 1000000.0 of type FLOAT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.head_count_kv', overwriting it with new value 2 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.key_length', overwriting it with new value 512 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.value_length', overwriting it with new value 512 of type UINT32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:hf-to-gguf:Token '<|tool_call>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<tool_call|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|tool_response>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<tool_response|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|"|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|channel>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<channel|>' is set to USER_DEFINED
WARNING:gguf.vocab:Unknown separator token '' in TemplateProcessing
INFO:gguf.vocab:Adding 514906 merge(s).
INFO:gguf.vocab:Setting special token type bos to 2
INFO:gguf.vocab:Setting special token type eos to 1
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_sep_token to False
WARNING:gguf.gguf_writer:Duplicated key name 'tokenizer.ggml.add_bos_token', overwriting it with new value True of type BOOL
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:gguf.gguf: n_tensors = 720, total_size = 15.0G
With F16 as the output type, it takes up about 14 GB in total,

and when I run it with llama.cpp the model misbehaves. Why is it acting like that T_T

+
| $ /mnt/Downloads/llama-b9305/llama-quantize usage: /mnt/Downloads/llama-b9305/llama-quantize [--help] [--allow-requantize] [--leave-output-tensor] [--pure] [--imatrix] [--include-weights] [--exclude-weights] [--output-tensor-type] [--token-embedding-type] [--tensor-type] [--tensor-type-file] [--prune-layers] [--keep-split] [--override-kv] [--dry-run] model-f32.gguf [model-quant.gguf] type [nthreads] --allow-requantize allow requantizing tensors that have already been quantized WARNING: this can severely reduce quality compared to quantizing from 16bit or 32bit! --leave-output-tensor leave output.weight un(re)quantized increases model size but may also increase quality, especially when requantizing --pure disable k-quant mixtures and quantize all tensors to the same type --imatrix file_name use data in file_name as importance matrix for quant optimizations --include-weights tensor_name use importance matrix for this/these tensor(s) --exclude-weights tensor_name do not use importance matrix for this/these tensor(s) --output-tensor-type ggml_type use this ggml_type for the output.weight tensor --token-embedding-type ggml_type use this ggml_type for the token embeddings tensor --tensor-type tensor_name=ggml_type quantize this tensor to this ggml_type this is an advanced option to selectively quantize tensors. may be specified multiple times. example: --tensor-type attn_q=q8_0 --tensor-type-file tensor_types.txt list of tensors to quantize to a specific ggml_type this is an advanced option to selectively quantize a long list of tensors. the file should use the same format as above, separated by spaces or newlines. --prune-layers L0,L1,L2... comma-separated list of layer numbers to prune from the model WARNING: this is an advanced option, use with care. --keep-split generate quantized model in the same shards as input --override-kv KEY=TYPE:VALUE override model metadata by key in the quantized model. may be specified multiple times. 
WARNING: this is an advanced option, use with care. --dry-run calculate and show the final quantization size without performing quantization example: llama-quantize --dry-run model-f32.gguf Q4_K note: --include-weights and --exclude-weights cannot be used together ----------------------------------------------------------------------------- allowed quantization types ----------------------------------------------------------------------------- 40 or Q1_0 : 1.125 bpw quantization 2 or Q4_0 : 4.34G, +0.4685 ppl @ Llama-3-8B 3 or Q4_1 : 4.78G, +0.4511 ppl @ Llama-3-8B 38 or MXFP4_MOE : MXFP4 MoE 8 or Q5_0 : 5.21G, +0.1316 ppl @ Llama-3-8B 9 or Q5_1 : 5.65G, +0.1062 ppl @ Llama-3-8B 19 or IQ2_XXS : 2.06 bpw quantization 20 or IQ2_XS : 2.31 bpw quantization 28 or IQ2_S : 2.5 bpw quantization 29 or IQ2_M : 2.7 bpw quantization 24 or IQ1_S : 1.56 bpw quantization 31 or IQ1_M : 1.75 bpw quantization 36 or TQ1_0 : 1.69 bpw ternarization 37 or TQ2_0 : 2.06 bpw ternarization 10 or Q2_K : 2.96G, +3.5199 ppl @ Llama-3-8B 21 or Q2_K_S : 2.96G, +3.1836 ppl @ Llama-3-8B 23 or IQ3_XXS : 3.06 bpw quantization 26 or IQ3_S : 3.44 bpw quantization 27 or IQ3_M : 3.66 bpw quantization mix 12 or Q3_K : alias for Q3_K_M 22 or IQ3_XS : 3.3 bpw quantization 11 or Q3_K_S : 3.41G, +1.6321 ppl @ Llama-3-8B 12 or Q3_K_M : 3.74G, +0.6569 ppl @ Llama-3-8B 13 or Q3_K_L : 4.03G, +0.5562 ppl @ Llama-3-8B 25 or IQ4_NL : 4.50 bpw non-linear quantization 30 or IQ4_XS : 4.25 bpw non-linear quantization 15 or Q4_K : alias for Q4_K_M 14 or Q4_K_S : 4.37G, +0.2689 ppl @ Llama-3-8B 15 or Q4_K_M : 4.58G, +0.1754 ppl @ Llama-3-8B 17 or Q5_K : alias for Q5_K_M 16 or Q5_K_S : 5.21G, +0.1049 ppl @ Llama-3-8B 17 or Q5_K_M : 5.33G, +0.0569 ppl @ Llama-3-8B 18 or Q6_K : 6.14G, +0.0217 ppl @ Llama-3-8B 7 or Q8_0 : 7.96G, +0.0026 ppl @ Llama-3-8B 1 or F16 : 14.00G, +0.0020 ppl @ Mistral-7B 32 or BF16 : 14.00G, -0.0050 ppl @ Mistral-7B 0 or F32 : 26.00G @ 7B COPY : only copy tensors, no quantizing |
Going from F16 to Q8_0 cuts the file size roughly in half (about 15.05 GB down to 8.03 GB)!
| -rw-rw-r-- 1 minimonk minimonk 4977169088 4월 29 10:43 gemma-4-E4B-it-Q4_K_M.gguf -rw-rw-r-- 1 minimonk minimonk 15053078208 6월 1 14:24 gguf.gguf -rw-rw-r-- 1 minimonk minimonk 5335273152 6월 1 14:32 Q4_K_M.gguf -rw-rw-r-- 1 minimonk minimonk 8031223488 6월 1 14:27 Q8.gguf |
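To put the listing above in numbers, here is a minimal sketch that compares the three files I produced myself. The byte counts are copied from the `ls` output; the "bits per weight" column is only a rough estimate that assumes the whole F16 file is weights at 16 bits each, so metadata and the tensors that stay F32 skew it a little. The helper name `ratio_and_bpw` is made up for this note.

```python
# File sizes in bytes, copied from the ls listing above.
sizes = {
    "gguf.gguf (F16)": 15053078208,
    "Q8.gguf": 8031223488,
    "Q4_K_M.gguf": 5335273152,
}

F16_BITS = 16  # assumption: treat the F16 file as 16 bits per weight
BASE = sizes["gguf.gguf (F16)"]


def ratio_and_bpw(size_bytes, base_bytes=BASE):
    """Return (size ratio vs. the F16 file, rough bits-per-weight estimate)."""
    ratio = size_bytes / base_bytes
    return ratio, ratio * F16_BITS


for name, size in sizes.items():
    ratio, bpw = ratio_and_bpw(size)
    print(f"{name:16s} {size / 2**30:6.2f} GiB  {ratio:6.1%} of F16  ~{bpw:.2f} bits/weight")
```

The Q8 estimate lands close to the 8.53 BPW that llama-quantize reports at the end of its own log, which is a decent sanity check for the approximation.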
| $ /mnt/Downloads/llama-b9305/llama-quantize ./gguf.gguf ./Q8.gguf Q8_0 load_backend: loaded RPC backend from /mnt/Downloads/llama-b9305/libggml-rpc.so ggml_vulkan: Found 3 Vulkan devices: ggml_vulkan: 0 = Intel(R) UHD Graphics 630 (CFL GT2) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 0 | matrix cores: none ggml_vulkan: 1 = NVIDIA GeForce GTX 1080 Ti (NVIDIA) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none ggml_vulkan: 2 = NVIDIA GeForce GTX 1080 Ti (NVIDIA) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none load_backend: loaded Vulkan backend from /mnt/Downloads/llama-b9305/libggml-vulkan.so load_backend: loaded CPU backend from /mnt/Downloads/llama-b9305/libggml-cpu-haswell.so llama_print_build_info: build = 9305 (63248fc3e) llama_print_build_info: built with GNU 11.4.0 for Linux x86_64 llama_quantize: quantizing './gguf.gguf' to './Q8.gguf' as Q8_0 llama_model_loader: loaded meta data with 37 key-value pairs and 720 tensors from ./gguf.gguf (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = gemma4 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Google llama_model_loader: - kv 3: general.size_label str = 7.5B llama_model_loader: - kv 4: gemma4.block_count u32 = 42 llama_model_loader: - kv 5: gemma4.context_length u32 = 131072 llama_model_loader: - kv 6: gemma4.embedding_length u32 = 2560 llama_model_loader: - kv 7: gemma4.feed_forward_length u32 = 10240 llama_model_loader: - kv 8: gemma4.attention.head_count u32 = 8 llama_model_loader: - kv 9: gemma4.attention.head_count_kv u32 = 2 llama_model_loader: - kv 10: gemma4.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 11: gemma4.rope.freq_base_swa f32 = 10000.000000 llama_model_loader: - kv 12: gemma4.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 13: gemma4.attention.key_length u32 = 512 llama_model_loader: - kv 14: gemma4.attention.value_length u32 = 512 llama_model_loader: - kv 15: general.file_type u32 = 1 llama_model_loader: - kv 16: gemma4.final_logit_softcapping f32 = 30.000000 llama_model_loader: - kv 17: gemma4.attention.sliding_window u32 = 512 llama_model_loader: - kv 18: gemma4.attention.shared_kv_layers u32 = 18 llama_model_loader: - kv 19: gemma4.embedding_length_per_layer_input u32 = 256 llama_model_loader: - kv 20: gemma4.attention.sliding_window_pattern arr[bool,42] = [true, true, true, true, true, false,... llama_model_loader: - kv 21: gemma4.attention.key_length_swa u32 = 256 llama_model_loader: - kv 22: gemma4.attention.value_length_swa u32 = 256 llama_model_loader: - kv 23: gemma4.rope.dimension_count u32 = 512 llama_model_loader: - kv 24: gemma4.rope.dimension_count_swa u32 = 256 llama_model_loader: - kv 25: general.quantization_version u32 = 2 llama_model_loader: - kv 26: tokenizer.ggml.model str = gemma4 llama_model_loader: - kv 27: tokenizer.ggml.tokens arr[str,262144] = ["<pad>", "<eos>", "<bos>", "<unk>", ... 
llama_model_loader: - kv 28: tokenizer.ggml.scores arr[f32,262144] = [-1000.000000, -1000.000000, -1000.00... llama_model_loader: - kv 29: tokenizer.ggml.token_type arr[i32,262144] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 30: tokenizer.ggml.merges arr[str,514906] = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ... llama_model_loader: - kv 31: tokenizer.ggml.bos_token_id u32 = 2 llama_model_loader: - kv 32: tokenizer.ggml.eos_token_id u32 = 1 llama_model_loader: - kv 33: tokenizer.ggml.padding_token_id u32 = 0 llama_model_loader: - kv 34: tokenizer.ggml.add_bos_token bool = true llama_model_loader: - kv 35: tokenizer.ggml.add_sep_token bool = false llama_model_loader: - kv 36: tokenizer.ggml.add_space_prefix bool = false llama_model_loader: - type f32: 339 tensors llama_model_loader: - type f16: 381 tensors /// omitted here llama_model_quantize_impl: model size = 14340.66 MiB (16.00 BPW) llama_model_quantize_impl: quant size = 7644.10 MiB (8.53 BPW) llama_quantize: quantize time = 99905.90 ms llama_quantize: total time = 99905.90 ms |
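The per-tensor sizes in the log (for example 2.50 MiB -> 1.33 MiB, or 50.00 MiB -> 26.56 MiB) can be reproduced by hand. The sketch below assumes the usual ggml Q8_0 layout: blocks of 32 int8 weights plus one fp16 scale, so 34 bytes per 32 weights, or 8.5 bits per weight. The function names are my own and are not part of llama.cpp.

```python
from math import prod

QK8_0 = 32                 # weights per Q8_0 block (assumed ggml layout)
BLOCK_BYTES = 2 + QK8_0    # one fp16 scale (2 bytes) + 32 int8 values
MIB = 1024 * 1024


def f16_mib(shape):
    """Size of an F16 tensor in MiB: 2 bytes per element."""
    return prod(shape) * 2 / MIB


def q8_0_mib(shape):
    """Size of the same tensor in Q8_0, in MiB."""
    # Blocks run along the first dimension, so it must divide evenly by 32.
    if shape[0] % QK8_0 != 0:
        raise ValueError("first dimension must be a multiple of 32")
    return prod(shape) // QK8_0 * BLOCK_BYTES / MIB


# Shapes taken from the quantize log.
for name, shape in [
    ("blk.0.attn_k.weight", (2560, 512)),
    ("blk.0.ffn_down.weight", (10240, 2560)),
    ("per_layer_token_embd.weight", (10752, 262144)),
]:
    print(f"{name:28s} {f16_mib(shape):8.2f} MiB -> {q8_0_mib(shape):8.2f} MiB")

print("bits per weight:", BLOCK_BYTES * 8 / QK8_0)
```

The overall figure in the summary is 8.53 BPW rather than exactly 8.5, presumably because the norm tensors stay in f32 and at least one tensor (per_layer_model_proj.weight) is left as f16 in the log.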
[ 1/ 720] output_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 2/ 720] per_layer_model_proj.weight - [ 2560, 10752, 1, 1], type = f16, size = 52.500 MiB
[ 3/ 720] per_layer_proj_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 4/ 720] per_layer_token_embd.weight - [ 10752, 262144, 1, 1], type = f16, converting to q8_0 .. size = 5376.00 MiB -> 2856.00 MiB
[ 5/ 720] rope_freqs.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 6/ 720] token_embd.weight - [ 2560, 262144, 1, 1], type = f16, converting to q8_0 .. size = 1280.00 MiB -> 680.00 MiB
[ 7/ 720] blk.0.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 8/ 720] blk.0.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 9/ 720] blk.0.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 10/ 720] blk.0.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 11/ 720] blk.0.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 12/ 720] blk.0.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 13/ 720] blk.0.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 14/ 720] blk.0.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 15/ 720] blk.0.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 16/ 720] blk.0.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 17/ 720] blk.0.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 18/ 720] blk.0.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 19/ 720] blk.0.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 20/ 720] blk.0.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 21/ 720] blk.0.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 22/ 720] blk.0.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 23/ 720] blk.0.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 24/ 720] blk.1.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 25/ 720] blk.1.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 26/ 720] blk.1.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 27/ 720] blk.1.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 28/ 720] blk.1.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 29/ 720] blk.1.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 30/ 720] blk.1.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 31/ 720] blk.1.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 32/ 720] blk.1.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 33/ 720] blk.1.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 34/ 720] blk.1.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 35/ 720] blk.1.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 36/ 720] blk.1.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 37/ 720] blk.1.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 38/ 720] blk.1.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 39/ 720] blk.1.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 40/ 720] blk.1.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 41/ 720] blk.2.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 42/ 720] blk.2.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 43/ 720] blk.2.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 44/ 720] blk.2.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 45/ 720] blk.2.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 46/ 720] blk.2.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 47/ 720] blk.2.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 48/ 720] blk.2.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 49/ 720] blk.2.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 50/ 720] blk.2.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 51/ 720] blk.2.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 52/ 720] blk.2.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 53/ 720] blk.2.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 54/ 720] blk.2.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 55/ 720] blk.2.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 56/ 720] blk.2.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 57/ 720] blk.2.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 58/ 720] blk.3.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 59/ 720] blk.3.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 60/ 720] blk.3.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 61/ 720] blk.3.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 62/ 720] blk.3.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 63/ 720] blk.3.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 64/ 720] blk.3.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 65/ 720] blk.3.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 66/ 720] blk.3.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 67/ 720] blk.3.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 68/ 720] blk.3.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 69/ 720] blk.3.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 70/ 720] blk.3.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 71/ 720] blk.3.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 72/ 720] blk.3.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 73/ 720] blk.3.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 74/ 720] blk.3.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 75/ 720] blk.4.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 76/ 720] blk.4.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 77/ 720] blk.4.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 78/ 720] blk.4.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 79/ 720] blk.4.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 80/ 720] blk.4.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 81/ 720] blk.4.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 82/ 720] blk.4.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 83/ 720] blk.4.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 84/ 720] blk.4.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 85/ 720] blk.4.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 86/ 720] blk.4.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 87/ 720] blk.4.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 88/ 720] blk.4.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 89/ 720] blk.4.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 90/ 720] blk.4.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 91/ 720] blk.4.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 92/ 720] blk.5.attn_k.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 93/ 720] blk.5.attn_k_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 94/ 720] blk.5.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 95/ 720] blk.5.attn_output.weight - [ 4096, 2560, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 96/ 720] blk.5.attn_q.weight - [ 2560, 4096, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 97/ 720] blk.5.attn_q_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 98/ 720] blk.5.attn_v.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 99/ 720] blk.5.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 100/ 720] blk.5.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 101/ 720] blk.5.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 102/ 720] blk.5.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 103/ 720] blk.5.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 104/ 720] blk.5.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 105/ 720] blk.5.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 106/ 720] blk.5.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 107/ 720] blk.5.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 108/ 720] blk.5.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 109/ 720] blk.6.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 110/ 720] blk.6.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 111/ 720] blk.6.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 112/ 720] blk.6.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 113/ 720] blk.6.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 114/ 720] blk.6.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 115/ 720] blk.6.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 116/ 720] blk.6.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 117/ 720] blk.6.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 118/ 720] blk.6.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 119/ 720] blk.6.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 120/ 720] blk.6.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 121/ 720] blk.6.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 122/ 720] blk.6.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 123/ 720] blk.6.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 124/ 720] blk.6.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 125/ 720] blk.6.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 126/ 720] blk.7.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 127/ 720] blk.7.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 128/ 720] blk.7.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 129/ 720] blk.7.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 130/ 720] blk.7.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 131/ 720] blk.7.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 132/ 720] blk.7.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 133/ 720] blk.7.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 134/ 720] blk.7.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 135/ 720] blk.7.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 136/ 720] blk.7.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 137/ 720] blk.7.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 138/ 720] blk.7.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 139/ 720] blk.7.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 140/ 720] blk.7.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 141/ 720] blk.7.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 142/ 720] blk.7.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 143/ 720] blk.8.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 144/ 720] blk.8.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 145/ 720] blk.8.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 146/ 720] blk.8.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 147/ 720] blk.8.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 148/ 720] blk.8.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 149/ 720] blk.8.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 150/ 720] blk.8.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 151/ 720] blk.8.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 152/ 720] blk.8.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 153/ 720] blk.8.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 154/ 720] blk.8.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 155/ 720] blk.8.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 156/ 720] blk.8.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 157/ 720] blk.8.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 158/ 720] blk.8.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 159/ 720] blk.8.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 160/ 720] blk.9.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 161/ 720] blk.9.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 162/ 720] blk.9.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 163/ 720] blk.9.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 164/ 720] blk.9.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 165/ 720] blk.9.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 166/ 720] blk.9.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 167/ 720] blk.9.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 168/ 720] blk.9.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 169/ 720] blk.9.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 170/ 720] blk.9.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 171/ 720] blk.9.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 172/ 720] blk.9.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 173/ 720] blk.9.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 174/ 720] blk.9.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 175/ 720] blk.9.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 176/ 720] blk.9.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 177/ 720] blk.10.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 178/ 720] blk.10.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 179/ 720] blk.10.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 180/ 720] blk.10.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 181/ 720] blk.10.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 182/ 720] blk.10.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 183/ 720] blk.10.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 184/ 720] blk.10.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 185/ 720] blk.10.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 186/ 720] blk.10.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 187/ 720] blk.10.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 188/ 720] blk.10.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 189/ 720] blk.10.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 190/ 720] blk.10.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 191/ 720] blk.10.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 192/ 720] blk.10.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 193/ 720] blk.10.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 194/ 720] blk.11.attn_k.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 195/ 720] blk.11.attn_k_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 196/ 720] blk.11.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 197/ 720] blk.11.attn_output.weight - [ 4096, 2560, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 198/ 720] blk.11.attn_q.weight - [ 2560, 4096, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 199/ 720] blk.11.attn_q_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 200/ 720] blk.11.attn_v.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 201/ 720] blk.11.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 202/ 720] blk.11.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 203/ 720] blk.11.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 204/ 720] blk.11.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 205/ 720] blk.11.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 206/ 720] blk.11.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 207/ 720] blk.11.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 208/ 720] blk.11.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 209/ 720] blk.11.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 210/ 720] blk.11.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 211/ 720] blk.12.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 212/ 720] blk.12.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 213/ 720] blk.12.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 214/ 720] blk.12.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 215/ 720] blk.12.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 216/ 720] blk.12.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 217/ 720] blk.12.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 218/ 720] blk.12.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 219/ 720] blk.12.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 220/ 720] blk.12.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 221/ 720] blk.12.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 222/ 720] blk.12.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 223/ 720] blk.12.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 224/ 720] blk.12.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 225/ 720] blk.12.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 226/ 720] blk.12.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 227/ 720] blk.12.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 228/ 720] blk.13.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 229/ 720] blk.13.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 230/ 720] blk.13.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 231/ 720] blk.13.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 232/ 720] blk.13.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 233/ 720] blk.13.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 234/ 720] blk.13.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 235/ 720] blk.13.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 236/ 720] blk.13.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 237/ 720] blk.13.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 238/ 720] blk.13.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 239/ 720] blk.13.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 240/ 720] blk.13.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 241/ 720] blk.13.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 242/ 720] blk.13.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 243/ 720] blk.13.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 244/ 720] blk.13.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 245/ 720] blk.14.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 246/ 720] blk.14.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 247/ 720] blk.14.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 248/ 720] blk.14.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 249/ 720] blk.14.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 250/ 720] blk.14.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 251/ 720] blk.14.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 252/ 720] blk.14.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 253/ 720] blk.14.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 254/ 720] blk.14.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 255/ 720] blk.14.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 256/ 720] blk.14.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 257/ 720] blk.14.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 258/ 720] blk.14.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 259/ 720] blk.14.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 260/ 720] blk.14.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 261/ 720] blk.14.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 262/ 720] blk.15.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 263/ 720] blk.15.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 264/ 720] blk.15.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 265/ 720] blk.15.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 266/ 720] blk.15.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 267/ 720] blk.15.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 268/ 720] blk.15.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 269/ 720] blk.15.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 270/ 720] blk.15.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 271/ 720] blk.15.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 272/ 720] blk.15.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 273/ 720] blk.15.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 274/ 720] blk.15.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 275/ 720] blk.15.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 276/ 720] blk.15.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 277/ 720] blk.15.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 278/ 720] blk.15.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 279/ 720] blk.16.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 280/ 720] blk.16.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 281/ 720] blk.16.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 282/ 720] blk.16.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 283/ 720] blk.16.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 284/ 720] blk.16.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 285/ 720] blk.16.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 286/ 720] blk.16.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 287/ 720] blk.16.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 288/ 720] blk.16.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 289/ 720] blk.16.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 290/ 720] blk.16.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 291/ 720] blk.16.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 292/ 720] blk.16.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 293/ 720] blk.16.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 294/ 720] blk.16.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 295/ 720] blk.16.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 296/ 720] blk.17.attn_k.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 297/ 720] blk.17.attn_k_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 298/ 720] blk.17.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 299/ 720] blk.17.attn_output.weight - [ 4096, 2560, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 300/ 720] blk.17.attn_q.weight - [ 2560, 4096, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 301/ 720] blk.17.attn_q_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 302/ 720] blk.17.attn_v.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 303/ 720] blk.17.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 304/ 720] blk.17.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 305/ 720] blk.17.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 306/ 720] blk.17.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 307/ 720] blk.17.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 308/ 720] blk.17.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 309/ 720] blk.17.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 310/ 720] blk.17.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 311/ 720] blk.17.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 312/ 720] blk.17.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 313/ 720] blk.18.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 314/ 720] blk.18.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 315/ 720] blk.18.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 316/ 720] blk.18.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 317/ 720] blk.18.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 318/ 720] blk.18.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 319/ 720] blk.18.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 320/ 720] blk.18.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 321/ 720] blk.18.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 322/ 720] blk.18.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 323/ 720] blk.18.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 324/ 720] blk.18.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 325/ 720] blk.18.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 326/ 720] blk.18.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 327/ 720] blk.18.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 328/ 720] blk.18.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 329/ 720] blk.18.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 330/ 720] blk.19.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 331/ 720] blk.19.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 332/ 720] blk.19.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 333/ 720] blk.19.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 334/ 720] blk.19.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 335/ 720] blk.19.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 336/ 720] blk.19.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 337/ 720] blk.19.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 338/ 720] blk.19.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 339/ 720] blk.19.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 340/ 720] blk.19.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 341/ 720] blk.19.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 342/ 720] blk.19.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 343/ 720] blk.19.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 344/ 720] blk.19.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 345/ 720] blk.19.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 346/ 720] blk.19.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 347/ 720] blk.20.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 348/ 720] blk.20.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 349/ 720] blk.20.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 350/ 720] blk.20.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 351/ 720] blk.20.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 352/ 720] blk.20.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 353/ 720] blk.20.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 354/ 720] blk.20.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 355/ 720] blk.20.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 356/ 720] blk.20.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 357/ 720] blk.20.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 358/ 720] blk.20.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 359/ 720] blk.20.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 360/ 720] blk.20.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 361/ 720] blk.20.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 362/ 720] blk.20.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 363/ 720] blk.20.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 364/ 720] blk.21.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 365/ 720] blk.21.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 366/ 720] blk.21.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 367/ 720] blk.21.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 368/ 720] blk.21.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 369/ 720] blk.21.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 370/ 720] blk.21.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 371/ 720] blk.21.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 372/ 720] blk.21.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 373/ 720] blk.21.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 374/ 720] blk.21.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 375/ 720] blk.21.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 376/ 720] blk.21.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 377/ 720] blk.21.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 378/ 720] blk.21.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 379/ 720] blk.21.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 380/ 720] blk.21.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 381/ 720] blk.22.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 382/ 720] blk.22.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 383/ 720] blk.22.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 384/ 720] blk.22.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 385/ 720] blk.22.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 386/ 720] blk.22.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 387/ 720] blk.22.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 388/ 720] blk.22.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 389/ 720] blk.22.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 390/ 720] blk.22.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 391/ 720] blk.22.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 392/ 720] blk.22.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 393/ 720] blk.22.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 394/ 720] blk.22.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 395/ 720] blk.22.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 396/ 720] blk.22.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 397/ 720] blk.22.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 398/ 720] blk.23.attn_k.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 399/ 720] blk.23.attn_k_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 400/ 720] blk.23.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 401/ 720] blk.23.attn_output.weight - [ 4096, 2560, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 402/ 720] blk.23.attn_q.weight - [ 2560, 4096, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 403/ 720] blk.23.attn_q_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 404/ 720] blk.23.attn_v.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 405/ 720] blk.23.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 406/ 720] blk.23.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 407/ 720] blk.23.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 408/ 720] blk.23.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 409/ 720] blk.23.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 410/ 720] blk.23.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 411/ 720] blk.23.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 412/ 720] blk.23.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 413/ 720] blk.23.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 414/ 720] blk.23.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 415/ 720] blk.24.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 416/ 720] blk.24.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 417/ 720] blk.24.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 418/ 720] blk.24.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 419/ 720] blk.24.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 420/ 720] blk.24.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 421/ 720] blk.24.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 422/ 720] blk.24.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 423/ 720] blk.24.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 424/ 720] blk.24.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 425/ 720] blk.24.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 426/ 720] blk.24.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 427/ 720] blk.24.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 428/ 720] blk.24.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 429/ 720] blk.24.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 430/ 720] blk.24.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 431/ 720] blk.24.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 432/ 720] blk.25.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 433/ 720] blk.25.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 434/ 720] blk.25.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 435/ 720] blk.25.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 436/ 720] blk.25.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 437/ 720] blk.25.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 438/ 720] blk.25.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 439/ 720] blk.25.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 440/ 720] blk.25.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 441/ 720] blk.25.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 442/ 720] blk.25.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 443/ 720] blk.25.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 444/ 720] blk.25.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 445/ 720] blk.25.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 446/ 720] blk.25.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 447/ 720] blk.25.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 448/ 720] blk.25.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 449/ 720] blk.26.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 450/ 720] blk.26.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 451/ 720] blk.26.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 452/ 720] blk.26.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 453/ 720] blk.26.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 454/ 720] blk.26.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 455/ 720] blk.26.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 456/ 720] blk.26.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 457/ 720] blk.26.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 458/ 720] blk.26.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 459/ 720] blk.26.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 460/ 720] blk.26.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 461/ 720] blk.26.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 462/ 720] blk.26.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 463/ 720] blk.26.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 464/ 720] blk.26.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 465/ 720] blk.26.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 466/ 720] blk.27.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 467/ 720] blk.27.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 468/ 720] blk.27.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 469/ 720] blk.27.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 470/ 720] blk.27.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 471/ 720] blk.27.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 472/ 720] blk.27.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 473/ 720] blk.27.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 474/ 720] blk.27.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 475/ 720] blk.27.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 476/ 720] blk.27.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 477/ 720] blk.27.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 478/ 720] blk.27.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 479/ 720] blk.27.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 480/ 720] blk.27.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 481/ 720] blk.27.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 482/ 720] blk.27.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 483/ 720] blk.28.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 484/ 720] blk.28.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 485/ 720] blk.28.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 486/ 720] blk.28.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 487/ 720] blk.28.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 488/ 720] blk.28.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 489/ 720] blk.28.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 490/ 720] blk.28.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 491/ 720] blk.28.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 492/ 720] blk.28.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 493/ 720] blk.28.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 494/ 720] blk.28.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 495/ 720] blk.28.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 496/ 720] blk.28.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 497/ 720] blk.28.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 498/ 720] blk.28.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 499/ 720] blk.28.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 500/ 720] blk.29.attn_k.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 501/ 720] blk.29.attn_k_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 502/ 720] blk.29.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 503/ 720] blk.29.attn_output.weight - [ 4096, 2560, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 504/ 720] blk.29.attn_q.weight - [ 2560, 4096, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 505/ 720] blk.29.attn_q_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 506/ 720] blk.29.attn_v.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 507/ 720] blk.29.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 508/ 720] blk.29.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 509/ 720] blk.29.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 510/ 720] blk.29.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 511/ 720] blk.29.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 512/ 720] blk.29.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 513/ 720] blk.29.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 514/ 720] blk.29.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 515/ 720] blk.29.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 516/ 720] blk.29.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 517/ 720] blk.30.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 518/ 720] blk.30.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 519/ 720] blk.30.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 520/ 720] blk.30.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 521/ 720] blk.30.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 522/ 720] blk.30.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 523/ 720] blk.30.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 524/ 720] blk.30.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 525/ 720] blk.30.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 526/ 720] blk.30.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 527/ 720] blk.30.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 528/ 720] blk.30.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 529/ 720] blk.30.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 530/ 720] blk.30.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 531/ 720] blk.30.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 532/ 720] blk.30.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 533/ 720] blk.30.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 534/ 720] blk.31.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 535/ 720] blk.31.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 536/ 720] blk.31.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 537/ 720] blk.31.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 538/ 720] blk.31.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 539/ 720] blk.31.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 540/ 720] blk.31.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 541/ 720] blk.31.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 542/ 720] blk.31.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 543/ 720] blk.31.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 544/ 720] blk.31.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 545/ 720] blk.31.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 546/ 720] blk.31.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 547/ 720] blk.31.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 548/ 720] blk.31.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 549/ 720] blk.31.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 550/ 720] blk.31.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 551/ 720] blk.32.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 552/ 720] blk.32.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 553/ 720] blk.32.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 554/ 720] blk.32.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 555/ 720] blk.32.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 556/ 720] blk.32.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 557/ 720] blk.32.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 558/ 720] blk.32.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 559/ 720] blk.32.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 560/ 720] blk.32.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 561/ 720] blk.32.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 562/ 720] blk.32.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 563/ 720] blk.32.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 564/ 720] blk.32.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 565/ 720] blk.32.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 566/ 720] blk.32.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 567/ 720] blk.32.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 568/ 720] blk.33.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 569/ 720] blk.33.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 570/ 720] blk.33.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 571/ 720] blk.33.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 572/ 720] blk.33.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 573/ 720] blk.33.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 574/ 720] blk.33.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 575/ 720] blk.33.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 576/ 720] blk.33.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 577/ 720] blk.33.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 578/ 720] blk.33.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 579/ 720] blk.33.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 580/ 720] blk.33.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 581/ 720] blk.33.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 582/ 720] blk.33.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 583/ 720] blk.33.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 584/ 720] blk.33.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 585/ 720] blk.34.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 586/ 720] blk.34.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 587/ 720] blk.34.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 588/ 720] blk.34.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 589/ 720] blk.34.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 590/ 720] blk.34.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 591/ 720] blk.34.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 592/ 720] blk.34.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 593/ 720] blk.34.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 594/ 720] blk.34.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 595/ 720] blk.34.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 596/ 720] blk.34.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 597/ 720] blk.34.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 598/ 720] blk.34.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 599/ 720] blk.34.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 600/ 720] blk.34.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 601/ 720] blk.34.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 602/ 720] blk.35.attn_k.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 603/ 720] blk.35.attn_k_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 604/ 720] blk.35.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 605/ 720] blk.35.attn_output.weight - [ 4096, 2560, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 606/ 720] blk.35.attn_q.weight - [ 2560, 4096, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 607/ 720] blk.35.attn_q_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 608/ 720] blk.35.attn_v.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 609/ 720] blk.35.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 610/ 720] blk.35.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 611/ 720] blk.35.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 612/ 720] blk.35.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 613/ 720] blk.35.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 614/ 720] blk.35.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 615/ 720] blk.35.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 616/ 720] blk.35.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 617/ 720] blk.35.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 618/ 720] blk.35.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 619/ 720] blk.36.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 620/ 720] blk.36.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 621/ 720] blk.36.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 622/ 720] blk.36.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 623/ 720] blk.36.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 624/ 720] blk.36.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 625/ 720] blk.36.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 626/ 720] blk.36.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 627/ 720] blk.36.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 628/ 720] blk.36.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 629/ 720] blk.36.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 630/ 720] blk.36.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 631/ 720] blk.36.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 632/ 720] blk.36.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 633/ 720] blk.36.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 634/ 720] blk.36.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 635/ 720] blk.36.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 636/ 720] blk.37.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 637/ 720] blk.37.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 638/ 720] blk.37.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 639/ 720] blk.37.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 640/ 720] blk.37.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 641/ 720] blk.37.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 642/ 720] blk.37.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 643/ 720] blk.37.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 644/ 720] blk.37.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 645/ 720] blk.37.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 646/ 720] blk.37.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 647/ 720] blk.37.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 648/ 720] blk.37.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 649/ 720] blk.37.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 650/ 720] blk.37.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 651/ 720] blk.37.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 652/ 720] blk.37.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 653/ 720] blk.38.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 654/ 720] blk.38.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 655/ 720] blk.38.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 656/ 720] blk.38.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 657/ 720] blk.38.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 658/ 720] blk.38.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 659/ 720] blk.38.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 660/ 720] blk.38.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 661/ 720] blk.38.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 662/ 720] blk.38.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 663/ 720] blk.38.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 664/ 720] blk.38.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 665/ 720] blk.38.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 666/ 720] blk.38.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 667/ 720] blk.38.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 668/ 720] blk.38.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 669/ 720] blk.38.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 670/ 720] blk.39.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 671/ 720] blk.39.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 672/ 720] blk.39.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 673/ 720] blk.39.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 674/ 720] blk.39.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 675/ 720] blk.39.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 676/ 720] blk.39.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 677/ 720] blk.39.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 678/ 720] blk.39.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 679/ 720] blk.39.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 680/ 720] blk.39.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 681/ 720] blk.39.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 682/ 720] blk.39.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 683/ 720] blk.39.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 684/ 720] blk.39.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 685/ 720] blk.39.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 686/ 720] blk.39.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 687/ 720] blk.40.attn_k.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 688/ 720] blk.40.attn_k_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 689/ 720] blk.40.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 690/ 720] blk.40.attn_output.weight - [ 2048, 2560, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 691/ 720] blk.40.attn_q.weight - [ 2560, 2048, 1, 1], type = f16, converting to q8_0 .. size = 10.00 MiB -> 5.31 MiB
[ 692/ 720] blk.40.attn_q_norm.weight - [ 256, 1, 1, 1], type = f32, size = 0.001 MiB
[ 693/ 720] blk.40.attn_v.weight - [ 2560, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB
[ 694/ 720] blk.40.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 695/ 720] blk.40.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 696/ 720] blk.40.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 697/ 720] blk.40.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 698/ 720] blk.40.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 699/ 720] blk.40.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 700/ 720] blk.40.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 701/ 720] blk.40.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 702/ 720] blk.40.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 703/ 720] blk.40.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 704/ 720] blk.41.attn_k.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 705/ 720] blk.41.attn_k_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 706/ 720] blk.41.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 707/ 720] blk.41.attn_output.weight - [ 4096, 2560, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 708/ 720] blk.41.attn_q.weight - [ 2560, 4096, 1, 1], type = f16, converting to q8_0 .. size = 20.00 MiB -> 10.62 MiB
[ 709/ 720] blk.41.attn_q_norm.weight - [ 512, 1, 1, 1], type = f32, size = 0.002 MiB
[ 710/ 720] blk.41.attn_v.weight - [ 2560, 1024, 1, 1], type = f16, converting to q8_0 .. size = 5.00 MiB -> 2.66 MiB
[ 711/ 720] blk.41.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 712/ 720] blk.41.ffn_gate.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 713/ 720] blk.41.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 714/ 720] blk.41.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB
[ 715/ 720] blk.41.inp_gate.weight - [ 2560, 256, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
[ 716/ 720] blk.41.layer_output_scale.weight - [ 1, 1, 1, 1], type = f32, size = 0.000 MiB
[ 717/ 720] blk.41.post_attention_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 718/ 720] blk.41.post_ffw_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 719/ 720] blk.41.post_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MiB
[ 720/ 720] blk.41.proj.weight - [ 256, 2560, 1, 1], type = f16, converting to q8_0 .. size = 1.25 MiB -> 0.66 MiB
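The sizes in the log above can be checked by hand. As far as I understand the q8_0 layout, weights are stored in blocks of 32, each holding one f16 scale (2 bytes) plus 32 int8 values, so 34 bytes per 32 weights, or 8.5 bits per weight against 16 for f16. Below is a minimal sketch that recomputes both sizes from the tensor shape and compares them with what the log prints. The helper names (`parse_line`, `f16_mib`, `q8_0_mib`) and the regex are my own for illustration and are not part of llama.cpp.

```python
import re

QK8_0 = 32            # weights per q8_0 block
BLOCK_BYTES = 2 + 32  # one f16 scale + 32 int8 quantized values
MIB = 2 ** 20

# Matches only the "converting to q8_0" lines of the log (hypothetical helper regex).
LINE = re.compile(
    r"- \[([\d,\s]+)\],\s*type = f16, converting to q8_0 \.\.\s*"
    r"size =\s*([\d.]+) MiB ->\s*([\d.]+) MiB"
)

def parse_line(line):
    """Return (shape, f16 MiB, q8_0 MiB) as printed in one log line, or None."""
    m = LINE.search(line)
    if m is None:
        return None
    shape = tuple(int(x) for x in m.group(1).split(","))
    return shape, float(m.group(2)), float(m.group(3))

def n_elements(shape):
    n = 1
    for d in shape:
        n *= d
    return n

def f16_mib(shape):
    """Size of the tensor at 2 bytes per weight."""
    return n_elements(shape) * 2 / MIB

def q8_0_mib(shape):
    """Size of the tensor at 34 bytes per block of 32 weights."""
    # Blocks run along the first dimension, so it must be a multiple of 32.
    assert shape[0] % QK8_0 == 0
    return n_elements(shape) // QK8_0 * BLOCK_BYTES / MIB

if __name__ == "__main__":
    sample = [
        "[ 507/ 720] blk.29.ffn_down.weight - [ 10240, 2560, 1, 1], type = f16, "
        "converting to q8_0 .. size = 50.00 MiB -> 26.56 MiB",
        "[ 517/ 720] blk.30.attn_k.weight - [ 2560, 512, 1, 1], type = f16, "
        "converting to q8_0 .. size = 2.50 MiB -> 1.33 MiB",
    ]
    for line in sample:
        shape, before, after = parse_line(line)
        print(shape, before, f16_mib(shape), after, q8_0_mib(shape))
```

Each quantized tensor shrinks to 34/64 of its f16 size, roughly 53%, which is why 50.00 MiB becomes 26.56 MiB. The norm and `layer_output_scale` tensors stay f32 in the log and are too small to matter for the total.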