2593 posts under 'Program usage'

  1. 2026.06.01 llama.cpp reasoning options
  2. 2026.06.01 torchvision model
  3. 2026.06.01 safetensors to gguf, failed for now
  4. 2026.06.01 Running STFPM
  5. 2026.05.28 antigravity 2
  6. 2026.05.27 blener ll3m
  7. 2026.05.27 openscad
  8. 2026.05.27 Deep learning training notes (epoch, loss)
  9. 2026.05.25 Switching features over to the telegram bot api
  10. 2026.05.24 Changing the python huggingface download path

Looking at the options:

/llama-b9305$ ./llama-server --help
/llama-b9305$ ./llama-cli --help

 

About four reasoning-related options show up. gemma4-e4b seems to support reasoning, since it appears to be turned on when the model is run without any option.

--reasoning-format FORMAT               controls whether thought tags are allowed and/or extracted from the
                                        response, and in which format they're returned; one of:
                                        - none: leaves thoughts unparsed in `message.content`
                                        - deepseek: puts thoughts in `message.reasoning_content`
                                        - deepseek-legacy: keeps `<think>` tags in `message.content` while
                                        also populating `message.reasoning_content`
                                        (default: auto)
                                        (env: LLAMA_ARG_THINK)
-rea,  --reasoning [on|off|auto]        Use reasoning/thinking in the chat ('on', 'off', or 'auto', default:
                                        'auto' (detect from template))
                                        (env: LLAMA_ARG_REASONING)
--reasoning-budget N                    token budget for thinking: -1 for unrestricted, 0 for immediate end,
                                        N>0 for token budget (default: -1)
                                        (env: LLAMA_ARG_THINK_BUDGET)
--reasoning-budget-message MESSAGE      message injected before the end-of-thinking tag when reasoning budget
                                        is exhausted (default: none)
                                        (env: LLAMA_ARG_THINK_BUDGET_MESSAGE)
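
To keep these options straight when scripting a launch, here is a minimal Python sketch that turns the reasoning settings into either command-line arguments or environment variables. The flag and variable names are taken from the help text above; the helper names `reasoning_args` and `reasoning_env` are hypothetical, and the sketch only builds the values without starting anything.

```python
from typing import Dict, List, Optional

# Accepted values for --reasoning, as listed in the help text.
VALID_MODES = ("on", "off", "auto")


def reasoning_args(mode: str = "auto", budget: Optional[int] = None) -> List[str]:
    """Build the reasoning-related arguments for llama-server / llama-cli."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    args = ["--reasoning", mode]
    if budget is not None:
        # Help text: -1 unrestricted, 0 immediate end, N>0 token budget.
        if budget < -1:
            raise ValueError("budget must be -1, 0, or a positive token count")
        args += ["--reasoning-budget", str(budget)]
    return args


def reasoning_env(mode: str = "auto", budget: Optional[int] = None) -> Dict[str, str]:
    """Express the same settings as the environment variables from --help."""
    reasoning_args(mode, budget)  # reuse the validation above
    env = {"LLAMA_ARG_REASONING": mode}
    if budget is not None:
        env["LLAMA_ARG_THINK_BUDGET"] = str(budget)
    return env


if __name__ == "__main__":
    print(reasoning_args("off"))
    print(reasoning_env("on", budget=512))
```

The argument list can be appended to an existing command, and the dictionary can be merged into `os.environ` before launching the server.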

 

In the CLI the option seems to work fine.

$ llama-b9305/llama-cli  --model /mnt/Downloads/model/gemma4-e2b/gemma-4-E2B-it-Q4_K_M.gguf -rea on -sm none

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b9305-63248fc3e
model      : gemma-4-E2B-it-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


> 무슨생각하고 있어?

[Start thinking]
Thinking Process:

1.  **Analyze the Request:** The user asked "무슨 생각하고 있어?" (What are you thinking?). This is an open-ended question about the model's internal state or current activity.
2.  **Determine Identity and Nature:** I am Gemma 4, a Large Language Model. I do not have consciousness, feelings, personal thoughts, or internal subjective experiences.
3.  **Identify the Core Activity:** My current "thought" process is processing the input (the user's question) and generating a relevant, coherent, and helpful response.
4.  **Formulate the Answer Strategy (Internal Monologue -> External Response):**
    *   Acknowledge the nature of the question.
    *   State clearly that I don't "think" in a human sense (no consciousness/feelings).
    *   Describe what I *am* doing (processing information, analyzing language, generating text).
    *   Pivot back to the user (ask how I can help).
5.  **Draft the Response in Korean (Focusing on natural, polite tone):**

    *   *Initial thought:* 저는 생각하지 않아요. (I don't think.)
    *   *Elaboration:* 저는 정보를 처리하고 답변을 생성하고 있어요. (I am processing information and generating answers.)
    *   *Contextualizing the current state:* 지금 사용자님의 질문을 이해하고 가장 적절한 답변을 찾고 있습니다. (Right now, I am understanding your question and finding the most appropriate answer.)
    *   *Call to action:* 무엇을 도와드릴까요? (What can I help you with?)

6.  **Final Review:** The response is accurate, polite, and directly addresses the user's intent while staying true to my identity as an LLM. (The drafted Korean response below is good.)
[End thinking]

저는 인간처럼 '생각'을 하지는 않습니다. 저는 대규모 언어 모델(Large Language Model)이기 때문에, 정보를 처리하고, 패턴을 분석하며, 사용자님의 질문에 가장 적절하고 유용한 답변을 생성하는 작업을 하고 있습니다.

지금은 사용자님의 질문을 이해하고, 그에 맞는 정보를 찾고 답변을 구성하는 과정에 집중하고 있습니다.

궁금한 것이 있거나 도움이 필요하시면 언제든지 말씀해 주세요! 😊
[ Prompt: 193.7 t/s | Generation: 94.4 t/s ]

>

Exiting...

 

$ llama-b9305/llama-cli  --model /mnt/Downloads/model/gemma4-e2b/gemma-4-E2B-it-Q4_K_M.gguf -rea off -sm none

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b9305-63248fc3e
model      : gemma-4-E2B-it-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


> 무슨생각하고 있어?

저는 Google에서 훈련한 대규모 언어 모델입니다. 😊

**지금 저는 당신의 질문에 답하고, 정보를 제공하고, 창의적인 텍스트를 생성하는 등 제가 할 수 있는 모든 작업을 수행하고 있습니다.**

혹시 **특정한 주제**에 대해 궁금하신가요? 아니면 **제가 무엇을 할 수 있는지** 더 알고 싶으신가요?

어떤 것에 대해 이야기하고 싶으신지 말씀해주시면, 그에 맞춰서 생각하고 대화할 준비가 되어 있습니다!

[ Prompt: 126.5 t/s | Generation: 95.2 t/s ]

>

Exiting...

 

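When llama-server is started from llama-swap with -rea off and I ask a question, the option does not seem to take effect: the reasoning output shows up just the same. What is going on?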
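
One way to narrow this down is to look at the raw server response instead of the chat UI. A minimal sketch, assuming the response follows the OpenAI-style chat completion shape (`choices[0].message`) and that thoughts end up either in `message.reasoning_content` or as `<think>` tags inside `message.content`, as the `--reasoning-format` help text describes. The function name `has_reasoning` is hypothetical, and the tag this particular model emits may differ.

```python
from typing import Any, Dict


def has_reasoning(response: Dict[str, Any]) -> bool:
    """Return True if a chat completion response carries reasoning output."""
    message = response["choices"][0]["message"]
    # 'deepseek' format: thoughts are extracted into reasoning_content.
    if message.get("reasoning_content"):
        return True
    # 'none' / 'deepseek-legacy' format: tags may remain in the content.
    content = message.get("content") or ""
    return "<think>" in content


if __name__ == "__main__":
    # Hand-written sample responses, not captured from a real server.
    with_thoughts = {"choices": [{"message": {"content": "hi", "reasoning_content": "step 1"}}]}
    without_thoughts = {"choices": [{"message": {"content": "hi"}}]}
    print(has_reasoning(with_thoughts), has_reasoning(without_thoughts))
```

Running the same check against a llama-server started by hand and against the one started through llama-swap would show whether the flag is reaching the server at all.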

 

Posted by 구차니

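I'm not sure whether this counts as part of PyTorch itself, but anyway,

while digging through the models in torchvision

I found QAT (quantization-aware training) support for MobileNet V3, so I'm noting it down here.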

 

[Link: https://github.com/gdwang08/STFPM]

[Link: https://github.com/gdwang08/STFPM/blob/main/main.py]

[Link: https://docs.pytorch.org/vision/main/models/mobilenetv3_quant.html#model-builders]

[Link: https://github.com/pytorch/vision/blob/main/torchvision/models/quantization/mobilenetv3.py]

Posted by 구차니

I should give this a try. By the way, if the 1080 can only handle F16, maybe converting the model myself is an option?

/llama.cpp$ ls convert_*
convert_hf_to_gguf.py  convert_hf_to_gguf_update.py  convert_llama_ggml_to_gguf.py  convert_lora_to_gguf.py

 

[Link: https://huggingface.co/google/gemma-4-E4B/tree/main]

[Link: https://velog.io/@choonsik_mom/llama.cpp로-gguf-모델-서빙하기-ul02hone]

 

+

Running the script with no arguments prints the following:

$ /home/minimonk/src/llama.cpp/convert_hf_to_gguf.py 
usage: convert_hf_to_gguf.py [-h] [--vocab-only] [--outfile OUTFILE] [--outtype {f32,f16,bf16,q8_0,tq1_0,tq2_0,auto}]
                             [--bigendian] [--use-temp-file] [--no-lazy] [--model-name MODEL_NAME] [--verbose]
                             [--split-max-tensors SPLIT_MAX_TENSORS] [--split-max-size SPLIT_MAX_SIZE] [--dry-run]
                             [--no-tensor-first-split] [--metadata METADATA] [--print-supported-models] [--remote]
                             [--mmproj] [--mistral-format] [--disable-mistral-community-chat-template]
                             [--sentence-transformers-dense-modules] [--fuse-gate-up-exps]
                             [model]
convert_hf_to_gguf.py: error: the following arguments are required: model

 

You can download the model through Hugging Face, or just fetch the minimum set of files, the three listed below.

/mnt/Downloads/model/gemma4-e4b/google$ ls
config.json  model.safetensors  tokenizer.json

 

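The conversion does not use the GPU and apparently loads everything into memory at once; I ran out of memory and had to reboot once.. ugh

On top of that, with no option given it goes ahead and picks BF16 on its own (!), but my hardware gets no BF16 acceleration..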

$ /home/minimonk/src/llama.cpp/convert_hf_to_gguf.py ./google/ --outfile gguf.gguf
INFO:hf-to-gguf:Loading model: google
INFO:hf-to-gguf:Model architecture: Gemma4ForConditionalGeneration
INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'
INFO:hf-to-gguf:heuristics detected bfloat16 tensor dtype, setting --outtype bf16
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...

Writing:   9%|██████▎                                                                | 1.34G/15.0G [00:20<01:16, 180Mbyte/s]

Writing: 100%|███████████████████████████████████████████████████████████████████████| 15.0G/15.0G [01:54<00:00, 131Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to gguf.gguf
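
Since the script falls back to BF16 when no type is given, the output type can be requested explicitly with `--outtype`, which appears in the usage message above. A minimal Python sketch that builds such a command; the helper name `convert_cmd` is hypothetical, the paths are the ones used in this post, and whether F16 is a good fit for this model was not checked here.

```python
import sys
from typing import List

# Choices copied from the --outtype list in the usage message above.
OUTTYPES = ("f32", "f16", "bf16", "q8_0", "tq1_0", "tq2_0", "auto")


def convert_cmd(script: str, model_dir: str, outfile: str, outtype: str = "f16") -> List[str]:
    """Build a convert_hf_to_gguf.py command line with an explicit output type."""
    if outtype not in OUTTYPES:
        raise ValueError(f"outtype must be one of {OUTTYPES}, got {outtype!r}")
    # Run the script with the current Python interpreter.
    return [sys.executable, script, model_dir, "--outfile", outfile, "--outtype", outtype]


if __name__ == "__main__":
    cmd = convert_cmd(
        "/home/minimonk/src/llama.cpp/convert_hf_to_gguf.py",
        "./google/",
        "gguf-f16.gguf",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to start the conversion.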

 

Log printed between "Exporting model..." and "Writing":


INFO:hf-to-gguf:rope_freqs.weight,                 torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:token_embd.weight,                 torch.bfloat16 --> BF16, shape = {2560, 262144}
INFO:hf-to-gguf:per_layer_token_embd.weight,       torch.bfloat16 --> BF16, shape = {10752, 262144}
INFO:hf-to-gguf:blk.0.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.0.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.0.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.0.inp_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.0.proj.weight,                 torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.0.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_k.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.0.attn_output.weight,          torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.0.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_q.weight,               torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.0.attn_v.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.1.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.1.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.1.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.1.inp_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.1.proj.weight,                 torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.1.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_k.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.1.attn_output.weight,          torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.1.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_q.weight,               torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.1.attn_v.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.10.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.10.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.10.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.10.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.10.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.10.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.10.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.10.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.10.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.10.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.11.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.11.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.11.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.11.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.11.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.11.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.11.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight,         torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.11.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.11.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.12.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.12.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.12.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.12.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.12.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.12.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.12.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.12.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.12.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.12.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.13.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.13.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.13.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.13.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.13.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.13.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.13.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.13.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.13.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.13.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.14.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.14.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.14.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.14.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.14.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.14.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.14.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.14.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.14.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.14.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.15.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.15.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.15.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.15.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.15.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.15.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.15.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.15.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.15.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.15.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.16.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.16.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.16.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.16.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.16.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.16.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.16.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.16.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.16.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.16.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.17.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.17.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.17.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.17.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.17.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.17.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.17.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight,         torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.17.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.17.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.18.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.18.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.18.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.18.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.18.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.18.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.18.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.18.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.18.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.18.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.19.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.19.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.19.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.19.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.19.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.19.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.19.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.19.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.19.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.19.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.2.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.2.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.2.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.2.inp_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.2.proj.weight,                 torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.2.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_k.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.2.attn_output.weight,          torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.2.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_q.weight,               torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.2.attn_v.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.20.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.20.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.20.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.20.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.20.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.20.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.20.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.20.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.20.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.20.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.21.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.21.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.21.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.21.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.21.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.21.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.21.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.21.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.21.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.21.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.22.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.22.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.22.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.22.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.22.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.22.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.22.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.22.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.22.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.22.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.23.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.23.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.23.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.23.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.23.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.23.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.23.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight,         torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.23.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.23.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.24.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.24.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.24.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.24.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.24.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.24.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.24.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.24.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.24.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.24.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.25.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.25.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.25.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.25.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.25.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.25.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.25.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.25.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.25.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.25.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.26.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.26.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.26.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.26.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.26.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.26.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.26.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.26.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.26.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.26.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.27.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.27.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.27.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.27.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.27.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.27.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.27.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.27.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.27.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.27.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.28.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.28.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.28.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.28.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.28.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.28.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.28.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.28.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.28.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.28.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.28.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.28.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.29.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.29.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.29.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.29.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.29.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.29.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.29.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.29.attn_output.weight,         torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.29.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.29.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.3.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.3.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.3.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.3.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.3.inp_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.3.proj.weight,                 torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.3.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_k.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.3.attn_output.weight,          torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.3.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_q.weight,               torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.3.attn_v.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.30.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.30.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.30.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.30.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.30.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.30.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.30.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.30.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.30.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.30.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.30.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.30.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.31.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.31.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.31.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.31.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.31.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.31.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.31.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.31.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.31.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.31.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.31.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.31.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.32.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.32.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.32.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.32.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.32.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.32.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.32.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.32.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.32.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.32.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.32.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.32.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.33.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.33.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.33.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.33.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.33.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.33.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.33.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.33.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.33.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.33.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.33.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.33.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.34.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.34.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.34.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.34.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.34.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.34.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.34.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.34.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.34.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.34.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.34.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.34.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.35.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.35.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.35.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.35.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.35.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.35.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.35.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.35.attn_output.weight,         torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.35.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.35.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.36.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.36.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.36.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.36.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.36.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.36.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.36.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.36.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.36.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.36.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.36.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.36.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.37.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.37.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.37.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.37.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.37.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.37.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.37.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.37.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.37.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.37.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.37.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.37.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.38.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.38.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.38.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.38.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.38.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.38.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.38.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.38.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.38.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.38.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.38.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.38.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.39.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.39.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.39.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.39.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.39.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.39.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.39.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.39.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.39.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.39.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.39.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.39.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.4.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.4.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.4.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.4.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.4.inp_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.4.proj.weight,                 torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.4.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_k.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.4.attn_output.weight,          torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.4.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_q.weight,               torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.4.attn_v.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.40.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.40.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.40.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.40.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.40.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.40.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.40.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.40.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.40.attn_output.weight,         torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.40.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.40.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.40.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.41.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.41.ffn_down.weight,            torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.41.ffn_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.41.ffn_up.weight,              torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.41.inp_gate.weight,            torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.41.proj.weight,                torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.41.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_k.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.41.attn_output.weight,         torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.41.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_q.weight,              torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.41.attn_v.weight,              torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.5.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.5.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.5.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.5.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.5.inp_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.5.proj.weight,                 torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.5.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_k.weight,               torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight,          torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.5.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_q.weight,               torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.5.attn_v.weight,               torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.6.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.6.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.6.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.6.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.6.inp_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.6.proj.weight,                 torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.6.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_k.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.6.attn_output.weight,          torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.6.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_q.weight,               torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.6.attn_v.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.7.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.7.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.7.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.7.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.7.inp_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.7.proj.weight,                 torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.7.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_k.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.7.attn_output.weight,          torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.7.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_q.weight,               torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.7.attn_v.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.8.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.8.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.8.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.8.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.8.inp_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.8.proj.weight,                 torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.8.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_k.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.8.attn_output.weight,          torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.8.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_q.weight,               torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.8.attn_v.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.9.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.9.ffn_down.weight,             torch.bfloat16 --> BF16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.9.ffn_up.weight,               torch.bfloat16 --> BF16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.9.inp_gate.weight,             torch.bfloat16 --> BF16, shape = {2560, 256}
INFO:hf-to-gguf:blk.9.proj.weight,                 torch.bfloat16 --> BF16, shape = {256, 2560}
INFO:hf-to-gguf:blk.9.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_k.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:blk.9.attn_output.weight,          torch.bfloat16 --> BF16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.9.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_q.weight,               torch.bfloat16 --> BF16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.9.attn_v.weight,               torch.bfloat16 --> BF16, shape = {2560, 512}
INFO:hf-to-gguf:output_norm.weight,                torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:per_layer_model_proj.weight,       torch.bfloat16 --> BF16, shape = {2560, 10752}
INFO:hf-to-gguf:per_layer_proj_norm.weight,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 2560
INFO:hf-to-gguf:gguf: feed forward length = 10240
INFO:hf-to-gguf:gguf: head count = 8
INFO:hf-to-gguf:gguf: key-value head count = 2
WARNING:hf-to-gguf:Unknown RoPE type: proportional
INFO:hf-to-gguf:gguf: rope scaling type = NONE
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rope theta swa = 10000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: file type = 32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.context_length', overwriting it with new value 131072 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.head_count', overwriting it with new value 8 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.layer_norm_rms_epsilon', overwriting it with new value 1e-06 of type FLOAT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.key_length', overwriting it with new value 256 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.value_length', overwriting it with new value 256 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.rope.freq_base', overwriting it with new value 1000000.0 of type FLOAT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.head_count_kv', overwriting it with new value 2 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.key_length', overwriting it with new value 512 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.value_length', overwriting it with new value 512 of type UINT32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:hf-to-gguf:Token '<|tool_call>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<tool_call|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|tool_response>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<tool_response|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|"|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|channel>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<channel|>' is set to USER_DEFINED
WARNING:gguf.vocab:Unknown separator token '<bos>' in TemplateProcessing<pair>
INFO:gguf.vocab:Adding 514906 merge(s).
INFO:gguf.vocab:Setting special token type bos to 2
INFO:gguf.vocab:Setting special token type eos to 1
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_sep_token to False
WARNING:gguf.gguf_writer:Duplicated key name 'tokenizer.ggml.add_bos_token', overwriting it with new value True of type BOOL
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:gguf.gguf: n_tensors = 720, total_size = 15.0G
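
The tensor lines in the log above can be tallied per output type to see where the 15.0G goes. Below is a rough sketch, assuming the log format shown here (name, source dtype, output type, shape). The byte widths (4 for F32, 2 for F16 and BF16) are the standard ones; `tally_log` and `sample_log` are made-up names for illustration, and the sample only contains the last three tensor lines of the log, not the whole file.

```python
import re
from collections import defaultdict
from math import prod

# Bytes per element for the output types that appear in this log.
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2, "BF16": 2}

# Matches lines such as:
# INFO:hf-to-gguf:output_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
LINE_RE = re.compile(
    r"INFO:hf-to-gguf:(\S+),\s+torch\.(\w+) --> (\w+), shape = \{([\d, ]+)\}"
)


def tally_log(log_text):
    """Sum the tensor bytes per output type found in converter log text."""
    totals = defaultdict(int)
    for match in LINE_RE.finditer(log_text):
        _name, _src_dtype, out_type, shape_text = match.groups()
        if out_type not in BYTES_PER_ELEMENT:
            continue  # skip types this sketch does not know about
        dims = [int(d) for d in shape_text.split(",")]
        totals[out_type] += prod(dims) * BYTES_PER_ELEMENT[out_type]
    return dict(totals)


# Hypothetical sample: three lines copied from the end of the log above.
sample_log = """\
INFO:hf-to-gguf:output_norm.weight,                torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:per_layer_model_proj.weight,       torch.bfloat16 --> BF16, shape = {2560, 10752}
INFO:hf-to-gguf:per_layer_proj_norm.weight,        torch.bfloat16 --> F32, shape = {256}
"""

totals = tally_log(sample_log)
print(totals)
```

Feeding it the full saved log instead of `sample_log` would give the per-type totals for the whole model.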

 

Both files came out at almost the same size. The GGUF did not even shrink by 1 GB compared to the safetensors file.

drwxrwxr-x 2 minimonk minimonk        4096  6월  1 13:52 ./
drwxrwxr-x 3 minimonk minimonk        4096  6월  1 13:52 ../
-rw-rw-r-- 1 minimonk minimonk        5105  6월  1 13:51 config.json
-rw-rw-r-- 1 minimonk minimonk 15992595884  6월  1 13:03 model.safetensors
-rw-rw-r-- 1 minimonk minimonk    32170070  6월  1 13:51 tokenizer.json
-rw-rw-r-- 1 minimonk minimonk 15053078208  6월  1 14:01 gguf.gguf
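
The difference between the two files in the listing above is easy to check by hand. A small sketch using only the two byte counts from `ls`; the variable names are mine.

```python
# Byte counts copied from the directory listing above.
SAFETENSORS_BYTES = 15_992_595_884  # model.safetensors
GGUF_BYTES = 15_053_078_208         # gguf.gguf

saved = SAFETENSORS_BYTES - GGUF_BYTES
saved_gb = saved / 1e9                       # decimal gigabytes
saved_pct = 100 * saved / SAFETENSORS_BYTES  # share of the original size

print(f"saved {saved} bytes = {saved_gb:.2f} GB ({saved_pct:.1f}%)")
# prints: saved 939517676 bytes = 0.94 GB (5.9%)
```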

 

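So I converted with F16 instead, and the BF16 and F16 files are identical in size, down to the byte! That makes sense in hindsight, since both formats store 16 bits per weight.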

-rw-rw-r-- 1 minimonk minimonk 15053078208  6월  1 14:05 gguf.gguf
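
Why the size does not move: a tensor's size is its element count times the bytes per element, and F16 and BF16 both use 2 bytes. A minimal sketch, using the `token_embd.weight` shape `{2560, 262144}` from the converter log; `tensor_bytes` is a hypothetical helper, not something from llama.cpp.

```python
from math import prod

# Standard element widths: F32 is 4 bytes, F16 and BF16 are both 2 bytes.
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2, "BF16": 2}


def tensor_bytes(shape, out_type):
    """Raw data size of one tensor: number of elements times element width."""
    return prod(shape) * BYTES_PER_ELEMENT[out_type]


TOKEN_EMBD_SHAPE = (2560, 262144)  # shape of token_embd.weight in the log

f16_size = tensor_bytes(TOKEN_EMBD_SHAPE, "F16")
bf16_size = tensor_bytes(TOKEN_EMBD_SHAPE, "BF16")
print(f16_size, bf16_size)
# prints: 1342177280 1342177280
```

The norm tensors are written as F32 in both runs, so they do not change the total either.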

 

For reference, the output type is set with the --outtype option.

$ /home/minimonk/src/llama.cpp/convert_hf_to_gguf.py ./google/ --outfile gguf.gguf --outtype f16
INFO:hf-to-gguf:Loading model: google
INFO:hf-to-gguf:Model architecture: Gemma4ForConditionalGeneration
INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...

Writing:   9%|██████▎                                                                | 1.34G/15.0G [00:02<00:27, 503Mbyte/s]

Writing: 100%|███████████████████████████████████████████████████████████████████████| 15.0G/15.0G [00:49<00:00, 305Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to gguf.gguf
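
Both of my runs wrote to the same gguf.gguf, so the second one overwrote the first. To keep the outputs side by side, something like the sketch below could build one command per output type. The script path and the `--outfile` / `--outtype` flags are the ones from the command above; the `gguf-<type>.gguf` naming and the `build_command` helper are made up, and passing `bf16` as a value is my assumption based on the BF16 output of the first run.

```python
import shlex

# Path taken from the command above.
CONVERTER = "/home/minimonk/src/llama.cpp/convert_hf_to_gguf.py"


def build_command(model_dir, outtype):
    """Build the converter command line for one output type."""
    outfile = f"gguf-{outtype}.gguf"  # hypothetical naming scheme
    return [CONVERTER, model_dir, "--outfile", outfile, "--outtype", outtype]


# "f16" is from the post; "bf16" is an assumed value for the first run.
commands = [build_command("./google/", t) for t in ("f16", "bf16")]

for cmd in commands:
    # Only prints the command; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

After running them, `os.path.getsize` on each output file would give the sizes to compare.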

 

Log output between "Exporting model..." and "Writing":


INFO:hf-to-gguf:rope_freqs.weight,                 torch.float32 --> F32, shape = {256}
INFO:hf-to-gguf:token_embd.weight,                 torch.bfloat16 --> F16, shape = {2560, 262144}
INFO:hf-to-gguf:per_layer_token_embd.weight,       torch.bfloat16 --> F16, shape = {10752, 262144}
INFO:hf-to-gguf:blk.0.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.0.ffn_down.weight,             torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.0.ffn_up.weight,               torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.0.inp_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.0.proj.weight,                 torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.0.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_k.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.0.attn_output.weight,          torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.0.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_q.weight,               torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.0.attn_v.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.1.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.1.ffn_down.weight,             torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.1.ffn_up.weight,               torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.1.inp_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.1.proj.weight,                 torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.1.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_k.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.1.attn_output.weight,          torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.1.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.1.attn_q.weight,               torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.1.attn_v.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.10.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.10.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.10.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.10.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.10.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.10.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.10.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.10.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.10.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.10.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.10.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.11.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.11.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.11.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.11.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.11.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.11.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.11.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight,         torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.11.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.11.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.12.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.12.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.12.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.12.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.12.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.12.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.12.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.12.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.12.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.12.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.12.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.13.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.13.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.13.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.13.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.13.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.13.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.13.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.13.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.13.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.13.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.13.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.14.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.14.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.14.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.14.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.14.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.14.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.14.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.14.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.14.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.14.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.14.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.15.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.15.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.15.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.15.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.15.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.15.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.15.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.15.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.15.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.15.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.15.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.16.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.16.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.16.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.16.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.16.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.16.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.16.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.16.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.16.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.16.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.16.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.17.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.17.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.17.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.17.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.17.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.17.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.17.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight,         torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.17.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.17.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.18.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.18.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.18.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.18.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.18.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.18.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.18.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.18.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.18.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.18.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.18.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.19.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.19.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.19.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.19.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.19.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.19.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.19.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.19.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.19.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.19.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.19.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.2.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.2.ffn_down.weight,             torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.2.ffn_up.weight,               torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.2.inp_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.2.proj.weight,                 torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.2.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_k.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.2.attn_output.weight,          torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.2.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.2.attn_q.weight,               torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.2.attn_v.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.20.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.20.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.20.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.20.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.20.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.20.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.20.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.20.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.20.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.20.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.20.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.21.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.21.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.21.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.21.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.21.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.21.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.21.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.21.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.21.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.21.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.21.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.22.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.22.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.22.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.22.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.22.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.22.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.22.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.22.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.22.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.22.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.22.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.23.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.23.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.23.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.23.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.23.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.23.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.23.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight,         torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.23.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.23.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.24.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.24.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.24.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.24.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.24.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.24.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.24.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.24.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.24.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.24.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.24.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.25.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.25.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.25.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.25.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.25.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.25.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.25.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.25.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.25.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.25.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.25.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.26.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.26.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.26.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.26.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.26.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.26.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.26.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.26.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.26.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.26.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.26.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.27.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.27.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.27.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.27.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.27.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.27.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.27.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.27.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.27.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.27.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.27.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.28.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.28.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.28.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.28.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.28.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.28.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.28.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.28.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.28.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.28.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.28.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.28.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.29.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.29.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.29.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.29.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.29.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.29.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.29.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.29.attn_output.weight,         torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.29.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.29.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.3.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.3.ffn_down.weight,             torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.3.ffn_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.3.ffn_up.weight,               torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.3.inp_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.3.proj.weight,                 torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.3.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_k.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.3.attn_output.weight,          torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.3.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.3.attn_q.weight,               torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.3.attn_v.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.30.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.30.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.30.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.30.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.30.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.30.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.30.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.30.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.30.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.30.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.30.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.30.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.31.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.31.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.31.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.31.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.31.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.31.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.31.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.31.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.31.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.31.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.31.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.31.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.32.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.32.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.32.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.32.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.32.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.32.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.32.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.32.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.32.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.32.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.32.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.32.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.33.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.33.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.33.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.33.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.33.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.33.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.33.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.33.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.33.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.33.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.33.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.33.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.34.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.34.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.34.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.34.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.34.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.34.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.34.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.34.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.34.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.34.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.34.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.34.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.35.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.35.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.35.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.35.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.35.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.35.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.35.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.35.attn_output.weight,         torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.35.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.35.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.36.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.36.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.36.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.36.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.36.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.36.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.36.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.36.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.36.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.36.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.36.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.36.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.36.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.37.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.37.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.37.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.37.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.37.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.37.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.37.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.37.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.37.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.37.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.37.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.37.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.37.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.38.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.38.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.38.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.38.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.38.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.38.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.38.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.38.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.38.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.38.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.38.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.38.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.38.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.39.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.39.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.39.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.39.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.39.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.39.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.39.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.39.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.39.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.39.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.39.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.39.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.39.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.4.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.4.ffn_down.weight,             torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.4.ffn_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.4.ffn_up.weight,               torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.4.inp_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.4.proj.weight,                 torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.4.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_k.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.4.attn_output.weight,          torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.4.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.4.attn_q.weight,               torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.4.attn_v.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.40.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.40.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.40.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.40.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.40.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.40.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.40.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.40.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.40.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.40.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.40.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.40.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.40.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.41.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.layer_output_scale.weight,  torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.41.ffn_down.weight,            torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.41.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.41.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.41.inp_gate.weight,            torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.41.proj.weight,                torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.41.post_attention_norm.weight, torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.post_ffw_norm.weight,       torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.post_norm.weight,           torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.41.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_k.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.41.attn_output.weight,         torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.41.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_q.weight,              torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.41.attn_v.weight,              torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.5.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.5.ffn_down.weight,             torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.5.ffn_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.5.ffn_up.weight,               torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.5.inp_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.5.proj.weight,                 torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.5.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_k.weight,               torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight,          torch.bfloat16 --> F16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.5.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_q.weight,               torch.bfloat16 --> F16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.5.attn_v.weight,               torch.bfloat16 --> F16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.6.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.6.ffn_down.weight,             torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.6.ffn_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.6.ffn_up.weight,               torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.6.inp_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.6.proj.weight,                 torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.6.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_k.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.6.attn_output.weight,          torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.6.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.6.attn_q.weight,               torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.6.attn_v.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.7.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.7.ffn_down.weight,             torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.7.ffn_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.7.ffn_up.weight,               torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.7.inp_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.7.proj.weight,                 torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.7.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_k.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.7.attn_output.weight,          torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.7.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.7.attn_q.weight,               torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.7.attn_v.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.8.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.8.ffn_down.weight,             torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.8.ffn_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.8.ffn_up.weight,               torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.8.inp_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.8.proj.weight,                 torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.8.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_k.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.8.attn_output.weight,          torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.8.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.8.attn_q.weight,               torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.8.attn_v.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.9.attn_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.layer_output_scale.weight,   torch.bfloat16 --> F32, shape = {1}
INFO:hf-to-gguf:blk.9.ffn_down.weight,             torch.bfloat16 --> F16, shape = {10240, 2560}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.9.ffn_up.weight,               torch.bfloat16 --> F16, shape = {2560, 10240}
INFO:hf-to-gguf:blk.9.inp_gate.weight,             torch.bfloat16 --> F16, shape = {2560, 256}
INFO:hf-to-gguf:blk.9.proj.weight,                 torch.bfloat16 --> F16, shape = {256, 2560}
INFO:hf-to-gguf:blk.9.post_attention_norm.weight,  torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.post_ffw_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.post_norm.weight,            torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,             torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.attn_k_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_k.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:blk.9.attn_output.weight,          torch.bfloat16 --> F16, shape = {2048, 2560}
INFO:hf-to-gguf:blk.9.attn_q_norm.weight,          torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.9.attn_q.weight,               torch.bfloat16 --> F16, shape = {2560, 2048}
INFO:hf-to-gguf:blk.9.attn_v.weight,               torch.bfloat16 --> F16, shape = {2560, 512}
INFO:hf-to-gguf:output_norm.weight,                torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:per_layer_model_proj.weight,       torch.bfloat16 --> F16, shape = {2560, 10752}
INFO:hf-to-gguf:per_layer_proj_norm.weight,        torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 2560
INFO:hf-to-gguf:gguf: feed forward length = 10240
INFO:hf-to-gguf:gguf: head count = 8
INFO:hf-to-gguf:gguf: key-value head count = 2
WARNING:hf-to-gguf:Unknown RoPE type: proportional
INFO:hf-to-gguf:gguf: rope scaling type = NONE
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rope theta swa = 10000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: file type = 1
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.context_length', overwriting it with new value 131072 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.head_count', overwriting it with new value 8 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.layer_norm_rms_epsilon', overwriting it with new value 1e-06 of type FLOAT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.key_length', overwriting it with new value 256 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.value_length', overwriting it with new value 256 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.rope.freq_base', overwriting it with new value 1000000.0 of type FLOAT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.head_count_kv', overwriting it with new value 2 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.key_length', overwriting it with new value 512 of type UINT32
WARNING:gguf.gguf_writer:Duplicated key name 'gemma4.attention.value_length', overwriting it with new value 512 of type UINT32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:hf-to-gguf:Token '<|tool_call>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<tool_call|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|tool_response>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<tool_response|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|"|>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<|channel>' is set to USER_DEFINED
INFO:hf-to-gguf:Token '<channel|>' is set to USER_DEFINED
WARNING:gguf.vocab:Unknown separator token '' in TemplateProcessing
INFO:gguf.vocab:Adding 514906 merge(s).
INFO:gguf.vocab:Setting special token type bos to 2
INFO:gguf.vocab:Setting special token type eos to 1
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_sep_token to False
WARNING:gguf.gguf_writer:Duplicated key name 'tokenizer.ggml.add_bos_token', overwriting it with new value True of type BOOL
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:gguf.gguf: n_tensors = 720, total_size = 15.0G
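
The `shape = {...}` entries in the log above fix each tensor's size on disk: element count times 2 bytes for F16, or 4 bytes for F32. Here is a minimal sketch that parses lines in this log format. The regex and the `tensor_bytes` helper are my own illustration and are not part of `hf-to-gguf`.

```python
import re
from math import prod

# Bytes per element for the two output types that appear in the log above.
BYTES_PER_ELEMENT = {"F16": 2, "F32": 4}

# Hypothetical pattern, written only for the log line format shown above.
LINE_RE = re.compile(
    r"INFO:hf-to-gguf:(?P<name>[\w.]+),\s+\S+ --> (?P<dtype>\w+), "
    r"shape = \{(?P<shape>[\d, ]+)\}"
)

def tensor_bytes(line):
    """Return (tensor name, size in bytes) for one tensor log line, else None."""
    m = LINE_RE.search(line)
    if m is None or m.group("dtype") not in BYTES_PER_ELEMENT:
        return None
    dims = [int(d) for d in m.group("shape").split(",")]
    return m.group("name"), prod(dims) * BYTES_PER_ELEMENT[m.group("dtype")]

# Three lines copied from the log above.
sample = [
    "INFO:hf-to-gguf:blk.9.ffn_down.weight,             torch.bfloat16 --> F16, shape = {10240, 2560}",
    "INFO:hf-to-gguf:per_layer_model_proj.weight,       torch.bfloat16 --> F16, shape = {2560, 10752}",
    "INFO:hf-to-gguf:output_norm.weight,                torch.bfloat16 --> F32, shape = {2560}",
]
for line in sample:
    name, size = tensor_bytes(line)
    print(f"{name}: {size / 2**20:.3f} MiB")
```

For `per_layer_model_proj.weight` this comes out to 52.5 MiB, the same size `llama-quantize` prints for that tensor further down.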

 

Converted with F16 as the output type, it takes up about 14 GB in total.

 

Running it with llama.cpp, the model misbehaves. Why is it doing that T_T

 

+

$ /mnt/Downloads/llama-b9305/llama-quantize 
usage: /mnt/Downloads/llama-b9305/llama-quantize [--help] [--allow-requantize] [--leave-output-tensor] [--pure] [--imatrix] [--include-weights]
       [--exclude-weights] [--output-tensor-type] [--token-embedding-type] [--tensor-type] [--tensor-type-file]
       [--prune-layers] [--keep-split] [--override-kv] [--dry-run]
       model-f32.gguf [model-quant.gguf] type [nthreads]

  --allow-requantize
                                      allow requantizing tensors that have already been quantized
                                      WARNING: this can severely reduce quality compared to quantizing
                                               from 16bit or 32bit!
  --leave-output-tensor
                                      leave output.weight un(re)quantized
                                      increases model size but may also increase quality, especially when requantizing
  --pure
                                      disable k-quant mixtures and quantize all tensors to the same type
  --imatrix file_name
                                      use data in file_name as importance matrix for quant optimizations
  --include-weights tensor_name
                                      use importance matrix for this/these tensor(s)
  --exclude-weights tensor_name
                                      do not use importance matrix for this/these tensor(s)
  --output-tensor-type ggml_type
                                      use this ggml_type for the output.weight tensor
  --token-embedding-type ggml_type
                                      use this ggml_type for the token embeddings tensor
  --tensor-type tensor_name=ggml_type
                                      quantize this tensor to this ggml_type
                                      this is an advanced option to selectively quantize tensors. may be specified multiple times.
                                      example: --tensor-type attn_q=q8_0
  --tensor-type-file tensor_types.txt
                                      list of tensors to quantize to a specific ggml_type
                                      this is an advanced option to selectively quantize a long list of tensors.
                                      the file should use the same format as above, separated by spaces or newlines.
  --prune-layers L0,L1,L2...
                                      comma-separated list of layer numbers to prune from the model
                                      WARNING: this is an advanced option, use with care.
  --keep-split
                                      generate quantized model in the same shards as input
  --override-kv KEY=TYPE:VALUE
                                      override model metadata by key in the quantized model. may be specified multiple times.
                                      WARNING: this is an advanced option, use with care.
  --dry-run
                                      calculate and show the final quantization size without performing quantization
                                      example: llama-quantize --dry-run model-f32.gguf Q4_K

note: --include-weights and --exclude-weights cannot be used together

-----------------------------------------------------------------------------
 allowed quantization types
-----------------------------------------------------------------------------

  40  or  Q1_0    :  1.125 bpw quantization
   2  or  Q4_0    :  4.34G, +0.4685 ppl @ Llama-3-8B
   3  or  Q4_1    :  4.78G, +0.4511 ppl @ Llama-3-8B
  38  or  MXFP4_MOE :  MXFP4 MoE
   8  or  Q5_0    :  5.21G, +0.1316 ppl @ Llama-3-8B
   9  or  Q5_1    :  5.65G, +0.1062 ppl @ Llama-3-8B
  19  or  IQ2_XXS :  2.06 bpw quantization
  20  or  IQ2_XS  :  2.31 bpw quantization
  28  or  IQ2_S   :  2.5  bpw quantization
  29  or  IQ2_M   :  2.7  bpw quantization
  24  or  IQ1_S   :  1.56 bpw quantization
  31  or  IQ1_M   :  1.75 bpw quantization
  36  or  TQ1_0   :  1.69 bpw ternarization
  37  or  TQ2_0   :  2.06 bpw ternarization
  10  or  Q2_K    :  2.96G, +3.5199 ppl @ Llama-3-8B
  21  or  Q2_K_S  :  2.96G, +3.1836 ppl @ Llama-3-8B
  23  or  IQ3_XXS :  3.06 bpw quantization
  26  or  IQ3_S   :  3.44 bpw quantization
  27  or  IQ3_M   :  3.66 bpw quantization mix
  12  or  Q3_K    : alias for Q3_K_M
  22  or  IQ3_XS  :  3.3 bpw quantization
  11  or  Q3_K_S  :  3.41G, +1.6321 ppl @ Llama-3-8B
  12  or  Q3_K_M  :  3.74G, +0.6569 ppl @ Llama-3-8B
  13  or  Q3_K_L  :  4.03G, +0.5562 ppl @ Llama-3-8B
  25  or  IQ4_NL  :  4.50 bpw non-linear quantization
  30  or  IQ4_XS  :  4.25 bpw non-linear quantization
  15  or  Q4_K    : alias for Q4_K_M
  14  or  Q4_K_S  :  4.37G, +0.2689 ppl @ Llama-3-8B
  15  or  Q4_K_M  :  4.58G, +0.1754 ppl @ Llama-3-8B
  17  or  Q5_K    : alias for Q5_K_M
  16  or  Q5_K_S  :  5.21G, +0.1049 ppl @ Llama-3-8B
  17  or  Q5_K_M  :  5.33G, +0.0569 ppl @ Llama-3-8B
  18  or  Q6_K    :  6.14G, +0.0217 ppl @ Llama-3-8B
   7  or  Q8_0    :  7.96G, +0.0026 ppl @ Llama-3-8B
   1  or  F16     : 14.00G, +0.0020 ppl @ Mistral-7B
  32  or  BF16    : 14.00G, -0.0050 ppl @ Mistral-7B
   0  or  F32     : 26.00G              @ 7B
          COPY    : only copy tensors, no quantizing 
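
The usage line above takes the positional arguments in the order input file, optional output file, type, optional thread count. Here is a small sketch that assembles such a command line. The `quantize_argv` helper is hypothetical, and it uses only the positional arguments and the `--dry-run` flag listed in the help text.

```python
def quantize_argv(binary, src, qtype, dst=None, nthreads=None, dry_run=False):
    """Build an argument list that follows the usage line printed above."""
    argv = [binary]
    if dry_run:
        argv.append("--dry-run")      # size estimate only, per the help text
    argv.append(src)
    if dst is not None:
        argv.append(dst)              # the output file is optional in the usage line
    argv.append(qtype)
    if nthreads is not None:
        argv.append(str(nthreads))
    return argv

BIN = "/mnt/Downloads/llama-b9305/llama-quantize"
real_run = quantize_argv(BIN, "./gguf.gguf", "Q8_0", dst="./Q8.gguf")
size_check = quantize_argv(BIN, "./gguf.gguf", "Q4_K_M", dry_run=True)

print(" ".join(real_run))
print(" ".join(size_check))
```

Either list can be handed to `subprocess.run(argv, check=True)` to actually run the tool.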

 

Going from F16 to Q8_0 cuts the size roughly in half!

-rw-rw-r-- 1 minimonk minimonk  4977169088  4월 29 10:43 gemma-4-E4B-it-Q4_K_M.gguf

-rw-rw-r-- 1 minimonk minimonk 15053078208  6월  1 14:24 gguf.gguf
-rw-rw-r-- 1 minimonk minimonk  5335273152  6월  1 14:32 Q4_K_M.gguf
-rw-rw-r-- 1 minimonk minimonk  8031223488  6월  1 14:27 Q8.gguf
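
A quick check of the file sizes in the listing against what Q8_0 should give in theory. As far as I know, ggml's Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, so treat that block layout as my assumption. The byte counts are copied from the listing.

```python
# File sizes in bytes, copied from the listing above.
sizes = {
    "gguf.gguf (F16)": 15053078208,
    "Q8.gguf (Q8_0)": 8031223488,
    "Q4_K_M.gguf": 5335273152,
}

# Assumed Q8_0 block layout: 32 int8 weights plus one 2-byte fp16 scale.
Q8_0_BLOCK_WEIGHTS = 32
Q8_0_BLOCK_BYTES = 32 * 1 + 2
q8_0_bpw = Q8_0_BLOCK_BYTES * 8 / Q8_0_BLOCK_WEIGHTS   # bits per weight
ideal_ratio = q8_0_bpw / 16                             # relative to 16-bit weights

f16 = sizes["gguf.gguf (F16)"]
ratios = {name: size / f16 for name, size in sizes.items()}

for name, size in sizes.items():
    print(f"{name}: {size / 2**30:.2f} GiB, {ratios[name]:.1%} of the F16 file")
print(f"Q8_0 block: {q8_0_bpw} bits per weight, ideal ratio {ideal_ratio:.1%}")
```

The measured Q8_0 ratio lands slightly above the ideal one, which is expected since the F32 norm tensors and the file metadata are not shrunk.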

 

$ /mnt/Downloads/llama-b9305/llama-quantize  ./gguf.gguf ./Q8.gguf Q8_0
load_backend: loaded RPC backend from /mnt/Downloads/llama-b9305/libggml-rpc.so
ggml_vulkan: Found 3 Vulkan devices:
ggml_vulkan: 0 = Intel(R) UHD Graphics 630 (CFL GT2) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce GTX 1080 Ti (NVIDIA) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
ggml_vulkan: 2 = NVIDIA GeForce GTX 1080 Ti (NVIDIA) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from /mnt/Downloads/llama-b9305/libggml-vulkan.so
load_backend: loaded CPU backend from /mnt/Downloads/llama-b9305/libggml-cpu-haswell.so
llama_print_build_info: build = 9305 (63248fc3e)
llama_print_build_info: built with GNU 11.4.0 for Linux x86_64
llama_quantize: quantizing './gguf.gguf' to './Q8.gguf' as Q8_0
llama_model_loader: loaded meta data with 37 key-value pairs and 720 tensors from ./gguf.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma4
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Google
llama_model_loader: - kv   3:                         general.size_label str              = 7.5B
llama_model_loader: - kv   4:                         gemma4.block_count u32              = 42
llama_model_loader: - kv   5:                      gemma4.context_length u32              = 131072
llama_model_loader: - kv   6:                    gemma4.embedding_length u32              = 2560
llama_model_loader: - kv   7:                 gemma4.feed_forward_length u32              = 10240
llama_model_loader: - kv   8:                gemma4.attention.head_count u32              = 8
llama_model_loader: - kv   9:             gemma4.attention.head_count_kv u32              = 2
llama_model_loader: - kv  10:                      gemma4.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  11:                  gemma4.rope.freq_base_swa f32              = 10000.000000
llama_model_loader: - kv  12:    gemma4.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  13:                gemma4.attention.key_length u32              = 512
llama_model_loader: - kv  14:              gemma4.attention.value_length u32              = 512
llama_model_loader: - kv  15:                          general.file_type u32              = 1
llama_model_loader: - kv  16:             gemma4.final_logit_softcapping f32              = 30.000000
llama_model_loader: - kv  17:            gemma4.attention.sliding_window u32              = 512
llama_model_loader: - kv  18:          gemma4.attention.shared_kv_layers u32              = 18
llama_model_loader: - kv  19:    gemma4.embedding_length_per_layer_input u32              = 256
llama_model_loader: - kv  20:    gemma4.attention.sliding_window_pattern arr[bool,42]     = [true, true, true, true, true, false,...
llama_model_loader: - kv  21:            gemma4.attention.key_length_swa u32              = 256
llama_model_loader: - kv  22:          gemma4.attention.value_length_swa u32              = 256
llama_model_loader: - kv  23:                gemma4.rope.dimension_count u32              = 512
llama_model_loader: - kv  24:            gemma4.rope.dimension_count_swa u32              = 256
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - kv  26:                       tokenizer.ggml.model str              = gemma4
llama_model_loader: - kv  27:                      tokenizer.ggml.tokens arr[str,262144]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  28:                      tokenizer.ggml.scores arr[f32,262144]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  29:                  tokenizer.ggml.token_type arr[i32,262144]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  30:                      tokenizer.ggml.merges arr[str,514906]  = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
llama_model_loader: - kv  31:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  32:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  33:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  35:               tokenizer.ggml.add_sep_token bool             = false
llama_model_loader: - kv  36:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - type  f32:  339 tensors
llama_model_loader: - type  f16:  381 tensors
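
The metadata above already hints at the layer layout. Below is a minimal Python sketch, assuming the truncated `sliding_window_pattern` (five `true`, then `false`, then `...`) simply repeats every 6 layers. That period is my guess, not something the log states. The "observed" values are the `attn_k_norm.weight` widths that show up in the per-tensor log further down.

```python
# Values copied from the loader metadata in the log.
KEY_LENGTH = 512      # gemma4.attention.key_length (full-attention layers)
KEY_LENGTH_SWA = 256  # gemma4.attention.key_length_swa (sliding-window layers)

# ASSUMPTION: the pattern [true x5, false, ...] repeats with period 6.
PERIOD = 6


def is_swa(layer: int) -> bool:
    """True if the layer is assumed to use sliding-window attention."""
    return layer % PERIOD != PERIOD - 1


def head_dim(layer: int) -> int:
    """Expected per-head key width for a layer under the assumption above."""
    return KEY_LENGTH_SWA if is_swa(layer) else KEY_LENGTH


# attn_k_norm.weight widths read off the per-tensor log for a few layers.
observed = {0: 256, 4: 256, 5: 512, 6: 256, 10: 256, 11: 512, 12: 256, 13: 256}

for layer, width in observed.items():
    assert head_dim(layer) == width, (layer, width)

full_attention_layers = [l for l in range(14) if not is_swa(l)]
print(full_attention_layers)
```

If the guess holds, it would also explain why `blk.5` and `blk.11` have `attn_q`/`attn_k` tensors twice as wide as the other blocks.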

/// (omitted here)

llama_model_quantize_impl: model size  = 14340.66 MiB (16.00 BPW)
llama_model_quantize_impl: quant size  =  7644.10 MiB (8.53 BPW)

llama_quantize: quantize time = 99905.90 ms
llama_quantize:    total time = 99905.90 ms
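
The numbers in this summary can be sanity-checked by hand. A rough sketch, assuming ggml's q8_0 layout of 32 weights per block stored as one f16 scale plus 32 int8 values (34 bytes per block). The helper names are mine, just for illustration.

```python
from math import prod

QK8_0 = 32                # weights per q8_0 block
BLOCK_BYTES = 2 + QK8_0   # f16 scale (2 bytes) + 32 int8 values
MIB = 2 ** 20


def f16_mib(shape):
    """Size of an f16 tensor in MiB (2 bytes per element)."""
    return prod(shape) * 2 / MIB


def q8_0_mib(shape):
    """Size of the same tensor after q8_0 quantization, in MiB."""
    return prod(shape) // QK8_0 * BLOCK_BYTES / MIB


# Bits per weight for a tensor that is fully q8_0.
Q8_0_BPW = BLOCK_BYTES * 8 / QK8_0

# per_layer_token_embd.weight from the log: 5376.00 MiB -> 2856.00 MiB
shape = (10752, 262144)
print(f16_mib(shape), q8_0_mib(shape))

# Whole-file figures from the summary lines.
overall_bpw = 7644.10 / 14340.66 * 16
throughput_mib_s = 14340.66 / (99905.90 / 1000)
print(round(overall_bpw, 2), round(throughput_mib_s, 1))
```

Pure q8_0 would be 8.5 bits per weight. The file ends up at 8.53 BPW, presumably because some tensors are left as they are (the norm weights stay f32, and `per_layer_model_proj.weight` stays f16 in the log).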

 


[   1/ 720] output_norm.weight                   - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[   2/ 720] per_layer_model_proj.weight          - [  2560,  10752,      1,      1], type =    f16, size =   52.500 MiB
[   3/ 720] per_layer_proj_norm.weight           - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[   4/ 720] per_layer_token_embd.weight          - [ 10752, 262144,      1,      1], type =    f16, converting to q8_0 .. size =  5376.00 MiB ->  2856.00 MiB
[   5/ 720] rope_freqs.weight                    - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[   6/ 720] token_embd.weight                    - [  2560, 262144,      1,      1], type =    f16, converting to q8_0 .. size =  1280.00 MiB ->   680.00 MiB
[   7/ 720] blk.0.attn_k.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[   8/ 720] blk.0.attn_k_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[   9/ 720] blk.0.attn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  10/ 720] blk.0.attn_output.weight             - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[  11/ 720] blk.0.attn_q.weight                  - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[  12/ 720] blk.0.attn_q_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[  13/ 720] blk.0.attn_v.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[  14/ 720] blk.0.ffn_down.weight                - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  15/ 720] blk.0.ffn_gate.weight                - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  16/ 720] blk.0.ffn_norm.weight                - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  17/ 720] blk.0.ffn_up.weight                  - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  18/ 720] blk.0.inp_gate.weight                - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[  19/ 720] blk.0.layer_output_scale.weight      - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[  20/ 720] blk.0.post_attention_norm.weight     - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  21/ 720] blk.0.post_ffw_norm.weight           - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  22/ 720] blk.0.post_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  23/ 720] blk.0.proj.weight                    - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[  24/ 720] blk.1.attn_k.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[  25/ 720] blk.1.attn_k_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[  26/ 720] blk.1.attn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  27/ 720] blk.1.attn_output.weight             - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[  28/ 720] blk.1.attn_q.weight                  - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[  29/ 720] blk.1.attn_q_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[  30/ 720] blk.1.attn_v.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[  31/ 720] blk.1.ffn_down.weight                - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  32/ 720] blk.1.ffn_gate.weight                - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  33/ 720] blk.1.ffn_norm.weight                - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  34/ 720] blk.1.ffn_up.weight                  - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  35/ 720] blk.1.inp_gate.weight                - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[  36/ 720] blk.1.layer_output_scale.weight      - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[  37/ 720] blk.1.post_attention_norm.weight     - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  38/ 720] blk.1.post_ffw_norm.weight           - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  39/ 720] blk.1.post_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  40/ 720] blk.1.proj.weight                    - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[  41/ 720] blk.2.attn_k.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[  42/ 720] blk.2.attn_k_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[  43/ 720] blk.2.attn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  44/ 720] blk.2.attn_output.weight             - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[  45/ 720] blk.2.attn_q.weight                  - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[  46/ 720] blk.2.attn_q_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[  47/ 720] blk.2.attn_v.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[  48/ 720] blk.2.ffn_down.weight                - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  49/ 720] blk.2.ffn_gate.weight                - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  50/ 720] blk.2.ffn_norm.weight                - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  51/ 720] blk.2.ffn_up.weight                  - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  52/ 720] blk.2.inp_gate.weight                - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[  53/ 720] blk.2.layer_output_scale.weight      - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[  54/ 720] blk.2.post_attention_norm.weight     - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  55/ 720] blk.2.post_ffw_norm.weight           - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  56/ 720] blk.2.post_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  57/ 720] blk.2.proj.weight                    - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[  58/ 720] blk.3.attn_k.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[  59/ 720] blk.3.attn_k_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[  60/ 720] blk.3.attn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  61/ 720] blk.3.attn_output.weight             - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[  62/ 720] blk.3.attn_q.weight                  - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[  63/ 720] blk.3.attn_q_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[  64/ 720] blk.3.attn_v.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[  65/ 720] blk.3.ffn_down.weight                - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  66/ 720] blk.3.ffn_gate.weight                - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  67/ 720] blk.3.ffn_norm.weight                - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  68/ 720] blk.3.ffn_up.weight                  - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  69/ 720] blk.3.inp_gate.weight                - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[  70/ 720] blk.3.layer_output_scale.weight      - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[  71/ 720] blk.3.post_attention_norm.weight     - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  72/ 720] blk.3.post_ffw_norm.weight           - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  73/ 720] blk.3.post_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  74/ 720] blk.3.proj.weight                    - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[  75/ 720] blk.4.attn_k.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[  76/ 720] blk.4.attn_k_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[  77/ 720] blk.4.attn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  78/ 720] blk.4.attn_output.weight             - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[  79/ 720] blk.4.attn_q.weight                  - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[  80/ 720] blk.4.attn_q_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[  81/ 720] blk.4.attn_v.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[  82/ 720] blk.4.ffn_down.weight                - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  83/ 720] blk.4.ffn_gate.weight                - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  84/ 720] blk.4.ffn_norm.weight                - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  85/ 720] blk.4.ffn_up.weight                  - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[  86/ 720] blk.4.inp_gate.weight                - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[  87/ 720] blk.4.layer_output_scale.weight      - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[  88/ 720] blk.4.post_attention_norm.weight     - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  89/ 720] blk.4.post_ffw_norm.weight           - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  90/ 720] blk.4.post_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  91/ 720] blk.4.proj.weight                    - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[  92/ 720] blk.5.attn_k.weight                  - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[  93/ 720] blk.5.attn_k_norm.weight             - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[  94/ 720] blk.5.attn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[  95/ 720] blk.5.attn_output.weight             - [  4096,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[  96/ 720] blk.5.attn_q.weight                  - [  2560,   4096,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[  97/ 720] blk.5.attn_q_norm.weight             - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[  98/ 720] blk.5.attn_v.weight                  - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[  99/ 720] blk.5.ffn_down.weight                - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 100/ 720] blk.5.ffn_gate.weight                - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 101/ 720] blk.5.ffn_norm.weight                - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 102/ 720] blk.5.ffn_up.weight                  - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 103/ 720] blk.5.inp_gate.weight                - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 104/ 720] blk.5.layer_output_scale.weight      - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 105/ 720] blk.5.post_attention_norm.weight     - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 106/ 720] blk.5.post_ffw_norm.weight           - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 107/ 720] blk.5.post_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 108/ 720] blk.5.proj.weight                    - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 109/ 720] blk.6.attn_k.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 110/ 720] blk.6.attn_k_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 111/ 720] blk.6.attn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 112/ 720] blk.6.attn_output.weight             - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 113/ 720] blk.6.attn_q.weight                  - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 114/ 720] blk.6.attn_q_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 115/ 720] blk.6.attn_v.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 116/ 720] blk.6.ffn_down.weight                - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 117/ 720] blk.6.ffn_gate.weight                - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 118/ 720] blk.6.ffn_norm.weight                - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 119/ 720] blk.6.ffn_up.weight                  - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 120/ 720] blk.6.inp_gate.weight                - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 121/ 720] blk.6.layer_output_scale.weight      - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 122/ 720] blk.6.post_attention_norm.weight     - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 123/ 720] blk.6.post_ffw_norm.weight           - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 124/ 720] blk.6.post_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 125/ 720] blk.6.proj.weight                    - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 126/ 720] blk.7.attn_k.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 127/ 720] blk.7.attn_k_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 128/ 720] blk.7.attn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 129/ 720] blk.7.attn_output.weight             - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 130/ 720] blk.7.attn_q.weight                  - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 131/ 720] blk.7.attn_q_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 132/ 720] blk.7.attn_v.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 133/ 720] blk.7.ffn_down.weight                - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 134/ 720] blk.7.ffn_gate.weight                - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 135/ 720] blk.7.ffn_norm.weight                - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 136/ 720] blk.7.ffn_up.weight                  - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 137/ 720] blk.7.inp_gate.weight                - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 138/ 720] blk.7.layer_output_scale.weight      - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 139/ 720] blk.7.post_attention_norm.weight     - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 140/ 720] blk.7.post_ffw_norm.weight           - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 141/ 720] blk.7.post_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 142/ 720] blk.7.proj.weight                    - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 143/ 720] blk.8.attn_k.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 144/ 720] blk.8.attn_k_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 145/ 720] blk.8.attn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 146/ 720] blk.8.attn_output.weight             - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 147/ 720] blk.8.attn_q.weight                  - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 148/ 720] blk.8.attn_q_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 149/ 720] blk.8.attn_v.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 150/ 720] blk.8.ffn_down.weight                - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 151/ 720] blk.8.ffn_gate.weight                - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 152/ 720] blk.8.ffn_norm.weight                - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 153/ 720] blk.8.ffn_up.weight                  - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 154/ 720] blk.8.inp_gate.weight                - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 155/ 720] blk.8.layer_output_scale.weight      - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 156/ 720] blk.8.post_attention_norm.weight     - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 157/ 720] blk.8.post_ffw_norm.weight           - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 158/ 720] blk.8.post_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 159/ 720] blk.8.proj.weight                    - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 160/ 720] blk.9.attn_k.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 161/ 720] blk.9.attn_k_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 162/ 720] blk.9.attn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 163/ 720] blk.9.attn_output.weight             - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 164/ 720] blk.9.attn_q.weight                  - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 165/ 720] blk.9.attn_q_norm.weight             - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 166/ 720] blk.9.attn_v.weight                  - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 167/ 720] blk.9.ffn_down.weight                - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 168/ 720] blk.9.ffn_gate.weight                - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 169/ 720] blk.9.ffn_norm.weight                - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 170/ 720] blk.9.ffn_up.weight                  - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 171/ 720] blk.9.inp_gate.weight                - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 172/ 720] blk.9.layer_output_scale.weight      - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 173/ 720] blk.9.post_attention_norm.weight     - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 174/ 720] blk.9.post_ffw_norm.weight           - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 175/ 720] blk.9.post_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 176/ 720] blk.9.proj.weight                    - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 177/ 720] blk.10.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 178/ 720] blk.10.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 179/ 720] blk.10.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 180/ 720] blk.10.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 181/ 720] blk.10.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 182/ 720] blk.10.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 183/ 720] blk.10.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 184/ 720] blk.10.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 185/ 720] blk.10.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 186/ 720] blk.10.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 187/ 720] blk.10.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 188/ 720] blk.10.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 189/ 720] blk.10.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 190/ 720] blk.10.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 191/ 720] blk.10.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 192/ 720] blk.10.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 193/ 720] blk.10.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 194/ 720] blk.11.attn_k.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 195/ 720] blk.11.attn_k_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 196/ 720] blk.11.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 197/ 720] blk.11.attn_output.weight            - [  4096,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 198/ 720] blk.11.attn_q.weight                 - [  2560,   4096,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 199/ 720] blk.11.attn_q_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 200/ 720] blk.11.attn_v.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 201/ 720] blk.11.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 202/ 720] blk.11.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 203/ 720] blk.11.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 204/ 720] blk.11.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 205/ 720] blk.11.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 206/ 720] blk.11.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 207/ 720] blk.11.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 208/ 720] blk.11.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 209/ 720] blk.11.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 210/ 720] blk.11.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 211/ 720] blk.12.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 212/ 720] blk.12.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 213/ 720] blk.12.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 214/ 720] blk.12.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 215/ 720] blk.12.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 216/ 720] blk.12.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 217/ 720] blk.12.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 218/ 720] blk.12.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 219/ 720] blk.12.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 220/ 720] blk.12.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 221/ 720] blk.12.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 222/ 720] blk.12.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 223/ 720] blk.12.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 224/ 720] blk.12.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 225/ 720] blk.12.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 226/ 720] blk.12.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 227/ 720] blk.12.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 228/ 720] blk.13.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 229/ 720] blk.13.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 230/ 720] blk.13.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 231/ 720] blk.13.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 232/ 720] blk.13.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 233/ 720] blk.13.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 234/ 720] blk.13.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 235/ 720] blk.13.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 236/ 720] blk.13.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 237/ 720] blk.13.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 238/ 720] blk.13.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 239/ 720] blk.13.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 240/ 720] blk.13.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 241/ 720] blk.13.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 242/ 720] blk.13.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 243/ 720] blk.13.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 244/ 720] blk.13.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 245/ 720] blk.14.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 246/ 720] blk.14.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 247/ 720] blk.14.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 248/ 720] blk.14.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 249/ 720] blk.14.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 250/ 720] blk.14.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 251/ 720] blk.14.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 252/ 720] blk.14.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 253/ 720] blk.14.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 254/ 720] blk.14.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 255/ 720] blk.14.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 256/ 720] blk.14.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 257/ 720] blk.14.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 258/ 720] blk.14.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 259/ 720] blk.14.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 260/ 720] blk.14.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 261/ 720] blk.14.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 262/ 720] blk.15.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 263/ 720] blk.15.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 264/ 720] blk.15.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 265/ 720] blk.15.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 266/ 720] blk.15.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 267/ 720] blk.15.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 268/ 720] blk.15.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 269/ 720] blk.15.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 270/ 720] blk.15.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 271/ 720] blk.15.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 272/ 720] blk.15.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 273/ 720] blk.15.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 274/ 720] blk.15.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 275/ 720] blk.15.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 276/ 720] blk.15.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 277/ 720] blk.15.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 278/ 720] blk.15.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 279/ 720] blk.16.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 280/ 720] blk.16.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 281/ 720] blk.16.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 282/ 720] blk.16.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 283/ 720] blk.16.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 284/ 720] blk.16.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 285/ 720] blk.16.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 286/ 720] blk.16.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 287/ 720] blk.16.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 288/ 720] blk.16.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 289/ 720] blk.16.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 290/ 720] blk.16.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 291/ 720] blk.16.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 292/ 720] blk.16.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 293/ 720] blk.16.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 294/ 720] blk.16.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 295/ 720] blk.16.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 296/ 720] blk.17.attn_k.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 297/ 720] blk.17.attn_k_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 298/ 720] blk.17.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 299/ 720] blk.17.attn_output.weight            - [  4096,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 300/ 720] blk.17.attn_q.weight                 - [  2560,   4096,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 301/ 720] blk.17.attn_q_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 302/ 720] blk.17.attn_v.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 303/ 720] blk.17.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 304/ 720] blk.17.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 305/ 720] blk.17.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 306/ 720] blk.17.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 307/ 720] blk.17.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 308/ 720] blk.17.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 309/ 720] blk.17.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 310/ 720] blk.17.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 311/ 720] blk.17.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 312/ 720] blk.17.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 313/ 720] blk.18.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 314/ 720] blk.18.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 315/ 720] blk.18.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 316/ 720] blk.18.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 317/ 720] blk.18.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 318/ 720] blk.18.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 319/ 720] blk.18.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 320/ 720] blk.18.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 321/ 720] blk.18.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 322/ 720] blk.18.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 323/ 720] blk.18.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 324/ 720] blk.18.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 325/ 720] blk.18.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 326/ 720] blk.18.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 327/ 720] blk.18.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 328/ 720] blk.18.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 329/ 720] blk.18.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 330/ 720] blk.19.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 331/ 720] blk.19.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 332/ 720] blk.19.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 333/ 720] blk.19.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 334/ 720] blk.19.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 335/ 720] blk.19.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 336/ 720] blk.19.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 337/ 720] blk.19.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 338/ 720] blk.19.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 339/ 720] blk.19.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 340/ 720] blk.19.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 341/ 720] blk.19.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 342/ 720] blk.19.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 343/ 720] blk.19.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 344/ 720] blk.19.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 345/ 720] blk.19.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 346/ 720] blk.19.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 347/ 720] blk.20.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 348/ 720] blk.20.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 349/ 720] blk.20.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 350/ 720] blk.20.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 351/ 720] blk.20.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 352/ 720] blk.20.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 353/ 720] blk.20.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 354/ 720] blk.20.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 355/ 720] blk.20.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 356/ 720] blk.20.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 357/ 720] blk.20.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 358/ 720] blk.20.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 359/ 720] blk.20.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 360/ 720] blk.20.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 361/ 720] blk.20.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 362/ 720] blk.20.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 363/ 720] blk.20.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 364/ 720] blk.21.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 365/ 720] blk.21.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 366/ 720] blk.21.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 367/ 720] blk.21.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 368/ 720] blk.21.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 369/ 720] blk.21.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 370/ 720] blk.21.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 371/ 720] blk.21.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 372/ 720] blk.21.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 373/ 720] blk.21.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 374/ 720] blk.21.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 375/ 720] blk.21.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 376/ 720] blk.21.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 377/ 720] blk.21.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 378/ 720] blk.21.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 379/ 720] blk.21.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 380/ 720] blk.21.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 381/ 720] blk.22.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 382/ 720] blk.22.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 383/ 720] blk.22.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 384/ 720] blk.22.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 385/ 720] blk.22.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 386/ 720] blk.22.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 387/ 720] blk.22.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 388/ 720] blk.22.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 389/ 720] blk.22.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 390/ 720] blk.22.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 391/ 720] blk.22.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 392/ 720] blk.22.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 393/ 720] blk.22.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 394/ 720] blk.22.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 395/ 720] blk.22.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 396/ 720] blk.22.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 397/ 720] blk.22.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 398/ 720] blk.23.attn_k.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 399/ 720] blk.23.attn_k_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 400/ 720] blk.23.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 401/ 720] blk.23.attn_output.weight            - [  4096,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 402/ 720] blk.23.attn_q.weight                 - [  2560,   4096,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 403/ 720] blk.23.attn_q_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 404/ 720] blk.23.attn_v.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 405/ 720] blk.23.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 406/ 720] blk.23.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 407/ 720] blk.23.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 408/ 720] blk.23.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 409/ 720] blk.23.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 410/ 720] blk.23.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 411/ 720] blk.23.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 412/ 720] blk.23.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 413/ 720] blk.23.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 414/ 720] blk.23.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 415/ 720] blk.24.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 416/ 720] blk.24.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 417/ 720] blk.24.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 418/ 720] blk.24.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 419/ 720] blk.24.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 420/ 720] blk.24.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 421/ 720] blk.24.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 422/ 720] blk.24.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 423/ 720] blk.24.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 424/ 720] blk.24.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 425/ 720] blk.24.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 426/ 720] blk.24.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 427/ 720] blk.24.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 428/ 720] blk.24.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 429/ 720] blk.24.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 430/ 720] blk.24.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 431/ 720] blk.24.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 432/ 720] blk.25.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 433/ 720] blk.25.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 434/ 720] blk.25.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 435/ 720] blk.25.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 436/ 720] blk.25.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 437/ 720] blk.25.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 438/ 720] blk.25.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 439/ 720] blk.25.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 440/ 720] blk.25.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 441/ 720] blk.25.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 442/ 720] blk.25.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 443/ 720] blk.25.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 444/ 720] blk.25.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 445/ 720] blk.25.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 446/ 720] blk.25.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 447/ 720] blk.25.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 448/ 720] blk.25.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 449/ 720] blk.26.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 450/ 720] blk.26.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 451/ 720] blk.26.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 452/ 720] blk.26.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 453/ 720] blk.26.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 454/ 720] blk.26.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 455/ 720] blk.26.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 456/ 720] blk.26.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 457/ 720] blk.26.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 458/ 720] blk.26.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 459/ 720] blk.26.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 460/ 720] blk.26.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 461/ 720] blk.26.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 462/ 720] blk.26.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 463/ 720] blk.26.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 464/ 720] blk.26.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 465/ 720] blk.26.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 466/ 720] blk.27.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 467/ 720] blk.27.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 468/ 720] blk.27.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 469/ 720] blk.27.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 470/ 720] blk.27.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 471/ 720] blk.27.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 472/ 720] blk.27.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 473/ 720] blk.27.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 474/ 720] blk.27.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 475/ 720] blk.27.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 476/ 720] blk.27.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 477/ 720] blk.27.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 478/ 720] blk.27.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 479/ 720] blk.27.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 480/ 720] blk.27.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 481/ 720] blk.27.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 482/ 720] blk.27.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 483/ 720] blk.28.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 484/ 720] blk.28.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 485/ 720] blk.28.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 486/ 720] blk.28.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 487/ 720] blk.28.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 488/ 720] blk.28.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 489/ 720] blk.28.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 490/ 720] blk.28.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 491/ 720] blk.28.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 492/ 720] blk.28.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 493/ 720] blk.28.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 494/ 720] blk.28.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 495/ 720] blk.28.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 496/ 720] blk.28.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 497/ 720] blk.28.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 498/ 720] blk.28.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 499/ 720] blk.28.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 500/ 720] blk.29.attn_k.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 501/ 720] blk.29.attn_k_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 502/ 720] blk.29.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 503/ 720] blk.29.attn_output.weight            - [  4096,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 504/ 720] blk.29.attn_q.weight                 - [  2560,   4096,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 505/ 720] blk.29.attn_q_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 506/ 720] blk.29.attn_v.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 507/ 720] blk.29.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 508/ 720] blk.29.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 509/ 720] blk.29.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 510/ 720] blk.29.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 511/ 720] blk.29.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 512/ 720] blk.29.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 513/ 720] blk.29.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 514/ 720] blk.29.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 515/ 720] blk.29.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 516/ 720] blk.29.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 517/ 720] blk.30.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 518/ 720] blk.30.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 519/ 720] blk.30.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 520/ 720] blk.30.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 521/ 720] blk.30.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 522/ 720] blk.30.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 523/ 720] blk.30.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 524/ 720] blk.30.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 525/ 720] blk.30.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 526/ 720] blk.30.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 527/ 720] blk.30.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 528/ 720] blk.30.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 529/ 720] blk.30.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 530/ 720] blk.30.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 531/ 720] blk.30.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 532/ 720] blk.30.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 533/ 720] blk.30.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 534/ 720] blk.31.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 535/ 720] blk.31.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 536/ 720] blk.31.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 537/ 720] blk.31.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 538/ 720] blk.31.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 539/ 720] blk.31.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 540/ 720] blk.31.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 541/ 720] blk.31.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 542/ 720] blk.31.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 543/ 720] blk.31.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 544/ 720] blk.31.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 545/ 720] blk.31.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 546/ 720] blk.31.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 547/ 720] blk.31.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 548/ 720] blk.31.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 549/ 720] blk.31.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 550/ 720] blk.31.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 551/ 720] blk.32.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 552/ 720] blk.32.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 553/ 720] blk.32.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 554/ 720] blk.32.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 555/ 720] blk.32.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 556/ 720] blk.32.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 557/ 720] blk.32.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 558/ 720] blk.32.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 559/ 720] blk.32.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 560/ 720] blk.32.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 561/ 720] blk.32.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 562/ 720] blk.32.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 563/ 720] blk.32.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 564/ 720] blk.32.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 565/ 720] blk.32.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 566/ 720] blk.32.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 567/ 720] blk.32.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 568/ 720] blk.33.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 569/ 720] blk.33.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 570/ 720] blk.33.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 571/ 720] blk.33.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 572/ 720] blk.33.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 573/ 720] blk.33.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 574/ 720] blk.33.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 575/ 720] blk.33.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 576/ 720] blk.33.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 577/ 720] blk.33.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 578/ 720] blk.33.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 579/ 720] blk.33.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 580/ 720] blk.33.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 581/ 720] blk.33.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 582/ 720] blk.33.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 583/ 720] blk.33.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 584/ 720] blk.33.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 585/ 720] blk.34.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 586/ 720] blk.34.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 587/ 720] blk.34.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 588/ 720] blk.34.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 589/ 720] blk.34.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 590/ 720] blk.34.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 591/ 720] blk.34.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 592/ 720] blk.34.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 593/ 720] blk.34.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 594/ 720] blk.34.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 595/ 720] blk.34.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 596/ 720] blk.34.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 597/ 720] blk.34.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 598/ 720] blk.34.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 599/ 720] blk.34.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 600/ 720] blk.34.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 601/ 720] blk.34.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 602/ 720] blk.35.attn_k.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 603/ 720] blk.35.attn_k_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 604/ 720] blk.35.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 605/ 720] blk.35.attn_output.weight            - [  4096,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 606/ 720] blk.35.attn_q.weight                 - [  2560,   4096,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 607/ 720] blk.35.attn_q_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 608/ 720] blk.35.attn_v.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 609/ 720] blk.35.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 610/ 720] blk.35.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 611/ 720] blk.35.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 612/ 720] blk.35.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 613/ 720] blk.35.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 614/ 720] blk.35.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 615/ 720] blk.35.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 616/ 720] blk.35.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 617/ 720] blk.35.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 618/ 720] blk.35.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 619/ 720] blk.36.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 620/ 720] blk.36.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 621/ 720] blk.36.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 622/ 720] blk.36.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 623/ 720] blk.36.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 624/ 720] blk.36.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 625/ 720] blk.36.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 626/ 720] blk.36.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 627/ 720] blk.36.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 628/ 720] blk.36.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 629/ 720] blk.36.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 630/ 720] blk.36.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 631/ 720] blk.36.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 632/ 720] blk.36.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 633/ 720] blk.36.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 634/ 720] blk.36.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 635/ 720] blk.36.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 636/ 720] blk.37.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 637/ 720] blk.37.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 638/ 720] blk.37.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 639/ 720] blk.37.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 640/ 720] blk.37.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 641/ 720] blk.37.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 642/ 720] blk.37.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 643/ 720] blk.37.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 644/ 720] blk.37.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 645/ 720] blk.37.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 646/ 720] blk.37.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 647/ 720] blk.37.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 648/ 720] blk.37.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 649/ 720] blk.37.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 650/ 720] blk.37.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 651/ 720] blk.37.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 652/ 720] blk.37.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 653/ 720] blk.38.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 654/ 720] blk.38.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 655/ 720] blk.38.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 656/ 720] blk.38.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 657/ 720] blk.38.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 658/ 720] blk.38.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 659/ 720] blk.38.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 660/ 720] blk.38.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 661/ 720] blk.38.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 662/ 720] blk.38.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 663/ 720] blk.38.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 664/ 720] blk.38.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 665/ 720] blk.38.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 666/ 720] blk.38.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 667/ 720] blk.38.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 668/ 720] blk.38.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 669/ 720] blk.38.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 670/ 720] blk.39.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 671/ 720] blk.39.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 672/ 720] blk.39.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 673/ 720] blk.39.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 674/ 720] blk.39.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 675/ 720] blk.39.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 676/ 720] blk.39.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 677/ 720] blk.39.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 678/ 720] blk.39.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 679/ 720] blk.39.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 680/ 720] blk.39.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 681/ 720] blk.39.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 682/ 720] blk.39.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 683/ 720] blk.39.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 684/ 720] blk.39.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 685/ 720] blk.39.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 686/ 720] blk.39.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 687/ 720] blk.40.attn_k.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 688/ 720] blk.40.attn_k_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 689/ 720] blk.40.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 690/ 720] blk.40.attn_output.weight            - [  2048,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 691/ 720] blk.40.attn_q.weight                 - [  2560,   2048,      1,      1], type =    f16, converting to q8_0 .. size =    10.00 MiB ->     5.31 MiB
[ 692/ 720] blk.40.attn_q_norm.weight            - [   256,      1,      1,      1], type =    f32, size =    0.001 MiB
[ 693/ 720] blk.40.attn_v.weight                 - [  2560,    512,      1,      1], type =    f16, converting to q8_0 .. size =     2.50 MiB ->     1.33 MiB
[ 694/ 720] blk.40.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 695/ 720] blk.40.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 696/ 720] blk.40.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 697/ 720] blk.40.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 698/ 720] blk.40.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 699/ 720] blk.40.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 700/ 720] blk.40.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 701/ 720] blk.40.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 702/ 720] blk.40.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 703/ 720] blk.40.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 704/ 720] blk.41.attn_k.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 705/ 720] blk.41.attn_k_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 706/ 720] blk.41.attn_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 707/ 720] blk.41.attn_output.weight            - [  4096,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 708/ 720] blk.41.attn_q.weight                 - [  2560,   4096,      1,      1], type =    f16, converting to q8_0 .. size =    20.00 MiB ->    10.62 MiB
[ 709/ 720] blk.41.attn_q_norm.weight            - [   512,      1,      1,      1], type =    f32, size =    0.002 MiB
[ 710/ 720] blk.41.attn_v.weight                 - [  2560,   1024,      1,      1], type =    f16, converting to q8_0 .. size =     5.00 MiB ->     2.66 MiB
[ 711/ 720] blk.41.ffn_down.weight               - [ 10240,   2560,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 712/ 720] blk.41.ffn_gate.weight               - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 713/ 720] blk.41.ffn_norm.weight               - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 714/ 720] blk.41.ffn_up.weight                 - [  2560,  10240,      1,      1], type =    f16, converting to q8_0 .. size =    50.00 MiB ->    26.56 MiB
[ 715/ 720] blk.41.inp_gate.weight               - [  2560,    256,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
[ 716/ 720] blk.41.layer_output_scale.weight     - [     1,      1,      1,      1], type =    f32, size =    0.000 MiB
[ 717/ 720] blk.41.post_attention_norm.weight    - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 718/ 720] blk.41.post_ffw_norm.weight          - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 719/ 720] blk.41.post_norm.weight              - [  2560,      1,      1,      1], type =    f32, size =    0.010 MiB
[ 720/ 720] blk.41.proj.weight                   - [   256,   2560,      1,      1], type =    f16, converting to q8_0 .. size =     1.25 MiB ->     0.66 MiB
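
The size column in the log above can be reproduced by hand. The sketch below assumes the usual q8_0 layout (blocks of 32 weights, each stored as 32 int8 values plus one 2-byte fp16 scale, i.e. 34 bytes per block); the helper names are made up for illustration and are not part of llama.cpp.

```python
import math

MIB = 1024 * 1024
Q8_0_BLOCK_WEIGHTS = 32       # weights per q8_0 block (assumed layout)
Q8_0_BLOCK_BYTES = 2 + 32     # one fp16 scale + 32 int8 values


def f16_mib(shape):
    """Size in MiB of a tensor stored as f16 (2 bytes per weight)."""
    return math.prod(shape) * 2 / MIB


def q8_0_mib(shape):
    """Size in MiB of the same tensor stored as q8_0."""
    blocks = math.prod(shape) / Q8_0_BLOCK_WEIGHTS
    return blocks * Q8_0_BLOCK_BYTES / MIB


# ffn_down / ffn_gate / ffn_up, and attn_k / attn_v of blk.39 and blk.40
for shape in [(10240, 2560), (2560, 512)]:
    print(shape, f"{f16_mib(shape):.2f} MiB -> {q8_0_mib(shape):.4f} MiB")
```

This works out to 8.5 bits per weight, so every converted tensor shrinks to 34/64 of its f16 size, which is the 50.00 MiB -> 26.56 MiB ratio seen in the log.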

Posted by 구차니

As always (!), it is getting hard to run things on a 1080 these days.. I pinned the versions to the combination I found last time:

pip3 uninstall -y torch torchvision torchaudio xformers
pip3 install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install xformers==0.0.29.post2

 

For --mvtec-ad, point it at the top-level directory of the downloaded and extracted dataset.

~/src/STFPM/mvtec-ad$ tree -d -L 2
.
├── bottle
│   ├── ground_truth
│   ├── test
│   └── train
├── cable
│   ├── ground_truth
│   ├── test
│   └── train
├── capsule
│   ├── ground_truth
│   ├── test
│   └── train
├── carpet
│   ├── ground_truth
│   ├── test
│   └── train
├── grid
│   ├── ground_truth
│   ├── test
│   └── train
├── hazelnut
│   ├── ground_truth
│   ├── test
│   └── train
├── leather
│   ├── ground_truth
│   ├── test
│   └── train
├── metal_nut
│   ├── ground_truth
│   ├── test
│   └── train
├── pill
│   ├── ground_truth
│   ├── test
│   └── train
├── screw
│   ├── ground_truth
│   ├── test
│   └── train
├── tile
│   ├── ground_truth
│   ├── test
│   └── train
├── toothbrush
│   ├── ground_truth
│   ├── test
│   └── train
├── transistor
│   ├── ground_truth
│   ├── test
│   └── train
├── wood
│   ├── ground_truth
│   ├── test
│   └── train
└── zipper
    ├── ground_truth
    ├── test
    └── train
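
Before starting a long training run, the layout above can be checked with a few lines of Python. This is only a sketch: `missing_subdirs` is a hypothetical helper, not part of STFPM, and it assumes every category directory should contain exactly the three subdirectories shown in the tree.

```python
from pathlib import Path

REQUIRED = ("ground_truth", "test", "train")


def missing_subdirs(root):
    """Map each category directory under root to the subdirectories it lacks."""
    problems = {}
    categories = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for category in categories:
        missing = [name for name in REQUIRED if not (category / name).is_dir()]
        if missing:
            problems[category.name] = missing
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python3 check_layout.py mvtec-ad
    if len(sys.argv) > 1:
        print(missing_subdirs(sys.argv[1]) or "layout looks fine")
```

An empty result means every category has ground_truth, test and train, so the directory can be passed to --mvtec-ad as is.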

 

$ python3 main.py train --mvtec-ad  mvtec-ad --category carpet --epochs 200
...
[198/200] loss: 0.145916
[198/200] loss: 0.147569
[198/200] loss: 0.147157
[198/200] loss: 0.146806
[198/200] loss: 0.147341
[198/200] loss: 0.145462
[198/200] loss: 0.144424
Valid Loss: 0.0001142
[199/200] loss: 0.146722
[199/200] loss: 0.146246
[199/200] loss: 0.145853
[199/200] loss: 0.145966
[199/200] loss: 0.145793
[199/200] loss: 0.145863
[199/200] loss: 0.154857
Valid Loss: 0.0001460

real 9m35.951s
user 31m24.301s
sys 1m16.579s

 

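So training is driven by looking at the loss alone.

Still, for 200 epochs, and whatever it is doing internally, finishing in about 10 minutes is decent.

Testing uses the CPU with only a single core, so it takes quite a while. Is there no way to judge a single image and output the result?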

$ time python3 main.py test --mvtec-ad  mvtec-ad --category carpet --checkpoint snapshots/carpet/best.pth.tar
load snapshots/carpet/best.pth.tar
Catergory: carpet Pixel-AUC: 0.990288 Image-AUC: 0.955859 PRO: 0.959637

real 5m39.944s
user 5m46.592s
sys 0m1.064s
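
The "single core" impression can be checked from the two `time` outputs above: (user + sys) / real gives the average number of busy CPU cores. A minimal sketch, with helper names that are only illustrative:

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '9m35.951s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


def busy_cores(real, user, sys):
    """Average number of CPU cores kept busy during the run."""
    return (to_seconds(user) + to_seconds(sys)) / to_seconds(real)


train = busy_cores("9m35.951s", "31m24.301s", "1m16.579s")
test = busy_cores("5m39.944s", "5m46.592s", "0m1.064s")
print(f"train: {train:.2f} cores, test: {test:.2f} cores")
```

With the numbers above this comes out at roughly 3.4 cores for training and roughly 1.0 for testing, which matches the observation that the test step runs on one core.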

 

It is named .tar, so why is it a zip?!?

~/src/STFPM/snapshots/carpet$ file *
best.pth.tar: Zip archive data, at least v0.0 to extract, compression method=store

~/src/STFPM/snapshots/carpet$ ll -h
total 43M
drwxrwxr-x 2 falinux falinux 4.0K  6월  1 11:57 ./
drwxrwxr-x 3 falinux falinux 4.0K  6월  1 11:14 ../
-rw-rw-r-- 1 falinux falinux  43M  6월  1 11:52 best.pth.tar
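
The likely explanation is that `torch.save` has written a zip-based container by default since PyTorch 1.6, and the `.pth.tar` suffix is just a naming habit, so `file` reports what is actually on disk. The sketch below does the same check with the standard library only; `container_kind` is a hypothetical helper name.

```python
import tarfile
import zipfile


def container_kind(path):
    """Report the real container format of a file, ignoring its extension."""
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python3 kind.py snapshots/carpet/best.pth.tar
    for name in sys.argv[1:]:
        print(name, "->", container_kind(name))
```

Loading is unaffected either way, since `torch.load` inspects the file content rather than the extension.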

 

[Link: https://github.com/gdwang08/STFPM]

Posted by 구차니

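Come to think of it, I did not even know which version I had been using -_-?

So I downloaded the tar file for Linux and ran it, and, well.. nothing looks special.

Log in,

agree to the terms,

and it comes up.

Rather modestly, clicking Antigravity next to the menu at the top left shows the version, and that is all.

Pick a theme,

pick what you want to do (it moves on even if nothing is selected),

and ta-da~

Everything I worked on before shows up in a list on the left. Not sure whether I should like that or not.

Hmm.. I used up the whole quota last time; will they give a more usable amount this time?

However, since it was not installed as an Ubuntu package,

it has to be launched from the directory it was extracted to, which is a bit of a hassle. How do I change the desktop icon settings?

Adjusting the path after Exec to match the install location does it.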

$ cd ~/.local/share/applications
$ vi antigravity-ide.desktop
$ cat antigravity-ide.desktop
[Desktop Entry]
Name=Antigravity IDE
Comment=Antigravity IDE v2.0 - Experience liftoff
GenericName=IDE
Exec="/home/minimonk/Antigravity-x64/antigravity" %F
Icon=antigravity
Type=Application
Terminal=false
StartupNotify=true
StartupWMClass=Antigravity
Categories=Development;IDE;TextEditor;
MimeType=application/x-antigravity-workspace;
$ chmod +x antigravity-ide.desktop 
$ update-desktop-database ~/.local/share/applications/
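
If the install directory moves, the Exec line can also be rewritten with a short script instead of editing in vi. A minimal sketch: `set_exec_path` is a hypothetical helper, and it assumes the entry keeps the `%F` argument used above.

```python
def set_exec_path(desktop_text, binary_path):
    """Return the .desktop text with its Exec line pointing at binary_path."""
    lines = []
    for line in desktop_text.splitlines():
        if line.startswith("Exec="):
            # Quote the path so directories containing spaces still work.
            line = f'Exec="{binary_path}" %F'
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "[Desktop Entry]\nName=Antigravity IDE\nExec=antigravity %F\n"
    print(set_exec_path(sample, "/home/minimonk/Antigravity-x64/antigravity"))
```

Only the Exec line is touched; the other keys pass through unchanged, so the result can be written back to the same .desktop file.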

[Link: https://discuss.ai.google.dev/t/ubuntu-24-04-upgrading-to-antigravity-2-0-and-antigravity-ide/146516/4]

Posted by 구차니
Program usage/Blender | 2026. 5. 27. 23:53

I did find something, but the service has already been shut down T_T

 

Notice: The model used in the paper, Claude Sonnet 3.7, has been retired. As a result, we have discontinued the LL3M server.

[Link: https://github.com/threedle/ll3m]

[Link: https://threedle.github.io/ll3m/]

 

Posted by 구차니

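FreeCAD and the Blender CAD extensions are a struggle one way or another,

and driving them through Python gives odd results, so I am looking for something that is easier to control from code.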

 

[Link: https://github.com/fogleman/sdf]

[Link: https://news.hada.io/topic?id=25230]

Posted by 구차니

 

-----

The metric function. So this one is a function too.

Anyway, the metric swings back and forth wildly.

[Link: https://modulabs.co.kr/blog/loss-versus-accuracy]

 

loss decreases gradually -> training well

loss drops sharply, then holds steady -> normal

loss increases after a certain point -> overfitting

[Link: https://m.blog.naver.com/datapreprep/223748771141]

 

But.. I cannot quite get a feel for what underfitting is.

When to change the number of epochs (training cycles)

Selecting the appropriate number of epochs is a balance between underfitting and overfitting.

Underfitting: One of the most straightforward indicators of underfitting is if the model performs poorly on the training data. This can be observed in Edge Impulse Studio through metrics such as accuracy, or loss, depending on the type of problem (classification or regression). If these metrics indicate poor performance, it suggests that the model has not learned the patterns of the data well. In that case, increasing the number of epochs can improve your model performance. Please note that other solutions exist such as increasing your neural network architecture complexity, changing the preprocessing technique or reducing regularization.

Overfitting: Detecting overfitting involves recognizing when the model has learned too much from the training data, including its noise and outliers, to the detriment of its performance on new, unseen data. Overfitting is characterized by the model performing exceptionally well on the training data but poorly on the validation or test data. Evaluating overfitting can be achieved by comparing the performance of the model between the training set and the validation set during training. When the performance on the validation set starts to degrade, it might indicate that the model is beginning to overfit the training data. In that case, decreasing the number of epochs can improve your model performance. As with underfitting, other solutions exist to reduce overfitting such as increasing the number of training data, adding regularization techniques to add penalties on large weights, adding dropout layers, simplifying the model architecture and even using early stopping.



[Link: https://docs.edgeimpulse.com/knowledge/concepts/machine-learning/neural-networks/epochs]
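
To make the two cases in the quote concrete, here is a rough sketch that labels a run from its per-epoch training and validation losses. The thresholds (`high_loss`, `patience`) are arbitrary assumptions for illustration; a reasonable value depends on the loss function and the data.

```python
def diagnose(train_loss, val_loss, high_loss=0.5, patience=3):
    """Label a run as 'underfitting', 'overfitting' or 'ok'.

    underfitting: the model is still bad on the data it was trained on.
    overfitting:  validation loss bottomed out `patience` or more epochs ago
                  while training loss kept going down.
    """
    if train_loss[-1] > high_loss:
        return "underfitting"
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    epochs_since_best = len(val_loss) - 1 - best_epoch
    if epochs_since_best >= patience and train_loss[-1] < train_loss[best_epoch]:
        return "overfitting"
    return "ok"


print(diagnose([1.0, 0.95, 0.9], [1.0, 0.96, 0.92]))
print(diagnose([1.0, 0.6, 0.4, 0.3, 0.2, 0.1], [1.1, 0.7, 0.5, 0.6, 0.7, 0.8]))
```

The first run never gets good on its own training data (underfitting); the second keeps improving on the training set while the validation loss has been rising for three epochs (overfitting).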

 

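These are said to be the loss values of a model that is overfitting.

It appears that overfitting sets in quickly when the training set is too small.

By the way, how was that figure of 130 million parameters arrived at for SSD300?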

What you are experiencing is called overfitting and it happens because of your very small dataset.
All the model cares about is performance on the training dataset, so given the opportunity, it will simply attempt to memorize it. This is what happens in you case, you feed a model which contains over 130 Million parameters less than 319 images.
So regarding your questions:
  1. The loss function shows a clear case of overfitting.
  2. On general, it is okay to use a trained model, especially when you only have a small dataset, but in your case, the dataset is too small for any deep-learning model. When I say small dataset, I mean 10k images, not several hundreds.
  3. You should not train for longer time, once the validation loss stops improving, it is a clear sign to stop. There is even a training technique named "early stopping" which is designed to stop training once the validation loss stops to drop.
You have to understand that currently, your dataset of 300 images, is irrelevant to the world of deep-learning. So if you still want to use it for object detection, you need to revert to more classic computer-vision techniques like using HOG or SIFT features, or even manually engineering the features for your special case.
 

[Link: https://datascience.stackexchange.com/questions/46456/issues-with-training-ssd-on-own-dataset]
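
The "early stopping" mentioned in the answer can be sketched in a few lines. This is a generic patience-based version, not the implementation of any particular framework; `patience` and `min_delta` are the usual knobs, chosen arbitrarily here.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    Training stops once the validation loss has failed to improve on its
    best value by more than min_delta for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


print(early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.7, 0.8]))
```

In a real training loop the weights from the best epoch would normally be saved and restored, which is roughly what a best-checkpoint snapshot is for.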

 

The SSD300 has 35.6 million parameters, while the SSDlite320 has about a tenth of them, i.e., 3.4 million.

[Link: https://www.mdpi.com/2673-6470/3/3/12]
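
One possible source of the "over 130 million" figure, and this is only a guess, is the full VGG16 classifier: the original SSD300 uses VGG16 as its backbone, and VGG16 with its three fully connected layers has about 138 million parameters, most of them in those layers, which the SSD backbone does not keep in that form. The count can be reproduced by hand:

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# (in_channels, out_channels) of the 13 convolutions in VGG16
VGG16_CONV = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# The ImageNet classifier head: 7x7x512 features -> 4096 -> 4096 -> 1000
VGG16_FC = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(conv_params(i, o) for i, o in VGG16_CONV)
fc_total = sum(fc_params(i, o) for i, o in VGG16_FC)
print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {conv_total + fc_total:,}")
```

The convolutions alone come to about 14.7 million, so a detector built on the convolutional part can plausibly land in the tens of millions, in line with the 35.6 million quoted above.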

Posted by 구차니

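So.. is that why the menu was added right after authenticating the chat in openclaw..

Also, even when commands are registered with set_my_commands(), they do not arrive as chat content.

Some other handling seems to be needed separately.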

 

Bot settings

set_my_commands() Used for setting the list of commands
delete_my_commands() Used for deleting the list of commands
get_my_commands() Used for obtaining the list of commands
get_my_default_administrator_rights() Used for obtaining the default administrator rights for the bot
set_my_default_administrator_rights() Used for setting the default administrator rights for the bot
get_chat_menu_button() Used for obtaining the menu button of a private chat or the default menu button
set_chat_menu_button() Used for setting the menu button of a private chat or the default menu button
set_my_description() Used for setting the description of the bot
get_my_description() Used for obtaining the description of the bot
set_my_short_description() Used for setting the short description of the bot
get_my_short_description() Used for obtaining the short description of the bot
set_my_name() Used for setting the name of the bot
get_my_name() Used for obtaining the name of the bot
set_my_profile_photo() Used for setting the profile photo of the bot
remove_my_profile_photo() Used for removing the profile photo of the bot

[Link: https://docs.python-telegram-bot.org/en/stable/telegram.bot.html]

 

Listing the in-chat menu commands with the bot token

import asyncio
from telegram import Bot

async def check_bot_menu():
    # Replace with your actual Bot Token from @BotFather
    bot_token = "YOUR_BOT_TOKEN_HERE"
    bot = Bot(token=bot_token)
    
    print("Fetching active menu commands...")
    # Retrieves a list of BotCommand objects
    commands = await bot.get_my_commands()
    
    if not commands:
        print("No menu commands found. The menu is empty.")
    else:
        print(f"Found {len(commands)} command(s):")
        for cmd in commands:
            print(f"/{cmd.command} - {cmd.description}")

if __name__ == "__main__":
    asyncio.run(check_bot_menu())

 

Clearing the in-chat menu with the bot token

import asyncio
from telegram import Bot

async def remove_bot_menu():
    # Replace with your actual Bot Token from @BotFather
    bot_token = "YOUR_BOT_TOKEN_HERE"
    bot = Bot(token=bot_token)
    
    print("Removing all menu commands...")
    # With no arguments this clears the list for the default scope only;
    # lists set for a narrower scope (e.g. a single chat) are left untouched
    await bot.delete_my_commands()
    print("Success! All menu commands have been removed.")

# Run the async function
if __name__ == "__main__":
    asyncio.run(remove_bot_menu())

 

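Changing the menu immediately with the bot token and a chat id.

With plain set_my_commands the menu did not change even after leaving the chat; it only changed once a specific chat id was passed in.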

import asyncio
from telegram import Bot, BotCommand, BotCommandScopeChat

async def force_update_user_menu():
    # 1. Enter the bot token and the target user's chat ID.
    BOT_TOKEN = "YOUR_BOT_TOKEN_HERE"
    USER_CHAT_ID = 123456789  # the user's numeric chat_id
    
    bot = Bot(token=BOT_TOKEN)
    
    # 2. Define the new menu list to switch to.
    new_commands = [
        BotCommand(command="home", description="🏠 Go to home"),
        BotCommand(command="mypage", description="👤 View my info"),
        BotCommand(command="support", description="❓ Customer support")
    ]
    
    print(f"Changing the menu of User({USER_CHAT_ID}) immediately...")
    
    # 3. Use the 'BotCommandScopeChat' scope so the list applies to this chat_id only
    await bot.set_my_commands(
        commands=new_commands,
        scope=BotCommandScopeChat(chat_id=USER_CHAT_ID)
    )
    
    print("Done! The menu in that user's Telegram app has been updated.")

if __name__ == "__main__":
    asyncio.run(force_update_user_menu())
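
A plausible reason the unscoped set_my_commands call appeared to do nothing: the Telegram Bot API documents a lookup order for command lists, and in a private chat a list set for that specific chat is used before the all-private-chats list, which is used before the default list. If something (openclaw, perhaps; that part is an assumption) had already set a chat-scoped list, changing the default list would never show up in that chat. The model below is a plain-Python sketch of that lookup, ignoring language codes; it does not call the Telegram API, and all names in it are made up.

```python
def visible_commands(command_lists, chat_id):
    """Return the commands a private chat would show.

    command_lists maps (scope, chat_id or None) -> list of command names.
    The first non-empty list in the lookup order wins.
    """
    lookup_order = [
        ("chat", chat_id),            # list set for this chat only
        ("all_private_chats", None),  # list set for all private chats
        ("default", None),            # list set with no scope at all
    ]
    for key in lookup_order:
        commands = command_lists.get(key)
        if commands:
            return commands
    return []


lists = {
    ("default", None): ["start"],
    ("chat", 123456789): ["old_menu"],
}
lists[("default", None)] = ["home", "mypage"]      # unscoped update
print(visible_commands(lists, 123456789))           # still the chat-scoped list
lists[("chat", 123456789)] = ["home", "mypage"]    # scoped update
print(visible_commands(lists, 123456789))
```

Under this model, deleting the chat-scoped list would make the chat fall back to the default list, so it could be another way to get the same effect.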

 

I have not tested this one yet.

from telegram import Update, BotCommand, BotCommandScopeChat
from telegram.ext import Application, CommandHandler, ContextTypes

# Bot token
BOT_TOKEN = "YOUR_BOT_TOKEN_HERE"

# 1. Runs when /start comes in (the menu is changed right here)
async def start_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
    chat_id = update.effective_chat.id
    
    # New menu to show on this user's screen
    new_menu = [
        BotCommand(command="home", description="🏠 Go to home"),
        BotCommand(command="mypage", description="👤 View my info")
    ]
    
    # Update the menu button for this chat
    await context.bot.set_my_commands(
        commands=new_menu,
        scope=BotCommandScopeChat(chat_id=chat_id)
    )
    
    await update.message.reply_text(
        "Welcome! The menu button has been updated.\n"
        "Tap the [Menu] button at the bottom left or type a command!"
    )

# 2. Handler for /home from the menu button
async def home_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
    await update.message.reply_text("🏠 Moved to the home screen.")

# 3. Handler for /mypage from the menu button
async def mypage_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
    user = update.effective_user
    await update.message.reply_text(f"👤 My page of [{user.first_name}].")


def main():
    # Build the application
    application = Application.builder().token(BOT_TOKEN).build()

    # ⭐ [Key point] Map each command added to the menu 1:1 to a handler function.
    application.add_handler(CommandHandler("start", start_command))
    application.add_handler(CommandHandler("home", home_command))       # handles /home
    application.add_handler(CommandHandler("mypage", mypage_command))   # handles /mypage

    # Start the bot (polling)
    print("The bot has started. Start a conversation...")
    application.run_polling()

if __name__ == "__main__":
    main()
Posted by 구차니

Apparently changing the HF_HOME environment variable is all it takes.

On Linux the default is ~/.cache/huggingface
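
From Python, the variable can be set at the top of a script, as long as that happens before any Hugging Face library is imported. The sketch below shows that, plus a simplified view of how the effective directory is chosen. `resolve_hf_home` is a hypothetical helper, `/data/hf-cache` is a made-up path, and the real library considers more than this (for example XDG_CACHE_HOME), so treat it as an approximation.

```python
import os


def resolve_hf_home(env=None):
    """Return HF_HOME if set, else the Linux default ~/.cache/huggingface."""
    env = os.environ if env is None else env
    default = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    return env.get("HF_HOME") or default


if __name__ == "__main__":
    # Set before importing transformers / huggingface_hub so they pick it up.
    os.environ["HF_HOME"] = "/data/hf-cache"  # made-up example path
    print(resolve_hf_home())
```

Exporting the variable in the shell (for instance in ~/.bashrc) has the same effect for every program, without touching the code.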

[Link: https://developer0hye.tistory.com/775]

Posted by 구차니