llama.cpp reasoning 옵션

구차니 2026. 6. 1. 23:45

옵션을 살펴보면

/llama-b9305$ ./llama-server --help
/llama-b9305$ ./llama-cli --help

4개 정도가 나오는데 gemma4-e4b는 reasoning을 지원하는지 옵션없이 실행하면 켜져있는 듯 하다.

--reasoning-format FORMAT               controls whether thought tags are allowed and/or extracted from the
                                        response, and in which format they're returned; one of:
                                        - none: leaves thoughts unparsed in `message.content`
                                        - deepseek: puts thoughts in `message.reasoning_content`
                                        - deepseek-legacy: keeps `<think>` tags in `message.content` while
                                        also populating `message.reasoning_content`
                                        (default: auto)
                                        (env: LLAMA_ARG_THINK)
-rea,  --reasoning [on|off|auto]        Use reasoning/thinking in the chat ('on', 'off', or 'auto', default:
                                        'auto' (detect from template))
                                        (env: LLAMA_ARG_REASONING)
--reasoning-budget N                    token budget for thinking: -1 for unrestricted, 0 for immediate end,
                                        N>0 for token budget (default: -1)
                                        (env: LLAMA_ARG_THINK_BUDGET)
--reasoning-budget-message MESSAGE      message injected before the end-of-thinking tag when reasoning budget
                                        is exhausted (default: none)
                                        (env: LLAMA_ARG_THINK_BUDGET_MESSAGE)

cli 에서는 잘 먹는것 같다.

$ llama-b9305/llama-cli  --model /mnt/Downloads/model/gemma4-e2b/gemma-4-E2B-it-Q4_K_M.gguf -rea on -sm none

Loading model...

▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b9305-63248fc3e
model      : gemma-4-E2B-it-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern

> 무슨생각하고 있어?

[Start thinking]
Thinking Process:

1.  **Analyze the Request:** The user asked "무슨 생각하고 있어?" (What are you thinking?). This is an open-ended question about the model's internal state or current activity
.
2.  **Determine Identity and Nature:** I am Gemma 4, a Large Language Model. I do not have consciousness, feelings, personal thoughts, or internal subjective experiences.
3.  **Identify the Core Activity:** My current "thought" process is processing the input (the user's question) and generating a relevant, coherent, and helpful response.
4.  **Formulate the Answer Strategy (Internal Monologue -> External Response):**
    *   Acknowledge the nature of the question.                                                                                                                                    *   State clearly that I don't "think" in a human sense (no consciousness/feelings).
    *   Describe what I *am* doing (processing information, analyzing language, generating text).
    *   Pivot back to the user (ask how I can help).
5.  **Draft the Response in Korean (Focusing on natural, polite tone):**

    *   *Initial thought:* 저는 생각하지 않아요. (I don't think.)
    *   *Elaboration:* 저는 정보를 처리하고 답변을 생성하고 있어요. (I am processing information and generating answers.)
    *   *Contextualizing the current state:* 지금 사용자님의 질문을 이해하고 가장 적절한 답변을 찾고 있습니다. (Right now, I am understanding your question and finding the mos
t appropriate answer.)
    *   *Call to action:* 무엇을 도와드릴까요? (What can I help you with?)

6.  **Final Review:** The response is accurate, polite, and directly addresses the user's intent while staying true to my identity as an LLM. (The drafted Korean response belo
w is good.)
[End thinking]

저는 인간처럼 '생각'을 하지는 않습니다. 저는 대규모 언어 모델(Large Language Model)이기 때문에, 정보를 처리하고, 패턴을 분석하며, 사용자님의 질문에 가장 적절하고 유용한 답변을
생성하는 작업을 하고 있습니다.

지금은 사용자님의 질문을 이해하고, 그에 맞는 정보를 찾고 답변을 구성하는 과정에 집중하고 있습니다.

궁금한 것이 있거나 도움이 필요하시면 언제든지 말씀해 주세요! 😊
[ Prompt: 193.7 t/s | Generation: 94.4 t/s ]

>

Exiting...

$ llama-b9305/llama-cli  --model /mnt/Downloads/model/gemma4-e2b/gemma-4-E2B-it-Q4_K_M.gguf -rea off -sm none

Loading model...

▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b9305-63248fc3e
model      : gemma-4-E2B-it-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern

> 무슨생각하고 있어?

저는 Google에서 훈련한 대규모 언어 모델입니다. 😊

**지금 저는 당신의 질문에 답하고, 정보를 제공하고, 창의적인 텍스트를 생성하는 등 제가 할 수 있는 모든 작업을 수행하고 있습니다.**

혹시 **특정한 주제**에 대해 궁금하신가요? 아니면 **제가 무엇을 할 수 있는지** 더 알고 싶으신가요?

어떤 것에 대해 이야기하고 싶으신지 말씀해주시면, 그에 맞춰서 생각하고 대화할 준비가 되어 있습니다!

[ Prompt: 126.5 t/s | Generation: 95.2 t/s ]

>

Exiting...

llama-swap 에서 llama-server에 -rea off 하고 질문을 하면 옵션이 안 먹나? 똑같이 reasoning 나온다. 머냐?

저작자표시 (새창열림)