Speech recognition - draw when an image comes in

 

 

Feed in an image, and if the caption mentions an image, run it through img2img.

Posted by 구차니

Couldn't find my way, so I had to look it up ㅠㅠ

[Link : https://youtu.be/AZJuH2eG3Ug?t=3220]

Posted by 구차니

Hmm.. I found something interesting.. but it's Python again.

Seeing that people even run it on phones, it must be fairly lightweight.

 

 

D:\study\llm>pip install litert-lm
D:\study\llm>litert-lm
CLI tool for LiteRT-LM models.

Usage: litert-lm [OPTIONS] COMMAND [ARGS]...

Commands:
  benchmark  Benchmarks a LiteRT-LM model.
  delete     Deletes a model from the local storage.
  import     Imports a model from a local path or HuggingFace hub.
  list       Lists all imported LiteRT-LM models.
  rename     Renames a model.
  run        Runs a LiteRT-LM model interactively or with a single prompt.
  serve      Start a server with a Gemini or OpenAI compatible API (alpha feature)

Global options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

D:\study\llm>litert-lm run --from-huggingface-repo=litert-community/gemma-4-E4B-it-litert-lm gemma-4-E4B-it.litertlm --backend=gpu  --enable-speculative-decoding=true --prompt="What is the capital of France?"
Downloading gemma-4-E4B-it.litertlm from litert-community/gemma-4-E4B-it-litert-lm...
gemma-4-E4B-it.litertlm:   0%|                                                 | 92.3k/3.66G [00:01<16:44:38, 60.7kB/s]
gemma-4-E4B-it.litertlm: 100%|████████████████████████████████████████████████████| 3.66G/3.66G [05:00<00:00, 12.2MB/s]
C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\file_download.py:143: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\minimonk\.cache\huggingface\hub\models--litert-community--gemma-4-E4B-it-litert-lm. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
The capital of France is **Paris**.

 

 

Looking it up, turns out that's just how the file is named in the repository.

[Link : https://huggingface.co/litert-community/gemma-4-E4B-it-litert-lm/tree/main]

[Link : https://huggingface.co/metricspace/gemma4-E2B-it-litert-128k-mtp/tree/main]

 

Does it break once mtp comes into the picture?

D:\study\llm>litert-lm run --from-huggingface-repo=metricspace/gemma4-E2B-it-litert-128k-mtp model.litertlm --backend=gpu  --enable-speculative-decoding=true --prompt="What is the capital of France?"
Downloading model.litertlm from metricspace/gemma4-E2B-it-litert-128k-mtp...
E0000 00:00:1778407860.973266    8280 delegate_webgpu.cc:373] Failed to create litert::ml_drift::DelegateKernelLiteRt: RESOURCE_EXHAUSTED: Requested allocation size - 4294967296 bytes. Max allocation size for this GPU - 2147483648 bytes. Shape - {bhwdc, {1, 1, 8192, 1, 131072}}, data type - float32.
=== Source Location Trace: ===
third_party/ml_drift/common/task/tensor_desc.cc:1846
third_party/ml_drift/common/gpu_model_util.cc:232
third_party/ml_drift/common/gpu_model_util.cc:269
third_party/ml_drift/common/gpu_model_util.cc:432
third_party/odml/litert/ml_drift/delegate/delegate_kernel.cc:765
third_party/odml/litert/ml_drift/delegate/delegate_kernel.cc:695
third_party/odml/litert/ml_drift/delegate/delegate_kernel.cc:787
third_party/odml/litert/ml_drift/delegate/delegate_kernel.cc:284
third_party/odml/litert/ml_drift/delegate/delegate_kernel_litert.cc:167
ERROR: Failed to initialize kernel.
ERROR: Node number 223 (STABLEHLO_COMPOSITE) failed to prepare.
E0000 00:00:1778407862.768911    8280 engine.cc:491] Failed to create engine: INTERNAL: ERROR: [third_party/odml/litert_lm/runtime/executor/llm_litert_compiled_model_executor.cc:1928]
ERROR: [./third_party/odml/litert/litert/cc/litert_compiled_model.h:1780]
=== Source Location Trace: ===
./third_party/odml/litert/litert/cc/litert_macros.h:538
third_party/odml/litert_lm/runtime/executor/llm_litert_compiled_model_executor_factory.cc:144
third_party/odml/litert_lm/runtime/core/engine_impl.cc:384
An error occurred
Traceback (most recent call last):
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\litert_lm_cli\model.py", line 255, in run_interactive
    engine_cm = litert_lm.Engine(
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\litert_lm\engine.py", line 82, in __init__
    raise RuntimeError(
RuntimeError: Failed to create LiteRT-LM engine for C:\Users\minimonk\.cache\huggingface\hub\models--metricspace--gemma4-E2B-it-litert-128k-mtp\snapshots\4dae3505f550397923c206eaa63be84f17ee43cb\model.litertlm

 

 

[Link : https://github.com/google-ai-edge/LiteRT-LM]

[Link : https://huggingface.co/metricspace/gemma4-E2B-it-litert-128k-mtp]

[Link : https://pypi.org/project/litert-lm/]

[Link : https://www.reddit.com/r/LocalLLaMA/comments/1somixt/practical_local_llm_on_android_gemma_4_via/?tl=ko]

Posted by 구차니

Seems to be something that runs in Python.

 

[Link : https://vllm.ai/]

[Link : https://github.com/vllm-project/vllm/releases]

Posted by 구차니

Should I bring in big brother Qwen? lol

 

D:\study\llm>pip install soundfile torch qwen_tts
D:\study\llm>python
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import soundfile as sf
>>> from qwen_tts import Qwen3TTSModel

********
Warning: flash-attn is not installed. Will only run the manual PyTorch version. Please install flash-attn for faster inference.
********

'sox' is not recognized as an internal or external command,
operable program or batch file.
SoX could not be found!

    If you do not have SoX, proceed here:
     - - - http://sox.sourceforge.net/ - - -

    If you do (or think that you should) have SoX, double-check your
    path variables.

>>>
>>> model = Qwen3TTSModel.from_pretrained(
...     "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
...     device_map="cuda:0",
...     dtype=torch.bfloat16,
...     attn_implementation="flash_attention_2",
... )
config.json: 4.91kB [00:00, 4.70MB/s]
C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\file_download.py:143: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\minimonk\.cache\huggingface\hub\models--Qwen--Qwen3-TTS-12Hz-1.7B-CustomVoice. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
model.safetensors:   0%|                                                                   | 0.00/3.83G [00:00<?, ?B/s]

model.safetensors: 100%|██████████████████████████████████████████████████████████| 3.83G/3.83G [04:45<00:00, 13.4MB/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\qwen_tts\inference\qwen3_tts_model.py", line 112, in from_pretrained
    model = AutoModel.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\auto\auto_factory.py", line 604, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\qwen_tts\core\models\modeling_qwen3_tts.py", line 1876, in from_pretrained
    model = super().from_pretrained(
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 4971, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\qwen_tts\core\models\modeling_qwen3_tts.py", line 1817, in __init__
    super().__init__(config)
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 2076, in __init__
    self.config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 2686, in _check_and_adjust_attn_implementation
    applicable_attn_implementation = self.get_correct_attn_implementation(
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 2714, in get_correct_attn_implementation
    self._flash_attn_2_can_dispatch(is_init_check)
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 2422, in _flash_attn_2_can_dispatch
    raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

 

Argh, do I really have to do this on Linux after all?

D:\study\llm>pip install flash_attn
Collecting flash_attn
  Using cached flash_attn-2.8.3.tar.gz (8.4 MB)
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\minimonk\\AppData\\Local\\Temp\\pip-install-gkk0v5su\\flash-attn_bdc9b907b4714d19aa80016a5ecbd8e6\\csrc/composable_kernel/library/src/tensor_operation_instance/gpu/batched_gemm_add_relu_gemm_add/device_batched_gemm_add_relu_gemm_add_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp'
HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at https://pip.pypa.io/warnings/enable-long-paths
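Since the traceback shows Qwen3TTSModel.from_pretrained() just forwarding its kwargs down to transformers' AutoModel.from_pretrained(), one untested workaround sketch is to skip FlashAttention entirely and fall back to PyTorch's built-in SDPA attention ("sdpa" is a standard transformers value; that qwen_tts accepts it unchanged is my assumption based on the traceback above):

import torch
from qwen_tts import Qwen3TTSModel

# Same call as before, but without requesting flash_attention_2.
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="sdpa",  # PyTorch scaled-dot-product attention, no extra package
)

That should at least get past the ImportError; the earlier "Will only run the manual PyTorch version" warning suggests slower inference is the trade-off.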

 

I wonder whether it still works when the speaker and the language differ.

[Link : https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice]

Posted by 구차니

It downloads everything on its own, and it converts even Korean really well.

A quick search suggests it's a Korean company, a HYBE subsidiary, apparently known for voice changers in games and the like?

I still need to read the license properly, but a rough machine translation suggests it even allows SaaS use..

 

Unlike outetts, no build step is needed; it installs with pip alone, so good!

And with auto_download it busily fetches something and sorts everything out by itself.

D:\study\llm>pip install supertonic
D:\study\llm>python
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from supertonic import TTS
>>> tts = TTS(auto_download=True)
Downloading (incomplete total...): 0.00B [00:00, ?B/s]
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Fetching 26 files: 100%|███████████████████████████████████████████████████████████████| 26/26 [00:36<00:00,  1.40s/it]
Download complete: : 404MB [00:36, 19.4MB/s]
>>> style = tts.get_voice_style(voice_name="M1")
>>>
>>> text = "A gentle breeze moved through the open window while everyone listened to the story."
>>> wav, duration = tts.synthesize(text, voice_style=style, lang="en")
>>>
>>> tts.save_audio(wav, "output.wav")
>>> print(f"Generated {duration:.2f}s of audio")

>>> text = "안녕? 난 잼미니야 만나서 반가워"
>>> wav, duration = tts.synthesize(text, voice_style=style, lang="ko")
>>> tts.save_audio(wav, "output_ko.wav")

 

[Link : https://huggingface.co/Supertone/supertonic-3]

[Link : https://www.supertone.ai/ko]

Posted by 구차니

I tried to do this on Windows,

and got stopped cold at step 1. In that case don't madly install a pile of stuff first, check for a compiler before you start!!! *fuming*

D:\study\llm> pip install outetts
      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> llama-cpp-python

D:\study\llm>

[Link : https://github.com/edwko/OuteTTS?tab=readme-ov-file#installation]

[Link : https://huggingface.co/unsloth/Llama-OuteTTS-1.0-1B]

 

 

Running the example
With both of the models generated, the LLM model and the voice decoder model,
we can run the example:

$ build/bin/llama-tts -m  ./models/outetts-0.2-0.5B-q8_0.gguf \
    -mv ./models/wavtokenizer-large-75-f16.gguf \
    -p "Hello world"
...
main: audio written to file 'output.wav'

[Link : https://git.comtegra.pl/ajastrzebski/llama-cpp/-/tree/master/examples/tts]

[Link : https://huggingface.co/OuteAI/OuteTTS-0.2-500M-GGUF/tree/main]

[Link : https://huggingface.co/ggml-org/WavTokenizer/tree/main]

 

D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64>llama-tts -m ..\OuteTTS-0.3-500M-Q8_0.gguf  -mv ..\WavTokenizer-Large-75-F16.gguf -p "hello i am sam. how are you?"
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 6143 MiB):
  Device 0: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes, VRAM: 6143 MiB
load_backend: loaded CUDA backend from D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64\ggml-cuda.dll
load_backend: loaded RPC backend from D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64\ggml-rpc.dll
load_backend: loaded CPU backend from D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64\ggml-cpu-haswell.dll
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
common_params_fit_impl: getting device memory data for initial parameters:
common_memory_breakdown_print: | memory breakdown [MiB]   | total   free    self   model   context   compute    unaccounted |
common_memory_breakdown_print: |   - CUDA0 (GTX 1060 6GB) |  6143 = 5197 + ( 931 =   506 +      96 +     329) +          15 |
common_memory_breakdown_print: |   - Host                 |                  162 =   143 +       0 +      19                |
common_params_fit_impl: projected to use 931 MiB of device memory vs. 5197 MiB of free device memory
common_params_fit_impl: will leave 4265 >= 1024 MiB of free device memory, no changes needed
common_fit_params: successfully fit params to free device memory
common_fit_params: fitting params to free memory took 0.44 seconds
llama_model_loader: loaded meta data with 25 key-value pairs and 290 tensors from ..\OuteTTS-0.3-500M-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = OuteTTS 0.3 500M
llama_model_loader: - kv   3:                           general.basename str              = OuteTTS-0.3
llama_model_loader: - kv   4:                         general.size_label str              = 500M
llama_model_loader: - kv   5:                          qwen2.block_count u32              = 24
llama_model_loader: - kv   6:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv   7:                     qwen2.embedding_length u32              = 896
llama_model_loader: - kv   8:                  qwen2.feed_forward_length u32              = 4864
llama_model_loader: - kv   9:                 qwen2.attention.head_count u32              = 14
llama_model_loader: - kv  10:              qwen2.attention.head_count_kv u32              = 2
llama_model_loader: - kv  11:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  12:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,157696]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,157696]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 151644
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 151645
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  22:                    tokenizer.chat_template str              = outetts-0.3
llama_model_loader: - kv  23:               general.quantization_version u32              = 2
llama_model_loader: - kv  24:                          general.file_type u32              = 7
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type q8_0:  169 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q8_0
print_info: file size   = 506.02 MiB (8.50 BPW)
llama_prepare_model_devices: using device CUDA0 (NVIDIA GeForce GTX 1060 6GB) (0000:01:00.0) - 5197 MiB free
load: 0 unused tokens
load: control-looking token: 128247 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden
load: printing all EOG tokens:
load:   - 128247 ('</s>')
load:   - 151643 ('<|endoftext|>')
load:   - 151645 ('<|im_end|>')
load:   - 151662 ('<|fim_pad|>')
load:   - 151663 ('<|repo_name|>')
load:   - 151664 ('<|file_sep|>')
load: special tokens cache size = 5152
load: token to piece cache size = 0.9712 MB
print_info: arch                  = qwen2
print_info: vocab_only            = 0
print_info: no_alloc              = 0
print_info: n_ctx_train           = 32768
print_info: n_embd                = 896
print_info: n_embd_inp            = 896
print_info: n_layer               = 24
print_info: n_head                = 14
print_info: n_head_kv             = 2
print_info: n_rot                 = 64
print_info: n_swa                 = 0
print_info: is_swa_any            = 0
print_info: n_embd_head_k         = 64
print_info: n_embd_head_v         = 64
print_info: n_gqa                 = 7
print_info: n_embd_k_gqa          = 128
print_info: n_embd_v_gqa          = 128
print_info: f_norm_eps            = 0.0e+00
print_info: f_norm_rms_eps        = 1.0e-06
print_info: f_clamp_kqv           = 0.0e+00
print_info: f_max_alibi_bias      = 0.0e+00
print_info: f_logit_scale         = 0.0e+00
print_info: f_attn_scale          = 0.0e+00
print_info: f_attn_value_scale    = 0.0000
print_info: n_ff                  = 4864
print_info: n_expert              = 0
print_info: n_expert_used         = 0
print_info: n_expert_groups       = 0
print_info: n_group_used          = 0
print_info: causal attn           = 1
print_info: pooling type          = -1
print_info: rope type             = 2
print_info: rope scaling          = linear
print_info: freq_base_train       = 1000000.0
print_info: freq_scale_train      = 1
print_info: n_ctx_orig_yarn       = 32768
print_info: rope_yarn_log_mul     = 0.0000
print_info: rope_finetuned        = unknown
print_info: model type            = 1B
print_info: model params          = 499.19 M
print_info: general.name          = OuteTTS 0.3 500M
print_info: vocab type            = BPE
print_info: n_vocab               = 157696
print_info: n_merges              = 151387
print_info: BOS token             = 151644 '<|im_start|>'
print_info: EOS token             = 151645 '<|im_end|>'
print_info: EOT token             = 151645 '<|im_end|>'
print_info: PAD token             = 151645 '<|im_end|>'
print_info: LF token              = 198 'Ċ'
print_info: FIM PRE token         = 151659 '<|fim_prefix|>'
print_info: FIM SUF token         = 151661 '<|fim_suffix|>'
print_info: FIM MID token         = 151660 '<|fim_middle|>'
print_info: FIM PAD token         = 151662 '<|fim_pad|>'
print_info: FIM REP token         = 151663 '<|repo_name|>'
print_info: FIM SEP token         = 151664 '<|file_sep|>'
print_info: EOG token             = 128247 '</s>'
print_info: EOG token             = 151643 '<|endoftext|>'
print_info: EOG token             = 151645 '<|im_end|>'
print_info: EOG token             = 151662 '<|fim_pad|>'
print_info: EOG token             = 151663 '<|repo_name|>'
print_info: EOG token             = 151664 '<|file_sep|>'
print_info: max token length      = 256
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 23 repeating layers to GPU
load_tensors: offloaded 25/25 layers to GPU
load_tensors:   CPU_Mapped model buffer size =   143.17 MiB
load_tensors:        CUDA0 model buffer size =   506.07 MiB
..........................................................
common_init_result: added </s> logit bias = -inf
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|im_end|> logit bias = -inf
common_init_result: added <|fim_pad|> logit bias = -inf
common_init_result: added <|repo_name|> logit bias = -inf
common_init_result: added <|file_sep|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 8192
llama_context: n_ctx_seq     = 8192
llama_context: n_batch       = 8192
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (8192) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_context:  CUDA_Host  output buffer size =     0.60 MiB
llama_kv_cache:      CUDA0 KV buffer size =    96.00 MiB
llama_kv_cache: size =   96.00 MiB (  8192 cells,  24 layers,  1/1 seqs), K (f16):   48.00 MiB, V (f16):   48.00 MiB
llama_kv_cache: attn_rot_k = 0, n_embd_head_k_all = 64
llama_kv_cache: attn_rot_v = 0, n_embd_head_k_all = 64
sched_reserve: reserving ...
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve: resolving fused Gated Delta Net support:
sched_reserve: fused Gated Delta Net (autoregressive) enabled
sched_reserve: fused Gated Delta Net (chunked) enabled
sched_reserve:      CUDA0 compute buffer size =   329.26 MiB
sched_reserve:  CUDA_Host compute buffer size =    19.51 MiB
sched_reserve: graph nodes  = 823
sched_reserve: graph splits = 2
sched_reserve: reserve took 9.05 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
common_params_fit_impl: getting device memory data for initial parameters:
common_memory_breakdown_print: | memory breakdown [MiB]   | total   free    self   model   context   compute    unaccounted |
common_memory_breakdown_print: |   - CUDA0 (GTX 1060 6GB) |  6143 = 4255 + ( 496 =   120 +       0 +     376) +        1392 |
common_memory_breakdown_print: |   - Host                 |                   36 =     4 +       0 +      32                |
common_params_fit_impl: projected to use 496 MiB of device memory vs. 4255 MiB of free device memory
common_params_fit_impl: will leave 3758 >= 1024 MiB of free device memory, no changes needed
common_fit_params: successfully fit params to free device memory
common_fit_params: fitting params to free memory took -0.78 seconds
llama_model_loader: loaded meta data with 25 key-value pairs and 161 tensors from ..\WavTokenizer-Large-75-F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = wavtokenizer-dec
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = WavTokenizer Large Speech 75token
llama_model_loader: - kv   3:                           general.finetune str              = speech-75token
llama_model_loader: - kv   4:                           general.basename str              = WavTokenizer
llama_model_loader: - kv   5:                         general.size_label str              = large
llama_model_loader: - kv   6:                            general.license str              = mit
llama_model_loader: - kv   7:               wavtokenizer-dec.block_count u32              = 12
llama_model_loader: - kv   8:            wavtokenizer-dec.context_length u32              = 8192
llama_model_loader: - kv   9:          wavtokenizer-dec.embedding_length u32              = 1282
llama_model_loader: - kv  10:      wavtokenizer-dec.attention.head_count u32              = 1
llama_model_loader: - kv  11: wavtokenizer-dec.attention.layer_norm_epsilon f32              = 0.000001
llama_model_loader: - kv  12:                          general.file_type u32              = 1
llama_model_loader: - kv  13:                wavtokenizer-dec.vocab_size u32              = 4096
llama_model_loader: - kv  14:           wavtokenizer-dec.features_length u32              = 512
llama_model_loader: - kv  15:       wavtokenizer-dec.feed_forward_length u32              = 2304
llama_model_loader: - kv  16: wavtokenizer-dec.attention.group_norm_epsilon f32              = 0.000001
llama_model_loader: - kv  17: wavtokenizer-dec.attention.group_norm_groups u32              = 32
llama_model_loader: - kv  18:   wavtokenizer-dec.posnet.embedding_length u32              = 768
llama_model_loader: - kv  19:        wavtokenizer-dec.posnet.block_count u32              = 6
llama_model_loader: - kv  20: wavtokenizer-dec.convnext.embedding_length u32              = 768
llama_model_loader: - kv  21:      wavtokenizer-dec.convnext.block_count u32              = 12
llama_model_loader: - kv  22:          wavtokenizer-dec.attention.causal bool             = false
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = none
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  110 tensors
llama_model_loader: - type  f16:   51 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 124.15 MiB (16.03 BPW)
llama_prepare_model_devices: using device CUDA0 (NVIDIA GeForce GTX 1060 6GB) (0000:01:00.0) - 4255 MiB free
load: adding 4096 dummy tokens
print_info: arch                  = wavtokenizer-dec
print_info: vocab_only            = 0
print_info: no_alloc              = 0
print_info: n_ctx_train           = 8192
print_info: n_embd                = 512
print_info: n_embd_inp            = 512
print_info: n_layer               = 12
print_info: n_head                = 1
print_info: n_head_kv             = 1
print_info: n_rot                 = 512
print_info: n_swa                 = 0
print_info: is_swa_any            = 0
print_info: n_embd_head_k         = 512
print_info: n_embd_head_v         = 512
print_info: n_gqa                 = 1
print_info: n_embd_k_gqa          = 512
print_info: n_embd_v_gqa          = 512
print_info: f_norm_eps            = 1.0e-06
print_info: f_norm_rms_eps        = 0.0e+00
print_info: f_clamp_kqv           = 0.0e+00
print_info: f_max_alibi_bias      = 0.0e+00
print_info: f_logit_scale         = 0.0e+00
print_info: f_attn_scale          = 0.0e+00
print_info: f_attn_value_scale    = 0.0000
print_info: n_ff                  = 2304
print_info: n_expert              = 0
print_info: n_expert_used         = 0
print_info: n_expert_groups       = 0
print_info: n_group_used          = 0
print_info: causal attn           = 0
print_info: pooling type          = -1
print_info: rope type             = -1
print_info: rope scaling          = linear
print_info: freq_base_train       = 10000.0
print_info: freq_scale_train      = 1
print_info: n_ctx_orig_yarn       = 8192
print_info: rope_yarn_log_mul     = 0.0000
print_info: rope_finetuned        = unknown
print_info: model type            = ?B
print_info: model params          = 64.98 M
print_info: general.name          = WavTokenizer Large Speech 75token
print_info: vocab type            = no vocab
print_info: n_vocab               = 4096
print_info: n_merges              = 0
print_info: max token length      = 0
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 11 repeating layers to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors:   CPU_Mapped model buffer size =     4.00 MiB
load_tensors:        CUDA0 model buffer size =   120.15 MiB
.......................................
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 8192
llama_context: n_ctx_seq     = 8192
llama_context: n_batch       = 8192
llama_context: n_ubatch      = 8192
llama_context: causal_attn   = 0
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 10000.0
llama_context: freq_scale    = 1
llama_context:  CUDA_Host  output buffer size =     0.02 MiB
sched_reserve: reserving ...
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve: resolving fused Gated Delta Net support:
sched_reserve: fused Gated Delta Net (autoregressive) enabled
sched_reserve: fused Gated Delta Net (chunked) enabled
sched_reserve:      CUDA0 compute buffer size =   376.00 MiB
sched_reserve:  CUDA_Host compute buffer size =    32.03 MiB
sched_reserve: graph nodes  = 401
sched_reserve: graph splits = 2
sched_reserve: reserve took 14.06 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
sampler seed: 0
sampler params:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = -1
        top_k = 4, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000, adaptive_target = -1.000, adaptive_decay = 0.900
sampler chain: logits -> top-k -> dist
main: loading done
main: constructing prompt ..
main: prompt: 'hello<|space|>i<|space|>am<|space|>sam<|space|>how<|space|>are<|space|>you'


main: llama tokens: 151667, 198, 1782, 155780, 151929, 152412, 152308, 152585, 152460, 153375, 156777, 198, 74455, 155808, 151799, 151873, 151863, 152446, 152372, 152204, 152728, 152229, 152470, 151970, 153413, 152419, 153334, 153289, 153374, 153199, 152040, 153260, 152721, 152680, 153297, 152419, 153248, 152400, 152691, 153368, 153437, 156777, 198, 1722, 155828, 152607, 152256, 152991, 152299, 152688, 153163, 153016, 152789, 153198, 152712, 151911, 153107, 152623, 152170, 152395, 152852, 152207, 152461, 153321, 153309, 151750, 152137, 153340, 152573, 152267, 153347, 151789, 152681, 153339, 151992, 152512, 151751, 152179, 153434, 153180, 152900, 153440, 152474, 153122, 153129, 151904, 152311, 156777, 198, 1499, 155791, 152276, 152454, 153354, 152544, 153204, 153272, 152708, 153433, 152319, 153226, 153043, 152325, 153267, 152622, 156777, 198, 4250, 155797, 153454, 153342, 151989, 152458, 153420, 152303, 152271, 152827, 153036, 153196, 151708, 153263, 152561, 153207, 152213, 152112, 153204, 151722, 152542, 156777, 198, 19789, 155796, 153353, 153182, 152345, 152471, 152477, 153014, 152002, 152191, 151734, 152312, 152810, 152237, 153224, 153169, 153224, 152244, 153387, 153404, 156777, 198, 16069, 155811, 152265, 151946, 151808, 152412, 152363, 152305, 153156, 152733, 152810, 153157, 152016, 152100, 152069, 153234, 152317, 152589, 152707, 153121, 153341, 152159, 152114, 153156, 153001, 153504, 153376, 152272, 152433, 152325, 151941, 156777, 198, 285, 155788, 152238, 152255, 153427, 152318, 153009, 152381, 152474, 152680, 152157, 153255, 152324, 151682, 156777, 198, 32955, 155804, 153490, 153419, 152364, 152405, 152682, 152206, 152078, 153369, 152725, 153193, 153027, 152946, 152488, 153070, 151883, 152890, 152489, 153144, 153375, 152358, 151685, 152494, 152117, 152740, 156777, 198, 37448, 480, 155840, 151902, 152720, 153377, 152027, 152378, 152821, 153207, 153459, 153028, 153068, 152507, 153255, 152158, 152921, 151958, 152609, 152748, 152822, 152286, 151714, 152730, 152377, 152353, 152470, 152606, 152162, 152186, 153071, 152244, 153118, 153375, 153018, 152712, 153098, 152976, 152336, 151843, 153202, 152297, 151736, 153380, 153502, 152702, 152115, 153181, 152735, 153277, 153457, 152393, 153112, 152595, 156777, 198, 19098, 155808, 152464, 153452, 152595, 153312, 151937, 151933, 153197, 152239, 153163, 152922, 153402, 152034, 152591, 153438, 152215, 151673, 152005, 151785, 152642, 151924, 153278, 151805, 151974, 153482, 152718, 152862, 153347, 156777, 198, 72, 155780, 151795, 152111, 152746, 152377, 153471, 152309, 156777, 198, 19016, 155788, 153181, 152271, 152190, 152842, 152224, 152701, 152939, 152536, 152091, 151815, 152733, 151672, 156777, 198, 14689, 155788, 152291, 152072, 152942, 151734, 153042, 153504, 152589, 153333, 151839, 151941, 153038, 153180, 156777, 198, 36996, 8303, 155832, 152231, 152256, 152835, 152801, 152985, 153400, 152393, 152818, 152765, 152249, 152600, 151699, 152302, 152752, 153018, 153009, 151992, 153054, 152847, 153354, 153228, 152662, 153355, 152532, 153393, 151782, 152458, 152048, 152757, 152428, 153195, 151906, 153006, 153178, 153250, 152331, 152284, 152780, 153138, 153319, 151980, 153142, 152418, 152228, 152733, 156777, 198, 9096, 155801, 151698, 153321, 152217, 153039, 152935, 153400, 152122, 152531, 153106, 152169, 152892, 152957, 151851, 152427, 152826, 152451, 151851, 152901, 152885, 152594, 153446, 153080, 156777, 198, 14689, 155795, 152658, 151700, 153321, 152450, 152530, 153191, 151673, 151690, 151698, 152714, 152846, 152981, 153171, 153384, 153364, 153188, 
153246, 156777, 198, 1055, 155779, 151869, 152388, 152711, 153334, 151736, 156777, 198, 1782, 155780, 153483, 153240, 152241, 152558, 152697, 153046, 156777, 198, 5804, 1363, 155820, 152941, 152764, 152605, 153034, 153434, 153372, 153347, 151887, 152453, 152758, 152133, 152510, 152694, 152431, 152321, 153088, 152676, 152223, 152581, 152459, 152015, 152502, 153063, 152712, 153294, 153451, 153032, 152903, 152859, 152989, 151748, 152669, 152661, 152650, 152409, 151861, 156777, 198, 300, 7973, 155828, 153095, 152469, 152988, 152894, 151819, 152391, 153019, 152058, 153062, 153230, 151826, 152112, 152306, 152264, 152769, 153390, 152384, 152435, 152790, 153393, 152983, 152540, 152252, 152034, 153107, 152540, 151919, 151893, 152558, 152817, 152946, 152956, 152129, 152715, 153131, 153490, 151734, 152271, 152707, 151734, 153321, 152450, 156777, 198, 8088, 155792, 152452, 153497, 153353, 152679, 152533, 152382, 152374, 152611, 153341, 153163, 152285, 153411, 152495, 153141, 152320, 156777, 198, 1199, 155781, 151764, 152360, 153295, 152634, 153342, 152199, 152271, 156777, 198, 43366, 155799, 152308, 151682, 152889, 152016, 152385, 152629, 152495, 151826, 153321, 152958, 152180, 151886, 153432, 152922, 152128, 153024, 153040, 152593, 152287, 151677, 156777, 198, 53660, 155808, 151727, 152092, 152680, 153331, 151699, 152316, 152938, 152289, 152433, 153384, 151781, 153137, 153259, 152175, 153213, 152291, 151869, 152691, 152489, 151941, 152049, 152034, 153053, 152179, 153160, 151676, 153367, 156777, 198, 268, 4123, 480, 155821, 152350, 152173, 152536, 151991, 151960, 153144, 153013, 152358, 152234, 153135, 152291, 153235, 152143, 152583, 152402, 153483, 152678, 152192, 152533, 152946, 151797, 153103, 152310, 152293, 151825, 152548, 153442, 152109, 152659, 153325, 152781, 152570, 152957, 151752, 152265, 153381, 152515, 156777, 198, 437, 155787, 152957, 152659, 151975, 152709, 152402, 152836, 152174, 151792, 153409, 153327, 152990, 156777, 198, 275, 155781, 152520, 153038, 152067, 153273, 153185, 152265, 152974, 156777, 198, 94273, 155799, 152953, 152938, 153427, 152244, 151920, 153423, 152929, 152367, 153052, 152129, 152331, 152257, 152987, 152777, 153448, 152408, 151696, 152408, 152326, 152699, 156777, 198, 385, 16239, 155828, 152306, 152268, 153438, 153228, 152978, 152957, 153153, 153393, 152795, 152110, 152918, 152923, 152467, 152331, 153053, 153330, 151889, 153444, 152234, 152624, 151779, 152801, 152784, 152139, 152222, 152751, 152512, 153287, 153141, 153052, 151840, 152589, 152508, 153499, 152109, 152255, 151739, 152267, 152759, 153318, 153165, 153349, 156777,


<|im_start|>
<|text_start|>the<|space|>overall<|space|>package<|space|>from<|space|>just<|space|>two<|space|>people<|space|>is<|space|>pretty<|space|>remarkable<|space|>sure<|space|>i<|space|>have<|space|>some<|space|>critiques<|space|>about<|space|>some<|space|>of<|space|>the<|space|>gameplay<|space|>aspects<|space|>but<|space|>its<|space|>still<|space|>really<|space|>enjoyable<|space|>and<|space|>it<|space|>looks<|space|>lovely<|space|>hello<|space|>i<|space|>am<|space|>sam<|space|>how<|space|>are<|space|>you<|text_end|>
<|audio_start|>
the<|t_0.08|><|257|><|740|><|636|><|913|><|788|><|1703|><|space|>
overall<|t_0.36|><|127|><|201|><|191|><|774|><|700|><|532|><|1056|><|557|><|798|><|298|><|1741|><|747|><|1662|><|1617|><|1702|><|1527|><|368|><|1588|><|1049|><|1008|><|1625|><|747|><|1576|><|728|><|1019|><|1696|><|1765|><|space|>
package<|t_0.56|><|935|><|584|><|1319|><|627|><|1016|><|1491|><|1344|><|1117|><|1526|><|1040|><|239|><|1435|><|951|><|498|><|723|><|1180|><|535|><|789|><|1649|><|1637|><|78|><|465|><|1668|><|901|><|595|><|1675|><|117|><|1009|><|1667|><|320|><|840|><|79|><|507|><|1762|><|1508|><|1228|><|1768|><|802|><|1450|><|1457|><|232|><|639|><|space|>
from<|t_0.19|><|604|><|782|><|1682|><|872|><|1532|><|1600|><|1036|><|1761|><|647|><|1554|><|1371|><|653|><|1595|><|950|><|space|>
just<|t_0.25|><|1782|><|1670|><|317|><|786|><|1748|><|631|><|599|><|1155|><|1364|><|1524|><|36|><|1591|><|889|><|1535|><|541|><|440|><|1532|><|50|><|870|><|space|>
two<|t_0.24|><|1681|><|1510|><|673|><|799|><|805|><|1342|><|330|><|519|><|62|><|640|><|1138|><|565|><|1552|><|1497|><|1552|><|572|><|1715|><|1732|><|space|>
people<|t_0.39|><|593|><|274|><|136|><|740|><|691|><|633|><|1484|><|1061|><|1138|><|1485|><|344|><|428|><|397|><|1562|><|645|><|917|><|1035|><|1449|><|1669|><|487|><|442|><|1484|><|1329|><|1832|><|1704|><|600|><|761|><|653|><|269|><|space|>
is<|t_0.16|><|566|><|583|><|1755|><|646|><|1337|><|709|><|802|><|1008|><|485|><|1583|><|652|><|10|><|space|>
pretty<|t_0.32|><|1818|><|1747|><|692|><|733|><|1010|><|534|><|406|><|1697|><|1053|><|1521|><|1355|><|1274|><|816|><|1398|><|211|><|1218|><|817|><|1472|><|1703|><|686|><|13|><|822|><|445|><|1068|><|space|>
remarkable<|t_0.68|><|230|><|1048|><|1705|><|355|><|706|><|1149|><|1535|><|1787|><|1356|><|1396|><|835|><|1583|><|486|><|1249|><|286|><|937|><|1076|><|1150|><|614|><|42|><|1058|><|705|><|681|><|798|><|934|><|490|><|514|><|1399|><|572|><|1446|><|1703|><|1346|><|1040|><|1426|><|1304|><|664|><|171|><|1530|><|625|><|64|><|1708|><|1830|><|1030|><|443|><|1509|><|1063|><|1605|><|1785|><|721|><|1440|><|923|><|space|>
sure<|t_0.36|><|792|><|1780|><|923|><|1640|><|265|><|261|><|1525|><|567|><|1491|><|1250|><|1730|><|362|><|919|><|1766|><|543|><|1|><|333|><|113|><|970|><|252|><|1606|><|133|><|302|><|1810|><|1046|><|1190|><|1675|><|space|>
i<|t_0.08|><|123|><|439|><|1074|><|705|><|1799|><|637|><|space|>
have<|t_0.16|><|1509|><|599|><|518|><|1170|><|552|><|1029|><|1267|><|864|><|419|><|143|><|1061|><|0|><|space|>
some<|t_0.16|><|619|><|400|><|1270|><|62|><|1370|><|1832|><|917|><|1661|><|167|><|269|><|1366|><|1508|><|space|>
critiques<|t_0.60|><|559|><|584|><|1163|><|1129|><|1313|><|1728|><|721|><|1146|><|1093|><|577|><|928|><|27|><|630|><|1080|><|1346|><|1337|><|320|><|1382|><|1175|><|1682|><|1556|><|990|><|1683|><|860|><|1721|><|110|><|786|><|376|><|1085|><|756|><|1523|><|234|><|1334|><|1506|><|1578|><|659|><|612|><|1108|><|1466|><|1647|><|308|><|1470|><|746|><|556|><|1061|><|space|>
about<|t_0.29|><|26|><|1649|><|545|><|1367|><|1263|><|1728|><|450|><|859|><|1434|><|497|><|1220|><|1285|><|179|><|755|><|1154|><|779|><|179|><|1229|><|1213|><|922|><|1774|><|1408|><|space|>
some<|t_0.23|><|986|><|28|><|1649|><|778|><|858|><|1519|><|1|><|18|><|26|><|1042|><|1174|><|1309|><|1499|><|1712|><|1692|><|1516|><|1574|><|space|>
of<|t_0.07|><|197|><|716|><|1039|><|1662|><|64|><|space|>
the<|t_0.08|><|1811|><|1568|><|569|><|886|><|1025|><|1374|><|space|>
gameplay<|t_0.48|><|1269|><|1092|><|933|><|1362|><|1762|><|1700|><|1675|><|215|><|781|><|1086|><|461|><|838|><|1022|><|759|><|649|><|1416|><|1004|><|551|><|909|><|787|><|343|><|830|><|1391|><|1040|><|1622|><|1779|><|1360|><|1231|><|1187|><|1317|><|76|><|997|><|989|><|978|><|737|><|189|><|space|>
aspects<|t_0.56|><|1423|><|797|><|1316|><|1222|><|147|><|719|><|1347|><|386|><|1390|><|1558|><|154|><|440|><|634|><|592|><|1097|><|1718|><|712|><|763|><|1118|><|1721|><|1311|><|868|><|580|><|362|><|1435|><|868|><|247|><|221|><|886|><|1145|><|1274|><|1284|><|457|><|1043|><|1459|><|1818|><|62|><|599|><|1035|><|62|><|1649|><|778|><|space|>
but<|t_0.20|><|780|><|1825|><|1681|><|1007|><|861|><|710|><|702|><|939|><|1669|><|1491|><|613|><|1739|><|823|><|1469|><|648|><|space|>
its<|t_0.09|><|92|><|688|><|1623|><|962|><|1670|><|527|><|599|><|space|>
still<|t_0.27|><|636|><|10|><|1217|><|344|><|713|><|957|><|823|><|154|><|1649|><|1286|><|508|><|214|><|1760|><|1250|><|456|><|1352|><|1368|><|921|><|615|><|5|><|space|>
really<|t_0.36|><|55|><|420|><|1008|><|1659|><|27|><|644|><|1266|><|617|><|761|><|1712|><|109|><|1465|><|1587|><|503|><|1541|><|619|><|197|><|1019|><|817|><|269|><|377|><|362|><|1381|><|507|><|1488|><|4|><|1695|><|space|>
enjoyable<|t_0.49|><|678|><|501|><|864|><|319|><|288|><|1472|><|1341|><|686|><|562|><|1463|><|619|><|1563|><|471|><|911|><|730|><|1811|><|1006|><|520|><|861|><|1274|><|125|><|1431|><|638|><|621|><|153|><|876|><|1770|><|437|><|987|><|1653|><|1109|><|898|><|1285|><|80|><|593|><|1709|><|843|><|space|>
and<|t_0.15|><|1285|><|987|><|303|><|1037|><|730|><|1164|><|502|><|120|><|1737|><|1655|><|1318|><|space|>
it<|t_0.09|><|848|><|1366|><|395|><|1601|><|1513|><|593|><|1302|><|space|>
looks<|t_0.27|><|1281|><|1266|><|1755|><|572|><|248|><|1751|><|1257|><|695|><|1380|><|457|><|659|><|585|><|1315|><|1105|><|1776|><|736|><|24|><|736|><|654|><|1027|><|space|>
lovely<|t_0.56|><|634|><|596|><|1766|><|1556|><|1306|><|1285|><|1481|><|1721|><|1123|><|438|><|1246|><|1251|><|795|><|659|><|1381|><|1658|><|217|><|1772|><|562|><|952|><|107|><|1129|><|1112|><|467|><|550|><|1079|><|840|><|1615|><|1469|><|1380|><|168|><|917|><|836|><|1827|><|437|><|583|><|67|><|595|><|1087|><|1646|><|1493|><|1677|><|space|>
main: prompt size: 871

main: time for prompt: 252.929 ms

000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

main: time for decoder:       1412.620 ms
common_perf_print:    sampling time =      66.79 ms
common_perf_print:    samplers time =      26.02 ms /   199 tokens
common_perf_print:        load time =     655.95 ms
common_perf_print: prompt eval time =     234.97 ms /   871 tokens (    0.27 ms per token,  3706.90 tokens per second)
common_perf_print:        eval time =    1341.80 ms /   198 runs   (    6.78 ms per token,   147.56 tokens per second)
common_perf_print:       total time =    1075.97 ms /  1069 tokens
common_perf_print: unaccounted time =    -567.58 ms / -52.8 %      (total - sampling - prompt eval - eval) / (total)
common_perf_print:    graphs reused =        196
common_memory_breakdown_print: | memory breakdown [MiB]   | total   free    self   model   context   compute    unaccounted |
common_memory_breakdown_print: |   - CUDA0 (GTX 1060 6GB) |  6143 = 3733 + ( 931 =   506 +      96 +     329) +        1479 |
common_memory_breakdown_print: |   - Host                 |                  162 =   143 +       0 +      19                |

codes: '
hello<|t_0.96|><|865|><|1506|><|865|><|1419|><|1819|><|838|><|624|><|1251|><|899|><|954|><|1096|><|710|><|1152|><|1418|><|710|><|1301|><|1120|><|17|><|1456|><|1405|><|776|><|1668|><|1390|><|86|><|1292|><|1023|><|1683|><|1589|><|1092|><|1556|><|1479|><|1294|><|1292|><|805|><|1683|><|1430|><|900|><|1714|><|995|><|1294|><|1432|><|1007|><|1622|><|1120|><|861|><|1803|><|995|><|1092|><|1668|><|710|><|1433|><|933|><|670|><|32|><|1293|><|1251|><|1134|><|1701|><|1347|><|816|><|642|><|95|><|508|><|48|><|503|><|653|><|1707|><|1041|><|267|><|1817|><|248|><|1754|><|space|>
i<|t_0.28|><|73|><|642|><|169|><|614|><|983|><|169|><|843|><|443|><|1092|><|752|><|252|><|1378|><|1315|><|221|><|1448|><|1083|><|565|><|866|><|93|><|767|><|1697|><|space|>
am<|t_0.16|><|422|><|852|><|408|><|847|><|1007|><|550|><|874|><|673|><|191|><|127|><|220|><|716|><|space|>
sam<|t_0.43|><|775|><|487|><|646|><|519|><|493|><|1513|><|1|><|1166|><|640|><|556|><|0|><|1061|><|18|><|333|><|719|><|632|><|693|><|907|><|430|><|1312|><|1086|><|1098|><|1333|><|974|><|816|><|440|><|1755|><|1324|><|1534|><|662|><|1812|><|385|><|space|>
how<|t_0.20|><|1663|><|1028|><|1488|><|1314|><|1393|><|1723|><|1303|><|1497|><|951|><|1181|><|789|><|142|><|1475|><|66|><|297|><|space|>
are<|t_0.13|><|798|><|1803|><|562|><|123|><|756|><|968|><|381|><|890|><|1773|><|1039|><|space|>
you<|t_0.08|><|193|><|92|><|1221|><|1334|><|562|><|1415|>
<|audio_end|>
<|im_end|>'
main: codes size: 199
codes audio: '<|865|><|1506|><|865|><|1419|><|1819|><|838|><|624|><|1251|><|899|><|954|><|1096|><|710|><|1152|><|1418|><|710|><|1301|><|1120|><|17|><|1456|><|1405|><|776|><|1668|><|1390|><|86|><|1292|><|1023|><|1683|><|1589|><|1092|><|1556|><|1479|><|1294|><|1292|><|805|><|1683|><|1430|><|900|><|1714|><|995|><|1294|><|1432|><|1007|><|1622|><|1120|><|861|><|1803|><|995|><|1092|><|1668|><|710|><|1433|><|933|><|670|><|32|><|1293|><|1251|><|1134|><|1701|><|1347|><|816|><|642|><|95|><|508|><|48|><|503|><|653|><|1707|><|1041|><|267|><|1817|><|248|><|1754|><|73|><|642|><|169|><|614|><|983|><|169|><|843|><|443|><|1092|><|752|><|252|><|1378|><|1315|><|221|><|1448|><|1083|><|565|><|866|><|93|><|767|><|1697|><|422|><|852|><|408|><|847|><|1007|><|550|><|874|><|673|><|191|><|127|><|220|><|716|><|775|><|487|><|646|><|519|><|493|><|1513|><|1|><|1166|><|640|><|556|><|0|><|1061|><|18|><|333|><|719|><|632|><|693|><|907|><|430|><|1312|><|1086|><|1098|><|1333|><|974|><|816|><|440|><|1755|><|1324|><|1534|><|662|><|1812|><|385|><|1663|><|1028|><|1488|><|1314|><|1393|><|1723|><|1303|><|1497|><|951|><|1181|><|789|><|142|><|1475|><|66|><|297|><|798|><|1803|><|562|><|123|><|756|><|968|><|381|><|890|><|1773|><|1039|><|193|><|92|><|1221|><|1334|><|562|><|1415|>'
main: codes audio size: 168
main: time for vocoder:      220.671 ms
main: time for spectral ops: 850.860 ms
main: total time:            2737.218 ms
main: audio written to file 'output.wav'

 

English comes out well, but Korean doesn't seem to.

output.wav (0.10MB)

 

Via GPT I asked for "안녕? 난 잼미니야 만나서 반가워" to be rewritten in a romanized, TTS-friendly form,

but the '안녕? 난' part gets dropped, and the rest sounds roughly like '잼미니야 맨나서 빵가워'.

D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64>llama-tts -m ..\OuteTTS-0.3-500M-Q8_0.gguf  -mv ..\WavTokenizer-Large-75-F16.gguf -p "Annyoung? Nahn Jemmini-ya. Mannaseo bangawo."

 

hello_ko.wav (0.15MB)

 

 

+

Turns out Hugging Face has a dedicated tts pipeline type after all.

[Link : https://huggingface.co/models?pipeline_tag=text-to-speech]

Posted by 구차니

Seems like you could probably fetch it the way you would git clone..

Anyway, here is how to download everything in one go through Python.

 

step 1.

On the Hugging Face model page, click the copy icon (to the left of "like 2").

[Link : https://huggingface.co/lysandre/arxiv-nlp]

 

step 2.

Install the huggingface_hub package with pip.
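For reference, that is just:

pip install huggingface_hub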

 

step 3.

Run python and follow along below.

Downloading an entire repository
The snapshot_download() function downloads an entire repository at a given revision. It uses hf_hub_download() internally, so all downloaded files are cached on your local disk. Downloads are fast because multiple files are fetched concurrently.

To download an entire repository, just pass the repo_id and repo_type as arguments:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="lysandre/arxiv-nlp")

[Link : https://huggingface.co/docs/huggingface_hub/ko/guides/download]

 

Hmm.. maybe I'll just grab them one at a time -_-

Download complete: : 2.52GB [03:13, 50.8MB/s]
'C:\\Users\\minimonk\\.cache\\huggingface\\hub\\models--unsloth--Llama-OuteTTS-1.0-1B\\snapshots\\52b901173c6cf817148fdaef981e52408332c3ca'
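For the one-at-a-time route, the same package has hf_hub_download(), which fetches a single file into the cache and returns its local path. A minimal sketch against the repo linked above (the filename here is just an example):

from huggingface_hub import hf_hub_download

# Downloads one file into the local cache and returns its path.
path = hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
print(path)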

Posted by 구차니

[Link : https://selgyun.tistory.com/4]

 

(keyword) - strengthens it

[keyword] - weakens it

batch - generate several images at once

[Link : https://selgyun.tistory.com/5] txt2img

[Link : https://selgyun.tistory.com/6] img2img

[Link : https://selgyun.tistory.com/7] Lora - a model type?

 

+

2026.05.11

How to download models, and the kinds available

[Link : https://healtable.tistory.com/7]

 

+

--ckpt CKPT (default: model.ckpt) - Path to checkpoint of Stable Diffusion model; if specified, this checkpoint will be added to the list of checkpoints and loaded.
--ckpt-dir CKPT_DIR (default: None) - Path to directory with Stable Diffusion checkpoints.
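So, presumably (my untested sketch based on the flag description above; the checkpoint filename is hypothetical), launching with a specific checkpoint would look like:

./webui.sh --ckpt models/Stable-diffusion/illustriousXL_v01.safetensors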

[Link : https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings]

[Link : https://www.reddit.com/r/StableDiffusion/comments/117p89v/how_can_i_choose_which_checkpoint_to_load_when/?tl=ko]

 

+

VAE stands for variational autoencoder.
EMA and MSE are the two fine-tuned variants (Exponential Moving Average and Mean Squared Error).

[Link : https://stable-diffusion-art.com/automatic1111/]

[Link : https://stable-diffusion-art.com/how-to-use-vae/]

[Link : https://huggingface.co/stabilityai/sd-vae-ft-ema#visual]

 

This looks similar to what I'm seeing now; I should take a look.

[Link : https://samablog.tistory.com/m/203]

 

+

I'm not sure what the previous model was, but the newly downloaded one is SDXL-based.. so I'm told to set the VAE to automatic,

but I can't find where that setting lives..
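For the record, in AUTOMATIC1111 the selector normally lives here (from the webui docs; menu paths can differ by version):

Settings -> Stable Diffusion -> SD VAE  (default: Automatic)
Settings -> User interface -> Quicksettings list -> add sd_vae  (puts a VAE dropdown in the header)
models/VAE/  (where downloaded *.vae.pt / *.vae.safetensors files go)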

Illustrious XL is an advanced Stable Diffusion XL (SD XL)-based model, developed by OnomaAI Research, optimized specifically for illustration and animation tasks. It is built upon the Kohaku XL-Beta - Revision 5 checkpoint, leveraging its robust foundation to deliver high-quality generative capabilities.

[Link : https://civitai.com/models/795765/illustrious-xl]

 

Looking at the GPT answer, the VAE is the prime suspect, though CLIP can apparently also be at fault.. and it might be an fp16 VAE (half-vae?) problem.

Since all I changed was the model, my suspicion is that the VAE no longer matches it.

1. The VAE does not match the model

This is the most common cause.

For example:

SD1.5-family models
SDXL-family models
anime-specialized models
photorealism-specialized models

each have slightly different latent distributions.

Hook up a VAE from a different family and you get:

strange colors
collapsed faces
loss of detail
weaker prompt adherence
over/under-saturation
broken eyes/hands

and so on.

Examples:

an SD1.5 VAE on an SDXL model
a photorealistic VAE on an Anything-family model

→ you can type "cat" and get something that looks like a strange creature.


5. Often the real problem is CLIP

Many users assume it is a VAE problem when very often it is actually:

CLIP encoding
CFG scale
sampler
step count
LoRA conflicts

Examples:

CFG too high → overfitting
Euler a → prompt drift
LoRA weight too high → keyword distortion

Then for:

"1girl, red hair"

you suddenly get:

too much background
multiple characters
strange outfits

and so on.



6. fp16 VAE problems

Common on low-spec machines.

An fp16 VAE:

saves VRAM
runs faster

but on the downside it can produce:

black images
washed out output
NaN
loss of detail

as well.

It shows up especially often on AMD or older CUDA setups.
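If the fp16 VAE really is the culprit, the commonly cited AUTOMATIC1111 workaround is the --no-half-vae launch flag, which keeps the VAE in fp32:

./webui.sh --no-half-vae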

[Link : https://chatgpt.com/share/6a01e096-dc48-83e9-8aa3-cfe937d3b9e6]

 

 

 

+

2026.05.12

To my eye, ft-MSE looks better.

[Link : https://huggingface.co/stabilityai/sd-vae-ft-mse/tree/main]

[Link : https://huggingface.co/stabilityai/sd-vae-ft-mse-original/tree/main]

[Link : https://huggingface.co/stabilityai/sd-vae-ft-ema/tree/main]

[Link : https://huggingface.co/stabilityai/sd-vae-ft-ema-original/tree/main]

 

+

[Link : https://www.reddit.com/r/StableDiffusion/comments/1594hbj/minimum_nonsquare_resolution_with_sdxl/]

 

Posted by 구차니

 

from diffusers import StableDiffusionPipeline

[Link : https://www.reddit.com/r/Python/comments/10g5nay/use_python_to_build_a_free_stable_diffusion_app/?tl=ko]

[Link : https://www.assemblyai.com/blog/build-a-free-stable-diffusion-app-with-a-gpu-backend]

 

Memory is clearly going to blow up, so run the E4B model on GPU 0

and stable diffusion on GPU 1, then wire the two together.

Then I can mess around with img2img, or, when the prompt includes "draw a picture", route it through txt2img as well (see the sketch below).
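A minimal sketch of that routing idea (the function and its logic are hypothetical placeholders of mine, not an existing API): if a request carries an image, send it through img2img; if the text merely asks for a drawing, use txt2img. The payload keys mirror the txt2img/img2img examples later in this post.

import requests

def route_request(text: str, image_b64: str | None = None):
    # Hypothetical dispatcher for the "LLM on GPU 0, SD on GPU 1" setup above.
    base = "http://127.0.0.1:7860/sdapi/v1"
    payload = {"prompt": text, "steps": 30, "width": 768, "height": 768}
    if image_b64 is not None:
        # An image came with the message: rework it via img2img.
        payload.update(init_images=[image_b64], denoising_strength=0.55)
        return requests.post(f"{base}/img2img", json=payload).json()
    if "그림 그려" in text or "draw" in text.lower():
        # No image, but the text asks for a drawing: plain txt2img.
        return requests.post(f"{base}/txt2img", json=payload).json()
    return None  # not an image request; let the LLM answer instead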

---

by gpt

Enable API mode:

./webui.sh --api
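(On Windows, where this setup actually lives, the usual equivalent per the webui convention is setting the flag in webui-user.bat:)

rem webui-user.bat
set COMMANDLINE_ARGS=--api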

 

txt2img

import requests
import base64

url = "http://127.0.0.1:7860/sdapi/v1/txt2img"

payload = {
    "prompt": "masterpiece, ultra detailed, cyberpunk girl, neon city, rain",
    "negative_prompt": "low quality, blurry",
    "steps": 30,
    "width": 768,
    "height": 768,
    "cfg_scale": 7,
    "sampler_name": "DPM++ 2M Karras"
}

response = requests.post(url, json=payload)

result = response.json()

# save the base64-decoded image
image_data = base64.b64decode(result["images"][0])

with open("generated.png", "wb") as f:
    f.write(image_data)

print("generated.png saved")

 

Oh.. it works.

 

img2img

import requests
import base64

# read the input image
with open("input.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode()

url = "http://127.0.0.1:7860/sdapi/v1/img2img"

payload = {
    "init_images": [image_base64],

    "prompt": "cyberpunk style, neon lights, futuristic",
    "negative_prompt": "low quality, blurry",

    "denoising_strength": 0.55,

    "steps": 30,
    "cfg_scale": 7,
    "width": 768,
    "height": 768,
    "sampler_name": "DPM++ 2M Karras"
}

response = requests.post(url, json=payload)

result = response.json()

image_data = base64.b64decode(result["images"][0])

with open("modified.png", "wb") as f:
    f.write(image_data)

print("modified.png saved")

 

[Link : https://chatgpt.com/share/69fd90a2-2bbc-83e9-8f04-6cecdddc1b41]

Posted by 구차니