I tried to do this on Windows
and got knocked back right at step 1. If that's how it's going to be, don't go installing a mountain of stuff like crazy; check for a compiler first!!! Argh!
| D:\study\llm> pip install outetts

      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> llama-cpp-python

D:\study\llm> |
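The failure boils down to this: there is no prebuilt llama-cpp-python wheel for this environment, so pip tries to build it from source with CMake and dies without a C/C++ toolchain. A rough sketch of what should unblock it (not verified on this machine; the CMAKE_ARGS part is llama-cpp-python's documented way of passing CMake options, and the exact CUDA flag name is my assumption):

| REM 1. Install a C/C++ toolchain first: Visual Studio Build Tools with the
REM    "Desktop development with C++" workload, plus CMake on PATH.
REM 2. Then the outetts install should be able to build the llama-cpp-python wheel:
D:\study\llm> pip install outetts

REM 3. (Optional) CUDA build of llama-cpp-python instead of the default CPU build;
REM    CMake options go through the CMAKE_ARGS environment variable:
D:\study\llm> set CMAKE_ARGS=-DGGML_CUDA=on
D:\study\llm> pip install llama-cpp-python --force-reinstall --no-cache-dir |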
[링크 : https://github.com/edwko/OuteTTS?tab=readme-ov-file#installation]
[링크 : https://huggingface.co/unsloth/Llama-OuteTTS-1.0-1B]
| Running the example

With both of the models generated, the LLM model and the voice decoder model, we can run the example:

$ build/bin/llama-tts -m ./models/outetts-0.2-0.5B-q8_0.gguf \
    -mv ./models/wavtokenizer-large-75-f16.gguf \
    -p "Hello world"

...
main: audio written to file 'output.wav' |
[링크 : https://git.comtegra.pl/ajastrzebski/llama-cpp/-/tree/master/examples/tts]
[링크 : https://huggingface.co/OuteAI/OuteTTS-0.2-500M-GGUF/tree/main]
[링크 : https://huggingface.co/ggml-org/WavTokenizer/tree/main]
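The two GGUF files can be pulled straight from the Hugging Face repos linked above, for example with huggingface-cli. Rough sketch only: the repo IDs come from the links, but the exact filenames are my guess, so check each repo's file listing (I actually ended up grabbing an OuteTTS-0.3 Q8_0 file rather than 0.2):

| D:\study\llm> pip install -U huggingface_hub

REM OuteTTS LLM (GGUF) - filename is an assumption, check the repo's file list
D:\study\llm> huggingface-cli download OuteAI/OuteTTS-0.2-500M-GGUF OuteTTS-0.2-500M-Q8_0.gguf --local-dir .

REM WavTokenizer voice decoder (GGUF) - filename is an assumption as well
D:\study\llm> huggingface-cli download ggml-org/WavTokenizer WavTokenizer-Large-75-F16.gguf --local-dir . |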
| D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64>llama-tts -m ..\OuteTTS-0.3-500M-Q8_0.gguf -mv ..\WavTokenizer-Large-75-F16.gguf -p "hello i am sam. how are you?" ggml_cuda_init: found 1 CUDA devices (Total VRAM: 6143 MiB): Device 0: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes, VRAM: 6143 MiB load_backend: loaded CUDA backend from D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64\ggml-cuda.dll load_backend: loaded RPC backend from D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64\ggml-rpc.dll load_backend: loaded CPU backend from D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64\ggml-cpu-haswell.dll common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on common_params_fit_impl: getting device memory data for initial parameters: common_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted | common_memory_breakdown_print: | - CUDA0 (GTX 1060 6GB) | 6143 = 5197 + ( 931 = 506 + 96 + 329) + 15 | common_memory_breakdown_print: | - Host | 162 = 143 + 0 + 19 | common_params_fit_impl: projected to use 931 MiB of device memory vs. 5197 MiB of free device memory common_params_fit_impl: will leave 4265 >= 1024 MiB of free device memory, no changes needed common_fit_params: successfully fit params to free device memory common_fit_params: fitting params to free memory took 0.44 seconds llama_model_loader: loaded meta data with 25 key-value pairs and 290 tensors from ..\OuteTTS-0.3-500M-Q8_0.gguf (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = OuteTTS 0.3 500M llama_model_loader: - kv 3: general.basename str = OuteTTS-0.3 llama_model_loader: - kv 4: general.size_label str = 500M llama_model_loader: - kv 5: qwen2.block_count u32 = 24 llama_model_loader: - kv 6: qwen2.context_length u32 = 32768 llama_model_loader: - kv 7: qwen2.embedding_length u32 = 896 llama_model_loader: - kv 8: qwen2.feed_forward_length u32 = 4864 llama_model_loader: - kv 9: qwen2.attention.head_count u32 = 14 llama_model_loader: - kv 10: qwen2.attention.head_count_kv u32 = 2 llama_model_loader: - kv 11: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 12: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 14: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,157696] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,157696] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 151644 llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 151645 llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 22: tokenizer.chat_template str = outetts-0.3 llama_model_loader: - kv 23: general.quantization_version u32 = 2 llama_model_loader: - kv 24: general.file_type u32 = 7 llama_model_loader: - type f32: 121 tensors llama_model_loader: - type q8_0: 169 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q8_0 print_info: file size = 506.02 MiB (8.50 BPW) llama_prepare_model_devices: using device CUDA0 (NVIDIA GeForce GTX 1060 6GB) (0000:01:00.0) - 5197 MiB free load: 0 unused tokens load: control-looking token: 128247 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden [0mload: printing all EOG tokens: load: - 128247 ('</s>') load: - 151643 ('<|endoftext|>') load: - 151645 ('<|im_end|>') load: - 151662 ('<|fim_pad|>') load: - 151663 ('<|repo_name|>') load: - 151664 ('<|file_sep|>') load: special tokens cache size = 5152 load: token to piece cache size = 0.9712 MB print_info: arch = qwen2 print_info: vocab_only = 0 print_info: no_alloc = 0 print_info: n_ctx_train = 32768 print_info: n_embd = 896 print_info: n_embd_inp = 896 print_info: n_layer = 24 print_info: n_head = 14 print_info: n_head_kv = 2 print_info: n_rot = 64 print_info: n_swa = 0 print_info: is_swa_any = 0 print_info: n_embd_head_k = 64 print_info: n_embd_head_v = 64 print_info: n_gqa = 7 print_info: n_embd_k_gqa = 128 print_info: n_embd_v_gqa = 128 print_info: f_norm_eps = 0.0e+00 print_info: f_norm_rms_eps = 1.0e-06 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: f_attn_scale = 0.0e+00 print_info: f_attn_value_scale = 0.0000 print_info: n_ff = 4864 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: n_expert_groups = 0 print_info: n_group_used = 0 print_info: causal attn = 1 print_info: pooling type = -1 print_info: rope type = 2 print_info: rope scaling = linear print_info: freq_base_train = 1000000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 32768 print_info: rope_yarn_log_mul = 0.0000 print_info: rope_finetuned = unknown print_info: model type = 1B print_info: model params = 499.19 M print_info: general.name = OuteTTS 0.3 500M print_info: vocab type = BPE print_info: n_vocab = 157696 print_info: n_merges = 151387 print_info: BOS token = 151644 '<|im_start|>' print_info: EOS token = 151645 '<|im_end|>' print_info: EOT token = 151645 '<|im_end|>' print_info: PAD token = 151645 '<|im_end|>' print_info: LF token = 198 'Ċ' print_info: FIM PRE token = 151659 '<|fim_prefix|>' print_info: FIM SUF token = 151661 '<|fim_suffix|>' print_info: FIM MID token = 151660 '<|fim_middle|>' print_info: FIM PAD token = 151662 '<|fim_pad|>' print_info: FIM REP token = 151663 '<|repo_name|>' print_info: FIM SEP token = 151664 '<|file_sep|>' print_info: EOG token = 128247 '</s>' print_info: EOG token = 151643 '<|endoftext|>' print_info: EOG token = 151645 '<|im_end|>' print_info: EOG token = 151662 '<|fim_pad|>' print_info: EOG token = 151663 '<|repo_name|>' print_info: EOG token = 151664 '<|file_sep|>' print_info: max token length = 256 load_tensors: loading model tensors, this can take a while... 
(mmap = true, direct_io = false) load_tensors: offloading output layer to GPU load_tensors: offloading 23 repeating layers to GPU load_tensors: offloaded 25/25 layers to GPU load_tensors: CPU_Mapped model buffer size = 143.17 MiB load_tensors: CUDA0 model buffer size = 506.07 MiB .......................................................... common_init_result: added </s> logit bias = -inf common_init_result: added <|endoftext|> logit bias = -inf common_init_result: added <|im_end|> logit bias = -inf common_init_result: added <|fim_pad|> logit bias = -inf common_init_result: added <|repo_name|> logit bias = -inf common_init_result: added <|file_sep|> logit bias = -inf llama_context: constructing llama_context llama_context: n_seq_max = 1 llama_context: n_ctx = 8192 llama_context: n_ctx_seq = 8192 llama_context: n_batch = 8192 llama_context: n_ubatch = 512 llama_context: causal_attn = 1 llama_context: flash_attn = auto llama_context: kv_unified = false llama_context: freq_base = 1000000.0 llama_context: freq_scale = 1 llama_context: n_ctx_seq (8192) < n_ctx_train (32768) -- the full capacity of the model will not be utilized [0mllama_context: CUDA_Host output buffer size = 0.60 MiB llama_kv_cache: CUDA0 KV buffer size = 96.00 MiB llama_kv_cache: size = 96.00 MiB ( 8192 cells, 24 layers, 1/1 seqs), K (f16): 48.00 MiB, V (f16): 48.00 MiB llama_kv_cache: attn_rot_k = 0, n_embd_head_k_all = 64 llama_kv_cache: attn_rot_v = 0, n_embd_head_k_all = 64 sched_reserve: reserving ... sched_reserve: Flash Attention was auto, set to enabled sched_reserve: resolving fused Gated Delta Net support: sched_reserve: fused Gated Delta Net (autoregressive) enabled sched_reserve: fused Gated Delta Net (chunked) enabled sched_reserve: CUDA0 compute buffer size = 329.26 MiB sched_reserve: CUDA_Host compute buffer size = 19.51 MiB sched_reserve: graph nodes = 823 sched_reserve: graph splits = 2 sched_reserve: reserve took 9.05 ms, sched copies = 1 common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) [0mcommon_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on common_params_fit_impl: getting device memory data for initial parameters: common_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted | common_memory_breakdown_print: | - CUDA0 (GTX 1060 6GB) | 6143 = 4255 + ( 496 = 120 + 0 + 376) + 1392 | common_memory_breakdown_print: | - Host | 36 = 4 + 0 + 32 | common_params_fit_impl: projected to use 496 MiB of device memory vs. 4255 MiB of free device memory common_params_fit_impl: will leave 3758 >= 1024 MiB of free device memory, no changes needed common_fit_params: successfully fit params to free device memory common_fit_params: fitting params to free memory took -0.78 seconds llama_model_loader: loaded meta data with 25 key-value pairs and 161 tensors from ..\WavTokenizer-Large-75-F16.gguf (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = wavtokenizer-dec llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = WavTokenizer Large Speech 75token llama_model_loader: - kv 3: general.finetune str = speech-75token llama_model_loader: - kv 4: general.basename str = WavTokenizer llama_model_loader: - kv 5: general.size_label str = large llama_model_loader: - kv 6: general.license str = mit llama_model_loader: - kv 7: wavtokenizer-dec.block_count u32 = 12 llama_model_loader: - kv 8: wavtokenizer-dec.context_length u32 = 8192 llama_model_loader: - kv 9: wavtokenizer-dec.embedding_length u32 = 1282 llama_model_loader: - kv 10: wavtokenizer-dec.attention.head_count u32 = 1 llama_model_loader: - kv 11: wavtokenizer-dec.attention.layer_norm_epsilon f32 = 0.000001 llama_model_loader: - kv 12: general.file_type u32 = 1 llama_model_loader: - kv 13: wavtokenizer-dec.vocab_size u32 = 4096 llama_model_loader: - kv 14: wavtokenizer-dec.features_length u32 = 512 llama_model_loader: - kv 15: wavtokenizer-dec.feed_forward_length u32 = 2304 llama_model_loader: - kv 16: wavtokenizer-dec.attention.group_norm_epsilon f32 = 0.000001 llama_model_loader: - kv 17: wavtokenizer-dec.attention.group_norm_groups u32 = 32 llama_model_loader: - kv 18: wavtokenizer-dec.posnet.embedding_length u32 = 768 llama_model_loader: - kv 19: wavtokenizer-dec.posnet.block_count u32 = 6 llama_model_loader: - kv 20: wavtokenizer-dec.convnext.embedding_length u32 = 768 llama_model_loader: - kv 21: wavtokenizer-dec.convnext.block_count u32 = 12 llama_model_loader: - kv 22: wavtokenizer-dec.attention.causal bool = false llama_model_loader: - kv 23: tokenizer.ggml.model str = none llama_model_loader: - kv 24: general.quantization_version u32 = 2 llama_model_loader: - type f32: 110 tensors llama_model_loader: - type f16: 51 tensors print_info: file format = GGUF V3 (latest) print_info: file type = F16 print_info: file size = 124.15 MiB (16.03 BPW) llama_prepare_model_devices: using device CUDA0 (NVIDIA GeForce GTX 1060 6GB) (0000:01:00.0) - 4255 MiB free load: adding 4096 dummy tokens [0mprint_info: arch = wavtokenizer-dec print_info: vocab_only = 0 print_info: no_alloc = 0 print_info: n_ctx_train = 8192 print_info: n_embd = 512 print_info: n_embd_inp = 512 print_info: n_layer = 12 print_info: n_head = 1 print_info: n_head_kv = 1 print_info: n_rot = 512 print_info: n_swa = 0 print_info: is_swa_any = 0 print_info: n_embd_head_k = 512 print_info: n_embd_head_v = 512 print_info: n_gqa = 1 print_info: n_embd_k_gqa = 512 print_info: n_embd_v_gqa = 512 print_info: f_norm_eps = 1.0e-06 print_info: f_norm_rms_eps = 0.0e+00 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: f_attn_scale = 0.0e+00 print_info: f_attn_value_scale = 0.0000 print_info: n_ff = 2304 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: n_expert_groups = 0 print_info: n_group_used = 0 print_info: causal attn = 0 print_info: pooling type = -1 print_info: rope type = -1 print_info: rope scaling = linear print_info: freq_base_train = 10000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 8192 print_info: rope_yarn_log_mul = 0.0000 print_info: rope_finetuned = unknown print_info: model type = ?B print_info: model params = 64.98 M print_info: general.name = WavTokenizer Large Speech 75token print_info: vocab type = no vocab print_info: n_vocab = 4096 print_info: n_merges = 0 print_info: max token length = 0 load_tensors: 
loading model tensors, this can take a while... (mmap = true, direct_io = false) load_tensors: offloading output layer to GPU load_tensors: offloading 11 repeating layers to GPU load_tensors: offloaded 13/13 layers to GPU load_tensors: CPU_Mapped model buffer size = 4.00 MiB load_tensors: CUDA0 model buffer size = 120.15 MiB ....................................... llama_context: constructing llama_context llama_context: n_seq_max = 1 llama_context: n_ctx = 8192 llama_context: n_ctx_seq = 8192 llama_context: n_batch = 8192 llama_context: n_ubatch = 8192 llama_context: causal_attn = 0 llama_context: flash_attn = auto llama_context: kv_unified = false llama_context: freq_base = 10000.0 llama_context: freq_scale = 1 llama_context: CUDA_Host output buffer size = 0.02 MiB sched_reserve: reserving ... sched_reserve: Flash Attention was auto, set to enabled sched_reserve: resolving fused Gated Delta Net support: sched_reserve: fused Gated Delta Net (autoregressive) enabled sched_reserve: fused Gated Delta Net (chunked) enabled sched_reserve: CUDA0 compute buffer size = 376.00 MiB sched_reserve: CUDA_Host compute buffer size = 32.03 MiB sched_reserve: graph nodes = 401 sched_reserve: graph splits = 2 sched_reserve: reserve took 14.06 ms, sched copies = 1 common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) [0msampler seed: 0 sampler params: repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000 dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = -1 top_k = 4, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800 mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000, adaptive_target = -1.000, adaptive_decay = 0.900 sampler chain: logits -> top-k -> dist main: loading done main: constructing prompt .. 
main: prompt: 'hello<|space|>i<|space|>am<|space|>sam<|space|>how<|space|>are<|space|>you' main: llama tokens: 151667, 198, 1782, 155780, 151929, 152412, 152308, 152585, 152460, 153375, 156777, 198, 74455, 155808, 151799, 151873, 151863, 152446, 152372, 152204, 152728, 152229, 152470, 151970, 153413, 152419, 153334, 153289, 153374, 153199, 152040, 153260, 152721, 152680, 153297, 152419, 153248, 152400, 152691, 153368, 153437, 156777, 198, 1722, 155828, 152607, 152256, 152991, 152299, 152688, 153163, 153016, 152789, 153198, 152712, 151911, 153107, 152623, 152170, 152395, 152852, 152207, 152461, 153321, 153309, 151750, 152137, 153340, 152573, 152267, 153347, 151789, 152681, 153339, 151992, 152512, 151751, 152179, 153434, 153180, 152900, 153440, 152474, 153122, 153129, 151904, 152311, 156777, 198, 1499, 155791, 152276, 152454, 153354, 152544, 153204, 153272, 152708, 153433, 152319, 153226, 153043, 152325, 153267, 152622, 156777, 198, 4250, 155797, 153454, 153342, 151989, 152458, 153420, 152303, 152271, 152827, 153036, 153196, 151708, 153263, 152561, 153207, 152213, 152112, 153204, 151722, 152542, 156777, 198, 19789, 155796, 153353, 153182, 152345, 152471, 152477, 153014, 152002, 152191, 151734, 152312, 152810, 152237, 153224, 153169, 153224, 152244, 153387, 153404, 156777, 198, 16069, 155811, 152265, 151946, 151808, 152412, 152363, 152305, 153156, 152733, 152810, 153157, 152016, 152100, 152069, 153234, 152317, 152589, 152707, 153121, 153341, 152159, 152114, 153156, 153001, 153504, 153376, 152272, 152433, 152325, 151941, 156777, 198, 285, 155788, 152238, 152255, 153427, 152318, 153009, 152381, 152474, 152680, 152157, 153255, 152324, 151682, 156777, 198, 32955, 155804, 153490, 153419, 152364, 152405, 152682, 152206, 152078, 153369, 152725, 153193, 153027, 152946, 152488, 153070, 151883, 152890, 152489, 153144, 153375, 152358, 151685, 152494, 152117, 152740, 156777, 198, 37448, 480, 155840, 151902, 152720, 153377, 152027, 152378, 152821, 153207, 153459, 153028, 153068, 152507, 153255, 152158, 152921, 151958, 152609, 152748, 152822, 152286, 151714, 152730, 152377, 152353, 152470, 152606, 152162, 152186, 153071, 152244, 153118, 153375, 153018, 152712, 153098, 152976, 152336, 151843, 153202, 152297, 151736, 153380, 153502, 152702, 152115, 153181, 152735, 153277, 153457, 152393, 153112, 152595, 156777, 198, 19098, 155808, 152464, 153452, 152595, 153312, 151937, 151933, 153197, 152239, 153163, 152922, 153402, 152034, 152591, 153438, 152215, 151673, 152005, 151785, 152642, 151924, 153278, 151805, 151974, 153482, 152718, 152862, 153347, 156777, 198, 72, 155780, 151795, 152111, 152746, 152377, 153471, 152309, 156777, 198, 19016, 155788, 153181, 152271, 152190, 152842, 152224, 152701, 152939, 152536, 152091, 151815, 152733, 151672, 156777, 198, 14689, 155788, 152291, 152072, 152942, 151734, 153042, 153504, 152589, 153333, 151839, 151941, 153038, 153180, 156777, 198, 36996, 8303, 155832, 152231, 152256, 152835, 152801, 152985, 153400, 152393, 152818, 152765, 152249, 152600, 151699, 152302, 152752, 153018, 153009, 151992, 153054, 152847, 153354, 153228, 152662, 153355, 152532, 153393, 151782, 152458, 152048, 152757, 152428, 153195, 151906, 153006, 153178, 153250, 152331, 152284, 152780, 153138, 153319, 151980, 153142, 152418, 152228, 152733, 156777, 198, 9096, 155801, 151698, 153321, 152217, 153039, 152935, 153400, 152122, 152531, 153106, 152169, 152892, 152957, 151851, 152427, 152826, 152451, 151851, 152901, 152885, 152594, 153446, 153080, 156777, 198, 14689, 155795, 152658, 151700, 153321, 152450, 
152530, 153191, 151673, 151690, 151698, 152714, 152846, 152981, 153171, 153384, 153364, 153188, 153246, 156777, 198, 1055, 155779, 151869, 152388, 152711, 153334, 151736, 156777, 198, 1782, 155780, 153483, 153240, 152241, 152558, 152697, 153046, 156777, 198, 5804, 1363, 155820, 152941, 152764, 152605, 153034, 153434, 153372, 153347, 151887, 152453, 152758, 152133, 152510, 152694, 152431, 152321, 153088, 152676, 152223, 152581, 152459, 152015, 152502, 153063, 152712, 153294, 153451, 153032, 152903, 152859, 152989, 151748, 152669, 152661, 152650, 152409, 151861, 156777, 198, 300, 7973, 155828, 153095, 152469, 152988, 152894, 151819, 152391, 153019, 152058, 153062, 153230, 151826, 152112, 152306, 152264, 152769, 153390, 152384, 152435, 152790, 153393, 152983, 152540, 152252, 152034, 153107, 152540, 151919, 151893, 152558, 152817, 152946, 152956, 152129, 152715, 153131, 153490, 151734, 152271, 152707, 151734, 153321, 152450, 156777, 198, 8088, 155792, 152452, 153497, 153353, 152679, 152533, 152382, 152374, 152611, 153341, 153163, 152285, 153411, 152495, 153141, 152320, 156777, 198, 1199, 155781, 151764, 152360, 153295, 152634, 153342, 152199, 152271, 156777, 198, 43366, 155799, 152308, 151682, 152889, 152016, 152385, 152629, 152495, 151826, 153321, 152958, 152180, 151886, 153432, 152922, 152128, 153024, 153040, 152593, 152287, 151677, 156777, 198, 53660, 155808, 151727, 152092, 152680, 153331, 151699, 152316, 152938, 152289, 152433, 153384, 151781, 153137, 153259, 152175, 153213, 152291, 151869, 152691, 152489, 151941, 152049, 152034, 153053, 152179, 153160, 151676, 153367, 156777, 198, 268, 4123, 480, 155821, 152350, 152173, 152536, 151991, 151960, 153144, 153013, 152358, 152234, 153135, 152291, 153235, 152143, 152583, 152402, 153483, 152678, 152192, 152533, 152946, 151797, 153103, 152310, 152293, 151825, 152548, 153442, 152109, 152659, 153325, 152781, 152570, 152957, 151752, 152265, 153381, 152515, 156777, 198, 437, 155787, 152957, 152659, 151975, 152709, 152402, 152836, 152174, 151792, 153409, 153327, 152990, 156777, 198, 275, 155781, 152520, 153038, 152067, 153273, 153185, 152265, 152974, 156777, 198, 94273, 155799, 152953, 152938, 153427, 152244, 151920, 153423, 152929, 152367, 153052, 152129, 152331, 152257, 152987, 152777, 153448, 152408, 151696, 152408, 152326, 152699, 156777, 198, 385, 16239, 155828, 152306, 152268, 153438, 153228, 152978, 152957, 153153, 153393, 152795, 152110, 152918, 152923, 152467, 152331, 153053, 153330, 151889, 153444, 152234, 152624, 151779, 152801, 152784, 152139, 152222, 152751, 152512, 153287, 153141, 153052, 151840, 152589, 152508, 153499, 152109, 152255, 151739, 152267, 152759, 153318, 153165, 153349, 156777, <|im_start|> <|text_start|>the<|space|>overall<|space|>package<|space|>from<|space|>just<|space|>two<|space|>people<|space|>is<|space|>pretty<|space|>remarkable<|space|>sure<|space|>i<|space|>have<|space|>some<|space|>critiques<|space|>about<|space|>some<|space|>of<|space|>the<|space|>gameplay<|space|>aspects<|space|>but<|space|>its<|space|>still<|space|>really<|space|>enjoyable<|space|>and<|space|>it<|space|>looks<|space|>lovely<|space|>hello<|space|>i<|space|>am<|space|>sam<|space|>how<|space|>are<|space|>you<|text_end|> <|audio_start|> the<|t_0.08|><|257|><|740|><|636|><|913|><|788|><|1703|><|space|> overall<|t_0.36|><|127|><|201|><|191|><|774|><|700|><|532|><|1056|><|557|><|798|><|298|><|1741|><|747|><|1662|><|1617|><|1702|><|1527|><|368|><|1588|><|1049|><|1008|><|1625|><|747|><|1576|><|728|><|1019|><|1696|><|1765|><|space|> 
package<|t_0.56|><|935|><|584|><|1319|><|627|><|1016|><|1491|><|1344|><|1117|><|1526|><|1040|><|239|><|1435|><|951|><|498|><|723|><|1180|><|535|><|789|><|1649|><|1637|><|78|><|465|><|1668|><|901|><|595|><|1675|><|117|><|1009|><|1667|><|320|><|840|><|79|><|507|><|1762|><|1508|><|1228|><|1768|><|802|><|1450|><|1457|><|232|><|639|><|space|> from<|t_0.19|><|604|><|782|><|1682|><|872|><|1532|><|1600|><|1036|><|1761|><|647|><|1554|><|1371|><|653|><|1595|><|950|><|space|> just<|t_0.25|><|1782|><|1670|><|317|><|786|><|1748|><|631|><|599|><|1155|><|1364|><|1524|><|36|><|1591|><|889|><|1535|><|541|><|440|><|1532|><|50|><|870|><|space|> two<|t_0.24|><|1681|><|1510|><|673|><|799|><|805|><|1342|><|330|><|519|><|62|><|640|><|1138|><|565|><|1552|><|1497|><|1552|><|572|><|1715|><|1732|><|space|> people<|t_0.39|><|593|><|274|><|136|><|740|><|691|><|633|><|1484|><|1061|><|1138|><|1485|><|344|><|428|><|397|><|1562|><|645|><|917|><|1035|><|1449|><|1669|><|487|><|442|><|1484|><|1329|><|1832|><|1704|><|600|><|761|><|653|><|269|><|space|> is<|t_0.16|><|566|><|583|><|1755|><|646|><|1337|><|709|><|802|><|1008|><|485|><|1583|><|652|><|10|><|space|> pretty<|t_0.32|><|1818|><|1747|><|692|><|733|><|1010|><|534|><|406|><|1697|><|1053|><|1521|><|1355|><|1274|><|816|><|1398|><|211|><|1218|><|817|><|1472|><|1703|><|686|><|13|><|822|><|445|><|1068|><|space|> remarkable<|t_0.68|><|230|><|1048|><|1705|><|355|><|706|><|1149|><|1535|><|1787|><|1356|><|1396|><|835|><|1583|><|486|><|1249|><|286|><|937|><|1076|><|1150|><|614|><|42|><|1058|><|705|><|681|><|798|><|934|><|490|><|514|><|1399|><|572|><|1446|><|1703|><|1346|><|1040|><|1426|><|1304|><|664|><|171|><|1530|><|625|><|64|><|1708|><|1830|><|1030|><|443|><|1509|><|1063|><|1605|><|1785|><|721|><|1440|><|923|><|space|> sure<|t_0.36|><|792|><|1780|><|923|><|1640|><|265|><|261|><|1525|><|567|><|1491|><|1250|><|1730|><|362|><|919|><|1766|><|543|><|1|><|333|><|113|><|970|><|252|><|1606|><|133|><|302|><|1810|><|1046|><|1190|><|1675|><|space|> i<|t_0.08|><|123|><|439|><|1074|><|705|><|1799|><|637|><|space|> have<|t_0.16|><|1509|><|599|><|518|><|1170|><|552|><|1029|><|1267|><|864|><|419|><|143|><|1061|><|0|><|space|> some<|t_0.16|><|619|><|400|><|1270|><|62|><|1370|><|1832|><|917|><|1661|><|167|><|269|><|1366|><|1508|><|space|> critiques<|t_0.60|><|559|><|584|><|1163|><|1129|><|1313|><|1728|><|721|><|1146|><|1093|><|577|><|928|><|27|><|630|><|1080|><|1346|><|1337|><|320|><|1382|><|1175|><|1682|><|1556|><|990|><|1683|><|860|><|1721|><|110|><|786|><|376|><|1085|><|756|><|1523|><|234|><|1334|><|1506|><|1578|><|659|><|612|><|1108|><|1466|><|1647|><|308|><|1470|><|746|><|556|><|1061|><|space|> about<|t_0.29|><|26|><|1649|><|545|><|1367|><|1263|><|1728|><|450|><|859|><|1434|><|497|><|1220|><|1285|><|179|><|755|><|1154|><|779|><|179|><|1229|><|1213|><|922|><|1774|><|1408|><|space|> some<|t_0.23|><|986|><|28|><|1649|><|778|><|858|><|1519|><|1|><|18|><|26|><|1042|><|1174|><|1309|><|1499|><|1712|><|1692|><|1516|><|1574|><|space|> of<|t_0.07|><|197|><|716|><|1039|><|1662|><|64|><|space|> the<|t_0.08|><|1811|><|1568|><|569|><|886|><|1025|><|1374|><|space|> gameplay<|t_0.48|><|1269|><|1092|><|933|><|1362|><|1762|><|1700|><|1675|><|215|><|781|><|1086|><|461|><|838|><|1022|><|759|><|649|><|1416|><|1004|><|551|><|909|><|787|><|343|><|830|><|1391|><|1040|><|1622|><|1779|><|1360|><|1231|><|1187|><|1317|><|76|><|997|><|989|><|978|><|737|><|189|><|space|> 
aspects<|t_0.56|><|1423|><|797|><|1316|><|1222|><|147|><|719|><|1347|><|386|><|1390|><|1558|><|154|><|440|><|634|><|592|><|1097|><|1718|><|712|><|763|><|1118|><|1721|><|1311|><|868|><|580|><|362|><|1435|><|868|><|247|><|221|><|886|><|1145|><|1274|><|1284|><|457|><|1043|><|1459|><|1818|><|62|><|599|><|1035|><|62|><|1649|><|778|><|space|> but<|t_0.20|><|780|><|1825|><|1681|><|1007|><|861|><|710|><|702|><|939|><|1669|><|1491|><|613|><|1739|><|823|><|1469|><|648|><|space|> its<|t_0.09|><|92|><|688|><|1623|><|962|><|1670|><|527|><|599|><|space|> still<|t_0.27|><|636|><|10|><|1217|><|344|><|713|><|957|><|823|><|154|><|1649|><|1286|><|508|><|214|><|1760|><|1250|><|456|><|1352|><|1368|><|921|><|615|><|5|><|space|> really<|t_0.36|><|55|><|420|><|1008|><|1659|><|27|><|644|><|1266|><|617|><|761|><|1712|><|109|><|1465|><|1587|><|503|><|1541|><|619|><|197|><|1019|><|817|><|269|><|377|><|362|><|1381|><|507|><|1488|><|4|><|1695|><|space|> enjoyable<|t_0.49|><|678|><|501|><|864|><|319|><|288|><|1472|><|1341|><|686|><|562|><|1463|><|619|><|1563|><|471|><|911|><|730|><|1811|><|1006|><|520|><|861|><|1274|><|125|><|1431|><|638|><|621|><|153|><|876|><|1770|><|437|><|987|><|1653|><|1109|><|898|><|1285|><|80|><|593|><|1709|><|843|><|space|> and<|t_0.15|><|1285|><|987|><|303|><|1037|><|730|><|1164|><|502|><|120|><|1737|><|1655|><|1318|><|space|> it<|t_0.09|><|848|><|1366|><|395|><|1601|><|1513|><|593|><|1302|><|space|> looks<|t_0.27|><|1281|><|1266|><|1755|><|572|><|248|><|1751|><|1257|><|695|><|1380|><|457|><|659|><|585|><|1315|><|1105|><|1776|><|736|><|24|><|736|><|654|><|1027|><|space|> lovely<|t_0.56|><|634|><|596|><|1766|><|1556|><|1306|><|1285|><|1481|><|1721|><|1123|><|438|><|1246|><|1251|><|795|><|659|><|1381|><|1658|><|217|><|1772|><|562|><|952|><|107|><|1129|><|1112|><|467|><|550|><|1079|><|840|><|1615|><|1469|><|1380|><|168|><|917|><|836|><|1827|><|437|><|583|><|67|><|595|><|1087|><|1646|><|1493|><|1677|><|space|>main: prompt size: 871 main: time for prompt: 252.929 ms 
[... per-token generation progress output (ANSI color escape codes) omitted ...]
main: time for decoder: 1412.620 ms
common_perf_print: sampling time = 66.79 ms
common_perf_print: samplers time = 26.02 ms / 199 tokens
common_perf_print: load time = 655.95 ms
common_perf_print: prompt eval time = 234.97 ms / 871 tokens ( 0.27 ms per token, 3706.90 tokens per second)
common_perf_print: eval time = 1341.80 ms / 198 runs ( 6.78 ms per token, 147.56 tokens per second)
common_perf_print: total time = 1075.97 ms / 1069 tokens
common_perf_print: unaccounted time = -567.58 ms / -52.8 % (total - sampling - prompt eval - eval) / (total)
common_perf_print: graphs reused = 196
common_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
common_memory_breakdown_print: | - CUDA0 (GTX 1060 6GB) | 6143 = 3733 + ( 931 = 506 + 96 + 329) + 1479 |
common_memory_breakdown_print: | - Host |
162 = 143 + 0 + 19 | codes: ' hello<|t_0.96|><|865|><|1506|><|865|><|1419|><|1819|><|838|><|624|><|1251|><|899|><|954|><|1096|><|710|><|1152|><|1418|><|710|><|1301|><|1120|><|17|><|1456|><|1405|><|776|><|1668|><|1390|><|86|><|1292|><|1023|><|1683|><|1589|><|1092|><|1556|><|1479|><|1294|><|1292|><|805|><|1683|><|1430|><|900|><|1714|><|995|><|1294|><|1432|><|1007|><|1622|><|1120|><|861|><|1803|><|995|><|1092|><|1668|><|710|><|1433|><|933|><|670|><|32|><|1293|><|1251|><|1134|><|1701|><|1347|><|816|><|642|><|95|><|508|><|48|><|503|><|653|><|1707|><|1041|><|267|><|1817|><|248|><|1754|><|space|> i<|t_0.28|><|73|><|642|><|169|><|614|><|983|><|169|><|843|><|443|><|1092|><|752|><|252|><|1378|><|1315|><|221|><|1448|><|1083|><|565|><|866|><|93|><|767|><|1697|><|space|> am<|t_0.16|><|422|><|852|><|408|><|847|><|1007|><|550|><|874|><|673|><|191|><|127|><|220|><|716|><|space|> sam<|t_0.43|><|775|><|487|><|646|><|519|><|493|><|1513|><|1|><|1166|><|640|><|556|><|0|><|1061|><|18|><|333|><|719|><|632|><|693|><|907|><|430|><|1312|><|1086|><|1098|><|1333|><|974|><|816|><|440|><|1755|><|1324|><|1534|><|662|><|1812|><|385|><|space|> how<|t_0.20|><|1663|><|1028|><|1488|><|1314|><|1393|><|1723|><|1303|><|1497|><|951|><|1181|><|789|><|142|><|1475|><|66|><|297|><|space|> are<|t_0.13|><|798|><|1803|><|562|><|123|><|756|><|968|><|381|><|890|><|1773|><|1039|><|space|> you<|t_0.08|><|193|><|92|><|1221|><|1334|><|562|><|1415|> <|audio_end|> <|im_end|>' main: codes size: 199 codes audio: '<|865|><|1506|><|865|><|1419|><|1819|><|838|><|624|><|1251|><|899|><|954|><|1096|><|710|><|1152|><|1418|><|710|><|1301|><|1120|><|17|><|1456|><|1405|><|776|><|1668|><|1390|><|86|><|1292|><|1023|><|1683|><|1589|><|1092|><|1556|><|1479|><|1294|><|1292|><|805|><|1683|><|1430|><|900|><|1714|><|995|><|1294|><|1432|><|1007|><|1622|><|1120|><|861|><|1803|><|995|><|1092|><|1668|><|710|><|1433|><|933|><|670|><|32|><|1293|><|1251|><|1134|><|1701|><|1347|><|816|><|642|><|95|><|508|><|48|><|503|><|653|><|1707|><|1041|><|267|><|1817|><|248|><|1754|><|73|><|642|><|169|><|614|><|983|><|169|><|843|><|443|><|1092|><|752|><|252|><|1378|><|1315|><|221|><|1448|><|1083|><|565|><|866|><|93|><|767|><|1697|><|422|><|852|><|408|><|847|><|1007|><|550|><|874|><|673|><|191|><|127|><|220|><|716|><|775|><|487|><|646|><|519|><|493|><|1513|><|1|><|1166|><|640|><|556|><|0|><|1061|><|18|><|333|><|719|><|632|><|693|><|907|><|430|><|1312|><|1086|><|1098|><|1333|><|974|><|816|><|440|><|1755|><|1324|><|1534|><|662|><|1812|><|385|><|1663|><|1028|><|1488|><|1314|><|1393|><|1723|><|1303|><|1497|><|951|><|1181|><|789|><|142|><|1475|><|66|><|297|><|798|><|1803|><|562|><|123|><|756|><|968|><|381|><|890|><|1773|><|1039|><|193|><|92|><|1221|><|1334|><|562|><|1415|>' main: codes audio size: 168 main: time for vocoder: 220.671 ms main: time for spectral ops: 850.860 ms main: total time: 2737.218 ms main: audio written to file 'output.wav' |
English comes out fine, but Korean doesn't really seem to work.
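Roughly, the direct Hangul attempt is just the same command with the Korean text in -p (the invocation below is a reconstruction, not a saved log):

| D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64>llama-tts -m ..\OuteTTS-0.3-500M-Q8_0.gguf -mv ..\WavTokenizer-Large-75-F16.gguf -p "안녕? 난 잼미니야 만나서 반가워" |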
So I asked GPT to rewrite "안녕? 난 잼미니야 만나서 반가워" ("Hi? I'm Jemmini, nice to meet you") into a romanized, TTS-friendly form,
but the '안녕? 난' part just disappears and what's left sounds roughly like '잼미니야 맨나서 빵가워', i.e. the pronunciation comes out mangled.
| D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64>llama-tts -m ..\OuteTTS-0.3-500M-Q8_0.gguf -mv ..\WavTokenizer-Large-75-F16.gguf -p "Annyoung? Nahn Jemmini-ya. Mannaseo bangawo." |
+
Turns out Hugging Face has text-to-speech as an actual pipeline type, so models can be filtered by it directly.

[링크 : https://huggingface.co/models?pipeline_tag=text-to-speech]
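The same filter is also reachable from the command line through the Hub's REST API. A minimal sketch, assuming the api/models endpoint accepts pipeline_tag, sort, and limit query parameters (my assumption, not something from the links above):

| REM list a few text-to-speech models via the Hub API (query parameters assumed, see note above)
D:\study\llm> curl "https://huggingface.co/api/models?pipeline_tag=text-to-speech&sort=downloads&limit=5" |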
