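Hmm.. if I multiply the memory usage by the maximum power draw, I get roughly the power it is drawing right now.

Stable Diffusion, on the other hand, drives the GPU much harder relative to the memory it uses (it sometimes exceeds 250W or runs close to it).

The model in use is gemma4-e4b, but how does it come out roughly right like that, even by coincidence?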
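The coincidence above is easy to re-check. This is a minimal sketch that assumes "memory usage" means the used/total VRAM ratio and "maximum power" means the board power limit. It parses one line of `nvidia-smi --query-gpu=memory.used,memory.total,power.draw,power.limit --format=csv,noheader,nounits` output (MiB, MiB, W, W). The function names are hypothetical, and the estimate is a back-of-the-envelope heuristic, not a power model.

```python
def parse_nvidia_smi_csv(line: str) -> dict:
    """Parse one GPU line of:
    nvidia-smi --query-gpu=memory.used,memory.total,power.draw,power.limit
               --format=csv,noheader,nounits
    GPUs that report "[N/A]" for a field will raise ValueError here."""
    used, total, draw, limit = (float(field) for field in line.split(","))
    return {"mem_used": used, "mem_total": total, "power_draw": draw, "power_limit": limit}


def estimated_power(stats: dict) -> float:
    """The heuristic from the post: (VRAM used / VRAM total) * power limit, in watts."""
    return stats["mem_used"] / stats["mem_total"] * stats["power_limit"]
```

Comparing `estimated_power(stats)` with `stats["power_draw"]` for each line shows whether the match holds for the LLM and breaks for Stable Diffusion.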

 

Posted by 구차니

DB : postgres, VectorDB : pgvector

embedding

[Link : https://github.com/gulcin/pgvector-rag-app]

    [Link : https://edbkorea.com/blog/postgres-및-pgvector가-포함된-rag-앱/]

import PyPDF2
import torch


def generate_embeddings(tokenizer, model, device, text):
    # Tokenize (truncated to 512 tokens) and move the tensors to the model's device
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=512
    ).to(device)
    # Inference only, so no gradient tracking
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # Mean-pool the last hidden layer over the token axis: one vector per input text
    return text, outputs.hidden_states[-1].mean(dim=1).tolist()


def read_pdf_file(pdf_path):
    pdf_document = PyPDF2.PdfReader(pdf_path)

    lines = []
    # Extract the text of each page and split it into lines
    for page in pdf_document.pages:
        text = page.extract_text()
        lines.extend(text.splitlines())

    return lines

[Link : https://github.com/gulcin/pgvector-rag-app/blob/master/embedding.py]

[Link : https://github.com/gulcin/pgvector-rag-app/blob/master/commands/import_data.py]
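On the database side, pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), so a retrieval query along the lines of `ORDER BY embedding <=> query LIMIT k` ranks chunks by that value. The sketch below reproduces the same ranking in plain Python to show what the operator computes; the `rows` layout of (text, embedding) pairs and the helper names are assumptions for illustration. Note that `generate_embeddings` above returns a batch, so the vector for a single text is `embedding[0]`.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k=3):
    """rows: list of (text, embedding) pairs. Returns the k closest texts, nearest
    first, mirroring ORDER BY embedding <=> query LIMIT k."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [text for text, _ in ranked[:k]]
```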

 

langchain, kure (generates the embedding vectors)

FastAPI, Streamlit

[Link : https://lsjsj92.tistory.com/686]

     [Link : https://huggingface.co/nlpai-lab/KURE-v1]

 

ChromaDB, langchain

DB : postgres, VectorDB : pgvector

[Link : https://velog.io/@judy_choi/PGVector-와-프롬프트를-이용한-RAG-고도화]

Posted by 구차니

Thus spoke antigravity~

It tells me to change 512x512 to 1024x1024.

Why does that work?!?!

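1. Change the resolution 🌟 (most important)
SD 1.5 was trained primarily at 512x512.
SDXL was trained at 1024x1024. Generating at 512x512 with SDXL produces broken images or something that looks like a lump of noise.
Change: set width and height to 1024, or use an SDXL-recommended resolution that matches the aspect ratio you want (e.g. 832x1216, 1152x896).
2. Lower the CFG scale (classifier-free guidance)
The cfg_scale: 15 (30 for img2img) in the current code is already fairly high for SD 1.5, and far too high for SDXL. Too high a CFG burns out the colors and distorts the image.
Change: SDXL follows prompts well, so lowering cfg_scale to somewhere between 5.0 and 7.0 is recommended.
3. Negative prompt (optional)
SDXL (especially its derived models) does not depend on long, complex negative prompts or embeddings (EasyNegative, etc.) as much as SD 1.5 does. An overly heavy negative prompt can even get in the way of generation.
The base_neg in the current code is fine, but you can trim it if needed. (Leaving it as-is is fine for now.)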
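The resolution advice can be turned into a small helper. This is a sketch under assumptions: it treats "SDXL-friendly" as both sides being multiples of 64 with a total area near 1024x1024 (about one megapixel). That matches the examples quoted above but is a heuristic, not an official list, and `sdxl_size` is a hypothetical name.

```python
def sdxl_size(aspect_ratio: float, target_pixels: int = 1024 * 1024, step: int = 64) -> tuple:
    """Return (width, height), both multiples of `step`, with width / height close to
    `aspect_ratio` and width * height close to `target_pixels`."""
    best = None
    for height in range(step, 2048 + step, step):
        # Round the matching width to the nearest multiple of `step` (at least one step)
        width = max(step, round(aspect_ratio * height / step) * step)
        area_error = abs(width * height - target_pixels) / target_pixels
        ratio_error = abs(width / height - aspect_ratio) / aspect_ratio
        score = area_error + ratio_error
        if best is None or score < best[0]:
            best = (score, width, height)
    return best[1], best[2]
```

`sdxl_size(1.0)` gives 1024x1024, and `sdxl_size(832 / 1216)` returns the quoted 832x1216 portrait size.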

 

512x512: generation failed, VAE set to Automatic

 

1024x1024: generation succeeded, VAE set to Automatic

 

 

----

version: v1.10.1-96-g1937682a  •  python: 3.10.12  •  torch: 2.1.2+cu121  •  xformers: N/A  •  gradio: 3.41.2  •  checkpoint: 6ce0161689

 

[Link : https://civitai.com/models/795765/illustrious-xl]

[Link : https://huggingface.co/OnomaAIResearch/Illustrious-xl-early-release-v0/tree/main]

 

With the illustriousXL model the output comes out strange, but

 

the V1.5 emanoly model I tried first works fine. (It still draws those weird airplane shapes, of course.)

 

 

There's not much to it. The SDXL models work similar to other models. They are just more resource intensive.
Update your A1111 installation. I find it easiest to just run "git pull" from a command line.
Then download an SDXL model and VAE and put them with your other SD models. (The official SDXL release consists of a base model and a refiner but most people don't seem to bother with refiners.)
If your graphics card has less than 12 GB VRAM, then add the "--medvram-sdxl" argument.
Don't expect your 1.5 prompts to perform well with SDXL. You'll need to relearn how to write good prompts. Go to Civitai, grab a model that you like, look at the example images and study their prompts and settings. SDXL works best at around 1024×1024 pixels.

[Link : https://www.reddit.com/r/StableDiffusion/comments/1bwidjg/updated_guide_on_how_to_run_sdxl_on_a1111/]

[Link : https://huggingface.co/stabilityai/sdxl-vae]

 

/mnt/Downloads/stable-diffusion-webui/models$ ls -alR VAE*
VAE:
total 8
drwxrwxr-x  2 falinux falinux 4096  5월  4 21:38  .
drwxrwxr-x 11 falinux falinux 4096  5월  4 21:51  ..
-rw-rw-r--  1 falinux falinux    0  5월  4 21:38 'Put VAE here.txt'

VAE-approx:
total 432
drwxrwxr-x  2 falinux falinux   4096  5월  6 21:57 .
drwxrwxr-x 11 falinux falinux   4096  5월  4 21:51 ..
-rw-rw-r--  1 falinux falinux 213777  5월  4 21:38 model.pt
-rw-rw-r--  1 falinux falinux 213777  5월  6 21:57 vaeapprox-sdxl.pt

 

$ cat /mnt/Downloads/stable-diffusion-webui/repositories/generative-models/configs/inference/sd_xl_base.yaml
model:
  target: sgm.models.diffusion.DiffusionEngine
  params:
    scale_factor: 0.13025
    disable_first_stage_autocast: True

    denoiser_config:
      target: sgm.modules.diffusionmodules.denoiser.DiscreteDenoiser
      params:
        num_idx: 1000

        weighting_config:
          target: sgm.modules.diffusionmodules.denoiser_weighting.EpsWeighting
        scaling_config:
          target: sgm.modules.diffusionmodules.denoiser_scaling.EpsScaling
        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization

    network_config:
      target: sgm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        adm_in_channels: 2816
        num_classes: sequential
        use_checkpoint: True
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [4, 2]
        num_res_blocks: 2
        channel_mult: [1, 2, 4]
        num_head_channels: 64
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: [1, 2, 10]  # note: the first is unused (due to attn_res starting at 2) 32, 16, 8 --> 64, 32, 16
        context_dim: 2048
        spatial_transformer_attn_type: softmax-xformers
        legacy: False

    conditioner_config:
      target: sgm.modules.GeneralConditioner
      params:
        emb_models:
          # crossattn cond
          - is_trainable: False
            input_key: txt
            target: sgm.modules.encoders.modules.FrozenCLIPEmbedder
            params:
              layer: hidden
              layer_idx: 11
          # crossattn and vector cond
          - is_trainable: False
            input_key: txt
            target: sgm.modules.encoders.modules.FrozenOpenCLIPEmbedder2
            params:
              arch: ViT-bigG-14
              version: laion2b_s39b_b160k
              freeze: True
              layer: penultimate
              always_return_pooled: True
              legacy: False
          # vector cond
          - is_trainable: False
            input_key: original_size_as_tuple
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256  # multiplied by two
          # vector cond
          - is_trainable: False
            input_key: crop_coords_top_left
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256  # multiplied by two
          # vector cond
          - is_trainable: False
            input_key: target_size_as_tuple
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256  # multiplied by two

    first_stage_config:
      target: sgm.models.autoencoder.AutoencoderKLInferenceWrapper
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          attn_type: vanilla-xformers
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult: [1, 2, 4, 4]
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
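The `first_stage_config` above also hints at why 512x512 breaks. With `ch_mult: [1, 2, 4, 4]` the autoencoder halves the resolution at every level except the last, i.e. by 8 overall, so a 512x512 image becomes a 64x64 latent while a 1024x1024 image becomes the 128x128 latent that matches SDXL's training resolution (the latent is then scaled by `scale_factor: 0.13025` before denoising). A minimal sketch of that arithmetic, with hypothetical function names:

```python
def vae_downsample_factor(ch_mult=(1, 2, 4, 4)) -> int:
    """The KL autoencoder halves the resolution at every level except the last."""
    return 2 ** (len(ch_mult) - 1)


def latent_shape(width: int, height: int, ch_mult=(1, 2, 4, 4), z_channels: int = 4) -> tuple:
    """(channels, latent_height, latent_width) for a width x height image."""
    f = vae_downsample_factor(ch_mult)
    if width % f or height % f:
        raise ValueError(f"width and height must be multiples of {f}")
    return (z_channels, height // f, width // f)
```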

 

Hit refresh next to SD VAE,

select sdxl_vae.safetensors, and click Apply settings.

 

Then I tried generating, and it comes out roughly similar... I think?

 

The original is an image like this:

Hatsune Miku,limited palette,black background,colorful,vibrant,glowing outline,neon,blacklight,looking at viewer, masterpiece, very aesthetic Negative prompt: worst quality,bad quality,bad hands,very displeasing,extra digit,fewer digits,jpeg artifacts,signature,username,reference,mutated,lineup,manga,comic,disembodied,futanari,yaoi,dickgirl,turnaround,2koma,4koma,monster,cropped,amputee,text,bad foreshortening,what,guro,logo,bad anatomy,bad perspective,bad proportions,artistic error,anatomical nonsense,amateur,out of frame,multiple views, Steps: 28, CFG scale: 7.5, Sampler: Euler a, Seed: 3625896228, Size: 832x1216, Model: Illustrious-XL-v0.1, Version: f1.0.2-v1.10.1RC-latest-691-g37223711, Model hash: 3e15ba0038, Schedule type: Automatic, Discard penultimate sigma: True
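The generation info above follows the A1111 "infotext" layout: the prompt, then `Negative prompt:`, then comma-separated `key: value` settings starting at `Steps:`. A minimal parser sketch, assuming no setting value itself contains ", " (true for this example, not guaranteed in general); `parse_generation_params` is a hypothetical helper, not part of the webui API.

```python
def parse_generation_params(infotext: str) -> dict:
    """Extract the `key: value` settings that follow the prompts in an A1111 infotext."""
    start = infotext.rfind("Steps: ")  # settings begin at the last "Steps: "
    if start == -1:
        return {}
    params = {}
    for item in infotext[start:].split(", "):
        key, sep, value = item.partition(": ")
        if sep:  # skip fragments that are not shaped like "key: value"
            params[key.strip()] = value.strip()
    return params
```

With it, `width, height = map(int, params["Size"].split("x"))` recovers the 832x1216 size for reuse in your own txt2img call.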

 

--lowvram

 

[Link : https://huggingface.co/stabilityai/models?search=sdxl]

[Link : https://huggingface.co/stabilityai/sdxl-turbo/tree/main]

[Link : https://huggingface.co/stabilityai/sdxl-vae/tree/main]

 

 

 

Posted by 구차니

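Wow, I sure searched a lot for this before -_-

The plan: keep a Raspberry Pi running at home, have the machine at the office SSH into home,

and build a setup where I can connect back from home to the office machine's internal IP through that connection.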

 

[Link : https://manpages.ubuntu.com/manpages/jammy/man1/autossh.1.html]

[Link : https://lstm.tistory.com/10]

[Link : https://m.clien.net/service/board/cm_linux/4344761]

[Link : https://sangwonyoon.tistory.com/m/entry/Autossh로-SSH-연결-유지하기]

 

2018.05.14 - [Using programs/ssh scp sftp] - reverse SSH

2021.01.03 - [Using programs/ssh scp sftp] - reverse ssh

 

 

-------------------

2026.05.13

I used the options from the link below.

[Link : https://donotlimityourself.tistory.com/33]

 

private (office)

This seems to mean "expose this PC's port 22 as port 2222 on the remote host",

so no extra listening port shows up here, and it appears to be running normally.

$ ssh minimonk@집SSH도메인 -p 8022 -f -N -T -R 2222:localhost:22
minimonk@집SSH도메인's password: 
$ ps -ef | grep ssh
root         900       1  0  5월12 ?      00:00:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root        2298     900  0  5월12 ?      00:00:00 sshd: minimonk [priv]
minimonk    2385    2298  0  5월12 ?      00:05:27 sshd: minimonk@pts/0
root        5656     900  0  5월12 ?      00:00:00 sshd: minimonk [priv]
minimonk    5735    5656  0  5월12 ?      00:00:00 sshd: minimonk@pts/7
minimonk   10717       1  0 09:58 ?        00:00:00 ssh minimonk@집SSH도메인 -p 8022 -f -N -T -R 2222:localhost:22
minimonk   10719    5736  0 09:58 pts/7    00:00:00 grep --color=auto ssh

$ netstat -tnlp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:39557         0.0.0.0:*               LISTEN      3128/language_serve 
tcp        0      0 127.0.0.1:5803          0.0.0.0:*               LISTEN      5350/llama-server   
tcp        0      0 127.0.0.1:38605         0.0.0.0:*               LISTEN      3128/language_serve 
tcp        0      0 127.0.0.1:6010          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:6012          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:45837         0.0.0.0:*               LISTEN      6184/language_serve 
tcp        0      0 0.0.0.0:7860            0.0.0.0:*               LISTEN      5584/venv/bin/pytho 
tcp        0      0 127.0.0.1:36141         0.0.0.0:*               LISTEN      6184/language_serve 
tcp        0      0 127.0.0.1:36197         0.0.0.0:*               LISTEN      6184/language_serve 
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:44747         0.0.0.0:*               LISTEN      3016/antigravity    
tcp        0      0 127.0.0.1:35159         0.0.0.0:*               LISTEN      6061/exe            
tcp        0      0 127.0.0.1:34279         0.0.0.0:*               LISTEN      3016/antigravity    
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN      -                   
tcp6       0      0 :::5900                 :::*                    LISTEN      1502/gnome-remote-d 
tcp6       0      0 ::1:631                 :::*                    LISTEN      -                   
tcp6       0      0 :::8080                 :::*                    LISTEN      5256/./llama-swap   
tcp6       0      0 :::22                   :::*                    LISTEN      -                   
tcp6       0      0 ::1:6010                :::*                    LISTEN      -                   
tcp6       0      0 ::1:6012                :::*                    LISTEN      -                   
tcp6       0      0 :::3389                 :::*                    LISTEN      1502/gnome-remote-d 

 

public (my home)

Before connecting
$ netstat -tnlp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -                   
tcp6       0      0 :::22                   :::*                    LISTEN      -  

After connecting
$ netstat -tnlp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:2222          0.0.0.0:*               LISTEN      -                   
tcp6       0      0 :::22                   :::*                    LISTEN      -                   
tcp6       0      0 ::1:2222                :::*                    LISTEN      -      

 

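Looking at the SSH options in -fNTR: -f sends ssh to the background,

-N and -T are about command execution and the terminal (no remote command, no pseudo-terminal),

and -R means attaching my local port to a port on the remote side.

NTR really is the good stuff after all... (huh?)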

     -f      Requests ssh to go to background just before command execution.
             This is useful if ssh is going to ask for passwords or
             passphrases, but the user wants it in the background.  This
             implies -n.  The recommended way to start X11 programs at a remote
             site is with something like ssh -f host xterm.

             If the ExitOnForwardFailure configuration option is set to “yes”,
             then a client started with -f will wait for all remote port
             forwards to be successfully established before placing itself in the
             background.  Refer to the description of ForkAfterAuthentication
             in ssh_config(5) for details.


     -N      Do not execute a remote command.  This is useful for just
             forwarding ports.  Refer to the description of SessionType in
             ssh_config(5) for details.

     -T      Disable pseudo-terminal allocation.

     -R [bind_address:]port:host:hostport
     -R [bind_address:]port:local_socket
     -R remote_socket:host:hostport
     -R remote_socket:local_socket
     -R [bind_address:]port
             Specifies that connections to the given TCP port or Unix socket
             on the remote (server) host are to be forwarded to the local
             side.
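Putting these options together, the tunnel command can be assembled programmatically with one `-R` per forward (repeating `-R` works the same way as repeating `-L`). A sketch with hypothetical names: `home.example.com` stands in for the home SSH domain, and adding `-o ExitOnForwardFailure=yes` is my assumption, based on the `-f` description above, so that a failed forward makes ssh exit instead of running silently without the tunnel.

```python
def reverse_tunnel_cmd(user, host, ssh_port, forwards, background=True):
    """Build an argv list for `ssh -fNT -R ...`.

    forwards: list of (bind_address or None, remote_port, local_host, local_port).
    """
    cmd = ["ssh", "-p", str(ssh_port)]
    if background:
        cmd.append("-f")  # go to background after authentication
    cmd += ["-N", "-T", "-o", "ExitOnForwardFailure=yes"]
    for bind_address, remote_port, local_host, local_port in forwards:
        spec = f"{remote_port}:{local_host}:{local_port}"
        if bind_address:
            spec = f"{bind_address}:{spec}"  # e.g. 0.0.0.0:2222:localhost:22
        cmd += ["-R", spec]
    cmd.append(f"{user}@{host}")
    return cmd
```

`subprocess.run(reverse_tunnel_cmd(...))` would start the same tunnel as the manual command. Passing `("0.0.0.0", 2222, "localhost", 22)` requests a wildcard bind on the public side, which sshd only honors when its `GatewayPorts` setting allows it.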

 

Now install autossh.

$ sudo apt-get install autossh
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  autossh
0 upgraded, 1 newly installed, 0 to remove and 47 not upgraded.
Need to get 29.2 kB of archives.
After this operation, 89.1 kB of additional disk space will be used.
Get:1 http://kr.archive.ubuntu.com/ubuntu jammy/universe amd64 autossh amd64 1.4g-1 [29.2 kB]
Fetched 29.2 kB in 0s (217 kB/s)   
Selecting previously unselected package autossh.
(Reading database ... 322586 files and directories currently installed.)
Preparing to unpack .../autossh_1.4g-1_amd64.deb ...
Unpacking autossh (1.4g-1) ...
Setting up autossh (1.4g-1) ...
Processing triggers for man-db (2.10.2-1) ...

 

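I tried to copy a key so I could log in to the public side, but it failed -_-???

Anyway, "No identities found" just means there is no key yet: generate one with ssh-keygen, run ssh-copy-id again, and you're done.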

$ ssh-copy-id -p 8022 minimonk@집SSH도메인
/usr/bin/ssh-copy-id: ERROR: No identities found

$ ssh-keygen

 

With -f, login failed even after the key was installed, so I left it out for now, and then it works, but..

$ autossh -M 0 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -T -R 2222:localhost:22 minimonk@집SSH도메인 -p 2022

[Link : https://sangwonyoon.tistory.com/entry/Autossh로-SSH-연결-유지하기]

 

What if autossh itself dies? I should look for a way to run it as a daemon.

[Link : https://tecadmin.net/autossh-persistent-ssh-connections/]
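One common way to keep autossh alive is to let systemd supervise it. The sketch below only renders a unit file as text, and the unit contents are assumptions to adapt, not a tested configuration. `-f` is left out because systemd tracks the foreground process itself, `AUTOSSH_GATETIME=0` tells autossh to keep retrying even if the very first connection fails, and key-based login must already work because nothing can answer a password prompt. The helper name and `home.example.com` host are hypothetical.

```python
def autossh_unit(user, host, ssh_port, remote_port, local_port, run_as):
    """Render a systemd unit that keeps an autossh reverse tunnel running."""
    exec_start = (
        "/usr/bin/autossh -M 0 -N -T "
        '-o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" '
        '-o "ExitOnForwardFailure yes" '
        f"-p {ssh_port} -R {remote_port}:localhost:{local_port} {user}@{host}"
    )
    return "\n".join([
        "[Unit]",
        f"Description=autossh reverse tunnel (remote {remote_port} -> local {local_port})",
        "Wants=network-online.target",
        "After=network-online.target",
        "",
        "[Service]",
        f"User={run_as}",
        "Environment=AUTOSSH_GATETIME=0",
        f"ExecStart={exec_start}",
        "Restart=always",  # systemd restarts autossh itself if it ever exits
        "RestartSec=10",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])
```

Saved as, say, `/etc/systemd/system/autossh-tunnel.service` (hypothetical name), it would be activated with `sudo systemctl daemon-reload` and `sudo systemctl enable --now autossh-tunnel`.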

 

 

+

Passing the option several times forwards several ports.

ssh remote-host -L 8822:REMOTE_IP_1:22 -L 9922:REMOTE_IP_2:22

[Link : https://stackoverflow.com/questions/29936948/ssh-l-forward-multiple-ports]

 

+

Set the following in sshd_config on the public side and restart sshd.

$ cat /etc/ssh/sshd_config
# GatewayPorts no
GatewayPorts yes
$ sudo systemctl restart sshd

 

Then run the tunnel with 0.0.0.0:2222 instead of 2222,

$ ssh minimonk@집SSH도메인 -p 8022 -f -N -T -R 0.0.0.0:2222:localhost:22

 

and the listener that used to be bound to localhost

$ netstat -tnlp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:2222          0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:6010          0.0.0.0:*               LISTEN      -
tcp6       0      0 :::22                   :::*                    LISTEN      -
tcp6       0      0 ::1:2222                :::*                    LISTEN      -
tcp6       0      0 ::1:6010                :::*                    LISTEN      -

 

becomes bound to 0.0.0.0, so the forwarded port can be exposed as if it were a service of the public server itself.

$ netstat -tnlp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:2222            0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:6010          0.0.0.0:*               LISTEN      -
tcp6       0      0 :::22                   :::*                    LISTEN      -
tcp6       0      0 :::2222                 :::*                    LISTEN      -
tcp6       0      0 ::1:6010                :::*                    LISTEN      -
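Instead of eyeballing `netstat -tnlp`, the bind addresses for a port can be pulled out of its output. A small parsing sketch (hypothetical helper names) that assumes the column layout shown above: a wildcard address (`0.0.0.0` or `::`) means the port is reachable from other hosts, while `127.0.0.1` or `::1` means only the public server itself can use it.

```python
def listeners_for_port(netstat_output: str, port: int) -> list:
    """Local addresses in LISTEN state on `port`, from `netstat -tnlp` output."""
    addresses = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: proto recv-q send-q local-address foreign-address state [pid/program]
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            host, _, local_port = fields[3].rpartition(":")
            if local_port == str(port):
                addresses.append(host)
    return addresses


def exposed_externally(addresses) -> bool:
    """True if any listener is bound to a wildcard address (all interfaces)."""
    return any(addr in ("0.0.0.0", "::") for addr in addresses)
```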

Posted by 구차니

Speech recognition: if the request contains "그림" (picture), draw something

 

 

If an image is attached and the caption contains "그림" (picture), run it through img2img
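The routing idea in these two notes can be sketched as a tiny decision function. Everything here is hypothetical: the names, the return labels, and the plain substring match on "그림" ("picture"). A real bot would sit between the speech recognizer and the Stable Diffusion API.

```python
def route_request(text: str, has_image: bool, keyword: str = "그림") -> str:
    """Decide how to handle a transcribed request or an image caption."""
    wants_picture = keyword in text
    if wants_picture and has_image:
        return "img2img"  # picture attached and the caption asks for a picture
    if wants_picture:
        return "txt2img"  # spoken request asks for a picture, nothing attached
    return "chat"         # everything else goes to the LLM
```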

Posted by 구차니

Hmm.. I found something neat.. and it's Python again.

Seeing that people even run it on phones, it must be fairly lightweight.

 

 

D:\study\llm>pip install litert-lm
D:\study\llm>litert-lm
CLI tool for LiteRT-LM models.

Usage: litert-lm [OPTIONS] COMMAND [ARGS]...

Commands:
  benchmark  Benchmarks a LiteRT-LM model.
  delete     Deletes a model from the local storage.
  import     Imports a model from a local path or HuggingFace hub.
  list       Lists all imported LiteRT-LM models.
  rename     Renames a model.
  run        Runs a LiteRT-LM model interactively or with a single prompt.
  serve      Start a server with a Gemini or OpenAI compatible API (alpha feature)

Global options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

D:\study\llm>litert-lm run --from-huggingface-repo=litert-community/gemma-4-E4B-it-litert-lm gemma-4-E4B-it.litertlm --backend=gpu  --enable-speculative-decoding=true --prompt="What is the capital of France?"
Downloading gemma-4-E4B-it.litertlm from litert-community/gemma-4-E4B-it-litert-lm...
gemma-4-E4B-it.litertlm:   0%|                                                 | 92.3k/3.66G [00:01<16:44:38, 60.7kB/s]
gemma-4-E4B-it.litertlm: 100%|████████████████████████████████████████████████████| 3.66G/3.66G [05:00<00:00, 12.2MB/s]
C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\file_download.py:143: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\minimonk\.cache\huggingface\hub\models--litert-community--gemma-4-E4B-it-litert-lm. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
The capital of France is **Paris**.

 

 

Looking it up, that second argument is simply the filename in the repository.

[Link : https://huggingface.co/litert-community/gemma-4-E4B-it-litert-lm/tree/main]

 

[Link : https://huggingface.co/metricspace/gemma4-E2B-it-litert-128k-mtp/tree/main]

 

Does it fail because this is the MTP build?

D:\study\llm>litert-lm run --from-huggingface-repo=metricspace/gemma4-E2B-it-litert-128k-mtp model.litertlm --backend=gpu  --enable-speculative-decoding=true --prompt="What is the capital of France?"
Downloading model.litertlm from metricspace/gemma4-E2B-it-litert-128k-mtp...
E0000 00:00:1778407860.973266    8280 delegate_webgpu.cc:373] Failed to create litert::ml_drift::DelegateKernelLiteRt: RESOURCE_EXHAUSTED: Requested allocation size - 4294967296 bytes. Max allocation size for this GPU - 2147483648 bytes. Shape - {bhwdc, {1, 1, 8192, 1, 131072}}, data type - float32.
=== Source Location Trace: ===
third_party/ml_drift/common/task/tensor_desc.cc:1846
third_party/ml_drift/common/gpu_model_util.cc:232
third_party/ml_drift/common/gpu_model_util.cc:269
third_party/ml_drift/common/gpu_model_util.cc:432
third_party/odml/litert/ml_drift/delegate/delegate_kernel.cc:765
third_party/odml/litert/ml_drift/delegate/delegate_kernel.cc:695
third_party/odml/litert/ml_drift/delegate/delegate_kernel.cc:787
third_party/odml/litert/ml_drift/delegate/delegate_kernel.cc:284
third_party/odml/litert/ml_drift/delegate/delegate_kernel_litert.cc:167
ERROR: Failed to initialize kernel.
ERROR: Node number 223 (STABLEHLO_COMPOSITE) failed to prepare.
E0000 00:00:1778407862.768911    8280 engine.cc:491] Failed to create engine: INTERNAL: ERROR: [third_party/odml/litert_lm/runtime/executor/llm_litert_compiled_model_executor.cc:1928]
??ERROR: [./third_party/odml/litert/litert/cc/litert_compiled_model.h:1780]
=== Source Location Trace: ===
./third_party/odml/litert/litert/cc/litert_macros.h:538
third_party/odml/litert_lm/runtime/executor/llm_litert_compiled_model_executor_factory.cc:144
third_party/odml/litert_lm/runtime/core/engine_impl.cc:384
An error occurred
Traceback (most recent call last):
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\litert_lm_cli\model.py", line 255, in run_interactive
    engine_cm = litert_lm.Engine(
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\litert_lm\engine.py", line 82, in __init__
    raise RuntimeError(
RuntimeError: Failed to create LiteRT-LM engine for C:\Users\minimonk\.cache\huggingface\hub\models--metricspace--gemma4-E2B-it-litert-128k-mtp\snapshots\4dae3505f550397923c206eaa63be84f17ee43cb\model.litertlm
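The numbers in the error explain the failure: a float32 tensor of shape `{1, 1, 8192, 1, 131072}` needs 8192 × 131072 × 4 bytes = 4 GiB, twice the 2 GiB single-allocation limit the WebGPU delegate reports for this GPU. The 131072 dimension plausibly corresponds to the 128k context of this build, but that is my reading of the shape, not something the model card or log confirms. The arithmetic as a quick check (helper names are hypothetical):

```python
import math

MAX_GPU_ALLOCATION = 2_147_483_648  # "Max allocation size for this GPU" from the log


def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor in bytes (float32 = 4 bytes per element)."""
    return math.prod(shape) * bytes_per_element


def fits(shape, limit=MAX_GPU_ALLOCATION):
    """Whether a single allocation of this tensor stays within the limit."""
    return tensor_bytes(shape) <= limit
```

For comparison, the same layout with a hypothetical 32k (32768) last dimension would be 1 GiB and fit under the limit.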

 

 

[Link : https://github.com/google-ai-edge/LiteRT-LM]

[Link : https://huggingface.co/metricspace/gemma4-E2B-it-litert-128k-mtp]

[Link : https://pypi.org/project/litert-lm/]

[Link : https://www.reddit.com/r/LocalLLaMA/comments/1somixt/practical_local_llm_on_android_gemma_4_via/?tl=ko]

Posted by 구차니

Looks like this one runs in Python.

 

[Link : https://vllm.ai/]

 

[Link : https://github.com/vllm-project/vllm/releases]

Posted by 구차니

Should I start calling Qwen "big bro"? lol

 

D:\study\llm>pip install soundfile torch qwen_tts
D:\study\llm>python
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import soundfile as sf
>>> from qwen_tts import Qwen3TTSModel

********
Warning: flash-attn is not installed. Will only run the manual PyTorch version. Please install flash-attn for faster inference.
********

'sox' is not recognized as an internal or external command,
operable program or batch file.
SoX could not be found!

    If you do not have SoX, proceed here:
     - - - http://sox.sourceforge.net/ - - -

    If you do (or think that you should) have SoX, double-check your
    path variables.

>>>
>>> model = Qwen3TTSModel.from_pretrained(
...     "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
...     device_map="cuda:0",
...     dtype=torch.bfloat16,
...     attn_implementation="flash_attention_2",
... )
config.json: 4.91kB [00:00, 4.70MB/s]
C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\file_download.py:143: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\minimonk\.cache\huggingface\hub\models--Qwen--Qwen3-TTS-12Hz-1.7B-CustomVoice. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
model.safetensors:   0%|                                                                   | 0.00/3.83G [00:00<?, ?B/s]

model.safetensors: 100%|██████████████████████████████████████████████████████████| 3.83G/3.83G [04:45<00:00, 13.4MB/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\qwen_tts\inference\qwen3_tts_model.py", line 112, in from_pretrained
    model = AutoModel.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\auto\auto_factory.py", line 604, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\qwen_tts\core\models\modeling_qwen3_tts.py", line 1876, in from_pretrained
    model = super().from_pretrained(
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 4971, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\qwen_tts\core\models\modeling_qwen3_tts.py", line 1817, in __init__
    super().__init__(config)
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 2076, in __init__
    self.config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 2686, in _check_and_adjust_attn_implementation
    applicable_attn_implementation = self.get_correct_attn_implementation(
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 2714, in get_correct_attn_implementation
    self._flash_attn_2_can_dispatch(is_init_check)
  File "C:\Users\minimonk\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 2422, in _flash_attn_2_can_dispatch
    raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
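The traceback shows `attn_implementation` being passed through to transformers, so one workaround is to request FlashAttention 2 only when it can actually run. A sketch under assumptions: the FlashAttention 2 project lists Ampere (compute capability 8.0) or newer GPUs as supported, and I am assuming, without having tested it, that Qwen3-TTS accepts transformers' `"sdpa"` implementation as a fallback (`"eager"` is the other standard option if it does not). `pick_attn_implementation` is a hypothetical helper.

```python
import importlib.util


def pick_attn_implementation(compute_capability=None) -> str:
    """Return "flash_attention_2" only if flash_attn is importable and the GPU is
    Ampere (sm_80) or newer; otherwise fall back to PyTorch's SDPA attention."""
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    ampere_or_newer = compute_capability is not None and tuple(compute_capability) >= (8, 0)
    return "flash_attention_2" if has_flash_attn and ampere_or_newer else "sdpa"


# Possible usage (not executed here):
# cap = torch.cuda.get_device_capability(0)
# model = Qwen3TTSModel.from_pretrained(
#     "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
#     device_map="cuda:0",
#     dtype=torch.bfloat16,
#     attn_implementation=pick_attn_implementation(cap),
# )
```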

 

Ugh, should I just do this on Linux after all?

D:\study\llm>pip install flash_attn
Collecting flash_attn
  Using cached flash_attn-2.8.3.tar.gz (8.4 MB)
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\minimonk\\AppData\\Local\\Temp\\pip-install-gkk0v5su\\flash-attn_bdc9b907b4714d19aa80016a5ecbd8e6\\csrc/composable_kernel/library/src/tensor_operation_instance/gpu/batched_gemm_add_relu_gemm_add/device_batched_gemm_add_relu_gemm_add_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp'
HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at https://pip.pypa.io/warnings/enable-long-paths

 

I wonder whether the speaker and the language can be different.

[Link : https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice]

Posted by 구차니

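It downloads everything on its own, and it converts Korean really well too.

A quick search suggests it's a Korean company, a HYBE subsidiary that seems to be known for voice changers used in games and the like?

I still need to read the license properly, but from a rough machine translation it seems to allow even SaaS use..

 

Unlike outetts, no build step is needed; it installs with plain pip. Good!

And with auto_download=True it busily downloads what it needs and sets itself up.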

D:\study\llm>pip install supertonic
D:\study\llm>python
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from supertonic import TTS
>>> tts = TTS(auto_download=True)
Downloading (incomplete total...): 0.00B [00:00, ?B/s]
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Fetching 26 files: 100%|███████████████████████████████████████████████████████████████| 26/26 [00:36<00:00,  1.40s/it]
Download complete: : 404MB [00:36, 19.4MB/s]
>>> style = tts.get_voice_style(voice_name="M1")
>>>
>>> text = "A gentle breeze moved through the open window while everyone listened to the story."
>>> wav, duration = tts.synthesize(text, voice_style=style, lang="en")
>>>
>>> tts.save_audio(wav, "output.wav")
>>> print(f"Generated {duration:.2f}s of audio")

>>> text = "안녕? 난 잼미니야 만나서 반가워"
>>> wav, duration = tts.synthesize(text, voice_style=style, lang="ko")
>>> tts.save_audio(wav, "output_ko.wav")
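Since the session above only needs `tts.synthesize(text, voice_style=..., lang=...)` and `tts.save_audio(wav, path)`, the same pattern extends to a batch of sentences in different languages with one voice. A sketch built only on those two calls as used here; the helper name and file naming are hypothetical.

```python
from pathlib import Path


def synthesize_all(tts, style, items, out_dir="."):
    """Synthesize (text, lang) pairs with one voice style and save each as a WAV file.

    `tts` is a supertonic TTS object (only .synthesize and .save_audio are used);
    `out_dir` must already exist. Returns the written paths and the total duration.
    """
    paths = []
    total_duration = 0.0
    for index, (text, lang) in enumerate(items):
        wav, duration = tts.synthesize(text, voice_style=style, lang=lang)
        path = Path(out_dir) / f"output_{index:02d}_{lang}.wav"
        tts.save_audio(wav, str(path))
        paths.append(path)
        total_duration += duration
    return paths, total_duration
```

With the objects from the session, `synthesize_all(tts, style, [(english_text, "en"), (korean_text, "ko")])` would write `output_00_en.wav` and `output_01_ko.wav` using the M1 voice for both languages.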

 

[Link : https://huggingface.co/Supertone/supertonic-3]

[Link : https://www.supertone.ai/ko]

Posted by 구차니

I tried it on Windows, and

got stuck right at step 1. If you need a compiler, check for one first instead of madly installing everything!!! Grr!

D:\study\llm> pip install outetts
      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> llama-cpp-python

D:\study\llm>
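The failure above comes from pip trying to compile `llama-cpp-python` (pulled in by `outetts`) from source, and CMake could not configure the build. A quick pre-flight check like the one below would surface a missing toolchain before pip downloads everything. It is a sketch: the list of required tools is an assumption (CMake plus a C/C++ compiler, with MSVC's `cl` on Windows, which is normally only on PATH inside a Visual Studio developer prompt). The build backend may be able to fetch CMake by itself, so the compiler entry is the more telling one.

```python
import shutil
import sys


def required_tools(platform=sys.platform):
    """Tools assumed necessary to compile llama-cpp-python from source (an assumption)."""
    if platform.startswith("win"):
        compilers = ["cl"]  # MSVC; usually only on PATH in a "Developer Command Prompt"
    else:
        compilers = ["cc", "gcc", "clang"]
    return {"cmake": ["cmake"], "C/C++ compiler": compilers}


def missing_build_tools(platform=sys.platform, which=shutil.which):
    """Return the labels of required tools for which no candidate is found on PATH."""
    missing = []
    for label, candidates in required_tools(platform).items():
        if not any(which(name) for name in candidates):
            missing.append(label)
    return missing


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Missing before `pip install outetts`:", ", ".join(missing))
        sys.exit(1)
    print("Build tools found; pip can try to compile llama-cpp-python.")
```

The `which` parameter defaults to `shutil.which`; it is only a parameter so the check can be exercised with a stub.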

[Link : https://github.com/edwko/OuteTTS?tab=readme-ov-file#installation]

[Link : https://huggingface.co/unsloth/Llama-OuteTTS-1.0-1B]


Running the example
With both of the models generated, the LLM model and the voice decoder model,
we can run the example:

$ build/bin/llama-tts -m  ./models/outetts-0.2-0.5B-q8_0.gguf \
    -mv ./models/wavtokenizer-large-75-f16.gguf \
    -p "Hello world"
...
main: audio written to file 'output.wav'
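To drive the same run from Python (for example to loop over several prompts), the command can be assembled as an argument list and handed to `subprocess`. This sketch uses only the `-m`, `-mv` and `-p` flags from the example; the helper names are hypothetical. Since the example output shows `output.wav` being written relative to the current directory, each prompt is run in its own working directory so the files do not overwrite each other.

```python
import subprocess
from pathlib import Path


def llama_tts_cmd(exe, model, vocoder, prompt):
    """Build the llama-tts argument list (only -m, -mv and -p, as in the example)."""
    return [str(exe), "-m", str(model), "-mv", str(vocoder), "-p", prompt]


def run_llama_tts(exe, model, vocoder, prompt, workdir):
    """Run llama-tts inside workdir and return the path of the WAV it should write there.

    exe must be a path to the binary (not a bare name looked up on PATH), because all
    paths are resolved before the working directory changes.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    cmd = llama_tts_cmd(Path(exe).resolve(), Path(model).resolve(),
                        Path(vocoder).resolve(), prompt)
    subprocess.run(cmd, cwd=workdir, check=True)
    return workdir / "output.wav"  # file name taken from the example output above


if __name__ == "__main__":
    prompts = ["Hello world", "hello i am sam. how are you?"]
    for i, text in enumerate(prompts):
        wav = run_llama_tts("build/bin/llama-tts",
                            "models/outetts-0.2-0.5B-q8_0.gguf",
                            "models/wavtokenizer-large-75-f16.gguf",
                            text, workdir=f"tts_out/{i}")
        print(wav)
```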

[Link : https://git.comtegra.pl/ajastrzebski/llama-cpp/-/tree/master/examples/tts]

[Link : https://huggingface.co/OuteAI/OuteTTS-0.2-500M-GGUF/tree/main]

[Link : https://huggingface.co/ggml-org/WavTokenizer/tree/main]


D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64>llama-tts -m ..\OuteTTS-0.3-500M-Q8_0.gguf  -mv ..\WavTokenizer-Large-75-F16.gguf -p "hello i am sam. how are you?"
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 6143 MiB):
  Device 0: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes, VRAM: 6143 MiB
load_backend: loaded CUDA backend from D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64\ggml-cuda.dll
load_backend: loaded RPC backend from D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64\ggml-rpc.dll
load_backend: loaded CPU backend from D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64\ggml-cpu-haswell.dll
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
common_params_fit_impl: getting device memory data for initial parameters:
common_memory_breakdown_print: | memory breakdown [MiB]   | total   free    self   model   context   compute    unaccounted |
common_memory_breakdown_print: |   - CUDA0 (GTX 1060 6GB) |  6143 = 5197 + ( 931 =   506 +      96 +     329) +          15 |
common_memory_breakdown_print: |   - Host                 |                  162 =   143 +       0 +      19                |
common_params_fit_impl: projected to use 931 MiB of device memory vs. 5197 MiB of free device memory
common_params_fit_impl: will leave 4265 >= 1024 MiB of free device memory, no changes needed
common_fit_params: successfully fit params to free device memory
common_fit_params: fitting params to free memory took 0.44 seconds
llama_model_loader: loaded meta data with 25 key-value pairs and 290 tensors from ..\OuteTTS-0.3-500M-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = OuteTTS 0.3 500M
llama_model_loader: - kv   3:                           general.basename str              = OuteTTS-0.3
llama_model_loader: - kv   4:                         general.size_label str              = 500M
llama_model_loader: - kv   5:                          qwen2.block_count u32              = 24
llama_model_loader: - kv   6:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv   7:                     qwen2.embedding_length u32              = 896
llama_model_loader: - kv   8:                  qwen2.feed_forward_length u32              = 4864
llama_model_loader: - kv   9:                 qwen2.attention.head_count u32              = 14
llama_model_loader: - kv  10:              qwen2.attention.head_count_kv u32              = 2
llama_model_loader: - kv  11:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  12:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,157696]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,157696]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 151644
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 151645
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  22:                    tokenizer.chat_template str              = outetts-0.3
llama_model_loader: - kv  23:               general.quantization_version u32              = 2
llama_model_loader: - kv  24:                          general.file_type u32              = 7
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type q8_0:  169 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q8_0
print_info: file size   = 506.02 MiB (8.50 BPW)
llama_prepare_model_devices: using device CUDA0 (NVIDIA GeForce GTX 1060 6GB) (0000:01:00.0) - 5197 MiB free
load: 0 unused tokens
load: control-looking token: 128247 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden
load: printing all EOG tokens:
load:   - 128247 ('</s>')
load:   - 151643 ('<|endoftext|>')
load:   - 151645 ('<|im_end|>')
load:   - 151662 ('<|fim_pad|>')
load:   - 151663 ('<|repo_name|>')
load:   - 151664 ('<|file_sep|>')
load: special tokens cache size = 5152
load: token to piece cache size = 0.9712 MB
print_info: arch                  = qwen2
print_info: vocab_only            = 0
print_info: no_alloc              = 0
print_info: n_ctx_train           = 32768
print_info: n_embd                = 896
print_info: n_embd_inp            = 896
print_info: n_layer               = 24
print_info: n_head                = 14
print_info: n_head_kv             = 2
print_info: n_rot                 = 64
print_info: n_swa                 = 0
print_info: is_swa_any            = 0
print_info: n_embd_head_k         = 64
print_info: n_embd_head_v         = 64
print_info: n_gqa                 = 7
print_info: n_embd_k_gqa          = 128
print_info: n_embd_v_gqa          = 128
print_info: f_norm_eps            = 0.0e+00
print_info: f_norm_rms_eps        = 1.0e-06
print_info: f_clamp_kqv           = 0.0e+00
print_info: f_max_alibi_bias      = 0.0e+00
print_info: f_logit_scale         = 0.0e+00
print_info: f_attn_scale          = 0.0e+00
print_info: f_attn_value_scale    = 0.0000
print_info: n_ff                  = 4864
print_info: n_expert              = 0
print_info: n_expert_used         = 0
print_info: n_expert_groups       = 0
print_info: n_group_used          = 0
print_info: causal attn           = 1
print_info: pooling type          = -1
print_info: rope type             = 2
print_info: rope scaling          = linear
print_info: freq_base_train       = 1000000.0
print_info: freq_scale_train      = 1
print_info: n_ctx_orig_yarn       = 32768
print_info: rope_yarn_log_mul     = 0.0000
print_info: rope_finetuned        = unknown
print_info: model type            = 1B
print_info: model params          = 499.19 M
print_info: general.name          = OuteTTS 0.3 500M
print_info: vocab type            = BPE
print_info: n_vocab               = 157696
print_info: n_merges              = 151387
print_info: BOS token             = 151644 '<|im_start|>'
print_info: EOS token             = 151645 '<|im_end|>'
print_info: EOT token             = 151645 '<|im_end|>'
print_info: PAD token             = 151645 '<|im_end|>'
print_info: LF token              = 198 'Ċ'
print_info: FIM PRE token         = 151659 '<|fim_prefix|>'
print_info: FIM SUF token         = 151661 '<|fim_suffix|>'
print_info: FIM MID token         = 151660 '<|fim_middle|>'
print_info: FIM PAD token         = 151662 '<|fim_pad|>'
print_info: FIM REP token         = 151663 '<|repo_name|>'
print_info: FIM SEP token         = 151664 '<|file_sep|>'
print_info: EOG token             = 128247 '</s>'
print_info: EOG token             = 151643 '<|endoftext|>'
print_info: EOG token             = 151645 '<|im_end|>'
print_info: EOG token             = 151662 '<|fim_pad|>'
print_info: EOG token             = 151663 '<|repo_name|>'
print_info: EOG token             = 151664 '<|file_sep|>'
print_info: max token length      = 256
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 23 repeating layers to GPU
load_tensors: offloaded 25/25 layers to GPU
load_tensors:   CPU_Mapped model buffer size =   143.17 MiB
load_tensors:        CUDA0 model buffer size =   506.07 MiB
..........................................................
common_init_result: added </s> logit bias = -inf
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|im_end|> logit bias = -inf
common_init_result: added <|fim_pad|> logit bias = -inf
common_init_result: added <|repo_name|> logit bias = -inf
common_init_result: added <|file_sep|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 8192
llama_context: n_ctx_seq     = 8192
llama_context: n_batch       = 8192
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (8192) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_context:  CUDA_Host  output buffer size =     0.60 MiB
llama_kv_cache:      CUDA0 KV buffer size =    96.00 MiB
llama_kv_cache: size =   96.00 MiB (  8192 cells,  24 layers,  1/1 seqs), K (f16):   48.00 MiB, V (f16):   48.00 MiB
llama_kv_cache: attn_rot_k = 0, n_embd_head_k_all = 64
llama_kv_cache: attn_rot_v = 0, n_embd_head_k_all = 64
sched_reserve: reserving ...
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve: resolving fused Gated Delta Net support:
sched_reserve: fused Gated Delta Net (autoregressive) enabled
sched_reserve: fused Gated Delta Net (chunked) enabled
sched_reserve:      CUDA0 compute buffer size =   329.26 MiB
sched_reserve:  CUDA_Host compute buffer size =    19.51 MiB
sched_reserve: graph nodes  = 823
sched_reserve: graph splits = 2
sched_reserve: reserve took 9.05 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
common_params_fit_impl: getting device memory data for initial parameters:
common_memory_breakdown_print: | memory breakdown [MiB]   | total   free    self   model   context   compute    unaccounted |
common_memory_breakdown_print: |   - CUDA0 (GTX 1060 6GB) |  6143 = 4255 + ( 496 =   120 +       0 +     376) +        1392 |
common_memory_breakdown_print: |   - Host                 |                   36 =     4 +       0 +      32                |
common_params_fit_impl: projected to use 496 MiB of device memory vs. 4255 MiB of free device memory
common_params_fit_impl: will leave 3758 >= 1024 MiB of free device memory, no changes needed
common_fit_params: successfully fit params to free device memory
common_fit_params: fitting params to free memory took -0.78 seconds
llama_model_loader: loaded meta data with 25 key-value pairs and 161 tensors from ..\WavTokenizer-Large-75-F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = wavtokenizer-dec
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = WavTokenizer Large Speech 75token
llama_model_loader: - kv   3:                           general.finetune str              = speech-75token
llama_model_loader: - kv   4:                           general.basename str              = WavTokenizer
llama_model_loader: - kv   5:                         general.size_label str              = large
llama_model_loader: - kv   6:                            general.license str              = mit
llama_model_loader: - kv   7:               wavtokenizer-dec.block_count u32              = 12
llama_model_loader: - kv   8:            wavtokenizer-dec.context_length u32              = 8192
llama_model_loader: - kv   9:          wavtokenizer-dec.embedding_length u32              = 1282
llama_model_loader: - kv  10:      wavtokenizer-dec.attention.head_count u32              = 1
llama_model_loader: - kv  11: wavtokenizer-dec.attention.layer_norm_epsilon f32              = 0.000001
llama_model_loader: - kv  12:                          general.file_type u32              = 1
llama_model_loader: - kv  13:                wavtokenizer-dec.vocab_size u32              = 4096
llama_model_loader: - kv  14:           wavtokenizer-dec.features_length u32              = 512
llama_model_loader: - kv  15:       wavtokenizer-dec.feed_forward_length u32              = 2304
llama_model_loader: - kv  16: wavtokenizer-dec.attention.group_norm_epsilon f32              = 0.000001
llama_model_loader: - kv  17: wavtokenizer-dec.attention.group_norm_groups u32              = 32
llama_model_loader: - kv  18:   wavtokenizer-dec.posnet.embedding_length u32              = 768
llama_model_loader: - kv  19:        wavtokenizer-dec.posnet.block_count u32              = 6
llama_model_loader: - kv  20: wavtokenizer-dec.convnext.embedding_length u32              = 768
llama_model_loader: - kv  21:      wavtokenizer-dec.convnext.block_count u32              = 12
llama_model_loader: - kv  22:          wavtokenizer-dec.attention.causal bool             = false
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = none
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  110 tensors
llama_model_loader: - type  f16:   51 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 124.15 MiB (16.03 BPW)
llama_prepare_model_devices: using device CUDA0 (NVIDIA GeForce GTX 1060 6GB) (0000:01:00.0) - 4255 MiB free
load: adding 4096 dummy tokens
print_info: arch                  = wavtokenizer-dec
print_info: vocab_only            = 0
print_info: no_alloc              = 0
print_info: n_ctx_train           = 8192
print_info: n_embd                = 512
print_info: n_embd_inp            = 512
print_info: n_layer               = 12
print_info: n_head                = 1
print_info: n_head_kv             = 1
print_info: n_rot                 = 512
print_info: n_swa                 = 0
print_info: is_swa_any            = 0
print_info: n_embd_head_k         = 512
print_info: n_embd_head_v         = 512
print_info: n_gqa                 = 1
print_info: n_embd_k_gqa          = 512
print_info: n_embd_v_gqa          = 512
print_info: f_norm_eps            = 1.0e-06
print_info: f_norm_rms_eps        = 0.0e+00
print_info: f_clamp_kqv           = 0.0e+00
print_info: f_max_alibi_bias      = 0.0e+00
print_info: f_logit_scale         = 0.0e+00
print_info: f_attn_scale          = 0.0e+00
print_info: f_attn_value_scale    = 0.0000
print_info: n_ff                  = 2304
print_info: n_expert              = 0
print_info: n_expert_used         = 0
print_info: n_expert_groups       = 0
print_info: n_group_used          = 0
print_info: causal attn           = 0
print_info: pooling type          = -1
print_info: rope type             = -1
print_info: rope scaling          = linear
print_info: freq_base_train       = 10000.0
print_info: freq_scale_train      = 1
print_info: n_ctx_orig_yarn       = 8192
print_info: rope_yarn_log_mul     = 0.0000
print_info: rope_finetuned        = unknown
print_info: model type            = ?B
print_info: model params          = 64.98 M
print_info: general.name          = WavTokenizer Large Speech 75token
print_info: vocab type            = no vocab
print_info: n_vocab               = 4096
print_info: n_merges              = 0
print_info: max token length      = 0
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 11 repeating layers to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors:   CPU_Mapped model buffer size =     4.00 MiB
load_tensors:        CUDA0 model buffer size =   120.15 MiB
.......................................
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 8192
llama_context: n_ctx_seq     = 8192
llama_context: n_batch       = 8192
llama_context: n_ubatch      = 8192
llama_context: causal_attn   = 0
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 10000.0
llama_context: freq_scale    = 1
llama_context:  CUDA_Host  output buffer size =     0.02 MiB
sched_reserve: reserving ...
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve: resolving fused Gated Delta Net support:
sched_reserve: fused Gated Delta Net (autoregressive) enabled
sched_reserve: fused Gated Delta Net (chunked) enabled
sched_reserve:      CUDA0 compute buffer size =   376.00 MiB
sched_reserve:  CUDA_Host compute buffer size =    32.03 MiB
sched_reserve: graph nodes  = 401
sched_reserve: graph splits = 2
sched_reserve: reserve took 14.06 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
sampler seed: 0
sampler params:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = -1
        top_k = 4, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000, adaptive_target = -1.000, adaptive_decay = 0.900
sampler chain: logits -> top-k -> dist
main: loading done
main: constructing prompt ..
main: prompt: 'hello<|space|>i<|space|>am<|space|>sam<|space|>how<|space|>are<|space|>you'


main: llama tokens: 151667, 198, 1782, 155780, 151929, 152412, 152308, 152585, 152460, 153375, 156777, 198, 74455, 155808, 151799, 151873, 151863, 152446, 152372, 152204, 152728, 152229, 152470, 151970, 153413, 152419, 153334, 153289, 153374, 153199, 152040, 153260, 152721, 152680, 153297, 152419, 153248, 152400, 152691, 153368, 153437, 156777, 198, 1722, 155828, 152607, 152256, 152991, 152299, 152688, 153163, 153016, 152789, 153198, 152712, 151911, 153107, 152623, 152170, 152395, 152852, 152207, 152461, 153321, 153309, 151750, 152137, 153340, 152573, 152267, 153347, 151789, 152681, 153339, 151992, 152512, 151751, 152179, 153434, 153180, 152900, 153440, 152474, 153122, 153129, 151904, 152311, 156777, 198, 1499, 155791, 152276, 152454, 153354, 152544, 153204, 153272, 152708, 153433, 152319, 153226, 153043, 152325, 153267, 152622, 156777, 198, 4250, 155797, 153454, 153342, 151989, 152458, 153420, 152303, 152271, 152827, 153036, 153196, 151708, 153263, 152561, 153207, 152213, 152112, 153204, 151722, 152542, 156777, 198, 19789, 155796, 153353, 153182, 152345, 152471, 152477, 153014, 152002, 152191, 151734, 152312, 152810, 152237, 153224, 153169, 153224, 152244, 153387, 153404, 156777, 198, 16069, 155811, 152265, 151946, 151808, 152412, 152363, 152305, 153156, 152733, 152810, 153157, 152016, 152100, 152069, 153234, 152317, 152589, 152707, 153121, 153341, 152159, 152114, 153156, 153001, 153504, 153376, 152272, 152433, 152325, 151941, 156777, 198, 285, 155788, 152238, 152255, 153427, 152318, 153009, 152381, 152474, 152680, 152157, 153255, 152324, 151682, 156777, 198, 32955, 155804, 153490, 153419, 152364, 152405, 152682, 152206, 152078, 153369, 152725, 153193, 153027, 152946, 152488, 153070, 151883, 152890, 152489, 153144, 153375, 152358, 151685, 152494, 152117, 152740, 156777, 198, 37448, 480, 155840, 151902, 152720, 153377, 152027, 152378, 152821, 153207, 153459, 153028, 153068, 152507, 153255, 152158, 152921, 151958, 152609, 152748, 152822, 152286, 151714, 152730, 
152377, 152353, 152470, 152606, 152162, 152186, 153071, 152244, 153118, 153375, 153018, 152712, 153098, 152976, 152336, 151843, 153202, 152297, 151736, 153380, 153502, 152702, 152115, 153181, 152735, 153277, 153457, 152393, 153112, 152595, 156777, 198, 19098, 155808, 152464, 153452, 152595, 153312, 151937, 151933, 153197, 152239, 153163, 152922, 153402, 152034, 152591, 153438, 152215, 151673, 152005, 151785, 152642, 151924, 153278, 151805, 151974, 153482, 152718, 152862, 153347, 156777, 198, 72, 155780, 151795, 152111, 152746, 152377, 153471, 152309, 156777, 198, 19016, 155788, 153181, 152271, 152190, 152842, 152224, 152701, 152939, 152536, 152091, 151815, 152733, 151672, 156777, 198, 14689, 155788, 152291, 152072, 152942, 151734, 153042, 153504, 152589, 153333, 151839, 151941, 153038, 153180, 156777, 198, 36996, 8303, 155832, 152231, 152256, 152835, 152801, 152985, 153400, 152393, 152818, 152765, 152249, 152600, 151699, 152302, 152752, 153018, 153009, 151992, 153054, 152847, 153354, 153228, 152662, 153355, 152532, 153393, 151782, 152458, 152048, 152757, 152428, 153195, 151906, 153006, 153178, 153250, 152331, 152284, 152780, 153138, 153319, 151980, 153142, 152418, 152228, 152733, 156777, 198, 9096, 155801, 151698, 153321, 152217, 153039, 152935, 153400, 152122, 152531, 153106, 152169, 152892, 152957, 151851, 152427, 152826, 152451, 151851, 152901, 152885, 152594, 153446, 153080, 156777, 198, 14689, 155795, 152658, 151700, 153321, 152450, 152530, 153191, 151673, 151690, 151698, 152714, 152846, 152981, 153171, 153384, 153364, 153188, 153246, 156777, 198, 1055, 155779, 151869, 152388, 152711, 153334, 151736, 156777, 198, 1782, 155780, 153483, 153240, 152241, 152558, 152697, 153046, 156777, 198, 5804, 1363, 155820, 152941, 152764, 152605, 153034, 153434, 153372, 153347, 151887, 152453, 152758, 152133, 152510, 152694, 152431, 152321, 153088, 152676, 152223, 152581, 152459, 152015, 152502, 153063, 152712, 153294, 153451, 153032, 152903, 152859, 152989, 151748, 152669, 
152661, 152650, 152409, 151861, 156777, 198, 300, 7973, 155828, 153095, 152469, 152988, 152894, 151819, 152391, 153019, 152058, 153062, 153230, 151826, 152112, 152306, 152264, 152769, 153390, 152384, 152435, 152790, 153393, 152983, 152540, 152252, 152034, 153107, 152540, 151919, 151893, 152558, 152817, 152946, 152956, 152129, 152715, 153131, 153490, 151734, 152271, 152707, 151734, 153321, 152450, 156777, 198, 8088, 155792, 152452, 153497, 153353, 152679, 152533, 152382, 152374, 152611, 153341, 153163, 152285, 153411, 152495, 153141, 152320, 156777, 198, 1199, 155781, 151764, 152360, 153295, 152634, 153342, 152199, 152271, 156777, 198, 43366, 155799, 152308, 151682, 152889, 152016, 152385, 152629, 152495, 151826, 153321, 152958, 152180, 151886, 153432, 152922, 152128, 153024, 153040, 152593, 152287, 151677, 156777, 198, 53660, 155808, 151727, 152092, 152680, 153331, 151699, 152316, 152938, 152289, 152433, 153384, 151781, 153137, 153259, 152175, 153213, 152291, 151869, 152691, 152489, 151941, 152049, 152034, 153053, 152179, 153160, 151676, 153367, 156777, 198, 268, 4123, 480, 155821, 152350, 152173, 152536, 151991, 151960, 153144, 153013, 152358, 152234, 153135, 152291, 153235, 152143, 152583, 152402, 153483, 152678, 152192, 152533, 152946, 151797, 153103, 152310, 152293, 151825, 152548, 153442, 152109, 152659, 153325, 152781, 152570, 152957, 151752, 152265, 153381, 152515, 156777, 198, 437, 155787, 152957, 152659, 151975, 152709, 152402, 152836, 152174, 151792, 153409, 153327, 152990, 156777, 198, 275, 155781, 152520, 153038, 152067, 153273, 153185, 152265, 152974, 156777, 198, 94273, 155799, 152953, 152938, 153427, 152244, 151920, 153423, 152929, 152367, 153052, 152129, 152331, 152257, 152987, 152777, 153448, 152408, 151696, 152408, 152326, 152699, 156777, 198, 385, 16239, 155828, 152306, 152268, 153438, 153228, 152978, 152957, 153153, 153393, 152795, 152110, 152918, 152923, 152467, 152331, 153053, 153330, 151889, 153444, 152234, 152624, 151779, 152801, 152784, 
152139, 152222, 152751, 152512, 153287, 153141, 153052, 151840, 152589, 152508, 153499, 152109, 152255, 151739, 152267, 152759, 153318, 153165, 153349, 156777,


<|im_start|>
<|text_start|>the<|space|>overall<|space|>package<|space|>from<|space|>just<|space|>two<|space|>people<|space|>is<|space|>pretty<|space|>remarkable<|space|>sure<|space|>i<|space|>have<|space|>some<|space|>critiques<|space|>about<|space|>some<|space|>of<|space|>the<|space|>gameplay<|space|>aspects<|space|>but<|space|>its<|space|>still<|space|>really<|space|>enjoyable<|space|>and<|space|>it<|space|>looks<|space|>lovely<|space|>hello<|space|>i<|space|>am<|space|>sam<|space|>how<|space|>are<|space|>you<|text_end|>
<|audio_start|>
the<|t_0.08|><|257|><|740|><|636|><|913|><|788|><|1703|><|space|>
overall<|t_0.36|><|127|><|201|><|191|><|774|><|700|><|532|><|1056|><|557|><|798|><|298|><|1741|><|747|><|1662|><|1617|><|1702|><|1527|><|368|><|1588|><|1049|><|1008|><|1625|><|747|><|1576|><|728|><|1019|><|1696|><|1765|><|space|>
package<|t_0.56|><|935|><|584|><|1319|><|627|><|1016|><|1491|><|1344|><|1117|><|1526|><|1040|><|239|><|1435|><|951|><|498|><|723|><|1180|><|535|><|789|><|1649|><|1637|><|78|><|465|><|1668|><|901|><|595|><|1675|><|117|><|1009|><|1667|><|320|><|840|><|79|><|507|><|1762|><|1508|><|1228|><|1768|><|802|><|1450|><|1457|><|232|><|639|><|space|>
from<|t_0.19|><|604|><|782|><|1682|><|872|><|1532|><|1600|><|1036|><|1761|><|647|><|1554|><|1371|><|653|><|1595|><|950|><|space|>
just<|t_0.25|><|1782|><|1670|><|317|><|786|><|1748|><|631|><|599|><|1155|><|1364|><|1524|><|36|><|1591|><|889|><|1535|><|541|><|440|><|1532|><|50|><|870|><|space|>
two<|t_0.24|><|1681|><|1510|><|673|><|799|><|805|><|1342|><|330|><|519|><|62|><|640|><|1138|><|565|><|1552|><|1497|><|1552|><|572|><|1715|><|1732|><|space|>
people<|t_0.39|><|593|><|274|><|136|><|740|><|691|><|633|><|1484|><|1061|><|1138|><|1485|><|344|><|428|><|397|><|1562|><|645|><|917|><|1035|><|1449|><|1669|><|487|><|442|><|1484|><|1329|><|1832|><|1704|><|600|><|761|><|653|><|269|><|space|>
is<|t_0.16|><|566|><|583|><|1755|><|646|><|1337|><|709|><|802|><|1008|><|485|><|1583|><|652|><|10|><|space|>
pretty<|t_0.32|><|1818|><|1747|><|692|><|733|><|1010|><|534|><|406|><|1697|><|1053|><|1521|><|1355|><|1274|><|816|><|1398|><|211|><|1218|><|817|><|1472|><|1703|><|686|><|13|><|822|><|445|><|1068|><|space|>
remarkable<|t_0.68|><|230|><|1048|><|1705|><|355|><|706|><|1149|><|1535|><|1787|><|1356|><|1396|><|835|><|1583|><|486|><|1249|><|286|><|937|><|1076|><|1150|><|614|><|42|><|1058|><|705|><|681|><|798|><|934|><|490|><|514|><|1399|><|572|><|1446|><|1703|><|1346|><|1040|><|1426|><|1304|><|664|><|171|><|1530|><|625|><|64|><|1708|><|1830|><|1030|><|443|><|1509|><|1063|><|1605|><|1785|><|721|><|1440|><|923|><|space|>
sure<|t_0.36|><|792|><|1780|><|923|><|1640|><|265|><|261|><|1525|><|567|><|1491|><|1250|><|1730|><|362|><|919|><|1766|><|543|><|1|><|333|><|113|><|970|><|252|><|1606|><|133|><|302|><|1810|><|1046|><|1190|><|1675|><|space|>
i<|t_0.08|><|123|><|439|><|1074|><|705|><|1799|><|637|><|space|>
have<|t_0.16|><|1509|><|599|><|518|><|1170|><|552|><|1029|><|1267|><|864|><|419|><|143|><|1061|><|0|><|space|>
some<|t_0.16|><|619|><|400|><|1270|><|62|><|1370|><|1832|><|917|><|1661|><|167|><|269|><|1366|><|1508|><|space|>
critiques<|t_0.60|><|559|><|584|><|1163|><|1129|><|1313|><|1728|><|721|><|1146|><|1093|><|577|><|928|><|27|><|630|><|1080|><|1346|><|1337|><|320|><|1382|><|1175|><|1682|><|1556|><|990|><|1683|><|860|><|1721|><|110|><|786|><|376|><|1085|><|756|><|1523|><|234|><|1334|><|1506|><|1578|><|659|><|612|><|1108|><|1466|><|1647|><|308|><|1470|><|746|><|556|><|1061|><|space|>
about<|t_0.29|><|26|><|1649|><|545|><|1367|><|1263|><|1728|><|450|><|859|><|1434|><|497|><|1220|><|1285|><|179|><|755|><|1154|><|779|><|179|><|1229|><|1213|><|922|><|1774|><|1408|><|space|>
some<|t_0.23|><|986|><|28|><|1649|><|778|><|858|><|1519|><|1|><|18|><|26|><|1042|><|1174|><|1309|><|1499|><|1712|><|1692|><|1516|><|1574|><|space|>
of<|t_0.07|><|197|><|716|><|1039|><|1662|><|64|><|space|>
the<|t_0.08|><|1811|><|1568|><|569|><|886|><|1025|><|1374|><|space|>
gameplay<|t_0.48|><|1269|><|1092|><|933|><|1362|><|1762|><|1700|><|1675|><|215|><|781|><|1086|><|461|><|838|><|1022|><|759|><|649|><|1416|><|1004|><|551|><|909|><|787|><|343|><|830|><|1391|><|1040|><|1622|><|1779|><|1360|><|1231|><|1187|><|1317|><|76|><|997|><|989|><|978|><|737|><|189|><|space|>
aspects<|t_0.56|><|1423|><|797|><|1316|><|1222|><|147|><|719|><|1347|><|386|><|1390|><|1558|><|154|><|440|><|634|><|592|><|1097|><|1718|><|712|><|763|><|1118|><|1721|><|1311|><|868|><|580|><|362|><|1435|><|868|><|247|><|221|><|886|><|1145|><|1274|><|1284|><|457|><|1043|><|1459|><|1818|><|62|><|599|><|1035|><|62|><|1649|><|778|><|space|>
but<|t_0.20|><|780|><|1825|><|1681|><|1007|><|861|><|710|><|702|><|939|><|1669|><|1491|><|613|><|1739|><|823|><|1469|><|648|><|space|>
its<|t_0.09|><|92|><|688|><|1623|><|962|><|1670|><|527|><|599|><|space|>
still<|t_0.27|><|636|><|10|><|1217|><|344|><|713|><|957|><|823|><|154|><|1649|><|1286|><|508|><|214|><|1760|><|1250|><|456|><|1352|><|1368|><|921|><|615|><|5|><|space|>
really<|t_0.36|><|55|><|420|><|1008|><|1659|><|27|><|644|><|1266|><|617|><|761|><|1712|><|109|><|1465|><|1587|><|503|><|1541|><|619|><|197|><|1019|><|817|><|269|><|377|><|362|><|1381|><|507|><|1488|><|4|><|1695|><|space|>
enjoyable<|t_0.49|><|678|><|501|><|864|><|319|><|288|><|1472|><|1341|><|686|><|562|><|1463|><|619|><|1563|><|471|><|911|><|730|><|1811|><|1006|><|520|><|861|><|1274|><|125|><|1431|><|638|><|621|><|153|><|876|><|1770|><|437|><|987|><|1653|><|1109|><|898|><|1285|><|80|><|593|><|1709|><|843|><|space|>
and<|t_0.15|><|1285|><|987|><|303|><|1037|><|730|><|1164|><|502|><|120|><|1737|><|1655|><|1318|><|space|>
it<|t_0.09|><|848|><|1366|><|395|><|1601|><|1513|><|593|><|1302|><|space|>
looks<|t_0.27|><|1281|><|1266|><|1755|><|572|><|248|><|1751|><|1257|><|695|><|1380|><|457|><|659|><|585|><|1315|><|1105|><|1776|><|736|><|24|><|736|><|654|><|1027|><|space|>
lovely<|t_0.56|><|634|><|596|><|1766|><|1556|><|1306|><|1285|><|1481|><|1721|><|1123|><|438|><|1246|><|1251|><|795|><|659|><|1381|><|1658|><|217|><|1772|><|562|><|952|><|107|><|1129|><|1112|><|467|><|550|><|1079|><|840|><|1615|><|1469|><|1380|><|168|><|917|><|836|><|1827|><|437|><|583|><|67|><|595|><|1087|><|1646|><|1493|><|1677|><|space|>
main: prompt size: 871

main: time for prompt: 252.929 ms

000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

main: time for decoder:       1412.620 ms
common_perf_print:    sampling time =      66.79 ms
common_perf_print:    samplers time =      26.02 ms /   199 tokens
common_perf_print:        load time =     655.95 ms
common_perf_print: prompt eval time =     234.97 ms /   871 tokens (    0.27 ms per token,  3706.90 tokens per second)
common_perf_print:        eval time =    1341.80 ms /   198 runs   (    6.78 ms per token,   147.56 tokens per second)
common_perf_print:       total time =    1075.97 ms /  1069 tokens
common_perf_print: unaccounted time =    -567.58 ms / -52.8 %      (total - sampling - prompt eval - eval) / (total)
common_perf_print:    graphs reused =        196
common_memory_breakdown_print: | memory breakdown [MiB]   | total   free    self   model   context   compute    unaccounted |
common_memory_breakdown_print: |   - CUDA0 (GTX 1060 6GB) |  6143 = 3733 + ( 931 =   506 +      96 +     329) +        1479 |
common_memory_breakdown_print: |   - Host                 |                  162 =   143 +       0 +      19                |
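The `common_perf_print` and memory breakdown lines above can be checked by hand. The following is a minimal Python sketch with the numbers copied from this log. It re-derives the per-token rates, the `unaccounted time` line (using the formula the log prints next to it), and the CUDA0 memory row. The variable names are my own, and nothing here calls llama.cpp.

```python
# Values copied from the common_perf_print / memory breakdown lines of the log.
sampling_ms = 66.79
prompt_eval_ms, prompt_tokens = 234.97, 871
eval_ms, eval_runs = 1341.80, 198
total_ms = 1075.97


def per_token_stats(ms, n):
    """Return (ms per token, tokens per second) for n tokens processed in `ms` milliseconds."""
    return ms / n, n * 1000.0 / ms


prompt_ms_tok, prompt_tps = per_token_stats(prompt_eval_ms, prompt_tokens)
eval_ms_tok, eval_tps = per_token_stats(eval_ms, eval_runs)

# Formula printed by the log itself: (total - sampling - prompt eval - eval) / (total)
unaccounted_ms = total_ms - sampling_ms - prompt_eval_ms - eval_ms
unaccounted_pct = 100.0 * unaccounted_ms / total_ms

# CUDA0 row [MiB]: total = free + self + unaccounted, with self = model + context + compute
model_mib, context_mib, compute_mib = 506, 96, 329
self_mib = model_mib + context_mib + compute_mib
free_mib, unaccounted_mib = 3733, 1479
total_mib = free_mib + self_mib + unaccounted_mib

print(f"prompt eval: {prompt_ms_tok:.2f} ms/token, {prompt_tps:.2f} tokens/s")
print(f"eval:        {eval_ms_tok:.2f} ms/token, {eval_tps:.2f} tokens/s")
print(f"unaccounted: {unaccounted_ms:.2f} ms ({unaccounted_pct:.1f} %)")
print(f"CUDA0:       {total_mib} MiB = {free_mib} + ({self_mib}) + {unaccounted_mib}")
```

The negative unaccounted time comes straight from that formula, because the eval time alone (1341.80 ms) is already larger than the reported total (1075.97 ms). There are small last-digit differences (for example, 3706.86 vs 3706.90 tokens/s). These most likely arise because the log prints rounded millisecond values.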

codes: '
hello<|t_0.96|><|865|><|1506|><|865|><|1419|><|1819|><|838|><|624|><|1251|><|899|><|954|><|1096|><|710|><|1152|><|1418|><|710|><|1301|><|1120|><|17|><|1456|><|1405|><|776|><|1668|><|1390|><|86|><|1292|><|1023|><|1683|><|1589|><|1092|><|1556|><|1479|><|1294|><|1292|><|805|><|1683|><|1430|><|900|><|1714|><|995|><|1294|><|1432|><|1007|><|1622|><|1120|><|861|><|1803|><|995|><|1092|><|1668|><|710|><|1433|><|933|><|670|><|32|><|1293|><|1251|><|1134|><|1701|><|1347|><|816|><|642|><|95|><|508|><|48|><|503|><|653|><|1707|><|1041|><|267|><|1817|><|248|><|1754|><|space|>
i<|t_0.28|><|73|><|642|><|169|><|614|><|983|><|169|><|843|><|443|><|1092|><|752|><|252|><|1378|><|1315|><|221|><|1448|><|1083|><|565|><|866|><|93|><|767|><|1697|><|space|>
am<|t_0.16|><|422|><|852|><|408|><|847|><|1007|><|550|><|874|><|673|><|191|><|127|><|220|><|716|><|space|>
sam<|t_0.43|><|775|><|487|><|646|><|519|><|493|><|1513|><|1|><|1166|><|640|><|556|><|0|><|1061|><|18|><|333|><|719|><|632|><|693|><|907|><|430|><|1312|><|1086|><|1098|><|1333|><|974|><|816|><|440|><|1755|><|1324|><|1534|><|662|><|1812|><|385|><|space|>
how<|t_0.20|><|1663|><|1028|><|1488|><|1314|><|1393|><|1723|><|1303|><|1497|><|951|><|1181|><|789|><|142|><|1475|><|66|><|297|><|space|>
are<|t_0.13|><|798|><|1803|><|562|><|123|><|756|><|968|><|381|><|890|><|1773|><|1039|><|space|>
you<|t_0.08|><|193|><|92|><|1221|><|1334|><|562|><|1415|>
<|audio_end|>
<|im_end|>'
main: codes size: 199
codes audio: '<|865|><|1506|><|865|><|1419|><|1819|><|838|><|624|><|1251|><|899|><|954|><|1096|><|710|><|1152|><|1418|><|710|><|1301|><|1120|><|17|><|1456|><|1405|><|776|><|1668|><|1390|><|86|><|1292|><|1023|><|1683|><|1589|><|1092|><|1556|><|1479|><|1294|><|1292|><|805|><|1683|><|1430|><|900|><|1714|><|995|><|1294|><|1432|><|1007|><|1622|><|1120|><|861|><|1803|><|995|><|1092|><|1668|><|710|><|1433|><|933|><|670|><|32|><|1293|><|1251|><|1134|><|1701|><|1347|><|816|><|642|><|95|><|508|><|48|><|503|><|653|><|1707|><|1041|><|267|><|1817|><|248|><|1754|><|73|><|642|><|169|><|614|><|983|><|169|><|843|><|443|><|1092|><|752|><|252|><|1378|><|1315|><|221|><|1448|><|1083|><|565|><|866|><|93|><|767|><|1697|><|422|><|852|><|408|><|847|><|1007|><|550|><|874|><|673|><|191|><|127|><|220|><|716|><|775|><|487|><|646|><|519|><|493|><|1513|><|1|><|1166|><|640|><|556|><|0|><|1061|><|18|><|333|><|719|><|632|><|693|><|907|><|430|><|1312|><|1086|><|1098|><|1333|><|974|><|816|><|440|><|1755|><|1324|><|1534|><|662|><|1812|><|385|><|1663|><|1028|><|1488|><|1314|><|1393|><|1723|><|1303|><|1497|><|951|><|1181|><|789|><|142|><|1475|><|66|><|297|><|798|><|1803|><|562|><|123|><|756|><|968|><|381|><|890|><|1773|><|1039|><|193|><|92|><|1221|><|1334|><|562|><|1415|>'
main: codes audio size: 168
main: time for vocoder:      220.671 ms
main: time for spectral ops: 850.860 ms
main: total time:            2737.218 ms
main: audio written to file 'output.wav'
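Comparing the two dumps, the `codes audio` line looks like the `codes` string with some parts stripped out: the words, the `<|t_…|>` timing markers, `<|space|>`, and the `<|audio_end|>`/`<|im_end|>` markers. What remains is only the numeric `<|N|>` tokens, which presumably are what the WavTokenizer vocoder consumes. The following is a minimal Python sketch of that filtering. It is based only on comparing the two log lines, not on llama-tts's actual code, and `extract_audio_codes` is a hypothetical name.

```python
import re

# Numeric audio tokens look like <|865|>. Timing markers (<|t_0.96|>), <|space|>,
# <|audio_end|>, <|im_end|> and the plain-text words do not match this pattern.
AUDIO_TOKEN = re.compile(r"<\|(\d+)\|>")


def extract_audio_codes(codes: str) -> list[int]:
    """Keep only the numeric <|N|> tokens, in order, from a 'codes' dump like the one above."""
    return [int(tok) for tok in AUDIO_TOKEN.findall(codes)]


sample = "hello<|t_0.96|><|865|><|1506|><|space|>\ni<|t_0.28|><|73|>"
print(extract_audio_codes(sample))
```

For the sample string this keeps `865`, `1506` and `73`, matching how the real `codes audio` line starts with hello's tokens followed directly by `<|73|>` from "i".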

 

English comes out well, but Korean doesn't seem to work very well.

output.wav (0.10MB)

 

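I asked GPT to rewrite "안녕? 난 잼미니야 만나서 반가워" ("Hi? I'm Jemmini, nice to meet you") in English spelling that would be easier for the TTS to pronounce. However,

'Annyoung? Nahn' gets dropped entirely, and the rest comes out sounding roughly like 'Jemmini-ya maennaseo ppanggawo'.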

D:\study\llm\llama-b9093-bin-win-cuda-12.4-x64>llama-tts -m ..\OuteTTS-0.3-500M-Q8_0.gguf  -mv ..\WavTokenizer-Large-75-F16.gguf -p "Annyoung? Nahn Jemmini-ya. Mannaseo bangawo."
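To try several romanized spellings in a row, the command above can be scripted. The following is a minimal sketch, assuming:

- Python 3 on the same Windows setup;
- `llama-tts` is on PATH or in the working directory;
- `llama-tts` always writes `output.wav` into the working directory, as the log line `audio written to file 'output.wav'` suggests.

It only uses the `-m`, `-mv` and `-p` flags from the post. `build_tts_command` and `synthesize` are hypothetical helper names.

```python
import os
import subprocess
from pathlib import Path

# Assumed layout, copied from the post's command line (Windows, run from the llama.cpp bin folder).
LLAMA_TTS = "llama-tts"
TTS_MODEL = r"..\OuteTTS-0.3-500M-Q8_0.gguf"
VOCODER = r"..\WavTokenizer-Large-75-F16.gguf"


def build_tts_command(prompt, model=TTS_MODEL, vocoder=VOCODER, exe=LLAMA_TTS):
    """Argument list for one llama-tts run (only the -m, -mv and -p flags used in the post)."""
    return [exe, "-m", model, "-mv", vocoder, "-p", prompt]


def synthesize(prompt, out_name):
    """Run llama-tts once, then rename its fixed output.wav so later runs don't overwrite it.

    Assumes llama-tts writes output.wav into the current working directory.
    """
    subprocess.run(build_tts_command(prompt), check=True)
    os.replace("output.wav", out_name)  # overwrites out_name if it already exists
    return Path(out_name)
```

To use it with the prompt from the post, call `synthesize("Annyoung? Nahn Jemmini-ya. Mannaseo bangawo.", "hello_ko.wav")`. Renaming the file after each run keeps the next call from overwriting the previous result.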

 

hello_ko.wav (0.15MB)

 

 

+

Turns out Hugging Face even has text-to-speech as a dedicated model type.

[Link : https://huggingface.co/models?pipeline_tag=text-to-speech]

Posted by 구차니