ollama external access (2026.04.21)
On Linux I just installed it and didn't configure anything in particular. I'd assumed the default would accept connections from any IP, but netstat shows it bound only to 127.0.0.1, so out of the box it's loopback-only and external access needs OLLAMA_HOST changed:
```
$ netstat -tnlp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:11434         0.0.0.0:*               LISTEN      -
```
On Windows the GUI client apparently exposes a setting for this.. though I never actually got around to checking the port there.
[link : http://practical.kr/?p=809]
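As a quick aside, here is a minimal sketch of checking reachability from another machine once OLLAMA_HOST=0.0.0.0 is set on the server side; the LAN address is hypothetical, and /api/tags is ollama's standard model-list endpoint:

```python
# Minimal sketch: verify an ollama server is reachable over the LAN.
# Assumes the server was started with OLLAMA_HOST=0.0.0.0 (by default it
# binds 127.0.0.1:11434 only, as the netstat output above shows).
import json
import urllib.request

HOST = "192.168.0.10"  # hypothetical LAN address of the ollama box

with urllib.request.urlopen(f"http://{HOST}:11434/api/tags", timeout=5) as r:
    tags = json.load(r)

for m in tags.get("models", []):
    print(m["name"])
```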
llm tokenizer - llama 3.2, exaone (2026.04.20)
Hmm.. looks like access to tokenizer.json was granted about 20 minutes after I requested it. (By then I'd already gone "whatever, I'm going to sleep" and left...)
Fascinating that llama can handle Korean at all when it has essentially no Korean tokens..?
Looking at the regex, there's a ? in it, and anything that doesn't match a longer alternative seems to just get split one character at a time.. and the ByteLevel step after it maps everything down to byte-level symbols, so nothing is ever unknown (see the sketch below the listing).
| { "version": "1.0", "truncation": null, "padding": null, "added_tokens": [ { "id": 128000, "content": "<|begin_of_text|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false, "special": true }, { "id": 128001, "content": "<|end_of_text|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false, "special": true }, ], "normalizer": null, "pre_tokenizer": { "type": "Sequence", "pretokenizers": [ { "type": "Split", "pattern": { "Regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" }, "behavior": "Isolated", "invert": false }, { "type": "ByteLevel", "add_prefix_space": false, "trim_offsets": true, "use_regex": false } ] }, // ... "model": { "type": "BPE", "dropout": null, "unk_token": null, "continuing_subword_prefix": null, "end_of_word_suffix": null, "fuse_unk": false, "byte_fallback": false, "ignore_merges": true, "vocab": { "!": 0, "\"": 1, "#": 2, "$": 3, "%": 4, // ... "ÙĨب": 127996, "ĠвÑĭÑģокой": 127997, "ãĥ¼ãĥ¼": 127998, "éͦ": 127999 }, "merges": [ "Ġ Ġ", "Ġ ĠĠĠ", // ... "ãĥ¼ ãĥ¼", "ãĥ¼ãĥ ¼", "éĶ ¦" ] } } |
[link : https://huggingface.co/meta-llama/Llama-3.2-1B/tree/main]
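To see what that Split pattern actually does before BPE runs, here is a small sketch using Python's `regex` package (which, unlike the stdlib `re`, supports the \p{L}/\p{N} classes); the sample sentence is made up:

```python
# How the pre-tokenizer's Split regex chunks mixed Korean/English text.
import regex  # pip install regex

PAT = (r"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}"
       r"| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+")

print(regex.findall(PAT, "자동차 is a car, isn't it?"))
# '자동차' survives as one chunk here (it matches \p{L}+); it's the ByteLevel
# step plus the BPE vocab (with almost no Korean merges) that decides how
# many tokens it finally costs.
```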
Huh.. I figured exaone, being made by LG, would at least have Korean tokens, but there are barely any?
So for Korean, does '자동차' end up costing three tokens, one each for 자/동/차? (checked in the sketch below)
```
$ grep -P '\p{Hangul}' exa_tokenizer.json
"content": "리앙쿠르",
"content": "훈민정음",
"content": "애국가",
"리앙쿠르": 94,
"훈민정음": 99,
"애국가": 100,
```
[link : https://huggingface.co/LGAI-EXAONE/EXAONE-4.5-33B/tree/main]
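A quick way to answer the three-tokens question directly (a sketch; both repos are gated, so this assumes access has already been granted):

```python
# Count what "자동차" actually costs under each tokenizer.
from transformers import AutoTokenizer

for repo in ("meta-llama/Llama-3.2-1B", "LGAI-EXAONE/EXAONE-4.5-33B"):
    tok = AutoTokenizer.from_pretrained(repo)
    ids = tok("자동차", add_special_tokens=False)["input_ids"]
    print(repo, tok.convert_ids_to_tokens(ids), len(ids), "tokens")
```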
Digging into the ollama model store (2026.04.19)
Blobs are stored under their hashes as file names; I was curious about the details, so I dug in.
```
gemma4 e2b

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:c6bc3775a3fa9935ce4a3ccd7abc59e936c3de9308d2cc090516012f43ed9c07",
    "size": 473
  },
  "layers": [
    {
      "mediaType": "application/vnd.ollama.image.model",
      "digest": "sha256:4e30e2665218745ef463f722c0bf86be0cab6ee676320f1cfadf91e989107448",
      "size": 7162394016
    },
    {
      "mediaType": "application/vnd.ollama.image.license",
      "digest": "sha256:7339fa418c9ad3e8e12e74ad0fd26a9cc4be8703f9c110728a992b193be85cb2",
      "size": 11355
    },
    {
      "mediaType": "application/vnd.ollama.image.params",
      "digest": "sha256:56380ca2ab89f1f68c283f4d50863c0bcab52ae3f1b9a88e4ab5617b176f71a3",
      "size": 42
    }
  ]
}

sha256:c6bc3775a3fa9935ce4a3ccd7abc59e936c3de9308d2cc090516012f43ed9c07 (config layer):
{
  "model_format": "gguf",
  "model_family": "gemma4",
  "model_families": [ "gemma4" ],
  "model_type": "5.1B",
  "file_type": "Q4_K_M",
  "renderer": "gemma4",
  "parser": "gemma4",
  "requires": "0.20.0",
  "architecture": "amd64",
  "os": "linux",
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:4e30e2665218745ef463f722c0bf86be0cab6ee676320f1cfadf91e989107448",
      "sha256:7339fa418c9ad3e8e12e74ad0fd26a9cc4be8703f9c110728a992b193be85cb2",
      "sha256:56380ca2ab89f1f68c283f4d50863c0bcab52ae3f1b9a88e4ab5617b176f71a3"
    ]
  }
}

sha256:4e30e2665218745ef463f722c0bf86be0cab6ee676320f1cfadf91e989107448 (model layer, GGUF binary):
GGUF ? 7 gemma4.attention.head_count

sha256:7339fa418c9ad3e8e12e74ad0fd26a9cc4be8703f9c110728a992b193be85cb2 (license layer):
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

sha256:56380ca2ab89f1f68c283f4d50863c0bcab52ae3f1b9a88e4ab5617b176f71a3 (params layer):
{ "temperature": 1, "top_k": 64, "top_p": 0.95 }
```
So the output-stage knobs are temperature, top_k, and top_p.
[link : https://wikidocs.net/333750]
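For reference, a sketch of resolving a tag to its blob files by hand, following the manifest layout above (the default store path and the registry/library path segments are stock ollama conventions; the tag is the one shown above):

```python
# Walk an ollama manifest and print where each layer's blob lives.
import json
from pathlib import Path

store = Path.home() / ".ollama" / "models"
manifest = json.loads(
    (store / "manifests" / "registry.ollama.ai" / "library" / "gemma4" / "e2b").read_text()
)

for layer in manifest["layers"]:
    # blobs are stored as "sha256-<hex>", i.e. the digest with ':' swapped for '-'
    blob = store / "blobs" / layer["digest"].replace(":", "-")
    print(f'{layer["mediaType"]:45} {layer["size"]:>12}  {blob.name}')
```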
On Hugging Face there was a tokenizer.json, but that seems to be the older(?) layout,
while newer GGUF files apparently embed the whole tokenizer.. how do I extract it?
[link : https://www.minzkn.com/vibecoding/pages/gguf-format.html]
[link : https://huggingface.co/docs/transformers/ko/gguf]
[link : https://bitwise-life.tistory.com/5] << shows the token list
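One way to get at it: the `gguf` Python package that ships with llama.cpp can read the embedded metadata. A sketch, assuming the usual `tokenizer.ggml.tokens` key and gguf-py's reader layout (string-array fields keep each string as a separate part, indexed by `field.data`):

```python
# Dump the tokenizer vocab embedded in a GGUF file.
from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("model.gguf")  # hypothetical path to an extracted blob

for name in reader.fields:         # every metadata key in the file
    print(name)

tokens = reader.fields["tokenizer.ggml.tokens"]
for idx in tokens.data[:20]:       # first 20 vocab entries
    print(tokens.parts[idx].tobytes().decode("utf-8", errors="replace"))
```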
llm tokenizer - phi3 (2026.04.19)
Downloading llama or gemma was somehow confusing, so I grabbed MS's more approachable(?) phi3 to analyze instead!
(gemma and llama require requesting repository access.. they show up as gated models; who knows when that gets approved.)
[link : https://huggingface.co/docs/transformers/model_doc/phi3]
```
~/.cache/huggingface/hub/models--microsoft--Phi-3-mini-4k-instruct/snapshots/f39ac1d28e925b323eae81227eaba4464caced4e$ ls -al
합계 12
drwxrwxr-x 2 minimonk minimonk 4096  4월 19 21:58 .
drwxrwxr-x 3 minimonk minimonk 4096  4월 19 21:58 ..
lrwxrwxrwx 1 minimonk minimonk   52  4월 19 21:58 added_tokens.json -> ../../blobs/178968dec606c790aa335e9142f6afec37288470
lrwxrwxrwx 1 minimonk minimonk   52  4월 19 21:58 config.json -> ../../blobs/b9b031fadda61a035b2e8ceb4362cbf604002b21
lrwxrwxrwx 1 minimonk minimonk   52  4월 19 21:58 special_tokens_map.json -> ../../blobs/c6a944b4d49ce5d79030250ed6bdcbb1a65dfda1
lrwxrwxrwx 1 minimonk minimonk   52  4월 19 21:58 tokenizer.json -> ../../blobs/88ec145f4e7684c009bc6d55df24bb82c7d3c379
lrwxrwxrwx 1 minimonk minimonk   76  4월 19 21:58 tokenizer.model -> ../../blobs/9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
lrwxrwxrwx 1 minimonk minimonk   52  4월 19 21:58 tokenizer_config.json -> ../../blobs/67aa82cddb4d66391ddf31ff99f059239bd2d1e7
```
Opening tokenizer.json, the tokens come out like the listing below..
Ugh.. if this is the pattern(?), Korean ends up with one token per syllable, which would be brutal?
With GPT's help I learned you can grep by Unicode script class like this, whoa.
```
$ grep -P '\p{Hangul}' tokenizer.json
"이": 30393, "의": 30708, "다": 30709, "스": 30784, "사": 30791, "지": 30811, "리": 30826, "기": 30827, "정": 30852, "아": 30860,
"한": 30877, "시": 30889, "대": 30890, "가": 30903, "로": 30906, "인": 30918, "하": 30944, "수": 30970, "주": 30981, "동": 31000,
"자": 31013, "에": 31054, "니": 31063, "는": 31081, "서": 31093, "김": 31102, "성": 31126, "어": 31129, "도": 31136, "고": 31137,
"일": 31153, "상": 31158, "전": 31170, "트": 31177, "소": 31189, "라": 31197, "원": 31198, "보": 31199, "나": 31207, "화": 31225,
"구": 31231, "신": 31262, "부": 31279, "연": 31285, "을": 31286, "영": 31288, "국": 31293, "장": 31299, "제": 31306, "우": 31327,
"공": 31334, "선": 31345, "오": 31346, "은": 31354, "미": 31362, "경": 31378, "문": 31406, "조": 31408, "마": 31417, "해": 31435,
"여": 31457, "산": 31458, "비": 31487, "드": 31493, "를": 31517, "요": 31527, "유": 31533, "진": 31536, "천": 31563, "년": 31571,
"세": 31578, "민": 31582, "호": 31603, "그": 31607, "현": 31680, "군": 31699, "무": 31716, "위": 31724, "안": 31734, "박": 31736,
"용": 31737, "단": 31746, "면": 31747, "남": 31754, "강": 31774, "씨": 31781, "개": 31789, "들": 31804, "차": 31817, "학": 31822,
"만": 31826, "터": 31856, "식": 31895, "과": 31906, "타": 31925, "종": 31930, "내": 31940, "중": 31941, "방": 31945, "월": 31950,
"회": 31953, "모": 31962, "바": 31963, "음": 31966, "교": 31972, "재": 31973, "명": 31976, "합": 31980, "역": 31987, "백": 31989,
"왕": 31996,
```
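The same check from Python, counting how much of the vocab is Hangul (a sketch; the tokenizer.json path is wherever the HF cache put it):

```python
# Count Hangul entries in a tokenizer.json vocab, like the grep above.
import json
import unicodedata

with open("tokenizer.json", encoding="utf-8") as f:
    vocab = json.load(f)["model"]["vocab"]

hangul = [t for t in vocab
          if any("HANGUL" in unicodedata.name(c, "") for c in t)]
print(f"{len(hangul)} of {len(vocab)} tokens contain Hangul")
```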
llm tokenizer (2026.04.17)
Out of idle curiosity, I wanted to see the per-word(?) token indices; isn't there a way to look at them? So I did some digging.
[link : https://makenow90.tistory.com/59]
[link : https://medium.com/the-research-nest/explained-tokens-and-embeddings-in-llms-69a16ba5db33]
Running AutoTokenizer.from_pretrained(), I didn't expect it to just download things on the spot -ㅁ-
```
$ python3
Python 3.10.12 (main, Mar  3 2026, 11:56:32) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import AutoTokenizer
Disabling PyTorch because PyTorch >= 2.4 is required but found 2.1.2
PyTorch was not found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
>>>
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
config.json: 100%|██████████████████████████████| 665/665 [00:00<00:00, 819kB/s]
tokenizer_config.json: 100%|█████████████████| 26.0/26.0 [00:00<00:00, 37.5kB/s]
vocab.json: 1.04MB [00:00, 2.70MB/s]
merges.txt: 456kB [00:00, 1.15MB/s]
tokenizer.json: 1.36MB [00:00, 3.23MB/s]
>>>
>>> tokenizer
GPT2Tokenizer(name_or_path='gpt2', vocab_size=50257, model_max_length=1024, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>'}, added_tokens_decoder={
        50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
})
>>> text="안녕? hello?"
>>> tokens=tokenizer(text)
>>> tokens
{'input_ids': [168, 243, 230, 167, 227, 243, 30, 23748, 30], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}
```
[link : https://data-scient2st.tistory.com/224]
[link : https://huggingface.co/docs/transformers/model_doc/gpt2]
```
$ find ./ -name tokenizer.json
./.cache/huggingface/hub/models--gpt2/snapshots/607a30d783dfa663caf39e06633721c8d4cfcd7e/tokenizer.json
```
```
$ tree ~/.cache/huggingface/hub/models--gpt2/snapshots/607a30d783dfa663caf39e06633721c8d4cfcd7e/
/home/minimonk/.cache/huggingface/hub/models--gpt2/snapshots/607a30d783dfa663caf39e06633721c8d4cfcd7e/
├── config.json -> ../../blobs/10c66461e4c109db5a2196bff4bb59be30396ed8
├── merges.txt -> ../../blobs/226b0752cac7789c48f0cb3ec53eda48b7be36cc
├── tokenizer.json -> ../../blobs/4b988bccc9dc5adacd403c00b4704976196548f8
├── tokenizer_config.json -> ../../blobs/be4d21d94f3b4687e5a54d84bf6ab46ed0f8defd
└── vocab.json -> ../../blobs/1f1d9aaca301414e7f6c9396df506798ff4eb9a6
```
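Following up on the REPL session above, mapping the ids back makes it obvious these are byte-level pieces rather than characters:

```python
# Why "안녕" costs six tokens under GPT-2: each UTF-8 byte-level piece shows up.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
ids = [168, 243, 230, 167, 227, 243, 30, 23748, 30]
print(tok.convert_ids_to_tokens(ids))  # byte-level surrogate strings, 3 per syllable
print(tok.decode(ids))                 # -> 안녕? hello?
```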
+
2026.04.18
Windows

(run through a beautifier; partial excerpts below)
Coverage seems to be Latin-script languages, and judging by how text gets cut up, tokens aren't word-level; even words get chopped into fragments?
tokenizer.json:
```
{
  "version": "1.0",
  "truncation": null,
  "padding": null,
  "added_tokens": [
    { "id": 50256, "special": true, "content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true }
  ],
  "normalizer": null,
  "pre_tokenizer": { "type": "ByteLevel", "add_prefix_space": false, "trim_offsets": true },
  "post_processor": { "type": "ByteLevel", "add_prefix_space": true, "trim_offsets": false },
  "decoder": { "type": "ByteLevel", "add_prefix_space": true, "trim_offsets": true },
  "model": {
    "dropout": null,
    "unk_token": null,
    "continuing_subword_prefix": "",
    "end_of_word_suffix": "",
    "fuse_unk": false,
    "vocab": {
      "0": 15, "1": 16, "2": 17, "3": 18, "4": 19, "5": 20, "6": 21, "7": 22, "8": 23, "9": 24,
      "!": 0, "\"": 1, "#": 2, "$": 3, "%": 4, "&": 5, "'": 6, "(": 7, ")": 8, "*": 9,
      "+": 10, ",": 11, "-": 12, ".": 13, "/": 14, ":": 25, ";": 26, "<": 27, "=": 28, ">": 29,
      "?": 30, "@": 31, "A": 32, "B": 33, "C": 34, "D": 35, "E": 36, "F": 37, "G": 38, "H": 39,
      "I": 40, "J": 41, "K": 42, "L": 43, "M": 44, "N": 45, "O": 46, "P": 47, "Q": 48, "R": 49,
      "S": 50, "T": 51, "U": 52, "V": 53, "W": 54, "X": 55, "Y": 56, "Z": 57, "[": 58, "\\": 59,
      "]": 60, "^": 61, "_": 62, "`": 63, "a": 64, "b": 65, "c": 66, "d": 67, "e": 68, "f": 69,
      "g": 70, "h": 71, "i": 72, "j": 73, "k": 74, "l": 75, "m": 76, "n": 77, "o": 78, "p": 79,
      "q": 80, "r": 81, "s": 82, "t": 83, "u": 84, "v": 85, "w": 86, "x": 87, "y": 88, "z": 89,
      "{": 90, "|": 91, "}": 92, "~": 93, "¡": 94, "¢": 95, "£": 96, "¤": 97, "¥": 98, "¦": 99,
      "he": 258, "in": 259, "re": 260, "on": 261, "Ġthe": 262, "er": 263, "Ġs": 264, "at": 265, "Ġw": 266, "Ġo": 267,
      "en": 268, "Ġc": 269, "it": 270, "is": 271, "an": 272, "or": 273, "es": 274, "Ġb": 275, "ed": 276, "Ġf": 277,
      "ing": 278, "Ġp": 279, "ou": 280, "Ġan": 281, "al": 282, "ar": 283, "Ġto": 284, "Ġm": 285, "Ġof": 286, "Ġin": 287,
      "Ġd": 288, "Ġh": 289, "Ġand": 290, "ic": 291, "as": 292, "le": 293, "Ġth": 294, "ion": 295, "om": 296, "ll": 297,
      "ent": 298, "Ġn": 299, "Ġl": 300, "st": 301, "Ġre": 302, "ve": 303, "Ġe": 304, "ro": 305, "ly": 306, "Ġbe": 307,
      "Ġg": 308, "ĠT": 309, "ct": 310, "ĠS": 311, "id": 312, "ot": 313, "ĠI": 314, "ut": 315, "et": 316, "ĠA": 317,
      "Ġis": 318, "Ġon": 319, "im": 320, "am": 321, "ow": 322, "ay": 323, "ad": 324, "se": 325, "Ġthat": 326, "ĠC": 327,
      "ig": 328, "Ġfor": 329, "ac": 330, "Ġy": 331, "ver": 332, "ur": 333, "Ġu": 334, "ld": 335, "Ġst": 336, "ĠM": 337,
      "'s": 338, "Ġhe": 339, "Ġit": 340, "ation": 341, "ith": 342, "ir": 343, "ce": 344, "Ġyou": 345, "il": 346, "ĠB": 347,
      "Ġwh": 348, "ol": 349, "ĠP": 350, "Ġwith": 351, "Ġ1": 352, "ter": 353, "ch": 354, "Ġas": 355, "Ġwe": 356, "Ġ(": 357,
      "nd": 358, "ill": 359, "ĠD": 360, "if": 361, "Ġ2": 362, "ag": 363, "ers": 364, "ke": 365, "Ġ\"": 366, "ĠH": 367,
      "em": 368, "Ġcon": 369, "ĠW": 370, "ĠR": 371, "her": 372, "Ġwas": 373, "Ġr": 374, "od": 375, "ĠF": 376, "ul": 377,
      "ate": 378, "Ġat": 379, "ri": 380, "pp": 381, "ore": 382, "ĠThe": 383, "Ġse": 384, "us": 385, "Ġpro": 386, "Ġha": 387,
      "um": 388, "Ġare": 389, "Ġde": 390, "ain": 391, "and": 392,
```

vocab.json (the same mapping as model.vocab above, as a standalone file):
```
{
  "0": 15, "1": 16, "2": 17, "3": 18, "4": 19, "5": 20, "6": 21, "7": 22, "8": 23, "9": 24,
  "!": 0, "\"": 1, "#": 2, "$": 3, "%": 4, "&": 5, "'": 6, "(": 7, ")": 8, "*": 9,
  "+": 10, ",": 11, "-": 12, ".": 13, "/": 14, ":": 25, ";": 26, "<": 27, "=": 28, ">": 29,
  "?": 30, "@": 31, "A": 32, "B": 33, "C": 34, "D": 35, "E": 36, "F": 37, "G": 38, "H": 39,
  "I": 40, "J": 41, "K": 42, "L": 43, "M": 44, "N": 45, "O": 46, "P": 47, "Q": 48, "R": 49,
  "S": 50, "T": 51, "U": 52, "V": 53, "W": 54, "X": 55, "Y": 56, "Z": 57, "[": 58, "\\": 59,
  "]": 60, "^": 61, "_": 62, "`": 63, "a": 64, "b": 65, "c": 66, "d": 67, "e": 68, "f": 69,
  "g": 70, "h": 71, "i": 72, "j": 73, "k": 74, "l": 75, "m": 76, "n": 77, "o": 78, "p": 79,
  "q": 80, "r": 81, "s": 82, "t": 83, "u": 84, "v": 85, "w": 86, "x": 87, "y": 88, "z": 89,
  "{": 90, "|": 91, "}": 92, "~": 93, "¡": 94, "¢": 95, "£": 96, "¤": 97, "¥": 98, "¦": 99,
  "he": 258, "in": 259, "re": 260, "on": 261, "Ġthe": 262, "er": 263, "Ġs": 264, "at": 265, "Ġw": 266, "Ġo": 267,
  "en": 268, "Ġc": 269, "it": 270, "is": 271, "an": 272, "or": 273, "es": 274, "Ġb": 275, "ed": 276, "Ġf": 277,
  "ing": 278, "Ġp": 279, "ou": 280, "Ġan": 281, "al": 282, "ar": 283, "Ġto": 284, "Ġm": 285, "Ġof": 286, "Ġin": 287,
  "Ġd": 288, "Ġh": 289, "Ġand": 290, "ic": 291, "as": 292, "le": 293, "Ġth": 294, "ion": 295, "om": 296, "ll": 297,
  "ent": 298, "Ġn": 299, "Ġl": 300, "st": 301, "Ġre": 302, "ve": 303, "Ġe": 304, "ro": 305, "ly": 306, "Ġbe": 307,
  "Ġg": 308, "ĠT": 309, "ct": 310, "ĠS": 311, "id": 312, "ot": 313, "ĠI": 314, "ut": 315, "et": 316, "ĠA": 317,
  "Ġis": 318, "Ġon": 319, "im": 320, "am": 321, "ow": 322, "ay": 323, "ad": 324, "se": 325, "Ġthat": 326, "ĠC": 327,
  "ig": 328, "Ġfor": 329, "ac": 330, "Ġy": 331, "ver": 332, "ur": 333, "Ġu": 334, "ld": 335, "Ġst": 336, "ĠM": 337,
  "'s": 338, "Ġhe": 339, "Ġit": 340, "ation": 341, "ith": 342, "ir": 343, "ce": 344, "Ġyou": 345, "il": 346, "ĠB": 347,
  "Ġwh": 348, "ol": 349, "ĠP": 350, "Ġwith": 351, "Ġ1": 352, "ter": 353, "ch": 354, "Ġas": 355, "Ġwe": 356, "Ġ(": 357,
  "nd": 358, "ill": 359, "ĠD": 360, "if": 361, "Ġ2": 362, "ag": 363, "ers": 364, "ke": 365, "Ġ\"": 366, "ĠH": 367,
  "em": 368, "Ġcon": 369, "ĠW": 370, "ĠR": 371, "her": 372, "Ġwas": 373, "Ġr": 374, "od": 375, "ĠF": 376, "ul": 377,
  "ate": 378, "Ġat": 379, "ri": 380, "pp": 381, "ore": 382, "ĠThe": 383, "Ġse": 384, "us": 385, "Ġpro": 386, "Ġha": 387,
  "um": 388, "Ġare": 389, "Ġde": 390, "ain": 391, "and": 392,
```
+
```
>>> tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-E2B")
config.json: 4.91kB [00:00, 8.83MB/s]
C:\Users\minimonk\AppData\Local\Programs\Python\Python313\Lib\site-packages\huggingface_hub\file_download.py:138: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\minimonk\.cache\huggingface\hub\models--google--gemma-4-E2B. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████| 906/906 [00:00<00:00, 2.57MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 32.2M/32.2M [00:02<00:00, 13.3MB/s]
```
llama.cpp (2026.04.17)
Supposedly it performs better than ollama; I should look up how to use it.
[link : https://peekaboolabs.ai/blog/ollama-vs-llama-cpp-guide]
[link : https://news.hada.io/topic?id=28622]
[link : https://github.com/ggml-org/llama.cpp]
Not official, but pre-built Windows binaries do seem to exist.
[link : https://github.com/HPUhushicheng/llama.cpp_windows]
First, the Python library:
```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/7B/llama-model.gguf",
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # seed=1337,        # Uncomment to set a specific seed
    # n_ctx=2048,       # Uncomment to increase the context window
)

# Generate a completion; can also call create_completion
output = llm(
    "Q: Name the planets in the solar system? A: ",  # Prompt
    max_tokens=32,      # Generate up to 32 tokens; set to None to generate up to the end of the context window
    stop=["Q:", "\n"],  # Stop generating just before the model would generate a new question
    echo=True,          # Echo the prompt back in the output
)
print(output)
```
[link : https://pypi.org/project/llama-cpp-python/]
> Check that the OpenCL driver is installed, then build with LLAMA_CLBLAST=1 make.
> Since that's make, it'll be a Linux build; on Windows you'd use cmake.
[link : https://arca.live/b/alpaca/76969814]
Is it only token generation that gets accelerated, not the full computation?
> OpenCL Token Generation Acceleration
[link : https://github.com/ggml-org/llama.cpp/releases/tag/master-2e6cd4b]
> To get this running on the XTX I had to install the latest 5.5 version of the AMD linux drivers, which are released but not available from the normal AMD download page yet. You can get the deb for the installer here.
> I installed with amdgpu-install --usecase=opencl,rocm and installed CLBlast after apt install libclblast-dev. Confirm opencl is working with sudo clinfo (did not find the GPU device unless I run as root).
> Build llama.cpp (with merged pull) using LLAMA_CLBLAST=1 make.
[link : https://www.reddit.com/r/LocalLLaMA/comments/13m8li2/finally_got_a_model_running_on_my_xtx_using/]
lm studio (2026.04.17)
There's been a lot of talk about running openclaw through lm studio on MacBooks, so I looked into it.
Can it load and serve models the way ollama does?
```python
import lmstudio as lms

EXAMPLE_MESSAGES = (
    "My hovercraft is full of eels!",
    "I will not buy this record, it is scratched.",
)

model = lms.llm()
chat = lms.Chat("You are a helpful shopkeeper assisting a foreign traveller")
for message in EXAMPLE_MESSAGES:
    chat.add_user_message(message)
    print(f"Customer: {message}")
    response = model.respond(chat)
    chat.add_assistant_response(response)
    print(f"Shopkeeper: {response}")
```
[link : https://pypi.org/project/lmstudio/]
Where do those model names come from, and where do the downloads happen?
-> It seems to search Hugging Face by model name and pull the models down in GGUF format.
```typescript
const model = await client.llm.load("qwen2.5-7b-instruct", {
  config: {
    contextLength: 8192,
    gpu: {
      ratio: 0.5,
    },
  },
});
```
[link : https://lmstudio.ai/docs/typescript/llm-prediction/parameters]
[link : https://lmstudio.ai/docs/python]
[link : https://lmstudio.ai/]
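Presumably the Python SDK mirrors that TypeScript load call; a rough sketch, assuming the model key and the config field names carry over unchanged (unverified):

```python
# Load a model by key with explicit context/GPU settings (field names assumed
# to match the TypeScript example above).
import lmstudio as lms

model = lms.llm("qwen2.5-7b-instruct", config={
    "contextLength": 8192,
    "gpu": {"ratio": 0.5},
})
print(model.respond("Name the planets in the solar system."))
```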
Human greed knows no end - ollama multiple GPU support (2026.04.17)
According to the Google AI summary,
ollama can pool VRAM across cards to run models too large for any single one.
> Ollama supports multiple GPUs (both NVIDIA and AMD) by automatically splitting model layers across available VRAM, allowing users to run large models that exceed the capacity of a single card.
>
> (summary section headings: VRAM Aggregation and Usage / Multi-GPU Performance Considerations / How to Configure / VRAM Requirements by Model Size)
So dual-GPU setups come up in this context;
in any case, two 24GB cards give 48GB, reportedly enough to run a 70B model (rough math in the sketch below).
Should I buy another card.. swap the motherboard for one with SLI/CrossFire support, upgrade the PSU..?
> Can I use multiple GPUs with Ollama for larger models?
> Yes, Ollama supports multi-GPU configurations for NVIDIA and AMD cards. For NVIDIA, set CUDA_VISIBLE_DEVICES to comma-separated GPU IDs to distribute model layers across multiple GPUs. This enables running 70B models on dual 24GB GPUs (48GB total) that wouldn't fit on a single card. For AMD GPUs, use ROCR_VISIBLE_DEVICES with the same approach to leverage combined VRAM across multiple cards.
[link : https://localllm.in/blog/ollama-vram-requirements-for-local-llms]
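Back-of-the-envelope math on that 70B-on-48GB claim (assumes 4-bit quantization, i.e. roughly half a byte per weight):

```python
# Rough VRAM estimate for a Q4-quantized 70B model.
params = 70e9
bytes_per_param = 0.5                            # ~4 bits per weight at Q4
weights_gib = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gib:.0f} GiB")  # ~33 GiB, leaving headroom
                                                 # for KV cache on 2x 24GB cards
```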