Trying wan2.2 + ComfyUI

구차니 2026. 5. 21. 10:52

Every so often(?) it crashes from running out of VRAM:

$ python3 main.py --listen 0.0.0.0
setup plugin alembic.autogenerate.schemas
setup plugin alembic.autogenerate.tables
setup plugin alembic.autogenerate.types
setup plugin alembic.autogenerate.constraints
setup plugin alembic.autogenerate.defaults
setup plugin alembic.autogenerate.comments
WARNING: You need pytorch with cu130 or higher to use optimized CUDA operations.
Found comfy_kitchen backend cuda: {'available': True, 'disabled': True, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_nvfp4']}
Found comfy_kitchen backend eager: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_mxfp8', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_mxfp8', 'scaled_mm_nvfp4']}
Found comfy_kitchen backend triton: {'available': True, 'disabled': True, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8']}
Checkpoint files will always be loaded safely.
Total VRAM 11165 MB, total RAM 31755 MB
pytorch version: 2.7.1+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce GTX 1080 Ti : cudaMallocAsync
Using async weight offloading with 2 streams
Enabled pinned memory 28579.0
Using pytorch attention
Unsupported Pytorch detected. DynamicVRAM support requires Pytorch version 2.8 or later. Falling back to legacy ModelPatcher. VRAM estimates may be unreliable especially on Windows
Python version: 3.10.12 (main, Mar  3 2026, 11:56:32) [GCC 11.4.0]
ComfyUI version: 0.21.1
comfy-aimdo version: 0.3.0
comfy-kitchen version: 0.2.8
comfyui-frontend-package version: 1.43.18
comfyui-workflow-templates version: 0.9.77
comfyui-embedded-docs version: 0.5.0
comfy-kitchen version: 0.2.8
comfy-aimdo version: 0.3.0
[Prompt Server] web root: /home/minimonk/.local/lib/python3.10/site-packages/comfyui_frontend_package/static
Asset seeder disabled

Import times for custom nodes:
   0.0 seconds: /mnt/Downloads/ComfyUI/custom_nodes/websocket_image_save.py

Context impl SQLiteImpl.
Will assume non-transactional DDL.
Starting server

To see the GUI go to: http://0.0.0.0:8188
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
Found quantization metadata version 1
Using MixedPrecisionOps for text encoder
Requested to load WanTEModel
loaded completely;  6419.48 MB loaded, full load: True
CLIP/text encoder model load device: cpu, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load WanVAE
0 models unloaded.
loaded partially; 0.00 MB usable, 0.00 MB loaded, 484.00 MB offloaded, 45.57 MB buffer reserved, lowvram patches: 0
Found quantization metadata version 1
Detected mixed precision quantization
Using mixed precision operations
Native ops:  , emulated ops: mxfp8, float8_e4m3fn, float8_e5m2, nvfp4
model weight dtype torch.float16, manual cast: torch.float32
model_type FLOW
Requested to load WAN21
0 models unloaded.
loaded partially; 0.00 MB usable, 0.00 MB loaded, 13636.09 MB offloaded, 885.22 MB buffer reserved, lowvram patches: 0
  0%|                                                    | 0/10 [00:33<?, ?it/s]
!!! Exception during processing !!! Allocation on device
Traceback (most recent call last):
  File "/mnt/Downloads/ComfyUI/execution.py", line 535, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
  File "/mnt/Downloads/ComfyUI/execution.py", line 335, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
  File "/mnt/Downloads/ComfyUI/execution.py", line 309, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/mnt/Downloads/ComfyUI/execution.py", line 297, in process_inputs
    result = f(**inputs)
  File "/mnt/Downloads/ComfyUI/nodes.py", line 1612, in sample
    return common_ksampler(model, noise_seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise, disable_noise=disable_noise, start_step=start_at_step, last_step=end_at_step, force_full_denoise=force_full_denoise)
  File "/mnt/Downloads/ComfyUI/nodes.py", line 1542, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "/mnt/Downloads/ComfyUI/comfy/sample.py", line 74, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 1180, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 1070, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 1052, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
  File "/mnt/Downloads/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 995, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 981, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "/mnt/Downloads/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 751, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "/home/minimonk/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/Downloads/ComfyUI/comfy/k_diffusion/sampling.py", line 205, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 400, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 954, in __call__
    return self.outer_predict_noise(*args, **kwargs)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 961, in outer_predict_noise
    ).execute(x, timestep, model_options, seed)
  File "/mnt/Downloads/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 964, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 380, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 205, in calc_cond_batch
    return _calc_cond_batch_outer(model, conds, x_in, timestep, model_options)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 213, in _calc_cond_batch_outer
    return executor.execute(model, conds, x_in, timestep, model_options)
  File "/mnt/Downloads/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "/mnt/Downloads/ComfyUI/comfy/samplers.py", line 325, in _calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "/mnt/Downloads/ComfyUI/comfy/model_base.py", line 182, in apply_model
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
  File "/mnt/Downloads/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "/mnt/Downloads/ComfyUI/comfy/model_base.py", line 226, in _apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds)
  File "/home/minimonk/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/minimonk/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/Downloads/ComfyUI/comfy/ldm/wan/model.py", line 644, in forward
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
  File "/mnt/Downloads/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "/mnt/Downloads/ComfyUI/comfy/ldm/wan/model.py", line 664, in _forward
    return self.forward_orig(x, timestep, context, clip_fea=clip_fea, freqs=freqs, transformer_options=transformer_options, **kwargs)[:, :, :t, :h, :w]
  File "/mnt/Downloads/ComfyUI/comfy/ldm/wan/model.py", line 597, in forward_orig
    x = block(x, e=e0, freqs=freqs, context=context, context_img_len=context_img_len, transformer_options=transformer_options)
  File "/home/minimonk/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/minimonk/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/Downloads/ComfyUI/comfy/ldm/wan/model.py", line 258, in forward
    y = self.ffn(torch.addcmul(repeat_e(e[3], x), self.norm2(x), 1 + repeat_e(e[4], x)))
  File "/home/minimonk/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/minimonk/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/minimonk/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 240, in forward
    input = module(input)
  File "/home/minimonk/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/minimonk/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/minimonk/.local/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 734, in forward
    return F.gelu(input, approximate=self.approximate)
torch.OutOfMemoryError: Allocation on device

Memory summary:
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |   5675 MiB |   7030 MiB |      0 B   |      0 B   |
|       from large pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|       from small pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Active memory         |   5675 MiB |   7030 MiB |      0 B   |      0 B   |
|       from large pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|       from small pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Requested memory      |      0 B   |      0 B   |      0 B   |      0 B   |
|       from large pool |      0 B   |      0 B   |      0 B   |      0 B   |
|       from small pool |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| GPU reserved memory   |  10784 MiB |  10784 MiB |      0 B   |      0 B   |
|       from large pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|       from small pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Non-releasable memory |      0 B   |      0 B   |      0 B   |      0 B   |
|       from large pool |      0 B   |      0 B   |      0 B   |      0 B   |
|       from small pool |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Allocations           |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize allocations  |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize GPU segments |       0    |       0    |       0    |       0    |
|===========================================================================|

Got an OOM, unloading all loaded models.
Prompt executed in 154.15 seconds

Supposedly passing either one of these two options is enough, but it still crashes with them.

python3 main.py --listen 0.0.0.0 --lowvram
python3 main.py --listen 0.0.0.0 --novram

Somewhere along the way I tangled up the packages, so I wiped them and reinstalled to match CUDA 11.8.

pip3 uninstall -y torch torchvision torchaudio xformers
pip3 install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install xformers==0.0.29.post2
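The startup log above reports `pytorch version: 2.7.1+cu118` and warns that DynamicVRAM needs PyTorch 2.8 or later. As a small sketch (the helper names `parse_torch_version` and `meets_minimum` are made up for illustration), this is one way to check a version string against that threshold. In a real session the string would come from `torch.__version__`; here it is hard-coded so the snippet runs without PyTorch installed.

```python
import re


def parse_torch_version(version: str) -> tuple[tuple[int, ...], str]:
    """Split a string like '2.7.1+cu118' into ((2, 7, 1), 'cu118')."""
    release_part, _, local_tag = version.strip().partition("+")
    match = re.match(r"\d+(?:\.\d+)*", release_part)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    release = tuple(int(part) for part in match.group(0).split("."))
    return release, local_tag


def meets_minimum(version: str, minimum: tuple[int, ...]) -> bool:
    """True if the numeric release part is at least `minimum`."""
    release, _ = parse_torch_version(version)
    return release >= minimum


# Version from the log, and the version pinned in the reinstall command.
for candidate in ("2.7.1+cu118", "2.6.0+cu118"):
    release, tag = parse_torch_version(candidate)
    print(candidate, release, tag, meets_minimum(candidate, (2, 8)))
```

Neither 2.7.1 nor 2.6.0 clears the 2.8 threshold the log mentions, so the "Falling back to legacy ModelPatcher" path would presumably still apply after the reinstall.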

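It still crashed when I tried again. I have no idea why what used to be 768x768 had become 800x800, and the length was set to 81,

so for now I changed it to 512x512 with length 29 and retried.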
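A rough sketch of why dropping the resolution and length matters so much. It assumes (this is not stated in the post) that the Wan VAE compresses 8x spatially and 4x temporally, and that the transformer groups the latent into 2x2 patches. The function `wan_token_count` and its defaults are hypothetical, so treat the numbers as an estimate rather than the model's actual bookkeeping.

```python
def wan_token_count(width: int, height: int, frames: int,
                    spatial_down: int = 8, temporal_down: int = 4,
                    patch: int = 2) -> int:
    """Estimate how many tokens the diffusion transformer sees.

    All three downsampling factors are assumptions, not values from the post.
    """
    latent_frames = (frames - 1) // temporal_down + 1
    tokens_per_frame = (width // spatial_down // patch) * (height // spatial_down // patch)
    return latent_frames * tokens_per_frame


before = wan_token_count(800, 800, 81)   # 52500
after = wan_token_count(512, 512, 29)    # 8192
print(before, after, round(before / after, 2))
```

Under these assumptions the retry works on roughly 6.4 times fewer tokens. The traceback ends inside the feed-forward GELU, whose activation size grows with the token count, which is consistent with the smaller setting fitting in memory.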

It seems to be doing something.. but the offloaded amount is huge.

got prompt
Requested to load WanVAE
loaded completely; 4417.69 MB usable, 484.06 MB loaded, full load: True
[MultiGPU Runtime] Using runtime device cuda:0 (comfy.sample.sample:ModelPatcher)
Requested to load WAN21
loaded partially; 7397.49 MB usable, 7211.06 MB loaded, 6425.03 MB offloaded, 175.06 MB buffer reserved, lowvram patches: 0
0%| | 0/10 [00:00<?, ?it/s]
20%|████████▌ | 2/10 [09:15<37:03, 277.94s/it]

It feels like well over an hour has gone by (it showed 37 minutes remaining, the liar!)
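The 37:03 on the progress line is only the time left in the current sampler: 8 remaining steps at 277.94 s/it. A minimal sketch of that arithmetic follows. The whole-workflow estimate assumes two KSampler passes of 10 steps each running at the same speed, which is an assumption for this run, and the helper names are made up.

```python
def format_hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS (fractions truncated)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def remaining_seconds(done: int, total: int, sec_per_it: float) -> float:
    """Time left in the current progress bar."""
    return (total - done) * sec_per_it


def workflow_seconds(steps_per_sampler: int, samplers: int, sec_per_it: float) -> float:
    """Sampling time across all sampler passes, assuming constant speed."""
    return steps_per_sampler * samplers * sec_per_it


print(format_hms(remaining_seconds(2, 10, 277.94)))   # 0:37:03
print(format_hms(workflow_seconds(10, 2, 277.94)))    # 1:32:38
```

The second figure excludes model loading and VAE decoding, so the real wall-clock time should come out somewhat higher.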

It has finally moved on to the second KSampler!!! Yay!!!

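Memory looked hopeless, so I went with high_noise rather than low_noise and lowered the resolution as well,

and after 1 hour 33 minutes 16 seconds it produced a turd of 29 frames / 16 fps, roughly 1.8 seconds long.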

loaded completely; 883.22 MB usable, 484.06 MB loaded, full load: True
Prompt executed in 01:33:16

What even is this lol

ComfyUI_00006_.webm

1.61MB

Why is it webp this time instead of webm?

Meanwhile, making 49 frames / 16 fps, about 3 seconds, took 2 hours 30 minutes..

It's a 2-hour-30-minute turd!

got prompt
Found quantization metadata version 1
Detected mixed precision quantization
Using mixed precision operations
Native ops: , emulated ops: float8_e4m3fn, mxfp8, nvfp4, float8_e5m2
model weight dtype torch.float16, manual cast: torch.float32
model_type FLOW
[MultiGPU Runtime] Using runtime device cuda:0 (comfy.sample.sample:ModelPatcher)
Requested to load WAN21
loaded partially; 5509.86 MB usable, 5037.30 MB loaded, 8598.79 MB offloaded, 472.56 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████| 10/10 [1:14:32<00:00, 447.24s/it]
[MultiGPU Runtime] Using runtime device cuda:0 (comfy.sample.sample:ModelPatcher)
Requested to load WAN21
loaded partially; 5495.86 MB usable, 5019.61 MB loaded, 8616.47 MB offloaded, 472.56 MB buffer reserved, lowvram patches: 0
0%| | 0/10 [00:00<?, ?it/s]
100%|████████████████████████████████████████| 10/10 [1:14:33<00:00, 447.32s/it]
Requested to load WanVAE
Unloaded partially: 2187.40 MB freed, 2832.21 MB remains loaded, 472.56 MB buffer reserved, lowvram patches: 0
loaded completely; 545.46 MB usable, 484.06 MB loaded, full load: True
Prompt executed in 02:30:07
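To put the two runs side by side, here is a small sketch that turns the frame counts, the 16 fps rate, and the `Prompt executed in` times from the logs into clip length and compute cost per frame. The helper names are made up for illustration.

```python
def clip_seconds(frames: int, fps: int) -> float:
    """Playback length of a clip in seconds."""
    return frames / fps


def hms_to_seconds(text: str) -> int:
    """Convert 'HH:MM:SS' into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (frames, wall-clock time) for the two runs in this post.
runs = [(29, "01:33:16"), (49, "02:30:07")]
for frames, elapsed in runs:
    total = hms_to_seconds(elapsed)
    print(f"{frames} frames: {clip_seconds(frames, 16):.2f} s of video, "
          f"{total / frames:.1f} s of compute per frame")
```

This works out to about 1.81 s and 3.06 s of video, at roughly 193 s and 184 s of compute per frame respectively.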

ComfyUI_00008_.webp

1.00MB
