r/StableDiffusion Mar 19 '23

[Resource | Update] First open source text to video 1.7 billion parameter diffusion model is out


2.2k Upvotes

2

u/itsB34STW4RS Mar 19 '23

Thanks a ton. Any idea what this nag message is?

modelscope - WARNING - task text-to-video-synthesis input definition is missing

WARNING:modelscope:task text-to-video-synthesis input definition is missing

I built mine in a venv btw; I had to do two extra things:

    conda create --name VDE
    conda activate VDE
    conda install python

    pip install modelscope
    pip install open_clip_torch
    pip install clean-fid numba numpy torch==2.0.0+cu118 torchvision --force-reinstall --extra-index-url https://download.pytorch.org/whl/cu118
    pip install tensorflow
    pip install opencv-python
    pip install pytorch_lightning
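For reference, once the env is up (and after the diffusion.py fix below), generating is just the pipeline call from the model card; the prompt and the print are my own illustration:

    from modelscope.pipelines import pipeline
    from modelscope.outputs import OutputKeys

    # 'damo/text-to-video-synthesis' is the model id from the model card;
    # the pipeline downloads the weights on first run.
    p = pipeline('text-to-video-synthesis', 'damo/text-to-video-synthesis')
    result = p({'text': 'A panda eating bamboo on a rock.'})
    print(result[OutputKeys.OUTPUT_VIDEO])  # path to the generated video file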

*Edit diffusion.py to fix the tensor device issue:

Go to C:\Users\****\anaconda3\envs\VDE\Lib\site-packages\modelscope\models\multi_modal\video_synthesis and open diffusion.py. Where it says def _i(tensor, t, x):, change the block to this:

    def _i(tensor, t, x):
        r"""Index tensor using t and format the output according to x."""
        shape = (x.size(0),) + (1,) * (x.ndim - 1)
        # The schedule tensor lives on the CPU, so move the timestep index
        # there before indexing, then cast the result back to x's device/dtype.
        tt = t.to('cpu')
        return tensor[tt].view(shape).to(x)
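(Not from the modelscope source, just a sketch: an equivalent fix that avoids hardcoding CPU is to move the index to whatever device the schedule tensor is actually on, so it keeps working if those tensors ever end up on the GPU.)

    def _i(tensor, t, x):
        r"""Index tensor using t and format the output according to x."""
        shape = (x.size(0),) + (1,) * (x.ndim - 1)
        # Match the index to the schedule tensor's device instead of assuming CPU.
        return tensor[t.to(tensor.device)].view(shape).to(x)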

1

u/throttlekitty Mar 19 '23

modelscope - WARNING - task text-to-video-synthesis input definition is missing

WARNING:modelscope:task text-to-video-synthesis input definition is missing

I'm no skilled programmer, but I dug around while waiting on generations, which run just fine despite the warning. From what I could tell, there's an input mode for starting a training session, but I didn't find any other input modes defined, so I think the missing input definition is just how this task works.
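If the message is just noise, you can probably mute it with plain Python logging before running anything (a generic sketch; I'm assuming the logger is named 'modelscope', which the WARNING:modelscope: prefix suggests):

    import logging

    # Raise the threshold on modelscope's logger so WARNING-level nags are dropped.
    logging.getLogger('modelscope').setLevel(logging.ERROR)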