Stable Diffusion’s generative art can now be animated, developer Stability AI announced. The company has released a new product called Stable Video Diffusion into a research preview, allowing users to create video from a single image. “This state-of-the-art generative AI video model represents a significant step in our journey toward creating models for everyone of every type,” the company wrote.
The new tool has been released as two image-to-video models, which generate clips 14 and 25 frames long, respectively, at frame rates between 3 and 30 frames per second and at 576 × 1024 resolution. With fine-tuning on multi-view datasets, it can also perform multi-view synthesis from a single frame. “At the time of release in their foundational form, through external evaluation, we have found these models surpass the leading closed models in user preference studies,” the company said, comparing the models to the text-to-video platforms Runway and Pika Labs.
Stable Video Diffusion is available only for research purposes at this point, not for real-world or commercial applications. Potential users can join a waitlist for access to an “upcoming web experience featuring a text-to-video interface,” Stability AI wrote. That experience will showcase potential applications in sectors such as advertising, education and entertainment.
The samples shown in the video above appear to be of relatively high quality, on par with rival generative systems. However, the model has some limitations, the company wrote: it generates only short videos (4 seconds or less), lacks perfect photorealism, can’t produce camera motion other than slow pans, can’t be controlled through text, can’t render legible text and may not generate people and faces properly.
The tool was trained on a dataset of millions of videos and then fine-tuned on a smaller set. Stability AI said only that it used video that was publicly available for research purposes. The dataset’s origin matters, given that Stability AI was recently sued by Getty Images for scraping its image archives.
Video is a key goal for generative AI because of its potential to simplify content creation. However, it’s also the area with the most potential for abuse, through deepfakes, copyright violations and more. And unlike OpenAI with ChatGPT, Stability has had less success commercializing Stable Diffusion and has burned through cash at a high rate, TechCrunch noted. Last week, Ed Newton-Rex, Stability AI’s vice president of audio, resigned over the use of copyrighted content to train generative AI models.