
Artificial intelligence (AI) models have made huge strides in creating visual content, but video generation remains a complex and resource-intensive process. The most advanced models often require hundreds of steps to produce a quality video. Faced with this challenge, OpenAI researchers Cheng Lu and Yang Song have developed a revolutionary continuous-time consistency model (sCM) that can generate video roughly fifty times faster than existing models. Their work could pave the way for real-time generative AI applications, a breakthrough with promising implications for the future of media.

Diffusion models: the foundation of generative AI

To understand this innovation, it helps to briefly review what a diffusion model is, one of the most widely used types of model in generative AI. These models, sometimes referred to as score-based generative models, operate in three main stages: a forward process that gradually corrupts training data with noise, a reverse process in which the model learns to undo that corruption, and a sampling stage that applies the learned reverse process to fresh noise. By training on large amounts of data, the model learns to turn random noise, step by step, into new, coherent and realistic content.
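To make the cost of this iterative sampling concrete, here is a minimal, self-contained Python sketch. The `denoiser` function is a hypothetical stand-in for a trained neural network, and the step count and dimensions are illustrative, not taken from any specific model.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoiser(x, t):
    """Hypothetical stand-in for a trained diffusion network.

    A real model would predict the noise (or the clean signal) present
    in x at noise level t; here we simply nudge x toward a fixed target.
    """
    target = np.zeros_like(x)        # placeholder for "clean" data
    return x + 0.05 * (target - x)   # one small denoising step

# Start from pure Gaussian noise and refine it over many iterations.
x = rng.standard_normal(16)          # a 16-dimensional toy sample
num_steps = 500                      # samplers often need hundreds of steps
for step in range(num_steps):
    t = 1.0 - step / num_steps       # noise level decreasing from 1 to 0
    x = denoiser(x, t)               # each step is one network evaluation

print(np.abs(x).max())               # near 0: the sample has converged
```

Each loop iteration stands in for one full network evaluation, which is why hundreds of steps translate directly into seconds of latency on real hardware.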
However, in a typical diffusion model, the sampling process is slow because the generated content must be refined through hundreds of small denoising iterations, which demands significant computing power and is why these AI systems often feel sluggish. The process is even more challenging for video, where consistency from one frame to the next is key to smooth rendering.

A revolutionary model to reduce the number of steps to two

The major breakthrough achieved by the OpenAI researchers is reducing the sampling process to just two steps, a simplification that dramatically changes the speed of generation. While other models take several seconds or more to generate high-quality output, Lu and Song's sCM achieves comparable results in a fraction of a second. By collapsing the long denoising trajectory into two model evaluations, sCM preserves the quality of the generated content while drastically cutting complexity and the need for processing power (a toy sketch of this two-step sampling appears at the end of this article). To reach this speed, the sCM model scales to 1.5 billion parameters, allowing the system to analyse and create visual content with impressive accuracy. Furthermore, it can run on industry-standard hardware such as the A100 GPU, making it far more affordable than models that require specialised, expensive hardware.

Implications and potential applications of the OpenAI model

The OpenAI sCM model opens the door to real-time generative applications, an area where AI could transform entire industries, from entertainment to digital communications. In content creation, the model could let creators quickly generate personalised videos, opening up opportunities for marketing, education and social media. Imagine a world where content creators can produce high-quality videos in seconds, without advanced technical skills or expensive computing resources. What's more, the model is far less power-hungry than existing systems, a particularly valuable saving at a time when the energy consumption of AI applications is skyrocketing. The researchers also hope the model will ease the development of augmented reality (AR) and virtual reality (VR), where images and video could be generated in real time and integrated into virtual environments more smoothly and realistically.
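For contrast with the iterative diffusion loop shown earlier, here is a minimal Python sketch of two-step consistency-style sampling, the general technique behind sCM. The `consistency_fn` is a hypothetical stand-in for the trained network, and the noise levels are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def consistency_fn(x, sigma):
    """Hypothetical stand-in for a trained consistency model.

    A real sCM is trained so that a single evaluation maps a noisy sample
    at noise level sigma directly to an estimate of the clean data; this
    toy version shrinks the noise analytically instead.
    """
    target = np.zeros_like(x)                 # placeholder for "clean" data
    return target + (x - target) / (1.0 + sigma)

sigma_max, sigma_mid = 80.0, 0.5              # illustrative noise levels

# Step 1: one model evaluation jumps from pure noise to a clean estimate.
x = rng.standard_normal(16) * sigma_max
x0 = consistency_fn(x, sigma_max)

# Step 2: re-add a little noise, then map back to a refined estimate.
x_mid = x0 + rng.standard_normal(16) * sigma_mid
x0 = consistency_fn(x_mid, sigma_mid)

print(x0.round(3))                            # produced in just two evaluations
```

Where the diffusion loop above needed hundreds of network evaluations, this sampler needs exactly two, which is the source of the roughly fifty-fold speedup the researchers report.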