Kandinsky 5.0 Video in the text-to-video arena

Results for the Kandinsky 5.0 Video Lite and Pro models have appeared on the text-to-video arena. The Pro version is the top-ranked open model in the world. Overall, it still falls short of the state-of-the-art (SOTA) models from Google, OpenAI, Alibaba, and KlingAI. However, it is roughly on par with Luma Ray 3 and MiniMax Hailuo 2.3: the Elo gap to them is at most 3 points, while the 95% confidence interval is ±21 points. The Lite version (2B parameters) has outperformed the first version of Sora.

It is worth noting that a Russian generative model entering the international arena and competing with the other players at all is quite a rare and, I would say, unexpected event.

In terms of architecture, the Pro model is a fairly large (19B-parameter) DiT with cross-attention to the text. It uses a VAE based on the one from HunyuanVideo. It generates 24 fps video, 5 or 10 seconds long, at HD resolution (1280x768).
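Why is a 3-point gap "on par"? The post does not say exactly how the arena computes its ratings, so the sketch below assumes the standard Elo logistic formula on a 400-point, base-10 scale, a common convention for pairwise-comparison leaderboards. Under that assumption it converts a rating gap into an expected head-to-head score. The function names are illustrative only, and comparing the gap with the confidence-interval half-width is a rough heuristic, not a formal significance test.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (win probability, ties counted as half)
    under the standard Elo formula with a 400-point scale (assumed convention)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def gap_is_within_ci(gap: float, ci_half_width: float) -> bool:
    """Hypothetical helper: rough heuristic that treats a rating gap no larger
    than the CI half-width as 'no clear separation'."""
    return abs(gap) <= ci_half_width


if __name__ == "__main__":
    gap = 3.0   # maximum Elo gap quoted in the post
    ci = 21.0   # half-width of the 95% confidence interval quoted in the post

    # Expected score of the higher-rated model for each rating difference.
    print(f"3-point gap  -> expected score {elo_expected_score(gap, 0.0):.3f}")
    print(f"21-point gap -> expected score {elo_expected_score(ci, 0.0):.3f}")
    print("gap within CI:", gap_is_within_ci(gap, ci))
```

Under this assumption, a 3-point lead corresponds to winning about 50.4% of pairwise votes, and even a gap as wide as the ±21-point interval only moves that to about 53%. The practical difference between these models is therefore close to a coin flip.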
