Category · 48 models

Cinematic Short-Form Video Generation

Turn a prompt or storyboard into broadcast-grade clips of up to a minute.

What it is

Generative video models for ads, social media, pre-vis and music videos: text-to-video, image-to-video and, increasingly, sound-on output, with controllable camera moves and character consistency.

Real-world examples

· Produce a 15-second product ad from a script
· Animate a still photo with a controlled camera move
· Generate b-roll variants for a YouTube edit

What to look for

· Clip length and resolution
· Character / scene consistency
· Commercial-use license

48 models in this category

Sora

OpenAI

AIDB 92

Text-to-video model producing minute-long cinematic clips.

Video Generation

Video · Proprietary

Veo 3

Google DeepMind

AIDB 93

High-fidelity video generation with native synchronised audio.

Video Generation · Audio / Speech

Video + Audio · Proprietary

Runway Gen-4

Runway

AIDB 86

Pro video generation with consistent characters and worlds.

Video Generation

Video · Proprietary

Kling 2.0

Kuaishou

AIDB 86

Chinese text-to-video model with strong physical realism.

Video Generation

Video · Proprietary

Pika 2.0

Pika Labs

AIDB 81

Creative video generator with scene ingredients and edits.

Video Generation

Video · Proprietary

Wan 2.2

Alibaba

AIDB 89

Open text-to-video and image-to-video model.

Video Generation

Video · Open Weights

Hailuo 02

MiniMax

AIDB 84

Cinematic text-to-video generator.

Video Generation

Video · Proprietary

Seedance 1.0

ByteDance

AIDB 82

ByteDance Seed video generation model.

Video Generation

Video · Proprietary

Luma Ray 2

Luma AI

AIDB 82

Large video generative model with realistic motion.

Video Generation

Video · Proprietary

HeyGen Avatar IV

HeyGen

AIDB 86

AI avatar video generator for marketing and training.

Video Generation · Audio / Speech

Video · Proprietary

Synthesia

AIDB 80

Enterprise AI video platform with realistic avatars.

Video Generation

Video · Proprietary

HunyuanVideo

Tencent

AIDB 86

Open 13B text-to-video model.

Video Generation

Video · Open Weights

Mochi 1

Genmo

AIDB 85

Open-source video generation model.

Video Generation

Video · Open Weights

LTX Video

Lightricks

AIDB 88

Real-time open video generation model.

Video Generation

Video · Open Weights

Amazon Nova

AWS

AIDB 94

Amazon's foundation model family (text, image, video) on Bedrock.

Multimodal · Image Generation · Video Generation

Text + Image + Video · Proprietary

Wayve GAIA-2

Wayve

AIDB 85

Generative world model for end-to-end embodied driving.

Video Generation · Agents

World Model · Proprietary

Adobe Firefly

Adobe

AIDB 89

Commercially-safe generative-AI models for image, vector and video.

Image Generation · Video Generation

Image + Video · Proprietary

ManiGaussian

Tsinghua University

AIDB 84

Performing language-conditioned robotic manipulation tasks in unstructured environments is highly demanded for general intelligent robots.

Agents · Image Understanding · Multimodal

Vision + Action · Open Weights

Reka Core

Reka AI

AIDB 81

We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka.

Audio / Speech · Code · Image Generation

Audio · Proprietary

VILA1.5-13B

NVIDIA

AIDB 90

Visual language models (VLMs) rapidly progressed with the recent success of large language models.

Image Generation · Multimodal · Text Generation

Video · Open Weights

LLaVA-OV-72B

ByteDance

AIDB 83

We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series.

Image Generation · Multimodal · Text Generation

Video · Open Weights

Oryx 34B

Tsinghua University

AIDB 86

Visual data comes in various forms, ranging from small icons of just a few pixels to long videos spanning hours.

3D · Image Generation · Multimodal

3D · Open Weights

PixelDance

ByteDance

AIDB 84

PixelDance V1.4 is a video generation model developed by the ByteDance Research team, built on the DiT (diffusion transformer) architecture.

Image Generation · Video Generation

Video · Proprietary

Movie Gen Video

Amazon Nova Pro

Amazon

AIDB 93

A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks.

Code · Image Generation · Multimodal

Video · Proprietary

NVILA 15B

NVIDIA

AIDB 91

Visual language models (VLMs) have made significant advances in accuracy in recent years.

Image Generation · Multimodal · Text Generation

Video · Open Weights

Sora Turbo

OpenAI

AIDB 92

Our video generation model is rolling out at sora.com.

Image Generation · Video Generation

Video · Proprietary

Gemini 2.0 Pro

Google DeepMind

AIDB 94

Today, we’re releasing an experimental version of Gemini 2.0 Pro that responds to that feedback.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Apollo 7B

Meta AI

AIDB 91

Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood.

Multimodal · Text Generation · Video Generation

Video · Proprietary

Veo 2

Google DeepMind

AIDB 93

Google DeepMind's video and vision model, tracked by Epoch and focused on video generation.

Image Generation · Video Generation

Video · Proprietary

ERNIE-4.5-VL-424B-A47B (文心大模型4.5)

Baidu

AIDB 85

In this report, we introduce ERNIE 4.5, a new family of large-scale multimodal models comprising 10 distinct variants.

Code · Image Generation · Multimodal

Video · Open Weights

Diffusion Renderer

NVIDIA

AIDB 88

Understanding and modeling lighting effects are fundamental tasks in computer vision and graphics.

Video Generation

Video · Open Weights

Gemini 2.5 Pro (Mar 2025)

Google DeepMind

AIDB 92

Gemini 2.5 Pro Experimental is our most advanced model for complex tasks.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Gemini 2.5 Pro (May 2025)

Google DeepMind

AIDB 95

Gemini 2.5 Pro Experimental is our most advanced model for complex tasks.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Seed1.5-VL

ByteDance

AIDB 86

We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning.

Image Generation · Multimodal · Text Generation

Video · Proprietary

Gemini 2.5 Pro (Jun 2025)

Google DeepMind

AIDB 95

Gemini 2.5 Pro Experimental is our most advanced model for complex tasks.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Gemini 2.5 Deep Think

Google DeepMind

AIDB 94

To advance Gemini’s capabilities towards solving hard reasoning problems, we developed a novel reasoning approach, called Deep Think, that naturally blends in parallel thinking techniques during response generation.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Qwen3-Omni-30B-A3B

Alibaba

AIDB 87

We present Qwen3-Omni, a single multimodal model that, for the first time, maintains state-of-the-art performance across text, image, audio, and video without any degradation relative to single-modal counterparts.

Audio / Speech · Image Generation · Multimodal

Audio · Open Weights

Sora 2.0

OpenAI

AIDB 91

Our latest video generation model is more physically accurate, realistic, and more controllable than prior systems.

Video Generation

Video · Proprietary

Veo 3.1

Google DeepMind

AIDB 94

We’re also introducing Veo 3.1, which brings richer audio, more narrative control, and enhanced realism that captures true-to-life textures.

Image Generation · Video Generation

Video · Proprietary

Seedance 2.0

ByteDance

AIDB 82

ByteDance's image, video and audio generation model, tracked by Epoch and focused on video generation.

Audio / Speech · Image Generation · Video Generation

Audio · Proprietary

Mirage

Decart

AIDB 83

Real-time generative world model that re-skins live video streams with text prompts.

Video Generation · Image Generation

Video · Proprietary

Kling 2.5

Kuaishou

AIDB 83

Kuaishou's flagship text-to-video model with strong motion coherence and 1080p output.

Video Generation

Video · Proprietary

Aleph 2

Runway

AIDB 90

Runway's closed-source in-context video editing model that modifies existing videos while preserving untouched regions.

Video Generation · Multimodal

Video · Proprietary

LongCat Video Avatar 1.5

Meituan

AIDB 88

Meituan LongCat's open-source audio-driven avatar video model for single- and multi-character human video generation.

Video Generation · Multimodal

Video + Audio · Open Weights

Gemini Omni Flash

Google DeepMind

AIDB 94

Google DeepMind's closed-source multimodal video creation and editing model that generates or edits video from text, image, video, and audio references.

Video Generation · Image Generation · Multimodal

Text + Image + Video + Audio · Proprietary
