Category · 48 models

Make Videos

Turn a sentence into a short video clip — for ads, social, and storytelling.

What it is

These models generate moving images from text or still photos. They're reshaping advertising, social content, and pre-visualization in film — work that used to cost thousands per clip.

Real-world examples

  • Produce a 10-second ad from a script
  • Animate a still product photo
  • Make a teaser trailer mockup
  • Generate b-roll for a YouTube video

What to look for

  • Length and resolution of clips
  • How realistic motion and physics look
  • Whether characters stay consistent

48 models in this category

Sora

OpenAI

AIDB 92

Text-to-video model producing minute-long cinematic clips.

Video Generation
Video · Proprietary

Veo 3

Google DeepMind

AIDB 93

High-fidelity video generation with native synchronised audio.

Video Generation · Audio / Speech
Video + Audio · Proprietary

Runway Gen-4

Runway

AIDB 86

Pro video generation with consistent characters and worlds.

Video Generation
Video · Proprietary

Kling 2.0

Kuaishou

AIDB 86

Chinese text-to-video model with strong physical realism.

Video Generation
Video · Proprietary

Pika 2.0

Pika Labs

AIDB 81

Creative video generator with scene ingredients and edits.

Video Generation
Video · Proprietary

Wan 2.2

Alibaba

AIDB 89

Open text-to-video and image-to-video model.

Video Generation
Video · Open Weights

Hailuo 02

MiniMax

AIDB 84

Cinematic text-to-video generator.

Video Generation
Video · Proprietary

Seedance 1.0

ByteDance

AIDB 82

ByteDance Seed video generation model.

Video Generation
Video · Proprietary

Luma Ray 2

Luma AI

AIDB 82

Large video generative model with realistic motion.

Video Generation
Video · Proprietary

HeyGen Avatar IV

HeyGen

AIDB 86

AI avatar video generator for marketing and training.

Video Generation · Audio / Speech
Video · Proprietary

Synthesia

Synthesia

AIDB 80

Enterprise AI video platform with realistic avatars.

Video Generation
Video · Proprietary

HunyuanVideo

Tencent

AIDB 86

Open 13B text-to-video model.

Video Generation
Video · Open Weights

Mochi 1

Genmo

AIDB 85

Open-source video generation model.

Video Generation
Video · Open Weights

LTX Video

Lightricks

AIDB 88

Real-time open video generation model.

Video Generation
Video · Open Weights

Amazon Nova

AWS

AIDB 94

Amazon's foundation model family (text, image, video) on Bedrock.

Multimodal · Image Generation · Video Generation
Text + Image + Video · Proprietary

Wayve GAIA-2

Wayve

AIDB 85

Generative world model for end-to-end embodied driving.

Video Generation · Agents
World Model · Proprietary

Adobe Firefly

Adobe

AIDB 89

Commercially safe generative AI models for image, vector, and video.

Image Generation · Video Generation
Image + Video · Proprietary

ManiGaussian

Tsinghua University

AIDB 84

Language-conditioned robotic manipulation in unstructured environments is in high demand for general-purpose intelligent robots.

Agents · Image Understanding · Multimodal
Vision + Action · Open Weights

Reka Core

Reka AI

AIDB 81

We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka.

Audio / Speech · Code · Image Generation
Audio · Proprietary

VILA1.5-13B

NVIDIA

AIDB 90

Visual language models (VLMs) rapidly progressed with the recent success of large language models.

Image Generation · Multimodal · Text Generation
Video · Open Weights

LLaVA-OV-72B

ByteDance

AIDB 83

We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series.

Image Generation · Multimodal · Text Generation
Video · Open Weights

Oryx 34B

Tsinghua University

AIDB 86

Visual data comes in various forms, ranging from small icons of just a few pixels to long videos spanning hours.

3D · Image Generation · Multimodal
3D · Open Weights

PixelDance

ByteDance

AIDB 84

PixelDance V1.4 is a video generation model developed by the ByteDance Research team, using the DiT structure.

Image Generation · Video Generation
Video · Proprietary

Movie Gen Video

Meta

AIDB 90

We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio.

Image Generation · Video Generation
Video · Proprietary

Amazon Nova Pro

Amazon

AIDB 93

A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks.

Code · Image Generation · Multimodal
Video · Proprietary

NVILA 15B

NVIDIA

AIDB 91

Visual language models (VLMs) have made significant advances in accuracy in recent years.

Image Generation · Multimodal · Text Generation
Video · Open Weights

Sora Turbo

OpenAI

AIDB 92

Our video generation model is rolling out at sora.com.

Image Generation · Video Generation
Video · Proprietary

Gemini 2.0 Pro

Google DeepMind

AIDB 94

Today, we’re releasing an experimental version of Gemini 2.0 Pro that responds to that feedback.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Apollo 7B

Meta AI

AIDB 91

Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood.

Multimodal · Text Generation · Video Generation
Video · Proprietary

Veo 2

Google DeepMind

AIDB 93

Google DeepMind's video and vision model, tracked by Epoch and focused on video generation.

Image Generation · Video Generation
Video · Proprietary

ERNIE-4.5-VL-424B-A47B (文心大模型4.5)

Baidu

AIDB 85

In this report, we introduce ERNIE 4.5, a new family of large-scale multimodal models comprising 10 distinct variants.

Code · Image Generation · Multimodal
Video · Open Weights

Diffusion Renderer

NVIDIA

AIDB 88

Understanding and modeling lighting effects are fundamental tasks in computer vision and graphics.

Video Generation
Video · Open Weights

Gemini 2.5 Pro (Mar 2025)

Google DeepMind

AIDB 92

Gemini 2.5 Pro Experimental is our most advanced model for complex tasks.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Gemini 2.5 Pro (May 2025)

Google DeepMind

AIDB 95

Gemini 2.5 Pro Experimental is our most advanced model for complex tasks.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Seed1.5-VL

ByteDance

AIDB 86

We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning.

Image Generation · Multimodal · Text Generation
Video · Proprietary

Gemini 2.5 Pro (Jun 2025)

Google DeepMind

AIDB 95

Gemini 2.5 Pro Experimental is our most advanced model for complex tasks.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Gemini 2.5 Deep Think

Google DeepMind

AIDB 94

To advance Gemini’s capabilities towards solving hard reasoning problems, we developed a novel reasoning approach, called Deep Think, that naturally blends in parallel thinking techniques during response generation.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Qwen3-Omni-30B-A3B

Alibaba

AIDB 87

We present Qwen3-Omni, a single multimodal model that, for the first time, maintains state-of-the-art performance across text, image, audio, and video without any degradation relative to single-modal counterparts.

Audio / Speech · Image Generation · Multimodal
Audio · Open Weights

Sora 2.0

OpenAI

AIDB 91

Our latest video generation model is more physically accurate, realistic, and more controllable than prior systems.

Video Generation
Video · Proprietary

Veo 3.1

Google DeepMind

AIDB 94

We’re also introducing Veo 3.1, which brings richer audio, more narrative control, and enhanced realism that captures true-to-life textures.

Image Generation · Video Generation
Video · Proprietary

Seedance 2.0

ByteDance

AIDB 82

ByteDance's image, video, and audio model, tracked by Epoch and focused on video generation.

Audio / Speech · Image Generation · Video Generation
Audio · Proprietary

Mirage

Decart

AIDB 83

Real-time generative world model that re-skins live video streams with text prompts.

Video Generation · Image Generation
Video · Proprietary

Kling 2.5

Kuaishou

AIDB 83

Kuaishou's flagship text-to-video model with strong motion coherence and 1080p output.

Video Generation
Video · Proprietary

Aleph 2

Runway

AIDB 90

Runway's closed-source in-context video editing model that modifies existing videos while preserving untouched regions.

Video Generation · Multimodal
Video · Proprietary

LongCat Video Avatar 1.5

Meituan

AIDB 88

Meituan LongCat's open-source audio-driven avatar video model for single- and multi-character human video generation.

Video Generation · Multimodal
Video + Audio · Open Weights

Gemini Omni Flash

Google DeepMind

AIDB 94

Google DeepMind's closed-source multimodal video creation and editing model that generates or edits video from text, image, video, and audio references.

Video Generation · Image Generation · Multimodal
Text + Image + Video + Audio · Proprietary

Lance

ByteDance

AIDB 82

ByteDance's foundation model for fast multimodal content creation across short-form video pipelines.

Multimodal · Video Generation
Text + Image + Video · Proprietary

Agora 1

Odyssey

AIDB 86

Odyssey's interactive world model for real-time AI-generated explorable video environments.

Video Generation · Agents · 3D
Interactive Video · Proprietary
