Category · 11 models

Automated Audio Syncing for Video

Lip-sync, dub and re-time voice to picture without a sound editor.

What it is

Generative voice and video models align dubbed dialog to mouth movement, re-time ADR (automated dialogue replacement), and produce localized soundtracks for ads, training and film.

Real-world examples

  • Dub an English ad into 12 languages with matched lip-sync
  • Replace narration in an explainer video without re-shooting
  • Sync AI-generated music to a cut

What to look for

  • Speaker-identity preservation
  • Frame-accurate alignment
  • Rights & consent for voice cloning
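
To make "frame-accurate alignment" concrete, the sketch below converts a measured audio-to-picture offset from milliseconds into video frames and checks it against a tolerance. The function names and the half-frame default tolerance are illustrative assumptions, not a standard or any vendor's published criterion.

```python
def offset_in_frames(offset_ms: float, fps: float) -> float:
    """Convert an audio-to-picture offset in milliseconds to video frames."""
    return offset_ms / 1000.0 * fps


def is_frame_accurate(offset_ms: float, fps: float,
                      tolerance_frames: float = 0.5) -> bool:
    """Return True if the offset stays within the tolerance.

    The 0.5-frame default is an assumed threshold for illustration only.
    """
    return abs(offset_in_frames(offset_ms, fps)) <= tolerance_frames


if __name__ == "__main__":
    # At 24 fps one frame lasts about 41.7 ms, so a 10 ms drift is well
    # under half a frame while a 40 ms drift is nearly a full frame.
    print(is_frame_accurate(10, 24))
    print(is_frame_accurate(40, 24))
```

The same offset matters more at higher frame rates: a drift that passes at 24 fps can exceed the tolerance at 60 fps, so it helps to evaluate a model at the frame rate of the final deliverable.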

11 models in this category

Sora

OpenAI

AIDB 92

Text-to-video model producing minute-long cinematic clips.

Video Generation
Video · Proprietary

Veo 3

Google DeepMind

AIDB 93

High-fidelity video generation with native synchronised audio.

Video Generation · Audio / Speech
Video + Audio · Proprietary

HeyGen Avatar IV

HeyGen

AIDB 86

AI avatar video generator for marketing and training.

Video Generation · Audio / Speech
Video · Proprietary

Reka Core

Reka AI

AIDB 81

We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Gemini 2.0 Pro

Google DeepMind

AIDB 94

An experimental version of Gemini 2.0 Pro, released in response to feedback.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Gemini 2.5 Pro (Mar 2025)

Google DeepMind

AIDB 92

Gemini 2.5 Pro Experimental is our most advanced model for complex tasks.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Gemini 2.5 Pro (May 2025)

Google DeepMind

AIDB 95

Gemini 2.5 Pro Experimental is our most advanced model for complex tasks.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Gemini 2.5 Pro (Jun 2025)

Google DeepMind

AIDB 95

Gemini 2.5 Pro Experimental is our most advanced model for complex tasks.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Gemini 2.5 Deep Think

Google DeepMind

AIDB 94

To advance Gemini’s capabilities towards solving hard reasoning problems, we developed a novel reasoning approach, called Deep Think, that naturally blends in parallel thinking techniques during response generation.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Qwen3-Omni-30B-A3B

Alibaba

AIDB 87

We present Qwen3-Omni, a single multimodal model that, for the first time, maintains state-of-the-art performance across text, image, audio, and video without any degradation relative to single-modal counterparts.

Audio / Speech · Image Generation · Multimodal
Audio · Open Weights

Seedance 2.0

ByteDance

AIDB 82

ByteDance's image, video and audio generation model, tracked by Epoch and focused on video generation.

Audio / Speech · Image Generation · Video Generation
Audio · Proprietary
