Category · 66 models

Talk & Listen (Voice)

AI you speak to that speaks back — for calls, meetings, and accessibility.

What it is

Voice AI turns speech into text and back again, often in real time. It powers phone agents, meeting transcripts, audiobooks, and natural conversations with assistants.

Real-world examples

· Transcribe and summarize a sales call
· Answer customer calls 24/7
· Read articles aloud in a natural voice
· Translate a conversation live

What to look for

· How natural the voice sounds
· Accuracy with accents and noisy audio
· Latency for real-time use
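Speech-recognition accuracy is commonly reported as word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a model's output, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard metric, not the scoring method of any model listed on this page; the function name `word_error_rate` and the sample sentences are hypothetical, and real benchmarks usually normalize casing and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein DP, computed row by row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one sentence transcribed from clean and from noisy audio.
    reference = "please transfer me to the billing department"
    clean = "please transfer me to the billing department"
    noisy = "please transfer me to billing the department now"
    print(f"clean WER: {word_error_rate(reference, clean):.2f}")
    print(f"noisy WER: {word_error_rate(reference, noisy):.2f}")
```

Run as a script, this prints a clean WER of 0.00 and a noisy WER of 0.43 (three word errors against seven reference words). Comparing WER on the same test set across accents and noise levels is one way to check the second point above.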

66 models in this category

GPT-4o

OpenAI

AIDB 91

Real-time omni-model handling text, vision and voice in a single network.

Multimodal · Audio / Speech · Image Understanding

Text + Image + Audio · Proprietary

Veo 3

Google DeepMind

AIDB 93

High-fidelity video generation with native synchronised audio.

Video Generation · Audio / Speech

Video + Audio · Proprietary

Whisper v3

OpenAI

AIDB 91

Open multilingual speech recognition and translation model.

Audio / Speech

Audio · Open

ElevenLabs v3

ElevenLabs

AIDB 88

Expressive TTS and voice cloning across 70+ languages.

Audio / Speech

Audio · Proprietary

Suno v4

Suno

AIDB 85

Generates full songs with vocals from a text prompt.

Music

Audio · Proprietary

Udio

AIDB 84

Text-to-music model focused on production-quality tracks.

Music

Audio · Proprietary

NotebookLM

Google

AIDB 93

Source-grounded research assistant with audio overviews.

Text Generation · Audio / Speech

Text + Audio · Proprietary

TTS-1 / GPT-4o Voice

OpenAI

AIDB 92

OpenAI text-to-speech voices via the audio API.

Audio / Speech

Audio · Proprietary

Lyria 2

Google DeepMind

AIDB 93

Google's professional music generation model.

Music

Audio · Proprietary

Chirp 3

Google

AIDB 93

High-fidelity expressive TTS voices on Google Cloud.

Audio / Speech

Audio · Proprietary

SeamlessM4T v2

Meta

AIDB 89

Multilingual speech-to-speech and speech-to-text translation.

Audio / Speech

Audio + Text · Open

MusicGen

Meta

AIDB 88

Open text-to-music model from AudioCraft.

Music

Audio · Open

HeyGen Avatar IV

HeyGen

AIDB 86

AI avatar video generator for marketing and training.

Video Generation · Audio / Speech

Video · Proprietary

Cartesia Sonic

Cartesia

AIDB 83

Ultra-low-latency state-space TTS model.

Audio / Speech

Audio · Proprietary

PlayHT 3.0

PlayHT

AIDB 81

Conversational TTS optimised for AI agents.

Audio / Speech

Audio · Proprietary

Resemble AI

AIDB 85

Voice cloning and real-time speech synthesis platform.

Audio / Speech

Audio · Proprietary

Deepgram Nova-3

Deepgram

AIDB 85

Production-grade streaming speech-to-text model.

Audio / Speech

Audio · Proprietary

AssemblyAI Universal-2

AssemblyAI

AIDB 83

Highly accurate speech recognition with rich audio intelligence.

Audio / Speech

Audio · Proprietary

Moonshine

Useful Sensors

AIDB 86

Open ASR model optimised for real-time edge inference.

Audio / Speech

Audio · Open

Stable Audio 2.0

Stability AI

AIDB 89

Generates full-length audio tracks from text.

Music

Audio · Proprietary

Riffusion

AIDB 85

AI music generation with vocal and instrumental control.

Music

Audio · Proprietary

Duolingo Max

Duolingo

AIDB 83

AI-powered language tutoring features.

Text Generation · Audio / Speech

Text + Audio · Proprietary

Google Tensor G5

Google

AIDB 92

Pixel SoC powering on-device Gemini Nano features.

Multimodal · Audio / Speech

On-device · Proprietary

Samsung Galaxy AI

Samsung

AIDB 87

Suite of on-device and cloud AI features for Galaxy phones (translate, edit, summarize).

Multimodal · Audio / Speech · Image Generation

Hybrid · Proprietary

BMW Intelligent Personal Assistant

BMW

AIDB 84

Voice-first in-car AI assistant integrating Alexa LLM features.

Audio / Speech · Agents

In-vehicle · Proprietary

Rabbit R1

Rabbit

AIDB 83

Pocket AI device built around the Large Action Model paradigm.

Agents · Audio / Speech

Device · Proprietary

Meta Ray-Ban (with Meta AI)

Meta

AIDB 90

Smart glasses with multimodal Meta AI for live look-and-ask.

Multimodal · Audio / Speech

Wearable · Proprietary

Veritone Public Sector

Veritone

AIDB 86

AI for evidence redaction, transcription and investigations for law enforcement.

Audio / Speech · Image Understanding

SaaS · Proprietary

Hyundai Pleos

Hyundai Motor

AIDB 83

AI-powered software-defined vehicle OS with voice and personalization.

Agents · Audio / Speech

Vehicle OS · Proprietary

Zoom AI Companion

Zoom

AIDB 87

AI assistant for meeting summaries, chat and email across Zoom.

Agents · Audio / Speech

SaaS · Proprietary

Cisco Webex AI Assistant

Cisco

AIDB 93

GenAI assistant for meetings, contact center and collaboration.

Agents · Audio / Speech

SaaS · Proprietary

Oracle Health Clinical AI Agent

Oracle

AIDB 93

Voice-enabled clinical documentation agent for clinicians.

Agents · Audio / Speech

SaaS · Proprietary

Amazon Connect AI

AWS

AIDB 95

GenAI for contact-center agents, self-service and analytics.

Agents · Audio / Speech

SaaS · Proprietary

AWS HealthScribe

AWS

AIDB 94

HIPAA-eligible service that generates clinical notes from patient conversations.

Audio / Speech · Text Generation

API · Proprietary

Customer Engagement Suite (CCAI)

Google Cloud

AIDB 93

Generative contact-center AI for virtual agents, agent assist and insights.

Agents · Audio / Speech

SaaS · Proprietary

Gong AI

Gong

AIDB 83

Revenue AI for call insights, forecasting and deal execution.

Agents · Audio / Speech

SaaS · Proprietary

Twilio CustomerAI

Twilio

AIDB 83

GenAI and predictive AI across Twilio messaging, voice and Segment.

Agents · Audio / Speech

Platform · Proprietary

Zendesk QA (Klaus)

Zendesk

AIDB 84

AutoQA AI that scores 100% of support conversations across voice and chat.

Reasoning · Audio / Speech

SaaS · Proprietary

Twilio Voice Intelligence

Twilio

AIDB 87

Speech-to-text, summaries and language operators that analyze every call in real time.

Audio / Speech · Reasoning

Platform · Proprietary

Twilio AI Assistants

Twilio

AIDB 87

Build conversational AI agents over SMS, voice and WhatsApp grounded in Segment data.

Agents · Audio / Speech

Platform · Proprietary

Dragon Copilot

Microsoft / Nuance

AIDB 95

Ambient AI scribe for clinicians that drafts notes and orders from doctor-patient conversations.

Audio / Speech · Text Generation

SaaS · Proprietary

Spotify AI DJ

Spotify

AIDB 88

Personalized AI DJ that curates and narrates listening sessions in a realistic voice.

Audio / Speech · Agents

SaaS · Proprietary

Reka Core

Reka AI

AIDB 81

Largest model in Reka's Core, Flash and Edge series of multimodal language models, all trained from scratch by Reka.

Audio / Speech · Code · Image Generation

Audio · Proprietary

GPT-4o (Aug 2024)

OpenAI

AIDB 91

August 2024 snapshot of GPT-4o, OpenAI's model that reasons across audio, vision and text in real time.

Audio / Speech · Image Generation · Multimodal

Audio · Proprietary

GPT-4o (Nov 2024)

OpenAI

AIDB 92

November 2024 snapshot of GPT-4o, OpenAI's model that reasons across audio, vision and text in real time.

Audio / Speech · Image Generation · Multimodal

Audio · Proprietary

Fugatto 1

NVIDIA

AIDB 91

Fugatto is a versatile audio synthesis and transformation model capable of following free-form text instructions with optional audio inputs.

Audio / Speech · Multimodal · Text Generation

Audio · Proprietary

Gemini 2.0 Pro

Google DeepMind

AIDB 94

Experimental release of Gemini 2.0 Pro, which Google positioned for coding and complex prompts.

Audio / Speech · Code · Image Generation

Audio · Proprietary

GPT-4o (Jan 2025)

OpenAI

AIDB 92

January 2025 snapshot of GPT-4o, OpenAI's model that reasons across audio, vision and text in real time.

Audio / Speech · Image Generation · Multimodal

Audio · Proprietary

Gemini 2.5 Pro (Mar 2025)

Google DeepMind

AIDB 92

March 2025 experimental release of Gemini 2.5 Pro, which Google describes as its most advanced model for complex tasks.

Audio / Speech · Code · Image Generation

Audio · Proprietary

GPT-4o (Mar 2025)

OpenAI

AIDB 95

March 2025 snapshot of GPT-4o, OpenAI's model that reasons across audio, vision and text in real time.

Audio / Speech · Image Generation · Multimodal

Audio · Proprietary

Gemini 2.5 Pro (May 2025)

Google DeepMind

AIDB 95

May 2025 version of Gemini 2.5 Pro, which Google describes as its most advanced model for complex tasks.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Gemini 2.5 Pro (Jun 2025)

Google DeepMind

AIDB 95

June 2025 version of Gemini 2.5 Pro, which Google describes as its most advanced model for complex tasks.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Gemini 2.5 Deep Think

Google DeepMind

AIDB 94

Reasoning approach Google DeepMind developed for hard problems, blending parallel thinking techniques into response generation.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Qwen3-Omni-30B-A3B

Alibaba

AIDB 87

Single multimodal model that its authors report maintains state-of-the-art performance across text, image, audio and video without degradation relative to single-modal counterparts.

Audio / Speech · Image Generation · Multimodal

Audio · Open Weights

Gemini Robotics-ER 1.5

Google DeepMind

AIDB 94

Vision-language model (VLM), described by Google DeepMind as its most capable, that reasons about the physical world, natively calls digital tools and creates detailed multi-step plans to complete a mission.

Audio / Speech · Image Generation · Text Generation

Audio · Proprietary
