Category · 59 models

Real-Time Call Transcription & Summary

Live captions, speaker labels, and an action-item digest by the time the call ends.

What it is

Streaming ASR with speaker diarization, followed by an LLM pass that extracts decisions, owners and next steps. Used for sales calls, support calls and meetings.
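
As a rough illustration of how ASR output and diarization can be combined into speaker-labelled text, here is a minimal sketch. It assumes the ASR step yields words with timestamps and the diarization step yields speaker time ranges. The `Word` and `SpeakerSegment` types and the maximum-overlap rule are illustrative assumptions, not the method of any product listed on this page.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from call start
    end: float


@dataclass
class SpeakerSegment:
    speaker: str
    start: float
    end: float


def overlap(word: Word, seg: SpeakerSegment) -> float:
    """Length in seconds of the time shared by a word and a speaker segment."""
    return max(0.0, min(word.end, seg.end) - max(word.start, seg.start))


def label_words(words, segments):
    """Attach to each word the speaker whose segment overlaps it most."""
    labeled = []
    for word in words:
        best = max(segments, key=lambda seg: overlap(word, seg), default=None)
        if best is None or overlap(word, best) == 0.0:
            # No diarization segment covers this word at all.
            labeled.append(("unknown", word.text))
        else:
            labeled.append((best.speaker, word.text))
    return labeled


def to_turns(labeled):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, text in labeled:
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns
```

The resulting list of (speaker, text) turns is the kind of input a later LLM summarization pass could take. Real systems also have to handle overlapping speech and words that straddle a speaker change, which this sketch ignores.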

Real-world examples

  • Auto-summarize a Zoom sales call into the CRM
  • Caption a multilingual all-hands in real time
  • Extract objections from 200 support calls per day

What to look for

  • Sub-300 ms streaming latency
  • Diarization + accent robustness
  • Native CRM / helpdesk integration

59 models in this category

GPT-4o

OpenAI

AIDB91

Real-time omni-model handling text, vision and voice in a single network.

Multimodal · Audio / Speech · Image Understanding
Text + Image + Audio · Proprietary

Veo 3

Google DeepMind

AIDB93

High-fidelity video generation with native synchronised audio.

Video Generation · Audio / Speech
Video + Audio · Proprietary

Whisper v3

OpenAI

AIDB91

Open multilingual speech recognition and translation model.

Audio / Speech
Audio · Open

ElevenLabs v3

ElevenLabs

AIDB88

Best-in-class expressive TTS and voice cloning across 70+ languages.

Audio / Speech
Audio · Proprietary

NotebookLM

Google

AIDB93

Source-grounded research assistant with audio overviews.

Text Generation · Audio / Speech
Text + Audio · Proprietary

TTS-1 / GPT-4o Voice

OpenAI

AIDB92

OpenAI text-to-speech voices via the audio API.

Audio / Speech
Audio · Proprietary

Chirp 3

Google

AIDB93

High-fidelity expressive TTS voices on Google Cloud.

Audio / Speech
Audio · Proprietary

Seamless M4T v2

Meta

AIDB89

Multilingual speech-to-speech and speech-to-text translation.

Audio / Speech
Audio + Text · Open

HeyGen Avatar IV

HeyGen

AIDB86

AI avatar video generator for marketing and training.

Video Generation · Audio / Speech
Video · Proprietary

Cartesia Sonic

Cartesia

AIDB83

Ultra-low-latency state-space TTS model.

Audio / Speech
Audio · Proprietary

PlayHT 3.0

PlayHT

AIDB81

Conversational TTS optimised for AI agents.

Audio / Speech
Audio · Proprietary

Resemble AI

Resemble AI

AIDB85

Voice cloning and real-time speech synthesis platform.

Audio / Speech
Audio · Proprietary

Deepgram Nova-3

Deepgram

AIDB85

Production-grade streaming speech-to-text model.

Audio / Speech
Audio · Proprietary

AssemblyAI Universal-2

AssemblyAI

AIDB83

Highly accurate speech recognition with rich audio intelligence.

Audio / Speech
Audio · Proprietary

Moonshine

Useful Sensors

AIDB86

Open ASR model optimised for real-time edge inference.

Audio / Speech
Audio · Open

Duolingo Max

Duolingo

AIDB83

AI-powered language tutoring features.

Text Generation · Audio / Speech
Text + Audio · Proprietary

Google Tensor G5

Google

AIDB92

Pixel SoC powering on-device Gemini Nano features.

Multimodal · Audio / Speech
On-device · Proprietary

Samsung Galaxy AI

Samsung

AIDB87

Suite of on-device + cloud AI features for Galaxy phones (translate, edit, summarize).

Multimodal · Audio / Speech · Image Generation
Hybrid · Proprietary

BMW Intelligent Personal Assistant

BMW

AIDB84

Voice-first in-car AI assistant integrating Alexa LLM features.

Audio / Speech · Agents
In-vehicle · Proprietary

Rabbit R1

Rabbit

AIDB83

Pocket AI device built around the Large Action Model paradigm.

Agents · Audio / Speech
Device · Proprietary

Meta Ray-Ban (with Meta AI)

Meta

AIDB90

Smart glasses with multimodal Meta AI for live look-and-ask.

Multimodal · Audio / Speech
Wearable · Proprietary

Veritone Public Sector

Veritone

AIDB86

AI for evidence redaction, transcription and investigations for law enforcement.

Audio / Speech · Image Understanding
SaaS · Proprietary

Hyundai Pleos

Hyundai Motor

AIDB83

AI-powered software-defined vehicle OS with voice and personalization.

Agents · Audio / Speech
Vehicle OS · Proprietary

Zoom AI Companion

Zoom

AIDB87

AI assistant for meeting summaries, chat and email across Zoom.

Agents · Audio / Speech
SaaS · Proprietary

Cisco Webex AI Assistant

Cisco

AIDB93

GenAI assistant for meetings, contact center and collaboration.

Agents · Audio / Speech
SaaS · Proprietary

Oracle Health Clinical AI Agent

Oracle

AIDB93

Voice-enabled clinical documentation agent for clinicians.

Agents · Audio / Speech
SaaS · Proprietary

Amazon Connect AI

AWS

AIDB95

GenAI for contact-center agents, self-service and analytics.

Agents · Audio / Speech
SaaS · Proprietary

AWS HealthScribe

AWS

AIDB94

HIPAA-eligible service that generates clinical notes from patient conversations.

Audio / Speech · Text Generation
API · Proprietary

Customer Engagement Suite (CCAI)

Google Cloud

AIDB93

Generative contact-center AI for virtual agents, agent assist and insights.

Agents · Audio / Speech
SaaS · Proprietary

Gong AI

Gong

AIDB83

Revenue AI for call insights, forecasting and deal execution.

Agents · Audio / Speech
SaaS · Proprietary

Twilio CustomerAI

Twilio

AIDB83

GenAI and predictive AI across Twilio messaging, voice and Segment.

Agents · Audio / Speech
Platform · Proprietary

Zendesk QA (Klaus)

Zendesk

AIDB84

AutoQA AI that scores 100% of support conversations across voice and chat.

Reasoning · Audio / Speech
SaaS · Proprietary

Twilio Voice Intelligence

Twilio

AIDB87

Speech-to-text, summaries and language operators that analyze every call in real time.

Audio / Speech · Reasoning
Platform · Proprietary

Twilio AI Assistants

Twilio

AIDB87

Build conversational AI agents over SMS, voice and WhatsApp grounded in Segment data.

Agents · Audio / Speech
Platform · Proprietary

Dragon Copilot

Microsoft / Nuance

AIDB95

Ambient AI scribe for clinicians that drafts notes and orders from doctor-patient conversations.

Audio / Speech · Text Generation
SaaS · Proprietary

Spotify AI DJ

Spotify

AIDB88

Personalized AI DJ that curates and narrates listening sessions in a realistic voice.

Audio / Speech · Agents
SaaS · Proprietary

Reka Core

Reka AI

AIDB81

Reka Core, Flash, and Edge are a series of multimodal language models trained from scratch by Reka.

Audio / Speech · Code · Image Generation
Audio · Proprietary

GPT-4o (Aug 2024)

OpenAI

AIDB91

OpenAI's flagship GPT-4o model, which reasons across audio, vision, and text in real time.

Audio / Speech · Image Generation · Multimodal
Audio · Proprietary

GPT-4o (Nov 2024)

OpenAI

AIDB92

OpenAI's flagship GPT-4o model, which reasons across audio, vision, and text in real time.

Audio / Speech · Image Generation · Multimodal
Audio · Proprietary

Fugatto 1

NVIDIA

AIDB91

Fugatto is a versatile audio synthesis and transformation model capable of following free-form text instructions with optional audio inputs.

Audio / Speech · Multimodal · Text Generation
Audio · Proprietary

Gemini 2.0 Pro

Google DeepMind

AIDB94

An experimental version of Gemini 2.0 Pro.

Audio / Speech · Code · Image Generation
Audio · Proprietary

GPT-4o (Jan 2025)

OpenAI

AIDB92

OpenAI's flagship GPT-4o model, which reasons across audio, vision, and text in real time.

Audio / Speech · Image Generation · Multimodal
Audio · Proprietary

Gemini 2.5 Pro (Mar 2025)

Google DeepMind

AIDB92

Gemini 2.5 Pro Experimental, described by Google DeepMind as its most advanced model for complex tasks.

Audio / Speech · Code · Image Generation
Audio · Proprietary

GPT-4o (Mar 2025)

OpenAI

AIDB95

OpenAI's flagship GPT-4o model, which reasons across audio, vision, and text in real time.

Audio / Speech · Image Generation · Multimodal
Audio · Proprietary

Gemini 2.5 Pro (May 2025)

Google DeepMind

AIDB95

Gemini 2.5 Pro Experimental, described by Google DeepMind as its most advanced model for complex tasks.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Gemini 2.5 Pro (Jun 2025)

Google DeepMind

AIDB95

Gemini 2.5 Pro Experimental, described by Google DeepMind as its most advanced model for complex tasks.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Gemini 2.5 Deep Think

Google DeepMind

AIDB94

Deep Think is a reasoning approach developed to advance Gemini's capabilities on hard reasoning problems; it blends parallel thinking techniques into response generation.

Audio / Speech · Code · Image Generation
Audio · Proprietary

Qwen3-Omni-30B-A3B

Alibaba

AIDB87

Qwen3-Omni is a single multimodal model reported to maintain state-of-the-art performance across text, image, audio, and video without degradation relative to single-modal counterparts.

Audio / Speech · Image Generation · Multimodal
Audio · Open Weights

Gemini Robotics-ER 1.5

Google DeepMind

AIDB94

Described by Google DeepMind as its most capable vision-language model (VLM); it reasons about the physical world, natively calls digital tools and creates detailed, multi-step plans to complete a mission.

Audio / Speech · Image Generation · Text Generation
Audio · Proprietary

Seedance 2.0

ByteDance

AIDB82

ByteDance's image, video and audio generation model, tracked by Epoch and focused on video generation.

Audio / Speech · Image Generation · Video Generation
Audio · Proprietary

Gemini Flash 3.1 TTS

Google DeepMind

AIDB94

Google DeepMind's audio model tracked by Epoch, focused on audio generation.

Audio / Speech
Audio · Proprietary

Hume EVI 3

Hume AI

AIDB84

Empathic voice interface that perceives and generates emotional speech in real time.

Audio / Speech · Multimodal
Voice · Proprietary

Pi 3.0

Inflection AI

AIDB86

Inflection's empathetic conversational assistant tuned for personal, supportive dialogue.

Text Generation · Audio / Speech
Text + Voice · Proprietary

Stable Audio 3 Medium

Stability AI

AIDB87

Stability AI's 2B text-to-audio diffusion model for higher-capacity music, sound-effect generation, and audio editing.

Music · Audio / Speech
Audio · Proprietary

Qwen3.5 LiveTranslate Flash

Alibaba

AIDB92

Alibaba's vision-enhanced real-time audio/video translation model for live multilingual interpretation across 60 languages.

Audio / Speech · Multimodal
Audio + Video + Text · Proprietary

Mirelo SFX 1.6

Mirelo AI

AIDB82

Mirelo's text-to-sound-effects model for production-ready Foley, ambience, and SFX generation.

Audio / Speech
Audio · Proprietary

WavFlow

Meta

AIDB92

Meta's audio generation model focused on high-fidelity waveform synthesis and speech-music co-generation.

Audio / Speech · Music
Audio · Open Weights

Dramabox

Resemble AI

AIDB85

Resemble AI's expressive multi-character voice acting model for long-form dramatic dialogue and narration.

Audio / Speech
Audio · Proprietary

Stable Audio 3 Small SFX

Stability AI

AIDB89

Stability AI's compact text-to-sound-effects diffusion model optimized for low-latency on-device SFX generation.

Audio / Speech
Audio · Open Weights
