Category · 95 models

Make Pictures

Type a description, get an image — for marketing, design, and ideation.

What it is

You describe what you want in words and the AI paints it. Useful for marketing visuals, product mockups, social posts, or just exploring ideas without hiring a designer for every iteration.

Real-world examples

· Create a hero image for a landing page
· Design a logo concept in five styles
· Generate product photos in different settings
· Illustrate a children's book

What to look for

· How well it follows the prompt
· Whether it can render legible text in images
· Consistency across a series

95 models in this category

GPT-image-1

OpenAI

AIDB 93

Production-grade image generation API with strong text rendering.

Image Generation

Image · Proprietary

DALL·E 3

OpenAI

AIDB 91

Prompt-faithful image generator integrated into ChatGPT.

Image Generation

Image · Proprietary

Imagen 4

Google DeepMind

AIDB 92

Photoreal image model with sharp typography and detail.

Image Generation

Image · Proprietary

Midjourney v7

Midjourney

AIDB 84

Aesthetic-first image model beloved by designers and concept artists.

Image Generation

Image · Proprietary

Stable Diffusion 3.5

Stability AI

AIDB 90

Open-weights image generator with a strong fine-tuning ecosystem.

Image Generation

Image · Open Weights

FLUX.1

Black Forest Labs

AIDB 90

State-of-the-art open image model from ex-Stable Diffusion researchers.

Image Generation

Image · Open Weights

Ideogram 2.0

Ideogram

AIDB 83

Image model specialised in legible in-image typography and logos.

Image Generation

Image · Proprietary

DINOv3

Meta

AIDB 92

Self-supervised vision foundation model for image features.

Image Understanding · Embeddings

Image · Open

Aurora

xAI

AIDB 87

Photoreal autoregressive image generation model.

Image Generation

Image · Proprietary

Recraft V3

Recraft

AIDB 85

Image model designed for brand & vector-style design assets.

Image Generation

Image · Proprietary

Adobe Firefly Image 4

Adobe

AIDB 92

Commercially safe image model trained on licensed data.

Image Generation

Image · Proprietary

Leonardo Phoenix

Leonardo.Ai

AIDB 82

Leonardo.Ai's in-house foundation model with strong prompt adherence.

Image Generation

Image · Proprietary

Playground v3

Playground

AIDB 83

Image model focused on graphic design and typography.

Image Generation

Image · Proprietary

FLUX.1 Kontext

Black Forest Labs

AIDB 88

Image editing model with character & style consistency.

Image Generation

Image · Open Weights

HiDream-I1

HiDream

AIDB 87

Open 17B image generation model topping benchmarks.

Image Generation

Image · Open Weights

Apple Intelligence

Apple

AIDB 91

On-device + private cloud generative AI across iPhone, iPad and Mac.

Multimodal · Agents · Image Generation

On-device · Proprietary

Samsung Galaxy AI

Samsung

AIDB 87

Suite of on-device + cloud AI features for Galaxy phones (translate, edit, summarize).

Multimodal · Audio / Speech · Image Generation

Hybrid · Proprietary

Samsung Gauss2

Samsung

AIDB 82

Samsung's in-house generative model family for Galaxy products.

Text Generation · Code · Image Generation

Text + Image · Proprietary

Amazon Nova

AWS

AIDB 94

Amazon's foundation model family (text, image, video) on Bedrock.

Multimodal · Image Generation · Video Generation

Text + Image + Video · Proprietary

Adobe Firefly

Adobe

AIDB 89

Commercially safe generative AI models for image, vector, and video.

Image Generation · Video Generation

Image + Video · Proprietary

Canva Magic Studio

Canva

AIDB 87

Suite of AI design tools for image, video, copy and presentations.

Image Generation · Text Generation

SaaS · Proprietary

Azure OpenAI Service

Microsoft

AIDB 93

Enterprise access to GPT, o-series and DALL·E models on Azure.

Text Generation · Image Generation

API · Proprietary

Nano Banana Pro

Google DeepMind

AIDB 94

Gemini-powered flagship image generation and editing model with best-in-class text.

Image Generation

Image · Proprietary

Shopify Magic

Shopify

AIDB 87

Generative AI across the Shopify admin: product descriptions, emails, blog posts, and image edits.

Text Generation · Image Generation

SaaS · Proprietary

Cloudflare Workers AI

Cloudflare

AIDB 88

Serverless GPU inference platform running open models at the edge.

Text Generation · Embeddings · Image Generation

Platform · Proprietary

Pinterest Performance+

Pinterest

AIDB 87

GenAI ads platform that builds creative and optimizes targeting automatically.

Image Generation · Agents

SaaS · Proprietary

Stable Diffusion 3

Stability AI

AIDB 87

Text-to-image diffusion model. Diffusion models generate data by reversing a forward process that turns data into noise, and have become a powerful technique for high-dimensional perceptual data such as images and video.

Image Generation

Image · Proprietary

Claude 3 Sonnet

Anthropic

AIDB 92

Anthropic's multimodal language and vision model, tracked by Epoch, focused on chat.

Code · Image Generation · Multimodal

Image · Proprietary

Claude 3 Opus

Anthropic

AIDB 92

Anthropic's multimodal language and vision model, tracked by Epoch, focused on chat.

Code · Image Generation · Multimodal

Image · Proprietary

GPT-4 Turbo (Apr 2024)

OpenAI

AIDB 92

April 2024 snapshot of GPT-4 Turbo, which OpenAI introduced alongside dozens of platform additions and reduced pricing.

Image Generation · Multimodal · Text Generation

Image · Proprietary

Reka Core

Reka AI

AIDB 81

Most capable of Reka's Core, Flash, and Edge series of multimodal language models, trained from scratch by Reka.

Audio / Speech · Code · Image Generation

Audio · Proprietary

VILA1.5-13B

NVIDIA

AIDB 90

13B visual language model (VLM) from NVIDIA's VILA family; VLMs have progressed rapidly on the back of large language models.

Image Generation · Multimodal · Text Generation

Video · Open Weights

Claude 3.5 Sonnet

Anthropic

AIDB 94

Outperforms Claude 3 Opus, Anthropic's previous most capable model, while running faster and at lower cost.

Code · Image Generation · Multimodal

Image · Proprietary

Ernie 4.0 Turbo

Baidu

AIDB 82

Baidu's multimodal language and vision model, tracked by Epoch, focused on vision-language generation.

Image Generation · Multimodal · Text Generation

Image · Proprietary

SenseChat 5.5

SenseTime

AIDB 81

SenseTime's multimodal language and vision model, tracked by Epoch, focused on vision-language generation.

Image Generation · Multimodal · Reasoning

Image · Proprietary

LLaVA-OV-72B

ByteDance

AIDB 83

72B member of LLaVA-OneVision, a family of open large multimodal models (LMMs) that consolidates insights into data, models, and visual representations from the LLaVA-NeXT blog series.

Image Generation · Multimodal · Text Generation

Video · Open Weights

GPT-4o (Aug 2024)

OpenAI

AIDB 91

August 2024 snapshot of GPT-4o, OpenAI's flagship model at launch, which reasons across audio, vision, and text in real time.

Audio / Speech · Image Generation · Multimodal

Audio · Proprietary

Grok-2

xAI

AIDB 89

xAI's frontier language model, billed as having state-of-the-art reasoning capabilities.

Code · Image Generation · Multimodal

Image · Proprietary

Oryx 34B

Tsinghua University

AIDB 86

Multimodal model built to understand visual data in any form, from icons of just a few pixels to videos spanning hours.

3D · Image Generation · Multimodal

3D · Open Weights

PixelDance

ByteDance

AIDB 84

Video generation model (V1.4) from the ByteDance Research team, built on a DiT (diffusion transformer) architecture.

Image Generation · Video Generation

Video · Proprietary

Movie Gen Video

Meta

AIDB 90

Meta's cast of foundation models that generates high-quality 1080p HD video in different aspect ratios, with synchronized audio.

Image Generation · Video Generation

Video · Proprietary

NVLM-X 72B

NVIDIA

AIDB 92

Cross-attention variant of NVIDIA's NVLM vision-language models, tracked by Epoch, focused on language modeling and generation.

Code · Image Generation · Reasoning

Image · Proprietary

NVLM-H 72B

NVIDIA

AIDB 89

Hybrid variant of NVIDIA's NVLM vision-language models, tracked by Epoch, focused on language modeling and generation.

Code · Image Generation · Reasoning

Image · Proprietary

NVLM-D 72B

NVIDIA

AIDB 87

Decoder-only variant of NVIDIA's NVLM vision-language models, tracked by Epoch, focused on language modeling and generation.

Code · Image Generation · Reasoning

Image · Open Weights

SeedEdit

ByteDance

AIDB 84

Diffusion model that revises a given image according to any text prompt.

Image Generation

Image · Proprietary

GPT-4o (Nov 2024)

OpenAI

AIDB 92

November 2024 snapshot of GPT-4o, OpenAI's flagship model at launch, which reasons across audio, vision, and text in real time.

Audio / Speech · Image Generation · Multimodal

Audio · Proprietary

Amazon Nova Pro

Amazon

AIDB 93

Amazon's multimodal model, positioned as offering the best balance of accuracy, speed, and cost across a wide range of tasks.

Code · Image Generation · Multimodal

Video · Proprietary

NVILA 15B

NVIDIA

AIDB 91

15B visual language model (VLM) from NVIDIA, designed for efficiency as well as accuracy.

Image Generation · Multimodal · Text Generation

Video · Open Weights

Infinity

ByteDance

AIDB 85

Bitwise visual autoregressive model that generates high-resolution, photorealistic images following language instructions.

Image Generation

Image · Open Weights

Sora Turbo

OpenAI

AIDB 92

OpenAI's video generation model, rolled out at sora.com.

Image Generation · Video Generation

Video · Proprietary

Gemini 2.0 Pro

Google DeepMind

AIDB 94

Gemini 2.0 Pro, first released by Google DeepMind as an experimental version shaped by developer feedback.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Veo 2

Google DeepMind

AIDB 93

Google DeepMind's video and vision model, tracked by Epoch, focused on video generation.

Image Generation · Video Generation

Video · Proprietary

Kimi k1.5

Moonshot AI

AIDB 84

Multimodal model that scales reinforcement learning to get past a limit of next-token-prediction pretraining: the amount of available training data.

Code · Image Generation · Multimodal

Image · Proprietary

GPT-4o (Jan 2025)

OpenAI

AIDB 92

January 2025 snapshot of GPT-4o, OpenAI's flagship model at launch, which reasons across audio, vision, and text in real time.

Audio / Speech · Image Generation · Multimodal

Audio · Proprietary

Grok 3

xAI

AIDB 93

xAI's most advanced model at release, blending strong reasoning with extensive pretraining knowledge.

Code · Image Generation · Multimodal

Image · Proprietary

GPT-4.5

OpenAI

AIDB 92

OpenAI model that advances capabilities by scaling unsupervised learning, one of two complementary paradigms alongside reasoning.

Code · Image Generation · Multimodal

Image · Proprietary

Mistral OCR

Mistral AI

AIDB 88

Optical Character Recognition (OCR) API aimed at document understanding.

Image Generation · Multimodal · Text Generation

Image · Proprietary

ERNIE-4.5-VL-424B-A47B (文心大模型4.5, "ERNIE Large Model 4.5")

Baidu

AIDB 85

Vision-language model in ERNIE 4.5, Baidu's family of large-scale multimodal models comprising 10 distinct variants.

Code · Image Generation · Multimodal

Video · Open Weights

Gemini 2.5 Pro (Mar 2025)

Google DeepMind

AIDB 92

March 2025 release of Gemini 2.5 Pro (first shipped as Experimental), Google DeepMind's most advanced model for complex tasks at the time.

Audio / Speech · Code · Image Generation

Audio · Proprietary

GPT-4o (Mar 2025)

OpenAI

AIDB 95

March 2025 snapshot of GPT-4o, OpenAI's flagship model at launch, which reasons across audio, vision, and text in real time.

Audio / Speech · Image Generation · Multimodal

Audio · Proprietary

Llama 4 Scout

Meta

AIDB 91

One of the first models in Meta's Llama 4 herd, aimed at enabling more personalized multimodal experiences.

Code · Image Generation · Multimodal

Image · Open Weights

Llama 4 Maverick

Meta

AIDB 90

One of the first models in Meta's Llama 4 herd, aimed at enabling more personalized multimodal experiences.

Code · Image Generation · Multimodal

Image · Open Weights

Llama 4 Behemoth (preview)

Meta

AIDB 92

Largest model in Meta's Llama 4 herd, previewed alongside the first released models and aimed at more personalized multimodal experiences.

Code · Image Generation · Multimodal

Image · Proprietary

Gemini 2.5 Pro (May 2025)

Google DeepMind

AIDB 95

May 2025 update of Gemini 2.5 Pro, Google DeepMind's most advanced model for complex tasks at the time.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Seed1.5-VL

ByteDance

AIDB 86

Vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning.

Image Generation · Multimodal · Text Generation

Video · Proprietary

Claude Sonnet 4

Anthropic

AIDB 89

Understands nuanced instructions and context, recognizes and corrects its own mistakes, and produces sophisticated analysis and insights from complex data.

Agents · Code · Image Generation

Image · Proprietary

Gemini 2.5 Pro (Jun 2025)

Google DeepMind

AIDB 95

June 2025 update of Gemini 2.5 Pro, Google DeepMind's most advanced model for complex tasks at the time.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Seed-1.6-Thinking

ByteDance

AIDB 83

Reasoning variant of Seed1.6, a general-purpose model series from the ByteDance Seed team.

Image Generation · Multimodal · Text Generation

Image · Proprietary

Gemini 2.5 Deep Think

Google DeepMind

AIDB 94

Gemini 2.5 with Deep Think, a reasoning approach for hard problems that blends parallel thinking techniques into response generation.

Audio / Speech · Code · Image Generation

Audio · Proprietary

Qwen Image

Alibaba

AIDB 88

Image generation foundation model in the Qwen series, with significant advances in complex text rendering and precise image editing.

Image Generation

Image · Open Weights

Claude Opus 4.1

Anthropic

AIDB 92

Upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning.

Agents · Code · Image Generation

Image · Proprietary

GPT-5 nano

OpenAI

AIDB 94

OpenAI's multimodal language and vision model, tracked by Epoch, focused on language modeling and generation.

Image Generation · Multimodal · Text Generation

Image · Proprietary

GPT-5 mini

OpenAI

AIDB 93

OpenAI's multimodal language and vision model, tracked by Epoch, focused on language modeling and generation.

Image Generation · Multimodal · Text Generation

Image · Proprietary

Gemini 2.5 Flash Image (Nano Banana)

Google

AIDB 94

Text-to-image model that generates high-quality images from simple or complex text descriptions.

Image Generation

Image · Proprietary

Qwen3-Omni-30B-A3B

Alibaba

AIDB 87

Single multimodal model, described by Alibaba as the first to maintain state-of-the-art performance across text, image, audio, and video without degradation relative to single-modal counterparts.

Audio / Speech · Image Generation · Multimodal

Audio · Open Weights

Gemini Robotics-ER 1.5

Google DeepMind

AIDB 94

Google DeepMind's most capable vision-language model (VLM) for embodied reasoning: it reasons about the physical world, natively calls digital tools, and creates detailed multi-step plans to complete a mission.

Audio / Speech · Image Generation · Text Generation

Audio · Proprietary

GPT-5 Pro

OpenAI

AIDB 92

OpenAI's multimodal language and vision model, tracked by Epoch, focused on language modeling and generation.

Image Generation · Multimodal · Text Generation

Image · Proprietary

Veo 3.1

Google DeepMind

AIDB 94

Video generation model that brings richer audio, more narrative control, and enhanced realism with true-to-life textures.

Image Generation · Video Generation

Video · Proprietary

GPT-5.1 Instant

OpenAI

AIDB 91

Part of OpenAI's GPT-5.1 upgrade to the GPT-5 series: its most-used model, made warmer, more intelligent, and better at following instructions.

Image Generation · Multimodal · Text Generation

Image · Proprietary

Gemini 3 Pro Image (Nano Banana Pro)

Google DeepMind

AIDB 93

Google DeepMind's state-of-the-art image generation and editing model.

Image Generation

Image · Proprietary

GPT-5.2 Pro

OpenAI

AIDB 92

OpenAI's multimodal language and vision model, tracked by Epoch, focused on language modeling and generation.

Image Generation · Multimodal · Text Generation

Image · Proprietary

HyperCLOVA X SEED 32B Think

NAVER

AIDB 84

NAVER's language model with support for multimodal inputs and advanced reasoning.

Image Generation · Multimodal · Text Generation

Image · Proprietary

Seedance 2.0

ByteDance

AIDB 82

ByteDance's image, video, and audio generation model, tracked by Epoch, focused on video generation.

Audio / Speech · Image Generation · Video Generation

Audio · Proprietary

Qwen3.5 397B-A17B

Alibaba

AIDB 87

First open-weight model in Alibaba's Qwen3.5 series.

Image Generation · Text Generation

Image · Open Weights

Gemini 3.1 Pro

Google DeepMind

AIDB 94

Google DeepMind model released after a major update to Gemini 3 Deep Think aimed at modern challenges in science, research, and engineering.

Image Generation · Text Generation

Image · Proprietary

GPT-5.4 Pro

OpenAI

AIDB 94

OpenAI's multimodal language and vision model, tracked by Epoch, focused on language modeling and generation.

Image Generation · Multimodal · Text Generation

Image · Proprietary

GPT-5.4

OpenAI

AIDB 92

OpenAI's multimodal language and vision model, tracked by Epoch, focused on language modeling and generation.

Image Generation · Multimodal · Text Generation

Image · Proprietary

GPT Image 2

OpenAI

AIDB 91

OpenAI's image generation model, tracked by Epoch.

Image Generation

Image · Proprietary

GPT-5.5 Pro

OpenAI

AIDB 91

OpenAI's multimodal language and vision model, tracked by Epoch, focused on language modeling and generation.

Image Generation · Multimodal · Text Generation

Image · Proprietary

GPT-5.5

OpenAI

AIDB 92

OpenAI's multimodal language and vision model, tracked by Epoch, focused on language modeling and generation.

Image Generation · Multimodal · Text Generation

Image · Proprietary

Krea 1

Krea AI

AIDB 81

Krea's in-house image model tuned for aesthetic control and real-time iteration.

Image Generation

Image · Proprietary

Mirage

Decart

AIDB 83

Real-time generative world model that re-skins live video streams with text prompts.

Video Generation · Image Generation

Video · Proprietary

Lens

Microsoft

AIDB 93

Microsoft's open-source 3.8B text-to-image model focused on efficient training, fast high-res generation, and strong prompt adherence.

Image Generation

Image · Open Weights

Gemini Omni Flash

Google DeepMind

AIDB 94

Google DeepMind's closed-source multimodal video creation and editing model that generates or edits video from text, image, video, and audio references.

Video Generation · Image Generation · Multimodal

Text + Image + Video + Audio · Proprietary
