Ideogram 4.0 — The Typography King Just Went Open-Weight, and It Was Trained on JSON, Not Captions

Ideogram 4.0 sample images showing typography, posters and photoreal design work
Ideogram 4.0 was built for designers first — typography, layout and brand work, not just pretty pictures. Source: Ideogram

Hook

The model that quietly became the best in the world at putting readable text inside an image just did something nobody saw coming: it went open-weight. On June 3, 2026, Ideogram shipped Ideogram 4.0 — a 9.3B-parameter diffusion transformer, trained from scratch, with the weights on Hugging Face. And the twist that matters for creatives: it wasn’t trained on captions. It was trained on JSON.

The Story

Ideogram has always been the typography king. While the rest of the field was still garbling the title on a movie poster, Ideogram nailed signage, logos, and multi-line copy. The catch was that it lived behind a closed API. Version 4.0 changes the deal entirely: it is Ideogram’s first open-weight foundation model, released with fp8 and nf4 checkpoints (the nf4 build fits on a single 24 GB GPU).
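As a rough sanity check on the 24 GB claim, here is some back-of-envelope arithmetic for the weight footprint alone. It assumes nf4 stores about 4 bits per parameter and fp8 about 8 bits, and it ignores quantization metadata, the Qwen3-VL-8B text encoder, and activations, so real memory use will be higher than these numbers.

```python
# Back-of-envelope weight footprint for a 9.3B-parameter model.
# Assumption: nf4 ~ 4 bits/param, fp8 ~ 8 bits/param, bf16 = 16 bits/param.
# Quantization metadata, the text encoder, and activations are NOT counted.

PARAMS = 9.3e9  # parameter count quoted in the article

BITS_PER_PARAM = {"nf4": 4, "fp8": 8, "bf16": 16}


def weight_gigabytes(params: float, bits: int) -> float:
    """Return the raw weight size in GB (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name}: ~{weight_gigabytes(PARAMS, bits):.2f} GB of weights")
```

Under those assumptions the diffusion weights themselves are a small slice of a 24 GB card, which leaves room for the text encoder and working memory.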

Under the hood it’s a 34-layer, fully single-stream DiT with an embedding dimension of 4,608 and a Qwen3-VL-8B language model doing the text encoding. It generates at native 2K resolution (up to 2,048 px per side, aspect ratios out to 6:1) with no separate upscaling pass. It was trained with flow matching and samples in as few as 12 steps. ComfyUI added support on day zero (a dedicated Ideogram4Scheduler node landed in v0.24.0), so you can wire it into a node graph right now.
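To make "samples in as few as 12 steps" concrete, here is a toy sketch of few-step sampling under flow matching: start from noise and take fixed-size Euler steps along a velocity field. The velocity function below is a hand-written stand-in that points at a known target. In the real model a trained network predicts it, and Ideogram's actual sampler and step schedule may well differ from this.

```python
import random


def toy_velocity(x, t, target):
    """Stand-in for the learned velocity network (hypothetical).

    For the straight-line path x_t = (1 - t) * noise + t * target,
    the velocity that carries x_t to the target is (target - x_t) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(noise, target, steps=12):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division in toy_velocity is safe
        v = toy_velocity(x, t, target)
        x = [xi + vi * dt for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(4)]
    target = [0.25, -1.0, 0.5, 2.0]
    print(euler_sample(noise, target, steps=12))
```

Because the toy path is a straight line, a handful of Euler steps is enough to land on the target. Straighter learned paths are one reason flow-matching models can get away with few sampling steps.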

But the real story is the caption format. Ideogram 4.0 was trained exclusively on structured JSON — every training image described not as a sentence but as a document: per-element text strings, separate styling descriptions, optional bounding boxes in normalized [y_min, x_min, y_max, x_max] coordinates, and color palettes of up to 16 hex values (5 per element). That single design choice is why it reasons about a layout the way a designer does, not the way a sentence does.
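To picture what such a document might look like, here is a minimal sketch of a brief plus a validator for the limits quoted above. The field names (description, palette, elements, text, style, bbox, colors) are assumptions made up for illustration, not Ideogram's published schema, and the sketch reads "5 per element" as an upper bound.

```python
import json
import re

HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")
MAX_PALETTE = 16        # image-level palette limit quoted in the article
MAX_ELEMENT_COLORS = 5  # per-element limit quoted in the article

# Hypothetical brief: every key name here is an illustrative assumption.
EXAMPLE_BRIEF = {
    "description": "Minimal jazz festival poster on a dark navy background",
    "palette": ["#0B1D3A", "#F2C14E", "#FFFFFF"],
    "elements": [
        {
            "text": "BLUE NOTE NIGHTS",
            "style": "bold condensed sans-serif, all caps",
            "bbox": [0.05, 0.10, 0.30, 0.90],  # [y_min, x_min, y_max, x_max]
            "colors": ["#F2C14E"],
        },
        {
            "text": "June 21, Riverside Stage",
            "style": "light serif, small",
            "bbox": [0.85, 0.10, 0.95, 0.90],
            "colors": ["#FFFFFF"],
        },
    ],
}


def validate_brief(brief: dict) -> list:
    """Return a list of problems; an empty list means the brief passes."""
    problems = []
    palette = brief.get("palette", [])
    if len(palette) > MAX_PALETTE:
        problems.append("palette has more than 16 colors")
    for color in palette:
        if not HEX_COLOR.match(color):
            problems.append(f"bad hex color: {color}")
    for i, element in enumerate(brief.get("elements", [])):
        if len(element.get("colors", [])) > MAX_ELEMENT_COLORS:
            problems.append(f"element {i} has more than 5 colors")
        box = element.get("bbox")  # bounding boxes are optional
        if box is not None:
            y_min, x_min, y_max, x_max = box
            in_range = all(0.0 <= v <= 1.0 for v in box)
            if not (in_range and y_min < y_max and x_min < x_max):
                problems.append(f"element {i} has an invalid bbox: {box}")
    return problems


if __name__ == "__main__":
    print(validate_brief(EXAMPLE_BRIEF))
    print(json.dumps(EXAMPLE_BRIEF, indent=2))
```

Checking a brief before sending it is cheap, and it catches the two easiest mistakes: a box with swapped corners and a palette that has grown past the limit.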

Ideogram 4.0 poster output with clean multi-line typography
Multi-line typography, font weights, and logos rendered cleanly — Ideogram 4.0 scores 0.97 OCR accuracy on in-image text. Source: Ideogram

Why You Should Care

If you’ve ever fought an image model to spell a brand name, place a headline in the top third, and keep your palette on-brand, this is the model built for exactly that fight. The benchmarks back it up: 0.97 OCR accuracy on in-image text (X-Omni), 0.69 mIoU on layout control (7Bench), and a designer-preference Elo of 1,062, which puts it second overall and first among open-weight models, behind only closed models from OpenAI and Google.

For the IK3D crowd, the JSON interface is the unlock. A bounding box plus a literal text string plus a hex palette is a programmable brief. You can template it, loop it over a product catalog, or generate a hundred on-brand variations from one schema — the kind of repeatable, art-directable pipeline that prompt-spaghetti never gave us. Pair it with a ComfyUI graph and you have a local, self-hosted design engine that actually respects composition.
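Here is a minimal sketch of that "one schema, many variations" idea: a template brief is deep-copied once per catalog entry and only the text fields are swapped. The key names, the two-element layout, and the sample catalog are all hypothetical. The sketch only builds the JSON documents and does not call any Ideogram API.

```python
import copy
import json

# Hypothetical template: the key names are illustrative, not an official schema.
TEMPLATE = {
    "description": "Product card, centered packshot on a soft gradient",
    "palette": ["#1A1A1A", "#FF5A36", "#FFFFFF"],
    "elements": [
        {"text": "", "style": "bold geometric sans", "bbox": [0.06, 0.08, 0.20, 0.92]},
        {"text": "", "style": "regular sans, small", "bbox": [0.82, 0.08, 0.92, 0.92]},
    ],
}

# Hypothetical catalog rows.
CATALOG = [
    {"name": "AERO MUG", "tagline": "Keeps coffee hot for six hours"},
    {"name": "TRAIL BOTTLE", "tagline": "One litre, zero leaks"},
    {"name": "DESK LAMP", "tagline": "Warm light, tiny footprint"},
]


def make_briefs(template: dict, catalog: list) -> list:
    """Return one brief per product, leaving the template untouched."""
    briefs = []
    for product in catalog:
        brief = copy.deepcopy(template)  # deep copy so nested lists are not shared
        brief["elements"][0]["text"] = product["name"]
        brief["elements"][1]["text"] = product["tagline"]
        briefs.append(brief)
    return briefs


if __name__ == "__main__":
    for brief in make_briefs(TEMPLATE, CATALOG):
        print(json.dumps(brief))
```

Layout and palette stay fixed across the whole batch, so every output shares the same composition and only the copy changes.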

Ideogram 4.0 composition control example placing subjects via bounding boxes
Composition control: objects placed via normalized bounding-box coordinates, not hope. Source: Ideogram
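If you are coming from a design tool that thinks in pixels, a small helper for moving between pixel rectangles and normalized boxes is handy. This sketch assumes the origin is the top-left corner with y increasing downward, which the article does not state explicitly, and uses the [y_min, x_min, y_max, x_max] ordering it does give.

```python
def bbox_to_pixels(bbox, width, height):
    """Convert normalized [y_min, x_min, y_max, x_max] to (left, top, right, bottom) pixels.

    Assumption: origin at the top-left corner, y grows downward.
    """
    y_min, x_min, y_max, x_max = bbox
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


def pixels_to_bbox(left, top, right, bottom, width, height):
    """Convert a pixel rectangle back to normalized [y_min, x_min, y_max, x_max]."""
    return [top / height, left / width, bottom / height, right / width]


if __name__ == "__main__":
    # A headline band across the top half of a 2048 x 1024 canvas.
    print(bbox_to_pixels([0.0, 0.25, 0.5, 0.75], 2048, 1024))
    print(pixels_to_bbox(512, 0, 1536, 512, 2048, 1024))
```

Note the ordering trap: normalized boxes lead with y, while most pixel rectangles lead with x.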

Try It / Follow Them

The honest asterisk first: “open-weight” here means the Ideogram 4 Non-Commercial license. The weights are public and gated on Hugging Face (ideogram-ai/ideogram-4-nf4 and ideogram-ai/ideogram-4-fp8), free to study, fine-tune, and tinker with — but client work still routes through the paid API. It’s a research-and-learning release, not a commercial freebie. Know that before you build a business on it.

To run it: grab the repo at github.com/ideogram-oss/ideogram4, run "pip install ." from the repo root, and call run_inference.py with the nf4 quantization on a 24 GB card. Don’t fancy hand-writing JSON? The free Magic Prompt API auto-expands plain text into the structured format. Or just pull it into ComfyUI and drive it from a graph. The full technical write-up is on the Ideogram blog.

DesignArena leaderboard showing Ideogram 4.0 at the top of open-weight models
Ideogram 4.0 topped the DesignArena open-weight leaderboard on launch day. Source: Ideogram

IK3D Lab Take

We’ve watched a parade of open image models chase photorealism. Ideogram 4.0 chases something more useful to people who ship: control. Training on JSON instead of prose is the quietly radical move here — it treats an image as a structured document with addressable parts, which is exactly how a designer’s brain works and exactly what a pipeline needs. The non-commercial license stings, and we won’t pretend otherwise. But a 24 GB-friendly, layout-aware, typography-crushing model you can run locally and script against is a genuine gift to the creative-tech toolbox. Download it, feed it a JSON brief, and watch it put the text where you told it to. That alone makes it one of the most interesting drops of the summer.
