ERNIE Image — a new open text2image generator from Baidu
TG AI News·April 14, 2026 at 9:05 PM·
This fairly compact 8B model holds its own on benchmarks against the significantly larger Qwen Image and outperforms Z-image.
It renders text surprisingly well for its size and resolution (1MP).
Architecturally, it is a single-stream MM-DiT: ERNIE Image feeds text tokens and image patches into one shared transformer from the very first layer. There are no parallel per-modality branches as in Flux, and all weights are shared. This is simpler and more compact, while the quality is comparable. The design resembles Z-image, but is simpler.
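To make the single-stream idea concrete, here is a minimal toy sketch in plain Python. It is not ERNIE Image's actual code: the function names, the tiny dimensions, and the random weights are all made up for illustration. It only shows the core move described above, where text tokens and image patches are concatenated into one sequence and passed through the same attention weights.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_stream_attention(text_tokens, image_patches, Wq, Wk, Wv):
    """One self-attention pass over the joint [text; image] sequence.

    The same Wq/Wk/Wv are applied to every token regardless of modality,
    so there is no separate text branch and image branch.
    """
    seq = text_tokens + image_patches  # one joint sequence
    q = [matvec(Wq, t) for t in seq]
    k = [matvec(Wk, t) for t in seq]
    v = [matvec(Wv, t) for t in seq]
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random matrix standing in for real data."""
    return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4
text = random_matrix(3, dim, rng)     # 3 toy text tokens
patches = random_matrix(5, dim, rng)  # 5 toy image patches
Wq, Wk, Wv = (random_matrix(dim, dim, rng) for _ in range(3))

out = single_stream_attention(text, patches, Wq, Wk, Wv)
print(len(out), "output tokens of width", len(out[0]))
```

A dual-stream design would instead keep separate projection weights for the text and image tokens; sharing one set is what makes the single-stream variant more compact.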
One interesting detail: the authors fine-tuned a 3B LLM to rephrase user prompts, which noticeably improves results, though the model can also be run without it. Along with the regular weights, they are releasing a Turbo version, which needs only 8 steps for generation.
The model runs on 24GB VRAM, and the weights are under Apache 2.0 (a permissive license that also allows commercial use).
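A quick back-of-the-envelope check shows why 24GB is plausible for an 8B model. The post does not say which precision the weights use, so the 2 bytes per parameter (bf16/fp16) below is an assumption, and the estimate covers only the 8B image model's weights, not the text encoder, the VAE, or activations.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 8_000_000_000  # 8B parameters, as stated in the post

# Precision choices are assumptions; the post does not specify one.
for name, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    gb = weight_memory_gb(NUM_PARAMS, nbytes)
    print(f"{name}: {gb:.0f} GB of weights")
```

At 2 bytes per parameter the weights take about 16 GB, which leaves some headroom on a 24GB card. At full fp32 they would take about 32 GB and would not fit.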
Let's test it. I have already deployed it on my H200, and the model is indeed good.