ERNIE Image — a new open text2image generator from Baidu

TG AI News · April 14, 2026 at 9:05 PM
A fairly compact 8B model, ERNIE Image holds its own on benchmarks against the significantly larger Qwen Image and also surpasses Z-image. It renders text surprisingly well for its size and resolution (1MP).

Architecturally, it is a single-stream MM-DiT. ERNIE Image feeds text tokens and image patches into a common transformer from the very first layer: there are no parallel branches (as in Flux), and all weights are shared between the two modalities. This is simpler and more compact, while the quality is comparable. The design resembles Z-image, but is simpler still.

A few interesting details: the authors tuned a 3B LLM to rephrase user prompts, which noticeably improves results, but the model can also be run without it. Along with the regular weights, they are releasing a Turbo version that needs only 8 steps per generation. The model runs on 24GB of VRAM, and the weights are under Apache 2.0 (a permissive license that allows commercial use).

Go ahead and test it. I have already deployed it on my H200, and the model is indeed good.
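To make the single-stream idea concrete, here is a minimal pure-Python sketch of joint attention over one concatenated sequence. It is not Baidu's implementation: the hidden size, token counts, random weights, and all function names are hypothetical, and it leaves out multi-head attention, normalization, MLP layers, positional encoding, and timestep conditioning. The only point it illustrates is that text tokens and image patches share one sequence and one set of projection weights.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Small random matrix standing in for embeddings or learned weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def attention_weights(sequence, w_q, w_k):
    """Scaled dot-product attention weights for every pair of tokens."""
    q = matmul(sequence, w_q)
    k = matmul(sequence, w_k)
    k_t = [list(col) for col in zip(*k)]
    scale = 1.0 / math.sqrt(len(w_q[0]))
    return [softmax([s * scale for s in row]) for row in matmul(q, k_t)]

def single_stream_attention(text_tokens, image_patches, w_q, w_k, w_v):
    """Joint self-attention: one sequence, one shared set of weights."""
    # Single stream: both modalities are joined before the first block,
    # so there is no separate text branch or image branch.
    sequence = text_tokens + image_patches
    weights = attention_weights(sequence, w_q, w_k)
    return matmul(weights, matmul(sequence, w_v))

rng = random.Random(0)
DIM = 8  # hypothetical hidden size, far smaller than any real model
text_tokens = random_matrix(3, DIM, rng)    # 3 text tokens
image_patches = random_matrix(4, DIM, rng)  # 4 image patches
w_q, w_k, w_v = (random_matrix(DIM, DIM, rng) for _ in range(3))

output = single_stream_attention(text_tokens, image_patches, w_q, w_k, w_v)
print(len(output), len(output[0]))  # one output row per token in the joint sequence
```

In a dual-stream block, by contrast, text and image tokens would each get their own projection matrices and only meet inside the attention step, which roughly doubles the per-block parameter count.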