Gemma 4 12B
TG AI News · June 3, 2026 at 5:37 PM
The model accepts text, audio, images, and video as input. Video is limited to 30 seconds and audio to 60 seconds. It is a reasoning model with a 256k context window, released under the Apache 2.0 license.
The most interesting part of the release is how multimodality is implemented. Multimodal models typically rely on a separate encoder, but here simple linear projections are used instead, which need fewer parameters and less computation.
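To make the idea concrete, here is a minimal sketch of what a linear projection for image input could look like: the image is cut into patches, each patch is flattened, and a single matrix multiplication maps it to a token vector. Since there is no technical report, this is not Gemma's actual implementation. The function names, the patch size, and the toy dimensions are all assumptions chosen for illustration.

```python
import random

def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            flat = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    flat.extend(image[y][x])  # append the C channel values of this pixel
            patches.append(flat)
    return patches

def linear_project(patches, weight, bias):
    """Map each flattened patch to a d_model-sized vector: token = patch @ W + b."""
    d_model = len(bias)
    return [
        [sum(p[i] * weight[i][j] for i in range(len(p))) + bias[j] for j in range(d_model)]
        for p in patches
    ]

def make_projection(in_dim, d_model, seed=0):
    """Randomly initialised weights (hypothetical; a real model would learn these)."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(in_dim)]
    bias = [0.0] * d_model
    return weight, bias

# Toy sizes, assumed purely for illustration.
PATCH, CHANNELS, D_MODEL = 4, 3, 8
image = [[[(x + y + c) / 32.0 for c in range(CHANNELS)] for x in range(8)] for y in range(8)]

weight, bias = make_projection(PATCH * PATCH * CHANNELS, D_MODEL)
image_tokens = linear_project(patchify(image, PATCH), weight, bias)
print(len(image_tokens), len(image_tokens[0]))  # prints "4 8"
```

In this sketch the only modality-specific parameters are the `in_dim * d_model` weights plus `d_model` biases, which is the sense in which a projection is cheaper than a full multi-layer encoder.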
Unfortunately, there is no technical report yet, so it is unclear how the model was trained. I hope the report, like the larger Gemma 4 124B, will eventually be released.