Bonsai Image 4B — Wonders of Quantization | AI News

PrismML, a startup specializing in extreme model compression, has quantized FLUX.2 Klein 4B down to one bit, and the result turned out surprisingly well. At this level of quantization, the Diffusion Transformer occupies only 930 megabytes in the 1-bit version and 1.2 gigabytes in the ternary version. The text encoder could not be compressed as aggressively, so the complete package weighs about 3.5 gigabytes. This lets the model run directly in the browser and on phones, using only 2 gigabytes of RAM. Generating a 512x512 image on the iPhone 17 Pro Max takes 9.4 seconds at 4 steps, which is decent given that offloading is involved. We are waiting for larger models for local deployment.
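To put these figures in perspective, here is a rough back-of-the-envelope sketch. It assumes that "4B" means roughly 4 billion weights in the Diffusion Transformer, which the article does not state explicitly, and that the 9.4 seconds are spread evenly over the 4 steps (ignoring text encoding and image decoding). The helper names are illustrative only and are not part of any PrismML tooling.

```python
import math

# Assumption: "4B" means about 4 billion transformer weights.
PARAMS = 4e9

def ideal_size_mb(params: float, bits_per_weight: float) -> float:
    """Size in megabytes (10^6 bytes) if every weight took exactly bits_per_weight bits."""
    return params * bits_per_weight / 8 / 1e6

def effective_bits(size_mb: float, params: float) -> float:
    """Average bits per weight implied by a reported size in megabytes."""
    return size_mb * 1e6 * 8 / params

# Theoretical minimum sizes with no overhead at all.
one_bit_ideal = ideal_size_mb(PARAMS, 1.0)           # binary weights
ternary_ideal = ideal_size_mb(PARAMS, math.log2(3))  # three states per weight

# Bits per weight implied by the sizes reported in the article.
one_bit_effective = effective_bits(930, PARAMS)
ternary_effective = effective_bits(1200, PARAMS)

# Assumption: generation time is split evenly across the denoising steps.
seconds_per_step = 9.4 / 4

print(f"ideal 1-bit size:   {one_bit_ideal:.0f} MB")
print(f"ideal ternary size: {ternary_ideal:.0f} MB")
print(f"implied bits/weight (1-bit build):   {one_bit_effective:.2f}")
print(f"implied bits/weight (ternary build): {ternary_effective:.2f}")
print(f"approx. time per step: {seconds_per_step:.2f} s")
```

Under these assumptions, the reported sizes come out larger than the theoretical minimum. That would be consistent with some overhead, such as scale factors or layers kept at higher precision, though the article does not say what accounts for the difference.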
