Liquid AI wants to give smartphones small, fast AI that can see with new LFM2-VL model

Liquid AI has launched LFM2-VL, a new family of vision-language foundation models designed for efficient deployment on smartphones, laptops, wearables, and embedded systems. The models promise up to twice the GPU inference speed of comparable vision-language models while maintaining competitive accuracy, addressing the growing demand for on-device AI that can process both text and images without relying on cloud infrastructure.
What you should know: LFM2-VL represents a significant step toward making multimodal AI accessible for resource-constrained devices through architectural innovations that prioritize efficiency.
- The models process images at native resolution up to 512×512 pixels without distortion, and handle larger images with smart patching that preserves both fine detail and global context (see the sketch after this list).
- Two variants are available: LFM2-VL-450M with less than half a billion parameters for highly constrained environments, and LFM2-VL-1.6B for more capable single-GPU deployment.
- Both models are built on Liquid AI's foundation architecture, which draws on dynamical systems and signal processing principles rather than the traditional transformer design.
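For intuition, here is a minimal sketch of the kind of tiling the "smart patching" description implies: images at or below 512×512 pixels pass through at native resolution, while larger ones are cut into 512×512 tiles alongside a downscaled thumbnail that keeps the global context. The tiling loop and thumbnail step are illustrative assumptions, not Liquid AI's published implementation.

```python
from PIL import Image

TILE = 512  # patch size described for LFM2-VL; the exact scheme below is an assumption


def smart_patch(img: Image.Image):
    """Illustrative sketch: keep small images at native resolution,
    split large ones into 512x512 tiles plus a downscaled thumbnail."""
    w, h = img.size
    if w <= TILE and h <= TILE:
        return [img], None  # processed natively, no resizing or distortion

    tiles = []
    for top in range(0, h, TILE):
        for left in range(0, w, TILE):
            tiles.append(img.crop((left, top, min(left + TILE, w), min(top + TILE, h))))

    # A low-resolution view of the whole scene preserves global context.
    thumb = img.copy()
    thumb.thumbnail((TILE, TILE))
    return tiles, thumb


# Example: a 1024x1024 photo becomes four 512x512 tiles plus one thumbnail.
patches, thumbnail = smart_patch(Image.new("RGB", (1024, 1024)))
print(len(patches), thumbnail.size if thumbnail else None)  # -> 4 (512, 512)
```

Under this reading, a 1024×1024 input yields four tiles plus a thumbnail, which is why larger images cost more image tokens than ones that fit the native 512×512 window.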
How it works: The models use a modular architecture combining multiple specialized components to achieve their efficiency gains.
- LFM2-VL combines a language model backbone, a SigLIP2 NaFlex vision encoder, and a multimodal projector that applies pixel unshuffle to reduce the number of image tokens and improve throughput (illustrated after this list).
- Users can adjust parameters like maximum image tokens or patches to balance speed and quality for specific deployment scenarios.
- Training involved approximately 100 billion multimodal tokens from open datasets and synthetic data.
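Pixel unshuffle is a standard space-to-depth operation, so its token-saving effect is easy to demonstrate. The sketch below uses PyTorch's built-in `pixel_unshuffle`; the feature-grid size and downscaling factor are illustrative assumptions rather than LFM2-VL's actual values.

```python
import torch
import torch.nn.functional as F

# Illustrative only: a 32x32 grid of vision-encoder features, i.e. 1,024 image tokens.
features = torch.randn(1, 768, 32, 32)  # (batch, channels, height, width)

# Pixel unshuffle folds each 2x2 neighborhood into the channel dimension,
# shrinking the spatial grid 4x while growing the channel count 4x.
reduced = F.pixel_unshuffle(features, downscale_factor=2)
print(features.shape, "->", reduced.shape)  # (1, 768, 32, 32) -> (1, 3072, 16, 16)

# Token count seen by the language model backbone:
tokens_before = features.shape[2] * features.shape[3]  # 1,024
tokens_after = reduced.shape[2] * reduced.shape[3]     # 256
print(tokens_before, "tokens ->", tokens_after, "tokens")
```

Leaving the language model a quarter as many image tokens to attend over is consistent with the throughput improvement described above.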
Performance benchmarks: LFM2-VL achieves competitive results across vision-language evaluations while delivering superior processing speeds.
- The 1.6B model scores 65.23 on RealWorldQA, 58.68 on InfoVQA, and 742 on OCRBench, maintaining solid performance on multimodal reasoning tasks.
- In inference testing, LFM2-VL achieved the fastest GPU processing times in its class on a standard workload of a 1024×1024 image and a short prompt.
The bigger picture: This launch builds on Liquid AI’s broader strategy to decentralize AI execution and reduce cloud dependency.
- In July 2025, the company launched the Liquid Edge AI Platform (LEAP), a cross-platform SDK enabling developers to run small language models directly on mobile devices.
- LEAP offers OS-agnostic support for iOS and Android with models as small as 300MB, accompanied by Apollo, a companion app for offline model testing.
- The approach reflects growing industry interest in privacy-preserving, low-latency AI that operates independently of internet connectivity.
What they’re saying: Liquid AI co-founder and CEO Ramin Hasani emphasized the company’s core value proposition in announcing the release.
- “Efficiency is our product,” Hasani wrote on X, highlighting the models’ “up to 2× faster on GPU with competitive accuracy” and “smart patching for big images.”
Company background: Liquid AI was founded by former MIT CSAIL researchers focused on building alternatives to transformer-based architectures.
- The company’s Liquid Foundation Models are based on principles from dynamical systems, signal processing, and numerical linear algebra.
- Their approach aims to deliver competitive performance using significantly fewer computational resources while enabling real-time adaptability during inference.
Availability and licensing: The models are immediately accessible through standard development channels with custom licensing terms.
- LFM2-VL models are available on Hugging Face with example fine-tuning code in Colab, and they are compatible with Hugging Face Transformers and TRL (a minimal loading sketch follows this list).
- They’re released under a custom “LFM1.0 license” based on Apache 2.0 principles, with commercial use permitted under different terms for companies above and below $10 million in annual revenue.
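As a starting point, the snippet below sketches how one of the checkpoints could be loaded and queried with the Transformers library. The repository ID, the `AutoModelForImageTextToText` class, and the chat-template call are assumptions based on the standard Transformers vision-language workflow; check the model card on Hugging Face for the authoritative usage and for the processor options that control the image-token budget.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

# Repository ID assumed from the Hugging Face listing; not verified here.
model_id = "LiquidAI/LFM2-VL-450M"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# One user turn containing an image and a short instruction.
image = Image.open("photo.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

# Tokenize the conversation with the model's chat template and generate a reply.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```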