Pruna AI Raises $6.5M to Make AI Models Leaner, Faster

Image credits: EQT Group

European AI model compression startup Pruna AI is opening up its technology to the global developer community, unveiling an open-source framework designed to tackle one of AI’s biggest challenges—model bloat.

As AI models grow larger and more expensive to operate, developers are increasingly searching for smarter ways to reduce size and cost without hurting performance. That’s where Pruna AI’s compression framework steps in—bringing together multiple efficiency techniques under one roof.

Rather than focusing on a single method, Pruna's system combines established techniques such as pruning, quantization, knowledge distillation, and caching into a unified platform. The result? Developers can shrink models significantly while maintaining control over quality and speed.
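To make one of those techniques concrete, here is a minimal sketch of post-training dynamic quantization using plain PyTorch. Note this is generic PyTorch, not Pruna's own API, and the toy model and layer sizes are placeholders.

```python
import os
import torch
import torch.nn as nn

# A toy network standing in for a real model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Post-training dynamic quantization: Linear weights are stored as int8,
# and activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    """Approximate serialized size of a model's parameters in megabytes."""
    torch.save(m.state_dict(), "tmp.pt")
    mb = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return mb

print(f"fp32: {size_mb(model):.2f} MB -> int8: {size_mb(quantized):.2f} MB")
```

Frameworks like Pruna's layer evaluation and rollback on top of steps like this, so the quality impact of each technique can be measured rather than guessed.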

“AI models shouldn’t be black boxes when it comes to efficiency,” said co-founder and CTO John Rachwan. “We’ve built a system where developers can not only apply multiple optimization methods but also measure the impact of each one in real time.”

What makes Pruna’s platform stand out is its ability to balance performance with accuracy. Developers can experiment, adjust, and even reverse changes if the compression process threatens model quality—a common risk when using isolated optimization tools.

Big AI players like OpenAI have long used similar techniques behind closed doors to streamline their models. It’s how systems like GPT-4 Turbo likely deliver faster performance while keeping costs down. But until now, smaller teams lacked access to a full-stack optimization tool that could compete.

“There’s no shortage of one-off solutions out there,” Rachwan explained. “You might find a quantization library for language models or a caching tool for image generation. But what was missing is a framework that connects everything, standardizes the process, and simplifies evaluation. That’s what we’re offering.”

While Pruna AI’s system is compatible with a wide range of AI models—from large language models (LLMs) to computer vision and audio systems—the team is currently doubling down on image and video generation models, which are among the most resource-hungry.

Some early adopters, including companies like Scenario and PhotoRoom, have already integrated the framework to improve efficiency in production environments.

For larger organizations, Pruna AI is also rolling out premium features, including an automated optimization agent designed to make compression effortless. With this tool, developers simply set their priorities—speed, size, or accuracy—and the agent tests combinations behind the scenes to deliver the best outcome.

“Imagine telling the system, ‘I want to speed this model up but keep accuracy within 2%,’ and the tool does the rest,” Rachwan shared. “It’s like renting a GPU, but instead of raw power, you’re renting optimization expertise.”
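The article doesn't document the agent's actual interface, so the following Python sketch is purely hypothetical: it shows the kind of constrained objective a developer might express, plus a toy search over candidate recipes. Every name and number here is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class OptimizationGoal:
    """Hypothetical goal spec: maximize speed within a quality budget."""
    objective: str            # what to optimize, e.g. "latency"
    max_accuracy_drop: float  # tolerated quality loss, as a fraction

goal = OptimizationGoal(objective="latency", max_accuracy_drop=0.02)

# Placeholder candidates: (recipe, measured speedup, measured accuracy drop).
# These numbers are made up for the sketch, not benchmark results.
candidates = [
    ("int8 quantization", 1.8, 0.004),
    ("int8 + 30% pruning", 2.6, 0.015),
    ("int4 quantization", 3.1, 0.035),  # too lossy for this goal
]

# The agent keeps only recipes within the accuracy budget
# and returns the fastest of those.
best = max(
    (c for c in candidates if c[2] <= goal.max_accuracy_drop),
    key=lambda c: c[1],
)
print(f"picked: {best[0]} ({best[1]:.1f}x faster, {best[2]:.1%} accuracy drop)")
```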

Pruna’s approach is already proving effective. In one test case, the startup managed to compress a Llama model down to just 12.5% of its original size while keeping performance largely intact—a breakthrough that could help AI teams slash inference costs.
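For a sense of scale, some back-of-envelope arithmetic helps; the parameter count below is an assumption for illustration, not a figure from Pruna.

```python
# Hypothetical 8B-parameter Llama checkpoint stored in fp16 (2 bytes/weight).
params = 8e9
fp16_gb = params * 2 / 1e9        # ~16 GB of weights
compressed_gb = fp16_gb * 0.125   # 12.5% of the original footprint

print(f"fp16: {fp16_gb:.0f} GB -> compressed: {compressed_gb:.0f} GB")
# The same 8x ratio falls out of, say, 4-bit quantization combined with
# pruning half the weights: 0.25 * 0.5 = 0.125.
```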

The company operates on a usage-based model, charging customers hourly—similar to how cloud providers price GPU usage. The founders believe the long-term savings will far outweigh the upfront costs, especially for businesses running AI models at scale.

After raising $6.5 million in seed funding from investors like EQT Ventures, Daphni, Motier Ventures, and Kima Ventures, Pruna AI is now betting that model compression will become essential infrastructure for the AI industry.

“Our goal is simple,” said Rachwan. “We want teams to stop thinking of compression as optional. In a world where every model run costs money, efficient AI is the future.”

By releasing its framework to the open-source community, Pruna AI hopes to kickstart a shift in how developers think about model optimization—making compression a core part of the AI development pipeline, not an afterthought.
