TheStage AI, a Delaware-based startup founded by former Huawei engineers, has raised $4.5 million in funding to accelerate development of its AI inference platform. The company, which focuses on automated performance optimisation for diffusion models, is pushing the boundaries of generative AI speed with NVIDIA’s latest B200 GPUs through a strategic collaboration with Nebius AI Cloud.
In closed beta tests, TheStage AI reports a 3.5× performance boost in diffusion model inference on NVIDIA's B200 GPUs compared with the previous-generation H100, thanks to its automated compiler optimisations. According to CEO Kirill Solodskih, this acceleration enables their FLUX.1 model to reach 22.5 iterations per second on the B200, compared to 6.5 on the H100 using a standard PyTorch bf16 setup.
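As a quick sanity check on these figures (a minimal sketch using only the iteration rates quoted above), the reported throughput numbers do imply roughly the 3.5× speedup the company cites:

```python
# Quoted throughput for TheStage AI's FLUX.1 model, per the article:
# 22.5 iterations/s on NVIDIA B200 vs. 6.5 iterations/s on H100 (PyTorch bf16).
b200_iters_per_sec = 22.5
h100_iters_per_sec = 6.5

speedup = b200_iters_per_sec / h100_iters_per_sec
print(f"Implied speedup: {speedup:.2f}x")  # ~3.46x, consistent with the ~3.5x claim
```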
“Other platforms are still manually writing kernels for GPU support. Our automated approach allows instant adaptation to new architectures,” said Solodskih.
Unlocking Speed With Automated Optimisation
TheStage AI stands out with its automated AI compiler built for next-gen GPU architectures. Unlike other providers still adapting manually, TheStage AI delivers plug-and-play optimised models via Hugging Face, giving developers instant access to accelerated inference performance.
Its FLUX.1-schnell model can now generate 1024×1024 images in just 0.3 seconds, roughly half its previous generation time. Meanwhile, its more advanced FLUX-dev model completes the same task in 1.85 seconds, outperforming competing solutions that average 3.1 seconds.
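Putting the quoted latencies side by side (a small sketch; the numbers are those reported above, and the competitor figure is the article's stated average), the FLUX-dev comparison works out to roughly a 1.7× advantage:

```python
# Quoted 1024x1024 image-generation latencies (seconds), per the article:
flux_dev_latency = 1.85     # TheStage AI FLUX-dev
competitor_latency = 3.1    # average of competing solutions

advantage = competitor_latency / flux_dev_latency
print(f"FLUX-dev vs. competitor average: {advantage:.2f}x faster")  # ~1.68x
```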
These gains have real-world applications, especially as diffusion models power everything from AI art generators to product design tools. TheStage AI's technology reduces latency, operational cost, and energy use while improving user experience—a vital edge in a saturated market.
Partnership With Nebius AI Cloud
TheStage AI’s success is tightly linked to its early partnership with Nebius, a major AI cloud infrastructure provider. Nebius is one of the first platforms to deploy NVIDIA Blackwell-powered instances in the US and Europe, including GB200 NVL72 rack-scale systems and HGX B200 servers.
Together, the two companies report up to 3.5× latency reductions when combining TheStage’s software optimisations with B200 hardware. “This partnership shows the real-world impact of pairing next-gen GPUs with advanced software. It’s a leap forward for scalable diffusion model deployment,” said Aleksandr Patrushev, Head of ML/AI Product at Nebius.
Vision: From Text-to-Image to Text-to-Video
With this funding, TheStage AI plans to expand from text-to-image applications to text-to-video generation and large language models (LLMs). Solodskih notes their approach is built around three pillars: rapid hardware adaptation, broad model applicability, and a scalable roadmap.
The company is already in talks with partners building text-to-video platforms and sees its flexible, pre-compiled models becoming a standard tool for AI developers looking to cut costs and enhance performance at scale.
“We’re building an ecosystem where top-tier performance is accessible, adaptable, and hardware-agnostic,” Solodskih added.
By combining automated acceleration, seamless integration, and next-gen hardware readiness, TheStage AI is positioned to become a leading enabler of high-speed, cost-efficient generative AI.