Two undergraduates with limited AI backgrounds have developed an open-source AI voice model capable of generating podcast-like conversations, offering a new challenge to existing players in the voice synthesis market. Their creation, called Dia, was inspired by Google’s NotebookLM and gives users notable flexibility and control over synthetic voice generation. It was developed by Toby Kim and a fellow co-founder under the name Nari Labs.
Dia offers capabilities typically found in commercial products, including tone customization, disfluencies and non-verbal sounds (like coughs and laughs), and even voice cloning. Despite entering the AI field just three months ago, the duo has created a tool that’s already turning heads for its ease of use and surprisingly competitive performance.
The voice AI space is expanding rapidly. Major names like ElevenLabs dominate the landscape, but startups like PlayAI, Sesame, and now Nari Labs are entering the scene with fresh ideas. Venture capitalists are taking notice — according to PitchBook, voice AI startups attracted more than $398 million in funding in the last year alone.
Building Dia: Ambitions, Capabilities, and Accessibility
According to Kim, the team trained Dia using Google’s TPU Research Cloud, a program that grants free access to advanced AI chips for researchers. The model clocks in at 1.6 billion parameters — a size that strikes a balance between quality output and accessibility on consumer hardware. By comparison, many commercial-grade models remain inaccessible to the public either due to resource demands or closed licenses.
Available now on both Hugging Face and GitHub, Dia is designed to run on most PCs with at least 10GB of VRAM, making it accessible to hobbyists, students, and independent developers. The model defaults to generating a random voice when no prompt is given, but users can guide the output by providing a detailed style description. For more precise results, users can clone voices — a feature that has been made incredibly simple in Dia’s web interface.
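The 10GB VRAM figure is plausible given the model's size. As a back-of-envelope illustration (not an official figure from Nari Labs), the memory needed just to hold 1.6 billion parameters can be estimated as parameter count times bytes per parameter:

```python
# Rough estimate of why a 1.6B-parameter model fits in ~10GB of VRAM.
# This is illustrative arithmetic, not a published spec from Nari Labs.

def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the model weights, in gigabytes."""
    return num_params * bytes_per_param / 1e9

params = 1.6e9  # Dia's reported parameter count

fp32 = weight_memory_gb(params, 4)  # full precision
fp16 = weight_memory_gb(params, 2)  # half precision

print(f"fp32 weights: {fp32:.1f} GB")  # 6.4 GB
print(f"fp16 weights: {fp16:.1f} GB")  # 3.2 GB
```

Either precision leaves several gigabytes of headroom in a 10GB card for activations and generated audio, which is what makes the model practical on consumer hardware; true runtime usage also depends on sequence length and the inference framework.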
In a recent test by TechCrunch, Dia performed impressively. The model generated clear, lifelike two-way conversations on virtually any topic. The system responded promptly, and its voices sounded on par with more mature commercial tools. Notably, voice cloning worked with minimal effort, requiring just a few steps and offering convincing results.
Dia also supports rich conversational dynamics. Users can add non-verbal elements like sighs, laughter, or background sounds into the script, allowing for more realistic and emotionally expressive audio. These capabilities open up new creative possibilities for content creators, educators, game developers, and podcasters looking to enhance their storytelling or automate routine production tasks.
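To make the scripting idea concrete: Nari Labs' published examples mark speakers with bracketed tags like [S1] and [S2] and embed non-verbal cues in parentheses, e.g. (laughs). The helper below is a hypothetical convenience for assembling such a script as a plain string; it is an illustration of the format, not part of Dia's API.

```python
# Hypothetical helper for assembling a Dia-style dialogue script.
# The [S1]/[S2] speaker tags and parenthesized cues such as (laughs)
# follow the convention shown in Nari Labs' examples; build_script
# itself is an assumed convenience function, not Dia's API.

def build_script(turns: list[tuple[str, str]]) -> str:
    """Join (speaker_tag, text) pairs into a single script string."""
    return " ".join(f"[{tag}] {text}" for tag, text in turns)

script = build_script([
    ("S1", "Have you heard about the new open-source voice model? (laughs)"),
    ("S2", "I have! The cloning demo was surprisingly convincing."),
])
print(script)
```

A string like this would then be passed to the model as the generation prompt, with each tagged turn rendered in a distinct voice.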
The Ethical Concerns and Future Vision
Despite its impressive functionality, Dia, like many AI voice tools, lacks strong safeguards against misuse. There are currently few technical barriers preventing users from generating fraudulent or misleading content. Although Nari Labs urges responsible usage and warns against impersonation, deception, or illegal activity, the group explicitly states that it “isn’t responsible” for how others use the technology.
Another red flag is the lack of transparency around training data. Nari Labs has not shared what datasets were used to train Dia. A user on Hacker News pointed out that one sample generated by Dia sounds suspiciously similar to NPR’s Planet Money podcast. If true, this could mean that copyrighted content was used in training — a practice common in AI development but fraught with legal uncertainty.
While some companies argue that “fair use” allows them to train models on copyrighted works, this legal defense remains untested in most jurisdictions. Many rights holders disagree and believe that unauthorized use of their content for AI training constitutes copyright infringement.
Looking ahead, Kim says Nari Labs plans to evolve Dia into a socially driven voice AI platform, possibly including collaborative or community features that extend beyond speech generation alone. Future updates may support more languages and expand Dia’s feature set. The group also intends to publish a technical report on the model’s architecture and performance, which could offer further insight into how it compares with other voice generation tools.
For now, Nari Labs’ Dia stands as a compelling new tool in the open-source AI ecosystem — one that may empower a new generation of creators, while also raising familiar concerns about security, consent, and copyright in the age of synthetic media.