SARVAM AI LAUNCHES BULBUL-V2, INDIA’S FIRST MULTILINGUAL VOICE AI MODEL (07.05.25)

Authored by Mr. Abhishek (Student, Symbiosis Law School, Noida)

In a notable development for Indian artificial intelligence innovation, Sarvam AI has officially launched Bulbul-v2, its flagship text-to-speech (TTS) model supporting 11 Indian languages. The Bengaluru-based AI startup, known for its India-first approach, says the new voice AI system offers high-speed performance, realistic voices, and a wide range of customisation features suited to commercial and brand use.

 

Authentic Indian Voices Take Centre Stage

According to Sarvam AI, Bulbul-v2 is “a text-to-speech (TTS) model that supports 11 Indian languages.” What sets this model apart is its emphasis on authenticity. As per a company post on LinkedIn, “the AI-generated voice sounds real and not robotic or rehearsed.” The startup claims the accents used in the voice model are native and truly reflect the diversity of Indian linguistic tones, stating that the voices sound “just like India.”

This is part of Sarvam’s broader goal of developing AI that is not only world-class in performance but also culturally and linguistically relevant to India.

Benchmark for Indian Speech AI

In the same announcement, the company made bold claims about the potential of Bulbul-v2, stating that it has “set new benchmarks for speech AI in India.” With a clear focus on scalability and utility, the startup noted that it is committed to “making AI more accessible in the country with lower-latency models and India-first pricing for API access.”

 

Part of India’s Sovereign LLM Initiative

Sarvam AI is also in the spotlight for being “the first startup chosen by the central government to build India’s sovereign large language model (LLM)” as a part of the IndiaAI mission, a national effort aimed at establishing technological self-reliance in foundational AI capabilities. The inclusion of Sarvam AI in this initiative further underscores the government’s trust in the startup’s expertise and innovative direction.

 

What Is Bulbul-v2?

The newly launched Bulbul-v2 is described as Sarvam’s “flagship text-to-speech model that has been specifically designed for Indian languages and accents.” The model’s design incorporates natural-sounding speech with human-like prosody and supports multiple voice personalities. It is built to handle multi-language and code-mixed text, making it especially relevant in a country where multilingual communication is common.

Additionally, Bulbul-v2 provides “real-time synthesis capabilities” along with “fine-grained control over pitch, pace, and loudness.”

 

Advanced Features Tailored for Business and Brands

Bulbul-v2 is equipped with a suite of technical features aimed at deeper personalisation and ease of use. According to Sarvam AI, the model includes “voice control, sample rate options, text preprocessing, and language support.”

The system supports multiple sample rates from 8 kHz to 24 kHz, making it adaptable to different audio quality needs. Moreover, it features “smart normalisation of numbers, dates, and mixed-language text,” allowing for better pronunciation and natural delivery, which is particularly useful in business applications such as call centres, content localisation, and automated messaging.
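To give a rough sense of what normalising numbers before synthesis involves, the toy sketch below spells out short standalone numbers in English so that a speech engine reads words rather than digits. It is an illustration only and does not represent Sarvam AI's implementation; the function names are hypothetical, and a real system would also need to handle dates, decimals, currencies, and Indian-language number words.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English (hypothetical helper)."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalise_numbers(text: str) -> str:
    """Replace standalone 1-3 digit numbers with words.

    Longer digit runs (years, phone numbers) are left untouched, and
    decimals are not handled; both would need rules of their own.
    """
    return re.sub(
        r"\b\d{1,3}\b",
        lambda match: number_to_words(int(match.group())),
        text,
    )
```

For instance, normalise_numbers("Your order 42 arrives in 3 days") returns "Your order forty-two arrives in three days", while a four-digit year such as 2025 passes through unchanged.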

 

What Can Bulbul-v2 Do?

The core functionality of the model lies in its ability to convert text to speech with default settings or to let users tweak voice parameters for more customised output. Sarvam AI says the model gives “fine-tune control over voice characteristics by adjusting pitch, pace, and loudness.” The company added that Bulbul-v2 “is perfect for creating the exact voice style one needs.”
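As a minimal sketch of how an application might organise these three controls, the snippet below bundles pitch, pace, and loudness into a settings object and builds a request dictionary from it. Everything here is assumed for illustration: the field names, the value ranges, and the build_request helper are hypothetical and are not taken from Sarvam AI's API documentation, which should be consulted for the actual parameters.

```python
from dataclasses import dataclass, asdict

# Illustrative ranges only; these are assumptions, not documented limits.
PITCH_RANGE = (-1.0, 1.0)
PACE_RANGE = (0.5, 2.0)
LOUDNESS_RANGE = (0.5, 2.0)


def _clamp(value: float, bounds: tuple) -> float:
    """Keep a value inside the inclusive (low, high) bounds."""
    low, high = bounds
    return max(low, min(high, value))


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the three voice controls."""
    pitch: float = 0.0      # 0.0 means the voice's default pitch
    pace: float = 1.0       # 1.0 means normal speaking speed
    loudness: float = 1.0   # 1.0 means default volume

    def clamped(self) -> "VoiceSettings":
        """Return a copy with every value forced into its assumed range."""
        return VoiceSettings(
            pitch=_clamp(self.pitch, PITCH_RANGE),
            pace=_clamp(self.pace, PACE_RANGE),
            loudness=_clamp(self.loudness, LOUDNESS_RANGE),
        )


def build_request(text: str, language_code: str,
                  settings: VoiceSettings = VoiceSettings()) -> dict:
    """Assemble a hypothetical TTS request body as a plain dictionary."""
    return {
        "text": text,
        "language_code": language_code,
        **asdict(settings.clamped()),
    }


# Default settings versus a slower, slightly louder variant.
default_request = build_request("Namaste", "hi-IN")
custom_request = build_request(
    "Namaste", "hi-IN", VoiceSettings(pitch=-0.1, pace=0.9, loudness=1.1)
)
```

Keeping the settings in one immutable object means a brand can define a voice style once and reuse it across every request, rather than repeating three loose numbers throughout the codebase.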

Whether the use case demands a warm, welcoming tone or a professional and direct voice, Bulbul-v2’s flexibility allows users to design their brand’s voice identity with precision. The model’s “sample rate options” offer further control over audio fidelity, helping businesses align voice quality with their platform needs.
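The trade-off behind the sample rate choice can be shown with simple arithmetic: uncompressed audio grows in direct proportion to the sample rate. The sketch below estimates the size of raw PCM audio, assuming 16-bit mono samples. Only the 8 kHz and 24 kHz endpoints come from the announcement; the bit depth, the channel count, and the function name are assumptions made for illustration.

```python
def pcm_size_bytes(sample_rate_hz: int, seconds: float,
                   bits_per_sample: int = 16, channels: int = 1) -> int:
    """Estimate the size of uncompressed PCM audio in bytes.

    size = samples per second * duration * bytes per sample * channels
    """
    total_samples = round(sample_rate_hz * seconds)
    bytes_per_sample = bits_per_sample // 8
    return total_samples * bytes_per_sample * channels


# Ten seconds of speech at the lowest and highest rates mentioned.
telephony_bytes = pcm_size_bytes(8_000, 10)    # telephone-grade audio
high_quality_bytes = pcm_size_bytes(24_000, 10)
```

Under these assumptions, ten seconds of audio occupy 160,000 bytes at 8 kHz and 480,000 bytes at 24 kHz, which is why a call-centre line might favour the lower rate while a media application might choose the higher one.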

 

Performance and Accessibility

One of the key selling points of Bulbul-v2 is its low latency. Its fast response time, combined with localised pricing, positions it as a cost-effective alternative to international counterparts. These localised features and pricing could attract many small and medium-sized Indian businesses looking to leverage AI-driven voice technology without heavy infrastructure investments.

The startup also highlighted its “India-first pricing for API access,” a move that aims to lower entry barriers for AI adoption across different sectors, including education, healthcare, media, and customer service.

 

Bulbul-v1 Laid the Foundation

Bulbul-v2 builds upon the success of Bulbul-v1, which was launched in August 2024. The original model came with six preset voice personalities and marked Sarvam AI’s first major entry into the voice tech market. With version 2, the startup appears to be stepping into a more refined, enterprise-ready space.

 

Looking Ahead

The launch of Bulbul-v2 marks an important chapter in India’s homegrown AI journey. With its localisation, customisation, and support for Indian languages, the model not only reflects technical advancement but also cultural relevance. For startups, brands, and government agencies looking to localise their AI communication, Bulbul-v2 offers a powerful, efficient, and affordable solution.

With its inclusion in the IndiaAI mission and the government’s push for digital sovereignty, Sarvam AI seems poised to play a major role in shaping the future of AI in India.
