Text to Speech
Neural voice (Kokoro) and browser voices · everything runs locally in your browser.
About this tool
Neural Voice (Kokoro AI) uses a state-of-the-art text-to-speech model with 82 million parameters. It runs 100% in your browser.
Browser voices use your system's built-in Web Speech API. They are instant and need no download.
How the Web Speech API Works
Browsers expose a SpeechSynthesis interface (part of the Web Speech API, originally drafted by the W3C Speech API Community Group) that takes text and a chosen voice and produces audible speech via the underlying operating system's TTS engine. The full API surface is small but powerful: speechSynthesis.speak(utterance) starts speech, cancel() / pause() / resume() control playback, and getVoices() lists every voice the OS exposes. Each SpeechSynthesisUtterance carries the text, language tag, voice, rate, pitch, and volume.
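The call sequence described above can be sketched as follows. This is a minimal illustration, not this tool's actual code: `speakText`, `SynthLike`, `UtteranceLike`, and `SpeakOptions` are hypothetical names, and the small interfaces exist only so the sketch compiles outside a browser. The `en-US` default language is likewise an assumption.

```typescript
// Minimal structural types (assumed for this sketch). In a browser,
// window.speechSynthesis and SpeechSynthesisUtterance provide these members.
interface UtteranceLike {
  text: string;
  lang: string;
  rate: number;
  pitch: number;
  volume: number;
}

interface SynthLike {
  speak(utterance: UtteranceLike): void;
  cancel(): void;
}

interface SpeakOptions {
  lang?: string;
  rate?: number;
  pitch?: number;
  volume?: number;
}

// Hypothetical helper: configure one utterance and hand it to the synthesiser.
function speakText(
  synth: SynthLike,
  makeUtterance: (text: string) => UtteranceLike,
  text: string,
  options: SpeakOptions = {}
): UtteranceLike {
  const utterance = makeUtterance(text);
  utterance.lang = options.lang ?? "en-US"; // assumed default language tag
  utterance.rate = options.rate ?? 1;
  utterance.pitch = options.pitch ?? 1;
  utterance.volume = options.volume ?? 1;
  synth.cancel(); // drop anything still queued or speaking
  synth.speak(utterance);
  return utterance;
}
```

In a browser, the first two arguments would be `window.speechSynthesis` and `(t) => new SpeechSynthesisUtterance(t)`. Calling `cancel()` first is a design choice for a single Speak button; omit it if new text should queue behind the current speech instead.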
The audio itself is generated by the OS, not the browser. macOS and iOS ship with dozens of high-quality voices built into the system. Windows surfaces voices installed via Settings → Time & Language → Speech. Android uses Google's Text-to-Speech engine (or alternatives like Samsung TTS). Linux falls through to whatever speech-dispatcher / espeak setup the distro provides, often robotic-sounding by default unless you've installed a richer engine.
The Cloud-vs-Local Privacy Distinction
Not every "browser" voice runs on your device. Some browsers send the text to a remote server to render the audio for higher-quality voices, then stream the result back. This matters for privacy:
- Safari (macOS / iOS): synthesis runs entirely on-device. Apple's voices, including the Siri-style natural ones, are bundled in the OS. No text leaves the device.
- Chrome (desktop and Android): for some voices labelled "Google", the text is sent to Google's TTS service to render the audio. Other Chrome voices that mirror local OS voices stay on-device. The SpeechSynthesisVoice.localService property tells you which is which (true = on-device, false = cloud).
- Microsoft Edge: similar pattern. Edge's high-quality "Online Natural" voices route text to Microsoft's cloud TTS; the standard OS voices are local.
- Firefox: Web Speech API support has historically been limited; on systems where it works, it uses the OS engine.
If your text is sensitive (drafts of confidential documents, internal company memos, anything you wouldn't want copied to a third party), pick a voice marked as local. If you don't see local voices in the dropdown, install OS voice packs and they'll appear there.
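Filtering on localService can be sketched like this. The names `VoiceInfo`, `onDeviceVoices`, and `describeVoice` are hypothetical, and the label format is only an illustration. The result is only as accurate as what the browser reports in `localService`.

```typescript
// Only the fields this sketch reads; browser SpeechSynthesisVoice objects have them.
interface VoiceInfo {
  name: string;
  lang: string;
  localService: boolean;
}

// Keep the voices that report on-device synthesis.
function onDeviceVoices<V extends VoiceInfo>(voices: readonly V[]): V[] {
  return voices.filter((voice) => voice.localService);
}

// Hypothetical dropdown label that makes the local/cloud distinction visible.
function describeVoice(voice: VoiceInfo): string {
  const where = voice.localService ? "on-device" : "cloud";
  return `${voice.name} (${voice.lang}, ${where})`;
}
```

In a browser, the input would be `speechSynthesis.getVoices()`. For sensitive text, populating the dropdown from `onDeviceVoices(...)` alone is one way to avoid picking a cloud voice by accident.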
Common Use Cases
- Accessibility. Screen readers (VoiceOver, NVDA, JAWS, TalkBack) handle the heavy lifting for blind and low-vision users, but a quick TTS tool helps anyone with dyslexia, eye strain, or fatigue get text read aloud occasionally.
- Proofreading. Hearing your own writing read back catches awkward sentences, missing words, and rhythm problems that silent reading slides past. Common professional-writer trick.
- Language learning pronunciation. Hear words spoken in the target language; helpful when reading a foreign article and unsure how a word sounds.
- Reading articles aloud while doing chores. Cooking, cleaning, exercising, commuting, anywhere reading isn't practical but listening is.
- Voiceover drafts. Quickly mock up a narration to test pacing before recording with a real voice actor or commissioning a paid TTS service like ElevenLabs.
- Education. Generating spoken material for classroom content, vocabulary drills, dictation practice, accessibility for diverse learners.
Quirks and Limitations to Know About
- Chrome's long-text cut-off. A long-standing Chromium bug (679437) makes speak() stop after roughly 15 seconds, typically around 200–250 characters. Workarounds split the text into sentence-length chunks and call speak() for each.
- The voiceschanged event. The first call to speechSynthesis.getVoices() on Chrome returns an empty array. The voices populate asynchronously; pages need to listen for the voiceschanged event before showing the voice list.
- User-gesture requirement. Like autoplay-with-audio, browsers block speech synthesis until the user clicks or taps something. The Speak button satisfies that gesture; programmatic speech on page load won't work.
- iOS Low Power Mode. When the iPhone is in Low Power Mode, Safari sometimes refuses to start speech synthesis until the mode is disabled.
- Pause / resume bugs on Android Chrome. Pausing and resuming sometimes drops the queue. If reliability matters, restart from speak() rather than relying on pause() / resume().
- Out-of-range rate / pitch silently fails. Setting rate above ~3.0 or below 0.1, or pitch above 2.0, causes some engines to produce no audio at all instead of capping the value.
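The chunking workaround for the Chrome cut-off could look roughly like this. `splitIntoChunks` is a hypothetical helper, and the 200-character default is an assumption taken from the lower end of the range quoted above. The sentence detection is deliberately naive (it splits on ., ! and ?), so abbreviations and decimals may produce extra breaks.

```typescript
// Split text into chunks of at most maxLength characters, preferring sentence
// boundaries and falling back to word boundaries for very long sentences.
// A single word longer than maxLength is kept whole rather than cut.
function splitIntoChunks(text: string, maxLength = 200): string[] {
  // A "sentence" is a run of text ending in . ! or ? (or the end of the input).
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";

  // Push the pending chunk (if any) and start a new one.
  const flush = (): void => {
    const trimmed = current.trim();
    if (trimmed.length > 0) chunks.push(trimmed);
    current = "";
  };

  for (const sentence of sentences) {
    if (sentence.trim().length > maxLength) {
      // Sentence too long on its own: pack it word by word.
      flush();
      for (const word of sentence.trim().split(/\s+/)) {
        if (current.length > 0 && current.length + 1 + word.length > maxLength) flush();
        current = current.length > 0 ? `${current} ${word}` : word;
      }
      flush();
    } else {
      // Start a new chunk if adding this sentence would exceed the limit.
      if ((current + sentence).trim().length > maxLength) flush();
      current += sentence;
    }
  }
  flush();
  return chunks;
}
```

In a browser, each chunk would then be passed to `speechSynthesis.speak(new SpeechSynthesisUtterance(chunk))`; utterances are queued and spoken in order, so a simple loop is enough.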
Why Voice Quality Varies So Much
The quality of a TTS voice depends entirely on the underlying engine, which depends on the OS, which depends on what you've installed. Older voices (eSpeak, Microsoft Anna, the old Mac "Fred") rely on formant or concatenative synthesis and sound robotic and stilted. Modern voices (Apple's Siri voices, Microsoft's Online Natural voices, Google's WaveNet-based voices, ElevenLabs' subscription voices) use deep learning to generate audio that's nearly indistinguishable from a human reader.
If the voices in your dropdown sound robotic, the fix isn't in this tool; it's installing better voices in your OS:
- Windows: Settings → Time & Language → Speech → Add voices. Microsoft's "Online Natural" voices, available in Edge, are dramatically better than the offline defaults, but they are cloud voices rather than installed ones.
- macOS: System Settings → Accessibility → Spoken Content → System Voice → Manage Voices. Look for "Premium" / "Enhanced" voices; they download in the background and significantly improve quality.
- iOS: Settings → Accessibility → Spoken Content → Voices. Same naming convention as macOS.
- Android: Settings → Accessibility → Text-to-speech output → Google → Install voice data.
- Linux: install festival or mbrola for better-than-eSpeak quality, or use a cloud TTS via API.
Common Mistakes
- Expecting Firefox to support it. Firefox's Web Speech API support has lagged. The Speak button will be disabled when you visit in Firefox; use a Chromium-based browser or Safari for reliable TTS.
- Pasting confidential text into a Chrome session and assuming it's local. The default Chrome "Google" voices send text to Google's TTS service. Pick a local voice or use Safari for sensitive content.
- Long blocks of text in Chrome. The 15-second / ~250-character cut-off catches anyone who pastes a paragraph and expects it to read all the way through. Either split the text or use Safari (no cut-off).
- Setting rate or pitch too far out of range. Some engines don't clamp; they silently produce no audio. Stay within rate 0.5–2.5 and pitch 0.5–1.5 for predictable results.
- Treating browser TTS as production-quality voiceover. Even the best browser voices are good enough for proofreading, accessibility, and rough drafts, not for published podcasts or commercial voiceover. For that, look at ElevenLabs, Murf, or similar paid services.
- Forgetting that voices load asynchronously. The first page visit on Chrome may show no voices; wait for the voiceschanged event, or refresh after a moment, and they'll appear.
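Since some engines do not cap out-of-range values themselves, a page can clamp rate and pitch before assigning them to an utterance. The sketch below uses the conservative ranges recommended in the list above; `clampOrDefault`, `safeRate`, and `safePitch` are hypothetical names, and falling back to 1 for non-numeric input is an assumption.

```typescript
// Clamp value into [min, max]; use fallback when value is NaN or infinite.
function clampOrDefault(value: number, min: number, max: number, fallback: number): number {
  if (!Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Conservative bounds from the recommendation above; these are narrower than
// the limits at which engines were said to fail.
function safeRate(rate: number): number {
  return clampOrDefault(rate, 0.5, 2.5, 1);
}

function safePitch(pitch: number): number {
  return clampOrDefault(pitch, 0.5, 1.5, 1);
}
```

A slider handler would then assign `utterance.rate = safeRate(sliderValue)` rather than passing the raw value through.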
More Frequently Asked Questions
How do I tell if a voice is local or cloud-based?
Programmatically, the SpeechSynthesisVoice.localService property is true for on-device voices and false for cloud-based ones. In practice, naming conventions help: Chrome's voices labelled "Google" are usually cloud-based, while voices that match your OS's installed voices (Microsoft David, Apple Samantha) are local if the OS has them. Safari's voices are always local.
Can I save the audio as an MP3 file?
Not with the browser's Web Speech API directly; the spec doesn't expose the audio stream for capture. If you need a downloadable MP3 / WAV, options include: an audio recorder such as Audacity capturing your system audio, a paid TTS API (Google Cloud TTS, Amazon Polly, ElevenLabs) that returns the audio file, or a screen-recording app capturing the playback.
Why is the audio choppy or stopping mid-sentence?
The most common cause on Chrome is the long-text bug: speech stops at ~15 seconds. Refresh and try again with a shorter passage, or switch to Safari, which doesn't have that limit. Other causes: a system glitch in the OS speech engine (a restart usually fixes it), or a cloud voice failing to fetch when offline (switch to a local voice).
Does this work in any language?
Any language your operating system has a voice installed for. macOS and iOS ship with dozens of languages built in. Windows requires installing speech packs per language (Settings → Time & Language → Speech → Add voices). Android needs Google TTS or a third-party engine to have the language data downloaded. The Voice dropdown lists everything available; the language tag (en-US, fr-FR, ja-JP, etc.) tells you which language each voice produces.
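Narrowing the voice list by language tag can be sketched as below. `TaggedVoice`, `primaryLanguage`, and `voicesForLanguage` are hypothetical names. Accepting an underscore separator (`en_GB`) as well as a hyphen is a defensive assumption, on the basis that some platforms may report tags that way.

```typescript
// Only the fields this sketch reads from a voice object.
interface TaggedVoice {
  name: string;
  lang: string;
}

// Primary language subtag, lower-cased: "en-US" -> "en", "pt_BR" -> "pt".
function primaryLanguage(tag: string): string {
  const [primary = ""] = tag.split(/[-_]/);
  return primary.toLowerCase();
}

// All voices whose primary language matches, regardless of region.
function voicesForLanguage<V extends TaggedVoice>(voices: readonly V[], language: string): V[] {
  const wanted = primaryLanguage(language);
  return voices.filter((voice) => primaryLanguage(voice.lang) === wanted);
}
```

For example, `voicesForLanguage(speechSynthesis.getVoices(), "fr")` in a browser would return every French voice the system exposes, whether tagged fr-FR, fr-CA, or another region.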
Is this useful for podcasting?
For drafts and pacing tests, yes. For published episodes, the quality bar is higher: even the best browser voices have subtle artefacts that listeners pick up on quickly. Paid services like ElevenLabs and Murf offer voice models trained for long-form narration and produce noticeably better results, often at a few cents per thousand characters.
Can I use this for blind / low-vision users on my own site?
A site doesn't usually need to embed TTS for accessibility; assistive technologies like screen readers (VoiceOver on Apple devices, NVDA / JAWS on Windows, TalkBack on Android) handle that universally. Embedded TTS is more useful as an occasional read-aloud convenience for learners or for sighted users with reading fatigue. For accessibility, focus on semantic HTML, ARIA labels, keyboard navigation, and contrast. Those help every screen reader work better, including the user's own.