Miso Labs has released MisoTTS, an open-weights 8B text-to-speech model. It uses residual vector quantization (RVQ) to scale its sonic range without scaling parameters, and conditions on both text and audio context to respond to speaker tone. The architecture pairs a 7.7B backbone with a 300M depth decoder.
The post Miso Labs Releases MisoTTS: An 8B Emotive Text-to-Speech Model with Open Weights appeared first on MarkTechPost.
Today on Decoder, I’m talking to Ryan Mac, a technology reporter at The New York Times and coauthor of the excellent book Character Limit: How Elon Musk Destroyed Twitter, which came out in 2024. I can’t recommend it enough.
I wanted to have Ryan on the show because we’re on the cusp of the SpaceX IPO, which promises to be one of the most consequential public offerings in history for a variety of reasons — its biggest-ever size, of course, at nearly $2 trillion dollars, but also because all kinds of rules that keep our markets fair are being bent, if not outright broken, along the way. I also wanted to talk to Ryan because buried somewhere inside SpaceX is X, the social platform formerly known as Twitter, which Musk purchased in 2022. That’s what Ryan cowrote that book about.
I was very confident that Musk would come to regret buying Twitter back then. I wrote a piece called “Welcome To hell, Elon,” which is probably the single most-read thing I’ve ever written. My thesis was that th
Text-to-speech changed fast in 2026. This guide ranks the leading commercial and open-weight TTS models, comparing quality, latency, cost, language coverage, and licensing so engineers can match a model to the job.
The post Best Text-to-Speech TTS Models in 2026: A Benchmark-Based Comparison appeared first on MarkTechPost.
The Seoul-based speech AI company ships its third generation of its on-device TTS engine, adding expressive tags, improved reading stability, and a 6× increase in language coverage — all while keeping the inference contract unchanged for existing integrations.
The post Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags appeared first on MarkTechPost.
Learn how the Voxtral TTS model works, what makes its voice cloning and low‑latency performance special, and how to start generating speech with just a few lines of Python code.
In this tutorial, we build an advanced hands-on workflow with the Deepgram Python SDK and explore how modern voice AI capabilities come together in a single Python environment. We set up authentication, connect both synchronous and asynchronous Deepgram clients, and work directly with real audio data to understand how the SDK handles transcription, speech generation, […]
The post A Coding Implementation on Deepgram Python SDK for Transcription, Text-to-Speech, Async Audio Processing, and Text Intelligence appeared first on MarkTechPost.
Today on Decoder, I want to lay out an idea that’s been banging around my head for weeks now as we’ve been reporting on AI and having conversations here on this show. I’ve been calling it software brain, and it’s a particular way of seeing the world that fits everything into algorithms, databases and loops — software.
Software brain is powerful stuff. It’s a way of thinking that basically created our modern world. Marc Andreessen, the literal embodiment of software brain, called it in 2011 when he wrote the piece “Why software is eating the world” as an op-ed in The Wall Street Journal. But software thinking has been turbocharged by AI in a way that I think helps explain the enormous gap between how excited the tech industry is about the technology and how regular people are growing to dislike it more and more over time.
In fact, the polling on this is so strong, I think it’s fair to say that a lot of people hate AI. And Gen Z in particular seems to hate AI more and more as they enco
Elon Musk’s AI company xAI has launched two standalone audio APIs — a Speech-to-Text (STT) API and a Text-to-Speech (TTS) API — both built on the same infrastructure that powers Grok Voice on mobile apps, Tesla vehicles, and Starlink customer support. The release moves xAI squarely into the competitive speech API market currently occupied by […]
The post xAI Launches Standalone Grok Speech-to-Text and Text-to-Speech APIs, Targeting Enterprise Voice Developers appeared first on MarkTechPost.
Google has introduced Gemini 3.1 Flash TTS, a preview text-to-speech model focused on improving speech quality, expressive control, and multilingual generation. Unlike previous iterations that prioritized simple conversion, this release emphasizes natural-language audio tags, native support for more than 70 languages, and native multi-speaker dialogue. This release signals a shift from ‘black-box’ audio generation toward […]
The post Google AI Launches Gemini 3.1 Flash TTS: A New Benchmark in Expressive and Controllable AI Voice appeared first on MarkTechPost.