Microsoft Introduces 3 Foundational AI Models To Take on OpenAI, Anthropic
extremetech.com
On Thursday, Microsoft introduced three new foundational AI models (MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2) focused on transcription, audio, and image generation, respectively. The tech giant positions them as in-house systems that will provide it with better control over cost, performance, and integration across its software and cloud services.
MAI-Transcribe-1 offers speech-to-text transcription in 25 languages. This could be used to create instant transcripts of Teams meetings or customer-facing phone calls. Microsoft describes MAI-Transcribe-1 as "lightning fast," meaning it should produce captions or transcripts with very low latency. The company also reports that the model has a lower word error rate than GPT-Transcribe, Gemini 3.1 Flash, and other transcription-focused AI models.
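For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn a model's transcript into a reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only. The function name `word_error_rate` and the simple lowercase, whitespace-based tokenization are assumptions for illustration, and nothing here reflects how Microsoft ran its own benchmarks.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: real evaluations usually normalize punctuation,
    numerals, and casing more carefully than this sketch does.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(word_error_rate("the meeting starts at noon",
                          "the meeting start at noon"))
```

A lower value means fewer mistakes per reference word, so a comparison like the one Microsoft cites depends on all models being scored against the same reference transcripts.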
MAI-Voice-1 is a voice-generation model aimed at providing "voice experiences and voice agents" with nuance and emotional expression. It can reportedly produce 60 seconds of audio in just one second.
Finally, MAI-Image-2 targets marketing, design, and other professionals who ...
Copyright of this story solely belongs to extremetech.com.

