FFmpeg adds first AI feature with Whisper audio transcription filter

Forward-looking: Although FFmpeg is often associated with video transcoding tasks, it can also handle audio streams and files with ease. The open-source project is now introducing its first AI-powered feature: an audio transcription filter based on a popular speech recognition model developed by OpenAI.

For the first time in its long history, FFmpeg is integrating AI models with the introduction of the new Whisper audio filter. This filter can process audio streams or files to perform automatic speech recognition, potentially simplifying media transcoding workflows – even for live events.

Whisper, developed by OpenAI, is a general-purpose speech recognition model trained on a large and diverse audio dataset. It supports multilingual transcription, speech translation, and language identification. The model is available in six different sizes, each offering a trade-off between speed and accuracy.

With Whisper, FFmpeg users can output transcriptions in multiple formats, including raw text, SRT subtitle files, and JSON. The ...
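To give a rough sense of how the output formats mentioned above might be selected, here is a minimal Python sketch that only assembles an ffmpeg command line and does not run it. The option names (model, language, destination, format) reflect the whisper filter as documented for FFmpeg 8.0 to the best of our knowledge; treat them as assumptions and check them against `ffmpeg -h filter=whisper` on a build compiled with whisper.cpp support. The helper name `whisper_command` and all file names are hypothetical.

```python
import shlex

# Output formats named in the article: raw text, SRT subtitles, JSON.
FORMATS = {"text", "srt", "json"}


def whisper_command(input_path, model_path, destination, fmt="srt", language="en"):
    """Build an ffmpeg argument list that runs the whisper audio filter.

    Assumption: the filter accepts model, language, destination and format
    options; verify against your own FFmpeg build before relying on this.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # Filter options are colon-separated key=value pairs.
    audio_filter = (
        f"whisper=model={model_path}"
        f":language={language}"
        f":destination={destination}"
        f":format={fmt}"
    )
    # -vn ignores any video stream; "-f null -" discards the processed audio,
    # because the filter itself writes the transcription to `destination`.
    return ["ffmpeg", "-i", input_path, "-vn", "-af", audio_filter, "-f", "null", "-"]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = whisper_command("input.mp4", "ggml-base.en.bin", "output.srt")
    print(shlex.join(cmd))
```

The sketch returns an argument list rather than a single string so it can be passed directly to `subprocess.run` without shell quoting issues. Paths containing colons or other characters that are special in filter graphs would need escaping, which this sketch does not attempt.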

Copyright of this story belongs solely to techspot.com.
