Fine-tuning NVIDIA Nemotron Speech ASR on Amazon EC2 for domain adaptation
This post, from the aws.amazon.com machine-learning blog, is a collaboration between AWS, NVIDIA, and Heidi.
Automatic speech recognition (ASR), often called speech-to-text (STT), is becoming increasingly critical across industries such as healthcare, customer service, and media production. While pre-trained models offer strong capabilities for general speech, fine-tuning them for specific domains and use cases can improve accuracy and performance.
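Accuracy gains from domain adaptation are typically measured with word error rate (WER), the standard ASR metric: the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. As a minimal illustration (not code from the post itself), WER can be computed with a word-level edit distance:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words (Levenshtein).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution or match
            )
    return d[-1][-1] / len(ref)


# A domain term mis-transcribed and a word dropped: 2 errors over 4 words.
print(word_error_rate("start the dialysis pump", "start the analysis"))  # 0.5
```

Lower WER on a held-out, domain-specific test set is the usual signal that fine-tuning helped.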
In this post, we explore how to fine-tune a leaderboard-topping NVIDIA Nemotron Speech ASR model, Parakeet TDT 0.6B V2, using synthetic speech data to achieve superior transcription results for specialized applications. We walk through an end-to-end workflow that combines AWS infrastructure with the following popular open-source frameworks:
- Amazon Elastic Compute Cloud (Amazon EC2) GPU instances (p4d.24xlarge with NVIDIA A100 GPUs) for distributed training at scale
- NVIDIA NeMo framework for ASR model fine-tuning and optimization
- DeepSpeed for memory-efficient distributed training across multiple nodes
- MLflow and TensorBoard for comprehensive experiment tracking
- Amazon Elastic Kubernetes ...
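Before any of these frameworks come into play, NeMo's ASR dataloaders consume training data as JSON-lines manifests, where each line records an audio file path, its duration in seconds, and its transcript. The sketch below (file names and samples are hypothetical, not from the post) builds such a manifest for a synthetic-speech dataset:

```python
import json
from pathlib import Path


def write_nemo_manifest(samples, manifest_path):
    """Write a NeMo-style JSON-lines ASR manifest.

    `samples` is an iterable of (audio_path, duration_seconds, transcript)
    tuples; each becomes one JSON object per line with the field names
    NeMo's ASR dataloaders expect.
    """
    manifest_path = Path(manifest_path)
    with manifest_path.open("w", encoding="utf-8") as f:
        for audio_path, duration, text in samples:
            entry = {
                "audio_filepath": str(audio_path),
                "duration": float(duration),
                "text": text,
            }
            f.write(json.dumps(entry) + "\n")
    return manifest_path


# Hypothetical synthetic-speech clips and transcripts.
samples = [
    ("clips/utt_0001.wav", 3.2, "schedule a follow up appointment"),
    ("clips/utt_0002.wav", 2.7, "transfer me to billing support"),
]
write_nemo_manifest(samples, "train_manifest.json")
```

The resulting path is then referenced from the model's training-data config before fine-tuning begins.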

