Fine-tuning NVIDIA Nemotron Speech ASR on Amazon EC2 for domain adaptation
This post, from the aws.amazon.com machine-learning blog, is a collaboration between AWS, NVIDIA, and Heidi.
Automatic speech recognition (ASR), often called speech-to-text (STT), is becoming increasingly critical across industries such as healthcare, customer service, and media production. While pre-trained models offer strong capabilities for general speech, fine-tuning them for specific domains and use cases can improve accuracy and performance.
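Accuracy gains from domain adaptation are typically measured with word error rate (WER), the standard ASR metric: the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. As a minimal illustration (not code from the post itself), WER can be computed with a word-level edit distance:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words (Levenshtein).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution or match
            )
    return d[-1][-1] / len(ref)


# A domain term mis-transcribed and a word dropped: 2 errors over 4 words.
print(word_error_rate("start the dialysis pump", "start the analysis"))  # 0.5
```

Lower WER on a held-out, domain-specific test set is the usual signal that fine-tuning helped.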
In this post, we explore how to fine-tune a leaderboard-topping NVIDIA Nemotron Speech ASR model, Parakeet TDT 0.6B V2, using synthetic speech data to achieve superior transcription results for specialized applications. We walk through an end-to-end workflow that combines AWS infrastructure with the following popular open-source frameworks:
- Amazon Elastic Compute Cloud (Amazon EC2) GPU instances (p4d.24xlarge with NVIDIA A100 GPUs) for distributed training at scale
- NVIDIA NeMo framework for ASR model fine-tuning and optimization
- DeepSpeed for memory-efficient distributed training across multiple nodes
- MLflow and TensorBoard for comprehensive experiment tracking
- Amazon Elastic Kubernetes ...
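Before any of these frameworks come into play, NeMo's ASR dataloaders consume training data as JSON-lines manifests, where each line records an audio file path, its duration in seconds, and its transcript. The sketch below (file names and samples are hypothetical, not from the post) builds such a manifest for a synthetic-speech dataset:

```python
import json
from pathlib import Path


def write_nemo_manifest(samples, manifest_path):
    """Write a NeMo-style JSON-lines ASR manifest.

    `samples` is an iterable of (audio_path, duration_seconds, transcript)
    tuples; each becomes one JSON object per line with the field names
    NeMo's ASR dataloaders expect.
    """
    manifest_path = Path(manifest_path)
    with manifest_path.open("w", encoding="utf-8") as f:
        for audio_path, duration, text in samples:
            entry = {
                "audio_filepath": str(audio_path),
                "duration": float(duration),
                "text": text,
            }
            f.write(json.dumps(entry) + "\n")
    return manifest_path


# Hypothetical synthetic-speech clips and transcripts.
samples = [
    ("clips/utt_0001.wav", 3.2, "schedule a follow up appointment"),
    ("clips/utt_0002.wav", 2.7, "transfer me to billing support"),
]
write_nemo_manifest(samples, "train_manifest.json")
```

The resulting path is then referenced from the model's training-data config before fine-tuning begins.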

