Evaluate models with the Amazon Nova evaluation container using Amazon SageMaker AI
aws.amazon.com - machine-learning
This blog post introduces the new Amazon Nova model evaluation features in Amazon SageMaker AI. This release adds custom metrics support, LLM-based preference testing, log probability capture, metadata analysis, and multi-node scaling for large evaluations.
The new features include:
- Custom metrics let you supply bring your own metrics (BYOM) functions to tailor evaluation criteria to your use case.
- Nova LLM-as-a-Judge handles subjective evaluations through pairwise A/B comparisons, reporting win/tie/loss ratios and Bradley-Terry scores with explanations for each judgment.
- Token-level log probabilities reveal model confidence, useful for calibration and routing decisions.
- Metadata passthrough keeps per-row fields for analysis by customer segment, domain, difficulty, or priority level without extra processing.
- Multi-node execution distributes workloads while maintaining stable aggregation, scaling evaluation datasets from thousands to millions of examples.
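To make the pairwise reporting concrete, here is a minimal Python sketch of how win/tie/loss ratios and Bradley-Terry strengths can be derived from a list of A/B judgments. It is an illustration, not the container's implementation: the function names `summarize` and `bradley_terry`, the `"A"`/`"B"`/`"tie"` labels, and the choice to count a tie as half a win for each side are all assumptions made for this example, and the fit uses the textbook iterative (minorization-maximization) update.

```python
from collections import Counter


def summarize(judgments):
    """Win/tie/loss ratios for model A from labels 'A', 'B', or 'tie' (hypothetical labels)."""
    counts = Counter(judgments)
    total = len(judgments)
    return {
        "win": counts["A"] / total,
        "tie": counts["tie"] / total,
        "loss": counts["B"] / total,
    }


def bradley_terry(wins, iterations=200, tol=1e-10):
    """Fit Bradley-Terry strengths with the standard iterative update.

    wins[(i, j)] is the number of times model i beat model j.
    Assumption: ties were already split as 0.5 of a win for each side.
    """
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            total_wins = 0.0
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0.0) + wins.get((j, i), 0.0)
                if n_ij == 0:
                    continue  # this pair was never compared
                total_wins += wins.get((i, j), 0.0)
                denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins / denom if denom > 0 else strength[i]
        norm = sum(updated.values())
        updated = {m: s / norm for m, s in updated.items()}  # strengths sum to 1
        converged = max(abs(updated[m] - strength[m]) for m in models) < tol
        strength = updated
        if converged:
            break
    return strength


# Illustrative data: 6 wins for A, 2 wins for B, 2 ties.
judgments = ["A"] * 6 + ["B"] * 2 + ["tie"] * 2
ratios = summarize(judgments)
ties = judgments.count("tie")
wins = {
    ("A", "B"): judgments.count("A") + 0.5 * ties,
    ("B", "A"): judgments.count("B") + 0.5 * ties,
}
scores = bradley_terry(wins)
```

With only two models, the fitted strength reduces to each model's tie-adjusted share of the comparisons; the iterative fit becomes useful once three or more models are compared against one another.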
In SageMaker AI, teams can define model evaluations using JSONL files in Amazon Simple Storage Service (Amazon S3), then execute them as SageMaker ...

