
Observing and evaluating AI agentic workflows with Strands Agents SDK and Arize AX


This post is co-written with Rich Young from Arize AI.

Agentic AI applications built on agentic workflows differ from traditional workloads in one important way: they're nondeterministic. That is, they can produce different results with the same input, because the large language models (LLMs) they're based on sample from a probability distribution when generating each token. This inherent unpredictability leads AI application designers to ask whether an agent chose the correct plan of action, followed the optimal path, and invoked the right tools with the right parameters. Organizations that want to deploy such agentic workloads need an observability system that can verify that they're producing results that are correct and can be trusted.
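The nondeterminism described above can be sketched with a toy decoder. The token names and probabilities below are invented for illustration only; they stand in for the next-token distribution a real LLM would produce for an agent's prompt.

```python
import random

def sample_next_token(probs: dict[str, float], rng: random.Random) -> str:
    """Sample one token from a next-token distribution, as an LLM decoder does."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-token probabilities for one and the same agent prompt.
probs = {"call_search_tool": 0.5, "call_db_tool": 0.3, "answer_directly": 0.2}

# Identical input, different random states: the chosen action can differ per run.
outcomes = {sample_next_token(probs, random.Random(seed)) for seed in range(20)}
print(outcomes)
```

Because each run draws from the distribution independently, repeated runs over the same prompt can take different actions, which is exactly why tracing and evaluating the path an agent actually took matters.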

In this post, we present how the Arize AX service can trace and evaluate AI agent tasks initiated through Strands Agents, helping validate the correctness and trustworthiness of agentic workflows.

Challenges ...


Source: aws.amazon.com (machine-learning blog).