Multimodal embeddings at scale: AI data lake for media and entertainment workloads
aws.amazon.com - machine-learning

This post shows you how to build a scalable multimodal video search system that enables natural language search across large video datasets using Amazon Nova models and Amazon OpenSearch Service. You will learn how to move beyond manual tagging and keyword-based search to semantic search that captures the full richness of video content.
We demonstrate this at scale by processing 792,270 videos from two AWS Open Data Registry datasets: Multimedia Commons (787,479 videos, 37-second average) and MEVA (4,791 videos, 5-minute average). Processing the 8,480 hours of video content (30.5 million seconds) took 41 hours. The first-year total cost was $27,328 with OpenSearch on-demand pricing, or $23,632 with OpenSearch Service Reserved Instances. This consisted of a one-time ingestion cost ($18,088) plus annual Amazon OpenSearch Service cost ($9,240 on-demand or $5,544 Reserved).
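The dollar figures above can be sanity-checked with a little arithmetic. The following sketch uses only the numbers reported in the post; the derived per-unit metrics (cost per video, processing speed-up over real time) are illustrative calculations, not AWS pricing.

```python
# Back-of-envelope check of the cost figures reported in the post.
# Dollar amounts and counts come from the article; derived metrics
# are illustrative arithmetic only.

VIDEOS = 792_270
VIDEO_HOURS = 8_480          # ~30.5 million seconds of content
WALL_CLOCK_HOURS = 41        # end-to-end ingestion time

INGESTION_ONE_TIME = 18_088  # one-time ingestion cost, USD
OPENSEARCH_ON_DEMAND = 9_240 # OpenSearch Service, USD per year
OPENSEARCH_RESERVED = 5_544  # OpenSearch Reserved Instances, USD per year

first_year_on_demand = INGESTION_ONE_TIME + OPENSEARCH_ON_DEMAND  # 27,328
first_year_reserved = INGESTION_ONE_TIME + OPENSEARCH_RESERVED    # 23,632

cost_per_video = INGESTION_ONE_TIME / VIDEOS            # ~ $0.023 per video
cost_per_video_hour = INGESTION_ONE_TIME / VIDEO_HOURS  # ~ $2.13 per hour
realtime_factor = VIDEO_HOURS / WALL_CLOCK_HOURS        # ~207x real time

print(first_year_on_demand, first_year_reserved, round(realtime_factor))
```

At roughly two cents per video ingested, the one-time cost dominates the first year; in steady state the annual OpenSearch cluster becomes the main line item.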
The ingestion breakdown is as follows:
- Amazon Elastic Compute Cloud (Amazon EC2) compute (4× c7i.48xlarge ...
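Once embeddings are indexed, the search side of such a system is a k-NN (vector) query against OpenSearch. The post does not show its query code, so the sketch below is a hypothetical minimal version: the index name, vector field name, and metadata fields are assumptions, and the query embedding would come from the same embedding model used at ingestion.

```python
# Hypothetical sketch of the query side: build an OpenSearch k-NN
# query body for a precomputed query embedding. Field and index names
# are assumptions, not taken from the post.

def build_knn_query(query_embedding, k=5):
    """Return an OpenSearch k-NN search body for a query embedding vector."""
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {           # assumed vector field name
                    "vector": query_embedding,
                    "k": k,
                }
            }
        },
        # assumed metadata fields stored alongside each vector
        "_source": ["video_id", "start_s", "end_s"],
    }

# With the opensearch-py client, this body would be sent as, e.g.:
#   client.search(index="video-embeddings", body=build_knn_query(vec))
```

Because the query text is embedded into the same vector space as the video clips, a natural-language phrase like "dog catching a frisbee" retrieves visually matching segments without any manual tags.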