Using Large Language Models for Zero-Shot Video Generation: A VideoPoet Case Study
VideoPoet is a transformer-based model for generating high-quality videos from diverse inputs, excelling in zero-shot video generation with high-fidelity motion.
Table of Links
3. Model Overview and 3.1. Tokenization
3.2. Language Model Backbone and 3.3. Super-Resolution
4. LLM Pretraining for Generation
5. Experiments
5.2. Pretraining Task Analysis
5.3. Comparison with the State-of-the-Art
5.4. LLM’s Diverse Capabilities in Video Generation and 5.5. Limitations
6. Conclusion, Acknowledgements, and References
Abstract
We present VideoPoet, a model for synthesizing high-quality videos from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs – including images, videos, text, and audio. The training protocol follows that of ...
Copyright of this story solely belongs to hackernoon.com.