Using Large Language Models for Zero-Shot Video Generation: A VideoPoet Case Study


by @teleplay

VideoPoet is a transformer-based model for generating high-quality videos from diverse inputs, excelling in zero-shot video generation with high-fidelity motion.

Table of Links

Abstract and 1. Introduction

2. Related Work

3. Model Overview and 3.1. Tokenization

3.2. Language Model Backbone and 3.3. Super-Resolution

4. LLM Pretraining for Generation

4.1. Task Prompt Design

4.2. Training Strategy

5. Experiments

5.1. Experimental Setup

5.2. Pretraining Task Analysis

5.3. Comparison with the State-of-the-Art

5.4. LLM’s Diverse Capabilities in Video Generation and 5.5. Limitations

6. Conclusion, Acknowledgements, and References

A. Appendix

Abstract

We present VideoPoet, a model for synthesizing high-quality videos from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs – including images, videos, text, and audio. The training protocol follows that of ...


Copyright of this story solely belongs to hackernoon.com.