Chameleon Sets New Benchmarks in AI Image-Text Tasks


by Regularization Technology May 20th, 2025

Chameleon is an early-fusion AI model that represents images and text as tokens in a single shared sequence, outperforming other models on vision-language tasks and enabling new forms of multimodal reasoning.
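The page does not show how the image and text tokens are combined, so what follows is only a minimal sketch of the general early-fusion idea, not Chameleon's actual tokenizer. It assumes that each image has already been converted into a list of discrete codes by some image tokenizer. Those codes are shifted past the text token ids so that both modalities share one vocabulary, and each image is wrapped in begin/end sentinel tokens. The vocabulary sizes, the sentinel tokens and the function names `image_to_tokens` and `build_sequence` are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical sizes chosen for illustration only; not taken from the article.
TEXT_VOCAB_SIZE = 1000      # number of text (BPE-style) token ids
IMAGE_CODEBOOK_SIZE = 256   # number of discrete image codes

# Image codes are offset past the text ids so both modalities share one vocabulary.
IMAGE_OFFSET = TEXT_VOCAB_SIZE
BOI = IMAGE_OFFSET + IMAGE_CODEBOOK_SIZE  # hypothetical begin-of-image sentinel
EOI = BOI + 1                             # hypothetical end-of-image sentinel
VOCAB_SIZE = EOI + 1                      # size of the single shared vocabulary


def image_to_tokens(codes: Sequence[int]) -> List[int]:
    """Map one image's discrete codes into the shared vocabulary, wrapped in sentinels."""
    for c in codes:
        if not 0 <= c < IMAGE_CODEBOOK_SIZE:
            raise ValueError(f"image code {c} out of range")
    return [BOI] + [IMAGE_OFFSET + c for c in codes] + [EOI]


def build_sequence(segments: Sequence[Tuple[str, Sequence[int]]]) -> List[int]:
    """Flatten interleaved ("text", ids) / ("image", codes) segments into one token stream."""
    seq: List[int] = []
    for kind, ids in segments:
        if kind == "text":
            if any(not 0 <= t < TEXT_VOCAB_SIZE for t in ids):
                raise ValueError("text id out of range")
            seq.extend(ids)
        elif kind == "image":
            seq.extend(image_to_tokens(ids))
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return seq


if __name__ == "__main__":
    seq = build_sequence([("text", [5, 17]), ("image", [0, 3, 255]), ("text", [42])])
    print(seq)  # [5, 17, 1256, 1000, 1003, 1255, 1257, 42]
```

Because every id falls in `range(VOCAB_SIZE)`, a single embedding table and a single transformer can process the mixed sequence without separate modality-specific encoders. In this sketch, that shared token space is what "early fusion" refers to.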

Table of Links

Abstract and 1 Introduction

2 Pre-Training

2.1 Tokenization

2.2 Pre-Training Data

2.3 Stability

2.4 Inference

3 Alignment and 3.1 Data

3.2 Fine-Tuning Strategy

4 Human Evaluations and Safety Testing, and 4.1 Prompts for Evaluation

4.2 Baselines and Evaluations

4.3 Inter-annotator Agreement

4.4 Safety Testing

4.5 Discussion

5 Benchmark Evaluations and 5.1 Text

5.2 Image-To-Text

6 Related Work

7 Conclusion, Acknowledgements, Contributors, and References

Appendix

A. Samples

B. Additional Information of Human Evaluations

7 Conclusion

In this paper, we introduced Chameleon, a new family of early-fusion token-based foundation models that set a new bar for multimodal machine learning. By learning a unified representation ...


Copyright of this story solely belongs to hackernoon.com.