Chandra OCR: The BEST in Open-Source AI Document Parsing
perficient.com
In the specialized field of Optical Character Recognition (OCR), an open-source model from Datalab is setting a new standard for accuracy and versatility. Chandra OCR, released in October 2025, has rapidly climbed to the top of the leaderboards, outperforming even proprietary giants like GPT-4o and Gemini Pro on key benchmarks.
Chandra is not just another OCR tool; it’s a comprehensive document AI solution. Unlike traditional pipeline-based approaches that process a document in separate chunks, Chandra decodes the full page at once. This lets it draw on the context of the entire page, leading to significant improvements in accuracy and layout awareness.
- Layout-Aware Output: Chandra preserves the original document structure, outputting to Markdown, HTML, or JSON with remarkable fidelity.
- Image & Figure Extraction: It can identify, caption, and extract images and figures from within a document.
- Advanced Language Support: Chandra supports over 40 languages and can even read handwritten text, making it a truly ...
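To make "layout-aware output" concrete, here is a minimal sketch of how one parsed page could be rendered to Markdown, HTML, and JSON. This is not Chandra's API: the `Block` type, its fields, and the three `to_*` helpers are hypothetical names invented for illustration, and the sample page is made up. The point is only that a single list of typed, positioned blocks can feed all three output formats.

```python
import html
import json
from dataclasses import dataclass, asdict


@dataclass
class Block:
    """Hypothetical layout block; not a Chandra data type."""
    kind: str    # "heading", "paragraph", or "figure"
    text: str    # text content, or the caption for a figure
    bbox: tuple  # (x0, y0, x1, y1) position on the page, in pixels


def to_markdown(blocks):
    """Render blocks in reading order as Markdown."""
    parts = []
    for b in blocks:
        if b.kind == "heading":
            parts.append(f"# {b.text}")
        elif b.kind == "figure":
            parts.append(f"*Figure: {b.text}*")
        else:
            parts.append(b.text)
    return "\n\n".join(parts)


def to_html(blocks):
    """Render blocks as HTML, escaping text so it cannot break the markup."""
    parts = []
    for b in blocks:
        text = html.escape(b.text)
        if b.kind == "heading":
            parts.append(f"<h1>{text}</h1>")
        elif b.kind == "figure":
            parts.append(f"<figure><figcaption>{text}</figcaption></figure>")
        else:
            parts.append(f"<p>{text}</p>")
    return "\n".join(parts)


def to_json(blocks):
    """Serialize blocks, including bounding boxes, as a JSON array."""
    return json.dumps([asdict(b) for b in blocks])


# Made-up sample page used only to exercise the helpers.
page = [
    Block("heading", "Invoice", (0, 0, 100, 20)),
    Block("paragraph", "Total: 5 < 10", (0, 30, 100, 50)),
    Block("figure", "Company logo", (0, 60, 40, 100)),
]

print(to_markdown(page))
print(to_html(page))
print(to_json(page))
```

Keeping the bounding box on every block is what lets the JSON form carry position information that the Markdown and HTML forms drop.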

