The humble PDF is becoming a problem for AI


Looking ahead: Three decades after Adobe introduced the Portable Document Format, a design intended to preserve the appearance of printed pages across devices, PDFs are facing pressure from a completely different kind of reader: artificial intelligence. The same fixed layouts that made PDFs indispensable to human users now make them difficult for large language models to interpret. Unlike web pages or plain-text files, PDFs contain columns, embedded graphics, and hidden metadata that often confuse machine parsing systems trained to process linear text.

Researchers and developers working with large language models say these structural quirks introduce subtle but significant errors. An AI that reads lines strictly from left to right may stumble over multi-column scientific papers or misinterpret footers as part of the main text. These parsing issues can cascade into so-called "hallucinations," where a model produces inaccurate summaries or fabricates details.
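
The reading-order problem can be illustrated with a small Python sketch. It is not taken from the article or from any particular PDF library: the `Fragment` type, the sample page, the 600-unit page width, and the footer cutoff are all hypothetical values chosen for illustration. It assumes a parser has already extracted text fragments with their page coordinates, and it compares a strict top-to-bottom, left-to-right reading with one that is aware of the two columns and the footer.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """A piece of extracted text with its position (hypothetical structure)."""
    x: float   # left edge of the fragment, in page units
    y: float   # distance from the top of the page
    text: str


# Hypothetical two-column page, 600 units wide, with a footer near the bottom.
PAGE = [
    Fragment(50, 100, "Left column, line 1."),
    Fragment(320, 100, "Right column, line 1."),
    Fragment(50, 120, "Left column, line 2."),
    Fragment(320, 120, "Right column, line 2."),
    Fragment(50, 780, "Page 3 of 12"),
]


def naive_order(fragments):
    """Read top to bottom, then left to right, ignoring the column layout."""
    return [f.text for f in sorted(fragments, key=lambda f: (f.y, f.x))]


def column_aware_order(fragments, page_width=600, footer_y=750):
    """Drop footer fragments, then read the left column before the right.

    The midpoint split and the footer cutoff are assumptions for this sketch;
    real documents need layout detection rather than fixed thresholds.
    """
    body = [f for f in fragments if f.y < footer_y]
    mid = page_width / 2
    left = sorted((f for f in body if f.x < mid), key=lambda f: f.y)
    right = sorted((f for f in body if f.x >= mid), key=lambda f: f.y)
    return [f.text for f in left + right]


if __name__ == "__main__":
    print("Naive:       ", " ".join(naive_order(PAGE)))
    print("Column-aware:", " ".join(column_aware_order(PAGE)))
```

In the naive ordering, lines from the two columns are interleaved and the page footer is appended to the body text, which is the kind of input that can lead a model to a garbled summary. The column-aware version keeps each column contiguous, but only because the layout was known in advance.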

Unlike basic text formats, PDFs are not built around logical ...


Copyright of this story solely belongs to techspot.com.