Perplexity is allegedly scraping websites it's not supposed to, again
Web crawlers deployed by Perplexity to scrape websites are allegedly skirting restrictions, according to a new report from Cloudflare. Specifically, the report claims that the company's bots appear to be "stealth crawling" sites by disguising their identity to get around robots.txt files and firewalls.
Robots.txt is a simple file that websites host to tell web crawlers whether they may scrape a site's content. Perplexity's official web crawling bots are "PerplexityBot" and "Perplexity-User." In Cloudflare's tests, Perplexity was still able to display the content of a new, unindexed website even when those specific bots were blocked by robots.txt. The behavior extended to websites with specific Web Application Firewall (WAF) rules that restricted web crawlers as well.
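The blocking described above can be sketched with Python's standard-library robots.txt parser. The rules below are a hypothetical example of the kind of file a site in Cloudflare's tests might serve (the bot names come from the article; the URL is illustrative), and it's worth noting that robots.txt is purely advisory — nothing technically stops a crawler from ignoring it, which is the crux of Cloudflare's allegation:

```python
from urllib import robotparser

# Hypothetical robots.txt that blocks Perplexity's declared crawlers
# (names from the article) while allowing all other user agents.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Perplexity's declared bots are disallowed; a generic browser agent is not.
print(parser.can_fetch("PerplexityBot", "https://example.com/article"))  # False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/article"))    # True
```

A well-behaved crawler checks `can_fetch` before requesting a page and backs off when it returns `False`; the report alleges Perplexity's stealth crawlers skip that step entirely.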
Cloudflare believes that Perplexity is getting around those obstacles by using "a generic browser intended to impersonate Google Chrome on macOS" when robots.txt prohibits ...
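The impersonation Cloudflare describes is possible because the User-Agent header is entirely self-reported. A minimal sketch of how any HTTP client can present an arbitrary browser identity — the Chrome-on-macOS string below is a generic illustration, not the specific one Cloudflare observed:

```python
import urllib.request

# Illustrative only: any HTTP client can claim to be a desktop browser
# simply by setting the User-Agent header. This is a generic
# Chrome-on-macOS string, not the one from Cloudflare's report.
SPOOFED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)

# Build (but don't send) a request carrying the spoofed identity.
req = urllib.request.Request(
    "https://example.com/article",
    headers={"User-Agent": SPOOFED_UA},
)

# A robots.txt rule or WAF filter matching "PerplexityBot" or
# "Perplexity-User" would not match this request at all.
print(req.get_header("User-agent"))
```

Because the header matches an ordinary browser, filters keyed to the declared bot names never fire, which is why Cloudflare had to identify the traffic by other signals.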
Copyright of this story solely belongs to Engadget.