FOR AI PLATFORMS

Build better agent experiences with structured data

Stop scraping HTML and guessing at structure. ARW provides clean, machine-readable content that reduces errors, lowers costs, and improves user satisfaction.

The challenges of HTML scraping

Current web scraping is inefficient, error-prone, and expensive. LLMs waste context on navigation, ads, and irrelevant content. Parsing HTML introduces errors and hallucinations.

What AI platforms face today:

  • 55,000 token product pages where only 2,200 tokens are relevant (96% waste)
  • HTML parsing errors cause 30-50% hallucination rates in e-commerce
  • Stale cached content leads to wrong prices and out-of-stock items
  • Legal uncertainty around training data and copyright
  • Cannot complete transactions—must redirect users to websites

Why ARW is better for AI platforms

Structured content, legal clarity, and lower operational costs.

85% fewer tokens

Machine views (.llm.md) provide structured content without HTML overhead. Hierarchical discovery means loading only relevant content.

HTML: 55,000 tokens
ARW: 8,400 tokens

95%+ accuracy

Structured markdown eliminates parsing ambiguity. Real-time data in machine views prevents stale information.

HTML scraping: 70-85%
ARW: 95-99%

5-10x faster

Hierarchical navigation with metadata enables progressive refinement instead of scanning everything.

llms.txt: 15-30 seconds
ARW: 3-5 seconds

Legal clarity

Machine-readable policies specify training/inference permissions and attribution requirements. Reduces liability risk and provides clear usage terms.

Lower costs

85% token reduction translates directly to lower API costs. Faster responses mean better resource utilization and user experience.

Better UX

Actions enable complete workflows through agents. Users can purchase, book, and transact without leaving the conversation.

Research-validated performance

ARW implements patterns proven in academic research on LLM navigation and retrieval.

Hierarchical Retrieval: 60-90% Token Reduction

Research: "LLM-Guided Hierarchical Retrieval" (arXiv:2510.13217)

Hierarchical approaches with progressive disclosure achieve superior token efficiency vs. flat files while maintaining accuracy.

Parent-Child Relationships: 40% Accuracy Boost

Research: Nay, 2024

Structured content relationships enable agents to understand context and navigate intelligently.

Ontology Grounding: 55% Better Fact Recall

Research: arXiv:2412.15235

Schema.org integration provides semantic grounding that improves retrieval quality and reduces hallucinations.

Real-world performance comparison

Example: User asks agent to "find and buy wireless keyboard under $150"

HTML Scraping

  • • Load 100KB llms.txt with 1000 products
  • • Scrape HTML category page
  • • Parse 40 keyboard listings
  • • Scrape 12 product pages
  • • Extract prices (50% error rate)
  • • Cannot complete purchase
Time: 30-60 seconds
Tokens: 55,000
Accuracy: 70%
Conversion: Redirect required

ARW

  • • Load 10KB discovery file
  • • Fetch keyboards category (3KB)
  • • Load 3 product machine views (6KB)
  • • Real-time stock/pricing
  • • OAuth authorization
  • • Complete purchase through agent
Time: 5-10 seconds (6x faster)
Tokens: 8,400 (85% reduction)
Accuracy: 95% (25% improvement)
Conversion: Complete in agent

Easy to implement in your agent

ARW uses standard HTTP, Markdown, OAuth, and Schema.org—technologies you already support.

Basic discovery flow:

  1. 1Fetch /llms.txt (YAML manifest)
  2. 2Parse content endpoints, actions, and policies
  3. 3Load relevant .llm.md files
  4. 4Request OAuth for actions when needed

Build better agents with structured web data

Join the movement toward a machine-readable web. ARW reduces costs, improves accuracy, and enables complete user experiences through agents.