How AI Agents Read Your Website (And What You Must Fix)

Every major AI platform can now browse websites on its own. Chrome’s auto-browse can scroll and click. ChatGPT Atlas can fill out forms and even complete purchases. Perplexity Comet can research across multiple tabs. But none of these agents experiences your website the way a human does.

This is Part 4 in a five-part series on optimizing for the agentic web. In Part 1, we traced the shift from SEO to AAIO. Part 2 explored how to get your content cited in AI responses. Part 3 broke down the emerging protocol layer. Now we go deeper: how AI agents actually perceive your website—and what you need to build for them.

One key insight keeps surfacing: the most effective way to make your site compatible with AI agents is the same approach accessibility advocates have championed for years. The accessibility tree—originally designed for screen readers—is quickly becoming the primary interface through which AI agents interpret your site.

According to cybersecurity company Imperva's 2025 Bad Bot Report, automated traffic surpassed human traffic for the first time in 2024, accounting for 51% of all web interactions. Not all of this comes from agentic browsing, but the trend is unmistakable: the non-human audience for your website is already larger than the human one—and still growing. This article draws exclusively on official documentation, peer-reviewed research, and announcements from the companies building this new layer of the web.

Three Ways Agents See Your Website

When a human lands on your site, they experience colors, layout, imagery, and typography. An AI agent encounters something entirely different. Understanding that difference is the foundation of building websites that actually work for these systems.

Today’s major AI platforms rely on three distinct approaches—and each one has real implications for how you should structure your site.

Vision: Reading Screenshots

Anthropic takes the most literal approach with its Computer Use system. Its model, Claude, captures screenshots of a browser, interprets the visual output, and decides what to click or type based on what it “sees.” The process runs as a continuous loop: screenshot → reason → act → repeat. At this level, the agent operates on pixels—identifying buttons by appearance and extracting text from rendered images.

Google’s Project Mariner follows a similar paradigm, described as an “observe–plan–act” loop. “Observe” captures both visual elements and parts of the underlying structure, “plan” determines the sequence of actions, and “act” executes them as simulated user interactions. Mariner has reported an 83.5% success rate on the WebVoyager benchmark.

This vision-based approach is powerful, but it comes with trade-offs: it’s computationally expensive, fragile to layout changes, and constrained by whatever is visibly rendered on the screen.

Accessibility Tree: Reading Structure

OpenAI takes a different approach with ChatGPT Atlas. As its Publishers and Developers FAQ makes clear, Atlas is built on Chromium, but instead of analyzing rendered pixels, it queries the accessibility tree—looking for elements with defined roles like “button” or “link,” along with their accessible names. This is the same underlying structure used by screen readers such as VoiceOver and NVDA to help visually impaired users navigate the web.

Microsoft takes a similar approach with its Playwright MCP, the official MCP server for browser automation. Rather than relying on screenshots, it provides accessibility snapshots—structured representations of a page that AI models can reliably interpret. This reflects a deliberate design choice: prioritizing semantic structure over visual rendering.

Hybrid: Both at Once

In practice, the most capable agents don’t rely on just one method—they combine them. OpenAI’s Computer-Using Agent (CUA), which powers both Operator and Atlas, blends screenshot analysis with DOM processing and accessibility tree parsing. It prioritizes ARIA roles and labels, falling back to text content and structural selectors when accessibility data is incomplete.

Research from Perplexity AI points to the same pattern. Their BrowseSafe paper, which outlines the safety architecture behind Comet’s browser agent, describes a “hybrid context management” approach—combining accessibility tree snapshots with selective use of vision.

How Major Platforms Interpret Your Website

  • Anthropic Computer Use: vision (screenshots)
    Operates in a screenshot → reason → act feedback loop
  • Google Project Mariner: vision + code structure
    Uses an observe–plan–act cycle combining visual and structural signals
  • OpenAI Atlas: accessibility tree
    Relies on ARIA roles and semantic structure
  • OpenAI CUA: hybrid
    Combines screenshots, DOM parsing, and accessibility data
  • Microsoft Playwright MCP: accessibility tree
    Provides structured accessibility snapshots instead of screenshots
  • Perplexity AI Comet: hybrid
    Blends accessibility tree data with selective visual analysis

The pattern is clear: even platforms that began with a vision-first approach are steadily incorporating accessibility data. Meanwhile, systems optimized for reliability and efficiency—like Atlas and Playwright MCP—treat the accessibility tree as the primary interface.

Your website’s accessibility tree isn’t just a compliance checkbox anymore—it’s quickly becoming the primary interface AI agents use to understand and interact with your site.

Before the European Accessibility Act came into force, it was easy to joke that AI might finally push people to care about accessibility more than regulation ever did. That’s no longer hypothetical—it’s happening.

The Accessibility Tree Is Your Agent Interface

The accessibility tree is a simplified representation of your page’s DOM, generated by browsers for assistive technologies. While the full DOM includes every div, span, style, and script, the accessibility tree filters out the noise and surfaces what actually matters: interactive elements, their roles, names, and states.

That’s exactly why it works so well for AI agents. A typical webpage might contain thousands of DOM nodes, but the accessibility tree reduces that complexity to what a user—or an agent—can actually act on: buttons, links, form fields, headings, and landmarks. For AI models operating within limited context windows, that reduction is critical.
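To make the reduction concrete, here is a contrived snippet (the wrapper markup is invented for illustration) showing how much of the DOM simply disappears from the accessibility tree:

```html
<!-- What the DOM contains: wrappers, utility classes, decoration -->
<div class="card shadow-sm">
  <div class="card-body d-flex">
    <span class="icon" aria-hidden="true">✈</span>
    <button type="submit">Search flights</button>
  </div>
</div>

<!-- What the accessibility tree keeps from all of the above:
     a single node with role "button" and name "Search flights" -->
```

Everything the agent needs to act survives; everything it doesn't is filtered out.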

OpenAI makes this explicit in its Publishers and Developers FAQ:

Follow WAI-ARIA best practices by adding descriptive roles, labels, and states to interactive elements like buttons, menus, and forms. This helps ChatGPT recognize what each element does and interact with your site more accurately.

And more directly:

Making your website more accessible helps ChatGPT Agent in Atlas understand it better.

This isn’t just guidance—it’s backed by research. A study from the University of California, Berkeley and the University of Michigan, presented at CHI 2026, tested Claude Sonnet 4.5 across 60 real-world web tasks. The researchers collected over 40 hours of interaction data spanning more than 158,000 events under varying accessibility conditions.

The results were striking:

Performance Under Different Conditions

  • Standard (default): 78.33% task success, 324.87 s average completion time
  • Keyboard-only: 41.67% task success, 650.91 s average completion time
  • Magnified viewport: 28.33% task success, 1,072.20 s average completion time

Under standard conditions, the agent completes tasks successfully nearly 80% of the time. But when interaction is limited to keyboard-only navigation—similar to how screen reader users operate—success drops to 42%, and completion time roughly doubles. Constrain the viewport further, as magnification tools do, and success falls to just 28%, with tasks taking more than three times as long.

The researchers identify three core categories of failure:

  • Perception gaps: Agents can’t reliably access screen reader announcements or detect ARIA state changes that signal what happened after an interaction.
  • Cognitive gaps: Agents struggle to maintain context and track progress across multi-step workflows.
  • Action gaps: Agents underuse keyboard shortcuts and fail at more complex interactions like drag-and-drop.

The takeaway is straightforward. Websites with rich, well-labeled accessibility trees give agents the signals they need to succeed. Sites that depend on visual cues, hover states, or complex JavaScript interactions—without accessible fallbacks—create failure modes for both humans and machines.

Research from Perplexity AI reinforces this from a content perspective. In its search API architecture paper (September 2025), the company notes that its indexing systems prioritize content that is “high quality in both substance and form,” with structure preserved. Pages built around well-organized lists and tables benefit from more consistent parsing and extraction. Structure isn’t just helpful—it’s what makes reliable interpretation possible.

Semantic HTML: The Foundation for Agents

The accessibility tree is derived directly from your HTML. When you use semantic elements, browsers automatically generate a meaningful structure. Skip them, and the resulting tree becomes sparse—or worse, misleading.

This isn’t new advice. Web standards advocates have been emphasizing semantic HTML for over two decades. What’s changed is the audience. It’s no longer just about supporting screen readers for a subset of users—it’s about every AI agent that now interacts with your website.

Use native elements wherever possible. A <button> automatically appears in the accessibility tree with the role “button” and its text as the accessible name. A clickable <div> does not—so an agent may not even recognize it as interactive.

<!-- Agent can identify and interact with this -->
<button type="submit">Search flights</button>

<!-- Agent may not recognize this as interactive -->
<div class="btn btn-primary" onclick="searchFlights()">Search flights</div>

Label your forms properly. Every input should have an associated label, because agents rely on labels to understand what each field expects.

<!-- Agent knows this is an email field -->
<label for="email">Email address</label>
<input type="email" id="email" name="email" autocomplete="email">

<!-- Agent sees an unlabeled text input -->
<input type="text" placeholder="Enter email...">

The autocomplete attribute matters more than most developers realize. It gives agents (and browsers) a standardized hint about what kind of data belongs in each field—values like name, email, tel, street-address, or organization. When an agent fills out a form on a user’s behalf, this can be the difference between precise field mapping and blind guessing.
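A sketch of what that looks like across a form (the field names and action URL are illustrative, not from any particular site):

```html
<form action="/checkout" method="post">
  <label for="name">Full name</label>
  <input type="text" id="name" name="name" autocomplete="name">

  <label for="email">Email address</label>
  <input type="email" id="email" name="email" autocomplete="email">

  <label for="tel">Phone</label>
  <input type="tel" id="tel" name="tel" autocomplete="tel">

  <label for="address">Street address</label>
  <input type="text" id="address" name="address" autocomplete="street-address">
</form>
```

Each autocomplete value is a standardized token, so an agent can map the user's data to fields without guessing from labels alone.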

Structure your headings clearly. Use <h1> through <h6> in a logical hierarchy. Agents depend on headings to understand how content is organized and to jump between sections. Skipping levels—like jumping from <h1> to <h4>—breaks that mental model.
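As an illustrative sketch (not any platform's actual validator), a few lines of JavaScript can flag skipped levels in an outline. The `findSkippedLevels` function and its messages are invented for this example:

```javascript
// Illustrative sketch: flag skipped heading levels in a page outline.
// `levels` is the sequence of heading levels in DOM order, e.g. gathered
// in the browser from document.querySelectorAll('h1,h2,h3,h4,h5,h6').
function findSkippedLevels(levels) {
  const problems = [];
  let prev = 0; // 0 = before the first heading
  for (const level of levels) {
    if (level > prev + 1) {
      problems.push(`jump to h${level} after ${prev ? 'h' + prev : 'document start'}`);
    }
    prev = level;
  }
  return problems;
}

findSkippedLevels([1, 2, 3, 2, 3]); // → [] (clean hierarchy)
findSkippedLevels([1, 4]);          // → ['jump to h4 after h1']
```

Running a check like this in CI catches hierarchy breaks before agents ever encounter them.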

Use landmark regions to define layout. Elements like <nav>, <main>, <aside>, <header>, and <footer> tell agents exactly where they are on the page. A <nav> is explicitly navigation. A <div class="nav-wrapper"> is ambiguous and requires interpretation. Clarity wins every time.

<nav aria-label="Main navigation">
  <ul>
    <li><a href="/products">Products</a></li>
    <li><a href="/pricing">Pricing</a></li>
  </ul>
</nav>

<main>
  <article>
    <h1>Flight Search</h1>
    <!-- Primary content -->
  </article>
</main>

This shift is also visible in tooling. Microsoft’s Playwright test agents (introduced in October 2025) generate test code using accessible selectors by default. Instead of CSS selectors or XPath, they target elements by role and name—the same way assistive technologies do:

const todoInput = page.getByRole('textbox', { name: 'What needs to be done?' });

That design choice isn’t accidental. It reflects a broader truth: the most reliable way for AI to interact with interfaces is through the same structured, semantic layer built for accessibility.

ARIA: Useful, Not Magic

The World Wide Web Consortium’s ARIA (Accessible Rich Internet Applications) standard is often recommended for improving accessibility—and it does matter. But ARIA is a supplement, not a substitute. Think of it like a protein shake: helpful alongside a solid foundation, but a poor replacement for real structure.

The W3C’s first rule of ARIA says it all:

If you can use a native HTML element with built-in semantics and behavior, do that instead of adding ARIA.

The fact that “don’t overuse ARIA” is the first rule tells you how often it’s misapplied.

Accessibility expert Adrian Roselli raised a sharp critique in his October 2025 analysis of OpenAI’s guidance. His concern: recommending ARIA without enough nuance can backfire. According to surveys from WebAIM, sites that use ARIA tend to be less accessible on average—not because ARIA is flawed, but because it’s frequently misused as a patch over poor HTML. Roselli also warns of a familiar failure mode: treating aria-label like old-school SEO keywords, stuffing it with terms instead of meaningful context.

The Right Approach: Layered

The most effective strategy is to treat ARIA as a second layer—not the foundation:

1. Start with semantic HTML
Use native elements like <button>, <nav>, <label>, and <select>. These come with built-in roles and behaviors that both browsers and agents understand by default.

2. Add ARIA when HTML falls short
For custom UI components—like tab panels, tree views, or disclosure widgets—ARIA roles and states provide the missing semantics.

3. Use ARIA states for dynamic behavior
When JavaScript updates the UI, ARIA attributes communicate those changes to agents:

<!-- Indicates whether the menu is open -->
<button aria-expanded="false" aria-controls="menu-panel">Menu</button>

<div id="menu-panel" aria-hidden="true">
  <!-- Menu content -->
</div>

4. Keep aria-label accurate and meaningful
Use it to clarify context that isn’t visible—like distinguishing between multiple “Delete” buttons. Don’t overload it with keywords or irrelevant text.
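For example (the draft names are invented for illustration), two visually identical buttons become unambiguous to an agent:

```html
<!-- Visible text alone is ambiguous: which draft gets deleted? -->
<button aria-label="Delete draft: Q3 report">Delete</button>
<button aria-label="Delete draft: Budget notes">Delete</button>
```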


The underlying principle is the same one that’s always applied to good SEO: build for the user first, then optimize for the system. Semantic HTML is the foundation. ARIA is the precision tool you reach for when that foundation isn’t enough.

The Rendering Question

Browser-based agents—like Chrome auto-browse, OpenAI’s Atlas, and Perplexity AI’s Comet—run on Chromium. They execute JavaScript. They can fully render modern single-page applications.

But not everything that visits your site has a full browser environment.

AI crawlers—such as PerplexityBot, OAI-SearchBot, and ClaudeBot—index your content for retrieval and citation. Many of them don’t execute client-side JavaScript. If your page is just a blank <div id="root"></div> until a framework like React hydrates it, those crawlers see… nothing. Your content simply doesn’t exist to them.

This connects directly to the citation layer: AI systems pull answers from indexed content. If your content isn’t in the initial HTML, it won’t be indexed. And if it’s not indexed, it won’t be cited. Server-side rendering isn’t just about performance anymore—it’s about visibility.

Even for full browser agents, JavaScript-heavy experiences introduce friction. Content that loads only after interactions, infinite scroll with no clear endpoint, or forms that re-render on every input all increase the risk of failure. Research on agent behavior highlights “cognitive gaps”—situations where agents lose track of state during multi-step workflows. Simpler, more predictable rendering reduces those failures.

Guidance from Microsoft reinforces this: don’t hide important information behind tabs, accordions, or expandable menus. If content requires interaction to appear, there’s a real chance agents won’t see it. If it matters, it should be present in the visible HTML from the start.

Practical Rendering Priorities

  • Server-side render or pre-render content pages
    If an AI crawler can’t see your content in the initial HTML, it effectively doesn’t exist in the AI ecosystem.
  • Avoid blank-shell SPAs for content-heavy pages
    Frameworks like Next.js, Nuxt.js, and Astro make server-side rendering straightforward.
  • Don’t hide critical information behind interactions
    Pricing, specifications, availability, and key details should be directly visible—not buried in tabs or accordions.
  • Use standard <a href> links for navigation
    Client-side routing that relies on onClick handlers without proper URLs can break how agents discover and traverse your site.
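The first priority above can be sanity-checked in a few lines. This sketch assumes you have already fetched the raw HTML yourself (with curl or fetch); the inline strings stand in for those responses, and the function name is invented for illustration:

```javascript
// Does the initial HTML, as a non-JS-executing crawler receives it,
// contain the content that matters? A blank SPA shell fails this check.
function contentVisibleToCrawlers(html, requiredText) {
  return html.includes(requiredText);
}

const spaShell = '<div id="root"></div>';         // client-rendered shell
const ssrPage  = '<main><h1>Pricing</h1></main>'; // server-rendered page

contentVisibleToCrawlers(spaShell, 'Pricing'); // → false
contentVisibleToCrawlers(ssrPage, 'Pricing');  // → true
```

A substring check is crude—real audits should parse the HTML—but it is enough to catch the worst case: primary content that simply isn't there before hydration.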

The shift is subtle but important: rendering decisions are no longer just about user experience—they directly determine whether AI systems can see, understand, and use your content at all.

Testing Your Agent Interface

You wouldn’t ship a website without testing it in a browser. The same standard is emerging for AI agents: if they’re going to interact with your site, you need to test how they perceive it.

Screen readers as a practical proxy

Screen reader testing remains the most useful baseline. If tools like VoiceOver (macOS), NVDA (Windows), or TalkBack (Android) can successfully navigate your site—identifying buttons, reading labels, and following structure—then AI agents will likely be able to do the same.

That’s because both rely on the same underlying accessibility tree. It’s not a perfect equivalence—AI agents can do things screen readers cannot, and vice versa—but it reliably surfaces the majority of structural issues that block machine understanding.

Seeing exactly what agents see

Microsoft’s Playwright MCP offers a more direct approach. It exposes accessibility snapshots that show precisely how an agent interprets a page. Instead of visual rendering, you get structured output: roles, names, and states.

Published as @playwright/mcp on npm, it effectively lets you inspect your site through an agent’s perspective.

A simplified snapshot might look like this:

[heading level=1] Flight Search
[navigation "Main navigation"]
  [link] Products
  [link] Pricing
[main]
  [textbox "Departure airport"] value=""
  [textbox "Arrival airport"] value=""
  [button] Search flights

If your key controls don’t appear here—or appear without meaningful labels—agents will struggle to use your interface.

Testing real workflows

Another useful tool is Stagehand by Browserbase. Its v3 release (October 2025) combines DOM and accessibility tree parsing with “self-healing” execution that adapts when the DOM changes. It’s particularly effective for validating end-to-end flows like form submissions or checkout processes from an agent’s perspective.

The low-tech reality check

There’s also a simpler method: the text-only browser Lynx. It strips away visual layout entirely and shows how a non-visual system reads your page. It’s an old tool, but surprisingly effective as a proxy for how structured—or unstructured—your content really is.

A Practical Testing Workflow

A simple, repeatable workflow can reveal most agent-perception issues before they ship:

  • Run VoiceOver or NVDA through your key user flows
    Try completing core tasks without relying on visual layout. If you can’t navigate it non-visually, neither can many AI agents.
  • Generate Playwright MCP accessibility snapshots
    For critical pages, inspect the structured output. Are interactive elements clearly labeled, named, and discoverable?
  • Check your page source (not just rendered output)
    Is your primary content present in the initial HTML, or does it depend on JavaScript execution to appear?
  • Load your site in Lynx or disable CSS
    Verify whether the content order and hierarchy still make sense. Agents don’t interpret design—they interpret structure.

A Checklist for Your Development Team

If you share one section of this article with developers, make it this one. It’s prioritized by impact and effort—starting with the highest return changes for the least complexity.

High impact, low effort

  • Use native HTML elements
    Use <button> for actions, <a href> for navigation, and <select> for dropdowns. Replace <div onclick> patterns wherever they exist.
  • Label every form input
    Associate <label> elements using the for attribute, and include standard autocomplete values.
  • Server-side render content pages
    Ensure primary content exists in the initial HTML response, not after JavaScript hydration.

High impact, moderate effort

  • Implement landmark regions
    Use <nav>, <main>, <aside>, and <footer> to define page structure. Add aria-label when multiple landmarks of the same type exist.
  • Fix heading hierarchy
    Maintain a single <h1> and ensure headings follow a logical sequence without skipping levels.
  • Avoid hiding critical content behind interactions
    Pricing, specifications, and key details should be visible in the initial render—not locked behind tabs, accordions, or clicks.

Moderate impact, low effort

  • Add ARIA states to dynamic components
    Use attributes like aria-expanded, aria-controls, and aria-hidden for menus, accordions, and toggles.
  • Use descriptive link text
    Prefer “Read the full report” over “Click here.” Link text is a key signal for both users and agents.
  • Make screen reader testing part of QA
    Treat it as a standard workflow step, not a one-time accessibility audit.

Key Takeaways

AI agents interpret websites through three main mechanisms: vision-based analysis, DOM parsing, and the accessibility tree. Across the industry, there is a clear convergence toward the accessibility tree as the most reliable and efficient interface. Systems like OpenAI Atlas, Microsoft Playwright MCP, and Perplexity AI Comet all increasingly rely on accessibility data as their primary structure.

Web accessibility is no longer just a compliance concern. The accessibility tree is effectively the interface AI agents use to understand and interact with your website. Research from the University of California, Berkeley and the University of Michigan shows that agent success rates drop sharply when accessibility signals are removed or degraded.

Semantic HTML is the foundation of everything. Native elements like <button>, <label>, <nav>, and <main> automatically generate a meaningful accessibility tree without any additional tooling. In most cases, no framework logic or ARIA is required for core functionality.

ARIA is a supplement, not a replacement. It should be used for dynamic states and custom components that don’t have native HTML equivalents. But it should always come after semantic HTML—not instead of it. Misusing ARIA doesn’t improve accessibility; it often degrades it.

Server-side rendering is now a requirement for visibility. AI crawlers that don’t execute JavaScript cannot see content hidden behind client-side rendering. If your content isn’t present in the initial HTML, it effectively does not exist in the AI indexing ecosystem.

Screen reader testing remains the most practical proxy for agent compatibility. If tools like VoiceOver or NVDA can navigate your site successfully, AI agents likely can as well. For deeper inspection, Playwright MCP accessibility snapshots provide a direct view of what agents actually perceive.

The first three parts of this series covered why this shift matters, how to optimize for citation, and the emerging protocol layer. This article focused on the implementation layer—but these are not separate efforts. The same work that improves accessibility also improves search visibility, AI citation rates, and agent compatibility. One foundation now serves multiple audiences.

And importantly, this work compounds. The semantic HTML and structured content patterns described here are the same foundation systems like WebMCP build on for declarative interfaces. In other words, the accessibility tree you create today becomes the structural substrate for tomorrow’s agent-native web.

Up next in Part 5: the commerce layer—how Stripe, Shopify, and OpenAI are building infrastructure for AI agents to complete transactions, and what that means for the future of checkout flows.