Improving web accessibility with trace-augmented generation

Improving the accessibility of your web applications has never been easier, thanks to Tidewave’s new diagnostics pane. Tidewave can perform accessibility checks based on the page’s rendered content, rather than source code analysis, and forward those reports to your coding agent of choice.

Tidewave is the most precise and efficient tool to automatically diagnose and fix accessibility issues thanks to a technique we call Trace-Augmented Generation (TAG). By embedding framework-specific traces into diagnostics, our benchmarks show Tidewave achieves 2x higher accuracy (79% vs 40%), completes tasks 45% faster, and uses 9% fewer tokens on average compared to Claude Code. When compared to Cursor, it is 52% more accurate and 26% faster.

Benchmark summary:

  • Tidewave + Claude Code: 79% accuracy, 25.1k tokens
  • Cursor: 51.8% accuracy
  • Claude Code: 40.2% accuracy, 27.7k tokens

Runtime web accessibility

Tidewave uses the industry standard axe-core to capture web accessibility diagnostics. axe-core can find on average 57% of WCAG issues automatically, which we then forward to your coding agent.

By running the checks on the rendered page, Tidewave can find accessibility issues even in dynamic content, catching problems that static analysis tools miss.
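For readers unfamiliar with axe-core, here is a minimal sketch of the shape its results take and how the affected elements can be extracted for an agent. The sample object is hand-written for illustration, not real Tidewave output:

```javascript
// Hand-written sample shaped like axe-core's results object: each violation
// lists the offending DOM nodes, with their CSS selector paths in `target`.
const results = {
  violations: [
    {
      id: "image-alt",
      impact: "critical",
      help: "Images must have alternate text",
      nodes: [{ target: ["main > img:nth-child(2)"], html: '<img src="logo.png">' }]
    }
  ]
};

// Flatten the report into (rule, selector) pairs to hand to a coding agent.
function affectedElements(results) {
  return results.violations.flatMap((v) =>
    v.nodes.map((node) => ({ rule: v.id, selector: node.target.join(" ") }))
  );
}

console.log(affectedElements(results));
// [{ rule: "image-alt", selector: "main > img:nth-child(2)" }]
```

In a real page you would obtain `results` by calling `axe.run()` in the browser; everything after that point is plain data manipulation.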

Trace-Augmented Generation (TAG)

Whenever you run accessibility reports using Playwright or Chrome’s Lighthouse, those tools capture all DOM elements with accessibility violations. They can export metadata about these elements, such as their selector paths and attributes, to coding agents, in the hope that the agents can locate where the elements are rendered in the source code and address the issues.

Most coding agents will attempt to find these elements by searching your codebase, with varying degrees of sophistication, ranging from running grep to performing semantic analysis. These retrieval techniques are an active field of research known as Retrieval-Augmented Generation (RAG).

Unfortunately, mapping those elements to source files is not always straightforward. For example, if you find an img DOM element with an accessibility violation, the first step of the coding agent is to search the codebase. The non-deterministic nature of the model makes search quite unpredictable, with varying accuracy and speed. A large codebase may also contain hundreds of img tags, leading the agent to waste tokens and time processing false positives.

Furthermore, because many rich web applications use component systems and other high-level abstractions, sometimes the img tag itself cannot even be found in your project’s source. For instance:

  • In Next.js, Phoenix, and React applications, it may come from an Image component provided by a third-party component library
  • Rails applications typically use the image_tag helper
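Taking the component-library case, here is a hypothetical sketch (a stand-in `Image` function, not any real library) of why grepping for the tag fails: the literal `<img` string lives in the library, never in the application source:

```javascript
// Stand-in for a third-party Image component: the literal "<img" string
// is produced by the library, not written anywhere in the application code.
function Image({ src, alt }) {
  return `<img src="${src}" alt="${alt}" loading="lazy">`;
}

// Application code only ever references the component by name...
const page = `<main>${Image({ src: "/logo.png", alt: "" })}</main>`;

// ...so the rendered DOM contains an <img> element that a grep for "<img"
// across the application source would never find.
console.log(page.includes("<img")); // true
```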

These issues compound, causing coding agents to take even longer to diagnose the root cause, or to fail entirely in many cases.

We solved this in Tidewave with Trace-Augmented Generation (TAG). Since Tidewave has tailored integration with each web framework we support (at the time of writing: Django, FastAPI, Flask, Next.js, Phoenix, Rails, and React), we can map DOM elements to their source code locations, which we then attach to each affected element in the accessibility report. This means the coding agent can then precisely act on our diagnostics. Here is a screenshot from the session in our announcement video, showing how the agent was able to immediately read and find the root causes, without search steps:
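We have not shown Tidewave’s exact wire format here, but conceptually a TAG-augmented report entry pairs each violation with its rendering trace. The sketch below uses hypothetical file paths and field names for illustration:

```javascript
// Hypothetical shape: the plain violation, augmented with the
// framework-provided rendering trace (file/line pairs, innermost first).
const augmented = {
  rule: "image-alt",
  selector: "main > img:nth-child(2)",
  trace: [
    { file: "app/views/accounts/edit.html.erb", line: 12 }, // where the tag is rendered
    { file: "app/views/layouts/application.html.erb", line: 4 } // enclosing layout
  ]
};

// The agent can open the innermost frame directly, with no codebase search,
// and walk "up the stack" through the remaining frames if needed.
function entryPoint(entry) {
  const [innermost] = entry.trace;
  return `${innermost.file}:${innermost.line}`;
}

console.log(entryPoint(augmented)); // "app/views/accounts/edit.html.erb:12"
```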

Fixing accessibility issues in Livebook's learn page

For the cases where the DOM element is rendered from a library, we also include the component/template rendering trace, so the agent can go “up the stack” and explore related files, drastically reducing the search space. You can see this in the screenshot above when the coding agent jumps from its current template to the layout. The only search the agent performed was within a single file; it never had to grep the codebase.

Finally, because Tidewave runs in the browser, it can use browser APIs whenever it cannot statically determine a property of an element (for example, the element’s color or visibility) and use this additional data to solve problems. In our benchmarks, Tidewave used its automatic error detection to spot bugs and fix pages without additional user input.
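Color contrast is a good example of a property only the browser truly knows. Once the computed colors are in hand (for instance via `getComputedStyle`), the WCAG 2 contrast ratio itself is a small computation. A sketch, assuming sRGB colors as `[r, g, b]` triples:

```javascript
// Relative luminance of an sRGB color per WCAG 2 (channels 0-255).
function luminance([r, g, b]) {
  const [R, G, B] = [r, g, b].map((v) => {
    const c = v / 255;
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * R + 0.7152 * G + 0.0722 * B;
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging 1 to 21.
function contrastRatio(fg, bg) {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

console.log(contrastRatio([0, 0, 0], [255, 255, 255])); // ~21 (black on white)
console.log(contrastRatio([119, 119, 119], [255, 255, 255])); // ~4.48, just below AA's 4.5
```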

Benchmarks

We benchmarked three separate open-source web applications, written in different frameworks, across three different setups:

  • Claude Code with Sonnet 4.5 via Tidewave (browser tools plus trace-augmented generation)
  • Claude Code with Sonnet 4.5 via CLI
  • Cursor with Sonnet 4.5 (browser tools)

All tests were one-shot. We used the same model in all of them, as we wanted to measure the variation in tooling.

The prompt was the same in all cases: the accessibility report generated by Tidewave plus the current URL. The report includes the failed diagnostics and the query selector of every affected element. When TAG was enabled, we also included their rendering traces. In all reports below, we computed the average over 3 independent runs.

We measured the number of message tokens, excluding system prompts. We have not found a reliable way to retrieve this information from Cursor, so token counts are omitted for Cursor.

Ruby on Rails: Campfire

Campfire account edit screenshot

Campfire is an open-source, super simple group chat application by Basecamp. We ran accessibility reports on the /account/edit page and found 4 affected elements. Given the relatively small size of the HTML page, we also ran an additional experiment that included the whole HTML source. Previous experiments showed that including HTML contents does not improve the agent’s ability, which we validated once again here:

Setup                    Accuracy  Time    Tokens
Tidewave + Claude Code   4/4       55s     8.5k
Cursor                   2.3/4     2m08s   n/a
Claude Code              2.3/4     1m24s   15.5k
Claude Code + HTML       2.3/4     2m10s   22.4k

Including the HTML increased both the running time and the number of consumed tokens, with no impact on accuracy.

Phoenix: Livebook

Livebook new notebook screenshot

Livebook is an open-source code notebook platform for Elixir, written in Phoenix and LiveView, and maintained by Dashbit (that’s us!). We ran accessibility reports on a brand new notebook page and found 10 affected elements. Livebook has a good mixture of front-end and back-end code, which increases the search space:

Setup                    Accuracy  Time    Tokens
Tidewave + Claude Code   8/10      4m03s   29.5k
Cursor                   8/10      4m37s   n/a
Claude Code              3.6/10    6m45s   39.6k

Cursor did surprisingly well in the scenario above. We wanted to understand whether this was caused by its use of browser tools, by the fact that models generally do well on Elixir across the board, or by another reason. So we ran another batch of tests, this time disabling browser tools in Cursor, and saw no meaningful variation in the results.

Next.js (React): Shadcn

Shadcn Directory

Shadcn is a set of beautifully designed components, with its landing page and documentation site written in Next.js. We ran accessibility reports on /docs/directory, which also includes MDX content, and found 11 affected elements.

Setup                    Accuracy  Time    Tokens
Tidewave + Claude Code   6.3/11    2m56s   37.3k
Cursor                   2/11      3m55s   n/a
Claude Code              3/11      6m21s   27.9k

We saw higher token usage for Tidewave because, when using the browser to verify whether the issues were effectively addressed, two out of three times it spotted that a color contrast issue was still unresolved and then did additional work to address it. The single run where it did not use browser tools led to worse accuracy.