Presenter(s)

Saabiriin Abdi

Abstract

Automated accessibility evaluation tools, such as WAVE and Google Lighthouse, are widely used to assess compliance with the Web Content Accessibility Guidelines (WCAG). However, prior studies indicate that these tools often disagree, vary in the success criteria they support, and differ in the types of issues they detect. Most of these evaluations were conducted on real websites, where the precise number of accessibility violations is unknown; without a ground-truth baseline, it is impossible to measure false negatives, false positives, or overall detection accuracy.

This project addresses that gap by creating a controlled HTML webpage containing 40 intentional WCAG 2.1 Level A and AA violations distributed across the four POUR principles. Each violation is documented with its success criterion, HTML location, and expected detection behavior. WAVE and Lighthouse were each executed 40 times under identical system conditions using automated macros to ensure consistency, and all results were compared directly against the WCAG 2.1 success criteria to identify true positives, false negatives, and false positives.
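
To illustrate the comparison step, the sketch below uses hypothetical data (the criterion IDs and element locations are invented for illustration and are not the project's actual catalog) to show how one tool's reported issues can be classified against a documented ground truth of seeded violations:

    # Ground truth: seeded violations keyed by WCAG 2.1 success criterion
    # and HTML location (hypothetical examples only).
    ground_truth = {
        ("1.1.1", "#hero-img"),      # non-text content: missing alt text
        ("1.4.3", "#promo-banner"),  # contrast (minimum): low-contrast text
        ("3.3.2", "#email-field"),   # labels or instructions: input without a label
    }

    # Findings reported by a single tool, mapped to the same keys.
    tool_findings = {
        ("1.1.1", "#hero-img"),
        ("4.1.2", "#nav-menu"),      # reported issue not among the seeded violations
    }

    true_positives = ground_truth & tool_findings    # seeded and detected
    false_negatives = ground_truth - tool_findings   # seeded but missed
    false_positives = tool_findings - ground_truth   # reported but not seeded

    print(f"TP={len(true_positives)}  FN={len(false_negatives)}  FP={len(false_positives)}")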

Both tools exhibited deterministic behavior, producing identical results across all runs. WAVE detected a broader range of violations, especially in perceivability and structure, yet generated substantial informational noise. Conversely, Lighthouse identified fewer violations overall but showed stronger performance in programmatically testable areas such as ARIA validation and contrast errors. Overlap analysis showed a high level of agreement on missing alt text, absent labels, and contrast failures, but a low level of agreement on keyboard operability, focus order, and landmark structure. These findings indicated that the two tools captured fundamentally different subsets of WCAG violations.
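
The overlap analysis reduces to set comparisons over the success criteria each tool flagged; a minimal sketch with hypothetical data (not the study's actual results) is shown below:

    # Compare which seeded success criteria each tool flagged (hypothetical data).
    wave_detected = {"1.1.1", "1.3.1", "1.4.3", "2.4.6", "3.3.2"}
    lighthouse_detected = {"1.1.1", "1.4.3", "3.3.2", "4.1.2"}

    both = wave_detected & lighthouse_detected        # flagged by both tools
    wave_only = wave_detected - lighthouse_detected   # flagged only by WAVE
    lighthouse_only = lighthouse_detected - wave_detected

    # Jaccard agreement: shared criteria relative to everything either tool flagged.
    agreement = len(both) / len(wave_detected | lighthouse_detected)
    print(f"both={sorted(both)}  agreement={agreement:.2f}")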

Overall, the findings showed that no single automated tool provided complete WCAG 2.1 coverage. WAVE offered breadth of coverage but lacked precision, while Lighthouse offered precision but lacked breadth. The results demonstrated that multi-tool evaluation is necessary for meaningful accessibility assessment, and that manual testing remains crucial for success criteria that cannot be consistently automated. This study offers empirical evidence supporting the combined use of automated and manual approaches in assessing web accessibility.

College

College of Science & Engineering

Department

Computer Science

Campus

Winona

First Advisor/Mentor

Mingrui Zhang; Trung Nguyen

Location

Kryzsko Great River Ballroom, Winona, Minnesota; United States

Start Date

4-23-2026 1:00 PM

End Date

4-23-2026 2:00 PM

Presentation Type

Poster Session

Format of Presentation or Performance

In-Person

Session

2a=1pm-2pm

Poster Number

1

Title

Analyzing Accessibility Issue Detection Differences Between WAVE and Google Lighthouse Using a Controlled WCAG Violation Webpage
