Critical, Serious, Moderate, Minor: How to Triage Accessibility Issues by Severity
You ran your first comprehensive accessibility audit. The scan returned 247 findings. Your team has shipping pressure, limited bandwidth, and now a 247-row spreadsheet of "things that are wrong with the app".
What do you fix first?
This is the triage problem. Without a clear severity-to-ownership-to-SLA mapping, accessibility programs stall — every finding feels equally urgent and equally not-urgent at the same time. The team picks the easy ones, the hard ones rot, and six months later you're back where you started.
This post is the practical triage playbook: how the four severity levels actually map to user impact, how to assign ownership across design / engineering / content / QA, a 6-week rollout for a triage policy that won't drown the team, and how to share findings without forwarding PDFs.
What you'll get from this post
A working severity model that doesn't conflate "critical" with "annoying", a default ownership matrix you can adopt or adapt, an SLA template per severity level, a triage funnel diagram, and the metrics that tell you whether the program is healthy or backsliding.
Severity ≠ frequency
The most common triage mistake: treating high-severity findings as more important than high-frequency ones. They're different axes.
Consider:
- Critical, single page. A modal with a focus trap on the homepage. A real but localized issue.
- Minor, every page. The page footer's copyright date has a contrast issue (3.8:1 vs the AA threshold of 4.5:1). Affects every single screen, every user.
Both deserve attention. The minor-but-frequent issue may actually be more urgent in customer-impact terms — every user hits it on every page — even though "critical" sits higher on the severity ladder.
The right way to think about triage: severity sets the SLA; frequency sets the priority within an SLA tier.
A critical-on-rare-flow issue gets the P0 SLA but lower priority within P0 than a critical-on-checkout issue. A moderate-everywhere issue gets the P2 SLA but the highest priority within P2. In other words, priority is the cross of severity × frequency × business impact.
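If you want that cross as something sortable rather than a debate, it reduces to a rough score. A sketch, assuming hypothetical weights and field names:

type Severity = "critical" | "serious" | "moderate" | "minor";

interface Finding {
  id: string;
  severity: Severity;
  pagesAffected: number;   // pages in the scan that hit this issue
  totalPages: number;      // total pages scanned
  onRevenueFlow: boolean;  // e.g. checkout or signup; a crude stand-in for business impact
}

// Severity sets the SLA elsewhere; this score only orders findings inside one SLA tier.
function priorityScore(f: Finding): number {
  const frequency = f.pagesAffected / f.totalPages;  // 0..1
  const businessWeight = f.onRevenueFlow ? 2 : 1;
  return frequency * businessWeight;
}

// Within P0: a critical issue on checkout (frequency 0.1, weight 2) outranks a
// critical issue on a rarely visited settings page (frequency 0.01, weight 1).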
The four severity levels with concrete examples
Severity in WCAG-aligned scanners follows axe-core's 4-level system: critical, serious, moderate, minor. The labels aren't arbitrary — they map to specific user-impact descriptions. Memorize the canonical examples for each.
Critical
Definition: Blocks users with disabilities from completing core functionality. The user cannot accomplish the task at all.
Canonical examples:
- A form submit button with no accessible name. Screen-reader users can't identify it; voice-control users can't activate it.
- A modal dialog with no focus trap. Keyboard users tab into the page behind the modal and interact with hidden elements.
- A keyboard-only flow that requires mouse interaction. Pure keyboard users can't proceed.
- A missing alt on the only image that conveys core information (a CAPTCHA, a chart that's the page's primary content, a product photo with no surrounding description).
SLA target: Fix within the current sprint. P0 in your tracker.
Serious
Definition: Major barrier; most affected users can work around with effort, but the workaround is significant. The user can accomplish the task but only after extra friction.
Canonical examples:
- Color contrast failing AA for primary body text (~3:1 instead of 4.5:1). Most users with low vision strain to read; some give up.
- A keyboard trap in a non-modal component (e.g., a date picker that swallows arrow keys). User has to refresh the page to escape.
- Inline form-validation errors with no aria-live region. Screen-reader users hear no error feedback but can re-read the form.
- A missing visible focus indicator on hover-styled buttons. Keyboard users can navigate but with significant difficulty.
SLA target: Fix within the next sprint. P1.
Moderate
Definition: Usability issue; causes friction but rarely blocks. The task is achievable but the experience is degraded for affected users.
Canonical examples:
- Inconsistent heading order (an <h2> followed by an <h4>, skipping <h3>). Screen-reader navigation by headings is awkward.
- Color-only differentiation for required fields (a red asterisk with no text). Color-blind users may miss the indicator, but the form still works.
- Generic alt text ("image", "photo") on functional images. Conveys some info but not what's needed.
- Tab order that's logical but not visually expected (e.g., right column gets focus before left).
SLA target: Fix within the next 2-3 sprints. P2.
Minor
Definition: Polish item; very rarely user-blocking. The fix improves the experience but the absence doesn't materially impair use.
Canonical examples:
- Missing lang attribute on <html> when the document language is obvious from context.
- Decorative icons next to text labels marked up as <img> instead of properly hidden with aria-hidden.
- Empty headings (<h2></h2>) — usually a templating mistake that produces no actual content.
- <button> without type="button" inside a <form> (defaults to submit, often unintentional).
SLA target: Fix when you're already in the file. P3 / "maintenance" backlog.
Ownership: who fixes what
The single most useful artifact in any accessibility program is a clear ownership matrix mapping finding categories to owning teams. Without it, every finding starts as "who fixes this?" and stalls.
The default mapping that works for most teams:
The principle: owners should be one person or one team per category, not multi-team committees. A designer who owns contrast across the design system is faster than a contrast committee with reps from design, engineering, and QA.
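Because the matrix is just a lookup from finding category to owning team, it can live next to the code that assigns tickets. A minimal sketch, with illustrative categories and team names (substitute your own):

// Sketch: category → owner lookup used when a scan finding becomes a ticket.
// Categories and owners here are examples only; adapt to your org's structure.
const OWNERS: Record<string, string> = {
  "color-contrast": "design-systems",       // token-level fixes
  "focus-management": "frontend-platform",  // modals, traps, tab order
  "forms-and-labels": "feature-engineering",
  "alt-text-and-copy": "content",
  "regression-verification": "qa",
};

function ownerFor(category: string): string {
  // Unmapped categories go to a triage owner rather than stalling unassigned.
  return OWNERS[category] ?? "accessibility-triage";
}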
The triage funnel
Once findings have severity and ownership, they flow through a state machine. The cleaner the funnel, the faster issues move through it.
Five states cover the lifecycle (a transition sketch follows the list):
- Open — newly detected by a scan. Severity assigned automatically by the scanner; owner assigned by category mapping.
- In progress — assigned to a specific engineer/designer; pull request open or design ticket in review.
- Fixed — code merged or design tokens updated; ready for verification.
- Verified — re-scanned (automated) or manually checked; confirmed resolved.
- Closed — fully resolved and not re-detected for 14+ days (regression-proofed).
Edge states: deferred (acknowledged, intentionally out-of-scope, with documented reason and date), and wontfix (decision not to fix, e.g., third-party widget the team doesn't control). Both should be rare; both should be reviewed quarterly.
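One way to keep the funnel honest is to encode the allowed transitions, so a ticket can't jump from open to closed without passing verification. The states mirror the list above; the transition table itself is an illustrative sketch:

// Sketch: allowed transitions in the triage funnel.
type TriageState =
  | "open" | "in-progress" | "fixed" | "verified" | "closed"
  | "deferred" | "wontfix";

const TRANSITIONS: Record<TriageState, TriageState[]> = {
  "open": ["in-progress", "deferred", "wontfix"],
  "in-progress": ["fixed", "open"],   // can bounce back if blocked
  "fixed": ["verified", "open"],      // re-opens if verification fails
  "verified": ["closed", "open"],     // re-opens if re-detected within 14 days
  "closed": ["open"],                 // a regression re-opens; nothing re-closes silently
  "deferred": ["open"],               // quarterly review can pull it back in
  "wontfix": ["open"],
};

function canMove(from: TriageState, to: TriageState): boolean {
  return TRANSITIONS[from].includes(to);
}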
SLA template
A working SLA per severity. Adopt verbatim or adjust for your team's velocity.
| Severity | Triage SLA | Fix SLA | Re-verify SLA | Escalation trigger |
|---|---|---|---|---|
| Critical | 4 hours from detection | 3 business days from triage | 24 hours from fix | Open past 5 days, or 2+ open simultaneously |
| Serious | 1 business day | 10 business days | 3 business days | Open past 15 days |
| Moderate | 3 business days | 2 sprints (4 weeks) | Next scheduled scan | Open past 8 weeks |
| Minor | 1 sprint | "When you're in the file" | Next quarterly audit | Aging >6 months |
The triage SLA matters more than people expect. A finding that sits in "open" for a week unowned isn't a triage problem; it's an organizational problem masquerading as a triage problem. Tight triage SLAs (4 hours for critical, 1 business day for serious) keep findings moving.
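If findings carry timestamps, the SLA table translates directly into deadline checks. A sketch that mirrors the numbers above; the field names and the simplified calendar-day arithmetic are assumptions:

type Severity = "critical" | "serious" | "moderate" | "minor";

// Days are simplified to calendar days for brevity; the table above uses business days.
const SLA_DAYS: Record<Severity, { triage: number; fix: number }> = {
  critical: { triage: 4 / 24, fix: 3 },  // 4 hours to triage, 3 days to fix
  serious:  { triage: 1, fix: 10 },
  moderate: { triage: 3, fix: 28 },      // roughly 2 sprints
  minor:    { triage: 7, fix: 90 },      // "1 sprint" to triage (7 days here), fix within a quarter
};

function isPastFixSla(severity: Severity, triagedAt: Date, now: Date = new Date()): boolean {
  const elapsedDays = (now.getTime() - triagedAt.getTime()) / 86_400_000;
  return elapsedDays > SLA_DAYS[severity].fix;
}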
API field name reminder: impact vs severity
A practical note when working with accessibility scanners' APIs and webhooks: what the docs / UI call "severity" is impact in the wire format. axe-core uses impact: "critical" | "serious" | "moderate" | "minor" as the field name in JSON output, and most scanners (TestKase included) preserve that naming for API compatibility.
Example finding in JSON:
{
  "id": "color-contrast",
  "impact": "serious",
  "tags": ["wcag2aa", "wcag143"],
  "description": "Ensures the contrast between foreground and background colors meets WCAG 2 AA contrast ratio thresholds",
  "nodes": [
    {
      "target": [".btn-secondary"],
      "html": "<button class='btn-secondary'>Cancel</button>",
      "failureSummary": "Element has insufficient color contrast of 3.8 (foreground color: #6b7280, background color: #f3f4f6)"
    }
  ]
}
Same four buckets, same meaning, just a different field name when consumed by code. Worth flagging in your team's onboarding doc — engineers writing webhook handlers will look for severity and find nothing.
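In practice the webhook handler reads impact and maps it onto whatever your tracker calls the field. A minimal sketch, assuming a payload shaped like the JSON above:

// Sketch: normalize an axe-style finding for a tracker that expects "severity".
interface AxeFinding {
  id: string;
  impact: "critical" | "serious" | "moderate" | "minor";
  description: string;
  nodes: { target: string[]; html: string; failureSummary: string }[];
}

function toTicketFields(finding: AxeFinding) {
  return {
    title: `[a11y] ${finding.id}: ${finding.nodes[0]?.target.join(" ") ?? "unknown target"}`,
    severity: finding.impact,  // same four values, renamed for the tracker
    body: finding.nodes[0]?.failureSummary ?? finding.description,
  };
}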
Sharing findings without forwarding PDFs
The single most preventable cause of accessibility-program decay: PDFs.
Pattern that goes wrong: the QA team runs a scan, exports a 60-page PDF, emails it to engineering. Engineering opens it once, the issues live in PDF format with no way to integrate into the team's actual workflow tools, and within a week the PDF is forgotten in someone's downloads folder.
The pattern that works:
- Findings live in your tracker. GitHub issues, Linear, Jira — wherever the team's other work lives. Each finding becomes a ticket with severity, owner, repro steps, and suggested fix (the six-field template from our flow-audits post).
- Scans link to tickets; they don't produce reports. When a new scan finds a previously-unseen issue, the scanner's webhook auto-creates a ticket. Existing tickets get updated (e.g., "still failing on 2026-04-28 scan"); a create-or-update sketch follows after this list.
- Comments and sign-off live in the ticket. Designer pushes back on a contrast finding? Comment on the ticket. Engineer claims the fix is shipped? Mark fixed; auto-verification runs on next scan. Audit trail is the ticket history.
- Reports are summary, not source. When a stakeholder asks "what's our accessibility status?", point them to a dashboard (live, not exported) or generate a fresh PDF as an artifact-of-the-moment, not as the canonical source.
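The create-or-update behavior in step 2 is a small piece of glue code. The tracker client and fingerprint scheme below are placeholders that show the shape, not any specific API:

// Sketch: webhook glue that keeps one ticket per finding instead of one report per scan.
// `tracker` stands in for whatever client your issue tracker provides; it is not a real library.
interface TicketTracker {
  findByFingerprint(fp: string): Promise<{ id: string } | null>;
  create(fields: { title: string; severity: string; body: string; fingerprint: string }): Promise<void>;
  comment(ticketId: string, text: string): Promise<void>;
}

async function upsertFinding(
  tracker: TicketTracker,
  finding: { id: string; impact: string; description: string; target: string },
  scanDate: string,
) {
  // Fingerprint = rule + target, so the same issue maps to the same ticket across scans.
  const fingerprint = `${finding.id}::${finding.target}`;
  const existing = await tracker.findByFingerprint(fingerprint);
  if (existing) {
    await tracker.comment(existing.id, `Still failing on ${scanDate} scan.`);
  } else {
    await tracker.create({
      title: `[a11y] ${finding.id}: ${finding.target}`,
      severity: finding.impact,
      body: finding.description,
      fingerprint,
    });
  }
}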
TestKase's team-sharing implements this pattern: scans share to teams, comments thread on each finding, sign-off creates an audit trail, and the export-to-PDF is for stakeholders only — never the working surface. Same effect with any tracker integration; the principle matters more than the tool.
Handling severity disputes
You'll get them. PM says a finding is "minor", QA says "critical". Here's the protocol:
- Default to the higher severity. Cost of over-prioritizing is small; cost of under-prioritizing can be lawsuits, customer churn, or (most often) backlog rot.
- Defer to the role closest to user impact. Usually QA or an accessibility specialist. PMs sometimes underestimate severity because they're optimizing for product velocity; designers sometimes underestimate because they wrote the design that produced the failure. The role most directly observing impact has the best calibration.
- Document the decision in the ticket. Don't argue over the severity-classification line in private Slack and call it a day. Comment on the ticket: "Discussed 2026-04-28 — keeping as Critical because [user impact reason]. Reviewer: @accessibility-lead". Keeps the audit trail clean and prevents the same conversation in 3 months.
- Escalate persistent disputes. If the same person is repeatedly arguing for lower severity across many findings, that's a process signal — the calibration conversation isn't a one-off, it's organizational. Run a calibration session with concrete examples.
Metrics that matter
Five metrics tell you whether the triage program is healthy. Most teams track one to three of them; mature programs track all five.
1. Open by severity, over time
Plot critical / serious / moderate / minor open counts week-over-week. Healthy programs have:
- Critical: trending toward zero, occasional spikes that resolve within a week.
- Serious: stable or trending down. Spikes resolve within a sprint.
- Moderate: slowly trending down. Persistent backlog is fine; growing backlog is a signal.
- Minor: ignored except in deeper-cleanup sprints.
A multi-week growth in open critical findings without resolution is the canary for a degrading program — investigate before it becomes a crisis.
2. Mean time to fix (MTTF) per severity
How long, on average, from "open" to "fixed". Targets matching the SLAs above:
- Critical MTTF: under 3 days
- Serious MTTF: under 10 days
- Moderate MTTF: under 4 weeks
- Minor MTTF: under 1 quarter (or "as you're in the file")
MTTF rising over multiple sprints, especially for critical / serious, is a stronger signal than open-count metrics. Open count can hide a healthy backlog (lots of moderates, all moving). MTTF reveals systemic slowdown.
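MTTF per severity is a straightforward aggregation over each ticket's opened and fixed timestamps. A sketch, assuming each resolved finding carries those two dates:

// Sketch: mean time to fix per severity bucket, in days.
type Severity = "critical" | "serious" | "moderate" | "minor";

interface ResolvedFinding {
  severity: Severity;
  openedAt: Date;
  fixedAt: Date;
}

function mttfBySeverity(findings: ResolvedFinding[]): Record<Severity, number> {
  const sums: Record<Severity, { total: number; count: number }> = {
    critical: { total: 0, count: 0 },
    serious: { total: 0, count: 0 },
    moderate: { total: 0, count: 0 },
    minor: { total: 0, count: 0 },
  };
  for (const f of findings) {
    const days = (f.fixedAt.getTime() - f.openedAt.getTime()) / 86_400_000;
    sums[f.severity].total += days;
    sums[f.severity].count += 1;
  }
  const result = {} as Record<Severity, number>;
  for (const s of Object.keys(sums) as Severity[]) {
    result[s] = sums[s].count ? sums[s].total / sums[s].count : 0;
  }
  return result;
}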
3. Regression rate
% of fixed findings that re-appear within 14 days. A healthy program has under 5%. Higher rates suggest:
- Fixes are partial (the change addressed one element but not the design pattern).
- The CI gate isn't catching regressions before merge.
- The fix is in code that other teams also touch and they don't know the constraint.
If regression rate is climbing, look at whether your CI gate is working (see Accessibility in CI/CD) and whether your design tokens are codifying the fixes (see Color Contrast).
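Regression rate itself is just the share of fixed findings re-detected inside the 14-day window. A sketch, assuming each fixed finding records whether and when it reappeared:

// Sketch: % of fixed findings that re-appear within 14 days of the fix.
interface FixedFinding {
  fixedAt: Date;
  reDetectedAt?: Date;  // set if a later scan found the same fingerprint again
}

function regressionRate(findings: FixedFinding[]): number {
  if (findings.length === 0) return 0;
  const regressed = findings.filter((f) => {
    if (!f.reDetectedAt) return false;
    const days = (f.reDetectedAt.getTime() - f.fixedAt.getTime()) / 86_400_000;
    return days <= 14;
  }).length;
  return (regressed / findings.length) * 100;  // healthy programs stay under 5%
}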
4. Net new findings per sprint
How many new findings does each sprint introduce? Steady state is ~0 — your CI gate should prevent new violations. Spikes correlate with:
- New feature launches (touch many surfaces, introduce new patterns)
- Framework upgrades (Tailwind v3 → v4, React 17 → 18 — token mappings change)
- Onboarding new engineers (until trained on accessibility patterns)
Spike-and-recover is fine. Sustained high net-new is a process problem.
5. % of findings auto-resolved by CI
How many findings get caught at PR (CI scan) before they ever land on main. Mature programs see 70-90% auto-caught. Lower rates suggest the CI gate isn't running on all routes or the team is skipping gates frequently.
A 6-week rollout
If you have no triage policy today, here's how to put one in place without exhausting the team.
Week 1: Pick the severity buckets. Adopt the 4-level axe-core scheme verbatim. Don't invent your own.
Week 2: Build the ownership matrix. Adopt the table above as a starting point; adjust for your org's structure (e.g., if you don't have a content team, fold "alt text" into UX). Publish the matrix on your team wiki.
Week 3: Set the SLAs. Adopt the SLA template above; adjust to your team's velocity (a team on 2-week sprints might use longer fix SLAs than a team on 1-week sprints).
Week 4: Move existing findings into the tracker. Don't try to fix them all — just get them visible. Use the four buckets to triage.
Week 5: Run the first triage meeting. 30 minutes, weekly cadence. Review critical findings (status, blockers). Spot-check serious findings (any aging past SLA?). Don't try to discuss every finding — only the exceptions.
Week 6: Set up the dashboard. Track the five metrics above; pick a tool (your existing tracker's dashboard, a Grafana board, Notion). Make it visible to the wider team — not just accessibility specialists.
By end of week 6, you have an operating triage system. From there, the program runs itself; your role becomes monitoring metrics, escalating exceptions, and tuning the policy as the team grows.
Closing
Severity isn't an opinion. It's the bridge between "the scan found 247 issues" and "we have a working program that resolves issues at a sustainable pace". The four levels (critical / serious / moderate / minor) map cleanly to user impact; the SLAs map cleanly to engineering velocity; the ownership matrix maps cleanly to team boundaries.
Adopt the defaults, run the 6-week rollout, watch the metrics. The teams that get accessibility right aren't the ones with the smartest scanners — they're the ones with the cleanest triage and ownership.
For the broader rollout context, TestKase's team-sharing implements the "no PDFs" principle directly: scans live in shared workspaces, findings thread comments, sign-off creates an audit trail. Combined with the WCAG 2.2 AA checklist for what to check, color contrast deep-dive for the most-common category, and CI/CD integration for catching new issues at PR time, the triage policy in this post completes the full operational loop.
Set up your team's accessibility triage free →