Tutorial 8: Add Visual Regression Testing
- Contributor
- May 3
- 3 min read
Functional tests verify what the UI does. Visual regression tests verify what it looks like. This tutorial walks through adding visual regression tests without producing noise that nobody triages.
What You'll Build
Visual regression coverage for 3-5 critical UI views, with a process for reviewing diffs.
Step 1: Pick the Tool (10 min)
Options:
Playwright built-in: screenshot comparison; free; cross-browser
Percy: SaaS; cleaner DX; reviewers can approve diffs in the UI
Chromatic: great for Storybook-based projects
BackstopJS: open source, self-hosted
For Playwright projects, the built-in screenshot comparison is enough to start. Move to Percy or Chromatic if you need a better review UX.
Step 2: Configure (10 min)
Playwright built-in:
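// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  expect: {
    toHaveScreenshot: {
      maxDiffPixels: 100, // how many pixels may differ before the test fails
      threshold: 0.2,     // per-pixel color tolerance, 0 (strict) to 1 (lax)
    },
  },
});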
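maxDiffPixels caps how many pixels may differ. threshold sets how far a single pixel's color may drift (0 is strict, 1 is lax) before that pixel counts as changed. Together they tolerate minor anti-aliasing differences. Too strict produces noise; too loose misses bugs.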
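To see how the two settings interact, here is a simplified sketch. It assumes a plain RGB distance, whereas Playwright's real comparison uses a perceptual color metric, and the function names are made up for illustration.

```typescript
// Simplified illustration only: not Playwright's actual comparison algorithm.
type Rgb = [number, number, number];

// Normalized distance between two pixels: 0 = identical, 1 = black vs. white.
function colorDistance(a: Rgb, b: Rgb): number {
  const dr = a[0] - b[0];
  const dg = a[1] - b[1];
  const db = a[2] - b[2];
  return Math.sqrt((dr * dr + dg * dg + db * db) / (3 * 255 * 255));
}

// Count the pixels whose distance exceeds `threshold` (the per-pixel tolerance).
function countDiffPixels(baseline: Rgb[], current: Rgb[], threshold: number): number {
  if (baseline.length !== current.length) {
    throw new Error('images must have the same number of pixels');
  }
  let count = 0;
  for (let i = 0; i < baseline.length; i++) {
    if (colorDistance(baseline[i], current[i]) > threshold) count++;
  }
  return count;
}

// The comparison passes while the differing-pixel count stays within `maxDiffPixels`.
function screenshotMatches(
  baseline: Rgb[],
  current: Rgb[],
  threshold: number,
  maxDiffPixels: number,
): boolean {
  return countDiffPixels(baseline, current, threshold) <= maxDiffPixels;
}
```

In this model, raising threshold makes each pixel more forgiving, while raising maxDiffPixels forgives more pixels overall. Tune one at a time so you can tell which one silenced a diff.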
Step 3: Pick Views to Cover (10 min)
Not every page. The high-value views:
Sign-in
Main dashboard
Critical user actions
Design-system showcase (if you have one)
3-5 views is a reasonable start. Each one captures the look of a critical part of the product.
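One way to keep the covered views explicit is to hold them as data that the tests can loop over. This is a sketch: the VisualView shape and the helper names are hypothetical, and the two entries reuse the routes from this tutorial.

```typescript
// Hypothetical view registry; field names are illustrative.
interface VisualView {
  name: string;          // used to derive the snapshot file name
  path: string;          // route to visit
  requiresAuth: boolean; // whether the test must sign in first
}

const visualViews: VisualView[] = [
  { name: 'signin', path: '/signin', requiresAuth: false },
  { name: 'dashboard', path: '/dashboard', requiresAuth: true },
];

// Snapshot file name passed to toHaveScreenshot for a view.
function snapshotName(view: VisualView): string {
  return `${view.name}.png`;
}

// Guard against two views silently sharing one baseline file.
function assertUniqueNames(views: VisualView[]): void {
  const seen = new Set<string>();
  for (const view of views) {
    if (seen.has(view.name)) {
      throw new Error(`duplicate view name: ${view.name}`);
    }
    seen.add(view.name);
  }
}
```

A short list like this also makes the coverage decision reviewable: adding a view becomes a one-line change that someone has to approve.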
Step 4: Write the Visual Tests (20 min)
import { test, expect } from '@playwright/test';
// signIn is your project's own login helper, not part of Playwright.
// The import path below is hypothetical; adjust it to your project.
import { signIn } from './helpers';

test('signin page matches baseline', async ({ page }) => {
  await page.goto('/signin');
  await expect(page).toHaveScreenshot('signin.png');
});

test('dashboard matches baseline', async ({ page }) => {
  await signIn(page);
  await page.goto('/dashboard');
  // Wait for content to load
  await page.waitForLoadState('networkidle');
  await expect(page).toHaveScreenshot('dashboard.png');
});

test('button component matches baseline', async ({ page }) => {
  await page.goto('/storybook/button');
  // Screenshot a single element instead of the whole page
  const button = page.locator('[data-testid="primary-button"]');
  await expect(button).toHaveScreenshot('button-primary.png');
});
You can screenshot a full page or a specific element.
Step 5: Generate Baselines (5 min)
First run generates baselines:
npx playwright test --update-snapshots
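Review the baselines. Are they what you'd expect? If so, commit them. Playwright stores baselines per browser and operating system, so generate them on the same OS your CI runs on.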
Step 6: Handle Dynamic Content (15 min)
Visual regression breaks on dynamic content (timestamps, animated elements, ads). Strategies:
Mask dynamic elements:
await expect(page).toHaveScreenshot('dashboard.png', {
  mask: [
    page.locator('[data-testid="current-time"]'),
    page.locator('[data-testid="user-avatar"]'),
  ],
});
Stabilize the data:
// Use a known test user with stable data
await signIn(page, 'visualtest@example.com');
Disable animations:
await page.addStyleTag({
  content: `
    *, *::before, *::after {
      animation-duration: 0s !important;
      transition-duration: 0s !important;
    }
  `,
});
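If several tests need the animation override, it can be wrapped in a small helper. The sketch below is hypothetical: disableAnimations and StyleTagTarget are names invented here, and the interface only mirrors the shape of the page.addStyleTag call used above so the helper can be exercised without a browser.

```typescript
// Minimal structural type: anything with a compatible addStyleTag method fits,
// including a Playwright page.
interface StyleTagTarget {
  addStyleTag(options: { content: string }): Promise<unknown>;
}

// CSS that collapses every animation and transition to zero duration.
const NO_MOTION_CSS = `
  *, *::before, *::after {
    animation-duration: 0s !important;
    transition-duration: 0s !important;
  }
`;

// Hypothetical helper: call after navigation, before taking the screenshot.
async function disableAnimations(page: StyleTagTarget): Promise<void> {
  await page.addStyleTag({ content: NO_MOTION_CSS });
}
```

In a test this would be called as `await disableAnimations(page);` right after `page.goto(...)`, since a style tag injected before navigation is lost when the new document loads.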
Step 7: Run Cross-Browser (15 min)
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});
Each browser gets its own baseline, which catches browser-specific rendering issues.
If you don't need cross-browser coverage, skip this step. Every extra browser multiplies the baselines you have to review and maintain.
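To make the per-browser baselines concrete, the sketch below mimics the file names Playwright produces under its default naming, which appends the project name and the platform to the snapshot name. The functions are illustrative only, since Playwright does this internally, and 'linux' is an assumed CI platform.

```typescript
// Illustration of the default naming: <name>-<project>-<platform><ext>.
function baselineFileName(snapshot: string, project: string, platform: string): string {
  const dot = snapshot.lastIndexOf('.');
  const stem = dot === -1 ? snapshot : snapshot.slice(0, dot);
  const ext = dot === -1 ? '.png' : snapshot.slice(dot);
  return `${stem}-${project}-${platform}${ext}`;
}

// Every snapshot is stored once per project, so the baseline count multiplies.
function expectedBaselines(snapshots: string[], projects: string[], platform: string): string[] {
  const files: string[] = [];
  for (const snapshot of snapshots) {
    for (const project of projects) {
      files.push(baselineFileName(snapshot, project, platform));
    }
  }
  return files;
}

// Example: two views across two browsers on an assumed Linux CI runner.
const baselineFiles = expectedBaselines(
  ['signin.png', 'dashboard.png'],
  ['chromium', 'webkit'],
  'linux',
);
```

Here baselineFiles holds four entries, starting with signin-chromium-linux.png. That multiplication is the maintenance cost to weigh before adding a second browser.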
Step 8: Set Up the Review Flow (10 min)
When a visual test fails, the developer sees:
Baseline image
Current image
Diff overlay
They decide:
Bug: fix the code
Intentional change: update the baseline (npx playwright test --update-snapshots)
Noise: mask or stabilize
Without a review flow, visual tests become noise. Make sure each diff gets a real look.
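The three outcomes above can be written down as a small decision record. This is a sketch of one possible convention, not a Playwright feature: the DiffReview shape, nextAction, and canUpdateBaseline are all hypothetical names.

```typescript
// Hypothetical review record; field names are illustrative.
type Verdict = 'bug' | 'intentional' | 'noise';

interface DiffReview {
  snapshot: string; // e.g. 'dashboard.png'
  verdict: Verdict;
  reviewer: string; // who actually looked at the diff
}

// What the developer does next for each verdict.
const nextAction: Record<Verdict, string> = {
  bug: 'Fix the code and keep the baseline.',
  intentional: 'npx playwright test --update-snapshots',
  noise: 'Mask or stabilize the dynamic content.',
};

// A baseline may only be updated after a named reviewer marks the change intentional.
function canUpdateBaseline(review: DiffReview): boolean {
  return review.verdict === 'intentional' && review.reviewer.trim().length > 0;
}
```

Requiring a named reviewer is the point of the sketch: updating a baseline becomes a recorded decision instead of a reflex.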
Step 9: Track in CI (15 min)
- name: Run visual tests
  run: npx playwright test --project=chromium
- name: Upload diff images on failure
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: playwright-report
    path: playwright-report/
Failed visual tests produce reviewable artifacts.
Step 10: Maintain (ongoing)
Quarterly:
Review baselines that haven't been updated in a long time: are they still correct?
Review masks: do they still target the right dynamic content?
Add coverage for new critical views
Remove coverage for removed views
Visual baselines drift if not maintained.
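The first quarterly check can be partly automated. The sketch below flags baselines older than a cutoff; the BaselineInfo shape is hypothetical, and where the dates come from (for example, file modification times or version-control history) is left as an assumption.

```typescript
// Hypothetical description of one baseline image.
interface BaselineInfo {
  file: string;
  lastUpdated: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return the files whose last update is older than `maxAgeDays` relative to `now`.
function findStaleBaselines(baselines: BaselineInfo[], now: Date, maxAgeDays: number): string[] {
  const cutoff = now.getTime() - maxAgeDays * MS_PER_DAY;
  return baselines
    .filter((baseline) => baseline.lastUpdated.getTime() < cutoff)
    .map((baseline) => baseline.file);
}
```

A stale baseline is not necessarily wrong. The list only tells the reviewer which images to open and compare against the current product.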
What You Just Did
You added a layer of testing that catches what functional tests can't: visual breakage. With masking and stabilization, the suite stays low-noise.
Common Failure Modes
Coverage of every view. Hundreds of baselines; high maintenance. Focus on critical views.
Strict tolerance. Every minor change fails. Tune the diff threshold.
Auto-accept diffs. "I'll just update the baseline" defeats the purpose. Review every diff before accepting it.
Dynamic content not stabilized. Tests fail every run. Mask or stabilize.
Browser variance not handled. Tests pass on Chromium, fail on WebKit. Either handle both or pick one.
Next Tutorial
Tests that lie are worse than missing tests. Triage them: Tutorial 9: Quarantine Flaky Tests.
Related reading
Keep learning. This article is part of the Test Automation path in the ShiftQuality Learning Center. Build test automation that lasts, with ROI you can defend.


