
Regex for Humans

  • ShiftQuality Contributor
  • May 31, 2025
  • 4 min read

Regular expressions have a reputation for being unreadable. This reputation is earned:

^(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})$

That's a phone number validator. Don't worry about reading it. Nobody can.

But regex doesn't have to be like that. The basics are simple, useful, and learnable in an afternoon. The advanced stuff exists when you need it. You don't need to start there.

What Regex Does

Regular expressions describe patterns in text. Instead of searching for an exact string ("hello"), you search for a pattern: "a word that starts with h and is five characters long" becomes \bh\w{4}\b. (Strictly, \w also accepts digits and underscores, not just letters.)
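As a quick illustration of the difference between an exact search and a pattern search, here is a minimal Python sketch. The sample sentence and variable names are made up for the example.

```python
import re

text = "hello hotel hi heart helicopter the"

# Exact search: only tells you whether this one string appears.
has_hello = "hello" in text

# Pattern search: \b = word boundary, h = literal "h",
# \w{4} = exactly four more word characters.
five_char_h_words = re.findall(r"\bh\w{4}\b", text)

print(has_hello)          # True
print(five_char_h_words)  # ['hello', 'hotel', 'heart']
```

"hi" is too short, "helicopter" is too long, and "the" does not start with h, so none of them match.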

Where you'll use them:

  • Search and replace in your editor (VS Code, any IDE)

  • Validating user input (email format, phone numbers)

  • Parsing log files for specific patterns

  • Extracting data from text (scraping, data cleaning)

  • Command-line tools (grep, sed, awk)

The Building Blocks

Literal Characters

Letters and numbers match themselves. The regex cat matches the text "cat." Nothing fancy.

The Dot: Any Character

. matches any single character (except, by default, a newline).

c.t matches "cat", "cut", "c9t", "c!t" — anything with c, then any character, then t.
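A small Python sketch of the dot in action. The candidate strings are invented for the example, and re.fullmatch is used so the whole string has to fit the pattern.

```python
import re

candidates = ["cat", "cut", "c9t", "c!t", "ct", "coat"]

# The dot stands for exactly one character, so "ct" (zero characters
# between c and t) and "coat" (two characters) do not fit.
dot_matches = [s for s in candidates if re.fullmatch(r"c.t", s)]
print(dot_matches)  # ['cat', 'cut', 'c9t', 'c!t']

# By default the dot skips newlines; re.DOTALL lifts that restriction.
newline_default = re.fullmatch(r"c.t", "c\nt")
newline_dotall = re.fullmatch(r"c.t", "c\nt", re.DOTALL)
print(newline_default is None, newline_dotall is not None)  # True True
```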

Character Classes: Specific Options

[abc] matches "a", "b", or "c" (one character from the set).

[0-9] matches any digit.

[a-zA-Z] matches any letter (upper or lowercase).

[^0-9] matches anything that's NOT a digit (the ^ inside brackets means "not").
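The four classes above can be tried out like this in Python. The input strings are arbitrary samples chosen for the illustration.

```python
import re

# One character from an explicit set.
from_set = re.findall(r"[abc]", "a big cab")

# A range of characters.
digits = re.findall(r"[0-9]", "Order A7 shipped to dock B2 at 9am")
letters = re.findall(r"[a-zA-Z]", "R2-D2")

# A negated class: ^ as the first character inside [] means "not".
non_digits = re.findall(r"[^0-9]", "a1b2")

print(from_set)    # ['a', 'b', 'c', 'a', 'b']
print(digits)      # ['7', '2', '9']
print(letters)     # ['R', 'D']
print(non_digits)  # ['a', 'b']
```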

Shorthand Classes

These are so common they get shortcuts:

| Pattern | Meaning | Equivalent |
|---------|---------|------------|
| \d | Any digit | [0-9] |
| \w | Any word character | [a-zA-Z0-9_] |
| \s | Any whitespace | [ \t\n\r\f\v] |
| \D | NOT a digit | [^0-9] |
| \W | NOT a word character | [^a-zA-Z0-9_] |
| \S | NOT whitespace | [^ \t\n\r\f\v] |
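Here is a short Python sketch of the shorthand classes on one sample string. The string is invented for the example. One caveat: in Python 3 these shorthands are Unicode-aware for text patterns, so the ASCII equivalents are an approximation unless you pass re.ASCII.

```python
import re

sample = "Room 42, Floor_3\tEast"

digit_chars = re.findall(r"\d", sample)   # each digit separately
words = re.findall(r"\w+", sample)        # runs of word characters
spaces = re.findall(r"\s", sample)        # spaces and the tab
non_word = re.findall(r"\W", sample)      # everything \w rejects

print(digit_chars)  # ['4', '2', '3']
print(words)        # ['Room', '42', 'Floor_3', 'East']
print(spaces)       # [' ', ' ', '\t']
print(non_word)     # [' ', ',', ' ', '\t']
```

Note that the underscore in "Floor_3" counts as a word character, which is why it stays in one piece.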

Quantifiers: How Many

| Pattern | Meaning |
|---------|---------|
| * | Zero or more |
| + | One or more |
| ? | Zero or one (optional) |
| {3} | Exactly 3 |
| {2,5} | Between 2 and 5 |
| {3,} | 3 or more |

\d{3} matches exactly three digits: "123", "999", "007".

\d+ matches one or more digits: "1", "42", "98765".

colou?r matches "color" and "colour" — the u is optional.
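The quantifier examples above, sketched in Python with a few made-up inputs. Quantifiers are greedy by default, which is why {2,5} grabs five digits out of six.

```python
import re

# {3}: exactly three digits, nothing more, nothing less.
exactly_three = [s for s in ["007", "12", "1234"] if re.fullmatch(r"\d{3}", s)]

# +: one or more digits in a row.
numbers = re.findall(r"\d+", "1 apple, 42 pears, 98765 seeds")

# ?: the preceding "u" may appear zero times or once.
spellings = re.findall(r"colou?r", "color colour colouur")

# {2,5}: between two and five digits, taking as many as it can.
runs = re.findall(r"\d{2,5}", "7 42 123456")

print(exactly_three)  # ['007']
print(numbers)        # ['1', '42', '98765']
print(spellings)      # ['color', 'colour']
print(runs)           # ['42', '12345']
```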

Anchors: Position

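^ matches the start of the text and $ matches the end. With the multiline flag (or in line-oriented tools like grep), they match at the start and end of each line.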

^Hello matches "Hello world" but not "Say Hello".

world$ matches "Hello world" but not "world class".

\b matches a word boundary — the edge between a word character and a non-word character.

\bcat\b matches "cat" but not "category" or "concatenate".
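A minimal Python sketch of the anchors, using the same example strings plus a two-line string to show what re.MULTILINE changes.

```python
import re

starts_ok = re.search(r"^Hello", "Hello world") is not None
starts_bad = re.search(r"^Hello", "Say Hello") is not None
ends_ok = re.search(r"world$", "Hello world") is not None
ends_bad = re.search(r"world$", "world class") is not None
print(starts_ok, starts_bad, ends_ok, ends_bad)  # True False True False

# \b only allows a match where a word starts or ends.
whole_words = re.findall(r"\bcat\b", "cat category concatenate cat.")
print(whole_words)  # ['cat', 'cat']

# Without a flag, ^ anchors to the start of the whole string;
# with re.MULTILINE it also anchors after every newline.
two_lines = "first line\nsecond line"
first_only = re.findall(r"^\w+", two_lines)
each_line = re.findall(r"^\w+", two_lines, re.MULTILINE)
print(first_only)  # ['first']
print(each_line)   # ['first', 'second']
```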

Groups: Capturing Parts

Parentheses group parts of a pattern and capture the match.

(\d{3})-(\d{4}) matches "555-1234" and captures "555" as group 1 and "1234" as group 2.

This is how you extract data from text — match the pattern and pull out the parts you need.
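In Python, the captured parts are available on the match object. A small sketch, with an invented sentence around the number:

```python
import re

match = re.search(r"(\d{3})-(\d{4})", "Call 555-1234 today")

whole = match.group(0)   # the entire matched text
prefix = match.group(1)  # first pair of parentheses
line = match.group(2)    # second pair of parentheses
parts = match.groups()   # all captured groups as a tuple

print(whole)   # 555-1234
print(prefix)  # 555
print(line)    # 1234
print(parts)   # ('555', '1234')
```

re.search returns None when nothing matches, so real code should check for that before calling .group().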

Practical Examples

Validate an Email (Simple Version)

^[\w.+-]+@[\w-]+\.[\w.]+$

Translation: Start of string → one or more word characters/dots/plus/hyphens → @ → one or more word characters/hyphens → a literal dot → one or more word characters/dots → end of string.

This isn't a perfect email validator (the RFC is insanely complex). But it catches the obvious formatting errors and is good enough for most real-world input.
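One way to wrap this pattern in Python is sketched below. The names EMAIL_RE and looks_like_email are made up for the example. It uses fullmatch rather than match because Python's $ also matches just before a trailing newline, which would let "ada@example.com\n" through.

```python
import re

# Same pattern as above; EMAIL_RE and looks_like_email are illustrative names.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")


def looks_like_email(value: str) -> bool:
    """Rough shape check only: something@something.something."""
    return EMAIL_RE.fullmatch(value) is not None


print(looks_like_email("ada@example.com"))                    # True
print(looks_like_email("first.last+tag@mail.example.co.uk"))  # True
print(looks_like_email("no-at-sign.example.com"))             # False
print(looks_like_email("user@localhost"))                     # False (no dot after @)
print(looks_like_email("ada@example.com\n"))                  # False
```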

Find Phone Numbers in Text

\d{3}[-.\s]?\d{3}[-.\s]?\d{4}

Translation: Three digits → optional separator (dash, dot, or space) → three digits → optional separator → four digits.

Matches: "555-123-4567", "555.123.4567", "555 123 4567", "5551234567".
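A short Python sketch that pulls every number in those four formats out of a sentence. The sentence and the name PHONE_RE are invented for the example.

```python
import re

PHONE_RE = re.compile(r"\d{3}[-.\s]?\d{3}[-.\s]?\d{4}")

text = (
    "Office: 555-123-4567, cell 555.123.4567, "
    "fax 555 123 4567, old 5551234567."
)

# The pattern has no capturing groups, so findall returns whole matches.
phone_numbers = PHONE_RE.findall(text)
print(phone_numbers)
# ['555-123-4567', '555.123.4567', '555 123 4567', '5551234567']
```

Because the pattern has no anchors or word boundaries, it will also match ten digits buried inside a longer run of digits. Wrapping it in \b on both sides is one way to tighten it.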

Extract Dates

(\d{4})-(\d{2})-(\d{2})

Matches "2026-03-21" and captures year (2026), month (03), day (21) as separate groups.
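The pattern only checks the shape, so "2026-13-45" matches too. A minimal sketch of extracting the groups and then letting Python's datetime.date reject impossible dates follows. DATE_RE and parse_iso_date are hypothetical names for this example.

```python
import re
from datetime import date

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_iso_date(text):
    """Return the first YYYY-MM-DD date found in text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # The shape matched, but it is not a real calendar date.
        return None


print(parse_iso_date("Released on 2026-03-21."))  # 2026-03-21
print(parse_iso_date("Due 2026-13-45"))           # None
print(parse_iso_date("no date here"))             # None
```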

Find URLs in Text

https?://[\w./\-?=&#]+

Translation: "http" + optional "s" + "://" + one or more URL-safe characters.
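A quick Python sketch of this pattern on a made-up sentence. The example.com, example.org, and example.net addresses are placeholders.

```python
import re

URL_RE = re.compile(r"https?://[\w./\-?=&#]+")

text = (
    "Docs at https://example.com/guide?page=2&lang=en and "
    "http://example.org/a-b#top, see also ftp://example.net"
)

urls = URL_RE.findall(text)
print(urls)
# ['https://example.com/guide?page=2&lang=en', 'http://example.org/a-b#top']

# Caveat: "." is in the allowed set, so a sentence-ending period
# directly after a URL is captured as part of it.
with_period = URL_RE.findall("Visit https://example.com.")
print(with_period)  # ['https://example.com.']
```

The ftp:// link is skipped because the pattern requires "http" or "https" before "://".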

Search and Replace in Your Editor

Find all console.log( and replace with logger.info(:

Find: console\.log\(
Replace: logger.info(

The backslash before . and ( means "literal dot" and "literal parenthesis" rather than the regex special meanings.
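The same find-and-replace can be sketched in Python with re.sub. The source string is invented, and the consoleXlog line is there only to show what the escaping prevents.

```python
import re

source = (
    'console.log("start");\n'
    'consoleXlog("not a call");\n'
    "console.log(total);"
)

# Escaped dot and parenthesis: only the literal text "console.log(" matches.
updated = re.sub(r"console\.log\(", "logger.info(", source)
print(updated)

# Unescaped dot: "." means "any character", so consoleXlog( is rewritten too.
too_broad = re.sub(r"console.log\(", "logger.info(", source)
print(too_broad.count("logger.info("))  # 3
```

With the escaped pattern, the consoleXlog line is left alone and only the two real calls change.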

Reading Regex You Find in the Wild

When you encounter a complex regex:

  1. Read left to right. Each piece matches sequentially.

  2. Identify the quantifiers. What's repeated? What's optional?

  3. Find the groups. What's being captured?

  4. Use a tool. Regex101.com lets you paste a regex and see what each part does, with plain English explanations.

You don't need to write complex regex from memory. You need to understand the building blocks well enough to construct what you need, test it, and read what others have written.

The Rules

Always test your regex. Regex101.com is free and shows matches in real time. Paste your pattern, paste test text, see what matches. Never deploy a regex you haven't tested.

Start simple, add complexity. Build the pattern one piece at a time. Get the basic structure matching, then add quantifiers, then add groups. Don't try to write the complete pattern in one attempt.
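As one possible way to follow both rules in code, here is a sketch that builds a date pattern in three steps and checks each step against sample strings. The samples and the step1/step2/step3 names are made up for the illustration.

```python
import re

samples_ok = ["2026-03-21", "1999-12-31"]
samples_bad = ["2026-3-21", "21-03-2026", "2026/03/21"]

# Step 1: basic structure only (digits, dash, digits, dash, digits).
step1 = re.compile(r"\d+-\d+-\d+")
# Step 2: add quantifiers to pin down the widths.
step2 = re.compile(r"\d{4}-\d{2}-\d{2}")
# Step 3: add groups to capture year, month, and day.
step3 = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Step 1 gets the good samples right but is still too loose.
step1_accepts_good = all(step1.fullmatch(s) for s in samples_ok)
step1_too_loose = step1.fullmatch("21-03-2026") is not None

# Steps 2 and 3 accept the good samples and reject the bad ones.
later_steps_ok = all(
    all(p.fullmatch(s) for s in samples_ok)
    and not any(p.fullmatch(s) for s in samples_bad)
    for p in (step2, step3)
)

print(step1_accepts_good, step1_too_loose, later_steps_ok)  # True True True
```

Keeping the sample lists next to the pattern makes it cheap to re-run the checks whenever the pattern changes.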

Don't validate what you don't have to. "Is this roughly an email?" is a reasonable use of regex. "Does this email comply with RFC 5322?" is not — let the mail server validate it by sending a confirmation email.

Comment complex regex. Many regex engines (Python, Perl, PCRE, Ruby, .NET, Java) support a verbose mode where whitespace is ignored and you can add comments:

import re

pattern = re.compile(r"""
    ^               # Start of string
    [\w.+-]+        # Local part of email
    @               # The @ symbol
    [\w-]+          # Domain name
    \.              # Dot before TLD
    [\w.]+          # TLD (and any subdomains)
    $               # End of string
""", re.VERBOSE)

A commented regex is readable. An uncommented one is a puzzle.

Key Takeaway

Regex is a pattern language for text: . matches any character, \d matches digits, \w matches word characters, + means "one or more," ? means "optional," and () captures groups. Start with the building blocks, combine them for practical tasks (validation, search, extraction), and always test at regex101.com. You don't need to write the 200-character phone number validator. You need the 15-character patterns that solve everyday problems.
