Free Regex Cheatsheet

Interactive reference guide for regular expressions.

How to Use

  1. Browse the pattern categories or use the search box to find specific patterns.
  2. Enter a regex pattern in the "Test Pattern" field and sample text in "Test Text".
  3. Toggle flags (global, case-insensitive, multiline) and see matches highlighted instantly.

Frequently Asked Questions

What is a regular expression?

A regular expression (regex or regexp) is a pattern used to match, search, and replace text. It uses special characters and syntax to define what strings to find.

What do the flags do?

Global (g) finds all matches. Case Insensitive (i) ignores letter case. Multiline (m) treats ^ and $ as line boundaries instead of string boundaries.
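A small sketch of how the three flags change the result. The sample text and variable names are made up for illustration; the calls are standard JavaScript.

```javascript
// Sample text is made up for illustration: three lines, three spellings.
const flagSample = "Cat\ncat\nCAT";

// No flags: first match only, case-sensitive, ^ anchors to the start of the string.
const noFlags = flagSample.match(/^cat/);

// i: ignore case. Still only the first match, still anchored to the string start.
const withI = flagSample.match(/^cat/i);

// g + i + m: every match, any case, ^ matches at the start of each line.
const withGim = flagSample.match(/^cat/gim);

console.log(noFlags);  // null
console.log(withI[0]); // "Cat"
console.log(withGim);  // [ 'Cat', 'cat', 'CAT' ]
```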

Can I use this cheatsheet in my code?

Yes! Once you've tested a pattern here and verified it works, copy it into your JavaScript, Python, or other code, then re-test it there, since regex engines differ in the details.

A Brief History of the Pattern Language

Regular expressions began as a piece of theoretical computer science. Stephen Kleene described "regular events" (now usually called regular sets or regular languages) in a 1956 paper on nerve nets and finite automata; Ken Thompson brought them into practical software around 1968 in the QED text editor, and later into the Unix editor ed and the grep utility in the early 1970s. Henry Spencer's freely available regex library (mid-1980s) became the basis for many later implementations. Larry Wall extended the syntax dramatically in Perl, and Philip Hazel's PCRE library ("Perl Compatible Regular Expressions", 1997) carried that syntax into many other tools, making the Perl style the de facto standard most modern languages followed. Today there are several closely related but subtly different regex flavours, and a pattern that works in one engine doesn't always work identically in another.

The Engine Your Pattern Lives In

The same syntax can mean different things in different engines. The big families:

This cheatsheet's interactive tester runs in JavaScript, so the pattern is evaluated by the browser's JS engine. Patterns that work here may behave differently in Python or PHP. Most differences are in advanced features (lookbehinds, Unicode property escapes, backreferences) rather than basic syntax.

The Core Building Blocks

Almost every regex pattern is built from these elements:
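As a rough illustration, the sketch below assembles one pattern from the usual pieces (anchors, character classes, quantifiers, groups, alternation). The key=value format and all names are made up for this example.

```javascript
// Each array entry is one building block; joined together they form the pattern.
const keyValueParts = [
  "^",                 // anchor: start of the string
  "([a-z_]+)",         // capturing group around a character class with a + quantifier
  "\\s*=\\s*",         // literal "=" with optional whitespace (\s shorthand, * quantifier)
  "(\\d+|true|false)", // alternation inside a capturing group
  "$",                 // anchor: end of the string
];
const keyValue = new RegExp(keyValueParts.join(""));

const kvMatch = "max_retries = 5".match(keyValue);
console.log(kvMatch[1]); // "max_retries"
console.log(kvMatch[2]); // "5"
console.log(keyValue.test("max-retries = 5")); // false: "-" is not in [a-z_]
```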

Patterns Worth Memorising

A handful of patterns come up so often it's worth keeping them in your head:

Use                                 Pattern
Email (basic)                       ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
URL                                 https?://[^\s]+
US phone number                     \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
ISO date (YYYY-MM-DD)               \d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])
IPv4 address (no octet validation)  \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
Hex colour                          ^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$
Whitespace at start/end of line     ^\s+|\s+$
Multiple consecutive spaces         \s{2,}
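A quick sketch of what a few of these patterns accept and reject, which also shows their limits. The test strings and variable names are made up; the patterns are copied from the table.

```javascript
const isoDate = /\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])/;
const hexColour = /^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;
const ipv4Loose = /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/;

console.log(isoDate.test("2024-02-29"));  // true
console.log(isoDate.test("2024-13-01"));  // false: month 13 is rejected
console.log(isoDate.test("2023-02-31"));  // true: the pattern does not know month lengths
console.log(hexColour.test("#1a2B3c"));   // true
console.log(hexColour.test("#1a2B"));     // false: must be exactly 3 or 6 hex digits
console.log(ipv4Loose.test("999.1.1.1")); // true: octets are not range-checked
```

Patterns like these are a first filter; anything that needs real validation (calendar dates, octet ranges) is usually easier to check in code after the match.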

A note on email regex: full RFC 5322 email validation needs a monster regex thousands of characters long. The simple form above accepts the vast majority of real-world addresses, but it is not exact: it rejects some legitimate ones (quoted local parts, non-ASCII domains) and accepts some undeliverable ones. For production use, send a confirmation email instead of trying to validate the syntax perfectly.

Greedy vs Lazy: A Common Surprise

By default, quantifiers are greedy: they match as much as possible while still allowing the overall pattern to match. So <.+> against <a>text</a> matches the whole thing, not just <a>, because .+ grabs as much as it can. To match the smallest possible string, append ? to the quantifier: <.+?> matches <a> and then </a> separately. The greedy/lazy choice is one of the most common sources of "why isn't my regex matching what I expected" bugs.
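The difference is easy to see by running both forms on the same input. This sketch uses the example from the paragraph above; the third pattern, a negated character class, is a common alternative added here for comparison and is not from the original text.

```javascript
const html = "<a>text</a>";

// Greedy: .+ runs to the end of the string, then backs off to the last ">".
const greedy = html.match(/<.+>/g);

// Lazy: .+? stops at the first ">" it can reach.
const lazy = html.match(/<.+?>/g);

// Negated class: "anything except >" cannot run past the closing bracket at all.
const negated = html.match(/<[^>]+>/g);

console.log(greedy);  // [ '<a>text</a>' ]
console.log(lazy);    // [ '<a>', '</a>' ]
console.log(negated); // [ '<a>', '</a>' ]
```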

Catastrophic Backtracking and ReDoS

Some regex patterns can take exponential time to fail on certain inputs, a class of denial-of-service vulnerability called ReDoS (Regular Expression Denial of Service). The classic culprits are nested quantifiers like (a+)+ or (a|aa)+ applied to a long run of "a" characters followed by a non-matching character. A backtracking engine tries every possible way to split the run before giving up, and the number of ways is exponential.
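To put a number on "exponential", the sketch below counts how many ways a run of n characters can be split into non-empty pieces, which is roughly the search space a backtracking engine may explore for (a+)+ before failing. The function name is hypothetical, and the count is an illustration of the growth rate, not a measurement of any particular engine.

```javascript
// Count the ways to split a run of n characters into one or more non-empty pieces.
function countSplits(n) {
  // ways[k] = number of splits for a run of length k; ways[0] = 1 (nothing left to split).
  const ways = [1];
  for (let k = 1; k <= n; k++) {
    let total = 0;
    // Choose the length of the first piece, then split the remainder.
    for (let first = 1; first <= k; first++) {
      total += ways[k - first];
    }
    ways.push(total);
  }
  return ways[n];
}

console.log(countSplits(10)); // 512
console.log(countSplits(30)); // 536870912

// The nested pattern accepts exactly the same strings as the flat one,
// so the nesting buys nothing except the risk.
const nested = /^(a+)+$/;
const flat = /^a+$/;
for (const s of ["a", "aaaa", "aaab", ""]) {
  console.log(JSON.stringify(s), nested.test(s), flat.test(s));
}
```

Each extra character doubles the count, which is why an input only a few dozen characters long can be enough to stall a vulnerable pattern.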

Real-world incidents: Cloudflare's July 2019 outage was triggered by a regex deployed in a WAF rule that backtracked catastrophically and exhausted CPU. Stack Overflow had a similar incident in July 2016: a whitespace-trimming regex (^[\s\u200c]+|[\s\u200c]+$) backtracked quadratically, rather than exponentially, on a post that contained roughly 20,000 consecutive whitespace characters on a single comment line, and the site was down for 34 minutes. Defensive habits: avoid nested quantifiers, prefer atomic groups ((?>...)) where supported, and consider RE2 or another linear-time engine for untrusted input.
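For a job as simple as trimming, one defensive option is to skip the regex entirely. The sketch below is a hypothetical linear-time trim that treats whitespace and U+200C (zero-width non-joiner) as trimmable; the function names are made up, and it is not presented as the code any of the sites above actually deployed.

```javascript
// True for whitespace and for U+200C (zero-width non-joiner).
function isTrimChar(ch) {
  return ch === "\u200c" || /\s/.test(ch);
}

// Walk inward from both ends once: linear in the length of the input.
function trimEnds(text) {
  let start = 0;
  let end = text.length;
  while (start < end && isTrimChar(text[start])) start++;
  while (end > start && isTrimChar(text[end - 1])) end--;
  return text.slice(start, end);
}

// 20,000 spaces in the middle of a string: nothing to trim, and no backtracking.
const padded = "x" + " ".repeat(20000) + "y";

console.log(trimEnds("  \u200c hello \u200c ")); // "hello"
console.log(trimEnds(padded).length);            // 20002
```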

Per-Language Quirks Worth Knowing

When NOT to Use Regex

Jamie Zawinski's famous 1997 line: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

Common Mistakes

  1. Forgetting to escape special characters. ., *, ?, +, (, ), [, ], {, }, \, ^, $ and | all have special meanings, and / ends a JavaScript regex literal. To match any of them literally, prefix it with a backslash.
  2. Greedy quantifiers consuming too much. Add ? for lazy matching when you want the smallest possible match.
  3. Missing the global flag and wondering why only the first match shows. JavaScript's String.prototype.match() returns only the first match without the g flag.
  4. Catastrophic backtracking on long inputs. Nested quantifiers like (a+)+ can hang on certain inputs. Test with edge cases.
  5. Assuming the same regex behaves the same in every language. Lookbehinds, Unicode escapes, and character class shortcuts all vary.
  6. Trying to validate emails too strictly. The technically-correct RFC 5322 regex is unmaintainable; a simple regex plus confirmation-email-on-signup is the working pattern.
  7. Using regex on HTML, JSON, or CSV. Use a proper parser; the time you save up front you'll lose to bugs.
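Mistakes 1 and 3 are easy to demonstrate. In this sketch, escapeRegExp is a small helper written for the example (JavaScript has long had no universally available built-in for this), and the sample text is made up.

```javascript
// Escape every regex metacharacter so user-supplied text is matched literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = "Total: $5.00 (was $7.50)";

// Without escaping, "$" would be an anchor and "." would match any character.
const literalDollar = new RegExp(escapeRegExp("$5.00"));
console.log(escapeRegExp("$5.00"));     // \$5\.00
console.log(literalDollar.test(price)); // true

// Without g, match() returns only the first match (with index and groups).
const first = price.match(/\$\d+\.\d{2}/);
console.log(first[0], first.index);     // $5.00 7

// With g, match() returns every matched string but no capture groups.
const all = price.match(/\$\d+\.\d{2}/g);
console.log(all);                       // [ '$5.00', '$7.50' ]

// matchAll() (requires g) gives every match together with its capture groups.
const detailed = [...price.matchAll(/\$(\d+)\.(\d{2})/g)];
console.log(detailed[1][1]);            // "7"
```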

More Frequently Asked Questions

Why does my pattern work here but fail in my code?

The most common cause is engine differences. JavaScript's RegExp doesn't support some features that PCRE does (and vice versa). Common gotchas: lookbehinds arrived late in JS (ES2018); named-group syntax differs slightly ((?<name>...) in JS, (?P<name>...) in Python); Unicode property escapes need the u flag; and POSIX character classes like [[:alpha:]] are not supported in JS at all. Test in the engine you'll deploy to.
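One way to check a pattern against the engine you are actually running is to try compiling it. The helper name below is hypothetical, and the results in the comments assume a current browser or Node.js release.

```javascript
// Returns true if this engine can compile the pattern, false on a syntax error.
function engineAccepts(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected, so do not hide it
  }
}

console.log(engineAccepts("(?<year>\\d{4})"));  // true: JS named group
console.log(engineAccepts("(?P<year>\\d{4})")); // false: Python-style named group
console.log(engineAccepts("(?>a+)b"));          // false: atomic groups are not in JS

// Compiling is not the same as meaning the same thing: JS reads [[:alpha:]]
// as the character class "[[:alpha:]" followed by a literal "]".
console.log(engineAccepts("[[:alpha:]]"));      // true
console.log(/[[:alpha:]]/.test("b"));           // false
console.log(/[[:alpha:]]/.test("a]"));          // true
```

A compile check catches unsupported syntax, but as the last three lines show, it cannot catch syntax that is accepted with a different meaning; that still needs test cases.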

Is there a "global" way to match across multiple lines?

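Two flags work together. The m (multiline) flag makes ^ and $ match the start and end of each line rather than the whole string. The s (dotall) flag makes . match newline characters too. Combined with g for global, you can write line-spanning patterns such as /^foo.+$/gms. Be aware that with s a greedy .+ runs across line breaks to the last line end it can reach, so that pattern returns one long match; use a lazy .+? or drop s when you want one match per line.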
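Running the same pattern with different flag combinations shows how m and s interact. The three-line sample text is made up for illustration.

```javascript
const log = "foo 1\nbar\nfoo 2";

// g + m: "." stops at newlines, so each "foo" line is its own match.
const perLine = log.match(/^foo.+$/gm);

// g + m + s: "." crosses newlines and the greedy .+ runs to the very end.
const spanning = log.match(/^foo.+$/gms);

// g + m + s with a lazy quantifier: stops at the first line end again.
const lazySpan = log.match(/^foo.+?$/gms);

console.log(perLine);  // [ 'foo 1', 'foo 2' ]
console.log(spanning); // [ 'foo 1\nbar\nfoo 2' ]
console.log(lazySpan); // [ 'foo 1', 'foo 2' ]
```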

Are my patterns and test text sent anywhere?

No. The pattern matching uses the browser's built-in JavaScript RegExp engine; nothing is uploaded to any server. This matters when you're testing patterns against real production log data, internal API responses, or sensitive content.

Should I learn lookbehinds?

Useful but not essential. Lookbehinds let you match text preceded by something without including the "something" in the match. Example: (?<=\$)\d+ matches digits after a dollar sign without consuming the dollar sign. They're supported in PCRE, modern JavaScript (ES2018+), and Python (the built-in re module requires fixed-width lookbehinds; the third-party regex module lifts that limit). If you're writing portable patterns, check the target engine first.
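The example pattern in action, next to a capture-group version that does the same job without a lookbehind and so works in older engines. The invoice text is made up.

```javascript
const invoice = "Paid $120 for 3 items, refund $15";

// Lookbehind: the "$" must precede the digits but is not part of the match.
const amounts = invoice.match(/(?<=\$)\d+/g);

// Portable alternative: match the "$" too, then keep only capture group 1.
const viaGroup = [...invoice.matchAll(/\$(\d+)/g)].map((m) => m[1]);

console.log(amounts);  // [ '120', '15' ]
console.log(viaGroup); // [ '120', '15' ]
```

Note that the bare "3" is skipped in both versions, because no dollar sign comes before it.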

Why use (?:...) instead of (...)?

Non-capturing groups ((?:...)) are slightly faster, take no slot in the capture array, and keep your match results clean. Use them whenever you need grouping for alternation or quantification but don't need to extract the matched text. (http|https):// creates a capture you may not need; (?:http|https):// doesn't.
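The effect on the match result is easiest to see side by side. This sketch extends the example with a second group for the host so the shift in group numbering is visible; the URL is a placeholder.

```javascript
const url = "https://example.com";

// Capturing: the scheme takes slot 1, pushing the host to slot 2.
const capturing = url.match(/(http|https):\/\/(\S+)/);

// Non-capturing: the scheme is grouped but not stored, so the host is slot 1.
const nonCapturing = url.match(/(?:http|https):\/\/(\S+)/);

console.log(capturing[1], capturing[2]); // https example.com
console.log(nonCapturing[1]);            // example.com
console.log(capturing.length);           // 3
console.log(nonCapturing.length);        // 2
```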

What's the right way to match Unicode characters?

In JavaScript, add the u flag and use Unicode property escapes: /\p{Letter}+/gu matches sequences of letters in any script. Note that \w matches only ASCII word characters ([A-Za-z0-9_]) in JavaScript, and the u flag does not change that, so it is not a substitute. Python's re module is Unicode-aware by default for str patterns in Python 3. In Java, \w and similar shorthands are ASCII-only unless you pass Pattern.UNICODE_CHARACTER_CLASS. Most engines have some way to be Unicode-aware; check the docs for yours.
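A short comparison of \p{Letter} and \w on mixed-script text. The sample string is made up; the accented letter is written as an escape so the example does not depend on how the file is encoded.

```javascript
// "naïve", three Japanese characters, and an ASCII identifier.
const words = "na\u00efve \u65e5\u672c\u8a9e x_1";

// \p{Letter} with the u flag: letters from any script, but not "_" or digits.
const letters = words.match(/\p{Letter}+/gu);

// \w: ASCII letters, digits and "_" only, even with the u flag.
const asciiWord = words.match(/\w+/gu);

console.log(letters);   // [ 'naïve', '日本語', 'x' ]
console.log(asciiWord); // [ 'na', 've', 'x_1' ]
```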
