Free Regex Cheatsheet
Interactive reference guide for regular expressions.
How to Use
- Browse the pattern categories or use the search box to find specific patterns.
- Enter a regex pattern in the "Test Pattern" field and sample text in "Test Text".
- Toggle flags (global, case-insensitive, multiline) and see matches highlighted instantly.
Frequently Asked Questions
What is a regular expression?
A regular expression (regex or regexp) is a pattern used to match, search, and replace text. It uses special characters and syntax to define what strings to find.
What do the flags do?
Global (g) finds all matches. Case Insensitive (i) ignores letter case. Multiline (m) treats ^ and $ as line boundaries instead of string boundaries.
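As a rough illustration of the three flags, the sketch below runs the same pattern with and without them. The sample text and variable names are made up for the example.

```javascript
// Three lines that differ only in letter case.
const flagSample = "Cat\ncat\nCAT";

// No flags: only the first case-sensitive match is returned.
const firstMatch = flagSample.match(/cat/);

// g + i: every match, ignoring letter case.
const allMatches = flagSample.match(/cat/gi);

// Without m, ^ and $ anchor to the whole string, so nothing matches.
const wholeString = flagSample.match(/^cat$/gi);

// With m, ^ and $ anchor to each line.
const eachLine = flagSample.match(/^cat$/gim);

console.log(firstMatch.index, allMatches, wholeString, eachLine);
```

Here the unflagged pattern finds only the lowercase "cat" on the second line, while the g and i flags together pick up all three spellings.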
Can I use this cheatsheet in my code?
Yes. Once you've tested a pattern here and verified it works, copy it into your JavaScript, Python, or other code, then re-test it there: engines differ in some advanced features.
A Brief History of the Pattern Language
Regular expressions began as a piece of theoretical computer science. Stephen Kleene described "regular events" (now called regular sets or regular languages) in a 1956 paper on nerve nets and finite automata. Ken Thompson published a regex search algorithm in 1968 and built it into the QED and ed editors; Unix grep grew out of ed in the early 1970s. Henry Spencer's freely redistributable regex library (mid-1980s) became the basis for many later implementations. Larry Wall extended the syntax dramatically in Perl, and Philip Hazel's PCRE library ("Perl Compatible Regular Expressions", 1997) carried that Perl-style syntax into many other tools, making it the de facto standard most modern languages followed. Today there are several closely related but subtly different regex flavours, and a pattern that works in one engine doesn't always work identically in another.
The Engine Your Pattern Lives In
The same syntax can mean different things in different engines. The big families:
- POSIX BRE (Basic Regular Expressions): used by grep's default mode and by sed. Grouping and interval characters are literal unless backslash-escaped, so you write \( \) and \{ \}. GNU grep and sed treat +, ? and | the same way (\+, \?, \|), although strict POSIX BRE does not define those.
- POSIX ERE (Extended Regular Expressions): used by egrep (grep -E) and awk. The above metacharacters work without escaping.
- PCRE (Perl-Compatible Regular Expressions): extends ERE-style syntax with lookarounds, atomic groups, named captures and other Perl features. Used by PHP, among others; most modern languages follow the same Perl-style syntax. The Perl-derived shorthand classes \d, \w and \s are common to PCRE, JavaScript, .NET, Java and Python.
- JavaScript RegExp: close to PCRE but with notable differences. ES2018 added lookbehinds, named capture groups, the s (dotall) flag, and Unicode property escapes (which require the u flag). The v flag for set notation arrived in ES2024.
- Python re and Python regex: re is in the standard library; the third-party regex module adds Unicode-aware features, variable-width lookbehinds, and other PCRE-style enhancements.
- RE2 (Google's library, used in Go): guarantees linear time but doesn't support backreferences or lookarounds. The trade-off: predictable performance, fewer features.
This cheatsheet's interactive tester runs in JavaScript, so the pattern is evaluated by the browser's JS engine. Patterns that work here may behave differently in Python or PHP. Most differences are in advanced features (lookbehinds, Unicode property escapes, backreferences) rather than basic syntax.
The Core Building Blocks
Almost every regex pattern is built from these elements:
- Literals: match themselves. cat matches the substring "cat".
- Anchors: ^ (start of string / line), $ (end), \b (word boundary), \B (non-word-boundary).
- Character classes: [abc] matches a, b, or c. [^abc] negates. [a-z] is a range. Shorthands: \d (digit), \w (word character: letter, digit, underscore), \s (whitespace), and uppercase versions for negation (\D, \W, \S).
- Quantifiers: ? (0 or 1), * (0 or more), + (1 or more), {n}, {n,}, {n,m}. Greedy by default (match as much as possible); add ? for lazy: *?, +?, ??.
- Groups: (...) capturing, (?:...) non-capturing, (?<name>...) named (PCRE / JS; Python's re module spells it (?P<name>...)).
- Alternation: cat|dog matches either.
- Lookarounds: (?=...) positive lookahead, (?!...) negative lookahead, (?<=...) positive lookbehind, (?<!...) negative lookbehind. Match without consuming.
- Backreferences: \1, \2 (numbered), \k<name> (named). Match the same text the corresponding capture matched.
- Flags: g (global), i (case-insensitive), m (multiline: ^ and $ match line boundaries), s (dotall: . matches newlines), u (Unicode), y (sticky in JS).
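To see several of these building blocks working together, here is a small sketch in JavaScript. The sample strings and variable names are invented for the example; only the syntax comes from the list above.

```javascript
// Anchors, quantifiers and named groups: split a date into its parts.
const datePattern = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
const dateParts = "2024-03-15".match(datePattern).groups;

// Backreference: \1 must repeat exactly what group 1 captured.
const doubledWord = /\b(\w+) \1\b/;
const hasDoubledWord = doubledWord.test("it was the the best");

// Lookahead: digits only when followed by "px", without consuming "px".
const pixelValues = "10px 20em 30px".match(/\d+(?=px)/g);

console.log(dateParts.year, hasDoubledWord, pixelValues);
```

The lookahead example returns the numbers alone because (?=px) checks for the unit without making it part of the match.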
Patterns Worth Memorising
A handful of patterns come up so often it's worth keeping them in your head:
| Use | Pattern |
|---|---|
| Email (basic) | ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ |
| URL | https?://[^\s]+ |
| US phone number | \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4} |
| ISO date (YYYY-MM-DD) | \d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) |
| IPv4 address (no octet validation) | \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b |
| Hex colour | ^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$ |
| Leading/trailing whitespace (add the m flag to apply per line) | ^\s+|\s+$ |
| Two or more consecutive whitespace characters | \s{2,} |
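A note on email regex: a regex that fully implements the RFC 5322 address grammar runs to thousands of characters. The simple form above accepts the vast majority of real-world addresses, but it does reject some technically valid ones (quoted local parts and non-ASCII addresses, for example) and accepts plenty of addresses that don't exist. For production use, send a confirmation email instead of trying to perfectly validate the syntax.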
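As a sketch of how a few of the table's patterns might be used for validation in JavaScript: the ISO date pattern is wrapped in ^ and $ here (an addition for whole-string checks, not part of the table entry), and the constant names are made up for the example.

```javascript
// Patterns from the table; ^ and $ are added to the date pattern so the
// whole input must match, not just a substring.
const isoDate = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
const hexColour = /^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;
const basicEmail = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

const checks = {
  validDate: isoDate.test("2024-03-15"),
  badMonth: isoDate.test("2024-13-01"),
  // The pattern checks shape only, so an impossible date still passes.
  impossibleDate: isoDate.test("2024-02-30"),
  sixDigitColour: hexColour.test("#1a2B3c"),
  fourDigitColour: hexColour.test("#1a2B"),
  plainEmail: basicEmail.test("user@example.com"),
  notAnEmail: basicEmail.test("not an email"),
};

console.log(checks);
```

The 2024-02-30 case is a reminder that these patterns check format only; real calendar or deliverability checks need code beyond the regex.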
Greedy vs Lazy: A Common Surprise
By default, quantifiers are greedy: they match as much as possible while still allowing the overall pattern to match. So <.+> against <a>text</a> matches the whole thing, not just <a>, because .+ grabs as much as it can. To match the smallest possible string, append ? to the quantifier: <.+?> matches <a> and then </a> separately. The greedy/lazy choice is one of the most common sources of "why isn't my regex matching what I expected" bugs.
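The difference is easy to check in JavaScript. This sketch uses the same input as the paragraph above and adds a negated character class as one common alternative to the lazy form; the variable names are illustrative.

```javascript
const html = "<a>text</a>";

// Greedy: .+ runs to the last ">" it can reach.
const greedyMatch = html.match(/<.+>/)[0];

// Lazy: .+? stops at the first ">" that lets the pattern succeed.
const lazyMatches = html.match(/<.+?>/g);

// Alternative: exclude ">" explicitly, so there is nothing to over-consume.
const negatedMatches = html.match(/<[^>]+>/g);

console.log(greedyMatch, lazyMatches, negatedMatches);
```

The negated class finds the same two tags as the lazy quantifier here and states the intent directly, so many people find it easier to read.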
Catastrophic Backtracking and ReDoS
Some regex patterns can take exponential time to fail on certain inputs, a class of denial-of-service vulnerability called ReDoS (Regular Expression Denial of Service). The classic culprits are nested quantifiers like (a+)+ or (a|aa)+ applied to a long string of as followed by a non-matching character. The engine tries every possible way to split the string before giving up, and the number of ways is exponential.
Real-world incidents: Cloudflare's July 2019 outage was triggered by a regex deployed in a WAF rule that catastrophically backtracked on certain inputs. Stack Overflow had a similar incident in July 2016: a whitespace-trimming regex (^[\s]+|[\s]+$) backtracked heavily on a post containing roughly 20,000 consecutive whitespace characters and took the site down for 34 minutes. That pattern has no nested quantifiers and its cost was quadratic rather than exponential, which was still enough to exhaust the servers. Defensive habits: avoid nested quantifiers, prefer atomic groups ((?>...)) where supported, and consider using RE2 / linear-time engines for untrusted input.
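One way to apply the "avoid nested quantifiers" habit is to flatten the pattern so that it accepts the same strings with a single quantifier. The sketch below compares the two forms on a few short inputs; it is a spot check on hand-picked samples, not a proof of equivalence, and the names are illustrative.

```javascript
// Nested quantifier: many ways to split a run of "a"s between the inner
// and outer +, so a failing input can trigger heavy backtracking.
const nestedPattern = /^(a+)+$/;

// Flattened form: a single quantifier, only one way to match each run.
const flatPattern = /^a+$/;

// Short inputs only, so the nested form stays fast in this demo.
const samples = ["a", "aaaa", "", "aab", "aaa!"];
const sameResults = samples.every(
  (s) => nestedPattern.test(s) === flatPattern.test(s)
);

console.log(sameResults);
```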
Per-Language Quirks Worth Knowing
- JavaScript: backslashes need double-escaping in string literals ("\\d") but not in regex literals (/\d/). Use the regex literal form when possible.
- Python: use raw strings (r"\d+") to avoid backslash issues. The re module is in the standard library; regex on PyPI adds extra features.
- Java: there are no regex literals, so backslashes need double-escaping in string literals ("\\d" for \d). Matching a literal backslash takes four ("\\\\"): the Java compiler reduces them to \\, which the regex compiler reads as one escaped backslash.
- Bash: regex matching in [[ string =~ pattern ]] uses POSIX ERE. Quoting rules are tricky; consult man bash.
- Go: uses RE2, so backreferences and lookarounds aren't available. Trade-off: linear-time guarantee.
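The JavaScript escaping rule can be checked directly. This sketch builds the same pattern three ways; the variable names are illustrative.

```javascript
// Regex literal: the backslash is written once.
const fromLiteral = /\d+/;

// String passed to the RegExp constructor: the backslash must be doubled.
const fromString = new RegExp("\\d+");

// Common slip: "\d" in a string literal is just "d", so this pattern is d+.
const mistaken = new RegExp("\d+");

console.log(fromLiteral.source, fromString.source, mistaken.source);
```

The third pattern compiles without any error, which is what makes this mistake easy to miss.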
When NOT to Use Regex
Jamie Zawinski's famous 1997 line: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
- Don't parse HTML / XML with regex. Use a real parser (DOMParser in browsers, BeautifulSoup in Python, jsoup in Java, etc.). HTML's nested structure is fundamentally beyond what regex can express cleanly.
- Don't parse JSON with regex. Use JSON.parse / standard library JSON parsers.
- Don't validate emails strictly with regex. Send a confirmation email; that's the only reliable test.
- Don't write a CSV parser as a regex. Quoted fields with embedded commas, escaped quotes, and multi-line values quickly outgrow what regex handles cleanly.
- Don't try to match balanced parentheses. Classical regular expressions can't: balanced nesting is a context-free language, not a regular one. Some PCRE-style engines have recursion features that get around this, but a real parser is cleaner.
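For the balanced-parentheses case, the non-regex alternative can be very small. The function below is a minimal sketch with a hypothetical name (isBalanced); it tracks nesting depth with a counter and ignores every other character.

```javascript
// Hypothetical helper: true when every "(" has a matching ")" in order.
function isBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    if (ch === "(") {
      depth += 1;
    } else if (ch === ")") {
      depth -= 1;
      // A closing parenthesis with no opener can never be repaired later.
      if (depth < 0) return false;
    }
  }
  // Anything left open means the text is unbalanced.
  return depth === 0;
}

console.log(isBalanced("(a(b)c)"), isBalanced("(()"), isBalanced(")("));
```

A counter is enough for one bracket type; mixing (), [] and {} would need a stack instead.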
Common Mistakes
- Forgetting to escape special characters. ., *, ?, +, (, ), [, ], {, }, \, ^, $ and | all have special meanings, and / ends a JavaScript regex literal. To match them literally, prefix with a backslash.
- Greedy quantifiers consuming too much. Add ? for lazy matching when you want the smallest possible match.
- Missing the global flag and wondering why only the first match shows. JavaScript's String.prototype.match() returns only the first match without the g flag.
- Catastrophic backtracking on long inputs. Nested quantifiers like (a+)+ can hang on certain inputs. Test with edge cases.
- Assuming the same regex behaves the same in every language. Lookbehinds, Unicode escapes, and character class shortcuts all vary.
- Trying to validate emails too strictly. The technically-correct RFC 5322 regex is unmaintainable; a simple regex plus confirmation-email-on-signup is the working pattern.
- Using regex on HTML, JSON, or CSV. Use a proper parser; the time you save up front you'll lose to bugs.
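Two of these mistakes are easy to demonstrate in JavaScript. In the sketch below, escapeRegExp is a hypothetical helper written for this example (not a built-in), and the sample strings are made up.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter so that
// arbitrary text can be embedded in a pattern and matched literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const priceLabel = "$4.99 (sale)";
const pricePattern = new RegExp(escapeRegExp(priceLabel));
const foundLabel = pricePattern.test("Now $4.99 (sale) only");

// Without g, match() stops at the first hit.
const firstDigit = "a1 b2 c3".match(/\d/);

// With g, match() returns every hit; matchAll() also exposes positions.
const allDigits = "a1 b2 c3".match(/\d/g);
const digitPositions = [..."a1 b2 c3".matchAll(/\d/g)].map((m) => m.index);

console.log(foundLabel, firstDigit[0], allDigits, digitPositions);
```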
More Frequently Asked Questions
Why does my pattern work here but fail in my code?
The most common cause is engine differences. JavaScript's RegExp doesn't support some features that PCRE does (and vice versa). Common gotchas: lookbehinds were added to JS late (ES2018), named group syntax differs slightly (Python's re module uses (?P<name>...)), Unicode property escapes need the u flag, and POSIX character classes like [[:alpha:]] are not supported in JS. Test in the engine you'll deploy to.
Is there a "global" way to match across multiple lines?
Two flags work together. The m (multiline) flag makes ^ and $ match the start and end of each line rather than the whole string. The s (dotall) flag makes . match newline characters too. Combined with g for global, you can write line-spanning patterns that find every match: /^foo.+$/gms. Note that with s the greedy .+ runs across newlines to the end of the last line; use .+? if each match should stop at the first line end.
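The interaction between m, s and greediness is easiest to see on a small input. This is a sketch with an invented three-line string; the variable names are illustrative.

```javascript
const logText = "foo one\nbar two\nfoo three";

// m only: . stops at newlines, so each "foo" line matches on its own.
const perLine = logText.match(/^foo.+$/gm);

// m + s, greedy: . also matches "\n", so .+ runs to the end of the text.
const greedySpan = logText.match(/^foo.+$/gms);

// m + s, lazy: .+? stops at the first position where $ can match.
const lazySpan = logText.match(/^foo.+?$/gms);

console.log(perLine, greedySpan, lazySpan);
```

In this input the greedy dotall version returns a single match covering all three lines, while the other two return the two "foo" lines separately.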
Are my patterns and test text sent anywhere?
No. The pattern matching uses the browser's built-in JavaScript RegExp engine; nothing is uploaded to any server. This matters when you're testing patterns against real production log data, internal API responses, or sensitive content.
Should I learn lookbehinds?
Useful but not essential. Lookbehinds let you match text preceded by something without including the "something" in the match. Example: (?<=\$)\d+ matches digits after a dollar sign without consuming the dollar sign. They're supported in PCRE, modern JavaScript (ES2018+), and Python (the built-in re module requires fixed-width lookbehinds; the third-party regex module allows variable-width ones). RE2 and Go do not support them. If you're writing portable patterns, check the target engine first.
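Here is the dollar-sign example as a short JavaScript sketch, alongside its negative counterpart. The sample sentence and variable names are made up for illustration.

```javascript
const invoice = "cost $42, qty 7, tax $3";

// Positive lookbehind: digits that directly follow a "$".
const dollarAmounts = invoice.match(/(?<=\$)\d+/g);

// Negative lookbehind: whole numbers that do not follow a "$".
// \b keeps the match from starting in the middle of "42".
const otherNumbers = invoice.match(/(?<!\$)\b\d+/g);

console.log(dollarAmounts, otherNumbers);
```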
Why use (?:...) instead of (...)?
Non-capturing groups ((?:...)) are slightly faster, take no slot in the capture array, and keep your match results clean. Use them whenever you need grouping for alternation or quantification but don't need to extract the matched text. (http|https):// creates a capture you may not need; (?:http|https):// doesn't.
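The effect on the result array can be seen in a short sketch. The URL and variable names are illustrative, and a second group capturing the rest of the URL is added to show how the slot numbering shifts.

```javascript
const url = "https://example.com";

// Capturing group: the scheme takes slot 1, the rest takes slot 2.
const withCapture = url.match(/(http|https):\/\/(\S+)/);

// Non-capturing group: the scheme is grouped but takes no slot.
const withoutCapture = url.match(/(?:http|https):\/\/(\S+)/);

console.log(withCapture[1], withCapture[2], withoutCapture[1]);
```

With the non-capturing form, the part you actually want is always at index 1, whatever grouping you add for alternation later.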
What's the right way to match Unicode characters?
In JavaScript, add the u flag and use Unicode property escapes: /\p{Letter}+/gu matches sequences of letters in any script. \w is not a substitute: in JavaScript it stays ASCII-based even with the u flag. Python's re module is Unicode-aware by default for str patterns in Python 3. Java needs Pattern.UNICODE_CHARACTER_CLASS to make \w, \d and \s Unicode-aware. Most engines have some way to be Unicode-aware; check the docs for yours.
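A quick JavaScript comparison of \w and \p{Letter} follows. The sample string is invented, and the accented letter is written as an escape so the example does not depend on how the source file is encoded or normalised.

```javascript
// "naïve" (with a precomposed ï), Greek letters, and Japanese characters.
const mixedText = "na\u00EFve αβγ 日本語";

// \w covers only ASCII letters, digits and underscore.
const asciiWords = mixedText.match(/\w+/g);

// Adding u alone does not widen \w.
const asciiWordsWithU = mixedText.match(/\w+/gu);

// \p{Letter} with the u flag matches letters in any script.
const unicodeWords = mixedText.match(/\p{Letter}+/gu);

console.log(asciiWords, asciiWordsWithU, unicodeWords);
```

With \w the word "naïve" is split in two around the ï and the Greek and Japanese words are skipped entirely; \p{Letter} returns all three words whole.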