A Primer on Regex

This page is part of a primer series that also includes Java, Python, JavaScript (JS), and TypeScript. It is subject to continuous improvement.

Character Classes

Pattern           Description (matches...)
[abcde]           any one of a, b, c, d, or e
[^abcde]          any character except a, b, c, d, or e (^ means negation)
[a-e]             any character in the range a-e
[a-e[w-z]]        any character in the range a-e or w-z (union)
[a-e&&[c-e]]      only characters common to both ranges: c-e (intersection)
[a-e&&[^c-e]]     any character in the range a-e but not in the range c-e: a-b (subtraction)
.                 any character (line terminators excluded unless dotall mode is on)
\d                any digit: equivalent to [0-9]
\D                any character that is not a digit: equivalent to [^0-9]
\s                a whitespace character (space, tab, line break, and similar)
\S                any character that is not a whitespace character: equivalent to [^\s]
\w                a word character (alphanumeric or underscore): equivalent to [a-zA-Z0-9_]
\W                any character that is not a word character: equivalent to [^\w]
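The union, intersection, and subtraction rows above can be checked with a short Java sketch. It assumes the java.util.regex engine, which accepts these nested class forms (other regex engines may not). The class name CharClassDemo and the helper accepts are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True when the input is exactly one character accepted by the class.
    static boolean accepts(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        System.out.println(accepts("[a-e[w-z]]", "x"));     // union: true
        System.out.println(accepts("[a-e&&[c-e]]", "d"));   // intersection: true
        System.out.println(accepts("[a-e&&[c-e]]", "a"));   // outside c-e: false
        System.out.println(accepts("[a-e&&[^c-e]]", "b"));  // subtraction: true
        System.out.println(accepts("[a-e&&[^c-e]]", "c"));  // removed by subtraction: false
    }
}
```

Pattern.matches requires the whole input to match, which is why each test string is a single character.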

Boundary Patterns

Pattern           Description (matches...)
^                 the beginning of a line
$                 the end of a line
\A                the beginning of the input
\b                a word boundary
\B                a non-word boundary
\G                the end of the previous match
\z                the end of the input
\Z                the end of the input, ignoring a final line terminator if one is present
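As a rough illustration of the boundary patterns above, the following Java sketch assumes java.util.regex, where ^ and $ match at line breaks only when the MULTILINE flag is set. BoundaryDemo and its two helpers are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // True when regex matches somewhere in input.
    static boolean occurs(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \b keeps "cat" from matching inside "concat" or "cats".
        System.out.println(findAll("\\bcat\\b", 0, "cat concat cats"));          // [cat]
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(findAll("^\\w+", 0, "one\ntwo"));                     // [one]
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, "one\ntwo"));     // [one, two]
        // \z needs the absolute end; \Z tolerates one trailing line terminator.
        System.out.println(occurs("end\\z", "end\n"));                           // false
        System.out.println(occurs("end\\Z", "end\n"));                           // true
    }
}
```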

Quantifiers

  • Curly braces, {}, or the symbols *, ?, and + specify how many times a pattern may occur.
  • Curly-brace forms: {n} (exactly n times), {min,max} (between min and max times), and {min,} (at least min times)
  • * : {0,} (zero or more times; the pattern may be absent or repeated any number of times)
  • ? : {0,1} (zero or one time; the pattern is optional)
  • + : {1,} (one or more times)
  • Useful external link 1: Quantifiers
  • Useful external link 2: Symbols
  • Useful external link 3: More symbols
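The quantifiers listed in this section can be tried out with a minimal Java sketch such as the one below. It assumes java.util.regex. QuantifierDemo and fullMatch are illustrative names, and the sample patterns are made up for the demonstration.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True when the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("colou?r", "color"));    // ? allows zero "u": true
        System.out.println(fullMatch("colou?r", "colouur"));  // ? allows at most one: false
        System.out.println(fullMatch("ab*", "a"));            // * allows zero "b": true
        System.out.println(fullMatch("ab+", "a"));            // + needs at least one "b": false
        System.out.println(fullMatch("\\d{3}", "123"));       // exactly three digits: true
        System.out.println(fullMatch("\\d{2,4}", "12345"));   // five digits exceed the maximum: false
        System.out.println(fullMatch("\\d{2,}", "12345"));    // two or more digits: true
    }
}
```

Note that the curly-brace forms are written without spaces, for example {2,4} rather than {2, 4}.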

Logical Operators

Pattern           Description (matches...)
x|y               x or y
xy                first x, then y
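A small Java sketch of alternation and concatenation follows, assuming java.util.regex. It also shows that | applies to everything on either side of it, so parentheses are needed to limit its scope. AlternationDemo and fullMatch are hypothetical names used only here.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // True when the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("cat|dog", "dog"));     // either alternative: true
        System.out.println(fullMatch("cat|dog", "catdog"));  // not both at once: false
        System.out.println(fullMatch("catdog", "catdog"));   // concatenation: true
        // "gra|ey" means "gra" or "ey", not "gray" or "grey".
        System.out.println(fullMatch("gra|ey", "grey"));     // false
        System.out.println(fullMatch("gr(a|e)y", "grey"));   // true
    }
}
```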

Groups

  • Defined by enclosing a pattern in parentheses: (pattern)
Pattern           Description (matches...)
(\w\d\w)          a group consisting of a digit between two word characters
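To see what a group captures, here is a minimal Java sketch assuming java.util.regex, where groups are numbered from 1 and read back with Matcher.group. GroupDemo, firstGroup, and the sample input are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns the text captured by group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGroup("(\\w\\d\\w)", "id: a1b"));  // a1b
        System.out.println(firstGroup("(\\w\\d\\w)", "no match")); // null
    }
}
```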

Backreferences

  • Written as \n after a group, where n is the group number; matches the same text that group n matched
Pattern           Description (matches...)
(\w\d\w)\1        a group (a digit between two word characters) immediately followed by the same text again
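The sketch below tries a backreference in Java, assuming java.util.regex and the pattern (\w\d\w)\1, that is, a group followed directly by a reference to it. It shows that \1 repeats the text the group captured, not the group's pattern. BackrefDemo and repeats are hypothetical names.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // True when the input is a word-digit-word triple followed by an exact copy of itself.
    static boolean repeats(String input) {
        return Pattern.matches("(\\w\\d\\w)\\1", input);
    }

    public static void main(String[] args) {
        System.out.println(repeats("a1ba1b"));  // same text twice: true
        System.out.println(repeats("a1bc2d"));  // same shape, different text: false
    }
}
```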