Regular Expressions

In this section, you will learn about Regular Expressions.

What is a Regular Expression?

A regular expression (RegExp) is a pattern that describes a set of strings. It lets you match, extract, and transform text, which makes it useful for tasks such as validation, extraction, and formatting of data.

Syntax

In JavaScript, a regular expression literal is written as follows:

/pattern/flags

Example

const regex = /abc/i;  // Pattern to match 'abc', case insensitive
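
As a minimal sketch of how such a pattern might be used, the snippet below checks a few strings with `test()` and builds the same pattern through the `RegExp` constructor. The sample strings and variable names are assumptions chosen for illustration.

```javascript
// Literal form: the pattern 'abc' with the case-insensitive flag.
const literal = /abc/i;

// Constructor form: the same pattern and flag supplied as strings.
const constructed = new RegExp("abc", "i");

// test() reports whether the pattern occurs anywhere in the input.
const hitLiteral = literal.test("xxABCxx");
const hitConstructed = constructed.test("xxABCxx");
const miss = literal.test("ab c");

console.log(hitLiteral, hitConstructed, miss); // true true false
```

The constructor form is mostly useful when the pattern has to be assembled from strings at runtime.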

Flags

Flags are optional modifiers that change how the regular expression behaves. The most commonly used flags in JavaScript are:

  • g: Global search. The regex finds all matches, not just the first one.
  • i: Case-insensitive search.
  • m: Multiline search. Anchors (^ and $) match the start or end of each line.
  • s: Dot (.) matches newline characters as well.
  • u: Unicode mode. Treats the pattern as a sequence of Unicode code points and enables escapes such as \u{...} and \p{...}.
  • y: Sticky mode. Matches only at the position given by the regex's lastIndex property.
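
The effect of the g, i, m, and s flags can be sketched with `String.prototype.match` and `test()`. The three-line sample string below is an assumption made up for this illustration.

```javascript
const text = "Cat\ncat\nCAT";

// Without g, match() stops at the first match.
const first = text.match(/cat/i)[0];

// With g, match() returns every match as a plain array.
const all = text.match(/cat/gi);

// With m, ^ and $ anchor at every line, so each line matches on its own.
const perLine = text.match(/^cat$/gim);

// Without m, ^ and $ anchor only at the ends of the whole string.
const wholeString = text.match(/^cat$/gi);

// With s, the dot also matches the newline between the first two lines.
const dotAll = /Cat.cat/s.test(text);
const dotDefault = /Cat.cat/.test(text);

console.log(first, all, perLine, wholeString, dotAll, dotDefault);
```

Here `wholeString` is `null`, because the full three-line string is not just "cat".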

Special Characters

Here are some special characters used in regular expressions:

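  • .: Matches any character except line terminators (unless the s flag is set).
  • ^: Matches the beginning of the string (or of each line with the m flag).
  • $: Matches the end of the string (or of each line with the m flag).
  • \d: Matches any digit (equivalent to [0-9]).
  • \w: Matches any word character (ASCII letters, digits, and underscore; equivalent to [A-Za-z0-9_]).
  • \s: Matches any whitespace character (spaces, tabs, line breaks).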
  • \b: Matches a word boundary.
  • \: Escapes a special character, treating it as a literal character.
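
The special characters above can be tried out on a single sample sentence. This is only a sketch; the sentence and the variable names are assumptions for illustration.

```javascript
const sample = "Order 66 shipped to bay_7.";

const digits = sample.match(/\d+/g);        // runs of digits
const words = sample.match(/\w+/g);         // runs of word characters
const pieces = sample.split(/\s+/);         // split on whitespace
const wholeWord = /\bship\b/.test(sample);  // "ship" only as a whole word
const anyChar = /bay.7/.test(sample);       // unescaped dot: any character
const literalDot = /bay\.7/.test(sample);   // escaped dot: a real "."

console.log(digits);     // [ '66', '7' ]
console.log(words);      // [ 'Order', '66', 'shipped', 'to', 'bay_7' ]
console.log(wholeWord, anyChar, literalDot); // false true false
```

`wholeWord` is false because "ship" only appears inside "shipped", where no word boundary follows it.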

Character Sets and Ranges

  • [...]: Matches any character within the square brackets.

    const regex = /[abc]/; // Matches 'a', 'b', or 'c'
    
  • [^...]: Matches any character NOT within the square brackets.

    const regex = /[^abc]/; // Matches any character except 'a', 'b', or 'c'
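
Inside square brackets, a hyphen between two characters denotes a range, such as [a-z] or [0-9]. The following sketch combines sets, negated sets, and ranges; the helper names `isHexDigit` and `hasNonDigit` are hypothetical and exist only for this example.

```javascript
// Set: collect every vowel in the input.
const vowels = "rhythm & blues".match(/[aeiou]/g);

// Ranges: a single hexadecimal digit, upper or lower case.
const isHexDigit = (ch) => /^[0-9a-fA-F]$/.test(ch);

// Negated range: true if the input contains anything that is not a digit.
const hasNonDigit = (s) => /[^0-9]/.test(s);

console.log(vowels);                                   // [ 'u', 'e' ]
console.log(isHexDigit("c"), isHexDigit("G"));         // true false
console.log(hasNonDigit("2024"), hasNonDigit("20a4")); // false true
```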
    

Quantifiers

Quantifiers define how many times a character or group can appear.

  • *: 0 or more times

    const regex = /ab*c/; // Matches 'ac', 'abc', 'abbc', etc.
    
  • +: 1 or more times

    const regex = /ab+c/; // Matches 'abc', 'abbc', but not 'ac'
    
  • ?: 0 or 1 time

    const regex = /ab?c/; // Matches 'ac' or 'abc'
    
  • {n}: Exactly n times

    const regex = /a{3}/; // Matches 'aaa'
    
  • {n,}: At least n times

    const regex = /a{2,}/; // Matches 'aa', 'aaa', 'aaaa', etc.
    
  • {n,m}: Between n and m times

    const regex = /a{2,4}/; // Matches 'aa', 'aaa', or 'aaaa'
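
To compare the quantifiers side by side, the sketch below filters two small input lists through each pattern. The ^ and $ anchors are an addition for this example, so that the whole string has to fit the pattern; without them, /a{3}/ would also be found inside 'aaaa'. The helper name `keep` is hypothetical.

```javascript
// Hypothetical helper: return the inputs that the regex accepts.
const keep = (regex, inputs) => inputs.filter((s) => regex.test(s));

const inputsB = ["ac", "abc", "abbc"];
const inputsA = ["a", "aa", "aaa", "aaaa", "aaaaa"];

const star = keep(/^ab*c$/, inputsB);       // zero or more 'b'
const plus = keep(/^ab+c$/, inputsB);       // one or more 'b'
const optional = keep(/^ab?c$/, inputsB);   // zero or one 'b'
const exactly3 = keep(/^a{3}$/, inputsA);   // exactly three 'a'
const atLeast2 = keep(/^a{2,}$/, inputsA);  // two or more 'a'
const from2to4 = keep(/^a{2,4}$/, inputsA); // two to four 'a'

console.log(star);     // [ 'ac', 'abc', 'abbc' ]
console.log(plus);     // [ 'abc', 'abbc' ]
console.log(optional); // [ 'ac', 'abc' ]
console.log(exactly3); // [ 'aaa' ]
console.log(atLeast2); // [ 'aa', 'aaa', 'aaaa', 'aaaaa' ]
console.log(from2to4); // [ 'aa', 'aaa', 'aaaa' ]
```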
    

Groups and Alternations

  • (...): Groups multiple tokens together for capturing or applying quantifiers.

    const regex = /(abc)+/; // Matches 'abc', 'abcabc', etc.
    
  • |: Alternation, like a logical OR.

    const regex = /cat|dog/; // Matches either 'cat' or 'dog'
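
Groups also capture the text they matched, which `match()` exposes at indexes 1, 2, and so on. A short sketch, with sample strings that are assumptions for illustration:

```javascript
// Capturing: index 0 is the whole match, 1 and 2 are the groups.
const nameMatch = "John Smith".match(/(\w+) (\w+)/);
const [full, firstName, lastName] = nameMatch;

// Quantifier applied to a group: the whole 'abc' unit must repeat.
const repeated = /^(abc)+$/.test("abcabc");
const partial = /^(abc)+$/.test("abcab");

// Alternation across the whole pattern, and limited to part of it by a group.
const pets = "cat, dog, bird".match(/cat|dog/g);
const spellings = ["gray", "grey", "griy"].filter((s) => /^gr(a|e)y$/.test(s));

console.log(full, firstName, lastName); // John Smith John Smith
console.log(repeated, partial);         // true false
console.log(pets, spellings);           // [ 'cat', 'dog' ] [ 'gray', 'grey' ]
```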
    

Lookaheads and Lookbehinds

Lookaheads and lookbehinds allow for conditional matching without consuming characters.

  • Positive Lookahead (?=):

    Ensures that the following characters match a pattern.

    const regex = /abc(?=d)/; // Matches 'abc' only if followed by 'd'
    
  • Negative Lookahead (?!):

    Ensures that the following characters do NOT match a pattern.

    const regex = /abc(?!d)/; // Matches 'abc' only if NOT followed by 'd'
    
  • Positive Lookbehind (?<=):

    Ensures that the preceding characters match a pattern.

    const regex = /(?<=a)bc/; // Matches 'bc' only if preceded by 'a'
    
  • Negative Lookbehind (?<!):

    Ensures that the preceding characters do NOT match a pattern.

    const regex = /(?<!a)bc/; // Matches 'bc' only if NOT preceded by 'a'
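
The sketch below shows that the text checked by a lookaround is not part of the returned match, and then applies lookbehinds to a made-up price list. The sample strings are assumptions; lookbehind needs an engine that supports ES2018.

```javascript
// The lookaround text is checked but not included in the match.
const ahead = "abcd".match(/abc(?=d)/)[0];   // "abc", not "abcd"
const notAhead = /abc(?!d)/.test("abcd");    // 'd' follows, so no match
const behind = "abc".match(/(?<=a)bc/)[0];   // "bc", not "abc"
const notBehind = /(?<!a)bc/.test("abc");    // 'a' precedes, so no match

const prices = "USD 40, EUR 35, USD 12";

// Amounts that directly follow "USD ".
const usd = prices.match(/(?<=USD )\d+/g);

// Amounts that do not follow "USD "; \b keeps the match at the start of a number.
const nonUsd = prices.match(/(?<!USD )\b\d+/g);

console.log(ahead, notAhead, behind, notBehind); // abc false bc false
console.log(usd, nonUsd);                        // [ '40', '12' ] [ '35' ]
```

The `\b` in the last pattern matters: without it, the negative lookbehind would succeed in the middle of "40" and return stray digits.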
    

Common Patterns

Numbers

Pattern          Description                                       Example
\d+              Whole numbers                                     1, 2, 3
\d+\.\d+         Decimal numbers                                   1.1, 1.2
\d+(\.\d+)?      Whole and decimal numbers                         1, 1.1
-?\d+(\.\d+)?    Negative or positive whole and decimal numbers    -1, 1, 1.2
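
These patterns match anywhere inside a string. One possible way to use them, sketched below, is to add ^ and $ when validating a whole input and to add the g flag when extracting numbers from longer text. The anchors, the flag, and the sample inputs are assumptions added for this example.

```javascript
// Validation: anchors force the entire input to be a number.
const signedNumber = /^-?\d+(\.\d+)?$/;
const candidates = ["1", "-1", "1.2", "1.", "abc", "1.2.3"];
const valid = candidates.filter((s) => signedNumber.test(s));

// Extraction: no anchors, g flag to collect every number in the text.
const extracted = "Temp -3.5 to 12".match(/-?\d+(\.\d+)?/g);

console.log(valid);     // [ '1', '-1', '1.2' ]
console.log(extracted); // [ '-3.5', '12' ]
```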

Language

Pattern             Description    Example
[a-zA-Z]+           English        abc
[\u4e00-\u9fa5]+    Chinese        汉语
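
As a brief sketch, both patterns can be run over one mixed string to pull out the runs of each script. The sample string is an assumption for illustration.

```javascript
const mixed = "Hello 汉语 world";

// Runs of ASCII letters.
const english = mixed.match(/[a-zA-Z]+/g);

// Runs of characters in the range U+4E00 to U+9FA5.
const chinese = mixed.match(/[\u4e00-\u9fa5]+/g);

console.log(english); // [ 'Hello', 'world' ]
console.log(chinese); // [ '汉语' ]
```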

JSON

Pattern                   Description    Example
(?<="name":)[^,]+(?=,)    JSON value     {"name":"Tapicker", "age":18} -> "Tapicker"

Credit Card

Pattern                      Description                     Example
4[0-9]{12}(?:[0-9]{3})?      Visa credit card                -
3[47][0-9]{13}               American Express credit card    -
([1-9]{1})(\d{15}|\d{18})    China credit card               -
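
The sketch below classifies a card number by its shape using the first two patterns. The anchors, the hypothetical helpers `normalize` and `cardType`, and the sample numbers (values commonly used for testing) are assumptions for this example. The patterns check only the prefix and length, not whether the number is actually valid.

```javascript
const visa = /^4[0-9]{12}(?:[0-9]{3})?$/;
const amex = /^3[47][0-9]{13}$/;

// Hypothetical helper: drop spaces and hyphens before matching.
const normalize = (input) => input.replace(/[\s-]/g, "");

// Hypothetical helper: name the first pattern that accepts the number.
const cardType = (number) => {
  if (visa.test(number)) return "visa";
  if (amex.test(number)) return "amex";
  return "unknown";
};

console.log(cardType(normalize("4111 1111 1111 1111"))); // visa
console.log(cardType(normalize("3782 822463 10005")));   // amex
console.log(cardType("4111 1111 1111 1111"));            // unknown (spaces not removed)
```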

Phone Number

Pattern                    Description            Example
\(\d{3}\) \d{3}-?\d{4}     US phone number        (562) 988-1688, (562) 9881688
\d{3}-\d{8}|\d{4}-\d{7}    CN phone number        0511-4405222, 021-87888822
1[3456789]\d{9}            CN cellphone number    18623236565
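
A sketch of these patterns used for whole-input validation. The anchors are an addition for this example. The alternation is wrapped in a non-capturing group so that ^ and $ apply to both alternatives rather than to one side each.

```javascript
const usPhone = /^\(\d{3}\) \d{3}-?\d{4}$/;
const cnLandline = /^(?:\d{3}-\d{8}|\d{4}-\d{7})$/;
const cnMobile = /^1[3456789]\d{9}$/;

console.log(usPhone.test("(562) 988-1688")); // true
console.log(usPhone.test("(562) 9881688"));  // true
console.log(usPhone.test("562-988-1688"));   // false, parentheses are required
console.log(cnLandline.test("0511-4405222")); // true
console.log(cnLandline.test("021-87888822")); // true
console.log(cnMobile.test("18623236565"));    // true
console.log(cnMobile.test("12623236565"));    // false, second digit must be 3 to 9
```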

Zip Code

Pattern              Description    Example
[1-9]\d{5}(?!\d)     CN             516285
\d{5}-\d{4}|\d{5}    US             90807 or 92064-3404
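
A short sketch of both patterns in use. It also illustrates why the five-plus-four alternative is listed first: alternation tries the left side first, so putting the bare five-digit alternative first would cut the longer code short. The sample strings are assumptions for illustration.

```javascript
// Six-digit code that is not followed by another digit.
const sixDigit = "Postcode: 516285".match(/[1-9]\d{5}(?!\d)/)[0];

// Five digits, optionally followed by a hyphen and four more.
const zips = "90807 or 92064-3404".match(/\d{5}-\d{4}|\d{5}/g);

// Same alternatives in the opposite order: the suffix is lost.
const reordered = "90807 or 92064-3404".match(/\d{5}|\d{5}-\d{4}/g);

console.log(sixDigit);  // 516285
console.log(zips);      // [ '90807', '92064-3404' ]
console.log(reordered); // [ '90807', '92064' ]
```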

Email

Pattern                                        Description    Example
\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*    Email          support@tapicker.com
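
A sketch of the email pattern used as a whole-input check. The anchors and all sample addresses other than support@tapicker.com are assumptions for this example. The pattern checks the general shape of an address only; it does not confirm that the address exists.

```javascript
const email = /^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/;

const samples = [
  "support@tapicker.com",
  "first.last+tag@mail.example.org",
  "support@tapicker", // no dot after the @ part
  "@tapicker.com",    // nothing before the @
];

const accepted = samples.filter((s) => email.test(s));

console.log(accepted);
// [ 'support@tapicker.com', 'first.last+tag@mail.example.org' ]
```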

Date

Pattern                                          Description    Example
\d{1,2}\/\d{1,2}\/\d{4}                          Date           10/24/2022
\d{4}-\d{1,2}-\d{1,2}                            Date           2022-10-24
\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}    Datetime       2022-10-24 12:08:16
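
One way to get at the individual fields is to wrap each part of the datetime pattern in a capturing group, as sketched below. The groups and anchors are additions for this example. The last lines show that these patterns check the digit layout only, so an impossible date still passes.

```javascript
const stamp = "2022-10-24 12:08:16";

// Same datetime pattern, with one capturing group per field.
const parts = stamp.match(
  /^(\d{4})-(\d{1,2})-(\d{1,2}) (\d{1,2}):(\d{1,2}):(\d{1,2})$/
);

// Drop index 0 (the whole match) and convert the six fields to numbers.
const fields = parts.slice(1).map(Number);

const usStyle = /^\d{1,2}\/\d{1,2}\/\d{4}$/.test("10/24/2022");
const shapeOnly = /^\d{4}-\d{1,2}-\d{1,2}$/.test("2022-99-99");

console.log(fields);    // [ 2022, 10, 24, 12, 8, 16 ]
console.log(usStyle);   // true
console.log(shapeOnly); // true, even though month 99 does not exist
```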

Others

Pattern                                       Description    Example
https?://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?    URL            https://www.tapicker.com/
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}            IP address     192.168.1.1
(?:[0-9a-fA-F]{2}\:){5}[0-9a-fA-F]{2}         MAC address    00:1b:63:84:45:e6
<(\S*?)[^>]*>.*?|<\/.*?>                      HTML           <p id="test"></p>
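
A sketch of the IP address and MAC address patterns used for validation. The IP pattern accepts any group of one to three digits, so values above 255 pass the shape check; the hypothetical helper `isIPv4` adds a numeric range check on top. The anchors, the helper, and the failing sample values are assumptions for this example.

```javascript
const ipShape = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;
const mac = /^(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$/;

// Hypothetical helper: shape check plus a 0 to 255 range check per octet.
const isIPv4 = (s) =>
  ipShape.test(s) && s.split(".").every((octet) => Number(octet) <= 255);

console.log(ipShape.test("192.168.1.1"));   // true
console.log(ipShape.test("999.1.1.1"));     // true, the shape alone is not enough
console.log(isIPv4("999.1.1.1"));           // false
console.log(isIPv4("192.168.1.1"));         // true
console.log(mac.test("00:1b:63:84:45:e6")); // true
console.log(mac.test("00:1b:63:84:45"));    // false, only five pairs
```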
