Special Characters

When using a regular expression to search for a simple pattern, you can directly list the string you want to find. However, if you want to use more complex conditions than exact matching, such as finding only numbers or searching for whitespace, you need to use special characters.

Representative special characters available in JavaScript regular expressions are as follows.

Special character Description
\ Changes how the next character is interpreted. Before an ordinary character, it forms a special sequence such as \d or \s. Before a special character, it makes that character literal; for example, \. matches a period.
\d Matches a digit. Same as /[0-9]/.
\D Matches a non-digit character. Same as /[^0-9]/.
\w Matches a letter, digit, or underscore (_). Same as /[A-Za-z0-9_]/.
\W Matches a character that is not a letter, digit, or underscore (_). Same as /[^A-Za-z0-9_]/.
\s Matches a whitespace character such as a space, tab, or line break.
\S Matches a character that is not a whitespace character.
\b Matches a word boundary, that is, the position at the beginning or end of a word. It does not consume any characters.
\xhh Matches the character whose code is the two-digit hexadecimal number hh.
\uhhhh Matches the Unicode character whose code is the four-digit hexadecimal number hhhh.
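
As an illustration of the table above, the following sketch tries a few of these special characters with the String search() method, which returns the index of the first match or -1. The sample string priceStr and the variable names are made up for this example and are not part of the original text.

```javascript
var priceStr = "Total: $3.50 (A)"; // hypothetical sample string

// Escaping a special character makes it literal: \$ and \. match "$" and ".".
var literalIdx = priceStr.search(/\$3\.50/);  // 7

// \x41 and \u0041 both denote the character "A" (hexadecimal 41).
var hexIdx = priceStr.search(/\x41/);         // 14
var unicodeIdx = priceStr.search(/\u0041/);   // 14

// \D matches the first non-digit character, which is "T".
var nonDigitIdx = priceStr.search(/\D/);      // 0

console.log(literalIdx, hexIdx, unicodeIdx, nonDigitIdx);
```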

The following example uses the special character \d.

var targetStr = "ab1bc2cd3de";
var reg1 = /\d/;    // Matches a digit from 0 through 9. targetStr.search(reg1) returns 2.
var reg2 = /[3-9]/; // Matches a digit from 3 through 9. targetStr.search(reg2) returns 8.

In the example above, the first regular expression checks whether a character is a digit from 0 through 9. This regular expression returns the same result as /[0-9]/. The second regular expression checks whether a character is a digit from 3 through 9.
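
One way to check the index values noted in the comments above is to pass each regular expression to search(). This is only a sketch; idx1, idx2, and idx3 are illustrative names.

```javascript
var targetStr = "ab1bc2cd3de";

// search() returns the index of the first match, or -1 if nothing matches.
var idx1 = targetStr.search(/\d/);     // 2: the first digit, "1"
var idx2 = targetStr.search(/[3-9]/);  // 8: the first digit in the range 3-9, "3"
var idx3 = targetStr.search(/[0-9]/);  // 2: same result as /\d/

console.log(idx1, idx2, idx3);
```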

Square brackets ([]) have a special meaning in regular expressions. They are covered in more detail in the Brackets section below.

The following example uses the special characters \s and \w.

var targetStr = "abc 123";
// Matches a word character (letter, digit, or underscore), then a whitespace character, then another word character.
var reg = /\w\s\w/; // Matches "c 1".

The regular expression used above searches for a letter or digit including underscore (_) as the first character. As the second character, it searches for a whitespace character such as a space, tab, or line break. As the third and final character, it again searches for a letter or digit including underscore (_).

By listing special characters like this, you can search for substrings made up of characters that satisfy each condition.
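
To see what the pattern actually matches, the sketch below passes it to match() and reads the matched text and its position. The variable names and the second sample string "x\ty" are assumptions added for illustration.

```javascript
var targetStr = "abc 123";
var result = targetStr.match(/\w\s\w/);

var matched = result[0];      // "c 1": word character, space, word character
var position = result.index;  // 2: the match starts at the "c"

// A tab also counts as whitespace for \s.
var tabMatched = "x\ty".match(/\w\s\w/)[0]; // "x", a tab, "y"

console.log(matched, position, tabMatched.length);
```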

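The match() method is a JavaScript String method that searches a string using the regular expression passed as an argument. Without the g flag, it returns an array containing the first match followed by any substrings stored with parentheses. With the g flag, it returns an array of all matches. If nothing matches, it returns null.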
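
The following sketch shows how the return value of match() depends on the g flag: without it, only the first match is returned, and with it, every match is returned. The variable names are illustrative.

```javascript
var targetStr = "ab1bc2cd3de";

// Without the g flag: only the first match, plus its index.
var first = targetStr.match(/\d/);
var firstText = first[0];      // "1"
var firstIndex = first.index;  // 2

// With the g flag: every match in the string.
var all = targetStr.match(/\d/g);  // ["1", "2", "3"]

// No match at all: null rather than an empty array.
var none = targetStr.match(/x/);   // null

console.log(firstText, firstIndex, all, none);
```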

The following example uses the special character \b.

var targetStr1 = "abc123abc";   // search(reg) returns 7
var targetStr2 = "abc 123 abc"; // search(reg) returns 1
var targetStr3 = "abc@123!abc"; // search(reg) returns 1

// Matches the substring "bc" only when it is at the end of a word.
var reg = /bc\b/;

The special character \b matches a word boundary, so /bc\b/ matches "bc" only where a word ends immediately after it. In the first example above, the whole target string is one word. The "bc" at index 1 is followed by the word character 1, so it does not match, and the match is the "bc" at index 7, at the end of the string. In the second and third examples, the target string consists of several words separated by non-word characters, so the "bc" at index 1 already ends a word and matches.

In JavaScript, only letters (A-Z, a-z), digits, and underscores (_) are treated as characters that can be part of a word. All other characters, such as spaces, @, and !, act as word separators.
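
The three target strings above can be checked with search(), as sketched below. The second pattern, /\b123/, is an extra illustration (not from the original example) of \b placed at the start of a word.

```javascript
var endOfWord = /bc\b/;     // "bc" followed by a word boundary
var startOfWord = /\b123/;  // "123" preceded by a word boundary

var end1 = "abc123abc".search(endOfWord);    // 7: only the final "bc" ends a word
var end2 = "abc 123 abc".search(endOfWord);  // 1: the space ends the first word
var end3 = "abc@123!abc".search(endOfWord);  // 1: "@" is not a word character

var start1 = "abc123abc".search(startOfWord);    // -1: "123" is inside a word
var start2 = "abc 123 abc".search(startOfWord);  // 4
var start3 = "abc@123!abc".search(startOfWord);  // 4

console.log(end1, end2, end3, start1, start2, start3);
```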

Quantifiers

In regular expressions, you can use several special characters as quantifiers.

Quantifier Description
n* Matches the preceding character n zero or more times. Same as /n{0,}/.
n+ Matches the preceding character n one or more times. Same as /n{1,}/.
n? Matches the preceding character n zero or one time. Same as /n{0,1}/.

var targetStr = "Hello World!";
var zeroReg = /lo*/;          // Searches for cases where 'o' appears zero or more times after 'l'.
var oneReg = /lo+/;           // Searches for cases where 'o' appears one or more times after 'l'.
var zeroOneReg = /lo?/;       // Searches for cases where 'o' appears zero or one time after 'l'.

targetStr.search(zeroReg);    // 2
targetStr.search(oneReg);     // 3
targetStr.search(zeroOneReg); // 2
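
For comparison, the sketch below uses match() to show the text each pattern matches rather than its index, and checks that the brace forms from the table find the same positions. The variable names are illustrative.

```javascript
var targetStr = "Hello World!";

// [0] is the text of the first match.
var zeroMatch = targetStr.match(/lo*/)[0];    // "l": the first 'l' is followed by zero 'o'
var oneMatch = targetStr.match(/lo+/)[0];     // "lo"
var zeroOneMatch = targetStr.match(/lo?/)[0]; // "l"

// The brace forms find the same positions as *, +, and ?.
var sameAsStar = targetStr.search(/lo{0,}/);      // 2
var sameAsPlus = targetStr.search(/lo{1,}/);      // 3
var sameAsQuestion = targetStr.search(/lo{0,1}/); // 2

console.log(zeroMatch, oneMatch, zeroOneMatch);
```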

By default, a quantifier is greedy: it matches as many characters as possible. If a question mark (?) is placed immediately after a quantifier (*, +, ?, {}), the quantifier becomes lazy and matches as few characters as possible.

var targetStr = "123abc";
var oneReg = /\d+/;           // Matches one or more digits. Same as /[0-9]+/.
var anotherReg = /\d+?/;      // Matches one or more digits, but as few as possible.

targetStr.match(oneReg);      // 123
targetStr.match(anotherReg);  // 1

In the example above, the first regular expression searches for one or more digits, so it matches "123" in the string "123abc" with as many characters as possible. However, if you add a question mark (?) immediately after it as in the second regular expression, the match changes to use as few characters as possible, so only "1" is matched.
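
The difference matters most when a pattern has to stop at the first possible end point. The sketch below is an illustration with a made-up string of tags; the names html, greedy, lazy, and lazyRange are assumptions for this example.

```javascript
var html = "<b>one</b><b>two</b>"; // hypothetical sample string

// Greedy: .* runs to the last "</b>" in the string.
var greedy = html.match(/<b>.*<\/b>/)[0];   // "<b>one</b><b>two</b>"

// Lazy: .*? stops at the first "</b>" it can reach.
var lazy = html.match(/<b>.*?<\/b>/)[0];    // "<b>one</b>"

// The same applies to brace quantifiers: {2,3}? takes two digits, not three.
var lazyRange = "123abc".match(/\d{2,3}?/)[0]; // "12"

console.log(greedy, lazy, lazyRange);
```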

Brackets

The meanings of the various brackets that can be used in regular expressions are as follows.

Bracket Description
a(b)c Matches the whole pattern and stores the substring matched inside the parentheses. For example, it matches "abc" and stores "b".
[abc] Matches any one of the characters listed inside the square brackets ([]). For example, it matches "a", "b", or "c".
[0-3] Matches any one character in the range given inside the square brackets ([]). For example, it matches one digit from 0 through 3.
[\b] Matches a backspace character.
{n} Matches the preceding character exactly n times. n must be a non-negative integer.
{m,n} Matches the preceding character at least m times and at most n times. m and n must be non-negative integers, and m must not be greater than n.
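
As a rough illustration of the table above, the sketch below applies each kind of bracket to a small sample string. The string "cab 2024" and the variable names are made up for this example.

```javascript
var targetStr = "cab 2024"; // hypothetical sample string

// Square brackets match ONE character out of the listed set or range.
var anyOfAbc = targetStr.match(/[abc]/)[0];   // "c"
var runOfAbc = targetStr.match(/[abc]+/)[0];  // "cab": a quantifier repeats the set
var lowDigit = targetStr.match(/[0-3]/)[0];   // "2"

// Curly braces repeat the preceding character or set.
var exactlyTwo = targetStr.match(/\d{2}/)[0];   // "20"
var twoToThree = targetStr.match(/\d{2,3}/)[0]; // "202"

// Parentheses store the part of the match they enclose.
var captured = "abc".match(/a(b)c/); // captured[0] is "abc", captured[1] is "b"

console.log(anyOfAbc, runOfAbc, lowDigit, exactlyTwo, twoToThree, captured[1]);
```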

\b is a special character that checks whether the beginning or end of a word matches the pattern, while [\b] is a regular expression that searches for a backspace character. Do not confuse the two.
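
The distinction can be sketched as follows. Note that inside a JavaScript string literal, "\b" is the backspace character (code 8), which is what [\b] looks for; the sample strings and variable names are assumptions for this example.

```javascript
var withBackspace = "ab\bcd"; // the third character is a backspace (code 8)
var plainText = "ab cd";

// [\b] matches the backspace character itself.
var backspaceIdx = withBackspace.search(/[\b]/);  // 2
var sameAsHex = withBackspace.search(/\x08/);     // 2: \x08 is the same character
var noBackspace = plainText.search(/[\b]/);       // -1: no backspace in this string

// \b outside brackets matches a position, here the start of the word "cd".
var boundaryIdx = plainText.search(/\bcd/);       // 3

console.log(backspaceIdx, sameAsHex, noBackspace, boundaryIdx);
```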

The following example uses parentheses to search for a pattern, store that pattern, and change its position.

var targetStr = "Hong Gil Dong";
var nameReg = /(\w+)\s(\w+)\s(\w+)/;                  // Stores each substring separated by whitespace.
var engName = targetStr.replace(nameReg, "$2 $3 $1"); // Moves the first substring to the end.
engName;                                              // Gil Dong Hong

In the example above, the three substrings matched by the parenthesized parts of the regular expression are stored in order. In the replace() method, these stored substrings can be referenced with the $1, $2, …, $n expressions.
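
The same technique applies to other reordering tasks. The sketch below rearranges a date string; the date value, the pattern, and the variable names are made up for illustration. It also shows that replace() accepts a function, which receives the whole match followed by the stored substrings.

```javascript
var isoDate = "2024-03-15"; // hypothetical sample date
var dateReg = /(\d{4})-(\d{2})-(\d{2})/; // stores year, month, and day

// $1, $2, and $3 refer to the stored year, month, and day.
var dayFirst = isoDate.replace(dateReg, "$3/$2/$1"); // "15/03/2024"

// A replacement function receives the match and each stored substring.
var dotted = isoDate.replace(dateReg, function (match, year, month, day) {
  return day + "." + month + "." + year;
}); // "15.03.2024"

console.log(dayFirst, dotted);
```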

Parentheses used in regular expressions are also called capturing parentheses.

These stored substrings can be used not only in the replace() method but also directly inside a regular expression.

var targetStr = "abc 123 abc 123";
var oneReg = /(\w+) (\d+)/;
var anotherReg = /(\w+) (\d+) \1 \2/;

targetStr.match(oneReg);     // abc 123, abc, 123
targetStr.match(anotherReg); // abc 123 abc 123, abc, 123

In the example above, the first regular expression matches a substring of word characters (letters, digits, and underscores), a space, and then a substring of digits. Therefore, the first "abc" and "123" in the target string are matched and stored. The second regular expression reuses these stored substrings inside the pattern itself: \1 and \2 match the same text that the first and second parentheses stored. In general, stored substrings are referenced inside a regular expression with the \1, \2, …, \n expressions.
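
A common use of this feature is detecting an accidentally repeated word. The following is a minimal sketch under that assumption; the pattern, the sample sentences, and the variable names are illustrative and not from the original text.

```javascript
// \1 must match exactly the text stored by (\w+), so this finds "word word".
// The \b at each end keeps the match from starting or ending inside a longer word.
var repeatedWordReg = /\b(\w+) \1\b/;

var found = "this is is a test".match(repeatedWordReg);
var foundText = found[0];     // "is is"
var foundWord = found[1];     // "is"
var foundIndex = found.index; // 5

var notFound = "this is a test".match(repeatedWordReg); // null

console.log(foundText, foundWord, foundIndex, notFound);
```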

When the g flag is not used, the match() method returns not only the substring that matches the whole regular expression, but also the substrings stored with parentheses.

Position Characters

In regular expressions, you can specify the position in the target string where a pattern must be found.

Character Description
^a Matches the pattern only at the beginning of the string. For example, it matches only the a at the start of a string that begins with a.
a$ Matches the pattern only at the end of the string. For example, it matches only the a at the end of a string that ends with a.

var firstStr = "Php";
var secondStr = "phP";
var strReg = /^p/;       // Matches 'p' only at the beginning of the string.
firstStr.match(strReg);  // null
secondStr.match(strReg); // p
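
The sketch below extends the example with the $ character and two flags that the text above does not cover: i (ignore case) and m (multiline, which lets ^ and $ also match at the start and end of each line). The extra sample strings and variable names are assumptions added for illustration.

```javascript
var firstStr = "Php";
var secondStr = "phP";

// $ anchors the pattern to the end of the string.
var endsWithP1 = firstStr.match(/p$/)[0]; // "p"
var endsWithP2 = secondStr.match(/p$/);   // null: the string ends with "P"

// The i flag ignores case, so ^p also matches the leading "P".
var ignoreCase = firstStr.match(/^p/i)[0]; // "P"

// ^ refers to the whole string, not to each word in it.
var secondWord = "hi pal".search(/^p/);    // -1

// With the m flag, ^ also matches right after a line break.
var secondLine = "hi\npal".search(/^p/m);  // 3

console.log(endsWithP1, endsWithP2, ignoreCase, secondWord, secondLine);
```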