Regular Expressions Word Boundary

From WikiOD
Revision as of 04:05, 14 June 2021 by Admin. Credit: Stack Overflow Documentation.

Syntax

  • POSIX style, end of word: [[:>:]]
  • POSIX style, start of word: [[:<:]]
  • POSIX style, word boundary: [[:<:][:>:]]
  • SVR4/GNU, end of word: \>
  • SVR4/GNU, start of word: \<
  • Perl/GNU, word boundary: \b
  • Tcl, end of word: \M
  • Tcl, start of word: \m
  • Tcl, word boundary: \y
  • Portable ERE, start of word: (^|[^[:alnum:]_])
  • Portable ERE, end of word: ([^[:alnum:]_]|$)
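Of the forms listed above, Python's re module supports only \b, so it is a convenient place to compare \b with the portable ERE emulation. The sketch below is an illustration under stated assumptions rather than a reference implementation: [[:alnum:]_] is spelled out as the ASCII range [0-9A-Za-z_] because re has no POSIX bracket classes, and the names PORTABLE and NATIVE are hypothetical. It shows that the emulation consumes the neighboring characters, while \b does not.

```python
import re

# Hypothetical emulation of the portable ERE boundaries around the word "stack".
# [[:alnum:]_] is written out as [0-9A-Za-z_] because Python's re module
# has no POSIX bracket classes (ASCII-only assumption).
PORTABLE = re.compile(r"(^|[^0-9A-Za-z_])(stack)([^0-9A-Za-z_]|$)")
NATIVE = re.compile(r"\bstack\b")

text = "foo stack bar"

portable_match = PORTABLE.search(text)
native_match = NATIVE.search(text)

# The emulation swallows the space on either side of the word ...
print(repr(portable_match.group(0)))  # ' stack '
# ... while \b is zero-width and matches the word alone.
print(repr(native_match.group(0)))    # 'stack'
# With the emulation, the word itself is read from the capture group.
print(repr(portable_match.group(2)))  # 'stack'
```

Because the emulation consumes a character on each side, two adjacent occurrences separated by a single space cannot both be matched in one pass, which is one reason to prefer a native boundary where the flavor offers it.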

Remarks

Additional Resources

Word boundaries

The \b metacharacter

To make it easier to find whole words, we can use the metacharacter \b. It marks the beginning and the end of an alphanumeric sequence*. Since it only marks these locations, it matches no character on its own.

*: It is common to call an alphanumeric sequence a word, since we can match its characters with \w (the word character class). This can be misleading, though, since \w also includes digits and, in most flavors, the underscore.
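The zero-width behavior can be made visible with a small Python sketch (Python's re module is used here only as one convenient flavor that supports \b; the sample strings are illustrative). Substituting a marker for \b inserts text at each boundary without removing any character, and \b\w+\b shows that digits and underscores count as part of a word.

```python
import re

# \b is zero-width: replacing it inserts a marker without removing any text.
marked = re.sub(r"\b", "|", "foo bar")
print(marked)  # |foo| |bar|

# \w covers letters, digits and the underscore, so all of these count as
# "words"; the hyphen is not a word character and splits "a-b" in two.
words = re.findall(r"\b\w+\b", "snake_case v2 a-b")
print(words)  # ['snake_case', 'v2', 'a', 'b']
```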

Examples:

Regex | Input | Matches?
\bstack\b | stackoverflow | No, since there is no occurrence of the whole word stack
\bstack\b | foo stack bar | Yes, since stack has a space (a non-word character) on each side
\bstack\b | stack!overflow | Yes: there is nothing before stack and ! is not a word character
\bstack | stackoverflow | Yes, since there is nothing before stack
overflow\b | stackoverflow | Yes, since there is nothing after overflow
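The rows above can be checked mechanically. The following is a minimal sketch using Python's re.search, which reports whether the pattern matches anywhere in the input; the list name cases is hypothetical.

```python
import re

# (pattern, input, expected result) for each row of the table above.
cases = [
    (r"\bstack\b", "stackoverflow", False),
    (r"\bstack\b", "foo stack bar", True),
    (r"\bstack\b", "stack!overflow", True),
    (r"\bstack", "stackoverflow", True),
    (r"overflow\b", "stackoverflow", True),
]

# re.search returns a match object or None; bool() turns that into yes/no.
results = [bool(re.search(pattern, text)) for pattern, text, _ in cases]

for (pattern, text, expected), found in zip(cases, results):
    print(f"{pattern!r} on {text!r}: {found} (expected {expected})")
```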

The \B metacharacter

This is the opposite of \b: it matches at every position that is not a word boundary. Like \b, it matches a location rather than a character, so it consumes nothing on its own. It is useful for finding text that is part of a larger word rather than a whole word.

Examples:

Regex | Input | Matches?
\Bb\B | abc | Yes, since b is not surrounded by word boundaries.
\Ba\B | abc | No, a has a word boundary on its left side.
a\B | abc | Yes, a does not have a word boundary on its right side.
\B,\B | a,,,b | Yes, it matches the second comma, because \B also matches the position between two non-word characters. (Note that there is a word boundary to the left of the first comma and to the right of the third.)
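As a quick check of these rows, here is a sketch in Python (again only one flavor that supports \B; the names cases and comma are hypothetical). It also reports where \B,\B matches in a,,,b, confirming that it is the middle comma.

```python
import re

# (pattern, input) pairs taken from the table above.
cases = [
    (r"\Bb\B", "abc"),
    (r"\Ba\B", "abc"),
    (r"a\B", "abc"),
    (r"\B,\B", "a,,,b"),
]

results = [bool(re.search(pattern, text)) for pattern, text in cases]
print(results)  # [True, False, True, True]

# Commas sit at indices 1, 2 and 3. The first has a boundary on its left
# and the third has one on its right, so only index 2 qualifies.
comma = re.search(r"\B,\B", "a,,,b")
print(comma.span())  # (2, 3)
```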

Find patterns at the beginning or end of a word

Examine the following strings:

  • the regular expression bar will match all four strings,
  • \bbar\b will only match the 2nd,
  • bar\b will be able to match the 2nd and 3rd strings, and
  • \bbar will match the 2nd and 4th strings.
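The four statements can be reproduced with a short Python sketch. The strings used here (foobarfoo, bar, foobar, barfoo) are hypothetical stand-ins, chosen only because they satisfy the statements above; the names strings, patterns and matches are likewise illustrative.

```python
import re

# Hypothetical sample strings, numbered 1 to 4 in this order.
strings = ["foobarfoo", "bar", "foobar", "barfoo"]
patterns = [r"bar", r"\bbar\b", r"bar\b", r"\bbar"]

# For each pattern, collect the 1-based numbers of the strings it matches.
matches = {
    pattern: [n for n, s in enumerate(strings, start=1) if re.search(pattern, s)]
    for pattern in patterns
}

for pattern, hits in matches.items():
    print(f"{pattern:10} matches strings {hits}")
```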

Match complete word


A word wrapped in \b on both sides (as in \bstack\b above) will match the complete word only, with no alphanumeric character or _ immediately preceding or following it.

Taking from

There are three different positions that qualify as word boundaries:

  1. Before the first character in the string, if the first character is a word character.
  2. After the last character in the string, if the last character is a word character.
  3. Between two characters in the string, where one is a word character and the other is not a word character.

The term word character here means any of the following:

  1. Letters ([a-zA-Z])
  2. Digits ([0-9])
  3. Underscore (_)

In short, word character = \w = [a-zA-Z0-9_]. In Unicode-aware flavors, \w may also cover letters and digits from other scripts.
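The three kinds of position can be listed directly. The sketch below, in Python, collects every index at which \b matches; the helper name boundaries is hypothetical. It also shows that Python's \w is Unicode-aware by default and only reduces to [a-zA-Z0-9_] under the re.ASCII flag.

```python
import re

def boundaries(text):
    # Indices at which the zero-width \b assertion matches in text.
    return [m.start() for m in re.finditer(r"\b", text)]

# "ab, cd": start of string (0), between b and "," (2),
# between " " and c (4), end of string (6).
print(boundaries("ab, cd"))  # [0, 2, 4, 6]

# The string starts and ends with a non-word character, so positions
# 0 and 4 are not boundaries.
print(boundaries("!ab!"))    # [1, 3]

# No word characters at all: no boundaries.
print(boundaries("!!"))      # []

# \u00e9 is the letter e with an acute accent.
sample = "\u00e9_9"
ascii_only = re.findall(r"\w", sample, re.ASCII)
unicode_aware = re.findall(r"\w", sample)
print(ascii_only)     # ['_', '9']
print(unicode_aware)  # ['é', '_', '9']
```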

Make text shorter but don't break last word

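To shorten long text to at most N characters without cutting the last word in half, use the pattern .{0,N}\b, where N is replaced by the actual limit. The greedy quantifier takes up to N characters and then backtracks to the nearest word boundary.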
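One way this could look in practice is sketched below in Python. The function name shorten and its parameters are hypothetical, and stripping trailing whitespace is an added design choice: the boundary just before a word also qualifies, so the raw match can end in a space.

```python
import re

def shorten(text, limit):
    # Greedy .{0,limit} takes as many characters as allowed, then backtracks
    # until \b succeeds. re.DOTALL lets "." cross newlines.
    match = re.match(r".{0,%d}\b" % limit, text, re.DOTALL)
    # No match means the first `limit` characters contain no word boundary.
    if match is None:
        return ""
    # Drop the whitespace left behind when the cut lands just before a word.
    return match.group(0).rstrip()

sample = "hello world foo"
print(repr(shorten(sample, 8)))   # 'hello'
print(repr(shorten(sample, 12)))  # 'hello world'
print(repr(shorten(sample, 15)))  # 'hello world foo'
print(repr(shorten("hello", 3)))  # ''
```

Note that when the first word alone is longer than the limit, the only boundary available is the start of the string, so the result is empty rather than a partial word.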