Regular Expressions Anchor Characters: Caret (^)

From WikiOD

Remarks

Terminology

The caret (^) character is also referred to by the following terms:

  • hat
  • control
  • uparrow
  • chevron
  • circumflex accent

Usage

It has two uses in regular expressions:

  • To denote the start of the line
  • If used immediately after an opening square bracket ([^), it negates the set of allowed characters (e.g. [123] matches the character 1, 2, or 3, whilst [^123] matches any character other than 1, 2, or 3).
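Both uses can be seen side by side in a short sketch using Python's built-in re module, chosen here only as a representative flavor. The sample strings are illustrative assumptions, not taken from the original text:

```python
import re

# Outside a character class, ^ anchors the match to the start of the input.
starts_with_digit = re.compile(r"^[123]")
# Directly after '[', ^ negates the class: any character except 1, 2 or 3.
not_one_two_three = re.compile(r"[^123]")

print(bool(starts_with_digit.search("2abc")))  # True
print(bool(starts_with_digit.search("a2bc")))  # False: the 2 is not at the start
print(not_one_two_three.findall("1a2b3c"))     # ['a', 'b', 'c']
```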

Character Escaping

To match a literal caret, without its special meaning, escape it by preceding it with a backslash: \^.

Start of Line

When the multi-line (?m) modifier is turned off, ^ matches only at the beginning of the input string:

For the regex

^He

The following input strings match:

  • Hedgehog\nFirst line\nLast line
  • Help me, please
  • He

And the following input strings do not match:

  • First line\nHedgehog\nLast line
  • IHedgehog
  • " Hedgehog" (due to the leading whitespace)
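A minimal sketch that runs ^He without the multi-line flag against each listed input, assuming Python's re module as the flavor and assuming the whitespace case above means a single leading space:

```python
import re

pattern = re.compile(r"^He")  # no re.MULTILINE flag

should_match = ["Hedgehog\nFirst line\nLast line", "Help me, please", "He"]
# Assumption: the whitespace case is read as one leading space.
should_not_match = ["First line\nHedgehog\nLast line", "IHedgehog", " Hedgehog"]

# re.search tries every position, so only the ^ anchor pins the match to offset 0.
for text in should_match + should_not_match:
    print(repr(text), "->", pattern.search(text) is not None)
```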

When the multi-line (?m) modifier is turned on, ^ matches at the beginning of every line:

^He

The above would match any input string that contains a line beginning with He.

Considering \n as the newline character, the following input strings match:

  • Hello
  • First line\nHedgehog\nLast line (second line only)
  • My\nText\nIs\nHere (last line only)

And the following input strings do not match:

  • Camden Hells Brewery
  • " Helmet" (due to the leading whitespace)
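The same idea with the multi-line flag on, as a sketch assuming Python's re.MULTILINE and again reading the whitespace case as a leading space. match.start() shows the offset of the line beginning that matched:

```python
import re

pattern = re.compile(r"^He", re.MULTILINE)

for text in ["Hello", "First line\nHedgehog\nLast line", "My\nText\nIs\nHere"]:
    match = pattern.search(text)
    # start() is the offset of the first line that begins with "He"
    print(repr(text), "->", match.start())

for text in ["Camden Hells Brewery", " Helmet"]:
    # "He" never sits at the beginning of a line here, so there is no match
    print(repr(text), "->", pattern.search(text))
```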

Matching empty lines using ^

Another typical use case for caret is matching empty lines (or an empty string if the multi-line modifier is turned off).

To match an empty line (multi-line on), a caret is placed directly before a $, which is another anchor character representing the position at the end of a line (see Anchor Characters: Dollar ($)). Therefore, the following regular expression matches an empty line:

^$
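A sketch of counting blank lines with ^$, assuming Python's re module; the sample text is illustrative:

```python
import re

text = "first\n\nthird\n\nfifth"

# With re.MULTILINE, ^$ matches the empty position on each blank line.
empty_lines = re.findall(r"^$", text, re.MULTILINE)
print(len(empty_lines))  # 2

# Without the flag, ^$ only matches when the whole input is empty.
print(re.search(r"^$", "") is not None)   # True
print(re.search(r"^$", text) is not None)  # False
```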

Escaping the caret character

If you need to use the ^ character in a character class (see Character classes), either put it somewhere other than the beginning of the class:

[12^3]

Or escape the ^ using a backslash \:

[\^123]

If you want to match the caret character itself outside a character class, you need to escape it:

\^

This prevents the ^ from being interpreted as the anchor character representing the beginning of the string/line.
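The three escaping options can be checked with a short sketch, assuming Python's re module (whose rules for ^ inside and outside character classes agree with the description above); the sample text is illustrative:

```python
import re

text = "2^3 = 8"

# ^ anywhere but first in a class is a literal caret.
print(re.findall(r"[12^3]", text))     # ['2', '^', '3']
# An escaped ^ at the start of a class is also literal, not a negation.
print(re.findall(r"[\^123]", text))    # ['2', '^', '3']
# Outside a class, \^ matches a literal caret instead of the start anchor.
print(re.search(r"\^", text).start())  # 1
```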

Comparison of start-of-line and start-of-string anchors

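While many people think that ^ always means the start of a string, in multi-line mode it means the start of a line. For an anchor that matches only at the start of the string, whatever the mode, use \A (supported by Java, .NET, PCRE and Python, among others; JavaScript has no \A).

The string hello\nworld (or, more clearly)

hello
world

is matched by the regular expressions ^h, ^w (with the multi-line modifier on) and \Ah, but not by \Aw.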
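A sketch reproducing the hello\nworld example with Python's re module, which supports \A; the multi-line flag is turned on so that ^w can match at the start of the second line:

```python
import re

text = "hello\nworld"

# re.MULTILINE changes what ^ matches but has no effect on \A.
results = {
    pattern: re.search(pattern, text, re.MULTILINE) is not None
    for pattern in (r"^h", r"^w", r"\Ah", r"\Aw")
}
print(results)

# Without re.MULTILINE, ^ only matches at the start of the string, so ^w fails.
print(re.search(r"^w", text) is not None)  # False
```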

Multiline modifier

By default, the caret ^ metacharacter matches the position before the first character in the string.

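Given the string "charsequence" and the patterns /^char/ and /^sequence/, the engine attempts to match as follows (┊ marks the anchor position, brackets mark the character being compared):

/^char/

  • ^ - ┊charsequence (position 0, the start of the string)
  • c - [c]harsequence
  • h - c[h]arsequence
  • a - ch[a]rsequence
  • r - cha[r]sequence

Match found.

/^sequence/

  • ^ - ┊charsequence
  • s - [c]harsequence (s does not match c)

No match. Attempts starting at later positions fail immediately, because ^ only matches at position 0.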
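The walkthrough can be reproduced with Python's re.search, which, like the engine described above, tries successive starting positions; this is a sketch assuming Python's flavor:

```python
import re

subject = "charsequence"

# ^ succeeds at position 0, so "char" matches there.
print(re.search(r"^char", subject))      # a match object with span (0, 4)
# "sequence" only occurs at offset 4, where ^ cannot match.
print(re.search(r"^sequence", subject))  # None
# Without the anchor, "sequence" is found at offset 4.
print(re.search(r"sequence", subject).start())  # 4
```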

The same behaviour applies even if the string contains line terminators, such as \r?\n: only the position at the start of the string is matched.

For example:

/^/g

┊char\r\n
\r\n
sequence
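Python has no /g flag, so this sketch assumes re.finditer as its stand-in for collecting every match position of ^ in the same subject:

```python
import re

subject = "char\r\n\r\nsequence"

# finditer plays the role of /g: it yields every non-overlapping match.
positions = [m.start() for m in re.finditer(r"^", subject)]
print(positions)  # [0]
```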

However, if you need to match after every line terminator, you have to enable multiline mode, either with the m flag (/.../m) or with the inline (?m) modifier within your pattern. With it enabled, the caret ^ matches "the beginning of each line", which corresponds to the position at the beginning of the string and the positions immediately after¹ the line terminators.

¹ In some flavors (Java, PCRE, ...), ^ does not match after a line terminator if that line terminator is the last character in the string.

For example:

/^/gm

┊char\r\n
┊\r\n
┊sequence
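With the multi-line flag, the same finditer sketch (again assuming Python's re as the flavor) reports the positions marked with ┊ above. Python treats only \n as a line terminator for ^, and, unlike the Java and PCRE behaviour noted above, it does match after a trailing newline:

```python
import re

subject = "char\r\n\r\nsequence"

# ^ matches at 0 and right after each \n (indices 5 and 7), i.e. at 6 and 8.
positions = [m.start() for m in re.finditer(r"^", subject, re.MULTILINE)]
print(positions)  # [0, 6, 8]

# Python also matches at the very end, after a final \n.
trailing = [m.start() for m in re.finditer(r"^", "a\n", re.MULTILINE)]
print(trailing)  # [0, 2]
```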

Some of the regular expression engines that support the multiline modifier:

Java

import java.util.regex.Pattern;

Pattern inlinePattern = Pattern.compile("(?m)^abc");
Pattern flagPattern = Pattern.compile("^abc", Pattern.MULTILINE);

.NET

using System.Text.RegularExpressions;

var inlineRegex = new Regex("(?m)^abc");
var optionRegex = new Regex("^abc", RegexOptions.Multiline);

PCRE

/(?m)^abc/
/^abc/m

Python 2 & 3 (built-in re module)

import re

inline_regex = re.compile("(?m)^abc")
flag_regex = re.compile("^abc", re.MULTILINE)
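To check that the two Python forms behave the same, a small sketch (the input string is illustrative) compares them with a pattern compiled without multiline mode:

```python
import re

text = "xyz\nabc"

inline_regex = re.compile("(?m)^abc")
flag_regex = re.compile("^abc", re.MULTILINE)
plain_regex = re.compile("^abc")  # no multiline mode

# Both multiline forms find "abc" at the start of the second line (offset 4).
print(inline_regex.search(text).start(), flag_regex.search(text).start())  # 4 4
# Without multiline mode, ^ only matches at offset 0, where "xyz" begins.
print(plain_regex.search(text))  # None
```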

Credit: Stack Overflow Documentation