Photo by JJ Ying on Unsplash.com

Regular expressions and metacharacters in PowerShell

Patterns

Article from ADMIN 42/2017
By , By
Almost all programming and scripting languages allow the use of regular expressions, but many professionals still believe regex is a relic from ancient times. With specific examples, we show how useful and meaningful regex can be in PowerShell.

When administrators worked exclusively at the command line, they could impress ordinary users with endless rows of cryptic letter and number combinations (e.g., (\d{1,4}\.){4}(\d{1,4})) that changed entries in text files as if by magic.

Even though system administrators today do a large part of their work with convenient graphical tools, regular expressions (regex) still make the work considerably easier. This is particularly true when you automate and simplify tasks with PowerShell scripts.

PowerShell with Regex

If you have already developed or used some PowerShell scripts, you have probably come into contact with regular expressions, perhaps without being aware of it. The following example illustrates this very well:

$An_Array = @('somethingno1', 'somethingno2', 'morestuff')
$An_Array | Where-Object {$_ -match 'something'}

Here, you first create an array of strings and then run a query that displays only the first two elements of the array, because the third element does not match the 'something' pattern. The -match operator can also be used without the Where-Object cmdlet. For example, calling:

'somethingno1' -match 'something'

returns the value True because the search pattern was found in the string, whereas calling:

'somethingno1' -match 'nothing'

logically returns False. The -replace operator also works with regular expressions, as in

'The book is good' -replace 'The book', 'The ITA book'

which returns the string The ITA book is good. The -replace operator finds the matching string The book and replaces it with The ITA book before output. In short, regular expressions are mainly used to compare strings or to replace values and characters.

Besides the operators for direct comparison of values, such as -eq (equals) and -gt (greater than), the comparison operator category also includes the wildcard operators -like and -notlike, the replacement operator -replace, and the match operators -match and -notmatch. Of these, -replace, -match, and -notmatch handle regular expressions. The following two calls produce exactly the same output on screen:

> Get-Service | where {$_.status -like "running"}
> Get-Service | where {$_.status -match "running"}

Note that these queries are not case sensitive: they do not distinguish between upper- and lowercase. Both calls would find services whose status is shown as running or Running. If you need an explicitly case-sensitive comparison, use the -cmatch operator. To make it clear to anyone reading your script that you deliberately ignore case, use the -imatch operator, which works in the same way as -match.
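A short sketch of the three match variants follows. It uses a hypothetical $status string in place of a real service object, so it runs without Get-Service. -creplace, the case-sensitive counterpart of -replace, is included for comparison.

```powershell
# Hypothetical status string, standing in for a service's Status value
$status = 'Running'

$default     = $status -match  'running'   # True:  -match ignores case by default
$insensitive = $status -imatch 'running'   # True:  identical behavior, intent made explicit
$sensitive   = $status -cmatch 'running'   # False: -cmatch distinguishes upper- and lowercase

# -replace is case-insensitive as well; -creplace is its case-sensitive counterpart
$replaced    = 'The book is good' -replace  'the book', 'The ITA book'
$notReplaced = 'The book is good' -creplace 'the book', 'The ITA book'

"$default $insensitive $sensitive"   # True True False
$replaced                            # The ITA book is good
$notReplaced                         # The book is good (lowercase pattern did not match)
```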

The two Get-Service calls display all services that are active (running) on the system, which can initially confuse PowerShell beginners. However, -like only understands simple wildcards. The asterisk (*) stands for any number of characters, the question mark (?) for exactly one character, and square brackets ([ ]) for one character from a set or range. It does not interpret regex metacharacters. Comparisons can therefore be made far more precisely by using -match together with metacharacters. Metacharacters are also largely responsible for the bad reputation of regular expressions, because they make your command line look like hieroglyphics.
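The following sketch contrasts the two operators on a plain array of strings rather than live services. The name Spool2 is made up purely for illustration. When the left-hand operand is an array, both -like and -match return the matching elements instead of True or False.

```powershell
$names = 'Spooler', 'Spool2', 'WinRM', 'wuauserv'

# -like knows only wildcards: * (any run), ? (exactly one character), [ ] (one from a set)
$likeStar = $names -like 'Spool*'    # Spooler, Spool2
$likeOne  = $names -like 'Spool?'    # Spool2: exactly one character after "Spool"
$likeSet  = $names -like 'w[iu]*'    # WinRM, wuauserv

# -match takes a full regular expression and searches anywhere in the string
$matchDigit = $names -match '\d$'      # Spool2: ends with a digit
$matchAlt   = $names -match 'rm|serv'  # WinRM, wuauserv: alternation with |

# A regex handed to -like is treated as literal text, so nothing matches
$likeRegex = $names -like '\d$'
```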

Patterns and Metacharacters

Regular expressions are patterns (character strings) that describe data. Such an expression always represents a certain type of data in the search pattern and often includes metacharacters. Some of the most important of these characters used in PowerShell scripts are:

. ^ $ [ ] { } * ? + \

This list is not exhaustive and reflects only a small selection of the metacharacters available in PowerShell. In many cases, you want to determine whether a string that represents a file name starts with a specific letter or has a particular extension. Three metacharacters known as quantifiers are useful here: the asterisk *, the plus sign +, and the question mark ?. Each one applies to the character directly in front of it. The asterisk means that this character occurs any number of times or not at all, so the expression is true even if the character you are looking for is not in the string. In contrast, the plus sign requires the character to occur at least once. Finally, the question mark means that the character occurs once or not at all. Thus, the call

> 'something.txt' -match 'i*'

returns the value True, but not because an i was found. The pattern i* means "zero or more occurrences of i," and the empty string at the very beginning of any string already satisfies that condition. With the * metacharacter alone, it therefore does not matter where, or even whether, the i occurs. The following call also returns True:

> 'Thatistheone.txt' -match 'i*'
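The difference between the three quantifiers is easiest to see in $Matches[0], which holds the text that a successful -match actually consumed. This minimal sketch uses the illustrative variable names $name and $results and the letter z, which does not occur in the string.

```powershell
$name = 'something.txt'

# For each pattern, record whether -match succeeded and what $Matches[0] captured
$results = foreach ($pattern in 'i*', 'i+', 'i?', 'z*', 'z+') {
    if ($name -match $pattern) {
        [pscustomobject]@{ Pattern = $pattern; IsMatch = $true;  Matched = $Matches[0] }
    }
    else {
        [pscustomobject]@{ Pattern = $pattern; IsMatch = $false; Matched = $null }
    }
}

$results | Format-Table -AutoSize
```

Only i+ actually captures an i. The patterns i*, i?, and even z* succeed with an empty match at position 0, while z+ fails because it insists on at least one z.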

It would make more sense to check whether the letter i occurs at the start of the string. To do so, use the ^ metacharacter (circumflex accent or hat). After calling

> 'itssomething.txt' -match '^i'

the shell returns True, whereas the call

> 'Thatistheone.txt' -match '^i'

returns the value False. If you are looking for a character at the end of the string, use the dollar sign $, which goes after that character in the pattern. Thus,

> 'something.txt' -match 't$'

returns the value True. After a successful -match, PowerShell automatically creates the $matches variable and stores the matched text in it as a hash table. Enter

> 'something.txt' -match 't$'
True
> $matches

Name                           Value
----                           -----
0                              t

to check the $matches variable.
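A minimal sketch of the file name checks mentioned earlier follows. It tests for a starting letter and for a particular extension, using a hypothetical list of file names. Because the left-hand operand is an array, -match returns the matching names and does not fill $matches. The dot must be escaped, because an unescaped . matches any character.

```powershell
$files = 'report.txt', 'itinerary.TXT', 'notes.txt.bak', 'notes_txt'

# ^ anchors the pattern to the start of the string
$startsWithI = $files -match '^i'          # itinerary.TXT

# $ anchors it to the end; \. matches a literal dot
$txtFiles = $files -match '\.txt$'         # report.txt, itinerary.TXT

# Without the backslash, the _ in notes_txt satisfies the dot as well
$looseDot = $files -match '.txt$'          # report.txt, itinerary.TXT, notes_txt

# [regex]::Escape() inserts the backslashes when the text comes from a variable
$extension = '.txt'
$escaped   = $files -match ([regex]::Escape($extension) + '$')

# -cmatch additionally rejects the uppercase .TXT
$exactCase = $files -cmatch '\.txt$'       # report.txt
```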

Letters and Numbers

PowerShell's regular expressions are processed by the .NET regular expression engine, so they use the same character classes as Microsoft .NET. If you want to use the -match operator to determine whether a string contains, for example, a word character (a letter, a digit, or the underscore), use

> $Teststring='Programming'
> $Teststring -match "\w"

which returns True. It is important that you type a lowercase w after the backslash (\). The backslash is the escape character: it tells the regex engine that w is not the literal letter w but the name of a character class. Unlike PowerShell's otherwise case-insensitive behavior, case matters here, because \w and \W mean opposite things. If you use the call

> $Teststring -match "\W"

PowerShell checks for non-word characters, so the expression would be True if the string contained, for example, a space or a punctuation mark. Because 'Programming' consists only of letters, no position in the string matches, and False is returned. If you read $matches at this point, it still shows the character P. A failed -match does not reset the variable, so it keeps the result of the previous successful \w comparison. The regex engine scans the string from left to right and stops as soon as it finds the first position where the pattern matches. The same applies when you search for digits with \d (Figure 1):

> $Teststring='Programm456ing'
> $Teststring -match "\d"
True
> $matches

Name                           Value
----                           -----
0                              4

The comparison stops after the first digit it finds, so $matches contains only 4. Because this is not what you want in many cases, you can change the behavior with a quantifier: the plus sign makes \d match one or more consecutive digits.

> $Teststring -match "\d+"
True
> $matches

Name                           Value
----                           -----
0                              456
Figure 1: When searching for digits with \d, the shell stops at the first match. If + is added, the entire run of consecutive digits (456) is found.
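The following sketch compares several character classes on a few sample strings. Strings other than Programming and Programm456ing are made up for illustration. It shows that \w also accepts digits and the underscore. It also shows that a failed -match leaves $Matches untouched.

```powershell
$samples = 'Programming', 'Programm456ing', 'Program_2', 'Pro gramming!'

$classes = foreach ($s in $samples) {
    [pscustomobject]@{
        Text        = $s
        HasWordChar = $s -match '\w'   # letter, digit, or underscore
        HasNonWord  = $s -match '\W'   # any other character, e.g. a space or "!"
        HasDigit    = $s -match '\d'
        HasSpace    = $s -match '\s'
    }
}
$classes | Format-Table -AutoSize

# A failed -match does not reset $Matches
$null  = 'Programming' -match '\w'   # succeeds, $Matches[0] is 'P'
$null  = 'Programming' -match '\W'   # fails, $Matches keeps its old content
$stale = $Matches[0]                 # still 'P' from the previous call
```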

Curly brackets improve precision further by specifying how many times the preceding element must occur. The basic syntax is {minimum,maximum}; leaving out the maximum ({n,}) means "at least n." If there is just one number between the curly brackets, PowerShell looks for exactly that number of occurrences,

> $Teststring -match "\d{2}"
True
> $matches

Name                           Value
----                           -----
0                              45

whereas calling

> $Teststring -match "\d{2,3}"

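returns True if the string contains at least two consecutive digits; the match itself takes at most three of them (for $Teststring, $matches shows 456). The pattern does not limit how many digits the string contains overall. To require that the entire string consist of two or three digits, anchor the pattern with ^ and $, as in ^\d{2,3}$.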
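The brace quantifiers can be compared side by side in a short sketch. It reuses the article's test string 'Programm456ing'; the $quantified, $longRun, and $anchored variable names are illustrative. The sketch also demonstrates that {2,3} limits the length of the match, not the number of digits in the string.

```powershell
$text = 'Programm456ing'

# Record what each pattern matched, or $null if it did not match at all
$quantified = [ordered]@{}
foreach ($pattern in '\d{2}', '\d{2,}', '\d{2,3}', '\d{4}', 'm{2}') {
    $quantified[$pattern] = if ($text -match $pattern) { $Matches[0] } else { $null }
}
$quantified    # values: 45, 456, 456, $null, mm

# {2,3} caps the match length only; anchors are needed to cap the whole string
$longRun   = '12345' -match '\d{2,3}'     # True
$longMatch = $Matches[0]                  # '123'
$anchored  = '12345' -match '^\d{2,3}$'   # False: the string has five digits
```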
