103.7 Using Regular Expressions
Candidates should be able to manipulate files and text data using regular expressions. This objective includes creating simple regular expressions containing several notational elements. It also includes using regular expression tools to perform searches through a filesystem or file content.
Key Knowledge Areas
- Create simple regular expressions containing several notational elements.
- Use regular expression tools to perform searches through a filesystem or file content.
To find one or more words in a text, use grep, fgrep or egrep. The search patterns are combinations of literal and special characters called regular expressions. Regular expressions are also recognised by many other applications, such as sed and vi.
Traditional Regular Expressions (regex)
A regular expression is a sequence of characters (or atoms) used to match a pattern. Characters are either constants (treated literally) or metacharacters.
|\<KEY||Words beginning with ‘KEY’|
|WORD\>||Words ending with ‘WORD’|
|^||Beginning of a line|
|$||End of a line|
|[ Range ]||Any single character from the enclosed range or set, e.g. [a-z]|
|[^c]||Any single character other than 'c'|
|\[||Interpret the character '[' literally|
|"ca*t"||Strings containing 'c' followed by zero or more 'a' followed by 't'|
|"."||Match any single character|
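The table entries can be tried out with grep on a small file. The following is a minimal sketch: the sample words and the variable names are made up for illustration, and the word anchors \< and \> are assumed to be available (they are in GNU grep).

```shell
# Hypothetical sample data, one word per line so each match is easy to see.
sample=$(mktemp)
printf '%s\n' 'ct' 'cat' 'caat' 'cot' 'keyboard' 'monkey' '[draft]' > "$sample"

star=$(grep 'ca*t' "$sample")           # 'c', zero or more 'a', 't': ct, cat, caat
dot=$(grep 'c.t' "$sample")             # 'c', any one character, 't': cat, cot
anchored=$(grep '^c[ao]t$' "$sample")   # whole line is 'cat' or 'cot'
not_c=$(grep '^[^c]' "$sample")         # first character is not 'c'
begins=$(grep '\<key' "$sample")        # a word beginning with 'key': keyboard
ends=$(grep 'key\>' "$sample")          # a word ending with 'key': monkey
bracket=$(grep '\[' "$sample")          # a literal '[': [draft]

printf '%s\n' "$star" "$dot" "$anchored" "$not_c" "$begins" "$ends" "$bracket"
rm -f "$sample"
```

Single quotes around each pattern keep the shell from interpreting characters such as *, $ and \ before grep sees them.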
The main extended regular expression (eregex) operators are +, ?, () and |.
|"A1|A2|A3"||Strings containing ‘A1’ or ‘A2’ or ‘A3’|
|"ca+t"||Strings containing 'c' followed by one or more 'a' followed by 't'|
|"ca?t"||Strings containing 'c' followed by no 'a' or exactly one 'a' followed by 't'|
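The extended operators can be checked the same way. This sketch uses grep -E, which accepts the same extended syntax as egrep; the sample words and variable names are hypothetical.

```shell
# Hypothetical sample data.
sample=$(mktemp)
printf '%s\n' 'ct' 'cat' 'caat' 'dog' 'bird' > "$sample"

plus=$(grep -E 'ca+t' "$sample")          # one or more 'a': cat, caat
optional=$(grep -E 'ca?t' "$sample")      # zero or one 'a': ct, cat
either=$(grep -E 'dog|bird' "$sample")    # alternation: dog, bird
grouped=$(grep -E '^(ca|do)' "$sample")   # grouping with (): cat, caat, dog

printf '%s\n' "$plus" "$optional" "$either" "$grouped"
rm -f "$sample"
```

Without -E, grep treats +, ?, | and () in these patterns as ordinary characters, so the same commands would find nothing in this file.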
The grep family
The grep utility supports traditional regular expressions (regex) such as those listed in Table 1.
Working with basic grep
Syntax for grep:
grep PATTERN FILE
Options for grep include:
|-c||count the number of lines matching PATTERN|
|-f||obtain PATTERN from a file|
|-i||ignore case sensitivity|
|-n||Include the line number of matching lines|
|-v||output all lines except those containing PATTERN|
|-w||Select lines only if the pattern matches a whole word.|
For example, list all non-blank lines in /etc/lilo.conf:
|$ grep -v '^$' /etc/lilo.conf|
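The options listed above can be compared on a small file. This is a sketch only: the file contents are invented for illustration and merely resemble a boot configuration.

```shell
# Hypothetical sample file; line 3 is deliberately blank.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# sample boot configuration
boot=/dev/hda

image=/boot/vmlinuz
label=Linux
images are not words
EOF

count=$(grep -c 'image' "$conf")       # lines containing 'image': 2
whole=$(grep -w 'image' "$conf")       # 'image' as a whole word: line 4 only
numbered=$(grep -in 'linux' "$conf")   # ignore case, show line number: 5:label=Linux
nonblank=$(grep -vc '^$' "$conf")      # count lines that are NOT empty: 5

printf '%s\n' "$count" "$whole" "$numbered" "$nonblank"
rm -f "$conf"
```

Options can be combined, as in -in and -vc here; -v inverts the match before -c counts the lines.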
The egrep tool supports extended regular expressions (eregex) such as those listed in Table 2.
The egrep utility is equivalent to grep -E. It can also search for several alternative keywords given on the command line, separated by the vertical bar character.
|$ egrep 'linux|^image' /etc/lilo.conf|
fgrep is usually read as "fast grep" (or "fixed-string grep") and is equivalent to grep -F. It interprets the pattern literally, so regex and eregex metacharacters have no special meaning.
|$ fgrep 'cat*' FILE|
will only match lines containing the literal string 'cat*'. The main advantage of fgrep is its ability to search for a list of keywords entered one per line in a file, say LIST. The syntax is:
|$ fgrep -f LIST FILE|
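The difference between literal and regex matching shows up when the same keyword list is used both ways. In this sketch the contents of LIST and FILE are made up, and grep -F is used as the equivalent of fgrep.

```shell
# Hypothetical keyword list and input file in a scratch directory.
work=$(mktemp -d)
printf '%s\n' 'cat*' 'a.b' > "$work/LIST"
printf '%s\n' 'cat' 'caaat' 'cat* is literal' 'a.b' 'axb' > "$work/FILE"

# Fixed strings: only lines containing the exact text 'cat*' or 'a.b'.
fixed=$(grep -F -f "$work/LIST" "$work/FILE")

# Same list read as regular expressions: 'cat*' is 'ca' plus zero or
# more 't', and '.' matches any character, so every line matches.
regex=$(grep -f "$work/LIST" "$work/FILE")

printf 'fixed:\n%s\nregex:\n%s\n' "$fixed" "$regex"
rm -rf "$work"
```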
The Stream Editor - sed
sed performs automatic, non-interactive editing of files. It is often used in scripts to search and replace patterns in text. It supports most regular expressions.
Syntax for sed:
sed [options] 'command' [INPUTFILE]
The input file is optional since sed also works on file redirections and pipes. Here are a few examples assuming we are working on a file called MODIF.
Delete all commented lines:
|$ sed '/^#/ d ' MODIF|
Notice that the search pattern is placed between the two slashes.
Substitute /dev/hda1 by /dev/sdb3:
|$ sed 's/\/dev\/hda1/\/dev\/sdb3/g' MODIF|
The s in the command stands for ‘substitute’. The g stands for ‘globally’ and makes the substitution apply to every match on a line rather than only the first. You can also restrict the lines on which a substitution takes place, either by line number or by a regular expression match.
If the line contains the keyword KEY then substitute ‘:’ with ‘;’ globally:
|$ sed '/KEY/ s/:/;/g' MODIF|
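The sed examples above can be run against a scratch copy of MODIF. This sketch assumes invented file contents. It also uses | as the delimiter of the s command, which sed permits and which avoids escaping every slash in a path.

```shell
# Hypothetical MODIF contents.
modif=$(mktemp)
cat > "$modif" <<'EOF'
# mount table
/dev/hda1 /boot ext3
KEY a:b:c
other a:b:c
EOF

no_comments=$(sed '/^#/d' "$modif")                # drop lines starting with '#'
swapped=$(sed 's|/dev/hda1|/dev/sdb3|g' "$modif")  # same substitution, '|' as delimiter
keyed=$(sed '/KEY/ s/:/;/g' "$modif")              # substitute only on lines matching KEY
second_only=$(sed '2 s/ext3/ext4/' "$modif")       # substitute only on line 2

printf '%s\n' "$no_comments" "$swapped" "$keyed" "$second_only"
rm -f "$modif"
```

sed writes the edited text to standard output and leaves MODIF itself unchanged, which is why each result is captured in a variable here.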
More Advanced sed
You can issue several commands on the command line, each preceded by -e. For example, (1) delete all blank lines, then (2) substitute ‘OLD’ by ‘NEW’ in the file MODIF:
|$ sed -e '/^$/ d' -e 's/OLD/NEW/g' MODIF|
These commands can also be written to a file, say COMMANDS. Then each line is interpreted as a new command to execute (no quotes are needed).
The syntax to use this COMMANDS file is:
sed -f COMMANDS MODIF
This is much more compact than a very long command line!
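As a sketch of the two approaches side by side, the following builds a hypothetical MODIF and COMMANDS file in a scratch directory and checks that both invocations give the same result.

```shell
# Hypothetical input; the second line is blank.
work=$(mktemp -d)
printf '%s\n' 'OLD first' '' 'keep' 'OLD and OLD' > "$work/MODIF"

# One sed command per line, written without shell quotes.
cat > "$work/COMMANDS" <<'EOF'
/^$/d
s/OLD/NEW/g
EOF

from_file=$(sed -f "$work/COMMANDS" "$work/MODIF")
from_cli=$(sed -e '/^$/d' -e 's/OLD/NEW/g' "$work/MODIF")

printf '%s\n' "$from_file"
rm -rf "$work"
```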
|-e||Execute the following command|
|-f||Read commands from a file|
|-n||Suppress automatic printing; lines are output only when a command requests it|
|d||Delete an entire line|
|r||Read a file and append its contents to the output|
|w||Write the selected lines to a file|
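The -n option and the r and w commands are easiest to understand by running them. The following is a minimal sketch with invented file names and contents. It also uses the p (print) command, which is standard in sed but is not listed in the table above.

```shell
# Hypothetical files in a scratch directory.
work=$(mktemp -d)
printf '%s\n' 'alpha' 'KEY one' 'beta' 'KEY two' > "$work/MODIF"
printf '%s\n' '--- footer ---' > "$work/FOOTER"

# -n turns off the default output; p prints only the lines matching KEY.
printed=$(sed -n '/KEY/p' "$work/MODIF")

# w writes the lines matching KEY to the file named after it.
sed -n "/KEY/w $work/KEYS" "$work/MODIF"
keys=$(cat "$work/KEYS")

# r reads FOOTER and appends it after the last line ($ is the last-line address).
appended=$(sed "\$ r $work/FOOTER" "$work/MODIF")

printf '%s\n' "$printed" "$appended"
rm -rf "$work"
```

The file name given to w or r runs to the end of the command, so it is safest to make it the last thing in the sed expression.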
Used files, terms and utilities: