Coming to grips with grep

Patterns

That's No Moon

A slightly more complex application of the stellar grep uses a Bash alias to ignore frequent offenders in Apache logfiles. When an exclude list is lengthy or a logfile is continuously changing as information accumulates, it can be tricky to exclude certain phrases or keywords accurately. With this method, however, I can very effectively do just that. As you might have guessed, at the heart of such a command is a new switch.

Standard input/output (stdio) buffers output in blocks when it is written to a pipe rather than a terminal, which can delay, or appear to swallow, the input of commands piped further down the command-line chain; however, the --line-buffered switch makes grep flush its output after every line:

alias cleanlog='cat /var/log/apache2/access.log | \
  grep -v --line-buffered -f /home/chris/.apache_exceptions | \
  less +G -n | cut -d' ' -f2 | sort | uniq'

This Bash alias looks at the exceptions in the file .apache_exceptions in my home directory. The -v switch ignores any occurrence of these exceptions in the access.log logfile before pushing the surviving log entries onward to the formatting commands that finish the command line.
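
To see the exclusion step in isolation, here is a minimal sketch. The log lines, the exception strings, and the temporary file names are all made up for illustration; only the -v and -f switches come from the alias above.

```shell
#!/bin/sh
# Hypothetical sample data; the real alias reads access.log and
# ~/.apache_exceptions instead of these temporary files.
tmpdir=$(mktemp -d)

cat > "$tmpdir/access.log" <<'EOF'
203.0.113.5 - - "GET /index.html" 200
66.249.66.1 - - "GET /robots.txt" 200 Googlebot
198.51.100.7 - - "GET /about.html" 200
192.0.2.10 - - "GET /admin" 200 staff-laptop
EOF

# One pattern per line; a log line matching any of them is dropped.
cat > "$tmpdir/exceptions" <<'EOF'
Googlebot
staff-laptop
EOF

# -v inverts the match, -f reads the patterns from a file.
survivors=$(grep -v -f "$tmpdir/exceptions" "$tmpdir/access.log")
printf '%s\n' "$survivors"

rm -rf "$tmpdir"
```

One thing to watch: an empty line in the pattern file matches every line, so with -v it would silently discard the whole log.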

With the output provided by running the cleanlog alias, I can tell quickly which new IP addresses have hit the web server, excluding search engines and staff. Running the alias on its own produces a list of unique, raw IP addresses, but by adding wc -l to the command

# cleanlog | wc -l

the output gives me a count of unique IP addresses in the log. To experiment, you can try the cleanlog alias directly on the command line incorporating tail -f.
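
The counting step can be sketched as follows, with a handful of invented addresses standing in for the alias output. Note that wc without options prints line, word, and byte counts; wc -l narrows that to the line count, which is the number of unique addresses here.

```shell
#!/bin/sh
# Hypothetical addresses standing in for the output of cut in the alias.
ips=$(printf '%s\n' 203.0.113.5 198.51.100.7 203.0.113.5 192.0.2.44 198.51.100.7)

# sort groups duplicates together, uniq collapses them, wc -l counts lines.
# tr strips the padding that some wc implementations add.
unique_count=$(printf '%s\n' "$ips" | sort | uniq | wc -l | tr -d ' ')
echo "unique addresses: $unique_count"
```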

The coreutils utility stdbuf can apparently achieve a similar result:

tail -f /var/log/foo | stdbuf -o0 grep

The -o0 switch disables buffering of the standard output stream.
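
The two approaches can be compared side by side in a small sketch. It assumes GNU grep and GNU coreutils; the gen function and its sample lines are hypothetical stand-ins for tail -f on a real logfile.

```shell
#!/bin/sh
# Stand-in for "tail -f logfile": emits a few made-up request lines.
gen() { printf 'GET /a\nPOST /b\nGET /c\n'; }

# grep flushes after every matching line because of its own switch.
with_switch=$(gen | grep --line-buffered GET | cut -d' ' -f2)

# stdbuf -o0 runs grep with an unbuffered standard output stream.
with_stdbuf=$(gen | stdbuf -o0 grep GET | cut -d' ' -f2)

printf '%s\n' "$with_switch"
printf '%s\n' "$with_stdbuf"
```

With finite input like this, both pipelines print the same lines. The difference only becomes visible on a growing file, where matches would otherwise sit in grep's buffer until it fills.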

Next in Line

Once upon a time, a group of arcane grep utilities existed that were then forged into one single, powerful tool. That is to say, a number of grep derivatives formerly came into being on a variety of operating systems, such as Solaris and other more archaic Unix-like OSs.

The utilities egrep, fgrep, and rgrep are now, for all intents and purposes, the equivalents of grep -E, grep -F, and grep -r. There's also pgrep, which I'll look at in a second.

In this section, I'll very briefly touch on each of these extensions to the standard grep, leaving rgrep aside because I've looked at grep -r already.

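The versatile fgrep extension (or grep -F) treats its patterns as fixed strings rather than regular expressions. Combined with the -f switch, it reads a text file containing a list of patterns, each on a new line, and then searches for these fixed strings within another file.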

Note that the first letter in the word "fixed" recalls the fgrep command name, although some say it means fast grep; because fgrep ignores regular expressions and instead takes everything literally, it is faster than grep. An example pattern file could look like this:

patternR2
patternD2
patternC3
patternP0

A very large logfile might also contain lots of similar "pattern" entries. Then, to find the lines in the logfile that contain exactly the strings listed in the pattern file, you would use:

# fgrep -f patternfile.txt hugelogfile.log
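
As a rough illustration of what "fixed strings" means in practice, the sketch below uses grep -F (the modern spelling of fgrep) on invented file contents. The a.b lines are an added assumption to show the contrast with a regular expression, where the dot matches any character.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Two of the patterns from the example pattern file.
printf '%s\n' 'patternR2' 'patternD2' > "$tmpdir/patternfile.txt"

# Hypothetical logfile contents.
cat > "$tmpdir/hugelogfile.log" <<'EOF'
patternR2 unit powered on
request served by patternD2 backend
patternXR2 is not in the list
a.b literal dot
axb has no literal dot
EOF

# Lines containing any listed string, anywhere on the line.
fixed=$(grep -F -f "$tmpdir/patternfile.txt" "$tmpdir/hugelogfile.log")

# With -F the dot is just a dot; without it, the dot matches any character.
literal_count=$(grep -c -F 'a.b' "$tmpdir/hugelogfile.log")
regex_count=$(grep -c 'a.b' "$tmpdir/hugelogfile.log")

printf '%s\n' "$fixed"
echo "literal: $literal_count, regex: $regex_count"

rm -rf "$tmpdir"
```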

Whereas grep stands for Globally search a Regular Expression and Print, its counterpart egrep prefixes the word "Extended." The egrep command (or grep -E) enables extended regular expression syntax, in which operators such as +, ?, |, and () work without backslashes, and is a little bulkier to run than grep.
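
A small sketch of the difference, using invented log lines. The backslashed alternation in the second command is a GNU grep extension to basic regular expressions, so treat that line as an assumption about your grep.

```shell
#!/bin/sh
# Hypothetical input lines.
sample=$(printf '%s\n' 'error: disk full' 'warning: disk at 90%' 'info: all good')

# Extended syntax: grouping and alternation need no backslashes.
ere=$(printf '%s\n' "$sample" | grep -E '^(error|warning):')

# Basic syntax in GNU grep: the same operators must be escaped.
bre=$(printf '%s\n' "$sample" | grep '^\(error\|warning\):')

printf '%s\n' "$ere"
```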

The pgrep command, on the other hand, is a slightly different animal and originally came from Solaris 7; its name prefixes "grep" with a "p" for process ID.

As you can probably guess, pgrep doesn't concern itself with such mundane activities as the filesystem's contents; instead, it checks against the process table. Selection criteria are combined with AND: For example, to ask only for processes that are owned by the user root AND whose name matches the pattern chris, you would enter:

# pgrep -u root chris

Subtly adding a comma outputs the processes that are owned by one user OR the other:

# pgrep -u root,chris

If you add -f, then pgrep checks against the full command line as opposed to just the process name. Again, you can invert pattern matching by introducing -v so that only those processes without the pattern are output:

# pgrep -v daemon

Alternatively, to nail down a match precisely, you can force pgrep to match a pattern verbatim:

# pgrep -x -u root,chris daemon

In other words, x means exact in this case.
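
The -x and -f switches can be tried safely against a throwaway process. In this sketch, a background sleep is a hypothetical stand-in for a real daemon, and -P $$ is added only to limit matches to children of the script itself so that other processes on the system don't interfere.

```shell
#!/bin/sh
# Skip quietly where pgrep (procps) is not installed.
command -v pgrep >/dev/null 2>&1 || { echo "pgrep not available"; exit 0; }

secs=4242            # arbitrary value that makes the command line distinctive
sleep "$secs" &      # stand-in for a real daemon
pid=$!
sleep 1              # give the background process a moment to start

by_name=$(pgrep -P $$ -x sleep)              # exact process name
by_cmdline=$(pgrep -P $$ -f "sleep $secs")   # match on the full command line
partial=$(pgrep -P $$ -x slee) || true       # -x rejects a partial name

echo "name: $by_name, command line: $by_cmdline, partial: '$partial'"

kill "$pid"
wait "$pid" 2>/dev/null || true
```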

A command that until recently I hadn't realized was part of the pgrep family is the fantastic pkill. If you find it tedious to look up a process number before applying the scary

kill -9 12345

command, then why bother with PIDs at all? The affable pkill lets you do this with all matching process names:

# pkill httpd

Obviously, you will want to use this with some care, especially on remote servers or as the root user.
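
One cautious habit is to preview the matches with pgrep -l before handing the same criteria to pkill. The sketch below assumes a throwaway sleep process as the target and again uses -P $$ purely to confine the demonstration to the script's own children. By default, pkill sends SIGTERM rather than the harsher -9.

```shell
#!/bin/sh
# Skip quietly where pkill (procps) is not installed.
command -v pkill >/dev/null 2>&1 || { echo "pkill not available"; exit 0; }

sleep 300 &          # hypothetical victim process
pid=$!
sleep 1              # give it a moment to start

# Preview: -l lists PID and name of everything pkill would signal.
preview=$(pgrep -l -P $$ -x sleep)
echo "would signal: $preview"

# Same criteria, now for real (SIGTERM by default).
pkill -P $$ -x sleep

# Collect the exit status; 128 plus the signal number means killed by signal.
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status: $status"
```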

By combining a couple of the pgrep tricks, the following example offers a nice way of retrieving detailed information regarding the sshd daemon:

# ps -p $(pgrep -d, -x sshd)
PID    TTY    TIME     CMD
1905   ?      00:00:00 sshd
16863  ?      00:00:00 sshd
16869  ?      00:00:00 sshd

Just for fun, I'll check the difference that adding the -f (full-format) flag to ps makes to the output (Listing 2).

Listing 2

ps -f with pgrep

# ps -fp $(pgrep -d, -x sshd)
UID       PID   PPID    C  STIME  TTY     TIME       CMD
root     1905      1    0  Jul03    ?     00:00:00   /usr/sbin/sshd
root    16863   1905    0  13:11    ?     00:00:00   sshd: chris [priv]
chris   16869  16863    0  13:11    ?     00:00:00   sshd: chris@pts/0
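
If no process matches, pgrep prints nothing and ps complains about an empty list, so scripts may want to check first. The following is only a sketch: two background sleep processes are hypothetical stand-ins for sshd, and -P $$ narrows the match to this script's children for the sake of the demonstration.

```shell
#!/bin/sh
# Skip quietly where pgrep (procps) is not installed.
command -v pgrep >/dev/null 2>&1 || { echo "pgrep not available"; exit 0; }

sleep 300 &          # two stand-ins for sshd processes
first=$!
sleep 300 &
second=$!
sleep 1              # give both a moment to start

# -d, joins the PIDs with commas, the list format that ps -p expects.
# pgrep exits non-zero when nothing matches, so ps only runs on a match.
if pids=$(pgrep -d, -P $$ -x sleep); then
    report=$(ps -fp "$pids")
    printf '%s\n' "$report"
else
    report=""
    echo "no matching processes" >&2
fi

kill "$first" "$second"
wait 2>/dev/null || true
```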

Too many scenarios are possible to give pgrep its full due here, but combining it with any of the popular utilities (awk, sed, tr, cut, etc.) makes pgrep an efficient tool for scripts and the command line.

Elixir of Grep

As sys admins, we are commonly sifting through lots of data to get to the precious pieces of information buried within. However, it's not always easy, so now I want to look at another text file of sentences (Listing 3) as an example and try running some regular expressions (regex) alongside grep, starting with:

Listing 3

simplefile

A stitch in time saves nine.
An apple a day keeps the doctor away.
As you sow so shall you reap.
A nod is as good as a wink to a blind horse.
A volunteer is worth twenty pressed men.
A watched pot never boils.
All that glitters is not gold.
A bird in the hand is worth two in the bush.

# grep "^[[:upper:]]" simplefile

The output of this command shows the lines that begin with an uppercase character, with the matching character highlighted (Figure 8). Those readers who are familiar with regular expressions know that, in the C locale, the [[:upper:]] character class is the same as [A-Z].

Figure 8: A grep pattern match using regular expressions highlights the uppercase character at the start of a line.

Similarly, to look for an uppercase followed by a lowercase character at the start of the line, entering

# grep "^[A-Z][a-z]" simplefile

outputs the results shown in Figure 9. Many other regex potions suit the powerful grep command. The resources online are extensive, so I'll move on to yet more derivatives.

Figure 9: The output of a pattern match using regular expressions of an uppercase followed by a lowercase character at the start of a line.
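
For readers without the figures to hand, the two expressions can be reproduced with the sketch below, which recreates simplefile in a temporary directory. Setting LC_ALL=C is an added precaution so that the [A-Z] range means plain ASCII uppercase; -c and -o are extra switches used here to count matching lines and to print only the matched text.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

cat > "$tmpdir/simplefile" <<'EOF'
A stitch in time saves nine.
An apple a day keeps the doctor away.
As you sow so shall you reap.
A nod is as good as a wink to a blind horse.
A volunteer is worth twenty pressed men.
A watched pot never boils.
All that glitters is not gold.
A bird in the hand is worth two in the bush.
EOF

# Count the lines that start with an uppercase character.
upper_start=$(LC_ALL=C grep -c '^[[:upper:]]' "$tmpdir/simplefile")

# Print only the matched text: an uppercase followed by a lowercase character.
matched=$(LC_ALL=C grep -o '^[A-Z][a-z]' "$tmpdir/simplefile")

echo "lines starting with uppercase: $upper_start"
printf '%s\n' "$matched"

rm -rf "$tmpdir"
```

Lines such as "A stitch in time saves nine." drop out of the second search because the character after the initial "A" is a space, not a lowercase letter.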
