Coming to grips with grep
Patterns
That's No Moon
A slightly more complex application of the stellar grep uses a Bash alias to ignore frequent offenders in Apache logfiles. When an exclude list is lengthy or a logfile is continuously changing as information accumulates, it can be tricky to exclude certain phrases or keywords accurately. With this method, however, I can very effectively do just that. As you might have guessed, at the heart of such a command is a new switch.
Standard input/output (stdio) buffers its output in certain circumstances, so commands piped further down the command-line chain can be left waiting for input that is still sitting in a buffer; however, the --line-buffered switch makes grep flush its output after every line:
alias cleanlog='cat /var/log/apache2/access.log | \
  grep -v --line-buffered -f /home/chris/.apache_exceptions | \
  less +G -n | cut -d" " -f2 | sort | uniq'
This Bash alias reads the exceptions listed in the file .apache_exceptions in my home directory. The -v switch inverts the match, so any line in the access.log logfile that contains one of these exceptions is dropped before the surviving log entries are pushed onward to the formatting commands that finish the command line.
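The exclusion step can be tried safely on throwaway data. The following is a minimal sketch, not the author's setup: the log lines, the addresses, and the two exception patterns are invented for illustration, and the sample assumes a log format in which the virtual host comes first so that the client IP is the second space-separated field, as the cut -f2 in the alias implies.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; the real alias reads /var/log/apache2/access.log.
workdir=$(mktemp -d)
log="$workdir/access.log"
exceptions="$workdir/apache_exceptions"

# Assumed layout: virtual host first, client IP second.
cat > "$log" <<'EOF'
www.example.com 203.0.113.7 - - [03/Jul/2013:13:11:02 +0100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
www.example.com 198.51.100.20 - - [03/Jul/2013:13:11:05 +0100] "GET /robots.txt HTTP/1.1" 200 64 "-" "Googlebot/2.1"
www.example.com 203.0.113.7 - - [03/Jul/2013:13:12:40 +0100] "GET /about HTTP/1.1" 200 900 "-" "Mozilla/5.0"
www.example.com 192.0.2.15 - - [03/Jul/2013:13:13:00 +0100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
www.example.com 10.0.0.5 - - [03/Jul/2013:13:14:00 +0100] "GET /admin HTTP/1.1" 200 300 "-" "Mozilla/5.0"
EOF

# One pattern per line: a search engine and a staff address (both invented).
cat > "$exceptions" <<'EOF'
Googlebot
10\.0\.0\.5
EOF

# Drop every line that matches an exception, then reduce to unique IPs.
unique_ips=$(grep -v --line-buffered -f "$exceptions" "$log" |
    cut -d' ' -f2 | sort | uniq)
count=$(printf '%s\n' "$unique_ips" | wc -l | tr -d ' ')

printf '%s\n' "$unique_ips"     # 192.0.2.15 and 203.0.113.7
echo "unique addresses: $count" # unique addresses: 2

rm -rf "$workdir"
```

One thing to watch in a real exceptions file: an empty line is an empty pattern, which matches every line, so with -v it would silently exclude the whole log.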
With the output provided by running the cleanlog alias, I can tell quickly which new IP addresses have hit the web server, excluding search engines and staff. Just running that alias offers a list of unique, raw IP addresses, but by adding wc to the command
# cleanlog | wc
the first number in the output (the line count) gives me a count of unique IP addresses in the log. To experiment, you can try the commands from the cleanlog alias directly on the command line, incorporating tail -f in place of cat (leave off sort and uniq in that case, because both wait for the end of their input before printing anything).
The stdbuf utility from coreutils apparently can achieve a similar result, like this:
tail -f /var/log/foo | stdbuf -o0 grep
The -o0 switch makes the standard output stream of the command that follows unbuffered.
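Both approaches can be compared on a finite stream. This is only a sketch: the feed function stands in for tail -f, its three lines are invented, and the search pattern GET is a placeholder. With finite input, buffering changes when the lines arrive, not which lines arrive, so the two results are identical.

```shell
#!/usr/bin/env bash
# A finite stand-in for "tail -f": three hypothetical log lines.
feed() { printf 'GET /index.html\nPOST /login\nGET /about\n'; }

# grep flushes each matching line as soon as it is found.
line_buffered=$(feed | grep --line-buffered 'GET' | cut -d' ' -f2)

# The same pipeline with stdbuf making grep's stdout unbuffered.
# stdbuf ships with GNU coreutils, so fall back where it is missing.
if command -v stdbuf >/dev/null 2>&1; then
    unbuffered=$(feed | stdbuf -o0 grep 'GET' | cut -d' ' -f2)
else
    unbuffered=$line_buffered
fi

printf '%s\n' "$line_buffered"   # /index.html and /about
```

The difference only becomes visible on a stream that never ends, where a fully buffered grep may hold matches back until its buffer fills.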
Next in Line
Once upon a time, a group of arcane grep utilities existed that were then forged into one single, powerful tool. That is to say, formerly a number of grep derivatives came into being on a variety of operating systems such as Solaris and other more archaic Unix-like OSs.
The utilities egrep, fgrep, and rgrep are for all intents and purposes now the equivalent of grep -E, grep -F, and grep -r. There's also pgrep, which I'll look at in a second.
In this section, I'll very briefly touch on each of these extensions to the standard grep, leaving rgrep aside because I've looked at grep -r already.
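The versatile fgrep extension (or grep -F) treats every pattern as a fixed string rather than a regular expression. Combined with the -f switch, it reads a list of patterns from a text file, each on a new line, and then searches for these fixed strings within another file.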
Note that the first letter in the word "fixed" recalls the fgrep command name, although some say it means fast grep; because fgrep ignores regular expressions and instead takes everything literally, it can be faster than grep. An example pattern file could look like this:
patternR2
patternD2
patternC3
patternP0
A very large logfile might also start with lots of similar "pattern" entries. Then, to find the lines in the logfile that contain exactly one of the entries in the pattern file (anywhere in the line, not only at the beginning), you would use:
# fgrep -f patternfile.txt hugelogfile.log
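To see what "taking everything literally" means in practice, here is a small sketch on invented data. The file names mirror the ones above, but their contents are made up for illustration, and grep -F is used in place of fgrep because the two are equivalent.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Two fixed strings, one per line.
cat > "$workdir/patternfile.txt" <<'EOF'
patternR2
patternD2
EOF

# Hypothetical log content.
cat > "$workdir/hugelogfile.log" <<'EOF'
patternR2 service started
patternX9 nothing to see
retry after patternD2 timeout
price is 3.50 today
price is 3950 today
EOF

# Any line containing one of the fixed strings matches,
# wherever in the line the string appears.
fixed_matches=$(grep -F -f "$workdir/patternfile.txt" "$workdir/hugelogfile.log")
fixed_count=$(printf '%s\n' "$fixed_matches" | grep -c '')

# With -F the dot is a literal dot; without it, the dot matches any character.
literal_count=$(grep -c -F '3.50' "$workdir/hugelogfile.log")
regex_count=$(grep -c '3.50' "$workdir/hugelogfile.log")

printf '%s\n' "$fixed_matches"
echo "fixed: $fixed_count literal: $literal_count regex: $regex_count"
# fixed: 2 literal: 1 regex: 2

rm -rf "$workdir"
```

The last two counts differ because the plain grep pattern 3.50 also matches 3950.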
Whereas grep stands for Globally search a Regular Expression and Print, its counterpart egrep prefixes the word "Extended." The egrep command enables extended regular expression support, in which characters such as +, ?, |, and parentheses are operators without needing a backslash, and is a little bulkier to run than grep.
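A short sketch of the difference, using grep -E in place of egrep and four invented sample lines. It contrasts how the same pattern behaves as an extended and as a basic regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical sample input.
sample() {
    printf 'error: disk full\nwarning: low memory\ninfo: all good\nerrors: 3\n'
}

# Alternation and grouping work unescaped in extended syntax.
alternation=$(sample | grep -E '^(error|warning):')

# "s?" means an optional "s" in extended syntax ...
ere_count=$(sample | grep -c -E 'errors?:')

# ... but in basic syntax the question mark is an ordinary character,
# so the pattern looks for a literal "?" and finds nothing.
bre_count=$(sample | grep -c 'errors?:')

printf '%s\n' "$alternation"           # the error: and warning: lines
echo "ERE: $ere_count BRE: $bre_count" # ERE: 2 BRE: 0
```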
The pgrep command, on the other hand, is a slightly different animal and originally came from Solaris 7; its name prefixes "grep" with a "p" for process.
As you can probably guess, pgrep doesn't concern itself with such mundane activities as the filesystem's contents; instead, it checks against the process table. For example, to ask only for processes that are owned by the user root AND have a name matching chris, you would enter:
# pgrep -u root chris
Subtly adding a comma outputs the processes that are owned by one user OR the other:
# pgrep -u root,chris
If you add -f
, then pgrep checks against the full command line as opposed to just the process name. Again, you can invert pattern matching by introducing -v
so that only those processes without the pattern are output:
# pgrep -v daemon
Alternatively, to nail down a match precisely, you can force pgrep to match a pattern verbatim:
# pgrep -x -u root,chris daemon
In other words, -x means exact in this case.
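The switches above can be tried without touching any real service. The sketch below starts a harmless sleep 300 as a stand-in target (an arbitrary choice, not part of the original examples) and checks which switch combinations find it.

```shell
#!/usr/bin/env bash
# Start a harmless background process to search for.
sleep 300 &
pid=$!
sleep 1   # give the child time to become "sleep"

# -x: the process name must be exactly "sleep".
by_name=$(pgrep -x sleep)

# -u: restrict the search to processes owned by the current user.
by_user=$(pgrep -u "$(id -u)" -x sleep)

# -f: match against the full command line, arguments included.
by_cmdline=$(pgrep -f 'sleep 300')

# -v: invert the match; the sleep process must not be listed.
inverted=$(pgrep -v -x sleep)

for list in "$by_name" "$by_user" "$by_cmdline"; do
    printf '%s\n' "$list" | grep -qx "$pid" && echo "found $pid"
done

# Clean up the stand-in process.
kill "$pid"
wait "$pid" 2>/dev/null
```

Note that -f can also match the shell that launched the command if the pattern appears in its own command line, so patterns used with -f are best kept specific.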
A command that until recently I hadn't realized was part of the pgrep family is the fantastic pkill. If you find looking up a process number tedious before applying the scary kill -9 12345 command, then why bother with PIDs at all? The affable pkill signals every process whose name matches the pattern (it sends SIGTERM by default, not the -9 of SIGKILL):
# pkill httpd
Obviously, you will want to use this with some care, especially on remote servers or as the root user.
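One way to exercise that care is to preview the match with pgrep before sending anything, because pgrep and pkill share the same matching rules. The following sketch uses a throwaway sleep 300 as the victim and the -P switch to restrict matching to children of the current shell; both choices are illustrative assumptions rather than part of the original example.

```shell
#!/usr/bin/env bash
# A throwaway process to terminate.
sleep 300 &
pid=$!
sleep 1   # give the child time to become "sleep"

# Preview: -l lists PID and name, -P $$ limits the match to this shell's children.
preview=$(pgrep -l -P $$ -x sleep)
echo "would signal: $preview"

# Same selection, now actually sending the default SIGTERM.
pkill -P $$ -x sleep

# wait reports 128 + signal number for a process killed by a signal.
wait "$pid"
status=$?
echo "exit status: $status"   # exit status: 143
```

Narrowing the selection with switches such as -P, -u, or -x is what keeps a pkill from reaching processes it was never meant to touch.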
By combining a couple of the pgrep tricks, the following example offers a nice way of retrieving detailed information regarding the sshd processes:
# ps -p $(pgrep -d, -x sshd)
  PID TTY          TIME CMD
 1905 ?        00:00:00 sshd
16863 ?        00:00:00 sshd
16869 ?        00:00:00 sshd
Just for fun, I'll check the difference that adding the -f flag to ps makes by showing the full-format listing (Listing 2).
Listing 2
pgrep with ps -f
# ps -fp $(pgrep -d, -x sshd)
UID        PID  PPID  C STIME TTY          TIME CMD
root      1905     1  0 Jul03 ?        00:00:00 /usr/sbin/sshd
root     16863  1905  0 13:11 ?        00:00:00 sshd: chris [priv]
chris    16869 16863  0 13:11 ?        00:00:00 sshd: chris@pts/0
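When this combination goes into a script, it is worth handling the case in which pgrep finds nothing, because ps -p then receives an empty process list and exits with an error. A minimal sketch, using a background sleep 300 instead of sshd so that it runs anywhere, and the invented name nosuchproc for the no-match case:

```shell
#!/usr/bin/env bash
sleep 300 &
pid=$!
sleep 1   # give the child time to become "sleep"

# pgrep exits non-zero when nothing matches, so test it before calling ps.
if pids=$(pgrep -d, -P $$ -x sleep); then
    listing=$(ps -fp "$pids")
    printf '%s\n' "$listing"
else
    echo "no matching process" >&2
fi

# The no-match case: exit status 1 and an empty list.
none=$(pgrep -d, -x nosuchproc)
none_status=$?
echo "status for no match: $none_status"   # status for no match: 1

# Clean up the stand-in process.
kill "$pid"
wait "$pid" 2>/dev/null
```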
Simply too many scenarios are possible to give the full credit due to pgrep, but combining it with any of the popular utilities (awk, sed, tr, cut, etc.) makes pgrep an efficient script and command-line tool.
Elixir of Grep
As sys admins, we are commonly purging lots of data to get to the precious pieces of information buried within. However, it's not always easy, so now I want to look at another text file of sentences (Listing 3) as an example and try running some regular expressions (regex) alongside grep, starting with:
Listing 3
simplefile
A stitch in time saves nine.
An apple a day keeps the doctor away.
As you sow so shall you reap.
A nod is as good as a wink to a blind horse.
A volunteer is worth twenty pressed men.
A watched pot never boils.
All that glitters is not gold.
A bird in the hand is worth two in the bush.
# grep "^[[:upper:]]" simplefile
The output of this command shows the lines that begin with an uppercase character (Figure 8). Those readers who are familiar with regular expressions know that, in the C/POSIX locale, the [[:upper:]] regex is the same as [A-Z].
Similarly, to look for an uppercase followed by a lowercase character at the start of the line, entering
# grep "^[A-Z][a-z]" simplefile
outputs the results shown in Figure 9. Many other regex potions suit the powerful grep command. The resources online are extensive, so I'll move on to yet more derivatives.
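Both expressions can be checked against the proverbs from Listing 3 without the figures at hand. The sketch assumes each sentence sits on its own line in simplefile and sets LC_ALL=C so that [A-Z] and [a-z] cover only the unaccented ASCII letters.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # make the bracket ranges behave predictably

simplefile=$(mktemp)
cat > "$simplefile" <<'EOF'
A stitch in time saves nine.
An apple a day keeps the doctor away.
As you sow so shall you reap.
A nod is as good as a wink to a blind horse.
A volunteer is worth twenty pressed men.
A watched pot never boils.
All that glitters is not gold.
A bird in the hand is worth two in the bush.
EOF

# Every line starts with an uppercase letter, so all eight match.
upper_start=$(grep -c '^[[:upper:]]' "$simplefile")

# Uppercase followed directly by lowercase: "A " is followed by a space,
# so only the lines starting with An, As, and All qualify.
upper_lower=$(grep '^[A-Z][a-z]' "$simplefile")
upper_lower_count=$(printf '%s\n' "$upper_lower" | grep -c '')

echo "start with uppercase: $upper_start"   # start with uppercase: 8
printf '%s\n' "$upper_lower"

rm -f "$simplefile"
```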