Coming to grips with grep
Patterns
We are all creatures of habit in varying degrees, and I frequently find myself settling into various routines in my job as a sys admin. I have had many a moment in which I lucidly caught myself thinking: "Not this task again! I really need to speed this process up." By simply automating a procedure or ultimately deprecating and condemning it as redundant, I could save a lot of time in the long run.
When I recently spent some time away from the computer monitor, I had a chance to consider what I could use to help automate tasks and to think about the procedures that I face routinely. In the end, I concluded that clever little command lines were the way forward, and in most cases, these could translate into clever little shell scripts.
To increase my efficiency in creating command lines and scripts, I made a conscious decision to go back to basics with some of the core shell commands; that is, I wanted to improve my understanding of a few key packages so that I wasn't always looking up a parameter or switch and could therefore automate tasks more quickly.
When I typed history at the prompt, lo and behold, that old favorite grep stood out as something I use continually throughout my working day. Therefore, in this article, I will dig into some of the history of grep and attempt to help you improve your readily available knowledge, with the aim of being able to solve problems more efficiently and quickly.
Not Such a Bad Pilot
As most sys admins soon discover, one of the heroes of the Linux command line is the grep command. With grep you can rifle through just one file, all of a directory, the process table, and much more without batting an eyelid.
The stalwart that is grep comes to a command prompt near you in a few forms, but I'll come to that a little later. First, I want to get my hands dirty with a reminder of some of the basics.
To get moving in the right direction, I'll consider a text file with five lines of words (Figure 1, top). Using that file, I'll start with a simple grep example that searches for a pattern within a file:
# grep one filename
The output in the bottom of Figure 1 shows all the lines that contain the search string (i.e., "one" in this example). Then, I reverse that operation and output everything that does not have the pattern "one" (Figure 2):
# grep -v one filename
These two examples are surprisingly simple but powerful. You can see everything that does and, conversely, does not include a specified pattern. This is not rocket science, but combined with some suitably juicy command-line tools (e.g., awk, sed, and cut), you will soon find yourself boasting a formidable arsenal with a little practice.
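As a minimal sketch of the two commands above, the following creates a throwaway five-line file and runs both searches against it. The file contents are invented here for illustration (they are not the file shown in Figure 1), and the temporary directory is an assumption of the sketch.

```shell
# Hypothetical stand-in for the five-line file in Figure 1.
tmpdir=$(mktemp -d)
printf '%s\n' 'one two' 'three four' 'five six' 'seven eight' 'someone else' \
    > "$tmpdir/filename"

# Lines that contain the pattern "one" (also inside longer words).
matching=$(grep one "$tmpdir/filename")

# Lines that do not contain the pattern.
inverted=$(grep -v one "$tmpdir/filename")

printf 'match:\n%s\n\nno match:\n%s\n' "$matching" "$inverted"
rm -rf "$tmpdir"
```

Note that "someone else" counts as a hit: grep matches the pattern anywhere in a line, not just as a whole word.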
With this simple introduction, I'll look a little more closely into why grep is so highly regarded.
To count how many lines contain the pattern "one" in a file, I use:
# grep -c one filename
The output from grep is a count of the matching lines in that file, as shown in Figure 3. Note that -c counts lines rather than individual occurrences, so a line that contains the pattern twice is counted only once.
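If you need the number of occurrences rather than the number of lines, one possible approach is to print every match on its own line with -o and count those lines with wc. The sketch below assumes a grep that supports -o (GNU and BSD grep do) and uses an invented three-line file.

```shell
# Hypothetical sample file; the first line contains the pattern twice.
tmpdir=$(mktemp -d)
printf '%s\n' 'one and one' 'two' 'one more' > "$tmpdir/filename"

# -c counts matching lines.
line_count=$(grep -c one "$tmpdir/filename")

# -o prints each match on a separate line, so wc -l counts occurrences.
# tr strips the padding some wc implementations add.
match_count=$(grep -o one "$tmpdir/filename" | wc -l | tr -d ' ')

echo "lines: $line_count, occurrences: $match_count"
rm -rf "$tmpdir"
```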
Many companion tools complement grep's functionality, such as less and zgrep. If I simply compress the five-line text file using zip, then I should see the output in Figure 4 when I use less filename.zip to look inside the archive (this relies on less being configured with an input preprocessor such as lesspipe, as it is by default on Ubuntu). The excellent less command shows a number of useful details about the shrunken file, such as how much the zip file compresses the original file as a percentage.
The zgrep command can forage among the depths of compressed files and yet promptly return pattern matches. Your mileage may vary with compressed archives containing non-ASCII or binary files. If you think about it for a while, you can see the full potential of this functionality, but if you're unfamiliar with it, I hope you find it as intriguing as I did when I first came across it. To try out zgrep on the zipped text file (Figure 5), enter:
# zgrep one filename.zip
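The same idea can be tried end to end with a scratch file. This sketch uses gzip rather than zip, on the assumption that gzip and zgrep are installed together on most Linux systems; the file name and contents are made up for the example.

```shell
# Hypothetical two-line file, compressed with gzip.
tmpdir=$(mktemp -d)
printf '%s\n' 'one two' 'three four' > "$tmpdir/filename"
gzip "$tmpdir/filename"            # replaces the file with filename.gz

# zgrep decompresses on the fly and hands the text to grep.
hits=$(zgrep one "$tmpdir/filename.gz")

echo "$hits"
rm -rf "$tmpdir"
```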
Now I'll dismiss the compression format more commonly associated with Windows operating systems for a moment and doff my cap at files compressed with gzip. Many of the instruction manuals, or info pages, bundled with Unix-like operating systems are compressed to save space. For example, when you run the command,
# info grep
what you see is actually a viewer decompressing content from the file /usr/share/info/grep.info.gz on the fly, in much the same way as zgrep and less do.
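With the malleable grep command, it's possible to look through subdirectories, too. The man page lists both a lowercase -r and an uppercase -R for searching recursively. In older versions of GNU grep the two were equivalent; in current versions, -R additionally follows all symbolic links, whereas -r follows only those named on the command line. For a tree without symlinks, both return the same result: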
# grep -r one /home/chris
The screeds of output from this command might alarm some users executing it on desktops. I was surprised not only that the diligent grep hunted down hidden files (those beginning with a dot) deep in the dark depths of my subdirectories but also at how many subdirectories my Ubuntu desktop needed to operate. It was also surprising because, among the numerous tiny and previously unseen files, a great many returned a hit for the simple pattern "one." However, such unforeseen output is a learning experience that increases my knowledge of how systems work.
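To watch a recursive search reach into hidden directories without wading through a real home directory, a small scratch tree is enough. Everything in this sketch (directory names, file names, contents) is hypothetical.

```shell
# Hypothetical directory tree with one hidden subdirectory.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/docs/.hidden"
echo 'one visible'   > "$tmpdir/docs/notes.txt"
echo 'one concealed' > "$tmpdir/docs/.hidden/.secret"
echo 'no match'      > "$tmpdir/docs/other.txt"

# -r descends into every subdirectory, dot-directories included, and
# prefixes each hit with the file it came from. The sort only makes
# the order predictable for this example.
found=$(cd "$tmpdir" && grep -r one docs | LC_ALL=C sort)

echo "$found"
rm -rf "$tmpdir"
```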
Let Me Contain My Surprise
Two pieces of output from recursively checking my home directory piqued my interest. The first was a dictionary file I'd forgotten about and hadn't purged from my Trash, although it's not surprising that there might be one or two matches for the word "one" in a dictionary. The second piece of output (Listing 1) also was not expected. Clearly this history and any associated preferences need to be cached somewhere, and where better than deep within a user's home directory hidden away from prying eyes? The very last word aplicaciones shows the hit for the word "one."
Listing 1
Some Recursive Grep Output
/home/chris/.cache/software-center/piston-helper/software-center.ubuntu.\
com,api,2.0,applications,en,ubuntu,precise,\
amd64,,bbc2274d6e4a957eb7ea81cf902df9e0: <snippet> "description": "La revista para los usuarios de Ubuntu\nNúmero 5: Escritorios \
a Examen\r\n\r\n\r\nEn el interior: Artículos sobre los escritorios Unity, \
Gnome3 y KDE, tutoriales para crear tu propia distro, consejos de seguridad, \
anáslisi de aplicaciones
Next, I'll look directly at the /proc virtual filesystem (as opposed to querying the process table per se), which doesn't contain "real files," and repeat the kind of search I ran on my home directory. With two of the switches I've already talked about, the grep command in Figure 6 searches recursively and counts the lines matching "one" in each file it finds.
The next example uses the asterisk wildcard, which allows you to search every file in the current directory, and the -l switch, which returns only the filenames of those files that return a positive hit, as opposed to the full line of content that has a match:
# grep -l pattern *
Change it to uppercase -L and you'll only get those files without the pattern, much as the -v switch inverted matching in Figure 2.
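The two switches are easiest to compare side by side. In this sketch the three file names and their contents are invented, and the commands run inside a temporary directory so that the asterisk expands to just those files.

```shell
# Hypothetical files: two contain the word "pattern", one does not.
tmpdir=$(mktemp -d)
echo 'pattern here' > "$tmpdir/a.txt"
echo 'nothing'      > "$tmpdir/b.txt"
echo 'a pattern'    > "$tmpdir/c.txt"

# -l lists the files with at least one matching line.
with_pattern=$(cd "$tmpdir" && grep -l pattern *)

# -L lists the files with no matching line at all.
without_pattern=$(cd "$tmpdir" && grep -L pattern *)

printf 'with:\n%s\nwithout:\n%s\n' "$with_pattern" "$without_pattern"
rm -rf "$tmpdir"
```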
Another switch I come across frequently with regular expressions is the almost globally used -i, which conveniently tells grep to ignore case, as in:
# grep -i ERROR daemon.log
If you're diagnosing an issue or rescuing a failing system and can't remember whether the service in question writes to its log files in upper- or lowercase, then just flick the -i switch.
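A quick way to see what -i buys you is to count hits with and without it. The log lines below are fabricated purely for the demonstration and do not come from a real daemon.

```shell
# Hypothetical log with three different spellings of "error".
tmpdir=$(mktemp -d)
printf '%s\n' \
    'Jan 1 daemon: ERROR disk full' \
    'Jan 1 daemon: started' \
    'Jan 2 daemon: error: timeout' \
    'Jan 2 daemon: Error reading config' > "$tmpdir/daemon.log"

# Case-sensitive search finds only the all-caps line.
sensitive=$(grep -c ERROR "$tmpdir/daemon.log")

# With -i, all three spellings match.
insensitive=$(grep -ci ERROR "$tmpdir/daemon.log")

echo "case-sensitive: $sensitive, case-insensitive: $insensitive"
rm -rf "$tmpdir"
```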
I hope I have demonstrated that inverting and reporting pattern matches are useful when solving a problem or performing root cause analysis, and when much of a Linux server exists as pure text on a filesystem, you have endless applications to consider.
This Is Some Rescue
As you have seen, the powerful grep outputs a whole line when it matches a pattern. By adding the -o switch, you can output only the string of interest and suppress the other potentially less relevant data. This switch can be very useful when executed on a text file such as this:
# cat searchedfile
start-steven-end
start-raheem-end
start-daniel-end
start-phillipe-end
start-middle-stop
# grep -o "start-.*-end" searchedfile
This statement tells grep to match start-, followed by any run of characters (the regular expression .* means any character, repeated zero or more times), followed by -end, and to print only the matched text rather than the whole line. As you would expect, this grep statement produces the following output:
start-steven-end
start-raheem-end
start-daniel-end
start-phillipe-end
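Because -o emits only the matched text, its output is easy to feed into other tools. As one possible follow-up, assuming the entries sit one per line as listed above, cut can pull out just the middle field.

```shell
# Recreate the sample file from the text, one entry per line.
tmpdir=$(mktemp -d)
printf '%s\n' start-steven-end start-raheem-end start-daniel-end \
    start-phillipe-end start-middle-stop > "$tmpdir/searchedfile"

# Keep only the start-...-end matches, then take the second
# hyphen-separated field of each one.
names=$(grep -o 'start-.*-end' "$tmpdir/searchedfile" | cut -d- -f2)

echo "$names"
rm -rf "$tmpdir"
```

The start-middle-stop line never reaches cut, because it does not end in -end and so produces no match.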
If you add -b to the command, you can tell how many bytes into a file you will find each entry; that is, the byte offset of a pattern.
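Although some characters take up more than one byte (e.g., Chinese characters in UTF-8 occupy several bytes each), generally speaking the following examples will apply. Using the same input file:
# grep -ob raheem searchedfile
23:raheem
If I had used daniel, the output would be 40:daniel. Note how grep counts the bytes in a file: offsets start at zero, and each 16-character line plus its newline occupies 17 bytes, so the second line begins at byte 17 and raheem starts 6 bytes further in. Without -o, GNU grep reports the offset of the start of the matching line instead (17 for raheem, 34 for daniel).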
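The difference between the offset of a line and the offset of a match is worth checking for yourself. This sketch assumes GNU grep, where -b on its own reports the zero-based offset of the start of the matching line and -ob reports the offset of the matched text; the file is rebuilt with one entry per line.

```shell
# First three lines of the sample file; each is 16 characters
# plus a newline, so lines start at bytes 0, 17, and 34.
tmpdir=$(mktemp -d)
printf '%s\n' start-steven-end start-raheem-end start-daniel-end \
    > "$tmpdir/searchedfile"

# Offset of the line that contains the match.
line_offset=$(grep -b raheem "$tmpdir/searchedfile")

# Offset of the match itself: 17 plus the 6 bytes of "start-".
match_offset=$(grep -ob raheem "$tmpdir/searchedfile")

echo "$line_offset"
echo "$match_offset"
rm -rf "$tmpdir"
```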
The ever-useful -n gives grep the power to report the line number in which the offending (or pleasing) pattern appears in your file. This ability becomes very useful when you're hunting high and low for a number of patterns in a large file.
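A short illustration of -n, using a made-up four-line file: each hit is prefixed with its line number and a colon.

```shell
# Hypothetical file; the pattern appears on lines 2 and 4.
tmpdir=$(mktemp -d)
printf '%s\n' 'alpha' 'one' 'beta' 'gamma one' > "$tmpdir/filename"

# -n prefixes every matching line with its 1-based line number.
numbered=$(grep -n one "$tmpdir/filename")

echo "$numbered"
rm -rf "$tmpdir"
```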
Grep also recognizes adjacent lines with the switches -A (after) and -B (before). This capability could help you manipulate massive chunks of content without splitting up files or getting involved in a manual cut-and-paste marathon. With the use of the text file I employed earlier in this article, I'll run through an example that uses one of these simple but flexible options.
Figure 7 offers a look at what the five-line text file produces from one of the Context Line Control commands, as per the GNU grep manual [1]:
# grep "six" -B1 filename
Using the after and before options can make light work of all kinds of search applications.
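To compare the context switches directly, the sketch below runs them against a stand-in five-line file. The contents are an assumption (the real file appears only in the figures), and -C, which prints context on both sides, is a further GNU grep option not covered above.

```shell
# Hypothetical five-line file; "six" sits on line 3.
tmpdir=$(mktemp -d)
printf '%s\n' 'one two' 'three four' 'five six' 'seven eight' 'nine ten' \
    > "$tmpdir/filename"

# One line of context before the match.
before=$(grep -B1 six "$tmpdir/filename")

# One line of context after the match.
after=$(grep -A1 six "$tmpdir/filename")

# One line of context on each side.
both=$(grep -C1 six "$tmpdir/filename")

printf -- '-B1:\n%s\n-A1:\n%s\n-C1:\n%s\n' "$before" "$after" "$both"
rm -rf "$tmpdir"
```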