
Grep and sed Demystified

Difficulty: Beginner, Length: Medium

Grep. You hear it a lot. You see those cryptic IT guys typing the command, system admins mentioning it in passing, you even see it in some shell scripts. It seems like one of those things that just exists, but isn't meant for you. This article will change that - we'll explain and take a quick look at grep (and its less famous friend sed) in this newest installment of OS X Demystified.


Introduction

Grep

Grep is a command line utility for searching and filtering some kind of textual input based on parameters you feed it. In other words, it runs in the Terminal (Applications → Utilities → Terminal), and is used exclusively by typing commands. There are, of course, GUI wrappers that help out a bit, but none are as powerful or versatile as the bare bones command line usage, so that's what we'll be focusing on.

That's all nice, but what does it actually do? Does the above sound too vague? Here's an example. Say you have a block of text in a file called jungle.txt with five lines:

In order to find the line which contains the word tiger, we use grep thusly:

The result we are given is:
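As a minimal sketch, assume jungle.txt holds the five made-up lines below (they are placeholders chosen for illustration). The search then looks like this:

```shell
# Create jungle.txt in the current directory
# (this overwrites any existing file of that name).
# The five lines are hypothetical sample content.
printf '%s\n' \
  'the monkey swings from a vine' \
  'the tiger sleeps in the shade' \
  'a parrot squawks weh weh from the canopy' \
  'the river runs brown after rain' \
  'ants carry leaves across the path' > jungle.txt

# Print every line that contains the word tiger.
grep "tiger" jungle.txt
# prints: the tiger sleeps in the shade
```

Only the matching line is printed; the other four are filtered out.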

Ok, that's clear, right? Let's take a step back though.

Upgrading Grep

As it turns out, the grep that ships with the Mac is slower than GNU grep, so let's do an upgrade first. To install a faster grep, enter the following into Terminal and press Enter:

Please note that you need Homebrew installed to be able to do this, and to find out how to install Homebrew, see my previous article.

What have we accomplished by performing this upgrade? Well, many apps rely on the installed grep tool to function. For example, a wide array of geeklets will often rely on grep to fetch data from large text files or harvested websites. Thus, the greppy parts of your grep-using geeklets should now be several times faster. Additionally, you might sometimes need to grep some kind of error log (let's say you have a huge error log from an application and the app's support service asks you to send them the output of "grep port-1723"). If the log has millions of lines, you could save a lot of time using this much faster grep.

Once Homebrew has installed your new grep, run the earlier search on jungle.txt again to make sure everything works. If you haven't created the file yet, go ahead and create it first.

Sed

Sed is a stream editor. Put bluntly, it takes input, edits it, and outputs the edited content. Whether the input comes from a file or is fed to it directly in Terminal is completely irrelevant to sed - it has one highly advanced and configurable function, and performs it to the best of its ability.

So where is sed used? Editing file contents and the like, of course, but it just so happens that it works flawlessly hand-in-hand with grep. Let's see some pure sed examples first, though. Type the following into Terminal:

and press enter. The terminal says hello. Now type

and press enter. You should see "Heaveno". What just happened? See, sed works by taking two arguments. The first one is the feed, the input, and the second is a string (you can see it's a string because it's quoted) which tells it what actions to perform on the first argument. In our case that's:

  • s (substitute)
  • / (delimiter - in our case forward slash, see next paragraph for alternatives)
  • Hell (regular expression pattern to search for)
  • Heaven (replacement string)
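Put together, the two commands described above might be typed like this (a sketch; the quoting style is an assumption):

```shell
# Plain echo: prints the word unchanged.
echo "Hello"

# Same word piped through sed: the first match of the pattern Hell
# is replaced with Heaven, and the trailing "o" is left alone.
echo "Hello" | sed 's/Hell/Heaven/'
# prints: Heaveno
```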

The second list item mentions alternatives to the forward slash delimiter; they come in very handy when you have to write URLs or file paths. Take for example the path myfolder/mysubfolder/myfile. If we put this into sed in order to replace it with myotherfolder/myotherfile, the parameter would look like so: s/myfolder/mysubfolder/myfile/myotherfolder/myotherfile/ which is just a big bag of nonsense - sed cannot possibly know which of those fragments is the regexp and which is the replacement string. Therefore, we would need to escape the forward slashes in our file path with a backslash, so every forward slash in the path would turn into \/. I'm guessing you can see the problem. The new sed parameter looks like this:
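With every slash inside the pattern and the replacement escaped, the command might be written as follows (a sketch that assumes the path is fed in with echo):

```shell
# The path is piped in with echo purely for illustration.
# Only the three delimiter slashes stay unescaped.
echo "myfolder/mysubfolder/myfile" |
  sed 's/myfolder\/mysubfolder\/myfile/myotherfolder\/myotherfile/'
# prints: myotherfolder/myotherfile
```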

This barely readable format is called a "picket fence", and to avoid it, sed accepts other delimiters such as underscore (_), colon (:) and pipe (|). For example, if we wanted to use the pipe character as the delimiter, we would end up with the following:
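A sketch of the pipe-delimited form, again assuming the path arrives through echo:

```shell
# With | as the delimiter, the slashes in the paths need no escaping.
echo "myfolder/mysubfolder/myfile" |
  sed 's|myfolder/mysubfolder/myfile|myotherfolder/myotherfile|'
# prints: myotherfolder/myotherfile
```

The pipe inside the quotes belongs to sed's expression; only the unquoted pipe after echo is interpreted by the shell.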

Much better, no?

One other thing, though. We said sed takes two arguments, yet we only ever give it one - right after the sed command. This is because of the pipe character after our echo command. The pipe serves as a means to direct the output of the left operand into the input of the right operand. In our case, the pipe character told the sed program "Take as input whatever it is that you get from whatever there is on the left side of me". sed has no idea it's dealing with echo - it doesn't need to know. All it knows is that it's taking text input. Discussing the pipeline in more detail than this is outside the scope of this article, but feel free to read up if you're interested.
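To illustrate that sed only cares about receiving text, the sketch below feeds it the same content three ways. The file name greeting.txt is made up for this example.

```shell
# Hypothetical input file, created in the current directory.
echo "Hello" > greeting.txt

# 1. Input arrives through a pipe from echo.
echo "Hello" | sed 's/Hell/Heaven/'

# 2. Input arrives through a pipe from cat.
cat greeting.txt | sed 's/Hell/Heaven/'

# 3. Input is read from a file named as an extra argument.
sed 's/Hell/Heaven/' greeting.txt

# All three print: Heaveno
```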

So how do we combine it with grep? It's exactly the same. Taking our previous example, let's enter the following into the terminal.

and we get the output
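A sketch of such a combination, assuming a jungle.txt with five made-up lines and an arbitrarily chosen replacement word (lion):

```shell
# Hypothetical file contents, written to the current directory.
printf '%s\n' \
  'the monkey swings from a vine' \
  'the tiger sleeps in the shade' \
  'a parrot squawks weh weh from the canopy' \
  'the river runs brown after rain' \
  'ants carry leaves across the path' > jungle.txt

# grep keeps only the line mentioning the tiger;
# sed then rewrites that line before it is printed.
grep "tiger" jungle.txt | sed 's/tiger/lion/'
# prints: the lion sleeps in the shade
```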

Now let's look at a real world use case.


Real World Application

For our "dissection" we'll take the grep+sed command of a popular weather geeklet and explain it bit by bit. Go ahead and download the sample geeklet. Once downloaded, open it with a text editor of any kind. You'll notice it's no more than an XML file. If you have no experience with XML, don't fret - Josh already did an amazing article on Geektool and its ins and outs. We won't be dealing with the nitty gritty of it all today. Instead, let's focus on the part between the <string> tags:

This cryptic mess is a simple Terminal command - nothing more. You can even paste it into Terminal and you'll get the weather condition for Makati City in the Philippines, which the original author set it to fetch. The geeklet tells Geektool to run said command and take whatever output it gets by running it. Let's take a look at it, pipe segment by pipe segment, and explain in detail:

curl is a tool for transferring data with a URL syntax. This means it can go to a URL and retrieve data from it.

If you paste the quoted URL into your browser, you'll notice you get an XML file from Yahoo! - they have a live weather conditions service which you can easily access and retrieve data from. This is the exact same thing you get when you curl it; only instead of going to the browser, the output is sent to Terminal. The --silent flag tells curl to be quiet about progress, status and errors, so that the only output we get is the output we need (or nothing, if it fails).

The pipe character follows, meaning the output from curl is sent into grep as input. Grep receives this downloaded XML file in text format, and runs a search on it with the -E flag, which tells grep to treat the pattern as an extended regular expression. The value it's searching for is either the string Current Conditions: or C<BR (the pipe character inside an extended regular expression means "or"). For additional clarification, if you typed the following into our previous example:

you would get

because it returns all lines which contain either "tiger" or "weh".
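As a sketch with made-up file contents (the five lines of jungle.txt below are assumptions), the alternation behaves like this:

```shell
# Hypothetical file contents, written to the current directory.
printf '%s\n' \
  'the monkey swings from a vine' \
  'the tiger sleeps in the shade' \
  'a parrot squawks weh weh from the canopy' \
  'the river runs brown after rain' \
  'ants carry leaves across the path' > jungle.txt

# -E enables extended regular expressions, where | means "or".
# The pattern is quoted so the shell does not treat | as a pipeline.
grep -E "tiger|weh" jungle.txt
# prints:
#   the tiger sleeps in the shade
#   a parrot squawks weh weh from the canopy
```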

So if we run these two first pipe segments together like so:

we get the following:

But we only want to get "Haze, 23 C". This is where sed comes in. We simply replace anything we don't want with an empty string (nothing) effectively deleting it.

The -e flag (short for --expression= in GNU sed) allows us to chain multiple sed commands, which are applied in the order given. Therefore, we first replace the string "Current Conditions:" with nothing, then replace <br /> with nothing, and so on, until the last tag that can appear on the line (<description>) has been removed.

In the end, all that is left is "Haze, 23 C".
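The same filtering can be tried offline. The sketch below does not contact Yahoo!; it assumes a small hand-written sample of the feed (the file name sample.xml and its contents are made up) and applies a grep and sed chain in the spirit of the one described above. The final /^$/d expression is an addition that drops the line left empty once "Current Conditions:" and its tags are gone.

```shell
# Hypothetical excerpt of the weather feed, written to a local file.
cat > sample.xml <<'EOF'
<title>Conditions for Makati City</title>
<description><![CDATA[
<b>Current Conditions:</b><br />
Haze, 23 C<BR />
]]></description>
EOF

# Keep the two interesting lines, then strip the markup piece by piece.
grep -E 'Current Conditions:|C<BR' sample.xml |
  sed -e 's/Current Conditions://' \
      -e 's/<br \/>//' \
      -e 's/<b>//' \
      -e 's/<\/b>//' \
      -e 's/<BR \/>//' \
      -e '/^$/d'
# prints: Haze, 23 C
```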

I should mention that the geeklet we've used as an example could have been done far better, but the sheer complexity of the command used seemed like a very good opportunity to cover multiple examples at once. The author could have, for example, simply fetched the line containing "Current Conditions:" and the line after it with the -A 1 option, without relying on the temperature symbol (in this case, we rely on Celsius, but what if we wanted Fahrenheit? The author's C<BR grep search would fail). Nonetheless, the example served a purpose - and that was introducing you to the wonderful world of grep and sed.
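One way to sketch that alternative, on a made-up local sample rather than the live feed: grep -A 1 prints the matching line plus the one after it, and a generic tag-stripping expression (an assumption of this sketch, not part of the original geeklet) replaces the tag-by-tag substitutions. The sample uses Fahrenheit to show that the temperature unit no longer matters.

```shell
# Hypothetical feed excerpt; file name and contents are made up.
cat > sample_f.xml <<'EOF'
<description><![CDATA[
<b>Current Conditions:</b><br />
Haze, 73 F<BR />
<b>Forecast:</b><BR />
]]></description>
EOF

# -A 1 prints each matching line and one line of trailing context.
# s/<[^>]*>//g removes anything that looks like a tag,
# and /^$/d drops the line that ends up empty.
grep -A 1 'Current Conditions:' sample_f.xml |
  sed -e 's/<[^>]*>//g' \
      -e 's/Current Conditions://' \
      -e '/^$/d'
# prints: Haze, 73 F
```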


More Resources

While teaching advanced regular expressions and deeper grep, curl and sed functionality is far outside the scope of this article (and this website), feel free to look at the following resources if you wish to know more.


Conclusion

You now know the basics of grep, sed and even curl. While this crash course was far from enough to make you an expert, we hope it was at least enough to get you interested in trying your own data harvesting and querying. At the very least, it's something to talk about around the water cooler on Monday.

I hope you enjoyed it, and if you're up for a challenge, try rewriting the geeklet so that it is not only temperature symbol agnostic, but also figures out the location of the user on its own, without you having to manually alter the 'w' parameter in the Yahoo! URL.
