awk is a specialized tool and programming language for querying and transforming text. In this lesson, we’ll scratch the surface of the possibilities it provides by leveraging it in one of its most common use cases: extracting data from columns of text. Awk makes it easy to select and print just the columns and rows that we want from commands, like ps, that return a table of data.
Cameron Nokes: [0:00] Do ps aux, pipe that to less so we can see the column headers at the top. We have the USER, the process ID, the CPU usage percentage, and so on. Let's say I want to print the CPU usage percentage column, which is the third column. Let's see how we do that.
[0:18] We run the command again, we'll pipe it to awk. Then here, we'll do opening and closing braces and we're going to do print, and then, it was the third column. What I did here is invoke awk, and what's in the single quotes is my awk script. I put it in single quotes to prevent Bash from interpreting the awk script.
[0:39] Awk automatically splits out the columns for you and assigns them to a variable based on their order, so to get the third column, I use $3. From here, I'll pipe this to head so I don't get too much data. Cool. We see that worked.
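For reference, the command described so far can be sketched as below. The `sample` table is a made-up stand-in for `ps aux` output, included only so that the result is repeatable; the variable names are illustrative, not from the lesson.

```shell
# Hypothetical sample standing in for `ps aux` output (values are made up).
sample='USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 1 0.0 0.1 4100 900 ?? Ss 9:00AM 0:01.00 /sbin/launchd
cam 204 3.5 1.2 5200 8100 ?? S 9:01AM 1:12.40 /usr/bin/node'

# Print only the third whitespace-separated field of every line.
third_column=$(printf '%s\n' "$sample" | awk '{ print $3 }')
printf '%s\n' "$third_column"

# The same script against live data, limited to the first 10 lines:
# ps aux | awk '{ print $3 }' | head
```

Because the script has no condition, it runs on every line, including the header row.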
[0:53] The basic syntax of awk is this. We have the awk command that we invoke, optionally any flags here, which we don't have. Then, in single quotes, we have the awk script. In there we can optionally have a condition, which filters the input down to the rows we want. Those rows are handed off to the command statements, which then operate on the data selected by the condition.
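The shape just described can be sketched as follows. The `scores` data and the variable names are hypothetical, chosen only to keep the example small.

```shell
# General shape (flags and the condition are both optional):
#   awk [flags] 'condition { statements }'

# Hypothetical input: one name and one number per line.
scores='alice 7
bob 3
carol 9'

# The condition `$2 > 5` selects rows; `{ print $1 }` runs on each selected row.
high_scorers=$(printf '%s\n' "$scores" | awk '$2 > 5 { print $1 }')
printf '%s\n' "$high_scorers"
```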
[1:15] Let's look at some conditions we can use. Let's say we want to filter to processes that have CPU usage greater than 2 percent, so we run our command again, column three. Let's do greater than 2 percent, and then I'll print. Cool. That looks like that's working.
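A minimal sketch of the CPU filter, assuming the script prints only the third column for matching rows (the transcript does not spell out the exact print statement). The `sample` table is made up.

```shell
# Hypothetical sample standing in for `ps aux` output (values are made up).
sample='USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 1 0.0 0.1 4100 900 ?? Ss 9:00AM 0:01.00 /sbin/launchd
cam 204 3.5 1.2 5200 8100 ?? S 9:01AM 1:12.40 /usr/bin/node
cam 377 12.1 2.4 6100 9900 s000 R+ 9:05AM 0:30.12 /usr/bin/top'

# Keep only rows whose third field is numerically greater than 2.
busy=$(printf '%s\n' "$sample" | awk '$3 > 2 { print $3 }')
printf '%s\n' "$busy"

# Against live data:
# ps aux | awk '$3 > 2 { print $3 }'
```

Fields that look like numbers are compared numerically, so 12.1 passes the test even though it would sort before 2 as a string.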
[1:33] Let's print the process name column as well, which was column 11. I can do comma, column 11 there, and then I'll print them out with a space between them. Cool. That looks like that's working.
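As a sketch of the two-column version, again with a made-up `sample` table: the comma in the print statement is what puts the space between the two values, because awk joins comma-separated print arguments with its output field separator, which is a single space by default.

```shell
# Hypothetical sample standing in for `ps aux` output (values are made up).
sample='USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 1 0.0 0.1 4100 900 ?? Ss 9:00AM 0:01.00 /sbin/launchd
cam 204 3.5 1.2 5200 8100 ?? S 9:01AM 1:12.40 /usr/bin/node'

# Print CPU usage and process name for rows above 2 percent CPU.
cpu_and_name=$(printf '%s\n' "$sample" | awk '$3 > 2 { print $3, $11 }')
printf '%s\n' "$cpu_and_name"

# Against live data:
# ps aux | awk '$3 > 2 { print $3, $11 }'
```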
[1:48] Interestingly, we can see that it doesn't print all of column 11 because there can be spaces in a process name, so it's getting split out. Awk isn't going to be perfect in every case, just something to be aware of.
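The truncation can be reproduced with a single made-up row whose command contains spaces. The loop that follows is one possible workaround and is not part of the lesson: it rebuilds the name from field 11 through the last field (`NF`). It joins the pieces with single spaces, so any wider runs of whitespace in the original name are collapsed.

```shell
# Hypothetical `ps aux` row whose command contains spaces (values are made up).
row='cam 377 12.1 2.4 6100 9900 s000 R+ 9:05AM 0:30.12 /Applications/Google Chrome.app/Contents/MacOS/Google Chrome'

# $11 stops at the first space inside the command.
truncated=$(printf '%s\n' "$row" | awk '{ print $11 }')
printf '%s\n' "$truncated"

# Workaround sketch: append fields 12..NF back onto field 11.
full=$(printf '%s\n' "$row" | awk '{
  name = $11
  for (i = 12; i <= NF; i++) name = name " " $i
  print name
}')
printf '%s\n' "$full"
```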
[1:59] Note that if we want to print the whole row, we can leave out the print statement altogether and just give the condition. You can see it's working, it's a lot of information. This is the process name column and it's pretty long. Then, here's a new row.
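A sketch of the condition-only form, using a made-up `sample` table: when a script has a condition but no braces, awk prints every matching line in full.

```shell
# Hypothetical sample standing in for `ps aux` output (values are made up).
sample='USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 1 0.0 0.1 4100 900 ?? Ss 9:00AM 0:01.00 /sbin/launchd
cam 204 3.5 1.2 5200 8100 ?? S 9:01AM 1:12.40 /usr/bin/node
cam 377 12.1 2.4 6100 9900 s000 R+ 9:05AM 0:30.12 /usr/bin/top'

# No { ... } block: the default action prints the entire matching row.
busy_rows=$(printf '%s\n' "$sample" | awk '$3 > 2')
printf '%s\n' "$busy_rows"

# Against live data:
# ps aux | awk '$3 > 2'
```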