Select and count your production.log!
Last time, I have demonstrated how we can put together simple filters to show only the lines we are interested in. This time I will show you how you can manipulate these lines and select only the parts you really need.
This post is the continuation to the What can you learn from production.log? , and will continue where the last one stopped. Please read that post, before continuing to this one.
Enumeration and counting
You have finally managed to put together the right combination of
grep
filters to only get the lines you want. But what can you do
with it?
What I like to start with is a simple line count. For example, I can
easily count the number of POST
requests my server has received for
this day, by executing the following filter:
cat log/production.log | grep 'method=POST'
| grep 'time=2015-02-16'
| wc -l
The above command should be pretty familiar by now except the last
command in the pipe. To count lines, we can use the word count
command wc
and pass it the option -l
to get the number of
lines.
Similarly to the above example we can enumerate the lines by using
the nl
command:
cat log/production.log | grep 'method=POST'
| grep 'time=2015-02-16'
| nl
This command doesn’t look so useful as the above, but it can come handy for orientation when you share the output with a friend or when you combine it with some other interesting commands.
Manipulating the lines
Now let’s do something more interesting. You have probably noticed that while we do remove the unnecessary lines, we do not remove the unnecessary noise from within the lines. For example let’s say that a query for today’s requests returned the following output:
method=POST path=/user format=*/* controller=users action=create status=200 duration=19.4 view=0.00 db=3.91 time=2016-02-16 16:48:37 +0000
method=GET path=/user format=*/* controller=users action=create status=200 duration=19.4 view=0.00 db=3.91 time=2016-02-16 16:50:37 +0000
method=POST path=/user format=*/* controller=users action=create status=200 duration=19.4 view=0.00 db=3.91 time=2016-02-16 16:52:12 +0000
Ugh, that is a little overwhelming, especially if I only want to
know the path and the method for example. Luckily, we can use
awk
to help us with pruning the above lines. But first let’s see
how awk
works on a simpler example. This command is really handy
for reorganizing and cutting of unnecessary parts of lines. Let’s
use the following line, and show only the first and the last entry.
dog cat fish
By default, awk
separates the fields in a line on the and gives
them back to us in the form of $1
, $2
, $3
… To show only the
first and the last entry, execute the following command:
echo "dog cat fish" | awk '{ print $1 $2 }'
In the above example I only used echo
as a mean to push something
into the awk
command, but it can work on any output. But wait!
This command has the following output, that is not what we want:
dogfish
Yes, we also have to put a separator between them:
echo "dog cat fish" | awk '{ print $1 " " $2 }'
We can of course put any string between them, like an arrow:
echo "dog cat fish" | awk '{ print $1 " ---> " $2 }'
We can also reorganize the output:
echo "dog cat fish" | awk '{ print $3 " ---> " $1 " ---> " $2 }'
That will give us:
fish ---> dog ---> cat
Enough with the examples! Let’s get back to our original objective — showing only the method and the path of today’s request:
cat log/production.log | grep 'method=POST'
| grep 'time=2015-02-16'
| awk '{ print $1 " " $2 }'
Hooray, this just what we needed!
method=POST path=/user
method=GET path=/user
method=POST path=/user
Final words
I hope you enjoyed this tutorial, I will give my best to finish the next one soon, where I will be talking about uniqueness and sorting.
Happy hacking!