Tuesday, January 27, 2009

How to parse Apache log files

How to parse Apache log files using the cat, grep and awk commands, in order to see what a particular person was browsing.

To view the Apache access log file:
cat /var/log/apache/access_log

Log lines look as follows:
74.6.72.36 - - [01/Nov/2006:02:10:33 -0500] "GET /~lupsha/artificialintelligence/artificialintelligence.htm HTTP/1.0" 200 116 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
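awk numbers the whitespace-separated parts of a line as fields $1, $2, and so on, and the awk command later in this post relies on those numbers. A small sketch like the following prints every field of the sample line above next to its number. It assumes a POSIX shell and awk, and the variable name `line` is only for illustration.

```shell
#!/bin/sh
# The Yahoo! Slurp log line shown above, stored in a variable for illustration.
line='74.6.72.36 - - [01/Nov/2006:02:10:33 -0500] "GET /~lupsha/artificialintelligence/artificialintelligence.htm HTTP/1.0" 200 116 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"'

# awk splits a line into fields on runs of blanks by default; print each
# field prefixed with its number so that $4, $7, etc. can be identified.
printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }'
```

Field 4 is `[01/Nov/2006:02:10:33` (the timestamp, including its opening bracket) and field 7 is the requested path. awk does not treat quotes specially, so the quoted user-agent string at the end is split across several fields.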

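To view only the log lines that contain "Nov/2006", "lupsha" and Mary's IP address, "141.213.178.207", and to save those matching lines in a file called /tmp/lupsha/mary.txt, chain several grep commands as follows (-F makes grep treat the dots in the IP address as literal dots rather than "any character"; the directory /tmp/lupsha must already exist):
grep -F Nov/2006 /var/log/apache/access_log | grep -F lupsha | grep -F 141.213.178.207 > /tmp/lupsha/mary.txt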
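The grep chain matches "141.213.178.207" anywhere in a line, so a request from another host whose referrer URL happens to contain that address would also be kept. A stricter variant, sketched below, does the filtering in a single awk pass and compares the IP only against field 1 (the client address). `filter_visits` is a hypothetical helper name, and the sample lines fed to it are made up for illustration, modelled on the entries in this post. It prints the matches to standard output; redirect that to /tmp/lupsha/mary.txt to get the same kind of file.

```shell
#!/bin/sh
# Keep lines whose client IP (field 1) equals $1 exactly, whose timestamp
# (field 4) is in Nov/2006 and whose requested path (field 7) contains
# "lupsha". Reads the file named in $2, or standard input if $2 is absent.
filter_visits() {
    awk -v ip="$1" '
        $1 == ip && index($4, "/Nov/2006:") > 0 && index($7, "lupsha") > 0
    ' "${2:--}"
}

# Hypothetical sample lines; on the server you would instead run:
#   filter_visits 141.213.178.207 /var/log/apache/access_log > /tmp/lupsha/mary.txt
filter_visits 141.213.178.207 <<'EOF'
141.213.178.207 - - [10/Nov/2006:22:47:13 -0500] "GET /~lupsha/personal/pictures/2000.08.Trip.to.Crete/x-rethimnon-artistic.jpg HTTP/1.1" 200 7996 "-" "-"
74.6.72.36 - - [10/Nov/2006:23:01:05 -0500] "GET /~lupsha/index.html HTTP/1.0" 200 512 "http://141.213.178.207/" "-"
141.213.178.207 - - [02/Dec/2006:09:15:42 -0500] "GET /~lupsha/index.html HTTP/1.1" 200 512 "-" "-"
EOF
```

Only the first sample line is printed: the second comes from a different client (74.6.72.36) and merely mentions Mary's address in its referrer, and the third is from December.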

The contents of /tmp/lupsha/mary.txt are now:
141.213.178.207 - - [10/Nov/2006:22:47:13 -0500] "GET /~lupsha/personal/pictures/2000.08.Trip.to.Crete/x-rethimnon-artistic.jpg HTTP/1.1" 200 7996 "http://socr.uwindsor.ca/~lupsha/personal/pictures/2000.08.Trip.to.Crete/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; InfoPath.1)"
141.213.178.207 - - [10/Nov/2006:22:47:13 -0500] "GET /~lupsha/personal/pictures/2000.08.Trip.to.Crete/x-rethimnon-artistic-2.jpg HTTP/1.1" 200 5503 "http://socr.uwindsor.ca/~lupsha/personal/pictures/2000.08.Trip.to.Crete/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; InfoPath.1)"
...

Take the first of these log lines, for example:
141.213.178.207 - - [10/Nov/2006:22:47:13 -0500] "GET /~lupsha/personal/pictures/2000.08.Trip.to.Crete/x-rethimnon-artistic.jpg HTTP/1.1" 200 7996 "http://socr.uwindsor.ca/~lupsha/personal/pictures/2000.08.Trip.to.Crete/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; InfoPath.1)"

We wish to take the path
/~lupsha/personal/pictures/2000.08.Trip.to.Crete/x-rethimnon-artistic.jpg
and turn it into a link whose text is the request's timestamp (including its opening bracket):
<a href="http://socr.uwindsor.ca/~lupsha/personal/pictures/2000.08.Trip.to.Crete/x-rethimnon-artistic.jpg">[10/Nov/2006:22:47:13</a>
so that a user can click the date to see which file was requested at that time.

We use awk as follows:
awk '{print "<a href=\"http://socr.uwindsor.ca"$7"\">" $4 "</a>"}' /tmp/lupsha/mary.txt > /tmp/lupsha/mary.html
awk splits each log line into whitespace-separated fields. The command above takes field 7 (the requested path) and field 4 (the timestamp) and concatenates them into a custom HTML string, creating one link per request.
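The awk command above writes bare links one after another, so in a browser they run together, because HTML ignores the line breaks between them. A slightly fuller sketch, assuming the same host and file names as above, wraps the links in a minimal HTML page, ends each link with <br> so it gets its own line, and strips the "[" from the timestamp. `make_page` is a hypothetical helper name, and the page markup is an illustrative choice rather than anything required.

```shell
#!/bin/sh
# Turn filtered log lines into a minimal HTML page: one link per request,
# each followed by <br>, with the timestamp (field 4) as the link text and
# the requested path (field 7) appended to the host as the link target.
# Reads the file named in $1, or standard input if $1 is absent.
make_page() {
    awk -v host="http://socr.uwindsor.ca" '
        BEGIN { print "<html><body>" }
        {
            stamp = $4
            sub(/^\[/, "", stamp)          # drop the leading "[" from field 4
            print "<a href=\"" host $7 "\">" stamp "</a><br>"
        }
        END { print "</body></html>" }
    ' "${1:--}"
}

# Usage on the file produced by the grep step:
#   make_page /tmp/lupsha/mary.txt > /tmp/lupsha/mary.html
printf '%s\n' '141.213.178.207 - - [10/Nov/2006:22:47:13 -0500] "GET /~lupsha/personal/pictures/2000.08.Trip.to.Crete/x-rethimnon-artistic.jpg HTTP/1.1" 200 7996 "-" "-"' | make_page
```

The sketch does not HTML-escape the path, so request paths containing characters such as & or < would need extra handling.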

by Alan Lupsha
