pwlog 2.1.1
The goals of pwlog include reducing the length and mystery of a typical getlogs output.
The default action of pwlog is to abbreviate the directory path in filenames and to drop the referral information. As of this writing, a "hit" as reported by getlogs includes "/htdocs/userdirs/userid" at the start of a file in personal webspace and "/htdocs/userid" for a file in corporate webspace. pwlog abbreviates these as "(u)" and "(c)", respectively. Thus, using the same example log information as in the getlogs description, invoking pwlog without any of its options set would result in:
3819 WWW 182 1998:09:01:01:24:58 (u)/Skate 209.240.199.53 301 www1
3819 WWW 203 1998:09:01:03:25:47 (u)/Skate 204.244.93.232 301 www2
3819 FTP 9600 1998:09:01:04:16:05 (f)/pub/incoming/newfile.txt 166.84.197.198 200 ftp
3819 WWW 15130 1998:09:01:05:23:41 (c)/blur/index.cgi 24.112.48.33 200 web4
3819 WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF 24.112.48.33 200 web4
3819 WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF 24.112.48.33 200 web4
3819 WWW 191 1998:09:01:05:23:44 (c)/blur/banner.cgi 24.112.48.33 302 web4
3819 WWW 43 1998:09:01:05:23:44 (c)/blur/gfx/spacer.GIF 24.112.48.33 200 web4
3819 WWW 162 1998:09:01:06:04:56 (c)/skatecity/robots.txt 204.123.9.47 200 web4
3819 WWW 4301 1998:09:01:06:18:17 (c)/blur/article.cgi 193.13.129.79 200 web4
3819 WWW 6665 1998:09:01:07:11:49 (c)/skatecity/ah/ 195.133.10.89 200 web4
3819 WWW 1343 1998:09:01:07:11:52 (c)/skatecity/ah/gfx/uchronia.sml.GIF 195.133.10.89 200 web4
3819 WWW 911 1998:09:01:07:11:54 (c)/skatecity/ah/gfx/intro.GIF 195.133.10.89 200 web4
3819 WWW 9723 1998:09:01:08:08:51 (c)/blur/resources/reviews.cgi 155.78.124.187 200 web4
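The default abbreviation can be pictured with a few lines of Python. This is only a sketch of the rule as described above, not pwlog's actual code: the function name abbreviate_path and the userid "jdoe" are made up for illustration, and the "(f)" tag seen on FTP entries is left out because its rule is not spelled out here.

```python
def abbreviate_path(path: str, userid: str) -> str:
    """Shorten a getlogs file path using the (u)/(c) rule described above.

    Hypothetical helper: pwlog's real implementation may differ.
    """
    personal = f"/htdocs/userdirs/{userid}"   # personal webspace prefix
    corporate = f"/htdocs/{userid}"           # corporate webspace prefix
    for prefix, tag in ((personal, "(u)"), (corporate, "(c)")):
        # Match only on a whole path component, not a partial name.
        if path == prefix or path.startswith(prefix + "/"):
            return tag + path[len(prefix):]
    return path  # anything else is left untouched in this sketch


print(abbreviate_path("/htdocs/userdirs/jdoe/Skate", "jdoe"))
print(abbreviate_path("/htdocs/jdoe/blur/index.cgi", "jdoe"))
```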
Running pwlog
If you have already run getlogs, the simplest way to create output like the example above is to type:
pwlog logfilename > newlogfilename
However, you can get pwlog to call getlogs for you. In fact, if you specify no input log file name, that automatically happens. In other words, instead of typing
getlogs > logfilename
pwlog logfilename > newlogfilename
you can instead just type
pwlog > newlogfilename
Options
Most of the more useful features of pwlog are only available via its option switches. A complete list and a short help message can be obtained by typing
pwlog -h
Notable among these options are:
- pwlog -A
- Do not perform the file path abbreviation described above.
- pwlog -P
- In the above example, the first column of the output is the same on every line. Additionally, the final column, which lists the name of the specific Panix machine that serviced the request, is generally of no use to anyone except Panix staffers. The -P option deletes both columns. In addition to reducing the size of the log file, this option makes it a bit more likely that a line in the file will fit on an 80-character screen, making it somewhat more readable. Using this option on the above example would result in:
WWW 182 1998:09:01:01:24:58 (u)/Skate 209.240.199.53 301
WWW 203 1998:09:01:03:25:47 (u)/Skate 204.244.93.232 301
FTP 9600 1998:09:01:04:16:05 (f)/pub/incoming/newfile.txt 166.84.197.198 200
WWW 15130 1998:09:01:05:23:41 (c)/blur/index.cgi 24.112.48.33 200
WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF 24.112.48.33 200
WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF 24.112.48.33 200
WWW 191 1998:09:01:05:23:44 (c)/blur/banner.cgi 24.112.48.33 302
WWW 43 1998:09:01:05:23:44 (c)/blur/gfx/spacer.GIF 24.112.48.33 200
WWW 162 1998:09:01:06:04:56 (c)/skatecity/robots.txt 204.123.9.47 200
WWW 4301 1998:09:01:06:18:17 (c)/blur/article.cgi 193.13.129.79 200
WWW 6665 1998:09:01:07:11:49 (c)/skatecity/ah/ 195.133.10.89 200
WWW 1343 1998:09:01:07:11:52 (c)/skatecity/ah/gfx/uchronia.sml.GIF 195.133.10.89 200
WWW 911 1998:09:01:07:11:54 (c)/skatecity/ah/gfx/intro.GIF 195.133.10.89 200
WWW 9723 1998:09:01:08:08:51 (c)/blur/resources/reviews.cgi 155.78.124.187 200
- pwlog -r
- One of the most mysterious things about getlogs output is the use of IP numbers to report the IDs of computers which have requested your webpages. You can convert these numbers to computer names, making it much easier to figure out where your visitors are coming from; just use the -r option with pwlog. However, be aware that (a) approximately 10-25% of IP numbers cannot be converted to hostnames, perhaps because the computers haven't been assigned "real names", and (b) the IP->name conversion takes time and if you have a busy site, it can take a really long time, perhaps on the order of hours, even days. Invoking this option and the -P option on the example log would result in output like the following:
WWW 182 1998:09:01:01:24:58 (u)/Skate proxy-226.iap.bryant.webtv.net 301
WWW 203 1998:09:01:03:25:47 (u)/Skate kam1d40.dial.uniserve.ca 301
FTP 9600 1998:09:01:04:16:05 (f)/pub/incoming/newfile.txt rbs.dialup.access.net 200
WWW 15130 1998:09:01:05:23:41 (c)/blur/index.cgi pc-403.on.rogers.wave.ca 200
WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF pc-403.on.rogers.wave.ca 200
WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF pc-403.on.rogers.wave.ca 200
WWW 191 1998:09:01:05:23:44 (c)/blur/banner.cgi pc-403.on.rogers.wave.ca 302
WWW 43 1998:09:01:05:23:44 (c)/blur/gfx/spacer.GIF pc-403.on.rogers.wave.ca 200
WWW 162 1998:09:01:06:04:56 (c)/skatecity/robots.txt vscooter.av.pa-x.dec.com 200
WWW 4301 1998:09:01:06:18:17 (c)/blur/article.cgi 193.13.129.79 200
WWW 6665 1998:09:01:07:11:49 (c)/skatecity/ah/ 89.10.133.195.dynamic.dialup.ru 200
WWW 1343 1998:09:01:07:11:52 (c)/skatecity/ah/gfx/uchronia.sml.GIF 89.10.133.195.dynamic.dialup.ru 200
WWW 911 1998:09:01:07:11:54 (c)/skatecity/ah/gfx/intro.GIF 89.10.133.195.dynamic.dialup.ru 200
WWW 9723 1998:09:01:08:08:51 (c)/blur/resources/reviews.cgi 155.78.124.187 200
- pwlog -g
- "Smash" the filenames of graphics, reducing them to *.gif or *.jpeg as appropriate. For example, foo.gif and bar.GIF would both be converted to "*.gif". This is not necessarily useful when running pwlog, but comes in very handy when running pwstat.
- pwlog -q list
- Filter log entries by usage types, where "list" can be one or more of c, u, or f. If c, then we want corporate web hits included; if u, include personal web hits; and if f, include ftp transfers. Note: most Panix users do not have both corporate and personal web traffic, but corporate users may want to use this option to generate separate logs for their web and ftp traffic.
- pwlog -m
- Omit any request coming from a *.panix.com or *.access.net host.
- pwlog -M
- Include only requests coming from within the *.panix.com and *.access.net domains.
- pwlog -b pattern
- Include this log entry only if the request came from a machine which includes this pattern (a Perl regexp). Note: This test is made after the IP-to-hostname conversion is attempted, assuming that option (-r) is turned on.
- pwlog -B pattern
- Omit this log entry if the request came from a machine which includes this pattern (a Perl regexp). Note: If you specify any combination of the -b, -B, -m and -M options, only one of them will be evaluated. Preference is in the order just given (i.e., -b wins).
- pwlog -d somedate
- Report only entries occurring on or after the specified date. The format of the specified date must be YYYY:MM:DD; for example, to obtain a report limited to requests on or after August 15, 1997, you would replace somedate with 1997:08:15. Note: Remember that the only web requests which will be checked against the specified date are those from the log file(s) you've specified.
- pwlog -D somedate
- Similar to the -d option except that it reports only entries on or before the specified date.
- pwlog -f pattern
- Skip this log entry if the filename does not include this pattern (a Perl regexp).
- pwlog -F pattern
- Skip this log entry if the filename does include this pattern (a Perl regexp).
- pwlog -l
- Execute getlogs -o and use the result as input for pwlog. This option is ignored if you specify an input log file.
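To make a couple of these options concrete, here is a rough Python sketch of what the -d, -D and -P options do to each line, based only on the descriptions above. The function name filter_lines and its parameters are invented for illustration; it assumes the default pwlog column layout shown earlier (timestamp in the fourth column) and filenames without embedded spaces. The second sample line below has an altered date so that the filter has something to do.

```python
from typing import Iterable, Iterator, Optional


def filter_lines(lines: Iterable[str],
                 since: Optional[str] = None,    # like -d, format YYYY:MM:DD
                 until: Optional[str] = None,    # like -D, format YYYY:MM:DD
                 drop_panix: bool = False        # like -P
                 ) -> Iterator[str]:
    """Yield the log lines that survive the date tests, optionally trimmed."""
    for line in lines:
        fields = line.split()
        # The timestamp is YYYY:MM:DD:HH:MM:SS; its first 10 characters are
        # the date. Zero-padded dates compare correctly as plain strings.
        day = fields[3][:10]
        if since is not None and day < since:
            continue
        if until is not None and day > until:
            continue
        # -P drops the first column and the final (server name) column.
        yield " ".join(fields[1:-1]) if drop_panix else line


sample = [
    "3819 WWW 182 1998:09:01:01:24:58 (u)/Skate 209.240.199.53 301 www1",
    "3819 WWW 4301 1998:09:02:06:18:17 (c)/blur/article.cgi 193.13.129.79 200 web4",
]
for kept in filter_lines(sample, until="1998:09:01", drop_panix=True):
    print(kept)
```

The pattern options (-b, -B, -f, -F) are left out on purpose: they take Perl regexps, and Python's regular expressions are similar but not identical.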
Converting to Common Log Format
NOTE. This section is obsolete. If you need your logs in Common Log Format, use getclogs to obtain them.
It may be that you have obtained some handy-dandy third-party stats program which you'd like to use, but you can't because the output from getlogs and the above described output from pwlog aren't in "common log format", which most such programs require. If so, there are two additional pwlog options which you will find of use:
- pwlog -k
- Use this option to convert getlogs or pwlog output to common log format. (If you specify both the -P and -k options, -k wins.) The result should look something like the following:
209.240.199.53 - - [01/Sep/1998:01:24:58 -0500] "HEAD (u)/Skate HTTP/1.0" 301 182
204.244.93.232 - - [01/Sep/1998:03:25:47 -0500] "HEAD (u)/Skate HTTP/1.0" 301 203
166.84.197.198 - - [01/Sep/1998:04:16:05 -0500] "FTP (f)/pub/incoming/newfile.txt FTP/X.X" 200 9600
24.112.48.33 - - [01/Sep/1998:05:23:41 -0500] "GET (c)/blur/index.cgi HTTP/1.0" 200 15130
24.112.48.33 - - [01/Sep/1998:05:23:43 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43
24.112.48.33 - - [01/Sep/1998:05:23:43 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43
24.112.48.33 - - [01/Sep/1998:05:23:44 -0500] "HEAD (c)/blur/banner.cgi HTTP/1.0" 302 191
24.112.48.33 - - [01/Sep/1998:05:23:44 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43
204.123.9.47 - - [01/Sep/1998:06:04:56 -0500] "GET (c)/skatecity/robots.txt HTTP/1.0" 200 162
193.13.129.79 - - [01/Sep/1998:06:18:17 -0500] "GET (c)/blur/article.cgi HTTP/1.0" 200 4301
195.133.10.89 - - [01/Sep/1998:07:11:49 -0500] "GET (c)/skatecity/ah/ HTTP/1.0" 200 6665
195.133.10.89 - - [01/Sep/1998:07:11:52 -0500] "GET (c)/skatecity/ah/gfx/uchronia.sml.GIF HTTP/1.0" 200 1343
195.133.10.89 - - [01/Sep/1998:07:11:54 -0500] "GET (c)/skatecity/ah/gfx/intro.GIF HTTP/1.0" 200 911
155.78.124.187 - - [01/Sep/1998:08:08:51 -0500] "GET (c)/blur/resources/reviews.cgi HTTP/1.0" 200 9723
- pwlog -K
- Similar to the -k option, except that it attempts to convert the log file to "extended common log format". Basically, this means also including referral information. Extended common log should also include the user agent, i.e., browser type, but that data is not available from the getlogs output to start with. The result should look something like the following:
209.240.199.53 - - [01/Sep/1998:01:24:58 -0500] "HEAD (u)/Skate HTTP/1.0" 301 182 "http://www.xs4all.nl:80/~lowlevel/skate/linx.html" "UNKNOWN"
204.244.93.232 - - [01/Sep/1998:03:25:47 -0500] "HEAD (u)/Skate HTTP/1.0" 301 203 "-" "UNKNOWN"
166.84.197.198 - - [01/Sep/1998:04:16:05 -0500] "FTP (f)/pub/incoming/newfile.txt FTP/X.X" 200 9600 "-" "UNKNOWN"
24.112.48.33 - - [01/Sep/1998:05:23:41 -0500] "GET (c)/blur/index.cgi HTTP/1.0" 200 15130 "http://www.yahoo.ca/Recreation/Sports/Skating/Inline_Skating/Magazines/" "UNKNOWN"
24.112.48.33 - - [01/Sep/1998:05:23:43 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43 "http://www.skating.com/" "UNKNOWN"
24.112.48.33 - - [01/Sep/1998:05:23:43 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43 "http://www.skating.com/" "UNKNOWN"
24.112.48.33 - - [01/Sep/1998:05:23:44 -0500] "HEAD (c)/blur/banner.cgi HTTP/1.0" 302 191 "http://www.skating.com/" "UNKNOWN"
24.112.48.33 - - [01/Sep/1998:05:23:44 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43 "http://www.skating.com/" "UNKNOWN"
204.123.9.47 - - [01/Sep/1998:06:04:56 -0500] "GET (c)/skatecity/robots.txt HTTP/1.0" 200 162 "-" "UNKNOWN"
193.13.129.79 - - [01/Sep/1998:06:18:17 -0500] "GET (c)/blur/article.cgi HTTP/1.0" 200 4301 "http://altavista.digital.com/cgi-bin/query?pg=q&kl=XX&q=%22Salomon+inline%22" "UNKNOWN"
195.133.10.89 - - [01/Sep/1998:07:11:49 -0500] "GET (c)/skatecity/ah/ HTTP/1.0" 200 6665 "http://www.yahoo.com/Arts/Humanities/Literature/Genres/" "UNKNOWN"
195.133.10.89 - - [01/Sep/1998:07:11:52 -0500] "GET (c)/skatecity/ah/gfx/uchronia.sml.GIF HTTP/1.0" 200 1343 "http://www.skatecity.com/ah/" "UNKNOWN"
195.133.10.89 - - [01/Sep/1998:07:11:54 -0500] "GET (c)/skatecity/ah/gfx/intro.GIF HTTP/1.0" 200 911 "http://www.skatecity.com/ah/" "UNKNOWN"
155.78.124.187 - - [01/Sep/1998:08:08:51 -0500] "GET (c)/blur/resources/reviews.cgi HTTP/1.0" 200 9723 "http://www.hotbot.com/?SW=web&SM=MC&MT=Rollerblade%2bReviews&DC=10&DE=2&RG=NA&_v=2" "UNKNOWN"
One warning about this conversion process: Besides the non-availability of user agent information, getlogs also does not include the request method (GET, POST or HEAD), and so pwlog makes an educated guess when converting to common log format. Basically, it assumes that all web requests are GETs unless the return code is in the 300s, in which case pwlog decides that the request was a HEAD. It will not assign the POST method to any entry in the log, which is of course quite wrong if you have a lot of CGI scripts running. This should not be a problem when you are running stats, but we include the warning here just so that you know.
Also, pwstat does not recognize Common Log format.
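The -k conversion and its method guess can be sketched in Python as follows. This is an illustration pieced together from the sample output above, not pwlog's own code: the names guess_method and to_common_log are hypothetical, the "-0500" offset is simply copied from the sample, and the input is assumed to be a default pwlog line with eight space-separated columns.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def guess_method(kind: str, status: int) -> str:
    """Guess the request method: GET, unless the return code is in the 300s."""
    if kind == "FTP":
        return "FTP"          # FTP transfers are labelled as such in the sample
    return "HEAD" if 300 <= status <= 399 else "GET"


def to_common_log(line: str, tz: str = "-0500") -> str:
    """Convert one default-format pwlog line to a common log format line."""
    _id, kind, size, stamp, path, host, status, _server = line.split()
    year, month, day, hh, mm, ss = stamp.split(":")
    when = f"{day}/{MONTHS[int(month) - 1]}/{year}:{hh}:{mm}:{ss} {tz}"
    method = guess_method(kind, int(status))
    protocol = "FTP/X.X" if kind == "FTP" else "HTTP/1.0"
    return f'{host} - - [{when}] "{method} {path} {protocol}" {status} {size}'


print(to_common_log(
    "3819 WWW 182 1998:09:01:01:24:58 (u)/Skate 209.240.199.53 301 www1"))
```

Note that a POST can never come out of guess_method, which is exactly the limitation described in the warning above.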
IP-to-Hostname Resolving and the Host Hash File
The way that the -r option in the pwlog and pwstat programs determines the machine names corresponding to the IP numbers in the weblogs is to do a host lookup for each number. However, since most people who hit a good website hit it more than once, doing a lookup for every single entry in a log file would be needlessly repetitious. Thus, the pwlog and pwstat programs maintain a file of matching IP numbers and hostnames, and they check in this file for a match before actually executing an IP lookup. At present, this is done for every single user who executes pwlog and pwstat; there is no Panix-wide common file which all pwlog and pwstat users can access. The name of the hostfile is .pwhosts, and you will find your copy in your home (login) directory.
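The check-the-file-first idea looks roughly like this in Python. Treat it as a sketch under assumptions: the real layout of .pwhosts is not documented here, so the one "IP hostname" pair per line used below is invented, as are the function names. Remembering failed lookups (by storing the IP number itself) is also a guess about a sensible design, not a statement of what pwlog does.

```python
import socket
from pathlib import Path
from typing import Callable, Dict


def load_cache(path: Path) -> Dict[str, str]:
    """Read 'IP hostname' pairs (an assumed format) into a dictionary."""
    cache: Dict[str, str] = {}
    if path.exists():
        for line in path.read_text().splitlines():
            parts = line.split()
            if len(parts) == 2:
                cache[parts[0]] = parts[1]
    return cache


def save_cache(path: Path, cache: Dict[str, str]) -> None:
    """Write the cache back out, one pair per line."""
    path.write_text("".join(f"{ip} {name}\n" for ip, name in sorted(cache.items())))


def reverse_dns(ip: str) -> str:
    """Do the slow lookup; fall back to the IP number if it cannot be resolved."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def resolve(ip: str, cache: Dict[str, str],
            lookup: Callable[[str], str] = reverse_dns) -> str:
    """Return the hostname for ip, doing a real lookup only on a cache miss."""
    if ip not in cache:
        cache[ip] = lookup(ip)
    return cache[ip]
```

A caller would load the cache once, call resolve for every log line, and save the cache at the end, so that repeat visitors cost a dictionary lookup rather than a network round trip.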
The process of converting IP numbers to hostnames can be incredibly slow, whether it occurs in pwlog or in pwstat. In fact, it can be downright maddening if you have a popular site. Lookup time for just a couple of days' worth of hits on my own pages can take over an hour. clay once reported that it took about 10 hours to resolve the new hostnames seen during a week of traffic to his site, and that was back in late 1995, when web traffic was a fraction of what it is now.
Persons with popular sites will also find that their .pwhosts file can get pretty large. Mine, for example, was up to 475 kb by the summer of 1995, after only a few months of traffic to my pages. If you have a popular set of pages, it wouldn't be too long before your copy of .pwhosts was into the megabytes. At that point, it's time to ask if you really need to know the names of all the machines visiting your site.
All this said, you may understand why your time is better spent (and less computing time and disk space wasted) if you do not invoke the -r option in either pwlog or pwstat.
Last Modified: Wednesday, 30-Jan-2013 12:14:10 EST
© Copyright 2006-2021
Public Access Networks Corporation