[yaala] logfile parse

qMax yaala@verplant.org
Thu, 10 Feb 2005 15:09:06 +0600


Thursday, February 10, 2005, 2:44:55 PM, gehaijiang@baidu.com wrote:
g> 61.234.149.130 - - [09/Feb/2005:00:00:00 +0800] "GET
g> /m?ct=134217728
g> tn=umt,-%20-%20\xd6\xdc\xbd\xdc\xc2\xd7%20-%20\xbb\xd8\xb5\
g> xbd\xb9\xfd\xc8\xa5
g> word=mp3,http://202.196.150.201/mu/ht/zhoujielun/badukongjian/YWQz.mp3
g> lm=16777216 HTTP/1.1" 200 1595 mod_gzip: 
g> 55pct. "-"

Copy the Common.pm parser under a new name and take a look at it:
change %DATAFIELDS and the parse function.

the regexp to match your log is:
/(\d+\.\d+\.\d+\.\d+)\s-\s-\s\[(\d+)\/(\w+)\/(\d\d\d\d):(\d\d):(\d\d):(\d\d) ([+-]\d\d\d\d)\]\s"(\w+)\s([^"]+)\sHTTP\/(\d\.\d)"\s(\d\d\d)\s(\d+)\smod_gzip:\s(\d+)pct\.\s"-"/
where:
($ip,$mday,$Month,$year,$hour,$min,$sec,$tzone,$http_method,$url,$http_version,$http_response,$size,$gzip_count) =
($1,$2,$3,$4,$5,$6,$7,$8,$9,${10},${11},${12},${13},${14});
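
A minimal standalone sketch of the same match, for trying the regexp outside yaala. It assumes the pattern needs 14 capture groups (one per variable above), uses an abbreviated form of the quoted log line, and stores the fields in a hash whose key names are only illustrative. parse_line is a hypothetical helper, not part of yaala's parser interface.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Abbreviated form of the quoted log line (the long tn= and word= parts are left out).
my $line = '61.234.149.130 - - [09/Feb/2005:00:00:00 +0800] '
         . '"GET /m?ct=134217728 lm=16777216 HTTP/1.1" 200 1595 mod_gzip: 55pct. "-"';

# Same pattern as above, split over several lines with /x for readability.
# With /x, literal whitespace in the pattern is ignored, so \s is used throughout.
my $re = qr{
    ^(\d+\.\d+\.\d+\.\d+) \s-\s-\s                                  # client ip
    \[(\d+)/(\w+)/(\d{4}):(\d\d):(\d\d):(\d\d)\s([+-]\d{4})\]\s     # timestamp
    "(\w+)\s([^"]+)\sHTTP/(\d\.\d)"\s                               # request
    (\d{3})\s(\d+)\s                                                # status, size
    mod_gzip:\s(\d+)pct\.\s"-"                                      # mod_gzip ratio
}x;

# Hypothetical helper: returns a hash ref of fields, or undef if the line does not match.
sub parse_line {
    my ($l) = @_;
    my @f = $l =~ $re or return;
    my %rec;
    @rec{qw(ip mday month year hour min sec tzone
            method url http_version status size gzip_pct)} = @f;
    return \%rec;
}

my $rec = parse_line($line);
if ($rec) {
    print "$_ = $rec->{$_}\n" for sort keys %$rec;
} else {
    print "no match\n";
}
```

if a line does not match, parse_line returns undef, so such lines can be counted or skipped instead of being parsed into wrong fields.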

you may replace every "\s" with a literal space; for this log format
it matches the same lines. i used "\s" only to keep the regexp from
being wrapped in mail

then go through the rest of sub parse and adjust it to use your data fields.

if you also need to parse the inside of the url then you need another regexp,
but at first look i cannot figure out what this url contains.

-- 
 qMax