Python regex, how to capture multiple rules from 1 string
I've got a quick question about regex. I have a file (testlog-date.log) that has lines like this:
# 2014-04-09 16:43:15,136|pid: 1371|info|test.controller.root|finished processing request in 0.003355s https://website/heartbeat
I'm looking to use regex to capture the pid and the time. So far I have this:
import re
file_handler = open("testlog-20140409.log", "r")
for line in file_handler:
    var1 = re.findall(r'(\d+.\d+)s', line)
    print var1
file_handler.close()
So I'm able to print the process time. The question is how to capture the pid (and possibly other information) in the variable var1. I tried doing:
var1 = re.findall(r'pid: (\d+) (\d+.\d+)s', line)
It prints out empty structures.
Any help is much appreciated, thanks!
Followup: the file is quite large. I'm thinking of storing the data in one structure, sorting it by process time, and printing out the top 20. Any idea how to do this properly?
Use the regex (.*)\|(pid: .*)\|(.*)\|(.*)\|(.*). Each pair of parentheses in the regex pattern denotes a separate group.
In [125]: text = '2014-04-09 16:43:15,136|pid: 1371|info|test.controller.root|finished processing request in 0.003355s https://website/heartbeat'
In [126]: pattern = re.compile(r'(.*)\|(pid: .*)\|(.*)\|(.*)\|(.*)')
In [127]: results = re.findall(pattern, text)
In [128]: results
Out[128]: [('2014-04-09 16:43:15,136', 'pid: 1371', 'info', 'test.controller.root', 'finished processing request in 0.003355s https://website/heartbeat')]
So you have a tuple in which each element belongs to one of the groups (timestamp, pid, log level, routine, and log message).
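The pattern tried in the question, pid: (\d+) (\d+.\d+)s, finds nothing because it expects a single space between the pid and the duration, while the log line has several fields in between. Below is a minimal sketch of capturing both values in one pass, assuming every line follows the format shown in the question. The names LINE_RE and parse_line are made up for illustration.

```python
import re

# Hypothetical names; the pattern assumes the log format from the question.
# '.*?' lazily skips the fields between the pid and the duration.
LINE_RE = re.compile(r'pid: (?P<pid>\d+)\|.*?in (?P<seconds>\d+\.\d+)s')

def parse_line(line):
    """Return (pid, seconds) for a matching line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return int(match.group('pid')), float(match.group('seconds'))

sample = ('2014-04-09 16:43:15,136|pid: 1371|info|test.controller.root|'
          'finished processing request in 0.003355s https://website/heartbeat')
print(parse_line(sample))  # (1371, 0.003355)
```

Named groups keep the result readable when more fields are added later, and converting to int and float up front makes the values directly sortable.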
Edit
For large files, regex can be time consuming. Your log lines have a '|' delimiter, so you can use split() on each line.
all_lines = []
for line in file:
    # strip the trailing newline so the last field stays clean
    all_lines.append(line.rstrip('\n').split('|'))
This stores the data as a list of lists:
[['2014-04-09 16:43:15,136','pid: 1371','info','test.controller.root','finished processing request in 0.003355s https://website/heartbeat'], ..., ...]
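As a rough sketch of working with one of these sub-lists, the fields can be unpacked into names and the pid converted to a number. This assumes every line has exactly the five fields shown in the question; split_log_line is a hypothetical helper name.

```python
def split_log_line(line):
    """Split one log line into its five '|'-separated fields."""
    # maxsplit=4 keeps any '|' inside the message in the last field.
    return line.rstrip('\n').split('|', 4)

sample = ('2014-04-09 16:43:15,136|pid: 1371|info|test.controller.root|'
          'finished processing request in 0.003355s https://website/heartbeat\n')

fields = split_log_line(sample)
timestamp, pid_field, level, routine, message = fields

# 'pid: 1371' -> 1371; int() ignores the leading space.
pid = int(pid_field.split(':', 1)[1])
print(pid, level, routine)  # 1371 info test.controller.root
```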
To sort all_lines, you can use the sorted() function and pass the first field of each sub-list as the sort key.
sorted_lines = sorted(all_lines, key=lambda x: x[0])
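Note that x[0] is the timestamp, so this call orders the lines chronologically. For the follow-up in the question (the 20 slowest requests), here is a minimal sketch, assuming the duration always appears as "in <seconds>s" somewhere in the line. The names DURATION_RE and top_slowest are hypothetical, and the demo lines are made up.

```python
import heapq
import re

# Assumed duration format, e.g. 'in 0.003355s'.
DURATION_RE = re.compile(r'in (\d+\.\d+)s')

def top_slowest(lines, n=20):
    """Return the n (seconds, line) pairs with the largest durations."""
    timed = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match:  # skip lines that carry no duration
            timed.append((float(match.group(1)), line.rstrip('\n')))
    # nlargest avoids sorting the whole list when only the top n are needed.
    return heapq.nlargest(n, timed, key=lambda pair: pair[0])

# Made-up demo lines in the same '|'-delimited shape as the question's log.
demo = [
    'a|pid: 1|info|r|finished processing request in 0.5s https://website/a',
    'b|pid: 2|info|r|finished processing request in 2.25s https://website/b',
    'c|pid: 3|info|r|no duration here',
]
for seconds, line in top_slowest(demo, n=2):
    print(seconds, line)
```

With a real log, an open file object can be passed directly as lines, since the function only iterates over it once.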