python - Finding patterns in two files -

I have to analyse 10 years' worth of data, with 50+ files for each year. I extracted the data from the internet and have extracted the text with regular expressions. The format of the files differs each year, and I'm not sure the pattern is consistent even within the files of an individual year. The format for 2003 seems to be:

title (.*)

[header(.*)

color number number number string (\w+\s\d+\s\d+\s\d+\s.+)

color number number number string

color number number number string

color number number number string]<==== 1 block

header

color number number number string

color number number number string

color number number number string

color number number number string

........

My question is: is there a way to write a program in Python that identifies patterns within the text files of a given year?

A kind of pattern recognition, where the program outputs a regular expression that matches one block of data, perhaps.

I will be using the data for linear algebra, but I want the data to be accessible and organized for other uses.

If it's possible, maybe you should go simpler and check whether each line of each block of data has the same length when split by space (or tab, or whatever token divides each column). From there you can create a tree of the data. Something like:

{
    title: {
        block: [
            [color, number, number, number, string],
            [color, number, number, number, string]
        ]
    },
    title: ...
}
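A minimal sketch of that idea follows. It is based on several assumptions that are not confirmed by the question: the first non-blank line of a file is the title, every data row matches a slightly loosened version of the row regex from the question, and any other line is a header that starts a new block. The names `ROW_RE`, `parse_file`, and `field_count_histogram` are hypothetical helpers made up for this illustration.

```python
import re
from collections import Counter

# Row pattern adapted from the question: color, three numbers, free text.
# Assumption: headers never look like "word number number number text".
ROW_RE = re.compile(r"^(\w+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.+)$")


def parse_file(text):
    """Return {title: {header: [[color, n1, n2, n3, string], ...]}}."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    title, body = lines[0], lines[1:]
    blocks = {}
    current = None
    for line in body:
        match = ROW_RE.match(line)
        if match:
            if current is None:
                # Rows that appear before any header go under a placeholder.
                current = blocks.setdefault("(no header)", [])
            color, a, b, c, label = match.groups()
            current.append([color, int(a), int(b), int(c), label])
        else:
            # Any non-row line is assumed to start a new block.
            # Repeated header names are merged into the same block.
            current = blocks.setdefault(line, [])
    return {title: blocks}


def field_count_histogram(text):
    """Count fields per non-blank line, capping the split at 5 fields
    so that spaces inside the trailing string do not inflate the count."""
    return Counter(
        len(ln.split(None, 4)) for ln in text.splitlines() if ln.strip()
    )


# Made-up sample data in the shape described for 2003.
sample = """Sample title 2003
Header A
red 1 2 3 first item
blue 4 5 6 second item
Header B
green 7 8 9 third
"""

tree = parse_file(sample)
histogram = field_count_histogram(sample)
```

Running `field_count_histogram` over every file of a year is a cheap way to see whether the files share a layout: if nearly all lines fall into the same one or two field counts, a single row regex is probably enough, and the outliers are the lines worth inspecting by hand.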

If the data is more irregular than that, you could try using third-party libraries to either (1) clean up the HTML you are scraping the data from or (2) use natural language processing to tokenize/parse the data, but that seems like overkill.
