python - Beautiful Soup extracting specific columns


I'm slowly learning Python and BeautifulSoup, but I've been stumped by this.

I am trying to extract the 1st and 4th columns of data from the following layout (reduced in size): http://pastebin.com/btruubrn

The file is stored locally, and at present I have a patchwork of code from other similar issues that I cannot get to work:

    for row in soup.find('table')[0]body.findAll('tr'):
        first_column = row.findAll('td')[0].contents
        third_column = row.findAll('td')[3].contents
        print (first_column, third_column)

There are multiple things wrong with your code. This line:

    soup.find('table')[0]body.findAll('tr'):

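makes no sense. When you use find, it returns a single BeautifulSoup object (a Tag), and you cannot access elements by index on a single object. Wherever you use findAll, it returns a list of BeautifulSoup objects, which means you have to loop over the list or index into it to reach the individual elements. That is why your loop won't work as expected.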
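To make the difference concrete, here is a minimal sketch. The HTML string is a made-up sample for illustration, not the layout from the pastebin, and find_all is the current bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, only for illustration
html = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table")    # a single Tag (or None if nothing matches)
rows = table.find_all("tr")   # a list-like collection of Tag objects

# Indexing is fine on the result of find_all, not on the result of find
first_row_cells = rows[0].find_all("td")
first_cell_text = first_row_cells[0].text
print(type(table).__name__, len(rows), first_cell_text)
```

So the index belongs after find_all (rows[0]), never after find.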

The code below gets what you want:

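    from bs4 import BeautifulSoup

    # Parse the locally stored HTML file
    with open('html_file') as html_file:
        soup = BeautifulSoup(html_file, 'html.parser')

    # find_all is the current bs4 spelling of findAll
    table = soup.find_all('table')[0]
    rows = table.find_all('tr')

    first_columns = []
    third_columns = []
    # rows[1:] skips the first (header) row
    for row in rows[1:]:
        cells = row.find_all('td')
        first_columns.append(cells[0])
        # Index 2 is the third column; use cells[3] for the fourth
        third_columns.append(cells[2])

    for first, third in zip(first_columns, third_columns):
        print(first.text, third.text)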
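Since the question asks for the 1st and 4th columns, the same idea can be sketched with the column indexes as a parameter. The function name extract_columns and the sample table are hypothetical, chosen only for illustration; the real file's layout may differ.

```python
from bs4 import BeautifulSoup

def extract_columns(html, indexes=(0, 3)):
    """Return one tuple per data row with the text of the requested cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    result = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Rows with too few <td> cells (e.g. a <th> header row) are skipped
        if len(cells) <= max(indexes):
            continue
        result.append(tuple(cells[i].get_text(strip=True) for i in indexes))
    return result

# Hypothetical sample table, not the one from the pastebin
sample = """
<table>
  <tr><th>Name</th><th>Age</th><th>City</th><th>Score</th></tr>
  <tr><td>Ann</td><td>31</td><td>Oslo</td><td>88</td></tr>
  <tr><td>Bob</td><td>45</td><td>Rome</td><td>92</td></tr>
</table>
"""
for first, fourth in extract_columns(sample):
    print(first, fourth)
```

Skipping rows by cell count, rather than slicing off the first row, is one way to avoid assuming exactly one header row.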
