python - Beautiful Soup: extracting specific columns
I am slowly learning Python and BeautifulSoup, but I have been stumped by this.
I am trying to extract the 1st and 4th columns of data from the following layout (reduced in size): http://pastebin.com/btruubrn
The file is stored locally, and at present I have a patchwork of code from other similar issues that I cannot get to work:
    for row in soup.find('table')[0]body.findAll('tr'):
        first_column = row.findAll('td')[0].contents
        third_column = row.findAll('td')[3].contents
        print (first_column, third_column)
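There are multiple things wrong with your code. This line:
    soup.find('table')[0]body.findAll('tr'):
makes no sense. When you use find, it returns a single BS object (a Tag), and you cannot access elements by index on a single object. Wherever you use findAll (find_all is the preferred spelling in current bs4), it returns a list of BS objects, which means you have to loop over the individual elements or index into the list. That is the reason your loop won't work as expected.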
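To make the difference concrete, here is a minimal sketch. The HTML string and the variable names are made up for illustration and are not the asker's data; it assumes bs4 is installed and uses the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table, used only for illustration.
html = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table")    # a single Tag (or None if nothing matches)
rows = table.find_all("tr")   # a list-like ResultSet of Tags

print(type(table).__name__)   # Tag
print(len(rows))              # 2

# Indexing by position works on the result of find_all, not on the result of find.
first_cell_texts = [row.find_all("td")[0].get_text() for row in rows]
print(first_cell_texts)       # ['a', 'c']
```

So the rule of thumb is: call find when you want one element, and find_all when you want something you can loop over or index.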
The code below gets what you want:
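    from bs4 import BeautifulSoup

    # Parse the locally stored HTML file with the built-in parser.
    with open('html_file') as html_file:
        soup = BeautifulSoup(html_file, 'html.parser')

    table = soup.find_all('table')[0]   # first table in the document
    rows = table.find_all('tr')

    first_columns = []
    third_columns = []
    for row in rows[1:]:                # skip the header row
        cells = row.find_all('td')
        first_columns.append(cells[0])
        # Index 2 is the third column; use cells[3] if you need the fourth.
        third_columns.append(cells[2])

    for first, third in zip(first_columns, third_columns):
        print(first.text, third.text)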
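Since the question asks for the 1st and 4th columns, a variant that takes the column positions as a parameter may be closer to what is needed. This is only a sketch: the sample HTML, the function name extract_columns, and its indexes parameter are assumptions for illustration, because the real layout lives in the pastebin.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the real layout is in the pastebin linked in the question.
html = """
<table>
  <tr><th>Name</th><th>A</th><th>B</th><th>Score</th></tr>
  <tr><td>foo</td><td>1</td><td>2</td><td>10</td></tr>
  <tr><td>bar</td><td>3</td><td>4</td><td>20</td></tr>
  <tr><td>short row</td></tr>
</table>
"""

def extract_columns(markup, indexes=(0, 3)):
    """Return one tuple of cell texts per data row, for the given column indexes."""
    soup = BeautifulSoup(markup, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    result = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Skip header rows (no <td>) and rows too short to index safely.
        if len(cells) <= max(indexes):
            continue
        result.append(tuple(cells[i].get_text(strip=True) for i in indexes))
    return result

pairs = extract_columns(html)
print(pairs)  # [('foo', '10'), ('bar', '20')]
```

Checking the number of cells before indexing avoids an IndexError on header rows or rows with merged or missing cells, which is a common reason this kind of loop fails on real pages.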