Yelp - File Operation in Python
What I am trying to do:
I am trying to use 'open' in Python in the script I am trying to execute. I want to give a "restaurant name" as input, and have a file saved (reviews.txt).
Script: (in short, the script goes to a page and scrapes the reviews)
    from bs4 import BeautifulSoup
    from urllib import urlopen

    queries = 0
    while queries < 201:
        stringq = str(queries)
        page = urlopen('http://www.yelp.com/biz/madison-square-park-new-york?start=' + stringq)
        soup = BeautifulSoup(page)
        reviews = soup.findAll('p', attrs={'itemprop':'description'})
        authors = soup.findAll('span', attrs={'itemprop':'author'})
        flag = True
        indexof = 1
        for review in reviews:
            dirtyentry = str(review)
            while dirtyentry.index('<') != -1:
                indexof = dirtyentry.index('<')
                endof = dirtyentry.index('>')
                if flag:
                    dirtyentry = dirtyentry[endof+1:]
                    flag = False
                else:
                    if(endof+1 == len(dirtyentry)):
                        cleanentry = dirtyentry[0:indexof]
                        break
                    else:
                        dirtyentry = dirtyentry[0:indexof]+dirtyentry[endof+1:]
            f=open("reviews.txt", "a")
            f.write(cleanentry)
            f.write("\n")
            f.close
        queries = queries + 40

Problem: The script uses append mode 'a'. According to the documentation, 'w' is the write mode, which overwrites the file. But when I change it to 'w', nothing happens.

    f=open("reviews.txt", "w") #does not work!

Actual question (edit, to clear up the confusion):
I want ONE reviews.txt file with all the reviews. Every time I run the script, I want it to overwrite the existing reviews.txt with the new reviews according to my input.
Thank you,
If I understand what behavior you want, then this should be the right code:

    with open("reviews.txt", "w") as f:
        for review in reviews:
            dirtyentry = str(review)
            while dirtyentry.index('<') != -1:
                indexof = dirtyentry.index('<')
                endof = dirtyentry.index('>')
                if flag:
                    dirtyentry = dirtyentry[endof+1:]
                    flag = False
                else:
                    if(endof+1 == len(dirtyentry)):
                        cleanentry = dirtyentry[0:indexof]
                        break
                    else:
                        dirtyentry = dirtyentry[0:indexof]+dirtyentry[endof+1:]
            f.write(cleanentry)
            f.write("\n")

This will open the file for writing only once and write all the entries to it. Otherwise, if the open call is nested in the for loop, the file is opened again for each review in 'w' mode, so each review is overwritten by the next one.
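The difference can be checked without scraping anything. The following is only a minimal sketch, not the script from the question: `write_reopening_each_time`, `write_opening_once` and `read_lines` are hypothetical helper names, and the review strings are made up for illustration.

```python
import os
import tempfile


def write_reopening_each_time(path, entries):
    """Reopen the file in "w" mode for every entry (the pattern to avoid)."""
    for entry in entries:
        # "w" truncates the file on every open, so earlier entries are lost.
        with open(path, "w") as f:
            f.write(entry + "\n")


def write_opening_once(path, entries):
    """Open the file in "w" mode a single time and write every entry."""
    with open(path, "w") as f:
        for entry in entries:
            f.write(entry + "\n")


def read_lines(path):
    """Return the file content as a list of lines without newlines."""
    with open(path) as f:
        return f.read().splitlines()


if __name__ == "__main__":
    entries = ["first review", "second review", "third review"]  # made-up data
    path = os.path.join(tempfile.mkdtemp(), "reviews.txt")

    write_reopening_each_time(path, entries)
    print(read_lines(path))  # only the last entry survives

    write_opening_once(path, entries)
    print(read_lines(path))  # all three entries; the old content is replaced
```

The same reasoning seems to apply one level up in the question's script: since `reviews` is rebuilt for every page inside the `while queries < 201:` loop, the `with open(...)` line would probably need to sit before that loop, otherwise each page of results overwrites the previous one.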
The with statement ensures that when your program leaves the block, the file is closed. It also makes the code easier to read.
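This matters for the question's script, which writes `f.close` without parentheses, so the method is never actually called. A small sketch of the difference, using a temporary file and a made-up review string (the variable names here are illustrative only):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "reviews.txt")

# Without parentheses, f.close only refers to the method; nothing is called.
f = open(path, "w")
f.write("made-up review\n")
f.close
closed_without_call = f.closed  # False: the file is still open

f.close()                       # the parentheses perform the call
closed_after_call = f.closed    # True

# A with block closes the file as soon as the block is left.
with open(path, "w") as g:
    g.write("made-up review\n")
closed_after_with = g.closed    # True

print(closed_without_call, closed_after_call, closed_after_with)
# prints: False True True
```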
I'd also suggest that you avoid putting brackets around the condition of an if statement. Instead of

    if(endof+1 == len(dirtyentry)):

it's better to write just

    if endof + 1 == len(dirtyentry):