Python - reading lines of text, exploring HTML, and writing to a document within a single for loop
I was going to try to ask this question in a general way, but I realized it's a little too complicated for me to describe generally. So here it is specifically:
I'm not a programmer. I'm a master's candidate in experimental psychology, and as a side project for a statistics class I created a model predicting game purchases on Steam. I started learning how to program in order to collect data for the project.
My program so far is as follows:
    import urllib2

    # The first line opens the list of random Steam IDs I created;
    # the second assigns them to a variable.
    list = open('d:\python\steamuserids.txt').read().splitlines()
    steamid = str(list)

    # For the purposes of figuring things out, I'm only using the first 10 entries in the list.
    # The next four statements make the URL requests and assign the output to the variable "response".
    for steamid in list[0:10:1]:
        request = urllib2.Request(
            'http://api.steampowered.com/iplayerservice/getownedgames/v0001/?key=mysteamapikey&steamid=%s' % steamid,
            headers={"user-agent": "mozilla/5.0 (windows nt 6.2; win64; x64) applewebkit/537.36 (khtml, gecko) chrome/32.0.1667.0 safari/537.36"})
        response = urllib2.urlopen(request)
        request2 = urllib2.Request(
            'http://api.steampowered.com/isteamuser/getplayersummaries/v0001/?key=mysteamapikey&steamids=%s' % steamid,
            headers={"user-agent": "mozilla/5.0 (windows nt 6.2; win64; x64) applewebkit/537.36 (khtml, gecko) chrome/32.0.1667.0 safari/537.36"})
        response2 = urllib2.urlopen(request2)

        # Write both responses to one file per user.
        f1 = open('d:/python/steam/userdata/%s playerinfo.txt' % (steamid, ), 'w')
        for lines in response.readlines():
            f1.write(lines)
        for lines in response2.readlines():
            f1.write(lines)
        f1.close()
So far, the program is working great. It does what it needs to do, but I need more information. Unfortunately, I haven't found a way to access the additional variables I'm interested in through the Steam API. However, the rest of the information I'm interested in is available in the HTML source of users' profiles on Steam. This is where I'm having problems. I can get the profile URL from a line in the second request.
In the response to the second URL request, there's a line that reads either:
"profileurl": "http://steamcommunity.com/id/playerid/"
where "playerid" is a string that the user creates themselves,
or
"profileurl": "http://steamcommunity.com/profiles/steamnumber/"
where "steamnumber" is a number generated by Steam (this is the same number used in the steamid variable). I think this form is used when the user has not created a custom name for their profile.
Problem 1: I am having difficulty printing just the player URLs above. I have been trying to use "profileurl": as a target and line.split() to capture the URL, but I end up with funky characters indicating tabs and returns, and I'm not sure how to get rid of the quotes.
Problem 2: When I get the HTML page, I can find the data by hand, but I'm not sure how to tell Python to get it. One of the pieces of information I'm interested in is the number of reviews a person has made. I can find that information in this part of the HTML:
    <div class="profile_count_link">
    <a href="http://steamcommunity.com/id/steamuser/recommended/">
    <span class="count_link_label">reviews</span>
    <span class="profile_count_link_total">
    3
For these sections, I'm only interested in the number, but I'm at a loss for how to capture it when it's on a different line from the text I'm using as a reference.
Problem 3: Is it possible to keep this code within my current program and within the loop, so that the numbers end up in the same document? I have tried appending a piece of code to find the profile URL, but after I tried that, I started losing parts of the previous responses.
Sorry for the long-winded post.
When making calls to the Steam API, append &format=json to the URL, i.e., to the URLs below:
http://api.steampowered.com/iplayerservice/getownedgames/v0001/?key=mysteamapikey&steamid=%s
http://api.steampowered.com/isteamuser/getplayersummaries/v0001/?key=mysteamapikey&steamids=%s
I think the default format it returns is JSON anyway, but there is no harm in making it explicit.
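As a rough sketch of the suggestion above, the query string can be assembled with the standard library so that format=json is always included. This is written for Python 3 (the question uses Python 2's urllib2); `build_api_url` is a hypothetical helper name, and the key and Steam ID are placeholders, not real values.

```python
from urllib.parse import urlencode

# Endpoint copied from the question; the key and ID used below are placeholders.
OWNED_GAMES_URL = "http://api.steampowered.com/iplayerservice/getownedgames/v0001/"


def build_api_url(base_url, api_key, steamid, id_param="steamid"):
    """Return base_url with the key, the Steam ID and an explicit format=json."""
    params = [("key", api_key), (id_param, steamid), ("format", "json")]
    return base_url + "?" + urlencode(params)


url = build_api_url(OWNED_GAMES_URL, "mysteamapikey", "12345678901234567")
print(url)
```

For the second URL in the question, the same helper would be called with `id_param="steamids"`.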
Once you have the result, use Python's json module to load the data into a JSON object:
data = json.load(response)
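Problem 1:
You can access the profile URL with data["profileurl"]. You do not need string split functions to achieve that.
Note: you will need to change the way you access profileurl depending on the structure of the JSON response returned by the Steam API, since the key may be nested inside other objects rather than sitting at the top level. Learn how the JSON data is formatted and you will have an idea of how to access the data in it.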
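If you would rather not hard-code the nesting, one option is to search the parsed data for the key. The sketch below is Python 3 and only an illustration: `find_key` is a hypothetical helper, and the sample body is made up, so its nesting is an assumption rather than the documented response format.

```python
import io
import json


def find_key(obj, key):
    """Depth-first search for the first value stored under `key`; None if absent."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Made-up response body; the nesting shown here is an assumption.
sample_body = b'''
{"response": {"players": [
    {"steamid": "12345678901234567",
     "profileurl": "http://steamcommunity.com/id/playerid/"}
]}}
'''

# io.BytesIO stands in for the file-like object returned by urlopen().
data = json.load(io.BytesIO(sample_body))
profile_url = find_key(data, "profileurl")
print(profile_url)
```

Because the value comes back as an ordinary Python string, there are no quotes, tabs or newlines to strip.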
Problem 2:
To get content out of a particular piece of HTML, you can use the BeautifulSoup library. To get the review count from the HTML above using BeautifulSoup, do:
    from bs4 import BeautifulSoup

    html = '''
    <div class="profile_count_link">
        <a href="http://steamcommunity.com/id/steamuser/recommended/">
            <span class="count_link_label">reviews</span>
            <span class="profile_count_link_total"> 3 </span>
        </a>
    </div>
    '''

    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(html, 'html.parser')
    # find() returns the first <span> whose class matches.
    review_count = soup.find('span', attrs={'class': 'profile_count_link_total'})
    print(review_count.text.strip())  # prints 3
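If installing BeautifulSoup is not an option, roughly the same thing can be done with the standard library's html.parser. This is a swapped-in alternative, not what the code above does, and it is a minimal Python 3 sketch: `ClassTextParser` is a hypothetical class, and it assumes the matching tag contains no unclosed void tags such as a bare `<br>`.

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Collect the text inside every tag that carries a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while inside a matching tag
        self.matches = []   # raw text of each matching tag

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1          # nested tag inside a match
        elif self.class_name in classes:
            self.depth = 1           # entering a matching tag
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


# The number sits on its own line, as in the question.
html = """
<div class="profile_count_link">
  <a href="http://steamcommunity.com/id/steamuser/recommended/">
    <span class="count_link_label">reviews</span>
    <span class="profile_count_link_total">
      3
    </span>
  </a>
</div>
"""

parser = ClassTextParser("profile_count_link_total")
parser.feed(html)
review_count = int(parser.matches[0].strip())
print(review_count)
```

Either way, the lookup is keyed on the tag and its class rather than on line positions, so it does not matter that the number is on a different line from the label.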
Problem 3:
I am not entirely sure what you are asking here. Start out with the things suggested above, and you will have a clearer picture of how to go about this problem.
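If the question is whether everything can be written to one file per user from inside the loop, a possible shape is sketched below (Python 3). It is a guess at the cause, but opening the same file again with mode 'w' truncates it, which would explain losing earlier output; here the file is opened once and every piece is written before it closes. `write_player_file` and the sample values are hypothetical stand-ins for the API responses and the scraped count.

```python
import os
import tempfile


def write_player_file(path, sections):
    """Write every (label, text) pair for one player into a single file."""
    # Mode 'w' truncates on open, so open once and write all sections
    # before the file is closed.
    with open(path, "w") as f:
        for label, text in sections:
            f.write("%s: %s\n" % (label, text))


# Made-up values standing in for what the loop would collect per Steam ID.
players = {
    "12345678901234567": [
        ("profileurl", "http://steamcommunity.com/id/playerid/"),
        ("reviews", 3),
    ],
}

out_dir = tempfile.mkdtemp()
for steamid, sections in players.items():
    out_path = os.path.join(out_dir, "%s playerinfo.txt" % steamid)
    write_player_file(out_path, sections)
```

Collecting the pieces first and writing them in one place also keeps the fetching, parsing and writing steps easy to test separately.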