# Game_Score_scrape_itamar.py
"""
Itamar's rough code for Ethan's project
"""
### Step 1: import packages. If this doesn't work for you, google how to install them (pip install beautifulsoup4); if it still doesn't work, contact me. ###
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
#### Step 2: preprocess - get the game name. You will need this code, but change the game_name variable
#### so that it reads the game name from your CSV file / DataFrame. #######
game_name = "roller coaster"  ## just an example
game_name = game_name.replace(' ', '%20')  ### you should probably do some trial and error to see how much you need to manipulate
### your game name so that the code is robust. I just included one example: add %20 instead of each space. ###
url_to_scrape = 'http://www.metacritic.com/search/all/'+game_name+'/results'
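### Optional sketch, not part of the original workflow: read every game name from a CSV file
### and build one search URL per game. The column name 'name' and the helper names below are
### hypothetical; rename them to match your own file. urllib.parse.quote percent-encodes
### spaces as %20 and also escapes characters such as '&', '/' and accented letters, which a
### plain .replace(' ', '%20') leaves untouched. Whether Metacritic's search handles every
### escaped name well is untested, so keep doing the trial and error mentioned above.
import csv
from urllib.parse import quote


def build_search_url(name):
    """Return the Metacritic search URL for one game name (same URL pattern as above)."""
    # safe='' so that a '/' inside a name is escaped too, instead of acting as a path separator
    return 'http://www.metacritic.com/search/all/' + quote(name.strip(), safe='') + '/results'


def read_game_names(csv_path, column='name'):
    """Yield the non-empty values of `column` from a CSV file that has a header row."""
    with open(csv_path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            value = (row.get(column) or '').strip()
            if value:  # skip rows where the game name is blank
                yield value


### Usage (hypothetical file and column names):
### for name in read_game_names('games.csv'):
###     print(build_search_url(name))
### or, with a pandas DataFrame: df['name'].map(build_search_url)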
try:
    # many sites reject urllib's default User-Agent, so send a browser-like one
    req = Request(url_to_scrape, headers={'User-Agent': 'Mozilla/5.0'})
    webpage = urlopen(req).read()
except Exception as e:
    print('something went wrong')
    print(e)
    print(url_to_scrape)
    raise SystemExit(1)  ### without a page there is nothing to parse, so stop here
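### Optional sketch, not part of the original workflow: the request above wrapped in a reusable
### function for when you loop over many games. It adds a timeout and returns None on failure
### instead of stopping the whole script. The function name and the 10-second default timeout
### are illustrative choices, not values the site requires.
from urllib.request import Request, urlopen


def fetch_page(url, timeout=10.0):
    """Return the raw bytes at `url`, or None if the request fails."""
    try:
        # Same browser-like User-Agent header as the original request
        req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        with urlopen(req, timeout=timeout) as response:
            return response.read()
    except (ValueError, OSError) as e:
        # ValueError: malformed URL; OSError: URLError, HTTPError and timeouts all subclass it
        print('something went wrong:', e, url)
        return None


### Usage: webpage = fetch_page(url_to_scrape), then skip this game if webpage is None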
html = BeautifulSoup(webpage, 'html.parser')  ## BeautifulSoup is a package for navigating the HTML structure
### After exploring the structure of the HTML page, you can see that the score always
### comes within a <span> tag whose class list includes metascore_w
score = None
for span in html.find_all('span'):
    if 'metascore_w' in (span.get('class') or []):  # bs4 returns the class attribute as a list
        print(span)
        score = span.text.strip()
        break
if score is None:
    print('no metascore found on', url_to_scrape)
    raise SystemExit(1)
print(score, type(score))  ### score is your desired score for the game (still a string here)
print(int(score))
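### Optional sketch, not part of the original workflow: the same "first <span> whose class list
### contains metascore_w" search written with the standard library's html.parser instead of
### BeautifulSoup (useful if installing bs4 from step 1 is a problem), plus a conversion that
### returns None for non-numeric text. Assumption: a game without enough reviews shows text
### such as 'tbd' instead of a number, which would make int(score) above raise ValueError.
### The class and function names are hypothetical.
from html.parser import HTMLParser


class MetascoreFinder(HTMLParser):
    """Collect the text of the first <span> whose class list contains `class_name`."""

    def __init__(self, class_name='metascore_w'):
        super().__init__()
        self.class_name = class_name
        self.score_text = None  # stays None if no matching span is found
        self._parts = None      # text pieces of the matching span while we are inside it
        self._depth = 0         # how many spans are nested inside the matching span

    def handle_starttag(self, tag, attrs):
        if tag != 'span' or self.score_text is not None:
            return
        if self._parts is not None:  # a span nested inside the score span
            self._depth += 1
            return
        classes = (dict(attrs).get('class') or '').split()
        if self.class_name in classes:
            self._parts = []
            self._depth = 0

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag != 'span' or self._parts is None:
            return
        if self._depth:  # closing a nested span, keep collecting
            self._depth -= 1
        else:            # closing the score span itself
            self.score_text = ''.join(self._parts).strip()
            self._parts = None


def parse_score(text):
    """Convert score text like ' 85 ' to 85; return None for None, '' or 'tbd'."""
    if text is None:
        return None
    try:
        return int(text.strip())
    except ValueError:
        return None


def find_score(page):
    """Return the first metascore in an HTML page (bytes or str) as an int, or None."""
    if isinstance(page, bytes):
        page = page.decode('utf-8', errors='replace')
    finder = MetascoreFinder()
    finder.feed(page)
    finder.close()
    return parse_score(finder.score_text)


### Usage: score = find_score(webpage)  # an int, or None if no numeric score was found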