I want to crawl the KBO standings with Python.
from bs4 import BeautifulSoup
from urllib.request import urlopen

response = urlopen('https://www.koreabaseball.com/TeamRank/TeamRank.aspx')
soup = BeautifulSoup(response, 'html.parser')
i = 1
data = ""
for anchor in soup.select("tbody"):
    data += anchor.get_text() + "\n"
    i += 1
a = data.split("\n")
print(data.replace(' ', ''))
This is the code I have right now, and it prints the following:
1
NC
65
44
19
2
0.698
0
6승2무2패
5승
23-1-9
21-1-10
2
두산
66
39
27
0
0.591
6.5
6승0무4패
1패
18-0-13
21-0-14
3
키움
68
38
30
0
0.559
8.5
3승0무7패
3패
23-0-11
15-0-19
4
KIA
64
35
29
0
0.547
9.5
6승0무4패
2승
20-0-11
15-0-18
5
LG
66
35
30
1
0.538
10
5승1무4패
1승
17-1-17
18-0-13
6
삼성
66
34
32
0
0.515
11.5
4승0무6패
2패
21-0-15
13-0-17
7
KT
66
32
33
1
0.492
13
5승1무4패
1패
19-0-15
13-1-18
8
롯데
64
31
33
0
0.484
13.5
5승0무5패
1승
18-0-11
13-0-22
9
SK
67
23
44
0
0.343
23
6승0무4패
3승
14-0-20
9-0-24
10
한화
68
17
51
0
0.250
29.5
2승0무8패
7패
9-0-24
8-0-27
NC■6-3-05-4-02-3-02-1-15-2-08-1-12-1-07-2-07-2-044-19-2두산3-6-0■2-2-07-2-07-3-03-3-03-2-05-3-06-3-03-3-039-27-0키움4-5-02-2-0■4-5-06-3-04-5-03-3-03-4-06-3-06-0-038-30-0KIA3-2-02-7-05-4-0■1-2-04-5-04-5-06-1-04-2-06-1-035-29-0LG1-2-13-7-03-6-02-1-0■4-5-03-4-03-3-07-2-09-0-035-30-1삼성2-5-03-3-05-4-05-4-05-4-0■2-6-06-3-04-2-02-1-034-32-0KT1-8-12-3-03-3-05-4-04-3-06-2-0■2-7-03-0-06-3-032-33-1롯데1-2-03-5-04-3-01-6-03-3-03-6-07-2-0■3-3-06-3-031-33-0SK2-7-03-6-03-6-02-4-02-7-02-4-00-3-03-3-0■6-4-023-44-0한화2-7-03-3-00-6-01-6-00-9-01-2-03-6-03-6-04-6-0■17-51-0
That is the output I get. How can I turn it into lines like "1위 - NC" (rank, then team name)?
1 Answer
data = """1 NC 67 44 21 2 0.677 0 5승1무4패 2패 23-1-9 21-1-12
2 두산 68 40 28 0 0.588 5.5 6승0무4패 1패 19-0-14 21-0-14
3 KIA 66 37 29 0 0.561 7.5 6승0무4패 4승 22-0-11 15-0-18
4 키움 70 39 31 0 0.557 7.5 4승0무6패 1승 24-0-12 15-0-19
5 LG 68 36 31 1 0.537 9 6승0무4패 1승 17-1-17 19-0-14
6 KT 68 34 33 1 0.507 11 5승1무4패 2승 21-0-15 13-1-18
7 삼성 68 34 34 0 0.500 11.5 4승0무6패 4패 21-0-15 13-0-19
8 롯데 66 32 34 0 0.485 12.5 5승0무5패 1패 18-0-11 14-0-23
9 SK 68 24 44 0 0.353 21.5 6승0무4패 4승 14-0-20 10-0-24
10 한화 69 17 52 0 0.246 29 1승0무9패 8패 9-0-25 8-0-27""".split()  # split on any whitespace

# Each team row has 12 fields: index 0 of a row is the rank, index 1 is the team name.
rank = [val for idx, val in enumerate(data) if idx % 12 == 0]
team = [val for idx, val in enumerate(data) if idx % 12 == 1]
print(rank)
# ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
print(team)
# ['NC', '두산', 'KIA', '키움', 'LG', 'KT', '삼성', '롯데', 'SK', '한화']

# Pair each rank with its team (Python 3 no longer allows tuple parameters in lambda).
result = [val + '위 - ' + team[idx] for idx, val in enumerate(rank)]
print(result)
# ['1위 - NC', '2위 - 두산', '3위 - KIA', '4위 - 키움', '5위 - LG', '6위 - KT', '7위 - 삼성', '8위 - 롯데', '9위 - SK', '10위 - 한화']
The result comes out as a list of strings shaped like "1위 - NC".
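The same 12-fields-per-row idea can be applied to the scraped text itself instead of a pasted string. The following is only a sketch under some assumptions: the helper name `rank_lines` is made up for illustration, the input is the text of the first table only (for example `soup.select_one("tbody").get_text()`, so the head-to-head table is left out), and every team row still has exactly 12 cells, one per line. The sample text is copied from the first two rows of the output in the question.

```python
def rank_lines(tbody_text, fields_per_row=12):
    """Turn one-cell-per-line standings text into 'N위 - TEAM' strings.

    Hypothetical helper: assumes each team row contributes exactly
    `fields_per_row` non-blank lines, with rank first and team name second.
    """
    # Drop blank lines and remove whitespace inside each cell,
    # mirroring the replace(' ', '') step in the question.
    fields = ["".join(line.split()) for line in tbody_text.splitlines() if line.strip()]

    lines = []
    # Walk the flat list one complete row at a time.
    for start in range(0, len(fields) - fields_per_row + 1, fields_per_row):
        rank, team = fields[start], fields[start + 1]
        lines.append(f"{rank}위 - {team}")
    return lines


# Sample input: the first two rows of the output shown in the question.
sample = """
1
NC
65
44
19
2
0.698
0
6승2무2패
5승
23-1-9
21-1-10
2
두산
66
39
27
0
0.591
6.5
6승0무4패
1패
18-0-13
21-0-14
"""

for line in rank_lines(sample):
    print(line)
```

If the site adds or removes a column, `fields_per_row` has to change with it, so counting fields like this is fragile compared with reading the cells row by row from the HTML.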