Edit history

Edit by an unknown user

Date: 2020.02.24

I'm trying to crawl a web page. Some items are crawled, but others are not, and I can't figure out why.

python

crawling

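Among the elements on the page, there is one item I cannot crawl. What do I need to do to crawl it?

Items that are crawled (2): addrs, a_earths

Item that is not crawled (1): points

The last selector, "points = soup.select('.addr_point')", does not get crawled (inside the red dashed box). I can't figure out why.

Any advice would be appreciated.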

import urllib.request  # urllib.request (not urllib.parse) provides Request and urlopen
from bs4 import BeautifulSoup
import re

url = 'http://www.dooinauction.com/auction/ca_list.php'

req = urllib.request.Request(url)  # build the request for the listing page
html = urllib.request.urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser')  # parse with BeautifulSoup

tots = soup.select('div.title_left font')  # extract the total item count, etc.
tot = int(re.findall(r'\d+', tots[0].text)[0])  # first run of digits in the first match
print(f'Number of items: {tot}')

url = f'http://www.dooinauction.com/auction/ca_list.php?total_record={tot}&search_fm_off=1&search_fm_off=1&start=0'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')  # parse with BeautifulSoup

addrs = soup.select('.addr')  # crawled OK
a_earths = soup.select('.list_class.bold')  # crawled OK
points = soup.select('.addr_point')  # NOT crawled
print()
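
One way to narrow down why a selector such as '.addr_point' returns nothing is to check whether the class name appears in the downloaded HTML at all. The sketch below is only a diagnostic aid, not a confirmed fix for this site: `diagnose_selector` is a hypothetical helper name and the sample HTML strings are made up for illustration. It assumes the class name is plain ASCII, so the page's real encoding does not matter for the substring check.

```python
from bs4 import BeautifulSoup


def diagnose_selector(html, selector, marker):
    """Report whether `marker` occurs in the raw HTML and how many
    elements `selector` matches after parsing with html.parser."""
    if isinstance(html, bytes):
        # The marker is assumed to be ASCII, so a lossy decode is good enough here.
        text = html.decode('utf-8', errors='replace')
    else:
        text = html
    soup = BeautifulSoup(html, 'html.parser')
    return {
        'marker_in_raw_html': marker in text,
        'matches': len(soup.select(selector)),
    }


# Made-up sample pages: one without the element, one with it.
page_without = '<table><tr><td class="addr">A</td></tr></table>'
page_with = '<div class="addr"><span class="addr_point">B</span></div>'

print(diagnose_selector(page_without, '.addr_point', 'addr_point'))
print(diagnose_selector(page_with, '.addr_point', 'addr_point'))
```

If the marker is missing from the raw HTML, the element is probably not part of the server's response to this request (for example, it may be added by JavaScript or depend on request headers or cookies). If the marker is present but nothing matches, the selector or the parser's handling of the markup is the more likely suspect, and trying another Beautiful Soup parser such as 'lxml' or 'html5lib' (if installed) may be worth a try.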

Edit by 편집요청빌런

Date: 2020.02.18

Some parts of the page can be crawled and others can't. I can't figure out why. :(

python

crawling

Most of the elements on the page are crawled, but the last selector, "points = soup.select('.addr_point')", does not get crawled (inside the red dashed box). I can't figure out why.

Any advice would be appreciated.

import urllib.request  # urllib.request (not urllib.parse) provides Request and urlopen
from bs4 import BeautifulSoup
import re

url = 'http://www.dooinauction.com/auction/ca_list.php'

req = urllib.request.Request(url)  # build the request for the listing page
html = urllib.request.urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser')  # parse with BeautifulSoup

tots = soup.select('div.title_left font')  # extract the item count; several strings come back, e.g. 물건수, 14000건, FF5500
tot_123i = int(re.findall(r'\d+', tots[0].text)[0])  # first run of digits in the first match
print(f'Number of items: {tot_123i}')

url = f'http://www.dooinauction.com/auction/ca_list.php?total_record={tot_123i}&search_fm_off=1&search_fm_off=1&start=0'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')  # parse with BeautifulSoup

d_nums = soup.select('.first.list_img > img')
nums = soup.select('.no')
addrs = soup.select('.addr')
a_earths = soup.select('.list_class.bold')
jprices = soup.select('td.price > div:nth-child(1)')
lprices = soup.select('td.price > div.clr_blue')
decs = soup.select('div.price.clr_blue')
stimes = soup.select('td.date')
points = soup.select('.addr_point')
print()
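
The comment on `tots` in the code above notes that the selector returns several strings (a label, a count such as 14000건, a color code). The count extraction can be tried on its own, without the network, with a small helper. This is a sketch only: `first_int` is a hypothetical name, and the handling of thousands separators is an assumption, since the original post does not say whether the site prints them.

```python
import re


def first_int(text):
    """Return the first run of digits in `text` as an int, ignoring
    comma thousands separators; return None if there are no digits."""
    match = re.search(r'\d[\d,]*', text)
    if match is None:
        return None
    return int(match.group().replace(',', ''))


# Illustrative inputs modelled on the strings mentioned in the code comment.
for sample in ['14000건', '14,000건', '물건수']:
    print(repr(sample), first_int(sample))
```

Returning None for text without digits avoids the IndexError that `re.findall(...)[0]` raises when nothing matches, which makes it easier to see which of the selected elements actually carries the count.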

Edit by an unknown user

Date: 2020.02.17

(Beginner) Some parts of the page can be crawled and others can't. I can't figure out why. :(

python

Most of the elements on the page are crawled, but the last selector, "points = soup.select('.addr_point')", does not get crawled (inside the red dashed box). I can't figure out why. Any advice would be appreciated.

import urllib.request  # urllib.request (not urllib.parse) provides Request and urlopen
from bs4 import BeautifulSoup
import re

url = 'http://www.dooinauction.com/auction/ca_list.php'

req = urllib.request.Request(url)  # build the request for the listing page
html = urllib.request.urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser')  # parse with BeautifulSoup

tots = soup.select('div.title_left font')  # extract the item count; several strings come back, e.g. 물건수, 14000건, FF5500
tot_123i = int(re.findall(r'\d+', tots[0].text)[0])  # first run of digits in the first match
print(f'Number of items: {tot_123i}')

url = f'http://www.dooinauction.com/auction/ca_list.php?total_record={tot_123i}&search_fm_off=1&search_fm_off=1&start=0'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')  # parse with BeautifulSoup

d_nums = soup.select('.first.list_img > img')
nums = soup.select('.no')
addrs = soup.select('.addr')
a_earths = soup.select('.list_class.bold')
jprices = soup.select('td.price > div:nth-child(1)')
lprices = soup.select('td.price > div.clr_blue')
decs = soup.select('div.price.clr_blue')
stimes = soup.select('td.date')
points = soup.select('.addr_point')
print()