Encoding error when crawling with Python

Question

Viewed 2,382 times

python

crawling

encoding

error

# -*- coding: utf-8 -*-
# The coding declaration only takes effect on the first or second line of the file
# (Python 3 already reads source files as UTF-8 by default).
from bs4 import BeautifulSoup
from html_table_extractor.extractor import Extractor
from selenium import webdriver

LIST_URL = 'http://terms.naver.com/list.nhn?cid=58401&categoryId=58401&so=st4.asc&viewType=&categoryType='

driver = webdriver.Firefox(executable_path='C:/Users/i/Downloads/geckodriver-v0.19.1-win32/geckodriver')
driver.implicitly_wait(1)
driver.get(LIST_URL)

# XPath pieces for the index-th entry link on the list page
xpath = '//*[@id="content"]/div[4]/ul/ul/li[5]/ul/li['
xpath_bottom = ']/a'

for index in range(2, 27):
    # Open the index-th entry
    driver.find_element_by_xpath(xpath + str(index) + xpath_bottom).click()

    # page_source is a str; encoding it to cp949 with errors='replace'
    # turns every character cp949 cannot represent into '?'
    html = driver.page_source.encode('cp949', errors='replace')
    soup = BeautifulSoup(html, 'html.parser')

    table = soup.select("#size_ct > div.box_tbl > table")

    # Strip the markup from the <h2> heading to get a file name
    title1 = soup.select("#content > div.section_wrap > div.headword_title > h2")
    title2 = str(title1).replace("[", "").replace("]", "").replace("<", "").replace(">", "").replace("/", "").replace("h2", "").replace("class", "").replace("=", "").replace("headword", "").replace("\"", "").lstrip()

    stringTable = str(table)

    # Parse the HTML table and write it to a CSV file named after the title
    extractor = Extractor(stringTable).parse()
    extractor.write_to_csv(title2, path='.')

    # Return to the list page for the next entry
    driver.get(LIST_URL)

I am crawling a website with Python and keep getting the following encoding error: UnicodeEncodeError: 'cp949' codec can't encode character '\xa0' in position 26: illegal multibyte sequence

Unknown user

Which Python version are you using? 정영훈 2017.12.16 20:54
3.6! Unknown user 2017.12.17 21:57
As noted in the answer below, the code in the question runs without problems on Python 3.6. 정영훈 2017.12.18 02:53
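
The error message itself can be reproduced outside the crawler. The character '\xa0' is a no-break space (what `&nbsp;` in HTML becomes), and the cp949 codec has no mapping for it. The thread includes no traceback, so which statement raises the exception in the original script is not known; one possibility, stated here only as an assumption, is a step that writes text with the Windows default cp949 codec, such as printing to the console or writing a file without an explicit encoding. The sketch below is a minimal illustration under that assumption. The names `can_encode`, `normalize_spaces`, `write_csv`, and the sample `cell` value are hypothetical and do not come from the original post.

```python
import csv
import os
import tempfile

NBSP = "\xa0"  # NO-BREAK SPACE, the character named in the error message


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def normalize_spaces(text):
    """Replace no-break spaces with ordinary spaces."""
    return text.replace(NBSP, " ")


def write_csv(path, rows):
    """Write rows with an explicit encoding instead of the platform default."""
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


cell = "가격" + NBSP + "100"  # hypothetical table cell containing a no-break space

print(can_encode(cell, "cp949"))                    # False: cp949 cannot encode '\xa0'
print(can_encode(normalize_spaces(cell), "cp949"))  # True once '\xa0' is replaced
print(can_encode(cell, "utf-8"))                    # True: UTF-8 covers all of Unicode

# Writing with an explicit UTF-8 encoding avoids depending on the default codec.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    write_csv(path, [["name", "value"], [cell, "1"]])
    with open(path, encoding="utf-8-sig", newline="") as f:
        print(list(csv.reader(f)))
```

Either approach sidesteps the exception: normalizing the text keeps the output in cp949, while choosing a UTF-8 encoding keeps the original characters intact.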

Answer 1

정영훈 15,709 points

Written on 2017-12-16 22:07:44
