Problem crawling Naver search results with Python

Question

python

crawling

naver

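The result is printed like this:

(' ', 경남 '코로나19' 확진 1명 추가, 23명으로 늘어(1보), ' ')
(' ', [1보] 코로나19 신규 환자 60명…국내 확진자 총 893명, ' ')

I thought I had written the code to print the result I actually want, shown below, but it does not work. The whole thing probably has problems, but in particular I cannot figure out why the href is not being read.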

경남 '코로나19' 확진 1명 추가, 23명으로 늘어(1보)

http://www.newsis.com/view/?id=NISX20200225_0000931012&cID=10812&pID=10800

[1보] 코로나19 신규환자 60명…국내 확진자 총 893명

http://www.seoulwire.com/news/articleView.html?idxno=400533

Any help would be appreciated.

import requests
import urllib.request
from bs4 import BeautifulSoup
from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()

old_newsflashs = []

def extract_newsflashs(old_newsflashs=[]):
    url = 'https://m.search.naver.com/search.naver?where=m_news&query=1보&sm=mtb_tnw&sort=1'
    req = requests.get(url)
    html = req.text
    soup = BeautifulSoup(html, 'html.parser')

    search_result = soup.select_one('#news_result_list')
    result_list = search_result.select('.bx > .news_wrap > a')

    news_list = []
    for title_list in result_list:
        title = (title_list.get_text())
        news_link = title_list['href']


        if '코로나' in title:
            news_list.append([title, news_link])

    newsflashs = []
    for news_list in result_list[:10]:
        newsflash = news_list
        newsflashs.append(newsflash)

    new_newsflashs=[]
    for newsflash in newsflashs:
        if newsflash not in old_newsflashs:
            new_newsflashs.append(newsflash)

    return new_newsflashs

def send_newsflashs():
    global old_newsflashs
    new_newsflashs = extract_newsflashs(old_newsflashs)
    if new_newsflashs:
        for newsflash in new_newsflashs:
            print(tuple(newsflash))
    else:
        pass
    old_newsflashs += new_newsflashs.copy()
    old_newsflashs = list(map(list, set(map(tuple, old_newsflashs))))

send_newsflashs()

sched.add_job(send_newsflashs, 'interval', seconds=60)

sched.start()

score 0 · Accepted Answer

    news_list = []
    for title_list in result_list:
        title = (title_list.get_text())
        news_link = title_list['href']

        if '코로나' in title:
            news_list.append([title, news_link])

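As you said, it looks like the whole code needs another look.

For now, the reason the href is not printed appears to be that news_list, which holds the href, is only filled and then never used anywhere. The loop that builds newsflashs iterates over result_list[:10] instead, so what gets printed is the raw result elements, not the [title, news_link] pairs.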
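
A minimal sketch of one way to restructure the extraction so that the collected title/link pairs are what actually gets returned and printed. The helper names `parse_title_links` and `select_new`, the `limit` and `keyword` parameters, and the choice to track already-seen items in a set are assumptions made for illustration, not part of the original code. The CSS selectors are copied from the question and may not match Naver's current markup. In the original, the second loop iterates over `result_list` (BeautifulSoup tags), and calling `tuple()` on a tag iterates over its children, which would explain the `(' ', ..., ' ')` shape of the output.

```python
def parse_title_links(html):
    """Return (title, link) pairs for every result anchor found in the page."""
    # Imported here so the filtering helper below can be used without bs4.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, 'html.parser')
    container = soup.select_one('#news_result_list')  # selector from the question
    if container is None:
        # The page layout changed or the request was blocked.
        return []
    anchors = container.select('.bx > .news_wrap > a')
    return [(a.get_text(strip=True), a.get('href')) for a in anchors]


def select_new(pairs, seen, keyword='코로나', limit=10):
    """Keep unseen pairs among the first `limit` results whose title has `keyword`.

    `seen` is a set of (title, link) tuples and is updated in place, so the
    same headline is not reported again on the next scheduled run.
    """
    fresh = []
    for title, link in pairs[:limit]:
        if keyword in title and (title, link) not in seen:
            fresh.append((title, link))
            seen.add((title, link))
    return fresh
```

With these helpers, `send_newsflashs` could fetch the page with `requests.get(url).text` as in the question, pass it through `parse_title_links` and `select_new` with a module-level `seen = set()`, and then `print(title)` followed by `print(link)` for each returned pair to get the title and URL on separate lines.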

Edited by nowp (9,214 points) on 2020-02-25 13:55:50

Asked by 초보자 (1,785 points) on 2020-02-25 11:21:45

Answered by 편집요청빌런 (3,226 points) on 2020-02-25 13:11:20
