Python web crawling: how to select a specific anchor among multiple anchors in HTML

Question

677 views

python


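I'm a beginner at web crawling with Python. My goal is to use BeautifulSoup to extract the Naver News in-link addresses from a search results page.

That is, in the attached image, I want to scrape "https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=106&oid=382&aid=0000896566" rather than "http://sports.donga.com/".

I wrote the code below, but it keeps printing only http://sports.donga.com/.

Please help.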

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://search.naver.com/search.naver?&where=news&query=%22``%5B%EB%8B%A8%EB%8F%85%5D%22&sm=tab_pge&sort=1&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:dd,p:all,a:all&mynews=0&refresh_start=0&start=1')
bsObject = BeautifulSoup(html, "html.parser")


news_urls = []
for cover in bsObject.find_all('li', {'class':'bx'}):
    # [0] takes the first a.info anchor in each result item, whatever it points to
    link = cover.select('a.info')[0].get('href')
    news_urls.append(link)
print(news_urls)

Unknown user

1. Please post the full code whenever possible. 2. Don't post code as an image. 초보자 2021.3.15 14:38
Thanks for letting me know. I've edited the post. Unknown user 2021.3.15 14:45

score 0 · Accepted Answer

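It looks like you want to collect the links of the news articles that Naver returns for a search keyword.

Every link in a result item carries the class "info", so instead of taking the first one, collect all of them and keep only those that point to news.naver.com. Modify the code as follows.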

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://search.naver.com/search.naver?&where=news&query=%22``%5B%EB%8B%A8%EB%8F%85%5D%22&sm=tab_pge&sort=1&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:dd,p:all,a:all&mynews=0&refresh_start=0&start=1')
bsObject = BeautifulSoup(html, "html.parser")


news_urls = []
# Collect every anchor with class "info", then keep only the Naver News links
c = bsObject.find_all('a', {'class':'info'})
for cover in c:
    link = cover['href']
    if 'https://news.naver.com' in link:
        news_urls.append(link)
print(news_urls)

>> ['https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=100&oid=081&aid=0003171086',
'https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=106&oid=018&aid=0004876243',
'https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=105&oid=092&aid=0002216140',
'https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=106&oid=015&aid=0004513548']
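
The difference between the two approaches can be reproduced offline, without requesting the live page. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is a hypothetical, simplified markup modelled on the question (a press-site anchor followed by a Naver News anchor inside each li.bx), not Naver's actual page structure, and the helper names first_info_links and naver_news_links are made up for illustration. It also compares the host of each href with urlparse instead of searching the string, which is a slightly stricter check than the one in the answer.

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on the question: each result
# item (li.bx) holds a press-site anchor, optionally followed by a
# Naver News anchor. The real page structure may differ.
SAMPLE_HTML = """
<ul>
  <li class="bx">
    <a class="info press" href="http://sports.donga.com/">Sports Donga</a>
    <a class="info" href="https://news.naver.com/main/read.nhn?mode=LSD&amp;mid=sec&amp;sid1=106&amp;oid=382&amp;aid=0000896566">Naver News</a>
  </li>
  <li class="bx">
    <a class="info press" href="http://example.com/">Example Press</a>
  </li>
</ul>
"""


def first_info_links(html):
    """Reproduce the question's approach: first a.info inside each li.bx."""
    soup = BeautifulSoup(html, "html.parser")
    return [item.select("a.info")[0].get("href") for item in soup.select("li.bx")]


def naver_news_links(html):
    """Return hrefs of a.info anchors whose host is news.naver.com."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # a.info[href] skips anchors that have no href attribute at all.
    for anchor in soup.select("a.info[href]"):
        href = anchor["href"]
        if urlparse(href).netloc == "news.naver.com":
            links.append(href)
    return links


if __name__ == "__main__":
    print(first_info_links(SAMPLE_HTML))   # press-site links only
    print(naver_news_links(SAMPLE_HTML))   # Naver News links only
```

With this sample, the first function returns only the press-site addresses, because the press anchor comes first in each item, while the second returns only the news.naver.com address. Result items that have no Naver News link are simply skipped.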

초보자 1,785 points

Written on 2021-03-15 15:20:02
