Python web crawling: when one piece of data is in a different tag

Question

python

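Page crawled: https://finance.yahoo.com/quote/AWR?p=AWR

The other values I scrape are inside span tags, but the Forward Dividend & Yield value alone sits in a td tag. I changed the code to handle that part so that all of the information would be shown, but it does not work properly. Any help would be appreciated.

<Original code>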

import pandas as pd
import datetime
import requests
import yfinance as yf
import time
from requests.exceptions import ConnectionError
from bs4 import BeautifulSoup



def web_content_div(web_content,class_path):
    web_content_div = web_content.find_all('div',{'class': class_path})
    try:
        spans = web_content_div[0].find_all('span')
        texts = [span.get_text() for span in spans]

    except IndexError:
        texts = []

    return texts

def real_time_price(stock_code):

    url = 'https://finance.yahoo.com/quote/' + stock_code + '?p=' + stock_code 

    try :
        r = requests.get(url)
        web_content = BeautifulSoup(r.text,'lxml')
        texts = web_content_div(web_content, 'My(6px) Pos(r) smartphone_Mt(6px)')
        if texts != []:
            price, change = texts[0],texts[1]
        else:
            price , change = [] , []

    # Forward Dividend & Yield ################ problematic part ################
        texts = web_content_div(web_content,'D(ib) W(1/2) Bxz(bb) Pstart(12px) Va(t) ie-7_D(i) ie-7_Pos(a) smartphone_D(b) smartphone_W(100%) smartphone_Pstart(0px) smartphone_BdB smartphone_Bdc($seperatorColor)')
        if texts != []:
            for count, forword in enumerate(texts):
                if forword == 'Forward Dividend & Yield':
                   dividend = texts[count + 1]
        else:
            dividend = []
    ################ problematic part ################

        texts = web_content_div(web_content,'D(ib) W(1/2) Bxz(bb) Pstart(12px) Va(t) ie-7_D(i) ie-7_Pos(a) smartphone_D(b) smartphone_W(100%) smartphone_Pstart(0px) smartphone_BdB smartphone_Bdc($seperatorColor)')
        if texts != []:
            for count, EX in enumerate(texts):
                if EX == 'Ex-Dividend Date':
                    EXdate = texts[count + 1]
        else:
            EXdate = []

        texts = web_content_div(web_content,'D(ib) W(1/2) Bxz(bb) Pend(12px) Va(t) ie-7_D(i) smartphone_D(b) smartphone_W(100%) smartphone_Pend(0px) smartphone_BdY smartphone_Bdc($seperatorColor)')
        if texts != []:
            for count, vol in enumerate(texts):
                if vol == 'Volume':
                    volume = texts[count + 1]
        else:
            volume = []


    except ConnectionError:
        price, change, dividend, EXdate,volume = [],[],[],[],[]

    return price, change, dividend, EXdate,volume


stock=['awr']


while(True):
    info = []
    col = []
    time_stamp = datetime.datetime.now() - datetime.timedelta(hours=14)
    time_stamp = time_stamp.strftime('%Y-%M-%D %H:%M:%S')
    for stock_code in stock:
        price, change,dividend, EXdate,volume = real_time_price(stock_code)
        info.append(price)
        info.extend([change])
        info.extend([dividend])
        info.extend([EXdate])
        info.extend([volume])
        time.sleep(1)

    col = [time_stamp]
    col.extend(info)
    print(col)

Result

['2021-51-03/16/21 03:51:27', '72.38', '+0.22 (+0.30%)', 'Ex-Dividend Date', 'Feb 12, 2021', '203,953']

In this output, I want the 'Ex-Dividend Date' entry to be 1.34 (1.86%) instead, which is the Forward Dividend & Yield value.
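One likely explanation for the stray 'Ex-Dividend Date' entry, sketched below: if the dividend value is plain text inside a td with no span around it, a span-only search never sees it, so the item after the label is the next row's label. The HTML here is a simplified, hypothetical stand-in for the summary table (the class name `summary` and the markup are assumptions, not Yahoo's actual page).

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the summary table (not Yahoo's real markup).
SAMPLE_HTML = """
<div class="summary">
  <table>
    <tr><td><span>Forward Dividend &amp; Yield</span></td><td>1.34 (1.86%)</td></tr>
    <tr><td><span>Ex-Dividend Date</span></td><td><span>Feb 12, 2021</span></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", {"class": "summary"})

# Same approach as web_content_div in the question: only <span> text is collected.
span_texts = [span.get_text() for span in div.find_all("span")]
# The dividend value has no <span>, so the item after its label is the next label.
wrong_value = span_texts[span_texts.index("Forward Dividend & Yield") + 1]

# Collecting <td> text instead keeps the labels and the bare value together.
td_texts = [td.get_text(strip=True) for td in div.find_all("td")]
right_value = td_texts[td_texts.index("Forward Dividend & Yield") + 1]

print(span_texts)
print("span-only lookup:", wrong_value)
print("td lookup:", right_value)
```

In this sketch the span-only lookup yields the next label while the td lookup yields the dividend value, which matches the symptom in the result above.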

Unknown user

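What do you want, in which part, and what problem are you running into? 초보자 2021.3.16 14:30
Of the values I want to output (current price, change, dividend, ex-dividend date, and so on), only the dividend uses a different tag, so it is not printed. I think it would be printed if the tag were changed to td for that part alone. I tried adding a row so that only the dividend part looks for td, but then none of the other values came out either. Unknown user 2021.3.16 16:33
Please also add to the question the current output and the output you would like to get. 초보자 2021.3.16 17:37
I have added what you asked for. I hope it is clear. Unknown user 2021.3.16 17:58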


score 0 · Accepted Answer

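Your code is long, so I rewrote only the part that is needed.

I noticed you were parsing the HTML with lxml; the code below uses html.parser instead.

I am not sure what form of data you need, but the values can be extracted along these lines.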

import requests
from bs4 import BeautifulSoup

def real_time_price(stock_code = 'AWR'):

    url = 'https://finance.yahoo.com/quote/' + stock_code + '?p=' + stock_code
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')

    price = soup.select_one(r'span.Mb\(-4px\).D\(ib\)').text
    rate = soup.select_one(r'span.Fw\(500\).Fz\(24px\)').text
    Earnings_Date = soup.select_one(r'div.smartphone_Pstart\(0px\).smartphone_BdB.smartphone_Bdc\(\$seperatorColor\) > table > tbody > tr:nth-of-type(5) > td.Ta\(end\).Fw\(600\).Lh\(14px\)').text
    Yield = soup.select_one(r'#quote-summary > div.D\(ib\).W\(1\/2\).Bxz\(bb\).Pstart\(12px\).Va\(t\).ie-7_D\(i\).ie-7_Pos\(a\).smartphone_D\(b\).smartphone_W\(100\%\).smartphone_Pstart\(0px\).smartphone_BdB.smartphone_Bdc\(\$seperatorColor\) > table > tbody > tr:nth-of-type(6) > td.Ta\(end\).Fw\(600\).Lh\(14px\)').text
    Volume = soup.select_one(r'#quote-summary > div.D\(ib\).W\(1\/2\).Bxz\(bb\).Pend\(12px\).Va\(t\).ie-7_D\(i\).smartphone_D\(b\).smartphone_W\(100\%\).smartphone_Pend\(0px\).smartphone_BdY.smartphone_Bdc\(\$seperatorColor\) > table > tbody > tr:nth-of-type(7) > td.Ta\(end\).Fw\(600\).Lh\(14px\) > span').text
    Dividend_Date = soup.select_one(r'#quote-summary > div.D\(ib\).W\(1\/2\).Bxz\(bb\).Pstart\(12px\).Va\(t\).ie-7_D\(i\).ie-7_Pos\(a\).smartphone_D\(b\).smartphone_W\(100\%\).smartphone_Pstart\(0px\).smartphone_BdB.smartphone_Bdc\(\$seperatorColor\) > table > tbody > tr:nth-of-type(7) > td.Ta\(end\).Fw\(600\).Lh\(14px\) > span').text

    print([Earnings_Date, price, rate, Yield, Dividend_Date, Volume])


real_time_price()

>> ['May 03, 2021 - May 07, 2021', '73.07', '+0.69 (+0.95%)', '1.34 (1.86%)', 'Feb 12, 2021', '138,368']
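As an alternative to positional selectors such as `tr:nth-of-type(6)`, the values could also be looked up by their row label, which works whether the value is wrapped in a span or sits directly in the td. The following is a minimal sketch under the assumption that every summary row is a tr with exactly two td cells (label, value); the helper name `summary_table_to_dict` and the sample HTML are hypothetical and are not taken from the live page.

```python
from bs4 import BeautifulSoup


def summary_table_to_dict(html):
    """Map each row label to its value text (hypothetical helper).

    Assumes each data row is a <tr> containing exactly two <td> cells.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) == 2:
            # get_text() reads the text whether or not it is wrapped in a <span>.
            label = cells[0].get_text(strip=True)
            value = cells[1].get_text(strip=True)
            result[label] = value
    return result


# Simplified stand-in markup; the real page structure may differ.
SAMPLE_HTML = """
<table>
  <tr><td><span>Volume</span></td><td><span>138,368</span></td></tr>
  <tr><td><span>Forward Dividend &amp; Yield</span></td><td>1.34 (1.86%)</td></tr>
  <tr><td><span>Ex-Dividend Date</span></td><td><span>Feb 12, 2021</span></td></tr>
</table>
"""

summary = summary_table_to_dict(SAMPLE_HTML)
# dict.get returns None for a missing label instead of raising an error.
print(summary.get("Forward Dividend & Yield"))
print(summary.get("Ex-Dividend Date"))
print(summary.get("Volume"))
```

With the live page, `r.text` from `requests.get(url)` would be passed in place of `SAMPLE_HTML`; whether the current markup still fits the two-cell assumption would need to be checked. Using `dict.get` also avoids the `AttributeError` that `select_one(...).text` raises when a selector matches nothing.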

초보자 1,785 points

Written on 2021-03-17 09:26:11
