Question about how to crawl images using BeautifulSoup

Question

beautifulsoup

selenium

The images I have crawled so far had markup like the following, so I fetched the page source, read the image URL from it, and saved the image.

HTML: <img src="http://image.yes24.com/goods/89987423/800x0" alt="12별자리 남자" border="0">

Code I used for crawling:

import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(r'C:\chromedriver\chromedriver.exe')

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')

img = soup.find('div', {'class': 'img_Bdr'}).find('img')
img_url = img['src']
img_name = img['alt']
urllib.request.urlretrieve(img_url, "dir/" + str(img_name.strip().replace("/", ",").replace('"', "'").replace(":", "-").replace(">",  ")").replace("<", "(").replace("?", "")) + '.jpg')

However, on the site I want to crawl now, the image is specified like this instead:

HTML: background-image: url('https://d3mcojo3jv0dbr.cloudfront.net/2020/09/26/15/23/d64415d1cb8cd5ec291298591e9e97af.jpeg?w=288&h=384&q=65'); (site)

How can I fetch this image and save it?

Unknown user

When asking a crawling question, please tell us the target URL. 김호원 2020.10.7 16:41


score 0 · Accepted Answer

import requests
from bs4 import BeautifulSoup as bs
from parse import parse  # pip install parse

def filesave(url):
    try:
        urlsplit = url.split('/')[-1]
        urlsplit = urlsplit.split('.')[0] # :D
        name = 'C:/Users/User/hi/'+urlsplit
        bn = requests.get(url).content
        if bn[0:3] != b'\xff\xd8\xff':
            print('this file is not JPEG file format')
            return 0
        else:
            if 'jpg' not in urlsplit:
                name += '.jpg'
        with open(name, 'wb') as f:
            f.write(bn)
        print(f'[!] {name} saved')
        return name
    except Exception as e:
        print(e)
        return 0

def main(url):
    s = bs(requests.get(url).text, 'html.parser')
    img = s.find('div', {'class':'article-img'})
    result = parse("background-image: url('{}');", img['style'])[0] # :D
    filesave(result)

if __name__ == "__main__":
    main('https://fhjyang543.postype.com/series/457430/%EB%82%B4%EA%B0%80-%EC%82%AC%EB%9E%91%ED%95%9C-%EC%8B%A0%EC%97%90%EA%B2%8C')

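You could have let me know that you had updated the post.

I adapted the code from my answer to your original question slightly. Please use it as a reference (the lines marked with a `# :D` comment are the ones I changed).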
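The `parse` template above matches only when the `style` attribute consists of exactly that one declaration. As a hedged alternative, the sketch below pulls the URL out with a regular expression, so it also copes with extra CSS properties or double quotes. The helper name `extract_background_url` and the `width: 288px;` prefix in the sample string are assumptions made for illustration; they do not come from the target page.

```python
import re

# Matches url(...) in a CSS declaration; the quotes around the URL are optional.
CSS_URL_RE = re.compile(r"url\(\s*(['\"]?)(.*?)\1\s*\)")


def extract_background_url(style):
    """Return the first url(...) value in an inline style string, or None."""
    match = CSS_URL_RE.search(style or "")
    return match.group(2) if match else None


# Sample style string; the leading "width" declaration is a made-up addition.
style = (
    "width: 288px; background-image: "
    "url('https://d3mcojo3jv0dbr.cloudfront.net/2020/09/26/15/23/"
    "d64415d1cb8cd5ec291298591e9e97af.jpeg?w=288&h=384&q=65');"
)
print(extract_background_url(style))
```

The returned string could be passed to `filesave` in place of the `parse(...)[0]` result. Because the helper returns `None` when nothing matches, the caller can skip an element instead of failing on it.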

Also, with crawling questions it is hard to help without knowing the relevant HTML.

Next time you ask about a URL that robots.txt permits crawling, please be sure to include the site address.
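One way to check whether a URL falls within what robots.txt permits is the standard library's `urllib.robotparser`. This is a minimal sketch: the rules in `sample_rules` and the `example.com` URLs are made up for illustration and are not the actual rules of any site mentioned here, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only.
sample_rules = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(sample_rules, "https://example.com/series/1"))
print(is_allowed(sample_rules, "https://example.com/private/page"))
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` downloads the site's own robots.txt instead of parsing a local string.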