Machine Learning | LEELAB

2. Crawling blog content

Author: Administrator | Posted: 22-09-26 13:39

1. Install the libraries

pip install selenium

pip install beautifulsoup4

2. Write the source code

File name: crawling_test.py

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

# Run Chrome without a visible window
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--window-size=1920,1080')
options.add_argument('--disable-gpu')

# Selenium 4 takes the chromedriver path through a Service object
service = Service('c:/webdrive/chromedriver_win32/chromedriver.exe')
driver = webdriver.Chrome(service=service, options=options)

url = "http://blog.naver.com/sofaraway1?Redirect=Log&logNo=220602998731"
driver.get(url)
time.sleep(1)  # give the page a moment to load
# Switch into the first frame so page_source returns the frame's content
driver.switch_to.frame(driver.find_element(By.TAG_NAME, "frame"))

html = driver.page_source  # get all elements of the page
soup = BeautifulSoup(html, 'html.parser')  # parse with BeautifulSoup

print(soup)

driver.quit()  # close the browser and stop the chromedriver process
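
Printing the whole soup dumps every tag in the frame. As a minimal sketch of a possible next step, the snippet below pulls only paragraph text out of parsed HTML. The sample markup, the "post-view" id, and the extract_text helper are hypothetical and used only for illustration. The real blog page has its own element names, which would need to be checked in the browser's developer tools.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source;
# the id "post-view" is an assumption, not the blog's actual structure.
SAMPLE_HTML = """
<html><body>
  <div id="post-view">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
  <script>var ignored = 1;</script>
</body></html>
"""

def extract_text(html, container_id):
    """Return the text of each <p> inside the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        # No such element: return an empty list instead of raising
        return []
    return [p.get_text(strip=True) for p in container.find_all("p")]

paragraphs = extract_text(SAMPLE_HTML, "post-view")
print(paragraphs)
```

In the crawler above, the html variable could be passed in place of SAMPLE_HTML once the right container id is known.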

3. Run it

python crawling_test.py
