[Educational Data Analysis] Crawling Jurisdictional Administrative Districts

곽가누 2024. 6. 9. 17:37

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import pandas as pd
from bs4 import BeautifulSoup
# Launch the Chrome browser
driver = webdriver.Chrome()

try:
    # Read the text file into a DataFrame
    schools = pd.read_csv(r"C:\Users\gykwa\OneDrive\2024 - 교육데이터분석\학교도로명주소.txt")

    # Iterate over the rows and take the value of the '학교' column
    for index, row in schools.iterrows():
        school = row['학교']

        # URL of the page to crawl
        url = "https://www.juso.go.kr/support/AddressMainSearch.do?searchKeyword=%EC%84%9C%EC%9A%B8%ED%8A%B9%EB%B3%84%EC%8B%9C+%EC%84%B1%EB%8F%99%EA%B5%AC+%EB%A7%88%EC%9E%A5%EB%A1%9C+161&dsgubuntext=&dscity1text=&dscounty1text=&dsemd1text=&dsri1text=&dssan1text=&dsrd_nm1text=&aotYn=N"
        # Open the page
        driver.get(url)

        # Wait for the 'searchKeyword' input box to load (up to 5 seconds);
        # until() returns the located element, so no separate find_element call is needed
        search_input = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.NAME, 'searchKeyword')))

        # Type the value from the '학교' column and submit the search
        search_input.clear()
        search_input.send_keys(school)
        search_input.send_keys("\n")

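        # Wait for the detail <span> of the first search result to load (up to 5 seconds)
        try:
            result_span = WebDriverWait(driver, 5).until(
                EC.presence_of_element_located((By.XPATH, '/html/body/div[2]/div/main/div[3]/div[2]/ol/li[1]/div[2]/ul/li[3]/span[1]'))
            )
            html = result_span.get_attribute('outerHTML')
            # Parse the HTML with BeautifulSoup
            soup = BeautifulSoup(html, 'html.parser')

            # Get the text inside the <span> tag
            span_text = soup.find('span').get_text()

            # Extract the third space-separated token (e.g. '왕십리도선동')
            address = span_text.split(' ')[2]

            print(address)

        except TimeoutException:
            print("Could not find the result element.")
        except IndexError:
            # The span text had fewer than three space-separated tokens
            print("Unexpected result format:", span_text)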
            
finally:
    driver.quit()
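
The script opens a results page for one fixed address and then types each school's value into the search box. Because that first page already contains a result list, the wait that follows may match the old result before the new search has loaded. One possible alternative, shown here as a minimal sketch, is to build the search URL from the keyword directly and pass it to `driver.get()`. `build_search_url` is a hypothetical helper that is not part of the original script, and it assumes the site still answers when the empty query parameters of the original URL (`dsgubuntext`, `aotYn`, and so on) are left out, which has not been verified.

```python
from urllib.parse import urlencode

# Search endpoint, copied from the hard-coded URL in the script above.
BASE_URL = "https://www.juso.go.kr/support/AddressMainSearch.do"


def build_search_url(keyword: str) -> str:
    """Return a search URL for `keyword` (hypothetical helper for illustration)."""
    # urlencode percent-encodes the UTF-8 bytes and turns spaces into '+',
    # the same form as the searchKeyword value in the hard-coded URL.
    # Assumption: the other (empty) query parameters can be omitted.
    return f"{BASE_URL}?{urlencode({'searchKeyword': keyword})}"


url = build_search_url("서울특별시 성동구 마장로 161")
print(url)
```

Inside the loop, `driver.get(build_search_url(school))` would then load the result page for each row, so the later wait only ever sees results for the current keyword.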

가누의 코딩로그