Web Scraping with Python (2) ~ rteak

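Continuing from the previous post, I am still working on web scraping.

Here is what I newly learned today.

With this, I can now fetch the daily open, high, low, and close prices for USD/JPY from the Yahoo Finance site and record them.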

■ Source code

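	import requests
	from bs4 import BeautifulSoup

	# `ss` is assumed to be the spreadsheet object opened earlier (e.g. with gspread)

	# Select the Google Sheets worksheet
	ws = ss.worksheet('USD-JPY')

	# Get column 1 (dates) of the worksheet, used to check for duplicate data
	val = ws.col_values(1)

	# Initialize the row number at which new rows are added to the worksheet
	row = 9

	# Load the web page
	page = requests.get('https://info.finance.yahoo.co.jp/history/?code=usdjpy')

	# Parse the content
	con = BeautifulSoup(page.content, "html.parser")

	# Get the table rows
	tr = con.find_all('tr')

	# Loop over the table rows
	for td in tr:

	    # Extract the "td" cells of the row and convert them to a string
	    item = td.find_all('td')
	    lst = str(item)

	    # Keep only strings that contain "<td>"
	    if lst.find('<td>') > -1:

	        # Remove or convert unwanted characters
	        lsts = lst.replace('<td>', '')
	        lsts = lsts.replace('</td>', '')
	        lsts = lsts.replace('[', '')
	        lsts = lsts.replace(']', '')
	        lsts = lsts.replace('年', '/')
	        lsts = lsts.replace('月', '/')
	        lsts = lsts.replace('日', '')
	        lsts = lsts.replace(',', '')

	        # Split the string (now whitespace-separated, commas removed) into a list
	        a = lsts.split()

	        # Check for duplicates by date
	        dt = a[0]

	        if val.count(dt) == 0:

	            # Insert the list into the worksheet
	            ws.insert_row(a, row)
	            row = row + 1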
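
As a side note, the clean-up and duplicate check in the script can also be done on the cell texts directly, without converting the whole row to a string first. The following is only a minimal sketch under some assumptions: `normalize_row` and `select_new_rows` are hypothetical helper names that do not appear in the original script, the sample values are made up, and each list of cell strings is assumed to come from something like `[c.get_text(strip=True) for c in td.find_all('td')]`.

```python
def normalize_row(cells):
    """Clean one table row given as a list of cell strings.

    The first cell is assumed to be a date such as "2020年1月6日";
    the remaining cells are assumed to be prices.
    """
    # Convert the Japanese date format to "YYYY/M/D"
    date = cells[0].replace("年", "/").replace("月", "/").replace("日", "")
    # Drop thousands separators from the numeric cells
    prices = [c.replace(",", "") for c in cells[1:]]
    return [date] + prices


def select_new_rows(rows, existing_dates):
    """Keep only the rows whose date is not already recorded."""
    seen = set(existing_dates)
    fresh = []
    for row in rows:
        if row[0] not in seen:
            fresh.append(row)
            seen.add(row[0])  # also guards against duplicates within the page
    return fresh


# Made-up sample values for illustration only
scraped = [
    ["2020年1月6日", "108.01", "108.50", "107.77", "108.37"],
    ["2020年1月7日", "108.37", "108.63", "108.26", "108.44"],
]
rows = [normalize_row(cells) for cells in scraped]
fresh = select_new_rows(rows, existing_dates=["2020/1/6"])
print(fresh)
```

Here `existing_dates` plays the role of `val` (column 1 of the worksheet), and each list in `fresh` could then be passed to `ws.insert_row` as in the script. Looking dates up in a set avoids rescanning the whole column for every row, though for a sheet of this size the difference is probably negligible.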


rteak