Hold on, we're a computer science club
pip install requests
import requests  # import the requests module
url = 'https://ckefgisc.github.io/'  # the URL you want to scrape
html = requests.get(url)  # get() returns a Response object
print(html.text)  # .text returns the page source
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<script src="https://code.jquery.com/jquery-3.6.1.min.js"></script>
<script src="/scripts/include.js"></script>
<script src="/scripts/news.js"></script>
<link rel="stylesheet" href="/styles/index.css">
<title>建北電資 CKEFGISC</title>
</head>
<body>
<header></header>
<div id="main-container">
<div id="title-section" class="section">
<h1>全台最「電」</h1>
<h1>資訊系社團</h1>
<h2>建中電子計算機研習社×北一女中資訊研習社</h2>
<a href="https://www.instagram.com/ckeisc_42nd/">點我地社</a>
</div>
<div id="news-section" class="section">
<div class="section-title">
<p>最新消息</p>
</div>
<ul id="news-list" class="line-list"></ul>
</div>
<div id="lesson-section" class="section">
<div class="section-title">
<p>課程介紹</p>
</div>
<div id="lesson-links">
<a id="major-lesson" href="/lesson#major-class">大社課</a>
<a id="noon-lesson" href="/lesson#noon-class">北資中午小社課</a>
<a id="after-lesson" href="/lesson#after-class">建北電資聯合<br>放學小社課</a>
</div>
</div>
<div id="faq-section" class="section">
<div class="section-title">
<p>常見問答</p>
</div>
<ul class="line-list">
<li>
<p class="qq">Q:小社是星期幾?</p>
<p>A:</p>
<p>每天放學之後都有喔~(除了段考週和前一週)</p>
<p class="btn"><a class="btn_a" href="/lesson">點我看小社介紹</a></p>
</li>
<li>
<p class="qq">Q:學長姐,我有第八節,來得及來聽課嗎?</p>
<p>A:</p>
<p>可以喔!</p>
<p>我們基本上會等到北資的同學5:10下課可以安全走到建中的時間才開始上課,不論是北資還是建電的數資、科學班學生,都歡迎來參加小社!</p>
</li>
<li>
<p class="qq">Q:我要怎麼來聽社課呢?</p>
<p>A:</p>
<p>只要你是建北電資的社員,都可以直接來聽課,完全不用事先報名的啦XD</p>
</li>
</ul>
</div>
<div id="aboutsite-section" class="section">
<div class="section-title">
<p>關於本站</p>
</div>
<div class="aboutsite_sec">
<div class="aboutsite_title">緣起</div>
<div class="aboutsite_text">
建北電資以往皆有架設網站作為招生及宣傳用途。但是自從建電社辦的伺服器被學校沒收之後,一直以來都找不到一個良好的網站架設環境,也沒有一個地方讓學術們統一放置教材供學弟妹使用。因此,在一三上幹了之後,一二學術長檸檬便一直希望鹽亞倫可以將他們沒有做出的社網完成。因此,不會css的鹽亞倫便找了溫室菜以及北資學術長嗯嗯,嘗試從頭寫出一個網站,並且透過github pages進行架設。<br>
<div style="text-align: right;"><a href="/about/site.html" class="aaa">>>> 閱讀更多本站歷史</a></div>
</div>
</div>
<div class="aboutsite_sec">
<div class="aboutsite_title">製作團隊</div>
<div class="aboutsite_text">專案管理:建電42nd吳亞倫<br>網站架設:北資36th蘇怡恩、建電42nd蔡政廷<br>
<div style="text-align: right;"><a href="/about/site.html" class="aaa">>>> 更多本站資訊</a></div>
</div>
</div>
</div>
</div>
<footer></footer>
<script>
listNews(0, 4);
</script>
</body>
</html>
import requests  # import the requests module
url = 'https://ckefgisc.github.io/'  # the URL you want to scrape
r = requests.get(url)  # get() returns a Response object
print(r.status_code)  # the HTTP status code of the response
# 200
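A status code other than 200 usually means the page did not arrive as expected, so it can help to check it before reading `.text`. The sketch below shows one possible way to do that with `raise_for_status()`. The helper name `body_or_none`, the URL, and the hand-built 404 response (used only so the example runs without a network connection) are assumptions for illustration and are not from the slides.

```python
import requests

def body_or_none(response):
    """Return response.text on success, or None if the status is 4xx/5xx."""
    try:
        response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx
    except requests.HTTPError as err:
        print("request failed:", err)
        return None
    return response.text

# Hand-built stand-in for a failed request (hypothetical URL, no network used).
fake = requests.Response()
fake.status_code = requests.codes.not_found  # 404
fake.url = "https://ckefgisc.github.io/no-such-page"

result = body_or_none(fake)
```

With a real request you would pass the result of `requests.get(url)` to the helper instead of the stand-in.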
import requests  # import the requests module
url = 'https://ckefgisc.github.io/'
r = requests.get(url)
print(r.headers)  # the response headers
import requests
url = 'https://httpbin.org/'
headers = {"user-agent": "Mozilla/5.0"}  # specify the request headers
r = requests.get(url, headers = headers)
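To see which headers would actually be sent, one option is to build the request without sending it. This is a minimal sketch, assuming the `https://httpbin.org/headers` endpoint (which echoes request headers) as the target. Note that a request prepared this way carries only the headers you pass in, not the defaults a session would add.

```python
import requests

url = "https://httpbin.org/headers"  # httpbin endpoint that echoes request headers
headers = {"user-agent": "Mozilla/5.0"}

# Build the request without sending it, so nothing touches the network.
prepared = requests.Request("GET", url, headers=headers).prepare()

# Header names are case-insensitive, so either spelling finds the value.
print(prepared.headers["User-Agent"])
print(prepared.headers["user-agent"])
```

Sending the same request with `requests.get(url, headers=headers)` and printing `r.text` should show this User-Agent echoed back by httpbin.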
import requests
# Method 1
url = 'https://httpbin.org/get?key1=value1&key2=value2'
# write the query string directly in the URL
r = requests.get(url)
print(r.text)
# Method 2
params = {"key1": "value1", "key2": "value2"}
url = 'https://httpbin.org/get'
r = requests.get(url, params=params)
# pass the query parameters through params
print(r.text)
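The two methods should produce the same final URL. The sketch below checks this offline by preparing the request instead of sending it; the variable name `prepared` is only illustrative.

```python
import requests

params = {"key1": "value1", "key2": "value2"}

# Prepare (but do not send) the request to inspect the final URL.
prepared = requests.Request("GET", "https://httpbin.org/get", params=params).prepare()
print(prepared.url)
```

After a real `requests.get(url, params=params)` call, `r.url` holds the URL that was requested, which is a handy way to confirm the parameters were encoded as intended.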
Just kidding
pip install beautifulsoup4
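import requests  # import the requests module
from bs4 import BeautifulSoup  # mind the capitalization!!!
# import BeautifulSoup from the bs4 module
url = 'https://ckefgisc.github.io/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
# BeautifulSoup takes two arguments: the page source and the parser to use
# the parsed result is stored in soup
print(soup.prettify())  # print the HTML with tidy indentation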
Looks no different?
import requests
from bs4 import BeautifulSoup  # mind the capitalization!!!
url = 'https://ckefgisc.github.io/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.title)
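`soup.title` returns the whole tag, not just its text. The sketch below parses a small inline copy of the page's `<title>` element (so it runs without a network connection) and shows the tag, its name, and its text.

```python
from bs4 import BeautifulSoup

html = "<html><head><title>建北電資 CKEFGISC</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.title)         # the whole tag, including <title>...</title>
print(soup.title.name)    # the tag name
print(soup.title.string)  # only the text inside the tag
```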
Try it out!
import requests
from bs4 import BeautifulSoup  # mind the capitalization!!!
url = 'https://ckefgisc.github.io/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.find('a'))  # find the first <a> tag
print(soup.find_all('a'))  # find all <a> tags
print(soup.find_all('p', limit=2))  # find the first two <p> tags
print(soup.find("div", class_="aboutsite_text"))
# find the first <div> tag with the given class
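One detail worth knowing is what happens when nothing matches: `find()` returns `None`, while `find_all()` returns an empty result. The sketch below uses a short inline snippet adapted from the page source (an assumption, so the example runs offline) to show both cases, along with a search by `id`.

```python
from bs4 import BeautifulSoup

html = """
<div id="lesson-links">
  <a id="major-lesson" href="/lesson#major-class">大社課</a>
  <a id="noon-lesson" href="/lesson#noon-class">北資中午小社課</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")               # first match, a single Tag
by_id = soup.find(id="noon-lesson")  # search by the id attribute
missing = soup.find("table")         # no match: find() returns None
all_links = soup.find_all("a")       # list-like result with every match
no_tables = soup.find_all("table")   # no match: empty result, not None

if missing is None:
    print("no <table> in this snippet")
print(len(all_links), first["id"], by_id["href"])
```

Checking for `None` before using the result of `find()` avoids an `AttributeError` on pages that lack the tag.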
import requests
from bs4 import BeautifulSoup  # mind the capitalization!!!
url = 'https://ckefgisc.github.io/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
for link in soup.find_all('a'):  # for every <a> tag
    print(link.get("href"))  # print the value of its href attribute (None if it has none)
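Many of the `href` values on the page are relative paths such as `/lesson#major-class`. If you want full URLs, one common approach is `urllib.parse.urljoin` from the standard library. The sketch below assumes a small inline snippet (so it runs offline); the third `<a>` tag is a made-up example of a tag with no `href`.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base = "https://ckefgisc.github.io/"
html = """
<a href="/lesson#major-class">大社課</a>
<a href="https://www.instagram.com/ckeisc_42nd/">Instagram</a>
<a>no href here</a>
"""
soup = BeautifulSoup(html, "html.parser")

absolute_links = []
for link in soup.find_all("a"):
    href = link.get("href")  # None when the tag has no href attribute
    if href is not None:
        # Relative paths are resolved against base; absolute URLs stay as they are.
        absolute_links.append(urljoin(base, href))

print(absolute_links)
```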
import requests
from bs4 import BeautifulSoup  # mind the capitalization!!!
url = 'https://ckefgisc.github.io/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.find('li').getText())  # mind the capitalization! (get_text() is the same method)
# print the text inside the first <li>
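The text returned for a tag with nested children keeps the line breaks and indentation from the source. As a minimal sketch, the `separator` and `strip` arguments of `get_text()` can tidy this up. The inline `<li>` below is an English stand-in for the FAQ item on the page, not the real content.

```python
from bs4 import BeautifulSoup

html = """
<li>
  <p class="qq">Q: When is the small class?</p>
  <p>A: Every day after school.</p>
</li>
"""
li = BeautifulSoup(html, "html.parser").find("li")

raw = li.get_text()                   # keeps newlines and indentation
clean = li.get_text(" ", strip=True)  # strip each piece, join with a space
print(repr(raw))
print(clean)
```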
乘一 is so good orz