I ran into a problem while scraping this site:
https://www.biququ.com/html/21627/
I found that its requests use headers like these (pasted below):
:authority: www.biququ.com
:method: GET
:path: /html/21627/
:scheme: https
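These are HTTP/2 pseudo-headers. Their names start with a colon, and they carry the information that HTTP/1.1 puts in the request line and the Host header. requests only speaks HTTP/1.1, so it cannot send them on its own.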
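To make the structure of these fields concrete, here is a small standard-library-only sketch. `parse_devtools_headers` and `url_from_pseudo_headers` are hypothetical helpers written for illustration, not part of any library. The sketch splits a header block pasted from the browser into pseudo-headers and ordinary headers. It also shows that :scheme, :authority and :path together spell out the request URL.

```python
def parse_devtools_headers(text):
    """Split a header block copied from the browser into
    (pseudo_headers, regular_headers) dicts.

    Pseudo-header lines start with ':' (e.g. ':authority: www.biququ.com'),
    so the name/value separator is the first ': ' after the leading colon.
    """
    pseudo, regular = {}, {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(":"):
            # Skip the leading colon so it is not mistaken for the separator
            name, sep, value = line[1:].partition(": ")
            target, name = pseudo, ":" + name
        else:
            name, sep, value = line.partition(": ")
            target = regular
        if not sep:
            raise ValueError(f"not a 'name: value' line: {line!r}")
        target[name] = value
    return pseudo, regular


def url_from_pseudo_headers(pseudo):
    """Rebuild the request URL as :scheme + '://' + :authority + :path."""
    return f"{pseudo[':scheme']}://{pseudo[':authority']}{pseudo[':path']}"


pasted = """
:authority: www.biququ.com
:method: GET
:path: /html/21627/
:scheme: https
"""
pseudo, regular = parse_devtools_headers(pasted)
print(pseudo[":method"])                # GET
print(url_from_pseudo_headers(pseudo))  # https://www.biququ.com/html/21627/
print(regular)                          # {}
```

The rebuilt URL is exactly the one being scraped. This is why an HTTP/2 client can fill these fields in from the URL itself.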
They have to be handled as follows:
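1. Install the library (hyper is no longer maintained, so it may not work on recent Python versions):
pip install hyper
2. Send the headers through hyper's HTTP/2 adapter for requests: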
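from hyper.contrib import HTTP20Adapter  # requests transport adapter that speaks HTTP/2 via hyper
import requests

# Pseudo-headers as sent by the browser. :path is left out because hyper
# builds the pseudo-headers itself and takes :path from the request URL.
headers = {
    ":authority": "www.biququ.com",
    ":method": "GET",
    ":scheme": "https",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
}
url = "https://www.biququ.com/html/21627/"

session = requests.Session()
# Route every request whose URL starts with this prefix through the HTTP/2 adapter
session.mount("https://www.biququ.com", HTTP20Adapter())
r = session.get(url, headers=headers)
print(r.status_code)
print(r.text)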
Scraping succeeded.
Reference:
How do I send a HTTP/2 pseudo-headers with requests?
https://stackoverflow.com/questions/62736631/how-do-i-send-a-http-2-pseudo-headers-with-requests