Python Web Crawling: Pseudo-Headers

I ran into a problem while crawling this site:
https://www.biququ.com/html/21627/

The browser's developer tools showed request headers like the following, copied here:

:authority: www.biququ.com
:method: GET
:path: /html/21627/
:scheme: https

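Headers like these, whose names begin with a colon, are called pseudo-headers. They belong to HTTP/2 and carry the method, scheme, host, and path that HTTP/1.1 places in the request line and the Host header.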
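As a small illustration of the naming rule, the pasted lines can be turned into a dictionary and the pseudo-headers picked out by their leading colon. This is a minimal sketch: the helper name `parse_pasted_headers` is hypothetical and not part of any library, and it assumes each line has the form `name: value` as copied from the developer tools.

```python
def parse_pasted_headers(raw):
    """Turn header lines copied from the browser into a dict.

    Hypothetical helper for illustration; assumes one "name: value" per line.
    """
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # A pseudo-header name starts with ":", so look for the separating
        # colon only after the first character of the line.
        rest, _, value = line[1:].partition(":")
        name = line[0] + rest
        headers[name.strip()] = value.strip()
    return headers


pasted = """
:authority: www.biququ.com
:method: GET
:path: /html/21627/
:scheme: https
"""

headers = parse_pasted_headers(pasted)
# Pseudo-headers are exactly the entries whose name begins with a colon.
pseudo = {k: v for k, v in headers.items() if k.startswith(":")}
print(pseudo)
```

Regular headers such as `User-Agent` pass through the same helper unchanged, since only the first colon after the opening character is treated as the separator.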

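The requests library only speaks HTTP/1.1, so sending these headers takes the following steps:

1. Install the library:
    pip install hyper

2. Write the new headers setup: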


from hyper.contrib import HTTP20Adapter
import requests

# Pseudo-headers copied from the browser, plus a regular User-Agent.
headers = {
    ":authority": "www.biququ.com",
    ":method": "GET",
    ":scheme": "https",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
}

url = "https://www.biququ.com/html/21627/"

# Mount hyper's HTTP/2 adapter so requests to this host go over HTTP/2.
session = requests.session()
session.mount('https://www.biququ.com', HTTP20Adapter())

r = session.get(url, headers=headers)
print(r.status_code)
print(r.text)

With this setup, the crawl succeeded.
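The four pseudo-header values map directly onto parts of the request URL, so they can be derived rather than copied by hand. The sketch below shows that mapping using only the standard library; the function name `pseudo_headers_for` is hypothetical, and whether a given HTTP/2 client wants these passed explicitly is an assumption that depends on the client.

```python
from urllib.parse import urlsplit


def pseudo_headers_for(url, method="GET"):
    """Build the four HTTP/2 request pseudo-headers for a URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # :path holds the path plus the query string, and is "/" when empty.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return {
        ":authority": parts.netloc,
        ":method": method.upper(),
        ":path": path,
        ":scheme": parts.scheme,
    }


result = pseudo_headers_for("https://www.biququ.com/html/21627/")
print(result)
```

For the URL used in this post, the result matches the four values seen in the browser.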

Reference:

How do I send a HTTP/2 pseudo-headers with requests?
https://stackoverflow.com/questions/62736631/how-do-i-send-a-http-2-pseudo-headers-with-requests
