The HTTP Protocol and the Python Requests Library

The HTTP Protocol

Request

Data carried by a request:

  1. The URL
  2. URL parameters, for example http://httpbin.org/get?key1=value1&key2=value2
  3. The request body, for example a submitted form or JSON data
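
As a rough illustration of item 2, the query string can be built from a dictionary with the standard library's urllib.parse.urlencode. This sketch only shows the encoding. With Requests, the same dictionary is normally passed through the params argument of requests.get, which assembles the URL itself.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = "http://httpbin.org/get"
params = {"key1": "value1", "key2": "value2"}

# Join the base URL and the encoded parameters with "?"
url = base_url + "?" + urlencode(params)
print(url)

# Parsing the URL again recovers the parameters (each value comes back as a list)
query = parse_qs(urlsplit(url).query)
print(query)
```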

The request headers, for example:

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36'}

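Response

Attributes of the response object r:

  1. r.text is the response body as a string, decoded automatically using the character encoding declared in the response headers.
  2. r.encoding is the text encoding used to decode the response body.
  3. r.status_code is the response status code. 200 means the request succeeded, 4xx means a client error, and 5xx means a server error. Check r.status_code to confirm that the request was answered correctly.
  4. r.content is the response body as bytes. Bodies compressed with gzip or deflate are decompressed automatically.
  5. r.json() is the JSON decoder built into Requests. It parses the response body as JSON.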
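
The relationship between these attributes can be sketched without a network call. The bytes below are a made-up stand-in for r.content, and the variable names simply mirror the attribute names. This is an illustration, not a description of how Requests works internally.

```python
import json

# Hypothetical response body as bytes (what r.content would hold)
content = '{"city": "北京", "ok": true}'.encode("utf-8")
encoding = "utf-8"                   # what r.encoding would hold

text = content.decode(encoding)      # roughly what r.text gives you
data = json.loads(text)              # roughly what r.json() gives you

print(type(content).__name__, type(text).__name__, type(data).__name__)
print(data["city"], data["ok"])
```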

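Sending a GET request with requests

GET retrieves data from the server.

import requests

r = requests.get('https://baidu.com/')
# Override the encoding used to decode r.text
r.encoding = "utf-8"
print("Text encoding:", r.encoding)
print("Response status code:", r.status_code)
print("Response body as a string:", r.text)

Assigning to r.encoding changes the encoding used to decode r.text. Without it, the text of this page may come out garbled.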
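
One likely cause of garbled text: when the server sends a text response without declaring a charset, Requests falls back to ISO-8859-1, and UTF-8 bytes decoded that way turn into mojibake. The sketch below reproduces the effect with plain bytes. The sample string is made up for illustration.

```python
raw = "百度一下".encode("utf-8")     # bytes as the server would send them

garbled = raw.decode("iso-8859-1")   # wrong encoding: one character per byte
correct = raw.decode("utf-8")        # right encoding

print(repr(garbled))
print(correct)
```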

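The returned response:

r.status_code holds the response status code:

200 means the request succeeded;
4xx means a client error;
5xx means a server error.

Check r.status_code to confirm that the request was answered correctly.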
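
A minimal sketch of such a check follows. The helper name describe_status is hypothetical, and treating the whole 2xx range as success is an assumption that goes slightly beyond the text, which only mentions 200.

```python
def describe_status(status_code):
    """Map an HTTP status code to the categories described above."""
    if 200 <= status_code < 300:
        return "success"
    if 400 <= status_code < 500:
        return "client error"
    if 500 <= status_code < 600:
        return "server error"
    return "other"  # 1xx informational and 3xx redirection are not covered here


for code in (200, 404, 503):
    print(code, describe_status(code))
```

With a real response the call would be describe_status(r.status_code). Requests also provides r.raise_for_status(), which raises an exception for 4xx and 5xx codes.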

Sending raw data in a POST request with requests

import requests
import json
key_dict = {"key1": "value1", "key2": "value2"}
r = requests.post("http://httpbin.org/post", data=json.dumps(key_dict))
print(r.text)

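Unlike a form submission, the JSON string arrives in the "data" field of the echoed response, and "form" stays empty.

Result: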

{
  "args": {}, 
  "data": "{\"key1\": \"value1\", \"key2\": \"value2\"}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "36", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.14.0", 
    "X-Amzn-Trace-Id": "Root=1-64196f9e-6b54025b02d214ad5a2762a9"
  }, 
  "json": {
    "key1": "value1", 
    "key2": "value2"
  }, 
  "origin": "61.16.102.74", 
  "url": "http://httpbin.org/post"
}

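Sending a form in a POST request with requests

The difference is that data receives a dict rather than a JSON string, so the body is sent form-encoded.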

import requests
import json
key_dict = {"key1": "value1", "key2": "value2"}
# r = requests.post("http://httpbin.org/post", data=json.dumps(key_dict))
r = requests.post("http://httpbin.org/post", data=key_dict)
print(r.text)

Result:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "key1": "value1", 
    "key2": "value2"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "23", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.14.0", 
    "X-Amzn-Trace-Id": "Root=1-64196fbf-6533dc414251c27e59971d5a"
  }, 
  "json": null, 
  "origin": "162.219.34.250", 
  "url": "http://httpbin.org/post"
}
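
The Content-Length values in the two results (36 for the JSON string, 23 for the form) follow directly from how the body is encoded. The sketch below reproduces both bodies with the standard library. urlencode is used here as a stand-in for the form encoding that Requests applies to a dict.

```python
import json
from urllib.parse import urlencode

key_dict = {"key1": "value1", "key2": "value2"}

json_body = json.dumps(key_dict)   # what data=json.dumps(key_dict) sends
form_body = urlencode(key_dict)    # roughly what data=key_dict sends

print(json_body, len(json_body))
print(form_body, len(form_body))
```

Requests also accepts json=key_dict in requests.post, which serializes the dict and sets the Content-Type header to application/json.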

Timeouts

import requests

link = "http://www.crazyant.net/"

r = requests.get(link, timeout=0.001)
print(r.text)

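With a timeout this short (0.001 seconds), the connection cannot be established in time, and the request raises an error: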

requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='www.crazyant.net', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x7f9acab71650>, 'Connection to www.crazyant.net timed out. (connect timeout=0.001)'))
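
In practice the exception is usually caught rather than left to crash the script. The following is a minimal sketch. The helper name fetch_text and the default timeout values are assumptions chosen for illustration.

```python
import requests


def fetch_text(url, timeout=(3.05, 10)):
    """Return the response body as text, or None if the request fails."""
    try:
        # timeout is in seconds: a single number, or a (connect, read) tuple
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()  # turn 4xx/5xx into requests.exceptions.HTTPError
        return r.text
    except requests.exceptions.Timeout:
        # ConnectTimeout and ReadTimeout both derive from Timeout
        print("Request timed out:", url)
    except requests.exceptions.RequestException as exc:
        print("Request failed:", exc)
    return None
```

Calling fetch_text("http://www.crazyant.net/", timeout=0.001) would then print a message and return None instead of raising.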

Sending custom headers

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36',
    'Cookie': 'BIDUPSID=377D2B91ED643DFB14D8CC6E8B3626A2; PSTM=1664248378; BAIDUID=377D2B91ED643DFBF1BDCE23F61D2A0D:SL=0'
}
# /get is the httpbin endpoint that accepts GET requests; /post accepts only POST
r = requests.get('http://httpbin.org/get', headers=headers)
print("Response status code:", r.status_code)

Result:

Response status code: 200

Uploading a file

import requests

# Open the file in binary mode; the with block closes it after the upload
with open("image.jpg", "rb") as img:
    myfiles = {"myfile": img}
    resp = requests.post("http://httpbin.org/post", files=myfiles)
print(resp.text)

Result:

{
  "args": {}, 
  "data": "", 
  "files": {
    "myfile": "data:application/octet-stream;base64,/9j/4AAQSkZJRgABAQAASABIAAD/49k="
  }, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "121974", 
    "Content-Type": "multipart/form-data; boundary=32affeb7cdd148288dd51b55c7c936c9", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.14.0", 
    "X-Amzn-Trace-Id": "Root=1-64197251-407fa733740f8f0678287acb"
  }, 
  "json": null, 
  "origin": "61.16.102.77", 
  "url": "http://httpbin.org/post"
}
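
httpbin echoes the uploaded file back as a base64 data URI in the "files" field. As a quick sanity check, the start of that payload can be decoded to confirm it is a JPEG. Only the first 16 characters shown in the result are used here, since the full value is not reproduced above.

```python
import base64

# First 16 characters of the base64 payload in the "myfile" field
prefix = "/9j/4AAQSkZJRgAB"

raw = base64.b64decode(prefix)
print(raw[:3].hex())   # JPEG files begin with the bytes ff d8 ff
print(raw[6:10])       # the JFIF marker that follows the JPEG header
```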
