Why can't I build this POST request with urllib?

站长

Here are the request headers of a POST request captured in the browser:

POST /csindex-home/exportExcel/security-industry-search-excel/CH HTTP/1.1
Host: www.csindex.com.cn
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: application/json;charset=utf-8
Content-Length: 78
Origin: https://www.csindex.com.cn
Connection: keep-alive
Referer: https://www.csindex.com.cn/
Cookie: aliyungf_tc=0b04a5cc679f3b1e6d9d27a2635a318c4b60b69f0d58a933d780f569a78d043c; ssxmod_itna=iqmxgQDQG=0=oxeq0Lx=S8GO8KDtDcAPcDto=7Qx40Hn0eiODUxn4iaDT=aNBpQ2iwPGbnGar=T+7Gi4Ho/gfgTe87ZeYhoD74i8uDW5DlDDR7NdDlc470DYPG0DiFQDX=Llkqp1IACt2DGCzPqDu24DxxiTDinoq04x0lp7Ih986=xiO6n5NjDrrCxY5ehr8eRbd12IDDcEHChDD===; zg_did=%7B%22did%22%3A%20%221868ba15f2a47b-009a307f25d0578-376a464a-1fa400-1868ba15f2b4c6%22%7D; _uab_collina=167737985610863887816821; ssxmod_itna2=iqmxgQDQG=0=oxeq0Lx=S8GO8KDtDcAPcDto=7DnI3yiqDshKDLe=t7NIQB=AxqidzAPmi+ennDn7UG7FAmGi=es9uIxhej9GB=foqUGA9LPcmQb4ftxPpmiPPvX7Dc8KciqeAM8Ro+YRE378GD8YD0ec34iiWqqGwq7FY4kbAxk+Oe8KDUe7SrSlCDr9fYoA2qh0AYi/+EYjRqeN8deq4h=kc3qiC9Fw8AXFAcCNnKw=fYN/3KIqS=kUdSNubKSDr57Rk2B+YHtBcM8000WytU6=b6en87dmbYV9ztohpGEYToB8CFNxQwOGmiGDx47jI44v7lheo0UaQpoqcDj7jGmlPFgQ8WiGAXt+=4KbPxq4ePc0f8meumLrZqdYj58qFOe+DRuPm2O0o3=xRd1zou6QBDnda0WnfiKvvWmdx49YtjPhleAKgmDOQiKe=1U6KdTILvpiv00UamU8QjRLCPA7BmnI0+79d2azoKcOQ2jvBjyM/foSy/Feu232naAaHchi27rWOyuoKy4ql9TPTT3MWdvfVzExhmQMOZ8qpC5cAeGmmmjmivYFMO8BD7j5ceqx6Ww1wp8kakqeKpRejviMePhF6=c0e+dlpq4aDG2tW4Qz89Wc=D9chE7avY2cu/=iB6orBFcxlmm7c8Ax4DKGRxWo74P8xnzGYDvf=aHaYbeqi7ycDMvuAtLr4pe5RmY4niqjxKy6pGS043YN7Pg37ed2D7i5Qs9GeeP7AXpDD08DG7oGDD=; acw_tc=76b20f8a16773929641405222e21e87483e560370854cd0bb2155ccc8ed2a6; 
zg_6df0ba28cbd846a799ab8f527e8cc62b=%7B%22sid%22%3A%201677392958817%2C%22updated%22%3A%201677393938767%2C%22info%22%3A%201677379854166%2C%22superProperty%22%3A%20%22%7B%5C%22%E5%BA%94%E7%94%A8%E5%90%8D%E7%A7%B0%5C%22%3A%20%5C%22%E4%B8%AD%E8%AF%81%E6%8C%87%E6%95%B0%E5%AE%98%E7%BD%91%5C%22%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22%22%2C%22landHref%22%3A%20%22https%3A%2F%2Fwww.csindex.com.cn%2F%23%2FdataService%2FindustryClassification%22%2C%22prePath%22%3A%20%22https%3A%2F%2Fwww.csindex.com.cn%2F%23%2FdataService%2FindustryClassification%22%2C%22duration%22%3A%209749.359999999404%2C%22zs%22%3A%200%2C%22sc%22%3A%200%2C%22firstScreen%22%3A%201677392958817%7D
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin

POST data:

{"searchInput":"","pageNum":3,"pageSize":10,"sortField":null,"sortOrder":null}

The same request built with the requests library works:

import requests

res = requests.post(
    "https://www.csindex.com.cn/csindex-home/exportExcel/security-industry-search-excel/CH",
    headers={"content-type": "application/json;charset=UTF-8"},
    json={
        "searchInput": "",  # search text
        "pageNum": 1,  # page number
        "pageSize": 10,  # rows per page
        "sortField": None,  # field to sort by
        "sortOrder": None,  # sort direction
    },
)
print(res.status_code, res.url)
200 https://www.csindex.com.cn/csindex-home/exportExcel/security-industry-search-excel/CH

My attempt with urllib:

import urllib.parse
import urllib.request
base_url="https://www.csindex.com.cn/csindex-home/exportExcel/security-industry-search-excel/CH"
data = {
    'searchInput' : '',
    'pageNum' : 1,
    'pageSize' : 10,
    'sortField' : None,
    'sortOrder' : None
}
headers={"content-type": "application/json;charset=UTF-8"}
postdata=urllib.parse.urlencode(data).encode('utf-8')
req=urllib.request.Request(url=base_url,headers=headers,data=postdata,method='POST')
response=urllib.request.urlopen(req)
html=response.read()
print(html.decode('utf-8'))
{"code":"500","msg":"服务器异常,请联系管理员","data":null,"success":false}

Another variant does not work either:

import urllib.request
import urllib.parse

url = "https://www.csindex.com.cn/csindex-home/exportExcel/security-industry-search-excel/CH"
params = {
    'searchInput': '',
    'pageNum': 1,
    'pageSize': 10,
    'sortField': None,
    'sortOrder': None
}

query_string = urllib.parse.urlencode(params)
data = query_string.encode("ascii")

with urllib.request.urlopen(url, data) as response:
    response_text = response.read()
    print(response_text)

Why does the server report an exception?

1 answer
test
2024-07-09

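Right, as the reply above says. The request declares Content-Type: application/json, but urllib.parse.urlencode produces a form-encoded body (searchInput=&pageNum=1&...), which the server presumably cannot parse as JSON, hence the code 500 response. In the first urllib example, postdata should be built like this: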

import json
postdata = bytes(json.dumps(data), "utf-8")
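To make the difference concrete, here is a small sketch that prints the two bodies side by side. It uses the same dict as the question; the variable names form_body and json_body are only for illustration. The compact separators are optional and are used here only so the output has the same shape as the body captured in the browser.

```python
import json
from urllib.parse import urlencode

data = {
    "searchInput": "",
    "pageNum": 1,
    "pageSize": 10,
    "sortField": None,
    "sortOrder": None,
}

# What the failing code sends: a form-encoded string.
# Note that None is turned into the literal text "None".
form_body = urlencode(data)

# What the browser and requests(json=...) send: a JSON document,
# where None becomes null.
json_body = json.dumps(data, separators=(",", ":"))

print(form_body)
print(json_body)
```

The first line printed is a query string, the second is JSON. Only the second matches the application/json content type declared in the headers.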
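Also, this URL returns binary data (the exportExcel path suggests an Excel file), so the final html.decode('utf-8') has to go as well. Write the raw bytes to a file instead of decoding them.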
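Putting both points together, a minimal sketch of the corrected urllib version might look like this. The helper names build_request and download, and the output file name industry.xlsx, are my own assumptions and not from the question. I have also not checked whether the server accepts urllib's default User-Agent, so you may need to add one to the headers.

```python
import json
import urllib.request

URL = "https://www.csindex.com.cn/csindex-home/exportExcel/security-industry-search-excel/CH"


def build_request(url, payload):
    """Build a POST request whose body is JSON, matching the Content-Type."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json;charset=UTF-8"},
        method="POST",
    )


def download(url, payload, out_path):
    """Send the request and save the raw response bytes without decoding."""
    req = build_request(url, payload)
    with urllib.request.urlopen(req) as response:
        content = response.read()
    with open(out_path, "wb") as f:
        f.write(content)
    return len(content)


# Example call (performs a real network request, so it is left commented out):
# payload = {"searchInput": "", "pageNum": 1, "pageSize": 10,
#            "sortField": None, "sortOrder": None}
# download(URL, payload, "industry.xlsx")  # file name is an assumption
```

Building the request in a separate function makes it easy to inspect req.data and the headers before anything is sent.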