发布于2020-03-15 18:56 阅读(1860) 评论(0) 点赞(4) 收藏(0)
准备考研复试的项目,想了想就先做了个简单爬虫,爬取百度翻译单词意思。
根据需要选择api服务,我这里使用的是通用api
进行简单的分析
进入百度翻译页面,使用F12进入开发者模式
进行简单的单词翻译
可以在右边的数据栏里找到一个v2transapi开头的数据,我们发现这里有目标译文。尝试解析headers,resoponse,发现关联。
查看开发者DEMO文档
在分析request后,回到百度翻译所给的demo(可以在百度翻译开发平台获取)
# coding=utf-8
import http.client
import hashlib
import urllib
import random
import json
appid = '' # 填写你的appid
secretKey = '' # 填写你的密钥
httpClient = None
myurl = '/api/trans/vip/translate'
fromLang = 'auto' #原文语种
toLang = 'zh' #译文语种
salt = random.randint(32768, 65536)
q= 'apple'
sign = appid + q + str(salt) + secretKey
sign = hashlib.md5(sign.encode()).hexdigest()
myurl = myurl + '?appid=' + appid + '&q=' + urllib.parse.quote(q) + '&from=' + fromLang + '&to=' + toLang + '&salt=' + str(
salt) + '&sign=' + sign
try:
httpClient = http.client.HTTPConnection('api.fanyi.baidu.com')
httpClient.request('GET', myurl)
# response是HTTPResponse对象
response = httpClient.getresponse()
result_all = response.read().decode("utf-8")
result = json.loads(result_all)
print (result)
except Exception as e:
print (e)
finally:
if httpClient:
httpClient.close()
阅读后可以知道百度翻译url的加密方式,至此我们对翻译机制有了一定的了解,接下来开始尝试动手写代码了。
到此已经可以对单词进行爬取翻译
def transbaidu(res_q):
appid = '你的id' # 百度翻译开发者appid
secretKey = '你的密钥' # 开发密钥
httpClient = None
myurl = '/api/trans/vip/translate'#url地址
fromLang = 'auto' #原文语种
toLang = 'zh' #译文语种
salt = random.randint(32768, 65536)
q= res_q #翻译内容
sign = appid + q + str(salt) + secretKey #密文
sign = hashlib.md5(sign.encode()).hexdigest() #加密MD5
myurl = myurl + '?appid=' + appid + '&q=' + urllib.parse.quote(q) + '&from=' + fromLang + '&to=' + toLang + '&salt=' + str(
salt) + '&sign=' + sign #最终url
try:
httpClient = http.client.HTTPConnection('api.fanyi.baidu.com')
httpClient.request('GET', myurl)
# response是HTTPResponse对象
response = httpClient.getresponse()
result_all = response.read().decode("utf-8")
result = json.loads(result_all)
#print (result['trans_result'][0]['src'])原文
res=result['trans_result'][0]['dst']
return res
except Exception as e:
print (e)
finally:
if httpClient:
httpClient.close()
如果我们需要翻译的数据较大或是以文件方式存在,我们也需要在对代码进行update(这里假设原文数据存放在excel文件里面)
需要注意的是百度默认有反爬机制,这里采用的等待时长1s,如果数据实在太大可以采用其他方法。
def excelTrans(
srcFilename=r'D:\test\source.xlsx',
desFilename=r'D:\test\result.xlsx',
srcSheet='Sheet1',
num = 2,
#srcColumn=2,
srcRowBegin=1,
srcRowEnd=44,
desColumn=1,
desSheet='result2'):
wb = openpyxl.load_workbook(srcFilename)
ws = wb[srcSheet]
wb2 = Workbook()
#ws2 = wb2.create_sheet(title=desSheet)
#ws2 = wb2.create_sheet(title=desSheet,index = 1)
for j in range(num):
ws2 = wb2.create_sheet(title=desSheet,index = j)
for i in range(srcRowBegin, srcRowEnd, 1):
sstr = ws.cell(row=i, column=j+1).value
if not (sstr is None):
ws2.cell(row=i-srcRowBegin+1, column=desColumn).value = transbaidu(sstr)
time.sleep(1) #反爬,设置定时;数据太大时就要用多线程了。
wb2.save(desFilename)
# coding=utf-8
import http.client
import hashlib
import urllib
import random
import json
import time
import openpyxl
from openpyxl import Workbook
def transbaidu(res_q):
appid = '20200312000397118' # 百度翻译开发者appid
secretKey = '7fTGCEjapa_vz3eoin_u' # 开发密钥
httpClient = None
myurl = '/api/trans/vip/translate'#url地址
fromLang = 'auto' #原文语种
toLang = 'zh' #译文语种
salt = random.randint(32768, 65536)
q= res_q #翻译内容
sign = appid + q + str(salt) + secretKey #密文
sign = hashlib.md5(sign.encode()).hexdigest() #加密MD5
myurl = myurl + '?appid=' + appid + '&q=' + urllib.parse.quote(q) + '&from=' + fromLang + '&to=' + toLang + '&salt=' + str(
salt) + '&sign=' + sign #最终url
try:
httpClient = http.client.HTTPConnection('api.fanyi.baidu.com')
httpClient.request('GET', myurl)
# response是HTTPResponse对象
response = httpClient.getresponse()
result_all = response.read().decode("utf-8")
result = json.loads(result_all)
#print (result['trans_result'][0]['src'])原文
res=result['trans_result'][0]['dst']
return res
except Exception as e:
print (e)
finally:
if httpClient:
httpClient.close()
def excelTrans(
srcFilename=r'D:\test\source.xlsx',
desFilename=r'D:\test\result.xlsx',
srcSheet='Sheet1',
num = 2,
#srcColumn=2,
srcRowBegin=1,
srcRowEnd=44,
desColumn=1,
desSheet='result2'):
wb = openpyxl.load_workbook(srcFilename)
ws = wb[srcSheet]
wb2 = Workbook()
#ws2 = wb2.create_sheet(title=desSheet)
#ws2 = wb2.create_sheet(title=desSheet,index = 1)
for j in range(num):
ws2 = wb2.create_sheet(title=desSheet,index = j)
for i in range(srcRowBegin, srcRowEnd, 1):
sstr = ws.cell(row=i, column=j+1).value
if not (sstr is None):
ws2.cell(row=i-srcRowBegin+1, column=desColumn).value = transbaidu(sstr)
time.sleep(1) #反爬,设置定时;数据太大时就要用多线程了。
wb2.save(desFilename)
if __name__ == '__main__':
excelTrans()
以上就是pytthon爬取百度翻译,要准备下一个项目了,真希望复试能过,好想有书读,太卑微了
原文链接:https://blog.csdn.net/qq_38281781/article/details/104835873
作者:编程gogogo
链接:https://www.pythonheidong.com/blog/article/260182/9557e2179381603b0c46/
来源:python黑洞网
任何形式的转载都请注明出处,如有侵权 一经发现 必将追究其法律责任
昵称:
评论内容:(最多支持255个字符)
---无人问津也好,技不如人也罢,你都要试着安静下来,去做自己该做的事,而不是让内心的烦躁、焦虑,坏掉你本来就不多的热情和定力
Copyright © 2018-2021 python黑洞网 All Rights Reserved 版权所有,并保留所有权利。 京ICP备18063182号-1
投诉与举报,广告合作请联系vgs_info@163.com或QQ3083709327
免责声明:网站文章均由用户上传,仅供读者学习交流使用,禁止用做商业用途。若文章涉及色情,反动,侵权等违法信息,请向我们举报,一经核实我们会立即删除!