Scraping Lagou (lagou.com) job listings with Python's requests library

Date: 2022-07-04 16:57:42

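Press F12 to open the browser's developer tools and capture the network traffic. This lets you locate the API endpoint that serves the job listings.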

The captured request shows the endpoint's URL and its form data. In the form, pn is the page number being requested and kd is the job-search keyword.

Build the POST request in Python:

    import requests

    # Form data copied from the captured request: pn = page number, kd = keyword
    data = {
        'first': 'true',
        'pn': '1',
        'kd': 'python'
    }
    headers = {
        'referer': 'https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput=',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
    }
    res = requests.post('https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false',
                        data=data, headers=headers)
    print(res.text)

The endpoint did not return any job data.
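
One way to tell a real result from an error payload is to look for the nested keys that the full crawler reads later on (content, positionResult, result). The following is only a sketch of that check: the helper name extract_results is hypothetical, and both sample payloads are invented for illustration rather than captured from the site.

```python
import json


def extract_results(text):
    """Return the list of job records, or None if the payload has none."""
    try:
        payload = json.loads(text)
    except ValueError:
        # The body was not valid JSON at all
        return None
    node = payload
    # Walk the same key path that the crawler reads when saving records
    for key in ("content", "positionResult", "result"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, list) else None


# Invented sample payloads, not real responses
ok = '{"content": {"positionResult": {"result": [{"positionName": "Python dev"}]}}}'
blocked = '{"status": false, "msg": "too frequent"}'

print(extract_results(ok))
print(extract_results(blocked))
```

Returning None instead of raising lets the caller decide whether to retry, refresh cookies, or stop.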

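Even after switching to a different network, the endpoint still returned the "operation too frequent" error. A closer look showed that the endpoint requires dynamic cookies; without them it keeps returning that same error.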

    import requests

    data = {
        'first': 'true',
        'pn': '1',
        'kd': 'python'
    }
    # The headers must include user-agent and referer, otherwise no cookies are returned
    headers = {
        'referer': 'https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput=',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
    }
    # Visit the listing page first to obtain the cookies
    r1 = requests.get('https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput=',
                      headers=headers)
    # Then pass the cookies from r1 into the POST request
    r2 = requests.post('https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false',
                       data=data, headers=headers, cookies=r1.cookies)
    print(r2.text)
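
As an alternative to passing cookies by hand, requests.Session stores cookies from earlier responses and sends them on later requests made through the same session. The sketch below swaps in that technique; it is not the approach used in this article's code. The names make_session, fetch_page, LIST_URL and AJAX_URL are hypothetical, and whether the site accepts this flow is an assumption that has not been verified here.

```python
import requests

LIST_URL = ('https://www.lagou.com/jobs/list_python/p-city_0'
            '?&cl=false&fromSearch=true&labelWords=&suginput=')
AJAX_URL = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'


def make_session(headers):
    """Create a session that sends the given headers on every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def fetch_page(session, page, keyword):
    """Visit the listing page to pick up cookies, then POST the search form."""
    # The session keeps any cookies set by this response
    session.get(LIST_URL, timeout=10)
    data = {'first': 'true', 'pn': str(page), 'kd': keyword}
    # Cookies stored on the session are sent automatically here
    return session.post(AJAX_URL, data=data, timeout=10)
```

A call such as fetch_page(make_session(headers), 1, 'python') would then replace the separate get and post calls, with headers being the same dictionary containing referer and user-agent.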

Note: the cookies are also refreshed after every ten requests to the endpoint. The complete crawler code follows.
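
The ten-request refresh rule can be isolated in a small wrapper so that the request loop does not need to track it. This is a minimal sketch under the assumption that counting uses is an acceptable stand-in for the rule; the class name RefreshingCookies and the stand-in fetcher fake_fetch are hypothetical, and no network access is involved.

```python
class RefreshingCookies:
    """Hand out cookies, fetching a fresh set after every `limit` uses."""

    def __init__(self, fetch, limit=10):
        self._fetch = fetch    # callable that returns a fresh set of cookies
        self._limit = limit
        self._uses = 0
        self._cookies = None

    def get(self):
        # Fetch on first use, and again once the current set is used up
        if self._cookies is None or self._uses >= self._limit:
            self._cookies = self._fetch()
            self._uses = 0
        self._uses += 1
        return self._cookies


# Stand-in fetcher that records how many times it was called
calls = []


def fake_fetch():
    calls.append(1)
    return {'token': len(calls)}


source = RefreshingCookies(fake_fetch, limit=10)
seen = [source.get()['token'] for _ in range(25)]
print(len(calls))  # 3 fetches cover 25 requests
```

In a real crawler the fetch argument would be a function that requests the listing page and returns its cookies.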

    import json
    import requests

    # Request headers (referer and user-agent are both required)
    headers = {
        'referer': 'https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput=',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
    }

    # Fetch cookies by visiting the listing page
    def getCookie():
        res = requests.get('https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput=',
                           headers=headers)
        return res.cookies

    # Fetch one page of JSON data
    def getPage(i, cookies, kw):
        data = {
            'first': 'true',
            'pn': i,
            'kd': kw
        }
        res = requests.post('https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false',
                            data=data, headers=headers, cookies=cookies)
        return json.loads(res.text)

    # Join a list of strings into one space-separated string
    def reduceList(l):
        return ' '.join(l).strip()

    # Extract the fields and write them to the file
    def saveInCsv(f, data):
        js = data['content']['positionResult']['result']
        for node in js:
            # Handle a missing district
            district = node['district']
            if district is not None:
                district = '-' + district
            else:
                district = ''
            f.write(
                node['positionName'] + '·' + node['city'] + district + '·' +
                node['salary'] + '·' + node['workYear'] + '·' + node['education'] + '·' +
                reduceList(node['skillLables']) + '·' + node['companyShortName'] + '·' +
                node['companySize'] + '·' + node['positionAdvantage'] + '\n')

    if __name__ == '__main__':
        # Initialise the cookies
        cookies = getCookie()
        with open('file.csv', 'w', encoding='utf-8') as f:
            for i in range(1, 31):
                # Fetch new cookies every ten requests
                if i % 10 == 0:
                    cookies = getCookie()
                # Parse the fields and store them
                data = getPage(i, cookies, 'python')
                saveInCsv(f, data)
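
The crawler above joins fields with a '·' character, which breaks if a field ever contains that character. As a hedged alternative, the standard csv module quotes fields when needed. In the sketch below the field names are the keys read by the crawler; the helper node_to_row and the sample record are hypothetical, and the record's values are invented for illustration.

```python
import csv
import io

# Keys read from each job record by the crawler
FIELDS = ['positionName', 'city', 'district', 'salary', 'workYear',
          'education', 'skillLables', 'companyShortName', 'companySize',
          'positionAdvantage']


def node_to_row(node):
    """Turn one job record (a dict) into a flat list of strings."""
    row = []
    for field in FIELDS:
        value = node.get(field)
        if value is None:
            value = ''                 # missing or null field becomes empty
        elif isinstance(value, list):
            value = ' '.join(value)    # e.g. the list of skill labels
        row.append(str(value))
    return row


# Invented sample record, not real data
sample = {
    'positionName': 'Python developer', 'city': 'Beijing', 'district': None,
    'salary': '15k-25k', 'workYear': '3-5', 'education': 'Bachelor',
    'skillLables': ['Python', 'Django'], 'companyShortName': 'ExampleCo',
    'companySize': '50-150', 'positionAdvantage': 'flexible hours, bonus',
}

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(FIELDS)
writer.writerow(node_to_row(sample))
print(buffer.getvalue())
```

Writing to a real file works the same way: open it with newline='' and pass the file object to csv.writer in place of the StringIO buffer.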
