
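csv - How do I save scraped web data into multiple columns with Python?

Views: 107 | Date: 2022-08-30 10:07:16

Problem description

I want to save the data my scraper collects into separate columns of a TSV file, but after trying many approaches I still cannot get it saved as two columns. I want the numbered app names in the first column and the category shown beneath each name in the second column.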

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import re
    import csv

    html = urlopen('http://www.app12345.com/?area=tw&store=Apple%20Store')
    bs0bj = BeautifulSoup(html)

    def GPname():
        # Collect every numbered app name, one per line
        GPnameList = bs0bj.find_all('dd', {'class': re.compile('ddappname')})
        str = ''
        for name in GPnameList:
            str += name.get_text()
            str += '\n'
            print(name.get_text())
        return str

    def GPcompany():
        # Collect every category, one per line
        GPcompanyname = bs0bj.find_all('dd', {'style': re.compile('color')})
        str = ''
        for cpa in GPcompanyname:
            str += cpa.get_text()
            str += '\n'
            print(cpa.get_text())
        return str

    with open('0217.tsv', 'w', newline='', encoding='utf-8') as f:
        f.write(GPname())
        f.write(GPcompany())
    f.close()

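Maybe I am just not familiar with zip: after saving, the output ended up with one character per cell. I also found this reference, but no matter what I tried I could not get it to work: https://segmentfault.com/q/10...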

Answers

Answer 1:

Writing a CSV file is simple once your data is structured as a list of rows, like this: [['1. 東森新聞雲','新聞'],['2. 創世黎明(Dawn of world)','遊戲']]

    from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib import urlopen
    from bs4 import BeautifulSoup
    import re
    import csv

    html = urlopen('http://www.app12345.com/?area=tw&store=Apple%20Store')
    bs0bj = BeautifulSoup(html)

    # Two parallel lists: app names and their categories
    GPnameList = [name.get_text() for name in bs0bj.find_all('dd', {'class': re.compile('ddappname')})]
    GPcompanyname = [cpa.get_text() for cpa in bs0bj.find_all('dd', {'style': re.compile('color')})]

    # zip pairs the i-th name with the i-th category; each pair becomes one comma-separated line
    data = '\n'.join([','.join(d) for d in zip(GPnameList, GPcompanyname)])

    with open('C:/Users/sa/Desktop/0217.csv', 'wb') as f:
        f.write(data.encode('utf-8'))
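The same pairing idea can also be handed to the standard csv module, which takes care of quoting if a name happens to contain the delimiter. The following is only a minimal sketch: the helper name write_rows and the two sample lists are assumptions standing in for the scraped results, and an in-memory buffer replaces the real file so the example runs without network access. The last lines reproduce what may be behind the "one character per cell" symptom: writerow() iterates over whatever it is given, so a bare string is split into characters.

```python
import csv
import io


def write_rows(fileobj, names, categories, delimiter='\t'):
    """Write one row per (name, category) pair to an open text file object."""
    writer = csv.writer(fileobj, delimiter=delimiter, lineterminator='\n')
    # zip pairs the i-th name with the i-th category, giving two columns
    writer.writerows(zip(names, categories))


# Hypothetical sample data standing in for the scraped lists
names = ['1. 東森新聞雲', '2. 創世黎明(Dawn of world)']
categories = ['新聞', '遊戲']

buffer = io.StringIO()
write_rows(buffer, names, categories)
tsv_text = buffer.getvalue()

# Pitfall: a bare string passed to writerow() is treated as a sequence
# of characters, so every character lands in its own cell
pitfall = io.StringIO()
csv.writer(pitfall, delimiter='\t', lineterminator='\n').writerow('abc')
pitfall_text = pitfall.getvalue()
```

To write to disk instead, the buffer could be replaced by a file opened with open('0217.tsv', 'w', newline='', encoding='utf-8'); the newline='' argument is what the csv module documentation recommends for files passed to csv.writer.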
