The Amazon scraping tool Web Scraper, a complete guide: scraping product Q&A, scraping specific pages, scraping competitor reviews, scraping the first few pages of keyword search results, and scraping Best Sellers lists
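Let's get straight to the code for each use case.
1. Scraping product Q&A
There are two ways to do this: scrape all pages, or scrape only specific pages.
a. Let Web Scraper click through the pagination and scrape every page. The downside is that some competitors have hundreds of pages of Q&A, and there is rarely any need to scrape that many.
Code: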
{"_id":"amz-qa","startUrl":["https://www.amazon.com/ask/que ..."],"selectors":[{"id":"contents","parentSelectors":["_root","nextpage"],"type":"SelectorElement","selector":".a-section > div.a-spacing-base > div > div.a-col-right","multiple":true,"delay":null},{"id":"question","parentSelectors":["contents"],"type":"SelectorText","selector":".a-spacing-small div.a-col-right","multiple":false,"delay":0,"regex":""},{"id":"answer","parentSelectors":["contents"],"type":"SelectorText","selector":".a-col-right > span:nth-of-type(1)","multiple":false,"delay":0,"regex":""},{"id":"buyer","parentSelectors":["contents"],"type":"SelectorText","selector":"span.a-profile-name","multiple":false,"delay":0,"regex":""},{"id":"date","parentSelectors":["contents"],"type":"SelectorText","selector":"span.a-color-tertiary","multiple":false,"delay":0,"regex":""},{"id":"nextpage","parentSelectors":["_root","nextpage"],"type":"SelectorLink","selector":".a-last a","multiple":true,"delay":0}]}
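Import the code as shown in the screenshots: in Web Scraper, choose Create new sitemap > Import Sitemap and paste the JSON.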
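To scrape a different competitor, change the ASIN. You can do this in the sitemap's Edit metadata screen: replace the ASIN at the end of the start URL with the other product's ASIN.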
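If you keep sitemaps as JSON files instead of editing them in the browser, the same ASIN swap can be scripted. The Python sketch below is a hypothetical helper, not a Web Scraper feature: it copies a sitemap, gives the copy a new `_id`, and replaces the old ASIN with the new one in every start URL. The 10-character ASIN check and the example.com URL and ASINs in the usage are placeholder assumptions.

```python
import copy
import re

# Assumption: an ASIN is 10 uppercase letters/digits (e.g. B0XXXXXXXX).
ASIN_RE = re.compile(r"[A-Z0-9]{10}")


def retarget_sitemap(sitemap: dict, old_asin: str, new_asin: str, new_id: str) -> dict:
    """Return a copy of a Web Scraper sitemap pointed at a different ASIN.

    Hypothetical helper: only "_id" and "startUrl" are rewritten; selectors are untouched.
    """
    if not ASIN_RE.fullmatch(new_asin):
        raise ValueError(f"does not look like an ASIN: {new_asin!r}")
    if not any(old_asin in url for url in sitemap["startUrl"]):
        raise ValueError(f"{old_asin!r} not found in any start URL")
    clone = copy.deepcopy(sitemap)  # leave the original sitemap unchanged
    clone["_id"] = new_id
    clone["startUrl"] = [url.replace(old_asin, new_asin) for url in clone["startUrl"]]
    return clone


if __name__ == "__main__":
    import json

    # Placeholder URL and ASINs, not real Amazon pages.
    original = {"_id": "amz-qa2",
                "startUrl": ["https://example.com/qa/B000000001/[1-2]"],
                "selectors": []}
    print(json.dumps(retarget_sitemap(original, "B000000001", "B000000002", "amz-qa2-b")))
```

The printed JSON is an ordinary sitemap, so it can be pasted back in through Import Sitemap.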
b. Scraping specific pages:
After importing the code, if you want to limit which pages are scraped, open the sitemap for editing and change the numbers at the end of the URL.
To scrape 8 pages, change it to [1-8], and so on.
Code:
{"_id":"amz-qa2","startUrl":["https://www.amazon.com/ask/que ... 8QQP/[1-2]"],"selectors":[{"delay":null,"id":"contents","multiple":true,"parentSelectors":["_root"],"selector":".a-section > div.a-spacing-base > div > div.a-col-right","type":"SelectorElement"},{"delay":0,"id":"question","multiple":false,"parentSelectors":["contents"],"regex":"","selector":".a-spacing-small div.a-col-right","type":"SelectorText"},{"delay":0,"id":"answer","multiple":false,"parentSelectors":["contents"],"regex":"","selector":".a-col-right > span:nth-of-type(1)","type":"SelectorText"},{"delay":0,"id":"buyer","multiple":false,"parentSelectors":["contents"],"regex":"","selector":"span.a-profile-name","type":"SelectorText"},{"delay":0,"id":"date","multiple":false,"parentSelectors":["contents"],"regex":"","selector":"span.a-color-tertiary","type":"SelectorText"}]}
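Web Scraper treats a bracketed range in the start URL as a list of URLs, so `[1-8]` queues pages 1 through 8. The Python sketch below (a hypothetical helper, not part of Web Scraper) reproduces that expansion for the plain `[start-end]` form used in this post only, which makes it easy to check which page URLs a pattern will produce before running the scrape.

```python
import re

# Matches a plain numeric range such as [1-8] (no zero padding or step).
RANGE_RE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range(url: str) -> list[str]:
    """Expand the first [start-end] range in url into one URL per page, inclusive.

    URLs without a range are returned unchanged.
    """
    match = RANGE_RE.search(url)
    if match is None:
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[: match.start()], url[match.end():]
    return [f"{prefix}{page}{suffix}" for page in range(start, end + 1)]


if __name__ == "__main__":
    for u in expand_range("https://www.amazon.com/s?k=lunch+box&page=[1-5]"):
        print(u)
```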
Scraping other data works the same way: just import the code.
2. Scraping competitor reviews
Code:
{"_id":"review","startUrl":["https://www.amazon.com/Insulat ..."],"selectors":[{"clickElementSelector":".a-last a","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickMore","delay":3000,"discardInitialElements":"do-not-discard","id":"info","multiple":true,"parentSelectors":["_root"],"selector":".a-row div.celwidget","type":"SelectorElementClick"},{"delay":0,"id":"name","multiple":false,"parentSelectors":["info"],"regex":"","selector":"span.a-profile-name","type":"SelectorText"},{"delay":0,"id":"score","multiple":false,"parentSelectors":["info"],"regex":"","selector":"> div:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"status","multiple":false,"parentSelectors":["info"],"regex":"","selector":"div.a-spacing-mini.review-data","type":"SelectorText"},{"delay":0,"id":"time","multiple":false,"parentSelectors":["info"],"regex":"","selector":"span.a-color-secondary","type":"SelectorText"},{"delay":0,"id":"content","multiple":false,"parentSelectors":["info"],"regex":"","selector":"div.a-spacing-small","type":"SelectorText"}]}
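The review sitemap pages through results differently from the Q&A one. Instead of a link selector that lists itself as a parent (`nextpage` in the Q&A sitemap), its `info` selector is a `SelectorElementClick` with `clickType` `clickMore`, so Web Scraper keeps clicking `.a-last a`, waiting 3000 ms (the `delay` value) between clicks, and selects the review blocks that load. Together with `[1-8]`-style page ranges in the start URL, that gives three ways of paging in this post. The Python sketch below is a hypothetical checker that labels which approach a sitemap uses; the label strings are invented for the sketch and are not Web Scraper terms.

```python
import re

URL_RANGE = re.compile(r"\[\d+-\d+\]")


def pagination_strategies(sitemap: dict) -> set[str]:
    """Heuristically label how a Web Scraper sitemap moves between pages.

    Labels (made up for this sketch):
      "url-range"      - a [start-end] range in a start URL
      "link-recursion" - a SelectorLink that lists itself as a parent
      "click-more"     - a SelectorElementClick with clickType "clickMore"
    """
    found: set[str] = set()
    if any(URL_RANGE.search(url) for url in sitemap.get("startUrl", [])):
        found.add("url-range")
    for sel in sitemap.get("selectors", []):
        if sel.get("type") == "SelectorLink" and sel.get("id") in sel.get("parentSelectors", []):
            found.add("link-recursion")
        if sel.get("type") == "SelectorElementClick" and sel.get("clickType") == "clickMore":
            found.add("click-more")
    return found


if __name__ == "__main__":
    # Trimmed-down versions of sitemaps from this post (review start URL is a placeholder).
    review = {"startUrl": ["https://example.com/reviews"],
              "selectors": [{"id": "info", "type": "SelectorElementClick",
                             "clickType": "clickMore", "parentSelectors": ["_root"]}]}
    search = {"startUrl": ["https://www.amazon.com/s?k=lunch+box&page=[1-5]"],
              "selectors": [{"id": "info", "type": "SelectorElement",
                             "parentSelectors": ["_root"]}]}
    print(pagination_strategies(review), pagination_strategies(search))
```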
3. Scraping the first few pages of products for a search keyword
Code for unlimited pagination:
{"_id":"wxsearch","startUrl":["https://www.amazon.com/s%3Fk%3 ..."],"selectors":[{"delay":0,"id":"info","multiple":true,"parentSelectors":["_root","panination"],"selector":"div.s-expand-height","type":"SelectorElement"},{"delay":0,"id":"title","multiple":false,"parentSelectors":["info"],"selector":".a-size-mini a","type":"SelectorLink"},{"delay":0,"id":"score","multiple":false,"parentSelectors":["info"],"regex":"","selector":"i.a-icon-star-small","type":"SelectorText"},{"delay":0,"id":"reviews","multiple":false,"parentSelectors":["info"],"regex":"","selector":"span.a-size-base","type":"SelectorText"},{"delay":0,"id":"panination","multiple":true,"parentSelectors":["_root","panination"],"selector":".a-last a","type":"SelectorLink"},{"delay":0,"id":"price","multiple":false,"parentSelectors":["info"],"regex":"","selector":"[data-a-size='l'] span[aria-hidden]","type":"SelectorText"}]}
Code for a fixed number of pages (first 5 pages):
{"_id":"search","startUrl":["https://www.amazon.com/s?k=lunch+box&page=[1-5]"],"selectors":[{"delay":0,"id":"info","multiple":true,"parentSelectors":["_root"],"selector":"div.s-expand-height","type":"SelectorElement"},{"delay":0,"id":"title","multiple":false,"parentSelectors":["info"],"selector":".a-size-mini a","type":"SelectorLink"},{"delay":0,"id":"score","multiple":false,"parentSelectors":["info"],"regex":"","selector":"i.a-icon-star-small","type":"SelectorText"},{"delay":0,"id":"reviews","multiple":false,"parentSelectors":["info"],"regex":"","selector":"span.a-size-base","type":"SelectorText"},{"delay":0,"id":"price","multiple":false,"parentSelectors":["info"],"regex":"","selector":"[data-a-size='l'] span[aria-hidden]","type":"SelectorText"}]}
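Comparing the two search sitemaps shows exactly what changes: the fixed-page version drops the `panination` link selector (the post's spelling, used consistently, so it still works as an id), removes it from `info`'s `parentSelectors`, and moves paging into the start URL as `page=[1-5]`. The Python sketch below is a hypothetical helper that applies the same transformation to a sitemap that paginates through a self-referencing link selector; the example.com start URL in the usage is a placeholder.

```python
import copy


def to_fixed_pages(sitemap: dict, pagination_id: str, start_url: str, new_id: str) -> dict:
    """Turn a link-recursion sitemap into a fixed page-range one (hypothetical helper).

    Drops the pagination link selector, removes its id from every
    parentSelectors list, and puts the page range in the start URL instead.
    """
    clone = copy.deepcopy(sitemap)  # leave the input sitemap unchanged
    clone["_id"] = new_id
    clone["startUrl"] = [start_url]
    clone["selectors"] = [s for s in clone["selectors"] if s["id"] != pagination_id]
    for sel in clone["selectors"]:
        sel["parentSelectors"] = [p for p in sel["parentSelectors"] if p != pagination_id]
    return clone


if __name__ == "__main__":
    import json

    # Trimmed copy of the unlimited-pagination search sitemap.
    infinite = {"_id": "wxsearch", "startUrl": ["https://example.com/search"],
                "selectors": [
                    {"id": "info", "type": "SelectorElement", "parentSelectors": ["_root", "panination"]},
                    {"id": "title", "type": "SelectorLink", "parentSelectors": ["info"]},
                    {"id": "panination", "type": "SelectorLink", "parentSelectors": ["_root", "panination"]},
                ]}
    fixed = to_fixed_pages(infinite, "panination",
                           "https://www.amazon.com/s?k=lunch+box&page=[1-5]", "search")
    print(json.dumps(fixed))
```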
4. Code for scraping Best Sellers lists:
{"_id":"amazon-com-best-sellers","startUrl":["https://www.amazon.com/Best-Se ... pg%3D[1-8]"],"selectors":[{"id":"info","parentSelectors":["_root"],"type":"SelectorElement","selector":"div.zg-grid-general-faceout","multiple":true,"delay":null},{"id":"name","parentSelectors":["info"],"type":"SelectorText","selector":"div._cDEzb_p13n-sc-css-line-clamp-3_g3dy1","multiple":false,"delay":0,"regex":""},{"id":"score","parentSelectors":["info"],"type":"SelectorText","selector":"div.a-icon-row","multiple":false,"delay":0,"regex":""},{"id":"price","parentSelectors":["info"],"type":"SelectorText","selector":"div:nth-of-type(2)","multiple":false,"delay":0,"regex":""}]}
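If the data comes back empty, there are two possible causes:
1. The URL needs to be changed.
2. The category page is structured differently, so the scraper's selectors need to be reselected.
In the second case, follow the screenshots below to reselect the data.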
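Reselecting data can also leave the sitemap itself inconsistent, especially if you tweak the JSON by hand: if a selector's `id` is renamed or deleted, children that still list the old id in `parentSelectors` are no longer connected to `_root`, and (our assumption) Web Scraper never reaches them, which shows up as another empty column. The Python sketch below is a hypothetical check, not a Web Scraper feature, that lists such selectors by walking the parent/child links breadth-first from `_root`.

```python
from collections import deque


def unreachable_selectors(sitemap: dict) -> list[str]:
    """Return ids of selectors that cannot be reached from "_root" via parentSelectors.

    Assumption: a selector with no chain of parents leading back to "_root"
    is never run, so its column stays empty.
    """
    children: dict[str, list[str]] = {}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children.setdefault(parent, []).append(sel["id"])
    reached: set[str] = set()
    queue = deque(["_root"])  # breadth-first walk starting at the start URL
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in reached:
                reached.add(child)
                queue.append(child)
    return [sel["id"] for sel in sitemap["selectors"] if sel["id"] not in reached]


if __name__ == "__main__":
    # "info" was renamed to "item", but "price" still points at the old id.
    broken = {"selectors": [
        {"id": "item", "parentSelectors": ["_root"]},
        {"id": "name", "parentSelectors": ["item"]},
        {"id": "price", "parentSelectors": ["info"]},
    ]}
    print(unreachable_selectors(broken))
```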
If you've read this far patiently, you should now be comfortable using Web Scraper.