Web Scraping
At one point or another, you may want to build a simple bot for scraping website data. For example, on a link-sharing website, you may want users to see a meta preview of what a shared link is about without having to spend time visiting the URL.
You fire up your favorite scripting language and build a simple crawler. Once your crawler is live, you begin to experience problems: the website decides you are a bad actor trying to DDoS it and blocks your bot's access. Other times, you get a captcha that says “confirm you’re not a robot.”
These obstacles are frustrating, and building a crawler smart enough to bypass them is a lot of work. This is where Scrapestack comes in. You hand a URL to Scrapestack, and it fetches the page for you so that you do not run into these inconveniences.
Some of its key features include it being:
You need to head over to Scrapestack to create an account. After creating an account, copy your API key and send a request like this:
https://api.scrapestack.com/scrape?access_key=YOUR_API_KEY&url=https://en.wikipedia.org/wiki/KISS_principle
You can replace the URL with any URL of your choice.
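As a rough sketch, the same request could be built and sent from Python using only the standard library. The helper names `build_scrape_url` and `scrape` are made up for this illustration, and the sketch assumes the endpoint responds with the raw HTML of the target page. Percent-encoding the target URL keeps any query string it carries from being confused with the API's own parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the request shown above.
SCRAPE_ENDPOINT = "https://api.scrapestack.com/scrape"


def build_scrape_url(api_key, target_url):
    """Return the request URL with both parameters percent-encoded."""
    query = urlencode({"access_key": api_key, "url": target_url})
    return f"{SCRAPE_ENDPOINT}?{query}"


def scrape(api_key, target_url, timeout=30):
    """Fetch target_url through the API and return the response body as text.

    Assumption: the body is the HTML of the target page.
    """
    with urlopen(build_scrape_url(api_key, target_url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `scrape("YOUR_API_KEY", "https://en.wikipedia.org/wiki/KISS_principle")` would then perform the request above.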
Instead of crawling the website directly, let Scrapestack handle fetching the data for you. That way you avoid pesky captchas.
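Once the HTML comes back, the link preview mentioned at the start can be assembled from the page's `<title>` and `<meta>` tags. The following is a minimal sketch using Python's built-in `html.parser`. The class `PreviewParser`, the function `extract_preview`, and the choice of which meta keys to collect are assumptions made for illustration, not part of Scrapestack.

```python
from html.parser import HTMLParser

# Meta keys worth showing in a preview (an illustrative selection).
PREVIEW_KEYS = ("description", "og:title", "og:description", "og:image")


class PreviewParser(HTMLParser):
    """Collects the <title> text and a few description / Open Graph meta tags."""

    def __init__(self):
        super().__init__()
        self.preview = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use "property"; the plain description uses "name".
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key in PREVIEW_KEYS and content:
                # Keep the first occurrence of each key.
                self.preview.setdefault(key, content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.preview["title"] = self.preview.get("title", "") + data


def extract_preview(html):
    """Return a dict of preview fields found in the given HTML string."""
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    if "title" in parser.preview:
        parser.preview["title"] = parser.preview["title"].strip()
    return parser.preview
```

Passing the HTML of a scraped page to `extract_preview` yields a small dictionary that a link-sharing site could render next to the shared URL.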
It’s all fun and games until you have to “prove you’re not a bot.”
Reposted from: http://asuwd.baihongyu.com/