E-book: Website Scraping with Python: Using BeautifulSoup and Scrapy (English)

# Computers # Networking # Applications | Size: 4.75 MB | Pages: 235 | Listed: 2022-03-29 | Language: English

Type: E-book

Uploader: 二一

Publication date: 2022-03-29

Abstract:

Closely examine website scraping and data processing: the technique of extracting data from websites in a format suitable for further analysis. You'll review which tools to use, and compare their features and efficiency. Focusing on BeautifulSoup4 and Scrapy, this concise, focused book highlights common problems and suggests solutions that readers can implement on their own.
Website Scraping with Python starts by introducing and installing the scraping tools and explaining the features of the full application that readers will build throughout the book. You'll see how to use BeautifulSoup4 and Scrapy individually or together to achieve the desired results. Because many sites use JavaScript, you'll also employ Selenium with a browser emulator to render these sites and make them ready for scraping.
By the end of this book, you'll have a complete scraping application to use and rewrite to suit your needs. As a bonus, the author shows you options for deploying your spiders to the cloud to relieve your computer of long-running scraping tasks.

What You'll Learn
Install and implement scraping tools individually and together
Run spiders to crawl websites for data from the cloud
Work with emulators and drivers to extract data from scripted sites

Who This Book Is For
Readers with some previous Python and software development experience, and an interest in website scraping.
