Scraping Lianjia (链家) Second-Hand Housing Listings with XPath

    import random
    import time

    import requests
    from fake_useragent import UserAgent
    from lxml import etree


    class LianjiaSpider(object):
        def __init__(self):
            # Second-hand listing pages on nc.lianjia.com; {} is the page number
            self.url = 'https://nc.lianjia.com/ershoufang/pg{}/'
            self.ua = UserAgent()  # create the UserAgent object once

        # Helper: build request headers with a random User-Agent
        def get_headers(self):
            return {"User-Agent": self.ua.random}

        # Fetch a page
        def get_html(self, url):
            # 5-second timeout, up to 3 attempts
            for i in range(3):
                try:
                    res = requests.get(url=url, headers=self.get_headers(), timeout=5)
                    res.raise_for_status()  # treat HTTP errors (403, 500, ...) as failures too
                    res.encoding = "utf-8"
                    return res.text
                except requests.RequestException as e:
                    print("Failed, retry:", i, e)
            return None  # all attempts failed

        # Parse a page
        def parse_html(self, url):
            html = self.get_html(url)
            # get_html returns either the HTML text or None
            if not html:
                return
            parse_obj = etree.HTML(html)  # parsed document
            # Base XPath: the list of <li> nodes, one per listing
            li_list = parse_obj.xpath('//ul[@class="sellListContent"]/li[@class="clear LOGVIEWDATA LOGCLICKDATA"]')
            # Loop over each <li> and collect all the data for one listing
            for li in li_list:
                item = {}  # a fresh dict per listing, so items never share state
                try:
                    # Community (residential compound) name
                    item["name"] = li.xpath('.//a[@data-el="region"]/text()')[0].strip()
                    # Layout + area + orientation + decoration + floor + building type
                    # info_list: ['3室2厅', '110平米', '南 北', '毛坯', '中楼层(共19层)', '板楼']
                    info_list = li.xpath('.//div[@class="houseInfo"]/text()')[0].split('|')
                    item["model"] = info_list[0].strip()
                    item["area"] = info_list[1].strip()[:-2]  # drop the '平米' suffix
                    item["direction"] = info_list[2].strip()
                    item["perfect"] = info_list[3].strip()
                    item["floor"] = info_list[4].strip()
                    # District + total price + unit price
                    item["address"] = li.xpath('.//div[@class="positionInfo"]/a/text()')[1].strip()
                    item["total"] = li.xpath('.//div[@class="totalPrice"]/span/text()')[0].strip()
                    # The slice assumes text like '单价12345元/平米' -> '12345'
                    item["unit"] = li.xpath('.//div[@class="unitPrice"]/span/text()')[0].strip()[2:-4]
                except IndexError:
                    continue  # skip entries missing an expected field (e.g. ads)
                print(item)

        # Entry point
        def run(self):
            for page in range(1, 3):
                url = self.url.format(page)
                self.parse_html(url)
                time.sleep(random.uniform(1, 3))  # sleep a random number of seconds between pages


    if __name__ == '__main__':
        spider = LianjiaSpider()
        spider.run()
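
The string handling in parse_html relies on fixed indices into info_list and on fixed slices ([:-2], [2:-4]), so one listing with fewer fields or slightly different price text breaks it. Below is a minimal sketch of more tolerant cleanup, using only the standard library so it can be tried offline. parse_house_info and parse_unit_price are hypothetical helpers that are not part of the script above. The key "building" for the sixth field (板楼/塔楼) is an assumption. The sample strings follow the format in the info_list comment, not a live page.

```python
import re

# Hypothetical helpers (not in the original script): they isolate the string
# cleanup that parse_html does for each <li>, so it can be checked offline.

# Key names match the original script; "building" (6th field) is an assumption.
HOUSE_INFO_KEYS = ["model", "area", "direction", "perfect", "floor", "building"]


def parse_house_info(text):
    """Split a houseInfo string such as '3室2厅 | 110平米 | ...' into a dict.

    Missing trailing fields are left out instead of raising IndexError.
    """
    parts = [p.strip() for p in text.split("|")]
    item = dict(zip(HOUSE_INFO_KEYS, parts))  # zip stops at the shorter list
    # Drop the '平米' (square metres) suffix, like the original [:-2] slice
    if "area" in item and item["area"].endswith("平米"):
        item["area"] = item["area"][:-2]
    return item


def parse_unit_price(text):
    """Pull the number out of a unit-price string like '单价12,345元/平米'.

    Returns the digits as a string (commas removed), or None if none are found.
    """
    match = re.search(r"[\d,]+", text)
    return match.group().replace(",", "") if match else None


if __name__ == "__main__":
    print(parse_house_info("3室2厅 | 110平米 | 南 北 | 毛坯 | 中楼层(共19层) | 板楼"))
    print(parse_unit_price("单价12,345元/平米"))
```

Inside parse_html, the index-based lines could then become something like item.update(parse_house_info(text)), where text is the houseInfo string. A listing with only some fields yields a partial dict rather than an exception.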
