
Python: Scraping Lagou Python Job Listings with Selenium (Crawler)


Scraping Lagou's Python job postings


17/10, Thursday, sunny

Overall approach:

1. Use the selenium module we covered recently to drive a simulated browser and fetch the rendered pages.

2. Parse the pages with XPath through lxml (its C implementation keeps parsing fast); see the warm-up sketch after this list.

3. Save the results as a CSV file.
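As a warm-up, here is a minimal sketch of those three steps chained together. It assumes chromedriver is on your PATH, and it uses example.com, a throwaway XPath, and a demo.csv output file as placeholders rather than Lagou's real markup:

    import csv

    from lxml import etree
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)

    driver.get('https://example.com')                     # 1. fetch with a simulated browser
    html = etree.HTML(driver.page_source)                 # 2. parse the rendered HTML with XPath
    rows = [{'title': t} for t in html.xpath('//h1/text()')]
    driver.quit()

    with open('demo.csv', 'w', encoding='utf-8', newline='') as fb:  # 3. save as CSV
        writer = csv.DictWriter(fb, ['title'])
        writer.writeheader()
        writer.writerows(rows)

The full spider below follows exactly this shape, just with pagination, tab handling, and more fields.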

Required modules:

    import random
    import time
    import csv
    from urllib.parse import quote

    from lxml import etree
    from selenium import webdriver

Of these, selenium and lxml are third-party packages and must be installed with pip.
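If either is missing, both install in one command:

    pip install selenium lxml

You also need a chromedriver binary that matches your local Chrome version; its path is handed to webdriver.Chrome in the code below.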

    class LaGoSpider(object):
        '''Wrap the crawler in a class to keep the steps organised.'''

        def __init__(self):
            options = webdriver.ChromeOptions()
            options.add_argument('--headless')
            # Skip image loading to speed up page rendering
            options.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})
            self.driver = webdriver.Chrome(r'D:\外安装软件\selenium1\chromedriver_win32\chromedriver.exe', options=options)
            self.data_list = []

        def address_url(self):
            '''Build the listing URL for each city and walk its result pages.'''
            self.citys = ['全国', '北京', '深圳', '广州', '杭州', '成都', '南京', '上海', '厦门', '西安', '长沙']
            self.baseurl = 'https://www.lagou.com/jobs/list_python?px=default&city={}'
            for self.city in self.citys:
                self.url = self.baseurl.format(quote(self.city))
                self.driver.get(self.url)
                print('Crawling %s' % self.city)
                while True:
                    source = self.driver.page_source
                    self.position_url_parse(source)
                    # NOTE: the class names in the XPath expressions in this spider
                    # matched Lagou's markup when this was written; they may have changed.
                    next_page = self.driver.find_element_by_xpath('//div[@class="pager_container"]/span[last()]')
                    if 'pager_next_disabled' in next_page.get_attribute('class'):  # last page for this city?
                        print('%s finished' % self.city)
                        break
                    else:
                        self.driver.execute_script("arguments[0].click()", next_page)
                        print('---------------- next page ----------------')
                        time.sleep(random.randint(3, 5))
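Two details here are easy to miss: the next-page button is clicked through execute_script rather than next_page.click(), which sidesteps the "element click intercepted" errors that Lagou's floating overlays can cause, and the random 3-5 second sleep throttles the crawl so the site's anti-bot checks are less likely to fire.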
        def position_url_parse(self, source):
            '''Collect the detail-page URL of every position in the list.'''
            html = etree.HTML(source)
            lis = html.xpath('//ul[@class="item_con_list"]//li')
            for li in lis:
                position_url = li.xpath('.//a[@class="position_link"]//@href')[0]
                self.request_urls(position_url)
                time.sleep(random.randint(1, 3))

        def request_urls(self, list_url):
            '''Open a detail page in a new tab, parse it, then return to the list tab.'''
            self.driver.execute_script('window.open("%s")' % list_url)
            self.driver.switch_to.window(self.driver.window_handles[1])
            source = self.driver.page_source
            self.parse_position(source)
            time.sleep(random.randint(1, 3))
            self.driver.close()
            self.driver.switch_to.window(self.driver.window_handles[0])
            time.sleep(random.randint(1, 3))
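Opening each detail page with window.open keeps the list page alive in the first tab: once parse_position finishes, the spider closes the detail tab and switches back, so pagination continues exactly where it left off.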
        def parse_position(self, source):
            '''Scrape the detail fields of one position.'''
            self.data = {}
            html = etree.HTML(source)
            company = html.xpath('//dl[@class="job_company"]/dt/a/img/@alt')[0]
            print(company)
            self.data['公司'] = company
            name = html.xpath('//div[@class="job-name"]//span[@class="name"]/text()')[0]
            self.data['名称'] = name
            salary = html.xpath('//dd[@class="job_request"]/p[1]/span[@class="salary"]/text()')[0]
            self.data['薪资'] = salary
            city = html.xpath('//dd[@class="job_request"]/p[1]/span[2]/text()')[0].replace('/', '').strip()
            self.data['城市'] = city
            jinyan = html.xpath('//dd[@class="job_request"]/p[1]/span[3]/text()')[0].replace('/', '').strip()
            self.data['经验'] = jinyan
            xueli = html.xpath('//dd[@class="job_request"]/p[1]/span[4]/text()')[0].replace('/', '').strip()
            self.data['学历'] = xueli
            zhihuo = html.xpath('//*[@id="job_detail"]/dd[1]/p/text()')[0]
            self.data['职位诱惑'] = zhihuo
            # Strip the boilerplate section headers out of the job description text
            zhimiao = ''.join(html.xpath('//div[@class="job-detail"]//p//text()')).replace('岗位职责: ', '').replace('岗位要求:', '').replace('岗位职责:', '').replace('工作职责:', '').replace('项目背景:', '').replace('-', '').strip()
            self.data['职位描述'] = zhimiao
            self.data_list.append(self.data)
            self.csv_()

        def csv_(self):
            '''Dump everything collected so far to a CSV file.'''
            header = ['公司', '名称', '薪资', '城市', '经验', '学历', '职位诱惑', '职位描述']
            # 'w' mode rewrites the file from scratch on every call
            with open('lagou_quanguo.csv', 'w', encoding='utf-8', newline='') as fb:
                writer = csv.DictWriter(fb, header)
                writer.writeheader()
                writer.writerows(self.data_list)


    if __name__ == '__main__':
        LG = LaGoSpider()
        LG.address_url()
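Note that csv_() runs after every single position and rewrites lagou_quanguo.csv from scratch each time. That is wasteful on large runs, but it means the file is always up to date if the crawl dies halfway. If Excel shows the Chinese headers as mojibake, switching the encoding to 'utf-8-sig' (an assumption about your Excel setup, not something the original code does) usually fixes it.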

May the years have you in them, and may we cherish the time together.
