爬取京东商品信息
发布时间
阅读量:
阅读量
爬取京东商品信息
环境:
- Python 3.6
- Pycharm
- MYSQL
京东网页分析
主要抓取以下商品参数:
name(搜索关键字)、price(价格)、shop_name(店铺名)、location
爬取京东商品信息首先得有商品信息入口,以商品女装(关键字)为例,
url = 'https://search.jd.com/Search?keyword=%s&enc=utf-8&page=%s'
# keyword 为搜索关键字
# page 为页码
简单分析京东商品信息页面,发现商品讯息:

我们可以通过xpath找到我们想要的信息:

找到我们想要的信息后我们通过PyMysql将商品信息存到数据库中:
def get_db(self):
    """Open and return a fresh pymysql connection to the result database.

    Connection parameters are placeholders ('你的ip' etc.) that the
    reader is expected to fill in with their own credentials.
    """
    connection = pymysql.connect(
        host='你的ip',
        port=3306,
        user='用户名',
        password='密码',
        db='库名',
        charset='utf8',
    )
    return connection
def result_save(self, data):
    """Persist one page of scraped products to the database.

    data: dict with 'name' (str — the shared search keyword) and three
    parallel lists 'price', 'shop_name', 'location', one entry per item.
    Commits on success, rolls back on any error; always closes the
    cursor and connection.
    """
    # Open the connection
    db = self.get_db()
    # Create a cursor
    cursor = db.cursor()
    # Parameterized statement — scraped text never reaches SQL directly.
    # Table name matches the final version of the code ('table_name'),
    # fixing the stray 'taobao' left over from an earlier draft.
    sql = 'insert into table_name(name, price, shop_name, location) values(%s, %s, %s, %s)'
    try:
        for i in range(len(data['price'])):
            # 'name' is the keyword (same for every row); the rest are per-item.
            cursor.execute(sql, (
                data['name'], data['price'][i], data['shop_name'][i],
                data['location'][i]))
        db.commit()
        print('爬取储存成功,共%s条。' % len(data['price']))
    except Exception as e:
        print(e)
        print('爬取失败')
        db.rollback()
    finally:
        # BUG FIX: cleanup formerly ran outside the try block and was
        # skipped whenever commit/rollback itself raised; finally
        # guarantees the cursor and connection are released.
        cursor.close()
        db.close()
最终保存的数据:

好了到这里就结束了,来看看我们全部代码:
from lxml import etree
import pymysql
import requests
from selenium.common.exceptions import TimeoutException
class Jd:
    """Scrape JD search-result pages for a keyword and store rows via pymysql."""

    def get_db(self):
        """Open and return a fresh pymysql connection (credentials are placeholders)."""
        db = pymysql.connect(
            host='你的ip',
            port=3306,
            user='用户名',
            password='密码',
            db='库名',
            charset='utf8',
        )
        return db

    def result_save(self, data):
        """Insert one page of rows; commit on success, roll back on error.

        data: dict with 'name' (str keyword) plus parallel lists
        'price', 'shop_name', 'location'.
        """
        db = self.get_db()
        cursor = db.cursor()
        sql = 'insert into table_name(name, price, shop_name, location) values(%s, %s, %s, %s)'
        try:
            for i in range(len(data['price'])):
                cursor.execute(sql, (
                    data['name'], data['price'][i], data['shop_name'][i],
                    data['location'][i]))
            db.commit()
            print('爬取储存成功,共%s条。' % len(data['price']))
        except Exception as e:
            print(e)
            print('爬取失败')
            db.rollback()
        finally:
            # BUG FIX: close even when commit/rollback raises.
            cursor.close()
            db.close()

    def page_get(self, page, keyword):
        """Fetch search-result page `page` for `keyword`, parse it, save rows.

        Retries the same page once per timeout (recursively).
        """
        print('正在爬取第', page, '页')
        try:
            url = 'https://search.jd.com/Search?keyword=%s&enc=utf-8&page=%s' % (keyword, page)
            headers = {
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
                'Accept-Language': 'en-US,en;q=0.8,zh-CN;q=0.5,zh;q=0.3',
                'Referer': 'https://www.jd.com/',
                'DNT': '1',
                'Connection': 'keep-alive',
                'Upgrade-Insecure-Requests': '1',
                'TE': 'Trailers',
            }
            # BUG FIX: without timeout= the request can block forever and no
            # timeout exception is ever raised, making the retry branch dead.
            browser = requests.get(url, headers=headers, timeout=10)
            html_str = browser.text
            html = etree.HTML(html_str)
            data = {}
            name = keyword
            shop_name = html.xpath("//div[@class='p-img']/a/@title")
            price = html.xpath("//div/ul/li/div/div/strong/i/text()")
            # NOTE(review): this xpath selects the product image src, not a
            # seller location — presumably mislabeled; verify against the
            # database schema before relying on the 'location' column.
            location = html.xpath("//div[@class='p-img']/a/img/@src")
            data['name'] = name
            data['price'] = price
            data['shop_name'] = shop_name
            data['location'] = location
            self.result_save(data)
        except requests.exceptions.Timeout:
            # BUG FIX: the original caught selenium's TimeoutException,
            # which requests.get never raises; catch the requests timeout
            # so the retry actually fires.
            self.page_get(page, keyword)
if __name__ == '__main__':
    # Guard the interactive driver so importing this module does not
    # prompt for input or start scraping.
    jd = Jd()
    pages = int(input('请输入要爬取的页数:'))
    keyword = input('请输入要搜索的关键字:')
    for i in range(1, pages + 1):
        jd.page_get(i, keyword)
全部评论 (0)
还没有任何评论哟~
