Basic Data Processing with pandas


Data processing

Series-based data processing

Default Series addition

    import pandas as pd

    # Two Series of different lengths; both get the default index 0, 1, 2, ...
    pd_s1 = pd.Series([1,2,3,4,5,6,7,8,9])
    pd_s2 = pd.Series([1,2,3,4])
    print(pd_s1)
    print(pd_s2)
    print('---------separator-------------')
    # "+" aligns the two Series on their index labels before adding
    print(pd_s1+pd_s2)

Output:

    0    1
    1    2
    2    3
    3    4
    4    5
    5    6
    6    7
    7    8
    8    9
    dtype: int64
    0    1
    1    2
    2    3
    3    4
    dtype: int64
    ---------separator-------------
    0    2.0
    1    4.0
    2    6.0
    3    8.0
    4    NaN
    5    NaN
    6    NaN
    7    NaN
    8    NaN
    dtype: float64

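By default, the two Series are aligned on their index and added label by label; any label that exists in only one of them produces NaN. Because NaN is a float, the result dtype changes from int64 to float64.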
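The alignment is by index label, not by position. The sketch below uses two hypothetical Series with string labels that only partly overlap, to make that visible:

```python
import pandas as pd

# Hypothetical Series whose index labels only partly overlap
s_a = pd.Series([10, 20, 30], index=["x", "y", "z"])
s_b = pd.Series([1, 2, 3], index=["z", "y", "w"])

# "+" matches values by label: y -> 20 + 2, z -> 30 + 1
total = s_a + s_b
print(total)
```

Labels x and w each appear in only one Series, so they come out as NaN, and the whole result is float64.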

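Handling NaN in Series arithmetic

Instead of the + operator, call the add method with fill_value: wherever a label exists in only one Series, the missing side is treated as fill_value (here -10) before adding, so no NaN appears.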

    print(pd_s1.add(pd_s2,fill_value=-10))

Output:

    0    2.0
    1    4.0
    2    6.0
    3    8.0
    4   -5.0
    5   -4.0
    6   -3.0
    7   -2.0
    8   -1.0
    dtype: float64
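fill_value also covers NaN values already stored in a Series, but only when the other side has a value; if both sides are missing at a label, the result is still NaN. The same fill_value parameter exists on sub, mul and div. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, np.nan, np.nan, 4.0])
s2 = pd.Series([10.0, 20.0])  # labels 0 and 1 only

# fill_value replaces a missing value on ONE side before adding
filled = s1.add(s2, fill_value=0)
print(filled)
# label 0: 1 + 10 = 11
# label 1: NaN in s1 -> 0, so 0 + 20 = 20
# label 2: NaN in s1 and no label 2 in s2 -> both missing -> stays NaN
# label 3: 4 + 0 = 4
```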

Using map / apply / applymap

Using map

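[1] Create the data

The gender column holds 男 (male) or 女 (female), and the color column holds 白色 (white), 红色 (red) or 绿色 (green).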

    import pandas as pd
    import numpy as np
    boolean = [True,False]
    gender = ["男","女"]           # male, female
    color = ["白色","红色","绿色"]  # white, red, green
    # 100 rows of random sample data
    df = pd.DataFrame({
        "height":np.random.randint(100,180,100),
        "weight":np.random.randint(35,70,100),
        "swimming":[boolean[x] for x in np.random.randint(0,2,100)],
        "gender":[gender[x] for x in np.random.randint(0,2,100)],
        # start at 0 so that every color, including 白色, can be picked
        "color":[color[x] for x in np.random.randint(0,len(color),100)]
    })
    print(df.head(5))

Output (the data is random, so your numbers will differ):

       height  weight  swimming gender color
    0     112      47      True      男    绿色
    1     104      38      True      男    绿色
    2     127      57      True      男    红色
    3     157      39      True      男    绿色
    4     106      53      True      女    红色
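Because the data comes from np.random without a seed, every run (and every section below) shows different rows. If you want repeatable results, one option is to build the same columns from a seeded NumPy Generator. This is a sketch; make_people is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

def make_people(seed=0, n=100):
    # hypothetical helper: same columns as the example above, but reproducible
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "height": rng.integers(100, 180, n),  # 100..179, like randint(100, 180)
        "weight": rng.integers(35, 70, n),
        "swimming": rng.choice([True, False], n),
        "gender": rng.choice(["男", "女"], n),
        "color": rng.choice(["白色", "红色", "绿色"], n),
    })

df1 = make_people(seed=42)
df2 = make_people(seed=42)
print(df1.equals(df2))  # the same seed gives the same data
```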

[2] Use map with a dict to replace 男 and 女 with 1 and 0.

    # each value is looked up in the dict; a value with no key would become NaN
    df["gender"]=df["gender"].map({"男":1,"女":0})
    print(df.head(5))

Output:

       height  weight  swimming  gender color
    0     120      41     False       0    绿色
    1     157      61      True       1    红色
    2     145      66      True       1    红色
    3     134      55      True       1    绿色
    4     141      40     False       1    绿色
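When map gets a dict, any value that is not a key comes back as NaN, which also turns an integer result into float64. A sketch with a hypothetical extra color 蓝色 (blue) that the dict does not cover, plus one way to fall back to a default code:

```python
import pandas as pd

colors = pd.Series(["白色", "红色", "绿色", "蓝色"])  # 蓝色 (blue) is not in the dict

# Values missing from the dict come back as NaN, so the result is float64
codes = colors.map({"白色": 0, "红色": 1, "绿色": 2})
print(codes)

# Fall back to a default code (-1 here) and convert back to integers
codes_int = codes.fillna(-1).astype(int)
print(codes_int.tolist())  # [0, 1, 2, -1]
```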

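Alternatively, pass a function to map to do the same replacement. Run it on freshly created data: if the column has already been converted to 1/0 by the dict above, every value falls through to the else branch and becomes 99.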

    def genderChange(value):
        # 男 (male) -> 1, 女 (female) -> 0, anything else -> 99
        if value=="男":
            return 1
        elif value=="女":
            return 0
        else:
            return 99

    df["gender"]=df["gender"].map(genderChange)
    print(df.head(5))

Output:

       height  weight  swimming  gender color
    0     100      69      True       1    绿色
    1     159      48     False       0    红色
    2     114      51      True       1    红色
    3     144      50      True       0    绿色
    4     137      51      True       0    红色
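With a function, map calls it on every value, including NaN. Passing na_action="ignore" skips missing values and leaves them as NaN. A sketch; gender_to_code is a hypothetical helper that copies the rules of genderChange:

```python
import numpy as np
import pandas as pd

def gender_to_code(value):
    # hypothetical helper with the same rules as genderChange above
    if value == "男":
        return 1
    elif value == "女":
        return 0
    return 99

s = pd.Series(["男", "女", np.nan, "其他"])  # 其他 = "other"

coded = s.map(gender_to_code)                               # NaN is passed in and becomes 99
coded_keep_nan = s.map(gender_to_code, na_action="ignore")  # NaN is skipped and stays NaN
print(coded.tolist())
print(coded_keep_nan.tolist())
```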

Using apply

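Compared with map, Series.apply can pass extra arguments to the function (positional ones through args, keyword ones directly), so it can express more complex logic.

[1] Create the data (same as in the map section)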

    import pandas as pd
    import numpy as np
    boolean = [True,False]
    gender = ["男","女"]           # male, female
    color = ["白色","红色","绿色"]  # white, red, green
    # 100 rows of random sample data
    df = pd.DataFrame({
        "height":np.random.randint(100,180,100),
        "weight":np.random.randint(35,70,100),
        "swimming":[boolean[x] for x in np.random.randint(0,2,100)],
        "gender":[gender[x] for x in np.random.randint(0,2,100)],
        # start at 0 so that every color, including 白色, can be picked
        "color":[color[x] for x in np.random.randint(0,len(color),100)]
    })
    print(df.head(5))

Output:

       height  weight  swimming gender color
    0     132      37     False      男    绿色
    1     164      68      True      女    红色
    2     165      61      True      男    红色
    3     136      41      True      女    红色
    4     119      38     False      男    红色

[2] Use apply to add a fixed amount to every value in the height column

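    def heightChange(x,change):
        # x is one value of the column; change comes from args
        return x+change

    # args=(4,) supplies the extra positional argument, so every height grows by 4
    df["height"]=df["height"].apply(heightChange,args=(4,))
    print(df.head(5))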

Output:

       height  weight  swimming gender color
    0     136      37     False      男    绿色
    1     168      68      True      女    红色
    2     169      61      True      男    红色
    3     140      41      True      女    红色
    4     123      38     False      男    红色
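Besides args, apply forwards any extra keyword arguments to the function. For plain arithmetic like this, a vectorized expression gives the same result without calling a Python function per value. A sketch with a hypothetical helper scale_and_shift and a few sample heights:

```python
import pandas as pd

def scale_and_shift(x, factor, shift=0):
    # hypothetical helper: multiply by factor, then add shift
    return x * factor + shift

heights = pd.Series([132, 164, 165])

by_args = heights.apply(scale_and_shift, args=(2,))             # factor=2 via args
by_kwargs = heights.apply(scale_and_shift, args=(2,), shift=1)  # shift=1 via a keyword
vectorized = heights * 2 + 1                                    # same result, no per-value call

print(by_args.tolist())    # [264, 328, 330]
print(by_kwargs.tolist())  # [265, 329, 331]
```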

DataFrame-based data processing

Using axis

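When processing a DataFrame, the axis parameter specifies the direction of an operation. With axis=0 the function runs down the rows (↓), so it is applied to each column; with axis=1 it runs across the columns (→), so it is applied to each row. For example: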

Prepare the data: the examples below reuse the 100-row df created in the apply section.

[Example 1] Sum the height and weight columns separately (axis=0 makes the operation column-wise; np.sum is the operation)

    print(df[["height","weight"]].apply(np.sum,axis=0))

Output:

    height    13837
    weight     5304
    dtype: int64
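To see the two directions side by side, here is a sketch on a tiny hand-made DataFrame (the numbers are made up): axis=0 gives one total per column, axis=1 gives one total per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"height": [160, 170, 150], "weight": [50, 60, 45]})

col_totals = df.apply(np.sum, axis=0)  # down the rows: one value per column
row_totals = df.apply(np.sum, axis=1)  # across the columns: one value per row

print(col_totals.tolist())  # [480, 155]
print(row_totals.tolist())  # [210, 230, 195]
```

For a plain sum, df.sum(axis=0) and df.sum(axis=1) give the same results without apply.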

[Example 2] Take the natural logarithm of each value in the height and weight columns (axis=0)

    print(df[["height","weight"]].apply(np.log,axis=0))

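Because np.log works on each value individually, the result is not one number per column: it has the same shape as the input, 100 rows × 2 columns.

Output: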

          height    weight
    0   5.068904  3.951244
    1   4.653960  3.891820
    2   5.159055  3.610918
    3   4.653960  4.127134
    4   4.709530  3.663562
    ..       ...       ...
    95  5.153292  4.204693
    96  5.123964  3.931826
    97  5.056246  3.828641
    98  4.852030  3.806662
    99  5.181784  3.713572
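Since np.log is element-wise, axis does not change the shape of the result, and NumPy ufuncs such as np.log can also be called on a DataFrame directly. A quick check on a small made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"height": [100, 150], "weight": [50, 60]})

via_apply = df.apply(np.log, axis=0)  # element-wise, so the shape stays (2, 2)
direct = np.log(df)                   # a ufunc applied to the whole DataFrame

print(via_apply.shape)  # (2, 2)
print(direct)
```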

Using applymap

applymap applies a function to every individual value (element-wise) of a DataFrame.

Prepare the data

    df = pd.DataFrame(
    {
        "a":np.random.randn(4),
        "b":np.random.randn(4),
        "c":np.random.randn(4)
    }
    )
    print(df)

Output:

              a         b         c
    0 -0.685025  0.182944 -0.226137
    1 -0.167727 -0.097995 -0.911103
    2 -0.361764  0.708554  1.821675
    3 -1.049370 -0.777683  0.467673

applymap result: format every value as a string with two decimal places

    print(df.applymap(lambda x:"%.2f" %x))

Output:

           a      b      c
    0  -0.69   0.18  -0.23
    1  -0.17  -0.10  -0.91
    2  -0.36   0.71   1.82
    3  -1.05  -0.78   0.47
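Two notes on this example. Since pandas 2.1, DataFrame.applymap is deprecated in favor of DataFrame.map, which does the same element-wise job. Also, "%.2f" turns every value into a string (object dtype); if you only want rounded numbers, DataFrame.round keeps them numeric. A sketch that runs on both older and newer pandas, using a few values from the table above:

```python
import pandas as pd

df = pd.DataFrame({"a": [-0.685025, -0.167727], "b": [0.182944, -0.097995]})

def fmt(x):
    # format one value as a string with two decimal places
    return "%.2f" % x

# pandas 2.1 renamed DataFrame.applymap to DataFrame.map; use whichever exists
if hasattr(pd.DataFrame, "map"):
    formatted = df.map(fmt)
else:
    formatted = df.applymap(fmt)

rounded = df.round(2)  # numeric alternative: values stay float64

print(formatted)
print(rounded.dtypes.tolist())
```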
