
365-Day Deep Learning Training Camp, Week R3: Weather Prediction


My environment:

  • Language: Python 3.11.2
  • IDE: PyCharm Community Edition 2022.3
  • Deep learning framework: TensorFlow 2

Goal for this session:

  • Get familiar with exploratory data analysis (EDA)

1. Importing the Data

Keep a second copy of the data when importing it.

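    import pandas

    # Raw string so the backslash in the Windows path is not read as an escape
    data = pandas.read_csv(r'F:\weatherAUS.csv')
    df = data.copy()  # second copy; keeps the columns that are later dropped from data
    print(data.head())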

Summarize the numeric columns.

    print(data.describe())

Check the data type of each column.

    print(data.dtypes)

Extract the year, month, and day from the date. dt is the pandas accessor for operations on datetime-typed Series (dt.*).

    data['Date'] = pandas.to_datetime(data['Date'])
    print(data['Date'])

    # Split the date into separate columns on both frames
    data['year'] = data['Date'].dt.year
    data['Month'] = data['Date'].dt.month
    data['day'] = data['Date'].dt.day
    df['year'] = data['Date'].dt.year
    df['Month'] = data['Date'].dt.month
    df['day'] = data['Date'].dt.day

    data.drop('Date', axis=1, inplace=True)
    print(data.head())
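
To see what the dt accessor returns without loading the full file, here is a minimal sketch on three made-up date strings. The YYYY-MM-DD format is an assumption about how the Date column is stored; the values are illustrative only.

```python
import pandas

# Toy dates (illustrative values, assumed YYYY-MM-DD format)
dates = pandas.Series(['2008-12-01', '2009-01-15', '2010-07-04'])
parsed = pandas.to_datetime(dates)

# Each dt.* attribute returns an integer Series aligned with the input
parts = pandas.DataFrame({
    'year': parsed.dt.year,
    'Month': parsed.dt.month,
    'day': parsed.dt.day,
})
print(parts)
```

The dt accessor only exists on datetime-typed Series, which is why the column is passed through pandas.to_datetime first.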

2. Exploratory Data Analysis (EDA)

2.1 Exploring correlations

seaborn is a Python data-visualization library built on matplotlib; it adds a range of statistical plot types.

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Drop the text columns so that only numeric columns reach data.corr()
    data.drop(['Location', 'WindGustDir', 'WindDir9am', 'WindDir3pm',
               'RainToday', 'RainTomorrow'], axis=1, inplace=True)
    print(data.head())
    print(data.columns)

    plt.figure(figsize=(15, 13))
    ax = sns.heatmap(data.corr(), square=True, annot=True, fmt='.2f')
    ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
    plt.show()

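data.corr() computes the correlation coefficient (Pearson by default) between every pair of columns in data; the heatmap displays the resulting matrix.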
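
As a small illustration of what the correlation matrix contains, the sketch below runs corr() on a made-up three-column frame. The column names a, b, and c are hypothetical and not part of the weather data.

```python
import pandas

# Toy frame: b rises exactly with a, c falls exactly with a
toy = pandas.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [2.0, 4.0, 6.0, 8.0],
    'c': [4.0, 3.0, 2.0, 1.0],
})

corr = toy.corr()  # Pearson correlation between every pair of columns
print(corr.round(2))
```

Here a and b correlate at +1 and a and c at -1. In pandas 2.0 and later, corr() raises an error on text columns unless numeric_only=True is passed, which is one reason to drop those columns before calling it.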

2.2 Rain or no rain

    sns.set(style="darkgrid")
    plt.figure(figsize=(4, 3))
    sns.countplot(x='RainTomorrow', data=df)
    plt.show()

    plt.figure(figsize=(4, 3))
    sns.countplot(x='RainToday', data=df)
    plt.show()

Cross-tabulate whether it rains today against whether it rains tomorrow.

    x = pandas.crosstab(df['RainTomorrow'], df['RainToday'])
    print(x)

    # Divide each row by its row total: percentages within each RainTomorrow value
    y = x.div(x.sum(axis=1), axis=0) * 100
    print(y)

    y.plot(kind='bar', figsize=(4, 3), color=['green', 'blue'])
    plt.show()

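The first table shows raw counts: for example, there are 92728 days on which it rains neither today nor tomorrow, and the other three cells read the same way. The second table converts these counts to percentages. Each row is divided by its row total, and the rows are the RainTomorrow values, so each row sums to 100: among days that are followed by no rain tomorrow, 84.616648% had no rain today and 15.383352% had rain today. The Yes row reads the same way.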
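
Which conditional probability a percentage table expresses depends on the axis being normalized. The sketch below uses the normalize argument of pandas.crosstab on nine made-up rows to show both directions side by side; the toy values are illustrative and not taken from the weather file.

```python
import pandas

# Nine made-up days (illustrative only)
toy = pandas.DataFrame({
    'RainToday':    ['No', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes'],
    'RainTomorrow': ['No', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
})

counts = pandas.crosstab(toy['RainTomorrow'], toy['RainToday'])

# Rows sum to 100: distribution of RainToday within each RainTomorrow value
by_tomorrow = pandas.crosstab(toy['RainTomorrow'], toy['RainToday'],
                              normalize='index') * 100

# Columns sum to 100: distribution of RainTomorrow within each RainToday value
by_today = pandas.crosstab(toy['RainTomorrow'], toy['RainToday'],
                           normalize='columns') * 100

print(counts)
print(by_tomorrow.round(1))
print(by_today.round(1))
```

To answer "given today's weather, how likely is rain tomorrow?", the column-normalized table (by_today) is the one to read.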

2.3 Location and rain

    x = pandas.crosstab(df['Location'], df['RainToday'])
    # Share of dry and rainy days per location, in percent
    y = x.div(x.sum(axis=1), axis=0) * 100
    y = y.sort_values(by='Yes', ascending=True)
    color = ['#cc6699', '#006666', '#862d86', '#ff9966']
    y.Yes.plot(kind='barh', figsize=(15, 20), color=color)
    plt.show()

2.4 Effect of humidity and pressure on rain

    print(data.columns)

    plt.figure(figsize=(8, 6))
    sns.scatterplot(data=df, x='Pressure9am', y='Pressure3pm', hue='RainTomorrow')
    plt.figure(figsize=(8, 6))
    sns.scatterplot(data=df, x='Humidity9am', y='Humidity3pm', hue='RainTomorrow')
    plt.show()

Low pressure and high humidity increase the probability of rain, especially in the 3 pm readings.

2.5 Effect of temperature on rain

    plt.figure(figsize=(8, 6))
    sns.scatterplot(x='MaxTemp', y='MinTemp', data=df, hue='RainTomorrow')
    plt.show()

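Conclusion: when a day's maximum and minimum temperatures are close together, the probability of rain on the following day increases.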
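
One way to check a conclusion like this numerically, rather than by eye, is to compare the rain rate for days with a narrow temperature gap against days with a wide one. The sketch below does that on six made-up rows. The 5-degree threshold and the Gap and Rain column names are hypothetical choices for illustration, not values from the original analysis.

```python
import numpy
import pandas

# Six made-up days (illustrative only)
toy = pandas.DataFrame({
    'MinTemp': [10.0, 12.0, 8.0, 15.0, 14.0, 5.0],
    'MaxTemp': [12.0, 25.0, 11.0, 30.0, 16.0, 20.0],
    'RainTomorrow': ['Yes', 'No', 'Yes', 'No', 'No', 'No'],
})

# Daily temperature gap, then a hypothetical 5-degree cut-off
toy['TempRange'] = toy['MaxTemp'] - toy['MinTemp']
toy['Gap'] = numpy.where(toy['TempRange'] < 5, 'narrow', 'wide')

# Percentage of days followed by rain, per group
toy['Rain'] = toy['RainTomorrow'].eq('Yes')
rate = toy.groupby('Gap')['Rain'].mean() * 100
print(rate)
```

Running the same grouping on df would show whether the pattern seen in the scatter plot holds across the full dataset.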

3. Data Preprocessing

Check the percentage of missing values in each column.

    import pandas
    import numpy
    import matplotlib.pyplot as plt
    import seaborn as sns
    import tensorflow as tf
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler, LabelEncoder
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Activation, Dropout
    from tensorflow.keras.callbacks import EarlyStopping
    from tensorflow.keras.optimizers import Adam
    from sklearn.metrics import classification_report, confusion_matrix, r2_score, mean_absolute_error
    from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

    # Percentage of missing values in each column
    d = df.isnull().sum() / df.shape[0] * 100
    print(d)

Fill in the missing values.

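    # Columns with many gaps: fill by sampling at random from the observed values
    lst = ['Evaporation', 'Sunshine', 'Cloud9am', 'Cloud3pm']
    for col in lst:
        fill_list = df[col].dropna()
        # index=df.index keeps the sampled values aligned with df's rows
        df[col] = df[col].fillna(
            pandas.Series(numpy.random.choice(fill_list, size=len(df.index)), index=df.index))

    # Text (object) columns: fill with the most frequent value
    s = (df.dtypes == 'object')
    print(s)
    print(s[s])
    object_cols = list(s[s].index)
    print(object_cols)
    for i in object_cols:
        # Assign back instead of inplace=True, which is unreliable on a column selection
        df[i] = df[i].fillna(df[i].mode()[0])

    # Remaining float columns: fill with the median
    t = (df.dtypes == 'float64')
    num_cols = list(t[t].index)
    print(num_cols)
    for i in num_cols:
        df[i] = df[i].fillna(df[i].median())

    print(df.isnull().sum())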
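
The random-sampling fill relies on fillna aligning the replacement Series with the column by index label. The sketch below shows the idea on a five-element toy Series with a non-default index; the seed and the values are illustrative assumptions.

```python
import numpy
import pandas

rng = numpy.random.default_rng(0)  # fixed seed so the fill is repeatable

# Toy column with gaps and a non-default index (illustrative values)
s = pandas.Series([1.0, numpy.nan, 3.0, numpy.nan, 5.0],
                  index=[10, 11, 12, 13, 14])

observed = s.dropna()

# Draw one candidate per row, on the SAME index, because fillna aligns on labels
fill = pandas.Series(rng.choice(observed.to_numpy(), size=len(s)), index=s.index)

# Only the missing positions take a value from fill; observed values are untouched
filled = s.fillna(fill)
print(filled)
```

If the replacement Series carried a default 0..n-1 index while the frame did not, the labels would not match and some gaps would stay unfilled. Sampling from observed values keeps the column's distribution roughly intact, whereas a median fill concentrates values at a single point.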

Build the dataset.

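    # Assumed step: encode the text columns as integers, since MinMaxScaler and
    # Keras need numeric input (LabelEncoder is imported above; 'No'/'Yes' become 0/1)
    for col in object_cols:
        df[col] = LabelEncoder().fit_transform(df[col])

    X = df.drop(['RainTomorrow', 'day'], axis=1).values
    y = df['RainTomorrow'].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=101)

    # Fit the scaler on the training split only, then apply it to both splits
    scaler = MinMaxScaler()
    scaler.fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)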
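
To make the scaling step concrete, the sketch below reproduces the min-max formula in plain NumPy on a tiny made-up array. It mirrors what MinMaxScaler does with its default (0, 1) range, under the assumption that no column is constant; the helper name min_max_scale is hypothetical.

```python
import numpy

# Toy splits (illustrative values)
X_train = numpy.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
X_test = numpy.array([[2.5, 40.0]])

# Statistics come from the training split only
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)


def min_max_scale(X):
    """Scale each column with the training minimum and maximum."""
    return (X - col_min) / (col_max - col_min)


X_train_scaled = min_max_scale(X_train)
X_test_scaled = min_max_scale(X_test)
print(X_train_scaled)
print(X_test_scaled)
```

The test value 40.0 maps to 1.5 because it lies above the training maximum. Values outside [0, 1] in the test split are expected when the scaler is fitted on training data alone, which is what keeps test information out of the preprocessing.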

4. Predicting Rain

4.1 Building, compiling, and training the network

    model = Sequential()
    model.add(Dense(units=24, activation='tanh'))
    model.add(Dense(units=18, activation='tanh'))
    model.add(Dense(units=23, activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(units=12, activation='tanh'))
    model.add(Dropout(0.2))
    model.add(Dense(units=1, activation='sigmoid'))  # single probability output

    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

    # metrics is passed as a list; newer Keras versions reject a bare string
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

    early_stop = EarlyStopping(monitor='val_loss', mode='min', min_delta=0.001,
                               verbose=1, patience=25, restore_best_weights=True)

    model.fit(x=X_train, y=y_train,
              validation_data=(X_test, y_test),
              verbose=1,
              callbacks=[early_stop],
              epochs=10,
              batch_size=32)
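
The interaction between patience and epochs is easy to overlook. The sketch below is a simplified pure-Python model of patience-based early stopping on a list of validation losses. It is not the Keras implementation, and the function name early_stop_epoch is hypothetical.

```python
def early_stop_epoch(val_losses, patience, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    A loss counts as an improvement only if it beats the best loss
    seen so far by more than min_delta.
    """
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Loss stops improving after epoch 2; with patience=2 training ends at epoch 4
stopped_at = early_stop_epoch([0.50, 0.40, 0.41, 0.42, 0.43],
                              patience=2, min_delta=0.001)

# Ten epochs can never exhaust a patience of 25
never = early_stop_epoch([0.5] * 10, patience=25, min_delta=0.001)
print(stopped_at, never)
```

With epochs=10 and patience=25 as configured above, the callback cannot trigger, so all 10 epochs run. Raising epochs or lowering patience would be needed for early stopping to have any effect.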

4.2 Visualizing the results

    acc = model.history.history['accuracy']
    val_acc = model.history.history['val_accuracy']

    loss = model.history.history['loss']
    val_loss = model.history.history['val_loss']

    # Use the number of epochs actually run, in case training stopped early
    epochs_range = range(len(acc))

    plt.figure(figsize=(14, 4))
    plt.subplot(1, 2, 1)
    plt.plot(epochs_range, acc, label='Training Accuracy')
    plt.plot(epochs_range, val_acc, label='Validation Accuracy')
    plt.legend(loc='lower right')
    plt.title('Training and Validation Accuracy')

    plt.subplot(1, 2, 2)
    plt.plot(epochs_range, loss, label='Training Loss')
    plt.plot(epochs_range, val_loss, label='Validation Loss')
    plt.legend(loc='upper right')
    plt.title('Training and Validation Loss')
    plt.show()
