Python Data Processing and Visualization
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"0.导入相关库"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import cv2\n",
"import random\n",
"import shutil\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from sklearn.utils import shuffle\n",
"from sklearn import metrics\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import classification_report, confusion_matrix,precision_score,f1_score,recall_score,precision_recall_curve,auc\n",
"from sklearn.utils.class_weight import compute_class_weight\n",
"\n",
"import tensorflow as tf\n",
"from tensorflow.keras.preprocessing.image import ImageDataGenerator\n",
"from tensorflow.keras.preprocessing import image_dataset_from_directory\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras import layers, models\n",
"from tensorflow.keras.layers import Input,BatchNormalization,Flatten,Dense,MaxPool2D,Conv2D,AveragePooling2D,Dropout,Flatten,Activation,ReLU, Softmax\n",
"from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint\n",
"from tensorflow.keras.applications.resnet50 import ResNet50\n",
"from tensorflow.keras.applications import VGG16\n",
"from tensorflow.keras.utils import to_categorical\n",
"from tensorflow.keras.optimizers import SGD,Adam\n",
"from tensorflow.keras.regularizers import l2\n",
"\n",
"from keras.callbacks import Callback,ModelCheckpoint\n",
"from keras.models import Sequential,load_model\n",
"from keras.layers import Dense, Dropout\n",
"from keras.wrappers.scikit_learn import KerasClassifier\n",
"from keras.initializers import glorot_uniform\n",
"import keras.backend as K"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1、读取数据并预处理"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"#随机种子\n",
"seed_value=42\n",
"np.random.seed(seed_value)\n",
"tf.random.set_seed(seed_value)\n",
"random.seed(seed_value)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found 4098 images belonging to 4 classes.\n",
"Found 1023 images belonging to 4 classes.\n"
]
}
],
"source": [
"#训练数据生成器,包含数据增强和验证集划分\n",
"train_datagen = ImageDataGenerator(\n",
" rescale=1./255, #将像素值缩放到0到1之间\n",
" validation_split=0.2, #将20%的数据用于验证\n",
" #rotation_range=5, #随机旋转图像角度范围(±5度)\n",
" #width_shift_range=0.2, #随机水平平移(20%)\n",
" #height_shift_range=0.2, #随机垂直平移(20%)\n",
" #zoom_range=0.2, #随机缩放(±20%)\n",
" #horizontal_flip=True, #随机水平翻转图像\n",
" #vertical_flip=True, #随机垂直翻转图像\n",
" #fill_mode='nearest' #填充缺失像素使用最邻近的像素\n",
")\n",
"\n",
"#验证数据生成器\n",
"validation_datagen = ImageDataGenerator(rescale=1./255)#将像素值缩放到0到1之间\n",
"\n",
"#训练集生成\n",
"train_dataset = train_datagen.flow_from_directory(\n",
" directory='Alzheimer_s Dataset/train/', # 训练集目录\n",
" target_size=(224, 224), #图像调整为224x224\n",
" class_mode='categorical', #类别模式为多类分类\n",
" subset='training', #获取训练集\n",
" batch_size=32 #每个批次包含32张图像\n",
")\n",
"\n",
"# 创建验证集生成器\n",
"validation_dataset = train_datagen.flow_from_directory(\n",
" directory='Alzheimer_s Dataset/train/', # 同样的训练集目录\n",
" target_size=(224, 224), #图像调整为224x224\n",
" class_mode='categorical', #类别模式为多类分类\n",
" subset='validation', #获取验证集\n",
" batch_size=32 #每个批次包含32张图像\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found 1279 images belonging to 4 classes.\n"
]
}
],
"source": [
"#创建测试数据生成器\n",
"test_datagen = ImageDataGenerator(rescale=1./255)#将像素值缩放到0到1之间\n",
"\n",
"#测试集生成\n",
"test_dataset = test_datagen.flow_from_directory(\n",
" directory='Alzheimer_s Dataset/test/', #测试集目录\n",
" target_size=(224, 224), #图像调整为224x224\n",
" class_mode='categorical', #类别模式为多类分类\n",
" batch_size=32 #每个批次包含32张图像\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2、数据处理及可视化"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MildDemented: 717个样本\n",
"ModerateDemented: 52个样本\n",
"NonDemented: 2560个样本\n",
"VeryMildDemented: 1792个样本\n",
"MildDemented: 179个样本\n",
"ModerateDemented: 12个样本\n",
"NonDemented: 640个样本\n",
"VeryMildDemented: 448个样本\n"
]
},
{
"data": {
"image/png": "xxxxxxxxC",
"text/plain": [
"<Figure size 800x600 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#VGG模型评估\n",
"evaluate_and_plot(VGGmodel,x_test, y_test)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "tf_keras",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Code walkthrough
Below is a detailed analysis of what this code does:
1. Importing libraries (the "0. Import libraries" section)
- Pulls in a range of Python libraries for data processing, model building, and evaluation.
- Key modules include:
  - OpenCV (image processing)
  - NumPy (efficient numerical computation)
  - Pandas (data analysis)
  - Seaborn and Matplotlib (data visualization)
- Several scikit-learn functions and classes, mainly for data splitting and model evaluation.
- The deep learning imports fall into these groups:
  - The TensorFlow framework
  - The standalone Keras package
  - Model-building modules: the Sequential and Functional APIs
  - Layer components such as Conv2D and Dense
  - Callbacks: EarlyStopping and ModelCheckpoint
  - Pretrained application models: ResNet50 and VGG16
  - Optimizers: SGD and Adam
  - Data-loading utilities: ImageDataGenerator and image_dataset_from_directory
2. Data loading and preprocessing (the "1. Load and preprocess the data" section)
- Random seeds (seed_value = 42): np.random.seed, tf.random.set_seed, and random.seed are all set so that data processing and model training are reproducible.
- Data generators and dataset splits:
  - Training data generator (train_datagen): an ImageDataGenerator with rescale=1./255 scaling pixel values into the 0-1 range and validation_split=0.2 reserving 20% of the training data for validation. Several augmentation parameters (rotation, shifts, zoom, flips) are present but commented out; enabling them would increase the diversity of the training data.
  - Validation data generator (validation_datagen): performs only rescaling. It is defined but never used; the validation subset is actually drawn from train_datagen via subset='validation'.
  - Training set (train_dataset): train_datagen.flow_from_directory reads images from 'Alzheimer_s Dataset/train/', resizes them to (224, 224), uses categorical class mode (multi-class), selects the training subset, and batches 32 images at a time.
  - Validation set (validation_dataset): likewise drawn from the training directory, with essentially the same parameters but subset='validation'.
  - Test data generator (test_datagen) and test set (test_dataset): a rescale-only generator reads images from 'Alzheimer_s Dataset/test/', with the same image size and batch size as the training set.
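The training counts printed later in the notebook (MildDemented: 717, ModerateDemented: 52, NonDemented: 2560, VeryMildDemented: 1792) reveal a strong class imbalance, and compute_class_weight is imported but never called. A minimal sketch of two related steps, counting files per class folder and deriving balanced weights with plain NumPy (the same n_samples / (n_classes * n_c) formula that scikit-learn uses for class_weight='balanced'), assuming the one-subfolder-per-class layout that flow_from_directory expects:

```python
import os
import numpy as np

def count_samples_per_class(directory):
    """Count files in each class subfolder (flow_from_directory layout)."""
    return {
        cls: len(os.listdir(os.path.join(directory, cls)))
        for cls in sorted(os.listdir(directory))
        if os.path.isdir(os.path.join(directory, cls))
    }

def balanced_class_weights(counts):
    """n_samples / (n_classes * n_c): sklearn's 'balanced' heuristic."""
    n = np.array(list(counts.values()), dtype=float)
    weights = n.sum() / (len(n) * n)
    return dict(zip(counts.keys(), weights))

# Counts as reported by the notebook's training generator:
train_counts = {'MildDemented': 717, 'ModerateDemented': 52,
                'NonDemented': 2560, 'VeryMildDemented': 1792}
for cls, w in balanced_class_weights(train_counts).items():
    print(f'{cls}: weight {w:.3f}')
```

The resulting dictionary, remapped through the generator's class_indices to integer keys, could then be passed to model.fit via its class_weight argument so that the rare ModerateDemented class is not drowned out during training.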
3. Data processing and visualization (the "2. Data processing and visualization" section)
The code calls evaluate_and_plot(VGGmodel, x_test, y_test), but the function's implementation does not appear in this snippet. Judging by its name and by the cell output, it evaluates the VGG model on the test data (x_test holding the input features and y_test the corresponding labels), prints per-class sample counts, and visualizes the evaluation results.
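Since evaluate_and_plot is undefined in the snippet, the following is only a hypothetical reconstruction: the signature comes from the call site, the behaviour is inferred from the cell output, and the real implementation may differ. It uses NumPy alone to print per-class counts, report accuracy, and return a confusion matrix (which a seaborn heatmap could render to reproduce the figure output):

```python
import numpy as np

def evaluate_and_plot(model, x_test, y_test, class_names=None):
    """Hypothetical reconstruction of the notebook's undefined helper."""
    y_true = np.argmax(y_test, axis=1)              # one-hot labels -> indices
    y_pred = np.argmax(model.predict(x_test), axis=1)
    n_classes = y_test.shape[1]
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):                # row = true, col = predicted
        cm[t, p] += 1
    names = class_names or [f'class {i}' for i in range(n_classes)]
    for i, name in enumerate(names):                # mirrors the 'N samples' lines
        print(f'{name}: {cm[i].sum()} samples')
    print(f'accuracy: {np.trace(cm) / cm.sum():.4f}')
    # e.g. seaborn.heatmap(cm, annot=True) would produce the evaluation figure
    return cm
```

Any model object exposing a predict(x) method (such as a compiled Keras model) fits this sketch; x_test and y_test would be NumPy arrays of images and one-hot labels.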
Taken as a whole, the project sketches a deep learning pipeline for image classification: data loading and preprocessing (including augmentation hooks and a train/validation split), plus model evaluation and result visualization, although the evaluation part is only partially shown here. The directory names and comments indicate that the target is classification of an Alzheimer's disease image dataset.