
Advancing in AI (Part 5): Explaining the Core Concepts


In the previous post we saw how artificial intelligence amounts to one powerful function. Next, you need a clear grasp of the core concepts, such as loss functions, optimization methods, and regularization, before you can truly put it to work!

1. What Is a Network Model?

A network model is like a precision assembly line in a factory, made up of multiple workshops (layers), each responsible for a specific processing task. Raw material (the input data) is processed step by step along the line until the finished product (the prediction) comes out.

Basic Components

  1. Input layer: receives the raw data
  2. Hidden layers: transform and process the data
  3. Output layer: produces the final result
    import numpy as np

    class SimpleNeuralNetwork:
        def __init__(self, input_size, hidden_size, output_size):
            # Initialize the network parameters
            self.hidden_weights = np.random.randn(input_size, hidden_size)
            self.hidden_bias = np.zeros(hidden_size)
            self.output_weights = np.random.randn(hidden_size, output_size)
            self.output_bias = np.zeros(output_size)

        def relu(self, x):
            """Activation function: zero out negatives, keep positives as-is."""
            return np.maximum(0, x)

        def forward(self, x):
            """Forward pass: how data flows through the network."""
            # First transformation: input -> hidden
            self.hidden = self.relu(np.dot(x, self.hidden_weights) + self.hidden_bias)
            # Second transformation: hidden -> output
            self.output = np.dot(self.hidden, self.output_weights) + self.output_bias
            return self.output
    
    

Common Types of Network Models

1. Feedforward neural network (the most basic model)
    class FeedForwardNetwork:
        def __init__(self):
            self.layers = [
                {"neurons": 128, "activation": "relu"},
                {"neurons": 64, "activation": "relu"},
                {"neurons": 10, "activation": "softmax"}
            ]
    
    
2. Convolutional neural network (for images)
    class SimpleCNN:
        def __init__(self):
            self.layers = [
                {"type": "conv2d", "filters": 32, "kernel_size": 3},
                {"type": "maxpool", "size": 2},
                {"type": "conv2d", "filters": 64, "kernel_size": 3},
                {"type": "flatten"},
                {"type": "dense", "neurons": 10}
            ]
    
    
3. Recurrent neural network (for sequences)
    class SimpleRNN:
        def __init__(self, input_size, hidden_size):
            self.hidden_size = hidden_size
            # Initialize the weights
            self.Wx = np.random.randn(input_size, hidden_size)   # input weights
            self.Wh = np.random.randn(hidden_size, hidden_size)  # hidden-state weights
            self.b = np.zeros(hidden_size)                       # bias
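
As defined, SimpleRNN only stores its weights. Each time step then combines the current input with the previous hidden state. Here is a minimal sketch of one recurrence step, written as a method of the class above; the tanh activation is an assumption, since the class itself doesn't specify one:

    def rnn_step(self, x_t, h_prev):
        # New hidden state: mix the current input with the carried-over memory
        return np.tanh(np.dot(x_t, self.Wx) + np.dot(h_prev, self.Wh) + self.b)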
    
    

Example Applications of Models

  1. Image recognition model
    def image_recognition_model():
        model = {
            "conv1": {"filters": 32, "kernel_size": 3},
            "pool1": {"size": 2},
            "conv2": {"filters": 64, "kernel_size": 3},
            "pool2": {"size": 2},
            "flatten": {},
            "dense1": {"units": 128},
            "dense2": {"units": 10}
        }
        return model
    
    
  2. Text processing model
    def text_processing_model():
        model = {
            "embedding": {"vocab_size": 10000, "embed_dim": 100},
            "lstm": {"units": 64, "return_sequences": True},
            "global_pool": {},
            "dense": {"units": 1, "activation": "sigmoid"}
        }
        return model
    
    

Characteristics of Models

  1. Layered structure
    class LayeredNetwork:
        def __init__(self):
            self.architecture = [
                ("input", 784),           # input layer: receives raw data
                ("hidden", 256, "relu"),  # hidden layer: feature extraction
                ("hidden", 128, "relu"),  # hidden layer: feature combination
                ("output", 10, "softmax") # output layer: produces predictions
            ]
    
    
  2. Parameter learning
    def train_step(model, inputs, targets):
        # Forward pass
        predictions = model.forward(inputs)
        # Compute the loss (calculate_loss and calculate_gradients are
        # schematic placeholders for a real loss function and autodiff step)
        loss = calculate_loss(predictions, targets)
        # Backward pass
        gradients = calculate_gradients(loss)
        # Update the parameters
        model.update_parameters(gradients)
        return loss
    
    
  3. Feature extraction
    def extract_features(model, input_data):
        features = []
        # Collect the intermediate output of every layer
        for layer in model.layers:
            input_data = layer.process(input_data)
            features.append(input_data)
        return features
    
    

Choosing a Model

Pick a model that matches the task type:

  1. Image processing: use a CNN
  2. Text processing: use an RNN or a Transformer
  3. Tabular data: use a feedforward network

A simple dispatch helper might look like this:

    def choose_model(task_type):
        if task_type == "image":
            return CNN()
        elif task_type == "text":
            return RNN()
        elif task_type == "tabular":
            return FeedForwardNetwork()

Example: A Complete Model Definition

    class ComprehensiveModel:
        def __init__(self, input_shape, num_classes):
            self.input_shape = input_shape
            self.num_classes = num_classes

        def build(self):
            model = {
                # Feature extraction stage
                "feature_extractor": [
                    {"type": "conv2d", "filters": 32, "kernel_size": 3},
                    {"type": "maxpool", "size": 2},
                    {"type": "conv2d", "filters": 64, "kernel_size": 3},
                    {"type": "maxpool", "size": 2}
                ],

                # Classification stage
                "classifier": [
                    {"type": "flatten"},
                    {"type": "dense", "units": 128, "activation": "relu"},
                    {"type": "dropout", "rate": 0.5},
                    {"type": "dense", "units": self.num_classes, "activation": "softmax"}
                ]
            }
            return model
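
Using it is a matter of instantiating and building. The shapes here (28x28 grayscale images, 10 classes) are illustrative values, not anything mandated above:

    model = ComprehensiveModel(input_shape=(28, 28, 1), num_classes=10)
    architecture = model.build()  # returns the layer-configuration dictionary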
    
    

This network model is like a smart factory:

  • The input layer is the receiving dock for raw material
  • The hidden layers are the individual workshops
  • The output layer is final quality inspection
  • The parameters are the workers' skills
  • The activation functions are the workers' techniques
  • Training is the workers practicing and improving their skills

Working this way, network models can learn to handle all kinds of complex tasks, from image recognition to language translation, from game playing to autonomous driving.

2. What Is Learning?

Imagine teaching a child to recognize cats:

  • At first, he might call every furry animal a cat
  • By seeing example after example, he gradually learns to tell cats from dogs
  • Eventually, he can recognize cats reliably

In AI, learning means:

  1. Looking at lots of examples (data)
  2. Adjusting the model's parameters
  3. Improving prediction accuracy
    # A minimal example of the learning process
    class SimpleModel:
        def __init__(self):
            self.weight = 1.0  # initial parameter

        def predict(self, x):
            return self.weight * x

        def learn(self, x, true_value, learning_rate):
            prediction = self.predict(x)
            error = true_value - prediction
            # Gradient step for the squared error: the correction is
            # scaled by the input as well as the error
            self.weight += learning_rate * error * x
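
A quick run, reusing the class above with illustrative numbers, shows the weight homing in on the true slope of 2:

    model = SimpleModel()
    for _ in range(50):
        model.learn(x=2.0, true_value=4.0, learning_rate=0.05)
    print(model.predict(2.0))  # close to 4.0, since the weight is now close to 2.0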
    
    

3. What Is the Learning Rate?

The learning rate is like the "stride length" of learning:

  • Too large: you keep stepping over the best answer (learning too fast overshoots)
  • Too small: it takes a very long time to get there (learning too slowly)
    # The effect of different learning rates
    def train_with_different_learning_rates():
        learning_rates = [0.1, 0.01, 0.001]
        for lr in learning_rates:
            model = SimpleModel()
            for _ in range(100):
                model.learn(x=2, true_value=4, learning_rate=lr)
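
With the gradient-style update in learn() above, each step at x = 2 multiplies the remaining weight error by (1 - 4*lr), so rates above 0.5 make the error grow instead of shrink. A toy demonstration of that overshoot, with values chosen purely for illustration:

    for lr in [0.05, 0.4, 0.6]:  # 0.6 is outside the stable range here
        model = SimpleModel()
        for _ in range(20):
            model.learn(x=2.0, true_value=4.0, learning_rate=lr)
        print(lr, model.weight)  # lr=0.6 runs away from 2.0 instead of settling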
    
    

4. What Is a Loss Function?

A loss function is like an "exam score" that measures how accurate the model's predictions are, except lower is better:

  • The more accurate the prediction, the lower the score
  • The worse the prediction, the higher the score

Common loss functions:

    import numpy as np

    # Mean squared error (MSE)
    def mse_loss(predictions, targets):
        return np.mean((predictions - targets) ** 2)

    # Mean absolute error (MAE)
    def mae_loss(predictions, targets):
        return np.mean(np.abs(predictions - targets))

    # Cross-entropy loss (for classification problems)
    def cross_entropy_loss(predictions, targets):
        # The small epsilon keeps log() away from zero
        return -np.sum(targets * np.log(predictions + 1e-12))
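
A quick comparison with the helpers above shows why the choice matters: a single large outlier dominates MSE far more than MAE (the numbers are illustrative):

    predictions = np.array([1.0, 2.0, 3.0, 10.0])
    targets = np.array([1.0, 2.0, 3.0, 4.0])
    print(mse_loss(predictions, targets))  # 9.0: the one outlier dominates
    print(mae_loss(predictions, targets))  # 1.5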
    
    

5. What Is an Optimizer?

An optimizer is like a "learning strategy": it decides how the model's parameters get adjusted.

Examples of common optimizers:

    class SGD:
        def __init__(self, learning_rate=0.01):
            self.lr = learning_rate

        def update(self, parameter, gradient):
            # Plain gradient descent: step against the gradient
            return parameter - self.lr * gradient

    class Momentum:
        def __init__(self, learning_rate=0.01, momentum=0.9):
            self.lr = learning_rate
            self.momentum = momentum
            self.velocity = 0

        def update(self, parameter, gradient):
            # Keep a running "velocity" so past gradients smooth the path
            self.velocity = self.momentum * self.velocity - self.lr * gradient
            return parameter + self.velocity
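
The optimizer article in the further-reading list walks from SGD all the way to Adam. As a taste, here is a minimal Adam sketch in the same style as the classes above; treat it as an illustration, not a drop-in replacement for a library optimizer:

    class Adam:
        def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
            self.lr = learning_rate
            self.beta1, self.beta2, self.eps = beta1, beta2, eps
            self.m = 0.0  # running average of gradients
            self.v = 0.0  # running average of squared gradients
            self.t = 0    # step counter for bias correction

        def update(self, parameter, gradient):
            self.t += 1
            self.m = self.beta1 * self.m + (1 - self.beta1) * gradient
            self.v = self.beta2 * self.v + (1 - self.beta2) * gradient ** 2
            m_hat = self.m / (1 - self.beta1 ** self.t)  # bias-corrected estimates
            v_hat = self.v / (1 - self.beta2 ** self.t)
            return parameter - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)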
    
    

6. What Is Convergence?

Convergence is the "mastery achieved" state of training:

  • The model's performance levels off
  • The loss no longer drops noticeably
  • Predictions broadly match expectations
    def check_convergence(loss_history, tolerance=1e-5):
        """Check whether training has converged."""
        if len(loss_history) < 2:
            return False

        # In practice you would often average over a window rather than
        # compare only the last two values
        recent_loss_change = abs(loss_history[-1] - loss_history[-2])
        return recent_loss_change < tolerance
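
For example, given a loss history that has flattened out (made-up numbers):

    losses = [1.0, 0.5, 0.3, 0.299999]
    print(check_convergence(losses))  # True: the last change is below the tolerance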
    
    

7. What Is Regularization?

Regularization is like assigning the model "extra homework" to keep it from rote memorization (overfitting):

    # L1 regularization (Lasso)
    def l1_regularization(weights, lambda_param):
        return lambda_param * np.sum(np.abs(weights))

    # L2 regularization (Ridge)
    def l2_regularization(weights, lambda_param):
        return lambda_param * np.sum(weights ** 2)

    # Dropout regularization
    def dropout(layer_output, dropout_rate=0.5):
        # Randomly zero out units; dividing by the keep probability preserves
        # the expected activation (the "inverted dropout" trick)
        mask = np.random.binomial(1, 1 - dropout_rate, size=layer_output.shape)
        return layer_output * mask / (1 - dropout_rate)
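
During training, the penalty term is simply added to the data loss before gradients are computed. A sketch combining the helpers above, where lambda_param is a strength you would tune and all values are illustrative:

    weights = np.random.randn(10)
    predictions = np.array([1.2, 0.8])
    targets = np.array([1.0, 1.0])
    total_loss = mse_loss(predictions, targets) + l2_regularization(weights, lambda_param=0.01)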
    
    

A Practical Example

Let's combine all of these concepts:

    class SimpleNeuralNetwork:
        def __init__(self):
            self.weights = np.random.randn(10)
            self.optimizer = Momentum()
            self.loss_history = []

        def train(self, x, y, epochs=1000):
            for epoch in range(epochs):
                # Forward pass
                prediction = self.predict(x)

                # Compute the loss
                loss = mse_loss(prediction, y)
                self.loss_history.append(loss)

                # Compute the gradient
                gradient = self.calculate_gradient(x, y)

                # Update the parameters
                self.weights = self.optimizer.update(self.weights, gradient)

                # Check for convergence
                if check_convergence(self.loss_history):
                    print(f"Model converged at epoch {epoch}")
                    break

        def calculate_gradient(self, x, y):
            """Gradient of the MSE loss with respect to the weights."""
            error = self.predict(x) - y
            return 2 * np.dot(x.T, error) / len(y)

        def predict(self, x):
            return np.dot(x, self.weights)
    
    

Summary

These concepts interlock:

  1. The function defines the model's structure
  2. Learning lets the model keep improving
  3. The learning rate sets the stride of each improvement
  4. The loss function evaluates the model's performance
  5. The optimizer steers the parameter updates
  6. Convergence signals that learning is done
  7. Regularization prevents over-learning

It's like learning to ride a bicycle:

  • The function is the bicycle's structure
  • Learning is the practice process
  • The learning rate is how much you adjust each time
  • The loss function is the number of times you fall
  • The optimizer is your practice method
  • Convergence is finally being able to ride
  • Regularization is practicing on different road surfaces

Further Reading

Optimizers in deep learning, from SGD to Adam - https://ruder.io/optimizing-gradient-descent/

Neural network basics: forward and backward propagation - https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c

Understanding LSTM networks - https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Batch normalization explained - https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c

Regularization techniques in deep learning - https://neptune.ai/blog/fighting-overfitting-with-l1-or-l2-regularization

Visualizing convolutional neural networks - https://poloclub.github.io/cnn-explainer/

Setting the learning rate in deep learning - https://www.jeremyjordan.me/nn-learning-rate/

How to choose loss functions - https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/

The Transformer explained: understanding self-attention - https://jalammar.github.io/illustrated-transformer/

Comparing activation functions in deep learning - https://mlfromscratch.com/activation-functions-explained/

A summary of gradient descent variants - https://towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3

Practical tips for training deep learning models - https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel

Cross-validation and model evaluation - https://scikit-learn.org/stable/modules/cross_validation.html

An introduction to neural architecture search - https://lilianweng.github.io/posts/2020-08-06-nas/

Data augmentation techniques in deep learning - https://neptune.ai/blog/data-augmentation-in-deep-learning
