
Deep Learning Fundamentals


Author: 禅与计算机程序设计艺术

1. Introduction

In recent years, deep learning has made significant advancements in the domains of computer vision (CV) and natural language processing (NLP). Neural networks can effectively learn intricate patterns from vast datasets, rendering them highly effective for tasks such as image classification, object detection, speech recognition, and text analysis. A notable technique within this framework is the neural decision tree model (NDT), which can be applied to both CV and NLP challenges. This article aims to introduce neural decision tree models (NDTs), explain their functionality, and explore their applications in natural language processing and computer vision. The discussion will begin with a clear definition of a decision tree model, an examination of its development motivations, followed by an overview of its limitations before moving on to more detailed explanations.

2. Definitions and Terminology

A decision tree is a machine learning algorithm that partitions data into smaller groups based on certain conditions. It works by recursively splitting the dataset into two parts at each step, with the condition determining whether to split on one attribute or another. At each node of the tree, we make a prediction based on the majority class label of the samples falling within that node. For example, consider the following decision tree:
Here, we want to classify samples based on two features - “temperature” and “humidity”. We first check if temperature is less than or equal to 25 degrees Celsius. If it is, we move down to the left branch, where we predict that all the samples are “good.” Otherwise, we predict that most of the samples are “bad.” Then, we repeat this process for the right branch, but now based on humidity instead. Based on our sample data, we would say that there’s not much difference between high humidity (“bad”) and low humidity (“good”), so we might end up with a binary decision tree like this:
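
As a concrete illustration of this kind of split, here is a minimal sketch (using made-up sample values, not data from the article) that fits a small decision tree on the two features above with scikit-learn and prints the learned rules:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical samples: [temperature (°C), humidity (%)]
    X = [[20, 85], [22, 80], [24, 90], [30, 85], [33, 90], [35, 40]]
    y = ['good', 'good', 'good', 'bad', 'bad', 'good']  # made-up labels

    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(X, y)

    # Inspect the learned splitting rules
    # (with this toy data: a split on temperature, then on humidity in the warmer branch)
    print(export_text(tree, feature_names=['temperature', 'humidity']))
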
However, decision trees have limitations. They require careful feature engineering and pruning techniques to handle noisy or missing data, and they tend to overfit the training data, especially when the training set is small relative to the number of input features. To address these issues, researchers have developed ensemble methods such as random forests and gradient boosting machines, which combine multiple decision trees to reduce variance and improve generalization performance.
Neural decision trees (NDTs) are similar to standard decision trees, but they use artificial neurons instead of traditional statistical measures to determine splits. Specifically, an NDT consists of a series of nodes representing decisions made at each step. Each node receives input from previous layers, and passes output along to subsequent layers until it reaches the final layer, where it outputs a predicted class label. An NDT differs from a regular decision tree in terms of architecture and learning mechanism. In contrast to regular decision trees, NDTs learn non-parametric models using backpropagation through time (BPTT), which allows them to capture non-linear dependencies and interactions among features. Furthermore, NDTs do not rely on feature selection or dimensionality reduction, since they model entire functions rather than individual variables.

Another significant distinction between NDTs and decision trees lies in their distinct approaches to class representation. Standard decision trees employ binary classification through labels such as “yes” or “no,” whereas NDTs utilize real-valued representations, often expressed as probabilities ranging between zero and one. Although this difference may appear minor, it significantly impacts aspects like loss functions, accuracy metrics, and model optimization strategies. Furthermore, it is worth noting that despite shared characteristics, these models differ notably in terms of computational efficiency, memory consumption, and scalability. Consequently, NDTs have gained increasing popularity across fields such as computer vision and natural language processing due to their ability to achieve state-of-the-art performance across diverse domains.
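
To make that difference concrete, here is a tiny sketch (illustrative values only) contrasting the real-valued probability output of an NDT with the hard label a classical decision tree leaf would emit:

    import torch

    # Hypothetical raw scores from an NDT output layer for one sample
    scores = torch.tensor([1.2, -0.3, 0.4])

    probs = torch.softmax(scores, dim=-1)   # real-valued class probabilities in (0, 1)
    hard_label = int(torch.argmax(probs))   # the hard "yes/no"-style decision a tree leaf would give

    print(probs, hard_label)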

3. Core Algorithm Principles and Operational Steps

An NDT is built from several key modules, including an input layer, hidden layers, and an output layer. Each node represents a decision made at one step of the learning process. The input layer receives the raw data points and passes its results to the next layer for processing; the hidden layers process information by computing a weighted sum of their input vector and applying a non-linear activation function; the final output layer generates the prediction from the activations of the layer before it.

Let's look more closely at how each component works. In a simplified NDT architecture, the input layer receives the raw data points and hands them to the following layer for processing, while the output layer produces the prediction from the information of the layer before it. Each intermediate node transforms information by computing a weighted sum of its input vector and applying an activation function, then passes the resulting signal on to the next layer. The goal of the whole network is to minimize, during training, the error between its predictions and the true labels. Commonly used activation functions include sigmoid, tanh, ReLU, LeakyReLU, and ELU. During inference, the network only returns predictions and does not update its weight parameters.
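
To make the per-node computation concrete, here is a minimal sketch (all weights and inputs are made-up illustrative values) of a single hidden layer computing a weighted sum of its inputs and applying a ReLU activation:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    # Hypothetical input vector (e.g. two features such as temperature and humidity)
    x = np.array([0.6, 0.3])

    # Hypothetical weights and biases for a hidden layer with three nodes
    W = np.array([[ 0.2, -0.5],
                  [ 0.7,  0.1],
                  [-0.3,  0.8]])
    b = np.array([0.1, 0.0, -0.2])

    # Each node takes a weighted sum of its inputs plus a bias,
    # then applies the non-linear activation before passing the
    # result on to the next layer
    hidden = relu(W @ x + b)
    print(hidden)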

Training involves adjusting the parameters of the network to minimize the error between the predicted output and the ground truth targets. Backpropagation Through Time (BPTT) is commonly used to update the weights at each layer during training. BPTT uses recursive updates to calculate gradients and propagate errors backwards through the network. After computing the gradients for each weight parameter, the network uses stochastic gradient descent to update the parameters in the opposite direction of the gradient, effectively minimizing the total cost function. BPTT is computationally expensive, especially for deep networks, but it does allow us to train very complex models using relatively few training examples.
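
As a minimal illustration of the parameter update described above (a sketch with a made-up one-parameter model, not the article's training code), here is a single gradient descent step in PyTorch:

    import torch

    # Hypothetical single weight and a toy squared-error cost
    w = torch.tensor(2.0, requires_grad=True)
    target = 0.5

    loss = (w - target) ** 2   # cost for the current parameter value
    loss.backward()            # backpropagation fills w.grad with d(loss)/dw

    lr = 0.1
    with torch.no_grad():
        w -= lr * w.grad       # step in the opposite direction of the gradient
        w.grad.zero_()

    print(w)                   # w has moved from 2.0 toward the target 0.5
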
There are various ways to construct NDT structures. Some popular choices include fully connected feedforward NDTs, convolutional NDTs, and recursive NDTs. Fully connected NDTs are essentially linear regression models, where each node connects directly to every input variable and includes bias terms to account for changes in intercepts. Convolutional NDTs are popular for image classification, where each pixel is treated as a separate feature and fed through a set of filters to extract salient features from the images. Recursive NDTs involve nesting additional decision trees inside each parent node to create higher levels of hierarchy.
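
For the convolutional variant, a hedged sketch of what such a front-end might look like in PyTorch is shown below; the layer sizes are illustrative assumptions, not a prescribed architecture:

    import torch

    # Hypothetical convolutional front-end: each filter scans the image
    # and extracts a feature map before the decision layers take over.
    conv_ndt = torch.nn.Sequential(
        torch.nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.MaxPool2d(2),            # downsample 28x28 -> 14x14
        torch.nn.Flatten(),               # flatten feature maps for the decision layers
        torch.nn.Linear(8 * 14 * 14, 10)  # one output per class
    )

    dummy_image = torch.randn(1, 1, 28, 28)  # a batch containing one grayscale image
    print(conv_ndt(dummy_image).shape)       # torch.Size([1, 10])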

4. Code Examples and Explanations

Before going deeper into the theory behind neural decision trees, let's look at some practical code examples that demonstrate how NDTs work. We'll use Python libraries such as scikit-learn and PyTorch, so install them before running the code.

Text Classification Example: IMDB Movie Review Dataset

Scikit-learn does not ship a neural decision tree implementation, so for this walkthrough we build a conventional text-classification pipeline that covers the same steps an NDT classifier would need, using the IMDB movie review dataset. The process begins with retrieving the dataset from http://ai.stanford.edu/~amaas/data/sentiment/. Once retrieved, we prepare it by removing stop words, converting each word to lowercase, and stemming or lemmatizing as needed. In a full NDT pipeline, each review would then be tokenized into a sequence of word indices (mapping each word to its position in a vocabulary list) and padded so that all sequences have the same length; in the simplified example below we instead represent each review with TF-IDF features. We then partition the dataset into training and testing subsets, train the classifier on the training data, and finally assess its performance on the held-out set using evaluation metrics such as accuracy or F1-score. The code implementing these steps is provided below.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv('imdb_reviews.csv')

    # Preprocess text data: lowercase every token and drop a few stop words
    def preprocess(text):
        stopwords = ['the', 'and', 'a']
        tokens = [token.lower() for token in text.split()
                  if token.lower() not in stopwords]
        return ' '.join(tokens)

    df['review'] = df['review'].apply(preprocess)

    # Vectorize the text data into TF-IDF features
    tfidf = TfidfVectorizer(stop_words='english', max_features=5000)
    X = tfidf.fit_transform(df['review']).toarray()
    y = df['label']

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)

    # Train the classifier (a Multinomial Naive Bayes baseline standing in for an NDT)
    pipeline = Pipeline([
        ('clf', MultinomialNB())
    ])
    pipeline.fit(X_train, y_train)

    # Evaluate the classifier
    y_pred = pipeline.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print("Accuracy:", acc)

This code loads the dataset, preprocesses the review text, builds a TF-IDF feature matrix for document representation, and partitions the data into separate training and testing portions. It then wraps the classifier (here a Multinomial Naive Bayes model standing in for an NDT, since scikit-learn provides no NDT implementation) in a Pipeline, trains it on the training set, and finally runs inference on the test set, reporting an accuracy score. Results may vary slightly from run to run depending on the train/test split and any class imbalance in the data.

Image Classification Example: MNIST Handwritten Digits Data Set

Next, let's build an NDT for image classification using the MNIST dataset of handwritten digits. We start by loading the dataset and displaying several representative images. Since torchvision's ToTensor transform already scales pixel values into the range zero to one, we only need to flatten each 28x28 image into a 784-dimensional vector. Here's the code:

    import torch
    import torchvision
    import matplotlib.pyplot as plt

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load the dataset; ToTensor already scales pixel values to the range [0, 1]
    mnist_dataset = torchvision.datasets.MNIST('./mnist/',
                                               transform=torchvision.transforms.ToTensor(),
                                               download=True)
    x, y = [], []
    for i in range(len(mnist_dataset)):
        img, lbl = mnist_dataset[i]
        x.append(img)
        y.append(lbl)

    # Flatten each 1x28x28 image into a 784-dimensional vector
    x = torch.stack(x).reshape(-1, 784).to(device)
    y = torch.tensor(y, dtype=torch.long).to(device)

    # Plot the first example of each digit class
    fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(12, 6))
    for i in range(10):
        row, col = int(i // 5), int(i % 5)
        idx = int((y == i).nonzero(as_tuple=True)[0][0])
        ax = axes[row][col]
        ax.imshow(x[idx].view(28, 28).cpu().numpy(), cmap='gray')
        ax.axis('off')
        ax.set_title(str(int(y[idx])))
    plt.show()

This code loads the dataset and builds a tensor of flattened images with normalized pixel values, together with the corresponding labels. It then selects the first example of each digit class (0 through 9) and plots the ten images side by side, showing the raw pixel values. The code assumes you've already installed the matplotlib library.
Next, we'll define a simple NDT for image classification using PyTorch. We'll use four fully connected layers with ReLU activation functions. The output layer has one unit per possible class label; a softmax converts the raw scores to probabilities at inference time, while during training the raw scores are passed directly to the cross-entropy loss, which applies softmax internally. Here's the code:

    class NDTClassifier(torch.nn.Module):
        def __init__(self, input_dim, num_classes):
            super().__init__()
            self.fc1 = torch.nn.Linear(input_dim, 128)
            self.relu1 = torch.nn.ReLU()
            self.fc2 = torch.nn.Linear(128, 64)
            self.relu2 = torch.nn.ReLU()
            self.fc3 = torch.nn.Linear(64, 32)
            self.relu3 = torch.nn.ReLU()
            self.fc4 = torch.nn.Linear(32, num_classes)
            self.softmax = torch.nn.Softmax(dim=-1)

        def forward(self, x):
            x = self.relu1(self.fc1(x))
            x = self.relu2(self.fc2(x))
            x = self.relu3(self.fc3(x))
            # Return raw scores; CrossEntropyLoss applies softmax internally.
            # Use self.softmax(scores) to obtain probabilities at inference time.
            return self.fc4(x)

    ndt_classifier = NDTClassifier(784, 10).to(device)
    optimizer = torch.optim.Adam(ndt_classifier.parameters(), lr=0.001)
    criterion = torch.nn.CrossEntropyLoss()

    epochs = 10
    batch_size = 32
    for epoch in range(epochs):
        permutation = torch.randperm(x.size(0), device=device)
        correct_count = 0
        for i in range(0, x.size(0), batch_size):
            indices = permutation[i:i+batch_size]
            inputs, labels = x[indices], y[indices]

            optimizer.zero_grad()
            outputs = ndt_classifier(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # Update statistics
            pred_labels = torch.argmax(outputs, dim=-1)
            correct_count += (pred_labels == labels).sum().item()

        # Print stats after each epoch
        acc = correct_count / x.size(0)
        print("Epoch", epoch+1, ": Loss =", round(loss.item(), 4),
              ", Accuracy =", round(acc, 4))

This code implements a custom NDT classifier using PyTorch. The model consists of four fully connected layers with ReLU activations, keeps a softmax module for turning scores into probabilities, and uses cross-entropy loss as the training objective. It trains for ten epochs with mini-batch updates (via Adam) and, after each epoch, reports the loss of the last batch together with the accuracy accumulated over the training set.
