Semi-Supervised Learning

- 1. Label Spreading
  - 1.1 Overview
  - 1.2 Detailed Implementation
  - 1.3 Code and Example
    - 1.3.1 Building the Dataset
    - 1.3.2 Semi-Supervised Learning with Label Spreading
- References
When training a machine learning model, fusing the information in labeled and unlabeled samples can substantially improve classification performance. This article walks through the details of Label Spreading alongside its scikit-learn implementation.
1. Label Spreading
1.1 Overview
The paper builds the method on two consistency assumptions, which capture consistency at both the local and the global level:
- 1) nearby points with high similarity (local consistency) are likely to have the same label;
- 2) points on the same underlying structure, such as a cluster or manifold (global consistency), are likely to have the same label.

If we learn from the labeled data alone in a supervised fashion, we can at best satisfy the first assumption, local consistency.
The paper illustrates this with a dataset in which only two samples are labeled, one per class. From the standpoint of local consistency and global consistency, figure (b) of the paper shows the result we would ideally expect. Yet both an SVM and a k-nearest-neighbor classifier (with k = 1) perform poorly there, because they rely only on local consistency.
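As a quick illustration of that failure mode (a hypothetical snippet, not code from the paper), a 1-NN classifier trained on a single labeled point per class of a two-moons dataset labels everything by raw Euclidean proximity and ignores the manifold structure:

    from sklearn import datasets
    from sklearn.neighbors import KNeighborsClassifier
    import numpy as np

    X, y = datasets.make_moons(n_samples=200, noise=0.05, random_state=123)
    # keep exactly one labeled point per class
    labeled_idx = [np.where(y == 0)[0][0], np.where(y == 1)[0][0]]
    knn = KNeighborsClassifier(n_neighbors=1).fit(X[labeled_idx], y[labeled_idx])
    print("1-NN accuracy with two labeled points:", (knn.predict(X) == y).mean())

With only two training points, the decision boundary is simply the perpendicular bisector between them, which typically cuts across both moons.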

1.2 Detailed Implementation
Consider a dataset x_1, x_2, \ldots, x_n to be partitioned into c classes, where the points with i = 1, 2, \ldots, l are labeled and the points with i = l+1, \ldots, n are unlabeled.
1) Construct the matrix Y_{n \times c}: Y_{ij} = 1 if x_i belongs to class j, otherwise Y_{ij} = 0;
2) Initialize F_0 = Y;
3) Construct the affinity matrix W_{n \times n} and the diagonal degree matrix D_{n \times n}:
\qquad W_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2) for i \neq j, and W_{ii} = 0;
\qquad D_{ii} = \sum_j W_{ij};
\qquad S = D^{-\frac{1}{2}} W D^{-\frac{1}{2}};
4) Iterate n_iters times:
\qquad F_{t+1} = \alpha S F_t + (1 - \alpha) Y;
5) Output:
\qquad y_i = \arg\max_j F_{ij}.
Note that \alpha, S, and Y stay fixed throughout; each iteration only updates F. The paper also shows that the iteration converges to the closed form F^* = (1 - \alpha)(I - \alpha S)^{-1} Y.
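A minimal NumPy sketch of these steps (the function name and the defaults for sigma and alpha are my own illustrative choices):

    import numpy as np

    def label_spreading(X, y, sigma=0.2, alpha=0.99, n_iters=100):
        """y uses -1 for unlabeled points; returns one hard label per point."""
        y = np.asarray(y)
        classes = np.unique(y[y != -1])
        # Y_{ij} = 1 iff x_i is labeled with class j
        Y = (y[:, None] == classes[None, :]).astype(float)
        # affinity matrix W with zeroed diagonal
        sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        W = np.exp(-sq_dists / (2 * sigma ** 2))
        np.fill_diagonal(W, 0.0)
        # symmetric normalization S = D^{-1/2} W D^{-1/2}
        d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
        S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
        # iterate F_{t+1} = alpha * S * F_t + (1 - alpha) * Y
        F = Y.copy()
        for _ in range(n_iters):
            F = alpha * (S @ F) + (1 - alpha) * Y
        return classes[F.argmax(axis=1)]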
Incidentally, the graph construction carried out before the iteration has much in common with spectral clustering; see the earlier article 一文GET Kmeans、DBSCAN、GMM、谱聚类Spectral clustering 算法 for details.
1.3 Code and Example
The code below is adapted from the official scikit-learn source, sklearn/semi_supervised/_label_propagation.py.
1.3.1 Building the Dataset
Here we build a two-moons dataset in which only a handful of the points are labeled.
    from sklearn.metrics.pairwise import euclidean_distances
    from scipy.sparse import csgraph
    import numpy as np
    from sklearn import datasets
    import matplotlib.pyplot as plt

    np.random.seed(123)
    n_samples = 200
    noisy_moons = datasets.make_moons(n_samples=n_samples, noise=0.05)
    features = noisy_moons[0]
    labels = noisy_moons[1]

    # index -1 (the unlabeled marker) maps to 'darkgrey'
    c_s = ['r', 'b', 'darkgrey']
    rng = np.random.RandomState(42)
    # mask roughly 95% of the points as unlabeled (-1)
    random_unlabeled_points = rng.rand(len(labels)) < 0.95
    labels = np.copy(labels)
    labels[random_unlabeled_points] = -1

    colors = np.array([c_s[item] for item in labels])

    plt.scatter(features[labels == -1][:, 0], features[labels == -1][:, 1],
                c=colors[labels == -1], marker="^", label="unlabeled")
    plt.scatter(features[labels == 0][:, 0], features[labels == 0][:, 1],
                c=colors[labels == 0], marker="o", label="labeled 0")
    plt.scatter(features[labels == 1][:, 0], features[labels == 1][:, 1],
                c=colors[labels == 1], marker="o", label="labeled 1")

    plt.legend(scatterpoints=1, frameon=False,
               labelspacing=1, loc='lower left')
    plt.show()
1.3.2 Semi-Supervised Learning with Label Spreading
    def rbf_kernel(X, Y=None, gamma=None):
        """Pairwise RBF kernel: K_ij = exp(-gamma * ||x_i - y_j||^2)."""
        if gamma is None:
            gamma = 1.0 / X.shape[1]
        K = euclidean_distances(X, Y, squared=True)
        K *= -gamma
        np.exp(K, K)  # exponentiate K in-place
        return K
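This helper mirrors sklearn.metrics.pairwise.rbf_kernel, so a quick sanity check (my own snippet, not part of the original post) is to compare the two:

    from sklearn.metrics.pairwise import rbf_kernel as sk_rbf_kernel

    X_check = np.random.rand(5, 2)
    assert np.allclose(rbf_kernel(X_check, X_check, gamma=20),
                       sk_rbf_kernel(X_check, X_check, gamma=20))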
    class LabelSpreading:

        def __init__(self, kernel="rbf", gamma=20,
                     alpha=0.99, max_iter=20, tol=1e-3):
            self.max_iter = max_iter
            self.tol = tol

            # kernel parameters
            self.kernel = kernel
            self.gamma = gamma

            # clamping factor
            self.alpha = alpha

        def _get_kernel(self, X, y=None):
            if y is None:
                return rbf_kernel(X, X, gamma=self.gamma)
            else:
                return rbf_kernel(X, y, gamma=self.gamma)

        def _build_graph(self):
            """Build the normalized graph matrix S = D^{-1/2} W D^{-1/2}."""
            n_samples = self.X_.shape[0]
            affinity_matrix = self._get_kernel(self.X_)

            # the normalized laplacian is I - S, so negating it and zeroing
            # the diagonal leaves exactly S
            laplacian = csgraph.laplacian(affinity_matrix, normed=True)
            laplacian = -laplacian
            laplacian.flat[::n_samples + 1] = 0.0  # set diagonal to 0.0

            return laplacian

        def predict(self, X):
            probas = self.predict_proba(X)
            return self.classes_[np.argmax(probas, axis=1)].ravel()

        def predict_proba(self, X_2d):
            # inductive step: propagate the fitted label distributions to
            # new points through their kernel weights, then normalize rows
            weight_matrices = self._get_kernel(self.X_, X_2d)
            weight_matrices = weight_matrices.T
            probabilities = weight_matrices @ self.label_distributions_
            normalizer = np.atleast_2d(np.sum(probabilities, axis=1)).T
            probabilities /= normalizer
            return probabilities

        def fit(self, X, y):
            self.X_ = X

            # actual graph construction
            graph_matrix = self._build_graph()

            # label construction: -1 marks unlabeled samples
            classes = np.unique(y)
            classes = classes[classes != -1]
            self.classes_ = classes

            n_samples, n_classes = len(y), len(classes)
            alpha = self.alpha
            y = np.asarray(y)

            # initialize distributions: row i is the one-hot label of x_i
            self.label_distributions_ = np.zeros((n_samples, n_classes))
            for label in classes:
                self.label_distributions_[y == label, classes == label] = 1

            # the static term (1 - alpha) * Y of the update rule
            y_static = np.copy(self.label_distributions_)
            y_static *= 1 - alpha

            l_previous = np.zeros((self.X_.shape[0], n_classes))

            for self.n_iter_ in range(self.max_iter):
                # stop once the distributions change by less than tol
                if np.abs(self.label_distributions_ - l_previous).sum() < self.tol:
                    break

                l_previous = self.label_distributions_
                # F_{t+1} = alpha * S * F_t + (1 - alpha) * Y
                self.label_distributions_ = graph_matrix @ self.label_distributions_
                self.label_distributions_ = (
                    np.multiply(alpha, self.label_distributions_) + y_static
                )

                # record and plot the current transduction at each iteration
                transduction = self.classes_[
                    np.argmax(self.label_distributions_, axis=1)]
                self.transduction_ = transduction.ravel()
                print(self.n_iter_)
                colors = [c_s[item] for item in self.transduction_]
                plt.scatter(self.X_[:, 0], self.X_[:, 1], c=colors)
                plt.show()
            return self
    label_prop_model = LabelSpreading()
    label_prop_model.fit(features, labels)

Label spreading turns out to be remarkably effective:
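To put a number on that (a hypothetical check, not part of the original post), the transductive labels stored in transduction_ can be compared against the ground truth still held in noisy_moons[1]:

    # noisy_moons[1] still holds the true labels that were masked out above
    true_labels = noisy_moons[1]
    accuracy = (label_prop_model.transduction_ == true_labels).mean()
    print(f"transductive accuracy: {accuracy:.3f}")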

Note that the paper recommends setting \alpha = 0.99 for better results.
References
[1] Zhou et al., Learning with Local and Global Consistency, NIPS 2003. (http://www.kernel-machines.org/papers/upload_12169_LLGC_NIPS03.pdf)
[2] sklearn.semi_supervised.LabelSpreading, scikit-learn API documentation. (https://scikit-learn.org/stable/modules/generated/sklearn.semi_supervised.LabelSpreading.html#sklearn.semi_supervised.LabelSpreading)
