
Semi-Supervised SVM


In supervised learning, an SVM looks for a separating hyperplane that maximizes the margin between the support vectors on its two sides, i.e. the "maximum margin" idea. In semi-supervised learning, S3VM (semi-supervised SVM) additionally requires the hyperplane to pass through a low-density region of the data. TSVM (Transductive SVM) is the best-known representative of semi-supervised support vector machines. Its core idea is to search for a suitable label assignment for the unlabeled samples such that the resulting separating hyperplane has the largest margin. TSVM solves this iteratively with a local-search strategy: first train an initial SVM on the labeled samples only; then use that learner to assign pseudo-labels to the unlabeled samples, so that every sample now carries a label; retrain the SVM on all of these labeled samples; and then repeatedly look for samples whose labels are likely wrong and adjust their assignments. The overall procedure is as follows:
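
A minimal sketch of this local-search loop is given below, assuming a binary problem with labels +1/-1 and using scikit-learn's SVC. The function name tsvm_fit and the parameters Cl and Cu (penalty weights for labeled and unlabeled samples) are illustrative; they follow the common textbook formulation, in which pairs of unlabeled samples with opposite pseudo-labels are swapped whenever both look mislabeled.

    import numpy as np
    from sklearn.svm import SVC
    
    def tsvm_fit(X_l, y_l, X_u, Cl=1.0, Cu=0.01, kernel='linear'):
        # Step 1: train an initial SVM on the labeled samples only.
        clf = SVC(C=1.0, kernel=kernel)
        clf.fit(X_l, y_l, sample_weight=np.full(len(X_l), Cl))
        # Step 2: use it to assign pseudo-labels (+1/-1) to the unlabeled samples.
        y_u = clf.predict(X_u)
    
        X_all = np.vstack([X_l, X_u])
        w = np.concatenate([np.full(len(X_l), Cl), np.full(len(X_u), Cu)])
    
        while Cu < Cl:
            while True:
                # Step 3: retrain on all samples, unlabeled ones down-weighted by Cu.
                clf.fit(X_all, np.concatenate([y_l, y_u]), sample_weight=w)
                # Hinge slack of each unlabeled point under its current pseudo-label.
                slack = np.maximum(0, 1 - y_u * clf.decision_function(X_u))
                pos = np.flatnonzero(y_u > 0)
                neg = np.flatnonzero(y_u < 0)
                if len(pos) == 0 or len(neg) == 0:
                    break
                # Step 4: take the worst-fitting point on each side of the boundary
                # and swap their pseudo-labels if both look mislabeled.
                i, j = pos[np.argmax(slack[pos])], neg[np.argmax(slack[neg])]
                if slack[i] > 0 and slack[j] > 0 and slack[i] + slack[j] > 2:
                    y_u[i], y_u[j] = y_u[j], y_u[i]
                else:
                    break
            # Gradually increase the influence of the unlabeled samples.
            Cu = min(2 * Cu, Cl)
            w[len(X_l):] = Cu
        return clf, y_u

Given labeled arrays X_l, y_l (labels in {-1, +1}) and an unlabeled array X_u, calling tsvm_fit(X_l, y_l, X_u) returns the final classifier together with the label assignment it found for the unlabeled points.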

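The scikit-learn example below does not implement TSVM itself. Instead it uses LabelSpreading, a graph-based semi-supervised learner, to show how the amount of labeled data affects the learned decision regions on the iris dataset (first two features only), comparing models trained with 30%, 50%, and 100% of the labels against a fully supervised SVC with an RBF kernel:
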
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import datasets
    from sklearn import svm
    from sklearn.semi_supervised import LabelSpreading
    
    rng = np.random.RandomState(0)
    
    iris = datasets.load_iris()
    
    X = iris.data[:, :2]
    y = iris.target
    
    # step size in the mesh
    h = .02
    
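    # randomly mark ~30% and ~50% of the samples as unlabeled; -1 is the
    # "unlabeled" marker expected by scikit-learn's semi-supervised estimators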
    y_30 = np.copy(y)
    y_30[rng.rand(len(y)) < 0.3] = -1
    y_50 = np.copy(y)
    y_50[rng.rand(len(y)) < 0.5] = -1
    # fit the semi-supervised learners and the SVM; the data is not scaled
    # so that the training points can be plotted in the original feature space
    ls30 = (LabelSpreading().fit(X, y_30), y_30)
    ls50 = (LabelSpreading().fit(X, y_50), y_50)
    ls100 = (LabelSpreading().fit(X, y), y)
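    # fully supervised baseline: an SVC with an RBF kernel trained on all labels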
    rbf_svc = (svm.SVC(kernel='rbf', gamma=.5).fit(X, y), y)
    
    # create a mesh to plot in
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
    
    # title for the plots
    titles = ['Label Spreading 30% data',
          'Label Spreading 50% data',
          'Label Spreading 100% data',
          'SVC with rbf kernel']
    
    color_map = {-1: (1, 1, 1), 0: (0, 0, .9), 1: (1, 0, 0), 2: (.8, .6, 0)}
    
    for i, (clf, y_train) in enumerate((ls30, ls50, ls100, rbf_svc)):
        # Plot the decision boundary. For that, we will assign a color to each
        # point in the mesh [x_min, x_max]x[y_min, y_max].
        plt.subplot(2, 2, i + 1)
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    
        # Put the result into a color plot
        Z = Z.reshape(xx.shape)
        plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
        plt.axis('off')
    
        # Plot also the training points
        colors = [color_map[y] for y in y_train]
        plt.scatter(X[:, 0], X[:, 1], c=colors, edgecolors='black')
    
        plt.title(titles[i])
    
    plt.suptitle("Unlabeled points are colored white", y=0.1)
    plt.show()
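
LabelSpreading treats the samples as nodes of a similarity graph and propagates the known labels across that graph, so even the points marked -1 above end up with a predicted class; in the resulting plot, the white training points are the ones whose true labels were withheld.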
