Machine Learning–SVMs

Basic introduction

linearly separable: the case where the two classes of data can be separated by a straight line

hyperplane: the decision boundary, generally (N-1)-dimensional

margin: the distance from the closest points to the hyperplane, margin = label * (wᵀx + b)

support vectors: the point (or points) closest to the hyperplane

The goal of an SVM is to find the hyperplane with the largest margin.

First find the n points closest to the hyperplane, compute their margins, and then maximize that margin, as written out below.
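
Written out (roughly, following the book's derivation), the objective is:

    \arg\max_{w,\,b}\;\Big\{\,\frac{1}{\lVert w\rVert}\;\min_{n}\big[\,y_n\,(w^{T}x_n + b)\,\big]\Big\}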

To simplify the computation, Lagrange multipliers are used to rewrite the optimization above:
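
The resulting dual problem, stated in terms of the multipliers α (roughly, as in the book), is:

    \max_{\alpha}\;\sum_{i=1}^{m}\alpha_i \;-\; \frac{1}{2}\sum_{i,j=1}^{m} y_i\,y_j\,\alpha_i\,\alpha_j\,\langle x_i, x_j\rangle
    \quad\text{s.t.}\quad \alpha_i \ge 0,\;\; \sum_{i=1}^{m}\alpha_i\,y_i = 0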

Note that this assumes all of the data is linearly separable. To make the method more robust, slack variables are introduced; C is a constant we choose.
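
Roughly speaking, the only change the slack variables make to the dual problem is the box constraint on the multipliers:

    0 \le \alpha_i \le C, \qquad \sum_{i=1}^{m}\alpha_i\,y_i = 0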

The goal of training an SVM therefore becomes finding the αs.

Sequential Minimal Optimization

SMO is the optimization algorithm used here to find the αs.

simple SMO

    from numpy import *   # mat, zeros, shape, multiply, exp, sign, nonzero, ...
    import random          # used by selectJrand


    # open the file and parse each line into class labels and data matrix
    def loadDataSet(fileName):
        dataMat = []
        labelMat = []
        fr = open(fileName)
        for line in fr.readlines():
            lineArr = line.strip().split('\t')
            dataMat.append([float(lineArr[0]), float(lineArr[1])])
            labelMat.append(float(lineArr[2]))
        return dataMat, labelMat


    # return a random alpha index as long as it is not the same as the present one (i); m is the total number of alphas
    def selectJrand(i, m):
        j = i
        while (j == i):
            j = int(random.uniform(0, m))
        return j


    # clip aj so that it stays within the range [L, H]
    def clipAlpha(aj, H, L):
        if aj > H:
            aj = H
        if L > aj:
            aj = L
        return aj


    # toler, tolerance; maxIter, max number of iterations before quitting
    def smoSimple(dataMatIn, classLabels, C, toler, maxIter):
        dataMatrix = mat(dataMatIn)
        # a column matrix
        labelMat = mat(classLabels).transpose()
        b = 0
        m, n = shape(dataMatrix)
        # a column matrix initialized to zero
        alphas = mat(zeros((m, 1)))
        # each time we go through the dataset without any alphas changed, iter will increase by 1.
        iter = 0
        while (iter < maxIter):
            # each time a pair of alphas is successfully changed, alphaPairsChanged is incremented
            alphaPairsChanged = 0
            for i in range(m):
                # our prediction of the class
                fXi = float(multiply(alphas, labelMat).T * (dataMatrix*dataMatrix[i, :].T)) + b
                # the error between the prediction and the real class
                Ei = fXi - float(labelMat[i])
                # the error is large enough and alphas[i] (which will be changed later) is in the right range,
                # which means this alpha can be optimized.
                if ((labelMat[i]*Ei < -toler) and (alphas[i] < C)) or ((labelMat[i]*Ei > toler) and (alphas[i] > 0)):
                    # randomly select a second alpha, alphas[j] (the first alpha is alphas[i])
                    j = selectJrand(i, m)
                    # calculate the prediction and error as done for alphas[i]
                    fXj = float(multiply(alphas, labelMat).T * (dataMatrix*dataMatrix[j, :].T)) + b
                    Ej = fXj - float(labelMat[j])
                    # save the old values of alphas[i] and alphas[j].
                    alphaIold = alphas[i].copy()
                    alphaJold = alphas[j].copy()
                    # calculate L and H in order to keep alphas[j] between 0 and C.
                    if (labelMat[i] != labelMat[j]):
                        L = max(0, alphas[j] - alphas[i])
                        H = min(C, C + alphas[j] - alphas[i])
                    else:
                        L = max(0, alphas[j] + alphas[i] - C)
                        H = min(C, alphas[i] + alphas[j])
                    # if L == H, we cannot change anything.
                    if L == H:
                        print("L == H")
                        continue
                    # calculate the optimal amount to change alphas[j]
                    eta = 2.0 * dataMatrix[i, :] * dataMatrix[j, :].T - dataMatrix[i, :] * dataMatrix[i, :].T - \
                          dataMatrix[j, :] * dataMatrix[j, :].T
                    if eta >= 0:
                        print("eta >= 0")
                        continue
                    # calculate the new value of alphas[j] with eta and clip it between L and H.
                    alphas[j] -= labelMat[j] * (Ei - Ej) / eta
                    alphas[j] = clipAlpha(alphas[j], H, L)
                    # check whether alphas[j] has changed by a meaningful amount.
                    if (abs(alphas[j] - alphaJold) < 0.00001):
                        print("j not moving enough")
                        continue
                    # change alphas[i] by the same amount but in the opposite direction.
                    alphas[i] += labelMat[j] * labelMat[i] * (alphaJold - alphas[j])
                    # set the constant b: b1 corresponds to alphas[i] and b2 to alphas[j]; if one of them is
                    # still strictly between 0 and C after the update, use it, otherwise take their average.
                    b1 = b - Ei - labelMat[i] * (alphas[i] - alphaIold) * dataMatrix[i, :]*dataMatrix[i, :].T - \
                         labelMat[j]*(alphas[j] - alphaJold)*dataMatrix[i, :]*dataMatrix[j, :].T
                    b2 = b - Ej - labelMat[i] * (alphas[i] - alphaIold) * dataMatrix[i, :]*dataMatrix[j, :].T - \
                         labelMat[j]*(alphas[j] - alphaJold)*dataMatrix[j, :]*dataMatrix[j, :].T
                    if (0 < alphas[i]) and (C > alphas[i]): b = b1
                    elif (0 < alphas[j]) and (C > alphas[j]): b = b2
                    else: b = (b1 + b2) / 2.0
                    # reaching this point means we have successfully changed a pair of alphas.
                    alphaPairsChanged += 1
                    print("iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
            # check if any alphas have been changed; if not, increment iter, otherwise reset it to 0.
            # we only exit the while loop after going through the whole dataset maxIter times without any change.
            if (alphaPairsChanged == 0): iter += 1
            else: iter = 0
            print("iteration number: %d" % iter)
        return b, alphas
    
    
![](https://ad.itadn.com/c/weblog/blog-img/images/2025-08-16/zYn1PJuUMjwhlAfFG3pI58ERBQck.png)

The comments in the code are fairly detailed; if you do not want to read it, just follow the flowchart. The purple path is the one where a pair of alphas is actually changed.

The flowchart has a small issue: at the end it does not show iter being updated so that the loop repeats (the third and fourth lines from the bottom of the code).
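
A quick usage sketch (the file name testSet.txt and the constants 0.6, 0.001, 40 follow the book's example; any tab-separated file in the loadDataSet format works):

    dataArr, labelArr = loadDataSet('testSet.txt')
    b, alphas = smoSimple(dataArr, labelArr, 0.6, 0.001, 40)
    print(b)
    # the non-zero alphas mark the support vectors
    print(alphas[alphas > 0])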

full Platt SMO

The difference from the simple SMO is in how the first alpha is chosen: before, we just looped over all of them, while here another outer loop is added and Platt's SMO picks alphas using heuristics.

In practice: the effect is that alphas which cannot be changed are skipped directly, which speeds things up. The way the step size (the second alpha) is chosen also changes, as shown in the selectJ function below.

(Missing figure: wayOfPlattSMO.png, a flowchart of the full Platt SMO procedure.)

    # gather the different variables used in smoSimple into one data structure, and add an error cache.
    class optStruct:
        def __init__(self, dataMatIn, classLabels, C, toler):
            self.X = dataMatIn
            self.labelMat = classLabels
            self.C = C
            self.tol = toler
            self.m = shape(dataMatIn)[0]
            self.alphas = mat(zeros((self.m, 1)))
            self.b = 0
            # the first column states whether this E is valid and the second column is the value of Ek
            self.eCache = mat(zeros((self.m, 2)))


    def calcEk(oS, k):
        # replace the variables in smoSimple with optStruct attributes and calculate alphas[k]'s prediction and error.
        fXk = float(multiply(oS.alphas, oS.labelMat).T*(oS.X*oS.X[k, :].T)) + oS.b
        Ek = fXk - float(oS.labelMat[k])
        return Ek


    # i is the index of the first alpha; choose the second alpha j as the one with the largest |Ei - Ek|.
    def selectJ(i, oS, Ei):
        maxK = -1
        maxDeltaE = 0
        Ej = 0
        # mark Ei as valid; valid means it has been calculated.
        oS.eCache[i] = [1, Ei]
        # nonzero() returns the indices of the cache entries that are marked valid (non-zero first column).
        validEcacheList = nonzero(oS.eCache[:, 0].A)[0]
        if (len(validEcacheList)) > 1:
            # loop over the valid list and pick the k that gives the maximum change in E.
            for k in validEcacheList:
                # skip it if we come across i itself.
                if k == i: continue
                Ek = calcEk(oS, k)
                deltaE = abs(Ei - Ek)
                if deltaE > maxDeltaE:
                    maxK = k
                    maxDeltaE = deltaE
                    Ej = Ek
            return maxK, Ej
        else:
            # no valid cache entries yet (first pass): fall back to a random choice.
            j = selectJrand(i, oS.m)
            Ej = calcEk(oS, j)
            return j, Ej


    # calculate Ek and put it into the error cache.
    def updateEk(oS, k):
        Ek = calcEk(oS, k)
        oS.eCache[k] = [1, Ek]


    # almost the same as the simplified SMO; this function returns 1 if a pair of alphas was changed, 0 otherwise.
    def innerL(i, oS):
        Ei = calcEk(oS, i)
        if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ((oS.labelMat[i]*Ei > oS.tol) and (oS.alphas[i] > 0)):
            # j is no longer chosen randomly, but as the index that gives the maximum step size.
            j, Ej = selectJ(i, oS, Ei)
            alphaIold = oS.alphas[i].copy()
            alphaJold = oS.alphas[j].copy()
            if oS.labelMat[i] != oS.labelMat[j]:
                L = max(0, oS.alphas[j] - oS.alphas[i])
                H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i])
            else:
                L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C)
                H = min(oS.C, oS.alphas[j] + oS.alphas[i])
            if L == H:
                print("L == H")
                return 0
            eta = 2.0 * oS.X[i, :] * oS.X[j, :].T - oS.X[i, :] * oS.X[i, :].T - oS.X[j, :] * oS.X[j, :].T
            if eta >= 0:
                print("eta >= 0")
                return 0
            oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej) / eta
            oS.alphas[j] = clipAlpha(oS.alphas[j], H, L)
            updateEk(oS, j)
            if abs(oS.alphas[j] - alphaJold) < 0.00001:
                print("j is not moving enough")
                return 0
            oS.alphas[i] += oS.labelMat[j]*oS.labelMat[i] * (alphaJold - oS.alphas[j])
            updateEk(oS, i)
            b1 = oS.b - Ei - oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.X[i, :]*oS.X[i, :].T - oS.labelMat[j] * \
                 (oS.alphas[j]-alphaJold) * oS.X[i, :] * oS.X[j, :].T
            b2 = oS.b - Ej - oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.X[i, :]*oS.X[j, :].T - oS.labelMat[j] * \
                 (oS.alphas[j]-alphaJold) * oS.X[j, :] * oS.X[j, :].T
            if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]):
                oS.b = b1
            elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]):
                oS.b = b2
            else:
                oS.b = (b1 + b2) / 2.0
            return 1
        else:
            return 0


    # kTup describes the kernel and will be used later.
    def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup=("lin", 0)):
        oS = optStruct(mat(dataMatIn), mat(classLabels).transpose(), C, toler)
        iter = 0
        entireSet = True
        alphaPairsChanged = 0
        # iter, alphaPairsChanged and entireSet together control the loop;
        # unlike smoSimple, the loop is never restarted with continue statements.
        while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):
            alphaPairsChanged = 0
            if entireSet:
                for i in range(oS.m):
                    # alphaPairsChanged will increase by 1 or 0.
                    alphaPairsChanged += innerL(i, oS)
                    print("fullSet, iter: %d i: %d, pairs changed %d" % (iter, i, alphaPairsChanged))
                iter += 1
            else:
                nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0]
                for i in nonBoundIs:
                    alphaPairsChanged += innerL(i, oS)
                    print("non-bound, iter: %d, i: %d, pairs changed %d" % (iter, i, alphaPairsChanged))
                iter += 1
            # toggle between the two for loops.
            if entireSet: entireSet = False
            elif alphaPairsChanged == 0: entireSet = True
            print("iteration number: %d" % iter)
        return oS.b, oS.alphas
    
    
![](https://ad.itadn.com/c/weblog/blog-img/images/2025-08-16/lURDsqTxEro3MCeAgiPJGk1VcwmI.png)

The main change in the code is that the shared variables are pulled into a class and each computation is split out into its own function. Computationally, the differences are the extra outer loop and choosing the second alpha deliberately instead of randomly, which speeds training up.
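
The call looks the same as for smoSimple; a short sketch (same assumed testSet.txt file and constants as before), whose returned b and alphas are what the Classification section below works with:

    dataArr, labelArr = loadDataSet('testSet.txt')
    b, alphas = smoP(dataArr, labelArr, 0.6, 0.001, 40)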

Classification

First compute w. Since margin = label * (wᵀx + b) is always positive for correctly classified points, wᵀx + b has the same sign as the label:

    # only the support vectors contribute to the calculation of w (the other alphas are 0)
    def calcWs(alphas, dataArr, classLabels):
        X = mat(dataArr)
        labelMat = mat(classLabels).transpose()
        m, n = shape(X)
        w = zeros((n, 1))
        for i in range(m):
            w += multiply(alphas[i] * labelMat[i], X[i, :].T)
        return w
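
A small sketch of classifying with the result, assuming dataArr, labelArr, b and alphas come from the smoP run above; the sign of wᵀx + b should match the label:

    ws = calcWs(alphas, dataArr, labelArr)
    datMat = mat(dataArr)
    # a positive value here means class +1, a negative one means class -1
    print(datMat[0] * mat(ws) + b)
    print(labelArr[0])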
    
    

As for circles (not linearly separable)?

In fact classification is done by finding a boundary and checking the sign of wᵀx + b. If that boundary can be a circle instead of a straight line, this kind of classification problem can be solved as well:

We need to write some kernels; the one used here is the radial basis function.

Since almost every operation in an SVM can be written as an inner product of vectors, directly replacing the inner-product value with the return value of a function is called the kernel trick.

radial basis function (a kernel)

In the end the original inner product (a scalar) is replaced by the column vector K, with one entry per training sample.
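
For the Gaussian version of the RBF used below, each entry is K_j = exp(−‖x_j − A‖² / σ²), which is exactly the K = exp(K / (-1*kTup[1]**2)) line in the code, with σ being kTup[1].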

    def kernelTrans(X, A, kTup):
        # kTup[0] tells which type of kernel to use, linear or radial basis function,
        # and kTup[1] is the parameter the kernel may need.
        m, n = shape(X)
        K = mat(zeros((m, 1)))
        if kTup[0] == 'lin': K = X * A.T
        # here we use the Gaussian version of the rbf to calculate K.
        elif kTup[0] == 'rbf':
            for j in range(m):
                deltaRow = X[j, :] - A
                K[j] = deltaRow * deltaRow.T
            K = exp(K / (-1*kTup[1]**2))
        else:
            raise NameError('Houston we have a problem that the kernel is not recognized.')
        return K
    
    
![](https://ad.itadn.com/c/weblog/blog-img/images/2025-08-16/YK7DGu0BT9CpL6QJex2yzFnsSAcZ.png)
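
One caveat: the smoP / innerL / calcEk listed above still use raw inner products, so the kTup passed in by testRbf below is not actually applied there. In the book, optStruct and those functions are modified to use a precomputed kernel matrix; a rough sketch of that change (the name optStructK is mine, the book edits optStruct in place):

    # extend optStruct with a precomputed kernel matrix K and use it
    # everywhere an inner product X[a, :] * X[b, :].T appears.
    class optStructK(optStruct):
        def __init__(self, dataMatIn, classLabels, C, toler, kTup):
            optStruct.__init__(self, dataMatIn, classLabels, C, toler)
            # kernel value between every pair of training samples
            self.K = mat(zeros((self.m, self.m)))
            for i in range(self.m):
                self.K[:, i] = kernelTrans(self.X, self.X[i, :], kTup)

    # calcEk then reads the cached kernel column instead of computing oS.X*oS.X[k, :].T:
    #     fXk = float(multiply(oS.alphas, oS.labelMat).T * oS.K[:, k]) + oS.b
    # and in innerL, eta / b1 / b2 use oS.K[i, j], oS.K[i, i], oS.K[j, j] in place of the raw inner products.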

test the kernel

    def testRbf(k1=1.3):
        # use testSetRBF.txt to train and get b, the alphas, the support vector indexes and so on.
        dataArr, labelArr = loadDataSet('testSetRBF.txt')
        b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, ('rbf', k1))
        datMat = mat(dataArr)
        labelMat = mat(labelArr).transpose()
        # to say it again: nonzero() returns the indices of the non-zero alphas, i.e. the support vectors.
        svInd = nonzero(alphas.A > 0)[0]
        sVs = datMat[svInd]
        labelSV = labelMat[svInd]
        print("there are %d support vectors" % shape(sVs)[0])
        m, n = shape(datMat)
        errorCount = 0
        for i in range(m):
            # kTup's first value is the kernel type, and the second is the parameter the rbf may need, here sigma.
            kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1))
            predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
            if sign(predict) != sign(labelArr[i]): errorCount += 1
        print("the training error rate is: %f" % (float(errorCount) / m))
        # testSetRBF2.txt is classified with the support vectors, alphas and b obtained above.
        dataArr, labelArr = loadDataSet("testSetRBF2.txt")
        errorCount = 0
        datMat = mat(dataArr)
        labelMat = mat(labelArr).transpose()
        m, n = shape(datMat)
        for i in range(m):
            kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1))
            predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
            if sign(predict) != sign(labelArr[i]): errorCount += 1
        print("the test error rate is: %f" % (float(errorCount) / m))
    
    
![](https://ad.itadn.com/c/weblog/blog-img/images/2025-08-16/3JuCwQ5qMmZRcOHVjFiD0gdaoXBK.png)

One file is used for training and the other for testing. It turns out that when tuning sigma (kTup[1], the parameter needed by the Gaussian version of the RBF), sometimes there are too many support vectors, while with too few support vectors the error rate goes up again. The book mentions there is an optimal value but does not say how to find it.
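
A pragmatic way to look for it is simply to sweep k1 and compare the printed support-vector counts and error rates; a minimal sketch (the candidate values are arbitrary):

    for k1 in [0.1, 0.5, 1.0, 1.3, 2.0, 5.0]:
        print("--- k1 = %.1f ---" % k1)
        testRbf(k1)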

in Action

Finally we get to the "in action" part.

    def testDigits(kTup=('rbf', 10)):
        # train on the digit images, keeping only the support vectors for classification.
        dataArr, labelArr = loadImages('trainingDigits')
        b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, kTup)
        datMat = mat(dataArr); labelMat = mat(labelArr).transpose()
        svInd = nonzero(alphas.A > 0)[0]
        sVs = datMat[svInd]
        labelSV = labelMat[svInd]
        print("there are %d Support Vectors" % shape(sVs)[0])
        m, n = shape(datMat)
        errorCount = 0
        for i in range(m):
            kernelEval = kernelTrans(sVs, datMat[i, :], kTup)
            predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
            if sign(predict) != sign(labelArr[i]): errorCount += 1
        print("the training error rate is: %f" % (float(errorCount)/m))
        # evaluate on the held-out digit images.
        dataArr, labelArr = loadImages('testDigits')
        errorCount = 0
        datMat = mat(dataArr); labelMat = mat(labelArr).transpose()
        m, n = shape(datMat)
        for i in range(m):
            kernelEval = kernelTrans(sVs, datMat[i, :], kTup)
            predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
            if sign(predict) != sign(labelArr[i]): errorCount += 1
        print("the test error rate is: %f" % (float(errorCount)/m))
    
    
![](https://ad.itadn.com/c/weblog/blog-img/images/2025-08-16/8UHw0ATkjcmaMbvEo5Ch4VFGtle2.png)

The example is the digit-recognition task from the kNN chapter, but done with an SVM. It turns out that with the rbf kernel a good parameter is not easy to find (as noted above, the number of support vectors is hard to control, so you trade accuracy against generality); by comparison, the linear kernel, though slower, gives high accuracy.
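
loadImages is not shown in this post. A minimal sketch of what it presumably looks like, assuming the 32x32 text-digit files from the kNN chapter and, as in the book's example, mapping the digit 9 to the -1 class and every other digit to +1:

    from os import listdir

    # each file is a 32x32 grid of '0'/'1' characters named like "9_45.txt";
    # the leading digit is the class, turned into a binary label for the SVM.
    def loadImages(dirName):
        hwLabels = []
        fileList = listdir(dirName)
        m = len(fileList)
        trainingMat = zeros((m, 1024))
        for i in range(m):
            fileNameStr = fileList[i]
            classNum = int(fileNameStr.split('.')[0].split('_')[0])
            hwLabels.append(-1 if classNum == 9 else 1)
            fr = open('%s/%s' % (dirName, fileNameStr))
            lines = fr.readlines()
            fr.close()
            # flatten the 32x32 grid into a 1x1024 row vector
            for r in range(32):
                for c in range(32):
                    trainingMat[i, 32 * r + c] = int(lines[r][c])
        return trainingMat, hwLabels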

The SVM algorithm itself involves a fair amount of mathematical derivation, and building it up is quite an involved process.
