Machine Learning--SVMs
Basic introduction
linearly separable : the case where a straight line can separate the two classes of data
hyperplane : the decision boundary, generally N-1 dimensional
margin : how far the closest points are from the hyperplane; for a single point it is computed as label * (WᵀX + b)
support vectors : the point(s) closest to the hyperplane
An SVM looks for the hyperplane with the largest margin (the standard formulation is written out just below).
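In symbols, finding the maximum-margin hyperplane is usually stated as the following optimization problem (the standard formulation, given here for reference, with labelᵢ ∈ {-1, +1}):

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^{2} \quad \text{s.t.}\quad \text{label}_i\,(w^{\top}x_i + b) \ge 1,\ \ i = 1,\dots,m
$$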

First find the n points closest to the hyperplane, compute their margins, and then maximize the sum of those margins.
To simplify the computation, Lagrange multipliers are used in place of that direct formulation:
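The standard dual problem that the Lagrange multipliers lead to is:

$$
\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\text{label}_i\,\text{label}_j\,\alpha_i\,\alpha_j\,\langle x_i, x_j\rangle
\qquad \text{s.t.}\quad \alpha_i \ge 0,\ \ \sum_{i=1}^{m}\alpha_i\,\text{label}_i = 0
$$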

Note that this assumes all the data are linearly separable. To make the classifier tolerate points that violate the margin, slack variables are introduced, together with a constant C that we choose ourselves:
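With slack variables, only the box constraint on the αs changes (this is the range the code below enforces through clipAlpha and the constant C):

$$
C \ge \alpha_i \ge 0, \qquad \sum_{i=1}^{m}\alpha_i\,\text{label}_i = 0
$$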

The goal of training an SVM therefore becomes finding the αs.
Sequential Minimal Optimization
SMO is the optimization algorithm used to find the αs: it repeatedly picks a pair of αs and optimizes the two of them jointly while holding all the others fixed.
simple SMO
```python
# NumPy's wildcard import supplies mat, zeros, multiply, shape, etc.;
# the standard-library random module is enough for selectJrand
from numpy import *
import random


# open the file and parse each line into class labels and a data matrix
def loadDataSet(fileName):
    dataMat = []
    labelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr = line.strip().split('\t')
        dataMat.append([float(lineArr[0]), float(lineArr[1])])
        labelMat.append(float(lineArr[2]))
    return dataMat, labelMat


# return a random alpha index as long as it differs from the current one (i); m is the total number of alphas
def selectJrand(i, m):
    j = i
    while (j == i):
        j = int(random.uniform(0, m))
    return j


# clip aj so that it stays between L and H
def clipAlpha(aj, H, L):
    if aj > H:
        aj = H
    if L > aj:
        aj = L
    return aj


# toler: tolerance; maxIter: max number of full passes without change before quitting
def smoSimple(dataMatIn, classLabels, C, toler, maxIter):
    dataMatrix = mat(dataMatIn)
    # a column matrix
    labelMat = mat(classLabels).transpose()
    b = 0
    m, n = shape(dataMatrix)
    # a column matrix initialized to zero
    alphas = mat(zeros((m, 1)))
    # each time we go through the dataset without any alphas changing, iter increases by 1
    iter = 0
    while (iter < maxIter):
        # becomes positive whenever a pair of alphas is successfully changed during this pass
        alphaPairsChanged = 0
        for i in range(m):
            # our prediction of the class
            fXi = float(multiply(alphas, labelMat).T * (dataMatrix * dataMatrix[i, :].T)) + b
            # the error between the prediction and the real class
            Ei = fXi - float(labelMat[i])
            # if the error is large enough and alphas[i] (which will be changed later) is in the right range,
            # this alpha can be optimized
            if ((labelMat[i] * Ei < -toler) and (alphas[i] < C)) or ((labelMat[i] * Ei > toler) and (alphas[i] > 0)):
                # randomly select a second alpha (the first alpha is alphas[i])
                j = selectJrand(i, m)
                # calculate the prediction and error as done for alphas[i]
                fXj = float(multiply(alphas, labelMat).T * (dataMatrix * dataMatrix[j, :].T)) + b
                Ej = fXj - float(labelMat[j])
                # save the old values of alphas[i] and alphas[j]
                alphaIold = alphas[i].copy()
                alphaJold = alphas[j].copy()
                # calculate L and H so that alphas[j] stays between 0 and C
                if (labelMat[i] != labelMat[j]):
                    L = max(0, alphas[j] - alphas[i])
                    H = min(C, C + alphas[j] - alphas[i])
                else:
                    L = max(0, alphas[j] + alphas[i] - C)
                    H = min(C, alphas[i] + alphas[j])
                # if L == H, nothing can be changed
                if L == H:
                    print("L == H")
                    continue
                # calculate the optimal amount to change alphas[j]
                eta = 2.0 * dataMatrix[i, :] * dataMatrix[j, :].T - dataMatrix[i, :] * dataMatrix[i, :].T - \
                    dataMatrix[j, :] * dataMatrix[j, :].T
                if eta >= 0:
                    print("eta >= 0")
                    continue
                # calculate the new value of alphas[j] with eta and clip it between L and H
                alphas[j] -= labelMat[j] * (Ei - Ej) / eta
                alphas[j] = clipAlpha(alphas[j], H, L)
                # check whether alphas[j] changed enough
                if (abs(alphas[j] - alphaJold) < 0.00001):
                    print("j not moving enough")
                    continue
                # change alphas[i] by the same amount but in the opposite direction
                alphas[i] += labelMat[j] * labelMat[i] * (alphaJold - alphas[j])
                # set the constant b: b1 corresponds to alphas[i] and b2 to alphas[j]; each is valid if its
                # alpha stays strictly between 0 and C after optimization, otherwise take the average
                b1 = b - Ei - labelMat[i] * (alphas[i] - alphaIold) * dataMatrix[i, :] * dataMatrix[i, :].T - \
                    labelMat[j] * (alphas[j] - alphaJold) * dataMatrix[i, :] * dataMatrix[j, :].T
                b2 = b - Ej - labelMat[i] * (alphas[i] - alphaIold) * dataMatrix[i, :] * dataMatrix[j, :].T - \
                    labelMat[j] * (alphas[j] - alphaJold) * dataMatrix[j, :] * dataMatrix[j, :].T
                if (0 < alphas[i]) and (C > alphas[i]):
                    b = b1
                elif (0 < alphas[j]) and (C > alphas[j]):
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                # reaching here means a pair of alphas was successfully changed
                alphaPairsChanged += 1
                print("iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
        # if no alphas changed during this pass, increase iter; otherwise reset it to 0, so the while loop
        # only exits after maxIter consecutive passes through the dataset with no change
        if (alphaPairsChanged == 0):
            iter += 1
        else:
            iter = 0
        print("iteration number: %d" % iter)
    return b, alphas
```
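A minimal usage sketch of smoSimple, assuming the book's two-feature example file testSet.txt is available (the values C = 0.6, toler = 0.001, maxIter = 40 are just reasonable defaults, not the only choices):

```python
dataArr, labelArr = loadDataSet('testSet.txt')  # assumed data file: two features and a label per line
b, alphas = smoSimple(dataArr, labelArr, 0.6, 0.001, 40)
# the support vectors are the points whose alpha ended up nonzero
for i in range(len(dataArr)):
    if alphas[i] > 0.0:
        print(dataArr[i], labelArr[i])
```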

The code comments are fairly detailed; if you would rather not read the code, follow the flowchart instead. The purple path is the one in which a pair of alphas is fully changed.
The flowchart has one problem: it leaves out the final step that updates iter so the loop can repeat (the third and fourth lines from the end of the code).

full Platt SMO
The difference from the simple SMO lies in how the first alpha is chosen. Previously we simply looped over all of them; here an extra outer loop is added, and Platt's SMO picks the alphas with heuristics.
In practice: the effect is that alphas which cannot be changed are skipped outright, which speeds things up. The way the step size is chosen (the second alpha) also changes.
(figure: wayOfPlattSMO.png, the flow of the full Platt SMO)
```python
# group the variables from smoSimple into one data structure and add an error cache
class optStruct:
    def __init__(self, dataMatIn, classLabels, C, toler):
        self.X = dataMatIn
        self.labelMat = classLabels
        self.C = C
        self.tol = toler
        self.m = shape(dataMatIn)[0]
        self.alphas = mat(zeros((self.m, 1)))
        self.b = 0
        # the first column states whether this E is valid and the second column is the value of Ek
        self.eCache = mat(zeros((self.m, 2)))


# same prediction and error calculation as in smoSimple, but reading from the optStruct fields
def calcEk(oS, k):
    fXk = float(multiply(oS.alphas, oS.labelMat).T * (oS.X * oS.X[k, :].T)) + oS.b
    Ek = fXk - float(oS.labelMat[k])
    return Ek


# i is the index of the first alpha; choose the second alpha j so that the step size |Ei - Ej| is maximal
def selectJ(i, oS, Ei):
    maxK = -1
    maxDeltaE = 0
    Ej = 0
    # mark Ei as valid, meaning it has been calculated
    oS.eCache[i] = [1, Ei]
    # nonzero() returns the indices of the cache entries whose valid flag is set
    validEcacheList = nonzero(oS.eCache[:, 0].A)[0]
    if (len(validEcacheList)) > 1:
        # loop over the valid list and pick the index that maximizes the change in E
        for k in validEcacheList:
            # skip the first alpha itself
            if k == i:
                continue
            Ek = calcEk(oS, k)
            deltaE = abs(Ei - Ek)
            if deltaE > maxDeltaE:
                maxK = k
                maxDeltaE = deltaE
                Ej = Ek
        return maxK, Ej
    else:
        # no valid cache entries yet (first pass): fall back to a random choice
        j = selectJrand(i, oS.m)
        Ej = calcEk(oS, j)
        return j, Ej


# recalculate Ek and put it into the cache
def updateEk(oS, k):
    Ek = calcEk(oS, k)
    oS.eCache[k] = [1, Ek]


# almost the same as the simplified SMO; returns 1 if a pair of alphas was changed, 0 otherwise
def innerL(i, oS):
    Ei = calcEk(oS, i)
    if ((oS.labelMat[i] * Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or \
            ((oS.labelMat[i] * Ei > oS.tol) and (oS.alphas[i] > 0)):
        # the second alpha is no longer chosen at random, but as the one giving the maximum step size
        j, Ej = selectJ(i, oS, Ei)
        alphaIold = oS.alphas[i].copy()
        alphaJold = oS.alphas[j].copy()
        if oS.labelMat[i] != oS.labelMat[j]:
            L = max(0, oS.alphas[j] - oS.alphas[i])
            H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i])
        else:
            L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C)
            H = min(oS.C, oS.alphas[j] + oS.alphas[i])
        if L == H:
            print("L == H")
            return 0
        eta = 2.0 * oS.X[i, :] * oS.X[j, :].T - oS.X[i, :] * oS.X[i, :].T - oS.X[j, :] * oS.X[j, :].T
        if eta >= 0:
            print("eta >= 0")
            return 0
        oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej) / eta
        oS.alphas[j] = clipAlpha(oS.alphas[j], H, L)
        updateEk(oS, j)
        if abs(oS.alphas[j] - alphaJold) < 0.00001:
            print("j not moving enough")
            return 0
        oS.alphas[i] += oS.labelMat[j] * oS.labelMat[i] * (alphaJold - oS.alphas[j])
        updateEk(oS, i)
        b1 = oS.b - Ei - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.X[i, :] * oS.X[i, :].T - \
            oS.labelMat[j] * (oS.alphas[j] - alphaJold) * oS.X[i, :] * oS.X[j, :].T
        b2 = oS.b - Ej - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.X[i, :] * oS.X[j, :].T - \
            oS.labelMat[j] * (oS.alphas[j] - alphaJold) * oS.X[j, :] * oS.X[j, :].T
        if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]):
            oS.b = b1
        elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]):
            oS.b = b2
        else:
            oS.b = (b1 + b2) / 2.0
        return 1
    else:
        return 0


# kTup describes the kernel and will be used later
def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup=('lin', 0)):
    oS = optStruct(mat(dataMatIn), mat(classLabels).transpose(), C, toler)
    iter = 0
    entireSet = True
    alphaPairsChanged = 0
    # iter, alphaPairsChanged and entireSet control the loop: it exits once maxIter is reached, or once an
    # entire pass (over the full set or over the non-bound alphas) changes nothing
    while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):
        alphaPairsChanged = 0
        if entireSet:
            for i in range(oS.m):
                # innerL returns 1 or 0
                alphaPairsChanged += innerL(i, oS)
                print("fullSet, iter: %d i: %d, pairs changed %d" % (iter, i, alphaPairsChanged))
            iter += 1
        else:
            # go over only the non-bound alphas (those strictly between 0 and C)
            nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0]
            for i in nonBoundIs:
                alphaPairsChanged += innerL(i, oS)
                print("non-bound, iter: %d i: %d, pairs changed %d" % (iter, i, alphaPairsChanged))
            iter += 1
        # toggle between the two for loops
        if entireSet:
            entireSet = False
        elif alphaPairsChanged == 0:
            entireSet = True
        print("iteration number: %d" % iter)
    return oS.b, oS.alphas
```

The code mainly moves the shared variables into a class and splits each calculation into its own function. Computationally, the differences are the extra outer loop and the fact that the second alpha is now chosen deliberately (to maximize the step size) rather than at random, which makes things faster.
Classification
First compute w. margin = label * (WᵀX + b); the margin is always positive, because WᵀX + b has the same sign as the label. w itself is built from the αs as w = Σᵢ αᵢ · labelᵢ · xᵢ, which is what calcWs computes:
```python
# only the support vectors contribute to the calculation of w (the other alphas are 0)
def calcWs(alphas, dataArr, classLabels):
    X = mat(dataArr)
    labelMat = mat(classLabels).transpose()
    m, n = shape(X)
    w = zeros((n, 1))
    for i in range(m):
        w += multiply(alphas[i] * labelMat[i], X[i, :].T)
    return w
```
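A quick sanity-check sketch, again assuming the book's testSet.txt data file: train with smoP, compute w with calcWs, and classify a point by the sign of WᵀX + b.

```python
dataArr, labelArr = loadDataSet('testSet.txt')  # assumed data file
b, alphas = smoP(dataArr, labelArr, 0.6, 0.001, 40)
ws = calcWs(alphas, dataArr, labelArr)
datMat = mat(dataArr)
# the sign of w^T x + b is the predicted class; compare it with the true label of the first point
print(datMat[0] * mat(ws) + b, labelArr[0])
```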
What about circular boundaries (data that are not linearly separable)?
In fact we classify by finding a boundary and checking the sign of WᵀX + b. If that boundary is allowed to become a circle, problems like the one below can be solved:

We need to write a kernel; the one used here is the radial basis function.
Since almost every operation in the SVM can be written as an inner product of vectors, simply substituting a kernel function's value for that inner product is known as the kernel trick.
radial basis function (a kernel)
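Written to match the code below (the conventional Gaussian kernel divides by 2σ²; the code divides by σ², with σ = kTup[1]):

$$
k(x, y) = \exp\!\left(\frac{-\lVert x - y\rVert^{2}}{\sigma^{2}}\right)
$$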

In the end the original scalar inner products are replaced by the kernel vector K (one entry per training example). In the full kernel version of the book's code, optStruct precomputes these values into a kernel matrix and calcEk / innerL read from it instead of multiplying the raw data; only kernelTrans itself is shown here.
```python
# kTup[0] tells which type of kernel to use, linear or radial basis function,
# and kTup[1] is the parameter the kernel may need
def kernelTrans(X, A, kTup):
    m, n = shape(X)
    K = mat(zeros((m, 1)))
    if kTup[0] == 'lin':
        K = X * A.T
    # here we use the Gaussian version of the rbf to calculate K
    elif kTup[0] == 'rbf':
        for j in range(m):
            deltaRow = X[j, :] - A
            K[j] = deltaRow * deltaRow.T
        K = exp(K / (-1 * kTup[1] ** 2))
    else:
        raise NameError('Houston we have a problem that the kernel is not recognized.')
    return K
```

test the kernel
```python
def testRbf(k1=1.3):
    # train on testSetRBF.txt to get b, the alphas, the support vector indices and so on
    dataArr, labelArr = loadDataSet('testSetRBF.txt')
    b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, ('rbf', k1))
    datMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    # as before, nonzero() returns the indices of the nonzero alphas, i.e. the support vectors
    svInd = nonzero(alphas.A > 0)[0]
    sVs = datMat[svInd]
    labelSV = labelMat[svInd]
    print("there are %d support vectors" % shape(sVs)[0])
    m, n = shape(datMat)
    errorCount = 0
    for i in range(m):
        # kTup's first value is the kernel type and the second is the parameter the rbf may need, here sigma
        kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1))
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelArr[i]):
            errorCount += 1
    print("the training error rate is: %f" % (float(errorCount) / m))
    # testSetRBF2.txt is the held-out set; it reuses the support vectors, alphas and b computed above
    dataArr, labelArr = loadDataSet('testSetRBF2.txt')
    errorCount = 0
    datMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    m, n = shape(datMat)
    for i in range(m):
        kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1))
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelArr[i]):
            errorCount += 1
    print("the test error rate is: %f" % (float(errorCount) / m))
```

One file is used for training and the other for testing. In the end, tuning sigma (kTup[1], the parameter the Gaussian version of the RBF needs) exposes a trade-off: sometimes there are too many support vectors, while with too few support vectors the error rate rises. The book says an optimal value exists but does not explain how to find it.
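One simple way to see the trade-off is to sweep a few sigma values and watch how the support-vector count and the two error rates move (the specific values below are only an illustrative guess):

```python
# sweep several candidate sigmas; smaller sigma tends to give more support vectors
for k1 in (0.1, 0.5, 1.3, 5.0):
    print("----- k1 = %.1f -----" % k1)
    testRbf(k1)
```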
in Action
At last, the example in action.
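testDigits below relies on a loadImages helper that is not included in these notes. A minimal sketch, assuming the book's kNN-chapter conventions: 32x32 text images read into 1x1024 row vectors by img2vector, file names like 9_45.txt where the part before the underscore is the digit, and, for this binary SVM, the digit 9 labelled -1 and every other digit +1. Adjust if your data files differ.

```python
from os import listdir

# read a 32x32 text image of 0/1 characters into a 1x1024 row vector
def img2vector(filename):
    returnVect = zeros((1, 1024))
    fr = open(filename)
    for i in range(32):
        lineStr = fr.readline()
        for j in range(32):
            returnVect[0, 32 * i + j] = int(lineStr[j])
    return returnVect

# load every image in dirName; the class number is the part of the file name before the underscore
def loadImages(dirName):
    hwLabels = []
    trainingFileList = listdir(dirName)
    m = len(trainingFileList)
    trainingMat = zeros((m, 1024))
    for i in range(m):
        fileNameStr = trainingFileList[i]
        classNumStr = int(fileNameStr.split('.')[0].split('_')[0])
        # binary classification: digit 9 -> -1, anything else -> +1
        hwLabels.append(-1 if classNumStr == 9 else 1)
        trainingMat[i, :] = img2vector('%s/%s' % (dirName, fileNameStr))
    return trainingMat, hwLabels
```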
```python
def testDigits(kTup=('rbf', 10)):
    dataArr, labelArr = loadImages('trainingDigits')
    b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, kTup)
    datMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    svInd = nonzero(alphas.A > 0)[0]
    sVs = datMat[svInd]
    labelSV = labelMat[svInd]
    print("there are %d Support Vectors" % shape(sVs)[0])
    m, n = shape(datMat)
    errorCount = 0
    for i in range(m):
        kernelEval = kernelTrans(sVs, datMat[i, :], kTup)
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelArr[i]):
            errorCount += 1
    print("the training error rate is: %f" % (float(errorCount) / m))
    dataArr, labelArr = loadImages('testDigits')
    errorCount = 0
    datMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    m, n = shape(datMat)
    for i in range(m):
        kernelEval = kernelTrans(sVs, datMat[i, :], kTup)
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelArr[i]):
            errorCount += 1
    print("the test error rate is: %f" % (float(errorCount) / m))
```

The example is the digit-recognition task from the kNN chapter, now solved with an SVM. With the rbf kernel, though, a good parameter is hard to find (as noted above, the number of support vectors is hard to control, so accuracy and generality end up traded against each other); by comparison, the linear kernel is slower here but more accurate.
The SVM algorithm itself rests on mathematical derivation, and building it up is a fairly involved process.
