What is the SVM algorithm?
Table of Contents

* Preface
* Comparison
  * Logistic Regression
  * Decision tree
  * SVM
* Analysis
  * Logistic Regression Analysis
    * Advantages
    * Disadvantages
  * Decision Tree Analysis
    * Advantages
    * Disadvantages
  * SVM Analysis
    * Advantages
    * Disadvantages
  * Advice
* Mathematical principle of SVM
  * Target Function
  * Lagrange duality
  * Simplify
  * Example
  * Soft interval
  * Kernel function
* Coding
  * Data type
  * Core
  * Forecast
  * All code
  * Get data
* Real case
Preface
With the aim of improving my English and preparing for the graduate entrance exam, I will be writing my blog posts in English. Although I have already produced a substantial amount of Chinese content, I have to translate it into English myself, which is why my blog updates will be slower. Nonetheless, given enough time, I may also publish a corresponding Chinese version.
Okay, let's get into today's blog. Welcome to my channel!
Before you read this post, I hope you have already picked up some machine learning basics; it is not suitable for complete beginners.
Targets:
- The mathematical principle of the SVM algorithm
- How to code it
- Real case
No coding, no future. Let's go, guys!
Comparison
When we talk about the SVM algorithm, we should also mention other machine learning algorithms such as decision trees and logistic regression, because all of them can classify objects. So why choose SVM over the others? What makes it different?
Logistic Regression
This algorithm is easy to learn and apply, and it gives you a probabilistic assessment of the classification.

When the object instances (x₁, x₂, label) in our decision space are distributed like this:

you will find it difficult to achieve good performance. No matter how hard you try, the decision boundary obtained by logistic regression is inherently linear, so it fundamentally cannot produce the circular boundary required here. Logistic regression is therefore suited to classification problems that are (approximately) linearly separable.
Decision tree
If we use this algorithm, you will see this:

and then you will get this:

As a decision tree keeps growing, its decision boundary becomes a set of axis-parallel segments. Consequently, when the boundary is nonlinear but can be approximated by recursively partitioning the feature space into rectangular regions, a decision tree is more advantageous than logistic regression.
SVM
Although the data lies in a two-dimensional feature space, we can use kernel functions to map it into a higher-dimensional feature space, where classification becomes easier.
so maybe you will see this:

and then you can get this:

Note: the decision boundary is not a perfect circle, but it stays extremely close to one (it usually ends up as a polygon or a similar shape). This is done instead of fitting an actual ring, which keeps the computation simple.
Now we can briefly analyze which algorithm suits which scenario.
Analysis
Logistic Regression Analysis
A convenient and useful property of logistic regression is that its output is not a discrete value or a hard category. Instead, you get a probability score for each observation sample. You can analyze this score with different criteria and common performance metrics, pick a threshold, and then categorize the output in whatever way best suits your business problem. In the financial industry, this technique is widely used in scorecards: with the same model, you can adjust the threshold to obtain different classification results. Few other algorithms expose such a score directly; most only output a hard classification. At the same time, logistic regression is quite efficient in terms of time and memory.
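As a concrete illustration of this threshold tuning, here is a minimal sketch with scikit-learn; the synthetic dataset and the 0.3 cutoff are only placeholders for your own data and business rule:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data; in practice X and y come from your own dataset.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]   # probability score of the positive class
threshold = 0.3                      # business-specific cutoff instead of the default 0.5
pred = (proba >= threshold).astype(int)
print(pred[:10], proba[:10].round(3))
```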
Furthermore, the logistic regression algorithm is resilient to minor to moderate levels of noise and is not significantly impacted by slight multicollinearity. To address severe multicollinearity, logistic regression combined with L2 regularization can be employed. However, if one seeks a more concise model, L2 regularization is not the optimal choice because the resulting model encompasses all feature variables.
When the feature count is high and a lot of data is missing, logistic regression becomes inadequate. An excessive number of categorical variables is also a challenge for it. Logistic regression bases its predictions on probabilities derived from all of the data, so when drawing a decision boundary it may overlook the 'obvious' data points at the two extremes of the score, even though ideally those boundary points are exactly what it should focus on. If some features have nonlinear relationships, you usually have to transform them, and as the dimensionality of the feature space grows, such adjustments become more complex.
Advantages
Probability scores for the observation samples are easy to obtain.
Efficient implementations are available in existing tools.
Multicollinearity is not really a problem for logistic regression; it can be handled with L2 regularization.
Logistic regression is widely used in industrial problems (this is a very important point).
Disadvantages
When the feature space becomes extremely large, the effectiveness of logistic regression drops significantly.
It struggles to handle a large number of multi-class features or variables.
Nonlinear features must be transformed to be modeled accurately.
The model relies on all of the available data points, which makes it particularly prone to overfitting in such cases.
Decision Tree Analysis
The intrinsic appeal of decision trees is that they are indifferent to monotonic transformations and to nonlinear features (as opposed to nonlinear interactions between predictors): they simply place rectangles in the feature space, which adapt to any monotonic transformation. For discrete data or categorical predictors, decision trees have no trouble with an arbitrary number of categories. The model produced by a decision tree is straightforward and easy to interpret for business purposes. Unlike models that output probability scores directly, decision trees assign class probabilities to their terminal nodes. This leads to their primary limitation: they overfit easily. A decision tree may perform well on the training data, yet predict poorly on the test data. Pruning the tree and using cross-validation are therefore essential to obtain a decision tree model that does not overfit.
The random forest largely overcomes the overfitting drawback and is, above all, an excellent extension of decision trees. However, it sacrifices the easy interpretability of business rules: with thousands of trees and a voting mechanism, the model becomes quite intricate. Also, although decision trees can capture interdependence between variables, this is of little benefit when most variables have no or only weak interactions. On the other hand, this design also makes them less susceptible to multicollinearity.
Advantages
Intuitive decision rules.
Able to handle nonlinear features.
Interactions between variables are taken into account.
Disadvantages
It overfits easily; of course, this can be addressed with random forests.
SVM Analysis
The key feature of support vector machines is that they rely only on the boundary samples to construct the separating curve, and, as we saw, they can handle nonlinear decision boundaries. This reliance on boundary samples also lets them deal with the "obvious" samples even when some data is missing. Support vector machines handle large feature spaces well, which is why they are among the most popular algorithms in text analysis: text data almost always produces a vast number of features, and logistic regression is not a great choice in that setting because the decision boundaries are complex and the space is high-dimensional.
The outputs of support vector machines are less transparent to users than those of decision trees. Moreover, with a nonlinear kernel, training a support vector machine on a large dataset becomes notably slow.
Advantages
The model can handle large feature spaces.
The model can handle interactions between nonlinear features.
It does not need to rely on the entire dataset.
Disadvantages
When there are many observation samples, it is not very efficient.
Sometimes it is difficult to find a suitable kernel function.
Advice
Start with logistic regression; even if its performance is not satisfactory, its results can serve as a baseline. Since logistic regression is the natural first approach in many cases, it usually carries the foundational work of the analysis.
Then investigate whether decision trees (or random forests) can significantly improve performance. Even if it is not the final model, a random forest can be used to remove noise variables.
If the number of features and the number of observation samples are both very large, then, given sufficient resources and time, SVM becomes a viable option.
Don't worry that this sounds difficult; in practice you just need to call a handful of APIs from sklearn.
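For instance, a minimal sklearn sketch (the synthetic circles dataset and the parameter values here are only illustrative placeholders for your own data):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A toy, non-linearly-separable dataset; substitute your own X and y.
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(C=1.0, kernel='rbf', gamma='scale')   # the RBF kernel can handle the circular boundary
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```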
Mathematical principle of SVM
Target Function
Okay, it is time to explain how SVM works and what the mathematics behind it looks like.
Our goal is simple: we just need to classify objects into two categories.
just like this:
(If we don’t use kernel functions to calculate)

We need to find a linear boundary, i.e. a hyperplane, that maximizes the minimal distance from the nearest points on either side to the plane, so that the two classes are separated as well as possible.
just like this:

We can assume that this hyperplane looks like this.

like this:

By calculating the distance between a point and a straight line, we can actually compute the distance from the point to the hyperplane.

(For example, the distance from a point (x₀, y₀) to a straight line Ax + By + C = 0:)

The distance from the point to the hyperplane:

However, since SVM is a binary classification algorithm, we can stipulate the following:

It means that each input sample carries a label that marks which class it belongs to, for example (x₁₁, x₁₂, +1) and (x₂₁, x₂₂, −1). With the labels taken as +1 and −1, the distance from these points to the hyperplane can be written as follows:

We want to determine a hyperplane, parameterized by w and b, such that the nearest point is as far from it as possible. The argmax operation picks the values of w and b that maximize the distance from the nearest point to the hyperplane.
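The formulas in this part were images in the original post; a standard reconstruction of this step (assuming the usual convention yᵢ ∈ {−1, +1}) is:

$$
\begin{aligned}
&\text{hyperplane: } w^{T}x + b = 0,\qquad
\text{distance of a point to the line } Ax+By+C=0:\ \frac{|Ax_0+By_0+C|}{\sqrt{A^2+B^2}},\\[4pt]
&\text{distance of } x_i \text{ to the hyperplane: } \frac{|w^{T}x_i+b|}{\lVert w\rVert}=\frac{y_i\,(w^{T}x_i+b)}{\lVert w\rVert},\qquad
\text{objective: } \arg\max_{w,b}\ \Big\{\min_i \frac{y_i\,(w^{T}x_i+b)}{\lVert w\rVert}\Big\}.
\end{aligned}
$$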

tips:

To simplify the computation, we make the following assumption; note that rescaling w and b does not change the distance from a point to the plane, so:

So, we can reduce the distance formula to:

To make the calculation easier, we turn the maximization into a minimization, and for convenience when taking derivatives we square the norm.

so we need to do this:
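Again, the original showed this as an image; the standard primal problem this step arrives at is:

$$
\min_{w,b}\ \frac{1}{2}\lVert w\rVert^{2}
\qquad\text{s.t.}\qquad y_i\,(w^{T}x_i+b)\ \ge\ 1,\quad i=1,\dots,m.
$$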

Lagrange duality
To solve this constrained problem, we need to use the Lagrangian:


and then we get this:

but if we use this, the function will be:

So we transform it into the dual problem.


Then we can take the partial derivatives.
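The corresponding equations (a standard reconstruction, since the originals were images):

$$
L(w,b,\alpha)=\frac{1}{2}\lVert w\rVert^{2}-\sum_{i=1}^{m}\alpha_i\big[y_i\,(w^{T}x_i+b)-1\big],\qquad \alpha_i\ge 0,
$$

$$
\min_{w,b}\max_{\alpha}L(w,b,\alpha)\ \Longrightarrow\ \max_{\alpha}\min_{w,b}L(w,b,\alpha),
$$

$$
\frac{\partial L}{\partial w}=0\ \Rightarrow\ w=\sum_{i=1}^{m}\alpha_i y_i x_i,\qquad
\frac{\partial L}{\partial b}=0\ \Rightarrow\ \sum_{i=1}^{m}\alpha_i y_i=0.
$$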

Simplify

Finally, we get this:

and then we make it

extend this:

(To make the equation clearer, we changed it a little bit.)

To:

with these conditions:

Likewise, for the purpose of simplifying the operation process, we convert the maximum value into the minimum one.
We just need to add a minus sign.
Finally, we got all the values we needed.
Substitute these values into the partial derivative formula derived earlier, and solve for w and b.
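For reference, a standard reconstruction of the resulting dual problem and of the recovered w and b (the original equations were images):

$$
\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\,y_i y_j\,(x_i\cdot x_j)-\sum_{i=1}^{m}\alpha_i
\qquad\text{s.t.}\qquad \sum_{i=1}^{m}\alpha_i y_i=0,\ \ \alpha_i\ge 0,
$$

$$
w=\sum_{i=1}^{m}\alpha_i y_i x_i,\qquad
b=y_j-\sum_{i=1}^{m}\alpha_i y_i\,(x_i\cdot x_j)\quad\text{for any support vector } x_j\ (\alpha_j>0).
$$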

Example
The tutorial video comes from Bilibili, video ID BV15A4y1X7K1,
and the mathematical derivation comes from here:
https://zhuanlan.zhihu.com/p/270298485




Soft interval
If we strictly insist on the margin we calculated for every sample, the model is easily thrown off by outliers.

so we make

to

and then the target function becomes this:


Finally, the result is:
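The standard soft-margin formulation that this corresponds to (the original formulas were images):

$$
\min_{w,b,\xi}\ \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{m}\xi_i
\qquad\text{s.t.}\qquad y_i\,(w^{T}x_i+b)\ \ge\ 1-\xi_i,\ \ \xi_i\ge 0,
$$

and in the dual problem the only change is that the multipliers become box-constrained: $0 \le \alpha_i \le C$.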

Kernel function
After working through the derivation and the example, you will notice that everything so far is purely linear: the result is not a nonlinear boundary. To obtain one, we need to transform the problem so that the point-to-line distance becomes a distance to a (curved) surface, and this transformation is handled by a kernel function.
such as this:

It is hard to separate these points into two classes with a straight line.
But if the points above can be made to look like this:

In such a scenario, we can use a surface to split them into two parts. To do that, we can use this approach:

def kernelTrans(X, A, kTup):
    m, n = shape(X)
    K = mat(zeros((m, 1)))
    if kTup[0] == 'lin':          # linear kernel: nothing special to do
        K = X * A.T
    elif kTup[0] == 'gaosi':      # 'gaosi' = Gaussian (RBF) kernel
        for j in range(m):
            deltaRow = X[j, :] - A
            K[j] = deltaRow * deltaRow.T
        K = exp(K / (-1 * kTup[1] ** 2))
    else:
        raise NameError("This kernel function is not supported.")
    return K
It is worth mentioning that the video above walks through an example of kernel functions. In our actual code we use a more direct example, with a few extra modifications, but the overall workflow is the same as described here.

After the mapping, the same calculation as before is carried out; the only difference is the extra mapping step.
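In other words (my own summary of the standard kernel trick, since the original showed it as an image): wherever the dual problem contains the inner product $x_i \cdot x_j$, we replace it with a kernel value,

$$
x_i\cdot x_j\ \longrightarrow\ K(x_i,x_j)=\phi(x_i)\cdot\phi(x_j),
\qquad\text{e.g. the Gaussian kernel } K(x_i,x_j)=\exp\!\Big(-\frac{\lVert x_i-x_j\rVert^{2}}{\sigma^{2}}\Big),
$$

which is exactly what the code above computes (note that it divides by σ² rather than the more common 2σ²).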
Because this part is fairly simple, I will not elaborate further; explaining every detail would take too long, and its implementation will be shown in the code anyway. I also want to thank Dr. Tang Yudi for the video mentioned above. Up to this point, we have extracted all of the mathematical framework we need before actually implementing a support vector machine.
Coding
Data type
Okay, the first important thing is to settle on a data type, because your program has to match the data type you choose for the algorithm. Here I define a simple dataset class for our SVM implementation; the datasets themselves are stored in a text file.

class DataSet(object):
    def __init__(self, path):
        self.Features = []
        self.Labels = []
        self.path = path

    def LoadDataSet(self):
        if os.path.exists(self.path):
            with open(self.path) as file:
                for line in file.readlines():
                    # each line of the text file: "feature1 feature2 label"
                    lineArr = [float(x) for x in line.strip().split()]
                    self.Features.append([lineArr[0], lineArr[1]])
                    self.Labels.append(lineArr[2])
            return self.Features, self.Labels
        else:
            raise Exception("No such file: " + self.path)
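A hypothetical usage sketch (the path is a placeholder; the loader above expects one "x1 x2 label" triple per line):

```python
train_data = DataSet(r'Data/svm_train.txt')   # hypothetical path
features, labels = train_data.LoadDataSet()
print(len(features), "samples loaded")
```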
Core
When we start coding, we must know what our core target is.
We have a clear objective: once we feed in our data, the model should produce an output of this form.

We need to know the alpha values for our input data; once we have them, we can write down the form of our decision function.
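Written out explicitly (a standard reconstruction, since the original formula was an image), that function is:

$$
f(x)=\operatorname{sign}\Big(\sum_{i=1}^{m}\alpha_i\,y_i\,K(x_i,x)+b\Big).
$$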

But to get there, we need to make the machine able to do this:

That is, how do we let the machine solve the optimization problem for us?
Here, our central aim is to make the machine solve for the alphas automatically.
The SMO algorithm is introduced in the book Statistical Learning Methods; other heuristic approaches can also be used to solve this kind of problem.
def SMO(self, Features, Labels, C, toler, maxIter, Ktype=('lin', 0)):
    self.__SMO_init(mat(Features), mat(Labels).transpose(), C, toler, Ktype)
    iter = 0
    entireSet = True
    alphaPairsChanged = 0
    while (iter < maxIter) and ((alphaPairsChanged > 0) or entireSet):
        alphaPairsChanged = 0
        if entireSet:
            for i in range(self.m):  # sweep over the whole dataset
                alphaPairsChanged += self.KKTGoing(i)
                print("fullSet, iter: %d i:%d, pairs changed %d" % (
                    iter, i, alphaPairsChanged))
            iter += 1
        else:
            # sweep only over the non-bound alphas (0 < alpha < C)
            nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
            for i in nonBoundIs:
                alphaPairsChanged += self.KKTGoing(i)
                print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
            iter += 1
        if entireSet:
            entireSet = False
        elif alphaPairsChanged == 0:
            entireSet = True
        print("iteration number: %d" % iter)
    return self.b, self.alphas
Forecast
Once the vector α and the scalar b have been determined, we have the corresponding hyperplane. We then feed in new samples and compute their distance to that hyperplane, which gives us the classification.
def forcast(self, dataSet: DataSet, show=False):
    res = []
    dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
    datMat_forcast = mat(dataArr_forcast)
    m, n = shape(datMat_forcast)
    for i in range(m):  # evaluate each sample against the support vectors
        kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], ('rbf', 1.3))
        predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
        res.append(predict)
    if show:
        print("the result is:", res)
    return res
All code
Now let's show all the code:
from numpy import *
import os


class DataSet(object):
    def __init__(self, path):
        self.Features = []
        self.Labels = []
        self.path = path

    def LoadDataSet(self):
        if os.path.exists(self.path):
            with open(self.path) as file:
                for line in file.readlines():
                    # each line of the text file: "feature1 feature2 label"
                    lineArr = [float(x) for x in line.strip().split()]
                    self.Features.append([lineArr[0], lineArr[1]])
                    self.Labels.append(lineArr[2])
            return self.Features, self.Labels
        else:
            raise Exception("No such file: " + self.path)


class SVMModel(object):
    def __init__(self, Ktype):
        self.Ktype = Ktype

    def __SMO_init(self, Features, Labels, C, toler, Ktype):
        """
        :param Features: feature matrix
        :param Labels: label vector
        :param C: soft-margin penalty
        :param toler: stop threshold
        :param Ktype: kernel type, e.g. ('lin', 0) or ('rbf', 1.3)
        """
        self.X = Features
        self.labelMat = Labels
        self.C = C
        self.tol = toler
        self.m = shape(Features)[0]
        self.alphas = mat(zeros((self.m, 1)))
        self.b = 0
        self.eCache = mat(zeros(shape(Features)))  # error cache: [is_valid, E]
        self.K = mat(zeros((self.m, self.m)))
        self.sVs = None
        self.labelSV = None
        self.svInd = None
        for i in range(self.m):
            self.K[:, i] = self.kernelFunction(self.X, self.X[i, :], Ktype)

    def kernelFunction(self, X, A, Ktype):
        """
        :param X: all samples
        :param A: one sample
        :param Ktype: (type, param)
        :return: kernel column vector
        """
        m, n = shape(X)
        K = mat(zeros((m, 1)))
        if Ktype[0] == 'lin':
            K = X * A.T
        elif Ktype[0] == 'rbf':
            for j in range(m):
                deltaRow = X[j, :] - A
                K[j] = deltaRow * deltaRow.T
            K = exp(K / (-1 * Ktype[1] ** 2))
        else:
            raise NameError('Houston We Have a Problem -- That Kernel is not recognized')
        return K

    def __SelectRand(self, i, m):
        j = i
        while j == i:
            j = int(random.uniform(0, m))
        return j

    def __SelectAj(self, i, oS, Ei):
        maxK = -1
        maxDeltaE = 0
        Ej = 0
        oS.eCache[i] = [1, Ei]
        validEcacheList = nonzero(self.eCache[:, 0].A)[0]  # row indices of the valid entries in the error cache
        if (len(validEcacheList)) > 1:
            for k in validEcacheList:
                if k == i:
                    continue
                Ek = self.__calcEk(k)
                deltaE = abs(Ei - Ek)
                if deltaE > maxDeltaE:
                    maxK = k
                    maxDeltaE = deltaE
                    Ej = Ek
            return maxK, Ej
        else:
            j = self.__SelectRand(i, self.m)
            Ej = self.__calcEk(j)
            return j, Ej

    def __HoldAlpha(self, al, H, L):
        # clip alpha so that L <= al <= H
        if al > H:
            al = H
        elif L > al:
            al = L
        return al

    def __calcEk(self, k):
        fXk = float(multiply(self.alphas, self.labelMat).T * self.K[:, k] + self.b)
        Ek = fXk - float(self.labelMat[k])
        return Ek

    def __updateEk(self, k):
        Ek = self.__calcEk(k)
        self.eCache[k] = [1, Ek]

    def KKTGoing(self, i):
        """
        Refer to Statistical Learning Methods.
        First, check whether alpha_i satisfies the KKT conditions.
        If not, select an alpha_j for optimization
        and update the values of alpha_i, alpha_j and b.
        """
        Ei = self.__calcEk(i)  # compute the error E_i
        if ((self.labelMat[i] * Ei < -self.tol) and (self.alphas[i] < self.C)) or (
                (self.labelMat[i] * Ei > self.tol) and (self.alphas[i] > 0)):
            j, Ej = self.__SelectAj(i, self, Ei)
            alphaIold = self.alphas[i].copy()
            alphaJold = self.alphas[j].copy()
            if self.labelMat[i] != self.labelMat[j]:
                L = max(0, self.alphas[j] - self.alphas[i])
                H = min(self.C, self.C + self.alphas[j] - self.alphas[i])
            else:
                L = max(0, self.alphas[j] + self.alphas[i] - self.C)
                H = min(self.C, self.alphas[j] + self.alphas[i])
            if L == H:
                print("L==H")
                return 0
            eta = 2.0 * self.K[i, j] - self.K[i, i] - self.K[j, j]
            if eta >= 0:
                print("eta>=0")
                return 0
            self.alphas[j] -= self.labelMat[j] * (Ei - Ej) / eta
            self.alphas[j] = self.__HoldAlpha(self.alphas[j], H, L)
            self.__updateEk(j)
            if abs(self.alphas[j] - alphaJold) < self.tol:
                print("j not moving enough")
                return 0
            self.alphas[i] += self.labelMat[j] * self.labelMat[i] * (alphaJold - self.alphas[j])
            self.__updateEk(i)
            b1 = self.b - Ei - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, i] - self.labelMat[j] * (
                    self.alphas[j] - alphaJold) * self.K[i, j]
            b2 = self.b - Ej - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, j] - self.labelMat[j] * (
                    self.alphas[j] - alphaJold) * self.K[j, j]
            if 0 < self.alphas[i] < self.C:
                self.b = b1
            elif 0 < self.alphas[j] < self.C:
                self.b = b2
            else:
                self.b = (b1 + b2) / 2.0
            return 1
        else:
            return 0

    def SMO(self, Features, Labels, C, toler, maxIter, Ktype=('lin', 0)):
        """
        SMO is a heuristic algorithm; I do not go into its specific principle here.
        The code comes from GitHub, and I plan to try a PSO algorithm later.
        """
        self.__SMO_init(mat(Features), mat(Labels).transpose(), C, toler, Ktype)
        iter = 0
        entireSet = True
        alphaPairsChanged = 0
        while (iter < maxIter) and ((alphaPairsChanged > 0) or entireSet):
            alphaPairsChanged = 0
            if entireSet:
                for i in range(self.m):
                    alphaPairsChanged += self.KKTGoing(i)
                    print("fullSet, iter: %d i:%d, pairs changed %d" % (
                        iter, i, alphaPairsChanged))
                iter += 1
            else:
                nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
                for i in nonBoundIs:
                    alphaPairsChanged += self.KKTGoing(i)
                    print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
                iter += 1
            if entireSet:
                entireSet = False
            elif alphaPairsChanged == 0:
                entireSet = True
            print("iteration number: %d" % iter)
        return self.b, self.alphas

    def fit(self, dataSet: DataSet):
        dataArr, labelArr = dataSet.LoadDataSet()
        b, alphas = self.SMO(dataArr, labelArr, 200, 0.0001, 10000, self.Ktype)
        self.b = b
        self.alphas = alphas
        datMat = mat(dataArr)
        labelMat = mat(labelArr).transpose()
        svInd = nonzero(alphas)[0]
        # rows whose alpha is not 0, i.e. the support vectors
        sVs = datMat[svInd]
        labelSV = labelMat[svInd]
        self.sVs = sVs
        self.labelSV = labelSV
        self.svInd = svInd
        print("there are %d Support Vectors" % shape(sVs)[0])
        m, n = shape(datMat)
        errorCount = 0
        for i in range(m):
            kernelEval = self.kernelFunction(sVs, datMat[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
            if sign(predict) != sign(labelArr[i]):  # sign: -1 if x < 0, 0 if x == 0, 1 if x > 0
                errorCount += 1
        print("the training error rate is: %f" % (float(errorCount) / m))

    def save_model(self, path):
        model = {}
        model['b'] = self.b
        model['alphas'] = self.alphas
        model['sVs'] = self.sVs
        model['labelSV'] = self.labelSV
        model['svInd'] = self.svInd
        with open(path, 'w') as file:
            file.write(repr(model))  # write() needs a string, not a dict

    def load_mode(self, path):
        if os.path.exists(path):
            with open(path) as file:
                model = eval(file.read())
            self.b = model['b']
            self.alphas = model['alphas']
            self.sVs = model['sVs']
            self.labelSV = model['labelSV']
            self.svInd = model['svInd']
        else:
            raise Exception("No such file: " + path)

    def predict(self, dataSet: DataSet):
        dataArr_test, labelArr_test = dataSet.LoadDataSet()
        errorCount_test = 0
        datMat_test = mat(dataArr_test)
        m, n = shape(datMat_test)
        for i in range(m):  # measure the error rate on the test data
            kernelEval = self.kernelFunction(self.sVs, datMat_test[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            if sign(predict) != sign(labelArr_test[i]):
                errorCount_test += 1
        print("the test error rate is: %f" % (float(errorCount_test) / m))

    def forcast(self, dataSet: DataSet, show=False):
        res = []
        dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
        datMat_forcast = mat(dataArr_forcast)
        m, n = shape(datMat_forcast)
        for i in range(m):
            kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            res.append(predict)
        if show:
            print("the result is:", res)
        return res


if __name__ == '__main__':
    train_path = r'\Data\svm_train.txt'
    test_path = r'\Data\svm_eval.txt'
    train_data = DataSet(train_path)
    test_data = DataSet(test_path)
    SVM = SVMModel(('rbf', 1.3))
    SVM.fit(train_data)
    SVM.predict(test_data)
Get data
If you want to get my data, you can use this link:
Link: https://pan.baidu.com/s/1rTmao4zkQJiRW9zGcWpXHQ
Extraction code: 6666
The data comes from Mr. Wu Enda (Andrew Ng), so I am fairly confident it is reliable.
Real case
This is a simple example taken from the web.
(from Mr. Wu Enda / Andrew Ng's course exercises)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from scipy.io import loadmat
from sklearn import svm

'''
1. Prepare datasets
'''
mat = loadmat('data/ex6data1.mat')
print(mat.keys())
X = mat['X']
y = mat['y']


def plotData(X, y):
    plt.figure(figsize=(8, 6))
    plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), cmap='rainbow')
    plt.xlabel('x1')
    plt.ylabel('x2')
    pass


def plotBoundary(clf, X):
    '''Plot the decision boundary'''
    x_min, x_max = X[:, 0].min() * 1.2, X[:, 0].max() * 1.1
    y_min, y_max = X[:, 1].min() * 1.1, X[:, 1].max() * 1.1
    # np.linspace(x_min, x_max, 500) has shape (500,); 500 points per axis
    # xx.shape, yy.shape ----> (500, 500), (500, 500)
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500), np.linspace(y_min, y_max, 500))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    # ravel() flattens each grid: xx.ravel().shape ----> (250000,)
    # np.c_ stacks the two flattened grids as columns (they must have equal length),
    # so np.c_[xx.ravel(), yy.ravel()].shape ----> (250000, 2): 250000 grid points to predict
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z)
    # the contour line is the separating boundary
    pass


models = [svm.SVC(C, kernel='linear') for C in [1, 100]]
# SVM models (kernel: a linear kernel here; C: penalty weight, 1 and 100)
# a linear kernel draws a straight decision boundary
clfs = [model.fit(X, y.ravel()) for model in models]  # model.fit: fit the model
score = [model.score(X, y) for model in models]  # [0.9803921568627451, 1.0]


def plot():
    title = ['SVM Decision Boundary with C = {}(Example Dataset 1)'.format(C) for C in [1, 100]]
    for model, title in zip(clfs, title):
        # zip() pairs each fitted model with its title
        plt.figure(figsize=(8, 5))
        plotData(X, y)
        plotBoundary(model, X)  # predict the grid points with the fitted model and draw the boundary
        plt.title(title)
        pass
    pass
# plt.show()

'''
2. SVM with Gaussian Kernels
'''
def gaussKernel(x1, x2, sigma):
    return np.exp(-((x1 - x2) ** 2).sum() / (2 * sigma ** 2))


a = gaussKernel(np.array([1, 2, 1]), np.array([0, 4, -1]), 2.)  # 0.32465246735834974
# print(a)

'''
Example Dataset 2
'''
mat = loadmat('data/ex6data2.mat')
x2 = mat['X']
y2 = mat['y']
plotData(x2, y2)
plt.show()
sigma = 0.1
gamma = np.power(sigma, -2) / 2
'''
The larger gamma is, the smaller sigma is, and the taller and narrower the Gaussian becomes.
The smaller gamma is, the larger sigma is, and the flatter and wider the Gaussian becomes:
a smoother boundary, higher bias, lower variance.
'''
clf = svm.SVC(C=1, kernel='rbf', gamma=gamma)
model = clf.fit(x2, y2.flatten())  # kernel='rbf' means the SVM uses a Gaussian kernel
#
# plotData(x2, y2)
# plotBoundary(model, x2)
# plt.show()

'''
Example Dataset 3
'''
mat3 = loadmat('data/ex6data3.mat')
x3, y3 = mat3['X'], mat3['y']
Xval, yval = mat3['Xval'], mat3['yval']
plotData(x3, y3)
# plt.show()
Cvalues = (0.01, 0.03, 0.1, 0.3, 1., 3., 10., 30.)  # candidate values for the penalty C
sigmavalues = Cvalues  # candidate values for the kernel parameter sigma
best_pair, best_score = (0, 0), 0  # best (C, sigma) pair and its validation score
# search for the best (C, sigma) pair
for C in Cvalues:
    for sigma in sigmavalues:
        gamma = np.power(sigma, -2.) / 2
        model = svm.SVC(C=C, kernel='rbf', gamma=gamma)  # SVM with a Gaussian kernel
        model.fit(x3, y3.flatten())  # fit the model
        this_score = model.score(Xval, yval)  # use the validation set to pick the best parameters
        '''
        For a classifier, model.score returns the mean accuracy on the given data;
        the higher it is, the better the model fits the validation set.
        '''
        # keep the parameters that fit best
        if this_score > best_score:
            best_score = this_score
            best_pair = (C, sigma)
        pass
    pass
print('Best (C, sigma):', best_pair, 'validation accuracy:', best_score)
# Best (C, sigma): (1.0, 0.1)  validation accuracy: 0.965
model = svm.SVC(1, kernel='rbf', gamma=np.power(0.1, -2.) / 2)
# re-declare the SVM with the chosen parameters
model.fit(x3, y3.flatten())
plotData(x3, y3)
plotBoundary(model, x3)
# plt.show()
