What is the SVM algorithm?
Table of Contents

* Preface
* Comparison
  * Logistic Regression
  * Decision tree
  * SVM
* Analysis
  * Logistic Regression Analysis
    * Advantages
    * Disadvantages
  * Decision Tree Analysis
    * Advantages
    * Disadvantages
  * SVM Analysis
    * Advantages
    * Disadvantages
  * Advice
* Mathematical principle of SVM
  * Target Function
  * Lagrange duality
  * Simplify
  * Example
  * Soft interval
  * Kernel function
* Coding
  * Data type
  * Core
  * Forecast
  * All code
  * Get data
* Real case
Preface
With the aim of improving my English and preparing for the graduate entrance exam, I will be writing my blog posts in English. Although I have already produced a substantial amount of Chinese content, I have to translate it into English myself, which is why my blog updates will be slower. Nonetheless, given enough time, I may also publish a corresponding Chinese version.
Okay, let's get into today's blog. Welcome to my channel!
Before you read this post, I hope you have already picked up some machine learning basics; it is not suitable for complete beginners.
Targets:
- The mathematical principle of the SVM algorithm
- How to code it
- Real case
No coding, no future. Let's go, guys!
Comparison
When we talk about the SVM algorithm, we should also mention other machine learning algorithms such as decision trees and logistic regression, because all of them can classify objects. So why choose SVM over the others? What makes it different?
Logistic Regression
This algorithm is easy to learn and apply, and it gives you a probabilistic assessment of the classification.

When the object instances (x₁, x₂, label) in our decision space are distributed like this:

you will find it difficult to achieve good performance. No matter how hard you try, the decision boundary obtained by logistic regression is inherently linear, so it fundamentally cannot produce the circular boundary required here. Logistic regression is therefore suited to classification problems that are (approximately) linearly separable.
Decision tree
If we use this algorithm, you will see this:

and then you will get this:

As a decision tree keeps growing, its decision boundary becomes a set of axis-parallel segments. Consequently, when the boundary is nonlinear but can be approximated by recursively partitioning the feature space into rectangular regions, a decision tree is more advantageous than logistic regression.
SVM
Although the data lies in a two-dimensional feature space, we can use kernel functions to map it into a higher-dimensional feature space, where classification becomes easier.
so maybe you will see this:

and then you can get this:

Note: the decision boundary is not a perfect circle, but it stays extremely close to one (it usually ends up as a polygon or a similar shape). This is done instead of fitting an actual ring, which keeps the computation simple.
Now we can briefly analyze which algorithm suits which scenario.
Analysis
Logistic Regression Analysis
A convenient and useful property of logistic regression is that its output is not a discrete value or a hard category. Instead, you get a probability score for each observation sample. You can analyze this score with different criteria and common performance metrics, pick a threshold, and then categorize the output in whatever way best suits your business problem. In the financial industry, this technique is widely used in scorecards: with the same model, you can adjust the threshold to obtain different classification results. Few other algorithms expose such a score directly; most only output a hard classification. At the same time, logistic regression is quite efficient in terms of time and memory.
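As a concrete illustration of this threshold tuning, here is a minimal sketch with scikit-learn; the synthetic dataset and the 0.3 cutoff are only placeholders for your own data and business rule:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data; in practice X and y come from your own dataset.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]   # probability score of the positive class
threshold = 0.3                      # business-specific cutoff instead of the default 0.5
pred = (proba >= threshold).astype(int)
print(pred[:10], proba[:10].round(3))
```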
Furthermore, the logistic regression algorithm is resilient to minor to moderate levels of noise and is not significantly impacted by slight multicollinearity. To address severe multicollinearity, logistic regression combined with L2 regularization can be employed. However, if one seeks a more concise model, L2 regularization is not the optimal choice because the resulting model encompasses all feature variables.
When the feature count is high and a lot of data is missing, logistic regression becomes inadequate. An excessive number of categorical variables is also a challenge for it. Logistic regression bases its predictions on probabilities derived from all of the data, so when drawing a decision boundary it may overlook the 'obvious' data points at the two extremes of the score, even though ideally those boundary points are exactly what it should focus on. If some features have nonlinear relationships, you usually have to transform them, and as the dimensionality of the feature space grows, such adjustments become more complex.
Advantages
Probability scores for the observation samples are easy to obtain.
Efficient implementations are available in existing tools.
Multicollinearity is not really a problem for logistic regression; it can be handled with L2 regularization.
Logistic regression is widely used in industrial problems (this is a very important point).
Disadvantages
When the feature space becomes extremely large, the effectiveness of logistic regression drops significantly.
It struggles to handle a large number of multi-class features or variables.
Nonlinear features must be transformed to be modeled accurately.
The model relies on all of the available data points, which makes it particularly prone to overfitting in such cases.
Decision Tree Analysis
The intrinsic appeal of decision trees is that they are indifferent to monotonic transformations and to nonlinear features (as opposed to nonlinear interactions between predictors): they simply place rectangles in the feature space, which adapt to any monotonic transformation. For discrete data or categorical predictors, decision trees have no trouble with an arbitrary number of categories. The model produced by a decision tree is straightforward and easy to interpret for business purposes. Unlike models that output probability scores directly, decision trees assign class probabilities to their terminal nodes. This leads to their primary limitation: they overfit easily. A decision tree may perform well on the training data, yet predict poorly on the test data. Pruning the tree and using cross-validation are therefore essential to obtain a decision tree model that does not overfit.
The random forest largely overcomes the overfitting drawback and is, above all, an excellent extension of decision trees. However, it sacrifices the easy interpretability of business rules: with thousands of trees and a voting mechanism, the model becomes quite intricate. Also, although decision trees can capture interdependence between variables, this is of little benefit when most variables have no or only weak interactions. On the other hand, this design also makes them less susceptible to multicollinearity.
Advantages
Intuitive decision rules.
Able to handle nonlinear features.
Interactions between variables are taken into account.
Disadvantages
It overfits easily; of course, this can be addressed with random forests.
SVM Analysis
The key feature of support vector machines is that they rely only on the boundary samples to construct the separating curve, and, as we saw, they can handle nonlinear decision boundaries. This reliance on boundary samples also lets them deal with the "obvious" samples even when some data is missing. Support vector machines handle large feature spaces well, which is why they are among the most popular algorithms in text analysis: text data almost always produces a vast number of features, and logistic regression is not a great choice in that setting because the decision boundaries are complex and the space is high-dimensional.
The outputs of support vector machines are less transparent to users than those of decision trees. Moreover, with a nonlinear kernel, training a support vector machine on a large dataset becomes notably slow.
Advantages
The model can handle large feature spaces.
The model can handle interactions between nonlinear features.
It does not need to rely on the entire dataset.
Disadvantages
When there are many observation samples, it is not very efficient.
Sometimes it is difficult to find a suitable kernel function.
Advice
Start with logistic regression; even if its performance is not satisfactory, its results can serve as a baseline. Since logistic regression is the natural first approach in many cases, it usually carries the foundational work of the analysis.
Then investigate whether decision trees (or random forests) can significantly improve performance. Even if it is not the final model, a random forest can be used to remove noise variables.
If the number of features and the number of observation samples are both very large, then, given sufficient resources and time, SVM becomes a viable option.
Don't worry that this sounds difficult; in practice you just need to call a handful of APIs from sklearn.
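For instance, a minimal sklearn sketch (the synthetic circles dataset and the parameter values here are only illustrative placeholders for your own data):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A toy, non-linearly-separable dataset; substitute your own X and y.
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(C=1.0, kernel='rbf', gamma='scale')   # the RBF kernel can handle the circular boundary
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```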
Mathematical principle of SVM
Target Function
Okay, it is time to explain how SVM works and what the mathematics behind it looks like.
Our goal is simple: we just need to classify objects into two categories.
just like this:
(If we don’t use kernel functions to calculate)

We need to find a linear boundary, i.e. a hyperplane, that maximizes the minimal distance from the nearest points on either side to the plane, so that the two classes are separated as well as possible.
just like this:

We can assume that this hyperplane looks like this.

like this:

By calculating the distance between a point and a straight line, we can actually compute the distance from the point to the hyperplane.

(For example, the distance from a point (x₀, y₀) to a straight line Ax + By + C = 0:)

The distance from the point to the hyperplane:

However, since SVM is a binary classification algorithm, we can stipulate the following:

It means that each input sample carries a label that marks which class it belongs to, for example (x₁₁, x₁₂, +1) and (x₂₁, x₂₂, −1). With the labels taken as +1 and −1, the distance from these points to the hyperplane can be written as follows:

We want to determine a hyperplane, parameterized by w and b, such that the nearest point is as far from it as possible. The argmax operation picks the values of w and b that maximize the distance from the nearest point to the hyperplane.
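The formulas in this part were images in the original post; a standard reconstruction of this step (assuming the usual convention yᵢ ∈ {−1, +1}) is:

$$
\begin{aligned}
&\text{hyperplane: } w^{T}x + b = 0,\qquad
\text{distance of a point to the line } Ax+By+C=0:\ \frac{|Ax_0+By_0+C|}{\sqrt{A^2+B^2}},\\[4pt]
&\text{distance of } x_i \text{ to the hyperplane: } \frac{|w^{T}x_i+b|}{\lVert w\rVert}=\frac{y_i\,(w^{T}x_i+b)}{\lVert w\rVert},\qquad
\text{objective: } \arg\max_{w,b}\ \Big\{\min_i \frac{y_i\,(w^{T}x_i+b)}{\lVert w\rVert}\Big\}.
\end{aligned}
$$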

tips:

To simplify the computation, we make the following assumption; note that rescaling w and b does not change the distance from a point to the plane, so:

So, we can reduce the distance formula to:

To make the calculation easier, we turn the maximization into a minimization, and for convenience when taking derivatives we square the norm.

so we need to do this:
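Again, the original showed this as an image; the standard primal problem this step arrives at is:

$$
\min_{w,b}\ \frac{1}{2}\lVert w\rVert^{2}
\qquad\text{s.t.}\qquad y_i\,(w^{T}x_i+b)\ \ge\ 1,\quad i=1,\dots,m.
$$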

Lagrange duality
To solve this constrained problem, we need to use the Lagrangian:


and then we get this:

but if we use this, the function will be:

So we transform it into the dual problem.


Then we can take the partial derivatives.
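The corresponding equations (a standard reconstruction, since the originals were images):

$$
L(w,b,\alpha)=\frac{1}{2}\lVert w\rVert^{2}-\sum_{i=1}^{m}\alpha_i\big[y_i\,(w^{T}x_i+b)-1\big],\qquad \alpha_i\ge 0,
$$

$$
\min_{w,b}\max_{\alpha}L(w,b,\alpha)\ \Longrightarrow\ \max_{\alpha}\min_{w,b}L(w,b,\alpha),
$$

$$
\frac{\partial L}{\partial w}=0\ \Rightarrow\ w=\sum_{i=1}^{m}\alpha_i y_i x_i,\qquad
\frac{\partial L}{\partial b}=0\ \Rightarrow\ \sum_{i=1}^{m}\alpha_i y_i=0.
$$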

Simplify

Finally, we get this:

and then we make it

extend this:

(To make the equation clearer, we changed it a little bit.)

To:

with these conditions:

Likewise, for the purpose of simplifying the operation process, we convert the maximum value into the minimum one.
We just need to add a minus sign.
Finally, we got all the values we needed.
Substitute these values into the partial derivative formula derived earlier, and solve for w and b.
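For reference, a standard reconstruction of the resulting dual problem and of the recovered w and b (the original equations were images):

$$
\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\,y_i y_j\,(x_i\cdot x_j)-\sum_{i=1}^{m}\alpha_i
\qquad\text{s.t.}\qquad \sum_{i=1}^{m}\alpha_i y_i=0,\ \ \alpha_i\ge 0,
$$

$$
w=\sum_{i=1}^{m}\alpha_i y_i x_i,\qquad
b=y_j-\sum_{i=1}^{m}\alpha_i y_i\,(x_i\cdot x_j)\quad\text{for any support vector } x_j\ (\alpha_j>0).
$$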

Example
The tutorial video comes from Bilibili, video ID BV15A4y1X7K1,
and the mathematical derivation comes from here:
https://zhuanlan.zhihu.com/p/270298485




Soft interval
If we strictly insist on the margin we calculated for every sample, the model is easily thrown off by outliers.

so we make

to

and then the target function becomes this:


Finally, the result is:
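The standard soft-margin formulation that this corresponds to (the original formulas were images):

$$
\min_{w,b,\xi}\ \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{m}\xi_i
\qquad\text{s.t.}\qquad y_i\,(w^{T}x_i+b)\ \ge\ 1-\xi_i,\ \ \xi_i\ge 0,
$$

and in the dual problem the only change is that the multipliers become box-constrained: $0 \le \alpha_i \le C$.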

Kernel function
After working through the derivation and the example, you will notice that everything so far is purely linear: the result is not a nonlinear boundary. To obtain one, we need to transform the problem so that the point-to-line distance becomes a distance to a (curved) surface, and this transformation is handled by a kernel function.
such as this:

It is hard to separate these points into two classes with a straight line.
But if the points above can be made to look like this:

In such a scenario, we can use a surface to split them into two parts. To do that, we can use this approach:

def kernelTrans(X, A, kTup):
    m, n = shape(X)
    K = mat(zeros((m, 1)))
    if kTup[0] == 'lin':          # linear kernel: nothing special to do
        K = X * A.T
    elif kTup[0] == 'gaosi':      # 'gaosi' = Gaussian (RBF) kernel
        for j in range(m):
            deltaRow = X[j, :] - A
            K[j] = deltaRow * deltaRow.T
        K = exp(K / (-1 * kTup[1] ** 2))
    else:
        raise NameError("This kernel function is not supported.")
    return K
It is worth mentioning that the video above walks through an example of kernel functions. In our actual code we use a more direct example, with a few extra modifications, but the overall workflow is the same as described here.

After the mapping, the same calculation as before is carried out; the only difference is the extra mapping step.
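In other words (my own summary of the standard kernel trick, since the original showed it as an image): wherever the dual problem contains the inner product $x_i \cdot x_j$, we replace it with a kernel value,

$$
x_i\cdot x_j\ \longrightarrow\ K(x_i,x_j)=\phi(x_i)\cdot\phi(x_j),
\qquad\text{e.g. the Gaussian kernel } K(x_i,x_j)=\exp\!\Big(-\frac{\lVert x_i-x_j\rVert^{2}}{\sigma^{2}}\Big),
$$

which is exactly what the code above computes (note that it divides by σ² rather than the more common 2σ²).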
Because this part is fairly simple, I will not elaborate further; explaining every detail would take too long, and its implementation will be shown in the code anyway. I also want to thank Dr. Tang Yudi for the video mentioned above. Up to this point, we have extracted all of the mathematical framework we need before actually implementing a support vector machine.
Coding
Data type
Okay, the first important thing is to settle on a data type, because your program has to match the data type you choose for the algorithm. Here I define a simple dataset class for our SVM implementation; the datasets themselves are stored in a text file.

class DataSet(object):
    def __init__(self, path):
        self.Features = []
        self.Labels = []
        self.path = path

    def LoadDataSet(self):
        if os.path.exists(self.path):
            with open(self.path) as file:
                for line in file.readlines():
                    # each line of the text file: "feature1 feature2 label"
                    lineArr = [float(x) for x in line.strip().split()]
                    self.Features.append([lineArr[0], lineArr[1]])
                    self.Labels.append(lineArr[2])
            return self.Features, self.Labels
        else:
            raise Exception("No such file: " + self.path)
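A hypothetical usage sketch (the path is a placeholder; the loader above expects one "x1 x2 label" triple per line):

```python
train_data = DataSet(r'Data/svm_train.txt')   # hypothetical path
features, labels = train_data.LoadDataSet()
print(len(features), "samples loaded")
```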
Core
When we start coding, we must know what our core target is.
We have a clear objective: once we feed in our data, the model should produce an output of this form.

We need to know the alpha values for our input data; once we have them, we can write down the form of our decision function.
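Written out explicitly (a standard reconstruction, since the original formula was an image), that function is:

$$
f(x)=\operatorname{sign}\Big(\sum_{i=1}^{m}\alpha_i\,y_i\,K(x_i,x)+b\Big).
$$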

But to get there, we need to make the machine able to do this:

That is, how do we let the machine solve the optimization problem for us?
Here, our central aim is to make the machine solve for the alphas automatically.
The SMO algorithm is introduced in the book Statistical Learning Methods; other heuristic approaches can also be used to solve this kind of problem.
def SMO(self, Features, Labels, C, toler, maxIter, Ktype=('lin', 0)):
    self.__SMO_init(mat(Features), mat(Labels).transpose(), C, toler, Ktype)
    iter = 0
    entireSet = True
    alphaPairsChanged = 0
    while (iter < maxIter) and ((alphaPairsChanged > 0) or entireSet):
        alphaPairsChanged = 0
        if entireSet:
            for i in range(self.m):  # sweep over the whole dataset
                alphaPairsChanged += self.KKTGoing(i)
                print("fullSet, iter: %d i:%d, pairs changed %d" % (
                    iter, i, alphaPairsChanged))
            iter += 1
        else:
            # sweep only over the non-bound alphas (0 < alpha < C)
            nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
            for i in nonBoundIs:
                alphaPairsChanged += self.KKTGoing(i)
                print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
            iter += 1
        if entireSet:
            entireSet = False
        elif alphaPairsChanged == 0:
            entireSet = True
        print("iteration number: %d" % iter)
    return self.b, self.alphas
Forecast
Once the vector α and the scalar b have been determined, we have the corresponding hyperplane. We then feed in new samples and compute their distance to that hyperplane, which gives us the classification.
def forcast(self, dataSet: DataSet, show=False):
    res = []
    dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
    datMat_forcast = mat(dataArr_forcast)
    m, n = shape(datMat_forcast)
    for i in range(m):  # evaluate each sample against the support vectors
        kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], ('rbf', 1.3))
        predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
        res.append(predict)
    if show:
        print("the result is:", res)
    return res
All code
Now let's show all the code:
from numpy import *
import os


class DataSet(object):
    def __init__(self, path):
        self.Features = []
        self.Labels = []
        self.path = path

    def LoadDataSet(self):
        if os.path.exists(self.path):
            with open(self.path) as file:
                for line in file.readlines():
                    # each line of the text file: "feature1 feature2 label"
                    lineArr = [float(x) for x in line.strip().split()]
                    self.Features.append([lineArr[0], lineArr[1]])
                    self.Labels.append(lineArr[2])
            return self.Features, self.Labels
        else:
            raise Exception("No such file: " + self.path)


class SVMModel(object):
    def __init__(self, Ktype):
        self.Ktype = Ktype

    def __SMO_init(self, Features, Labels, C, toler, Ktype):
        """
        :param Features: feature matrix
        :param Labels: label vector
        :param C: soft-margin penalty
        :param toler: stop threshold
        :param Ktype: kernel type, e.g. ('lin', 0) or ('rbf', 1.3)
        """
        self.X = Features
        self.labelMat = Labels
        self.C = C
        self.tol = toler
        self.m = shape(Features)[0]
        self.alphas = mat(zeros((self.m, 1)))
        self.b = 0
        self.eCache = mat(zeros(shape(Features)))  # error cache: [is_valid, E]
        self.K = mat(zeros((self.m, self.m)))
        self.sVs = None
        self.labelSV = None
        self.svInd = None
        for i in range(self.m):
            self.K[:, i] = self.kernelFunction(self.X, self.X[i, :], Ktype)

    def kernelFunction(self, X, A, Ktype):
        """
        :param X: all samples
        :param A: one sample
        :param Ktype: (type, param)
        :return: kernel column vector
        """
        m, n = shape(X)
        K = mat(zeros((m, 1)))
        if Ktype[0] == 'lin':
            K = X * A.T
        elif Ktype[0] == 'rbf':
            for j in range(m):
                deltaRow = X[j, :] - A
                K[j] = deltaRow * deltaRow.T
            K = exp(K / (-1 * Ktype[1] ** 2))
        else:
            raise NameError('Houston We Have a Problem -- That Kernel is not recognized')
        return K

    def __SelectRand(self, i, m):
        j = i
        while j == i:
            j = int(random.uniform(0, m))
        return j

    def __SelectAj(self, i, oS, Ei):
        maxK = -1
        maxDeltaE = 0
        Ej = 0
        oS.eCache[i] = [1, Ei]
        validEcacheList = nonzero(self.eCache[:, 0].A)[0]  # row indices of the valid entries in the error cache
        if (len(validEcacheList)) > 1:
            for k in validEcacheList:
                if k == i:
                    continue
                Ek = self.__calcEk(k)
                deltaE = abs(Ei - Ek)
                if deltaE > maxDeltaE:
                    maxK = k
                    maxDeltaE = deltaE
                    Ej = Ek
            return maxK, Ej
        else:
            j = self.__SelectRand(i, self.m)
            Ej = self.__calcEk(j)
            return j, Ej

    def __HoldAlpha(self, al, H, L):
        # clip alpha so that L <= al <= H
        if al > H:
            al = H
        elif L > al:
            al = L
        return al

    def __calcEk(self, k):
        fXk = float(multiply(self.alphas, self.labelMat).T * self.K[:, k] + self.b)
        Ek = fXk - float(self.labelMat[k])
        return Ek

    def __updateEk(self, k):
        Ek = self.__calcEk(k)
        self.eCache[k] = [1, Ek]

    def KKTGoing(self, i):
        """
        Refer to Statistical Learning Methods.
        First, check whether alpha_i satisfies the KKT conditions.
        If not, select an alpha_j for optimization
        and update the values of alpha_i, alpha_j and b.
        """
        Ei = self.__calcEk(i)  # compute the error E_i
        if ((self.labelMat[i] * Ei < -self.tol) and (self.alphas[i] < self.C)) or (
                (self.labelMat[i] * Ei > self.tol) and (self.alphas[i] > 0)):
            j, Ej = self.__SelectAj(i, self, Ei)
            alphaIold = self.alphas[i].copy()
            alphaJold = self.alphas[j].copy()
            if self.labelMat[i] != self.labelMat[j]:
                L = max(0, self.alphas[j] - self.alphas[i])
                H = min(self.C, self.C + self.alphas[j] - self.alphas[i])
            else:
                L = max(0, self.alphas[j] + self.alphas[i] - self.C)
                H = min(self.C, self.alphas[j] + self.alphas[i])
            if L == H:
                print("L==H")
                return 0
            eta = 2.0 * self.K[i, j] - self.K[i, i] - self.K[j, j]
            if eta >= 0:
                print("eta>=0")
                return 0
            self.alphas[j] -= self.labelMat[j] * (Ei - Ej) / eta
            self.alphas[j] = self.__HoldAlpha(self.alphas[j], H, L)
            self.__updateEk(j)
            if abs(self.alphas[j] - alphaJold) < self.tol:
                print("j not moving enough")
                return 0
            self.alphas[i] += self.labelMat[j] * self.labelMat[i] * (alphaJold - self.alphas[j])
            self.__updateEk(i)
            b1 = self.b - Ei - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, i] - self.labelMat[j] * (
                    self.alphas[j] - alphaJold) * self.K[i, j]
            b2 = self.b - Ej - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, j] - self.labelMat[j] * (
                    self.alphas[j] - alphaJold) * self.K[j, j]
            if 0 < self.alphas[i] < self.C:
                self.b = b1
            elif 0 < self.alphas[j] < self.C:
                self.b = b2
            else:
                self.b = (b1 + b2) / 2.0
            return 1
        else:
            return 0

    def SMO(self, Features, Labels, C, toler, maxIter, Ktype=('lin', 0)):
        """
        SMO is a heuristic algorithm; I do not go into its specific principle here.
        The code comes from GitHub, and I plan to try a PSO algorithm later.
        """
        self.__SMO_init(mat(Features), mat(Labels).transpose(), C, toler, Ktype)
        iter = 0
        entireSet = True
        alphaPairsChanged = 0
        while (iter < maxIter) and ((alphaPairsChanged > 0) or entireSet):
            alphaPairsChanged = 0
            if entireSet:
                for i in range(self.m):
                    alphaPairsChanged += self.KKTGoing(i)
                    print("fullSet, iter: %d i:%d, pairs changed %d" % (
                        iter, i, alphaPairsChanged))
                iter += 1
            else:
                nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
                for i in nonBoundIs:
                    alphaPairsChanged += self.KKTGoing(i)
                    print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
                iter += 1
            if entireSet:
                entireSet = False
            elif alphaPairsChanged == 0:
                entireSet = True
            print("iteration number: %d" % iter)
        return self.b, self.alphas

    def fit(self, dataSet: DataSet):
        dataArr, labelArr = dataSet.LoadDataSet()
        b, alphas = self.SMO(dataArr, labelArr, 200, 0.0001, 10000, self.Ktype)
        self.b = b
        self.alphas = alphas
        datMat = mat(dataArr)
        labelMat = mat(labelArr).transpose()
        svInd = nonzero(alphas)[0]
        # rows whose alpha is not 0, i.e. the support vectors
        sVs = datMat[svInd]
        labelSV = labelMat[svInd]
        self.sVs = sVs
        self.labelSV = labelSV
        self.svInd = svInd
        print("there are %d Support Vectors" % shape(sVs)[0])
        m, n = shape(datMat)
        errorCount = 0
        for i in range(m):
            kernelEval = self.kernelFunction(sVs, datMat[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
            if sign(predict) != sign(labelArr[i]):  # sign: -1 if x < 0, 0 if x == 0, 1 if x > 0
                errorCount += 1
        print("the training error rate is: %f" % (float(errorCount) / m))

    def save_model(self, path):
        model = {}
        model['b'] = self.b
        model['alphas'] = self.alphas
        model['sVs'] = self.sVs
        model['labelSV'] = self.labelSV
        model['svInd'] = self.svInd
        with open(path, 'w') as file:
            file.write(repr(model))  # write() needs a string, not a dict

    def load_mode(self, path):
        if os.path.exists(path):
            with open(path) as file:
                model = eval(file.read())
            self.b = model['b']
            self.alphas = model['alphas']
            self.sVs = model['sVs']
            self.labelSV = model['labelSV']
            self.svInd = model['svInd']
        else:
            raise Exception("No such file: " + path)

    def predict(self, dataSet: DataSet):
        dataArr_test, labelArr_test = dataSet.LoadDataSet()
        errorCount_test = 0
        datMat_test = mat(dataArr_test)
        m, n = shape(datMat_test)
        for i in range(m):  # measure the error rate on the test data
            kernelEval = self.kernelFunction(self.sVs, datMat_test[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            if sign(predict) != sign(labelArr_test[i]):
                errorCount_test += 1
        print("the test error rate is: %f" % (float(errorCount_test) / m))

    def forcast(self, dataSet: DataSet, show=False):
        res = []
        dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
        datMat_forcast = mat(dataArr_forcast)
        m, n = shape(datMat_forcast)
        for i in range(m):
            kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            res.append(predict)
        if show:
            print("the result is:", res)
        return res


if __name__ == '__main__':
    train_path = r'\Data\svm_train.txt'
    test_path = r'\Data\svm_eval.txt'
    train_data = DataSet(train_path)
    test_data = DataSet(test_path)
    SVM = SVMModel(('rbf', 1.3))
    SVM.fit(train_data)
    SVM.predict(test_data)
Get data
If you want to get my data, you can use this link:
Link: https://pan.baidu.com/s/1rTmao4zkQJiRW9zGcWpXHQ
Extraction code: 6666
The data comes from Mr. Wu Enda (Andrew Ng), so I am fairly confident it is reliable.
Real case
This is a simple example taken from the web.
(from Mr. Wu Enda / Andrew Ng's course exercises)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from scipy.io import loadmat
from sklearn import svm

'''
1. Prepare datasets
'''
mat = loadmat('data/ex6data1.mat')
print(mat.keys())
X = mat['X']
y = mat['y']


def plotData(X, y):
    plt.figure(figsize=(8, 6))
    plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), cmap='rainbow')
    plt.xlabel('x1')
    plt.ylabel('x2')
    pass


def plotBoundary(clf, X):
    '''Plot the decision boundary'''
    x_min, x_max = X[:, 0].min() * 1.2, X[:, 0].max() * 1.1
    y_min, y_max = X[:, 1].min() * 1.1, X[:, 1].max() * 1.1
    # np.linspace(x_min, x_max, 500) has shape (500,); 500 points per axis
    # xx.shape, yy.shape ----> (500, 500), (500, 500)
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500), np.linspace(y_min, y_max, 500))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    # ravel() flattens each grid: xx.ravel().shape ----> (250000,)
    # np.c_ stacks the two flattened grids as columns (they must have equal length),
    # so np.c_[xx.ravel(), yy.ravel()].shape ----> (250000, 2): 250000 grid points to predict
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z)
    # the contour line is the separating boundary
    pass


models = [svm.SVC(C, kernel='linear') for C in [1, 100]]
# SVM models (kernel: a linear kernel here; C: penalty weight, 1 and 100)
# a linear kernel draws a straight decision boundary
clfs = [model.fit(X, y.ravel()) for model in models]  # model.fit: fit the model
score = [model.score(X, y) for model in models]  # [0.9803921568627451, 1.0]


def plot():
    title = ['SVM Decision Boundary with C = {}(Example Dataset 1)'.format(C) for C in [1, 100]]
    for model, title in zip(clfs, title):
        # zip() pairs each fitted model with its title
        plt.figure(figsize=(8, 5))
        plotData(X, y)
        plotBoundary(model, X)  # predict the grid points with the fitted model and draw the boundary
        plt.title(title)
        pass
    pass
# plt.show()

'''
2. SVM with Gaussian Kernels
'''
def gaussKernel(x1, x2, sigma):
    return np.exp(-((x1 - x2) ** 2).sum() / (2 * sigma ** 2))


a = gaussKernel(np.array([1, 2, 1]), np.array([0, 4, -1]), 2.)  # 0.32465246735834974
# print(a)

'''
Example Dataset 2
'''
mat = loadmat('data/ex6data2.mat')
x2 = mat['X']
y2 = mat['y']
plotData(x2, y2)
plt.show()
sigma = 0.1
gamma = np.power(sigma, -2) / 2
'''
The larger gamma is, the smaller sigma is, and the taller and narrower the Gaussian becomes.
The smaller gamma is, the larger sigma is, and the flatter and wider the Gaussian becomes:
a smoother boundary, higher bias, lower variance.
'''
clf = svm.SVC(C=1, kernel='rbf', gamma=gamma)
model = clf.fit(x2, y2.flatten())  # kernel='rbf' means the SVM uses a Gaussian kernel
#
# plotData(x2, y2)
# plotBoundary(model, x2)
# plt.show()

'''
Example Dataset 3
'''
mat3 = loadmat('data/ex6data3.mat')
x3, y3 = mat3['X'], mat3['y']
Xval, yval = mat3['Xval'], mat3['yval']
plotData(x3, y3)
# plt.show()
Cvalues = (0.01, 0.03, 0.1, 0.3, 1., 3., 10., 30.)  # candidate values for the penalty C
sigmavalues = Cvalues  # candidate values for the kernel parameter sigma
best_pair, best_score = (0, 0), 0  # best (C, sigma) pair and its validation score
# search for the best (C, sigma) pair
for C in Cvalues:
    for sigma in sigmavalues:
        gamma = np.power(sigma, -2.) / 2
        model = svm.SVC(C=C, kernel='rbf', gamma=gamma)  # SVM with a Gaussian kernel
        model.fit(x3, y3.flatten())  # fit the model
        this_score = model.score(Xval, yval)  # use the validation set to pick the best parameters
        '''
        For a classifier, model.score returns the mean accuracy on the given data;
        the higher it is, the better the model fits the validation set.
        '''
        # keep the parameters that fit best
        if this_score > best_score:
            best_score = this_score
            best_pair = (C, sigma)
        pass
    pass
print('Best (C, sigma):', best_pair, 'validation accuracy:', best_score)
# Best (C, sigma): (1.0, 0.1)  validation accuracy: 0.965
model = svm.SVC(1, kernel='rbf', gamma=np.power(0.1, -2.) / 2)
# re-declare the SVM with the chosen parameters
model.fit(x3, y3.flatten())
plotData(x3, y3)
plotBoundary(model, x3)
# plt.show()
