
What is the SVM algorithm?


Table of Contents

  * Preface
  * Compare
    * Logistic Regression
    * Decision Tree
    * SVM
  * Analysis
    * Logistic Regression Analysis
    * Decision Tree Analysis
    * SVM Analysis
  * Advice
  * Mathematical principle of SVM
    * Objective Function
    * Lagrange duality
    * Simplify
    * Example
    * Soft Margin
    * Kernel function
  * Coding
    * Data type
    * Core
    * Forecast
    * All code
    * get data
  * Real case

Preface

With the aim of enhancing my English proficiency and preparing for the graduate entrance exam, I will write my blogs in English. Although I have produced a substantial amount of Chinese content, I have to translate it into English myself, which is why my blog updates will be slower. Nonetheless, given additional time, I may also create a corresponding Chinese version.

Okay, let's get to today's blog. Welcome to my channel!

Before you read through this blog, I hope you have already been introduced to some basics about machine learning. It is not suitable for someone who hasn't learned about it.

Targets:

  1. The mathematical principle of the SVM algorithm
  2. How to code it
  3. A real case

No coding, no future. Let's go, guys!

Compare

If we mention the SVM algorithm, we should also mention other algorithms in machine learning like decision trees and logistic regression because all of them can help classify objects. However, why do we choose to use SVM instead of other algorithms? What makes it different?

Logistic Regression

This algorithm is easy to pick up, and it lets you make a probabilistic assessment of each classification.

(figure)

However, when the object instances (x₁, x₂, label) in our decision space are distributed like this:

(figure: data that call for a circular decision boundary)

You will find it difficult to achieve good performance. Regardless of your efforts, the decision boundary obtained by logistic regression is inherently linear, so it is fundamentally unable to produce the circular boundary required here. Therefore, logistic regression is suitable for classification problems that are practically linearly separable.

Decision Tree

If we use this algorithm, you can see this:

(figure)

And then you will get this:

(figure: the axis-parallel boundary produced by the tree)

As the tree keeps growing, its decision boundary is built up from more and more axis-parallel segments. As a result, when a boundary is nonlinear but can be approximated by recursively partitioning the feature space into rectangular regions, a decision tree is more advantageous than logistic regression.

SVM

Although the data live in a two-dimensional feature space, we can use kernel functions to map them into a higher-dimensional feature space where the classification task becomes easier.

So maybe you will see this:

(figure)

and then you can get this:

(figure: the decision boundary found by the SVM)

Note: the decision boundary is not a perfect circle, but it gets extremely close (it is often a polygon or some other shape). Approximating the ring this way, rather than fitting an actual ring, keeps the computation simple.

Now we can briefly analyze which algorithm is suitable for which scenario.

Analysis

Logistic Regression Analysis

A convenient and useful property of logistic regression is that its output is not a discrete value or a hard category. Instead, you get a probability score for every observation sample. You can apply different criteria and the usual performance metrics to this probability score, pick a threshold, and then categorize the output in whatever way best suits your business problem. In the financial industry this technique is widely used in scorecards: with the same model you can adjust the threshold to get different classification results. Few other algorithms provide such a score directly; most of them only output a hard classification. At the same time, logistic regression is quite efficient in terms of time and memory requirements.

Furthermore, the logistic regression algorithm is resilient to minor to moderate levels of noise and is not significantly impacted by slight multicollinearity. To address severe multicollinearity, logistic regression combined with L2 regularization can be employed. However, if one seeks a more concise model, L2 regularization is not the optimal choice because the resulting model encompasses all feature variables.

When the feature count is high or a lot of data is missing, logistic regression starts to fall short. An excessive number of categorical variables also poses a challenge for it. Logistic regression bases its prediction on probabilities derived from all of the data; when drawing the decision boundary it may overlook the 'obvious' data points at both score extremes, whereas ideally it should focus on the boundary points. If certain features have nonlinear relationships, that is usually handled with transformations, and as the dimensionality of the feature space grows such adjustments become more and more complex.
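To make the scorecard idea concrete, here is a minimal sketch of cutting the probability output at a business-specific threshold. It assumes sklearn is available; the toy data and the 0.3 threshold are invented purely for illustration:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # toy data standing in for a real scorecard problem
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    proba = clf.predict_proba(X)[:, 1]   # probability score of the positive class
    threshold = 0.3                      # business-chosen cutoff instead of the default 0.5
    labels = (proba >= threshold).astype(int)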

Advantages
  1. Probability scores for the observation samples are easy to obtain.

  2. Efficient implementations already exist in standard tools.

  3. Multicollinearity is not really a problem for logistic regression and can be handled with L2 regularization.

  4. Logistic regression is very widely used in industrial problems (this is a very important point).

Disadvantages

When the feature space becomes extremely large, the performance of logistic regression falters significantly.

It struggles to handle a large number of multi-class features or variables.

Nonlinear features need to be transformed before they can be modeled accurately.

The model relies on all of the available data points, which makes it relatively prone to overfitting in such cases.

Decision Tree Analysis

The intrinsic characteristic of decision trees lies in their indifference to unidirectional transformations or nonlinear features (which differ from nonlinearities in predictors), as they merely place rectangles within the feature space, which can adapt to any monotonic transformation. When applied to discrete data or categorical predictors, decision trees do not face challenges with an arbitrary number of classification variables. The model developed through a decision tree is straightforward and easy to interpret for business purposes. Unlike some models that output probability scores directly, decision trees utilize class probabilities iteratively assigned to terminal nodes. This leads us to the primary limitation of decision trees: they are highly biased models. While a decision tree may perform well on training data, its application to test data often results in poor predictive performance. Therefore, pruning the tree and incorporating cross-validation are essential steps to develop an unbiased decision tree model without overfitting.

Random forests, an outstanding extension of decision trees, largely avoid this overfitting drawback. They do, however, sacrifice the easy interpretability of business rules: with thousands of trees and their voting mechanism, the model becomes much more intricate. Also, while decision trees can capture interdependence between variables, this is less useful when most variables have little or only weak interaction. On the other hand, this design also makes them less susceptible to multicollinearity.
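As a rough sketch of both points above (pruning chosen by cross-validation, and a random forest used to screen out noise variables), assuming sklearn and using invented toy data:

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

    # pick the pruning strength (ccp_alpha) by cross-validation to limit overfitting
    alphas = [0.0, 0.001, 0.01, 0.05]
    best_alpha = max(alphas, key=lambda a: cross_val_score(
        DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5).mean())
    tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)

    # a random forest's feature importances can be used to drop noise variables
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    keep = forest.feature_importances_.argsort()[::-1][:5]   # indices of the 5 most useful features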

Advantages

Intuitive decision rules.

Can handle nonlinear features.

Interactions between variables are taken into account.

Disadvantages

It overfits easily; of course, this can be mitigated by using random forests.

SVM Analysis

The key feature of support vector machines is that they rely only on the boundary samples to construct the separating curve. As we saw, they can handle nonlinear decision boundaries. Their dependence on just the boundary points also gives them some robustness when data are missing, as long as the 'obvious' samples are present. Support vector machines handle large feature spaces well, which is one reason they are among the most popular algorithms in text analysis: text data almost always produce a huge number of features and complex decision boundaries in a high-dimensional space, where logistic regression is not an ideal choice.

The outputs of support vector machines are less transparent to users than those of decision trees. In addition, with a nonlinear kernel, training a support vector machine on a large data set becomes notably slow.
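To illustrate the text-analysis point, here is a minimal sketch of a linear SVM on high-dimensional TF-IDF features; sklearn is assumed and the four tiny documents are invented:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    docs = ["good movie", "great film", "terrible movie", "awful film"]   # toy corpus
    labels = [1, 1, 0, 0]

    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)          # sparse, high-dimensional feature matrix
    clf = LinearSVC().fit(X, labels)
    print(clf.predict(vec.transform(["great movie"])))   # predicted class for a new snippet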

Advantages

The model can handle large feature spaces.

The model can handle interactions between nonlinear features.

It does not rely on the entire data set.

Disadvantages

When there are many observation samples, training is not very efficient.

Sometimes it is difficult to find a suitable kernel function.

Advice

Start with logistic regression. Even if its performance is not satisfactory, its results can serve as a baseline, and since logistic regression is the natural first approach in many cases, it takes care of the foundational part of the analysis.

Then investigate whether decision trees (or random forests) can significantly improve the model. Even when they are not used as the final model, random forests can be used to eliminate noise variables.

If both the number of features and the number of observation samples are large, then, given enough resources and time, SVM becomes a viable option.

Don't worry that this sounds difficult; in practice you just call the corresponding APIs in sklearn, as sketched below.
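A minimal sketch of that advice, assuming sklearn and an invented toy data set in place of your real X and y, is simply to cross-validate all three families and compare the scores:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=600, n_features=15, random_state=0)

    for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                        ("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
                        ("SVM (RBF kernel)", SVC(kernel='rbf'))]:
        print(name, cross_val_score(model, X, y, cv=5).mean())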

Mathematical principle of SVM

Objective Function

Okay, it's time to explain how SVM works and what the mathematics behind it looks like.

Our goal is simple: we just need to classify objects into two categories.

just like this:

(If we don’t use kernel functions to calculate)

(figure: two classes of points in the plane)

We need to determine a linear boundary, i.e. a hyperplane, that maximizes the minimal distance from the nearest points on either side to the plane, so that the two classes are separated as well as possible.

just like this:

(figure: the maximum-margin separating line)

We can assume that this hyperplane has the form:

$$w^{T}x + b = 0$$

By calculating the distance between a point and a straight line, we can actually compute the distance from the point to the hyperplane.

(figure)

(For example, the distance from a point (x0) to a straight line Ax+By+C=0:)

$$d = \frac{|Ax_0 + By_0 + C|}{\sqrt{A^{2} + B^{2}}}$$

The distance from the point to the hyperplane:

$$d = \frac{|w^{T}x + b|}{\lVert w \rVert}$$

However, since SVM is a binary classification algorithm, we can stipulate the labels like this:

$$y_i \in \{+1,\, -1\}$$

That is, every input sample carries a label, for example (x₁₁, x₁₂, +1) and (x₂₁, x₂₂, −1); the values +1 and −1 mark the two classes. With these labels, the (signed) distance from a sample to the hyperplane can be written as:

$$\frac{y_i\,(w^{T}x_i + b)}{\lVert w \rVert}$$

We want to determine a hyperplane, with parameters w and b, such that the nearest point is as far away from it as possible. The argmax below picks the w and b that maximize the distance from the nearest point to the hyperplane:

$$\arg\max_{w,b}\left\{\frac{1}{\lVert w\rVert}\,\min_{i}\left[y_i\,(w^{T}x_i + b)\right]\right\}$$

Tip: because every sample is classified correctly, $y_i\,(w^{T}x_i + b) > 0$, which is why the absolute value in the distance can be replaced by this product.

To simplify the computation we make the following assumption; note that rescaling w and b does not change the distance from a point to the plane, so we can scale them such that:

$$\min_i\ y_i\,(w^{T}x_i + b) = 1, \qquad \text{i.e.}\quad y_i\,(w^{T}x_i + b) \ge 1$$

So the distance objective reduces to:

$$\arg\max_{w,b}\ \frac{1}{\lVert w\rVert}$$

To make the calculation easier we turn the maximization into a minimization, and for the convenience of taking derivatives we square the norm:

$$\arg\min_{w,b}\ \frac{1}{2}\lVert w\rVert^{2}$$

So the problem we need to solve is:

$$\min_{w,b}\ \frac{1}{2}\lVert w\rVert^{2} \qquad \text{s.t.}\quad y_i\,(w^{T}x_i + b) \ge 1,\quad i = 1,\dots,n$$
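To see that this really is just a small constrained optimization problem, here is a sketch that hands the primal problem to a generic solver. It assumes scipy; the three training points and their ±1 labels are an invented toy set, not data from this article:

    import numpy as np
    from scipy.optimize import minimize

    X = np.array([[3., 3.], [4., 3.], [1., 1.]])   # toy samples
    y = np.array([1., 1., -1.])                    # their labels

    # parameters packed as [w1, w2, b]; the objective is (1/2)||w||^2
    objective = lambda p: 0.5 * np.dot(p[:2], p[:2])
    # one constraint y_i (w^T x_i + b) - 1 >= 0 per sample
    cons = [{'type': 'ineq', 'fun': lambda p, i=i: y[i] * (X[i] @ p[:2] + p[2]) - 1.0}
            for i in range(len(y))]

    res = minimize(objective, x0=np.zeros(3), constraints=cons)
    print(res.x)   # roughly w = (0.5, 0.5), b = -2 for this toy set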

Lagrange duality

To solve this constrained optimization problem we use the method of Lagrange multipliers. Introducing a multiplier $\alpha_i \ge 0$ for every constraint $y_i(w^{T}x_i + b) \ge 1$ gives the Lagrangian:

$$L(w, b, \alpha) = \frac{1}{2}\lVert w\rVert^{2} - \sum_{i=1}^{n} \alpha_i\left[y_i\,(w^{T}x_i + b) - 1\right]$$

With it, the original problem becomes:

$$\min_{w,b}\ \max_{\alpha_i \ge 0}\ L(w, b, \alpha)$$

This min–max form is still hard to handle directly, so we transform it into its dual problem:

$$\max_{\alpha_i \ge 0}\ \min_{w,b}\ L(w, b, \alpha)$$

For the inner minimization we take the partial derivatives with respect to $w$ and $b$ and set them to zero:

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n}\alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n}\alpha_i y_i = 0$$

Simplify

Substituting these two results back into $L(w, b, \alpha)$ eliminates $w$ and $b$ and, after expanding the terms, leaves a function of $\alpha$ alone:

$$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,(x_i\cdot x_j)$$

with these conditions:

$$\alpha_i \ge 0, \qquad \sum_{i=1}^{n}\alpha_i y_i = 0$$

Likewise, to simplify the computation we convert the maximization into a minimization; we just need to add a minus sign:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,(x_i\cdot x_j) - \sum_{i=1}^{n}\alpha_i$$

Finally, once the optimal $\alpha$ is found, we have all the values we need. Substituting them into the partial-derivative results derived earlier lets us solve for $w$ and $b$:

$$w = \sum_{i=1}^{n}\alpha_i y_i x_i, \qquad b = y_k - \sum_{i=1}^{n}\alpha_i y_i\,(x_i \cdot x_k)\ \ \text{for any support vector } x_k$$
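As a sanity check on the derivation, the dual can also be handed to a generic solver and then w and b recovered from the resulting α. This is only a sketch (scipy assumed, same invented three-point toy set as before), not the algorithm this article implements later:

    import numpy as np
    from scipy.optimize import minimize

    X = np.array([[3., 3.], [4., 3.], [1., 1.]])
    y = np.array([1., 1., -1.])
    Q = (y[:, None] * X) @ (y[:, None] * X).T          # Q_ij = y_i y_j (x_i . x_j)

    dual = lambda a: 0.5 * a @ Q @ a - a.sum()         # the (minimized) dual objective
    cons = [{'type': 'eq', 'fun': lambda a: a @ y}]    # sum_i alpha_i y_i = 0
    bounds = [(0, None)] * len(y)                      # alpha_i >= 0 (hard margin)

    alpha = minimize(dual, np.zeros(len(y)), bounds=bounds, constraints=cons).x
    w = (alpha * y) @ X                                # w = sum_i alpha_i y_i x_i
    k = int(np.argmax(alpha))                          # index of one support vector
    b = y[k] - (alpha * y) @ (X @ X[k])                # b = y_k - sum_i alpha_i y_i (x_i . x_k)
    print(alpha, w, b)                                 # roughly (0.25, 0, 0.25), (0.5, 0.5), -2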

Example

The teaching video comes from Bilibili (video ID BV15A4y1X7K1),

and the mathematical principle comes from here:
https://zhuanlan.zhihu.com/p/270298485

(figures: the worked numerical example)

Soft Margin

If we insist that every single sample satisfies the hard constraint, problems arise easily: a few noisy samples or outliers can distort the boundary, like this:

(figure)

So we relax the original constraint

$$y_i\,(w^{T}x_i + b) \ge 1$$

to

$$y_i\,(w^{T}x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0$$

and the objective function then becomes:

$$\min_{w,b,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \qquad \text{s.t.}\quad y_i\,(w^{T}x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$$

Running this through the same Lagrangian derivation as before, the final dual problem is unchanged except that each multiplier is now capped by $C$:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,(x_i\cdot x_j) - \sum_{i=1}^{n}\alpha_i \qquad \text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{n}\alpha_i y_i = 0$$
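A quick way to feel the effect of C is to fit the same data with a small and a large C and compare how many support vectors (points inside or on the margin) each model keeps. This is a sketch assuming sklearn, with invented toy data:

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

    for C in (0.1, 100):
        # a small C buys a wider, softer margin (more support vectors);
        # a large C punishes every violation and pushes towards the hard-margin solution
        clf = SVC(kernel='linear', C=C).fit(X, y)
        print("C =", C, "support vectors:", clf.n_support_.sum())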

Kernel function

After working through the derivation and the example, you will notice that everything we have done so far is purely linear: the boundary we obtain is not a nonlinear one. When the data are not linearly separable, something extra is needed: we map the points into a higher-dimensional space, so that the point-to-line distance effectively becomes a distance to a hyperplane in that space. This transformation is handled by a kernel function.

such as this:

(figure: points that a straight line cannot separate)

It is challenging to divide entities into two distinct categories by means of a straight line.

But if the points above can look like this:

(figure: the same points after being mapped into a higher-dimensional space)

In that scenario we can use a plane to split the points into two parts. To achieve this, we can use a kernel function like the one below:

    def kernelTrans(X, A, kTup):
        m, n = shape(X)
        K = mat(zeros((m, 1)))
        if kTup[0] == 'lin':      # linear kernel: nothing special to do here
            K = X * A.T
        elif kTup[0] == 'gaosi':  # Gaussian (RBF) kernel
            for j in range(m):
                deltaRow = X[j, :] - A
                K[j] = deltaRow * deltaRow.T
            K = exp(K / (-1 * kTup[1] ** 2))
        else:
            raise NameError("This kernel function is not supported.")
        return K

It is worth mentioning that the video presents its own example of kernel functions, while our actual code uses a more direct one, plus some extra modifications; the overall workflow, however, mirrors what is described here.

(figure)

After that, the same calculation as before is carried out; the only difference is the extra mapping step.
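A tiny numeric sketch of what the mapping buys us (numpy only; the two points are invented): for the polynomial map φ(x) = (x₁², √2·x₁x₂, x₂²), the inner product in the mapped 3-D space equals the kernel value (x·z)² computed directly in the original 2-D space, so the mapped vectors never have to be built explicitly:

    import numpy as np

    def phi(p):                      # explicit map into the 3-D feature space
        x1, x2 = p
        return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

    x = np.array([1.0, 2.0])
    z = np.array([3.0, 0.5])

    lhs = phi(x) @ phi(z)            # inner product after mapping
    rhs = (x @ z) ** 2               # polynomial kernel evaluated in the original space
    print(lhs, rhs)                  # the two values agree (up to floating point)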

Because the idea itself is simple, I will not elaborate on it further here; a full explanation would take much more space, but its implementation is shown in the code below. I also want to thank Dr. Tang Yudi for the video referenced above. Up to this point we have only been extracting the mathematical framework of support vector machines; now it is time to apply it.

Coding

Data type

Okay, the first critical step is deciding on an appropriate data representation, because your program has to match the data type you choose for the algorithm. Here I'll pick a simple representation for the SVM we are coding and store the data sets in a plain text file.

(figure: a sample of the training data file)
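For reference, each line of that text file holds two feature values and a ±1 label separated by whitespace, which is exactly what LoadDataSet below parses. A few invented lines (not the real data) might look like this:

    1.25  3.10  -1
    0.92  2.87  -1
    3.47  0.56   1
    2.98  1.04   1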
    class DataSet(object):
        def __init__(self, path):
            self.Features = []
            self.Labels = []
            self.path = path

        def LoadDataSet(self):
            if os.path.exists(self.path):
                with open(self.path) as file:
                    for line in file.readlines():
                        lineArr = [float(x) for x in line.strip().split()]
                        self.Features.append([lineArr[0], lineArr[1]])
                        self.Labels.append(lineArr[2])
                return self.Features, self.Labels
            else:
                raise Exception("No such file: " + self.path)

Core

When we start coding, we must first know what our core target is.

We now need to figure out exactly what to compute. Our objective is clear: after feeding in our data, the model should produce a decision function of this form:

$$f(x) = \operatorname{sign}\!\left(\sum_{i=1}^{n}\alpha_i y_i\,K(x_i, x) + b\right)$$

To write down this function we need the values of $\alpha$ (and $b$) for our training data. That means we have to make the machine solve the dual problem derived above:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,K(x_i, x_j) - \sum_{i=1}^{n}\alpha_i \qquad \text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{n}\alpha_i y_i = 0$$

That is, how do we get the machine to solve this optimization problem for us automatically? That is the central task of this part of the code.

The SMO algorithm, introduced in 《Statistical Learning Methods》, is the classic way to do this; other heuristic approaches can also be used to solve the same problem.

    def SMO(self, Features, Labels, C, toler, maxIter, Ktype=('lin', 0)):
        self.__SMO_init(mat(Features), mat(Labels).transpose(), C, toler, Ktype)
        iter = 0
        entireSet = True
        alphaPairsChanged = 0
        while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):
            alphaPairsChanged = 0
            if entireSet:
                for i in range(self.m):  # iterate over all samples
                    alphaPairsChanged += self.KKTGoing(i)
                    print("fullSet, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
                iter += 1
            else:
                nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
                for i in nonBoundIs:
                    alphaPairsChanged += self.KKTGoing(i)
                    print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
                iter += 1
            if entireSet:
                entireSet = False
            elif (alphaPairsChanged == 0):
                entireSet = True
            print("iteration number: %d" % iter)
        return self.b, self.alphas

Forecast

Once the vector α and the scalar b have been determined, we have the hyperplane. We then feed new samples in, compute their distance to the hyperplane, and classify them accordingly.

    def forcast(self, dataSet: DataSet, show=False):
        res = []
        dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
        datMat_forcast = mat(dataArr_forcast)
        m, n = shape(datMat_forcast)
        for i in range(m):  # run the kernel prediction for every new sample
            kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            res.append(predict)
        if show:
            print("the result is:", res)
        return res

All code

Now let's show all the code:

    from numpy import *
    import os

    class DataSet(object):
        def __init__(self, path):
            self.Features = []
            self.Labels = []
            self.path = path

        def LoadDataSet(self):
            if os.path.exists(self.path):
                with open(self.path) as file:
                    for line in file.readlines():
                        lineArr = [float(x) for x in line.strip().split()]
                        self.Features.append([lineArr[0], lineArr[1]])
                        self.Labels.append(lineArr[2])
                return self.Features, self.Labels
            else:
                raise Exception("No such file: " + self.path)


    class SVMModel(object):

        def __init__(self, Ktype):
            self.Ktype = Ktype

        def __SMO_init(self, Features, Labels, C, toler, Ktype):
            """
            :param Features:
            :param Labels:
            :param C: soft-margin penalty
            :param toler: stopping threshold
            :param Ktype: kernel type and parameter
            """
            self.X = Features
            self.labelMat = Labels
            self.C = C
            self.tol = toler
            self.m = shape(Features)[0]
            self.alphas = mat(zeros((self.m, 1)))
            self.b = 0
            self.eCache = mat(zeros(shape(Features)))
            self.K = mat(zeros((self.m, self.m)))
            self.sVs = None
            self.labelSV = None
            self.svInd = None
            for i in range(self.m):
                self.K[:, i] = self.kernelFunction(self.X, self.X[i, :], Ktype)

        def kernelFunction(self, X, A, Ktype):
            """
            :param X:
            :param A:
            :param Ktype: (type, param)
            :return:
            """
            m, n = shape(X)
            K = mat(zeros((m, 1)))
            if Ktype[0] == 'lin':
                K = X * A.T
            elif Ktype[0] == 'rbf':
                for j in range(m):
                    deltaRow = X[j, :] - A
                    K[j] = deltaRow * deltaRow.T
                K = exp(K / (-1 * Ktype[1] ** 2))
            else:
                raise NameError('Houston We Have a Problem -- That Kernel is not recognized')
            return K

        def __SelectRand(self, i, m):
            j = i
            while (j == i):
                j = int(random.uniform(0, m))
            return j

        def __SelectAj(self, i, oS, Ei):
            maxK = -1
            maxDeltaE = 0
            Ej = 0
            oS.eCache[i] = [1, Ei]
            validEcacheList = nonzero(self.eCache[:, 0].A)[0]  # rows of the error cache that are non-zero
            if (len(validEcacheList)) > 1:
                for k in validEcacheList:
                    if k == i:
                        continue
                    Ek = self.__calcEk(k)
                    deltaE = abs(Ei - Ek)
                    if (deltaE > maxDeltaE):
                        maxK = k
                        maxDeltaE = deltaE
                        Ej = Ek
                return maxK, Ej
            else:
                j = self.__SelectRand(i, self.m)
                Ej = self.__calcEk(j)
            return j, Ej

        def __HoldAlpha(self, al, H, L):
            # clip alpha so that L <= a <= H
            if (al > H):
                al = H
            elif (L > al):
                al = L
            return al

        def __calcEk(self, k):
            fXk = float(multiply(self.alphas, self.labelMat).T * self.K[:, k] + self.b)
            Ek = fXk - float(self.labelMat[k])
            return Ek

        def __updateEk(self, k):
            Ek = self.__calcEk(k)
            self.eCache[k] = [1, Ek]

        def KKTGoing(self, i):
            """
            Refer to 《Statistical Learning Methods》.
            First, check whether alpha_i meets the KKT conditions.
            If not, select another alpha_j for optimization
            and update the values of alpha_i, alpha_j and b.
            :param self:
            :return:
            """
            Ei = self.__calcEk(i)  # compute the error E_i
            if ((self.labelMat[i] * Ei < -self.tol) and (self.alphas[i] < self.C)) or (
                    (self.labelMat[i] * Ei > self.tol) and (self.alphas[i] > 0)):
                j, Ej = self.__SelectAj(i, self, Ei)
                alphaIold = self.alphas[i].copy()
                alphaJold = self.alphas[j].copy()
                if (self.labelMat[i] != self.labelMat[j]):
                    L = max(0, self.alphas[j] - self.alphas[i])
                    H = min(self.C, self.C + self.alphas[j] - self.alphas[i])
                else:
                    L = max(0, self.alphas[j] + self.alphas[i] - self.C)
                    H = min(self.C, self.alphas[j] + self.alphas[i])
                if L == H:
                    print("L==H")
                    return 0
                eta = 2.0 * self.K[i, j] - self.K[i, i] - self.K[j, j]
                if eta >= 0:
                    print("eta>=0")
                    return 0
                self.alphas[j] -= self.labelMat[j] * (Ei - Ej) / eta
                self.alphas[j] = self.__HoldAlpha(self.alphas[j], H, L)
                self.__updateEk(j)
                if (abs(self.alphas[j] - alphaJold) < self.tol):
                    print("j not moving enough")
                    return 0
                self.alphas[i] += self.labelMat[j] * self.labelMat[i] * (alphaJold - self.alphas[j])
                self.__updateEk(i)

                b1 = self.b - Ei - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, i] - self.labelMat[j] * (
                            self.alphas[j] - alphaJold) * self.K[i, j]
                b2 = self.b - Ej - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, j] - self.labelMat[j] * (
                            self.alphas[j] - alphaJold) * self.K[j, j]
                if (0 < self.alphas[i] < self.C):
                    self.b = b1
                elif (0 < self.alphas[j] < self.C):
                    self.b = b2
                else:
                    self.b = (b1 + b2) / 2.0
                return 1
            else:
                return 0

        def SMO(self, Features, Labels, C, toler, maxIter, Ktype=('lin', 0)):
            """
            SMO is a heuristic algorithm;
            I don't know its specific principle in detail.
            The code comes from GitHub,
            and I plan to use a PSO algorithm later.
            """
            self.__SMO_init(mat(Features), mat(Labels).transpose(), C, toler, Ktype)
            iter = 0
            entireSet = True
            alphaPairsChanged = 0
            while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):
                alphaPairsChanged = 0
                if entireSet:
                    for i in range(self.m):  # iterate over all samples
                        alphaPairsChanged += self.KKTGoing(i)
                        print("fullSet, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
                    iter += 1
                else:
                    nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
                    for i in nonBoundIs:
                        alphaPairsChanged += self.KKTGoing(i)
                        print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
                    iter += 1
                if entireSet:
                    entireSet = False
                elif (alphaPairsChanged == 0):
                    entireSet = True
                print("iteration number: %d" % iter)
            return self.b, self.alphas

        def fit(self, dataSet: DataSet):
            dataArr, labelArr = dataSet.LoadDataSet()
            b, alphas = self.SMO(dataArr, labelArr, 200, 0.0001, 10000, self.Ktype)
            self.b = b
            self.alphas = alphas
            datMat = mat(dataArr)
            labelMat = mat(labelArr).transpose()
            svInd = nonzero(alphas)[0]
            # select the rows whose alpha is not 0 (that is, the support vectors)
            sVs = datMat[svInd]
            labelSV = labelMat[svInd]
            self.sVs = sVs
            self.labelSV = labelSV
            self.svInd = svInd
            print("there are %d Support Vectors" % shape(sVs)[0])
            m, n = shape(datMat)
            errorCount = 0
            for i in range(m):
                kernelEval = self.kernelFunction(sVs, datMat[i, :], ('rbf', 1.3))
                predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
                if sign(predict) != sign(labelArr[i]):  # sign: -1 if x < 0, 0 if x==0, 1 if x > 0
                    errorCount += 1
            print("the training error rate is: %f" % (float(errorCount) / m))

        def save_model(self, path):
            model = {}
            model['b'] = self.b
            model['alphas'] = self.alphas
            model['sVs'] = self.sVs
            model['labelSV'] = self.labelSV
            model['svInd'] = self.svInd
            with open(path, 'w') as file:
                file.write(str(model))  # write the repr of the dict; load_mode eval()s it back

        def load_mode(self, path):
            if os.path.exists(path):
                with open(path) as file:
                    model = file.read()
                    model = eval(model)
                    self.b = model['b']
                    self.alphas = model['alphas']
                    self.sVs = model['sVs']
                    self.labelSV = model['labelSV']
                    self.svInd = model['svInd']
            else:
                raise Exception("No such file: " + path)

        def predict(self, dataSet: DataSet):
            dataArr_test, labelArr_test = dataSet.LoadDataSet()
            errorCount_test = 0
            datMat_test = mat(dataArr_test)
            m, n = shape(datMat_test)
            for i in range(m):  # measure the error rate on the test data
                kernelEval = self.kernelFunction(self.sVs, datMat_test[i, :], ('rbf', 1.3))
                predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
                if sign(predict) != sign(labelArr_test[i]):
                    errorCount_test += 1
            print("the test error rate is: %f" % (float(errorCount_test) / m))

        def forcast(self, dataSet: DataSet, show=False):
            res = []
            dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
            datMat_forcast = mat(dataArr_forcast)
            m, n = shape(datMat_forcast)
            for i in range(m):
                kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], ('rbf', 1.3))
                predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
                res.append(predict)
            if show:
                print("the result is:", res)
            return res


    if __name__ == '__main__':

        train_path = r'\Data\svm_train.txt'
        test_data = r'\Data\svm_eval.txt'
        train_data = DataSet(train_path)
        test_data = DataSet(test_data)

        SVM = SVMModel(('rbf', 1.3))
        SVM.fit(train_data)
        SVM.predict(test_data)

get data

If you wish to obtain my data, you can visit the link:
Link: https://pan.baidu.com/s/1rTmao4zkQJiRW9zGcWpXHQ
Extraction code: 6666
The data set comes from Mr. Wu Enda (Andrew Ng), who is highly regarded, so I'm confident in its quality.

Real case

This is a simple example taken from the web (based on Mr. Wu Enda's course exercise).

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sb
    from scipy.io import loadmat
    from sklearn import svm

    '''
    1. Prepare datasets
    '''
    mat = loadmat('data/ex6data1.mat')
    print(mat.keys())
    X = mat['X']
    y = mat['y']


    def plotData(X, y):
        plt.figure(figsize=(8, 6))
        plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), cmap='rainbow')
        plt.xlabel('x1')
        plt.ylabel('x2')
        pass


    def plotBoundary(clf, X):
        '''Plot the decision boundary'''
        x_min, x_max = X[:, 0].min() * 1.2, X[:, 0].max() * 1.1
        y_min, y_max = X[:, 1].min() * 1.1, X[:, 1].max() * 1.1
        # np.linspace(x_min, x_max, 500).shape ----> (500,), 500 points per axis
        # xx.shape, yy.shape ----> (500, 500) (500, 500)
        xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500), np.linspace(y_min, y_max, 500))
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
        # model.predict: predict a label for each grid point, shape (250000,)
        # ravel() flattens the grids; np.c_ stacks the two flattened grids column-wise,
        # producing an array of shape (250000, 2), i.e. 250000 grid samples
        Z = Z.reshape(xx.shape)
        plt.contour(xx, yy, Z)
        # the contour line is exactly the separating boundary
        pass


    models = [svm.SVC(C=C, kernel='linear') for C in [1, 100]]
    # SVM models (kernel: a linear kernel here; C: the penalty weight, here 1 and 100)
    # a linear kernel draws a straight decision boundary
    clfs = [model.fit(X, y.ravel()) for model in models]    # model.fit: fit the model
    score = [model.score(X, y) for model in models]         # [0.9803921568627451, 1.0]
    # title = ['SVM Decision Boundary with C = {}(Example Dataset 1)'.format(C) for C in [1, 100]]

    def plot():
        title = ['SVM Decision Boundary with C = {}(Example Dataset 1)'.format(C) for C in [1, 100]]
        for model, title in zip(clfs, title):
            # zip() pairs each fitted model with its title
            plt.figure(figsize=(8, 5))
            plotData(X, y)
            plotBoundary(model, X)  # use the fitted model (predicting the 250000 grid samples) to draw the boundary
            plt.title(title)
            pass
        pass

    # plt.show()

    '''
    2. SVM with Gaussian Kernels
    '''


    def gaussKernel(x1, x2, sigma):
        return np.exp(-((x1 - x2) ** 2).sum() / (2 * sigma ** 2))


    a = gaussKernel(np.array([1, 2, 1]), np.array([0, 4, -1]), 2.)  # 0.32465246735834974
    # print(a)

    '''
    Example Dataset 2
    '''

    mat = loadmat('data/ex6data2.mat')
    x2 = mat['X']
    y2 = mat['y']
    plotData(x2, y2)
    plt.show()

    sigma = 0.1
    gamma = np.power(sigma, -2)/2
    '''
    The larger gamma is, the smaller sigma in the Gaussian kernel is, and the taller and thinner the bell curve becomes.
    The smaller gamma is, the larger sigma is, and the flatter the curve becomes: smoother, higher bias, lower variance.
    '''
    clf = svm.SVC(C=1, kernel='rbf', gamma=gamma)
    model = clf.fit(x2, y2.flatten())       # kernel='rbf' means the SVM uses a Gaussian kernel
    #
    # plotData(x2, y2)
    # plotBoundary(model, x2)
    # plt.show()


    '''
    Example Dataset 3
    '''
    mat3 = loadmat('data/ex6data3.mat')
    x3, y3 = mat3['X'], mat3['y']
    Xval, yval = mat3['Xval'], mat3['yval']
    plotData(x3, y3)
    # plt.show()

    Cvalues = (0.01, 0.03, 0.1, 0.3, 1., 3., 10., 30.)  # candidate values for the weight C
    sigmavalues = Cvalues   # candidate values for the kernel parameter
    best_pair, best_score = (0, 0), 0        # best (C, sigma) pair and its score
    # search for the best (C, sigma) pair
    for C in Cvalues:
        for sigma in sigmavalues:
            gamma = np.power(sigma, -2.) / 2
            model = svm.SVC(C=C, kernel='rbf', gamma=gamma)     # SVM with a Gaussian kernel
            model.fit(x3, y3.flatten())      # fit the model
            this_score = model.score(Xval, yval)        # use the cross-validation set to pick the best weights
            '''
            For a classifier, model.score returns the mean accuracy on the given data:
            the higher it is, the better the model explains the validation set.
            '''
            # keep the best-scoring pair of weights
            if this_score > best_score:
                best_score = this_score
                best_pair = (C, sigma)
            pass
        pass
    print('best (C, sigma) pair:', best_pair, 'score:', best_score)
    # best (C, sigma) pair: (1.0, 0.1) score: 0.965
    model = svm.SVC(C=1, kernel='rbf', gamma=np.power(0.1, -2.) / 2)
    # declare the SVM again with the chosen weights
    model.fit(x3, y3.flatten())
    plotData(x3, y3)
    plotBoundary(model, x3)
    # plt.show()
