
Bag of Words (MATLAB implementation)


The main idea of Bag of Words: cluster the training-sample features with k-means; then, for each feature of a test sample, find its nearest cluster center and increment that cluster's count by 1. Each test sample thus produces an ncenter-dimensional histogram.

For example, take training features a, b, c, a, d, f, e, b, e, d, c, f. With ncenter = 6, they can be clustered into six classes [a, b, c, d, e, f]. Note that in practice the cluster centers are not necessarily features from the training set, because k-means recomputes each center when it updates.

Now suppose a test sample has features a, b, c, d. BoW then produces the 6-dimensional histogram [1, 1, 1, 1, 0, 0].
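The histogram step above can be sketched in a few lines of NumPy (this is an illustration of the idea, not the author's MATLAB code; the 1-D feature encoding of a..f as 0..5 is made up for the example):

```python
import numpy as np

def bow_histogram(features, centres):
    """features: (m, d) test features; centres: (k, d) cluster centres.
    Returns a k-dimensional histogram of nearest-centre votes."""
    # Squared Euclidean distance from every feature to every centre
    d2 = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)                 # index of the nearest centre
    return np.bincount(nearest, minlength=len(centres))

# Encode the six classes a..f as the 1-D points 0..5
centres = np.arange(6, dtype=float).reshape(-1, 1)
test = np.array([0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)  # features a, b, c, d
print(bow_histogram(test, centres))             # -> [1 1 1 1 0 0]
```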

So the pipeline is really just k-means followed by hard voting. I won't go into k-means in detail; it simply iterates the center-update step until the centers change by less than a tolerance.

The k-means clustering is initialized with ncenter points chosen at random from the training data, and Euclidean distance is used as the metric; the distance computation is vectorized to speed it up.
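The vectorization trick rests on the identity ||a-b||² = ||a||² + ||b||² - 2·a·b, which lets one matrix product replace the double loop over row pairs. A small NumPy sketch of the same computation:

```python
import numpy as np

def pairwise_euclidean(a, b):
    """a: (m, d), b: (n, d) -> (m, n) matrix of Euclidean distances."""
    aa = (a * a).sum(axis=1)[:, None]   # (m, 1) squared norms of rows of a
    bb = (b * b).sum(axis=1)[None, :]   # (1, n) squared norms of rows of b
    ab = a @ b.T                        # (m, n) inner products
    # abs() guards against tiny negative values caused by round-off
    return np.sqrt(np.abs(aa + bb - 2.0 * ab))

a = np.random.rand(5, 3)
b = np.random.rand(4, 3)
d = pairwise_euclidean(a, b)
# agrees with the naive elementwise definition
naive = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
assert np.allclose(d, naive)
```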

This implementation is adapted from someone else's code; you will probably need small modifications for your own research. It is a good starting point for beginners.

    function dic = CalDic(data, dicsize)
    % Build a visual dictionary (codebook) from training features with k-means.
    fprintf('Building Dictionary using Training Data\n\n');
    dictionarySize = dicsize;
    niters = 100;                       % maximum number of iterations
    [ndata, data_dim] = size(data);
    ncentres = dictionarySize;
    dim = data_dim;
    
    %% initialization: pick ncentres random training points as initial centres
    perm = randperm(ndata);
    perm = perm(1:ncentres);
    centres = data(perm, :);
    
    disp('Run k-means');
    ThrError = 0.009;                   % convergence threshold
    
    for n = 1:niters
        % Save old centres to check for termination
        old_centres = centres;
        tempc = zeros(ncentres, dim);
    
        % Assign each point to its nearest centre
        id = eye(ncentres);
        d2 = EuclideanDistance(data, centres);
        [minvals, index] = min(d2', [], 1);
        post = id(index, :);            % post(i,j) = 1 if point i is in cluster j, else 0
        num_points = sum(post, 1);
    
        % Recompute each centre as the mean of its assigned points
        for j = 1:ncentres
            tempc(j,:) = sum(data(post(:,j) > 0, :), 1);
        end
        for j = 1:ncentres
            if num_points(j) > 0
                centres(j,:) = tempc(j,:) / num_points(j);
            end
        end
    
        % Test for termination
        if max(max(abs(centres - old_centres))) < ThrError
            break;
        end
        fprintf('The %d th iteration finished\n', n);
    end
    
    dic = centres;                      % return value (was never assigned in the original)
    dictionary = centres;
    fprintf('Saving texton dictionary\n');
    mkdir('data');                      % create the data folder
    save('data/dictionary', 'dictionary');  % save dictionary under data/

The Euclidean distance function:

    function d = EuclideanDistance(a,b)
    % DISTANCE - computes Euclidean distance matrix
    %
    % E = EuclideanDistance(A,B)
    %
    %    A - (MxD) matrix 
    %    B - (NxD) matrix
    %
    % Returns:
    %    E - (MxN) Euclidean distances between vectors in A and B
    %
    %
    % Description : 
    %    This fully vectorized (VERY FAST!) m-file computes the 
    %    Euclidean distance between two vectors by:
    %
    %                 ||A-B|| = sqrt ( ||A||^2 + ||B||^2 - 2*A.B )
    %
    % Example : 
    %    A = rand(100,400); B = rand(200,400);
    %    d = EuclideanDistance(A,B);
    
    % Author   : Roland Bunschoten
    %            University of Amsterdam
    %            Intelligent Autonomous Systems (IAS) group
    %            Kruislaan 403  1098 SJ Amsterdam
    %            tel.(+31)20-5257524
    %            bunschot@wins.uva.nl
    % Last Rev : Oct 29 16:35:48 MET DST 1999
    % Tested   : PC Matlab v5.2 and Solaris Matlab v5.3
    % Thanx    : Nikos Vlassis
    
    % Copyright notice: You are free to modify, extend and distribute 
    %    this code granted that the author of the original code is 
    %    mentioned as the original author of the code.
    
    if (nargin ~= 2)
    b=a;
    end
    
    if (size(a,2) ~= size(b,2))
       error('A and B should be of same dimensionality');
    end
    
    aa=sum(a.*a,2); bb=sum(b.*b,2); ab=a*b'; 
    d = sqrt(abs(repmat(aa,[1 size(bb,1)]) + repmat(bb',[size(aa,1) 1]) - 2*ab));
The Hard Voting function:
    function His=HardVoting(data,dic)
    ncentres=size(dic,1);
    id = eye(ncentres);
    d2 = EuclideanDistance(data,dic);% Assign each point to nearest centre
    [minvals, index] = min(d2', [], 1);
    post = id(index,:); % matrix, if word i is in cluster j, post(i,j)=1, else 0
    His=sum(post, 1);
    end
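Putting the two pieces together, the full pipeline (build a codebook with k-means, then encode test features by hard voting) can be rendered compactly in NumPy. This is a sketch mirroring the structure of CalDic and HardVoting above, not a line-by-line port; the synthetic Gaussian data is made up for the demo:

```python
import numpy as np

def kmeans_dictionary(data, k, niters=100, tol=0.009, seed=0):
    """Random-point initialization, nearest-centre assignment, mean update,
    stop when the centres move less than tol (same scheme as CalDic)."""
    rng = np.random.default_rng(seed)
    centres = data[rng.permutation(len(data))[:k]].copy()
    for _ in range(niters):
        old = centres.copy()
        d2 = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:        # keep a centre that lost all its points
                centres[j] = members.mean(axis=0)
        if np.abs(centres - old).max() < tol:
            break
    return centres

def hard_voting(features, centres):
    """Histogram of nearest-centre votes, one vote per feature."""
    d2 = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.bincount(d2.argmin(axis=1), minlength=len(centres))

rng = np.random.default_rng(1)
train = np.vstack([rng.normal(c, 0.1, size=(50, 2)) for c in (0.0, 5.0, 10.0)])
dic = kmeans_dictionary(train, 3)
hist = hard_voting(rng.normal(5.0, 0.1, size=(10, 2)), dic)
print(hist.sum())   # every test feature votes exactly once -> 10
```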

For classification problems, you can also try LLC (CVPR 2010), which generally outperforms hard voting.
