Advertisement

MySQL全文索引之布尔全文索引、查询扩展全文索引

阅读量:

MySQL通过IN BOOLEAN MODE修饰符实现布尔全文搜索功能。其中,在搜索字符串中的单词开头或结尾处的一些字符具有特殊意义。在这些查询中,+和-运算符分别指示是否需要包含特定单词来进行匹配操作。因此,在以下示例中

复制代码
    mysql> select * from articles;
    +----+-----------------------+------------------------------------------+
|id|title|body|

    +----+-----------------------+------------------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|
|---|---|---|
|3|Optimizing MySQL|In this tutorial we will show ...|
|4|1001 MySQL Tricks|1. Never run mysqld as root. 2. ...|
|5|MySQL vs. YourSQL|In the following database comparison ...|
|6|MySQL Security|When configured properly, MySQL ...|

    +----+-----------------------+------------------------------------------+
    6 rows in set (0.00 sec)
    
    mysql> select * from articles where match(title,body)
    -> against ('+mysql -database' in boolean mode);		//包含mysql不含database
    +----+-----------------------+-------------------------------------+
|id|title|body|

    +----+-----------------------+-------------------------------------+
|6|MySQL Security|When configured properly, MySQL ...|
|---|---|---|
|3|Optimizing MySQL|In this tutorial we will show ...|
|4|1001 MySQL Tricks|1. Never run mysqld as root. 2. ...|

    +----+-----------------------+-------------------------------------+
    4 rows in set (0.00 sec)

MySQL采用了被称为隐含布尔逻辑的特殊内容;其中'+"表示AND操作,'-"表示NOT操作,而没有操作符则默认为OR

全文布尔索引对于InnoDB表和MyISAM表而言有一些差别:

InnoDB MyISAM
是否相关性降低的顺序自动对行进行排序
最小字长全文参数 innodb_ft_min_token_size ft_min_word_len
最大字长全文参数 innodb_ft_max_token_size ft_max_word_len
停止字适用
是否支持单个搜索词上使用多个运算符
是否支持后导加减号

(1)InnoDB示例

复制代码
    mysql> show create table articles\G;
    *************************** 1. row ***************************
       Table: articles
    Create Table: CREATE TABLE `articles` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `title` varchar(200) DEFAULT NULL,
      `body` text,
      PRIMARY KEY (`id`),
      FULLTEXT KEY `title` (`title`,`body`)
    ) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
    
    
    //自动按照相关性降低的顺序排序了
    
    mysql> select * from articles where match(title,body)
    -> against('+mysql -database' in boolean mode);		
    +----+-----------------------+-------------------------------------+
|id|title|body|

    +----+-----------------------+-------------------------------------+
|6|MySQL Security|When configured properly, MySQL ...|
|---|---|---|
|3|Optimizing MySQL|In this tutorial we will show ...|
|4|1001 MySQL Tricks|1. Never run mysqld as root. 2. ...|

    +----+-----------------------+-------------------------------------+
    4 rows in set (0.00 sec)
    
    mysql> select *,match(title,body) against('+mysql -database') as relevance
    -> from articles where match(title,body)
    -> against('+mysql -database' in boolean mode);
    +----+-----------------------+-------------------------------------+----------------------------+
|id|title|body|relevance|

    +----+-----------------------+-------------------------------------+----------------------------+
|6|MySQL Security|When configured properly, MySQL ...|0.000000003771856604828372|
|---|---|---|---|
|3|Optimizing MySQL|In this tutorial we will show ...|0.000000001885928302414186|
|4|1001 MySQL Tricks|1. Never run mysqld as root. 2. ...|0.000000001885928302414186|

    +----+-----------------------+-------------------------------------+----------------------------+
    4 rows in set (0.00 sec)

最大最小字长全文参数

复制代码
    mysql> show variables like 'innodb_ft_max_token_size';
    +--------------------------+-------+
|Variable_name|Value|

    +--------------------------+-------+
|innodb_ft_max_token_size|84|

    +--------------------------+-------+
    1 row in set, 1 warning (0.00 sec)
    
    mysql> show variables like 'innodb_ft_min_token_size';
    +--------------------------+-------+
|Variable_name|Value|

    +--------------------------+-------+
|innodb_ft_min_token_size|3|

    +--------------------------+-------+
    1 row in set, 1 warning (0.00 sec)

使用多个运算符会报错

复制代码
    mysql>  select * from articles where match(title,body)
    ->  against('++database' in boolean mode);
    ERROR 1064 (42000): syntax error, unexpected '+'

使用后导加减号报错

复制代码
    mysql> select * from articles where match(title,body)
    -> against('database+' in boolean mode);
    ERROR 1064 (42000): syntax error, unexpected $end
    
    mysql> select * from articles where match(title,body)
    -> against('+mysql database-' in boolean mode);
    ERROR 1064 (42000): syntax error, unexpected $end

(2)MyISAM示例

复制代码
    mysql> show create table articles\G;
    *************************** 1. row ***************************
       Table: articles
    Create Table: CREATE TABLE `articles` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `title` varchar(200) DEFAULT NULL,
      `body` text,
      PRIMARY KEY (`id`),
      FULLTEXT KEY `title` (`title`,`body`)
    ) ENGINE=MyISAM AUTO_INCREMENT=7 DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
    
    //并不会按照相关性顺序降低进行排序
    
    mysql> select * from articles where match(title,body)
    -> against('+mysql -database' in boolean mode);
    +----+-----------------------+-------------------------------------+
|id|title|body|

    +----+-----------------------+-------------------------------------+
|2|How To Use MySQL Well|After you went through a ...|
|---|---|---|
|4|1001 MySQL Tricks|1. Never run mysqld as root. 2. ...|
|6|MySQL Security|When configured properly, MySQL ...|

    +----+-----------------------+-------------------------------------+
    4 rows in set (0.00 sec)
    
    mysql>  select *,match(title,body) against('+mysql -database') as relevance
    ->  from articles where match(title,body)
    ->  against('+mysql -database' in boolean mode);
    +----+-----------------------+-------------------------------------+-----------+
|id|title|body|relevance|

    +----+-----------------------+-------------------------------------+-----------+
|2|How To Use MySQL Well|After you went through a ...|0|
|---|---|---|---|
|4|1001 MySQL Tricks|1. Never run mysqld as root. 2. ...|0|
|6|MySQL Security|When configured properly, MySQL ...|0|

    +----+-----------------------+-------------------------------------+-----------+
    4 rows in set (0.00 sec)

最大最小字长参数

复制代码
    mysql> show variables like 'ft_max_word_len';
    +-----------------+-------+
|Variable_name|Value|

    +-----------------+-------+
|ft_max_word_len|84|

    +-----------------+-------+
    1 row in set, 1 warning (0.01 sec)
    
    mysql> show variables like 'ft_min_word_len';
    +-----------------+-------+
|Variable_name|Value|

    +-----------------+-------+
|ft_min_word_len|4|

    +-----------------+-------+
    1 row in set, 1 warning (0.00 sec)

对于包含多个运算符的情况而言,在使用MyISAM进行全文搜索时能够正确处理这些情况;它不会考虑任何不在紧邻搜索词后面的运算符;例如,在下述示例中仅处理了紧邻的加号。

复制代码
    mysql> select * from articles where match(title,body)
    -> against('+--++database' in boolean mode);
    +----+-------------------+------------------------------------------+
|id|title|body|

    +----+-------------------+------------------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|
|---|---|---|

    +----+-------------------+------------------------------------------+
    2 rows in set (0.00 sec)

可以处理后导加号,后导减号不处理但不报错

复制代码
    mysql> select * from articles where match(title,body)
    -> against('mysql+ -database' in boolean mode);
    +----+-----------------------+-------------------------------------+
|id|title|body|

    +----+-----------------------+-------------------------------------+
|2|How To Use MySQL Well|After you went through a ...|
|---|---|---|
|4|1001 MySQL Tricks|1. Never run mysqld as root. 2. ...|
|6|MySQL Security|When configured properly, MySQL ...|

    +----+-----------------------+-------------------------------------+
    4 rows in set (0.00 sec)
    
    mysql> select * from articles where match(title,body)
    -> against('mysql+ database-' in boolean mode);
    +----+-----------------------+------------------------------------------+
|id|title|body|

    +----+-----------------------+------------------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|
|---|---|---|
|3|Optimizing MySQL|In this tutorial we will show ...|
|4|1001 MySQL Tricks|1. Never run mysqld as root. 2. ...|
|5|MySQL vs. YourSQL|In the following database comparison ...|
|6|MySQL Security|When configured properly, MySQL ...|

    +----+-----------------------+------------------------------------------+
    6 rows in set (0.00 sec)

(3)其他

复制代码
    mysql> select * from articles where match(title,body)
    -> against('+mysql' in boolean mode);
    +----+-----------------------+------------------------------------------+
|id|title|body|

    +----+-----------------------+------------------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|
|---|---|---|
|3|Optimizing MySQL|In this tutorial we will show ...|
|4|1001 MySQL Tricks|1. Never run mysqld as root. 2. ...|
|5|MySQL vs. YourSQL|In the following database comparison ...|
|6|MySQL Security|When configured properly, MySQL ...|

    +----+-----------------------+------------------------------------------+
    6 rows in set (0.00 sec)

减号运算符专用于过滤掉那些通过其他途径匹配的搜索项所在的行。例如,在以下示例中,并不会返回除了那些只包含'database'以外的所有行

复制代码
    mysql> select * from articles where match(title,body)
    -> against('-database' in boolean mode);
    Empty set (0.00 sec)

以下示例演示了一些使用布尔全文运算符的搜索字符串:

查找包含tutorial 或database中任意一个词的行

复制代码
    mysql> select * from articles where match(title,body)
    -> against('tutorial database' in boolean mode);
    +----+-------------------+------------------------------------------+
|id|title|body|

    +----+-------------------+------------------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|
|---|---|---|
|5|MySQL vs. YourSQL|In the following database comparison ...|

    +----+-------------------+------------------------------------------+
    3 rows in set (0.00 sec)

查找包含tutorial 和database两个单词的行。

复制代码
    mysql> select * from articles where match(title,body)
    -> against('+tutorial +database' in boolean mode);
    +----+----------------+------------------------------+
|id|title|body|

    +----+----------------+------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|

    +----+----------------+------------------------------+
    1 row in set (0.00 sec)

搜索包含单词' tutorial '的所有行时,请注意如果这些行同时也包含' database '的话,则应优先排列它们。

复制代码
    mysql> select * from articles where match(title,body)
    -> against('+tutorial database' in boolean mode);
    +----+------------------+-----------------------------------+
|id|title|body|

    +----+------------------+-----------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|
|---|---|---|

    +----+------------------+-----------------------------------+
    2 rows in set (0.00 sec)

搜索并筛选出包含关键词" tutorial "的所有记录。对于那些同时含有" database "这一字段值的记录,在评分标准中给予其较低的评分等级。采用符号""作为评分分隔符更具灵活性(即使用符号""作为评分分隔符时,则对该记录予以跳过)。

复制代码
    mysql> select * from articles where match(title,body)
    -> against('+tutorial -database' in boolean mode);
    +----+------------------+-----------------------------------+
|id|title|body|

    +----+------------------+-----------------------------------+
|3|Optimizing MySQL|In this tutorial we will show ...|

    +----+------------------+-----------------------------------+
    1 row in set (0.00 sec)
    
    mysql> select * from articles where match(title,body)
    -> against('+tutorial ~database' in boolean mode);
    +----+------------------+-----------------------------------+
|id|title|body|

    +----+------------------+-----------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|
|---|---|---|

    +----+------------------+-----------------------------------+
    2 rows in set (0.00 sec)

搜索所有同时含有"mysql"与"tutorial"或者"mysql"与"database"(不论顺序)的行,但仅当"mysql tutorial"的排名高于"mysql database".
使用括号创建子表达式.
通过大于号(>)和小于号(<)来调整单词对赋予行的相关性值的影响.

复制代码
    mysql> select * from articles where match(title,body)
    -> against('+mysql +(>tutorial<database)' in boolean mode);
    +----+-------------------+------------------------------------------+
|id|title|body|

    +----+-------------------+------------------------------------------+
|3|Optimizing MySQL|In this tutorial we will show ...|
|---|---|---|
|5|MySQL vs. YourSQL|In the following database comparison ...|

    +----+-------------------+------------------------------------------+
    3 rows in set (0.00 sec)

查找包含诸如“ data”等字词的行。

复制代码
    mysql> select * from articles where match(title,body)
    -> against('+data*' in boolean mode);
    +----+-------------------+------------------------------------------+
|id|title|body|

    +----+-------------------+------------------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|
|---|---|---|

    +----+-------------------+------------------------------------------+
    2 rows in set (0.00 sec)

双引号(")字符中包含的短语仅匹配字面上直接出现该短语的内容行;如果未加双引号,则会检索到同时包含MySQL或Tutorial中的任一词的内容行。通过使用双引号可以执行短语搜索;同样能达到类似的效果。

复制代码
    mysql> select * from articles where match(title,body)
    -> against('"MySQL Tutorial"' in boolean mode);
    +----+----------------+------------------------------+
|id|title|body|

    +----+----------------+------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|

    +----+----------------+------------------------------+
    1 row in set (0.00 sec)
    
    mysql> select * from articles where title like '%mysql tutorial%';
    +----+----------------+------------------------------+
|id|title|body|

    +----+----------------+------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|

    +----+----------------+------------------------------+
    1 row in set (0.00 sec)

在执行上述查询时, MySQL必须从索引中检索所有同时包含两个' Tutorial'关键词的记录条目, 这将导致搜索效率显著下降. 为此, 在回表过滤之前, 系统可能会返回大量可能需要进一步验证的数据项. 因此, 如果所选关键词过于常见, 由于前一步骤会返回大量可能需要进一步验证的数据项, 这种情况下 LIKE 操作将不得不逐条检查每一条记录以确保完整性.

2、查询扩展全文索引

支持全文检索功能的系统能够实现查询范围的扩展。当搜索提示过于简短时,在某些情况下这一特性会非常有用。这种情况下通常意味着用户依赖于该全文搜索引擎所蕴含的知识来完成特定的任务。例如,在搜索'database'的情况下,实际意图可能是识别为MySQL、Oracle、DB2或RDBMS等相关的术语进行匹配并返回相应的结果集。

为了启用盲查询扩展功能(亦称自动相关性反馈),需在搜索短语后添加WITH QUERY EXPANSION或IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION以启用盲查询扩展功能。该机制通过两次检索流程实现信息获取:首先,MySQL全文搜索引擎会检索与搜索查询相匹配的所有行;随后系统会从搜索结果中提取相关关键词,并基于这些关键词执行第二次检索。

以下示例显示了这种差异:

复制代码
    mysql> show create table articles\G;
    *************************** 1. row ***************************
       Table: articles
    Create Table: CREATE TABLE `articles` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `title` varchar(200) DEFAULT NULL,
      `body` text,
      PRIMARY KEY (`id`),
      FULLTEXT KEY `title` (`title`,`body`)
    ) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
    
    mysql> select * from articles
    -> where match(title,body)
    -> against('database' in natural language mode);
    +----+-------------------+------------------------------------------+
|id|title|body|

    +----+-------------------+------------------------------------------+
|1|MySQL Tutorial|DBMS stands for DataBase ...|
|---|---|---|

    +----+-------------------+------------------------------------------+
    2 rows in set (0.00 sec)
    
    mysql> select * from articles
    -> where match(title,body)
    -> against('database' in natural language mode with query expansion);		//包含MySQL的也被返回
    +----+-----------------------+------------------------------------------+
|id|title|body|

    +----+-----------------------+------------------------------------------+
|5|MySQL vs. YourSQL|In the following database comparison ...|
|---|---|---|
|3|Optimizing MySQL|In this tutorial we will show ...|
|6|MySQL Security|When configured properly, MySQL ...|
|4|1001 MySQL Tricks|1. Never run mysqld as root. 2. ...|
|2|How To Use MySQL Well|After you went through a ...|

    +----+-----------------------+------------------------------------------+
    6 rows in set (0.00 sec)

因为信息检索功能增强可能导致检索结果包含与搜索主题无关的内容,在这种情况下建议仅在搜索短语较短时采用此方法。

全部评论 (0)

还没有任何评论哟~