Analyzer for autocompleting names

小编典典

elasticsearch

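I want to be able to autocomplete names.

For example, if we have the name John Smith, I want to be able to search for Jo, Sm, and John Sm and get the document back.

In addition, I do not want jo sm to match the document.

I currently have this analyzer: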

return array(
    'settings' => array(
        'index' => array(
            'analysis' => array(
                'analyzer' => array(
                    'autocomplete' => array(
                        'tokenizer' => 'autocompleteEngram',
                        'filter' => array('lowercase', 'whitespace')
                    )
                ),

                'tokenizer' => array(
                    'autocompleteEngram' => array(
                        'type' => 'edgeNGram',
                        'min_gram' => 1,
                        'max_gram' => 50
                    )
                )
            )   
        )
    )
);

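The problem is that the text is first split into words, and only then is each word tokenized into edge n-grams.

The resulting tokens are: j jo joh john s sm smi smit smith

No token spans more than one word, so searching for john smith or john sm returns nothing.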
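The token list above can be reproduced outside Elasticsearch with a few lines of PHP. This is only a rough sketch of the behaviour the question describes (split on whitespace, lowercase, then take edge n-grams of each word); `edgeNGramsPerWord()` is a hypothetical helper written for illustration, not an Elasticsearch API.

```php
<?php
// Illustrative only: mimics "split into words, then edge n-grams per word".
// edgeNGramsPerWord() is a hypothetical helper, not part of Elasticsearch.
function edgeNGramsPerWord(string $text, int $minGram = 1, int $maxGram = 50): array
{
    $tokens = array();
    // Lowercase and split on whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Emit every prefix of the word from $minGram up to $maxGram characters.
        $limit = min(strlen($word), $maxGram);
        for ($len = $minGram; $len <= $limit; $len++) {
            $tokens[] = substr($word, 0, $len);
        }
    }
    return $tokens;
}

$tokens = edgeNGramsPerWord('John Smith');
echo implode(' ', $tokens), PHP_EOL; // j jo joh john s sm smi smit smith

// No token contains a space, so a term such as "john sm" is never produced.
var_dump(in_array('john sm', $tokens, true)); // bool(false)
```

Because every prefix is generated per word, a multi-word input can only match if the search side is also broken into single-word terms.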

So I need to generate tokens that look like this: j jo joh john s sm smi smit smith john s john sm john smi john smit john smith.

How can I set up my analyzer so that it generates these extra tokens?

2020-06-22

1 answer

小编典典

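I ended up not using edgengrams.

I created an analyzer with the standard tokenizer and the standard and lowercase filters. This is effectively the same as the standard analyzer, but without any stopword filter (we are searching for names after all, and there could be someone called The or An).

I then set the above analyzer as the index_analyzer and simple as the search_analyzer. Using this setup with a match_phrase_prefix query works very well.

This is the custom analyzer I used (named autocomplete and expressed in PHP):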

'autocomplete' => array(
    'tokenizer' => 'standard',
    'filter' => array('standard', 'lowercase')
),
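For context, here is a rough sketch of how the pieces described above might fit together as PHP arrays: the analyzer in the index settings, a field mapping that uses it at index time with simple at search time, and a match_phrase_prefix query. The type name `person`, the field name `name`, and the variable names are assumptions made up for illustration; the parameter names (`index_analyzer`, the `standard` filter, the `string` field type) follow the answer and may differ in other Elasticsearch versions.

```php
<?php
// Sketch only: 'person' and 'name' are hypothetical type and field names.
$indexBody = array(
    'settings' => array(
        'index' => array(
            'analysis' => array(
                'analyzer' => array(
                    // The custom analyzer from the answer.
                    'autocomplete' => array(
                        'tokenizer' => 'standard',
                        'filter' => array('standard', 'lowercase')
                    )
                )
            )
        )
    ),
    'mappings' => array(
        'person' => array(
            'properties' => array(
                'name' => array(
                    'type' => 'string',
                    // Index with the custom analyzer, search with 'simple'.
                    'index_analyzer' => 'autocomplete',
                    'search_analyzer' => 'simple'
                )
            )
        )
    )
);

// The last term is treated as a prefix; earlier terms must match whole tokens.
$searchBody = array(
    'query' => array(
        'match_phrase_prefix' => array(
            'name' => 'john sm'
        )
    )
);
```

With a query of this shape, Jo, Sm, and John Sm should each match John Smith, while jo sm should not, because jo would have to match a complete indexed token rather than a prefix.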

2020-06-22