I am new to Elasticsearch and I want to use synonyms. I added the following lines to the configuration file:

    index :
        analysis :
            analyzer :
                synonym :
                    type : custom
                    tokenizer : whitespace
                    filter : [synonym]
            filter :
                synonym :
                    type : synonym
                    synonyms_path: synonyms.txt
Then I created an index named test:

    "mappings" : {
        "test" : {
            "properties" : {
                "text_1" : {
                    "type" : "string",
                    "analyzer" : "synonym"
                },
                "text_2" : {
                    "search_analyzer" : "standard",
                    "index_analyzer" : "synonym",
                    "type" : "string"
                },
                "text_3" : {
                    "type" : "string",
                    "analyzer" : "synonym"
                }
            }
        }
    }
and inserted the following data into the type test:

    {
        "text_3" : "foo dog cat",
        "text_2" : "foo dog cat",
        "text_1" : "foo dog cat"
    }
synonyms.txt contains "foo, bar, baz". When I search for foo I get the result I expect, but when I search for baz or bar I get zero results:

    {
        "query": {
            "query_string": {
                "query" : "bar",
                "fields" : [ "text_1" ],
                "use_dis_max" : true,
                "boost" : 1.0
            }
        }
    }
Result:

    {
        "took": 1,
        "timed_out": false,
        "_shards": { "total": 5, "successful": 5, "failed": 0 },
        "hits": { "total": 0, "max_score": null, "hits": [ ] }
    }
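I am not sure whether your problem comes from the way the synonyms for "bar" are defined. Since you said you are quite new to this, I will walk through a working example similar to yours, to show how Elasticsearch handles synonyms at search time and at index time. I hope it helps.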
First, create the synonyms file:

    foo => foo bar, baz
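To make the two rule styles easier to compare, here is a minimal Python sketch of how such a synonyms file can be read. It is a toy parser, not Elasticsearch's own code: the function name `parse_synonym_rules` is hypothetical, and it assumes the usual Solr-style semantics, where `a => b, c` replaces `a` with the comma-separated phrases on the right, and a plain list `a, b, c` makes every term expand to the whole list. Under that reading, `foo bar` in the rule above is a single two-word replacement, which agrees with the `_analyze` output shown at the end of this answer.

```python
def parse_synonym_rules(lines):
    """Toy parser for Solr-style synonym rules (illustrative only).

    Returns a dict mapping each input term to its replacement phrases.
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: each left-hand term is replaced by the right side.
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",") if t.strip()]
            for source in left.split(","):
                source = source.strip()
                if source:
                    mapping.setdefault(source, []).extend(targets)
        else:
            # Equivalent list: every term expands to every term in the list
            # (assumes the default "expand" behaviour).
            terms = [t.strip() for t in line.split(",") if t.strip()]
            for term in terms:
                mapping.setdefault(term, []).extend(terms)
    return mapping


if __name__ == "__main__":
    print(parse_synonym_rules(["foo => foo bar, baz"]))
    print(parse_synonym_rules(["foo, bar, baz"]))
```

With the explicit rule only `foo` is a key, so `bar` and `baz` are never rewritten on their own; with the equivalent list all three terms map to each other.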
Now I create the index with the specific settings you are trying to test:

    curl -XPUT 'http://localhost:9200/test/' -d '{
      "settings": {
        "index": {
          "analysis": {
            "analyzer": {
              "synonym": {
                "tokenizer": "whitespace",
                "filter": ["synonym"]
              }
            },
            "filter": {
              "synonym": {
                "type": "synonym",
                "synonyms_path": "synonyms.txt"
              }
            }
          }
        }
      },
      "mappings": {
        "test": {
          "properties": {
            "text_1": {
              "type": "string",
              "analyzer": "synonym"
            },
            "text_2": {
              "search_analyzer": "standard",
              "index_analyzer": "standard",
              "type": "string"
            },
            "text_3": {
              "type": "string",
              "search_analyzer": "synonym",
              "index_analyzer": "standard"
            }
          }
        }
      }
    }'
Note that synonyms.txt must be in the same directory as the configuration file, because the path is resolved relative to the config directory.
Now index a document:

    curl -XPUT 'http://localhost:9200/test/test/1' -d '{
      "text_3": "baz dog cat",
      "text_2": "foo dog cat",
      "text_1": "foo dog cat"
    }'
Now the searches.

Searching in field text_1:

    curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz'

    {
      "took": 3,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 1,
        "max_score": 0.15342641,
        "hits": [
          {
            "_index": "test",
            "_type": "test",
            "_id": "1",
            "_score": 0.15342641,
            "_source": {
              "text_3": "baz dog cat",
              "text_2": "foo dog cat",
              "text_1": "foo dog cat"
            }
          }
        ]
      }
    }
You get the document because baz is a synonym of foo, and at index time foo was expanded with its synonyms.
Searching in field text_2:

    curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz'

    {
      "took": 2,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": { "total": 0, "max_score": null, "hits": [] }
    }
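I get no hits because synonyms were not expanded at index time (the standard analyzer was used), and they are not expanded at search time either. Since I am searching for baz and baz is not in the text, nothing matches.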
Searching in field text_3:

    curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo'

    {
      "took": 3,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 1,
        "max_score": 0.15342641,
        "hits": [
          {
            "_index": "test",
            "_type": "test",
            "_id": "1",
            "_score": 0.15342641,
            "_source": {
              "text_3": "baz dog cat",
              "text_2": "foo dog cat",
              "text_1": "foo dog cat"
            }
          }
        ]
      }
    }
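Note: text_3 is "baz dog cat".
text_3 was indexed without expanding synonyms. Because the search analyzer expands my query foo and "baz" is one of its synonyms, I get the result.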
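The three searches can be replayed with a small toy model in Python. This is only a sketch of the idea, not how Elasticsearch works internally: `SYNONYMS`, `FIELDS`, `analyze` and `matches` are hypothetical names, tokenization is a plain whitespace split for both analyzers, and a document is assumed to match if it shares at least one term with the analyzed query (positions, scoring and lowercasing are ignored).

```python
# Rule from the synonyms file: foo => foo bar, baz
SYNONYMS = {"foo": ["foo bar", "baz"]}

# field -> (expand synonyms at index time, expand synonyms at search time)
FIELDS = {
    "text_1": (True, True),    # analyzer: synonym for both
    "text_2": (False, False),  # standard for both
    "text_3": (False, True),   # index: standard, search: synonym
}

DOC = {
    "text_1": "foo dog cat",
    "text_2": "foo dog cat",
    "text_3": "baz dog cat",
}


def analyze(text, expand):
    """Split on whitespace; optionally replace tokens using SYNONYMS."""
    terms = set()
    for token in text.split():
        if expand and token in SYNONYMS:
            for phrase in SYNONYMS[token]:
                terms.update(phrase.split())
        else:
            terms.add(token)
    return terms


def matches(field, query):
    """True if the analyzed query shares a term with the indexed field."""
    index_expand, search_expand = FIELDS[field]
    indexed = analyze(DOC[field], index_expand)
    searched = analyze(query, search_expand)
    return bool(indexed & searched)


if __name__ == "__main__":
    for field, query in [("text_1", "baz"), ("text_2", "baz"), ("text_3", "foo")]:
        print(field, query, matches(field, query))
```

In this model text_1 matches baz because the indexed terms were expanded, text_2 does not match because nothing is expanded on either side, and text_3 matches foo because the query itself is expanded to include baz.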
If you want to debug, you can use the _analyze endpoint, for example:
    curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'

    {
      "tokens": [
        {
          "token": "foo",
          "start_offset": 0,
          "end_offset": 3,
          "type": "SYNONYM",
          "position": 1
        },
        {
          "token": "baz",
          "start_offset": 0,
          "end_offset": 3,
          "type": "SYNONYM",
          "position": 1
        },
        {
          "token": "bar",
          "start_offset": 0,
          "end_offset": 3,
          "type": "SYNONYM",
          "position": 2
        }
      ]
    }
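When reading this output it can help to group the tokens by position. The Python sketch below does that for the response above; `tokens_by_position` is a hypothetical helper, and the only assumption about the response is the `tokens` / `token` / `position` layout shown here.

```python
import json
from collections import defaultdict


def tokens_by_position(analyze_response):
    """Group the token strings of an _analyze response by their position."""
    grouped = defaultdict(list)
    for tok in analyze_response["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(sorted(grouped.items()))


# The response shown above, trimmed to the fields the helper reads.
RESPONSE = json.loads("""
{
  "tokens": [
    {"token": "foo", "type": "SYNONYM", "position": 1},
    {"token": "baz", "type": "SYNONYM", "position": 1},
    {"token": "bar", "type": "SYNONYM", "position": 2}
  ]
}
""")

if __name__ == "__main__":
    print(tokens_by_position(RESPONSE))  # {1: ['foo', 'baz'], 2: ['bar']}
```

Tokens that share a position are alternatives for the same place in the text, so foo and baz both sit at position 1, while bar at position 2 is the second word of the two-word replacement "foo bar".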