I want to run a fairly complex query/aggregation. I can't see how to do it, since I have only just started using ES. My documents look like this:
    {
      "keyword": "some keyword",
      "items": [
        { "name": "my first item", "item_property_1": "A", ( other properties here ) },
        { "name": "my second item", "item_property_1": "B", ( other properties here ) },
        { "name": "my third item", "item_property_1": "A", ( other properties here ) }
      ]
      ( other properties... )
    },
    {
      "keyword": "different keyword",
      "items": [
        { "name": "cool item", "item_property_1": "A", ( other properties here ) },
        { "name": "awesome item", "item_property_1": "C", ( other properties here ) }
      ]
      ( other properties... )
    },
    ( other documents... )
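Now, for each keyword, I want to count how many items have each of the possible values of item_property_1. That is, I need a bucket aggregation with the following response: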
    {
      "keyword": "some keyword",
      "item_property_1_aggretation": [
        { "key": "A", "count": 2 },
        { "key": "B", "count": 1 }
      ]
    },
    {
      "keyword": "different keyword",
      "item_property_1_aggretation": [
        { "key": "A", "count": 1 },
        { "key": "C", "count": 1 }
      ]
    },
    ( other keywords... )
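If a mapping is needed, could you also specify which one? I don't have any non-default mappings; I just dumped everything in there.
EDIT: To save you the trouble, here is a bulk PUT of the previous example: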
    PUT /test/test/_bulk
    { "index": {}}
    { "keyword": "some keyword", "items": [ { "name":"my first item", "item_property_1":"A" }, { "name":"my second item", "item_property_1":"B" }, { "name":"my third item", "item_property_1":"A" } ]}
    { "index": {}}
    { "keyword": "different keyword", "items": [ { "name":"cool item", "item_property_1":"A" }, { "name":"awesome item", "item_property_1":"C" } ]}
EDIT 2:
I just tried this:
    POST /test/test/_search
    {
      "size": 2,
      "aggregations": {
        "property_1_count": {
          "terms": {
            "field": "item_property_1"
          }
        }
      }
    }
and got this:
    "aggregations": {
      "property_1_count": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          { "key": "a", "doc_count": 2 },
          { "key": "b", "doc_count": 1 },
          { "key": "c", "doc_count": 1 }
        ]
      }
    }
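Close, but no cigar. You can see what is happening: it buckets item_property_1 regardless of which keyword the items belong to. I'm sure the solution involves adding the right mapping, but I can't quite put my finger on it. Any suggestions?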
EDIT 3: Based on this: https://www.elastic.co/guide/zh-cn/elasticsearch/reference/current/mapping-nested-type.html I wanted to try adding a nested type to the items property. To do so, I tried:
    PUT /test/_mapping/test
    {
      "test": {
        "properties": {
          "items": {
            "type": "nested",
            "properties": {
              "item_property_1": { "type": "string" }
            }
          }
        }
      }
    }
However, this returns an error:
{ "error": "MergeMappingException[Merge failed with failures {[object mapping [items] can't be changed from non-nested to nested]}]", "status": 400 }
This probably has to do with the warning on that page: "Changing an object type to nested type requires reindexing."
So, how do I do this?
Nice try, you were almost there! Here is what I came up with. Based on your mapping proposal, the mapping I'm using is as follows:
    curl -XPUT localhost:9200/test/_mapping/test -d '{
      "test": {
        "properties": {
          "keyword": {
            "type": "string",
            "index": "not_analyzed"
          },
          "items": {
            "type": "nested",
            "properties": {
              "name": {
                "type": "string"
              },
              "item_property_1": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          }
        }
      }
    }'
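Note: you need to wipe your data and reindex, because you cannot change a field's type from non-nested to nested. Both keyword and item_property_1 are also set to not_analyzed so that the buckets keep the exact values ("some keyword", "A") instead of lowercased, tokenized terms like the "a", "b", "c" you got in your attempt.
Then I created some data using the bulk request you shared: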
    curl -XPOST localhost:9200/test/test/_bulk -d '
    { "index": {}}
    { "keyword": "some keyword", "items": [ { "name":"my first item", "item_property_1":"A" }, { "name":"my second item", "item_property_1":"B" }, { "name":"my third item", "item_property_1":"A" } ]}
    { "index": {}}
    { "keyword": "different keyword", "items": [ { "name":"cool item", "item_property_1":"A" }, { "name":"awesome item", "item_property_1":"C" } ]}
    '
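If you would rather generate the bulk body from code than paste it by hand, here is a minimal Python sketch. The helper name `build_bulk_body` is made up for illustration; it only builds the newline-delimited payload (one action line followed by one source line per document, ending with a newline) and does not talk to Elasticsearch.

```python
import json


def build_bulk_body(docs):
    """Build a newline-delimited bulk body (hypothetical helper).

    Each document becomes two lines: an {"index": {}} action line
    followed by the document source on a single line.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"keyword": "some keyword", "items": [
        {"name": "my first item", "item_property_1": "A"},
        {"name": "my second item", "item_property_1": "B"},
        {"name": "my third item", "item_property_1": "A"},
    ]},
    {"keyword": "different keyword", "items": [
        {"name": "cool item", "item_property_1": "A"},
        {"name": "awesome item", "item_property_1": "C"},
    ]},
]

body = build_bulk_body(docs)
print(body)
```

The resulting string can be sent as the request body to the same /test/test/_bulk endpoint used above.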
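Finally, here is the aggregation query you can use to get the expected result. We first bucket by keyword using a terms aggregation, and then, for each keyword, we bucket by the nested item_property_1 field. Since items is now of type nested, the key is to use a nested aggregation on items, followed by a terms sub-aggregation on the items.item_property_1 field.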
    {
      "size": 0,
      "aggregations": {
        "by_keyword": {
          "terms": {
            "field": "keyword"
          },
          "aggs": {
            "prop_1_count": {
              "nested": {
                "path": "items"
              },
              "aggs": {
                "prop_1": {
                  "terms": {
                    "field": "items.item_property_1"
                  }
                }
              }
            }
          }
        }
      }
    }
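To see why the nested step matters, here is a small plain-Python model of the two counting modes. It is only an illustration, not Elasticsearch code, and the function names are invented: a terms aggregation on a non-nested field counts documents, so a document holding "A" twice contributes 1, whereas under a nested aggregation every item is counted on its own. This is consistent with your earlier attempt, where the first bucket had a doc_count of 2 even though three items carry the value A.

```python
from collections import Counter

docs = [
    {"keyword": "some keyword", "items": [
        {"name": "my first item", "item_property_1": "A"},
        {"name": "my second item", "item_property_1": "B"},
        {"name": "my third item", "item_property_1": "A"},
    ]},
    {"keyword": "different keyword", "items": [
        {"name": "cool item", "item_property_1": "A"},
        {"name": "awesome item", "item_property_1": "C"},
    ]},
]


def count_documents(docs):
    """Model of a non-nested terms aggregation: one count per document per distinct value."""
    counts = Counter()
    for doc in docs:
        counts.update({item["item_property_1"] for item in doc["items"]})
    return dict(counts)


def count_items(docs):
    """Model of a nested + terms aggregation: every item counts separately."""
    counts = Counter()
    for doc in docs:
        counts.update(item["item_property_1"] for item in doc["items"])
    return dict(counts)


print(count_documents(docs))  # document-level counts
print(count_items(docs))      # item-level counts
```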
Running the aggregation query on your data set yields the following result:
    {
      ...
      "aggregations" : {
        "by_keyword" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [ {
            "key" : "different keyword",       <---- keyword 1
            "doc_count" : 1,
            "prop_1_count" : {
              "doc_count" : 2,
              "prop_1" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [ {                <---- buckets for item_property_1
                  "key" : "A",
                  "doc_count" : 1
                }, {
                  "key" : "C",
                  "doc_count" : 1
                } ]
              }
            }
          }, {
            "key" : "some keyword",            <---- keyword 2
            "doc_count" : 1,
            "prop_1_count" : {
              "doc_count" : 3,
              "prop_1" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [ {                <---- buckets for item_property_1
                  "key" : "A",
                  "doc_count" : 2
                }, {
                  "key" : "B",
                  "doc_count" : 1
                } ]
              }
            }
          } ]
        }
      }
    }
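The response is nested one level deeper than the shape you asked for, so you may want to flatten it on the client side. Here is a rough Python sketch, assuming the response has already been parsed into a dict; the `reshape` function is a made-up helper, and the output key keeps the spelling used in your question.

```python
def reshape(response):
    """Flatten the by_keyword / prop_1_count / prop_1 buckets (hypothetical helper)."""
    result = []
    for kw_bucket in response["aggregations"]["by_keyword"]["buckets"]:
        inner = kw_bucket["prop_1_count"]["prop_1"]["buckets"]
        result.append({
            "keyword": kw_bucket["key"],
            # Key name copied from the question, spelling included.
            "item_property_1_aggretation": [
                {"key": b["key"], "count": b["doc_count"]} for b in inner
            ],
        })
    return result


# Trimmed copy of the response above, keeping only the fields the helper reads.
response = {
    "aggregations": {
        "by_keyword": {
            "buckets": [
                {"key": "different keyword", "doc_count": 1,
                 "prop_1_count": {"doc_count": 2, "prop_1": {"buckets": [
                     {"key": "A", "doc_count": 1},
                     {"key": "C", "doc_count": 1},
                 ]}}},
                {"key": "some keyword", "doc_count": 1,
                 "prop_1_count": {"doc_count": 3, "prop_1": {"buckets": [
                     {"key": "A", "doc_count": 2},
                     {"key": "B", "doc_count": 1},
                 ]}}},
            ]
        }
    }
}

for entry in reshape(response):
    print(entry)
```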