I have a document that looks like this:
{ "_id":"some_id_value", "_source":{ "client":{ "name":"x" }, "project":{ "name":"x November 2016" } } }
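I am trying to run a query that gives me the count of unique project names for each client. To do this, I am using a cardinality aggregation on project.name in my query. I am sure that this particular client has only 4 unique project names. However, when I run the query I get a count of 5, which I know is wrong.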
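The project names all contain the client's name. For example, if the client is "X", the project names will be "X Testing November 2016", "X Jan 2016", and so on. I don't know whether that is a factor.
This is the mapping for the document type: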
{ "mappings":{ "vma_docs":{ "properties":{ "client":{ "properties":{ "contact":{ "type":"string" }, "name":{ "type":"string" } } }, "project":{ "properties":{ "end_date":{ "format":"yyyy-MM-dd", "type":"date" }, "project_type":{ "type":"string" }, "name":{ "type":"string" }, "project_manager":{ "index":"not_analyzed", "type":"string" }, "start_date":{ "format":"yyyy-MM-dd", "type":"date" } } } } } } }
This is my search query:
{ "fields":[ "client.name", "project.name" ], "query":{ "bool":{ "must":{ "match":{ "client.name":{ "operator":"and", "query":"ABC systems" } } } } }, "aggs":{ "num_projects":{ "cardinality":{ "field":"project.name" } } }, "size":5 }
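These are the results I get (I have posted only 2 hits for brevity). Note that the num_projects aggregation returns 5, but it should return only 4, which is the total number of projects.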
{ "hits":{ "hits":[ { "_score":5.8553367, "_type":"vma_docs", "_id":"AVTMIM9IBwwoAW3mzgKz", "fields":{ "project.name":[ "ABC" ], "client.name":[ "ABC systems Pvt Ltd" ] }, "_index":"vma" }, { "_score":5.8553367, "_type":"vma_docs", "_id":"AVTMIM9YBwwoAW3mzgK2", "fields":{ "project.name":[ "ABC" ], "client.name":[ "ABC systems Pvt Ltd" ] }, "_index":"vma" } ], "total":18, "max_score":5.8553367 }, "_shards":{ "successful":5, "failed":0, "total":5 }, "took":4, "aggregations":{ "num_projects":{ "value":5 } }, "timed_out":false }
FYI: the project names are ABC, ABC Nov 2016, ABC retest November, and ABC Mobile App.
You need the following mapping for your project.name field:
{ "mappings": { "vma_docs": { "properties": { "client": { "properties": { "contact": { "type": "string" }, "name": { "type": "string" } } }, "project": { "properties": { "end_date": { "format": "yyyy-MM-dd", "type": "date" }, "project_type": { "type": "string" }, "name": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } }, "project_manager": { "index": "not_analyzed", "type": "string" }, "start_date": { "format": "yyyy-MM-dd", "type": "date" } } } } } } }
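Basically, raw is a sub-field, project.name.raw, that stores the same value as project.name but leaves it untouched (not tokenized or analyzed). Because project.name itself is an analyzed string, the cardinality aggregation on it counts distinct tokens rather than distinct full project names, which is why the count is off. Documents indexed before the mapping change have to be reindexed for the raw sub-field to be populated. The query you then need is: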
{ "fields": [ "client.name", "project.name" ], "query": { "bool": { "must": { "match": { "client.name": { "operator": "and", "query": "ABC systems" } } } } }, "aggs": { "num_projects": { "cardinality": { "field": "project.name.raw" } } }, "size": 5 }
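To see why the analyzed field and the raw sub-field give different counts, here is a minimal Python sketch that imitates the effect without touching Elasticsearch. The analyze helper is hypothetical and only a rough stand-in for an analyzer (lowercase, split on non-alphanumeric characters). The four names are the ones listed in the question, so the token count printed here will not necessarily match the 5 reported above, which depends on the values actually indexed.

```python
import re


def analyze(text):
    """Hypothetical, simplified analyzer: lowercase and split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


# Project names as listed in the question.
project_names = ["ABC", "ABC Nov 2016", "ABC retest November", "ABC Mobile App"]

# An analyzed field is indexed as individual tokens, so a distinct count
# over it counts distinct tokens across all values.
analyzed_terms = set()
for name in project_names:
    analyzed_terms.update(analyze(name))

# A not_analyzed field is indexed as one term per whole value.
raw_terms = set(project_names)

analyzed_cardinality = len(analyzed_terms)
raw_cardinality = len(raw_terms)

print("distinct tokens (analyzed):", analyzed_cardinality)
print("distinct values (raw):", raw_cardinality)
```

The distinct count over whole values is the number the question is after, which is what aggregating on project.name.raw gives you.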