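We are trying to find distinct inner objects in Elasticsearch. Below is a minimal example of our case. We are stuck with the following mapping (changing types or index settings, or adding new fields, is not a problem, but the structure should stay as it is):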
    {
      "building": {
        "properties": {
          "street": { "type": "string", "store": "yes", "index": "not_analyzed" },
          "house number": { "type": "string", "store": "yes", "index": "not_analyzed" },
          "city": { "type": "string", "store": "yes", "index": "not_analyzed" },
          "people": {
            "type": "object",
            "store": "yes",
            "index": "not_analyzed",
            "properties": {
              "firstName": { "type": "string", "store": "yes", "index": "not_analyzed" },
              "lastName": { "type": "string", "store": "yes", "index": "not_analyzed" }
            }
          }
        }
      }
    }
Assume we have the following sample data:
    {
      "buildings": [
        {
          "street": "Baker Street",
          "house number": "221 B",
          "city": "London",
          "people": [
            { "firstName": "John", "lastName": "Doe" },
            { "firstName": "Jane", "lastName": "Doe" }
          ]
        },
        {
          "street": "Baker Street",
          "house number": "5",
          "city": "London",
          "people": [
            { "firstName": "John", "lastName": "Doe" }
          ]
        },
        {
          "street": "Garden Street",
          "house number": "1",
          "city": "London",
          "people": [
            { "firstName": "Jane", "lastName": "Smith" }
          ]
        }
      ]
    }
When we query for the street "Baker Street" (plus whatever other options are needed), we would like to get the following list:
    [
      { "firstName": "John", "lastName": "Doe" },
      { "firstName": "Jane", "lastName": "Doe" }
    ]
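The format does not matter, but we must be able to parse the first name and last name from it. The entries need to be distinct, because our real dataset is much larger.
We are using Elasticsearch 1.7.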
We finally solved our problem.
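Our solution is (as we expected) a precomputed people_all field. Instead of using copy_to or transform, however, we write this additional field ourselves when importing the data. The field looks as follows: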
"people": { "type": "nested", .. "properties": { "firstName": { "type": "string", "store": "yes", "index": "not_analyzed" }, "lastName": { "type": "string", "store": "yes", "index": "not_analyzed" }, "people_all": { "type": "string", "index": "not_analyzed" } } }
Note the "index": "not_analyzed" setting on the people_all field. It is important for getting complete buckets. Without it, our example would return 3 buckets: "john", "jane" and "doe".
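To see where the three buckets come from, here is a rough Python simulation of how terms get counted. It assumes the default analyzer can be approximated by lowercasing and splitting on whitespace, which is a simplification; bucket_counts is a hypothetical helper and not part of Elasticsearch.

```python
from collections import Counter

# people_all values of the three nested person documents under "Baker Street"
values = ["John Doe", "Jane Doe", "John Doe"]


def bucket_counts(values, analyzed):
    """Count, per term, how many values (documents) contain that term."""
    counts = Counter()
    for value in values:
        # Assumption: analysis is approximated by lowercase + whitespace split.
        terms = set(value.lower().split()) if analyzed else {value}
        counts.update(terms)  # each document counts once per distinct term
    return dict(counts)


analyzed_buckets = bucket_counts(values, analyzed=True)
exact_buckets = bucket_counts(values, analyzed=False)
```

With analysis, each name is broken into separate lowercase terms, so the buckets no longer correspond to people. Without it, each full name stays a single term.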
"index": "not_analyzed"
Once this new field is written, we can run the following aggregation:
    {
      "size": 0,
      "query": {
        "term": { "street": "Baker Street" }
      },
      "aggs": {
        "people_distinct": {
          "nested": { "path": "people" },
          "aggs": {
            "people_all_distinct": {
              "terms": { "field": "people.people_all", "size": 0 }
            }
          }
        }
      }
    }
We get back the following response:
    {
      "took": 2,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": { "total": 2, "max_score": 0.0, "hits": [] },
      "aggregations": {
        "people_distinct": {
          "doc_count": 3,
          "people_all_distinct": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "John Doe", "doc_count": 2 },
              { "key": "Jane Doe", "doc_count": 1 }
            ]
          }
        }
      }
    }
From the buckets in this response we can now build the distinct people objects.
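A minimal Python sketch of that parsing step, assuming each bucket key is "firstName lastName" separated by a single space and that first names contain no spaces. people_from_response is a hypothetical helper, and the aggregation names are passed in because they must match whatever names the query used.

```python
def people_from_response(response, outer, inner):
    """Turn terms-aggregation buckets into a list of distinct person objects."""
    buckets = response["aggregations"][outer][inner]["buckets"]
    people = []
    for bucket in buckets:
        # Assumption: the key splits at the first space into first and last name.
        first, _, last = bucket["key"].partition(" ")
        people.append({"firstName": first, "lastName": last})
    return people


# Trimmed-down response containing only the parts the helper reads.
response = {
    "aggregations": {
        "people_distinct": {
            "doc_count": 3,
            "people_all_distinct": {
                "buckets": [
                    {"key": "John Doe", "doc_count": 2},
                    {"key": "Jane Doe", "doc_count": 1},
                ]
            },
        }
    }
}
people = people_from_response(response, "people_distinct", "people_all_distinct")
```

Splitting at the first space breaks for first names that contain a space, which is one reason parsing the key is fragile.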
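Please let us know if there is a better way to achieve our goal. Parsing the buckets is not an optimal solution; it would be nicer to have the firstName and lastName fields included in each bucket.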