I'm having a lot of trouble understanding the basics of the ES query system.
I have the following example query:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "term": { "referer": "www.xx.yy.com" } },
        { "range": { "@timestamp": { "gte": "now", "lt": "now-1h" } } }
      ]
    }
  },
  "aggs": {
    "interval": {
      "date_histogram": { "field": "@timestamp", "interval": "0.5h" },
      "aggs": {
        "what": { "cardinality": { "field": "host" } }
      }
    }
  }
}
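This request pulls in too much data and fails with:
"status":500,"reason":"ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: Data too large, data for field [@timestamp] would be larger than limit of [3200306380/2.9gb]]; nested: UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException: Data too large, data for field [@timestamp] would be larger than limit of [3200306380/2.9gb]]; nested: CircuitBreakingException[Data too large, data for field [@timestamp] would be larger than limit of [3200306380/2.9gb]]; "
I also tried this request: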
{
  "size": 0,
  "filter": {
    "and": [
      { "term": { "referer": "www.geoportail.gouv.fr" } },
      { "range": { "@timestamp": { "from": "2014-10-04", "to": "2014-10-05" } } }
    ]
  },
  "aggs": {
    "interval": {
      "date_histogram": { "field": "@timestamp", "interval": "0.5h" },
      "aggs": {
        "what": { "cardinality": { "field": "host" } }
      }
    }
  }
}
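I would like to filter the data so that I get a correct result. Any help would be much appreciated!
I found a solution, and it is a bit odd. I followed dimzak's advice and cleared the cache: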
curl --noproxy localhost -XPOST "http://localhost:9200/_cache/clear"
Then, as Olly suggested, I used a filter for the time range instead of a query:
{
  "size": 0,
  "query": {
    "filtered": {
      "query": { "term": { "referer": "www.xx.yy.fr" } },
      "filter": {
        "range": { "@timestamp": { "from": "2014-10-04T00:00", "to": "2014-10-05T00:00" } }
      }
    }
  },
  "aggs": {
    "interval": {
      "date_histogram": { "field": "@timestamp", "interval": "0.5h" },
      "aggs": {
        "what": { "cardinality": { "field": "host" } }
      }
    }
  }
}
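In case it helps anyone reusing this, here is a rough sketch of how the same request body could be built in Python for an arbitrary time window. The function name `build_query` and its parameters are made up for illustration, and the code only builds the JSON body without sending it. It also rejects a window whose start is not earlier than its end, since a reversed range matches nothing. Note that the `filtered` query is specific to older Elasticsearch versions (it was deprecated in 2.0 and removed in 5.0), so treat this as a sketch for the version used above.

```python
import json
from datetime import datetime


def build_query(referer, start, end, interval="0.5h"):
    """Build the request body used above (hypothetical helper).

    The referer is matched with a term query, the time window is applied
    as a range filter on @timestamp, and distinct hosts are counted per
    histogram bucket.
    """
    if start >= end:
        # A reversed window would match no documents at all.
        raise ValueError("start must be earlier than end")
    fmt = "%Y-%m-%dT%H:%M"
    return {
        "size": 0,
        "query": {
            "filtered": {
                "query": {"term": {"referer": referer}},
                "filter": {
                    "range": {
                        "@timestamp": {
                            "from": start.strftime(fmt),
                            "to": end.strftime(fmt),
                        }
                    }
                },
            }
        },
        "aggs": {
            "interval": {
                "date_histogram": {"field": "@timestamp", "interval": interval},
                "aggs": {"what": {"cardinality": {"field": "host"}}},
            }
        },
    }


body = build_query("www.xx.yy.fr", datetime(2014, 10, 4), datetime(2014, 10, 5))
print(json.dumps(body, indent=2))
```

The printed JSON can then be passed as the request body to the search endpoint, for example with curl's `-d` option.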
I can't accept both of your answers. I think dimzak's is the best fit, but I have upvoted you both :)