Elasticsearch Aggregations (Aggregating and Classifying Documents)

Aggregation

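Elasticsearch supports the following kinds of aggregations.

Metrics: calculate a value, such as an average, over the matching documents.
Buckets: classify the matching documents into groups by a condition.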

First, create the demo_agg index to try aggregation queries.

Create the demo_agg index

POST /_bulk
{ "index" : { "_index" : "demo_agg" , "_id" : "1" }}
{ "text" : "This is Elasticsearch test." , "type" : 1 }
{ "index" : { "_index" : "demo_agg" , "_id" : "2" }}
{ "text" : "Elasticsearch is God." , "type" : 2 }
{ "index" : { "_index" : "demo_agg" , "_id" : "3" }}
{ "text" : "This is a pen." , "type" : 3 }
{ "index" : { "_index" : "demo_agg" , "_id" : "4" }}
{ "text" : "I have a pen." , "type" : 2 }
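
The _bulk request body above is newline-delimited JSON: each document takes an action line followed by a source line, and the body must end with a newline. As a rough illustration, the following Python sketch builds the same body from a list of documents. The helper name build_bulk_body is hypothetical, and nothing is sent to a cluster.

```python
import json

# Documents from the demo above: (id, text, type).
DOCS = [
    ("1", "This is Elasticsearch test.", 1),
    ("2", "Elasticsearch is God.", 2),
    ("3", "This is a pen.", 3),
    ("4", "I have a pen.", 2),
]


def build_bulk_body(index, docs):
    """Return an NDJSON string: one action line plus one source line per document."""
    lines = []
    for doc_id, text, type_value in docs:
        # Action line: which index and id the next line is stored under.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps({"text": text, "type": type_value}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("demo_agg", DOCS)
```

The resulting string could be posted to /_bulk with any HTTP client using the Content-Type application/x-ndjson.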

Metrics

Run a metric aggregation against the demo_agg index. First, use the avg aggregation to calculate the average of the type field.

avg query

GET demo_agg/_search
{
  "size": 0,
  "aggs": {
    "hoge_name": {
      "avg": {
        "field":"type"
      }
    }
  }
}

avg query result

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "hoge_name" : {
      "value" : 2.0
    }
  }
}

The value is 2.0. The type values of the four documents are [1,2,3,2], so the average is (1+2+3+2)/4 = 2, which matches the result.

Also, "size": 0 tells the _search API to return zero hits, so the matching documents themselves are not listed in the response. The aggregation is still calculated over every matching document. If you increase this value, the response also lists up to that many of the matching documents.
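
To make the role of size concrete, here is a minimal Python sketch that imitates the two parts of a search response locally. The function simulate_search and the DOCS list are stand-ins invented for this illustration. They do not call Elasticsearch. They only show that size limits the listed hits while the average covers all matching documents.

```python
# Local stand-in for the four demo_agg documents; no cluster is contacted.
DOCS = [
    {"_id": "1", "type": 1},
    {"_id": "2", "type": 2},
    {"_id": "3", "type": 3},
    {"_id": "4", "type": 2},
]


def simulate_search(docs, size):
    """Return hits limited by size and an average computed over every document."""
    hits = docs[:size]  # size only limits how many documents are listed
    avg = sum(d["type"] for d in docs) / len(docs)  # aggregation ignores size
    return {"hits": hits, "avg": avg}


no_hits = simulate_search(DOCS, 0)
two_hits = simulate_search(DOCS, 2)
```

Both calls produce the same average. Only the length of the hits list differs.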

The following aggregations are also available.

sum: returns the total value.
max: returns the maximum value.
min: returns the minimum value.
stats: returns the count, min, max, avg, and sum in a single response.
cardinality: returns the number of distinct values. For example, [1,2,3,2] has three distinct values: 1, 2, and 3.
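
As a quick check of what these metrics mean for the demo data, the following Python sketch computes them locally for the type values [1,2,3,2]. The function names are hypothetical. Note also that Elasticsearch's cardinality aggregation is an approximate count, while this sketch counts exactly.

```python
# The type values of the four demo_agg documents.
values = [1, 2, 3, 2]


def stats(values):
    """Compute the same five numbers that the stats aggregation reports."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


def cardinality(values):
    """Count distinct values exactly (Elasticsearch approximates this)."""
    return len(set(values))


summary = stats(values)
distinct = cardinality(values)
```

For this data the sum is 8, the maximum is 3, the minimum is 1, and there are three distinct values.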

Buckets

You can create buckets for specified conditions and place matching documents into those buckets.

Classifying by range

Create buckets

GET demo_agg/_search
{
  "size": 0,
  "aggs": {
    "bucket_name": {
      "range": {
        "field":"type",
        "ranges": [
          {
            "from": 0, 
            "to":2
          },
          {
            "from": 2
          }
        ]
      }
    }
  }
}

Bucket creation result

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "bucket_name" : {
      "buckets" : [
        {
          "key" : "0.0-2.0",
          "from" : 0.0,
          "to" : 2.0,
          "doc_count" : 1
        },
        {
          "key" : "2.0-*",
          "from" : 2.0,
          "doc_count" : 3
        }
      ]
    }
  }
}

From the result, you can confirm the following.

The bucket where the value of the "type" field is greater than or equal to 0 and less than 2 contains one document. (type=[1])
The bucket where the value of the "type" field is greater than or equal to 2 contains three documents. (type=[2,3,2])
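
The boundary rule above (from is inclusive, to is exclusive) can be sketched in Python as follows. This is a local imitation under that assumption, not the Elasticsearch implementation, and range_buckets is a name made up for the example.

```python
def range_buckets(values, ranges):
    """Count values per range: "from" is inclusive, "to" is exclusive."""
    buckets = []
    for r in ranges:
        lower = r.get("from")  # None means no lower bound
        upper = r.get("to")    # None means no upper bound
        count = sum(
            1
            for v in values
            if (lower is None or v >= lower) and (upper is None or v < upper)
        )
        buckets.append({"from": lower, "to": upper, "doc_count": count})
    return buckets


# Same data and ranges as the request above.
result = range_buckets([1, 2, 3, 2], [{"from": 0, "to": 2}, {"from": 2}])
```

Because to is exclusive, a document with type 2 falls into the second bucket only, so no document is counted twice.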

Classifying by equal intervals (histogram)

You can also classify values by equal intervals with the histogram query.

histogram query

GET demo_agg/_search
{
  "size": 0,
  "aggs": {
    "bucket_name": {
      "histogram": {
        "field": "type",
        "interval": 1
      }
    }
  }
}

histogram query result

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "bucket_name" : {
      "buckets" : [
        {
          "key" : 1.0,
          "doc_count" : 1
        },
        {
          "key" : 2.0,
          "doc_count" : 2
        },
        {
          "key" : 3.0,
          "doc_count" : 1
        }
      ]
    }
  }
}

As shown above, the documents are classified into buckets of width 1, the specified interval: one document with type 1, two with type 2, and one with type 3.
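
The bucket key for each value is the value rounded down to a multiple of the interval. A minimal Python sketch of that rule, assuming no offset is set, is shown below. The name histogram_buckets is hypothetical, and unlike Elasticsearch's default behavior the sketch does not emit empty buckets for gaps between keys.

```python
import math
from collections import Counter


def histogram_buckets(values, interval):
    """Group values by key = floor(value / interval) * interval."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    # Return buckets in ascending key order, with float keys as in the response.
    return [{"key": float(k), "doc_count": counts[k]} for k in sorted(counts)]


by_one = histogram_buckets([1, 2, 3, 2], 1)
by_two = histogram_buckets([1, 2, 3, 2], 2)
```

With an interval of 1 this reproduces the three buckets in the result above. With an interval of 2 the values 2, 3, and 2 share the bucket with key 2.0, and the value 1 falls into the bucket with key 0.0.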