Elasticsearch Summary

ES

Common index operations

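//Create an index (creating an index in ES is similar to creating a database in a relational DB; since ES 6.0 it is closer to creating a table)
//Set the number of primary shards to 3 and the number of replicas to 2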
PUT twitter
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, 
            "number_of_replicas" : 2 
        }
    }
}
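//Default number_of_shards: 5 before ES 7.0, 1 from ES 7.0 on; by default an index can have at most 1024 shards
//Default number_of_replicas: 1
//number_of_shards cannot be changed through the settings API after creation; number_of_replicas can be changed at any time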
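The two settings multiply. Each primary shard gets number_of_replicas extra copies, so the cluster has to host primaries × (1 + replicas) shard copies. A single write goes to one primary plus its replicas. The Python sketch below only does that arithmetic and does not talk to a cluster. The helper names are hypothetical.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primary shards plus all of their replica copies (hypothetical helper)."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need >= 1 primary shard and >= 0 replicas")
    return number_of_shards * (1 + number_of_replicas)


def copies_per_write(number_of_replicas: int) -> int:
    """A write to one document goes to its primary shard and every replica of it."""
    return 1 + number_of_replicas


# Settings from the twitter example: 3 primaries, 2 replicas each
print(total_shard_copies(3, 2))  # 9
print(copies_per_write(2))       # 3
```

ES never allocates a replica on the same node as its primary. All 9 copies can therefore only be assigned on a cluster with at least 3 nodes. On a single node the replicas stay unassigned (yellow health), so a write reports 3 target copies but only 1 successful one.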

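//Create a mapping
//Creating a mapping in ES is similar to defining a table schema in a database: which fields the table has, the type of each field, and related field settings
//(twitter already exists from the previous step; DELETE it first, because creating an index that already exists fails)
PUT twitter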
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, 
            "number_of_replicas" : 2 
        }
    },
    "mappings" : {
        "type1" : {
            "properties" : {
                "field1" : { "type" : "text" }
            }
        }
    }
}
// type1 is the mapping type, field1 is the field name, and text is the field type.

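//View an index's definition
GET /twitter
//View the definitions of all indices
GET /_all

//Delete an index
DELETE /twitter
//Delete all indices (rejected when action.destructive_requires_name is true, the default from ES 8.0)
DELETE /_all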

//Check whether an index exists (200 if it does, 404 if not)
HEAD twitter

//Index statistics (monitoring)
GET /_stats
//Statistics for specific indices
GET /index1,index2/_stats

Common mapping operations

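A mapping defines which fields an index has, their types, and other structural information. It is the equivalent of a table schema in a relational database, or of the schema in Solr. It is needed because Lucene must know how to index and store each field of a document. ES supports two approaches: explicit (manual) mapping and dynamic mapping.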

//Create a mapping for index test
PUT test
{
    //Mapping definition
    "mappings" : {
        //A mapping type named type1
        "type1" : {
            //Field definitions
            "properties" : {
                //A field named field1 whose field datatype is text
                "field1" : { "type" : "text" }
            }
        }
    }
}
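//Removal of mapping types
//To match later versions, name the single mapping type "_doc", because index request URLs are being standardized as PUT {index}/_doc/{id} and POST {index}/_doc
//(From ES 7.0 on, mappings no longer take a type name: "properties" sits directly under "mappings".)
So the mapping becomes: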
PUT twitter
{
  "mappings": {
    "_doc": { // the type name is changed to _doc
      "properties": {
        "type": { "type": "keyword" }, 
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" },
        "content": { "type": "text" },
        "tweeted_at": { "type": "date" }
      }
    }
  }
}

Dynamic mapping

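Dynamic mapping is an important ES feature. It lets us start using ES quickly without first creating an index and defining a mapping. For example, we can submit a document for indexing directly: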

PUT data/_doc/1
{ "count": 5 }

ES automatically creates the data index, the _doc mapping type, and a field count of type long.

When an indexed document contains a new field, ES infers a field definition from the field's JSON data type and adds it to the mapping automatically.
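The type inference can be sketched in Python. The sketch follows the default JSON-to-ES rules: booleans become boolean, integers long, floating-point numbers float, objects inner objects, and arrays take the type of their first non-null element. Strings become date when they look like a date, and otherwise text with a keyword sub-field. It is a simplified model, not ES's implementation. The date check in particular is an assumed, much narrower stand-in for ES's date detection, and numeric detection on strings (off by default) is ignored.

```python
import re
from typing import Any, Dict, Optional

# Assumption: a crude stand-in for ES date detection that only recognises
# yyyy-MM-dd with an optional THH:mm:ss part.
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?")


def infer_field(value: Any) -> Optional[Dict[str, Any]]:
    """Guess a field mapping from a JSON value (simplified dynamic mapping)."""
    if value is None:
        return None                      # null adds no field
    if isinstance(value, bool):          # test bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        return {"properties": infer_mapping(value)}   # inner object
    if isinstance(value, list):
        # an array takes the type of its first non-null element
        for item in value:
            if item is not None:
                return infer_field(item)
        return None
    if isinstance(value, str):
        if _DATE_RE.fullmatch(value):
            return {"type": "date"}
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    raise TypeError(f"not a JSON value: {value!r}")


def infer_mapping(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Build a 'properties' dict for every field that gets a mapping."""
    props: Dict[str, Any] = {}
    for name, value in doc.items():
        field = infer_field(value)
        if field is not None:
            props[name] = field
    return props


print(infer_mapping({"count": 5}))
# {'count': {'type': 'long'}}
print(infer_mapping({"user": "kimchy", "post_date": "2009-11-15T14:12:12"}))
```

Once a field's type has been fixed this way, later documents cannot change it. This is why an explicit mapping is usually preferred for fields whose type matters.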

Analyzer

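In ES, an analyzer is built from the following three kinds of components:

character filter: preprocesses the raw text, e.g. stripping HTML tags, before handing it to the tokenizer. An analyzer can have zero or more character filters, applied in the configured order.
tokenizer: splits the text into tokens. An analyzer must have exactly one tokenizer.
token filter: processes the tokens produced by the tokenizer, e.g. lowercasing, stop-word removal, or synonym handling. An analyzer can have zero or more token filters, applied in the configured order.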
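The three-stage pipeline can be modelled as plain Python functions. The sketch runs zero or more char filters, then exactly one tokenizer, then zero or more token filters, each in order. The individual filters are deliberately crude, hypothetical stand-ins for ES's html_strip, standard tokenizer, lowercase and stop components, not their real behaviour.

```python
import re
from typing import Callable, Iterable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def html_strip(text: str) -> str:
    # crude stand-in for a char filter: replace anything that looks like a tag with a space
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text: str) -> List[str]:
    # split on runs of non-word characters (simplified, not the standard tokenizer)
    return [t for t in re.split(r"\W+", text) if t]


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def make_stop_filter(stopwords: Iterable[str]) -> TokenFilter:
    stop = set(stopwords)
    return lambda tokens: [t for t in tokens if t not in stop]


def analyze(text: str, char_filters: List[CharFilter], tokenizer: Tokenizer,
            token_filters: List[TokenFilter]) -> List[str]:
    for cf in char_filters:      # 0..n character filters, in configured order
        text = cf(text)
    tokens = tokenizer(text)     # exactly one tokenizer
    for tf in token_filters:     # 0..n token filters, in configured order
        tokens = tf(tokens)
    return tokens


print(analyze("<p>Trying OUT the Elasticsearch</p>",
              [html_strip], word_tokenizer,
              [lowercase, make_stop_filter({"the", "a"})]))
# ['trying', 'out', 'elasticsearch']
```

Order matters. If the stop filter ran before lowercase, a capitalised "The" would survive. On a real cluster the same chain can be tried with POST _analyze, passing "char_filter", "tokenizer", "filter" and "text" in the body.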

Document management

//Index a document with a given id (creates it, or replaces it if it already exists)
PUT twitter/_doc/1
{
    "id": 1,
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

//Create a document with an auto-generated id
POST twitter/_doc/
{
    "id": 1,
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
------------
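//  Response
{
  "_index" : "twitter",   // index the document belongs to
  "_type" : "_doc",       // mapping type
  "_id" : "ec5B9mwB-bWstPBClp9t",   // document id
  "_version" : 1,         // document version
  "result" : "created",   
  "_shards" : {            // how the write went on the shard copies
    "total" : 3,           // shard copies (primary + replicas) the write targets
    "successful" : 1,      // copies written successfully
    "failed" : 0           // copies that failed
  },
  "_seq_no" : 0,           // sequence number the shard assigned to this operation (grows with every write on the shard)
  "_primary_term" : 1      // primary term: incremented each time a new primary is assigned for the shard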
}

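//Get a single document
HEAD twitter/_doc/1   //only checks whether the document exists
GET twitter/_doc/1
Get the document without its source:
GET twitter/_doc/1?_source=false
Get only the document's source:
GET twitter/_doc/1/_source
(In ES 7+ the last request is written GET twitter/_source/1.)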

//Delete a document
Delete by document id:
DELETE twitter/_doc/1
Delete with version control (succeeds only if the current version is 1):
DELETE twitter/_doc/1?version=1

//Update a document
// Replace the document with the given id
PUT twitter/_doc/1
{
    "id": 1,
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
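----------------Optimistic concurrency control for updates
//Succeeds only if the document's current version is 1; otherwise ES returns a version conflict (HTTP 409). From ES 6.7 on, if_seq_no and if_primary_term are the recommended parameters for this.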
PUT twitter/_doc/1?version=1
{
    "id": 1,
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
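The version check behind ?version=1 is a compare-and-set. A write carries the version the client last saw, and it is rejected if the stored version has moved on. The in-memory Python model below shows two clients racing. Everything in it is hypothetical: the class names, the exception standing in for the 409 response, and the error message. It does not reproduce ES's actual response format.

```python
from typing import Any, Dict, Optional, Tuple


class VersionConflict(Exception):
    """Plays the role of ES's HTTP 409 version conflict response."""


class VersionedStore:
    """Hypothetical in-memory store with version-checked writes."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def get(self, doc_id: str) -> Tuple[int, Dict[str, Any]]:
        version, source = self._docs[doc_id]
        return version, dict(source)

    def put(self, doc_id: str, source: Dict[str, Any],
            version: Optional[int] = None) -> int:
        current = self._docs.get(doc_id)
        current_version = current[0] if current else None
        # With ?version=N the write is rejected unless N is the current version
        if version is not None and version != current_version:
            raise VersionConflict(
                f"doc {doc_id}: expected version {version}, found {current_version}")
        new_version = (current_version or 0) + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


store = VersionedStore()
store.put("1", {"user": "kimchy"})                          # version 1
v, doc = store.get("1")                                     # clients A and B both read version 1
store.put("1", {**doc, "message": "from A"}, version=v)     # A wins, version becomes 2
try:
    store.put("1", {**doc, "message": "from B"}, version=v) # B is stale and is rejected
except VersionConflict as err:
    print("conflict:", err)
```

The losing client is expected to re-read the document, reapply its change and retry, instead of silently overwriting A's update.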

Search

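//Search index twitter for documents whose user field is k
GET /twitter/_search?q=user:k
//Search indices tweet and user for documents whose user field is k
GET /tweet,user/_search?q=user:k
//Search all indices for documents whose tag field is wow
GET /_all/_search?q=tag:wow
GET /_search?q=tag:wow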

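Special search tricks

If we only want to know how many documents match a query, set size to 0 (the count is in hits.total):

GET /bank/_search?q=city:b*&size=0

If we only want to know whether any document matches a query, also let each shard stop after its first match:

GET /bank/_search?q=city:b*&size=0&terminate_after=1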

Request body search

Request body search defines the query as JSON in the request body. The HTTP method can be GET or POST.

GET /twitter/_search
{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
--------
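Available parameters (query and the parameters below are siblings in the body; search_type and request_cache, however, must be passed as query-string parameters):
query: defines the query using the Query DSL.
timeout: request timeout; the search responds within this time even if it has not finished.
from: offset of the first hit, for paging; default 0.
size: number of hits to return (page size); default 10.
request_cache: whether to use the shard request cache for this request (by default only requests with size 0 are cached).
terminate_after: maximum number of documents to collect per shard. If set, the response includes a boolean field terminated_early indicating whether execution actually terminated early. Not set by default.
search_type: how the query is executed, dfs_query_then_fetch or query_then_fetch; default query_then_fetch.
batched_reduce_size: number of shard results reduced at once on the coordinating node. When a request may hit a large number of shards, use it as a safeguard to reduce the memory overhead of each search request.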
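from and size are usually derived from a page number. The Python helper below is hypothetical and does that conversion. It also enforces the default index.max_result_window of 10000, because ES rejects requests where from + size exceeds that setting.

```python
MAX_RESULT_WINDOW = 10_000  # default value of the index.max_result_window setting


def page_params(page: int, page_size: int) -> dict:
    """Turn a 1-based page number into from/size (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    if start + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": page_size}


print(page_params(3, 10))  # {'from': 20, 'size': 10}
```

Deep paging is expensive because every shard has to produce from + size hits. For walking far into a result set, search_after or the scroll API is the usual alternative to raising the window.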

Choosing what to return (also siblings of query)

source filter: controls which parts of _source are returned (false returns none of it)

GET /_search
{
    "_source": false,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

version: set to true to return each hit's version

GET /_search
{
    "version": true,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

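Sort

You can sort by one or more fields. Use _score to sort by relevance score, or _doc to sort by index order. By default results are sorted by relevance score, highest first.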
GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "age": { "order": "desc" } },
    { "balance": { "order": "asc" } },
    "_score"
  ]
}
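Multiple sort keys apply left to right. A later key only breaks ties left by the earlier ones. Listed without an order, _score sorts descending and other fields ascending. The Python sketch below reproduces the age desc, balance asc, _score desc ordering of the request above on hypothetical hits. Negating a key to get descending order only works because these keys are numeric.

```python
from typing import Any, Dict, List


def es_style_sort(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """age desc, then balance asc, then _score desc (mirrors the sort above)."""
    return sorted(hits, key=lambda h: (-h["age"], h["balance"], -h["_score"]))


# Hypothetical hits, not real bank data
hits = [
    {"id": "a", "age": 30, "balance": 500, "_score": 1.0},
    {"id": "b", "age": 35, "balance": 900, "_score": 1.0},
    {"id": "c", "age": 30, "balance": 100, "_score": 1.0},
    {"id": "d", "age": 30, "balance": 100, "_score": 2.0},
]
print([h["id"] for h in es_style_sort(hits)])  # ['b', 'd', 'c', 'a']
```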

Concrete queries: the Query DSL (domain-specific language)

//Match all query: matches every document
GET /_search
{
    "query": {
        "match_all": {}
    }
}
//Match none query: matches no documents
GET /_search
{
    "query": {
        "match_none": {}
    }
}
//match query: the standard full-text query; the query text is analyzed first, and it can do fuzzy or phrase matching on a field
GET /_search
{
    "query": {
        "match" : {
            "message" : "this is a test"
        }
    }
}
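The matching rule of a match query can be modelled without scoring. The query text is analyzed into terms, and with the default operator "or" a document matches if any term appears in the field. With "and", every term must appear. The Python sketch below uses a simplified analyzer (lowercase plus split on non-word characters), assumed here in place of the field's real analyzer. It ignores relevance scoring and fuzziness entirely.

```python
import re
from typing import List


def simple_analyze(text: str) -> List[str]:
    # assumed stand-in for the field's analyzer: lowercase + split on non-word chars
    return [t for t in re.split(r"\W+", text.lower()) if t]


def match(query_text: str, field_value: str, operator: str = "or") -> bool:
    """Boolean part of a match query only (no scoring, no fuzziness)."""
    query_terms = set(simple_analyze(query_text))
    doc_terms = set(simple_analyze(field_value))
    if not query_terms:
        return False                      # nothing to match
    if operator == "or":
        return bool(query_terms & doc_terms)
    if operator == "and":
        return query_terms <= doc_terms
    raise ValueError("operator must be 'or' or 'and'")


print(match("this is a test", "A quick TEST message"))         # True ('a' and 'test' overlap)
print(match("this is a test", "A quick TEST message", "and"))  # False
```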

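Aggregations

Aggregation is an important database feature. It computes values over the data set returned by a query, such as the maximum or minimum of a field (or of an expression), a sum, or an average. As a search engine that also serves as a database, ES provides powerful aggregation capabilities too. Aggregations that compute metrics such as max, min, sum or average over a data set are called metric aggregations in ES. Besides aggregate functions, a relational database can group the queried rows with GROUP BY and then compute metrics per group. In ES this grouping is called bucketing and is done with bucket aggregations. ES also provides matrix and pipeline aggregations, which were still being refined at the time of writing.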

Aggregation syntax

"aggregations" : {
    "<aggregation_name>" : {                              // name of the aggregation
        "<aggregation_type>" : {                          // type of the aggregation
            <aggregation_body>                            // body: which field(s) to aggregate
        }
        [,"meta" : {  [<meta_data_body>] } ]?             // optional metadata
        [,"aggregations" : { [<sub_aggregation>]+ } ]?    // optional sub-aggregations nested inside this one
    }
    [,"<aggregation_name_2>" : { ... } ]*                 // further aggregations
}
//The value being aggregated can be a field value or the result of a script.
//aggregations can be abbreviated to aggs

Max min sum avg

//Maximum balance across all customers
POST /bank/_search
{
  "size": 0,                 // no hits, only the aggregation result
  "aggs": {                  // aggregation section
    "masssbalance": {        // aggregation name (chosen freely)
      "max": {               // aggregation type
        "field": "balance"
      }
    }
  }
}
-------------
{
  "took": 2080,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {      // aggregation results
    "masssbalance": {    // keyed by the aggregation name
      "value": 49989     // the maximum balance
    }
  }
}
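The metric aggregations reduce one field to a single number. Documents that lack the field are skipped by default. The Python sketch below computes max, min, sum and avg (plus count, as the stats aggregation also reports) over hypothetical documents. It is a local illustration of the arithmetic, not ES code. Returning None for an empty input is this sketch's choice.

```python
from typing import Any, Dict, Iterable, Optional


def metric_aggs(docs: Iterable[Dict[str, Any]], field: str) -> Dict[str, Optional[float]]:
    """Compute count/max/min/sum/avg of one field locally."""
    # Documents without the field are skipped, as metric aggregations do by default
    values = [d[field] for d in docs if d.get(field) is not None]
    if not values:
        return {"count": 0, "max": None, "min": None, "sum": 0, "avg": None}
    return {
        "count": len(values),
        "max": max(values),
        "min": min(values),
        "sum": sum(values),
        "avg": sum(values) / len(values),
    }


# Hypothetical sample documents, not the bank index
docs = [{"balance": 100}, {"balance": 300}, {"balance": 200}, {"age": 24}]
print(metric_aggs(docs, "balance"))
# {'count': 3, 'max': 300, 'min': 100, 'sum': 600, 'avg': 200.0}
```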

// Maximum balance among customers aged 24
POST /bank/_search
{
  "size": 2,              // also return the top 2 hits
  "query": {              // restrict the data set to age 24
    "match": {
      "age": 24
    }
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "max_balance": {
      "max": {            // computed over the query's matches only
        "field": "balance"
      }
    }
  }
}
---------------------
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 42,
    "max_score": null,
    "hits": [ ... ]          // the 2 top hits (omitted here)
  },
  "aggregations": {          // aggregation results
    "max_balance": {
      "value": 48745         // maximum balance among the 42 matching customers
    }
  }
}

Document count

//Count the documents in the bank index whose age is 24
POST /bank/_doc/_count
{
  "query": {
    "match": {
      "age" : 24
    }
  }
}
-------------------
{
  "count": 42,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  }
}

Bucket aggregations

//Group by term and count the documents in each group (grouped by salary)
POST /customer/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "terms": {"field": "salary"}
    }
  }
}
----------
{
    "took": 26,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 6,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "group_count": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 15000,
                    "doc_count": 2
                },
                {
                    "key": 20000,
                    "doc_count": 2
                },
                {
                    "key": 17000,
                    "doc_count": 1
                },
                {
                    "key": 18000,
                    "doc_count": 1
                }
            ]
        }
    }
}
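A terms aggregation groups documents by a field value and counts each group, like GROUP BY ... COUNT(*). The Python sketch below does that on one in-memory list of hypothetical documents. It keeps the top size buckets (the terms aggregation defaults to 10). Ordering by doc_count descending is the ES default. Breaking ties by key ascending is an assumption, though it is consistent with the sample response above.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def terms_agg(docs: Iterable[Dict[str, Any]], field: str,
              size: int = 10) -> List[Dict[str, Any]]:
    """Single-list sketch of a terms aggregation.

    Buckets are ordered by doc_count descending, ties by key ascending,
    and only the top `size` buckets are kept.
    """
    counts = Counter(d[field] for d in docs if field in d)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


# Hypothetical sample documents, not the customer index
docs = [{"salary": 3000}, {"salary": 5000}, {"salary": 3000}, {"salary": 4000}]
print(terms_agg(docs, "salary"))
# [{'key': 3000, 'doc_count': 2}, {'key': 4000, 'doc_count': 1}, {'key': 5000, 'doc_count': 1}]
```

On a real multi-shard index each shard sends only its own top terms to the coordinating node, so counts can be approximate. The doc_count_error_upper_bound field reports that possible error. sum_other_doc_count is the number of documents whose terms fell outside the returned buckets.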

//Compute a metric within each group (average age per salary bucket)
POST /customer/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "terms": {"field": "salary"},
      "aggs":{
        "avg_age":{
          "avg":{"field": "age"}
        }
      }
    }
  }
}
------------
{
    "took": 30,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 6,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "group_count": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 15000,
                    "doc_count": 2,
                    "avg_age": {
                        "value": 26.5
                    }
                },
                {
                    "key": 20000,
                    "doc_count": 2,
                    "avg_age": {
                        "value": 30.5
                    }
                },
                {
                    "key": 17000,
                    "doc_count": 1,
                    "avg_age": {
                        "value": 26
                    }
                },
                {
                    "key": 18000,
                    "doc_count": 1,
                    "avg_age": {
                        "value": 27
                    }
                }
            ]
        }
    }
}
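A sub-aggregation runs once per bucket, on only that bucket's documents, like GROUP BY salary with AVG(age). The Python sketch below buckets hypothetical documents by one field and averages another inside each bucket. It emits the sub-result under the same avg_age name the request uses. The bucket order (doc_count descending, then key ascending) and the None result for a bucket with no values are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional


def terms_with_avg(docs: Iterable[Dict[str, Any]], group_field: str,
                   metric_field: str, sub_agg_name: str = "avg_age",
                   size: int = 10) -> List[Dict[str, Any]]:
    """Bucket by group_field, then average metric_field inside each bucket."""
    groups: Dict[Any, List[Optional[float]]] = defaultdict(list)
    for d in docs:
        if group_field in d:
            groups[d[group_field]].append(d.get(metric_field))
    buckets = []
    for key, values in groups.items():
        present = [v for v in values if v is not None]   # the avg skips missing values
        avg = sum(present) / len(present) if present else None
        buckets.append({"key": key, "doc_count": len(values),
                        sub_agg_name: {"value": avg}})
    # assumed bucket order: doc_count descending, then key ascending
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets[:size]


# Hypothetical sample documents, not the customer index
docs = [{"salary": 3000, "age": 25}, {"salary": 3000, "age": 28},
        {"salary": 5000, "age": 40}]
print(terms_with_avg(docs, "salary", "age"))
# [{'key': 3000, 'doc_count': 2, 'avg_age': {'value': 26.5}}, {'key': 5000, 'doc_count': 1, 'avg_age': {'value': 40.0}}]
```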
