Elasticsearch

Elastic Stack

Published: 2020-04-06

Getting Started with Elasticsearch

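Common Terms

Document

A piece of data that a user stores in es (comparable to a row in a database table).

Index

A collection of documents that share the same fields (comparable to a table in a database; since es 6.0 an index can contain only one type, and Elastic has announced that the type concept will be removed in a later release).

Node

A running instance of Elasticsearch; the building block of a cluster.

Cluster

One or more nodes that together provide service to clients.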

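Document

In es a Document is a JSON object made up of fields (Field). Common data types include:
- String: text (analyzed, i.e. split into tokens), keyword (not analyzed)
- Numeric: long, integer, short, byte, double, float, half_float, scaled_float
- Boolean: boolean
- Date: date
- Binary: binary
- Range: integer_range, float_range, long_range, double_range, date_range
Every document has a unique id, which can be:
- specified by the user
- generated automatically by es
As shown below, one Nginx log entry is stored in ES as a single Document. ES structures the log information into multiple fields (Field), and each field name (Field Name) maps to a field value (Field Value).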

[Figure 2020-04-06_210508: an Nginx log entry stored as an ES document]
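
To make the idea of structuring a log line concrete, here is a minimal Python sketch that turns one access-log line into a dict of field names and field values, which is the shape a Document has. The regular expression assumes Nginx's default "combined" log format, and the field names, the helper `log_line_to_document`, and the sample line are all made up for illustration; they are not taken from the screenshot.

```python
import re

# Assumed pattern for the leading part of Nginx's default "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+)'
)


def log_line_to_document(line):
    """Turn one raw access-log line into a dict of field name -> field value."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not match the assumed log format")
    doc = match.groupdict()
    # Convert numeric fields so they could be mapped as integer/long instead of text.
    doc["status"] = int(doc["status"])
    doc["body_bytes_sent"] = int(doc["body_bytes_sent"])
    return doc


# Hypothetical log line, invented for this example.
sample = '192.168.1.10 - - [06/Apr/2020:21:05:08 +0800] "GET /index.html HTTP/1.1" 200 612'
document = log_line_to_document(sample)
print(document)
```

In practice this parsing is usually done by a log shipper before the data reaches es; the sketch only shows what "structured into fields" means.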

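Document MetaData

Every Document carries document metadata (Document MetaData) that describes the document itself:
- _index: name of the index the document belongs to
- _type: name of the type the document belongs to
- _id: unique id of the document
- _uid: composite id built from _type and _id (since _type no longer has any effect in 6.x, this value is the same as _id in 6.x)
- _source: the original JSON of the document; the content of every field can be read from here
- _all: concatenates the content of all fields into one field; disabled by default (officially discouraged)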
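
As a small illustration of how metadata sits next to the stored data, the Python sketch below splits a GET response into its metadata keys and its `_source`. The response is the one shown in the "get a document by id" example later in this article; the helper name `split_metadata` is hypothetical, and treating every underscore-prefixed key as metadata is a simplification for this example.

```python
import json

# Response body of GET /test_index/doc/1, as shown later in this article.
raw_response = """
{
  "_index": "test_index",
  "_type": "doc",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {"username": "Jiavg", "age": 21}
}
"""


def split_metadata(response):
    """Return (metadata, source) from a single-document GET response."""
    # Simplification: every key starting with "_" except _source counts as metadata.
    metadata = {
        key: value
        for key, value in response.items()
        if key.startswith("_") and key != "_source"
    }
    source = response.get("_source", {})
    return metadata, source


metadata, source = split_metadata(json.loads(raw_response))
print("metadata:", metadata)
print("source:  ", source)
```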

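Index

An index stores documents (Document) that share the same structure.
- Every index has its own mapping definition, which declares the field names and their types
A cluster can hold many indices, for example:
- Nginx logs can be written to one index per day, which makes them easier to maintain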
  - nginx-log-2020-04-03
  - nginx-log-2020-04-04
  - nginx-log-2020-04-05
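
A daily naming scheme like the one above can be generated with a few lines of Python. This is only a sketch: the helper `daily_index_name` is hypothetical, and the `nginx-log` prefix is simply the one used in the list.

```python
from datetime import date, timedelta


def daily_index_name(prefix, day):
    """Build an index name such as nginx-log-2020-04-03 for the given day."""
    return f"{prefix}-{day.isoformat()}"


start = date(2020, 4, 3)
names = [daily_index_name("nginx-log", start + timedelta(days=i)) for i in range(3)]
print(names)
```

Because the date is part of the name, old data can be dropped by deleting a whole index instead of deleting documents one by one.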

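Rest API

An Elasticsearch cluster exposes a RESTful API.
- REST: REpresentational State Transfer
- The URI identifies the resource, such as an Index or a Document
- The HTTP method identifies the operation on the resource, such as GET, POST, PUT, or DELETE
Two common ways to interact with it:
- curl on the command line
- Kibana DevTools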
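
The requests in this article are written in Kibana DevTools style (method, path, optional JSON body). As a rough sketch of how the same request maps to curl, the Python helper below builds the equivalent command line. The helper name `to_curl` is hypothetical, and the host `http://localhost:9200` is an assumption (a local node on the default HTTP port); adjust it for your cluster.

```python
import json
import shlex


def to_curl(method, path, body=None, host="http://localhost:9200"):
    """Build a curl command equivalent to a DevTools-style request."""
    parts = ["curl", "-X", method, host + path]
    if body is not None:
        # A JSON body needs an explicit Content-Type header.
        parts += ["-H", "Content-Type: application/json", "-d", json.dumps(body)]
    # shlex.quote adds shell quoting only where it is needed.
    return " ".join(shlex.quote(part) for part in parts)


print(to_curl("GET", "/_cat/indices"))
print(to_curl("PUT", "/test_index/doc/1", {"username": "Jiavg", "age": 21}))
```

The method and the URI carry the meaning of the request, so the same pair works unchanged in either tool.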

Index API

es provides a dedicated Index API for creating, updating, and deleting index configuration.

Create an index

PUT /{index name}

Example:

# request
PUT /test_index

# response
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "test_index"
}

List existing indices

GET /_cat/indices

Example:

# request
GET /_cat/indices

# response
red    open account    eIBKm9zfQhOZkW6tC1uyEA 5 1 1 0 5.4kb 5.4kb
yellow open test_index XTzQRFtzRqK3B3EfULLrEg 5 1 0 0 1.1kb 1.1kb
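
The response has no header row, so the columns are easy to misread. The Python sketch below labels them, assuming the default column order of `_cat/indices` (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size); sending `GET /_cat/indices?v` prints the header row so the order can be checked against your version. The helper name `parse_cat_indices` is hypothetical.

```python
# Assumed default column order of GET /_cat/indices.
CAT_COLUMNS = [
    "health", "status", "index", "uuid", "pri", "rep",
    "docs.count", "docs.deleted", "store.size", "pri.store.size",
]


def parse_cat_indices(text):
    """Turn the plain-text _cat/indices output into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        # Columns are separated by one or more spaces; values stay strings.
        rows.append(dict(zip(CAT_COLUMNS, line.split())))
    return rows


# The two rows from the response above.
output = """
red    open account    eIBKm9zfQhOZkW6tC1uyEA 5 1 1 0 5.4kb 5.4kb
yellow open test_index XTzQRFtzRqK3B3EfULLrEg 5 1 0 0 1.1kb 1.1kb
"""

for row in parse_cat_indices(output):
    print(row["index"], row["health"], row["pri"], row["rep"], row["docs.count"])
```

Read this way, `test_index` has 5 primary shards, 1 replica, and 0 documents, and its health is yellow.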

Delete an index

DELETE /{index name}

Example:

# request
DELETE /test_index

# response
{
  "acknowledged": true
}

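Document API

es provides a dedicated Document API.

Create a document (if the index does not exist when the document is created, es automatically creates the corresponding index and type)

Create a document with a specified id

# The type name has no real effect from 6.x onward and will be removed in a future version.
# Any value can be used here; by convention a meaningless name such as doc is chosen.
PUT /{index name}/{type name}/{id}
{
    # document content
}

Example: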

# request
PUT /test_index/doc/1
{
  "username": "Jiavg",
  "age": 21
}

# response
# _version guards against conflicts when the document is modified concurrently
{
  "_index": "test_index",
  "_type": "doc",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

- Create a document without specifying an id

  ```
  POST /{index name}/{type name}
  {
      # document content
  }
  ```

  Example:

  ```shell
  # request
  POST /test_index/doc
  {
    "username": "jlc",
    "age": 20
  }

  # response
  # no id was specified, so es generates one
  {
    "_index": "test_index",
    "_type": "doc",
    "_id": "QzXQT3EBkfca6l6Y9SXp",
    "_version": 1,
    "result": "created",
    "_shards": {
      "total": 2,
      "successful": 1,
      "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
  }
  ```
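
The two ways of creating a document differ only in the HTTP method and in whether the id is part of the path. A minimal Python sketch of that rule follows, using the 6.x URL style from this article; the helper name `create_request` and the default type name `doc` are assumptions for illustration.

```python
def create_request(index, doc_id=None, doc_type="doc"):
    """Return (method, path) for creating one document in the 6.x URL style."""
    if doc_id is None:
        # No id given: POST to the type and let es generate the id.
        return "POST", f"/{index}/{doc_type}"
    # Explicit id: PUT to the full document path.
    return "PUT", f"/{index}/{doc_type}/{doc_id}"


print(create_request("test_index", 1))
print(create_request("test_index"))
```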

Query documents

Get a document by its id

GET /{index name}/{type name}/{id}

Example:

# request
GET /test_index/doc/1

# response
# _source holds the original document data
# 200 response
{
  "_index": "test_index",
  "_type": "doc",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "username": "Jiavg",
    "age": 21
  }
}

# 404 response
{
  "_index": "test_index",
  "_type": "doc",
  "_id": "2",
  "found": false
}

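- Search documents with _search

  ```
  # without a query (returns all documents)
  GET /{index name}/{type name}/_search

  # with a query (returns all documents that match)
  GET /{index name}/{type name}/_search
  {
      # query conditions
  }
  ```

  Example: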

  ```shell
  # without a query (returns all documents)
  # request
  GET /test_index/doc/_search

  # response
  # took: time spent on the query, in ms
  {
    "took": 4,
    "timed_out": false,
    "_shards": {
      "total": 5,
      "successful": 5,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": 2,            # total number of matching documents
      "max_score": 1,
      "hits": [            # array of matching documents, the first 10 by default
        {
          "_index": "test_index",
          "_type": "doc",
          "_id": "QzXQT3EBkfca6l6Y9SXp",
          "_score": 1,    # relevance score of the document
          "_source": {
            "username": "jlc",
            "age": 20
          }
        },
        {
          "_index": "test_index",
          "_type": "doc",
          "_id": "1",
          "_score": 1,
          "_source": {
            "username": "Jiavg",
            "age": 21
          }
        }
      ]
    }
  }

  # with a query (returns all documents that match)
  # request
  GET /test_index/doc/_search
  {
    "query": {
      "term": {
        "_id": 1
      }
    }
  }

  # response
  {
    "took": 23,
    "timed_out": false,
    "_shards": {
      "total": 5,
      "successful": 5,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
        {
          "_index": "test_index",
          "_type": "doc",
          "_id": "1",
          "_score": 1,
          "_source": {
            "username": "Jiavg",
            "age": 21
          }
        }
      ]
    }
  }
  ```
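
On the client side, a search usually comes down to two steps: build the query body, then pull the documents out of `hits.hits`. The Python sketch below shows both, using a trimmed copy of the response above; the helper names `term_query` and `extract_sources` are hypothetical, and no request is actually sent.

```python
def term_query(field, value):
    """Build the body of a term query like the one in the example above."""
    return {"query": {"term": {field: value}}}


def extract_sources(response):
    """Return the _source of every hit in a _search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Trimmed copy of the search response shown above.
response = {
    "took": 23,
    "timed_out": False,
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "test_index",
                "_type": "doc",
                "_id": "1",
                "_score": 1,
                "_source": {"username": "Jiavg", "age": 21},
            }
        ],
    },
}

print(term_query("_id", 1))
print(extract_sources(response))
```

Only the first 10 hits are returned by default, so `hits.total` can be larger than the number of items in `hits.hits`.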

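Update a document

# Sending a body to an existing id replaces the whole document and increments _version.
# To change only some fields in 6.x, send them under a "doc" key to
# POST /{index name}/{type name}/{id}/_update instead.
POST /{index name}/{type name}/{id}
{
   # new document content
}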

Delete a document
```
DELETE /{index name}/{type name}/{id}
```

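es allows multiple documents to be written in a single request, which reduces network overhead and improves write throughput.

The endpoint for this is _bulk.

Both index and create create a document. The difference is what happens when the document id already exists: index overwrites the existing content, whereas create fails with an error.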
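
The difference between the two actions can be modeled with a toy in-memory class. This is only an illustration of the semantics, not how es stores data: `ToyIndex` is a hypothetical name, and the `KeyError` stands in for the conflict error that es returns.

```python
class ToyIndex:
    """In-memory stand-in for an index, used only to illustrate index vs create."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, source):
        # index: write unconditionally, overwriting any existing document.
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = source
        return result

    def create(self, doc_id, source):
        # create: refuse to touch an id that already exists.
        if doc_id in self.docs:
            raise KeyError(f"document {doc_id!r} already exists")
        self.docs[doc_id] = source
        return "created"


toy = ToyIndex()
print(toy.index("1", {"username": "Jiavg"}))
print(toy.index("1", {"username": "Jiavg-1"}))
try:
    toy.create("1", {"username": "znc"})
except KeyError as err:
    print("create rejected:", err)
```

Use create when an accidental overwrite would be a bug, and index when the latest write should always win.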

Note: the REST API endpoint for bulk requests is /_bulk, and it expects the following newline-delimited JSON (NDJSON) structure:

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

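NDJSON: ndjson (Newline Delimited JSON) is a relatively new and very simple format. In an .ndjson file every line is one ordinary JSON object. Each object must drop the line breaks normally used for pretty-printing, and a JSON string cannot contain a raw line break in the first place (it is written as the escape sequence \n instead).

A request body written as ordinary multi-line JSON is therefore rejected with an error.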
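
When the bulk body is generated from code, the safest approach is to serialize every object on its own line rather than pretty-print anything. Below is a minimal Python sketch of that idea; the helper name `to_bulk_body` is hypothetical, and the operations are the ones used in the example in this article.

```python
import json


def to_bulk_body(operations):
    """Serialize (action_and_meta_data, optional_source) pairs as NDJSON."""
    lines = []
    for action, source in operations:
        # Compact separators keep each object on one line with no extra spaces.
        lines.append(json.dumps(action, separators=(",", ":")))
        if source is not None:  # some actions, such as delete, carry no source line
            lines.append(json.dumps(source, separators=(",", ":")))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


operations = [
    ({"index": {"_index": "test_index", "_type": "doc", "_id": 1}},
     {"username": "Jiavg-1", "age": 5}),
    ({"update": {"_index": "test_index", "_type": "doc", "_id": "QzXQT3EBkfca6l6Y9SXp"}},
     {"doc": {"age": 25}}),
    ({"create": {"_index": "test_index", "_type": "doc", "_id": 3}},
     {"username": "znc", "age": 22}),
]

body = to_bulk_body(operations)
print(body, end="")
```

`json.dumps` escapes any line break inside a string value as `\n`, so a value can never split one object across two lines.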

Example:

# NDJSON
# request
POST _bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"username":"Jiavg-1","age":5}
{"update":{"_index":"test_index","_type":"doc","_id":"QzXQT3EBkfca6l6Y9SXp"}}
{"doc":{"age":25}}
{"create":{"_index":"test_index","_type":"doc","_id":3}}
{"username":"znc","age":22}

# response
{
  "took": 52,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_version": 3,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 2,
        "_primary_term": 2,
        "status": 200
      }
    },
    {
      "update": {
        "_index": "test_index",
        "_type": "doc",
        "_id": "QzXQT3EBkfca6l6Y9SXp",
        "_version": 2,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 2,
        "status": 200
      }
    },
    {
      "create": {
        "_index": "test_index",
        "_type": "doc",
        "_id": "3",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 2,
        "status": 201
      }
    }
  ]
}
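
A bulk response reports the outcome of each operation separately, in request order, so a client should check every item and not only the top-level `errors` flag. The Python sketch below summarizes a trimmed copy of the response above; the helper name `summarize_bulk` is hypothetical, and treating a status of 300 or higher as a failure is a simplification for this example.

```python
def summarize_bulk(response):
    """Return one (action, _id, status) tuple per item of a _bulk response."""
    summary = []
    for item in response["items"]:
        # Each item is a dict with a single key: the action name.
        action, detail = next(iter(item.items()))
        summary.append((action, detail["_id"], detail["status"]))
    return summary


# Trimmed copy of the bulk response shown above.
response = {
    "took": 52,
    "errors": False,
    "items": [
        {"index": {"_id": "1", "result": "updated", "status": 200}},
        {"update": {"_id": "QzXQT3EBkfca6l6Y9SXp", "result": "updated", "status": 200}},
        {"create": {"_id": "3", "result": "created", "status": 201}},
    ],
}

summary = summarize_bulk(response)
failed = [entry for entry in summary if entry[2] >= 300]
print(summary)
print("failed:", failed)
```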

  # plain (pretty-printed) JSON
  # request
  POST _bulk
  {
    "index": {
      "_index": "test_index",
      "_type": "doc",
      "_id": 1
    }
  }
  {
    "username": "Jiavg-1",
    "age": 5
  }
  {
    "update": {
      "_index": "test_index",
      "_type": "doc",
      "_id": "QzXQT3EBkfca6l6Y9SXp"
    }
  }
  {
    "doc": {
      "age": 25
    }
  }
  {
    "create": {
      "_index": "test_index",
      "_type": "doc",
      "_id": 3
    }
  }
  {
    "username": "znc",
    "age": 22
  }

  # response
  {
    "error": {
      "root_cause": [
        {
          "type": "json_e_o_f_exception",
          "reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@618ff58; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@618ff58; line: 1, column: 3]"
        }
      ],
      "type": "json_e_o_f_exception",
      "reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@618ff58; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@618ff58; line: 1, column: 3]"
    },
    "status": 500
  }

For the differences between JSON and NDJSON, see: https://blog.csdn.net/github_38885296/article/details/100915601