Elastic Stack

Published: 2020-04-09

Elasticsearch: Mapping Settings

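Introduction to Mapping

A mapping is similar to a table schema in a relational database. Its main purposes are:
- Define the field names (Field Name) within an index
- Define each field's type, such as numeric, string, or boolean
- Configure how the inverted index is built for each field, such as whether the field is indexed and whether positions are recorded

Retrieving an index's mapping

GET {index_name}/_mapping

Example: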

# request
GET test_index/_mapping

# response
{
  "test_index": {            # index name
    "mappings": {
      "doc": {                # type name
        "properties": {        # details of each field
          "age": {
            "type": "long"
          },
          "username": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

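Custom Mapping

Custom Mapping API

Once a field's type has been set in a mapping, it cannot be changed directly, because:
- The inverted index built by Lucene cannot be modified once it has been generated
To change a field's type, create a new index with the desired mapping and then reindex the data into it.
Adding new fields is allowed.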
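
The create-then-reindex route can be sketched as two request bodies. This is a minimal illustration rather than a full migration procedure: the index names and the choice of changing age to integer are assumptions made up for the example, and the bodies are built as Python dicts and printed in the console style used in this article.

```python
import json

# Hypothetical index names, used only for illustration.
OLD_INDEX = "test_index_v1"
NEW_INDEX = "test_index_v2"

# Step 1: body for `PUT test_index_v2`, declaring the field with its new type.
new_index_body = {
    "mappings": {
        "doc": {
            "properties": {
                "age": {"type": "integer"}  # assumed to be long in the old index
            }
        }
    }
}

# Step 2: body for `POST _reindex`, copying documents from the old index to the new one.
reindex_body = {
    "source": {"index": OLD_INDEX},
    "dest": {"index": NEW_INDEX},
}


def console(method, path, body):
    """Render a request in the console style used in this article."""
    return f"{method} {path}\n{json.dumps(body, indent=2)}"


print(console("PUT", NEW_INDEX, new_index_body))
print(console("POST", "_reindex", reindex_body))
```

The printed requests can be pasted into the console, or the dicts can be passed to whichever client library is in use.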

dynamic

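The dynamic parameter controls whether new fields may be added automatically:
- true (default): new fields are added to the mapping automatically
- false: new fields are not added to the mapping; the document is still written and the fields remain in _source, but they are not indexed and cannot be searched
- strict: the document is rejected with an error

The dynamic parameter is set as follows: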

# request
PUT test_index3
{
  "mappings": {
    "doc": {
      "dynamic": false,            # do not add new fields automatically (whole type)
      "properties": {
        "user": {
          "type": "keyword"
        },
        "social_networks": {
          "dynamic": true,        # allow new fields under social_networks (this object only)
          "properties": {}
        }
      }
    }
  }
}

# request
# After indexing a document into test_index3, check which fields were added
PUT test_index3/doc/2
{
  "user": "jiavg",
  "pwd": "123456",
  "age": 23,                    # new top-level field (like pwd, not defined in the mapping)
  "social_networks": {
    "cookie": "user-agent"        # new field under social_networks
  }
}

# request
# View the index's mapping
GET test_index3/_mapping

# response
# Because the top level sets "dynamic": false, the age field was not added to the mapping
# Because social_networks sets "dynamic": true locally, the cookie field was added to the mapping
# Beyond the mapping change, the new cookie field can be searched, while age cannot
{
  "test_index3": {
    "mappings": {
      "doc": {
        "dynamic": "false",
        "properties": {
          "social_networks": {
            "dynamic": "true",
            "properties": {
              "cookie": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "user": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

copy_to

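Copies the value of this field into a target field, which serves a purpose similar to _all.
The copied value does not appear in _source; it is used only for searching.

Example: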

# request
# copy_to usage example
PUT test_index4
{
  "mappings": {
    "doc": {
      "properties": {
        "first-name": {
          "type": "text",
          "copy_to": "full-name"    # copy this field's value into "full-name"
        },
        "last-name": {
          "type": "text",
          "copy_to": "full-name"    # copy this field's value into "full-name"
        },
        "full-name": {                # target field
          "type": "text"
        }
      }
    }
  }
}

# request
# Index a document into test_index4
PUT test_index4/doc/1
{
  "first-name": "jike",
  "last-name": "shijian"
}

# request
# View the documents in test_index4
GET test_index4/_search

# response
# The copy target ("full-name") does not appear in _source
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index4",
        "_type": "doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "first-name": "jike",
          "last-name": "shijian"
        }
      }
    ]
  }
}

# request
# Search test_index4 on the "full-name" field
GET test_index4/_search
{
  "query": {
    "match": {
      "full-name": {
        "query": "jike shijian",
        "operator": "and"
      }
    }
  }
}

# response
# The document is found by searching on the "full-name" field
# So although the copy_to target is not stored in _source, it can still be searched
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "test_index4",
        "_type": "doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "first-name": "jike",
          "last-name": "shijian"
        }
      }
    ]
  }
}

index

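Controls whether the field is indexed. The default is true (indexed); with false the field is not indexed and therefore cannot be searched.
Reasons to set it to false:
- The field is sensitive and should not be retrievable by search
- To save disk space and memory, because no inverted index is built for the field

Example: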

# request
# index usage example
PUT test_index5
{
  "mappings": {
    "doc": {
      "properties": {
        "cookie": {
          "type": "text",
          "index": false        # do not index this field
        }
      }
    }
  }
}

# request
# To verify that a field with "index": false cannot be searched, add a document to test_index5
PUT test_index5/doc/1
{
  "cookie": "user-agent..."
}

# request
# View all documents in test_index5
GET test_index5/_search

# response
# The document has been indexed
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index5",
        "_type": "doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "cookie": "user-agent..."
        }
      }
    ]
  }
}

# request
# Search test_index5 on the cookie field (which has "index": false)
GET test_index5/_search
{
  "query": {
    "match": {
      "cookie": "user"
    }
  }
}

# response
# The query fails with: Cannot search on field [cookie] since it is not indexed.
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {\n  \"match\" : {\n    \"cookie\" : {\n      \"query\" : \"user\",\n      \"operator\" : \"OR\",\n      \"prefix_length\" : 0,\n      \"max_expansions\" : 50,\n      \"fuzzy_transpositions\" : true,\n      \"lenient\" : false,\n      \"zero_terms_query\" : \"NONE\",\n      \"auto_generate_synonyms_phrase_query\" : true,\n      \"boost\" : 1.0\n    }\n  }\n}",
        "index_uuid": "aBw3JkTQT7aA8Oa8Y3i6VA",
        "index": "test_index5"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test_index5",
        "node": "Eyro1LLhQOSsyBQNJV86kQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n  \"match\" : {\n    \"cookie\" : {\n      \"query\" : \"user\",\n      \"operator\" : \"OR\",\n      \"prefix_length\" : 0,\n      \"max_expansions\" : 50,\n      \"fuzzy_transpositions\" : true,\n      \"lenient\" : false,\n      \"zero_terms_query\" : \"NONE\",\n      \"auto_generate_synonyms_phrase_query\" : true,\n      \"boost\" : 1.0\n    }\n  }\n}",
          "index_uuid": "aBw3JkTQT7aA8Oa8Y3i6VA",
          "index": "test_index5",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Cannot search on field [cookie] since it is not indexed."
          }
        }
      }
    ]
  },
  "status": 400
}

index_options

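index_options controls what is recorded in the inverted index. There are four settings:
- docs: records only the doc id
- freqs: records the doc id and term frequencies
- positions: records the doc id, term frequencies, and term positions
- offsets: records the doc id, term frequencies, term positions, and character offsets
The default for text fields is positions; for all other types it is docs.
The more that is recorded, the more space is used.

Example: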
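
A minimal sketch of setting index_options on a text field, following the same pattern as the index example above. The index name test_index_options is an assumption made up for the illustration; the body is built as a Python dict and printed in console form.

```python
import json

# Body for `PUT test_index_options` (hypothetical index name).
index_options_body = {
    "mappings": {
        "doc": {
            "properties": {
                "cookie": {
                    "type": "text",
                    # Record doc id, term frequencies, positions and character offsets.
                    "index_options": "offsets"
                }
            }
        }
    }
}

print("PUT test_index_options")
print(json.dumps(index_options_body, indent=2))
```

Choosing offsets records the most information and therefore uses the most space; docs is the opposite end of the trade-off.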

null_value

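The strategy for handling null values in a field. The default is null, meaning no value: Elasticsearch ignores the field for that document, so it is neither indexed nor searchable. Setting null_value gives the field a substitute value that is indexed in place of an explicit null.

Example: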
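
A small sketch of null_value on a keyword field. The index name, the field name status_code, and the substitute string "NULL" are assumptions chosen for the illustration; note that the substitute has to be of the same type as the field.

```python
import json

# Body for `PUT test_index_null` (hypothetical index name).
null_value_body = {
    "mappings": {
        "doc": {
            "properties": {
                "status_code": {            # hypothetical field
                    "type": "keyword",
                    "null_value": "NULL"    # indexed in place of an explicit null
                }
            }
        }
    }
}

# Body for `PUT test_index_null/doc/1`: the field is explicitly null.
doc_with_null = {"status_code": None}

# Body for `GET test_index_null/_search`: search for the substitute value.
search_body = {"query": {"term": {"status_code": "NULL"}}}

requests = [
    ("PUT test_index_null", null_value_body),
    ("PUT test_index_null/doc/1", doc_with_null),
    ("GET test_index_null/_search", search_body),
]
for line, body in requests:
    print(line)
    print(json.dumps(body, indent=2))
```

The substitute only affects what is indexed; the document's _source still holds the original null.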

Mapping documentation

For the full list of mapping parameters, see the official Elasticsearch documentation.

Link: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping.html

Data Types

Core data types

String: text, keyword
Numeric: long, integer, short, byte, double, float, half_float, scaled_float
Date: date
Boolean: boolean
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range

Complex data types

Array: array
Object: object
Nested: nested object

Geo data types

geo_point
geo_shape

Specialized types

IP addresses: ip
Auto-completion: completion
Token count: token_count
String hash values: murmur3
percolator
join

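Multi-fields

Multi-fields allow the same field to be indexed with different configurations, such as different analyzers. A common example is pinyin search on personal names: just add a pinyin sub-field to the name field.

Example: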
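
A sketch of the pinyin case. It assumes an analyzer named pinyin is available, which normally means a pinyin analysis plugin has been installed; the index name test_index_multi is also made up for the illustration.

```python
import json

# Body for `PUT test_index_multi` (hypothetical index name).
multi_field_body = {
    "mappings": {
        "doc": {
            "properties": {
                "username": {
                    "type": "text",                 # analyzed with the default analyzer
                    "fields": {
                        "pinyin": {                 # sub-field, addressed as username.pinyin
                            "type": "text",
                            "analyzer": "pinyin"    # assumption: provided by a plugin
                        }
                    }
                }
            }
        }
    }
}

# Body for `GET test_index_multi/_search`: query the sub-field by its full path.
search_body = {"query": {"match": {"username.pinyin": "han"}}}

print("PUT test_index_multi")
print(json.dumps(multi_field_body, indent=2))
print("GET test_index_multi/_search")
print(json.dumps(search_body, indent=2))
```

The sub-field is queried as username.pinyin, while username itself keeps its own analysis.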

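Dynamic Mapping

Elasticsearch can detect field types automatically, which lowers the effort needed to get started.

Supported types for automatic detection

Elasticsearch infers each field's type from the JSON type of its value.

Verifying automatic detection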
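
To know what to expect before checking GET {index_name}/_mapping, the default detection rules can be approximated in a few lines. This is a simplified model and not Elasticsearch's implementation: the function name is hypothetical, and date detection and numeric detection on strings are deliberately left out.

```python
def infer_dynamic_type(value):
    """Approximate the default dynamic-mapping type for one parsed JSON value.

    Simplification: strings are always reported as text; date detection and
    numeric detection are not modelled here.
    """
    if value is None:
        return None                      # null: no field is added
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_dynamic_type(item)
        return None
    if isinstance(value, str):
        return "text"                    # by default with a keyword sub-field
    raise TypeError(f"not a JSON value: {value!r}")


# Hypothetical sample document.
sample = {
    "username": "alfred",
    "age": 1,
    "married": False,
    "money": 100.1,
    "tags": ["boy", "fashion"],
    "extra": None,
}

for name, value in sample.items():
    print(name, "->", infer_dynamic_type(value))
```

Indexing the same document into a fresh index and reading back its mapping is the way to confirm these expectations.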

Automatic date detection

The date formats used for automatic date detection are configurable:
- The default is ["strict_date_optional_time", "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]
- strict_date_optional_time is the ISO datetime format; the full form looks like this:
  - YYYY-MM-DDThh:mm:ssTZD (e.g. 1997-07-16T19:20:30+01:00)
- dynamic_date_formats sets custom date formats
- date_detection can turn automatic date detection off
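
The two settings above might be used as in the following sketch. The index names and the MM/dd/yyyy format are assumptions chosen for the illustration.

```python
import json

# Body for `PUT test_index_dates` (hypothetical): detect dates written as MM/dd/yyyy.
custom_formats_body = {
    "mappings": {
        "doc": {
            "dynamic_date_formats": ["MM/dd/yyyy"]
        }
    }
}

# Body for `PUT test_index_nodates` (hypothetical): turn date detection off,
# so date-like strings are mapped as ordinary strings.
no_detection_body = {
    "mappings": {
        "doc": {
            "date_detection": False
        }
    }
}

print("PUT test_index_dates")
print(json.dumps(custom_formats_body, indent=2))
print("PUT test_index_nodates")
print(json.dumps(no_detection_body, indent=2))
```

Both settings sit at the type level of the mapping, next to properties, not inside an individual field.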

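Automatic detection of numbers in strings

A string that contains a number is not detected as a numeric type by default, because it is perfectly reasonable for a string to hold digits.
- numeric_detection turns on automatic detection of numbers in strings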
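
A sketch of turning numeric_detection on. The index and field names are assumptions made up for the example.

```python
import json

# Body for `PUT test_index_numeric` (hypothetical index name).
numeric_detection_body = {
    "mappings": {
        "doc": {
            "numeric_detection": True
        }
    }
}

# Body for `PUT test_index_numeric/doc/1`: both values are JSON strings.
doc = {
    "my_float": "1.0",      # expected to be mapped as float
    "my_integer": "1",      # expected to be mapped as long
}

print("PUT test_index_numeric")
print(json.dumps(numeric_detection_body, indent=2))
print("PUT test_index_numeric/doc/1")
print(json.dumps(doc, indent=2))
```

Without numeric_detection, both fields would be mapped as text with a keyword sub-field.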

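Dynamic Templates

Dynamic templates set field types dynamically based on the data type Elasticsearch detects, the field name, and so on. They can achieve effects such as:
- Map all strings to keyword, i.e. not analyzed by default
- Map all fields whose names start with message to text, i.e. analyzed
- Map all fields whose names start with long_ to long
- Map all fields detected as double to float, to save space

Dynamic Templates API

Matching rules generally use the following parameters:
- match_mapping_type matches the field type detected by Elasticsearch, such as boolean, long, string
- match, unmatch match on the field name
- path_match, path_unmatch match on the full field path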

Dynamic Templates examples

Using keyword as the default type for strings

By default Elasticsearch maps strings to text and adds a keyword sub-field

# request
# Use dynamic_templates to make keyword the default type for strings
PUT test_index6
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "string_as_keyword": {
            "match_mapping_type": "string", # match fields detected as string
            "mapping": {
              "type": "keyword"        # map strings to keyword instead of the default text
            }
          }
        }
      ]
    }
  }
}

# request
# Index a document into test_index6 to verify the dynamic_templates setting
PUT test_index6/doc/1
{
  "user": "jiavg"
}

# request
# Retrieve the mapping to see which type dynamic mapping assigned
GET test_index6/_mapping

# response
# user was mapped as keyword rather than the default text, so the dynamic_templates setting took effect
{
  "test_index6": {
    "mappings": {
      "doc": {
        "dynamic_templates": [
          {
            "string_as_keyword": {
              "match_mapping_type": "string",
              "mapping": {
                "type": "keyword"
              }
            }
          }
        ],
        "properties": {
          "user": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Mapping fields whose names start with message to text
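
A sketch of a name-based rule combined with the keyword default for strings. Templates are tried in order and the first match wins, so the message rule is listed first. The index name test_index7 and the template names are assumptions.

```python
import json

# Body for `PUT test_index7` (hypothetical index name).
message_template_body = {
    "mappings": {
        "doc": {
            "dynamic_templates": [
                {
                    "message_as_text": {                # listed first so it wins for message*
                        "match_mapping_type": "string",
                        "match": "message*",            # field names starting with message
                        "mapping": {"type": "text"}
                    }
                },
                {
                    "string_as_keyword": {              # every other string field
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword"}
                    }
                }
            ]
        }
    }
}

print("PUT test_index7")
print(json.dumps(message_template_body, indent=2))
```

With this mapping, a document such as {"message": "hello world", "user": "jiavg"} would be expected to get message as text and user as keyword.
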

Mapping double fields to float to save space
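
A sketch of a rule keyed on the detected type alone. The index name test_index8 and the template name double_as_float are assumptions.

```python
import json

# Body for `PUT test_index8` (hypothetical index name).
double_template_body = {
    "mappings": {
        "doc": {
            "dynamic_templates": [
                {
                    "double_as_float": {
                        "match_mapping_type": "double",   # floating-point JSON numbers
                        "mapping": {"type": "float"}      # narrower type, less space
                    }
                }
            ]
        }
    }
}

print("PUT test_index8")
print(json.dumps(double_template_body, indent=2))
```

As in the keyword example above, indexing a document with a floating-point value and then calling GET test_index8/_mapping shows whether the rule took effect.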

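Recommendations for custom mappings

The steps for building a custom mapping are:
- Write one document into a temporary index and retrieve the mapping Elasticsearch generates
- Edit the mapping from step 1 to customize the settings
- Create the real index using the mapping from step 2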
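
Step 2 can be sketched as a small helper that takes the generated mapping and swaps in the customized field definitions. The helper name, the temporary index name tmp_index, and the overrides are all assumptions; the generated mapping below simply mirrors the shape of the _mapping response shown earlier in this article.

```python
import copy
import json

# Step 1 (assumed result of `GET tmp_index/_mapping` after writing one document).
generated = {
    "tmp_index": {
        "mappings": {
            "doc": {
                "properties": {
                    "age": {"type": "long"},
                    "username": {
                        "type": "text",
                        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                    },
                }
            }
        }
    }
}


def build_index_body(generated_mapping, temp_index, type_name, overrides):
    """Step 2: copy the generated mapping and replace selected field definitions."""
    mappings = copy.deepcopy(generated_mapping[temp_index]["mappings"])
    mappings[type_name]["properties"].update(overrides)
    return {"mappings": mappings}


# Step 3: body for `PUT real_index` (hypothetical index name).
body = build_index_body(
    generated,
    "tmp_index",
    "doc",
    {"age": {"type": "integer"}, "username": {"type": "keyword"}},
)

print("PUT real_index")
print(json.dumps(body, indent=2))
```

Working from a copy leaves the generated mapping untouched, so it can be compared against the customized one afterwards.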

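Index Templates

An index template (Index Template) automatically applies preset configuration when a new index is created, which simplifies index creation
- A template can set both index settings and mappings
- Multiple templates can exist; they are merged according to order, and settings from a higher order override those from a lower order
The index template API uses the _template endpoint.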
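
A sketch of two templates and of how order resolves a conflict between them. The template names, index patterns, and settings values are assumptions, and effective_settings is a simplified model of the merge for flat settings keys, not Elasticsearch's own logic.

```python
import json

# Body for `PUT _template/template_default` (hypothetical name): applies to every index.
template_default = {
    "index_patterns": ["*"],
    "order": 0,
    "settings": {"number_of_shards": 1, "number_of_replicas": 1},
}

# Body for `PUT _template/template_test` (hypothetical name): applies to test_* indices.
template_test = {
    "index_patterns": ["test_*"],
    "order": 1,
    "settings": {"number_of_replicas": 0},
    "mappings": {"doc": {"properties": {"user": {"type": "keyword"}}}},
}


def effective_settings(matching_templates):
    """Simplified model: apply settings from low order to high, so high order wins."""
    merged = {}
    for template in sorted(matching_templates, key=lambda t: t["order"]):
        merged.update(template.get("settings", {}))
    return merged


print("PUT _template/template_default")
print(json.dumps(template_default, indent=2))
print("PUT _template/template_test")
print(json.dumps(template_test, indent=2))

# Both templates match a new index named test_index9 (hypothetical).
print(effective_settings([template_test, template_default]))
```

In this model an index named test_index9 ends up with one shard from the default template and zero replicas from the higher-order template.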