Elasticsearch Notes

Monday, September 26, 2016

I recently upgraded our project's ES to 2.4.0 and tuned the search results, so here are some notes.

Installation

If the JDK is not installed yet, install it first:

sudo add-apt-repository ppa:webupd8team/java -y
sudo apt-get update
sudo apt-get install oracle-java8-installer

Install ES

aria2c https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.4.0/elasticsearch-2.4.0.tar.gz
tar zxvf elasticsearch-2.4.0.tar.gz
sudo mv elasticsearch-2.4.0 /usr/local/
# Adjust the relevant settings (path.data, path.logs, etc.)
vim /usr/local/elasticsearch-2.4.0/config/elasticsearch.yml
/usr/local/elasticsearch-2.4.0/bin/elasticsearch

Use supervisor to manage ES. Create the config file /etc/supervisor/conf.d/es.conf:

[program:elasticsearch]
command=/usr/local/elasticsearch-2.4.0/bin/elasticsearch
directory=/usr/local/elasticsearch-2.4.0
user=lxneng
autostart=true
autorestart=true
redirect_stderr=true

Reload the supervisor config and check the status:

sudo supervisorctl reload
sudo supervisorctl status

Check that ES is responding (the http command here is HTTPie):

$ http localhost:9200
HTTP/1.1 200 OK
Content-Length: 311
Content-Type: application/json; charset=UTF-8

{
    "cluster_name": "es001",
    "name": "Weapon X",
    "tagline": "You Know, for Search",
    "version": {
        "build_hash": "ce9f0c7394dee074091dd1bc4e9469251181fc55",
        "build_snapshot": false,
        "build_timestamp": "2016-08-29T09:14:17Z",
        "lucene_version": "5.5.2",
        "number": "2.4.0"
    }
}

done.
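
To make this check scriptable, for instance in a deploy step, the version can be read from the response body. The following is a minimal Ruby sketch, not part of the original setup: the helper names es_version and expected_version? are hypothetical, and the sample body is trimmed from the response shown above.

```ruby
require "json"

# Extract the ES version number from the root endpoint's JSON body.
# Returns nil when the body has no version.number field.
def es_version(body)
  JSON.parse(body).dig("version", "number")
end

# True when the reported version equals the expected one.
def expected_version?(body, expected = "2.4.0")
  es_version(body) == expected
end

# Sample body, trimmed from the response above.
SAMPLE_BODY = <<~BODY
  {
    "cluster_name": "es001",
    "name": "Weapon X",
    "tagline": "You Know, for Search",
    "version": { "number": "2.4.0", "lucene_version": "5.5.2" }
  }
BODY

puts es_version(SAMPLE_BODY)
```

Against a live node, the body could be fetched with Net::HTTP.get(URI("http://localhost:9200")) from Ruby's standard library (after require "net/http") and passed to es_version.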

Installing the Chinese analyzer ik

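ES's default analyzer does not handle Chinese very well. For example, the word 上海 (Shanghai) is split into two separate tokens for searching,
whereas we would expect it to be searched as a single word.

The analyzer installed here is ik: elasticsearch-analysis-ik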

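  1. Go to the releases page https://github.com/medcl/elasticsearch-analysis-ik/releases and download the prebuilt zip package (pick the release built for your ES version),
  2. Unzip it into /usr/local/elasticsearch-2.4.0/plugins/ik/,
  3. Restart ES: sudo supervisorctl restart elasticsearch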

Check the result:

$ http localhost:9200/posts/_analyze analyzer=ik_smart text=上海大学生

HTTP/1.1 200 OK
Content-Length: 177
Content-Type: application/json; charset=UTF-8

{
    "tokens": [
        {
            "end_offset": 2,
            "position": 0,
            "start_offset": 0,
            "token": "上海",
            "type": "CN_WORD"
        },
        {
            "end_offset": 5,
            "position": 1,
            "start_offset": 2,
            "token": "大学生",
            "type": "CN_WORD"
        }
    ]
}
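
To compare analyzers quickly, it helps to reduce an _analyze response to just its token strings. The sketch below is illustrative only: analyzed_tokens and per_character_tokens are hypothetical helper names, the sample body mirrors the response above, and per_character_tokens is only a rough stand-in for the per-character splitting the default analyzer applies to Chinese text.

```ruby
require "json"

# Pull the token strings out of an _analyze response body, ordered by position.
def analyzed_tokens(body)
  JSON.parse(body).fetch("tokens", [])
      .sort_by { |t| t["position"] }
      .map { |t| t["token"] }
end

# Rough stand-in for per-character splitting of Chinese text (an assumption
# for comparison purposes, not the real analyzer).
def per_character_tokens(text)
  text.chars
end

# Sample body mirroring the ik_smart response shown above.
ANALYZE_BODY = <<~BODY
  {"tokens": [
    {"token": "上海", "start_offset": 0, "end_offset": 2, "position": 0, "type": "CN_WORD"},
    {"token": "大学生", "start_offset": 2, "end_offset": 5, "position": 1, "type": "CN_WORD"}
  ]}
BODY

puts analyzed_tokens(ANALYZE_BODY).join(" | ")
puts per_character_tokens("上海大学生").join(" | ")
```

With ik_smart the five characters collapse into two word-level tokens, while the per-character split yields five, which is why a search for 上海 behaves differently under the two analyzers.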

Configuring ik in a Rails project

Gemfile

# elasticsearch
gem "elasticsearch", git: "https://github.com/elasticsearch/elasticsearch-ruby.git"
gem "elasticsearch-model", git: "https://github.com/elasticsearch/elasticsearch-rails.git"
gem "elasticsearch-rails", git: "https://github.com/elasticsearch/elasticsearch-rails.git"

Configure the model so that CRUD operations are synced to ES:

class Post < ActiveRecord::Base
  # index_name "posts-#{Rails.env}"
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
  # ...

end

Re-importing existing data

  • Create a Rake file: vi {Rails.root}/lib/tasks/elasticsearch.rake
require 'elasticsearch/rails/tasks/import'
  • Apply the ik analyzer to specific fields by configuring the mapping in the model
settings index: { number_of_shards: 1, number_of_replicas: 0 } do
  mapping do
    indexes :title, type: 'string', analyzer: 'ik_smart'
    indexes :body, type: 'string', analyzer: 'ik_smart'
  end
end
  • Re-import
bundle exec rake environment elasticsearch:import:model CLASS='Post' BATCH=500 FORCE=y
  • Check that the mapping was updated: http localhost:9200/posts/_mapping

  • done.
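
The mapping check in the last step can also be done programmatically. Below is a minimal sketch under a few assumptions: the index is named posts and the document type post (what elasticsearch-model would derive from the Post model unless index_name is overridden), the response follows the ES 2.x _mapping shape, and fields_missing_analyzer is a hypothetical helper name. The sample body is made up to show one field that was not picked up.

```ruby
require "json"

# Return the fields whose mapping does not use the expected analyzer.
# Assumes the ES 2.x shape: index -> "mappings" -> type -> "properties".
def fields_missing_analyzer(body, index:, type:, fields:, analyzer: "ik_smart")
  properties = JSON.parse(body).dig(index, "mappings", type, "properties") || {}
  fields.reject { |field| properties.dig(field, "analyzer") == analyzer }
end

# Made-up sample: title uses ik_smart, body does not.
MAPPING_BODY = <<~BODY
  {"posts": {"mappings": {"post": {"properties": {
    "title": {"type": "string", "analyzer": "ik_smart"},
    "body":  {"type": "string"}
  }}}}}
BODY

missing = fields_missing_analyzer(MAPPING_BODY, index: "posts", type: "post",
                                  fields: %w[title body])
puts "fields without ik_smart: #{missing.inspect}"
```

An empty result means every listed field carries the analyzer; a non-empty one suggests the index was not recreated, in which case re-running the import with FORCE=y is the first thing to try.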

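Using function_score to improve search results

This feature in 2.4 is really great: we can write a short Groovy script that modifies the score to improve the ranking, for example by putting the newest, most-liked matches first.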

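def self.search(query)
  # Scale the relevance score by log10 of the like count, then add the post's
  # age in days counted from epoch second 1412092800, so newer posts rank higher.
  script = "_score * log10(max(doc['like_count'].value, 1)) + " \
           "((doc['created_at'].value/1000) - 1412092800)/86400.0"

  __elasticsearch__.search(
    query: {
      function_score: {
        query: { match: { title: query } },
        functions: [{ script_score: { script: script } }],
        # "replace" discards the original _score in favor of the script's result
        boost_mode: "replace"
      }
    }
  )
end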
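
To get a feel for how the script ranks documents, the same arithmetic can be reproduced in plain Ruby. This is only a sketch of the formula, not something ES runs: boosted_score and the constant names are hypothetical, and the division by 1000 is done in floating point here. Separately, as far as I know ES 2.x ships with inline scripting disabled, so a setting such as script.inline: true in elasticsearch.yml is likely needed before the query above is accepted.

```ruby
# Epoch second used as the baseline in the script (2014-09-30 16:00:00 UTC).
BASELINE_EPOCH = 1_412_092_800
SECONDS_PER_DAY = 86_400.0

# Mirror of the script_score expression:
#   _score * log10(max(like_count, 1)) + days since the baseline
def boosted_score(score, like_count, created_at_ms)
  popularity = Math.log10([like_count, 1].max)
  age_bonus  = (created_at_ms / 1000.0 - BASELINE_EPOCH) / SECONDS_PER_DAY
  score * popularity + age_bonus
end

# A post created ten days after the baseline.
ten_days_later_ms = (BASELINE_EPOCH + 10 * 86_400) * 1000

puts boosted_score(2.0, 100, ten_days_later_ms) # relevance 2.0, 100 likes
puts boosted_score(2.0, 0, ten_days_later_ms)   # same post with no likes
```

One thing the sketch makes visible: with a like count of 0 or 1, log10 yields 0, so the relevance score drops out entirely and only the age bonus remains. Whether that is desirable depends on the data; adding 1 inside the logarithm's argument would be one way to soften it.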

This entry was tagged Rails and Elasticsearch

