Elasticsearch Notes
Monday, September 26, 2016
I recently upgraded our project's ES to 2.4.0 and tuned its search results, so here are some notes.
Installation
If the JDK is not installed yet, install it first:
sudo add-apt-repository ppa:webupd8team/java -y
sudo apt-get update
sudo apt-get install oracle-java8-installer
Install ES
aria2c https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.4.0/elasticsearch-2.4.0.tar.gz
tar zxvf elasticsearch-2.4.0.tar.gz
sudo mv elasticsearch-2.4.0 /usr/local/
# Adjust the relevant settings (path.data, path.logs, etc.)
vim /usr/local/elasticsearch-2.4.0/config/elasticsearch.yml
/usr/local/elasticsearch-2.4.0/bin/elasticsearch
Manage ES with supervisor by creating the config file /etc/supervisor/conf.d/es.conf:
[program:elasticsearch]
command=/usr/local/elasticsearch-2.4.0/bin/elasticsearch
directory=/usr/local/elasticsearch-2.4.0
user=lxneng
autostart=true
autorestart=true
redirect_stderr=true
Reload the supervisor config and check the status:
sudo supervisorctl reload
sudo supervisorctl status
Check that ES is up:
$ http localhost:9200
HTTP/1.1 200 OK
Content-Length: 311
Content-Type: application/json; charset=UTF-8
{
  "cluster_name": "es001",
  "name": "Weapon X",
  "tagline": "You Know, for Search",
  "version": {
    "build_hash": "ce9f0c7394dee074091dd1bc4e9469251181fc55",
    "build_snapshot": false,
    "build_timestamp": "2016-08-29T09:14:17Z",
    "lucene_version": "5.5.2",
    "number": "2.4.0"
  }
}
done.
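If you want to script the check above (say, in a deploy step), a minimal Ruby sketch can parse the root endpoint's JSON and read `version.number`. The helper names `es_version` and `fetch_es_version` are hypothetical, and the default URL assumes ES listens on localhost:9200 as in this setup.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: extract the ES version string from the root endpoint's JSON body.
# Returns nil if the body has no version.number field.
def es_version(body)
  JSON.parse(body).dig("version", "number")
end

# Hypothetical helper: GET the root endpoint and return its version string.
# Raises (e.g. Errno::ECONNREFUSED) if ES is not reachable.
def fetch_es_version(base_url = "http://localhost:9200")
  es_version(Net::HTTP.get(URI(base_url)))
end
```

`es_version` can be exercised against a saved response without a running cluster; `fetch_es_version` should report the same `number` field that httpie printed.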
Installing the ik Chinese analyzer
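ES's default analyzer does not handle Chinese well. The word 上海 (Shanghai), for example, is split into the two tokens 上 and 海 for searching, whereas we usually want it searched as a single word.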
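To make the difference concrete, here is a toy Ruby sketch contrasting one-character-per-token splitting (roughly what the default analyzer does with Chinese text) with dictionary-based forward maximum matching. It is only an illustration of the idea: ik uses its own dictionaries and ambiguity handling, and the four-word `DICT` below is made up.

```ruby
# Toy illustration only: this is NOT how ES or ik actually tokenize.

# One token per character, roughly what the default analyzer emits for Chinese.
def unigram_tokens(text)
  text.chars
end

# Forward maximum matching: at each position take the longest dictionary word
# that matches; if none matches, fall back to a single character.
def fmm_tokens(text, dict)
  max_len = dict.map(&:length).max || 1
  tokens = []
  i = 0
  while i < text.length
    len = [max_len, text.length - i].min
    len -= 1 while len > 1 && !dict.include?(text[i, len])
    tokens << text[i, len]
    i += len
  end
  tokens
end

# Hypothetical mini-dictionary for the example.
DICT = %w[上海 大学 大学生 学生].freeze
```

With this dictionary, `fmm_tokens("上海大学生", DICT)` yields `["上海", "大学生"]`, while `unigram_tokens` yields five single characters, which is why a query for 上海 also matches documents that merely contain 上 and 海 separately.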
Here I install ik: elasticsearch-analysis-ik
- Download the prebuilt zip package from the releases page https://github.com/medcl/elasticsearch-analysis-ik/releases,
- unzip it into /usr/local/elasticsearch-2.4.0/plugins/ik/,
- restart ES:
sudo supervisorctl restart elasticsearch
Check the result:
$ http localhost:9200/posts/_analyze analyzer=ik_smart text=上海大学生
HTTP/1.1 200 OK
Content-Length: 177
Content-Type: application/json; charset=UTF-8
{
  "tokens": [
    {
      "end_offset": 2,
      "position": 0,
      "start_offset": 0,
      "token": "上海",
      "type": "CN_WORD"
    },
    {
      "end_offset": 5,
      "position": 1,
      "start_offset": 2,
      "token": "大学生",
      "type": "CN_WORD"
    }
  ]
}
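The same `_analyze` call can be made from Ruby with only the standard library. This is a sketch assuming ES at localhost:9200, an existing `posts` index, and Ruby 2.4+ (for `Net::HTTP.post`); the helper names are hypothetical. It sends the same `analyzer`/`text` JSON body that the httpie command sends.

```ruby
require "json"
require "net/http"
require "uri"

# Build the JSON body for the _analyze API (same fields as the httpie call).
def analyze_body(text, analyzer: "ik_smart")
  JSON.generate({ "analyzer" => analyzer, "text" => text })
end

# Keep only the token strings from an _analyze response body.
def tokens_from(response_body)
  JSON.parse(response_body).fetch("tokens", []).map { |t| t["token"] }
end

# POST to /<index>/_analyze and return the tokens (needs a running ES).
def analyze(text, analyzer: "ik_smart", index: "posts", base_url: "http://localhost:9200")
  uri = URI("#{base_url}/#{index}/_analyze")
  res = Net::HTTP.post(uri, analyze_body(text, analyzer: analyzer),
                       "Content-Type" => "application/json")
  tokens_from(res.body)
end
```

Running `analyze("上海大学生")` and `analyze("上海大学生", analyzer: "standard")` side by side is a quick way to compare ik with the default behaviour.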
Configuring ik analysis in a Rails project
Gemfile
# elasticsearch
gem "elasticsearch", git: "https://github.com/elasticsearch/elasticsearch-ruby.git"
gem "elasticsearch-model", git: "https://github.com/elasticsearch/elasticsearch-rails.git"
gem "elasticsearch-rails", git: "https://github.com/elasticsearch/elasticsearch-rails.git"
Configure the model so that records are synced to ES whenever they are created, updated, or deleted:
class Post < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
  # index_name must come after the include, which defines it
  # index_name "posts-#{Rails.env}"
  # ...
end
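If you do not want every column indexed, elasticsearch-model lets the model override `as_indexed_json`, which it calls when serializing a record for ES. The sketch below uses a plain `Struct` as a hypothetical stand-in for `Post` so it runs without Rails, and `author_email` is an invented column to show exclusion. Whatever you keep, include `like_count` and `created_at`, because the function_score script later in these notes reads them through `doc[...]`.

```ruby
require "time"

# Hypothetical stand-in for the ActiveRecord Post model (runs without Rails).
# author_email is an invented column that should NOT be sent to ES.
PostRecord = Struct.new(:title, :body, :like_count, :created_at, :author_email) do
  # elasticsearch-model calls as_indexed_json to build the indexed document;
  # overriding it controls exactly which fields reach ES.
  def as_indexed_json(_options = {})
    {
      "title" => title,
      "body" => body,
      "like_count" => like_count,                   # read by the script_score
      "created_at" => created_at.getutc.iso8601     # date field, also read by the script
    }
  end
end
```

In the real ActiveRecord model the usual equivalent is `def as_indexed_json(options = {}) as_json(only: [:title, :body, :like_count, :created_at]) end`.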
Re-importing the existing data
- Create a rake task file with this content:
vi {Rails.root}/lib/tasks/elasticsearch.rake
require 'elasticsearch/rails/tasks/import'
- To apply the ik analyzer to specific fields, configure it in the model:
settings index: { number_of_shards: 1, number_of_replicas: 0 } do
  mapping do
    indexes :title, type: 'string', analyzer: 'ik_smart'
    indexes :body, type: 'string', analyzer: 'ik_smart'
  end
end
- Re-import:
bundle exec rake environment elasticsearch:import:model CLASS='Post' BATCH=500 FORCE=y
- Check that the mapping has been updated:
http localhost:9200/posts/_mapping
done.
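Instead of eyeballing the `_mapping` output, a small Ruby sketch can check that `title` and `body` really picked up `ik_smart`. It assumes the ES 2.x response shape `{index => {"mappings" => {type => {"properties" => ...}}}}`, the index name `posts`, and the document type `post` (presumably the elasticsearch-model default for a `Post` model at the time); the helper names are hypothetical, so adjust the index and type if yours differ.

```ruby
require "json"

# Fields the model's mapping block assigns to ik_smart.
EXPECTED_ANALYZERS = { "title" => "ik_smart", "body" => "ik_smart" }.freeze

# Collect field => analyzer from a GET /<index>/_mapping response body.
# Fields without an explicit analyzer are left out.
def field_analyzers(mapping_json, index:, type:)
  props = JSON.parse(mapping_json).dig(index, "mappings", type, "properties") || {}
  props.each_with_object({}) do |(field, spec), out|
    out[field] = spec["analyzer"] if spec["analyzer"]
  end
end

# True only if every expected field uses the expected analyzer.
def mapping_ok?(mapping_json, index: "posts", type: "post")
  actual = field_analyzers(mapping_json, index: index, type: type)
  EXPECTED_ANALYZERS.all? { |field, analyzer| actual[field] == analyzer }
end
```

If `mapping_ok?` returns false after an import, the index was probably not recreated; note that `FORCE=y` is what makes the import task delete and recreate the index with the new mapping.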
Using function_score to improve search results
This feature in 2.4 is really great: we can write a Groovy script that rewrites the score to improve the results, for example ranking recent, highly scored matches first.
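def self.search(query)
  __elasticsearch__.search(
    {
      query: {
        function_score: {
          functions: [
            {
              script_score: {
                # Groovy: weight relevance by log10(likes), then add 1.0 per day
                # since 2014-10-01 00:00 UTC+8 (created_at is read as epoch milliseconds)
                script: "_score * log10(max(doc['like_count'].value, 1)) + ((doc['created_at'].value/1000) - 1412092800)/86400.0"
              }
            }
          ],
          # Base query: full-text match on the ik-analyzed title field
          query: {
            match: { title: query }
          },
          # The script already includes _score, so use its result as the final score
          boost_mode: "replace"
        }
      }
    }
  )
end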
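To reason about how the script ranks documents, a plain-Ruby re-implementation of the same formula is handy. It is a sketch for offline experimentation, not something ES runs; `custom_score`, `rank` and the constant names are hypothetical. The constant 1412092800 is 2014-10-01 00:00 in UTC+8, so every day after that adds 1.0 to the score.

```ruby
# Offline re-implementation of the Groovy script_score; ES does not run this.
EPOCH_SECONDS = 1_412_092_800   # 2014-10-01 00:00:00 UTC+8, the offset used in the script
SECONDS_PER_DAY = 86_400.0

# score:         the query's _score
# like_count:    the document's like_count
# created_at_ms: created_at as epoch milliseconds
def custom_score(score, like_count, created_at_ms)
  relevance = score * Math.log10([like_count, 1].max)                 # 0 when like_count <= 1
  recency = (created_at_ms / 1000.0 - EPOCH_SECONDS) / SECONDS_PER_DAY # +1.0 per day
  relevance + recency
end

# Order hypothetical [title, _score, likes, created_at_ms] rows by the script's
# value alone, as boost_mode "replace" does (highest first).
def rank(rows)
  rows.sort_by { |_title, score, likes, created_ms| -custom_score(score, likes, created_ms) }
      .map(&:first)
end
```

With these numbers, a match with `_score` 5.0 and 100 likes gains 10.0 from the relevance term, exactly as much as being 10 days newer. Note that `log10(max(like_count, 1))` is 0 for posts with 0 or 1 likes, so for those posts text relevance drops out entirely and only `created_at` decides their order; if that is not intended, an offset such as `log10(like_count + 10)` would keep relevance in play.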