Testing Elasticsearch's _optimize API for forcing segment merges

First, I ran the test against an index in the following state:
health index                  pri rep docs.count docs.deleted store.size pri.store.size 
green  javaindex_20160518       5   1   23330821            0     15.8gb          7.9gb 
Before the merge:
[root@betaing nock]# curl -s "http://localhost:9200/_cat/segments/javaindex_20160518?v&h=shard,segment,size,size.memory" | awk '{sum += $NF} END {print sum}'
456833526
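
For repeated measurements, the same sum can be scripted instead of piping curl through awk. A minimal Python 3 sketch (not part of the original test run), assuming the node listens on http://localhost:9200:

import urllib.request

def segment_memory_bytes(index, host="http://localhost:9200"):
    # Fetch only the size.memory column from _cat/segments; bytes=b asks for raw byte values,
    # and without the "v" flag no header line is printed, so every token is a number.
    url = "%s/_cat/segments/%s?h=size.memory&bytes=b" % (host, index)
    text = urllib.request.urlopen(url).read().decode("utf-8")
    return sum(int(v) for v in text.split())

print(segment_memory_bytes("javaindex_20160518"))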
Merge operation:
curl -XPOST 'http://localhost:9200/javaindex_20160518/_optimize?max_num_segments=1'
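Note: _optimize is the endpoint name in Elasticsearch 1.x; from version 2.1 onward it was renamed to _forcemerge, which takes the same max_num_segments parameter. The tests here use the older name.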
After the merge:
[root@betaing nock]# curl -s "http://localhost:9200/_cat/segments/javaindex_20160518?v&h=shard,segment,size,size.memory" | awk '{sum += $NF} END {print sum}'
369622567
Segment memory freed by the merge:
>>> print (456833526 - 369622567)
87210959  ----> about 87.2 MB freed
Percentage:
>>> print (456833526 - 369622567) / 456833526.0 
0.190903149696  ----> about 19% reduction
Now let's repeat the test with a larger index, again merging down to a single segment. The index:
health index                  pri rep docs.count docs.deleted store.size pri.store.size 
green  javaindex_20160520       5   1  103324505            0     70.3gb         35.1gb 
Before the merge:
[root@betaing nock]# curl -s "http://localhost:9200/_cat/segments/javaindex_20160520?v&h=shard,segment,size,size.memory" | awk '{sum += $NF} END {print sum}'
1698117764
Merge operation:
curl -XPOST 'http://localhost:9200/javaindex_20160520/_optimize?max_num_segments=1'
After the merge:
[root@betaing index]# curl -s "http://localhost:9200/_cat/segments/javaindex_20160520?v&h=shard,segment,size,size.memory" | awk '{sum += $NF} END {print sum}'
1622962469
>>> print ( 1698117764 - 1622962469 ) / 1698117764.0
0.0442579994116

This merge freed about 4.4% of the segment memory, roughly 75.2 MB.
Interim summary: from the two examples above, the larger the index, the smaller the fraction of segment memory that gets freed!

Next, let's take a single index and compare the results of merging down to different numbers of segments.

Before merging:
health index                  pri rep docs.count docs.deleted store.size pri.store.size 
green  phpindex_20160526        5   1  260338401            0     96.5gb         48.2gb
[root@betaing nock]# curl -s "http://localhost:9200/_cat/segments/phpindex_20160526?v&h=shard,segment,size,size.memory" | awk '{sum += $NF} END {print sum}'
3955994758
Merging to 10 segments:
curl -XPOST 'http://localhost:9200/phpindex_20160526/_optimize?max_num_segments=10'
[root@betaing nock]# curl -s "http://localhost:9200/_cat/segments/phpindex_20160526?v&h=shard,segment,size,size.memory" | awk '{sum += $NF} END {print sum}'
3929919062
>>> print ( 3955994758 - 3929919062 ) / 3955994758.0
0.00659143846115
>>> print ( 3955994758 - 3929919062 )
26075696
After this merge, segment memory dropped by about 26 MB, a reduction of 0.66%.

Merging to 5 segments:
curl -XPOST 'http://localhost:9200/phpindex_20160526/_optimize?max_num_segments=5'
{"_shards":{"total":10,"successful":10,"failed":0}}
[root@betaing nock]# curl -s "http://localhost:9200/_cat/segments/phpindex_20160526?v&h=shard,segment,size,size.memory" | awk '{sum += $NF} END {print sum}'
3899949448
Reduction after the merge:
>>> print ( 3955994758 - 3899949448 ) / 3955994758.0
0.0141671851022
>>> print ( 3955994758 - 3899949448 ) 
56045310
This merge freed about 56 MB of segment memory, a reduction of 1.42%.

Merging to 1 segment:
[root@betaing nock]# curl -XPOST 'http://localhost:9200/phpindex_20160526/_optimize?max_num_segments=1'
{"_shards":{"total":10,"successful":10,"failed":0}}
[root@betaing nock]# curl -s "http://localhost:9200/_cat/segments/phpindex_20160526?v&h=shard,segment,size,size.memory" | awk '{sum += $NF} END {print sum}'
3892073433
Reduction after the merge:
>>> print ( 3955994758 - 3892073433 ) / 3955994758.0
0.0161580914309
>>> print ( 3955994758 - 3892073433 )
63921325
This merge freed about 64 MB of segment memory, a reduction of 1.6%.

Summary: as the target segment count decreases, more segment memory is freed and the percentage reduction grows, but not proportionally.

The merge performance figures are shown in the chart attached to the original post.
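
The whole comparison can be automated in the same spirit. A rough Python 3 sketch (hypothetical, not the script used to produce the numbers above); it issues the same _optimize calls and reports the reduction for each target segment count against the original value:

import urllib.request

HOST = "http://localhost:9200"
INDEX = "phpindex_20160526"

def segment_memory_bytes(index):
    # Sum of size.memory over all segments of the index, in bytes.
    url = "%s/_cat/segments/%s?h=size.memory&bytes=b" % (HOST, index)
    return sum(int(v) for v in urllib.request.urlopen(url).read().split())

before = segment_memory_bytes(INDEX)
for n in (10, 5, 1):
    # _optimize blocks until the merge completes, which can take a long time on large indices.
    req = urllib.request.Request(
        "%s/%s/_optimize?max_num_segments=%d" % (HOST, INDEX, n),
        data=b"", method="POST")
    urllib.request.urlopen(req)
    after = segment_memory_bytes(INDEX)
    print("max_num_segments=%d: freed %d bytes (%.2f%%)"
          % (n, before - after, 100.0 * (before - after) / before))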

1 comment

One thing the article doesn't emphasize: the merge itself takes quite a long time. For the index in the article, merging a roughly 73 GB index down to a single segment took about 30 minutes!
