Elasticsearch for Hadoop Cluster

ES Nodes

We have 4 nodes.

  • 1 Master
  • 3 Slaves
Node IP


root: OMAN@123
admin: OMAN@123

Software Added

RHEL Development

Using the Original Source

  • yum groupinstall 'Development Tools'
  • yum install zlib zlib-devel

Python 3.6

Python 3.6 Built from Source

- installed in /usr/local/bin
- Private Python Environment built
- Private Python Modules used to add modules

Using Private Python Environment


Java 8

ES needs Java 8.

I pulled the latest rpm from the Oracle site and installed it like this:

rpm -i jdk-8u152-linux-x64.rpm


Using the ES supplied rpm

rpm -i elasticsearch-5.6.3.rpm

Auto Start

Using the suggested start scripts

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service

GNU Parallel

I like GNU Parallel because it lets me spread work across multiple CPUs and cores.

This is how to install it from source.

 wget gnu_parallel.bz2
 # the archive is bzip2-compressed, so use -j rather than gunzip
 tar -xjvf gnu_parallel.bz2
 cd gnu_parallel_20171102
 ./configure
 make
 su root
 make install


 parallel --citation

When prompted, type: will cite

At this point we should have GNU Parallel installed and ready to use in a script.
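As a rough sketch of what "use in a script" could look like, the Python below only builds a GNU Parallel command line that runs one loader job per starting position. The script name ./load.sh, the helper names, and the job count are hypothetical and not from these notes; only the -j option and the ::: argument separator are GNU Parallel's own syntax.

```python
import shlex


def parallel_command(script, start_positions, count, jobs=4):
    """Build an argv that runs `script START COUNT` once per start position."""
    # "{}" is GNU Parallel's placeholder for the current input argument.
    argv = ["parallel", "-j", str(jobs), script, "{}", str(count), ":::"]
    argv.extend(str(start) for start in start_positions)
    return argv


def as_shell(argv):
    """Render the argv as a single shell-safe command string."""
    return " ".join(shlex.quote(arg) for arg in argv)


# ./load.sh is a hypothetical loader script, used for illustration only.
command = parallel_command("./load.sh", [10, 1000010], 500000)
print(as_shell(command))
```

The list form can be handed to subprocess.run(command) on a node where GNU Parallel is installed; building the list separately makes it easy to inspect before anything runs.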

Test Loading

Building data

To build some test data I used my custom script called

./ 10 500000

Starting from index position 10, this creates 1 million records (500000 * 2: half English, half Arabic).
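The custom script itself is not shown here, so the following is only a minimal sketch of how such a generator might work. It assumes the name/from/to/msg fields and the type1 mapping from the data definition later in these notes, the eia index name, and the newline-delimited format of the Elasticsearch _bulk API. The sample messages and function names are made up for illustration.

```python
import json

# Hypothetical sample text; the real script's content is not shown in these notes.
ENGLISH_MSG = "hello from the test loader"
ARABIC_MSG = "مرحبا من برنامج الاختبار"


def make_records(start, count):
    """Yield 2 * count (position, document) pairs, alternating English and Arabic."""
    position = start
    for _ in range(count):
        for lang, msg in (("en", ENGLISH_MSG), ("ar", ARABIC_MSG)):
            yield position, {
                "name": "user-{}-{}".format(position, lang),
                "from": position,
                "to": position + 1,
                "msg": msg,
            }
            position += 1


def to_bulk_lines(records, index="eia", doc_type="type1"):
    """Render records as action/source line pairs for the _bulk API."""
    lines = []
    for position, doc in records:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(position)}}
        lines.append(json.dumps(action))
        # ensure_ascii=False keeps the Arabic text readable in the output file.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

Called as make_records(10, 500000), this would yield 1 million records starting at position 10; the output of to_bulk_lines can be written to a file and posted to the _bulk endpoint in batches.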


Mount the RHEL image and issue the command

yum install httpd

That's it.

ElasticSearch Head

I downloaded elasticsearch-head from GitHub, and then tried to open the web page.

However, inside Firefox I was getting an error:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at This can be fixed by moving the resource to the same domain or enabling CORS.

Apparently we can allow access by creating a .htaccess file that looks like this:

Header set Access-Control-Allow-Origin "*"

This can also be placed in a Directory section of the server config file (httpd.conf usually).

However, I think this is best fixed on the Elasticsearch side:

cd /etc/elasticsearch
vi elasticsearch.yml

At the bottom of this file I placed

http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, X-User"

Restart Elasticsearch

service elasticsearch restart
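Since the same three lines have to go into elasticsearch.yml on every node, a small helper can append them only when they are missing. This is a sketch, not a tested deployment tool: the function name is hypothetical, and it only looks at top-level "key: value" lines, ignoring commented-out ones.

```python
CORS_SETTINGS = [
    ("http.cors.enabled", "true"),
    ("http.cors.allow-origin", '"*"'),
    ("http.cors.allow-headers",
     '"X-Requested-With, Content-Type, Content-Length, X-User"'),
]


def add_cors_settings(yml_text):
    """Return yml_text with any missing CORS settings appended at the bottom."""
    present = set()
    for line in yml_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments; record the key of every active setting.
        if stripped and not stripped.startswith("#") and ":" in stripped:
            present.add(stripped.split(":", 1)[0].strip())
    missing = ["{}: {}".format(key, value)
               for key, value in CORS_SETTINGS if key not in present]
    if not missing:
        return yml_text
    if yml_text and not yml_text.endswith("\n"):
        yml_text += "\n"
    return yml_text + "\n".join(missing) + "\n"
```

Reading /etc/elasticsearch/elasticsearch.yml, passing its text through add_cors_settings, and writing the result back leaves the file unchanged on a second run, so it is safe to repeat before the restart.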

RHEL 7.1 Firewall Config

To see the firewall status

firewall-cmd --state

To see the current settings

 firewall-cmd --list-all-zones

To allow HTTP (port 80) and HTTPS

 firewall-cmd --permanent --add-service=http
 firewall-cmd --permanent --add-service=https
 firewall-cmd --reload

To allow ports 9200 and 9300 (Elasticsearch)

 firewall-cmd --permanent --add-port=9200/tcp
 firewall-cmd --permanent --add-port=9300/tcp
 firewall-cmd --reload
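After the reload it is worth checking from another node that the ports are really reachable. The sketch below does a plain TCP connect test using only the Python standard library; the helper names are hypothetical, and a successful connect only shows that the port is open, not that Elasticsearch is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_es_ports(host, ports=(9200, 9300)):
    """Map each port to whether it is reachable on host."""
    return {port: port_is_open(host, port) for port in ports}
```

For example, check_es_ports("pem01") run from one of the slaves would return a dict such as {9200: True, 9300: True} if both rules took effect.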

Elasticsearch Slave Nodes

These are the steps

  • mkdir /media/dvd
  • mount -t iso9660 file /media/dvd
  • vi /etc/yum.repos.d/media.repo
  • yum groupinstall 'Development Tools'
  • yum install zlib zlib-devel
  • rpm -i jdk-8u152-linux-x64.rpm
  • rpm -i elasticsearch-5.6.3.rpm
  • mkdir /esdata
  • mkdir /eslog
  • chown elasticsearch:elasticsearch /esdata
  • chown elasticsearch:elasticsearch /eslog

Elasticsearch Config File

The host IP address needs changing on each node. node.master is false on the slaves and true on the master.
node.master: false
node.ingest: false
path.data: /esdata
path.logs: /eslog
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, X-User"
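Because only the host IP and node.master differ between nodes, the per-node part of the file can be generated rather than edited by hand. This is a minimal sketch with a hypothetical function name. It assumes network.host is the setting that carries the host IP and that path.data points at the /esdata directory created in the steps above. The CORS lines are the same on every node and are left out here.

```python
def render_node_config(host_ip, is_master):
    """Return the node-specific elasticsearch.yml lines for one node."""
    settings = [
        ("network.host", host_ip),  # assumption: this is the "host IP address" setting
        ("node.master", "true" if is_master else "false"),
        ("node.ingest", "false"),
        ("path.data", "/esdata"),   # assumption: data lives in /esdata
        ("path.logs", "/eslog"),
    ]
    return "\n".join("{}: {}".format(key, value) for key, value in settings) + "\n"


# The IP address below is a placeholder, not one of the real node addresses.
print(render_node_config("192.0.2.11", is_master=False))
```

Calling it once per node with is_master=True only for the master keeps the four files consistent.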


Generally it has been excellent, with fast loading and very fast retrieval. However, at around 1.4B records I am starting to notice that things are slowing down a little.

Shard Time?

So with 1.4B records in an index called eia, I will now start sending new data into a new index, eia2, split into 3 shards.

I created the new index like this:

curl -XPUT 'pem01:9200/eia2?pretty' -H 'Content-Type: application/json' --data-binary @eia.json

And the data definition (eia.json) is:

{
    "settings" : {
        "number_of_shards" : 3
    },
    "mappings" : {
        "type1" : {
            "properties" : {
                "name" : { "type" : "text" },
                "from" : { "type" : "integer" },
                "to"   : { "type" : "integer" },
                "msg"  : { "type" : "text" }
            }
        }
    }
}

You will notice this is exactly the same data definition file as I created earlier. I just changed the URL in the curl command.
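If more indices like eia2 follow, the same request can be built from Python instead of editing the curl line each time. This is a sketch using only the standard library; the function names are hypothetical, and it assumes the same host (pem01), port 9200, and mapping as above. Nothing is sent until urlopen is called on the request.

```python
import json
import urllib.request


def build_index_body(number_of_shards=3):
    """Return the index definition as a dict (same content as eia.json)."""
    return {
        "settings": {"number_of_shards": number_of_shards},
        "mappings": {
            "type1": {
                "properties": {
                    "name": {"type": "text"},
                    "from": {"type": "integer"},
                    "to": {"type": "integer"},
                    "msg": {"type": "text"},
                }
            }
        },
    }


def make_create_request(host, index, body):
    """Build (but do not send) the PUT request that creates the index."""
    url = "http://{}:9200/{}?pretty".format(host, index)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = make_create_request("pem01", "eia2", build_index_body())
# To actually create the index: urllib.request.urlopen(request)
```

Keeping the body in one function means the next index only needs a different name, which mirrors the "same file, different URL" approach used with curl.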