{"id":1609,"date":"2019-03-12T17:12:53","date_gmt":"2019-03-12T11:42:53","guid":{"rendered":"http:\/\/www.mka.in\/wp\/?p=1609"},"modified":"2019-03-14T09:55:06","modified_gmt":"2019-03-14T04:25:06","slug":"efk-stack-versatile-and-very-capable-analytics-platform","status":"publish","type":"post","link":"https:\/\/www.mka.in\/wp\/efk-stack-versatile-and-very-capable-analytics-platform\/","title":{"rendered":"EFK stack &#8211; versatile and very capable analytics platform"},"content":{"rendered":"\n<p>So far I was happily using the ELK stack to feed syslog messages into Elasticsearch. In the ELK stack I used Logstash to aggregate syslogs and feed them into Elasticsearch. <\/p>\n\n\n\n<p>Recently, I came across Fluentd and found it quite interesting and flexible. <br> Using Fluentd with Elasticsearch and Kibana, I have now built an EFK stack.<\/p>\n\n\n\n<p>The EFK stack is a free, open-source platform that can store unstructured data and later serve it for analytics and visualization.<\/p>\n\n\n\n<p>Here are the steps to build and configure this powerful analytics platform.<\/p>\n\n\n\n<p>Download OpenJDK, Elasticsearch and Kibana<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nwget -c \"https:\/\/download.java.net\/java\/GA\/jdk11\/9\/GPL\/openjdk-11.0.2_linux-x64_bin.tar.gz\"\n\nwget -c \"https:\/\/artifacts.elastic.co\/downloads\/elasticsearch\/elasticsearch-6.6.1.tar.gz\"\n\nwget -c https:\/\/artifacts.elastic.co\/downloads\/kibana\/kibana-6.6.1-linux-x86_64.tar.gz<\/code><\/pre>\n\n\n\n<p>Extract the OpenJDK, Elasticsearch and Kibana tarballs into \/opt\/efk<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nmkdir \/opt\/efk\n\ntar xvzf openjdk-11.0.2_linux-x64_bin.tar.gz --directory \/opt\/efk\/\n\ntar xvzf elasticsearch-6.6.1.tar.gz --directory \/opt\/efk\/\n\ntar xvzf kibana-6.6.1-linux-x86_64.tar.gz --directory \/opt\/efk\/<\/code><\/pre>\n\n\n\n<p>Create symbolic links for the extracted directories <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nln -s 
elasticsearch-6.6.1\/ elasticsearch\n\nln -s jdk-11.0.2\/ java\n\nln -s kibana-6.6.1-linux-x86_64\/ kibana\n\nls -l\ntotal 12\nlrwxrwxrwx.  1 root root   20 Mar 12 07:22 elasticsearch -> elasticsearch-6.6.1\/\ndrwxr-xr-x.  8 root root 4096 Feb 13 17:11 elasticsearch-6.6.1\nlrwxrwxrwx.  1 root root   11 Mar 12 07:23 java -> jdk-11.0.2\/\ndrwxr-xr-x.  8 root root 4096 Mar 12 07:19 jdk-11.0.2\nlrwxrwxrwx.  1 root root   26 Mar 12 07:23 kibana -> kibana-6.6.1-linux-x86_64\/\ndrwxr-xr-x. 13 root root 4096 Mar 12 07:20 kibana-6.6.1-linux-x86_64<\/code><\/pre>\n\n\n\n<p>Add Unix user accounts for the elasticsearch and kibana processes<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nadduser --home \/opt\/efk\/elasticsearch elasticsearch\n\nadduser --home \/opt\/efk\/kibana kibana<\/code><\/pre>\n\n\n\n<p>Change ownership of the Elasticsearch and Kibana components<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nchown -R kibana:kibana \/opt\/efk\/kibana-6.6.1-linux-x86_64\/ \/opt\/efk\/kibana\n\nchown -R elasticsearch:elasticsearch \/opt\/efk\/elasticsearch-6.6.1\/ \/opt\/efk\/elasticsearch<\/code><\/pre>\n\n\n\n<p>Create log directories and change their ownership<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nmkdir \/var\/log\/{elasticsearch,kibana}\n\nchown -R kibana:kibana \/var\/log\/kibana\/\n\nchown -R elasticsearch:elasticsearch \/var\/log\/elasticsearch\/<\/code><\/pre>\n\n\n\n<p>Download Fluentd (rpm or deb, depending on whether you are on RHEL or Debian\/Ubuntu) from https:\/\/td-agent-package-browser.herokuapp.com\/3\/<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nwget -c http:\/\/packages.treasuredata.com.s3.amazonaws.com\/3\/redhat\/7\/x86_64\/td-agent-3.3.0-1.el7.x86_64.rpm<\/code><\/pre>\n\n\n\n<p>On RHEL you may hit a dependency on redhat-lsb-core; satisfy it by installing that package first. 
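On a yum-based host this is a single command (a sketch; assumes root and the stock RHEL repositories):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nyum install -y redhat-lsb-core<\/code><\/pre>\n\n\n\n<p>With the dependency in place, the td-agent rpm installs cleanly.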
<\/p>\n\n\n\n<p>Then install the Fluentd td-agent package.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nrpm -Uvh td-agent-3.3.0-1.el7.x86_64.rpm<\/code><\/pre>\n\n\n\n<p>Now, let us jump to the configuration of each component to complete the EFK platform.<\/p>\n\n\n\n<p>Start by configuring Elasticsearch.<br>Create the data location for Elasticsearch.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nmkdir \/data\/elasticdata\nchown -R elasticsearch:elasticsearch \/data\/elasticdata\n\ncd \/opt\/efk\/elasticsearch\/config<\/code><\/pre>\n\n\n\n<p>Edit the &#8220;elasticsearch.yml&#8221; config file<\/p>\n\n\n\n<p>Give the node a name and set the data and log paths<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nnode.name: mysmartanalytics\npath.data: \/data\/elasticdata\npath.logs: \/var\/log\/elasticsearch<\/code><\/pre>\n\n\n\n<p>You can tune Java memory options in the jvm.options config file.<\/p>\n\n\n\n<p>Now, let us configure Kibana<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ncd \/opt\/efk\/kibana\/config<\/code><\/pre>\n\n\n\n<p>Edit the &#8220;kibana.yml&#8221; config file<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nserver.name: \"mysmartanalytics\"<\/code><\/pre>\n\n\n\n<p>Now, let us start the Elasticsearch and Kibana processes before moving on to the Fluentd config.<\/p>\n\n\n\n<p>Elasticsearch startup<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nsu - elasticsearch -c \"export JAVA_HOME=\/opt\/efk\/java\/;export ES_PATH_CONF=\/opt\/efk\/elasticsearch\/config\/; \/opt\/efk\/elasticsearch\/bin\/elasticsearch -v &amp;\"<\/code><\/pre>\n\n\n\n<p>List the Elasticsearch indices; this won't return any index yet, but a successful response confirms the startup went well.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ncurl -XGET \"http:\/\/localhost:9200\/_cat\/indices?v\"\n\nhealth status index uuid pri rep docs.count docs.deleted store.size pri.store.size<\/code><\/pre>\n\n\n\n<p>Upon startup, logs will go to \/var\/log\/elasticsearch\/<\/p>\n\n\n\n<p>Kibana startup<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nsu - kibana -c 
\"export JAVA_HOME=\/opt\/efk\/java\/; \/opt\/efk\/kibana\/bin\/kibana -c \/opt\/efk\/kibana\/config\/kibana.yml -l \/var\/log\/kibana\/kibana.log &amp;\"<\/code><\/pre>\n\n\n\n<p>The moment you start Kibana, list the Elasticsearch indices again and you will see that Kibana creates its own index in Elasticsearch.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ncurl -XGET \"http:\/\/localhost:9200\/_cat\/indices?v\"\n\nhealth status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size\ngreen  open   .kibana_1 901xy-VVSiOq07hD572FmA   1   0          1            0        5kb            5kb<\/code><\/pre>\n\n\n\n<p>It is always a good idea to expose Kibana and Elasticsearch (for RESTful access) on a network interface through an HTTP proxy. You can also add HTTP basic authentication and source IP whitelisting for Elasticsearch and Kibana in the proxy config.<\/p>\n\n\n\n<p>You can use the Apache web server as an inbound reverse proxy. Add the following lines to your Apache config (note that ProxyRequests must be Off for a reverse proxy; turning it On would open a forward proxy to the world).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nProxyRequests Off\n\n&lt;Location \/>\n        ProxyPass http:\/\/127.0.0.1:5601\/\n        ProxyPassReverse http:\/\/127.0.0.1:5601\/\n&lt;\/Location>\n\n&lt;Location \/elastic>\n        ProxyPass http:\/\/127.0.0.1:9200\/\n        ProxyPassReverse http:\/\/127.0.0.1:9200\/\n        Deny from All\n        Allow from 127.0.0.1\n        Allow from 192.168.0.23\n        Allow from 192.168.0.25\n&lt;\/Location><\/code><\/pre>\n\n\n\n<p>Restart the Apache service<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nsystemctl restart httpd.service<\/code><\/pre>\n\n\n\n<p>Keep an eye on the Apache logs<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>tail -f \/var\/log\/httpd\/access_log<\/code><\/pre>\n\n\n\n<p>Now, let us test this setup.<\/p>\n\n\n\n<p>You can use any browser, but during development and testing I suggest using the curl tool.<\/p>\n\n\n\n<p>Access Elasticsearch through its proxy URL<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>\ncurl -v -L http:\/\/your_host_ip\/elastic\n* About to connect() to your_host_ip port 80 (#0)\n*   Trying your_host_ip...\n* Connected to your_host_ip (your_host_ip) port 80 (#0)\n> GET \/elastic HTTP\/1.1\n> User-Agent: curl\/7.29.0\n> Host: your_host_ip\n> Accept: *\/*\n>\n&lt; HTTP\/1.1 200 OK\n&lt; Date: Tue, 12 Mar 2019 09:39:40 GMT\n&lt; Server: Apache\/2.4.6 (SLES Expanded Support platform) OpenSSL\/1.0.2k-fips mod_fcgid\/2.3.9\n&lt; content-type: application\/json; charset=UTF-8\n&lt; content-length: 497\n&lt;\n{\n  \"name\" : \"mysmartanalytics\",\n  \"cluster_name\" : \"elasticsearch\",\n  \"cluster_uuid\" : \"vUXDDln7TUuS_1DL5YE1bQ\",\n  \"version\" : {\n    \"number\" : \"6.6.1\",\n    \"build_flavor\" : \"default\",\n    \"build_type\" : \"tar\",\n    \"build_hash\" : \"1fd8f69\",\n    \"build_date\" : \"2019-02-13T17:10:04.160291Z\",\n    \"build_snapshot\" : false,\n    \"lucene_version\" : \"7.6.0\",\n    \"minimum_wire_compatibility_version\" : \"5.6.0\",\n    \"minimum_index_compatibility_version\" : \"5.0.0\"\n  },\n  \"tagline\" : \"You Know, for Search\"\n}<\/code><\/pre>\n\n\n\n<p>Access Kibana through its proxy URL<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ncurl -v -L http:\/\/your_host_ip\/\n\n* About to connect() to your_host_ip port 80 (#0)\n*   Trying your_host_ip...\n* Connected to your_host_ip (your_host_ip) port 80 (#0)\n> GET \/ HTTP\/1.1\n> User-Agent: curl\/7.29.0\n> Host: your_host_ip\n> Accept: *\/*\n>\n&lt; HTTP\/1.1 302 Found\n&lt; Date: Tue, 12 Mar 2019 09:36:35 GMT\n&lt; Server: Apache\/2.4.6 (SLES Expanded Support platform) OpenSSL\/1.0.2k-fips mod_fcgid\/2.3.9\n&lt; location: \/app\/kibana\n&lt; kbn-name: kibana\n&lt; kbn-xpack-sig: 126a568e0707377747401e2824d2f268\n&lt; content-type: text\/html; charset=utf-8\n&lt; cache-control: no-cache\n&lt; content-length: 0\n&lt;\n* Connection #0 to host your_host_ip left intact\n* Issue another request to this URL: 
'http:\/\/your_host_ip\/app\/kibana'\n* Found bundle for host your_host_ip: 0x1dbbe50\n* Re-using existing connection! (#0) with host your_host_ip\n* Connected to your_host_ip(your_host_ip) port 80 (#0)\n> GET \/app\/kibana HTTP\/1.1\n> User-Agent: curl\/7.29.0\n> Host: your_host_ip\n> Accept: *\/*\n>\n&lt; HTTP\/1.1 200 OK<\/code><\/pre>\n\n\n\n<p>Now you can open Kibana in any browser to see if all looks good so far.<\/p>\n\n\n\n<p>Let us start with the Fluentd config<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ncd \/etc\/td-agent\/<\/code><\/pre>\n\n\n\n<p>Take a backup of the default config<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ncp td-agent.conf td-agent.conf.backup<\/code><\/pre>\n\n\n\n<p>Here, we will prepare a Fluentd config that accepts events via RESTful calls. You can explore other input and output mechanisms of Fluentd as well.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ncat td-agent.conf\n&lt;source>\n  @type http\n  @id input_http\n  port 8080\n#  bind 0.0.0.0\n#  body_size_limit 32m\n#  keepalive_timeout 10s\n&lt;\/source>\n\n&lt;match mka.event>\n  @type copy\n  &lt;store>\n    @type elasticsearch\n    include_tag_key true\n    host localhost\n    port 9200\n    logstash_format true\n    logstash_prefix mka\n  &lt;\/store>\n&lt;\/match><\/code><\/pre>\n\n\n\n<p>The above config accepts events over RESTful calls with a JSON payload. Each JSON payload carrying the &#8220;mka.event&#8221; tag will be matched and pushed into the Elasticsearch datastore. You can have more tags in REST calls and match them to the desired processing units of Fluentd. 
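For instance, a second, hypothetical tag &#8220;mka.iot&#8221; could be routed to its own index prefix with an additional match section (a sketch following the same pattern as above):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\n&lt;match mka.iot>\n  @type copy\n  &lt;store>\n    @type elasticsearch\n    include_tag_key true\n    host localhost\n    port 9200\n    logstash_format true\n    logstash_prefix mka-iot\n  &lt;\/store>\n&lt;\/match><\/code><\/pre>\n\n\n\n<p>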
You can also set the index prefix via the logstash_prefix parameter in each match section.<\/p>\n\n\n\n<p>Restart td-agent<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\n\/etc\/init.d\/td-agent restart<\/code><\/pre>\n\n\n\n<p>td-agent logs go to <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ntail -f \/var\/log\/td-agent\/td-agent.log<\/code><\/pre>\n\n\n\n<p>Let us push some data over the REST interface to test.<\/p>\n\n\n\n<p>The generic syntax is:<\/p>\n\n\n\n<p>curl -X POST -d &#8216;json={json payload}&#8217; http:\/\/your_host_ip:port\/some.tag<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ncurl -X POST -d 'json={\"PET\":\"DOG\",\"BREED\":\"Pug\",\"SIZE\":\"Short\",\"FOOD\":\"VEG\"}' http:\/\/192.168.0.23:8080\/mka.event\n\ncurl -X POST -d 'json={\"Plant\":\"Xmas tree\",\"Indoor\":\"yes\",\"LeafType\":\"Spines\"}' http:\/\/192.168.0.23:8080\/mka.event<\/code><\/pre>\n\n\n\n<p>Now, let us verify that an index with the prefix &#8220;mka&#8221; has been created in Elasticsearch.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ncurl -XGET \"http:\/\/localhost:9200\/_cat\/indices?v\"\n\nhealth status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size\n\nyellow open   mka-2019.03.12 1Wnyeb9tSMyFvtyDSCTVJg   5   1          2            0     12.1kb         12.1kb\n\ngreen  open   .kibana_1      901xy-VVSiOq07hD572FmA   1   0          2            0      8.6kb          8.6kb<\/code><\/pre>\n\n\n\n<p>If you want to restrict access to the Fluentd REST interface to certain IP addresses and add authentication on REST calls, bind to 127.0.0.1 port 8080 in td-agent.conf and put 127.0.0.1:8080 behind the HTTP proxy.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\n&lt;source>\n  @type http\n  @id input_http\n  port 8080\n  bind 127.0.0.1\n#  body_size_limit 32m\n#  keepalive_timeout 10s\n&lt;\/source><\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>\n&lt;Location \/fluentd>\n        ProxyPass http:\/\/127.0.0.1:8080\n  
      ProxyPassReverse http:\/\/127.0.0.1:8080\n       Deny from All\n       Allow from 127.0.0.1\n       Allow from 192.168.0.23\n       Allow from 192.168.0.25\n&lt;\/Location>\n<\/code><\/pre>\n\n\n\n<p>Let us search the data now.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ncurl -XGET \"http:\/\/127.0.0.1:9200\/_search\" -H 'Content-Type: application\/json' -d'\n{\n  \"size\": 100,\n    \"query\": {\n        \"range\" : {\n            \"@timestamp\" : {\n                \"gte\": \"now-1d\",\n                \"lte\": \"now\",\n                \"format\": \"HH:mm:ss dd\/MM\/yyyy \"\n            }\n        }\n    }\n}'<\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\n{\n   \"took\":0,\n   \"timed_out\":false,\n   \"_shards\":{\n      \"total\":6,\n      \"successful\":6,\n      \"skipped\":0,\n      \"failed\":0\n   },\n   \"hits\":{\n      \"total\":2,\n      \"max_score\":1.0,\n      \"hits\":[\n         {\n            \"_index\":\"mka-2019.03.12\",\n            \"_type\":\"fluentd\",\n            \"_id\":\"TNNxcWkBeQURKlY1UfB4\",\n            \"_score\":1.0,\n            \"_source\":{\n               \"PET\":\"DOG\",\n               \"BREED\":\"Pug\",\n               \"SIZE\":\"Short\",\n               \"FOOD\":\"VEG\",\n               \"@timestamp\":\"2019-03-12T10:26:22.845303539+00:00\",\n               \"tag\":\"mka.event\"\n            }\n         },\n         {\n            \"_index\":\"mka-2019.03.12\",\n            \"_type\":\"fluentd\",\n            \"_id\":\"TdNxcWkBeQURKlY1UfB4\",\n            \"_score\":1.0,\n            \"_source\":{\n               \"Plant\":\"Xmas tree\",\n               \"Indoor\":\"yes\",\n               \"LeafType\":\"Spines\",\n               \"@timestamp\":\"2019-03-12T10:27:17.473281491+00:00\",\n               \"tag\":\"mka.event\"\n            }\n         }\n      ]\n   }\n}<\/code><\/pre>\n\n\n\n<p>You can visualize the same data in Kibana as well.<\/p>\n\n\n\n<figure 
class=\"wp-block-image\"><a href=\"http:\/\/www.mka.in\/wp\/wp-content\/uploads\/2019\/03\/kibana_efk.jpg\" data-lbwps-width=\"1149\" data-lbwps-height=\"525\" data-lbwps-srcsmall=\"https:\/\/www.mka.in\/wp\/wp-content\/uploads\/2019\/03\/kibana_efk.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"960\" height=\"439\" src=\"http:\/\/www.mka.in\/wp\/wp-content\/uploads\/2019\/03\/kibana_efk-960x439.jpg\" alt=\"\" class=\"wp-image-1610\" srcset=\"https:\/\/www.mka.in\/wp\/wp-content\/uploads\/2019\/03\/kibana_efk-960x439.jpg 960w, https:\/\/www.mka.in\/wp\/wp-content\/uploads\/2019\/03\/kibana_efk-595x272.jpg 595w, https:\/\/www.mka.in\/wp\/wp-content\/uploads\/2019\/03\/kibana_efk-768x351.jpg 768w, https:\/\/www.mka.in\/wp\/wp-content\/uploads\/2019\/03\/kibana_efk.jpg 1149w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/><\/a><\/figure>\n\n\n\n<p>I will further explore using this platform for SIEM and IoT data analysis and visualisation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So far I was happily using the ELK stack to feed syslog messages into Elasticsearch. In the ELK stack I used Logstash to aggregate syslogs and feed them into Elasticsearch. Recently, I came across Fluentd and found it quite interesting and flexible. Using Fluentd with Elasticsearch and Kibana I have now built an EFK stack. 
EFK [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1612,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[71,70,10,69,72,9,67],"class_list":["post-1609","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-bytes","tag-analytics","tag-efk","tag-elasticsearch","tag-fluentd","tag-json","tag-kibana","tag-rest"],"_links":{"self":[{"href":"https:\/\/www.mka.in\/wp\/wp-json\/wp\/v2\/posts\/1609","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mka.in\/wp\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mka.in\/wp\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mka.in\/wp\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mka.in\/wp\/wp-json\/wp\/v2\/comments?post=1609"}],"version-history":[{"count":12,"href":"https:\/\/www.mka.in\/wp\/wp-json\/wp\/v2\/posts\/1609\/revisions"}],"predecessor-version":[{"id":1625,"href":"https:\/\/www.mka.in\/wp\/wp-json\/wp\/v2\/posts\/1609\/revisions\/1625"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.mka.in\/wp\/wp-json\/wp\/v2\/media\/1612"}],"wp:attachment":[{"href":"https:\/\/www.mka.in\/wp\/wp-json\/wp\/v2\/media?parent=1609"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mka.in\/wp\/wp-json\/wp\/v2\/categories?post=1609"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mka.in\/wp\/wp-json\/wp\/v2\/tags?post=1609"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}