Graylog Server can’t connect to Elasticsearch

Upon rebooting both my Graylog and Elasticsearch servers, Graylog suddenly could not connect to Elasticsearch. I checked that the config files hadn’t changed, that there was adequate disk space, that both servers could ping each other, and so on. As someone new to both Graylog and Elasticsearch, it was definitely a head-scratcher. I ran tail -f /var/log/elasticsearch/graylog2.log to see what was up.
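In concrete terms, those checks amount to something like the following (the peer hostname below is just a placeholder for the other server):

df -h                                          # adequate free disk space on both boxes?
ping elasticsearch01                           # placeholder hostname; can each server reach the other?
tail -f /var/log/elasticsearch/graylog2.log    # watch the Elasticsearch log live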

[2015-08-03 08:14:49,874][INFO ][node] [White Tiger] initialized
[2015-08-03 08:14:49,875][INFO ][node] [White Tiger] starting ...
[2015-08-03 08:14:49,972][INFO ][transport] [White Tiger] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/]}
[2015-08-03 08:14:49,984][INFO ][discovery] [White Tiger] graylog2/HP46Fyz-RV-Z4UHSzxJYMg
[2015-08-03 08:14:53,764][INFO ][cluster.service] [White Tiger] new_master [White Tiger][HP46Fyz-RV-Z4UHSzxJYMg][localhost][inet[/]], reason: zen-disco-join (elected_as_master)
[2015-08-03 08:14:53,970][INFO ][http] [White Tiger] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/]}
[2015-08-03 08:14:53,970][INFO ][node] [White Tiger] started
[2015-08-03 08:14:55,018][INFO ][gateway] [White Tiger] recovered [9] indices into cluster_state
[2015-08-03 08:15:08,864][INFO ][cluster.service] [White Tiger] added {[graylog2-server][ETxBUIRtReSC40zU3wWzEg][graylog.corp.waters.com][inet[/]]{client=true, data=false, master=false},}, reason: zen-disco-receive(join from node[[graylog2-server][ETxBUIRtReSC40zU3wWzEg][graylog.corp.waters.com][inet[/]]{client=true, data=false, master=false}])
[2015-08-03 08:15:09,190][INFO ][cluster.service] [White Tiger] removed {[graylog2-server][ETxBUIRtReSC40zU3wWzEg][graylog.corp.waters.com][inet[/]]{client=true, data=false, master=false},}, reason: zen-disco-node_left([graylog2-server][ETxBUIRtReSC40zU3wWzEg][graylog.corp.waters.com][inet[/]]{client=true, data=false, master=false})
[2015-08-03 08:15:38,971][INFO ][cluster.service] [White Tiger] added {[graylog2-server][Ovc47S_XT0OyaLKtrVdXfQ][graylog.corp.waters.com][inet[/]]{client=true, data=false, master=false},}, reason: zen-disco-receive(join from node[[graylog2-server][Ovc47S_XT0OyaLKtrVdXfQ][graylog.corp.waters.com][inet[/]]{client=true, data=false, master=false}])
[2015-08-03 08:15:39,107][INFO ][cluster.service] [White Tiger] removed {[graylog2-server][Ovc47S_XT0OyaLKtrVdXfQ][graylog.corp.waters.com][inet[/]]{client=true, data=false, master=false},}, reason: zen-disco-node_left([graylog2-server][Ovc47S_XT0OyaLKtrVdXfQ][graylog.corp.waters.com][inet[/]]{client=true, data=false, master=false})

**Note: I edited out my real IP addresses.**

After searching through 3 pages of Google search results, I found a post that said to check your indices by running:
curl 'localhost:9200/_cat/indices?v'

Lo and behold, two of them came back as “red” under the health column.
curl 'localhost:9200/_cat/indices?v'
health status index        pri rep docs.count docs.deleted store.size pri.store.size
yellow open   gpswae1.html   5   1          0            0       575b           575b
yellow open   webui          5   1          0            0       575b           575b
yellow open   phppath        5   1          0            0       575b           575b
green  open   graylog2_3     1   0   15736279            0       18gb           18gb
yellow open   perl           5   1          0            0       575b           575b
green  open   graylog2_2     1   0   20000243            0     22.7gb         22.7gb
red    open   graylog2_1     1   0   20000243            0     22.7gb         22.7gb
red    open   graylog2_0     1   0   20000243            0     22.7gb         22.7gb
yellow open   spipe          5   1          0            0       575b           575b
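A related check that points at the same problem is the cluster health API; a red status at the cluster level means at least one primary shard is unassigned, which lines up with the two red indices above:

curl 'localhost:9200/_cluster/health?pretty'
# check the "status" field and the unassigned_shards count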

Since this server is currently a POC and not in full production, I deleted the two corrupt indices by running:
curl -XDELETE 'http://localhost:9200/graylog2_0/'
curl -XDELETE 'http://localhost:9200/graylog2_1/'
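If more than a couple of indices had gone red, something like the following loop would generalize those two deletes. It is only a sketch built on the same _cat endpoint shown above, and it obviously only makes sense on a box where losing the red indices is acceptable:

curl -s 'localhost:9200/_cat/indices?h=health,index' | awk '$1 == "red" {print $2}' |
while read -r idx; do
  curl -XDELETE "http://localhost:9200/${idx}/"
done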

Once they were deleted, I was able to restart Elasticsearch, and Graylog connected successfully. I decided to look into other ways to monitor Elasticsearch and found ElasticHQ.

It was a quick and easy install on the Elasticsearch server, and I am now able to monitor it with much better visibility.
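In that era, ElasticHQ could be installed as an Elasticsearch site plugin; the plugin script path and the royrusso/elasticsearch-HQ identifier below are from memory, so double-check against the ElasticHQ docs for your version:

/usr/share/elasticsearch/bin/plugin --install royrusso/elasticsearch-HQ
# then browse to http://<es-host>:9200/_plugin/hq/ (path casing may vary by version)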


I recently started using ConEmu since I wanted to be able to have tabbed PowerShell consoles. In an effort to become more proficient with it, I’ve forced myself to try not to do anything outside of a PowerShell window.

At my current employer I’m a jack-of-all-trades SysAdmin. For me this means I work with both Windows and *nix machines. For the past few years I’ve been using SecureCRT or PuTTY to SSH into *nix machines. The other day I had an idea to make it easier for me to SSH from my Windows PC to these servers. I installed OpenSSH on my Windows 8.1 machine, which enables me to SSH/SCP from my Windows machine to a Linux/Solaris/etc. box from either the command prompt or PowerShell. I then added it to my path so I can just type “ssh” or “scp” instead of having to navigate to the OpenSSH directory every time. I’ve been using PowerShell and OpenSSH for almost a month now, and I’ve only run into one issue: not being able to use vi on one Solaris 10 machine.

Here’s a quick tutorial:

Navigate to System > Advanced system settings

Click on the Advanced tab, then click Environment Variables

Edit your PATH variable under User variables and add the location of the OpenSSH bin directory on your system (e.g. C:\Program Files (x86)\OpenSSH\bin)

That’s all you need to do. Now you can SSH or SCP directly from either cmd or PowerShell:
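For example (the hostnames and paths here are made up):

ssh jdoe@linuxbox01
scp C:\temp\report.txt jdoe@linuxbox01:/tmp/
scp jdoe@solarisbox01:/var/log/messages C:\temp\

The same syntax works from cmd or PowerShell once the OpenSSH bin directory is on your PATH.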

Quick Script: Enable AD Users and Change Password

Root Cause Analysis (AKA: Be careful where you root)

A week ago, I received a phone call from both my manager and a coworker at 8PM on a Sunday. Something was changing all of our Solaris systems’ hostnames to -f. My coworker fixed all of those boxes, and shortly after it happened two more times. At this point we’re at a complete loss; we’re not sure why or what is doing this. The logs show absolutely nothing, and none of our Linux boxes were seeing the issue.

Then we check: hostname -f on Linux will give you the FQDN. However, Solaris’ hostname command takes no flags and will just change the hostname to whatever follows the command. We stay up until 10:30PM or so checking logs and trying to find a trace of anything. There is absolutely nothing in the logs that points us to what could be causing this. We decide we’ll reconvene in the morning, as it hasn’t happened again in about two hours.
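Here is the difference in a nutshell (on Solaris you need appropriate privileges, i.e. root, to set the hostname):

hostname -f     # Linux: prints the FQDN, e.g. server01.example.com
hostname -f     # Solaris 10: no flags supported, so this sets the hostname to "-f"
hostname        # Solaris 10: now prints -f
uname -n        # both platforms: prints the nodename without changing anything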

The next morning we’re still trying to figure out what happened and why. What we did know is that it happened at 4:50PM and again between 8 and 8:15PM. I wander over to our security guy and ask if there’s any way we can tell who was logged in on the VPN between 4:45 and 8:30PM. As we’re waiting for him to check the logs, something hit me. I asked, “Who knows the root credentials?” My manager quickly responded with “Just the 6 people in our group.” At that point, a lightbulb went off in my head.

Whatever caused this, one: it needed to be someone in our group, and two: the way the change was happening was almost like something was scanning the network. Then I remembered a coworker was playing with Dell OpenManage Essentials. The security guy verified he was VPNed in around the time the first one happened as well. From there, I logged into Dell OpenManage Essentials and sure enough, there were 4 scans that had run and had root credentials configured. I pinged my boss at this point to let him know my theory, to which he responded with “That shouldn’t…but try running it on a test system.”

Sure enough, I ran the scan against a test Solaris box and the hostname was changed to -f. Even more concerning, the scan was running over SSH, and root logins should not have been allowed on any of these systems. Apparently Centrify (which we use for SSO) creates another sshd config where the default is that root login is permitted.
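If you are in a similar Centrify setup, it is worth checking both sshd configs explicitly. The Centrify config path below is an assumption from memory and may differ by version:

grep -i '^PermitRootLogin' /etc/ssh/sshd_config /etc/centrifydc/ssh/sshd_config
# If the Centrify-managed config still allows root logins, set PermitRootLogin no
# there as well and restart the corresponding sshd service.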

Certainly not the worst thing that could happen, but having the hostname changed on our production SAP Solaris boxes was pretty scary. Also, we’re lucky it was run on a Sunday and not Monday morning. So, a word of caution to those who permit SSH login as root and also share those root credentials with people who may not be familiar with what that could do: don’t. Or at the very least, make sure you check the tools they’re using first.

P.S.: Dell, please tell your OpenManage developers to use uname -n and not hostname -f.

Terminate User Script

I wrote this script to help streamline the process of terminating an employee at work. I’m certainly still learning PowerShell, but this definitely scratched an itch at $job.

The only thing you will need to change to run it is the path to the OU. You will also need a Log folder on your C:\ drive; this is where the list of groups gets dumped, with the username as the filename. We occasionally have rehires, so it’s nice to have a list of what groups they were in prior to being terminated in case they come back.

VMNet Build Error for VMware Player Plus on Fedora 20

I ran into this error today when I reformatted my laptop to install Fedora 20.

I installed VMware Player; however, whenever I went to launch it I received a vmnet build error:

vthread-3| W110: Failed to build vmnet.  Failed to execute the build command.

On 3.13 kernels that have the network packet filtering framework (Netfilter) enabled, the vmnet module will fail to build.

To fix this issue, which I found on the ArchWiki:
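I am not reproducing the actual patch here (that is on the ArchWiki page); the general shape of the procedure is to patch the bundled vmnet source and rebuild the VMware kernel modules, roughly like this (the paths are the VMware defaults, and the patch filename is hypothetical):

cd /usr/lib/vmware/modules/source
tar -xf vmnet.tar
patch -p0 -i /path/to/vmnet-netfilter.patch   # hypothetical filename; use the patch from the ArchWiki
tar -cf vmnet.tar vmnet-only
vmware-modconfig --console --install-all      # rebuild the VMware kernel modules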

It took a bit of digging for me to find this link – so I thought I would put it somewhere more easily accessible for if (when) I run into this error again.