Graylog2 is an awesome tool that can provide insight into your environment to help you catch small problems before they become raging infernos. It’s also great at just making things more visible. Perhaps you’ll catch something in your logs that could be changed to make things run more efficiently. Stop looking through those boring log files! Get some Graylog2 in your life, but first let’s go over some recommended prerequisites.

1. Design an environment that fits your needs.

It’s easy to get in over your head when setting up your own Graylog2 log analysis environment. Without proper preparation, you could end up with an environment that does not fit your needs. Most likely, you’ll end up with a Graylog2 server that cannot process the volume of logs being forwarded to it, which can lead to dropped logs and unreliable data. There are a few steps you can take to avoid these problems:
- Determine how many devices will be forwarding logs to Graylog2.
- Estimate the number of logs being generated per device per second.
- Build your servers with enough resources to handle the estimated messages per second, with room for growth.
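For example (with made-up numbers), 200 devices averaging 5 messages per second each works out to roughly 200 × 5 = 1,000 messages per second that the environment must sustain, before accounting for bursts or future growth.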

2. Build your environment with scalability in mind.

Individual components that make up a Graylog2 environment were made to be scalable. Elasticsearch allows a new node to be added to a cluster with minimal configuration. A Graylog2 server can also be added to the environment with minimal changes to the Graylog2 Web Interface configuration file. Graylog2 also offers easy load balancer integration via its REST API, which allows HTTP “ALIVE” or “DEAD” checks on Graylog2 servers so that a node can be pulled out of rotation if it is having issues.
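As a quick illustration, a load balancer (or a monitoring script) can poll each node’s load balancer status endpoint. This is just a sketch, assuming a recent Graylog2 server with its REST API on the default port 12900 and a hypothetical hostname:

Load balancer check example:
curl -s http://graylog2-node1.example.com:12900/system/lbstatus
# prints "ALIVE" when the node should receive traffic, "DEAD" when it should be pulled from rotation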

3. Optimize data retention to get the most out of your environment.

By default, Graylog2 is configured to retain a total of 20 indices, with each index holding close to 20,000,000 events.

Graylog2.conf code snippet:
elasticsearch_max_docs_per_index = 20000000

# How many indices do you want to keep?
# elasticsearch_max_number_of_indices*elasticsearch_max_docs_per_index=total number of messages in your setup
elasticsearch_max_number_of_indices = 20

Unfortunately, Graylog2 does not yet allow for indexing and rotation based on date, but it is likely to be added in a future release. It is possible to tweak the elasticsearch_max_docs_per_index setting in the Graylog2 configuration file, so that one index holds close to a single day’s or week’s worth of logs. This may make it easier to manage data retention by date, but it is not exact. It is important to note that you should not set your data retention so high that disk space runs out on the Elasticsearch nodes.
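As a rough illustration (the numbers here are hypothetical), an environment averaging 500 messages per second generates about 500 × 86,400 ≈ 43 million messages per day, so a configuration like the following keeps roughly one day of logs per index and about a month of logs overall:

Graylog2.conf code snippet:
# roughly one day of logs per index at ~500 messages per second
elasticsearch_max_docs_per_index = 43200000

# ~30 days of total retention
elasticsearch_max_number_of_indices = 30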

4. Install and configure NTP!

It’s critical that you keep your Graylog2 servers synced with a time server, or you may end up wasting time looking at the error below and wondering what the heck is going on.

[Screenshot: Graylog2 NTP time-skew error]

Search the Graylog2 Google Group for this error, and you may find a post by a certain blog writer (ahem) who had to troubleshoot this issue a while back. Time skew causes your Graylog2 nodes to get out of sync. Don’t let that happen to you!
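As a minimal sketch on a Debian or Ubuntu host (package and service names vary by distribution), getting NTP in place and verifying it is quick:

NTP setup example:
sudo apt-get install ntp
sudo service ntp start
# verify that the daemon is syncing against its configured peers
ntpq -p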

5. Utilize the LDAP authentication functionality of the Graylog2 Web Interface.

The built-in LDAP authentication module in Graylog2 is great. For one, you don’t have to keep track of another account in your ever-growing password safe. Also, it’s just plain smart for security. The fewer neglected user accounts out there, the less likely it is that one of them will be compromised!

I recommend creating an Active Directory group and adding those who will be using Graylog2 to that group. Be sure to set the default role as “Reader.” If a user needs administrative rights, just have that user log in one time so that the account is created, and then manually bump the user up from “Reader” to “Administrator.” Easy! Plus, it’s better than making everyone Administrators by default.
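For reference, the LDAP settings in the Graylog2 Web Interface look roughly like the sketch below. The values are purely illustrative (a hypothetical example.com Active Directory domain and a hypothetical “Graylog2 Users” group), and the exact field names can vary between versions:

LDAP settings example (Active Directory):
LDAP server:    ldap://dc1.example.com:389
Search base DN: OU=Users,DC=example,DC=com
Search pattern: (&(objectClass=user)(sAMAccountName={0})(memberOf=CN=Graylog2 Users,OU=Groups,DC=example,DC=com))
Default role:   Reader

The memberOf clause is one common way to restrict logins to members of the Active Directory group you created.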

6. Monitor key components in your Graylog2 environment.

If you run Nagios or Icinga internally, it would be extremely beneficial to monitor the Graylog2 environment. Monitoring the graylog2-server, graylog2-web-interface, mongod, and elasticsearch services is a great way to ensure that these key services remain running. A loss of any of those services could be detrimental to log processing.
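As a minimal sketch using the standard Nagios check_procs plugin (the argument strings below are assumptions; adjust them to match how your init scripts actually launch each service):

Process check examples:
check_procs -c 1:1 -a 'graylog2-server'
check_procs -c 1:1 -a 'graylog2-web-interface'
check_procs -c 1:1 -C mongod
check_procs -c 1:1 -a 'elasticsearch'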

AppliedTrust offers two open-source Graylog2 service checks that can be pulled from the GitHub links below. One will assist with monitoring the Graylog2 server itself. The other can be used to run queries on the log data stored by Graylog2. For example, have it alert if a query returns results because a user ran “sudo su” or some other privilege-escalating command. It’s super simple and very effective!

The Elasticsearch API provides some awesome methods to determine the health of your Elasticsearch cluster. It is possible to pull the Elasticsearch cluster status through the API. The Elasticsearch documentation recommends using this Nagios plugin for integrating your monitoring environment with your Graylog2 environment: https://github.com/anchor/nagios-plugin-elasticsearch. Using this, you can have a Nagios or Icinga check that will check for a “green” cluster status and alert if it ever goes “yellow” or “red.”
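For example, assuming Elasticsearch is listening on its default port of 9200, the cluster health endpoint can be queried directly:

Cluster health check example:
curl -s 'http://localhost:9200/_cluster/health?pretty'
# inspect the "status" field: "green", "yellow", or "red"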

It is very important that you monitor disk space usage on servers in the Graylog2 environment. The root partition of every server should be monitored to keep a full disk from taking the operating system down. The Elasticsearch data directory (/var/elasticsearch/data/) is also a key partition to monitor, because letting it max out could cause your Graylog2 environment to crash.
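A minimal sketch with the standard Nagios check_disk plugin (the thresholds are just examples):

Disk space check examples:
check_disk -w 20% -c 10% -p /
check_disk -w 20% -c 10% -p /var/elasticsearch/data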

It is also a best practice to monitor key system components such as CPU utilization, memory usage, and system I/O to watch for any irregular activity on the core Graylog2 servers.

Oh yeah, don’t forget to monitor the ntpd service!

7. Don’t depend on Graylog2 as your only log retention tool.

Don’t let any single tool become a point of failure for log retention, not even Graylog2! Yes, it has all the nice bells and whistles, but it can be unstable at times. It is wise to store all of your logs in a secondary location. If possible, have all of your devices log to a centralized syslog server. This centralized server can store logs in flat files and then forward a copy over to the Graylog2 servers.
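As one illustrative approach (assuming rsyslog and hypothetical hostnames), the central syslog server can keep a flat-file copy locally and relay everything on to Graylog2:

rsyslog.conf snippet on the central syslog server:
# keep a local flat-file copy of everything
*.* /var/log/central/all.log
# relay a copy to the Graylog2 server over TCP (use a single @ for UDP)
*.* @@graylog2.example.com:514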