Using Splunk for Event Correlation

Sometimes it's not an individual IT event but a combination of events that signals a problem. For example, perhaps an attacker manages to open a shell on a breached host. There are many logical ways to detect attempted attacks through automation, such as watching for many ports being hit in rapid succession by a port scanner, but detecting a successful attack can be more difficult. Rather than watching for a single type of event, detecting successful attacks benefits from a technique called event correlation, where you use your tools to cross-reference different events and issues.

There is not one single technique for correlating events in Splunk. How you accomplish this task depends on what you are actually trying to do. This document offers a number of examples to help you determine what method makes the most sense for your particular needs.

NOTE: While some might consider transactions an event correlation example, there are some specific methods for working with these that warrant their own topic, coming soon.

Event correlation with the search command

One piece of the potential attacker puzzle is being able to tell how many IP addresses a particular user's traffic is coming from. To narrow down this example, let's say we're only worried about network logins, and the only network logins we have enabled are through ssh. We know that we're using sshd to handle ssh logins.

Building the search for sshd logins

In this case, we don't care whether a login was successful or not. In fact, it's best to make sure we get both failed and successful logins. Doing so raises the likelihood of finding problems: failed and successful logins may not mean much separately, but together they may form a pattern that is cause for concern.

Let's start by determining what search will find us successful sshd login events. When digging through our logs with Splunk, lines like the following appear:

Feb  8 08:41:41 splunk4 sshd[23292]: Accepted password for vily from ::ffff:10.2.1.254 port 47526 ssh2
Feb  8 16:11:40 splunk5 sshd(pam_unix)[28308]: session opened for user mark by (uid=0)
Feb  8 16:23:52 splunk4 sshd[28529]: Accepted publickey for mark from ::ffff:10.1.5.78 port 50045 ssh2

So, the following search should find successful sshd login events:

``sshd (Accepted publickey) OR (Accepted password) OR (session opened)``

In order to save yourself typing time (not to mention so you won't have to figure out this whole search again), save it as an event type. Throughout these examples we'll assume you named this event type sshd-session-open-success.

To find the failed logins, we use Splunk to look through our logs again. This gives us:

Feb  8 08:35:11 splunk5 sshd(pam_unix)[25665]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.1.1.162  user=vily
Feb  8 08:35:13 splunk5 sshd[25665]: Failed password for vily from ::ffff:10.1.1.162 port 55775 ssh2

So, we might search on sshd authentication failure, sshd failed password, or just sshd fail*. However, the last option is too broad and returns too many events, so this works better:

sshd fail* authentication OR password

Save this out as an event type for your convenience. For this example, we'll assume the name is sshd-session-fail.

Since we want all events with either failed or successful sshd logins, our larger search would start with:

eventtype=sshd-session-fail OR eventtype=sshd-session-open-success

Or you could make a single event type out of the two, saving out the search:

``sshd (Accepted publickey) OR (session opened) OR (fail* authentication) OR password``

as eventtype=sshd-session-fail-or-success
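
If you prefer to manage these saved searches in configuration files rather than through the UI, event types live in eventtypes.conf. The following is only a sketch, assuming the three event types named above and a local configuration directory (adjust the location and stanza names to your own setup):

# eventtypes.conf, e.g. in $SPLUNK_HOME/etc/system/local/
[sshd-session-open-success]
search = sshd (Accepted publickey) OR (Accepted password) OR (session opened)

[sshd-session-fail]
search = sshd fail* authentication OR password

[sshd-session-fail-or-success]
search = sshd (Accepted publickey) OR (session opened) OR (fail* authentication) OR password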

Summarizing the results based on the source IP

We could just look through the results to pick out the source IPs, but the whole point of using a tool like Splunk is to let it process the data for us. The stats command tells Splunk to generate summary statistics from whatever is passed to it. Its dc (distinct_count) function counts the number of distinct values of a field, in this case the unique IP addresses seen for each user. The command and function are used in the format:

stats dc(//field-to-count//) by //fields-to-group-by//

In our case here, we run into a complication. Look again at the sample events:

Feb  8 08:35:11 splunk5 sshd(pam_unix)[25665]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.1.1.162  user=vily
Feb  8 08:35:13 splunk5 sshd[25665]: Failed password for vily from ::ffff:10.1.1.162 port 55775 ssh2
Feb  8 08:41:41 splunk4 sshd[23292]: Accepted password for vily from ::ffff:10.2.1.254 port 47526 ssh2
Feb  8 16:11:40 splunk5 sshd(pam_unix)[28308]: session opened for user mark by (uid=0)
Feb  8 16:23:52 splunk4 sshd[28529]: Accepted publickey for mark from ::ffff:10.1.5.78 port 50045 ssh2

Notice that one sample carries the address in an rhost field, three carry it after a ::ffff: prefix, and one sample has no IP information at all. The Splunk interface standard calls for a field named src_ip, so the rhost value needs to be mapped to that field, and the ::ffff: addresses need to be extracted into src_ip=//value// pairs. Again, see http://wiki.splunk.com/Community:How_to_make_your_data_source_work_with_Splunk-provided_reports for how to make these conversions.
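
As one possible shape for that conversion (the linked page has the full details), a search-time extraction could pull both formats into src_ip. This is only a sketch: it assumes the events arrive under the syslog sourcetype, and the transform name sshd_src_ip is made up for the example.

# transforms.conf -- pull the client address into src_ip, whether it
# appears as rhost=<ip> or as "from ::ffff:<ip>" / "from <ip>"
[sshd_src_ip]
REGEX = (?:rhost=|from (?:::ffff:)?)(\d{1,3}(?:\.\d{1,3}){3})
FORMAT = src_ip::$1

# props.conf -- enable the extraction for the assumed sourcetype
[syslog]
REPORT-sshd_src_ip = sshd_src_ip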

Once you have implemented this conversion, your stats search begins with:

stats dc(src_ip) by

A second complication now emerges: only one of these lines actually has a //name=value// pair for user, kindly generated by the PAM module. This sshd daemon, at least, doesn't do that by default. See http://www.splunkbase.com/howtos/Splunk/Data_Sources_%5BSplunk%5D/howto:Make_your_data_source_work_with_Splunk-provided_reports for how to convert these entries to //name=value// pairs.
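
Again as a sketch only, a second extraction could pull the login name into a user field for all three message formats; the transform name sshd_user and the syslog sourcetype are assumptions, as above.

# transforms.conf -- capture the account name after "for user", "for", or "user="
[sshd_user]
REGEX = (?:for user |for |user=)(\w+)
FORMAT = user::$1

# props.conf -- add this alongside the earlier REPORT line
[syslog]
REPORT-sshd_user = sshd_user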

When this conversion is completed, then your stats search becomes:

stats dc(src_ip) by user
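
Run against the five sample events above, once both conversions are in place, the output would look something like this, since vily shows up from two addresses (10.1.1.162 and 10.2.1.254) and mark from one (10.1.5.78):

user     dc(src_ip)
mark     1
vily     2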

Looking for users who accessed from more than one IP address

Once you have your summary, you can filter out all of the users who came in from just one IP address. You start with the first two searches as follows:

eventtype=sshd-session-fail-or-success | stats dc(src_ip) by user

where you're using a pipe (|) to send the results of the first search as the input to the second. Now you add another pipe and the search command so you can run a search on the results of your stats command. For this example, we chose the search command because we are evaluating the contents of a specific field in each result.

You use the search command in the format:

search //field terms//

where //field// is the field you're searching on and //terms// refers to one or more search terms. Looking at the results of the command at the beginning of this section, you now have a field named dc(src_ip). This is what you want to use as the //field//. As for the search terms, what you're interested in is values greater than 1. So, you can use:

search "dc(src_ip)" > 1

as this component, giving you a final search of:

eventtype=sshd-session-fail-or-success | stats dc(src_ip) by user | search "dc(src_ip)" > 1
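
Quoting a field name that contains parentheses is awkward, so you may prefer to rename the count inside the stats command instead. A sketch of that variant, where the name ip_count is just an example:

eventtype=sshd-session-fail-or-success | stats dc(src_ip) as ip_count by user | search ip_count > 1 | sort - ip_count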

Event correlation with subsearches

Event correlation with Splunk sometimes involves subsearches, which let you take the results of one search and use them to filter another. Subsearches are useful when you are evaluating events in the context of the whole event set, rather than evaluating the events individually.

Typically subsearches are built in the form:

//2ndsearch// ``[``search //1stsearch//``]``

Using the example mentioned above, the first search would be formed to detect an attempted attack, and the second would watch for the successful opening of a shell on the same machine as the attempted attack.

Building the attack detection search

One way to watch for an attack is to watch for multiple failed logins on a single account. As with the previous scenario, we're only worried about network logins, and the only network logins we have enabled are through ssh, which is handled by sshd. We can also narrow things down to the specific user account that repeatedly fails logins until one succeeds.

As before, we find events such as:

Feb  8 08:35:11 splunk5 sshd(pam_unix)[25665]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.1.1.162  user=vily
Feb  8 08:35:13 splunk5 sshd[25665]: Failed password for vily from ::ffff:10.1.1.162 port 55775 ssh2

We can use the same sshd-session-fail event type here that we created for the previous example.

Building the login detection search

When digging through logs for successful sshd login events, lines like the following appear:

Feb  8 08:41:41 splunk4 sshd[23292]: Accepted password for vily from ::ffff:10.2.1.254 port 47526 ssh2
Feb  8 16:11:40 splunk5 sshd(pam_unix)[28308]: session opened for user mark by (uid=0)
Feb  8 16:23:52 splunk4 sshd[28529]: Accepted publickey for mark from ::ffff:10.1.5.78 port 50045 ssh2

These should look familiar from the previous example. We can use the same event type here as before, sshd-session-open-success.

Combining the two

Now that you have a search that finds failed ssh logins and a search that finds successful ssh logins, it's time to correlate the two so you can find which failed logins were followed by successful logins. To start with, this means using the structure:

``eventtype=sshd-session-open-success [search eventtype=sshd-session-fail`` //operators//``]``

The field we're primarily interested in here is the user field, so we know which account to investigate. Again, you will need to do some customization in order to create such a field where it doesn't exist. Once you do, your search becomes:

``eventtype=sshd-session-open-success [search eventtype=sshd-session-fail | fields + user]``
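
Behind the scenes, the subsearch runs first and its results are expanded into a filter on the outer search. With the sample events above, the effective search would be roughly:

``eventtype=sshd-session-open-success ((user=mark) OR (user=vily))``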

We can also rank the output by the users this has happened to most often by adding top onto the end:

``eventtype=sshd-session-open-success [search eventtype=sshd-session-fail | fields + user] | top user``

However, this would be more useful if we limited the failed login matches to users who had failed multiple times; everyone mistypes their password once in a while. How you define this threshold depends on your own environment. If you wanted to see only users with more than two failed logins, you could expand the search to:

``eventtype=sshd-session-open-success [search eventtype=sshd-session-fail | stats count by user | where count > 2 | fields + user] | top user``

You also might choose to get fancier, but this is a relatively straightforward example of how you can use Splunk for event correlation through subsearches.

Event correlation through time

Another attack-detection scenario involves when a login takes place. These days after-hours and weekend logins may not be uncommon, but their frequency and duration still tend to follow a different pattern than work-hours logins. For this example, let's look at the users who logged in Monday through Friday between 7pm and 7am, and those who logged in over the weekend.

Finding who logged in

We could get fancy here and build searches to add extra priority to failed logins during these hours, but for the moment we will just focus on who successfully logged in during these hours. In the first example we created the event type sshd-session-open-success that contained the search:

``sshd (Accepted publickey) OR (Accepted password) OR (session opened)``

The events it matched when digging through our logs were:

Feb  8 08:41:41 splunk4 sshd[23292]: Accepted password for vily from ::ffff:10.2.1.254 port 47526 ssh2
Feb  8 16:11:40 splunk5 sshd(pam_unix)[28308]: session opened for user mark by (uid=0)
Feb  8 16:23:52 splunk4 sshd[28529]: Accepted publickey for mark from ::ffff:10.1.5.78 port 50045 ssh2

However, if we match all of these, our count will be off, because different stages of the same login would each be counted rather than just one. For this search, we will focus on the sshd(pam_unix) entry that declares the session was opened. So, after running the search session opened and confirming that it only matches what we're looking for, we can save this search out as the event type sshd-session-opened.

Finding weekend logins

To find events that happened on a specific day of the week with Splunk, you use the field date_wday. Unlike the field issues elsewhere in these examples, you don't need to do anything special here: Splunk already knows when each of your events happened, and this field is available by default unless you have done something to disable it. If you'd like to see this field in your data, click the Fields dropdown box, select date_wday, and then click Apply. The field and its value will then appear beneath each event in your search results.

Now you can construct the search that looks for events that happened on Saturday or Sunday. To do so, you use:

date_wday=saturday OR date_wday=sunday

So to find all successfully opened ssh logins on the weekends, use:

eventtype=sshd-session-opened date_wday=saturday OR date_wday=sunday
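
If you want a per-user summary rather than the raw events, you could pipe that into stats (the explicit parentheses just make the grouping obvious):

eventtype=sshd-session-opened (date_wday=saturday OR date_wday=sunday) | stats count by user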

Finding weekday logins between 7pm and 7am

For the other half of the search, you need to look at weekdays between 7pm and 7am. To find weekdays, just reverse the weekend portion of the previous search:

NOT date_wday=saturday NOT date_wday=sunday

Now you need to narrow it down to the specific hours, and you have two options here. You could build a fun regular expression such as:

``"^(19|2[0-3]|[0-7])$"``

which matches values of date_hour from 7pm (hour 19) through 7am (hour 7) but nothing from 8am through 6pm, and turn it into:

``| regex date_hour="^(19|2[0-3]|[0-7])$"``

or you can (as most people probably prefer) specify a range like so:

``(date_hour >= 19 OR date_hour <= 7)``

Putting it all together

So, now that all of the components are in place, you can put it all together to make one aggregate search command that shows you who's logging in outside of work hours. From there you can click Report on results and adjust until you've got a report that helps get the data across easily.

session opened ((date_wday=saturday OR date_wday=sunday) OR ((NOT date_wday=saturday NOT date_wday=sunday) (date_hour >= 19 OR date_hour <= 7)))
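
As a sketch of how you might summarize directly in the search instead of building a report, you could swap in the sshd-session-opened event type from earlier and append top:

eventtype=sshd-session-opened ((date_wday=saturday OR date_wday=sunday) OR ((NOT date_wday=saturday NOT date_wday=sunday) (date_hour >= 19 OR date_hour <= 7))) | top user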

Event correlation with mapping

Another method of finding all cases where logins had previously failed and then a session successfully opened is to use //mapping//. The map command allows you to take the results of a search and map them into variables in a saved search or a subsearch. Say that you begin with the search:

sshd failed password

to look for failed logins through sshd. You might choose to group the results by time, say in five minute intervals, so you can get a feeling for how frequently the failed logins are occurring. To do so, use the localize command, which allows you to break results into time ranges in seconds, minutes, hours, or days. Adding the following:

| localize maxspan=5m

tells Splunk to break the results into five minute timespans, giving us:

sshd failed password | localize maxspan=5m

To then add the mapping portion, you send the results through another pipe to the map command.


For example, to look for session opened events within each of the time ranges that localize produced:

| map search="search session opened starttimeu=$starttimeu endtimeu=$endtimeu"
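
Putting those pieces together, and keeping the placeholder names used above for the time-boundary variables (check the map command documentation for the exact substitution syntax in your Splunk version), the whole pipeline would look something like:

sshd failed password | localize maxspan=5m | map search="search session opened starttimeu=$starttimeu endtimeu=$endtimeu"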
