Community:How to make your data source work with Splunk-provided reports

From Splunk Wiki

Make your data sources work with Splunk-provided reports

Splunk's reports are based on an evolving solution standard. This standard governs which fields, event type tags, and host tags Splunk can expect to find in data sources. Splunk reports then use these fields and tags to put the data into context.

However, many data sources do not follow these standards out of the box. In these cases, you have two options:

  • Manually add the necessary tags and normalize the fields.
  • Find, download, and install an existing add-on that someone else has provided for your particular data source. Be sure that it does everything you want. If not, you can extend the add-on to suit your needs.

This document addresses option #1, taking a data source and manually readying it for use with Splunk reports. Go here for information on option #2.

Determining context

Every data source has a context. Understanding this context helps you decide what data--and hence what fields--you're interested in being able to extract for use in reports. For example, is this data source a Cisco PIX firewall log that is part of your organization's PCI solution? With firewall logs you can expect to have (and need) fields such as:

  • The time the event occurred, which can help you narrow down to just the events that happened around a time you are suspicious of, or can help you determine when an attack started.
  • The host the event occurred on (the firewall host itself), as you may have more than one firewall in your overall network structure.
  • Source IP address, hostname, or FQDN, so you can identify where the traffic is arriving from.
  • Destination IP address, hostname, or FQDN, so you can identify where the traffic is headed.

Note: FQDN refers to Fully Qualified Domain Name, or the full host, domain, and extension such as host5.example.com.

Often you can also expect firewall events to contain:

  • Source port, which will help you identify the port the traffic left from.
  • Destination port, which will identify the port the traffic is headed to.
  • The protocol and/or service being used, so you can identify what type of traffic this is. Sometimes you can infer the traffic type from the port, but not always, as it is possible to reconfigure which services use which ports.
  • Action, where the firewall details what it did with the packet. Often an action is in the form of terms such as accept and deny, but you have to look at each individual firewall to be sure of its terminology.

If you want this information and your firewall logs don't contain it, you can typically reconfigure your firewall to begin including this information for all new events.

Examining your data source

On top of the general information you may be able to find in a firewall event, each firewall product has its own data and ways of doing things. Let's go back to the Cisco PIX example. A log entry might look like:

Apr 23 04:12:35 10.0.0.3 %PIX-2-106006: Deny inbound UDP from 10.0.1.19/80 to 10.0.0.23/80 on interface eth0.

The data here breaks down to:

  • The timestamp of Apr 23 04:12:35
  • The firewall's IP address of 10.0.0.3
  • The Cisco PIX message identification code of %PIX-2-106006
  • The action taken in this message: Deny.
  • The indication that this is an inbound packet.
  • The originating host's IP address of 10.0.1.19.
  • The originating host's port of 80.
  • The receiving host's IP address of 10.0.0.23.
  • The receiving host's port of 80.
  • The interface that the traffic came into the firewall through: eth0.
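
The breakdown above can be sketched as a regular expression with named groups. This is only an illustration in Python, not Splunk configuration, and the pattern is an assumption fitted to the single sample line shown here; other PIX message types are laid out differently. Apart from src_ip, the field names (message_id, direction, protocol, src_port, dest_ip, dest_port, interface) are chosen for this example rather than taken from any standard. The protocol (UDP) is also captured, although the list above does not call it out.

```python
import re

# Hypothetical pattern for the one sample line in this document;
# real PIX messages vary by message ID.
PIX_DENY = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<message_id>%PIX-\d-\d+): "
    r"(?P<action>\w+) (?P<direction>\w+) (?P<protocol>\w+) "
    r"from (?P<src_ip>[\d.]+)/(?P<src_port>\d+) "
    r"to (?P<dest_ip>[\d.]+)/(?P<dest_port>\d+) "
    r"on interface (?P<interface>\w+)"
)


def parse_pix_event(line):
    """Return a dict of fields for a matching line, or None if it does not match."""
    match = PIX_DENY.match(line)
    return match.groupdict() if match else None


sample = (
    "Apr 23 04:12:35 10.0.0.3 %PIX-2-106006: Deny inbound UDP "
    "from 10.0.1.19/80 to 10.0.0.23/80 on interface eth0."
)
fields = parse_pix_event(sample)
```

Each key in the resulting dictionary corresponds to one candidate field from the list.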

So which pieces of this data represent fields you might want to extract? You may well want access to all of them except the timestamp, perhaps not all at once, but at different times when looking for different information. The timestamp is used for time navigation rather than as an actual search term.

A single event can give you a general idea of structure, but you'll want to take a far more extensive look through your log sample to determine what types of events need to be captured for reporting, alerting, and searches.

An example: sshd events

For the reasoning behind this particular example, see http://www.splunkbase.com/howtos/Splunk/howto:Using_Splunk_for_Event_Correlation.

When investigating ssh logins from some sshd servers, you can run into a collection of events like this:

Feb  8 08:35:11 splunk5 sshd(pam_unix)[25665]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.1.1.162  user=vily
Feb  8 08:35:13 splunk5 sshd[25665]: Failed password for vily from ::ffff:10.1.1.162 port 55775 ssh2
Feb  8 08:41:41 splunk4 sshd[23292]: Accepted password for vily from ::ffff:10.2.1.254 port 47526 ssh2
Feb  8 16:11:40 splunk5 sshd(pam_unix)[28308]: session opened for user mark by (uid=0)
Feb  8 16:23:52 splunk4 sshd[28529]: Accepted publickey for mark from ::ffff:10.1.5.78 port 50045 ssh2

Trying to act on users or IP addresses here can be tricky for three reasons:

  • PAM logs the IP address with a field name (rhost) that doesn't match the Splunk standard field (src_ip), so without adjustment the reports and other canned items available to download won't find the IP address in the first sample event.
  • sshd logs the IP addresses in the format ::ffff:ipaddress, which isn't a name=value pair, so this needs to be converted to a proper field with the field name src_ip.
  • Only the first line offers the user in a proper field with the right name. The rest of the lines have the user names just in prose, so they'll have to be converted to fields.
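
The ::ffff: prefix in the sshd events is the IPv4-mapped IPv6 notation, so the address after it is an ordinary IPv4 address. As a minimal sketch outside Splunk, Python's standard ipaddress module can show the two forms are the same address; the helper name normalize_ip is hypothetical.

```python
import ipaddress


def normalize_ip(raw):
    """Return the dotted IPv4 form of an IPv4-mapped IPv6 address, else the address unchanged."""
    addr = ipaddress.ip_address(raw)
    # Only IPv6 addresses carry an ipv4_mapped attribute; it is None when not mapped.
    mapped = getattr(addr, "ipv4_mapped", None)
    return str(mapped) if mapped is not None else str(addr)


from_sshd = normalize_ip("::ffff:10.1.1.162")
from_pam = normalize_ip("10.1.1.162")
```

Both calls yield the same value, which is why a single src_ip field can cover the PAM and sshd events once the prefix is dropped.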

There are different ways of configuring Splunk to make each of the necessary changes.

Extracting the fields for sshd Failed and Accepted password events

Looking at the password events, there are actually two different versions (we're showing the Failed versions; the same applies to Accepted):

Feb  8 08:35:13 splunk5 sshd[25665]: Failed password for vily from ::ffff:10.1.1.162 port 55775 ssh2
Jan 15 17:08:25 splunk5 sshd[23393]: Failed password for invalid user hbscharp from ::ffff:10.2.1.254 port 48443 ssh2

The documentation explains how to interactively extract these fields. What you will find is that it is best to create four interactive extractions:

  • Search on sshd* password invalid, save it as an event type, and choose that event type as the bind for the extraction. Specify that the field is user and the example is hbscharp.
  • With the same event, do a second field extraction. This time, specify that the field is src_ip and the example is 10.2.1.254.
  • Search on sshd* password NOT invalid, save it as an event type, and choose that event type as the bind for the extraction. Specify that the field is user and the example is vily.
  • With the same event, do a second field extraction. This time, specify that the field is src_ip and the example is 10.1.1.162.

Be sure to save after you're happy with each.
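
To see what the interactive extractions need to capture, here is a rough Python sketch of one pattern covering both event versions. It is not what Splunk generates and it is a swapped-in alternative to the four separate extractions described above; the group names action and src_port and the function name are assumptions for illustration.

```python
import re

# One illustrative pattern for both variants: the "invalid user " text and
# the "::ffff:" prefix are optional and excluded from the captured values.
SSHD_PASSWORD = re.compile(
    r"sshd\[\d+\]: (?P<action>Failed|Accepted) password "
    r"for (?:invalid user )?(?P<user>\S+) "
    r"from (?:::ffff:)?(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"port (?P<src_port>\d+)"
)


def extract_password_fields(line):
    """Return user, src_ip, and related fields, or None if the line does not match."""
    match = SSHD_PASSWORD.search(line)
    return match.groupdict() if match else None


valid_user = extract_password_fields(
    "Feb  8 08:35:13 splunk5 sshd[25665]: Failed password for vily "
    "from ::ffff:10.1.1.162 port 55775 ssh2"
)
invalid_user = extract_password_fields(
    "Jan 15 17:08:25 splunk5 sshd[23393]: Failed password for invalid user hbscharp "
    "from ::ffff:10.2.1.254 port 48443 ssh2"
)
```

In both results the user and src_ip values come out without the surrounding prose, which is the goal of the interactive extractions.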

Extracting the user from the PAM session opened events

Looking at the PAM session opened for sshd event:

Feb  8 16:11:40 splunk5 sshd(pam_unix)[28308]: session opened for user mark by (uid=0)

There's no IP address to extract, but there is a user. Again, you can use interactive field extraction here. Search on sshd(pam_unix)* session opened, save it as an event type, choose that event type for the bind, specify that the field is user, and enter the example as mark.
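
As a rough Python illustration of the same extraction (an assumption about the event layout based only on the sample above, not Splunk's own pattern), the user name is the token between "for user" and "by":

```python
import re

# Illustrative pattern for the PAM "session opened" event shown above.
PAM_SESSION_OPENED = re.compile(
    r"sshd\(pam_unix\)\[\d+\]: session opened for user (?P<user>\S+) by"
)

event = (
    "Feb  8 16:11:40 splunk5 sshd(pam_unix)[28308]: "
    "session opened for user mark by (uid=0)"
)
match = PAM_SESSION_OPENED.search(event)
user = match.group("user") if match else None
```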

Renaming a field

Now to rename the rhost field here to src_ip:

Feb  8 08:35:11 splunk5 sshd(pam_unix)[25665]: authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.1.1.162  user=vily

There are two ways to accomplish this field renaming. For one, you could write a transform. However, this method has some drawbacks:

  • You have to build a regular expression. For this purpose it's often not actually that difficult, but just the term regular expression makes some people break out in hives.
  • The change applies only to data indexed after you write the transform and restart Splunk. Your old data keeps the rhost field name.
  • Most importantly for some users, rhost is changed to src_ip in the event itself in your index. If you need to preserve the original data and can't make changes to it, you cannot use this method.

Instead, we recommend that you use interactive field extraction to rename this field. Rather than truly renaming it, you re-extract the IP address with the field name src_ip, as though the rhost portion didn't exist. To do this, run the search sshd(pam_unix)* authentication and save it as an event type if you haven't already done so. Then enter src_ip in the Field name box, choose the event type for this search in the Bind list, and enter 10.1.1.162 in the example value box.
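
The idea of re-extracting rather than rewriting can be sketched in Python: the raw event text is left untouched and src_ip is derived from it when the event is read. The function name and the "raw" dictionary key are hypothetical, and this illustrates the concept only, not how Splunk stores or extracts fields internally.

```python
import re

# Capture the value after "rhost=" under the name src_ip; the event text is not modified.
RHOST = re.compile(r"\brhost=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})")


def with_src_ip(raw_event):
    """Return the unmodified event plus a derived src_ip field when rhost is present."""
    fields = {"raw": raw_event}
    match = RHOST.search(raw_event)
    if match:
        fields["src_ip"] = match.group("src_ip")
    return fields


event = (
    "Feb  8 08:35:11 splunk5 sshd(pam_unix)[25665]: authentication failure; "
    "logname= uid=0 euid=0 tty=ssh ruser= rhost=10.1.1.162  user=vily"
)
result = with_src_ip(event)
```

Because the stored text still contains rhost=10.1.1.162, the original data stays intact while reports that look for src_ip can still find the address.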
