Community:BestPracticesForNormalizingFieldNames

From Splunk Wiki

Best practices for normalizing field names in Splunk

Using Splunk's Common Information Model as a guide, you can normalize the field names in your IT data so that external applications you load, such as firewall reports, "just work" with your existing fields. Without normalization, identical data ends up under different field names, and you have to keep editing applications to make everything work together. This document explains the issue and describes best practices for normalizing field names.

Introduction

Splunk automatically extracts fields from IT data wherever it finds key=value pairs. For example, given an event such as:

Nov 25 04:47:36 bombo sendmail[10895]: lAPClZXV010894: to=<root@bombo.example.com>, ctladdr=<root@bombo.example.com> (0/0), delay=00:00:01, xdelay=00:00:00, mailer=local, pri=31529, dsn=2.0.0, stat=Sent

the key=value pairs are:

to=<root@bombo.example.com>
ctladdr=<root@bombo.example.com> (0/0)
delay=00:00:01
xdelay=00:00:00
mailer=local
pri=31529
dsn=2.0.0
stat=Sent
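
To make the idea concrete, here is a minimal Python sketch that pulls the same pairs out of the sample event. It is an illustration only and not Splunk's actual extraction logic: the regular expression and the `extract_pairs` helper are assumptions made for this example, and the pattern simply treats everything up to the next comma as the value.

```python
import re

# Simplified pattern: a word-like key, "=", then a value running to the next comma.
# Assumption for illustration only; this is not how Splunk itself extracts fields.
KV_PATTERN = re.compile(r"(\w+)=([^,]+)")


def extract_pairs(event):
    """Return the key=value pairs found in a single event line as a dict."""
    return {key: value.strip() for key, value in KV_PATTERN.findall(event)}


event = (
    "Nov 25 04:47:36 bombo sendmail[10895]: lAPClZXV010894: "
    "to=<root@bombo.example.com>, ctladdr=<root@bombo.example.com> (0/0), "
    "delay=00:00:01, xdelay=00:00:00, mailer=local, pri=31529, "
    "dsn=2.0.0, stat=Sent"
)

fields = extract_pairs(event)

# Print one pair per line, in the order they appear in the event.
for name, value in fields.items():
    print(f"{name}={value}")
```

The resulting dictionary is keyed by field name, which is the form the later normalization steps operate on.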

You can then use these fields to quickly generate reports. But what if you want to generate reports that involve IT data from more than one source? Programmers have no common convention for naming the key in each pair. So if you want to run a report that cross-references both your firewall events and your router events, you can't be sure that the keys (the field names) will be the same in both logs. One way or another, you have to tell Splunk that two differently named fields actually hold the same thing.
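
The renaming step described above can be sketched as a simple alias table. This is a hedged illustration in Python, not a Splunk configuration: the field names (`src_addr`, `source_ip`, `src_ip`, and so on), the sample events, and the `normalize` helper are all hypothetical and are not taken from the Common Information Model.

```python
# Hypothetical alias table: source-specific field name -> common field name.
FIELD_ALIASES = {
    "src_addr": "src_ip",         # as a firewall log might name it (assumed)
    "source_ip": "src_ip",        # as a router log might name it (assumed)
    "dst_addr": "dest_ip",
    "destination_ip": "dest_ip",
}


def normalize(event_fields):
    """Rename known aliases to their common name; leave other fields untouched."""
    return {FIELD_ALIASES.get(name, name): value
            for name, value in event_fields.items()}


# Made-up events from two sources that describe the same traffic.
firewall_event = {"src_addr": "10.0.0.5", "dst_addr": "10.0.0.9", "action": "deny"}
router_event = {"source_ip": "10.0.0.5", "destination_ip": "10.0.0.9", "iface": "eth0"}

normalized = [normalize(firewall_event), normalize(router_event)]

# After normalization both events can be compared on the same key.
same_source = normalized[0]["src_ip"] == normalized[1]["src_ip"]
```

Once both sources share the same field names, a single report can cross-reference them without per-source special cases.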

Creating your own field name standards

It is virtually impossible to pull together a standard that accounts for every potential IT data field, especially since many organizations use solutions written in-house. In these cases, it can be useful to build your own field name normalization standard so that everyone uses the same fields. If you're using custom solutions, you can have your programmers (or the company building the solutions for you) apply the standard directly in the code and avoid normalization steps later. If you don't have that level of control, audit the IT data whose fields you want to normalize, and then write a document that defines what to name those fields. Be sure that your internal standard doesn't conflict with the Splunk Common Information Model, or duplicate what is already defined there.
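
One way to check an internal standard for conflicts and duplicates is sketched below. Everything in it is an assumption for illustration: the two dictionaries are made-up excerpts (the reference entries are not quoted from the Common Information Model), and matching meanings by exact string is a deliberately crude stand-in for a human review.

```python
# Hypothetical excerpt of a reference standard: field name -> meaning.
REFERENCE_STANDARD = {
    "src_ip": "source IP address",
    "dest_ip": "destination IP address",
    "user": "account name",
}

# Hypothetical internal standard drafted after auditing in-house data.
INTERNAL_STANDARD = {
    "user": "user agent string",         # same name, different meaning
    "origin_ip": "source IP address",    # new name for an existing meaning
    "badge_id": "employee badge number",  # genuinely new field
}


def audit(internal, reference):
    """Return (conflicts, duplicates) as sorted lists of internal field names."""
    # Conflict: the name already exists in the reference with another meaning.
    conflicts = sorted(
        name for name, meaning in internal.items()
        if name in reference and reference[name] != meaning
    )
    # Duplicate: a new name whose meaning the reference already covers.
    reference_meanings = set(reference.values())
    duplicates = sorted(
        name for name, meaning in internal.items()
        if name not in reference and meaning in reference_meanings
    )
    return conflicts, duplicates


conflicts, duplicates = audit(INTERNAL_STANDARD, REFERENCE_STANDARD)
print("conflicts:", conflicts)
print("duplicates:", duplicates)
```

Fields that appear in neither list, such as the hypothetical `badge_id`, are the ones that actually extend the standard.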

Once your internal standard is ready, make sure your staff know that it exists and how to use it. It's typically a good idea to make one or two people responsible for maintaining the standard. These maintainers can take charge of inventorying new internal IT data sources and extending the standard when needed, after first confirming that an extension is really required. They can also be responsible for requesting that new fields be included in the Splunk Common Information Model by contacting support at http://www.splunk.com/page/submit_issue
