From Splunk Wiki
Best practices for creating event types
Using Splunk's interface standards, you can create event types that not only match critical events, but add useful meta-data as well. This document details the best practices on creating useful event types that allow you to quickly view the events that matter most to you.
Building useful event types
To begin building an event type, start by pulling together as many samples of the data you want to match as possible, such as ssh-related log files from multiple machines. The more limited your data set, the more likely it is that the resulting event type may not match everything you hope in the future.
Once you have a good representative data sample, construct a search that pulls up the data you want to match. Make sure that your search isn't too limiting; a poorly constructed search can leave out events that you would otherwise find interesting. For example, searching on ssh failed brings up a more limited set of events than searching for ssh* failed, and searching on sshd failed leaves out events where only ssh is referenced. Try ssh, sshd, and ssh* and compare the results to decide which term makes the most sense given what you want to match. If you have a large amount of data, comparing results can be made easier by using NOT statements (for example, ssh NOT sshd failed) or by using a feature such as Splunk's field filtering to glance over aspects of your results.
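As a rough illustration of why the three terms return different results, here is a minimal Python sketch. It is not Splunk's search engine: the sample events are invented, and the tokenizing and wildcard handling are simplifying assumptions made only for this example.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical sample events, invented for illustration only.
SAMPLE_EVENTS = [
    "Jan 10 04:12:01 web1 sshd[2211]: Failed password for root from 10.0.0.5",
    "Jan 10 04:13:09 web1 ssh: connection failed to host db1",
    "Jan 10 04:15:44 web1 cupsd: job 12 completed",
]

def tokens(event):
    """Split an event into lowercase alphanumeric terms (a rough stand-in for real term segmentation)."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", event) if t]

def matches(event, terms):
    """Return True when every search term (optionally ending in *) matches some token."""
    toks = tokens(event)
    return all(any(fnmatchcase(tok, term.lower()) for tok in toks) for term in terms)

def matching_indexes(terms):
    """Return the indexes of the sample events matched by all of the given terms."""
    return {i for i, event in enumerate(SAMPLE_EVENTS) if matches(event, terms)}

for query in (["ssh", "failed"], ["sshd", "failed"], ["ssh*", "failed"]):
    print(" ".join(query), "->", sorted(matching_indexes(query)))
```

In this toy data set, ssh failed and sshd failed each find a different single event, while ssh* failed finds both, which is the kind of comparison described above.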
You typically want to avoid using the following in an event type, particularly if you intend to share it throughout or outside your organization:
- source - Using a path to a particular file is risky as not every system and not every system administrator uses the same file names or paths. For your event type to be portable, it can't expect to find a certain file in exactly the right place with exactly the right name.
- sourcetype - A product such as Splunk does its best to establish a sourcetype for events as they are indexed. These sourcetypes identify the kind of data you're looking at, such as cups_access, syslog, or linux_secure. The problem with including a sourcetype in an event type is that people can customize what sourcetype is assigned to their data. For your event type to be portable, it can't expect that everyone is using the same sourcetype designations.
- host - Using host names and/or IP addresses in your event types will severely limit what the event type will match. Even worse, if you're using private network address ranges, just using the same event type on more than one private network can create false positives. For your event type to be portable, it can't rely on host names or IP addresses. However, there are cases where you are making event types for internal use and are very interested in events coming in through, say, a specific firewall. Here, you would want to specify a host name, IP address, or both.
Note: Obviously, if you are making event types to only be used by you or in known environments, then the rules above don't entirely apply. Still, if you want to follow best practices, try to avoid these particular pitfalls.
There is more to an event type than just matching what you want, however. Generally speaking, you want to match just the events that interest you. Everyone's tolerance for stray events is different, so only you can decide how exact you want to be, but put at least a bit of effort into being as exact as possible. To do this, start by making sure that you have a wide variety of data loaded, such as all of the contents of /var/log and any other log files you can find on your system. The more types of data you have to check against, the more sure you can be that you won't get more false positives than you are willing to deal with. Bad matches can be especially annoying if the event type is being used to trigger an alert that wakes you up at 4 a.m. to check what it thinks is a critical server fault.
A good way to avoid false positives is to construct your event type around two different elements: keywords and punctuation. Your initial event type probably consists entirely of keywords. To help ensure that you're getting the right data, adding the element of punctuation can be key. You search for punctuation by using Splunk's punct search operator. The punct field is not active by default; see the Splunk documentation for how to make punct available to work with.
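To give a feel for what a punct value represents, the following Python sketch approximates one from a raw event. It is an assumption-laden approximation, not Splunk's actual implementation: it drops letters and digits from the first line, shows spaces as underscores, and truncates the result. The function name approximate_punct and the max_chars default are hypothetical.

```python
def approximate_punct(event, max_chars=30):
    """Approximate a punct-style pattern for the first line of an event.

    Assumptions for this sketch: letters and digits are dropped, spaces
    become underscores, and the pattern is cut off at max_chars characters.
    """
    first_line = event.splitlines()[0] if event else ""
    pattern = []
    for ch in first_line:
        if ch.isalnum():
            continue  # keep only the punctuation "shape" of the event
        pattern.append("_" if ch == " " else ch)
    return "".join(pattern)[:max_chars]

print(approximate_punct("Jan 10 04:12:01 web1 sshd[2211]: Failed"))
```

Events produced by the same program in the same format tend to share a pattern like this, which is why punct is useful for separating them from look-alike events.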
Once you have enabled punct field filtering, use the punct field filter to see how many punctuation variations you're dealing with. There may be only one punctuation, in which case your job is easy. Select the punctuation in the filter, and then select Add filter to search. The punctuation value is now part of your search.
However, if there is more than one punctuation in the filter, you will need to create a wildcarded version that matches your desired events. A single wildcarded punct isn't always possible; sometimes you need two with AND or NOT in between them, but it's best to strive for one. The rules for wildcards in punct are relatively straightforward:
- You can add an asterisk to match zero or more characters into the middle of the punct value.
- You can add an asterisk to match zero or more characters at the end of a punct value.
- If you do add an asterisk, it's best to put the entire punct statement in quotes; in fact, it's not a bad idea to always put it in quotes. For example, a punct of __::__:_:*..._:____. might be added as punct="__::__:_:*..._:____."
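The wildcard rules above can be tried out offline with a small Python sketch. This is only a model of the asterisk behavior described in the list (zero or more characters), not Splunk's own matcher, and the function name punct_matches is hypothetical.

```python
import re

def punct_matches(pattern, punct):
    """Match a punct value against a pattern in which * stands for zero or more characters."""
    # Escape every literal piece so characters such as . and [ are not read as regex syntax.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, punct) is not None

# The wildcarded example from the list matches a punct with six underscores in the middle.
print(punct_matches("__::__:_:*..._:____.", "__::__:_:______..._:____."))
```

Because every character other than the asterisk is treated literally, the dots in a punct value only ever match dots.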
There are a number of techniques you can use to build your wildcarded punct. Those with a talent for seeing patterns might choose to look through the punct field listings and use what they see to try to make a wildcarded pattern. Those who prefer a bit more of a brute force method might try a more step by step approach. For example:
1. Start with a search that matches the data you want and has both your keywords and a single punct. For example, ssh* failed punct="__::__:_:______..._:____."
2. Add a NOT in front of the punct and press Enter to run the new search. For example, ssh* failed NOT punct="__::__:_:______..._:____."
3. Look through the resulting punct field listing to see if there are any punct values that are very similar to the one in your search. For example, it might find punct="___::__:_:______..._:____.".
- If so, add the filter to the search.
- If not, skip to step 8's "If not" bullet.
4. Visually compare the two punct values. It can help to copy them out into a text file, one above the other, so you don't have to squint and count out items such as underscores.
5. Build a wildcarded punct that will match both punct values without (hopefully) matching false positives. In the example, the only difference between the two punct statements is an extra underscore in the first section, so punct="__*::__:_:______..._:____." will match both. The rest of the steps will help make sure there aren't false positives in the mix.
6. Edit your search to remove the second punct and change the first (with the NOT) to the appropriate wildcarded version. So, the example search would now be ssh* failed NOT punct="__*::__:_:______..._:____.".
7. Press Enter to run the new search.
8. Look at the new punct filter list. Are there any listed that are similar to your wildcarded punct?
- If so, return to step 3.
- If not, are they very different from this punct but similar to one another? If this is the case, then you are probably looking at results from two very different sources (for example, sshd output and a software updater log that happens to mention updating the sshd package). Filter on the following to determine what you're looking at, clearing the filter between each type:
- The sourcetype - Multiple sources and puncts might all be evaluated as the same sourcetype. If they are, then your events probably all come from the same program and are just in different files.
- The source - Multiple sources typically means that the data is coming from multiple files. As mentioned above, these different files may all actually be created by the same process. This relationship can be obvious, for example, if you're seeing /var/log/secure, /var/log/secure.1, and /var/log/secure.4. Or, it could be not so obvious. The syslog daemon may be sending the same events, for example, to /var/log/messages and /var/log/secure. Using the sourcetype filter can be invaluable when you aren't sure if events are really related or not. While you don't want a source or sourcetype in your event type (typically), you can NOT the sources or sourcetypes that you don't want to match for now. This technique will allow you to focus at this stage on only the data you want.
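Step 5 of the procedure above (building one wildcarded punct from two similar values) can be sketched in Python as follows. This is one simple heuristic, assumed here for illustration: keep the shared prefix and shared suffix and put a single asterisk between them. The function name propose_wildcard is hypothetical, and the result still needs to be checked for false positives as the remaining steps describe.

```python
def propose_wildcard(first, second):
    """Suggest a single-asterisk punct covering both values: shared prefix + * + shared suffix."""
    if first == second:
        return first
    shortest = min(len(first), len(second))
    # Count the characters the two values share at the start.
    prefix = 0
    while prefix < shortest and first[prefix] == second[prefix]:
        prefix += 1
    # Count the characters they share at the end, without overlapping the prefix.
    suffix = 0
    while suffix < shortest - prefix and first[-1 - suffix] == second[-1 - suffix]:
        suffix += 1
    return first[:prefix] + "*" + first[len(first) - suffix:]

# The two punct values from the example steps.
print(propose_wildcard("__::__:_:______..._:____.", "___::__:_:______..._:____."))
```

For the two values used in the steps, this heuristic arrives at the same wildcarded punct given in step 5. For values that differ in several places it produces a much broader pattern, so treat its output as a starting point only.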
If you didn't have many punct statements to deal with, then you may be finished, or almost finished. Sometimes, though, you have hundreds of different puncts, especially when you're dealing with events that have file paths or URLs. When this is the case, use the filters and your understanding of where the paths or URLs are to identify where you can do some bulk wildcarding. If worst comes to worst, copy them all out into a text file, one on top of the other, and compare them that way.
Once you have a combination of keywords and puncts that match all of the data you want, remove all of the NOTs and change the search so it's only looking for the positive matches. If you end up with more than one punct in that search, be sure to put an OR between each punct or nothing will match your search. When you run the search again, this time focus on the false positives. You may find that you have to adjust your punct statements or keywords to eliminate bad matches. If there are only a few and you can tolerate them, then don't drive yourself crazy trying to get rid of them. If there are a significant number, though, you will definitely want to refine your event type.
To refine the search, you may need to add some NOTs, though try not to add too many as they make a search more "expensive," meaning that it takes longer. However, for example, perhaps you're interested in the CUPS daemon's activities but not all of its actual HTTP events, of which there are many. Just adding a NOT HTTP to the search eliminates them, allowing you to make a CUPS event type that won't be overflowing with POST and GET requests. Another element many people like to get rid of are package manager logs, which often show up because the service name is part of the package's name. In the case of a Linux distribution that uses RPM for package management, adding a NOT *rpm eliminates these items from your search, though you will have to double-check that you're not removing data that you actually want.
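The shape of the finished search (keywords, puncts joined with OR, then any NOT terms) can be sketched with a small Python helper. The helper name build_search is hypothetical, and wrapping multiple puncts in parentheses is an assumption about how you would group them; check the resulting string against your own Splunk version before relying on it.

```python
def build_search(keywords, puncts, excluded=()):
    """Assemble a search string: keywords, OR-joined quoted puncts, then NOT terms."""
    parts = list(keywords)
    quoted = ['punct="{}"'.format(p) for p in puncts]
    if len(quoted) == 1:
        parts.append(quoted[0])
    elif quoted:
        # Assumption: parentheses group the OR'd puncts as a single condition.
        parts.append("(" + " OR ".join(quoted) + ")")
    parts.extend("NOT " + term for term in excluded)
    return " ".join(parts)

print(build_search(["ssh*", "failed"], ["__*::__:_:______..._:____."], ["*rpm"]))
```

Keeping the exclusions in a separate list makes it easy to see how many NOTs the search has accumulated, which matters because each one adds to the cost of the search.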
Once you have your event type exactly the way you want it, you need to save it. When saving, you will name the event type and assign tags to it. Before you finish the saving process, be sure to read the next section.
Adding meta-data with tags
You can make your event types even more useful by giving them descriptive tags. When at all possible, follow the Splunk interface standards when you choose your primary tags. If you are consistent with your tagging, then you can easily use the eventtypetag:: operator to search for all events that match event types with that tag. For example, if you follow the Splunk interface standards and want to find events involving authorization verification, you could search on eventtypetag="authorization/verify" and then narrow your search from there.
You may find that there are tags you want to apply in addition to those mentioned in the Splunk interface standards. For example, your responsibilities may actually span multiple physical locations. You could add a host tag to each of your hosts that refers to its location--if you do, be sure to make a standardized list everyone follows or the tags will be nearly useless. You can then search with the hosttag:: operator to find all hosts with a particular tag. You could do something similar with event type tags as well, with wildcarded IP address ranges for example.
Think of tags in general as memory-jogger notes, though again, they need to be standardized within your organization so they're useful. Here are some more ideas for useful event type tagging strategies:
- The operating system producing the event.
- The program producing the event.
- The admin in charge of the program producing the event.
- The administrative role in charge of the program producing the event.
- The type of event this is, such as a verbose debugging event, a specific type of security event, or a "counter" event for tracking how many hits something has had.
- If the application doesn't include the term ERROR, WARNING, or another searchable message level, you can use tags to make up for this gap.
- Date-related information, such as holiday shopping seasons.
- Incoming or outgoing traffic through a firewall or DMZ.
- If you are interested in finding "anomalous" events, tagging known events can allow you to search on NOT eventtypetag=known, or on eventtypetag=anomalous.
Note: Some of these suggestions may only work in an organization where responsibilities and what a machine is doing don't change often. Otherwise, you'll have to constantly go in and edit the tags.
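To make the "known" versus "anomalous" idea concrete, here is a small Python sketch that models event types as a mapping from name to tags. The event type names and most of the tags are invented for illustration, and the lookup functions are only a stand-in for what an eventtypetag search does inside Splunk.

```python
# Hypothetical event types and tags, invented for illustration only.
EVENT_TYPE_TAGS = {
    "sshd_failed_login": {"authorization/verify", "known"},
    "cups_job_completed": {"printing", "known"},
    "unclassified_kernel_message": set(),
}

def event_types_with_tag(tag):
    """Names of event types that carry the tag (similar in spirit to eventtypetag=tag)."""
    return sorted(name for name, tags in EVENT_TYPE_TAGS.items() if tag in tags)

def event_types_without_tag(tag):
    """Names of event types that lack the tag (similar in spirit to NOT eventtypetag=tag)."""
    return sorted(name for name, tags in EVENT_TYPE_TAGS.items() if tag not in tags)

print(event_types_with_tag("authorization/verify"))
print(event_types_without_tag("known"))
```

The sketch also shows why a standardized tag list matters: if one person writes known and another writes Known, the second lookup silently reports event types that were in fact already reviewed.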