Community:Intro to Splunk Search Performance
From Splunk Wiki
Here are some quick guidelines on crafting searches that run faster.
- More terms that identify your results are almost always better than fewer. It's very tricky to be sure which term will be most effective for any given search, and the more you tell the search engine, the better your chances of a good result.
Effectiveness of terms:
- Time is the most efficient filter in Splunk. Narrowing your search by time is the most effective thing you can do.
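For example, a time window can be given directly in the search string with the earliest and latest time modifiers (the index, sourcetype, and keyword here are placeholders):

```
index=main sourcetype=access_combined earliest=-1h@h latest=now "failed login"
```

Here -1h@h means "one hour ago, snapped to the hour boundary", so only the buckets covering that window are ever read.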
- host, source, and sourcetype have more filtering support than other fields and values. Use one or more of these when you can.
- Keywords are always useful when filtering, though not as strong as host/source/sourcetype. This includes the words in a phrase like "access denied", indexed fields, and, in almost all cases, the values of extracted fields. Thus username=joe should give about the same performance benefit as simply including 'joe' in your search.
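A sketch of that equivalence (the sourcetype and field name are illustrative):

```
sourcetype=secure username=joe
sourcetype=secure joe
```

Both searches can use the indexed keyword 'joe' to narrow the candidate events; the field-based form additionally filters out events where 'joe' appears but is not the username.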
Interacting with all of the above are all the normal physics of search:
- Terms that correspond closely with your result set are more powerful than terms that correspond less closely. If 90% of your events include the word 'error' but only 5% include the word 'sshd', and the events you want contain both words, including 'sshd' in your search will help more than 'error'. Using both incurs essentially no penalty.
- Inclusion is generally better than exclusion. Searching for
  "access denied"
will be faster than
  NOT "access granted"
- Shorter conjunctions or disjunctions will be more efficient than longer expressions, if both match the same data. E.g.
  (host=A OR host=B)
is preferable to
  (username=user1 OR username=user2 .... username=user32424234)
- Apply powerful filtering commands as early in your search as possible. Filtering to one million events and then narrowing to ten events is much slower than filtering to one thousand events and then ten events.
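For example, pushing a filter into the initial search, rather than filtering after an expensive transforming command, keeps the event count small from the start (index and field names are illustrative):

```
slower: index=web | stats count by clientip | search clientip=10.0.0.1
faster: index=web clientip=10.0.0.1 | stats count by clientip
```

In the faster form, stats only ever sees the events that survive the filter, instead of aggregating everything and discarding most of the work afterward.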
Tricks specific to the Splunk product
- When all you need is the eventual result, a shortcut to skipping work you don't need is to run searches from the command line. This should become less relevant with future releases.
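A sketch of a command-line search, assuming a default install location; the exact path and available flags may vary by version:

```
$SPLUNK_HOME/bin/splunk search 'index=main "access denied" earliest=-1h'
```

Run this way, Splunk skips UI-oriented work such as timeline and field summary generation and just returns the results.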
- Telling Splunk you don't need certain fields makes things faster. I.e., a command like
  | fields field1, field2, field3
telegraphs to Splunk that fields not mentioned are not needed. Interactively, the UI still asks for all fields, but from the command line or from scheduled searches, the benefits are reaped.
- For sparse searches (infrequent events evenly distributed across a large amount of backing data) you can become CPU bound on decompression. This can be alleviated by tactics such as: storing your sparse events in a separate index, using distributed search to bring more CPUs to bear on the problem, or storing your rawdata in uncompressed form (an uncommon tradeoff choice).
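If you do choose the uncompressed-rawdata tradeoff, it is set per index in indexes.conf; the stanza name and paths below are placeholders, and the attribute should be checked against the indexes.conf spec for your version:

```
[sparse_events]
homePath        = $SPLUNK_DB/sparse_events/db
coldPath        = $SPLUNK_DB/sparse_events/colddb
thawedPath      = $SPLUNK_DB/sparse_events/thaweddb
compressRawdata = false
```

This trades disk space for CPU: buckets grow considerably, but sparse searches no longer spend their time decompressing rawdata they mostly throw away.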