Community:SearchPerformance
From Splunk Wiki
Analyzing Search Performance
NOTE: This document was written for Splunk version 3.x.
This document is intended for experienced Splunk administrators. You should have a working knowledge of installing and administering Splunk, writing advanced search queries, and scripting in a *NIX environment. You should also understand the index database structure (buckets) and indexing operations. Before proceeding, make sure that your hardware is sufficient for the task at hand. See our recommendations here.
Background
This topic guides an advanced Splunk administrator through the process of debugging slow searches. Search speed depends on many factors, and "slow" and "fast" are relative terms: a slow search is one that takes noticeably longer than expected to return results. As a baseline, a search for "*" over the past day is an average query on most systems.
Methodology
The flow chart below shows the order in which an administrator should troubleshoot slow searches. Although it is detailed, it is a guideline for the debugging process rather than a strict procedure. Some steps in the flow chart require specific searches; most of these are listed at the bottom of this page and can be run from both the web interface and the command-line interface (CLI).
Slow Search Flow Chart
Tips for troubleshooting slow searches
- If you use distributed search, turn on SSL compression.
- Perform a bucket span analysis.
- Check the number of .tsidx files in each bucket (db-hot, warm, and cold): 10 is ideal, more than 15 is not optimal, more than 50 will hinder performance, and more than 100 will hurt performance.
- Perform a Bonnie++ disk I/O analysis.
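The tsidx check can be scripted. The sketch below is a minimal example, not an official Splunk tool. It assumes that each bucket is a directory holding its own .tsidx files and that the default index lives under $SPLUNK_HOME/var/lib/splunk/defaultdb (pass a different index path as the first argument). The function name classify_tsidx is hypothetical, and because the thresholds above do not name the 11 to 15 range, the script simply reports it as "ok".

```shell
#!/bin/sh
# Hypothetical helper: map a per-bucket .tsidx count to the thresholds
# listed on this page. The 11-15 range is not named there; it is
# reported as "ok" here by assumption.
classify_tsidx() {
  n=$1
  if [ "$n" -gt 100 ]; then
    echo "hurts performance"
  elif [ "$n" -gt 50 ]; then
    echo "hinders performance"
  elif [ "$n" -gt 15 ]; then
    echo "not optimal"
  elif [ "$n" -le 10 ]; then
    echo "ideal"
  else
    echo "ok"
  fi
}

# Assumed default index location; override with the first argument.
DB_PATH=${1:-"$SPLUNK_HOME/var/lib/splunk/defaultdb"}

if [ -d "$DB_PATH" ]; then
  # Count .tsidx files per directory (assumed: one directory per bucket),
  # then print: count, verdict, bucket path.
  find "$DB_PATH" -name '*.tsidx' -exec dirname {} \; | sort | uniq -c |
    while read -r count bucket; do
      echo "$count $(classify_tsidx "$count") $bucket"
    done
fi
```

Saved as, for example, tsidx_check.sh, it could be run as `sh tsidx_check.sh /path/to/index | sort -rn` to list the buckets with the most .tsidx files first.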
Future events
For complete details on analyzing buckets, see http://www.splunk.com/base/Deploy:UnderstandingBuckets. The flow chart above includes an action labeled "Use Future Events Script". The script is linked below, packaged as a zip file. Unzip it and place it in a location where you can run it with Python. Run the future events script as follows:
# grep 'default.*db_.*rawdata' systeminfo.txt | grep -v compressedAddresses | python <path to script>/future_events
http://www.splunk.com/base/Image:Future_events.zip
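The contents of the linked script are not reproduced here. As a rough illustration of the idea only, the hypothetical find_future_buckets function below flags buckets whose newest event time lies in the future. It assumes that bucket directories are named db_<newest>_<oldest>_<id> with times in epoch seconds, and that its input is the output of the grep pipeline above. It is a stand-in, not the future_events script.

```shell
#!/bin/sh
# Hypothetical stand-in for the future_events script (NOT the linked script).
# Assumes bucket directories are named db_<newest>_<oldest>_<id>, with
# epoch-second times, and reads candidate lines on standard input.
# Usage: ... | find_future_buckets [now_epoch]
find_future_buckets() {
  now=${1:-$(date +%s)}
  # grep -o is supported by GNU and BSD grep, though not required by POSIX.
  grep -o 'db_[0-9][0-9]*_[0-9][0-9]*_[0-9][0-9]*' | sort -u |
    while IFS=_ read -r _prefix newest oldest id; do
      # A newest-event time later than "now" means the bucket holds
      # events timestamped in the future.
      if [ "$newest" -gt "$now" ]; then
        echo "db_${newest}_${oldest}_${id}"
      fi
    done
}
```

After sourcing the function, it could replace the last stage of the pipeline: `grep 'default.*db_.*rawdata' systeminfo.txt | grep -v compressedAddresses | find_future_buckets`.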
Search Examples
All of the searches below are useful for debugging slow searches.
- From the GUI, run a search for "*" and time how long it takes:
  - without a time range
  - with a time range covering the most recent 15 minutes
  - with a 15-minute time range from one week ago
- From the GUI, run a search for a common event term such as "error" or "access denied", and time it the same way:
  - without a time range
  - with a time range covering the most recent 15 minutes
  - with a 15-minute time range from one week ago
- From the GUI, run a search for a rare event. A rare hostname or username is a good starting point.
- From the command line, run a timed dispatch command for "*":
  - >time splunk dispatch '*'
  - >time splunk dispatch '* startminutesago=15'
- From the command line, run a timed dispatch command for a common event:
  - >time splunk dispatch 'error'
  - >time splunk dispatch 'access denied'
- From the command line, run a timed dispatch command for a rare event or hostname:
  - >time splunk dispatch 'host123'
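To rerun the CLI checks repeatedly and compare results, the dispatch commands can be wrapped in a small loop. This is a sketch under stated assumptions: SPLUNK is assumed to name the splunk binary (set it to ./splunk when working from $SPLUNK_HOME/bin), the function name time_dispatch is hypothetical, and timing uses date +%s, so it only resolves whole seconds. For finer detail, keep using time as shown in the list.

```shell
#!/bin/sh
# Hypothetical wrapper that times each CLI dispatch from the list above.
# SPLUNK is assumed to point at the splunk binary; override as needed.
SPLUNK=${SPLUNK:-splunk}

time_dispatch() {
  start=$(date +%s)
  # Discard the search output; only the elapsed time is of interest here.
  "$SPLUNK" dispatch "$1" > /dev/null 2>&1
  end=$(date +%s)
  # Whole-second resolution only.
  echo "$((end - start))s $1"
}

# Only run the batch if the splunk binary can be found.
if command -v "$SPLUNK" > /dev/null 2>&1; then
  for q in '*' '* startminutesago=15' 'error' 'access denied' 'host123'; do
    time_dispatch "$q"
  done
fi
```

Running the same batch at different times of day, or before and after a configuration change, gives a simple side-by-side comparison of the same queries.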
More Search Examples
Run these searches from the CLI:
>time ./splunk dispatch '* | stats count'
- scans ALL matching events and counts them without returning the events themselves; time reports how long the search took.
>time ./splunk dispatch '* | stats count' -maxtime 10
- scans ALL matching events and counts them without returning them, limited to 10 seconds.
>time ./splunk dispatch '* | head 10000' -maxtime 10
- retrieves AND returns up to the first 10,000 events, limited to 10 seconds.
>time ./splunk dispatch '* | head 10000 | stats count' -maxtime 10
- retrieves up to the first 10,000 events and counts them without returning them, limited to 10 seconds.
>time ./splunk dispatch '* | stats count' -maxtime 2
- scans ALL matching events and counts them without returning them, limited to 2 seconds.
How can I fix my timestamps?
Force your events to use the current system time by setting the following in your props.conf:
DATETIME_CONFIG=CURRENT
Tell Splunk not to trust timestamps in a file that are more than this many days old:
MAX_DAYS_AGO=5
Tell Splunk to limit how far (in characters) it looks into an event to find a timestamp:
MAX_TIMESTAMP_LOOKAHEAD=50
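These settings go inside a stanza in props.conf. The hypothetical append_timestamp_stanza function below writes such a stanza to a file you name; the stanza name and the $SPLUNK_HOME/etc/system/local/props.conf path mentioned in its comments are examples, so adjust them to your sourcetype, source, or host. DATETIME_CONFIG = CURRENT is left commented out because it replaces timestamp extraction entirely, which makes the other two settings moot.

```shell
#!/bin/sh
# Hypothetical helper: append a props.conf stanza with the timestamp
# settings from this page. Arguments:
#   $1 - target file, e.g. $SPLUNK_HOME/etc/system/local/props.conf (assumed)
#   $2 - stanza name, e.g. my_sourcetype (hypothetical)
append_timestamp_stanza() {
  conf_file=$1
  stanza=$2
  cat >> "$conf_file" <<EOF

[$stanza]
# Alternative: DATETIME_CONFIG = CURRENT (uses the current system time and
# skips timestamp extraction, making the two settings below moot).
# Distrust extracted timestamps more than 5 days old.
MAX_DAYS_AGO = 5
# Look at most 50 characters into each event for a timestamp.
MAX_TIMESTAMP_LOOKAHEAD = 50
EOF
}
```

Splunk typically needs a restart before index-time props.conf changes take effect, and these settings apply only to newly indexed data, not to events already in the index.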