Community:SearchPerformance

From Splunk Wiki


Analyzing Search Performance

NOTE: This document was written for version 3.x

This document is intended to guide an experienced Splunk administrator. You should have working knowledge of installing and administering Splunk, creating advanced search queries, and using scripts in a *NIX environment. You should also know the database structure (buckets) and be familiar with indexing operations. Before proceeding any further, make sure that your hardware is sufficient for the task at hand. See our recommendations here.

Background

This topic guides an advanced Splunk administrator through the process of debugging slow searches. Search speed has many dependencies, and slow and fast are relative terms. For the purposes of this topic, a slow search is a query that takes a long time to return results. On most systems, searching for "*" over the past day can be considered an average search query.

Methodology

The flow chart below shows the order in which an administrator should troubleshoot slow searches. Although the flow is detailed, it is intended as a guideline for the debugging process. Specific searches may need to be run at each point in the flow chart. Most of these searches are listed at the bottom of this page and can be run from both the web interface and the command line interface.


Slow Search Flow Chart

[Figure: Search Analysis Flowchart]

Tips for troubleshooting slow searches

  • If using distributed search, turn on SSL compression
  • Perform Bucket Span analysis
  • Check the number of tsidx files in each bucket (db-hot, warm, and cold). 10 is ideal, more than 15 is not optimal, more than 50 will hinder performance, and more than 100 will hurt performance.
  • Perform Bonnie++ analysis
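
The tsidx check in the tips above can be scripted. The following is a minimal sketch, assuming that each bucket is a subdirectory of an index directory and that its tsidx files carry a ".tsidx" extension. The function names are made up for illustration, and the bands simply mirror the thresholds quoted in the tip.

```python
import os


def classify(count):
    """Map a tsidx file count to the rough bands quoted in the tips."""
    if count > 100:
        return "hurts performance"
    if count > 50:
        return "hinders performance"
    if count > 15:
        return "not optimal"
    return "ok"


def tsidx_counts(index_dir):
    """Return {bucket directory name: number of *.tsidx files} for one index directory.

    Assumption: every subdirectory of index_dir is treated as a bucket.
    """
    counts = {}
    for name in sorted(os.listdir(index_dir)):
        bucket = os.path.join(index_dir, name)
        if not os.path.isdir(bucket):
            continue  # skip loose files next to the bucket directories
        counts[name] = sum(1 for f in os.listdir(bucket) if f.endswith(".tsidx"))
    return counts


def report(index_dir):
    """Print one line per bucket: name, tsidx count, and band."""
    for bucket, n in tsidx_counts(index_dir).items():
        print(f"{bucket}\t{n}\t{classify(n)}")
```

Calling report() with the path to one index's db directory prints a line per bucket, so buckets above the 15, 50, or 100 marks stand out quickly.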

Future events

See the following topic for complete details on analyzing buckets: http://www.splunk.com/base/Deploy:UnderstandingBuckets

In the flow chart above, there is an action described as "Use Future Events Script". The script is linked below and is packaged in a zip file. Unzip the file and place it in a location where you can run it with Python. The command to run the future events script is as follows:

  • # cat systeminfo.txt | grep 'default.*db_.*rawdata' | grep -v compressedAddresses| python <path to script>/future_events

http://www.splunk.com/base/Image:Future_events.zip
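
For readers without access to the zip file, the general idea of looking for future-dated buckets can be illustrated as follows. This is not the contents of the linked script. It is a rough sketch that assumes bucket directory names embed two epoch timestamps in the form db_<epoch>_<epoch>_<id>, and the function name is hypothetical. It takes the larger of the two timestamps as the newest event time, so it does not depend on their order.

```python
import re
import time

# Assumed bucket directory naming: db_<epoch>_<epoch>_<id>
BUCKET_RE = re.compile(r"db_(\d+)_(\d+)_(\d+)")


def future_buckets(lines, now=None):
    """Return sorted (bucket_name, seconds_ahead) pairs for buckets whose
    newest embedded timestamp is later than `now` (defaults to the current time).
    """
    now = time.time() if now is None else now
    found = {}
    for line in lines:
        m = BUCKET_RE.search(line)
        if m is None:
            continue  # line does not mention a bucket directory
        newest = max(int(m.group(1)), int(m.group(2)))
        if newest > now:
            found[m.group(0)] = newest - now
    return sorted(found.items())
```

The function accepts any iterable of text lines, so passing sys.stdin lets it sit at the end of a grep pipeline like the one above.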

Search Examples

The searches below are all applicable for debugging slow searches.

  • From the GUI, run a search for "*" and time how long it takes
    • Run this search without a time span
    • Run this search with a recent 15 minute time span
    • Run this search with a 15 minute time span, occurring 1 week ago
  • From the GUI, run a search for a common event term such as "error" or "access denied"
    • As above, run this search without a time span
    • Run this search with a recent 15 minute time span
    • Run this search with a 15 minute time span, occurring 1 week ago
  • From the GUI, run a search for a rare event. A rare hostname or username is a good starting point.
  • From the command line, run a timed dispatch command for "*"
    • >time splunk dispatch '*'
    • >time splunk dispatch '* startminutesago=15'
  • From the command line, run a timed dispatch command for a common event.
    • >time splunk dispatch 'error'
    • >time splunk dispatch 'access denied'
  • From the command line, run a timed dispatch command for a rare event or hostname.
    • >time splunk dispatch 'host123'
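
When the same timed commands need to be repeated after each tuning change, a small wrapper can collect the durations in one place. The sketch below is one possible approach, not part of Splunk. It assumes the splunk binary is on the PATH (pass a full path otherwise) and uses only the dispatch form shown in the list above. The helper names are made up for illustration.

```python
import subprocess
import time


def time_command(argv):
    """Run argv, discard its output, and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, proc.returncode


def time_searches(searches, splunk="splunk"):
    """Time `splunk dispatch <search>` for each search string, in order.

    Returns a list of (search, elapsed_seconds, exit_code) tuples.
    """
    return [(s, *time_command([splunk, "dispatch", s])) for s in searches]
```

For example, time_searches(["*", "error", "host123"]) runs the three CLI searches from the list and returns their wall-clock times. A nonzero exit code means the timing for that search should not be trusted.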


More Search Examples

Run these searches from the CLI:

  • >time ./splunk dispatch '* | stats count'
    • will scan for ALL results without returning the events themselves. You will get a return value for the duration.
  • >time ./splunk dispatch '* | stats count' -maxtime 10
    • will scan for ALL results without returning the events themselves, limited to 10 seconds.
  • >time ./splunk dispatch '* | head 10000' -maxtime 10
    • will scan for ALL results AND return the events, limited to 10 seconds.
  • >time ./splunk dispatch '* | head 10000 | stats count' -maxtime 10
    • will scan for ALL results AND retrieve the events (without returning them), limited to 10 seconds.
  • >time ./splunk dispatch '* | stats count' -maxtime 2

How can I fix my timestamps?

Force your events to use the current system time by setting the following in your props.conf:

  • DATETIME_CONFIG=CURRENT

Tell Splunk not to trust time stamps within a file that are more than the following number of days old:

  • MAX_DAYS_AGO=5

Tell Splunk to limit how far (in characters) it will look into an event to find a time stamp:

  • MAX_TIMESTAMP_LOOKAHEAD=50