Community:TroubleshootingDistributedSearch

From Splunk Wiki


When using distributed search in Splunk 4.x, there are additional things to consider when troubleshooting search problems.

Isolate the problem

The splunk_server field tells you which distributed search node returned a specific result. This can answer questions such as:

  • Are all my nodes working?
  • Is my data being distributed evenly, or at all?

You can also use the splunk_server field in a search to evaluate behavior (such as performance) on individual nodes.
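The two questions above can be answered from per-node event counts. The following is a minimal Python sketch, assuming you have already collected a count of events for each splunk_server value (for example, from a search ending in `| stats count by splunk_server`). The function names and the 50% tolerance are hypothetical choices, and an even split is only the right target if your forwarders are meant to balance data across indexers.

```python
from typing import Dict, Iterable, List


def server_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each splunk_server value's fraction of the total event count."""
    total = sum(counts.values())
    if total == 0:
        return {server: 0.0 for server in counts}
    return {server: n / total for server, n in counts.items()}


def missing_servers(counts: Dict[str, int], expected: Iterable[str]) -> List[str]:
    """Return expected servers that contributed no events (possibly down or empty nodes)."""
    reporting = {server for server, n in counts.items() if n > 0}
    return sorted(set(expected) - reporting)


def uneven_servers(counts: Dict[str, int], tolerance: float = 0.5) -> List[str]:
    """Return servers whose share differs from an even split by more than `tolerance` (relative)."""
    if not counts:
        return []
    even = 1.0 / len(counts)
    shares = server_shares(counts)
    return sorted(s for s, share in shares.items() if abs(share - even) > tolerance * even)
```

For example, with four indexers where one holds only 7% of the events, `uneven_servers` reports just that indexer, while `missing_servers` catches nodes that returned nothing at all.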

Try one node at a time

To pin down where a breakage is occurring, it is often useful to bypass the UI layer entirely. From a command-line prompt, run:

> splunk search 'my search terms earliest=-15m latest=0'

Compare the results of that search with the following, which searches only a specific Splunk server:

> splunk search 'my search terms splunk_server=server1 earliest=-15m latest=0'

Then repeat with another server:

> splunk search 'my search terms splunk_server=server2 earliest=-15m latest=0'
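With more than a couple of nodes, typing these commands by hand gets tedious. The following minimal Python sketch builds the same cluster-wide and per-server `splunk search` commands shown above and runs them one after another. It assumes the `splunk` binary is on your PATH; the server names and function names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def node_search_commands(terms: str, servers: List[str], earliest: str = "-15m",
                         latest: str = "0", splunk_bin: str = "splunk") -> List[List[str]]:
    """Build the cluster-wide `splunk search` argv followed by one argv per server."""
    window = f"earliest={earliest} latest={latest}"
    commands = [[splunk_bin, "search", f"{terms} {window}"]]
    for server in servers:
        commands.append([splunk_bin, "search", f"{terms} splunk_server={server} {window}"])
    return commands


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in turn, echoing it first so the outputs can be compared."""
    for argv in commands:
        print("$", shlex.join(argv))
        subprocess.run(argv, check=False)


# Example usage (hypothetical server names):
# run_commands(node_search_commands("my search terms", ["server1", "server2"]))
```

Passing an argument list to `subprocess.run` rather than a single shell string avoids having to quote the search terms for the shell.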

Performance debugging Distributed Search in Splunk 4.x

What kind of performance should I expect from a distributed environment?

  • This depends heavily on the total number of indexers being searched.
  • Common sourcetypes on our spec hardware should yield at least a few thousand events per refresh. Searching for * should yield similar results.
    • Note that the number of events returned per refresh tends to increase as the search runs.
  • Increasing the number of indexers should increase the rate at which the total result set is returned.
  • A single search head with multiple indexers may yield a few thousand events per refresh for densely populated sourcetypes.

How to gauge your system performance

First, select a common sourcetype that makes up a good portion of your data. The Index Activity or Inputs Activity dashboard can help you decide which sourcetype to use for debugging. If no single sourcetype stands out, use a wildcard (*).

1. Log in to the Splunk Search UI

2. Search for 100,000 events of your sourcetype. This example uses sourcetype=access_combined:

sourcetype=access_combined | head 100000

3. Check the performance of the search:

  • Go to the Jobs link at the top right corner
  • Look for the search you just ran and write down the Run Time value
  • This number is your baseline for searching a densely populated sourcetype

4. Run the same search as in step 2, piped to the fields command:

sourcetype=access_combined | fields splunk_server | head 100000
  • Note: you may want to examine the splunk_server field, because it gives you an idea of how the data is distributed (mouse over and click the splunk_server field in the left pane).
    • Explanation: "| fields splunk_server" tells Splunk to keep only the splunk_server field instead of returning every field for each event. This should significantly improve search performance.

5. Check the performance of the search:

  • Go to the Jobs link at the top right corner
  • Look for the search you just ran and write down the Run Time value
  • This number should be significantly lower (by orders of magnitude) than the run time of the first search.

6. Run the same search as in step 4, but restricted to data from a few hours ago:

sourcetype=access_combined latest=-2h@h | fields splunk_server | head 100000

7. Check the performance of the search:

  • Go to the Jobs link at the top right corner
  • Look for the search you just ran and write down the Run Time value
  • Ideally, this should be similar to the previous number. If not, the event data may be bursty or sporadic, and you may want to check the distribution of events over time.

8. Run the same search as in step 4, restricted to a specific splunk_server:

sourcetype=access_combined splunk_server=my_splunk_server | fields splunk_server | head 100000
  • Note: replace my_splunk_server with the name of one of your servers (a value of the splunk_server field).

9. Check the performance of the search:

  • Go to the Jobs link at the top right corner
  • Look for the search you just ran and write down the Run Time value
  • This search should be slightly slower than the second search (step 4), because it retrieves events from a single server rather than from many servers at once.

10. Repeat steps 2-9 from the command-line interface (CLI)

  • To measure the duration of a search, use the Unix time command
  • To keep the searches from printing every result to the screen, append "| stats count"
  • For the first search, the command looks like this:
> time ./splunk dispatch "sourcetype=access_combined | head 100000 | stats count"
  • Note: you will need to authenticate to Splunk from the CLI (for example, by adding -auth <username>:<password> to the command).

11. The final step is to compare the numbers.

  • CLI searches should generally run faster, since they skip the UI layer
  • Using the "| fields" command will significantly improve search times
  • The numbers are intended to give a baseline for performance.
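Recording and comparing the numbers from steps 8-11 can be scripted. The following is a minimal Python sketch, not a Splunk tool: `time_command` stands in for the Unix time command, `relative_to_baseline` expresses each run time as a fraction of the step 2 baseline, and `slow_servers` flags a server whose single-server run time (step 8, repeated per server) is far above the median. All function names, step labels, and the 2x factor are hypothetical.

```python
import subprocess
import time
from statistics import median
from typing import Dict, List, Tuple


def time_command(argv: List[str]) -> Tuple[float, str]:
    """Run argv to completion; return (elapsed wall-clock seconds, captured stdout)."""
    start = time.perf_counter()
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return time.perf_counter() - start, result.stdout


def relative_to_baseline(run_times: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Express every recorded run time as a fraction of the baseline run time."""
    base = run_times[baseline]
    if base <= 0:
        raise ValueError("baseline run time must be positive")
    return {name: t / base for name, t in run_times.items()}


def slow_servers(per_server_times: Dict[str, float], factor: float = 2.0) -> List[str]:
    """Return servers whose single-server run time exceeds `factor` times the median."""
    if not per_server_times:
        return []
    typical = median(per_server_times.values())
    return sorted(s for s, t in per_server_times.items() if t > factor * typical)


# Example usage (assumes ./splunk is in the current directory and you are authenticated):
# searches = {
#     "step2_baseline": "sourcetype=access_combined | head 100000 | stats count",
#     "step4_fields": "sourcetype=access_combined | fields splunk_server | head 100000 | stats count",
# }
# times = {name: time_command(["./splunk", "dispatch", q])[0] for name, q in searches.items()}
# print(relative_to_baseline(times, "step2_baseline"))
```

Using the median rather than the mean as the reference keeps a single very slow indexer from masking itself by inflating the average.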

Things to look for

  • How many fields are being returned, and can I trim them?
    • Would enabling or disabling auto_kv help?
  • How is each individual server performing?
    • A single slow server will generally slow down the overall results
    • Search individual Splunk indexers to check whether a single server is underperforming
  • How is my data distributed?
    • If the majority of events for a specific search reside on one server, you will not see performance gains when running distributed search
    • To verify where the data is located, run this search and compare the event count for each splunk_server value:
sourcetype=my_sourcetype | fields splunk_server | head 10000 | stats count by splunk_server
  • Check whether there are traffic peaks, using the following search:
index=_internal metrics group=per_sourcetype_thruput earliest=-1d | timechart span=1h sum(kb) by splunk_server
  • Is it a UI problem?
    • Check the splunkd_access.log files for requests with response times exceeding 500-1000 ms. Use this search:
index=_internal source=*splunkd_access.log | rex "(?<spent>\d+)ms" | search spent>=1000
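The last two checks can also be run offline against exported data. The following is a minimal Python sketch, assuming you have a local copy of splunkd_access.log and, for peaks, the hourly sum(kb) values for one server exported from the timechart search. The assumption that the response time is the last `<digits>ms` token on each line, the 3x peak factor, the file path, and all function names are illustrative, not documented Splunk behavior.

```python
import re
from statistics import median
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: the response time is the last "<digits>ms" token on each line.
MS_TOKEN = re.compile(r"(\d+)ms\b")


def slow_requests(lines: Iterable[str], threshold_ms: int = 1000) -> Iterator[Tuple[int, str]]:
    """Yield (milliseconds, line) for each line whose timing token is >= threshold_ms."""
    for line in lines:
        tokens = MS_TOKEN.findall(line)
        if tokens and int(tokens[-1]) >= threshold_ms:
            yield int(tokens[-1]), line.rstrip("\n")


def peak_hours(hourly_kb: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Return hours whose throughput exceeds `factor` times the median hourly throughput."""
    if not hourly_kb:
        return []
    typical = median(hourly_kb.values())
    return sorted(h for h, kb in hourly_kb.items() if kb > factor * typical)


# Example usage (hypothetical path; adjust to your $SPLUNK_HOME):
# with open("/opt/splunk/var/log/splunk/splunkd_access.log") as log:
#     for spent, line in slow_requests(log):
#         print(spent, line)
```

`slow_requests` is a generator, so it can scan a large log file line by line without loading it into memory.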