From Splunk Wiki

Troubleshooting search quotas

This content applies to Splunk versions 4.1+

When a Splunk indexer or a search head is used intensively, some of your searches might be skipped or not run.

# For example, in scheduler.log
02-09-2011 18:52:59.862 INFO  SavedSplunker - savedsearch_id="nobody;search;ALERT--XXX--WAN_Core_device", user="nobody", app="search", savedsearch_name="search;ALERT--XXX--WAN_Core_device", status=skipped, scheduled_time=1297306320
02-14-2011 18:01:03.487 INFO  SavedSplunker - savedsearch_id="admin;MySummarization of QoS", user="admin", app="Qos", savedsearch_name="MySummarization of QoS", status=skipped, scheduled_time=1297706400

# Or warnings in splunkd.log
The system is approaching the maximum number of historical searches that can be run concurrently. current=10 maximum=12

# Or in the UI
Your maximum disk usage quota has been reached. usage=6000MB quota=1000MB The search was not run. SearchId=123456789.3 

You can address the root cause of the problem:

  • Too many scheduled searches, scheduled too frequently, or taking too long to complete
  • Server performance too low for the search volume (not enough cores, slow storage)

Or you can tune your quotas:

  • increase the search quotas for the user's role
  • or change the owner of the searches
  • or change the users' roles to segment your resources (you can also create specific roles)

First of all, check your search performance

The goal is to figure out which users are running the most searches and whether some of those searches are skipped.

A very nice dashboard exists in the Search app > status > scheduler activity > By user or app.

You can also look at the details with these searches, run over a day or an hour.

index=_internal source=*scheduler.log* | stats count by user, app, savedsearch_name, status

index="_internal" source="*scheduler.log" savedsplunker | stats count BY user, savedsearch_name, host

Note: The user "nobody" means that no particular user owns the search. This is often the case for searches defined in an app's savedsearches.conf. For the user "nobody", the default quotas are applied.
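
If only the raw scheduler.log file is at hand, a similar tally can be approximated offline. The following is a minimal Python sketch, not a Splunk tool: the names parse_scheduler_line and count_skipped are made up for illustration, and the parser assumes the key=value layout seen in the sample scheduler.log lines earlier on this page.

```python
import re
from collections import Counter

# Matches key="quoted value" or key=bare_value (assumed layout, based on
# the sample scheduler.log lines shown earlier on this page).
PAIR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|([^,\s]+))')


def parse_scheduler_line(line):
    """Return the key=value fields of one scheduler.log line as a dict."""
    return {key: quoted or bare for key, quoted, bare in PAIR_RE.findall(line)}


def count_skipped(lines):
    """Count skipped runs per (user, savedsearch_name)."""
    skipped = Counter()
    for line in lines:
        fields = parse_scheduler_line(line)
        if fields.get("status") == "skipped":
            skipped[(fields.get("user"), fields.get("savedsearch_name"))] += 1
    return skipped


sample = [
    '02-14-2011 18:01:03.487 INFO  SavedSplunker - '
    'savedsearch_id="admin;MySummarization of QoS", user="admin", app="Qos", '
    'savedsearch_name="MySummarization of QoS", status=skipped, '
    'scheduled_time=1297706400',
]
print(count_skipped(sample))
```

To run it against a real file, pass an open file object (for example open("scheduler.log")) to count_skipped instead of the sample list.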

Then check your settings by looking at the configuration files, or run ./splunk cmd btool authorize list

How the quotas are calculated

The system-wide number of jobs is determined by the number of cores and the settings in limits.conf

max_searches_per_cpu is the maximum number of concurrent searches per CPU. The system-wide number of searches is computed as base_max_searches + max_searches_per_cpu x number_of_cpus. Raising these values without adding more cores will probably not solve the problem.

Defaults in Splunk 4.*:

base_max_searches = 4
max_searches_per_cpu = 4

Defaults in Splunk 5.* (the number of searches was reduced because the improved scheduler can queue more searches instead of skipping them):

base_max_searches = 6
max_searches_per_cpu = 1
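
As a worked example of the formula, the short Python sketch below computes the system-wide limit for a few core counts using the defaults quoted above. The function name max_concurrent_searches and the chosen core counts are illustrative assumptions, not part of Splunk.

```python
def max_concurrent_searches(number_of_cpus, base_max_searches, max_searches_per_cpu):
    """System-wide limit: base_max_searches + max_searches_per_cpu x number_of_cpus."""
    return base_max_searches + max_searches_per_cpu * number_of_cpus


# Defaults quoted above for each version.
SPLUNK_4_DEFAULTS = {"base_max_searches": 4, "max_searches_per_cpu": 4}
SPLUNK_5_DEFAULTS = {"base_max_searches": 6, "max_searches_per_cpu": 1}

# Core counts picked only for illustration.
for cpus in (2, 8, 16):
    v4 = max_concurrent_searches(cpus, **SPLUNK_4_DEFAULTS)
    v5 = max_concurrent_searches(cpus, **SPLUNK_5_DEFAULTS)
    print(f"{cpus} CPUs: 4.x limit = {v4}, 5.x limit = {v5}")
```

With the 4.* defaults, 2 CPUs give 4 + 4 x 2 = 12, which would be consistent with the maximum=12 shown in the splunkd.log warning earlier on this page.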

Another setting is less well known: the share of jobs that the scheduler can use (as opposed to manual and dashboard searches). By default, scheduled searches are limited to 25% of your quota.

max_searches_perc = 25

NOTE: these searches are NOT reserved for the scheduler. The setting is only an upper limit, and ad-hoc user searches take priority over the scheduler.
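
To make the percentage concrete, here is a small Python sketch of the arithmetic. The function name scheduler_ceiling is hypothetical, and the sketch deliberately returns the unrounded value, because how Splunk rounds a fractional result is not specified here.

```python
def scheduler_ceiling(system_wide_limit, max_searches_perc=25):
    """Upper bound on concurrent scheduled searches, before any rounding."""
    return system_wide_limit * max_searches_perc / 100


print(scheduler_ceiling(12))      # 3.0
print(scheduler_ceiling(36, 50))  # 18.0
```

So on a system whose overall limit is 12 searches, the scheduler could use at most 3 of them at the default setting, and none of those 3 are held back for it.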

To see the actual number of concurrent searches:

index=_internal sourcetype=splunkd source=*metrics.log group=search_concurrency "system total" 
        | timechart max(active_hist_searches) as "Historical Searches" min(active_realtime_searches) as "Real-time Searches" by host

The quota per role is in authorize.conf

The quota is per user: every user in a role gets the full quota. If you want a quota on the total number of jobs for a role, use the "cumulativeSrchJobsQuota" and "cumulativeRTSrchJobsQuota" settings described in the authorize.conf.spec file.
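
The difference between a per-user quota and a cumulative role quota can be illustrated with a toy calculation. This Python sketch is a simplified model, not Splunk's actual admission logic: the function name role_search_ceiling is made up, and the model ignores the system-wide limit and users who hold several roles.

```python
def role_search_ceiling(users_in_role, srch_jobs_quota, cumulative_srch_jobs_quota=None):
    """Most historical searches a role's users could hold at once (toy model)."""
    # Each user gets the full per-user quota.
    per_user_total = users_in_role * srch_jobs_quota
    if cumulative_srch_jobs_quota is None:
        return per_user_total
    # A cumulative quota caps the total across all users of the role.
    return min(per_user_total, cumulative_srch_jobs_quota)


print(role_search_ceiling(20, 3))      # 60
print(role_search_ceiling(20, 3, 15))  # 15
```

In this model, 20 users with a per-user quota of 3 can ask for up to 60 concurrent searches between them, while a cumulative quota of 15 caps the whole role at 15.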

The parameters are:

  • rtSrchJobsQuota: the quota for real-time searches
  • srchJobsQuota: the quota for all historical searches (dashboards, manual and scheduled); this is usually the one to increase
  • srchDiskQuota: the maximum disk space, in MB, for storing search results; increase this one if you plan to retrieve a lot of results

# default quotas
[role_admin]
rtSrchJobsQuota = 100
srchDiskQuota = 10000
srchJobsQuota = 50

[role_power]
rtSrchJobsQuota = 20
srchDiskQuota = 500
srchJobsQuota = 10

[role_user]
rtSrchJobsQuota = 6
srchDiskQuota = 100
srchJobsQuota = 3

[default]
rtSrchJobsQuota = 6
srchDiskQuota = 100
srchJobsQuota = 3

Don't forget that the default settings apply to the user "nobody". This suggests that, by default, "nobody" can run only 25% of 3 jobs simultaneously for scheduled searches, which rounds to just 1 job. Note, however, that the scheduler's 25% is applied at the aggregate level: the scheduler is able to consume the entire quota of a user if there are enough scheduled searches.
