From Splunk Wiki
Troubleshooting search quotas
This content applies to Splunk versions 4.1+
When a Splunk indexer or a search head is used intensively, some of your searches might be skipped or not run.
# For example, in scheduler.log
02-09-2011 18:52:59.862 INFO SavedSplunker - savedsearch_id="nobody;search;ALERT--XXX--WAN_Core_device", user="nobody", app="search", savedsearch_name="search;ALERT--XXX--WAN_Core_device", status=skipped, scheduled_time=1297306320
02-14-2011 18:01:03.487 INFO SavedSplunker - savedsearch_id="admin;MySummarization of QoS", user="admin", app="Qos", savedsearch_name="MySummarization of QoS", status=skipped, scheduled_time=1297706400
# Or warnings in splunkd.log
The system is approaching the maximum number of historical searches that can be run concurrently. current=10 maximum=12
# Or in the UI
Your maximum disk usage quota has been reached. usage=6000MB quota=1000MB The search was not run. SearchId=123456789.3
You can address the root cause of the problem:
- Too many scheduled searches, scheduled too frequently, or taking too long to complete
- Server performance too low for the search volume (not enough cores, slow storage)
Or try to tune your quotas:
- increase the search quotas for the role of the user
- or change the owner of the searches
- or change the role of the users to segment your resources (you can also create specific roles)
First of all, check your search performance
The goal is to figure out which users are running the most searches and whether some of those searches are skipped.
A very nice dashboard exists in the Search app > status > scheduler activity > By user or app.
You can also look at the details with these searches, run over a day or an hour.
index=_internal source=*scheduler.log* | stats count by user, app, savedsearch_name, status
index="_internal" source="*scheduler.log" savedsplunker | stats count BY user, savedsearch_name, host
Note: The user "nobody" means that no particular user owns the search. This is often the case for searches defined in an app's savedsearches.conf. For the user "nobody", the default quotas are applied.
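The per-user tally produced by the stats searches above can also be approximated offline, for instance on a copy of scheduler.log. The following is a minimal Python sketch, not a Splunk tool: the function names parse_fields and count_by_user_status are hypothetical, and the key=value pattern is inferred only from the two sample lines shown at the top of this page.

```python
import re
from collections import Counter

# Two sample lines copied from the scheduler.log excerpt at the top of this page.
SAMPLE_LINES = [
    '02-09-2011 18:52:59.862 INFO SavedSplunker - '
    'savedsearch_id="nobody;search;ALERT--XXX--WAN_Core_device", user="nobody", '
    'app="search", savedsearch_name="search;ALERT--XXX--WAN_Core_device", '
    'status=skipped, scheduled_time=1297306320',
    '02-14-2011 18:01:03.487 INFO SavedSplunker - '
    'savedsearch_id="admin;MySummarization of QoS", user="admin", app="Qos", '
    'savedsearch_name="MySummarization of QoS", status=skipped, '
    'scheduled_time=1297706400',
]

# Matches key="quoted value" or key=bare_value (assumed format, see lead-in).
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|([^,\s]+))')


def parse_fields(line):
    """Return the key=value pairs of one log line as a dict."""
    return {key: quoted or bare for key, quoted, bare in FIELD_RE.findall(line)}


def count_by_user_status(lines):
    """Count scheduler events per (user, status), like 'stats count by user, status'."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if "user" in fields and "status" in fields:
            counts[(fields["user"], fields["status"])] += 1
    return counts


if __name__ == "__main__":
    for (user, status), count in sorted(count_by_user_status(SAMPLE_LINES).items()):
        print(user, status, count)
```

A user with a high count for status=skipped is a candidate for a larger quota or for a lighter search schedule.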
Then get your settings by looking at the configuration files, or use
./splunk cmd btool authorize list
How the quotas are calculated
The system-wide maximum number of concurrent search jobs is derived from the number of CPU cores, using two settings in limits.conf.
max_searches_per_cpu is the maximum number of concurrent searches per CPU. The system-wide number of searches is computed as base_max_searches + max_searches_per_cpu x number_of_cpus. Changing these settings without adding more cores will probably not solve the problem.
In Splunk 4.*:
[search]
base_max_searches = 4
max_searches_per_cpu = 4
In Splunk 5.* (the number of searches was reduced because the improved scheduler can queue more searches instead of skipping them):
[search]
base_max_searches = 6
max_searches_per_cpu = 1
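To make the formula concrete, here is a small Python sketch that evaluates it with the 4.* and 5.* defaults listed above. The function name max_concurrent_searches and the core counts (2 and 8) are illustrative assumptions, not values taken from a real deployment.

```python
def max_concurrent_searches(base_max_searches, max_searches_per_cpu, number_of_cpus):
    """System-wide limit: base_max_searches + max_searches_per_cpu x number_of_cpus."""
    return base_max_searches + max_searches_per_cpu * number_of_cpus


# Defaults quoted on this page for each major version.
SPLUNK_4_DEFAULTS = {"base_max_searches": 4, "max_searches_per_cpu": 4}
SPLUNK_5_DEFAULTS = {"base_max_searches": 6, "max_searches_per_cpu": 1}

if __name__ == "__main__":
    for cpus in (2, 8):  # hypothetical core counts
        v4 = max_concurrent_searches(number_of_cpus=cpus, **SPLUNK_4_DEFAULTS)
        v5 = max_concurrent_searches(number_of_cpus=cpus, **SPLUNK_5_DEFAULTS)
        print(f"{cpus} cores: 4.* limit = {v4}, 5.* limit = {v5}")
    # 2 cores: 4.* limit = 12, 5.* limit = 8
    # 8 cores: 4.* limit = 36, 5.* limit = 14
```

With the 4.* defaults, a 2-core machine gives a limit of 12, which is consistent with the maximum=12 in the splunkd.log warning quoted earlier, although this page does not state what hardware produced that message.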
Another setting is well hidden: the ratio of jobs that the scheduler can use (versus the manual/dashboard ones). By default, 25% of your quota is for scheduled searches.
[scheduler]
max_searches_perc = 25
NOTE: the searches are NOT reserved for the scheduler - this is just a limit - ad-hoc user searches take priority over the scheduler.
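As a rough illustration of the scheduler's share, the Python sketch below applies max_searches_perc to a system-wide limit. The function name scheduler_search_limit and the example limits (12 and 8) are assumptions for illustration. The result is left unrounded because this page does not say how Splunk rounds fractional values.

```python
def scheduler_search_limit(system_limit, max_searches_perc=25):
    """Upper bound on concurrent scheduled searches, as a share of the system limit.

    This is a cap, not a reservation: ad-hoc searches can still use these slots.
    """
    return system_limit * max_searches_perc / 100


if __name__ == "__main__":
    print(scheduler_search_limit(12))  # 3.0
    print(scheduler_search_limit(8))   # 2.0
```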
If you are curious about the actual number of concurrent searches:
index=_internal sourcetype=splunkd source=*metrics.log group=search_concurrency "system total" | timechart max(active_hist_searches) as "Historical Searches" min(active_realtime_searches) as "Real-time Searches" by host
The quota is per user. This means that the full quota is available to every user in a role. If you wish to have a quota on the total number of jobs for a role, use the "cumulativeSrchJobsQuota" and "cumulativeRTSrchJobsQuota" settings, as described in the authorize.conf.spec file.
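The difference between a per-user quota and a cumulative role quota can be sketched with simple arithmetic. The Python model below is a deliberate simplification: it assumes every user holds only this one role, it ignores the system-wide limit, and the function name and the figure of 20 users are hypothetical.

```python
def role_worst_case_jobs(users_in_role, srch_jobs_quota, cumulative_quota=None):
    """Most historical searches a role could run at once under this simple model.

    Without a cumulative quota, each user gets the full per-user quota.
    With one, the role as a whole is capped at cumulative_quota.
    """
    per_user_total = users_in_role * srch_jobs_quota
    if cumulative_quota is None:
        return per_user_total
    return min(per_user_total, cumulative_quota)


if __name__ == "__main__":
    # 20 users with srchJobsQuota = 3 and no cumulative quota
    print(role_worst_case_jobs(20, 3))                       # 60
    # Same role with a hypothetical cumulativeSrchJobsQuota of 10
    print(role_worst_case_jobs(20, 3, cumulative_quota=10))  # 10
```

In practice the system-wide concurrency limit still applies on top of either figure.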
The parameters are:
- rtSrchJobsQuota for the real-time search quota
- srchJobsQuota for all the historical searches (dashboards, manual and scheduled); this is usually the one to increase
- srchDiskQuota for the maximum disk space, in MB, used to store search results; increase this one if you plan to retrieve a lot of results
# default quotas
[role_admin]
rtSrchJobsQuota = 100
srchDiskQuota = 10000
srchJobsQuota = 50
[role_power]
rtSrchJobsQuota = 20
srchDiskQuota = 500
srchJobsQuota = 10
[role_user]
rtSrchJobsQuota = 6
srchDiskQuota = 100
srchJobsQuota = 3
[default]
rtSrchJobsQuota = 6
srchDiskQuota = 100
srchJobsQuota = 3
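Stanzas like these are plain key = value text, so they can be read with Python's standard configparser module to compare quotas across roles. This is only a sketch: the helper names load_quotas and quota_for are hypothetical, the text is the default listing from this page rather than real btool output, and the fallback to the [default] stanza for an unknown role is a modeling assumption.

```python
import configparser

# Default quotas as listed on this page.
DEFAULT_QUOTAS = """
[role_admin]
rtSrchJobsQuota = 100
srchDiskQuota = 10000
srchJobsQuota = 50
[role_power]
rtSrchJobsQuota = 20
srchDiskQuota = 500
srchJobsQuota = 10
[role_user]
rtSrchJobsQuota = 6
srchDiskQuota = 100
srchJobsQuota = 3
[default]
rtSrchJobsQuota = 6
srchDiskQuota = 100
srchJobsQuota = 3
"""


def load_quotas(text):
    """Parse quota stanzas into {stanza_name: {setting: int_value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the camelCase setting names as written
    parser.read_string(text)
    return {
        section: {key: int(value) for key, value in parser[section].items()}
        for section in parser.sections()
    }


def quota_for(quotas, role, setting):
    """Look up a setting for a role, falling back to [default] (assumption)."""
    stanza = quotas.get("role_" + role, {})
    return stanza.get(setting, quotas["default"][setting])


if __name__ == "__main__":
    quotas = load_quotas(DEFAULT_QUOTAS)
    for role in ("admin", "power", "user"):
        print(role, quota_for(quotas, role, "srchJobsQuota"))
```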
Don't forget that for the user "nobody" the default settings will apply.
This means that by default, the "nobody" user can run 25% of 3 jobs simultaneously for scheduled searches, and when rounded, this is just 1 job. Note, however, that the scheduler's 25% is applied at the aggregate level: the scheduler is able to consume the entire quota of a user if there are enough scheduled searches.