Community:TroubleshootingIndexedDataVolume
From Splunk Wiki
Troubleshooting Indexed Data Volume
So data is flowing into Splunk. Where's it coming from? What's chatty today? Who just blew the doors off our indexing volume?
The first thing you should check is how many apps are running. Have you installed the Unix app? That can index a lot of data really quickly because it runs lots of scripted inputs. What about other apps, or other inputs? Where did you (or someone else) tell Splunk to get data from? The searches below will help you figure this out.
There are a few tools available to answer these questions.
Log file specifics in 4.3
In 4.3 the license_usage.log file contains more detail, differentiated by type.
- type=Usage => is the equivalent of the 4.2 usage events
- type=RolloverSummary => is the summary for the previous day for each license-slave (replaces the tedious daily sum of all the volumes). It is calculated at midnight and refers to the previous day.
- type=SlaveWarnSummary => counts the number of violations per slave.
Log file differences between pre-4.2 and post-4.2
Splunk 4.2 introduced a new license management feature: the license master and slaves scheme. If you simply upgraded from 4.1 to 4.2 and did not configure anything for license management, license_audit.log should log the same information as before. However, once you enable license slaves or create your own license pools on the license master, license_usage.log starts to log usage information and license_audit.log stops logging license events. On the slaves there is no license_audit.log, because only the master manages total indexed volume and violations.
So if you're using Splunk 4.2 license master/slaves for license management, you can no longer use license_audit.log to calculate indexed data volume. Instead, use license_usage.log to check total indexed volume. The built-in "Deployment Monitor" app uses license_usage.log for indexed data volume, and metrics.log for forwarder connectivity and so on.
Also keep in mind that license_usage.log does not log indexed volume for index dbs that do not count against the license, such as _internal or summarydb. You can check metrics.log for these dbs on both pre-4.2 and 4.2.
license_usage.log
This log is available on the Splunk license master instance only. Every minute, the license master logs indexed event volume based on the information the slaves send to it. Each slave maintains a table of how much it has indexed, in chunks of time. Typically a chunk is 1 minute, but it may grow if the slave cannot contact the master: Splunk only resets the chunk when the table is sent to the master. The table is keyed by (source, sourcetype, host) tuples. If the table grows to exceed 1000 entries, Splunk squashes the host and source keys. So if you have more than 1000 different tuples, you will find no value for the h (host) and s (source) fields. Splunk never suppresses st (sourcetype) in the log.
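The squashing behavior described above can be pictured with a small Python sketch. This is an illustration only, not Splunk's code: the class name UsageTable, its methods, and the exact squash rule (blanking s and h and merging rows per sourcetype) are assumptions made for the example.

```python
from collections import defaultdict

SQUASH_THRESHOLD = 1000  # table size mentioned in the text above


class UsageTable:
    """Hypothetical model of a slave's usage table for one chunk of time."""

    def __init__(self, threshold=SQUASH_THRESHOLD):
        self.threshold = threshold
        self.rows = defaultdict(int)  # (s, st, h) -> bytes indexed

    def record(self, source, sourcetype, host, nbytes):
        """Add nbytes to the row for this (source, sourcetype, host) tuple."""
        self.rows[(source, sourcetype, host)] += nbytes

    def report(self):
        """Return the rows to send to the master, squashed if the table is too big."""
        if len(self.rows) <= self.threshold:
            return dict(self.rows)
        # Too many tuples: drop source and host, keep only sourcetype (assumed rule).
        squashed = defaultdict(int)
        for (_source, sourcetype, _host), nbytes in self.rows.items():
            squashed[("", sourcetype, "")] += nbytes
        return dict(squashed)
```

The point of the sketch is that the byte totals survive squashing, so searches by st stay accurate even when h and s come back empty.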
license_audit.log
This log is useful only for pre-4.2. Please read the topic above about the log file differences between pre-4.2 and post-4.2. This log maintains the daily license usage and the count of quota violations, and logs this information right after midnight.
metrics.log
This log maintains metrics for internal queues, internal processors, etc. By default, per_index_thruput in metrics.log records only the ten busiest series in each sample. So, if you have more than ten indexes, you will need to edit the maxseries attribute in the [metrics] stanza in $SPLUNK_HOME/etc/system/local/limits.conf.
However, this can affect your indexing performance to some extent.
In 4.2, you can also change the interval attribute from its default of 30 seconds, so consider increasing the interval when you increase maxseries.
[metrics]
maxseries = <integer>
* The number of series to include in the per_x_thruput reports in metrics.log.
* Defaults to 10.
interval = <integer>
* Number of seconds between logging splunkd metrics to metrics.log.
* Minimum of 10.
* Defaults to 30.
Data volume seen by the license code
You may care about this for licensing concerns. Or you may just want to sanity-check what quantity of data the licensing code is seeing.
You can review some information in the Manager portion of the Splunk 4.0.x interface, or you can run a search on the _internal index to see a pretty chart, etc.
Splunk 4.3
You can use the same searches as on 4.2. In addition, for the volume of previous days you can use the RolloverSummary events, which is faster:
index=_internal source=*license_usage* type=RolloverSummary | bucket _time span=1d | stats sum(b) AS volume by _time pool
Splunk 4.2 & 4.3
Run on the license master
index=_internal source=*license_usage.log type=Usage | eval GB=b/1024/1024/1024 | timechart span=1d sum(GB) by pool
If you prefer more detail, you can split the results by source (s), host (h), sourcetype (st), or indexer (i).
detail per source type
index=_internal source=*license_usage.log type=Usage | eval GB=b/1024/1024/1024 | timechart span=1d sum(GB) by st useother=false
detail per host
index=_internal source=*license_usage.log type=Usage | eval GB=b/1024/1024/1024 | timechart span=1d sum(GB) by h useother=false
detail per source
index=_internal source=*license_usage.log type=Usage | eval GB=b/1024/1024/1024 | timechart span=1d sum(GB) by s useother=false
detail per indexer (useful in case of indexer cluster)
index=_internal source=*license_usage.log type=Usage | eval GB=b/1024/1024/1024 | timechart span=1d sum(GB) by i useother=false
Splunk pre-4.2
index=_internal todaysBytesIndexed LicenseManager-Audit source=*license_audit.log | eval Daily_Indexing_Volume_in_MBs = todaysBytesIndexed/1024/1024 | timechart avg(Daily_Indexing_Volume_in_MBs) by host
This is snarfed from: http://www.splunk.com/base/Documentation/latest/Installation/AboutSplunklicenses#View_your_license_and_usage_details
You can also review the license_audit.log file itself in your Splunk installation, if you need history longer than 28 days. If undertaking this, you may find the following unix-platform incantation useful, which creates a more readable variation of the file.
cat license_audit.log |awk '{ printf("%s\n",substr($0,0,(index($0,"]["))-1)) }' > readable-license-audit.log
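On platforms without awk, a rough Python equivalent might look like the sketch below. It assumes the goal is to drop everything from the first "][" onward on each line; the function names and file names are illustrative. Unlike the awk one-liner, lines that contain no "][" are kept whole, and the cut point may differ by a character depending on your awk's substr behavior.

```python
def strip_trailer(line):
    """Return the part of line before the first '][' (the whole line if absent)."""
    cut = line.find("][")
    return line if cut == -1 else line[:cut]


def make_readable(src_path, dst_path):
    """Write a copy of src_path with the trailing '][...' part of each line removed."""
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_trailer(line.rstrip("\n")) + "\n")


if __name__ == "__main__":
    # Illustrative file names, matching the awk example above.
    make_readable("license_audit.log", "readable-license-audit.log")
```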
Daily volume by host:
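This captures the daily license volume used, as a percentage of the license size. Note that UsagePercent is only meaningful if licenseSize is expressed in the same unit as UsageMB; if your log reports licenseSize in bytes, convert it to MB first.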
index=_internal todaysBytesIndexed LicenseManager-Audit source=*license_audit.log | eval Daily_Indexing_Volume_in_MBs = todaysBytesIndexed/1024/1024 | bucket _time span=1d | stats avg(Daily_Indexing_Volume_in_MBs) AS UsageMB first(licenseSize) AS LicenseSize by _time host | eval UsagePercent=UsageMB/LicenseSize*100 | eval UsagePercent=round(UsagePercent, 2) | table _time host LicenseSize UsageMB UsagePercent
Quick summary information by host, source, source type, and index
Okay, so there's a problem with the data volume: it's higher than you expected, or higher than you were planning for. Or you just want to get a better picture of where the data is coming from in bulk. The metrics.log data already has totals for this at a reasonable interval, so we can mine it.
You can get some reports from the GUI, in the search app > status > index activity > indexing volume.
Or you can run the searches below, which are more precise and flexible. Splunk Metrics Reports has searches for this purpose in the section 'How much was indexed'. For example:
index=_internal group="per_host_thruput" | eval mb=kb/1024| timechart span=1d sum(mb) by series
index=_internal group="per_source_thruput" NOT series="*splunk/var/log*" | eval mb=kb/1024| timechart span=1d sum(mb) by series
index=_internal group="per_sourcetype_thruput" NOT series="splunk*" | eval mb=kb/1024| timechart span=1d sum(mb) by series
These searches provide a sampling of the top producers by different categories. The default sampling size is 10, so if for example you expect to receive 20 source types, this will not be a complete data picture, but will have the 10 busiest for each sub-minute time window. Thus, this search gives you a quick picture of what's going on generally, but not a to-the-byte accurate value.
To see how much data Splunk has actually written to your various indexes, use this search (some indexes that do not count against the volume quota are excluded):
index=_internal host=<indexer hostname> group="per_index_thruput" NOT series="_*" NOT series="history" NOT series="summary" | eval mb=kb/1024 | timechart span=1d sum(mb) by series
Counting event sizes over a time range
Roughly, you can run a search where you look at all (or some) data over a range of indexed_time values, counting up the size of the actual events.
For example, where the endpoints START_TIME and END_TIME are numbers in seconds from the start of the unix epoch, the search would be:
* indexed_time>START_TIME indexed_time<END_TIME | eval event_size=len(_raw) | stats sum(event_size)
This is a slow and expensive search, but it can be valuable when you really need to know. It *must* be run across a time range that can contain all possible events that were indexed at that time, regardless of how regular their timestamps are. Typically this means it must be run over all time. The stats computation as well as the initial filters can of course be adjusted to look at the problem more closely.
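START_TIME and END_TIME are plain epoch seconds. A minimal Python sketch for computing them for one calendar day follows; it assumes you want UTC day boundaries, and the function name is illustrative. Because the search uses strict comparisons, subtract 1 from the start value if the first second of the day should be included.

```python
from datetime import datetime, timedelta, timezone


def day_bounds_epoch(year, month, day):
    """Return (start, end) epoch seconds for the given UTC calendar day."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


if __name__ == "__main__":
    start_time, end_time = day_bounds_epoch(2012, 1, 15)
    print("START_TIME =", start_time)
    print("END_TIME   =", end_time)
```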
Set up a scheduled search to alert you if a license violation occurs
First off, learn how to set up a daily scheduled search with an email alert trigger here. You can then use the search string below as the basis for your alert. It will only return results if the quota has incremented and it checks every host separately (handy if you have more than one indexer):
On Splunk 4.2 only, using splunkd.log or license_usage.log
On the license master using license_usage.log
Simple alert:
Schedule this search to run each day on the license master if you want an email for every day on which this event is recorded.
index=_internal source="*splunkd.log" "Indexing quota exceeded"
Detailed alert on the volume used the previous day:
- specify your license pool name and your pool size (in GB).
- search over the previous day (earliest=-1d@d latest=@d)
index=_internal source=*license_usage* pool="$mypoolname$" | eval GB=b/1024/1024/1024 | stats sum(GB) by pool | where 'sum(GB)' > $mypoolsize$
If you want, you can also schedule searches that run in the middle of the day and send you a warning if the pool usage is already high.
- specify your license pool name and your alert volume (in GB).
- search over the current day until now (earliest=@d latest=now)
index=_internal source=*license_usage* pool="$mypoolname$" | eval GB=b/1024/1024/1024 | stats sum(GB) by pool | where 'sum(GB)' > $myalertvolume$
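The comparison these alert searches make (sum the bytes per pool, convert to GB, compare with a threshold) can be sketched in Python as follows. The function name and the event dictionaries are hypothetical stand-ins for parsed license_usage.log events; only the b and pool fields come from the searches above.

```python
from collections import defaultdict

BYTES_PER_GB = 1024 ** 3  # same conversion as b/1024/1024/1024


def pools_over(usage_events, threshold_gb):
    """Return {pool: total GB} for pools whose summed usage exceeds threshold_gb."""
    totals = defaultdict(int)
    for event in usage_events:
        totals[event["pool"]] += event["b"]  # b is bytes, as in the searches
    over = {}
    for pool, total_bytes in totals.items():
        total_gb = total_bytes / BYTES_PER_GB
        if total_gb > threshold_gb:
            over[pool] = total_gb
    return over
```

An empty result corresponds to the search returning no rows, which is the case where no alert should fire.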
On Splunk (pre-4.2 and later), using the violations counter in license_audit.log
index=_internal source=*license_audit.log LicenseManager-Audit | streamstats current=f global=f window=1 first(quotaExceededCount) as next_quotaExceededCount by host | eval quotadiff = next_quotaExceededCount - quotaExceededCount | search quotadiff>0
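The logic of the streamstats search above can be sketched in Python. This is an illustration under two assumptions: events arrive newest first (the usual order of Splunk search results), and each event is a dictionary with host and quotaExceededCount keys. The function name is hypothetical.

```python
def new_violations(events):
    """Return events whose next-newer event on the same host has a higher counter.

    events: list of dicts with 'host' and 'quotaExceededCount', newest first.
    """
    newer_count = {}  # host -> counter seen on the previous (newer) event
    hits = []
    for event in events:
        host = event["host"]
        count = event["quotaExceededCount"]
        next_count = newer_count.get(host)  # plays the role of next_quotaExceededCount
        if next_count is not None and next_count - count > 0:
            hits.append({**event, "quotadiff": next_count - count})
        newer_count[host] = count
    return hits
```

Tracking the counter per host mirrors the "by host" clause, which is why each indexer is checked separately.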
Example in savedsearches.conf
[new violation alert]
action.email = 1
action.email.sendresults = 1
action.email.to = admin@XXXXXXXXXX.com
counttype = number of events
cron_schedule = 0 1 * * *
dispatch.earliest_time = -24h@h
dispatch.latest_time = now
displayview = flashtimeline
enableSched = 1
quantity = 1
relation = rises by
request.ui_dispatch_view = flashtimeline
search = index=_internal source=*license_audit.log LicenseManager-Audit | streamstats current=f global=f window=1 first(quotaExceededCount) as next_quotaExceededCount by host | eval quotadiff = next_quotaExceededCount - quotaExceededCount | search quotadiff>0