From Splunk Wiki
Some information about reviewing crash logs, mostly useful for internal consumption, but there's no reason customers shouldn't know it too.
Crash logs are generated as crash-data.log, or on Windows, executable_name_crash-date.log.
- The filename informs you of the time of the crash, even though the content does not.
- The build line informs you of the version that was running, even in confusing situations like upgrades.
- Crashes fall into two general categories:
  - Problems our shipped code identified
  - Problems the operating system and related runtime features identified
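Since the filename carries the crash time, a support script can recover it with a small parser. This is a sketch only: the exact date format in "executable_name_crash-date.log" isn't specified above, so the timestamp pattern below is an assumption to be adjusted to the files you actually see.

```python
# Sketch: recover the crash time from a Windows-style crash log filename.
# ASSUMPTION: the date stamp looks like YYYY-MM-DD-HH-MM-SS; adapt the
# regex to whatever format your crash logs really use.
import re
from datetime import datetime

def crash_time_from_filename(name):
    m = re.search(r"_crash-(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})\.log$", name)
    if m is None:
        # e.g. the generic crash-data.log name carries no timestamp
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d-%H-%M-%S")

print(crash_time_from_filename("splunkd_crash-2019-03-08-14-22-05.log"))
print(crash_time_from_filename("crash-data.log"))
```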
Self-identified crash conditions
Nearly all cases where code we ship has identified the crash condition will show as an ABORT. Whenever you see an abort message (or SIGABRT), the most useful information is usually located in splunkd_stderr.log. Typically the cause is an uncaught exception or a failed assertion. Some common types:
- Exception, std::bad_alloc: splunk ran out of memory. Be sure the computer has enough memory for splunk (typically many gigabytes), and that some other program isn't using all of it. Try to gather information on how rapidly the affected splunk process grows in memory use.
- Assertion failed "...": this is a case where explicit sanity checks in our code have identified a Very Bad Condition, where continuing could be worse. Searching on the particular test, file, and function in the message will likely find duplicate occurrences of the bug. There can of course be multiple ways for a given sanity test to fail, but usually they point to the same problem. These cases should definitely be given to someone with source code access to triage, unless the problem is obvious.
- Other exceptions can often be understood via a web search or by inference.
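Since splunkd_stderr.log is the first place to look for self-identified crashes, a quick triage pass can be scripted. This is a rough sketch: the signature patterns below (bad_alloc, "assertion failed", C++ terminate messages) are illustrative guesses at what such a log might contain, not an exhaustive or authoritative list of splunk's output.

```python
# Sketch: flag likely crash clues in splunkd_stderr.log lines.
# ASSUMPTION: the patterns are representative examples, not splunk's
# actual complete set of abort messages.
import re

SIGNATURES = {
    "out_of_memory": re.compile(r"bad_alloc", re.IGNORECASE),
    "assertion": re.compile(r"assert(ion)? failed", re.IGNORECASE),
    "uncaught_exception": re.compile(r"terminate called|uncaught exception", re.IGNORECASE),
}

def classify_stderr_lines(lines):
    """Return (line_number, label, line) for every matching signature."""
    hits = []
    for lineno, line in enumerate(lines, 1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((lineno, label, line.rstrip()))
    return hits

sample = [
    "terminate called after throwing an instance of 'std::bad_alloc'",
    "  what():  std::bad_alloc",
]
print(classify_stderr_lines(sample))
```

A line can match more than one signature (the sample's first line reads as both an uncaught exception and an out-of-memory clue), which is useful: the combination often narrows the diagnosis faster than either match alone.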
OS / System identified crash conditions
SIGBUS, segmentation fault, access violation, and "no memory mapped" are all cases where the operating system has identified that splunk has done something that is not allowed.
One key point: "no memory mapped" is definitely not a memory-exhaustion problem. It means the process touched an address that has no mapping, not that it ran out of memory.
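For these OS-identified conditions, the kernel kills the process with a signal, and a supervising script can recover the signal name from the wait status. A minimal POSIX sketch (the child command here is a stand-in that sends SIGSEGV to itself to simulate an illegal memory access; Windows reports these differently):

```python
# Sketch: distinguish "killed by a signal" from a normal exit.
# ASSUMPTION: POSIX semantics, where a negative subprocess returncode
# means the child was terminated by signal -returncode.
import signal
import subprocess
import sys

# Stand-in for a crashing splunk child process: deliver SIGSEGV to itself.
proc = subprocess.run(
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
)

if proc.returncode < 0:
    signame = signal.Signals(-proc.returncode).name
    print(f"child died from {signame}")
else:
    signame = None
    print(f"child exited normally with status {proc.returncode}")
```

The same returncode check works for SIGBUS and SIGABRT, so one supervisor can tell an OS-identified crash apart from a self-identified ABORT.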