From Splunk Wiki
Troubleshooting alert scripts
This can be a complex problem, so it is important to check thoroughly that every step of the process (from scheduled search to alert script) is working as expected:
Is my scheduled search running?
- Check $SPLUNK_HOME/var/log/splunk/scheduler.log (on Windows: %SPLUNK_HOME%\var\log\splunk\scheduler.log) or search for "index=_internal source=*scheduler.log savedsearch_name="my_saved_search_name" | stats count by status" to determine whether your scheduled search is running and finishing with a status of "success". If you see failures here, drill down into them to see why the search is not running. Is it taking too long to execute? Are there too many concurrent searches running at that time?
- For more on troubleshooting scheduled saved searches, see Community:TroubleshootingScheduledSearches
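When the search bar is not available, the same per-status count can be approximated directly on the log file. The following is a minimal sketch, assuming each scheduler.log line carries key=value pairs such as savedsearch_name="..." and status=... (the fields used by the search above); count_by_status is a hypothetical helper name, not something shipped with Splunk.

```shell
#!/bin/sh
# Count scheduler.log entries per status for one saved search.
# Assumption: each line contains savedsearch_name="<name>" and status=<value>.
# count_by_status is an illustrative helper, not part of Splunk.
count_by_status() {
    log_file=$1
    search_name=$2
    # Keep only lines for this saved search, extract the status value,
    # then count how often each value occurs.
    grep -F "savedsearch_name=\"$search_name\"" "$log_file" \
        | sed -n 's/.*[, ]status=\([A-Za-z_]*\).*/\1/p' \
        | sort | uniq -c
}
```

For example: count_by_status "$SPLUNK_HOME/var/log/splunk/scheduler.log" my_saved_search_name. Any status other than "success" is worth drilling into.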
Is my scheduled search generating the expected results?
- Again, check $SPLUNK_HOME/var/log/splunk/scheduler.log (on Windows: %SPLUNK_HOME%\var\log\splunk\scheduler.log) or search for "index=_internal source=*scheduler.log savedsearch_name="my_saved_search_name" | stats count by result_count". Is the result count what you expect?
- Provided you have configured Splunk to send emails, add an email action to send yourself an email. Check the search results at the URL provided in the email, and see if they are as expected.
Is my alert action being triggered?
- Set up an additional alert action (typically an email) to see if alert actions are being triggered or not.
- Make sure that the problem is not with the condition of your alert. To verify this, change the alert condition to "always". If your script then runs, you know the problem is with the condition, and you should study the results of your scheduled search to see why it isn't triggering the alert action as expected.
Is my alert script working?
- First, make sure that your script sits where it should: $SPLUNK_HOME/bin/scripts/ is a good location, but you may also want to put it in $SPLUNK_HOME/etc/apps/your_app_name/bin/scripts if your scheduled search is app-specific and not global.
- In 4.3.4 (and possibly earlier), runshellscript.py was updated to "look for scripts first in the app's bin/scripts/ dir, if that fails try SPLUNK_HOME/bin/scripts"; if that also fails, it reports an error and exits.
- Check that the script itself runs outside of Splunk. As the user that splunkd runs as on your system, launch the script. Is it producing the expected output? If the script depends on variables passed by the Splunk scheduled search, you may want to temporarily set those to hard-coded values in the script itself.
- Use the "somesearch here | runshellscript <your_script.(sh|bat)>" command to run your script manually from the search bar. Does this work? In 4.3.4 this shows an error about the number of arguments, but it does confirm that Splunk can see runshellscript.py. The error looks like this:
- External search command 'runshellscript' returned error code 1. Script output = "ERROR "Missing arguments to operator 'runshellscript', expected at least 10, got 2." "
- If your own script runs manually and from the search bar but not when called as an alert action, it is time to check whether Splunk can run a simple script in that manner. Splunk ships with a simple script called echo.sh (on Windows: echo.bat), located in $SPLUNK_HOME/bin/scripts (on Windows: %SPLUNK_HOME%\bin\scripts). When called, it attempts to write the 8 arguments passed by the Splunk scheduled search (see http://www.splunk.com/base/Documentation/latest/Admin/Configurescriptedalerts#Script_options) to a text control file.
This is a good script for testing the basic functionality of triggering an alert script and for seeing how Splunk passes variables to it. Note that you may want to change the script slightly, for example to write its output to a different location or to add a time stamp to the output.
I typically modify it as follows, to time-stamp the output and to specify an absolute path to a custom directory:
For *nix systems, $SPLUNK_HOME/bin/scripts/test.sh:
echo "`date` ARG0='$0' ARG1='$1' ARG2='$2' ARG3='$3' ARG4='$4' ARG5='$5' ARG6='$6' ARG7='$7' ARG8='$8'" >> "/var/tmp/splunk-script.out"
For Windows systems, %SPLUNK_HOME%\bin\scripts\test.bat:
@echo off
echo %0, %1, %2, %3, %4, %5, %6, %7, %8 >> "c:\temp\test_output.txt"
date /T >> "c:\temp\test_output.txt"
time /t >> "c:\temp\test_output.txt"
echo ---------------------------------------- >> "c:\temp\test_output.txt"
To use this script, first run it manually as the user that runs splunkd and make sure that you get the expected result. The unmodified echo script should create a file called "echo_output.txt" in $SPLUNK_HOME/bin/scripts/ (on Windows: %SPLUNK_HOME%\bin\scripts\); the modified versions above write to the path set in the script instead (/var/tmp/splunk-script.out or c:\temp\test_output.txt). If you passed arguments to the script from the command line, they should be listed in the most recent entry written to that file.
Next, change your scheduled search to use echo.(bat|sh), or the modified test.(bat|sh) above, instead of your own script, and check the contents of the control file. In our example:
# tail -f /var/tmp/splunk-script.out
Is the output written to this file as expected? Do we have entries at the expected times? Are the arguments passed correctly?
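The location and permission checks from the start of this section can also be scripted. The sketch below mirrors the lookup order quoted above (the app's bin/scripts directory first, then $SPLUNK_HOME/bin/scripts). It is an illustration rather than a copy of what runshellscript.py does: find_alert_script is a hypothetical helper, and the executable-bit check is an extra sanity test added here, not documented Splunk behavior.

```shell
#!/bin/sh
# Print the first executable copy of an alert script, checking the app's
# bin/scripts directory before the global one.
# Arguments: script name, Splunk home directory, app name.
# find_alert_script is an illustrative helper, not part of Splunk.
find_alert_script() {
    script_name=$1
    splunk_home=$2
    app_name=$3
    for dir in "$splunk_home/etc/apps/$app_name/bin/scripts" \
               "$splunk_home/bin/scripts"; do
        candidate="$dir/$script_name"
        if [ -f "$candidate" ]; then
            if [ -x "$candidate" ]; then
                echo "$candidate"
                return 0
            fi
            # The file exists but the current user cannot execute it.
            echo "found but not executable: $candidate" >&2
        fi
    done
    echo "no executable copy of $script_name found" >&2
    return 1
}
```

Run it as the user that runs splunkd, for example: find_alert_script test.sh "$SPLUNK_HOME" your_app_name. The permission test is only meaningful for that user.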
Advanced Troubleshooting steps - Checking for ERRORS
Where to look for errors:
How to make the logs more verbose?
Edit the $SPLUNK_HOME/etc/apps/search/bin/runshellscript.py file and change all "logger.info" instances to "logger.error". Then let your scheduled search run a couple of times and check python.log for the more verbose messages.
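Because this changes a file that ships with Splunk, it may be worth keeping a copy to restore afterwards. The following is a minimal sketch of one way to do that; raise_log_level and restore_log_level are hypothetical helper names, and the .bak suffix is an arbitrary choice.

```shell
#!/bin/sh
# Replace logger.info with logger.error in a Python file, keeping a .bak
# copy of the original next to it so the change can be undone.
# Both functions are illustrative helpers, not part of Splunk.
raise_log_level() {
    target=$1
    if [ -e "$target.bak" ]; then
        # Refuse to overwrite an earlier backup with an already-edited file.
        echo "backup already exists: $target.bak" >&2
        return 1
    fi
    cp -p "$target" "$target.bak" || return 1
    sed 's/logger\.info/logger.error/g' "$target.bak" > "$target"
}

# Put the original file back once troubleshooting is finished.
restore_log_level() {
    target=$1
    mv "$target.bak" "$target"
}
```

For example: raise_log_level "$SPLUNK_HOME/etc/apps/search/bin/runshellscript.py", then restore_log_level with the same path once the messages in python.log have been reviewed.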
If none of the above steps allow you to solve your problem, it's probably time to contact Splunk Support!