Community:TroubleshootingWMIIssues

From Splunk Wiki

Troubleshooting common issues with Splunk and WMI

This topic discusses the following common issues with Splunk and WMI:

  • I am able to index events locally, but I am unable to index from remote machines.
  • I am able to extract events from remote machines, however, I occasionally see crash files in %SPLUNK_HOME%\var\log\splunk.
  • Why can I connect using semisynchronous but not asynchronous? Why does Splunk not use semisynchronous?
  • Splunk is not collecting local WMI event data. (Not a permissions issue) It appears to be collecting other events (non-WMI) just fine.

I am able to index events locally, but I am unable to index from remote machines

1. Check if Splunk has been installed as a domain user.

  • Open the msinfo-sum.txt file in the diag output and look for the [Services] section. In that section, look for Splunkd and SplunkWeb:
Splunkd	Splunkd	Running	Auto	Own Process	c:\program files\splunk\bin\splunkd.exe service	Normal	COCO\Administrator	0
SplunkWeb	SplunkWeb	Running	Auto	Own Process	"c:\program files\splunk\bin\pythonservice.exe"	Normal	COCO\Administrator	0

Note: Check that the Splunk services run under a domain account. The account column should have the form domain\user (COCO\Administrator in the example above).
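The check above can be sketched in a few lines of Python. This is only an illustration: it assumes the [Services] rows are tab-separated with the service name in the first column and the account in the second-to-last column, as in the excerpt above. The helper names service_accounts and is_domain_account are made up for this sketch, and the check cannot tell a real domain from a local machine name, so treat the result as a hint.

```python
# Prefixes that indicate a built-in or local account rather than a domain
# account. This list is an assumption for illustration and is not exhaustive.
LOCAL_PREFIXES = {".", "NT AUTHORITY", "NT SERVICE"}


def service_accounts(msinfo_text, names=("Splunkd", "SplunkWeb")):
    """Map each Splunk service name to the account it runs as.

    Assumes tab-separated rows: service name first, account second to last.
    """
    accounts = {}
    for line in msinfo_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[0] in names:
            accounts[fields[0]] = fields[-2]
    return accounts


def is_domain_account(account):
    """Return True when the account looks like domain\\user."""
    domain, sep, user = account.partition("\\")
    if not (domain and sep and user):
        return False  # e.g. "LocalSystem" has no domain part
    return domain.upper() not in LOCAL_PREFIXES
```

For the excerpt above, both services map to COCO\Administrator, which passes the check; a service running as LocalSystem would not.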

2. Look in splunkd.log and search for "wmi". If you see an HRESULT error, you will most likely be able to reproduce the same error by connecting with wbemtest. The error looks like this:

03-11-2009 10:08:29.296 ERROR ExecProcessor - error from "python E:\Splunk\bin\scripts\splunk-wmi.py" ERROR WMI - Instantiation of IWbemServices::ExecQueryAsync failed (error code 800706be)
03-11-2009 10:08:29.296 ERROR ExecProcessor - error from "python E:\Splunk\bin\scripts\splunk-wmi.py" ERROR WMI - IWbemServices::CancelAsyncCall error (WMI Namespace "\\ADLDBS01\root\cimv2", Param "Application", HRESULT error 80041002)

Note: The most commonly observed errors are:

80070005 - Access is denied 
80041064 - User credentials cannot be used for local connections 
800706BA - The RPC server is unavailable 
80041003 - Access Denied 
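As a rough aid for scanning splunkd.log, the lookup above can be sketched in Python. The table holds only the four codes listed above; the regular expression is an assumption based on the three spellings that appear in the log excerpts on this page ("error code", "HRESULT error" and "HRESULT="), and find_hresults is a hypothetical helper name.

```python
import re

# Meanings copied from the list above; keys are lowercase hex.
KNOWN_HRESULTS = {
    "80070005": "Access is denied",
    "80041064": "User credentials cannot be used for local connections",
    "800706ba": "The RPC server is unavailable",
    "80041003": "Access Denied",
}

# Matches "error code 800706be", "HRESULT error 80041002", "HRESULT=80041010".
HRESULT_RE = re.compile(r"(?:error code|HRESULT error|HRESULT=)\s*([0-9A-Fa-f]{8})")


def find_hresults(log_text):
    """Return (code, meaning) pairs for every code found on WMI log lines."""
    results = []
    for line in log_text.splitlines():
        if "WMI" not in line:
            continue  # only look at lines that mention WMI
        for code in HRESULT_RE.findall(line):
            code = code.lower()
            meaning = KNOWN_HRESULTS.get(code, "not in the list above")
            results.append((code, meaning))
    return results
```

Codes that are not in the table, such as 800706be from the sample log, are still reported so that you can look them up separately.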

Note: You can gather more detailed information by enabling debug logging. Remember to reset the logging level to its default once you have gathered the necessary DEBUG output.

  • For Splunk 3.x and 4.0.x set the following parameter in %SPLUNK_HOME%\etc\log-cmdline.cfg and restart Splunk:
category.WMI=DEBUG
  • For Splunk 4.1.x set the following parameters in two places:

In log-cmdline.cfg

category.WMI=DEBUG 

In log.cfg (or log-local.cfg)

category.ExecProcessor=DEBUG 

(Once you have finished debugging, revert to the defaults: category.WMI=ERROR and category.ExecProcessor=WARN.)


This site is useful for finding out what an error code means and what to investigate next: http://www.manageengine.com/products/opmanager/help/troubleshoot_opmanager/troubleshoot_wmi.html

3. When you see an HRESULT error, do the following to reproduce it outside of Splunk:

  • Make sure that you are logged into the machine with the same domain account that Splunk runs as (COCO\Administrator in this example).
  • Launch wbemtest and connect to the problematic server and namespace. You can find these in splunkd.log; in the example above, it is \\ADLDBS01\root\cimv2.

Wmifaq1.png

Note: You should be able to connect to the server \\ADLDBS01 without entering credentials. If the connection fails, the error looks like the following:

Wmifaq2.png

  • Once connected to the server, click the “Query…” button, enter a query such as the following, and click Apply:
SELECT Category, CategoryString, ComputerName, EventCode, EventIdentifier, EventType, Logfile, Message, RecordNumber, SourceName, TimeGenerated, TimeWritten, Type, User FROM Win32_NTLogEvent WHERE Logfile = "Application"

Note: For Splunk 3.4.9 and earlier, select Asynchronous in wbemtest. Splunk 3.4.10 and later, as well as 4.0.x, use semisynchronous calls.

  • The results should look like this:

Wmifaq5.png

4. Check the firewall. Disabling the firewall entirely often lets WMI work, while a restrictive configuration often blocks it. Microsoft has good information about this at http://msdn.microsoft.com/en-us/library/aa389286(VS.85).aspx . Look at the firewall in particular when you see “HRESULT error 800706ba”. If you are trying to extract events from Windows Vista or Windows Server 2008, see http://msdn.microsoft.com/en-us/library/aa822854(VS.85).aspx
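A quick way to test basic reachability before digging into firewall rules is to try a TCP connection to the RPC endpoint mapper (port 135) on the remote machine. The sketch below uses only the Python standard library; rpc_endpoint_reachable is a hypothetical helper name. A failure is consistent with 800706ba, but a success does not prove that WMI will work, because DCOM also uses dynamically assigned ports that the firewall may still block.

```python
import socket


def rpc_endpoint_reachable(host, port=135, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and connects over TCP.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, rpc_endpoint_reachable("ADLDBS01") would test the server named in the log excerpt earlier on this page.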

I am able to extract events from remote machines, however, I occasionally see crash files in %SPLUNK_HOME%\var\log\splunk

Note: WMI is not very stable and can occasionally cause splunk-wmi.exe to crash. However, splunk-wmi is resilient: as soon as the process goes down, Splunk spawns a new one (you can tell by the changed PID).

1. You can reduce the incidence of crashes by reducing the number of servers you are indexing. Crashes start to appear above roughly 80 servers, so stay below that number.

2. Also consider how many event logs you are pulling from each server. For example, from wmi.conf:

server = app1, app2, app3, app4, app5, app6, app7, app8, app9, app10, app11, app12, app13, app14, app15, app16, app17, app18, app19, app20……app80
event_log_file = Application, System, Security, Directory Service, DNS Server, File Replication Service

This means that you are pulling 6 event logs from each of 80 servers: 6 x 80 = 480 providers/connections. WMI does not handle memory well at this scale, and crashes result. The best remedy is to reduce the number of providers/connections. According to the Windows Dev team, they can reproduce a crash with ~120 providers/connections. 64-bit Windows is expected to handle this better.
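The arithmetic above can be sketched as a small Python check. The helper names are hypothetical, and the threshold of 120 is simply the approximate figure quoted above, not a hard limit.

```python
# Approximate number of providers/connections at which crashes were
# reproduced, per the text above. Treat it as a rough guide only.
CRASH_THRESHOLD = 120


def parse_list(value):
    """Split a comma-separated wmi.conf value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def count_wmi_connections(servers, event_logs):
    """Each event log on each server counts as one provider/connection."""
    return len(servers) * len(event_logs)


def exceeds_threshold(servers, event_logs, threshold=CRASH_THRESHOLD):
    """Return True when the configuration is above the quoted crash figure."""
    return count_wmi_connections(servers, event_logs) > threshold
```

With 80 servers and the 6 event logs from the example, count_wmi_connections returns 480, well above the quoted figure. Splitting the servers across several Splunk instances or dropping event logs you do not need brings the count down.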

Why can I connect using semisynchronous but not asynchronous? Why does Splunk not use semisynchronous?

  • Microsoft can explain this in more detail, but in short, Splunk originally used asynchronous calls for better security and better resource utilization.

Microsoft has a good website that explains the difference at http://msdn.microsoft.com/en-us/library/aa384832(VS.85).aspx

    • Note: Splunk 3.4.10 and later builds make semisynchronous connections.

Splunk is not collecting local WMI event data. (Not a permissions issue) It appears to be collecting other events (non-WMI) just fine.

  • It is not uncommon to have the WMI service misbehave under heavy (and sometimes, not so heavy) load. Try restarting the Windows Management Instrumentation Service from the Control Panel.

Appendix

Different ways to launch wbemtest if you are logged in as your own domain user rather than as the Splunk domain user:

Method A

  • Right-click the cmd.exe file (located in c:\windows\system32), select “Run As…”, select “The following user:”, enter the Splunk domain account credentials, and click OK.

Wmifaq6.png

  • This brings up a command prompt window. In that window, type “wbemtest” and press Enter:

Wmifaq7.png

  • This will launch wbemtest.

Method B

  • Launch cmd.exe from Start > Run.
  • Type “runas /user:<domainname>\<login> cmd” and press Enter.
  • Enter the password for that login and press Enter.

Wmifaq8.png

  • This launches a new cmd window. Note that the title in the upper left corner reads “cmd (running as COCO\administrator)”.
  • At the prompt, type “wbemtest”. This launches wbemtest running as COCO\administrator.

Wmifaq9.png

How can I see for sure if a WQL query is working / can work

You can run splunk-wmi.exe manually with a desired query and/or namespace to see the output that it produces.

NOTE: When running this command, set SPLUNK_DB as appropriate for your environment.

C:\Program Files\Splunk\bin> splunk cmd splunk-wmi.exe -wql "select * FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk"

Failing example:

$ ./splunk cmd splunk-wmi.exe -wql "select * FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk_typo"
ERROR WMI - Error occurred while trying to retrieve results from a WMI query (error="Specified class is not valid." HRESULT=80041010) (.: select * FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk_typo)
ERROR WMI - Giving up attempt to connect to WMI provider after maximum number of retries at maximum backoff time (.: select * FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk_typo)

Clean shutdown completed.

Succeeding example:

jrodman@jrodman-PC /cygdrive/c/Program Files/Splunk/bin
$ ./splunk cmd splunk-wmi.exe -wql "select * FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk"
20090904144105.000000
AvgDiskBytesPerRead=0
AvgDiskBytesPerTransfer=0
AvgDiskBytesPerWrite=0
AvgDiskQueueLength=0
AvgDiskReadQueueLength=0
AvgDiskWriteQueueLength=0
AvgDisksecPerRead=0
AvgDisksecPerTransfer=0
AvgDisksecPerWrite=0
Caption=NULL
CurrentDiskQueueLength=0
Description=NULL
DiskBytesPersec=0
DiskReadBytesPersec=0
DiskReadsPersec=0
DiskTransfersPersec=0
DiskWriteBytesPersec=0
DiskWritesPersec=0
Frequency_Object=NULL
Frequency_PerfTime=NULL
Frequency_Sys100NS=NULL
Name=0 D: C:
PercentDiskReadTime=0
PercentDiskTime=0
PercentDiskWriteTime=0
PercentIdleTime=98
SplitIOPerSec=0
Timestamp_Object=NULL
Timestamp_PerfTime=NULL
Timestamp_Sys100NS=NULL
wmi_type=unspecified

---splunk-wmi-end-of-event---
20090904144105.000000
AvgDiskBytesPerRead=0
AvgDiskBytesPerTransfer=0
AvgDiskBytesPerWrite=0
AvgDiskQueueLength=0
AvgDiskReadQueueLength=0
AvgDiskWriteQueueLength=0
AvgDisksecPerRead=0
AvgDisksecPerTransfer=0
AvgDisksecPerWrite=0
Caption=NULL
CurrentDiskQueueLength=0
Description=NULL
DiskBytesPersec=0
DiskReadBytesPersec=0
DiskReadsPersec=0
DiskTransfersPersec=0
DiskWriteBytesPersec=0
DiskWritesPersec=0
Frequency_Object=NULL
Frequency_PerfTime=NULL
Frequency_Sys100NS=NULL
Name=Total
PercentDiskReadTime=0
PercentDiskTime=0
PercentDiskWriteTime=0
PercentIdleTime=98
SplitIOPerSec=0
Timestamp_Object=NULL
Timestamp_PerfTime=NULL
Timestamp_Sys100NS=NULL
wmi_type=unspecified

---splunk-wmi-end-of-event---

Clean shutdown completed.
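If you want to inspect this output programmatically, the following Python sketch splits it into events. It assumes the layout shown in the succeeding example: a timestamp line, then key=value lines, then the ---splunk-wmi-end-of-event--- marker. The function name parse_wmi_output and the "_time" key are made up for this sketch and are not part of Splunk.

```python
import re

EVENT_END = "---splunk-wmi-end-of-event---"
TIMESTAMP_RE = re.compile(r"^\d{14}\.\d{6}$")  # e.g. 20090904144105.000000
FIELD_RE = re.compile(r"^(\w+)=(.*)$")         # e.g. PercentIdleTime=98


def parse_wmi_output(text):
    """Return a list of dicts, one per event terminated by EVENT_END.

    Lines that are neither a timestamp nor key=value (shell prompts, ERROR
    lines, "Clean shutdown completed.") are ignored. An event that is not
    followed by the end marker is dropped.
    """
    events = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == EVENT_END:
            if current:
                events.append(current)
            current = {}
            continue
        if TIMESTAMP_RE.match(line) and not current:
            current["_time"] = line
            continue
        field = FIELD_RE.match(line)
        if field:
            current[field.group(1)] = field.group(2)
    return events
```

An empty result for a query that you expect to return data is a hint to look for ERROR lines in the raw output, as in the failing example above.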