Community:GatherWindowsStacks
From Splunk Wiki
Gathering Windows Process Stacks Using Process Monitor
Intro
This page primarily documents the steps to get functionality equivalent to 'pstack' on Unix for a given process ID, for Splunk troubleshooting and diagnosis. However, Process Monitor is a bit of a Swiss Army knife, as are the many other Sysinternals tools. Please DO experiment and learn your way around the tool if you are someone who has to support Windows systems.
Step by Step
- Step 1 - Obtain a copy of the Sysinternals Process Monitor program. Since Microsoft's website is useless and the URLs change all the time, the best approach is to simply search the web for
"process monitor" system internals
- Select a URL from technet.microsoft.com or msdn.microsoft.com (or learn.microsoft.com, where the Sysinternals pages now live). The title will probably be "Process Monitor", and it will probably be the first hit. Sysinternals was a third-party tools and inspection project that Microsoft acquired in 2006 and has maintained internally since then.
- Step 2 - On that page, click Download Process Monitor. You should receive a zip file of around 1 MB.
- Step 3 - unpack the zip to a convenient folder. It should include a
procmon.exe
executable, a procmon help file, and a EULA text file. The EULA file is not required (nor, technically, is the help file). There is no installer. I recommend simply unpacking it to a folder on the desktop called something like sysinternals or process_monitor. If you end up using other Sysinternals tools, installing them all to the same folder may be desirable.
- Step 3.a - Run procmon.
- Step 4 - You will see the license terms; accept them.
- Step 5 - Quickly turn off event capture. By default, Process Monitor immediately begins capturing every event in the system for all programs. Click the magnifying glass icon in the toolbar, press Ctrl-E, or uncheck the Capture Events item in the File menu.
- Step 6 - Set up a filter. We want to capture events from the process we want, not every process on the computer. Probably we want to only capture splunkd.exe, and probably only a specific splunkd.exe process.
- Step 6.a - Filter on splunkd.exe - choose
- Step 6.b - Determine the process ID you want to monitor. You could use Process Explorer if you want a nice tool, but Task Manager will do.
- Step 6.b.1 - Start Task Manager from the right-click menu on the taskbar; other ways of starting it are also fine.
- Step 6.b.2 - go to the Processes tab.
- Step 6.b.3 - From the View menu, choose Select Columns...
- Step 6.b.4 - First, select the Process ID column,
- Step 6.b.5 - Use the command line of each process to identify the main splunkd, a search process, or whichever one you are after, as you would have done on Unix. Note the PID of this process. If necessary, write it down. (I don't have splunkd running for these screenshots, so I'll demo this with iexplore*32.)
- Step 6.c - Add a filter on the process-ID or PID. Use
- Step 6.d - CHECK that you have two green INCLUDE filters in the list, then click OK to leave the filter editor.
- Step 7 - TEST the filter! Turn capture on for a second to confirm that you get events from the one Splunk process you want and no other. Turn it off again!
- Step 8 - Change where the data is stored. By default Process Monitor keeps all the data it collects in virtual memory, which is backed by the page file. This can break the computer if you leave it running. Use a backing file on the filesystem instead (File menu, Backing Files...) in case the capture gets big, and make sure there are many gigabytes free if you're going to leave it running for a long time.
- Step 9 - Avoid collecting filtered events -- In the Filter menu, check Drop Filtered Events. Otherwise Process Monitor will still be filling the disk with everything we worked hard to ignore.
- Step 10 - Select a frequency for our stack capturing. This tool can capture once a second or once every 100 ms (10 times a second). For a long-lived problem, once per second may be good. For a shorter problem or a high-CPU problem, 100 ms is probably more informative.
- Step 11 - Turn off the event categories we don't want; for stack sampling, only the profiling events set up in Step 10 are needed. For strace-like goals you might want a different set of event types gathered.
- Step 12 - Clear any already collected events
- Step 13 - Turn on the capture.
- Step 14 - Wait the right amount of time, i.e., however long you want to monitor. For a snapshot of a problem that is constantly ongoing, some tens of seconds at 100 ms granularity should be fine. For a poorly understood problem that develops over time, or only shows up eventually, you may want to leave it running at 1 second resolution until the problem has occurred and is well covered.
- Step 15 - Stop the capture, and SAVE IT!
- Step 16 - send us the PML file
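For repeat captures, the steps above can also be scripted, since Process Monitor accepts command-line switches. The following is a minimal sketch that only builds a command line and counts sampling intervals; it does not launch anything. It assumes that the filter and profiling settings from Steps 6 to 11 were saved beforehand with File > Export Configuration, and that the switches /AcceptEula, /Quiet, /Minimized, /LoadConfig, /BackingFile and /Runtime behave as described in the Process Monitor help file for your version, so check that before relying on it. The function names and all paths are made up for illustration.

```python
from typing import List


def build_procmon_command(procmon_exe: str, config_pmc: str,
                          backing_pml: str, runtime_seconds: int) -> List[str]:
    """Build (but do not run) a Process Monitor command line.

    Hypothetical helper: the switch names are assumed to match the
    Process Monitor help file; verify them against your procmon version.
    """
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return [
        procmon_exe,
        "/AcceptEula",                      # skip the license dialog (Step 4)
        "/Quiet",                           # no filter confirmation prompt at startup
        "/Minimized",                       # start minimized
        "/LoadConfig", config_pmc,          # exported filter and profiling settings
        "/BackingFile", backing_pml,        # log to a file, not virtual memory (Step 8)
        "/Runtime", str(runtime_seconds),   # limit the run to this many seconds (Step 14)
    ]


def sampling_intervals(runtime_seconds: int, interval_ms: int) -> int:
    """Number of whole sampling intervals that fit in the capture window."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return (runtime_seconds * 1000) // interval_ms


if __name__ == "__main__":
    # Example paths are placeholders, not required locations.
    cmd = build_procmon_command(
        r"C:\sysinternals\procmon.exe",
        r"C:\sysinternals\splunkd_stacks.pmc",
        r"D:\captures\splunkd.pml",
        30,
    )
    print(" ".join(cmd))
    # 30 seconds at the 100 ms setting from Step 10
    print(sampling_intervals(30, 100))
```

On the Windows machine the returned list could be handed to subprocess.run(cmd); keeping the arguments as a list avoids quoting trouble with spaces in paths. Because the PID filter lives in the exported configuration, it has to be updated and re-exported whenever the target splunkd process restarts.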