Community:GatherWindowsStacks

From Splunk Wiki

Gathering Windows Process Stacks Using Process Monitor

Intro

This page primarily documents the steps to get functionality equivalent to 'pstack' on Unix for a given process ID, for Splunk troubleshooting and diagnosis. However, Process Monitor is a bit of a Swiss Army knife, as are the many other tools from Sysinternals. Please DO experiment and learn your way around the tool if you are someone who has to support Windows systems.

Step by Step

  • Step 1 - Obtain a copy of the Sysinternals Process Monitor program. Microsoft's download URLs change all the time, so the best approach is simply to search the web for
    "process monitor" system internals
    Select a result hosted on a Microsoft domain (at the time of writing, technet.microsoft.com or msdn.microsoft.com). The title will probably be "Process Monitor", and it will probably be the first hit. Sysinternals was a third-party tools and inspection project that Microsoft acquired over a decade ago and has maintained internally since then.
  • Step 2 - On that page, click Download Process Monitor. You should receive a zip of around 1 MB.
  • Step 3 - Unpack the zip to a convenient folder. It should include a procmon.exe executable, a procmon help file, and a EULA text file. The EULA file is not required to run the tool (nor, technically, is the help file). There is no installer. I recommend simply unpacking it to a folder on the desktop called something like sysinternals or process_monitor. If you end up using other Sysinternals tools, you may want to unpack them all to the same folder.
  • Step 3.a - Run procmon.
  • Step 4 - You will see the license terms. Accept them.
    Procmon-step4.png
  • Step 5 - Quickly turn off event capture. By default, Process Monitor immediately begins to capture every event in the system for all programs. Click the magnifying glass icon in the toolbar, press Ctrl-E, or uncheck the Capture Events item in the File menu.
    Before deactivating: Procmon-step5-before.png
    After correctly deactivated: Procmon-step5-after.png
  • Step 6 - Set up a filter. We want to capture events from the process we want, not every process on the computer. Probably we want to only capture splunkd.exe, and probably only a specific splunkd.exe process.
    • Step 6.a - Filter on splunkd.exe - choose
      [Process Name] [is] [splunkd.exe] then [include]
      IMPORTANT: you must manually click the Add button after every rule.
      Procmon-step6a.png
    • Step 6.b - Determine the process ID you want to monitor -- you could use Process Explorer if you want a nice tool, but Task Manager will do.
      • Step 6.b.1 - Start Task Manager from the right-click menu on the taskbar; other means of starting it are also fine.
        Procmon-step6b1.png
      • Step 6.b.2 - go to the Processes tab.
        Procmon-step6b2.png
      • Step 6.b.3 - From the View menu, choose Select Columns...
        Procmon-step6b3.png
      • Step 6.b.4 - First, select the Process ID column,
        Procmon-step6b4-1.png
        then scroll down the list and select the Command Line column, then click OK.
        Procmon-step6b4-2.png
      • Step 6.b.5 - Use the Command Line column to find the main splunkd, a search, or whichever process you are after, as you would have done on Unix. Note the PID of this process. If necessary, write it down. (I don't have splunkd running for these screenshots, so I'll demo this with iexplore*32.)
        Procmon-step6b5.png
    • Step 6.c - Add a filter on the process-ID or PID. Use
      [PID] [is] [<number from 6b>] then [include]
      Note that we are being redundant here, but Windows reuses PIDs quickly, so the PID of a crashed Splunk or a finished search might be reused within seconds. It's best to be specific.
      Procmon-step6c.png
    • Step 6.d - CHECK that you have two green INCLUDE filters in the list, then click OK to leave the filter editor.
      Procmon-step6d.png
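The PID lookup in Step 6.b can also be done from a script rather than Task Manager. The following is a minimal sketch, not part of the original procedure: it assumes PowerShell and its Get-CimInstance cmdlet are available on the host, and the function names parse_process_csv and list_splunkd are hypothetical. It lists each splunkd.exe PID with its command line so you can pick the process you want.

```python
import csv
import io
import subprocess

# Assumption: PowerShell with Get-CimInstance is present on the Windows host.
# The query emits CSV with two columns: ProcessId and CommandLine.
PS_COMMAND = (
    "Get-CimInstance Win32_Process | "
    "Where-Object { $_.Name -eq 'splunkd.exe' } | "
    "Select-Object ProcessId,CommandLine | "
    "ConvertTo-Csv -NoTypeInformation"
)


def parse_process_csv(text):
    """Return (pid, command_line) pairs from CSV text with those two columns."""
    rows = csv.DictReader(io.StringIO(text))
    return [(int(row["ProcessId"]), row["CommandLine"] or "") for row in rows]


def list_splunkd():
    """Run the PowerShell query (Windows only) and parse its CSV output."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_process_csv(result.stdout)
```

On a Windows machine, calling list_splunkd() and printing each pair gives the same information as the Process ID and Command Line columns in Task Manager. The PID you choose still has to be entered in the Process Monitor filter by hand.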
  • Step 7 - TEST the filter! Turn capture on for a second to check that you get events from the one Splunk process you want and no other. Turn it off again!
    Procmon-step7.png
  • Step 8 - Change where to store the data -- by default Process Monitor stores all the data it collects in virtual memory, which is backed by the page file. This can destabilize the computer if you leave it running. Use a file in the filesystem instead, in case the capture gets big (make sure there are many gigabytes free if you're going to leave it running for a long time).
    • Step 8.a - From the File menu, choose Backing Files...
      Procmon-step8a.png
    • Step 8.b - In the Backing Files dialog, choose Use File named: and pick a location where sufficient free space exists. Do not use a slow storage location like NFS, SMB, or a floppy disk!
      Procmon-step8b.png
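Before pointing the backing file at a location, it can help to confirm how much free space the volume has. This is a small sketch using only the Python standard library; the 10 GiB default threshold is an arbitrary illustration of "many gigabytes", not a figure from Process Monitor, and the function names are hypothetical.

```python
import shutil


def free_gib(path):
    """Free space, in GiB, on the volume that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def enough_room(path, min_free_gib=10.0):
    """True if the volume holding `path` has at least `min_free_gib` free.

    The default threshold is an assumption chosen for illustration only.
    """
    return free_gib(path) >= min_free_gib
```

For example, enough_room("D:\\procmon_logs") checks a candidate folder (the path here is hypothetical) before you enter it in the Backing Files dialog.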
  • Step 9 - Avoid collecting filtered events -- In the Filter menu, check Drop Filtered Events. Otherwise Process Monitor will still be filling the disk with everything we worked hard to ignore.
    Procmon-step9.png
  • Step 10 - Select a frequency for stack capturing. The tool can capture once a second or once every 100 ms (10 times a second). For a long-lived problem, once a second may be good. For a shorter problem or a high-CPU problem, 100 ms is probably more informative.
    • Step 10.a - From the Options Menu select Profiling Events...
      Procmon-step10a.png
    • Step 10.b - Ensure Generate thread profiling events is on. Choose your interval, and press OK.
      Procmon-step10b.png
  • Step 11 - Turn off categories we don't want. For strace-like goals you might want a different set of event types gathered.
    For our stack-collecting goal, we only want profiling events.
    Turn off Registry, File System, Networking, and Thread and Process events, and ensure Profiling events are on.
    Procmon-step11.png
  • Step 12 - Clear any already collected events
    Procmon-step12.png
  • Step 13 - Turn on the capture.
    Procmon-step13.png
  • Step 14 - Wait the right amount of time, i.e. however long you want to monitor. For a snapshot of a constantly ongoing problem, some tens of seconds at 100 ms granularity should be fine. For a poorly understood problem that develops over time or occurs only eventually, you may want to leave it running at 1-second resolution until the problem occurs or is well covered.
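To get a feel for how the interval from Step 10 and the wait time from Step 14 combine, here is a rough back-of-the-envelope sketch. It assumes at most one profiling event per thread per interval, which is an assumption for estimation rather than a documented guarantee, so treat the result as an upper bound. The function name is hypothetical.

```python
def expected_samples(duration_s, interval_ms, threads=1):
    """Rough upper bound on profiling events for a capture.

    Assumption: at most one event per thread per profiling interval.
    """
    per_thread = int(duration_s * 1000 // interval_ms)
    return per_thread * threads


# 30 seconds at the 100 ms setting: 300 samples per thread.
print(expected_samples(30, 100))
# One hour at the 1-second setting: 3600 samples per thread.
print(expected_samples(3600, 1000))
```

A process with many threads multiplies these counts, which is one more reason to keep the backing file on a volume with plenty of free space.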
  • Step 15 - Stop the capture, and SAVE IT!
    • Step 15.a - From the File menu, choose Save...
      Procmon-step15a.png
    • Step 15.b - Use Native Process Monitor Format (PML).
      DO NOT use CSV or XML.
      Enter a filename and click OK.
      Procmon-step15b.png
  • Step 16 - Send us the PML file.