Community:Reporting on access patterns
Reporting Access Patterns Over Time
Goldburtd 12:10, 30 November 2009 (PST)
Splunk is a powerful tool for analyzing change over time: web hits, logins, sent emails, Java errors, and so on. Universal timestamp extraction lays the groundwork, and the timechart search command lets you visualize data over time. However, this kind of analysis can get sophisticated; the entire field of calculus is devoted to it. Here are several Splunk tips and tricks.
Let's take the example of a university campus wireless network with many access points. Each base station outputs data on which clients are associated with it.
[11-09-2008 05:02:02PM] client_mac=00:90:4b:9f:94:9f, client_vendor=GemTek Technology Co., Ltd., ap_mac=00:5c:f9:04:e4:e0, ap_name=AP-ST-BRH-C6Z, controller_ip=188.8.131.52, state=associated, ssid=XXX-PUBLIC, vlan=725
The default Splunk search features a timeline which reports the number of matched events over time. This is simple and often useful, but to report on any attribute other than event count, Splunk provides the timechart command. There are two required arguments:
- What attribute to track. This needs to be a Splunk field.
- Which statistic to calculate. There are over a dozen statistical operators, and the choice can often be subtle. In our example we choose the distinct count rather than just the count in order to suppress clients that accessed the network more than once in the same period.
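With only these two arguments, a search over the example data might look like the sketch below. The search terms are borrowed from the example event above and are an assumption about how you would select the data; dc() is the short form of the distinct_count statistic.

```
ssid state=associated | timechart dc(client_mac)
```

Here client_mac is the attribute being tracked and dc is the statistic. Everything else, including the time bucket size, is left at its default.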
timechart takes other arguments as well, and their default values do not always yield the expected result. Here are some of the things you may want to think about:
- How to draw intervals on the x-axis. You may want to look at change at various resolutions, say by minute or by hour. You can tell Splunk which time buckets to use for the x-axis intervals with the "span" option.
- What to do for periods where there is no data. In our example the base stations report only every 15 minutes. By default Splunk assumes that the data stream is continuous, and if there is no data for a time bucket it plots the value as 0. You can override this with "cont=f".
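Putting both options together, a search could look like the following sketch. The 15-minute span is an assumption chosen to match the reporting interval of the base stations; pick whatever resolution suits your data.

```
ssid state=associated | timechart cont=f span=15m dc(client_mac)
```

With cont=f, time buckets that contain no events are left out instead of being plotted as 0.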
A timechart without a split shows the overall change in access for the campus wireless network. However, we may want to break this down into usage by access point to track student movement, say from the classrooms in the morning to the libraries at night. When you split data into series you should consider:
- Whether to plot events that are missing values for the field you are splitting by. This is controlled by the flag "usenull"
- Whether to group and display all series that are not plotted individually into an "other" series. This is controlled by the flag "useother"
- Which (and how many) series to display. By default this is the top 10 series by area (that is, those where the sum of the series over all time is in the top 10 sums). You can specify a where clause to select the comparison calculation and quantity threshold; for example, "where stdev in top20" selects the 20 series that have changed the most. Note that the UI will render up to 12 series in the legend and up to 500 results (rows in the table).
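As a sketch of how these options combine when splitting by access point (the one-hour span and the flag values are illustrative assumptions, not recommendations):

```
ssid state=associated | timechart span=1h dc(client_mac) by ap_name usenull=f useother=f where sum in top10
```

Here usenull=f drops events that have no ap_name value, useother=f suppresses the combined "other" series, and "where sum in top10" spells out the default selection explicitly so that it is easy to swap in a different statistic or threshold.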
You may want to look for large-scale patterns of change. For example, say you don't care which wireless access points were used today (it may be Sunday) but want to see general use during a typical weekday. A trick to do this is to split not by time but by the hour of the day (extracted by Splunk as date_hour) using the chart command. A few other tricks:
- Select only particular days by using the date_wday field
- Sort the hours correctly by using the fields command to manually arrange the columns
ssid state=associated NOT (date_wday=saturday OR date_wday=sunday) ap_name=AP-ST-ENGR-ITEB* | chart count(client_mac) by ap_name,date_hour where sum in top24 | fields ap_name 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Rendered as a heatmap, this lights up the higher values to show where on the engineering quad students connect from most at different times of the day.