Community:Plotting a linear trendline

From Splunk Wiki

Jump to: navigation, search

Did you know you can do a linear trendline (regression) within Splunk? While Splunk already comes with a built-in search command for doing trendlines based on moving averages (see the "trendline" command), you can also do more complex computations, such as a linear regression using search commands such as 'eventstats' and 'eval'.

Additionally, you can use search macros to mask the complexity of the underlying search.

Step 1: Creating the macro

1. Create a search macro named "lineartrend(2)" via the Splunk Manager user interface. Official documentation on search macros are here:

2. The macro definition should be:

eventstats count as numevents sum($x$) as sumX sum($y$) as sumY sum(eval($x$*$y$)) as sumXY sum(eval($x$*$x$)) as sumX2 sum(eval($y$*$y$)) as sumY2
    | eval slope=((numevents*sumXY)-(sumX*sumY))/((numevents*sumX2)-(sumX*sumX))
    | eval yintercept= (sumY-(slope*sumX))/numevents
    | eval newY=(yintercept + (slope*$x$))
    | eval R=((numevents*sumXY) - (sumX*sumY))/sqrt(((numevents*sumX2)-(sumX*sumX))* ((numevents*sumY2)-(sumY*sumY)))
    | eval R2=R*R

Note: When copy/pasting the macro definition, make sure you paste it into Splunk as a single line. You will get errors otherwise.

3. Macro arguments should be:

x, y

Step 2: Plotting the trendline

The macro `lineartrend` gets passed two arguments, x and y values from each event, and creates the following fields for each event:

  • slope - the slope of the trendline,
  • yintercept - the y-intercept of the trendline
  • R - coefficient of correlation
  • R2- R-squared, the coefficent of determination
  • newY - the trendline values

For interested readers, the calculation was borrowed from this web site: http://easycalculation.com/statistics/learn-regression.php

You can quickly test this on time series data with the Y-value of the chart being simple event counts. Run the following search from the Search App, and then click "Build Report":

<search>sourcetype=my_data | timechart count as yvalue | `lineartrend(_time,yvalue)` | timechart sum(yvalue) sum(newY)</search>

The first timechart generates the sample source data (a plot of event counts over time) which is then piped to the macro 'lineartrend' which generates the new y axis values of the trend. The second timechart stanza plots both the original values and the trendline on a chart, which you can visualize by clicking "Build report"

Step 3: Getting the correlation coefficient (R) or coefficient of determination (R-squared)

The above macro also calculates R and R-square for the above chart automatically. To display R-squared only (for example, to put it below your chart on a dashboard), run the following search:

sourcetype=my_data | timechart count as yvalue | `lineartrend(_time,yvalue)` | stats first(R2)

Sample dashboard

Plottingtrendline.JPG

Enjoy!

Gaurav 19:41, 7 December 2009 (PST)

Personal tools
Hot Wiki Topics


About Splunk >
  • Search and navigate IT data from applications, servers and network devices in real-time.
  • Download Splunk