# Community:Plotting a linear trendline

### From Splunk Wiki

Did you know you can do a linear trendline (regression) within Splunk? While Splunk already comes with a built-in search command for doing trendlines based on moving averages (see the "trendline" command), you can also do more complex computations, such as a linear regression using search commands such as 'eventstats' and 'eval'.

Additionally, you can use search macros to mask the complexity of the underlying search.

### Step 1: Creating the macro

1. Create a search macro named "lineartrend(2)" via the Splunk Manager user interface. Official documentation on search macros are here:

2. The macro definition should be:

eventstats count as numevents sum($x$) as sumX sum($y$) as sumY sum(eval($x$*$y$)) as sumXY sum(eval($x$*$x$)) as sumX2 sum(eval($y$*$y$)) as sumY2 | eval slope=((numevents*sumXY)-(sumX*sumY))/((numevents*sumX2)-(sumX*sumX)) | eval yintercept= (sumY-(slope*sumX))/numevents | eval newY=(yintercept + (slope*$x$)) | eval R=((numevents*sumXY) - (sumX*sumY))/sqrt(((numevents*sumX2)-(sumX*sumX))* ((numevents*sumY2)-(sumY*sumY))) | eval R2=R*R

**Note:** When copy/pasting the macro definition, make sure you paste it into Splunk as a single line. You will get errors otherwise.

3. Macro arguments should be:

x, y

### Step 2: Plotting the trendline

The macro `lineartrend` gets passed two arguments, x and y values from each event, and creates the following fields for each event:

*slope*- the slope of the trendline,*yintercept*- the y-intercept of the trendline*R*- coefficient of correlation*R2*- R-squared, the coefficent of determination*newY*- the trendline values

For interested readers, the calculation was borrowed from this web site: http://easycalculation.com/statistics/learn-regression.php

You can quickly test this on time series data with the Y-value of the chart being simple event counts. Run the following search from the Search App, and then click "Build Report":

<search>sourcetype=my_data | timechart count as yvalue | `lineartrend(_time,yvalue)` | timechart sum(yvalue) sum(newY)</search>

The first timechart generates the sample source data (a plot of event counts over time) which is then piped to the macro 'lineartrend' which generates the new y axis values of the trend. The second timechart stanza plots both the original values and the trendline on a chart, which you can visualize by clicking "Build report"

### Step 3: Getting the correlation coefficient (R) or coefficient of determination (R-squared)

The above macro also calculates R and R-square for the above chart automatically. To display R-squared only (for example, to put it below your chart on a dashboard), run the following search:

sourcetype=my_data | timechart count as yvalue | `lineartrend(_time,yvalue)` | stats first(R2)

### Sample dashboard

Enjoy!

Gaurav 19:41, 7 December 2009 (PST)