Community:Commands

From Splunk Wiki

Jump to: navigation, search

Assume you have two sets of data.

The raw events as seen by Splunk are as follows.

Raw dataset 1, sourcetype=names

12:03:05 ID=1 Name=Alice
12:06:33 ID=2 Name=Billy
12:08:12 ID=3 Name=Chuck
12:09:55 ID=4 Name=Danni

Raw dataset 2, sourcetype=types

12:04:07 ID=1 Type=Aardvark
12:08:15 ID=2 Type=Bumblebee
12:09:26 ID=3 Type=Cat
12:09:35 ID=4 Type=Dragon

Table view, dataset 1, sourcetype=names

sourcetype=names | table *

_time ID Name
12:03:05 1 Alice
12:06:33 2 Billy
12:08:12 3 Chuck
12:09:55 4 Danni

Table view, dataset 2, sourcetype=types

sourcetype=types | table *

_time ID Type
12:04:07 1 Aardvark
12:08:15 2 Bumblebee
12:09:26 3 Cat
12:09:35 4 Dragon

Searches using that data set

The Easy Way, using OR between search terms

A simple search using "OR" will put the two sets of data into the same output.

sourcetype=names OR sourcetype=types | table *

_time ID Type Name
12:03:05 1 Alice
12:04:07 1 Aardvark
12:06:33 2 Billy
12:08:12 3 Charlie
12:08:15 2 Bumblebee
12:09:26 3 Cat
12:09:35 4 Dragon
12:09:55 4 Debbie

Notice some shortcomings

  • It is ordered by _time.
    • Fixable by adding | sort <fieldname> to the end.
    • sourcetype=names OR sourcetype=types | sort ID
  • It has empty fields.
    • Fixed in various ways below.
  • Events with matching ID fields aren't "together."
    • Also fixed in various ways below.

Putting events with matching ID fields together

Using stats

Using stats to combine these is the fastest way [see note 1]. You do need to know a bit about each field, though.

sourcetype=names OR sourcetype=types | stats values(Name) as Name, values(Type) as Type, earliest(_time) as StartTime, latest(_time) as EndTime BY ID | table *

StartTime EndTime ID Name Type
12:03:05 12:04:07 1 Alice Aardvark
12:06:33 12:08:15 2 Billy Bumblebee
12:08:12 12:09:26 3 Chuck Cat
12:09:35 12:09:55 4 Danni Dragon

Note 1) Stats can split up a problem and have many indexers each work on their own part to the problem and doing the majority of the work. It then combines the separate results together at the end, a relatively easy and fast problem. If you have only a single indexer, the different between it and, say, the transaction command is minimal. With a larger environment, the difference in speed can be very large.

Using transaction

Transaction is conceptually simpler than the results you get out of stats when your objective is to "combine" events together. To use transaction you can use a field to group them together on (ID in this case), and in addition you can specify specific search terms to use to define when to start the grouping and when to end the grouping and start a new group.

sourcetype=names OR sourcetype=types | transaction ID | table *

Time ID Name Type duration eventcount
12:03:05 1 Alice Aardvark 62 2
12:06:33 2 Billy Bumblebee 102 2
12:08:12 3 Chuck Cat 74 2
12:09:35 4 Danni Dragon 20 2

Without the | table * at the end, you could see all transaction really did was use the first _time field of each group and smash the rest of the fields together.

Time Event
12:03:05
12:03:05 ID=1 Name=Alice
12:04:07 ID=1 Type=Aardvark
12:06:33
12:06:33 ID=2 Name=Billy
12:08:15 ID=2 Type=Bumblebee
12:08:12
12:08:12 ID=3 Name=Chuck
12:09:26 ID=3 Type=Cat
12:09:55
12:09:55 ID=4 Name=Danni
12:09:35 ID=4 Type=Dragon

Because of that, each of the combined events has both a Name and a Type.

If your items don't match perfectly, like if there are occasional names that have no type that matches the ID, then on those combined events those particular fields will just be empty.

Some things to note:

  • The first Time is assigned as the _time
  • The duration was calculated and included for each result.
  • A new field "eventcount" was also created, which in the above case is always 2 because there were two events smashed into one each time.
Personal tools
Hot Wiki Topics


About Splunk >
  • Search and navigate IT data from applications, servers and network devices in real-time.
  • Download Splunk