Community:Commands
From Splunk Wiki
Assume you have two sets of data.
The raw events as seen by Splunk are as follows.
Raw dataset 1, sourcetype=names
12:03:05 ID=1 Name=Alice 12:06:33 ID=2 Name=Billy 12:08:12 ID=3 Name=Chuck 12:09:55 ID=4 Name=Danni
Raw dataset 2, sourcetype=types
12:04:07 ID=1 Type=Aardvark 12:08:15 ID=2 Type=Bumblebee 12:09:26 ID=3 Type=Cat 12:09:35 ID=4 Type=Dragon
Table view, dataset 1, sourcetype=names
sourcetype=names | table *
_time | ID | Name |
---|---|---|
12:03:05 | 1 | Alice |
12:06:33 | 2 | Billy |
12:08:12 | 3 | Chuck |
12:09:55 | 4 | Danni |
Table view, dataset 2, sourcetype=types
sourcetype=types | table *
_time | ID | Type |
---|---|---|
12:04:07 | 1 | Aardvark |
12:08:15 | 2 | Bumblebee |
12:09:26 | 3 | Cat |
12:09:35 | 4 | Dragon |
Searches using that data set
The Easy Way, using OR between search terms
A simple search using "OR" will put the two sets of data into the same output.
sourcetype=names OR sourcetype=types | table *
_time | ID | Type | Name |
---|---|---|---|
12:03:05 | 1 | Alice | |
12:04:07 | 1 | Aardvark | |
12:06:33 | 2 | Billy | |
12:08:12 | 3 | Charlie | |
12:08:15 | 2 | Bumblebee | |
12:09:26 | 3 | Cat | |
12:09:35 | 4 | Dragon | |
12:09:55 | 4 | Debbie |
Notice some shortcomings
- It is ordered by _time.
- Fixable by adding
| sort <fieldname>
to the end. sourcetype=names OR sourcetype=types | sort ID
- Fixable by adding
- It has empty fields.
- Fixed in various ways below.
- Events with matching ID fields aren't "together."
- Also fixed in various ways below.
Putting events with matching ID fields together
Using stats
Using stats to combine these is the fastest way [see note 1]. You do need to know a bit about each field, though.
sourcetype=names OR sourcetype=types | stats values(Name) as Name, values(Type) as Type, earliest(_time) as StartTime, latest(_time) as EndTime BY ID | table *
StartTime | EndTime | ID | Name | Type |
---|---|---|---|---|
12:03:05 | 12:04:07 | 1 | Alice | Aardvark |
12:06:33 | 12:08:15 | 2 | Billy | Bumblebee |
12:08:12 | 12:09:26 | 3 | Chuck | Cat |
12:09:35 | 12:09:55 | 4 | Danni | Dragon |
Note 1) Stats can split up a problem and have many indexers each work on their own part to the problem and doing the majority of the work. It then combines the separate results together at the end, a relatively easy and fast problem. If you have only a single indexer, the different between it and, say, the transaction command is minimal. With a larger environment, the difference in speed can be very large.
Using transaction
Transaction is conceptually simpler than the results you get out of stats when your objective is to "combine" events together. To use transaction you can use a field to group them together on (ID in this case), and in addition you can specify specific search terms to use to define when to start the grouping and when to end the grouping and start a new group.
sourcetype=names OR sourcetype=types | transaction ID | table *
Time | ID | Name | Type | duration | eventcount |
---|---|---|---|---|---|
12:03:05 | 1 | Alice | Aardvark | 62 | 2 |
12:06:33 | 2 | Billy | Bumblebee | 102 | 2 |
12:08:12 | 3 | Chuck | Cat | 74 | 2 |
12:09:35 | 4 | Danni | Dragon | 20 | 2 |
Without the | table *
at the end, you could see all transaction really did was use the first _time field of each group and smash the rest of the fields together.
Time | Event |
---|---|
12:03:05 | 12:03:05 ID=1 Name=Alice 12:04:07 ID=1 Type=Aardvark |
12:06:33 | 12:06:33 ID=2 Name=Billy 12:08:15 ID=2 Type=Bumblebee |
12:08:12 | 12:08:12 ID=3 Name=Chuck 12:09:26 ID=3 Type=Cat |
12:09:55 | 12:09:55 ID=4 Name=Danni 12:09:35 ID=4 Type=Dragon |
Because of that, each of the combined events has both a Name and a Type.
If your items don't match perfectly, like if there are occasional names that have no type that matches the ID, then on those combined events those particular fields will just be empty.
Some things to note:
- The first Time is assigned as the _time
- The duration was calculated and included for each result.
- A new field "eventcount" was also created, which in the above case is always 2 because there were two events smashed into one each time.