Where do I configure my Splunk settings?
From Splunk Wiki
In many environments, many different Splunk servers perform different roles. For example:
- Light Forwarders
- Search Heads
When we want Splunk to do something, we can find out which configuration file, what settings, and what values to set in the Administration Manual. However, it is not always clear which server the settings need to be on, especially for indexing data, and especially with transforms.conf settings.
Phases of the Splunk data life cycle
To understand this, we first have to understand the different stages of the data life cycle in Splunk. For the purposes of understanding configuration, the main phases are:
- Input
- Parsing
- Indexing
- Search
The Input phase acquires the raw data stream from its source and annotates it with source-wide keys. These keys are values that apply to the input source as a whole, and include the host, source, and sourcetype of the data. The keys may also include values that Splunk uses internally, such as the character encoding of the data stream, and values that control later processing of the data, such as the index into which the events should be stored.
During this phase, Splunk does not look at the contents of the data stream, so key fields must apply to the entire source, and not to individual events. In fact, at this point, Splunk has no notion of individual events at all, only a stream of data with certain global properties.
The Parsing phase looks at, analyzes, and transforms the data. It has several sub-phases:
- Breaking the stream of data into individual lines
- Identifying, parsing, and setting timestamps
- Annotating individual events with metadata copied from the source-wide source, host, sourcetype, and other keys
- Transforming event data and metadata according to Splunk regex transform rules
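To make the split between Input and Parsing concrete, here is a toy Python model (not Splunk code). The input phase yields one undivided stream carrying source-wide keys; parsing then breaks it into lines, extracts a timestamp, copies the keys onto each event, and applies regex transforms in order. The `InputStream` and `Event` classes, the `parse` function, the timestamp format, and both transform rules are hypothetical; the rules only mimic the idea of discarding an event to `nullQueue` or overriding its index.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class InputStream:
    """Toy Input-phase output: an undivided stream plus source-wide keys."""
    raw: str
    keys: dict  # host, source, sourcetype, index: apply to the whole stream


@dataclass
class Event:
    raw: str
    time: Optional[datetime]
    meta: dict = field(default_factory=dict)


# Hypothetical stand-ins for parsing settings (not real Splunk syntax).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumes the timestamp starts each line
TRANSFORMS = [                     # (regex, destination key, value), applied in order
    (re.compile(r"\bDEBUG\b"), "queue", "nullQueue"),  # discard debug events
    (re.compile(r"\bERROR\b"), "index", "errors"),     # send errors to another index
]


def parse(stream: InputStream) -> list:
    events = []
    for line in stream.raw.splitlines():            # 1. break the stream into lines
        if not line.strip():
            continue
        try:                                         # 2. identify the timestamp
            ts = datetime.strptime(line[:19], TIME_FORMAT)
        except ValueError:
            ts = None
        event = Event(line, ts, dict(stream.keys))   # 3. copy source-wide keys per event
        keep = True
        for regex, dest_key, value in TRANSFORMS:    # 4. apply regex transforms in order
            if regex.search(line):
                if dest_key == "queue" and value == "nullQueue":
                    keep = False                     # dropped: no further transforms
                    break
                event.meta[dest_key] = value
        if keep:
            events.append(event)
    return events


stream = InputStream(
    raw="2024-05-01 10:00:00 INFO started\n"
        "2024-05-01 10:00:01 DEBUG cache miss\n"
        "2024-05-01 10:00:02 ERROR disk full\n",
    keys={"host": "web01", "source": "/var/log/app.log",
          "sourcetype": "app_log", "index": "main"},
)
events = parse(stream)
```

Both surviving events carry their own copy of host, source, and sourcetype. The ERROR event's index changes while the stream's keys do not, which mirrors the difference between source-wide keys set at input and per-event metadata set during parsing.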
The Indexing phase takes the events, annotated with metadata and transformed, and writes them into the search index.
The Search phase is probably the easiest to understand and to distinguish from the other phases, but search configuration is similar to, and often combined with, input and parsing configuration.
A few other phases and sub-phases also govern the data life cycle, but for the sake of simplicity they are not discussed in this article.
Which Splunk servers go with which phases
Here is how some common Splunk server configurations correspond to these phases:
- Universal/Light Forwarder → Indexer: Input → Parsing, Indexing, Search
- Heavy Forwarder → Indexer: Input, Parsing → Indexing, Search
- Universal/Light Forwarder → Indexer → Search Head: Input → Parsing, Indexing → Search
- Universal/Light Forwarder → Heavy Forwarder → Indexer: Input → Parsing → Indexing, Search
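The server chains above can be written down as data, which turns "which server performs this phase?" into a lookup. This is a hypothetical Python sketch: the chain names and `server_for_phase` are illustrative, and the phase assignments assume that a heavy forwarder parses the data so the indexer behind it only indexes (and searches, when there is no separate search head).

```python
# Hypothetical encoding of the chains above: each chain lists
# (server role, phases that server performs) in the order data flows.
TOPOLOGIES = {
    "Forwarder -> Indexer": [
        ("Universal/Light Forwarder", {"Input"}),
        ("Indexer", {"Parsing", "Indexing", "Search"}),
    ],
    "Heavy Forwarder -> Indexer": [
        ("Heavy Forwarder", {"Input", "Parsing"}),
        ("Indexer", {"Indexing", "Search"}),
    ],
    "Forwarder -> Indexer -> Search Head": [
        ("Universal/Light Forwarder", {"Input"}),
        ("Indexer", {"Parsing", "Indexing"}),
        ("Search Head", {"Search"}),
    ],
    "Forwarder -> Heavy Forwarder -> Indexer": [
        ("Universal/Light Forwarder", {"Input"}),
        ("Heavy Forwarder", {"Parsing"}),
        ("Indexer", {"Indexing", "Search"}),
    ],
}


def server_for_phase(topology: str, phase: str) -> str:
    """Return the server role in the chain that performs the given phase."""
    for server, phases in TOPOLOGIES[topology]:
        if phase in phases:
            return server
    raise ValueError(f"{phase!r} is not performed in {topology!r}")
```

For example, `server_for_phase("Forwarder -> Heavy Forwarder -> Indexer", "Parsing")` returns `"Heavy Forwarder"`: adding a heavy forwarder moves parsing off the indexer.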
Which configuration parameters go with which phases
This is a non-exhaustive list of which configuration parameters go with which phase. By combining this information with an understanding of which server a phase occurs on, you can determine on which server a particular setting needs to be made.
Parsing phase:
- props.conf: LINE_BREAKER, SHOULD_LINEMERGE, BREAK_ONLY_BEFORE_DATE, and all other line-merging settings
- props.conf: TZ, DATETIME_CONFIG, TIME_FORMAT, TIME_PREFIX, and all other time extraction settings and rules
- props.conf: TRANSFORMS*, which covers per-event queue filtering, per-event index assignment, and per-event routing; transforms are applied in the order defined
- props.conf: MORE_THAN*, LESS_THAN*
- transforms.conf: stanzas referenced by a TRANSFORMS* clause in props.conf
- transforms.conf: LOOKAHEAD, DEST_KEY, WRITE_META, DEFAULT_VALUE, REPEAT_MATCH
Search phase:
- transforms.conf: stanzas referenced by a REPORT* clause in props.conf
- transforms.conf: filename, external_cmd, and all other lookup-related settings
- transforms.conf: FIELDS, DELIMS
- lookup files in the lookups folders
- search and lookup scripts in the bin folders
- search commands and lookup scripts
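Combining the list above with a server chain answers the question this article poses for an individual setting. A minimal Python sketch, assuming a Universal/Light Forwarder → Indexer → Search Head deployment; `SETTING_PHASE`, `TIERS`, and `where_to_configure` are hypothetical names, and the map covers only a sample of the settings listed.

```python
# Hypothetical sample mapping settings from the list above to their phase.
SETTING_PHASE = {
    "LINE_BREAKER": "Parsing",
    "SHOULD_LINEMERGE": "Parsing",
    "TIME_FORMAT": "Parsing",
    "TZ": "Parsing",
    "TRANSFORMS": "Parsing",
    "DEST_KEY": "Parsing",
    "REPORT": "Search",
    "FIELDS": "Search",
    "DELIMS": "Search",
    "external_cmd": "Search",
}

# Assumed deployment: Universal/Light Forwarder -> Indexer -> Search Head.
TIERS = [
    ("Universal/Light Forwarder", {"Input"}),
    ("Indexer", {"Parsing", "Indexing"}),
    ("Search Head", {"Search"}),
]


def where_to_configure(setting: str, tiers=TIERS) -> str:
    """Return the server role where a setting should be made in the given chain."""
    # Class-suffixed names such as TRANSFORMS-foo or REPORT-bar share their prefix's phase.
    key = setting.split("-", 1)[0]
    phase = SETTING_PHASE[key]  # raises KeyError for settings not in the sample map
    for server, phases in tiers:
        if phase in phases:
            return server
    raise LookupError(f"no server performs the {phase} phase")
```

With a search head in the chain, `where_to_configure("TRANSFORMS-routing")` returns `"Indexer"` while `where_to_configure("REPORT-fields")` returns `"Search Head"`. Passing a different chain as `tiers` changes the answers without touching the setting map.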
Some settings don't work well in a distributed Splunk environment. These are exceptions and include:
- CHECK_FOR_HEADER, LEARN_MODEL, maxDist. These settings take effect in the parsing phase, but the configurations they generate must be moved to the search-phase configuration location.