From Splunk Wiki


The best way to index log4j files is to set up a standard log4j-syslog appender on your log4j host.

Then configure props on the Splunk server to strip the syslog header before any other processing, so that Splunk does not treat the logs as single-line syslog entries.
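
As a rough illustration of the header-stripping step, here is a minimal sketch of an index-time transform that rewrites the raw event. The sourcetype name log4j_syslog and the stanza name strip_syslog_header are hypothetical, and the regex is the one quoted in the IRC discussion below, not a tested recommendation.

```ini
# props.conf -- sketch only; the sourcetype name "log4j_syslog" is an assumption
[log4j_syslog]
# Apply the header-stripping transform at index time.
TRANSFORMS-strip_syslog_header = strip_syslog_header

# transforms.conf -- "strip_syslog_header" is a hypothetical stanza name
[strip_syslog_header]
# Regex quoted from the IRC discussion: matches a "Mon DD HH:MM:SS host "
# prefix and captures everything that follows it.
REGEX = (?m)[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s([^^]*)
# Replace the raw event text with the captured remainder.
FORMAT = $1
DEST_KEY = _raw
```

As the discussion below points out, transforms run after lines have been merged into an event, so a transform like this removes only the first header of a multi-line event.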

More discussion around stripping syslog headers from log4j files

(from the #splunk irc channel)

[3:54pm] jpenvey: do you know if BREAK_ONLY_BEFORE is applied before or after TRANSFORMS in props?
[4:01pm] jpenvey: or how to remove multiple occurrences of something in an event? I am trying to clean syslog headers from a multiline log4j event
[4:11pm] jpenvey: from what I can gather, there is no way to clean off the syslog header from every line in a multiline event
[4:19pm] jpenvey: is there a way I can apply a transform before the events are joined with BREAK_ONLY_BEFORE?
[4:20pm] mlanghor: jpenvey, this is being sent directly to splunk, or via a syslog daemon and picking up via splunk?
[4:21pm] mlanghor: amrit, or bob_deep any ideas on his question ^
[4:21pm] jpenvey: directly to splunk
[4:21pm] mlanghor: and you set it up to strip that syslog header already?
[4:22pm] amrit: hmm.. afaik the syslog stripper only works on the first line of an event (syslog is always single-line, isn't it???)
[4:22pm] amrit: but bob_deep  may know more..
[4:22pm] jpenvey: yeah so the syslog stripper that comes with splunk only works on single line events
[4:22pm] mlanghor: since the header's there, it's not being sent as a multi-line
[4:22pm] jpenvey: i am joining multiple lines into a single event, then the stripper doesnt work
[4:22pm] jpenvey: these are log4j logs (long events) coming over into a udp input in splunk
[4:23pm] mlanghor: I'd verify by sending to normal syslog daemon, but it's not a multi-line event according to syslog
[4:23pm] jpenvey: ive done that, its sent as separate lines
[4:24pm] mlanghor: not sure myself on that one then, I'd have to defer to the master bob_deep
[4:25pm] mlanghor: ;-)
[4:25pm] jpenvey: basically I need to run a transform to clean the headers, then join them via the BREAK_ONLY_BEFORE regex, and then run my other transforms :)
[4:25pm] amrit: hmm.
[4:26pm] • amrit can't remember on which side of break_only_before  transforms  are run
[4:26pm] jpenvey: they are run after, i am sure of that much
[4:26pm] gkanapath: transforms are run after break only before
[4:27pm] jpenvey: although break_only_before seems to ignore the syslog header
[4:27pm] gkanapath: but, if syslog is going straight into splunk udp
[4:27pm] gkanapath: you might be able to suppress adding the header in the first place
[4:27pm] amrit: so can you not make your transform remove every syslog header in the event after the first one..?
[4:27pm] jpenvey: right
[4:28pm] jpenvey: this is my regex: REGEX        = (?m)[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s([^^]*)
[4:28pm] jpenvey: that [^^] is a hack to get around .* not matching newlines, but not the problem
[4:28pm] gkanapath: set no_appending_timestamp = true in the inputs
[4:28pm] gkanapath: you could make the regex run over multi line
[4:29pm] gkanapath: REPEAT_MATCH=true
[4:29pm] jpenvey: oh nice
[4:29pm] gkanapath: but it would be quite expensive
[4:29pm] gkanapath: actually no that won't work
[4:29pm] jpenvey: yeah if I can prevent them from appearing in the first place, all the better
[4:29pm] gkanapath: b/c you can transform it
[4:29pm] jpenvey: ahh dude.. no_appending_timestamp worked
[4:30pm] jpenvey: excellent
[4:31pm] jpenvey: oh man that is so good.
[4:32pm] jpenvey: gkanapath, thanks
[4:32pm] gkanapath: np
[4:32pm] jpenvey: same goes to you amrit and mlanghor
[4:32pm] amrit: hey np
[4:32pm] amrit: good work gkanapath
[4:32pm] mlanghor: !gkanapath++
[4:33pm] mlanghor: damn, always forget the syntax
[4:33pm] jpenvey: seriously that was a huge pita, worthy of a dump someplace so anyone googling splunk log4j syslog w t f can find it
[5:12pm] piebob: is the reason StripSyslog exists because we did not offer that setting back in the day?
[5:12pm] gkanapath: possibly. well also there is applicability if there are headers that were put on by syslog daemon
[5:13pm] gkanapath: the no_appending_timestamp setting only prevents splunk udp input from adding the headers
[5:13pm] gkanapath: but if it goes into a file instead of straight into splunk, or uses syslog forwarding, then there will be a header. (although in the latter case you'll want it to prevent splunk from adding a second header)
[5:14pm] gkanapath: Yeah, that setting only worked because he happened to be feeding UDP straight into splunk.
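
The fix that worked in the discussion above could be written roughly as follows. This is a sketch under stated assumptions: the UDP port 514, the sourcetype name log4j_syslog, and the BREAK_ONLY_BEFORE pattern are all illustrative and need to be adapted to the actual input and log4j layout.

```ini
# inputs.conf -- sketch; port 514 and the sourcetype name are assumptions
[udp://514]
sourcetype = log4j_syslog
# Stop the Splunk UDP input from prepending its own timestamp/host header.
no_appending_timestamp = true

# props.conf -- sketch for joining the lines of one log4j event
[log4j_syslog]
SHOULD_LINEMERGE = true
# Hypothetical pattern: assumes each log4j event starts with a date such as
# "2009-01-31 15:54:00"; change it to match your log4j ConversionPattern.
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}
```

As gkanapath notes, this only helps when the log4j appender sends UDP straight into Splunk. Headers written by an intermediate syslog daemon are not affected by no_appending_timestamp.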