From Splunk Wiki
The best way to index log4j files is to set up a standard log4j-syslog appender on your log4j host.
Then configure props.conf on the Splunk server to strip the syslog header before any other processing, so that Splunk doesn't treat the logs as single-line syslog entries.
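As a rough illustration of the header-stripping step, the strip can be written as an index-time transform that rewrites _raw. This is a minimal sketch, not a confirmed configuration: the sourcetype name log4j_syslog_example and the transform name strip_syslog_header_example are hypothetical, and the regex assumes a classic "Mmm dd hh:mm:ss host" syslog prefix.

```ini
# props.conf -- sourcetype name is a placeholder, not from the original page
[log4j_syslog_example]
TRANSFORMS-strip_syslog = strip_syslog_header_example

# transforms.conf -- stanza name is a placeholder
[strip_syslog_header_example]
# Match "Mmm dd hh:mm:ss host " at the start of the event and keep the remainder
REGEX = ^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s(.*)$
FORMAT = $1
DEST_KEY = _raw
```

As the discussion further down reports, a strip of this kind works on single-line events but does not remove headers repeated inside a merged multi-line event.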
More discussion around stripping syslog headers from log4j files
(from the #splunk irc channel)
[3:54pm] jpenvey: do you know if BREAK_ONLY_BEFORE is applied before or after TRANSFORMS in props?
[4:01pm] jpenvey: or how to remove multiple occurances of something in an event? I am trying to clean syslog headers from a multiline log4j event
[4:11pm] jpenvey: from what I can gather, there is no way to clean off the syslog header from every line in a multiline event
[4:19pm] jpenvey: is there a way I can apply a transform before the events are joined with BREAK_ONLY_BEFORE?
[4:20pm] mlanghor: jpenvey, this is being sent directly to splunk, or via a syslog daemon and picking up via splunk?
[4:21pm] mlanghor: amrit, or bob_deep any ideas on his question ^
[4:21pm] jpenvey: directly to splunk
[4:21pm] mlanghor: and you set it up to strip that syslog header already?
[4:22pm] amrit: hmm.. afaik the syslog stripper only works on the first line of an event (syslog is always single-line, isn't it???)
[4:22pm] amrit: but bob_deep may know more..
[4:22pm] jpenvey: yeah so the syslog stripper that comes with splunk only works on single line events
[4:22pm] mlanghor: since the header's there, it's not being sent as a multi-line
[4:22pm] jpenvey: i am joining multiple lines into a single event, then the stripper doesnt work
[4:22pm] jpenvey: these are log4j logs (long events) coming over into a udp input in splunk
[4:23pm] mlanghor: I'd verify by sending to normal syslog daemon, but it's not a multi-line event according to syslog
[4:23pm] jpenvey: ive done that, its sent as separate lines
[4:24pm] mlanghor: not sure myself on that one then, I'd have to defer to the master bob_deep
[4:25pm] mlanghor: ;-)
[4:25pm] jpenvey: basically I need to run a transform to clean the headers, then join them via the BREAK_ONLY_BEFORE regex, and then run my other transforms :)
[4:25pm] amrit: hmm.
[4:26pm] • amrit can't remember on which side of break_only_before transforms are run
[4:26pm] jpenvey: they are run after, i am sure of that much
[4:26pm] gkanapath: transforms are run after break only before
[4:27pm] jpenvey: although break_only_before seems to ignore the syslog header
[4:27pm] gkanapath: but, if syslog is going straight into splunk udp
[4:27pm] gkanapath: you might be able to suppress adding the header in the first place
[4:27pm] amrit: so can you not make your transform remove every syslog header in the event after the first one..?
[4:27pm] jpenvey: right
[4:28pm] jpenvey: this is my regex: REGEX = (?m)[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s([^^]*)
[4:28pm] jpenvey: that [^^] is a hack to get around .* not matching newlines, but not the problem
[4:28pm] gkanapath: set no_appending_timestamp = true in the inputs
[4:28pm] gkanapath: you could make the regex run over multi line
[4:29pm] gkanapath: REPEAT_MATCH=true
[4:29pm] jpenvey: oh nice
[4:29pm] gkanapath: but it would be quite expensive
[4:29pm] gkanapath: actually no that won't work
[4:29pm] jpenvey: yeah if I can prevent them from appearing in the first place, all the better
[4:29pm] gkanapath: b/c you can transform it
[4:29pm] jpenvey: ahh dude.. no_appending_timestamp worked
[4:30pm] jpenvey: excellent
[4:31pm] jpenvey: oh man that is so good.
[4:32pm] jpenvey: gkanapath, thanks
[4:32pm] gkanapath: np
[4:32pm] jpenvey: same goes to you amrit and mlanghor
[4:32pm] amrit: hey np
[4:32pm] amrit: good work gkanapath
[4:32pm] mlanghor: !gkanapath++
[4:33pm] mlanghor: damn, always forget the syntax
[4:33pm] jpenvey: seriously that was a huge pita, worthy of a dump someplace so anyone googling splunk log4j syslog w t f can find it
. . .
[5:12pm] piebob: is the reason [[Apps:StripSyslog|StripSyslog]] exists because we did not offer that setting back in the day?
[5:12pm] gkanapath: possibly. well also there is applicability if there are headers that were put on by syslog daemon
[5:13pm] gkanapath: the no-appending-timestamp only prevents splunk udp input from adding the headers
[5:13pm] gkanapath: but if it goes into a file instead of straight into splunk, or uses syslog forwarding, then there will be a header. (although in the latter case you'll want it to prevent splunk from adding a second header)
[5:14pm] gkanapath: Yeah, that setting only worked because he happened to be feeding UDP straight into splunk.
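
To summarize the fix from the log: when log4j sends syslog over UDP straight into a Splunk UDP input, setting no_appending_timestamp = true on that input keeps Splunk from adding the header in the first place, so the lines can then be joined with BREAK_ONLY_BEFORE. The following is a minimal sketch under assumptions that the discussion does not state: UDP port 514, a placeholder sourcetype name (log4j_udp_example), and a log4j layout in which each event starts with an ISO-style date.

```ini
# inputs.conf -- port number and sourcetype name are assumptions
[udp://514]
sourcetype = log4j_udp_example
# Keep the Splunk UDP input from prepending its own timestamp/host header
no_appending_timestamp = true

# props.conf
[log4j_udp_example]
SHOULD_LINEMERGE = true
# Assumed log4j layout: a new event begins with a date such as 2009-01-31;
# adjust the pattern to match the actual ConversionPattern in use
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}
```

As gkanapath notes, this only helps when Splunk's own UDP input is the thing adding the header. Data that was written to a file or relayed through a syslog daemon will still arrive with a header attached.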