Community:WritingLogfilesOnWindows

From Splunk Wiki


Patterns of writing logfiles on Windows

Background

Log files generated by systems and applications are the most common way to acquire data for search, reporting, alerting and so on in Splunk.

Patterns of log file creation and writing have emerged over time under pressures of necessity, utility, performance, and file system semantics. Since UNIX in its various flavors was the dominant datacenter operating system during the 1990s, the logging patterns which became widespread were established with this type of system in mind. Windows has different filesystem semantics, and so some of these patterns are much more problematic on this platform.

In general, Splunk tries to avoid recommending changes to environments, data formats, communications strategies, etc., where existing patterns are working. However, in this case there is an intractable problem which is not Splunk-specific, and we wish to provide useful information so that informed decisions can be made.

Problems

1. Renaming files on UNIX is extremely reliable, while renaming files on Windows can be problematic.

2. Exclusive access to files on UNIX is hard to do and requires platform-specific work to reliably achieve. Exclusive access to files on Windows occurs by default, requiring specific actions to enable concurrent access.

Background

On UNIX, file data and filenames are two different pieces. The filename is merely a pointer to the background storage mechanism. It is possible to have multiple pointers to the file data (hard links), or to even have no pointers to the file data (when a filename is deleted with rm, the content remains on disk so long as any programs are reading or writing to that file.)
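The separation between names and data can be observed directly. The following is a minimal sketch, assuming a POSIX system and a filesystem that supports hard links; the function name demo_names_versus_data and the file names are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def demo_names_versus_data():
    """Show on a POSIX system that file data outlives its names."""
    workdir = tempfile.mkdtemp()
    first = os.path.join(workdir, "app.log")
    second = os.path.join(workdir, "alias.log")

    with open(first, "w") as handle:
        handle.write("line one\n")

    os.link(first, second)                # a second name (hard link) for the same data
    link_count = os.stat(first).st_nlink  # number of names pointing at the data

    reader = open(first)                  # hold the data open
    os.unlink(first)                      # remove both names
    os.unlink(second)
    names_left = os.listdir(workdir)      # the directory is now empty
    content = reader.read()               # the data is still readable
    reader.close()                        # only now can the storage be reclaimed
    os.rmdir(workdir)
    return link_count, names_left, content
```

On a typical Linux filesystem the function reports two links, an empty directory after the unlinks, and the original content read back through the still-open handle.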

By contrast, on Windows the filename and the file are in practice treated as the same thing. This means that deleting a file is the same as deleting the file's data, and renaming an open file is, by default, not possible.

The second issue is simply an artifact of backwards compatibility. UNIX didn't start out with locking due to its age and multi-user approach, while DOS could never have contention issues since only one program could run at once, so exclusivity was presumed by many programs.

Renaming

On UNIX, renaming is desirable because it is almost guaranteed to succeed, so long as write permission to the containing directory(s) is available. It is nearly instant, atomic, and for many situations durable. These facts have caused "log rolling" to become popular on UNIX systems, where logs are created at a fixed name, and periodically renamed and recreated.
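The log rolling pattern can be sketched in a few lines. This is an illustration under stated assumptions, not any particular logger's implementation: the helper names roll_log and demo_roll and the ".1" suffix are hypothetical, and the demonstration relies on POSIX semantics.

```python
import os
import tempfile


def roll_log(path):
    """Rename the live log aside, then recreate it empty at the fixed name."""
    rolled = path + ".1"
    os.replace(path, rolled)   # atomic on POSIX when both names share a filesystem
    open(path, "a").close()    # recreate the fixed-name log file
    return rolled


def demo_roll():
    workdir = tempfile.mkdtemp()
    live = os.path.join(workdir, "app.log")

    writer = open(live, "a")
    writer.write("before roll\n")
    writer.flush()

    rolled = roll_log(live)        # on POSIX this succeeds while writer is still open
    writer.write("after roll\n")   # the open handle follows the data, not the name
    writer.close()

    with open(rolled) as handle:
        old = handle.read()
    with open(live) as handle:
        new = handle.read()
    return old, new
```

On POSIX the writer's handle keeps pointing at the renamed file, so both lines end up in the rolled file and the recreated file stays empty until the writer reopens it. On Windows the os.replace call would be expected to fail while the writer holds the file open.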

On Windows, renaming generally fails so long as the file is open for either reading or writing; it can succeed only if every open handle was created with the FILE_SHARE_DELETE flag, which cannot be assumed of arbitrary programs. Since there is no general handshaking API provided for arbitrary programs to synchronize close operations, renames cannot be reliably arranged in a situation where a file is written by one program and read by another.

For the logging goal, the logging application cannot choose not to write the log file, and to acquire the data, Splunk cannot avoid reading the file. Thus renames are unavoidably problematic.

To avoid the problem entirely, logging applications on Windows should write to new files instead of rolling, for example writing to a new file each day, or simply writing to a new file whose name includes the timestamp. Applications which are developed on Windows typically follow this pattern (such as MS SQL Server or IIS).
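As a rough illustration of the "new file per day" pattern, the sketch below derives the file name from the date so that no rename is ever needed. The function names dated_log_path and append_line and the "prefix-YYYY-MM-DD.log" naming scheme are assumptions made for this example.

```python
import datetime
import os


def dated_log_path(directory, prefix, when=None):
    """Build a log file name that embeds the date, e.g. app-2024-01-31.log."""
    when = when or datetime.datetime.now()
    return os.path.join(directory, "{}-{}.log".format(prefix, when.strftime("%Y-%m-%d")))


def append_line(directory, prefix, message, when=None):
    """Append one line to the log file for the given day; never renames anything."""
    path = dated_log_path(directory, prefix, when)
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(message + "\n")
    return path
```

Because the name changes when the date changes, the writer simply starts a new file at midnight and the previous day's file is never touched again.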

For example, Java applications which run on Windows using log4j may wish to use the third-party dated file appender class, which performs no renames.


File access exclusion

Whenever files are opened (via CreateFile() and equivalent calls) the default is exclusive access. In order to enable concurrent access by both a writer (a logging application) and a reader (such as Splunk), both programs *must* enable sharing flags. Splunk does enable the maximum amount of sharing possible in its attempts to open files. However, if a logging application does not enable sharing for read-only access when it opens the log file, Splunk will be denied access.
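To make the sharing flags concrete, here is a sketch that calls CreateFileW through ctypes. It is an illustration only, not Splunk's or any logger's actual code: the helper names create_file, open_log_for_write and open_log_for_read are hypothetical, the constant values are those documented for the Win32 API, and the functions can only run on Windows.

```python
import ctypes
import sys

# Constants as documented for the Win32 CreateFileW call.
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
FILE_SHARE_READ = 0x00000001
FILE_SHARE_WRITE = 0x00000002
FILE_SHARE_DELETE = 0x00000004
OPEN_EXISTING = 3
OPEN_ALWAYS = 4
FILE_ATTRIBUTE_NORMAL = 0x00000080

SHARE_ALL = FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE


def create_file(path, access, share_mode, disposition):
    """Call CreateFileW and return the raw handle (hypothetical helper)."""
    if sys.platform != "win32":
        raise RuntimeError("CreateFileW is only available on Windows")
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    fn = kernel32.CreateFileW
    fn.argtypes = [wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD,
                   wintypes.LPVOID, wintypes.DWORD, wintypes.DWORD,
                   wintypes.HANDLE]
    fn.restype = wintypes.HANDLE
    handle = fn(path, access, share_mode, None, disposition,
                FILE_ATTRIBUTE_NORMAL, None)
    # INVALID_HANDLE_VALUE is (HANDLE)-1
    if handle is None or handle == ctypes.c_void_p(-1).value:
        raise ctypes.WinError(ctypes.get_last_error())
    return handle


def open_log_for_write(path):
    # Writer side: a share mode of 0 would lock every reader out.
    return create_file(path, GENERIC_WRITE, FILE_SHARE_READ, OPEN_ALWAYS)


def open_log_for_read(path):
    # Reader side: must tolerate the writer's access, so share everything.
    return create_file(path, GENERIC_READ, SHARE_ALL, OPEN_EXISTING)
```

The returned values are raw handles, so a real program would also need to release them with CloseHandle. The point of the sketch is the third argument: the writer must include FILE_SHARE_READ, and the reader must include FILE_SHARE_WRITE, for both opens to succeed at the same time.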

Two scenarios are possible.

1. If a logging application has a logfile open without sharing for reading, then Splunk will not be able to open the file at all until the writing application closes it. Typically, the result is that Splunk acquires no data at all until the program finishes with the file, and so falls behind by an hour, a day, or similar. Once the logging application moves on to the next file, Splunk quickly catches up on the file it is now allowed to access, but is again denied access to the current file.

2. Sometimes Splunk may open the file for reading before the application does. This could happen, for example, when the logging application is restarted. In this situation the logging application will fail to open its own file for writing, because the expressed requirements for exclusion are not met. How the logging application deals with this situation is implementation dependent. Possibilities might include sending the data elsewhere, losing the log data, cessation of primary application duties until the file becomes writable again, and other possibilities. Usually this situation will resolve itself relatively quickly as Splunk reads the entire file and then closes it.
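From the reader's side, the first scenario looks like a repeated failure to open the file. The sketch below shows one way a hypothetical reader might retry; it is not a description of how Splunk behaves. It assumes that a Windows sharing violation surfaces in Python as PermissionError, and the name open_when_shareable and its parameters are made up for this example.

```python
import time


def open_when_shareable(path, attempts=5, delay=1.0, opener=open, sleep=time.sleep):
    """Try to open path for reading, retrying while access is denied."""
    last_error = None
    for attempt in range(attempts):
        try:
            return opener(path, "rb")
        except PermissionError as error:   # e.g. a sharing violation on Windows
            last_error = error
            if attempt + 1 < attempts:
                sleep(delay)               # wait for the writer to release the file
    raise last_error
```

The opener and sleep parameters exist only so the retry logic can be exercised without a real locked file. Retrying helps with short-lived conflicts; it does nothing for a writer that holds the file exclusively for a whole day.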


To mitigate this problem and provide real-time access to data, applications *must* be written with sharing actively enabled in their log file writing routines. Alternatively, a non-file data routing method (such as network transmission) may be possible. As a last resort, monitoring the next-to-most-recent log file may enable functional operation with a data lag.
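The last-resort option, picking the next-to-most-recent log file, can be sketched as follows. The function name next_to_most_recent is hypothetical, and the sketch assumes that file modification times reflect the order in which the logs were written.

```python
import glob
import os


def next_to_most_recent(pattern):
    """Return the second-newest file matching pattern, or None if there is none."""
    paths = sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)
    return paths[1] if len(paths) >= 2 else None
```

The newest file is assumed to be the one the application still holds exclusively, so the reader skips it and takes the file before it, accepting a lag of one rotation period.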

Specific known issues

One known issue is that log4j 1.2.x prior to release 1.2.15 does not recover from a failure to rename the logfile in its default configurations (such as RollingFileAppender or DailyRollingFileAppender), and will overwrite its own datafiles if the rename attempts fail. Customers using log4j are advised to update to 1.2.15 or later if they are monitoring their log files via any method on Windows. Another possibility is to use the third-party dated file appender class mentioned above, which does not perform any renames at all.
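The failure mode described here, overwriting data after a failed rename, is easy to avoid in principle. The sketch below is a Python analogy and not log4j's code: if the rename fails, the writer falls back to appending to the existing file rather than truncating it. The name open_after_roll_attempt and the injectable rename parameter are assumptions for illustration.

```python
import os


def open_after_roll_attempt(path, rolled_path, rename=os.replace):
    """Try to roll the log; if the rename fails, keep appending to the old file."""
    try:
        rename(path, rolled_path)
        rolled = True
    except OSError:
        # The file is probably held open by a reader; do not lose its contents.
        rolled = False
    # Append mode never truncates, so existing data survives a failed roll.
    return open(path, "a", encoding="utf-8"), rolled
```

Opening in append mode is the key design choice: whether or not the roll succeeded, nothing already written is discarded.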
