Reading Growing Files

tail -f

The tail -f command is one of the commands sysadmins around the world use most often. Reading growing log files lets us monitor system health live and react to changes in real time. So why not script it? In this chapter we will look for ways to build real-time log monitoring programs.

Basic: Polling the File

In the previous chapter, we read from an endless character device, /dev/random. We can do the same with a growing file. First, we seek to the end of the file so that we catch only newly appended data. Then we keep reading the open file forever.

# this script emulates the tail -f command

File.open(ARGV.first) do |file|
  file.seek(0, IO::SEEK_END) # move to the end of the file
  loop do                    # infinite loop
    changes = file.read      # file.read returns "" when there is no new data
    unless changes.empty?
      print changes          # prints only the newly appended part of the file
    end
    sleep 1.0                # sleep for a second; without it the script would keep a CPU core 100% busy
  end
end

This program works as expected. It displays the data appended to the file almost in real time, with a maximum delay of one second. But it wastes system resources: the process must wake up every second, read the file, check for changes, and so on. It would be much better if the OS triggered an event every time the file changes, so we could handle it with our own callback.
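The polling loop relies on one detail of IO#read. When called without a length at the end of the file, it returns an empty string rather than nil. A later call then returns whatever was appended in the meantime. Here is a minimal sketch that checks this with a temporary file. read_after_append is a hypothetical helper name used only for this illustration.

```ruby
require 'tempfile'

# Hypothetical helper: shows why the polling loop works.
# IO#read (no length) returns "" at end of file, and a later read
# returns only what another writer appended in the meantime.
def read_after_append
  Tempfile.create('growing') do |writer|
    writer.write("old line\n")
    writer.flush                        # make the data visible to other readers

    File.open(writer.path) do |reader|
      reader.seek(0, IO::SEEK_END)      # skip what is already in the file
      before = reader.read              # nothing appended yet
      writer.write("new line\n")        # simulate another program appending
      writer.flush
      after = reader.read               # only the appended part
      [before, after]
    end
  end
end

before, after = read_after_append
p before
p after
```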

An event is an action, coming from the program or from the outside world, that the program may handle. A function or code block that handles an event is called a callback.
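To make the two terms concrete, here is a toy sketch in plain Ruby. EventDispatcher, on and emit are hypothetical names invented for this illustration. In kqueue and inotify the events come from the kernel rather than from a Ruby method call, but the callback plays the same role.

```ruby
# Hypothetical toy dispatcher: maps event names to callback blocks.
class EventDispatcher
  def initialize
    @callbacks = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a block (the callback) for the named event.
  def on(event_name, &callback)
    @callbacks[event_name] << callback
  end

  # Fire the event: call every callback registered for it.
  def emit(event_name, *args)
    @callbacks[event_name].each { |callback| callback.call(*args) }
  end
end

dispatcher = EventDispatcher.new
dispatcher.on(:file_extended) { |data| print data }   # the callback
dispatcher.emit(:file_extended, "new log line\n")     # the event
```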

KQueue on BSD systems

BSD-like operating systems (FreeBSD, OS X) have a built-in kernel event notification subsystem called kqueue. It notifies the program when a kernel event occurs. In particular, you can ask the kernel to trigger an event every time any program appends to a specified file. With this technique you do not need to check the file yourself: the event launches the specified code.

Remember that kqueue is only available on BSD and Darwin.

You do not need to know all the complicated kqueue details: there is a gem for that. The rb-kqueue gem provides a Ruby interface to kqueue. Simply install it with:

gem install rb-kqueue

If you are not familiar with event-based programming, take a closer look at the program below. After initializing the queue, we create a watcher on our file that triggers every time the file is extended. Other possible flags include :write, :delete and :rename (see ri watch_file for the full list). The watcher takes a block, which is the event's callback.

The next step is to wait for events. This is done with queue.run, which puts our script to sleep and starts responding to events. From then on, every time the file is appended to, the callback block is executed.

# this script emulates the tail -f command, on BSD-like systems only
require 'rb-kqueue'              # load the rb-kqueue gem

File.open(ARGV.first) do |file|
  file.seek(0, IO::SEEK_END)     # move to the end of the file
  queue = KQueue::Queue.new
  queue.watch_file(ARGV.first, :extend) do
    print file.read              # this is the callback block
  end
  queue.run                      # runs an infinite loop on the queue
end

Digression: KQueue with Processes

Note that kqueue can do more. You can also use watch_process to react when a process exits or forks. Below is a small example that waits for the process with the given PID to finish. Consider how useful this could be when waiting for background jobs to finish!

Notice that we used queue.process instead of queue.run. The difference is that queue.process blocks until events arrive, runs their callbacks once, and then returns so the script can continue. In this case we are waiting for a process to finish, which can happen only once.

require 'rb-kqueue'
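pid = ARGV.first.to_i                 # PID of the process to wait for

queue = KQueue::Queue.new
queue.watch_process(pid, :exit) do |event|
  puts "Process #{pid} terminated."   # callback runs when the process exits
end
queue.process                         # handle the event once, then continue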
puts "Nothing to watch anymore"
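On systems without kqueue, you can wait for an arbitrary process by polling, in the spirit of the first tail -f script. This minimal sketch uses Process.kill with signal 0, which delivers no signal but raises Errno::ESRCH once the PID no longer exists. process_alive? and wait_for_exit are hypothetical helper names, not part of rb-kqueue. The approach has two caveats. A process that has exited but not yet been reaped (a zombie) still counts as existing, and PIDs can be reused. The kqueue event is therefore more precise.

```ruby
# Hypothetical helpers: polling fallback for waiting on a PID without kqueue.
def process_alive?(pid)
  Process.kill(0, pid)   # signal 0 sends nothing; it only checks the PID
  true
rescue Errno::ESRCH      # no such process
  false
rescue Errno::EPERM      # the process exists but belongs to another user
  true
end

# Sleep `interval` seconds between checks until the process is gone.
def wait_for_exit(pid, interval: 1.0)
  sleep interval while process_alive?(pid)
end

if __FILE__ == $PROGRAM_NAME && ARGV.first
  pid = Integer(ARGV.first)
  wait_for_exit(pid)
  puts "Process #{pid} terminated."
end
```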

Inotify on Linux

On Linux there is a kernel subsystem called inotify that reports changes on the filesystem. To use it from Ruby, the easiest way is to install the ‘rb-inotify’ gem with gem install rb-inotify. Its API is very similar to the kqueue one:

require 'rb-inotify'

File.open(ARGV.first) do |file|
  file.seek(0, IO::SEEK_END)     # move to the end of the file
  queue = INotify::Notifier.new
  queue.watch(ARGV.first, :modify) do
    print file.read              # this is the callback block
  end
  queue.run                      # runs an infinite loop on the notifier
end

There are two differences. The watcher-creation method is called watch instead of watch_file, and it listens for :modify instead of :extend. Everything else stays the same.

Remember that inotify is only available on Linux.

Putting All Systems Together

It is good practice to write platform-independent code. But that does not mean we must stick to the simplest techniques, like polling the file every few seconds. Kqueue and inotify are powerful, and they are worth using. So why not put all the scripts together and pick the technique depending on which OS we are on?

def tail_dash_f(filename)
  File.open(filename) do |file|
    file.seek(0, IO::SEEK_END)
    case RUBY_PLATFORM   # string with OS name, like "amd64-freebsd8"
    when /bsd/, /darwin/
      require 'rb-kqueue'
      queue = KQueue::Queue.new
      queue.watch_file(filename, :extend) do
        yield file.read
      end
      queue.run
    when /linux/
      require 'rb-inotify'
      queue = INotify::Notifier.new
      queue.watch(filename, :modify) do
        yield file.read
      end
      queue.run
    else                 # no kernel notifications available: poll instead
      loop do
        changes = file.read
        unless changes.empty?
          yield changes
        end
        sleep 1.0
      end
    end
  end
end

tail_dash_f ARGV.first do |data|
  print data
  if data =~ /error/i
    # do something else, for example send an email to the administrator
  end
end

In the code above we defined a general, OS-independent method, tail_dash_f, which takes a filename and a code block. The block is executed every time the file changes, and its variable data holds the newly appended part of the file.
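Note that data is whatever happened to be appended since the last read. It may contain several lines or end in the middle of a line, so a match like /error/i can be split across two chunks. One way to handle this, assuming you want to match whole lines, is to buffer the chunks and pass on only complete lines. LineBuffer is a hypothetical class written for this minimal sketch.

```ruby
# Hypothetical helper: turns arbitrary appended chunks into whole lines.
class LineBuffer
  def initialize
    @pending = +""        # unfrozen buffer for an incomplete trailing line
  end

  # Append a chunk and yield each complete line (without the newline).
  def feed(chunk)
    @pending << chunk
    while (newline = @pending.index("\n"))
      yield @pending.slice!(0..newline).chomp
    end
  end
end

buffer = LineBuffer.new
errors = []
["disk ok\nERROR: disk fu", "ll\nall good\n"].each do |chunk|
  buffer.feed(chunk) { |line| errors << line if line =~ /error/i }
end
p errors
```

With tail_dash_f, you would create the buffer once, outside the block, and call buffer.feed(data) inside it with your line handler.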

The RUBY_PLATFORM constant contains a string describing the current platform, for example “amd64-freebsd8” or “x86_64-darwin12.4.0”, so it is easy to find out which OS the script is running on. Notice that the gems are loaded (with the require method) inside the when branches, so you do not have to install rb-kqueue on Linux or rb-inotify on BSD.
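The platform check can also be pulled out into a small function. You can then test it against sample strings like the ones above without installing either gem. notification_backend is a hypothetical helper name, and it uses the same regular expressions as tail_dash_f.

```ruby
# Hypothetical helper: maps a platform string to the mechanism
# tail_dash_f would choose for it.
def notification_backend(platform = RUBY_PLATFORM)
  case platform
  when /bsd/, /darwin/ then :kqueue
  when /linux/         then :inotify
  else                      :polling
  end
end

puts RUBY_PLATFORM
puts notification_backend
```

The exact RUBY_PLATFORM string depends on how Ruby was built, which is why the checks match substrings rather than whole strings.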