Drew Fradette

The future is already here — it's just not very evenly distributed.

Python - Native Head and Tail Functions


I was working on some log processing the other day when I found myself wanting Python equivalents of Unix’s head and tail commands.

The problem is that almost every Python version of tail you’ll find shares the same two problems:

  • Reads the entire file into memory.
  • Iterates over the entire file.
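For comparison, the two versions you most often run into look roughly like the code below. `naive_tail` and `deque_tail` are illustrative names and do not come from any library. The first version has the memory problem. The second uses `collections.deque` with `maxlen`, which fixes the memory use but still has to walk every line of the file.

```python
from collections import deque


def naive_tail(filename, count=1):
    """Illustrative only: loads every line into memory, then slices."""
    if count <= 0:
        return []
    with open(filename) as f:
        return f.readlines()[-count:]


def deque_tail(filename, count=1):
    """Illustrative only: keeps at most `count` lines in memory,
    but still iterates over every line in the file."""
    with open(filename) as f:
        return list(deque(f, maxlen=max(count, 0)))
```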

Obviously, this can be quite problematic when you are constantly processing 300 MB logs. Here is a more efficient version of tail, which seeks to near the end of the file and reads only from there:

headtail.py
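import os
from itertools import islice


def tail(filename, count=1, offset=1024, encoding='utf-8'):
    """
    A more efficient way of getting the last few lines of a file.
    Only the last `offset` bytes are read; the window doubles until it
    holds `count` complete lines. Depending on the length of your lines,
    you will want to modify offset to get better performance.
    """
    if count <= 0:
        return []
    f_size = os.stat(filename).st_size
    if f_size == 0:
        return []
    offset = max(offset, 1)  # guard against a window that can never grow
    # Binary mode: seeking to an arbitrary byte position is only
    # well-defined on binary files.
    with open(filename, 'rb') as f:
        while True:
            # max(), not min(): never seek before the start of the file.
            seek_to = max(f_size - offset, 0)
            f.seek(seek_to)
            lines = f.readlines()
            # At the start of the file every line is complete; this also
            # covers count being larger than the number of lines in the file.
            # Otherwise the first line is probably partial, so we need
            # at least count + 1 lines.
            if seek_to == 0 or len(lines) > count:
                return [line.decode(encoding, errors='replace')
                        for line in lines[-count:]]
            # Not enough lines in the window yet: look further back.
            offset *= 2


def head(filename, count=1, encoding='utf-8'):
    """
    This one is fairly trivial to implement but it is here for completeness.
    """
    with open(filename, 'r', encoding=encoding) as f:
        # islice stops after `count` lines or at end of file, whichever is first.
        return list(islice(f, max(count, 0)))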

This is of course available as a gist as well.
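If you would rather not guess at an offset, another option is to read fixed-size blocks backwards from the end of the file. You stop as soon as enough newlines have been seen, so each block is read only once. The sketch below is not part of the gist, and `tail_blocks` and `block_size` are hypothetical names. It assumes UTF-8 text, and any undecodable bytes are replaced rather than raising an error.

```python
import io
import os


def tail_blocks(filename, count=1, block_size=4096, encoding='utf-8'):
    """Hypothetical helper: return the last `count` lines of `filename`
    by reading fixed-size blocks backwards from the end of the file."""
    if count <= 0:
        return []
    blocks = []
    newlines = 0
    with open(filename, 'rb') as f:
        pos = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        # count + 1 newlines guarantee the oldest wanted line is complete;
        # reaching position 0 means the whole file has been read.
        while pos > 0 and newlines <= count:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            newlines += chunk.count(b'\n')
            blocks.append(chunk)
    data = b''.join(reversed(blocks))
    # Split on b'\n' only, keeping line endings, just like readlines().
    lines = io.BytesIO(data).readlines()
    return [line.decode(encoding, errors='replace') for line in lines[-count:]]
```

With this approach the bookkeeping is a little heavier. In exchange, the cost depends only on how many bytes the wanted lines occupy, not on how well `block_size` matches your line lengths.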
