I was working on some log processing the other day when I found myself wanting a Python equivalent of Unix's head and tail commands.
The problem is that most of the Python versions of tail you will find tend to do one of two things:
- Read the entire file into memory
- Iterate over the entire file
Obviously, this can be quite problematic when you are constantly processing 300 MB logs. Here is a more efficient version of tail:

    import os
    from itertools import islice


    def tail(filename, count=1, offset=1024):
        """
        A more efficient way of getting the last few lines of a file.
        Depending on the length of your lines, you will want to modify offset
        to get better performance.
        """
        f_size = os.stat(filename).st_size
        # Empty file, or nothing requested
        if f_size == 0 or count <= 0:
            return []
        # Binary mode, because seeking to an arbitrary byte offset is only
        # reliable on binary files. Lines are decoded assuming UTF-8 text.
        with open(filename, 'rb') as f:
            offset = max(offset, 1)
            while True:
                # Start `offset` bytes before the end, clamped to the file start
                seek_to = max(f_size - offset, 0)
                f.seek(seek_to)
                lines = f.readlines()
                # The whole file was read: return up to `count` lines
                if seek_to == 0:
                    return [line.decode('utf-8') for line in lines[-count:]]
                # Standard case: the first line may be partial, so require
                # one extra line before trusting the last `count`
                if len(lines) >= count + 1:
                    return [line.decode('utf-8') for line in lines[-count:]]
                # Not enough lines yet, so look further back
                offset *= 2


    def head(filename, count=1):
        """
        This one is fairly trivial to implement but it is here for completeness.
        """
        with open(filename, 'r', encoding='utf-8') as f:
            # islice stops after `count` lines or at end of file
            return list(islice(f, max(count, 0)))

This is of course available as a gist as well.
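
For comparison, the two common approaches listed at the top can be sketched roughly as follows. The function names `tail_readlines` and `tail_deque` are made up for illustration and are not taken from any particular library. Both return the right lines, but each has to touch every line in the file to do so.

```python
from collections import deque


def tail_readlines(filename, count=1):
    """Hypothetical naive tail: reads the entire file into memory."""
    if count <= 0:
        return []
    with open(filename, 'r') as f:
        # readlines() builds a list of every line before slicing
        return f.readlines()[-count:]


def tail_deque(filename, count=1):
    """Hypothetical naive tail: bounded memory, but iterates the entire file."""
    with open(filename, 'r') as f:
        # A deque with maxlen keeps only the most recent `count` lines,
        # yet every line is still read from disk once
        return list(deque(f, maxlen=max(count, 0)))
```

The deque variant keeps memory use small, which is often good enough for small files, but its running time still grows with the size of the log rather than with the number of lines requested.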