Published at 17:01, Tue 6 May 2008
I’ve just released version 1.00 of my Text::Match::FastAlternatives Perl module. Since I’m apparently declaring it stable, I thought it was worth writing up a description of what it does, and how it does it.
Suppose you have a large list of strings, and a set of keys, and you need to determine, for each of the strings, whether any of the keys occur in it. For example, the list of strings might be a list of user-agent headers sent to a web server, and the keys a set of strings that are good indicators of robots accessing your site; you want to calculate some server statistics, but disregard any robotic traffic.
How do you go about doing that?
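As a baseline for comparison, the most obvious answer is to test every key against every string. The sketch below does that in Python purely for illustration; it is not how Text::Match::FastAlternatives works, and the names `matches_any` and `drop_robots` and the sample data are made up for this example.

```python
def matches_any(text, keys):
    """Return True if any key occurs as a substring of text.

    Naive approach: one substring search per key, so the cost per
    string grows with the number of keys.
    """
    return any(key in text for key in keys)


def drop_robots(user_agents, robot_keys):
    """Keep only the user-agent strings that contain none of the keys."""
    return [ua for ua in user_agents if not matches_any(ua, robot_keys)]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real server log.
    agents = ["Mozilla/5.0 Firefox", "Googlebot/2.1", "Yahoo! Slurp"]
    keys = ["bot", "Slurp"]
    print(drop_robots(agents, keys))
```

This is simple and correct, but it rescans each string once per key, which is the cost a dedicated multi-key matcher tries to avoid.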
Published at 17:08, Fri 3 Aug 2007
[Also published here.]
Mastering Regular Expressions has been around for a long time: this is the third edition of a book originally published a decade ago. Does that actually reflect justified popularity, or is it just that this is the only book-length treatment of the various regex engines, how they differ, and how to get the most out of them? I’m glad to say the latter doesn’t seem to be the case: if you use regexes in any depth at all, you should probably read this book.
Published at 18:31, Sun 28 Jan 2007
[Previously published here.]
Russ Cox recently wrote an article about the worst-case run time of two different implementation strategies for matching regexes: Thompson NFA, and backtracking NFA. In particular, Russ points to the Perl regex engine as an example of how not to do it.
The article’s an interesting read, but I don’t think the approach it describes is necessarily easy or useful to fit into Perl.
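For readers who haven't seen it, the worst case in question can be reproduced with a pattern of n optional `a`s followed by n required `a`s, matched against a string of exactly n `a`s. The sketch below uses Python's `re` module only because it is also a backtracking engine; it is an illustration, not a measurement of Perl, and the helper names `pathological` and `time_match` are invented here.

```python
import re
import time


def pathological(n):
    """Build the pattern a?{n}a{n} (written out longhand) and the text a{n}."""
    pattern = "a?" * n + "a" * n
    text = "a" * n
    return pattern, text


def time_match(n):
    """Return (matched, seconds) for the pathological case of size n."""
    pattern, text = pathological(n)
    start = time.perf_counter()
    matched = re.fullmatch(pattern, text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: a backtracking matcher can take time exponential in n.
    for n in (4, 8, 12, 16):
        matched, seconds = time_match(n)
        print(f"n={n:2d} matched={matched} time={seconds:.4f}s")
```

The match always succeeds, but a backtracking matcher may try a very large number of ways of assigning the optional `a`s before it finds the one that works, whereas a Thompson-style simulation tracks all candidate states at once.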