Bit of a quirky title this one… A set of scripts I have been writing relies heavily on regular expressions for pattern matching and substitution, and I have noticed lately that performance has been suffering somewhat. I decided to do a bit of investigation to see what was going on.
When I originally wrote the code I used case-insensitive matching to ensure that the results wouldn’t be thrown off by another developer making a subtle change in letter case later on. It turns out that the performance impact of this type of matching is huge!
To put things into perspective, my code was originally taking 20 seconds to read through a logfile (only about 5 MB) and parse it for certain patterns. Now that the case-insensitive matching has been disabled, it takes less than 1 second!
Just for fun, I also rewrote the code to lowercase the input string first and then do a normal, case-sensitive pattern match (as that way I could ensure the text was always lower-case). That also turned out to be faster than case-insensitive matching, though obviously not as fast as doing no transformation at all.
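Here is a minimal sketch of how the three approaches could be compared. It is written in Python purely for illustration, as the scripts above may well be in another language. The log lines, the pattern and the function names are all made up, and how big the gap is depends heavily on the regex engine and the input, so treat whatever it prints as indicative only.

```python
import re
import timeit

# Hypothetical log data: the real logfile format and patterns are not shown
# in this post, so these lines are invented for the example.
LINES = [
    "2024-01-01 12:00:00 error: disk full on /dev/sda1",
    "2024-01-01 12:00:01 info: heartbeat ok",
    "2024-01-01 12:00:02 warning: cpu temperature high",
] * 10000

# The same pattern compiled twice: once case-sensitive, once case-insensitive.
PLAIN = re.compile(r"error: (\w+)")
IGNORE_CASE = re.compile(r"error: (\w+)", re.IGNORECASE)


def count_plain(lines):
    """Case-sensitive match with no transformation of the input."""
    return sum(1 for line in lines if PLAIN.search(line))


def count_ignore_case(lines):
    """Case-insensitive match on the raw input."""
    return sum(1 for line in lines if IGNORE_CASE.search(line))


def count_lowercased(lines):
    """Lowercase each line first, then do a case-sensitive match."""
    return sum(1 for line in lines if PLAIN.search(line.lower()))


if __name__ == "__main__":
    # Time each approach over the same data and print the total per approach.
    for fn in (count_plain, count_ignore_case, count_lowercased):
        seconds = timeit.timeit(lambda: fn(LINES), number=5)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

All three functions return the same count on this data because the sample lines are already lower-case. On real logs with mixed case, only the second and third would agree with each other.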
Lesson of the day: Only use case-insensitive matching where you have to, as it will SERIOUSLY impact performance when working with large data sets!