91: Seeing double with regular expressions

A few weeks ago, David and Katie were kind enough to have me on Mac Power Users to talk (to the extent that's possible) about regular expressions. It's a massive topic and we barely scratched the surface, but having listened back to the segment, I think it gives Mac users a number of points of entry. Even so, there was one big topic on my outline that I forgot to cover: multiple matches.

As David said, the basic conceit of regex is "find and replace on steroids". I think even that description is limiting, since the basic model of find and replace is to find one thing and replace it with one other thing. But once you're proficient in the basic search syntax, you can go much further by learning the replacement syntax. (Frustratingly, this tends to vary from app to app. I'll describe things with BBEdit's syntax, which I'm most familiar with.)

If you want to match multiple components of your overall regular expression, all you have to do is enclose them in parentheses. Then, each of these can be referenced in order, left to right, in your replacement string by using \1, \2, \3, and so on — though it gets pretty unwieldy after 4 or 5. This can be used for mundane tasks, like reordering parts of a date string, but it's also endlessly flexible, especially because you can refer to a captured group more than once.

This past week at work I applied multiple matches to what would have otherwise been a tedious clicky-draggy spreadsheet task. I was setting up software for some computer-based tests, including mapping the number of questions answered correctly to a standard score. (Remember the SAT? It's out of 1600 points, but there aren't 1600 questions. Thankfully.)

I had an Excel spreadsheet that a coworker made, but the software required a CSV, with each row having the low raw score, high raw score, low scaled score, and high scaled score for different ranges. We had already run an analysis and decided these correspondences for every possible value, so I needed those pairs of columns to be duplicates of one another. Yes, I could do that in Excel or Numbers, duplicating columns and saving out CSV files via an Export dialog. Or I could make a much faster round trip to BBEdit.

The routine was simple: paste two spreadsheet columns into a text file, giving me numbers separated by a tab. I captured the first number, matched a tab, and captured the second number: (\d+)\t(\d+). Then, since I needed to both flip the column order and duplicate them, I replaced the string with \2,\2,\1,\1. And thanks to BBEdit making the "Replace all" command available even without opening the Find and Replace window, transforming new data only took one keystroke: ⌥⌘=. 15 repetitions of the process only took a couple minutes. (This was right on the threshold of "should I automate this?" If it was 100 repetitions, the answer would have been a definite "yes".)

Now, chances are you'll never have to perform this exact task. In fact, the chance that I'll have to do this again are near zero. But the chances that you'll have to rearrange some lines of text are pretty high. Even if you have to look up the syntax, just knowing that the apps you have on your Mac are up to the task will enable you to solve the problem and get the work done faster.