A Peek Into the Black Magic That Is RegEx
Interestingly enough, the first topic of my Ruby blog isn’t explicitly limited to the Ruby language. Regular expressions, or RegEx, are something I had seen used in Ruby documentation and StackOverflow posts for months, but because their cryptic presentation felt so unapproachable to my young and impressionable eyes, I never investigated further. Sure, I copied and pasted some examples to use for string substitutions, but I certainly didn’t understand what on earth I was actually telling my method to do. Since then, my tune has changed - after a brief explanation by a classmate, I realized that RegEx is an extremely powerful and endlessly useful tool to master, one that our instructor (only somewhat) jokingly compares to black magic.
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
A valid regular expression used to match a standard email address
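To see the pattern in action, here is a minimal Ruby sketch that tries it against two made-up strings. The constant and method names are my own, chosen just for illustration. One caveat worth knowing: in Ruby, ^ and $ anchor to the start and end of a line rather than the whole string, so for real validation you would likely want \A and \z instead.

```ruby
# The pattern from above, stored in a constant (the name is only illustrative).
EMAIL_PATTERN = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

# Hypothetical helper: returns the three captured parts of the address,
# or nil when the text does not match the pattern at all.
def email_parts(text)
  match = EMAIL_PATTERN.match(text)
  match && match.captures
end

p email_parts("jane_doe@example.com")  # => ["jane_doe", "example", "com"]
p email_parts("not an email")          # => nil
```

Note that the pattern only lists lowercase letters, so an address containing capitals would not match unless you add the i (case-insensitive) flag.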
In short, a regular expression is a sequence of characters that describes a search pattern for text in a string. Used correctly, RegEx can express powerful search parameters in a single line of code. Common uses include character replacement (subbing), parsing and splitting lists, and ensuring that text or user input is properly formatted in programs or web applications.
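Here is a rough Ruby sketch of those three uses: replacing, splitting, and checking format. The method names and sample strings are hypothetical, and the patterns use a few pieces (like \d and \s) that are explained further down.

```ruby
# Replacement: swap every run of digits for a placeholder.
def mask_digits(text)
  text.gsub(/\d+/, "#")
end

# Splitting: break a list on commas, ignoring any spaces around them.
def split_list(text)
  text.split(/\s*,\s*/)
end

# Validation: check that the whole input is exactly five digits.
# \A and \z anchor the pattern to the start and end of the string.
def five_digits?(text)
  text.match?(/\A\d{5}\z/)
end

p mask_digits("Room 101, floor 3")    # => "Room #, floor #"
p split_list("apples , pears,plums")  # => ["apples", "pears", "plums"]
p five_digits?("90210")               # => true
p five_digits?("9021a")               # => false
```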
Regular expressions came about in the 1950s and later became a staple of Unix text processing. Since then, they have been adopted by a bevy of programming languages: C, C++, Perl, PHP, Java, .NET, Ruby, Python, etc. While the syntax and implementation of RegEx may differ from language to language, my impression is that the core concepts are the same.
This post is by no means meant to be a comprehensive tutorial - I merely want to provide a helpful jumping-off point for interested readers. As such, I’ve listed a handful of helpful resources and RegEx practice environments below. I do, however, also want to introduce a few basic selectors that, once understood, will help lift the fog for a beginner.
/abc/ vs /[abc]/
The example above highlights basic RegEx selector syntax. /abc/ matches the exact sequence ‘abc’, while surrounding the text with brackets, as in /[abc]/, matches any single one of the listed characters instead. Bracketed selectors can also be chained together with other selectors to build up longer patterns.
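A quick way to see the difference is to run both patterns against the same string. This is just a small sketch; the constant names and test strings are my own.

```ruby
EXACT_ABC  = /abc/     # the exact sequence "abc"
ANY_OF_ABC = /[abc]/   # any single character: "a", "b", or "c"
BAD_OR_BED = /b[ae]d/  # a bracket chained between two literal characters

p "cab".match?(EXACT_ABC)        # => false ("abc" never appears in that order)
p "cab".match?(ANY_OF_ABC)       # => true
p "cab".scan(ANY_OF_ABC)         # => ["c", "a", "b"]
p "bad bed bid".scan(BAD_OR_BED) # => ["bad", "bed"]
```

Here scan returns every match it finds, which makes it handy for checking what a pattern actually picks up.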
[a-z]
Utilizing this bracketing strategy, we’re able to select any character that falls within a certain range, wherever it appears in the string. The same format works for digits, e.g. [0-9].
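As a small illustration, here are a letter range, a digit range, and a bracket that combines several ranges. The constant names and the sample string are made up for this sketch.

```ruby
LOWERCASE_LETTER = /[a-z]/
SINGLE_DIGIT     = /[0-9]/
LETTER_OR_DIGIT  = /[a-zA-Z0-9]/  # several ranges can share one bracket

p "Ruby 3".scan(LOWERCASE_LETTER)  # => ["u", "b", "y"]
p "Ruby 3".scan(SINGLE_DIGIT)      # => ["3"]
p "Ruby 3".scan(LETTER_OR_DIGIT)   # => ["R", "u", "b", "y", "3"]
```

Notice that [a-z] skips the capital R, since ranges are case-sensitive by default.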
/\d/ vs /\D/ & /\w/ vs /\W/
In the first pair you see the /\d/ and /\D/ selectors, which target digits and non-digits, respectively. Next is /\w/, which targets ‘word’ characters (letters, digits, and the underscore), and /\W/, which targets everything else.
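To check my own understanding, I find it helps to run all four against one string. This is only a sketch, and the sample string and constant name are made up.

```ruby
SHORTHAND_SAMPLE = "Room_42 is open!"

p SHORTHAND_SAMPLE.scan(/\d/)       # => ["4", "2"]
p SHORTHAND_SAMPLE.scan(/\D/).join  # => "Room_ is open!"
p SHORTHAND_SAMPLE.scan(/\w/).join  # => "Room_42isopen"
p SHORTHAND_SAMPLE.scan(/\W/)       # => [" ", " ", "!"]
```

The underscore shows up in the /\w/ result, which is why ‘alphanumeric’ is only an approximate description of what it matches.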
/[,\s.]/
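Here the brackets contain three selectors: a comma, \s (any whitespace character), and a period. Together they match a single comma, whitespace character, or period. Inside brackets the period is taken literally, whereas outside brackets it matches almost any character. There are other special-symbol selectors available as well.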
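A selector like this is handy for splitting text on punctuation. Here is a minimal sketch, with a constant name and sample strings of my own choosing:

```ruby
SEPARATORS = /[,\s.]/

p "a,b c.d".split(SEPARATORS)             # => ["a", "b", "c", "d"]
p "Hi, there. Bye".gsub(SEPARATORS, "_")  # => "Hi__there__Bye"

# The period means different things inside and outside of brackets.
p "a.b".scan(/[.]/)  # => ["."]
p "a.b".scan(/./)    # => ["a", ".", "b"]
```

The doubled underscores appear because the comma and the space after it are each matched and replaced separately.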
As you can likely imagine, regular expressions can get pretty complex, and my understanding and usage of the topic have really only begun. As promised, you can find some excellent resources below to explore regular expressions further.
RegexOne: Excellent interactive introductory tutorial on regular expressions.
RegExr: Useful tool to practice regular expressions with custom text.
Rubular: Another tool to practice.