Introduction to Regular Expressions
Materials for an entry-level regex course
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Markus Fritz, Toby Hodges, and Mike Smith
Bio-IT Community, EMBL
Do you often work with lots of data files on the computer? Are you often trying to spot particular files or lines of text in them that are important for you?
If so, then using regular expressions could save you a lot of time and frustration!
Regular expressions (regex/REs) are a method for describing patterns of characters
that you want to match in a body of text. A knowledge of regular expressions can
be extremely helpful in computational biology, and when combined with text editors
and common tools (awk, etc.) used in command-line computing.
These course materials are designed to give an introduction to using regular expressions. Working through the materials, you will learn how to quickly find and replace text in large files, controlling the types and numbers of characters matched, handling repeats, keeping certain parts of a matched pattern during replacement, and constructing sets of different options to be matched.
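As a rough taste of what these skills look like in practice, the sketch below uses Python's built-in `re` module to rewrite some header lines. The header lines, the `pattern` variable, and the `simplify` function are invented for illustration and are not taken from the course exercises. The pattern combines a set of alternatives (`sp|tr`), a repeat (`+`), and capture groups whose contents are kept during replacement.

```python
import re

# Hypothetical header lines, made up for this illustration.
headers = [
    ">sp|P12345|GENE1_HUMAN some protein",
    ">tr|Q67890|GENE2_MOUSE another protein",
]

# ^>            the line must start with a literal ">"
# (sp|tr)       group 1: either of two alternatives
# \|            a literal pipe character (escaped, since | means "or")
# ([A-Z0-9]+)   group 2: one or more uppercase letters or digits
# (\w+)         group 3: one or more letters, digits, or underscores
pattern = re.compile(r"^>(sp|tr)\|([A-Z0-9]+)\|(\w+)")


def simplify(line):
    """Replace the matched prefix, keeping only groups 2 and 3."""
    return pattern.sub(r">\2 \3", line)


for header in headers:
    print(simplify(header))
```

Only the part of each line that matches the pattern is replaced; any text after the match is left untouched, and a line that does not match at all is returned unchanged.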
These materials do not provide a comprehensive overview of regex syntax or of regex engines. Instead, they reflect the vast majority of use cases that the authors encounter. The background of the authors is represented in many of the examples chosen, which often focus on biological contexts and file formats.
For a comprehensive overview of regular expressions, we highly recommend the excellent regular-expressions.info.
- Regex Fundamentals
- Tokens & Wildcards
- Capture Groups
- Alternative Matching
- Links & Recommended Reading
Download a ZIP of all of the exercise files here.
These materials have been developed from a short workshop originally hosted by the authors at EMBL in January 2016. They will first be used in their more developed form at a half-day course taking place at EMBL on 23rd March 2017.