Create Capture Groups in Regular Expressions

Instructor: Joe Maddalone
Published 8 years ago
Updated 5 years ago

In this lesson we'll capture groups of characters we wish to match, use quantifiers with those groups, and use references to those groups in String.prototype.replace.

[00:00] Capturing groups allow us to group together parts of a regular expression and then apply quantifiers to those groups of characters. So let's go ahead and create a string here, and we're going to say, "Foo foo bar foo baz and foo boo." So in our regular expression let's say we want to capture foo bar and foo boo. First let's start with foo, we know how to do that, we're matching all of the foos. If we just wanted to get foo bar, we could just add bar there.

[00:36] But if we want to get foo bar or foo boo, we can create a capturing group. We do that with parentheses. We just add the strings that we're looking for here, so in this case it's going to be the exact same result, we've just added bar, but unlike a character class these characters are not optional, and they need to be in this order. We're specifically looking for B-A-R after F-O-O. Now to get foo boo as well, we can add alternation by using a pipe delimiter to identify the options that we're willing to accept.
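As a rough sketch of the group and alternation described above: the sample string here is an assumption, since the transcript does not show the exact text typed in the video, and the variable names are made up for illustration.

```javascript
// Assumed sample text; the exact string from the video is not in the transcript.
const str = "foo foobar foobaz fooboo";

// No group: only the literal sequence "foobar" matches.
const literal = /foobar/g;

// Capturing group with alternation: "foo" followed by either "bar" or "boo".
const grouped = /foo(bar|boo)/g;

const literalMatches = str.match(literal); // ["foobar"]
const groupedMatches = str.match(grouped); // ["foobar", "fooboo"]

console.log(literalMatches, groupedMatches);
```

Note that "foobaz" is not matched by either pattern, because the characters inside the group must appear exactly and in order.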

[01:09] So in this case we can just say foo boo, and now when we save that we've got foo bar and foo boo both highlighted. We can also apply quantifiers to our capturing group. So if I wanted to find foo followed by zero or more instances of bar or boo, you can see that we're getting foo followed by zero instances, foo followed by one instance of B-A-R, foo followed by zero instances, and foo followed by one instance of B-O-O.
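A minimal sketch of the quantified group, again assuming a sample string that is not shown in the transcript:

```javascript
// Assumed sample text; not taken verbatim from the video.
const str = "foo foobar foobaz fooboo";

// The * quantifier applies to the whole group: zero or more of "bar" or "boo".
const regex = /foo(bar|boo)*/g;

const matches = str.match(regex);
console.log(matches); // ["foo", "foobar", "foo", "fooboo"]
```

The third match is just "foo" because "baz" is not one of the alternatives, so the group matches zero times there.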

[01:35] So I'm going to go ahead and drop the quantifier there, and we're going to take a look at how capturing groups actually provide us a little bit more functionality than just being able to apply a quantifier to a group of characters. So we're going to console log string.replace, pass in our regex, and now we're going to use a reference to our capture group, and the way we do that is with a prefix of a $, and then a numeric value representing the index of that capture group.

[02:04] Now we've only got one capture group here, so the index of it is 1; it's not zero-based. So let's say we're going to replace it with ** around the value that we got from our capture group. So jump over here and load up our dev tools, and we can see that on the first line we just have our foo, nothing was replaced, but on foo bar, we replaced that with the ** around bar, and on the fourth line we replaced that with the ** around boo. So let's look at a way that we could use this more practically.
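The replacement described above might look roughly like this. The four-line input is an assumption inferred from the "first line" and "fourth line" remarks; the transcript does not show the actual string.

```javascript
// Assumed four-line input, inferred from the narration.
const str = "foo\nfoobar\nfoobaz\nfooboo";

// $1 in the replacement string refers to the first capture group (1-indexed).
const result = str.replace(/foo(bar|boo)/g, "**$1**");

console.log(result);
// foo
// **bar**
// foobaz
// **boo**
```

The whole match ("foobar") is replaced, so only the captured part survives, wrapped in asterisks.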

[02:39] Let's say we're going to try to find area codes in phone numbers. We're going to have that in a capture group of one. Here is a list of phone numbers, this certainly isn't going to be the most robust list, there's obviously international numbers, and a number of different ways that people could enter a phone number, but just using these as our basic data set we're going to try to identify the area code.

[03:04] Let's start off pretty simple, we know that we're going to have three digits, followed by three digits, followed by four digits, and then between each of these we're going to have a character class for a possible space or a dash. We'll make that whole character class optional, so we'll drop that in between each of these guys.

[03:30] Now these first three digits are the ones that we want to capture, so let's put that into a capturing group, and before that we may or may not have an opening parenthesis, so we'll escape the opening parenthesis and make it optional, and then after that, we may or may not have a closing parenthesis. I think we're looking pretty good. Let's save that and see where we're at.
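One way the pattern built up above could be written is sketched below. The phone numbers are hypothetical stand-ins (the list used in the video is not in the transcript), and using \s for the separator is an assumption about how the "space or dash" character class was spelled.

```javascript
// Hypothetical sample data; not the list shown in the video.
const phoneNumbers = ["800-456-7890", "(555) 456-7890", "4564567890"].join("\n");

// \(?      optional opening parenthesis (escaped)
// (\d{3})  capture group 1: the area code
// \)?      optional closing parenthesis
// [\s-]?   optional whitespace or dash separator (assumed spelling)
const phoneRegex = /\(?(\d{3})\)?[\s-]?\d{3}[\s-]?\d{4}/g;

// Use the group reference in a replacement, as in the earlier foo example.
const labeled = phoneNumbers.replace(phoneRegex, "area code: $1");

// Or pull out just the captured area codes.
const areaCodes = [...phoneNumbers.matchAll(phoneRegex)].map((m) => m[1]);

console.log(labeled);
console.log(areaCodes); // ["800", "555", "456"]
```

As the lesson notes, this is not a robust phone number matcher; it only covers the simple formats in the sample list.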

[03:54] So cool, we've been able to identify the area code on each of those lines referenced inside of this capture group. So one last thing to mention is you may want to opt out of the actual storing of a reference, creating memory for that reference of your capture group. You may just want to use it for the alternation feature. So if you want to opt out of the actual capturing and storing that reference, you can add a question mark and a colon (?:) right after the opening parenthesis of the group.

[04:23] If we save that, we can see over here that we're no longer getting a reference to our capture group, we're actually just getting the string $1. So that's an optimization you can make depending on your usage.
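A small sketch of the difference, using an assumed sample string rather than the one from the video:

```javascript
// Assumed sample text for illustration.
const str = "foobar fooboo";

// Capturing group: $1 is replaced with the captured text.
const capturing = str.replace(/foo(bar|boo)/g, "**$1**");

// Non-capturing group (?:...): alternation still works, but there is no
// group 1, so "$1" is left in the output as literal text.
const nonCapturing = str.replace(/foo(?:bar|boo)/g, "**$1**");

console.log(capturing);    // **bar** **boo**
console.log(nonCapturing); // **$1** **$1**
```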

Piyush
~ 8 years ago

I did not understand the application of ?: (non capturing of the capture group). When would I use this optimization?

Joe Maddalone (instructor)
~ 8 years ago

As explained in the lesson, if you wanted to trim back on the memory allocated for storing the reference to the group you'd opt out. Perhaps you only wanted to utilize the alternation functionality, but did not need to reference the group later as expressed in something like:

var regex = /www\.google\.(?:com|net|org)/;
regex.test('www.google.net'); // true
regex.test('www.google.edu'); // false
