Greedy Regular Expressions

By default, quantifiers in regular expressions are greedy. The regex engine tries to match the longest string possible for the given expression rather than stopping at the first, shortest match, which is not always the desired behavior. Consider the example below, where we parse an XML string using regular expressions.

Example

Test string:

<quote>There is enough in this world for everyone's need. There will be never enough for every one's greed. – Mahatma Gandhi</quote>
<quote>Greed is the inventor of injustice as well as the current enforcer. – Julian Casablancas</quote>

Test regular expression:

/<quote>.*<\/quote>/

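Result (the text between the outermost tags, assuming the . is allowed to match line breaks, for example with the s flag):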

There is enough in this world for everyone's need. There will be never enough for every one's greed. – Mahatma Gandhi</quote>
<quote>Greed is the inventor of injustice as well as the current enforcer. – Julian Casablancas
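The greedy behavior can be reproduced with a minimal Python sketch. The choice of Python, the shortened sample text, and the variable names are assumptions made for illustration. The original pattern has no capture group, so one is added here to isolate the text between the tags, and re.DOTALL is used so that the dot can cross the line break between the two elements.

```python
import re

# Two <quote> elements on separate lines (shortened, illustrative sample text).
text = (
    "<quote>There is enough in this world for everyone's need.</quote>\n"
    "<quote>Greed is the inventor of injustice.</quote>"
)

# re.DOTALL lets "." match the newline between the two elements.
# The greedy ".*" first consumes as much as it can, then backtracks
# only far enough to find a closing </quote>, which is the LAST one.
greedy = re.search(r"<quote>(.*)</quote>", text, re.DOTALL)

# Group 1 therefore runs from the first <quote> to the last </quote>
# and swallows the inner closing and opening tags along the way.
captured = greedy.group(1)
print(captured)
```

Without re.DOTALL the dot stops at the line break, so each line would be matched on its own and the problem would only show up when both elements sit on the same line.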

Solution

If you want the regex engine to stop at the first </quote> it encounters, instruct it not to be greedy by using .*? instead of .* (or .+? instead of .+) to match the content between <quote> and the first </quote>.

Non-greedy regular expression:

/<quote>.*?<\/quote>/

Result:

There is enough in this world for everyone's need. There will be never enough for every one's greed. – Mahatma Gandhi
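As a rough illustration of the non-greedy form, here is a short Python sketch. The language, the shortened sample text, and the variable names are assumptions, and a capture group is added to pull out the text between the tags. It also shows that the same lazy pattern can be reused to collect every quote separately.

```python
import re

# Same shape as the test string: two <quote> elements on separate lines
# (shortened, illustrative sample text).
text = (
    "<quote>There is enough in this world for everyone's need.</quote>\n"
    "<quote>Greed is the inventor of injustice.</quote>"
)

# The lazy ".*?" expands one character at a time and stops at the
# FIRST </quote> that lets the overall pattern succeed.
lazy_first = re.search(r"<quote>(.*?)</quote>", text, re.DOTALL).group(1)

# findall repeats the lazy match across the string, one result per element.
lazy_all = re.findall(r"<quote>(.*?)</quote>", text, re.DOTALL)

print(lazy_first)
print(lazy_all)
```

For real XML, a dedicated parser is usually a safer choice than a regular expression; the pattern here is only meant to show the difference between greedy and lazy quantifiers.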

• • •