Regular expression: A regular expression is a pattern, built from a combination of ordinary characters and special symbols, that describes a set of strings. Regular expressions are often used to match or tokenize strings. The term is commonly shortened to regex or regexp.

Many programmers are confused by regular expressions and how they work. I want to explain the concept in detail with a few examples. First, let's get an idea of why regular expressions were introduced. Consider the scenarios below.

Scenario 1: Suppose we have to check whether a string exactly matches "abc". We can immediately write an if condition like the one below.


if(str.equals("abc"))
{
//Code if string matches
}

Scenario 2: Let's change the scenario slightly and check for a string that starts with 'a' and ends with 'c'. We can still write this check quickly using string methods (without regular expressions, of course; remember that we're trying to understand why regular expressions were introduced).


if(str.startsWith("a") && str.endsWith("c"))
{
//Code if string matches
}

Scenario 3: Let's modify the above scenario slightly. Now we want to check for a string that starts with 'a', ends with 'c', and contains at least one digit (0 - 9) in between. Do you sense the complication here? We could still implement this without a regexp, but we are only at scenario 3. What shall we do for scenario 100, that is, a very complex string comparison check?
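To make the complication concrete, here is a minimal sketch of scenario 3 written without regular expressions. The class name Scenario3ByHand and the method name matchesByHand are made up for this illustration, and the sketch assumes "in between" means strictly between the first and the last character.

```java
class Scenario3ByHand {
    // Returns true if str starts with 'a', ends with 'c' and has at least
    // one digit strictly between the first and the last character.
    static boolean matchesByHand(String str) {
        if (str == null || str.length() < 3) {
            return false; // need at least 'a', one digit and 'c'
        }
        if (!str.startsWith("a") || !str.endsWith("c")) {
            return false;
        }
        // Scan only the characters between the first and the last one.
        for (int i = 1; i < str.length() - 1; i++) {
            char ch = str.charAt(i);
            if (ch >= '0' && ch <= '9') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a1c", "ab2xc", "abc", "ac", "1ac"}) {
            System.out.println(s + " -> " + matchesByHand(s));
        }
    }
}
```

Even for this small rule we already need a length check, two string method calls and a loop, and every new rule would add more hand-written code.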

So, I hope you have concluded for yourself that regular expressions were introduced mainly to match strings, in other words, to compare a string against a predefined format. We can use the matches(String regex) method, defined in the java.lang.String class, to compare a string with a regular expression. Internally, matches() hands the pattern to the regular expression engine in the java.util.regex package, which does the complex matching work. The method returns true only if the entire string matches the regular expression passed as the argument.
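As a rough sketch, scenario 3 could be written with matches() as shown below. The class name Scenario3WithRegex, the constant SCENARIO_3 and the pattern itself are my own choices for this illustration; other patterns would work just as well.

```java
class Scenario3WithRegex {
    // 'a', then any characters, then one digit, then any characters, then 'c'.
    static final String SCENARIO_3 = "a.*[0-9].*c";

    static boolean matchesScenario3(String str) {
        // matches() succeeds only if the whole string fits the pattern.
        return str.matches(SCENARIO_3);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a1c", "ab2xc", "abc", "ac", "xa1cx"}) {
            System.out.println(s + " -> " + matchesScenario3(s));
        }
    }
}
```

Note that "xa1cx" is rejected even though it contains "a1c", because matches() compares the entire string and not just a part of it.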

Regular expression character definition

Here is the list of elements that can be used to form a regular expression. Read through these element types, then see how they work in the examples in the next section.

  • Any character - any ordinary character matches itself. The characters \, ^, $, ., |, ?, *, +, (, ), [, ], { and } have a special meaning; a few others, such as -, &, !, =, < and >, are special only in certain positions (for example, - inside a set).
  • Set elements - any characters defined in between [ and ]
  • Group elements - any characters defined in between ( and )
  • Escape sequence characters - special characters have to be escaped with a backslash (\) character to be matched literally
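Escaping is easiest to see with the dot. The sketch below is only an illustration (the class name EscapeDemo and its helper methods are made up); it relies on the fact that a backslash must itself be doubled inside a Java string literal, and on Pattern.quote from java.util.regex, which turns a whole string into a literal pattern.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // "a\\.c" in Java source is the regex a\.c, where \. is a literal dot.
    static boolean matchesLiteralDot(String s) {
        return s.matches("a\\.c");
    }

    // Pattern.quote escapes every special character in the given text.
    static boolean matchesQuoted(String s, String literalText) {
        return s.matches(Pattern.quote(literalText));
    }

    public static void main(String[] args) {
        // Unescaped: . matches any one character.
        System.out.println("abc".matches("a.c"));           // true
        System.out.println("a.c".matches("a.c"));           // true
        // Escaped: only a real dot is accepted.
        System.out.println(matchesLiteralDot("abc"));        // false
        System.out.println(matchesLiteralDot("a.c"));        // true
        // Quoted: the whole text is taken literally.
        System.out.println(matchesQuoted("a.c", "a.c"));     // true
        System.out.println(matchesQuoted("abc", "a.c"));     // false
    }
}
```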

Special characters: A few characters have a predefined meaning, as listed below. The quantifiers *, + and ? apply to the preceding character, set or group.

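  • $ - represents the end of a line
  • * - represents zero or more occurrences
  • . - represents any one character
  • + - represents one or more occurrences
  • ? - represents zero or one occurrence (the preceding element is optional)
  • - - represents a range inside a set, as in [a-z]
  • < and > - have no meaning on their own; they appear only inside special group constructs such as lookbehind, (?<=...), and named groups, (?<name>...)
  • & - written twice (&&) inside a set, represents a logical and (intersection), as in [a-z&&[^aeiou]]; combined with a negated set it also gives subtraction
  • | - logical or
  • ^ - represents the start of a line; as the first character inside a set, it represents a negation
  • ( and ) - a group can be created
  • [ and ] - a set can be created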
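A few of the special characters above can be tried out with a small sketch like the following. The class name SpecialCharsDemo, the helper methods and the sample words are made up for illustration. The ^ and $ examples use Matcher.find() from java.util.regex, because matches() always compares the whole string anyway.

```java
import java.util.regex.Pattern;

class SpecialCharsDemo {
    // Whole-string match, same as String.matches().
    static boolean whole(String s, String regex) {
        return s.matches(regex);
    }

    // Substring search: true if the pattern is found anywhere in s.
    static boolean found(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        // ? makes the preceding 'u' optional (zero or one).
        System.out.println(whole("color", "colou?r"));          // true
        System.out.println(whole("colouur", "colou?r"));        // false
        // | is a logical or between alternatives.
        System.out.println(whole("dog", "cat|dog"));            // true
        // && inside a set is an intersection: a-z but not a vowel.
        System.out.println(whole("xyz", "[a-z&&[^aeiou]]+"));   // true
        System.out.println(whole("abc", "[a-z&&[^aeiou]]+"));   // false
        // ^ as the first character of a set negates the set.
        System.out.println(whole("abc", "[^0-9]+"));            // true
        // Outside a set, ^ and $ anchor the start and the end.
        System.out.println(found("abcdef", "^abc"));            // true
        System.out.println(found("abcdef", "^def"));            // false
        System.out.println(found("abcdef", "def$"));            // true
    }
}
```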

Examples

Regular expression | Result | Explanation
abc | Matches only 'abc' | Every character is an ordinary character, so each one matches itself.
a.c | Matches any string of length 3 that starts with 'a' and ends with 'c' | . represents any one character, so the string must be exactly 'a', followed by any one character, followed by 'c'.
a.*c | Matches any string that starts with 'a' and ends with 'c' | Compared with the previous expression, we just added * after the ., which means that . can repeat any number of times (0 or more). This matches ac, abc, adc, abbbbbbc, abcccccc, etc.
a.+c | Matches any string that starts with 'a', ends with 'c' and has at least one character in between | We used + in place of *, which means that . must occur at least once and can repeat (1 or more). This matches abc, adc, abbbbbbc, abcccccc, etc. (Note that this will not match ac.)
[a-z] | Matches any one lowercase letter from a through z | We created a set with the elements a-z. We can add A-Z to include capital letters or 0-9 to include digits.
[a-z]* | Matches any string made up only of lowercase letters from a through z, including the empty string | Adding * means that elements from the set can be repeated 0 or more times.
[^abc]* | Matches any string that contains none of the characters 'a', 'b' and 'c' | ^ at the start of a set represents negation, so the set matches any one character other than a, b and c, and * repeats it 0 or more times.

RegExp.java

public class RegExp
{
    public static void main(String[] args)
    {
        // Strings to test
        String[] str = {"abc", "ac", "abxdfc", "A", "a", "%@#,/_:;'\""};
        // Regular expressions to test every string against.
        // Note: inside [ and ], | is an ordinary character, not a logical or.
        String[] exp = {"abc", "a.c", "a*c", "[a-z]*", "[^abc]*", "[#|@|%|/|,|_|:|;|'|\"]+"};
        for(String s : str)
        {
            System.out.println("********************************************************");
            for(String e : exp)
            {
                // matches() is true only if the whole string matches the expression
                if(s.matches(e))
                {
                    System.out.println(s+" matches with regexp '"+e+"'");
                }
                else
                {
                    System.out.println(s+" doesn't match with regexp '"+e+"'");
                }
            }
            System.out.println("********************************************************\n");
        }
    }
}

Output

Santhosh>java RegExp
********************************************************
abc matches with regexp 'abc'
abc matches with regexp 'a.c'
abc doesn't match with regexp 'a*c'
abc matches with regexp '[a-z]*'
abc doesn't match with regexp '[^abc]*'
abc doesn't match with regexp '[#|@|%|/|,|_|:|;|'|"]+'
********************************************************

********************************************************
ac doesn't match with regexp 'abc'
ac doesn't match with regexp 'a.c'
ac matches with regexp 'a*c'
ac matches with regexp '[a-z]*'
ac doesn't match with regexp '[^abc]*'
ac doesn't match with regexp '[#|@|%|/|,|_|:|;|'|"]+'
********************************************************

********************************************************
abxdfc doesn't match with regexp 'abc'
abxdfc doesn't match with regexp 'a.c'
abxdfc doesn't match with regexp 'a*c'
abxdfc matches with regexp '[a-z]*'
abxdfc doesn't match with regexp '[^abc]*'
abxdfc doesn't match with regexp '[#|@|%|/|,|_|:|;|'|"]+'
********************************************************

********************************************************
A doesn't match with regexp 'abc'
A doesn't match with regexp 'a.c'
A doesn't match with regexp 'a*c'
A doesn't match with regexp '[a-z]*'
A matches with regexp '[^abc]*'
A doesn't match with regexp '[#|@|%|/|,|_|:|;|'|"]+'
********************************************************

********************************************************
a doesn't match with regexp 'abc'
a doesn't match with regexp 'a.c'
a doesn't match with regexp 'a*c'
a matches with regexp '[a-z]*'
a doesn't match with regexp '[^abc]*'
a doesn't match with regexp '[#|@|%|/|,|_|:|;|'|"]+'
********************************************************

********************************************************
%@#,/_:;'" doesn't match with regexp 'abc'
%@#,/_:;'" doesn't match with regexp 'a.c'
%@#,/_:;'" doesn't match with regexp 'a*c'
%@#,/_:;'" doesn't match with regexp '[a-z]*'
%@#,/_:;'" matches with regexp '[^abc]*'
%@#,/_:;'" matches with regexp '[#|@|%|/|,|_|:|;|'|"]+'
********************************************************

Why don't you try writing regular expressions to validate email addresses, web addresses or times?
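As a starting point, here is a minimal sketch for validating a time. It assumes a 24-hour HH:mm format with two digits for the hour; the class name TimeValidator, the constant TIME_24H and the pattern are my own choices, and other time formats would need a different expression.

```java
class TimeValidator {
    // Hours 00-19 or 20-23, a colon, then minutes 00-59.
    static final String TIME_24H = "([01][0-9]|2[0-3]):[0-5][0-9]";

    static boolean isValidTime(String time) {
        return time.matches(TIME_24H);
    }

    public static void main(String[] args) {
        for (String t : new String[] {"09:30", "23:59", "24:00", "9:30", "12:60"}) {
            System.out.println(t + " -> " + isValidTime(t));
        }
    }
}
```

The group ( and ) limits the logical or to the hour part, so the colon and the minutes apply to both alternatives.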

If you have some time, why don't you check out my other Java programs?