Some people, when confronted with a problem, think:
“I know, I'll use regular expressions”. Now they have two problems. - Jamie Zawinski
^ - start of line
$ - end of line
[] - character class. Matches any one of the listed characters; the list works like an "or".
Examples:
1. <H[123456]> will match the <H1> .. <H6> HTML tags;
2. [0-9a-fA-F] - the order of the ranges inside the character class doesn't matter.
! Metacharacters have a different meaning inside character classes. For example, "-" is a metacharacter only inside a character class, and only if it is not the first character in the class. Ex: [-1-3] matches one single character, which can be: - / 1 / 2 / 3 !
[^...] - negated character class. Matches any character that isn't listed.
Examples:
1. [^1-6] matches any character that's not 1-6;
2. q[^u] applied to "Qantas" or "Iraq" fails. Why? "Qantas" starts with an uppercase Q, and in "Iraq" nothing follows the q, while [^u] still has to match some character.
. - matches any character. Some regex flavors have an option that allows . to match a newline
| - "or"
Examples:
1. Bob|Robert - Bob and Robert are called "alternatives";
2. gr[ea]y is the same as gray|grey;
3. gr[e|a]y - | is not a metacharacter in a character class!
() - used to constrain the alternation
Examples:
1. gr(a|e)y == gr[ea]y
2. (First|1st) [Ss]treet == (Fir|1)st [Ss]treet
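The character-class and alternation examples above can be checked directly. This is a minimal sketch, assuming .NET's System.Text.RegularExpressions and C# top-level statements; the variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// gr[ea]y and gr(a|e)y accept the same two spellings.
bool classGray = Regex.IsMatch("gray", "gr[ea]y");     // True
bool groupGrey = Regex.IsMatch("grey", "gr(a|e)y");    // True

// Inside a class, | is an ordinary character, so gr[e|a]y also accepts "gr|y".
bool pipeLiteral = Regex.IsMatch("gr|y", "gr[e|a]y");  // True

// q[^u] needs some character after a lowercase q.
bool iraq = Regex.IsMatch("Iraq", "q[^u]");            // False: nothing follows the q
bool qantas = Regex.IsMatch("Qantas", "q[^u]");        // False: the Q is uppercase

Console.WriteLine($"{classGray} {groupGrey} {pipeLiteral} {iraq} {qantas}");
```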
\<cat\> - means: match if we can find a start-of-word position followed immediately by "cat" followed immediately by an end-of-word position
? - optional. It is placed after the character that is allowed, but not required, to appear at that point in the expression.
Examples: 1. (July|Jul) (fourth|4th|4) == (July?) (fourth|4(th)?) == July? (fourth|4(th)?)
! An item followed by ? always succeeds: it matches whether or not the optional item is present !
+ - one or more of the immediately preceding item. It must match at least once and fails otherwise
* - any number, including none, of the immediately preceding item. It allows the item but does not require it.
! An item followed by * always succeeds, because matching nothing is allowed !
! ?, + and * are quantifiers because they influence the quantity of what they govern !
Examples:
1. <H[1-6] *> matches: <H1>; <H1 >; <H1   > and so on;
2. An expression to match an HTML tag like <HR SIZE=14>:
<HR +SIZE *= *14 *>
or, to match any SIZE:
<HR +SIZE *= *[0-9]+ *>
or, to match the tag with an optional SIZE:
<HR( +SIZE *= *[0-9]+)? *>
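To see how the quantifiers combine, the last <HR> pattern can be tried against a few inputs. A minimal sketch, assuming .NET's Regex class and C# top-level statements; the test strings are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// An <HR> tag with an optional SIZE attribute, as written in the text.
var hrTag = new Regex(@"<HR( +SIZE *= *[0-9]+)? *>");

bool withSize     = hrTag.IsMatch("<HR SIZE=14>");       // True
bool spacedOut    = hrTag.IsMatch("<HR   SIZE = 7  >");  // True: " +" and " *" absorb the extra spaces
bool withoutSize  = hrTag.IsMatch("<HR>");               // True: the whole group is optional
bool missingValue = hrTag.IsMatch("<HR SIZE=>");         // False: [0-9]+ needs at least one digit

Console.WriteLine($"{withSize} {spacedOut} {withoutSize} {missingValue}");
```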
{min,max} - interval quantifier. Matches the preceding item from min to max times
{n} - matches the preceding item exactly n times
! Backreferencing is a regular-expression feature that allows you to match new text that is the same as some text matched earlier in the expression !
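Example: to match doubled words in a text, such as "the the":
\<the +the\> matches only "the the"; \<([A-Za-z]+) +\1\> generalizes it to any doubled word
! \1 refers to the text matched by the first set of parentheses. Sets are numbered by their opening parenthesis, counting from left to right !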
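The doubled-word expression can be sketched in .NET as follows. One assumption to note: .NET does not support the \< and \> word anchors, so this sketch substitutes \b on both sides; the sample sentences are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// \b stands in for \< and \>, which .NET does not support.
var doubled = new Regex(@"\b([A-Za-z]+) +\1\b");

Match m = doubled.Match("Paris in the the spring");
string wholeMatch = m.Value;             // "the the"
string firstGroup = m.Groups[1].Value;   // "the", the text that \1 had to repeat

// The trailing \b stops "the" from matching the start of "theory".
bool falsePositive = doubled.IsMatch("the theory");  // False

Console.WriteLine($"{wholeMatch} | {firstGroup} | {falsePositive}");
```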
Examples: 1. ega.att.com can also match inside "megawatt computing", because an unescaped . matches any character.
To avoid this, escape the dots with a backslash: ega\.att\.com
(?:...) - non capturing parentheses.
! To understand this, take a look at the example given in the "BACKREFERENCES" section: \<([A-Za-z]+) +\1\>. If we use non-capturing parentheses here, as in \<(?:[A-Za-z]+) +\1\>, the \1 will not work because the regex engine does not count those parentheses !
1. Variable names: [a-zA-Z][a-zA-Z_0-9]*
2. A string with double quotes: "[^"]*"
3. Dollar amount with optional cents: \$[0-9]+(\.[0-9][0-9])?
4. HTML/HTTP URL: \<http://[-a-z0-9_.:]+/[-a-z0-9_:@&?=+,.!/~*%$]*\.html?\>
5. An HTML tag:
- the wrong way :) -> <.*> is greedy, so in "this <i>short</i> example" it will match all of "<i>short</i>";
- a better way might be: <[^>]*>
\b - matches a word boundary. It doesn't matter if it's the start-of-word or end-of-word. Inside a class it matches a backspace
\s - whitespace. Includes: space, tab, newline, CR
Examples: 1. \s* -> matches 0 or more whitespace characters
\S - anything not \s
\t - a tab character
\r - a CR character
\n - a newline character
\w - [a-zA-Z0-9_]
Examples: 1. \w+ to match a word
\W - anything not \w
\d - [0-9] a digit
\D - [^0-9]
\G - specifies that the matches must be consecutive, without any intervening non-matching characters.
1. Lookahead - peeks forward in the text (towards the right) to see if its subexpression can match. Examples: (?=...) ; (?=\d) -> is successful at positions where a digit comes next.
2. Lookbehind - looks back (towards the left). Examples: (?<=...) ; (?<=\d) -> is successful at positions with a digit to the left.
! These are non-consuming expressions !
Examples:
1. Positive Lookahead - (?=...) -> successful if it can match to the right;
2. Negative Lookahead - (?!...) -> successful if it cannot match to the right;
3. Positive Lookbehind - (?<=...) -> successful if it can match to the left;
4. Negative Lookbehind - (?<!...) -> successful if it cannot match to the left
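Because lookarounds match positions and not text, they can be used to insert text without consuming any. A minimal sketch, assuming .NET's Regex class; the comma-insertion pattern and the sample inputs are illustrations, not part of the original notes.

```csharp
using System;
using System.Text.RegularExpressions;

// Insert a comma at every position that has a digit on its left
// and a multiple of three digits between it and the end of the string.
string withCommas = Regex.Replace("1234567", @"(?<=\d)(?=(\d\d\d)+$)", ",");

// Negative lookahead: "q not followed by u". Unlike q[^u], it needs no
// character after the q, so it succeeds at the end of "Iraq".
bool iraq = Regex.IsMatch("Iraq", "q(?!u)");   // True
bool quit = Regex.IsMatch("quit", "q(?!u)");   // False

Console.WriteLine($"{withCommas} {iraq} {quit}");
```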
(?<groupName>...) - This will assign a name to the captured group in the parentheses.
Example: \b(?<Area>\d\d\d)-(?<Exch>\d\d\d)-(?<Num>\d\d\d)\b
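Named groups can then be read by name from a Match. A minimal sketch, assuming .NET's Regex class and C# top-level statements; the input string is a made-up sample.

```csharp
using System;
using System.Text.RegularExpressions;

var phone = new Regex(@"\b(?<Area>\d\d\d)-(?<Exch>\d\d\d)-(?<Num>\d\d\d)\b");

Match m = phone.Match("ref 555-123-456 ok");
string area = m.Groups["Area"].Value;   // "555"
string exch = m.Groups["Exch"].Value;   // "123"
string num  = m.Groups["Num"].Value;    // "456"

// The same names are available in a replacement string as ${name}.
string rearranged = phone.Replace("ref 555-123-456 ok", "${Num}/${Area}");

Console.WriteLine($"{area} {exch} {num} -> {rearranged}");
```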
1. Apply nfa|nfa not to the text "nfa not":
- if only "nfa" matches -> Traditional NFA
- if the match is "nfa not" -> POSIX NFA or DFA
2. Apply X(.+)+X to a string like: =XX===============
- if it takes a long time -> NFA
- if it finishes quickly -> DFA
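The first test is cheap to run. A minimal sketch, assuming .NET's Regex class with its default options; the second test is deliberately left out here because on a backtracking engine it can take a very long time.

```csharp
using System;
using System.Text.RegularExpressions;

// A Traditional NFA takes the first alternative that leads to a match,
// so it stops at "nfa". A POSIX NFA or a DFA would return the longer "nfa not".
string matched = Regex.Match("nfa not", "nfa|nfa not").Value;

string engineType = matched == "nfa" ? "Traditional NFA" : "POSIX NFA or DFA";
Console.WriteLine($"matched \"{matched}\" -> {engineType}");
```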
1. The longer the string that can be pre-checked, the fewer "false starts". Example: (this|that|other) can match only at locations beginning with [ot];
2. Avoid superfluous parentheses;
3. Use anchors where the regex allows it: a leading ^ or a trailing $;
4. Being more specific reduces the amount of backtracking needed before a match is found. Example:
- apply ".*" to: The name "McDonald's" is said "makudonarudo" in Japanese;
- replacing the dot with [^"] gives less surprising results because it is more specific, and the new regex is more efficient to boot. With "[^"]*" the [^"]* can't get past the closing quote, eliminating much matching and subsequent backtracking.
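The difference between the two quoted-string patterns shows up directly in what they match. A minimal sketch, assuming .NET's Regex class and C# top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "The name \"McDonald's\" is said \"makudonarudo\" in Japanese";

// ".*" is greedy: it runs to the end of the line and backtracks to the LAST quote.
string greedy = Regex.Match(text, "\".*\"").Value;

// "[^"]*" cannot cross a quote, so it stops at the first closing quote.
string specific = Regex.Match(text, "\"[^\"]*\"").Value;

Console.WriteLine(greedy);    // "McDonald's" is said "makudonarudo"
Console.WriteLine(specific);  // "McDonald's"
```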
.Singleline - causes dot to match any character, including a newline;
.Multiline - expands where ^ and $ can match;
.IgnorePatternWhitespace - sets free-spacing and comment mode;
.IgnoreCase - turns on case-insensitive matching;
.ExplicitCapture - turns capturing off for plain (...), so only named groups (?<groupName>...) capture;
.ECMAScript - restricts \w, \s and \d to match ASCII characters only, and more;
.RightToLeft - the transmission applies the regex normally, but in the opposite direction (starting at the end of the string). Unfortunately, buggy;
.Compiled - spends extra time up front optimizing the regex so it matches more quickly when applied.
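Options are combined with the | operator and passed alongside the pattern. A minimal sketch, assuming .NET's RegexOptions enumeration; the three-line sample text is invented to show the effect of Multiline and IgnoreCase.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "alpha\nBeta\ngamma";

// Without Multiline, ^ and $ anchor only the whole string, so no single line matches.
int noOptions = Regex.Matches(text, "^[a-z]+$").Count;

// Multiline lets ^ and $ match at every line; "Beta" still fails on its capital B.
int multiline = Regex.Matches(text, "^[a-z]+$", RegexOptions.Multiline).Count;

// Adding IgnoreCase accepts "Beta" as well.
int both = Regex.Matches(text, "^[a-z]+$",
    RegexOptions.Multiline | RegexOptions.IgnoreCase).Count;

Console.WriteLine($"{noOptions} {multiline} {both}");  // 0 2 3
```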

1. RegexObj.IsMatch(target [,offset])
Returns: bool
IsMatch applies the object's regex to the target string, indicating whether the attempt is successful.
2. RegexObj.Match(target [,offset] [,maxlength])
Returns: the first Match object
Match applies the object's regex to the target string, returning a Match object.
3. RegexObj.Matches(target [,offset])
Returns: a collection of Match objects: MatchCollection
Matches is similar to the Match method, except that Matches returns a collection of Match objects representing all the matches in the target, rather than just the first match.
Doing a Replace:
RegexObj.Replace(target, replacement [,count] [,offset])
Returns: string (a possibly changed copy of target)
The Replace method does a search-and-replace on the target returning a possibly changed copy of it. It applies the Regex object's regular expression, but instead of returning a Match object, it replaces the matched text.
Special Per-Match Replacement Sequences:
$& - text matched by the regex;
$1, $2, ... - text matched by the corresponding set of capturing parentheses;
${groupName} - text matched by the corresponding named capture;
$` - text of the target string before the match location;
$' - text of the target string after the match location;
$$ - a single '$' character;
$_ - a copy of the entire original target string
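A few of these sequences in action. A minimal sketch, assuming .NET's static Regex.Replace; the input strings and group names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// $1 and $2 refer to the numbered groups.
string swapped = Regex.Replace("width=100", @"(\w+)=(\d+)", "$2 is the $1");

// $& is the whole match.
string bold = Regex.Replace("version 42", @"\d+", "<b>$&</b>");

// ${name} refers to a named group.
string named = Regex.Replace("2024-05", @"(?<year>\d{4})-(?<month>\d\d)", "${month}/${year}");

// $$ produces a literal dollar sign, here followed by the match itself.
string dollars = Regex.Replace("price: 5", @"\d+", "$$$&");

Console.WriteLine(swapped);  // 100 is the width
Console.WriteLine(bold);     // version <b>42</b>
Console.WriteLine(named);    // 05/2024
Console.WriteLine(dollars);  // price: $5
```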
Splitting strings:
RegexObj.Split(target [,count] [,offset])
Returns: string[]
The Split method applies the object's regex to the target string, returning an array of the strings separated by the matches.
Static "Convenience" Functions:
Regex.IsMatch(target, pattern [,options]);
Regex.Match(target, pattern [,options]);
Regex.Matches(target, pattern [,options]);
Regex.Replace(target, pattern, replacement [,options]);
Regex.Split(target, pattern [,options]);
using System.Text.RegularExpressions;
Create the regex object:
Regex myRegex = new Regex(@"\d+\w+", RegexOptions.IgnoreCase);
Create the Match object:
Match myMatch = myRegex.Match(sourceString);
Navigate through the Match object:
while (myMatch.Success)
{
    // do something with myMatch properties
    // then advance to the next match, if any
    myMatch = myMatch.NextMatch();
}
To also walk through the groups of each match:
while (myMatch.Success)
{
    // do something with myMatch properties
    string[] groupNames = myRegex.GetGroupNames();
    for (var i = 0; i < groupNames.Length; i++)
    {
        Group currentGroup = myMatch.Groups[groupNames[i]];
        // do something with currentGroup object properties
    }
    // then advance to the next match, if any
    myMatch = myMatch.NextMatch();
}
using System.Text.RegularExpressions;
Create the regex object:
Regex myRegex = new Regex(@"\d+\w+", RegexOptions.IgnoreCase);
Create the resultant string object:
// $& is the whole match; the pattern has no capturing group, so $1 would be left as literal text
string replacedText = myRegex.Replace(sourceString, "<b>$&</b>");
using System.Text.RegularExpressions;
Create the regex object:
Regex myRegex = new Regex(@"\.");
Create the resultant string[] object:
string[] splittedText = myRegex.Split("209.204.146.22");
// splittedText will be ["209", "204", "146", "22"]
"Selecting the proper tool to use when faced with a problem seems to be half the battle." - Mastering Regular Expressions 2nd Edition, Jeffrey E.F. Friedl
"Half the battle when writing regular expressions is getting successful matches when and where you want them. The other half is to not match when and where you don't want." - Mastering Regular Expressions 2nd Edition, Jeffrey E.F. Friedl