Regular expressions
Telerik Academy Alpha
DSA
Table of contents
What are regular expressions
Regular expressions
- A regular expression is a pattern used to match character combinations in strings
- Find and extract data from a document
- Validate content supplied in a form before it is submitted, such as:
- Telephone numbers
- SSN/EGN
- Email addresses
- Anything that follows a pattern
Regular expressions
Regular expressions - example
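As an illustration of the two uses listed earlier (validating input and extracting data), here is a minimal Python sketch. Python's `re` module is an assumption made for this example, and the phone format (3 digits, dash, 3 digits, dash, 4 digits) and the helper names are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical phone format, for illustration only: 555-123-4567
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

def is_valid_phone(text):
    """Validate: the whole input must follow the pattern."""
    return PHONE.fullmatch(text) is not None

def extract_phones(document):
    """Extract: collect every substring that follows the pattern."""
    return PHONE.findall(document)

print(is_valid_phone("555-123-4567"))
print(is_valid_phone("555-1234"))
print(extract_phones("Call 555-123-4567 or 555-987-6543 today."))
```

Validation uses `fullmatch` so that extra characters around the number are rejected, while extraction uses `findall` to scan a longer text.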
Regular expressions
- Regular expressions are an extremely powerful tool, available in most programming languages
- However, they have their own syntax and special characters
- The syntax is difficult to remember if you use it infrequently
- Regular expressions can be tested at:
Regular expression syntax
RegEx - syntax
- Special Characters:
- ^ - matches the beginning of input
- Example: ^T
- Matches: 'Telerik Academy', 'Telerik', 'Theta'
- Does not match: 'Academy', 'Good Telerik'
- $ - matches the end of input
- Example: y$
- Matches: 'Telerik Academy', 'Academy', 'yummy'
- Does not match: 'Telerik', 'Good Telerik'
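The two anchors can be checked with a short Python sketch (Python's `re` module is an assumption here; the slides do not fix a language). It uses the same sample strings as the slide; the variable names are illustrative.

```python
import re

starts_with_t = re.compile(r"^T")  # 'T' must be the first character
ends_with_y = re.compile(r"y$")    # 'y' must be the last character

samples = ["Telerik Academy", "Telerik", "Theta", "Academy", "Good Telerik", "yummy"]

# search() returns a match object or None, so bool() gives a yes/no answer.
t_hits = [s for s in samples if starts_with_t.search(s)]
y_hits = [s for s in samples if ends_with_y.search(s)]

print(t_hits)
print(y_hits)
```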
RegEx - syntax
- Regular expressions have a set of special characters that behave differently from ordinary characters:
- Characters for matching multiple characters
- Characters for matching whitespace
- Characters for matching digits
- Characters for matching letters
- Etc…
- Full list of special characters can be found at:
https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions#Using_special_characters
RegEx - syntax
- Special Characters in Regex:
- * - The preceding character is matched 0 or more times
- Example: a*
- Matches: 'alaaaaaa bala' (each run of 'a')
- In 'Telerik' and 'John Doe' it finds only empty (zero-length) matches, because there is no 'a'
- Remark: Ja*ohn
- Matches: 'John Doe'
- 'a' is matched 0 times
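A minimal Python sketch of the `*` quantifier (Python's `re` module is an assumption for illustration). Because `*` allows zero repetitions, `a*` can succeed with an empty match, so the sketch filters empty matches out when listing the runs of 'a'.

```python
import re

# Keep only the non-empty matches of a* (the runs of 'a').
runs = [m for m in re.findall(r"a*", "alaaaaaa bala") if m]

# With no 'a' in the input, a* still succeeds, but the match is empty.
empty_match = re.search(r"a*", "Telerik").group()

# Ja*ohn: the 'a' may be absent (0 times) or repeated.
john = re.search(r"Ja*ohn", "John Doe").group()
jaaaohn = re.search(r"Ja*ohn", "Jaaaohn Doe").group()

print(runs, repr(empty_match), john, jaaaohn)
```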
RegEx - syntax
- Special Characters in Regex:
- + - The preceding character is matched 1 or more times
- Example: a+
- Matches: 'alaaaaaa bala'
- Does not match: 'Telerik', 'John Doe'
- Remark: Ja+ohn
- Does not match: 'John Doe'
- 'a' is not matched, and + requires at least one occurrence
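A short Python sketch of "1 or more times" (Python's `re` module is an assumption for illustration). Unlike a quantifier that allows zero repetitions, `a+` fails outright when there is no 'a'.

```python
import re

runs = re.findall(r"a+", "alaaaaaa bala")          # every run of one or more 'a'
no_a = re.search(r"a+", "Telerik")                 # None: no 'a' at all
no_john = re.search(r"Ja+ohn", "John Doe")         # None: at least one 'a' is required
jaaohn = re.search(r"Ja+ohn", "Jaaohn Doe").group()

print(runs, no_a, no_john, jaaohn)
```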
RegEx - syntax
- Special Characters in Regex:
- ? - The preceding character is matched 0 or 1 times
- Example: T?
- Matches: 'Telerik Academy' (the leading 'T')
- In 'John Doe' it finds only empty (zero-length) matches, because the 'T' is optional
- Remark: Ja?ohn
- Matches: 'John Doe'
- 'a' is matched 0 times
- Compare: Ja+ohn does not match 'John Doe', because + requires at least one 'a'
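A minimal Python sketch of the optional quantifier `?` (Python's `re` module is an assumption for illustration). The pattern `Ja?ohn` is a hypothetical example in the spirit of the slide's remarks: it accepts zero or one 'a', but not two.

```python
import re

zero_a = re.search(r"Ja?ohn", "John Doe").group()   # 'a' matched 0 times
one_a = re.search(r"Ja?ohn", "Jaohn Doe").group()   # 'a' matched 1 time
two_a = re.search(r"Ja?ohn", "Jaaohn Doe")          # None: 2 is more than ? allows

# T? consumes a 'T' when one is present, otherwise it matches the empty string.
with_t = re.search(r"T?", "Telerik Academy").group()
without_t = re.search(r"T?", "John Doe").group()

print(zero_a, one_a, two_a, repr(with_t), repr(without_t))
```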
RegEx - syntax
- Special Characters in Regex:
- . (dot) - matches any single character except the newline character
- Example: .
- Matches: every character of 'Telerik Academy' (one symbol at a time)
- Remark: .*
- Matches any whole string that contains no newline characters
- Matches: 'Telerik Academy'
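The behaviour of the dot, including the newline exception, can be sketched in Python (the `re` module is an assumption for illustration; default flags are used, so `.` does not match a newline).

```python
import re

# . matches one character at a time.
chars = re.findall(r".", "Tel")

# .* matches a whole string when it contains no newline.
whole = re.fullmatch(r".*", "Telerik Academy") is not None

# .* stops at the first newline, because . does not match it.
first_line = re.search(r".*", "line one\nline two").group()
newline_only = re.search(r".", "\n")

print(chars, whole, first_line, newline_only)
```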
RegEx - syntax
- Special Characters in Regex:
- | - Matches one pattern or the other
- Example: T|A
- Matches: 'Telerik Academy'
- [xyz] - Character set
- Matches any one of the enclosed characters
- Example: [TAy]
- Matches: 'Telerik Academy'
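A small Python sketch of alternation and character sets on the slide's sample string (Python's `re` module is an assumption for illustration). `findall` lists exactly which characters each pattern picks out.

```python
import re

text = "Telerik Academy"

either = re.findall(r"T|A", text)    # 'T' or 'A'
in_set = re.findall(r"[TAy]", text)  # any one of 'T', 'A', 'y'

print(either)
print(in_set)
```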
RegEx - syntax
- Special Characters in Regex:
- [x-z] - Character range: matches any one character between the two endpoints, inclusive
- [0-9] - matches a single digit
- Matches: 'John is 19-years-old' (the '1' and the '9')
- [a-q] - matches a single lowercase letter from 'a' to 'q'
- Matches: 'Telerik Academy'
- Other common ranges: [A-Z], [a-z], [A-Za-z], [A-Za-z0-9]
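Character ranges can be tried out with a short Python sketch (Python's `re` module is an assumption for illustration; the sample sentence is a lightly corrected version of the one on the slide).

```python
import re

digits = re.findall(r"[0-9]", "John is 19-years-old")

# [a-q] skips uppercase letters and the lowercase letters r..z.
a_to_q = re.findall(r"[a-q]", "Telerik Academy")

# Several ranges can be combined inside one set.
words = re.findall(r"[A-Za-z0-9]+", "John is 19-years-old")

print(digits, a_to_q, words)
```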
RegEx - syntax
- Special Characters in Regex:
- [^xyz] - A negated or complemented character set
- Matches any character that is not enclosed in the brackets
- Example: [^john]+
- Matches: 'Telerik Academy'
- Does not match: 'john', 'jjjoo', 'jon'
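A minimal Python sketch of a negated set (Python's `re` module is an assumption for illustration). The pattern `[^john]+` needs at least one character that is not 'j', 'o', 'h' or 'n', so strings built only from those letters cannot match.

```python
import re

pattern = re.compile(r"[^john]+")

# 'Telerik Academy' contains none of j, o, h, n, so the whole string matches.
whole = pattern.fullmatch("Telerik Academy") is not None

# These strings consist only of excluded letters, so nothing matches.
rejected = [s for s in ["john", "jjjoo", "jon"] if pattern.search(s) is None]

print(whole, rejected)
```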
RegEx - syntax
- Special Characters in Regex:
- {N} - matches exactly N occurrences of the preceding item
- Where N is a non-negative integer
- Example: [A-Za-z]{5}
- Note: use [A-Za-z] for letters; the range [A-z] also includes the characters [ \ ] ^ _ `
- Matches: 'Telerik Academy'
- Does not match: 'JS is the best'
- {N,M} - matches at least N and at most M occurrences of the preceding item
- Where N and M are non-negative integers and N <= M
- Write it without a space after the comma; many engines treat {4, 5} as literal text
- Example: [A-Za-z]{4,5}
- Matches: 'Telerik Academy', 'JS is best'
- Does not match: 'Jon is the MAN'
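The counted quantifiers can be sketched in Python as follows (the `re` module is an assumption for illustration). The sketch writes the letter class as `[A-Za-z]`, since the range `[A-z]` also covers a few punctuation characters such as '_', and writes `{4,5}` without a space.

```python
import re

# Exactly five letters in a row.
five = re.findall(r"[A-Za-z]{5}", "Telerik Academy")
no_five = re.search(r"[A-Za-z]{5}", "JS is the best")   # longest word has 4 letters

# Between four and five letters in a row.
four_to_five = re.findall(r"[A-Za-z]{4,5}", "JS is best")
none_long_enough = re.search(r"[A-Za-z]{4,5}", "Jon is the MAN")

# Why [A-z] is risky: '_' lies between 'Z' and 'a' in ASCII.
underscore_matches = re.search(r"[A-z]", "_") is not None

print(five, no_five, four_to_five, none_long_enough, underscore_matches)
```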
RegEx - syntax
- Special Characters in Regex:
- \s - matches a single whitespace character, including space, tab, form feed, line feed
- \S - matches a single character other than whitespace
- \d - matches a digit character
- Equivalent to [0-9] (in .NET, \d also matches other Unicode digits unless RegexOptions.ECMAScript is set)
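A short Python sketch of the whitespace and digit classes (Python's `re` module is an assumption for illustration; the sample strings are made up).

```python
import re

# \d picks out the digits.
digits = re.findall(r"\d", "Room 42")

# \s+ splits on any run of spaces, tabs or line feeds.
fields = re.split(r"\s+", "a b\tc\nd")

# \S+ collects the runs of non-whitespace characters.
tokens = re.findall(r"\S+", "  hi   there ")

print(digits, fields, tokens)
```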
RegEx - syntax
- Special Characters in Regex:
- \D - matches any non-digit character
- Equivalent to [^0-9]
- \w - matches any alphanumeric character, including the underscore
- \W - matches any character that is not alphanumeric or an underscore
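The negated and word classes can be sketched in Python like this (the `re` module is an assumption for illustration; the phone-like string and other samples are made up).

```python
import re

# \D matches every non-digit, so removing those leaves only the digits.
only_digits = re.sub(r"\D", "", "+359 (2) 123-456")

# \w+ collects runs of letters, digits and underscores.
words = re.findall(r"\w+", "snake_case, 2 words!")

# \W matches everything that \w does not.
separators = re.findall(r"\W", "a-b c!")

print(only_digits, words, separators)
```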
Questions?
[C# DSA] Regular expressions
By telerikacademy