Regular expressions
Telerik Academy Alpha

 

DSA

What are regular expressions

 Regular expressions

  • A regular expression is a pattern used to match character combinations in strings
    • Find and extract data from a document
    • Validate content supplied in a form before it is submitted, such as:
      • Telephone numbers
      • SSN/EGN
      • Email addresses
      • Anything that follows a pattern
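
The two uses above (validation and extraction) can be sketched as follows. This is a minimal illustration in Python's `re` module, chosen only for the example; the 10-digit identifier format and both function names are assumptions, not part of the slides.

```python
import re

# Hypothetical format: an identifier made of exactly 10 ASCII digits.
# Real-world validation would usually need further checks beyond the pattern.
TEN_DIGITS = re.compile(r"[0-9]{10}")


def looks_like_ten_digit_id(text):
    """Validate: True only when the whole string is exactly 10 digits."""
    return TEN_DIGITS.fullmatch(text) is not None


def extract_numbers(document):
    """Extract: return every run of digits found in the document."""
    return re.findall(r"[0-9]+", document)
```

`fullmatch` is used for validation so that extra characters before or after the digits cause a rejection, while `findall` scans the whole text for extraction.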

 Regular expressions

  • Regular expressions are an extremely powerful tool implemented in most languages
  • Yet, regular expressions have their own syntax and usage of special characters
  • Difficult to remember if you use them infrequently
  • Regular expressions can be tested at:

Regular expression syntax

 RegEx - syntax

  • Special Characters:
    • ^ - matches the beginning of input
      • ^T
        • Matches: 'Telerik Academy', 'Telerik', 'Theta'
        • Does not match: 'Academy', 'Good Telerik'
    • $ - matches the end of input
      • y$
        • Matches: 'Telerik Academy', 'Academy', 'yummy'
        • Does not match: 'Telerik', 'Good Telerik'
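
The anchor examples above can be checked with a short sketch. Python's `re` module is used here only for illustration, and the helper names are hypothetical.

```python
import re


def starts_with_t(text):
    # ^T: the 'T' must be the first character of the input.
    return re.search(r"^T", text) is not None


def ends_with_y(text):
    # y$: the 'y' must sit at the end of the input
    # (Python also accepts a single trailing newline after it).
    return re.search(r"y$", text) is not None
```

For instance, `starts_with_t("Good Telerik")` is false because the 'T' is not at position 0, even though the string contains one.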

 RegEx - syntax

  • Special Characters in Regex:
    • * – The preceding character is matched 0 or more times
      • a*
        • Matches the runs of 'a' in: 'alaaaaaa bala'
        • Finds no 'a' in: 'Telerik', 'John Doe' (only an empty match, since 0 occurrences are allowed)
          • Remark: Ja*ohn
            • Matches: 'John Doe'
            • 'a' is matched 0 times
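
A small sketch of the "0 or more" behaviour, assuming Python's `re` module (other engines may report empty matches slightly differently). The variable names are illustrative only.

```python
import re

text = "alaaaaaa bala"

# a* matches the longest run of 'a' at every position, including empty runs,
# so the empty strings are filtered out to leave only the real runs.
runs_of_a = [m for m in re.findall(r"a*", text) if m]

# With no 'a' present, a* still succeeds, but with an empty match.
empty_match = re.search(r"a*", "Telerik").group()

# Ja*ohn: 'a' is matched 0 times, so 'John' is found inside 'John Doe'.
john = re.search(r"Ja*ohn", "John Doe").group()
```

The empty match is the reason `a*` on its own is rarely useful for searching; it is normally combined with other characters, as in `Ja*ohn`.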

 RegEx - syntax

  • Special Characters in Regex:
    • + – The preceding character is matched 1 or more times
      • a+
        • Matches: 'alaaaaaa bala'
        • Does not match: 'Telerik', 'John Doe'
          • Remark: Ja+ohn
            • Does not match: 'John Doe'
            • 'a' must occur at least once, and 'John' contains none
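
The "1 or more" quantifier can be sketched the same way. This assumes Python's `re` module; the string 'Jaaohn' is a made-up input added to show a positive case.

```python
import re

# a+ needs at least one 'a', so only non-empty runs are reported.
runs_of_a = re.findall(r"a+", "alaaaaaa bala")

# No 'a' at all means no match.
no_match = re.search(r"a+", "Telerik")

# Ja+ohn requires an 'a' between 'J' and 'o', which 'John Doe' lacks.
john = re.search(r"Ja+ohn", "John Doe")

# A hypothetical input that does contain the required 'a' characters.
jaaohn = re.search(r"Ja+ohn", "Jaaohn")
```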

 RegEx - syntax

  • Special Characters in Regex:
    • ? – The preceding character is matched 0 or 1 times
      • T?
        • Matches the 'T' in: 'Telerik Academy'
        • Finds no 'T' in: 'John Doe' (only an empty match, since 0 occurrences are allowed)
          • Remark: Ja?ohn
            • Matches: 'John Doe'
            • 'a' is matched 0 times
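
A minimal sketch of the "0 or 1" quantifier, assuming Python's `re` module. The pattern `Ja?ohn` and the inputs 'Jaohn' and 'Jaaohn' are illustrative choices, not taken from a specific engine's documentation.

```python
import re

# Ja?ohn allows the 'a' to be absent...
zero_times = re.search(r"Ja?ohn", "John Doe")

# ...or to appear exactly once...
one_time = re.search(r"Ja?ohn", "Jaohn")

# ...but not twice.
two_times = re.search(r"Ja?ohn", "Jaaohn")
```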

 RegEx - syntax

  • Special Characters in Regex:
    • . (dot) – matches any single character except the newline character
      • .
        • Matches: 'Telerik Academy'
          • (note: symbol by symbol)
        • Remark: .*
          • Matches any whole line (any string that contains no newline)
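
The dot can be sketched as follows, assuming Python's `re` module. The `re.DOTALL` flag shown at the end is Python-specific; other engines expose a similar option under different names.

```python
import re

# Each '.' consumes exactly one character, so findall returns them one by one.
symbols = re.findall(r".", "Tel")

# .* covers a whole string as long as it contains no newline.
single_line = re.fullmatch(r".*", "Telerik Academy")
two_lines = re.fullmatch(r".*", "line one\nline two")

# With DOTALL the dot also matches the newline character.
two_lines_dotall = re.fullmatch(r".*", "line one\nline two", flags=re.DOTALL)
```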

 RegEx - syntax

  • Special Characters in Regex:
    • | – Matches one pattern or the other
      • T|A
        • Matches: 'Telerik Academy' (the 'T' and the 'A')
         
    • [xyz] – Character set
      • Matches any one of the enclosed characters
      • [TAy]
        • Matches: 'Telerik Academy'
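
Alternation and character sets can be compared in a short sketch, assuming Python's `re` module. The variable names are illustrative.

```python
import re

text = "Telerik Academy"

# T|A: either the pattern 'T' or the pattern 'A'.
either = re.findall(r"T|A", text)

# [TAy]: any single character out of 'T', 'A' or 'y'.
from_set = re.findall(r"[TAy]", text)
```

For single characters the two forms behave alike; alternation becomes necessary when the alternatives are longer than one character.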

 RegEx - syntax

  • Special Characters in Regex:
    • [x-z] – Character range - Matches any one character in the given range
      • [0-9] - Matches a single digit
        • Matches: 'John is 19-years-old'
      • [a-q] - Matches a single lowercase letter from 'a' to 'q'
        • Matches: 'Telerik Academy'
    • Common ranges: [A-Z], [a-z], [A-Za-z], [A-Za-z0-9]
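
The range examples can be sketched like this, assuming Python's `re` module; each range matches one character at a time.

```python
import re

# [0-9]: every single digit in the text.
digits = re.findall(r"[0-9]", "John is 19-years-old")

# [a-q]: lowercase letters from 'a' to 'q' only, so 'T', 'r', 'A' and 'y'
# are skipped.
a_to_q = re.findall(r"[a-q]", "Telerik Academy")

# Ranges can be combined inside one set.
alnum_runs = re.findall(r"[A-Za-z0-9]+", "John is 19-years-old")
```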

 RegEx - syntax

  • Special Characters in Regex:
    • [^xyz] – A negated or complemented character set
      • Matches any character that is not enclosed in the brackets
      • [^john]+
        • Matches: 'Telerik Academy'
        • Does not match: 'john', 'jjjoo', 'jon'
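
A short sketch of the negated set, assuming Python's `re` module. The helper name is hypothetical.

```python
import re


def has_char_outside_john(text):
    # [^john]+ needs at least one character other than 'j', 'o', 'h', 'n'.
    return re.search(r"[^john]+", text) is not None


# 'Telerik Academy' contains none of the excluded letters,
# so the pattern covers the whole string.
whole = re.fullmatch(r"[^john]+", "Telerik Academy")
```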

 RegEx - syntax

  • Special Characters in Regex:
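    • {N} – matches exactly N occurrences of the preceding character
      • Where N is a positive integer
      • [A-Za-z]{5}
        • Matches: 'Telerik Academy'
        • Does not match: 'JS is the best'
    • {N,M} – matches at least N and at most M occurrences of the preceding character
      • Where N and M are positive integers and N ≤ M
      • Write the bounds without a space: several engines (Python and JavaScript among them) treat {4, 5} as literal text
      • [A-Za-z]{4,5}
        • Matches: 'Telerik Academy', 'JS is best'
        • Does not match: 'Jon is the MAN'
      • Use [A-Za-z] for letters: [A-z] also includes the characters [ \ ] ^ _ ` that sit between 'Z' and 'a'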
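
The counted quantifiers can be sketched as follows, assuming Python's `re` module. The sketch writes the letter class as `[A-Za-z]`, because `[A-z]` also admits the punctuation that lies between 'Z' and 'a' in ASCII, and it writes the bounds without a space, because Python treats `{4, 5}` as literal text.

```python
import re

# {5}: exactly five letters in a row; matching restarts after each hit.
five = re.findall(r"[A-Za-z]{5}", "Telerik Academy")
no_five = re.search(r"[A-Za-z]{5}", "JS is the best")

# {4,5}: between four and five letters in a row.
four_to_five = re.findall(r"[A-Za-z]{4,5}", "JS is best")
no_four = re.search(r"[A-Za-z]{4,5}", "Jon is the MAN")

# [A-z] is wider than it looks: it accepts '_' as well as letters.
underscore = re.fullmatch(r"[A-z]", "_")

# With a space inside the braces, Python reads the braces literally.
spaced = re.search(r"a{4, 5}", "aaaa")
```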

 RegEx - syntax

  • Special Characters in Regex:
    • \s – matches a single white space character, including space, tab, form feed, line feed

    • \S – matches a single character other than white space

    • \d – matches a digit character
      • Equivalent to [0-9] for ASCII input

 RegEx - syntax

  • Special Characters in Regex:
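    • \D – matches any non-digit character
      • Equivalent to [^0-9] for ASCII input

    • \w – matches any alphanumeric character, including the underscore
      • Equivalent to [A-Za-z0-9_] for ASCII input

    • \W – matches any character that is not alphanumeric and not an underscore
      • Equivalent to [^A-Za-z0-9_] for ASCII input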
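
The remaining shorthand classes can be sketched the same way, assuming Python's `re` module and ASCII-only input. The phone-number string is a made-up example.

```python
import re

# \D: remove everything that is not a digit, keeping only the digits.
only_digits = re.sub(r"\D", "", "+359 (2) 123-456")

# \w+: runs of letters, digits and underscores.
identifiers = re.findall(r"\w+", "snake_case, 42!")

# \W: single characters that are neither alphanumeric nor an underscore.
separators = re.findall(r"\W", "a_b c!")
```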

Questions?

[C# DSA] Regular expressions

By telerikacademy
