Regular Expressions

What are regular expressions?

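  • Often shortened to "regex"
     
  • A sequence of characters that defines a search pattern
     
  • Used to match strings of text, such as particular characters, words, or patterns of characters
// Example: 8 to 15 characters with at least one digit, one lowercase letter, and one uppercase letter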
((?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15})
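
To see what the pattern above accepts, here is a minimal JavaScript sketch. The `^` and `$` anchors are an assumption added here so that the whole string has to fit the pattern, and `isValidPassword` is a hypothetical helper name.

```javascript
// Pattern from the slide, wrapped in ^ and $ (an assumption) so that
// the length limit applies to the whole string, not just a part of it.
const passwordPattern = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/;

// Hypothetical helper: true when the candidate satisfies every rule.
function isValidPassword(candidate) {
  return passwordPattern.test(candidate);
}

console.log(isValidPassword("Passw0rdOk")); // true
console.log(isValidPassword("password1"));  // false: no uppercase letter
console.log(isValidPassword("Ab1"));        // false: shorter than 8 characters
```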

What do you use it for?

  • Matching
     
  • Finding
     
  • Validation
     
  • Parsing data
     
  • Converting data
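
The uses listed above can be sketched in a few lines of JavaScript. The sample sentence and the date format are assumptions chosen for illustration.

```javascript
const text = "Order 42 shipped on 2024-01-15";
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

// Matching / validation: does the whole string look like a date?
const isDate = /^\d{4}-\d{2}-\d{2}$/.test("2024-01-15");

// Finding: index of the first run of digits in the sentence
const firstNumberIndex = text.search(/\d+/);

// Parsing: capture year, month, and day from the sentence
const [, year, month, day] = text.match(datePattern);

// Converting: rewrite the date as day/month/year
const converted = text.replace(datePattern, "$3/$2/$1");

console.log(isDate, firstNumberIndex, year, month, day);
console.log(converted); // "Order 42 shipped on 15/01/2024"
```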

Different pieces of a regex

  • Non-printable characters
  • Anchors
  • Character classes
  • Shorthand characters
  • Quantifiers
  • Special characters
  • Assertions

Non-printable characters

  • Escape sequences let you put non-printable characters in your regex

  • Many regex flavors also support the tokens \cA through \cZ to insert ASCII control characters

// Windows text files use \r\n to terminate lines, while UNIX text files use \n.

// To match a tab character (ASCII 0x09)
\t 

// For carriage return (0x0D)
\r 

// For line feed (0x0A)
\n  
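
As a small illustration of these escapes, the sketch below splits text into lines regardless of whether it uses Windows or UNIX line endings. The sample strings and the `splitLines` name are assumptions.

```javascript
const windowsText = "first\r\nsecond\r\nthird";
const unixText = "first\nsecond\nthird";

// \r?\n matches a line feed with an optional carriage return before it,
// so the same pattern handles both line-ending conventions.
const splitLines = (text) => text.split(/\r?\n/);

// \t matches a tab, which is handy for tab-separated values.
const columns = "name\tage".split(/\t/);

console.log(splitLines(windowsText)); // [ 'first', 'second', 'third' ]
console.log(splitLines(unixText));    // [ 'first', 'second', 'third' ]
console.log(columns);                 // [ 'name', 'age' ]
```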

Anchors

  • Do not match any characters; they match a position
^    // Match the beginning of the line

$    // Match the end of the line

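\A   // Matches only at the very beginning of the string

\z   // Matches only at the very end of the string

\Z   // In most flavors, matches at the very end of the string or before a final line break

\b   // Matches when the current position is a word boundary

\<   // Start of word (only in some flavors, such as GNU grep and Vim)

\>   // End of word (only in some flavors, such as GNU grep and Vim)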

\B   // Matches when the current position is not a word boundary
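
A short JavaScript sketch of how anchors behave. JavaScript does not provide \A, \z, \Z, \< or \>, so the example sticks to ^, $, \b and \B; the sample strings are assumptions.

```javascript
const lines = "cat\ncatalog\nbobcat";

// Without the m flag, ^ and $ only match at the start and end of the whole string.
const wholeString = lines.match(/^cat$/g);

// With the m (multiline) flag, ^ and $ also match at the start and end of each line.
const perLine = lines.match(/^cat$/gm);

// \b requires a word boundary on both sides, so only the standalone word matches.
const standalone = "cat catalog bobcat".match(/\bcat\b/g);

// \B requires that there is no boundary, so this finds "cat" at the end of "bobcat".
const suffix = "cat catalog bobcat".match(/\Bcat\b/g);

console.log(wholeString); // null
console.log(perLine);     // [ 'cat' ]
console.log(standalone);  // [ 'cat' ]
console.log(suffix);      // [ 'cat' ]
```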

Character classes

  • Defines the content of a pattern.
  • Matches only one out of several characters
  • The order of the characters inside a character class does not matter
.   // Any character except a line break (by default)

[]  // Encloses the definition of a class of characters, [0-9] matches any digit

-   // Defines a range of characters inside a class, [a-zA-Z] matches any ASCII letter

^   // Negates a class when it is the first character inside the brackets, [^0-9] matches any non-digit

\w  // Any word character (letter, digit, or underscore)

\W  // Any non-word character, such as whitespace or punctuation

\r  // Carriage return, ASCII 13

\n  // Line feed, ASCII 10

\t  // Tab, ASCII 9
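
The following JavaScript sketch tries out a few character classes. The sample strings are assumptions, and the `s` flag used at the end requires an ES2018-capable engine.

```javascript
// A class matches exactly one character out of the set.
const vowels = "regular expression".match(/[aeiou]/g);

// The order inside the class does not matter: [ae] and [ea] behave the same.
const sameResult = /gr[ae]y/.test("grey") === /gr[ea]y/.test("grey");

// A range inside a class: runs of ASCII letters.
const words = "Hello World 42".match(/[a-zA-Z]+/g);

// A negated class: remove everything that is not a digit.
const digitsOnly = "a1b2c3".replace(/[^0-9]/g, "");

// The dot skips line breaks by default; the s flag lets it match them too.
const dotDefault = /a.c/.test("a\nc");
const dotAll = /a.c/s.test("a\nc");

console.log(vowels.length, sameResult, words, digitsOnly, dotDefault, dotAll);
```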

Shorthand characters

  • The actual characters matched by the shorthands depend on the software you're using
\d  // A digit: [0-9]

\D  // A non-digit: [^0-9]

\s  // A whitespace character: [ \t\n\x0B\f\r]

\S  // A non-whitespace character: [^\s]

\w  // A word character: [a-zA-Z_0-9]

\W  // A non-word character: [^\w]
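
A brief sketch of the shorthands in JavaScript, where \d, \w and \s cover the ASCII-style sets listed above unless extra flags are used. The sample string is an assumption.

```javascript
const sample = "Room 101,\tfloor_2";

// \d+ : runs of digits
const numbers = sample.match(/\d+/g);

// \w+ : runs of word characters (letters, digits, underscore)
const tokens = sample.match(/\w+/g);

// \s+ : split on any run of whitespace (space, tab, line break)
const parts = sample.split(/\s+/);

// \W : strip every non-word character
const compact = sample.replace(/\W/g, "");

console.log(numbers); // [ '101', '2' ]
console.log(tokens);  // [ 'Room', '101', 'floor_2' ]
console.log(parts);   // [ 'Room', '101,', 'floor_2' ]
console.log(compact); // Room101floor_2
```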

Quantifiers

  • Question mark
    Makes the preceding token optional (zero or one time)

  • Asterisk
    Matches the preceding token zero or more times

  • Plus sign
    Matches the preceding token one or more times

  • Curly braces
    Specify an exact number or a range of repetitions, such as {3} or {2,5}
// matches colour or color
colou?r 

// matches an HTML tag without any attributes;
// the simpler <[A-Za-z0-9]+> would also accept names that start with a digit
<[A-Za-z][A-Za-z0-9]*> 

// to match a number between 
// 1000 and 9999
\b[1-9][0-9]{3}\b
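
The three patterns above can be checked with a quick JavaScript sketch. The test strings are assumptions, and the `^`/`$` anchors on the first pattern are added here so the whole word has to match.

```javascript
// ? makes the "u" optional.
const colour = /^colou?r$/;
console.log(colour.test("color"), colour.test("colour"), colour.test("colouur"));
// true true false

// * allows zero or more letters or digits after the first letter of the tag name.
const tags = "<p>Hi <b>there</b> <h1>".match(/<[A-Za-z][A-Za-z0-9]*>/g);
console.log(tags); // [ '<p>', '<b>', '<h1>' ]

// {3} repeats [0-9] exactly three times; \b keeps longer numbers from matching.
const fourDigit = "999 1000 9999 10000 0123".match(/\b[1-9][0-9]{3}\b/g);
console.log(fourDigit); // [ '1000', '9999' ]
```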

Special characters

  • Regex syntax reserves certain characters for special use
  • There are 12 characters with special meanings, often called "metacharacters"
// Backslash                    // Pipe symbol
\                               |

// The caret                    // Question mark
^                               ?

// Dollar sign                  // Asterisk
$                               *

// Period/dot                   // Plus sign
.                               +

// Opening parenthesis          // Closing parenthesis
(                               )

// Opening square bracket       // Opening curly brace
[                               {

// Most of them cause an error when used alone; to match one literally, escape it with a backslash.
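
When a piece of text has to be matched literally, its metacharacters need a backslash in front of them. Below is a minimal sketch of such an escaping step; `escapeRegex` is a hypothetical helper written for this example, not a built-in function.

```javascript
// Hypothetical helper: put a backslash in front of every metacharacter.
// In the replacement string, $& stands for the matched character.
function escapeRegex(literal) {
  return literal.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

const question = "1+1=2?";

// Unescaped, + and ? act as quantifiers, so the literal text is not found.
const unescaped = new RegExp(question).test(question);

// Escaped, every character only matches itself.
const escaped = new RegExp(escapeRegex(question)).test(question);

console.log(escapeRegex(question)); // 1\+1=2\?
console.log(unescaped, escaped);    // false true
```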

Assertions

  • Do not consume characters in the string; they only assert whether a match is possible at the current position
     
  • Behave like the start and end of line, and start and end of word anchors
// Collectively called "lookarounds"

// Positive lookahead
q(?=u)

// Negative lookahead
q(?!u)

// Positive lookbehind
(?<=text)

// Negative lookbehind
(?<!text)
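
A small JavaScript sketch of the four lookarounds. It assumes an engine with lookbehind support (ES2018 or later, such as current Node.js); the sample strings are assumptions.

```javascript
// Positive lookahead: q followed by u, but the u is not part of the match.
const lookahead = "quit".match(/q(?=u)/)[0];

// Negative lookahead: q that is not followed by u.
const noU = /q(?!u)/.test("Iraq");
const withU = /q(?!u)/.test("quit");

const priceLine = "price: $42, qty: 7";

// Positive lookbehind: digits directly preceded by a dollar sign.
const prices = priceLine.match(/(?<=\$)\d+/g);

// Negative lookbehind: digits preceded by neither a dollar sign nor another digit.
const plainNumbers = priceLine.match(/(?<![$\d])\d+/g);

console.log(lookahead);    // q
console.log(noU, withU);   // true false
console.log(prices);       // [ '42' ]
console.log(plainNumbers); // [ '7' ]
```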

Regex in JavaScript

How to implement?

  • Regular expression literal

    • Compiled when the script is loaded

    • Best when the pattern remains constant

    • Can give better performance for such patterns

  • Constructor function

    • Compiled at runtime

    • Used when you know the pattern will change, or when you don't know it in advance and get it from another source, such as user input

// Regular expression literal
var re = /ab+c/;

// Constructor function of the regex object
var re = new RegExp("ab+c");
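
The sketch below compares the two forms. One practical difference is that backslashes must be doubled inside the string passed to the constructor. `containsWord` is a hypothetical helper, and it assumes `word` contains only letters, so no escaping is needed.

```javascript
// Both forms produce an equivalent pattern.
const literal = /ab+c/;
const constructed = new RegExp("ab+c");
console.log(literal.source === constructed.source); // true

// Inside a string, a backslash has to be written twice.
const digitsLiteral = /\d+/;
const digitsConstructed = new RegExp("\\d+");
console.log(digitsLiteral.source === digitsConstructed.source); // true

// The constructor is the natural choice when the pattern is built at runtime.
// Assumption: word contains only letters (no metacharacters).
function containsWord(text, word) {
  return new RegExp("\\b" + word + "\\b", "i").test(text);
}

console.log(containsWord("Regex is fun", "regex"));    // true
console.log(containsWord("Regexes are fun", "regex")); // false
```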

Example

function validateName() {

    var name = document.getElementById("fullName").value;

    // Require two groups of letters separated by exactly one whitespace character.
    // Using + instead of * rejects empty name parts, such as a lone space.
    if (!/^[A-Za-z]+\s{1}[A-Za-z]+$/.test(name)) {
        alert("Please fill in your full name");
        return false;
    } else {
        alert("Welcome " + name);
        return true;
    }
}
^         // Matches the beginning of the string
[A-Za-z]  // Matches any character in the set
+         // Matches 1 or more of the preceding token
\s        // Matches any whitespace character
{1}       // Matches exactly 1 of the preceding token
$         // Matches the end of the string
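
A DOM-free variation of the check above can be exercised outside the browser. This sketch assumes each name part needs at least one letter, so it uses + where a * would also accept an empty part; `isFullName` is a hypothetical helper name.

```javascript
// Two groups of ASCII letters separated by exactly one whitespace character.
const fullNamePattern = /^[A-Za-z]+\s[A-Za-z]+$/;

// Hypothetical helper: returns true for input such as "Ada Lovelace".
function isFullName(name) {
  return fullNamePattern.test(name);
}

console.log(isFullName("Ada Lovelace"));  // true
console.log(isFullName("Ada"));           // false: only one part
console.log(isFullName(" "));             // false: both parts are empty
console.log(isFullName("Ada  Lovelace")); // false: two spaces
```

Note that this pattern only accepts unaccented ASCII letters, so real names with hyphens, apostrophes, or accents would be rejected.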

Regex in PHP

Different regex

  • POSIX Regular Expressions

    • POSIX is a collection of standards that define some of the functionality that a (UNIX) operating system should support

    • PHP had 7 functions for searching strings with POSIX patterns (the ereg family); they were deprecated in PHP 5.3 and removed in PHP 7

  • Perl-style Regular Expressions (PCRE)

    • Much more powerful and flexible than the POSIX functions

    • Also support the POSIX classes and anchors

Functions

// Performs a Regular Expression Search & Replace, Returning Only the Subjects That Matched
preg_filter()	

// Returns Array Entries That Match the Pattern
preg_grep()
	
// Returns the Error Code of the Last PCRE Regex Execution
preg_last_error() 

// Performs a Global Regular Expression Match
preg_match_all()	

// Performs a Regular Expression Match
preg_match()	

// Quote Regular Expression Characters
preg_quote()	

// Performs a Regular Expression Search & Replace Using a Callback
preg_replace_callback()	

// Performs a Regular Expression Search & Replace
preg_replace()	

// Splits a String By a Regular Expression
preg_split()	

Types of regex

  • Literal characters 

    • Can be a single character, a word, a phrase, etc. that matches itself

  • Meta-characters

    • Those that are recognised anywhere in the pattern except within square brackets

    • Those that are recognised within square brackets

  • Both literal and meta-character patterns must be enclosed in delimiters, most commonly slashes "/"

Example

<?php
    // Literal expression
    $string = "You've found a bug in my code?";
    if (preg_match("/bug/", $string)) {
        echo $string . "<br>";
        echo "Bitch, it's a feature!";
    }
?>
<?php
    // Meta-characters expression
    $email = "test@example.org";
    $expression = "/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/";
    if (preg_match($expression, $email)) {
        echo "Correct!";
    } else {
        echo "Fail!";
    }
?>

By Kim Massaro
