Regular Expressions
What are regular expressions?
- Short form: regex
- A sequence of characters that defines a search pattern
- Used to match strings of text, such as particular characters, words, or patterns of characters
((?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15})
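The pattern above reads as a password-style check: three lookaheads require at least one digit, one lowercase letter and one uppercase letter, and `.{8,15}` then matches 8 to 15 characters. A minimal JavaScript sketch follows. The sample strings and the anchored variant are assumptions added for illustration. Without `^` and `$` the pattern can match inside a longer string, so the length limit is not enforced on the whole input.

```javascript
// The pattern from the slide, unchanged.
const slidePattern = /((?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15})/;
// Assumed variant: ^ and $ added so the length limit applies to the whole string.
const anchoredPattern = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/;

const tooLong = "Abcdefghijklmnopqrs1"; // 20 characters

console.log(anchoredPattern.test("Passw0rdOk")); // true: digit, lower, upper, 10 chars
console.log(anchoredPattern.test("password1"));  // false: no uppercase letter
console.log(slidePattern.test(tooLong));         // true: matches the first 15 characters
console.log(anchoredPattern.test(tooLong));      // false: whole string exceeds 15 characters
```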
What do you use it for?
- Matching
- Finding
- Validation
- Parsing data
- Converting data
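Each of these uses can be sketched in a few lines of JavaScript. The date pattern and sample strings below are assumptions chosen for illustration only.

```javascript
// Assumed example pattern: an ISO-style date with three captured groups.
const isoDate = /^(\d{4})-(\d{2})-(\d{2})$/;

// Matching / validation: does the whole string fit the pattern?
const isValid = isoDate.test("2024-01-15");

// Finding: collect every run of digits in a longer text.
const found = "Order 66 shipped 3 items".match(/\d+/g);

// Parsing: pull the captured groups out of a match.
const [, year, month, day] = "2024-01-15".match(isoDate);

// Converting: reorder the captured groups into another format.
const converted = "2024-01-15".replace(isoDate, "$3/$2/$1");

console.log(isValid, found, year, month, day, converted);
```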
Different pieces of a regex
- Non-printable characters
- Anchors
- Character classes
- Shorthand characters
- Quantifiers
- Special characters
- Assertions
Non-printable characters
- Used to put non-printable characters in your regex
- Many regex flavors also support the tokens \cA through \cZ to insert ASCII control characters
// Windows text files use \r\n to terminate lines, while UNIX text files use \n.
// To match a tab character (ASCII 0x09)
\t
// For carriage return (0x0D)
\r
// For line feed (0x0A)
\n
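Because the two line-ending conventions differ, a pattern such as `\r?\n` is a common way to handle both. A small JavaScript sketch, with made-up sample text:

```javascript
// Sample inputs (illustrative only).
const windowsText = "first\r\nsecond\r\nthird";
const unixText = "first\nsecond\nthird";

// An optional carriage return followed by a line feed covers both conventions.
const anyLineBreak = /\r?\n/;

const windowsLines = windowsText.split(anyLineBreak);
const unixLines = unixText.split(anyLineBreak);

// Tab-separated fields can be split on \t in the same way.
const columns = "id\tname\temail".split(/\t/);

console.log(windowsLines, unixLines, columns);
```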
Anchors
- Do not match any characters; they match a position
^ // Matches the beginning of the line
$ // Matches the end of the line
\A // Matches only at the very beginning of the string
\z // Matches only at the very end of the string
\Z // Matches at the very end of the string, or before a final line break
\b // Matches when the current position is a word boundary
\< // Start of word (supported only in some flavors, such as GNU grep)
\> // End of word (supported only in some flavors, such as GNU grep)
\B // Matches when the current position is not a word boundary
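A short JavaScript sketch of how anchors match positions rather than characters. Note that JavaScript supports `^`, `$`, `\b` and `\B` but not `\A`, `\z`, `\Z`, `\<` or `\>`, so only the first group is shown. The sample strings are assumptions.

```javascript
const twoLines = "one\ntwo";

// Without the m flag, ^ and $ only match at the start and end of the whole string.
const wholeString = /^\w+$/.test(twoLines);

// With the m flag, ^ and $ also match at the start and end of each line.
const perLine = twoLines.match(/^\w+$/gm);

// \b requires a word boundary on each side of "cat".
const standalone = /\bcat\b/.test("a cat sat");
const embedded = /\bcat\b/.test("concatenate");

// \B is the opposite: "cat" must sit inside a longer word.
const inside = /\Bcat\B/.test("concatenate");

console.log(wholeString, perLine, standalone, embedded, inside);
```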
Character classes
- Defines the content of a pattern.
- Matches only one out of several characters
- The order of the characters inside a character class does not matter
. // Any character (except line breaks in most flavors)
[] // Encloses the definition of a class of characters; [0-9] matches any digit
- // Defines a range of characters inside a class, as in [a-zA-Z]
^ // Negates the class when placed first: [^0-9] matches any character except a digit
\w // Any character allowed in words (letters, digits, underscore)
\W // Any character that is not a word character
\r // Carriage return, ASCII 13
\n // Line feed, ASCII 10
\t // Tab, ASCII 9
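A brief JavaScript sketch of character classes in use, with assumed sample strings:

```javascript
// A class matches exactly one character out of the listed set.
const greys = "grey gray groy".match(/gr[ae]y/g);

// A negated class matches any one character that is NOT listed.
const hasNonDigit = /[^0-9]/.test("12345");

// Ranges can be combined inside a single class.
const isHex = /^[0-9a-fA-F]+$/.test("c0ffee");

// The dot matches any character except a line break (JavaScript default).
const dotMatches = "cat cot c\nt".match(/c.t/g);

console.log(greys, hasNonDigit, isHex, dotMatches);
```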
Shorthand characters
- The actual characters matched by the shorthands depend on the software you're using
\d // A digit: [0-9]
\D // A non-digit: [^0-9]
\s // A whitespace character: [ \t\n\x0B\f\r]
\S // A non-whitespace character: [^\s]
\w // A word character: [a-zA-Z_0-9]
\W // A non-word character: [^\w]
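The shorthands can be tried out with a few one-liners. This is a JavaScript sketch with assumed sample strings. Other flavors may match a wider set of characters, for example Unicode digits.

```javascript
// \d+ finds runs of digits.
const digitRuns = "tel: 555 1234".match(/\d+/g);

// \s+ splits on any run of whitespace (spaces, tabs, line breaks).
const fields = "a  b\tc\nd".split(/\s+/);

// \w+ finds runs of word characters (letters, digits, underscore).
const words = "hello, world_1!".match(/\w+/g);

// \W+ finds the runs in between.
const separators = "hello, world!".match(/\W+/g);

console.log(digitRuns, fields, words, separators);
```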
Quantifiers
- Question mark: makes the preceding token in the regular expression optional
- Asterisk: attempts to match the preceding token zero or more times
- Curly braces: specify a specific amount of repetition
// matches colour or color
colou?r
// matches an HTML tag without any attributes
// (<[A-Za-z0-9]+> is shorter but also matches invalid tags such as <1>)
<[A-Za-z][A-Za-z0-9]*>
// matches a number between 1000 and 9999
\b[1-9][0-9]{3}\b
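The three quantifier patterns above can be checked with a short JavaScript sketch. The input strings are assumptions chosen to show both matches and near misses.

```javascript
// ? makes the "u" optional.
const spellings = "colour color colr".match(/colou?r/g);

// * allows zero or more further characters after the first letter of the tag name.
const tags = "<b> <1> <h1>".match(/<[A-Za-z][A-Za-z0-9]*>/g);

// {3} requires exactly three more digits after the leading 1-9.
const fourDigit = "999 1000 9999 10000".match(/\b[1-9][0-9]{3}\b/g);

console.log(spellings, tags, fourDigit);
```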
Special characters
- Reserves certain characters for special use
- There are 12 characters with special meanings, often called "metacharacters"
\ // Backslash
^ // Caret
$ // Dollar sign
. // Period/dot
| // Pipe symbol
? // Question mark
* // Asterisk
+ // Plus sign
( // Opening parenthesis
) // Closing parenthesis
[ // Opening square bracket
{ // Opening curly brace
// Most of them are errors when used alone.
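To match one of these characters literally, escape it with a backslash. The JavaScript sketch below shows the difference, plus a hypothetical `escapeRegExp` helper (not a built-in) for escaping arbitrary text before using it as a pattern.

```javascript
// Unescaped: "1+" means "one or more 1s", so this looks for "11=2".
const unescaped = /1+1=2/.test("1+1=2");

// Escaped: \+ matches a literal plus sign.
const escaped = /1\+1=2/.test("1+1=2");

// Hypothetical helper: put a backslash before every metacharacter in the text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// The dot in "a.b" is now matched literally instead of as "any character".
const literalDot = new RegExp("^" + escapeRegExp("a.b") + "$");

console.log(unescaped, escaped, literalDot.test("a.b"), literalDot.test("axb"));
```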
Assertions
- Do not consume characters in the string; they only assert whether a match is possible
- Work like the start-of-line, end-of-line and word-boundary anchors
// Collectively called "Lookarounds"
// Positive lookahead
q(?=u)
// Negative lookahead
q(?!u)
// Positive lookbehind
(?<=text)
// Negative lookbehind
(?<!text)
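A JavaScript sketch of the four lookarounds. The sample strings are assumptions. Lookbehind needs an engine with ES2018 support, which current browsers and Node.js provide.

```javascript
const sample = "Iraq quit";

// Positive lookahead: a "q" followed by "u". The "u" is not part of the match.
const qBeforeU = [...sample.matchAll(/q(?=u)/g)].map((m) => m.index);
const matchedText = "quit".match(/q(?=u)/)[0];

// Negative lookahead: a "q" NOT followed by "u".
const qNotBeforeU = [...sample.matchAll(/q(?!u)/g)].map((m) => m.index);

// Positive lookbehind: digits preceded by a dollar sign.
const dollarAmounts = "$10 #20".match(/(?<=\$)\d+/g);

// Negative lookbehind: whole numbers NOT preceded by a dollar sign.
const otherNumbers = "$10 #20".match(/(?<!\$)\b\d+/g);

console.log(qBeforeU, matchedText, qNotBeforeU, dollarAmounts, otherNumbers);
```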
Regex in JavaScript
How to implement?
- Regular expression literal
  - Compiled when the script is loaded
  - Remains constant
  - Better performance
- Constructor function
  - Compiled at runtime
  - Used when you know the regex pattern will change, or you don't know the pattern and are getting it from another source
// Regular expression literal
var re = /ab+c/;
// Constructor function of the regex object
var re = new RegExp("ab+c");
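One practical difference: in the constructor form the pattern is a string, so every backslash has to be doubled. A minimal sketch follows, where `userWord` stands in for a value obtained at runtime (an assumption for illustration).

```javascript
// The same pattern written both ways.
const literalForm = /\d+-\d+/;
const constructedForm = new RegExp("\\d+-\\d+"); // backslashes doubled inside the string

// Both produce the same pattern source.
const samePattern = literalForm.source === constructedForm.source;

// Building a pattern from a value only known at runtime (hypothetical input).
const userWord = "regex";
const wholeWord = new RegExp("\\b" + userWord + "\\b", "i"); // "i" = case-insensitive

console.log(samePattern, wholeWord.test("Learn Regex today"), wholeWord.test("regexes"));
```

If the runtime value can contain metacharacters, it should be escaped before being placed in the pattern.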
Example
function validateName() {
  var name = document.getElementById("fullName").value;
  // Require two runs of letters separated by exactly one whitespace character
  if (!/^[A-Za-z]+\s{1}[A-Za-z]+$/.test(name)) {
    alert("Please fill in your full name");
    return false;
  } else {
    alert("Welcome " + name);
    return true;
  }
}
^ // Matches the beginning of the string
[A-Za-z] // Matches any character in the set
+ // Matches 1 or more of the preceding token
\s // Matches any whitespace character
{1} // Matches exactly 1 of the preceding token
$ // Matches the end of the string
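The same check can be tried outside the browser by moving it into a pure function. `isFullName` is a hypothetical helper name, and the pattern here uses `+` for each name part so that a lone space is rejected. This is a sketch, not a complete name validator: hyphens, apostrophes, accented letters and middle names are all refused.

```javascript
// Hypothetical helper: the name check without DOM access or alerts.
function isFullName(name) {
  // Letters, exactly one whitespace character, then letters again.
  return /^[A-Za-z]+\s[A-Za-z]+$/.test(name);
}

console.log(isFullName("Ada Lovelace"));     // true
console.log(isFullName("Ada"));              // false: no second part
console.log(isFullName(" "));                // false: no letters at all
console.log(isFullName("Ada  Lovelace"));    // false: two spaces
console.log(isFullName("Anne-Marie ONeil")); // false: hyphen is not in [A-Za-z]
```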
Regex in PHP
Different regex
- POSIX Regular Expressions
  - POSIX is a collection of standards that define some of the functionality that a (UNIX) operating system should support
  - PHP has 7 functions for searching strings with POSIX regular expressions (the ereg family, deprecated in PHP 5.3 and removed in PHP 7)
- PERL Style Regular Expressions
  - Much more powerful and flexible than the POSIX ones
  - Include the POSIX classes and anchors
Functions
// Performs a Regular Expression Search & Replace, Returning Only the Subjects That Matched
preg_filter()
// Returns Array Entries That Match the Pattern
preg_grep()
// Returns the Error Code of the Last PCRE Regex Execution
preg_last_error()
// Performs a Global Regular Expression Match
preg_match_all()
// Performs a Regular Expression Match
preg_match()
// Quote Regular Expression Characters
preg_quote()
// Performs a Regular Expression Search & Replace Using a Callback
preg_replace_callback()
// Performs a Regular Expression Search & Replace
preg_replace()
// Splits a String By a Regular Expression
preg_split()
Types of regex
- Literal characters
  - Can be a single character, a word, a phrase etc. that match themselves
- Meta-characters
  - Those that are recognised anywhere in the pattern except within square brackets
  - Those that are recognised in square brackets
- Literal and meta-character patterns must both be enclosed in delimiters, most commonly slashes "/"
Example
<?php
// Literal expression
$string = "You've found a bug in my code?";
if (preg_match("/bug/", $string)) {
    echo $string . "<br>";
    echo "Bitch, it's a feature!";
}
?>
<?php
// Meta-characters expression
$email = "test@example.org";
$expression = "/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/";
if (preg_match($expression, $email)) {
    echo "Correct!";
} else {
    echo "Fail!";
}
?>
Please check
these websites
Regular Expressions
By Kim Massaro