regex
https://regex101.com/r/9n9yhW/1
https://regex101.com/r/nnseOW/2
https://sourceforge.net/p/omegat/wiki/Filtering%20Segments/
(add ttt-docs too)
also the expressions to find source and target...
tags in omegat:
https://regex101.com/library?orderBy=RELEVANCE&search=omegat
https://regex101.com/r/A2aDKJ/1
https://regex101.com/r/l5zSWX/1
regular expressions (regex)
what is a regex?
A sequence of characters that defines a search pattern.
Matches text that follows a pattern: not a pattern in the sense of a sequence, but in the sense of having something in common. What do these pairs of strings have in common?
- test - test
- abc - xyz
- 123 - 456
- 1 - 99999999
- 1 - a
- a23 - =,&
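One possible way to answer the question above is to write, for each pair, a pattern that describes what both strings share. The patterns below are only illustrative guesses (other answers are equally valid), sketched with Python's `re` module:

```python
import re

# Each hypothetical pattern describes what both strings of a pair share.
pairs = [
    ("test", "test", r"test"),       # the same literal text
    ("abc", "xyz", r"[a-z]{3}"),     # three lowercase letters
    ("123", "456", r"\d{3}"),        # three digits
    ("1", "99999999", r"\d+"),       # one or more digits
    ("1", "a", r"\w"),               # one word character
    ("a23", "=,&", r".{3}"),         # any three characters
]

def both_match(first, second, pattern):
    """Return True if the pattern matches both strings in full."""
    return all(re.fullmatch(pattern, s) for s in (first, second))

results = [both_match(*pair) for pair in pairs]
print(results)
```

The further down the list, the less the two strings have in common, and the more general the pattern has to be.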
what are regexes useful for?
Purposes of regex
- Searching for a match (finding stuff in text)
- Segmentation
- Validating input
- QA automation
- Replacing part of the text
- Automatic or manual fixes
- Entity extraction
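Several of the purposes listed above can be sketched in a few lines. The sample text and the patterns are made up for illustration and are deliberately simplistic (the e-mail pattern, for instance, is far from a complete validator):

```python
import re

text = "Order 66 shipped on 2021-03-04 to john@example.com"

# Searching: find the first run of digits.
first_number = re.search(r"\d+", text).group()

# Validating: check that a whole string looks like a YYYY-MM-DD date.
is_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2021-03-04") is not None

# Replacing: mask every digit.
masked = re.sub(r"\d", "#", text)

# Extracting: pull out everything that looks like an e-mail address.
emails = re.findall(r"[\w.]+@[\w.]+", text)

# Segmentation: split after sentence-ending punctuation.
sentences = re.split(r"(?<=[.!?])\s+", "One. Two! Three?")

print(first_number, is_date, masked, emails, sentences)
```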
Goals of this training
- Make you less scared of using a computer to handle text (if you were!)
- Make you more independent
- Make you more capable (a bit closer to a power user)
- Make you more aware of what can be done even if you're not sure how it's done
- Encourage you to ask for help
Non-goals
- Turn you into regex champs
everything is a character
characters form strings
- literal characters: literal matches
- metacharacters: pattern matching
- backslash: escaping
two exercises
- find asterisk in text
- find text "\n"
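A minimal sketch of the two exercises above, assuming Python's `re` module and a made-up sample sentence. The asterisk and the backslash are both metacharacters, so each needs a backslash in front of it to be matched literally:

```python
import re

# Raw string: the sample contains a real backslash followed by the letter n,
# not a line break.
text = r"Use * for bold and type \n for a new line."

# \* matches a literal asterisk instead of acting as a quantifier.
asterisks = re.findall(r"\*", text)

# \\n matches a literal backslash followed by "n".
backslash_n = re.findall(r"\\n", text)

# re.escape adds the backslashes for you.
escaped = re.escape("*")

print(asterisks, backslash_n, escaped)
```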
examples
the dot
- Matches any single character
- Whether that includes the line break depends on the engine and its flags
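As a small illustration of the line-break caveat, in Python's `re` module the dot skips the line break by default and matches it only when the `re.DOTALL` flag is set. Other engines name this option differently (for example "single line" mode):

```python
import re

text = "a\nb"  # two letters separated by a line break

# Default: the dot matches everything except the line break.
default = re.findall(r".", text)

# With DOTALL, the dot matches the line break too.
dotall = re.findall(r".", text, re.DOTALL)

print(default, dotall)
```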
character classes
character class shortcuts
quantifiers
{n,m}
?
+
*
exercise
- find figures: [0-9]+
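The four quantifiers listed above can be tried out as follows. This is only a sketch with invented sample strings, using Python's `re` module:

```python
import re

text = "Room 7, floor 12, building 2024, zip 08001"

# + : one or more digits, i.e. every figure in the text.
figures = re.findall(r"[0-9]+", text)

# {n,m} : between 2 and 4 digits, as whole "words" thanks to \b.
two_to_four = re.findall(r"\b[0-9]{2,4}\b", text)

# ? : the preceding character is optional (zero or one).
spellings = re.findall(r"colou?r", "color colour")

# * : zero or more of the preceding character.
a_then_bs = re.findall(r"ab*", "a ab abbb")

print(figures, two_to_four, spellings, a_then_bs)
```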
negation
- [^...]
- negative lookaround
- uppercase shortcuts: \S, \D, \W
exercise
- find text between parentheses
- find text between angle brackets
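One possible solution to the exercise uses a negated character class: "anything that is not the closing delimiter". The sample text is invented, and the capturing group (the inner parentheses) is there only so that the delimiters themselves are left out of the result:

```python
import re

text = "See the note (page 3) and the tag <b>bold</b> (again)."

# \( and \) are literal parentheses; [^)]* is any run of non-")" characters.
in_parens = re.findall(r"\(([^)]*)\)", text)

# Same idea for angle brackets, which need no escaping.
in_angles = re.findall(r"<([^>]*)>", text)

# Uppercase shortcuts negate the lowercase ones: \D is "not a digit".
non_digits = re.findall(r"\D+", "a1bc22d")

print(in_parens, in_angles, non_digits)
```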
alternatives
Exercise after \d and |
- michael, Michael, Mike, mike
- Addresses: digit words Road or Street or abbrev
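A rough sketch of both exercises, assuming Python's `re` module. The sample sentences, the street names and the abbreviations `Rd`/`St` are made up, and real addresses would need a much more forgiving pattern:

```python
import re

# Alternation with |, combined with a character class for the first letter.
name = re.compile(r"\b(?:[Mm]ichael|[Mm]ike)\b")
names = name.findall("michael met Michael, Mike and mike.")

# A number, one or more words, then Road or Street or an abbreviation.
address = re.compile(r"\d+ (?:\w+ )+(?:Road|Street|Rd\.?|St\.?)")
candidates = ["221 Baker Street", "10 Abbey Rd.", "5 Old Kent Road", "Baker Street"]
valid = [c for c in candidates if address.fullmatch(c)]

print(names, valid)
```

The last candidate is rejected because it does not start with a number.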
group constructs
anchors
metasequences
replacements
lookarounds
exercise
- remove the angle brackets
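Two possible approaches to this exercise, sketched with Python's `re` module on an invented sample: a replacement that keeps only the captured group, and a pair of lookarounds that match the content without consuming the brackets:

```python
import re

text = "<hello> <world>"

# Replacement: capture what is inside the brackets, put back only group 1.
without_brackets = re.sub(r"<([^>]*)>", r"\1", text)

# Lookarounds: match text preceded by "<" and followed by ">",
# without including either bracket in the match.
inside = re.findall(r"(?<=<)[^>]*(?=>)", text)

print(without_brackets, inside)
```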
Blocks, example
\p{InArabic}\(
find an opening parenthesis preceded by an Arabic character
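The `\p{InArabic}` syntax works in Java's regex engine, which is the one OmegaT uses. Python's built-in `re` module does not support `\p{...}`, so the sketch below swaps in an explicit range for the Arabic block (U+0600 to U+06FF). That substitution is an assumption made only to keep the example runnable:

```python
import re

# Assumption: the Arabic Unicode block written as an explicit range,
# standing in for \p{InArabic}.
arabic_then_paren = re.compile(r"[\u0600-\u06FF]\(")

# Sample text: Latin letters, an Arabic word and digits, each followed by "(".
text = "abc( \u0633\u0644\u0627\u0645( 123("

matches = arabic_then_paren.findall(text)
print(len(matches))
```

Only the parenthesis that follows the Arabic word is found.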
flags
case sensitive
e.g. A and a are different characters! (different code points)
global
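A short sketch of both ideas, assuming Python's `re` module. Python has no "global" flag as such: `re.search` stops at the first match and `re.findall` returns all of them, which plays the same role as the global flag in testers such as regex101:

```python
import re

text = "A cat, a Cat, a CAT"

# "A" and "a" really are different characters (different code points).
code_points = (ord("A"), ord("a"))

# Case sensitive by default.
sensitive = re.findall(r"cat", text)

# The IGNORECASE flag makes the match case insensitive.
insensitive = re.findall(r"cat", text, re.IGNORECASE)

# First match only, versus all matches ("global").
first_only = re.search(r"cat", text, re.IGNORECASE).group()

print(code_points, sensitive, insensitive, first_only)
```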
Named groups
(?<token>[\d]*)
(?P<year>(?:19|20)\d\d)(?P<delimiter>[- /.])(?P<month>0[1-9]|1[012])\2(?P<day>0[1-9]|[12][0-9]|3[01])
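The date pattern above can be tried out as follows. Note that the `(?P<name>...)` spelling is the one Python's `re` module expects, whereas `(?<name>...)` is the spelling used by Java and several other engines. The sample dates are invented:

```python
import re

date = re.compile(
    r"(?P<year>(?:19|20)\d\d)(?P<delimiter>[- /.])"
    r"(?P<month>0[1-9]|1[012])\2(?P<day>0[1-9]|[12][0-9]|3[01])"
)

match = date.fullmatch("2024-02-29")
parts = match.groupdict()  # groups by name instead of by number

# \2 refers back to the second group (the delimiter), so both delimiters
# must be the same character.
mixed = date.fullmatch("2024-02/29")

# Named groups can be reused in the replacement with \g<name>.
reordered = date.sub(r"\g<day>/\g<month>/\g<year>", "2024-02-29")

print(parts, mixed, reordered)
```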
Exercises
- [Qq]uestionnaire.*\.docx?$ <- add a list of files to the text sample
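To try the first exercise, the pattern can be run against a list of file names. The names below are hypothetical, made up only to show which ones match and which ones do not:

```python
import re

# Hypothetical file names for illustration.
files = [
    "Questionnaire_FR.docx",
    "questionnaire final.doc",
    "Questionnaire.pdf",
    "notes_questionnaire.docx.bak",
    "Student questionnaire v2.docx",
]

# "Questionnaire" or "questionnaire", anything after it, then .doc or .docx
# at the very end of the name ($).
pattern = re.compile(r"[Qq]uestionnaire.*\.docx?$")

matching = [name for name in files if pattern.search(name)]
print(matching)
```

The PDF is rejected because of the extension, and the `.bak` file because `$` requires `.doc` or `.docx` to be the last thing in the name.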
References
- https://www.regexmagic.com (regex generator)
- http://regular-expressions.com (complete reference)
- https://regexr.com (js regex tester)
- https://www.princeton.edu/~mlovett/reference/Regular-Expressions.pdf (complete tutorial)
if all that was unfathomable...
here's one last technique:
do not forget to cover examples sent by Valentina in a Word file (same as found in OmegaT's regex page)
Technical
- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
- https://www.regular-expressions.info/quickstart.html
General public
- https://www.theguardian.com/technology/2012/dec/04/ict-teach-kids-regular-expressions
- https://regexcrossword.com/ <- want to play?
regex training
By msoutopico