💡 Recommendation: Use ERE (grep -E) for cleaner, more readable patterns
The Dot Metacharacter
. matches any single character (except newline)
Pattern: h.t
✓ "hat", "hot", "hit", "h9t", "h t"
✗ "ht", "hoot", "HEAT"
# Find three-letter words starting with 'c' and ending with 't'
# (the ^ and $ anchors, covered below, stop it matching inside "scatter")
grep '^c.t$' /usr/share/dict/words
# cat, cot, cut (and it would also match c@t)
# Match exactly three characters between two slashes
grep '/.../' /etc/passwd
# Matches /bin/ in /bin/bash and /usr/ in /usr/sbin/nologin
# Be careful: an unescaped dot matches ANY character, not just a period
echo "192.168.1.1" | grep '192.168.1.1' # Matches, but only because . matches anything
echo "192x168y1z1" | grep '192.168.1.1' # Also matches!
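A quick way to see the difference is to run both sample strings through three variants of the pattern. This minimal sketch uses only printf and grep: escaping each dot is the regex-level fix, and grep -F (fixed-string mode) turns off metacharacters entirely.

```shell
# Two sample lines: a real IP and a look-alike with other separators
samples=$(printf '%s\n' '192.168.1.1' '192x168y1z1')

# Unescaped dots: each . accepts any character, so both lines match
loose=$(printf '%s\n' "$samples" | grep -c '192.168.1.1')

# Escaped dots: \. only accepts a literal period
escaped=$(printf '%s\n' "$samples" | grep -c '192\.168\.1\.1')

# -F treats the whole pattern as a fixed string, no regex at all
fixed=$(printf '%s\n' "$samples" | grep -cF '192.168.1.1')

echo "loose=$loose escaped=$escaped fixed=$fixed"   # loose=2 escaped=1 fixed=1
```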
Anchors: ^ and $
^ matches start of line
Pattern: ^root
✓ "root:x:0:0..."
✗ "the root user"
$ matches end of line
Pattern: bash$
✓ "/bin/bash"
✗ "bash script"
# Find users with bash shell
grep 'bash$' /etc/passwd
# Find comment lines (starting with #)
grep '^#' /etc/ssh/sshd_config
# Find empty lines
grep '^$' /etc/ssh/sshd_config
# Find lines that are exactly "/bin/bash" (nothing before or after)
grep '^/bin/bash$' /etc/shells
Escaping Metacharacters
\ removes special meaning from the next character
# Match a literal dot (IP address)
grep '192\.168\.1\.1' /etc/hosts
# Match a literal asterisk
grep '\*\*\*' logfile
# Match a dollar sign
grep '\$HOME' script.sh
# Match a caret
grep '\^' file
# Match a backslash itself
grep '\\' /etc/fstab
⚠️ Shell Quoting: Use single quotes so the shell passes $, *, \ and other metacharacters to grep unchanged!
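The effect of quoting is easiest to see with a pattern that contains $. In this sketch HOME is pinned to /home/alice purely so the result is reproducible (an assumption; your value will differ). Single quotes hand grep the escaped \$HOME, while double quotes let the shell expand it before grep ever runs.

```shell
# Pin HOME to a sample value for a reproducible demo (assumption)
HOME=/home/alice

# A line of script text that literally contains the characters $HOME
line='cd $HOME/projects'

# Single quotes: grep receives \$HOME and matches the literal text
single=$(printf '%s\n' "$line" | grep -c '\$HOME')

# Double quotes: the shell expands $HOME to /home/alice first, so grep
# searches for a path that is not in the line. grep exits 1 when nothing
# matches, hence "|| true".
double=$(printf '%s\n' "$line" | grep -c "$HOME" || true)

echo "single=$single double=$double"   # single=1 double=0
```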
Character Classes: [ ]
[ ] matches any single character listed inside the brackets
# Match lines containing a vowel
grep '[aeiou]' /etc/passwd
# Match any digit
grep '[0123456789]' file
# Match "root" or "Root" (only the first letter varies; use grep -i for full case-insensitivity)
grep '[Rr]oot' /etc/passwd
# Match "log" followed by a dot and one digit (rotated logs such as syslog.1)
ls /var/log | grep 'log\.[0-9]'
Character Ranges
[a-z] matches any character in the range
Range          Matches             Example
[a-z]          Lowercase letters   a, b, c, ... z
[A-Z]          Uppercase letters   A, B, C, ... Z
[0-9]          Digits              0, 1, 2, ... 9
[a-zA-Z]       All letters         Any letter
[a-zA-Z0-9]    Alphanumeric        Letters and digits
[0-9a-fA-F]    Hexadecimal         0-9, a-f, A-F
# Find lines starting with uppercase letter
grep '^[A-Z]' /etc/services
⚠️ Note: ^ means negation only when it is the first character inside [ ]: [^0-9] matches any non-digit, while [0-9^] matches a digit or a literal caret
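The position rule for ^ inside brackets can be checked on three sample lines. This sketch joins output lines with commas through paste -s -d (both POSIX options) only to keep the display compact.

```shell
# Three sample lines: letters only, digits only, and one with a caret
sample=$(printf '%s\n' 'abc' '123' 'x^y')

# Caret first inside [ ] = negation: ^[^0-9]*$ keeps lines with no digits
no_digits=$(printf '%s\n' "$sample" | grep '^[^0-9]*$' | paste -sd, -)

# Caret not first = literal ^: matches a digit OR a caret
digit_or_caret=$(printf '%s\n' "$sample" | grep '[0-9^]' | paste -sd, -)

echo "$no_digits"       # abc,x^y
echo "$digit_or_caret"  # 123,x^y
```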
POSIX Character Classes
Class          Equivalent (C locale)       Matches
[[:alpha:]]    [a-zA-Z]                    Alphabetic characters
[[:digit:]]    [0-9]                       Digits
[[:alnum:]]    [a-zA-Z0-9]                 Alphanumeric
[[:space:]]    space, \t, \n, \r, \f, \v   Whitespace
[[:lower:]]    [a-z]                       Lowercase
[[:upper:]]    [A-Z]                       Uppercase
[[:punct:]]    -                           Punctuation
[[:print:]]    -                           Printable characters
The equivalents hold in the C locale; in other locales classes such as [[:alpha:]] can also match accented letters.
# Find lines with digits (locale-safe)
grep '[[:digit:]]' /var/log/messages
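A common practical use of [[:space:]] is finding trailing whitespace, since it catches spaces and tabs without listing them. The sample text in this sketch is made up for illustration.

```shell
# Sample text: line 2 ends in a space, line 3 ends in a tab
text=$(printf 'clean line\ntrailing space \ntab at end\t')

# [[:space:]]$ matches any whitespace character at end of line;
# -n prefixes line numbers and cut keeps just the numbers
bad_lines=$(printf '%s\n' "$text" | grep -n '[[:space:]]$' | cut -d: -f1 | paste -sd, -)

echo "$bad_lines"   # 2,3
```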
Quantifiers: How Many?
Specify how many times the preceding element should match
*      0 or more
+      1 or more
?      0 or 1
{n}    exactly n
{n,m}  n to m
The Asterisk: Zero or More
* matches the preceding element zero or more times
Pattern: ab*c
✓ "ac" - zero b's
✓ "abc" - one b
✓ "abbbc" - three b's
✗ "adc" - wrong character
# Match "color" or "colour" (u* also accepts "colouur"; the ERE ? below is stricter)
grep 'colou*r' file
# Match "error:" followed by at least one space
grep 'error:  *' logfile # A space, then " *" (zero or more spaces) = one or more
# Match anything (greedy!)
grep '.*' file # Matches entire line
# BRE idiom for "one or more": write the element, then repeat it with *
grep 'ss*' /etc/passwd # One or more 's' (same as grep -E 's+')
Plus and Question Mark (ERE)
+ = one or more
Pattern: ab+c
✗ "ac" - needs at least one b
✓ "abc"
✓ "abbbc"
? = zero or one
Pattern: colou?r
✓ "color"
✓ "colour"
✗ "colouur"
# ERE: Match one or more digits (must use -E)
grep -E '[0-9]+' /var/log/messages
# ERE: Optional 's' for plural
grep -E 'files?' file
# BRE equivalent (escaped; \+ and \? are GNU extensions)
grep '[0-9]\+' /var/log/messages
grep 'files\?' file
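The same pattern means different things in BRE and ERE, which shows up on input that contains a literal +. This sketch assumes GNU grep, since \+ in BRE is a GNU extension.

```shell
# One line with real repetition, one with a literal plus sign
sample=$(printf '%s\n' 'abbc' 'ab+c')

# ERE: + is a quantifier (one or more b's)
ere=$(printf '%s\n' "$sample" | grep -E 'ab+c')

# BRE without backslash: + is an ordinary character
bre_literal=$(printf '%s\n' "$sample" | grep 'ab+c')

# GNU BRE with backslash: \+ becomes the quantifier again
bre_gnu=$(printf '%s\n' "$sample" | grep 'ab\+c')

echo "$ere | $bre_literal | $bre_gnu"   # abbc | ab+c | abbc
```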
Interval Quantifiers: { }
{n,m} matches between n and m times (inclusive); this is ERE syntax (grep -E), and BRE writes it as \{n,m\}
Syntax   Meaning           Example
{3}      Exactly 3 times   [0-9]{3} = "123"
{2,4}    2 to 4 times      a{2,4} = "aa", "aaa", "aaaa"
{2,}     2 or more times   x{2,} = "xx", "xxx", ...
{0,3}    0 to 3 times      y{0,3} = "", "y", "yy", "yyy"
# Match US ZIP codes (5 digits)
grep -E '^[0-9]{5}$' zipcodes.txt
# Match ZIP+4 format (5 digits, hyphen, 4 digits)
grep -E '^[0-9]{5}-[0-9]{4}$' zipcodes.txt
# Extract 2-4 letter words (\b is a GNU word-boundary extension)
grep -oE '\b[a-zA-Z]{2,4}\b' document.txt
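Intervals combine naturally with grouping for validation. A minimal sketch, assuming a hypothetical helper named is_zip that accepts both ZIP forms in one anchored pattern; the ( )? group (see Grouping with Parentheses) makes the -NNNN suffix optional.

```shell
# Hypothetical helper: succeeds if $1 is a 5-digit ZIP or ZIP+4
is_zip() {
  # ( ... )? makes the whole "-NNNN" suffix optional
  printf '%s\n' "$1" | grep -qE '^[0-9]{5}(-[0-9]{4})?$'
}

for z in 12345 12345-6789 1234 123456 12345-67; do
  if is_zip "$z"; then echo "$z: valid"; else echo "$z: invalid"; fi
done
```

Only 12345 and 12345-6789 are reported as valid; the anchors reject the too-short, too-long, and truncated ZIP+4 inputs.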
Greedy vs. Lazy Matching
⚠️ Quantifiers are greedy by default - they match as much as possible
Text: <b>bold</b> and <b>more</b>
Pattern: <b>.*</b>
Greedy match: "<b>bold</b> and <b>more</b>"
Better pattern (POSIX regex has no lazy *?, so exclude < instead): <b>[^<]*</b>
Matches: "<b>bold</b>" then "<b>more</b>"
# Problem: greedy matching
echo '<b>first</b><b>second</b>' | grep -o '<b>.*</b>'
# Returns: <b>first</b><b>second</b>
# Solution: negated character class
echo '<b>first</b><b>second</b>' | grep -oE '<b>[^<]*</b>'
# Returns: <b>first</b>
#          <b>second</b>
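The same greedy-versus-negated-class contrast applies to sed. Stripping tags from the sample text above shows what each pattern actually consumes; this is a sketch for simple, well-formed markup, not a general HTML parser.

```shell
html='<b>bold</b> and <b>more</b>'

# Greedy: <.*> runs from the first < to the LAST >, i.e. the whole line
greedy=$(printf '%s\n' "$html" | sed 's/<.*>//g')

# Negated class: <[^>]*> cannot cross a >, so it removes one tag at a time
per_tag=$(printf '%s\n' "$html" | sed 's/<[^>]*>//g')

echo "greedy='$greedy'"     # greedy=''
echo "per_tag='$per_tag'"   # per_tag='bold and more'
```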
Alternation: The OR Operator
| matches either the expression before OR after
Pattern: cat|dog
✓ "I have a cat"
✓ "I have a dog"
✗ "I have a bird"
# Match error or warning
grep -E 'error|warning' /var/log/messages
# Match multiple file extensions
ls | grep -E '\.jpg|\.png|\.gif'
# Match different log levels
grep -E 'ERROR|WARN|FATAL' application.log
# BRE requires escape (\| is a GNU extension)
grep 'error\|warning' /var/log/messages
Grouping with Parentheses
( ) groups expressions for quantifiers and alternation
Without Grouping
Pattern: ab+
Matches: a followed by one or more b's
✓ "ab", "abb", "abbb"
With Grouping
Pattern: (ab)+
Matches: "ab" one or more times
✓ "ab", "abab", "ababab"
# Repeat a group
grep -E '(na)+' lyrics.txt # "na", "nana", "nanana"
# Group with alternation
grep -E 'http(s)?://' urls.txt # http:// or https://
# Complex grouping
grep -E '(Mon|Tue|Wed|Thu|Fri)day' calendar.txt
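Grouping also controls how far an anchor reaches in an alternation. In this sketch, with made-up log lines, ^ERROR|WARN means "ERROR at the start, or WARN anywhere", while ^(ERROR|WARN) anchors both alternatives.

```shell
log=$(printf '%s\n' 'ERROR disk full' 'INFO retry after WARN' 'WARN low memory')

# Without grouping, ^ binds only to ERROR
ungrouped=$(printf '%s\n' "$log" | grep -cE '^ERROR|WARN')

# With grouping, ^ applies to both alternatives
grouped=$(printf '%s\n' "$log" | grep -cE '^(ERROR|WARN)')

echo "ungrouped=$ungrouped grouped=$grouped"   # ungrouped=3 grouped=2
```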
Backreferences
\1, \2 reference previously matched groups
# Find repeated words
grep -E '\b([a-z]+)\s+\1\b' document.txt
# Matches: "the the", "is is", etc.
# Find lines where the first and last words are the same
# (\b forces \1 to be a whole word at both ends, not just shared letters)
grep -E '^([a-zA-Z]+)\b.*\b\1$' file
# Match HTML tags with matching close tags
grep -E '<([a-z]+)>.*</\1>' file.html
# Find duplicate lines: grep works line by line, so back-references cannot span lines; use uniq
sort file | uniq -d
💡 Use Case: Finding duplicate words, validating paired elements, data consistency checks
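To see exactly what the repeated-word pattern catches, -o prints only the matched text. This sketch uses invented sample sentences and assumes GNU grep, where \b and back-references in -E mode are extensions; a plain ' +' stands in for \s to keep the pattern simple.

```shell
text=$(printf '%s\n' 'this is is a test' 'no repeats here' 'the the end')

# \b = word boundary; ([a-z]+) captures a word; \1 requires the same word again
dupes=$(printf '%s\n' "$text" | grep -oE '\b([a-z]+) +\1\b' | paste -sd, -)

echo "$dupes"   # is is,the the
```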
grep: Pattern Searching
The primary tool for regex searching in Linux
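grep             Basic Regular Expressions
grep -E / egrep  Extended Regular Expressions
grep -F / fgrep  Fixed strings (no regex)
egrep and fgrep are obsolescent aliases (recent GNU grep prints a warning); prefer grep -E and grep -F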
Essential grep Options
Option   Description            Example
-i       Case insensitive       grep -i 'error'
-v       Invert match           grep -v '^#'
-c       Count matching lines   grep -c 'pattern'
-n       Show line numbers      grep -n 'TODO'
-l       List filenames only    grep -l 'main' *.c
-o       Only matching part     grep -oE '[0-9]+'
-r       Recursive search       grep -r 'config' /etc
-w       Whole word match       grep -w 'is'
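The -c row is easy to misread: it counts matching lines, not matches. Pairing -o with wc -l counts individual occurrences instead, as this small sketch on sample input shows.

```shell
sample=$(printf '%s\n' 'error error' 'ok' 'error')

# -c counts matching LINES, not occurrences
lines=$(printf '%s\n' "$sample" | grep -c 'error')

# -o prints each match on its own line, so wc -l counts occurrences;
# tr strips the padding some wc implementations add
occurrences=$(printf '%s\n' "$sample" | grep -o 'error' | wc -l | tr -d ' ')

echo "lines=$lines occurrences=$occurrences"   # lines=2 occurrences=3
```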
Context Options
# Show 3 lines BEFORE match
grep -B3 'error' /var/log/messages
# Show 3 lines AFTER match
grep -A3 'error' /var/log/messages
# Show 3 lines before AND after (context)
grep -C3 'error' /var/log/messages
# Combine with other options
grep -B2 -A2 -n 'Exception' application.log
# Sample output of grep -C2 'error' (grep prints -- between separate groups of context)
--
May 15 10:23:45 server process[1234]: Starting operation
May 15 10:23:46 server process[1234]: Loading config
May 15 10:23:47 server process[1234]: error: config not found
May 15 10:23:48 server process[1234]: Falling back to defaults
May 15 10:23:49 server process[1234]: Continuing...
sed: Stream Editor
sed applies text transformations using regular expressions
# Basic syntax
sed 's/pattern/replacement/' file
sed 's/pattern/replacement/g' file # Global (all occurrences)
# In-place editing
sed -i 's/old/new/g' file # Modifies file directly
sed -i.bak 's/old/new/g' file # Creates backup first
⚠️ Critical: sed -i modifies files directly! Preview the output without -i first, or use -i.bak to keep a backup.
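A safe in-place workflow is to preview first, then apply with a backup suffix. This sketch works on a throwaway file from mktemp; -i.bak is accepted by both GNU and BSD sed, whereas a bare -i behaves differently on BSD/macOS.

```shell
# Work on a throwaway file; mktemp creates a unique temporary file
tmp=$(mktemp)
printf '%s\n' 'color=red' 'color=blue' > "$tmp"

# 1. Preview: without -i, sed only prints the result
preview=$(sed 's/color/colour/' "$tmp")

# 2. Apply: -i.bak edits in place and keeps the original as $tmp.bak
sed -i.bak 's/color/colour/' "$tmp"

edited=$(cat "$tmp")
backup=$(cat "$tmp.bak")
rm -f "$tmp" "$tmp.bak"

printf '%s\n' "$edited"   # colour=red, then colour=blue
```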
sed Substitution Patterns
# Basic substitution
sed 's/error/ERROR/' logfile
# Global substitution (all occurrences on line)
sed 's/old/new/g' file
# Case insensitive (the I/i flag is a GNU sed extension)
sed 's/error/ERROR/gi' file
# Delete matching lines
sed '/pattern/d' file
# Delete empty lines
sed '/^$/d' file
# Delete comments
sed '/^#/d' /etc/config
# Multiple operations
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file
# Using different delimiter (useful for paths)
sed 's|/usr/local|/opt|g' file
sed with Capture Groups
# Swap first two fields (colon-separated)
sed 's/\([^:]*\):\([^:]*\)/\2:\1/' /etc/passwd
# ERE syntax (cleaner)
sed -E 's/([^:]*):([^:]*)/\2:\1/' /etc/passwd
# Reformat date: MM/DD/YYYY to YYYY-MM-DD
sed -E 's|([0-9]{2})/([0-9]{2})/([0-9]{4})|\3-\1-\2|g' dates.txt
# Add prefix to captured content
sed -E 's/^([0-9]+)/ID: \1/' file
# Surround matches with tags
sed -E 's/([0-9]{3}-[0-9]{4})/PHONE:\1:PHONE/g' contacts.txt
# Remove duplicate words
sed -E 's/\b([a-z]+)\s+\1\b/\1/g' document.txt
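Running the date rewrite on a sample line confirms the group order. This sketch uses | as the s delimiter so the slashes inside the date need no escaping; the input text is invented.

```shell
# Sample line with two dates in MM/DD/YYYY form
input='Released 05/15/2024, patched 06/01/2024'

# \1 = MM, \2 = DD, \3 = YYYY; reassembled as YYYY-MM-DD
output=$(printf '%s\n' "$input" | sed -E 's|([0-9]{2})/([0-9]{2})/([0-9]{4})|\3-\1-\2|g')

echo "$output"   # Released 2024-05-15, patched 2024-06-01
```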
sed Address Ranges
# Apply only to line 5
sed '5s/old/new/' file
# Apply to lines 5-10
sed '5,10s/old/new/' file
# Apply from line 5 to end
sed '5,$s/old/new/' file
# Apply to lines matching pattern
sed '/^#/s/old/new/' file
# Apply between two patterns
sed '/START/,/END/s/old/new/' file
# Delete from pattern to end of file
sed '/pattern/,$d' file
# Print only lines 10-20
sed -n '10,20p' file
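Pattern ranges include both boundary lines, which a five-line sample makes visible. This sketch applies the /START/,/END/ form shown above to invented input.

```shell
sample=$(printf '%s\n' 'old 1' 'START' 'old 2' 'END' 'old 3')

# Only lines from START through END (inclusive) get the substitution
result=$(printf '%s\n' "$sample" | sed '/START/,/END/s/old/new/' | paste -sd, -)

echo "$result"   # old 1,START,new 2,END,old 3
```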
awk: Pattern Processing
awk combines regex pattern matching with field processing
# Match at beginning of line
awk '/^root/' /etc/passwd
# Match at end of line
awk '/bash$/' /etc/passwd
# Match specific field
awk -F: '$7 ~ /bash/' /etc/passwd # Field 7 contains "bash"
awk -F: '$7 == "/bin/bash"' /etc/passwd # Field 7 equals exactly
# Negation
awk -F: '$7 !~ /nologin/' /etc/passwd # Field 7 doesn't contain
# Complex conditions
awk -F: '$3 >= 1000 && $7 ~ /bash/' /etc/passwd
# Range pattern: from a line matching /start/ through the next line matching /end/
awk '/start/,/end/' file
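Matching a field with ~ and comparing it with == can give different answers on real data. This sketch feeds awk a few hypothetical passwd-style lines (names and paths are invented) to show both.

```shell
# Hypothetical passwd-style lines: name:x:UID:GID:gecos:home:shell
users=$(printf '%s\n' \
  'alice:x:1000:1000::/home/alice:/bin/bash' \
  'daemon:x:1:1::/usr/sbin:/usr/sbin/nologin' \
  'bob:x:1001:1001::/home/bob:/bin/zsh' \
  'carol:x:1002:1002::/home/carol:/usr/bin/bash')

# Regular users (UID >= 1000) whose shell field contains "bash"
matched=$(printf '%s\n' "$users" | awk -F: '$3 >= 1000 && $7 ~ /bash/ { print $1 }' | paste -sd, -)

# Exact string comparison: /usr/bin/bash is NOT equal to /bin/bash
exact=$(printf '%s\n' "$users" | awk -F: '$7 == "/bin/bash" { print $1 }' | paste -sd, -)

echo "$matched"   # alice,carol
echo "$exact"     # alice
```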
awk Practical Examples
# Sum values in a column
awk '{ sum += $1 } END { print sum }' numbers.txt
# Average of the last field over lines matching "error" (guards against division by zero)
awk '/error/ { count++; sum += $NF } END { if (count) print sum/count }' log
# Extract unique values
awk -F: '{ print $7 }' /etc/passwd | sort -u
# Format output
awk -F: '{ printf "%-15s %s\n", $1, $7 }' /etc/passwd
# Count pattern occurrences by category
awk '/error/ { errors++ } /warning/ { warnings++ }
END { print "Errors:", errors+0, "Warnings:", warnings+0 }' log # +0 prints 0 instead of blank
# Process Apache logs - count requests per IP
awk '{ ips[$1]++ } END { for (ip in ips) print ip, ips[ip] }' access.log
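Because for (ip in ips) visits keys in an unspecified order, the per-IP counts are usually piped through sort. This sketch uses three invented access-log lines in Apache common log format.

```shell
# Hypothetical sample lines in Apache common log format
log=$(printf '%s\n' \
  '10.0.0.1 - - [15/May/2024:10:23:45 +0000] "GET / HTTP/1.1" 200 512' \
  '10.0.0.2 - - [15/May/2024:10:23:46 +0000] "GET /a HTTP/1.1" 404 128' \
  '10.0.0.1 - - [15/May/2024:10:23:47 +0000] "GET /b HTTP/1.1" 200 256')

# Count per IP, then sort by the count (field 2), highest first
counts=$(printf '%s\n' "$log" \
  | awk '{ ips[$1]++ } END { for (ip in ips) print ip, ips[ip] }' \
  | sort -k2,2nr)

echo "$counts"   # 10.0.0.1 2, then 10.0.0.2 1
```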
Common Patterns Library
Frequently used regex patterns for system administration
IP Address Pattern
[0-9]{1,3}   1-3 digits
\.           literal dot
[0-9]{1,3}   1-3 digits
\.           literal dot
[0-9]{1,3}   1-3 digits
\.           literal dot
[0-9]{1,3}   1-3 digits
# Simple IP pattern (matches invalid IPs like 999.999.999.999)
grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' log
# Extract IPs from Apache log
awk '{ print $1 }' access.log | grep -oE '[0-9.]+' | sort -u
# Count requests per IP (head shows the top 10)
grep -oE '^[0-9.]+' access.log | sort | uniq -c | sort -rn | head
# Find specific subnet
grep -E '192\.168\.[0-9]+\.[0-9]+' /var/log/messages
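When the simple pattern's acceptance of 999.999.999.999 matters, each octet can be restricted to 0-255 with alternation. This is a sketch under stated assumptions: dotted-quad IPv4 only, no leading zeros, and a hypothetical helper named is_ipv4.

```shell
# One octet: 250-255 | 200-249 | 100-199 | 0-99 (no leading zeros)
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Four octets separated by literal dots, anchored to the whole line
ip_re='^'"$octet"'(\.'"$octet"'){3}$'

# Hypothetical helper: succeeds only for dotted-quad IPv4 in 0-255
is_ipv4() {
  printf '%s\n' "$1" | grep -qE "$ip_re"
}

for ip in 192.168.1.1 255.255.255.255 256.1.1.1 999.999.999.999 1.2.3; do
  if is_ipv4 "$ip"; then echo "$ip: valid"; else echo "$ip: invalid"; fi
done
```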
Email and URL Patterns
# Basic email pattern
grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' contacts.txt
# URL pattern
grep -oE 'https?://[^[:space:]]+' document.txt
# Domain extraction from URL
grep -oE 'https?://[^/]+' urls.txt | sed -E 's|https?://||'
# Find mailto links in HTML
grep -oE 'mailto:[^"]+' page.html
# Validate URL format (bash; keep the regex unquoted on the right of =~)
if [[ "$URL" =~ ^https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} ]]; then
  echo "Valid URL"
fi
💡 Note: RFC-compliant email/URL validation is complex. These patterns work for common cases.
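Bash can also pull the pieces of a match out through the BASH_REMATCH array. This sketch requires bash (not plain sh); the URL is an example value, and storing the regex in a variable avoids quoting problems after =~.

```shell
url='https://example.com/docs/page?id=1'

# Keep the regex in a variable and leave it UNQUOTED after =~;
# quoting it would make bash compare a literal string instead
re='^(https?)://([^/]+)(/.*)?$'

if [[ $url =~ $re ]]; then
  scheme=${BASH_REMATCH[1]}   # https
  host=${BASH_REMATCH[2]}     # example.com
  path=${BASH_REMATCH[3]}     # /docs/page?id=1
  echo "scheme=$scheme host=$host path=$path"
fi
```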