Regular expressions, commonly known as regex, are sequences of characters that define a search pattern. They are used to match, search, extract, and transform text. Regex syntax is a small language of its own, supported (with some dialect differences) by most programming languages and text editors.
Purpose and Applications:
* Validation: Ensuring user input (e.g., email addresses, phone numbers, passwords) conforms to specific formats.
* Searching: Finding specific patterns of text within larger bodies of text (e.g., all URLs, dates, or specific words).
* Extraction: Pulling out specific pieces of information from unstructured or semi-structured text.
* Replacement: Modifying text by replacing patterns with other text.
* Parsing: Breaking down text into meaningful components.
Key Regex Syntax Elements (Commonly Used):
* Literals: Most characters match themselves (e.g., `a` matches 'a', `hello` matches 'hello').
* Metacharacters: Special characters with specific meanings:
    * `.`: Matches any single character (except newline, by default).
    * `*`: Matches zero or more occurrences of the preceding character or group.
    * `+`: Matches one or more occurrences of the preceding character or group.
    * `?`: Matches zero or one occurrence of the preceding character or group (makes it optional). Placed after another quantifier (e.g., `*?`, `+?`), it makes that quantifier non-greedy.
    * `[]`: Character set. Matches any one character within the brackets (e.g., `[aeiou]` matches any lowercase vowel). Ranges can be specified (e.g., `[a-z]` for lowercase letters, `[0-9]` for digits).
    * `[^...]`: Negated character set. Matches any one character *not* within the brackets.
    * `()`: Grouping and capturing. Treats multiple characters as a single unit and captures the matched text for later use (e.g., backreferences or replacement).
    * `|`: OR operator. Matches either the pattern before or after it (e.g., `cat|dog` matches 'cat' or 'dog').
    * `^`: Anchor. Matches the beginning of the string, or the beginning of each line when multi-line mode is enabled.
    * `$`: Anchor. Matches the end of the string, or the end of each line when multi-line mode is enabled.
    * `\`: Escape character. Used to treat a metacharacter as a literal character (e.g., `\.` matches a literal dot, `\$` matches a literal dollar sign). Also used to introduce special sequences.
    * `{n}`: Quantifier. Matches exactly `n` occurrences of the preceding element.
    * `{n,}`: Quantifier. Matches `n` or more occurrences.
    * `{n,m}`: Quantifier. Matches between `n` and `m` occurrences (inclusive).
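* Special Sequences (Character Classes):
    * `\d`: Matches any digit. In ASCII-only mode this is equivalent to `[0-9]`; Unicode-aware engines (including Rust's `regex` crate by default) also match decimal digits from other scripts.
    * `\D`: Matches any non-digit character. In ASCII-only mode, equivalent to `[^0-9]`.
    * `\w`: Matches any word character (alphanumeric + underscore). In ASCII-only mode, equivalent to `[a-zA-Z0-9_]`; Unicode-aware engines also match letters and digits from other scripts.
    * `\W`: Matches any non-word character.
    * `\s`: Matches any whitespace character (space, tab, newline, etc.).
    * `\S`: Matches any non-whitespace character.
    * `\b`: Word boundary. Matches the position between a word character and a non-word character, including the start or end of the text next to a word character.
    * `\B`: Non-word boundary. Matches any position that is not a word boundary.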
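To make a few of the metacharacters above concrete, here is a minimal sketch of a toy matcher in the style of Rob Pike's well-known small recursive matcher. It supports only literals, `.`, `*` after a single character, `^`, and `$`, and it uses nothing beyond the standard library. The function names (`is_match`, `match_here`, `match_star`) are chosen for this illustration. This is a teaching aid only: production engines, including Rust's `regex` crate, are built very differently.

```rust
/// Returns true if `pattern` matches anywhere in `text`.
/// Supported syntax: literals, `.`, `*` (after a single character), `^`, `$`.
fn is_match(pattern: &str, text: &str) -> bool {
    let pat: Vec<char> = pattern.chars().collect();
    let txt: Vec<char> = text.chars().collect();

    // A leading `^` anchors the match to the start of the text.
    if pat.first() == Some(&'^') {
        return match_here(&pat[1..], &txt);
    }
    // Otherwise try every starting position, including the empty suffix.
    (0..=txt.len()).any(|start| match_here(&pat, &txt[start..]))
}

/// Matches `pat` against the beginning of `txt`.
fn match_here(pat: &[char], txt: &[char]) -> bool {
    match pat {
        // An empty pattern matches anything.
        [] => true,
        // `c*`: zero or more occurrences of `c`, then the rest of the pattern.
        [c, '*', rest @ ..] => match_star(*c, rest, txt),
        // A trailing `$` matches only at the end of the text.
        ['$'] => txt.is_empty(),
        // A literal or `.` consumes exactly one character.
        [c, rest @ ..] => match txt {
            [t, tail @ ..] if *c == '.' || c == t => match_here(rest, tail),
            _ => false,
        },
    }
}

/// Matches `c*` followed by `rest` against the beginning of `txt`.
fn match_star(c: char, rest: &[char], txt: &[char]) -> bool {
    let mut i = 0;
    loop {
        // Try the remainder of the pattern after consuming `i` copies of `c`.
        if match_here(rest, &txt[i..]) {
            return true;
        }
        // Consume one more character if it still matches `c`.
        if i < txt.len() && (c == '.' || txt[i] == c) {
            i += 1;
        } else {
            return false;
        }
    }
}
```

With this sketch, `is_match("^ab*c$", "abbbc")` succeeds because `b*` absorbs the three `b`s, while `is_match("dog$", "dogs")` fails because `$` requires the text to end right after `dog`.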
In Rust:
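Rust's standard library does not include regular expressions; the widely used `regex` crate fills that role. It guarantees search time that grows linearly with the size of the input text, and in exchange it does not support look-around or backreferences inside patterns. It provides methods for compiling a pattern, testing for a match, finding all occurrences, reading capture groups, and replacing text.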
Example Code
```rust
// To use the regex crate, add it to your Cargo.toml:
// [dependencies]
// regex = "1"
use regex::Regex;

fn main() {
    // 1. Creating a Regex object
    // `Regex::new` compiles a pattern from a string and returns a `Result`;
    // we call `unwrap()` for simplicity because these patterns are known to be valid.
    // The raw string literal `r"..."` lets us write each backslash once,
    // exactly as it appears in the regex, with no extra escaping for Rust.
    // This pattern matches runs of exactly 4 word characters between word boundaries.
    let re = Regex::new(r"\b\w{4}\b").unwrap();
    let text = "The quick brown fox jumps over the lazy dog.";
    println!("Original text: \"{}\"", text);

    // 2. Checking for a match
    // `is_match` returns true if the regex matches anywhere in the text.
    if re.is_match(text) {
        // `{{` and `}}` print literal braces inside a format string.
        println!("\nPattern '\\b\\w{{4}}\\b' (4-letter words) found in the text.");
    }

    // 3. Finding all occurrences
    // `find_iter` returns an iterator over all non-overlapping matches.
    println!("\nAll 4-letter words found:");
    for mat in re.find_iter(text) {
        println!("- {}", mat.as_str());
    }

    // 4. Using capturing groups
    // This regex matches a date in YYYY-MM-DD format and captures year, month, day.
    // Parentheses `()` create capturing groups.
    let date_re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
    let date_text = "Today's date is 2023-10-27 and tomorrow will be 2023-10-28.";
    println!("\nText with dates: \"{}\"", date_text);

    // `captures` finds the first match and returns an Option<Captures>.
    if let Some(captures) = date_re.captures(date_text) {
        println!("\nFirst date found and captured groups:");
        // captures.get(0) is the entire matched string.
        // captures.get(1) is the first capturing group (year).
        // captures.get(2) is the second capturing group (month), etc.
        println!("  Full match: {}", captures.get(0).map_or("", |m| m.as_str()));
        println!("  Year: {}", captures.get(1).map_or("", |m| m.as_str()));
        println!("  Month: {}", captures.get(2).map_or("", |m| m.as_str()));
        println!("  Day: {}", captures.get(3).map_or("", |m| m.as_str()));
    }

    // `captures_iter` returns an iterator over all matches, each containing its captures.
    println!("\nAll dates and their components:");
    for caps in date_re.captures_iter(date_text) {
        println!("  Match: {}", caps.get(0).map_or("", |m| m.as_str()));
        println!("    Year: {}", caps.get(1).map_or("", |m| m.as_str()));
        println!("    Month: {}", caps.get(2).map_or("", |m| m.as_str()));
        println!("    Day: {}", caps.get(3).map_or("", |m| m.as_str()));
    }

    // 5. Replacing text
    let replace_re = Regex::new(r"fox|dog").unwrap(); // Matches 'fox' or 'dog'
    let replaced_text = replace_re.replace_all(text, "CAT");
    println!("\nOriginal text for replacement: \"{}\"", text);
    println!("Text after replacing 'fox' or 'dog' with 'CAT': \"{}\"", replaced_text);

    // Replacing using captured groups
    let email_re = Regex::new(r"(\w+)@(\w+\.\w+)").unwrap(); // Matches user@domain.tld
    let email_text = "Contact support@example.com or info@domain.org.";
    // In the replacement string, '$1', '$2', etc., refer to captured groups.
    // Here, we rewrite each address as username@...domain
    let masked_email_text = email_re.replace_all(email_text, "$1@...$2");
    println!("\nOriginal email text: \"{}\"", email_text);
    println!("Masked email text ($1@...$2): \"{}\"", masked_email_text);
}
```







