Crate compile_regex

Compile-time regular expression validation and parsing.

This library provides compile-time validation and parsing for regular expressions. It has only a lightweight compile-fmt dependency (to produce better panic messages) and is no-std / no-alloc compatible. Unlike some alternatives, it does not wrap a proc macro.

The library strives to be compatible with regex / regex-syntax crates; it applies the same approach to parsing as the latter. It only implements parsing / validation; i.e., it does not produce automata for matching against a regex. On the other hand, almost all of regex syntax is supported:

  • Whitespace control (x / -x flags) is correctly accounted for during parsing.
  • Duplicate capture names are correctly checked for, so e.g. (?<t>.)(?<t>.) is invalid.
  • Counted repetition ranges are checked, so e.g. .{2,1} is invalid.
  • Char ranges in char sets are checked, so e.g. [9-0] is invalid.

§Why?

The main use case is checking whether a particular string constitutes a valid regex so that it can be supplied to a Regex constructor, e.g. via a LazyLock.

Ultimately, it’s a benchmark of how far one can take compile-time computations in Rust just by using a bunch of const fns. As it turns out, it can get you pretty far.
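
To give a flavor of the technique, here is a minimal sketch of a compile-time check written as a plain const fn. It is an illustration only, not the crate's implementation: groups_balanced is a hypothetical helper that checks bracket balance and nesting depth, and it deliberately ignores char sets such as [(].

```rust
/// Hypothetical helper (not part of compile_regex): returns `true` if round
/// brackets are balanced and nested at most `max_depth` deep.
/// Simplification: brackets inside char sets like `[(]` are not special-cased.
const fn groups_balanced(regex: &str, max_depth: usize) -> bool {
    let bytes = regex.as_bytes();
    let mut depth = 0_usize;
    let mut i = 0;
    // `for` loops are not available in const fns, so iterate with `while`.
    while i < bytes.len() {
        match bytes[i] {
            // Skip the byte following a backslash, so `\(` is a literal.
            b'\\' => i += 1,
            b'(' => {
                depth += 1;
                if depth > max_depth {
                    return false;
                }
            }
            b')' => {
                if depth == 0 {
                    return false;
                }
                depth -= 1;
            }
            _ => {}
        }
        i += 1;
    }
    depth == 0
}

// Evaluated by the compiler: if the check returned `false`,
// the build would fail instead of the program panicking at run time.
const _: () = assert!(groups_balanced(r"(a(b))\(", 8));
```

The same function can also be called at run time; moving the call into a const item is what turns a failed check into a build error.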

§Limitations

  • Unicode classes (\p and \P escapes) are not supported since it’s almost impossible to check these at compile time.
  • The depth of group nesting is limited to 8. (Nesting is used internally for whitespace control, i.e., the x flag.)
  • The number of named captures is limited to 16.
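
Fixed limits like these are typical of const evaluation without allocation, where storage has to be sized up front. The sketch below shows one way to build a bounded-capacity stack from const fns. BoundedStack and TWO_DEEP are hypothetical names used for illustration; the crate's own Stack type may be designed differently.

```rust
/// Hypothetical fixed-capacity stack usable in const contexts.
/// This is not the crate's `Stack` type.
#[derive(Clone, Copy)]
struct BoundedStack<const N: usize> {
    items: [usize; N],
    len: usize,
}

impl<const N: usize> BoundedStack<N> {
    const fn new() -> Self {
        Self { items: [0; N], len: 0 }
    }

    const fn len(&self) -> usize {
        self.len
    }

    /// Returns the updated stack, or `None` if the capacity is exhausted.
    /// Taking `self` by value avoids mutable references in a const fn.
    const fn push(mut self, item: usize) -> Option<Self> {
        if self.len == N {
            return None;
        }
        self.items[self.len] = item;
        self.len += 1;
        Some(self)
    }
}

// Computed by the compiler; overflowing the stack here would be a build error.
const TWO_DEEP: usize = {
    let stack = BoundedStack::<8>::new();
    let stack = match stack.push(0) {
        Some(stack) => stack,
        None => panic!("nesting limit exceeded"),
    };
    let stack = match stack.push(5) {
        Some(stack) => stack,
        None => panic!("nesting limit exceeded"),
    };
    stack.len()
};
```

Because the capacity N is part of the type, exceeding it can only be reported, never recovered from by growing, which is why such limits surface as documented restrictions.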

§Alternatives / similar tools

  • Use regex or regex-syntax if you don’t need compile-time validation / parsing.
  • There are a couple of crates that use regex + proc macro to move regex validation to compile time, for example, regex_static.
  • ere parses and compiles regular expressions at compile time. It supports POSIX extended regexes (i.e., a strict subset of what regex supports), and still uses proc macros.

§Crate features

§alloc

(On by default)

Enables support of alloc types, such as Vec in RegexOptions::try_parse_to_vec().

§std

(On by default)

Enables support of the standard library types, e.g. the Error trait implementation for Error.

§Examples

use compile_regex::{ast, parse, validate};

// Validate a regex for phone numbers.
const _: () = validate(r"(?<code>\+1\s*)?\(\d{3}\)\d{3}-\d{4}");
// Parse the same regex with whitespace and additional named captures
const PHONE_REGEX: &str = r"(?x)
    (?<intl> \+1\s*)? # International prefix
    (?<city> \( \d{3} \)) # City code
    \s*
    (?<num> \d{3}-\d{4})";
const SYNTAX: &[ast::Spanned] = parse!(PHONE_REGEX);

println!("{SYNTAX:#?}");

// Get all named groups in the regex.
let group_names = SYNTAX.iter().filter_map(|spanned| {
    if let ast::Node::GroupStart { name: Some(name), .. } = &spanned.node {
        return Some(&PHONE_REGEX[name.name]);
    }
    None
});
let group_names: Vec<_> = group_names.collect();
assert_eq!(group_names, ["intl", "city", "num"]);

§Errors

If the validate() function or the parse! macro fails, it raises a compile-time error:

// Fails because '+' is not escaped and is thus treated
// as a one-or-more quantifier
const _: () = validate(r"(?<code>+1\s*)?");

Getting information about an error:

use compile_regex::{try_validate, Error, ErrorKind};

const ERR: Error = match try_validate(r"(?<code>+1\s*)?") {
    Ok(_) => panic!("validation succeeded"),
    Err(err) => err,
};

// `matches!` comes from the standard library, so no extra import is needed.
assert!(matches!(ERR.kind(), ErrorKind::MissingRepetition));
assert_eq!(ERR.pos(), 8..9);

§See also

See RegexOptions docs for more advanced use cases.

Modules§

ast
AST definitions. Because of const-ness, we support only very primitive AST spanning.

Macros§

parse
Produces spanned syntax nodes for the provided regex. The regex must be a constant expression (but not necessarily a string literal).

Structs§

Error
Error when parsing / validating regular expressions.
RegexOptions
Regular expression parsing options.
Stack
Bounded-capacity stack with const fn operations. Used to store syntax spans via Syntax type alias.
ValidationOutput
Result of validating a regular expression.

Enums§

ErrorKind
Kind of a regex validation Error.

Functions§

try_validate
Tries to validate the provided regular expression with the default options.
validate
Validates the provided regular expression, panicking on errors. This is a shortcut for try_validate().unwrap().