Compile-time regular expression validation and parsing.
This library provides compile-time validation and parsing for regular expressions.
It has only a lightweight compile-fmt dependency (to produce better panic messages)
and is no-std / no-alloc compatible. Unlike some alternatives, it does not wrap a proc macro.
The library strives to be compatible with regex / regex-syntax crates; it applies
the same approach to parsing as the latter. It only implements parsing / validation; i.e.,
it does not produce automata for matching against a regex. On the other hand, almost all of
regex syntax is supported:
- Whitespace control (x / -x flags) is correctly accounted for during parsing.
- Duplicate capture names are correctly checked for, so e.g. (?<t>.)(?<t>.) is invalid.
- Counted repetition ranges are checked, so e.g. .{2,1} is invalid.
- Char ranges in char sets are checked, so e.g. [9-0] is invalid.
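To make the counted-repetition check concrete, here is a minimal, std-only sketch of how such a check can be written with const fns. It is not the crate's implementation: RangeError, parse_number and check_counted_ranges are hypothetical names introduced for illustration, and the toy scanner ignores char sets and most other regex syntax.

```rust
/// Error kinds of this toy checker (hypothetical; not the crate's `ErrorKind`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum RangeError {
    /// `{` is not followed by a well-formed `min}`, `min,}` or `min,max}`.
    Malformed,
    /// `min` is greater than `max`, as in `.{2,1}`.
    Inverted,
}

/// Parses decimal digits starting at `start`.
/// Returns the value and the position right after the digits.
const fn parse_number(bytes: &[u8], start: usize) -> Option<(u32, usize)> {
    let mut pos = start;
    let mut value: u32 = 0;
    while pos < bytes.len() && bytes[pos].is_ascii_digit() {
        let digit = (bytes[pos] - b'0') as u32;
        // `?` is not available in `const fn`, hence the explicit matches.
        value = match value.checked_mul(10) {
            Some(scaled) => match scaled.checked_add(digit) {
                Some(sum) => sum,
                None => return None,
            },
            None => return None,
        };
        pos += 1;
    }
    if pos == start { None } else { Some((value, pos)) }
}

/// Checks every `{min,max}` repetition in `pattern` (char sets are NOT handled).
const fn check_counted_ranges(pattern: &str) -> Result<(), RangeError> {
    let bytes = pattern.as_bytes();
    let mut pos = 0;
    while pos < bytes.len() {
        match bytes[pos] {
            // Skip the escaped byte so that `\{` is not treated as a repetition.
            b'\\' => pos += 2,
            b'{' => {
                let (min, after_min) = match parse_number(bytes, pos + 1) {
                    Some(parsed) => parsed,
                    None => return Err(RangeError::Malformed),
                };
                if after_min >= bytes.len() {
                    return Err(RangeError::Malformed);
                }
                pos = match bytes[after_min] {
                    b'}' => after_min + 1, // exact count, e.g. `{3}`
                    b',' => {
                        if after_min + 1 < bytes.len() && bytes[after_min + 1] == b'}' {
                            after_min + 2 // open-ended range, e.g. `{2,}`
                        } else {
                            let (max, after_max) = match parse_number(bytes, after_min + 1) {
                                Some(parsed) => parsed,
                                None => return Err(RangeError::Malformed),
                            };
                            if after_max >= bytes.len() || bytes[after_max] != b'}' {
                                return Err(RangeError::Malformed);
                            }
                            if min > max {
                                return Err(RangeError::Inverted);
                            }
                            after_max + 1
                        }
                    }
                    _ => return Err(RangeError::Malformed),
                };
            }
            _ => pos += 1,
        }
    }
    Ok(())
}

// Evaluated by the compiler: the build fails if the pattern is rejected.
const _: () = assert!(check_counted_ranges(r"\d{3}-\d{4}").is_ok());
// Errors are ordinary constants as well.
const INVERTED: Result<(), RangeError> = check_counted_ranges(".{2,1}");
```

Because const fns cannot use the ? operator or allocate, the checker is written with explicit match expressions and index-based loops over the pattern bytes.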
§Why?
The main use case is checking whether a particular string constitutes a valid regex so that
it can be supplied to a Regex constructor, e.g. via a LazyLock.
Ultimately, it’s a benchmark of how far one can take compile-time computations in Rust just by using
a bunch of const fns. As it turns out, it can get you pretty far.
§Limitations
- Unicode classes (\p and \P escapes) are not supported since it’s almost impossible to check these at compile time.
- The depth of group nesting is limited to 8. (Nesting is used internally for whitespace control, i.e., the x flag.)
- The number of named captures is limited to 16.
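Fixed limits of this kind are typical for no-alloc const code, where all storage has to be a fixed-size array. The sketch below illustrates the general technique with a bounded stack that tracks open groups. It is an assumption about how such limits can arise, not the crate's actual Stack type: BoundedStack, MAX_DEPTH and groups_are_balanced are hypothetical names, and the scanner ignores char sets.

```rust
/// Fixed-capacity stack usable from `const fn`s
/// (hypothetical; unrelated to the crate's `Stack`).
#[derive(Clone, Copy)]
struct BoundedStack<const CAP: usize> {
    items: [usize; CAP],
    len: usize,
}

impl<const CAP: usize> BoundedStack<CAP> {
    const fn new() -> Self {
        Self { items: [0; CAP], len: 0 }
    }

    /// Returns the stack with `item` pushed, or `None` if the capacity is exhausted.
    const fn push(mut self, item: usize) -> Option<Self> {
        if self.len == CAP {
            return None;
        }
        self.items[self.len] = item;
        self.len += 1;
        Some(self)
    }

    /// Returns the stack without its top item, together with that item.
    const fn pop(mut self) -> Option<(Self, usize)> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        Some((self, self.items[self.len]))
    }
}

/// Nesting limit of this sketch, chosen to mirror the limit stated above.
const MAX_DEPTH: usize = 8;

/// Checks that groups are balanced and nested at most `MAX_DEPTH` deep.
const fn groups_are_balanced(pattern: &str) -> bool {
    let bytes = pattern.as_bytes();
    let mut open_groups = BoundedStack::<MAX_DEPTH>::new();
    let mut pos = 0;
    while pos < bytes.len() {
        match bytes[pos] {
            // Step over the backslash; the shared increment below skips the escaped byte.
            b'\\' => pos += 1,
            b'(' => {
                // Remember where the group starts (useful for error spans).
                open_groups = match open_groups.push(pos) {
                    Some(stack) => stack,
                    None => return false, // nested deeper than `MAX_DEPTH`
                };
            }
            b')' => {
                open_groups = match open_groups.pop() {
                    Some((stack, _group_start)) => stack,
                    None => return false, // `)` without a matching `(`
                };
            }
            _ => {}
        }
        pos += 1;
    }
    open_groups.len == 0
}

// Checked at compile time; escaped parentheses do not count as groups.
const _: () = assert!(groups_are_balanced(r"(?<city> \( \d{3} \))"));
```

The stack is passed and returned by value so that the sketch does not depend on mutable references in const fns. Once the capacity is used up, push returns None, which is how a hard nesting limit surfaces in this design.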
§Alternatives / similar tools
- Use regex or regex-syntax if you don’t need compile-time validation / parsing.
- There are a couple of crates that use regex + proc macro to move regex validation to compile time, for example, regex_static.
- ere parses and compiles regular expressions at compile time. It supports POSIX extended regexes (i.e., a strict subset of what regex supports), and still uses proc macros.
§Crate features
§alloc
(On by default)
Enables support of alloc types, such as Vec in RegexOptions::try_parse_to_vec().
§std
(On by default)
Enables support of the standard library types, e.g. the Error trait implementation
for Error.
§Examples
use compile_regex::{ast, parse, validate};

// Validate a regex for phone numbers.
const _: () = validate(r"(?<code>\+1\s*)?\(\d{3}\)\d{3}-\d{4}");

// Parse the same regex with whitespace and additional named captures
const PHONE_REGEX: &str = r"(?x)
    (?<intl> \+1\s*)? # International prefix
    (?<city> \( \d{3} \)) # City code
    \s*
    (?<num> \d{3}-\d{4})";
const SYNTAX: &[ast::Spanned] = parse!(PHONE_REGEX);
println!("{SYNTAX:#?}");

// Get all named groups in the regex.
let group_names = SYNTAX.iter().filter_map(|spanned| {
    if let ast::Node::GroupStart { name: Some(name), .. } = &spanned.node {
        return Some(&PHONE_REGEX[name.name]);
    }
    None
});
let group_names: Vec<_> = group_names.collect();
assert_eq!(group_names, ["intl", "city", "num"]);
§Errors
If the validate() function or the parse! macro fails, it raises a compile-time error:
use compile_regex::validate;

// Fails because '+' is not escaped and is thus treated
// as a one-or-more quantifier
const _: () = validate(r"(?<code>+1\s*)?");
Getting information about an error:
use compile_regex::{try_validate, Error, ErrorKind};

const ERR: Error = match try_validate(r"(?<code>+1\s*)?") {
    Ok(_) => panic!("validation succeeded"),
    Err(err) => err,
};
// `matches!` from the standard library avoids the need for an `assert_matches!` import.
assert!(matches!(ERR.kind(), ErrorKind::MissingRepetition));
assert_eq!(ERR.pos(), 8..9);
§See also
See RegexOptions docs for more advanced use cases.
Modules§
- ast
- AST definitions. Because of const-ness, we support only very primitive AST spanning.
Macros§
- parse
- Produces spanned syntax nodes for the provided regex. The regex must be a constant expression (but not necessarily a string literal).
Structs§
- Error
- Error when parsing / validating regular expressions.
- RegexOptions
- Regular expression parsing options.
- Stack
- Bounded-capacity stack with const fn operations. Used to store syntax spans via the Syntax type alias.
- ValidationOutput
- Result of validating a regular expression.
Enums§
Functions§
- try_validate
- Tries to validate the provided regular expression with the default options.
- validate
- Validates the provided regular expression, panicking on errors. This is a shortcut for try_validate().unwrap().