Rust Tokenizer

The Rust tokenizer provides streaming, character-by-character parsing of Rust source code. It supports modern Rust features including async/await, lifetimes, pattern matching, and macros.

Supported Token Types

Token Type                          Description
Operator                            Arithmetic, bitwise, comparison, logical, and compound assignment operators
OpenParenthesis / CloseParenthesis  ( and )
OpenBrace / CloseBrace              { and }
OpenBracket / CloseBracket          [ and ]
Comma                               ,
Dot                                 .
Arrow                               ->
FatArrow                            =>
SequenceTerminator                  ;
Colon / DoubleColon                 : and ::
At                                  @
Pound                               #
QuestionMark                        ?
StringValue                         String literals including raw strings
CharValue                           Character literals
Number                              Integer, float, hex, octal, binary with suffixes
Boolean                             true, false
Identifier                          Variable and function names
Keyword                             Rust keywords
Macro                               Macro invocations
Lifetime                            'a, 'static
Comment                             Line and block comments
Whitespace                          Whitespace characters

Usage

using NTokenizers.Rust;

var tokenizer = RustTokenizer.Create();
var tokens = await tokenizer.Parse(code);

foreach (var token in tokens)
{
    Console.WriteLine($"[{token.TokenType}] {token.Value}");
}
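
Once tokens are available, a common next step is to skip trivia (whitespace and comments) and summarize what remains. The sketch below illustrates that idea only. To stay compilable without the NTokenizers package, it uses hypothetical stand-in types (SketchTokenType, SketchToken) that mirror the TokenType and Value members shown above; they are assumptions, not the library's actual types, and the token list is written by hand rather than produced by the tokenizer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the library's token types (assumption):
// only a few of the token types from the table are modeled here.
public enum SketchTokenType
{
    Keyword, Identifier, Operator, Number,
    SequenceTerminator, Comment, Whitespace
}

public sealed record SketchToken(SketchTokenType TokenType, string Value);

public static class TokenSummary
{
    // Drops whitespace and comment tokens, keeping everything else in order.
    public static IEnumerable<SketchToken> Significant(IEnumerable<SketchToken> tokens) =>
        tokens.Where(t => t.TokenType != SketchTokenType.Whitespace
                       && t.TokenType != SketchTokenType.Comment);

    // Counts how many tokens of each type are present.
    public static Dictionary<SketchTokenType, int> CountByType(IEnumerable<SketchToken> tokens) =>
        tokens.GroupBy(t => t.TokenType)
              .ToDictionary(g => g.Key, g => g.Count());

    public static void Main()
    {
        // Hand-written tokens for the Rust line `return x + 1; // next`.
        // This is illustrative input, not actual tokenizer output.
        var tokens = new List<SketchToken>
        {
            new(SketchTokenType.Keyword, "return"),
            new(SketchTokenType.Whitespace, " "),
            new(SketchTokenType.Identifier, "x"),
            new(SketchTokenType.Whitespace, " "),
            new(SketchTokenType.Operator, "+"),
            new(SketchTokenType.Whitespace, " "),
            new(SketchTokenType.Number, "1"),
            new(SketchTokenType.SequenceTerminator, ";"),
            new(SketchTokenType.Whitespace, " "),
            new(SketchTokenType.Comment, "// next"),
        };

        foreach (var (type, count) in CountByType(Significant(tokens)))
        {
            Console.WriteLine($"{type}: {count}");
        }
    }
}
```

With the real library, the same filtering would presumably be applied to the result of tokenizer.Parse(code), comparing against the library's own token type values instead of the stand-in enum.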

Features

Markdown Integration

using NTokenizers.Markdown;

var tokenizer = MarkdownTokenizer.Create();
var tokens = await tokenizer.Parse(markdownWithRustCode);

The Rust tokenizer is automatically used for code blocks marked with rust or rs.