C# Tokenizer

The C# tokenizer breaks C# source code into tokens (keywords, identifiers, string and number literals, operators, comments, and whitespace) for further processing. It can read its input as a stream, which suits large source files and real-time code analysis.

Overview

The C# tokenizer is part of the NTokenizers library. It processes source code incrementally and reports tokens through a callback as they are produced, which makes it suitable for large files or streaming scenarios where loading everything into memory at once is impractical.
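As a minimal sketch of the large-file scenario, the following reads a source file through a stream and tallies tokens by type as they are reported. It relies only on the `ParseAsync(stream, onToken: ...)` call used in the usage examples in this document. The helper name `CountTokensAsync` and the path `Program.cs` are hypothetical and exist only for illustration.

```csharp
using NTokenizers.CSharp;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: counts tokens per type while the stream is being read.
static async Task<Dictionary<CSharpTokenType, int>> CountTokensAsync(Stream source)
{
    var tally = new Dictionary<CSharpTokenType, int>();
    await CSharpTokenizer.Create().ParseAsync(source, onToken: token =>
    {
        // TryGetValue leaves n at 0 when the type has not been seen yet.
        tally.TryGetValue(token.TokenType, out var n);
        tally[token.TokenType] = n + 1;
    });
    return tally;
}

// "Program.cs" is a placeholder path, not a file this document provides.
using var file = File.OpenRead("Program.cs");
var counts = await CountTokensAsync(file);
foreach (var (type, count) in counts)
{
    Console.WriteLine($"{type}: {count}");
}
```

Only the running counts are kept, so memory use in this sketch does not grow with the number of tokens.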

Public API

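The C# tokenizer inherits from BaseTokenizer<CSharpToken>. The examples below use its main entry points: CSharpTokenizer.Create() to obtain a tokenizer, ParseAsync to tokenize a Stream, a TextReader, or a string with a per-token callback, and Parse to tokenize a string and return the resulting tokens.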

Usage Examples

Basic Usage with Stream

using NTokenizers.CSharp;
using Spectre.Console;
using System.Text;

string csharpCode = """
    var user = new { Name = "Laura Smith",
        Active = true };
    """;

using var stream = new MemoryStream(Encoding.UTF8.GetBytes(csharpCode));
await CSharpTokenizer.Create().ParseAsync(stream, onToken: token =>
{
    var value = Markup.Escape(token.Value);
    var colored = token.TokenType switch
    {
        CSharpTokenType.Keyword => new Markup($"[blue]{value}[/]"),
        CSharpTokenType.Identifier => new Markup($"[cyan]{value}[/]"),
        CSharpTokenType.StringValue => new Markup($"[green]{value}[/]"),
        CSharpTokenType.Number => new Markup($"[magenta]{value}[/]"),
        CSharpTokenType.Operator => new Markup($"[yellow]{value}[/]"),
        CSharpTokenType.Comment => new Markup($"[grey]{value}[/]"),
        CSharpTokenType.Whitespace => new Markup($"[grey]{value}[/]"),
        _ => new Markup(value)
    };
    AnsiConsole.Write(colored);
});

Using with TextReader

using NTokenizers.CSharp;
using System.IO;

string csharpCode = "int x = 42;";
using var reader = new StringReader(csharpCode);
await CSharpTokenizer.Create().ParseAsync(reader, onToken: token =>
{
    Console.WriteLine($"Token: {token.TokenType} = '{token.Value}'");
});
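Assuming `ParseAsync` accepts any `TextReader` (the example above passes a `StringReader`), the same call could consume standard input, which is one way to approach the real-time scenario. The sketch below is an illustration under that assumption, not documented usage. `DumpTokensAsync` is a hypothetical helper.

```csharp
using NTokenizers.CSharp;
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: writes one line per non-whitespace token to `output`.
static async Task DumpTokensAsync(TextReader input, TextWriter output)
{
    await CSharpTokenizer.Create().ParseAsync(input, onToken: token =>
    {
        if (token.TokenType != CSharpTokenType.Whitespace)
        {
            output.WriteLine($"{token.TokenType}: {token.Value}");
        }
    });
}

// Console.In is a TextReader, so piped input can be passed in directly.
await DumpTokensAsync(Console.In, Console.Out);
```

Taking the reader and writer as parameters keeps the helper easy to exercise with a `StringReader` and `StringWriter` in tests.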

Parsing String Directly

using NTokenizers.CSharp;

string csharpCode = "string name = \"John\";";
var tokens = CSharpTokenizer.Create().Parse(csharpCode);
foreach (var token in tokens)
{
    Console.WriteLine($"Token: {token.TokenType} = '{token.Value}'");
}
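Since the result of `Parse` is enumerated with `foreach` above, it can presumably also be filtered with LINQ. The following sketch assumes the return value implements `IEnumerable<CSharpToken>`, which this document does not state explicitly. It drops whitespace and comment tokens before further processing.

```csharp
using NTokenizers.CSharp;
using System;
using System.Linq;

string csharpCode = "string name = \"John\";";

// Assumption: Parse returns a sequence that LINQ can consume.
var significant = CSharpTokenizer.Create().Parse(csharpCode)
    .Where(t => t.TokenType != CSharpTokenType.Whitespace
             && t.TokenType != CSharpTokenType.Comment)
    .ToList();

foreach (var token in significant)
{
    Console.WriteLine($"Token: {token.TokenType} = '{token.Value}'");
}
```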

Using the Processed Stream as a String

using NTokenizers.CSharp;
using System.Text;

string csharpCode = "int x = 42;";
var processedString = await CSharpTokenizer.Create().ParseAsync(csharpCode, token =>
{
    return token.TokenType switch
    {
        CSharpTokenType.Keyword => $"[blue]{token.Value}[/]",
        CSharpTokenType.Identifier => $"[cyan]{token.Value}[/]",
        CSharpTokenType.StringValue => $"[green]{token.Value}[/]",
        CSharpTokenType.Number => $"[magenta]{token.Value}[/]",
        CSharpTokenType.Operator => $"[yellow]{token.Value}[/]",
        CSharpTokenType.Comment => $"[grey]{token.Value}[/]",
        CSharpTokenType.Whitespace => $"[grey]{token.Value}[/]",
        _ => token.Value
    };
});
Console.WriteLine(processedString);

Token Types

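Each token produced by the C# tokenizer carries a TokenType value of type CSharpTokenType. The examples above use the following values: Keyword, Identifier, StringValue, Number, Operator, Comment, and Whitespace.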

More info: CSharpTokenType.cs

See Also
