TypeScript Tokenizer

The TypeScript tokenizer parses TypeScript (and JavaScript) source code and breaks it into tokens for further processing. It can consume its input as a stream, which makes it suitable for large files and real-time code analysis.

Overview

The TypeScript tokenizer is part of the NTokenizers library. It processes source code as the input is read rather than requiring the whole text up front, so it suits large files and streaming scenarios where loading everything into memory at once is impractical.
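
As a rough sketch of the streaming scenario described above, the following tallies tokens by type while reading a file through a FileStream. It relies only on the ParseAsync(stream, onToken: ...) call shown in the usage examples; the file name app.ts is hypothetical, and the sketch assumes that TypescriptTokenType is an enum and that the callback runs once per token.

```csharp
using NTokenizers.Typescript;
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical path; point this at a real TypeScript file.
var path = "app.ts";

// Running tally of tokens per type, updated as tokens arrive.
var counts = new Dictionary<TypescriptTokenType, int>();

// The file is read through a stream instead of being loaded into a string first.
await using var stream = File.OpenRead(path);
await TypescriptTokenizer.Create().ParseAsync(stream, onToken: token =>
{
    counts[token.TokenType] = counts.GetValueOrDefault(token.TokenType) + 1;
});

foreach (var (type, count) in counts)
{
    Console.WriteLine($"{type}: {count}");
}
```

Only the per-type counters are kept in memory here, which is the kind of workload the streaming API is aimed at.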

Public API

The TypeScript tokenizer inherits from BaseSubTokenizer<TypescriptToken>. The usage examples below call its ParseAsync method with a Stream, a TextReader, or a string, and its Parse method with a string.

Usage Examples

Basic Usage with Stream

using NTokenizers.Typescript;
using Spectre.Console;
using System.IO;
using System.Text;

string tsCode = """
    const user = {
        name: "Laura Smith",
        active: true
    };
    """;

using var stream = new MemoryStream(Encoding.UTF8.GetBytes(tsCode));
await TypescriptTokenizer.Create().ParseAsync(stream, onToken: token =>
{
    var value = Markup.Escape(token.Value);
    var colored = token.TokenType switch
    {
        TypescriptTokenType.Keyword => new Markup($"[blue]{value}[/]"),
        TypescriptTokenType.Identifier => new Markup($"[cyan]{value}[/]"),
        TypescriptTokenType.StringValue => new Markup($"[green]{value}[/]"),
        TypescriptTokenType.Number => new Markup($"[magenta]{value}[/]"),
        TypescriptTokenType.Operator => new Markup($"[yellow]{value}[/]"),
        TypescriptTokenType.Comment => new Markup($"[grey]{value}[/]"),
        TypescriptTokenType.Whitespace => new Markup($"[grey]{value}[/]"),
        _ => new Markup(value)
    };
    AnsiConsole.Write(colored);
});

Using with TextReader

using NTokenizers.Typescript;
using System.IO;

string tsCode = "let x: number = 42;";
using var reader = new StringReader(tsCode);
await TypescriptTokenizer.Create().ParseAsync(reader, onToken: token =>
{
    Console.WriteLine($"Token: {token.TokenType} = '{token.Value}'");
});

Parsing String Directly

using NTokenizers.Typescript;

string tsCode = "const name: string = \"John\";";
var tokens = TypescriptTokenizer.Create().Parse(tsCode);
foreach (var token in tokens)
{
    Console.WriteLine($"Token: {token.TokenType} = '{token.Value}'");
}
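
Because Parse returns something that can be enumerated with foreach, its result can presumably also be filtered before use. The sketch below collects only identifier tokens; it assumes the return value implements IEnumerable<TypescriptToken> so that LINQ applies, which the example above does not confirm.

```csharp
using NTokenizers.Typescript;
using System;
using System.Linq;

string tsCode = "const name: string = \"John\";";

// Assumption: Parse returns an IEnumerable of tokens, so LINQ operators are available.
var identifiers = TypescriptTokenizer.Create()
    .Parse(tsCode)
    .Where(t => t.TokenType == TypescriptTokenType.Identifier)
    .Select(t => t.Value)
    .ToList();

Console.WriteLine(string.Join(", ", identifiers));
```

If Parse turns out to return a type that only supports foreach, the same filtering can be done with an if statement inside the loop.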

Using the Processed Stream as a String

using NTokenizers.Typescript;
using System.Text;

string tsCode = "const x = 42;";
var processedString = await TypescriptTokenizer.Create().ParseAsync(tsCode, token =>
{
    return token.TokenType switch
    {
        TypescriptTokenType.Keyword => $"[blue]{token.Value}[/]",
        TypescriptTokenType.Identifier => $"[cyan]{token.Value}[/]",
        TypescriptTokenType.StringValue => $"[green]{token.Value}[/]",
        TypescriptTokenType.Number => $"[magenta]{token.Value}[/]",
        TypescriptTokenType.Operator => $"[yellow]{token.Value}[/]",
        TypescriptTokenType.Comment => $"[grey]{token.Value}[/]",
        _ => token.Value
    };
});
Console.WriteLine(processedString);
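
The same overload can also drop tokens instead of decorating them. As a minimal sketch, the callback below returns an empty string for comment tokens and the original text for everything else. It assumes that the returned strings are concatenated in token order to form the result, and that whitespace is passed through as tokens; the sample input is made up for illustration.

```csharp
using NTokenizers.Typescript;
using System;

string tsCode = "const x = 42; // the answer";

// Assumption: the strings returned by the callback are joined in token order.
var withoutComments = await TypescriptTokenizer.Create().ParseAsync(tsCode, token =>
    token.TokenType == TypescriptTokenType.Comment ? string.Empty : token.Value);

Console.WriteLine(withoutComments);
```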

Token Types

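Each token produced by the TypeScript tokenizer carries a TokenType of type TypescriptTokenType. The examples above use the Keyword, Identifier, StringValue, Number, Operator, Comment, and Whitespace values.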

More info: TypescriptTokenType.cs

See Also