SQL Tokenizer

The SQL tokenizer breaks SQL code down into meaningful components (tokens) for further processing. It is stream-capable, which makes it suitable for large SQL files and for real-time analysis of SQL statements.

Overview

The SQL tokenizer is part of the NTokenizers library and processes SQL source code incrementally as it is read. This makes it suitable for large files and streaming scenarios where loading everything into memory at once is impractical.

Public API

The SQL tokenizer inherits from BaseSubTokenizer<SqlToken>. Its key entry points, all demonstrated in the examples below, are Create(), Parse(string), and the ParseAsync overloads that accept a Stream, a TextReader, or a string together with a per-token callback.
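
As a quick orientation, the sketch below summarizes the members that the usage examples on this page rely on. These shapes are inferred from the examples rather than taken from the source; see SqlTokenizer.cs and BaseSubTokenizer.cs for the authoritative signatures.

// Inferred from the usage examples below -- not authoritative definitions.
// SqlTokenizer.Create()                                    creates a tokenizer instance
// Parse(string sql)                                        returns the tokens as a sequence of SqlToken
// ParseAsync(Stream input, Action<SqlToken> onToken)       streams tokens to a callback
// ParseAsync(TextReader input, Action<SqlToken> onToken)   same, reading from a TextReader
// ParseAsync(string sql, Func<SqlToken, string> onToken)   maps each token to a string and returns the combined result
// Each SqlToken exposes at least TokenType (an SqlTokenType) and Value (the raw text).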

Usage Examples

Basic Usage with a Stream

using NTokenizers.Sql;
using Spectre.Console;
using System.Text;

string sqlCode = """
    SELECT name, age
    FROM users
    WHERE active = true
    ORDER BY name;
    """;

using var stream = new MemoryStream(Encoding.UTF8.GetBytes(sqlCode));
await SqlTokenizer.Create().ParseAsync(stream, onToken: token =>
{
    var value = Markup.Escape(token.Value);
    var colored = token.TokenType switch
    {
        SqlTokenType.Keyword => new Markup($"[blue]{value}[/]"),
        SqlTokenType.Identifier => new Markup($"[cyan]{value}[/]"),
        SqlTokenType.StringValue => new Markup($"[green]{value}[/]"),
        SqlTokenType.Number => new Markup($"[magenta]{value}[/]"),
        SqlTokenType.Operator => new Markup($"[yellow]{value}[/]"),
        SqlTokenType.Comment => new Markup($"[grey]{value}[/]"),
        SqlTokenType.Whitespace => new Markup($"[grey]{value}[/]"),
        _ => new Markup(value)
    };
    AnsiConsole.Write(colored);
});
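
The overload used above is not limited to in-memory buffers; it should accept any readable Stream. The following sketch, which assumes a local file named queries.sql exists, tokenizes a SQL file straight from disk with the same callback pattern:

using NTokenizers.Sql;
using System.IO;

// "queries.sql" is a hypothetical file name used for illustration.
using var fileStream = File.OpenRead("queries.sql");
await SqlTokenizer.Create().ParseAsync(fileStream, onToken: token =>
{
    // React to tokens as they arrive, without buffering the whole file.
    if (token.TokenType == SqlTokenType.Keyword)
        Console.WriteLine($"Keyword: {token.Value}");
});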

Using a TextReader

using NTokenizers.Sql;
using System.IO;

string sqlCode = "SELECT * FROM users WHERE id = 42;";
using var reader = new StringReader(sqlCode);
await SqlTokenizer.Create().ParseAsync(reader, onToken: token =>
{
    Console.WriteLine($"Token: {token.TokenType} = '{token.Value}'");
});

Parsing a String Directly

using NTokenizers.Sql;

string sqlCode = "INSERT INTO users (name) VALUES ('John');";
var tokens = SqlTokenizer.Create().Parse(sqlCode);
foreach (var token in tokens)
{
    Console.WriteLine($"Token: {token.TokenType} = '{token.Value}'");
}
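
Because Parse returns the tokens as a sequence (as the foreach above implies), the result can be post-processed with LINQ. The following sketch pulls out just the identifier tokens:

using NTokenizers.Sql;
using System.Linq;

string sqlCode = "INSERT INTO users (name) VALUES ('John');";

// Keep only identifier tokens and print their raw text.
var identifiers = SqlTokenizer.Create().Parse(sqlCode)
    .Where(token => token.TokenType == SqlTokenType.Identifier)
    .Select(token => token.Value);

Console.WriteLine(string.Join(", ", identifiers));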

Getting the Processed Output as a String

using NTokenizers.Sql;

string sqlCode = "SELECT COUNT(*) FROM users;";
var processedString = await SqlTokenizer.Create().ParseAsync(sqlCode, token =>
{
    return token.TokenType switch
    {
        SqlTokenType.Keyword => $"[blue]{token.Value}[/]",
        SqlTokenType.Identifier => $"[cyan]{token.Value}[/]",
        SqlTokenType.StringValue => $"[green]{token.Value}[/]",
        SqlTokenType.Number => $"[magenta]{token.Value}[/]",
        SqlTokenType.Operator => $"[yellow]{token.Value}[/]",
        SqlTokenType.Comment => $"[grey]{token.Value}[/]",
        _ => token.Value
    };
});
Console.WriteLine(processedString);
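
If the processed string is going to be rendered with Spectre.Console, as in the first example, the token values should be escaped before they are wrapped in markup tags; otherwise bracket characters inside the SQL text would be interpreted as markup. A minimal variation of the callback, assuming the same API as above:

using NTokenizers.Sql;
using Spectre.Console;

string sqlCode = "SELECT COUNT(*) FROM users;";
var markedUp = await SqlTokenizer.Create().ParseAsync(sqlCode, token =>
{
    // Escape first so characters like '[' in the SQL cannot break the markup.
    var value = Markup.Escape(token.Value);
    return token.TokenType switch
    {
        SqlTokenType.Keyword => $"[blue]{value}[/]",
        SqlTokenType.Identifier => $"[cyan]{value}[/]",
        _ => value
    };
});
AnsiConsole.MarkupLine(markedUp);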

Token Types

Every token produced by the tokenizer carries a TokenType of type SqlTokenType, which identifies what the token represents (keyword, identifier, literal, and so on).
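
The sketch below lists only the members that appear in the examples on this page; it is a partial, assumed view rather than the complete enum. The full definition lives in SqlTokenType.cs, referenced below.

// Partial sketch -- only the values referenced in the examples above.
public enum SqlTokenType
{
    Keyword,
    Identifier,
    StringValue,
    Number,
    Operator,
    Comment,
    Whitespace
    // ...remaining members are defined in SqlTokenType.cs
}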

More info: SqlTokenType.cs
