Catalogs¶
SQL Glider can connect to remote data catalogs to pull DDL definitions for the tables referenced in your SQL. This is useful when you want to enrich lineage analysis with real schema information, validate queries against production table definitions, or simply collect DDL for documentation.
How It Works¶
The tables pull command reads your SQL, extracts the table names, then fetches the CREATE TABLE DDL for each one from a remote catalog. CTEs are automatically excluded since they don't exist in any remote catalog.
# Pull DDL for all tables referenced in a query
sqlglider tables pull query.sql --catalog-type databricks
Output:
-- Table: my_catalog.my_schema.customers
CREATE TABLE my_catalog.my_schema.customers (
customer_id BIGINT,
name STRING,
email STRING,
created_at TIMESTAMP
)
USING DELTA;
You can also write each table's DDL to a separate file:
This creates one .sql file per table in the ./ddl/ directory.
Built-in: Databricks¶
SQL Glider ships with a Databricks Unity Catalog provider. It requires the Databricks SDK as an optional dependency:
Authentication¶
The Databricks catalog uses the Databricks SDK's unified authentication, which tries the following sources in order:
- Explicit host and token from
sqlglider.toml - Environment variables (
DATABRICKS_HOST,DATABRICKS_TOKEN) - Databricks CLI profile (
~/.databrickscfg) - OAuth M2M via
DATABRICKS_CLIENT_ID/DATABRICKS_CLIENT_SECRET - Azure CLI or Google Cloud auth for cloud-hosted workspaces
A warehouse ID is always required. Set it in config or via the DATABRICKS_WAREHOUSE_ID environment variable.
Configuration¶
Configure Databricks in sqlglider.toml:
[sqlglider]
catalog_type = "databricks"
[sqlglider.catalog.databricks]
warehouse_id = "abc123def456"
profile = "my-workspace" # optional: Databricks CLI profile
host = "https://my.databricks.com" # optional if using profile or env vars
token = "dapi..." # optional: prefer OAuth or profile instead
Or use environment variables:
export DATABRICKS_HOST="https://my.databricks.com"
export DATABRICKS_TOKEN="dapi..."
export DATABRICKS_WAREHOUSE_ID="abc123def456"
Usage¶
# Pull DDL to stdout
sqlglider tables pull query.sql -c databricks
# Pull DDL to folder
sqlglider tables pull query.sql -c databricks -o ./ddl/
# Combine with templating
sqlglider tables pull query.sql -c databricks --templater jinja --var schema=prod
# From stdin
echo "SELECT * FROM my_catalog.my_schema.users" | sqlglider tables pull -c databricks
# List available catalog providers
sqlglider tables pull --list
Writing a Custom Catalog Provider¶
You can create your own catalog provider as a Python package and register it as a plugin.
1. Implement the Catalog Class¶
Subclass sqlglider.catalog.base.Catalog and implement three methods:
from typing import Any, Dict, List, Optional
from sqlglider.catalog.base import Catalog, CatalogError
class SnowflakeCatalog(Catalog):
@property
def name(self) -> str:
return "snowflake"
def configure(self, config: Optional[Dict[str, Any]] = None) -> None:
"""Set up connection details from config or environment."""
config = config or {}
self._account = config.get("account")
self._warehouse = config.get("warehouse")
# Validate required settings
if not self._account:
raise CatalogError("Snowflake account is required")
def get_ddl(self, table_name: str) -> str:
"""Fetch DDL for a single table."""
try:
# Connect and run: GET_DDL('TABLE', table_name)
return ddl_string
except Exception as e:
raise CatalogError(f"Failed to fetch DDL for {table_name}: {e}") from e
def get_ddl_batch(self, table_names: List[str]) -> Dict[str, str]:
"""Fetch DDL for multiple tables.
Return a dict mapping table names to DDL strings.
For tables that fail, prefix the value with "ERROR: ".
"""
results: Dict[str, str] = {}
for table in table_names:
try:
results[table] = self.get_ddl(table)
except CatalogError as e:
results[table] = f"ERROR: {e}"
return results
Key points:
name— the identifier users pass to--catalog-typeconfigure()— called after instantiation with settings fromsqlglider.tomland CLI; validate required config hereget_ddl()— fetch DDL for one table; raiseCatalogErroron failureget_ddl_batch()— fetch DDL for many tables; prefix failures with"ERROR: "so the batch continues
2. Register via Entry Points¶
In your package's pyproject.toml:
3. Add Configuration Support¶
Users can configure your catalog in sqlglider.toml. The config section name matches the catalog name:
[sqlglider]
catalog_type = "snowflake"
[sqlglider.catalog.snowflake]
account = "xy12345.us-east-1"
warehouse = "COMPUTE_WH"
Note
For the config section to be loaded automatically, the core ConfigSettings model needs to know about your provider. For third-party plugins, users can also pass configuration through environment variables in your configure() method.
4. Use It¶
After installing your package: