I have long been a fan of using linting tools as a part of a CICD pipeline. These tools help ensure code is syntactically correct and consistently formatted.

In a team setting this also allows the machines to yell at individuals about syntax and style guidelines.

Linters for programming languages like Ruff/Black for python or Go-lint for Golang are ubiquitous but other text based formats like SQL, json, or yaml would benefit form this as well.

I was looking into these for some SQL files and found sqlfluff and started using this for linting sqlfluff lint

SQLFluff

SQLFluff is a python based linter for SQL. It supports a large number of popular SQL dialects.

Current supported Dialects

ansi:                 ANSI dialect [inherits from 'nothing']
athena:            AWS Athena dialect [inherits from 'ansi']
bigquery:     Google BigQuery dialect [inherits from 'ansi']
clickhouse:        ClickHouse dialect [inherits from 'ansi']
databricks:    Databricks dialect [inherits from 'sparksql']
db2:                  IBM Db2 dialect [inherits from 'ansi']
duckdb:            DuckDB dialect [inherits from 'postgres']
exasol:                Exasol dialect [inherits from 'ansi']
greenplum:      Greenplum dialect [inherits from 'postgres']
hive:             Apache Hive dialect [inherits from 'ansi']
impala:         Apache Impala dialect [inherits from 'hive']
mariadb:             MariaDB dialect [inherits from 'mysql']
materializ:   Materialize dialect [inherits from 'postgres']                                                       
mysql:                  MySQL dialect [inherits from 'ansi']
oracle:                Oracle dialect [inherits from 'ansi']
postgres:          PostgreSQL dialect [inherits from 'ansi']
redshift:    AWS Redshift dialect [inherits from 'postgres']
snowflake:          Snowflake dialect [inherits from 'ansi']
soql:        Salesforce Object Query Language (SOQL) dialect
                                      [inherits from 'ansi']
sparksql:    Apache Spark SQL dialect [inherits from 'ansi']
sqlite:                SQLite dialect [inherits from 'ansi']
starrocks:         StarRocks dialect [inherits from 'mysql']
teradata:            Teradata dialect [inherits from 'ansi']
trino:                  Trino dialect [inherits from 'ansi']
tsql:         Microsoft T-SQL dialect [inherits from 'ansi']
vertica:              Vertica dialect [inherits from 'ansi']

For my needs I have only been using the postgres dialect.

Configuration

SQLFluff supports configuration files through multiple formats, but for my purposes I use pyproject.toml for python projects or .sqlfluff for other projects.

The following is the basic configuration I am using.

.sqlfulff

[sqlfluff]
dialect = postgres

# The standard max_line_length is 80 
max_line_length = 120

# -1 means use all processors but one
processes = -1

Linting checks

Checking the SQL files is a simple as

sqlfluff lint src/*.sql

Fixing

Fixing the SQL files is a simple as

sqlfluff fix src/*.sql

Gotchas

If you are starting a new project/database then you won’t need to worry about the checks. But if you are running this on SQL files for an existing database there is a couple of rules that can cause issues.

Specifically I have had to disable the capitalisation.identifiers and capitalisation.literals rules

You can add these to your configuration files with exclude_rules

See rule_configuration for more information on how to tweak rules.

Rule Reference is also very useful for evaluating rules.