SQLFluff
I have long been a fan of using linting tools as a part of a CICD pipeline. These tools help ensure code is syntactically correct and consistently formatted.
In a team setting this also allows the machines to yell at individuals about syntax and style guidelines.
Linters for programming languages like Ruff/Black for python or Go-lint for Golang are ubiquitous but other text based formats like SQL, json, or yaml would benefit form this as well.
I was looking into these for some SQL files and found sqlfluff and started using this for linting sqlfluff lint
SQLFluff
SQLFluff is a python based linter for SQL. It supports a large number of popular SQL dialects.
Current supported Dialects
ansi: ANSI dialect [inherits from 'nothing']
athena: AWS Athena dialect [inherits from 'ansi']
bigquery: Google BigQuery dialect [inherits from 'ansi']
clickhouse: ClickHouse dialect [inherits from 'ansi']
databricks: Databricks dialect [inherits from 'sparksql']
db2: IBM Db2 dialect [inherits from 'ansi']
duckdb: DuckDB dialect [inherits from 'postgres']
exasol: Exasol dialect [inherits from 'ansi']
greenplum: Greenplum dialect [inherits from 'postgres']
hive: Apache Hive dialect [inherits from 'ansi']
impala: Apache Impala dialect [inherits from 'hive']
mariadb: MariaDB dialect [inherits from 'mysql']
materializ: Materialize dialect [inherits from 'postgres']
mysql: MySQL dialect [inherits from 'ansi']
oracle: Oracle dialect [inherits from 'ansi']
postgres: PostgreSQL dialect [inherits from 'ansi']
redshift: AWS Redshift dialect [inherits from 'postgres']
snowflake: Snowflake dialect [inherits from 'ansi']
soql: Salesforce Object Query Language (SOQL) dialect
[inherits from 'ansi']
sparksql: Apache Spark SQL dialect [inherits from 'ansi']
sqlite: SQLite dialect [inherits from 'ansi']
starrocks: StarRocks dialect [inherits from 'mysql']
teradata: Teradata dialect [inherits from 'ansi']
trino: Trino dialect [inherits from 'ansi']
tsql: Microsoft T-SQL dialect [inherits from 'ansi']
vertica: Vertica dialect [inherits from 'ansi']
For my needs I have only been using the postgres dialect.
Configuration
SQLFluff supports configuration files through multiple formats, but for my purposes I use pyproject.toml
for python projects or .sqlfluff
for other projects.
The following is the basic configuration I am using.
.sqlfulff
[sqlfluff]
dialect = postgres
# The standard max_line_length is 80
max_line_length = 120
# -1 means use all processors but one
processes = -1
Linting checks
Checking the SQL files is a simple as
sqlfluff lint src/*.sql
Fixing
Fixing the SQL files is a simple as
sqlfluff fix src/*.sql
Gotchas
If you are starting a new project/database then you won’t need to worry about the checks. But if you are running this on SQL files for an existing database there is a couple of rules that can cause issues.
Specifically I have had to disable the capitalisation.identifiers
and capitalisation.literals
rules
You can add these to your configuration files with exclude_rules
See rule_configuration for more information on how to tweak rules.
Rule Reference is also very useful for evaluating rules.