NAME¶
sqlfluff - SQLFluff 3.2.5
Bored of not having a good SQL linter that works with whichever dialect you're working with? Fluff is an extensible and modular linter designed to help you write good SQL and catch errors and bad SQL before it hits your database.
Notable releases:
- 1.0.x: First stable release, no major changes to take advantage of a point of relative stability.
- 2.0.x: Recode of rules, whitespace fixing consolidation, sqlfluff format and removal of support for dbt versions pre 1.1. Note that this release brings with it some breaking changes to rule coding and configuration, see Upgrading from 1.x to 2.0.
- 3.0.x: sqlfluff fix now defaults to not asking for confirmation and the --force option was removed. Richer information returned by the sqlfluff lint command (although in a different structure to previous versions). See Upgrading to 3.x.
For more detail on other releases, see our Release Notes.
Want to see where and how people are using SQLFluff in their projects? Head over to SQLFluff in the Wild for inspiration.
GETTING STARTED¶
To get started just install the package, make a sql file and then run SQLFluff and point it at the file. For more details or if you don't have python or pip already installed see Getting Started.
$ pip install sqlfluff
$ echo "  SELECT a  +  b FROM tbl;  " > test.sql
$ sqlfluff lint test.sql --dialect ansi
== [test.sql] FAIL
L:   1 | P:   1 | LT01 | Expected only single space before 'SELECT' keyword.
                       | Found '  '. [layout.spacing]
L:   1 | P:   1 | LT02 | First line should not be indented.
                       | [layout.indent]
L:   1 | P:   1 | LT13 | Files must not begin with newlines or whitespace.
                       | [layout.start_of_file]
L:   1 | P:  11 | LT01 | Expected only single space before binary operator '+'.
                       | Found '  '. [layout.spacing]
L:   1 | P:  14 | LT01 | Expected only single space before naked identifier.
                       | Found '  '. [layout.spacing]
L:   1 | P:  27 | LT01 | Unnecessary trailing whitespace at end of file.
                       | [layout.spacing]
L:   1 | P:  27 | LT12 | Files must end with a single trailing newline.
                       | [layout.end_of_file]
All Finished 📜 🎉!
CONTENTS¶
Getting Started¶
To get started with SQLFluff you'll need python and pip installed on your machine. If you're already set up, you can skip straight to Installing SQLFluff.
Installing Python¶
How to install python and pip depends on what operating system you're using. In any case, the python wiki provides up to date instructions for all platforms here.
There's a chance that you'll be offered the choice between python versions. Support for python 2 was dropped in early 2020, so you should always opt for a version number starting with a 3. As for more specific options beyond that, SQLFluff aims to be compatible with all current python versions, and so it's best to pick the most recent.
You can confirm that python is working as expected by heading to your terminal or console of choice and typing python --version which should give you a sensible read out and not an error.
$ python --version
Python 3.9.1
For most people, their installation of python will come with pip (the python package manager) preinstalled. To confirm this you can type pip --version similar to python above.
$ pip --version
pip 21.3.1 from ...
If however, you do have python installed but not pip, then the best instructions for what to do next are on the python website.
Installing SQLFluff¶
Assuming that python and pip are already installed, then installing SQLFluff is straightforward.
$ pip install sqlfluff
You can confirm its installation by getting SQLFluff to show its version number.
$ sqlfluff version
3.2.5
Basic Usage¶
To get a feel for how to use SQLFluff it helps to have a small .sql file which has a simple structure and some known issues for testing. Create a file called test.sql in the same folder that you're currently in with the following content:
SELECT a+b  AS foo,
c AS bar from my_table
You can then run sqlfluff lint test.sql --dialect ansi to lint this file.
$ sqlfluff lint test.sql --dialect ansi
== [test.sql] FAIL
L:   1 | P:   1 | LT09 | Select targets should be on a new line unless there is
                       | only one select target.
                       | [layout.select_targets]
L:   1 | P:   1 | ST06 | Select wildcards then simple targets before calculations
                       | and aggregates. [structure.column_order]
L:   1 | P:   7 | LT02 | Expected line break and indent of 4 spaces before 'a'.
                       | [layout.indent]
L:   1 | P:   9 | LT01 | Expected single whitespace between naked identifier and
                       | binary operator '+'. [layout.spacing]
L:   1 | P:  10 | LT01 | Expected single whitespace between binary operator '+'
                       | and naked identifier. [layout.spacing]
L:   1 | P:  11 | LT01 | Expected only single space before 'AS' keyword.
                       | Found '  '. [layout.spacing]
L:   2 | P:   1 | LT02 | Expected indent of 4 spaces.
                       | [layout.indent]
L:   2 | P:   9 | LT02 | Expected line break and no indent before 'from'.
                       | [layout.indent]
L:   2 | P:  10 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
All Finished 📜 🎉!
You'll see that SQLFluff has failed the linting check for this file. On each of the following lines you can see each of the problems it has found, with some information about the location and what kind of problem there is. One of the errors has been found on line 1, position 9 (as shown by L: 1 | P: 9) and it's a problem with rule LT01 (for a full list of rules, see Rules Reference). From this (and the following error) we can see that the problem is that there is no space on either side of the + symbol in a+b. Head into the file, and correct this issue so that the file now looks like this:
SELECT a + b  AS foo,
c AS bar from my_table
Rerun the same command as before, and you'll see that the original error (violation of LT01) no longer shows up.
$ sqlfluff lint test.sql --dialect ansi
== [test.sql] FAIL
L:   1 | P:   1 | LT09 | Select targets should be on a new line unless there is
                       | only one select target.
                       | [layout.select_targets]
L:   1 | P:   1 | ST06 | Select wildcards then simple targets before calculations
                       | and aggregates. [structure.column_order]
L:   1 | P:   7 | LT02 | Expected line break and indent of 4 spaces before 'a'.
                       | [layout.indent]
L:   1 | P:  13 | LT01 | Expected only single space before 'AS' keyword.
                       | Found '  '. [layout.spacing]
L:   2 | P:   1 | LT02 | Expected indent of 4 spaces.
                       | [layout.indent]
L:   2 | P:   9 | LT02 | Expected line break and no indent before 'from'.
                       | [layout.indent]
L:   2 | P:  10 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
To fix the remaining issues, we're going to use one of the more advanced features of SQLFluff, which is the fix command. This allows more automated fixing of some errors, to save you time in sorting out your sql files. Not all rules can be fixed in this way, and there may be situations where a fix cannot be applied because of the context of the query, but in many simple cases it's a good place to start.
For now, we only want to fix the following rules: LT02, LT12, CP01
$ sqlfluff fix test.sql --rules LT02,LT12,CP01 --dialect ansi
==== finding violations ====
== [test.sql] FAIL
L:   1 | P:   7 | LT02 | Expected line break and indent of 4 spaces before 'a'.
                       | [layout.indent]
L:   2 | P:   1 | LT02 | Expected indent of 4 spaces.
                       | [layout.indent]
L:   2 | P:   9 | LT02 | Expected line break and no indent before 'FROM'.
                       | [layout.indent]
L:   2 | P:  10 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
==== fixing violations ====
4 fixable linting violations found
Are you sure you wish to attempt to fix these? [Y/n]
...at this point you'll have to confirm that you want to make the changes by pressing y on your keyboard...
Are you sure you wish to attempt to fix these? [Y/n] ...
Attempting fixes...
Persisting Changes...
== [test.sql] PASS
Done. Please check your files to confirm.
If we now open up test.sql, we'll see the content is now different.
SELECT
    a + b  AS foo,
    c AS bar
FROM my_table
In particular:
- The two columns have been indented to reflect being inside the SELECT statement.
- The FROM keyword has been capitalised to match the other keywords.
We could also fix all of the fixable errors by not specifying --rules.
$ sqlfluff fix test.sql --dialect ansi
==== finding violations ====
== [test.sql] FAIL
L:   1 | P:   1 | ST06 | Select wildcards then simple targets before calculations
                       | and aggregates. [structure.column_order]
L:   2 | P:  10 | LT01 | Expected only single space before 'AS' keyword.
                       | Found '  '. [layout.spacing]
==== fixing violations ====
2 fixable linting violations found
Are you sure you wish to attempt to fix these? [Y/n] ...
Attempting fixes...
Persisting Changes...
== [test.sql] PASS
Done. Please check your files to confirm.
If we now open up test.sql, we'll see the content has been updated again.
SELECT
    c AS bar,
    a + b AS foo
FROM my_table
The SQL statement is now well formatted according to all the rules defined in SQLFluff.
The --rules argument is optional, and could be useful when you or your organisation follows a slightly different convention than what we have defined.
Custom Usage¶
So far we've covered the stock settings of SQLFluff, but there are many different ways that people style their sql, and if you or your organisation have different conventions, then many of these behaviours can be configured. For example, given the example above, what if we actually think that indents should only be two spaces, and rather than uppercase keywords, they should all be lowercase?
To achieve this we create a configuration file named .sqlfluff and place it in the same directory as the current file. In that file put the following content:
[sqlfluff]
dialect = ansi

[sqlfluff:indentation]
tab_space_size = 2

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower
Then rerun the same command as before.
$ sqlfluff fix test.sql --rules LT02,LT12,CP01,ST06,LT09,LT01
Then examine the file again, and you'll notice that the file has been fixed accordingly.
select
  c as bar,
  a + b as foo
from my_table
For a full list of configuration options check out Default Configuration. Note that in our example here we've only set a few configuration values and any other configuration settings remain as per the default config. To see how these options apply to specific rules check out the "Configuration" section within each rule's documentation in Rules Reference.
Going further¶
From here, there are several more things to explore.
- To understand how SQLFluff is interpreting your file explore the parse command. You can learn more about that command and more by running sqlfluff --help or sqlfluff parse --help.
- To start linting more than just one file at a time, experiment with passing SQLFluff directories rather than just single files. Try running sqlfluff lint . (to lint every sql file in the current folder) or sqlfluff lint path/to/my/sqlfiles.
- To find out more about which rules are available, see Rules Reference.
- To find out more about configuring SQLFluff and what other options are available, see Configuration.
- Once you're ready to start using SQLFluff on a project or with the rest of your team, check out Production Usage & Security.
One last thing to note is that SQLFluff is a relatively new project and you may find bugs or strange things while using it. If you do find anything, the most useful thing you can do is to post the issue on GitHub where the maintainers of the project can work out what to do with it. The project is in active development and so updates and fixes may come out regularly.
Why SQLFluff?¶
SQL has been around for a long time, as a language for communicating with databases, like a communication protocol. More recently, with the rise of data as a business function, or a domain in its own right, SQL has also become an invaluable tool for defining the structure of data and analysis - not just as a one-off, but as a form of infrastructure as code.
As analytics transitions from a profession of people doing one-offs, and moves to building stable and reusable pieces of analytics, more and more principles from software engineering are moving into the analytics space. One of the best articulations of this is written in the viewpoint section of the docs for the open-source tool dbt. Two of the principles mentioned in that article are quality assurance and modularity.
Quality assurance¶
The primary aim of SQLFluff as a project is in service of that first aim of quality assurance. With larger and larger teams maintaining large bodies of SQL code, it becomes more and more important that the code is not just valid but also easily comprehensible by other users of the same codebase. One way to ensure readability is to enforce a consistent style, and the tools used to do this are called linters.
Some famous linters which are well known in the software community are flake8 and jslint (the former is used to lint the SQLFluff project itself).
SQLFluff aims to fill this space for SQL.
Modularity¶
SQL itself doesn't lend itself well to modularity, so to introduce some flexibility and reusability it is often templated. Typically this is done in the wild in one of the following ways:
1. Using the limited inbuilt templating abilities of a programming language directly. For example in python this would be using the format string syntax:
"SELECT {foo} FROM {tbl}".format(foo="bar", tbl="mytable")
Which would evaluate to:
SELECT bar FROM mytable
2. Using a dedicated templating library such as jinja2. This allows a lot more flexibility and more powerful expressions and macros. See the Templating Configuration section for more detail on how this works.
- Often there are tools like dbt or apache airflow which allow templated sql to be used directly, and they will implement a library like jinja2 under the hood themselves.
All of these templating tools are great for modularity but they also mean that the SQL files themselves are no longer valid SQL code, because they now contain these configured placeholder values, intended to improve modularity.
SQLFluff supports both of the templating methods outlined above, as well as dbt projects, to allow you to still lint these "dynamic" SQL files as part of your CI/CD pipeline (which is great 🙌), rather than waiting until you're in production (which is bad 🤦, and maybe too late).
During the CI/CD pipeline (or any time that we need to handle templated code), SQLFluff needs additional info in order to interpret your templates as valid SQL code. You do so by providing dummy parameters in SQLFluff configuration files. When substituted into the template, these values should evaluate to valid SQL (so SQLFluff can check its style, formatting, and correctness), but the values don't need to match actual values used in production. This means that you can use much simpler dummy values than what you would really use. The recommendation is to use the simplest possible dummy value that still allows your code to evaluate to valid SQL so that the configuration values can be as streamlined as possible.
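For example, if your templated SQL refers to a couple of Jinja variables, a minimal sketch of the relevant .sqlfluff section might look like this (the variable names my_schema and num_days are purely illustrative):
[sqlfluff:templater:jinja:context]
my_schema = analytics_dev
num_days = 7
With something like this in place, references such as {{ my_schema }} render to the dummy value during linting, without needing your real production values.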
Vision for SQLFluff¶
SQLFluff has a few components:
1. A generic parser for SQL which aims to be able to unify SQL written in different dialects into a comparable format. The parser.
2. A mechanism for measuring written SQL against a set of rules, with the added ability to fix any violations found. The linter.
3. An opinionated set of guidelines for how SQL should be structured and formatted. The rules.
The core vision [1] for SQLFluff is to be really good at being the linter. The reasoning for this is outlined in Why SQLFluff?.
Most of the codebase for SQLFluff is the parser, mostly because at the point of developing SQLFluff, there didn't appear to be a good option for a whitespace-aware parser that could be used instead.
With regards to the rules, SQLFluff aims to be opinionated but it also accepts that many organisations and groups have pre-existing strong conventions around how to write SQL and so ultimately SQLFluff should be flexible enough to support whichever rule set a user wishes to.
Notes
[1] Credit to this article for highlighting the importance of a good vision.
Guides & How-tos¶
This section of short guides and articles is meant to be read alongside the rest of the documentation, which is more reference-oriented.
Setting up SQLFluff¶
Rolling out SQLFluff with a new team¶
Rolling out SQLFluff, like rolling out any other linter or style guide, is not just about the technical rollout, but also how you introduce the tool to the team and organisation around you.
With that in mind, it's worth reminding ourselves what we're trying to achieve with a tool like this. A set of potential success criteria might be:
1. Faster comprehension and collaboration by the team on a shared codebase. This includes more effective (and more enjoyable) code review on top of code which is easy to review and build upon.
2. Easier and faster onboarding for new team members. By adopting a style which is clean and consistent with other organisations we make it easier for new people to join the team.
3. Improved adoption of shared SQL from other sources. If the SQL found in open source projects is easy to read and looks familiar then you're more likely to use it. This means more reusable code across the industry.
4. Productive discussions around style. By defining your organisation's style guide in code, it means you can version control it, discuss changes and ultimately give a concrete output to discussions over style.
You like leading commas? Make a PR to .sqlfluff and let's discuss with the team what the implications would be.
Consider which of these success measures is most important and most desirable for your team. Write that down.
The following steps are a guide, which you should adapt to your organisation, and in particular its level of data maturity.
1. Assess the situation¶
This step is done by you, or a small group of people who already think that linting is a good idea.
- Run sqlfluff lint on your project with the stock configuration to find out how things work out of the box.
- Set up your Configuration so that things run and that you can get a readout of the errors which you would want the team to see and not the ones you don't. Useful tools for this are .sqlfluffignore, and the --exclude-rules or --ignore options in the CLI (see CLI Reference, and the example after this list).
- Identify which areas of your project are the worst and which are the tidiest. In particular, any areas which are particularly tidy already will be particularly useful in the next phase.
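For example, a first pass over a whole project, excluding a couple of rules and ignoring parsing errors, might look something like this (the rule codes here are just illustrative):
$ sqlfluff lint . --dialect ansi --exclude-rules LT09,ST06 --ignore parsing
The same settings can also live in your .sqlfluff file (e.g. an exclude_rules value under the [sqlfluff] section) so that the whole team shares them.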
2. Make a plan¶
There are three sensible rollout phases:
1. Pre CI/CD.
2. Soft CI/CD (warnings but no strict fails).
3. Hard CI/CD (violations mean deployments fail).
In each of these phases you have three levers to play with:
1. Areas of the project in which to apply rules.
2. Depth of rules enforced (this might also include whether to ignore parsing errors or not).
3. Whether to just lint changes (Using SQLFluff on changes using diff-quality - see the example after this list), or to lint all the existing code as well.
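As a sketch of that third lever, linting only the lines changed on a branch can be done with the diff-quality tool (this assumes diff-cover is installed and that you're comparing against a main branch):
$ pip install diff-cover
$ diff-quality --violations sqlfluff --compare-branch main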
Work out a sensible roadmap of how hard you want to go in each phase. Be clear who is responsible for changes at each phase. An example plan might look like this:
1. Pre CI/CD we get the tidiest area of a project to a stage that it fully passes the rules we eventually want to enforce. The core project team will do this. Liberal use of sqlfluff fix can be a lifesaver in this phase.
2. Soft CI/CD is applied to the whole project, team members are encouraged to write tidy SQL, but not required to.
3. Hard CI/CD is applied to the tidy areas of the project and also to any changes to the whole project. Anyone making changes is required to write SQL which passes the checks.
4. Hard CI/CD is applied to the whole project, not just changes, with only a few particularly problematic files explicitly ignored using .sqlfluffignore.
3. Build the need¶
Bring your team together to introduce both linting as a concept and also SQLFluff as a tool. At this stage it's really important that the team understand why this is a good thing.
Consider whether to discuss the whole plan from step 2, or whether to only talk about the first few steps. Aim to make this an empowering experience that everyone can get involved with rather than another piece of admin they need to do.
At this stage, you might also want to consider other tools in the SQLFluff ecosystem such as the SQLFluff pre-commit hook and the SQLFluff VSCode plugin or SQLFluff online formatter.
4. Do, Review & Reassess¶
Once the plan is in motion, make sure to start putting in place norms and rituals around how you change the rules. In particular:
- How would someone suggest changing the style guide or enabling/disabling a rule?
- How do we assess whether the changes are working for the team or whether some are creating unnecessary stress?
It's normal for your usage of tools like SQLFluff to change and evolve over time. It's important to expect this change in advance, and welcome it when it happens. Always make sure you're driving toward the success measures you decided up front, rather than just resisting the change.
5. Spread the word 😁¶
Did it work? If so, spread the word. Tell a friend about SQLFluff.
If you're lucky they might share your views on comma placement 🤷♀️.
Developing Custom Rules¶
It's quite common to have organisation-, or project-specific norms and conventions you might want to enforce using SQLFluff. With a little bit of python knowledge this is very achievable with SQLFluff, and there's a plugin architecture to support that.
This guide should be read alongside the code for the SQLFluff example plugin and the more technical documentation for Developing Plugins.
What Plugin do I need?¶
When thinking about developing a rule, the following thought process will help you decide what to develop:
1. When do I want this rule to show a warning, and when should it definitely not show one? What information do I need when evaluating whether the rule has been followed or not? This information will tell you about the two important locations in the parse tree:
- The trigger location: i.e. when should the rule be called for evaluation. For example CP01 triggers on keywords, because it only needs the information about that keyword to run, but LT08 triggers on WITH statements even though it's only interested in specific pieces of whitespace, because it needs the full context of the statement to evaluate. You may wish to examine the parse structure of some example queries you'd want to handle by using sqlfluff parse my_file.sql to identify the right segment. This is then specified using the crawl_behaviour attribute on the rule.
- The anchor location: i.e. which position will show up in the CLI readout back to the user. To continue the example above, while LT08 triggers on a WITH statement, it anchors on a more specific segment just after where it expected whitespace. It specifies this using the anchor argument to the LintResult object.
2. How should the rule evaluate, and should I implement an auto-fix? For the simplest rules, the logic to evaluate whether there's an issue can be very simple. For example in the SQLFluff example plugin, we are just checking the name of an element isn't in a configured list (see the sketch after this list). Typically we recommend that for organisation-specific rules, KEEP IT SIMPLE. Some of the rules bundled with SQLFluff contain a lot of complexity for handling how to automatically fix lots of edge cases, but for your organisation it's probably not worth the overhead unless you're a very big team or come across a huge amount of poorly formatted SQL.
- Consider the information needed not just to trigger, but also whether a custom error message would be appropriate and how to get the information to construct that too. The default error message will be the first line of the rule docstring. Custom messages can be configured by setting the description argument of the LintResult object.
- Do use the existing SQLFluff core rules as examples of what is possible and how to achieve various things - but remember that many of them implement a level of complexity and edge case handling which may not be necessary for your organisation.
3. How am I going to roll out my rule to the team? Thinking through this aspect of rule development is just as important as the technical aspect. Spending a lot of time on rule development for it to be rejected by the end users of it is both a waste of time and also counterproductive.
- Consider manually fixing any pre-existing issues in your project which would trigger the rule before rollout.
- Seek consensus on how strictly the rule will be enforced and what the step by step pathway is to strict enforcement.
- Consider beta-testing your new rule with a smaller group of users who are more engaged with SQLFluff or code quality in general.
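To make the points above concrete, here is a heavily simplified rule sketch, loosely modelled on the SQLFluff example plugin. The rule name, the segment type it seeks and the check itself are purely illustrative, and exact import paths can vary between SQLFluff versions - the example plugin and Developing Plugins are the authoritative references. A real plugin also needs the pluggy get_rules hook and the packaging described in the next section.
"""A hypothetical custom rule: flag table references that look temporary."""
from sqlfluff.core.rules import BaseRule, LintResult, RuleContext
from sqlfluff.core.rules.crawlers import SegmentSeekerCrawler


class Rule_MyOrg_L001(BaseRule):
    """Table references should not use temporary-style names."""

    groups = ("all",)
    # Trigger location: only evaluate when the crawler reaches a table_reference segment.
    crawl_behaviour = SegmentSeekerCrawler({"table_reference"})

    def _eval(self, context: RuleContext):
        """Return a LintResult (anchored on the offending segment) or None."""
        if context.segment.raw.lower().startswith("tmp_"):
            return LintResult(
                # Anchor location: the position reported back in the CLI readout.
                anchor=context.segment,
                # Custom message; without it, the first line of the docstring is used.
                description=f"Table reference {context.segment.raw!r} uses a temporary-style name.",
            )
        return None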
Plugin Discovery¶
One of the most common questions asked with respect to custom plugins is discovery, or "how do I tell SQLFluff where my plugin is". SQLFluff uses pluggy as its plugin architecture (developed by the folks at pytest). Pluggy uses the python packaging metadata for plugin discovery. This means that your plugin must be installed as a python package for discovery. Specifically, it must define an entry point for SQLFluff. When SQLFluff runs, it inspects installed python packages for this entry point and can then run any which define one. For example you'll see in the SQLFluff example plugin that the pyproject.toml file has the following section:
[project.entry-points.sqlfluff]
# Change this name in your plugin, e.g. company name or plugin purpose.
sqlfluff_example = "sqlfluff_plugin_example"
You can find equivalent examples for setup.cfg and setup.py in the python docs for entry points. This information is registered on install of your plugin (i.e. when running pip install, or equivalent if you're using a different package manager), so if you change it later, you may need to re-install your plugin.
You can test whether your rule has been successfully discovered by running sqlfluff rules and reviewing whether your new rule has been included in the readout.
NOTE:
If you're struggling to get discovery working, the simplest route is to start from the example plugin:
1. Copy the code from the SQLFluff example plugin into a local folder.
2. Run pip install -e /path/to/where/you/put/it.
3. Run sqlfluff rules, to confirm that the example plugin is present and demonstrate to yourself that discovery is functional.
4. Then edit the example plugin to do what you want, now that discovery isn't an issue. You may have to re-run pip install ... if you change anything in the rule metadata (like the entry point, filenames or plugin location).
Troubleshooting SQLFluff¶
How to Troubleshoot SQLFluff¶
It can at times be tricky to troubleshoot SQLFluff as it exists within an ecosystem of other tools, and can be deployed in a wide range of ways.
This step by step guide can help you narrow down what's likely going wrong and point you toward the swiftest resolution.
1. Common Errors¶
There are a few error messages you may get which have relatively straightforward resolution paths.
Parsing Errors¶
SQLFluff needs to be able to parse your SQL to understand its structure. That means if it fails to parse the SQL it will give you an error message. The intent is that if SQLFluff cannot parse the SQL, then the SQL is probably also invalid, and the message should help you understand where and why.
For example, this is a simple query which is not valid SQL:
select 1 2 3
from my_table
When running sqlfluff lint or sqlfluff parse we get the following error message:
==== parsing violations ====
L:   1 | P:  10 |  PRS | Line 1, Position 10: Found unparsable section: '2 3'
Furthermore if we look at the full parsing output we can see an unparsable section in the parse tree:
[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    statement:
[L:  1, P:  1]      |        select_statement:
[L:  1, P:  1]      |            select_clause:
[L:  1, P:  1]      |                keyword: 'select'
[L:  1, P:  7]      |                [META] indent:
[L:  1, P:  7]      |                whitespace: ' '
[L:  1, P:  8]      |                select_clause_element:
[L:  1, P:  8]      |                    numeric_literal: '1'
[L:  1, P:  9]      |                [META] dedent:
[L:  1, P:  9]      |            whitespace: ' '
[L:  1, P: 10]      |            unparsable: !! Expected: 'Nothing here.'
[L:  1, P: 10]      |                numeric_literal: '2'
[L:  1, P: 11]      |                whitespace: ' '
[L:  1, P: 12]      |                numeric_literal: '3'
[L:  1, P: 13]      |            newline: '\n'
[L:  2, P:  1]      |            from_clause:
[L:  2, P:  1]      |                keyword: 'from'
[L:  2, P:  5]      |                whitespace: ' '
[L:  2, P:  6]      |                from_expression:
[L:  2, P:  6]      |                    [META] indent:
[L:  2, P:  6]      |                    from_expression_element:
[L:  2, P:  6]      |                        table_expression:
[L:  2, P:  6]      |                            table_reference:
[L:  2, P:  6]      |                                naked_identifier: 'my_table'
[L:  2, P: 14]      |                    [META] dedent:
[L:  2, P: 14]      |    newline: '\n'
[L:  3, P:  1]      |    [META] end_of_file:
SQLFluff maintains its own version of each SQL dialect, and this may not be exhaustive for some of the dialects which are newer to SQLFluff or which are in very active development themselves. This means in some scenarios you may find a query which runs fine in your environment, but cannot be parsed by SQLFluff. This is not a "bug" per se, but is an indicator of a gap in the SQLFluff dialect.
Many of the issues raised on GitHub relate to parsing errors like this, but it's also a great way to support the project if you feel able to contribute a dialect improvement yourself. We have a short guide on Contributing dialect changes to walk you through the process. In the short term you can also ignore specific files from your overall project so that this specific file doesn't become a blocker for the rest. See Ignoring Errors & Files.
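A .sqlfluffignore file uses gitignore-style patterns, so a sketch that skips one troublesome file and a whole legacy folder (both paths hypothetical) would be:
models/legacy/unparsable_report.sql
sql/vendor/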
Configuration Issues¶
If you're getting either unexpected behaviour with your config, or errors because config values haven't been set correctly, it's often due to config file discovery (i.e. whether SQLFluff can find your config file, and what order it's combining config files in).
For a more general guide to this topic see Setting Configuration.
To help troubleshoot issues, if you run sqlfluff with a more verbose logging setting (e.g. sqlfluff lint /my/model.sql -v, or -vv, or -vvvvvv) you'll get a readout of the root config that SQLFluff is using. This can help debug which values are being used.
2. Isolating SQLFluff¶
If you're still getting strange errors, then the next most useful thing you can do - both to help narrow down the cause and to assist with fixing a bug if you have found one - is to isolate SQLFluff from any other tools you're using in parallel:
1. If you're using SQLFluff with the dbt templater, then try and recreate the error with the Jinja templater to remove the influence of dbt and any database connection related issues.
2. If you're getting an error in a remote CI suite (for example on GitHub actions, or a server like Jenkins), try and recreate the issue locally on your machine using the same tools.
3. If you're using pre-commit (see Using pre-commit), diff-quality (see Using SQLFluff on changes using diff-quality) or the VSCode extension, try to recreate the issue by running the SQLFluff CLI directly (see CLI Reference). Often this can make debugging significantly easier because some of these tools hide some of the error messages which SQLFluff gives the user to help debugging errors.
3. Minimise the Query¶
Often SQL scripts can get very long, and if you're getting an error on a very long script, then it can be extremely difficult to work out what the issue is. To assist with this we recommend iteratively cutting down the file (or alternatively, iteratively building a file back up) until you have the smallest file which still exhibits the issue. Often after this step, the issue can become obvious.
1. If your file has multiple statements in it (i.e. statements separated by ;), then remove statements until SQLFluff no longer shows the issue. When you get to that point, add the offending one back in and remove all the others.
2. Simplify individual statements. For example in a SELECT statement, if you suspect the issue is coming from a particular column, then remove the others, or remove CTEs, until you've got the simplest query which still shows the issue.
Contributing to SQLFluff¶
It is recommended that the following is read in conjunction with exploring the codebase. dialect_ansi.py in particular is helpful to understand the recursive structure of segments and grammars.
You may also need to reference the Internal API.
Architecture¶
At a high level, the behaviour of SQLFluff is divided into a few key stages. Whether calling sqlfluff lint, sqlfluff fix or sqlfluff parse, the internal flow is largely the same.
Stage 1, the templater¶
This stage only applies to templated SQL. Vanilla SQL is sent straight to stage 2, the lexer.
In order to lint templated SQL, SQLFluff must first convert the 'raw' or pre-templated code into valid SQL, which can then be parsed. The templater returns both the raw and post-templated SQL so that any rule violations which occur in templated sections can be ignored and the rest mapped to their original line location for user feedback.
SQLFluff supports multiple templating engines:
- Jinja
- SQL placeholders (e.g. SQLAlchemy parameters)
- Python format strings
- dbt (via plugin)
Under the hood dbt also uses Jinja, but SQLFluff uses a separate mechanism which interfaces directly with the dbt python package.
For more details on how to configure the templater see Templating Configuration.
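For example, a minimal sketch of selecting the Jinja templater in your .sqlfluff file looks like this (other built-in values include raw, python and placeholder, with dbt available via its plugin):
[sqlfluff]
templater = jinja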
Stage 2, the lexer¶
The lexer takes SQL and separates it into segments of whitespace and code. Where we can impart some high level meaning to segments, we do, but the result of this operation is still a flat sequence of typed segments (all subclasses of RawSegment).
Stage 3, the parser¶
The parser is arguably the most complicated element of SQLFluff, and is relied on by all the other elements of the tool to do most of the heavy lifting.
1. The lexed segments are parsed using the specified dialect's grammars. In SQLFluff, grammars describe the shape of SQL statements (or their components). The parser attempts to apply each potential grammar to the lexed segments until all the segments have been matched.
2. In SQLFluff, segments form a tree-like structure. The top-level segment is a FileSegment, which contains zero or more StatementSegments, and so on. Before the segments have been parsed and named according to their type, they are 'raw', meaning they have no classification other than their literal value.
3. A segment's .match() method uses the match_grammar, on which .match() is called. SQLFluff parses in a single pass through the file, so segments will recursively match the file based on their respective grammars. In the example of a FileSegment, it first divides up the query into statements, and then the .match() method of those segments works out the structure within them.
- Segments must implement a match_grammar. When .match() is called on a segment, this is the grammar which is used to decide whether there is a match.
- Grammars combine segments or other grammars together in a pre-defined way. For example the OneOf grammar will match if any one of its child elements matches.
During the recursion, the parser eventually reaches segments which have no children (raw segments containing a single token), and so the recursion naturally finishes.
4. If no match is found for a segment, the contents will be wrapped in an UnparsableSegment which is picked up as a parsing error later. This is usually facilitated by the ParseMode on some grammars which can be set to GREEDY, allowing the grammar to capture additional segments as unparsable. As an example, bracketed sections are often configured to capture anything unexpected as unparsable rather than simply failing to match if there is more than expected (which would be the default, STRICT, behaviour).
5. The result of the .match() method is a MatchResult which contains the instructions on how to turn the flat sequence of raw segments into a nested tree of segments. Calling .apply() on this result at the end of the matching process is what finally creates the nested structure.
When working on the parser there are a couple of design principles to keep in mind.
- Grammars are contained in dialects, the root dialect being the ansi dialect. The ansi dialect is used to host logic common to all dialects, and so does not necessarily adhere to the formal ansi specification. Other SQL dialects inherit from the ansi dialect, replacing or patching any segments they need to. One reason for the Ref grammar is that it allows name resolution of grammar elements at runtime, so a patched grammar with some elements overridden can still rely on lower-level elements which haven't been redeclared within the dialect.
- All grammars and segments attempt to match as much as they can and will return partial matches where possible. It is up to the calling grammar or segment to decide whether a partial or complete match is required based on the context it is matching in.
Stage 4, the linter¶
Given the complete parse tree, rule classes check for linting errors by traversing the tree, looking for segments and patterns of concern. If the rule discovers a violation, it returns a LintResult pointing to the segment which caused the violation.
Some rules are able to fix the problems they find. If this is the case, the rule will return a list of fixes, which describe changes to be made to the tree. This can include edits, inserts, or deletions. Once the fixes have been applied, the updated tree is written to the original file.
How to use Git¶
Many of the contributors may not be familiar with Git and it can be a confusing world for those new to it, with perplexing terms like clone, fork, branch, merge conflicts and rebase. This guide aims to give those of you new to Git some information about the best way we think of working with it, and also to serve as a quick reference to some of the Git terms and commands to use.
Introduction to Git¶
This section will give some basic background to complete newbies to Git. Feel free to skip to the next section, Recommended way to use Git for SQLFluff, where we talk about how we use it on SQLFluff if you understand the basics already.
What is Git?¶
Git is a distributed version control system. That mouthful basically means it's a way of keeping track of changes to our source code and other content - especially when many, many people are changing various parts of it. The distributed part of it is what makes Git so interesting (and so complicated!) - there can be many copies of our code, and that can cause fun and games when trying to keep it in sync!
The original and primary copy of a code base (called a repository or repo) is hosted on a server (e.g. GitHub), people will be working on copies in their local machine, and people may have forked a copy of the repo to another one also hosted on the server - and then that forked copy may also be copied locally to your machine. Add in different branches in any of those copies and it can quickly become quite confusing.
Git often involves working with the command line, which might be less familiar and a bit intimidating for those of you less technically minded. Graphical front end tools exist to try to replicate this command line functionality but it's helpful to have some familiarity with using Git on the command line and with a guide like this, hopefully that becomes less daunting a prospect!
What is GitHub and how is it different than Git?¶
GitHub is not Git, but it is one of the most commonly used instances of Git and adds various features on top of the core versioning of code that Git handles. The main thing GitHub gives you is a Git server to store your code, and a nice web front end to manage it all through. Using the web front end you can view (and even change!) code, raise issues, open and review pull requests, use GitHub Actions to automate things (e.g. test code) and even host wiki pages like this one.
In this Wiki I've tried to differentiate between Git concepts and commands and those specific to GitHub. Other instances of Git that you might be familiar with, or use in work or other projects, include GitLab and BitBucket. They have many of the same features as GitHub.
GitHub also has a graphical front end tool called GitHub Desktop for working on Git locally and syncing it back to GitHub. Check out the GitHub Desktop section for tips on how to use it.
SQLFluff makes extensive use of GitHub to help us manage the project and allow all the many disparate contributors to collaborate easily.
Installing Git¶
While it is possible to work just using GitHub's website - especially if you're just commenting on issues and adding your advice - managing the code really is best done locally on your own computer and then pushing changes back up to GitHub. Git is very popular and widely available (see installation instructions for Windows, Mac & Linux). You may already have it installed, so to check if that's the case, open a command line and type:
git --version
If you see a version number returned then you've passed the first step!
If not, then for Windows I recommend installing and using Git Bash which is a Linux-like command line. For MacOS the built in Terminal available under Launchpad is fine and running the above version check will prompt you to install XCode and Git. For Linux I presume you'll be familiar with how to install this.
Git Repos¶
A Git Repository or Repo is a collection of all the code that makes up a project. Well, that's not strictly true, as a project may also depend on other programs and libraries, but typically those are not stored in the project repo - only the code specific to this project is stored in the repo, along with config files that are used to install any necessary libraries needed to run the code (e.g. using a command like npm install for node modules).
The main SQLFluff repo is available on GitHub at: https://github.com/sqlfluff/sqlfluff. However, we also have a few other repos for the VS Code extension and the like, available at https://github.com/sqlfluff.
Git Branches¶
A repo will usually contain a number of branches. These are copies of the code where you can work independently on a particular item. The name branch is used because, like a tree, these can diverge from each other - though, unlike a tree, they are usually merged back when the work is complete.
There will be one main (or master) branch which everything should be merged back into when ready. Traditionally these have been called the master branch, but many projects are trying to use more inclusive language and have switched to using the name main or similar instead. SQLFluff moved to using main in 2021.
Creating a branch is very quick and is integral to how Git works. Git stores branches in an incredibly efficient way and doesn't literally keep a full copy of the same code, but basically only the differences. So do not feel like it's a big deal to create a branch (it's not!) - frequently creating small branches, and merging them back into the main branch when ready, is the best way to use Git. Creating large branches or reusing branches for lots of different changes is not the best way of using Git and will lead to issues.
GitHub Pull Requests¶
Once your changes are ready to merge back to main you open a pull request (often shortened to PR), which creates a special type of GitHub issue which can be used to merge your changes into the main branch.
A pull request is really a GitHub concept and at the end of the day is basically a fancy way of actioning a merge in Git. Bitbucket also use the term Pull Request, while GitLab uses Merge Request. It should also not be confused with git pull, which is a Git command to pull down changes from the server (e.g. GitHub) into your local copy.
An example pull request on GitHub is shown below: [image: Screenshot of an example pull request on GitHub.]
In this pull request there are the following tabs:
- Conversation - this allows you to give some info using GitHub markdown (including screenshots if you want). Reviewers can comment, and ask questions for you to answer, before merging the pull request into the main code.
- Commits - this shows a list of links to all the individual changes you made to the code. It's not that useful a tab to be honest!
- Checks - this shows all the automated checks run on your code so we know it's good! These are setup in the main repo using GitHub Actions (or similar) and the results are also shown at the bottom of the Conversation tab for open pull requests.
- Files Changed - this is one of the most useful tabs and shows each line of code changed. Reviewers should look through this tab, and can click on individual lines to make comments or code improvement suggestions which are added to the Conversation tab, and the person who opened the pull request (called the pull request author) can then answer or address the concern (including accepting any suggested code changes directly into this change with a click).
You can tag people to review your pull request, assign it to someone to deal with (not used much as kind of repeat of the author and reviewers), add labels...etc.
At the bottom of the Conversation tab you will see the following: [image: Bottom of a pull request with "Squash and Merge" and "Close" buttons.]
This shows on this PR that all checks have passed and this is ready to merge. Clicking the big green "Squash and Merge" button will copy (the "Merge" part) all this code into the main branch with one single commit (the "Squash" part). Usually you don't need to keep all the 100s of commits you have done while developing this code change, so "Squash" is what you want, but you can change it if you want.
You can also close this pull request if you change your mind with the Close button at the bottom, or add a comment with the Comment button if you make a big change to it since opening that you want people following the pull request to be aware of.
Please note you do NOT need to Close and Reopen the pull request (or even open a new pull request) when you need to make changes based on review feedback - simply pushing changes to the branch will cause any open pull request from that branch to automatically be updated and checks to automatically be rerun. It is expected (and a good thing!) to change your code based on feedback and this is very much part of the workflow of pull requests.
GitHub Forks¶
As well as branches, GitHub has the concept of forks, which basically means taking a complete copy of the repo (and all its branches at that time) into your own GitHub account. You can then create a branch in that fork, and then open a pull request to merge code from your branch on your fork, all the way back to the original repo (called the upstream repo). It may sound like an Inception level of abstraction and confusion but it actually works quite well once you get your head around it.
NOTE:
Why would you fork when you can just work in the original repo? Well most projects don't want people messing with the original repo so restrict permissions to only allow core contributors to create branches in the original repo. Others must therefore fork to make changes and then open pull requests to the original repo for review before they are committed.
And it's important to use the correct terminology when working with forks. Tempting as it is, the original repo should always be referred to as "original" or "upstream", and never "main" or "master" - which refer to branches within a repo. Similarly a "local" copy, or "clone" refers to the copy on your PC as we shall see and that can be of the original repo or a fork.
Another extra bit of hassle with a fork is that you must keep it reasonably up to date with the original, upstream repo. To do that you periodically merge or rebase the fork back to the original repo which pulls down changes into your fork. We'll explain how to do that later.
Cloning a Git Repo¶
To work on a project in GitHub you would normally clone a repo, which simply means taking a copy of it on your local PC. It is possible to make small edits on the GitHub.com website but it's quite limited and often doesn't allow you to run code locally to test it, for example. You can clone a repo by clicking on the green Code button on the repo's home page (make sure you do this on your fork and not on the main repo): [image: Screenshot of the clone button in GitHub.]
This offers a number of options:
- "Clone with SSH" is the recommended way; it's a little more complicated to set up, but it allows you to interact with GitHub without entering your GitHub password each time, and is basically mandatory if using 2FA for your GitHub account.
- "Clone with HTTPS" works but requires you to enter your password each time so gets a little tiresome.
Once you copy the SSH or HTTPS URL, simply go to the command line on your PC, move into a directory you want to create the copy in, and type the following (assuming SSH):
git clone git@github.com:sqlfluff/sqlfluff.git
You can clone a local copy of the original repo, if you plan to (and have access to) work on branches of that, or you can clone a fork of the original repo. The above example command clones the original repo location, and not the fork location - you should change the git address to the forked version when working from a fork.
After running this command you'll see the repo being downloaded locally. You can then branch, edit any of the files, add new files, or even delete files to your heart's content. Any changes you make will only be on your machine, and then you push changes back up to GitHub. We'll cover that later.
Just like with a fork, you need to keep any local copy up to date with both the original, upstream repo, and the GitHub version. This is done by using the git pull, git merge and git rebase commands. We'll explain how to do all that below.
Git Merge Conflicts¶
When keeping all the different copies in sync you will inevitably run into the dreaded "merge conflict" - a rite of passage every developer must go through. This happens where you've changed some code, but so has someone else, and their changes have been merged into main, so when you attempt to merge (either by syncing main back to your branch to update it with any new changes since branching, or by attempting to open a pull request from your branch) Git will give up and say "I don't know what to do here - you deal with it!".
In actual fact, dealing with merge conflicts is usually very simple. When you open the conflicted file you'll see something like this:
If you have questions, please
<<<<<<< HEAD
open an issue
=======
ask your question in Slack
>>>>>>> branch-a
In this case someone changed the line to "open an issue" and merged that to main (aka HEAD) and you've also changed it to "ask your question in Slack". Git is warning you that it has been changed since branching but you also changed it. You simply need to decide what line you want and then delete all the other lines (including the ones starting <<<<, ==== and >>>>). Then git add the "resolved" file to your branch.
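In practice, once you've edited the file to keep the lines you want, the resolution is just a couple of commands (the file path here is only an example):
git add models/my_model.sql
git commit   # completes the merge; use 'git rebase --continue' instead if you were rebasing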
You can even do it directly on GitHub.
Merge conflicts get a bad name and people think they are scary to deal with but Git actually makes it fairly easy. It will also usually only complain if the exact same line has changed — two people working on different parts of the same file usually won't see any merge conflicts.
Of course if you're both working on lots of the same code, across lots of files they can be a real pain to deal with - this is one of the main reasons to resync your branch back to the original main branch frequently, and also to work on small PRs rather than big unwieldy ones!
Recommended way to use Git for SQLFluff¶
When working on SQLFluff you must fork SQLFluff to your own copy and work on that. SQLFluff contributors do not have access to create branches in the original repo.
To create your own Fork do the following:
1. Click the Fork button in GitHub and wait for it to complete.
2. Clone your fork locally - for me (tunetheweb) that means running this command on your machine: git clone git@github.com:tunetheweb/sqlfluff.git
3. Add a link to upstream - git remote add upstream git@github.com:sqlfluff/sqlfluff.git (change this to the HTTPS URL if not using SSH)
It is also strongly recommended not to work on the main branch of your forked repo. When creating a new branch you will usually branch from main, so once your main has extra changes in it, it is no longer possible (or at least easy!) to create a clean branch for other work.
If you are only working on one thing at once, then using main in your fork may seem the quickest and easiest thing to do, but you'd be surprised how often you may want to pause that work for a bit and work on something else instead - and that's one of the advantages of using Git. For example if you are waiting on a pull request to be reviewed, you may want to work on another feature in the meantime on a fresh branch, which is completely independent of your other work. Or perhaps someone discovers an urgent, and easily fixed, bug in the code that you can quickly fix, before coming back to the current work. See the Switching between branches section below for more info on how to switch branches.
Working on a separate branch to main allows main to be kept in sync with upstream main, which allows new branches to be created more easily. It also allows you to merge upstream main into your branch periodically more easily. It also keeps the history of your changes clean, without a long trail of unrelated commits on each future pull request. Finally, it also allows you to completely reset your main back to the same as upstream if you get it completely messed up, without losing any history in other branches.
Resyncing your main branch to upstream¶
As discussed above it is recommended to use the main branch in your fork only to sync up with the main branch in the original SQLFluff repo. To do that you can do one of several things:
1. You can merge upstream main into your main branch - however if you have any changes, then this will add a commit message for each upstream change into your main. This means the next time you branch and open a pull request you'll see a long history of old commits, which is a bit noisy and annoying when they were nothing to do with your old pull request. Of course you shouldn't have any changes on main (because you're not using main for changes, as advised above, are you?) so this shouldn't be an issue, but you never know...
2. You can rebase your main branch to be based on upstream main. This is a special Git command that basically makes it look like you just forked your main off of upstream main and then applies any changes you have made in your repo on top of that, so should lead to less risk of old commit messages, but it will still add commits for any changes you added. And again, you shouldn't have any changes on main so this shouldn't be an issue, but you never know...
3. You can blast away your main and basically reset it back to upstream main. This sounds extreme but is actually my preferred option to ensure main is super clean and reset back to the same as upstream main. Note this will (intentionally!) blast away any changes from your main and you will lose them completely but, again, there shouldn't be any changes here if following the above advice.
So I prefer option 3 - it's the cleanest and ensures main is exactly the same as upstream, and that there will be no long history of commit messages in your next branch and pull request.
Commands to force reset your main branch to upstream¶
You can use the below commands to reset your local fork to upstream and then push those changes to GitHub.
NOTE:
Check if upstream already exists:
git remote -v
Add an upstream remote, if not already added previously:
git remote add upstream git@github.com:sqlfluff/sqlfluff.git
Then force reset your main branch:
git fetch upstream
git checkout main
git reset --hard upstream/main
git push origin main --force
After this you should visit your forked repo on GitHub and check that you get a message saying "This branch is even with sqlfluff:main.": [image: A forked repo which is even with upstream.]
Creating and working on a branch¶
So, when you're ready to make your first changes, do the following:
- 1.
- Resync your local copy to upstream as per above (unless you've just forked it, in which case you can skip this step)
- 2.
- Make sure you are in the main branch - git checkout main
- 3.
- Branch main to a new branch with a meaningful branch name. For example to work on an issue to add new Posgres Create Table you'd run this command - git checkout -b postgres-create-table (note the -b which means "create a new branch please").
- 4.
- Make your changes using your favourite code editor (I recommend VSCode myself).
- 5.
- If making code changes to the website then test them - follow instructions in CONTRIBUTING.md to set up the environment and then use tox -e generate-fixture-yml,cov-init,py38,cov-report,linting to run most of the tests.
- 6.
- Add any new files you added in this change that you want tracked in git - e.g. git add test/fixtures/dialects/*.sql. You also need to do this to re-add merge conflict files that you have resolved. You typically do not need to add other files that are already in the repo that you have changed, as Git will track those already.
- 7.
- Commit your changes - git commit -a. This means "save all changes to Git (for tracked files)". If you only want to commit some files, then git add just the files you want and then use git commit without the -a option. When you commit you need to add a message. Git will automatically open your default Git editor - usually vi, which can take some getting used to, but basically type your message and then type Esc : w q ! to enter command mode (Esc), start a command (:), save your file (w for write), quit (q) and force override any errors on save (!). See the next section for an alternative method if you're not comfortable with vi. And by the way, if you want to cancel the commit at this stage, remove any message you added and type Esc : q ! to quit without saving any changes and Git will see you didn't add a message and cancel the commit.
- 8.
- Push your committed changes to GitHub - git push. If this is your first push of this branch to GitHub, then git will tell you you have to use a slightly different command: git push --set-upstream origin postgres-create-table
- 9.
- If there are any changes already in GitHub (e.g. if other people are also working on this branch), then you will get an error and need to do a git pull to pull them down locally before you are able to push any more changes back up to GitHub.
- 10.
- Repeat steps 4-9 as often as you want until you are happy to open a pull request to merge this back into the SQLFluff code base.
Committing changes - to vi or not to vi¶
My preferred method of committing changes is to use this:
git commit -a
This takes all the changes for existing tracked files and adds them to the commit. New files still need to be added with git add but all files currently tracked by Git are automatically included without having to use git add. This then opens the default Git editor (usually vi) and lists the files for this commit, any files not included and allows you to add the message and complete the commit. If you close vi without adding a message then it cancels the commit.
However vi can be quite a barrier to people as it's a confusing editor with a "command" and an "edit" mode, and it needs you to remember command sequences (like Esc : w q !). For those not familiar with Linux and vi this can be quite a stumbling block.
You can provide the message on the command line so you don't have to go near vi, using a sequence of commands like this:
git status
git add file1 file2
git commit -m "Committing file1 and file2"
This does require you to add the files or folders first, so it's a bit more painful than git commit -a. Alternatively you can use the -a and -m switches together:
git status
git commit -a -m "Committing all open files"
The downside is that, unlike the vi method, it won't show you the list of files it's going to commit, so is a bit more dangerous, hence why I prefer the vi method instead. However, if you do a git status before you commit, you should see the files that will be committed. Plus you can always revert changes if you need to.
So, in summary: the vi method is preferred, but it can be more complicated for those not familiar with it, so you can give the commit message on the command line instead - just take care when doing so.
Keeping your branch up to date¶
It is also recommended to merge any changes that have happened to SQLFluff code (in its main branch) into your branch periodically in case it affects your code, and particularly important to do this just before opening a PR.
To merge changes into a forked repo from upstream main do the following:
- 1.
- Commit any changes in the branch you are working on.
- 2.
- Update your local copy of upstream - git fetch upstream
- 3.
- Merge the changes from upstream main into your branch - git merge upstream/main
- 4.
- Commit the merge - git commit -a
- 5.
- Push all your changes up to GitHub - git push
Or to merge from a branch on the main repo do the following:
- 1.
- Commit any changes in the branch you are working on.
- 2.
- Update your local copy of main - git fetch origin main
- 3.
- Merge the changes from main into your branch - git merge main
- 4.
- Commit the merge - git commit -a
- 5.
- Push all your changes up to GitHub - git push
Switching between branches¶
Git allows working on several branches at once. This allows you to work on something else while you are stuck on one bit of work (waiting for answers, or pull request feedback, or you just fancy a break!). Use git checkout to switch between branches and use git status to ensure all your changes are committed when switching between branches.
For example, let's say you are working on feature1 branch:
git checkout main
git pull
git checkout -b feature1
# Make some changes
git commit -m "Commit my changes for feature1"
# Make some more changes
git commit -m "Commit some more changes for feature1"
# Push changes to GitHub.com if you want to (always good to do this in case your computer dies!)
git push
# Note the first time you push a new branch you will need a slightly different push command:
# `git push --set-upstream origin feature1`
# Helpfully, git will tell you this if you try using just `git push`.
And then you want to take a break from feature1, in which case you should open a new branch - in most cases you want to branch from main again, and not from feature1 branch so make sure you flip back to main again if this is the case. If working on a fork, you should also check your main is up to date first - see the Resyncing your main branch to upstream section above:
# Check your branch is clean and everything has been committed
git status
# Create a new branch from main (note I've not included the resyncing of main to upstream here for forks)
git checkout main
git pull
git checkout -b feature2
# Make some changes and push your new branch to GitHub
git commit -m "Commit some changes for feature2"
git push --set-upstream origin feature2
# Make some more changes and push those too
git commit -m "Commit more changes for feature2"
git push
You are now free to switch back to feature1 if you want using git checkout (note you don't need the -b flag here, as that's only needed to create a new branch that doesn't exist yet; switching between existing branches doesn't need it):
git checkout feature1
And then switch back to feature2 later:
git checkout feature2
The primary concern with multiple branches like this is getting yourself confused! Using git status here is your friend to double check you're on the correct branch and all outstanding changes have been committed.
As with lots of things in Git, there are many ways to handle branches (including the git branch command), but I'd encourage you to stick with few commands and use git checkout, git commit, git push, git pull and git status as much as possible as you can do most things with those and it's already getting confusing!
The other alternative is to just create a new folder and clone the whole repo again and manage it completely separately. This can be easier, safer and less confusing for those less familiar with Git if working on a limited number of branches. However it doesn't scale very well and is not the way you're going to get the most out of Git, so as soon as you go beyond a second branch I'd strongly encourage you to get used to switching between branches with git checkout.
I do encourage separate folders however if you have different repos (e.g. you do some work on the main SQLFluff repo, and some work on your own fork) as switching the repo that a folder points to, while also possible, adds yet more confusion to an already complex thing! 🙂
Opening a Pull Request¶
Once you are finished making changes, you should take the following steps to open a pull request back to the original repo to accept your code into SQLFluff:
- 1.
- Merge in any changes that happened to SQLFluff code since you branched (see above).
- 2.
- Run all the automated tests tox -e generate-fixture-yml,cov-init,py38,cov-report,linting.
- 3.
- Make sure all your changes are pushed to GitHub.
- 4.
- Open a pull request in GitHub.
- 5.
- If the pull request closes an issue then you can add "Closes #123" or "Fixes #123" in the first comment and GitHub will automatically close issue #123 when the pull request is merged. If it doesn't fully close the issue, then you should instead say something like "Makes progress on #123" which will create a link on original issue but not close it.
As mentioned above, you can make more changes to your branch and push them up to GitHub and the Pull Request will automatically be updated. There is no need to close the PR and reopen a new one.
Actioning feedback from pull requests¶
When your pull request is reviewed, the first bit of feedback you're likely to see is from the automated tests that run every time a pull request is opened. They take a few minutes to run and will then give you a satisfying green tick, or a scary red cross.
Merge conflict checks¶
The first check that GitHub itself will do is check for any Git merge conflicts, and these must be resolved before the pull request can be merged. If you merge main into your code before submitting a pull request, then it's unlikely you'll get any of these (unless someone's got in real quick while you were opening it!), which is why it's recommended to do that, but other pull requests can be actioned while yours is being reviewed so they can still happen. Smaller pull requests touching fewer files reduce the chance of this, so again, where at all possible, break up changes into smaller batches.
Linter checks¶
Another check that we perform is linting the code in the pull request. This runs automated checks for errors or code styling and formatting issues that don't match the code conventions we use.
Python code is linted with flake8 and you can run this command on any files to see similar linting issues. You can use black to auto-fix most flake8 issues, though some need to be manually addressed.
Code review feedback¶
After the automated tests have passed the code will be reviewed manually by a maintainer, or another contributor. They can ask questions, or make suggestions to change the code.
Look upon a code review as an opportunity to learn and improve your code. Feedback can be tough to hear after you've worked hard on code, but stay aware that it is meant with the best possible intention to provide feedback to you for this and future commits, and to keep our codebase to a high standard. It is not a personal slight upon you or your code, and if you are getting annoyed with feedback I suggest you take a break, step away and read it again later in a fresh light. Of course if you feel that a reviewer is acting inappropriately then please raise it - we have a Code of Conduct and want all contributors to feel welcome. Feel free to also reach out to a maintainer if you would like to discuss something privately.
When a reviewer makes a code suggestion you can accept it right in GitHub and it will automatically update your branch. As this happens in GitHub directly, just make sure you do a git pull next time you are working locally on your code to pull down these changes. It's quite slow to accept a lot of changes this way, so if there are a lot of them it is sometimes better to change locally in your favourite editor, push one commit with the fixes, and then mark each of the suggestions as resolved. Any suggestions on lines which have changed since they were raised will be marked as outdated, which makes them easy to spot.
Reviewers may also make small, seemingly pedantic suggestions - usually they will include the word "nit" to indicate that this is a "nitpick". Like the linting, they can seem needless, but they help maintain our code quality.
It should also be noted that not all suggestions may be valid! Reviewers can make a mistake just as easily as the pull request author (more easily in some ways as they often will be reviewing based on reading the code rather than testing it). Feel free to push back on suggestions when you disagree with them. However, it is best to get consensus between reviewer and pull request author where at all possible so explain why you don't think you should make the change being suggested rather than just ignoring the suggestion or resolving it without a comment.
After addressing feedback, please re-request a review by clicking the little two-arrow icon next to the reviewer name, or make a comment (e.g. "All feedback addressed. I think this is good to merge now."). Sometimes it's difficult to know if someone is still working on feedback, and a pull request may be left unintentionally, getting out of date, because reviewers are not aware that it's ready for a re-review or merge.
Only SQLFluff maintainers can merge pull requests, but every contributor can review pull requests, and merging becomes a lot easier (and so more likely!) if someone else has already had a look through the changes. Please, please, please help us by reviewing pull requests to help spread the load!
Draft pull requests¶
It is also possible to open a draft pull request, if you want early feedback on your code or approach. Please remember to convert to a full pull request when happy.
Additionally if making a large number of changes (for example if you need to update every dialect), then it is best to do this in only one or two files initially, open a pull request (whether draft or full) and get feedback on your approach before you spend time updating all the files! It's much easier to review code changes if they are not drowned out by lots of near-identical changes in the same pull request, and much less likely to result in merge conflicts. Then the rest of the files can be added to the pull request, or a separate one opened for those (maybe a new pull request per dialect if different people need to approve the changes).
Reviewing pull requests¶
As mentioned above we strongly encourage contributors to help review pull requests. This is a voluntary, collaborative effort for us all and depending on one or two people creates a bottleneck and a single point of failure for the project.
Even if a review pulls up nothing, it is important to approve the pull request - this indicates that it has been reviewed and is just as useful (if not more so) as commenting on code or making suggestions.
Do also be conscious of how feedback will be read. We have many first time contributors who may not be as familiar with the language (either the programming language or the English language!) so do try to avoid technical terms, colloquialisms, etc. Though we are aware of some very commonly used acronyms and terms (which we've included in our Glossary of terms) like LGTM ("Looks Good To Me").
Do be mindful as well that pull request authors have voluntarily spent time on this and we wish to encourage that and foster an inclusive environment. Offensive language is forbidden by our Code of Conduct.
Do remember that a code review is also about reviewing the author's changes, and not about showing off your own knowledge! Try not to get side-tracked but instead raise a new issue if you want to consider something else that comes up during a code review but is not directly related.
On that note, do also remember that code can be improved incrementally. Small changes are much better in Git. So, while it's OK to point out a fuller solution do remember that Perfect is the enemy of good and accepting a change that is an improvement and then improving further in future iterations can often be better than holding out for the perfect solution that may never come. Of course that doesn't mean we should accept code that regresses the quality, or seems like the wrong way of doing it completely!
And finally, we strongly encourage positive GitHub reactions - particularly for new contributors. They give quick, encouraging feedback and add a playful, fun tone: [image: GitHub Heart and Rocket reactions.]
We discourage the negative ones though (thumbs down 👎, or confused 😕). Better to add a comment (politely!) explaining your concerns and then if others agree with you, they can "thumbs up" your comment. This keeps things on a positive tone and also means your cause for concerns are fully understood.
GitHub Desktop¶
GitHub Desktop is a Windows and MacOS app that provides a visual interface to GitHub. It reduces the need to use and understand Git via the command line.
This section will provide some tips on performing some common tasks via GitHub Desktop.
Installing GitHub Desktop¶
First make sure you have Git installed. See our section on Installing Git for more details.
You can then download the install file from https://desktop.github.com/, with further instructions from their Installing and configuring GitHub Desktop document. Your main tasks will be to Authenticate with GitHub and Configuring Git for GitHub Desktop so that the systems know who you are.
Cloning the SQLFluff repo¶
If you have not done so already, you will want to clone a copy of the https://github.com/sqlfluff/sqlfluff repo onto your computer. The simplest way is to follow Cloning a repository from GitHub to GitHub Desktop where you go to the repository on the website and select "Open with GitHub Desktop". This will open a window where you can click "Clone" and the job will be done.
Navigating GitHub Desktop¶
Once you have cloned repositories you will be able to select them via the "Current repository" toolbar button, just under the menu on the left. By default the sidebar will show you what edits have been made to the repository, and the main section shows actions you may want to perform.
Updating your repository (Pull origin)¶
Over time the original repository will get updated and your copy will become out of date. GitHub Desktop will highlight if your repository is out of date, with an option to pull any changes from the origin so that you have the latest versions.
Making your own edits (creating a branch)¶
You want to create your own branch before you start as you very likely do not have permission to edit the SQLFluff main branch. A branch is a way for you to group your own edits so you can later submit (push) them for review. Then, when they are approved, they will get merged back into the main branch.
Before creating a branch, make sure you're currently on the main branch and it is up to date (see above).
If you click on the "Current branch" tab in the toolbar you will see all the public branches in play. To create your own branch, enter a new name in the textbox at the top and click the "Create new branch" button.
Publishing your branch¶
At the moment your branch is only known to you. If you want others to see it, then you need to publish it. GitHub Desktop will prompt you to do that.
Once published you and others can select your branch on the GitHub website.
Editing your branch¶
You can edit the repository using your favourite editor. As you edit, GitHub Desktop will show you what changes you have made.
Note that you can change branches at any time, but I suggest you commit and push any edits (see next) before you switch as things can get confusing. If you are working with multiple branches, always keep an eye out to make sure you're on the right one when working.
Committing and pushing your edits to the web¶
Every once in a while you want to store and document your changes. This can help you or others in the future. You also have to commit before you can share (push) your changes with anyone. You can quickly commit your current edits via the form to the bottom left.
Once you have commits you will be prompted to push those commits to GitHub. I typically do this straight after committing.
Getting your changes accepted¶
At this point you have a branch with edits committed and everything pushed to GitHub. Once you are happy with your work, you want it to be reviewed, approved and merged back into the main repository.
For this I switch back to the website, as it is there you will be communicating with reviewers. To get this stage started you need to create a pull request. Go to the SQLFluff repository on GitHub, make sure your branch is selected, then click the Pull request link and follow the instructions. This will notify the reviewers who will help you to get your changes live.
Keeping the forked repository up to date¶
The main branch of your fork should be kept in sync with the original repository (rebased). Especially before you create any branches to make edits. Details on how to do this are in the Resyncing your main branch to upstream section.
Making your own edits¶
This is done in the exact same way as before (i.e. in Making your own edits (creating a branch)). Create a branch from your main (make sure main is up to date using the above process), publish the branch, edit the files in the branch, commit your edits, push back to GitHub.
With a forked repository the process to get your edits accepted is about the same as before (i.e. in Getting your changes accepted). Go to the web page for your copy of the repository and create a pull request.
Glossary of terms¶
This is a list of terms for those less familiar with Git/GitHub:
- branch - a copy of the code within a repo, where you may be working on code that is not ready to commit back to the main branch. Note Git actually only stores differences so it's not really a copy, but that's just an efficiency in Git and you can consider it a copy to all intents and purposes.
- fetch - a git command which downloads all changes from a remote repo (i.e. a server) to a local one.
- fork - a complete copy of a repo and all its branches.
- LGTM - shorthand for Looks Good To Me - typically used when approving a pull request.
- local - a complete copy of a repo on your PC
- main - the primary branch of SQLFluff repo. Some other repos use master for this.
- master - an alternative name for main branch used by some repos.
- merge - to copy changes from one branch into another branch.
- merge request - another name for a pull request, used particularly in GitLab (an alternative to GitHub).
- origin - the server version of the repo (the opposite of local).
- pull - to fetch changes from a remote repo, and then merge them into this branch in one step.
- pull request - a way to merge changes back to the main branch. A pull request is a special issue type that allows the potential merge to be reviewed and commented on before it is merged.
- rebase - to bring a branch up to date, as if it had been created from now, while maintaining the existing changes on top.
- repo/repository - a git project which is basically a collection of files - which may exist in several branches.
- upstream - the original repo that a fork was created from.
Contributing dialect changes¶
One of the best ways that SQLFluff users can improve SQLFluff for themselves and others is in contributing dialect changes.
Users will likely know their syntax much better than the regular maintainers and will have access to an instance of that SQL dialect to confirm changes are valid SQL in that dialect.
If you can fix your own issues then that's often the quickest way of unblocking any issues preventing you from using SQLFluff! The maintainers are all volunteers doing this in our spare time and (like you all I'm sure!), we only have so much time to work on this.
How SQLFluff reads (or parses) SQL¶
SQLFluff has a lexer and parser which is built in a very modular fashion that is easy to read, understand, and expand on without any core programming skills or deep knowledge of Python or how SQLFluff operates. For more information see the Architecture Documentation, but we will cover it briefly here to give you enough to start contributing.
We also have a robust Continuous Integration pipeline in GitHub where you can gain confidence your changes are correct and will not break other SQLFluff users, even before a regular maintainer reviews the code.
SQLFluff defines the syntax it will use in dialect files (more on this later). If you look at the dialect_ansi.py file you will see it has syntax like this:
class SelectClauseSegment(BaseSegment):
"""A group of elements in a select target statement."""
type = "select_clause"
match_grammar = StartsWith(
Sequence("SELECT", Ref("WildcardExpressionSegment", optional=True)),
terminator=OneOf(
"FROM",
"WHERE",
"ORDER",
"LIMIT",
"OVERLAPS",
Ref("SetOperatorSegment"),
),
enforce_whitespace_preceding_terminator=True,
)
parse_grammar = Ref("SelectClauseSegmentGrammar")
This says the SelectClauseSegment starts with SELECT or SELECT * and ends when it encounters a FROM, WHERE, ORDER, etc. keyword.
The match_grammar is what is used primarily to try to match and parse the statement. It can be relatively simple (as in this case), to quickly match just the start and terminating clauses. If that is the case, then a parse_grammar is needed to actually delve into the statement itself with all the clauses and parts it is made up of. The parse_grammar can be fully defined in the class or, like above example, reference another class with the definition.
The match_grammar is used to quickly identify the start and end of this block, as parsing can be quite intensive and complicated as the parser tries various combinations of classes and segments to match the SQL (particularly optional ones like the WildcardExpressionSegment above, or when there is a choice of statements that could be used).
For some statements a quick match is not needed, and so we can delve straight into the full grammar definition. In that case the match_grammar will be sufficient and we don't need the optional parse_grammar.
Here's another statement, which only uses the match_grammar and doesn't have (or need!) an optional parse_grammar:
class JoinOnConditionSegment(BaseSegment):
"""The `ON` condition within a `JOIN` clause."""
type = "join_on_condition"
match_grammar = Sequence(
"ON",
Indent,
OptionallyBracketed(Ref("ExpressionSegment")),
Dedent,
)
You may have noticed that a segment can refer to another segment, and that is a good way of splitting up a complex SQL expression into its component parts to manage and handle them separately.
Segment grammar options¶
There are a number of options when creating SQL grammar including:
Grammar | Used For | Example |
"KEYWORD" | Having a raw SQL keyword | "SELECT" |
Sequence() | Having a known sequence of Keywords or Segments | Sequence("SELECT", Ref("SelectClauseElementSegment"), "FROM"...) |
AnyNumberOf() | Choose from a set of options which may be repeated | "SELECT", AnyNumberOf(Ref("WildcardExpressionSegment"), Ref("ColumnReferenceSegment")...)... |
OneOf() | A more restrictive form of AnyNumberOf, limited to just one option | OneOf("INNER","OUTER","FULL"), "JOIN" |
Delimited() | Used for lists (e.g. comma-delimited - which is the default) | "SELECT", Delimited("SelectClauseElementSegment"), "FROM"... |
Bracketed() | Used for bracketed options - like function parameters | Ref("FunctionNameSegment"), Bracketed(Ref("FunctionContentsGrammar")) |
Some of these grammars have extra params you can give them, the most commonly used being optional=True. This allows you to further define the make up of a SQL statement. Here's the DeleteStatementSegment definition:
parse_grammar = Sequence(
"DELETE",
Ref("FromClauseSegment"),
Ref("WhereClauseSegment", optional=True), )
You can see the WHERE clause is optional (many's a head has been shaken because of deletes without WHERE clauses I'm sure, but that's what SQL syntax allows!).
Using these Grammar options, it's possible to build up complex structures to define SQL syntax.
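To illustrate how these pieces compose, here is a purely hypothetical segment - the class name and the clause itself are invented for this example and are not part of the real ANSI dialect, although the referenced segments (ExpressionSegment, ColumnReferenceSegment) and the grammar helpers (typically imported from sqlfluff.core.parser in dialect files) are real. Treat it as a sketch of the pattern rather than working dialect code:
# BaseSegment, Sequence, Delimited, Bracketed and Ref are typically imported
# from sqlfluff.core.parser in dialect files.
class MyValuesClauseSegment(BaseSegment):
    """A hypothetical `VALUES (...), (...)` style clause, for illustration only."""

    type = "my_values_clause"
    match_grammar = Sequence(
        "VALUES",
        Delimited(          # one or more comma-separated rows
            Bracketed(      # each row is wrapped in brackets
                Delimited(  # each row holds comma-separated expressions
                    Ref("ExpressionSegment"),
                ),
            ),
        ),
        # An optional trailing clause, only matched if present.
        Sequence("ORDER", "BY", Ref("ColumnReferenceSegment"), optional=True),
    )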
Segments and Grammars¶
A Segment is a piece of the syntax which defines a type (which can be useful to reference later in rules or parse trees). This can be through one of the functions that creates a Segment (e.g. NamedParser, SegmentGenerator...etc.) or through a class.
A Grammar is a section of syntax that can be used in a Segment. Typically these are created to avoid repeating the same code in multiple places. Think of a Grammar as an alias for a piece of syntax to avoid you having to type out the same code again and again and again.
The other good thing about Grammars is it allows other dialects to override a specific part of a Segment without having to redefine the whole thing just to tweak one small part. For example ansi defines this:
NotOperatorGrammar=StringParser("NOT", KeywordSegment, type="keyword")
whereas mysql overrides this to:
NotOperatorGrammar=OneOf(
StringParser("NOT", KeywordSegment, type="keyword"),
StringParser("!", CodeSegment, name="not_operator", type="not_operator"), ),
This allows MySQL to use ! in all the places that NOT was used (providing they use NotOperatorGrammar rather than hardcoding the NOT keyword of course). This makes it much easier to customise syntax to a particular dialect without having to copy and paste (and maintain) nearly identical code multiple times just to add the extra ! syntax that MySQL supports to mean NOT.
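To show roughly how such an override is wired up: a dialect module typically creates itself as a copy of another dialect and then replaces individual grammars. The sketch below follows that pattern for the MySQL example above; exact import paths and signatures can vary between SQLFluff versions, so treat it as illustrative rather than a copy of the real dialect_mysql.py:
from sqlfluff.core.dialects import load_raw_dialect
from sqlfluff.core.parser import CodeSegment, KeywordSegment, OneOf, StringParser

# Start from a copy of the ANSI dialect...
ansi_dialect = load_raw_dialect("ansi")
mysql_dialect = ansi_dialect.copy_as("mysql")

# ...then swap out just the one grammar, leaving everything else inherited.
mysql_dialect.replace(
    NotOperatorGrammar=OneOf(
        StringParser("NOT", KeywordSegment, type="keyword"),
        StringParser("!", CodeSegment, name="not_operator", type="not_operator"),
    ),
)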
Dialects¶
A lot of SQL is the same no matter which particular type of SQL you are using. The basic SELECT.. FROM... WHERE statement is common to them all. However lots of different SQL dialects (Postgres, Snowflake, Oracle... etc.) have sprung up as different companies have implemented SQL, or expanded it, for their own needs.
For this reason, SQLFluff allows creating dialects, which can have different grammars from each other.
SQLFluff has all the dialects in the src/sqlfluff/dialects folder. The main dialect file (that every other dialect ultimately inherits from) is the dialect_ansi.py file.
In SQLFluff, a dialect is basically a file which inherits everything from the original ANSI dialect, and then adds or overrides parsing segments. If a dialect has the exact same SELECT, FROM and WHERE clauses as ANSI but a different ORDER BY syntax, then only the ORDER BY clause needs to be overridden, so the dialect file will be very small. For some of the other dialects where there's lots of differences (Microsoft T-SQL!) you may be overriding a lot more.
Lexing¶
I kind of skipped this part, but before a piece of SQL can be parsed, it is lexed - that is split up into symbols, and logical groupings.
An inline comment, for example, is defined as this:
RegexLexer(
"inline_comment",
r"(--|#)[^\n]*",
CommentSegment,
segment_kwargs={"trim_start": ("--", "#")}, ),
That is, anything after -- or # to the newline. This allows us to deal with that whole comment as one lexed block and so we don't need to define how to parse it (we even give that a parsing segment name here - CommentSegment).
For simple grammar additions, you won't need to touch the lexing definitions as they usually cover most common cases already. But for slightly more complicated ones, you may have to add to this. So if you see lexing errors then you may have to add something here.
Lexing happens in order. So it starts reading the SQL from the start, until it has the longest lexing match, then it chomps that up, files it away as a symbol to deal with later in the parsing, and starts again with the remaining text. So if you have SELECT * FROM table WHERE col1 = 12345 it will not break that up into S, E, L...etc., but instead into SELECT, *, FROM, table...etc.
An example of where we had to override lexing, is in BigQuery we have parameterised variables which are of the form @variable_name. The ANSI lexer doesn't recognise the @ sign, so you could add a grammar or segment for that. But a better solution, since you don't need to know two parts (@ and variable_name) is to just tell the lexer to go ahead and parse the whole thing into one big symbol, that we will then use later in the parser:
bigquery_dialect.insert_lexer_matchers(
[
RegexLexer("atsign_literal", r"@[a-zA-Z_][\w]*", CodeSegment),
],
before="equals", )
Note the before="equals" which means we tell the lexer the order of preference to try to match this symbol. For example, if we'd defined an at_sign lexing rule for other, standalone @ usage, then we'd want this atsign_literal rule to be considered first, and only fall back to the simpler rule if this one couldn't match.
Keywords¶
Most dialects have a keywords file, listing all the keywords. Some dialects just inherit the ANSI keywords and then add or remove keywords from that. Not quite as accurate as managing the actual keywords, but a lot quicker and easier to manage usually!
Keywords are separated into RESERVED and UNRESERVED lists. RESERVED keywords have extra restrictions meaning they cannot be used as identifiers. If using a keyword in grammar (e.g. "SELECT"), then it needs to be in one of the keywords lists, so you may have to add it or you might see errors like this (showing "NAN" has not been added as a keyword in this dialect):
RuntimeError: Grammar refers to 'NanKeywordSegment' which was not found in the redshift dialect
Also if editing the main ANSI dialect, and adding to the ANSI keyword list, then take care to consider whether the keyword needs to be added to the other dialects if they will inherit this syntax - usually yes unless explicitly overridden in those dialects.
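As a rough sketch of what adding a keyword looks like - the dialect and keyword here are just examples, and real dialect modules often manage their keyword lists via a dedicated _keywords.py file instead - the keyword sets behave like ordinary Python sets:
from sqlfluff.core.dialects import load_raw_dialect

# Illustrative only: in practice this update lives inside the dialect's own
# module (e.g. dialect_redshift.py), not in a separate script like this.
redshift_dialect = load_raw_dialect("redshift")

# Add "NAN" as an unreserved keyword so grammar referring to "NAN" can resolve.
redshift_dialect.sets("unreserved_keywords").update(["NAN"])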
Where to find the grammar for your database¶
Now that you know about some of the tools SQLFluff provides for lexing and parsing a SQL statement, what changes will you make to it? While devising ad-hoc changes to the grammar to fix particular issues can be better than nothing, the best and most robust contributions will be created by consulting the source of truth for the grammar of your dialect when mapping it to SQLFluff segments and grammars. This will help you exhaustively find all possible statements that would be accepted by the dialect.
Many computer languages are written using venerable tools like Flex and Bison, or similar parser generators, and SQL database engines are no exception. You can refer to the parser specification in the source code of your database engine for the ultimate source of truth of how a SQL statement will be parsed: you might be surprised at what your SQL engine will parse due to gaps in the documentation!
You should also refer to the reference documentation for your SQL dialect to get a concise high-level overview of what the statement grammar looks like, as well as read of any further restrictions and intended use of the grammar that you find. If your SQL engine is closed-source, then you'll likely have only the reference documentation to work with. However, this will always be a less-accurate resource than the bison grammar that's actually used for code generation inside the database engine itself.
It is also extremely helpful to try parsing the queries that you put into the test fixtures to make sure that they are actually parsable by the database engine. They don't have to be valid queries per se (can refer to non-existing table names, etc), but you should confirm that they are parsable. We do not want to require that SQLFluff be able to parse a statement that the actual database engine would reject: overeager matching logic can create parsing issues elsewhere.
Here is a list of grammars and parsing techniques for some of the dialects implemented by SQLFluff:
ANSI SQL¶
Unfortunately, the ANSI SQL standard is not free. If you want a licensed copy of the latest standard, it must be purchased: Part 2 is the most useful section for SQLFluff since it contains the grammar. There are, however, other resources you can find on the Internet related to this standard:
- modern-sql.com/standard: has a discussion on the various parts of the standard, and links to some older/draft versions of it that are out there.
- jakewheat.github.io/sql-overview: has a nice browsable view of (only) the BNF grammar.
- web.cecs.pdx.edu/~len/sql1999.pdf: a copy of the (much older) SQL:1999 standard.
- developer.mimer.com/services/mimer-sql-validator/: the SQL-2016 validator can be used to verify if a query can be parsed using the ANSI standard.
PostgreSQL¶
Simply Googling for pg <statement> will often bring up the documentation for an older PG version. Please be sure you're referring to the latest version of the documentation, as well as refer to the bison grammar.
- PostgreSQL Bison grammar
- PostgreSQL Flex scanner
- More information about the parsing stage
- Reference documentation for Postgres SQL statements
- To check if a statement is parseable, simply paste it into psql. If you get a ERROR: syntax error then it means that it can't be parsed. These queries do not need to be parsed by SQLFluff; please do not include them in the main test fixtures. If you get a different error, then it means the query was parsed successfully, and might have failed for a different reason (e.g. non-existing column name, etc). In that case, it's probably best if SQLFluff can also parse it.
- The pgsql-parser tool wraps the official PostgreSQL source code & bison grammar linked above into a simple CLI tool. You can use it if you want to view the exact parse tree that PG can see.
MySQL¶
- Reference documentation for MySQL SQL statements
- MySQL Bison grammar
- To check if a statement is parsable, simply paste it into mysql. Look for ERROR 1064 (42000): You have an error in your SQL syntax to indicate a parse error.
Example of contributing a syntax fix¶
So that's a bit of theory but let's go through some actual examples of how to add to the SQLFluff code to address any issues you are seeing. In this I'm not going to explain about how to set up your Python development environment (see the Contributing to SQLFluff and the CONTRIBUTING.md file for that), nor how to manage Git (see our How to use Git guide if new to that, and we use the standard “Fork, and then open a PR” workflow common to GitHub projects).
So assuming you know (or are willing to follow above guides to find out!) how to set up Python environment, and commit via Git, how do you contribute a simple fix to a dialect for syntax you want SQLFluff to support?
Example 1¶
If we look at issue #1520 it was raised to say we couldn't parse this:
CREATE OR REPLACE FUNCTION public.postgres_setof_test() RETURNS SETOF text
and instead returned this message:
Found unparsable section: 'CREATE OR REPLACE FUNCTION crw_public.po...'
This was in the postgres dialect, so I had a look at dialect_postgres.py and found the code in CreateFunctionStatementSegment which had the following:
parse_grammar = Sequence(
"CREATE",
Sequence("OR", "REPLACE", optional=True),
Ref("TemporaryGrammar", optional=True),
"FUNCTION",
Sequence("IF", "NOT", "EXISTS", optional=True),
Ref("FunctionNameSegment"),
Ref("FunctionParameterListGrammar"),
Sequence( # Optional function return type
"RETURNS",
OneOf(
Sequence(
"TABLE",
Bracketed(
Delimited(
OneOf(
Ref("DatatypeSegment"),
Sequence(
Ref("ParameterNameSegment"), Ref("DatatypeSegment")
),
),
delimiter=Ref("CommaSegment"),
)
),
optional=True,
),
Ref("DatatypeSegment"),
),
optional=True,
),
Ref("FunctionDefinitionGrammar"), )
So it allowed returning a table, or a datatype.
Fixing the issue was as simple as adding the SETOF structure as another return option:
parse_grammar = Sequence(
"CREATE",
Sequence("OR", "REPLACE", optional=True),
Ref("TemporaryGrammar", optional=True),
"FUNCTION",
Sequence("IF", "NOT", "EXISTS", optional=True),
Ref("FunctionNameSegment"),
Ref("FunctionParameterListGrammar"),
Sequence( # Optional function return type
"RETURNS",
OneOf(
Sequence(
"TABLE",
Bracketed(
Delimited(
OneOf(
Ref("DatatypeSegment"),
Sequence(
Ref("ParameterNameSegment"), Ref("DatatypeSegment")
),
),
delimiter=Ref("CommaSegment"),
)
),
optional=True,
),
Sequence(
"SETOF",
Ref("DatatypeSegment"),
),
Ref("DatatypeSegment"),
),
optional=True,
),
Ref("FunctionDefinitionGrammar"), )
With that change, the above statement could now parse.
I added a test case (covered below) and submitted pull request #1522 to fix this.
Example 2¶
If we look at issue #1537 it was raised to say we couldn't parse this:
select 1 from group
And threw this error:
==== parsing violations ====
L: 1 | P: 10 | PRS | Line 1, Position 10: Found unparsable section: 'from'
L: 1 | P: 14 | PRS | Line 1, Position 14: Found unparsable section: ' group'
The reporter had also helpfully included the parse tree (produced by sqlfluff parse):
[L: 1, P: 1] |file:
[L: 1, P: 1] | statement:
[L: 1, P: 1] | select_statement:
[L: 1, P: 1] | select_clause:
[L: 1, P: 1] | keyword: 'select'
[L: 1, P: 7] | [META] indent:
[L: 1, P: 7] | whitespace: ' '
[L: 1, P: 8] | select_clause_element:
[L: 1, P: 8] | literal: '1'
[L: 1, P: 9] | whitespace: ' '
[L: 1, P: 10] | [META] dedent:
[L: 1, P: 10] | from_clause:
[L: 1, P: 10] | unparsable: !! Expected: 'FromClauseSegment'
[L: 1, P: 10] | keyword: 'from'
[L: 1, P: 14] | unparsable: !! Expected: 'Nothing...'
[L: 1, P: 14] | whitespace: ' '
[L: 1, P: 15] | raw: 'group'
[L: 1, P: 20] | newline: '\n'
So the problem was it couldn't parse the FromClauseSegment. Looking at that definition showed this:
FromClauseTerminatorGrammar=OneOf(
"WHERE",
"LIMIT",
"GROUP",
"ORDER",
"HAVING",
"QUALIFY",
"WINDOW",
Ref("SetOperatorSegment"),
Ref("WithNoSchemaBindingClauseSegment"), ),
So the parser was terminating as soon as it saw the GROUP and saying "hey we must have reached the end of the FROM clause".
This was a little restrictive so changing that to this solved the problem:
FromClauseTerminatorGrammar=OneOf(
"WHERE",
"LIMIT",
Sequence("GROUP", "BY"),
Sequence("ORDER", "BY"),
"HAVING",
"QUALIFY",
"WINDOW",
Ref("SetOperatorSegment"),
Ref("WithNoSchemaBindingClauseSegment"), ),
You can see we simply replaced the "GROUP" by a Sequence("GROUP", "BY") so it would only match if both words were given. Rechecking the example with this changed code, showed it now parsed. We did the same for "ORDER", and also changed a few other places in the code with similar clauses and added a test case (covered below) and submitted pull request #1546 to fix this.
Example 3¶
As an example of using the reference grammar to fix an existing SQLFluff grammar, pull request #4744 contributed the CREATE CAST / DROP CAST statements to SQLFluff from scratch for both ANSI and PostgreSQL dialects. The first step when contributing a new statement is to check whether the statement is part of the ANSI standard. If it is, then you very likely should first start by adding a generally vendor-neutral version to the SQLFluff ANSI dialect so that other dialects can inherit from it. Every database engine deviates from the ANSI standard in practice, but by adding a reasonably standard segment to the ANSI dialect, you'll probably do a reasonable thing for most other database dialects.
In this case, CREATE and DROP CAST were indeed defined in the ANSI standard (https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#_11_63_user_defined_cast_definition), as revealed by a quick search of the document:
<user-defined cast definition> ::=
CREATE CAST <left paren> <source data type> AS <target data type> <right paren>
WITH <cast function>
[ AS ASSIGNMENT ]
So the first step was to read this ANSI BNF grammar and use it to build a corresponding vendor-neutral CreateCastStatementSegment in dialect_ansi.py.
class CreateCastStatementSegment(BaseSegment):
"""A `CREATE CAST` statement.
https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#_11_63_user_defined_cast_definition
"""
type = "create_cast_statement"
match_grammar: Matchable = Sequence(
"CREATE",
"CAST",
Bracketed(
Ref("DatatypeSegment"),
"AS",
Ref("DatatypeSegment"),
),
"WITH",
Ref.keyword("SPECIFIC", optional=True),
OneOf(
"ROUTINE",
"FUNCTION",
"PROCEDURE",
Sequence(
OneOf("INSTANCE", "STATIC", "CONSTRUCTOR", optional=True),
"METHOD",
),
),
Ref("FunctionNameSegment"),
Ref("FunctionParameterListGrammar", optional=True),
Sequence("FOR", Ref("ObjectReferenceSegment"), optional=True),
Sequence("AS", "ASSIGNMENT", optional=True),
) # Not shown: register the CreateCastStatementSegment in StatementSegment
As you work your way through the grammar, think about whether other parts of the SQL language might contain similar elements. For example, here we noticed that there are already segments we can reuse for data types, function names, and function parameter lists. This helped simplify our new grammar, as well as make it easy to centrally change those particular areas of the grammar in other dialects. Also consider whether there are entire new segments and grammars you should separately define in addition to the root statement segment you're writing. Introducing new and reusing existing segments adds structure to the SQLFluff parse tree that can make it easier for lint rules to analyze the tree. A strong indicator that there should be a shared segment or grammar is when the reference grammar has a symbol that is reused from multiple other symbols/statements.
After writing the ANSI segment (and corresponding tests), it was time to move on to the PostgreSQL grammar. In this case, a quick glance at the documentation shows us that there are some notable differences from ANSI SQL:
- You can only specify FUNCTION. Other keywords like ROUTINE and PROCEDURE are rejected.
- The SPECIFIC keyword is not supported.
- Most importantly: PG provides some non-standard extensions which we'd like to include, like WITHOUT FUNCTION and AS IMPLICIT.
However, we should also consult the bison grammar for CREATE CAST. Bison grammars tend to be very lengthy and daunting, but the right techniques can help you quickly and easily find what you're looking for:
- Search for a symbol by adding a : to the end of it.
- Start with the highest level of the thing you are looking for. For example, start with the top-level statement symbol. With PostgreSQL, all statements end with Stmt. Putting it all together, if we search for CreateCastStmt:, that takes us right to the definition for it.
- Drill down into deeper parts of the parser to learn more. For example, we see function_with_argtypes in the sequence; if we want to know what that means, search for function_with_argtypes: to find it.
Examining the Bison grammar can take a few extra minutes, but it can be rewarding. You'll be surprised what you might learn. I've found entire alternate spellings of keywords in there that were not in the documentation, and which testing showed were indeed valid SQL! The grammar in PG documentation is human-maintained and not auto-generated, so there can be and are gaps between what is parsable and what is documented.
A good approach if you're still learning might be to draft a segment from the high-level documentation, and then systematically go through it with the bison grammar and verify it's correct (and that you're not forgetting anything).
One aspect of bison grammars to be aware of is that they tend to be very recursive, because bison doesn't have the high-level constructs such as AnyNumberOf, Delimited, Bracketed, and so on that SQLFluff provides. On the other hand, SQLFluff doesn't scale well with recursion. Sometimes it's unavoidable, and in many cases (e.g. a parenthesized expression) it's reasonable to refer to another segment recursively. But many times the recursion is extremely trivial, and should always be rewritten using an existing high-level SQLFluff concept. For example, this bison grammar defines a bracketed comma-delimited list which would be better represented using Bracketed and Delimited in SQLFluff:
func_args: '(' func_args_list ')' { $$ = $2; }
| '(' ')' { $$ = NIL; }
;
func_args_list:
func_arg { $$ = list_make1($1); }
| func_args_list ',' func_arg { $$ = lappend($1, $3); }
;
Example 4¶
As an example of using the reference grammar to fix an existing SQLFluff grammar, issue #4336 reported that array slices were not being parsed correctly in PostgreSQL. A simple SELECT statement was given that I further simplified to the following test case:
SELECT a[2:2+3];
Obviously, we know that a simple query like SELECT a; would parse, so it's surely related to the array access. I started by looking up the bison grammar for PostgreSQL's SELECT statement and drilling down into it to find an array accessor symbol; searching for SelectStmt: proved to be a lucky guess to start with:
SelectStmt: select_no_parens %prec UMINUS
| select_with_parens %prec UMINUS
;
Drilling down into the grammar via SelectStmt --> select_no_parens --> simple_select --> target_list --> target_el shows that we are dealing with an a_expr, which is the main symbol widely used to represent an expression throughout the grammar. SQLFluff implements that as ExpressionSegment (and more specifically Expression_A_Grammar). Looking further: target_el --> a_expr --> c_expr --> columnref, which brings us to a key rule:
columnref: <snip>
| ColId indirection
{
$$ = makeColumnRef($1, $2, @1, yyscanner);
}
Digging into indirection, we finally find where the array accessor is happening:
indirection:
indirection_el { $$ = list_make1($1); }
| indirection indirection_el { $$ = lappend($1, $2); }
;
indirection_el: <snip>
| '[' a_expr ']'
{
A_Indices *ai = makeNode(A_Indices);
ai->is_slice = false;
ai->lidx = NULL;
ai->uidx = $2;
$$ = (Node *) ai;
}
| '[' opt_slice_bound ':' opt_slice_bound ']'
{
A_Indices *ai = makeNode(A_Indices);
ai->is_slice = true;
ai->lidx = $2;
ai->uidx = $4;
$$ = (Node *) ai;
}
;
opt_slice_bound:
a_expr { $$ = $1; }
| /*EMPTY*/ { $$ = NULL; }
;
From this we observe:
- There is a sequence of indirection elements.
- There can be a simple array index provided, which is an expression.
- Most importantly, and most immediate to our problem, is the observation that each slice bound is optional, and if it is present, then it is an expression.
Now that we looked up the relevant PG grammar, we can dig into the corresponding SQLFluff grammar in a similar top-down way: postgres.SelectStatementSegment --> we see it's mostly a copy of the ANSI select statement, so --> ansi.SelectStatementSegment --> remember Ref always picks the dialect-specific grammar first --> postgres.SelectClauseSegment --> ansi.SelectClauseSegment.parse_grammar --> postgres.SelectClauseSegmentGrammar --> ansi.SelectClauseElementSegment --> ansi.BaseExpressionElementGrammar --> ansi.ExpressionSegment --> ansi.Expression_A_Grammar --> ansi.Expression_C_Grammar --> ansi.Expression_D_Grammar --> notice this at the end of the sequence --> postgres.Accessor_Grammar --> postgres.ArrayAccessorSegment. As you navigate, always remember to check for dialect-specific grammar before falling back to the inherited grammar (e.g. ANSI). Finally, we have found the part of the grammar that corresponds to the indirection_el in the bison grammar!
class ArrayAccessorSegment(ansi.ArrayAccessorSegment):
"""Overwrites Array Accessor in ANSI to allow n many consecutive brackets.
Postgres can also have array access like python [:2] or [2:] so
numbers on either side of the slice segment are optional.
"""
match_grammar = Sequence(
AnyNumberOf(
Bracketed(
Sequence(
OneOf(
OneOf(
Ref("QualifiedNumericLiteralSegment"),
Ref("NumericLiteralSegment"),
),
Sequence(
OneOf(
Ref("QualifiedNumericLiteralSegment"),
Ref("NumericLiteralSegment"),
optional=True,
),
Ref("SliceSegment"),
OneOf(
Ref("QualifiedNumericLiteralSegment"),
Ref("NumericLiteralSegment"),
),
),
Sequence(
OneOf(
Ref("QualifiedNumericLiteralSegment"),
Ref("NumericLiteralSegment"),
),
Ref("SliceSegment"),
OneOf(
Ref("QualifiedNumericLiteralSegment"),
Ref("NumericLiteralSegment"),
optional=True,
),
),
),
),
bracket_type="square",
)
)
)
Looking at this, we can make a few observations. The most glaring are that:
- Only numeric literals are accepted! No expressions. Clearly, that's the source of the issue that the person reported.
- But while we are here, notice another problem we can fix: when a SliceSegment (a :) is present, you're forced to include a numeric literal either before or after the SliceSegment. You can't have [:], even though that's valid SQL that PG can parse.
At this point, it's a simple matter of simplifying & rewriting the grammar to fix these shortcomings and better align it with the bison grammar, which was done in pull request #4748.
Testing your changes¶
So you've made your fix, you've tested it fixed the original problem so just submit that change, and all is good now?
Well, no. You want to do two further things:
- Test your change hasn't broken anything else. To do that you run the test suite.
- Add a test case, so others can check this in future.
To test your changes you'll need to have your environment set up (again see the CONTRIBUTING.md file for how to do that).
Adding test cases for your changes¶
Adding a test case is simple. Just add a SQL file to test/fixtures/dialects/ in the appropriate dialect directory. You can either expand an existing SQL file test case (e.g. if adding something similar to what's in there) or create a new one.
I advise adding the original SQL raised in the issue, and if you have examples from the official syntax documentation, then they are always good test cases to add as well. For example, the Snowflake documentation has an example section at the bottom of every syntax definition, so just copy all of them into your example file too.
You should also use the reference grammar to exhaustively test various pedantic combinations of syntax. It doesn't have to be runnable. It just needs to parse correctly into the right structure, and be a statement that can get past the parsing stage of the database engine. The documentation often includes simpler examples that might not reflect all the real-world possibilities. While referring to the reference documentation / bison grammar, try to come up with a statement that uses as much of the grammar as it can!
Be sure that you verify that the SQL statements in your test are actually parsable by the database engine! An easy way to do that is often to copy/paste the statement into the console and try running it, or use a CLI parsing tool that uses the same source code as the database engine (e.g. pgsql-parser). An error is ok (e.g. invalid column name), as long as it's not a syntax error from parsing. Check the reference section at the top of this document for dialect-specific resources.
YML test fixture files¶
In addition to the SQL files, we have auto-generated YAML counterparts for them. The YAML contains the parsed version of the SQL. Having these in our source code allows us to easily see if they change: if someone redefines a syntax and changes how a SQL statement is parsed, then the SQL won't change but the parse tree will, and because that tree is checked in with any pull request we can spot the change and make sure we're comfortable that it is expected. For most cases (except adding new test cases obviously!) you would not expect unrelated YML files to change, so this is a good check.
To regenerate all the YAML files when you add or edit any test fixture SQL files run the following command:
tox -e generate-fixture-yml
You can also do the following to only generate for a particular dialect, or only for new and changed files, which is often quicker:
tox -e generate-fixture-yml -- --dialect postgres
tox -e generate-fixture-yml -- --new-only
It takes a few mins to run, and regenerates all the YAML files. You can then do a git status to see any differences.
When making changes, make sure to check the post-parse structure from the test output or from the associated YAML file: check that each query element is typed correctly. Typical bugs can be that a standalone keyword (such as INTERVAL) is parsed as a function name, or that an element that should be date_part is parsed as an identifier. Typically there is no need to write assertions by hand, but it's the developer's responsibility to verify the structure from auto-generated YAML. One should not assume that everything is working just because no parsing error is raised.
Running the test suite¶
For the basic setup, see the local testing section of the CONTRIBUTING.md file first.
There are a few ways of running the test suite. You could just run the tox command, but this will run all the test suites, for various python versions, and with and without dbt, and take a long time. Best to leave that to our CI infrastructure. You just want to run what you need to have reasonable confidence before submitting.
Testing a single fixture¶
The dialects_test is parametrized to automatically pick all files under test/fixtures/dialects/.
For example if you're adding or modifying dialects/hive/select_interval.sql, you can test that with:
tox -e py38 -- -s test/dialects/dialects_test.py -k hive-select_interval.sql
The -s flag for pytest enables printing of post-parse structure, which allows you to quickly check that each query element is typed correctly. Same can be seen in the generated fixture YAML file.
To run it a bit faster, you can invoke pytest directly (requires that you have activated the project venv):
pytest -s test/dialects/dialects_test.py -k hive-select_interval.sql
Running all dialect tests¶
The following command runs just the dialect tests, for all dialects:
tox -e py38 -- test/dialects/dialects_test.py
The following command runs just the dialect tests, for a specific dialect:
tox -e py38 -- test/dialects/dialects_test.py -k ansi
Or, if making a dialect change to fix a rule that is incorrectly flagging, you can just run the tests for that one rule, for example to run the LT01 tests:
tox -e py38 -- -k LT01 test
Final checks before committing¶
For formatting and linting it's usually enough to rely on the pre-commit hook.
Run all tests (but only on one Python version, and without dbt):
tox -e py311
I like to kick that off just before opening a PR, but it does take ~10 minutes to run.
If you also want coverage & linting, run this instead (it takes even more time):
tox -e generate-fixture-yml,cov-init,py311,cov-report,linting
Also note that full coverage requires several environments to be combined (Windows, and dbt), so coverage can be reported as missing when run locally.
The rest can be left for the CI to check.
Regardless of what testing you do, GitHub will run the full regression suite when the PR is opened or updated. Note first time contributors will need a maintainer to kick off the tests until their first PR is merged.
Black code linting¶
These tools are run automatically by the pre-commit hook, but can also be run manually for those not using that.
We use ruff to lint our python code (being a linter ourselves we should have high quality code!). Our CI, or the tox commands above will run this and flag any errors.
In most cases running black on the python file(s) will correct any simple errors (e.g. line formatting) but for some you'll need to run ruff to see the issues and manually correct them.
Submitting your change¶
We use the standard GitHub workflow so simply fork the repo, clone it locally, make the change, push it to your fork, then open a pull request back to the original SQLFluff repo. There’s lots more info in our How to use Git guide if you're new to Git.
Once you open the PR, CI tests will run and should complete after 5-10 minutes. If all is green, then a maintainer will pick it up as soon as they can. Having a good, easy to understand, small PR with all the tests passing makes it easier to review, and so more likely to be merged quickly.
Questions¶
Feel free to open up any issues on GitHub, or join the SQLFluff Slack for any quick questions to the community/maintainers.
Developing Rules¶
Rules in SQLFluff are implemented as classes inheriting from BaseRule. SQLFluff crawls through the parse tree of a SQL file, calling the rule's _eval() function for each segment in the tree. For many rules, this allows the rule code to be really streamlined and only contain the logic for the rule itself, with all the other mechanics abstracted away.
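As a rough sketch (loosely modelled on the bundled example plugin, with a hypothetical rule and column name, and import paths and required class attributes that may vary between SQLFluff versions), a minimal rule might look something like this:

from sqlfluff.core.rules import BaseRule, LintResult, RuleContext
from sqlfluff.core.rules.crawlers import SegmentSeekerCrawler


class Rule_Example_L001(BaseRule):
    """Hypothetical rule: flag any column reference called 'forbidden_column'."""

    groups = ("all",)
    # Only visit column references, rather than every segment in the tree.
    crawl_behaviour = SegmentSeekerCrawler({"column_reference"})

    def _eval(self, context: RuleContext):
        # `context.segment` is the segment currently being visited.
        if context.segment.raw.lower() == "forbidden_column":
            return LintResult(
                anchor=context.segment,
                description="Avoid using 'forbidden_column'.",
            )
        # Returning None means no issue was found at this segment.
        return None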
Running Tests¶
The majority of the test cases for most bundled rules are "yaml test cases", i.e. test cases defined in yaml files. You can find those yaml fixtures on github. While this provides a very simple way to write tests, it can be occasionally tedious to run specific tests.
Within either a tox environment or virtualenv (as described in the contributing.md file), you can either run all of the rule yaml tests with:
pytest test/rules/yaml_test_cases_test.py -vv
...or, to just run the tests for a specific rule, you can use the -k option to select only those tests, for example:
pytest -vv test/rules/ -k RF01
The -k option simply searches for the argument appearing within the name of the test, which will match any single or combination tests for that rule. By convention, any test cases for a rule should include the code for that rule in their name.
Traversal Options¶
recurse_into¶
Some rules are a poor fit for the simple traversal pattern described above. Typical reasons include:
- The rule only looks at a small portion of the file (e.g. the beginning or end).
- The rule needs to traverse the parse tree in a non-standard way.
These rules can override BaseRule's recurse_into field, setting it to False. For these rules, _eval() is only called once, with the root segment of the tree. This can be much more efficient, especially on large files. For example, see rules LT13 and LT12, which only look at the beginning or end of the file, respectively.
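As a sketch (the rule name here is hypothetical), overriding the field is simply a class attribute on the rule:

from sqlfluff.core.rules import BaseRule, RuleContext


class Rule_Example_L002(BaseRule):
    """Hypothetical rule which only needs to inspect the root segment."""

    # _eval() will be called exactly once, with the root segment of the file.
    recurse_into = False

    def _eval(self, context: RuleContext):
        # context.segment is the root of the parse tree here; inspect its
        # children (e.g. the first or last few segments) as required.
        ...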
_works_on_unparsable¶
By default, SQLFluff calls _eval() for all segments, even "unparsable" segments, i.e. segments that didn't match the parsing rules in the dialect. This causes issues for some rules. If so, setting _works_on_unparsable to False tells SQLFluff not to call _eval() for unparsable segments and their descendants.
Performance-related Options¶
These are other fields on BaseRule. Rules can override them.
needs_raw_stack¶
needs_raw_stack defaults to False. Some rules use the RuleContext.raw_stack property to access earlier segments in the traversal. This can be useful, but it adds significant overhead to the linting process. For this reason it is disabled by default, and rules which need raw_stack should set needs_raw_stack to True.
lint_phase¶
There are two phases of rule running.
1. The main phase is appropriate for most rules. These rules are assumed to interact and potentially cause a cascade of fixes requiring multiple passes. These rules run the runaway_limit number of times (default 10).
2. The post phase is for post-processing rules, not expected to trigger any downstream rules, e.g. capitalization fixes. They are run in a post-processing loop at the end. This loop is identical to the main loop, but is only run 2 times at the end (once to fix, and once again to confirm no remaining issues).
The two phases add complexity, but they also improve performance by allowing SQLFluff to run fewer rules during the main phase, which often runs several times.
NOTE: post rules also run on the first pass of the main phase so that any issues they find will be presented in the list of issues output by sqlfluff fix and sqlfluff lint.
Developing Plugins¶
SQLFluff is extensible through "plugins". We use the pluggy library to make linting Rules pluggable, which enables users to implement rules that are just too "organization specific" to be shared, or too platform specific to be included in the core library.
NOTE:
Importing rule definitions in the main body of the module was previously our recommendation, and so may still be the case for older versions of some plugins. If one of your plugins does use imports in this way, a warning will be presented, recommending that you update your plugin.
# The root module will need to import `hookimpl`, but
# should not yet import the rule definitions for the plugin.
from sqlfluff.core.plugin import hookimpl
@hookimpl
def get_rules():
    # Rules should be imported within the `get_rules` method instead
    from my_plugin.rules import MyRule
    return [MyRule]
Creating a plugin¶
We have an example plugin in sqlfluff/plugins/sqlfluff-plugin-example which you can use as a template for rules, or the sqlfluff/plugins/sqlfluff-templater-dbt which you can use as a template for templater plugins.
A few things to note about plugins:¶
Currently, only Rules and Templaters can be added through plugins. Over time we expect more elements of SQLFluff will be extensible with plugins. Each plugin can implement multiple Rules or Templaters.
We recommend that the name of a plugin should start with "sqlfluff-" to be clear on the purpose of your plugin.
A plugin may need to include a default configuration if its rules are configurable: use plugin default configurations only for that reason! We advise against overwriting core configurations by using a default plugin configuration, as there is no mechanism in place to enforce precedence between the core library configs and plugin configs, and multiple plugins could clash.
A plugin Rule class name should have the structure: "Rule_PluginName_L000". The 'L' can be any letter and is meant to categorize rules; you could use the letter 'S' to denote rules that enforce security checks for example.
An important thing to note when running custom implemented rules: run pip install -e . inside the plugin folder so that your custom rules are included when linting.
A plugin Rule code includes the PluginName, so a rule "Rule_L000" in core will have code "L000", while "Rule_PluginName_L000" will have code "PluginName_L000". Codes are used to display errors, they are also used as configuration keys.
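As an illustration only (the plugin and rule names here are hypothetical), the naming and code conventions fit together like this:

from sqlfluff.core.rules import BaseRule, RuleContext


# In a plugin named "sqlfluff-myplugin", using 'S' for security rules:
class Rule_MyPlugin_S001(BaseRule):
    """Hypothetical security-themed plugin rule.

    Its rule code - used in lint output and as a configuration key -
    would be "MyPlugin_S001".
    """

    def _eval(self, context: RuleContext):
        ...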
We make it easy for plugin developers to test their rules by exposing a testing library in sqlfluff.utils.testing.
Giving feedback¶
Would you like to have other parts of SQLFluff be "pluggable"? Tell us about it in a GitHub issue 😄.
Documentation Contributions¶
Contributing to the docs is one of the easiest and most helpful ways to help the project. Documentation changes require relatively little specialist knowledge apart from being familiar with how to use SQLFluff and the docs are read by a very wide range of people.
Documentation takes two forms:
- 1.
- Embedded documentation found in function and module docstrings.
- 2.
- The free-standing documentation which you're reading now, and hosted at docs.sqlfluff.com (built using sphinx and ReadtheDocs).
The two are somewhat blurred by the use of autodoc (and some other custom integrations), where documentation is generated directly off docstrings within the codebase, for example the Rules Reference, CLI Reference and Dialects Reference. To understand more about the custom integrations we use to generate these docs, see the generate-auto-docs.py file.
Docstrings¶
Embedded documentation of functions, classes and modules is most useful for developer-focussed documentation as it's most accessible in the places which those developers are working: directly in the codebase. We enforce that docstrings are present and correctly formatted using the pydocstyle rules for ruff, which we have configured to enforce the google style of docstrings.
Sphinx Docs¶
The main documentation (which you're reading now) is built using sphinx, and written using reStructuredText (files ending with .rst). The sphinx project offers a reStructuredText primer for people who are new to the syntax (and the SQLFluff project uses doc8 in the CI process to try and catch any issues early).
On top of those docs, there are a few areas worth highlighting for new (or returning) users, which are either specific to the SQLFluff project, or not particularly clear in the sphinx docs:
- •
- reStructuredText is very similar to, but differs from (the somewhat more well known) Markdown syntax. Importantly:
- *text with single asterisks* renders as italics. Use **double asterisks** for bold text.
- code snippets are created using reStructuredText's code roles and directives, rather than just the lone backticks found in most Markdown.
- •
- To create links to other parts of the documentation (i.e. Cross-referencing), use the :ref: syntax.
- Docs for all the SQL dialects are auto-generated with associated anchors to use for referencing. For example, to link to the PostgreSQL dialect docs you can use the corresponding dialect anchor with the :ref: syntax, replacing the postgres portion of the anchor with the name of the dialect you want to link to.
- Docs for all the bundled rules are handled using a custom sphinx plugin, which means you can refer to them using their name or code: a rule reference resolves to either the code (e.g. LT01) or the name (e.g. layout.spacing).
- Docs for any of the python classes and modules handled using autodoc can be referenced as per their docs, so the sqlfluff.core.rules.base.BaseRule class can be referenced directly. You can also use the ~ prefix so that the link just renders as BaseRule. See the docs for Cross-referencing for more details.
Configuration¶
Setting Configuration¶
SQLFluff accepts configuration either through the command line or through configuration files. There is rough parity between the two approaches with the exception that templating configuration must be done via a file, because it otherwise gets slightly complicated.
For details of what's available on the command line check out the CLI Reference.
Configuration Files¶
For file based configuration SQLFluff will look for the following files in order. Later files (if found) will be used to overwrite any values read from earlier files.
- setup.cfg
- tox.ini
- pep8.ini
- .sqlfluff
- pyproject.toml
Within these files, the first four will be read like a cfg file, and SQLFluff will look for sections which start with sqlfluff, and where subsections are delimited by a colon. For example the jinjacontext section will be indicated by the section starting with [sqlfluff:jinjacontext].
For example, a snippet from a .sqlfluff file (as well as any of the supported cfg file types):
[sqlfluff]
templater = jinja
sql_file_exts = .sql,.sql.j2,.dml,.ddl

[sqlfluff:indentation]
indented_joins = False
indented_using_on = True
template_blocks_indent = False

[sqlfluff:templater]
unwrap_wrapped_queries = True

[sqlfluff:templater:jinja]
apply_dbt_builtins = True
For the pyproject.toml file, all valid sections start with tool.sqlfluff and subsections are delimited by a dot. For example the jinjacontext section will be indicated in the section started with [tool.sqlfluff.jinjacontext].
For example, a snippet from a pyproject.toml file:
[tool.sqlfluff.core]
templater = "jinja"
sql_file_exts = ".sql,.sql.j2,.dml,.ddl"

[tool.sqlfluff.indentation]
indented_joins = false
indented_using_on = true
template_blocks_indent = false

[tool.sqlfluff.templater]
unwrap_wrapped_queries = true

[tool.sqlfluff.templater.jinja]
apply_dbt_builtins = true

# For rule specific configuration, use dots between the names exactly
# as you would in .sqlfluff. In the background, SQLFluff will unpack the
# configuration paths accordingly.
[tool.sqlfluff.rules.capitalisation.keywords]
capitalisation_policy = "upper"
New Project Configuration¶
When setting up a new project with SQLFluff, we recommend keeping your configuration file fairly minimal. The config file should act as a form of documentation for your team, i.e. a record of what decisions you've made which govern how you format your SQL. By having a more concise config file, and only defining config settings where they differ from the defaults, you are more clearly stating to your team what choices you've made.
However, there are also a few places where the default configuration is designed more for existing projects, rather than fresh projects, and so there is an opportunity to be a little stricter than you might otherwise be with an existing codebase.
Here is a simple configuration file which would be suitable for a starter project:
[sqlfluff]
# Supported dialects https://docs.sqlfluff.com/en/stable/perma/dialects.html
# Or run 'sqlfluff dialects'
dialect = snowflake

# One of [raw|jinja|python|placeholder]
templater = jinja

# Comma separated list of rules to exclude, or None
# See https://docs.sqlfluff.com/en/stable/perma/rule_disabling.html
# AM04 (ambiguous.column_count) and ST06 (structure.column_order) are
# two of the more controversial rules included to illustrate usage.
exclude_rules = ambiguous.column_count, structure.column_order

# The standard max_line_length is 80 in line with the convention of
# other tools and several style guides. Many projects however prefer
# something a little longer.
# Set to zero or negative to disable checks.
max_line_length = 120

# CPU processes to use while linting.
# The default is "single threaded" to allow easy debugging, but this
# is often undesirable at scale.
# If positive, just implies number of processes.
# If negative or zero, implies number_of_cpus - specified_number.
# e.g. -1 means use all processors but one. 0 means all cpus.
processes = -1

# If using the dbt templater, we recommend setting the project dir.
[sqlfluff:templater:dbt]
project_dir = ./

[sqlfluff:indentation]
# While implicit indents are not enabled by default. Many of the
# SQLFluff maintainers do use them in their projects.
allow_implicit_indents = True

[sqlfluff:rules:aliasing.length]
min_alias_length = 3

# The default configuration for capitalisation rules is "consistent"
# which will auto-detect the setting from the rest of the file. This
# is less desirable in a new project and you may find this (slightly
# more strict) setting more useful.
# Typically we find users rely on syntax highlighting rather than
# capitalisation to distinguish between keywords and identifiers.
# Clearly, if your organisation has already settled on uppercase
# formatting for any of these syntax elements then set them to "upper".
# See https://stackoverflow.com/questions/608196/why-should-i-capitalize-my-sql-keywords-is-there-a-good-reason
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.identifiers]
extended_capitalisation_policy = lower
[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = lower
[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.types]
extended_capitalisation_policy = lower

# The default configuration for the not equal convention rule is "consistent"
# which will auto-detect the setting from the rest of the file. This
# is less desirable in a new project and you may find this (slightly
# more strict) setting more useful.
[sqlfluff:rules:convention.not_equal]
# Default to preferring the "c_style" (i.e. `!=`)
preferred_not_equal_style = c_style
Nesting¶
SQLFluff uses nesting in its configuration files, with files closer to the file being linted overriding (or patching, if you will) values from files further away. That means you'll end up with a final config which will be a patchwork of all the values from the config files loaded up to that path. The exception to this is the value for templater, which cannot be set in config files in subdirectories of the working directory. You don't need any config files to be present to make SQLFluff work. If you do want to override any values though, SQLFluff will use files in the following locations in order, with values from later steps overriding those from earlier:
- 0.
- [...and this one doesn't really count] There's a default config as part of the SQLFluff package. You can find this below, in the Default Configuration section.
- 1.
- It will look in the user's os-specific app config directory. On macOS and Unix this is ~/.config/sqlfluff, Windows is <home>\AppData\Local\sqlfluff\sqlfluff, for any of the filenames above in the main Setting Configuration section. If multiple are present, they will patch/override each other in the order above.
- 2.
- It will look for the same files in the user's home directory (~).
- 3.
- [if the current working directory is a subdirectory of the user's home directory (~)] It will look for the same files in all directories between the user's home directory (~), and the current working directory.
- 4.
- It will look for the same files in the current working directory.
- 5.
- [if parsing a file in a subdirectory of the current working directory] It will look for the same files in every subdirectory between the current working dir and the file directory.
- 6.
- It will look for the same files in the directory containing the file being linted.
This whole structure leads to efficient configuration, in particular in projects which utilise a lot of complicated templating.
In-File Configuration Directives¶
In addition to the configuration files mentioned above, SQLFluff also supports comment based configuration switching in files. This allows specific SQL files to modify a default configuration if they have specific needs.
When used, these apply to the whole file, and are parsed from the file in an initial step before the rest of the file is properly parsed. This means they can be used for both rule configuration and also for parsing configuration.
To use these, the syntax must start as an inline sql comment beginning with sqlfluff (i.e. -- sqlfluff). The line is then interpreted as a colon-separated address of the configuration value you wish to set. A few common examples are shown below:
-- Set Indented Joins
-- sqlfluff:indentation:indented_joins:True

-- Set a smaller indent for this file
-- sqlfluff:indentation:tab_space_size:2

-- Set keywords to be capitalised
-- sqlfluff:rules:capitalisation.keywords:capitalisation_policy:upper

SELECT *
FROM a
JOIN b USING(c)
We recommend only using this configuration approach for configuration that applies to one file in isolation. For configuration changes for areas of a project or for whole projects we recommend Nesting of configuration files.
This syntax is very similar to the method for Ignoring individual lines.
Rule Configuration¶
Rules can be configured with the .sqlfluff config files.
Common rule configurations can be set in the [sqlfluff:rules] section.
For example:
[sqlfluff:rules]
allow_scalar = True
single_table_references = consistent
unquoted_identifiers_policy = all
Rule specific configurations are set in rule specific subsections.
For example, enforce that keywords are upper case by configuring the rule CP01:
[sqlfluff:rules:capitalisation.keywords]
# Keywords
capitalisation_policy = upper
All possible options for rule sections are documented in Rules Reference.
For an overview of the most common rule configurations that you may want to tweak, see Default Configuration (and use Rules Reference to find the available alternatives).
Enabling and Disabling Rules¶
The decision as to which rules are applied to a given file is applied on a file by file basis, by the effective configuration for that file. There are two configuration values which you can use to set this:
- rules, which explicitly enables the specified rules. If this parameter is unset or empty for a file, this implies "no selection" and so "all rules" is taken to be the meaning.
- exclude_rules, which explicitly disables the specified rules. This parameter is applied after the rules parameter so can be used to subtract from the otherwise enabled set.
Each of these two configuration values accept a comma separated list of references. Each of those references can be:
- a rule code e.g. LN01
- a rule name e.g. layout.indent
- a rule alias, which is often a deprecated code e.g. L003
- a rule group e.g. layout or capitalisation
These different references can be mixed within a given expression, which results in a very powerful syntax for selecting exactly which rules are active for a given file.
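For example, a purely illustrative selection which mixes a group, a name and a code might look like this in a config file:

[sqlfluff]
# Enable the 'core' group plus one named rule...
rules = core, layout.indent
# ...then subtract a specific code and a whole group.
exclude_rules = LT08, capitalisation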
NOTE:
When considering configuration inheritance, each of rules and exclude_rules will totally overwrite any values in parent config files if they are set in a child file. While the subtraction operation between both of them is calculated "per file", there is no combination operation between two definitions of rules (just one overwrites the other).
The effect of this is that we recommend one of two approaches:
- 1.
- Simply only use rules. This has the upshot of each area of your project being very explicit in which rules are enabled. When that changes for part of your project you just reset the whole list of applicable rules for that part of the project.
- 2.
- Set a single rules value in your master project config file and then only use exclude_rules in sub-configuration files to turn off specific rules for parts of the project where those rules are inappropriate. This keeps the simplicity of only having one value which is inherited, but allows slightly easier and simpler rollout of new rules because we manage by exception.
For example, to disable the rules LT08 and RF02:
[sqlfluff]
exclude_rules = LT08, RF02
To enable only specific rules, configure the rules setting instead.
For example, to enable RF02:
[sqlfluff]
rules = RF02
Rules can also be enabled/disabled by their grouping. Right now, the only rule grouping is core. This will enable (or disable) a select group of rules that have been deemed 'core rules'.
[sqlfluff]
rules = core
More information about 'core rules' can be found in the Rules Reference.
Additionally, some rules have a special force_enable configuration option, which allows you to enable the given rule even for dialects where it is disabled by default. The rules that support this can be found in the Rules Reference.
The default values can be seen in Default Configuration.
See Ignoring Errors & Files for more information on how to ignore particular rules for specific lines, sections or files.
Downgrading rules to warnings¶
To keep displaying violations for specific rules, but not have those issues lead to a failed run, rules can be downgraded to warnings. Rules set as warnings won't cause a file to fail, but will still be shown in the CLI to warn users of their presence.
The configuration of this behaves very like exclude_rules above:
[sqlfluff]
warnings = LT01, LT04
With this configuration, files with no other issues (other than those set to warn) will pass. If there are still other issues, then the file will still fail, but will show both warnings and failures.
== [test.sql] PASS
L: 2 | P: 9 | LT01 | WARNING: Missing whitespace before +
== [test2.sql] FAIL
L: 2 | P: 8 | CP02 | Unquoted identifiers must be consistently upper case.
L: 2 | P: 11 | LT01 | WARNING: Missing whitespace before +
This is particularly useful as a transitional tool when considering the introduction of new rules on a project, where you might want to make users aware of issues without blocking their workflow (yet).
Layout & Spacing Configuration¶
The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. To understand more about this section, see the section of the docs dedicated to layout: Configuring Layout.
Layout & Whitespace Configuration¶
If there is one part of building a linter that is going to be controversial it's going to be whitespace (closely followed by cApiTaLiSaTiOn 😁).
More specifically, whitespace divides into three key themes:
- 1.
- Spacing: The amount of whitespace between elements on the same line.
- 2.
- Line Breaks: The choice of where within the code it is inappropriate, appropriate or even compulsory to have a line break.
- 3.
- Indentation: Given a line break, how much whitespace should precede the first code element on that line.
SQLFluff aims to be opinionated on this theme, but also configurable (see Configuring Layout). The tool will have a default viewpoint and will aim to have views on all of the important aspects of SQL layout, but if you (or your organisation) don't like those views then we aim to allow enough configuration that you can lint in line with your views, and still use SQLFluff. For more information on how to configure rules to your own viewpoint see Configuration.
Spacing¶
Of the different elements of whitespace, spacing is likely the least controversial. By default, all elements are separated by a single space character. Except for very specific circumstances (see section on Aligned elements), any additional space between elements is usually unwanted and a distraction for the reader. There are however several common cases where no whitespace is more appropriate, which fall into two cases (for more details on where to configure these see Configuring layout and spacing).
- 1.
- No whitespace but a newline is allowed. This option is configured using the touch option in the spacing_* configuration settings. The most common example of this is the spacing around commas. For example SELECT a , b would be unusual and more normally be written SELECT a, b. Inserting a newline between the a and comma would not cause issues and may even be desired, for example:
SELECT
    col_a
    , col_b
    -- Newline present before column
    , col_c
    -- When inline, comma should still touch element before.
    , GREATEST(col_d, col_e) as col_f
FROM tbl_a
- 2.
- No whitespace and a newline is not allowed. This option is configured using the inline option in the spacing_* configuration settings. The most common example of this is spacing within the parts of qualified identifier e.g. my_schema.my_table. If a newline were present between the . and either my_schema or my_table, then the expression would not parse and so no newlines should be allowed.
Aligned elements¶
A special case of spacing is where elements are set to be aligned within some limits. This is not enabled by default, but can be configured to achieve layouts like:
SELECT
    a           AS first_column,
    b           AS second_column,
    (a + b) / 2 AS third_column
FROM foo AS bar
In this example, the alias expressions are all aligned with each other. To configure this, SQLFluff needs to know what elements to align and how far to search to find elements which should be aligned with each other. The configuration to achieve this layout is:
[sqlfluff:layout:type:alias_expression]
# We want non-default spacing _before_ the alias expressions.
spacing_before = align
# We want to align them within the next outer select clause.
# This means for example that alias expressions within the FROM
# or JOIN clause would _not_ be aligned with them.
align_within = select_clause
# The point at which to stop searching outward for siblings, which
# in this example would likely be the boundary of a CTE. Stopping
# when we hit brackets is usually a good rule of thumb for this
# configuration.
align_scope = bracketed
Of these configuration values, the align_scope is potentially the least obvious. The following example illustrates the impact it has.
-- With
--   align_scope = bracketed
--   align_within = select_clause
WITH foo as (
    SELECT
        a,
        b,
        c AS first_column
        d + e AS second_column
)
SELECT
    a AS first_column,
    (a + b) / 2 AS third_column
FROM foo AS bar;

-- With
--   align_scope = bracketed
--   align_within = statement
WITH foo as (
    SELECT
        a,
        b,
        c AS first_column
        d + e AS second_column
)
SELECT
    a AS first_column,
    (a + b) / 2 AS third_column
FROM foo AS bar  -- Now the FROM alias is also aligned.

-- With
--   align_scope = file
--   align_within = select_clause
WITH foo as (
    SELECT
        a,
        b,
        c AS first_column      -- Now the aliases here are aligned
        d + e AS second_column -- with the outer query.
)
SELECT
    a AS first_column,
    (a + b) / 2 AS third_column
FROM foo AS bar

-- With
--   align_scope = file
--   align_within = statement
WITH foo as (
    SELECT
        a,
        b,
        c AS first_column
        d + e AS second_column
)
SELECT
    a AS first_column,
    (a + b) / 2 AS third_column
FROM foo AS bar
Line Breaks¶
When controlling line breaks, we are trying to achieve a few different things:
- 1.
- Do we have enough line breaks that line length doesn't become excessive. Long lines are hard to read, especially given that readers may be on varying screen sizes or have multiple windows open. This is (of course) configurable, but the default is 80 characters (in line with the dbt Labs SQL style guide.)
- 2.
- Is the positioning of blank lines (i.e. lines with nothing other than whitespace on them) appropriate. There are some circumstances where a blank line is desired (e.g. between CTEs). There are others where they are not, in particular multiple blank lines, for example at the beginning of a file.
- 3.
- Where we do have line breaks, are they positioned appropriately and consistently with regards to other elements around them. This is most common when it comes to commas, and whether they should be leading (e.g. , my_column) or trailing (e.g. my_column,). In less common cases, it may also be desirable for some elements to have both a line break before and after (e.g. a set operator such as UNION).
Indentation¶
Lastly, given we have multiple lines of SQL, to what extent should we indent some lines to provide visual cues to the structure of that SQL. It's important to note that SQL is not whitespace sensitive in its interpretation and that means that any principles we apply here are entirely for the benefit of humans. Your database doesn't care.
The indentation therefore should be treated as a hint to the reader of the structure of the code. This explains the common practice within most languages that nested elements (for example the contents of a set of brackets in a function call) should be indented one step from the outer elements. It's also convention that elements with the same level in a nested structure should have the same indentation, at least with regards to their local surroundings. As an example:
SELECT
    nested_within_select AS first_column,
    some_function(
        nested_within_function,
        also_nested_within_function
    ) AS indented_the_same_as_opening_bracket
FROM indented_the_same_as_select
Comment Indents¶
NOTE:
Comments are dealt with differently, depending on whether they're block comments (/* like this */), which might optionally include newlines, or inline comments (-- like this) which are necessarily only on one line.
- •
- Block comments cannot share a line with any code elements (so in effect they must start on their own new line), they cannot be followed by any code elements on the same line (and so in effect must be followed by a newline, if we are to avoid trailing whitespace). None of the lines within the block comment may have an indent less than the first line of the block comment (although additional indentation within a comment is allowed), and that first line should be aligned with the first code element following the block comment.
SELECT
    /* This is a block comment starting on a new line
    which contains a newline (continuing with at least
    the same indent.
        - potentially containing greater indents
        - having no other code following it in the same line
        - and aligned with the line of code following it */
    this_column as what_we_align_the_column_to
FROM my_table
- •
- Inline comments can be on the same line as other code, but are subject to the same line-length restrictions. If they don't fit on the same line (or if it just looks nicer) they can also be the only element on a line. In this latter case, they should be aligned with the first code element following the comment.
SELECT
    -- This is fine
    this_column as what_we_align_to,
    another_column as something_short,  -- Is ok
    case
        -- This is aligned correctly with below
        when indented then take_care
        else try_harder
    end as the_general_guidance  -- Even here we align with the line below
FROM my_table
Hanging Indents¶
One approach to indenting nested elements is a layout called a hanging indent. In this layout, there is no line break before the first nested element, but subsequent elements are indented to match the line position of that first element. Two examples might be:
-- A select statement with two hanging indents:
SELECT no_line_break_before_me,
       indented_to_match_the_first,
       1 + (a
            + b) AS another_more_complex_example
FROM my_table;

-- This TSQL example is also in essence a hanging indent:
DECLARE @prv_qtr_1st_dt DATETIME,
        @last_qtr INT,
        @last_qtr_first_mn INT,
        @last_qtr_yr INT;
In some circumstances this layout can be quite neat (the DECLARE statement is a good example of this), however once indents are nested or indentation styles are mixed it can rapidly become confusing (as partially shown in the first example). Additionally, unless the leading element of the first line is very short, hanging indents use much larger indents than a traditional simple indent where a line break is used before the first element.
Hanging indents have been supported in SQLFluff up to the 1.x versions, however they will no longer be supported from 2.0.0 onwards. This is due to the ambiguity which they bring to fixing poorly formatted SQL. Take the following code:
SELECT this_is, badly_formatted, code_and,
not_obvious,
what_was, intended FROM my_table
Given the lack of line break between SELECT and this_is, it would appear that the user is intending a hanging indent, however it is also plausible that they did not, and simply forgot to add a line break between them. This ambiguity is unhelpful, both for SQLFluff as a tool, and for people who write SQL, in that there are two ways of indenting their SQL. Given SQLFluff aims to provide consistency in SQL layout and remove some of the burden of needing to make choices like this - and that it would be very unusual to keep only hanging indents and disable traditional ones - the only route left to consistency is to not allow hanging indents. Starting in 2.0.0, any hanging indents detected will be converted to traditional indents.
Implicit Indents¶
A close cousin of the hanging indent is the implicit indent. While it does look a little like a hanging indent, it's much more consistent in its behaviour and is supported from SQLFluff 2.0.0 onwards.
An implicit indent is exactly like a normal indent, but doesn't have to be actually taken to influence the indentation of lines after it - it just needs to be left un-closed before the end of the line. These are normally available in clauses which take the form of KEYWORD <expression>, like WHERE clauses or CASE expressions.
-- This WHERE clause here takes advantage of an implicit indent.
SELECT *
FROM my_table
WHERE condition_a
    AND condition_b;

-- With implicit indents disabled (which is currently the
-- default), the above formulation is not allowed, and instead
-- there should be a newline immediately after `WHERE` (which
-- is the location of the _implicit_ indent).
SELECT *
FROM my_table
WHERE
    condition_a
    AND condition_b;
When addressing both indentation and line-length, implicit indents allow a slightly more compact layout, without significant drawbacks in legibility. They also enable a style much closer to some established style guides.
They are however not recommended by many of the major style guides at time of writing (including the dbt Labs SQL style guide and the Mozilla SQL style guide), and so are disabled by default. To enable them, set the allow_implicit_indents flag in sqlfluff:indentation to True.
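For example, to enable them in a .sqlfluff config file:

[sqlfluff:indentation]
allow_implicit_indents = True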
Templated Indents¶
SQLFluff supports templated elements in code, such as those offered by jinja2 (or dbt which relies on it). For simple cases, templated elements are handled as you would expect by introducing additional indents into the layout.
SELECT
    a,
    {% for n in ['b', 'c', 'd'] %}
    -- This section is indented relative to 'a' because
    -- it is inside a jinja for loop.
    {{ n }},
    {% endfor %}
    e
FROM my_table
This functionality can be turned off if you wish using the template_blocks_indent option in your Configuration.
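For example, to stop templated blocks introducing extra indentation:

[sqlfluff:indentation]
template_blocks_indent = False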
It's important to note here, that SQLFluff lints the code after it has been rendered, and so only has access to code which is still present after that process.
SELECT
    a,
    {% if False %}
    -- This section of the code cannot be linted because
    -- it is never rendered due to the `if False` condition.
    my + poorly
    + spaced - and/indented AS section_of_code
    {% endif %}
    e
FROM my_table
More complex templated cases are usually characterised by templated tags cutting across the parse tree. This more formally is where the opening and closing tags of a templated section exist at different levels in the parsed structure. Starting in version 2.x, these will be treated differently (Prior to version 2.x, situations like this were sometimes handled inconsistently or incorrectly).
Indentation should act as a visual cue to the structure of the written SQL, and as such, the most important thing is that template tags belonging to the same block structure use the same indentation. In the example below, this is the opening and closing elements of the second if statement. If treated as a simple case, these tags would have different indents, because they are at different levels of the parse tree and so clearly there is a conflict to be resolved.
The view SQLFluff takes on how to resolve this conflict is to pull all of the tags in this section down to the indent of the least indented (in the example below that would be the closing endif tag). This is similar to the treatment of C Preprocessor Directives, which are treated somewhat as being outside the structure of the rest of the file. In these cases, the content is also not further indented as in the simple case because it makes it harder to line up elements within the affected section and outside (in the example below the SELECT and FROM are a good illustration).
SELECT
    a,
    {% if True %}
    -- This is a simple case. The opening and closing tag are
    -- both at the same level within the SELECT clause.
    simple_case AS example,
    {% endif %}
    b,
{% if True %}
    -- This is a complex case. The opening tag is within the SELECT
    -- clause, but the closing tag is outside the statement
    -- entirely.
    complex_case AS example
FROM table_option_one
{% else %}
    complex_case_two AS example
FROM table_option_two
{% endif %}
Configuring Layout¶
Configuration for layout is spread across three places:
- 1.
- Indent behavior for particular dialect elements is controlled by the parser. This is because in the background SQLFluff inserts Indent and Dedent tokens into the parse tree where those things are expected. For more detail see Configuring indent locations.
- 2.
- Configuration for the spacing and line position of particular types of element (such as commas or operators) is set in the layout section of the config file. For more detail see Configuring layout and spacing.
- 3.
- Some elements of layout are still controlled by rules directly. These are usually very specific cases, see Rules Reference for more details.
Configuring indent locations¶
One of the key areas for this is the indentation of the JOIN expression, which we'll use as an example.
Semantically, a JOIN expression is part of the FROM expression and therefore would be expected to be indented. However according to many of the most common SQL style guides (including the dbt Labs SQL style guide and the Mozilla SQL style guide) the JOIN keyword is expected to be at the same indent as the FROM keyword. By default, SQLFluff sides with the current consensus, which is to not indent the JOIN keyword, however this is one element which is configurable.
By setting values in the sqlfluff:indentation section of your config file you can control how this is parsed.
For example, the default indentation would be as follows:
SELECT
    a,
    b
FROM my_table
JOIN another_table
    ON
        condition1
        AND condition2
By setting your config file to:
[sqlfluff:indentation]
indented_joins = True
Then the expected indentation will be:
SELECT
    a,
    b
FROM my_table
    JOIN another_table
        ON
            condition1
            AND condition2
There is a similar indented_using_on config (defaulted to True) which can be set to False to prevent the USING or ON clause from being indented, in which case the original SQL would become:
SELECT
    a,
    b
FROM my_table
JOIN another_table ON
    condition1
    AND condition2
It's worth noting at this point, that for some users, the additional line break after ON is unexpected, and this is a good example of an implicit indent. By setting your config to:
[sqlfluff:indentation]
indented_using_on = False
allow_implicit_indents = True
Then the expected indentation will be:
SELECT
    a,
    b
FROM my_table
JOIN another_table ON condition1
    AND condition2
There is also a similar indented_on_contents config (defaulted to True) which can be set to False to align any AND subsections of an ON block with each other. If set to False (assuming implicit indents are still enabled) the original SQL would become:
SELECT
    a,
    b
FROM my_table
JOIN another_table
    ON condition1
    AND condition2
These can also be combined, so if the indented_using_on config is set to False, indented_on_contents is also set to False, and allow_implicit_indents is set to True, then the SQL would become:
SELECT
    a,
    b
FROM my_table
JOIN another_table ON condition1 AND condition2
There is also a similar indented_ctes config (defaulted to False) which can be set to True to enforce CTEs to be indented within the WITH clause:
WITH
    some_cte AS (
        SELECT 1 FROM table1
    ),

    some_other_cte AS (
        SELECT 1 FROM table1
    )

SELECT 1 FROM some_cte
There is also a similar indented_then config (defaulted to True) which can be set to False to allow THEN without an indent after WHEN:
SELECT
    a,
    CASE
        WHEN b >= 42 THEN
            1
        ELSE 0
    END AS c
FROM some_table
By default, SQLFluff aims to follow the most common approach to indentation. However, if you have other versions of indentation which are supported by published style guides, then please submit an issue on GitHub to have that variation supported by SQLFluff.
Configuring layout and spacing¶
The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. The syntax of this section is very expressive; however in normal use, only very small alterations should be necessary from the Default Configuration.
The syntax of the section headings here select by type, which corresponds to the type defined in the dialect. For example the following section applies to elements of the type comma, i.e. ,.
[sqlfluff:layout:type:comma]
spacing_before = touch
line_position = trailing
Within these configurable sections there are a few key elements which are available:
- •
- Spacing Elements: spacing_before, spacing_after and spacing_within. For each of these options, there are a few possible settings:
- The default spacing for all elements is single unless otherwise specified. In this state, elements will be spaced with a single space character unless there is a line break between them.
- The value of touch allows line breaks, but if no line break is present, then no space should be present. A great example of this is the spacing before commas (as shown in the config above), where line breaks may be allowed, but if not they should touch the element before.
- Both of the above can be qualified with the :inline modifier - which prevents newlines within the segment. This is best illustrated by the spacing found in a qualified identifier like my_schema.my_table which uses touch:inline or other clauses where we want to force some elements to be on the same line.
- •
- Line Position: set using the line_position option. By default this is unset, which implies no particular line position requirements. The available options are:
- •
- trailing and leading, which are most common in the placement of commas. Both of these settings also allow the option of a comma on its own on a line, or in the middle of a line, but if there is a line break on either side then they make sure it's on the correct side. By default we assume trailing commas, but if you (or your organisation) have settled on leading commas then you should add the following section to your config:
[sqlfluff:layout:type:comma]
line_position = leading
- alone, which means if there is a line break on either side, then there must be a line break on both sides (i.e. it should be the only thing on that line).
- All of the above options can be qualified with the :strict modifier - which prevents the inline case. For example:
-- Setting line_position to just `alone`
-- within [sqlfluff:layout:type:set_operator]
-- would not allow:
SELECT a
UNION SELECT b;
-- ...or...
SELECT a UNION
SELECT b;
-- but *would* allow both of the following:
SELECT a UNION SELECT b;
SELECT a
UNION
SELECT b;
-- However the default is set to `alone:strict`
-- then the *only* acceptable configuration is:
SELECT a
UNION
SELECT b;
Templating Configuration¶
This section explains how to configure templating for SQL files.
When writing SQL files, users might utilise some kind of templating. The SQL file itself is written with placeholders which get rendered to proper SQL at run time. This can range from very simple placeholder templating to complex Jinja templating.
SQLFluff supports templated sections in SQL, see Stage 1, the templater. This is achieved by the following set of operations:
- 1.
- SQLFluff pre-renders the templated SQL
- 2.
- SQLFluff applies the lint and fix operations to the rendered file
- 3.
- SQLFluff backports the rule violations to the templated section of the SQL.
SQLFluff does not automatically have access to the same environment used in production template setup. This means it is necessary to either provide that environment or provide dummy values to effectively render the template and generate valid SQL. Refer to the templater sections below for details.
SQLFluff natively supports the following templating engines
- Jinja templater
- Placeholder templater
- Python templater
Also, SQLFluff has an integration to use dbt as a templater.
- •
- dbt templater (via plugin which is covered in a different section).
NOTE:
For example, if the raw SQL uses a {% if condition %} block, the rendered version of the template will only include either the if branch or the {% else %} branch (depending on the provided configuration for the templater), but not both.
In this case, because SQLFluff linting can only operate on the output of the templater, some areas of the raw SQL will never be seen by the linter and will not be covered by lint rules.
This is functionality we hope to support in future.
Jinja templater¶
The Jinja templater uses Jinja2 to render templates.
There are multiple, complementary ways of configuring the Jinja templater.
- Reading variables and Jinja macros directly from the SQLFluff config file
- Loading macros from a path
- Using a library
Overview of Jinja templater's configuration options¶
Configuration | Variables | Macros | Filters | Documentation |
Config file | ✅ | ✅ | ❌ | Complex Jinja Variable Templating and Jinja Macro Templating (from config) |
Macro Path | ❌ | ✅ | ❌ | Jinja Macro Templating (from file) |
Library | ✅ | ✅ | ✅ | Library Templating |
For example, a snippet from a .sqlfluff file that uses all config options:
[sqlfluff]
templater = jinja

[sqlfluff:templater:jinja]
apply_dbt_builtins = True
load_macros_from_path = my_macros
loader_search_path = included_templates
library_path = sqlfluff_libs
exclude_macros_from_path = my_macros_exclude

[sqlfluff:templater:jinja:context]
my_list = ['a', 'b', 'c']
MY_LIST = ("d", "e", "f")
my_where_dict = {"field_1": 1, "field_2": 2}

[sqlfluff:templater:jinja:macros]
a_macro_def = {% macro my_macro(n) %}{{ n }} + {{ n * 2 }}{% endmacro %}
Complex Jinja Variable Templating¶
Apart from the Generic variable templating that is supported for all templaters, two more advanced features of variable templating are available for Jinja: case sensitivity and native python types. Both are illustrated in the following example:
[sqlfluff:templater:jinja:context]
my_list = ['a', 'b', 'c']
MY_LIST = ("d", "e", "f")
my_where_dict = {"field_1": 1, "field_2": 2}
SELECT
    {% for elem in MY_LIST %}
        '{{elem}}' {% if not loop.last %}||{% endif %}
    {% endfor %} as concatenated_list
FROM tbl
WHERE
    {% for field, value in my_where_dict.items() %}
        {{field}} = {{value}} {% if not loop.last %}and{% endif %}
    {% endfor %}
...will render as...
SELECT
    'd' || 'e' || 'f' as concatenated_list
FROM tbl
WHERE
    field_1 = 1 and field_2 = 2
Note that the variable was replaced in a case sensitive way and that the settings in the config file were interpreted as native python types.
Jinja Macro Templating (from config)¶
Macros (which also look and feel like functions) are available only in the jinja templater. Similar to Generic Variable Templating, these are specified in config files; what's different in this case is how they are named. Similar to the context section above, macros are configured separately in the macros section of the config. Consider the following example.
If passed the following .sql file:
SELECT {{ my_macro(6) }} FROM some_table
...and the following configuration in .sqlfluff in the same directory (note the tight control of whitespace):
[sqlfluff:templater:jinja:macros] a_macro_def = {% macro my_macro(n) %}{{ n }} + {{ n * 2 }}{% endmacro %}
...then before parsing, the sql will be transformed to:
SELECT 6 + 12 FROM some_table
Note that in the code block above, the variable name in the config is a_macro_def, and this isn't apparently otherwise used anywhere else. Broadly this is accurate, however within the configuration loader this will still be used to overwrite previous values in other config files. As such this introduces the idea of config blocks which could be selectively overwritten by other configuration files downstream as required.
Jinja Macro Templating (from file)¶
In addition to macros specified in the config file, macros can also be loaded from files or folders. This is specified in the config file:
[sqlfluff:templater:jinja] load_macros_from_path = my_macros,other_macros
load_macros_from_path is a comma-separated list of .sql files or folders. Locations are relative to the config file. For example, if the config file above was found at /home/my_project/.sqlfluff, then SQLFluff will look for macros in the folders /home/my_project/my_macros/ and /home/my_project/other_macros/, including any of their subfolders. Any macros defined in the config will always take precedence over a macro defined in the path.
exclude_macros_from_path works in the same manner as load_macros_from_path but allows you to have sqlfluff ignore certain macros. This can be useful if you have custom jinja tags.
Macros loaded from these files are available in every .sql file without requiring a Jinja include or import. They are loaded into the Jinja Global Namespace.
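As an illustration (the folder and file names here are hypothetical), a macro file placed on that path might contain the same macro shown in the config example above:

-- my_macros/my_macro.sql
{% macro my_macro(n) %}{{ n }} + {{ n * 2 }}{% endmacro %}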
Note: The load_macros_from_path setting also defines the search path for Jinja include or import. As with loaded macros, subdirectories are also supported. For example, if load_macros_from_path is set to my_macros, and there is a file my_macros/subdir/my_file.sql, you can do:
{% include 'subdir/my_file.sql' %}
If you would like to define the Jinja search path without also loading the macros into the global namespace, use the loader_search_path setting instead.
NOTE:
REMEMBER: The reason SQLFluff supports macros is to enable it to parse templated sql without it being a blocker. It shouldn't be a requirement that the templating is accurate - it only needs to work well enough that parsing and linting are helpful.
Builtin Jinja Macro Blocks¶
One of the main use cases which inspired SQLFluff as a project was dbt. It uses jinja templating extensively and leads to some users maintaining large repositories of sql files which could potentially benefit from some linting.
NOTE:
To use the dbt templater, go to dbt templater.
SQLFluff anticipates this use case and provides some built in macro blocks in the Default Configuration which assist in getting started with dbt projects. In particular it provides mock objects for:
- ref: The mock version of this provided simply returns the model reference as the name of the table. In most cases this is sufficient.
- config: A regularly used macro in dbt to set configuration values. For linting purposes, this makes no difference and so the provided macro simply returns nothing.
Library Templating¶
If using SQLFluff with jinja as your templater, you may have library function calls within your sql files that cannot be templated via the normal macro templating mechanisms:
SELECT foo, bar FROM baz {{ dbt_utils.group_by(2) }}
To template these libraries, you can use the sqlfluff:jinja:library_path config option:
[sqlfluff:templater:jinja] library_path = sqlfluff_libs
This will pull in any python modules from that directory and allow sqlfluff to use them in templates. In the above example, you might define a file at sqlfluff_libs/dbt_utils.py as:
def group_by(n):
    return "GROUP BY 1,2"
If an __init__.py is detected, it will be loaded alongside any modules and submodules found within the library path.
SELECT
    {{ custom_sum('foo', 'bar') }},
    {{ foo.bar.another_sum('foo', 'bar') }}
FROM
    baz
sqlfluff_libs/__init__.py:
def custom_sum(a: str, b: str) -> str:
    return a + b
sqlfluff_libs/foo/__init__.py:
# empty file
sqlfluff_libs/foo/bar.py:
def another_sum(a: str, b: str) -> str:
    return a + b
Additionally, the library can be used to expose Jinja Filters to the Jinja environment used by SQLFluff.
This is achieved by setting a global variable named SQLFLUFF_JINJA_FILTERS. SQLFLUFF_JINJA_FILTERS is a dictionary where:
- dictionary keys map to the Jinja filter name
- dictionary values map to the Python callable
For example, to make the Airflow filter ds available to SQLFLuff, add the following to the __init__.py of the library:
# https://github.com/apache/airflow/blob/main/airflow/templates.py#L53
import datetime


def ds_filter(value: datetime.date | datetime.time | None) -> str | None:
    """Date filter."""
    if value is None:
        return None
    return value.strftime("%Y-%m-%d")


SQLFLUFF_JINJA_FILTERS = {"ds": ds_filter}
Now, ds can be used in SQL
SELECT "{{ "2000-01-01" | ds }}";
Jinja loader search path¶
The Jinja environment can be configured to search for files included with include or import in a list of folders. This is specified in the config file:
[sqlfluff:templater:jinja] loader_search_path = included_templates,other_templates
loader_search_path is a comma-separated list of folders. Locations are relative to the config file. For example, if the config file above was found at /home/my_project/.sqlfluff, then SQLFluff will look for included files in the folders /home/my_project/included_templates/ and /home/my_project/other_templates/, including any of their subfolders. For example, this will read from /home/my_project/included_templates/my_template.sql:
{% include 'included_templates/my_template.sql' %}
Any folders specified in the load_macros_from_path setting are automatically appended to the loader_search_path. It is not necessary to specify a given directory in both settings.
Unlike the load_macros_from_path setting, any macros within these folders are not automatically loaded into the global namespace. They must be explicitly imported using the import Jinja directive. If you would like macros to be automatically included in the global Jinja namespace, use the load_macros_from_path setting instead.
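For example (a minimal sketch; the file and macro names are illustrative, and the exact import path depends on your folder layout), a macro defined in a file on the search path must be imported before it can be called:

{% import 'my_macros.sql' as my_macros %}

SELECT {{ my_macros.wrap_in_upper('col_a') }} FROM my_table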
Interaction with --ignore=templating¶
Ignoring Jinja templating errors provides a way for users to use SQLFluff while reducing or avoiding the need to spend a lot of time adding variables to [sqlfluff:templater:jinja:context].
When --ignore=templating is enabled, the Jinja templater behaves a bit differently. This additional behavior is usually but not always helpful for making the file at least partially parsable and fixable. It definitely doesn’t guarantee that every file can be fixed, but it’s proven useful for some users.
Here's how it works:
- Within the expanded SQL, undefined variables are automatically replaced with the corresponding string value.
- If you do: {% include query %}, and the variable query is not defined, it returns a “file” containing the string query.
- If you do: {% include "query_file.sql" %}, and that file does not exist or you haven’t configured a setting for load_macros_from_path or loader_search_path, it returns a “file” containing the text query_file.
For example:
select {{ my_variable }} from {% include "my_table.sql" %}
is interpreted as:
select my_variable from my_table
The values provided by the Jinja templater act a bit (not exactly) like a mixture of several types:
- str
- int
- list
- Jinja's Undefined class
Because the values behave like Undefined, it's possible to replace them using Jinja's default() filter. For example:
select {{ my_variable | default("col_a") }} from my_table
is interpreted as:
select col_a from my_table
Placeholder templater¶
Libraries such as SQLAlchemy or Psycopg use different parameter placeholder styles to mark where a parameter has to be inserted in the query.
For example a query in SQLAlchemy can look like this:
SELECT * FROM table WHERE id = :myid
At runtime :myid will be replaced by a value provided by the application and escaped as needed, but this is not standard SQL and cannot be parsed as is.
In order to parse these queries, it is then necessary to replace these placeholders with sample values, and this is done with the placeholder templater.
Placeholder templating can be enabled in the config using:
[sqlfluff]
templater = placeholder
A few common styles are supported:
-- colon
WHERE bla = :my_name
-- colon_nospaces
-- (use with caution as more prone to false positives)
WHERE bla = table:my_name
-- colon_optional_quotes
SELECT :"column" FROM :table WHERE bla = :'my_name'
-- numeric_colon
WHERE bla = :2
-- pyformat
WHERE bla = %(my_name)s
-- dollar
WHERE bla = $my_name or WHERE bla = ${my_name}
-- question_mark
WHERE bla = ?
-- numeric_dollar
WHERE bla = $3 or WHERE bla = ${3}
-- percent
WHERE bla = %s
-- ampersand
WHERE bla = &s or WHERE bla = &{s} or USE DATABASE MARK_{ENV}
These can be configured by setting param_style to the names above:
[sqlfluff:templater:placeholder]
param_style = colon
my_name = 'john'
Then you can set sample values for each parameter, like my_name above. Note that the value needs to be escaped, as it will be substituted as a string during parsing. When sample values aren't provided, the templater will use the parameter names themselves by default.
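For example (a minimal sketch; the table and column names are illustrative), with the colon style and the my_name value configured above, a query is rendered for parsing roughly as follows:

-- Before templating
SELECT * FROM users WHERE name = :my_name

-- After templating, with my_name = 'john'
SELECT * FROM users WHERE name = 'john'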
When parameters are positional, like question_mark, then their name is simply the order in which they appear, starting with 1.
[sqlfluff:templater:placeholder]
param_style = question_mark
1 = 'john'
In case you need a parameter style different from the ones above, you can pass a custom regex.
[sqlfluff:templater:placeholder]
param_regex = __(?P<param_name>[\w_]+)__
my_name = 'john'
N.B. Quotes around param_regex in the config are interpreted literally by the templater, e.g. param_regex='__(?P<param_name>[\w_]+)__' matches '__some_param__' not __some_param__.
The named parameter param_name will be used as the key to replace; if it is missing, the parameter is assumed to be positional and numbers are used instead.
Also consider making a pull request to the project to have your style added: it may be useful to other people and simplify your configuration.
Python templater¶
Uses native Python f-strings. As described in Generic Variable Templating, an example usage would be configured as follows:
If passed the following .sql file:
SELECT * FROM {tbl_name}
...and the following configuration in .sqlfluff in the same directory:
[sqlfluff]
templater = python

[sqlfluff:templater:python:context]
tbl_name = my_table
...then before parsing, the sql will be transformed to:
SELECT * FROM my_table
Complex Python Variable Templating¶
Python string formatting supports accessing object attributes via dot notation (e.g. {foo.bar}). However, since we cannot create Python objects within configuration files, we need a workaround in order to provide dummy values to render templates containing these values. The SQLFluff python templater will interpret any variable containing a "." as a dictionary lookup on the magic fixed context key sqlfluff.
-- this SQL
SELECT * FROM {foo.bar}

-- becomes this
SELECT * FROM {sqlfluff["foo.bar"]}
...which can be populated using the following configuration:
[sqlfluff:templater:python:context]
sqlfluff = {"foo.bar": "abc"}
dbt templater¶
NOTE:
dbt templating is still a relatively new feature added in 0.4.0 and is still in very active development! If you encounter an issue, please let us know in a GitHub issue or on the SQLFluff slack workspace.
dbt is not the default templater for SQLFluff (it is jinja). dbt is a complex tool, so using the default jinja templater will be simpler. You should be aware when using the dbt templater that you will be exposed to some of the complexity of dbt. Users may wish to try both templaters and choose according to how they intend to use SQLFluff.
A simple rule of thumb might be:
- If you are using SQLFluff in a CI/CD context, where speed is not critical but accuracy in rendering sql is, then the dbt templater may be more appropriate.
- If you are using SQLFluff in an IDE or on a git hook, where speed of response may be more important, then the jinja templater may be more appropriate.
Pros:
- Most (potentially all) macros will work
Cons:
- More complex, e.g. using it successfully may require deeper understanding of your models and/or macros (including third-party macros)
- More configuration decisions to make
- Best practices are not yet established or documented
- If your dbt model files access a database at compile time, using SQLFluff with the dbt templater will also require access to a database.
- Note that you can often point SQLFluff and the dbt templater at a test database (i.e. it doesn't have to be the production database).
- Runs slower
Installation & Configuration¶
In order to get started using SQLFluff with a dbt project you will first need to install the relevant dbt adapter for your dialect and the sqlfluff-templater-dbt package using your package manager of choice (e.g. pip install dbt-postgres sqlfluff-templater-dbt) and then will need the following configuration:
In .sqlfluff:
[sqlfluff]
templater = dbt
In .sqlfluffignore:
target/
# dbt <1.0.0
dbt_modules/
# dbt >=1.0.0
dbt_packages/
macros/
You can set the dbt project directory, profiles directory and profile with:
[sqlfluff:templater:dbt]
project_dir = <relative or absolute path to dbt_project directory>
profiles_dir = <relative or absolute path to the directory that contains the profiles.yml file>
profile = <dbt profile>
target = <dbt target>
NOTE:
To use builtin dbt Jinja functions, SQLFluff provides a configuration option that enables their usage within templates.
[sqlfluff:templater:jinja]
apply_dbt_builtins = True
This will provide dbt macros like ref, var and is_incremental(). If the need arises, the builtin dbt macros can be customised via Jinja macros in the .sqlfluff configuration file.
[sqlfluff:templater:jinja:macros]
# Macros provided as builtins for dbt projects
dbt_ref = {% macro ref(model_ref) %}{{model_ref}}{% endmacro %}
dbt_source = {% macro source(source_name, table) %}{{source_name}}_{{table}}{% endmacro %}
dbt_config = {% macro config() %}{% for k in kwargs %}{% endfor %}{% endmacro %}
dbt_var = {% macro var(variable, default='') %}item{% endmacro %}
dbt_is_incremental = {% macro is_incremental() %}True{% endmacro %}
If your project requires that you pass variables to dbt through the command line, you can specify them in the templater:dbt:context section of .sqlfluff. See the configuration below and its equivalent dbt command:
[sqlfluff:templater:dbt:context]
my_variable = 1
dbt run --vars '{"my_variable": 1}'
Known Caveats¶
- To use the dbt templater, you must set templater = dbt in the .sqlfluff config file in the directory where sqlfluff is run. The templater cannot be changed in .sqlfluff files in subdirectories.
- In SQLFluff 0.4.0, using the dbt templater requires that all files within the root and child directories of the dbt project be part of the project. If, for instance, there are deployment scripts which refer to SQL files that are not part of the project, this will result in an error. You can overcome this by adding any non-dbt project SQL files to .sqlfluffignore.
Generic Variable Templating¶
Variables are available in all the templaters. By default the templating engine will expect variables for templating to be available in the config, and the templater will look in the section corresponding to the context for that templater. By convention, the config for the jinja templater is found in the sqlfluff:templater:jinja:context section, the config for the python templater in the sqlfluff:templater:python:context section, and the config for the placeholder templater in the sqlfluff:templater:placeholder:context section.
For example, if passed the following .sql file:
SELECT {{ num_things }} FROM {{ tbl_name }} WHERE id > 10 LIMIT 5
...and the following configuration in .sqlfluff in the same directory:
[sqlfluff:templater:jinja:context]
num_things=456
tbl_name=my_table
...then before parsing, the sql will be transformed to:
SELECT 456 FROM my_table WHERE id > 10 LIMIT 5
Ignoring Errors & Files¶
Ignoring individual lines¶
Similar to flake8's ignore, individual lines can be ignored by adding -- noqa to the end of the line. Additionally, specific rules can be ignored by quoting their code or the category.
-- Ignore all errors
SeLeCt 1 from tBl ; -- noqa

-- Ignore rule CP02 & rule CP03
SeLeCt 1 from tBl ; -- noqa: CP02,CP03

-- Ignore all parsing errors
SeLeCt from tBl ; -- noqa: PRS
Ignoring line ranges¶
Similar to pylint's "pylint" directive, ranges of lines can be ignored by adding -- noqa:disable=<rule>[,...] | all to the line. Following this directive, specified rules (or all rules, if "all" was specified) will be ignored until a corresponding -- noqa:enable=<rule>[,...] | all directive.
-- Ignore rule AL02 from this line forward
SELECT col_a a FROM foo -- noqa: disable=AL02

-- Ignore all rules from this line forward
SELECT col_a a FROM foo -- noqa: disable=all

-- Enforce all rules from this line forward
SELECT col_a a FROM foo -- noqa: enable=all
.sqlfluffignore¶
Similar to Git's .gitignore and Docker's .dockerignore, SQLFluff supports a .sqlfluffignore file to control which files are and aren't linted. Under the hood we use the python pathspec library which also has a brief tutorial in their documentation.
An example of a potential .sqlfluffignore placed in the root of your project would be:
# Comments start with a hash.

# Ignore anything in the "temp" path
/temp/

# Ignore anything called "testing.sql"
testing.sql

# Ignore any ".tsql" files
*.tsql
Ignore files can also be placed in subdirectories of a path which is being linted, and those ignore files will be applied within that subdirectory.
Ignoring types of errors¶
General categories of errors can be ignored using the --ignore command line option or the ignore setting in the .sqlfluff config file (a usage sketch follows this list). Types of errors that can be ignored include:
- lexing
- linting
- parsing
- templating
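For example (a minimal sketch; the file name is illustrative), templating errors can be ignored for a single run on the command line:

$ sqlfluff lint my_model.sql --ignore=templating

...or persistently via the ignore setting in the .sqlfluff config file:

[sqlfluff]
ignore = templating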
Default Configuration¶
The default configuration is as follows, note the Builtin Jinja Macro Blocks in section [sqlfluff:templater:jinja:macros] as referred to above.
NOTE:
This is for two reasons:
- 1.
- The config file should act as a form of documentation for your team: a record of the decisions you've made which govern how you format your sql. By having a more concise config file, and only defining config settings where they differ from the defaults, you are more clearly stating to your team what choices you've made.
- 2.
- As the project evolves, the structure of the config file may change and we will attempt to make changes as backward compatible as possible. If you have not overridden a config setting in your project, we can easily update the default config to match your expected behaviour over time. We may also find issues with the default config which we can also fix in the background. However, the longer your local config file, the more work it will be to update and migrate your config file between major versions.
If you are starting a fresh project and are looking for a good starter config, check out the New Project Configuration section above.
[sqlfluff] # verbose is an integer (0-2) indicating the level of log output verbose = 0 # Turn off color formatting of output nocolor = False # Supported dialects https://docs.sqlfluff.com/en/stable/perma/dialects.html # Or run 'sqlfluff dialects' dialect = None # One of [raw|jinja|python|placeholder] templater = jinja # Comma separated list of rules to check, default to all rules = all # Comma separated list of rules to exclude, or None exclude_rules = None # Below controls SQLFluff output, see max_line_length for SQL output output_line_length = 80 # Number of passes to run before admitting defeat runaway_limit = 10 # Ignore errors by category (one or more of the following, separated by commas: lexing,linting,parsing,templating) ignore = None # Warn only for rule codes (one of more rule codes, separated by commas: e.g. LT01,LT02) # Also works for templating and parsing errors by using TMP or PRS warnings = None # Whether to warn about unneeded '-- noqa:' comments. warn_unused_ignores = False # Ignore linting errors found within sections of code coming directly from # templated code (e.g. from within Jinja curly braces. Note that it does not # ignore errors from literal code found within template loops. ignore_templated_areas = True # can either be autodetect or a valid encoding e.g. utf-8, utf-8-sig encoding = autodetect # Ignore inline overrides (e.g. to test if still required) disable_noqa = False # Ignore inline overrides except those listed. # This take priority over `disabled_noqa`. disable_noqa_except = None # Comma separated list of file extensions to lint # NB: This config will only apply in the root folder sql_file_exts = .sql,.sql.j2,.dml,.ddl # Allow fix to run on files, even if they contain parsing errors # Note altering this is NOT RECOMMENDED as can corrupt SQL fix_even_unparsable = False # Very large files can make the parser effectively hang. # The more efficient check is the _byte_ limit check which # is enabled by default. The previous _character_ limit check # is still present for backward compatibility. This will be # removed in a future version. # Set either to 0 to disable. large_file_skip_char_limit = 0 large_file_skip_byte_limit = 20000 # CPU processes to use while linting. # If positive, just implies number of processes. # If negative or zero, implies number_of_cpus - specified_number. # e.g. -1 means use all processors but one. 0 means all cpus. processes = 1 # Max line length is set by default to be in line with the dbt style guide. # https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md # Set to zero or negative to disable checks. max_line_length = 80 # NOTE: Templater Variant rendering should currently be considered EXPERIMENTAL. # Only set `render_variant_limit` to more than 1 if you know what you're doing! # Implementation of this will also depend on your templater. render_variant_limit = 1 [sqlfluff:indentation] # See https://docs.sqlfluff.com/en/stable/perma/indent_locations.html indent_unit = space tab_space_size = 4 indented_joins = False indented_ctes = False indented_using_on = True indented_on_contents = True indented_then = True indented_then_contents = True allow_implicit_indents = False template_blocks_indent = True # This is a comma separated list of elements to skip # indentation edits to. skip_indentation_in = script_content # If comments are found at the end of long lines, we default to moving # them to the line _before_ their current location as the convention is # that a comment precedes the line it describes. 
However if you prefer # comments moved _after_, this configuration setting can be set to "after". trailing_comments = before # To exclude comment lines from indentation entirely set this to "True". ignore_comment_lines = False # Layout configuration # See https://docs.sqlfluff.com/en/stable/perma/layout_spacing.html [sqlfluff:layout:type:comma] spacing_before = touch line_position = trailing [sqlfluff:layout:type:binary_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:statement_terminator] spacing_before = touch line_position = trailing [sqlfluff:layout:type:end_of_file] spacing_before = touch [sqlfluff:layout:type:set_operator] line_position = alone:strict [sqlfluff:layout:type:start_bracket] spacing_after = touch [sqlfluff:layout:type:end_bracket] spacing_before = touch [sqlfluff:layout:type:start_square_bracket] spacing_after = touch [sqlfluff:layout:type:end_square_bracket] spacing_before = touch [sqlfluff:layout:type:start_angle_bracket] spacing_after = touch [sqlfluff:layout:type:end_angle_bracket] spacing_before = touch [sqlfluff:layout:type:casting_operator] spacing_before = touch spacing_after = touch:inline [sqlfluff:layout:type:slice] spacing_before = touch spacing_after = touch [sqlfluff:layout:type:dot] spacing_before = touch spacing_after = touch [sqlfluff:layout:type:comparison_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:assignment_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:object_reference] spacing_within = touch:inline [sqlfluff:layout:type:numeric_literal] spacing_within = touch:inline [sqlfluff:layout:type:sign_indicator] spacing_after = touch:inline [sqlfluff:layout:type:tilde] spacing_after = touch:inline [sqlfluff:layout:type:function_name] spacing_within = touch:inline [sqlfluff:layout:type:function_contents] spacing_before = touch:inline [sqlfluff:layout:type:function_parameter_list] spacing_before = touch:inline [sqlfluff:layout:type:array_type] spacing_within = touch:inline [sqlfluff:layout:type:typed_array_literal] spacing_within = touch [sqlfluff:layout:type:sized_array_type] spacing_within = touch [sqlfluff:layout:type:struct_type] spacing_within = touch:inline [sqlfluff:layout:type:bracketed_arguments] spacing_before = touch:inline [sqlfluff:layout:type:match_condition] spacing_within = touch:inline [sqlfluff:layout:type:typed_struct_literal] spacing_within = touch [sqlfluff:layout:type:semi_structured_expression] spacing_within = touch:inline spacing_before = touch:inline [sqlfluff:layout:type:array_accessor] spacing_before = touch:inline [sqlfluff:layout:type:colon] spacing_before = touch [sqlfluff:layout:type:colon_delimiter] spacing_before = touch spacing_after = touch [sqlfluff:layout:type:path_segment] spacing_within = touch [sqlfluff:layout:type:sql_conf_option] spacing_within = touch [sqlfluff:layout:type:sqlcmd_operator] # NOTE: This is the spacing between the operator and the colon spacing_before = touch [sqlfluff:layout:type:slash] spacing_before = any spacing_after = any [sqlfluff:layout:type:comment] spacing_before = any spacing_after = any [sqlfluff:layout:type:pattern_expression] # Snowflake pattern expressions shouldn't have their spacing changed. spacing_within = any [sqlfluff:layout:type:placeholder] # Placeholders exist "outside" the rendered SQL syntax # so we shouldn't enforce any particular spacing around # them. 
spacing_before = any spacing_after = any [sqlfluff:layout:type:common_table_expression] # The definition part of a CTE should fit on one line where possible. # For users which regularly define column names in their CTEs they # may which to relax this config to just `single`. spacing_within = single:inline # By setting a selection of clauses to "alone", we hint to the reflow # algorithm that in the case of a long single line statement, the # first place to add newlines would be around these clauses. # Setting this to "alone:strict" would always _force_ line breaks # around them even if the line isn't too long. [sqlfluff:layout:type:select_clause] line_position = alone [sqlfluff:layout:type:where_clause] line_position = alone [sqlfluff:layout:type:from_clause] line_position = alone [sqlfluff:layout:type:join_clause] line_position = alone [sqlfluff:layout:type:groupby_clause] line_position = alone [sqlfluff:layout:type:orderby_clause] # NOTE: Order by clauses appear in many places other than in a select # clause. To avoid unexpected behaviour we use `leading` in this # case rather than `alone`. line_position = leading [sqlfluff:layout:type:having_clause] line_position = alone [sqlfluff:layout:type:limit_clause] line_position = alone # Template loop tokens shouldn't dictate spacing around them. [sqlfluff:layout:type:template_loop] spacing_before = any spacing_after = any [sqlfluff:templater] unwrap_wrapped_queries = True [sqlfluff:templater:jinja] apply_dbt_builtins = True # Some rules can be configured directly from the config common to other rules [sqlfluff:rules] allow_scalar = True single_table_references = consistent unquoted_identifiers_policy = all [sqlfluff:rules:capitalisation.keywords] # Keywords capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.identifiers] # Unquoted identifiers extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.functions] # Function names extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.literals] # Null & Boolean Literals capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.types] # Data Types extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:ambiguous.join] # Fully qualify JOIN clause fully_qualify_join_types = inner [sqlfluff:rules:ambiguous.column_references] # GROUP BY/ORDER BY column references group_by_and_order_by_style = consistent [sqlfluff:rules:aliasing.table] # Aliasing preference for tables aliasing = explicit [sqlfluff:rules:aliasing.column] # Aliasing preference for columns aliasing = explicit [sqlfluff:rules:aliasing.unused] alias_case_check = dialect [sqlfluff:rules:aliasing.length] min_alias_length = None max_alias_length = None [sqlfluff:rules:aliasing.forbid] # Avoid table aliases in from clauses and join conditions. # Disabled by default for all dialects unless explicitly enabled. # We suggest instead using aliasing.length (AL06) in most cases. 
force_enable = False [sqlfluff:rules:convention.not_equal] # Consistent usage of preferred "not equal to" comparison preferred_not_equal_style = consistent [sqlfluff:rules:convention.select_trailing_comma] # Trailing commas select_clause_trailing_comma = forbid [sqlfluff:rules:convention.count_rows] # Consistent syntax to count all rows prefer_count_1 = False prefer_count_0 = False [sqlfluff:rules:convention.terminator] # Semi-colon formatting approach multiline_newline = False require_final_semicolon = False [sqlfluff:rules:convention.blocked_words] # Comma separated list of blocked words that should not be used blocked_words = None blocked_regex = None match_source = False [sqlfluff:rules:convention.quoted_literals] # Consistent usage of preferred quotes for quoted literals preferred_quoted_literal_style = consistent # Disabled for dialects that do not support single and double quotes for quoted literals (e.g. Postgres) force_enable = False [sqlfluff:rules:convention.casting_style] # SQL type casting preferred_type_casting_style = consistent [sqlfluff:rules:references.from] # References must be in FROM clause # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:references.qualification] # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:references.consistent] # References must be consistently used # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:references.keywords] # Keywords should not be used as identifiers. unquoted_identifiers_policy = aliases quoted_identifiers_policy = none # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:references.special_chars] # Special characters in identifiers unquoted_identifiers_policy = all quoted_identifiers_policy = all allow_space_in_identifier = False additional_allowed_characters = None ignore_words = None ignore_words_regex = None [sqlfluff:rules:references.quoting] # Policy on quoted and unquoted identifiers prefer_quoted_identifiers = False prefer_quoted_keywords = False ignore_words = None ignore_words_regex = None case_sensitive = True [sqlfluff:rules:layout.long_lines] # Line length ignore_comment_lines = False ignore_comment_clauses = False [sqlfluff:rules:layout.select_targets] wildcard_policy = single [sqlfluff:rules:structure.subquery] # By default, allow subqueries in from clauses, but not join clauses forbid_subquery_in = join [sqlfluff:rules:structure.join_condition_order] preferred_first_table_in_join_clause = earlier
Production Usage & Security¶
SQLFluff is designed to be used both as a utility for developers but also to be part of CI/CD pipelines.
Security Considerations¶
A full list of Security Advisories is available on GitHub.
Given the context of how SQLFluff is designed to be used, there are three different tiers of access through which users may be able to manipulate how the tool functions in a secure environment.
- 1.
- Users may have edit access to the SQL code which is being linted. While SQLFluff does not execute the SQL itself, in the process of the templating step (in particular via jinja or dbt), certain macros may have the ability to execute arbitrary SQL code (e.g. the dbt run_query macro). For the Jinja templater, SQLFluff uses the Jinja2 SandboxedEnvironment to limit the execution of unsafe code. When looking to further secure this situation, see below for ways to limit the ability of users to import other libraries.
- 2.
- Users may have edit access to the SQLFluff Configuration Files. In some (perhaps, many) environments, the users who can edit SQL files may also be able to access and edit the Configuration Files. It's important to note that, because of In-File Configuration Directives, users who can edit SQL files which are designed to be linted will also have access to the vast majority of any configuration options available in Configuration Files. This means that there is minimal additional protection from restricting access to Configuration Files for users who already have access to edit the linting target files (as described above).
- 3.
- Users may have access to change how SQLFluff is invoked. SQLFluff can be invoked either as a command line tool or via the python API. Typically the method is fixed for a given application. When thinking about how to restrict the ability of users to call insecure code, SQLFluff aims to provide options at the point of invocation. In particular, as described above, the primary risk vector for SQLFluff is the macro environment as described in Templating Configuration. To restrict users being able to bring arbitrary python methods into sqlfluff via the library_path configuration value (see Library Templating), we recommend that for secure environments you override this config value, either by providing an override option to the FluffConfig object if using the Python API or via the --library-path CLI option.
To disable this option entirely via the CLI:
$ sqlfluff lint my_path --library-path none
To disable this option entirely via the python API:
"""This is an example of providing config overrides.""" from sqlfluff.core import FluffConfig, Linter sql = "SELECT 1\n" config = FluffConfig(
overrides={
"dialect": "snowflake",
# NOTE: We explicitly set the string "none" here rather
# than a None literal so that it overrides any config
# set by any config files in the path.
"library_path": "none",
} ) linted_file = Linter(config=config).lint_string(sql) assert linted_file.get_violations() == []
Using SQLFluff directly as a CLI application¶
The SQLFluff CLI application is a python application, which means it depends on your host python environment (see Installing SQLFluff).
The exit code provided by SQLFluff when run as a command line utility is designed to be useful in deployment pipelines. If no violations are found then the exit code will be 0. If violations are found then a non-zero code will be returned, which can be interrogated to find out more, as sketched after the list below.
- An error code of 0 means operation success, no issues found.
- An error code of 1 means operation success, issues found. For example this might mean that a linting issue was found, or that one file could not be parsed.
- An error code of 2 means an error occurred and the operation could not be completed. For example a configuration issue or an internal error within SQLFluff.
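For example (a minimal sketch; the path and dialect are illustrative), a shell-based pipeline step might inspect the exit code directly:

$ sqlfluff lint models/ --dialect ansi
$ echo $?  # 0 = clean, 1 = violations found, 2 = SQLFluff could not complete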
For details of what commands and options are available in the CLI see the CLI Reference.
Using SQLFluff on changes using diff-quality¶
For projects with large amounts of (potentially imperfect) SQL code, the full SQLFluff output could be very large, which can be distracting -- perhaps the CI build for a one-line SQL change shouldn't encourage the developer to fix lots of unrelated quality issues.
To support this use case, SQLFluff integrates with a quality checking tool called diff-quality. By running SQLFluff using diff-quality (rather than running it directly), you can limit the output to the new or modified SQL in the branch (aka pull request or PR) containing the proposed changes.
Currently, diff-quality requires that you are using git for version control.
NOTE: Installing SQLFluff automatically installs the diff_cover package that provides the diff-quality tool.
Adding diff-quality to your builds¶
In your CI build script:
- 1.
- Set the current working directory to the git repository containing the SQL code to be checked.
- 2.
- Run diff-quality, specifying SQLFluff as the underlying tool:
$ diff-quality --violations sqlfluff
The output will look something like:
-------------
Diff Quality
Quality Report: sqlfluff
Diff: origin/master...HEAD, staged and unstaged changes
-------------
sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql (0.0%):
sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql:5: Unquoted Identifiers must be consistently upper case.
-------------
Total:      1 line
Violations: 1 line
% Quality:  0%
-------------
These messages are basically the same as those provided directly by SQLFluff, although the format is a little different. Note that diff-quality only lists the line numbers, not the character position. If you need the character position, you will need to run SQLFluff directly.
NOTE:
- 1.
- diff-quality needs to be run from the root of your git repository (so that it can find the git metadata).
- 2.
- SQLFluff works best if the bulk of the configuration is done from a single .sqlfluff file, which should be in the root of your git repository.
- 3.
- If using dbt templater, then either place your dbt_project.yml file in the same root folder, or if you put it in a subfolder, then only invoke diff-quality and sqlfluff from the root and define the subfolder that the dbt project lives in using the .sqlfluff config file.
By aligning the paths of all three, you should be able to achieve a robust setup. If each is rooted in a different path, it can be very difficult to achieve the same result, and the resulting behaviour can be difficult to debug.
To debug any issues relating to this setup, we recommend occasionally running sqlfluff directly using the main cli (i.e. calling sqlfluff lint my/project/path) and checking whether that route gives you the results you expect. diff-quality should behave as though it's calling the SQLFluff CLI from the same path that you invoke diff-quality.
For more information on diff-quality and the diff_cover package, see the documentation on their github repository. It covers topics such as:
- Generating HTML reports
- Controlling which branch to compare against (i.e. to determine new/changed lines). The default is origin/main.
- Configuring diff-quality to return an error code if the quality is too low.
- Troubleshooting
Using pre-commit¶
pre-commit is a framework to manage git "hooks" triggered right before a commit is made.
A git hook is a git feature to "fire off custom scripts" when specific actions occur.
Using pre-commit with SQLFluff is a good way to provide automated linting to SQL developers.
With pre-commit, you also get the benefit of only linting/fixing the files that changed.
SQLFluff comes with two pre-commit hooks:
- sqlfluff-lint: returns linting errors.
- sqlfluff-fix: attempts to fix rule violations.
WARNING:
Although it is not advised, you can tell SQLFluff to try and fix files with templating or parse errors by overriding the fix_even_unparsable setting in the .sqlfluff config file or using the sqlfluff fix --FIX-EVEN-UNPARSABLE command line option.
Overriding this behavior may break your SQL. If you use this override, always be sure to review any fixes applied to files with templating or parse errors to verify they are okay.
You should create a file named .pre-commit-config.yaml at the root of your git project, which should look like this:
repos:
- repo: https://github.com/sqlfluff/sqlfluff
  rev: stable_version
  hooks:
    - id: sqlfluff-lint
      # For dbt projects, this installs the dbt "extras".
      # You will need to select the relevant dbt adapter for your dialect
      # (https://docs.getdbt.com/docs/available-adapters):
      # additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt']
    - id: sqlfluff-fix
      # Arbitrary arguments to show an example
      # args: [--rules, "LT02,CP02"]
      # additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt']
When trying to use the dbt templater, uncomment the additional_dependencies to install the extras. This is equivalent to running pip install <dbt-adapter> sqlfluff-templater-dbt.
You can specify the version of dbt-adapter used in pre-commit, for example:
additional_dependencies: ['dbt-bigquery==1.0.0', 'sqlfluff-templater-dbt']
See the list of available dbt-adapters.
Note that you can pass the same arguments available through the CLI using args:.
Ignoring files while using pre-commit¶
Under the hood, pre-commit works by passing specific files to SQLFluff. For example, if the only two files that are modified in your commit are file_a.sql and file_b.sql, then the command which is called in the background is sqlfluff lint file_a.sql file_b.sql. While this is efficient, it does produce some unwanted noise when also using .sqlfluffignore. This is because SQLFluff is designed to allow users to override an ignore configuration by passing the name of the file directly. This makes a lot of sense in a CLI context, but less so in the context of being invoked by pre-commit.
To avoid noisy logs when using both pre-commit and .sqlfluffignore, we recommend also setting the exclude argument in your .pre-commit-config.yaml file (either the top level config or the hook specific config). This will prevent files matching the given pattern being passed to SQLFluff and so silence any warnings about the .sqlfluffignore being overridden.
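For example (a minimal sketch; the pattern is illustrative), a top-level exclude entry in .pre-commit-config.yaml stops matching files being passed to the SQLFluff hooks at all:

exclude: ^legacy_sql/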
Using GitHub Actions to Annotate PRs¶
There are two ways to utilize SQLFluff to annotate GitHub PRs.
- 1.
- When sqlfluff lint is run with the --format github-annotation-native option, it produces output formatted as Github workflow commands which are converted into pull request annotations by Github.
- 2.
- When sqlfluff lint is run with the --format github-annotation option, it produces output compatible with this action from yuzutech, which uses the GitHub API to annotate the SQL in GitHub pull requests (a minimal workflow sketch follows below).
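For example (a minimal sketch; the path and dialect are illustrative), a step within a workflow job's steps list using the native annotation format might look like:

- name: Lint SQL
  run: sqlfluff lint models/ --dialect ansi --format github-annotation-native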
WARNING:
There is an open feature request for GitHub Actions which you can track to follow this issue.
For more information and examples on using SQLFluff in GitHub Actions, see the sqlfluff-github-actions repository.
Reference¶
Dialects Reference¶
SQLFluff is designed to be flexible in supporting a variety of dialects. Not all potential dialects are supported so far, but several have been implemented by the community. Below is a list of the currently available dialects. Each inherits from another, up to the root ansi dialect.
For a canonical list of supported dialects, run the sqlfluff dialects command, which will output a list of the current dialects available on your installation of SQLFluff.
NOTE:
Consider when adding new features to a dialect:
- Should I be adding it just to this dialect, or adding it to a parent dialect?
- If I'm creating a new dialect, which dialect would be best to inherit from?
- Will the feature I'm adding break any downstream dependencies within dialects which inherit from this one?
ANSI¶
Label: ansi
Default Casing: UPPERCASE
Quotes: String Literals: '', Identifiers: ""
This is the base dialect which holds most of the definitions of common SQL commands and structures. If the dialect which you're actually using isn't specifically implemented by SQLFluff, using this dialect is a good place to start.
This dialect doesn't intend to be brutal in adhering to (and only to) the ANSI SQL spec (mostly because ANSI charges for access to that spec). It aims to be a representation of vanilla SQL before any other project adds their spin to it, and so may contain a slightly wider set of functions than actually available in true ANSI SQL.
AWS Athena¶
Label: athena
Inherits from: ANSI
Default Casing: lowercase
Quotes: String Literals: '', "" or , Identifiers: "" or
The dialect for Athena on Amazon Web Services (AWS).
Google BigQuery¶
Label: bigquery
Inherits from: ANSI
Default Casing: BigQuery resolves unquoted column identifiers case insensitively, and table/dataset identifiers case sensitively (by default, unless is_case_insensitive is set for the latter). Unless specified, columns are returned in the case which they were defined in, which means columns can be re-cased in the result set without aliasing e.g. if a table is defined with CREATE TEMPORARY TABLE foo (col1 int, COL2 int) then SELECT * FROM foo returns col1 and COL2 in the result, but SELECT COL1, col2 FROM foo returns COL1 and col2 in the result.
Quotes: String Literals: '', "", @ or @@ (with the quoted options, also supporting variants prefixes with r/R (for raw/regex expressions) or b/B (for byte strings)), Identifiers: "" or .
The dialect for BigQuery on Google Cloud Platform (GCP).
ClickHouse¶
Label: clickhouse
Inherits from: ANSI
Default Casing: Clickhouse is case sensitive throughout, regardless of quoting. An unquoted reference to an object using the wrong case will raise an UNKNOWN_IDENTIFIER error.
Quotes: String Literals: '', Identifiers: "" or . Note as above, that because identifiers are always resolved case sensitively, the only reason for quoting identifiers is when they contain invalid characters or reserved keywords.
The dialect for ClickHouse.
Databricks¶
Label: databricks
Inherits from: Apache Spark SQL
The dialect for Databricks.
IBM Db2¶
Label: db2
Inherits from: ANSI
The dialect for IBM Db2.
DuckDB¶
Label: duckdb
Inherits from: PostgreSQL
Default Casing: DuckDB stores all identifiers in the case they were defined, however all identifier resolution is case-insensitive (when unquoted, and more unusually, also when quoted). See the DuckDB Identifiers Documentation for more details.
Quotes: String Literals: '', Identifiers: "" or ''
The dialect for DuckDB.
Exasol¶
Label: exasol
Inherits from: ANSI
The dialect for Exasol.
Greenplum¶
Label: greenplum
Inherits from: PostgreSQL
The dialect for Greenplum.
Apache Hive¶
Label: hive
Inherits from: ANSI
The dialect for Apache Hive.
MariaDB¶
Label: mariadb
Inherits from: MySQL
Default Casing: lowercase
Quotes: String Literals: '', "" or @, Identifiers: .
The dialect for MariaDB.
Materialize¶
Label: materialize
Inherits from: PostgreSQL
The dialect for Materialize.
MySQL¶
Label: mysql
Inherits from: ANSI
Default Casing: lowercase
Quotes: String Literals: '', "" or @, Identifiers: .
The dialect for MySQL.
Oracle¶
Label: oracle
Inherits from: ANSI
The dialect for Oracle SQL. Note: this does not include PL/SQL.
PostgreSQL¶
Label: postgres
Inherits from: ANSI
Default Casing: lowercase
Quotes: String Literals: '', Identifiers: "".
This is based around the PostgreSQL spec. Many other SQL dialects are often based on the PostgreSQL syntax. If you're running an unsupported dialect, then this is often the dialect to use (until someone makes a specific dialect).
AWS Redshift¶
Label: redshift
Inherits from: PostgreSQL
Default Casing: lowercase (unless configured to be case sensitive with all identifiers using the enable_case_sensitive_identifier configuration value, see the Redshift Names & Identifiers Docs).
Quotes: String Literals: '', Identifiers: "".
The dialect for Redshift on Amazon Web Services (AWS).
Snowflake¶
Label: snowflake
Inherits from: ANSI
Default Casing: UPPERCASE
Quotes: String Literals: '', Identifiers: ""
The dialect for Snowflake, which has much of its syntax inherited from PostgreSQL.
Salesforce Object Query Language (SOQL)¶
Label: soql
Inherits from: ANSI
The dialect for SOQL (Salesforce Object Query Language).
Apache Spark SQL¶
Label: sparksql
Inherits from: ANSI
Default Casing: SparkSQL is case insensitive with both quoted and unquoted identifiers ("delimited" identifiers in Spark terminology). See the Spark Identifiers docs.
Quotes: String Literals: '' or "", Identifiers: ``.
The dialect for Apache Spark SQL. This includes relevant syntax from Apache Hive for commands that permit Hive Format. Spark SQL extensions provided by the Delta Lake project are also implemented in this dialect.
This implementation focuses on the Ansi Compliant Mode introduced in Spark3, instead of being Hive Compliant. The introduction of ANSI Compliance provides better data quality and easier migration from traditional DBMS.
Versions of Spark prior to 3.x will only support the Hive dialect.
SQLite¶
Label: sqlite
Inherits from: ANSI
Default Casing: Not specified in the docs, but through testing it appears that SQLite stores column names in whatever case they were defined, but is always case-insensitive when resolving those names.
Quotes: String Literals: '' (or "" if not otherwise resolved to an identifier), Identifiers: "", [] or ``. See the SQLite Keywords Docs for more details.
The dialect for SQLite.
Teradata¶
Label: teradata
Inherits from: ANSI
The dialect for Teradata.
Trino¶
Label: trino
Inherits from: ANSI
Default Casing: lowercase, although the case of a reference is used in the result set column label. If a column is defined using CREATE TEMPORARY TABLE foo (COL1 int), then SELECT * FROM foo returns a column labelled col1, however SELECT COL1 FROM foo returns a column labelled COL1.
Quotes: String Literals: '', Identifiers: ""
The dialect for Trino.
Microsoft T-SQL¶
Label: tsql
Inherits from: ANSI
The dialect for T-SQL (aka Transact-SQL).
Vertica¶
Label: vertica
Inherits from: ANSI
The dialect for Vertica.
Rules Reference¶
This page is an index of available rules which are bundled with SQLFluff.
- For information on how to configure which rules are enabled for your project see Enabling and Disabling Rules.
- If you just want to turn rules on or off for specific files, or specific sections of files, see Ignoring Errors & Files.
- For more information on how to configure the rules which you do enable see Rule Configuration.
Core Rules¶
Certain rules belong to the core rule group. In order for a rule to be designated as core, it must meet the following criteria:
- Stable
- Applies to most dialects
- Could detect a syntax issue
- Isn’t too opinionated toward one style (e.g. the dbt style guide)
Core rules can also make it easier to roll out SQLFluff to a team by only needing to follow a 'common sense' subset of rules initially, rather than spending time understanding and configuring all the rules, some of which your team may not necessarily agree with.
We believe teams will eventually want to enforce more than just the core rules, and we encourage everyone to explore all the rules and customize a rule set that best suits their organization.
See the Configuration section for more information on how to enable only core rules by default.
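For example (a minimal sketch), enabling only the core rules is a single setting in your configuration file:

[sqlfluff]
rules = core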
Rule Index¶
Bundle | Rule Name | Code | Aliases |
Aliasing bundle | aliasing.table | AL01 | L011 |
aliasing.column | AL02 | L012 | |
aliasing.expression | AL03 | L013 | |
aliasing.unique.table | AL04 | L020 | |
aliasing.unused | AL05 | L025 | |
aliasing.length | AL06 | L066 | |
aliasing.forbid | AL07 | L031 | |
aliasing.unique.column | AL08 | ||
aliasing.self_alias.column | AL09 | ||
Ambiguous bundle | ambiguous.distinct | AM01 | L021 |
ambiguous.union | AM02 | L033 | |
ambiguous.order_by | AM03 | L037 | |
ambiguous.column_count | AM04 | L044 | |
ambiguous.join | AM05 | L051 | |
ambiguous.column_references | AM06 | L054 | |
ambiguous.set_columns | AM07 | L068 | |
Capitalisation bundle | capitalisation.keywords | CP01 | L010 |
capitalisation.identifiers | CP02 | L014 | |
capitalisation.functions | CP03 | L030 | |
capitalisation.literals | CP04 | L040 | |
capitalisation.types | CP05 | L063 | |
Convention bundle | convention.not_equal | CV01 | L061 |
convention.coalesce | CV02 | L060 | |
convention.select_trailing_comma | CV03 | L038 | |
convention.count_rows | CV04 | L047 | |
convention.is_null | CV05 | L049 | |
convention.terminator | CV06 | L052 | |
convention.statement_brackets | CV07 | L053 | |
convention.left_join | CV08 | L055 | |
convention.blocked_words | CV09 | L062 | |
convention.quoted_literals | CV10 | L064 | |
convention.casting_style | CV11 | L067 | |
Jinja bundle | jinja.padding | JJ01 | L046 |
Layout bundle | layout.spacing | LT01 | L001, L005, L006, L008, L023, L024, L039, L048, L071 |
layout.indent | LT02 | L002, L003, L004 | |
layout.operators | LT03 | L007 | |
layout.commas | LT04 | L019 | |
layout.long_lines | LT05 | L016 | |
layout.functions | LT06 | L017 | |
layout.cte_bracket | LT07 | L018 | |
layout.cte_newline | LT08 | L022 | |
layout.select_targets | LT09 | L036 | |
layout.select_modifiers | LT10 | L041 | |
layout.set_operators | LT11 | L065 | |
layout.end_of_file | LT12 | L009, layout.end-of-file | |
layout.start_of_file | LT13 | L050 | |
References bundle | references.from | RF01 | L026 |
references.qualification | RF02 | L027 | |
references.consistent | RF03 | L028 | |
references.keywords | RF04 | L029 | |
references.special_chars | RF05 | L057 | |
references.quoting | RF06 | L059 | |
Structure bundle | structure.else_null | ST01 | L035 |
structure.simple_case | ST02 | L043 | |
structure.unused_cte | ST03 | L045 | |
structure.nested_case | ST04 | L058 | |
structure.subquery | ST05 | L042 | |
structure.column_order | ST06 | L034 | |
structure.using | ST07 | L032 | |
structure.distinct | ST08 | L015 | |
structure.join_condition_order | ST09 | ||
TSQL bundle | tsql.sp_prefix | TQ01 | L056 |
Aliasing bundle¶
Implicit/explicit aliasing of table.¶
Aliasing of table to follow preference (requiring an explicit AS is the default).
This rule is sqlfluff fix compatible.
Name: aliasing.table
Aliases: L011
Groups: all, aliasing
Configuration
- •
- aliasing: Should alias have an explicit AS or is implicit aliasing required? Must be one of ['implicit', 'explicit'].
Anti-pattern
In this example, the alias voo is implicit.
SELECT
voo.a FROM foo voo
Best practice
Add AS to make it explicit.
SELECT
voo.a FROM foo AS voo
Implicit/explicit aliasing of columns.¶
Aliasing of columns to follow preference (explicit using an AS clause is default).
Name: aliasing.column
Aliases: L012
Groups: all, core, aliasing
Configuration
- •
- aliasing: Should alias have an explicit AS or is implicit aliasing required? Must be one of ['implicit', 'explicit'].
Anti-pattern
In this example, the alias for column a is implicit.
SELECT
a alias_col FROM foo
Best practice
Add AS to make it explicit.
SELECT
a AS alias_col FROM foo
Column expression without alias. Use explicit AS clause.¶
Name: aliasing.expression
Aliases: L013
Groups: all, core, aliasing
Configuration
- •
- allow_scalar: Whether or not to allow a single element in the select clause to be without an alias. Must be one of [True, False].
Anti-pattern
In this example, there is no alias for both sums.
SELECT
sum(a),
sum(b) FROM foo
Best practice
Add aliases.
SELECT
sum(a) AS a_sum,
sum(b) AS b_sum FROM foo
Table aliases should be unique within each clause.¶
Reusing table aliases is very likely a coding error.
Name: aliasing.unique.table
Aliases: L020
Groups: all, core, aliasing, aliasing.unique
Anti-pattern
In this example, the alias t is reused for two different tables:
SELECT
    t.a,
    t.b
FROM foo AS t, bar AS t

-- This can also happen when using schemas where the
-- implicit alias is the table name:
SELECT
    a,
    b
FROM
    2020.foo,
    2021.foo
Best practice
Make all tables have a unique alias.
SELECT
    f.a,
    b.b
FROM foo AS f, bar AS b

-- Also use explicit aliases when referencing two tables
-- with the same name from two different schemas.
SELECT
    f1.a,
    f2.b
FROM
    2020.foo AS f1,
    2021.foo AS f2
Tables should not be aliased if that alias is not used.¶
This rule is sqlfluff fix compatible.
Name: aliasing.unused
Aliases: L025
Groups: all, core, aliasing
Configuration
- •
- alias_case_check: How to handle comparison casefolding in an alias. Must be one of ['dialect', 'case_insensitive', 'quoted_cs_naked_upper', 'quoted_cs_naked_lower', 'case_sensitive'].
Anti-pattern
SELECT
a FROM foo AS zoo
Best practice
Use the alias or remove it. An unused alias makes code harder to read without changing any functionality.
SELECT
    zoo.a
FROM foo AS zoo

-- Alternatively...
SELECT
    a
FROM foo
Enforce table alias lengths in from clauses and join conditions.¶
Name: aliasing.length
Aliases: L066
Groups: all, core, aliasing
Configuration
- max_alias_length: The maximum length of an alias to allow without raising a violation. Must be one of range(0, 1000).
- min_alias_length: The minimum length of an alias to allow without raising a violation. Must be one of range(0, 1000).
Anti-pattern
In this example, alias o is used for the orders table.
SELECT
    SUM(o.amount) as order_amount,
FROM orders as o
Best practice
Avoid aliases. Avoid short aliases when aliases are necessary.
See also: AL07.
SELECT
    SUM(orders.amount) as order_amount,
FROM orders

SELECT
    replacement_orders.amount,
    previous_orders.amount
FROM
    orders AS replacement_orders
JOIN
    orders AS previous_orders
    ON replacement_orders.id = previous_orders.replacement_id
Avoid table aliases in from clauses and join conditions.¶
This rule is sqlfluff fix compatible.
Name: aliasing.forbid
Aliases: L031
Groups: all, aliasing
Configuration
- •
- force_enable: Run this rule even for dialects where this rule is disabled by default. Must be one of [True, False].
NOTE:
This rule is controversial and for many larger databases avoiding aliases is neither realistic nor desirable. In particular for BigQuery, due to the complexity of backtick requirements and determining whether a name refers to a project or dataset, automated fixes can potentially break working SQL code. For most users AL06 is likely a more appropriate linting rule to drive sensible behaviour around aliasing.
The stricter treatment of aliases in this rule may be useful for more focused projects, or temporarily as a refactoring tool because the fix routine of the rule can remove aliases.
This rule is disabled by default for all dialects; it can be enabled with the force_enable = True flag, as sketched below.
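For example (a minimal sketch), the rule can be switched on in your configuration file:

[sqlfluff:rules:aliasing.forbid]
force_enable = True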
Anti-pattern
In this example, alias o is used for the orders table, and c is used for customers table.
SELECT
    COUNT(o.customer_id) as order_amount,
    c.name
FROM orders as o
JOIN customers as c on o.id = c.user_id
Best practice
Avoid aliases.
SELECT
    COUNT(orders.customer_id) as order_amount,
    customers.name
FROM orders
JOIN customers on orders.id = customers.user_id

-- Self-join will not raise issue
SELECT
    table1.a,
    table_alias.b,
FROM
    table1
    LEFT JOIN table1 AS table_alias ON
        table1.foreign_key = table_alias.foreign_key
Column aliases should be unique within each clause.¶
Reusing column aliases is very likely a coding error. Note that while in many dialects, quoting an identifier makes it case-sensitive this rule always compares in a case-insensitive way. This is because columns with the same name, but different case, are still confusing and potentially ambiguous to other readers.
In situations where it is necessary to have columns with the same name (whether they differ in case or not) we recommend disabling this rule for either just the line, or the whole file.
Name: aliasing.unique.column
Groups: all, core, aliasing, aliasing.unique
Anti-pattern
In this example, the alias foo is reused for two different columns:
SELECT
    a as foo,
    b as foo
FROM tbl;

-- This can also happen when referencing the same
-- column twice, or aliasing an expression to the same
-- name as a column:
SELECT
    foo,
    foo,
    a as foo
FROM tbl;
Best practice
Make all columns have a unique alias.
SELECT
    a as foo,
    b as bar
FROM tbl;

-- Avoid also using the same column twice unless aliased:
SELECT
    foo as foo1,
    foo as foo2,
    a as foo3
FROM tbl;
Column aliases should not alias to itself, i.e. self-alias.¶
Renaming the column to itself is a redundant piece of SQL, which doesn't affect its functionality. This rule only applies when aliasing to an exact copy of the column reference (e.g. foo as foo or "BAR" as "BAR", see note below on more complex examples). Aliases which effectively change the casing of an identifier are still allowed.
This rule is sqlfluff fix compatible.
Name: aliasing.self_alias.column
Groups: all, core, aliasing
NOTE:
If those two rules are enabled, the fixes applied may result in a situation where this rule can kick in as a secondary effect. For example this Snowflake query:
-- Original Query. AL09 will not trigger because casing and
-- quoting are different. RF06 will however fix the unnecessary
-- quoting of "COL".
SELECT "COL" AS col FROM table;

-- After RF06, the query will look like this, at which point
-- CP02 will see the inconsistent capitalisation. Depending
-- on the configuration it will change one of the identifiers.
-- Let's assume the default configuration of "consistent".
SELECT COL AS col FROM table;

-- After CP02, the alias and the reference will be the same
-- and at this point AL09 can take over and remove the alias.
SELECT COL AS COL FROM table;

-- ..resulting in:
SELECT COL FROM table;
This interdependence between the rules, and the configuration options offered by each one means a variety of outcomes can be achieved by enabling and disabling each one. See Enabling and Disabling Rules and Rule Configuration for more details.
Anti-pattern
Aliasing the column to itself, where not necessary for changing the case of an identifier.
SELECT
    col AS col,
    "Col" AS "Col",
    COL AS col
FROM table;
Best practice
Do not use an alias to rename the column to its original name. Self-aliasing leads to redundant code without changing any functionality, unless it is used to effectively change the case of the identifier.
SELECT
    col,
    "Col",
    COL
FROM table;

-- Re-casing aliasing is still allowed where necessary, i.e.
SELECT
    col as "Col",
    "col" as "COL"
FROM table;
Ambiguous bundle¶
Ambiguous use of DISTINCT in a SELECT statement with GROUP BY.¶
When using GROUP BY, a DISTINCT clause should not be necessary as every non-distinct SELECT clause must be included in the GROUP BY clause.
Name: ambiguous.distinct
Aliases: L021
Groups: all, core, ambiguous
Anti-pattern
DISTINCT and GROUP BY are conflicting.
SELECT DISTINCT
a FROM foo GROUP BY a
Best practice
Remove DISTINCT or GROUP BY. In our case, removing GROUP BY is better.
SELECT DISTINCT
a FROM foo
UNION [DISTINCT|ALL] is preferred over just UNION.¶
This rule is sqlfluff fix compatible.
Name: ambiguous.union
Aliases: L033
Groups: all, core, ambiguous
Anti-pattern
In this example, UNION DISTINCT should be preferred over UNION, because explicit is better than implicit.
SELECT a, b FROM table_1 UNION SELECT a, b FROM table_2
Best practice
Specify DISTINCT or ALL after UNION (note that DISTINCT is the default behavior).
SELECT a, b FROM table_1 UNION DISTINCT SELECT a, b FROM table_2
Ambiguous ordering directions for columns in order by clause.¶
This rule is sqlfluff fix compatible.
Name: ambiguous.order_by
Aliases: L037
Groups: all, ambiguous
Anti-pattern
SELECT
a, b FROM foo ORDER BY a, b DESC
Best practice
If any columns in the ORDER BY clause specify ASC or DESC, they should all do so.
SELECT
a, b FROM foo ORDER BY a ASC, b DESC
Query produces an unknown number of result columns.¶
Name: ambiguous.column_count
Aliases: L044
Groups: all, ambiguous
Anti-pattern
Querying all columns using * produces a query result where the number or ordering of columns changes if the upstream table's schema changes. This should generally be avoided because it can cause slow performance, cause important schema changes to go undetected, or break production code. For example:
- If a query does SELECT t.* and is expected to return columns a, b, and c, the actual columns returned will be wrong/different if columns are added to or deleted from the input table.
- UNION and DIFFERENCE clauses require the inputs have the same number of columns (and compatible types).
- JOIN queries may break due to new column name conflicts, e.g. the query references a column c which initially existed in only one input table but a column of the same name is added to another table.
- CREATE TABLE (<<column schema>>) AS SELECT *
WITH cte AS (
    SELECT * FROM foo
)

SELECT * FROM cte
UNION
SELECT a, b FROM t
Best practice
Somewhere along the "path" to the source data, specify columns explicitly.
WITH cte AS (
    SELECT * FROM foo
)

SELECT a, b FROM cte
UNION
SELECT a, b FROM t
Join clauses should be fully qualified.¶
By default this rule is configured to enforce fully qualified INNER JOIN clauses, but not [LEFT/RIGHT/FULL] OUTER JOIN. If you prefer a stricter lint then this is configurable.
This rule is sqlfluff fix compatible.
Name: ambiguous.join
Aliases: L051
Groups: all, ambiguous
Configuration
- fully_qualify_join_types: Which types of JOIN clauses should be fully qualified? Must be one of ['inner', 'outer', 'both'].
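For example, to extend the stricter treatment to OUTER JOIN clauses as well, the option can be set in a .sqlfluff file; the section name below assumes the standard [sqlfluff:rules:<rule name>] naming convention:

[sqlfluff:rules:ambiguous.join]
fully_qualify_join_types = both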
Anti-pattern
A join is used without specifying the kind of join.
SELECT
foo
FROM bar
JOIN baz;
Best practice
Use INNER JOIN rather than JOIN.
SELECT
foo
FROM bar
INNER JOIN baz;
Inconsistent column references in GROUP BY/ORDER BY clauses.¶
Name: ambiguous.column_references
Aliases: L054
Groups: all, core, ambiguous
Configuration
- group_by_and_order_by_style: The expectation for using explicit column name references or implicit positional references. Must be one of ['consistent', 'implicit', 'explicit'].
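For instance, a project that wants to forbid positional references entirely could set the style to explicit (again assuming the standard rule-section naming):

[sqlfluff:rules:ambiguous.column_references]
group_by_and_order_by_style = explicit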
Anti-pattern
A mix of implicit and explicit column references are used in a GROUP BY clause.
SELECT
foo,
bar,
sum(baz) AS sum_value
FROM fake_table
GROUP BY
foo, 2;
-- The same also applies to column
-- references in ORDER BY clauses.
SELECT
foo,
bar
FROM fake_table
ORDER BY
1, bar;
Best practice
Reference all GROUP BY/ORDER BY columns either by name or by position.
-- GROUP BY: Explicit
SELECT
foo,
bar,
sum(baz) AS sum_value
FROM fake_table
GROUP BY
foo, bar;
-- ORDER BY: Explicit
SELECT
foo,
bar
FROM fake_table
ORDER BY
foo, bar;
-- GROUP BY: Implicit
SELECT
foo,
bar,
sum(baz) AS sum_value
FROM fake_table
GROUP BY
1, 2;
-- ORDER BY: Implicit
SELECT
foo,
bar
FROM fake_table
ORDER BY
1, 2;
Queries within set query produce different numbers of columns.¶
Name: ambiguous.set_columns
Aliases: L068
Groups: all, ambiguous
Anti-pattern
When writing set expressions, all queries must return the same number of columns.
WITH cte AS (
    SELECT
        a,
        b
    FROM foo
)

SELECT * FROM cte
UNION
SELECT
    c,
    d,
    e
FROM t
Best practice
Always specify columns when writing set queries and ensure that they all select the same number of columns.
WITH cte AS (
    SELECT a, b FROM foo
)

SELECT
    a,
    b
FROM cte
UNION
SELECT
    c,
    d
FROM t
Capitalisation bundle¶
Inconsistent capitalisation of keywords.¶
This rule is sqlfluff fix compatible.
Name: capitalisation.keywords
Aliases: L010
Groups: all, core, capitalisation
Configuration
- capitalisation_policy: The capitalisation policy to enforce. Must be one of ['consistent', 'upper', 'lower', 'capitalise'].
- ignore_words: Comma separated list of words to ignore from rule.
- ignore_words_regex: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ^ (beginning of text) and $ (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ^ and $.
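As an illustration, enforcing upper-case keywords rather than mere consistency might look like this in a .sqlfluff file (standard rule-section naming assumed):

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper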
Anti-pattern
In this example, select is in lower-case whereas FROM is in upper-case.
select
a FROM foo
Best practice
Make all keywords either in upper-case or in lower-case.
SELECT
a
FROM foo

-- Also good

select
a from foo
Inconsistent capitalisation of unquoted identifiers.¶
This rule applies to all unquoted identifiers, whether references or aliases, and whether they refer to columns or other objects (such as tables or schemas).
This rule is sqlfluff fix compatible.
Name: capitalisation.identifiers
Aliases: L014
Groups: all, core, capitalisation
Configuration
- extended_capitalisation_policy: The capitalisation policy to enforce, extended with PascalCase, snake_case, and camelCase. This is separate from capitalisation_policy as it should not be applied to keywords. Camel, Pascal, and Snake will never be inferred when the policy is set to consistent. This is because snake can cause destructive changes to the identifier, and unlinted code is too easily mistaken for camel and pascal. If, when set to consistent, no consistent case is found, it will default to upper. Must be one of ['consistent', 'upper', 'lower', 'pascal', 'capitalise', 'snake', 'camel'].
- ignore_words: Comma separated list of words to ignore from rule.
- ignore_words_regex: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ^ (beginning of text) and $ (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ^ and $.
- unquoted_identifiers_policy: Types of unquoted identifiers to flag violations for. Must be one of ['all', 'aliases', 'column_aliases', 'table_aliases'].
NOTE:
As this feature is only present in a few dialects, and not widely understood by users, we regard it as an antipattern. It is more widely understood that if the case of an identifier matters, then it should be quoted. If you, or your organisation, do wish to rely on this feature, we recommend that you disable this rule (see Enabling and Disabling Rules).
Anti-pattern
In this example, unquoted identifier a is in lower-case but B is in upper-case.
select
a,
B from foo
In this more complicated example, there are a mix of capitalisations in both reference and aliases of columns and tables. That inconsistency is acceptable when those identifiers are quoted, but not when unquoted.
select
col_1 + Col_2 as COL_3,
"COL_4" as Col_5 from Foo as BAR
Best practice
Ensure all unquoted identifiers are either in upper-case or in lower-case.
select
a,
b
from foo;

-- ...also good...

select
A,
B
from foo;

-- ...or for comparison with our more complex example, this too:

select
col_1 + col_2 as col_3,
"COL_4" as col_5
from foo as bar
Inconsistent capitalisation of function names.¶
This rule is sqlfluff fix compatible.
Name: capitalisation.functions
Aliases: L030
Groups: all, core, capitalisation
Configuration
- extended_capitalisation_policy: The capitalisation policy to enforce, extended with PascalCase, snake_case, and camelCase. This is separate from capitalisation_policy as it should not be applied to keywords. Camel, Pascal, and Snake will never be inferred when the policy is set to consistent. This is because snake can cause destructive changes to the identifier, and unlinted code is too easily mistaken for camel and pascal. If, when set to consistent, no consistent case is found, it will default to upper. Must be one of ['consistent', 'upper', 'lower', 'pascal', 'capitalise', 'snake', 'camel'].
- ignore_words: Comma separated list of words to ignore from rule.
- ignore_words_regex: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ^ (beginning of text) and $ (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ^ and $.
Anti-pattern
In this example, the two SUM functions don't have the same capitalisation.
SELECT
sum(a) AS aa,
SUM(b) AS bb FROM foo
Best practice
Make the case consistent.
SELECT
sum(a) AS aa,
sum(b) AS bb FROM foo
Inconsistent capitalisation of boolean/null literal.¶
This rule is sqlfluff fix compatible.
Name: capitalisation.literals
Aliases: L040
Groups: all, core, capitalisation
Anti-pattern
In this example, null and false are in lower-case whereas TRUE is in upper-case.
select
a,
null,
TRUE,
false from foo
Best practice
Ensure all null/true/false literals are consistently upper or lower case.
select
a,
NULL,
TRUE,
FALSE
from foo

-- Also good

select
a,
null,
true,
false from foo
Inconsistent capitalisation of datatypes.¶
This rule is sqlfluff fix compatible.
Name: capitalisation.types
Aliases: L063
Groups: all, core, capitalisation
Configuration
- extended_capitalisation_policy: The capitalisation policy to enforce, extended with PascalCase, snake_case, and camelCase. This is separate from capitalisation_policy as it should not be applied to keywords. Camel, Pascal, and Snake will never be inferred when the policy is set to consistent. This is because snake can cause destructive changes to the identifier, and unlinted code is too easily mistaken for camel and pascal. If, when set to consistent, no consistent case is found, it will default to upper. Must be one of ['consistent', 'upper', 'lower', 'pascal', 'capitalise', 'snake', 'camel'].
- ignore_words: Comma separated list of words to ignore from rule.
- ignore_words_regex: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ^ (beginning of text) and $ (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ^ and $.
Anti-pattern
In this example, int and unsigned are in lower-case whereas VARCHAR is in upper-case.
CREATE TABLE t (
a int unsigned,
b VARCHAR(15) );
Best practice
Ensure all datatypes are consistently upper or lower case
CREATE TABLE t (
a INT UNSIGNED,
b VARCHAR(15) );
Convention bundle¶
Consistent usage of != or <> for "not equal to" operator.¶
This rule is sqlfluff fix compatible.
Name: convention.not_equal
Aliases: L061
Groups: all, convention
Configuration
- preferred_not_equal_style: The style for using not equal to operator. Defaults to consistent. Must be one of ['consistent', 'c_style', 'ansi'].
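For example, a team standardising on the ANSI <> operator could configure (assuming the usual rule-section naming):

[sqlfluff:rules:convention.not_equal]
preferred_not_equal_style = ansi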
Anti-pattern
SELECT * FROM X WHERE 1 <> 2 AND 3 != 4;
Best practice
Ensure all "not equal to" comparisons are consistent, not mixing != and <>.
SELECT * FROM X WHERE 1 != 2 AND 3 != 4;
Use COALESCE instead of IFNULL or NVL.¶
This rule is sqlfluff fix compatible.
Name: convention.coalesce
Aliases: L060
Groups: all, convention
Anti-pattern
IFNULL or NVL are used to fill NULL values.
SELECT ifnull(foo, 0) AS bar,
FROM baz;

SELECT nvl(foo, 0) AS bar,
FROM baz;
Best practice
Use COALESCE instead. COALESCE is universally supported, whereas Redshift doesn't support IFNULL and BigQuery doesn't support NVL. Additionally, COALESCE is more flexible and accepts an arbitrary number of arguments.
SELECT coalesce(foo, 0) AS bar, FROM baz;
Trailing commas within select clause.¶
This rule is sqlfluff fix compatible.
Name: convention.select_trailing_comma
Aliases: L038
Groups: all, core, convention
Configuration
- select_clause_trailing_comma: Should trailing commas within select clauses be required or forbidden? Must be one of ['forbid', 'require'].
Anti-pattern
SELECT
a,
b, FROM foo
Best practice
SELECT
a,
b FROM foo
Use consistent syntax to express "count number of rows".¶
NOTE:
If both prefer_count_1 and prefer_count_0 are set to true, then prefer_count_1 takes precedence.
COUNT(*), COUNT(1), and even COUNT(0) are equivalent syntaxes in many SQL engines due to optimizers interpreting these instructions as "count number of rows in result".
The ANSI-92 spec mentions the COUNT(*) syntax specifically as having a special meaning.
So by default, SQLFluff enforces the consistent use of COUNT(*).
If the SQL engine you work with, or your team, prefers COUNT(1) or COUNT(0) over COUNT(*), you can configure this rule to consistently enforce your preference.
This rule is sqlfluff fix compatible.
Name: convention.count_rows
Aliases: L047
Groups: all, core, convention
Configuration
- prefer_count_0: Should count(0) be preferred over count(*) and count(1)? Must be one of [True, False].
- prefer_count_1: Should count(1) be preferred over count(*) and count(0)? Must be one of [True, False].
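A sketch of how a project preferring COUNT(1) might configure this (usual rule-section naming assumed):

[sqlfluff:rules:convention.count_rows]
prefer_count_1 = True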
Anti-pattern
select
count(1) from table_a
Best practice
Use count(*) unless specified otherwise by config prefer_count_1, or prefer_count_0 as preferred.
select
count(*) from table_a
Comparisons with NULL should use "IS" or "IS NOT".¶
This rule is sqlfluff fix compatible.
Name: convention.is_null
Aliases: L049
Groups: all, core, convention
Anti-pattern
In this example, the = operator is used to check for NULL values.
SELECT
a FROM foo WHERE a = NULL
Best practice
Use IS or IS NOT to check for NULL values.
SELECT
a FROM foo WHERE a IS NULL
Statements must end with a semi-colon.¶
This rule is sqlfluff fix compatible.
Name: convention.terminator
Aliases: L052
Groups: all, convention
Configuration
- multiline_newline: Should semi-colons be placed on a new line after multi-line statements? Must be one of [True, False].
- require_final_semicolon: Should final semi-colons be required? (N.B. forcing trailing semi-colons is not recommended for dbt users as it can cause issues when wrapping the query within other SQL queries). Must be one of [True, False].
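For example, to require a final semi-colon and place semi-colons on a new line after multi-line statements (rule-section naming assumed as above):

[sqlfluff:rules:convention.terminator]
multiline_newline = True
require_final_semicolon = True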
Anti-pattern
A statement is not immediately terminated with a semi-colon. The • represents space.
SELECT
a
FROM foo
;
SELECT
b
FROM bar••;
Best practice
Immediately terminate the statement with a semi-colon.
SELECT
a
FROM foo;
Top-level statements should not be wrapped in brackets.¶
This rule is sqlfluff fix compatible.
Name: convention.statement_brackets
Aliases: L053
Groups: all, convention
Anti-pattern
A top-level statement is wrapped in brackets.
(SELECT
foo
FROM bar)
-- This also applies to statements containing a sub-query.
(SELECT
foo
FROM (SELECT * FROM bar))
Best practice
Don't wrap top-level statements in brackets.
SELECT
foo
FROM bar
-- Likewise for statements containing a sub-query.
SELECT
foo
FROM (SELECT * FROM bar)
Use LEFT JOIN instead of RIGHT JOIN.¶
Name: convention.left_join
Aliases: L055
Groups: all, convention
Anti-pattern
RIGHT JOIN is used.
SELECT
foo.col1,
bar.col2
FROM foo
RIGHT JOIN bar
ON foo.bar_id = bar.id;
Best practice
Refactor and use LEFT JOIN instead.
SELECT
foo.col1,
bar.col2
FROM bar
LEFT JOIN foo
ON foo.bar_id = bar.id;
Block a list of configurable words from being used.¶
This generic rule can be useful to prevent certain keywords, functions, or objects from being used. Only whole words can be blocked, not phrases, nor parts of words.
This block list is case insensitive.
Example use cases
- We prefer BOOL over BOOLEAN and there is no existing rule to enforce this. Until such a rule is written, we can add BOOLEAN to the deny list to cause a linting error to flag this.
- We have deprecated a schema/table/function and want to prevent it being used in future. We can add that to the denylist and then add a -- noqa: CV09 for the few exceptions that still need to be in the code base for now.
Name: convention.blocked_words
Aliases: L062
Groups: all, convention
Configuration
- blocked_regex: Optional, regex of blocked pattern which should not be used in statements.
- blocked_words: Optional, comma-separated list of blocked words which should not be used in statements.
- match_source: Optional, also match regex of blocked pattern before applying templating.
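The anti-pattern below assumes a configuration along these lines (standard rule-section naming assumed):

[sqlfluff:rules:convention.blocked_words]
blocked_words = deprecated_table,bool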
Anti-pattern
If the blocked_words config is set to deprecated_table,bool then the following will flag:
SELECT * FROM deprecated_table
WHERE 1 = 1;

CREATE TABLE myschema.t1 (a BOOL);
Best practice
Do not use any blocked words:
SELECT * FROM another_table
WHERE 1 = 1;

CREATE TABLE myschema.t1 (a BOOLEAN);
Consistent usage of preferred quotes for quoted literals.¶
Some databases allow quoted literals to use either single or double quotes. Prefer one type of quote, as specified in the rule setting, falling back to the alternate quotes to reduce the need for escapes.
Dollar-quoted raw strings are excluded from this rule, as they are mostly used for literal UDF Body definitions.
This rule is sqlfluff fix compatible.
Name: convention.quoted_literals
Aliases: L064
Groups: all, convention
Configuration
- force_enable: Run this rule even for dialects where this rule is disabled by default. Must be one of [True, False].
- preferred_quoted_literal_style: Preferred quoting style to use for the quoted literals. If set to consistent, the quoting style is derived from the first quoted literal in the file. Must be one of ['consistent', 'single_quotes', 'double_quotes'].
NOTE:
This rule is only enabled for dialects that allow single and double quotes for quoted literals (currently bigquery, databricks, hive, mysql, sparksql). It can be enabled for other dialects with the force_enable = True flag.
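For example, to prefer single quotes and enable the rule for a dialect where it is off by default (section naming assumed as elsewhere):

[sqlfluff:rules:convention.quoted_literals]
preferred_quoted_literal_style = single_quotes
force_enable = True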
Anti-pattern
select
"abc",
'abc',
"\"",
"abc" = 'abc'
from foo
Best practice
Ensure all quoted literals use preferred quotes, unless escaping can be reduced by using alternate quotes.
select
"abc",
"abc",
'"',
"abc" = "abc"
from foo
Enforce consistent type casting style.¶
This rule is sqlfluff fix compatible.
Name: convention.casting_style
Aliases: L067
Groups: all, convention
Configuration
- preferred_type_casting_style: The expectation for using sql type casting. Must be one of ['consistent', 'shorthand', 'convert', 'cast'].
Anti-pattern
Using mixture of CONVERT, :: and CAST when preferred_type_casting_style config is set to consistent (default).
SELECT
CONVERT(int, 1) AS bar,
100::int::text,
CAST(10 AS text) AS coo FROM foo;
Best practice
Use consistent type casting style.
SELECT
CAST(1 AS int) AS bar,
CAST(CAST(100 AS int) AS text),
CAST(10 AS text) AS coo FROM foo;
Jinja bundle¶
Jinja tags should have a single whitespace on either side.¶
This rule is only active if the jinja templater (or one of its subclasses, like the dbt templater) is used for the current file.
This rule is sqlfluff fix compatible.
Name: jinja.padding
Aliases: L046
Groups: all, core, jinja
Anti-pattern
Jinja tags with either no whitespace or very long whitespace are hard to read.
SELECT {{ a }} from {{ref('foo')}}
Best practice
A single whitespace surrounding Jinja tags, alternatively longer gaps containing newlines are acceptable.
SELECT {{ a }} from {{ ref('foo') }};
SELECT {{ a }} from {{
ref('foo')
}};
Layout bundle¶
Inappropriate Spacing.¶
This rule checks for and enforces the spacing as configured in Configuring Layout. This includes excessive whitespace, trailing whitespace at the end of a line and also the wrong spacing between elements on the line. Because of this wide reach, you may find that you wish to add specific configuration in your project to tweak how specific elements are treated. Rather than configuring this specific rule, use the sqlfluff:layout section of your configuration file to customise how this rule operates.
The • character represents a space in the examples below.
This rule is sqlfluff fix compatible.
Name: layout.spacing
Aliases: L001, L005, L006, L008, L023, L024, L039, L048, L071
Groups: all, core, layout
Anti-pattern
SELECT
a, b(c) as d••
FROM foo••••
JOIN bar
USING(a)
Best practice
- Unless an indent or preceding a comment, whitespace should be a single space.
- There should also be no trailing whitespace at the ends of lines.
- There should be a space after USING so that it's not confused for a function.
SELECT
a, b(c) as d
FROM foo
JOIN bar
USING (a)
Incorrect Indentation.¶
This rule is sqlfluff fix compatible.
Name: layout.indent
Aliases: L002, L003, L004
Groups: all, core, layout
Anti-pattern
The • character represents a space and the → character represents a tab. In this example, the third line contains five spaces instead of four and the second line contains two spaces and one tab.
SELECT
••→a,
•••••b
FROM foo
Best practice
Change the indentation to use a multiple of four spaces. This example also assumes that the indent_unit config value is set to space. If it had instead been set to tab, then the indents would be tabs instead.
SELECT
••••a,
••••b
FROM foo
Operators should follow a standard for being before/after newlines.¶
The configuration for whether operators should be trailing or leading is part of Configuring Layout. The default configuration is:
[sqlfluff:layout:type:binary_operator]
line_position = leading

[sqlfluff:layout:type:comparison_operator]
line_position = leading
This rule is sqlfluff fix compatible.
Name: layout.operators
Aliases: L007
Groups: all, layout
Anti-pattern
In this example, if line_position = leading (or unspecified, as is the default), then the operator + should not be at the end of the second line.
SELECT
a +
b FROM foo
Best practice
If line_position = leading (or unspecified, as this is the default), place the operator after the newline.
SELECT
a
+ b FROM foo
If line_position = trailing, place the operator before the newline.
SELECT
a +
b FROM foo
Leading/Trailing comma enforcement.¶
The configuration for whether operators should be trailing or leading is part of Configuring Layout. The default configuration is:
[sqlfluff:layout:type:comma]
line_position = trailing
This rule is sqlfluff fix compatible.
Name: layout.commas
Aliases: L019
Groups: all, layout
Anti-pattern
There is a mixture of leading and trailing commas.
SELECT
a
, b,
c FROM foo
Best practice
By default, SQLFluff prefers trailing commas. However it is configurable for leading commas. The chosen style must be used consistently throughout your SQL.
SELECT
a,
b,
c
FROM foo

-- Alternatively, set the configuration file to 'leading'
-- and then the following would be acceptable:

SELECT
a
, b
, c
FROM foo
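To opt into the leading style referenced in the comment above, override the same layout section shown earlier:

[sqlfluff:layout:type:comma]
line_position = leading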
Line is too long.¶
This rule is sqlfluff fix compatible.
Name: layout.long_lines
Aliases: L016
Groups: all, core, layout
Configuration
- ignore_comment_clauses: Should comment clauses (e.g. column comments) be ignored when linting line lengths? Must be one of [True, False].
- ignore_comment_lines: Should lines that contain only whitespace and comments be ignored when linting line lengths? Must be one of [True, False].
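For example, to exempt comment-only lines from the length check (standard rule-section naming assumed):

[sqlfluff:rules:layout.long_lines]
ignore_comment_lines = True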
Function name not immediately followed by parenthesis.¶
This rule is sqlfluff fix compatible.
Name: layout.functions
Aliases: L017
Groups: all, core, layout
Anti-pattern
In this example, there is a space between the function and the parenthesis.
SELECT
sum (a) FROM foo
Best practice
Remove the space between the function and the parenthesis.
SELECT
sum(a) FROM foo
WITH clause closing bracket should be on a new line.¶
This rule is sqlfluff fix compatible.
Name: layout.cte_bracket
Aliases: L018
Groups: all, core, layout
Anti-pattern
In this example, the closing bracket is on the same line as CTE.
WITH zoo AS (
SELECT a FROM foo)
SELECT * FROM zoo
Best practice
Move the closing bracket on a new line.
WITH zoo AS (
    SELECT a FROM foo
)

SELECT * FROM zoo
Blank line expected but not found after CTE closing bracket.¶
This rule is sqlfluff fix compatible.
Name: layout.cte_newline
Aliases: L022
Groups: all, core, layout
Anti-pattern
There is no blank line after the CTE closing bracket. In queries with many CTEs, this hinders readability.
WITH plop AS (
    SELECT * FROM foo
)
SELECT a FROM plop
Best practice
Add a blank line.
WITH plop AS (
    SELECT * FROM foo
)

SELECT a FROM plop
Select targets should be on a new line unless there is only one select target.¶
This rule is sqlfluff fix compatible.
Name: layout.select_targets
Aliases: L036
Groups: all, layout
Configuration
- wildcard_policy: Treatment of wildcards. Defaults to single. Must be one of ['single', 'multiple'].
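For instance, overriding the default treatment of wildcards (rule-section naming assumed):

[sqlfluff:rules:layout.select_targets]
wildcard_policy = multiple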
Anti-pattern
Multiple select targets on the same line.
select a, b from foo;

-- Single select target on its own line.

SELECT
    a
FROM foo;
Best practice
Multiple select targets each on their own line.
select
a,
b
from foo;

-- Single select target on the same line as the ``SELECT``
-- keyword.

SELECT a
FROM foo;

-- When select targets span multiple lines, however they
-- can still be on a new line.

SELECT
    SUM(
        1 + SUM(
            2 + 3
        )
    ) AS col
FROM test_table;
SELECT modifiers (e.g. DISTINCT) must be on the same line as SELECT.¶
This rule is sqlfluff fix compatible.
Name: layout.select_modifiers
Aliases: L041
Groups: all, core, layout
Anti-pattern
select
distinct a,
b from x
Best practice
select distinct
a,
b from x
Set operators should be surrounded by newlines.¶
This rule is sqlfluff fix compatible.
Name: layout.set_operators
Aliases: L065
Groups: all, core, layout
Anti-pattern
In this example, UNION ALL is not on a line itself.
SELECT 'a' AS col UNION ALL SELECT 'b' AS col
Best practice
SELECT 'a' AS col
UNION ALL
SELECT 'b' AS col
Files must end with a single trailing newline.¶
This rule is sqlfluff fix compatible.
Name: layout.end_of_file
Aliases: L009, layout.end-of-file
Groups: all, core, layout
Anti-pattern
The content in file does not end with a single trailing newline. The $ represents end of file.
SELECT
a
FROM foo$
-- Ending on an indented line means there is no newline
-- at the end of the file, the • represents space.
SELECT
••••a
FROM
••••foo
••••$
-- Ending on a semi-colon means the last line is not a
-- newline.
SELECT
a
FROM foo
;$
-- Ending with multiple newlines.
SELECT
a
FROM foo
$
Best practice
Add trailing newline to the end. The $ character represents end of file.
SELECT
a
FROM foo
$
-- Ensuring the last line is not indented so is just a
-- newline.
SELECT
••••a
FROM
••••foo
$
-- Even when ending on a semi-colon, ensure there is a
-- newline after.
SELECT
a
FROM foo
;
$
Files must not begin with newlines or whitespace.¶
This rule is sqlfluff fix compatible.
Name: layout.start_of_file
Aliases: L050
Groups: all, layout
Anti-pattern
The file begins with newlines or whitespace. The ^ represents the beginning of the file.
^
SELECT
a
FROM foo
-- Beginning on an indented line is also forbidden,
-- (the • represents space).
••••SELECT
••••a
FROM
••••foo
Best practice
Start file on either code or comment. (The ^ represents the beginning of the file.)
^SELECT
a
FROM foo
-- Including an initial block comment.
^/*
This is a description of my SQL code.
*/
SELECT
a
FROM
foo
-- Including an initial inline comment.
^--This is a description of my SQL code.
SELECT
a
FROM
foo
References bundle¶
References cannot reference objects not present in FROM clause.¶
Name: references.from
Aliases: L026
Groups: all, core, references
Configuration
- force_enable: Run this rule even for dialects where this rule is disabled by default. Must be one of [True, False].
Anti-pattern
In this example, the reference vee has not been declared.
SELECT
vee.a FROM foo
Best practice
Remove the reference.
SELECT
a FROM foo
References should be qualified if select has more than one referenced table/view.¶
Name: references.qualification
Aliases: L027
Groups: all, references
Anti-pattern
In this example, the reference vee has not been declared, and the variables a and b are potentially ambiguous.
SELECT a, b FROM foo LEFT JOIN vee ON vee.a = foo.a
Best practice
Add the references.
SELECT foo.a, vee.b FROM foo LEFT JOIN vee ON vee.a = foo.a
Column references should be qualified consistently in single table statements.¶
This rule is sqlfluff fix compatible.
Name: references.consistent
Aliases: L028
Groups: all, references
Configuration
- force_enable: Run this rule even for dialects where this rule is disabled by default. Must be one of [True, False].
- single_table_references: The expectation for references in single-table select. Must be one of ['consistent', 'qualified', 'unqualified'].
NOTE:
"consistent" will be fixed to "qualified" if inconsistency is found.
Anti-pattern
In this example, only the reference to b is qualified.
SELECT
a,
foo.b FROM foo
Best practice
Either all column references should be qualified, or all unqualified.
SELECT
a,
b
FROM foo

-- Also good

SELECT
foo.a,
foo.b
FROM foo
Keywords should not be used as identifiers.¶
Although unreserved keywords can be used as identifiers, and reserved words can be used as quoted identifiers, best practice is to avoid them where possible, to prevent any misunderstanding as to what the alias represents.
Name: references.keywords
Aliases: L029
Groups: all, references
Configuration
- ignore_words: Comma separated list of words to ignore from rule.
- ignore_words_regex: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ^ (beginning of text) and $ (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ^ and $.
- quoted_identifiers_policy: Types of quoted identifiers to flag violations for. Must be one of ['all', 'aliases', 'column_aliases', 'table_aliases', 'none'].
- unquoted_identifiers_policy: Types of unquoted identifiers to flag violations for. Must be one of ['all', 'aliases', 'column_aliases', 'table_aliases'].
Anti-pattern
In this example, SUM (built-in function) is used as an alias.
SELECT
sum.a FROM foo AS sum
Best practice
Avoid keywords as the name of an alias.
SELECT
vee.a FROM foo AS vee
Do not use special characters in identifiers.¶
Name: references.special_chars
Aliases: L057
Groups: all, references
Configuration
- additional_allowed_characters: Optional list of extra allowed characters, in addition to alphanumerics (A-Z, a-z, 0-9) and underscores.
- allow_space_in_identifier: Should spaces in identifiers be allowed? Must be one of [True, False].
- ignore_words: Comma separated list of words to ignore from rule.
- ignore_words_regex: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ^ (beginning of text) and $ (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ^ and $.
- quoted_identifiers_policy: Types of quoted identifiers to flag violations for. Must be one of ['all', 'aliases', 'column_aliases', 'table_aliases', 'none'].
- unquoted_identifiers_policy: Types of unquoted identifiers to flag violations for. Must be one of ['all', 'aliases', 'column_aliases', 'table_aliases'].
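For example, a project that tolerates spaces in identifiers could relax the rule like so (rule-section naming assumed):

[sqlfluff:rules:references.special_chars]
allow_space_in_identifier = True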
Anti-pattern
Using special characters within identifiers when creating or aliasing objects.
CREATE TABLE DBO.ColumnNames (
[Internal Space] INT,
[Greater>Than] INT,
[Less<Than] INT,
Number# INT )
Best practice
Identifiers should include only alphanumerics and underscores.
CREATE TABLE DBO.ColumnNames (
[Internal_Space] INT,
[GreaterThan] INT,
[LessThan] INT,
NumberVal INT )
Unnecessary quoted identifier.¶
This rule will fail if the quotes used to quote an identifier are (un)necessary, depending on the prefer_quoted_identifiers configuration. This rule applies to both column references and their aliases. The default (safe) behaviour is designed not to unexpectedly corrupt SQL. That means the circumstances in which quotes can be safely removed depend on how the current dialect would resolve the unquoted variant of the identifier (see below for examples).
Additionally this rule may be configured to a more aggressive setting by setting case_sensitive to False, in which case quotes will be removed regardless of the casing of the contained identifier. Any identifiers which contain special characters, spaces or keywords will still be left quoted. This setting is more appropriate for projects or teams where there is more control over the inputs and outputs of queries, and where it's more viable to institute rules such as enforcing that all identifiers are the default casing (and therefore meaning that using quotes to change the case of identifiers is unnecessary).
Dialect group | ✅ Example where quotes are safe to remove. | ⚠️ Examples where quotes are not safe to remove. |
Natively UPPERCASE dialects e.g. Snowflake, BigQuery, TSQL & Oracle. | Identifiers which, without quotes, would resolve to the default casing of FOO i.e. "FOO". | Identifiers where the quotes are necessary to preserve case (e.g. "Foo" or "foo"), or where the identifier contains something invalid without the quotes such as keywords or special characters e.g. "SELECT", "With Space" or "Special&Characters". |
Natively lowercase dialects e.g. Athena, Hive & Postgres | Identifiers which, without quotes, would resolve to the default casing of foo i.e. "foo". | Identifiers where the quotes are necessary to preserve case (e.g. "Foo" or "foo"), or where the identifier contains something invalid without the quotes such as keywords or special characters e.g. "SELECT", "With Space" or "Special&Characters". |
Case insensitive dialects e.g. DuckDB or Apache Spark SQL | Any identifiers which are valid without quotes: e.g. "FOO", "foo", "Foo", "fOo", FOO and foo would all resolve to the same object. | Identifiers which contain something invalid without the quotes such as keywords or special characters e.g. "SELECT", "With Space" or "Special&Characters". |
This rule is closely associated with (and constrained by the same above factors) as aliasing.self_alias.column (AL09).
When prefer_quoted_identifiers = False (default behaviour), the quotes are unnecessary, except for reserved keywords and special characters in identifiers.
This rule is sqlfluff fix compatible.
Name: references.quoting
Aliases: L059
Groups: all, references
Configuration
- case_sensitive: If False, comparison is done case in-sensitively. Defaults to True. Must be one of [True, False].
- ignore_words: Comma separated list of words to ignore from rule.
- ignore_words_regex: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ^ (beginning of text) and $ (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ^ and $.
- prefer_quoted_identifiers: If True, requires every identifier to be quoted. Defaults to False. Must be one of [True, False].
- prefer_quoted_keywords: If True, requires every keyword used as an identifier to be quoted. Defaults to False. Must be one of [True, False].
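For example, switching the rule into its quote-everything mode described below (rule-section naming assumed):

[sqlfluff:rules:references.quoting]
prefer_quoted_identifiers = True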
Anti-pattern
In this example, valid unquoted identifiers, that are not also reserved keywords, are needlessly quoted.
SELECT "foo" as "bar"; -- For lowercase dialects like Postgres SELECT "FOO" as "BAR"; -- For uppercase dialects like Snowflake
Best practice
Use unquoted identifiers where possible.
SELECT foo as bar; -- For lowercase dialects like Postgres

SELECT FOO as BAR; -- For uppercase dialects like Snowflake

-- Note that where the case of the quoted identifier requires
-- the quotes to remain, or where the identifier cannot be
-- unquoted because it would be invalid to do so, the quotes
-- may remain. For example:

SELECT
    "Case_Sensitive_Identifier" as is_allowed,
    "Identifier with spaces or speci@l characters" as this_too,
    "SELECT" as also_reserved_words
FROM "My Table With Spaces"
When prefer_quoted_identifiers = True, the quotes are always necessary, no matter if the identifier is valid, a reserved keyword, or contains special characters.
Anti-pattern
In this example, a valid unquoted identifier, that is also not a reserved keyword, is required to be quoted.
SELECT 123 as foo
Best practice
Use quoted identifiers.
SELECT 123 as "foo" -- For ANSI, ... -- or SELECT 123 as `foo` -- For BigQuery, MySql, ...
Structure bundle¶
Do not specify else null in a case when statement (redundant).¶
This rule is sqlfluff fix compatible.
Name: structure.else_null
Aliases: L035
Groups: all, structure
Anti-pattern
select
case
when name like '%cat%' then 'meow'
when name like '%dog%' then 'woof'
else null
end from x
Best practice
Omit else null
select
case
when name like '%cat%' then 'meow'
when name like '%dog%' then 'woof'
end from x
Unnecessary CASE statement.¶
This rule is sqlfluff fix compatible.
Name: structure.simple_case
Aliases: L043
Groups: all, structure
Anti-pattern
CASE statement returns booleans.
select
    case
        when fab > 0 then true
        else false
    end as is_fab
from fancy_table

-- This rule can also simplify CASE statements
-- that aim to fill NULL values.

select
    case
        when fab is null then 0
        else fab
    end as fab_clean
from fancy_table

-- This also covers where the case statement
-- replaces NULL values with NULL values.

select
    case
        when fab is null then null
        else fab
    end as fab_clean
from fancy_table
Best practice
Reduce to WHEN condition within COALESCE function.
select
    coalesce(fab > 0, false) as is_fab
from fancy_table

-- To fill NULL values.

select
    coalesce(fab, 0) as fab_clean
from fancy_table

-- NULL filling NULL.

select
    fab as fab_clean
from fancy_table
Query defines a CTE (common-table expression) but does not use it.¶
Name: structure.unused_cte
Aliases: L045
Groups: all, core, structure
Anti-pattern
Defining a CTE that is not used by the query is harmless, but it means the code is unnecessary and could be removed.
WITH cte1 AS (
    SELECT a
    FROM t
),

cte2 AS (
    SELECT b
    FROM u
)

SELECT * FROM cte1
Best practice
Remove unused CTEs.
WITH cte1 AS (
    SELECT a
    FROM t
)

SELECT * FROM cte1
Nested CASE statement in ELSE clause could be flattened.¶
This rule is sqlfluff fix compatible.
Name: structure.nested_case
Aliases: L058
Groups: all, structure
Anti-pattern
In this example, the outer CASE's ELSE is an unnecessary, nested CASE.
SELECT
CASE
WHEN species = 'Cat' THEN 'Meow'
ELSE
CASE
WHEN species = 'Dog' THEN 'Woof'
END
END as sound FROM mytable
Best practice
Move the body of the inner CASE to the end of the outer one.
SELECT
CASE
WHEN species = 'Cat' THEN 'Meow'
WHEN species = 'Dog' THEN 'Woof'
END AS sound FROM mytable
Join/From clauses should not contain subqueries. Use CTEs instead.¶
By default this rule is configured to allow subqueries within FROM clauses but not within JOIN clauses. If you prefer a stricter lint then this is configurable.
This rule is sqlfluff fix compatible.
Name: structure.subquery
Aliases: L042
Groups: all, structure
Configuration
- forbid_subquery_in: Which clauses should be linted for subqueries? Must be one of ['join', 'from', 'both'].
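For example, to forbid subqueries in both FROM and JOIN clauses (rule-section naming assumed):

[sqlfluff:rules:structure.subquery]
forbid_subquery_in = both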
Anti-pattern
select
    a.x, a.y, b.z
from a
join (
    select x, z from b
) using(x)
Best practice
with c as (
    select x, z from b
)

select
    a.x, a.y, c.z
from a
join c using(x)
Select wildcards then simple targets before calculations and aggregates.¶
This rule is sqlfluff fix compatible.
Name: structure.column_order
Aliases: L034
Groups: all, structure
Anti-pattern
select
a,
*,
row_number() over (partition by id order by date) as y,
b from x
Best practice
Order select targets in ascending complexity
select
*,
a,
b,
row_number() over (partition by id order by date) as y from x
Prefer specifying join keys instead of using USING.¶
This rule is sqlfluff fix compatible.
Name: structure.using
Aliases: L032
Groups: all, structure
NOTE:
In fact, dbt removed it from their style guide in February 2022. However, some like the rule, so for now we will keep it in SQLFluff, but we encourage those who do not find value in the rule to turn it off.
Anti-pattern
SELECT
table_a.field_1,
table_b.field_2 FROM
table_a INNER JOIN table_b USING (id)
Best practice
Specify the keys directly
SELECT
table_a.field_1,
table_b.field_2 FROM
table_a INNER JOIN table_b
ON table_a.id = table_b.id
DISTINCT used with parentheses.¶
This rule is sqlfluff fix compatible.
Name: structure.distinct
Aliases: L015
Groups: all, structure, core
Anti-pattern
In this example, parentheses are not needed and confuse DISTINCT with a function. The parentheses can also be misleading about which columns are affected by the DISTINCT (all the columns!).
SELECT DISTINCT(a), b FROM foo
Best practice
Remove parentheses to be clear that the DISTINCT applies to both columns.
SELECT DISTINCT a, b FROM foo
Joins should list the table referenced earlier/later first.¶
This rule will break conditions from join clauses down into subconditions using the "and" and "or" binary operators.
Subconditions that are made up of a qualified column reference, a comparison operator and another qualified column reference are then evaluated to check whether they list the table that was referenced earlier - or later, depending on the preferred_first_table_in_join_clause configuration.
Subconditions that do not follow that pattern are ignored by this rule.
This rule is sqlfluff fix compatible.
Name: structure.join_condition_order
Groups: all, structure
Configuration
- preferred_first_table_in_join_clause: Which table to list first when joining two tables. Defaults to earlier. Must be one of ['earlier', 'later'].
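For example, to prefer listing the later-referenced table first instead of the default (rule-section naming assumed):

[sqlfluff:rules:structure.join_condition_order]
preferred_first_table_in_join_clause = later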
Anti-pattern
In this example, the tables that were referenced later are listed first and the preferred_first_table_in_join_clause configuration is set to earlier.
select
foo.a,
foo.b,
bar.c from foo left join bar
-- This subcondition does not list
-- the table referenced earlier first:
on bar.a = foo.a
-- Neither does this subcondition:
and bar.b = foo.b
Best practice
List the tables that were referenced earlier first.
select
foo.a,
foo.b,
bar.c from foo left join bar
on foo.a = bar.a
and foo.b = bar.b
TSQL bundle¶
SP_ prefix should not be used for user-defined stored procedures in T-SQL.¶
Name: tsql.sp_prefix
Aliases: L056
Groups: all, tsql
Anti-pattern
The SP_ prefix is used to identify system procedures and can adversely affect performance of the user-defined stored procedure. It can also break system procedures if there is a naming conflict.
CREATE PROCEDURE dbo.sp_pull_data
AS
SELECT
ID,
DataDate,
CaseOutput
FROM table1
Best practice
Use a different name for the stored procedure.
CREATE PROCEDURE dbo.pull_data
AS
SELECT
ID,
DataDate,
CaseOutput
FROM table1
-- Alternatively prefix with USP_ to
-- indicate a user-defined stored procedure.
CREATE PROCEDURE dbo.usp_pull_data
AS
SELECT
ID,
DataDate,
CaseOutput
FROM table1
CLI Reference¶
sqlfluff¶
SQLFluff is a modular SQL linter for humans.
sqlfluff [OPTIONS] COMMAND [ARGS]...
Options
- --version
- Show the version and exit.
Examples:
sqlfluff lint --dialect postgres .
sqlfluff lint --dialect mysql --rules ST05 my_query.sql
sqlfluff fix --dialect sqlite --rules LT10,ST05 src/queries
sqlfluff parse --dialect duckdb --templater jinja path/my_query.sql
dialects¶
Show the current dialects available.
sqlfluff dialects [OPTIONS]
Options
- -n, --nocolor
- No color - output will be without ANSI color codes.
- -v, --verbose
- Verbosity, how detailed should the output be. This is stackable, so -vv is more verbose than -v. For the most verbose option try -vvvv or -vvvvv.
- --version
- Show the version and exit.
fix¶
Fix SQL files.
PATH is the path to a sql file or directory to lint. This can be either a file ('path/to/file.sql'), a path ('directory/of/sql/files'), a single ('-') character to indicate reading from stdin or a dot/blank ('.'/' ') which will be interpreted like passing the current working directory as a path argument.
sqlfluff fix [OPTIONS] [PATHS]...
Options
- -n, --nocolor
- No color - output will be without ANSI color codes.
- -v, --verbose
- Verbosity, how detailed should the output be. This is stackable, so -vv is more verbose than -v. For the most verbose option try -vvvv or -vvvvv.
- --version
- Show the version and exit.
- --stdin-filename <stdin_filename>
- When using stdin as an input, load the configuration as if the contents of stdin was in a file in the listed location. This is useful for some editors that pass file contents from the editor that might not match the content on disk.
- --library-path <library_path>
- Override the library_path value from the [sqlfluff:templater:jinja] configuration value. Set this to 'none' to disable entirely. This overrides any values set by users in configuration files or inline directives.
- --disable-noqa-except <disable_noqa_except>
- Ignore all but the listed rules inline noqa comments.
- --disable-noqa
- Set this flag to ignore inline noqa comments.
- --logger <logger>
- Choose to limit the logging to one of the loggers.
- Options
- templater | lexer | parser | linter | rules | config
- --bench
- Set this flag to engage the benchmarking tool output.
- -i, --ignore <ignore>
- Ignore particular families of errors so that they don't cause a failed run. For example --ignore parsing would mean that any parsing errors are ignored and don't influence the success or fail of a run. --ignore behaves somewhat like noqa comments, except it applies globally. Multiple options are possible if comma separated: e.g. --ignore parsing,templating.
- --encoding <encoding>
- Specify encoding to use when reading and writing files. Defaults to autodetect.
- --ignore-local-config
- Ignore config files in default search path locations. This option allows the user to lint with the default config or can be used in conjunction with --config to only reference the custom config file.
- --config <extra_config_path>
- Include additional config file. By default the config is generated from the standard configuration files described in the documentation. This argument allows you to specify an additional configuration file that overrides the standard configuration files. N.B. cfg format is required.
- -e, --exclude-rules <exclude_rules>
- Exclude specific rules. For example specifying --exclude-rules LT01 will remove rule LT01 (Unnecessary trailing whitespace) from the set of considered rules. This could either be the allowlist, or the general set if there is no specific allowlist. Multiple rules can be specified with commas e.g. --exclude-rules LT01,LT02 will exclude violations of rule LT01 and rule LT02.
- -r, --rules <rules>
- Narrow the search to only specific rules. For example specifying --rules LT01 will only search for rule LT01 (Unnecessary trailing whitespace). Multiple rules can be specified with commas e.g. --rules LT01,LT02 will specify only looking for violations of rule LT01 and rule LT02.
- -t, --templater <templater>
- The templater to use (default=jinja)
- Options
- raw | jinja | python | placeholder
- -d, --dialect <dialect>
- The dialect of SQL to lint
- --warn-unused-ignores
- Warn about unneeded '-- noqa:' comments.
- --persist-timing <persist_timing>
- A filename to persist the timing information for a linting run to in csv format for external analysis. NOTE: This feature should be treated as beta, and the format of the csv file may change in future releases without warning.
- --disable-progress-bar
- Disables progress bars.
- -p, --processes <processes>
- The number of parallel processes to run. Positive numbers work as expected. Zero and negative numbers will work as number_of_cpus - number. e.g -1 means all cpus except one. 0 means all cpus.
- -f, --force
- [DEPRECATED - From 3.0 onward this is the default behaviour] Apply fixes will also be applied file by file, during the linting process, rather than waiting until all files are linted before fixing.
- --check
- Analyse all files and ask for confirmation before applying any fixes. Fixes will be applied all together at the end of the operation.
- -q, --quiet
- Reduces the amount of output to stdout to a minimal level. This is effectively the opposite of -v. NOTE: It will only take effect if -f/--force is also set.
- -x, --fixed-suffix <fixed_suffix>
- An optional suffix to add to fixed files.
- --FIX-EVEN-UNPARSABLE
- Enables fixing of files that have templating or parse errors. Note that the similar-sounding '--ignore' or 'noqa' features merely prevent errors from being displayed. For safety reasons, the 'fix' command will not make any fixes in files that have templating or parse errors unless '--FIX-EVEN-UNPARSABLE' is enabled on the command line or in the .sqlfluff config file.
- --show-lint-violations
- Show lint violations
Arguments
- PATHS
- Optional argument(s)
format¶
Autoformat SQL files.
This effectively force applies sqlfluff fix with a known subset of fairly stable rules. Enabled rules are ignored, but rule exclusions (via CLI) or config are still respected.
PATH is the path to a sql file or directory to lint. This can be either a file ('path/to/file.sql'), a path ('directory/of/sql/files'), a single ('-') character to indicate reading from stdin or a dot/blank ('.'/' ') which will be interpreted like passing the current working directory as a path argument.
sqlfluff format [OPTIONS] [PATHS]...
Options
- -n, --nocolor
- No color - output will be without ANSI color codes.
- -v, --verbose
- Verbosity, how detailed should the output be. This is stackable, so -vv is more verbose than -v. For the most verbose option try -vvvv or -vvvvv.
- --version
- Show the version and exit.
- --stdin-filename <stdin_filename>
- When using stdin as an input, load the configuration as if the contents of stdin was in a file in the listed location. This is useful for some editors that pass file contents from the editor that might not match the content on disk.
- --library-path <library_path>
- Override the library_path value from the [sqlfluff:templater:jinja] configuration value. Set this to 'none' to disable entirely. This overrides any values set by users in configuration files or inline directives.
- --disable-noqa-except <disable_noqa_except>
- Ignore all but the listed rules inline noqa comments.
- --disable-noqa
- Set this flag to ignore inline noqa comments.
- --logger <logger>
- Choose to limit the logging to one of the loggers.
- Options
- templater | lexer | parser | linter | rules | config
- --bench
- Set this flag to engage the benchmarking tool output.
- -i, --ignore <ignore>
- Ignore particular families of errors so that they don't cause a failed run. For example --ignore parsing would mean that any parsing errors are ignored and don't influence the success or fail of a run. --ignore behaves somewhat like noqa comments, except it applies globally. Multiple options are possible if comma separated: e.g. --ignore parsing,templating.
- --encoding <encoding>
- Specify encoding to use when reading and writing files. Defaults to autodetect.
- --ignore-local-config
- Ignore config files in default search path locations. This option allows the user to lint with the default config or can be used in conjunction with --config to only reference the custom config file.
- --config <extra_config_path>
- Include additional config file. By default the config is generated from the standard configuration files described in the documentation. This argument allows you to specify an additional configuration file that overrides the standard configuration files. N.B. cfg format is required.
- -e, --exclude-rules <exclude_rules>
- Exclude specific rules. For example specifying --exclude-rules LT01 will remove rule LT01 (Unnecessary trailing whitespace) from the set of considered rules. This could either be the allowlist, or the general set if there is no specific allowlist. Multiple rules can be specified with commas e.g. --exclude-rules LT01,LT02 will exclude violations of rule LT01 and rule LT02.
- -r, --rules <rules>
- Narrow the search to only specific rules. For example specifying --rules LT01 will only search for rule LT01 (Unnecessary trailing whitespace). Multiple rules can be specified with commas e.g. --rules LT01,LT02 will specify only looking for violations of rule LT01 and rule LT02.
- -t, --templater <templater>
- The templater to use (default=jinja)
- Options
- raw | jinja | python | placeholder
- -d, --dialect <dialect>
- The dialect of SQL to lint
- --warn-unused-ignores
- Warn about unneeded '-- noqa:' comments.
- --persist-timing <persist_timing>
- A filename to persist the timing information for a linting run to in csv format for external analysis. NOTE: This feature should be treated as beta, and the format of the csv file may change in future releases without warning.
- --disable-progress-bar
- Disables progress bars.
- -p, --processes <processes>
- The number of parallel processes to run. Positive numbers work as expected. Zero and negative numbers will work as number_of_cpus - number. e.g -1 means all cpus except one. 0 means all cpus.
- -x, --fixed-suffix <fixed_suffix>
- An optional suffix to add to fixed files.
Arguments
- PATHS
- Optional argument(s)
lint¶
Lint SQL files via passing a list of files or using stdin.
PATH is the path to a sql file or directory to lint. This can be either a file ('path/to/file.sql'), a path ('directory/of/sql/files'), a single ('-') character to indicate reading from stdin or a dot/blank ('.'/' ') which will be interpreted like passing the current working directory as a path argument.
Linting SQL files:
Linting a file via stdin (note the lone '-' character):
sqlfluff lint [OPTIONS] [PATHS]...
Options
- -n, --nocolor
- No color - output will be without ANSI color codes.
- -v, --verbose
- Verbosity, how detailed should the output be. This is stackable, so -vv is more verbose than -v. For the most verbose option try -vvvv or -vvvvv.
- --version
- Show the version and exit.
- --stdin-filename <stdin_filename>
- When using stdin as an input, load the configuration as if the contents of stdin was in a file in the listed location. This is useful for some editors that pass file contents from the editor that might not match the content on disk.
- --library-path <library_path>
- Override the library_path value from the [sqlfluff:templater:jinja] configuration value. Set this to 'none' to disable entirely. This overrides any values set by users in configuration files or inline directives.
- --disable-noqa-except <disable_noqa_except>
- Ignore all but the listed rules inline noqa comments.
- --disable-noqa
- Set this flag to ignore inline noqa comments.
- --logger <logger>
- Choose to limit the logging to one of the loggers.
- Options
- templater | lexer | parser | linter | rules | config
- --bench
- Set this flag to engage the benchmarking tool output.
- -i, --ignore <ignore>
- Ignore particular families of errors so that they don't cause a failed run. For example --ignore parsing would mean that any parsing errors are ignored and don't influence the success or fail of a run. --ignore behaves somewhat like noqa comments, except it applies globally. Multiple options are possible if comma separated: e.g. --ignore parsing,templating.
- --encoding <encoding>
- Specify encoding to use when reading and writing files. Defaults to autodetect.
- --ignore-local-config
- Ignore config files in default search path locations. This option allows the user to lint with the default config or can be used in conjunction with --config to only reference the custom config file.
- --config <extra_config_path>
- Include additional config file. By default the config is generated from the standard configuration files described in the documentation. This argument allows you to specify an additional configuration file that overrides the standard configuration files. N.B. cfg format is required.
- -e, --exclude-rules <exclude_rules>
- Exclude specific rules. For example specifying --exclude-rules LT01 will remove rule LT01 (Unnecessary trailing whitespace) from the set of considered rules. This could either be the allowlist, or the general set if there is no specific allowlist. Multiple rules can be specified with commas e.g. --exclude-rules LT01,LT02 will exclude violations of rule LT01 and rule LT02.
- -r, --rules <rules>
- Narrow the search to only specific rules. For example specifying --rules LT01 will only search for rule LT01 (Unnecessary trailing whitespace). Multiple rules can be specified with commas e.g. --rules LT01,LT02 will specify only looking for violations of rule LT01 and rule LT02.
- -t, --templater <templater>
- The templater to use (default=jinja)
- Options
- raw | jinja | python | placeholder
- -d, --dialect <dialect>
- The dialect of SQL to lint
- --warn-unused-ignores
- Warn about unneeded '-- noqa:' comments.
- --persist-timing <persist_timing>
- A filename to persist the timing information for a linting run to in csv format for external analysis. NOTE: This feature should be treated as beta, and the format of the csv file may change in future releases without warning.
- --disable-progress-bar
- Disables progress bars.
- -p, --processes <processes>
- The number of parallel processes to run. Positive numbers work as expected. Zero and negative numbers will work as number_of_cpus - number. e.g. -1 means all cpus except one. 0 means all cpus.
- -f, --format <format>
- What format to return the lint result in (default=human).
- Options
- human | json | yaml | github-annotation | github-annotation-native | none
- --write-output <write_output>
- Optionally provide a filename to write the results to, mostly used in tandem with --format. NB: Setting an output file re-enables normal stdout logging.
- --annotation-level <annotation_level>
- When format is set to "github-annotation" or "github-annotation-native", default annotation level (default="warning"). "failure" and "error" are equivalent. Any rules configured only as warnings will always come through with type "notice" regardless of this option.
- Options
- notice | warning | failure | error
- --nofail
- If set, the exit code will always be zero, regardless of violations found. This is potentially useful during rollout.
- --disregard-sqlfluffignores
- Perform the operation regardless of .sqlfluffignore configurations
Arguments
- PATHS
- Optional argument(s)
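As an illustrative combination of the options above (the path, dialect and rule selection are placeholder choices, not defaults):
    $ sqlfluff lint models/ --dialect bigquery --rules LT01,LT02 --format json --write-output lint_results.json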
parse¶
Parse SQL files and just spit out the result.
PATH is the path to a sql file or directory to lint. This can be either a file ('path/to/file.sql'), a path ('directory/of/sql/files'), a single ('-') character to indicate reading from stdin or a dot/blank ('.'/' ') which will be interpreted like passing the current working directory as a path argument.
sqlfluff parse [OPTIONS] PATH
Options
- -n, --nocolor
- No color - output will be without ANSI color codes.
- -v, --verbose
- Verbosity, how detailed should the output be. This is stackable, so -vv is more verbose than -v. For the most verbose option try -vvvv or -vvvvv.
- --version
- Show the version and exit.
- --stdin-filename <stdin_filename>
- When using stdin as an input, load the configuration as if the contents of stdin was in a file in the listed location. This is useful for some editors that pass file contents from the editor that might not match the content on disk.
- --library-path <library_path>
- Override the library_path value from the [sqlfluff:templater:jinja] configuration value. Set this to 'none' to disable entirely. This overrides any values set by users in configuration files or inline directives.
- --disable-noqa-except <disable_noqa_except>
- Ignore all but the listed rules inline noqa comments.
- --disable-noqa
- Set this flag to ignore inline noqa comments.
- --logger <logger>
- Choose to limit the logging to one of the loggers.
- Options
- templater | lexer | parser | linter | rules | config
- --bench
- Set this flag to engage the benchmarking tool output.
- -i, --ignore <ignore>
- Ignore particular families of errors so that they don't cause a failed run. For example --ignore parsing would mean that any parsing errors are ignored and don't influence the success or fail of a run. --ignore behaves somewhat like noqa comments, except it applies globally. Multiple options are possible if comma separated: e.g. --ignore parsing,templating.
- --encoding <encoding>
- Specify encoding to use when reading and writing files. Defaults to autodetect.
- --ignore-local-config
- Ignore config files in default search path locations. This option allows the user to lint with the default config or can be used in conjunction with --config to only reference the custom config file.
- --config <extra_config_path>
- Include additional config file. By default the config is generated from the standard configuration files described in the documentation. This argument allows you to specify an additional configuration file that overrides the standard configuration files. N.B. cfg format is required.
- -e, --exclude-rules <exclude_rules>
- Exclude specific rules. For example specifying --exclude-rules LT01 will remove rule LT01 (Unnecessary trailing whitespace) from the set of considered rules. This could either be the allowlist, or the general set if there is no specific allowlist. Multiple rules can be specified with commas e.g. --exclude-rules LT01,LT02 will exclude violations of rule LT01 and rule LT02.
- -r, --rules <rules>
- Narrow the search to only specific rules. For example specifying --rules LT01 will only search for rule LT01 (Unnecessary trailing whitespace). Multiple rules can be specified with commas e.g. --rules LT01,LT02 will specify only looking for violations of rule LT01 and rule LT02.
- -t, --templater <templater>
- The templater to use (default=jinja)
- Options
- raw | jinja | python | placeholder
- -d, --dialect <dialect>
- The dialect of SQL to lint
- -c, --code-only
- Output only the code elements of the parse tree.
- -m, --include-meta
- Include meta segments (indents, dedents and placeholders) in the output. This only applies when outputting json or yaml.
- --write-output <write_output>
- Optionally provide a filename to write the results to, mostly used in tandem with --format. NB: Setting an output file re-enables normal stdout logging.
- --parse-statistics
- Set this flag to enable detailed debugging readout on the use of terminators in the parser.
- --nofail
- If set, the exit code will always be zero, regardless of violations found. This is potentially useful during rollout.
Arguments
- PATH
- Required argument
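As illustrative invocations (the file name and dialect are placeholders):
    $ sqlfluff parse path/to/file.sql --dialect ansi --code-only
    $ echo 'SELECT 1' | sqlfluff parse - --dialect ansi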
render¶
Render SQL files and just spit out the result.
PATH is the path to a sql file. This should be either a single file ('path/to/file.sql') or a single ('-') character to indicate reading from stdin.
sqlfluff render [OPTIONS] PATH
Options
- -n, --nocolor
- No color - output will be without ANSI color codes.
- -v, --verbose
- Verbosity, how detailed should the output be. This is stackable, so -vv is more verbose than -v. For the most verbose option try -vvvv or -vvvvv.
- --version
- Show the version and exit.
- --stdin-filename <stdin_filename>
- When using stdin as an input, load the configuration as if the contents of stdin was in a file in the listed location. This is useful for some editors that pass file contents from the editor that might not match the content on disk.
- --library-path <library_path>
- Override the library_path value from the [sqlfluff:templater:jinja] configuration value. Set this to 'none' to disable entirely. This overrides any values set by users in configuration files or inline directives.
- --disable-noqa-except <disable_noqa_except>
- Ignore all but the listed rules inline noqa comments.
- --disable-noqa
- Set this flag to ignore inline noqa comments.
- --logger <logger>
- Choose to limit the logging to one of the loggers.
- Options
- templater | lexer | parser | linter | rules | config
- --bench
- Set this flag to engage the benchmarking tool output.
- -i, --ignore <ignore>
- Ignore particular families of errors so that they don't cause a failed run. For example --ignore parsing would mean that any parsing errors are ignored and don't influence the success or fail of a run. --ignore behaves somewhat like noqa comments, except it applies globally. Multiple options are possible if comma separated: e.g. --ignore parsing,templating.
- --encoding <encoding>
- Specify encoding to use when reading and writing files. Defaults to autodetect.
- --ignore-local-config
- Ignore config files in default search path locations. This option allows the user to lint with the default config or can be used in conjunction with --config to only reference the custom config file.
- --config <extra_config_path>
- Include additional config file. By default the config is generated from the standard configuration files described in the documentation. This argument allows you to specify an additional configuration file that overrides the standard configuration files. N.B. cfg format is required.
- -e, --exclude-rules <exclude_rules>
- Exclude specific rules. For example specifying --exclude-rules LT01 will remove rule LT01 (Unnecessary trailing whitespace) from the set of considered rules. This could either be the allowlist, or the general set if there is no specific allowlist. Multiple rules can be specified with commas e.g. --exclude-rules LT01,LT02 will exclude violations of rule LT01 and rule LT02.
- -r, --rules <rules>
- Narrow the search to only specific rules. For example specifying --rules LT01 will only search for rule LT01 (Unnecessary trailing whitespace). Multiple rules can be specified with commas e.g. --rules LT01,LT02 will specify only looking for violations of rule LT01 and rule LT02.
- -t, --templater <templater>
- The templater to use (default=jinja)
- Options
- raw | jinja | python | placeholder
- -d, --dialect <dialect>
- The dialect of SQL to lint
Arguments
- PATH
- Required argument
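As illustrative invocations (the file name is a placeholder; jinja is already the default templater):
    $ sqlfluff render path/to/model.sql --templater jinja --dialect ansi
    $ echo 'SELECT {{ "1" }}' | sqlfluff render - --dialect ansi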
rules¶
Show the current rules in use.
sqlfluff rules [OPTIONS]
Options
- -n, --nocolor
- No color - output will be without ANSI color codes.
- -v, --verbose
- Verbosity, how detailed should the output be. This is stackable, so -vv is more verbose than -v. For the most verbose option try -vvvv or -vvvvv.
- --version
- Show the version and exit.
version¶
Show the version of sqlfluff.
sqlfluff version [OPTIONS]
Options
- -n, --nocolor
- No color - output will be without ANSI color codes.
- -v, --verbose
- Verbosity, how detailed should the output be. This is stackable, so -vv is more verbose than -v. For the most verbose option try -vvvv or -vvvvv.
- --version
- Show the version and exit.
Python API¶
SQLFluff exposes a public api for other python applications to use. A basic example of this usage is given here, with the documentation for each of the methods below.
"""This is an example of how to use the simple sqlfluff api.""" from typing import Any, Dict, Iterator, List, Union import sqlfluff # -------- LINTING ---------- my_bad_query = "SeLEct *, 1, blah as fOO from mySchema.myTable" # Lint the given string and return an array of violations in JSON representation. lint_result = sqlfluff.lint(my_bad_query, dialect="bigquery") # lint_result = # [ # { # "code": "CP01", # "line_no": 1, # "line_pos": 1, # "description": "Keywords must be consistently upper case.", # } # ... # ] # -------- FIXING ---------- # Fix the given string and get a string back which has been fixed. fix_result_1 = sqlfluff.fix(my_bad_query, dialect="bigquery") # fix_result_1 = 'SELECT *, 1, blah AS foo FROM myschema.mytable\n' # We can also fix just specific rules. fix_result_2 = sqlfluff.fix(my_bad_query, rules=["CP01"]) # fix_result_2 = 'SELECT *, 1, blah AS fOO FROM mySchema.myTable' # Or a subset of rules... fix_result_3 = sqlfluff.fix(my_bad_query, rules=["CP01", "CP02"]) # fix_result_3 = 'SELECT *, 1, blah AS fOO FROM myschema.mytable' # -------- PARSING ---------- # Parse the given string and return a JSON representation of the parsed tree. parse_result = sqlfluff.parse(my_bad_query) # parse_result = {'file': {'statement': {...}, 'newline': '\n'}} # This JSON structure can then be parsed as required. # An example usage is shown below: def get_json_segment(
parse_result: Dict[str, Any], segment_type: str ) -> Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]:
"""Recursively search JSON parse result for specified segment type.
Args:
parse_result (Dict[str, Any]): JSON parse result from `sqlfluff.fix`.
segment_type (str): The segment type to search for.
Yields:
Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]:
Retrieves children of specified segment type as either a string for a raw
segment or as JSON or an array of JSON for non-raw segments.
"""
for k, v in parse_result.items():
if k == segment_type:
yield v
elif isinstance(v, dict):
yield from get_json_segment(v, segment_type)
elif isinstance(v, list):
for s in v:
yield from get_json_segment(s, segment_type) # e.g. Retrieve array of JSON for table references. table_references = list(get_json_segment(parse_result, "table_reference")) print(table_references) # [[{'identifier': 'mySchema'}, {'dot': '.'}, {'identifier': 'myTable'}]] # Retrieve raw table name from last identifier in the table reference. for table_reference in table_references:
table_name = list(get_json_segment(parse_result, "naked_identifier"))[-1]
print(f"table_name: {table_name}") # table_name: myTable
Simple API commands¶
Sqlfluff is a SQL linter for humans.
- fix(sql: str, dialect: str = 'ansi', rules: List[str] | None = None, exclude_rules: List[str] | None = None, config: FluffConfig | None = None, config_path: str | None = None, fix_even_unparsable: bool | None = None) -> str
- Fix a SQL string.
- sql (str) -- The SQL to be fixed.
- dialect (str, optional) -- A reference to the dialect of the SQL to be fixed. Defaults to ansi.
- rules (Optional[List[str]], optional) -- A subset of rule references to fix for. Defaults to None.
- exclude_rules (Optional[List[str]], optional) -- A subset of rule references to avoid fixing for. Defaults to None.
- config (Optional[FluffConfig], optional) -- A configuration object to use for the operation. Defaults to None.
- config_path (Optional[str], optional) -- A path to a .sqlfluff config, which is only used if a config is not already provided. Defaults to None.
- fix_even_unparsable (bool, optional) -- Optional override for the corresponding SQLFluff configuration value.
- Returns
- str for the fixed SQL if possible.
- lint(sql: str, dialect: str = 'ansi', rules: List[str] | None = None, exclude_rules: List[str] | None = None, config: FluffConfig | None = None, config_path: str | None = None) -> List[Dict[str, Any]]
- Lint a SQL string.
- sql (str) -- The SQL to be linted.
- dialect (str, optional) -- A reference to the dialect of the SQL to be linted. Defaults to ansi.
- rules (Optional[List[str]], optional) -- A list of rule references to lint for. Defaults to None.
- exclude_rules (Optional[List[str]], optional) -- A list of rule references to avoid linting for. Defaults to None.
- config (Optional[FluffConfig], optional) -- A configuration object to use for the operation. Defaults to None.
- config_path (Optional[str], optional) -- A path to a .sqlfluff config, which is only used if a config is not already provided. Defaults to None.
- Returns
- List[Dict[str, Any]] for each violation found.
- parse(sql: str, dialect: str = 'ansi', config: FluffConfig | None = None, config_path: str | None = None) -> Dict[str, Any]
- Parse a SQL string.
- sql (str) -- The SQL to be parsed.
- dialect (str, optional) -- A reference to the dialect of the SQL to be parsed. Defaults to ansi.
- config (Optional[FluffConfig], optional) -- A configuration object to use for the operation. Defaults to None.
- config_path (Optional[str], optional) -- A path to a .sqlfluff config, which is only used if a config is not already provided. Defaults to None.
- Returns
- Dict[str, Any] JSON containing the parsed structure.
Advanced API usage¶
The simple API presents only a fraction of the functionality present within the core SQLFluff library. For more advanced use cases, users can import the Linter() and FluffConfig() classes from sqlfluff.core. As of version 0.4.0 this is considered experimental only, as the internals may change without warning in any future release. If you come to rely on the internals of SQLFluff, please post an issue on GitHub to share what you're up to. This will help shape a more reliable, tidy and well documented public API for use.
Configuring SQLFluff¶
You can use the FluffConfig() class to configure SQLFluff behaviour.
"""This is an example of providing config overrides.""" from sqlfluff.core import FluffConfig, Linter sql = "SELECT 1\n" config = FluffConfig(
overrides={
"dialect": "snowflake",
# NOTE: We explicitly set the string "none" here rather
# than a None literal so that it overrides any config
# set by any config files in the path.
"library_path": "none",
} ) linted_file = Linter(config=config).lint_string(sql) assert linted_file.get_violations() == []
Instances of FluffConfig() can be created manually, or parsed.
"""An example to show a few ways of configuring the API.""" import sqlfluff from sqlfluff.core import FluffConfig, Linter # ####################################### # The simple API can be configured in three ways. # 1. Limited keyword arguments sqlfluff.fix("SELECT 1", dialect="bigquery") # 2. Providing the path to a config file sqlfluff.fix("SELECT 1", config_path="test/fixtures/.sqlfluff") # 3. Providing a preconfigured FluffConfig object. # NOTE: This is the way of configuring SQLFluff which will give the most control. # 3a. FluffConfig objects can be created directly from a dictionary of values. config = FluffConfig(configs={"core": {"dialect": "bigquery"}}) # 3b. FluffConfig objects can be created from a config file in a string. config = FluffConfig.from_string("[sqlfluff]\ndialect=bigquery\n") # 3c. FluffConfig objects can be created from a config file in multiple strings # to simulate the effect of multiple nested config strings. config = FluffConfig.from_strings(
# NOTE: Given these two strings, the resulting dialect would be "mysql"
# as the later files take precedence.
"[sqlfluff]\ndialect=bigquery\n",
"[sqlfluff]\ndialect=mysql\n", ) # 3d. FluffConfig objects can be created from a path containing a config file. config = FluffConfig.from_path("test/fixtures/") # 3e. FluffConfig objects can be from keyword arguments config = FluffConfig.from_kwargs(dialect="bigquery", rules=["LT01"]) # The FluffConfig is then provided via a config argument. sqlfluff.fix("SELECT 1", config=config) # ####################################### # The core API is always configured using a FluffConfig object. # When instantiating a Linter (or Parser), a FluffConfig must be provided # on instantiation. See above for details on how to create a FluffConfig. linter = Linter(config=config) # The provided config will then be used in any operations. lint_result = linter.lint_string("SELECT 1", fix=True) fixed_string = lint_result.fix_string() # NOTE: The "True" element shows that fixing was a success. assert fixed_string == ("SELECT 1", True)
Supported dialects and rules are available through list_dialects() and list_rules().
"""This is an example of how get basic options from sqlfluff.""" import sqlfluff # -------- DIALECTS ---------- dialects = sqlfluff.list_dialects() # dialects = [DialectTuple(label='ansi', name='ansi', inherits_from='nothing'), ...] dialect_names = [dialect.label for dialect in dialects] # dialect_names = ["ansi", "snowflake", ...] # -------- RULES ---------- rules = sqlfluff.list_rules() # rules = [ # RuleTuple( # code='Example_LT01', # description='ORDER BY on these columns is forbidden!' # ), # ... # ] rule_codes = [rule.code for rule in rules] # rule_codes = ["LT01", "LT02", ...]
Advanced API reference¶
The core elements of sqlfluff.
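As a minimal sketch of how these core classes fit together (the query string is an arbitrary example and error handling is omitted):

from sqlfluff.core import FluffConfig, Lexer, Linter, Parser

config = FluffConfig.from_kwargs(dialect="ansi")

# Lex the raw SQL into segments, then parse those segments into a tree.
tokens, lexing_errors = Lexer(config=config).lex("SELECT 1 FROM tbl\n")
tree = Parser(config=config).parse(tokens)

# Lint the parsed tree and inspect any violations found.
violations = Linter(config=config).lint(tree)
print(violations)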
- class FluffConfig(configs: ConfigMappingType | None = None, extra_config_path: str | None = None, ignore_local_config: bool = False, overrides: ConfigMappingType | None = None, plugin_manager: pluggy.PluginManager | None = None, require_dialect: bool = True)
- The class that actually gets passed around as a config object.
- copy() -> FluffConfig
- Returns a copy of the FluffConfig.
This creates a shallow copy of most of the config, but with a deep copy of the _configs dictionary.
- diff_to(other: FluffConfig) -> Dict[str, Any]
- Compare this config to another.
- Parameters
- other (FluffConfig) -- Another config object to compare against. We will return keys from this object that are not in other or are different to those in other.
- Returns
- A filtered dict of items in this config that are not in the other or are different to the other.
- classmethod from_kwargs(config: FluffConfig | None = None, dialect: str | None = None, rules: List[str] | None = None, exclude_rules: List[str] | None = None, require_dialect: bool = True) -> FluffConfig
- Instantiate a config from either an existing config or kwargs.
This is a convenience method for the ways that the public classes like Linter(), Parser() and Lexer() can be instantiated with a FluffConfig or with the convenience kwargs: dialect & rules.
- classmethod from_root(extra_config_path: str | None = None, ignore_local_config: bool = False, overrides: Dict[str, Any] | None = None, **kw: Any) -> FluffConfig
- Loads a config object just based on the root directory.
- classmethod from_strings(*config_strings: str, extra_config_path: str | None = None, ignore_local_config: bool = False, overrides: Dict[str, Any] | None = None, plugin_manager: PluginManager | None = None) -> FluffConfig
- Loads a config object given a series of nested config strings.
Config strings are incorporated from first to last, treating the first element as the "root" config, and then later config strings will take precedence over any earlier values.
- get(val: str, section: str | Iterable[str] = 'core', default: Any = None) -> Any
- Get a particular value from the config.
- get_section(section: str | Iterable[str]) -> Any
- Return a whole section of config as a dict.
If the element found at the address is a value and not a section, it is still returned and so this can be used as a more advanced form of the basic get method.
- Parameters
- section -- An iterable or string. If it's a string we load that root section. If it's an iterable of strings, then we treat it as a path within the dictionary structure.
- get_templater(**kwargs: Any) -> RawTemplater
- Instantiate the configured templater.
- get_templater_class() -> Type['RawTemplater']
- Get the configured templater class.
NOTE: This is mostly useful to call directly when rules want to determine the type of a templater (in particular to work out if it's a derivative of the jinja templater), without needing to instantiate a full templater. Instantiated templaters don't pickle well, so aren't automatically passed around between threads/processes.
- iter_vals(cfg: Dict[str, Any] | None = None) -> Iterable[Tuple[Any, ...]]
- Return an iterable of tuples representing keys.
We show values before dicts, the tuple contains an indent value to know what level of the dict we're in. Dict labels will be returned as a blank value before their content.
- make_child_from_path(path: str) -> FluffConfig
- Make a child config at a path but pass on overrides and extra_config_path.
- process_inline_config(config_line: str, fname: str) -> None
- Process an inline config command and update self.
- process_raw_file_for_config(raw_str: str, fname: str) -> None
- Process a full raw file for inline config and update self.
- set_value(config_path: Iterable[str], val: Any) -> None
- Set a value at a given path.
- verify_dialect_specified() -> None
- Check if the config specifies a dialect, raising an error if not.
- class Lexer(config: FluffConfig | None = None, last_resort_lexer: StringLexer | None = None, dialect: str | None = None)
- The Lexer class actually does the lexing step.
- elements_to_segments(elements: List[TemplateElement], templated_file: TemplatedFile) -> Tuple[RawSegment, ...]
- Convert a tuple of lexed elements into a tuple of segments.
- lex(raw: str | TemplatedFile) -> Tuple[Tuple[BaseSegment, ...], List[SQLLexError]]
- Take a string or TemplatedFile and return segments.
If we fail to match the whole string, then we must have found something that we cannot lex. If that happens we should package it up as unlexable and keep track of the exceptions.
- static lex_match(forward_string: str, lexer_matchers: List[StringLexer]) -> LexMatch
- Iteratively match strings using the selection of submatchers.
- static map_template_slices(elements: List[LexedElement], template: TemplatedFile) -> List[TemplateElement]
- Create a tuple of TemplateElement from a tuple of LexedElement.
This adds slices in the templated file to the original lexed elements. We'll need this to work out the position in the source file.
- static violations_from_segments(segments: Tuple[RawSegment, ...]) -> List[SQLLexError]
- Generate any lexing errors for any unlexables.
- class Linter(config: FluffConfig | None = None, formatter: Any = None, dialect: str | None = None, rules: List[str] | None = None, user_rules: List[Type[BaseRule]] | None = None, exclude_rules: List[str] | None = None)
- The interface class to interact with the linter.
- classmethod allowed_rule_ref_map(reference_map: Dict[str, Set[str]], disable_noqa_except: str | None) -> Dict[str, Set[str]]
- Generate a noqa rule reference map.
- fix(tree: BaseSegment, config: FluffConfig | None = None, fname: str | None = None, templated_file: TemplatedFile | None = None) -> Tuple[BaseSegment, List[SQLBaseError]]
- Return the fixed tree and violations from lintfix when we're fixing.
- get_rulepack(config: FluffConfig | None = None) -> RulePack
- Get hold of a set of rules.
- lint(tree: BaseSegment, config: FluffConfig | None = None, fname: str | None = None, templated_file: TemplatedFile | None = None) -> List[SQLBaseError]
- Return just the violations from lintfix when we're only linting.
- classmethod lint_fix_parsed(tree: BaseSegment, config: FluffConfig, rule_pack: RulePack, fix: bool = False, fname: str | None = None, templated_file: TemplatedFile | None = None, formatter: Any = None) -> Tuple[BaseSegment, List[SQLBaseError], IgnoreMask | None, List[Tuple[str, str, float]]]
- Lint and optionally fix a tree object.
- classmethod lint_parsed(parsed: ParsedString, rule_pack: RulePack, fix: bool = False, formatter: Any = None, encoding: str = 'utf8') -> LintedFile
- Lint a ParsedString and return a LintedFile.
- classmethod lint_rendered(rendered: RenderedFile, rule_pack: RulePack, fix: bool = False, formatter: Any = None) -> LintedFile
- Take a RenderedFile and return a LintedFile.
- Returns
- an object representing that linted file.
- Return type
- LintedFile
- static load_raw_file_and_config(fname: str, root_config: FluffConfig) -> Tuple[str, FluffConfig, str]
- Load a raw file and the associated config.
- parse_path(path: str, parse_statistics: bool = False) -> Iterator[ParsedString]
- Parse a path of sql files.
NB: This is a generator which will yield the result of each file within the path iteratively.
- static remove_templated_errors(linting_errors: List[SQLBaseError]) -> List[SQLBaseError]
- Filter a list of lint errors, removing those from the templated slices.
- render_file(fname: str, root_config: FluffConfig) -> RenderedFile
- Load and render a file with relevant config.
- rule_tuples() -> List[RuleTuple]
- A simple pass through to access the rule tuples of the rule set.
- class Parser(config: FluffConfig | None = None, dialect: str | None = None)
- Instantiates parsed queries from a sequence of lexed raw segments.
- parse(segments: Sequence[BaseSegment], fname: str | None = None, parse_statistics: bool = False) -> BaseSegment | None
- Parse a series of lexed tokens using the current dialect.
Internal API¶
Anything within this section should only be necessary for people who are developing plugins or rules to interact with SQLFluff on a deeper level or people who've decided to help the project by contributing to SQLFluff.
As these docs are some of the least commonly used, you may find that not all modules are documented directly here, and so you may find it instructive to read this together with docstrings and comments directly within the SQLFluff codebase on GitHub.
sqlfluff.core.rules.base: Base Rule Classes¶
Implements the base rule class.
Rules crawl through the trees returned by the parser and evaluate particular rules.
The intent is that it should be possible for the rules to be expressed as simply as possible, with as much of the complexity abstracted away.
The evaluation function should take enough arguments that it can evaluate the position of the given segment in relation to its neighbors, and that the segment which finally "triggers" the error, should be the one that would be corrected OR if the rule relates to something that is missing, then it should flag on the segment FOLLOWING, the place that the desired element is missing.
- class BaseRule(code: str, description: str, **kwargs: Any)
- The base class for a rule.
- code (str) -- The identifier for this rule, used in inclusion or exclusion.
- description (str) -- A human readable description of what this rule does. It will be displayed when any violations are found.
- crawl(tree: BaseSegment, dialect: Dialect, fix: bool, templated_file: TemplatedFile | None, ignore_mask: IgnoreMask | None, fname: str | None, config: FluffConfig) -> Tuple[List[SQLLintError], Tuple[RawSegment, ...], List[LintFix], Dict[str, Any] | None]
- Run the rule on a given tree.
- Returns
- A tuple of (vs, raw_stack, fixes, memory)
- static discard_unsafe_fixes(lint_result: LintResult, templated_file: TemplatedFile | None) -> None
- Remove (discard) LintResult fixes if they are "unsafe".
By removing its fixes, a LintResult will still be reported, but it will be treated as unfixable.
- static filter_meta(segments: Sequence[BaseSegment], keep_meta: bool = False) -> Tuple[BaseSegment, ...]
- Filter the segments to non-meta.
Or optionally the opposite if keep_meta is True.
- classmethod get_config_ref() -> str
- Return the config lookup ref for this rule.
If a name is defined, it's the name - otherwise the code.
The name is a much more understandable reference and so makes config files more readable. For backward compatibility however we also support the rule code for those without names.
- classmethod get_parent_of(segment: BaseSegment, root_segment: BaseSegment) -> BaseSegment | None
- Return the segment immediately containing segment.
NB: This is recursive.
- segment -- The segment to look for.
- root_segment -- Some known parent of the segment we're looking for (although likely not the direct parent in question).
- static split_comma_separated_string(raw: str | List[str]) -> List[str]
- Converts comma separated string to List, stripping whitespace.
- class LintResult(anchor: BaseSegment | None = None, fixes: List[LintFix] | None = None, memory: Any | None = None, description: str | None = None, source: str | None = None)
- A class to hold the results of a rule evaluation.
- anchor (BaseSegment, optional) -- A segment which represents the position of the problem. NB: Each fix will also hold its own reference to position, so this position is mostly for alerting the user to where the problem is.
- fixes (list of LintFix, optional) -- An array of any fixes which would correct this issue. If not present then it's assumed that this issue will have to be fixed manually.
- memory (dict, optional) -- An object which stores any working memory for the rule. The memory returned in any LintResult will be passed as an input to the next segment to be crawled.
- description (str, optional) -- A description of the problem identified as part of this result. This will override the description of the rule as what gets reported to the user with the problem if provided.
- source (str, optional) -- A string identifier for what generated the result. Within larger libraries like reflow this can be useful for tracking where a result came from.
- to_linting_error(rule: BaseRule) -> SQLLintError | None
- Convert a linting result to a SQLLintError if appropriate.
- class RuleLoggingAdapter(logger, extra=None)
- A LoggingAdapter for rules which adds the code of the rule to it.
- process(msg: str, kwargs: Any) -> Tuple[str, Any]
- Add the code element to the logging message before emit.
- class RuleMetaclass(name: str, bases: List[BaseRule], class_dict: Dict[str, Any])
- The metaclass for rules.
This metaclass provides auto-enrichment of the rule docstring so that examples, groups, aliases and names are added.
The reason we enrich the docstring is so that it can be picked up by autodoc and all be displayed in the sqlfluff docs.
- class RulePack(rules: List[BaseRule], reference_map: Dict[str, Set[str]])
- A bundle of rules to be applied.
This contains a set of rules, post filtering but also contains the mapping required to interpret any noqa messages found in files.
The reason for this object is that rules are filtered and instantiated into this pack in the main process when running in multi-processing mode so that user defined rules can be used without reference issues.
- reference_map
- A mapping of rule references to the codes they refer to, e.g. {"my_ref": {"LT01", "LT02"}}. The references (i.e. the keys) may be codes, groups, aliases or names. The values of the mapping are sets of rule codes only. This object acts as a lookup to be able to translate selectors (which may contain diverse references) into a consolidated list of rule codes. This mapping contains the full set of rules, rather than just the filtered set present in the rules attribute.
- Type
- dict
- codes() -> Iterator[str]
- Returns an iterator through the codes contained in the pack.
- class RuleSet(name: str, config_info: Dict[str, Dict[str, Any]])
- Class to define a ruleset.
A rule set is instantiated on module load, but the references to each of its classes are instantiated at runtime. This means that configuration values can be passed to those rules live and be responsive to any changes in configuration from the path that the file is in.
Rules should be fetched using the get_rulelist() command which also handles any filtering (i.e. allowlisting and denylisting).
New rules should be added to the instance of this class using the register() decorator. That decorator registers the class, but also performs basic type and name-convention checks.
The code for the rule will be parsed from the name, the description from the docstring. The eval function is assumed to be overridden by the subclass, and the parent class raises an error on this function if not overridden.
- copy() -> RuleSet
- Return a copy of self with a separate register.
- get_rulepack(config: FluffConfig) -> RulePack
- Use the config to return the appropriate rules.
We use the config both for allowlisting and denylisting, but also for configuring the rules given the given config.
- register(cls: Type[BaseRule], plugin: PluginSpec | None = None) -> Type[BaseRule]
- Decorate a class with this to add it to the ruleset.
@myruleset.register
class Rule_LT01(BaseRule):
    "Description of rule."

    def eval(self, **kwargs):
        return LintResult()
We expect that rules are defined as classes with the name Rule_XXXX where XXXX is of the form LNNN, where L is a letter (literally L for linting by default) and N is a three digit number.
If this receives classes by any other name, then it will raise a ValueError.
- rule_reference_map() -> Dict[str, Set[str]]
- Generate a rule reference map for looking up rules.
Generate the master reference map. The priority order is: codes > names > groups > aliases (i.e. if there's a collision between a name and an alias - we assume the alias is wrong)
sqlfluff.utils.functional: Functional Traversal API¶
These newer modules provide a higher-level API for rules working with segments and slices. Rules that need to navigate or search the parse tree may benefit from using these. Eventually, the plan is for all rules to use these modules. As of December 30, 2021, 17+ rules use these modules.
The modules listed below are submodules of sqlfluff.utils.functional.
segments Module¶
Surrogate class for working with Segment collections.
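A rough sketch of how this surrogate class can be used outside of a rule (the query and segment type are arbitrary examples, and error handling is omitted):

from sqlfluff.core import FluffConfig, Lexer, Parser
from sqlfluff.utils.functional import Segments

config = FluffConfig.from_kwargs(dialect="ansi")
tokens, _ = Lexer(config=config).lex("SELECT a, b FROM tbl\n")
tree = Parser(config=config).parse(tokens)

# Wrap the parse tree in the surrogate class and crawl for column references.
segments = Segments(tree)
column_refs = segments.recursive_crawl("column_reference")
print([seg.raw for seg in column_refs])  # e.g. ['a', 'b']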
- class Segments(*segments: BaseSegment, templated_file: TemplatedFile | None = None)
- Encapsulates a sequence of one or more BaseSegments.
The segments may or may not be contiguous in a parse tree. Provides useful operations on a sequence of segments to simplify rule creation.
- all(predicate: Callable[[BaseSegment], bool] | None = None) -> bool
- Do all the segments match?
- any(predicate: Callable[[BaseSegment], bool] | None = None) -> bool
- Do any of the segments match?
- apply(fn: Callable[[BaseSegment], Any]) -> List[Any]
- Apply function to every item.
- children(predicate: Callable[[BaseSegment], bool] | None = None) -> Segments
- Returns an object with children of the segments in this object.
- find(segment: BaseSegment | None) -> int
- Returns index if found, -1 if not found.
- first(predicate: Callable[[BaseSegment], bool] | None = None) -> Segments
- Returns the first segment (if any) that satisfies the predicates.
- get(index: int = 0, *, default: BaseSegment | None = None) -> BaseSegment | None
- Return specified item. Returns default if index out of range.
- iterate_segments(predicate: Callable[[BaseSegment], bool] | None = None) -> Iterable[Segments]
- Loop over each element as a fresh Segments.
- last(predicate: Callable[[BaseSegment], bool] | None = None) -> Segments
- Returns the last segment (if any) that satisfies the predicates.
- property raw_segments: Segments
- Get raw segments underlying the segments.
- property raw_slices: RawFileSlices
- Raw slices of the segments, sorted in source file order.
- recursive_crawl(*seg_type: str, recurse_into: bool = True) -> Segments
- Recursively crawl for segments of a given type.
- recursive_crawl_all() -> Segments
- Recursively crawl all descendant segments.
- reversed() -> Segments
- Return the same segments in reverse order.
- select(select_if: Callable[[BaseSegment], bool] | None = None, loop_while: Callable[[BaseSegment], bool] | None = None, start_seg: BaseSegment | None = None, stop_seg: BaseSegment | None = None) -> Segments
- Retrieve range/subset.
NOTE: Iterates the segments BETWEEN start_seg and stop_seg, i.e. those segments are not included in the loop.
segment_predicates Module¶
Defines commonly used segment predicates for rule writers.
For consistency, all the predicates in this module are implemented as functions returning functions. This avoids rule writers having to remember the distinction between normal functions and functions returning functions.
This is not necessarily a complete set of predicates covering all possible requirements. Rule authors can define their own predicates as needed, either as regular functions, lambda, etc.
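For instance, predicates can be composed and then passed to the Segments methods described above (a small sketch which assumes a Segments object named segments is already in scope):

import sqlfluff.utils.functional.segment_predicates as sp

# Build a predicate matching code segments which are not comments...
predicate = sp.and_(sp.is_code(), sp.not_(sp.is_comment()))

# ...and use it to filter the children of an existing Segments object.
filtered = segments.children(predicate)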
- and_(*functions: Callable[[BaseSegment], bool]) -> Callable[[BaseSegment], bool]
- Returns a function that computes the functions and-ed together.
- get_type() -> Callable[[BaseSegment], str]
- Returns a function that gets segment type.
- is_code() -> Callable[[BaseSegment], bool]
- Returns a function that checks if segment is code.
- is_comment() -> Callable[[BaseSegment], bool]
- Returns a function that checks if segment is comment.
- is_keyword(*keyword_name: str) -> Callable[[BaseSegment], bool]
- Returns a function that determines if it's a matching keyword.
- is_meta() -> Callable[[BaseSegment], bool]
- Returns a function that checks if segment is meta.
- is_raw() -> Callable[[BaseSegment], bool]
- Returns a function that checks if segment is raw.
- is_templated() -> Callable[[BaseSegment], bool]
- Returns a function that checks if segment is templated.
- is_type(*seg_type: str) -> Callable[[BaseSegment], bool]
- Returns a function that determines if segment is one of the types.
- is_whitespace() -> Callable[[BaseSegment], bool]
- Returns a function that checks if segment is whitespace.
- not_(fn: Callable[[BaseSegment], bool]) -> Callable[[BaseSegment], bool]
- Returns a function that computes: not fn().
- or_(*functions: Callable[[BaseSegment], bool]) -> Callable[[BaseSegment], bool]
- Returns a function that computes the functions or-ed together.
- raw_is(*raws: str) -> Callable[[BaseSegment], bool]
- Returns a function that determines if segment matches one of the raw inputs.
- raw_slices(segment: BaseSegment, templated_file: TemplatedFile | None) -> RawFileSlices
- Returns raw slices for a segment.
- raw_upper_is(*raws: str) -> Callable[[BaseSegment], bool]
- Returns a function that determines if segment matches one of the raw inputs.
- templated_slices(segment: BaseSegment, templated_file: TemplatedFile | None) -> TemplatedFileSlices
- Returns raw slices for a segment.
raw_file_slices Module¶
Surrogate class for working with RawFileSlice collections.
- class RawFileSlices(*raw_slices: RawFileSlice, templated_file: TemplatedFile | None = None)
- Encapsulates a sequence of one or more RawFileSlice.
The slices may or may not be contiguous in a file. Provides useful operations on a sequence of slices to simplify rule creation.
- all(predicate: Callable[[RawFileSlice], bool] | None = None) -> bool
- Do all the raw slices match?
- any(predicate: Callable[[RawFileSlice], bool] | None = None) -> bool
- Do any of the raw slices match?
- select(select_if: Callable[[RawFileSlice], bool] | None = None, loop_while: Callable[[RawFileSlice], bool] | None = None, start_slice: RawFileSlice | None = None, stop_slice: RawFileSlice | None = None) -> RawFileSlices
- Retrieve range/subset.
NOTE: Iterates the slices BETWEEN start_slice and stop_slice, i.e. those slices are not included in the loop.
raw_file_slice_predicates Module¶
Defines commonly used raw file slice predicates for rule writers.
For consistency, all the predicates in this module are implemented as functions returning functions. This avoids rule writers having to remember the distinction between normal functions and functions returning functions.
This is not necessarily a complete set of predicates covering all possible requirements. Rule authors can define their own predicates as needed, either as regular functions, lambda, etc.
- is_slice_type(*slice_types: str) -> Callable[[RawFileSlice], bool]
- Returns a function that determines if segment is one of the types.
sqlfluff.utils.reflow: Whitespace Reflow API¶
Many rules supported by SQLFluff involve the spacing and layout of different elements, either to enforce a particular layout or just to add or remove code elements in a way sensitive to the existing layout configuration. The way this is achieved is through some centralised utilities in the sqlfluff.utils.reflow module.
This module aims to achieve several things:
- Less code duplication by implementing reflow logic in only one place.
- Provide a streamlined interface for rules to easily utilise reflow logic.
- Given this requirement, it's important that reflow utilities work within the existing framework for applying fixes to potentially templated code. We achieve this by returning LintFix objects which can then be returned by each rule wanting to use this logic.
- Provide a consistent way of configuring layout requirements. For more details on configuration see Configuring Layout.
To support this, the module provides a ReflowSequence class which allows access to all of the relevant operations which can be used to reformat sections of code, or even a whole file. Unless there is a very good reason, all rules should use this same approach to ensure consistent treatment of layout.
- class ReflowSequence(elements: List[ReflowBlock | ReflowPoint], root_segment: BaseSegment, reflow_config: ReflowConfig, depth_map: DepthMap, lint_results: List[LintResult] | None = None)
- Class for keeping track of elements in a reflow operation.
This acts as the primary route into using the reflow routines. It acts in a way that plays nicely within a rule context in that it accepts segments and configuration, while allowing access to modified segments and a series of LintFix objects, which can be returned by the calling rule.
Sequences are made up of alternating ReflowBlock and ReflowPoint objects (even if some points have no segments). This is validated on construction.
Most operations also return ReflowSequence objects such that operations can be chained, and then the resultant fixes accessed at the last stage, for example:
fixes = (
    ReflowSequence.from_around_target(
        context.segment,
        root_segment=context.parent_stack[0],
        config=context.config,
    )
    .rebreak()
    .get_fixes()
)
- break_long_lines() -> ReflowSequence
- Rebreak any remaining long lines in a sequence.
This assumes that reindent() has already been applied.
- classmethod from_around_target(target_segment: BaseSegment, root_segment: BaseSegment, config: FluffConfig, sides: str = 'both') -> ReflowSequence
- Generate a sequence around a target.
- target_segment (RawSegment) -- The segment to center around when considering the sequence to construct.
- root_segment (BaseSegment) -- The relevant root segment (usually the base FileSegment).
- config (FluffConfig) -- A config object from which to load the spacing behaviours of different segments.
- sides (str) -- Limit the reflow sequence to just one side of the target. Default is two sided ("both"), but set to "before" or "after" to limit to either side.
NOTE: We don't just expand to the first block around the target but to the first code element, which means we may swallow several comment blocks in the process.
To evaluate reflow around a specific target, we need to generate a sequence which goes from the preceding raw to the following raw. i.e. at least: block - point - block - point - block (where the central block is the target).
- classmethod from_raw_segments(segments: Sequence[RawSegment], root_segment: BaseSegment, config: FluffConfig, depth_map: DepthMap | None = None) -> ReflowSequence
- Construct a ReflowSequence from a sequence of raw segments.
This is intended as a base constructor, which others can use. In particular, if no depth_map argument is provided, this method will generate one in a potentially inefficient way. If the calling method has access to a better way of inferring a depth map (for example because it has access to a common root segment for all the content), it should do that instead and pass it in.
- classmethod from_root(root_segment: BaseSegment, config: FluffConfig) -> ReflowSequence
- Generate a sequence from a root segment.
- root_segment (BaseSegment) -- The relevant root segment (usually the base FileSegment).
- config (FluffConfig) -- A config object from which to load the spacing behaviours of different segments.
- get_fixes() -> List[LintFix]
- Get the current fix buffer.
We're hydrating them here directly from the LintResult objects, so for more accurate results, consider using .get_results(). This method is particularly useful when consolidating multiple results into one.
- get_raw() -> str
- Get the current raw representation.
- get_results() -> List[LintResult]
- Return the current result buffer.
- insert(insertion: RawSegment, target: RawSegment, pos: str = 'before') -> ReflowSequence
- Returns a new ReflowSequence with the new element inserted.
Insertion is always relative to an existing element. Either before or after it as specified by pos. This generates appropriate creation LintFix objects to direct the linter to insert those elements.
- rebreak() -> ReflowSequence
- Returns a new ReflowSequence with corrected line breaks.
This intentionally does not handle indentation, as the existing indents are assumed to be correct.
- reindent() -> ReflowSequence
- Reindent lines within a sequence.
- replace(target: BaseSegment, edit: Sequence[BaseSegment]) -> ReflowSequence
- Returns a new ReflowSequence with edit elements replaced.
This generates appropriate replacement LintFix objects to direct the linter to modify those elements.
- respace(strip_newlines: bool = False, filter: str = 'all') -> ReflowSequence
- Returns a new ReflowSequence with points respaced.
- strip_newlines (bool) -- Optionally strip newlines before respacing. This is primarily used on focused sequences to coerce objects onto a single line. This does not apply any prioritisation to which line breaks to remove and so is not a substitute for the full reindent or reflow methods.
- filter (str) -- Optionally filter which reflow points to respace. Default configuration is all. Other options are line_break which only respaces points containing a newline or followed by an end_of_file marker, or inline which is the inverse of line_break. This is most useful for filtering between trailing whitespace and fixes between content on a line.
NOTE this method relies on the embodied results being correct so that we can build on them.
- without(target: RawSegment) -> ReflowSequence
- Returns a new ReflowSequence without the specified segment.
This generates appropriate deletion LintFix objects to direct the linter to remove those elements.
- class ReflowPoint(segments: Tuple[RawSegment, ...])
- Class for keeping track of editable elements in reflow.
This class, and its sibling ReflowBlock, should not normally be manipulated directly by rules, but instead should be manipulated using ReflowSequence.
It holds segments which can be changed during a reflow operation such as whitespace and newlines. It may also contain Indent and Dedent elements.
It holds no configuration and is influenced by the blocks on either side, so that any operations on it usually have that configuration passed in as required.
- property class_types: Set[str]
- Get the set of contained class types.
Parallel to BaseSegment.class_types
- get_indent() -> str | None
- Get the current indent (if there).
- get_indent_impulse() -> IndentStats
- Get the change in intended indent balance from this point.
- get_indent_segment_vals(exclude_block_indents=False) -> List[int]
- Iterate through any indent segments and extract their values.
- indent_to(desired_indent: str, after: BaseSegment | None = None, before: BaseSegment | None = None, description: str | None = None, source: str | None = None) -> Tuple[List[LintResult], ReflowPoint]
- Coerce a point to have a particular indent.
If the point currently contains no newlines, one will be introduced and any trailing whitespace will be effectively removed.
More specifically, the newline is inserted before the existing whitespace, with the new indent being a replacement for that same whitespace.
For placeholder newlines or indents we generate appropriate source fixes.
- is_all_unrendered() -> bool
- Return whether this element is all unrendered.
Returns True if contains only whitespace, indents, template loops or placeholders.
Note: ReflowBlocks will contain the placeholders and loops; ReflowPoints will contain whitespace, indents and newlines.
- num_newlines() -> int
- Return the number of newlines in this element.
These newlines are either newline segments or contained within consumed sections of whitespace. This counts both.
- property pos_marker: PositionMarker | None
- Get the first position marker of the element.
- property raw: str
- Get the current raw representation.
- respace_point(prev_block: ReflowBlock | None, next_block: ReflowBlock | None, root_segment: BaseSegment, lint_results: List[LintResult], strip_newlines: bool = False, anchor_on: str = 'before') -> Tuple[List[LintResult], ReflowPoint]
- Respace a point based on given constraints.
NB: This effectively includes trailing whitespace fixes.
Deletion and edit fixes are generated immediately, but creations are paused to the end and done in bulk so as not to generate conflicts.
Note that the strip_newlines functionality exists here as a slight exception to pure respacing, but as a very simple case of positioning line breaks. The default operation of respace does not enable it, however it exists as a convenience for rules which wish to use it.
- class ReflowBlock(segments: Tuple[RawSegment, ...], spacing_before: str, spacing_after: str, line_position: str | None, depth_info: DepthInfo, stack_spacing_configs: Dict[int, str], line_position_configs: Dict[int, str])
- Class for keeping track of elements to reflow.
This class, and its sibling ReflowPoint, should not normally be manipulated directly by rules, but instead should be manipulated using ReflowSequence.
It holds segments to reflow and also exposes configuration regarding how they are expected to reflow around others. Typically it holds only a single element, which is usually code or a templated element. Because reflow operations control spacing, it would be very unusual for this object to be modified; as such it exposes relatively few methods.
The attributes exposed are designed to be "post configuration" i.e. they should reflect configuration appropriately.
- property class_types: Set[str]
- Get the set of contained class types.
Parallel to BaseSegment.class_types
- depth_info: DepthInfo
- Metadata on the depth of this segment within the parse tree which is used in inferring how and where line breaks should exist.
- classmethod from_config(segments: Tuple[RawSegment, ...], config: ReflowConfig, depth_info: DepthInfo) -> ReflowBlock
- Construct a ReflowBlock while extracting relevant configuration.
This is the primary route to construct a ReflowBlock, as it allows all of the inference of the spacing and position configuration from the segments it contains and the appropriate config objects.
- is_all_unrendered() -> bool
- Return whether this element is all unrendered.
Returns True if contains only whitespace, indents, template loops or placeholders.
Note: ReflowBlocks will contain the placeholders and loops; ReflowPoints will contain whitespace, indents and newlines.
- line_position: str | None
- Desired line position for this block. See Configuring layout and spacing
- line_position_configs: Dict[int, str]
- Desired line position configurations for parent segments of the segment in this block. See Configuring layout and spacing
- num_newlines() -> int
- Return the number of newlines in this element.
These newlines are either newline segments or contained within consumed sections of whitespace. This counts both.
- property pos_marker: PositionMarker | None
- Get the first position marker of the element.
- property raw: str
- Get the current raw representation.
- spacing_after: str
- Desired spacing after this block. See Configuring layout and spacing
- spacing_before: str
- Desired spacing before this block. See Configuring layout and spacing
- stack_spacing_configs: Dict[int, str]
- Desired spacing configurations for parent segments of the segment in this block. See Configuring layout and spacing
Release Notes¶
This page aims to act as a guide for migrating between major SQLFluff releases. Necessarily this means that bugfix releases, or releases requiring no change for the user are not mentioned. For full details of each individual release, see the detailed changelog.
Upgrading to 3.x¶
This release makes a couple of potentially breaking changes:
- It drops support for python 3.7, which reached end of life in June 2023.
- It migrates to pyproject.toml rather than setup.cfg as the python packaging configuration file (although keeping setuptools as the default backend).
- The serialised output for sqlfluff lint now contains more information about the span of linting issues and initial proposed fixes. Besides the new fields, the original fields of line_pos and line_no have been renamed to start_line_pos and start_line_no, to distinguish them from the new fields starting with end_*. An illustrative sketch of the new record shape is shown after this list.
- When linting from stdin, if no violations are found, the serialised response before this version would simply have been an empty list ([]). From 3.0 onwards, there will now be a record for the file with some statistics, but the violations section of the response for that file will still be an empty list.
- The default annotation_level set by the --annotation-level option on the sqlfluff lint command has been changed from notice to warning, to better distinguish linting errors from warnings, which always now have the level of notice. This is only relevant when using the github-annotation or github-annotation-native formats.
- The previously deprecated --disable_progress_bar option on lint, fix and format has now been removed entirely. Please migrate to --disable-progress-bar to continue using this option.
- The --force option on sqlfluff fix is now the default behaviour and so the option has been deprecated. A new --check option has been introduced which mimics the old default behaviour. This has been changed as it enables significantly lower memory overheads when linting and fixing large projects.
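As an illustrative sketch of the new serialised record shape (the values, and any field names beyond those described above, are assumptions rather than a guaranteed schema):

[
  {
    "filepath": "test.sql",
    "violations": [
      {
        "code": "LT01",
        "description": "Unnecessary trailing whitespace.",
        "start_line_no": 1,
        "start_line_pos": 10,
        "end_line_no": 1,
        "end_line_pos": 12,
        "fixes": ["..."]
      }
    ],
    "statistics": {"...": "..."}
  }
]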
Upgrading to 2.3¶
This release includes two minor breaking changes which will only affect users engaged in performance optimisation of SQLFluff itself.
- The --profiler option on sqlfluff parse has been removed. It was only present on the parse command and not lint or fix, and it is just as simple to invoke the python cProfile module directly.
- The --recurse cli option and sqlfluff.recurse configuration option have both been removed. They both existed purely for debugging the parser, and were never used in a production setting. The improvement in other debugging messages when unparsable sections are found means that this option is no longer necessary.
Upgrading to 2.2¶
This release changes some of the interfaces between SQLFluff core and our plugin ecosystem. The only breaking change is in the interface between SQLFluff and templater plugins (which are not common in the ecosystem, hence why this is only a minor and not a major release).
For all plugins, we also recommend a different structure for their imports (especially for rule plugins which are more common in the ecosystem) - for performance and stability reasons. Some users had been experiencing very long import times with previous releases as a result of the layout of plugin imports. Users with affected plugins will begin to see a warning from this release onward, which can be resolved for their plugin by updating to a new version of that plugin which follows the guidelines.
Templater plugins¶
Templaters before this version would pass a make_template() callable to the slicing methods as part of being able to map the source file. This method would accept a str and return a jinja2.environment.Template object to allow the templater to render multiple variants of the template to do the slicing operation (which allows linting issues found in templated files to be mapped accurately back to their position in the unrendered source file). This approach is not very generalisable, and did not support templating operations with libraries other than jinja2.
As a result, we have amended the interface to instead pass a render_func() callable, which accepts a str and returns a str. This works fine for the jinja templater (and by extension the dbt templater) as they can simply wrap the original callable with a method that calls render() on the original Template object. It also however opens up the door to other templating engines, and in particular to remote templaters which might pass unrendered code over an HTTP connection for rendering (see the sketch after the list below).
Specifically:
- The slice_file() method of the base templater classes no longer accepts an optional make_template argument or a templated_str argument.
- Instead a render_func callable should be passed which can be called to generate the templated_str on demand.
- Unlike the optional make_template, render_func is not optional and should always be present.
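As a loose sketch of the idea (not the actual SQLFluff implementation, and the helper name is hypothetical), a jinja-based templater can wrap its rendering machinery into the str-to-str callable that slice_file() now expects:

import jinja2

def make_render_func(env, context):
    # Hypothetical helper: wraps a jinja2 Environment and a rendering context
    # so that downstream slicing code only ever sees a str -> str callable.
    def render_func(in_str: str) -> str:
        template = env.from_string(in_str)
        return template.render(**context)
    return render_func

# The slicing code can call this on the whole file, or on modified variants
# of it, without knowing anything about jinja.
render = make_render_func(jinja2.Environment(), {"payment_type": "card"})
print(render("SELECT * FROM payments WHERE type = '{{ payment_type }}'"))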
Rule plugins¶
We recommend that the module in a plugin which defines all of the hook implementations (anything using the @hookimpl decorator) should be able to fully import before any rule implementations are imported. More specifically, SQLFluff must be able to both import and run any implementations of get_configs_info() before any plugin rules (i.e. any derivatives of BaseRule) are imported. Because of this, we recommend that rules are defined in a separate module to the root of the plugin and then only imported within the get_rules() method.
Importing in the main body of the module was previously our recommendation, and so may still be the case for older versions of some plugins. If one of your plugins does use imports in this way, a warning will be presented from this version onward, recommending that you update your plugin.
See the Developing Plugins section of the docs for an example.
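As an illustrative sketch of that layout (module and rule names here are hypothetical; see Developing Plugins for the maintained example), the hook module imports quickly and only pulls in the rule classes when get_rules() is actually called:

from sqlfluff.core.plugin import hookimpl

@hookimpl
def get_configs_info() -> dict:
    # Must be importable and runnable before any rule classes are imported.
    return {
        "forbidden_columns": {
            "definition": "A comma separated list of column names to forbid."
        },
    }

@hookimpl
def get_rules():
    # Defer the (potentially slow) rule imports until SQLFluff asks for them.
    from my_plugin.rules import Rule_MyPlugin_L001  # hypothetical rule module

    return [Rule_MyPlugin_L001]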
Upgrading from 1.x to 2.0¶
Upgrading to 2.0 brings several important breaking changes:
- All bundled rules have been recoded, both from the generic L0XX format into groups with similar codes (e.g. an aliasing group with codes of the format AL0X), and also given names to allow much clearer referencing (e.g. aliasing.column).
- Rule Configuration now uses the rule name rather than the rule code to specify the section. Any unrecognised references in config files (whether they are references which do match existing rules by code or alias, or whether they match no rules at all) will raise warnings at runtime.
- A complete re-write of layout and whitespace handling rules (see Layout & Whitespace Configuration), and with that a change in how layout is configured (see Configuring Layout) and the combination of some rules that were previously separate. One example of this is that the legacy rules L001, L005, L006, L008, L023, L024, L039, L048 & L071 have been combined simply into LT01.
Recommended upgrade steps¶
To upgrade smoothly between versions, we recommend the following sequence:
- 1. The upgrade path will be simpler if you have a slimmer configuration file. Before upgrading, consider removing any sections from your configuration file (often .sqlfluff, see Configuration) which match the current Default Configuration. There is no need to respecify defaults in your local config if they are not different to the stock config.
- 2. In a local (or other non-production) environment, upgrade to SQLFluff 2.0.x. We recommend using a compatible release specifier such as ~=2.0.0, to ensure any minor bugfix releases are automatically included.
- 3. Examine your configuration file (as mentioned above), and evaluate how rules are currently specified. We recommend primarily using either rules or exclude_rules rather than both, as detailed in Enabling and Disabling Rules. Using either the sqlfluff rules CLI command or the online Rules Reference, replace all references to legacy rule codes (i.e. codes of the form L0XX). Specifically:
- In the rules and exclude_rules config values. Here, consider using group specifiers or names to make your config simpler to read and understand (e.g. capitalisation is much more understandable than CP01,CP02,CP03,CP04,CP05, but the two specifiers will have the same effect). Note that while legacy codes will still be understood here (because they remain valid as aliases for those rules), you may find that some rules no longer exist in isolation and so these references may be misleading. e.g. L005 is now an alias for layout.spacing but that rule is much more broad ranging than the original scope of L005, which was only spacing around commas.
- In Rule Configuration. In particular here, legacy references to rule codes are no longer valid, will raise warnings, and until resolved, the configuration in those sections will be ignored. The new section references should include the rule name (e.g. [sqlfluff:rules:capitalisation.keywords] rather than [sqlfluff:rules:L010]). This switch is designed to make configuration files more readable, but we cannot support backward compatibility here without also having to resolve the potential ambiguity of the scenario where both code-based and name-based references are used.
- Review the Configuring Layout documentation, and check whether any indentation or layout configuration should be revised.
- 4. Check your project for In-File Configuration Directives which refer to rule codes. Alter these in the same manner as described above for configuration files.
- 5. Test linting your project for unexpected linting issues. Where found, consider whether to use sqlfluff fix to repair them in bulk, or (if you disagree with the changes) consider changing which rules you enable or their configuration accordingly. In particular you may notice:
- The indentation rule (L003 as was, now LT02) has had a significant rewrite, and while much more flexible and accurate, it is also more specific. Note that Hanging Indents are no longer supported, and that while not enabled by default, many users may find that enabling Implicit Indents fits their organisation's style better.
- The spacing rule (LT01: layout.spacing) has a much wider scope, and so may pick up spacing issues that were not previously enforced. If you disagree with any of these, you can override the sqlfluff:layout sections of the config with different (or just more liberal) settings, such as any.
Example 2.0 config¶
To illustrate the points above, here is an example config for a 2.0-compatible project. Note that the config is fairly brief and sets only the values which differ from the default config.
[sqlfluff]
dialect = snowflake
templater = dbt
max_line_length = 120

# Exclude some specific rules based on a mixture of codes and names
exclude_rules = RF02, RF03, RF04, ST06, ST07, AM05, AM06, convention.left_join, layout.select_targets

[sqlfluff:indentation]
# Enabling implicit indents for this project.
# See https://docs.sqlfluff.com/en/stable/perma/indent_locations.html
allow_implicit_indents = True

# Add a few specific rule configurations, referenced by the rule names
# and not by the rule codes.
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.identifiers]
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = lower

# An example of setting a custom layout specification which
# is more lenient than default config.
[sqlfluff:layout:type:set_operator]
line_position = alone
Upgrading to 1.4¶
This release brings several internal changes, and acts as a prelude to 2.0.0. In particular, the following config values have changed:
- sqlfluff:rules:L007:operator_new_lines has been changed to sqlfluff:layout:type:binary_operator:line_position.
- sqlfluff:rules:comma_style and sqlfluff:rules:L019:comma_style have both been consolidated into sqlfluff:layout:type:comma:line_position.
If any of these values have been set in your config, they will be automatically translated to the new values at runtime, and a warning will be shown. To silence the warning, update your config file to the new values. For more details on configuring layout see Configuring Layout.
Upgrading to 1.3¶
This release brings several potentially breaking changes to the underlying parse tree. If you use the cli tool in a linting context, you should notice no change. If however your application relies on the structure of the SQLFluff parse tree or the naming of certain elements within the yaml format, then this may not be a drop-in replacement. Specifically:
- The addition of a new end_of_file meta segment at the end of the parse structure.
- The addition of a template_loop meta segment to signify a jump backward in the source file within a loop structure (e.g. a jinja for loop).
- Much more specific types on some raw segments, in particular identifier and literal type segments will now appear in the parse tree with their more specific type (which used to be called name) e.g. naked_identifier, quoted_identifier, numeric_literal etc...
If using the python api, the parent type (such as identifier) will still register if you call .is_type("identifier"), as this function checks all inherited types. However, the eventual type returned by .get_type() will now be (in most cases) what used to be accessible at .name. The name attribute will be deprecated in a future release.
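A small sketch of the difference, using the core Python API (the exact tree contents vary by dialect and version, so treat the printed types as illustrative):

from sqlfluff.core import Linter

linter = Linter(dialect="ansi")
parsed = linter.parse_string("SELECT a FROM tbl\n")

for seg in parsed.tree.recursive_crawl("naked_identifier"):
    # .is_type() still matches the parent type, but .get_type() now returns
    # the more specific type that used to live on .name.
    print(seg.raw, seg.get_type(), seg.is_type("identifier"))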
Upgrading to 1.2¶
This release introduces the capability to automatically skip large files, and sets default limits on the maximum file size before a file is skipped. Users should see a performance gain, but may experience warnings associated with these skipped files.
Upgrades pre 1.0¶
- 0.13.x new rule for quoted literals, option to remove hanging indents in rule L003, and introduction of ignore_words_regex.
- 0.12.x dialect is now mandatory, the spark3 dialect was renamed to sparksql and datatype capitalisation was extracted from L010 to its own rule L063.
- 0.11.x rule L030 changed to use extended_capitalisation_policy.
- 0.10.x removed support for older dbt versions < 0.20 and stopped sqlfluff fix from attempting to fix unparsable SQL.
- 0.9.x refinement of the Simple API, dbt 1.0.0 compatibility, and the official SQLFluff Docker image.
- 0.8.x an improvement to the performance of the parser, a rebuild of the Jinja Templater, and a progress bar for the CLI.
- 0.7.x extracted the dbt templater to a separate plugin and removed the exasol_fs dialect (now merged in with the main exasol).
- 0.6.x introduced parallel processing, which necessitated a big re-write of several innards.
- 0.5.x introduced some breaking changes to the API.
- 0.4.x dropped python 3.5, added the dbt templater, source mapping and also introduced the python API.
- 0.3.x drops support for python 2.7 and 3.4, and also reworks the handling of indentation linting in a potentially not backward compatible way.
- 0.2.x added templating support and a big restructure of rules and changed how users might interact with SQLFluff on templated code.
- 0.1.x involved a major re-write of the parser, completely changing the behaviour of the tool with respect to complex parsing.
SQLFluff in the Wild¶
Want to find other people who are using SQLFluff in production use cases? Want to brag about how you're using it? Just want to show solidarity with the project and provide a testimonial for it?
Just add a section below by raising a PR on GitHub to edit this file ✏️.
- SQLFluff in production dbt projects at tails.com. We use the SQLFluff cli as part of our CI pipeline in codeship to enforce certain styles in our SQL codebase (with over 650 models) and keep code quality high. Contact @alanmcruickshank.
- Netlify's data team uses SQLFluff with dbt to keep code quality in more than 350 models (and growing). Previously, we had our SQL Guidelines defined in a site hosted with Netlify, and now we're enforcing these rules in our CI workflow thanks to SQLFluff.
- Drizly's analytics team uses SQLFluff with dbt for over 700 models as part of our CI checks in GitHub. Before SQLFluff, we had SQL best practices outlined in a google doc and had to manually enforce through PR comments. We're now able to enforce much of our style guide automatically through SQLFluff.
- Petal's data-eng team runs SQLFluff on our 100+ model dbt project. As a pre-commit hook and as a CI check, SQLFluff helps keep our SQL readable and consistent.
- Surfline's Analytics Engineering team implemented SQLFluff as part of our continuous integration (CI) suite across our entire dbt project (700+ models). We implement the CI suite using GitHub Actions and Workflows. The benefits of using SQLFluff at Surfline are:
- The SQL in our dbt models is consistent and easily readable.
- Our style guide is maintained as code, not a README that is rarely updated.
- Reduced burden on Analytics Engineers to remember every single style rule.
- New Analytics Engineers can quickly see and learn what "good SQL" looks like at Surfline and start writing it from day 1.
- The HTTP Archive uses SQLFluff to automatically check for quality and consistency of code submitted by the many contributors to this project. In particular our annual Web Almanac attracts hundreds of volunteers to help analyse our BigQuery dataset, and being able to automatically lint Pull Requests through GitHub Actions is a fantastic way to help us maintain our growing repository of over a thousand queries.
- Brooklyn Data Co has a dbt_artifacts dbt package which runs SQLFluff in CI to lint pull requests automatically. It uses the GitHub Actions workflow contributed by Greg Clunies, with annotations on pull requests to make it easy for contributors to see where their SQL has failed any rules. See an example pull request with SQLFluff annotations.
- Markerr has tightly integrated SQLFluff into our CI/CD process for data model changes and process improvements. Since adopting SQLFluff across the organization, the clarity of our SQL code has risen dramatically, freeing up review time to focus on deeper data and process-specific questions.
- Symend has a microservices platform supporting our SaaS product. We use SQLFluff in the CI/CD process of several of our data-oriented microservices. Among other things, it validates our database migration scripts, deployed using schemachange, and we have near-term plans to implement it for our dbt projects.
- At CarePay we use SQLFluff to lint and fix all our dbt models as well as several other SQL heavy projects. Locally we use SQLFluff with pre-commit and have also integrated it into our CI/CD pipelines.
- The Core Analytics Team from Typeform and videoask uses SQLFluff in the production dbt project for building our data warehouse layer for both products:
- We use it locally in our day to day work, helping us to write cleaner code.
- We added SQLFluff to our CI processes, so during a PR we can check that any new or modified sql file has a consistent and easy-to-read format.
SQLFluff Slack¶
We have a fast-growing community on Slack, come and join us!
SQLFluff on Twitter¶
Follow us on Twitter @SQLFluff for announcements and other related posts.
AUTHOR¶
Alan Cruickshank
COPYRIGHT¶
2024, Alan Cruickshank
November 18, 2024