Avoiding injection with taint analysis
“Security is a process, not a product” — attribution unavailable
One simple way to improve the robustness of any code base is static analysis. It’s not widely used because it carries a (regrettably well-deserved) reputation for being a noisy, blunt instrument, but with small tweaks static analysis can become part of the common development process. In this post, I will explain how we use it to improve the security of our code.
Querying with SQL
Databases. A convenient way to store application data and wreck your application’s security.
A naive way to build queries is:
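The original snippet did not survive in this copy of the post, so here is a minimal sketch of the naive pattern (snippet 1). sqlite3 stands in for the real database, and the hard-coded email value stands in for request.args.get('email'):

```python
import sqlite3

# Stand-ins so the sketch runs on its own; in a real handler the email
# would come from flask's request.args.get('email').
db = sqlite3.connect(":memory:").cursor()
db.execute("CREATE TABLE users (id INTEGER, email TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice@example.com')")
db.execute("INSERT INTO users VALUES (2, 'bob@example.com')")

email = "' OR '1'='1"  # a crafted, user-supplied value

# The user-controlled value is formatted straight into the SQL string...
query = "SELECT id FROM users WHERE email = '{}'".format(email)

# ...so the injected OR clause matches every row, not just one email.
rows = db.execute(query).fetchall()
```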

Where the email address is supplied by the user. While this works, it is also vulnerable to injection.
The simplest way we avoid SQL injection is to use prepared statements with parameterised queries (#1 on the OWASP SQL Injection Prevention Cheat Sheet).
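Again as a sketch, using sqlite3's ? placeholder style (other drivers use %s or named parameters):

```python
import sqlite3

db = sqlite3.connect(":memory:").cursor()
db.execute("CREATE TABLE users (id INTEGER, email TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice@example.com')")

email = "' OR '1'='1"  # the same crafted value as before

# The query text is constant; the value is bound as a parameter, so the
# driver can never interpret it as SQL code.
rows = db.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchall()
```

The malicious value now matches no email literally, so no rows come back.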

The query is now a constant string, and the value of email is sent to the database server along with the query in such a way that it cannot be treated as SQL code.
This isn’t universally applicable. If we don’t want to go full ORM, there are certain cases where we still format our queries:
- The columns we select or table we query are chosen based on a condition
f"SELECT {requested_columns} FROM {view_name}_view;"
- We add filters (extra where clauses) based on a condition
f"SELECT id FROM users {build_where_clause} ORDER BY id;"
Wouldn’t it be nice to detect vulnerabilities in such cases automatically?
Bandit
While looking for available security-related tools for Python we came across Bandit, a “tool designed to find common security issues in Python code”.
It analyses your code and prints out errors, which are then investigated by the developer, who either:
- Rewrites the code according to best practice or removes the identified vulnerability.
- Decides that Bandit is wrong and it's not, in fact, a security problem. Bandit can be silenced by adding a # nosec comment to the flagged line.
- Finds some other trick to rewrite the code so that Bandit doesn't detect the issue.
Bandit ships with a plugin which finds all String nodes that match a SQL-like regex. If the string is formatted with {} or % (e.g. snippet 1), Bandit will report an issue. No issue is found in snippet 2 (correctly). Where it fails is in the more complicated situations where prepared statements cannot be used.
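The snippet is missing from this copy of the post; here is a sketch of the kind of code being described (the names build_where_clause and only_active are illustrative, not from the original):

```python
def build_where_clause(only_active):
    # Both possible return values are constant strings.
    if only_active:
        return "WHERE active = true"
    return ""

# only_active would be derived from e.g. request.args.get('active');
# only the branch taken depends on the user, never the SQL text itself.
filter_string = build_where_clause(only_active=True)
query = "SELECT id FROM users {} ORDER BY id;".format(filter_string)
```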

Bandit raises an issue here: “Possible SQL injection vector through string-based query construction”.
While correct that we are injecting the string filter_string inside the actual query, Bandit doesn't understand that the value of filter_string comes from constant values. We can see that there is no way for a user to send a parameter which would be run as SQL code.
This is a false positive, a warning about code which is non-exploitable.
False Positives
False positives cause alert blindness. While good engineers understand the danger of injection vulnerabilities, if facing a massive list of false positives to fix, they may easily miss a real one. By peppering the code with # nosec comments, we also prevent Bandit from re-alerting if the surrounding code is later changed and becomes exploitable.
Taint analysis/code flow analysis/data flow analysis
Bandit’s strengths lie in looking for uses of potentially dangerous functions or bad defaults. In fact, we run some custom plugins for Bandit. In the case of injection, however, it just isn’t advanced enough to figure out what is actually happening.
Python-taint (pyt) takes a different approach to this problem: code flow analysis.
The code is processed to create a Control Flow Graph.
Certain functions and variables are defined as sources of “taint”. This means that they return values under the user's control, e.g.:
- flask.request.args.get
- flask.request.cookies
- Any variable in a flask route, or the equivalent for Django
The Control Flow Graph is used to trace the path of this “taint” as it is assigned to different variables and passed through functions which propagate taint.
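As a sketch of that propagation (a plain string stands in for the tainted request value, and normalise is a hypothetical helper):

```python
def normalise(value):
    # A function like this propagates taint: its return value is
    # derived from its (tainted) argument.
    return value.strip().lower()

user_input = "  ALICE@EXAMPLE.COM  "  # stands in for request.args.get('email')
cleaned = normalise(user_input)       # taint flows through the call
query = "SELECT id FROM users WHERE email = '{}'".format(cleaned)  # still tainted
```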

Eventually, taint may hit a “sink”: a function which must never receive tainted user input, e.g.:
- db.execute
- flask.redirect
- subprocess.call
In snippet 3, the source of tainted user input, request.args.get, doesn't actually taint filter_string or query, so no false-positive vulnerability is found.
Looking back at snippet 1, we have the valid vulnerability chain request.args.get (source) → email → query → db.execute (sink), which python-taint flags like this:
    File: snippet1.py
    User input at line 1, source "request.args.get(":
        ~call_1 = ret_request.args.get('email')
    Reassigned in:
        File: snippet1.py
        Line 1: email = ~call_1
        Line 2: ~call_2 = ret_'SELECT id FROM users WHERE email = '{}''.format(email)
        Line 2: query = ~call_2
    Reaches line 3, sink "execute(":
        ~call_3 = ret_db.execute(query)
Our Experience
Pyt is awesome. Well, it’s still a bit rough around the edges, but already we have been able to:
- Turn off Bandit’s SQL injection plugins
- Vastly reduce the noise from security linting (our false positive rate)
- Get vulnerable code fixed by the responsible teams
- Remove tens of # nosec comments required by Bandit's false positives
We initially had to make a few code changes to pyt before rolling it out:
- Support additional language features of Python 3.5 and 3.6.
- Mark particular arguments of functions as sinks: db.execute(TAINT) is a vulnerability, but db.execute(safe_query, email=TAINT) shouldn't be.
- Be more resilient to the varied code structures and language features occasionally used in the Smarkets code-base (recursion, currying).
- Make pyt exit with code 1 if there are vulnerabilities, so CI knows it’s a failure.
These changes have been merged upstream to pyt on GitHub.
We use a custom triggers file to specify which functions are sources and sinks: you’ll probably want to do this too so that the results are relevant to your code base. Our full command is:
pyt --dont-prepend-root --no-local-imports --only-unsanitised --screen -v -t /path/to/pyt_triggers.json -pr /app -r /app
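For reference, a triggers file might look something like the following. The exact schema here is an assumption on my part, modelled on the trigger-word file pyt ships with; check the layout against your pyt version before relying on it:

```json
{
    "sources": [
        "request.args.get(",
        "request.cookies"
    ],
    "sinks": {
        "execute(": {},
        "subprocess.call(": {}
    }
}
```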
Warning
When deploying any linting tool you need to keep your engineers updated, and observe how they respond. Convenience trumps security, for end-users and developers alike. If there is a rush to get certain fixes out, and a brand new check gets in the way, a quick and crafty way to evade the check becomes awfully inviting. For example, under time pressure one could rewrite snippet 1 so that the query is split across multiple lines (fooling Bandit but not pyt):
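A sketch of that evasion (the helper name is hypothetical): once the query is built in fragments, the string that gets formatted no longer looks like SQL to Bandit's regex, but the taint chain from user input to query is unchanged, so pyt still flags it.

```python
def evasive_query(email):
    # Neither fragment on its own matches Bandit's SQL-like regex...
    query = "SELECT id FROM users "
    query += "WHERE email = '{}'".format(email)
    # ...yet the tainted value still reaches the final query string.
    return query
```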

In conclusion
Introducing static analysis, especially the code flow variant, can improve the security of almost any code base by catching certain types of straightforward programming errors before they become security bugs. However, you must be willing to spend time and effort to customise it for your needs. The default errors are likely to be noisy or misleading, which can turn the tool into an impediment.
At Smarkets, static analysis is part of our regular workflow. While it’s practically impossible to never ship insecure software, at least ours won’t be trivially so.