NumPy

NumPy logo. Source: Cournapeau 2018, BSD license.

NumPy is an open source Python library that enables efficient manipulation of multi-dimensional numerical data structures. These are called arrays in NumPy. NumPy is an alternative to Interactive Data Language (IDL) and MATLAB.

Since its release in 2005, NumPy has become a fundamental package for numerical and scientific computing in Python. In addition to efficient data structures and operations on them, it provides many high-level mathematical functions that aid scientific computation. Pandas, SciPy, Matplotlib, scikit-learn and scikit-image are just a few popular scientific packages that make use of NumPy.

Discussion

  • What does NumPy do differently from core Python?
    Comparing storage of Python list and NumPy array. Source: UCF 2020.

    Python is slower than compiled languages such as C but it's easy to learn. Python is suited for rapid prototyping and iterative development.

    While Python's list data type can be used to construct multi-dimensional data structures (lists containing lists), NumPy is faster and provides a better API for developers. Python's lists are general purpose and can contain data of different types. Each element is therefore a separate Python object that carries its own type information, so type-dispatching code runs and types are checked at runtime for every element. Lists are processed using loops or comprehensions and can't be vectorized to support elementwise operations. NumPy sacrifices some of Python's flexibility to improve performance.

    Specifically, NumPy is better at these aspects:

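    • Size: NumPy data structures take up less space. On a typical 64-bit build, each Python integer object takes 28 bytes whereas a NumPy int64 element takes just 8 bytes. A Python list of n integers requires about 64+8n+28n bytes (list header, one 8-byte pointer per item, and the integer objects themselves), whereas a NumPy array needs about 96+8n bytes (array header plus raw data). Exact overheads vary with the Python and NumPy versions.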
    • Performance: NumPy code runs faster than Python code, particularly for large input data.
    • Functionality: NumPy provides lots of functions and methods to simplify operations. High-level operations such as linear algebra are also included.
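
    A minimal sketch of the size comparison above, assuming a 64-bit CPython build; the variable names are illustrative and the exact byte counts depend on the Python and NumPy versions. It counts the list's own storage plus every integer object it points to, and compares that with the array's contiguous data buffer.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# List: header + one 8-byte pointer per item + the int objects pointed to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# Array: one contiguous buffer of fixed-size elements (header not included).
array_data_bytes = np_arr.nbytes

print(f"list : ~{list_bytes} bytes")
print(f"array: {array_data_bytes} bytes of data, {np_arr.itemsize} bytes per element")
```

    Only the array's data buffer is counted here, so add roughly a hundred bytes of header to compare totals.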
  • What are some of the main features of NumPy?
    Example showing at most four elements loaded into registers and processed in parallel. Source: Konrad 2018.

    NumPy arrays are homogeneous, meaning that array elements are of the same type. Hence, no per-element type checking is required at runtime. All elements of an array take up the same amount of space.

    Elements along each axis are also spaced a constant number of bytes apart; this step size is called the stride. Strides let NumPy create a new array from the same data in memory without copying it, for example when slicing or transposing. Such arrays are different views into the same memory, which makes it easy to modify subsets of the data in place.
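
    The following sketch illustrates strides and views on a small C-ordered int64 array (8 bytes per element is an assumption fixed by the explicit dtype). It shows that transposing and strided slicing only change the strides, and that writing through a view changes the original data.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)  # C order, 8-byte elements

# Bytes to step to the next row (4 elements * 8 bytes) and to the next column.
print(a.strides)                # (32, 8)

# Transposing only swaps the strides; the data is not moved.
print(a.T.strides)              # (8, 32)

# Every other column: a view with a doubled column stride, no copy made.
v = a[:, ::2]
print(v.strides)                # (32, 16)
print(np.shares_memory(a, v))   # True

# Writing through the view changes the original array's memory.
v[:] = 0
print(a.tolist())               # [[0, 1, 0, 3], [0, 5, 0, 7], [0, 9, 0, 11]]
```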

    Operations are vectorized: a single operation applies to entire arrays, and the elementwise loop runs in compiled code, where the CPU can often process several elements at once (as in the figure above). This speeds up computation, and developers need not write explicit for loops.
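
    A small sketch contrasting a pure Python loop with the equivalent vectorized expression; the function names are hypothetical. Both compute a*x + b elementwise and give the same result, but the vectorized version hands the whole loop to NumPy.

```python
import numpy as np

def scale_shift_loop(values, a, b):
    """Pure Python: the interpreter executes one iteration per element."""
    return [a * v + b for v in values]

def scale_shift_vec(values, a, b):
    """Vectorized: one array expression; the loop runs in NumPy's compiled code."""
    return a * values + b

x = np.linspace(0.0, 1.0, 100_000)
loop_result = np.array(scale_shift_loop(x.tolist(), 2.0, 1.0))
vec_result = scale_shift_vec(x, 2.0, 1.0)
print(np.allclose(loop_result, vec_result))  # True
```

    Timing both functions with the standard timeit module on large inputs is a simple way to observe the speedup on a given machine.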

    NumPy provides APIs for easy manipulation of arrays, including indexing, slicing, reshaping, stacking and splitting. Broadcasting allows operations between arrays of different but compatible shapes, such as an array and a scalar, or a column vector and a row vector, without the developer having to replicate data explicitly.
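
    A brief sketch of the manipulation APIs and broadcasting just described, using small illustrative arrays. The last example shows a (2, 1) column and a (3,) row both being stretched to a (2, 3) result.

```python
import numpy as np

a = np.arange(6)                 # [0 1 2 3 4 5]
m = a.reshape(2, 3)              # reshaping: [[0 1 2], [3 4 5]]

print(m[1, 2])                   # indexing -> 5
print(m[:, 1].tolist())          # slicing a column -> [1, 4]

s = np.vstack([m, m * 10])       # stacking -> shape (4, 3)
top, bottom = np.split(s, 2)     # splitting into two (2, 3) arrays

# Broadcasting with a scalar: 100 is added to every element.
print((m + 100).tolist())        # [[100, 101, 102], [103, 104, 105]]

# Broadcasting (2, 1) with (3,): both are stretched to shape (2, 3).
col = np.array([[1], [2]])
row = np.array([10, 20, 30])
print((col + row).tolist())      # [[11, 21, 31], [12, 22, 32]]
```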

    NumPy integrates easily with C/C++ or Fortran code that may provide optimized implementations. Useful functions covering linear algebra, Fourier transform, and random numbers are provided.
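
    A short sketch touching the three areas mentioned above: np.linalg, np.fft and the np.random Generator API (default_rng, available since NumPy 1.17). The equations, signal and seed are arbitrary illustrative choices.

```python
import numpy as np

# Linear algebra: solve 3x + y = 9 and x + 2y = 8 (solution x = 2, y = 3).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = np.linalg.solve(A, b)

# Fourier transform: a cosine with 4 cycles over 64 samples peaks at bin 4.
n = 64
t = np.arange(n)
signal = np.cos(2 * np.pi * 4 * t / n)
spectrum = np.abs(np.fft.rfft(signal))
peak_bin = int(np.argmax(spectrum))

# Random numbers: a seeded Generator gives reproducible draws.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=10_000)

print(solution, peak_bin, round(float(samples.mean()), 3))
```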

  • Could you share some performance numbers comparing NumPy versus Python implementations?
    Comparing performance of pure Python, NumPy, Cython and C. Source: Ross 2014.

    For a simple computation of mean and standard deviation of a million floating point numbers, NumPy was 30X faster than a pure Python implementation. However, optimized Cython and C implementations were even faster. Another study showed that for small inputs (fewer than 200 numbers), pure Python did better than NumPy. For inputs greater than about 15,000 numbers, NumPy outperformed C++.
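
    A sketch of how such a mean and standard deviation comparison could be set up; it is not the cited benchmark's code, and the data, function names and repeat count are assumptions. Both versions compute the population statistics; the measured ratio will vary by machine and build.

```python
import math
import random
import timeit
import numpy as np

def mean_std_python(values):
    """Population mean and standard deviation with plain Python loops."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)

def mean_std_numpy(arr):
    """Same statistics using NumPy's compiled reductions (std uses ddof=0)."""
    return arr.mean(), arr.std()

random.seed(0)
data = [random.random() for _ in range(1_000_000)]
arr = np.array(data)

py_mean, py_std = mean_std_python(data)
np_mean, np_std = mean_std_numpy(arr)

t_py = timeit.timeit(lambda: mean_std_python(data), number=3)
t_np = timeit.timeit(lambda: mean_std_numpy(arr), number=3)
print(f"pure Python: {t_py:.3f} s, NumPy: {t_np:.3f} s, ratio: {t_py / t_np:.0f}x")
```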

    One machine learning experiment compared pure Python, NumPy and TensorFlow (on CPU) implementations of gradient descent. Runtimes were 18.65, 0.32 and 1.20 seconds respectively, so NumPy was more than 50X faster than pure Python. For more complex ML problems deployed on multiple GPUs, TensorFlow is likely to outperform NumPy.
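
    A minimal sketch of vectorized gradient descent for simple linear regression, in the spirit of the experiment above but not its actual code; the learning rate, iteration count and noise-free synthetic data are assumptions. Each iteration computes residuals and gradients for all samples at once instead of looping over them.

```python
import numpy as np

def gradient_descent(x, y, lr=0.5, n_iter=2_000):
    """Fit y ~ b0 + b1 * x by minimizing mean squared error (full-batch GD)."""
    b0, b1 = 0.0, 0.0
    n = x.size
    for _ in range(n_iter):
        err = b0 + b1 * x - y            # residuals for all samples at once
        b0 -= lr * 2.0 * err.sum() / n   # gradient of MSE w.r.t. b0
        b1 -= lr * 2.0 * (err @ x) / n   # gradient of MSE w.r.t. b1
    return b0, b1

rng = np.random.default_rng(0)
x = rng.random(10_000)
y = 2.0 + 3.0 * x          # noise-free data, so the exact answer is (2, 3)
b0, b1 = gradient_descent(x, y)
print(f"b0={b0:.3f}, b1={b1:.3f}")  # b0=2.000, b1=3.000
```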

    When evaluating NumPy performance, the underlying library for vector and matrix computations matters. NumPy can use a default BLAS and LAPACK implementation. Depending on the distribution, alternatives may be included, such as OpenBLAS, Intel MKL or ATLAS. In general, these alternatives are faster than the default library. For example, SVD is 10X faster on Intel MKL.
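
    To check which BLAS and LAPACK libraries a given installation uses, np.show_config() prints the build configuration (its output format differs between NumPy versions). The sketch below then runs an SVD, which NumPy delegates to LAPACK, and verifies the factorization; the matrix size is an arbitrary assumption. Timing the svd call with timeit on different builds is one way to compare libraries.

```python
import numpy as np

# Print build information, including the linked BLAS/LAPACK libraries.
np.show_config()

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 300))

# Thin SVD: u is (500, 300), s is (300,), vt is (300, 300).
u, s, vt = np.linalg.svd(a, full_matrices=False)

# Check the factorization: a ~ U @ diag(s) @ Vt (u * s scales U's columns).
reconstructed = (u * s) @ vt
print(np.allclose(a, reconstructed))  # True
```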

    Hardware platforms may provide further acceleration. For example, Intel AVX2 provides at least 20% improvement on top of OpenBLAS.

  • Does NumPy automatically make use of GPU hardware?

    NumPy doesn't natively support GPUs. However, there are tools and libraries to run NumPy on GPUs.

    Numba is a Python compiler that can compile Python code to run on multicore CPUs and CUDA-enabled GPUs. Numba also understands NumPy and generates optimized compiled code. Developers can specify type signatures for Python functions or let Numba infer them from the arguments of the first call; Numba uses these types for just-in-time (JIT) compilation. The Numba team also provides pyculib, a Python interface to CUDA libraries such as cuBLAS, cuFFT and cuRAND.
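
    A sketch of CPU JIT compilation with Numba's njit decorator, used here without an explicit signature so types are inferred at the first call. The function is a hypothetical example of loop-heavy NumPy code. If Numba is not installed, the fallback decorator leaves the function as plain Python, so the example still runs, only without compilation.

```python
import numpy as np

try:
    from numba import njit   # compiles the decorated function on first call
except ImportError:          # Numba missing: fall back to uncompiled Python
    def njit(func):
        return func

@njit
def pairwise_distance_sum(points):
    """Sum of Euclidean distances between all pairs of rows in `points`."""
    n = points.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.0
            for k in range(points.shape[1]):
                diff = points[i, k] - points[j, k]
                d += diff * diff
            total += np.sqrt(d)
    return total

points = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
result = pairwise_distance_sum(points)   # distances 5 + 0 + 5
print(result)  # 10.0
```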

    Grumpy has been proposed as a framework to seamlessly target multicore CPUs and GPUs. It does a mix of JIT compilation and offloading to optimized libraries such as cuBLAS or LAPACK.

    CuPy is a Python library that implements NumPy-compatible arrays for CUDA-enabled GPUs and leverages CUDA GPU acceleration libraries. Since the APIs are very similar, CuPy is mostly a drop-in replacement for NumPy code. PyCUDA, which gives Python access to the CUDA API and also offers a NumPy-like GPU array type, is another option.
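
    A sketch of the drop-in style: the hypothetical softmax function is written once against an array module xp, then run with CuPy if it is installed and a CUDA device is usable, otherwise with NumPy. It uses cupy.asarray and cupy.asnumpy to move data to and from the GPU.

```python
import numpy as np

def softmax(xp, logits):
    """Row-wise softmax written once against an array module `xp` (NumPy or CuPy)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = xp.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])

try:
    import cupy as cp
    # Copy to GPU memory, compute there, and copy the result back to the host.
    probs = cp.asnumpy(softmax(cp, cp.asarray(logits)))
except Exception:  # CuPy not installed or no usable CUDA device
    probs = softmax(np, logits)

print(probs.sum(axis=1))
```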

    MinPy is similar to CuPy and is meant to be a NumPy interface above MXNet for building artificial neural networks. It includes automatic differentiation in addition to transparent CPU/GPU acceleration.

  • What are some essential resources to learn NumPy?

    The main NumPy website is the definitive resource to consult. Beginners can start by reading its Quickstart tutorial or the absolute beginner's guide. The latter includes the basics of installing NumPy.

    Rougier's book titled From Python to Numpy is aimed at Python programmers who wish to learn NumPy and its approach to vectorization. Perhaps a classic is the book Guide to NumPy by Travis E. Oliphant, who created NumPy.

    MATLAB users might want to read NumPy for Matlab users. It maps MATLAB operations to NumPy equivalents.
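
    A few common mappings, sketched with an illustrative 2x2 matrix; MATLAB equivalents are in the comments. Note that MATLAB's A' is the conjugate transpose, which equals A.T only for real matrices, and that NumPy indexes from 0.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # MATLAB: A = [1 2; 3 4]
B = np.eye(2)                            # MATLAB: B = eye(2)

at = A.T                 # MATLAB: A'    (transpose of a real matrix)
prod = A @ B             # MATLAB: A * B  (matrix product)
elem = A * A             # MATLAB: A .* A (elementwise product)
first_row = A[0, :]      # MATLAB: A(1,:)
inv = np.linalg.inv(A)   # MATLAB: inv(A)

print(at.tolist(), elem.tolist(), first_row.tolist())
```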

    The DataCamp blog has shared a handy NumPy cheatsheet.

    Those who wish to contribute to the NumPy project or study its source code can head to NumPy's GitHub repository.

Milestones

1995

Numeric is released to enable numerical computations. It's designed to provide homogeneous numeric arrays, that is, arrays whose elements all belong to the same data type and are therefore easier and faster to process.

2005

NumPy is released, based on an older library named Numeric. It also incorporates features of another library named Numarray. NumPy is initially named SciPy Core but is renamed to NumPy in January 2006.

Oct
2006

NumPy v1.0 is released.

Apr
2009

NumPy v1.3.0 is released. This release includes experimental Windows 64-bit support. Support for 64-bit OpenBLAS comes a decade later in December 2019.

Aug
2010

NumPy v1.5.0 is released. This is the first release to support Python 3.

Jan
2019

GitHub publishes a study of Machine Learning (ML) projects hosted on its platform. The study spans contributions from Jan-Dec 2018. It finds that 74% of Python ML projects import NumPy, followed by SciPy and Pandas.

Jul
2019

NumPy v1.17.0 is released. This release supports Python 3.5-3.7 and drops support for Python 2.7. NumPy v1.16.x is the last series to support Python 2.7; as a long-term support release, it is maintained until 2020. NumPy v1.16.6 is released in December 2019.

Feb
2020
Daily downloads of NumPy of Python 2 and 3 for Nov2019-Apr2020. Source: PyPI Stats 2020.

Following the end of life of Python 2 in January 2020, the number of downloads for older NumPy releases based on Python 2 falls sharply. By April 2020, 80% of NumPy downloads are based on Python 3.

References

  1. Candido, Renato. 2018. "Pure Python vs NumPy vs TensorFlow Performance Comparison." Real Python, May 7. Updated 2018-07-05. Accessed 2020-04-27.
  2. Cohen, Ori. 2019. "Is your Numpy optimized for speed?" Towards Data Science, on Medium, September 27. Accessed 2020-04-27.
  3. Cournapeau, David. 2018. "File:NumPy logo.svg." Wikipedia, August 29. Accessed 2020-04-27.
  4. Elliott, Thomas. 2019. "The State of the Octoverse: machine learning." Blog, GitHub, January 24. Accessed 2020-04-27.
  5. Fowler, Matt. 2016. "Speeding up Python and NumPy: C++ing the Way." Medium, March 20. Accessed 2020-04-27.
  6. Harris, Mark. 2013. "Numba: High-Performance Python with CUDA Acceleration." NVIDIA Developer Blog, September 19. Updated 2017-09-19. Accessed 2020-04-27.
  7. Jimenez, Athenas. 2016. "Improving Python performance for scientific tools and libraries." 01.org, Blog, Intel Open Source, Intel Corporation, May 13. Accessed 2020-04-27.
  8. Konrad, Markus. 2018. "Vectorization and parallelization in Python with NumPy and Pandas." WZB Data Science Blog, February 02. Accessed 2020-04-27.
  9. MinPy Docs. 2016. "NumPy under MinPy, with GPU." Distributed (Deep) Machine Learning Community, on Read the Docs, November 11. Accessed 2020-04-27.
  10. NVIDIA Developer. 2011. "PyCUDA." NVIDIA, October 02. Updated 2018-10-11. Accessed 2020-04-27.
  11. NumPy. 2020a. "Older Array Packages." Accessed 2020-04-27.
  12. NumPy. 2020b. "Homepage." NumPy. Accessed 2020-04-27.
  13. NumPy DevDocs. 2020. "NumPy: the absolute basics for beginners." April 26. Accessed 2020-04-27.
  14. NumPy Docs. 2020a. "Release Notes." NumPy, February 5. Accessed 2020-04-27.
  15. NumPy Docs. 2020b. "NumPy 1.17.0 Release Notes." NumPy, February 5. Accessed 2020-04-27.
  16. NumPy Docs. 2020c. "NumPy 1.16.0 Release Notes." NumPy, February 5. Accessed 2020-04-27.
  17. NumPy Docs. 2020d. "NumPy 1.5.0 Release Notes." NumPy, February 5. Accessed 2020-04-27.
  18. NumPy Docs. 2020e. "NumPy for Matlab users." NumPy, February 5. Accessed 2020-04-27.
  19. PyPI. 2020. "Release history." numpy, 1.18.3, April 20. Accessed 2020-04-27.
  20. PyPI Stats. 2020. "numpy." PyPI Stats, April 27. Accessed 2020-04-27.
  21. Ravishankar, Mahesh, and Vinod Grover. 2019. "Automatic acceleration of Numpy applications on GPUs and multicore CPUs." arXiv, v1, January 11. Accessed 2020-04-27.
  22. Ross, Paul. 2014. "The Performance of Python, Cython and C on a Vector." Notes on Cython, October 6. Accessed 2020-04-27.
  23. Rougier, Nicolas P. 2017. "From Python to Numpy." May. Accessed 2020-04-27.
  24. SciPy. 2020. "Frequently Asked Questions." SciPy. Accessed 2020-04-27.
  25. SciPy GitHub. 2020. "SciPy: History_of_SciPy." Accessed 2020-04-27.
  26. Seif, George. 2019. "Here’s How to Use CuPy to Make Numpy Over 10X Faster." Towards Data Science, on Medium, August 22. Accessed 2020-04-27.
  27. UCF. 2020. "Python Lists vs. Numpy Arrays - What is the difference?" webcourses@UCF, IST Advanced Topics Primer, Univ. of Central Florida. Accessed 2020-04-27.
  28. Waters, John K. 2020. "Python 2 Officially Hits End of Life, Final Few Fixes Coming April 2020." ADTMag, 1105 Media Inc., January 09. Accessed 2020-04-27.

Further Reading

  1. NumPy DevDocs. 2020. "NumPy: the absolute basics for beginners." April 26. Accessed 2020-04-27.
  2. Harris, Mark. 2013. "Numba: High-Performance Python with CUDA Acceleration." NVIDIA Developer Blog, September 19. Updated 2017-09-19. Accessed 2020-04-27.
  3. Zelenka, Scott. 2018. "How to shrink NumPy, SciPy, Pandas, and Matplotlib for your data product." Towards Data Science, on Medium, September 25. Accessed 2020-04-27.


Cite As

Devopedia. 2020. "NumPy." Version 5, May 19. Accessed 2024-06-25. https://devopedia.org/numpy
Contributed by 2 authors

Last updated on 2020-05-19 06:15:51