# NumPy

NumPy is an open source Python library that enables efficient manipulation of multi-dimensional numerical data structures. These are called arrays in NumPy. NumPy is an alternative to Interactive Data Language (IDL) and MATLAB.

Since its release in 2005, NumPy has become a fundamental package for numerical and scientific computing in Python. In addition to efficient data structures and operations on them, it provides many high-level mathematical functions that aid scientific computation. Pandas, SciPy, Matplotlib, scikit-learn and scikit-image are just a few popular scientific packages that make use of NumPy.

## Discussion

• What does NumPy do differently from core Python?

Python is slower than compiled languages such as C but it's easy to learn. Python is suited for rapid prototyping and iterative development.

While Python's list data type can be used to construct multi-dimensional data structures (lists containing lists), NumPy is faster and provides a better API for developers. Python's lists are general purpose. They can contain data of different types. This means that types are also stored, type-dispatching code is invoked at runtime and types are checked. Lists are processed using loops or comprehensions and can't be vectorized to support elementwise operations. NumPy sacrifices some of Python's flexibility to improve performance.

Specifically, NumPy is better at these aspects:

• Size: NumPy data structures take up less space. Each Python integer object takes 28 bytes whereas in NumPy an integer is just 8 bytes. A Python list of n items requires 64+8n+28n bytes whereas in NumPy it's 96+8n bytes.
• Performance: NumPy code runs faster than Python code, particularly for large input data.
• Functionality: NumPy provides lots of functions and methods to simplify operations. High-level operations such as linear algebra are also included.
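
The size figures above can be checked with a short sketch. The two formulas are the ones quoted in the text and assume 64-bit CPython; the function names and `n = 1000` are illustrative choices, and measured values may differ somewhat across Python and NumPy versions.

```python
import sys
import numpy as np

def list_bytes_estimate(n):
    # Formula quoted above: list header, plus an 8-byte pointer
    # and a 28-byte integer object per item.
    return 64 + 8 * n + 28 * n

def array_bytes_estimate(n):
    # Formula quoted above: array header plus 8 bytes per 64-bit integer.
    return 96 + 8 * n

n = 1000
print(list_bytes_estimate(n))   # 36064
print(array_bytes_estimate(n))  # 8096

# Measured on the running interpreter (figures vary by version and platform).
py_list = list(range(1000, 1000 + n))
list_measured = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
arr = np.arange(n, dtype=np.int64)
print(list_measured, arr.nbytes)  # arr.nbytes counts the data buffer only
```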

• What are some of the main features of NumPy?

NumPy arrays are homogeneous, meaning that all elements of an array are of the same type. Hence, no per-element type checking is required at runtime. All elements of an array take up the same amount of space.
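
As a small illustration of homogeneity, the sketch below builds an array from mixed integers and floats. The variable names are illustrative.

```python
import numpy as np

mixed_list = [1, 2.5, "three"]   # a Python list may hold any mix of types
arr = np.array([1, 2.5, 3])      # the integers are upcast to a common float type

print(arr.dtype)     # float64
print(arr.itemsize)  # 8 (bytes per element)
print(arr.nbytes)    # 24
```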

The spacing between elements along an axis is also constant. This is called striding. This is useful when the same data in memory can be used to create a new array without copying. Different arrays are therefore different views into memory. Thus, it's easier to modify data subsets in memory.
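
Strides and views can be inspected directly. This is a minimal sketch using a 3×4 array of 64-bit integers; the array contents are arbitrary.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)
print(a.strides)               # (32, 8): 32 bytes to the next row, 8 to the next column

v = a[:, ::2]                  # every second column; no data is copied
print(v.strides)               # (32, 16)
print(np.shares_memory(a, v))  # True

v[0, 0] = 100                  # writing through the view changes the original
print(a[0, 0])                 # 100
```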

Operations are vectorized, which means that a single operation is applied to all elements of an array in compiled code and can be executed in parallel on multiple elements. This speeds up computation. Developers need not write explicit for loops.
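
The contrast between a Python loop and a vectorized expression might look like this. The data and the formula `v * v + 1` are made up for illustration.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]

# Pure Python: an explicit loop (here a comprehension) over the elements
squared_list = [v * v + 1.0 for v in values]

# NumPy: one expression applied to every element
arr = np.array(values)
squared_arr = arr * arr + 1.0

print(squared_list)          # [2.0, 5.0, 10.0, 17.0]
print(squared_arr.tolist())  # [2.0, 5.0, 10.0, 17.0]
```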

NumPy provides APIs for easy manipulation of arrays. Some of these are indexing, slicing, reshaping, stacking and splitting. Broadcasting is a feature that allows operations between arrays and scalars, or between arrays of different but compatible shapes.
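
The manipulations named above can be sketched on a small array. The values are arbitrary.

```python
import numpy as np

a = np.arange(6)                    # [0 1 2 3 4 5]
m = a.reshape(2, 3)                 # reshaping: 2 rows, 3 columns
print(m[1, 2])                      # indexing -> 5
print(m[:, 1:].tolist())            # slicing  -> [[1, 2], [4, 5]]

stacked = np.vstack([m, m])         # stacking: shape (4, 3)
top, bottom = np.split(stacked, 2)  # splitting into two (2, 3) arrays

# Broadcasting: the scalar and the length-3 vector are stretched to shape (2, 3)
print((m * 10).tolist())                         # [[0, 10, 20], [30, 40, 50]]
print((m + np.array([100, 200, 300])).tolist())  # [[100, 201, 302], [103, 204, 305]]
```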

NumPy integrates easily with C/C++ or Fortran code that may provide optimized implementations. Useful functions covering linear algebra, Fourier transform, and random numbers are provided.
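
A brief sketch of the linear algebra, Fourier transform and random number modules follows. The matrix, signal and seed are illustrative choices.

```python
import numpy as np

# Linear algebra: solve A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)               # approximately [2. 3.]

# Fourier transform: a constant signal has energy only in the zero-frequency bin
spectrum = np.fft.fft(np.ones(4))
print(spectrum.real)   # approximately [4. 0. 0. 0.]

# Random numbers: a seeded generator gives reproducible draws
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
print(samples.shape)   # (5,)
```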

• Could you share some performance numbers comparing NumPy versus Python implementations?

For a simple computation of mean and standard deviation of a million floating point numbers, NumPy was 30X faster than a pure Python implementation. However, optimized Cython and C implementations were even faster. Another study showed that if input is small (less than 200 numbers), pure Python did better than NumPy. For inputs greater than about 15,000 numbers, NumPy outperformed C++.
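
A comparison of this kind can be reproduced roughly as follows. This is only a sketch, not the benchmark cited above: the function names, input size and repeat count are assumptions, and timings depend heavily on the machine and on the NumPy build.

```python
import math
import timeit
import numpy as np

def mean_std_python(values):
    # Population mean and standard deviation with plain Python iteration
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)

def mean_std_numpy(arr):
    # Same quantities computed by NumPy's compiled routines
    return arr.mean(), arr.std()

rng = np.random.default_rng(0)
arr = rng.random(100_000)
values = arr.tolist()

t_py = timeit.timeit(lambda: mean_std_python(values), number=5)
t_np = timeit.timeit(lambda: mean_std_numpy(arr), number=5)
print(f"pure Python: {t_py:.4f} s, NumPy: {t_np:.4f} s")
```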

One experiment in Machine Learning compared pure Python, NumPy and TensorFlow (on CPU) implementations of gradient descent. Runtimes were 18.65, 0.32 and 1.20 seconds respectively. NumPy was more than 50X faster than pure Python. For more complex ML problems deployed on multiple GPUs, TensorFlow is likely to outperform NumPy.
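
For readers unfamiliar with the technique, a vectorized gradient descent in NumPy might look like the sketch below. It is not the code from the experiment above: the linear model, synthetic data, learning rate and iteration count are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic, noise-free data from y = 3x + 2 (illustrative only)
x = np.linspace(0.0, 1.0, 200)
y = 3.0 * x + 2.0
X = np.column_stack([np.ones_like(x), x])   # columns: bias term, feature

w = np.zeros(2)   # [intercept, slope]
lr = 0.5          # learning rate (assumed value)
for _ in range(1000):
    residual = X @ w - y
    grad = 2.0 / len(y) * (X.T @ residual)  # gradient of the mean squared error
    w -= lr * grad

print(w)          # approaches [2. 3.]
```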

When evaluating NumPy performance, the underlying library for vector/matrix computations matters. NumPy comes with default BLAS and LAPACK implementations. Depending on the distribution, alternatives may be included: OpenBLAS, Intel MKL, ATLAS, etc. In general, these alternatives are faster than the default library. For example, SVD is 10X faster on Intel MKL.
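
To see which BLAS/LAPACK libraries a given installation uses, `np.show_config()` prints the build configuration. The sketch below also runs an SVD, whose speed depends on that library; the matrix size is an arbitrary choice.

```python
import numpy as np

# Print the BLAS/LAPACK libraries this NumPy build is linked against
np.show_config()

# SVD is delegated to LAPACK, so its speed depends on the library reported above
rng = np.random.default_rng(0)
M = rng.random((200, 100))
U, s, Vt = np.linalg.svd(M, full_matrices=False)
print(U.shape, s.shape, Vt.shape)    # (200, 100) (100,) (100, 100)
print(np.allclose(M, (U * s) @ Vt))  # True: the factors reconstruct M
```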

Hardware platforms may provide further acceleration. For example, Intel AVX2 provides at least 20% improvement on top of OpenBLAS.

• Does NumPy automatically make use of GPU hardware?

NumPy doesn't natively support GPUs. However, there are tools and libraries to run NumPy on GPUs.

Numba is a Python compiler that can compile Python code to run on multicore CPUs and CUDA-enabled GPUs. Numba also understands NumPy and generates optimized compiled code. Developers can specify type signatures for Python functions, which Numba uses for just-in-time (JIT) compilation. The Numba team also provides pyculib, which is a Python interface to CUDA libraries such as cuBLAS, cuFFT and cuRAND.
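
A minimal CPU-only sketch of Numba's JIT decorator is shown below. It relies on Numba inferring types at the first call rather than on an explicit signature. The function name is illustrative, and the `ImportError` fallback is an addition so that the example still runs where Numba isn't installed.

```python
import numpy as np

try:
    from numba import njit         # JIT decorator; types are inferred on first call
except ImportError:                # fallback so the sketch still runs without Numba
    def njit(func):
        return func

@njit
def sum_of_squares(arr):
    total = 0.0
    for i in range(arr.shape[0]):  # explicit loop that Numba compiles to machine code
        total += arr[i] * arr[i]
    return total

data = np.arange(1_000, dtype=np.float64)
print(sum_of_squares(data))        # 332833500.0
```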

Grumpy has been proposed as a framework to seamlessly target multicore CPUs and GPUs. It does a mix of JIT compilation and offloading to optimized libraries such as cuBLAS or LAPACK.

CuPy is a Python library that implements NumPy arrays for CUDA-enabled GPUs and leverages CUDA GPU acceleration libraries. The code is mostly a drop-in replacement for NumPy code since the APIs are very similar. PyCUDA is a related library that gives Python programs access to NVIDIA's CUDA API.

MinPy is similar to CuPy and is meant to be a NumPy interface above MXNet for building artificial neural networks. It includes auto differentiation in addition to transparent CPU/GPU acceleration.

• What are some essential resources to learn NumPy?

The main NumPy website is the definitive resource to consult. Beginners can start by reading its Quickstart tutorial or the absolute beginner's guide. The latter includes the basics of installing NumPy.

Rougier's book titled From Python to Numpy focuses on Python programmers who wish to learn NumPy and its vectorization. Perhaps a classic is the book titled Guide to NumPy, by Travis E. Oliphant who created NumPy.

MATLAB users might want to read NumPy for Matlab users. It maps MATLAB operations to NumPy equivalents.

The DataCamp blog has shared a handy NumPy cheatsheet.

Those who wish to contribute to the NumPy project or study its source code can head to NumPy's GitHub repository.

## Milestones

1995

Numeric is released to enable numerical computations. It's designed to provide homogeneous numeric arrays, that is, arrays whose elements all belong to the same data type and are therefore easier and faster to process.

2005

NumPy is released based on an older library named Numeric. It also combines features of another library named Numarray. NumPy is initially named SciPy Core but is renamed to NumPy in January 2006.

Oct 2006

NumPy v1.0 is released.

Apr 2009

NumPy v1.3.0 is released. This release includes experimental Windows 64-bit support. Support for 64-bit OpenBLAS comes a decade later in December 2019.

Aug 2010

NumPy v1.5.0 is released. This is the first release to support Python 3.

Jan 2019

GitHub publishes a study of Machine Learning (ML) projects hosted on their platform. The study spans contributions from Jan-Dec 2018. It's seen that 74% of ML Python projects import NumPy. This is followed by SciPy and Pandas.

Jul 2019

NumPy v1.17.0 is released. This release supports Python 3.5-3.7 but drops support for Python 2.7. NumPy v1.16.x is the last series to support Python 2.7. Being a long-term release, v1.16.x will be maintained till 2020. NumPy v1.16.6 is released in December 2019.

Feb 2020

Following the end of life of Python 2 in January 2020, the number of downloads for older NumPy releases based on Python 2 falls sharply. By April 2020, 80% of NumPy downloads are based on Python 3.

## Cite As

Devopedia. 2020. "NumPy." Version 5, May 19. Accessed 2023-05-02. https://devopedia.org/numpy