Python for Scientific Computing
 Summary

Fortran has been the language of choice for scientific computing for many decades because of its speed. In the 1980s, as a programmer's time became more valuable than compute time, there was a need for languages that were easier to learn and use. For research purposes, the code-compile-execute workflow gave way to an interact-explore-visualize workflow. In this context were born MATLAB, IDL, Mathematica and Maple.
Modern scientific computing is not just about numerical computing. It needs to be versatile: deal with large datasets, offer richer data structures than just numerical arrays, make network calls, interface with databases, interoperate with web apps, handle data in various formats, and enable team collaboration and easy documentation.
Python offers all of the above. It has been said that "Python provides a balance of clarity and flexibility without sacrificing performance."
Discussion
What makes Python a suitable language for scientific computing?
Python is not just suited for manipulating numbers. It offers a "computational ecosystem" that can fulfil the needs of a modern scientist. Python does well in system integration, gluing together many different parts contributed by different people. Python's duck typing is one of the reasons why this is possible. In terms of data types, memoryview, PyCapsule and NumPy's array aid scientific work.
Python is easy to learn and use. It offers a natural syntax. This enables researchers to express and explore their ideas more directly rather than fight with low-level language syntax.
With Python, performance bottlenecks can be optimized at a low level without sacrificing high-level usability. A researcher needs to explore and visualize ideas incrementally. Python allows for this via IPython/Jupyter notebooks and matplotlib. A variety of Python tools can work together and share data within the same runtime environment without having to exchange data only via the filesystem.
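As a minimal sketch of how tools can share data inside one runtime, the snippet below wraps the same block of memory with the standard library's array type, a memoryview and a NumPy array. The variable names are illustrative, and NumPy is assumed to be installed.

```python
from array import array

import numpy as np

# A buffer of C doubles owned by the standard library's array type.
samples = array("d", [1.0, 2.0, 3.0, 4.0])

# memoryview exposes the underlying buffer without copying it.
view = memoryview(samples)

# NumPy wraps the same memory through the buffer protocol (no copy is made).
values = np.frombuffer(view, dtype=np.float64)

# A write through the NumPy array is visible in the original container.
values[0] = 10.0
print(samples[0], view[0])
```

Because all three objects refer to the same memory, no data has to be serialized or written to disk to move it between them.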
I'm used to MATLAB. Why should I use Python?
MATLAB is proprietary, expensive and hard to extend. Python is open, community-driven, portable, powerful and extensible. Python is also better with strings, namespaces, classes and GUIs. While MATLAB, along with Simulink, has vast libraries, Python is catching up as many scientific projects adopt it. MATLAB is said to be poor at scalability, complex data structures, memory handling, system tasks and database programming.
However, Python has drawn some criticism (as of December 2013). Syntax is not consistent across packages since they are written by different people with different needs. For example, there are two ordinary differential equation (ODE) solvers in scipy with incompatible syntax. Duplicated functionality across packages may result in confusion. MATLAB does better with data regression, boundary value problems and partial differential equations (PDE).
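The kind of mismatch described above can be sketched with two SciPy solvers. The text does not name the solvers it has in mind, so the choice of odeint and solve_ivp here is an assumption (solve_ivp was added to SciPy after 2013); the decay equation and the constant are purely illustrative.

```python
import numpy as np
from scipy.integrate import odeint, solve_ivp

DECAY_RATE = 0.5  # illustrative constant for dy/dt = -k * y


def rhs_state_first(y, t):
    """Right-hand side in odeint's default order: state first, then time."""
    return -DECAY_RATE * y


def rhs_time_first(t, y):
    """Right-hand side in solve_ivp's order: time first, then state."""
    return -DECAY_RATE * y


times = np.linspace(0.0, 2.0, 5)

# odeint takes (func, y0, t) and returns an array of shape (len(t), n_states).
y_odeint = odeint(rhs_state_first, [1.0], times)[:, 0]

# solve_ivp takes (fun, t_span, y0) and returns a result object whose
# .y attribute has shape (n_states, n_times).
result = solve_ivp(rhs_time_first, (0.0, 2.0), [1.0], t_eval=times,
                   rtol=1e-8, atol=1e-10)
y_solve_ivp = result.y[0]

# Both should agree with the analytic solution exp(-k * t).
exact = np.exp(-DECAY_RATE * times)
print(np.allclose(y_odeint, exact, atol=1e-5))
print(np.allclose(y_solve_ivp, exact, atol=1e-5))
```

The two calls solve the same problem, yet the callback argument order, the call signature and the layout of the returned values all differ.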
In 2014, Konrad Hinsen commented that Python may not be suitable for small-scale projects where code is written once and rarely maintained thereafter. This becomes a problem when Python scientific libraries are upgraded by deprecating older classes/functions/methods.
Should I worry about performance when using Python for scientific research?
The short answer is no. There are many initiatives that aim to make Python faster. PyPy and Pyston do just-in-time (JIT) compilation for better performance. Nuitka transpiles Python code to C, which is then compiled into fast native code. Numba compiles math-heavy Python code to native machine instructions with just a few annotations on your Python code.
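To give a sense of what "a few annotations" means for Numba, here is a hedged sketch. The function root_mean_square is a made-up example, and the fallback decorator is only there so the snippet still runs (uncompiled) when Numba is not installed.

```python
import math

import numpy as np

try:
    from numba import njit  # real Numba decorator, if the package is installed
except ImportError:
    def njit(func):
        """Fallback so the sketch still runs, uncompiled, without Numba."""
        return func


@njit
def root_mean_square(values):
    """Plain Python loop; with Numba present it is compiled on first call."""
    total = 0.0
    for v in values:
        total += v * v
    return math.sqrt(total / len(values))


rms = root_mean_square(np.array([3.0, 4.0]))
print(rms)
```

The body of the function is ordinary Python; the decorator is the only change needed to ask Numba to compile it.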
While pure Python code is definitely slower when compared to Fortran or C, scientific packages in Python often make use of low-level implementations that are themselves written in Fortran, C, etc. For example, NumPy operations often call BLAS or LAPACK functions that are written in Fortran.
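The difference between interpreted loops and compiled routines can be sketched as follows. The helper dot_pure_python is hypothetical, and whether np.dot and np.linalg.solve actually dispatch to BLAS and LAPACK depends on how NumPy was built.

```python
import numpy as np


def dot_pure_python(a, b):
    """Dot product with an explicit loop executed by the interpreter."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


rng = np.random.default_rng(seed=0)
a = rng.random(1000)
b = rng.random(1000)

slow = dot_pure_python(a, b)   # every multiply-add runs in the interpreter
fast = np.dot(a, b)            # handed to compiled code, typically BLAS

# Solving a linear system is likewise delegated, typically to LAPACK.
matrix = np.array([[3.0, 1.0], [1.0, 2.0]])
rhs = np.array([9.0, 8.0])
solution = np.linalg.solve(matrix, rhs)

print(np.isclose(slow, fast))
print(solution)
```

Both dot products give the same value up to rounding; only where the arithmetic is executed changes.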
f2py enables Python to directly use Fortran implementations. SWIG and Cython allow us to call optimized C/C++ implementations from within Python. For example, scikit-learn uses Cython. Intel Math Kernel Library (MKL) and PyCUDA are also bringing Python on par with Fortran on specific hardware platforms.
What are the essential packages for scientific computing in Python?
Here are some packages that could be considered essential:
 numpy: Multidimensional arrays and operations on them. Executes faster than pure Python.
 scipy: Linear algebra, interpolation, integration, FFT...
 matplotlib: Plotting and data visualization with an API similar to MATLAB. Eases transition from MATLAB to Python.
 spyder: An IDE that includes IPython (for interactive computing).
 pandas: Numerical data structures, data manipulation and analysis.
 jupyter: Web-based sharing of code, graphs, annotations and results.
Most Python scientific packages are based on numpy and scipy. If visualization is involved, matplotlib may be used. For higher-level data structures, pandas may be used. Spyder, IPython and Jupyter are simply useful tools for the scientist or engineer.
Could you name some useful scientific projects/packages in Python?
Here are some that can be applied to any domain:
 Image processing: Pillow, OpenCV, scikit-image, Mahotas
 Visualization: matplotlib, bokeh, plotly, mayavi, seaborn, basemap, NetworkX
 Markov chains: Markov, MarkovNetwork, PyMarkov, pyEMMA, hmmus
 Stochastic process: stochastic, StochPy, sdeint
 Solving PDEs: FEniCS, SfePy
 Convex Optimization: CVXPY
 Units and conversions: quantities, pint
 Multiprecision math: mpmath, GmPy
 Spatial analysis: cartopy, georasters, PySAL, geopy
 Data access: pydap, cubes, Blaze, bottleneck, pytables
 Machine learning: scikit-learn, Mlpy, TensorFlow, Theano, Caffe, Keras
 Natural Language Processing: NLTK, TextBlob, spaCy, gensim, pycorenlp
 Statistics: statistics, statsmodels, patsy
 Tomography: TomoPy
 Symbolic computing: sympy
 Simulations: SimPy, BlockCanvas, railgun
Could you name some domain-specific scientific projects/packages in Python?
Since there are dozens of packages for all types of scientific work, we can only give a sample:
 Solar data analysis: SunPy
 Astronomy: astropy
 Chemistry: thermo, chemlab
 Biology: biopython
 Neurosciences: PsychoPy, NIPY
 Life sciences: DEAP
 Network analysis: NetworkX
 Quantum dynamics: QuTiP
 Protein analysis: ProDy
 Neuron simulations: NEURON
 Seismology: ObsPy
 Phylogenetic computing: DendroPy
 Software defined radio: GNU Radio
What's the recommended Python distribution for scientific computing?
Installing Python for scientific work used to be painful, but with modern distributions this is no longer an issue.
Rather than install Python's standard distribution and then install scientific packages one by one, the recommended approach is to use an alternative distribution customized for scientific computing: Enthought Canopy, Anaconda, Python(x,y) or WinPython. Enthought Canopy is commercial but the rest are free. Enthought Canopy claims to include 450+ tested scientific and analytic packages.
The Anaconda distribution uses conda for package management. The use of virtual environments is recommended so that different projects can use their own specific environments. Powered by Anaconda, Intel offers its own distribution that's optimized for performance.
SageMath is another distribution that offers a web-based interface and uses Jupyter notebooks. It aims to be the free, open source alternative to Magma, Maple, Mathematica and MATLAB.
As a beginner in scientific Python, what should be my learning path?
From tools and environment perspectives, get familiar with using IPython, Jupyter Notebook and optionally Spyder.
After learning the basics of Python, the next step is to learn numpy since it's the base for many scientific packages. With numpy, you can work with matrices and do vectorized operations without having to write explicit loops. You should learn about operations such as reshaping, transposing, filling, copying, concatenating, flattening, broadcasting, filtering and sorting.
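The operations named above can be tried out on a tiny array. This is only an illustrative sketch with made-up variable names, assuming NumPy is installed.

```python
import numpy as np

a = np.arange(6)                          # [0 1 2 3 4 5]
squares = a ** 2                          # vectorized: no explicit loop

m = a.reshape(2, 3)                       # reshaping: 2 rows, 3 columns
t = m.T                                   # transposing: shape (3, 2)
z = np.full((2, 3), 7)                    # filling a new array with a constant
c = m.copy()                              # copying: independent of m
c[0, 0] = 99                              # does not change m

stacked = np.concatenate([m, z], axis=0)  # concatenating: shape (4, 3)
flat = stacked.flatten()                  # flattening back to 1-D
scaled = m * np.array([1, 10, 100])       # broadcasting one row across all rows
evens = a[a % 2 == 0]                     # filtering with a boolean mask
ordered = np.sort(np.array([3, 1, 2]))    # sorting

print(scaled)
print(evens, ordered)
```

Note that reshape and transpose return views of the same data where possible, while copy and flatten always allocate new arrays.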
You could then learn scipy to do optimization, linear algebra, integration, and so on. For visualization, matplotlib can be a starting point. For dealing with higher-level data structures and manipulation, learn pandas.
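A first taste of scipy might look like the sketch below, which touches integration, optimization and linear algebra. The problems are toy examples chosen because their answers are known analytically; SciPy and NumPy are assumed to be installed.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: area under sin(x) between 0 and pi (analytically 2).
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: minimum of a simple parabola (analytically at x = 2).
best = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

# Linear algebra: solve A @ x = b, whose exact solution is x = [2, 3].
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

print(area, best.x, x)
```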
If you wish to get into data science, scikit-learn and Theano can be starting points. For statistical modelling, you can learn statsmodels.
What useful developer resources are available for scientific computing in Python?
One good place to start learning is the SciPy Lecture Notes.
SciPy Conference is an annual event for Python's scientific community. It also happens in Europe as EuroSciPy and in India as SciPy India.
EarthPy is a collection of IPython notebooks for learning how to apply Python to Earth sciences.
The Hacker Within, Software Carpentry and Data Carpentry are some communities that bring together researchers and scientists. Although these are not exclusive to Python, Python programmers will find them useful.
Milestones
1995
Numeric is released to enable numerical computations. This is the ancestor of today's NumPy.
2001
Many scientific modules are brought together and released as a single package named SciPy. The same year, IPython is born.
2002
"Python for Scientific Computing Workshop" is organized at Caltech. In 2004, this is renamed as SciPy Conference and is now an annual event. In 2008, EuroSciPy is held for the first time. In 2009, the first SciPy India is held.
2005
NumPy is released based on an older library named Numeric. It also combines features of another library named Numarray. NumPy is initially named SciPy Core but renamed to NumPy in January 2006.
2017
Anaconda Accelerate is split into Intel Distribution for Python and open source Numba's subprojects pyculib, pyculib_sorting and data_profiler.
References
 Anaconda Docs. 2018. "Anaconda Accelerate." Accessed 2018-03-26.
 Bobriakov, Igor. 2017. "Top 15 Python Libraries for Data Science in 2017." Medium, May 9. Accessed 2018-02-28.
 CoCalc. 2020. "Python Environments." Collaborative Calculation and Data Science. Accessed 2020-07-22.
 corochann. 2017. "Setup python environment." corochannNote, July 15. Accessed 2018-03-26.
 d'Avezac, Mayeul. 2014. "How can I choose the right programming language for a computational physics project?" ResearchGate. Accessed 2018-03-26.
 EliteDataScience. 2017. "5 Heroic Python NLP Libraries." February 5. Accessed 2018-02-28.
 Enthought. 2018. "Enthought Canopy: The Python Platform of Choice for Scientists and Engineers." Accessed 2018-03-26.
 Hinsen, Konrad. 2014. "The state of NumPy." Konrad Hinsen's Blog, September 12. Accessed 2018-03-26.
 Hinsen, Konrad. 2017. "Why Python does so well in scientific computing." Konrad Hinsen's Blog, September 12. Accessed 2018-03-26.
 Hirsch, Michael. 2018. "Speed of Matlab vs. Python Numpy Numba CUDA vs Julia vs IDL." SciVision, Inc., January 13. Accessed 2018-03-26.
 Intel Software. 2018. "Intel Distribution for Python: Accelerate Python Performance, Powered by Anaconda." Accessed 2018-03-26.
 Keenan, Tyler. 2016. "15 Python Libraries for Data Science." Upwork, June 28. Updated 2018-03-25. Accessed 2018-02-28.
 Kitchin, John. 2013. "Python as alternative to Matlab for engineering calculations." The Kitchin Research Group, December 30. Accessed 2018-02-28.
 Kumar E K, Vipin, Ying H, and Jing X. 2012. "Numpy/Scipy with Intel® MKL and Intel® Compilers." Intel Software, June 28. Updated 2017-11-19. Accessed 2018-03-26.
 Lin, Johnny Wei-Bing. 2012. "Why Python Is the Next Wave in Earth Sciences Computing." Bulletin of the American Meteorological Society, 93(12), pp. 1823–1824. Accessed 2018-03-26.
 Mahotas Docs. 2016. "Mahotas: Computer Vision in Python." Version 1.4.3, October 3. Accessed 2018-02-28.
 Millman, Jarrod and Travis Vaught. 2008. "The State of SciPy." Proc. of 10th Python in Science Conference (SciPy 2008), pp. 5-10. Accessed 2018-02-28.
 NumPy.org. 2018. "Older Array Packages." Accessed 2018-02-28.
 Pansop. 2015. "9 Python Analytics Libraries." Data Science Central, May 21. Accessed 2018-03-25.
 Perez, F., B. E. Granger, and J. D. Hunter. 2011. "Python: An Ecosystem for Scientific Computing." Computing in Science & Engineering, vol. 13, no. 2, pp. 13-21, March-April. Accessed 2018-02-28.
 PyPI. 2018a. "Index of Packages Matching 'markov'." Accessed 2018-02-28.
 PyPI. 2018b. "Index of Packages Matching 'stochastic'." Accessed 2018-02-28.
 Python Wiki. 2017. "Numeric and Scientific." August 7. Accessed 2018-02-28.
 Pyzo. 2016. "Python vs Matlab." Accessed 2018-03-26.
 Pölsterl, Sebastian. 2020. "sebp/pythonscientificcomputing.md." Gist, GitHub, July 9. Accessed 2020-07-22.
 Reitz, Kenneth, and Tanya Schlusser. 2020. "Scientific Applications." In: The Hitchhiker’s Guide to Python. Accessed 2020-07-22.
 Rossant, Cyrille. 2013. "Why use Python for scientific computing?" July 1. Accessed 2018-03-26.
 Science.gov. 2020. "Sample records for python mixture package." Topics, Science.gov. Accessed 2020-07-22.
 SciPy. 2020. "Scientific computing tools for Python." SciPy. Accessed 2020-07-22.
 SciPy GitHub. 2018. "SciPy: History_of_SciPy." Accessed 2018-02-28.
 Stanford NLP GitHub. 2018. "Using Stanford CoreNLP within other programming languages and packages." CoreNLP, v3.9.1. Accessed 2018-02-28.
 STX Next. 2017. "The most popular Python scientific libraries." April 12. Accessed 2018-02-28.
 van der Walt, Stéfan, and Jarrod Millman. 2011. "Preface." Proc. of 10th Python in Science Conference (SciPy 2011), pp. 1-3. Accessed 2018-02-28.
 van der Walt, Stéfan, S. Chris Colbert, and Gaël Varoquaux. 2011. "The NumPy array: a structure for efficient numerical computation." arXiv, February 8. Accessed 2018-02-28.
 VanderPlas, Jake. 2017. "The Unexpected Effectiveness of Python in Science." Slides from PyCon 2017, on SpeakerDeck, May 19. Accessed 2020-04-27.
 Wikipedia. 2020. "SciPy." Wikipedia, July 5. Accessed 2020-07-22.
 Yegulalp, Serdar. 2015. "5 projects that push Python performance." InfoWorld, February 9. Accessed 2018-03-26.
Further Reading
 Rossant, Cyrille. 2013. "Why use Python for scientific computing?" July 1. Accessed 2018-03-26.
 Perez, F., B. E. Granger, and J. D. Hunter. 2011. "Python: An Ecosystem for Scientific Computing." Computing in Science & Engineering, vol. 13, no. 2, pp. 13-21, March-April. Accessed 2018-02-28.
 Bobriakov, Igor. 2017. "Top 15 Python Libraries for Data Science in 2017." Medium, May 9. Accessed 2018-02-28.
 Krill, Paul. 2012. "Van Rossum: Python is not too slow." InfoWorld, March 16. Accessed 2018-02-28.
 Oliphant, Travis E. 2012. "NumPy and SciPy: History and Ideas for the Future." SciPy. Accessed 2018-02-28.
 Rao, Vinay. 2018. "Accelerating Python for scientific research." IBM Developer, April 04. Accessed 2019-02-06.
See Also
 Python ML Frameworks
 NumPy
 SciPy
 Pandas
 Scikit-learn
 IPython