Python for Scientific Computing
Fortran has been the language of choice for scientific computing for many decades because of its speed. In the 1980s, when a programmer's time was becoming more valuable than compute time, there was a need for languages that were easier to learn and use. For research purposes, the code-compile-execute workflow gave way to an interact-explore-visualize workflow. In this context, MATLAB, IDL, Mathematica and Maple were born.
Modern scientific computing is not just about numerical computing. It needs to be versatile: deal with large datasets, offer richer data structures than just numerical arrays, make network calls, interface with databases, interwork with web apps, handle data in various formats, and enable team collaboration and easy documentation.
Python offers all of the above. It's been said that "Python provides a balance of clarity and flexibility without sacrificing performance."
Discussion
- What makes Python a suitable language for scientific computing?
Python is not just suited for manipulating numbers. It offers a "computational ecosystem" that can fulfil the needs of a modern scientist. Python does well in system integration, in gluing together many different parts contributed by different folks. Python's duck typing is one of the reasons why this is possible. In terms of data types, memoryview, PyCapsule and NumPy's array aid scientific work.
Python is easy to learn and use. It offers a natural syntax. This enables researchers to express and explore their ideas more directly rather than fight with low-level language syntax.
With Python, performance bottlenecks can be optimized at a low level without sacrificing high-level usability. A researcher needs to explore and visualize ideas incrementally. Python allows for this via IPython/Jupyter notebooks and matplotlib. A variety of Python tools can work together and share data within the same runtime environment, rather than exchanging data only via the filesystem.
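As a minimal sketch of how memoryview lets separate pieces of code share data inside one runtime without copying, the example below uses only the standard library. The sample values and the helper name `scale_in_place` are illustrative assumptions, not taken from any particular package.

```python
from array import array


def scale_in_place(view, factor):
    """Multiply every element of a writable buffer by factor, in place."""
    for i in range(len(view)):
        view[i] = view[i] * factor


# A typed array of C doubles owned by one piece of code ...
samples = array("d", [1.0, 2.0, 3.0, 4.0])

# ... exposed to another piece of code as a memoryview: no copy is made.
view = memoryview(samples)

# Slicing a memoryview also shares memory with the original buffer,
# so only the two middle elements are rescaled here.
scale_in_place(view[1:3], 10.0)

print(samples.tolist())  # the original array reflects the change
```

Because the slice refers to the same memory as `samples`, the change is visible through the original object. NumPy arrays expose their data through the same buffer mechanism.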
- I'm used to MATLAB. Why should I use Python?
MATLAB is proprietary, expensive and hard to extend. Python is open, community-driven, portable, powerful and extensible. Python is also better with strings, namespaces, classes and GUIs. While MATLAB, along with Simulink, has vast libraries, Python is catching up as many scientific projects are adopting Python. MATLAB is said to be poor at scalability, complex data structures, memory handling, system tasks and database programming.
However, there are some criticisms of Python (December 2013). Syntax is not consistent since different packages are written by different folks with different needs. There are two ordinary differential equation (ODE) solvers in scipy with incompatible syntax. Duplicated functionality across packages may result in confusion. MATLAB does better with data regression, boundary value problems and partial differential equations (PDE).
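To illustrate what incompatible solver syntax can look like in practice, here is a hedged sketch. The source does not name the two solvers it had in mind in 2013; this example assumes a recent SciPy that provides both `odeint` and `solve_ivp`, and it solves the illustrative equation dy/dt = -k·y with each. The constant `DECAY_RATE` and the function names are hypothetical.

```python
import numpy as np
from scipy.integrate import odeint, solve_ivp

DECAY_RATE = 1.0  # illustrative constant k in dy/dt = -k * y


def rhs_state_first(y, t):
    """Right-hand side in the (y, t) order that odeint expects by default."""
    return -DECAY_RATE * y


def rhs_time_first(t, y):
    """The same equation in the (t, y) order that solve_ivp expects."""
    return -DECAY_RATE * y


t_points = np.linspace(0.0, 1.0, 11)
y0 = [1.0]

# odeint takes (func, y0, t); the result has shape (len(t), len(y0)).
y_odeint = odeint(rhs_state_first, y0, t_points)

# solve_ivp takes (fun, t_span, y0); sol.y has shape (len(y0), len(t_eval)).
sol = solve_ivp(rhs_time_first, (0.0, 1.0), y0,
                t_eval=t_points, rtol=1e-8, atol=1e-10)

final_odeint = y_odeint[-1, 0]
final_ivp = sol.y[0, -1]
print(final_odeint, final_ivp, np.exp(-1.0))
```

Both calls approximate the same solution, yet the callback argument order, the way the time grid is passed and the layout of the result all differ.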
In 2014, Konrad Hinsen commented that Python may not be suitable for small-scale projects where code is written once and rarely maintained thereafter. This becomes a problem when Python scientific libraries are upgraded by deprecating older classes/functions/methods.
- Should I worry about performance when using Python for scientific research?
The short answer is no. There are many initiatives that aim to make Python faster. PyPy and Pyston do just-in-time (JIT) compilation for better performance. Nuitka aims to replace the Python runtime by automatically transpiling code to languages that run fast natively. Numba compiles math-heavy Python code to native machine instructions with just a few annotations on your Python code.
While pure Python code is definitely slower when compared to Fortran or C, scientific packages in Python often make use of low-level implementations that are themselves written in Fortran, C, etc. For example, NumPy operations often call BLAS or LAPACK functions that are written in Fortran.
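The following sketch contrasts an explicit Python loop with the equivalent NumPy call. The array size, the seed and the helper name `dot_pure_python` are arbitrary choices for illustration. No timings are shown, since they depend on the machine and on how NumPy was built.

```python
import numpy as np


def dot_pure_python(a, b):
    """Dot product computed with an explicit Python loop."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


rng = np.random.default_rng(seed=0)
a = rng.random(10_000)
b = rng.random(10_000)

# Interpreted loop over plain Python floats.
slow = dot_pure_python(a.tolist(), b.tolist())

# Single call that runs in compiled code inside NumPy.
fast = float(np.dot(a, b))

# The two results agree up to floating-point rounding.
print(np.isclose(slow, fast))
```

Wrapping each call with `timeit` on your own machine is a simple way to see the difference in speed.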
f2py enables Python to use Fortran implementations directly. SWIG and Cython allow us to call optimized C/C++ implementations from within Python. For example, scikit-learn uses Cython. Intel Math Kernel Library (MKL) and PyCUDA are also bringing Python on par with Fortran on specific hardware platforms.
- What are the essential packages for scientific computing in Python?
Here are some packages that could be considered essential:
- numpy: Multi-dimensional arrays and operations on them. Array operations execute faster than equivalent pure Python code.
- scipy: Linear algebra, interpolation, integration, FFT...
- matplotlib: Plotting and data visualization with an API similar to MATLAB. Eases transition from MATLAB to Python.
- spyder: An IDE that includes IPython (for interactive computing).
- pandas: Numerical data structures, data manipulation and analysis.
- jupyter: Web-based sharing of code, graphs, annotations and results.
Most Python scientific packages are based on numpy and scipy. If visualization is involved, matplotlib may be used. For higher-level data structures, pandas may be used. Spyder, IPython and Jupyter are simply useful tools for the scientist or engineer.
- Could you name some useful scientific projects/packages in Python?
Here are some that can be applied to any domain:
- Image processing: Pillow, OpenCV, scikit-image, Mahotas
- Visualization: matplotlib, bokeh, plotly, mayavi, seaborn, basemap, NetworkX
- Markov chains: Markov, MarkovNetwork, PyMarkov, pyEMMA, hmmus
- Stochastic process: stochastic, StochPy, sdeint
- Solving PDEs: FEniCS, SfePy
- Convex Optimization: CVXPY
- Units and conversions: quantities, pint
- Multi-precision math: mpmath, GmPy
- Spatial analysis: cartopy, georasters, PySAL, geopy
- Data access: pydap, cubes, Blaze, bottleneck, pytables
- Machine learning: scikit-learn, Mlpy, TensorFlow, Theano, Caffe, Keras
- Natural Language Processing: NLTK, TextBlob, spaCy, gensim, pycorenlp
- Statistics: statistics, statsmodels, patsy
- Tomography: TomoPy
- Symbolic computing: sympy
- Simulations: SimPy, BlockCanvas, railgun
- Could you name some domain-specific scientific projects/packages in Python?
Since there are dozens of packages for all types of scientific work, we can only give a sample:
- Solar data analysis: SunPy
- Astronomy: astropy
- Chemistry: thermo, chemlab
- Biology: biopython
- Neurosciences: PsychoPy, NIPY
- Life sciences: DEAP
- Network analysis: NetworkX
- Quantum dynamics: QuTiP
- Protein analysis: ProDy
- Neuron simulations: NEURON
- Seismology: ObsPy
- Phylogenetic computing: DendroPy
- Software defined radio: GNU Radio
- What's the recommended Python distribution for scientific computing?
Installing Python for scientific work used to be painful, but with modern distributions this is no longer an issue.
Rather than install Python's standard distribution and then install scientific packages one by one, the recommended approach is to use an alternative distribution customized for scientific computing: Enthought Canopy, Anaconda, Python(x,y) or WinPython. Enthought Canopy is commercial but the rest are free. Enthought Canopy claims to include 450+ tested scientific and analytic packages.
The Anaconda distribution uses conda for package management. The use of virtual environments is recommended so that different projects can use their own specific environments. Powered by Anaconda, Intel offers its own distribution that's optimized for performance.
SageMath is another distribution that offers a web-based interface and uses Jupyter notebooks. It aims to be the free open source alternative to Magma, Maple, Mathematica and MATLAB.
- As a beginner in scientific Python, what should be my learning path?
From a tools and environment perspective, get familiar with IPython, Jupyter Notebook and optionally Spyder.
After learning the basics of Python, the next step is to learn numpy since it's the base for many scientific packages. With numpy, you can work with matrices and do vectorized operations without having to write explicit loops. You should learn about operations such as reshaping, transposing, filling, copying, concatenating, flattening, broadcasting, filtering and sorting.
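The operations listed above can be sketched in a few lines. The small matrix used here is an arbitrary example chosen so that the results are easy to check by hand.

```python
import numpy as np

# Reshaping: build a 3x4 matrix from a flat range of 12 integers.
m = np.arange(12).reshape(3, 4)

# Transposing swaps the axes: shape (3, 4) becomes (4, 3).
mt = m.T

# Broadcasting: subtract each column's mean without writing a loop.
centered = m - m.mean(axis=0)

# Filtering with a boolean mask keeps only the even entries.
evens = m[m % 2 == 0]

# Concatenating, flattening and sorting (descending order here).
stacked = np.concatenate([m, m], axis=0)
flat_sorted = np.sort(stacked.ravel())[::-1]

# Copying and filling: the copy changes, the original does not.
filled = m.copy()
filled.fill(0)

print(mt.shape, stacked.shape, evens)
```

None of these steps needs an explicit Python loop. Each one is a single vectorized call on the whole array.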
You could then learn scipy to do optimization, linear algebra, integration, and so on. For visualization, matplotlib can be a starting point. For dealing with higher-level data structures and manipulation, learn pandas.
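As a small, hedged taste of the scipy tasks mentioned above, the sketch below runs one integration, one optimization and one linear solve. The functions and matrices are toy problems chosen for illustration, with answers that can be verified by hand.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: area under sin(x) on [0, pi], which is exactly 2.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: minimum of a simple parabola, located at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)

# Linear algebra: solve A @ x = b for x (the solution is [2, 3]).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

print(area, result.x, x)
```

Each submodule (`scipy.integrate`, `scipy.optimize`, `scipy.linalg`) can be learned on its own, as your work requires it.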
If you wish to get into data science, scikit-learn and Theano can be starting points. For statistical modelling, you can learn statsmodels.
- What useful developer resources are available for scientific computing in Python?
One good place to start learning is the SciPy Lecture Notes.
The SciPy Conference is an annual event for Python's scientific community. It also takes place in Europe as EuroSciPy and in India as SciPy India.
EarthPy is a collection of IPython notebooks for learning how to apply Python to Earth sciences.
The Hacker Within, Software Carpentry and Data Carpentry are some communities that bring together research and scientific folks. Although these are not exclusive to Python, Python programmers will find them useful.
Further Reading
- Rossant, Cyrille. 2013. "Why use Python for scientific computing?" July 1. Accessed 2018-03-26.
- Perez, F., B. E. Granger and J. D. Hunter. 2011. "Python: An Ecosystem for Scientific Computing." Computing in Science & Engineering, vol. 13, no. 2, pp. 13-21, March-April. Accessed 2018-02-28.
- Bobriakov, Igor. 2017. "Top 15 Python Libraries for Data Science in 2017." Medium, May 9. Accessed 2018-02-28.
- Krill, Paul. 2012. "Van Rossum: Python is not too slow." InfoWorld, March 16. Accessed 2018-02-28.
- Oliphant, Travis E. 2012. "NumPy and SciPy: History and Ideas for the Future." SciPy. Accessed 2018-02-28.
- Rao, Vinay. 2018. "Accelerating Python for scientific research." IBM Developer, April 04. Accessed 2019-02-06.