
Python for Scientific Computing

Summary

image
Logos of some popular Python scientific packages. Source: Stamford 2016.

For many decades, Fortran has been the language of choice for scientific computing because of its speed. In the 1980s, as a programmer's time was becoming more valuable than compute time, there was a need for languages that were easier to learn and use. For research work, the code-compile-execute workflow gave way to an interact-explore-visualize workflow. It was in this context that MATLAB, IDL, Mathematica and Maple were born.

Modern scientific computing is not just about numerical computing. It needs to be versatile: deal with large datasets, offer richer data structures than just numerical arrays, make network calls, interface with databases, interwork with web apps, handle data in various formats, enable team collaboration, enable easy documentation.

Python offers all of the above. It's been said that,

Python provides a balance of clarity and flexibility without sacrificing performance.

Milestones

1995

Numeric is released to enable numerical computations. This is the ancestor of today's NumPy.

2001

Many scientific modules are brought together and released as a single package named SciPy. The same year, IPython is born.

2002

"Python for Scientific Computing Workshop" is organized at Caltech. In 2004, it is renamed the SciPy Conference, which is now an annual event. In 2008, EuroSciPy is held for the first time. In 2009, the first SciPy India is held.

2005

NumPy is released, based on the older Numeric library and incorporating features of another library, Numarray. NumPy is initially named SciPy Core but is renamed NumPy in January 2006.

Jul 2017

Anaconda Accelerate is split into the Intel Distribution for Python and the open-source Numba sub-projects pyculib, pyculib_sorting and data_profiler.

Discussion

  • What makes Python a suitable language for scientific computing?

    Python is suited for more than just manipulating numbers. It offers a "computational ecosystem" that can fulfil the needs of a modern scientist. Python does well at system integration, gluing together many different parts contributed by different folks, and its duck typing is one of the reasons this is possible. Among data types, memoryview, PyCapsule and NumPy's array aid scientific work.

    Python is easy to learn and use. Its natural syntax enables researchers to express and explore their ideas directly rather than fight with low-level language syntax.

    With Python, performance bottlenecks can be optimized at a low level without sacrificing high-level usability. Researchers need to explore and visualize ideas incrementally, and Python supports this through IPython/Jupyter notebooks and matplotlib. A variety of Python tools can also work together and share data within the same runtime environment, rather than exchanging data only via the filesystem.
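
    As a rough illustration of sharing data within one runtime, the sketch below (assuming NumPy is installed) wraps a standard-library `array.array` in a `memoryview` and hands it to `np.frombuffer`, which reuses the same memory instead of copying it. The variable names are illustrative only.

```python
import array

import numpy as np

# A buffer of C doubles, standing in for data produced by some other library
raw = array.array("d", [1.0, 2.0, 3.0, 4.0])

# memoryview exposes that buffer through Python's buffer protocol, without copying
view = memoryview(raw)

# np.frombuffer builds an ndarray on top of the same memory (still no copy)
arr = np.frombuffer(view, dtype=np.float64)

# Changing the original buffer is immediately visible through the NumPy array
raw[0] = 10.0
print(arr[0])  # 10.0
```

    One consequence of sharing memory: while `view` or `arr` is alive, `raw` cannot be resized (for example, `raw.append(5.0)` raises `BufferError`).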

  • I'm used to MATLAB. Why should I use Python?
    image
    Comparing MATLAB with Python. Source: Pyzo 2016.

    MATLAB is proprietary, expensive and hard to extend. Python is open, community-driven, portable, powerful and extensible. Python is also better with strings, namespaces, classes and GUIs. While MATLAB, along with Simulink, has vast libraries, Python is catching up as more scientific projects adopt it. MATLAB is said to be poor at scalability, complex data structures, memory handling, system tasks and database programming.

    However, Python has its critics too. A December 2013 assessment noted that syntax is inconsistent because different packages are written by different folks with different needs; for example, scipy has two ordinary differential equation (ODE) solvers with incompatible syntax. Duplicated functionality across packages can also cause confusion. The same assessment found MATLAB better at data regression, boundary value problems and partial differential equations (PDE).
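
    To make the ODE point concrete, here is a minimal sketch (assuming NumPy and SciPy are installed) that solves the same decay problem, dy/dt = -k*y, with `scipy.integrate.odeint` and with `scipy.integrate.ode`. Note that `odeint` expects the right-hand side as `f(y, t)` while `ode` expects `f(t, y)`, and the calling styles differ completely. The test problem and the value of `k` are arbitrary choices for illustration.

```python
import numpy as np
from scipy.integrate import ode, odeint

k = 0.5                       # arbitrary decay rate for this example
t = np.linspace(0.0, 4.0, 5)  # output times
y0 = 1.0                      # initial condition

def rhs_odeint(y, t):
    # odeint convention: state first, then time
    return -k * y

def rhs_ode(t, y):
    # ode convention: time first, then state
    return -k * y

# Style 1: a single function call returns the solution at all requested times
sol_odeint = odeint(rhs_odeint, y0, t)[:, 0]

# Style 2: an object-oriented integrator that is stepped explicitly
solver = ode(rhs_ode).set_integrator("dopri5")
solver.set_initial_value(y0, t[0])
values = [y0]
for ti in t[1:]:
    values.append(solver.integrate(ti)[0])
sol_ode = np.array(values)

# Both should approximate the exact solution exp(-k t)
print(np.allclose(sol_odeint, np.exp(-k * t), rtol=1e-4))
print(np.allclose(sol_ode, np.exp(-k * t), rtol=1e-4))
```

    SciPy 1.0 (2017) later added `solve_ivp`, which uses the `f(t, y)` convention; the older interfaces were kept alongside it.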

    In 2014, Konrad Hinsen commented that Python may not suit small-scale projects where code is written once and rarely maintained thereafter. The problem is that Python's scientific libraries evolve by deprecating and eventually removing older classes, functions and methods, so such code can stop working after an upgrade.
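
    One common defensive practice (a general Python technique, not something Hinsen's post prescribes) is to turn `DeprecationWarning` into an error while testing, so code that relies on soon-to-be-removed APIs fails early rather than after an upgrade. In this sketch, `old_mean` and `call_strictly` are hypothetical names, with `old_mean` standing in for a deprecated library function.

```python
import warnings

def old_mean(values):
    """Hypothetical library function that is scheduled for removal."""
    warnings.warn("old_mean() is deprecated; use new_mean() instead",
                  DeprecationWarning, stacklevel=2)
    return sum(values) / len(values)

def call_strictly(func, *args):
    """Call func, escalating any DeprecationWarning into an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args)

try:
    call_strictly(old_mean, [1.0, 2.0, 3.0])
except DeprecationWarning as exc:
    print("Deprecated API in use:", exc)
```

    For a whole test run, the same effect is commonly achieved by starting Python with `-W error::DeprecationWarning`.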

  • Should I worry about performance when using Python for scientific research?
    image
    Comparing the performance of some languages for scientific computing. Source: Adapted from Hirsch 2018.

    The short answer is no. Many initiatives aim to make Python faster. PyPy and Pyston use just-in-time (JIT) compilation for better performance. Nuitka aims to replace the Python runtime by automatically transpiling Python code into languages that run fast natively. Numba compiles math-heavy Python code to native machine instructions with just a few annotations on your Python code.
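
    A minimal sketch of the Numba style, assuming NumPy is installed: the loop below is ordinary Python, and decorating it with Numba's `njit` compiles it to machine code on first call. To keep the sketch runnable without Numba, it falls back to a no-op decorator if the import fails, in which case the function simply runs as plain (slower) Python. `sum_of_squares` is an illustrative name.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without Numba (no speed-up in this case)
    def njit(func):
        return func

@njit
def sum_of_squares(x):
    # A plain explicit loop; under Numba this is compiled to machine code
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total

print(sum_of_squares(np.arange(4.0)))  # 14.0
```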

    While pure Python code is definitely slower than Fortran or C, scientific packages in Python often rely on low-level implementations that are themselves written in Fortran, C, etc. For example, NumPy's linear algebra operations often call BLAS or LAPACK routines, which were originally written in Fortran.
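
    The difference is easy to see with a dot product. This sketch (assuming NumPy is installed; the array size and seed are arbitrary) computes the same result with an interpreted Python loop and with `np.dot`, which for float64 vectors typically hands the work to a compiled BLAS routine. Timings depend on the machine and on the BLAS library NumPy was built with, so none are claimed here.

```python
import time

import numpy as np

def dot_pure_python(xs, ys):
    # Interpreted loop: every multiply-add goes through the Python interpreter
    total = 0.0
    for a, b in zip(xs, ys):
        total += a * b
    return total

rng = np.random.default_rng(seed=0)
x = rng.random(200_000)
y = rng.random(200_000)

start = time.perf_counter()
slow = dot_pure_python(x.tolist(), y.tolist())
t_python = time.perf_counter() - start

start = time.perf_counter()
fast = np.dot(x, y)  # compiled code, typically BLAS for float64 vectors
t_numpy = time.perf_counter() - start

print(f"pure Python: {t_python:.4f} s, NumPy: {t_numpy:.6f} s")
```

    The two results agree up to floating-point rounding, since the compiled routine may sum the terms in a different order.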

    f2py lets Python use Fortran implementations directly. SWIG and Cython let us call optimized C/C++ implementations from within Python; scikit-learn, for example, uses Cython. Intel Math Kernel Library (MKL) and PyCUDA also help bring Python on par with Fortran on specific hardware platforms.

  • What are the essential packages for scientific computing in Python?

    Here are some packages that could be considered essential:

    • numpy: Multi-dimensional arrays and operations on them.
    • scipy: Linear algebra, interpolation, integration, FFT...
    • matplotlib: 2D plotting.
    • spyder: An IDE that includes IPython (for interactive computing).
    • pandas: Numerical data structures, data manipulation and analysis.
    • jupyter: Web-based sharing of code, graphs, annotations and results.

    Most Python scientific packages are based on numpy and scipy. If visualization is involved, matplotlib may be used. For higher-level data structures, pandas may be used. However, many use numpy for getting data in and out of packages. Spyder, IPython and Jupyter are simply useful tools for the scientist or engineer.
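
    As a small illustration of NumPy arrays acting as the common currency between packages, this sketch (assuming NumPy and SciPy are installed) passes NumPy arrays to `scipy.linalg.solve` and gets a NumPy array back. The 2×2 system is an arbitrary example.

```python
import numpy as np
from scipy import linalg

# Solve the linear system  3x + y = 9,  x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = linalg.solve(A, b)   # NumPy arrays in, NumPy array out
print(x)                 # approximately [2. 3.]
```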

  • Could you name some useful scientific projects/packages in Python?

    Here are some that can be applied to any domain:

    • Image processing: Pillow, OpenCV, scikit-image, Mahotas
    • Visualization: matplotlib, bokeh, plotly, mayavi, seaborn, basemap, NetworkX
    • Markov chains: Markov, MarkovNetwork, PyMarkov, pyEMMA, hmmus
    • Stochastic process: stochastic, StochPy, sdeint
    • Solving PDEs: FEniCS, SfePy
    • Convex Optimization: CVXPY
    • Units and conversions: quantities, pint
    • Multi-precision math: mpmath, GmPy
    • Spatial analysis: cartopy, georasters, PySAL, geopy
    • Data access: pydap, cubes, Blaze, bottleneck, pytables
    • Machine learning: scikit-learn, Mlpy, TensorFlow, Theano, Caffe, Keras
    • Natural Language Processing: NLTK, TextBlob, spaCy, gensim, pycorenlp
    • Statistics: statistics, statsmodels, patsy
    • Tomography: TomoPy
    • Symbolic computing: sympy
    • Simulations: SimPy, BlockCanvas, railgun
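
    The `statistics` entry above is the module in Python's standard library, so it needs no installation. A minimal sketch with made-up sample values; statsmodels would be the next step for regression and statistical modelling.

```python
import statistics

samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data

mean = statistics.mean(samples)      # 5.0
spread = statistics.pstdev(samples)  # population standard deviation: 2.0
middle = statistics.median(samples)  # 4.5, average of the two middle values

print(mean, spread, middle)
```
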
  • Could you name some domain-specific scientific projects/packages in Python?

    Since there are dozens of packages for all types of scientific work, we can only give a sample:

  • What's the recommended Python distribution for scientific computing?

    Installing Python for scientific work used to be painful, but with modern distributions this is no longer an issue.

    Rather than install Python's standard distribution and then install scientific packages one by one, the recommended approach is to use an alternative distribution customized for scientific computing: Enthought Canopy, Anaconda, Python(x,y) or WinPython. Enthought Canopy is commercial but the rest are free. Enthought Canopy claims to include 450+ tested scientific and analytic packages.

    The Anaconda distribution uses conda for package management. Using virtual environments is recommended so that each project can have its own specific environment. Powered by Anaconda, Intel offers its own distribution that's optimized for performance.
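
    With Anaconda, per-project environments are created and switched with `conda create --name <env>` and `conda activate <env>`. The same idea exists in Python's standard library as the `venv` module; the sketch below uses `venv` rather than conda so it runs without an Anaconda install. The project names and the helper `create_project_env` are hypothetical.

```python
import os
import tempfile
import venv

def create_project_env(base_dir, project_name):
    """Create an isolated environment for one (hypothetical) project."""
    env_dir = os.path.join(base_dir, project_name)
    # with_pip=False skips bootstrapping pip, keeping the sketch quick and offline
    venv.create(env_dir, with_pip=False)
    return env_dir

base = tempfile.mkdtemp()
env_a = create_project_env(base, "project_a")
env_b = create_project_env(base, "project_b")

# Each environment gets its own pyvenv.cfg and its own site-packages directory
print(os.path.exists(os.path.join(env_a, "pyvenv.cfg")))  # True
```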

    SageMath is another distribution that offers a web-based interface and uses Jupyter notebooks. It aims to be a free open-source alternative to Magma, Maple, Mathematica and MATLAB.

  • As a beginner in scientific Python, what should be my learning path?

    From tools and environment perspectives, get familiar with using IPython, Jupyter Notebook and optionally Spyder.

    After learning the basics of Python, the next step is to learn numpy since it's the base for many scientific packages. With numpy, you can work with matrices and do vectorized operations without having to write explicit loops. You should learn about operations such as reshaping, transposing, filling, copying, concatenating, flattening, broadcasting, filtering and sorting.
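
    A compact tour of the operations listed above, assuming NumPy is installed; the arrays are arbitrary examples and no explicit Python loop is needed for any of them.

```python
import numpy as np

a = np.arange(6)                          # [0 1 2 3 4 5]
m = a.reshape(2, 3)                       # reshaping: [[0 1 2] [3 4 5]]
t = m.T                                   # transposing: shape (3, 2)
f = np.full((2, 3), 7)                    # filling: 2x3 array of 7s
c = m.copy()                              # copying: independent of m
c[0, 0] = 100                             # m[0, 0] is still 0
stacked = np.concatenate([m, f], axis=0)  # concatenating: shape (4, 3)
flat = stacked.flatten()                  # flattening: 1-D, 12 elements
shifted = m + np.array([10, 20, 30])      # broadcasting: [[10 21 32] [13 24 35]]
evens = a[a % 2 == 0]                     # filtering with a boolean mask: [0 2 4]
ordered = np.sort(np.array([3, 1, 2]))    # sorting: [1 2 3]
```

    In the broadcasting line, the length-3 row is stretched across both rows of `m` without being copied.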

    You could then learn scipy to do optimization, linear algebra, integration, and so on. For visualization, matplotlib can be a starting point. For dealing with higher-level data structures and manipulation, learn pandas.
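
    A first taste of scipy, assuming NumPy and SciPy are installed: `scipy.optimize.minimize_scalar` finds the minimum of a one-variable function and `scipy.integrate.quad` computes a definite integral. Both test functions are arbitrary examples with known answers.

```python
import numpy as np
from scipy import integrate, optimize

# Optimization: (x - 2)^2 + 1 has its minimum value 1 at x = 2
res = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)
print(res.x, res.fun)   # close to 2.0 and 1.0

# Integration: the integral of sin(x) from 0 to pi is exactly 2
value, abs_err = integrate.quad(np.sin, 0.0, np.pi)
print(value, abs_err)   # value close to 2.0, with a small error estimate
```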

    If you wish to get into data science, scikit-learn and Theano can be starting points. For statistical modelling, you can learn statsmodels.

  • What useful developer resources are available for scientific computing in Python?

    One good place to start learning is the SciPy Lecture Notes.

    SciPy Conference is an annual event for Python's scientific community. It also happens in Europe as EuroSciPy and in India as SciPy India.

    EarthPy is a collection of IPython notebooks for learning how to apply Python to Earth sciences. The Hacker Within, Software Carpentry and Data Carpentry are some communities that bring together research and scientific folks. Although these are not exclusive to Python, Python programmers will find them useful.

References

  1. Anaconda Docs. 2018. "Anaconda Accelerate." Accessed 2018-03-26.
  2. Bobriakov, Igor. 2017. "Top 15 Python Libraries for Data Science in 2017." Medium, May 9. Accessed 2018-02-28.
  3. EliteDataScience. 2017. "5 Heroic Python NLP Libraries." February 5. Accessed 2018-02-28.
  4. Enthought. 2018. "Enthought Canopy: The Python Platform of Choice for Scientists and Engineers." Accessed 2018-03-26.
  5. Hinsen, Konrad. 2014. "The state of NumPy." Konrad Hinsen's Blog, September 12. Accessed 2018-03-26.
  6. Hinsen, Konrad. 2017. "Why Python does so well in scientific computing." Konrad Hinsen's Blog, September 12. Accessed 2018-03-26.
  7. Hirsch, Michael. 2018. "Speed of Matlab vs. Python Numpy Numba CUDA vs Julia vs IDL." SciVision, Inc., January 13. Accessed 2018-03-26.
  8. Intel Software. 2018. "Intel Distribution for Python: Accelerate Python Performance, Powered by Anaconda." Accessed 2018-03-26.
  9. Keenan, Tyler. 2016. "15 Python Libraries for Data Science." Upwork, June 28. Updated 2018-03-25. Accessed 2018-02-28.
  10. Kitchin, John. 2013. "Python as alternative to Matlab for engineering calculations." The Kitchin Research Group, December 30. Accessed 2018-02-28.
  11. Kumar E K, Vipin, Ying H, and Jing X. 2012. "Numpy/Scipy with Intel® MKL and Intel® Compilers." Intel Software, June 28. Updated 2017-11-19. Accessed 2018-03-26.
  12. Lin, Johnny Wei-Bing. 2012. "Why Python Is the Next Wave in Earth Sciences Computing." Bulletin of the American Meteorological Society, 93(12), pp. 1823–1824. Accessed 2018-03-26.
  13. Mahotas Docs. 2016. "Mahotas: Computer Vision in Python." Version 1.4.3, October 3. Accessed 2018-02-28.
  14. Millman, Jarrod and Travis Vaught. 2008. "The State of SciPy." Proc. of 7th Python in Science Conference (SciPy 2008), pp. 5-10. Accessed 2018-02-28.
  15. NumPy.org. 2018. "Older Array Packages." Accessed 2018-02-28.
  16. Pansop. 2015. "9 Python Analytics Libraries." Data Science Central, May 21. Accessed 2018-03-25.
  17. Perez, F., B. E. Granger, and J. D. Hunter. 2011. "Python: An Ecosystem for Scientific Computing." Computing in Science & Engineering, vol. 13, no. 2, pp. 13-21, March-April. Accessed 2018-02-28.
  18. PyPI. 2018a. "Index of Packages Matching 'markov'." Accessed 2018-02-28.
  19. PyPI. 2018b. "Index of Packages Matching 'stochastic'." Accessed 2018-02-28.
  20. Python Wiki. 2017. "Numeric and Scientific." August 7. Accessed 2018-02-28.
  21. Pyzo. 2016. "Python vs Matlab." Accessed 2018-03-26.
  22. Rossant, Cyrille. 2013. "Why use Python for scientific computing?" July 1. Accessed 2018-03-26.
  23. STX Next. 2017. "The most popular Python scientific libraries." April 12. Accessed 2018-02-28.
  24. SciPy GitHub. 2018. "SciPy: History_of_SciPy." Accessed 2018-02-28.
  25. Stamford, John. 2016. "Essential Python Packages." Stamford Research, August 30. Accessed 2018-02-28.
  26. Stanford NLP GitHub. 2018. "Using Stanford CoreNLP within other programming languages and packages." CoreNLP, v3.9.1. Accessed 2018-02-28.
  27. Yegulalp, Serdar. 2015. "5 projects that push Python performance." InfoWorld, February 9. Accessed 2018-03-26.
  28. corochann. 2017. "Setup python environment." corochannNote, July 15. Accessed 2018-03-26.
  29. d'Avezac, Mayeul. 2014. "How can I choose the right programming language for a computational physics project?" ResearchGate. Accessed 2018-03-26.
  30. van der Walt, Stéfan, and Jarrod Millman. 2011. "Preface." Proc. of 10th Python in Science Conference (SciPy 2011), pp. 1-3. Accessed 2018-02-28.
  31. van der Walt, Stéfan, S. Chris Colbert, and Gaël Varoquaux. 2011. "The NumPy array: a structure for efficient numerical computation." arXiv, February 8. Accessed 2018-02-28.

See Also

  • Python ML frameworks
  • NumPy
  • SciPy
  • Pandas
  • Scikit-learn
  • IPython

Further Reading

  1. Rossant, Cyrille. 2013. "Why use Python for scientific computing?" July 1. Accessed 2018-03-26.
  2. Perez, F., B. E. Granger, and J. D. Hunter. 2011. "Python: An Ecosystem for Scientific Computing." Computing in Science & Engineering, vol. 13, no. 2, pp. 13-21, March-April. Accessed 2018-02-28.
  3. Bobriakov, Igor. 2017. "Top 15 Python Libraries for Data Science in 2017." Medium, May 9. Accessed 2018-02-28.
  4. Krill, Paul. 2012. "Van Rossum: Python is not too slow." InfoWorld, March 16. Accessed 2018-02-28.
  5. Oliphant, Travis E. 2012. "NumPy and SciPy: History and Ideas for the Future." SciPy. Accessed 2018-02-28.

Top Contributors

Last update: 2018-07-11 09:55:12 by gurumoorthyP
Creation: 2018-02-28 06:43:21 by gurumoorthyP

Cite As

Devopedia. 2018. "Python for Scientific Computing." Version 5, July 11. Accessed 2018-09-25. https://devopedia.org/python-for-scientific-computing