R (Language)

R logo. Source: R Project 2016.

R is a free software environment for statistical computing and graphics. This definition implies that R is open source, is designed for statistical computing and has strong graphing capabilities. R is a language but also an environment in the sense that it's for users who wish to do interactive statistical analysis and modelling, but who are not necessarily programmers.

R is based on S, a language from the 1970s. While R has evolved a lot, it still retains many of the design constructs of its early days. In recent times, adoption of R is growing due to interest in Big Data and Data Science. However, the use of R should be seen as complementary to other languages such as Python.

Discussion

  • How would you describe R?

    R is a language that's influenced by two different paradigms: object-oriented programming and functional programming. Everything in R is an object. Computations in R are function calls. In fact, function definitions and function calls are also objects. It's also an interpreted language (no compilation is required).
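    The claims above can be checked in an R console. The following is a minimal sketch; the names `square` and `apply_twice` are made up for illustration and are not part of R itself.

```r
# Operators are ordinary functions, so 1 + 2 is really a call to `+`.
sum_infix  <- 1 + 2
sum_prefix <- `+`(1, 2)

# A function definition is an object that can be stored and passed around.
square <- function(x) x^2
apply_twice <- function(f, x) f(f(x))   # hypothetical helper
result <- apply_twice(square, 3)        # square(square(3))

# An unevaluated function call is also an object (of class "call").
call_obj   <- quote(square(4))
call_value <- eval(call_obj)

print(class(square))     # "function"
print(class(call_obj))   # "call"
print(result)            # 81
```

    Because calls are objects, R code can inspect and build other R code, which is one reason the language mixes functional and object-oriented styles so freely.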

    An object reference in R is a combination of a name and a context, called an environment. Changing an object through a local reference affects the object only within the local environment. A function call creates a new environment. Hence, the phrases "call by value" and "call by reference", common in other languages, do not apply to R.
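    A short sketch of this behaviour, using hypothetical helper functions `add_element` and `show_env`: a change made inside a function stays in that call's environment, and every call gets a fresh environment.

```r
# Hypothetical helper: tries to change its argument.
add_element <- function(v) {
  v[length(v) + 1] <- 99   # changes the binding in this call's environment only
  v
}

original <- c(1, 2, 3)
modified <- add_element(original)

print(original)   # 1 2 3 (the caller's object is untouched)
print(modified)   # 1 2 3 99

# Each call creates a new environment; two calls never share one.
show_env <- function() environment()
env1 <- show_env()
env2 <- show_env()
print(identical(env1, env2))   # FALSE
```

    To change the caller's object, the usual idiom is to return the new value and reassign it, as done with `modified` above.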

    There are no scalar types in R. Vector-based types are used instead, so a single number is simply a vector of length one. Function arguments are evaluated lazily. In other words, an argument is evaluated inside the function only when its value is required.
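    Both points can be seen in a small sketch. The functions `ignore_second`, `use_second` and `record` are illustrative names, not built-ins.

```r
# A single number is a vector of length one.
x <- 42
print(is.vector(x))   # TRUE
print(length(x))      # 1

# Lazy evaluation: an unused argument is never evaluated,
# so the stop() below does not raise an error.
ignore_second <- function(a, b) a * 2
lazy_result <- ignore_second(5, stop("never evaluated"))

# Record the order of events in an environment to show when
# the argument expression actually runs.
events <- new.env()
events$log <- character(0)
record <- function(msg) events$log <- c(events$log, msg)

use_second <- function(a, b) {
  record("function body started")
  b                                  # the argument is evaluated only here
  record("function body finished")
  invisible(a)
}
use_second(1, record("argument b evaluated"))
print(events$log)
```

    The log shows the argument being evaluated after the function body has started, not before the call.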

  • What are the advantages of R?
    Some of the advantages of R. Source: Business Process Incubator 2016.

    Unlike S-PLUS, SAS or SPSS, R is open source and can be used without any licensing fee. R is cross-platform: it can be used on Linux, Windows or macOS.

    R is supported by a large community. Along with frequent releases by a core team, the wider community has released thousands of packages that are available at the Comprehensive R Archive Network (CRAN). Beyond this community of volunteers, R is backed by big companies. The R Foundation and the R Consortium are committed to the evolution of R.

    R and many third-party packages are suited for data analysis and modelling. R simplifies the work of a data scientist by offering tools for easy data manipulation, visualization and modeling. At its core, R operations are vectorized. This means that vectors and matrices can be easily handled. Because it's interpreted, R is highly interactive. R users can start using R interactively, and as they gather more expertise, they can do more complex things by writing R scripts.
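    As a rough illustration of vectorized operations, the sketch below works on whole vectors and a matrix without explicit loops. The data values and variable names are invented for the example.

```r
heights_cm <- c(150, 165, 180)
weights_kg <- c(50, 65, 90)

# Arithmetic applies element by element across the whole vector.
heights_m <- heights_cm / 100
bmi <- weights_kg / heights_m^2

# Comparisons are vectorized too, which gives concise filtering.
tall <- heights_cm[heights_cm > 160]

# The same style works on matrices.
m <- matrix(1:6, nrow = 2)
doubled <- m * 2
col_totals <- colSums(m)

print(round(bmi, 1))
print(tall)         # 165 180
print(col_totals)   # 3 7 11
```

    Each of these lines can be typed one at a time at the R prompt, and later collected into a script once the analysis takes shape.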

  • What are the typical applications where R is used?

    R is popular among data scientists because of the many statistical packages that it offers. Quantitative analysts use R. It's been used in diverse fields: biotech, finance, research, high technology industries, and more. Wherever production-quality plots are needed, R is a suitable choice. Likewise, in the area of Machine Learning, R has an advantage because of its strong ties with academia.

    In econometrics, one researcher explained that R enabled him to do complex regressions. He was able to do this via custom matrix algebra and validate the results against those offered by R packages. When there's a need to look at complex statistical properties, R has an advantage.
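    The workflow described here, computing a regression by hand and checking it against a package routine, might look like the sketch below. It is not the researcher's actual code: the choice of the built-in `mtcars` dataset, the model `mpg ~ wt + hp` and the tolerance are all assumptions made for illustration.

```r
# Design matrix with an intercept column, built from the built-in mtcars data.
X <- cbind(Intercept = 1, wt = mtcars$wt, hp = mtcars$hp)
y <- mtcars$mpg

# Ordinary least squares via the normal equations: (X'X) beta = X'y.
beta_manual <- solve(crossprod(X), crossprod(X, y))

# The same model fitted with R's built-in lm().
fit <- lm(mpg ~ wt + hp, data = mtcars)
beta_lm <- coef(fit)

# Compare the two sets of coefficients side by side.
print(cbind(manual = drop(beta_manual), lm = beta_lm))
agree <- isTRUE(all.equal(unname(drop(beta_manual)), unname(beta_lm),
                          tolerance = 1e-6))
print(agree)
```

    `lm()` uses a QR decomposition internally rather than the normal equations, so small floating-point differences are expected; this is why the comparison uses a tolerance instead of exact equality.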

    According to data scientist Matt Adams,

    I wouldn't even say R is for programmers. It's best suited for people that have data-oriented problems they're trying to solve, regardless of their programming aptitude.

  • What are some criticisms of R?

    R is said to be poor at memory management. Because of the way environments work, all objects are held in memory rather than in files on disk. In recent times, this limitation has been somewhat remedied. Newer developments from Microsoft are expected to address this as well: ParallelR for distributed computing, RHadoop for running on Hadoop nodes, Reproducible R Toolkit, and RevoPemaR for writing parallel external memory algorithms. There are also packages and tools to handle Big Data in R.

    R is also criticized for performing worse than other languages and for lacking proper unit testing frameworks.

    Security was not built into R at the design stage. This makes it unsuitable to use directly via a web server or to embed in a web browser. However, isolated virtual containers can provide the necessary security to make this possible in a world of cloud computing. Likewise, Shiny is an R package that can be used to create interactive web applications. DeployR is another tool for bringing R analytics to web dashboards.

  • For Data Science, should I choose R or Python?

    Both R and Python are open source. Both are maintained by active communities. Both are interpreted languages. Both can be used interactively. Both have good tools to work with.

    While Python has 138,000+ packages, R has 12,000+ packages (May 2018). This is expected since Python is versatile while R is more focused on statistical computing. R is said to have more statistical packages that suit the work of a data scientist. However, Python's capability is getting better due to NumPy, Pandas, scikit-learn, Keras, and TensorFlow. Python's graphical capabilities are also improving with matplotlib, Bokeh, Seaborn, and Plotly.

    Because Python is more object-oriented, it may be a better choice for large projects. If the main work is data analysis, R is usually preferred. If the work also involves gathering and cleaning data, Python may be a better choice. Having said that, statisticians tend to prefer R while computer scientists go with Python.

    A mix of the two, each suited to its strengths, is a good choice. Python could be used for preprocessing data while R could do the statistical analysis.

  • As a beginner, how can I get started with R?

    Download and install R. This is sufficient to execute R commands and scripts. For a better developer experience, download and install RStudio. This is an IDE that integrates a text editor with syntax checking, an R console to execute commands, easy access to documentation, a panel to view variables for debugging, a plot viewer, and more.

    Read about the most useful third-party packages. Install these from CRAN as required by your projects. One approach is to install the tidyverse collection of packages. This will install a number of packages useful for data science work.

    Learn one or more of R Markdown, Shiny, knitr and bookdown. These will help you document your code and its results, and share them with others in various formats.

    Learn the language syntax. Get familiar with R data types and structures. In particular, learn about vectors, lists and data frames. Next, learn about R's graphing capabilities. Common plotting systems to learn are Base and ggplot2.
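    As a starting point, the three data structures mentioned above can be tried out with a few lines. This is a minimal sketch with invented values and column names.

```r
# Atomic vector: every element has the same type.
scores <- c(2.5, 3.0, 4.5)

# List: elements may have different types and lengths.
info <- list(name = "sample", values = scores, ok = TRUE)

# Data frame: a table whose columns are equal-length vectors.
df <- data.frame(id = 1:3, score = scores)
df$passed <- df$score >= 3      # add a column computed from another
high <- df[df$passed, ]         # keep only rows where passed is TRUE
mean_score <- mean(df$score)

str(df)             # compact summary of the structure
print(info$name)    # "sample"
print(nrow(high))   # 2
```

    `str()` is a handy way to inspect any R object while learning, since it shows the type and first few values of each component.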

Milestones

1976

A new language by the name of S is initiated in Bell Labs for statistical computations. It's written in Fortran. In 1988, it's rewritten in C. S-PLUS is a commercial implementation of S.

Aug 1993

R language is announced to the public by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of Auckland. Their work on R started in 1991. For portability, R is coded in ANSI standard C. In syntax, R resembles S but the underlying semantics is based on Scheme. Ihaka mentions,

We implemented the language by first writing an interpreter for a Scheme subset and then progressively mutating it to resemble S.

Jun 1995

R source code is released to all under the Free Software Foundation's GNU General Public License (GPL).

1997

R Core Group is formed to handle change requests. Group members are volunteers and contribute whenever possible.

Feb 2000

R version 1.0.0 is released to the public.

2015

R Consortium is founded to "support the worldwide community of users, maintainers and developers of R software". The Consortium collaborates with the R Foundation (founded in 2003), the governing body of the R Project.

2015
Products from Revolution Analytics integrated into Microsoft's product portfolio. Source: Smith 2016a.

Microsoft acquires Revolution Analytics, which offers both community and enterprise distributions of R. In January 2016, Microsoft announces that Revolution products are renamed/integrated into Microsoft's products.

Apr 2018

R version 3.5.0 is released. In May 2018, Microsoft R Open (MRO) version 3.5.0 becomes available.

May 2018

There are 12,500+ packages on the Comprehensive R Archive Network (CRAN). A similar number is listed on the Microsoft R Application Network (MRAN).

References

  1. Business Process Incubator. 2016. "Importance of Learning 'R' for Data Science." Blog, July 28. Accessed 2018-05-13.
  2. CRAN. 2018a. "Contributed Packages." R Project, May. Accessed 2018-05-14.
  3. CRAN. 2018b. "Index of /src/base/R-3." R Project, April. Accessed 2018-05-14.
  4. Castle, Nikki. 2017. "R vs. Python: What Language is Best for Building Data Models?" Datascience.com, July 20. Accessed 2018-05-14.
  5. Chambers, John M. 2014. "Object-Oriented Programming, Functional Programming and R." Statistical Science, Vol. 29, No. 2, pp. 167–180, Institute of Mathematical Statistics. Accessed 2018-05-14.
  6. Data Flair. 2017. "Introduction to R Programming | Features & Applications of R." Data Flair, March 20. Accessed 2018-05-14.
  7. El Gewily, Shady. 2017. "Does anyone use R for something different than machine learning and data science." Quora, April 18. Accessed 2018-05-14.
  8. Grolemund, Garrett. 2015. "Work-with-big-data." Webinars, RStudio GitHub, November 25. Accessed 2018-05-14.
  9. Ihaka, Ross. 1998. "R: Past and Future History: A Free Software Project." Interface '98, Statistics Department, The University of Auckland. Accessed 2018-05-13.
  10. Ihaka, Ross, and Robert Gentleman. 1996. "R: A Language for Data Analysis and Graphics." Journal of Computational and Graphical Statistics, vol. 5, no. 3, pp. 299-314. Accessed 2018-05-13.
  11. Krill, Paul. 2015. "Why R? The pros and cons of the R language." InfoWorld, June 30. Accessed 2018-05-14.
  12. Lee, Cheng Han. 2015. "How to Choose Between Learning Python or R First." Udacity, January 12. Accessed 2018-05-14.
  13. MRAN. 2018. "Packages." Microsoft R Application Network, Microsoft, May. Accessed 2018-05-14.
  14. MRAN Revolution Analytics. 2018. "Microsoft R Open & MKL Downloads." Microsoft R Application Network, Microsoft, May. Accessed 2018-05-14.
  15. Peng, Roger D. 2016. "R Programming for Data Science." December 22. Accessed 2018-05-13.
  16. PyPI. 2018. "The Python Package Index." Accessed 2018-05-14.
  17. R Consortium. 2018. "About." Accessed 2018-05-13.
  18. R Project. 2016. "R Logo." R Foundation. Accessed 2018-05-13.
  19. R Project. 2018. "The R Project for Statistical Computing." R Foundation. Accessed 2018-05-13.
  20. Sirosh, Joseph. 2015. "Microsoft Closes Acquisition of Revolution Analytics." Machine Learning Blog, Microsoft, April 6. Accessed 2018-05-14.
  21. Smith, David. 2016a. "Revolution R renamed Microsoft R, available free to developers and students." Revolutions Blog, January 12. Accessed 2018-05-13.
  22. Smith, David. 2016b. "Over 16 years of R Project history." Revolutions Blog, March 4. Accessed 2018-05-13.
  23. Wayner, Peter. 2017. "Python vs. R: The battle for data scientist mind share." InfoWorld, April 6. Accessed 2018-05-14.

Further Reading

  1. Ihaka, Ross, and Robert Gentleman. 1996. "R: A Language for Data Analysis and Graphics." Journal of Computational and Graphical Statistics, vol. 5, no. 3, pp. 299-314. Accessed 2018-05-13.
  2. Chambers, John M. 2014. "Object-Oriented Programming, Functional Programming and R." Statistical Science, Vol. 29, No. 2, pp. 167–180, Institute of Mathematical Statistics. Accessed 2018-05-14.
  3. Peng, Roger D. 2016. "R Programming for Data Science." December 22. Accessed 2018-05-13.
  4. Data Flair. 2017. "Introduction to R Programming | Features & Applications of R." Data Flair, March 20. Accessed 2018-05-14.
  5. Asay, Matt. 2015. "In data science, the R language is swallowing Python." InfoWorld, July 24. Accessed 2018-05-13.

Cite As

Devopedia. 2022. "R (Language)." Version 5, February 15. Accessed 2024-06-25. https://devopedia.org/r-language
Contributed by
2 authors


Last updated on
2022-02-15 11:49:45
