# R (Language)

R is a free software environment for statistical computing and graphics. This definition implies that R is open source, is designed for statistical computing, and has strong graphing capabilities. R is a language but also an environment, in the sense that it serves users who wish to do interactive statistical analysis and modelling but who are not necessarily programmers.

R is based on S, a language from the 1970s. While R has evolved a lot, it still retains many of the design constructs of its early days. In recent times, adoption of R has grown due to interest in Big Data and Data Science. However, the use of R should be seen as complementary to other languages such as Python.

## Discussion

### How would you describe R?

R is a language that's influenced by two different paradigms: object-oriented programming and functional programming. Everything in R is an object. Computations in R are function calls. In fact, function definitions and function calls are also objects.
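
The points above can be sketched in a few lines of base R. This is only an illustration: the names `square`, `squares` and `call_obj` are made up for the example.

```r
# Arithmetic is a function call under the hood: `+` is an ordinary function.
sum_infix  <- 1 + 2
sum_prefix <- `+`(1, 2)

# A function definition is an object that can be stored and passed around.
square  <- function(x) x^2
squares <- sapply(c(1, 2, 3), square)

# An unevaluated function call is an object too; eval() runs it later.
call_obj    <- quote(square(4))
call_result <- eval(call_obj)

class(square)    # the function object
class(call_obj)  # the call object
```

Here `sum_infix` and `sum_prefix` hold the same value, since the infix form is just another way of writing the call.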

It's also an interpreted language: no compilation is required. An object reference in R is a combination of a name and a context called an *environment*. Changing an object through a local reference affects the object only within the local environment. A function call creates a new environment. Hence, phrases such as "call by value" or "call by reference", common in other languages, are not applicable in R. There are no scalar types in R; even a single number is a vector of length one.
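
A minimal sketch of these two ideas follows. The function name `bump_first` is hypothetical and chosen only for this example.

```r
# Modifying an argument inside a function changes only the local object.
bump_first <- function(v) {
  v[1] <- 100   # affects `v` in the function's own environment only
  v
}

x <- c(1, 2, 3)
y <- bump_first(x)
x   # the caller's vector is unchanged
y   # the modified copy returned by the function

# Each call gets a fresh environment of its own.
new_env <- function() environment()
identical(new_env(), new_env())

# A "single" number is simply a vector of length one.
n <- 42
is.vector(n)
length(n)
```

To change `x` in the calling environment, the usual idiom is to assign the returned value back, as in `x <- bump_first(x)`.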

Function arguments are evaluated lazily. In other words, an argument is evaluated inside the function only when it's required.

### What are the advantages of R?

Unlike S-PLUS, SAS or SPSS, R is open source and can be used without any licensing fee. R is cross-platform: it can be used on Linux, Windows or macOS.

R is supported by a large community. Along with frequent releases by a core team, the wider community has released thousands of packages that are available at the Comprehensive R Archive Network (CRAN). Beyond this community of volunteers, R is backed by big companies. The R Foundation and the R Consortium are committed to the evolution of R.

R and many third-party packages are suited for data analysis and modelling. R simplifies the work of a data scientist by offering tools for easy data manipulation, visualization and modelling.
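
As a rough illustration of data manipulation and modelling, the sketch below uses only base R and the `mtcars` data set that ships with R. The derived column `wt_kg` and the other variable names are assumptions made for the example.

```r
# `mtcars` is a small data set bundled with R (datasets package).
cars <- mtcars[, c("mpg", "wt", "cyl")]

# Manipulation: derive a column and filter rows.
cars$wt_kg <- cars$wt * 1000 * 0.45359237   # wt is recorded in units of 1000 lb
four_cyl   <- cars[cars$cyl == 4, ]

# Summary: mean fuel economy per cylinder count.
mean_mpg <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

# Modelling: simple linear regression of fuel economy on weight.
fit <- lm(mpg ~ wt, data = cars)
coef(fit)
```

The column arithmetic applies to every row at once, so no explicit loop is needed.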

At its core, R operations are vectorized. This means that vectors and matrices can be handled easily, without explicit loops. Because it's interpreted, R is highly interactive. Users can start using R interactively and, as they gain expertise, do more complex things by writing R scripts.

### What are the typical applications where R is used?

R is popular among data scientists because of the many statistical packages that it offers. Quantitative analysts also use R.

It's been used in diverse fields: biotech, finance, research, high-technology industries, and more. Wherever there's a need to produce production-quality plots, R is a suitable choice. Likewise, in the area of Machine Learning, R has an advantage because of its strong ties with academia.

In econometrics, one researcher explained that R enabled him to do complex regressions. He was able to do this via custom matrix algebra and validate the results against those offered by R packages. When there's a need to look at complex statistical properties, R has an advantage.
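
The kind of cross-check described by the researcher can be sketched as follows. This is not his actual analysis. It assumes an ordinary least squares model on the built-in `mtcars` data, with the estimates computed by hand from the normal equations and then compared with `lm()`.

```r
# Design matrix with an intercept column, built from the bundled mtcars data.
X <- cbind(Intercept = 1, wt = mtcars$wt, hp = mtcars$hp)
y <- mtcars$mpg

# Ordinary least squares via the normal equations: solve (X'X) b = X'y.
beta_manual <- drop(solve(crossprod(X), crossprod(X, y)))

# The same model fitted by the stats package.
beta_lm <- coef(lm(mpg ~ wt + hp, data = mtcars))

# Compare the two sets of estimates up to numerical tolerance.
all.equal(unname(beta_manual), unname(beta_lm))
```

Solving the normal equations directly is fine for a small, well-conditioned example like this. `lm()` itself relies on a QR decomposition, which is numerically more stable.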

According to data scientist Matt Adams, "I wouldn't even say R is for programmers. It's best suited for people that have data-oriented problems they're trying to solve, regardless of their programming aptitude."

### What are some criticisms of R?

R is said to be poor in memory management. Because of how environments work, all objects must be held in memory rather than as files on disk.
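
A small sketch shows what holding objects in memory means in practice. The vector length is arbitrary and chosen only for illustration.

```r
# A numeric vector of one million doubles lives entirely in RAM.
x <- numeric(1e6)

# object.size() estimates the memory used: about 8 bytes per element
# plus a small header.
size_bytes <- as.numeric(object.size(x))
format(object.size(x), units = "MB")
```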

In recent times, this limitation has been somewhat remedied. Newer developments from Microsoft are expected to address this as well: ParallelR for distributed computing, RHadoop for running on Hadoop nodes, the Reproducible R Toolkit, and RevoPemaR for writing parallel external memory algorithms. There are also packages and tools to handle Big Data in R.

Critics also note that R performs worse than some other languages and lacks proper unit testing frameworks.

Security was not built into R at the design stage. This means that R can't safely be used via a web server or embedded in a web browser. However, the use of isolated virtual containers can provide the necessary security to make this possible in a world of cloud computing.

Likewise, Shiny is an R package that can be used to create interactive web applications. DeployR is another tool for bringing R analytics to web dashboards.

### For Data Science, should I choose R or Python?

Both R and Python are open source. Both are maintained by active communities.

Both are interpreted languages. Both can be used interactively. Both have good tools to work with. While Python has 138,000+ packages, R has 12,000+ packages (May 2018). This is expected since Python is versatile while R is more focused on statistical computing. R is said to have more statistical packages that suit the work of a data scientist. However, Python's capabilities are improving due to NumPy, Pandas, scikit-learn, Keras, and TensorFlow. Python's graphical capabilities are also improving with matplotlib, Bokeh, Seaborn, and Plotly.

Because Python is more object-oriented, it may be a better choice for large projects. If the main work is data analysis, R may be preferred. If the work involves gathering and cleaning data as well, Python may be a better choice. Having said that, statisticians tend to prefer R while computer scientists go with Python.

A mix of the two, each used for its strengths, is a good choice. Python could be used for preprocessing data while R could do the statistical analysis.

### As a beginner, how can I get started with R?

Download and install R. This is sufficient to execute R commands and scripts. For a better developer experience, download and install RStudio. This is an IDE that integrates a text editor with syntax checking, an R console to execute commands, easy access to documentation, a panel to view variables for debugging, a plot viewer, and more.

Read about the most useful third-party packages. Install these from CRAN as required by your projects. One approach is to install the tidyverse collection of packages. This will install a number of packages useful for data science work.
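
One possible way to install packages only when they're missing is sketched below. The helper `install_if_missing` is a hypothetical name, not part of R, and the mirror URL is an assumption that can be replaced by any CRAN mirror.

```r
# Hypothetical helper: install only those packages that aren't available yet.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  available  <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!available]
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)   # names of the packages that had to be installed
}

# Uncomment to install the tidyverse collection (needs network access):
# install_if_missing("tidyverse")
```

After installation, `library(tidyverse)` attaches the core packages of the collection for the current session.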

Learn one or more of R Markdown, Shiny, knitr and bookdown. These will help you document your code and its results, and share them with others in various formats.

Learn the language syntax. Get familiar with R data types and structures. In particular, learn about vectors, lists and data frames. Next, learn about R's graphing capabilities. Common plotting systems to learn are *Base* and *ggplot2*.
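
A first session might look like the sketch below. It uses only base R, the data values are invented for the example, and the plot is written to a temporary PDF file so that it also works without a display.

```r
# Atomic vector: every element has the same type.
heights <- c(1.62, 1.75, 1.80)

# List: elements may differ in type and length.
person <- list(name = "Ada", scores = c(90, 85))

# Data frame: a table whose columns are equal-length vectors.
df <- data.frame(id = 1:3, height = heights)
str(df)

# Base graphics: draw a scatter plot into a temporary PDF file.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(df$id, df$height, xlab = "id", ylab = "height (m)", main = "Base R plot")
invisible(dev.off())
```

In an interactive session, the `pdf()` and `dev.off()` lines can be dropped so that the plot appears in the plot viewer instead.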

## Milestones

1976

A new language by the name of *S* is initiated at Bell Labs for statistical computations. It's written in Fortran. In 1988, it's rewritten in C. *S-PLUS* is a commercial implementation of *S*.

1993

*R* language is announced to the public by Ross Ihaka and Robert Gentleman of the Department of Statistics at the University of Auckland. Their work on R started in 1991. For portability, R is coded in ANSI standard C. In syntax, R resembles S but the underlying semantics is based on Scheme. Ihaka mentions, "We implemented the language by first writing an interpreter for a Scheme subset and then progressively mutating it to resemble S."

1995

R source code is released to all under the Free Software Foundation's GNU General Public License (GPL).

1997

*R Core Group* is formed to handle change requests. Group members are volunteers and contribute whenever possible.

2000

R version 1.0.0 is released to the public.

2015

*R Consortium* is founded to "support the worldwide community of users, maintainers and developers of R software". The Consortium collaborates with the *R Foundation* (founded in 2003), the governing body of the R Project.

2015

Microsoft acquires Revolution Analytics, which offers both community and enterprise distributions of R. In January 2016, Microsoft announces that Revolution products are being renamed and integrated into Microsoft's products.

2018

R version 3.5.0 is released. In May 2018, *Microsoft R Open (MRO)* version 3.5.0 becomes available.

2018

There are more than 12,500 packages on the *Comprehensive R Archive Network (CRAN)*. A similar number is listed on the *Microsoft R Application Network (MRAN)*.

## References

- Business Process Incubator. 2016. "Importance of Learning 'R' for Data Science." Blog, July 28. Accessed 2018-05-13.
- Castle, Nikki. 2017. "R vs. Python: What Language is Best for Building Data Models?" Datascience.com, July 20. Accessed 2018-05-14.
- Chambers, John M. 2014. "Object-Oriented Programming, Functional Programming and R." Statistical Science, Vol. 29, No. 2, pp. 167–180, Institute of Mathematical Statistics. Accessed 2018-05-14.
- CRAN. 2018a. "Contributed Packages." R Project, May. Accessed 2018-05-14.
- CRAN. 2018b. "Index of /src/base/R-3." R Project, April. Accessed 2018-05-14.
- Data Flair. 2017. "Introduction to R Programming | Features & Applications of R." Data Flair, March 20. Accessed 2018-05-14.
- El Gewily, Shady. 2017. "Does anyone use R for something different than machine learning and data science." Quora, April 18. Accessed 2018-05-14.
- Grolemund, Garrett. 2015. "Work-with-big-data." Webinars, RStudio GitHub, November 25. Accessed 2018-05-14.
- Ihaka, Ross. 1998. "R : Past and Future History: A Free Software Project." Interface '98, Statistics Department, The University of Auckland. Accessed 2018-05-13.
- Ihaka, Ross, and Robert Gentleman. 1996. "R: A Language for Data Analysis and Graphics." Journal of Computational and Graphical Statistics, vol. 5, no. 3, pp. 299-314. Accessed 2018-05-13.
- Krill, Paul. 2015. "Why R? The pros and cons of the R language." InfoWorld, June 30. Accessed 2018-05-14.
- Lee, Cheng Han. 2015. "How to Choose Between Learning Python or R First." Udacity, January 12. Accessed 2018-05-14.
- MRAN. 2018. "Packages." Microsoft R Application Network, Microsoft, May. Accessed 2018-05-14.
- MRAN Revolution Analytics. 2018. "Microsoft R Open & MKL Downloads." Microsoft R Application Network, Microsoft, May. Accessed 2018-05-14.
- Peng, Roger D. 2016. "R Programming for Data Science." December 22. Accessed 2018-05-13.
- PyPI. 2018. "The Python Package Index." Accessed 2018-05-14.
- R Consortium. 2018. "About." Accessed 2018-05-13.
- R Project. 2016. "R Logo." R Foundation. Accessed 2018-05-13.
- R Project. 2018. "The R Project for Statistical Computing." R Foundation. Accessed 2018-05-13.
- Sirosh, Joseph. 2015. "Microsoft Closes Acquisition of Revolution Analytics." Machine Learning Blog, Microsoft, April 6. Accessed 2018-05-14.
- Smith, David. 2016a. "Revolution R renamed Microsoft R, available free to developers and students." Revolutions Blog, January 12. Accessed 2018-05-13.
- Smith, David. 2016b. "Over 16 years of R Project history." Revolutions Blog, March 4. Accessed 2018-05-13.
- Wayner, Peter. 2017. "Python vs. R: The battle for data scientist mind share." InfoWorld, April 6. Accessed 2018-05-14.

## Further Reading

- Ihaka, Ross, and Robert Gentleman. 1996. "R: A Language for Data Analysis and Graphics." Journal of Computational and Graphical Statistics, vol. 5, no. 3, pp. 299-314. Accessed 2018-05-13.
- Chambers, John M. 2014. "Object-Oriented Programming, Functional Programming and R." Statistical Science, Vol. 29, No. 2, pp. 167–180, Institute of Mathematical Statistics. Accessed 2018-05-14.
- Peng, Roger D. 2016. "R Programming for Data Science." December 22. Accessed 2018-05-13.
- Data Flair. 2017. "Introduction to R Programming | Features & Applications of R." Data Flair, March 20. Accessed 2018-05-14.
- Asay, Matt. 2015. "In data science, the R language is swallowing Python." InfoWorld, July 24. Accessed 2018-05-13.

## See Also

- R Data Structures
- R Plotting Systems
- Object-Oriented Programming in R
- Vectorization in R
- Data Science
- Machine Learning