User:Ironholds/R

History

Initial work

The clock tower at the University of Auckland, where the first versions of R were written by Ross Ihaka and Robert Gentleman.

R was initially written by Ross Ihaka and Robert Gentleman, colleagues at the University of Auckland. Both had a strong interest in statistical programming, but felt that the commercial options available for their Macintosh computers were too limited, so they began working on an alternative for their personal use. This was initially based on Ihaka's familiarity with and appreciation of the Scheme programming language but, over time, grew to include more and more syntax borrowed from the S statistical programming language. The result was syntactically very like S, with two major differences, both resulting from the Scheme influence: a fixed amount of memory assigned at startup, combined with regular garbage collection to minimise memory usage, and the use of lexical scoping.[1] They named it R, both as a reference to S and after their shared first initial.[2]

In August 1993, some initial binaries were made available on the s-news mailing list; several people provided feedback, including Martin Mächler, who encouraged Gentleman and Ihaka to release R under the GNU General Public License, resulting in freely reusable and modifiable software. Despite initial reluctance, this release happened in June 1995, the result being a free and open-source statistical programming language and environment.[3] As a result of the combination of S-influenced syntax and GNU licensing, R is sometimes referred to as "GNU S".[4][5] Mächler also volunteered the use of resources at his parent institution, ETH Zurich, to host mailing lists for discussing and maintaining R; by 1997, this had led to the creation of the r-announce, r-devel and r-help mailing lists, for announcements from the developers, developer discussion and help respectively. As people began writing packages to extend the language, Kurt Hornik at the Vienna University of Technology began hosting CRAN, the Comprehensive R Archive Network, to store current and past versions of R packages and make them available to download.[3]

Increasing popularity and R Core team

With the introduction of the mailing lists for discussion, R development accelerated,[3] initially focused on matching the capabilities of Version 3 of S.[6] Without the bandwidth to keep up with all the patches and proposed changes, Ihaka and Gentleman formed the R Core team to take over maintenance of the language; by early 1998 this consisted of Gentleman, Ihaka, Mächler and Hornik, along with Peter Dalgaard, Thomas Lumley, Luke Tierney, Paul Murrell, Heiner Schwarte, Friedrich Leisch and Douglas Bates.[3]

Syntax

R's syntax is heavily influenced by Scheme and S, although ... Unlike most other languages, R has no formal language convention

Data structures

R has five common data structures, three homogeneous (meaning that all elements must be of the same type) and two heterogeneous (meaning that the contents may be of different types):

| Dimensions | Homogeneous | Heterogeneous |
|---|---|---|
| 1 | atomic vector | list |
| 2 | matrix | data frame |
| 3 | array | |

Unlike many other programming languages, R has no scalar types; a single value is simply a vector of length one, and all data types are capable of holding multiple elements.[7]
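The absence of scalars can be checked directly in base R. The following is a minimal sketch; the variable names are illustrative only.

```r
# A single number is stored as a double vector of length one.
single_value <- 2.4

is.vector(single_value)           # TRUE
length(single_value)              # 1
identical(single_value, c(2.4))   # TRUE: c() adds nothing to a lone value

# Because it is a vector, it can be indexed like one.
first_element <- single_value[1]
```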

Atomic vectors are of one of four common types: logical (containing TRUE and FALSE values), integer (containing whole numbers), double (containing numbers with a decimal place, and also known as numeric) and character (containing text). Construction of atomic vectors is usually done with the c() or "combine" function, which detects the type of the provided elements and generates a vector of that type. Each type also has a dedicated coercion function (as.logical(), as.integer(), as.double() and as.character()) that produces a vector of that type explicitly:

#Creating a double ('numeric') atomic vector with type detection
double_vector <- c(2.4)

#Creating a double atomic vector explicitly
also_a_double_vector <- as.double(2.4)

Atomic vectors of different types can be merged into a single, larger vector, but doing so necessitates the coercion of the elements to the type that can most flexibly hold them. The ordering for this runs (from most to least flexible) character, double, integer and logical: mixing a character and logical vector, for example, would produce a character vector, while mixing an integer and a double vector would produce a double.[8] Because vectors are stored contiguously, they cannot be resized; instead, appending to a vector or merging multiple vectors together creates an entirely new object.[9]
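The coercion ordering described above can be observed with c() and typeof(). This is a small illustrative sketch with made-up values:

```r
# Mixing types inside c() coerces every element to the most flexible type.
chr_from_mix <- c("a", TRUE)    # character wins over logical
dbl_from_mix <- c(1L, 2.5)      # double wins over integer
int_from_mix <- c(1L, TRUE)     # integer wins over logical

typeof(chr_from_mix)   # "character"
typeof(dbl_from_mix)   # "double"
typeof(int_from_mix)   # "integer"

chr_from_mix           # "a" "TRUE": the logical became text
int_from_mix           # 1 1: TRUE became the integer 1
```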

Lists are a form of vector, and are sometimes known as "recursive vectors";[10] the primary distinctions between lists and atomic vectors are that lists can contain elements of different types, and can contain other lists.[11]
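As a brief sketch of these distinctions, the list below (with hypothetical element names) mixes several types and nests another list:

```r
# A list can hold elements of different types, including other lists.
record <- list(
  label  = "example",
  count  = 3L,
  flags  = c(TRUE, FALSE),
  nested = list(a = 1, b = "two")
)

is.list(record)         # TRUE
is.vector(record)       # TRUE: lists are a form of vector
is.recursive(record)    # TRUE: hence "recursive vector"

# Each element keeps its own type.
element_types <- sapply(record, typeof)

# Elements of a nested list are reached by chaining accessors.
inner_value <- record$nested$b   # "two"
```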

Arrays are essentially atomic vectors with an additional attribute specifying their dimensions.[12] They are usually multidimensional, with the term typically reserved for structures of three or more dimensions. Matrices are a special case of arrays that have exactly two dimensions;[13] they are commonly used for statistical operations in R,[14] owing to their role as a mathematical structure.[15]
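The relationship between vectors, arrays and matrices can be sketched in base R by attaching a dim attribute to an ordinary vector. The names used here are illustrative:

```r
# Start with a plain integer vector of 24 elements.
cube <- 1:24

# Adding a 'dim' attribute turns it into a 2 x 3 x 4 array.
dim(cube) <- c(2, 3, 4)
is.array(cube)      # TRUE
cube[2, 3, 4]       # 24: the last element, since data fill column-first

# A matrix is an array with exactly two dimensions.
m <- matrix(1:6, nrow = 2)
is.matrix(m)        # TRUE
is.array(m)         # TRUE
attributes(m)       # only $dim, holding 2 and 3

# Matrices support the usual linear-algebra operations.
product <- m %*% t(m)   # a 2 x 2 matrix
```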

The most regularly used object for storing data is the data frame, a two-dimensional structure with named columns.[16] Data frames can contain columns of different types (although all elements within a column must have the same type), and architecturally are a modified form of a list.[17]
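A short sketch of a data frame, using made-up column names and values, shows both the per-column typing and the underlying list structure:

```r
# Each column is a vector of a single type; columns may differ in type.
df <- data.frame(
  id    = 1:3,
  label = c("a", "b", "c"),
  flag  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

dim(df)             # 3 3: three rows, three columns
names(df)           # "id" "label" "flag"

# Underneath, a data frame is a list of equal-length columns.
is.list(df)         # TRUE
typeof(df)          # "list"
class(df)           # "data.frame"

column_types <- sapply(df, typeof)
```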

Operators

R's standard assignment operator is the two-character <-;[18] this originated with APL, which heavily influenced R's predecessor, S. The more-standard = can also be used for assignment,[18] but has historically been discouraged due to the use of = in function calls, which rendered the operator ambiguous in some situations.
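One situation in which the two operators behave differently is inside a function call. The sketch below uses a hypothetical function, half(), to show that = names an argument while <- performs an assignment in the calling environment:

```r
# A hypothetical helper with a single parameter called 'value'.
half <- function(value) value / 2

# Inside a call, '=' matches the argument by name; no variable is created.
result_equals <- half(value = 10)
exists_after_equals <- exists("value", inherits = FALSE)   # FALSE

# Inside a call, '<-' assigns 'value' in the calling environment,
# and the assigned value is then passed to the function.
result_arrow <- half(value <- 10)
exists_after_arrow <- exists("value", inherits = FALSE)    # TRUE
```

Both calls return 5; only the second leaves a variable named value behind.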

Additional operators exist for selecting subsets of an object: [, [[ and $. [ is the most common, and tries to preserve the original type of the object; for example, list[1], to retrieve the first element in a list, will always retrieve it as a list, whatever its internal type. [[, on the other hand, consistently returns a single element; list[[1]] will retrieve the contents of the first entry of list.[19]
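The difference between the three subsetting operators can be sketched on a small list with illustrative names:

```r
lst <- list(a = 1:3, b = "text")

# '[' preserves the container: the result is still a list.
sub_list <- lst[1]
is.list(sub_list)       # TRUE
length(sub_list)        # 1

# '[[' extracts the contents of a single element.
contents <- lst[[1]]
is.list(contents)       # FALSE
contents                # 1 2 3

# '$' extracts a single element by name.
by_name <- lst$b        # "text"
```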

Functional programming

R's function parameters are named, but function calls can optionally rely on the order in which parameter names appear in the function definition:[20]

#Define a function
sum_input <- function(x, y){
    return(x + y)
}

#This works
sum_input(x = 10, y = 10)

#This also works
sum_input(10, 10)
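Named and positional arguments can also be mixed within one call. In the following sketch, which uses a hypothetical divide() function, named arguments are matched first and any remaining arguments are then matched by position:

```r
# Order matters for division, which makes the matching visible.
divide <- function(numerator, denominator) {
    return(numerator / denominator)
}

by_position <- divide(10, 2)                            # 5
by_name     <- divide(denominator = 2, numerator = 10)  # 5, order is free
mixed       <- divide(denominator = 2, 10)              # 5, 10 fills 'numerator'
```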

Lazy evaluation, metaprogramming, et al

Object-oriented programming

R's core language definition contains two different class systems for object-oriented programming. The older one, S3,... S3, S4, R5, R6

Distinguishing features

Multi-paradigm programming

R is a multi-paradigm programming language; although it is a functional programming language at its core, it contains a variety of methods for engaging in object-oriented programming, influenced by and inherited from S.[21] The

Non-standard evaluation

R's evaluation system is both non-standard and lazy (Ihaka, p. 305), something built into the initial design.

The non-standard nature of R's evaluation means that it is possible for a function to evaluate not only the values it is given, but also the code used to generate those values.[22] An example of this is the function to load packages, library(), which is called with the syntax:

#Load the package named "urltools"
library(urltools)

The value given (urltools) is neither a reserved word nor a string, and so under R's standard rules of evaluation, this code should fail; the library() function should search for an object called urltools, fail to find it in scope, and throw an error. Instead, library() takes the value provided, holds it unevaluated, and converts it into a string before processing it further.[23] This is a rare feature of programming languages, and "allows you to write functions that are extremely powerful", but renders code less readable and makes it more difficult to predict the outcome of running it.[24] Robert Gentleman, one of the language authors, explicitly recommends that "users should only make use of [non-standard evaluation] for compelling reasons".[25]
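The same mechanism is available to user-written functions through base R's substitute(). The sketch below is an illustration of the technique rather than the actual implementation of library(); the function names and the undefined symbol are hypothetical.

```r
# Capture the code supplied as an argument instead of its value.
name_of <- function(arg) {
    expr <- substitute(arg)   # the unevaluated expression
    deparse(expr)             # converted to a string
}

# No object called 'some_undefined_name' exists, yet this succeeds.
captured <- name_of(some_undefined_name)   # "some_undefined_name"

# A function using standard evaluation fails on the same input.
plain <- function(arg) arg
outcome <- tryCatch(
    plain(some_undefined_name),
    error = function(e) "error: object not found"
)
```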

Lexical scoping

R's scope is lexical, meaning that a variable used inside a function is resolved in the environment in which the function was defined, and that variables created inside a function are not visible outside it. This is not a feature found in S or S-PLUS, but is native to Scheme and was imported from there by Ihaka and Gentleman when writing R.[26] This functions by associating each function with an environment,

Plotting and data visualisation

R's core language definition can generate visualisations of datasets or calculations, including point and line graphs, density plots, and three-dimensional visualisations.[27]
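A minimal sketch of these plot types, using only the graphics and stats packages that ship with R and simulated data, might look as follows. Drawing to a null PDF device is an assumption made so the example also runs without a display.

```r
# Draw to a null device so the sketch runs non-interactively;
# remove this line and dev.off() to see the plots on screen.
pdf(NULL)

set.seed(1)
x <- seq(0, 2 * pi, length.out = 50)
y <- sin(x) + rnorm(50, sd = 0.1)

plot(x, y)              # point graph of the noisy data
lines(x, sin(x))        # line graph of the underlying curve

dens <- density(y)      # kernel density estimate
plot(dens)              # density plot

# Three-dimensional surface over a grid of x values.
z <- outer(x, x, function(a, b) sin(a) * cos(b))
persp(x, x, z)

invisible(dev.off())
```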

Packages

R is distinguished by the large number of extensions, or "packages", associated with the language. These are largely hosted on the Comprehensive R Archive Network (CRAN), and as of 2015 over 6,000 distinct packages were available.[2] The existence of packages and the vast number available are seen as a "major strength" of the language,[28] as they give interested researchers and programmers access to specialised types of analysis that would otherwise be available only in commercial software.[2] Researchers studying the social dynamics of open-source environments have also argued that the package system has played a substantial role in the success of R as a language; it allows developers to collaborate on improving the language without overwhelming the team that maintains the core implementation, and means that the tools available in R increase organically to match user needs.[29] One of the most prominent packages is ggplot2, a data visualisation package that implements Leland Wilkinson's The Grammar of Graphics.[30]

Many of these packages are developed and maintained on R-Forge, a free fork of the SourceForge software similar to GForge, run by the R Foundation. R-Forge offers package authors convenient access to version control, along with package-specific mailing lists and announcement tools.[31]

Language community

Conferences

The R community's primary conference is "useR!", which is formally supported by the R Foundation.[32] The first conference was held in Austria from 20 to 22 May 2004, with a focus on new developments in the core language and distribution, and best programming practices.[33] "useR!" conferences held so far are:

| Dates | Conference | Location | Notes |
|---|---|---|---|
| 20-22 May 2004 | useR! 2004 | Vienna University of Technology, Vienna, Austria | [33] |
| 15-17 June 2006 | useR! 2006 | Vienna University of Economics and Business, Vienna, Austria | [34] |
| 8-10 August 2007 | useR! 2007 | Iowa State University, Ames, Iowa, United States | [35] |
| 12-14 August 2008 | useR! 2008 | Technical University of Dortmund, Dortmund, Germany | [36] |
| 8-10 July 2009 | useR! 2009 | ENSAR, Rennes, France | [37] |
| 21-23 July 2010 | useR! 2010 | National Institute of Standards and Technology, Gaithersburg, Maryland, United States | [38] |
| 16-18 August 2011 | useR! 2011 | University of Warwick, Coventry, United Kingdom | [39] |
  • useR! 2012, Nashville, Tennessee, USA
  • useR! 2013, Albacete, Spain
  • useR! 2014, Los Angeles, USA
  • useR! 2015, Aalborg, Denmark[40]

Integration with other languages

As a language implemented in C, R provides a native interface to C via the .C() and .Call() functions.[41] In 2005, Dominick Samperi submitted a dedicated C++ connector as part of the RQuantLib package; this became its own package, Rcpp, in 2006, but was not updated after that year and was archived in 2009. Dirk Eddelbuettel, a Debian and R developer, began work on revitalising this project and released a dedicated C++ API in November 2008, under the Rcpp name. This API was rewritten in 2009 by Eddelbuettel and Romain François, and is still maintained.[42] The package provides a convenient interface between R and C++ objects, code and libraries, along with syntactic sugar to make C++'s representation of operations more closely resemble R's.[43]


Outside of the C/C++ stack, R can be used from Python with the RPy module,[44]

Criticism

The GNU licensing of R has attracted criticism for making the language unattractive to commercial organisations that want to embed R in their projects, due to the license's viral nature.[45]

References

  1. Ihaka 1998, pp. 1-2.
  2. Tippmann 2015, p. 110.
  3. Ihaka 1998, p. 4.
  4. Matloff 2011, p. xix.
  5. R Core Team 2001, p. 2.
  6. Ihaka 1998, p. 5.
  7. Wickham 2015, p. 13.
  8. Wickham 2015, pp. 15-16.
  9. Matloff 2011, p. 26.
  10. Matloff 2011, pp. 85-86.
  11. Wickham 2015, p. 18.
  12. Matloff 2009, p. 59.
  13. Matloff 2009, pp. 82-83.
  14. Wickham 2015, p. 24.
  15. Dalgaard 2008, p. 17.
  16. Wickham 2015, p. 27.
  17. Matloff 2011, p. 100.
  18. Matloff 2011, p. 4.
  19. Wickham 2015, pp. 40-42.
  20. James 2013, p. 44.
  21. Jackman 2003, p. 20.
  22. Wickham 2015, pp. 259-260.
  23. Gentleman 2008, p. 53.
  24. Wickham 2015, p. 278.
  25. Gentleman 2008, p. 52.
  26. Gentleman 2008, p. 59.
  27. James 2013, pp. 46-50.
  28. Matloff 2011, p. 355.
  29. Fox 2009, p. 10.
  30. Valero-Mora 2010, p. 1.
  31. Theußl 2009, pp. 9-10.
  32. "R: Conferences". R Foundation for Statistical Computing. Retrieved 14 March 2015.
  33. R Newsletter editorial board 2003, p. 42.
  34. useR! 2006 Organising Committee 2005, p. 43.
  35. Cook 2007, p. 43.
  36. useR! 2008 Organising Committee 2007, p. 75.
  37. useR! 2009 Organising Committee 2009, p. 67.
  38. Mullen 2010, p. 77.
  39. useR! 2011 Organising Committee 2009, p. 79.
  40. "The useR! Conference 2015". Retrieved 17 November 2014.
  41. Matloff 2011, pp. 323-324.
  42. Eddelbuettel 2011, p. 2.
  43. Eddelbuettel 2011, pp. 6-8.
  44. Matloff 2011, p. 330.
  45. Woods, Dan (27 January 2015). "Microsoft's Revolution Analytics Acquisition Is The Wrong Way To Embrace R". Forbes. Retrieved 13 March 2015.

Bibliography