dplyr

Original author(s): Hadley Wickham
Initial release: January 7, 2014
Stable release: 1.0.0 / June 1, 2020
Written in: R
License: GPLv2
Website: dplyr.tidyverse.org

dplyr is one of the core packages of the tidyverse in R. It is primarily a set of functions designed to make data frame manipulation intuitive and user-friendly. Data analysts typically use dplyr to transform existing datasets into a format better suited to a particular type of analysis or data visualization.

Authored primarily by Hadley Wickham, dplyr was launched in 2014.[1]

The five core verbs

Although dplyr includes several dozen functions for various forms of data manipulation, the package features five primary verbs:[2]

filter(), which is used to extract rows from a data frame, based on conditions specified by the user;

select(), which is used to subset a data frame by its columns;

arrange(), which is used to sort the rows of a data frame by the values in particular columns;

mutate(), which is used to create new variables by altering or combining values from existing columns; and

summarize(), also spelled summarise(), which is used to collapse values from a data frame into a single summary.
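
The five verbs can be illustrated with a minimal sketch, assuming dplyr is installed. The people data frame and its column names are invented for this example and do not come from the package.

```r
library(dplyr)

# Hypothetical data frame, made up purely for illustration.
people <- data.frame(
  name   = c("Ana", "Ben", "Cho", "Dev"),
  team   = c("a", "b", "a", "b"),
  height = c(170, 182, 165, 178),  # centimetres
  mass   = c(60, 90, 55, 80)       # kilograms
)

# filter(): keep only the rows that satisfy a condition.
tall <- filter(people, height > 168)

# select(): keep only the named columns.
name_height <- select(people, name, height)

# arrange(): sort rows, here from tallest to shortest.
by_height <- arrange(people, desc(height))

# mutate(): add a new column computed from existing ones.
with_bmi <- mutate(people, bmi = mass / (height / 100)^2)

# summarise(): collapse all rows into a single summary row.
overall <- summarise(people, mean_height = mean(height))
```

Each verb takes a data frame as its first argument and returns a new data frame, leaving people itself unchanged.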

Additional functions

In addition to its five main verbs, dplyr includes several other functions for exploring and manipulating data frames. These include:

count(), which is used to count the number of observations for each unique value of a variable or categorical attribute;

slice_max(), which returns a data subset that contains the rows with the highest values of some particular variable;

slice_min(), which returns a data subset that contains the rows with the lowest values of some particular variable.
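
The following sketch shows these three functions on a small, invented data frame (the people object and its columns are assumptions made for illustration, not part of dplyr).

```r
library(dplyr)

# Hypothetical data frame, made up purely for illustration.
people <- data.frame(
  name   = c("Ana", "Ben", "Cho", "Dev"),
  team   = c("a", "b", "a", "b"),
  height = c(170, 182, 165, 178)
)

# count(): one row per distinct value of team, with the row count in column n.
team_counts <- count(people, team)

# slice_max(): the n rows with the largest values of height.
tallest <- slice_max(people, height, n = 2)

# slice_min(): the n rows with the smallest values of height.
shortest <- slice_min(people, height, n = 1)
```

By default slice_max() and slice_min() keep tied rows, so they can return more than n rows when several rows share the boundary value.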

Built-in datasets

The dplyr package comes with five datasets: band_instruments, band_instruments2, band_members, starwars, and storms.
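
As a brief sketch, assuming dplyr is installed, these datasets become available by name once the package is attached. The variable names below are chosen only for this example.

```r
library(dplyr)

# The bundled datasets are ordinary data frames that can be used directly.
first_characters <- head(starwars, 3)  # first three rows of starwars
members <- band_members                # a small table of band members
```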

References

  1. "Introducing dplyr". blog.rstudio.com. Retrieved 2020-09-02.
  2. Grolemund, Garrett; Wickham, Hadley. "5 Data transformation". R for Data Science.