David G. Robinson (data scientist)

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by WikiCleanerBot at 08:21, 22 October 2020. It may differ significantly from the current revision.

David G. Robinson is a data scientist at the analytics company Heap. He is a co-author of the tidytext package for the R programming language and of the O’Reilly book Text Mining with R. Robinson previously worked as Chief Data Scientist at DataCamp and as a data scientist at Stack Overflow.[1] He was also a data engineer at Flatiron Health in 2019.

Education

Robinson received his PhD in Quantitative and Computational Biology from Princeton University[2] and his bachelor's degree (A.B.) in Statistics from Harvard University in 2010.[3]

Career

Robinson previously worked on the Data Insights Engineering team at Flatiron Health, where he used data science in the fight against cancer. He has published three courses on DataCamp that help people learn R and data science.[4] With Julia Silge, he co-authored the book Text Mining with R: A Tidy Approach,[5] a guide to drawing insights from text using the tidytext package in R, published by O’Reilly in July 2017.[6] Robinson also wrote Introduction to Empirical Bayes: Examples from Baseball Statistics, an e-book that demonstrates the statistical method of empirical Bayes through the example of estimating baseball batting averages.[7]

Robinson is known for his analysis of Donald Trump's tweets in 2016, when he found that posts from Trump's official account came from multiple sources.[8][9][10]

Publications

Robinson's publications include "Widespread changes in mRNA stability contribute to quiescence-specific gene expression patterns in a fibroblast model of quiescence",[11] "broom: An R package for converting statistical analysis objects into tidy data frames",[12] "A nested parallel experiment demonstrates differences in intensity-dependence between RNA-seq and microarrays",[13] "subSeq: Determining appropriate sequencing depth through efficient read subsampling",[14] "Design and Analysis of Bar-seq Experiments",[15] and "OASIS: an automated program for global investigation of bacterial and archaeal insertion sequences".[16]

His e-book Introduction to Empirical Bayes helps readers understand Bayesian methods for estimating binomial proportions through a series of examples drawn from baseball statistics.[17]

References

  1. ^ "Learn R, Python & Data Science Online". Retrieved 2020-04-01.
  2. ^ "QCB Graduate | Lewis-Sigler Institute". lsi.princeton.edu. Retrieved 2020-04-01.
  3. ^ Robinson, David. "LinkedIn".
  4. ^ "The gapminder dataset | R". campus.datacamp.com. Retrieved 2020-04-01.
  5. ^ Silge, Julia; Robinson, David (12 June 2017). Text Mining with R: A Tidy Approach (First ed.). Sebastopol, CA. ISBN 978-1-4919-8162-7. OCLC 990182937.
  6. ^ Silge, Julia; Robinson, David. Text Mining with R.
  7. ^ "Introduction to Empirical Bayes: Examples from Baseball Statistics". Gumroad. Retrieved 2020-04-01.
  8. ^ Greenemeier, Larry. "Only Some of @realDonaldTrump's Tweets Are Actually Donald Trump". Scientific American. Retrieved 2020-06-01.
  9. ^ Berger, Arielle. "DATA SCIENTIST: There's an easy way to tell if one of Trump's tweets came from him or his campaign". Business Insider. Retrieved 2020-06-01.
  10. ^ Kahn, Andrew; Philbrick, Ian Prasad (2016-08-15). "Who Wrote These Donald Trump Tweets?". Slate. ISSN 1091-2339. Retrieved 2020-06-01.
  11. ^ Johnson, Elizabeth L.; Robinson, David G.; Coller, Hilary A. (2017-02-01). "Widespread changes in mRNA stability contribute to quiescence-specific gene expression patterns in a fibroblast model of quiescence". BMC Genomics. 18 (1): 123. doi:10.1186/s12864-017-3521-0. ISSN 1471-2164. PMC 5286691. PMID 28143407.
  12. ^ Robinson, David (2014-12-19). "broom: An R Package for Converting Statistical Analysis Objects Into Tidy Data Frames". arXiv:1412.3565 [stat.CO].
  13. ^ Robinson, David G.; Wang, Jean; Storey, John D. (2015). "A Nested Parallel Experiment Demonstrates Differences in Intensity-Dependence Between RNA-Seq and Microarrays". Nucleic Acids Research. 43 (20): gkv636. bioRxiv 10.1101/013342. doi:10.1093/nar/gkv636. PMC 4787771. PMID 26130709.
  14. ^ Robinson, David G.; Storey, John D. (2014-12-01). "subSeq: Determining Appropriate Sequencing Depth Through Efficient Read Subsampling". Bioinformatics. 30 (23): 3424–3426. doi:10.1093/bioinformatics/btu552. ISSN 1367-4803. PMC 4296149. PMID 25189781.
  15. ^ Robinson, David G.; Chen, Wei; Storey, John D.; Gresham, David (2014-01-01). "Design and Analysis of Bar-seq Experiments". G3: Genes, Genomes, Genetics. 4 (1): 11–18. doi:10.1534/g3.113.008565. ISSN 2160-1836. PMC 3887526. PMID 24192834.
  16. ^ Robinson, David G.; Lee, Ming-Chun; Marx, Christopher J. (2012-12-01). "OASIS: an automated program for global investigation of bacterial and archaeal insertion sequences". Nucleic Acids Research. 40 (22): e174. doi:10.1093/nar/gks778. ISSN 0305-1048. PMC 3526298. PMID 22904081.
  17. ^ "Announcing the release of my e-book: Introduction to Empirical Bayes". Variance Explained. 7 February 2017. Retrieved 2020-04-13.