Douglass Read Cutting is a software designer who creates and advocates for open-source search technology. He founded Lucene and, with Mike Cafarella, Nutch, both open-source search projects that are now managed through the Apache Software Foundation. Cutting and Cafarella are also the co-founders of Apache Hadoop.
Education and early career
Before developing Lucene, Cutting held search technology positions at Xerox PARC, where he worked on the Scatter/Gather algorithm and on computational stylistics. He also worked at Excite, where he was one of the chief designers of the search engine, and at Apple Inc., where he was the primary author of the V-Twin text search framework.
Open source projects
Lucene, a search indexer, and Nutch, a spider or crawler, are the two key components of an open-source general search platform, which first crawls the Web for content and then structures it into a searchable index. Under Cutting's leadership, these two projects brought the open-source model of general-purpose software projects such as Linux and MySQL to the specialized domain of search. In a 2017 article, Cutting was quoted as saying, "open source is a requirement for business."
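The crawl-then-index flow described above can be illustrated with a minimal sketch. It does not use the Lucene or Nutch APIs. The `PAGES` dictionary is a hypothetical stand-in for crawler output, and `tokenize`, `build_index`, and `search` are illustrative names chosen for this example. The sketch builds a simple inverted index (term to set of URLs) and answers queries by intersecting the sets for each query term.

```python
import re
from collections import defaultdict

# Hypothetical crawler output: URL -> fetched page text.
PAGES = {
    "http://example.org/a": "Open source search engines index text",
    "http://example.org/b": "A crawler fetches pages for the index",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index mapping each term to the URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

if __name__ == "__main__":
    idx = build_index(PAGES)
    print(sorted(search(idx, "crawler index")))
```

A production indexer also handles ranking, stemming, and on-disk storage, none of which this sketch attempts.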
Use of MapReduce paradigm
In December 2004, Google Labs published a paper on MapReduce, a programming model that allows very large-scale computations to be parallelized easily across large clusters of servers. Recognizing the paper's importance for extending Lucene to extremely large search problems, Cutting and Cafarella created the open-source Hadoop framework, which allows applications based on the MapReduce paradigm to run on large clusters of commodity hardware. Cutting worked at Yahoo!, where he led the Hadoop project full-time. He later went on to work for Cloudera.
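As a rough illustration of the MapReduce paradigm, the sketch below counts words in a single process. It is not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are assumptions made for this example. In a real cluster, the map and reduce calls would run on many machines at once, which is possible because each call depends only on its own input.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)

def map_reduce(documents):
    """Run map, shuffle, and reduce sequentially over the documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    print(map_reduce(["a b a", "b c"]))
```

Because no map call reads another document and no reduce call reads another key, the work can be split across servers without coordination beyond the shuffle step.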
Open source foundations and awards
In July 2009, Cutting was elected to the board of directors of the Apache Software Foundation, and in September 2010, he was elected its chairman.