Spark NLP

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Liance at 15:23, 15 August 2019. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Spark NLP
Original author(s): John Snow Labs
Initial release: October 2017[1]
Stable release: 2.0 / March 2019
Repository: github.com/JohnSnowLabs/spark-nlp
Written in: Python, Scala
Operating system: Linux, Windows, macOS, OS X
Type: Natural language processing
License: Apache License
Website: www.johnsnowlabs.com/spark-nlp/

Spark NLP is an open-source text processing library built on top of Apache Spark and its Spark ML library.[2][3][4][5][6] It provides an API of natural language processing annotators that is designed to scale across a distributed, large-scale environment.

Main features

Several annotators are provided out of the box for both Python and Scala:

  • Tokenizer: Word tokens
  • Normalizer: Text cleaning
  • Stemmer: Hard stems
  • Lemmatizer: Lemmas
  • RegexMatcher: Rule matching
  • TextMatcher: Phrase matching
  • Chunker: Meaningful phrase matching
  • DateMatcher: Date-time parsing
  • SentenceDetector: Sentence boundary detection
  • DeepSentenceDetector: Sentence boundary detection with machine learning
  • POSTagger: Part of speech tagger
  • ViveknSentimentDetector: Sentiment analysis
  • SentimentDetector: Sentiment analysis
  • Named Entity Recognition CRF annotator
  • Named Entity Recognition Deep Learning annotator
  • SpellChecker: Norvig algorithm
  • SpellChecker: Symmetric delete
  • Dependency Parser: Unlabeled grammatical relation
  • Typed Dependency Parser: Labeled grammatical relation
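
Annotators of this kind are typically chained, with each stage reading the output of an earlier one. The following is a minimal sketch of that idea in plain Python. It does not use Spark or the Spark NLP API: the function names (tokenizer, normalizer, stemmer, run_pipeline), the column names, and the naive suffix-stripping rule are all assumptions made for illustration, and a real stemmer or tokenizer would be considerably more careful.

```python
import re
from typing import Any, Callable, Dict, List, Sequence

# A "row" maps column names to values; each stage adds one new column.
# This only imitates the annotator-chaining idea; it is not Spark NLP code.
Row = Dict[str, Any]
Stage = Callable[[Row], Row]


def tokenizer(input_col: str, output_col: str) -> Stage:
    """Hypothetical stand-in for a tokenizer: words and single punctuation marks."""
    def run(row: Row) -> Row:
        tokens = re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", str(row[input_col]))
        return {**row, output_col: tokens}
    return run


def normalizer(input_col: str, output_col: str) -> Stage:
    """Hypothetical stand-in for text cleaning: lowercase, keep only a-z and 0-9."""
    def run(row: Row) -> Row:
        cleaned = [re.sub(r"[^a-z0-9]", "", tok.lower()) for tok in row[input_col]]
        return {**row, output_col: [tok for tok in cleaned if tok]}
    return run


def stemmer(input_col: str, output_col: str,
            suffixes: Sequence[str] = ("ing", "ly", "s")) -> Stage:
    """Naive suffix stripping (an assumption, not a published stemming algorithm)."""
    def stem(word: str) -> str:
        for suffix in suffixes:
            # Keep at least three characters so short words are left alone.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[: -len(suffix)]
        return word

    def run(row: Row) -> Row:
        return {**row, output_col: [stem(tok) for tok in row[input_col]]}
    return run


def run_pipeline(text: str, stages: List[Stage]) -> Row:
    """Apply each stage in order, accumulating output columns on one row."""
    row: Row = {"text": text}
    for stage in stages:
        row = stage(row)
    return row


example = run_pipeline(
    "The cats are jumping, quickly!",
    [
        tokenizer("text", "token"),
        normalizer("token", "normalized"),
        stemmer("normalized", "stem"),
    ],
)
print(example["stem"])  # ['the', 'cat', 'are', 'jump', 'quick']
```

Because every stage only adds a column, intermediate results such as the raw tokens remain available alongside the final stems. In a distributed setting the same pattern would apply per row of a dataset, which is the scaling approach the library is described as taking.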

References

  1. ^ Talby, David. "Introducing the Natural Language Processing Library for Apache Spark". databricks.com. Databricks. Retrieved 29 March 2019.
  2. ^ Team, Editorial (2018-09-04). "The Use of NLP to Extract Unstructured Medical Data From Text". insideBIGDATA. Retrieved 2019-03-29.
  3. ^ "John Snow Labs' Natural Language Understanding Software Gets "State of the Art" Recognition in Three Industry Events". StartUp Beat. 2018-07-19. Retrieved 2019-03-29.
  4. ^ Ellafi, Saif Addin (2018-02-28). "Comparing production-grade NLP libraries: Running Spark-NLP and spaCy pipelines". O'Reilly Media. Retrieved 2019-03-29.
  5. ^ Ellafi, Saif Addin (2018-02-28). "Comparing production-grade NLP libraries: Accuracy, performance, and scalability". O'Reilly Media. Retrieved 2019-03-29.
  6. ^ Ewbank, Kay. "Spark Gets NLP Library". www.i-programmer.info.

Category:Software Category:Open-source artificial intelligence