List of text mining software

Text mining software is available from many commercial vendors and open source projects.

Commercial


  • AeroText – a suite of text mining applications for content analysis. Content used can be in multiple languages.
  • Angoss – Angoss Text Analytics provides entity and theme extraction, topic categorization, sentiment analysis and document summarization via the embedded Lexalytics Salience Engine. The software can merge the output of unstructured, text-based analysis with structured data to provide additional predictive variables for improved predictive models and association analysis.
  • Attensity – hosted, integrated and stand-alone text mining (analytics) software that uses natural language processing technology to address collective intelligence in social media and forums; the voice of the customer in surveys and emails; customer relationship management; e-services; research and e-discovery; risk and compliance; and intelligence analysis.
  • AUTINDEX – a commercial text mining software package based on linguistic analysis, developed by IAI (Institute for Applied Information Sciences), Saarbrücken.
  • Autonomy – text mining, clustering and categorization software
  • Averbis – provides text analytics, clustering and categorization software, as well as terminology management and enterprise search
  • Basis Technology – provides a suite of text analysis modules to identify language, enable search in more than 20 languages, extract entities, and efficiently search for and translate entities.
  • Clarabridge – text analytics (text mining) software, including natural language processing (NLP), machine learning, clustering and categorization. Provides SaaS, hosted and on-premise text and sentiment analytics that enable companies to collect, listen to, analyze, and act on the Voice of the Customer (VOC) from both external sources (Twitter, Facebook, Yelp!, product forums, etc.) and internal sources (call center notes, CRM, Enterprise Data Warehouse, BI, surveys, emails, etc.).
  • Complete Discovery Source – provides software and services for data discovery and data analytics via Nytrix CIY and other proprietary tools.
  • Daedalus, S.A. – provides a text analytics software engine to automatically extract meaning from all types of multimedia content.
  • Eduworks – software and solutions providing analytics and text mining in education, competency management, and training.
  • Endeca Technologies – provides software to analyze and cluster unstructured text.
  • EpiAnalytics – provides integrated, cloud-based software for tagging and classifying text and content, used to organize unstructured text-based comment data in CRM, surveys and social posts or listings.
  • Expert System S.p.A. – suite of semantic technologies and products for developers and knowledge managers.
  • FICO – provider of decision management solutions powered by advanced analytics (includes text analytics).
  • Feith Systems – provides text-recognition-enabled software (including optical character recognition and automated redaction) for business intelligence, KPI dashboards, enterprise reporting, and records management, with automated categorization for structured and unstructured data.
  • General Sentiment – social intelligence platform that uses natural language processing to discover affinities between fans of brands and fans of traditional television shows in social media. Its stand-alone text analytics capture a social knowledge base on billions of topics, stored back to 2004.
  • Health Language Analytics – provides an infrastructure for rapid deployment of BIG TEXT solutions for coding, search, extraction and predictive modelling. It has been applied extensively to medical records and reports.[1]
  • IBM LanguageWare – the IBM suite for text analytics (tools and runtime).
  • IBM SPSS – provider of Modeler Premium (previously called IBM SPSS Modeler and IBM SPSS Text Analytics), which contains advanced NLP-based text analysis capabilities (multilingual sentiment, event and fact extraction) that can be used in conjunction with predictive modeling. Text Analytics for Surveys categorizes survey responses using NLP-based capabilities for further analysis or reporting.
  • intelligentCAPTURE – from AGI-Information Management Consultants; integrates data capture from paper and digital text, translation, linguistic text analytics, and search for libraries and documentation.
  • Inxight – provider of text analytics, search, and unstructured visualization technologies. Inxight was acquired by Business Objects, which was in turn acquired by SAP AG in 2008.
  • LanguageWare – text analysis libraries and customization software from IBM.
  • Language Computer Corporation – text extraction and analysis tools, available in multiple languages.
  • Lexalytics – provider of a text analytics engine used in social media monitoring, voice of the customer, survey analysis, and other applications.
  • LexisNexis – provider of business intelligence solutions based on an extensive news and company information content set. LexisNexis acquired DataOps to pursue search
  • Mathematica – provides built-in tools for text alignment, pattern matching, clustering and semantic analysis.
  • Medallia – offers one system of record for survey, social, text, written and online feedback.
  • Megaputer Intelligence – developer of PolyAnalyst software; offers text and data mining and analytics.
  • NetOwl – suite of multilingual text and entity analytics products, including entity extraction, link and event extraction, sentiment analysis, geotagging, name translation, name matching, and identity resolution, among others.
  • Omniviz from Instem Scientific – data mining and visual analytics tool.[2]
  • SAS – SAS Text Miner and Teragram; commercial text analytics, natural language processing, and taxonomy software used for Information Management.
  • Semantria – offers its services via an API and an Excel plugin. It is a spinoff of the text-analysis software Lexalytics, and differs in that it incorporates a bigger knowledge base and uses deep learning.
  • Smartlogic – Semaphore; Content Intelligence platform containing commercial text analytics, natural language processing, rule-based classification, ontology/taxonomy modelling and information visualization software used for Information Management.
  • StatSoft – provides STATISTICA Text Miner as an optional extension to STATISTICA Data Miner, for Predictive Analytics Solutions.
  • Sysomos – provider of a social media analytics software platform, including text analytics and sentiment analysis on online consumer conversations.
  • Textalytics – Meaning as a Service: a set of text analytics APIs that offer vertical, high-level functionality targeted at specific usage scenarios: Semantic Publishing, Media Analysis, Voice of the Customer.
  • WordStat – content analysis and text mining add-on module of QDA Miner for analyzing large amounts of text data.
  • Xpresso – an engine developed by Abzooba's core technology group, focused on the automated distillation of expressions in social media conversations.[3]
  • Thomson Data Analyzer – enables complex analysis on patent information, scientific publications and news.

Open source

  • Carrot2 – text and search results clustering framework.
  • GATE – General Architecture for Text Engineering, an open-source toolbox for natural language processing and language engineering
  • Gensim – large-scale topic modelling and extraction of semantic information from unstructured text (Python).
  • OpenNLP – natural language processing toolkit.
  • Natural Language Toolkit (NLTK) – a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language.
  • RapidMiner with its Text Processing Extension – data and text mining software.
  • Unstructured Information Management Architecture (UIMA) – a component framework to analyze unstructured content such as text, audio and video, originally developed by IBM.
  • The programming language R provides a framework for text mining applications in the package tm.[4] The Natural Language Processing task view contains tm and other text mining library packages.[5]
  • The KNIME Text Processing extension.
  • KH Coder – for content analysis, text mining or corpus linguistics.
  • The PLOS Text Mining Collection[6]


  1. ^
  2. ^ Yang, Yunyun; Akers, Lucy; Klose, Thomas; Barcelon Yang, Cynthia (2008). "Text mining and visualization tools – Impressions of emerging capabilities". World Patent Information 30 (4): 280. doi:10.1016/j.wpi.2008.01.007. 
  3. ^ ":: Welcome to Abzooba ::". Retrieved 2013-10-13. 
  4. ^ Introduction to the tm Package: Text Mining in R
  5. ^ CRAN Task View: Natural Language Processing
  6. ^ "Table of Contents: Text Mining". PLOS.