= List of text mining methods =

Text mining is the process of extracting information from unstructured text and finding patterns or relations in it. Text mining methods are the individual techniques used in that process, and the choice among them depends on their suitability for a given data set. Below is a list of text mining methods.

- Centroid-based Clustering: Unsupervised learning method. Each cluster is represented by a central point (centroid), and every data point is assigned to the cluster with the nearest centroid.
  - Fast Global K-Means: A variant designed to accelerate Global K-Means.
  - Global K-Means: An incremental algorithm that begins with one cluster and adds one cluster center at a time until the required number of clusters is reached.
  - K-Means: An algorithm that takes two inputs: K, the number of clusters, and a data set. It alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points.
  - FW-K-Means: Feature-weighting K-Means. Used with the vector space model. Assigns weights to features to reduce the influence of noise.
  - Two-Level-K-Means: The regular K-Means algorithm runs first. Clusters that do not reach the threshold are then selected for subdivision into subclasses.
- Cluster Algorithm
  - Hierarchical Clustering
    - Agglomerative Clustering: Bottom-up approach. Each data point starts as its own cluster, and the closest clusters are merged step by step into larger clusters.
    - Divisive Clustering: Top-down approach. Large clusters are split into smaller clusters.
  - Density-based Clustering: Clusters are regions where data points are densely packed, separated by regions of lower density.
    - DBSCAN
  - Distribution-based Clustering: Clusters are formed by fitting statistical distributions to the data. Points are grouped according to the distribution they most likely belong to.
    - Expectation-maximization algorithm
- Collocation
- Stemming Algorithm
  - Truncating Methods: Remove the suffix or prefix of a word.
    - Lovins Stemmer: Removes the longest matching suffix from a word.
    - Porter Stemmer: Strips suffixes by applying a fixed sequence of rewrite rules in several steps. Porter's related Snowball framework allows programmers to define their own stemmers for other languages.
  - Statistical Methods: A statistical procedure is applied to the words, typically resulting in affixes being removed.
    - N-Gram Stemmer: Uses n-grams, sequences of n consecutive characters taken from a word, and groups words that share many n-grams.
    - Hidden Markov Model (HMM) Stemmer: Transitions between states are governed by probability functions.
    - Yet Another Suffix Stripper (YASS) Stemmer: Uses a hierarchical approach to create clusters of words. Each cluster is then treated as a class of equivalent words, and its centroid is taken as the stem.
  - Inflectional & Derivational Methods
    - Krovetz Stemmer: Converts words to stems that are valid English words.
    - Xerox Stemmer: Removes prefixes.
- Term Frequency
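  - Term Frequency-Inverse Document Frequency (TF-IDF): Weights a term by how often it appears in a document, offset by how many documents in the collection contain it.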
- Topic Modeling
  - Latent Semantic Analysis (LSA)
  - Latent Dirichlet Allocation (LDA)
  - Non-Negative Matrix Factorization (NMF)
  - Bidirectional Encoder Representations from Transformers (BERT)
- Wordscores: First estimates scores for word types from reference texts. It then applies these word scores to texts that are not reference texts to obtain a score for each document. Lastly, the scores of the non-reference documents are rescaled so that they can be compared with the reference texts.
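
The K-Means entry above can be illustrated with a minimal sketch in plain Python. It assumes points are tuples of numbers, uses squared Euclidean distance, and picks the initial centroids at random from the data; the names `k_means`, `squared_distance`, `iterations`, and `seed` are illustrative choices, not part of any particular library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Cluster `points` into `k` groups (assumed setup: random initial centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as starting centroids
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = k_means(data, k=2)
    print(centers, labels)
```

For text data, the points would typically be document vectors, for example term weights in a vector space model. The result depends on the initial centroids, which is one motivation for variants such as Global K-Means.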

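The term frequency-inverse document frequency weighting listed above has several variants. The sketch below assumes one common form: term frequency as the term's share of the tokens in a document, and inverse document frequency as the natural logarithm of the number of documents divided by the number of documents containing the term, without smoothing. The function name `tf_idf` and the pre-tokenized input format are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    `documents` is a list of token lists (tokenization is assumed to be done already).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = [["text", "mining", "text"], ["data", "mining"]]
    for doc_weights in tf_idf(docs):
        print(doc_weights)
```

Under this variant, a term that occurs in every document receives a weight of zero, since the logarithm of 1 is 0. Library implementations often add smoothing or normalization, so their numbers may differ from this sketch.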