Author profiling


Author profiling is the analysis of a set of texts to infer characteristics of their author (e.g. age and gender) from stylistic and content-based features.[1]


Automatic Authorship Identification (AAI) dates back to the late 19th century. Thomas Corwin Mendenhall was the first to examine works of Francis Bacon, William Shakespeare, and Christopher Marlowe with the aim of detecting quantitative stylistic differences using word length.[2] Since then, technological advances have changed the field considerably.
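The word-length idea can be illustrated with a short sketch. This is not Mendenhall's original procedure, only a minimal approximation: it computes the relative frequency of each word length in a text, which gives a distribution that can be compared across authors. The function name `word_length_profile` and the letters-only tokenization are assumptions made for this example.

```python
import re
from collections import Counter


def word_length_profile(text: str) -> dict[int, float]:
    """Return the relative frequency of each word length found in `text`."""
    # Assumption: a "word" is any run of ASCII letters; punctuation is ignored.
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return {}
    counts = Counter(len(word) for word in words)
    total = len(words)
    # Sort by word length so the result reads like a curve from short to long words.
    return {length: counts[length] / total for length in sorted(counts)}


sample = "To be, or not to be, that is the question"
profile = word_length_profile(sample)
for length, share in profile.items():
    print(f"{length}-letter words: {share:.2f}")
```

Comparing two such profiles, for example by plotting them or summing their absolute differences, gives a rough measure of how similar two texts are in word-length usage.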

There are three major fields in AAI: authorship attribution, author identification, and author profiling. In the first two, the goal is to recognize the author from a set of authors, while in author profiling, the goal is to find specific characteristics of the author, based on stylistic- or content-based features.[3][4]
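The distinction between stylistic and content-based features can be sketched as follows. This is an illustrative example under stated assumptions, not a method taken from the cited works: the tiny `FUNCTION_WORDS` list, the choice of features, and the function names are all hypothetical. Stylistic features describe how something is written, while content-based features describe what is written about.

```python
import re
from collections import Counter

# Assumption: a very small illustrative function-word list; real systems use far larger ones.
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "i", "my", "that"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def stylistic_features(text: str) -> dict[str, float]:
    """Features describing HOW the text is written."""
    tokens = tokenize(text)
    n = len(tokens) or 1  # avoid division by zero on empty input
    return {
        "avg_word_length": sum(len(t) for t in tokens) / n,
        "function_word_rate": sum(t in FUNCTION_WORDS for t in tokens) / n,
        "exclamations_per_token": text.count("!") / n,
    }


def content_features(text: str, top_k: int = 3) -> list[tuple[str, int]]:
    """Features describing WHAT the text is about: the most frequent non-function words."""
    content_words = [t for t in tokenize(text) if t not in FUNCTION_WORDS]
    return Counter(content_words).most_common(top_k)


post = "My exam is over! The exam was hard, and the teacher is strict!"
style = stylistic_features(post)
topics = content_features(post)
print(style)
print(topics)
```

In a profiling system, feature vectors of this kind would typically be passed to a classifier trained on texts whose authors' characteristics are known; that step is omitted here.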

Author profiling remains an unsolved problem. Many researchers have studied it and some report good results, but many areas are still unexplored and there is room for improvement. Through the shared tasks organized by PAN, teams from around the world attempt each year to identify the characteristics of authors.[5][6][7]

The characteristics studied vary between approaches, but age and gender are usually among them.[8][9][10][11][12] Other characteristics have included the zodiac sign and the occupation of the author.[5][10]


  1. Álvarez-Carmona, Miguel Á., et al. "Evaluating topic-based representations for author profiling in social media." Ibero-American Conference on Artificial Intelligence. Springer, Cham, 2016. p. 151-162.
  2. Mendenhall, Thomas Corwin. "The characteristic curves of composition." Science (1887): 237-249.
  3. Mikros, George K., and Kostas Perifanos. "Authorship Attribution in Greek Tweets Using Author's Multilevel N-Gram Profiles." 2013 AAAI Spring Symposium Series. 2013.
  4. Stamatatos, Efstathios. "A survey of modern authorship attribution methods." Journal of the American Society for Information Science and Technology 60.3 (2009): 538-556.
  5. Rangel, Francisco, et al. "Overview of the 3rd Author Profiling Task at PAN 2015." CLEF. 2015.
  6. Rangel, Francisco, et al. "Overview of the 2nd Author Profiling Task at PAN 2014." CEUR Workshop Proceedings. Vol. 1180. 2014.
  7. Rangel, Francisco, et al. "Overview of the Author Profiling Task at PAN 2013." CLEF Conference on Multilingual and Multimodal Information Access Evaluation. CELCT, 2013.
  8. Argamon, Shlomo, et al. "Mining the blogosphere: Age, gender and the varieties of self-expression." First Monday 12.9 (2007).
  9. Nguyen, Dong-Phuong, et al. "'How old do you think I am?' A study of language and age in Twitter." (2013).
  10. Schler, Jonathan, et al. "Effects of Age and Gender on Blogging." AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs. Vol. 6. 2006.
  11. Argamon, Shlomo, et al. "Gender, genre, and writing style in formal written texts." Text 23.3 (2003): 321-346.
  12. Koppel, Moshe, Shlomo Argamon, and Anat Rachel Shimoni. "Automatically categorizing written texts by author gender." Literary and Linguistic Computing 17.4 (2002): 401-412.