Most common words in Spanish

Below are two estimates of the most common words in Modern Spanish. Each estimate comes from an analysis of a different text corpus. A text corpus is a large collection of samples of written or spoken language that has been carefully prepared for linguistic analysis. To determine which words are the most common, researchers create a database of all the words found in the corpus and categorise them based on the context in which they are used.
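
The counting step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used by either source: the three-sentence corpus, the names SAMPLE_CORPUS and count_word_forms, and the simple regular-expression tokeniser are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical three-sentence "corpus"; a real corpus holds millions of words.
SAMPLE_CORPUS = [
    "La casa de la playa es de mi hermana.",
    "El libro que tiene la profesora es de la biblioteca.",
    "No tiene tiempo, pero tiene ganas de ir a la playa.",
]


def count_word_forms(texts):
    """Count how often each lowercased word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        # \w+ matches runs of letters and digits, including accented letters.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


word_form_counts = count_word_forms(SAMPLE_CORPUS)

# Print the three most frequent word forms with their counts.
for form, occurrences in word_form_counts.most_common(3):
    print(form, occurrences)
# la 5
# de 4
# tiene 3
```

Even in this tiny sample, function words such as la and de rise to the top, which matches the pattern in both tables.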

The first table lists the 100 most common word forms from the Corpus de Referencia del Español Actual (CREA), a text corpus compiled by the Real Academia Española (RAE). The RAE is Spain's official institution for documenting, planning, and standardising the Spanish language. A word form is any of the grammatical variations of a word.

The second table is a list of the 100 most common lemmas found in a text corpus compiled by Mark Davies and other language researchers at Brigham Young University in the United States. A lemma is the primary form of a word, the one that would appear in a dictionary. The Spanish infinitive tener ("to have") is a lemma, while tiene ("has"), which is a conjugation of tener, is a word form.
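
The difference between counting word forms and counting lemmas can be sketched as follows. The lookup table FORM_TO_LEMMA, the function count_lemmas, and all the counts are invented for illustration; real lemmatisation relies on much larger dictionaries and on context.

```python
from collections import Counter

# Hypothetical hand-written lookup from word form to lemma.
FORM_TO_LEMMA = {
    "tiene": "tener",
    "tienen": "tener",
    "tengo": "tener",
    "años": "año",
    "año": "año",
}


def count_lemmas(form_counts, form_to_lemma):
    """Add up word-form counts under their lemma; unknown forms stand for themselves."""
    lemma_counts = Counter()
    for form, occurrences in form_counts.items():
        lemma_counts[form_to_lemma.get(form, form)] += occurrences
    return lemma_counts


# Made-up word-form counts, not taken from either corpus.
form_counts = Counter(
    {"tiene": 5, "tienen": 2, "tengo": 1, "años": 4, "año": 3, "de": 10}
)
lemma_counts = count_lemmas(form_counts, FORM_TO_LEMMA)

print(lemma_counts["tener"])  # 8: tiene + tienen + tengo
print(lemma_counts["año"])    # 7: años + año
```

This is why a lemma list and a word-form list rank the same vocabulary differently: a verb such as tener collects the counts of all its conjugations, while in a word-form list each conjugation competes on its own.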

Real Academia Española

The list below comes from "1000 formas más frecuentes" (English: "1000 most frequent word forms"), a list published by the Real Academia Española (RAE) from an analysis of more than 160 million word forms found in the Corpus de Referencia del Español Actual (English: Reference Corpus of Current Spanish), or CREA. CREA is a computerised corpus of texts written in Spanish, and of transcripts of spoken Spanish. It includes books, magazines, and newspapers with a wide variety of content, as well as transcripts of spoken language from radio and television broadcasts and other sources. All the works in the collection are from 1975 to 2004. CREA includes samples from all Spanish-speaking countries.[1]

The list of "1000 most frequent word forms" comes from an analysis of CREA version 3.2.[2] Plurals, verb conjugations, and other inflections are ranked separately. Homonyms, however, are not distinguished from one another. CREA 3.2 was published in June 2008.[1]

Most frequent word forms out of ~160 million words
(RAE 2008)
Rank Word form Occurrences Part of speech Translation
1 de 9,999,518 preposition of; from
2 la 6,277,560 article, pronoun the; third person feminine singular pronoun
3 que 4,681,839 conjunction that, which
4 el 4,569,652 article the
5 en 4,234,281 preposition in, on
6 y 4,180,279 conjunction and
7 a 3,260,939 preposition to, at
8 los 2,618,657 article, pronoun the; third person masculine direct object
9 se 2,022,514 pronoun -self, oneself (reflexive)
10 del 1,857,225 preposition from the
11 las 1,686,741 article, pronoun the; third person feminine direct object
12 un 1,659,827 article a, an
13 por 1,561,904 preposition by, for, through
14 con 1,481,607 preposition with
15 no 1,465,503 adverb no; not
16 una 1,347,603 article a, an, one
17 su 1,103,617 possessive his/her/its/your
18 para 1,062,152 preposition for, to, in order to
19 es 1,019,669 verb is
20 al 951,054 preposition to the
21 lo 866,955 article, pronoun the; third person masculine direct object
22 como 773,465 conjunction like, as
23 más 661,696 adjective more
24 o 542,284 conjunction or
25 pero 450,512 conjunction but
26 sus 449,870 possessive his/her/its/their/your (before a plural noun)
27 le 413,241 pronoun third person indirect object
28 ha 380,339 verb he/she/it has [done something]; you (formal) have [done something]
29 me 374,368 pronoun me
30 si 327,480 conjunction if, whether
31 sin 298,383 preposition without
32 sobre 289,704 preposition on top of, over, about
33 este 285,461 adjective this
34 ya 274,177 adverb already; still
35 entre 267,493 preposition between
36 cuando 257,272 conjunction when
37 todo 247,340 adjective all, every
38 esta 238,841 adjective this
39 ser 232,924 verb to be
40 son 232,415 verb they are, you (pl.) are
41 dos 228,439 number two
42 también 227,411 adverb too, also, as well
43 fue 223,791 verb was
44 había 223,430 verb I/he/she/it/there was (or used to be)
45 era 219,933 verb was
46 muy 208,540 adverb very
47 años 203,027 noun (masculine) years
48 hasta 202,935 preposition until
49 desde 198,647 preposition from; since
50 está 194,168 verb is
51 mi 186,360 possessive my
52 porque 185,700 conjunction because
53 qué 184,956 pronoun what?; which?; how [adjective]
54 sólo 170,552 adverb only, solely
55 han 169,718 verb they/you (pl.) have [done something]
56 yo 167,684 pronoun I
57 hay 164,940 verb there is/are
58 vez 163,538 noun (feminine) time, instance
59 puede 161,219 verb can
60 todos 158,168 adjective all; every
61 así 155,645 adverb like that
62 nos 154,412 pronoun us
63 ni 153,451 conjunction, adverb neither; nor; not even
64 parte 148,750 noun (masculine / feminine) part; message
65 tiene 147,274 verb has
66 él 139,080 pronoun (masculine) he, it
67 uno 136,020 number one
68 donde 132,077 preposition where
69 bien 130,957 adjective fine, well
70 tiempo 130,896 noun (masculine) time; weather
71 mismo 130,746 adjective same
72 ese 127,976 pronoun that
73 ahora 125,661 adverb now
74 cada 124,558 determiner each; every
75 e 123,729 conjunction and
76 vida 123,491 noun (feminine) life
77 otro 121,983 adjective other, another
78 después 121,746 preposition after
79 te 120,052 pronoun to you, for you; yourself
80 otros 119,500 pronoun others
81 aunque 115,556 conjunction though, although, even though
82 esa 115,377 adjective that
83 eso 114,523 pronoun that
84 hace 114,507 verb he/she/it does/makes
85 otra 113,982 adjective, pronoun other; another
86 gobierno 113,011 noun (masculine) government
87 tan 112,471 adverb so
88 durante 112,020 preposition during
89 siempre 111,557 adverb always
90 día 110,921 noun (masculine) day
91 tanto 110,679 adjective, adverb so much
92 ella 110,620 pronoun she, her; it
93 tres 109,542 number three
94 sí 108,631 noun, pronoun yes; reflexive pronoun
95 dijo 108,471 verb said; told
96 sido 107,352 past participle been
97 gran 106,991 adjective large, great, big
98 país 104,568 noun (masculine) country
99 según 104,204 preposition as; according to
100 menos 103,498 adjective less; fewer
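
As a rough illustration of what these occurrence counts imply, the sketch below converts the top three counts from the table into shares of the corpus. The corpus size of 160,000,000 is an assumption: the source says only "more than 160 million", so the true shares are slightly lower. The names CORPUS_SIZE, TOP_FORMS, and share_percent are illustrative.

```python
# Assumed corpus size; the source gives only "more than 160 million" word forms.
CORPUS_SIZE = 160_000_000

# Occurrence counts for the three top-ranked word forms, copied from the table.
TOP_FORMS = {"de": 9_999_518, "la": 6_277_560, "que": 4_681_839}


def share_percent(occurrences, corpus_size=CORPUS_SIZE):
    """Return the percentage of the corpus accounted for by one word form."""
    return 100 * occurrences / corpus_size


for form, occurrences in TOP_FORMS.items():
    print(f"{form}: {share_percent(occurrences):.2f}%")
# de: 6.25%
# la: 3.92%
# que: 2.93%
```

Under that assumption, the three most frequent word forms together make up roughly 13% of all running words in the corpus.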

Mark Davies

In 2006, Mark Davies, an associate professor of linguistics at Brigham Young University, published his estimate of the 5000 most common words in Modern Spanish. To make this list, he compiled samples only from 20th-century sources, especially from the years 1970 to 2000. Most of the sources are from the 1990s. Of the 20 million words in the corpus, about one-third (~6,750,000 words) come from transcripts of spoken Spanish: conversations, interviews, lectures, sermons, press conferences, sports broadcasts, and so on. Among the written sources are novels, plays, short stories, letters, essays, newspapers, and the encyclopedia Encarta. The samples, written and spoken, come from Spain and at least 10 Latin American countries. Most of the samples were previously compiled for the Corpus del Español (2001), a 100-million-word corpus that includes works from the 13th century through the 20th.[3][4]

The 5000 words in Davies' list are lemmas.[5] A lemma is the form of the word as it would appear in a dictionary.[6] Singular nouns and plurals, for example, are treated as the same word, as are infinitives and verb conjugations. The table below includes the top 100 words from Davies' list of 5000.[7][8] This list distinguishes between the definite articles lo and la and the pronouns lo and la; all are ranked individually. The adjectives ese and esa are ranked together (as are este and esta), but the pronoun eso is separate. All conjugations of a verb are ranked together.

A highlighted row indicates that the word was found to occur especially frequently in samples of spoken Spanish.[9]

Most frequent lemmas out of ~20 million words
(Davies 2006)
Rank Lemma Occurrences Part of speech Translation
1 el / la 2,037,803 article the
2 de 1,319,834 preposition of, from
3 que 662,653 conjunction that, which
4 y 562,162 conjunction and
5 a 529,899 preposition to, at
6 en 507,233 preposition in, on
7 un 434,022 article a, an
8 ser 374,194 verb to be
9 se 329,012 pronoun -self, oneself (reflexive)
10 no 257,365 adverb no
11 haber 196,962 verb to have
12 por 190,975 preposition by, for, through
13 con 184,597 preposition with
14 su 187,810 adjective his, her, their, your
15 para 126,061 preposition for, to, in order to
16 como 106,840 conjunction like, as
17 estar 106,429 verb to be
18 tener 106,642 verb to have
19 le 98,211 pronoun third person indirect object
20 lo 91,035 article the
21 lo 92,519 pronoun third person masculine direct object
22 todo 88,057 adjective all, every
23 pero 82,435 conjunction but, yet, except
24 más 92,352 adjective more
25 hacer 81,619 verb to do; to make
26 o 82,444 conjunction or
27 poder 76,738 verb to be able to, can
28 decir 79,343 verb to tell, say
29 este / esta 80,544 adjective this
30 ir 70,352 verb to go
31 otro 61,726 adjective other, another
32 ese / esa 60,989 adjective that
33 la 55,523 pronoun third person feminine direct object
34 si 53,608 conjunction if, whether
35 me 95,577 pronoun me
36 ya 46,778 adverb already, still
37 ver 45,854 verb to see
38 porque 44,500 conjunction because
39 dar 40,233 verb to give
40 cuando 39,726 conjunction when
41 él 38,597 pronoun he
42 muy 39,558 adverb very, really
43 sin 40,432 preposition without
44 vez 35,286 noun (feminine) time, occurrence
45 mucho 36,391 adjective much, many, a lot
46 saber 37,092 verb to know
47 qué 42,000 pronoun what?; which?; how [adjective]
48 sobre 35,038 preposition on top of, over, about
49 mi 45,636 adjective my
50 alguno 30,485 adjective / pronoun some; someone
51 mismo 29,569 adjective same
52 yo 54,635 pronoun I
53 también 33,348 adverb also
54 hasta 29,506 preposition / adverb until, up to; even
55 año 33,053 noun (masculine) year
56 dos 27,733 number two
57 querer 28,696 verb to want, love
58 entre 30,756 preposition between
59 así 24,832 adverb like that
60 primero 26,553 adjective first
61 desde 25,288 preposition from, since
62 grande 25,963 adjective large, great, big
63 eso 31,636 pronoun (neuter gender) that
64 ni 24,261 conjunction not even, neither, nor
65 nos 26,349 pronoun us
66 llegar 22,878 verb to arrive
67 pasar 22,466 verb to pass; to happen; to spend time
68 tiempo 22,432 noun (masculine) time, weather
69 ella(s) 24,770 pronoun she; (plural) them
70 sí 33,828 adverb yes
71 día 24,715 noun (masculine) day
72 uno 21,407 number one
73 bien 21,589 adverb well
74 poco 20,986 adjective / adverb little, few; a little bit
75 deber 22,232 verb should, ought to; to owe
76 entonces 23,548 adverb so, then
77 poner 20,330 verb to put (on); to get [adjective]
78 cosa 23,943 noun (feminine) thing
79 tanto 20,531 adjective much
80 hombre 20,292 noun (masculine) man, mankind, husband
81 parecer 19,964 verb to seem, to look like
82 nuestro 20,666 adjective our
83 tan 19,002 adverb such, a, too, so
84 donde 18,852 conjunction where
85 ahora 21,030 adverb now
86 parte 20,319 noun (feminine) part, portion
87 después 20,229 adverb after
88 vida 18,045 noun (feminine) life
89 quedar 18,152 verb to remain, to stay
90 siempre 17,689 adverb always
91 creer 21,257 verb to believe
92 hablar 19,006 verb to speak, to talk
93 llevar 17,062 verb to take, to carry
94 dejar 18,185 verb to let, to leave
95 nada 19,365 pronoun nothing
96 cada 17,155 adjective each, every
97 seguir 16,104 verb to follow
98 menos 15,527 adjective less, fewer
99 nuevo 17,381 adjective new
100 encontrar 15,556 verb to find

Notes

  1. ^ a b "CREA". RAE.es (in Spanish). Real Academia Española. Retrieved 2017-07-13.
  2. ^ "Corpus de Referencia del Español Actual (CREA) — Listado de frecuencias". RAE.es (in Spanish). Real Academia Española. Retrieved 2017-07-13.
  3. ^ Davies (2006), pp. 2–3
  4. ^ "El Corpus del Español". corpusdelespanol.org. Retrieved 2017-07-13.
  5. ^ Davies (2006), pp. 4–6
  6. ^ Davies (2006), p. 4
  7. ^ Davies (2006), pp. 12–14
  8. ^ "Top Spanish Vocabulary". Vistawide World Languages & Cultures. Retrieved 2017-07-13.
  9. ^ Davies (2006), p. 9
