Talk:Associative array

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Computer science (Rated C-class, High-importance)
WikiProject icon This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C-Class article C  This article has been rated as C-Class on the project's quality scale.
 High  This article has been rated as High-importance on the project's importance scale.
 

Confussed with Ordering vs Sorting[edit]

I find this phrase confusing:

Ordering preservation: Balanced binary trees and skip lists preserve ordering — allowing one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently (they require the data to be sorted in a separate step).

I feel that the phrase should refer to the preservation of the "insertion order" when iterating over the collections of Keys or Values. But "key is nearest to a given value" suppose a kind order in the set of keys, and this is not part of the contract for the associative array abstract type. Note that Java JDK1.5 includes the class LinkedHashMap that preserves insertion order when browsing the map.

For the same reason I don't see the point to metion "range querries" for associative arrays, as they are not form part of the contract. Maybe you could include among the "Specialized representations" the ones with sorted key sets. I mean, to define the abstract subtype "Sorted Associative Array" that would allow for range querries, and would exclude the simple has table implemetations. —Preceding unsigned comment added by 83.43.153.47 (talk) 20:53, 20 November 2009 (UTC)


Hashes[edit]

Please don't confuse hash-tables with hashes. They are very different things. —Preceding unsigned comment added by 203.222.167.145 (talk) 02:19, 8 September 2009 (UTC)

Lisp[edit]

How is a value accessed in Lisp? --Hirzel

Simple answer: For a hash table, (gethash key table). For an alist, (cdr (assoc key alist)).
Complex answer: gethash actually returns two values. (Functions can do that in Lisp -- and no, it isn't the same as returning a pair or list of two elements as a value. If you don't ask for the second value, it is discarded.) The first is the value of the key in the table, or NIL if there is not one. The second is a boolean, T if the key was in the table, and NIL if it was not. This is so you can tell a NIL that was a hash value from a NIL that indicates failure.
Example:
(multiple-value-bind (result success)
   (gethash 'moose *some-hash-table*)
   (if success
      (print result)
      (print "Not found.")))
That will print "Not found" if there wasn't a key moose in the hash table *some-hash-table*. --FOo 22:38 16 Jul 2003 (UTC)

It is not correct to say that Lisp uses alists for associative arrays. Common Lisp has a native, typed, hash table type which is the closest match to other languages' dictionaries or hashes. Alists are simply a shape of list for which there are some associative-like operations; they're used where a hash table would be too much overhead or complexity, not as a replacement for more general associative arrays. --FOo 16:16 16 Jul 2003 (UTC)

My changes[edit]

I've done some pretty severe reorganization of this page. As usual I wanted to separate the theoretical stuff from the language-specific stuff. I also added a lot more about data structures used to implement associative arrays and the relative tradeoffs, and some links to the STL pages about the abstract data type and their hash table/red-black tree map implementations. Added a little note about Lua also. I hope my splitting apart and editing of the associative list section pleased the one person above, and didn't anger anyone. I'd appreciate any feedback on my changes.

Derrick Coetzee 03:01, 25 Nov 2003 (UTC)

---

To 62.134.120.20, if you read this, I feel like many of the links you added were either:

  • highly ambiguous (like "time")
  • misleading, incomplete, or pointing to unrelated topics (as in "real time")
  • incorrect, referring to an article which does exist by the wrong name
  • redundant, referring to an article recently referred to again
  • rarely used phrases which only happened to occur in the text

While I encourage effective cross-linking of material, bad cross-linking frustrates and obscures. Please exercise more discretion in the future when adding links. I refer you to some of Wikipedia's policy on links:

http://en2.wikipedia.org/wiki/Wikipedia:Make_only_links_where_relevant

Derrick Coetzee 08:28, 6 Dec 2003 (UTC)

PHP[edit]

Where is the PHP example??

Sign your edits. Wouter Lievens 08:56, 31 May 2005 (UTC)

"In PHP all arrays can be associative, except that the keys are limited to integers and strings and can only be a single level of subscripts."

Maybe I'm just reading this wrong, but can someone explain to me what this above statement means? --CheesyPuffs144 (talk) 02:27, 15 June 2008 (UTC)

References as Keys[edit]

In a tree where each node is an associative array, the keys can be references to other nodes in the tree rather than using a textual name. This can be implimented with the Tie function.

I'm implimenting a distributed one of these and wondering if anyone knows of any other implimentations or documentation about this kind of data structure, or whether it goes by any specific names etc. Nad 21:21, 31 January 2006 (UTC)

Hmmm, Do you mean that the values can be references to other associative sub-arrays, rather than the keys? For example, if I had to represent a Unix directory tree of just sub-directories, files and links to sub-directories then I would first walk the directory tree ignoring any links. Each directory would become an associative array,(AA), with directory entry names as keys and either file 'objects' as values for each file; or sub-AAs to represent sub-directories. A second walk of the directory tree would allow my to insert references to appropriate AA's as values for keys representing directory links.

In Python, AA values can be any object, so an AA with a recursive value example is just:

*** Python 2.4.2 (#67, Jan 17 2006, 15:36:03) [MSC v.1310 32 bit (Intel)] on win32. ***
>>> dir1 = { 'file1': 'file1' }      # AA creation with one file
>>> dir1
{'file1': 'file1'}
>>> dir1['link1'] = dir1             # The recursive link
>>> dir1
{'file1': 'file1', 'link1': {...}}
>>> dir1['link1']                    # Accesses through the reference
{'file1': 'file1', 'link1': {...}}
>>> dir1['link1']['link1']
{'file1': 'file1', 'link1': {...}}
>>> dir1['link1']['link1']['file1']
'file1'

Again in Python; all non-mutable types are able to be AA keys. Other user-defined types can be used as AA keys if they define a hashing method. (The main mutable built-in type affected by the restriction is the Python list type. Lists are easily converted to the non-mutable tuple type for use as dictionary keys). --Paddy 07:39, 30 April 2006 (UTC)

for key in phonebook[edit]

I note that someone change the Python example loop to:

for name in phonebook:

instead of:

for key in phonebook

I deliberately choose key to emphasize that iterating over the phonebook dict iterates over all its keys.


appending[edit]

Someone should describe how to insert elements into a hash table.

Delete reference to TAWK?[edit]

Thomson Automation are nolonger selling there version of AWK. (follow the link). Sould it be removed from the AWK entry --Paddy 17:22, 22 September 2006 (UTC)

<source lang="PostScript"> ... </source>[edit]

Or at least I would if I could, but "PostScript" is not a recognised source language...

actionscript, ada, apache, applescript, asm, asp, autoit, bash, blitzbasic, bnf, c,
c_mac, caddcl, cadlisp, cfdg, cfm, cpp, cpp-qt, csharp, css, d, delphi, diff, div,
dos, eiffel, fortran, freebasic, gml, groovy, html4strict, idl, ini, inno, io, java,
java5, javascript, latex, lisp, lua, matlab, mirc, mpasm, mysql, nsis, objc, ocaml,
ocaml-brief, oobas, oracle8, pascal, perl, php, php-brief, plsql, python, qbasic,
rails, reg, robots, ruby, sas, scheme, sdlbasic, smalltalk, smarty, sql, tcl, text,
thinbasic, tsql, vb, vbnet, vhdl, visualfoxpro, winbatch, xml, xpp, z80

Any suggestions how to remedy this? nemo 11:35, 4 October 2007 (UTC)

Removed "Map" from the "Data structures" section[edit]

I undid a good-faith edit by User:Irishjugg that added a new section titled "Maps" and went as follows:

The map (associative array) is a structure with no stored order, similar to an orderless list. Values are stored by certain key values in the form mapname[keyvalue] = value. In C++ implementations, such as the STL, the maps are templated and can have values of any standard or user defined type. Insertion into a map is is O(1) time, while for accessing elements it is O(nlogn) as it mush search for the keys. The STL implementation uses a Red Black tree to handle the key searches which ensures worst case O(nlogn). Maps are at the heart of associative arrays.

I have a number of issues with this addition, as follows:

  • A map is just another name for an associative array, so saying you can implement an associative array as a map is pretty vacuous
  • The most insightful point in this section, that the STL map is implemented as a red-black tree, was already covered; mention of self-balancing trees is under "Efficient representations"
  • Insertions into a tree (if you don't provide a good hint to the insert function) is O(log n), not O(1)
  • Accessing elements is O(log n), not O(n log n)
  • Saying that the STL implementation is red black trees is not really correct, since an implementor is free to choose any structure that satisfies the appropriate complexity requirements. An AVL tree, various variants of B-trees, skip lists, and probably many other structures would all be reasonably appropriate as a backing store, and would result in a reasonable STL implementation.

EvanED 22:07, 10 December 2007 (UTC)

Edited again for a minor fix, and additional objection, EvanED 22:10, 10 December 2007 (UTC)

That all sounds correct to me. Dcoetzee 03:06, 16 June 2008 (UTC)


Associative array X Regular array[edit]

The article says the following about the difference:

While a regular array maps an index to an arbitrary data type such as integers, other primitive types, or even objects, an associative array's keys can be arbitrarily typed. The values of an associative array do not need to be the same type, although this is dependent on the programming language.

However, in my opinion, the difference between arrays and maps/dictionaries/hash tables lies on keys, not on values. At least in the programming languages I've programmed on so far, a traditional array is (conceptually) a special kind of map where indices are integers that are immediately consecutive (without "holes" or "jumps").

Comments? --Antonielly (talk) 19:06, 20 October 2008 (UTC)

Association lists[edit]

The section under Association lists defines these to be a linked list of key-value pairs. This is consistent with the common definition of the term in Lisp/Scheme -- for example, [1]. The DADS page at [2] notes that the term is primarily associated with Lisp and the AI community.

User:Pgr94 recently made an edit [3] that doubts this, and the edit comment states that "decent implementations use balanced trees which give O(log n) look up time". However, the article itself lays out a contrast between efficient implementations -- balanced trees and hash tables -- in the immediately preceding section, and the inefficient linked-list-based implementation commonly called an association list. The DADS page, cited by User:Pgr94, makes no specific claim as to efficiency of an association list, and notes that they are often much smaller than general dictionaries. Indeed, an a-list is semantically equivalent to a tree-based dictionary, and the first page cited above notes that an a-list in practice (in one implementation) is in fact a true linked list.

There are two possible solutions to this -- either the claim that an association list might be tree-based is wrong, in which case User:Pgr94's edit should be reverted, or else the rest of the article needs to be made consistent with this new fact, since its structure sets up a contrast between linear-time and log-time implementations. My understanding of a-lists leans toward the former, i.e., that a true a-list is always a linked list and anything else is a dictionary but not an a-list. Thoughts?

cfallin|(talk) 11:33, 21 December 2008 (UTC)

I am questioning whether association lists have to be linked lists.
The DADS page says "dictionary" and "association list" are synonyms.
It would be good to have some more sources, specifically data-structure textbooks. Unfortunately, the books I have to hand don't mention the topic. pgr94 (talk) 12:03, 21 December 2008 (UTC)
Unfortunately I couldn't find anything in my data structures book (Cormen et al, Intro to Algorithms 2nd ed) either. But here's a mention of association lists in an O'Reilly book on Haskell: Real World Haskell, pg 299 (Google Books preview). This states "An association list is just a normal list containing (key, value) tuples." My understanding of the DADS page connection between association lists and dictionaries is that an association list and a dictionary are equivalent at the semantic level -- thus, the operations listed on the page are implemented by both -- but that association list implies a particular underlying structure, namely, a simple list. The two citations I've given -- the O'Reilly book and the CMU lisp page -- both support this common usage. Here's another Lisp page that describes an alist specifically as a true list: LISP FAQ part 2 section 2. Again, I'd be interested in any examples of a tree-based alist you could find (some casual Googling didn't turn up anything on my end). If not, I'd be glad to incorporate some of these citations into the article and make the Lisp connection more clear. cfallin|(talk) 22:29, 21 December 2008 (UTC)
There are two definitions of "assocation list" floating around. One is a synonym for associative array; the other refers to a specific implementation of an associative array usign a simple unordered linked list of key-value pairs. I think the latter is more useful, since otherwise there is no simple term for such an implementation, but we could potentially mention both, at the risk of introducing confusion. Dcoetzee 22:04, 22 December 2008 (UTC)
I made an edit that will hopefully clarify this -- thanks to both of you for the input. cfallin (talk) 03:19, 24 December 2008 (UTC)
I'm not sure. An association list is, by name, a sort of list, and certainly not a hash table or a tree. I would discourage readers from confusing the abstract data type "associative array" from any of its common implementations -- including association lists, hash tables, or various sorts of trees. --FOo (talk) 07:20, 24 December 2008 (UTC)
You're right. I was hesitant to force this at first, but the distinction between the semantic level (at which an alist and a dictionary are synonyms) and implementation level is what I was arguing above. Given this and the fact that pgr94's only objection was a citation of the DADS page, which makes no claim on implementation, I've brought the paragraph into close alignment with the original, but with a few citations. cfallin (talk) 21:51, 24 December 2008 (UTC)

Naming discussion (assoc array vs. dict vs. hash etc)[edit]

So far as I can tell, Wikipedia has decided/ordained that the canonical name for the abstract data type dictionary/map/associative array is "associative array". In my opinion, this decision should be revisited. Has this been discussed before, and if so, can someone point me to it?

In any case, here's the list of pros and cons I see for the current term:

pros:

  • Unambiguous. There are no other meanings for "associative array", so it doesn't need to be clarified when mentioned in other articles.

cons:

  • The only language/community that uses the term "associative array" is the PHP community.
  • Each of the terms "dictionary" and "map" are used far more widely than "associative array", both in the software engineering and computer science fields.
  • The term is misleading in that arrays are a completely different kind of data structure. Novices tend to gloss over this distinction because in PHP and Javascript you can use arrays with arbitrary keys, so they are really dictionaries.

I propose that we rename this article to "Map (computer science)" and redirect "Associative Array" to that.

Just to list the alternatives, right off the bat we can say that "Hashtable" is not quite right, because it implies a particular implementation of the abstract data type.

The next runner up is "Dictionary (computer science)", which I think is a very strong candidate, since it seems to see the most use in programming language standard libraries (check the list of terms used in language standard libraries in the article). However, I don't think it sees much use in the computer science field, and while I personally prefer the name "dictionary" in my own discussions, I feel like "map" more correctly refers to the abstract data type and less to the specific task of mapping string keys to values.

How do other people feel about this?

Rkleckner (talk) 18:16, 14 April 2010 (UTC)

Wikipedia is of two minds on the topic, since the comparison article is Comparison of programming languages (mapping). For my part, I think "mapping" is an excellent term, and would support a rename to "Mapping data type" (by parallel with Array data type).—chaos5023 (talk) 18:19, 14 April 2010 (UTC)
"Map" is a term with too many meanings already, even in CS alone. Consider memory mapping (mmap), character map, bitmap, mipmap, key map (xkbmap), and so on. Most of those have nothing to do with the data structure. "Associative array" makes it absolutely clear we're talking about a specific data structure; while we could talk about another data structure to implement the map of something (example: an actual geographic map). Niczar ⏎ 18:57, 14 April 2010 (UTC)
It's very true that associative array is totally unambiguous. However, I don't think it is the most popular, especially in the field of computer science. While it may be best for the purposes of Wikipedia to use the unambiguous term, it promotes this term, which I believe is confusing and inhibits understanding. I'm more conflicted about it now than when I started, so I'm just going to put in my two cents and let the rest of you decide what to do. Rkleckner (talk) 19:08, 14 April 2010 (UTC)
mmap maps virtual addresses onto something else, charmaps map integers to glyphs, keymaps map keys to signals. "map" is not ambiguous at all, it's a well understood frequently used term in computer science. 99.224.97.6 (talk) —Preceding undated comment added 06:36, 23 May 2011 (UTC).
FWIW, though, "associative array" has seen a lot more usage than just in PHP; the term has a lot of history. In all likelihood it's the name of the article (and is used in PHP docs) because it's probably the oldest term for the concept. —chaos5023 (talk) 18:22, 14 April 2010 (UTC)
I think the term "association list" has a lot of history, but I don't think "associative array" has as much. I take back what I said about it being restricted to the PHP community, though. there is more usage of the term, for example in the D programming language. Rkleckner (talk) 19:08, 14 April 2010 (UTC)
I'd like to see some actual statistics correlated with popularity before making such changes. --Cybercobra (talk) 18:49, 14 April 2010 (UTC)
I'm pretty happy with "associative array". It is an old, standard term. Gafter (talk) 00:04, 15 April 2010 (UTC)
Dictionary is also used in CS in the sense "data dictionary", which is distinct from an ADT. 165.146.42.194 (talk) 11:54, 15 April 2010 (UTC)

Redirects are a bit of a mess[edit]

Hi. Between this article ("Associative array") and the article Attribute–value pair, some redirects are a bit messy:

These four redirects above currently point to Associative array.

These nine redirects above currently point to Attribute–value pair.

Previously, some of these redirects pointed to NoSQL, which I felt wasn't appropriate. This all seems a bit messy. Perhaps someone could take a look? --MZMcBride (talk) 19:20, 27 January 2013 (UTC)