Talk:String (computer science)

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Computer science (Rated Start-class, High-importance)
WikiProject icon This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Start-Class article Start  This article has been rated as Start-Class on the project's quality scale.
 High  This article has been rated as High-importance on the project's importance scale.
 

Other related topics[edit]

Anyone want to tackle - string (small version of rope), string (general chain of various things), string (music), etc. - not just the computer version.

Also, string theory of cosmology/physics.

Computing theory[edit]

The first paragraph seems too "busy" to me. What about replacing it with something like this?

A string (or string of characters) is a data type used in most programming languages to represent text, and is the focus of this article.
The computing term string is also used in a broader sense to group a sequence of entities; for example, tokens in a language grammar, or a sequence of states in automata. See the theory of computation.

This is a lot better. Also, I think the usage in computing theory could be expanded in its own paragraph: one starts with a finite alphabet, then considers all finite sequences consisting of letters from that alphabet (including the empty string) and defines concatentation of strings. The set of string with concatentation is then a monoid.


I think I wrote most the current paragraph and I agree your rewrite is better. Just do it! --drj


Ok, I'll move my text to the main article. I won't try to expand the second paragraph; I'm inclined to leave that to the computation article, or to whoever can concisely expand it without detracting from the rest of the page. --loh


I won't try to expand the second paragraph; I'm inclined to leave that to the computation article, or to whoever can concisely expand it without detracting from the rest of the page. --Hornlo

Other meaninngs[edit]

I think something definitely needs to be added about the other meanings of string. Ukulele is already linked to this page, which is somewhat confusing. Although, I'm not sure how much content can actually be provided for the other meanings. Maybe this article should be moved to String (computer science) or something, and this page be turned into a disambiguation page. B4hand

I propose renaming this article to Character string (computer science). Comments? - Bevo 17:03, 13 Jun 2004 (UTC)

I oppose. A string doesn't need to be literal string in general. -- Taku 18:44, Sep 25, 2004 (UTC)

Lexicographical order[edit]

The lexicographical order on Σ* is not a well-ordering (for example, what is the least element in a*b?), but only a total order. fudo 13:53, 2 August 2005 (UTC)

The least element of a*b is b. The lexicographical order on Σ* is indeed a well-ordering. --Pexatus 00:22, 19 March 2006 (UTC)
No. There is no least element of a*b if you use the alphabetical ordering a < b. Assume there is some least element m; this means that m = a^{k}b for some non-negative integer k. Note that n = a^{k+1}b is also in a*b. n agrees m in k positions, but the k + 1th position of n is less than the k + 1th position of m, so n < m, which contradicts the assumption that there exists a least element of a*b. Therefore, there is no least element of a*b. --Bkkbrad 16:28, 13 September 2006 (UTC)

strings, not characters[edit]

There are lots of references to this page throughout wikipedia for a "string" that is not a set of characters; a string of bits or bytes, for example. I think it is important that the article be cleaned-up to make it clear that "character strings" are the most common uses of the type, but the term might apply to vectors of data not representable by a string in a particular language. I've made a couple of edits in this direction, but I think some more effort needs to go into the issue. -- Mikeblas 22:39, 29 January 2006 (UTC)

Just dont forget that a vector has a fixed length, per definition (as an element in an N-dimensional space), while strings often has (chronologically) variable length. 83.255.35.89 (talk) 11:23, 4 March 2011 (UTC)
But (unfortunately?) the word "vector" in computer science is most often used to refer to a variable-sized storage (see std::vector), while the word "array" is used to refer to the fixed-sized objects you are talking about. (In linear algebra libraries the word "vector" is used for fixed-sized objects, but these also demonstrate unusual (for programming) properties such as addition doing a component-wise action rather than an append operation).Spitzak (talk) 17:09, 4 March 2011 (UTC)
Yes, I know, it's a disaster. The designers of the C++ STL must have been rather ignorant, or why would you create that kind of mess otherwise, with concepts turned upside down? They could have called it dynamic array, string of type, or whatever; the well defined term vector was really the worst possible choice and the worst kind of hijacking of words. If they actually knew what they did, I guess they must have been inspired by the arrogant "C-syntax" (B→C→C++→Java etc), which, when spread to the world of webb languages some 15 years ago, caused the equality symbol to suddenly lose its meaning in large circles, a symbol that has been established in both mathematics and everyday use since hundreds of years. Too many young (or uneducated) people are now using == and != instead of = and ≠ in any everyday context (mathematics next...?) and they would interpret the equation a=b as a definition/mutation/initializing of a...
I can't see why Wikipedia shouldn't do what it can to clarify backgrounds like this. I belive it is crucical to illustrate "misunderstandings" and unessescary discrepancy in terminology among branches of science, so more people can see that there are other conventions than the most vulgar ones that one may want to adhere to. It does not conflict with the goal of describing actual usage and terminology or with "following the sources", as there are many kinds of sources, and plenty of room for elaborations on Wikipedia.
(As a side note, while "addition" may mean several things (as you wrote), only algebraic superposition should use the + sign really; concatenation may use &, &&, ::, |, concat, or whatnot, at least in my world ;) Regards 83.255.32.149 (talk) 04:37, 5 March 2011 (UTC)
I agree, but you probably ought not to take it too far. "Character string" is usually implied, and the article shouldn't give the impression that "string" on its own is incorrect. For the most part, I think your edits were fine, although perhaps you could revert the edits to the string oriented languages section. More effort doesn't need to go into the issue, as that would just confuse the article. --StuartBrady 22:59, 29 January 2006 (UTC)
I was confused by this too. "String" always refers to a string of characters. Vectors of other things are lists, arrays, vectors, ... I'm curious as to what language it is where the word "string" is used in reference to lists of objects. Richard W.M. Jones 09:04, 1 May 2006 (UTC)
Yes. And keep in mind that WP article titles generally reflect the most common usage for a given term (unless it needs its own disambiguation page). In computer science, "string" most common refers to a string of characters. Other not-so-common meanings (e.g., bitstring) can be linked to in the "See also" section. — Loadmaster 18:35, 1 May 2007 (UTC)
My impression is that "string" virtually always means "string of characters". The only counter-example I know is that C++ std::string is based on a template and can be made to use any object. However I am almost certain this was done only to support bytes and "wchar" (16 bits, often mislabled "Unicode"). If it was not for "wchar" then they probably would not have made it a template. Since wchar is intended to store characters (or UTF-16) then the string is still a "string of characters". I would be interested if anybody has any real examples of usage of a std::basic_string template with any object for any purpose other than storing something that would be considered "characters".Spitzak (talk) 17:19, 4 March 2011 (UTC)

Null and NUL[edit]

We should decide on just one spelling and capitalization of null. I vote for two L's.24.186.138.188 02:03, 2 May 2006 (UTC)

"Null" generally refers to the "null character" (or a "null pointer"). "NUL" (all caps) is the mnemonic name (ASCII, EBCDIC, Unicode, etc.) of the "null character" code. — Loadmaster 18:35, 1 May 2007 (UTC)

Origin of the Term ?[edit]

Anyone know the history behind using the term "string" to mean a sequence of characters? I mean its not like a series of characters looks much like a ball of string... I assume it's origins are as a mathematical term, but it would be interesting to know how it came to such common use in computing.

Presumably it comes from the rather obvious expression "a string of characters" (as in "these go some characters stringing by"), equivalent to "a string of pearls" or "a sequence of characters" or other similar phrases. — Loadmaster (talk) 16:27, 9 February 2008 (UTC)

Trying to stop misuse of character encodings[edit]

The edits I keep trying are to stop a whole army of uninformed but well-meaning programmers who think strlen() should parse and count the Unicode code points in a string. This is a totally useless definition and causing no end of grief when systems do this. For some reason otherwise intelligent programmers turn into these complete morons when presented with UTF-8 and this is actually seriously damaging any ability to do internationalization. If anybody can think of a wording that says these answers are different and that the fixed-size one is "better" it would help a lot. The reverted-to wording implies that the number of characters is the more important attribute, which is wrong.Spitzak (talk) 04:10, 29 May 2009 (UTC)

StringBuilder .Net type[edit]

As well as the standard String type, .Net has a StringBuilder type. I think this is implemented as a linked list, but I'm not too sure. This would be worth adding to the Implementation section. N4m3 (talk) 09:35, 28 May 2011 (UTC)

Only finite strings?[edit]

I noticed this statement:

Although formal strings can have an arbitrary (but finite) length, the length of strings in real languages is often constrained to an artificial maximum. [emphasis mine]

I would like a citation on that, what about languages with lazy evaluation like Clojure and Haskell?

 cycle "Is this finite? ""Is this finite? Is this finite? Is this finite? Is this finite? ..."
 
 let shouting = 'a' : shouting in putStr shouting
 ⇒ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa..."

BiT (talk) 03:00, 8 December 2011 (UTC)

Reverse string[edit]

In the "Formal theory" section, it might be useful to mention that:

A string s = ab, composed of zero or more characters (here, 'a' and 'b') of the alphabet, is said to be the reverse of string t if t = ba. For example, if Σ = {0,1} the string 0011001 is the reverse of 1001100. The empty string and all strings of length 1 are reverses of themselves. A string that is the reverse of itself is also called a palindrome.

The problem is, I don't know for certain whether the terms "reverse string" or "string reversal" are correct or not. FWIW, practically all programming languages/libraries that provide this operation call it "reverse". Does anyone know what the proper term should be? (Inverse? Inversion? Opposite? Transpose?) — Loadmaster (talk) 17:05, 9 November 2012 (UTC)

Reverse would be the correct terminology, but as currently formulated your statement would either not be general enough (only working for two character strings) or, worse, likely be misinterpreted as stating that WORLDHELLO and LOWORLDHEL would be reverses of HELLOWORLD (instead of DLROWOLLEH.) —Ruud 23:09, 9 November 2012 (UTC)
How about this:
A string s = abc, composed of zero or more characters of the alphabet (here, 'a', 'b', and 'c'), is said to be the reverse of string t if t = cba. For example, if Σ = {0,1} the string 0011001 is the reverse of 1001100. A string that is the reverse of itself is also called a palindrome, which includes the empty string and all strings of length 1.
I don't see the confusion; it states that we're talking about a string of zero or more characters of the alphabet, and the example of 0011001 should make it additionally clear that we're talking about the ordering of the characters of the string, not substrings of the string. If you think this is still confusing, we could instead use HELLOWORLD and DLROWOLLEH, but this requires a larger symbol alphabet ({D,E,H,L,O,R,W}), which complicates the description somewhat. — Loadmaster (talk) 18:02, 13 November 2012 (UTC)
The problem is that you're giving a very specific example but state it in such a form that it—at least at first reading—appears to be a very general definition. You're either using too much formal machinery for what is an informal statement or, conversely, make a statement that not precise enough to be a formal definition. This is how one of my formal language textbooks defines a reverse:

The reverse of a string is obtained by writing the symbols in reverse order; if w is a string as shown above, then its reverse wR is

wR = an...a2a1.
Where the they explained "above" that a, b, c, ... denote elements from the alphabet Σ and u, v, w, ... strings over that alphabet. —Ruud 18:45, 13 November 2012 (UTC)
I went ahead and added a "Reversal" subsection to the article, with (hopefully) simplified language. — Loadmaster (talk) 22:25, 15 November 2012 (UTC)