Talk:Comparison of programming languages (string functions)

From Wikipedia, the free encyclopedia
WikiProject Computing (Rated B-class, Mid-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as B-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.

C function toupper() in UpperCase

This is misleading in the article. C doesn't have a function to uppercase a whole string. toupper() takes a single int argument and returns an int, NOT strings. Its prototype:

int toupper(int c);

If c is a lowercase letter (a-z), toupper() returns the uppercase version (A-Z). Otherwise toupper() returns c unchanged. In the default "C" locale, toupper() does not convert international characters (those with character codes of 0x80 and above), like ă or ç. To uppercase a whole string you need to write a function something like this:

#include <ctype.h> // standard C header file with the prototype of toupper()

// theString is a pointer to the first char of the string you want to uppercase (in place).
void UpperCaseAString(char *theString)
{
    char *myCharPtr = theString; // myCharPtr is a pointer to char, initialized to theString

    // C uses null-terminated strings. *myCharPtr is the char pointed to by myCharPtr.
    while (*myCharPtr != '\0') {
        // Cast to unsigned char: toupper() is undefined for negative values other than EOF.
        *myCharPtr = (char)toupper((unsigned char)*myCharPtr);
        myCharPtr++; // pointer to char, so it advances by sizeof(char)
    }
}

In C, strings are essentially pointers to char, and they end at the first null ('\0') character. It would be worthwhile to explain what strings are in different languages. Senor Cuete (talk) 03:41, 10 May 2008 (UTC)Senor Cuete

The 1. should appear as a pound sign and the box is put there by Wiki's text engine. I didn't type it like that.Senor Cuete (talk) 03:44, 10 May 2008 (UTC)Senor Cuete

The <source lang="...">...</source> tag should fix it. Ghettoblaster (talk) 12:43, 10 May 2008 (UTC)

Compare (integer result, fast/non-human ordering)

In the table row for C, why would you go through the hassle of writing your own function when you could call the C function strncmp?

#include <string.h>

int strncmp(const char *s1, const char *s2, size_t n);

Senor Cuete (talk) 00:52, 16 May 2008 (UTC)Senor Cuete
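
For reference, a minimal sketch of how strncmp() could be called for this kind of comparison. The helper name compare_prefix is hypothetical and does not come from the article. It only normalizes the result, since the C standard guarantees the sign of the value strncmp() returns but not its magnitude.

```c
#include <string.h>

/* Hypothetical helper (not from the article): normalizes the result of
   strncmp() to -1, 0, or 1. Only the sign of strncmp()'s return value
   is guaranteed by the standard. */
int compare_prefix(const char *s1, const char *s2, size_t n)
{
    int r = strncmp(s1, s2, n); /* compares at most n characters */
    return (r > 0) - (r < 0);
}
```

For example, compare_prefix("apple", "apricot", 2) compares only "ap" against "ap" and reports the two as equal.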


Shouldn't the table row for C just mention the C function strncpy?

#include <string.h>

char *strncpy(char *s1, const char *s2, size_t n);

Why concatenate when you can copy?Senor Cuete (talk) 00:53, 16 May 2008 (UTC)Senor Cuete

Because strncpy() will not copy a null-terminator if the string is n or more characters long. --Spoon! (talk) 12:13, 16 May 2008 (UTC)
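
The point about the missing terminator can be sketched as follows. The helper name copy_truncated is hypothetical. It shows one common workaround, which is to copy at most one character less than the buffer size and write the terminator explicitly.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dst_size bytes) and
   always null-terminates. strncpy() alone leaves dst unterminated when
   strlen(src) >= n. */
void copy_truncated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;                      /* no room even for the terminator */
    strncpy(dst, src, dst_size - 1); /* copies at most dst_size - 1 chars */
    dst[dst_size - 1] = '\0';        /* terminate explicitly */
}
```

By contrast, strncat() always writes a terminating null character after the appended characters, which may be why a concatenation-based approach was chosen for the table.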

strings vs lists

"In both Prolog and Erlang, a string is represented as a list (of character codes), therefore all list-manipulation procedures are applicable, though the latter also implements a set of such procedures that are string-specific."

I think this is the same for Haskell, should it also be noted? —Preceding unsigned comment added by (talk) 00:20, 28 June 2008 (UTC)

Additional procedure/operators

Some further string manipulations for consideration:

  • substring append and prepend, e.g. in Python: s += "ABD"
  • replace substring:
    1. by substring text, e.g. in AWK: gsub("Earthling","Martian",string)
    2. by slice: s[3:4]="XY"
  • insert substring at offset.

NevilleDNZ (talk) 08:17, 15 May 2009 (UTC)
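
As an illustration of what an "insert substring at offset" row might look like for C, which has no standard function for it, here is a minimal sketch. The function name insert_at, the 0-based byte offset, and the choice to return a newly allocated string are all assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of s with ins
   inserted at byte offset pos (0-based). Returns NULL if pos is past the
   end of s or if allocation fails. The caller frees the result. */
char *insert_at(const char *s, size_t pos, const char *ins)
{
    size_t s_len = strlen(s);
    size_t ins_len = strlen(ins);
    char *out;

    if (pos > s_len)
        return NULL;
    out = malloc(s_len + ins_len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, pos);                                   /* head of s */
    memcpy(out + pos, ins, ins_len);                       /* inserted text */
    memcpy(out + pos + ins_len, s + pos, s_len - pos + 1); /* tail and '\0' */
    return out;
}
```

With these assumptions, insert_at("Hello world", 5, ",") yields "Hello, world".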


Came here looking for a Python equivalent to the ASC() function, which, in BASIC/VB6, returns the numeric value of the first character of a string.

Not exactly equivalent to any string function in any language which handles strings differently, but in BASIC it was a string function. —Preceding unsigned comment added by (talk) 05:17, 22 June 2009 (UTC)

  • It's called ORD() in many languages (the character set may not be ASCII, but the idea is the same). This string function comparison page could use a section on conversions (number to/from string, character to/from string). --BrianFennell (talk) 22:37, 3 September 2009 (UTC)
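
For comparison, in C a character already is a small integer, so the conversion is a cast rather than a library call. The sketch below uses the hypothetical helper names asc and chr, borrowed from BASIC. Returning -1 for an empty string is this sketch's own convention.

```c
/* Hypothetical helper named after the BASIC function: returns the numeric
   code of the first character of s, or -1 for an empty string. */
int asc(const char *s)
{
    if (s[0] == '\0')
        return -1;
    return (unsigned char)s[0]; /* avoid negative values for codes >= 0x80 */
}

/* Hypothetical inverse: writes the one-character string for code into
   buf, which must have room for 2 bytes. */
void chr(int code, char buf[2])
{
    buf[0] = (char)code;
    buf[1] = '\0';
}
```

On a system whose execution character set is ASCII-compatible, asc("A") returns 65.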

substring, startpos, base?

Ark! The substring table does not list the base for startpos and endpos. Is the startpos=1 the first character in the parent string, or the second? —Preceding unsigned comment added by (talk) 05:57, 22 June 2009 (UTC)

Square bracket as syntax

There is a problem here: sometimes the square brackets indicate an optional field, as in string(1[,n]), and sometimes they are part of the language, as in string[1,n].

That leaves the problem that we can't always see that part of the command is optional: string[1 /,n/]. —Preceding unsigned comment added by (talk) 06:03, 22 June 2009 (UTC)

I see that it's been Fixed now - thank you whoever :~) (talk) 07:22, 14 January 2010 (UTC)

Lua missing as programming language

I noticed that Lua is missing from this page. I'm willing to add Lua examples (which might take some time), but someone should cross-read them. Or are there reasons not to have Lua in the examples?

Lua string.find and string.gsub misplaced?

These functions work with pattern matching, not with plain strings (although find can be forced to do a plain search with additional arguments). There should be at least a comment about this. Bassklampfe (talk) 15:12, 30 November 2010 (UTC)

Removal of "Compare (integer result, fast/non-human ordering)"

I am removing the Compare (integer result, fast/non-human ordering) section, for the following reasons:

  1. This is not a common or primitive operation. Observe that of the languages listed, not one provides a built-in operator or standard library function to perform this type of comparison. Only one of the examples calls a single function, and that is in an uncommon third-party library. The rest are all implemented in terms of structural comparison of tuples (not a string operation at all) or sequential boolean OR (using the basic string comparison already detailed in the previous section). The section therefore does not in fact provide any new information about string functions at all. It merely describes an alleged optimisation technique. But...
  2. This is not even an optimisation in most cases. The complicated "fast" approaches given in the article all involved more operations than the straightforward standard approach, nullifying any speed improvement they might have brought. The OCaml and Ruby examples were particularly bad, since the "fast" versions actually involved allocating and freeing memory on the heap!
    I ran some benchmarks in Perl and OCaml, and I was unable to find any cases where the "fast" version was not actually slower than the standard approach. In one case (OCaml, comparing short strings), the code given in the article was literally 33% slower than a straightforward comparison!
    It's possible that things might be different in other languages, and the technique might be generally faster in some very restricted circumstances (maybe when comparing very long strings that are very similar?), but it is clearly not something that anyone should be using without benchmarking it against their own data; and it's unlikely that string comparisons will frequently be enough of a bottleneck to justify this kind of micro-optimisation in the first place.

In short, this is not the kind of useful information that Wikipedia prides itself on spreading, and I don't think it belongs in this article. (talk) 17:04, 26 July 2009 (UTC)

equivalence relation missing

This article deals with three ways to compare strings (equality, compare, and strcmp). This raises some issues:

  • As far as I understand, all three cover the same feature.
  • The feature is not well defined unless the lexicographical order is defined.
  • It is not clear whether the comparison is a low-level comparison or one based on equivalence.

For instance, how do you compare Montréal and Montréal (the two canonically equivalent UTF-16 Unicode forms)?

Montréal (a city in North America) in its two canonically equivalent UTF-16 Unicode forms (NFC and NFD)

character                 M    o    n    t    r    é         a    l
UTF-16 NFC                004d 006f 006e 0074 0072 00e9      0061 006c
UTF-16 NFD                004d 006f 006e 0074 0072 0065 0301 0061 006c
UTF-16 NFD (code points)  M    o    n    t    r    e    ◌́    a    l
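
To make the question concrete, here is a minimal sketch of a low-level comparison applied to the two sequences from the table above. The helper name utf16_units_equal is hypothetical. It compares code unit by code unit and applies no Unicode normalization, so it reports the two forms as different even though they are canonically equivalent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* UTF-16 code units for "Montréal", copied from the table above. */
static const uint16_t montreal_nfc[] = {
    0x004d, 0x006f, 0x006e, 0x0074, 0x0072, 0x00e9, 0x0061, 0x006c
};
static const uint16_t montreal_nfd[] = {
    0x004d, 0x006f, 0x006e, 0x0074, 0x0072, 0x0065, 0x0301, 0x0061, 0x006c
};

/* Hypothetical helper: low-level equality, code unit by code unit.
   No Unicode normalization is applied. */
int utf16_units_equal(const uint16_t *a, size_t a_len,
                      const uint16_t *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len * sizeof *a) == 0;
}
```

An equivalence-based comparison would first have to bring both inputs into the same normalization form, which standard C does not provide and which typically requires a Unicode library.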

"Code" format

The "code" tags on the keywords in the tables (or perhaps other changes) have destroyed the formatting, making the tables almost illegible. If you go back a decade and look at the original tables, you'll see that the keywords are clearly delimited, making the tables clear and easy to read.

The present formatting makes the whole exercise almost worthless: if you can't read it easily, what's the point of having pages of text? --Preceding unsigned comment added by (talk) 09:27, 18 July 2017 (UTC)