Talk:C string handling

Choice of the title

Even though there was consensus that C library-specific material needed to be split out of the Null-terminated string article, there was some disagreement about what title to choose. I chose this title because it is consistent with other articles about the C standard library. Also, "C string handling" gets more hits (225,000) on Google than "String handling in C" (35,900), so this title is favoured by WP:COMMONNAME. If anyone disagrees, please discuss here; I am open to suggestions. A relevant discussion is also at Talk:C standard library#Move request through WP:RM. 1exec1 (talk) 18:04, 9 November 2011 (UTC)

Revert of an edit

I reverted this edit because neither C++ nor PHP is closely related. C++ does not have strlen per se; it inherits that function from C. As for PHP's strlen, it does not operate on C strings. PHP strings are not even null-terminated ([1]), so I don't see how that's relevant to C string handling, which is what the article is about. I agree that there's a slight problem with strlen redirecting here; however, for the reasons above, this problem should be solved by other means than linking from this page. Can anyone suggest a better redirect target or another solution? 1exec1 (talk) 19:18, 23 November 2011 (UTC)

As for the prototype of strerror, I deleted it because it fails #1 in WP:NOTMANUAL: function prototypes belong in a reference manual, not an encyclopedia. It also fails #8 in WP:NOTDIRECTORY: "Wikipedia articles are not a complete exposition of all possible details. Rather, an article is a summary of accepted knowledge regarding its subject." A function prototype definitely falls outside the scope of a summary, while the name of the function and a short one-sentence description probably still fall within it, although that's also debatable. 1exec1 (talk) 19:17, 23 November 2011 (UTC)

It's not a WP:NOTMANUAL violation to have the prototype. And listing the prototype is not "all possible details"; it's basic information about the function. But if you think so, why have the list of functions at all? Christian75 (talk) 19:45, 23 November 2011 (UTC)
Why do you say that the prototype is not covered by the notmanual/notdirectory policy? These guidelines supplement What Wikipedia is not, which itself probably has roots in the notability principle. In my understanding, the spirit of all these policies is that Wikipedia should not include information that is not relevant to the broader public. For example, Wikipedia shouldn't have a page that nobody reads or material within a page that is important to nobody. I feel that prototypes provide very little, if any, additional information. For example, compare int strlen(char *s) - returns the length of the given string s with strlen - returns the length of the given string: the second version carries essentially the same information as the first, just without the redundant bits. So I think the second way of presenting information is better.
Another thing is consistency. If we decide to show prototypes, then we have to show them for all functions. However, some of the functions have quite complex prototypes that need a lengthy explanation even when the functionality of the function can be explained in one sentence. So we end up with two alternatives, neither of which is acceptable: have a prototype without explanation (what's the purpose of the prototype then?) or explain the prototype in a way that violates notmanual. This is another problem.
Regarding the functions themselves I do not think they violate notmanual because these functions are notable enough to warrant a mention, they are relevant to this article and that information is important to all readers in order to understand the article. 1exec1 (talk) 20:52, 23 November 2011 (UTC)
You write a lot. First, it doesn't matter whether people read it or not (WP:NOBODYREADSIT). I think prototypes are useful, basic information about the functions. If I see the prototype "int strlen(char *s)", it tells a lot; if I just see the description, I have to search for information somewhere else. I can't see how it violates WP:NOTMANUAL - it's a description of what strlen is. It's as if the Schrödinger equation article didn't have the equation, just an explanation in words. Christian75 (talk) 21:49, 23 November 2011 (UTC)
I still don't understand your argument. What additional information does "int strlen(char *s)" add to strlen - returns the length of the given string? The description already implies all the arguments. I don't think anyone who's looking for a general description of the function will find the prototype useful. It's needed only by those who are programming and looking for a reference manual, but again WP:NOTMANUAL applies. Can you provide at least one use case where the prototype is useful for someone who's not programming in C? As for physics equations, they are a different subject, since an equation is self-explanatory enough that a verbal explanation, which is neither concise nor precise, is not needed. 1exec1 (talk) 22:42, 23 November 2011 (UTC)
First, an equation is not self-explanatory. But okay, this is about the prototypes. If you think the list violates WP:NOTMANUAL with prototypes, please explain why it doesn't violate WP:NOTMANUAL without them. It still looks the same, just with one piece of information less. And do you think that non-programmers are looking for this kind of information? (And it really doesn't matter who is looking for it.) Christian75 (talk) 08:24, 24 November 2011 (UTC)
How is an equation not self-explanatory? If I write F = ma, it immediately defines a relationship between mass, acceleration and force. All that is needed is to know the notation. When you write size_t strlen(const char *s) or size_t strcspn(const char *s1, const char *s2);, even if you know that s, s1 and s2 are strings and that size_t is an unsigned integer type, you can't say what the function does.
I do not argue that the entire line violates WP:NOTMANUAL, only the prototype bit, which is redundant and reduces readability. Since this article is about C string handling, not C strings per se, the function names and short descriptions are relevant and do not fail notmanual - you can't talk about functions working on C strings without naming them. That information is important to anyone who has programming skills and is interested in computing, whereas the prototypes are relevant only to C programmers looking for a reference that extensively details (#8 in WP:NOTDIRECTORY) the behaviour of the function (WP:NOTMANUAL). Thus my argument still holds: prototypes are redundant, reduce readability, and fail WP:NOTMANUAL and WP:NOTDIRECTORY.
Since you haven't provided an argument for including prototypes, except that you think they're important and don't fail NOTMANUAL, I removed them again. Because of that, and as you are the first to oppose the absence of prototypes, I think you should ask for more opinions to establish consensus. WikiProject Computing and Talk:C standard library might be good places to start. Until consensus is established, the prototypes should stay as is. 1exec1 (talk) 12:07, 24 November 2011 (UTC)
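For comparison, a small sketch of what the two prototypes mentioned above actually declare and how strcspn behaves. The standard declarations in <string.h> are size_t strlen(const char *s); and size_t strcspn(const char *s1, const char *s2);. Neither signature by itself says that strcspn returns the length of the longest initial segment of s1 that contains no characters from s2. The helper first_field_length is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Standard declarations from <string.h>:
     size_t strlen(const char *s);
     size_t strcspn(const char *s1, const char *s2);
   strcspn returns the length of the longest initial segment of s1
   made up entirely of characters that do NOT appear in s2. */

/* Hypothetical helper: length of the first field of a line whose
   fields are separated by commas or spaces. */
static size_t first_field_length(const char *line)
{
    return strcspn(line, ", ");
}
```

For example, first_field_length("hello, world") is 5, the same as strlen("hello"). If none of the characters in s2 occur in s1, strcspn(s1, s2) equals strlen(s1).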
It's like a formula - an educated person knows what F = ma means (but the average person has no clue at all), and it gets worse if you take an equation from the Schrödinger equation article, $i\hbar \frac{\partial}{\partial t}\Psi(\mathbf{r},t) = \hat{H}\Psi = \left(-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r})\right)\Psi(\mathbf{r},t) = -\frac{\hbar^2}{2m}\nabla^2\Psi(\mathbf{r},t) + V(\mathbf{r})\Psi(\mathbf{r},t)$ (an educated person knows what it means). An educated person knows what size_t strcspn(const char *s1, const char *s2); means - it's helpful for the reader to see prototypes because it gives information about what the function does - what to put in, and what to expect to get out - and that's what Wikipedia should do: explain how things work. The way the article is now, it's just a long list of external links. You are the only person who removes the prototypes, and you have done so on every article with prototypes that you have edited. I think nobody cares about the C articles, which are of very poor quality - but I do. Christian75 (talk) 22:16, 26 November 2011 (UTC)
Well, F = ma is one of Newton's laws, which are included in the curriculum of virtually all high schools around the world, so the average person already knows it. Or do you allege that high-school graduates are already much above average? In any case, the comparison to physics formulae is a wrong one, because they approximate how nature works and are thus important regardless of how complex they are.
Having said that, I think I can pinpoint the exact reason why we don't agree: "<...> because it gives information about what the function does - what to put in, and what to expect to get out" (emphasis mine) - this is exactly what fails WP:NOTMANUAL. The policy is there for a good reason: Wikipedia is an encyclopedia and thus is not suited for content that belongs in reference manuals. See, for example, this. The page is of little use to someone who's looking for a reference, because it contains a lot of unrelated information. A programmer doesn't care about the criticism or history of strlcpy when all he's looking for is a precise explanation of the behaviour of the function. Also, the information is not easily found, and navigation across the different pages for C functions is poor (see e.g. [2] or [3] for examples of good navigation facilities). So I think a much better solution would be to remove the manual-like content but add links to a dedicated reference website, so that all the details a programmer might want would be easily accessible. I advocate removing all manual-like content, including prototypes, for two reasons: it distracts readers who are interested only in encyclopedic content, and it obscures the links to a dedicated reference website for those who need only a reference. By the way, I also once wanted Wikipedia to include an entire C reference manual (e.g. check out this). Only after seeing firsthand that Wikipedia is not useful as a reference did I decide that that's a bad idea. 1exec1 (talk) 18:31, 29 November 2011 (UTC)

Encodings

The text currently says that 32-bit wchar_t strings can be used to store UTF-16. Technically this is true in that the code units of UTF-16 can be stored losslessly in 32-bit words. It is also true that 32- and 16-bit wchar_t can store UTF-8 code units losslessly, but I notice the author does not claim that.

In real use, UTF-16 padded out to 32-bit code units is not UTF-16. If that block of memory is written to a file, that file is certainly not UTF-16. If it were interpreted as UTF-16, it would mostly be characters alternating with nul, and there would be errors wherever surrogate pairs were encountered, as they would have a nul between them.

I would like to remove this statement, as I feel it is misleading. However, an actual example of distributed software storing UTF-16 in 32-bit wchar_t would justify keeping it. Note that the software must actually work with and store non-BMP surrogate pairs as two wchar_t: if it combines them, the result is UTF-32, and if it fails to handle surrogate halves at all, the result is UTF-32 limited to the BMP. Spitzak (talk) 23:46, 29 December 2011 (UTC)
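The surrogate-pair problem described above can be reproduced with a short sketch (the array and helper names are hypothetical). U+1F600 is stored as the UTF-16 pair D83D DE00, each code unit widened into its own 32-bit slot, and the same bytes are then re-read as 16-bit units, as a UTF-16 reader would see them. Whatever the machine's byte order, a zero unit ends up between the high and the low surrogate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* U+1F600 in UTF-16 is the surrogate pair D83D DE00. Each 16-bit code
   unit is widened into its own 32-bit slot. */
static const uint32_t widened[2] = { 0xD83D, 0xDE00 };

/* Re-read the same bytes as 16-bit units, as a program treating the
   buffer (or a file containing it) as UTF-16 would. Returns the number
   of units written to out. */
static size_t reread_as_utf16(uint16_t out[4])
{
    memcpy(out, widened, sizeof widened);
    return sizeof widened / sizeof out[0];
}

/* Index of the first unit equal to `unit`, or n if it is absent. */
static size_t find_unit(const uint16_t *units, size_t n, uint16_t unit)
{
    size_t i = 0;
    while (i < n && units[i] != unit)
        i++;
    return i;
}
```

On a little-endian machine the re-read units are D83D 0000 DE00 0000; on a big-endian one they are 0000 D83D 0000 DE00. Either way half the units are zero and the pair is split.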

I wanted to say that wchar_t can be used to process many encodings, not only UCS-4/UTF-32 (or UCS-2/UTF-16 on Windows). Most of the wide-string functions work with almost any encoding, provided that the code unit fits in wchar_t. I agree that the current text is misleading, though. I think a better option would be to clarify it rather than remove it. I'm fairly confident that it would be possible to find a reference, too. 1exec1 (talk) 19:46, 30 December 2011 (UTC)
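A minimal sketch of that point, assuming a hypothetical array of UTF-16 code units stored one per wchar_t: wcslen and wcschr only see a zero-terminated sequence of code units, so they work on such data, but they count and search code units rather than characters. All the values used fit in both a 16-bit and a 32-bit wchar_t.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* UTF-16 code units for "a" followed by U+1F600 (surrogate pair
   D83D DE00), one unit per wchar_t, terminated by a zero unit. */
static const wchar_t utf16_units[] = { 0x0061, 0xD83D, 0xDE00, 0 };

/* wcslen counts code units up to the terminator; it knows nothing
   about the encoding, so the surrogate pair counts as two. */
static size_t unit_count(void)
{
    return wcslen(utf16_units);
}
```

Here unit_count() is 3 although the text holds two characters; stored as UTF-32, the same text would give 2.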

Error behavior of *_s functions

The current article says "Their behavior on error is so useless (and even destructive in the case of strcat_s) that actually using them in cases where it is unknown if the string will overflow the buffer is almost impossible." Having read the MSDN page describing these functions as well as the two links cited at the end of the article, I have no idea why the error behavior is considered "useless" or "destructive". This needs to be clarified and/or sourced, or I will remove it in a couple more days. 128.105.181.52 (talk) 23:07, 22 March 2012 (UTC)

strcat_s will replace the first byte of the old string with a nul on failure, making it impossible to recover the arguments on failure:
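#define __STDC_WANT_LIB_EXT1__ 1
#include <assert.h>
#include <string.h>
char foo[] = "foo"; // 4 bytes including the terminator
strcat_s(foo, sizeof foo, "bar"); // fails on purpose: "foobar" needs 7 bytes
assert(foo[0] == 'f'); // fails if the handler lets strcat_s return: foo[0] is now '\0'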

This is destructive and means strcat_s is useless for actual error detection, since you cannot recover from the error.Spitzak (talk) 03:17, 24 March 2012 (UTC)
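Since not every C library provides Annex K (glibc, for one, does not), here is a hypothetical portable model, model_strcat_s, of what the standard says strcat_s does when the runtime-constraint handler returns instead of aborting: on overflow it writes a null character to s1[0] and returns nonzero, so the original contents of the buffer cannot be recovered. It is a sketch for discussion, not a replacement for the real function; the RSIZE_MAX and overlapping-buffer checks are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of C11 Annex K strcat_s, covering only the case
   where the runtime-constraint handler returns. RSIZE_MAX and overlap
   checks are omitted. Returns 0 on success, nonzero on a violation. */
static int model_strcat_s(char *s1, size_t s1max, const char *s2)
{
    if (s1 == NULL || s1max == 0)
        return 1;                       /* no buffer that may be written */
    if (s2 == NULL) {
        s1[0] = '\0';
        return 1;
    }
    const char *end = memchr(s1, '\0', s1max);
    size_t len1 = end ? (size_t)(end - s1) : s1max;
    size_t len2 = strlen(s2);
    /* s2 plus its terminator must fit in the space after s1; this also
       catches an s1 that is not terminated within s1max bytes. */
    if (len2 >= s1max - len1) {
        s1[0] = '\0';                   /* the old contents of s1 are lost */
        return 1;
    }
    memcpy(s1 + len1, s2, len2 + 1);    /* append s2 and its terminator */
    return 0;
}
```

With char foo[] = "foo", the call model_strcat_s(foo, sizeof foo, "bar") returns nonzero and leaves foo as the empty string, which is exactly the data loss described above.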
Thanks for the explanation. I've replaced the dubious tag and original wording with an expanded description (toned-down in terms of criticism which I feel is much more fair than the original one). Sound fair? 216.165.158.75 (talk) 01:21, 28 March 2012 (UTC)
I guess, though it is a bit lengthy (almost all discussion of individual functions was deleted when they were all merged into these single pages). I also believe the default behavior is to clear the output buffer and return an error indicator. You have to install an error handler to get the abort behavior. Abort is much worse, since it replaces a potential failure (in cases where the returned string is not usable) with a *guaranteed* DOS failure. Spitzak (talk) 18:09, 28 March 2012 (UTC)
The default behavior is abort according to both MSDN documentation ([4]) and experimentation. I did not check the standard document. Also, I definitely disagree that aborting is much worse -- I strongly feel that a "guaranteed" DOS is better than continuing with near-certain incorrect behavior, especially when I suspect such behavior is generally reasonably likely to be exploitable in a real sense. 216.165.158.75 (talk) 00:22, 29 March 2012 (UTC)

Personally, I think it'd still be pretty reasonable to just ditch the entire thing. The fact that *anyone* objects to the error behavior is still unsourced, aside from our OR. The last two sentences of that paragraph are pushing includability, and once they're gone, there's no reason to talk about the error behavior. 216.165.158.75 (talk) 00:26, 29 March 2012 (UTC)

Okay, somebody has edited this *yet again* to say "in the C11 versions it returns an error indicator". This is in fact what I thought the _s functions all do by default (they return an error code); they only abort if the programmer previously did some "setup" to "change the default error handler", which it is best to assume was *not* done if you want to describe default behavior. However, somebody else claimed abort was the default behavior. Can somebody who has access to these pieces of crap actually *TEST* this and see what the default is? Spitzak (talk) 18:26, 9 April 2012 (UTC)

It is three years after this discussion and the article is still wrong. As discussed, "strcat_s and strcpy_s functions return an error indicator upon buffer overflow, together with setting the output buffer to a zero-length string, which destroys data in the case of strcat_s" is a very misleading statement *at best*. The edit history seems to indicate that editors think the C11 behavior is different and matches the description. It does not; see K.3.1.4 in at least the draft standard ("If a runtime-constraint is violated, the implementation shall call the currently registered runtime-constraint handler") and K.3.6.1.1 ("The behavior of the default handler is implementation-defined, and it may cause the program to exit or abort"). This is certainly the behavior of MS's implementation, and I suspect it would be the behavior of most other implementations as well. I will clean up this description later if someone else does not get to it first. 66.188.116.58 (talk) 19:28, 11 February 2015 (UTC)
Please add a good reference to prevent future edit wars. • SbmeirowTalk • 00:38, 12 February 2015 (UTC)
Let's see if the changes stick now. EvanED (talk) 05:08, 13 February 2015 (UTC)
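On an implementation that provides Annex K (it advertises this by defining __STDC_LIB_EXT1__), a program does not have to rely on the implementation-defined default handler: it can install ignore_handler_s with set_constraint_handler_s so that strcat_s returns an error instead of possibly aborting. The sketch below (the function name annex_k_clears_on_overflow is hypothetical) does that, then restores the previous handler; on libraries without Annex K it simply returns -1. As far as I know, Microsoft's implementation uses its own invalid-parameter handler mechanism instead and does not define __STDC_LIB_EXT1__, so this would not settle the MSVC question.

```c
#define __STDC_WANT_LIB_EXT1__ 1
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if strcat_s reported the overflow and cleared the buffer,
   0 if it did something else, and -1 if Annex K is not available. */
static int annex_k_clears_on_overflow(void)
{
#ifdef __STDC_LIB_EXT1__
    /* Make constraint violations return instead of using the
       implementation-defined default handler (which may abort). */
    constraint_handler_t old = set_constraint_handler_s(ignore_handler_s);
    char foo[] = "foo";
    errno_t err = strcat_s(foo, sizeof foo, "bar"); /* needs 7 bytes, has 4 */
    set_constraint_handler_s(old);
    return (err != 0 && foo[0] == '\0') ? 1 : 0;
#else
    return -1;
#endif
}
```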

Seeking clarification?

In regards to the following statement in this article, I believe for the sake of clarity it might pay to omit the weasel word "often" and be specific as to which part of the article uses the term "string" erroneously: Documentation (including this page) will often use the term string to mean pointer to a string. Modifiable lvalue (talk) 10:08, 26 February 2013 (UTC)

I would also like to establish the reliability of any sources that may be used to cite this as fact. Documentation that treats "string" as synonymous with "pointer to string" disagrees with the current C specification draft (section 7.1.1 of N1570), with which our C compilers must be fairly compliant, and with the Single Unix Specification, on which most of our operating systems are based. Those manuals can be found in the "See also" section of Opengroup String.h. Modifiable lvalue (talk) 10:26, 17 March 2013 (UTC)
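The distinction drawn in N1570 7.1.1 can be illustrated with a short sketch (the names are for illustration only). A string is a contiguous sequence of characters terminated by and including the first null character, and a pointer to a string is a pointer to its initial character. Below, the array greeting holds a string, while p is a pointer to that string; strlen takes the pointer but measures the string it points to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The array holds the string "hi" plus its terminating null character. */
static char greeting[] = "hi";

/* p is a pointer to the string: it points at the string's first character. */
static char *p = greeting;
```

Here sizeof greeting is 3 (the storage, including the terminator), strlen(p) is 2 (the string's length excludes the terminator), and sizeof p is just the size of a char pointer.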

The second paragraph ends with the statement:

"These functions are so popular and used so often that they are usually considered part of the definition of C."

This is misleading and incorrect. They ARE part of the definition of C -- they're written into the effing standard, even as far back as ANSI C89/ISO C90. — Preceding unsigned comment added by 82.9.176.129 (talk) 17:12, 3 November 2013 (UTC)

They are part of the C standard library, not the *language* C. The word "strcat" is not a keyword in C. Spitzak (talk) 19:18, 4 November 2013 (UTC)