Talk:Source lines of code


Unnamed section

When discussing the comparison of quality of code produced by different programmers, the term "productivity" is used where another term, e.g. "efficiency", may be more appropriate. This assumes that the definition of "productivity" skews towards quantity, i.e. to be more productive simply means to produce more output, whereas the context skews towards efficiency, quality or some analogue which suggests the concept of "doing more with less". Then again, the discussion actually considers two aspects: two different code artifacts written to do the same task, and the different qualities of the respective programmers who produce the code artifacts. What is a good term to describe a worker who is good at producing higher quality products or tools? More specifically, what is a word for the measure of said ability? "Productive" is not a good word for that measure.

Perhaps the distinctions between using SLOC to estimate software complexity, the measure of software quality in general, and the measure of programmer capability should be made more explicit.


for (i=0; i<100; ++i) {printf("hello");} /* How many lines of code is this? */

Why is that ambiguous? Because it has more than one semicolon? -from a non-programmer

I would say one line of code. When counting lines in a large program it is usually done mechanically (i.e. by a program), so it isn't going to think about it the way a person would. If it were reformatted it would count as two lines. That is why SLOC is a rough estimate. The question is whether the above sample line conforms to the formatting standards used by the shop it was written in. Since printf("hello"); is pretty simple I think it would be OK, but since it is so simple this is a somewhat contrived example. (sigh...I just realized I hedge so much that I never actually said anything...) RJFJR 01:37, August 10, 2005 (UTC)
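
To make the mechanical-counting point concrete, here is a minimal Python sketch. It is not taken from any existing SLOC tool, and the function names are made up for illustration. It counts the sample line two ways: physical lines, and a crude logical count of statements. It assumes C-like source and ignores comment markers or semicolons inside string literals, so treat it as an illustration of why layout changes the physical count, not as a real counter.

```python
import re

def strip_c_comments(source: str) -> str:
    """Remove /* ... */ and // comments (naive: does not handle markers inside strings)."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", source)

def physical_sloc(source: str) -> int:
    """Count lines that still contain something once comments are stripped."""
    return sum(1 for line in strip_c_comments(source).splitlines() if line.strip())

def logical_sloc(source: str) -> int:
    """Crude statement count: semicolons outside parentheses, plus one per for/while/if."""
    code = strip_c_comments(source)
    depth = 0
    statements = 0
    for ch in code:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == ";" and depth == 0:
            statements += 1
    statements += len(re.findall(r"\b(?:for|while|if)\b", code))
    return statements

# The sample line from this thread, and one possible reformatting of it.
one_line = 'for (i=0; i<100; ++i) {printf("hello");} /* How many lines of code is this? */'
reformatted = """for (i=0; i<100; ++i) {
    printf("hello");
}
"""

print(physical_sloc(one_line), logical_sloc(one_line))        # 1 2
print(physical_sloc(reformatted), logical_sloc(reformatted))  # 3 2
```

With these definitions the physical count moves from 1 to 3 purely because of layout, while the crude logical count stays at 2. A different brace style would give yet another physical count, which is the sense in which SLOC is only a rough estimate.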

The SLOC table for Windows appears to be very wrong (see this comment on Larry Osterman's blog):

"That wikipedia page is kinda funny. According to it, "Windows NT 5.0", released in 2000, contains 20M lines of code, whereas "Windows 2000", released in 2001, contains 35M.  ::scratches head::

In fact, all of the "years" are completely wrong. Windows 3.1 in 1990? No, 3.0 was 1990. 3.1 was 1991 or 1992 I think. "Windows NT" (no version) in 1995, Windows 95 in 1997, NT4 in 1998, and so on. The table is prefixed "According to Gary McGraw." Wonder where that guy got his info from? I'd hardly believe his LOC counts if he can't even get the years right."

I've changed the table to use the values from Andrew Tanenbaum's "Modern Operating Systems" book. Unfortunately, this only covers the NT line, not the Win 3.1/9x products. Does anyone have accurate figures for these? Bakery2k 11:48, 25 March 2006 (UTC)

Quote: "With the advent of GUI-based languages/tools such as Visual Basic, much of development work is done by drag-and-drops and a few mouse clicks, where the programmer virtually writes no piece of code, most of the time." - that is one of the most asinine things I have ever read. It sounds like a hippy ideal from the mid-70s of 4GL languages. Where is my jetpack? They promised me one by now!!!

Programs for counting lines of code

I think this section deserves to be removed. This is an uncommented collection of links that does not provide any help and does not belong in Wikipedia. 84.191.231.103 21:17, 10 November 2006 (UTC)

I agree. Vorratt 01:10, 31 December 2006 (UTC)

I'm uncertain why you think this is inappropriate for Wikipedia (and I'm not sure I understand Wikipedia's goals and rules well enough to judge that) -- but it'd be a shame, as this kind of information is awfully useful for researchers who want to study this subject. -- Terry Hancock

The section probably should be split out into a separate topic, since (using the recent change by Dinker Charak as an example) it has turned into a list of links to programs written by people who want to advertise their work. If it were only that, I'd join the mob and add a link to c_count, for instance. There are many such programs. Tedickey 12:22, 8 September 2007 (UTC)

Citations

There are several "citation requested" notes in the SLOC tables. I'm not sure which particular numbers come from which sources, but I can provide the following links to SLOC data, by category:

Five versions of Debian (from 2.0 "Hamm" to 3.1 "Sarge") may be found at: http://libresoft.dat.escet.urjc.es/debian-counting/

Another semi-independent paper evaluated just 2.2 "Potato": http://people.debian.org/~jgb/debian-counting/

(I say semi-independent because I believe one author is shared between the two sources. It was an independent study, although it used the same SLOCCount tool.)

Data for Red Hat 6.2 and 7.1 were published by David Wheeler at: http://www.dwheeler.com/sloc/

Direct links to the papers: http://www.dwheeler.com/sloc/redhat71-v1/redhat71sloc.html http://www.dwheeler.com/sloc/redhat62-v1/redhat62sloc.html

In it, he cites the following references for Windows versions: http://www.schneier.com/crypto-gram-0003.html#8

(even with the anchor, you'll have to scroll down a bit to find this information buried midway through the article)

and also this source for NASA Space Shuttle flight software: http://books.nap.edu/html/statsoft/chap2.html

(Note that Wheeler's notation is a bit misleading -- it appears (to me) to indicate that 420,000 SLOC are used on the on-board computer and that 1.4 million SLOC are used on the ground. But that's incorrect. The 1.4 million SLOC figure is the size of the *testbed software* used to certify the 420,000 SLOC actually used on the Shuttle. In other words, it's *all* for the on-board software. Wheeler doesn't actually say anything wrong, it's just that it's a brief note in a table and not explained.)

Hopefully these links will make it easy to check the figures' sources (or replace them with equivalents). You'll realize of course that the numbers won't match exactly, because different methodologies are used in picking exactly which lines of code should be included in the counts, when exactly the data were measured, etc.

I haven't found anything on Mac OSs or FreeBSD, though I'm still looking.

Also, regarding the flippant comment about the "dates being all wrong" for Windows -- this betrays a misunderstanding. The dates need not be "release dates". They are the approximate dates at which the evaluation was made on the original source code, which evolves continuously over time. Proprietary software is simply released at specific points on that evolution, so it's less obvious that this is true. So the dates being different from the release dates doesn't necessarily mean anything. On the other hand, I don't have the McGraw book, so I can't see what the actual claim is.

In order to truly get SLOC for Windows products, you would have to have inside access to the code (which is, no doubt, only available under an NDA to people contracting with Microsoft). This limits who we can get such information from. The complained-about numbers appear to be taken from David Wheeler's introduction to his papers, which references a "Gary McGraw (of Cigital)" for the source. However, *in* the papers, he uses the Schneier citation I've listed above (so I think it's probably a more reliable source).

-- Terry Hancock

More on citations...

I had a source referencing a book by Andrew Tanenbaum from 2001. However, that's obviously not the source for the later numbers. Probably this is (already in the references): http://www.computerworld.com.au/index.php/id;1942598204;pp;1

However, there's another book with similar information in it: http://www.knowing.net/PermaLink,guid,c4bdc793-bbcf-4fff-8167-3eb1f4f4ef99.aspx

Which is quoting from the book: Vincent Maraia, "The Build Master: Microsoft's Software Configuration Management Best Practices", Addison-Wesley Microsoft Technology Series, 2005.

The source for the Mac OS X 10.4 "Tiger" figure is apparently Steve Jobs himself, from a keynote speech, which is described here (including a paraphrase of Jobs): http://www.macosxrumors.com/articles/2006/08/09/wwdc-2006-keynote-detailed-report

68.93.224.4 21:43, 14 February 2007 (UTC) Terry Hancock

Okay, another issue:

Open Solaris is claimed to be 10 million SLOC by Sun in 2005. http://www.boostmarketing.com/story.php?id=474

From Sun, there is a claim that *Star Office* was 7.5 Million SLOC when it was first released as OpenOffice, e.g.: http://java.sun.com/developer/jcpopensource/

The Sun Solaris 7.5 meg figure is from the Debian 3.1 paper, and that refers to an earlier paper for those numbers.

68.93.224.4 22:41, 14 February 2007 (UTC) Terry Hancock

I'm not sure how to add a citation, but the citation for Paint.NET lines of code is: http://getpaint.net/download.html where it says, "Interested in looking at almost 140,000 lines of code?"

COBOL

The COBOL article has a much shorter Hello World, so what's the deal with that 17-line beast in this SLOC article? -j.engelh (talk) 21:39, 28 January 2008 (UTC)

There's a discussion of this at Talk:COBOL#Bogus lines deleted. The 17 line version is a particular dialect of COBOL once used on CP/M. (I think similar variants are not uncommon; when I briefly touched on COBOL many years ago at school, I'm pretty sure we were told all the division declarations are mandatory even if they are empty, just as seems to be done here.) Since the point of the example is to show that languages can differ greatly in verbosity, it seems perfectly reasonable to use an especially verbose variant so long as it isn't contrived -- although perhaps we should label it with the actual dialect name, to avoid pointless offence to much put-upon COBOL supporters. Actually, I think a strong argument can be made that the "short" example is far too long; after all, there are plenty of quite common languages in which "Hello World" is a one-liner.
Having said all that, I think "Hello, world!" is not a good example. In nearly all high-level languages the core functionality of "Hello, world!" is a single statement, so any statement count over 1 is actually a measure of the amount of syntactical cruft required to get a program up and running at all, which is not necessarily a good measure of verbosity for particular tasks. It might be better to use an example which actually highlights the strengths and / or domain impedance mismatches of particular languages. (And to avoid someone complaining, "yes that's hard using built-ins, but company X sells a library extension which solves it in one command!", we might specify no libraries outside the standard distribution.)
From a real-world task in which I have personally seen massive differences in the verbosity of seriously proposed solutions, I would suggest something like parsing a web server log file in Common Log Format to count how many unique IP addresses have hit the server during the life of that log file (i.e., how many unique values are found in the first field of each line). Hopefully this would be short enough to fit into a readable example whilst still being reasonable and not too contrived. Perhaps someone could try this in Visual Basic (a practical and popular language, but one I've found rather impedance mismatched for this sort of task) so that we can compare it with my rather simple example in a language optimised for line-oriented text processing. -- 203.20.101.203 (talk) 06:52, 7 July 2009 (UTC)
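
For anyone who wants a baseline for that comparison, the proposed task can be sketched in Python roughly as follows. This is only an illustration, not the poster's example: the function name, the sample log entries and the file name in the comment are all made up. It relies only on the first whitespace-separated field of each Common Log Format line being the client host, which may be a hostname rather than an IP address if the server resolves names.

```python
def count_unique_hosts(lines):
    """Return how many distinct values appear in the first field of the given log lines."""
    hosts = set()
    for line in lines:
        fields = line.split(None, 1)  # split off the first whitespace-separated field
        if fields:                    # skip blank lines
            hosts.add(fields[0])
    return len(hosts)

# Made-up sample entries in Common Log Format (documentation-range addresses).
sample_log = [
    '192.0.2.1 - - [07/Jul/2009:06:52:00 +0000] "GET / HTTP/1.0" 200 2326',
    '192.0.2.7 - - [07/Jul/2009:06:52:05 +0000] "GET /a.html HTTP/1.0" 200 512',
    '',
    '192.0.2.1 - - [07/Jul/2009:06:53:10 +0000] "GET /b.html HTTP/1.0" 404 209',
]

print(count_unique_hosts(sample_log))  # 2

# With a real file (hypothetical name), the open file object can be passed directly:
#     with open("access.log") as log:
#         print(count_unique_hosts(log))
```

Excluding the sample data, the counting itself is under ten lines using only the standard library, which gives one data point for the verbosity comparison suggested above.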

Bill Gates's citation

Is there any source for Bill Gates's citation? 130.232.32.155 (talk) 13:21, 21 January 2009 (UTC)

Apparently (I googled it a while back, and iirc it was attributed to Business @ the Speed of Thought, but I don't have a copy to verify that). Tedickey (talk) 13:31, 21 January 2009 (UTC)

Geer quote

The page doesn't have much to do with source size (or even security). Just a rambling rant. TEDickey (talk) 20:59, 27 August 2010 (UTC)

I agree with the relevance of the statement. In general, the higher the complexity of the system, the more things can go wrong. However, I also agree that more citation is needed. A good quality software development and defect reduction process can do wonders if applied effectively. This is merely a rule of thumb. It seems logical, but SLOC doesn't necessarily translate to quality. --74.107.74.39 (talk) 01:55, 12 April 2011 (UTC)

KLoC disambiguation page needed?

The related term, KLoC, used to redirect to this article, but now it sends one to Poland :), and KLOC sends one to a radio station. I've added disambiguation links including this article and the third on both pages, i.e.

For the radio station, see KLOC.
For the village in northern Poland, see Kloc.

However, I didn't feel comfortable adding parallel links to this page, as the main heading is SLoC, not KLoC. Therefore, my question is: do these three kloc pages deserve a disambiguation page yet?

q.v. Talk:KLOC Pdebonte (talk) 13:33, 1 September 2010 (UTC)

Relation with security faults

"A number of experts have claimed a relationship between the number of lines of code in a program and the number of bugs that it contains" -- IIRC D.J. Bernstein is a notable example, HTH anyone willing to dig a bit (somewhat busy right now). --Gvy (talk) 13:06, 8 July 2011 (UTC)


Uninformative introduction

Software projects can vary between 1 to 100,000,000 or more lines of code.

...doesn't say anything, does it? Andreas Lundblad (talk) 15:45, 19 April 2012 (UTC)

Subjective language

I take issue with this sentence:

"Using lines of code to compare a 10,000 line project to a 100,000 line project is far more useful than when comparing a 20,000 line project with a 21,000 line project"

It goes without saying that the ranges between these examples are on a different scale. But does that really relate to usefulness at all? If so, "useful" to whom?

What is "useful" is completely subjective. For example, someone working on a purposefully minimal codebase probably couldn't care less about the considerations of a larger project that they aren't working on.

Also, does this sentence provide any kind of value or insight into the concept of SLOC? I don't think it does. It has very little surrounding context or relevance.

Consensus to remove/clean up/make less subjective?... — Preceding unsigned comment added by 81.157.211.54 (talk) 09:26, 4 June 2012 (UTC)

Debian LOC

Are there any updates for Debian 6 with respect to LOC? For Debian 7, wheezy, some people seem to have done the count in February 2012 (http://blog.james.rcpt.to/2012/02/13/debian-wheezy-us19-billion-your-price-free/), which could be incorporated into the article. Greetings --hroest 13:54, 23 April 2013 (UTC)