Talk:Source lines of code

This is the talk page for discussing improvements to the Source lines of code article.
This is not a forum for general discussion of the article's subject.

Put new text under old text.

Computing: Software / CompSci Low‑importance

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has been rated as Low-importance on the project's importance scale.

This article is supported by WikiProject Software (assessed as High-importance).

This article is supported by WikiProject Computer science (assessed as Low-importance).

Unnamed section

When the quality of code produced by different programmers is compared, the term "productivity" is used where another term, e.g. "efficiency", may be more appropriate. This assumes that the definition of "productivity" skews towards quantity, i.e. being more productive simply means producing more output, whereas the context skews towards efficiency, quality or some analogue that suggests "doing more with less". Then again, the discussion actually considers two aspects: two different code artifacts written to do the same task, and the different abilities of the programmers who produce them. What is a good term for a worker who is good at producing higher-quality products or tools? More specifically, what word names the measure of that ability? "Productive" is not a good word for that measure.

Perhaps the distinctions between using SLOC to estimate software complexity, the measure of software quality in general, and the measure of programmer capability should be made more explicit.

for (i=0; i<100; ++i) {printf("hello");} /* How many lines of code is this? */

Why is that ambiguous? Because it has more than one semicolon? -from a non-programmer

I would say one line of code. Counting lines in a large program is usually done mechanically (i.e. by a program), so the counter isn't going to think about it the way a person would. If it were reformatted it would count as two lines. That is why SLOC is a rough estimate. The question is whether the above sample line conforms to the formatting standards used by the shop it was written in. Since printf("hello"); is pretty simple I think it would be OK, but since it is so simple this is a somewhat contrived example. (sigh... I just realized I hedge so much that I never actually said anything...) RJFJR 01:37, August 10, 2005 (UTC)
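The ambiguity can be made concrete with two toy counters. Both helpers below are hypothetical sketches, not the definition used by any particular tool: physical_sloc counts physical lines that contain something other than whitespace, and naive_semicolon_count treats every semicolon outside a string literal as one statement. The naive counter does not skip comments or character literals, so a semicolon inside a comment would be miscounted.

```c
#include <stddef.h>

/* Hypothetical helper: counts physical lines that contain at least one
   non-whitespace character (one common reading of "physical SLOC"). */
size_t physical_sloc(const char *src)
{
    size_t count = 0;
    int has_code = 0;                 /* current line has non-blank text */
    for (const char *p = src; ; ++p) {
        if (*p == '\n' || *p == '\0') {
            if (has_code)
                ++count;
            has_code = 0;
            if (*p == '\0')
                break;
        } else if (*p != ' ' && *p != '\t' && *p != '\r') {
            has_code = 1;
        }
    }
    return count;
}

/* Hypothetical helper: a deliberately naive "logical" count that treats
   every semicolon outside a string literal as one statement. Comments and
   character literals are NOT handled. */
size_t naive_semicolon_count(const char *src)
{
    size_t count = 0;
    int in_string = 0;
    for (const char *p = src; *p != '\0'; ++p) {
        if (in_string) {
            if (*p == '\\' && p[1] != '\0')
                ++p;                  /* skip the escaped character */
            else if (*p == '"')
                in_string = 0;
        } else if (*p == '"') {
            in_string = 1;
        } else if (*p == ';') {
            ++count;
        }
    }
    return count;
}
```

On the one-line loop above, physical_sloc returns 1 and naive_semicolon_count returns 3, two of those semicolons belonging to the for header. Moving the printf call onto its own line raises the physical count to 2 while the semicolon count stays at 3.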

The SLOC table for Windows appears to be very wrong (see this comment on Larry Osterman's blog):

"That wikipedia page is kinda funny. According to it, "Windows NT 5.0", released in 2000, contains 20M lines of code, whereas "Windows 2000", released in 2001, contains 35M. ::scratches head::

In fact, all of the "years" are completely wrong. Windows 3.1 in 1990? No, 3.0 was 1990. 3.1 was 1991 or 1992 I think. "Windows NT" (no version) in 1995, Windows 95 in 1997, NT4 in 1998, and so on. The table is prefixed "According to Gary McGraw." Wonder where that guy got his info from? I'd hardly believe his LOC counts if he can't even get the years right."

I've changed the table to use the values from Andrew Tanenbaum's "Modern Operating Systems" book. Unfortunately, this only covers the NT line, not the Win 3.1/9x products. Does anyone have accurate figures for these? Bakery2k 11:48, 25 March 2006 (UTC)

Quote: "With the advent of GUI-based languages/tools such as Visual Basic, much of development work is done by drag-and-drops and a few mouse clicks, where the programmer virtually writes no piece of code, most of the time." - that is one of the most asinine things I have ever read. It sounds like a hippy ideal from the mid-70s of 4GL languages. Where is my jetpack? They promised me one by now!!!

Programs for counting lines of code

I think this section deserves to be removed. This is an uncommented collection of links that does not provide any help and does not belong in Wikipedia. 84.191.231.103 21:17, 10 November 2006 (UTC)

I agree. Vorratt 01:10, 31 December 2006 (UTC)

I'm uncertain why you think this is inappropriate for Wikipedia (and I'm not sure I understand Wikipedia's goals and rules well enough to judge that) -- but it'd be a shame, as this kind of information is awfully useful for researchers who want to study this subject. -- Terry Hancock

The section probably should be split out into a separate topic, since, using the recent change by Dinker Charak as an example, it has turned into a list of links to programs written by people who want to advertise their work. If it were only that, I'd join the mob and add a link to c_count, for instance. There are many such programs. Tedickey 12:22, 8 September 2007 (UTC)

Citations

There are several "citation requested" notes in the SLOC tables. I'm not sure which particular numbers come from which sources, but I can provide the following links to SLOC data, by category:

Five versions of Debian (from 2.0 "Hamm" to 3.1 "Sarge") may be found at: http://libresoft.dat.escet.urjc.es/debian-counting/

Another semi-independent paper evaluated just 2.2 "Potato": http://people.debian.org/~jgb/debian-counting/

(I say semi-independent, because I believe one author is shared between the two sources. But it was an independent study, although it used the same SLOCCount tool).

Data for Red Hat 6.2 and 7.1 were published by David Wheeler at: http://www.dwheeler.com/sloc/

Direct links to the papers: http://www.dwheeler.com/sloc/redhat71-v1/redhat71sloc.html http://www.dwheeler.com/sloc/redhat62-v1/redhat62sloc.html

In it, he cites the following references for Windows versions: http://www.schneier.com/crypto-gram-0003.html#8

(even with the anchor, you'll have to scroll down a bit to find this information buried midway through the article)

and also this source for NASA Space Shuttle flight software: http://books.nap.edu/html/statsoft/chap2.html

(Note that Wheeler's notation is a bit misleading -- it appears (to me) to indicate that 420,000 SLOC are used on the onboard computer and that 1.4 million SLOC are used on the ground. But that's incorrect. The 1.4 million SLOC figure is the size of the *testbed software* used to certify the 420,000 SLOC actually used on the Shuttle. In other words, it's *all* for the onboard software. Wheeler doesn't actually say anything wrong; it's just a brief note in a table and not explained.)

This will hopefully make it easy to check the figures' sources (or replace them with equivalents). You'll realize, of course, that the numbers won't match exactly, because different methodologies are used in picking exactly which lines of code should be included in the counts, when exactly the data were measured, etc.

I haven't found anything on Mac OSs or FreeBSD, though I'm still looking.

Also, regarding the flippant comment about the "dates being all wrong" for Windows -- this betrays a misunderstanding. The dates need not be "release dates". They are the approximate dates at which the evaluation was made on the original source code, which evolves continuously over time. Proprietary software is simply released at specific points on that evolution, so it's less obvious that this is true. So the dates being different from the release dates doesn't necessarily mean anything. On the other hand, I don't have the McGraw book, so I can't see what the actual claim is.

In order to truly get SLOC for Windows products, you would have to have inside access to the code (which is, no doubt, only available under an NDA to people contracting with Microsoft). This limits who we can get such information from. The complained-about numbers appear to be taken from David Wheeler's introduction to his papers, which references a "Gary McGraw (of Cigital)" for the source. However, *in* the papers, he uses the Schneier citation I've listed above (so I think it's probably a more reliable source).

-- Terry Hancock

COBOL

The COBOL article has a much shorter Hello World, so what's the deal with that 17-line beast in this SLOC article? -j.engelh (talk) 21:39, 28 January 2008 (UTC)

There's a discussion of this at Talk:COBOL#Bogus lines deleted. The 17 line version is a particular dialect of COBOL once used on CP/M. (I think similar variants are not uncommon; when I briefly touched on COBOL many years ago at school, I'm pretty sure we were told all the division declarations are mandatory even if they are empty, just as seems to be done here.) Since the point of the example is to show that languages can differ greatly in verbosity, it seems perfectly reasonable to use an especially verbose variant so long as it isn't contrived -- although perhaps we should label it with the actual dialect name, to avoid pointless offence to much put-upon COBOL supporters. Actually, I think a strong argument can be made that the "short" example is far too long; after all, there are plenty of quite common languages in which "Hello World" is a one-liner.

Having said all that, I think "Hello, world!" is not a good example. In nearly all high-level languages the core functionality of "Hello, world!" is a single statement, so any statement count over 1 is actually a measure of the amount of syntactical cruft required to get a program up and running at all, which is not necessarily a good measure of verbosity for particular tasks. It might be better to use an example which actually highlights the strengths and / or domain impedance mismatches of particular languages. (And to avoid someone complaining, "yes that's hard using built-ins, but company X sells a library extension which solves it in one command!", we might specify no libraries outside the standard distribution.)

From a real-world task in which I have personally seen massive differences in the verbosity of seriously proposed solutions, I would suggest something like parsing a web server log file in Common Log Format to count how many unique IP addresses have hit the server during the life of that log file (i.e., how many unique values are found in the first field of each line). Hopefully this would be short enough to fit into a readable example whilst still being a reasonable, not-too-contrived one. Perhaps someone could try this in Visual Basic (a practical and popular language, but one I've found rather impedance-mismatched for this sort of task), and we could compare it with my rather simple example in a language optimised for line-oriented text processing. -- 203.20.101.203 (talk) 06:52, 7 July 2009 (UTC)
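For comparison, here is a minimal sketch of that task in C using only the standard library, in the spirit of the "no libraries outside the standard distribution" rule suggested above. It assumes the whole log is already in memory as one NUL-terminated string and that the first field ends at the first space or tab; count_unique_hosts is a hypothetical name, not an existing API.

```c
#include <stdlib.h>
#include <string.h>

/* qsort callback comparing two C strings held in an array of char *. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Frees the first n strings of v and then v itself. */
static void free_all(char **v, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        free(v[i]);
    free(v);
}

/* Hypothetical helper: returns the number of distinct first fields (the
   remote host in Common Log Format) in an in-memory log, or -1 if memory
   allocation fails. Lines that start with whitespace are skipped. */
long count_unique_hosts(const char *log)
{
    size_t n = 0, cap = 16;
    char **hosts = malloc(cap * sizeof *hosts);
    if (hosts == NULL)
        return -1;

    for (const char *line = log; *line != '\0'; ) {
        /* The first field runs up to the first space, tab or line break. */
        size_t len = strcspn(line, " \t\r\n");
        if (len > 0) {
            if (n == cap) {
                char **grown = realloc(hosts, 2 * cap * sizeof *grown);
                if (grown == NULL) {
                    free_all(hosts, n);
                    return -1;
                }
                hosts = grown;
                cap *= 2;
            }
            char *copy = malloc(len + 1);
            if (copy == NULL) {
                free_all(hosts, n);
                return -1;
            }
            memcpy(copy, line, len);
            copy[len] = '\0';
            hosts[n++] = copy;
        }
        /* Advance to the start of the next line, or to the final NUL. */
        const char *nl = strchr(line, '\n');
        line = (nl != NULL) ? nl + 1 : line + strlen(line);
    }

    /* Sort the hosts, then count the positions where the value changes. */
    qsort(hosts, n, sizeof *hosts, cmp_str);
    long unique = 0;
    for (size_t i = 0; i < n; ++i)
        if (i == 0 || strcmp(hosts[i], hosts[i - 1]) != 0)
            ++unique;

    free_all(hosts, n);
    return unique;
}
```

Sorting and counting value changes is used here only because C has no built-in hash set; a hash table would avoid the O(n log n) sort. The line count of this sketch, set against a one-liner in a text-processing language, is exactly the kind of verbosity gap the comment describes.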

Hi everyone. I love COBOL and I created an account to edit this page. This program compiles under GnuCOBOL. The 17-line example is a terrible example of this modern and enjoyable language:

      identification division.
      program-id. hello .
      procedure division.
      display "hello world"
      goback .
      end program hello .

— Preceding unsigned comment added by HalfMadDad (talk • contribs) 18:44, 23 April 2017 (UTC)

The last line is unnecessary, as nothing is executed after goback. 2A02:1406:F:86D5:0:0:3A07:E564 (talk) 12:17, 28 September 2023 (UTC)

Bill Gates' citation

Is there any source for Bill Gates' citation? 130.232.32.155 (talk) 13:21, 21 January 2009 (UTC)

Apparently (I googled it a while back, and IIRC it was attributed to Business @ the Speed of Thought, but I don't have a copy to verify that). Tedickey (talk) 13:31, 21 January 2009 (UTC)

I found this plain text transcription, and if it is faithful and complete, the citation is apocryphal: http://www.managementparadise.com/uploads_blog/43000/42898/0_9550.pdf — Preceding unsigned comment added by 89.2.141.30 (talk) 11:18, 5 November 2018 (UTC)

Someone also asked about this phrase on Wikiquote: en:wikiquote:Talk:Bill Gates#Measuring aircraft building progress by weight. I searched Gates' book (1, 2, 3, 4), but I did not find the phrase or anything similar to it. Cheers, Manifestation (talk) 14:23, 7 July 2020 (UTC)

Geer quote

The page doesn't have much to do with source size (or even security). Just a rambling rant. TEDickey (talk) 20:59, 27 August 2010 (UTC)

I agree with the relevance of the statement. In general, the higher the complexity of the system, the more things that can go wrong. However, I also agree that more citation is needed. A good-quality software development and defect-reduction process can do wonders if applied effectively. This is merely a rule of thumb. It seems logical, but SLOC doesn't necessarily translate to quality.--74.107.74.39 (talk) 01:55, 12 April 2011 (UTC)

KLoC disambiguation page needed?

The related term, KLoC, used to redirect to this article, but now it sends one to Poland :), and KLOC sends one to a radio station. I've added disambiguation links including this article and the third on both pages, i.e.

However, I didn't feel comfortable adding parallel links to this page, as the main heading is SLoC, not KLoC. Therefore, my question is: do these three kloc pages deserve a disambiguation page yet?

q.v. Talk:KLOC Pdebonte (talk) 13:33, 1 September 2010 (UTC)

Relation with security faults

"A number of experts have claimed a relationship between the number of lines of code in a program and the number of bugs that it contains" -- IIRC D.J. Bernstein is a notable example, HTH anyone willing to dig a bit (somewhat busy right now). --Gvy (talk) 13:06, 8 July 2011 (UTC)

Uninformative introduction

Software projects can vary between 1 to 100,000,000 or more lines of code.

...doesn't say anything, does it? Andreas Lundblad (talk) 15:45, 19 April 2012 (UTC)

As a veteran programmer, I find the statement useful even though it indulges in a bit of hyperbole. I mean, really?? One line of code for a project? The lower limit might be very hard to define, but I guess it must start somewhere and I guess it is not impossible to start a project with one line of code. A 10⁹ to 10¹⁰ SLOC dynamic range seems reasonable. The only reason I even saw this article is that I am taking on a program I estimate at 10⁴ SLOC, and I want to estimate how much time I am committing to. Loyalgadfly (talk) 14:55, 27 February 2019 (UTC)

Subjective language

I take issue with this sentence:

"Using lines of code to compare a 10,000 line project to a 100,000 line project is far more useful than when comparing a 20,000 line project with a 21,000 line project"

It goes without saying that the ranges between these examples are on a different scale. But does that really relate to usefulness at all? If so, "useful" to whom?

What is "useful" is completely subjective. For example, someone working on a purposefully minimal codebase probably couldn't care less about the considerations of a larger project that they aren't working on.

Also, does this sentence provide any kind of value or insight into the concept of SLOC? I don't think it does. It has very little surrounding context or relevance.

Consensus to remove/clean up/make less subjective?... — Preceding unsigned comment added by 81.157.211.54 (talk) 09:26, 4 June 2012 (UTC)
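One plausible reading of the quoted sentence is about relative rather than absolute differences. The first pair differs by a factor of ten, the second by only 5%:

```latex
\[
\frac{100\,000}{10\,000} = 10
\qquad\text{versus}\qquad
\frac{21\,000}{20\,000} = 1.05
\]
```

Under that reading, if two counting conventions (for instance, counting blank lines or not) disagreed by a few percent on the same code, the disagreement would barely affect the first comparison but could swamp the second.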

If there is anything this article could benefit from more, it would be a section on how LOC relates to different languages. It is a relative term, so relating "apples" and "oranges" is idiotic. But, used as a tool for relating relative efforts, it sometimes has its place. That said, programming is very much like writing prose. One can be succinct and say a lot in just a few words, as well as beating around the bush and never getting to the point. As in, "being paid to write by the 'pound'." Loyalgadfly (talk) 14:55, 27 February 2019 (UTC)

Debian LOC

Are there any updates for Debian 6 with respect to LOC? For Debian 7, wheezy, some people appear to have done the count in February 2012 (http://blog.james.rcpt.to/2012/02/13/debian-wheezy-us19-billion-your-price-free/), which could be incorporated into the article. Greetings --hroest 13:54, 23 April 2013 (UTC)

Blank Lines

In the "Measurement Methods" section, there is a paragraph stating that blank lines are counted in physical SLOC. This, to my knowledge, is incorrect; I use Code Count at work and it doesn't count whitespace at all. — Preceding unsigned comment added by 199.209.144.220 (talk) 15:12, 22 January 2015 (UTC)
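Part of the disagreement may simply be definitional: a physical count can include blank lines or skip them, and the article and Code Count may not be using the same convention. A minimal C sketch with a hypothetical count_lines helper (not how Code Count or any other named tool is implemented) shows how far apart the two readings can be on the same input.

```c
#include <stddef.h>

/* Hypothetical result type: total physical lines, and how many of them
   are blank (empty or containing only whitespace). */
struct line_counts {
    size_t total;
    size_t blank;
};

struct line_counts count_lines(const char *src)
{
    struct line_counts c = {0, 0};
    if (*src == '\0')
        return c;                     /* empty input: no lines at all */

    int blank = 1;                    /* current line is whitespace so far */
    for (const char *p = src; ; ++p) {
        if (*p == '\n' || *p == '\0') {
            /* A trailing '\n' ends the last line; it does not open a new one. */
            if (*p == '\0' && p > src && p[-1] == '\n')
                break;
            ++c.total;
            if (blank)
                ++c.blank;
            blank = 1;
            if (*p == '\0')
                break;
        } else if (*p != ' ' && *p != '\t' && *p != '\r') {
            blank = 0;
        }
    }
    return c;
}
```

For the input "int x;\n\n  \nint y;\n", count_lines reports 4 total lines of which 2 are blank, so a counter that includes blank lines reports twice as many lines as one that skips them.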

External links modified

Hello fellow Wikipedians,

I have just modified 3 external links on Source lines of code. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

Added archive https://web.archive.org/web/20140223013701/http://blog.james.rcpt.to/2012/02/13/debian-wheezy-us19-billion-your-price-free/ to http://blog.james.rcpt.to/2012/02/13/debian-wheezy-us19-billion-your-price-free/
Corrected formatting/usage for http://www.h-online.com/open/features/What-s-new-in-Linux-2-6-32-872271.html?view=print
Corrected formatting/usage for http://www.h-online.com/open/features/What-s-new-in-Linux-3-6-1714690.html?page=3

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 05:07, 9 November 2016 (UTC)

Steve Ballmer Quote

In the "Disadvantages" section, the use of Steve Ballmer as an authority on writing code is like using Marie Antoinette as an authority on baking bread. I find it ironic that someone who knows so little about writing code makes such an insightful comment. But to use him as an authority totally discredits the article in the eyes of anyone who knows anything about the subject. Loyalgadfly (talk) 14:55, 27 February 2019 (UTC)