Talk:Git/Archive 2

From Wikipedia, the free encyclopedia


The "Unique Characteristics" and "Using GIT" sections are laden with language that fails Wikipedia's requirements of Wikipedia:Neutral point of view. This needs to be cleaned up. -/- Warren 06:00, 3 June 2007 (UTC)

"Using Git", as mentioned above on this discussion page, is just broken. But as for "Unique characteristics", I'm trying to figure out precisely what's wrong. I think it goes on a bit, and could be tightened up, but I'd hate to perpetuate NPOV problems and I'm not quite grasping how the problem applies in this specific instance.
I just read through the NPOV policy and most of the associated articles, but they're mostly about edit wars and political hot buttons and situations where there are strongly polarized opinions. The problem here is I'm not seeing what the alternate point of view is!
Most articles are written by fans, and this one is no different, but I don't see significant egregious gushing. There are a number of factual assertions, and while there aren't specific citations for each one, as far as I can tell they're all objectively true. The strongest assertions are specifically footnoted.
The first two paragraphs could probably be toned down; I'll do that.
The most questionable fact is "toolkit design"; as part of the MS-Windows porting effort, there's an effort underway (and nearing completion as of 1 June 2007) to eliminate all the shell scripts, so this could stand an update. It's a bit tricky, though—it's still designed that way; only the implementation has changed.
The section title could perhaps be "Comparison to other version control systems", since some of the points are things that are not truly unique, but the current one has the general idea and is shorter. "Distinguishing characteristics", perhaps?
Maybe it's a pervasive tone thing and not specific words. Still, could you (or anyone) point out a specific example to make the flaws clearer? Thanks. 17:59, 3 June 2007 (UTC)
Statements like "Git supports rapid and convenient branching and merging" are simply not acceptable. More precisely, words like "rapid" and "convenient" are not words that an encyclopedia should use to describe something, unless such a claim can be quantified. These are assertions of opinion, not statements of fact. Here are some other similar statements with inappropriate phrases bolded: "Repositories can be easily published"; "shell scripts that provide convenient wrappers. It is easy to chain the components together to do other clever things."; "It is thus easy to experiment with new merge algorithms"; "communication for the merge is small and efficient". It's fine to state that a piece of software is designed to be "fast" or "easy", if you can find someone authoritative who has stated that; you would also need to say who made that statement. Research papers or other studies that quantitatively measure Git in comparison with other SCM systems are a good basis for content, too. However, unchallenged statements of a product's awesomeness are WP:NOT acceptable per Wikipedia's policy on advertising. -/- Warren 18:54, 3 June 2007 (UTC)
Thanks, I'll fix! Note that in many cases, the adjectives can be justified by links to benchmarks. The shell-script wrappers part is trying to say "more convenient than the primitives", which I think is a sufficiently obvious statement to not need specific citation, but I can certainly clarify it. 02:40, 4 June 2007 (UTC)

Something specific: 'Efficient handling of large projects. Git is very fast, and scales well. It is commonly an order of magnitude faster than other revision control systems, and several orders of magnitude faster on some operations.[cited]'

Apart from the aforementioned problems (note the Bold items), the phrase 'Other revision control systems' is very general, bordering on the deceptive. You have to go to the linked article (blog) to notice that it is only talking about Bazaar and Mercurial, in a test that the cited article itself calls 'unscientific'. And despite giving timings to the millisecond, the comments note 'The units for those tests are seconds, and they're all wall clock times.'
For this sort of statement to be asserted, the methodology needs more rigour, and the comparison should cover a wider range of systems, such as CVS, Subversion, and commercial systems such as ClearCase (and Perforce. Humg 23:17, 31 August 2007 (UTC)).
I.e., to make assertions about GIT's speed, especially the assertion that it is 'orders of magnitude' faster, better citations are necessary.
$0.02. EasyTarget 10:06, 5 June 2007 (UTC)
I'll dig them up; the statement is objectively true, just needs better citation. Mercurial is the only VCS that's close. Here's a first pass at relevant links. OpenSolaris SCM evaluation FreeBSD evaluation of VCS speed Jst's evaluation for Mozilla (notice the > 100x speed difference on several operations) a subjective blog post Some very fragmentary benchmarks at the wiki An Hg vs. bzr comparison and this and this and this dead link that gets cited a lot (and I've found relocated here) that can be used to link hg vs. git observations like this. Too bad that omits numbers, as does this observation that only says "blazingly fast" vs. "unworkably slow". Likewise this complaint about SVK.
Oh! I missed this link farm of VCS comparisons. And here are some 1-10 rankings of 7 VCSs including "speed", but it doesn't quote figures.
You know, a direct comparison with CVS is hard to find. Here's a space comparison (and I know I've seen others), but we want time comparisons. This fairly famous blog post by Keith Packard says "We've all gotten very spoiled by Git; many operations which take minutes under CVS now complete fast enough to leave you wondering if anything happened at all." and this blog post about moving from CVS to git says "The first thing you will notice is that it’s damn fast!", but neither quote numbers. Relative to CVS, it appears sufficiently lopsided that people don't generally quantify it. I'll have to search mailing list archives for info.
A lot of comparisons only talk about features and not performance. CVS vs. SVN speed comparison.
Here's an observation to support the "it's easy to use git in scripts" point. Not relevant to speed, but useful for later.
Of course, I could always do benchmarks myself, but that would be "original research". 22:56, 5 June 2007 (UTC)
Another speed comparison (where he creates a branch in a very non-optimal way with git). Here's a not very scientific git vs. BitKeeper comparison.
Thanks! Those are much more informative comparisons. An observation:
- They are still limited in the systems they compare to; only making comparisons to other (mostly FOSS) CM tools with a similar use model and architecture. There are also very successful and widely used commercial CM systems; eg. ClearCase, StarTeam and Synergy, that I do not see covered in any of those references.
I'm not going to make the change myself, but the statement should be put in context; GIT's speed advantage has only (so far) been demonstrated when compared to similarly orientated tools, and only on POSIX architectures.
EasyTarget 09:53, 13 June 2007 (UTC)

It should be noted that "commonly", from 'commonly an order of magnitude faster than other revision control systems,' is a term of art in computer science that refers to runtime execution speed with respect to time (see any Computer Science/Mathematics article on big-O notation). It specifically refers to the "average case" execution time. While such a claim may require citation (typically via benchmarks) it does not indicate bias, or an intent to advertise. James Martinez 5:53, 25 July 2007 (UTC)

How about adding a 'criticisms' section too? Compared to other versioning systems like svn or even sccs, git is terribly hard to use and the documentation is not particularly helpful. Basically it suffers from the same weaknesses that Linux suffered from for a decade or so of its early life: high technical quality and powerful options, but terrible usability for a new user or a non-super user. IMO it should have stayed as an engine; it really needs a front-end to be less than maddening to use. (talk) 21:14, 23 May 2008 (UTC)
Huh? Git used to be WAY harder to use than it is now. Sure you have to bang on it a while, but so what? Git is highly learnable. This is FUD and it does not belong on Wikipedia. If you absolutely must, learn SVN first and then transition to git after six months to a year. Criticism sections are discouraged in any case. (talk) 23:49, 17 June 2008 (UTC)
Where are criticism sections discouraged? I'm asking because I don't know. Thanks. ~a (usertalkcontribs) 02:14, 18 June 2008 (UTC)
Usability and documentation quality, though difficult to quantify, are very important and oft-overlooked aspects of software. I'm glad to hear that git has made some improvement, but going from terrible to bad doesn't necessarily mean that the mention of this issue is unwarranted. There are two important reasons to work this information into the article. #1: WP:NPOV. The article mentions many of the good things about git, and I agree that it has many virtues. Its virtues don't preclude mentioning its vices. #2: If someone is evaluating various VCSs and they aren't informed of Git's strengths AND weaknesses, Wikipedia would be doing this person a terrible disservice by functioning more as an advertisement than a source of balanced viewpoints. If you have a bunch of Linux-comfortable power users who don't mind taking time away from their project to fiddle with a piece of software, then git seems like a great way to go. But not every project is developed by those sorts of users. (talk) 04:13, 5 December 2008 (UTC)
But this "criticism" should be presented in an encyclopedic way. Saying that Git is difficult to use and referring to some random blog rants is not the way. The encyclopedic way is either (1) to clearly describe how Git does certain things (i.e., tell the user the facts) and then let the user evaluate whether it's difficult or not, or (2) to refer to some real studies which compare features and show clear evidence of some usability problems. If we can't provide one of these, the whole "Git is difficult to use" is only repeating a mantra (and is often just based on feelings). Furthermore, if there are some good references that indicate usability problems, then let's not say that "Git is difficult" but tell exactly what the supposed problem is and let the user evaluate whether it causes difficulties for her. -- Preceding unsigned comment added by (talk) 05:25, 21 March 2009 (UTC)
Agreed. In many ways, this article is a fan letter. Cernansky (talk) 22:36, 27 July 2009 (UTC)

Meaning of the word 'git'

This isn't worth putting on the article page, but just FYI, "git" does not (at least directly) mean a stupid or unpleasant person. It's actually the slang pronunciation of the word "get" which you may be more familiar with in the context of "the get of my loins", or "he's a misbegotten son of a ...". In other words, it's an alternative for the word 'bastard'. Which actually fits quite well with the quotation from Linus: Linus Torvalds has quipped about the name "git", which is a British slang for a stupid or unpleasant person: “I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'.” -- 13:51, 13 August 2007 (UTC)

"Unique characteristics" is a bad section header

In particular, the word "unique" is troublesome. You can't guarantee that any of the characteristics are unique, and some of them demonstrably are not.

All it takes is for a new VCS to come out that includes some of these features, and suddenly the article is making a false claim.

Really this section is about what makes Git interesting compared with other VCSs.

Some viable replacements for the section header:

  • Distinguishing features
  • Notable characteristics
  • Differentiating features
  • Idiosyncrasies
  • Comparison to similar projects
  • Peculiarities

Or, simply

  • Characteristics

Direvus 22:04, 14 August 2007 (UTC)

Also, within this is at least one thing that is not a designed feature, as designated:

  1. Garbage accumulates unless collected. Aborting operations or backing out changes will leave useless dangling objects in the database. These are generally a small fraction of the continuously growing history of wanted objects, but reclaiming the space using git-gc --prune can be slow.[17]

This should be moved somewhere else if this section stays with its current intent. (talk) 16:37, 9 April 2008 (UTC)
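
For anyone unsure what "dangling objects" means in the quoted item, here is a minimal sketch, assuming a toy object store in which every object simply lists the ids it references. The names `objects`, `roots` and `find_dangling` are made up for illustration and are not Git's internals. Anything that cannot be reached from a branch tip is garbage, and finding it means walking the whole graph, which is roughly why pruning is described as slow.

```python
def find_dangling(objects, roots):
    """Return the ids of objects that no root can reach (toy mark phase)."""
    reachable = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in reachable:
            continue
        reachable.add(oid)
        # Follow every reference held by this object.
        stack.extend(objects[oid])
    return set(objects) - reachable


# Hypothetical store: commits (c*), trees (t*) and blobs (b*).
objects = {
    "c2": ["c1", "t2"],
    "c1": ["t1"],
    "t1": ["b1"],
    "t2": ["b1", "b2"],
    "b1": [],
    "b2": [],
    "c3": ["c1", "t3"],  # a commit that was backed out; nothing points at it
    "t3": ["b3"],
    "b3": [],
}
roots = ["c2"]  # the only branch tip in this toy example

dangling = find_dangling(objects, roots)
print(sorted(dangling))
```

In this toy store the abandoned commit, its tree and its blob are the only unreachable objects; a real collector would then delete them.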

Missing info?

I'm semi-technical, and this encyclopedia entry leaves a lot to be desired -- what language was git programmed in? I see that there is a C implementation, which implies that it was written in something else. If this info could be added by somebody knowledgeable, that would be great. Also, the "bullet-format" entry does not make for an easy read at all. —Preceding unsigned comment added by (talkcontribs) 23:34, 20 August 2007

Interview with Junio Hamano

Episode 19 of FLOSS Weekly has an extended interview with Junio Hamano.
Dvandelay 21:56, 2 September 2007 (UTC)

software configuration management

At the start of the tech talk (see article references), Linus starts by saying that git is a source control management system, and not a software configuration management system, as the Wikipedia article says on the first line. Who's right? -- 09:26, 13 November 2007 (UTC)

Web Service

Feels to me like we should give more prominent mention to the public web service

I'm not yet skilled enough as a Wikipedian to be sure if this is an encyclopedic idea or not, nor how to accomplish it idiomatically.

Wikipedia trails today include:

Linux Kernel (gitweb)

I imagine we lose most of the audience at the transition to footnote from page.

-- Pelavarre 15:48, 2 December 2007 (UTC)

Confusingly ambiguous thing about snapshotting

The text says: "One property of Git that has led to considerable controversy is that it snapshots directory trees of files.". This appears to say: "of all the things that can be done to directory trees of files, snapshotting is the thing (or one of the things) that git does." What I think it means to say is "The things that Git snapshots are directory trees of files rather than individual files." This could perhaps do with clarification. AMackenzie (talk) 21:44, 23 January 2008 (UTC)
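
To illustrate the second reading (whole trees are snapshotted, not individual files), here is a minimal sketch. It follows the blob and tree object layout that Git is generally documented to use, but it is simplified: only plain files and directories, a fixed file mode, and plain name sorting, so treat it as an assumption-laden illustration rather than a reimplementation. The helper names `blob_id` and `tree_id` are hypothetical.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Hash file content with a 'blob <size>' header."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def tree_id(tree: dict) -> str:
    """Hash a directory; `tree` maps names to bytes (file) or dict (subdir)."""
    body = b""
    for name in sorted(tree):  # simplified ordering, see the note above
        child = tree[name]
        if isinstance(child, dict):
            mode, child_hex = b"40000", tree_id(child)
        else:
            mode, child_hex = b"100644", blob_id(child)
        body += mode + b" " + name.encode() + b"\0" + bytes.fromhex(child_hex)
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()


main_c = b"int main(void) { return 0; }\n"
v1 = {"README": b"hello\n", "src": {"main.c": main_c}}
v2 = {"README": b"hello, world\n", "src": {"main.c": main_c}}

# One changed file changes the id of the whole top-level snapshot...
print(tree_id(v1) != tree_id(v2))
# ...while the untouched subdirectory keeps the same id in both snapshots.
print(tree_id(v1["src"]) == tree_id(v2["src"]))
```

The point for the article wording is that the unit being identified is the directory tree; a file's history is something derived from comparing trees.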

Wrong source link

Can someone who knows which one is correct fix the source link for Linus's comment? The link says [43] but it goes to #42. This probably isn't right. -- Preceding unsigned comment added by (talk) 12:32, 24 January 2008 (UTC)

Article status

I removed the cleanup tag that had been there since August 2007. This article is much better than that now! The article covers quite a lot and is very well referenced, even if there are still things to do. Most importantly, some editing is still needed to convert the list style to more fluid text. We need to isolate even fewer really characteristic features of git to describe. As a git user I know that it might be hard to reduce the number of unique features. -- Sverdrup (talk) 20:48, 8 March 2008 (UTC)

"git bisect" is worthy of some text some time...

Since copied, it's a dearly loved "killer feature" by those who use it. References:

  • Andreas Ericsson (2008-03-11). "Re: Mercurial's only true "plugin" extension: inotify... and can it be done in Git?". git (Mailing list). Clearly, git is the most innovative tool here, since its developers managed to cook up something so immensely useful as "git bisect" (which by the way is well-nigh single-handedly responsible for reducing our average bugreport-to-fix time from 4 days to 6 hours). 
  • Michal Piotrowski (2007-06-17). "Chapter 4: Git, quilt and binary searching". Linux Kernel Tester's Guide (PDF). Translated by Rafael J. Wysocki (version 0.3-rc1 ed.). Retrieved 2008-03-11.
  • Bowes, James (2007-02-18). "git bisect: A practical example with yum". James Bowes' blog. Retrieved 2008-03-11. I used git bisect to track down a bug in yum last night. It was so easy and practical that I figured I should record it here, so that others might want to give git a try. 
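
If article text on this does get written, the core idea is a binary search over history. A minimal sketch, assuming a strictly linear history with a known-good oldest commit and a known-bad newest commit; the function name `bisect_first_bad` and the integer "commits" are hypothetical, and the real tool also has to cope with merges and skipped commits, which this ignores.

```python
def bisect_first_bad(commits, is_bad):
    """Find the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is good and commits[-1] is bad.
    Returns (first_bad_commit, number_of_tests_run).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


# Hypothetical history of 1000 commits where the bug arrived in commit 637.
commits = list(range(1000))
first_bad, tests = bisect_first_bad(commits, lambda c: c >= 637)
print(first_bad, tests)
```

With 1000 commits between good and bad, at most ten builds need testing, which is the kind of saving the quoted testimonials describe.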

Oh, and another Git talk, this one post-1.5-release:

  • David Nusinow (2007-06-18). Maintaining Packages with Git (ogg). Debconf 2007. Event occurs at 02:34. Retrieved 2008-03-12. You don't realize how slow [subversion] is until you're not having to hit the network every single time you do an operation. 
(Addendum: Just listened to the talk; it's truncated at 15 minutes and not very interesting. Ah, well.) (talk) 03:46, 12 March 2008 (UTC)

More references... Linus bragging

Placed here for possible reference when more article text gets written.

  • Linus Torvalds (2006-11-28). "git and bzr". bazaar (Mailing list). Retrieved 2008-03-13. Such a "multiple sources" case can actually be found by doing

    git blame -C tree-walk.c

    which (correctly) figures out that the code comes from both merge-tree.c (the "entry compare/extract" functions) _and_ from sha1_name.c (the "find_tree_entry()" function).
    So yes, "git blame" is a _hell_ of a lot more powerful than anybody else's "annotate", as far as I know. I literally suspect that nobody else comes even close.
  • Linus Torvalds (2005-06-22). "Do a cross-project merge of Paul Mackerras' gitk visualizer". Retrieved 2008-03-13. This merge itself is pretty interesting too, since it shows off a feature of git itself that is incredibly cool: you can merge a separate git project into another git project. Not only does this keep all the history of the original project, it also makes it possible to continue to merge with the original project and the union of the two projects. 

And a "Linus is proud of his performance" link:

  • Linus Torvalds (2006-11-28). "git and bzr". bazaar (Mailing list). Retrieved 2008-03-13. Performance is important to git, but it's important not in the sense of "let's not do it because it performs badly", but in the sense of "things should be so fast that people don't even realize that they are done". You guys may count commit times in seconds. I still want to commit multiple patches _per_second_ to the kernel tree. THAT is performance. 

—Preceding unsigned comment added by (talkcontribs) 08:41, 13 March 2008

Two inaccuracies

I have encountered two inaccuracies in the article. Can anyone correct them? Below, I have explained what is wrong with those statements.

Statement 1:

Git is a set of primitive programs written in C, and a large number of shell scripts that provide convenient wrappers.[13] It is easy to chain the components together to do other clever things.[14]

There is constant work on converting existing shell scripts to C. Currently, most git commands are implemented in C, with less than 25% implemented as shell script wrappers. However, this does not mean that the ability to write convenient wrappers as shell scripts has diminished over time.

Maybe it should be rewritten as: "Git started as a set of primitive programs written in C, and a large number of shell scripts that provide convenient wrappers.[13] Most of those shell scripts have been converted to C now, but it is still easy to chain the components together to do other clever things.[14]"

Statement 2:

Git on Windows is noticeably slower,[46] due to Git's heavy use of file system features that are particularly fast on Linux.[47]

The main problem with the Cygwin version of Git is the emulation of Fork (operating system). Describing fork() as a file system feature is incorrect, as it has nothing to do with the filesystem but with process creation.

Fork is available on any POSIX-compliant operating system, and it is reasonably fast on any system using a copy-on-write technique, while the Cygwin emulation has to perform a full copy, with other overhead caused by the lack of support in the kernel, which makes it so expensive.

The problem is most noticeable for git commands implemented as shell scripts, because shell scripts often create new processes to do their job, which involves fork().

--Dpotapov (talk) 22:17, 23 March 2008 (UTC)
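
To make the process-creation point concrete, here is a rough sketch that compares doing a trivial piece of work in-process with doing it in a freshly started child process each time. This is only an illustration under stated assumptions: it measures whole-process startup (including interpreter startup), not fork() in isolation, it says nothing about Cygwin specifically, and the function names are hypothetical.

```python
import subprocess
import sys
import time


def time_in_process(n):
    """Run a trivial computation n times inside the current process."""
    start = time.perf_counter()
    for _ in range(n):
        sum(range(100))
    return time.perf_counter() - start


def time_child_processes(n):
    """Run the same computation n times, each in a new child process."""
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run([sys.executable, "-c", "sum(range(100))"], check=True)
    return time.perf_counter() - start


n = 5
inproc = time_in_process(n)
spawned = time_child_processes(n)
print(f"in-process: {inproc:.6f}s, child processes: {spawned:.6f}s")
```

A script-based command pays the per-process cost many times over, so any platform where process creation is expensive penalizes such commands far more than the same logic written as a single program.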

Thanks for the suggestions. Why not be bold and go ahead and make the changes? Your first point is definitely well-taken (and I'll make the change), but I question the second. While fork is one problem, the really nasty one is the lstat(2) equivalent GetFileAttributesExA(). On Linux, due to the dcache, it's blazingly fast, and git relies on being able to stat() every file in the source tree very rapidly. But it's far slower on Windows. Some recent work has eliminated some redundant stat() calls and eliminated some emulation overhead,[1] but there's still a difference.
Oh, here's a reference to the MinGW port nearing completion.[2] (talk) 05:08, 26 March 2008 (UTC)
Why I think that the statement #2 is incorrect:
1. Fork is the *major* problem for Git on Windows. Its emulation by Cygwin is *very* slow, and fork() is NOT a filesystem feature. To demonstrate how slow shell scripts can be on Windows, here is one example [3]. When git-fetch was rewritten from shell to C, Windows users reported a 25x or greater speedup, while Linux users did not notice any difference. (Probably because on Linux the speed was bound by network communication and the time needed for the server to respond.) So, the loss of performance from fork() emulation is really huge on Windows.
2. Indeed, Git does stat() on every file in the working tree, but practically every other version control system does so too, because before checking in changes, showing changes, or performing many other operations, you have to find which files have changed. The only exception I have heard about is Mercurial, which optionally allows you to run a special daemon that monitors changes in your work tree and thus avoids the need to scan the whole directory to find changes. Thus doing stat() on every file in your work tree is pretty normal.
While emulation of lstat() in Cygwin has some overhead over using the Windows native API, it is not big enough to be easily noticeable. You can try running the Windows-native version of Subversion and then the Cygwin version of Subversion, and see what difference it makes...
You said stat() is blazingly fast on Linux and far slower on Windows. IMHO, what is "blazing fast" and "far slower" is very subjective. Do you have any numbers?
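
For readers following the stat() argument, this is a minimal sketch of the kind of scan being discussed: record size and modification time for every file, then compare against a later scan to find what changed. It is a generic illustration, not Git's index format or algorithm; `snapshot` and `changed_paths` are hypothetical names.

```python
import os
import tempfile


def snapshot(root):
    """Record (size, mtime_ns) for every file under root via one lstat each."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            index[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return index


def changed_paths(old, new):
    """Paths that were added, removed, or whose stat data differs."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


with tempfile.TemporaryDirectory() as root:
    for name, text in (("a.txt", "one"), ("b.txt", "two")):
        with open(os.path.join(root, name), "w") as f:
            f.write(text)
    before = snapshot(root)
    with open(os.path.join(root, "a.txt"), "w") as f:
        f.write("one more")  # size changes, so the edit is detectable
    after = snapshot(root)
    changed = changed_paths(before, after)

print(changed)
```

The cost of such a scan is one stat-like call per file, which is why the speed of that call on each platform keeps coming up in this thread.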
Here are the results of what happened when the number of lstat() calls was cut in half:

on Linux, the performance increased by 57% [4], while on Windows (MinGW version) it increased by only 39% [5].

If lstat() were the main cause why Git is slower on Windows than on Linux then you would expect a bigger gain on Windows than on Linux, but in reality the opposite is true.
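
The comparison above can be worked through with a simple model. This is only a sketch under assumptions that are mine, not the cited measurements': total time is lstat() time plus everything else, halving the number of calls halves the lstat() part, and "performance increased by p" means old time divided by new time equals 1 + p. Under those assumptions the share of the original run time spent in lstat() is 2p / (1 + p).

```python
def implied_lstat_share(gain):
    """Share of the original run time spent in lstat(), under the model above.

    With T = S + R and the new time S/2 + R = T / (1 + gain),
    solving for S / T gives 2 * gain / (1 + gain).
    """
    return 2 * gain / (1 + gain)


linux_share = implied_lstat_share(0.57)    # the reported Linux gain
windows_share = implied_lstat_share(0.39)  # the reported Windows (MinGW) gain
print(f"Linux: {linux_share:.1%}, Windows: {windows_share:.1%}")
```

Under this model lstat() accounts for a smaller share of the run time on Windows than on Linux, which is consistent with the argument that something other than lstat() dominates the Windows slowdown. A different reading of "performance increased" would change the numbers.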
Thus, the idea of stat() as being the cause is completely discredited. It seems the original idea was based solely on some speculation by Linus a long time ago, but Linus has never run Git on Windows. Besides, later, after seeing the result, he openly admitted: "I have absolutely no idea how to do performance analysis or even something simple as getting a list of system calls from Windows (even if I had a Windows machine, which I obviously don't ;), so I'm afraid I have no clue why git might be considered slow there. I was hoping this was it" [6].
So, I believe the phrase "due to Git's heavy use of file system features that are particularly fast on Linux." should be removed as lacking any grounds.
3. I think the phrase "Git on Windows is noticeably slower" is correct, but the reference associated with this phrase is a bit misleading and requires further explanation. In cases where Git commands written in C are used, Git performs only slightly slower than on Linux (it still may be noticeable, but I have also noticed that SVN can be slower on Windows than on Linux in some cases). The real slowdown happens when a Git command written in shell is used. On Windows, these commands can be 10 or more times slower. (See my above example about rewriting git-fetch in C.) The reference attached to the phrase about Git being slower on Windows refers precisely to the case where a shell script was used. Unfortunately for Windows users, git-merge is still a shell script, so it is not surprising that it is considerably slower on Windows than on Linux.
--Dpotapov (talk) 11:19, 26 March 2008 (UTC)
git-merge has been rewritten in C. There are only some rarely-used tools written in sh/perl scripts. The shell-script thing is very much history now. (talk) 22:17, 26 December 2008 (UTC)

Missing features

In the criticism section it states "Missing features that other more mature version control systems provide." Can we have a list of missing features there, or have some references? Just stating that it has missing features without even naming them is a bit low, IMHO. (talk) 21:59, 14 July 2008 (UTC)

Detailed criticism?

That man pages reference others is part of the design of man pages; an explanation of why this is a problem here could be appropriate. Also, I think some of the 'missing features' should be noted. Quit generating FUD this way, IMHO. -- Preceding unsigned comment added by (talk) 14:49, 7 August 2008 (UTC)

I looked at the comparison chart too, and the only 'missing' feature seems to be file-rename merging, which is not needed given git's design of handling the whole repository rather than single files.
Please enlighten me: what can other systems do that git can't?!? -- Preceding unsigned comment added by (talk) 14:53, 7 August 2008 (UTC)
I'm not sure it is a question of what git can't do so much as how easy it is to do things in git. For example, Linux in 1996 could theoretically do everything that Windows could, but it wasn't until several years later that Linux actually developed to the point that doing a lot of those things became practical, simple, economical, etc... Git is a similar story. Along those lines, I noticed that the criticism section says that Older versions of git were criticized and seems to imply that these criticisms have been addressed. Are there any references that indicate these issues have been addressed? Are there any major open source projects that have adopted it in the last year? If not, I'm not sure why the criticisms are any less valid or why our language should imply they are less valid. In absence of evidence to the contrary, I'd propose a revert. (talk) 03:49, 5 December 2008 (UTC)
You aren't following any of Git's development, are you? Git's development is very fast-paced and a version that is half a year old is considered an ancient version. Since its creation, Git has matured and has become a solid tool. The usability has dramatically improved since the time when those criticisms were justified. And yes, major projects have been switching to Git, perl being just the latest example. Certainly a revert is not justified here, because as of today (version 1.6), Git is not any harder to learn than any other SCM (from the point of view of a neutral person that knows no such tool at all yet). I'd rather say that if you think any of the criticisms is still valid, then provide references and explain what exactly you think Git is not able to do, or what can't be done easily. (talk) 16:58, 24 December 2008 (UTC)

I'm really missing a "drawbacks" or "criticism" section here. I'm not an SVN lover, or anything, but I'm working on putting together a proposal for Git, and I like to be fair and include all features and drawbacks. Doing a simple Google search you can see that Git has a much steeper learning curve; its distributed nature makes it more prone to conflicts when committing data to the server, and thus requires more merging or patch management; it is really not suited for handling large files (and there's actually a section that says that it's good for large projects, which is very misleading!); you always have to clone the entire repository; you can't clone a single directory... Wikipedia should be about information; leaving drawbacks out just to make a product look better is misinforming by obfuscation. CrisLander (talk) 08:44, 17 June 2010 (UTC)

I support including a criticisms section, but protest most of the statements in the previous paragraph. Out of 7 claims made, 1 is true, 2 are true but not drawbacks, and 4 are not true, IMHO of course. In particular the fact that the official source code repositories for the Linux kernel, window system, Perl programming language, Qt user interface toolkit, and FlightGear flight simulator (to name a few) are in Git provides some evidence that calling Git "good for large projects" is not "misleading". Again I'm in favor of a criticisms section, but the content needs more corroboration than "a simple Google search." I'm against including a "drawbacks" section, as that term implies a value judgment which does not fit Wikipedia's neutral point of view approach. (talk) 05:04, 28 December 2010 (UTC)