Talk:Sparse file

This is the talk page for discussing improvements to the Sparse file article.
This is not a forum for general discussion of the article's subject.


Computing: Software C‑class Low‑importance

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-class on Wikipedia's content assessment scale.
This article has been rated as Low-importance on the project's importance scale.
This article is supported by WikiProject Software (assessed as Low-importance).

If anyone would like to edit this page, more info can be found here. I think the information on that site is clearer to me (probably the illustrations helped a lot to make it clear; maybe someone could make an illustration...). --Bernard François 17:43, 11 June 2006 (UTC)

I have always found sparse files to be more trouble than they are worth, personally. If you would like to see a defense of them (possible legitimate applications of them), try this link: http://www.cs.wisc.edu/~thain/library/sparse.pdf Timothy Andux-Jones 15:39, 26 March 2007 (UTC)

I removed the link to that PDF from the article because that's not the same kind of sparse file. —Preceding unsigned comment added by Chekholko (talk • contribs) 02:18, 28 November 2007 (UTC)

I believe the explanation here is very clear. One read and I knew what it was. By the way, sparse files are used on Unix and Linux systems (lastlog is one example). Paul Cobbaut 20:20, 27 July 2007 (UTC)

More info

What are sparse files good for?
What is returned if one reads from a sparse area of a sparse file?

Thanks, --Abdull (talk) 21:13, 8 February 2008 (UTC)

Disadvantages Might be Criticisms

I had a little prod in Google to try to find a better method for detecting sparse files. Nothing came up after three queries; I probably didn't have the right expression on my face at that moment. I imagine it would be fairly easy to write such a program, although that has nothing to do with Wikipedia, of course. I know you were already thinking it, but hey, it could be useful, right?

Why could it be useful?

Welllllll, to represent this in the article we would have to ditch the "advantages/disadvantages" and instead have "benefits/criticisms" or suchlike. Consider, if you will, rsync. Rsync and programs like it will flesh the file out to its full size before transferring it, which wastes a lot of bandwidth. Just a thought. These links may be of some use, although I doubt they could serve as sources per se: http://www.ntfs.com/ntfs-sparse.htm http://kerneltrap.org/mailarchive/openbsd-misc/2007/11/9/398477

I think that the whole idea of sparse files in and of itself smells a lot like filesystem compression. Definitely distinct but also probably related, no? Anyway I hope I am at least slightly helpful. Cheers. 125.236.211.165 (talk) 07:14, 24 March 2008 (UTC)

Although it might seem similar to compression, the idea behind sparse files is to avoid spending disk bandwidth and space on non-data (zero-filled) sections, while compression trades a comparatively large amount of CPU time for disk bandwidth and space on actual data sections. Jarfil (talk) 18:10, 20 May 2008 (UTC)

I believe the actual driving force behind sparse files is not "compression" per se, but kernel dumps and core dumps. When you dump memory sequentially into such a file, you encounter many areas of memory that are not mapped at all (the kernel and process VMA space is very much a parallel to a "file with holes"), and at some point some clever person probably realized that the blank pages written out by kernel/core dumps could be left out on disk, keeping disks from filling up. It also greatly reduces the time spent writing a kernel/core dump, since those pages don't need to be written to disk. I don't have a reference for this as the origin, but sparse files are most useful in this application, so I expect this is where they came from. Lamontcg (talk) 18:10, 23 August 2009 (UTC)

Depends on viewpoint. To the filesystem, perhaps it's not 'real' data, but to an application reading the file, such holes are undetectable except by the kind of heuristic cp uses. It also requires a writing application to request holes, and it's typically faster than writing actual zero blocks; plus, filling in holes tends to increase fragmentation. But at the abstraction level between filesystem and file, this is still fundamentally compression. ddawson (talk) 13:08, 4 April 2009 (UTC)
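To illustrate why a reader cannot tell a hole from data (and to answer the earlier question about what a read from a sparse area returns): on POSIX systems, reading a hole returns zero bytes, so the contents are byte-for-byte identical to a file where the zeros were written explicitly. A minimal Python sketch, assuming a Unix-like system; the file names and the 1 MiB size are arbitrary choices for the example, and whether the truncated file actually gets a hole depends on the file system:

```python
from __future__ import annotations

import os
import tempfile

SIZE = 1 << 20  # 1 MiB; arbitrary size chosen for this sketch


def make_files(directory: str) -> tuple[str, str]:
    """Create two files with the same apparent size: one holey, one dense."""
    holey = os.path.join(directory, "holey.bin")  # hypothetical file names
    dense = os.path.join(directory, "dense.bin")
    with open(holey, "wb") as f:
        f.truncate(SIZE)      # extend to SIZE bytes without writing any data
    with open(dense, "wb") as f:
        f.write(bytes(SIZE))  # explicitly write SIZE zero bytes
    return holey, dense


def contents(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        holey, dense = make_files(d)
        # A reader sees exactly the same bytes in both files.
        print(contents(holey) == contents(dense))
```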

You don't have to write an application that explicitly requests holes. If you seek() past the end of a file and start writing, the skipped range automatically becomes a hole (on file systems that support them), and you have a sparse file. Coding this is trivial. Most applications write sequentially and don't reserve space in the middle of files or seek around, so this doesn't happen in the common case, but any application that creates a file by writing blocks out of order will create a sparse file. Lamontcg (talk) 18:10, 23 August 2009 (UTC)
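A minimal Python sketch of the seek-then-write case, assuming a Unix-like system where os.stat() reports st_blocks in 512-byte units (true on Linux and the BSDs; Windows has no st_blocks). The 8 MiB gap and the file name are arbitrary. On a file system that supports sparse files, the allocated size should come out far smaller than the apparent size:

```python
import os
import tempfile

HOLE = 8 << 20  # 8 MiB gap left by seeking; arbitrary value for this sketch


def write_with_gap(path: str) -> os.stat_result:
    """Write one block at offset 0, seek far ahead, then write another block.

    Nothing is ever written to the gap, so a file system that supports
    sparse files does not need to allocate blocks for it.
    """
    with open(path, "wb") as f:
        f.write(b"A" * 4096)
        f.seek(HOLE, os.SEEK_CUR)  # skip HOLE bytes without writing anything
        f.write(b"B" * 4096)
        f.flush()
        os.fsync(f.fileno())       # push data to disk so st_blocks reflects final allocation
    return os.stat(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        st = write_with_gap(os.path.join(d, "gap.bin"))  # hypothetical file name
        print("apparent size:", st.st_size)              # 4096 + HOLE + 4096
        print("allocated size:", st.st_blocks * 512)     # st_blocks is in 512-byte units
```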

I'd say that sparse files are a kind of compression; it doesn't matter whether they were designed with different goals in mind, as long as they make files smaller. Quoting Data compression: Data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use through use of specific encoding schemes. --Erik Sandberg (talk) 15:04, 11 September 2009 (UTC)

Detecting sparse files

The method given for detecting sparse files in Unix is not quite correct. It states, "Sparse files have different apparent and actual file sizes." While true, this doesn't help; a moment's reflection should help one realize this is also true for most non-sparse files, as any time a file doesn't fill a whole number of blocks, the allocated size will be greater than the apparent size by at least the unused number of bytes (and greater still when indirect blocks are involved). I guess what it should say is that the apparent size of a sparse file is (typically) larger than the allocated size, and that this condition is a reliable indicator. There are borderline cases involving very small holes and indirect blocks where a sparse file would not be detected as sparse, but I expect those are rare and not important for most uses.

Of course, for file systems without inodes, things will probably be a little different. ddawson (talk) 15:10, 4 April 2009 (UTC)
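The heuristic proposed above (apparent size larger than allocated size) could look like this in Python. A minimal sketch, assuming st_blocks counts 512-byte units as on Linux and the BSDs; the function name looks_sparse is hypothetical. As noted above, tiny holes can be masked by block rounding and indirect blocks, and file-system-level compression can also shrink the allocated size, so the result is a hint rather than proof:

```python
import os
import stat
import tempfile


def looks_sparse(path: str) -> bool:
    """Heuristic: a regular file whose allocated size is below its apparent size.

    Assumes st_blocks counts 512-byte units (Linux, the BSDs). Small holes
    can be hidden by block rounding and metadata blocks (false negatives),
    and file-system-level compression can also shrink the allocated size
    (false positives).
    """
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        return False                # this sketch only considers regular files
    allocated = st.st_blocks * 512  # bytes actually allocated on disk
    return allocated < st.st_size   # st_size is the apparent size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.bin")  # hypothetical file name
        with open(path, "wb") as f:
            f.truncate(1 << 20)                # 1 MiB with no data written
        print(looks_sparse(path))
```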

Linux command options

I'm pretty sure that cp --sparse=always is Linux- or GNU coreutils-specific; my FreeBSD 7.x servers don't have this option. Lamontcg (talk) 18:10, 23 August 2009 (UTC)

Confirmed. The --sparse option is GNU-cp specific, and doesn't exist (yet) in FreeBSD's cp(1). I'll update the paragraph accordingly. Cghost (talk) 13:10, 18 October 2009 (UTC)
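For reference, the idea behind cp --sparse=always can be sketched in a few lines: read the source in fixed-size blocks and seek over any block that is all zeros instead of writing it, so the destination gets holes there. This is a rough Python analogue, not GNU cp's actual code; the function name sparse_copy and the 4096-byte block size are assumptions of the sketch:

```python
import os
import tempfile

BLOCK = 4096  # granularity for zero detection; an assumption of this sketch


def sparse_copy(src: str, dst: str, block: int = BLOCK) -> None:
    """Copy src to dst, turning every all-zero block into a hole."""
    zero = bytes(block)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block)
            if not chunk:
                break
            if chunk == zero[: len(chunk)]:
                fout.seek(len(chunk), os.SEEK_CUR)  # skip instead of writing zeros
            else:
                fout.write(chunk)
        # If the file ends in a hole, seeking alone does not extend it,
        # so truncate at the current position to get the right apparent size.
        fout.truncate(fout.tell())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.bin")  # hypothetical file names
        dst = os.path.join(d, "dst.bin")
        with open(src, "wb") as f:
            f.write(b"head" + bytes(4 * BLOCK) + b"tail" + bytes(BLOCK))
        sparse_copy(src, dst)
        with open(src, "rb") as a, open(dst, "rb") as b:
            print(a.read() == b.read())   # same bytes; dst may use fewer blocks
```

Trailing zeros are the one subtle case: without the final truncate(), a copy whose source ends in zeros would come out shorter than the original.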

History

Sparse files have a long history in Unix. GNU tar first supported sparse files in 1990, in version 1.09.[1] Clearly, file systems must have implemented sparse files before then. —Preceding unsigned comment added by Lamontcg (talk • contribs) 18:17, 23 August 2009 (UTC)

Sparse files aka file holes

Should we also mention that these are known as file holes? (Understanding the Linux Kernel by Bovet and Cesati talks about file holes instead.) It took me a while to figure out that file holes are what make a file sparse.
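On the terminology: a hole is a byte range that was never written, and a file containing holes is a sparse file. On systems that support it, lseek() with SEEK_DATA/SEEK_HOLE lets a program list the data regions and, by implication, the holes. A minimal Python sketch, assuming Python 3.3+ on a platform that exposes os.SEEK_DATA and os.SEEK_HOLE (e.g. Linux 3.1 or later); the function name data_extents is hypothetical, and file systems without hole reporting describe the whole file as a single data region:

```python
from __future__ import annotations

import os
import tempfile


def data_extents(path: str) -> list[tuple[int, int]]:
    """Return (start, end) byte ranges that hold data; the gaps are holes."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)  # next data at or after pos
            except OSError:
                break                                    # ENXIO: only a trailing hole remains
            end = os.lseek(fd, start, os.SEEK_HOLE)      # next hole (EOF counts as one)
            extents.append((start, end))
            pos = end
    finally:
        os.close(fd)
    return extents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "holes.bin")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"A" * 4096)
            f.seek(1 << 20)                  # leave a gap of roughly 1 MiB
            f.write(b"B" * 4096)
        print(data_extents(path))
```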

[1] http://www.gnu.org/software/tar/manual/html_section/Sparse-Formats.html