
Talk:BagIt

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by A goethals (talk | contribs) at 17:03, 10 January 2012. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

WikiProject Digital Preservation (Unassessed)
This article is within the scope of WikiProject Digital Preservation, a collaborative effort to improve the coverage of digital preservation on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on Wikipedia's content assessment scale.

Some notes

Reviewing the IETF draft and this article, I found some aspects that may be worth mentioning:

Limitations of BagIt

  • There is no registry of checksum algorithms and their abbreviations as used for manifest files
  • A tag manifest file cannot contain its own file name (Hochstenbach: this is a general remark; no manifest file can contain an entry for itself)
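
The self-reference limitation can be illustrated with a minimal Python sketch. The file contents, the version number 0.96, the use of SHA-256, and the file name tagmanifest-sha256.txt are assumptions chosen for illustration, not taken from the draft. It only shows that writing a manifest's own checksum into that manifest changes the file, so the recorded value no longer matches.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical contents of two tag files (not taken from a real bag).
bagit_txt = b"BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n"
bag_info_txt = b"Contact-Name: Example Person\n"

# A tag manifest listing the other tag files as "<checksum> <filename>" lines.
manifest = (
    f"{sha256_hex(bagit_txt)} bagit.txt\n"
    f"{sha256_hex(bag_info_txt)} bag-info.txt\n"
).encode("utf-8")

# Attempt to record the manifest's own checksum inside the manifest.
digest_before = sha256_hex(manifest)
self_entry = f"{digest_before} tagmanifest-sha256.txt\n".encode("utf-8")
manifest_with_self = manifest + self_entry
digest_after = sha256_hex(manifest_with_self)

# Adding the entry changed the file, so the recorded checksum is already stale.
entry_is_stale = digest_before != digest_after
print(entry_is_stale)
```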

Specification issues

  • It is not made clear whether the order of tags in tag files is relevant. I suggest stating explicitly in the specification if order is not relevant
  • It is not made clear whether tags may be repeated (in this case order may be relevant)
  • Speaking of tag files: there is no common tag file format. Section 4.2 describes a key/value format, but only for bag-info.txt. I'd call it a design error of BagIt to already have two tag file formats (space-separated, as in manifest files and fetch.txt, and key/value format). For additional tag files not mentioned in the current draft, you know nothing but the character encoding; they could be in any format.
  • Tag/metadata values cannot include newline characters - on the other hand whitespace is considered as part of the value. Does line folding change the value or not?
  • The general form of tag/metadata labels is not specified. Can they include spaces and non-ASCII characters, such as umlauts? Obviously they cannot include colons.
  • It should be noted that a tag file's checksum can change without any relevant change in its content if you apply Unicode normalization. -- Preceding unsigned comment added by JakobVoss (talk • contribs) 06:53, 14 October 2010 (UTC)
  • It is probably worth noting that issues with the specification are best discussed in the digital-curation Google Group instead of on Wikipedia proper. The people who are responsible for editing the specification probably aren't paying attention to this talk page. Edsu (talk) 16:53, 14 October 2010 (UTC)
  • BagIt reminds me of distributed revision control systems. A bag looks like a snapshot of an RCS, or an RCS repository with only one revision. The article should contain some references to similar systems; maybe RCS can be mentioned (among others)

-- JakobVoss (talk) 09:48, 13 October 2010 (UTC)
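
The point about Unicode normalization can be illustrated with a minimal Python sketch. The tag line is a hypothetical example and SHA-256 is an arbitrary choice of algorithm. It shows that the same text in NFC and NFD form produces different UTF-8 bytes, and therefore different checksums, even though the two forms are canonically equivalent.

```python
import hashlib
import unicodedata

# Hypothetical bag-info.txt line containing umlauts.
tag_line = "Contact-Name: Jürgen Müller\n"

# The same text in composed (NFC) and decomposed (NFD) form, encoded as UTF-8.
nfc_bytes = unicodedata.normalize("NFC", tag_line).encode("utf-8")
nfd_bytes = unicodedata.normalize("NFD", tag_line).encode("utf-8")

# Canonically equivalent: renormalizing the NFD text gives back the NFC text.
same_text = (
    unicodedata.normalize("NFC", nfd_bytes.decode("utf-8"))
    == nfc_bytes.decode("utf-8")
)

# The byte sequences differ, so a checksum over the file differs as well.
same_bytes = nfc_bytes == nfd_bytes
digest_nfc = hashlib.sha256(nfc_bytes).hexdigest()
digest_nfd = hashlib.sha256(nfd_bytes).hexdigest()

print(same_text, same_bytes, digest_nfc == digest_nfd)
```

A tool that silently renormalizes tag files during transfer or storage would therefore invalidate the tag manifest without changing what a reader sees.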

Sentences removed

  • I removed the following sentence because it added no information: "Once a bag is received, verified and placed in storage, the manifest can be used again in the future to verify that the integrity of the bag remains intact."
  • Edsu removed the sentence "In practice “bagit.txt” only contains characters that are also part of plain ASCII, and the most common character encoding for tag files is UTF-8." But the IETF draft says:
"The "bagit.txt" file should consist of exactly two lines,
BagIt-Version: M.N
Tag-File-Character-Encoding: UTF-8"
Unless the "M.N" part or the "UTF-8" part includes non-ASCII characters (which I strongly doubt), bagit.txt contains only plain ASCII characters, apart from the optional BOM header, which I forgot to mention. In any case, the existence of BOM headers should be mentioned in the specification.

-- JakobVoss (talk) 06:54, 14 October 2010 (UTC)[reply]
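
The ASCII and BOM argument above can be checked with a minimal Python sketch. The version number 0.96 stands in for "M.N" and is an assumption, as is the helper name is_pure_ascii. It shows that the two-line file has identical bytes in ASCII and UTF-8, and that a leading UTF-8 BOM is what makes the file stop being plain ASCII.

```python
import codecs

# The two lines quoted from the draft, with a hypothetical version for "M.N".
content = "BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n"

utf8_bytes = content.encode("utf-8")
ascii_bytes = content.encode("ascii")

# Without a BOM, the UTF-8 and ASCII encodings are byte-for-byte identical.
identical = utf8_bytes == ascii_bytes


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte decodes as plain ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


# With the optional BOM prepended, the file is no longer plain ASCII.
with_bom = codecs.BOM_UTF8 + utf8_bytes

print(identical, is_pure_ascii(utf8_bytes), is_pure_ascii(with_bom))
```

A reader that wants to tolerate the BOM could decode with Python's "utf-8-sig" codec, which strips it if present.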

  • I put the sentence about bagit.txt character encoding back, indicating that it must be UTF-8. Distinguishing between ASCII and UTF-8 seems out of scope for this article. I initially mistook this sentence as referring to the encoding of bag-info.txt instead of bagit.txt, since it also mentioned tag files.

-- Edsu (talk) 16:50, 14 October 2010 (UTC)[reply]