Talk:BagIt

From Wikipedia, the free encyclopedia
WikiProject Digital Preservation
This article is within the scope of WikiProject Digital Preservation.
This article has not yet received a rating on the quality scale.

Some notes

Reviewing the IETF draft and this article, I found some aspects that may be worth mentioning:

Limitations of BagIt

  • There is no registry of checksum algorithms and their abbreviations as used for manifest files
  • A tag manifest file cannot contain its own file name (Hochstenbach: this is a general remark; no manifest file can contain an entry for itself); (Tibaut Houzanme: To explain the earlier remark: one cannot build the house one is born in. One file has to be created before another file can contain its name in a manifest list. A remedy, however, would be for the manifest text file to contain a hash of the text that makes up the manifest list itself. Then, when a manifest is modified, not only will the manifest's hash be different, but the text that makes it up will also differ. Whether this is necessary is the real question, and I posit that the data checksum, the manifest checksum, and the metadata checksum together are sufficient.)
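The remark about tag manifests can be illustrated with a minimal Python sketch. It assumes the manifest line format `<checksum> <filepath>`, SHA-256 as the algorithm, and the file names `manifest-sha256.txt` and `tagmanifest-sha256.txt`; the helper `manifest_text` and the bag contents (including the version number) are hypothetical. The tag manifest can list the other tag files, but writing its own checksum into itself changes the very bytes that checksum was computed over.

```python
import hashlib

def manifest_text(files: dict[str, bytes], algorithm: str = "sha256") -> str:
    """Return manifest content: one '<checksum> <path>' line per file, sorted by path."""
    lines = []
    for path in sorted(files):
        digest = hashlib.new(algorithm, files[path]).hexdigest()
        lines.append(f"{digest} {path}")
    return "\n".join(lines) + "\n"

# Hypothetical bag contents: one payload file plus two tag files.
payload = {"data/hello.txt": b"hello\n"}
tag_files = {
    "bagit.txt": b"BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n",
    "manifest-sha256.txt": manifest_text(payload).encode("utf-8"),
}

# The tag manifest can cover every other tag file ...
tagmanifest = manifest_text(tag_files)
assert "tagmanifest-sha256.txt" not in tagmanifest

# ... but not itself: appending a line with its current checksum changes the
# content, and with it the checksum that line was supposed to record.
own_digest = hashlib.sha256(tagmanifest.encode("utf-8")).hexdigest()
with_self = tagmanifest + f"{own_digest} tagmanifest-sha256.txt\n"
assert hashlib.sha256(with_self.encode("utf-8")).hexdigest() != own_digest
```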

Specification issues

  • It is not made clear whether the order of tags in tag files may be relevant. I suggest stating explicitly in the specification if order is not relevant.
  • It is not made clear whether tags may be repeated (in which case order may be relevant).
  • Speaking about tag files: there is no common tag file format. Section 4.2 describes a key/value format, but only for bag-info.txt. I'd call it a design error of BagIt to already have two tag file formats (space-separated, as in manifest files and fetch.txt, and key/value). For additional tag files not mentioned in the current draft, you know nothing but the character encoding; they could be in any format.
  • Tag/metadata values cannot include newline characters; on the other hand, whitespace is considered part of the value. Does line folding change the value or not?
  • The general form of tag/metadata labels is not specified. Can they include spaces and non-ASCII characters, such as umlauts? Obviously they cannot include colons.
  • It should be noted that a tag file's checksum can change without any relevant change in its content if you apply Unicode normalization. -- Preceding unsigned comment added by JakobVoss (talk · contribs) 06:53, 14 October 2010 (UTC)
  • It is probably worth noting that issues with the specification are best discussed in the digital-curation Google Group instead of on Wikipedia proper. The people who are responsible for editing the specification probably aren't paying attention to this talk page. Edsu (talk) 16:53, 14 October 2010 (UTC)
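Several of the points above (tag order, repeated tags, whitespace and line folding, Unicode normalization) can be made concrete with a small Python sketch. It assumes one possible reading of the draft, not the specification's answer: a label ends at the first colon, a single space or tab after the colon is a separator, a line starting with whitespace continues the previous value, and the fold becomes one space. A different unfolding rule would yield different values, which is exactly the ambiguity raised above. Tags are returned as an ordered list of pairs so that order and repeated labels survive. The function names and sample values are hypothetical.

```python
import hashlib
import unicodedata

def parse_tag_file(text: str) -> list[tuple[str, str]]:
    """Parse 'Label: value' lines into an ordered list of (label, value) pairs.

    Assumed reading (not normative):
    - the label ends at the first colon;
    - one space or tab right after the colon is a separator; any further
      whitespace stays part of the value;
    - a line starting with a space or tab continues the previous value, and
      the line break plus leading whitespace is folded into one space;
    - empty lines are ignored.
    """
    tags: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line:
            continue
        if line[0] in " \t":
            if not tags:
                raise ValueError("continuation line before any tag")
            label, value = tags[-1]
            tags[-1] = (label, value + " " + line.lstrip(" \t"))
            continue
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"no colon in line: {line!r}")
        if value[:1] in (" ", "\t"):
            value = value[1:]
        tags.append((label, value))
    return tags

def sha256_utf8(text: str) -> str:
    """SHA-256 checksum of the UTF-8 encoding of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Sample bag-info.txt content (values are made up).
sample = (
    "Source-Organization: Example Org\n"
    "Contact-Name: Alice\n"
    "Contact-Name: Bob\n"
    "External-Description: A long description\n"
    "  continued on a second line\n"
)
tags = parse_tag_file(sample)  # order and the repeated label are preserved

# Same visible text, two Unicode normalization forms, two different checksums.
nfc = unicodedata.normalize("NFC", "Contact-Name: Jos\u00e9\n")
nfd = unicodedata.normalize("NFD", nfc)
assert nfc != nfd and sha256_utf8(nfc) != sha256_utf8(nfd)
```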

Related systems

  • BagIt reminds me of distributed revision control systems. A bag looks like a snapshot of an RCS, or an RCS repository with only one revision. The article should contain some references to similar systems; maybe RCS can be mentioned (among others).

-- JakobVoss (talk) 09:48, 13 October 2010 (UTC)

Sentences removed

  • I removed the following sentence because it added no information: "Once a bag is received, verified and placed in storage, the manifest can be used again in the future to verify that the integrity of the bag remains intact."
  • Edsu removed the sentence "In practice “bagit.txt” only contains characters that are also part of plain ASCII, and the most common character encoding for tag files is UTF-8." But the IETF draft says:
"The "bagit.txt" file should consist of exactly two lines,
BagIt-Version: M.N
Tag-File-Character-Encoding: UTF-8"
Unless the "M.N" part or the "UTF-8" part includes non-ASCII characters (which I strongly doubt), bagit.txt contains only plain ASCII characters, apart from the optional BOM header, which I forgot to mention. In any case, the existence of BOM headers should be mentioned in the specification.
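The point about bagit.txt can be checked mechanically. This Python sketch assumes the two-line layout quoted above with ": " separating label and value; it strips an optional UTF-8 byte order mark, decodes the rest as UTF-8, and reports whether the remaining bytes are plain ASCII. The function name, its return tuple, and the version number in the sample are hypothetical, for illustration only.

```python
import codecs

def read_bagit_txt(raw: bytes) -> tuple[dict[str, str], bool, bool]:
    """Return (declarations, had_bom, is_ascii) for the raw bytes of bagit.txt."""
    had_bom = raw.startswith(codecs.BOM_UTF8)
    if had_bom:
        raw = raw[len(codecs.BOM_UTF8):]
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    lines = text.splitlines()
    if len(lines) != 2:
        raise ValueError(f"expected exactly two lines, got {len(lines)}")
    declarations: dict[str, str] = {}
    for line in lines:
        label, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        declarations[label] = value
    if set(declarations) != {"BagIt-Version", "Tag-File-Character-Encoding"}:
        raise ValueError(f"unexpected labels: {sorted(declarations)}")
    return declarations, had_bom, raw.isascii()

plain = b"BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n"
decl, bom, ascii_only = read_bagit_txt(plain)
assert decl["Tag-File-Character-Encoding"] == "UTF-8" and not bom and ascii_only

# The same file with a BOM: the BOM is stripped before the ASCII check.
decl, bom, ascii_only = read_bagit_txt(codecs.BOM_UTF8 + plain)
assert bom and ascii_only
```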

-- JakobVoss (talk) 06:54, 14 October 2010 (UTC)

  • I put the sentence about bagit.txt character encoding back, indicating that it must be UTF-8. Distinguishing between ASCII and UTF-8 seems out of scope for this article. I initially misread this sentence as referring to the encoding of bag-info.txt rather than bagit.txt, since it also talked about tag files.

-- Edsu (talk) 16:50, 14 October 2010 (UTC)