Talk:BagIt
Some notes
While reviewing the IETF draft and this article, I found some aspects that may be worth mentioning:
Limitations of BagIt
- There is no registry of checksum algorithms and their abbreviations as used for manifest files
- A tag manifest file cannot contain its own file name (Hochstenbach: this is a general remark; no manifest file can contain an entry for itself)
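To illustrate the second limitation, here is a minimal sketch of why a manifest cannot list itself. The file contents, the version number 0.96 and the file name tagmanifest-sha256.txt are placeholders chosen for illustration, not values taken from the draft.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical bagit.txt contents; "0.96" stands in for M.N.
bagit_txt = b"BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n"

# A tag manifest with one entry, in the "checksum filename" line format.
manifest = f"{sha256_hex(bagit_txt)}  bagit.txt\n".encode("ascii")
checksum_before = sha256_hex(manifest)

# Try to add an entry for the manifest itself, using its current checksum.
manifest_with_self = manifest + f"{checksum_before}  tagmanifest-sha256.txt\n".encode("ascii")
checksum_after = sha256_hex(manifest_with_self)

# Writing the entry changed the file, so the recorded checksum is already stale.
self_entry_is_stale = checksum_before != checksum_after
print(self_entry_is_stale)
```

Any attempt to correct the entry changes the file again, so the recorded value never catches up with the actual checksum.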
Specification issues
- It is not made clear whether the order of tags in tag files may be relevant. I suggest stating explicitly in the specification if order is not relevant
- It is not made clear whether tags may be repeated (in this case order may be relevant)
- Speaking of tag files: there is no common tag file format. Section 4.2 describes a key/value format, but only for bag-info.txt. I'd call it a design error of BagIt to already have two tag file formats (space-separated, as in manifest files and fetch.txt, and the key/value format). For additional tag files not mentioned in the current draft, you know nothing but the character encoding; they could be in any format.
- Tag/metadata values cannot include newline characters - on the other hand whitespace is considered as part of the value. Does line folding change the value or not?
- The general form of tag/metadata labels is not specified. Can they include spaces and non-ASCII characters, such as umlauts? Obviously they cannot include colons.
- It should be noted that a tag file's checksum can change without any relevant change in its content if you apply Unicode normalization. Preceding unsigned comment added by JakobVoss (talk • contribs) 06:53, 14 October 2010 (UTC)
- It is probably worth noting that issues with the specification are best discussed in the digital-curation Google Group instead of on Wikipedia proper. The people who are responsible for editing the specification probably aren't paying attention to this talk page. Edsu (talk) 16:53, 14 October 2010 (UTC)
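The Unicode normalization point above can be reproduced with a short sketch. The tag line is a made-up example, and SHA-256 is used only for illustration; the same effect occurs with any checksum algorithm.

```python
import hashlib
import unicodedata

# Hypothetical bag-info.txt line containing two umlauts.
text = "Contact-Name: J\u00fcrgen M\u00fcller\n"

# The same text in two Unicode normalization forms, encoded as UTF-8.
nfc = unicodedata.normalize("NFC", text).encode("utf-8")
nfd = unicodedata.normalize("NFD", text).encode("utf-8")

# NFD splits each "ü" into "u" plus a combining diaeresis, so the bytes differ.
bytes_differ = nfc != nfd
checksums_differ = hashlib.sha256(nfc).hexdigest() != hashlib.sha256(nfd).hexdigest()

# After renormalizing, both versions are the same text again.
same_text = unicodedata.normalize("NFC", nfd.decode("utf-8")) == nfc.decode("utf-8")

print(bytes_differ, checksums_differ, same_text)
```

Both byte strings display as the same line of text, yet a tool that renormalizes a tag file in transit would invalidate its entry in the tag manifest.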
Related systems
- BagIt reminds me of distributed revision control systems. A bag looks like a snapshot of an RCS, or an RCS repository with only one revision. The article should contain some references to similar systems; maybe RCS can be mentioned (among others)
-- JakobVoss (talk) 09:48, 13 October 2010 (UTC)
Sentences removed
- I removed the following sentence because it added no information: "Once a bag is received, verified and placed in storage, the manifest can be used again in the future to verify that the integrity of the bag remains intact."
- Edsu removed the sentence "In practice “bagit.txt” only contains characters that are also part of plain ASCII, and the most common character encoding for tag files is UTF-8." But the IETF draft says:
- "The "bagit.txt" file should consist of exactly two lines,
- BagIt-Version: M.N
- Tag-File-Character-Encoding: UTF-8"
- Unless the "M.N" part or the "UTF-8" part includes non-ASCII characters (which I strongly doubt), bagit.txt contains only plain ASCII characters, apart from the optional BOM, which I forgot to mention. In any case, the possible presence of a BOM should be mentioned in the specification.
-- JakobVoss (talk) 06:54, 14 October 2010 (UTC)
- I put the sentence about the bagit.txt character encoding back, indicating that it must be UTF-8. Distinguishing between ASCII and UTF-8 seems out of scope for this article. I initially misread this sentence as describing the encoding of bag-info.txt rather than bagit.txt, since it also mentioned tag files.
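For anyone who wants to check the ASCII/UTF-8 point from this thread, here is a small sketch. The version number 0.96 is a placeholder for M.N, and the BOM case only shows what the bytes would look like if a tool wrote one.

```python
import codecs

# The two lines quoted from the draft, with "0.96" standing in for M.N.
bagit_txt = "BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n"

# For ASCII-only text, the UTF-8 and ASCII encodings are byte-identical.
utf8_bytes = bagit_txt.encode("utf-8")
identical = utf8_bytes == bagit_txt.encode("ascii")

# With a UTF-8 byte order mark prepended, the file is no longer valid ASCII.
with_bom = codecs.BOM_UTF8 + utf8_bytes
try:
    with_bom.decode("ascii")
    bom_is_ascii = True
except UnicodeDecodeError:
    bom_is_ascii = False

# The "utf-8-sig" codec strips a leading BOM when decoding.
decoded = with_bom.decode("utf-8-sig")

print(identical, bom_is_ascii, decoded == bagit_txt)
```

So the two statements do not conflict: a bagit.txt without a BOM is both valid ASCII and valid UTF-8, and only the BOM makes the difference visible at the byte level.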