Draft:Flexible Tagged Format


Comment: The one source coming from the same user that created this article is classified as original research, which is not allowed on Wikipedia. hewhoamareismyself 21:18, 11 June 2014 (UTC)


Flexible Tagged Format
Filename extension: .ftf
Type code: FTF
Magic number: FTFORMAT
Developed by: Joost de Folter
Initial release: 17 May 2014
Latest release: 1.0 (initial release, 17 May 2014)
Type of format: Computer file format
Open format: Yes
Website: https://github.com/folterj/FTFLib

Flexible Tagged Format (FTF) is a flexible storage format that uses human-readable tags to describe its content. It is intended to store any type of data while remaining portable and easy to use and implement on any platform.

Introduction

Features

  • Platform independent, using an automatic endianness algorithm
  • Extremely flexible
  • Self-describing
  • Metadata stored as variable tags
  • Human-readable tags describing file content / data format
  • Tags may be used without content (like flags)
  • Any type of data formatting / data types
  • Any type of encoding / compression
  • Multiple data content elements
  • Large file support (> 2 GB), using 64-bit size fields
  • File integrity checking
  • Easy to use / implement

Rationale

The format was designed to store any type of data in a flexible and unrestricted way. Although it was mainly designed for storing scientific data efficiently, it can be used for any type of data. In addition, it was intended to be platform independent and to support large files, encoding, compression and integrity checking. Many other flexible formats are available, but many have limitations, and they generally appear difficult to use or can only be used through the APIs provided for them.

Implementation

Structure

To store data of any type and describe it unambiguously, optional tags are used. Tag labels are null-terminated, upper-case ASCII strings of arbitrary length, which makes them human readable. The content of each tag is likewise ASCII, and may be blank.

To make it possible to store any type of data, the format of the data content is left completely open, and appropriate tags are used to describe it accurately. As a consequence, no application can be expected to support every data format. Each application supports specific data formats, and decides whether it can handle a given file based on the information stored in the tags. This keeps the implementation simple for each application, without the need to support every possible use. The format supports multiple data elements, which are stored sequentially.

                - Fixed header
                - Global tags
Element(s) /    - Element tags
           \    - Element data content
              ( - Hash)

The implementation supports two levels of tags. When a tag of an element is requested, the element tags are searched first. If the tag is not found there, the global tags are searched. This allows 'global' tags to be set that apply to every element, while an individual element can override a global value with one that applies only to that element.

Platform independence is implemented by a simple automatic endianness algorithm. After the first 8 bytes, which contain the magic number, a second 8-byte value is written. When this value is read and converted to a long integer, the order of its bytes tells the implementation either that the content is as expected and no endianness conversion is needed to read the size fields, or that the bytes of the size fields need to be reversed. Fields can therefore be written without correction, and are corrected when read.

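Simple endianness algorithm

// Magic number: the ASCII characters "FTFORMAT"
byte[] fileHeader = { 0x46, 0x54, 0x46, 0x4F, 0x52, 0x4D, 0x41, 0x54 }
// Endianness marker: spells "ENDIANES" in ASCII, most significant byte first
ulong endianness = 0x454E4449414E4553
bool reverse

writeHeader(Stream stream):
    stream.write(fileHeader)
    stream.write(endianness)                  // written in the writer's native byte order
    ...

readHeader(Stream stream):
    byte[] fileHeader0 = stream.read(8)
    [check that fileHeader0 == fileHeader]
    ulong endianness0 = stream.read(8)        // read in the reader's native byte order
    reverse = (endianness0 != endianness)     // true: size fields must be byte-reversed
    ...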
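
The pseudocode above could be turned into a runnable sketch along the following lines. This is an illustration in Python, not the FTFLib API: the names write_header, read_header and read_size are hypothetical, and the byteorder parameter exists only to simulate a file written on a platform with a different byte order. Unlike the pseudocode, the sketch also rejects a marker that matches in neither byte order, which is an added assumption.

```python
import io
import struct

FILE_HEADER = bytes([0x46, 0x54, 0x46, 0x4F, 0x52, 0x4D, 0x41, 0x54])  # b"FTFORMAT"
ENDIANNESS = 0x454E4449414E4553


def write_header(stream, byteorder="="):
    """Write the magic number followed by the endianness marker.

    byteorder is a struct prefix ("=" native, "<" little, ">" big).
    It is a hypothetical parameter used to simulate other platforms.
    """
    stream.write(FILE_HEADER)
    stream.write(struct.pack(byteorder + "Q", ENDIANNESS))


def read_header(stream):
    """Return True if the size fields of this file must be byte-reversed."""
    if stream.read(8) != FILE_HEADER:
        raise ValueError("bad magic number")
    raw = stream.read(8)
    if len(raw) != 8:
        raise ValueError("truncated header")
    (marker,) = struct.unpack("=Q", raw)
    if marker == ENDIANNESS:
        return False  # written with the same byte order as this reader
    (swapped,) = struct.unpack("=Q", raw[::-1])
    if swapped == ENDIANNESS:
        return True  # written with the opposite byte order
    raise ValueError("unrecognised endianness marker")


def read_size(stream, reverse):
    """Read one 64-bit size field, reversing its bytes when required."""
    raw = stream.read(8)
    if reverse:
        raw = raw[::-1]
    return struct.unpack("=Q", raw)[0]
```

A reader would call read_header() once and pass the resulting flag to every later read_size() call, so the writer never needs to know which platform will read the file.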

Encoding and compression are supported by allowing the data content of each element to be encoded or compressed in any desired format, which is recorded in the corresponding tags. The tags themselves are never encoded or compressed. Large files (larger than 2 GB) are supported by using 64-bit size fields. File integrity checking is supported by allowing hash bytes to be stored after the element(s). The hash type and its length in bytes are set in the appropriate global tags.
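
As an illustration of how tags might drive compression and integrity checking, the following Python sketch uses the standard zlib and hashlib modules. Several points are assumptions rather than part of the format description: the function names are hypothetical, DEFLATE is taken to mean a zlib-wrapped stream (a raw DEFLATE stream may be intended instead), and the hash is computed over the stored (possibly compressed) element content, since the text does not state which bytes are hashed.

```python
import hashlib
import zlib


def encode_element(content: bytes, tags: dict) -> bytes:
    """Compress element content if its COMPRESSION tag asks for it."""
    compression = tags.get("COMPRESSION", "")
    if compression == "":
        return content
    if compression == "DEFLATE":
        return zlib.compress(content)  # assumption: zlib-wrapped DEFLATE
    raise ValueError("unsupported compression: " + compression)


def decode_element(stored: bytes, tags: dict) -> bytes:
    """Reverse encode_element(), again guided only by the tags."""
    compression = tags.get("COMPRESSION", "")
    if compression == "":
        return stored
    if compression == "DEFLATE":
        return zlib.decompress(stored)
    raise ValueError("unsupported compression: " + compression)


def compute_hash(stored_elements, global_tags: dict) -> bytes:
    """Hash the stored element bytes with the algorithm named in HASHTYPE."""
    hasher = hashlib.new(global_tags["HASHTYPE"].lower())
    for stored in stored_elements:
        hasher.update(stored)
    digest = hasher.digest()
    if len(digest) != int(global_tags["HASHSIZE"]):
        raise ValueError("HASHSIZE does not match the digest length")
    return digest
```

An application that does not recognise the COMPRESSION value raises an error instead of guessing, which matches the idea that each application decides from the tags whether it supports a file.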

File structure

Flexible Tagged Format files use the extension .ftf

File structure
byte[] fixedFileHeader
ulong endianness
ulong globalHeader.tagsSize
string globalHeader.tags[0].label
string globalHeader.tags[0].content
...
ulong globalHeader.contentSize
ulong elements[0].header.tagsSize
string elements[0].header.tags[0].label
string elements[0].header.tags[0].content
...
ulong elements[0].header.contentSize
byte[] elements[0].content
...
byte[] hash (optional)
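
The listing above shows the same unit twice: a tagsSize field, a run of tags, and a contentSize field, first for the global header and then for each element. A minimal Python sketch of that unit follows. It rests on several assumptions that the listing does not confirm: tagsSize is taken to be the byte length of the tag block, each tag is stored as a null-terminated label followed by a null-terminated content string, and size fields are written in native byte order with no endianness correction on reading. All function names are hypothetical.

```python
import io
import struct


def pack_tags(tags: dict) -> bytes:
    """Serialise tags as null-terminated ASCII label / content pairs."""
    block = bytearray()
    for label, content in tags.items():
        block += label.upper().encode("ascii") + b"\x00"
        block += content.encode("ascii") + b"\x00"
    return bytes(block)


def unpack_tags(block: bytes) -> dict:
    """Parse a tag block produced by pack_tags()."""
    if block and not block.endswith(b"\x00"):
        raise ValueError("tag block is not null terminated")
    parts = block.split(b"\x00")[:-1]  # drop the piece after the last terminator
    if len(parts) % 2:
        raise ValueError("tag label without content field")
    return {
        parts[i].decode("ascii"): parts[i + 1].decode("ascii")
        for i in range(0, len(parts), 2)
    }


def write_header_block(stream, tags: dict, content_size: int):
    """Write tagsSize, the tags, then contentSize."""
    block = pack_tags(tags)
    stream.write(struct.pack("=Q", len(block)))
    stream.write(block)
    stream.write(struct.pack("=Q", content_size))


def read_header_block(stream):
    """Read one header block and return (tags, content_size)."""
    (tags_size,) = struct.unpack("=Q", stream.read(8))
    tags = unpack_tags(stream.read(tags_size))
    (content_size,) = struct.unpack("=Q", stream.read(8))
    return tags, content_size


def write_element(stream, tags: dict, content: bytes):
    """Write an element header followed by its content bytes."""
    write_header_block(stream, tags, len(content))
    stream.write(content)


def read_element(stream):
    """Read one element and return (tags, content)."""
    tags, content_size = read_header_block(stream)
    return tags, stream.read(content_size)
```

Because the tag block is prefixed with its size, a reader that is not interested in an element's tags can skip them without parsing, and a tag with blank content still round-trips as a flag.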

Object Oriented implementation

This simplified object-oriented implementation demonstrates how the format is accessed as a file. In this example, the FtfContent object is responsible for managing the file content: it reads / writes the file header, including the global tags, and each element can subsequently be read / written using readElement() / writeElement() respectively. currentElement points to the last read or written Element.

[Figure: Flexible Tagged Format UML diagram]

Download libraries

Flexible Tagged Format is available as a C++ library at the FtfContent class level as shown, at the file level, and as an image implementation.[1]

Tags

All metadata tags are stored as human-readable, null-terminated ASCII strings. The format uses a simple two-level hierarchy. First, global tags stored in the global header apply to all elements. Second, each element can have its own tags, which apply only to that element. Where an element tag has the same name as a global tag, the content of the element tag overrides the content of the global tag.
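
The two-level lookup can be sketched in a few lines of Python. The function name get_tag and the use of plain dictionaries to hold tags are assumptions made for illustration, not the library's interface.

```python
from typing import Optional


def get_tag(label: str, element_tags: dict, global_tags: dict) -> Optional[str]:
    """Return the tag content, preferring the element over the global header.

    Returns None when the tag is set at neither level. A tag with blank
    content (a flag) returns the empty string, so it can be told apart
    from a missing tag.
    """
    key = label.upper()  # tag labels are upper-case ASCII
    if key in element_tags:
        return element_tags[key]
    return global_tags.get(key)
```

For example, with global tags {"AUTHOR": "A", "LABEL": "default"} and element tags {"LABEL": "specific"}, a request for LABEL yields "specific" while a request for AUTHOR falls back to "A". Python's collections.ChainMap(element_tags, global_tags) gives the same search order.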

Guidelines

Standard tags for global use

  • ELEMENTS : The number of internal elements (if not set assume a single element)
  • DESCRIPTION : General description of content
  • APPLICATION : Application responsible or associated with this data
  • AUTHOR : Author of content
  • SUBJECT : Subject / context
  • HASHTYPE : Type of data hash (for file integrity checking)
  • HASHSIZE : Size of data hash in bytes (for file integrity checking)

Standard tags for global / element use

  • ELEMENT : Element number
  • LABEL : Label describing specific content
  • TIMESTAMP : Original date / time of creation
  • DIMENSIONS : General description of content dimensions, taking storage format into account

Examples

Example storing two dimensional uncompressed numerical data

  • ELEMENTS : 1
  • DESCRIPTION : Component concentration
  • TIMESTAMP : 2014/04/14 14:00:00
  • DIMENSIONS : 100 100 (content stored as rows, columns)
  • FORMAT : FP
  • PRECISION : 64
  • HASHTYPE : SHA512
  • HASHSIZE : 64 (size in bytes)
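
A reading application could use the tags in this example to decide whether it supports the content and how to decode it. The Python sketch below is an illustration under stated assumptions: FP is taken to mean IEEE 754 floating point, PRECISION is taken to be a bit count, the values are assumed to be stored row by row in native byte order, and the parenthesised remarks in the list above are read as annotations rather than tag content. The function name decode_grid is hypothetical.

```python
import struct


def decode_grid(content: bytes, tags: dict):
    """Decode uncompressed two-dimensional floating point content into rows."""
    if tags.get("FORMAT") != "FP" or tags.get("PRECISION") not in ("32", "64"):
        raise ValueError("unsupported data format")
    if tags.get("COMPRESSION"):
        raise ValueError("compressed content is not handled by this sketch")
    rows, cols = (int(n) for n in tags["DIMENSIONS"].split()[:2])
    code = "d" if tags["PRECISION"] == "64" else "f"
    expected = rows * cols * struct.calcsize("=" + code)
    if len(content) != expected:
        raise ValueError("content size does not match DIMENSIONS")
    values = struct.unpack("=%d%s" % (rows * cols, code), content)
    # Split the flat sequence of values into one list per row.
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]
```

With the tags from the example (DIMENSIONS 100 100, PRECISION 64), the sketch expects 100 × 100 × 8 = 80,000 bytes of content and rejects anything else.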

Example for storing compressed image data

  • ELEMENTS : 1
  • APPLICATION : FTFImageLib
  • TIMESTAMP : 2014/04/14 14:00:00
  • WIDTH : 1000
  • HEIGHT : 1000
  • CHANNELS : 3
  • DIMENSIONS : 1000 1000 3 (content stored as height, width, channels)
  • COLORMODEL : RGB (name of general color model)
  • COLORCHANNELSFORMAT : RGB (specific order of component channels)
  • COMPONENTFORMAT : FP
  • COMPONENTBITS : 32
  • COMPRESSION : DEFLATE

Comparison with other formats

The strengths and limitations of a number of other independent data formats are summarised below for comparison with the Flexible Tagged Format.

Flexible Image Transport System (FITS)

  • application: images
  • strengths: meta data stored as flexible human readable ASCII header
  • limitations: image data, mandatory / limited tags, no compression, no large file support

Hierarchical Data Format (HDF)

  • application: numerical data
  • strengths: hierarchical design; versatile; supports various data formats
  • limitations: not easy to use; conflicts caused by supporting various data formats; no large file support

Hierarchical Data Format version 5 (HDF5)

  • application: numerical data
  • strengths: hierarchical design; versatile; simplified file structure
  • limitations: not easy to use

Common Data Format (CDF)

  • application: multi-dimensional data
  • strengths: uses attributes for meta data, integrated compression
  • limitations: not easy to use, attributes are predefined

Network Common Data Format (NetCDF)

  • application: array-oriented scientific data
  • strengths: integrated compression
  • limitations: not easy to use

Simple Data Format (SDF)

  • application: numerical data
  • strengths: simple format allows easy implementation; ASCII meta data
  • limitations: fixed, restricted header format; no compression

References

See also

Category:Computer_file_formats
