Data file

A data file is a computer file that stores data for use by a computer application or system. The term generally does not refer to files that contain instructions or code to be executed (typically called program files), or to files that define the operation or structure of an application or system (such as configuration files and directory files); it refers specifically to information used as input, or written as output, by some other software program. Keeping this information in separate files is especially helpful when debugging a program.

Most computer programs work with files, because files store information permanently. Database programs create files of information. Compilers read source files and generate executable files. A file itself is a sequence of bytes stored on a storage device such as tape, magnetic disk or optical disk. Data files are the files that store data pertaining to a specific application for later use.

Storage types of data files

The data files can be stored in two ways:

  1. Text files.
  2. Binary files.

A text file (also called an ASCII file) stores information as ASCII characters. A text file contains visible characters, so its contents can be viewed on the monitor or edited using any text editor. In text files, each line of text is terminated (delimited) by a special character known as the EOL (end of line) character. Some internal translations may take place when this EOL character is read or written.

Examples of text files

  • A file containing a C++ program

A binary file is a file that contains information in the same format in which the information is held in memory, i.e. in binary form. In a binary file there is no delimiter for a line, and no translations occur. As a result, binary files are generally faster and easier for a program to read and write than text files. As long as the file does not need to be read by people or ported to a different type of system, binary files are often the best way to store program information.

Examples of binary files

  • An executable file
  • An object file

In C++, a file, at its lowest level, is interpreted simply as a sequence, or stream, of bytes. One aspect of the file I/O library manages the transfer of these bytes. At this level, the notion of a data type is absent. At the user level, on the other hand, a file consists of a sequence of possibly intermixed data types: characters, arithmetic values, class objects. A second aspect of the file I/O library manages the interface between these two levels.

Stream

A stream is a sequence of bytes; more generally, it is the name given to a flow of data. Different streams are used to represent different kinds of data flow. Each stream is associated with a particular class, which contains member functions and definitions for dealing with that particular kind of data flow. The stream that supplies data to the program is known as an input stream. It reads the data from the file and hands it over to the program. The stream that receives data from the program is known as an output stream. It writes the received data to the file. The following figure illustrates this.

File Input and Output using streams[1]

When the main function of the program is invoked, it already has three predefined streams open and available for use. These represent the "standard" input and output channels that have been established for the process.

The fstream header file

In C++, file input/output facilities are implemented through a component header file of the C++ standard library. In standard C++ this header is <fstream>; older, pre-standard compilers named it fstream.h. The fstream library predefines a set of operations for handling file-related input and output. It defines classes that help one perform file input and output: the ifstream class ties a file to the program for input, the ofstream class ties a file to the program for output, and the fstream class ties a file to the program for both input and output. The classes defined in this header derive from classes declared under <iostream> (formerly iostream.h), the header that manages console I/O operations in C++. The following figure shows the stream class hierarchy.

Stream class Hierarchy[2]

The functions of these classes have been summarised in the following table:

Functions of File Stream Classes[3]

Opening and closing files in C++

In C++, opening of files can be achieved in two ways:

  1. Using the constructor function of the stream class.
  2. Using the function open().

The first method is preferred when a single file is used with a stream. For managing multiple files with the same stream, the second method is preferred.

  • Opening files using Constructors
ifstream input_file("DataFile");

The data read from DataFile is channelled through the input stream as shown:

Data being read from DataFile using an input stream[4]

The statement above creates an object (input_file) of the input file stream class. The object name is user-defined. When the ifstream object input_file is created, the file DataFile is opened and attached to the input stream input_file, and the data read from DataFile is then channelled through this input stream object. The connection with a file is closed automatically when the input or output stream object expires, i.e. when it goes out of scope. (For instance, a global object expires when the program terminates.) You can also close the connection with a file explicitly by using the close() method:

input_file.close();

Closing such a connection does not eliminate the stream; it just disconnects it from the file. The stream object still exists. Closing a file flushes the buffer, which means that any data remaining in the stream's buffer is moved out of it in the direction it was headed (for an output stream, into the file).

  • Opening files using open() function
ifstream filin;                                     // create an input stream
filin.open("Master.dat");                           // associate filin stream with file Master.dat
.                                                   // process Master.dat
.
filin.close();                                      // terminate association with Master.dat
filin.open("Tran.dat");                             // associate filin stream with file Tran.dat
.                                                   // process Tran.dat
.
filin.close();                                      // terminate association

A stream can be connected to only one file at a time.

The concept of file modes

When processing a file, you will typically specify the type of operation you want to perform. The operation is specified using what is referred to as a file mode. The file mode describes how a file is to be used: to read from it, to write to it, to append to it, and so on.

stream_object.open ("filename", (filemode) );

The following table lists the filemodes available and their meaning:

File Mode Constants[5]

Steps to process a file in your program

The five steps to use files in your C++ program are:

  1. Determine the type of link required.
  2. Declare a stream for the desired type of link.
  3. Attach the desired file to the stream.
  4. Now process as required.
  5. Close the file link with stream.

The complete example program:

/* Get roll numbers and marks of the students of a class from the user and store these details in a file called 'marks.dat' */

#include <fstream>                                 // standard header (older compilers used fstream.h)
#include <iostream>
using namespace std;

int main()
{
    ofstream filout;                               // stream decided and declared - steps 1 & 2
    filout.open("marks.dat", ios::out);            // file linked - step 3
    if (!filout)                                   // stop if the file could not be opened
    {
        cerr << "File can't be opened\n";
        return 1;
    }
    char ans = 'y';                                // process as required - step 4 begins
    int rollno;
    float marks;
    while (ans == 'y' || ans == 'Y')
    {
        cout << "\nEnter Rollno. : ";
        cin >> rollno;
        cout << "\nEnter Marks : ";
        cin >> marks;
        filout << rollno << "\n" << marks << "\n"; // one value per line
        cout << "\nWant to enter more records? (y/n)... ";
        if (!(cin >> ans))                         // stop when input ends or fails
            break;
    }
    filout.close();                                // delink the file - step 5
    return 0;
}

File handling in C++

  • get ( ) function:

Prototypes are :

istream & get (char * buf, int num, char delim = '\n') ;

The first form above reads characters into the character array pointed to by buf until either num - 1 characters have been read or the character specified by delim has been encountered. For instance,

char line [40] ;
cin.get (line, 40, '$') ;

The statements above read characters into line until either 39 characters have been read (leaving room for the terminating null character) or the '$' character is encountered, whichever occurs first. If the input given in response to the above statements is as follows :

Value is $ 177.5

Then line will be storing

Value is

And if the input given is as follows :

The amount is 17.5.

The contents of line will be

The amount is 17.5.

The array pointed to by buf will be null-terminated by get ( ). If no delim character is specified, by default a newline character acts as a delimiter. If the delimiter character is encountered in the input stream the get ( ) function does not extract it. Rather, the delimiter character remains in the stream until the next input operation.

int get ( ) ;

The second form of get ( ) above returns the next character from the stream. It returns EOF if the end-of-file is encountered. For instance, the following code fragment illustrates it :

int i ;
char ch ;
ch = i = fin.get ( ) ;

If the input given is A, then the value of i will be 65 (the ASCII value of A) and the value of ch will be A.

  • getline ( ) function

Prototype is :

istream & getline (char * buf, int num, char delim = '\n') ;

This function is virtually identical to the get (buf, num, delim) version of get ( ). The difference between get (buf, num, delim) and getline ( ) is that getline ( ) reads and removes the delimiter character (a newline by default) from the input stream if it is encountered, which the get ( ) function does not do. The following figure explains the difference between the get ( ) and getline ( ) functions :

Difference between get() and getline()[6]

  • read ( ) and write ( ) functions :

Blocks of binary data are read and written using C++'s read ( ) and write ( ) functions. Their prototypes are :

istream & read (char * buf, streamsize num) ;
ostream & write (const char * buf, streamsize num) ;

The read ( ) function reads num bytes from the associated stream and puts them in the buffer pointed to by buf. The write ( ) function writes num bytes to the associated stream from the buffer pointed to by buf. They are typically called as read ((char *) & obj, sizeof (obj)) and write ((char *) & obj, sizeof (obj)), where obj is the object being transferred. Data written to a file using write ( ) can be read back accurately only by using read ( ). The following program writes a structure to the disk and then reads it back using the write ( ) and read ( ) functions.

#include <fstream>                                                  // standard header (older compilers used fstream.h)
#include <iostream>
#include <cstring>                                                  // for strcpy ( )
using namespace std;

struct customer
{
    char name[51];
    float balance;
};

int main()
{
    customer savac = {};                                            // zero-initialise every byte
    strcpy(savac.name, "Tina Marshall");                            // copy value to structure
    savac.balance = 21310.75f;                                      // variable savac
    ofstream fout;
    fout.open("Saving", ios::out | ios::binary);                    // open output file
    if (!fout)
    {
        cout << "File can't be opened \n";
        return 1;
    }
    fout.write((char *) &savac, sizeof(customer));                  // write to file
    fout.close();                                                   // close connection
    // read it back now
    ifstream fin;
    fin.open("Saving", ios::in | ios::binary);                      // open input file
    if (!fin)
    {
        cout << "File can't be opened \n";
        return 1;
    }
    fin.read((char *) &savac, sizeof(customer));                    // read structure
    cout << savac.name;                                             // display structure now
    cout << " has the balance amount of Rs." << savac.balance << "\n";
    fin.close();
    return 0;
}

As you can see, only a single call to read ( ) or write ( ) is necessary to read or write the entire structure. Each individual field need not be read or written separately. If the end-of-file is reached before the specified number of characters have been read, the read ( ) simply stops, and the buffer contains as many characters as were available.

File pointers and random access

Every file stream maintains two pointers, called the get_pointer (for a file opened in input mode) and the put_pointer (for a file opened in output mode), which indicate the current position in the file where reading or writing will take place. These pointers make random access possible, that is, moving directly to any location in the file instead of moving through it sequentially. In C++, random access is achieved by using the seekg ( ), seekp ( ), tellg ( ) and tellp ( ) functions. The seekg ( ) and tellg ( ) functions are for input streams (ifstream), and the seekp ( ) and tellp ( ) functions are for output streams (ofstream). However, if you use them with an fstream object, the get and put positions move together, so tellg ( ) and tellp ( ) return the same value. The most common forms of these functions are :

seekg ( ) - istream & seekg (long) ;                      Form 1
            istream & seekg (long, seek_dir) ;            Form 2
seekp ( ) - ostream & seekp (long) ;                      Form 1
            ostream & seekp (long, seek_dir) ;            Form 2
tellg ( ) - long tellg ( ) ;
tellp ( ) - long tellp ( ) ;

When seekg ( ) (or seekp ( )) is used according to Form 1, it moves the get_pointer (or put_pointer) to an absolute position. For example,

ifstream fin ;
ofstream fout ;
fin.seekg(30) ;         // will move the get_pointer (in ifstream) to byte number 30 in the file.
fout.seekp(30) ;        // will move the put_pointer (in ofstream) to byte number 30 in the file.

When the seekg ( ) (or seekp ( )) function is used according to Form 2, it moves the get_pointer (or put_pointer) by the given offset relative to the reference point named by seek_dir. seek_dir is an enumeration (ios::seekdir in standard C++, available through <iostream>, formerly iostream.h) that has the following values.

ios :: beg             // refers to beginning of the file
ios :: cur            // refers to current position in the file
ios :: end            // refers to end of the file

For example,

fin.seekg(30, ios :: beg) ;         // go to byte no. 30 from beginning of the file linked with fin.
fin.seekg(-2, ios :: cur) ;         // back up 2 bytes from current.
fin.seekg(0,  ios :: end) ;          // go to the end of the file.
fin.seekg(-5, ios :: end) ;         // back up 5 bytes from end of the file.

The methods tellp ( ) and tellg ( ) return the position (as a byte number) of the put_pointer in an output file and of the get_pointer in an input file, respectively.

Error handling during file I/O

Errors can occur during file operations. For instance, a file being opened for reading might not exist, a file name used for a new file may already exist, or an attempt could be made to read past the end-of-file. To check for such errors and to ensure smooth processing, C++ file streams inherit stream-state members from the ios class that store information on the status of the file currently in use. The current state of the I/O system is held in an integer, in which the following flags are encoded :

Stream-state flags[7]

There are several error handling functions supported by class ios that help you read and process the status recorded in a file stream. The following table lists these error handling functions and their meaning :

Error handling functions[8]

These functions may be used in the appropriate places in a program to locate the status of a file stream and thereby take the necessary corrective measures. For example :

.
.
.
ifstream fin ;
fin.open ("Master") ;
while (! fin.fail ( ) )
{
    . . .                               // process the file.
}
if (fin.eof ( ) )
{
    . . .                              // terminate the program.
}
else if (fin.bad ( ) )
{
    . . .                              // report fatal error.
}
else 
{
    fin.clear ( ) ;                    // clear error - state flags
    . . . 
}
.
.
.

Detecting EOF

You can detect when the end-of-file is reached by using the member function eof ( ), which in standard C++ has the prototype

bool eof ( ) const ;

It returns true (non-zero) when the end-of-file has been reached; otherwise it returns false (zero). For instance,

ifstream fin ;
fin.open ("Master", ios :: in | ios :: binary) ;
while (! fin.eof ( ) )                                     // as long as eof ( ) is zero,
{                                                          // that is, the file's end is not reached.
    . . .                                                  // process the file.
}
if (fin.eof ( ) )                                          // if non-zero
    cout << "End of the file reached ! \n" ;

The above code fragment processes a file as long as EOF is not reached. It uses the eof ( ) function with the stream object to check for the file's end.
To detect the end of the file without using eof ( ), you may test the stream object itself, which evaluates to false once a read has failed (at the end of the file or because of another error), e.g.,

ifstream fin ;
 
fin.open ("Master", ios :: in | ios :: binary) ;
while (fin )
{
    . . . 
}

Data file categories

Data files come in two broad categories: open and closed.

Closed data file formats: Closed data files (frequently referred to as proprietary format files) have their metadata elements hidden, obscured or unavailable to users of the file. Application developers do this to discourage users from tampering with or corrupting the data files, or from importing the data into a competitor's application.

Open data file formats: Open data files have their internal structures available to users of the file through a process of metadata publishing. Metadata publishing implies that the structure and semantics of all the possible data elements within a file are available to users.

Examples of open data files include markup formats such as HTML for storing web pages and the XML-based SVG for storing scalable graphics.

References

  1. ^ Fig. 7.1 of Chapter 7, Data File Handling, Computer Science book by Sumita Arora
  2. ^ Fig. 7.2 of Chapter 7, Data File Handling, Computer Science book by Sumita Arora
  3. ^ Fig. 7.2 of Chapter 7, Data File Handling, Computer Science book by Sumita Arora
  4. ^ Fig. 7.2 of Chapter 7, Data File Handling, Computer Science book by Sumita Arora
  5. ^ Fig. 7.2 of Chapter 7, Data File Handling, Computer Science book by Sumita Arora
  6. ^ Fig. 7.2 of Chapter 7, Data File Handling, Computer Science book by Sumita Arora
  7. ^ Fig. 7.2 of Chapter 7, Data File Handling, Computer Science book by Sumita Arora
  8. ^ Fig. 7.2 of Chapter 7, Data File Handling, Computer Science book by Sumita Arora
  • Computer Science book for class XII by Sumita Arora, Dhanpat Rai & Co., sixth edition, 2009-2010.

See also