|This article needs additional citations for verification. (February 2009)|
|Type of format||Executable|
In many computer operating systems, a COM file is a type of executable file; the name is derived from the file name extension .COM. Originally, the term stood for "Command file", a text file containing commands to be issued to the operating system (similar to a DOS batch file), on many of the Digital Equipment Corporation mini and mainframe operating systems going back to the 1970s.
With the introduction of microcomputers, the type of files commonly associated with the extension .com changed; in 8-bit CP/M, and later in MS-DOS and compatible DOSes, they are executable files, typically implementing a command. However, executables in the COM file format do not necessarily need to have the file name extension .COM in any but CP/M and very early versions of MS-DOS.
The .COM file name extension has no relation to the .com (for "commercial") top-level Internet domain name. However, this similarity in name has been exploited by malicious computer virus writers.
MS-DOS binary format
The COM format is the original binary executable format used in CP/M and MS-DOS. It is very simple; it has no header (with the exception of CP/M 3 files), and contains no metadata, only code and data. This simplicity exacts a price: the binary has a maximum size of 65,280 (FF00h) bytes (256 bytes short of 64 KiB) and stores all its code and data in one segment.
Since it lacks relocation information, it is loaded by the operating system at a pre-set address, at offset 0100h immediately following the PSP, where it is executed (hence the limitation of the executable's size). This was not an issue on early 8-bit machines because of how the segmentation model works, but it is the main reason why the format fell into disuse soon after the introduction of 16- and then 32-bit processors with their much larger, segmented memory.
In the Intel 8080 CPU architecture, only 65,536 bytes of memory could be addressed (address range 0000h to FFFFh). Under CP/M, the first 256 bytes of this memory, from 0000h to 00FFh were reserved for system use by the zero page, and any user program had to be loaded at exactly 0100h to be executed. COM files fit this model perfectly. Note that before the introduction of MP/M and Concurrent CP/M, there was no possibility of running more than one program or command at a time: the program loaded at 0100h was run, and no other.
Although the file format is the same in MS-DOS and CP/M, .COM files for the two operating systems are not compatible; MS-DOS COM files contain x86 instructions and possibly MS-DOS system calls, while CP/M COM files contain 8080 instructions (programs restricted to certain machines could also contain additional instructions for 8085 or Z80) and CP/M system calls.
It is possible to make a .COM file which will run under both operating systems, a "fat binary". There is no true compatibility at the instruction level; the instructions at the entry point are chosen to be equal in functionality but different in both operating systems, and make program execution jump to the section for the operating system in use. It is basically two different programs with the same functionality in a single file, preceded by code selecting the one to use.
Under CP/M 3, if the first byte of a COM file is C9h, there is a 256-byte header; since C9h corresponds to the 8080 instruction
RET, this means that the COM file will immediately terminate if run on an earlier version of CP/M that does not support this extension. (Because the instruction sets of the 8085 and Z80 are supersets of the 8080 instruction set, this works on all three processors.) C9h is an invalid opcode on the 8088/8086, and it will cause an INT 6 exception in v86 mode since the 386. Since C9h is the opcode for LEAVE since the 80188/80186 and therefore not used as the first instruction in a valid program, the executable loader in some versions of DOS rejects COM files that start with C9h, avoiding a crash.
Files may have names ending in .COM, but not be in the simple format described above; this is indicated by a magic number at the start of the file. For example, the COMMAND.COM file in DR DOS 6.0 is actually in DOS executable format, indicated by the first two bytes being MZ (4Dh 5Ah), the initials of Mark Zbikowski.
In MS-DOS and compatible DOSes, there is no memory management provided for COM files by the loader or execution environment. All memory is simply available to the COM file. After execution, the operating system command shell, COMMAND.COM, is reloaded. This leaves the possibilities that the COM file can either be very simple, using a single segment, or arbitrarily complex, providing its own memory management system. An example of a complex program is COMMAND.COM, the MS-DOS shell, which provided a loader to load other COM or EXE programs. In the .COM system, larger programs (up to the available memory size) can be loaded and run, but the system loader assumes that all code and data is in the first segment, and it is up to the .COM program to provide any further organization. Programs larger than available memory, or large data segments, can be handled by dynamic linking, if the necessary code is included in the .COM program. The advantage of using the .COM rather than .EXE format is that the binary image is usually smaller and easier to program using an assembler. Once compilers and linkers of sufficient power became available, it was no longer advantageous to use the .COM format for complex programs.
In MS-DOS, if a directory contains both a COM file and an EXE file with same name (not including extension), the COM file is preferentially selected for execution. For example, if a directory in the system path contains two files named
foo.exe, the following would execute
A user wishing to run
foo.exe can explicitly use the complete filename:
Taking advantage of this default behaviour, virus writers and other malicious programmers have used names like
notepad.com for their creations, hoping that if it is placed in the same directory as the corresponding EXE file, a command or batch file may accidentally trigger their program instead of the text editor
On Windows NT and derivatives (Windows 2000, Windows XP, Windows Vista, and Windows 7), the PATHEXT variable is used to override the order of preference (and acceptable extensions) for calling files without specifying the extension from the command line. The default value still places
.com files before
.exe files. This closely resembles a feature previously found in JP Software's line of extended command line processors 4DOS, 4OS2, and 4NT.
The format is still executable on many modern Windows-based platforms, but it is run in an MS-DOS-emulating subsystem, NTVDM, which is not present in 64-bit variants. COM files can be executed also on DOS emulators such as DOSBox, on any platform supported by these emulators.
Malicious usage of the .com extension
Some computer virus writers have hoped to take advantage of modern computer users' likely lack of knowledge of the .com file extension and associated binary format, along with their more likely familiarity with the .com Internet domain name. E-mail has been sent with attachment names similar to "www.example.com". Unwary Microsoft Windows users clicking on such an attachment would expect to begin browsing a site named
http://www.example.com/, but instead would run the attached binary command file named
www.example, giving it full permission to do to their machine whatever its author had in mind.
Note that there is nothing malicious about the COM file format itself; this is an exploitation of the coincidental name collision between .com command files and .com commercial web sites.
- John Elliott's article on the extended CP/M-80 3.0 COM file header
- COM 101 - a DOS executable walkthrough