Segmented file transfer
Segmented file transfer (also known as multisource file transfer or swarming file transfer) is the coordinated transmission of a computer file sourced from multiple servers to a single destination. The technique can also be applied when downloading a file from a single server in several parts, as some download managers do. A computer program downloads (retrieves) different portions of the file from various sources simultaneously and assembles the file on the destination computer's data storage device.
Segmented downloads probably originated with NASA and the magnetic-tape-based file systems used on Deep Space Network craft such as those in the Voyager Program. However, from the 1960s to the 1980s, many mainframe computer users experimented extensively with uploading, downloading, and synchronizing data over bandwidth-restricted telecommunications links, so the early origins of segmented downloading are not historically clear.
Some NASA missions are understood to use a segmented downloading technique (for either file formats or data streams):
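The split-and-reassemble idea can be illustrated with a minimal sketch. It is not taken from any particular download manager: the "sources" are in-memory byte strings standing in for remote servers, and the names `split_ranges`, `fetch_segment`, and `segmented_download` are hypothetical helpers chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, segment_size):
    """Return (start, end) byte ranges, end exclusive, covering [0, size)."""
    return [(start, min(start + segment_size, size))
            for start in range(0, size, segment_size)]


def fetch_segment(source, start, end):
    """Stand-in for a network request: read bytes [start, end) from one source."""
    return start, source[start:end]


def segmented_download(sources, size, segment_size):
    """Fetch segments from the sources in round-robin order and reassemble them."""
    ranges = split_ranges(size, segment_size)
    buffer = bytearray(size)
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [
            pool.submit(fetch_segment, sources[i % len(sources)], start, end)
            for i, (start, end) in enumerate(ranges)
        ]
        # Each segment is written at its own offset, so arrival order is irrelevant.
        for future in futures:
            start, data = future.result()
            buffer[start:start + len(data)] = data
    return bytes(buffer)


original = bytes(range(256)) * 40          # 10,240 bytes of sample data
mirrors = [original, original, original]   # three sources holding the same file
result = segmented_download(mirrors, len(original), segment_size=1024)
```

In a real client, `fetch_segment` would issue a ranged request to a server or peer; the round-robin assignment here is only one of many possible scheduling choices.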
- Mars Rovers (for ICER image files)
- New Horizons (for Jupiter flyby data)
- Voyager Program (historical)
Swarmcast was the first significant peer-to-peer (P2P) content delivery system to implement a kind of segmented downloading technology. The program and protocol were invented and developed in 1999 by Justin Chapweske and sold to Opencola, which released the software under a GPL license.
Most IP networks are designed for users to download more than they upload, usually with an expected (Download:Upload) ratio of 3:1 or more.
Segmented downloading, when used by only 20% of an ISP's user base, can upset the ISP's network to a point of requiring substantial reprogramming of routers and a rethink of network design.
- Traditional web object caching technology (like the Squid proxy) is of no use here.
- Universal adoption of IPv6 cannot help either, as it only allows all users to have fixed IP addresses. Fixed IP addresses do not fully address the routing table problems associated with segmented downloading.
- Typical downloading configurations can put a single user in contact with 10 to 30 ephemeral users per file, scattered across the global Internet.
- IP router tables can become bloated with routes to these ephemeral users, slowing down table lookups.
- Large files can be made available efficiently to many other users by someone who does not have large upload bandwidth.
- Routes to the more obscure parts of the Internet can assert themselves across most of the Internet; this is especially true for dial-up users.
- Segmented downloading does save some transmission capacity, as the number of lost or redundant megabytes is minimal compared with losing a prolonged HTTP or FTP download.
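The saving in retransmitted data can be made concrete with a small calculation. The sketch below rests on simplifying assumptions that are not part of the article: the single-stream transfer cannot be resumed and restarts from byte 0, and the segmented transfer has only one segment in flight when the failure occurs. The function name `bytes_refetched` is hypothetical.

```python
MIB = 1024 * 1024


def bytes_refetched(file_size, failure_offset, segment_size=None):
    """Bytes that must be transferred again after a failure at failure_offset.

    segment_size=None models a single non-resumable transfer, which restarts
    from byte 0. Otherwise only the segment in progress is fetched again.
    """
    if not 0 <= failure_offset < file_size:
        raise ValueError("failure_offset must lie inside the file")
    if segment_size is None:
        return failure_offset
    return failure_offset % segment_size


file_size = 700 * MIB
failure_offset = 650 * MIB + 300_000   # connection drops late in the transfer

lost_single_stream = bytes_refetched(file_size, failure_offset)
lost_segmented = bytes_refetched(file_size, failure_offset, segment_size=MIB)
```

Under these assumptions the single-stream transfer repeats roughly 650 MiB, while the segmented transfer repeats only the 300,000 bytes of the interrupted segment.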
Most ISPs have learned to cope with segmented downloading technology, but coping has meant the mandatory deployment of TCP/IP traffic shaping technology.
Segmented downloading technology cannot magically solve all downloading problems. There are mathematical constraints on the effectiveness of the technology.
In particular, it cannot help a group of users that has insufficient upload bandwidth, where demand is higher than supply. Segmented downloading can, however, handle traffic peaks very well, and it can also, to some degree, let uploaders upload "more often" to make better use of their connections.
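The supply-and-demand constraint can be sketched with a deliberately simplified fair-share model, which is an assumption of this example rather than a description of any real protocol: the combined download rate of a group can never exceed the combined upload rate, so each downloader receives at most an equal share of the total upload capacity. The function name and the figures are hypothetical.

```python
def fair_share_rate(upload_rates_kbps, downloader_count, download_cap_kbps):
    """Upper bound on each downloader's rate under an equal-share assumption."""
    if downloader_count <= 0:
        raise ValueError("downloader_count must be positive")
    supply = sum(upload_rates_kbps)   # total upload capacity of the group
    return min(download_cap_kbps, supply / downloader_count)


uploaders = [256] * 10   # ten peers, each able to upload 256 kbit/s

crowded = fair_share_rate(uploaders, downloader_count=40, download_cap_kbps=3000)
quiet = fair_share_rate(uploaders, downloader_count=5, download_cap_kbps=3000)
```

With 40 downloaders the bound is 64 kbit/s each, far below the 3000 kbit/s link capacity; with 5 downloaders it rises to 512 kbit/s. Adding more segments or sources does not change the bound, because the total upload supply stays the same.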
Data integrity issues
- Very simple implementations of segmented downloading technology can often result in varying levels of file corruption, as there often is no way of knowing if all sources are actually uploading segments of the same file.
- Because of these data corruption problems, most programs that use segmented downloading apply some sort of checksum or hash algorithm to ensure file integrity (the file is received intact) and uniqueness (no parts of other, similar files are received).
- MD5 and SHA-1 hashes are usually preferred in segmented download protocols, but CRC-64-ECMA would suffice in most cases. Where only MPEG files are being sent, CRC-32-MPEG would also be acceptable.
- In the future, most segmented downloading technologies will probably use layered hashes and checksums such as WHIRLPOOL, SHA-256, SHA-512 and CRC-64-ECMA (for individual segments) to provide stronger guarantees of data integrity. MD5 and SHA-1 have been determined to be cryptographically weak with respect to protecting data integrity.
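Per-segment verification of the kind described in this list can be sketched with Python's standard `hashlib` module. The example assumes that a list of expected SHA-256 digests, one per segment, has been obtained from a trusted source in advance; how that list is distributed is outside the scope of the sketch, and the helper names `segment_digests` and `verify_segment` are hypothetical.

```python
import hashlib


def segment_digests(data, segment_size):
    """Compute one SHA-256 hex digest per fixed-size segment of data."""
    return [hashlib.sha256(data[i:i + segment_size]).hexdigest()
            for i in range(0, len(data), segment_size)]


def verify_segment(index, segment, expected_digests):
    """Return True if the received segment matches its expected digest."""
    return hashlib.sha256(segment).hexdigest() == expected_digests[index]


original = b"segmented file transfer " * 100
segment_size = 512
expected = segment_digests(original, segment_size)   # assumed to come from a trusted source

good_segment = original[512:1024]                    # segment 1 as published
bad_segment = b"X" + good_segment[1:]                # same length, first byte altered

accepted = verify_segment(1, good_segment, expected)
rejected = not verify_segment(1, bad_segment, expected)
```

A segment that fails the check can be discarded and requested again from a different source, so a single bad source does not corrupt the whole file.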
In BitTorrent and other distributed file transfer protocols there is no difference between uploading and downloading (clients can do both), nor any meaningful distinction between client and server (both are the same). Nevertheless, some segmented uploading technologies do exist.
Space-segment-based telecom systems are the only widely known cases where segmented uploading technologies have emerged. This is mainly due to limited bandwidth and other space segment constraints.
- CCSDS software uploading protocols have the capability of segmented uploading, but currently deployed systems have not needed the protocol to be used in its most BitTorrent-like capability.
- Satellite direct-to-home subscription systems deployed in Europe and North America have upgraded software on customer devices by sending only a few bytes at a time (~2k or less) over a long period. Generally, these segmented upload approaches are proprietary and related to the SIM card security and subscription mechanism.
With respect to direct-to-home TV systems using segmented uploading to outwit "hackers", only SkyTV (UK) and DirecTV (USA) have possibly been linked to having, or having used, this capability. However, one can assume that any modern MPEG2 DVB DTH mass subscriber system has the ability to accept software upgrades trickled to it at the rate of 8 kb/day or less.
- Direct Connect (file sharing)
- Download Manager