||This article includes a list of references, but its sources remain unclear because it has insufficient inline citations. (February 2013)|
Data migration is the process of transferring data between storage types, formats, or computer systems. It is a key consideration for any system implementation, upgrade, or consolidation. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks. Data migration occurs for a variety of reasons, including: Server or storage equipment replacements or upgrades; Website consolidation; Server maintenance; and Data center relocation.
To achieve an effective data migration procedure, data on the old system is mapped to the new system providing a design for data extraction and data loading. The design relates old data formats to the new system's formats and requirements. Programmatic data migration may involve many phases but it minimally includes data extraction where data is read from the old system and data loading where data is written to the new system.
If a decision has been made to provide a set input file specification for loading data onto the target system, this allows a pre-load 'data validation' step to be put in place, interrupting the standard E(T)L process. Such a data validation process can be designed to interrogate the data to be transferred, to ensure that it meets the predefined criteria of the target environment, and the input file specification. An alternative strategy is to have on-the-fly data validation occurring at the point of loading, which can be designed to report on load rejection errors as the load progresses. However, in the event that the extracted and transformed data elements are highly 'integrated' with one another, and the presence of all extracted data in the target system is essential to system functionality, this strategy can have detrimental, and not easily quantifiable effects.
After loading into the new system, results are subjected to data verification to determine whether data was accurately translated, is complete, and supports processes in the new system. During verification, there may be a need for a parallel run of both systems to identify areas of disparity and forestall erroneous data loss.
Automated and manual data cleaning is commonly performed in migration to improve data quality, eliminate redundant or obsolete information, and match the requirements of the new system.
Data migration phases (design, extraction, cleansing, load, verification) for applications of moderate to high complexity are commonly repeated several times before the new system is deployed.
Data is stored on various media in files or databases, and is generated and consumed by software applications which in turn support business processes. The need to transfer and convert data can be driven by multiple business requirements and the approach taken to the migration depends on those requirements. Four major migration categories are proposed on this basis.
A business may choose to rationalize the physical media to take advantage of more efficient storage technologies. This will result in having to move physical blocks of data from one tape or disk to another, often using virtualization techniques. The data format and content itself will not usually be changed in the process and can normally be achieved with minimal or no impact to the layers above.
Similarly, it may be necessary to move from one database vendor to another, or to upgrade the version of database software being used. The latter case is less likely to require a physical data migration, but this can happen with major upgrades. In these cases a physical transformation process may be required since the underlying data format can change significantly. This may or may not affect behavior in the applications layer, depending largely on whether the data manipulation language or protocol has changed – but modern applications are written to be agnostic to the database technology so that a change from Sybase, MySQL, DB2 or SQL Server to Oracle should only require a testing cycle to be confident that both functional and non-functional performance has not been adversely affected.
Changing application vendor – for instance a new CRM or ERP platform – will inevitably involve substantial transformation as almost every application or suite operates on its own specific data model. Further, to allow the application to be sold to the widest possible market, commercial off-the-shelf packages are generally configured for each customer using metadata. Application programming interfaces (APIs) are supplied to protect the integrity of the data they have to handle. Use of the API is normally a condition of the software warranty, although a waiver may be allowed if the vendor's own or certified partner professional services and all tools are used.
Business process migration
Business processes operate through a combination of human and application systems actions, often orchestrated by business process management tools. When these change they can require the movement of data from one store, database or application to another to reflect the changes to the organization and information about customers, products and operations. Examples of such migration drivers are mergers and acquisitions, business optimization and reorganization to attack new markets or respond to competitive threat.
The first two categories of migration are usually routine operational activities that the IT department takes care of without the involvement of the rest of the business. The last two categories directly affect the operational users of processes and applications, are necessarily complex, and delivering them without significant business downtime can be challenging. A highly adaptive approach, concurrent synchronization, a business-oriented audit capability and clear visibility of the migration for stakeholders are likely to be key requirements in such migrations.
Project versus process
Further, it is helpful to distinguish between data migration and data integration activities. Data migration is a project where data will be moved or copied from one environment to another, and removed or decommissioned in the source. During the migration (which can take place over months or even years), data can flow in multiple directions, and there may be multiple migrations taking place simultaneously. The Extract, Transform, Load actions will be necessary, although the means of achieving these may not be those traditionally associated with the ETL acronym.
Data integration by contrast is a permanent part of the IT architecture, and is responsible for the way data flows between the various applications and data stores - and is a process rather than a project activity. Standard ETL technologies designed to supply data from operational systems to data warehouses would fit within the latter category.
Migration as a form of digital preservation
Migration, which focuses on the digital object itself, is the act of transferring, or rewriting data from an out-of-date medium to a current medium and has for many years been considered the only viable approach to long-term preservation of digital objects. Reproducing brittle newspapers onto microfilm is an example of such migration.
- Migration addresses the possible obsolescence of the data carrier, but does not address the fact that certain technologies which run the data may be abandoned altogether, leaving migration useless.
- Time consuming – migration is a continual process, which must be repeated every time a medium reaches obsolescence, for all data objects stored on a certain media.
- Costly - an institution must purchase additional data storage media at each migration.
As a result of the disadvantages listed above, technology professionals have begun to develop alternatives to migration, such as emulation.
- Janssen C, Data migration, http://www.techopedia.com/definition/1180/data-migration (retrieved 12 August 2013)
- van der Hoeven, Jeffrey, Bram Lohman, and Remco Verdegem. "Emulation for Digital Preservation in Practice: The Results." The International Journal of Digital Curation 2.2 (2007): 123-132.
- Muira, Gregory. "Pushing the Boundaries of Traditional Heritage Policy: maintaining long-term access to multimedia content." IFLA Journal 33 (2007): 323-326.