Science DMZ Network Architecture
The term Science DMZ refers to a computer subnetwork that is structured to be secure, but without the performance limits that would otherwise result from passing data through a stateful firewall. The Science DMZ is designed to handle high volume data transfers, typical with scientific and high-performance computing, by creating a special DMZ to accommodate those transfers. It is typically deployed at or near the local network perimeter, and is optimized for a moderate number of high-speed flows, rather than for general-purpose business systems or enterprise computing.
The term Science DMZ was coined by collaborators at the US Department of Energy's ESnet in 2010. A number of universities and laboratories have deployed or are deploying a Science DMZ. In 2012 the National Science Foundation funded the creation or improvement of Science DMZs on several university campuses in the United States.
The Science DMZ is a network architecture to support Big Data. The so-called information explosion has been discussed since the mid 1960s, and more recently the term data deluge has been used to describe the exponential growth in many types of data sets. These huge data sets, often need to be copied from one location to another using the Internet. The movement of data sets of this magnitude in a reasonable amount of time should be possible on modern networks. For example, it should only take less than 4 hours to transfer 10 TeraBytes of data on a 10 Gigabit Ethernet network path, assuming disk performance is adequate The problem is that this requires networks that are free from packet loss and middleboxes such as traffic shapers or firewalls that slow network performance.
Most businesses and other institutions use a firewall to protect their internal network from malicious attacks originating from outside. All traffic between the internal network and the external Internet must pass through a firewall, which discards traffic likely to be harmful.
A stateful firewall tracks the state of each logical connection passing through it, and rejects data packets inappropriate for the state of the connection. For example, a website would not be allowed to send a page to a computer on the internal network, unless the computer had requested it. This requires a firewall to keep track of the pages recently requested, and match requests with responses.
A firewall must also analyze network traffic in much more detail, compared to other networking components, such as routers and switches. Routers only have to deal with the network layer, but firewalls must also process the transport and application layers as well. All this additional processing takes time, and limits network throughput. While routers and most other networking components can handle speeds of 100 billion bits per second (Gbps), firewalls limit traffic to about 1 Gbit/s, which is unacceptable for passing large amounts of scientific data.
While stateful firewall may be necessary for critical business data, such as financial records, credit cards, employment data, student grades, trade secrets, etc., science data requires less protection, because copies usually exists in multiple locations and there is less economic incentive to tamper.
A firewall must restrict access to the internal network but allow external access to services offered to the public, such as web servers on the internal network. This is usually accomplished by creating a separate internal network called a DMZ, a play on the term “demilitarized zone." External devices are allowed to access devices in the DMZ. Devices in the DMZ are usually maintained more carefully to reduce their vulnerability to malware. Hardened devices are sometimes called bastion hosts.
The Science DMZ takes the DMZ idea one step farther, by moving high performance computing into its own DMZ. Specially configured routers pass science data directly to or from designated devices on an internal network, thereby creating a virtual DMZ. Security is maintained by setting access control lists (ACLs) in the routers to only allow traffic to/from particular sources and destinations. Security is further enhanced by using an intrusion detection system (IDS) to monitor traffic, and look for indications of attack. When an attack is detected, the IDS can automatically update router tables, resulting in what some call a Remotely Triggered BlackHole (RTBH).
The Science DMZ provides a well-configured location for the networking, systems, and security infrastructure that supports high-performance data movement. In data-intensive science environments, data sets have outgrown portable media, and the default configurations used by many equipment and software vendors are inadequate for high performance applications. The components of the Science DMZ are specifically configured to support high performance applications, and to facilitate the rapid diagnosis of performance problems. Without the deployment of dedicated infrastructure, it is often impossible to achieve acceptable performance. Simply increasing network bandwidth is usually not good enough, as performance problems are caused by many factors, ranging from underpowered firewalls to dirty fiber optics to untuned operating systems.
The Science DMZ is the codification of a set of shared best practices—concepts that have been developed over the years—from the scientific networking and systems community. The Science DMZ model describes the essential components of high-performance data transfer infrastructure in a way that is accessible to non-experts and scalable across any size of institution or experiment.
The primary components of a Science DMZ are:
- A high performance Data Transfer Node (DTN) running parallel data transfer tools such as GridFTP
- A network performance monitoring host, such as perfSONAR
- A high performance router/switch
Optional Science DMZ components include:
- Support for layer-2 Multiprotocol Label Switching (MPLS) Virtual Private Networks (VPN)
- Support for Software Defined Networking
- Dan Goodin (June 26, 2012). "Scientists experience life outside the firewall with "Science DMZs."". Retrieved 2013-05-12.
- Eli Dart, Brian Tierney, Eric Pouyoul, Joe Breen (January 2012). "Achieving the Science DMZ" (PDF). Retrieved 2015-12-31.
- Dart, E.; Rotman, L.; Tierney, B.; Hester, M.; Zurawski, J. (2013). "The Science DMZ". Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13. p. 1. doi:10.1145/2503210.2503245. ISBN 9781450323789.
- "Why Science DMZ?". Retrieved 2013-05-12.
- Dart, Eli; Metzger, Joe (June 13, 2011). "The Science DMZ". CERN LHCOPN/LHCONE workshop. Retrieved 2013-05-26.
This is the earliest cite-able reference to the Science DMZ. Work on the concept had been going on for several years prior to this.
- "Implementation of a Science DMZ at San Diego State University to Facilitate High-Performance Data Transfer for Scientific Applications". National Science Foundation. September 10, 2012. Retrieved 2013-05-13.
- "SDNX - Enabling End-to-End Dynamic Science DMZ". National Science Foundation. September 7, 2012. Retrieved 2013-05-13.
- "Improving an existing science DMZ". National Science Foundation. September 12, 2012. Retrieved 2013-05-13.
- Dart, Eli; Rotman, Lauren (Aug 2012). "The Science DMZ: A Network Architecture for Big Data". LBNL-report.
- Brett Ryder (Feb 25, 2010). "The Data Deluge". The Economist.
- ."Network Requirements and Expectations". Lawrence Berkeley National Laboratory.
- pmoyer (Dec 13, 2012). "Research & Education Network (REN) Architecture: Science-DMZ". Retrieved 2013-05-12.
- "Science DMZ: Data Transfer Nodes". Lawrence Berkeley Laboratory. 2013-04-04. Retrieved 2013-05-13.