Elephant flow

Figure: Percentage of all traffic in a daily trace carried by the top 10 flows on a T-1 line between the US and Japan, December 2001 to May 2007. The median number of flows per day is about 350,000.

In computer networking, an elephant flow is an extremely large (in total bytes) continuous flow set up by TCP (or another protocol) and measured over a network link. Elephant flows, though not numerous, can occupy a disproportionate share of the total bandwidth over a period of time. It is not clear who coined the term "elephant flow", but it began to appear in published Internet network research in 2001, when researchers observed that a small number of flows carry the majority of Internet traffic while the remainder consists of a large number of flows that carry very little traffic (mice flows).[1][2] For example, Mori et al. studied traffic flows on several Japanese university and research networks.[3] On the WIDE network they found that elephant flows made up only 4.7% of all flows but accounted for 41.3% of all data transmitted during the measurement period.

The actual impact of elephant flows on Internet traffic is still an area of research and debate. Some research shows that elephant flows may be highly correlated with traffic spikes and with other elephant flows (Lan & Heidemann; Mori et al.).[4] Researchers have proposed varying definitions of an elephant flow, including a flow that occupies more than 1% of total traffic in a time period,[5] a definition based on the duration of the flow,[6] and a flow whose size exceeds the mean plus three standard deviations of traffic during the time period.[4] One of the main goals of research into elephant flows is to develop more efficient bandwidth management tools and predictive models for the Internet. For example, researchers have focused on providing better quality of service to small flows (mice flows) by de-prioritizing elephant flows.[7]
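
Two of the size-based definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary of per-flow byte counts, and the sample trace are all hypothetical, and the "mean plus three standard deviations" rule is interpreted here as a threshold over the per-flow sizes observed in one interval, which may differ from how the cited authors applied it.

```python
from statistics import mean, pstdev


def elephants_by_share(flow_bytes, share=0.01):
    """Return IDs of flows carrying more than `share` of all bytes in the interval."""
    total = sum(flow_bytes.values())
    return {fid for fid, size in flow_bytes.items() if size > share * total}


def elephants_by_mean_plus_3sd(flow_bytes):
    """Return IDs of flows larger than mean + 3 standard deviations of flow size.

    Assumption: the statistics are taken over the per-flow byte counts.
    """
    sizes = list(flow_bytes.values())
    threshold = mean(sizes) + 3 * pstdev(sizes)
    return {fid for fid, size in flow_bytes.items() if size > threshold}


# Made-up trace: 99 small flows of 1,000 bytes and one flow of 1,000,000 bytes.
flows = {f"mouse-{i}": 1_000 for i in range(99)}
flows["elephant"] = 1_000_000

print(elephants_by_share(flows))
print(elephants_by_mean_plus_3sd(flows))
```

On this made-up trace both rules single out the one large flow. The two rules need not agree in general: the 1% rule depends only on a flow's share of the total, while the deviation-based rule depends on how the sizes are spread, and with very few flows in the interval its threshold may never be exceeded.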

Elephant flows can also be viewed from the perspective of a network appliance such as an intrusion prevention system (IPS). In this context the number of bytes in the flow is less significant than the instantaneous processing load required to service the flow, where the processing load depends on the IPS configuration (how much work the IPS is supposed to do) and the byte rate (flow throughput). An elephant flow could thus be defined as a flow that exceeds a given total service time within a particular time interval.

For example, if just a single CPU core is used to process a flow, an elephant flow could be considered any flow for which the processing load exceeds the capacity of that core. Exceeding capacity could in turn be detected through dropped packets or excess latency for packets transiting the device. Lower thresholds can be applied and more cores can be used, but the basic concept of required processing load relative to processing capacity holds.

To see how this differs from simply looking at the total bytes in a flow, consider two flows F1 and F2 with N1 and N2 total bytes respectively, where N2 = 1000*N1. It is possible that F1 is an elephant flow while F2 is not, for example if the required inspection of F1 is more complex than that of F2, if the rate of F1 is much greater than the rate of F2, or both.
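
The comparison of F1 and F2 can be made concrete with a minimal Python sketch. It assumes a simple load model that is not taken from any particular IPS: the load a flow places on one CPU core is its byte rate multiplied by a per-byte inspection cost, and the flow counts as an elephant when that product exceeds the core's capacity. The class, field names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    total_bytes: int          # N: total bytes carried by the flow
    rate_bytes_per_s: float   # flow throughput
    cost_s_per_byte: float    # assumed inspection cost, set by the IPS configuration


def core_utilization(flow):
    """Seconds of processing needed per second of wall-clock time on one core."""
    return flow.rate_bytes_per_s * flow.cost_s_per_byte


def is_elephant(flow, capacity=1.0):
    """True if the flow needs more processing than one core (capacity 1.0) provides."""
    return core_utilization(flow) > capacity


# F1: small in total bytes, but fast and expensive to inspect.
f1 = Flow("F1", total_bytes=1_000_000, rate_bytes_per_s=100e6, cost_s_per_byte=20e-9)
# F2: 1000 times more bytes in total, but slow and cheap to inspect.
f2 = Flow("F2", total_bytes=1_000_000_000, rate_bytes_per_s=1e6, cost_s_per_byte=5e-9)

for f in (f1, f2):
    print(f.name, f.total_bytes, is_elephant(f))
```

With these illustrative numbers F1 needs roughly two cores' worth of processing and is classified as an elephant, while F2 needs well under one percent of a core despite carrying 1000 times as many bytes. Lowering the `capacity` argument corresponds to applying a lower threshold.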

References

  1. Fang, W.; Peterson, L. "Inter-AS traffic patterns and their implications". Global Telecommunications Conference, GLOBECOM '99 (3): 1859–1868. Archived from the original on 2015-05-05.
  2. Guo, Liang; Matta, I. (11–14 November 2001). The War Between Mice and Elephants (PDF). Dept. of Comput. Sci., Boston Univ., MA, USA. pp. 180–188. CiteSeerX 10.1.1.28.7225. doi:10.1109/ICNP.2001.992898. ISBN 978-0-7695-1429-1.
  3. Mori, T.; Kawahara, R.; Naito, S.; Goto, S. (2004). On the characteristics of Internet traffic variability: spikes and elephants. Applications and the Internet Proceedings. 2004 International Symposium on Applications and the Internet. pp. 99–106. doi:10.1109/SAINT.2004.1266104. ISBN 978-0-7695-2068-1.
  4. Lan, K.; Heidemann, J. (2003). "On the correlation of internet flow characteristics" (PDF). Technical Report ISI-TR-574. Archived from the original (PDF) on 2010-05-28. Retrieved 2011-01-21.
  5. Estan, C.; Varghese, G. (November 2001). "New directions in traffic measurement and accounting" (PDF). Proceeding of ACM SIGCOMM Internet Measurement Workshop 2001, San Francisco Bay Area. Archived from the original (PDF) on 2016-03-06.
  6. Papagiannaki, K.; Taft, N.; Bhattacharyya, S.; Thiran, P.; Salamatian, K.; Diot, C. (November 2002). A Pragmatic Definition of Elephants in Internet Backbone Traffic. Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement. pp. 175–176. doi:10.1145/637201.637227. ISBN 978-1581136036.
  7. Divakaran, Dinil Mon; Altman, Eitan; Primet, Pascale Vicat-Blanc (June 2011). Size-Based Flow-Scheduling Using Spike-Detection. Proceedings of 18th International Conference on Analytical and Stochastic Modeling Techniques and Applications - ASMTA 2011, Venice, Italy. Lecture Notes in Computer Science. 6751. pp. 331–345. doi:10.1007/978-3-642-21713-5_24. ISBN 978-3-642-21712-8.