Fork–join queue

From Wikipedia, the free encyclopedia

A fork–join queueing node

In queueing theory, a discipline within the mathematical theory of probability, a fork–join queue is a queue where incoming jobs are split on arrival for service by numerous servers and joined before departure.[1] The model is often used for parallel computations[2] or systems where products need to be obtained simultaneously from different suppliers (in a warehouse or manufacturing setting).[3]: 78–80  The key quantity of interest in this model is usually the time taken to service a complete job. The model has been described as a "key model for the performance analysis of parallel and distributed systems."[4] Few analytical results exist for fork–join queues, but various approximations are known.

The situation where jobs arrive according to a Poisson process and service times are exponentially distributed is sometimes referred to as a Flatto–Hahn–Wright model or FHW model.[5][6][7]


On arrival at the fork point, a job is split into N sub-jobs which are served by each of the N servers. After service, sub-job wait until all other sub-jobs have also been processed. The sub-jobs are then rejoined and leave the system.[3]

For the fork–join queue to be stable the input rate must be strictly less than sum of the service rates at the service nodes.[8]


Fork–join queues have been used to model zoned RAID systems,[9] parallel computations[2] and for modelling order fulfilment in warehouses.[3]

Response time[edit]

The response time (or sojourn time[10]) is the total amount of time a job spends in the system.


Ko and Serfozo give an approximation for the response time distribution when service times are exponentially distributed and jobs arrive either according to a Poisson process[11] or a general distribution.[12] QIu, Pérez and Harrison give an approximation method when service times have a phase-type distribution.[13]

Average response time[edit]

An exact formula for the average response time is only known in the case of two servers (N=2) with exponentially distributed service times (where each server is an M/M/1 queue). In this situation, the response time (total time a job spends in the system) is[14]


  • is the utilization.
  • is the arrival rate of jobs to all the nodes.
  • is the service rate across all the nodes.

In the situation where nodes are M/M/1 queues and N > 2, Varki's modification of mean value analysis can also be used to give an approximate value for the average response time.[15]

For general service times (where each node is an M/G/1 queue) Baccelli and Makowski give bounds for the average response time and higher moments of this quantity both in the transient and steady state situations.[16] Kemper and Mandjes show that for some parameters these bounds are not tight and show demonstrate an approximation technique.[10] For heterogeneous fork-join queues (fork-join queues with different service times), Alomari and Menasce propose an approximation based on harmonic numbers that can be extended to cover more general cases such as probabilistic fork, open and closed fork-join queues.[17]

Subtask dispersion[edit]

The subtask dispersion, defined to be the range of service times, can be numerically computed and optimal deterministic delays introduced to minimize the range.[18]

Stationary distribution[edit]

In general the stationary distribution of the number of jobs at each queue is intractable.[11] Flatto considered the case of two servers (N=2) and derived the stationary distribution for the number of jobs at each queue via uniformization techniques.[5] Pinotsi and Zazanis show that a product form solution exists when arrivals are deterministic as the queue lengths are then independent D/M/1 queues.[7]

Heavy traffic/diffusion approximation[edit]

When the server is heavily loaded (service rate of the queue is only just larger than arrival rate) the queue length process can be approximated by a reflected Brownian motion which converges to the same stationary distribution as the original queueing process.[19][20] Under limiting conditions the state space of the synchronisation queues collapses and all queues behave identically.[21]

Join queue distribution[edit]

Once jobs are served, the parts are reassembled at the join queue. Nelson and Tantawi published the distribution of the join queue length in the situation where all servers have the same service rate.[14] Heterogeneous service rates and distribution asymptotic analysis are considered by Li and Zhao.[22]

Networks of fork–join queues[edit]

An approximate formula can be used to calculate the response time distribution for a network of fork–join queues joined in series (one after the other).[23]

Split–merge model[edit]

A related model is the split–merge model, for which analytical results exist.[2][24] Exact results for the split-merge queue are given by Fiorini and Lipsky.[25] Here on arrival a job is split into N sub-tasks which are serviced in parallel. Only when all the tasks finish servicing and have rejoined can the next job start. This leads to a slower response time on average.

Generalized (n,k) fork-join system[edit]

A generalization of the fork-join queueing system is the fork-join system where the job exits the system when any out of tasks are served. The traditional fork-join queueing system is a special case of the system when . Bounds on the mean response time of this generalized system were found by Joshi, Liu and Soljanin.[26][27]


  1. ^ Kim, C.; Agrawala, A. K. (1989). "Analysis of the fork-join queue". IEEE Transactions on Computers. 38 (2): 250. doi:10.1109/12.16501.
  2. ^ a b c Lebrecht, Abigail; Knottenbelt, William J. (June 2007). Response Time Approximations in Fork-Join Queues (PDF). 23rd Annual UK Performance Engineering Workshop (UKPEW). Archived from the original (PDF) on 29 October 2013. Retrieved 2 July 2009.
  3. ^ a b c Serfozo, R. (2009). "Markov Chains". Basics of Applied Stochastic Processes. Probability and Its Applications. pp. 1–98. doi:10.1007/978-3-540-89332-5_1. ISBN 978-3-540-89331-8.
  4. ^ Boxma, Onno; Koole, Ger; Liu, Zhen (1996). Queueing-theoretic Solution Methods for Models of Parallel and Distributed Systems (PDF) (Technical report). CWI. BS-R9425.
  5. ^ a b Flatto, L.; Hahn, S. (1984). "Two Parallel Queues Created by Arrivals with Two Demands I". SIAM Journal on Applied Mathematics. 44 (5): 1041. doi:10.1137/0144074.
  6. ^ Wright, Paul E. (1992), "Two parallel processors with coupled inputs", Advances in Applied Probability, 24 (4): 986–1007, doi:10.2307/1427722, JSTOR 1427722, S2CID 124774848
  7. ^ a b Pinotsi, D.; Zazanis, M. A. (2005). "Synchronized queues with deterministic arrivals". Operations Research Letters. 33 (6): 560. doi:10.1016/j.orl.2004.12.005.
  8. ^ Konstantopoulos, Panagiotis; Walrand, Jean (September 1989). "Stationary and Stability of Fork-Join Networks" (PDF). Journal of Applied Probability. 26 (3): 604–614. doi:10.2307/3214417. JSTOR 3214417. S2CID 120222029. Archived from the original (PDF) on 18 March 2012.
  9. ^ Lebrecht, A. S.; Dingle, N. J.; Knottenbelt, W. J. (2009). "Modelling Zoned RAID Systems Using Fork-Join Queueing Simulation". Computer Performance Engineering. Lecture Notes in Computer Science. Vol. 5652. pp. 16–29. CiteSeerX doi:10.1007/978-3-642-02924-0_2. ISBN 978-3-642-02923-3.
  10. ^ a b Kemper, B.; Mandjes, M. (2011). "Mean sojourn times in two-queue fork-join systems: Bounds and approximations". OR Spectrum. 34 (3): 723. doi:10.1007/s00291-010-0235-y.
  11. ^ a b Ko, S. S.; Serfozo, R. F. (2004). "Response times in M/M/s fork-join networks". Advances in Applied Probability. 36 (3): 854. doi:10.1239/aap/1093962238. S2CID 122581916.
  12. ^ Ko, S. S.; Serfozo, R. F. (2008). "Sojourn times in G/M/1 fork‐join networks". Naval Research Logistics. 55 (5): 432. doi:10.1002/nav.20294. S2CID 119551482.
  13. ^ Qiu, Zhan; Pérez, Juan F.; Harrison, Peter G. (2015). "Beyond the mean in fork-join queues: Efficient approximation for response-time tails". Performance Evaluation. 91: 99–116. doi:10.1016/j.peva.2015.06.007.
  14. ^ a b Nelson, R.; Tantawi, A. N. (1988). "Approximate analysis of fork/join synchronization in parallel queues". IEEE Transactions on Computers. 37 (6): 739. doi:10.1109/12.2213.
  15. ^ Varki, Elizabeth; Merchant, Arif; Chen, H. "M/M/1 Fork-join queue with variable sub-tasks" (PDF). Archived from the original (PDF) on 5 August 2010. Retrieved 29 March 2009.
  16. ^ Baccelli, François; Makowski, A. (1985), Simple computable bounds for the fork-join queue (PDF), National Institute for Research in Computer Science and Control Technical Report, retrieved 8 July 2011
  17. ^ Alomari, F.; Menasce, D. A. (2013). "Efficient Response Time Approximations for Multiclass Fork and Join Queues in Open and Closed Queuing Networks". IEEE Transactions on Parallel and Distributed Systems. 25 (6): 1437–1446. doi:10.1109/TPDS.2013.70. S2CID 422296.
  18. ^ Tsimashenka, I.; Knottenbelt, W. J. (2013). "Reduction of Subtask Dispersion in Fork-Join Systems" (PDF). Computer Performance Engineering. Lecture Notes in Computer Science. Vol. 8168. pp. 325–336. CiteSeerX doi:10.1007/978-3-642-40725-3_25. ISBN 978-3-642-40724-6.
  19. ^ Tan, X.; Knessl, C. (1996). "A fork-join queueing model: Diffusion approximation, integral representations and asymptotics". Queueing Systems. 22 (3–4): 287. doi:10.1007/BF01149176. S2CID 206789463.
  20. ^ Varma, Subir (1990). "Heavy and Light Traffic Approximations for Queues with Synchronization Constraints (PhD thesis)" (PDF). University of Maryland. Retrieved 10 February 2013.
  21. ^ Atar, R.; Mandelbaum, A.; Zviran, A. (2012). "Control of Fork-Join Networks in heavy traffic" (PDF). 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton). p. 823. doi:10.1109/Allerton.2012.6483303. ISBN 978-1-4673-4539-2. S2CID 18115820.
  22. ^ Li, J.; Zhao, Y. Q. (2010). "On the Probability Distribution of Join Queue Length in a Fork-Join Model". Probability in the Engineering and Informational Sciences. 24 (4): 473. doi:10.1017/S0269964810000112. S2CID 124693767.
  23. ^ Ko, S. S. (2007). "Cycle Times in a Serial Fork-Join Network". Computational Science and Its Applications – ICCSA 2007. Lecture Notes in Computer Science. Vol. 4705. pp. 758–766. doi:10.1007/978-3-540-74472-6_62. ISBN 978-3-540-74468-9.
  24. ^ Harrison, P.; Zertal, S. (2003). "Queueing Models with Maxima of Service Times". Computer Performance Evaluation. Modelling Techniques and Tools. Lecture Notes in Computer Science. Vol. 2794. pp. 152–168. doi:10.1007/978-3-540-45232-4_10. ISBN 978-3-540-40814-7.
  25. ^ Fiorini, Pierre M. (2015). "Exact Analysis of Some Split Merge Queues". SIGMETRICS Performance Evaluation Review. 43 (2): 51–53. doi:10.1145/2825236.2825257. S2CID 26219594.
  26. ^ Joshi, Gauri; Liu, Yanpei; Soljanin, Emina (October 2012). Coding for Fast Content Download. Allerton Conference on Communication, Control and Computing. arXiv:1210.3012. Bibcode:2012arXiv1210.3012J.
  27. ^ Joshi, Gauri; Liu, Yanpei; Soljanin, Emina (May 2014). On the Delay-Storage trade-off in Content Download from Coded Distributed Storage. Journal on Selected Areas of Communications. arXiv:1305.3945. Bibcode:2013arXiv1305.3945J.