It is especially suitable for computers laid out in an N × N mesh. While Cannon's algorithm works well in homogeneous 2D grids, extending it to heterogeneous 2D grids has been shown to be difficult.
The main advantage of the algorithm is that its storage requirements remain constant and are independent of the number of processors.
The Scalable Universal Matrix Multiplication Algorithm (SUMMA) is a more practical algorithm that requires less workspace and overcomes the need for a square 2D grid. It is used by the ScaLAPACK, PLAPACK, and Elemental libraries.
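As a rough illustration of why SUMMA does not need a square grid, the sketch below simulates its broadcast-and-update pattern in plain Python. It is a simplified model, not the implementation used by any of the libraries named above: the function name `summa_multiply`, the helper `split`, the panel width of one, and the way broadcasts are modelled as ordinary copies are all assumptions made for this example.

```python
def split(length, parts):
    """Divide range(length) into `parts` contiguous, nearly equal index ranges."""
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append(range(start, stop))
        start = stop
    return bounds


def summa_multiply(a, b, grid_rows, grid_cols):
    """Simulate a SUMMA-style product on a grid_rows x grid_cols process grid.

    Process (r, q) owns one block of C. For every index k of the inner
    dimension it receives a slice of column k of A (row broadcast) and a
    slice of row k of B (column broadcast), then applies a rank-1 update.
    """
    m, inner, n = len(a), len(b), len(b[0])
    row_blocks = split(m, grid_rows)   # rows of C owned by each process row
    col_blocks = split(n, grid_cols)   # columns of C owned by each process column
    c = [[0] * n for _ in range(m)]
    for k in range(inner):
        for r in range(grid_rows):
            # Slice of column k of A that reaches process row r (simulated broadcast).
            a_piece = {i: a[i][k] for i in row_blocks[r]}
            for q in range(grid_cols):
                # Slice of row k of B that reaches process column q (simulated broadcast).
                b_piece = {j: b[k][j] for j in col_blocks[q]}
                # Local rank-1 update of the block of C owned by process (r, q).
                for i in row_blocks[r]:
                    for j in col_blocks[q]:
                        c[i][j] += a_piece[i] * b_piece[j]
    return c


a = [[1, 2], [3, 4], [5, 6]]          # 3 x 2
b = [[1, 0, 2, 1], [0, 1, 1, 3]]      # 2 x 4
print(summa_multiply(a, b, grid_rows=2, grid_cols=3))
```

Each simulated process only ever holds its block of C plus one slice of A and one slice of B at a time, and nothing in the loop structure depends on `grid_rows` equalling `grid_cols`.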
When multiplying two N×N matrices A and B, we need N×N processing nodes P arranged in a 2D grid. Initially, node p[i][j] is responsible for the entries a[i][j] and b[i][j].
row i of matrix a is circularly shifted by i elements to the left.
col j of matrix b is circularly shifted by j elements up.
Repeat N times:
    p[i][j] multiplies its two entries and adds to running total.
    circular shift each row of a 1 element left
    circular shift each col of b 1 element up
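The steps above can be sketched as a sequential Python simulation in which each grid position stands in for one processing node. This is only an illustrative model under stated assumptions: the function name `cannon_multiply` is made up for this example, the circular shifts are performed by rebuilding lists rather than by message passing, and each node holds a single matrix entry, as in the description above.

```python
def cannon_multiply(a, b):
    """Simulate Cannon's algorithm for two N x N matrices given as lists of lists."""
    n = len(a)

    # Initial alignment: row i of a is shifted left by i,
    # column j of b is shifted up by j.
    a = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]

    # Running total held by each node p[i][j].
    c = [[0] * n for _ in range(n)]

    for _ in range(n):
        # Every node multiplies its two current entries and accumulates.
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][j] * b[i][j]

        # Circularly shift each row of a one element left
        # and each column of b one element up.
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]

    return c


print(cannon_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

After the initial alignment, node p[i][j] holds a[i][(i + j) mod N] and b[(i + j) mod N][j], so over the N rounds it sees every matching pair a[i][k], b[k][j] exactly once, which is why the running total ends up equal to entry (i, j) of the product.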
- Lynn Elliot Cannon, A cellular computer to implement the Kalman Filter Algorithm, Technical report, Ph.D. Thesis, Montana State University, 14 July 1969.
- Gupta, H.; Sadayappan, P.: Communication Efficient Matrix-Multiplication on Hypercubes, dbpubs.stanford.edu
- 4.2 Matrix Multiplication on a Distributed Memory Machine, www.phy.ornl.gov
- Ph.D. Research, graal.ens-lyon.fr. The thesis itself is not available from the archived link.
- Robert A. van de Geijn and Jerrell Watts, SUMMA: scalable universal matrix multiplication algorithm, Concurrency: Practice and Experience. Volume 9, Issue 4, pages 255–274, April 1997.