The SPIKE algorithm deals with a linear system AX = F, where A is a banded matrix of bandwidth much less than , and F is an matrix containing right-hand sides. It is divided into a preprocessing stage and a postprocessing stage.
Each of these stages can be accomplished in several ways, allowing a multitude of variants. Two notable variants are the recursive SPIKE algorithm for non-diagonally-dominant cases and the truncated SPIKE algorithm for diagonally-dominant cases. Depending on the variant, a system can be solved either exactly or approximately. In the latter case, SPIKE is used as a preconditioner for iterative schemes like Krylov subspace methods and iterative refinement.
The first step of the preprocessing stage is to factorize the diagonal blocks Aj. For numerical stability, one can use LAPACK's XGBTRF routines to LU factorize them with partial pivoting. Alternatively, one can also factorize them without partial pivoting but with a "diagonal boosting" strategy. The latter method tackles the issue of singular diagonal blocks.
In concrete terms, the diagonal boosting strategy is as follows. Let 0ε denote a configurable "machine zero". In each step of LU factorization, we require that the pivot satisfy the condition
|pivot| > 0ε‖A‖1.
If the pivot does not satisfy the condition, it is then boosted by
where ε is a positive parameter depending on the machine's unit roundoff, and the factorization continues with the boosted pivot. This can be achieved by modified versions of ScaLAPACK's XDBTRF routines. After the diagonal blocks are factorized, the spikes are computed and passed on to the postprocessing stage.
Assume that p is a power of two, i.e., p = 2d. Consider a block diagonal matrix
D̃1 = diag(D̃ 1,...,D̃ p/2)
for k = 1,...,p/2. Notice that D̃1 essentially consists of diagonal blocks of order 4m extracted from S̃. Now we factorize S̃ as
S̃ = D̃1S̃2.
The new matrix S̃2 has the form
Its structure is very similar to that of S̃2, only differing in the number of spikes and their height (their width stays the same at m). Thus, a similar factorization step can be performed on S̃2 to produce
S̃2 = D̃2S̃3
S̃ = D̃1D̃2S̃3.
Such factorization steps can be performed recursively. After d − 1 steps, we obtain the factorization
S̃ = D̃1⋯D̃d−1S̃d,
where S̃d has only two spikes. The reduced system will then be solved via
X̃ = S̃−1 dD̃−1 d−1⋯D̃−1 1G̃.
The block LU factorization technique in the two-partition case can be used to handle the solving steps involving D̃1, ..., D̃d−1 and S̃d for they essentially solve multiple independent systems of generalized two-partition forms.
Generalization to cases where p is not a power of two is almost trivial.
The first SPIKE partitioning and algorithm was presented in  and was designed as the means to improve the stability properties of a parallel Givens rotations-based solver for tridiagonal systems. A version of the algorithm, termed g-Spike, that is based on serial Givens rotations applied independently on each block was designed for the NVIDIA GPU . A SPIKE-based algorithm for the GPU that is based on a special block diagonal pivoting strategy is described in .
The SPIKE algorithm can also function as a preconditioner for iterative methods for solving linear systems. To solve a linear system Ax = b using a SPIKE-preconditioned iterative solver, one extracts center bands from A to form a banded preconditioner M and solves linear systems involving M in each iteration with the SPIKE algorithm.
In order for the preconditioner to be effective, row and/or column permutation is usually necessary to move "heavy" elements of A close to the diagonal so that they are covered by the preconditioner. This can be accomplished by computing the weighted spectral reordering of A.
The SPIKE algorithm can be generalized by not restricting the preconditioner to be strictly banded. In particular, the diagonal block in each partition can be a general matrix and thus handled by a direct general linear system solver rather than a banded solver. This enhances the preconditioner, and hence allows better chance of convergence and reduces the number of iterations.
Intel offers an implementation of the SPIKE algorithm under the name Intel Adaptive Spike-Based Solver. Tridiagonal solvers have also been developed for the NVIDIA GPU
 and the Xeon Phi co-processors. The method in  is the basis for a tridiagonal solver in the cuSPARSE library. The Givens rotations based solver was also implemented for the
GPU and the Intel Xeon Phi.
^ Sameh, A. H.; Kuck, D. J. (1978). "On Stable Parallel Linear System Solvers". Journal of the ACM. 25: 81. doi:10.1145/322047.322054.
^ Venetis, I.E.; Kouris, A.; Sobczyk, A.; Gallopoulos, E.; Sameh, A. H. (2015). "A direct tridiagonal solver based on Givens rotations for GPU architectures". Parallel Computing. 25: 101–116. doi:10.1016/j.parco.2015.03.008.
^ Chang, L.-W.; Stratton, J.; Kim, H.; Hwu, W.-M. (2012). "A scalable, numerically stable, high-performance tridiagonal solver using GPUs". Proc. Int'l. Conf. High Performance Computing, Networking Storage and Analysis (SC'12). Los Alamitos, CA, USA: IEEE Computer Soc. Press: 27:1–27:11. ISBN978-1-4673-0804-5.