Basic Linear Algebra Subprograms

From Wikipedia, the free encyclopedia

Basic Linear Algebra Subprograms (BLAS) are a set of low-level kernel subroutines that perform common linear algebra operations such as copying, vector scaling, vector dot products, linear combinations, and matrix multiplication. They were first published in 1979. The BLAS are used to build larger linear algebra libraries such as LINPACK and LAPACK, and they allow such general libraries to exploit high-performance architectures by linking against BLAS implementations tuned for those architectures. The BLAS see wide use in high-performance computing. Highly optimized implementations of the BLAS have been developed by hardware vendors such as Intel and AMD, as well as by other authors, e.g. GotoBLAS and ATLAS (a portable self-optimizing BLAS). The LINPACK and HPL benchmarks rely heavily on DGEMM, a BLAS subroutine, for their performance measurements.

Background

With the advent of numerical programming, sophisticated subroutine libraries became useful. These libraries would contain common mathematical operations such as root finding, matrix inversion, and solving systems of equations. The language of choice was FORTRAN. An early example of such a library was IBM's Scientific Subroutine Package (SSP). These subroutine libraries allowed programmers to concentrate on their specific problems and avoid re-implementing well-known algorithms. The library routines would also be better than average implementations; matrix algorithms, for example, might use full pivoting to get better numerical accuracy. The libraries would also provide more efficient specialized routines; for example, a library might include a routine to solve a system whose matrix is upper triangular. The libraries would include single-precision and double-precision versions of some algorithms.

Initially, these subroutines used hard-coded loops. If a subroutine needed to perform a matrix multiplication, it would contain three nested DO loops. Linear algebra programs have many common low-level ("kernel") operations. Between 1973 and 1977, several of these kernel operations were identified.[1] They were defined as subroutines that math libraries could call. The kernel calls had advantages over hard-coded loops: the library routine would be more readable, there were fewer chances for bugs, and the kernel implementation could be optimized for speed. A specification for these kernel operations using scalars and vectors, the level-1 Basic Linear Algebra Subroutines (BLAS), was published in 1979.[2] The BLAS were used to implement LINPACK.
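To make the contrast concrete, here is a minimal sketch in C; the hand-written loop and the kernel call compute the same update y ← αx + y. The CBLAS interface used for the kernel call is the reference C binding described later in this article (the original 1979 routines were Fortran), and the array values are illustrative.

    #include <cblas.h>

    /* Hard-coded loop: y <- alpha*x + y, written out by hand. */
    static void axpy_loop(int n, double alpha, const double *x, double *y)
    {
        for (int i = 0; i < n; i++)
            y[i] += alpha * x[i];
    }

    int main(void)
    {
        double x[4] = {1.0, 2.0, 3.0, 4.0};
        double y[4] = {4.0, 3.0, 2.0, 1.0};

        axpy_loop(4, 2.0, x, y);         /* hand-coded version */
        cblas_daxpy(4, 2.0, x, 1, y, 1); /* same update via the BLAS kernel */
        return 0;
    }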

The BLAS abstraction allows customization for high performance. For example, LINPACK is a general-purpose library that can be used on many different machines without modification. LINPACK could use a generic version of the BLAS; to gain performance, different machines might instead use tailored versions of the BLAS. As computer architectures became more sophisticated, vector machines appeared. The BLAS for a vector machine could use the machine's fast vector operations.

Other machine features became available and could also be exploited. Consequently, the BLAS were augmented between 1984 and 1986 with level-2 kernel operations that concerned matrix-vector operations. Memory hierarchy was also recognized as something to exploit. Many computers have cache memory that is much faster than main memory; keeping matrix manipulations localized allows better usage of the cache. Between 1987 and 1988, the level-3 BLAS were identified to do matrix-matrix operations. The level-3 BLAS encouraged block-partitioned algorithms. The LAPACK library uses the level-3 BLAS.[3]
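To make the cache argument concrete, here is a minimal sketch in C of a block-partitioned multiply C ← C + AB for n×n row-major matrices; the block size NB and the loop order are illustrative assumptions, not the scheme of any particular BLAS implementation.

    #include <stddef.h>

    #define NB 64  /* tile size; real BLAS implementations tune this per machine */

    /* C <- C + A*B, processed in NB x NB tiles so each tile stays cache-resident. */
    void blocked_matmul(size_t n, const double *A, const double *B, double *C)
    {
        for (size_t ii = 0; ii < n; ii += NB)
            for (size_t kk = 0; kk < n; kk += NB)
                for (size_t jj = 0; jj < n; jj += NB)
                    /* multiply one tile pair (edges clipped to n) */
                    for (size_t i = ii; i < ii + NB && i < n; i++)
                        for (size_t k = kk; k < kk + NB && k < n; k++)
                            for (size_t j = jj; j < jj + NB && j < n; j++)
                                C[i*n + j] += A[i*n + k] * B[k*n + j];
    }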

The original BLAS concerned densely stored vectors and matrices. Further extensions to the BLAS, such as to sparse matrices, have been addressed.[4]

Functionality

The BLAS functionality is divided into three levels: 1, 2 and 3.

Level 1

This level contains vector operations of the form

    y ← αx + y

(also known as SAXPY) as well as scalar dot products and vector norms, among other things.
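As a sketch of the level-1 interface through the reference CBLAS bindings (the double-precision routines DAXPY, DDOT and DNRM2; the vector values are illustrative):

    #include <stdio.h>
    #include <cblas.h>

    int main(void)
    {
        double x[3] = {1.0, 2.0, 3.0};
        double y[3] = {4.0, 5.0, 6.0};

        cblas_daxpy(3, 2.0, x, 1, y, 1);          /* y <- 2x + y (DAXPY) */
        double dot  = cblas_ddot(3, x, 1, y, 1);  /* dot product x.y (DDOT) */
        double norm = cblas_dnrm2(3, x, 1);       /* Euclidean norm of x (DNRM2) */

        printf("dot = %g, norm = %g\n", dot, norm);
        return 0;
    }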

Level 2

This level contains matrix-vector operations of the form

    y ← αAx + βy

as well as solving Tx = y for x with T being triangular, among other things.
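A minimal sketch of the corresponding level-2 calls via the reference CBLAS interface (DGEMV for the matrix-vector product, DTRSV for the triangular solve); the row-major layout and the 2×2 example are illustrative assumptions:

    #include <cblas.h>

    int main(void)
    {
        /* 2x2 lower-triangular matrix A in row-major order */
        double A[4] = {2.0, 0.0,
                       1.0, 3.0};
        double x[2] = {1.0, 1.0};
        double y[2] = {0.0, 0.0};

        /* y <- 1.0*A*x + 0.0*y (DGEMV) */
        cblas_dgemv(CblasRowMajor, CblasNoTrans, 2, 2,
                    1.0, A, 2, x, 1, 0.0, y, 1);

        /* solve T*x = y with T = lower triangle of A (DTRSV);
           y is overwritten in place with the solution */
        cblas_dtrsv(CblasRowMajor, CblasLower, CblasNoTrans, CblasNonUnit,
                    2, A, 2, y, 1);
        return 0;
    }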

Level 3

This level contains matrix-matrix operations of the form

    C ← αAB + βC

as well as solving B ← αT⁻¹B for triangular matrices T, among other things. This level contains the widely used General Matrix Multiply (GEMM) operation.
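As a sketch, the level-3 update C ← αAB + βC maps onto a single DGEMM call in the reference CBLAS interface; the 2×2 sizes and values are illustrative:

    #include <cblas.h>

    int main(void)
    {
        /* row-major 2x2 matrices */
        double A[4] = {1.0, 2.0,
                       3.0, 4.0};
        double B[4] = {5.0, 6.0,
                       7.0, 8.0};
        double C[4] = {0.0, 0.0,
                       0.0, 0.0};

        /* C <- 1.0*A*B + 0.0*C (DGEMM): M = N = K = 2, leading dimensions 2 */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);
        return 0;
    }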

Implementations

Accelerate
Apple's framework for Mac OS X and iOS, which includes tuned versions of BLAS and LAPACK.[1] [2]
ACML
The AMD Core Math Library, supporting the AMD Athlon and Opteron CPUs under Linux and Windows.[3]
C++ AMP BLAS
The C++ AMP BLAS Library is an open source implementation of BLAS for Microsoft's AMP language extension for Visual C++.[4]
ATLAS
Automatically Tuned Linear Algebra Software, an open source implementation of BLAS APIs for C and Fortran 77.[5]
BLIS
BLAS-like Library Instantiation Software, a framework for rapidly instantiating BLAS functionality.[6]
cuBLAS
Optimized BLAS for NVIDIA-based GPU cards.[7]
Eigen BLAS
A Fortran 77 and C BLAS library implemented on top of the open source Eigen library, supporting x86, x86-64, ARM (NEON), and PowerPC architectures.[8] (Note: as of Eigen 3.0.3, the BLAS interface is not built by default and the documentation refers to it as "a work in progress which is far to be ready for use".)
ESSL
IBM's Engineering and Scientific Subroutine Library, supporting the PowerPC architecture under AIX and Linux.[9]
GotoBLAS
Kazushige Goto's BSD-licensed implementation of BLAS, tuned in particular for Intel Nehalem/Atom, VIA Nano, and AMD Opteron processors.[10]
HP MLIB
HP's math library, supporting the IA-64, PA-RISC, x86 and Opteron architectures under HP-UX and Linux.
Intel MKL
The Intel Math Kernel Library, supporting 32-bit and 64-bit x86. Includes optimizations for Intel Pentium, Core, Xeon and Xeon Phi processors; supports Linux, Windows and Mac OS X.[11]
MathKeisan
NEC's math library, supporting the NEC SX architecture under SUPER-UX, and Itanium under Linux.[12]
Netlib BLAS
The official reference implementation on Netlib, written in Fortran 77. [13]
Netlib CBLAS
Reference C interface to the BLAS. It is also possible (and popular) to call the Fortran BLAS from C. [14]
OpenBLAS
Optimized BLAS based on GotoBLAS, hosted at GitHub, supporting Intel Sandy Bridge and MIPS Loongson processors.[15]
PDLIB/SX
NEC's Public Domain Mathematical Library for the NEC SX-4 system.[16]
SCSL
SGI's Scientific Computing Software Library contains BLAS and LAPACK implementations for SGI's IRIX workstations.[17]
Sun Performance Library
Optimized BLAS and LAPACK for SPARC, Core and AMD64 architectures under Solaris 8, 9, and 10 as well as Linux.[18]
SurviveGotoBLAS2
An optimized BLAS that attempts to continue the work of Kazushige Goto, maintained by Ei-ji Nakama.[19]

Other libraries offering BLAS-like functionality

AMD APPML
The AMD Accelerated Parallel Processing Math Libraries contain FFT functions and all three levels of BLAS functions written in OpenCL. Designed to run on AMD GPUs supporting OpenCL, they also work on CPUs to facilitate multicore programming and debugging.[20]
Armadillo
Armadillo is a C++ linear algebra library aiming for a good balance between speed and ease of use. It employs template classes and has optional links to BLAS/ATLAS and LAPACK. It is sponsored by NICTA (in Australia) and is licensed under a free license.[21]
CUDA SDK
The NVIDIA CUDA SDK includes BLAS functionality for writing C programs that run on GeForce 8 Series or newer graphics cards.
Eigen
The Eigen template library provides an easy-to-use, highly generic C++ template interface to matrix/vector operations and related algorithms such as solvers and decompositions. It uses vector (SIMD) capabilities and is optimized for fixed-size, dynamically sized, and sparse matrices.[22]
GSL
The GNU Scientific Library contains a multi-platform implementation in C which is distributed under the GNU General Public License.
HASEM
A C++ template library able to solve linear equations and compute eigenvalues. It is licensed under the BSD License.[23]
LAMA
The Library for Accelerated Math Applications (LAMA) is a C++ template library for writing numerical solvers targeting various hardware (e.g. GPUs through CUDA or OpenCL) on distributed-memory systems, hiding hardware-specific programming from the program developer.
Libflame
The FLAME project's implementation of a dense linear algebra library.[24]
MAGMA
The Matrix Algebra on GPU and Multicore Architectures (MAGMA) project develops a dense linear algebra library similar to LAPACK, but for heterogeneous and hybrid architectures, including multicore systems accelerated with GPGPU graphics cards.[25]
MTL4
The Matrix Template Library version 4 is a generic C++ template library providing sparse and dense BLAS functionality. MTL4 establishes an intuitive interface (similar to MATLAB) and broad applicability thanks to generic programming.
PLASMA
The Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) project is a modern replacement for LAPACK on multi-core architectures. PLASMA is a software framework for developing asynchronous operations and features out-of-order scheduling with a runtime scheduler called QUARK, which may be used for any code that expresses its dependencies as a directed acyclic graph.[26]
uBLAS
A generic C++ template class library providing BLAS functionality. Part of the Boost library. It provides bindings to many hardware-accelerated libraries in a unifying notation. Moreover, uBLAS focuses on correctness of the algorithms using advanced C++ features. [27]

The Sparse BLAS

Sparse extensions to the previously dense BLAS exist, such as in ACML.

See also

References

  • BLAST Forum (21 August 2001), Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard, Knoxville, TN: University of Tennessee
  • Lawson, C. L.; Hanson, R. J.; Kincaid, D.; Krogh, F. T. (1979), "Basic Linear Algebra Subprograms for FORTRAN usage", ACM Trans. Math. Software, 5: 308–323, Algorithm 539
  • Dodson, D. S.; Grimes, R. G. (1982), "Remark on algorithm 539: Basic Linear Algebra Subprograms for Fortran usage", ACM Trans. Math. Software, 8: 403–404
  • Dodson, D. S. (1983), "Corrigendum: Remark on "Algorithm 539: Basic Linear Algebra Subroutines for FORTRAN usage"", ACM Trans. Math. Software, 9: 140
  • Dongarra, J. J.; Du Croz, J.; Hammarling, S.; Hanson, R. J. (1988), "An extended set of FORTRAN Basic Linear Algebra Subprograms", ACM Trans. Math. Software, 14: 1–17
  • Dongarra, J. J.; Du Croz, J.; Hammarling, S.; Hanson, R. J. (1988), "Algorithm 656: An extended set of FORTRAN Basic Linear Algebra Subprograms", ACM Trans. Math. Software, 14: 18–32
  • Dongarra, J. J.; Du Croz, J.; Duff, I. S.; Hammarling, S. (1990), "A set of Level 3 Basic Linear Algebra Subprograms", ACM Trans. Math. Software, 16: 1–17
  • Dongarra, J. J.; Du Croz, J.; Duff, I. S.; Hammarling, S. (1990), "Algorithm 679: A set of Level 3 Basic Linear Algebra Subprograms", ACM Trans. Math. Software, 16: 18–28
New BLAS
  • Blackford, L. S.; Demmel, J.; Dongarra, J.; Duff, I.; Hammarling, S.; Henry, G.; Heroux, M.; Kaufman, L.; Lumsdaine, A.; Petitet, A.; Pozo, R.; Remington, K.; Whaley, R. C. (2002), "An Updated Set of Basic Linear Algebra Subprograms (BLAS)", ACM Trans. Math. Software, 28 (2): 135–151
  • Dongarra, J. (2002), "Basic Linear Algebra Subprograms Technical Forum Standard", International Journal of High Performance Applications and Supercomputing, 16 (1): 1–111 and 16 (2): 115–199
  • BLAS homepage on Netlib.org
  • BLAS FAQ
  • BLAS operations from the GNU Scientific Library reference manual
  • BLAS Quick Reference Guide from LAPACK Users' Guide
  • CSBlas for C#. CSBlas is a translation of the Fortran BLAS subroutines to C#.
  • NLapack. Port of LAPACK and BLAS using unmanaged (native) Fortran libraries.
  • Lawson Oral History One of the original authors of the BLAS discusses its creation in an oral history interview. Charles L. Lawson Oral history interview by Thomas Haigh, 6 and 7 November 2004, San Clemente, California. Society for Industrial and Applied Mathematics, Philadelphia, PA.
  • Dongarra Oral History In an oral history interview, Jack Dongarra explores the early relationship of BLAS to LINPACK, the creation of higher level BLAS versions for new architectures, and his later work on the ATLAS system to automatically optimize BLAS for particular machines. Jack Dongarra, Oral history interview by Thomas Haigh, 26 April 2005, University of Tennessee, Knoxville TN. Society for Industrial and Applied Mathematics, Philadelphia, PA
  • An Overview of the Sparse Basic Linear Algebra Subprograms: The New Standard from the BLAS Technical Forum [28]