The real time factor (RTF) is a common metric for measuring the speed of an automatic speech recognition system. It can also be used in other contexts where an audio or video signal is processed (usually automatically) at a nearly constant rate (e.g. reading music from a CD).


If it takes time $P$ to process an input of duration $I$, the real time factor is defined as

$$\mathrm{RTF} = \frac{P}{I}.$$

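If, for example, it takes 8 hours of computation time to process a recording of duration 2 hours, the real time factor is 4. When the real time factor is at most 1, the processing is done in real time, that is, the input is processed at least as fast as it plays back. The real time factor is a hardware-dependent value: the same system generally yields a lower factor on faster hardware.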
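
The definition can be illustrated with a minimal Python sketch. The function names `real_time_factor` and `measure_rtf` are hypothetical and chosen only for this illustration, and `process` stands in for whatever recognizer or signal-processing routine is being timed. The measured value depends on the machine it runs on.

```python
import time


def real_time_factor(processing_seconds, input_seconds):
    """Return RTF = P / I for processing time P and input duration I."""
    if input_seconds <= 0:
        raise ValueError("input duration must be positive")
    return processing_seconds / input_seconds


def measure_rtf(process, signal, input_seconds):
    """Time a single call of process(signal) and return its RTF.

    `process` is a placeholder for the system under test (an assumption
    of this sketch, not a specific library API).
    """
    start = time.perf_counter()
    process(signal)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, input_seconds)


# Example from the text: 8 hours of computation for a 2-hour recording.
rtf = real_time_factor(8 * 3600, 2 * 3600)
is_real_time = rtf <= 1  # real time means an RTF of at most 1
```

Both arguments must be expressed in the same unit (seconds here); since the factor is a ratio, the unit itself cancels out.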

The accuracy of a speech recognition system, on the other hand, is measured with the word error rate.