Digital video fingerprinting
Video fingerprinting is a technique in which software identifies, extracts and then compresses characteristic components of a video, enabling that video to be uniquely identified by its resultant “fingerprint”. The technique has proven effective at identifying and comparing digital video data.
Principles behind video fingerprinting technology
Video fingerprinting methods extract several unique features of a digital video that can be stored as a fingerprint of the video content. The evaluation and identification of video content is then performed by comparing the extracted video fingerprints. For digital video data, both audio and video fingerprints can be extracted, each having individual significance for different application areas.
The creation of a video fingerprint involves the use of software that decodes the video data and then applies several feature extraction algorithms. Video fingerprints are highly compressed when compared to the original source file and can therefore be easily stored in databases for later comparison. They may be seen as an extreme form of lossy compression and cannot be used to reconstruct the original video content.
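The extraction step can be illustrated with a minimal sketch. It assumes the video has already been decoded into grayscale frames (lists of rows of 0-255 luminance values) and uses one simple feature, block-mean luminance thresholded to bits. This is only one of many possible features, not the method of any particular product, and the names `block_means`, `frame_fingerprint` and `video_fingerprint` are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # decoded grayscale frame: rows of 0-255 luminance values


def block_means(frame: Frame, grid: int = 4) -> List[float]:
    """Mean luminance of each cell in a grid x grid partition of the frame.

    Assumes the frame is at least grid pixels wide and tall, so no cell is empty.
    """
    height, width = len(frame), len(frame[0])
    means = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(cell) / len(cell))
    return means


def frame_fingerprint(frame: Frame, grid: int = 4) -> int:
    """Pack one bit per cell: 1 if the cell is brighter than the average cell."""
    means = block_means(frame, grid)
    threshold = sum(means) / len(means)
    bits = 0
    for mean in means:
        bits = (bits << 1) | int(mean > threshold)
    return bits


def video_fingerprint(frames: List[Frame], grid: int = 4) -> List[int]:
    """One small integer per frame; the pixels cannot be recovered from it."""
    return [frame_fingerprint(frame, grid) for frame in frames]


# The same picture (dark left half, bright right half) at two resolutions.
small = [[0] * 4 + [255] * 4 for _ in range(8)]
large = [[0] * 8 + [255] * 8 for _ in range(16)]
print(video_fingerprint([small, large], grid=2))  # prints [5, 5]
```

With `grid=2` each frame is reduced to four bits, which shows both the extreme compression and the fact that the fingerprint depends on the picture content rather than on its resolution.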
The huge number of videos currently available, driven by the growth of user-generated content (UGC) sites, presents video fingerprinting technologies with a scalability challenge.
Compared to hash codes
Normally, digital data are compared using hash values that are derived directly from the bytes of a file. However, such methods are insufficient for video because they can only determine whether two files, or parts of files, are exactly equal. Differences in video codec and digital processing artifacts often cause small differences in the digital data without changing the video perceptually. Thus, when employing hash methods, a comparison for absolute equality may fail even when two video segments are perceptually identical. Hash value comparisons are also of little value when one wishes to identify video segments that are similar (but not identical) to a given reference clip. The equality / inequality dichotomy inherent to hash value techniques makes similarity search impossible.
In contrast, digital video fingerprinting makes it possible to recognize videos whose resolution differs from the original (smaller or larger), videos that have been modified slightly (blurring, rotation, acceleration or deceleration, cropping, insertion of new elements), and videos whose audio track has been modified.
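The contrast with hash codes can be made concrete with a small sketch. It assumes a frame is given as a flat list of luminance values and simulates a re-encoding artifact by nudging each value slightly. `coarse_signature` is a hypothetical, deliberately simple perceptual signature chosen for illustration; `hashlib.sha256` is the standard library hash.

```python
import hashlib
from typing import List


def sha256_hex(pixels: List[int]) -> str:
    """Conventional hash over the raw bytes: any changed byte changes the digest."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def coarse_signature(pixels: List[int], cells: int = 8) -> int:
    """Hypothetical perceptual signature: one bit per cell, set if the cell's
    mean is above the mean of all cells. Assumes len(pixels) >= cells."""
    size = len(pixels) // cells
    means = [sum(pixels[i * size:(i + 1) * size]) / size for i in range(cells)]
    threshold = sum(means) / cells
    signature = 0
    for mean in means:
        signature = (signature << 1) | int(mean > threshold)
    return signature


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


original = [10, 12, 11, 13, 200, 198, 202, 201, 15, 14, 16, 13, 190, 195, 192, 194]
# Simulated codec artifact: each value shifted by 0, 1 or 2.
reencoded = [min(255, p + (i % 3)) for i, p in enumerate(original)]

print(sha256_hex(original) == sha256_hex(reencoded))  # prints False
print(hamming(coarse_signature(original), coarse_signature(reencoded)))  # prints 0
```

The hash comparison can only answer "equal or not", whereas the Hamming distance between signatures gives a graded measure that stays small under perceptually minor changes.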
Compared to watermarking
Video fingerprinting should not be confused with digital watermarking, which relies on inserting identifying features into the content itself and therefore changes the content. Some watermarks can be inserted in such a way that they remain imperceptible to a viewer. A robust watermark can be difficult to detect and remove, but the possibility of removing an invisible watermark remains a significant weakness.
Since watermarks must be inserted into the video, they only identify copies of the particular video made after that point in time. For example, if a watermark is inserted at broadcast it cannot be used to identify copies of the video made before the broadcast.
Video fingerprinting does not rely on any addition to the video stream. A video fingerprint cannot be "removed" because it is not "added". In addition, a reference video fingerprint can be created at any point from any copy of the video.
Watermarks offer some advantages over fingerprinting. A unique watermark can be added to the content at any stage in the distribution process and multiple independent watermarks can be inserted into the same video content. This can be particularly useful in tracing the history of a copy of a video. Detecting watermarks in a video can indicate the source of an unauthorized copy.
While video fingerprinting systems must search a potentially large database of reference fingerprints, a watermark detection system only has to perform the computation needed to detect the watermark. This computation can be significant, however, and when multiple watermark keys must be tested, watermarking can fail to scale to UGC site volumes.
Video fingerprinting is of interest in the Digital Rights Management (DRM) area, particularly regarding the distribution of unauthorized content on the Internet. Video fingerprinting systems enable content providers (e.g. film studios) or publishers (e.g. UGC sites) to determine if any of the publisher's files contain content registered with the fingerprint service. If registered content is detected, the publisher can take the appropriate action – remove it from the site, monetize it, add correct attribution, etc.
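A minimal sketch of how a publisher might check an upload against registered content follows. It assumes each video is represented by a list of 16-bit per-frame fingerprints and uses a plain linear scan with a bit-error threshold. The registry layout, the function names and the `max_error` value are all hypothetical, and a production system would need an index rather than a scan to cope with large databases.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two frame fingerprints."""
    return bin(a ^ b).count("1")


def sequence_distance(query: List[int], reference: List[int], bits: int = 16) -> float:
    """Lowest bit-error rate over every alignment of the query inside the reference."""
    if not query or len(query) > len(reference):
        return 1.0
    best = 1.0
    for offset in range(len(reference) - len(query) + 1):
        errors = sum(hamming(q, reference[offset + i]) for i, q in enumerate(query))
        best = min(best, errors / (bits * len(query)))
    return best


def find_match(
    query: List[int],
    registry: Dict[str, List[int]],
    max_error: float = 0.1,  # assumed threshold, would need tuning on real data
) -> Optional[Tuple[str, float]]:
    """Return (content id, error rate) of the closest registered work, or None."""
    best_id, best_error = None, max_error
    for content_id, reference in registry.items():
        error = sequence_distance(query, reference)
        if error <= best_error:
            best_id, best_error = content_id, error
    return (best_id, best_error) if best_id is not None else None


registry = {
    "film-a": [0x0F0F, 0x00FF, 0xF00F, 0xAAAA],
    "film-b": [0x1234, 0x5678, 0x9ABC, 0xDEF0],
}
# A two-frame excerpt of film-a with one bit disturbed in the second frame.
print(find_match([0x00FF, 0xF00E], registry))  # prints ('film-a', 0.03125)
print(find_match([0xFFFF, 0x0000], registry))  # prints None
```

Once a match is reported, the choice of action (removal, monetization, attribution) is a policy decision outside the matching step.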
Video fingerprinting within smart TVs is enabling an emerging category of interactive television applications. Television devices integrated with real-time fingerprinting software can automatically recognize the video content on screen in order to enable interactive features and applications on top of the programming. Entrepreneur Mark Cuban has made investments to leverage this technology to create interactive features for his cable networks HDNet and AXS.
Video fingerprinting may be used for broadcast monitoring (e.g. advertisement monitoring, news monitoring) and general media monitoring. Broadcast monitoring solutions can provide content providers and content owners with playlists showing when and where their video content was used.
Video fingerprinting is also used by authorities to track the distribution of illegal content such as happy slapping, terrorist and child abuse related videos. Another use is for companies to track the leak of confidential recordings or videos, or for celebrities to track the presence on the Internet of unauthorized videos (for instance videos of themselves taken by amateurs using a camcorder or a mobile phone).
Video fingerprinting can also be used to create content-aware video advertising. In one implementation, if a video service provider distributes nationally broadcast video content that contains a national TV commercial, a localized overlay of text or graphics can be placed on that commercial, so that it carries local information specific to the advertised product. For example, if the national commercial is a 15-second spot for a Ford Explorer SUV, fingerprinting allows local operators to recognize the spot and overlay local dealership information (phone number, promotions, etc.) on it, creating a localized commercial for the SUV that appears to be targeted at the local audience.
Fingerprinting visual content is similar to audio fingerprinting but uses a different technology. From a content provider's point of view, both video and audio fingerprinting need to be used in most applications. Consider the online publication of "mash-ups". Mash-ups can consist of content from several sources that is compiled together and is set to a unique audio track. Since the audio track is different from the original version, the copyrighted material in these mash-ups would go undetected using only audio fingerprinting techniques. In other cases, mash-ups consist of the soundtrack from a commercial video source, like a movie, used with a different video stream. In this case, a video fingerprint would not match but an audio fingerprint would. When the audio and video streams are not from the same master work, the question of fair-use may arise.
This discrepancy has real applications in the global online community in terms of film distribution. Films shown in countries other than their country of origin are often dubbed into other languages. This change in audio renders the films virtually unrecognizable by audio fingerprinting technologies unless a copy of all known versions has been previously fingerprinted. Employing video fingerprinting, however, enables the content owner to fingerprint just once and have each subsequent version remain recognizable. If the customer wishes to know which language soundtrack is present on a particular video, then an audio fingerprint must be used.
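The way audio and video matches complement each other in the mash-up and dubbing cases above can be summarized in a short sketch. It assumes two independent matchers have already run and each returned either the identifier of a registered work or `None`. The function name and the result labels are hypothetical.

```python
from typing import Optional


def classify(video_id: Optional[str], audio_id: Optional[str]) -> str:
    """Combine independent video and audio fingerprint results for one clip."""
    if video_id and audio_id:
        if video_id == audio_id:
            return "same work on both tracks"
        return "video and audio from different registered works"
    if video_id:
        # Typical of a dubbed film or footage set to a new audio track.
        return "video match only"
    if audio_id:
        # Typical of a commercial soundtrack laid over other footage.
        return "audio match only"
    return "no match"


print(classify("film-a", None))      # prints video match only
print(classify(None, "film-a"))      # prints audio match only
print(classify("film-a", "film-b"))  # prints video and audio from different registered works
```

Using only one of the two matchers would collapse two of these cases into "no match", which is why both kinds of fingerprint are usually applied together.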