NTU RGB-D dataset


The NTU RGB-D (Nanyang Technological University Red Green Blue and Depth) dataset is a large-scale dataset of labeled human activity recordings.[1] It contains 56,880 action samples, each captured in four modalities: RGB video, depth map sequences, 3D skeletal data, and infrared video.
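
Samples in the dataset are conventionally named by their recording metadata. As a minimal sketch, assuming the SsssCcccPpppRrrrAaaa file-naming convention published with the dataset (setup, camera, performer, replication, action; not described in this article), the metadata can be recovered from a sample name like so:

    import re

    # Hypothetical helper: parse a sample name such as "S001C002P003R002A013"
    # into its metadata fields. The field meanings (Setup, Camera, Performer,
    # Replication, Action) follow the naming convention published with the
    # dataset and are an assumption here, not taken from this article.
    NAME_PATTERN = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")

    def parse_sample_name(name: str) -> dict:
        match = NAME_PATTERN.fullmatch(name)
        if match is None:
            raise ValueError(f"not an NTU RGB-D sample name: {name!r}")
        setup, camera, performer, replication, action = map(int, match.groups())
        return {
            "setup": setup,              # recording setup (camera placement)
            "camera": camera,            # camera ID
            "performer": performer,      # subject ID
            "replication": replication,  # repeat index of the performance
            "action": action,            # action class, 1-60
        }

    print(parse_sample_name("S001C002P003R002A013"))
    # {'setup': 1, 'camera': 2, 'performer': 3, 'replication': 2, 'action': 13}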

The dataset covers 60 labeled action classes: drink water, eat meal/snack, brushing teeth, brushing hair, drop, pickup, throw, sitting down, standing up (from sitting position), clapping, reading, writing, tear up paper, wear jacket, take off jacket, wear a shoe, take off a shoe, wear on glasses, take off glasses, put on a hat/cap, take off a hat/cap, cheer up, hand waving, kicking something, put something inside pocket / take out something from pocket, hopping (one foot jumping), jump up, make a phone call/answer phone, playing with phone/tablet, typing on a keyboard, pointing to something with finger, taking a selfie, check time (from watch), rub two hands together, nod head/bow, shake head, wipe face, salute, put the palms together, cross hands in front (say stop), sneeze/cough, staggering, falling, touch head (headache), touch chest (stomachache/heart pain), touch back (backache), touch neck (neckache), nausea or vomiting condition, use a fan (with hand or paper)/feeling warm, punching/slapping other person, kicking other person, pushing other person, pat on back of other person, point finger at the other person, hugging other person, giving something to other person, touch other person's pocket, handshaking, walking towards each other, and walking apart from each other.
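
The classes are conventionally numbered 1 through 60 (A001-A060) in the order given above. A minimal label-lookup sketch, assuming that numbering (the mapping below is a hypothetical illustration showing only a few entries):

    # Illustrative sketch: only a few entries are shown; the full mapping
    # follows the order of the class list above. This dict is a hypothetical
    # illustration, not a file shipped with the dataset.
    ACTION_LABELS = {
        1: "drink water",
        2: "eat meal/snack",
        3: "brushing teeth",
        # ... classes 4 through 58 continue in the order listed above ...
        59: "walking towards each other",
        60: "walking apart from each other",
    }

    def label_for(action_id: int) -> str:
        return ACTION_LABELS.get(action_id, f"unknown action {action_id}")

    print(label_for(60))  # walking apart from each other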

Classifiers

The table below lists some of the machine learning methods evaluated on the dataset, with their reported accuracy, by type of classifier:

Type | Paper | Preprocessing / description | Accuracy (%)
Deep learning | Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points[2] | Unconstrained attention mechanism over the RGB stream | 86.6
Deep learning | Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition[3] | Arranging skeletal joints for tree traversal | 77.7
Deep learning | Deep LSTM[1] | None | 67.3
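
The Deep LSTM baseline above operates directly on the 3D skeleton sequences. Below is a minimal sketch of such a recurrent classifier in PyTorch; the layer sizes, the 25-joint input layout, and the use of the final hidden state are illustrative assumptions, not the published configuration:

    import torch
    import torch.nn as nn

    class SkeletonLSTM(nn.Module):
        # Minimal sketch of a stacked-LSTM classifier over 3D skeleton
        # sequences, in the spirit of the Deep LSTM baseline in the table.
        # Layer sizes and the use of the final hidden state are assumptions,
        # not the published architecture.
        def __init__(self, num_joints=25, num_classes=60,
                     hidden_size=128, num_layers=3):
            super().__init__()
            # Each frame is the concatenated (x, y, z) of all joints.
            self.lstm = nn.LSTM(input_size=num_joints * 3,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                batch_first=True)
            self.classifier = nn.Linear(hidden_size, num_classes)

        def forward(self, x):
            # x: (batch, frames, num_joints * 3)
            _, (h_n, _) = self.lstm(x)
            return self.classifier(h_n[-1])  # logits over the action classes

    model = SkeletonLSTM()
    clips = torch.randn(4, 80, 25 * 3)  # 4 clips, 80 frames, 25 joints
    print(model(clips).shape)  # torch.Size([4, 60])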

References

  1. Shahroudy, Amir; Liu, Jun; Ng, Tian-Tsong; Wang, Gang (2016). "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis". arXiv:1604.02808 [cs.CV].
  2. Baradel, Fabien; Wolf, Christian; Mille, Julien; Taylor, Graham (2018). "Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points". arXiv:1802.07898 [cs.CV].
  3. Liu, Jun; Shahroudy, Amir; Xu, Dong; Wang, Gang (2016). "Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition". arXiv:1607.07043 [cs.CV].