- Affective Computing is also the title of a textbook on the subject by Rosalind Picard.
Affective computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer science, psychology, and cognitive science. While the origins of the field may be traced as far back as early philosophical enquiries into emotion, the more modern branch of computer science originated with Rosalind Picard's 1995 paper on affective computing. A motivation for the research is the ability to simulate empathy: the machine should interpret the emotional state of humans and adapt its behaviour to them, giving an appropriate response to those emotions.
Areas of affective computing
Detecting and recognizing emotional information
Detecting emotional information begins with passive sensors which capture data about the user's physical state or behavior without interpreting the input. The data gathered is analogous to the cues humans use to perceive emotions in others. For example, a video camera might capture facial expressions, body posture and gestures, while a microphone might capture speech. Other sensors detect emotional cues by directly measuring physiological data, such as skin temperature and galvanic resistance.
Recognizing emotional information requires the extraction of meaningful patterns from the gathered data. This is done using machine learning techniques that process different modalities, such as speech recognition, natural language processing, or facial expression detection, and produce either labels (e.g. 'confused') or coordinates in a valence-arousal space.
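As a minimal illustration of the second output form, the Python sketch below maps a point in valence-arousal space to a coarse label. The value range [-1, 1], the function name `quadrant_label` and the four label names are assumptions chosen for illustration, not a standard taxonomy.

```python
def quadrant_label(valence: float, arousal: float) -> str:
    """Map a valence-arousal coordinate to a coarse, illustrative label.

    Both inputs are assumed to lie in [-1, 1]; the label names are
    hypothetical examples, one per quadrant.
    """
    if not (-1.0 <= valence <= 1.0 and -1.0 <= arousal <= 1.0):
        raise ValueError("valence and arousal must lie in [-1, 1]")
    if valence >= 0:
        # Pleasant half of the plane: high arousal vs. low arousal.
        return "excited" if arousal >= 0 else "content"
    # Unpleasant half of the plane.
    return "distressed" if arousal >= 0 else "bored"
```

A recognizer that outputs coordinates can therefore still be reduced to labels when an application needs them, at the cost of discarding the finer position within each quadrant.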
Emotion in machines
Another area within affective computing is the design of computational devices that either exhibit innate emotional capabilities or are capable of convincingly simulating emotions. A more practical approach, based on current technological capabilities, is the simulation of emotions in conversational agents in order to enrich and facilitate interactivity between human and machine. While human emotions are often associated with surges in hormones and other neuropeptides, emotions in machines might be associated with abstract states linked to progress (or lack of progress) in autonomous learning systems. In this view, affective states correspond to time-derivatives (perturbations) in the learning curve of an arbitrary learning system.
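The learning-curve view can be sketched in Python as a finite difference. The sketch assumes the learning curve is a list of performance scores sampled at equal intervals; the names `affect_signal` and `describe`, the tolerance and the three labels are hypothetical and only illustrate the idea.

```python
def affect_signal(learning_curve, dt=1.0):
    """Finite-difference approximation of the learning curve's time-derivative."""
    return [(b - a) / dt for a, b in zip(learning_curve, learning_curve[1:])]


def describe(rate, tolerance=1e-3):
    """Label a rate of progress; the label names are illustrative assumptions."""
    if rate > tolerance:
        return "positive"   # performance is improving
    if rate < -tolerance:
        return "negative"   # performance is getting worse
    return "neutral"        # no measurable progress either way


# Hypothetical performance scores over four time steps.
curve = [0.2, 0.5, 0.5, 0.4]
states = [describe(rate) for rate in affect_signal(curve)]
```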
Marvin Minsky, one of the pioneering computer scientists in artificial intelligence, relates emotions to the broader issues of machine intelligence stating in The Emotion Machine that emotion is "not especially different from the processes that we call 'thinking.'"
Technologies of affective computing
Emotional speech
Changes in the autonomic nervous system indirectly alter speech, and this fact can be used to build systems that recognize affect from extracted features of speech. For example, speech produced in a state of fear, anger or joy becomes faster, louder and more precisely enunciated, with a higher and wider pitch range. Other states such as tiredness, boredom or sadness lead to slower, lower-pitched and slurred speech. Emotional speech processing recognizes the user's emotional state by analyzing speech patterns: vocal parameters and prosodic features such as pitch variables and speech rate are analyzed through pattern recognition.
Speech analysis is an effective method of identifying affective state, with an average success rate reported in research of 63%. This result appears fairly satisfying when compared with humans’ success rate at identifying emotions, but somewhat insufficient compared to other forms of emotion recognition (such as those which employ physiological states or facial processing). Furthermore, many speech characteristics are independent of semantics or culture, which makes this technique a very promising one.
The process of speech/text affect detection requires the creation of a reliable database, knowledge base, or vector space model, broad enough to fit every need for its application, as well as the selection of a successful classifier which will allow for quick and accurate emotion identification.
Currently, the most frequently used classifiers are linear discriminant classifiers (LDC), k-nearest neighbour (k-NN), Gaussian mixture model (GMM), support vector machines (SVM), artificial neural networks (ANN), decision tree algorithms and hidden Markov models (HMMs). Various studies showed that choosing the appropriate classifier can significantly enhance the overall performance of the system. The list below gives a brief description of each algorithm:
- LDC – Classification happens based on the value obtained from the linear combination of the feature values, which are usually provided in the form of vector features.
- k-NN – Classification happens by locating the object in the feature space, and comparing it with the k nearest neighbours (training examples). The majority vote decides on the classification.
- GMM – is a probabilistic model used for representing the existence of sub-populations within the overall population. Each sub-population is described using the mixture distribution, which allows for classification of observations into the sub-populations.
- SVM – is a type of (usually binary) linear classifier which decides which of the two (or more) possible classes each input falls into.
- ANN – is a mathematical model, inspired by biological neural networks, that can better grasp possible non-linearities of the feature space.
- Decision tree algorithms – work based on following a decision tree in which leaves represent the classification outcome, and branches represent the conjunction of subsequent features that lead to the classification.
- HMMs – a statistical Markov model in which the states and state transitions are not directly available to observation. Instead, the series of outputs dependent on the states are visible. In the case of affect recognition, the outputs represent the sequence of speech feature vectors, which allow the deduction of states’ sequences through which the model progressed. The states can consist of various intermediate steps in the expression of an emotion, and each of them has a probability distribution over the possible output vectors. The states’ sequences allow us to predict the affective state which we are trying to classify, and this is one of the most commonly used techniques within the area of speech affect detection.
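Of the classifiers listed above, k-NN is the simplest to show in full. The following Python sketch implements the majority vote over the k nearest training examples. The two-dimensional feature vectors (mean pitch in Hz, speech rate in syllables per second), the training values and the function name `knn_classify` are invented for illustration and do not come from any real corpus.

```python
import math
from collections import Counter


def knn_classify(query, training_set, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    `training_set` is a list of (feature_vector, label) pairs.
    """
    if not 1 <= k <= len(training_set):
        raise ValueError("k must be between 1 and the number of training examples")
    # Sort training examples by Euclidean distance to the query, keep the k closest.
    neighbours = sorted(training_set, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: [mean pitch (Hz), speech rate (syllables/s)].
TRAINING = [
    ([220.0, 5.5], "anger"),
    ([230.0, 6.0], "anger"),
    ([210.0, 5.8], "anger"),
    ([120.0, 3.0], "sadness"),
    ([110.0, 2.8], "sadness"),
    ([125.0, 3.2], "sadness"),
]

prediction = knn_classify([215.0, 5.6], TRAINING, k=3)
```

In this sketch the features are left unscaled, so pitch dominates the distance; a practical system would normally normalize each feature first.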
The vast majority of present systems are data-dependent. This creates one of the biggest challenges in detecting emotions based on speech, as it involves choosing an appropriate database with which to train the classifier. Most of the currently available data was obtained from actors and is thus a representation of archetypal emotions. These so-called acted databases are usually based on the Basic Emotions theory (by Paul Ekman), which assumes the existence of six basic emotions (anger, fear, disgust, surprise, joy, sadness), the others simply being a mix of the former ones. Nevertheless, these databases still offer high audio quality and balanced classes (although often too few), which contribute to high success rates in recognizing emotions.
However, for real-life applications, naturalistic data is preferred. A naturalistic database can be produced by observation and analysis of subjects in their natural context. Ultimately, such a database should allow the system to recognize emotions based on their context as well as work out the goals and outcomes of the interaction. The nature of this type of data allows for authentic real-life implementation, because it describes states that occur naturally during human-computer interaction (HCI).
Despite the numerous advantages which naturalistic data has over acted data, it is difficult to obtain and usually has low emotional intensity. Moreover, data obtained in a natural context has lower signal quality, due to surrounding noise and the distance of the subjects from the microphone. The first attempt to produce such a database was the FAU Aibo Emotion Corpus for CEICES (Combining Efforts for Improving Automatic Classification of Emotional User States), which was developed based on a realistic context of children (age 10-13) playing with Sony’s Aibo robot pet. Likewise, producing one standard database for all emotional research would provide a method of evaluating and comparing different affect recognition systems.
The complexity of the affect recognition process increases with the number of classes (affects) and speech descriptors used within the classifier. It is therefore crucial to select only the most relevant features, both to ensure that the model can successfully identify emotions and to improve performance, which is particularly significant for real-time detection. The range of possible choices is vast, with some studies mentioning the use of over 200 distinct features. Identifying those that are redundant or undesirable optimizes the system and increases the success rate of correct emotion detection. The most common speech characteristics are categorized into the following groups:
- Frequency characteristics:
  - Accent shape – affected by the rate of change of the fundamental frequency.
  - Average pitch – describes how high or low the speaker speaks relative to normal speech.
  - Contour slope – describes the tendency of the frequency change over time; it can be rising, falling or level.
  - Final lowering – the amount by which the frequency falls at the end of an utterance.
  - Pitch range – measures the spread between the maximum and minimum frequency of an utterance.
- Time-related features:
  - Speech rate – describes the rate of words or syllables uttered over a unit of time.
  - Stress frequency – measures the rate of occurrence of pitch-accented utterances.
- Voice quality parameters and energy descriptors:
  - Breathiness – measures the aspiration noise in speech.
  - Brilliance – describes the dominance of high or low frequencies in the speech.
  - Loudness – measures the amplitude of the speech waveform, which translates to the energy of an utterance.
  - Pause discontinuity – describes the transitions between sound and silence.
  - Pitch discontinuity – describes the transitions of fundamental frequency.
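Several of the frequency characteristics listed above can be computed directly from a fundamental-frequency (F0) contour. The Python sketch below is one possible operationalization, under stated assumptions: the contour is a list of F0 values in Hz for equally spaced voiced frames, contour slope is taken as a least-squares line fit, and final lowering is taken as the utterance mean minus the mean of the last few frames. The function name, parameter names and the level tolerance are hypothetical.

```python
def pitch_features(f0_hz, frame_s=0.01, tail_frames=3, level_tolerance=1.0):
    """Compute simple pitch descriptors from an F0 contour (Hz per frame)."""
    n = len(f0_hz)
    if n < 2:
        raise ValueError("need at least two F0 values")
    mean_pitch = sum(f0_hz) / n
    pitch_range = max(f0_hz) - min(f0_hz)

    # Least-squares slope of F0 against time, in Hz per second.
    times = [i * frame_s for i in range(n)]
    t_mean = sum(times) / n
    numerator = sum((t - t_mean) * (f - mean_pitch) for t, f in zip(times, f0_hz))
    denominator = sum((t - t_mean) ** 2 for t in times)
    slope = numerator / denominator

    if slope > level_tolerance:
        direction = "rising"
    elif slope < -level_tolerance:
        direction = "falling"
    else:
        direction = "level"

    # Assumed definition: how far the last frames sit below the utterance mean.
    tail = f0_hz[-tail_frames:]
    final_lowering = mean_pitch - sum(tail) / len(tail)

    return {
        "mean_pitch_hz": mean_pitch,
        "pitch_range_hz": pitch_range,
        "contour_slope_hz_per_s": slope,
        "contour_direction": direction,
        "final_lowering_hz": final_lowering,
    }
```

Extracting the F0 contour itself from audio requires a pitch tracker, which is outside the scope of this sketch.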
Facial affect detection
The detection and processing of facial expression is achieved through various methods such as optical flow, hidden Markov models, neural network processing or active appearance models. More than one modality can be combined or fused (multimodal recognition, e.g. facial expressions and speech prosody or facial expressions and hand gestures) to provide a more robust estimation of the subject's emotional state.
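One simple way to fuse modalities is decision-level (late) fusion, in which each modality's classifier outputs per-emotion scores and the scores are combined by a weighted average. The Python sketch below assumes this scheme; it is only one of several possible fusion strategies, and the function names, modality names and score values are hypothetical.

```python
def fuse_scores(modality_scores, weights=None):
    """Weighted average of per-label scores across modalities.

    `modality_scores` maps a modality name to a dict of label -> score.
    `weights` maps a modality name to a positive weight (default: equal weights).
    """
    if weights is None:
        weights = {name: 1.0 for name in modality_scores}
    total = sum(weights[name] for name in modality_scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = {}
    for name, scores in modality_scores.items():
        for label, score in scores.items():
            fused[label] = fused.get(label, 0.0) + weights[name] * score / total
    return fused


def decide(fused):
    """Return the label with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical classifier outputs for one observation.
observation = {
    "face": {"joy": 0.6, "anger": 0.4},
    "speech": {"joy": 0.3, "anger": 0.7},
}
label = decide(fuse_scores(observation))
```

With equal weights the speech classifier's stronger evidence for anger wins; giving the face modality a larger weight can reverse the decision, which is why the choice of weights matters in practice.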
Following cross-cultural research on the Fore tribesmen of Papua New Guinea at the end of the 1960s, Paul Ekman proposed the idea that facial expressions of emotion are not culturally determined, but universal. Thus, he suggested that they are biological in origin and can therefore be safely and correctly categorised. He officially put forth six basic emotions in 1972: anger, disgust, fear, joy, sadness and surprise.
However, in the 1990s Ekman expanded his list of basic emotions, including a range of positive and negative emotions, not all of which are encoded in facial muscles. The additions include:
- Pride in achievement
- Sensory pleasure
Facial Action Coding System
Defining expressions in terms of muscle actions
A system has been conceived in order to formally categorise the physical expression of emotions. The central concept of the Facial Action Coding System (FACS), created by Paul Ekman and Wallace V. Friesen in 1978, is the Action Unit (AU). An Action Unit is, basically, a contraction or a relaxation of one or more muscles. As simple as this concept may seem, it is enough to form the basis of a complex emotion identification system that is devoid of interpretation.
By identifying different facial cues, scientists are able to map them to their corresponding Action Unit code. Consequently, they have proposed the following classification of the six basic emotions, according to their Action Units (“+” here means “and”):
Challenges in facial detection
As with every computational practice, affect detection by facial processing faces obstacles that must be overcome in order to unlock the full potential of the algorithm or method employed. The accuracy of modelling and tracking has been an issue, especially in the early stages of affective computing. As hardware evolves, and as new discoveries are made and new practices introduced, this lack of accuracy fades, leaving behind noise issues. Methods for noise removal exist, however, including neighbourhood averaging, linear Gaussian smoothing, median filtering, and newer methods such as the Bacterial Foraging Optimization Algorithm.
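Two of the noise-removal methods named above, neighbourhood averaging and median filtering, can be sketched in a few lines of Python. The sketch assumes a greyscale image stored as a list of rows of numbers and a 3×3 window that is simply truncated at the image border; the function names are hypothetical.

```python
from statistics import median


def filter_3x3(image, reducer):
    """Replace each pixel by `reducer` applied to its 3x3 neighbourhood.

    At the borders only the neighbours that exist are used.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            new_row.append(reducer(window))
        result.append(new_row)
    return result


def median_filter(image):
    """Median filtering: robust to isolated outlier pixels."""
    return filter_3x3(image, median)


def neighbourhood_average(image):
    """Neighbourhood averaging: smooths noise but spreads outliers."""
    return filter_3x3(image, lambda window: sum(window) / len(window))


# A flat patch with one noisy pixel in the centre.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
cleaned = median_filter(noisy)
```

On this patch the median filter removes the outlier entirely, whereas averaging only dilutes it across the neighbouring pixels.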
The accuracy of facial recognition (as distinct from affective state recognition) has not yet reached a level high enough to permit its widespread, efficient use (there have been many attempts, especially by law enforcement, which failed to identify criminals successfully). Without improvements in the accuracy of the hardware and software used to scan faces, progress is considerably slowed.
Other challenges include:
- The fact that posed expressions, as used by most subjects of the various studies, are not natural, and therefore not 100% accurate.
- The lack of rotational movement freedom. Affect detection works very well with frontal use, but upon rotating the head more than 20 degrees, “there’ve been problems”.
Body gesture
Gestures could be efficiently used as a means of detecting a particular emotional state of the user, especially when used in conjunction with speech and face recognition. Depending on the specific action, gestures could be simple reflexive responses, like lifting your shoulders when you don’t know the answer to a question, or they could be complex and meaningful, as when communicating with sign language. Without making use of any object or surrounding environment, we can wave our hands, clap or beckon. On the other hand, when using objects, we can point at them, move, touch or handle them. A computer should be able to recognize these gestures, analyze the context and respond in a meaningful way, in order to be efficiently used for human-computer interaction.
There are many proposed methods for detecting body gestures. Some literature differentiates two approaches in gesture recognition: 3D model-based and appearance-based. The former makes use of 3D information about key elements of the body parts in order to obtain several important parameters, like palm position or joint angles. Appearance-based systems, on the other hand, use images or videos for direct interpretation. Hand gestures have been a common focus of body gesture detection; appearance-based methods and 3D modeling methods are traditionally used.
Physiological monitoring
A user’s emotional state can be detected by monitoring and analysing their physiological signs. These signs range from pulse and heart rate to the minute contractions of the facial muscles. This area of research is still in relative infancy, as there seems to be more of a drive towards affect recognition through facial inputs. Nevertheless, the area is gaining momentum and real products that implement the techniques are now appearing. The three main physiological signs that can be analysed are blood volume pulse, galvanic skin response and facial electromyography.
Blood Volume Pulse
A subject’s Blood Volume Pulse (BVP) can be measured by a process called photoplethysmography, which produces a graph indicating blood flow through the extremities. The peaks of the waves indicate a cardiac cycle where the heart has pumped blood to the extremities. If the subject experiences fear or is startled, their heart usually ‘jumps’ and beats quickly for some time, and the amplitude of the pulse wave decreases. This can clearly be seen on a photoplethysmograph as a decrease in the distance between the trough and the peak of the wave. As the subject calms down, and as the body’s inner core expands, allowing more blood to flow back to the extremities, the cycle returns to normal.
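The two quantities discussed here, the beat rate and the trough-to-peak distance of the wave, can be read off a sampled BVP trace with simple local-extremum detection. The Python sketch below assumes a clean, evenly sampled signal (real sensor data would need filtering first) and uses a synthetic sine wave as a stand-in for a recording; all function names and parameter values are hypothetical.

```python
import math


def find_peaks_and_troughs(signal):
    """Return indices of local maxima and local minima of a sampled signal."""
    inner = range(1, len(signal) - 1)
    peaks = [i for i in inner if signal[i - 1] < signal[i] >= signal[i + 1]]
    troughs = [i for i in inner if signal[i - 1] > signal[i] <= signal[i + 1]]
    return peaks, troughs


def heart_rate_bpm(peaks, sample_rate_hz):
    """Beats per minute from the mean interval between successive peaks."""
    if len(peaks) < 2:
        raise ValueError("need at least two peaks")
    intervals = [(b - a) / sample_rate_hz for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(intervals) / len(intervals))


def mean_amplitude(signal, peaks, troughs):
    """Mean trough-to-peak distance, pairing each peak with the trough before it."""
    amplitudes = []
    for p in peaks:
        earlier = [t for t in troughs if t < p]
        if earlier:
            amplitudes.append(signal[p] - signal[earlier[-1]])
    if not amplitudes:
        raise ValueError("no trough precedes any peak")
    return sum(amplitudes) / len(amplitudes)


def synthetic_bvp(beats_per_minute, amplitude, sample_rate_hz=50, seconds=4):
    """Sine-wave stand-in for a BVP recording (an assumption, not real data)."""
    freq = beats_per_minute / 60.0
    count = int(sample_rate_hz * seconds)
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate_hz)
            for i in range(count)]


trace = synthetic_bvp(beats_per_minute=75, amplitude=0.5)
peaks, troughs = find_peaks_and_troughs(trace)
rate = heart_rate_bpm(peaks, sample_rate_hz=50)
amplitude = mean_amplitude(trace, peaks, troughs)
```

Tracking how `rate` and `amplitude` change over successive windows of the recording is what allows a startle or fear response to be distinguished from the resting state.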
Infra-red light is shone on the skin by special sensor hardware, and the amount of light reflected is measured. The amount of reflected and transmitted light correlates with the BVP, as light is absorbed by hemoglobin, which is abundant in the bloodstream.
It can be cumbersome to ensure that the sensor shining infra-red light and monitoring the reflected light is always pointing at the same extremity, especially as subjects often stretch and readjust their position whilst using a computer. Other factors can also affect a subject’s Blood Volume Pulse. As it is a measure of blood flow through the extremities, if the subject feels hot, or particularly cold, their body may allow more, or less, blood to flow to the extremities, regardless of the subject’s emotional state.
Facial Electromyography
Facial electromyography is a technique used to measure the electrical activity of the facial muscles by amplifying the tiny electrical impulses that are generated by muscle fibers when they contract. The face expresses a great deal of emotion; however, two main facial muscle groups are usually studied to detect emotion. The corrugator supercilii muscle, also known as the ‘frowning’ muscle, draws the brow down into a frown and is therefore the best test for a negative, unpleasant emotional response. The zygomaticus major muscle is responsible for pulling the corners of the mouth back when smiling and is therefore the muscle used to test for a positive emotional response.
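A very rough way to turn the two muscle signals into a valence estimate is to compare their activity levels. The Python sketch below assumes each signal is a list of amplified EMG samples in microvolts over the same time window and uses the root-mean-square (RMS) value as the activity measure; the function names, the margin parameter and the three labels are hypothetical simplifications, not an established scoring method.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a window of EMG samples."""
    if not samples:
        raise ValueError("empty sample window")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def emg_valence(zygomaticus_uv, corrugator_uv, margin=0.0):
    """Compare smile-muscle and frown-muscle activity over one window.

    Returns "positive" when zygomaticus activity exceeds corrugator activity
    by more than `margin`, "negative" in the opposite case, else "neutral".
    """
    difference = rms(zygomaticus_uv) - rms(corrugator_uv)
    if difference > margin:
        return "positive"
    if difference < -margin:
        return "negative"
    return "neutral"
```

For example, `emg_valence([3, -3, 3, -3], [1, -1, 1, -1])` compares an RMS of 3 against an RMS of 1 and returns `"positive"`.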
Galvanic Skin Response
Galvanic Skin Response (GSR) is a measure of skin conductivity, which is dependent on how moist the skin is. As the sweat glands produce this moisture and the glands are controlled by the body’s nervous system, there is a correlation between GSR and the arousal state of the body. The more aroused a subject is, the greater the skin conductivity and GSR reading.
GSR can be measured by placing two small silver chloride electrodes on the skin and applying a small voltage between them. The conductance is measured by a sensor. To maximize comfort and reduce irritation, the electrodes can be placed on the feet, which leaves the hands fully free to interface with the keyboard and mouse.
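With a known voltage applied across the electrodes, conductance follows from Ohm's law as the measured current divided by the voltage. The Python sketch below shows that arithmetic and a relative change against a resting baseline; the function names and the example values are assumptions for illustration, not readings from any particular sensor.

```python
def skin_conductance_microsiemens(applied_volts, measured_microamps):
    """Conductance G = I / V; microamps divided by volts gives microsiemens."""
    if applied_volts <= 0:
        raise ValueError("applied voltage must be positive")
    return measured_microamps / applied_volts


def relative_change(baseline_us, current_us):
    """Fractional change in conductance relative to a resting baseline."""
    if baseline_us <= 0:
        raise ValueError("baseline conductance must be positive")
    return (current_us - baseline_us) / baseline_us


# Hypothetical readings: 0.5 V applied, 2.5 uA at rest, 3.0 uA after a stimulus.
baseline = skin_conductance_microsiemens(0.5, 2.5)
aroused = skin_conductance_microsiemens(0.5, 3.0)
change = relative_change(baseline, aroused)
```

Because absolute conductance varies widely between people and with electrode placement, comparing against a per-subject baseline, as in `relative_change`, is usually more informative than the raw value.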
Visual aesthetics
Aesthetics, in the world of art and photography, refers to the principles of the nature and appreciation of beauty. Judging beauty and other aesthetic qualities is a highly subjective task. Computer scientists at Penn State treat the challenge of automatically inferring the aesthetic quality of pictures from their visual content as a machine learning problem, with a peer-rated on-line photo sharing website as the data source. They extract certain visual features based on the intuition that they can discriminate between aesthetically pleasing and displeasing images.
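To make the idea of a visual feature concrete, the Python sketch below computes two simple colour statistics, mean saturation and mean brightness, from a list of RGB pixels. These two features and the function name `colour_features` are illustrative assumptions; the sketch does not claim to reproduce the feature set used in the research described above.

```python
import colorsys


def colour_features(pixels):
    """Mean HSV saturation and brightness of an image.

    `pixels` is a list of (r, g, b) tuples with components in 0..255.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    count = len(hsv)
    return {
        "mean_saturation": sum(s for _, s, _ in hsv) / count,
        "mean_brightness": sum(v for _, _, v in hsv) / count,
    }


# A two-pixel "image": pure red and pure white.
features = colour_features([(255, 0, 0), (255, 255, 255)])
```

Feature vectors of this kind, computed for many rated photographs, would then serve as the input to a classifier or regressor trained on the peer ratings.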
Potential applications
In e-learning applications, affective computing can be used to adjust the presentation style of a computerized tutor when a learner is bored, interested, frustrated, or pleased. Psychological health services, such as counseling, benefit from affective computing applications when determining a client's emotional state.
Robotic systems capable of processing affective information exhibit higher flexibility when working in uncertain or complex environments. Companion devices, such as digital pets, use affective computing abilities to enhance realism and provide a higher degree of autonomy.
Other potential applications are centered around social monitoring. For example, a car can monitor the emotion of all occupants and engage in additional safety measures, such as alerting other vehicles if it detects that the driver is angry. Affective computing has potential applications in human-computer interaction, such as affective mirrors allowing users to see how they perform; emotion monitoring agents sending a warning before one sends an angry email; or even music players selecting tracks based on mood.
One idea, put forth by the Romanian researcher Dr. Nicu Sebe in an interview, is the analysis of a person’s face while they are using a certain product (he mentioned ice cream as an example). Companies would then be able to use such analysis to infer whether their product will or will not be well received by the respective market.
One could also use affective state recognition to judge the impact of a TV advertisement through a real-time video recording of a viewer and the subsequent study of his or her facial expression. By averaging the results obtained from a large group of subjects, one can tell whether that commercial (or movie) has the desired effect and which elements interest the watcher most.
Affective computing is also being applied to the development of communicative technologies for use by people with autism.