One of the singular traits of human beings, and one that has contributed enormously to the development of mankind, is the ability to communicate precisely through rich and powerful spoken (and, later in history, also written) languages. That said, a significant share of what is communicated does not travel through those languages but through nonverbal cues. These cues can take the form of gestures, performed for example with the hands, or of facial expressions that convey information about inner states that are not necessarily spoken.
Given how relevant facial expressions, as well as speech itself, have been to human interaction, it is not surprising that they have been studied for centuries. In [1] it is described how studies on facial expressions were already being performed in the Aristotelian era, in the 4th century BC.
Since the start of the 21st century, fast-growing computer multimedia technology and continuous breakthroughs in AI have driven great progress in speech-based emotion recognition. Traditional machine learning algorithms based on Gaussian mixture models, support vector machines (SVMs), and artificial neural networks have achieved strong results on speech-based emotion recognition tasks. However, these traditional algorithms still fall short in recognition accuracy on speech and images. Improving that accuracy on top of existing technologies is therefore a key goal for AI and deep learning research.
As the deep neural network most commonly used to analyze visual images, the CNN greatly reduces the number of parameters through its parameter-sharing mechanism, which is why it is widely used in image and video recognition. In a CNN, the input layer receives the data: speech or image data is usually converted into a feature vector and fed into the network, and the convolution kernels in each convolutional layer then perform convolution operations on the output of the previous layer. Through local connections and shared weights, the CNN greatly reduces the number of parameters and improves learning efficiency.
After multiple convolutional layers, the extracted low-level features pass through a rectified linear (ReLU) layer and a pooling layer for down-sampling. Pooling not only further reduces the number of training parameters but also improves, to a certain extent, how well the model fits. Finally, the fully connected layer passes the data to its neurons, and the output layer produces the final result. Figure 1 displays the whole operation process of the CNN.
Fig 1. Structure of the CNN model taken from [6]
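The pipeline described above (convolution with a shared kernel, ReLU, pooling for down-sampling, then a fully connected layer producing class scores) can be sketched in a few lines of NumPy. This is a minimal illustration with made-up random weights and a hypothetical 4-class output, not a trained emotion recognizer:

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2-D convolution of a single-channel input with one shared kernel."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # The same kernel weights scan every position: parameter sharing.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear activation."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Max pooling: down-samples the feature map, shrinking later layers."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))       # e.g. a small image patch or spectrogram
kernel = rng.standard_normal((3, 3))      # one shared convolution kernel
fc_weights = rng.standard_normal((9, 4))  # fully connected layer to 4 classes (assumed)

features = max_pool(relu(conv2d(image, kernel)))  # 8x8 -> 6x6 -> 3x3
logits = features.reshape(-1) @ fc_weights        # flatten, then fully connected output
print(logits.shape)  # (4,)
```

Note how pooling shrinks the 6x6 feature map to 3x3, so the fully connected layer needs only 9 inputs per class instead of 36, which is the parameter reduction the text refers to.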
In the past two years, emotion AI vendors have moved into completely new areas and industries, helping organizations to create a better customer experience and unlock real cost savings. These uses include:
- Video gaming. Using computer vision, the game console or video game detects emotions via facial expressions during play and adapts to them. [2]
- Medical diagnosis. Software can help doctors diagnose conditions such as alexithymia by analyzing the patient's face. [3]
- Education. Learning-software prototypes have been developed that adapt to children's emotions: when a child shows frustration because a task is too difficult or too easy, the program adjusts the task to make it more or less challenging. Another learning system helps autistic children recognize other people's emotions. [4]
- Employee safety. Based on Gartner client inquiries, demand for employee safety solutions is on the rise. Emotion AI can help analyze the stress and anxiety levels of employees with very demanding jobs, such as first responders. [5]
[1] Bettadapura, Vinay. "Face expression recognition and analysis: the state of the art." arXiv preprint arXiv:1203.6722 (2012).
[2] Aggag, Ahmed, and Kenneth Revett. "Affective gaming: a GSR based approach." (2011): 262-266.
[3] Pedrosa Gil, F., N. Ridout, H. Kessler, M. Neuffer, C. Schoechlin, H. C. Traue, and M. Nickel. "Facial emotion recognition and alexithymia in adults with somatoform disorders." Depression and Anxiety 25(11): E133-E141 (2007).
[4] Bouhlal, M., K. Aarika, R. Ait Abdelouahid, S. Elfilali, and E. Benlahmar. "Emotions recognition as innovative tool for improving students' performance and learning approaches." Procedia Computer Science, Volume 175 (2020).
[5] https://www.unite.ai/recognizing-employee-stress-through-facial-analysis-at-work/
[6] https://www.frontiersin.org/articles/10.3389/fpsyg.2021.818833/full