"Computational Methods for Analysis of Mouth Shapes in Sign Languages", National Institutes of Health, R21-DC-011081.
American Sign Language (ASL) grammar is specified by the manual sign (the hands) and by the nonmanual components (the face). These facial articulations perform significant semantic, prosodic, pragmatic, and syntactic functions. This proposal systematically studies mouth positions in ASL. Our hypothesis is that the set of ASL mouth positions is more extensive than that used in speech. To test this hypothesis, the project is divided into three studies. First, we study the assumption that mouth positions are fundamental to the understanding of signs produced in context because they are quite distinct from those seen in isolation. To this end, we have recently collected a database of ASL sentences and nonmanuals comprising over 3,600 video clips from 20 Deaf native signers. Our experiments will use this database to identify potential mappings from visual to linguistic features. Our second goal is to design a set of shape analysis and discriminant analysis algorithms that can efficiently analyze the large number of frames in these video clips. In this way, we expect to define a linguistically useful model, i.e., the smallest model that contains the main visual features from which further predictions can be made. In our third study, we will explore the hypothesis that linguistically distinct mouth positions are also visually distinct. In particular, we will use the algorithms developed in the second study to determine whether distinct visual features define different linguistic categories. This result will show whether linguistically meaningful mouth positions are not only necessary in ASL but also defined by nonoverlapping visual features. These studies, combined, address a critical need: at present, the study of nonmanuals must be carried out manually, that is, the shape and position of each facial feature in each frame must be recorded by hand.
Furthermore, drawing conclusive results for the design of a linguistic model requires studying many video sequences of related sentences as produced by different signers. It has thus proven nearly impossible to continue this research manually.
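To make the discriminant-analysis step concrete, here is a minimal sketch of two-class Fisher discriminant analysis, the kind of algorithm the second study would build on. The feature names and data are hypothetical: we simply assume each video frame has already been reduced to a small feature vector (e.g., lip width and mouth aperture).

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Two-class Fisher discriminant: the projection direction that
    maximizes between-class scatter relative to within-class scatter."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix: sum of the two per-class scatters.
    S_w = np.cov(X0, rowvar=False) * (len(X0) - 1) \
        + np.cov(X1, rowvar=False) * (len(X1) - 1)
    # Closed-form solution: w is proportional to S_w^{-1} (m1 - m0).
    w = np.linalg.solve(S_w, m1 - m0)
    return w / np.linalg.norm(w)

# Hypothetical 2-D mouth-shape features: (lip width, mouth aperture).
rng = np.random.default_rng(0)
open_mouth = rng.normal([2.0, 5.0], 0.3, size=(50, 2))
closed_mouth = rng.normal([2.0, 1.0], 0.3, size=(50, 2))
w = fisher_lda_direction(closed_mouth, open_mouth)
# The two classes differ only in aperture, so the discriminant
# direction w should load mostly on the second feature.
```

In practice the frame features would be higher-dimensional shape descriptors, and a multi-class extension would be used to test whether the linguistically distinct mouth positions occupy nonoverlapping regions of feature space.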
"ASL Nonmanuals: A linguistic and computational analysis", National Institutes of Health, R01 (with Prof. Wilbur)
In this project we are investigating the role of the facial components, combinations of components, and interactions of components that constitute facial expressions (nonmanual markers) in the grammar of American Sign Language (ASL). Some of these components have already been shown to differ in significant ways from those used by the general hearing population. They may carry semantic, prosodic, pragmatic, and syntactic information that may not be provided by the manual signing itself. We are compiling an inventory of facial articulations, and constructing a database of video images of these in isolation and in context. One of our major goals is to develop computational tools to construct a model of facial behavior in ASL.
To successfully accomplish the above, we propose an integrated linguistic and computational approach to the study of nonmanuals. As stated above, the major goal is to construct a computational phonological model of ASL nonmanuals. To this end, we have targeted a relevant set of facial features and have identified several experiments that will provide the necessary information for compiling the model. A necessary step in preparation for these experiments is to develop computer vision and pattern recognition algorithms that automatically extract these facial features from a large quantity of videos. These algorithms should be capable of processing data more accurately and efficiently than can be done by hand. Finally, by comparing these results with those obtained from native ASL signers in a series of perceptual studies, we can determine which of our hypotheses are correct and which need to be modified.
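The core operation behind automatic facial-feature extraction from video is tracking how each feature moves from one frame to the next. Below is a minimal sketch of a single Lucas-Kanade step, one standard way to estimate a small translation between two image patches; the synthetic Gaussian-blob data is purely illustrative and not from the project itself.

```python
import numpy as np

def lucas_kanade_shift(prev, curr):
    """Estimate the (dy, dx) translation of the image content between
    two patches by solving the optical-flow normal equations in
    least squares (one Lucas-Kanade step)."""
    Iy, Ix = np.gradient(prev.astype(float))     # spatial gradients
    It = curr.astype(float) - prev.astype(float)  # temporal difference
    A = np.stack([Iy.ravel(), Ix.ravel()], axis=1)
    b = -It.ravel()
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d  # (dy, dx)

# Synthetic example: a smooth blob shifted by 0.6 pixels vertically.
y, x = np.mgrid[0:31, 0:31]
blob = lambda cy, cx: np.exp(-((y - cy)**2 + (x - cx)**2) / (2 * 4.0**2))
d = lucas_kanade_shift(blob(15.0, 15.0), blob(15.6, 15.0))
# d should approximate the true (0.6, 0.0) shift.
```

A full tracker would apply this step to a patch around each facial landmark in every frame, typically inside a coarse-to-fine pyramid so that larger motions can also be recovered.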
A recent talk summarizing our progress in this arena was given at IEEE CVPR 2008 in Anchorage, AK. The slides of this talk are available here.
"RI: Computer Vision Algorithms for the Study of Facial Expressions of Emotions in Sign Languages", National Science Foundation, IIS-07-13055.
The goal of this project is to study and define the essential differences in how deaf native users of ASL process facial expressions of emotions. This will then be used to design a face avatar that can mimic such emotional expressions and can be employed for educational purposes. We will also study the relation and implications that our findings have for the analysis of ASL nonmanuals as defined in the NIH project summary above.
To achieve our goal we are developing computer vision algorithms that can be used to study the differences in production of facial expressions of emotions between native users of ASL and hearing non-signers. In particular, we are using computer vision techniques to estimate the motion of each facial feature over a video sequence and extracting three-dimensional information that is believed to be employed by users of ASL. This process requires the extension of our preliminary work on motion estimation, structure-from-motion, and shape and face analysis. Finally, these results will be used to extend our current model and understanding of face perception.
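One classical route from tracked 2-D facial features to three-dimensional information is the Tomasi-Kanade factorization for affine structure-from-motion: stacking the tracked points over all frames into a measurement matrix, which (after centering) has rank at most 3 under orthographic projection. The sketch below illustrates this with synthetic data and leaves the remaining affine ambiguity unresolved; it is not the project's actual pipeline.

```python
import numpy as np

def affine_sfm(W):
    """Tomasi-Kanade factorization. W is a 2F x P matrix of P tracked
    2-D points over F frames (x- and y-rows stacked per frame).
    Returns motion M (2F x 3) and shape S (3 x P), up to an affine
    transform (the metric upgrade is omitted)."""
    Wc = W - W.mean(axis=1, keepdims=True)   # register to per-frame centroid
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    # Under orthographic projection the centered W has rank <= 3.
    M = U[:, :3] * np.sqrt(s[:3])
    S = np.sqrt(s[:3])[:, None] * Vt[:3]
    return M, S

# Synthetic example: 10 random 3-D points seen by 5 orthographic cameras.
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 10))
frames = []
for _ in range(5):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthonormal frame
    frames.append(Q[:2] @ X)                      # orthographic projection
W = np.vstack(frames)                             # 2F x P measurements
M, S = affine_sfm(W)
# M @ S reconstructs the centered measurements exactly (noiseless data).
```

For real facial video the measurements are noisy and the face deforms, so modern variants use the same rank constraint within robust or nonrigid formulations, but the factorization above captures the underlying idea.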
A recent talk summarizing our progress in this arena was given at IEEE CVPR 2008 in Anchorage, AK. The slides of this talk are available here. Advances in the design of pattern recognition and machine learning techniques were summarized in a CVPR tutorial.