Position: Postdoctoral Fellow

Current Institution: Carnegie Mellon University

Abstract:
Data-driven synthesis and evaluation of syntactic facial expressions in ASL animation

Deaf adults using sign language as a primary means of communication tend to have low literacy skills in written languages due to limited spoken language exposure and other educational factors. For example, standardized testing in the U.S. reveals that a majority of deaf high school graduates perform at or below a fourth-grade English reading level. If the reading level of text on websites, television captioning, or other media is too high, these adults may not comprehend the conveyed message despite having read the text. The number of people using sign language as a primary means of communication is considerable: 500,000 in the U.S. (American Sign Language – ASL) and 70 million worldwide. Technology to automatically synthesize linguistically accurate and natural-looking sign language animations can increase information accessibility for this population.

State-of-the-art sign language animation tools focus mostly on the accuracy of manual signs rather than on facial expressions. We investigate the synthesis of syntactic ASL facial expressions, which are grammatically required and, as prior research has shown, essential to the meaning of ASL animations. Specifically, we show that an annotated sign language corpus, including both the manual and non-manual signs, can be used to model and generate linguistically meaningful facial expressions when combined with facial feature extraction techniques, statistical machine learning, and an animation platform with detailed facial parameterization.

Our synthesis approach uses a data-driven methodology in which recordings of human ASL signers serve as the basis for generating face and head movements for animation. We train our models with facial expression examples represented as MPEG-4 facial action time series extracted from an ASL video corpus using computer-vision-based face tracking. To avoid idiosyncratic aspects of a single performance, we model a facial expression based on the underlying trace of movements learned from multiple recordings of different sentences in which such expressions occur. Latent traces are obtained using Continuous Profile Models (CPMs), probabilistic generative models that build upon Hidden Markov Models. To support the generation of ASL animations with facial expressions, we enhanced a virtual human character in the open source animation platform EMBR with face controls following the MPEG-4 Facial Animation standard, ASL hand shapes, and a pipeline for embedding MPEG-4 facial expression streams in ASL sentences represented as EMBR scripts with body movement information.
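
To make the representation concrete, the minimal sketch below (Python/NumPy, with hypothetical names; it is not the code used in this work) treats each facial expression example as a frames-by-MPEG-4-facial-parameters array and combines several time-aligned examples into a single underlying trace. As a simplified stand-in for a CPM, it aligns each example to a reference with dynamic time warping (DTW) and averages the aligned frames; the actual CPM is a probabilistic HMM-based model, not this frame-wise averaging.

import numpy as np

def dtw_distance_and_path(X, Y):
    """Dynamic time warping between two multivariate time series.

    X has shape (T1, D) and Y has shape (T2, D), e.g. frames of
    MPEG-4 facial animation parameters. Returns the DTW cost and the
    warping path as a list of (i, j) frame-index pairs.
    """
    T1, T2 = len(X), len(Y)
    cost = np.full((T1 + 1, T2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            d = np.linalg.norm(X[i - 1] - Y[j - 1])  # frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the end to recover the warping path.
    i, j = T1, T2
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
        path.append((i - 1, j - 1))
    return cost[T1, T2], path[::-1]

def average_trace(examples, reference):
    """Simplified stand-in for a CPM latent trace.

    Each example (T_k, D) is DTW-aligned to `reference` (T, D), and the
    aligned frames are averaged per reference frame, smoothing out the
    idiosyncrasies of any single performance.
    """
    T, D = reference.shape
    sums = np.zeros((T, D))
    counts = np.zeros(T)
    for example in examples:
        _, path = dtw_distance_and_path(example, reference)
        for i, j in path:  # i indexes the example, j the reference
            sums[j] += example[i]
            counts[j] += 1
    return sums / counts[:, None]  # every reference frame lies on some path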

We assessed our modeling approach through comparison with an alternative centroid approach, in which a single representative performance was selected by minimizing its DTW distance from the other examples. Through both metric evaluation and an experimental user study with Deaf participants, we found that the facial expressions driven by our CPM models are of high quality and more similar to human performance of novel sentences. Our user study draws on our prior rigorous methodological research on how experiment design affects study outcomes when evaluating sign language animations with facial expressions.
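
For illustration, the centroid baseline can be sketched as follows (again a hedged sketch with hypothetical names, not the study's implementation): the representative performance is the example whose total DTW distance to all other examples is smallest.

import numpy as np

def dtw_distance(X, Y):
    """DTW cost between two (frames, features) time series."""
    T1, T2 = len(X), len(Y)
    cost = np.full((T1 + 1, T2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            d = np.linalg.norm(X[i - 1] - Y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[T1, T2]

def select_centroid(examples):
    """Pick the recording with the smallest total DTW distance to the rest."""
    totals = [
        sum(dtw_distance(ex, other) for k, other in enumerate(examples) if k != i)
        for i, ex in enumerate(examples)
    ]
    return int(np.argmin(totals))  # index of the representative performance

# Usage: centroid_example = examples[select_centroid(examples)]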

Bio:

Hernisa Kacorri is a Postdoctoral Fellow at the Human-Computer Interaction Institute at Carnegie Mellon University. As a member of the Cognitive Assistance Lab, she works with Chieko Asakawa, Kris Kitani, and Jeff Bigham to help people with visual impairments understand the surrounding world. She recently received her Ph.D. in Computer Science from the Graduate Center, CUNY, as a member of the Linguistic and Assistive Technologies Lab at CUNY and RIT, advised by Matt Huenerfauth. Her dissertation focused on developing mathematical models of human facial expressions for synthesizing animations of American Sign Language that are linguistically accurate and easy to understand. She designed and conducted experimental research studies with deaf and hard-of-hearing participants, and created a framework for rapid prototyping and generation of American Sign Language animations for empirical evaluation studies. To support her research, she contributed software that enhances the open source EMBR animation platform with MPEG-4 based facial expressions and released a dataset of experimental stimuli and questions for benchmarking future studies. As part of the emerging field of human-data interaction, her work lies at the intersection of accessibility, computational linguistics, and applied machine learning. Her research was supported by the NSF, a CUNY Science Fellowship, and a Mina Rees Dissertation Fellowship in the Sciences. During her Ph.D., Hernisa also visited, as a research intern, the Accessibility Research Group at IBM Research – Tokyo (2013) and the Data Science and Technology Group at Lawrence Berkeley National Lab (2015).
Hernisa’s research interest in accessibility was sparked at the National and Kapodistrian University of Athens, where she earned her M.S. and B.S. degrees in Computer Science and was a member of the Speech and Accessibility Lab, supervised by Georgios Kouroupetroglou. She was involved in a number of research projects supporting people with disabilities in higher education. She developed software for audio rendering of MathML in Greek, contributed to MathPlayer, and developed an 8-dot Braille code for the Greek monotonic and polytonic writing systems. She served as one of the two Assistive Technologies Specialists at the University of Athens, taught in national vocational training programs for blind students, and led seminars and workshops promoting accessibility.