[PAST] Exploiting Deep Nets for Better Gaze Estimates and Gaze Estimates for Better Deep Nets
2019/10/24, 2:00 PM
Location: 301 - 1420
Speaker: Prof. Bert Shi
Bert Shi is Professor and Head of the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology, where he has been on the faculty since 1994. He received the B.S. and M.S. degrees in Electrical Engineering from Stanford University in 1987 and 1988, respectively, and the Ph.D. degree in Electrical Engineering from the University of California, Berkeley, in 1994. He has held visiting faculty positions with the Department of Bioengineering at the University of Pennsylvania in 2002 and with the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology in 2009.

His research interests lie in the areas of neuromorphic engineering, robotics, human-machine interfaces, and computational neuroscience, with a particular focus on the use of machine learning in visual information processing and visually guided control. He was named an IEEE Fellow in 2001 "for contributions to the analysis, implementation and application of cellular neural networks". He was a Distinguished Lecturer for the IEEE Circuits and Systems Society for 2001-2002 and 2007-2008. His research group won the 2017 Facial Expression Recognition and Analysis (FERA) AU Intensity Estimation Challenge, and has received top paper prizes at international conferences. He is or has been an Associate Editor for IEEE Transactions on Circuits and Systems I and II, IEEE Transactions on Biomedical Circuits and Systems, and Frontiers in Neuromorphic Engineering. He was Chair of the IEEE Circuits and Systems Society Technical Committee on Cellular Neural Networks and Array Computing from 2003-2005, Technical Program Chair of the 2004 IEEE International Workshop on Cellular Neural Networks and their Applications, and General Chair of the 2005 edition of the same workshop.


Given the well-known success of deep neural networks in computer vision, it is perhaps unsurprising that they can be used effectively to improve the quality of gaze direction estimates from facial images. What may be unexpected is that the opposite is also true: gaze estimates can be used to improve the performance of deep nets. In this talk, I will describe work on both of these problems.

I begin with a discussion of two methods for obtaining better gaze estimates from deep networks. First, I describe the use of dilated convolutions to deal with the problem that gaze changes are subtle: large changes in gaze result in very slight changes in appearance. Second, I describe a method for unsupervised outlier detection to learn how to identify situations where gaze estimates are unreliable, e.g., due to blinks or occlusions, without the need for extra training labels.

I then switch gears to discuss the reverse problem: how estimates of gaze can be used to improve the performance of deep networks for end-to-end autonomous driving. Recent work has shown that human novices benefit from simply observing the gaze patterns of experts while performing a task. It turns out the same is true for deep nets for imitation learning. In particular, it is possible to train a conditional adversarial network to replicate human gaze patterns, and to use the output of this network to help train an imitation learning network by replacing standard dropout layers with “gaze-modulated” dropout layers. Gaze-modulated dropout encourages the network to pay more attention to visual areas used by human drivers when generating steering commands. Experimental results in a driving simulator show superior performance in terms of reduced prediction error and increased distance travelled between infractions when compared with standard dropout as well as other possible ways to integrate gaze information.
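To make the "gaze-modulated dropout" idea concrete, the sketch below shows one plausible way such a layer could work: dropout probability is lowered where a predicted gaze heatmap is high, so units covering regions that human drivers look at survive more often. This is a minimal NumPy illustration under stated assumptions (a normalized heatmap input, an inverted-dropout scaling convention, and the illustrative function name `gaze_modulated_dropout`); it is not the speaker's actual implementation.

```python
import numpy as np

def gaze_modulated_dropout(features, gaze_map, base_rate=0.5, rng=None):
    """Dropout whose drop probability decreases where predicted gaze is high.

    features : (H, W, C) activation map.
    gaze_map : (H, W) heatmap in [0, 1], e.g. from a gaze-prediction network
               (the talk uses a conditional adversarial network for this).
    base_rate: dropout rate applied where gaze_map == 0; assumed < 1 here.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Keep probability rises from (1 - base_rate) up to 1 as gaze increases.
    keep_prob = 1.0 - base_rate * (1.0 - gaze_map)
    mask = rng.random(gaze_map.shape) < keep_prob
    # Inverted-dropout scaling keeps the expected activation unchanged.
    return features * mask[..., None] / keep_prob[..., None]

# Usage: with full attention everywhere, nothing is dropped;
# with zero attention, units are dropped at base_rate and survivors rescaled.
feats = np.ones((4, 4, 2))
out_full = gaze_modulated_dropout(feats, np.ones((4, 4)),
                                  rng=np.random.default_rng(0))
out_zero = gaze_modulated_dropout(feats, np.zeros((4, 4)),
                                  rng=np.random.default_rng(0))
```

At training time this layer would replace a standard dropout layer; at test time, as with ordinary dropout, it would be disabled (identity pass-through).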