IIIT Hyderabad Publications |
Driver Attention Monitoring using Facial Features
Author: Isha Dua
Date: 2020-05-01
Report no: IIIT/TH/2020/40
Advisor: C V Jawahar

Abstract

How can we assess the quality of human driving using AI? Driver inattention is one of the leading causes of vehicle crashes and incidents worldwide. Driver inattention includes driver fatigue leading to drowsiness and driver distraction, say due to the use of a cellphone or rubbernecking, all of which lead to a lack of situational awareness. Techniques presented hitherto to monitor driver attention have evaluated factors such as fatigue and distraction independently. However, to develop a robust driver attention monitoring system, all the factors affecting a driver's attention need to be analyzed holistically. In this thesis, we present two novel approaches for driver attention analysis on the road: one using driver video alone, and one using a fusion of driver and road video. In the first approach, we propose a driver attention rating system that leverages the front camera of a windshield-mounted smartphone to monitor driver attention by combining several features. We derive a driver attention rating by fusing spatio-temporal features based on the driver's state and behavior, such as head pose, eye gaze, eye closure, yawns, and use of cellphones. We present several architectures for feature aggregation, including AutoRate and Attention-based AutoRate. We perform an extensive evaluation of the feature aggregation networks on real-world driving data, as well as on data from controlled, static vehicle settings, with 30 drivers in a large city. We compare the proposed method's automatically generated rating with the scores given by 5 human annotators, using the kappa coefficient as an evaluation metric to compute the inter-rater agreement between the generated rating and the ratings provided by human annotators. We observe that Attention-based AutoRate outperforms the other proposed designs for feature aggregation by 10%.
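The kappa coefficient mentioned above measures agreement between two raters beyond what chance alone would produce. As a minimal sketch (not the thesis's evaluation code), Cohen's kappa between the model's ratings and one annotator's ratings could be computed like this; the example rating sequences are hypothetical:

```python
from collections import Counter

def cohen_kappa(r1, r2):
    """Cohen's kappa between two raters' label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance from
    each rater's label marginals.
    """
    assert len(r1) == len(r2) and len(r1) > 0
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Hypothetical attention ratings on a 1-5 scale:
model_ratings = [5, 4, 4, 3, 5, 2, 4, 3]
annotator     = [5, 4, 3, 3, 5, 2, 4, 2]
print(round(cohen_kappa(model_ratings, annotator), 3))  # 0.667
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which makes it a stricter comparison than raw accuracy when rating classes are imbalanced.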
Further, we use the learned temporal and spatial attention to visualize the key frame and the key action, which justify the model's predicted rating. Finally, to provide driver-specific results, we fine-tune the Attention-based AutoRate model on each driver's data to offer a personalized driving experience. In the second approach, we propose driver gaze mapping on the road using a fusion of driver and road videos as input. The proposed approach is used to estimate driver attention and determine which objects the driver is focusing on while driving. To solve this task, we introduce a new dataset called DGAZE, an image dataset containing the driver view and road view, annotated with the driver's gaze point on the road. The data is collected in a lab setting that mimics road conditions, using low-cost mobile phone cameras. It comprises a total of 100,000 images collected from 20 drivers, covering 103 unique objects on the road belonging to 7 classes, including cars, pedestrians, traffic signals, and auto-rickshaws. We also present I-DGAZE, a fused convolutional neural network trained on the DGAZE dataset to predict driver gaze on the road. Our architecture combines facial features, such as facial key-point locations and head pose, with the image of the left eye to obtain optimal results. Our model achieves an error of 94.5 pixels without calibration and 45 pixels with calibration. We compare our model with state-of-the-art eye-gaze methods and present extensive ablation results. Overall, in this thesis, we propose two methods for driver attention analysis on the road. These approaches provide feedback about the quality of driver attention using driver video and a fusion of driver and road video. We introduce datasets for driver attention rating and for driver gaze mapping on the road, along with two novel architectures, Attention-based AutoRate and I-DGAZE, corresponding to each proposed task.
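The pixel-error figures quoted above (94.5 pixels without calibration, 45 with) are naturally read as a mean Euclidean distance between predicted and ground-truth gaze points in the road image. As a minimal sketch under that assumption (the gaze points below are hypothetical, not from DGAZE):

```python
import math

def mean_pixel_error(preds, targets):
    """Mean Euclidean distance (in pixels) between predicted
    and ground-truth 2D gaze points."""
    assert len(preds) == len(targets) and len(preds) > 0
    dists = [math.dist(p, t) for p, t in zip(preds, targets)]
    return sum(dists) / len(dists)

# Hypothetical (x, y) gaze points in road-image pixel coordinates:
predicted = [(120.0, 80.0), (300.0, 200.0)]
truth     = [(117.0, 76.0), (306.0, 208.0)]
print(mean_pixel_error(predicted, truth))  # 7.5
```

Per-driver calibration typically fits a small correction (e.g. an offset) on a few known fixation points, which is consistent with the large error reduction reported with calibration.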
The evaluation metrics and experimental results demonstrate the efficacy of both approaches. Our two significant contributions are a rating system for measuring driver inattention on the road and a dataset consisting of both driver and road views along with the driver's gaze location on the road.

Full thesis: pdf

Centre for Visual Information Technology