IVAS: A multimodal AI system for objective video interview assessment with facial emotion, gaze, and audio analysis

https://doi.org/10.55214/2576-8484.v10i1.11626

Authors

  • Syed Azeem Inam, Department of Artificial Intelligence and Mathematical Sciences, Sindh Madressatul Islam University, Karachi, Pakistan.
  • Abdul Kabeer, Department of Artificial Intelligence and Mathematical Sciences, Sindh Madressatul Islam University, Karachi, Pakistan.
  • Muneeb Ahmed Abbasi, Department of Artificial Intelligence and Mathematical Sciences, Sindh Madressatul Islam University, Karachi, Pakistan.
  • Abdullah Ayub Khan, Department of Computer Science, Bahria University Karachi Campus, Karachi 75260, Pakistan.

Abstract

The interviewing literature has long highlighted the vulnerability of traditional interviews to human subjectivity and unconscious bias, pitfalls that can cause interviewers to overlook critical behavioral cues. To address these shortcomings, the present study proposes an artificial intelligence-based assessment tool, the Interview Video Analysis System (IVAS), which integrates Facial Emotion Recognition (FER), gaze tracking, and audio analysis into a single, cohesive system for evaluating candidates objectively. IVAS comprises a 22-layer Convolutional Neural Network (CNN) trained on the FER-2013 dataset that, through data augmentation and hyperparameter optimization, achieves 86% accuracy in recognizing seven emotions: Anger, Fear, Sadness, Happiness, Disgust, Surprise, and Neutral. The system uses dlib's 68-point facial landmarks; its gaze module measures eye contact, directional changes, and blinks, relating these metrics to engagement and confidence. The audio component employs the 11-billion-parameter Llama 3.2 model together with Mel-Frequency Cepstral Coefficients (MFCC) to extract voice features such as pitch, hesitancy, and fluency, and transcribes speech for linguistic analysis. Late-fusion logic aggregates the outputs of each module into a unified assessment that reports percentage scores and performance levels. Deployed as a Streamlit web application, IVAS processes live and recorded interviews in real time, offering recruiters a scalable, data-driven assessment tool. The system outperforms other state-of-the-art models by approximately 12 to 30% in FER accuracy and is the first to incorporate multimodal behavioral analysis to reduce the ambiguities inherent in unimodal methods. Collectively, this comprehensive system enhances traditional interviewing by providing standardized, bias-resistant insights into candidate compatibility.
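As a rough illustration of the FER component, the sketch below builds a small Keras CNN for 48x48 grayscale, FER-2013-style inputs with seven emotion classes. The layer count, filter sizes, and augmentation layers are placeholder assumptions; they do not reproduce the paper's 22-layer architecture, its augmentation pipeline, or its 86% result.

```python
# Minimal sketch of a FER-2013-style emotion classifier (illustrative only;
# the architecture and hyperparameters are assumptions, not the paper's model).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # Anger, Fear, Sadness, Happiness, Disgust, Surprise, Neutral

def build_fer_cnn(input_shape=(48, 48, 1)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Light augmentation standing in for the paper's augmentation pipeline.
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.05),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    build_fer_cnn().summary()
```

In practice such a model would be trained on the FER-2013 images with one-hot emotion labels; the summary call above only verifies that the sketch is wired up correctly.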
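The gaze module's blink counting can be approximated with dlib's 68-point landmarks and the standard eye aspect ratio (EAR). The sketch below is a minimal version: the EAR formula and the landmark indices (36-41 and 42-47 for the two eyes) are standard, but the 0.21 blink threshold is an assumed value, not one reported in the paper.

```python
# Minimal blink-detection sketch using dlib's 68-point landmarks (illustrative;
# the EAR threshold is an assumption, not the paper's configuration).
import dlib
import numpy as np

EAR_THRESHOLD = 0.21  # assumed blink threshold

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def eye_aspect_ratio(pts):
    # pts: six (x, y) landmarks around one eye (indices 36-41 or 42-47).
    a = np.linalg.norm(pts[1] - pts[5])
    b = np.linalg.norm(pts[2] - pts[4])
    c = np.linalg.norm(pts[0] - pts[3])
    return (a + b) / (2.0 * c)

def blink_in_frame(gray_frame):
    """Return True if either eye's EAR falls below the blink threshold."""
    for face in detector(gray_frame):
        shape = predictor(gray_frame, face)
        coords = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])
        left = eye_aspect_ratio(coords[36:42])
        right = eye_aspect_ratio(coords[42:48])
        if min(left, right) < EAR_THRESHOLD:
            return True
    return False
```

A full gaze module would additionally track pupil position across frames to estimate eye contact and directional changes; that logic is omitted here.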
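For the audio side, a minimal feature-extraction sketch with librosa is shown below. The choice of 13 MFCCs, the YIN pitch range, and the silence threshold used as a crude hesitancy proxy are illustrative assumptions; the Llama 3.2 transcription and linguistic analysis step described in the abstract is not included.

```python
# Minimal audio-feature sketch (illustrative assumptions; not the paper's setup).
import librosa
import numpy as np

def audio_features(path):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, frames)
    f0 = librosa.yin(y, fmin=80, fmax=400, sr=sr)         # per-frame pitch in Hz
    rms = librosa.feature.rms(y=y)[0]
    pause_ratio = float(np.mean(rms < 0.01))               # crude hesitancy proxy
    return {
        "mfcc_mean": mfcc.mean(axis=1),
        "pitch_mean_hz": float(np.mean(f0)),
        "pitch_std_hz": float(np.std(f0)),
        "pause_ratio": pause_ratio,
    }
```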
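Finally, the late-fusion step can be pictured as a weighted combination of per-module percentage scores that yields an overall percentage and a performance level. The weights and band cut-offs in the sketch below are hypothetical, since the exact fusion logic is not specified in the abstract.

```python
# Minimal late-fusion sketch (weights and bands are assumed, not the paper's values).
MODULE_WEIGHTS = {"emotion": 0.4, "gaze": 0.3, "audio": 0.3}

def fuse_scores(scores: dict) -> dict:
    """Combine 0-100 module scores into an overall percentage and a band."""
    overall = sum(MODULE_WEIGHTS[m] * scores[m] for m in MODULE_WEIGHTS)
    if overall >= 75:
        band = "Strong"
    elif overall >= 50:
        band = "Moderate"
    else:
        band = "Needs review"
    return {"overall_percent": round(overall, 1), "performance_level": band}

# Example: fuse_scores({"emotion": 82.0, "gaze": 70.0, "audio": 64.0})
```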

How to Cite

Inam, S. A., Kabeer, A., Abbasi, M. A., & Khan, A. A. (2026). IVAS: A multimodal AI system for objective video interview assessment with facial emotion, gaze, and audio analysis. Edelweiss Applied Science and Technology, 10(1), 525–543. https://doi.org/10.55214/2576-8484.v10i1.11626

Issue

10(1)

Section

Articles

Published

2026-01-01