Dr. Farshad Safavi is a leading researcher and educator in human–robot interaction and deep learning. He earned his Ph.D. in Computer Science from the University of Maryland, Baltimore County in 2025 under the guidance of Dr. Ramana Vinjamuri, with doctoral research focused on emotion recognition for enhancing human–robot interaction. Prior to this, Dr. Safavi completed a B.Eng. in Electrical Engineering at Carleton University in 2012, followed by an M.Eng. in Electrical and Computer Engineering from the University of Toronto in 2018.

Currently, he serves as a research fellow in Radiation Oncology at the University of Maryland School of Medicine. His scholarly contributions appear extensively in top-tier journals, including IEEE Transactions on Affective Computing, IEEE Access, and the Journal of Intelligent & Robotic Systems. Furthermore, he has authored a book chapter for Springer Nature and led pioneering research in real-time remote sensing applications.

Dr. Safavi’s professional journey bridges academia and industry seamlessly. He has held key positions at Pleora Technologies, Brainwave Science, and the BRAIN Center. Throughout, he has mentored both graduate and undergraduate students, particularly in multimodal AI and affective computing.

His excellence has earned him prestigious awards from NSF, IUCRC, Mitacs, and the Vision: Science to Applications program. Notably, he was a finalist at IEEE ICRA 2024 and received a travel award at BSN 2023—both underscoring his pivotal role in advancing human-centered AI technologies.

In an exclusive interaction with The Interview World, Dr. Safavi reveals insights on cutting-edge progress in real-time emotion recognition within human–robot interaction. He explores the evolving applications of brain-computer interfaces (BCI) in law enforcement. He also addresses the ethical and technical challenges of designing AI systems that accurately sense and respond to human emotions. Additionally, he explains how he maintains rigorous standards while navigating diverse fields—from mobile health to semantic segmentation of aerial imagery. Finally, he envisions a future dominated by low-power, real-time neural networks enabling on-device emotion recognition.

Below are the key takeaways from this insightful and compelling conversation.

Q: Your research integrates deep learning and multimodal signal fusion for affective computing. What are the most promising developments in real-time emotion recognition, especially in the context of human–robot interaction?

A: Real-time emotion recognition is rapidly becoming essential for achieving truly emotionally intelligent human–robot interaction (HRI). A breakthrough in this domain lies in multimodal deep learning models that seamlessly integrate diverse data streams. These include facial expressions, gestures, eye-tracking, neurophysiological signals like electroencephalography (EEG), and language comprehension powered by large language models (LLMs). Together, these technologies push the boundaries of HRI, enabling robots not only to interact but also to perceive and respond with social awareness and emotional intelligence. Our research publications—Emerging Frontiers in Human–Robot Interaction and New Horizons in Human–Robot Interaction: Synergy, Cognition, and Emotion—highlight some of these pioneering advances.

Transformers have revolutionized this space by capturing complex temporal and cross-modal dependencies more effectively than traditional convolutional neural networks (CNNs) or recurrent neural networks (RNNs). They excel at detecting subtle emotional cues and can be optimized for real-time inference. As a result, these models accurately interpret users’ emotional states, whether through categorical labels such as happiness or contempt, or continuous dimensions like valence and arousal. We explore these developments further in our studies, Deep Fusion of Neurophysiological and Facial Features for Enhanced Emotion Detection and Facial Expression Recognition with an Efficient Mix Transformer for Affective Human-Robot Interaction.
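To make the idea concrete, here is a minimal PyTorch sketch of a Transformer-based fusion model of the general kind described above. It is illustrative only: the modality choices (facial and EEG feature tokens), layer sizes, and the two output heads for categorical emotions and continuous valence/arousal are assumptions, not the architectures published in the cited papers.

```python
# Minimal sketch of a Transformer-based multimodal fusion model (illustrative only;
# layer sizes, modality choices, and head design are assumptions, not the published models).
import torch
import torch.nn as nn

class MultimodalEmotionTransformer(nn.Module):
    def __init__(self, face_dim=512, eeg_dim=64, d_model=128,
                 n_heads=4, n_layers=2, n_classes=7):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.face_proj = nn.Linear(face_dim, d_model)
        self.eeg_proj = nn.Linear(eeg_dim, d_model)
        # Learned modality embeddings help the encoder tell the streams apart.
        self.modality_emb = nn.Parameter(torch.randn(2, d_model) * 0.02)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Two output heads: categorical emotions and continuous valence/arousal.
        self.class_head = nn.Linear(d_model, n_classes)
        self.va_head = nn.Linear(d_model, 2)

    def forward(self, face_tokens, eeg_tokens):
        # face_tokens: (batch, T_f, face_dim); eeg_tokens: (batch, T_e, eeg_dim)
        f = self.face_proj(face_tokens) + self.modality_emb[0]
        e = self.eeg_proj(eeg_tokens) + self.modality_emb[1]
        fused = self.encoder(torch.cat([f, e], dim=1))   # cross-modal self-attention
        pooled = fused.mean(dim=1)                        # simple temporal pooling
        return self.class_head(pooled), self.va_head(pooled)

if __name__ == "__main__":
    model = MultimodalEmotionTransformer()
    logits, va = model(torch.randn(8, 16, 512), torch.randn(8, 32, 64))
    print(logits.shape, va.shape)  # torch.Size([8, 7]) torch.Size([8, 2])
```

The key design point is that both modalities attend to each other inside a single encoder, which is how Transformers capture the cross-modal dependencies mentioned above.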

The versatility of emotion recognition models is expanding rapidly across multiple industries, with adoption set to accelerate. For instance, wearable devices—smartwatches and earbuds—now incorporate sensors to monitor neurophysiological signals such as heart rate and galvanic skin response, enabling continuous emotional tracking. This innovation fosters emotional self-awareness and mental health support. Practical applications abound: transportation systems benefit from real-time detection of fatigue or anxiety, enhancing safety; security sectors explore deception detection; and adaptive learning platforms assess user engagement to personalize content dynamically.
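As a rough, hypothetical illustration of how continuous tracking from wearables can begin, the sketch below computes simple sliding-window features from heart-rate and galvanic-skin-response streams; the sampling rate, window length, and feature set are assumptions rather than any specific product pipeline.

```python
# Hypothetical sliding-window feature extraction for wearable signals
# (window length, sampling rate, and feature set are illustrative assumptions).
import numpy as np

def window_features(heart_rate_bpm, gsr_microsiemens, fs=4, window_s=30, step_s=10):
    """Yield per-window features from heart-rate (bpm) and GSR streams sampled at fs Hz."""
    win, step = window_s * fs, step_s * fs
    features = []
    for start in range(0, len(heart_rate_bpm) - win + 1, step):
        hr = heart_rate_bpm[start:start + win]
        gsr = gsr_microsiemens[start:start + win]
        features.append({
            "hr_mean": float(np.mean(hr)),
            "hr_std": float(np.std(hr)),                    # crude proxy for HR variability
            "gsr_level": float(np.mean(gsr)),               # tonic skin-conductance level
            "gsr_peaks": int(np.sum(np.diff(gsr) > 0.05)),  # rough phasic-response count
        })
    return features

if __name__ == "__main__":
    t = np.arange(0, 300, 1 / 4)  # 5 minutes at 4 Hz
    hr = 70 + 5 * np.sin(t / 30) + np.random.randn(len(t))
    gsr = 2 + 0.3 * np.random.rand(len(t))
    print(len(window_features(hr, gsr)), "windows")
```

Features like these would then feed a downstream classifier or regression model for continuous emotional-state estimates.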

Within HRI, these advancements prove invaluable, especially in scenarios limiting human contact or involving risk—such as pandemics—where social robots operate in hazardous environments. Emotion recognition equips these robots to respond empathetically, offer emotional support in mental health care, and tailor interactions in educational or rehabilitation settings. Moreover, it plays a crucial role in monitoring emotional dysregulation linked to depression, anxiety, and autism spectrum disorder. Emotion-aware systems also elevate user experiences in both HRI and human–computer interaction (HCI), powering immersive gaming and adaptive learning platforms that respond fluidly to users’ emotional states.

Looking forward, the fusion of real-time processing, broadly generalizable deep learning models, and efficient edge computing will accelerate progress. These advances will make human–robot interaction more natural, adaptive, and profoundly emotionally intelligent.

Figure 1. A multimodal affective computing framework that integrates deep learning with wearable devices. Visual, neural, and physiological signals are captured through various sensors and processed using a Transformer-based model. A deep fusion block combines these modalities to enable emotion recognition, facilitating emotional awareness that supports both emotion elicitation and the generation of emotional behaviors.

Q: With your work on EEG-based concealed information detection in collaboration with Brainwave Science, how do you see BCI applications evolving for real-world law enforcement or neurosecurity use cases?

A: Brain–Computer Interface (BCI) technology, combined with modalities like computer vision and advanced language models, is entering a pivotal phase of transformation for practical applications in law enforcement and neurosecurity. Our research demonstrates that portable electroencephalography (EEG) devices can non-invasively detect neural signatures linked to concealed knowledge, specifically the P300 component. This approach moves beyond traditional polygraph techniques, which depend on indirect autonomic signals.

Technologically, integrating deep learning models has empowered us to classify deceptive responses with remarkable accuracy. Our findings showcase clear progress in detecting concealment through EEG signals. Moreover, by factoring in reaction times, brain activity topography, and personalized calibration protocols, we are laying a solid foundation for adaptive, individualized assessments that further enhance precision.
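For illustration, a compact convolutional classifier of the kind often used for stimulus-locked EEG epochs might look like the sketch below. The channel count, epoch length, and layer configuration are assumptions for demonstration and do not represent the system developed with Brainwave Science.

```python
# Illustrative sketch of a compact CNN for P300-based concealed-information detection
# (channel count, epoch length, and architecture are assumptions, not the deployed system).
import torch
import torch.nn as nn

class P300Net(nn.Module):
    def __init__(self, n_channels=8, n_samples=256, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            # Temporal convolution over each channel's time course.
            nn.Conv2d(1, 16, kernel_size=(1, 32), padding=(0, 16), bias=False),
            nn.BatchNorm2d(16),
            # Spatial convolution across electrodes.
            nn.Conv2d(16, 32, kernel_size=(n_channels, 1), bias=False),
            nn.BatchNorm2d(32),
            nn.ELU(),
            nn.AvgPool2d(kernel_size=(1, 8)),
            nn.Dropout(0.5),
        )
        with torch.no_grad():
            flat = self.features(torch.zeros(1, 1, n_channels, n_samples)).numel()
        self.classifier = nn.Linear(flat, n_classes)

    def forward(self, x):
        # x: (batch, n_channels, n_samples), epoched around stimulus onset.
        z = self.features(x.unsqueeze(1))
        return self.classifier(z.flatten(1))

if __name__ == "__main__":
    net = P300Net()
    epochs = torch.randn(4, 8, 256)  # 4 stimulus-locked EEG epochs
    print(net(epochs).shape)         # torch.Size([4, 2])
```

In practice, such a classifier would be trained per subject or fine-tuned with a short calibration session, in line with the personalized protocols mentioned above.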

Looking forward, we envision BCI technologies enabling real-time lie detection during interrogations, delivering greater objectivity and reducing false positives compared to conventional polygraphs. These systems hold immense promise for security-critical contexts—such as rapidly identifying high-risk individuals at airport checkpoints. They could also prove invaluable during interviews to expose exaggerations or hidden information from candidates. Real-time screening, powered by lightweight deep models, can facilitate these applications effectively. Additionally, advances in tailored stimulus delivery, aligned with an individual’s emotional responses, will boost the clarity of P300 signals and minimize interpretive uncertainty.

Ultimately, hybrid AI-driven BCI platforms that integrate EEG data with behavioral and physiological cues will revolutionize our ability to infer concealed knowledge or intent with unmatched robustness and accuracy.

Q: Given the growing interest in emotionally intelligent AI, what are the major ethical and technical challenges in building systems that can ‘sense’ and ‘respond’ to human emotions?

A: Developing emotionally intelligent AI faces a fundamental challenge: the highly personalized nature of neurophysiological signals and emotional expressions. We have found that subject-specific calibration substantially improves the accuracy of decoding these signals, and this precision is crucial for dependable emotion recognition and effective deception detection. Yet, as brain–computer interface (BCI) systems advance, ethical and legal frameworks must evolve in tandem, especially around privacy, informed consent, and due process.

Because this technology captures data directly from an individual’s brain activity and emotional responses, it handles profoundly personal and sensitive information. Using such data—particularly in legal or high-stakes scenarios—raises urgent questions about ownership, protection, and the individual’s right to privacy. Emotionally responsive AI can unintentionally reveal internal mental states that people want to keep confidential. Therefore, establishing strict ethical guidelines and regulatory policies becomes imperative.

These policies must address several critical areas. They should restrict how training data is collected and employed. They must guarantee transparency in how emotional inferences are drawn. Additionally, they need to clearly define legal limits for deploying these technologies in contexts like law enforcement or employment.

Despite these complex challenges, our research demonstrates the technical viability of EEG-based deception detection systems beyond controlled laboratory environments. Through deep multimodal emotion recognition—fusing facial features with neurophysiological signals—and advanced deep fusion techniques, we show promising results for real-world applications in law enforcement and neurosecurity. However, the success of these applications hinges on prioritizing ethical safeguards alongside technological innovation.

Q: Your work on mobile health and semantic segmentation in aerial imagery reflects a broad technical scope. How do you maintain rigor while transitioning across such diverse application domains?

A: Deep learning models demonstrate remarkable flexibility, adapting seamlessly to the specific datasets they are trained on—whether in mobile health, aerial semantic segmentation, or neurophysiological signal processing. For instance, as illustrated in Figure 2, semantic segmentation can precisely detect facial landmarks for expression recognition. Similarly, this approach extends to identifying road landmarks, showcasing its versatile applicability.

At the heart of deep learning lies representation learning: the power to autonomously extract meaningful features from raw input and train models to produce accurate outputs solely based on data. This capability enables the design of architectures that strike a balance between generalizability and task-specific precision across diverse domains.

Take semantic segmentation for disaster response as a case in point. In my work, such as the Comparative Study of Real-Time Semantic Segmentation Networks in Aerial Images During Flooding Events and the Comparative Study Between Real-Time and Non-Real-Time Segmentation Models on Flooding Events, I optimized deep learning models to identify damaged infrastructure using aerial imagery. This involved carefully balancing processing speed with accuracy, tackling class imbalance, and rigorously validating results against high-resolution datasets. Notably, these core principles seamlessly translate to facial expression recognition in affective computing, as demonstrated in Facial Expression Recognition with an Efficient Mix Transformer for Affective Human-Robot Interaction.
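As a small, hypothetical example of one of these principles, class imbalance in segmentation can be counteracted with inverse-frequency class weighting in the loss. The weighting scheme and class count below are illustrative assumptions, not the exact setup used in the cited studies.

```python
# Minimal sketch of class-weighted cross-entropy for imbalanced segmentation labels
# (the inverse-frequency weighting scheme and class count are illustrative assumptions).
import torch
import torch.nn as nn

def inverse_frequency_weights(label_maps, n_classes):
    """Compute per-class weights from pixel counts so rare classes (e.g. damaged
    structures) contribute more to the loss than dominant background classes."""
    counts = torch.bincount(label_maps.flatten(), minlength=n_classes).float()
    freq = counts / counts.sum()
    weights = 1.0 / (freq + 1e-6)
    return weights / weights.sum() * n_classes  # normalize so weights average ~1.0

if __name__ == "__main__":
    n_classes = 10                                   # e.g. a FloodNet-style label set
    labels = torch.randint(0, n_classes, (4, 256, 256))
    logits = torch.randn(4, n_classes, 256, 256)
    weights = inverse_frequency_weights(labels, n_classes)
    criterion = nn.CrossEntropyLoss(weight=weights)
    print("loss:", criterion(logits, labels).item())
```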

In mobile health applications, I employed analogous strategies—such as signal denoising, multimodal data fusion, and model interpretability—to develop AI systems that transform raw physiological signals into actionable insights for users. Across every project, I prioritize building reproducible AI pipelines, implementing robust cross-validation, and conducting comprehensive ablation studies. These practices ensure the reliability and generalizability of all outcomes.
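One concrete way to enforce this kind of rigor is subject-wise (grouped) cross-validation, so that no participant contributes trials to both the training and test folds. The sketch below uses scikit-learn's GroupKFold with placeholder features and a simple classifier; the data shapes and model are assumptions for illustration.

```python
# Sketch of subject-wise (grouped) cross-validation so no participant appears in both
# train and test folds; the feature matrix and model here are placeholders.
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))           # e.g. per-trial physiological features
y = rng.integers(0, 2, size=200)         # binary emotion / deception labels
subjects = np.repeat(np.arange(10), 20)  # 10 participants, 20 trials each

scores = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=subjects):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print("subject-wise CV accuracy: %.3f ± %.3f" % (np.mean(scores), np.std(scores)))
```

Grouping by subject gives a more honest estimate of how the model will generalize to people it has never seen, which matters for both mobile health and affective computing.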

By treating each application as a distinct system with its own constraints, while leveraging a unified AI foundation, I facilitate scientifically rigorous transitions between domains. This method consistently yields solutions that are not only technically robust but also deliver meaningful, real-world impact.

Figure 2. Landmark detection using semantic segmentation is employed in both remote sensing and facial expression recognition. The core principle involves identifying key features and directing greater attention to regions of higher significance during training across different datasets. For instance, a Transformer-based model was trained on the FER dataset for facial expression recognition, while a similar model was trained on the FloodNet dataset for remote sensing. In both cases, the fundamental architecture of the deep learning model remains largely unchanged.

Q: With the rise of wearables and edge computing, how do you envision the future of low-power, real-time neural networks for on-device emotion recognition or brain-signal processing?

A: The surge in wearables and edge computing is revolutionizing the future of low-power, real-time neural networks for on-device emotion recognition and brain-signal processing. These applications demand lightweight, high-efficiency deep learning models capable of running seamlessly on resource-limited hardware—microcontrollers, mobile GPUs, and edge processors alike. By dramatically reducing parameters, minimizing inference latency, and shrinking model size, these architectures enable powerful neural networks to operate within tight computational and memory constraints.

Achieving such efficiency requires innovative design strategies that cut computational complexity without sacrificing accuracy. This means trimming layers and trainable parameters, pruning redundant connections, and pinpointing the neural components that most influence performance. Rather than relying on deep, cumbersome architectures, lightweight models harness advanced methods—such as attention mechanisms and efficient convolutional operations—to extract rich, meaningful features with fewer layers.
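As a hedged illustration of two such strategies, the sketch below applies magnitude pruning followed by post-training dynamic quantization to a small PyTorch model; the toy architecture and 50% sparsity level are assumptions, not a published deployment pipeline.

```python
# Illustrative compression of a small model for edge deployment using magnitude pruning
# and dynamic quantization (the model and sparsity level are assumptions, not a
# published pipeline).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 7),            # e.g. 7 emotion categories
)

# 1) Prune 50% of the smallest-magnitude weights in each linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# 2) Apply post-training dynamic quantization (weights stored in int8).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    out = quantized(torch.randn(1, 128))
print("output shape:", out.shape)
```

Pruning shrinks the number of effective parameters, while quantization reduces memory footprint and speeds up inference on CPUs and microcontroller-class hardware.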

Our contributions advance this paradigm, exemplified by projects like Efficient Semantic Segmentation on Edge Devices, Real-time Aerial Pixel-wise Scene Understanding after Natural Disasters, and Large-Scale Damage Assessment with UAVs and Deep Learning. These initiatives prove that sophisticated deep learning can run robustly on low-power hardware, driving real-world applications from disaster relief to emotion-aware human–robot interaction.

Looking forward, breakthroughs in model compression, neural architecture search, and hardware-software co-design will push real-time, on-device inference further. These innovations will enhance the feasibility and reliability of emotion recognition and EEG-based systems in everyday environments, making AI truly ubiquitous and responsive.

In summary, my research harnesses AI to integrate and interpret complex, multimodal data, yielding practical solutions with wide-reaching societal and national impact. From mobile health to neurosecurity and affective human–robot interaction, each effort tackles urgent challenges. The federal support backing my work reflects its scientific rigor and strategic importance to the United States. As I advance this frontier, I remain steadfast in pioneering AI-driven innovations in medical imaging and precision healthcare—fields poised to transform clinical outcomes and define the future of patient-centered care nationwide.
