Emerging Trends in Explainable AI
The Future Leaders of Responsible AI Present Emerging Trends
“Dear future leaders of responsible AI…” reads the letter sent from Saidot to our class, along with the famous 6-finger gloves. I couldn’t agree more - I strongly believe that the future leaders of responsible AI were in this classroom this semester. And not just because they survived a course that covered topics including interpretable ML, explainable AI, AI alignment, adversarial AI, moral AI, and mechanistic interpretability. I have never met a group of ~50 people (between the residential and online sections) so brilliant, so passionate, and so hardworking.
Here is the final artifact from the course - a post containing all of the final projects*. These are novel projects on emerging trends in Responsible AI. I hope you enjoy learning more about them!






Special thanks to this semester’s guest speakers: Varun Babbar, Hayden McTavish, Chris Lam, and Dr. Walter Sinnott-Armstrong.



Duke University is quickly becoming the epicenter of learning, researching, and developing Responsible AI. If this post inspires you, this course will be taught again this Spring. Let’s make a sequel just as good as the original! And be sure to join us for the inaugural Duke Responsible AI Symposium Feb 28-March 1!
Project Table of Contents
XAI in Education (5)
Adversarial AI (4)
Misinformation, Bias, & Moral AI (3)
XAI in LLMs (6)
XAI in Sentiment Analysis (2)
XAI Applications in FinTech (2)
XAI Applications in Business (2)
XAI Applications in Healthcare & Social Good (3)
XAI Applications in Fun & Productivity (4)
Emergent Topics in XAI (5)
XAI in Education
XAI Learning Game - Afraa Noureen
An interactive tool that bridges the gap between AI interpretability and real-world applications. Learn SHAP values, analyze model predictions, and enhance your understanding of Explainable AI—all through fun and engaging gameplay!
Art Meets AI - An XAI-Driven Approach to Neural Style Transfer - Sakshee Patil
A brilliant exploration of neural style transfer using XAI approaches. Educational, interactive, and just plain fun to use!
AI Explained! - Aarya Desai
A book of beautiful and fun visual explanations for machine learning & XAI concepts: decision trees, ridge regression, generalized additive models, LIME, PDP, Counterfactual Explanations, Attention Mechanisms, and Retrieval-Augmented Generation (RAG).
XAI Learner Platform - Yabei Zeng
This application provides an easy-to-understand introduction to Explainable AI (XAI). Learn how XAI techniques make machine learning models more transparent and interpretable. Through interactive modules, discover how AI models work, explore sentiment analysis examples, and understand the "why" behind predictions.
Interactive Web App with tutorials, learning modules, and case studies!
Saliency Map Tutorial - Shuaiming Jing
Saliency maps are a key tool in explainable AI, helping visualize which parts of an image contribute most to a neural network's predictions.
In this tutorial, you will:
Learn the theory behind saliency maps.
Explore their real-world use cases.
Interact with models to generate and compare saliency maps.
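To make the idea concrete, here is a minimal gradient-saliency sketch in PyTorch; the ResNet-18 backbone and random input tensor are stand-ins for whatever model and image you want to explain, not the tutorial's exact setup.
```python
# Minimal gradient-based saliency sketch (illustrative; assumes torchvision >= 0.13).
import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # downloads pretrained weights
model.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)   # stand-in for a preprocessed image

scores = model(image)
scores[0, scores.argmax(dim=1).item()].backward()         # gradient of the top logit w.r.t. pixels

saliency = image.grad.abs().max(dim=1).values             # max over channels -> (1, 224, 224) heatmap
print(saliency.shape)
```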
Adversarial AI
Automated Defence Intelligence systems are vulnerable to adversarial attacks - Stuart Bladon
Defense intelligence is increasingly becoming a domain where computers play a pivotal role. This shift offers numerous advantages, as computers can swiftly process vast amounts of intelligence that would take human analysts days to work through, and faster analysis often translates into more actionable insights. However, this transformation also introduces risks, as intelligence is constantly susceptible to counterintelligence measures. Adversarial attacks have been demonstrated against various models, including image classification models and large language models (LLMs). These perturbations are often imperceptible to humans and can have severe consequences when applied to critical questions such as the likelihood of invasion.
ViT_vs_CNN_Adversarial_Attacks - Hoi Mei (Kelly) Tong
This project examines the comparative performance of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in the context of adversarial attacks. CNNs, which have long dominated computer vision tasks, are well-documented for their vulnerability to adversarial examples. The subtle perturbations of adversarial attacks cause models to misclassify inputs with high confidence. ViTs, a more recent architecture leveraging self-attention mechanisms, have shown promise in achieving competitive or superior results in various vision tasks. However, their behavior under adversarial conditions, especially compared to CNNs, remains less thoroughly studied. This project aims to provide insights into their performance differences when subjected to adversarial perturbations, focusing on accuracy as the primary evaluation metric. Furthermore, explainable AI techniques such as saliency maps are applied to visualize whether the two models rely on different features when classifying adversarial examples.
AI-Attack-Prevention-Tool - Aryan Laxman Sirohi
This is a website that can detect the most common adversarial attacks on Convolutional Neural Networks (CNNs): FGSM (Fast Gradient Sign Method), PGD (Projected Gradient Descent), C&W (Carlini & Wagner), and DeepFool.
The goal of this project was to build a model that can detect whether an image supplied to ResNet34 has been attacked and, if so, by which method. For this project, the most common attacks were considered (FGSM, PGD, C&W, and DeepFool).
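For a sense of what one of these attacks looks like in code, here is a hedged FGSM sketch in PyTorch; the ResNet34 weights, epsilon value, and random input are illustrative assumptions, not the project's detection pipeline.
```python
# Illustrative one-step FGSM perturbation against a ResNet34 classifier.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT).eval()

def fgsm_attack(model, x, label, epsilon=0.03):
    """Return x nudged one epsilon-step in the direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    return (x + epsilon * x.grad.sign()).clamp(0, 1).detach()  # clamp assumes [0, 1]-scaled inputs

x = torch.rand(1, 3, 224, 224)       # stand-in for a preprocessed image
label = torch.tensor([207])          # stand-in ground-truth class index
x_adv = fgsm_attack(model, x, label)
print(model(x).argmax(1).item(), model(x_adv).argmax(1).item())  # clean vs. adversarial prediction
```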
Mirage - Lennox Anderson, Ahmed Boutar, Tal Erez
MIRAGE is a Python-based pipeline designed for researchers to analyze, modify, and compare audio files, with a particular focus on music. MIRAGE is versatile and equipped for tasks ranging from creating formulas to reconstructing audio files from formula-based transformations.
Misinformation, Bias, & Moral AI
Comprehensive Dataset for AI-Generated Content and Tracing Research - Anastasiia Saenko
This project introduces a comprehensive enriched dataset designed to bridge the gap between real-world content and AI-generated content, with a focus on tracing information origins, identifying biases, and addressing misinformation. The dataset is enriched with metadata such as misinformation labels, political polarization, sentiment classifications, and demographic indicators. It covers a wide range of topics, including public health (COVID-19), politics, conflict reporting, and social issues, making it an essential resource for training AI models, generating prompts, and conducting comparative analysis.
Persona Aware Bias Evaluation (PABE): Towards evaluation of implicit bias in chatbots - Jason Lee
Persona-Aware Bias Evaluation (PABE) is a new framework for implicitly encoding protected-attribute information into chatbot queries through zero-shot text style transfer and persona prompting. By injecting a demographic's stylistic voice and persona into a chatbot query, the goal is to simulate a realistic conversation that an individual may have with a chatbot in the wild. Applications of PABE datasets include evaluating fairness (i.e., comparing responses to PABE prompts against original reference answers) and mitigating bias via preference fine-tuning methods like DPO.
Demo Pitch (only available to Duke affiliates)
The Most Cautious and Conservative Moral-LLM - Wayne Yang
This model is fine-tuned from Llama-3.1-8B on GPT-generated content about human safety and morality. It takes a highly conservative approach to safety and morality, providing its references and reasoning before giving its final answer.
XAI in LLMs
LlmEmbeddingXrVizualization - Akalpit Dawkhar & Rakeen Rouf
A package for visualizing Large Language Model (LLM) embedding spaces from Hugging Face models with just the model name as input!
Inspired by the belief that data should be experienced, not just viewed, we're bridging the gap between 2D plots and spatial understanding in the LLM embedding space. The fundamental limitation of 2D screens - trying to compress three dimensions into two - has always forced us to sacrifice either information or clarity. Our platform breaks free from these constraints, transforming raw datasets into immersive XR visualizations using nothing but the name of the model from Hugging Face. Every visualization is accessible on your Meta Quest XR headset.

We're not just plotting data - we're creating a new way to discover insights through spatial exploration, one that respects the true dimensionality of our data. Each word/sentence embedding is meticulously positioned in virtual space, ensuring perfect spatial accuracy and true-to-scale representation. This precision becomes particularly powerful when visualizing LLM embedding spaces - allowing users to physically explore how concepts are related within these models. By walking through the three-dimensional embedding space, researchers can intuitively verify whether semantically similar concepts cluster together and identify unexpected relationships that traditional 2D visualizations might miss.
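For the curious, here is a minimal sketch of the underlying idea: pull embeddings from a Hugging Face model given only its name and project them down to 3-D coordinates. The model name, mean pooling, and PCA below are illustrative choices, not necessarily the project's pipeline.
```python
# Hedged sketch: model name in, 3-D coordinates out.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA

model_name = "bert-base-uncased"                    # any Hugging Face model name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = ["king", "queen", "banana", "apple", "throne"]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state      # (batch, tokens, dim)
embeddings = hidden.mean(dim=1)                     # mean-pool tokens (mask ignored for brevity)

coords_3d = PCA(n_components=3).fit_transform(embeddings.numpy())
print(coords_3d)                                    # x, y, z positions for the XR scene
```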
InContextLab - Hongxuan Li
A Python package for interpreting in-context learning in LLMs. sklearn-compatible and easy to use.
Graph RAG: Powerful, Explainable, Augmentation - John Coogan
Large Language Models (LLMs) are an impressive development in modern technology. While seemingly magically capable of generating human-like text, they are also among the most difficult tools to understand. This leaves room for myriad 'black box' issues such as bias, poor interpretability, adversarial attacks, and more. Ultimately, when you use an LLM, you often accept a level of non-determinism and opacity in your results. While there are many use cases where this is acceptable, there are many more where it is not. Beyond the more nefarious issues above, there are also readily apparent logistical issues such as the scope and timeliness of the training data. At the end of the day, an LLM only 'knows' what it has been trained on, and while it can still generate text to answer any question (because generation comes down to predicting the next most probable word), the quality of the answer is directly related to the quality of the training data. What we are left with is a tool that is incredibly powerful but, in domain-specific applications, can be incredibly brittle.
This project is an exploration of the use of graphs, vector stores, and retrieval augmented generation to not only improve the performance of certain models but to also simultaneously address many of these shortfalls.
Interpretable RAG System - Ritu Toshniwal
This project introduces an interpretable Retrieval-Augmented Generation (RAG) system with t-SNE visualizations to help users better understand how retrieved chunks contribute to the final response generated by a large language model (LLM). The tool provides insights into chunk relevance, allows users to tweak chunking parameters interactively, and avoids time wasted on A/B testing.
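As a rough illustration of the idea (not the project's exact implementation), one can embed the chunks and the query with a sentence-transformer and project them with t-SNE to see which chunks sit closest to the query; the model name and toy chunks below are assumptions.
```python
# Hedged sketch: embed chunks + query, score relevance, project to 2-D for plotting.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.manifold import TSNE

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative embedding model

chunks = ["Paris is the capital of France.",
          "The Eiffel Tower is in Paris.",
          "Photosynthesis occurs in chloroplasts."]
query = "Where is the Eiffel Tower located?"

vectors = embedder.encode(chunks + [query], normalize_embeddings=True)
relevance = vectors[:-1] @ vectors[-1]               # cosine similarity of each chunk to the query
print("most relevant chunk:", chunks[int(np.argmax(relevance))])

coords = TSNE(n_components=2, perplexity=2, init="pca").fit_transform(vectors)
print(coords)                                        # 2-D positions of chunks and query
```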
XAI: Knowledge Graph Visualization of GraphRAG Model - Bob Zhang
This project aims to enhance the explainability of GraphRAG by implementing a visualization method that allows users to explore the underlying knowledge graph. By enabling interaction with the graph's entities and relationships, this approach provides insight into the model's retrieval mechanisms, fostering greater transparency and trust in its decision-making process.
LLM Response Analyzer - Rishabh Shah
The LLM Response Analyzer is a comprehensive tool designed to help users explore, understand, and influence the behavior of Large Language Models (LLMs). By tweaking key parameters, visualizing attention mechanisms, and steering the AI’s personality, this project demystifies how LLMs generate their outputs and opens up new possibilities for explainability and control.
XAI in Sentiment Analysis
Interpretability in Sentiment Analysis Models - Minjie Yang
Large language models (LLMs) have achieved significant success in tasks like financial sentiment analysis, where accurate predictions can directly influence decision-making processes. However, their complex architectures often make them challenging to interpret, raising concerns about their reliability and transparency in high-stakes domains such as finance. This report uses a BERT sentiment analysis model fine-tuned on financial news and explores its interpretability techniques, aiming to uncover its internal workings and enhance trust in its predictions.
LIME and Cross-Domain Sentiment Analysis Using BERT - Antara Bhide
This project explores the use of LIME (Local Interpretable Model-agnostic Explanations) to evaluate the performance of a BERT model trained on one domain and applied to another domain. Specifically, it focuses on understanding how transfer learning in NLP (Natural Language Processing) works, and how LIME can provide insights into the model's predictions when applied to cross-domain data.
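A minimal sketch of how LIME can be wired to a transformer sentiment classifier is below; the off-the-shelf SST-2 pipeline stands in for the project's fine-tuned BERT, and the wrapper assumes a recent version of transformers that accepts top_k=None.
```python
# Hedged sketch: LIME explaining a Hugging Face sentiment pipeline.
import numpy as np
from lime.lime_text import LimeTextExplainer
from transformers import pipeline

clf = pipeline("sentiment-analysis",
               model="distilbert-base-uncased-finetuned-sst-2-english")

def predict_proba(texts):
    """LIME expects an (n_samples, n_classes) probability matrix."""
    outputs = clf(list(texts), top_k=None)           # scores for every class
    probs = []
    for scores in outputs:
        by_label = {s["label"]: s["score"] for s in scores}
        probs.append([by_label["NEGATIVE"], by_label["POSITIVE"]])
    return np.array(probs)

explainer = LimeTextExplainer(class_names=["NEGATIVE", "POSITIVE"])
exp = explainer.explain_instance("The plot was thin but the acting saved it.",
                                 predict_proba, num_features=6, num_samples=500)
print(exp.as_list())                                 # (word, weight) pairs for the prediction
```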
XAI Applications in FinTech
Loan Application Risk Dashboard - Haoran Wang
The Loan Application Risk Dashboard is an interactive web application built using Streamlit. This dashboard allows users to:
Input loan application data (both numerical and categorical factors).
Predict the risk grade for a loan application using a trained Random Forest model.
Visualize feature contributions through Permutation Importance.
The project leverages machine learning and explainable AI techniques to help financial professionals assess loan risks more transparently.
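A compact sketch of that explanation step is below, using scikit-learn's permutation_importance on a Random Forest; synthetic data stands in for the dashboard's loan records.
```python
# Hedged sketch: permutation importance for a Random Forest risk model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")
```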
Detecting Financial Scams with XAI - Using SHAP and LIME - Jinyoung Suh
Can XAI help detect financial scams better? Financial scams are becoming more complicated, and it’s important to not only detect them but also understand why an AI model flags something as suspicious. This project will explore how making AI more understandable can help financial institutions find fraud while reducing mistakes.
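As a rough illustration of the approach, here is a hedged SHAP sketch on a tree model; the synthetic, imbalanced data stands in for transaction records, and the project's actual features and models may differ.
```python
# Hedged sketch: SHAP attributions for a tree-based fraud classifier.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# ~5% positive class to mimic the rarity of fraudulent transactions.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95], random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])   # per-feature contribution for each transaction
print(shap_values[0])                          # which features pushed the first case toward "fraud"
```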
XAI Applications in Business
Counterfactuals for Customer Churn Prediction - Gunel Aghakishiyeva
This project demonstrates the use of counterfactual explanations for predicting customer churn in an online retail business. By leveraging explainable AI (XAI) techniques, the project provides actionable insights for reducing churn and improving customer retention. The approach combines machine learning with the generation of diverse counterfactuals to understand what changes in customer behavior could have prevented churn.
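To make the idea concrete, here is an illustrative brute-force counterfactual search that nudges one feature at a time until the churn prediction flips; libraries such as DiCE generate more diverse, multi-feature counterfactuals, so treat this as a sketch rather than the project's method.
```python
# Hedged sketch: find a single-feature change that flips a churn prediction.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

customer = X[0].copy()
original = model.predict([customer])[0]

for feature in range(X.shape[1]):
    for delta in np.linspace(-2, 2, 41):                # candidate adjustments to this feature
        candidate = customer.copy()
        candidate[feature] += delta
        if model.predict([candidate])[0] != original:
            print(f"Changing feature {feature} by {delta:+.2f} flips the churn prediction.")
            break
    else:
        continue                                        # no flip found for this feature
    break
```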
Market Trend Forecasting and Customer Behavior Analysis - Yiren Shen
An interactive web application with the main functions of market trend forecasting and customer behavior analysis. Goal: Provide data cleansing, customer analytics, time series analysis, product analytics and XAI insights to help companies optimize their decision making.
XAI Applications in Healthcare & Social Good
Grad-CAM and LIME with an Image-based Classifier for Abdominal MRI Series - Chad Miller
This project delivers an interactive Streamlit-based application that classifies abdominal MRI series using convolutional neural networks (CNNs) and integrates explainable AI (XAI) techniques for enhanced interpretability. The app enables users to explore MRI datasets, view model predictions, and generate visual explanations for the decisions made by the model. By utilizing LIME, Anchors, and SHAP, the application allows users to identify the most influential regions in an image, fostering transparency and trust in AI-assisted workflows.
Suicide Posts Prediction with Explainable AI - Xueqing Wu
Combines BERT and LIME to identify the words that cause the BERT model to classify a social media post as being at risk of suicide. One insight from this project is that words like 500mg, suicide, and hurting lead the model to predict a post as high risk. This is useful because words like suicide and hurting can serve as keywords for detecting suicide risk on social media.
Predicting the location of hidden graves in municipalities of Mexico using ML models (a suggestion on improving explainability) - Daniela Jimenez Lara
Current SOTA models are not inherently interpretable. This project aimed to address this by exploring the use of a RuleFit classifier, which achieves comparable results while its rule importances enable exploration of feature importance and interactions.
XAI Applications in Fun & Productivity
LyricalMap - Siddarth Vinnakota
LyricalMap uses embedding models, t-SNE dimensionality reduction and visualization, k-means clustering, and LLMs to identify lyrical relationships and user interests.
MailMood - Wilson Tseng
MailMood is a Chrome extension designed to enhance emotional understanding in digital communication by detecting and highlighting emotions in email content. By identifying the emotional tone and specific triggering words or phrases in an email, the tool aims to bridge communication gaps in professional and academic settings.
AI-Powered Fashion Recommender with Explainable Insights - Anannya Chuli
Upload your image to receive personalized fashion recommendations powered by a state-of-the-art AI model, with Grad-CAM visualizations explaining the model’s decisions. Explore similar styles and discover outfit inspirations tailored to you!
Brainstorm.ai - Yancey Yang
The project acts as an assistant for new ideas. It provides a detailed, step-by-step implementation plan and professional reviews from different perspectives. Within the user interface, evaluations are delivered in seconds with AI.
Emergent Topics in XAI
Explainable AI Models for Edge-Case Object Identification in Autonomous Vehicles - Luopeiwen (Tina) Yi
This study evaluates the predictive performance and explainability of AI models in edge-case scenarios within the context of autonomous driving. It examines two pre-trained image classification models, ResNet50 and VGG16, under conditions such as night driving, adverse weather, and broken infrastructure, focusing on 'object identification' as the classification of entire scenes rather than object localization. Using a novel dataset of edge-case images, this research applies explainability techniques—LIME, Grad-CAM, and Anchors Explanations—to clarify model predictions and identify limitations. ResNet50 and VGG16 demonstrated comparable classification performance, excelling in clear conditions but struggling with occlusions, poor visibility, and overlapping objects. Among the XAI methods, Anchors provided the most precise and interpretable explanations but lacked generalizability due to low coverage. LIME offered flexible, localized visualizations but struggled with consistency in complex scenes, while Grad-CAM, though efficient, lacked clarity and differentiation. These findings emphasize the need for AI models and XAI methods that better integrate contextual reasoning and enhance generalization to improve robustness in real-world autonomous driving applications.
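For readers unfamiliar with Grad-CAM, here is a condensed sketch for a torchvision ResNet50; the layer choice and random input are illustrative, and dedicated libraries such as pytorch-grad-cam wrap this more robustly.
```python
# Hedged Grad-CAM sketch: weight the last conv block's activations by pooled gradients.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
activations, gradients = {}, {}

def save_activation(module, inputs, output):
    activations["value"] = output

def save_gradient(module, grad_input, grad_output):
    gradients["value"] = grad_output[0]

layer = model.layer4[-1]                                # last convolutional block
layer.register_forward_hook(save_activation)
layer.register_full_backward_hook(save_gradient)

x = torch.rand(1, 3, 224, 224)                          # stand-in for a preprocessed road scene
scores = model(x)
scores[0, scores.argmax(dim=1).item()].backward()

weights = gradients["value"].mean(dim=(2, 3), keepdim=True)        # pool gradients per channel
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)           # [0, 1] heatmap over the image
```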
Exploring Perturbation-Based Evaluations in XAI for Time Series Forecasting Models - Haodong He
In the rapidly growing field of machine learning, interpreting predictions from complex models is essential for ensuring trust and accountability, particularly in tasks like time series forecasting. This project aimed to address the challenges of evaluating XAI techniques for time series data by developing a systematic perturbation-based evaluation framework. The objective was to validate feature attributions generated by three popular XAI methods—LIME, Saliency Maps, and Integrated Gradients (IG)—by comparing the model’s performance before and after perturbing key features.
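A minimal sketch of such a perturbation test appears below; toy data and a stand-in attribution ranking replace the project's LIME, saliency, and IG attributions.
```python
# Hedged sketch: perturb the top-ranked features and measure the drop in forecast quality.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                          # 12 lagged features per window
y = 2 * X[:, 0] + X[:, 3] - 0.5 * X[:, 7] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
baseline_mae = mean_absolute_error(y, model.predict(X))

# Stand-in attribution ranking (the project derives this from LIME, saliency maps, or IG).
attributions = np.abs(np.corrcoef(X.T, y)[:-1, -1])
top_k = attributions.argsort()[::-1][:3]

X_perturbed = X.copy()
X_perturbed[:, top_k] = X[:, top_k].mean(axis=0)        # replace top features with their means
perturbed_mae = mean_absolute_error(y, model.predict(X_perturbed))

print(f"MAE before: {baseline_mae:.3f}, after perturbing top features: {perturbed_mae:.3f}")
```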
Quantum Computing and XAI: A Dance of Probabilities - Akhil Chintalapati
Okay, imagine this: two of the coolest tech worlds are coming together for a surprising collab. On one side, there’s quantum computing — super weird, super futuristic, and basically the stuff of sci-fi. On the other side, we have explainable AI (XAI), which is all about making AI actually explain itself, so humans can trust it. At first, they seem totally unrelated, like chai and pizza. But trust me, when you dig deeper, it’s mind-blowing how they might actually fit together.
The Impact of Compression on DNN Explanation - Osama Ahmed
This project analyzes how model compression techniques affect BERT model explanations in sentiment analysis tasks, comparing original BERT and DistilBERT models using SHAP explanations. Report Here.
Explainable AI for Housing Estimates - Keese Phillips
This application is designed to demonstrate how complex problems can be explained using Generalized Additive Models (GAMs) and LIME, which, when used in conjunction with one another, can sometimes explain even the most complex decision boundaries.
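A hedged sketch of the GAM side, assuming the pygam package and synthetic data in place of the project's housing dataset:
```python
# Hedged sketch: fit a GAM and read off each feature's partial effect on price.
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 3))                   # stand-ins for, e.g., sqft, age, distance
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] - 0.05 * X[:, 2] ** 2 + rng.normal(scale=0.1, size=500)

gam = LinearGAM(s(0) + s(1) + s(2)).fit(X, y)

# Each smooth term is a readable 1-D curve: feature value -> partial effect on the estimate.
for i in range(3):
    grid = gam.generate_X_grid(term=i)
    print(f"term {i} effect (first 5 grid points):", gam.partial_dependence(term=i, X=grid)[:5])
```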
*Students were able to opt out of inclusion in this publication. Some students opted out of having their video included and some students did not submit their artifacts in a format that could be easily shared.