Structured Event Reasoning with Large Language Models
Reasoning about real-life events is a unifying challenge in AI and NLP with profound utility across a variety of domains, yet any error in high-stakes applications like law, medicine, and science could be catastrophic. Able to work with diverse text in these domains, large language models (LLMs) have proven capable of answering questions and solving problems. In this talk, I demonstrate that end-to-end LLMs still systematically fail on reasoning tasks involving complex events. Moreover, their black-box nature affords little interpretability and user control. To address these issues, I propose two general approaches to using LLMs in conjunction with a structured representation of events. The first is a language-based representation involving relations among sub-events that LLMs can learn via fine-tuning. The second is a symbolic representation involving the states of entities, which can be leveraged by either LLMs or deterministic solvers. On a suite of event-reasoning tasks, I show that both approaches outperform end-to-end LLMs in both performance and trustworthiness.
Visual Concept Learning Beyond Appearances: Modernizing a Couple of Classic Ideas
The goal of Computer Vision, as framed by Marr, is to develop algorithms that answer "what", "where", and "when" from visual appearance. The speaker, among others, recognizes the importance of studying the underlying entities and relations beyond visual appearance, following an Active Perception paradigm. This talk presents the speaker's efforts over the last decade, ranging from 1) reasoning beyond appearance for vision-and-language tasks (VQA, captioning, T2I, etc.) and addressing their evaluation misalignment, through 2) reasoning about implicit properties, to 3) the roles of these ideas in a robotic visual concept learning framework. The talk will also feature projects from the Active Perception Group (APG) at the ASU School of Computing and Augmented Intelligence (SCAI) that address emerging national challenges in the automated mobility and intelligent transportation domains.
Advancing Multimodal Retrieval and Generation: From General to Biomedical Domains
This talk explores advances in multimodal retrieval and generation across general and biomedical domains. The first work introduces a multimodal retriever-reader pipeline for vision-based question answering, which uses image-text queries to retrieve and interpret relevant textual knowledge. The second work simplifies this approach with an efficient end-to-end retrieval model that removes dependencies on intermediate models such as object detectors. The final part presents a biomedical multimodal generation model that can classify images and explain the predicted labels in response to text prompts. Together, these works demonstrate significant progress in integrating visual and textual data processing across diverse applications.
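For a concrete picture of the retriever stage, below is a minimal sketch of an embedding-based multimodal retriever, assuming a simple late-fusion design: the image and the question are each mapped to a vector, fused into one query embedding, and scored against a text knowledge base by cosine similarity. The encoders here (embed_image, embed_text, embed_query) are hypothetical placeholders rather than the models from the talk; a real pipeline would use learned vision and text encoders with a trained reader on top.

# Hypothetical sketch of an embedding-based multimodal retriever: an image-text
# query is mapped to a single vector and scored against a text knowledge base.
# The encoders below are simple placeholders, not the speaker's actual models.
import numpy as np

rng = np.random.default_rng(0)
DIM = 256

def embed_text(text: str) -> np.ndarray:
    """Placeholder text encoder: hash tokens into a fixed-size vector."""
    vec = np.zeros(DIM)
    for token in text.lower().split():
        vec[hash(token) % DIM] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def embed_image(image: np.ndarray) -> np.ndarray:
    """Placeholder image encoder: random projection of flattened pixels."""
    proj = rng.standard_normal((DIM, image.size))
    vec = proj @ image.ravel()
    return vec / (np.linalg.norm(vec) + 1e-8)

def embed_query(image: np.ndarray, question: str) -> np.ndarray:
    """Fuse image and question embeddings into one query vector (simple sum)."""
    vec = embed_image(image) + embed_text(question)
    return vec / (np.linalg.norm(vec) + 1e-8)

def retrieve(query_vec: np.ndarray, passages: list[str], k: int = 2) -> list[str]:
    """Rank knowledge-base passages by cosine similarity to the query."""
    scores = [float(query_vec @ embed_text(p)) for p in passages]
    top = np.argsort(scores)[::-1][:k]
    return [passages[i] for i in top]

if __name__ == "__main__":
    knowledge_base = [
        "The Eiffel Tower is located in Paris and was completed in 1889.",
        "Mitochondria are the powerhouse of the cell.",
        "The Great Wall of China stretches thousands of kilometers.",
    ]
    image = rng.random((32, 32, 3))  # stand-in for an input image
    question = "When was this tower completed?"
    for passage in retrieve(embed_query(image, question), knowledge_base):
        print(passage)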
Making Machine Learning Models Safer: Data and Model Perspectives
As machine learning systems are increasingly deployed in real-world settings like healthcare, finance, and scientific applications, ensuring their safety and reliability is crucial. However, many state-of-the-art ML models still suffer from issues such as poor out-of-distribution generalization, sensitivity to input corruptions, large data requirements, and inadequate calibration, limiting their robustness and trustworthiness for critical real-world applications. In this talk, I will first present a broad overview of the different safety considerations for modern ML systems. I will then discuss our recent efforts to make ML models safer from two complementary perspectives: (i) manipulating data and (ii) enriching model capabilities through novel training mechanisms. I will discuss our work on designing new data augmentation techniques for object detection, followed by demonstrating how, in the absence of data from desired target domains, one can leverage pre-trained generative models for efficient synthetic data generation. Next, I will present a new paradigm for training deep networks called model anchoring and show how one can achieve ensemble-like properties with a single model. I will specifically discuss how model anchoring significantly enriches the class of hypothesis functions being sampled and demonstrate its effectiveness through improved performance on several safety benchmarks. I will conclude by highlighting exciting future research directions for producing robust ML models by leveraging multimodal foundation models.
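Since the abstract does not spell out the mechanism, here is a brief, hedged sketch of anchored training as it is commonly described in the literature, offered as an assumption rather than the speaker's exact formulation: each input is reparameterized relative to a reference "anchor" drawn from the training data, the network consumes the pair [anchor, input - anchor], and at inference averaging predictions over several anchors yields ensemble-like behavior from a single model. The names AnchoredMLP and predict_with_anchors are illustrative only.

# Hedged sketch of anchored training (an assumption, not the speaker's exact method):
# the model sees [anchor, input - anchor] pairs during training, and predictions
# are averaged over several randomly drawn anchors at test time.
import torch
import torch.nn as nn

class AnchoredMLP(nn.Module):
    def __init__(self, in_dim: int, num_classes: int, hidden: int = 64):
        super().__init__()
        # Input dimension is doubled because the network consumes the
        # concatenated [anchor, input - anchor] pair.
        self.net = nn.Sequential(
            nn.Linear(2 * in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x: torch.Tensor, anchors: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([anchors, x - anchors], dim=-1))

def predict_with_anchors(model, x, anchor_pool, num_anchors=10):
    """Marginalize over randomly drawn anchors to mimic an ensemble."""
    probs = []
    for _ in range(num_anchors):
        idx = torch.randint(0, anchor_pool.size(0), (x.size(0),))
        probs.append(model(x, anchor_pool[idx]).softmax(dim=-1))
    return torch.stack(probs).mean(dim=0)  # averaged predictive distribution

if __name__ == "__main__":
    torch.manual_seed(0)
    in_dim, num_classes = 16, 3
    train_x = torch.randn(128, in_dim)            # toy stand-in training data
    train_y = torch.randint(0, num_classes, (128,))
    model = AnchoredMLP(in_dim, num_classes)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(100):                          # toy training loop
        idx = torch.randint(0, train_x.size(0), (train_x.size(0),))
        logits = model(train_x, train_x[idx])     # random anchor per example
        loss = nn.functional.cross_entropy(logits, train_y)
        opt.zero_grad(); loss.backward(); opt.step()
    test_x = torch.randn(8, in_dim)
    print(predict_with_anchors(model, test_x, train_x).shape)  # (8, 3)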
Learning Actions from Humans in Video
The prevalent paradigm in computer vision for action understanding is to directly transfer advances in object recognition to actions. In this presentation, I discuss the motivations for an alternative, embodied approach centered on modelling actions rather than objects, survey our recent work along these lines, and outline promising future directions.