PPR Seminar

Advances in Perception, Prediction, and Reasoning

Hosted by Tejas Gokhale at UMBC


March 05, 2024

Co-hosted by Lara Martin

Li "Harry" Zhang
Ph.D. Candidate, University of Pennsylvania

Structured Event Reasoning with Large Language Models
Reasoning about real-life events is a unifying challenge in AI and NLP that has profound utility in a variety of domains, while any fallacy in high-stake applications like law, medicine, and science could be catastrophic. Able to work with diverse text in these domains, large language models (LLMs) have proven capable of answering questions and solving problems. In this talk, I demonstrate that end-to-end LLMs still systematically fail on reasoning tasks of complex events. Moreover, their black-box nature gives rise to little interpretability and user control. To address these issues, I propose two general approaches to use LLMs in conjunction with a structured representation of events. The first is a language-based representation involving relations of sub-events that can be learned by LLMs via fine-tuning. The second is a symbolic representation involving states of entities that can be leveraged by either LLMs or deterministic solvers. On a suite of event reasoning tasks, I show that both approaches outperform end-to-end LLMs in terms of performance and trustworthiness.

Bio Li "Harry" Zhang is a 5th-year PhD student working on Natural Language Processing (NLP) and artificial intelligence at the University of Pennsylvania advised by Prof. Chris Callison-Burch. He earned his Bachelor's degree at the University of Michigan mentored by Prof. Rada Mihalcea and Prof. Dragomir Radev. He has published more than 20 papers in NLP conferences that have been cited more than 1,000 times. He has reviewed more than 50 papers in those venues and has served as Session Chair and Program Chair in many conferences and workshops. Being a musician, producer, content creator of over 50,000 subscribers, he is also passionate in the research of AI music.

Feb 08, 2024
3:30 -- 4:45 PM
ITE 325 B

Webex Link

★★ PPR Distinguished Speaker ★★
Yezhou Yang
Associate Professor, Arizona State University

Visual Concept Learning Beyond Appearances: Modernizing a Couple of Classic Ideas
The goal of Computer Vision, as coined by Marr, is to develop algorithms to answer "What are", "Where at", "When from" visual appearance. The speaker, among others, recognizes the importance of studying underlying entities and relations beyond visual appearance, following an Active Perception paradigm. This talk will present the speaker's efforts over the last decade, ranging from 1) reasoning beyond appearance for vision and language tasks (VQA, captioning, T2I, etc.), and addressing their evaluation misalignment, through 2) reasoning about implicit properties, to 3) their roles in a Robotic visual concept learning framework. The talk will also feature the Active Perception Group (APG)’s projects addressing emerging challenges of the nation in automated mobility and intelligent transportation domains, at the ASU School of Computing and Augmented Intelligence (SCAI).

Bio Yezhou (YZ) Yang is an Associate Professor and a Fulton Entrepreneurial Professor in the School of Computing and Augmented Intelligence (SCAI) at Arizona State University. He founded and directs the ASU Active Perception Group, and currently serves as the topic lead (situation awareness) at the Institute of Automated Mobility, Arizona Commerce Authority. He is also a thrust lead (AVAI) at Advanced Communications Technologies (ACT, a Science and Technology Center under the New Economy Initiative, Arizona). His work includes exploring visual primitives and representation learning in visual (and language) understanding, grounding them by natural language and high-level reasoning over the primitives for intelligent systems, secure/robust AI, and V&L model evaluation alignment. Yang is a recipient of the Qualcomm Innovation Fellowship 2011, the NSF CAREER award 2018, and the Amazon AWS Machine Learning Research Award 2019. He received his Ph.D. from the University of Maryland at College Park, and B.E. from Zhejiang University, China. He is a co- founder of ARGOS Vision Inc, an ASU spin-off company.

Dec 04, 2023
4:00 -- 5:15 PM
ENGR 231

Webex Link

Man Luo
Postdoctoral Research Fellow, Mayo Clinic

Advancing Multimodal Retrieval and Generation: From General to Biomedical Domains
This talk explores advancements in multimodal retrieval and generation across general and biomedical domains. The first work introduces a multimodal retriever and reader pipeline for vision-based question answering, using image-text queries to retrieve and interpret relevant textual knowledge. The second work simplifies this approach with an efficient end-to-end retrieval model, removing dependencies on intermediate models like object detectors. The final part presents a biomedical-focused multimodal generation model, capable of classifying and explaining labels in images with text prompts. Together, these works demonstrate significant progress in integrating visual and textual data processing in diverse applications.

Bio Dr Man Luo is a Postdoctoral Research Fellow at Mayo Clinic with Dr. Imon Banerjee and Dr. Bhavik Patel. Her research is at the intersection of information retrieval and reading comprehension within natural language processing (NLP) and multimodal domains, to retrieve and utilize external knowledge with efficiency and generalization. Currently she is interested in knowledge retrieval, multimodal understanding, and applications of LLMs and VLMs in biomedical/healthcare application. She earned her Ph.D. in 2023 from Arizona State University advised by Dr. Chitta Baral, and has collaborated at industrial research labs at Salesforce, Meta, and Google.

Nov 29, 2023
4:00 -- 5:15 PM
ENGR 231

Webex Link

Kowshik Thopalli
Postdoctoral Researcher, Lawrence Livermore National Laboratory

Making Machine Learning Models Safer: Data and Model Perspectives
As machine learning systems are increasingly deployed in real-world settings like healthcare, finance, and scientific applications, ensuring their safety and reliability is crucial. However, many state-of-the-art ML models still suffer from issues like poor out-of-distribution generalization, sensitivity to input corruptions, requiring large amounts of data, and inadequate calibration - limiting their robustness and trustworthiness for critical real-world applications. In this talk, I will first present a broad overview of different safety considerations for modern ML systems. I will then proceed to discuss our recent efforts in making ML models safer from two complementary perspectives - (i) manipulating data and (ii) enriching the model capabilities by developing novel training mechanisms. I will discuss our work on designing new data augmentation techniques for object detection followed by demonstrating how, in the absence of data from desired target domains of interest, one could leverage pre-trained generative models for efficient synthetic data generation. Next, I will present a new paradigm of training deep networks called model anchoring and show how one could achieve similar properties to an ensemble but through a single model. I will specifically discuss how model anchoring can significantly enrich the class of hypothesis functions being sampled and demonstrate its effectiveness through its improved performance on several safety benchmarks. I will conclude by highlighting exciting future research directions for producing robust ML models through leveraging multi-modal foundation models.

Bio Kowshik Thopalli is a Machine Learning Scientist and a post-doctoral researcher at Lawrence Livermore National Laboratory. His research focuses on developing reliable machine learning models that are robust under distribution shifts. He has published papers on a variety of techniques to address model robustness, including domain adaptation, domain generalization, and test-time adaptation using geometric and meta-learning approaches. His expertise also encompasses integrating diverse knowledge sources, such as domain expert guidance and generative models, to improve model data efficiency, accuracy, and resilience to distribution shifts. He received his Ph.D. in 2023 from Arizona State University.

Nov 27, 2023
4:00 -- 5:15 PM
ENGR 231

Webex Link

Eadom Dessalene
Ph.D. Candidate, University of Maryland College Park

Learning Actions from Humans in Video
The prevalent computer vision paradigm in the realm of action understanding is to directly transfer advances in object recognition toward action understanding. In this presentation I discuss the motivations for an alternative embodied approach centered around the modelling of actions rather than objects and survey recent work of ours along these lines, as well as promising possible future directions.

Bio Eadom Dessalene is a Ph.D. Candidate at University of Maryland, College Park, advised by Yiannis Aloimonos and Cornelia Fermuller in the Perception and Robotics Group. Eadom received his bachelors degree in Computer Science from George Mason University. He has made several important contributions to research on video understanding and ego-centric vision through publications in CVPR, ICLR, T-PAMI, and ICRA, as well as winning first place in the 2020 EPIC Kitchens Action Anticipation Challenge.