Tejas Gokhale

PhD Candidate
School of Computing & AI, Arizona State University




My mission is to conduct research to improve the robustness and reliability of AI systems. My contributions towards this goal are in the domains of machine learning, computer vision, and natural language processing.

I work with Yezhou Yang and Chitta Baral at ASU, and closely collaborate with Rushil Anirudh at Lawrence Livermore National Laboratory. I received my MS in ECE from Carnegie Mellon University, where I worked with Aswin Sankaranarayanan.

My domain expertise lies in "semantic vision", i.e., computer vision tasks that seek to assign "meaning" to what we see. This includes "designative" tasks such as image classification, and "communicative" tasks involving both vision and language, such as visual question answering, visual reasoning, and image captioning.

The main focus of my Ph.D. is robust visual understanding, addressing problems such as domain shift and out-of-distribution generalization, linguistic robustness (logical, semantic), and visual robustness (corruptions, geometric transformations, attribute-level shift).

A Research Statement and a poster summarizing some of my work (presented at the CVPR 2022 Doctoral Consortium) are now available.

Collaboration/Mentorship Opportunities: If you're a PhD student interested in collaborating with me on robust machine learning in Vision/NLP/V+L (domain generalization, adversarial attack/defense, etc.) or other related topics, please send me an email (if you're at ASU, we can discuss it over coffee). I'm always happy to dish out advice and share my experiences regarding admissions to Ph.D. programs in CS/EE/CE (help me help you by sending me a list of specific questions over email).


23 Jun 2022 Presented my work at the Doctoral Consortium at CVPR 2022

20 Jun 2022 Organized the 1st Workshop on Open-Domain Retrieval Under Multi-Modal Settings (O-DRUM) at CVPR 2022

25 Apr 2022 Recognized as Highlighted Reviewer for ICLR 2022 (top ~8%)


* indicates equal contribution

Improving Diversity with Adversarially Learned Transformations for Domain Generalization
(to appear in) WACV 2023
Tejas Gokhale, Rushil Anirudh, Jayaraman Thiagarajan, Bhavya Kailkhura, Chitta Baral, Yezhou Yang
pdf code

ALT discovers diverse and adversarial transformations using an image-to-image neural network with learnable weights. ALT improves state-of-the-art single-domain generalization performance on three benchmarks and significantly outperforms pixel-wise adversarial training and standard data augmentation techniques.

Semantically Distributed Robust Optimization for Vision-and-Language Inference
ACL Findings 2022
Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang
pdf code

We propose SDRO, a distributed robust optimization method that operates with linguistic transformations of sentence inputs; SISP, a suite of semantics-inverting (SI) and semantics-preserving (SP) linguistic transformations; and an ensembling technique for vision-and-language inference.
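The SI/SP idea can be illustrated with a toy sketch. Everything here is a hypothetical stand-in: `sp_transform` and `si_transform` are plain string edits, not the SISP transformations, and `worst_case_loss` shows a generic distributionally-robust-style objective (the maximum loss over an input's transformed variants) rather than the exact SDRO formulation. The point is the label bookkeeping: an SP transformation keeps the True/False label of a statement, while an SI transformation flips it.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical toy transformations (plain string edits), NOT the SISP suite.
def sp_transform(sentence: str) -> str:
    """Semantics-preserving (SP): rewrite the word 'two' as the digit '2'."""
    return " ".join("2" if w == "two" else w for w in sentence.split(" "))

def si_transform(sentence: str) -> str:
    """Semantics-inverting (SI): insert a negation after the first ' is '."""
    return sentence.replace(" is ", " is not ", 1)

def transformed_pairs(sentence: str, label: bool) -> List[Tuple[str, bool]]:
    """Original plus transformed inputs; SP keeps the label, SI flips it."""
    return [
        (sentence, label),
        (sp_transform(sentence), label),
        (si_transform(sentence), not label),
    ]

def bce(p_true: float, label: bool) -> float:
    """Binary cross-entropy for a predicted probability that the statement is True."""
    p = p_true if label else 1.0 - p_true
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)

def worst_case_loss(predict: Callable[[str], float], sentence: str, label: bool) -> float:
    """DRO-style objective: the largest loss over the set of transformed inputs."""
    return max(bce(predict(s), y) for s, y in transformed_pairs(sentence, label))

def overconfident(sentence: str) -> float:
    """A hypothetical model that ignores negation and always answers 'True' with p=0.9."""
    return 0.9

print(worst_case_loss(overconfident, "The left image is showing two dogs.", True))
```

Such a predictor does well on the original and SP inputs but incurs a large loss (about -log 0.1, roughly 2.30) on the SI variant, and the worst-case objective is driven by exactly that failure.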

Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness
ACL Findings 2022
Tejas Gokhale, Man Luo, Swaroop Mishra, Bhavdeep Singh Sachdeva, Chitta Baral

In this work, we conduct a comprehensive study of common data modification strategies and evaluate not only their in-domain and out-of-domain (OOD) performance, but also their adversarial robustness (AR). This work serves as an empirical study towards understanding the relationship between generalizing to unseen domains and defending against adversarial perturbations.

Improving Biomedical Information Retrieval with Neural Retrievers
AAAI 2022
Man Luo, Arindam Mitra, Tejas Gokhale, Chitta Baral

We seek to improve information retrieval (IR) in the biomedical domain with neural retrievers (NR), using a three-pronged approach: (1) a template-based question generation method, (2) two novel pre-training tasks that are closely aligned to the downstream task of information retrieval, and (3) the Poly-DPR model, which encodes each context into multiple context vectors.
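A minimal sketch of scoring with multiple context vectors, under the assumption (not taken from the paper) that a context's relevance is the maximum dot product between the query vector and any of its context vectors. The encoders are omitted; `poly_score`, `rank`, and the toy 2-d embeddings are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def poly_score(query: Vector, context_vectors: List[Vector]) -> float:
    """Relevance of one context, represented by several vectors, to a query.
    ASSUMPTION: the best-matching context vector determines the score."""
    return max(dot(query, c) for c in context_vectors)

def rank(query: Vector, corpus: Dict[str, List[Vector]]) -> List[str]:
    """Context ids sorted from most to least relevant."""
    return sorted(corpus, key=lambda cid: poly_score(query, corpus[cid]), reverse=True)

# Toy 2-d embeddings (hypothetical): context "a" has one vector close to the query.
corpus = {
    "a": [[0.2, 0.9], [0.8, 0.1]],
    "b": [[0.5, 0.5]],
}
print(rank([1.0, 0.0], corpus))  # ['a', 'b']
```

Averaging context "a"'s two vectors would give [0.5, 0.5] and tie it with "b"; keeping several vectors lets one facet of a long context match the query on its own.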

Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
ICCV 2021
Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

VQA models trained with two additional objectives, object centroid estimation and relative position estimation, show improved performance on spatial reasoning questions (in GQA) in fully supervised and few-shot settings, as well as improved out-of-distribution (OOD) generalization.
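For intuition about the two auxiliary objectives, here is a sketch of how centroid and relative-position regression targets could be computed from object bounding boxes. The box format, the normalization, and the function names are assumptions for illustration; this is not the paper's exact target parameterization or its weak-supervision pipeline.

```python
from typing import Tuple

# ASSUMED box format: (x_min, y_min, x_max, y_max), normalized to [0, 1],
# with y growing downward as in image coordinates.
Box = Tuple[float, float, float, float]

def centroid(box: Box) -> Tuple[float, float]:
    """Target for centroid estimation: the box center."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def relative_position(box_a: Box, box_b: Box) -> Tuple[float, float]:
    """Target for relative position estimation: offset of b's centroid from a's.
    dx > 0 means b is to the right of a; dy > 0 means b is below a."""
    ax, ay = centroid(box_a)
    bx, by = centroid(box_b)
    return (bx - ax, by - ay)

cup = (0.0, 0.0, 0.2, 0.2)
plate = (0.6, 0.1, 0.8, 0.3)
dx, dy = relative_position(cup, plate)
print(f"dx={dx:.2f}, dy={dy:.2f}")  # dx=0.60, dy=0.10
```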

WeaQA: Weak Supervision via Captions for Visual Question Answering
ACL Findings 2021
Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

We show that models can be trained without any human-annotated Q-A pairs, using only images and their associated text captions. Our experiments suggest gains on a benchmark with shifted priors (VQA-CP) over baselines that use full supervision from human-authored QA data.

HalluciNet: Scene Completion by Exploiting Object Co-occurrence Relationships
CVPR 2021 Workshop, "AI for Content Creation"
Kuldeep Kulkarni, Tejas Gokhale, Rajhans Singh, Pavan Turaga, Aswin Sankaranarayanan

Scene completion from sparse and incomplete label maps. HalluciNet is a two-stage method that captures object co-occurrence relationships to produce dense label maps from incomplete label maps and object boundaries, for image synthesis.

Self-Supervised Test-Time Learning for Reading Comprehension
NAACL 2021
Pratyay Banerjee, Tejas Gokhale, Chitta Baral

An unsupervised reading comprehension method that operates directly on a single test passage. Synthetic QA pairs are generated from the passage, and models are trained on them. When a new human-authored test question is posed, these models infer answers better than previous unsupervised methods.

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations
AAAI 2021
Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura, Jayaraman Thiagarajan, Chitta Baral, Yezhou Yang
pdf code

An adversarial training approach that learns to generate new samples so as to maximize the classifier's exposure to the attribute space. We study robustness to semantic shifts beyond Lp-norm perturbations, on three types of naturally occurring perturbations: object-related shifts, geometric transformations, and common image corruptions.

MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
EMNLP 2020
Tejas Gokhale*, Pratyay Banerjee*, Chitta Baral, Yezhou Yang

MUTANT is a training paradigm that exposes VQA models to perceptually similar, yet semantically distinct mutations of the input image or question. We use a pairwise consistency loss between answers to original and mutant inputs as a regularizer, along with an answer-embedding NCE loss. MUTANT establishes a new state of the art (+10%) on the VQA-CP challenge (generalization under Changing Priors).
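One way to read the pairwise consistency idea is sketched below. This is an assumed form, not necessarily the loss used in MUTANT: it compares the cosine distance between the model's answer vectors for the original and mutant inputs with the cosine distance between their ground-truth answer vectors, and penalizes the mismatch. The answer vectors and function names are hypothetical, and the answer-embedding NCE loss is not shown.

```python
import math
from typing import Sequence

def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; both vectors must be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def pairwise_consistency_loss(pred_orig, pred_mut, gt_orig, gt_mut) -> float:
    """ASSUMED form: the gap between the predicted answers for the original and
    the mutant should match the gap between their ground-truth answers."""
    return abs(cosine_distance(pred_orig, pred_mut) - cosine_distance(gt_orig, gt_mut))

# Toy 3-way answer space (hypothetical). The mutation changes the answer,
# but the model predicts the same answer for both inputs, so it is penalized.
gt_orig, gt_mut = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
pred = [0.9, 0.1, 0.0]
print(round(pairwise_consistency_loss(pred, pred, gt_orig, gt_mut), 3))  # 1.0
```

If the mutation leaves the answer unchanged (`gt_mut` equal to `gt_orig`) and the model also predicts the same answer for both inputs, this loss is close to zero.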

Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
EMNLP 2020
Zhiyuan Fang*, Tejas Gokhale*, Pratyay Banerjee, Chitta Baral, Yezhou Yang
pdf code web

Actions in videos are inherently linked to latent social and commonsense aspects. We present the first work on generating commonsense captions directly from videos, to describe latent intentions, attributes, and effects of humans in videos. Additionally, we explore the use of open-ended video-based commonsense question answering (V2C-QA) as a way to enrich our captions.

VQA-LOL: Visual Question Answering under the Lens of Logic
ECCV 2020
Tejas Gokhale*, Pratyay Banerjee*, Chitta Baral, Yezhou Yang
pdf, web video

VQA models struggle with negation, antonyms, conjunction, and disjunction! Using our novel modules and datasets, we show that models can learn to answer logically composed questions while retaining performance on VQA data.
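As a toy illustration of how a logically composed yes/no question can be answered from its parts, the hypothetical helper below combines component answers with negation, conjunction, and disjunction. It covers only the ground-truth logic; the question templates, antonym handling, and model modules from the paper are not reproduced.

```python
from typing import Optional

TO_BOOL = {"yes": True, "no": False}

def compose_answer(op: str, a1: str, a2: Optional[str] = None) -> str:
    """Ground-truth answer to a logically composed yes/no question,
    computed from the answers to its component questions."""
    b1 = TO_BOOL[a1]
    if op == "not":
        result = not b1
    elif op in ("and", "or"):
        if a2 is None:
            raise ValueError(f"'{op}' needs two component answers")
        result = (b1 and TO_BOOL[a2]) if op == "and" else (b1 or TO_BOOL[a2])
    else:
        raise ValueError(f"unsupported operator: {op}")
    return "yes" if result else "no"

# Q1: "Is there a dog?" -> yes;  Q2: "Is there a cat?" -> no
print(compose_answer("and", "yes", "no"))  # no   ("Is there a dog and a cat?")
print(compose_answer("or", "yes", "no"))   # yes  ("Is there a dog or a cat?")
print(compose_answer("not", "yes"))        # no   ("Is there no dog?")
```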

Cooking With Blocks: A Recipe for Visual Reasoning on Image-Pairs
CVPR 2019 Workshop, Vision Meets Cognition
Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral
pdf, [CVPR-VMC Paper] web

Given two images (source and target) with different object configurations, what is the sequence of steps that re-arranges the source to match the target? For this reasoning task, we propose a modular approach that contains a visual encoder and an event-sequencer/planner, and exhibits inductive generalization.


ICLR 2022 Highlighted Reviewer
CVPR 2022 Doctoral Consortium
IJCAI 2019 Doctoral Consortium
Scholarships / Fellowships: Travel Awards
  • Graduate College Travel Award, ASU (for CVPR 2022, ICCV 2021, EMNLP 2020, ECCV 2020)
  • IJCAI Doctoral Consortium Travel Award (IJCAI 2019)
  • CIDSE Travel Grant Award (for CVPR 2019)
Societies / Memberships


Reviewer: ICLR 2022; NeurIPS 2022; ECCV 2022; AAAI 2021, 2022; ACL Conferences / ARR (ACL, EMNLP, NAACL) 2021, 2022; ICRA 2019, 2020, 2021; IEEE RA-L; WACV 2022; Springer MVAP
Organizer/Host: O-DRUM Workshop @CVPR 2022, Spring 2021 Seminar Series (Frontiers in Vision and Language), Summer Vision Reading Group
Advisor: ASU Machine Learning Club (undergraduate student organization)
Research Mentor: ASU FURI, CSE485 Capstone (Cognitive Vision, Vision&Language)
  • ASU CSE310: Data Structures and Algorithms (Taught Recitations)
  • ASU CSE408: Multimedia Information Systems (TA)
  • ASU CSE110: Principles of Programming (Taught Labs)
  • ASU CSE576: Natural Language Processing (Mentored Class Projects)
  • BITS CTE: Advanced Image Processing (co-Instructor)
Volunteer: ICML 2020, SWRS 2019
Student Mentor: Graduate Student Mentorship Program (ASU), Peer Mentorship Program (BITS Goa).


Collaborators: Mentees:
  • MS Research: Adela Shao (2021- ), Maitreya Patel (2021- ), Abhishek Chaudhary (2020-2021 AY) → Amazon, Arnav Chakravarthy (2020-2021 AY) → VMware, Aadhavan Sadasivam (2019-20 AY) → PayPal, Shivakshit Patri (2019 Spring) → Amazon
  • Undergraduate: Mertay Dayanc (ASU FURI), Paul Butler (2019-20 AY) → Microsoft, Jace Lord, Sagarika Pannase, Aashwin Ranjan, William Tith (BS Capstone).

Template borrowed from Alane Suhr.