Publications | Karan Sikka

2024


Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification

Pritish Sahu, Karan Sikka, and Ajay Divakaran

arXiv preprint, 2024


We present Pelican, a novel framework for detecting and correcting hallucinations in vision-language models by decomposing complex claims into verifiable sub-claims and using program-of-thought verification to ensure factual accuracy.

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models

Yangyi Chen, Karan Sikka, Michael Cogswell, and 2 more authors

In NAACL, 2024


We investigate chain-of-thought reasoning in vision-language models, proposing metrics to measure reasoning consistency and methods to improve the reliability of these models’ reasoning processes.

SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments

Abhinav Rajvanshi, Karan Sikka, Xiao Lin, and 3 more authors

In PNAS, 2024


We present SayNav, a framework that grounds large language models for robot navigation in novel environments through dynamic planning, enabling robots to understand natural language commands and navigate effectively in previously unseen spaces.

Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning

Anirudh Som, Karan Sikka, Helen Gent, and 3 more authors

In ACL Findings, 2024


DRESS: Instructing large vision-language models to align and interact with humans via natural language feedback

Yangyi Chen, Karan Sikka, Michael Cogswell, and 2 more authors

In CVPR, 2024


DRESS is a method for aligning vision-language models with human preferences using natural language feedback, enabling more effective human-AI interaction and improving model behavior through iterative refinement.

2023

TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models

Indranil Sur, Karan Sikka, Matthew Walmer, and 5 more authors

In ICCV, 2023

Multilingual Content Moderation: A Case Study on Reddit

Meng Ye, Karan Sikka, Katherine Atwell, and 3 more authors

In EACL, 2023

Predicting Information Pathways Across Online Communities

Yiqiao Jin, Yeon-Chang Lee, Kartik Sharma, and 4 more authors

In KDD, 2023

A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval

Matthew Gwilliam, Michael Cogswell, Meng Ye, and 3 more authors

arXiv preprint arXiv:2312.00115, 2023

2022

Dual-Key Multimodal Backdoors for Visual Question Answering

Matthew Walmer, Karan Sikka, Indranil Sur, and 2 more authors

In CVPR, 2022

Challenges in Procedural Multimodal Machine Comprehension: A Novel Way To Benchmark

Pritish Sahu, Karan Sikka, and Ajay Divakaran

In WACV, 2022


2021

Towards solving multimodal comprehension

Pritish Sahu, Karan Sikka, and Ajay Divakaran

arXiv, 2021

MISA: Online Defense of Trojaned Models using Misattributions

Panagiota Kiourti, Wenchao Li, Anirban Roy, and 2 more authors

In Annual Computer Security Applications Conference, 2021

Resilient Data Augmentation Approaches to Multimodal Verification in the News Domain

John Cadigan, Karan Sikka, Meng Ye, and 1 more author

In ICCV Workshops, 2021


2020

Deep adaptive semantic logic (DASL): Compiling declarative knowledge into deep neural networks

Karan Sikka, Andrew Silberfarb, John Byrnes, and 4 more authors

arXiv, 2020


RGB2LIDAR: Towards solving large-scale cross-modal visual localization

Niluthpol Chowdhury Mithun, Karan Sikka, Han-Pang Chiu, and 2 more authors

In ACMM, 2020


Best Paper Finalist

Zero-shot learning with knowledge enhanced visual semantic embeddings

Karan Sikka, Jihua Huang, Andrew Silberfarb, and 6 more authors

arXiv, 2020

Detecting trojaned DNNs using counterfactual attributions

Karan Sikka, Indranil Sur, Susmit Jha, and 2 more authors

arXiv, 2020


2019


Align2Ground: Weakly supervised phrase grounding guided by image-caption alignment

Samyak Datta, Karan Sikka, Anirban Roy, and 3 more authors

In ICCV, 2019


We propose a weakly supervised approach for phrase grounding that learns to localize textual phrases in images using only image-caption pairs, without requiring expensive bounding box annotations.

Integrating text and image: Determining multimodal document intent in Instagram posts

Julia Kruk, Jonah Lubin, Karan Sikka, and 3 more authors

EMNLP, 2019

Deep Unified Multimodal Embeddings for Understanding both Content and Users in Social Media Networks

Karan Sikka, Lucas Van Bramer, and Ajay Divakaran

arXiv, 2019

FoodX-251: A dataset for fine-grained food classification

Parneet Kaur, Karan Sikka, Weijun Wang, and 2 more authors

CVPR Workshops, 2019

Sunny and dark outside?! Improving answer consistency in VQA through entailed question generation

Arijit Ray, Karan Sikka, Ajay Divakaran, and 2 more authors

EMNLP, 2019

Semantically-Aware Attentive Neural Embeddings for Long-Term 2D Visual Localization

Zachary Seymour, Karan Sikka, Han-Pang Chiu, and 2 more authors

In BMVC, 2019


2018


Zero-shot object detection

Ankan Bansal, Karan Sikka, Gaurav Sharma, and 2 more authors

In ECCV, 2018


We present a novel approach for zero-shot object detection (ZSD), which aims to detect object classes not seen during training. Our method leverages semantic relationships between seen and unseen classes using word embeddings to enable detection of novel object categories.

Understanding visual ads by aligning symbols and objects using co-attention

Karuna Ahuja, Karan Sikka, Anirban Roy, and 1 more author

In CVPR Workshops, 2018

Make up your mind: Towards consistent answer predictions in VQA models

Arijit Ray, Giedrius T Burachas, Karan Sikka, and 4 more authors

In ECCV Workshops, 2018


2017

Deep active object recognition by joint label and action prediction

Mohsen Malmir, Karan Sikka, Deborah Forster, and 3 more authors

CVIU, 2017

Discriminatively trained latent ordinal model for video classification

Karan Sikka and Gaurav Sharma

PAMI, 2017

AdaScan: Adaptive scan pooling in deep convolutional neural networks for human action recognition in videos

Amlan Kar, Nishant Rai, Karan Sikka, and 1 more author

In CVPR, 2017


2016


LOMo: Latent ordinal model for facial analysis in videos

Karan Sikka, Gaurav Sharma, and Marian Bartlett

In CVPR, 2016


We propose LOMo, a latent ordinal model for video classification that leverages the inherent temporal ordering in videos for improved action recognition under weak supervision.

2015

The more the merrier: Analysing the affect of a group of people in images

Abhinav Dhall, Jyoti Joshi, Karan Sikka, and 2 more authors

In AFGR, 2015

Exemplar hidden Markov models for classification of facial expressions in videos

Karan Sikka, Abhinav Dhall, and Marian Bartlett

In CVPR Workshops, 2015

Automated assessment of children’s postoperative pain using computer vision

Karan Sikka, Alex A Ahmed, Damaris Diaz, and 4 more authors

Pediatrics, 2015

Joint Clustering and Classification for Multiple Instance Learning

Karan Sikka, Ritwik Giri, and Marian Bartlett

In BMVC, 2015

Deep Q-learning for Active Recognition of GERMS: Baseline performance on a standardized dataset for active learning.

Mohsen Malmir, Karan Sikka, Deborah Forster, and 2 more authors

In BMVC, 2015


2014

A discriminative parts based model approach for fiducial points free and shape constrained head pose normalisation in the wild

Abhinav Dhall, Karan Sikka, Gwen Littlewort, and 2 more authors

In WACV, 2014

Classification and weakly supervised pain localization using multiple segment representation

Karan Sikka, Abhinav Dhall, and Marian Stewart Bartlett

IVC, 2014

Emotion recognition in the wild challenge 2014: Baseline, data and protocol

Abhinav Dhall, Roland Goecke, Jyoti Joshi, and 2 more authors

In ICMI, 2014

Facial expression analysis for estimating pain in clinical settings

Karan Sikka

In ICMI, 2014


2013


Weakly supervised pain localization using multiple instance learning

Karan Sikka, Abhinav Dhall, and Marian Bartlett

In AFGR, 2013


Best Student Paper Honorable Mention

Multiple kernel learning for emotion recognition in the wild

Karan Sikka, Karmen Dykstra, Suchitra Sathyanarayana, and 2 more authors

In ICMI, 2013


Best Paper Award

Pseudo vs. true defect classification in printed circuits boards using wavelet features

Sahil Sikka, Karan Sikka, Manas Kamal Bhuyan, and 1 more author

arXiv preprint arXiv:1310.6654, 2013


2012

Exploring bag of words architectures in the facial expression domain

Karan Sikka, Tingfan Wu, Josh Susskind, and 1 more author

In ECCV, 2012


2011

Texture information-based hybrid methodology for the segmentation of SAR images

Pankaj K Singh, Nitesh Sinha, Karan Sikka, and 1 more author

International Journal of Remote Sensing, 2011


2010

Comparison of algorithms for ultrasound image segmentation without ground truth

Karan Sikka and Thomas M Deserno

In Medical Imaging 2010: Image Perception, Observer Performance, and Technology Assessment, 2010


2009

A fully automated algorithm under modified FCM framework for improved brain MR image segmentation

Karan Sikka, Nitesh Sinha, Pankaj K Singh, and 1 more author

Magnetic Resonance Imaging, 2009
