Computer Vision Scientist
SRI International
karan.sikka AT sri DOT com
Short CV
Google Scholar Page

Bio: Dr. Karan Sikka is an Advanced Computer Scientist at the Center for Vision Technologies, SRI International, in Princeton, USA. He graduated with a PhD in 2016 from the Machine Perception Lab at UCSD, where he was advised by Dr. Marian Bartlett. Before joining UCSD, he completed his bachelor's degree in ECE at the Indian Institute of Technology Guwahati in 2010. His current research focuses on fundamental problems in Computer Vision and Machine Learning such as learning with multiple modalities (including vision and language), learning under weak supervision, and few/zero-shot learning. He has successfully applied these methods to multiple problems such as facial expression recognition, action recognition, object detection, visual grounding, and visual localization. The underlying theme of his research has been to improve the generalization of Computer Vision models, across multiple domains, by providing useful inductive biases in the model design (e.g. better features or interactions), the data (augmentation with knowledge or multimodality), or the loss function (e.g. weakly supervised learning). His work has been published at high-quality venues such as CVPR, ECCV, ICCV, and PAMI. He won a best paper honorable mention award at IEEE Face and Gesture 2013 and a best paper award at the Emotion Recognition in the Wild Workshop at ICMI 2013. He serves as a reviewer/program-committee member for venues such as ECCV, CVPR, ICCV, ICML, NIPS, AAAI, ACCV, IJCV, IEEE TIP, IEEE TAC, IEEE TM, ICMI, and AFGR, and as an Area Chair for ACM Multimedia 2019 and 2020.

At SRI he is a co-PI on several government-funded programs (ONR CEROSS, DARPA M3I, and AFRL Mesa) related to understanding and analyzing social media content across multiple modalities and user structures. His work has resulted in the MatchStax API, which allows seamless matching of content with users and content with content through unsupervised embeddings (paper and demo). He has also been working on injecting human knowledge, expressed in first-order logic, into neural networks (here). You can get a glimpse of his work in a short interview recorded at SRI. He actively collaborates with researchers from both industry and universities, as excellent research cannot be done in isolation.


    1. Dec 2020: I will be giving an invited talk at ICVGIP'20 on multimodal embeddings and their applications in vision-language tasks. Please see this for more details.
    2. Looking for motivated MS/PhD interns for Summer 2021. See here.
    3. Aug 2020: Our work on large-scale cross-modal visual localization (from aerial LIDAR to RGB) was selected as a best paper candidate at ACM Multimedia 2020.
    4. May 2019: I talk about my research in a short interview video recorded by Reenita Hora.
    5. Apr 2019: Excited to be an area chair for ACM Multimedia 2020.
    6. Aug 2019: Two papers accepted at EMNLP 2019.
    7. Jul 2019: Our work "Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment" got accepted at ICCV 2019. Check the paper here.
    8. Jul 2019: Our work "Semantically-Aware Attentive Neural Embeddings for Image-based Visual Localization" got accepted at BMVC 2019. Check the paper here.
    9. Apr 2019: We are hosting the iFood 2019 challenge aimed at classifying fine-grained food categories in images. We have introduced a new dataset with 251 food categories for this challenge. This competition is part of the FGVC workshop being held at CVPR 2019. Please visit the Github page and Kaggle page for more details. Looking forward to participation!
    10. Mar 2019: I am an Area Chair for ACM Multimedia 2019.
    11. Jul 2018: I was an invited speaker at the MADIMA Workshop (4th International Workshop on Multimedia Assisted Dietary Management), held in conjunction with IJCAI and ECAI in Stockholm, Sweden. I presented our work on food classification and the recent iFood challenge organized at CVPR 2018.
    12. Jul 2018: Our work on zero-shot object detection was accepted at ECCV 2018. Visit the webpage for the paper and more details.
    13. Jun 2018: Our work "Understanding Visual Ads by Aligning Symbols and Objects using Co-Attention" is online now. In this work we proposed a novel weakly supervised learning algorithm that uses an iterative co-attention mechanism to effectively combine multiple references (semantic and symbolic) present in an image. This work was presented as a workshop paper at CVPR 2018.
    14. Apr 2018: We are hosting the iFood 2018 challenge aimed at classifying fine-grained food categories in images. We have introduced a new dataset with 211 food categories for this challenge. This competition is part of the FGVC workshop being held at CVPR 2018. Please visit the Github page and Kaggle page for more details. This challenge is jointly organized by SRI International and Google.
    15. Apr 2018: Our recent work on zero-shot object detection is online now. In this work we introduce and target the novel problem of detecting unseen objects in test images. We study this problem comprehensively and propose a new experimental design for evaluating it. Visit the webpage for the paper and more details.
    16. Dec 2017: A technical report on our work combining weakly and webly supervised learning for classifying food images is available.
    17. Our work extending our CVPR 2016 paper on the Latent Ordinal Model to human action recognition has been accepted to IEEE TPAMI. Please find the paper here.
    18. The AdaScan paper has been accepted to CVPR 2017.
    19. We have uploaded the paper for our new work, AdaScan (Adaptive Scan Pooling), for human action classification in videos. AdaScan is a deep CNN that pools informative and discriminative frames in a single temporal scan of the video.
    20. I have joined the Vision and Learning group at SRI International in Princeton, New Jersey.
    21. I successfully defended my thesis on 15th August 2016; here are the video and the presentation slides.
    22. Our paper extending our CVPR 2016 work has been uploaded to arXiv. This work extends the LOMo algorithm and also evaluates it on human action classification, with extensive qualitative and quantitative experiments. We have updated the project page.
    23. I had a wonderful and productive two-week visit to the Indian Institute of Technology Kanpur as a visiting researcher (16th May - 27th May 2016).
    24. Paper (spotlight presentation) accepted at Computer Vision and Pattern Recognition (CVPR) 2016, with Dr. Gaurav Sharma (Assistant Professor at the Indian Institute of Technology Kanpur).
    25. Blog entry on my thesis and CVPR paper.
    26. I will be working as an Associate Intern in the Vision and Learning group at SRI International, Princeton, from Jan-Mar 2016.
    27. Two papers accepted at BMVC 2015.
    28. Another article, in Engadget, on our collaborative work on predicting pain in pediatric populations. Link
    29. Article in the UCSD Health Newsroom on our collaborative work on predicting pain in pediatric populations. Link
    30. Paper on 'Automated Assessment of Children's Post-Operative Pain' accepted in the journal Pediatrics. Collaborative work with Dr. Jeannie Huang, Dr. Kenneth Craig, Dr. Marian Bartlett, Alex Ahmed, and Damariz Diaz.
    31. Group expression paper (collaborative work with Dr. Abhinav Dhall et al.) accepted at IEEE FG 2015.
    32. Finished my thesis proposal exam and advanced to candidacy.

Previous and Current Affiliations