Discriminatively Trained Latent Ordinal Models

Our algorithm automatically mines discriminative sub-events while also modeling priors on their ordering for classifying human action classes. We show visual examples from two classes to highlight that the algorithm detects these sub-events consistently across videos from the same class. These sub-events also seem to correspond to semantic motion segments: for the golf class, the first sub-event is moving the club back, the second is swinging, and the third is hitting the ball. More examples are shown below.

  1. Sikka, K., Sharma, G. (2016). Discriminatively Trained Latent Ordinal Model for Video Classification. arXiv:1608.02318. (submitted) [arXiv preprint].
  2. Sikka, K., Sharma, G., Bartlett, M. (2016). LOMo: Latent Ordinal Model for Facial Analysis in Videos. Computer Vision and Pattern Recognition (CVPR). [PDF].

    We propose a novel 'loosely' structured Latent SVM formulation that models a video as a collection of temporal sub-events with a prior on their ordering. The model generalizes the Multiple Instance Learning (MIL) algorithm, which does not model multiple sub-events or their temporal structure. The discriminative templates for these sub-events and the ordering priors are learned in a weakly supervised setting, akin to Deformable Part Models (DPMs).
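    As an illustration of the idea above, the following is a minimal sketch of how a video could be scored with sub-event templates and an ordering prior. It is not the released implementation: the function name `latent_ordinal_score`, the dictionary-based `order_cost`, the rule that each sub-event occupies a distinct frame, and the brute-force search are all assumptions made for clarity.

```python
from itertools import permutations

import numpy as np


def latent_ordinal_score(frames, templates, order_cost):
    """Score one video by brute-force search over sub-event placements.

    frames:     (T, D) array, one feature vector per frame (or segment).
    templates:  (M, D) array, one linear template per sub-event.
    order_cost: dict mapping a temporal ordering of the templates (a tuple
                of template indices, earliest first) to a scalar prior.
                Orderings missing from the dict get a prior of 0.
    Returns (best_score, best_frames), where best_frames[i] is the frame
    assigned to template i.
    """
    T, M = frames.shape[0], templates.shape[0]
    responses = templates @ frames.T  # (M, T) template-vs-frame scores
    best_score, best_frames = -np.inf, None
    # permutations(range(T), M) enumerates every way to place the M
    # templates on M distinct frames (an assumption of this sketch).
    for placement in permutations(range(T), M):
        appearance = sum(responses[i, t] for i, t in enumerate(placement))
        # Order in which the templates fire along the time axis.
        ordering = tuple(int(i) for i in np.argsort(placement))
        score = appearance + order_cost.get(ordering, 0.0)
        if score > best_score:
            best_score, best_frames = score, placement
    return best_score, best_frames


# Toy data (made up): 4 frames, 3-dimensional features, 2 sub-events.
frames = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [0.0, 0.0, 0.0]])
templates = np.array([[0.0, 1.0, 0.0],   # sub-event 0 matches frame 1
                      [1.0, 0.0, 0.0]])  # sub-event 1 matches frame 0
score_no_prior, placement_no_prior = latent_ordinal_score(frames, templates, {})
score_prior, placement_prior = latent_ordinal_score(
    frames, templates, {(0, 1): 5.0})  # prior favors sub-event 0 first
```

    In this toy case the ordering prior changes the selected placement: without it, each template simply takes its best-matching frame, while a strong prior on "sub-event 0 before sub-event 1" moves sub-event 1 to a later frame. The exhaustive search is only practical for tiny inputs and stands in for whatever inference procedure a real implementation would use.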

    We first published this work at CVPR16 [2] [PDF], where we also showed state-of-the-art results for human facial behavior classification in videos. We now extend the Latent Ordinal Model (LOMo) to Adaptive LOMo [1] [arXiv preprint] and evaluate it on the challenging problem of human action classification in videos. To handle the unconstrained nature of human action classes, the Adaptive LOMo algorithm adapts to each action class by learning the relative contributions of the local and global temporal components. Adapting to the temporal structure of each class is important because, for classes that rely on context (driving a car) or involve a single fast-moving action (turning), global temporal information could be sufficient for classification. Through both qualitative and quantitative analysis, we show that Adaptive LOMo achieves results comparable to state-of-the-art methods on three challenging human action recognition datasets.
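    To make the split between global and local temporal components concrete, here is a small sketch under stated assumptions. It treats the global component as a linear template applied to mean-pooled frame features and the local component as a sum of per-template best frame responses; the ordering prior is left out for brevity. The function name `global_local_score` and this particular decomposition are hypothetical and are not taken from the papers.

```python
import numpy as np


def global_local_score(frames, w_global, local_templates):
    """Return (global_term, local_term, total) for one video.

    frames:          (T, D) array of per-frame features.
    w_global:        (D,) template applied to the mean-pooled video.
    local_templates: (M, D) array, one template per sub-event.
    """
    # Global component: mean-pool all frames, then apply one linear template.
    global_term = float(w_global @ frames.mean(axis=0))
    # Local component: each sub-event template picks its best-matching frame.
    local_term = float((local_templates @ frames.T).max(axis=1).sum())
    return global_term, local_term, global_term + local_term


# Toy data (made up): 3 frames with 2-dimensional features.
frames = np.array([[1.0, 0.0],
                   [3.0, 0.0],
                   [0.0, 2.0]])
w_global = np.array([1.0, 1.0])
local_templates = np.array([[1.0, 0.0],
                            [0.0, 1.0]])
g, l, total = global_local_score(frames, w_global, local_templates)
```

    In a sketch like this, the relative contribution of each component is governed by the magnitudes of the learned templates, so a class dominated by context would end up with a larger global term and a class with distinct sub-events with a larger local term.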

    Comparison with State-of-the-art methods on Human Action Recognition

    Comparison with State-of-the-art methods on Human Facial Analysis

    Results as reported in [1, 2].

    Visualizations of detected (latent) events

    golf class

    clean-and-jerk class

    shoot-ball class

    Relative Improvement with respect to Global Temporal Pooling

    We show the relative improvement of Adaptive LOMo over Global Temporal Pooling on a few classes. The improvement is higher for classes that have an underlying temporal structure.
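    For readers who want to reproduce this kind of comparison, the sketch below computes a relative improvement, assuming the usual definition as a percentage gain over the baseline score. The function name and the numbers in the usage line are placeholders, not results from the papers.

```python
def relative_improvement(baseline, improved):
    """Percentage gain of `improved` over `baseline` (baseline must be > 0)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (improved - baseline) / baseline


# Placeholder per-class accuracies (made up for illustration only).
gain = relative_improvement(50.0, 60.0)
```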

    For citations, please use the following papers:

    1. Karan Sikka and Gaurav Sharma
    Discriminatively Trained Latent Ordinal Model for Video Classification
    (submitted) arXiv:1608.02318, 2016

                title = {Discriminatively Trained Latent Ordinal Model for Video Classification},
                author = {Sikka, Karan and Sharma, Gaurav},
                year = {2016},
                journal = {arXiv preprint arXiv:1608.02318}

    2. Karan Sikka, Gaurav Sharma and Marian Bartlett
    LOMo: Latent Ordinal Model for Facial Analysis in Videos
    IEEE Computer Vision and Pattern Recognition, 2016

                title = {LOMo: Latent Ordinal Model for Facial Analysis in Videos},
                author = {Sikka, Karan and Sharma, Gaurav and Bartlett, Marian},
                year = {2016},
                booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}

    Note: We will be releasing the precomputed features for these datasets shortly. For any questions please email Karan Sikka at karan.sikka1[At]gmail.com