A3SI Doctoral Workshop

Speakers: Thomas Belos, Thibaut Issenhuth, Yamna Ouchtar, Mathis Petrovich
March 10, 2022

The next doctoral workshop will take place on Thursday, March 10, from 13:00 to 15:00 at ESIEE Paris in amphitheater 260. It will also be possible to attend the presentations online at https://meet.google.com/rfh-iovd-hpt

The workshop program is as follows:

  • 13:00 – 13:30: Thomas Belos, "MOD SLAM: Mixed Method for a More Robust SLAM Without Loop Closing"
  • 13:30 – 14:00: Thibaut Issenhuth, "Learning disconnected manifolds with generative adversarial networks"
  • 14:00 – 14:30: Yamna Ouchtar, "Watershed-Based Oversampling for Imbalanced Dataset Classification"
  • 14:30 – 15:00: Mathis Petrovich, "Human motion generation"

Presentation abstracts

Speaker: Thomas Belos

Presentation title: MOD SLAM: Mixed Method for a More Robust SLAM Without Loop Closing

Abstract: In recent years, the state of the art in monocular SLAM has seen remarkable advances in reducing errors and improving robustness. At the same time, this quality of results can be obtained in real time on small CPUs. However, most algorithms have a high failure rate out of the box. Systematic errors such as drift remain significant even for the best algorithms. Drift can be handled by a global measure such as loop closure, but this penalizes online data processing. We propose a mixed SLAM based on ORB-SLAM2 and DSO: MOD SLAM. It is a fusion of photometric and feature-based methods, without being a simple copy of both. We propose a decision system that predicts, at each frame, which optimization will produce the minimum drift, so that only one is selected, saving computational time and resources. We propose a new implementation of the map that can actively work with DSO and ORB points at the same time. Our experimental results show that this method increases overall robustness and reduces drift without compromising computational resources. Contrary to the best state-of-the-art algorithms, MOD SLAM can handle 100% of KITTI, TUM, and random phone videos without any configuration change.

Speaker: Thibaut Issenhuth

Presentation title: Learning disconnected manifolds with generative adversarial networks

Abstract: Generative Adversarial Networks (GANs) have recently achieved impressive results in unconditional image synthesis (e.g. face synthesis), but still struggle on large-scale multi-class datasets. In this presentation, we will formalize a fundamental limitation of GANs when the target distribution lies on disconnected manifolds. We establish a "no free lunch" theorem for disconnected manifold learning, stating an upper bound on the precision of the generated distribution. Then, we will present two methods to improve the precision of a pre-trained generator: a heuristic method rejecting generated samples with high Jacobian Frobenius norms, and a learning-based method trained to minimize the Wasserstein distance between generated and target distributions.
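
As a rough illustration of the heuristic rejection idea mentioned in this abstract, the sketch below drops latent samples where a generator's Jacobian has a large Frobenius norm. Everything here is an assumption made for illustration and not the speaker's implementation: `toy_generator` is a hypothetical stand-in for a trained GAN generator, the Jacobian is estimated by finite differences on a scalar latent, and the `keep_ratio` threshold is arbitrary.

```python
import math
import random


def toy_generator(z):
    """Hypothetical 1-D to 2-D generator with a sharp transition near z = 0."""
    s = math.tanh(10.0 * z)  # steep around 0, nearly flat elsewhere
    return (s, 0.1 * z)


def jacobian_frobenius_norm(g, z, eps=1e-4):
    """Estimate ||J_g(z)||_F with central finite differences (scalar latent)."""
    plus, minus = g(z + eps), g(z - eps)
    return math.sqrt(sum(((p - m) / (2 * eps)) ** 2 for p, m in zip(plus, minus)))


def reject_high_jacobian(g, latents, keep_ratio=0.8):
    """Keep the fraction of latents with the smallest Jacobian norms."""
    ranked = sorted(latents, key=lambda z: jacobian_frobenius_norm(g, z))
    return ranked[: int(keep_ratio * len(ranked))]


rng = random.Random(0)
latents = [rng.gauss(0.0, 1.0) for _ in range(1000)]
kept = reject_high_jacobian(toy_generator, latents)
samples = [toy_generator(z) for z in kept]
```

In this toy setting the rejected latents are those near z = 0, where the generator moves quickly between its two output modes; this is the kind of region where a continuous generator would place samples between disconnected parts of the target distribution.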

Speaker: Yamna Ouchtar

Presentation title: Watershed-Based Oversampling for Imbalanced Dataset Classification

Abstract: For several real-world problems, such as credit card fraud, rare diseases, or road accidents, the provided dataset is composed of two or more imbalanced classes: a minority and a majority. With usual machine learning methods, this imbalance often leads to poor results where the minority class is misclassified. To alleviate these issues, several pre-processing methods, such as SMOTE, DBSMOTE, or G-SMOTE, have been developed to build new artificial points. Nevertheless, these oversampling methods explicitly or implicitly make hypotheses about the cluster's size, shape, or density that may not fit the dataset in practice. We propose to improve these oversampling methods and reduce cluster assumptions by relying on another classifier: the watershed-cut. We call this method WSSMOTE. Experimental results demonstrate that, although there is no silver bullet, WSSMOTE often outperforms G-SMOTE, DBSMOTE, and SMOTE on several datasets composed of different minority class percentages.
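
For readers unfamiliar with the baseline named in this abstract, here is a minimal sketch of plain SMOTE-style oversampling: each artificial point is placed on the segment between a minority point and one of its nearest minority neighbors. This is not WSSMOTE and does not use the watershed-cut; the function name `smote`, its parameters, and the toy data are assumptions chosen for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority point as the base.
        base = rng.choice(minority)
        # Find its k nearest minority neighbors (excluding the base itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Place the new point at a random position on the segment base -> neighbor.
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority class: the four corners of the unit square (illustrative data only).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=5)
```

The interpolation step is where the implicit assumption about cluster shape enters: new points always fall on straight segments between neighbors, whether or not the minority class actually occupies that region.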

Speaker: Mathis Petrovich

Presentation title: Human motion generation

Abstract: We tackle the problem of action-conditioned generation of realistic and diverse human motion sequences. In contrast to methods that complete, or extend, motion sequences, this task does not require an initial pose or sequence. Here we learn an action-aware latent representation for human motions by training a generative variational autoencoder (VAE). By sampling from this latent space and querying a certain duration through a series of positional encodings, we synthesize variable-length motion sequences conditioned on a categorical action. Specifically, we design a Transformer-based architecture, ACTOR, for encoding and decoding a sequence of parametric SMPL human body models estimated from action recognition datasets. We evaluate our approach on the NTU RGB+D, HumanAct12, and UESTC datasets and show improvements over the state of the art. Furthermore, we present two use cases: improving action recognition by adding our synthesized data to training, and motion denoising. Finally, I will present TEMOS, a recent extension of this work: generating diverse 3D human motions from textual descriptions.