CESI

[MASTER'S INTERNSHIP] - Generative Few-Shot Learning using Mixture-of-Experts Transformers for 3D Skeleton-based Human Action Recognition


Job Location

Nice, France

Job Description

Scientific fields: Computer Science, Artificial Intelligence, Computer Vision
Keywords: Generative Few-Shot Learning; Transformers; Mixture-of-Experts; RGBD Datasets; Human-System Interaction (HSI)
Research interests: Computer Vision; Machine Learning; Deep Learning
Research work: Deep Learning Models for 3D Skeleton-based Human Action Recognition

3D Skeleton-based Human Action Recognition (HAR) [1], [2] is a fundamental task in pattern recognition and computer vision and a key issue in many applications, e.g., medical and industrial imaging, robotics, and VR/AR. HAR decodes human movements by analyzing sequences of 3D skeletal joint coordinates obtained from sensor technologies such as motion capture devices, depth cameras (e.g., Microsoft Kinect, Intel RealSense), and wearable motion sensors. These sensors track body joint positions in real time, enabling computational analysis of human actions and gestures across diverse domains.

Recently, the authors of [3] introduced few-shot generative models for skeleton-based HAR, enabling accurate action classification from limited training samples in specific domains. They leveraged large public datasets (NTU RGBD 120 [4] and NTU RGBD [5]) to develop cross-domain generative models. By introducing novel entropy-regularization losses, they effectively transferred motion diversity from source to target domains, enabling more robust action recognition with limited training samples. They used a standard model, the Spatial-Temporal Graph Convolutional Network (ST-GCN) [6], to generate action samples, and then trained the few-shot generative model on the concatenation of the real data and the samples generated by ST-GCN. Few-shot scenarios [3], [6], [7] arise whenever HAR models must be trained with very limited labeled data; this is a major obstacle to practical use because collecting human action data and annotating it correctly are time consuming and labor intensive. In [8], the authors proposed few-shot learning for cross-domain HAR: self-training is used to adapt representations learned in a labeled source domain (defined by activities, sensor positions, and users) to a target domain with very limited labeled data.

In this internship, we will develop Generative Few-Shot Learning models for HAR that generate action samples and augment limited training data. We will propose a novel approach to 3D Skeleton-based HAR that combines Generative Few-Shot Learning with Mixture-of-Experts (MoE) Transformers [9], [10], [11]. The proposed approach aims to improve the efficiency and accuracy of action recognition on RGBD datasets while addressing the challenges of limited training data.

The main key concepts of Generative Few-Shot Learning for HAR:
1. 3D Skeleton-based Human Action Recognition (HAR): the method recognizes human actions from skeletal representations, using joint coordinates to represent movements [12]. The 3D skeleton captures the dynamic interactions and positions of the body joints over time, which can be used to classify actions [12]. 2. Generative Few-Shot Learning: few-shot learning [3] aims to train a model to recognize new actions from only a small number of training examples. This is particularly important in HAR because labeled data for certain action classes is often scarce. Generative models help by synthesizing additional training examples, or variations of existing ones, to augment the training dataset and improve model robustness. 3. Generative MoE Transformers: combining MoE with Transformer architectures [10], [11] allows different experts to specialize in different aspects of the action recognition task, while self-attention mechanisms capture the temporal dependencies in the skeleton data, enhancing the model's ability to understand complex actions over time.

The proposed Generative Few-Shot Learning with MoE Transformer architecture for 3D Skeleton-based HAR consists of:
1. Input representation: for each action, the input is the sequence of 3D skeleton joint coordinates over time.
2. Transformer encoder: modified with MoE layers [9], [10].
3. MoE layers: these replace the feed-forward layers in the Transformer [9], [10]; multiple experts (feed-forward networks) specialize in different aspects of action recognition, and a router network determines which experts to use for each input sequence (a minimal sketch follows this list).
4. Generative Few-Shot Learning [3], [6], [7], [8], [13] based on ST-GCN [12] and MoE layers [9], [10] to generate samples.
5. Output: the recognized action.
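To make points 2 and 3 above concrete, the following is a minimal sketch (in PyTorch) of what a Transformer encoder block for skeleton sequences could look like when its feed-forward sub-layer is replaced by a Mixture-of-Experts layer with a top-k router. All names and hyper-parameters (MoELayer, SkeletonMoEEncoderBlock, num_experts=4, top_k=2, d_model=128, 25 joints as in NTU RGBD skeletons) are illustrative assumptions, not the implementation prescribed for the internship.

    # Hypothetical sketch, not the internship's prescribed implementation.
    import torch
    import torch.nn as nn


    class MoELayer(nn.Module):
        """Token-wise Mixture-of-Experts: a router picks the top-k experts per time step."""
        def __init__(self, d_model, d_hidden, num_experts=4, top_k=2):
            super().__init__()
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(num_experts)])
            self.router = nn.Linear(d_model, num_experts)   # routing scores per token
            self.top_k = top_k

        def forward(self, x):                               # x: (batch, time, d_model)
            weights, idx = self.router(x).topk(self.top_k, dim=-1)
            weights = weights.softmax(dim=-1)               # mixing weights of the selected experts
            out = torch.zeros_like(x)
            # Naive but clear: every expert runs on every token, then outputs are
            # masked and mixed according to the router's top-k choices.
            for e, expert in enumerate(self.experts):
                expert_out = expert(x)
                for k in range(self.top_k):
                    mask = (idx[..., k] == e).unsqueeze(-1)
                    out = out + mask * weights[..., k:k + 1] * expert_out
            return out


    class SkeletonMoEEncoderBlock(nn.Module):
        """Self-attention over time plus an MoE feed-forward, for 3D skeleton sequences."""
        def __init__(self, num_joints=25, d_model=128, num_heads=4):
            super().__init__()
            self.embed = nn.Linear(num_joints * 3, d_model)  # (x, y, z) per joint, per frame
            self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
            self.moe = MoELayer(d_model, d_hidden=4 * d_model)
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, skel):                             # skel: (batch, time, joints, 3)
            b, t, v, c = skel.shape
            h = self.embed(skel.reshape(b, t, v * c))        # per-frame joint embedding
            a, _ = self.attn(h, h, h)                        # temporal self-attention
            h = self.norm1(h + a)
            return self.norm2(h + self.moe(h))               # MoE replaces the usual FFN


    # Usage: 8 sequences of 64 frames with 25 joints (NTU RGBD skeleton format).
    features = SkeletonMoEEncoderBlock()(torch.randn(8, 64, 25, 3))   # -> (8, 64, 128)

In a full model, several such blocks would be stacked and followed by a classification head; the sketch only illustrates how a router and multiple expert feed-forward networks can be wired into one encoder block.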
In this internship, we will train the proposed models on different datasets for 3D human action recognition. Finally, to measure accuracy and study performance, we will test the proposed models on the NTU RGBD and NTU RGBD 120 datasets.

Work plan: the work is divided into two phases:
1) In the first phase (about two months), the student will review the state of the art (SOTA) of few-shot learning models (machine/deep learning) applied to 3D Skeleton-based Human Action Recognition (HAR), and then test SOTA models on the NTU RGBD and NTU RGBD 120 datasets.
2) In the second phase (about four months), the student will propose contributions in the following research directions:
1. Proposing a new few-shot learning model based on generative models, Transformers, and Mixture-of-Experts for Human Action Recognition.
2. Studying the properties of such models (complexity, expressivity, frugality).
3. Applying the proposed few-shot learning model to the NTU RGBD and NTU RGBD 120 datasets for Human Action Recognition tasks, to measure accuracy and study performance.

Expected scientific production: several scientific productions, such as an international peer-reviewed conference paper or an indexed journal paper, are expected:
1. A journal publication presenting a literature review of deep learning models for 3D Skeleton-based Human Action Recognition (HAR) using the NTU RGBD and NTU RGBD 120 datasets.
2. A publication presenting our proposal of a new Generative Few-Shot Learning model for 3D HAR, with model performance and evaluation based on training/validation and testing on Human-System Interaction datasets.

Introduction to the laboratory: CESI LINEACT (Digital Innovation Laboratory for Companies and Learning at the service of the competitiveness of territories) is the research unit of the CESI group, whose activities are carried out on CESI campuses. Link to the laboratory…
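As a complement to the work plan, a rough sketch of the data-augmentation idea described earlier (training the few-shot model on the concatenation of real samples and generator outputs, as in [3]) could look like the following. The function augment_few_shot and the generator(label, n) interface are hypothetical names introduced here for illustration only, not code from the cited work.

    # Hypothetical illustration of few-shot data augmentation with a pretrained generator.
    import torch

    def augment_few_shot(real_x, real_y, generator, per_class=20):
        """real_x: (N, T, V, 3) real skeleton sequences; real_y: (N,) integer labels.
        generator(label, n) is assumed to return n synthetic sequences of shape (n, T, V, 3)."""
        gen_x, gen_y = [], []
        for label in real_y.unique():
            synth = generator(int(label), per_class)         # synthetic sequences for this class
            gen_x.append(synth)
            gen_y.append(torch.full((per_class,), int(label), dtype=real_y.dtype))
        # The few-shot classifier is then trained on the real and generated data together.
        x = torch.cat([real_x, *gen_x], dim=0)
        y = torch.cat([real_y, *gen_y], dim=0)
        return x, y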


Contact Information

Contact Human Resources
CESI

Posted

February 3, 2025
UID: 5034470569
