Statistical Methods for Data Mining and Learning (MSDMA)

The work of the MSDMA team (Méthodes statistiques de data-mining et apprentissage, i.e. statistical methods for data mining and learning) falls within data science. It covers data processing with mathematical, statistical, and computational methods whose unifying concept is data mining. This discipline aims to discover relationships and structures in data through supervised and unsupervised learning methods. It lies at the intersection of statistics, artificial intelligence, and databases. Learning theory provides its conceptual foundations.

Team members develop exploratory methods and parametric and nonparametric modeling methods, together with the software tools needed to implement them.

This work makes it possible to handle complex data: sparse, outlying, missing, truncated, or censored data; mixed data combining quantitative and qualitative variables; and data structured in blocks or multiple tables.

These data come from a variety of fields, such as the environment, remote sensing, industrial processes, imaging, medicine, health, and the social sciences.

As databases grow in volume and variety, and more generally with the emergence of data mining and big data as new fields, the team also addresses the scientific challenges of data science by exploring emerging topics such as the explainability of AI methods.

Peer-reviewed conferences and journals

2024

Journal articles

  1. Abdi, H.; Guillemot, V.; Liu, R.; Niang, N.; Saporta, G. and Yu, J-c. From Plain to Sparse Correspondence Analysis: a Generalized SVD Approach. In Statistica Applicata - Italian Journal of Applied Statistics, 35 (3): 301-338, 2024. doi  www 

Conference papers

  1. Ndao, M-L.; Youness, G.; Niang, N. and Saporta, G. Enhancing Explainability in Predictive Maintenance: Investigating the Impact of Data Preprocessing Techniques on XAI Effectiveness. In The 37th International Conference of the Florida Artificial Intelligence Research Society, Florida, United States, Special Track: Explainable, Fair, and Trustworthy AI 37, 2024. doi  www

2023

Journal articles

  1. Liu, R.; Niang, N.; Saporta, G. and Wang, H. Sparse correspondence analysis for large contingency tables. In Advances in Data Analysis and Classification, 17 (4): 1037-1056, 2023. doi  www 
  1. Bry, X.; Niang, N.; Verron, T. and Bougeard, S. Clusterwise elastic-net regression based on a combined information criterion. In Advances in Data Analysis and Classification, 17: 75-107, 2023. doi  www 

Conference papers

  1. Youness, G.; Phan, Nu U. T. and Boulakia, B. C. BootBOGS: Hands-on optimizing Grid Search in hyperparameter tuning of MLP. In AICCSA 2023: 20th ACS/IEEE International Conference on Computer Systems and Applications, Giza, Egypt, Track 4: Artificial Intelligence & Cognitive Systems, 2023. www

2022

Journal articles

  1. Daouda, O. S.; Chevance, A.; Temime, L.; Légeron, P.; Gaillard, R.; Saporta, G. and Hocine, M. A new ranking index to identify the work-related psychosocial factors most impacting mental health: a cross-sectional study. In BMJ Open, 12 (12): e046444, 2022. doi  www 
  1. Le Guen, V. and Thome, N. Deep Time Series Forecasting with Shape and Temporal Criteria. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 45 (1): 342-355, 2022. doi  www 
  1. Audigier, V. and Niang, N. Clustering with missing data: which equivalent for Rubin's rules? In Advances in Data Analysis and Classification, 2022. doi  www
  1. Boukela, L.; Zhang, G.; Yacoub, M.; Bouzefrane, S. and Baba Ahmadi, S. B. An approach for unsupervised contextual anomaly detection and characterization. In Intelligent Data Analysis, 26 (5): 1185-1209, 2022. doi  www 
  1. Bar-Hen, A. and Audigier, V. An ensemble learning method for variable selection: application to high dimensional data and missing values. In Journal of Statistical Computation and Simulation, 2022. doi  www 

Conference papers

  1. Ameur, Y.; Aziz, R.; Audigier, V. and Bouzefrane, S. Secure and non-interactive k-NN classifier using symmetric fully homomorphic encryption. In Privacy in statistical databases (PSD'2022), pages 142-154, Springer International Publishing, Paris, France, Lecture Notes in Computer Science 13463, 2022. doi  www 
  1. Calem, L.; Ben-Younes, H.; Perez, P. and Thome, N. Diverse Probabilistic Trajectory Forecasting with Admissibility Constraints. In 2022 26th International Conference on Pattern Recognition (ICPR), pages 3478-3484, IEEE, Montreal, Canada, 2022. doi  www 

2021

Journal articles

  1. Djennane, N.; Yacoub, M.; Aoudjit, R. and Bouzefrane, S. CPU-based prediction with Self Organizing Map in Dynamic Cloud Data Centers. In International Journal of Sensors, Wireless Communications and Control, 11 (7): 733-747, 2021. doi  www 
  1. Mboup, B.; Le Tourneau, C. and Latouche, A. Insights for Quantifying the Long-Term Benefit of Immunotherapy Using Quantile Regression. In JCO precision oncology (5): 173-176, 2021. doi  www 
  1. Huang, T.; Saporta, G.; Wang, H. and Wang, S. A robust spatial autoregressive scalar-on-function regression with t-distribution. In Advances in Data Analysis and Classification, 15 (1): 57-81, 2021. doi  www 
  1. Bar-Hen, A.; Gey, S. and Poggi, J-M. Spatial CART Classification Trees. In Computational Statistics, 2021. doi  www 
  1. Boukela, L.; Zhang, G.; Yacoub, M.; Bouzefrane, S.; Bagheri, S. and Jelodar, H. A modified LOF based approach for outlier characterization in IoT. In Annals of Telecommunications - annales des télécommunications, 76 (3-4): 145-153, 2021. doi  www 
  1. Yin, Y.; Le Guen, V.; Donà, J.; de Bézenac, E.; Ayed, I.; Thome, N. and Gallinari, P. Augmenting physical models with deep networks for complex dynamics forecasting. In Journal of Statistical Mechanics: Theory and Experiment, 2021 (12): 124012, 2021. doi  www
  1. Moins-Teisserenc, H.; Cordeiro, D. J.; Audigier, V.; Ressaire, Q.; Benyamina, M.; Lambert, J.; Maki, G.; Homyrda, L.; Toubert, A. and Legrand, M. Severe Altered Immune Status After Burn Injury Is Associated With Bacterial Infection and Septic Shock. In Frontiers in Immunology, 12: 586195, 2021. doi  www 

Conference papers

  1. Diallo, A. W.; Niang, N. and Ouattara, M. Sparse Subspace K-means. In 3rd IEEE ICDM Workshop on Deep Learning and Clustering, in conjunction with IEEE ICDM 2021, December 7-10, 2021, pages 678-685, IEEE, Auckland, New Zealand, 2021. doi  www
  1. Audigier, V.; Niang, N. and Resche-Rigon, M. Clustering sur données incomplètes : quel modèle d'imputation choisir ? In EPICLIN 2021 - 15e Conférence francophone d'épidémiologie clinique - 28e Journées des statisticiens des centres de lutte contre le cancer, pages S21-S22, Elsevier Masson, Marseille, France, 2021. doi  www

2020

Journal articles

  1. Meddis, A.; Latouche, A.; Zhou, B.; Michiels, S. and Fine, J. Meta-analysis of clinical trials with competing time-to-event endpoints. In Biometrical Journal, 62 (3): 712-723, 2020. doi  www 
  1. Mirouse, A.; Parrot, A.; Audigier, V.; Demoule, A.; Mayaux, J.; Geri, G.; Mariotte, E.; Bréchot, N.; de Prost, N.; Vautier, M.; Neuville, M.; Bigé, N.; de Montmollin, E.; Cacoub, P.; Resche-Rigon, M.; Cadranel, J. and Saadoun, D. Severe diffuse alveolar hemorrhage related to autoimmune disease: a multicenter study. In Critical Care, 24 (1), 2020. doi  www 
  1. Torres, R.; Di Bernardino, E.; Laniado, H. and Lillo, R. On the estimation of extreme directional multivariate quantiles. In Communications in Statistics - Theory and Methods, 49 (22): 5504-5534, 2020. doi  www 
  1. Wang, Z.; Wang, H.; Wang, S.; Lu, S. and Saporta, G. Linear mixed-effects model for longitudinal complex data with diversified characteristics. In Journal of Management Science and Engineering, 5 (2): 105-124, 2020. doi  www 
  1. Wang, H.; Liu, R.; Wang, S.; Wang, Z. and Saporta, G. Ultra-high dimensional variable screening via Gram-Schmidt orthogonalization. In Computational Statistics, 35: 1153-1170, 2020. doi  www
  1. Russolillo, G. and Saporta, G. Using partial least squares regression for conjoint analysis. In Statistica Applicata - Italian Journal of Applied Statistics, 32: 67-84, 2020. doi  www 
  1. Zaffora, B.; Demeyer, S.; Magistris, M.; Ronchetti, E.; Saporta, G. and Theis, C. A Bayesian framework to update scaling factors for radioactive waste characterization. In Applied Radiation and Isotopes: 109092, 2020. doi  www 
  1. Yala, K.; Niang, N.; Brajard, J.; Mejia, C.; Ouattara, M.; El Hourany, R.; Crépon, M. and Thiria, S. Estimation of phytoplankton pigments from ocean-color satellite observations in the Senegalo-Mauritanian region by using an advanced neural classifier. In Ocean Science, 16 (2): 513-533, 2020. doi  www
  1. Desjonquères, C.; Rybak, F.; Ulloa, J. S.; Kempf, A.; Bar-Hen, A. and Sueur, J. Monitoring the acoustic activity of an aquatic insect population in relation to temperature, vegetation and noise. In Freshwater Biology, 65 (1): 107-116, 2020. doi  www 

2019

Journal articles

  1. Brogi, G. and Di Bernardino, E. Hidden Markov models for advanced persistent threats. In International Journal of Security and Networks, 14 (4): 181, 2019. doi  www
  1. Duchemin, T.; Bar-Hen, A.; Lounissi, R.; Dab, W. and Hocine, M. Hierarchizing Determinants of Sick Leave. In Journal of Occupational and Environmental Medicine, 61 (8): e340-e347, 2019. doi  www 
  1. Berthelot, G. C.B.; Bar-Hen, A.; Marck, A.; Foulonneau, V.; Douady, S.; Noirez, P.; Zablocki-Thomas, P. B.; Antero, J.; Carter, P. A.; Di Meglio, J-M. and Toussaint, J-F. An integrative modeling approach to the age-performance relationship in mammals at the cellular scale. In Scientific Reports, 2019. doi  www
  1. Wang, H.; Gu, J.; Wang, S. and Saporta, G. Spatial partial least squares autoregression: Algorithm and applications. In Chemometrics and Intelligent Laboratory Systems, 184: 123-131, 2019. doi  www 
  1. Graffeo, N.; Latouche, A.; Le Tourneau, C. and Chevret, S. ipcwswitch: An R package for inverse probability of censoring weighting with an application to switches in clinical trials. In Computers in Biology and Medicine, 111: 103339, 2019. doi  www 
  1. Latouche, A.; Andersen, P. K.; Rey, G. and Moreno-Betancur, M. A note on the measurement of socioeconomic inequalities in life years lost by cause of death. In Epidemiology, 30 (4): 569-572, 2019. doi  www 
  1. Austin, P.; Latouche, A. and Fine, J. A review of the use of time-varying covariates in the Fine-Gray subdistribution hazard competing risk regression model. In Statistics in Medicine, 2019. doi  www 
  1. Huang, T.; Wang, H. and Saporta, G. 成分数据的空间自回归模型. In Journal of Beijing University of Aeronautics and Astronautics, 45 (1): 93-98, 2019. doi  www 
  1. Bougeard, S.; Chauvin, C.; Saporta, G. and Niang, N. Régression multibloc sur classes latentes. Application à l'usage d'antibiotiques en élevages de lapins. In Epidémiologie et Santé Animale, 76: 43-53, 2019. www
  1. Biermé, H.; Di Bernardino, E.; Duval, C. and Estrade, A. Lipschitz-Killing curvatures of excursion sets for two dimensional random fields. In Electronic Journal of Statistics, 13: 536-581, 2019. www

Conference papers

  1. Cadene, R.; Ben-Younes, H.; Cord, M. and Thome, N. MUREL: Multimodal Relational Reasoning for Visual Question Answering. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, United States, 2019. www 
  1. Ben-Younes, H.; Cadene, R.; Thome, N. and Cord, M. BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection. In AAAI 2019 - 33rd AAAI Conference on Artificial Intelligence, Honolulu, United States, 2019. www 
  1. Corbière, C.; Thome, N.; Bar-Hen, A.; Cord, M. and Pérez, P. Addressing Failure Prediction by Learning Model Confidence. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), pages 2898-2909, Curran Associates, Inc., Vancouver, Canada, 2019. www 
  1. Le Guen, V. and Thome, N. Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), pages 4191-4203, Vancouver, Canada, 2019. www

2018

Journal articles

  1. Chevalier, M.; Thome, N.; Henaff, G. and Cord, M. Classifying low-resolution images by integrating privileged information in deep CNNs. In Pattern Recognition Letters, 116: 29-35, 2018. doi  www 
  1. Wei, Y.; Wang, H.; Wang, S. and Saporta, G. Incremental modelling for compositional data streams. In Communications in Statistics - Simulation and Computation, 48 (8): 2229-2243, 2018. doi  www 
  1. Bougeard, S.; Cariou, V.; Saporta, G. and Niang, N. Prediction for regularized clusterwise multiblock regression. In Applied Stochastic Models in Business and Industry, 34 (6): 852-867, 2018. doi  www 
  1. Beck, G.; Azzag, H.; Bougeard, S.; Lebbah, M. and Niang, N. A New Micro-Batch Approach for Partial Least Square Clusterwise Regression. In Procedia Computer Science, 144: 239-250, 2018. doi  www 
  1. Bougeard, S.; Abdi, H.; Saporta, G. and Niang, N. Clusterwise analysis for multiblock component methods. In Advances in Data Analysis and Classification, 12 (2): 285-313, 2018. doi  www 
  1. Audigier, V.; White, I.; Jolani, S.; Debray, T.; Quartagno, M.; Carpenter, J.; van Buuren, S. and Resche-Rigon, M. Multiple Imputation for Multilevel Data with Continuous and Binary Variables. In Statistical Science, 33 (2): 160-183, 2018. doi  www 
  1. Massiera, P.; Trinchera, L. and Russolillo, G. Évaluation de la présence des capacités marketing. Proposition d'un index multidimensionnel et hiérarchique. In Recherche et Applications en Marketing (French Edition), 33 (1): 31-55, 2018. doi  www
  1. Ioannidou, D.; Malherbe, L.; Beauchamp, M.; Saby, N. P. A.; Bonnard, R. and Caudeville, J. Characterization of Environmental Health Inequalities Due to Polyaromatic Hydrocarbon Exposure in France. In International Journal of Environmental Research and Public Health, 15 (12): 2680, 2018. doi  www 
  1. Antero, J.; Pohar-Perme, M.; Rey, G.; Toussaint, J-F. and Latouche, A. The heart of the matter: years-saved from cardiovascular and cancer deaths in an elite athlete cohort with over a century of follow-up. In European Journal of Epidemiology, 33 (6): 531-543, 2018. doi  www
  1. Saporta, G. Training data scientists: a few challenges. In International Journal of Data Science and Analytics, 6 (3): 201-204, 2018. doi  www 
  1. Woringer, M.; Martiny, N.; Porgho, S.; Bicaba, B. W.; Bar-Hen, A. and Mueller, J. E. Atmospheric dust, early cases, and localized meningitis epidemics in the African meningitis belt: an analysis using high spatial resolution data. In Environmental Health Perspectives, 126 (9): 097002, 2018. doi  www 

Conference papers

  1. Jaupi, L. New Test Methods to Evaluate Potential Performance of Cosmetic Products. In 20th International Conference Materials, Methods & Technologies 2018, Burgas, Bulgaria, 2018. www 
  1. Robert, T.; Thome, N. and Cord, M. HybridNet: Classification and Reconstruction Cooperation for Semi-supervised Learning. In ECCV 2018 - 15th European Conference on Computer Vision, pages 158-175, Springer, Munich, Germany, Lecture Notes in Computer Science 11211, 2018. doi  www 
  1. Durand, P.; Ghorbanzadeh, D. and Jaupi, L. Different approaches for the texture classification of a remote sensing image bank. In Ninth International Conference on Graphic and Image Processing, pages 1-9, SPIE, Qingdao, China, 2018. doi  www 
  1. Durand, P.; Ghorbanzadeh, D. and Jaupi, L. Index Theorem and Applications, a Gentle Review. In CMCGS 2018. 7th Annual International Conference on Computational Mathematics, Computational Geometry, pages 1-6, Digital Library, Singapore, Singapore, Series Proc. Computational Mathematics Computational Geometry and Statistics (CMCGS), 2018. doi  www

2017

Journal articles

  1. Zaffora, B.; Magistris, M.; Chevalier, J-P.; Saporta, G.; Luccioni, C. and Ulrici, L. A new approach to characterize very-low-level radioactive waste produced at hadron accelerators. In Applied Radiation and Isotopes, 122: 141-147, 2017. doi  www
  1. Geronimi, J. and Saporta, G. Variable selection for multiply-imputed data with penalized generalized estimating equations. In Computational Statistics and Data Analysis, 110: 103-114, 2017. doi  www 
  1. Liberati, C.; Camillo, F. and Saporta, G. Advances in credit scoring: combining performance and interpretation in kernel discriminant analysis. In Advances in Data Analysis and Classification, 11 (1): 121-138, 2017. doi  www 
  1. Di Bernardino, E. and Rullière, D. A note on upper-patched generators for Archimedean copulas. In ESAIM: Probability and Statistics, 2017. doi  www 

Conference papers

  1. Ghorbanzadeh, D.; Durand, P. and Jaupi, L. Generating the Skew Normal random variable. In World Congress on Engineering 2017, pages 113-116, London-UK, United Kingdom, 2017. www 
  1. Ben-Younes, H.; Cadene, R.; Cord, M. and Thome, N. MUTAN: Multimodal Tucker Fusion for Visual Question Answering. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2631-2639, IEEE, Venice, Italy, 2017. doi  www
  1. Ghorbanzadeh, D.; Durand, P. and Jaupi, L. A method for the Generate a random sample from a finite mixture distributions. In CMCGS 2017. 6th Annual International Conference on Computational Mathematics, Computational Geometry & Statistics, Singapore, Singapore, 2017. doi  www 

Scientific outreach and dissemination

2024

Book chapters

  1. Abdi, H.; Di Ciaccio, A. and Saporta, G. Old and New Perspectives on Optimal Scaling. In Analysis of Categorical Data from Historical Perspectives, pages 131-154, Springer Nature, Behaviormetrics: Quantitative Approaches to Human Behavior 17, 2024. doi  www

Conference papers

  1. Saporta, G. Optimal Scaling: New Insights Into an Old Problem. In SDS 2024; Statistics and Data Science Conference, pages 97-100, Palermo University Press, Palermo, Italy, 2024. www 
  1. Audigier, V. Clustering sur données incomplètes avec clusterMI. In 10èmes Rencontres R, Vannes (Bretagne, France), France, 2024. www 
  1. Saporta, G. Codage optimal et encodage: nouveaux regards sur un ancien problème. In CISEM 2024, 4ème Colloque international statistique et économétrie, Mahdia, Tunisia, 2024. www 

Miscellaneous

  1. Dieye, N. A.; Niang, N. and Russolillo, G. Sensibilité des indices de qualité d'un classifieur probabiliste. Poster. www

2023

Book chapters

  1. Saporta, G. Histoire et enjeux de l'IA. In L'IA éducative. L'intelligence artificielle dans l'enseignement supérieur, pages 41-50, Bréal, Thèmes & Débats, 2023. www
  1. Saporta, G. Préface. In Voyage au bout de l'IA: Ce qu'il faut savoir sur l'intelligence artificielle, pages 5-8, De Boeck Supérieur, 2023. www 
  1. Ameur, Y.; Bouzefrane, S. and Audigier, V. Application of Homomorphic Encryption in Machine Learning. In Emerging Trends in Cybersecurity Applications, pages 391-410, Springer International Publishing, 2023. doi  www 
  1. Saporta, G. and Stoltz, G. Gilbert Saporta : un parcours éclectique. In Les nombres, acteurs de changement, pages 85-104, Presses des Mines, Sciences Sociales, 2023. www

2022

Books

  1. Gégout-Petit, A.; Maumy-Bertrand, M.; Saporta, G. and Thomas-Agnan, C. Données manquantes. Editions Technip, 2022. www 
  1. Aimetti, J-P.; Coppet, O. and Saporta, G. Manifeste pour une intelligence artificielle comprise et responsable. Cent Mille Milliards, 2022. www 

Book chapters

  1. Gégout-Petit, A.; Maumy-Bertrand, M.; Saporta, G. and Thomas-Agnan, C. Une histoire lacunaire. In Données manquantes, pages 1-27, Editions Technip, 2022. www 
  1. Audigier, V. Imputation multiple en grande dimension par analyse factorielle. In Données manquantes, Editions TECHNIP, 2022. www 
  1. Audigier, V. Gestion des données manquantes par imputation multiple. In Données manquantes, Editions TECHNIP, 2022. www 
  1. Saporta, G. Algorithmes de recommandation. In Données manquantes, pages 247-252, Editions Technip, 2022. www 

Conference papers

  1. Huang, T. and Saporta, G. Some spatial regression models for functional and compositional data. In Conference in honor of Christine Thomas-Agnan, Toulouse, France, 2022. www 
  1. Saporta, G. Equité et explicabilité des algorithmes : définitions, paradoxes et biais. In CISEM 2022, 3eme Colloque international statistique et économétrie, Mahdia, Tunisia, 2022. www
  1. Saporta, G. On some issues related to the fairness of algorithms. In Compstat 2022, 24th International Conference on Computational Statistics, Bologna, Italy, 2022. www 
  1. Audigier, V.; Niang, N. and Resche-Rigon, M. Clustering with missing data: which imputation model for which cluster analysis method? In 17th conference of the International Federation of Classification Societies, Porto, Portugal, 2022. www

Miscellaneous

  1. Charrier, T.; Fresneau, B.; Haddy, N.; Schwartz, B.; Journy, N.; Demoor-Goldschmidt, C.; Diallo, I.; Surun, A.; Aerts, I.; Doz, F.; Souchard, V.; Vu-Bezin, G.; Lemler, S.; Letort, V.; Rubino, C.; de Vathaire, F.; Latouche, A. and Allodji, R. S. Increased Cardiac Risk After a Second Malignant Neoplasm Among Childhood Cancer Survivors, a FCCSS Study. Poster. www

2021

Books

  1. Bertrand, F.; Saporta, G. and Thomas-Agnan, C. Statistique et causalité. Editions Technip, 2021. www 

Book chapters

  1. Huang, T.; Saporta, G. and Wang, H. A Spatial Durbin Model for Compositional Data. In Advances in Contemporary Statistics and Econometrics, pages 471-488, Springer Nature, 2021. doi  www 

Conference papers

  1. Niang-Keita, N.; Ouattara, M. and Saporta, G. Sparse Divisive Feature Clustering. In XXVIII Meeting of the Portuguese Association for Classification and Data Analysis (JOCLAD 2021), pages 75-76, Covilhã, Portugal, Program and Book of Abstracts, 2021. www
  1. Hassini, H.; Niang, N. and Audigier, V. SOM-based clusterwise regression. In Data Science, Statistics and Visualisation, Rotterdam, Netherlands, 2021. www 
  1. Saporta, G. Sparse Correspondence Analysis for Contingency Tables. In Celebrating 40 years of Greek Statistical Institute 1981-2021, Athens, Greece, 2021. www
  1. Saporta, G. Interprétabilité des modèles prédictifs. In ASI 11. 11ème Colloque International sur l'Analyse Statistique Implicative, Belfort, France, 2021. www 
  1. Fateri Gouard, N.; Niang, N. and Ouattara, M. Unbiased Feature selection in Random Forests using Consensus Feature Clustering. In Data Science, Statistics & Visualisation (DSSV) and European Conference on Data Analysis (ECDA), Rotterdam, Netherlands, 2021. www
  1. Bougeard, S.; Bry, X.; Verron, T. and Niang, N. Combined-information criterion for clusterwise elastic-net regression. Application to omic data. In 8th Channel Network Conference, Paris, France, 2021. www 
  1. Saporta, G. From the triumph of black boxes to the right to understand and the search for fairness. In ASMDA 2021, Athens, Greece, 2021. www 
  1. Boukela, L.; Zhang, G.; Yacoub, M. and Bouzefrane, S. A near-autonomous and incremental intrusion detection system through active learning of known and unknown attacks. In IEEE International Conference on Security, Pattern Analysis, and Cybernetics, pages 374-379, IEEE, Chengdu, China, 2021. doi  www 
  1. Audigier, V. and Niang, N. Cluster analysis after multiple imputation. In ASMDA 2021, Athens, Greece, 2021. www

2020

Books

  1. Diday, E.; Guan, R.; Saporta, G. and Wang, H. Advances in Data Science. Symbolic, Complex and Network Data. ISTE-WILEY, Big Data, Artificial Intelligence and Data Analysis, 2020. www

Conference papers

  1. Saporta, G. About Interpreting and Explaining Machine Learning and Statistical Models. In SMTDA 2020; 6th Stochastic Modeling Techniques and Data Analysis International Conference, Barcelona (virtual), Spain, 2020. www

Reports

  1. Bauer, A.; Faron, O.; Richier, J.; Bar-Hen, A.; Béra, M.; Cappelletti, L.; Collomb, A.; Durance, P.; Fleury-Perkins, C.; Fontanet, A.; Gnesotto, N.; Réau, B. and Trainar, P. Chaire Nouveaux risques : rapport 2020. Technical Report, Conservatoire national des arts et métiers (Cnam) ; Allianz France, 2020.

2019

Book chapters

  1. Saporta, G. 50 Years of Data Analysis: From Exploratory Data Analysis to Predictive Modeling and Machine Learning. In Data Analysis and Applications 1. Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, ISTE-Wiley, Data Analysis and Applications, 2019. doi  www
  1. Mariadassou, M.; Bar-Hen, A. and Kishino, H. Tree Evaluation and Robustness Testing. In Encyclopedia of Bioinformatics and Computational Biology, pages 736-745, Elsevier, 2019. doi  www 

Conference papers

  1. Saporta, G. De l'analyse exploratoire à la modélisation prédictive: le chemin de la science des données. In Montpellier: berceau de la Data Science. Colloque en l'honneur du Pr. Yves Escoufier, Montpellier, France, 2019. www
  1. Milliet de Faverges, M.; Picouleau, C.; Russolillo, G.; Merabet, B. and Houzel, B. Impact of calibration of perturbations in simulation: the case of robustness evaluation at a station. In RailNorrköping 2019. 8th International Conference on Railway Operations Modelling and Analysis (ICROMA), Norrköping, Sweden, 2019. www
  1. Saporta, G. Science des données, données massives : défis et nouveaux métiers. In CISEM 2019, Mahdia, Tunisia, 2019. www 
  1. Saporta, G.; Liu, R.; Niang Keita, N. and Wang, H. Sparse Correspondence Analysis. In ASMDA 2019. 18th Conference of the Applied Stochastic Models and Data Analysis International Society, Florence, Italy, 2019. www 
  1. Daouda, O.; Chevance, A.; Salvador, A.; Légeron, P.; Morvan, Y.; Saporta, G.; Hocine, M. and Gaillard, R. Impact of work-related psychosocial factors on mental health: A cross-sectional study in the French working population. In Work, Stress and Health 2019 Conference of the American Psychological Association, Philadelphia, United States, 2019. www 
  1. Saporta, G.; Liu, R.; Niang Keita, N. and Wang, H. Sparse Methods for Unsupervised Data Analysis. In The 4th International Symposium on Interval Data Modelling (SIDM 2019), Beijing, China, 2019. www
  1. Faucheux, L.; Resche-Rigon, M.; Audigier, V.; Curis, E.; Soumelis, V. and Chevret, S. Clustering with missing data: Pooling multiple imputation results with consensus clustering. In 40th Annual Conference of the International Society for Clinical Biostatistics, Leuven (BE), Belgium, 2019. www 
  1. Jaupi, L. Combinations of Shewhart and CUSUM Control Charts for Individual Observations. In MMT2019, Burgas, Bulgaria, 2019. www 
  1. Audigier, V. and Resche Rigon, M. micemd: a smart multiple imputation R package for missing multilevel data. In UseR!2019, Toulouse, France, 2019. www 

2018

Books

  1. Maumy-Bertrand, M.; Saporta, G. and Thomas-Agnan, C. Apprentissage statistique et données massives. Editions Technip, 2018. www 

Book chapters

  1. Saporta, G. From Conventional Data Analysis Methods to Big Data Analytics. In Big Data for Insurance Companies, pages 27-41, John Wiley & Sons, Inc., 2018. doi  www 
  1. Saporta, G. Une brève histoire de l'apprentissage. In Apprentissage statistique et données massives, Editions Technip, 2018. www 
  1. Ghorbanzadeh, D.; Durand, P. and Jaupi, L. Application and generation of the univariate Skew Normal random variable. In Transactions on Engineering Technologies. 25th World Congress on Engineering, pages 129-138, Springer, 2018. doi  www

Conference papers

  1. Milliet de Faverges, M.; Russolillo, G.; Picouleau, C.; Merabet, B. and Houzel, B. Estimating long-term delay risk with Generalized Linear Models. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 2911-2916, IEEE, Maui, United States, 2018. doi  www
  1. Milliet de Faverges, M.; Russolillo, G.; Picouleau, C.; Merabet, B. and Houzel, B. Modelling passenger train arrival delays with Generalized Linear Models and its perspective for scheduling at main stations. In 8th International Conference on Railway Engineering (ICRE 2018), IET, London, United Kingdom, 2018. doi  www 
  1. Huang, T.; Saporta, G.; Wang, H. and Wang, S. SFLM: A mix of a Functional Linear Model and of a Spatial Autoregressive Model for spatially correlated functional data. In CroNoS Workshop on Functional Data Analysis, Iasi, Romania, 2018. www 
  1. Jaupi, L. Statistical methods to study consistency between declared and measured values on waste packages. In COMPSTAT 2018, The 23rd International Conference on Computational Statistics, Iasi, Romania, 2018. www 
  1. Saporta, G. Clusterwise Methods: a Synthesis and New Developments. In Homenagem a Fernando da Costa Nicolau, pages 23-26, Lisbon, Portugal, 2018. www

Miscellaneous

  1. Niang, N. Multiblock consensus clustering. Poster. www

Reports

  1. Comité Scientifique du Haut Conseil des Biotechnologies; Angevin, F.; Bagnis, C.; Bar-Hen, A.; Barny, M. A.; Bellivier, F.; Berny, P.; Boireau, P.; Brévault, T.; Chauvel, B.; Coléno, F.; Couvet, D.; Dassa, E.; Eychenne, N.; Franche, C.; Guerche, P.; Guillemain, J.; Hernandez Raquet, G.; Jestin, A.; Klonjkowski, B.; Lavielle, M.; Le Corre, V.; Lemaire, O.; Lereclus, D.; Maximilien, R.; Meurs, E.; Moreau de Bellaing, C.; Naffakh, N.; Négre, D.; Noyer, J-L.; Ochatt, S.; Pages, J-C.; Parzy, D.; Regnault-Roger, C.; Renard, M.; Saindrenan, P.; Simonet, P.; Troadec, M-B.; Vaissière, B.; de Verneuil, H. and Vilotte, J-L. Commentaires sur le projet de document consensus de l'OCDE sur les considérations environnementales relatives à l'évaluation des risques associé. Paris, le 23 mai 2018. Technical Report, Haut Conseil des Biotechnologies, 2018.
  1. Comité Scientifique du Haut Conseil des Biotechnologies; Angevin, F.; Bagnis, C.; Bar-Hen, A.; Barny, M. A.; Bellivier, F.; Berny, P.; Boireau, P.; Brévault, T.; Chauvel, B.; Coléno, F.; Couvet, D.; Dassa, E.; Eychenne, N.; Franche, C.; Guerche, P.; Guillemain, J.; Hernandez Raquet, G.; Jestin, A.; Klonjkowski, B.; Lavielle, M.; Le Corre, V.; Lemaire, O.; Lereclus, D.; Maximilien, R.; Meurs, E.; Moreau de Bellaing, C.; Naffakh, N.; Négre, D.; Noyer, J-L.; Ochatt, S.; Pages, J-C.; Parzy, D.; Regnault-Roger, C.; Renard, M.; Saindrenan, P.; Simonet, P.; Troadec, M-B.; Vaissière, B.; de Verneuil, H. and Vilotte, J-L. Avis en réponse à la saisine HCB - dossier C/NL/06/01_001. Paris, le 17 octobre 2018. Technical Report, Haut Conseil des Biotechnologies, 2018.

2017

Books

  1. Bertrand, F.; Droesbeke, J-J.; Saporta, G. and Thomas-Agnan, C. Model Choice and Model Aggregation. Editions Technip, 2017. www 

Book chapters

  1. Saporta, G. Des méthodes classiques d'analyse des données au Big Data. In Le big data pour les compagnies d'assurance, pages 41-55, ISTE, Innovation, Entrepreneuriat et Gestion Série Big Data, IA et analyse de données, 2017. www 
  1. Petrarca, F.; Russolillo, G. and Trinchera, L. Integrating Non-metric Data in Partial Least Squares Path Models: Methods and Application. In Partial Least Squares Path Modeling, pages 259-279, Springer International Publishing, 2017. doi  www 

Conference papers

  1. Hocine, M.; Feropontova, N.; Niang, N.; Ait Bouziad, K. and Saporta, G. Importance of factors contributing to work-related stress: comparison of four metrics. In ASMDA 2017, London, United Kingdom, 2017. www 
  1. Bougeard, S.; Niang-Keita, N.; Preda, C. and Saporta, G. Clusterwise Sparse PLS. In PLS'17, Macao, Macau SAR China, 2017. www 
  1. Jaupi, L. Dual-use performance measures for customer service evaluation in bike-shared systems. In 61st World Statistics Congress -- WSC-ISI2017, Marrakech, Morocco, 2017. www 
  1. Jaupi, L. Using Big Data to Display the Quality of Service Provided on a Bike-Shared Network. In 19th International Conference Materials, Methods & Technologies 2017, Burgas, Bulgaria, 2017. www 
  1. Saporta, G. Clusterwise methods, past and present. In ISI 2017 61st World Statistics Congress, Marrakech, Morocco, 2017. www 
  1. Renosh, P.; Jourdin, F.; Charantonis, A. A.; Yala, K.; Badran, F.; Thiria, S.; Guillou, N. and Gohin, F. Construction of multi-year time series profiles of suspended particulate inorganic matter concentrations from highly dynamic coastal waters of the English Channel using self-organizing maps and hidden Markov model. In Third International Ocean Colour Science Meeting, Lisbon, Portugal, 2017. www 

Knowledge transfer activities

Current projects

Dotation MSDMA 2024
  • Full name: Dotation MSDMA 2024 - Funder: Laboratoire Cédric
  • Duration: January 2024 - December 2024
  • Summary:
PEX Jurai 2024
  • Full name: PEX Jurai 2024 - Funder: Laboratoire Cédric
  • Duration: January 2024 - December 2024
  • Summary:
FASCINATION-SHOM
  • Full name: FASCINATION-SHOM - Funder: Etablissement public administratif SHOM
  • Duration: September 2023 - September 2027
  • Summary: Geostatistical representation of sound-speed fields using homogeneous soundscapes

Past projects

    • Full name: Méthodes statistiques, data-mining et apprentissage 2021
    • Duration: December 2020 - December 2021
    • Summary:

    • Full name: CIFRE VELVET
    • Duration: December 2019 - December 2023
    • Summary:

    • Full name: NEZ ELECTRONIQUE
    • Duration: June 2017 - May 2018
    • Summary:

    • Full name: CRM SERVICE 2017-2018
    • Duration: June 2017 - June 2018
    • Summary:

    • Full name: PRESIDIO
    • Duration: January 2015 - July 2019
    • Summary:

    • Full name: Analyse de données issues des patients COVID-19/embolie pulmonaire
    • Duration: June 2020 - June 2021
    • Summary:

    • Full name: EARLY METRICS
    • Duration: May 2017 - September 2019
    • Summary:

    • Full name: IMPACT MDS
    • Duration: November 2019 - October 2021
    • Summary:

    • Full name: MEDIATECH Nafise GOUARD
    • Duration: May 2018 - April 2021
    • Summary:

    • Full name: SOCIETE EARLY METRICS 2
    • Duration: May 2021 - February 2023
    • Summary:

    • Full name: CIFRE UTAC 2021-2024
    • Duration: July 2021 - July 2024
    • Summary: The aim is to investigate statistical analysis methods and machine learning and artificial intelligence algorithms for monitoring periodic vehicle roadworthiness inspections.

    • Full name: Conception et Développement des Jeux Pervasifs Adaptables avec la prise en compte des Etats Emotionnels des Joueurs
    • Duration: January 2022 - December 2022
    • Summary: The project aims to take users' emotional states into account in real time in order to better adapt their environments and interactions. In this project in particular, the approach is applied in a pervasive setting.

    • Full name: PRivAcy-preserving LocalIzation with MachiNE Learning in IoT
    • Duration: January 2022 - December 2022
    • Summary: The project aims to provide a localization solution for connected objects that keeps the location information secure, using machine learning algorithms.

    • Full name: Soutien équipe MSDMA 2022
    • Duration: January 2022 - December 2022
    • Summary:

    • Full name: Dotation MSDMA 2023
    • Duration: January 2023 - December 2023
    • Summary:

    • Full name: PEX Praline 2023
    • Duration: January 2023 - December 2023
    • Summary:

    • Full name: MIXQUEBEC
    • Duration: February 2022 - December 2023
    • Summary:
