IP Paris

Data AI master

A computer science master's program of the Institut Polytechnique de Paris


M1 students are invited to follow introductory courses, while M2 students are invited to follow more advanced courses. However, unless explicitly specified in the course description, all courses are open to both M1 and M2 students.

Group "Softskills"

Softskills seminar (M2 only) (PDV_5DA05_TP)

Go to course webpage. This course is taught by Fabian Suchanek.

Students learn how to give good presentations, and present scientific papers. This is an obligatory course of the M2 DataAI.

Program:

Group "Ethics"

AI Ethics (HSS_5DA06_TP)

This course is taught by Maxwell Winston, Sophie Chabridon, Ada Diaconescu, Fabian Suchanek.

Algorithmic fairness, introduction to the AI Act, ethical issues/fundamental rights, explainability, privacy and security. This course is scheduled on Tuesday afternoons in P2 (between 19/11/24 and 28/01/25, with no class on 17/12/24, 24/12/24 and 31/12/24).

Program:

Group "Data AI basics"

Data AI basics (CSC_5DA00_TP)

Go to course webpage. This course is taught by Tiphaine Viard, Louis Jachiet, Nils Holzenberger, Jean-Louis Dessalles.

This is an introductory course covering many subjects in math/CS. There are no exams for this course, but it also carries no ECTS.

Program:

Group "Logics"

Neuro-Symbolic Artificial Intelligence (APM_4EL20_TP)

Go to course webpage. This course is taught by Nils Holzenberger.

Topics will include:
- Prolog (recursivity, backtracking, unification) and DeepProbLog
- Formal Logic (propositions, predicates, proof by refutation)
- Natural language processing (DCG, parsing through unification)
- Symbolic machine learning (symbolic induction, complexity minimum)
- Knowledge representation (description logics, ontologies, semantic Web)
- Probabilistic programming, binary and sentential decision diagrams, Boolean formulas

Program:

Logics and Symbolic AI (APM_5AI01_TP)

Go to course webpage. This course is taught by Isabelle Bloch.

This course aims at providing the bases of symbolic AI, along with a few selected advanced topics. It includes courses on formal logics, ontologies, symbolic learning, typical AI topics such as revision, merging, etc., with illustrations on preference modeling and image understanding.

Prerequisites: Basic knowledge of algebra

Program:

Group "Databases"

Databases (CSC_4SD02_TP)

This course is taught by Mehwish Alam.

Relational databases: ER modeling, SQL, query execution, query optimization, schema refinement, application programming, graph databases

Program:

Database management systems (CSC_51053_EP)

Go to course webpage. This course is taught by Ioana Manolescu.

Relational databases: ER modeling, SQL, query execution, query optimization, schema refinement, application programming

Group "Big Data Systems"

Big Graph Databases (ECE_5DA04_TP)

This course is taught by Ioana Manolescu, Garima Gaur, Madhulika Mohanty (Inria).

The course presents the main architectures and algorithms used for large-scale data management, in particular for graph databases. We will consider graph querying via structured queries, (semi)-structured search, also when reasoning is involved (on semantic graphs). The course will cover modern data management architectures and systems, such as in-memory databases, cloud databases, and query processing in shared-nothing, Map-Reduce clusters.

Prerequisites: Algorithms and complexity; one database class; a logic course would also be a plus.

Program:

Big Data Infrastructures & semantic networks (TSP-CSC5003)

Go to course webpage. This course is taught by Julien Romero, Amel Bouzeghoub.

The course CSC 5003-1 – Big Data Infrastructure is a third-year course in an engineering school (Master 2 level) given at Télécom SudParis. At the end of this course, a student will be able to set up a big data infrastructure using tools from the Hadoop ecosystem. In detail, a student will know how to:
1. program in functional style using Scala
2. use the MapReduce framework to parallelize computations
3. explore and manipulate the Hadoop Distributed File System
4. process a data stream using Kafka and Spark Streaming
5. choose the right tools from the Hadoop ecosystem to solve a given problem
THIS COURSE COUNTS FOR 5 ECTS (as a double course)

Prerequisites: Programming; Prior knowledge of functional programming and Java can help

Program:

Systems for Big Data (CSC_52083_EP)

This course is taught by Oana Balalau, Pierre Bourhis, Yanlei Diao.

?

Prerequisites: The main background knowledge required for INF583 includes relational operators, SQL, storage, and transaction processing. Students are also expected to be aware of notions such as query plans and query optimisation, although only a very small part of INF583 builds on them. These prerequisites can be fulfilled by a typical database class such as INF553 (more in depth) or SD202 (more lightweight).

Group "Machine Learning"

Machine Learning: Shallow & Deep Learning (CSC_5DA01_TP)

This course is taught by Mounim El Yacoubi.

Statistical Data Analysis (PCA, LDA), Unsupervised Learning, Clustering, Supervised Learning, Neural Networks / Deep Learning, Hidden Markov Models (HMM), Restricted Boltzmann Machines, Support Vector Machines (SVM), Decision Trees, Random Forest, Boosting, Transfer Learning, Deep Reinforcement Learning, Introduction to LLM/ChatGPT

Prerequisites: Basics of Probability and Statistics; Basics of Algebra and Calculus

Program:

Advanced Machine Learning and Autonomous Agents (CSC_52081_EP)

This course is taught by Jesse Read.

This course blends together topics in Probabilistic Machine Learning, Deep Learning, and Sequential prediction and decision making, with a focus on Reinforcement learning (Q-Learning, Deep Q-Learning, Policy-Gradient Methods, Actor-Critic Methods).

Prerequisites: This course builds on any course providing introductory concepts of Machine Learning, e.g., INF554.

Program:

Advanced Deep Learning (CSC_52087_EP)

Go to course webpage. This course is taught by Vicky Kalogeiton, Johannes Lutzeyer, Michalis Vazirgiannis (LIX).

The primary goal of this course is to introduce students to advanced principles of deep learning, including mathematical foundations, architecture design, and practical applications. This course is particularly relevant given the current state of the job market, where deep learning skills are in high demand in many industries, including tech, finance, healthcare, and entertainment. ECTS: 5. Language: English.

Prerequisites: Basic concepts of Deep Learning

Machine Learning with Graphs (MAP_670I_TP)

Go to course webpage. This course is taught by Jhony H. Giraldo.

Graph data is ubiquitous. Any system with entities and relationships between them can be represented as a graph. Over the past decade, machine learning algorithms have made remarkable progress in fields such as natural language processing, computer vision, and speech recognition. This success is primarily due to deep neural network architectures' ability to extract high-level features from Euclidean-structured data like images, text, and audio. However, graph data has not received the same level of attention. In this course, we will explore how to create machine learning models to extract high-level features from graph data, a process known as graph representation learning. The topics covered in this course include graph neural networks (GNNs), such as graph convolutions and graph attention mechanisms, scalable GNNs for big data applications, recommender systems using GNNs, and spatiotemporal data analysis with GNNs. This course also includes laboratory sessions to provide hands-on experience with these concepts.

Prerequisites: Deep learning basics (neural networks, convolutional neural networks), PyTorch basics.

Program:

Group "Fully optional courses"

Learning for robotics (ENSTA - IA305)

This course is taught by S.M. Nguyen.

Learning methods used in robotics and applications to human / robot interaction, learning by demonstration or autonomous learning: imitation learning, reinforcement learning, human motion analysis

Decision Procedures for Artificial Intelligence (ENSTA-INF656L)

Go to course webpage. This course is taught by Alexandre Chapoutot, Sergio Mover.

Reasoning automatically about logical formulas is crucial in solving problems in Artificial Intelligence (e.g., path and task planning) and Formal Methods (e.g., software verification). This course will present the modern, efficient algorithms (decision procedures) used to check the satisfiability (SAT) of formulas in propositional logics (e.g., Conflict Driven Clause Learning, CDCL) and the extensions of these algorithms to check more expressive first-order-logic formulas (Satisfiability Modulo Theory, SMT). The course will also present how logical modeling and satisfiability can solve problems in AI (Logical Knowledge-based agent) and formal methods (software verification). In detail, the tutorial will cover problems such as path planning, task planning, and bounded model checking to illustrate theoretical notions and practical implementation of algorithms.

Prerequisites: Linear algebra, Python programming

Program:

Robust Computer vision with deep learning, XAI, Uncertainty quantification (ENSTA - IA323)

Go to course webpage. This course is taught by Gianni Franchi (ENSTA).

In today's digital age, computer vision plays a crucial role in numerous applications, ranging from image and video recognition to autonomous vehicles and augmented reality. This course aims to equip students with the knowledge and skills required to tackle complex visual tasks using cutting-edge techniques and models. The Advanced Computer Vision course is designed to provide students with a comprehensive understanding of state-of-the-art techniques and methodologies in computer vision. Through a combination of theoretical concepts and hands-on practical assignments, students will gain expertise in deep neural networks, generative models, uncertainty modeling, tracking, semi-supervised learning, and self-supervised learning. Throughout the course, students will work on hands-on projects and assignments to reinforce their understanding of the concepts covered. By the end of the course, students will be equipped with the skills to design, implement, and deploy advanced computer vision systems using deep neural networks.

Prerequisites: Linear Algebra, Differential Calculus, Probability and Statistics, Signal Processing

Emergence in Complex Systems (MOB_0AT09_TP)

Go to course webpage. This course is taught by J.-L. Dessalles.

The course will cover several social phenomena, including: evolution theory, collective decision, the hawk-dove dilemma, cooperation, emergence of segregationism, altruism, the "tragedy of the commons", the "green-beard" effect, social coordination, suicide "for the group", honest communication, charity and competitive helping. Several theoretical models will be studied, including preferential attachment, kin selection, the Prisoner's dilemma, the handicap principle, social signaling.

Prerequisites: Some basic knowledge of Python & object-oriented programming.

Program:

Explainable and Trustworthy AI (CSC_5DA02_TP)

This course is taught by Mounim A. El Yacoubi.

Explainability and Interpretability of Machine / Deep Learning Models; Explanation Methods of Machine Learning models as black boxes: LIME, Shapley Values, SHAP, Counterfactual Explanations; Interpretation of Neural Networks as white boxes: Sensitivity Analysis, Layer-wise Relevance Propagation (LRP), the RETAIN architecture; Adversarial Learning, Targeted and Non-Targeted Adversarial Attacks, Defense against Adversarial Attacks; Verification of the Robustness of Neural Networks.

Prerequisites: Knowledge of the basic concepts of Machine Learning and Deep Learning

Program:

Image mining and content-based retrieval (APM_5DA03_TP)

Go to course webpage. This course is taught by Antoine Manzanera (ENSTA), Gianni Franchi (ENSTA), Flora Weissgerber (Onera), TBC.

This course deals with visual data (images and videos) and covers image representation, processing and indexing for content-based retrieval purposes.
- It starts from image data and their different models, from mathematical and algorithmic viewpoints, by exploring the different models: frequency-, discrete-, or set-based, differential, or statistical...
- It presents segmentation and feature extraction techniques, i.e. how to reduce the representation support, and what local and global representations can be used to describe the image content.
- Practical Work #1 deals with salient point detection, description and matching.
- Approximately one third of the course is dedicated to classification, detection and image recognition techniques based on machine learning, using CNN (one session) and other unsupervised and supervised techniques (one session).
- One session is dedicated to a significant use case: satellite image mining.
- One session is on video analysis and the importance of motion in video mining, with an emphasis on object tracking methods.
- Practical Work #2 is on object tracking in videos.
The practical works use Python, OpenCV and PyTorch. The evaluation is based on the 2 reports on the practical works (weight 0.5) and a theoretical exam (weight 0.5). This course is scheduled on Wednesdays of P2 (November-December).

Prerequisites: Linear Algebra, Differential Calculus, Probability and Statistics, Signal Processing

Program:

Collective Intelligence (CSC_5DA07_TP)

Go to course webpage. This course is taught by Ada Diaconescu.

The course provides an introduction to decentralised / collective intelligence, including concepts of: system self-adaptation, self-organisation, autonomic control, multi-scale feedbacks and agent-based modelling (ABM). Evaluation will rely on a practical project developed using a multi-agent simulation platform.

Prerequisites: Good programming skills (any imperative language, like Prolog, C, C++, Java, etc.); notions that may help: control theory (also including robotics, automata, autonomous systems), AI (both symbolic and data-oriented); system modelling.

Program:

Knowledge Base Construction (CSC_5DA09_TP)

Go to course webpage. This course is taught by Fabian Suchanek.

Language Models have revolutionized natural language processing. Yet they can say wrong things in a very convincing way: they hallucinate. One solution to this problem can come from structured data such as knowledge bases, which can serve to correct and inform the model. In this class, we will see how to bridge the gap between natural language (the sentence “Elvis is alive”) and structured information (the statement alive(Elvis)). We will cover the technical steps of information extraction: named entity recognition, entity disambiguation, and fact extraction. For each of them, we will see different methods: fine-tuning language models, prompt engineering, and training-free procedures. Finally, we will talk about techniques for knowledge cleaning: link prediction, entity alignment and rule mining. https://suchanek.name/work/teaching/kbc-2024/index.html

Program:

Deep Learning for Computer Vision (APM_5DA12_TP)

This course is taught by Stéphane Lathuilière, Jhony H. Giraldo.

The course focuses on various advanced topics in the field. Students will delve into areas such as few-shot learning and domain adaptation, exploring techniques that enable models to learn from limited labeled data and adapt to new domains. The course also covers advanced methods for image and video generation and editing, allowing students to gain insights into cutting-edge approaches for creating and manipulating visual content. Classical vision tasks, including object detection and human pose estimation, are extensively studied, providing students with a strong foundation in fundamental computer vision techniques. Additionally, the course delves into video understanding, equipping students with the necessary tools to extract meaningful information from video data. Lastly, students will explore the integration of vision with other sensors, delving into the fusion of visual information with data from other sensing modalities, opening up new possibilities for perception and analysis. The course will be composed of five lectures and two practical sessions.

Prerequisites: Knowledge of the basic concepts of Machine Learning and Deep Learning

Program:

Representation Learning for Computer Vision and Medical Imaging (APM_5DA13_TP)

Go to course webpage. This course is taught by Pietro Gori (TP), Loic le Folgoc (TP).

Good and expressive data representations can improve the accuracy of machine learning models and ease interpretability and transfer. For vision tasks, handcrafting good data representations, a.k.a. feature engineering, was traditionally hard. Deep Learning has changed this paradigm by making it possible to automatically discover good representations from data. This is known as representation learning. The objective of this course is to provide an introduction to representation learning in computer vision and medical imaging applications. Standard approaches to representation learning exploit the inductive bias of Convolutional Neural Networks and the supervision of labeled data. Since labeled data is scarce compared to raw data, recent work has turned to unsupervised and self-supervised techniques to boost the expressive power of representations. Furthermore, alternatives to CNNs inspired by advances in NLP have been proposed, such as vision transformers. In a different development, causal representations, which leverage causal relationships in the data, make it possible to answer additional queries (causal effects, interventions, counterfactuals) compared to standard statistical models. All of these developments will be covered in the course. Each lecture is followed by a practical lab on the corresponding content, where students learn to implement these techniques using the PyTorch framework.

Prerequisites: Introductory course on Deep Learning, Computer Vision, Linear Algebra, Calculus, Probability, Statistics, Image processing, Python, PyTorch

Program:

Programming with GPU for Deep Learning (APM_5AI07_TP)

This course is taught by Elisabeth Brunet, Goran Frehse.

This course gives an introduction to GPU programming techniques used for deep learning. Starting from the ground up with basic matrix operations, students will develop code to implement classifiers based on gradient descent. Programs are written in C and use the CUDA API from Nvidia to access the GPU.

Program:

Natural Language Processing (was Machine Learning for Text Mining) (CSC_5AI12_TP)

This course is taught by Matthieu Labeau.

Text mining is an evolving and challenging domain. For example, much effort has recently been dedicated to developing methods able to analyze opinion data available on the social Web. The first objective of this course is to cover the different methods of language processing and machine learning underlying text and opinion mining. During this course, students will acquire theoretical and technical skills in advanced machine learning methods and natural language processing. This course is designed for students who will be attending classes and labs. The techniques and concepts that will be studied include:
- natural language pre-processing: tokenization, part-of-speech tagging, document representation and word embedding techniques
- natural language resources: lexicons, WordNet and FrameNet
- text clustering and text categorization: advanced machine learning methods such as deep learning, hidden Markov models, etc.

Prerequisites: Basic knowledge in machine learning

Program:

Reinforcement Learning (APM_5AI18_TP)

Go to course webpage. This course is taught by Thomas Bonald.

This is an introduction to reinforcement learning: Markov Decision Process, Bellman's equation, bandit algorithms, Q-learning, TD-learning, Monte-Carlo tree search. Applications to games and to recommender systems will be presented.

Prerequisites: Probability theory, Python programming

Program:

Sequence-to-Sequence Models for NLP and Speech Processing (APM_5AI27_TP)

This course is taught by Nils Holzenberger, Mehwish Alam.

Natural language processing has given rise to innumerable industrial applications. While many new tasks have emerged in NLP and speech processing over the last decades, methods to solve them have increasingly converged towards a unified modeling paradigm. In this course, we will use sequence-to-sequence modeling to delve into state-of-the-art statistical machine learning methods (convolutional neural networks, recurrent neural networks, attention, transformers) and apply them to major NLP and speech processing tasks (language modeling, machine translation, speech recognition, information extraction). Students should expect to get an in-depth understanding of these methods, through theoretical analysis and hands-on lab sessions. Grading will involve a project, to be carried out over the course of the class.
Topics to be covered:
1. Recurrent Neural Networks
2. Hidden Markov Models
3. Attention Mechanisms
4. Transformers
5. Convolutional Neural Networks
6. Language Modeling

Program:

Mining of Large Datasets (CSC_4SD01_TP)

Go to course webpage. This course is taught by Mauro Sozio.

The course will provide an introduction to data mining and will cover the following topics: clustering, decision trees, ranking, association rules, recommendation systems, introduction to MapReduce and Spark. Students will work on a project where they will implement some of the previously mentioned algorithms in Python or in Spark.

Program:

Graph Learning (CSC_4SD04_TP)

Go to course webpage. This course is taught by Thomas Bonald.

The focus of this course is on the analysis of large graphs. You will learn how to represent graphs efficiently as sparse matrices. You will apply some key algorithms to real graphs, for clustering, ranking, classifying and embedding nodes, including graph neural networks.

Prerequisites: Basics of graphs, probability theory, linear algebra, Python programming.

Program:

Data Visualization (CSC_51052_EP)

Go to course webpage. This course is taught by Emmanuel Pietriga (INRIA).

This course first gives an overview of the field of data visualization. It then discusses fundamental principles of human visual perception, focusing on how they help inform the design of visualizations. The following sessions focus on visualization techniques for specific data structures, and discuss them in depth from both design and implementation perspectives, including: multi-variate data, hierarchical structures, networks, time-series, statistical data and geographical data. All exercises are based on Web technologies, including the D3 software library (Data-Driven Documents) and the Vega-lite interactive graphics grammar. While positioned at different levels of abstraction, both enable developers to create a wide range of interactive, Web-based visualizations that run on a variety of platforms, ranging from desktop workstations to mobile devices.

Prerequisites: Basic knowledge of Web programming tech is a plus but not a requirement

Machine & Deep Learning Introduction (CSC_51054_EP)

Go to course webpage. This course is taught by M. Vazirgiannis.

The Machine Learning Pipeline; Data Preprocessing and Exploration; Feature Selection/Engineering & Dimensionality Reduction; Supervised Learning, Deep and Reinforcement Learning, Unsupervised Learning. This course will *probably* be scheduled on Mondays on the X calendar.

Topological Data Analysis (CSC_51056_EP)

Go to course webpage. This course is taught by Steve Oudot.

Objectives: Topological Data Analysis is an emerging trend in exploratory data analysis and data mining. It has attracted growing interest and had some notable successes in recent years. The idea is to use topological tools to tackle challenging data sets, in particular data sets for which the observations lie on or close to non-trivial geometric structures that can fool classical techniques. Topological methods are indeed able to extract useful information about these geometric structures from the data, and to exploit that information to enhance the analysis pipeline. The objective of this course is to familiarize students with this new topic lying at the confluence of pure mathematics, applied mathematics, and computer science. Emphasis is put on the methods and on their theoretical guarantees. Meanwhile, the lab sessions focus on challenging data sets, primarily multimedia data sets such as collections of images or 3D shapes.

Content: The course is divided into nine lectures and nine exercise or lab sessions. These cover the main mathematical concepts and algorithmic tools involved in topological data analysis. The topics covered include: dimensionality reduction and its limitations, hierarchical versus density-based clustering, simplicial and singular homology, persistence theory, topological inference for data exploration, topological signatures for data classification, Reeb graphs and Mapper.

Suggested readings:
- Gunnar Carlsson, Topology and Data, Bulletin of the American Mathematical Society
- Herbert Edelsbrunner and John Harer, Computational Topology: An Introduction, AMS Press

Text Mining and NLP (CSC_52082_EP)

Go to course webpage. This course is taught by M. Vazirgiannis, Buscaldi.

Text preprocessing and Information Retrieval, graph-of-words, keyword extraction, Text categorization, topic modeling, supervised document classification, Word and document embeddings, unsupervised document classification with the Word Mover's Distance, Advanced deep learning architectures for NLP seq to seq tasks (HAN, ELMO, BERT/Transformer...), Lexical statistics and n-gram models, Sequence Labeling: Named Entity Recognition, POS-tagging, Introduction to Parsing, elements of Machine Translation, Semantics - Knowledge Bases, Relation Extraction

Introduction to the verification of neural networks (CSC_54441_EP)

Go to course webpage. This course is taught by Sylvie Putot, Eric Goubault.

Neural networks are widely used in numerous applications including safety-critical ones such as control and planning for autonomous systems. A central question is how to verify that they are correct with respect to some specification. Beyond correctness or robustness, we are also interested in questions such as explainability and fairness, that can in turn be specified as formal verification problems. In this course, we will see how formal methods approaches introduced in the context of program verification can be leveraged to address the verification of neural networks. BEWARE, despite being hosted at X, this course is only 24h so 2.5 ECTS!

Navigation for autonomous systems (CSC_54456_EP)

Go to course webpage. This course is taught by D. Filliat.

We will give an overview of algorithmic aspects of mobile robotics and autonomous vehicles. We will cover the most common robotics platforms and sensors (vision, 3D ultrasound, accelerometers, odometry) and the various navigation components: control, obstacle avoidance, localization, mapping (SLAM) and planning, along with the filtering (Kalman filter, particle filtering, etc.) and optimisation techniques used in these areas. BEWARE, despite being hosted at X, this course is only 24h so 2.5 ECTS!

Computer Vision: from Fundamentals to Applications (CSC_52002_EP)

Go to course webpage. This course is taught by Vicky Kalogeiton.

Generative AI (image generation from text, VAE, Diffusion models, Stable Diffusion), Vision Transformers, Self-supervised learning

Prerequisites: X-INF554 (or equivalent): Deep Learning basics, PyTorch basics

Language Models and Structured Data (APM_5AI29_TP)

This course is taught by Mehwish Alam.

Beyond their traditional applications in natural language processing tasks such as sentiment analysis and fake news detection, language models have been leveraged across a broad spectrum of other tasks involving structured data such as graphs, databases, and tables. This course examines the merits and demerits of employing language models and conventional approaches for tackling tasks related to structured data. Starting with an exploration of basic concepts in language modeling, including prompt engineering and retrieval augmented generation, the curriculum will progressively move towards the interplay between language models and structured data. The course will further focus on diverse applications such as learning representations over tables and graphs, language models as knowledge bases, Text to SQL, and Question Answering over Structured Data.

Program:

Kernel Machines (APM_5AI26_TP)

Go to course webpage. This course is taught by Florence d'Alché.

This course gives an advanced and modern presentation of kernel machines and related tools in light of recent results in Machine Learning. The course requires a grasp of the basics of Statistical Machine Learning and notions of convex programming.
1. Notions on Kernels and Reproducing Kernel Hilbert Space Theory
2. Kernel machines for regression, classification and dimensionality reduction
3. Kernel machines for complex output prediction
4. Scaling up kernel machines
5. Relationship between kernel machines and neural networks

Prerequisites: Machine Learning (at least one course), Deep Learning (at least one course)

Program: