IP Paris

Master Data AI

A Master in Computer Science at Institut Polytechnique de Paris


M1 students are invited to follow introductory courses, and M2 students more advanced courses. However, unless explicitly specified in the course description, all courses are open to both M1 and M2 students.

Group "Softskills"

Softskills seminar (M2 only) (PDV_5DA05_TP)

This course is taught by Fabian Suchanek.

Students learn how to give good presentations and how to present scientific papers. This is a mandatory course of the M2 Data AI.

Course webpage

Group "Ethics"

AI Ethics (HSS_5DA06_TP)

This course is taught by Tiphaine Viard, Thomas Le Goff, Sophie Chabridon, Ada Diaconescu, Fabian Suchanek.

Socio-environmental issues of AI, introduction to the AI Act, ethical issues and fundamental rights, explainability, privacy and security. This course is scheduled on Tuesday afternoons in P2 (between 19/11/24 and 28/01/25, with no class on 17/12/24, 24/12/24 and 31/12/24).

Group "Data AI basics"

Data AI basics (CSC_5DA00_TP)

This course is taught by Tiphaine Viard, Louis Jachiet, Nils Holzenberger, Jean-Louis Dessalles, ....

This is an introductory course covering many subjects in mathematics and computer science. There is no exam for this course, but it also awards no ECTS credits.

Course webpage

Group "Logics"

Logic, Knowledge Representation and Probabilities (CSC_0EL07_TP)

This course is taught by Nils Holzenberger.

Topics will include:
- Prolog (recursion, backtracking, unification) and DeepProbLog
- Formal logic (propositions, predicates, proof by refutation)
- Natural language processing (DCG, parsing through unification)
- Symbolic machine learning (symbolic induction, minimum complexity)
- Knowledge representation (description logics, ontologies, Semantic Web)
- Probabilistic programming, binary and sentential decision diagrams, Boolean formulas

Course webpage

Logics and Symbolic AI (APM_5AI01_TP)

This course is taught by Thomas Bonald.

This course aims to provide the foundations of symbolic AI, along with a few selected advanced topics. It includes lectures on formal logic, ontologies, symbolic learning, and typical AI topics such as revision and merging, with illustrations on preference modeling and image understanding.

Prerequisites: Basic knowledge of algebra

Group "Databases"

Database management systems (CSC_51053_EP)

This course is taught by Ioana Manolescu.

Relational databases: ER modeling, SQL, query execution, query optimization, schema refinement, application programming

Course webpage

Databases (CSC_4SD02_TP)

This course is taught by Mehwish Alam.

Relational databases: ER modeling, SQL, query execution, query optimization, schema refinement, application programming, graph databases

Group "Big Data Systems"

Big Data Infrastructures (TSP-CSC5003-1)

This course is taught by Julien Romero, Amel Bouzeghoub.

The course CSC 5003-1 – Big Data Infrastructure is a third-year course in an engineering school (Master 2 level) given at Télécom SudParis. At the end of this course, students will be able to set up a big data infrastructure using tools from the Hadoop ecosystem. In detail, students will know how to:
1. program in functional style using Scala
2. use the MapReduce framework to parallelize computations
3. explore and manipulate the Hadoop Distributed File System
4. process a data stream using Kafka and Spark Streaming
5. choose the right tools from the Hadoop ecosystem to solve a given problem

Prerequisites: Programming; prior knowledge of functional programming can help.

Course webpage

Big Graph Databases (ECE_5DA04_TP)

This course is taught by Garima Gaur, Madhulika Mohanty, Georgios Siachamis (Inria).

The course presents the main architectures and algorithms used for large-scale data management, in particular for graph databases. We will consider graph querying via structured queries and (semi-)structured search, including when reasoning is involved (on semantic graphs). The course will cover modern data management architectures and systems, such as in-memory databases, cloud databases, and query processing in shared-nothing and MapReduce clusters.

Prerequisites: Algorithms and complexity; one database class; a systems course and a logic course would also be a plus.

Systems for Big Data (CSC_52083_EP)

This course is taught by Yanlei Diao.

This course covers the design principles and algorithmic foundations of advanced systems for Big Data analytics. The course includes the design of large enterprise data warehouses, column stores, online analytical processing, online aggregation, data mining, and data exploration over large databases. It also covers modern scalable analytical systems, including parallel databases, MapReduce, Spark for unified data analytics (integrating SQL, ML and graph analytics), and cloud computing. Finally, it presents recent research trends in bringing large-scale machine learning to bear on the design of high-performance big data systems.

Prerequisites: The main background knowledge required for INF583 includes relational operators, SQL, storage, and transaction processing. We also expect students to be aware of notions such as query plans and query optimisation, even though INF583 builds on them only to a very small extent. These prerequisites can be fulfilled by a typical database class such as INF553 (more in-depth) or SD202 (more lightweight).

Course webpage

Group "Machine Learning"

Machine Learning with Graphs (MAP_670I_TP)

This course is taught by Jhony H. Giraldo.

Graph data is ubiquitous. Any system with entities and relationships between them can be represented as a graph. Over the past decade, machine learning algorithms have made remarkable progress in fields such as natural language processing, computer vision, and speech recognition. This success is primarily due to the ability of deep neural network architectures to extract high-level features from Euclidean-structured data like images, text, and audio. However, graph data has not received the same level of attention. In this course, we will explore how to create machine learning models that extract high-level features from graph data, a process known as graph representation learning. The topics covered in this course include graph neural networks (GNNs), such as graph convolutions and graph attention mechanisms, scalable GNNs for big data applications, recommender systems using GNNs, spatiotemporal data analysis with GNNs, and graph generation. This course also includes laboratory sessions to provide hands-on experience with these concepts.

Prerequisites: Deep learning basics (neural networks, convolutional neural networks), PyTorch basics.

Course webpage

Machine & Deep Learning Introduction (CSC_51054_EP)

This course is taught by Michalis Vazirgiannis.

The machine learning pipeline; data preprocessing and exploration; feature selection/engineering and dimensionality reduction; supervised learning, deep and reinforcement learning; unsupervised learning. This course will *probably* be scheduled on Mondays in the X calendar.

Course webpage

Machine Learning: Shallow & Deep Learning (CSC_5DA01_TP)

This course is taught by Mounîm A. El Yacoubi.

Statistical Data Analysis (PCA, LDA), Unsupervised Learning, Clustering, Supervised Learning, Neural Networks / Deep Learning, Hidden Markov Models (HMM), Restricted Boltzmann Machines, Support Vector Machines (SVM), Decision Trees, Random Forest, Boosting, Transfer Learning, Deep Reinforcement Learning, Introduction to LLM/ChatGPT

Prerequisites: Basics of Probability and Statistics; Basics of Algebra and Calculus

Advanced Deep Learning (CSC_52087_EP)

This course is taught by Vicky Kalogeiton, Johannes Lutzeyer, Michalis Vazirgiannis (LIX).

The primary goal of this course is to introduce students to advanced principles of deep learning, including mathematical foundations, architecture design, and practical applications. This course is particularly relevant given the current state of the job market, where deep learning skills are in high demand in many industries, including tech, finance, healthcare, and entertainment. ECTS: 5. Language: English.

Prerequisites: Basic concepts of Deep Learning

Course webpage

Group "Fully optional courses"

Collective Intelligence (HSS_5DA06_TP)

This course is taught by Ada Diaconescu.

The course provides an introduction to decentralised/collective intelligence, including the concepts of system self-adaptation, self-organisation, autonomic control, multi-scale feedback and agent-based modelling (ABM). Evaluation will rely on a practical project developed using a multi-agent simulation platform.

Prerequisites: Good programming skills (in any language, e.g. Prolog, C, C++, Java); notions that may help: control theory (including robotics, automata, autonomous systems), AI (both symbolic and data-oriented), system modelling.

Course webpage

Image mining and content-based retrieval (APM_5DA03_TP)

This course is taught by Antoine Manzanera (ENSTA), Gianni Franchi (ENSTA), Flora Weissgerber (Onera).

Prerequisites: Linear Algebra, Differential Calculus, Probability and Statistics, Signal Processing

Course webpage

Information Theory A: Introduction (=ACCQ202) (APM_4AC02_TP)

This course is taught by Aslan Tchamkerten.

In this course, we present two of Shannon's celebrated results related to information compression and transmission. More recent developments and applications to other domains will be discussed as time permits.

Prerequisites: Probability

Course webpage

Introduction to statistical learning (APM-0EL05-TP)

This course is taught by Aslan Tchamkerten.

Why ML? ML and the broader landscape, ML vs. AI, PAC/APAC model of learning, supervised and unsupervised learning as special cases, ERM, No Free Lunch Theorem. Learning through uniform convergence, shattering, VC dimension. What can/cannot be learned, statistical vs. computational complexity of learning, Linear separators, Linear regression, logistic regression, Model selection/validation, K-NN, K-Means

Prerequisites: Probability

Error correcting codes (APM_4AC04_TP)

This course is taught by Aslan Tchamkerten.

This course presents some of the main classes of error-correcting codes, following their historical development. This is a companion course to Information Theory A.

Prerequisites: Linear algebra, algebra, probability

Course webpage

Randomization in Computer Science: Games, Graphs and Algorithms (CSC_52061_EP)

This course is taught by Benjamin Doerr.

Introduction to randomized methods in computer science, covering topics like randomized algorithms, random graphs, and randomized search heuristics (e.g., genetic algorithms).

Prerequisites: Basic maths

Course webpage

Navigation for autonomous systems (CSC_54456_EP)

This course is taught by D. Filliat.

We will give an overview of the algorithmic aspects of mobile robotics and autonomous vehicles. We will cover the most common robotics platforms and sensors (vision, 3D ultrasound, accelerometers, odometry) and the various navigation components: control, obstacle avoidance, localization, mapping (SLAM) and planning, along with the filtering (Kalman filter, particle filter, etc.) and optimisation techniques used in these areas. BEWARE: despite being hosted at X, this course is only 24 hours long and is therefore worth only 2.5 ECTS!

Prerequisites: Basics of algebra and Python

Course webpage

Programming with GPU for Deep Learning (CSC_5AI07_TP)

This course is taught by Elisabeth Brunet, Goran Frehse.

This course gives an introduction to GPU programming techniques used for deep learning. Starting from the ground up with basic matrix operations, students will develop code to implement classifiers based on gradient descent. Programs are written in C and use the CUDA API from Nvidia to access the GPU.

Course webpage

Data Visualization (CSC_51052_EP)

This course is taught by Emmanuel Pietriga (INRIA).

This course first gives an overview of the field of data visualization. It then discusses fundamental principles of human visual perception, focusing on how they inform the design of visualizations. The following sessions focus on visualization techniques for specific data structures and discuss them in depth from both design and implementation perspectives, including multivariate data, hierarchical structures, networks, time series, statistical data and geographical data. All exercises are based on Web technologies, including the D3 software library (Data-Driven Documents) and the Vega-Lite interactive graphics grammar. While positioned at different levels of abstraction, both enable developers to create a wide range of interactive, Web-based visualizations that run on a variety of platforms, ranging from desktop workstations to mobile devices.

Prerequisites: Basic knowledge of Web programming technologies is a plus but not a requirement.

Course webpage

Knowledge Base Construction (CSC_5DA09_TP)

This course is taught by Fabian Suchanek.

Language models have revolutionized natural language processing. Yet they can say wrong things in a very convincing way: they hallucinate. One solution to this problem can come from structured data such as knowledge bases, which can serve to correct and inform the model. In this class, we will see how to bridge the gap between natural language (the sentence “Elvis is alive”) and structured information (the statement alive(Elvis)). We will cover the technical steps of information extraction: named entity recognition, entity disambiguation, and fact extraction. For each of them, we will see different methods: fine-tuning language models, prompt engineering, and training-free procedures. Finally, we will cover techniques for knowledge cleaning: link prediction, entity alignment and rule mining. https://suchanek.name/work/teaching/kbc-2024/index.html

Course webpage

AI for Sound: analysis, processing and generation (APM_5DS20_TP)

This course is taught by Geoffroy Peeters, Gael Richard.

AMIR presents the technologies (signal processing, machine learning and deep learning) used for processing, retrieving and generating audio (speech, music and environmental sounds). Each of the 10 blocks includes a lecture, a practical application and a keynote from industry (Sony, Adobe, Spotify, ...).

Prerequisites: Deep Learning

Course webpage

Robust Computer vision with deep learning, XAI, Uncertainty quantification (CSC_5IA23_TA)

This course is taught by Gianni Franchi (ENSTA).

In today's digital age, computer vision plays a crucial role in numerous applications, ranging from image and video recognition to autonomous vehicles and augmented reality. This course aims to equip students with the knowledge and skills required to tackle complex visual tasks using cutting-edge techniques and models. The Advanced Computer Vision course is designed to provide students with a comprehensive understanding of state-of-the-art techniques and methodologies in computer vision. Through a combination of theoretical concepts and hands-on projects and assignments, students will gain expertise in deep neural networks, generative models, uncertainty modeling, tracking, semi-supervised learning, and self-supervised learning. By the end of the course, students will be equipped with the skills to design, implement, and deploy advanced computer vision systems using deep neural networks.

Prerequisites: Linear Algebra, Differential Calculus, Probability and Statistics, Signal Processing

Course webpage

Emergence in Complex Systems (MOB_0AT09_TP)

This course is taught by Ada Diaconescu, J.-L. Dessalles.

The course will cover several collective phenomena, including: evolution theory, collective decision, the hawk-dove dilemma, cooperation, emergence of segregationism, altruism, the "tragedy of the commons", the "green-beard" effect, social coordination, suicide "for the group", honest communication, charity and competitive helping. Several theoretical models will be studied, including preferential attachment, kin selection, the Prisoner’s dilemma, the handicap principle, social signaling.

Prerequisites: Some basic knowledge of Python & object-oriented programming.

Course webpage

Algorithmic information and artificial intelligence (CSC_5AI25_TP)

This course is taught by Nils Holzenberger, Jean-Louis Dessalles.

The notion of complexity was invented 50 years ago to solve mathematical issues related to machine learning, randomness and proof theory. It led to the development of Algorithmic Information Theory (AIT). Complexity theory and AIT have recently been shown to provide new perspectives on machine learning and human intelligence.

Prerequisites: Basic programming skills in Python

Course webpage

Reinforcement Learning and Autonomous Agents (CSC_52081_EP)

This course is taught by Jesse Read.

This course blends together topics in Probabilistic Machine Learning, Deep Learning, and Sequential prediction and decision making, with a focus on Reinforcement learning (Q-Learning, Deep Q-Learning, Policy-Gradient Methods, Actor-Critic Methods).

Prerequisites: This course builds on any course providing introductory concepts of Machine Learning, e.g., INF554.

Graph Machine and Deep Learning for Generative AI (CSC_52072_EP)

This course is taught by Johannes Lutzeyer, Michalis Vazirgiannis.

In this course, we introduce you to a variety of machine and deep learning methods for processing graph-structured data. We define graph-structured data as the combination of an underlying graph (or network) structure and vectorial data observed at the nodes, the edges, or both. This data type is frequently observed in practice, and hence a multitude of methods have been defined to learn from it. In this course, we will review fundamental summary statistics of graphs and probabilistic models to generate graphs; we will introduce you to graph kernel methods and then move on to provide you with a comprehensive overview of deep learning methodology, notably Graph Neural Networks, among others. We end the course with a review of applications of the introduced methodology and an outlook on current challenges and future directions in machine and deep learning on graph-structured data.

Prerequisites: Deep Learning

Language Models and Structured Data (CSC_5AI29_TP)

This course is taught by Mehwish Alam.

Beyond the traditional applications of language models in natural language processing tasks such as sentiment analysis and fake news detection, language models have been leveraged across a broad spectrum of other tasks involving structured data such as graphs, databases and tables. This course is tailored to take into account the merits and drawbacks of employing language models and conventional approaches for tackling tasks related to structured data. Starting with an exploration of basic concepts in language modeling, including prompt engineering and retrieval-augmented generation, the curriculum progressively moves towards the interplay between language models and structured data. The course further focuses on diverse applications such as learning representations over tables and graphs, language models as knowledge bases, Text-to-SQL, and question answering over structured data.

Language Modeling (CSC_5AI30_TP)

This course is taught by Mehwish Alam, Maria Boritchev, Fabian Suchanek, Matthieu Labeau, Nils Holzenberger.

Introduction to Language and NLP, Theory of language modeling, first working models, Neural Language Models, Masked Language Models, Large Language Models, Low-rank Adaptation, Ethical Aspects of LLMs

Text Mining and NLP (CSC_52082_EP)

This course is taught by M. Vazirgiannis, Buscaldi.

Text preprocessing and Information Retrieval, graph-of-words, keyword extraction, Text categorization, topic modeling, supervised document classification, Word and document embeddings, unsupervised document classification with the Word Mover's Distance, Advanced deep learning architectures for NLP sequence-to-sequence tasks (HAN, ELMo, BERT/Transformer...), Lexical statistics and n-gram models, Sequence Labeling: Named Entity Recognition, POS-tagging, Introduction to Parsing, elements of Machine Translation, Semantics - Knowledge Bases, Relation Extraction

Course webpage

Explainable and Trustworthy AI (CSC_5DA02_TP)

This course is taught by Mounîm A. El Yacoubi.

Explainability and Interpretability of Machine / Deep Learning Models; Explanation Methods of Machine Learning models as black boxes: LIME, Shapley Values, SHAP, Counterfactual Explanations; Interpretation of Neural Networks as white boxes: Sensitivity Analysis, Layer-wise Relevance Propagation (LRP), The RETAIN architecture; Adversarial Learning, Targeted and Non-Targeted Adversarial Attacks, Defense against Adversarial Attacks; Verification of the Robustness of Neural Networks.

Prerequisites: Knowledge of the basic concepts of Machine Learning and Deep Learning

Large-scale Generative Models for NLP and Speech Processing (CSC_5AI27_TP)

This course is taught by Nils Holzenberger, Mehwish Alam.

Natural language processing has given rise to innumerable industrial applications. While many new tasks have emerged in NLP and speech processing over the last decades, the methods used to solve them have increasingly converged towards a unified modeling paradigm. In this course, we will use sequence-to-sequence modeling to delve into state-of-the-art statistical machine learning methods (convolutional neural networks, recurrent neural networks, attention, transformers) and apply them to major NLP and speech processing tasks (language modeling, machine translation, speech recognition, information extraction). Students should expect to gain an in-depth understanding of these methods through theoretical analysis and hands-on lab sessions. Grading will involve a project carried out over the course of the class. Topics to be covered:
1. Recurrent Neural Networks
2. Hidden Markov models
3. Attention Mechanisms
4. Transformers
5. Convolutional Neural Networks
6. Language Modeling

Prerequisites: Probability theory, Python programming

Representation Learning for Computer Vision and Medical Imaging (APM_5DA13_TP)

This course is taught by Pietro Gori (TP), Loic le Folgoc (TP).

Good and expressive data representations can improve the accuracy of machine learning models and ease interpretability and transfer. For vision tasks, handcrafting good data representations, a.k.a. feature engineering, was traditionally hard. Deep learning has changed this paradigm by making it possible to discover good representations automatically from data. This is known as representation learning. The objective of this course is to provide an introduction to representation learning in computer vision and medical imaging applications. Standard approaches to representation learning exploit the inductive bias of convolutional neural networks and the supervision of labeled data. Since labeled data is scarce compared to raw data, recent work has turned to unsupervised and self-supervised techniques to boost the expressive power of representations. Furthermore, alternatives to CNNs inspired by advances in NLP have been proposed, such as vision transformers. In a different development, causal representations, which leverage causal relationships in the data, make it possible to answer additional queries (causal effects, interventions, counterfactuals) compared to standard statistical models. All of these developments will be covered in the course. Each lecture is followed by a practical lab on the corresponding content, where students learn to implement these techniques using the PyTorch framework.

Prerequisites: Introductory course in Deep Learning, Computer Vision, Linear Algebra, Calculus, Probability, Statistics, Image processing, Python, PyTorch

Course webpage

Learning for robotics (CSC_5IA05_TA)

This course is taught by Sao Mai NGUYEN.

Learning methods used in robotics and applications to human / robot interaction, learning by demonstration or autonomous learning: imitation learning, reinforcement learning, human motion analysis

Prerequisites: Basic machine learning and principles of deep learning, PyTorch

Course webpage

Topological Data Analysis (CSC_51056_EP)

This course is taught by Steve Oudot.

This course is an introduction to the field of topological data analysis, whose aim is to use concepts and tools from algebraic topology to design or learn new data representations for machine learning. Topics covered include field homology, persistent homology, Reeb graphs, and their application to data analysis.

Prerequisites: Linear algebra; Point set topology; Algorithms and complexity

Course webpage

Introduction to the verification of neural networks (CSC_54441_EP)

This course is taught by Sylvie Putot, Eric Goubault.

Neural networks are widely used in numerous applications, including safety-critical ones such as control and planning for autonomous systems. A central question is how to verify that they are correct with respect to some specification. Beyond correctness or robustness, we are also interested in questions such as explainability and fairness, which can in turn be specified as formal verification problems. In this course, we will see how formal methods approaches introduced in the context of program verification can be leveraged to address the verification of neural networks. BEWARE: despite being hosted at X, this course is only 24 hours long and is therefore worth only 2.5 ECTS!

Course webpage

Reinforcement Learning (APM_5AI18_TP)

This course is taught by Thomas Bonald.

This is an introduction to reinforcement learning: Markov Decision Processes, Bellman's equation, bandit algorithms, Q-learning, TD-learning, Monte Carlo tree search. Applications to games and to recommender systems will be presented.

Prerequisites: Probability theory, Python programming

Graph Learning (CSC_4SD04_TP)

This course is taught by Thomas Bonald.

The focus of this course is on the analysis of large graphs. You will learn how to represent graphs efficiently as sparse matrices. You will apply some key algorithms to real graphs, for clustering, ranking, classifying and embedding nodes, including graph neural networks.

Prerequisites: Basics on graphs, probability theory, linear algebra, Python programming.