About

Ludovic Capelli

Teaching Fellow at EPCC, University of Edinburgh, United Kingdom.

l.capelli@epcc.ed.ac.uk

In a nutshell: passionate about high-performance computing, and equally passionate about teaching it.

Degrees

Feb 2025

Oct 2023

Postgraduate Certificate in Academic Practice

From: The University of Edinburgh in Edinburgh, United Kingdom.

Comment: focusses on the development of high-quality teaching. The programme is also accredited by the Higher Education Academy, whereby "holders of the Certificate are therefore automatically eligible to become Fellows of the Academy" (source).

Oct 2022

Apr 2018

Doctor of Philosophy in Pervasive Parallelism

From: The University of Edinburgh in Edinburgh, United Kingdom.

Comment: second phase of the Centre for Doctoral Training in Pervasive Parallelism. My research focused on the optimisation of the vertex-centric programming model for graph processing.

Aug 2017

Sep 2016

Master of Science by Research in Pervasive Parallelism

From: The University of Edinburgh in Edinburgh, United Kingdom.

Comment: first phase of the Centre for Doctoral Training in Pervasive Parallelism.

Aug 2016

Sep 2015

Master of Science in High Performance Computing with Data Science

From: The University of Edinburgh in Edinburgh, United Kingdom.

Comment: awarded first class and obtained with distinction; the distinction is awarded by The University of Edinburgh when a student maintains first-class grades in every semester independently.

Aug 2015

Sep 2013

Bachelor of Science with Honours in Computing Science

From: Edinburgh Napier University in Edinburgh, United Kingdom.

Comment: awarded first class. Ranked #1 overall, #1 or #2 in all individual modules and in the top 10 out of 155 for the group module.

Aug 2014

Sep 2013

French DUETI in Computing Science

From: Grenoble Alpes University in Grenoble, France.

Comment: awarded first class. This year-long university degree is taken while enrolled in a programme of study abroad.

Aug 2013

Sep 2012

French DUT in Computing Science

From: Grenoble Alpes University in Grenoble, France.

Comment: awarded first class. Ranked #1. This 2-year university degree is equivalent to a Higher National Diploma in the United Kingdom or an Associate's Degree in the United States of America for instance. It was achieved in an unusual format called "special year", in which the 2-year curriculum is completed in a single year. This harder version is available only to students already holding a 2-year degree.

Aug 2012

Sep 2010

French BTS in Accounting and Organisations Management

From: Edouard Herriot Secondary School in Voiron, France.

Comment: awarded first class. This 2-year degree is equivalent to a Higher National Diploma in the United Kingdom or an Associate's Degree in the United States of America for instance.

Awards

Apr 2021

Huawei UK R&D Fellowship

From: Huawei.

What: "The Huawei Fellowship Programme supports the development of talented Ph.D. students to solve the most challenging problems in their disciplines and make innovative breakthroughs in computing technologies." (Source: Huawei flyer)

Jul 2018

HPC Programming Challenge Winner (MPI and CPU categories)

From: the International HPC Summer School.

What: award given to the author of the fastest MPI code and fastest CPU code in the HPC programming challenge, a week-long competition where contestants are provided with source code to optimise.

Sep 2017

Selected to Attend the Heidelberg Laureate Forum

From: the Heidelberg Laureate Forum Foundation.

What: the HLF is a week-long programme in Germany where 100 young researchers in computing science, selected worldwide, meet numerous holders of the Abel Prize, ACM A.M. Turing Award, ACM Prize, Fields Medal and Nevanlinna Prize.

Oct 2016

Engineering and Physical Sciences Research Council Scholarship

From: the Engineering and Physical Sciences Research Council of the United Kingdom.

What: a scholarship awarded solely on academic merit which waives the tuition fees and provides a stipend and a travel budget during the 4-year Centre for Doctoral Training in Pervasive Parallelism programme.

Oct 2015

Hackathon winner

From: MakeItSocial & SkyScanner.

What: member of the team that won the hackathon organised by MakeItSocial & SkyScanner, by developing an application to facilitate bill sharing among flatmates.

Sep 2015

Highly Skilled Workforce Scholarship

From: The University of Edinburgh.

What: a scholarship awarded solely on academic merit which waives the tuition fees of the Master of Science in High Performance Computing with Data Science.

Jul 2015

University Class Medal Winner

From: Edinburgh Napier University.

What: a medal awarded for ranking #1 in a programme of study.

Jun 2015

METERology Prize

From: METERology.

What: prize awarded for innovation in green ICT and overall performance in the Bachelor of Science with Honours in Computing Science.

Publications

Jul 2022

NVRAM as an Enabler to New Horizons in Graph Processing

Published in: Springer Nature Computer Science

Authors: Ludovic A. R. Capelli, Nick Brown, Jonathan M. Bull

Abstract: From the world wide web, to genomics, to traffic analysis, graphs are central to many scientific, engineering, and societal endeavours. Therefore an important question is what hardware technologies are most appropriate to invest in and use for processing graphs, whose sizes now frequently reach terabytes. Non-Volatile Random Access Memory (NVRAM) technology is an interesting candidate enabling organisations to extend the memory in their systems typically by an order of magnitude compared to Dynamic Read Access Memory (DRAM) alone. From a software perspective, it permits to store a much larger dataset within a single memory space and avoid the significant communication cost incurred when going off node. However, to obtain optimal performance one must consider carefully how best to integrate this technology with their code to cope with NVRAM esoteric properties such as asymmetric read/write performance or explicit coding for deeper memory hierarchies for instance. In this paper, we investigate the use of NVRAM in the context of shared memory graph processing via vertex-centric. We find that NVRAM enables the processing of exceptionally large graphs on a single node with good performance, price and power consumption. We also explore the techniques required to most appropriately exploit NVRAM for graph processing and, for the first time, demonstrate the ability to process a graph of 750 billion edges whilst staying within the memory of a single node. Whilst the vertex-centric graph processing methodology is our main focus, not least due to its popularity since introduced by Google over a decade ago, the lessons learnt in this paper apply more widely to graph processing in general.

Digital Object Identifier (DOI): 10.1007/s42979-022-01317-4

Nov 2019

iPregel: Strategies to Deal with an Extreme Form of Irregularity

Published in: workshop on Irregular Applications: Architectures and Algorithms hosted at the International Conference for High Performance Computing, Networking, Storage and Analysis.

Authors: Ludovic A. R. Capelli, Nick Brown, Jonathan M. Bull

Abstract: Vertex-centric programming attracts significant attention in the world of graph processing thanks to its simple interface and the inherent parallelism for the underlying framework to leverage. However, vertex-centric programs represent an extreme form of irregularity; a workload that may greatly vary across supersteps, fine-grain synchronisations and memory accesses unpredictable both in terms of quantity and location.
In this paper, we explore optimisations to address these challenges: a hybrid combiner, vertex structure externalisation, an edge-centric shift in the workload representation and dynamic load-balancing. The optimisations were integrated into the iPregel vertex-centric framework and evaluated across three benchmarks commonly used in vertex-centric, each run on four publicly available graphs comprising up to a billion edges.
The result of this work is a set of techniques which we believe not only provides a significant improvement in vertex-centric performance, but are also applicable more generally to irregular applications.

Digital Object Identifier (DOI): 10.1109/IA349570.2019.00013

Aug 2019

iPregel: Vertex-Centric Programmability vs Memory Efficiency and Performance, Why Choose?

Published in: Journal of Parallel Computing

Authors: Ludovic A. R. Capelli, Zhenjiang Hu, Timothy A. K. Zakian, Nick Brown, Jonathan M. Bull

Abstract: The vertex-centric programming model, designed to improve the programmability in graph processing application writing, has attracted great attention over the years. Multiple shared memory frameworks that have implemented the vertex-centric interface all expose a common tradeoff: programmability against memory efficiency and performance.
Our approach consists in preserving vertex-centric programmability, while implementing optimisations missing from FemtoGraph, developing new ones and designing these so they are transparent to a user's application code, hence not impacting programmability. We therefore implemented our own shared memory vertex-centric framework iPregel, relying on in-memory storage and synchronous execution. In this paper, we evaluate it against FemtoGraph, whose characteristics are identical, but also an asynchronous counterpart GraphChi and the vertex-subset-centric framework Ligra. Our experiments include three of the most popular vertex-centric benchmark applications over 4 real-world publicly accessible graphs, which cover all orders of magnitude between a million to a billion edges. We then measure the execution time and the peak memory usage. Finally, we evaluate the programmability of each framework by comparing it against the original Pregel, Google's closed-source implementation that started the whole area of vertex-centric programming.
Experiments demonstrate that iPregel, like FemtoGraph, does not sacrifice vertex-centric programmability for additional performance and memory efficiency optimisations, which contrasts with GraphChi and Ligra. Sacrificing vertex-centric programmability allowed the latter to benefit from substantial performance and memory efficiency gains. However, experiments demonstrate that iPregel is up to 2300 times faster than FemtoGraph, as well as generating a memory footprint up to 100 times smaller. These results greatly change the situation; Ligra and GraphChi are up to 17,000 and 700 times faster than FemtoGraph but, when comparing against iPregel, this maximum speed-up drops to 10. Furthermore, on PageRank, it is iPregel that proves to be the fastest overall. When it comes to memory efficiency, the same observation applies; Ligra and GraphChi are 100 and 50 times lighter than FemtoGraph, but iPregel nullifies these benefits: it provides the same memory efficiency as Ligra and even proves to be 3 to 6 times lighter than GraphChi on average. In other words, iPregel demonstrates that preserving vertex-centric programmability is not incompatible with a competitive performance and memory efficiency.

Digital Object Identifier (DOI): 10.1016/j.parco.2019.04.005
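
Aside: to give a flavour of what "vertex-centric programmability" means in practice, below is a minimal, self-contained C sketch of a synchronous, Pregel-style PageRank over a tiny hard-coded graph. It is purely illustrative and does not use the iPregel API; all names (vertex_t, compute, inbox) are invented for this sketch.

    /* Illustrative vertex-centric PageRank sketch (not the iPregel API).
     * Each vertex only sees its own state and its combined incoming
     * messages; the outer loop plays the role of the framework running
     * synchronous supersteps over a tiny hard-coded graph.              */
    #include <stdio.h>

    #define N_VERTICES   4
    #define N_SUPERSTEPS 20
    #define DAMPING      0.85

    typedef struct { double value; int out_degree; } vertex_t;

    /* out_edges[v] lists the destinations of v's outgoing edges (-1 = end). */
    static const int out_edges[N_VERTICES][N_VERTICES] = {
        {1, 2, -1, -1}, {2, -1, -1, -1}, {0, -1, -1, -1}, {0, 2, -1, -1}
    };

    static vertex_t vertices[N_VERTICES];
    static double inbox[N_VERTICES];  /* combined (summed) incoming messages */

    /* "Think like a vertex": update my value from my inbox, then send
     * value/out_degree along each of my outgoing edges.                 */
    static void compute(int v, double *next_inbox)
    {
        vertices[v].value = (1.0 - DAMPING) / N_VERTICES + DAMPING * inbox[v];
        double message = vertices[v].value / vertices[v].out_degree;
        for (int e = 0; e < N_VERTICES && out_edges[v][e] >= 0; e++)
            next_inbox[out_edges[v][e]] += message;  /* sum combiner on receipt */
    }

    int main(void)
    {
        double next_inbox[N_VERTICES];

        /* Superstep 0: every vertex starts at 1/N and sends value/out_degree. */
        for (int v = 0; v < N_VERTICES; v++) {
            vertices[v].value = 1.0 / N_VERTICES;
            vertices[v].out_degree = 0;
            while (vertices[v].out_degree < N_VERTICES
                   && out_edges[v][vertices[v].out_degree] >= 0)
                vertices[v].out_degree++;
            inbox[v] = 0.0;
        }
        for (int v = 0; v < N_VERTICES; v++)
            for (int e = 0; e < N_VERTICES && out_edges[v][e] >= 0; e++)
                inbox[out_edges[v][e]] += vertices[v].value / vertices[v].out_degree;

        /* Subsequent supersteps: all vertices compute, then inboxes are swapped. */
        for (int superstep = 1; superstep <= N_SUPERSTEPS; superstep++) {
            for (int v = 0; v < N_VERTICES; v++) next_inbox[v] = 0.0;
            for (int v = 0; v < N_VERTICES; v++) compute(v, next_inbox);
            for (int v = 0; v < N_VERTICES; v++) inbox[v] = next_inbox[v];
        }

        for (int v = 0; v < N_VERTICES; v++)
            printf("vertex %d: %.4f\n", v, vertices[v].value);
        return 0;
    }

The user-visible part is essentially the compute function; everything else is what a framework such as iPregel hides behind its interface.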

May 2019

Incrementalization of Vertex-Centric Programs.

Published in: IEEE International Parallel and Distributed Processing Symposium

Authors: Timothy A. K. Zakian, Ludovic A. R. Capelli, Zhenjiang Hu

Abstract: As the graphs in our world become ever larger, the need for programmable, easy to use, and highly scalable graph processing has become ever greater. One such popular graph processing model, the vertex-centric computational model, does precisely this by distributing computations across the vertices of the graph being computed over. Due to this distribution of the program to the vertices of the graph, the programmer "thinks like a vertex" when writing their graph computation, with limited to no sense of shared memory and where almost all communication between each on-vertex computation must be sent over the network. Because of this inherent communication overhead in the computational model, reducing the number of messages sent while performing a given computation is a central aspect of any efforts to optimize vertex-centric programs. While previous work has focused on reducing communication overhead by directly changing communication patterns by altering the way the graph is partitioned and distributed, or by altering the graph topology itself. In this paper we present a different optimization strategy based on a family of complementary compile-time program transformations in order to minimize communication overhead by changing both the messaging and computational structures of programs. Particularly, we present and formalize a method by which a compiler can automatically incrementalize a vertex-centric program through a series of compile-time program transformations by modifying the on-vertex computation and messaging between vertices so that messages between vertices represent patches to be applied to the other vertex's local state. We empirically evaluate these transformations on a set of common vertex-centric algorithms and graphs and achieve an average reduction of 2.7X in total computational time, and 2.9X in the number of messages sent across all programs in the benchmark suite. Furthermore, since these are compile-time program transformations alone, other prior optimization strategies for vertex-centric programs can work with the resulting vertex-centric program just as they would a non-incrementalized program.

Digital Object Identifier (DOI): 10.1109/IPDPS.2019.00109

Aug 2018

iPregel: A Combiner-Based In-Memory Shared Memory Vertex-Centric Framework.

Published in: workshop on Parallel Programming Models and Systems Software for High-End Computing hosted at the International Conference on Parallel Processing

Authors: Ludovic A. R. Capelli, Zhenjiang Hu, Timothy A. K. Zakian

Abstract: The expressiveness of the vertex-centric programming model introduced by Pregel attracted great attention. Over the years, numerous frameworks emerged, abiding by the same programming model, while relying on widely different architectural designs. The vast majority of existing vertex-centric frameworks exploits distributed memory parallelism or out-of-core computations. To our knowledge, only one vertex-centric framework is designed upon in-memory storage and shared memory parallelism. Unfortunately, while built on a faster architecture than that of other vertex-centric frameworks, it did not prove to significantly outperform other existing solutions.
In this paper we present iPregel: another in-memory shared memory vertex-centric framework. The optimisations developed and presented in this paper particularly target three hotspots of vertex-centric calculations: selecting active vertices, routing messages to their recipient and updating recipients inbox. We compare iPregel against the state-of-the-art in-memory distributed memory framework Pregel+ on three of the most common vertex-centric applications: PageRank, Hashmin and the Single-Source Shortest Path. Experiments demonstrate that the single-node framework iPregel is faster than its distributed memory counterpart until at least 11 nodes are used. Further experiments show that iPregel completes a PageRank application with an order of magnitude less memory than popular vertex-centric frameworks.

Digital Object Identifier (DOI): 10.1145/3229710.3229719
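
Aside: for readers unfamiliar with combiners, here is a rough, generic C sketch of the idea (illustrative only, not the iPregel implementation): when several messages target the same vertex, they are folded into a single value as they arrive, so an inbox holds one combined message rather than a list.

    /* Generic combiner sketch (illustrative, not the iPregel implementation):
     * messages destined to the same vertex are folded on receipt, so an
     * inbox stores a single combined value instead of a list of messages.  */
    #include <stdio.h>

    typedef struct {
        double value;      /* combined message so far           */
        int    non_empty;  /* has this inbox received anything? */
    } inbox_t;

    /* Sum combiner, e.g. for PageRank contributions. */
    static void combine_sum(inbox_t *inbox, double message)
    {
        inbox->value = inbox->non_empty ? inbox->value + message : message;
        inbox->non_empty = 1;
    }

    /* Min combiner, e.g. for Hashmin-style connected components. */
    static void combine_min(inbox_t *inbox, double message)
    {
        if (!inbox->non_empty || message < inbox->value)
            inbox->value = message;
        inbox->non_empty = 1;
    }

    int main(void)
    {
        inbox_t pagerank_inbox = {0.0, 0}, hashmin_inbox = {0.0, 0};
        combine_sum(&pagerank_inbox, 0.25);  /* two PageRank contributions     */
        combine_sum(&pagerank_inbox, 0.10);
        combine_min(&hashmin_inbox, 7.0);    /* two candidate component labels */
        combine_min(&hashmin_inbox, 3.0);
        printf("combined sum: %.2f, combined min: %.1f\n",
               pagerank_inbox.value, hashmin_inbox.value);
        return 0;
    }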

Experience

Present

Nov 2022

Teaching Fellow

Where: EPCC in Edinburgh, United Kingdom

What: position focussing exclusively on the teaching components at EPCC. Course organiser of the MSc module "Advanced Message-Passing Programming".

Present

Sep 2023

Academic Cohort Lead

Where: EPCC in Edinburgh, United Kingdom

What: position focussing on providing students with academic support, as part of the new student support model.

Nov 2020

May 2020

Computational Fluid Dynamics Engineer

Where: Renault Sport Racing in Enstone, United Kingdom

What: internship consisting of improving the speed and resilience of parallel CFD calculations, developing an understanding of HPC and providing an insight into industry best practice and future HPC technologies.

Apr 2018

Oct 2017

Lab Assistant

Where: the National Institute of Informatics in Tokyo, Japan

What: internship focusing on the optimisation of the vertex-centric programming model.

Apr 2016

Jan 2016

Lab Associate

Where: Disney Research in Edinburgh, United Kingdom

What: as a member of the Real-Time Digital Acting team, I investigated techniques for real-time 2D facial landmark tracking.

Sep 2014

Jul 2014

Flow Controller

Where: Schneider Electric Energy & Sustainability Services in Moirans, France

What: focusing on logistics and the supply chain to ensure smooth management of resources.

Aug 2013

Jul 2013

Human-Machine Interface Developer

Where: Centum Adeneo in Moirans, France

What: internship first focusing on source code refactoring and makefile generators, then on the graphical interface of AVAP3000, a device that enables fast data acquisition in aircraft testing, using C++/Qt.

Dec 2011

and

May 2011

Jan 2011

Accountant

Where: MAATEL in Moirans, France

What: internship aimed at learning and practising the daily tasks of an accountant, from invoice registration to bank statement analysis.

Teaching

Present

Jan 2023

Advanced Message-Passing Programming

Where: The University of Edinburgh in Edinburgh, United Kingdom.

Role: Course organiser.

Level: Master's degree.

Covers:

  • Scalability challenges
  • Leading-edge HPC architectures
  • MPI Internals
  • Message-passing optimisations
  • Parallel performance tools
  • Performance modelling
  • Single-sided protocols (a minimal sketch follows this list)
  • Exploiting heterogeneous architectures
  • Advanced load-balancing techniques
  • Parallel file systems and parallel IO
  • Choice of programming model/language
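
As an illustration of the "Single-sided protocols" topic above, here is a minimal C/MPI one-sided communication sketch (not taken from the course materials): rank 0 writes directly into a window exposed by rank 1, with fence synchronisation.

    /* Minimal MPI single-sided (RMA) sketch: rank 0 puts a value directly
     * into rank 1's window, with fence synchronisation. Run with 2 ranks. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value = 42, buffer = 0;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Every rank exposes its local 'buffer' through a window. */
        MPI_Win_create(&buffer, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);                       /* open the access epoch  */
        if (rank == 0)
            MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_fence(0, win);                       /* close it: data visible */

        if (rank == 1)
            printf("rank 1 received %d without calling a receive\n", buffer);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }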

May 2020

Jan 2020

and

May 2019

Jan 2019

and

May 2017

Jan 2017

Advanced Parallel Programming

Where: The University of Edinburgh in Edinburgh, United Kingdom.

Role: Teaching assistant.

Level: Master's degree.

Covers:

  • Scalability challenges
  • Leading-edge HPC architectures
  • MPI Internals
  • Message-passing optimisations
  • Parallel performance tools
  • Performance modelling
  • Single-sided protocols
  • Exploiting heterogeneous architectures
  • Advanced load-balancing techniques
  • Parallel file systems and parallel IO
  • Verification and fault tolerance
  • Choice of programming model/language

Dec 2019

Sep 2019

and

Dec 2018

Sep 2018

and

Dec 2016

Sep 2016

Message Passing Programming

Where: The University of Edinburgh in Edinburgh, United Kingdom.

Role: Teaching assistant.

Level: Master's degree.

Covers:

  • The message-passing model
  • Message-passing parallelisation of a regular domain code
  • MPI terminology
  • The anatomy of send and receive (synchronous and asynchronous)
  • Point-to-point message-passing example (pi)
  • Bandwidth and latency via pingpong (synchronous and asynchronous); a minimal sketch follows this list
  • Non-blocking operations
  • Collectives
  • Communicator management: topologies and partitioning
  • Derived datatypes (focusing mainly on array subsections)
  • Practicalities / Hints and Tips
  • MPI implementations
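
As an illustration of the pingpong measurement listed above, here is a minimal C/MPI sketch (not the actual course exercise): two ranks bounce a buffer back and forth and derive latency and bandwidth from the timed round trips.

    /* Minimal MPI pingpong sketch for latency/bandwidth: rank 0 and rank 1
     * bounce a buffer back and forth and time the round trips. Run with 2 ranks. */
    #include <mpi.h>
    #include <stdio.h>

    #define N_BYTES 1048576   /* message size: 1 MiB   */
    #define N_ITERS 100       /* number of round trips */

    int main(int argc, char **argv)
    {
        static char buffer[N_BYTES];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double start = MPI_Wtime();

        for (int i = 0; i < N_ITERS; i++) {
            if (rank == 0) {
                MPI_Send(buffer, N_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buffer, N_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buffer, N_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buffer, N_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        double elapsed = MPI_Wtime() - start;
        if (rank == 0) {
            double per_message = elapsed / (2.0 * N_ITERS);      /* seconds */
            double bandwidth   = N_BYTES / per_message / 1.0e6;  /* MB/s    */
            printf("time per message: %.6f s, bandwidth: %.1f MB/s\n",
                   per_message, bandwidth);
        }

        MPI_Finalize();
        return 0;
    }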

Dec 2018

Sep 2018

and

Dec 2016

Sep 2016

High Performance Computing Architectures

Where: The University of Edinburgh in Edinburgh, United Kingdom.

Role: Teaching assistant.

Level: Master's degree.

Covers:

  • Basic components of HPC systems: processors, memory, interconnect, storage.
  • Classification of architectures: SIMD/MIMD, shared vs distributed memory, clusters
  • System software: OSs, processes, threads, scheduling, batch systems.
  • Brief history of HPC systems, including Moore's Law.
  • CPU design: functional units, instructions sets, pipelining, branch prediction, ILP (superscalar, VLIW, SIMD instructions), multithreading.
  • Caches: operation and design features
  • Memory: operation and design features, including cache coherency and consistency
  • Multicore CPUs, including cache and memory hierarchy
  • GPGPUs: operation and design features
  • Interconnects: operation and design features
  • Current HPC architectures

May 2017

Jan 2017

High Performance Computing Ecosystems

Where: The University of Edinburgh in Edinburgh, United Kingdom.

Role: Teaching assistant.

Level: Master's degree.

Covers:

  1. EPCC Core Lectures, covering the following topics:
    • Distributed computing
    • Cloud computing
    • Trends and demographics in HPC
    • HPC vendors
    • HPC users and their requirements
    • HPC system procurement
    • HPC in Europe
    • Exascale computing
  2. Guest Lectures from HPC vendors, researchers, and users
  3. Practicals on:
    • Cloud computing
    • HPC system procurement

Dec 2022

Sep 2022

and

Dec 2019

Sep 2019

and

Dec 2018

Sep 2018

and

Dec 2016

Sep 2016

Threaded Programming

Where: The University of Edinburgh in Edinburgh, United Kingdom.

Role: Teaching assistant.

Level: Master's degree.

Covers:

  • Basic concepts of shared memory: threads, tasks, shared/private data, synchronisation.
  • Concepts of OpenMP: parallel regions, shared/private variables, parallel loops, reductions
  • OpenMP parallel regions and associated clauses
  • OpenMP worksharing directives, scheduling of parallel loops
  • OpenMP synchronisation: barriers, critical sections, atomics, locks.
  • OpenMP tasks
  • Additional features of OpenMP: nesting, orphaning, threadprivate globals, OpenMP 4.0 features
  • OpenMP implementations
  • Basic concepts of Posix threads, Boost/C++0x threads, Intel TBB, Java threads
  • Comparison of APIs

Lectures will be followed by tutored practical sessions illustrating the key concepts. Students will have the choice of using either C or Fortran in the practical programming sessions on OpenMP.
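
For illustration, a minimal C/OpenMP sketch in the spirit of these practicals (not the actual course exercise): a worksharing loop with a reduction that estimates pi by numerical integration.

    /* Minimal OpenMP sketch: a parallel region, a worksharing loop and a
     * reduction, estimating pi by numerical integration. Compile with -fopenmp. */
    #include <omp.h>
    #include <stdio.h>

    #define N_STEPS 100000000

    int main(void)
    {
        const double step = 1.0 / N_STEPS;
        double sum = 0.0;

        /* Each thread accumulates a private partial sum; OpenMP combines them. */
        #pragma omp parallel for reduction(+:sum) schedule(static)
        for (int i = 0; i < N_STEPS; i++) {
            double x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }

        printf("pi ~= %.10f (up to %d threads)\n", sum * step,
               omp_get_max_threads());
        return 0;
    }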

May 2020

Jan 2020

and

May 2019

Jan 2019

and

May 2017

Jan 2017

Parallel Design Patterns

Where: The University of Edinburgh in Edinburgh, United Kingdom.

Role: Teaching assistant.

Level: Master's degree.

Covers:

  • Task Parallelism
  • Recursive Splitting
  • Geometric Decomposition
  • Pipeline
  • Discrete Event
  • Actors
  • Master / Worker
  • Loop Parallelism
  • Fork / Join
  • Shared Data and Queues
  • Active Messaging

Dec 2018

Sep 2018

and

Dec 2016

Sep 2016

Fundamentals of Data Management

Where: The University of Edinburgh in Edinburgh, United Kingdom.

Role: Teaching assistant.

Level: Master's degree.

Covers:

  • Why managing research data better matters, and why it's hard
  • Data management planning: a required part of twenty first century research
  • Data formats: structuring data and keeping them useful
  • Metadata: describing data and keeping them useful
  • Publication and citation of research data
  • Persistence, preservation and provenance of research data
  • Licensing, copyright and access rights: some things researchers need to know

May 2015

Jan 2015

Advanced Database Systems

Where: Edinburgh Napier University in Edinburgh, United Kingdom.

Role: Demonstrator.

Level: Bachelor's degree.

Covers:

  • Object-Relational Databases - data modelling techniques, querying, database implementation: practical utilisation of an advanced database management system to implement a non-relational data model.
  • Data Warehouses - Why are data warehouses needed? Difference between data warehouses and traditional databases, data modelling techniques, implementation issues
  • Big Data Analytics - What is big data? Differences between big data and other databases, an introduction to big data analytics.
  • Emerging database techniques.

Volunteering

Present

Oct 2023

Member of the International Society for the Scholarship of Teaching and Learning

Where: the International Society for the Scholarship of Teaching and Learning (ISSOTL).

What: the ISSOTL "serves faculty members, staff, and students who care about teaching and learning as serious intellectual work. Through building intellectual and collaborative infrastructure, the Society supports the associational life that fosters scholarly work about teaching and learning." (Source)

Present

Mar 2023

Member of the OpenMP language committee

Where: the OpenMP language committee

What: the OpenMP language committee is in charge of the standardisation of "directive-based multi-language high-level parallelism". (Source)

Present

Feb 2023

Member of the MPI Forum.

Where: the MPI Forum

What: the MPI Forum is in charge of the standardisation of the Message-Passing Interface (MPI).

May 2023

Manager of the EPCC booth at ISC

Where: the International Supercomputing Conference (ISC) in Hamburg, Germany.

What: the booth manager is in charge of the logistics, coordination and assembly/dismantling of equipment, including the cluster used for the competition, at the ISC conference. In the case of EPCC, this implies ensuring the shipment of HPC equipment worth over £400,000.

Jul 2023

Jul 2019

Organiser of the HPC Programming Challenge

Where: the International High-Performance Computing Summer School.

What: The IHPCSS programming challenge is a 5-day programming competition where students practice the technologies they learn during the summer school. The objective is to optimise the source code provided as much as they can, using shared-memory parallelism, distributed-memory parallelism, GPU offloading, or any combination of the three.

Sep 2019

Jun 2017

STEM Ambassador

Where: the Science, Technology, Engineering and Mathematics learning network, in Edinburgh, United Kingdom.

What: "Stem Ambassadors are volunteers for a wide range of science, technology, engineering and mathematics (STEM) related jobs and disciplines across the United Kingdom. They offer their time and enthusiasm to help bring STEM subjects to life and demonstrate their value in life and careers. STEM Ambassadors are an important and exciting free-of-charge resource for teachers and others engaging with young people in and outside the classroom."
(Source: STEM website www.stem.org.uk)

Jun 2017

and

Jun 2016

Volunteering Student

Where: the International SuperComputing conference in Frankfurt, Germany.

What: participating in the student volunteer programme to help run the ISC conference, the international event for High Performance Computing, Networking and Storage.
Student volunteers assisted in:

  • ISCNet
  • Acquisition of Participant Data
  • Set up of Exhibition, Registration, Poster Sessions, Signage and Standing Banners.
  • Dismantling Process
  • Exhibitor Services
  • Registration and Check in Process
  • Assistance with Sessions, Speakers and Catering.

May 2015

Sep 2014

Programme Representative

Where: Edinburgh Napier University in Edinburgh, United Kingdom.

What: programme representatives act as an interface between students and staff to facilitate communication and interaction. Sometimes this means being at the heart of a conflict; nonetheless, the bottom line is that we are all in the same boat, doing our best to make the most of our BSc experience.
Attending "Student & Staff" committees, forums and events is the primary role of a programme representative, with the aim of improving the student experience at university.

May 2015

Sep 2014

Student Mentor

Where: Edinburgh Napier University in Edinburgh, United Kingdom.

What: being a mentor means being the one who informs the mentee about the services they can access, how best to prepare for upcoming exams, projects and so on. However, knowing how to listen is the cornerstone of being a good mentor. Furthermore, a mentor must be aware of what they cannot or should not handle, and know to whom to signpost the student they mentor.

Jun 2012

Sep 2011

Treasurer of the "Association of Senior Technicians in Accounting"

Where: Edouard Herriot Secondary School in Voiron, France.

What: the Association of Senior Technicians in Accounting is internal to the Edouard Herriot Secondary School; it handles the budget allocated to the BTS in Accounting and Organisations Management to fund trips to museums or theatres, for instance. Being a member of this association carries serious responsibilities, since the money managed is as real as it gets. In addition to being a real case study for the student, this position requires dedication, hard work and honesty. With weekly appointments at the bank and overall supervision from teachers, such an association demands efficient management as well as exemplary behaviour from its team members.