Machine Learning Training: Research Challenges and Opportunities for the Distributed Systems Community

The seminar will be given by Dr. Giovanni Neglia, INRIA, Sophia Antipolis, France, as part of the OPT4SMART project.

  • Date: 10 February 2020 from 15:30 to 17:30

  • Event location: Room 5.6, School of Engineering, Viale del Risorgimento 2, Bologna

  • Access Details: Free admission

About the speaker

Giovanni Neglia received the master’s degree in electronic engineering and the PhD degree in telecommunications from the University of Palermo, Italy, in 2001 and 2005, respectively. He has been a researcher at INRIA, Sophia Antipolis, France, since September 2008. In 2005, he was a research scholar with the University of Massachusetts, Amherst, visiting the Computer Networks Research Group. Before joining INRIA, he was a postdoctoral researcher with the University of Palermo and an external scientific advisor with the Maestro Team at INRIA. His research focuses on the modeling and performance evaluation of networks. He is an area editor for the Elsevier journal Computer Communications (COMCOM) and an associate editor for IEEE Transactions on Mobile Computing.


Abstract

In this talk, I will support the thesis that the distributed systems community is not meant simply to apply machine learning (ML) tools to its classic research problems, but can also contribute to the design of faster and more efficient distributed ML systems for both training and inference. I will first introduce machine learning training and show that computational speedups translate directly into better ML models. I will then explain why design choices for ML systems are inevitably entangled with optimization and statistical considerations. Finally, I will provide two examples from my recent research activity: dynamic (TCP-like) adaptation of the number of ML workers, and topology design.

Contacts