Mock Lecture (Faculty Interviews) – Samin Aref, University of Toronto


Monday, March 11, 2024
10:00am-12:00pm


RS208, Rosebrugh Building,
164 College St.


Samin Aref, University of Toronto

Monday, March 11 |10:00am-12:00pm | RS208, Roseburgh Building, 164 College Street

Mock Lecture (50 Mins):  A Technical Introduction to Large Language Models

Technical Talk (35 Mins) & Q&A (15 Mins):  A Branch-and-Cut Method for Accurate Clustering of Network Data

Network clustering (descriptive community detection) is a fundamental problem in data science with extensive applications in various fields. The most commonly used methods are the algorithms designed to maximize the modularity objective function over different partitions of the network nodes. Using over 100 real and random networks from a wide range of contexts, we investigate the extent to which current heuristic modularity maximization algorithms succeed in returning maximum-modularity (optimal) partitions. We compare nine existing algorithms against an exact integer programming method that globally maximizes modularity. The average modularity-based heuristic algorithm returns optimal partitions for only 43.9% of the 80 graphs considered. More importantly, our results show that near-optimal partitions are often disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of commonly used modularity-based heuristics for discovering communities: they rarely produce an optimal partition or a partition resembling an optimal partition. If modularity is to be used for detecting communities in small and mid-sized networks, exact and approximate optimization algorithms are recommendable for a more methodologically sound usage of modularity within its applicability limits. (Preprints: arxiv.org/abs/2310.10898)

We also propose a specialized algorithm, called Bayan, which returns partitions with a guarantee of proximity to an optimal partition. At the core of the Bayan algorithm is a branch-and-cut scheme that solves an integer programming formulation of the problem to optimality or approximates it within a factor. Bayan is several times faster than open-source and commercial solvers for modularity maximization making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. We also compare Bayan’s accuracy and stability with 29 other algorithms in retrieving ground-truth communities of benchmark networks. Overall, our assessments point to Bayan as a suitable method for exact maximization of modularity in networks with up to 3000 edges (in their largest connected component) and approximating maximum modularity in slightly larger networks on ordinary computers. A Python implementation of the Bayan algorithm (bayanpy) is publicly available through the package installer for Python (pip). (Preprint: arxiv.org/abs/2209.04562)

Joint work with Hriday Chheda (MIE Department, U of T) and Mahdi Mostajabdaveh (Huawei Canada)

Bio: 

Samin Aref is an assistant professor, teaching stream (CLTA) at the MIE department. His area of teaching and research includes data science, machine learning, social computing, and operations research. Prior to joining the U of T in 2021, Samin was a research scientist at the Max Planck Institute for Demographic Research (Germany, 2018-2021). He holds a PhD in Computer Science from the University of Auckland (New Zealand, 2019) and an MSc in Industrial Engineering from Sharif University of Technology (Iran, 2014). At the MIE department, he has developed undergrad and graduate courses (MIE258/MIE358 and MIE1626) and has collaborated with other MIE faculty on improving our Master of Engineering program and the Emphasis in Data Analytics and Machine Learning (formerly named Emphasis in Analytics). Samin has supervised several students and has served in multiple committees at the MIE and the Faculty of Applied Science & Engineering.

Evaluation Questions Link:  https://forms.office.com/r/bWyfDJ1Pw2

 

© 2024 Faculty of Applied Science & Engineering