Mock Lecture (Faculty Interviews) - Emad Andrews, University of Toronto - Department of Mechanical & Industrial Engineering

Thursday, April 4, 2024
10:00am-12:00pm

RS208, Rosebrugh Building,
164 College St.

Emad Andrews, University of Toronto

Mock Lecture (50 Mins): A Technical Introduction to Large Language Models

Technical Talk (35 Mins) & Q&A (15 Mins): The Effect of Expert Labeling & Feature Selection on Classification Performance

Unlike biological and self-identified datasets, many real-world ML classification problems involve datasets that have no intrinsic labels or predefined feature sets. To handle these problems, data scientists resort to expert labeling and collect a large number of features in order to train classifiers. While expert labeling can be accurate, it is challenging to create an expert-labeled dataset that adequately covers the training search space needed to perform well on production data. In other words, expert-labeled training datasets often produce decision boundaries that do not fully resemble the actual ones, which may lead to misclassification of data points close to these boundaries. Moreover, when data scientists try to maximize information gain by collecting as many features as possible, classifiers may suffer from noise, unchecked correlations, and the curse of dimensionality. In this talk, we will explore techniques that help data scientists detect and address the shortcomings of expert-labeled training datasets. We will also explore the most useful feature selection algorithms and reverse selection techniques that help retain the correct number of dimensions while still maximizing information gain with respect to the class labels.
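
For readers unfamiliar with the term, the information gain of a feature with respect to the class labels can be illustrated with a minimal sketch. This is not taken from the talk: the function names and the toy data are hypothetical, and the example assumes discrete features and uses only the Python standard library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average of the label entropy within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 'signal' determines the label, 'noise' does not.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
columns = {
    "noise": ["a", "b", "a", "b", "a", "a", "b", "b"],
    "signal": ["x", "x", "y", "y", "x", "y", "x", "y"],
}
ranking = rank_features(columns, labels)
```

In this toy setting a filter-style selection step would keep the top-ranked feature and drop the uninformative one; real datasets would also need handling for continuous features and correlated inputs, which this sketch does not attempt.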

Bio:

Emad Andrews is a Lead Data Scientist at the Canadian Investment Regulatory Organization. After earning his B.Sc. and M.Sc. in Computer Science, he completed his Ph.D. in Computer Science at the University of Toronto in the field of Machine Learning and Bioinformatics. His Ph.D. thesis is titled “Inferring Genetic Regulatory Networks Using Cost-Based Abduction and its Relation to Bayesian Inference.” Dr. Andrews’ contributions have been published in top-tier ML journals and conference proceedings, including Neural Networks, Neurocomputing, IJCAI and Cognitive Systems Research. His research interests are in ML, Complexity Theory, and Data Science and Analytics. He has received several prestigious honours, awards and scholarships, including the NSERC Doctoral Scholarship Award.

In the field of Analytics and Data Science, Dr. Andrews conducts cutting-edge research on massive big-data stores that average over a billion data points per day. He has designed, led, and implemented case studies, statistical analyses, and quantitative and machine learning models, including risk models, alert systems, fraud and market-manipulation detection systems, and predictive models. His latest publication, in the Journal of Financial Markets, demonstrates a novel hybridization of econometrics and ML methods to study the effect of speed segmentation on the Canadian exchange.

In addition to his extensive industry experience in Software Engineering and Data Science, Dr. Andrews is a sessional instructor in the CS and MIE departments at the University of Toronto, where he teaches various ML and Software Engineering courses at both the undergraduate and graduate levels, including CSC411, CSC2515, CSC311, CSC207 and APS1070.

Evaluation Questions Link: https://forms.office.com/r/SAB3rbeW11
