FEP-Bench: Benchmarking for Enhanced Feature Engineering and Preprocessing in Machine Learning
Hello, I’m Lihaowen (Jayce) Zhu, currently pursuing my Master of Science in Computer Science at the University of Chicago. I will be spending my summer working on the project FEP-Bench: Benchmarking for Enhanced Feature Engineering and Preprocessing in Machine Learning under the mentorship of Yuyang (Roy) Huang and Swami Sundararaman. The following is a summary of my proposal.
Feature engineering and data preprocessing, the initial stages of a machine learning (ML) pipeline, strongly shape the outcome of ML projects. This phase is critical to a project's success and is often the most time-consuming, by common estimates accounting for about 80% of the effort in a typical ML workflow. The FEP-Bench project addresses the significant bottlenecks encountered during this phase, focusing on the challenges of retrieving data from data lakes and on computational inefficiencies in data operations. By exploring caching, prefetching, and heuristic strategies, this proposal aims to optimize the preprocessing workflow, improving efficiency and reducing the resources that ML projects require.