RL based framework to generate optimal data quality remediation sequence for machine learning

Abstract

We investigate the problem of automatically finding an optimal sequence of operators to apply to a dataset in order to remediate the data quality issues that arise when building a supervised machine learning model. We motivate the need to solve this problem and propose a novel, model-agnostic, reinforcement learning based framework that automatically finds such a sequence. The search is guided by two objectives: improving the overall data quality and improving the performance of the model trained on the remediated data.
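The abstract does not specify the state, action, or reward design, so the following is only a minimal sketch of the general idea under our own assumptions. It uses plain tabular Q-learning (not necessarily the algorithm used in the paper), three hypothetical operators (`impute_mean`, `clip_outliers`, `drop_duplicates`), a toy data quality metric, a least-squares line as the model, and an equal 0.5 weighting between data quality and model performance. Each step's reward is the change in the combined score, so the return of a full sequence equals its final score minus the score of the raw data.

```python
import random
from itertools import product

# Hypothetical toy training data as (x, y) rows: missing x values (None),
# an outlier (x = 50.0) and an exact duplicate row.
TRAIN = [(1.0, 2.1), (2.0, 4.0), (None, 6.2), (3.0, 6.1), (3.0, 6.1),
         (4.0, 8.2), (50.0, 9.9), (None, 10.1), (5.0, 9.8)]
# Hypothetical clean held-out set used to measure model performance.
TEST = [(0.5, 1.0), (2.5, 5.0), (4.5, 9.0), (6.0, 12.0)]


def impute_mean(rows):
    """Replace missing x values with the mean of the observed x values."""
    known = [x for x, _ in rows if x is not None]
    mean = sum(known) / len(known)
    return [(mean if x is None else x, y) for x, y in rows]


def clip_outliers(rows, lo=0.0, hi=10.0):
    """Clip observed x values into the (assumed) valid range [lo, hi]."""
    return [(x if x is None else min(max(x, lo), hi), y) for x, y in rows]


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen, out = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


OPERATORS = {"impute_mean": impute_mean, "clip_outliers": clip_outliers,
             "drop_duplicates": drop_duplicates}


def data_quality(rows):
    """Toy quality score in [0, 1]: average of completeness and uniqueness."""
    complete = sum(x is not None for x, _ in rows) / len(rows)
    unique = len(set(rows)) / len(rows)
    return (complete + unique) / 2


def model_performance(rows):
    """Fit a least-squares line on complete rows; return 1 / (1 + test MSE)."""
    pts = [(x, y) for x, y in rows if x is not None]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    w = (sum((x - mx) * (y - my) for x, y in pts)
         / sum((x - mx) ** 2 for x, _ in pts))
    b = my - w * mx
    mse = sum((w * x + b - y) ** 2 for x, y in TEST) / len(TEST)
    return 1.0 / (1.0 + mse)


def score(rows, weight=0.5):
    """Combined objective: weighted data quality plus model performance."""
    return weight * data_quality(rows) + (1 - weight) * model_performance(rows)


def apply(seq, rows=TRAIN):
    """Apply a sequence of operator names to the rows, left to right."""
    for name in seq:
        rows = OPERATORS[name](rows)
    return rows


def learn_sequence(max_len=3, episodes=3000, lr=0.5, seed=0):
    """Tabular Q-learning; a state is the tuple of operators applied so far."""
    rng = random.Random(seed)
    names = list(OPERATORS)
    q = {}  # (state, action) -> estimated return
    for _ in range(episodes):
        state = ()
        for step in range(max_len):
            # Uniform random behaviour policy; Q-learning is off-policy, so it
            # still learns the values of the greedy policy.
            action = rng.choice(names)
            nxt = state + (action,)
            reward = score(apply(nxt)) - score(apply(state))
            last = step == max_len - 1
            future = 0.0 if last else max(q.get((nxt, a), 0.0) for a in names)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + lr * (reward + future - old)
            state = nxt
    # Greedy rollout of the learned Q-table yields the remediation sequence.
    state = ()
    for _ in range(max_len):
        best = max(names, key=lambda a, s=state: q.get((s, a), 0.0))
        state = state + (best,)
    return state
```

Calling `learn_sequence()` returns a tuple of operator names, and `score(apply(seq))` can be compared with `score(TRAIN)`. Order matters even in this toy: on the raw data, imputing first fills missing values with 68/7 ≈ 9.71 because the outlier is still present, whereas clipping first gives 28/7 = 4.0. Since the learner is only touched through `model_performance`, another model could be substituted there, which loosely mirrors the model-agnostic framing.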

Publication
SIGMOD 2021

Cite as:

Shangeth Rajaa, Nitin Gupta, Hima Patel, Shanmukh Chaitanya, Ruhi Sharma Mittal, Lokesh N, Naveen Panwar, Shazia Afzal, Sameep Mehta. RL based framework to generate optimal data quality remediation sequence for machine learning. SIGMOD 2021.