Preprint / Version 1

Prediction of EPA Pesticide Tolerance Using Machine Learning and Publicly Available Data

Authors

  • Sahej Singh

DOI:

https://doi.org/10.47611/harp.115

Keywords:

Pesticides, Machine Learning, Cheminformatics, Tolerance Levels, EPA

Abstract

The EPA tolerance level for a pesticide/commodity pair (Tol) is an important indicator in the environmental risk assessment of common pesticides. It specifies how much residue, in parts per million (ppm), is tolerated on food. Pesticides must go through rigorous and costly testing to be approved for public use. For this reason, it is necessary to estimate the Tol of a pesticide accurately. This study uses publicly available pesticide data, along with collected values of physicochemical properties and molecular descriptors of chemical structures, to develop a reproducible model capable of predicting whether a pesticide can be tolerated. More specifically, the accuracies of models based on the Support Vector Machine, Decision Tree, Logistic Regression, and K-Nearest Neighbors algorithms were compared and evaluated. The experimental results suggest that it is possible to reach a relatively high accuracy using molecular descriptors and specific values from publicly available data. Compared to previous models, these models are more transparent in their methodology and inputs. Therefore, while not as accurate, the generalizable and modular workflow can be used in the preliminary evaluation of pesticides and reproduced in more data-intensive studies.
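The model comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and a synthetic dataset standing in for the descriptor matrix; the feature count, hyperparameters, scaling step, and five-fold cross-validation are illustrative assumptions, not the study's actual data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): rows are pesticide/commodity pairs,
# columns are physicochemical properties and molecular descriptors,
# and y is a binary "tolerated / not tolerated" label.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, random_state=0
)

# The four algorithm families named in the abstract, with default-like
# hyperparameters (assumed). Scale-sensitive models get a StandardScaler.
models = {
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated accuracy for each named model."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Wrapping the scale-sensitive models in a pipeline keeps the scaler fitted only on each training fold, which avoids leaking information from the held-out fold into the accuracy estimate.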

Posted

2021-12-27