Machine Learning Techniques for Fish Breeding Decision Making
Keywords:
Software Engineering, Machine Learning, Fish breeding programs
Abstract
The New Zealand Institute for Plant and Food Research has been developing breeding programs for the Australasian snapper (Chrysophrys auratus) to produce snapper that mature faster and are of high quality. One goal of the breeding program is to select individuals that produce fast-maturing offspring. To support this, the Institute compiled genomic data on snapper into a dataset. However, some features in the dataset contain missing values, which must be imputed before those features can be used to classify fish as fast- or slow-growing. Because the genes controlling growth rate in snapper are currently unknown, the dataset must retain as many features as possible so that the genes most likely to control growth rate can be identified. This project aimed to determine whether the choice of data imputation method affects a machine learning classifier's ability to predict growth rate and, if so, how the different methods compare. Five imputation methods were implemented: Most Frequent imputation, K-Nearest Neighbour (KNN) imputation, Multiple Imputation by Chained Equations (MICE), a KNN approach using domain information, and a cascading KNN imputation method using domain information. The KNN and MICE approaches were each run with two different parameter settings. The imputation techniques were evaluated using a Random Forest classifier. The results showed that all imputation methods are robust to the train-test split and to the random state of the Random Forest classifier. Classification accuracies were similar across imputation methods; however, the domain-based approaches performed better than the other techniques, indicating that incorporating domain information could improve the overall results of imputation.
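To illustrate why the choice of imputation method can change the data a classifier sees, the following is a minimal Python sketch of two of the generic techniques named above: most-frequent imputation and KNN imputation. It is not the authors' implementation and not scikit-learn's KNNImputer. The 0/1/2 allele-count coding of markers, the distance (mean absolute difference over markers observed in both fish), the majority vote among neighbours, the toy data, and the function names are all illustrative assumptions. The MICE and domain-based variants are not shown.

```python
from collections import Counter
from typing import List, Optional

# Assumption: each row is one fish, each column one genetic marker coded as an
# allele count (0, 1 or 2); None marks a missing genotype call.
Row = List[Optional[int]]


def most_frequent_impute(data: List[Row]) -> List[List[int]]:
    """Fill each missing value with the most common observed value in its column."""
    modes = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        # most_common orders equal counts by first occurrence, so ties go to the earlier value
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if v is None else v for j, v in enumerate(row)] for row in data]


def knn_impute(data: List[Row], k: int = 2) -> List[List[int]]:
    """Fill each missing value by majority vote over the k most similar fish.

    Similarity is the mean absolute genotype difference over markers observed in
    both fish. Only originally observed values are used as donors, so imputed
    values never feed into later imputations.
    """
    result = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue  # a donor must have the missing marker observed
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if shared:
                    dist = sum(abs(a - b) for a, b in shared) / len(shared)
                    candidates.append((dist, m))
            if not candidates:
                raise ValueError(f"no donor fish for row {i}, column {j}")
            candidates.sort()  # nearest first; equal distances ordered by row index
            votes = [data[m][j] for _, m in candidates[:k]]
            # Counter keeps first-seen order on ties, so the nearer neighbour wins a tie
            result[i][j] = Counter(votes).most_common(1)[0][0]
    return result


# Hypothetical toy data: six fish, four markers, two missing calls.
genotypes = [
    [2, 2, 0, None],
    [0, 1, 2, 2],
    [2, 2, 0, 0],
    [0, None, 2, 2],
    [2, 2, 1, 0],
    [0, 1, 2, 2],
]

print(most_frequent_impute(genotypes)[0])  # [2, 2, 0, 2]
print(knn_impute(genotypes, k=2)[0])       # [2, 2, 0, 0]
```

On this toy data the two methods fill the same gap differently: most-frequent imputation uses the population-wide mode for the last marker, while KNN borrows from the two fish most similar to the first one. A majority vote is used instead of averaging neighbours so that imputed values remain valid genotype codes.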