AARMS Scientific Machine Learning Seminar: Michael W. Dunham (Department of Earth Sciences, Memorial University)
April 12, 2022 @ 11:00 am - 12:00 pm
Semisupervised machine learning algorithms and their application to geoscience classification problems
In recent years, many disciplines have been challenged to efficiently extract meaning, or value, from large datasets. Technological advances have improved both data storage capabilities and the ways data can be obtained (e.g., real-time data). Manually interpreting data that are growing exponentially in volume poses obvious management and analysis challenges. Machine learning is a solution to these challenges. Machine learning algorithms teach computers to recognize patterns in data and to assign recurring patterns to similar categories. This process automates pattern recognition and allows meaningful information to be extracted efficiently.
For many machine learning problems, there are sufficient data to train a wide range of algorithms. Some applications, such as image classification and speech recognition, have large training datasets readily available. However, in several geoscience-related problems, labeled data are generally obtained by sampling the earth in some manner (e.g., drilling wells, field sampling), which is not trivial due to cost and logistical factors. As such, many earth science-related machine learning problems have limited training data. Supervised machine learning algorithms are prone to overfitting when training data are scarce, but semisupervised approaches are designed for these problems because the unlabeled data are also used to inform the learning process.
Three geoscience applications inherently challenged with limited training data are well log classification, seismic classification, and bedrock lithology mapping. I apply various semisupervised algorithms to these three geoscience problems and determine whether, and under what conditions, semisupervised algorithms can perform better than supervised methods. The semisupervised methods I consider are self-training, label propagation, and semisupervised Gaussian mixture models. I consider several supervised methods in my work, but the most prevalent are gradient boosting decision tree methods (e.g., XGBoost, LightGBM). The results show that semisupervised methods can outperform their supervised counterparts in each of the geoscience applications, although this is not always the case. Nonetheless, semisupervised methods are rarely considered in many geoscience disciplines, as reflected in the lack of published examples in the literature. The outcomes of this work help fill this gap and help raise awareness of semisupervised methods.
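For readers unfamiliar with self-training, one of the semisupervised methods named above, the following is a minimal sketch of the general idea and not the speaker's implementation. It assumes a nearest-centroid base classifier and a distance-margin confidence rule, both chosen only to keep the example short; the function names, the `min_margin` threshold, and the toy data are all hypothetical.

```python
import math


def centroids(points, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_margin(cents, p):
    """Return (nearest class, distance gap to the runner-up class)."""
    dists = sorted((math.dist(p, c), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    """Self-training loop (assumed confidence rule: margin >= min_margin).

    Each round fits the base classifier on the current labeled set, then
    moves confidently predicted unlabeled points into that set with their
    predicted (pseudo) labels. Stops when nothing new is added.
    """
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        cents = centroids(x, y)
        remaining, added = [], 0
        for p in pool:
            label, margin = predict_with_margin(cents, p)
            if margin >= min_margin:
                x.append(p)
                y.append(label)
                added += 1
            else:
                remaining.append(p)
        pool = remaining
        if added == 0 or not pool:
            break
    # Final model: centroids refit on true labels plus pseudo-labels.
    return centroids(x, y)


# Toy data, made up for illustration: two labeled points, four unlabeled.
toy_labeled_x = [(0.0, 0.0), (4.0, 4.0)]
toy_labeled_y = ["a", "b"]
toy_unlabeled_x = [(0.5, 0.5), (3.5, 3.5), (1.0, 0.0), (4.0, 3.0)]

model = self_train(toy_labeled_x, toy_labeled_y, toy_unlabeled_x)
print(predict_with_margin(model, (1.0, 1.0))[0])
```

The confidence threshold is the key design choice in a loop like this: set too low, incorrect pseudo-labels can reinforce themselves; set too high, the unlabeled data are never used and the result reduces to the supervised baseline.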