HARNESSING THE POWER OF GEOSPATIAL DATA WITH RANDOM FOREST TO FORECAST GYPSY MOTH OUTBREAK

Open Access
- Author:
- Xia, Zhiyue
- Graduate Program:
- Forest Resources
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- July 03, 2018
- Committee Members:
- Douglas A Miller, Thesis Advisor/Co-Advisor
Laura P Leites, Thesis Advisor/Co-Advisor
Shelby Fleischer, Committee Member
- Keywords:
- Gypsy moth
Random Forest
Random Forests
Forest disturbance
Spatial modeling
Ecological modeling
- Abstract:
- The gypsy moth (Lymantria dispar) is a non-native forest pest that was introduced to the USA in 1869. Since then it has spread continuously across most of the northeastern US. Larvae of this insect prefer to feed on oak species, although other species may also serve as host trees. During an outbreak, larvae defoliate forests across large regions, and repeated defoliation can predispose trees to attack by secondary insect pests or fungal infections, causing tree mortality. Gypsy moth outbreaks are episodic and difficult to predict. Developing forecasting models remains a challenge despite their potential usefulness in mobilizing resources effectively to deal with outbreaks. Previous studies indicate that terrain, climate characteristics, and vegetation attributes measured through remote sensing influence the likelihood of gypsy moth outbreaks. In addition, temporal and spatial variables describing the cyclic and spatial patterns of the outbreaks could be very valuable in forecasting them. In this thesis, a model is developed to forecast gypsy moth outbreaks using Pennsylvania as a case study. Systematic sampling was used to locate 5,042 sample pixels across the forested areas of Pennsylvania, and the model was developed from defoliation episodes during the period 2000-2016. For each pixel, a large suite of temporal and spatial predictor variables was derived from inventory data, climate, topography, and remote sensing measures of vegetation status, while the occurrence of defoliation was obtained from annual defoliation sketch maps. The machine learning algorithm Random Forests, which has well-documented predictive ability and can handle a large number of variables, was used in this study. Model performance was assessed by hindcasting defoliation in 1985, 1990, and 1995, and by cross-validation that leaves out one year of the fitting dataset at a time.
An accurate forecasting model is of critical importance for projecting the spatial extent of future defoliations and for forest management planning.
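The leave-one-year-out cross-validation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's code: the function name `leave_one_year_out`, the toy records, and the majority-class baseline that stands in for Random Forests are all assumptions made for illustration. The point is only the split itself, in which every record from one year is held out while the model is fit on the remaining years.

```python
from collections import defaultdict


def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx) once per distinct year.

    `years` holds one year label per record; indices refer to positions
    in that sequence. (Hypothetical helper, not from the thesis.)
    """
    by_year = defaultdict(list)
    for i, year in enumerate(years):
        by_year[year].append(i)
    for held_out in sorted(by_year):
        test_idx = by_year[held_out]
        train_idx = sorted(
            i for year, idx in by_year.items() if year != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def majority_label(labels):
    """Stand-in model: predict the most common label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))


# Toy data (invented): one entry per pixel-year, 1 = defoliated.
years = [2000, 2000, 2001, 2001, 2002, 2002]
defoliated = [0, 1, 0, 0, 1, 1]

accuracy_by_year = {}
for year, train_idx, test_idx in leave_one_year_out(years):
    # Fit on all other years, then score on the held-out year.
    guess = majority_label([defoliated[i] for i in train_idx])
    hits = sum(defoliated[i] == guess for i in test_idx)
    accuracy_by_year[year] = hits / len(test_idx)

print(accuracy_by_year)
```

In practice the baseline would be replaced by a fitted classifier, and a library splitter such as scikit-learn's `LeaveOneGroupOut`, with the year as the group label, produces the same kind of split.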