Illuminating taxonomic and ecoregional variation in bias among community science data to inform conservation-based species distribution modeling

Open Access
- Author:
- Lacey, Lindsay
- Graduate Program:
- Spatial Data Science
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- November 10, 2022
- Committee Members:
- Anthony Robinson, Program Head/Chair
Marcela Suárez, Thesis Advisor/Co-Advisor
Jennifer Miller, Committee Member
- Keywords:
- species distribution model
MaxEnt
conservation
volunteered geographic information
citizen science
community science
bias
SDM
ecological niche model
ENM
- Abstract:
- Citizen science, also known as community science, has grown rapidly in use and availability within ecology and conservation. Many community science efforts allow participants to submit data on species occurrences anywhere and at any time, which is why the information collected is termed “opportunistic data.” The flexibility of this opportunistic approach has attracted a larger participant base that has contributed massive amounts of data; however, the lack of a standard methodology has also introduced uncertainty and bias into those datasets, which can influence subsequent analyses. These large datasets are often freely available, making them an excellent resource for groups with limited funding for research and data acquisition, such as conservation non-profits. Opportunistically collected species occurrence data are used particularly often in species distribution modeling, which can inform conservation efforts by helping researchers understand where wildlife are distributed across the landscape. Such data can be influenced by multiple sources of bias, including spatial, temporal, and taxonomic bias. The most prevalent of these three, spatial bias, can greatly influence the results of species distribution models (SDMs) and cause them to overemphasize certain areas within a modeling region. For example, occurrence records are often biased toward areas of greater human activity such as roads, trails, and infrastructure, which can lead a model to infer artificially high suitability for a species in these areas. While there are many approaches to addressing such bias, there is little consensus on when and why to use particular methods, and limited understanding of how different methods and subjective parametrizations influence SDM results.
The various methods of addressing bias, and their differing impacts on SDMs, have not been assessed across the diverse taxonomic groups and ecoregions of the United States, but studies in other regions of the world indicate that the accuracy of such methods may vary by geographic region or habitat type. Such variation could affect the accuracy of modeling efforts in a given geographic area of interest, and its extent remains unknown across the diversity of ecoregions in the United States. To address this gap, this study aims to (1) synthesize existing approaches to identifying and controlling for spatial bias in SDMs built from community science data, and develop a flow chart outlining the specific tools, methods, and use cases presented in the literature, structured for conservation professionals and early-career professionals looking to use these methods; (2) assess the spatial, temporal, and taxonomic bias in community science datasets accessed from the Global Biodiversity Information Facility (GBIF) for 40 species selected to represent each of the 10 EPA Level I ecoregions in the contiguous United States; and (3) generate SDMs with and without the methods identified for addressing bias and quantify the differences in model performance across taxonomic groups and ecoregions in the contiguous U.S.