Data rescue intern: Jawad Sakarchi
For six weeks in the summer of 2023, I had the opportunity to participate in the LDP Data Rescue Internship. My internship goals centered on a very large database of Range Reference Area data collected and monitored by rangeland ecologists of the Ministry of Forests in British Columbia. The data focus on differences inside, outside, and between fenced exclosures. These exclosures act as tools to explore the effects of grazing disturbances (e.g., by livestock or wildlife) on ecological communities. The dataset is large, with up to 70 variables, including other disturbances, such as fire, and many biogeographic variables, such as moisture, nutrient, or ecosystem classifications. The benefit of this dataset is twofold: it provides an exceptionally extensive account of disturbances across 370 sites throughout BC, and it includes observations (data, photographs, notes) from the same site separated by decades, or in the most extreme cases even 100 years.

The Range Branch saw this dataset as extremely valuable and wanted to know what was available, both for future data management and to provide an inventory for future ecologists who may wish to use and cite the data. The richness of the dataset presented a challenge, as its full extent was not clear at first, particularly for an academic from outside the field. Over 100 years, as one may imagine, notes take different forms, sites and locations change names, and digital data are organized in different ways, often in programs that are hard to access (Microsoft Access). At the start of the internship, many notes had recently been scanned and digitized, alongside many Excel files, though the data and their organization had been handled by different people.
The goals for this six-week internship were to inventory what is available and create metadata; restructure and reorganize the data into an efficient entity relationship diagram for future processing in a relational database management system; revise naming conventions to be more consistent; identify unnamed or ambiguous variables to create a data dictionary; identify deprecated variables; create a DOI; and build lookup tables from non-tidy data, all while documenting the process for accessibility and future directions. To achieve this, I primarily worked with Nancy Elliot, a rangeland analyst at the British Columbia Ministry of Forests Range Branch (Range, Invasive Plants, and Ecosystem Restoration). Many of the files associated with the project, such as the data dictionary, entity relationship diagram, background information, and R scripts, are now publicly available on the Open Science Framework. The metadata will give future researchers an opportunity to easily understand and use the data. Together with its documentation of decisions and next analysis steps, this project allows for reproducible management and living continuation of this rich dataset.