Data rescue interns: Ben Mumford and Lindsay Trottier

Harvey Janszen (1946-2021) was a beloved amateur botanist and naturalist who extensively documented his findings around the Southern Gulf Islands and Saanich Peninsula in southern BC, as well as the San Juan Islands in Washington. His work filled 5 field journals spanning 1973 to 2017. Although some of these data are already available on the open-access portal GBIF, the observation-only occurrence data have yet to be extracted. Digitizing these records would generate thousands of new vascular plant species occurrences for the south coastal BC region, from a period before iNaturalist.

Andrew Simon from the Institute for Multidisciplinary Ecological Research in the Salish Sea (IMERSS) was a close friend and mentee of Harvey’s. He intends to curate and preserve the valuable data that Harvey collected over his lifetime. This includes 1) overseeing the data rescue of his field notes, 2) reviewing Harvey’s specimen data in herbarium databases, and 3) overseeing a committee of curating botanists working to complete Harvey’s last work, an annotated checklist of vascular plant species for the Southern Gulf Islands and Saanich Peninsula. This project therefore has many moving parts, including two rounds of LDP internships (see the summary of the first internship by Emma Menchions linked here) plus a Hack-a-thon event held at the University of British Columbia, where undergraduate students worked to digitize a large portion of Harvey Janszen’s field notes.

Ben:
After becoming acquainted with the scripts and photo scans of Journals 7 and 8, I turned my attention away from the occurrence data and towards collection data: samples that were taken during Harvey’s excursions, noted, and deposited at the Royal BC Museum in Victoria. A considerable amount of time was spent trawling the original field notes, cross-referencing sample ID numbers and the locations where samples were collected. As with the occurrence observations, the Excel spreadsheets, initially compiled through undergraduate hackathons, were cleaned and standardised using R scripts provided by Emma Menchions. These were in turn converted into Darwin Core format and, once further information is made available from the Royal BC Museum, will also be converted to the internal collections format that the museum uses. The digitisation of these field notes will make a portion of Harvey’s work openly available and should contribute towards a forthcoming Flora of Texada Island.

Lindsay:

The primary goal of my internship was to compile all the records of species occurrences observed by Harvey Janszen in his 8th journal of field notes, which spans 1996-2000. Using the data processing protocol developed by Emma, I carried out data QA/QC on nearly 950 records of species occurrences contained in Harvey’s 8th journal. This process included ensuring species names were up to date according to the Flora of the Pacific Northwest, ensuring all species occurrences were properly geolocated, and converting the final dataset into Darwin Core format, a standard for documenting biodiversity. Then, I turned my attention to digitizing a preliminary checklist of the plant species of Texada Island (Toward a Flora of Texada Island), produced by Terry Ludwar and John Dove in 2018. I also compiled and compared species occurrence data from the Consortium of Pacific Northwest Herbaria (pnwherbaria.org) and the Global Biodiversity Information Facility (GBIF; gbif.org).
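For readers unfamiliar with Darwin Core, the conversion step mentioned above amounts to renaming spreadsheet columns to standard terms and adding a few required fields. The project itself used R scripts by Emma Menchions; what follows is only a minimal Python sketch of the idea. The spreadsheet headers in `COLUMN_TO_DWC`, the `example-` identifier prefix, and both function names are hypothetical, while the Darwin Core term names (`scientificName`, `eventDate`, and so on) come from the standard.

```python
import csv
import io

# Hypothetical spreadsheet headers (left) mapped to Darwin Core terms (right).
COLUMN_TO_DWC = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "place": "locality",
}


def to_darwin_core(rows, recorded_by):
    """Rename columns to Darwin Core terms and add fields shared by all rows."""
    records = []
    for index, row in enumerate(rows, start=1):
        # Strip stray whitespace left over from manual transcription.
        record = {dwc: row[col].strip() for col, dwc in COLUMN_TO_DWC.items()}
        record["occurrenceID"] = f"example-{index}"  # placeholder identifier
        record["recordedBy"] = recorded_by
        # Field-journal sightings without a specimen are human observations.
        record["basisOfRecord"] = "HumanObservation"
        records.append(record)
    return records


def write_csv(records):
    """Serialize the records as CSV text with Darwin Core column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Keeping the column mapping in one dictionary makes the renaming explicit and easy to review, which fits the emphasis on reproducible QA/QC.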
Pulling together all these different sources of data is an important first step toward finding all the pieces of the puzzle that make up the flora of Texada Island.

This internship was a very rewarding experience for both of us. Not only did it give us the opportunity to explore and learn about the vast diversity of plants on Texada Island, it also refined our skills in data cleaning and formatting. The importance of reproducibility in data QA/QC, and the value of field notebooks as a source of data, also became apparent throughout the internship. We would both like to extend our thanks to Andrew and the Living Data Project for organizing this important project and for providing the opportunity to help continue Harvey Janszen’s legacy.

Data rescue intern: Jawad Sakarchi
For six weeks in the summer of 2023, I had the opportunity to participate in the LDP Data Rescue Internship. The goals set out for my internship were centered around a very large database of Range Reference Area data collected and monitored by rangeland ecologists of the Ministry of Forests in British Columbia. The data focus on differences inside, outside and between fenced exclosures. These exclosures act as tools to explore the effects of disturbance in the form of grazing (e.g., by livestock or wildlife) on ecological communities. The dataset is large, with up to 70 variables, including other disturbances, such as fire, and many biogeographic variables, such as moisture, nutrient, or ecosystem classifications. The benefit of this dataset is that it provides both an exceptionally extensive account of disturbances across 370 sites throughout BC and observations (data, photographs, notes) from the same site separated by decades or, in the most extreme cases, even 100 years. The Range Branch saw this dataset as extremely valuable and wanted to know what was available, both for future data management and to provide an inventory for future ecologists who may wish to use and cite these data.

The richness of the dataset presented a challenge, as its extent was not clear at first, particularly to an academic from outside the field. Over 100 years, as one may imagine, notes take different forms, sites and locations change names, and digital data are organized in different ways, often in hard-to-access programs (Microsoft Access). At the start of the internship, many notes had recently been scanned and digitized, alongside many Excel files, though the data and their organization had been handled by separate people.
The goal for this six-week internship was to take stock of what is available and create metadata; restructure and reorganize the data into an efficient entity relationship diagram for future processing in a relational database management system; revise naming conventions to be more consistent; identify unnamed or ambiguous variables to create a data dictionary; identify deprecated variables; create a DOI; and build lookup tables from non-tidy data, all while documenting the process for accessibility and future directions. To achieve this, I primarily worked with Nancy Elliot, a rangeland analyst at the British Columbia Ministry of Forests Range Branch (Range, Invasive Plants, and Ecosystem Restoration). Many of the files associated with the project, such as the data dictionary, entity relationship diagram, background information, and R scripts, are now publicly available on the Open Science Framework. The metadata will give future researchers an opportunity to easily understand and use the data. Alongside its documentation of decisions and next analysis steps, this project allows for reproducible management and living continuation of this rich dataset.

Data rescue intern: Mannfred Masahiro Asada Boehm

Sharpsand Creek is located approximately 60 km north of Thessalon, Ontario. Within this area of Crown Land, extensive wildfire research was undertaken by Brian Stocks and Doug McRae (Canadian Forest Service) from the 1970s to the early 1990s. The oldest data in the Sharpsand Creek data set are 50 years old, with the most significant data coming from prescribed wildfire-research burns. Data collected include pre-burn forest inventories, fire behaviour data, and post-fire site analyses of regeneration, soils, and vegetation. These experimental burns generated a wealth of data, some of which have resulted in influential papers in the wildland fire literature. Other data have, until recently, remained undigitized and unavailable to the public.
The types of large-scale experimental burns at Sharpsand Creek are very unlikely to occur today because of the risk involved.
Since the 1990s, research at Sharpsand Creek has transitioned from experimental burning to the study of duff (forest floor detritus) moisture dynamics. These studies continue to inform the creation and revision of the Canadian Forest Fire Danger Rating System (CFFDRS), a national system of models to support wildland fire decision making. The data collected include destructive sampling of duff to quantify effects on bulk density and in-stand climate variables (air temperature, relative humidity, wind, rain, soil temperature, soil moisture, and solar radiation). Reviewing and mining the historic data from Sharpsand Creek is invaluable for enhancing current wildfire research. Thus, researchers at the Canadian Forest Service are continuing to find, digitize, and catalogue in a database all known studies that have taken place at Sharpsand Creek. The growing database now has over 600 entries, including 700 images capturing wildfire dynamics that were previously not attributed to any specific project. Over the course of my LDP internship, I wrote a series of scripts to automate the annotation and sorting of these data into a hierarchical and chronological folder structure. This has allowed the Canadian Forest Service to finally integrate these historic data as it develops the next generation of the CFFDRS. The scripts, and a subset of the sorted data, are housed on the Open Science Framework project: https://osf.io/2np87/ .

Data rescue intern: Rolando Trejo-Pérez, Institut de recherche en biologie végétale (IRBV), Université de Montréal

During the summer of 2023, I participated in a data rescue internship as part of the Data Management and Reproducible Research program. The primary goal of this internship was to transform and clean water quality and temperature data provided by the Nova Scotia Salmon Association (NSSA) into the DataStream schema.
This transformation and cleaning were accomplished using reproducible R scripts, ensuring consistency and transparency in the data conversion process.

Figure 1. Map of the sites included in A) the W.A.T.E.R. (Watershed Assessments Towards Ecosystem Recovery) project, B) the WRSH (West River, Sheet Harbour) Acid project, and C) the Oceans North project. Source: Leaflet / Tiles © Esri. Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community.

Data rescue intern: Lauren Gill

Bill Merilees is an accomplished naturalist who has had a long, varied, and productive career. Just as Bill was influenced by many people through the years, he in turn has influenced and inspired many people throughout his life...

Data rescue intern: Emily Black

The greater sage-grouse (Centrocercus urophasianus) is a charismatic North American prairie bird. This species engages in a unique breeding behaviour called lekking, where males perform communal breeding displays on historic breeding grounds called leks. These breeding behaviours have led to the development of unique and exaggerated male characteristics such as brightly coloured combs above the eye, noisy mating dances (aka struts), and large inflatable air sacs on the males’ chests.

Digitization of museum specimens for the Lyman Entomological Museum and McGill University Herbarium
10/6/2023
Data rescue intern: Jessica Reemeyer

My Data Rescue Internship focused on facilitating digitization of museum specimens for the Lyman Entomological Museum and McGill University Herbarium. Both museums had data that had been digitized but needed a database to store the data and an easier way for volunteers to upload data as they digitize new specimens. To fill these needs, I created a web-hosted SQL database for each museum. I then created a corresponding website with a webform to facilitate digitization of specimens.
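To illustrate how a webform can feed a specimen database, here is a minimal Python sketch using the standard-library `sqlite3` module. It is not the museums' actual setup: the post describes a web-hosted SQL database, whereas SQLite is swapped in here so the example runs anywhere, and the table name, column names, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS specimens (
    catalog_number  TEXT PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    collector       TEXT,
    collection_date TEXT,
    locality        TEXT
);
"""


def open_database(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_specimen(conn, form):
    """Insert one webform submission, rejecting blanks and duplicates."""
    catalog_number = form.get("catalog_number", "").strip()
    scientific_name = form.get("scientific_name", "").strip()
    if not catalog_number or not scientific_name:
        raise ValueError("catalog_number and scientific_name are required")
    try:
        # The connection context manager commits on success, rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO specimens VALUES (?, ?, ?, ?, ?)",
                (
                    catalog_number,
                    scientific_name,
                    form.get("collector"),
                    form.get("collection_date"),
                    form.get("locality"),
                ),
            )
    except sqlite3.IntegrityError as exc:
        raise ValueError(f"catalog number {catalog_number!r} already exists") from exc


def count_specimens(conn):
    """Return the number of specimen records stored so far."""
    return conn.execute("SELECT COUNT(*) FROM specimens").fetchone()[0]
```

Validating required fields and relying on a primary key to catch duplicate catalog numbers are two simple safeguards that help when many volunteers enter records through the same form.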