The data rescue intern: Billi Krochuk
Bird populations have declined markedly across Canada in recent decades, and grassland birds are among the hardest hit: these species have declined by approximately 60% since the 1970s. The primary cause is habitat loss or degradation, whether through conversion to commercial monoculture agriculture or through poor land management practices. Declines of this scale underline the importance of monitoring these species.
Understanding patterns in breeding phenology plays an important role in informing management and conservation efforts across the country. The province of Saskatchewan has a long-term dataset of breeding bird nest records, dating back to the 1940s and ’50s. Until now, the nest records were maintained in multiple formats (spreadsheets, paper data cards) and had never been quality assessed or cleaned. This heterogeneous data system put the data at high risk of being lost. The historical nest records are very valuable for establishing historical bird population metrics and distribution, as well as change over time, including range expansion or contraction and changes in brood sizes. Ultimately, to rescue the nest record database, the records needed to be cleaned and quality controlled, then archived in an open, permanent, and accessible repository to facilitate future use. The rescue of these data has been undertaken by the University of Regina, the Royal Saskatchewan Museum, the Saskatchewan Conservation Data Centre (CDC), and the Saskatchewan Ministry of Environment, together with two graduate student interns and two undergraduate assistants who helped these organizations work toward their data rescue goals.
One challenging part of the data rescue process was dealing with the different storage formats. Historical nest records are stored on physical nest cards in filing cabinets at the Saskatchewan CDC in Regina. Most of the data was also entered by several different people (staff, volunteers, and citizen scientists) into Birds Canada’s Nest Watch Database. Some of the data entry was done manually, and some of it was done using a Scantron. However, neither of these storage methods maintains the data in a format suitable for use by the Saskatchewan CDC. The Nest Watch Database consists of approximately 18,800 unique nest records. Such a large database requires a team of students and mentors to complete the rescue and archive process. In 2020, Kelsey Bell, from the University of Regina, was tasked with reviewing the data quality and developing a plan for cleaning the data and preparing it to be rescued and archived. Josh Christiansen, also from the University of Regina, was the undergraduate assistant who helped with the data rescue.
To start, Kelsey worked on creating a reproducible quality control check for the nest locations. Several issues were discovered, such as missing locations, outlier locations (i.e., nests outside Saskatchewan), and 4000+ records with the exact same coordinates. Josh worked through rescuing the location data by going to the Saskatchewan CDC office, finding the physical nest cards, and correcting the location data where possible. The next step was to design a reproducible way to transform the Birds Canada Nest Watch Database to match the template of the Saskatchewan CDC. Since the Nest Watch project is still active and ongoing, new data will need to be quality controlled and transformed each year, so it was important to come up with a data cleaning and transformation solution that could be reused year after year as data are added. Kelsey’s final and biggest challenge was to define and code a set of rules to estimate location accuracy. The Birds Canada Nest Watch Database does not normally include a column for location accuracy. However, an estimate of location accuracy is a critically important piece of data for the Saskatchewan CDC. Based on the different coordinate sources (GPS, map, postal codes, etc.), Kelsey developed a set of reproducible rules to assign accuracy level estimates to Nest Watch locations.
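As a rough illustration of what a reproducible location check of this kind might look like, here is a minimal Python sketch. It is not the team's actual code (the project worked in R), and everything named in it is an assumption made for the example: the record fields `id`, `lat`, and `lon`, the approximate bounding box for Saskatchewan, and the `duplicate_threshold` parameter used to decide when many records sharing one coordinate pair look suspicious.

```python
from collections import Counter

# Approximate bounding box for Saskatchewan in decimal degrees.
# Assumption for illustration only; a real check would use the provincial boundary.
LAT_MIN, LAT_MAX = 49.0, 60.0
LON_MIN, LON_MAX = -110.0, -101.3


def flag_locations(records, duplicate_threshold=50):
    """Return a dict mapping each record id to a list of location flags.

    records: list of dicts with hypothetical keys "id", "lat", "lon".
    duplicate_threshold: number of records sharing identical coordinates
        at which those records are flagged for review (hypothetical value).
    """
    # Count how many records share each exact coordinate pair.
    coord_counts = Counter(
        (r["lat"], r["lon"])
        for r in records
        if r.get("lat") is not None and r.get("lon") is not None
    )

    flags = {}
    for r in records:
        lat, lon = r.get("lat"), r.get("lon")
        issues = []
        if lat is None or lon is None:
            issues.append("missing")
        else:
            inside = LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
            if not inside:
                issues.append("outside_bounds")
            if coord_counts[(lat, lon)] >= duplicate_threshold:
                issues.append("shared_coordinates")
        flags[r["id"]] = issues
    return flags
```

Because the function only reads the records and returns flags, it can be rerun unchanged on each new year of data, and flagged records can then be checked against the physical nest cards.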
In 2021, Billi Krochuk, from the University of British Columbia, tackled the breeding record data, including date of observation and clutch size. All values were assessed at the species level, so she amassed breeding season information from Birds Canada and the Saskatchewan Breeding Bird Survey, and clutch information from Birds of the World and the Saskatchewan Breeding Bird Atlas. Using these species-specific details, she developed quality control filters. Records were flagged as early (48 records), a week early (65 records), a week late (266 records), or late (179 records). Each flagged record was either deemed a legitimate outlier or had its original nest card checked by Michelle Desjarlais, from the University of Regina, to rule out problems such as the Scantron misreading numbers or month and day values being swapped (potentially up to 34 records). In addition, Michelle located 95 individual nest cards for which there was no date information in the database. Only three records were flagged as having unusually high egg or chick counts, and these were deemed legitimate outliers. An additional step was taken to assess whether egg records were unusually late for a given species or whether chick records were unusually early.
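The date filters described above could be sketched along the following lines. This is an illustrative Python example, not the filters actually used: the function names, the seven-day buffer, and the reading of "a week early" and "a week late" as "within one week outside the species' breeding season" are all assumptions, and the season dates would come from the species-specific sources mentioned in the text.

```python
from datetime import date, timedelta


def classify_date(obs, season_start, season_end, buffer_days=7):
    """Classify an observation date against a species' breeding season.

    Assumed interpretation: "week_early"/"week_late" fall within
    buffer_days outside the season; "early"/"late" fall beyond that.
    """
    buffer = timedelta(days=buffer_days)
    if obs < season_start - buffer:
        return "early"
    if obs < season_start:
        return "week_early"
    if obs <= season_end:
        return "in_season"
    if obs <= season_end + buffer:
        return "week_late"
    return "late"


def is_swap_candidate(obs, season_start, season_end):
    """Return True if an out-of-season date would fall inside the season
    once its month and day values are exchanged."""
    if season_start <= obs <= season_end:
        return False
    try:
        swapped = date(obs.year, obs.day, obs.month)
    except ValueError:
        # Day value above 12 cannot be a month, so no swap is possible.
        return False
    return season_start <= swapped <= season_end
```

For example, with a hypothetical season of 1 May to 31 July, a record dated 6 December would be classified as late but also identified as a swap candidate, since 12 June falls inside the season; that record would then be a good one to check against its nest card.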
By the end of the first two internship periods, the team had rescued just under 3000 nest records. This accomplishment would not have been possible without Josh and Michelle’s hard work to manually correct the location and breeding biology data. The database was also transformed into a tidy and usable format for the Saskatchewan CDC, and location accuracy estimate rules were defined and established. It was certainly a challenging experience, but a rewarding one. Kelsey, Billi, Josh, and Michelle got to work with a great team from the Province of Saskatchewan, Ryan Fisher (Royal Saskatchewan Museum), and Andrea Benville (Saskatchewan CDC), with special thanks to Ellen Bledsoe (University of Regina) for all the guidance and advice with R and GitHub.