Counts, binned data, or other data representations can be mapped to the PPO and, by doing so, ensure interoperability with field monitoring data. Some taxonomic groups have very small floral structures that either require an onerous amount of time to score or require expertise to determine which organs are actually present on the sheet. Members of Poaceae, Cyperaceae, and Juncaceae, as well as certain Asteraceae, are a few examples for which second-order scoring may be more challenging. However, it should be relatively easy to apply first-order scorings to these groups, thereby greatly increasing the utility of these specimens for phenological research. Our protocol does not address the presence/absence or abundance of male vs. female flowers, nor does it distinguish between perfect and imperfect flowers in gynodioecious, gynomonoecious, or andromonoecious species, largely because these categories have seldom been included in phenological research. The timing of reproduction is not the only phenological event of interest in plants. Leaf bud break and leaf-out are important phenomena in deciduous forests, as is autumn senescence. These vegetative characters are often tracked via satellite imagery and in situ monitoring efforts. Scoring phenological leaf traits on herbarium specimens is rare, but it provides valuable insights into the effects of climate change.
A similar scoring protocol is recommended for foliar structures, although we do not specify one here.
Online documentation, including definitions and examples, is provided for each term used in Darwin Core. Any attempt to share phenological or other trait data from specimens should use Darwin Core fields to ensure that the basic specimen occurrence information is standardized. However, phenological trait descriptions are not part of Darwin Core, so other mechanisms are needed to support narrower or broader data-sharing approaches. We propose sharing phenological data using the Darwin Core Extended MeasurementOrFact (eMoF) extension. This extension provides a mechanism whereby many measurements or facts can be shared for each specimen record in a Darwin Core Archive, together with metadata associated with each phenological scoring. For example, when evaluating data quality, it can be useful to know when, how, and by whom scorings were recorded; accuracy may be affected by whether the specimen was scored from a web-based image or from the physical specimen. An eMoF record can contain a definition of the type of measurement, the value and units of the measurement, the method of measurement, and by whom and when it was measured. Table 5 shows an example eMoF extension file from a Darwin Core Archive for two herbarium sheets that were scored using our protocol. The first record was scored with measurementValue = ‘Reproductive’ and, additionally, with measurementValue = ‘Open flowers.’ The second record, linked by its catalog number, is scored ‘Reproductive,’ ‘Open flowers,’ and ‘Fruiting.’ Using the eMoF extension has a potential disadvantage, namely that it does not allow the measurement to be rigorously tied to a particular aspect of the core record.
This means that any user can define a new, non-standard measurementType and measurementValue, which could make compiling data difficult. Unless the permitted values of measurementType and measurementValue are rigorously defined, an excessive number of unique text strings could be generated. To address this, we are working toward defining these terms within Apple Core, a set of best-practice guidelines for publishing botanical specimen information from herbaria. A goal of Apple Core is to mitigate the generality of Darwin Core by providing detailed guidance for publishing botanical specimen information in Darwin Core. These guidelines will include recommended terms, specific definitions, multiple examples, common issues, and, where appropriate, controlled vocabularies specific to herbarium specimens. Apple Core is a community-curated resource that is still being refined, and interaction with phenological researchers will help to strengthen it. Finally, this approach is complementary to broader sharing initiatives that use ontologies, such as the Plant Phenology Ontology. In the near future, using the eMoF extension will allow phenological scorings to be published in iDigBio, the Global Biodiversity Information Facility (GBIF), and other public repositories. Darwin Core Archive publishing services are available within all Symbiota portals and form the basis on which iDigBio harvests specimen data from these portals. Adherence to our protocol at local institutions will facilitate the search functions provided and developed by large aggregators such as iDigBio and GBIF.
The questions presented here provide important data for researchers while requiring minimal effort from herbarium curators. Phenological questions are easily integrated into standard label digitization workflows, or specimens can be scored later from images.
Because the questions are nested, a third-order question can be scored first, with the appropriate second- and first-order questions populated automatically. For example, a report of “fruits present” on a specimen would automatically score a “yes” for the first-order question, indicating that reproductive structures are present. To answer the first-order question, the person performing the initial data entry for a specimen need only look at the sheet and check a box indicating whether reproductive structures of any kind are present. For databases that lack the infrastructure to accommodate this type of scoring, a few alternatives are presented below.
Phenological scores can be recorded at a number of steps in a digitization workflow. In an object-to-data workflow, scores can be made directly from the sheet as label data are being captured. In an image-based workflow, specimens can be scored by visual inspection of their images. The latter approach offers the option of making the images available online, where the public can record phenological observations. Machine learning approaches are likely to facilitate scoring images at scale in the near future. Fields in local databases need to be modified to accommodate the proposed structure. Controlled vocabularies can be implemented with drop-down menus or pick-lists; however, providing such functionality might require changes to database management software. Fortunately, a number of tools have been developed for scoring the phenological status of specimens.
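The nested scoring logic, combined with an eMoF-style export of the kind shown in Table 5, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Symbiota implementation: the term hierarchy `PARENT` is hypothetical (built loosely from the Table 5 values, with an invented intermediate ‘Flowering’ level), and the `coreid` linking column, the `measurementType` label, and all function names are likewise assumptions. Scoring a single third-order term fills in its higher-order scores automatically, and any value outside the controlled vocabulary is rejected, which is one way to prevent the proliferation of non-standard measurementValue strings.

```python
import csv
import io

# Hypothetical child -> parent hierarchy of phenological terms. Values follow
# Table 5 loosely; the intermediate "Flowering" level is invented for illustration.
PARENT = {
    "Open flowers": "Flowering",
    "Unopened flowers": "Flowering",
    "Flowering": "Reproductive",
    "Fruiting": "Reproductive",
}
VOCABULARY = set(PARENT) | set(PARENT.values())
EMOF_FIELDS = [
    "coreid",  # assumed name for the column linking each row to its core record
    "measurementType",
    "measurementValue",
    "measurementMethod",
    "measurementDeterminedBy",
    "measurementDeterminedDate",
]


def propagate(scores):
    """Return the given scores plus every higher-order score they imply."""
    implied = set()
    for term in scores:
        if term not in VOCABULARY:
            # Refuse free text so no non-standard measurementValue is exported.
            raise ValueError(f"not in controlled vocabulary: {term!r}")
        while term is not None:  # walk up to the first-order term
            implied.add(term)
            term = PARENT.get(term)
    return implied


def emof_rows(core_id, scores, determined_by, determined_date, method):
    """Yield one eMoF-style row per implied score, sorted for stable output."""
    for value in sorted(propagate(scores)):
        yield {
            "coreid": core_id,
            "measurementType": "reproductive phenology",  # hypothetical label
            "measurementValue": value,
            "measurementMethod": method,
            "measurementDeterminedBy": determined_by,
            "measurementDeterminedDate": determined_date,
        }


def to_tsv(rows):
    """Serialize rows as a tab-delimited extension file with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=EMOF_FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # A scorer records only the third-order term; the rest is filled in.
    rows = emof_rows("occurrence-1", {"Open flowers"}, "J. Doe",
                     "2018-03-01", "scored from web image")
    print(to_tsv(rows), end="")
```

Storing one row per implied score, rather than only the most specific one, lets an aggregator filter on the first-order value alone without knowing the hierarchy.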
For curators whose database lacks Symbiota-type functionality providing phenological check boxes that correspond to our proposed protocol, we suggest entering phenological information into an appropriate text field within the existing database, with the expectation that new tools will enable users to search these text fields and score the specimens appropriately. Ideally, every institution’s home database will include a text field dedicated exclusively to phenological information. However, including phenological information as text anywhere within a specimen’s label data is better than not capturing any phenological traits. To choose the best text field within a local database, it is important to know how the specimen data appear when shared in a Darwin Core Archive. If, for example, one’s local database conforms to Darwin Core, reproductive traits should be entered in the reproductiveCondition field. The words entered into the text field should be unambiguous and should correspond to the protocol above. This is an action that all curators can immediately integrate into their current digitization workflows.
Those managing or implementing digitization workflows should consider incorporating the scoring of phenological data into them. At the very least, first- or second-order phenological data should be considered for capture, which will facilitate future scoring of the specimens. If time does not permit training herbarium personnel to record challenging second-order scorings, then simply adding the word “reproductive” somewhere in a relevant database field will aid future work and research use.
Part of the NEVP project was the development of a tool to score phenological traits using digitized label text. This tool allows a user to search for specific words in database fields and map these to the proposed vocabulary.
For example, using this tool to search the reproductiveCondition field within SEINet returned over 4000 unique text strings. The Attribute Mining Tool allows one to select all records containing text that refers solely to a single scoring category. For example, if a user were scoring “open flowers present” only, the user could select all the highlighted rows in Table 1 and click “Open flowers present.” In the SEINet example presented in Table 1, this single scoring event would result in the selection and scoring of 1,031,786 records.
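The curator-driven mapping behind such scoring events can be sketched as follows. This is a hypothetical illustration, not the Attribute Mining Tool's code: the record layout, the `catalogNumber` key, the lowercase and whitespace normalization, and the function names are all assumptions. The first function tallies the distinct strings in the reproductiveCondition field (the kind of list shown in Table 1); the second applies only the mappings a curator has explicitly chosen, so no category is ever inferred.

```python
from collections import Counter


def normalize(text):
    """Collapse whitespace and lowercase (an assumed normalization step)."""
    return " ".join(text.split()).lower()


def distinct_strings(records, field="reproductiveCondition"):
    """Tally each distinct normalized value of a free-text field."""
    return Counter(normalize(r[field]) for r in records if r.get(field))


def apply_mapping(records, mapping, field="reproductiveCondition"):
    """Score only records whose text a curator explicitly mapped to categories.

    Unmapped strings are left unscored; nothing is inferred automatically.
    """
    scored = {}
    for r in records:
        key = normalize(r.get(field) or "")
        if key in mapping:
            scored[r["catalogNumber"]] = set(mapping[key])
    return scored


if __name__ == "__main__":
    records = [  # hypothetical records
        {"catalogNumber": "A1", "reproductiveCondition": "Flowering"},
        {"catalogNumber": "A2", "reproductiveCondition": "flowering "},
        {"catalogNumber": "A3", "reproductiveCondition": "fl & fr"},
        {"catalogNumber": "A4", "reproductiveCondition": "sterile?"},
        {"catalogNumber": "A5"},
    ]
    print(distinct_strings(records).most_common())
    # Scoring event 1: strings referring only to open flowers.
    # Scoring event 2: strings referring to both open flowers and fruits.
    mapping = {
        "flowering": {"Open flowers present"},
        "fl & fr": {"Open flowers present", "Fruits present"},
    }
    print(apply_mapping(records, mapping))
```

Ambiguous strings such as “sterile?” simply stay unscored until a curator decides how to map them.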
In a separate scoring event, the user could select all records that refer to both open flowers and fruits and then select “open flowers present” and “fruits present.” Because a curator is responsible for mapping free-text strings from the database to a controlled vocabulary, this method does not rely on computerized inference. Being able to apply phenological scores to any specimen within a Symbiota portal is highly efficient, and tools of this type should be developed for other database platforms.
Many platforms have been developed for remotely scoring images of specimens, and we review them below. It is vital that future scoring platforms conform as closely as possible to the proposed protocol to facilitate data integration. Furthermore, it is vital that specimen trait data, even when scored outside the local database, remain associated with the original specimen record. This will allow trait data and occurrence data to travel together through the data aggregation process, preventing duplicated scoring efforts.
The new Image Scoring Tool, developed as part of the NEVP project, allows Symbiota network users to filter images and apply a phenological score to them. This approach has facilitated the scoring of over 240,000 images of New England specimens to date. Phenological scorings are being shared with end users through the Consortium of Northeastern Herbaria portal via the Darwin Core Extended MeasurementOrFact extension and Darwin Core Archives, as outlined above. This functionality will soon be available to all Symbiota-based databases.
Notes from Nature is an online citizen science platform originally developed to support the transcription of specimen labels, but it has expanded to include phenological classifications. Notes from Nature extends the Zooniverse model by providing a simple way for curators or researchers to bundle and upload images, set targets for transcription or scoring, launch new expeditions, and engage volunteers.
Notes from Nature addresses data quality by requiring a minimum of three replicated classifications for each imaged specimen. When expeditions are completed, a suite of tools is available for reporting outcomes, and automated reconciliation produces a “best classification,” including data for phenological categories and for counts of reproductive structures. Notes from Nature phenology expeditions have so far solicited reports of flowering and fruiting, as well as counts of reproductive structures, for Quercus L., Coreopsis L., and Cakile Mill. These expeditions have ranged from simple annotations of whether open flowers or fruits are present to more complex tasks in which users count the numbers of unopened flowers, open flowers, and fruits. Expeditions asking users for first- and second-order scorings generated large volumes of accurate phenological data, whereas those asking for more complex scorings, such as counts, had lower participation from the community of citizen science annotators and took much longer to complete.
CrowdCurio is a new online platform designed to give researchers the ability to design and implement crowdsourcing projects tailored to their specific interests and data sources. Most recently, a CrowdCurio project titled “Thoreau’s Field Notes” demonstrated that the platform is an effective tool for crowdsourcing the collection of phenological data from digitized herbarium specimens. Participants are presented with an image of a herbarium specimen and asked to annotate it by clicking on each visible unopened flower, open flower, and fruit. These annotations are then transformed into counts that can be used to approximate the phenological stage of an individual specimen. In a preliminary study of the efficiency and quality of CrowdCurio data collection, Willis et al.
compared data collected by expert and non-expert participants for two common New England species: greater celandine and lowbush blueberry. They found that non-expert counts were similar to expert counts, but that non-experts recorded nearly twice as much data at lower cost over the same period. Data collected via crowdsourcing, however, are not without limitations.
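One common way to reconcile replicated volunteer classifications, and to turn click annotations into counts, is sketched below. The majority vote is a simple stand-in chosen for illustration; it is not necessarily the algorithm Notes from Nature uses to produce its “best classification.” The `(specimen, question, answer)` and `(category, x, y)` tuple formats and all function names are hypothetical. Keys with fewer than three classifications, or with a tied vote, are left unresolved (`None`) rather than guessed, which is one way to flag them for expert review; the last helper collapses counts into presence/absence so that count data can be combined with first- and second-order scorings.

```python
from collections import Counter, defaultdict

MIN_REPLICATES = 3  # Notes from Nature requires at least three classifications


def reconcile(classifications):
    """Majority vote per (specimen, question).

    Returns None for a key with too few replicates or a tied vote, so that
    the record can be flagged for review instead of guessed.
    """
    votes = defaultdict(list)
    for specimen, question, answer in classifications:
        votes[(specimen, question)].append(answer)
    best = {}
    for key, answers in votes.items():
        if len(answers) < MIN_REPLICATES:
            best[key] = None
            continue
        (top, n), *rest = Counter(answers).most_common()
        best[key] = None if rest and rest[0][1] == n else top
    return best


def clicks_to_counts(clicks):
    """Turn click annotations (category, x, y) into per-category counts."""
    return Counter(category for category, _x, _y in clicks)


def counts_to_presence(counts, categories):
    """Collapse counts into presence/absence for the listed categories."""
    return {c: counts.get(c, 0) > 0 for c in categories}


if __name__ == "__main__":
    votes = [("S1", "open flowers", "yes"),
             ("S1", "open flowers", "yes"),
             ("S1", "open flowers", "no")]
    print(reconcile(votes))  # {('S1', 'open flowers'): 'yes'}
    clicks = [("open flower", 10, 20), ("fruit", 5, 5)]
    print(counts_to_presence(clicks_to_counts(clicks),
                             ["unopened flower", "open flower", "fruit"]))
```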