All biocuration is time consuming and requires assistance from expert biologists

Analyses of single census years provide wildly varying estimates of the effect of landscape simplification on insecticide use. It is evident that the relationship between landscape simplification and insecticide use is spatially and temporally context-dependent, and that there are a number of ways that context could be determined. Although it remains unclear what underlying mechanisms are providing the context, it is abundantly clear that the relationship between landscape simplification and insecticide use observed in 2007 does not hold for previous census years. It is time to move beyond simply asking whether landscape simplification drives insecticide use and instead focus on what factors may explain the variability in this relationship over time and space.

We are in an exciting time in biology. Genomic discovery on a large scale is cheaper, easier and faster than ever. Picture a world where every piece of biological data is available to researchers from easy-to-find and well-organized resources; the data are accurately described and available in accessible and standard formats; the experimental procedures, samples and time points are all completely documented; and researchers can find answers to any question about the data that they have. Imagine that, with just a few mouse clicks, you could determine the expression level of any gene under every condition and developmental stage that has ever been tested. You could explore genetic diversity in any gene to find mutations with consequences. Imagine seamless and valid comparisons between experiments from different groups. Picture a research environment where complete documentation of every experimental process is available, and data are always submitted to permanent public repositories, where they can be easily found and examined.

We can imagine that world, and feel strongly that all outcomes of publicly funded research can and should contribute to such a system. It is simply too wasteful not to achieve this goal. Proper data management is a critical aspect of research and publication. Scientists working on federally funded research projects are expected to make research findings publicly available. Data are the lifeblood of research, and their value often does not end with the original study, as they can be reused for further investigation if properly handled. Data become much more valuable when integrated with other data and information. For example, traits, images, seed/sample sources, sequencing data and high-throughput phenotyping results become much more informative when integrated with germplasm accessions and pedigree data. Access to low-cost, high-throughput sequencing, large-scale phenotyping and advanced computational algorithms, combined with significant funding by the National Science Foundation, the US Department of Agriculture and the US Department of Energy for cyberinfrastructure and agriculture-related research, has fueled the growth of databases to manage, store, integrate, analyse and serve these data and tools to scientists and other stakeholders. To describe agriculture-related databases, we use the term ‘GGB database’. GGB databases include any online resource that holds genomic, genetic, phenotypic and/or breeding-related information and that is organized via a database schema and contained within a database management system or a non-relational storage system. GGB databases play a central role in the communities they serve by curating and distributing published data, by facilitating collaborations between scientists and by promoting awareness of what research is being done and by whom in the community. GGB databases prevent duplicated research efforts and foster communication and collaboration between laboratories.

As more and more organisms are sequenced, cross-species investigations become increasingly informative, requiring researchers to use multiple GGB databases and requiring that GGB databases share data and use compatible software tools. Use of common data standards, vocabularies, ontologies and tools will make curation more effective, promote data sharing and facilitate comparative studies. The AgBioData consortium was formed in 2015 in response to the need for GGB personnel to work together to come up with better, more efficient database solutions. The mission of the consortium, whose members are responsible for over 25 GGB databases and allied resources, is to work together to identify ways to consolidate and standardize common GGB database operations to create database products with more interoperability. The AgBioData consortium joins the larger scientific community in embracing the Findable, Accessible, Interoperable and Reusable (FAIR) data principles, established by stakeholders from the scientific, publishing and library communities. FAIR principles have rapidly become standard guidelines for proper data management, as they outline a road map to maximize data reuse across repositories. However, more specific guidelines on how to implement FAIR principles for agricultural GGB data are needed to assist and streamline implementation across GGB databases. Members of the AgBioData consortium convened in Salt Lake City, UT on 18 and 19 April 2017 to describe challenges and recommendations for seven topics relevant to GGB databases: Biocuration, Ontologies, Metadata and persistence, GGB database platforms, Programmatic access to data, Communication and Sustainability. Preceding this workshop, a survey was sent out to all AgBioData members regarding the seven topics, in order to identify concerns and challenges of AgBioData members. The results were used to focus and foster the workshop discussions.
Here we present the current challenges facing GGBs in each of these seven areas and recommendations for best practices, incorporating discussions from the Salt Lake City meeting and results of the survey.

The purpose of this paper is 3-fold: first, to document the current challenges and opportunities of GGB databases and online resources regarding the collection, integration and provision of data in a standardized way; second, to outline a set of standards and best practices for GGB databases and their curators; and third, to inform policy and decision makers in the federal government, funding agencies, scientific publishers and academic institutions about the growing importance of scientific data curation and management to the research community. The paper is organized by the seven topics discussed at the Salt Lake City workshop. For each topic, we provide an overview, challenges and opportunities and recommendations. The acronym ‘API’ appears frequently in this paper, referring to the means by which software components communicate with each other: i.e. a set of instructions and data transfer protocols. We envision this paper will be helpful to scientists in the GGB database community, publishers, funders and policy makers and agricultural scientists who want to broaden their understanding of FAIR data practices.

Biocurators strive to present an accessible, accurate and comprehensive representation of biological knowledge. Biocuration is the process of selecting and integrating biological knowledge, data and metadata within a structured database so that it can be accessible, understandable and reusable by the research community. Data and metadata are taken from peer-reviewed publications and other sources and integrated with other data to deliver a value-added product to the public for further research. Biocuration is a multidisciplinary effort that involves subject area experts, software developers, bioinformaticians and researchers. The curation process usually includes a mixture of manual, semi-automated and fully automated workflows.
Manual biocuration is the process of an expert reading one or several related publications, assessing and/or validating the quality of the data and entering data manually into a database using curation tools, or by providing spreadsheets to the database manager. It also encompasses the curation of facts or knowledge, in addition to raw data; for example, the role a gene plays in a particular pathway. These data include information on genes, proteins, DNA or RNA sequences, pathways, mutant and nonmutant phenotypes, mutant interactions, qualitative and quantitative traits, genetic variation, diversity and population data, genetic stocks, genetic maps, chromosomal information, genetic markers and any other information from the publication that the curator deems valuable to the database consumers. Manual curation includes determining and attaching appropriate ontology and metadata annotations to data. This sometimes requires interaction with authors to ensure data are represented correctly and completely, and indeed to ask where the data reside if they are not linked to a publication. In well-funded large GGB databases, manually curated data may be reviewed by one, two or even three additional curators. Manual biocuration is perhaps the best way to curate data, but no GGB database has enough resources to curate all data manually. Moreover, the number of papers produced by each research community continues to grow rapidly. Thus, semi-automated and fully automated workflows are also used by most databases. For example, a species-specific database may want to retrieve all Gene Ontology annotations for genes and proteins for their species from a multi-species database like UniProt. In this case, a script might be written and used to retrieve that data ‘en masse’. Prediction of gene homologs, orthologs and function can also be automated.
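As a rough illustration of such a bulk-retrieval script, the sketch below filters Gene Ontology annotations for one species out of a multi-species annotation dump. It assumes the dump is in the tab-separated GO Annotation File (GAF) layout, where column 2 holds the gene or protein identifier, column 5 the GO term and column 13 the taxon; the text does not say which format or download route a given database would use, and the function name, the `ExampleDB` source and the gene identifiers are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def go_annotations_for_taxon(gaf_text, taxon_id):
    """Return {object_id: set of GO IDs} for one NCBI taxon in GAF-style text."""
    annotations = defaultdict(set)
    reader = csv.reader(io.StringIO(gaf_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    wanted = f"taxon:{taxon_id}"
    for row in reader:
        # Skip blank lines, "!" header/comment lines and truncated rows.
        if not row or row[0].startswith("!") or len(row) < 13:
            continue
        object_id, go_id, taxa = row[1], row[4], row[12]
        # The taxon column may list several taxa separated by "|".
        if wanted in taxa.split("|"):
            annotations[object_id].add(go_id)
    return dict(annotations)


def _row(object_id, go_id, taxon):
    """Build one 17-column GAF-style line with placeholder values."""
    cols = ["ExampleDB", object_id, object_id, "", go_id, "PMID:0", "IDA",
            "", "P", "", "", "gene", taxon, "20170418", "ExampleDB", "", ""]
    return "\t".join(cols)


# Hypothetical multi-species dump: taxon 3702 is Arabidopsis thaliana,
# taxon 4530 is Oryza sativa. Gene and GO identifiers are placeholders.
SAMPLE_GAF = "\n".join([
    "!gaf-version: 2.1",
    _row("GENE_A", "GO:0000001", "taxon:3702"),
    _row("GENE_A", "GO:0000002", "taxon:3702"),
    _row("GENE_B", "GO:0000003", "taxon:4530"),
])

arabidopsis = go_annotations_for_taxon(SAMPLE_GAF, 3702)
for gene, terms in sorted(arabidopsis.items()):
    print(gene, sorted(terms))
```

In practice the text would come from a downloaded file rather than an in-memory string, and a curator would still review the evidence codes before loading the annotations.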
Some of these standard automated processes require intervention at defined points from expert scientists to choose appropriate references and cutoff values, perform verifications and do quality checks. All biocuration aims to add value to data. Harvesting biological data from published literature, linking it to existing data and adding it to a database enables researchers to access the integrated data and use it to advance scientific knowledge. The manual biocuration of genes, proteins and pathways in one or more species often leads to the development of algorithms and software tools that have wider applications and contribute to automated curation processes.

For example, The Arabidopsis Information Resource has been manually adding GO annotations to thousands of Arabidopsis genes from the literature since 1999. This manual GO annotation is now the gold standard reference set for all other plant GO annotations and is used for inferring gene function of related sequences in all other plant species. Another example is the manually curated metabolic pathways in EcoCyc, MetaCyc and PlantCyc, which have been used to predict genome-scale metabolic networks for several species based on gene sequence similarity. The recently developed Plant Reactome database has further streamlined the process of orthology-based projections of plant pathways by creating simultaneous projections for 74 species. These projections are routinely updated along with the curated pathways from the Reactome reference species Oryza sativa. Without manual biocuration of experimental data from Arabidopsis, rice and other model organisms, the plant community would not have the powerful gene function prediction workflows we have today, nor would the development of the wide array of existing genomic resources and automated protocols have been possible. Biocurators continue to provide feedback to improve automated pipelines for prediction workflows and help to streamline data sets for their communities and/or add value to the primary data.

Current efforts in machine learning and automated text mining, which aim to extract data or to rank journal articles so that curation becomes more effective, work to some extent, but so far these approaches are not able to synthesize a clear narrative and thus cannot yet replace biocurators. The manual curation of literature, genes, proteins, pathways etc. by expert biologists remains the gold standard used for developing and testing text mining tools and other automated workflows.
We expect that although text-mining tools will help biocurators achieve higher efficiency, biocurators will remain indispensable to ensure accuracy and relevance of biological data. Well-curated GGB databases play an important role in the data lifecycle by facilitating dissemination and reuse. GGB databases can increase researchers’ efficiency, increase the return on research funding investment by maximizing reuse and provide use metrics for those who desire to quantify research impact. We anticipate that the demand for biocurators will increase as the tsunami of ‘big data’ continues. Despite the fact that the actual cost of data curation is estimated to be less than 0.1% of the cost of the research that generated primary data, data curation remains underfunded.

Databases are focused on serving the varied needs of their stakeholders. Because of this, different GGB databases may curate different data types or curate similar data types to varying depths, and are likely to be duplicating efforts to streamline curation. In addition, limited resources for most GGB databases often prevent timely curation of the rapidly growing data in publications.

The size and the complexity of biological data resulting from recent technological advances require the data to be stored in computable or standardized form for efficient integration and retrieval. Use of ontologies to annotate data is important for integrating disparate data sets. Ontologies are structured, controlled vocabularies that represent specific knowledge domains.

Most growers along California’s Central Coast use phosphorus fertilizer to maintain high crop production

Combinatorial biosynthesis has been successfully applied to generate a library of fungicidal antimycin analogs, which are cytochrome C reductase inhibitors; fenpicoxamid, for instance, has been developed by Dow AgroSciences to control the wheat pathogen Zymoseptoria tritici. Based on detailed understanding of the biosynthetic pathway of antimycin, diversity-oriented biosynthesis of about 400 analogs was achieved by altering the chemical identities of priming, extending, and tailoring building blocks. Several of these analogs exhibited stronger biological activities than the original natural products (NPs), while a few introduced orthogonal reactive handles in the molecules that enabled further chemical derivatization.

The application of insecticides, herbicides, and fungicides with potent bioactivities and good safety profiles has played an indispensable role in improving the yield and quality of agricultural products. However, their continuous and excessive use has led to the emergence of resistance among plants and plant pathogens. Resistance gene-directed NP discovery has been demonstrated to be an effective strategy to uncover novel NPs with desired modes of action as lead candidates for new insecticides, fungicides, or herbicides to address the problem of growing resistance. Metabolic and biosynthetic engineering of NP synthetic pathways in yeast can further improve titers for microbial production and biological activities for commercial applications. The increasing sophistication of these tools means that we are entering a renaissance of NP discovery for both pharmaceutical and agricultural applications.

The Pajaro River and Elkhorn Slough watersheds on California’s Central Coast include some of the state’s most productive and highly valued agricultural lands. The watersheds’ streams and rivers serve as key municipal and agricultural water sources, recreational areas, and wildlife habitat.

Both watersheds drain into Monterey Bay, a nationally protected marine sanctuary, and water from the Elkhorn Slough watershed passes through Elkhorn Slough, the largest tidal salt marsh along the Central Coast and a critical resource for resident and migratory birds, fisheries, and other wildlife. Agricultural and urban land uses in the Pajaro River and Elkhorn Slough watersheds have compromised the quality of their waterways. Two nutrients, nitrogen and phosphorus, are of particular concern. High levels of nitrate-N in drinking water pose a threat to human health, and both nitrogen and phosphorus are linked to excessive growth or “blooms” of algae and other plants that can decrease the amount of dissolved oxygen in waterways below the levels that aquatic organisms need to survive. As part of state and federal efforts to protect and restore water quality, regulatory agencies have been charged with establishing target concentrations for pollutants in waterways that will protect beneficial uses. The Central Coast Regional Water Quality Control Board (RWQCB) has set a preliminary target of 0.12 mg/L for soluble reactive phosphorus concentrations, based on the lowest concentrations they have observed in waterways of the Pajaro watershed with excessive plant or algae growth. This pollution is thought to come primarily from “non-point” sources, which are unregulated discharges from urban and agricultural land uses.

Increasing evidence suggests that crops cannot take up all of the phosphorus fertilizer being applied; as a result, excess phosphorus accumulates in the soil. High levels of soil phosphorus in turn lead to higher phosphorus levels in water draining from agricultural fields. In the Pajaro River and Elkhorn Slough watersheds, high concentrations of phosphorus have been identified in several waterways. The RWQCB Watershed Management Initiative implicates agriculture as the primary source of this and other nutrient pollution.
However, little empirical data exists to demonstrate that agriculture is responsible for nutrient loading into these waterways. In this research brief we present data from water quality monitoring conducted between October 2000 and September 2004, to demonstrate the way that agricultural land use influences phosphorus concentrations in streams and rivers.

We discuss the nature of phosphorus pollution from agriculture along the Central Coast, examine the implications of these data for agricultural regulations, and offer suggestions for reducing phosphorus losses from farmlands.

The Pajaro River watershed drains approximately 1,300 square miles of land, with 7.5% of the watershed in agriculture. Agricultural activities are concentrated in three productive areas: on the flood plain of the Pajaro River near the towns of Watsonville and Aromas; in South Santa Clara Valley near Gilroy and San Martin; and in the San Juan Valley near San Juan Bautista and Hollister. Production near the coast is dominated by cool-weather vegetables, berries, flowers, and apples. In the warmer inland areas east of the Santa Cruz and Gabilan ranges, growers rotate crops of cool- and warm-weather vegetables, along with grapes, flowers, and stone fruits. Approximately 70 square miles in size, the Elkhorn Slough watershed drains northern Monterey County and a small portion of San Benito County. Approximately 24% of the watershed is in agriculture, with strawberries and cool-weather vegetables making up the majority of cultivated acreage.

To assess the role of agricultural land use on phosphorus levels in waterways, in October 2000 we began sampling two creeks in the Elkhorn Slough watershed and several waterways in the Pajaro River watershed, including Corralitos Creek, Watsonville Slough, the Pajaro River, and publicly accessible agricultural drainage ditches. In October 2002 we expanded the project to include all tributaries of the Pajaro River to determine the proportion of nutrients each water basin contributes to the river. We collected water samples every 2 weeks at approximately 60 sites throughout the watershed. Sites were selected to bracket agricultural activity and other land uses in order to compare concentrations upstream and downstream of potential nutrient sources.
In addition, several locations were sampled more frequently to capture storm event variability and to measure water discharge for calculations of nutrient loads. For brevity we report here on several key sites that demonstrate spatial and temporal patterns we found to be characteristic of the entire watershed.

Naturally occurring phosphorus is derived from apatite, a common mineral consisting of calcium fluoride phosphate or calcium chloride phosphate.

The availability of P to plants in any soil is limited by the rate at which apatite dissolves. Relative to other plant macro-nutrients, inorganic P is fairly insoluble and binds to soil particles. This means it is typically retained in the soil profile and doesn’t leach into groundwater. Phosphorus availability to plants is greatest when the soil’s pH is around 6.55–7.5. In acid soils, dissolved phosphate can precipitate with iron and aluminum oxides, making it unavailable to growing plants, whereas in alkaline soils, dissolved phosphorus can precipitate with calcium. Both inorganic and organic forms of P are found in soils. Since soils tend to “hold” P, it is most commonly lost from soils via erosion. However, if sufficient amounts of P are added to soils over time in the form of fertilizers or other inputs, all the attachment sites on soil particles can become filled, at which point the soluble form of P will be lost through runoff or by leaching. The amount of phosphorus lost from agricultural fields varies greatly and is specific to both local environmental conditions and land management practices. Conditions that increase erosion, runoff, and subsurface water flow also increase soil P losses. Therefore, climate, soil type, and slope can all influence P losses. In addition, a number of nutrient and soil management practices impact soil P movement, including the amount of P applied in fertilizer, the solubility of applied P, the timing of fertilizer applications in relation to plant use and irrigation or rain events, the presence of artificial drainage systems, and cover cropping and tillage practices that affect erosion and water infiltration. In general, most soil P is lost via surface runoff and erosion, but the amounts lost and the timing of such losses are unique to the conditions and management practices used at each ranch or farm.
For example, the use of tile drainage systems, which are common in parts of the Pajaro River and Elkhorn Slough watersheds, can greatly increase subsurface P losses. Drains can affect P movement and loss in different ways. As water moves through the soil profile toward the drain, the soil can bind dissolved P, thus removing it from the water; however, tile drains also reduce the amount of time P fertilizer is in contact with soil particles, so overall a smaller fraction of applied P may be retained in the soil profile. Tile drains have also been shown to transport significant amounts of particulate P from topsoil to surface waters during storm events.

Conversely, in soils with poor drainage, installation of tile drains can reduce total P losses during storms by improving infiltration and reducing P lost via surface runoff. Therefore, determining the role of tile drains in P transport under local soil and climate conditions is important for managing P levels in the Central Coast region. In addition to agriculture, natural processes and urban runoff may also contribute P to waterways. Small amounts of P are deposited from the atmosphere in rainfall and in dry airborne particulates. Urban sources of P include residential fertilizer use, automotive products, and septic tanks and leach fields. In the past, detergents were a significant source of urban P pollution, but most detergents are now phosphate-free. In aquatic environments, particulate P can convert to dissolved forms and increase the pool of reactive, dissolved P. These reactive forms, called orthophosphate or soluble reactive phosphorus (SRP), are readily taken up by algae, and in excess levels may lead to algae “blooms” and eutrophication.

Geographical patterns of dissolved phosphorus concentrations suggest that levels are influenced by land features as well as land use practices. Soil characteristics such as a shallow water table are associated with elevated stream SRP levels, particularly in agricultural areas. In the south Santa Clara Valley, SRP concentrations were low in all waterways with the exception of San Juan Creek. The San Juan drainage has a shallow, perched water table, and receives discharge from artificial tile drain systems, used in agricultural fields to remove water from the rooting zone of crop plants. In contrast, Llagas and Uvas Creeks, which do not receive tile drainage, had low SRP concentrations at all sites. Median SRP concentrations increased slightly at sites downstream of agriculture, but exceeded the target level on fewer than 20% of visits.
San Benito Creek and Miller’s Canal, which were both sampled near agricultural fields, also had low median SRP concentrations. The use of tile drainage systems may account for higher SRP levels in waterways with shallow water tables and agricultural land use, including Watsonville Slough and Corn Cob Canyon Creek. Tile drainage systems can increase phosphorus losses by increasing soil infiltration rate and reducing the amount of phosphorus that adheres to soil particles. During winter storms, tile drains may also act as conduits for particulate phosphorus, carrying eroded topsoil to waterways. Non-agricultural land uses, and occurrence of mineral types naturally high in SRP, may also contribute to elevated SRP concentrations in some areas. While nutrients were generally higher at locations downstream of agriculture, Corralitos Creek had elevated nutrients both upstream and downstream of agriculture. At the most upstream site, SRP concentrations often exceeded the target level of 0.12 mg/L, while two other nutrients, nitrate and ammonium, were very low. The elevated SRP levels are not likely from fertilizer or septic sources, which also tend to be high in nitrogen compounds, but may be due to the mineral composition of soils in this drainage and/or soil erosion. Comparisons of sites upstream and downstream of agriculture revealed higher downstream SRP concentrations in many waterways, providing evidence that agricultural land is a source of phosphorus in surface waters. In the Elkhorn Slough watershed, SRP progressively increased with the amount of cultivated acreage located upstream. The phosphorus content of the soils in the watershed may play a role in how phosphorus moves through this system, but this has not been examined systematically.
In Carneros Creek at Dunbarton Road, which is at the upstream edge of cultivated acreage, the median SRP concentration was 0.10 mg/L, and at San Miguel Canyon Road, downstream of several miles of farmland, the median concentration was 0.53 mg/L. However, in addition to row crops, land use along Carneros Creek is mixed with ranches and rural homes, and more intensive monitoring is necessary to partition nutrient inputs from these potential sources.
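
The two summary statistics used in this comparison, the median SRP concentration at a site and the share of visits on which the 0.12 mg/L target was exceeded, can be sketched as follows. This is a minimal illustration only: the per-visit values are hypothetical and were chosen so that their medians match the 0.10 mg/L and 0.53 mg/L reported for Carneros Creek, and the function and variable names are not from the study.

```python
from statistics import median

SRP_TARGET_MG_L = 0.12  # preliminary target concentration cited in the text


def summarize_srp(samples_mg_l, target=SRP_TARGET_MG_L):
    """Return the median SRP and the fraction of visits above the target."""
    if not samples_mg_l:
        raise ValueError("at least one sample is required")
    exceedances = sum(1 for conc in samples_mg_l if conc > target)
    return {
        "median": median(samples_mg_l),
        "exceedance_fraction": exceedances / len(samples_mg_l),
    }


# Hypothetical per-visit SRP concentrations (mg/L), not measured data.
upstream = [0.08, 0.09, 0.10, 0.11, 0.14]    # e.g. upstream edge of farmland
downstream = [0.35, 0.48, 0.53, 0.60, 0.71]  # e.g. below several miles of farmland

up = summarize_srp(upstream)
down = summarize_srp(downstream)
print("upstream:", up)
print("downstream:", down)
print("downstream/upstream median ratio:", round(down["median"] / up["median"], 1))
```

With the reported medians, the downstream concentration is roughly five times the upstream one; the exceedance fraction is the same kind of figure as the "fewer than 20% of visits" statistic quoted for other creeks.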