To estimate the extent of incorrectly predicted pathways in the PMN databases, and to measure the overall accuracy of the computational predictions, both alone and in conjunction with manual curation, we evaluated the prediction of 120 randomly selected pathways in both the released single-species databases in Pathway Tools and naïve prediction versions generated using only computational prediction. Biocurators evaluated the pathway assignments to the 126 organisms currently in PMN and classified them as “Expected”, “Broader”, “Narrower”, or as Non-PMN Pathways (NPPs). In the naïve prediction databases, only 15% of selected pathways were predicted within the phylogenetic ranges expected from the literature, and 58% were NPPs. In the released PGDBs, however, 78% of evaluated pathways were predicted as expected. In addition to correcting the prediction for 94% of all NPPs of the surveyed pathways, incorporating curated information also reduced the percentage of pathways predicted beyond their expected phylogenetic ranges from 13% to 4%. Thus, the application of phylogenetic information and manual curation drastically improves the quality of pathway prediction throughout PMN databases over the use of computational prediction alone.
To determine whether different groups of plants can be differentiated solely by their metabolic capacity, we performed multiple correspondence analysis (MCA), a type of dimension reduction analysis that is similar to principal component analysis but can be used for categorical data. MCA was carried out using presence-absence matrices for pathways, reactions, and compounds. Reactions were considered present only if at least one enzyme in the species was annotated as catalyzing the reaction. Independently, the plants were categorized according to phylogenetic groups. Dimensions 1 and 3 of the pathway and compound MCA, and dimensions 1 and 2 of the reaction MCA, separated the species into several phylogenetic groups.
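The presence rule for reactions described above can be illustrated with a minimal sketch. It is not the code used for the analysis: the function name `reaction_presence_matrix`, the input layout, and the toy species, reaction, and gene identifiers are all hypothetical. It only shows how a species-by-reaction presence-absence matrix could be assembled before being passed to an MCA implementation.

```python
from typing import Dict, List, Set, Tuple


def reaction_presence_matrix(
    annotations: Dict[str, Dict[str, Set[str]]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a species-by-reaction 0/1 matrix.

    `annotations` maps species -> reaction -> set of annotated enzyme IDs
    (an assumed layout, chosen for illustration). A reaction counts as
    present only if at least one enzyme is annotated as catalyzing it.
    """
    species = sorted(annotations)
    reactions = sorted({r for rxns in annotations.values() for r in rxns})
    matrix = [
        # An empty or missing enzyme set is falsy, so the reaction is absent.
        [1 if annotations[s].get(r) else 0 for r in reactions]
        for s in species
    ]
    return species, reactions, matrix


# Hypothetical toy data: RXN-2 is listed for species_A but has no enzyme.
toy = {
    "species_A": {"RXN-1": {"geneA1"}, "RXN-2": set()},
    "species_B": {"RXN-1": {"geneB1", "geneB2"}, "RXN-3": {"geneB3"}},
}
species, reactions, matrix = reaction_presence_matrix(toy)
for name, row in zip(species, matrix):
    print(name, dict(zip(reactions, row)))
```

In this toy input, RXN-2 is scored as absent in both species because no enzyme is annotated for it, which mirrors the rule stated in the text.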
Phylogenetic groups that clearly cluster together and away from other groups include algae, non-flowering plants, Brassicaceae, and Poaceae. Dimension 1 separates the chlorophytes from land plants, and dimension 3 separates certain angiosperm families, such as the Brassicaceae and Poaceae, well. No clear separation was observed among other eudicot groups. In addition, dimension 2 of the pathway and compound MCA mostly separated a small number of highly curated species from all the rest. Overall, the MCA clustering shows that some groups of plants can be readily differentiated based on their metabolic information in PMN, while other groups cannot, suggesting that further curation of species in these groups may be beneficial.
We next asked which metabolic pathways drive the separation of the taxonomic groups on each dimension. Seventy percent of the variance in dimension 1 was described by 109 pathways, all of which were predicted to be either embryophyte-specific pathways or present in a larger proportion of embryophytes than chlorophytes. This mirrors the separation of the Chlorophyta cluster in dimension 1 of the MCA plot. Similarly, 70% of the variance along dimension 3 was captured by 150 pathways, of which 81 were associated more strongly with Poaceae and 69 were associated more strongly with Brassicaceae. The pathways that contributed 95% of the variance in dimension 1, which separates chlorophytes from embryophytes, were enriched for hormone metabolism. Hormone metabolism may have helped support the increased complexity of land plants compared to their algal ancestors. In contrast, pathways responsible for clustering along dimension 3 were enriched for specialized metabolism, which is more lineage-specific than other domains of metabolism and can help distinguish between clades of angiosperms.
Thus, it appears that metabolic data in PMN can effectively differentiate groups of species not only by the presence or absence of specific pathways and reactions, but also by the types of metabolic processes which are related to their evolutionary divergence. One of the major advantages of PMN 15 is the ability to quantitatively and qualitatively analyze omics data in the context of global metabolism for the 126 species represented in the resource.
Here we demonstrate two applications of integrating omics data with PMN 15 to gain novel insights about plant metabolism. Pathway Tools’ Omics Dashboard allows users to visualize omics data across experimental timepoints and conditions, across a broad range of cellular subsystems, and at multiple scales of resolution. To demonstrate the utility of the Omics Dashboard within a metabolic context, we used a transcriptomic survey of two sorghum cultivars, RTx430 and BTx642, subjected to drought stress. RTx430 is tolerant to pre-flowering drought, whereas BTx642 is tolerant to post-flowering drought. To see whether metabolic gene expression differed between the two cultivars in response to post-flowering drought, we examined differentially expressed genes (DEGs) in droughted plants compared to well-watered plants from the last week of watering to the first two weeks of post-flowering drought. We observed a consistent down-regulation of biosynthetic activity in root tissues of the post-flowering drought-sensitive cultivar RTx430, compared to relatively stable expression in the post-flowering drought-tolerant cultivar BTx642. This observation is consistent with the authors’ findings that BTx642 demonstrated higher levels of redox balancing and likely experienced lower levels of reactive oxygen species stress, compared to RTx430, as a result of drought. By analyzing expression patterns of all metabolic genes, we observed a widespread metabolic down-regulation in RTx430 root tissue, which was not reported previously. Drought-responsive DEGs were enriched in metabolic genes in both leaf and root tissues. However, in contrast to the clear cultivar-specific trends shown in the root DEGs, there was no clear trend in expression patterns of metabolic genes in the leaves of either cultivar as a result of drought.
To determine whether the consistent reduction of metabolic gene expression observed in RTx430 roots in response to drought was a global trend in the transcriptome or specific to metabolic genes, we compared relative expression levels of all non-metabolic root DEGs to all metabolic root DEGs in both cultivars during the same 3-week period.
While the average relative expression decreased each week among both metabolic and non-metabolic genes in RTx430, the down-regulation was greater among metabolic genes at both time points. In contrast, BTx642 roots showed no difference in expression between metabolic and non-metabolic genes in response to drought, suggesting a global metabolic homeostasis in sorghum drought tolerance. By comparing the patterns of expression among DEGs in root and leaf tissues, rather than solely the number of DEGs, analysis via the Omics Dashboard revealed that roots exhibited stronger genotype-specific responses to drought than leaves, which was not observed previously.
In addition to offering a visual overview of metabolism via the Omics Dashboard, PMN’s analytical toolkit allows researchers to easily conduct enrichment analyses among a set of genes or compounds of interest. From within a SmartTable, users can view the pathways associated with a set of genes or compounds, and can then ask whether those genes or compounds are enriched for specific pathways. Broader metabolic classifications can also be added to the list of enriched pathways to better understand which areas of metabolism are most enriched. For example, among the set of drought-responsive DEGs in RTx430 roots, we observed an enrichment in various domains of carbohydrate and amino acid biosynthesis and degradation, in addition to chitin degradation, consistent with the authors’ observation of drought-induced responsiveness of biotic defense genes. Thus, by combining PMN’s analytical capabilities with its broad set of metabolic data, users can find additional means of supporting existing hypotheses, uncovering novel insights, and finding new avenues for exploration in their own research.
PMN 15 data can also be integrated with other cutting-edge datasets to investigate novel biological questions. As an example, we integrated A. thaliana root single-cell RNA-seq datasets from five independent studies to examine cell type specificity of metabolic domains and pathways. We define cell type-specific metabolic domains as those whose constituent genes show significantly higher expression levels in certain cell types compared to their average expression level in total cells. Different metabolic domains showed overlapping as well as distinct cell type specificity. First, epidermal and cortex cells were most metabolically active throughout the various domains of metabolism. This is consistent with previous observations that the major groups of metabolites detected in Arabidopsis roots, including glucosinolates, phenylpropanoids, and dipeptides, were highly abundant in the cortex. In contrast, maturing xylem showed relatively low metabolic activity, as the major roles of these cells are structural support and water/solute transport. Viewed from the level of metabolic domains, this analysis demonstrates a diverse range of metabolic activity across unique cell types in Arabidopsis roots.
We next probed cell-type specificity of individual pathways. Among the 198 pathways associated with at least 10 genes, 40 pathways showed specificity in at least one cell type compared to their background gene expression levels, represented by the average expression level of the pathway across all cell types. For example, in actively dividing cells, such as meristematic xylem cells, pathways involved in pyrimidine, histidine, arginine, and lysine biosynthesis showed high activity. These pathways are involved in essential metabolism, which is critical for maintaining cell division and growth. On the other hand, hormone biosynthesis pathways, such as cytokinin glucoside and gibberellin biosynthesis, showed high activity in the cortex. This is consistent with current understanding that the cortex is one of the predominant cell types that synthesizes these two hormones in the Arabidopsis root.
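The idea of comparing a pathway's expression in one cell type against its background across all cells can be sketched as follows. This is an illustration only, under simplifying assumptions: the function name `pathway_specificity`, the input layout, and the toy values are hypothetical, the score is a plain difference of means, and no significance test is applied, whereas the analysis in the text relies on statistical significance.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def pathway_specificity(
    expression: Dict[str, List[float]],  # gene -> expression value per cell
    cell_types: List[str],               # cell type label for each cell
    pathway_genes: List[str],
) -> Dict[str, float]:
    """Per cell type: mean pathway expression minus the mean over all cells.

    Positive scores mean the pathway is expressed above its own background
    in that cell type. This is a descriptive score, not a statistical test.
    """
    # Average the pathway's genes within each cell.
    per_cell = [
        mean(expression[g][i] for g in pathway_genes)
        for i in range(len(cell_types))
    ]
    background = mean(per_cell)
    by_type: Dict[str, List[float]] = defaultdict(list)
    for label, value in zip(cell_types, per_cell):
        by_type[label].append(value)
    return {label: mean(vals) - background for label, vals in by_type.items()}


# Hypothetical toy data: two genes measured in four cells.
toy_expression = {"g1": [4.0, 6.0, 1.0, 1.0], "g2": [6.0, 4.0, 1.0, 1.0]}
toy_labels = ["cortex", "cortex", "xylem", "xylem"]
scores = pathway_specificity(toy_expression, toy_labels, ["g1", "g2"])
print(scores)
```

A real analysis would replace the difference of means with a test appropriate for sparse single-cell counts and would correct for multiple comparisons across pathways and cell types.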
By elucidating cell type-level activity of metabolic pathways, we can begin to map metabolism at cellular and tissue levels, which will be instrumental in understanding how metabolism affects plant development and responses to the environment, as well as in enabling effective engineering strategies.
Similar to cell-type specificity, the concept of pathway divergence at the individual cell level can also be explored using single-cell transcriptomics data. To probe this question, we asked whether isozymes catalyzing the same reaction are more likely to be expressed in different cells compared to enzymes catalyzing different reactions in the same pathway. Isozymes are defined as enzymes encoded by different genes that catalyze the same reaction, and they are usually the result of gene duplication events. We computed Spearman’s correlation coefficient to measure gene expression pattern similarity between a pair of enzymes across Arabidopsis root cells. The coefficients computed based on single-cell data were generally lower than those generated from bulk RNA-seq, which may be due to the sparseness of single-cell transcriptomic profiles or high heterogeneity of gene expression across cells. Nonetheless, metabolic genes in the same pathway showed higher correlation than randomly sampled metabolic genes, which suggests functional coordination between genes involved in the same pathway at the cellular level. Isozymes were much less correlated than enzyme pairs catalyzing different reactions in the same pathway. This indicates that isozymes may have evolved divergent expression patterns in root cells. Since isozymes are often the result of gene duplication events, this diversified expression between isozymes may contribute to retaining duplicated genes through sub-functionalization or neofunctionalization and to fine-tuning metabolic pathways at the cellular level.
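The correlation measure used above can be sketched in plain Python. This is a minimal illustration of Spearman's coefficient, computed as the Pearson correlation of average ranks, and not the pipeline used for the analysis; the function names and the toy expression vectors are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression of two genes across four cells.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]
rho_same = spearman(gene_a, gene_b)
rho_opposite = spearman(gene_a, list(reversed(gene_b)))
print(rho_same, rho_opposite)
```

Single-cell profiles contain many zeros, so ties are common; handling them with average ranks, as here, is one standard choice, and a gene with no variation across cells yields an undefined coefficient rather than a misleading zero.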
Recently we introduced the Pathway Co-Expression Viewer, which integrates information from PMN 15 and ATTED-II, a database of gene co-expression, to visualize co-expression of the genes in a pathway for species represented in ATTED-II, including Solanum lycopersicum, Oryza sativa, Zea mays, Brassica rapa, Vitis vinifera, Populus trichocarpa, and Medicago truncatula. An example is shown in Supplemental Figure S4A-B. Lysine biosynthesis is currently known to occur via two distinct routes, utilizing either diaminopimelate or α-aminoadipate as an intermediate. Its biosynthetic pathway in plants, cyanobacteria, and certain archaebacteria converts tetrahydrodipicolinate to L,L-diaminopimelate via L,L-diaminopimelate aminotransferase and is distinct from that of other prokaryotes and of fungi. Lysine biosynthesis is of particular importance because lysine is both an essential amino acid not biosynthesized by mammals and the least abundant essential amino acid in cereals and legumes. The co-expression levels of six pairs of genes are in the top 1% of co-expressed gene pairs within ATTED-II, while an additional 10 gene pairs are in the top 5%. This tool provides a convenient way of visualizing the co-expression of genes in a pathway and thus provides clues as to how the pathway may be regulated.
PMN 15 introduces an additional feature that provides a new way of visualizing pathways that span intracellular compartments and include transport reactions. For example, the glutamate-glutamine shuttle from AraCyc is a pathway in which glutamate and glutamine are exchanged between the mitochondria and chloroplast as a means of ridding the mitochondria of ammonium produced during photorespiration.