The class II DNA transposon content is 34.0%. This high percentage of class II DNA transposons appears to be present in three lineages of rust fungi: the Melampsoraceae, Pucciniaceae and Phakopsoraceae. The recently assembled large genome of the rust fungus Austropuccinia psidii, in the family Sphaerophragmiaceae, however, seems to have expanded mainly through retrotransposons. This illustrates that TEs exhibit different evolutionary trajectories in different taxonomic families of rust fungi. Over 80% of the P. pachyrhizi genome is composed of only two superfamilies of TEs: long terminal repeat (LTR) and terminal inverted repeat (TIR). The largest single family of TEs is the Gypsy retrotransposons, comprising 43% of the entire genome. To understand the evolutionary dynamics of the different TE families present in the P. pachyrhizi genome, we compared the sequence similarity of TEs with their consensus sequences in the three genomes, which ranges from 65 to 100% sequence identity. Based on the concept of burst-and-decay evolution of TEs, the extent of sequence similarity between each TE copy and its cognate consensus reflects the divergence time of the copies, with lower identity indicating older insertions. This approach allows us to compare the relative within-genome ages of TE insertions, using the consensus of each TE family as a proxy for the ancestral sequence. TEs were categorised as conserved TEs, intermediate TEs and divergent TEs. The TE composition across the three isolates is 13.2–18.3% conserved, 29.4–29.9% intermediate and 51.7–57.3% divergent. The Gypsy retrotransposon composition across the three isolates is 16.5–20.7% conserved, 30.4–31.03% intermediate and 48.8–52.5% divergent. Similarly, the TIR composition across the three isolates is 12.2–18.4% conserved, 29.0–29.7% intermediate and 51.8–57.8% divergent. This suggests that i) multiple waves of TE proliferation have occurred during the history of the species, ii) the invasion of the two major TE families into the P. pachyrhizi genome is not a recent event, and iii) the presence of conserved TEs indicates ongoing bursts of expansion of TEs in the P. pachyrhizi genome.
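The binning of TE copies by identity to their family consensus can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names are hypothetical, the composition is computed by copy count rather than by sequence length, and the two identity cut-offs (95% and 85%) are illustrative placeholders, since the thresholds separating the three categories are not stated in this section.

```python
from collections import Counter

CATEGORIES = ("conserved", "intermediate", "divergent")


def categorise_te(identity_pct, conserved_min=95.0, intermediate_min=85.0):
    """Assign one TE copy to an age category from its % identity to the consensus.

    The default cut-offs are assumptions made for illustration only.
    """
    if identity_pct >= conserved_min:
        return "conserved"
    if identity_pct >= intermediate_min:
        return "intermediate"
    return "divergent"


def te_composition(identities, conserved_min=95.0, intermediate_min=85.0):
    """Return the percentage of TE copies falling in each category."""
    identities = list(identities)
    if not identities:
        raise ValueError("at least one TE copy is required")
    counts = Counter(
        categorise_te(i, conserved_min, intermediate_min) for i in identities
    )
    total = len(identities)
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}


if __name__ == "__main__":
    # Toy identities (percent) for four TE copies of one family.
    print(te_composition([99.0, 96.5, 90.0, 70.0]))
```

Under the burst-and-decay model, a genome dominated by the divergent category points to older proliferation, while a non-zero conserved fraction is compatible with recent or ongoing activity.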
Therefore, the proportion and distribution of TEs indicate that different categories of TEs differentially shaped the genomic landscape of P. pachyrhizi at different times in its evolutionary history. We set out to date the Gypsy and Copia TEs in P. pachyrhizi using a TE insertion age estimation. We observe that most retrotransposon insertions were dated to less than 100 million years ago (Mya). We therefore decided to perform a more granular study, taking 1.0-million-year intervals over this period. We approximated the start of TE expansion at around 65 Mya, after which the TE content gradually accumulated. We see a more rapid expansion of TEs in the last 10 Myr; indeed, over 40% of the Gypsy and Copia TEs in the genome seem to have arisen between today and 5 Mya. The climatic oscillations of the past 3 Myr are well known as a period of differentiation for multiple species. Therefore, the genome expansion through waves of TE proliferation in P. pachyrhizi correlates with periods in which other species, including its legume hosts, started their main radiation and differentiation due to external stressors. This suggests that TEs play an important role in generating the variation needed for adaptation to various stressors and/or that proliferation of TEs is triggered by stressful events. Although a clear causal or mechanistic role of TEs in adaptation is still lacking, as in many other systems, it is clear that TEs have had a major impact on the architecture of the P. pachyrhizi genome.

To build a high-quality resource that can facilitate future in-depth analyses, we combined, within the consortium, several robust, independently generated RNAseq datasets from all three isolates that include the major soybean infection stages and in vitro germination. Altogether, eleven different stages are captured, seven of which overlap between two or more isolates, representing a total of 72 different transcriptome datasets.
These data were used to support the prediction of gene models with the de novo annotation pipeline of JGI MycoCosm. Proteins secreted by the pathogen that affect the outcome of the host-pathogen interaction are called effectors and are of particular interest. We used a variety of complementary methods to identify 2,183, 2,027 and 2,125 secreted proteins encoded within the genome assemblies of K8108, MT2006 and UFV02, respectively. In P. pachyrhizi, depending on methodology, 36.73–42.30% of these secreted proteins are predicted to be effectors. We identified 437 common secreted proteins that are differentially expressed at at least one time point in planta, of which 246 are predicted to be effectors, providing a robust set of proteins to investigate in follow-up functional studies. We performed expression analysis on the annotated TEs and observed that 6.66–11.65% of TEs are expressed in the three isolates.
We compared TE expression between the infection stages and the in vitro stages, using the in planta RNAseq data from the isolates K8108 and UFV02. A relatively small subset of TEs is expressed during the early infection stages, between 10 and 72 hours post-inoculation (HPI). Remarkably, for this subset, we observed a 20- to 70-fold increase in expression compared with the spore and germinated-spore stages, with expression levels peaking at 24 HPI. To estimate the impact of insertion age on this in planta-induced TE subset, we performed expression analysis on the conserved, intermediate and divergent TEs. Although there is a slight overrepresentation of the conserved TEs, several intermediate and divergent TEs are also highly expressed during 10–24 HPI. To compare the expression profile of this subset of TEs with that of the predicted effectors, we used the 246 core effectors and compared these with 25 known, constitutively expressed housekeeping genes across the three isolates. We found that both TE and effector expression peaked at 24 HPI. While expression of effectors remained higher than that of the 25 selected housekeeping genes during infection, expression of TEs started to be repressed after 72 HPI. This observation would corroborate the hypothesis of stress-driven TE derepression observed in other pathosystems. However, it also shows that in P. pachyrhizi only a small percentage of the TEs are highly expressed during early infection stages. In several phytopathogenic species, a distinct genomic organization or compartmentalization can be observed for effector proteins. Examples include the bipartite genome architectures of Phytophthora infestans and Leptosphaeria maculans, in which gene-sparse, repeat-rich compartments allow rapid adaptive evolution of effector genes. Other fungi display other organizations, such as virulence chromosomes or lineage-specific regions.
However, when interrogating both the genomic location and the genomic distribution of the predicted candidate effector genes in P. pachyrhizi, we could not detect an analogous type of organization. In addition, we did not observe evidence of a specific association between TE superfamilies and secreted protein genes, as has been observed in other fungal species. Additional analyses comparing the distance between BUSCO genes and genes encoding secreted proteins also showed no specific association. Therefore, despite the large genome size and high TE content of P. pachyrhizi, its genome appears to be organized in a similar fashion to those of other rust fungi with smaller genome sizes. The lack of detection of a specific association between TEs and genes in P. pachyrhizi may be due to the level of TE invasion, with 93% TE observed for this genome.

Rust fungi are dikaryotic; therefore, variation can exist both between isolates and between the two nuclei present in each cell of a single isolate. Long-term asexual reproduction is predicted to promote divergence between alleles of loci, which in principle can increase indefinitely. Some rusts can reproduce both sexually and asexually, leading to mixed clonal/sexual reproduction. In the rust fungus P. striiformis f. sp. tritici, asexual lineages showed a higher degree of heterozygosity between the two haploid nuclei than sexual lineages.
In the case of P. pachyrhizi, there are clear indications that the population is propagating asexually in South America, based on early studies using simple-sequence repeats and internal transcribed spacer sequences. Our data, utilizing high-coverage raw Illumina data, corroborate these earlier studies, as we observed high levels of heterozygosity: 2.47% for UFV02, 1.61% for K8108 and 1.43% for MT2006. This was further corroborated by mapping the Illumina reads to the genome assembly. In total, 283,355, 359,939 and 458,719 SNPs were identified from K8108, MT2006 and UFV02, respectively. The average density of heterozygous SNPs across the genome is 2.97 SNPs per kb in UFV02, compared to 2.58 and 3.34 SNPs per kb in K8108 and MT2006, respectively. We subsequently studied the structural variation as well as the haplotype variation between the three isolates. Remarkably, the structural variation between the haplotypes of UFV02 is 163.3 Mb, while the variation between the complete genomes of the three isolates is 8 to 13 Mb. For example, the total number of repeat expansions and contractions is 7 and 16 times higher between the haplotypes than between the isolates. To look at this inter-haplotype variation in more detail, we selected contigs larger than 1 Mb to study large syntenic blocks between isolates and haplotigs. The largest of these contigs, the 1.3 Mb contig 148 from UFV02, has synteny with contig 5809 from K8108 and with contigs 220 and 362 from MT2006, but not with its haplotig counterpart within UFV02, which indicates a lack of recombination between haplotypes. This corroborates earlier studies indicating that P. pachyrhizi reproduces only asexually in South America. The monopustule isolates K8108, MT2006 and UFV02 were collected at different times and in different geographical locations. To study SNP variation, we mapped the Illumina data of all three isolates to the reference assembly of UFV02.
Given the high level of heterozygosity and TE content, we focused our analysis on the now annotated exome space. After removal of SNPs shared between either all three or two of the isolates, we identified only three non-synonymous mutations unique to K8108, eight unique to MT2006 and five unique to UFV02. Of these 16 predicted genes, ten showed evidence of expression in our transcriptome analyses. This low total number of non-synonymous mutations within exons between the isolates may appear counterintuitive given the differences in time and space between the collection of these isolates. Nonetheless, it is likely that other single-pustule isolates identified from another field would yield a similar number of mutations. Approximately 6 million spores may be produced per plant in a single day, resulting in 3 × 10¹² spores per hectare per day. Therefore, the ability to generate variation through mutation should not be underestimated. We observed an enrichment of mutations in the upstream and downstream regions of protein-coding genes, similar to other rust fungi. In contrast to the low number of mutated exons, the number of uniquely expressed genes between the three isolates is relatively high when compared to the core set of differentially expressed genes. This may reflect a mechanism in which transcriptional variation is generated via modification of promoter regions, which would have the advantage that coding sequences that are not beneficial in a particular situation can be “shelved” for later use. This would result in a set of differentially transcribed genes for different isolates, and a core set of genes that are transcribed in each isolate.

We subsequently set out to identify expanding and contracting gene families within P. pachyrhizi. To this end, a phylogenetic tree of 17 selected fungal species was built using 408 conserved orthologous markers. We estimated that P. pachyrhizi diverged from its most recent common ancestor 123.2–145.3 million years ago, a time frame that coincides with the evolution of the Pucciniales. We derived gene families, including orthologues and paralogues, from a diverse set of plant-interacting fungi and identified gene gains and losses using computational analysis of gene family evolution. Genomes of rust fungi, including P. pachyrhizi, underwent more extensive gene losses than gains, as would be anticipated for obligate biotrophic parasites. In total, we identified 2,366 contracted families and 833 expanding families within UFV02, including 792 and 669 families with PFAM domains, respectively. The most striking and significant contraction in the P. pachyrhizi genome is related to DEAH helicase, which is involved in many cellular processes, e.g., RNA metabolism and ribosome biogenesis. In contrast, significant expansions were found in 12 gene families, including genes encoding glutamate synthase, GMC oxidoreductase and CHROMO domain-containing proteins.