The biased expression of the maternal nuclear copy would resolve any potential conflicts

Our results were also consistent with those previously reported in goldfish, which suggests that the most common mechanism for duplicate gene retention in these allopolyploid cyprinines since their 4R event is dosage constraint. However, mechanisms for duplicate gene retention are not strictly inferable because of a multilevel set of phenomena that range across WGD.

Allopolyploids face the unique challenge of integrating two subgenomes, which evolved independently in the diploid progenitors since their most recent common ancestor and now reside in a single nucleus. One way to resolve potential genetic or epigenetic conflicts is “subgenome dominance”, in which one subgenome is dominant over the ‘submissive’ subgenome. The dominant subgenome not only has higher gene expression but also retains a greater number of ohnologs than the submissive subgenome. To better understand the dynamics among subgenomes of our three sequenced allotetraploids, we compared their gene loss, gene expression levels, the density of TEs near ohnologs, constraint on conserved noncoding sequences, DNA methylation patterns and 3D genome structure. To examine gene fractionation differences among subgenomes, gene retention patterns between the two subgenomes of the three allopolyploid species were examined relative to the diploid references from zebrafish, O. macrolepis and Sc. acanthopterus.

These results revealed that in all cases, the maternal subM showed slightly higher gene retention rates than the paternal subP. Compared to the reference zebrafish, subM showed 2.815% higher gene retention in L. capito, 0.427% higher gene retention in P. rabaudi, and 0.819% higher gene retention in S. sinensis relative to subP. However, these patterns are not supportive of strong subgenome dominance patterns such as those reported in some plant allopolyploids. Ohnolog retention bias of certain sets of genes, including BUSCO genes, towards one subgenome has recently been reported for the Prussian carp, goldfish and common carp. Similarly, we found that the number of BUSCO singleton genes in maternal subM was significantly higher than in subP for all three allotetraploids. For example, subM of S. sinensis has retained 609 complete and single-copy BUSCO genes, compared to only 448 in subP. Next, we performed GO analysis of the genes that returned to single copy in subP and subM. Functional enrichment analysis revealed that similar GO term classes were identified for all species, including mitochondrial-related processes, nc/rRNA processes and DNA repair. These GO terms were also identified as returning to singleton state post-WGD in a previous analysis of the Prussian carp, goldfish, and common carp genomes.

A previous study investigating subgenome dominance in octoploid strawberry revealed that the dominant subgenome retained a significantly greater number of tandem duplicated genes. Here, we uncovered a similar pattern, with retained tandem gene duplications biased towards the maternal subM in all three allotetraploid cyprinids.

First, significantly more tandem duplicates are encoded on the maternal subM than on the paternal subP. Second, a greater number of tandem gene arrays were observed in the maternal subM than in the paternal subP. Lastly, the maternal subM genomes contained a greater number of larger tandem arrays than the paternal subP. An analysis of protein family domains revealed an enrichment of functions associated with the immune system for retained tandem duplicates in these subgenomes.

We also tested the hypothesis that DNA methylation patterns in genes and TEs in the extant relatives of diploid progenitor species, and thus in the subgenomes within allotetraploids, may explain observed subgenome expression bias patterns. Whole-genome bisulfite sequencing of muscle tissue from two diploid ancestors and three allotetraploids was performed. Levels of CH methylation were very low in genes of all five species, which was also observed in the common carp genome and is typical of somatic tissues in humans. Therefore, we focused on CG methylation for all subsequent analyses. A similar pattern of CG methylation was observed within the gene body and 2 kb flanking regions in the diploid Sc. acanthopterus and the three allotetraploid species. However, for O. macrolepis, CG methylation levels were much lower from ~1 kb upstream to the transcriptional start site and higher throughout the gene body and ~1 kb downstream. No difference in CG methylation was observed between the diploids and the subgenomes of the tetraploid species. However, this analysis of the entire set of ohnologs may obscure more subtle differences.

To examine this, we next analyzed CG methylation for genes with biased expression in muscle tissue towards either the paternal subP or the maternal subM. Interestingly, CG methylation levels of genes with expression biased towards subP were lower from ~1.5 kb upstream to the transcriptional start site (TSS) compared with subM levels. Similarly, the upstream region of subM-biased genes for all species showed lower CG methylation levels in this same region than the corresponding regions of the duplicated genes in subP. This suggests that CG methylation levels in upstream regions of genes may have a role in the observed expression bias towards a particular subgenome. Further, to determine whether there are any significant differences in TE CG methylation between subP and subM of the tetraploid species, we investigated CG methylation of TEs located within 1 kb of 7040 positionally conserved syntenic ohnologs and at the whole-genome level. We found some degree of variation in mCG levels between subgenome TEs located within 1 kb of the 7040 duplicate orthologs. However, elevated levels of TE methylation in subP were observed in L. capito, which was opposite to what was observed in S. sinensis and P. rabaudi, where subM showed higher methylation levels. This phenomenon was also observed for TE methylation at the whole-genome level. This opposite trend of TE methylation in L. capito compared with S. sinensis and P. rabaudi can be attributed to the difference in TE density of the respective genomes.

Cypriniformes represent the largest clade of freshwater fish, with ~600 described species in the family Cyprinidae, which has experienced multiple rounds of independent WGD. The phylogenetic relationships, evolutionary history, and genetic basis of previously reported subgenome dominance of these polyploids have remained poorly understood.
In this study, high-quality genomes of twenty-one cyprinid fishes, including subgenome-resolved allotetraploid genomes from three tribes, were de novo assembled and analyzed to investigate subgenome evolution at the genetic and epigenetic levels. Our results are supportive of previous reports of subgenome dominance at both the gene retention and transcriptome levels. In addition, we observed that the dominant subgenome retained a greater number of tandem duplicates with a functional bias towards immune-related processes. Our phylogenetic analyses revealed that S. sinensis, L. capito, and P. rabaudi are allopolyploids and that the observed dominance is consistently towards the subgenome contributed by the maternal parent. Also, the most recent polyploid event in P. rabaudi is likely shared with common carp and goldfish. Functional enrichment analyses revealed similar GO term classes, including mitochondrial-related processes, for the genes that returned to single copy in all examined allopolyploids. The observed consistent bias towards the maternal subgenome donor, alongside the bias towards mitochondrial functions, suggests that observed subgenome dominance patterns in these allopolyploid fish may be due to maternal dominance. Maternally contributed nuclear-encoded genes that interact with mitochondrial-encoded genes may be favored to maintain proper cytonuclear interactions. The mitochondrial proteome contains products from over a thousand genes, while the mitochondrial genome encodes only about 13 proteins.

The vast majority of these genes are now encoded in the nuclear genome following horizontal gene transfer from the organellar genome to the nuclear genome over the past hundred million years. However, these nuclear genes might encode dosage-sensitive proteins that function in either organellar signaling networks or macromolecular complexes that must maintain proper stoichiometric balance with interacting partners that are encoded in the organellar genome. Furthermore, the sequences of the proteins encoded by both organellar and nuclear-encoded mitochondrial genes may have diverged among the diploid progenitors. Thus, incompatibilities may arise from “mismatches” between the genes contributed by the paternal subgenome and the organellar genomes contributed by the maternal parents in allopolyploids. The model that we propose here is that the observed dominance patterns in these allopolyploids preserve proper cytonuclear interactions and, ultimately, core cellular functions. Nonetheless, we cannot exclude the possibility that some of the observed subgenome expression differences, particularly at the individual gene level, are due to differences in DNA methylation and transposable element density, as hypothesized in previous studies. We observed that methylation levels at CG sites in upstream regions of genes, ~1.5 kb upstream of the transcriptional start site, may have a role in the observed expression bias towards the maternal subM genome. Epigenetic factors, including changes in methylation at certain CG sites, have previously been shown to alter gene expression and to be involved in maternal imprinting, including that of nuclear-encoded mitochondrial and DNA repair genes. We also observed that the dominantly expressed ohnolog, from either subgenome, in some cases had significantly lower TE densities.
This suggests that both maternal dominance and TE differences are likely contributing to the independently repeated subgenome dominance patterns observed in allopolyploid cyprinid fishes. To the best of our knowledge, this is the first study to show the potential role of maternal dominance in contributing to subgenome dominance in any allopolyploid animal. Future studies of other allopolyploids are needed to determine whether these observed patterns are shared by other polyploid animals or are potentially unique to cyprinids. Furthermore, our multi-species comparisons suggest that genetic divergence of the diploid progenitors, for the allopolyploids and divergence times examined in this study, did not contribute to subgenome expression dominance. However, it is important to note the possibility that the divergence of the diploid species in each allopolyploid was not sufficiently different to observe additive subgenome expression dominance effects. Lastly, we also examined genome organization using Hi-C data and selective constraints on noncoding regulatory sequences, which revealed no significant differences among subgenomes. These new reference genomes and various datasets should serve as a powerful platform for the community to further investigate genome evolution of cyprinids, and as a valuable resource for a wide range of studies, including modeling human disease.

Genomic DNA degradation and contamination were monitored on agarose gels. DNA purity was checked using the NanoPhotometer spectrophotometer. DNA concentration was measured using the Qubit DNA Assay Kit in a Qubit 2.0 Fluorometer. Microgram amounts of genomic DNA spiked with lambda DNA were fragmented by sonication to 200–300 bp with a Covaris S220, followed by end repair and adenylation. Cytosine-methylated barcodes were ligated to the sonicated DNA as per the manufacturer’s instructions.
These DNA fragments were then treated twice with bisulfite using the EZ DNA Methylation-Gold™ Kit, before the resulting single-stranded DNA fragments were PCR-amplified using KAPA HiFi HotStart Uracil + ReadyMix. Library concentration was quantified with a Qubit 2.0 Fluorometer, and libraries were sequenced on the NovaSeq platform, producing 24.39–55.95 Gb of raw bases with a bisulfite conversion rate of 99.57–99.75%. MethylC-seq data for each sample were aligned to their respective genomes and methylation was called using the methylpy pipeline v.1.4.6. This pipeline uses Cutadapt v.4.1 for adapter trimming, Bowtie 2 v.2.4.4 for alignment, and Picard v.2.26.10 to mark duplicate reads. Spiked-in unmethylated lambda phage DNA was used as a control to calculate non-conversion rates from bisulfite treatment. Gene and TE metaplots were made as previously done using custom scripts and pybedtools v.0.9.0. Gene/TE bodies were divided into 20 bins, and the weighted methylation level was calculated across all genes/TEs. For gene bodies, only exonic cytosines were included. This process was repeated for both the 2 kb upstream and downstream regions, and the data were plotted in R with ggplot2. To examine the effects of neighboring TEs on genic methylation, we used bedtools v.2.30.0 to identify genes with an intersecting TE within 1 kb.

Domestication has altered the interactions among crop plants, herbivorous insects, and higher trophic levels of agroecosystems. While selection has resulted in enhanced agronomic traits like yield, defensive traits such as toxic compounds have been reduced or removed. A consistent pattern of reduced defense has not been established across species, but crop plants tend to be more vulnerable to herbivory than their wild relatives. Domesticated beans may be more attractive to insect herbivores than their wild relatives, but they are also more effective at recruiting parasitoids.
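As an illustration of the MethylC-seq metaplot procedure described in the methods above, the following is a minimal Python sketch, not the authors' custom scripts. It assumes that "weighted methylation level" means methylated reads summed over all cytosines in a bin, divided by total reads summed over the same cytosines, pooled across features. The function names (`bin_index`, `weighted_metaplot`, `non_conversion_rate`) and the tuple-based input layout are hypothetical and chosen only for this example.

```python
def bin_index(pos, start, end, strand, n_bins=20):
    """Map a 0-based position in [start, end) to a bin ordered 5' to 3'."""
    idx = (pos - start) * n_bins // (end - start)
    # Features on the minus strand are flipped so bin 0 is always the 5' end.
    return idx if strand == "+" else n_bins - 1 - idx


def weighted_metaplot(features, n_bins=20):
    """Pool read counts per bin across features and return weighted levels.

    features: iterable of (start, end, strand, sites), where sites is a list
    of (position, methylated_reads, total_reads) for cytosines in the feature.
    Bins with no coverage are reported as None.
    """
    meth = [0] * n_bins
    cov = [0] * n_bins
    for start, end, strand, sites in features:
        for pos, mc, total in sites:
            if start <= pos < end:
                b = bin_index(pos, start, end, strand, n_bins)
                meth[b] += mc
                cov[b] += total
    return [m / c if c else None for m, c in zip(meth, cov)]


def non_conversion_rate(lambda_sites):
    """Fraction of reads still called methylated on unmethylated lambda DNA."""
    mc = sum(m for _, m, _ in lambda_sites)
    total = sum(t for _, _, t in lambda_sites)
    return mc / total


if __name__ == "__main__":
    # Toy data: one plus-strand and one minus-strand feature, two sites each.
    toy = [
        (0, 100, "+", [(2, 8, 10), (97, 1, 10)]),
        (200, 300, "-", [(202, 0, 10), (298, 2, 10)]),
    ]
    print(weighted_metaplot(toy))
    print(non_conversion_rate([(1, 1, 200), (2, 0, 300)]))
```

Pooling counts before dividing gives more weight to well-covered cytosines than averaging per-site ratios would. The same function could be applied separately to the 2 kb upstream and downstream windows by passing those intervals as features.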