DNA quality was verified by pulsed-field gel electrophoresis. DNA fragments longer than 50 Kb were used to construct a 10X GemCode library on the Chromium instrument, and the library was sequenced at the HudsonAlpha Institute for Biotechnology on a HiSeq X system with paired-end 150 bp reads. Approximately 95 Gb of 10X Chromium library data were sequenced. To increase sequence diversity and depth, three separate mate-pair (MP) libraries were constructed with 2–5 Kb, 5–7 Kb, and 7–10 Kb jumps using the Illumina Nextera Mate-Pair Sample Preparation Kit. In addition, two size-selected Illumina genomic libraries (∼470 bp and ∼800 bp) were prepared using the Illumina TruSeq DNA PCR-free Sample Preparation V2 kit and sequenced. The ∼470 bp library was designed to produce “overlapping libraries”: when sequenced with paired-end 265 bp reads on an Illumina HiSeq 2500 system, the two reads of a pair overlap and can be merged into “stitched” reads of approximately 265–520 bp. The ∼800 bp library was sequenced on an Illumina HiSeq 2500 system with paired-end 160 bp reads, while the MP libraries were sequenced on an Illumina HiSeq 4000 system with paired-end 150 bp reads. A total of ∼433 Gb of additional Illumina sequencing data were generated. Illumina library construction and sequencing were conducted at the Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign.

The genome of “Draper” was assembled using the DeNovoMAGIC software platform, a de Bruijn graph-based assembler designed for high-ploidy, heterozygous, and/or repetitive genomes.
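The “stitched” reads from the ∼470 bp library come from read pairs whose 265 bp ends overlap in the middle of a shorter fragment. The Python sketch below is a minimal, hypothetical illustration of that merge, not the tool used for this assembly: it reverse-complements read 2, looks for the longest suffix/prefix overlap with few mismatches, and concatenates the reads. The function names and thresholds (`min_overlap=20`, at most 10% mismatches) are assumptions, and adapters are assumed to be trimmed already.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stitch_pair(read1, read2, min_overlap=20, max_mismatch_frac=0.10):
    """Merge an overlapping read pair into one fragment, or return None.

    Hypothetical sketch: assumes adapters are trimmed and the fragment is at
    least as long as each read, so the 3' end of read 1 overlaps the start of
    the reverse complement of read 2.
    """
    r2 = revcomp(read2)
    longest = min(len(read1), len(r2))
    # Try the longest overlap first and accept the first one with few mismatches.
    for olen in range(longest, min_overlap - 1, -1):
        tail, head = read1[len(read1) - olen:], r2[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            # Read 1 bases are kept in the overlap; base qualities are ignored.
            return read1 + r2[olen:]
    return None
```

With 265 bp reads, a 470 bp fragment leaves a 60 bp overlap (265 + 265 − 470) and yields a 470 bp stitched read. Production read mergers also reconcile base qualities within the overlap, which this sketch does not attempt.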
The Chromium 10X data were used to phase, elongate, and validate haplotype scaffolds. Four Dovetail Hi-C libraries were prepared as described previously and sequenced on an Illumina HiSeq X system with paired-end 150 bp reads to a total of 90.7X physical coverage of the genome. The de novo genome assembly, raw genomic reads, and Dovetail Hi-C library reads were used as input for HiRise, a software pipeline designed specifically for scaffolding genome assemblies with proximity ligation data. Illumina genomic and Dovetail Hi-C library sequences were aligned to the draft input assembly using a modified SNAP read mapper. HiRise analyzed the separations of Dovetail Hi-C read pairs mapped within draft scaffolds to produce a likelihood model for the genomic distance between read pairs; this model was used to identify and break putative misjoins and to make joins that close gaps between contigs.

Plant tissue samples were collected from blueberry cv. Draper grown in a growth chamber. For the fruit developmental series, three biological replicates of berries at each of seven developmental stages were collected from cv. Draper in a field at the Horticulture Teaching and Research Center, Michigan State University, in July 2017. All plant tissues were immediately flash frozen in liquid nitrogen, and total RNA was isolated using the KingFisher Pure RNA Plant kit. Isolated total RNA was quantified using a Qubit 3 fluorometer. RNA libraries were prepared according to the KAPA mRNA HyperPrep kit protocol. All samples were submitted to the Michigan State University Research Technology Support Facility Genomics Core and sequenced with paired-end 150 bp reads on an Illumina HiSeq 4000 system.

The draft genome of V. corymbosum cv. Draper was annotated using the MAKER annotation pipeline. Transcript and protein evidence used in the annotation included protein sequences downloaded from the A. thaliana and UniProtKB plant databases, V. corymbosum ESTs from NCBI, and transcriptome data assembled with StringTie from different blueberry tissues.
A custom repeat library and Repbase were used to mask repetitive regions in the genome with RepeatMasker. Ab initio gene prediction was performed using the gene predictors SNAP and AUGUSTUS. The resulting MAKER max gene set was filtered to select gene models with a Pfam domain and an annotation edit distance (AED) <1.0. The filtered gene set was further scanned for transposase-coding regions: the amino acid sequences of predicted genes were searched against a transposase database, and alignments attributable to low-complexity sequence were removed. The total length of each gene matching transposases was calculated from the search output, and any gene with more than 30% of its length aligned to transposases was removed from the gene set. To assess the completeness of the annotation, the V. corymbosum MAKER standard gene set was searched against the BUSCO v3 plant dataset. Genes were annotated with Pfam domains using InterProScan v5.26-65.0.

To identify and classify repetitive elements in the genome, LTR retrotransposon candidates were identified using LTRharvest and LTR_FINDER and then filtered and classified using LTR_retriever, which also produced a non-redundant LTR library. Miniature inverted-repeat transposable elements (MITEs) were identified using MITE-Hunter. MITEs were manually checked for target site duplications (TSDs) and terminal inverted repeats (TIRs) and classified into superfamilies; those with ambiguous TSDs and TIRs were classified as “unknown.” Using the MITE and LTR libraries, the V. corymbosum genome was masked with RepeatMasker. The masked genome was further mined for repetitive elements using RepeatModeler. The resulting repeats were then categorized into two groups: sequences with and without known identities. Those without identities were searched against the transposase database, and any with a match were considered transposons.
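The 30% transposase rule for gene models reduces to an interval-coverage calculation per gene. This Python sketch assumes the transposase search output has already been parsed into 1-based, inclusive protein coordinates per gene, with low-complexity hits removed; the `GeneModel` type and the function names are hypothetical. It merges overlapping hits so that a residue matched by two transposases is counted once, which is an assumption about how the “total length” was computed.

```python
from dataclasses import dataclass, field


@dataclass
class GeneModel:
    gene_id: str
    aa_length: int
    # Transposase hits as (start, end) on the protein, 1-based and inclusive;
    # hits caused by low-complexity sequence are assumed to be removed already.
    te_hits: list = field(default_factory=list)


def covered_length(intervals):
    """Number of residues covered by possibly overlapping (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the run
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def filter_transposase_genes(genes, max_fraction=0.30):
    """Drop genes with more than max_fraction of their length aligned to transposases."""
    return [g for g in genes if covered_length(g.te_hits) / g.aa_length <= max_fraction]
```

For a 100-residue protein with hits at 1–20 and 10–25, the merged coverage is 25 residues (25%), so the gene is kept; a single hit spanning 1–40 (40%) would remove it.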
The repeats were then filtered to exclude gene fragments using ProtExcluder and summarized using the fam_coverage.pl script in the LTR_retriever package. The assembly continuity of repeat space was assessed using the LTR Assembly Index (LAI) implemented in the LTR_retriever package. LAI was calculated either for 3 Mb sliding windows (with a 300 Kb step) or for the whole assembly as $\mathrm{LAI} = (\text{intact LTR-RT length} / \text{total LTR-RT length}) \times 100$. To account for the dynamics of LTR retrotransposons, LAI was adjusted by the mean identity of LTR sequences in the genome, based on an all-versus-all blastn search that was also performed by the LAI program.

Illumina adapters were removed from the raw reads using Trimmomatic v0.33, and trimmed reads were filtered using the FASTX-Toolkit. After quality assessment using FastQC, the filtered reads were aligned to the V. corymbosum genome using STAR. For the samples used for annotation, transcripts were assembled de novo using StringTie. Counts of uniquely mapping reads were generated with HTSeq for all 35 RNA-seq datasets. Multi-mapping reads were excluded from the analysis except for the tandem gene expression analysis. Differential gene expression across fruit developmental stages, with three biological replicates per stage, was analyzed using the DESeq2 pipeline. Gene expression values were expressed as fragments per kilobase of transcript per million mapped reads (FPKM) using the standard formula $\mathrm{FPKM} = (\text{fragments mapped to the gene} / \text{total mapped fragments in millions}) / \text{gene length in kilobases (Kb)}$. To construct the gene co-expression network, genes that were not expressed or only very weakly expressed in 30 or more conditions were first excluded from the analysis. The count data were then transformed into variance-stabilized values using the variance stabilizing transformation function in DESeq.
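Starting from a variance-stabilized genes × conditions matrix, the correlation and mutual-rank network construction described next can be sketched in Python with NumPy. Mutual rank (MR) is computed with its usual definition, the geometric mean of the two reciprocal correlation ranks. The text does not state the decay formula, so this sketch assumes the exponential form w = exp(−(MR − 1)/x) used in earlier mutual-rank co-expression work; that form, and the function names, are assumptions rather than the project’s scripts. ClusterONE module detection is not reproduced.

```python
import numpy as np


def pcc_and_mutual_rank(expr):
    """Return (PCC, MR) matrices for a genes x conditions expression matrix."""
    pcc = np.corrcoef(np.asarray(expr, dtype=float))
    n = pcc.shape[0]
    scores = pcc.copy()
    np.fill_diagonal(scores, -np.inf)  # a gene is never ranked against itself
    # order[a] lists the other genes from most to least correlated with gene a
    order = np.argsort(-scores, axis=1, kind="stable")
    rank = np.empty((n, n), dtype=float)
    rank[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    # MR is the geometric mean of the two reciprocal ranks
    mr = np.sqrt(rank * rank.T)
    return pcc, mr


def coexpression_edges(expr, gene_ids, x, min_pcc=0.6, min_weight=0.01):
    """Edges passing both thresholds; weight = exp(-(MR - 1) / x) is an assumed form."""
    pcc, mr = pcc_and_mutual_rank(expr)
    weight = np.exp(-(mr - 1.0) / x)
    edges = []
    for i in range(len(gene_ids)):
        for j in range(i + 1, len(gene_ids)):
            if pcc[i, j] >= min_pcc and weight[i, j] >= min_weight:
                edges.append((gene_ids[i], gene_ids[j], float(weight[i, j])))
    return edges
```

Calling `coexpression_edges` once for each x in (5, 10, 25, 50, 100) would give the five networks. Under this assumed form, MR values one unit apart differ in weight by the constant factor e^(−1/x), so the decay is geometric in MR and steeper for small x. The pairwise Python loop is for readability only; a genome-scale matrix would need vectorized or sparse thresholding.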
Pairwise correlations of gene expression were calculated as the Pearson correlation coefficient (PCC) and mutual rank (MR) using scripts available for download from the project’s data repository. MR scores were transformed to network edge weights using a geometric decay function with decay parameter x; five different co-expression networks were constructed with x set to 5, 10, 25, 50, and 100, respectively. Edges with PCC <0.6 or edge weight <0.01 were excluded. For each network, modules of co-expressed genes were detected using ClusterONE v1.0 with default parameters, and modules with a P value >0.1 or a quality score <0.2 were excluded. The results from all co-expression networks were then combined by collapsing modules into metamodules of non-overlapping gene sets.

Total antioxidant capacity of tissues from the fruit developmental panel was analyzed using the oxygen radical absorbance capacity (ORAC) assay. Briefly, ∼20–30 mg of frozen ground fruit tissue was weighed for each sample prior to extraction. Extractions were performed on ground tissue using 1.8 mL of ice-cold 50% acetone. Samples were vortexed, placed on a shaker for 5 minutes at room temperature, and then centrifuged at 4°C for 15 minutes. The ORAC assay was performed in a 96-well black microplate using a FLUOstar OPTIMA microplate reader. Each reaction well contained 150 μL of 0.08 μM fluorescein and 25 μL of either 75 mM phosphate buffer (blank), Trolox standards, or diluted sample extracts. For blueberry tissue samples, dilutions of 1:20 to 1:80 were used. After all appropriate wells were loaded, the microplate was placed in the microplate reader and incubated for 10 minutes at 37°C. Following incubation, 25 μL of 150 mM AAPH was added to each well, and fluorescence measurements began immediately. Fluorescence was read in 90-second cycles for 70 cycles, until the fluorescent probe signal was completely quenched.
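The fluorescence decay curves collected this way are reduced to antioxidant capacity by three calculations described in the next paragraph: a net AUC (sample minus blank), a linear Trolox standard curve, and a coefficient-of-variation check between technical replicates. The Python sketch below shows one hypothetical way to perform them. Normalizing each curve to its first reading and integrating with the trapezoidal rule over 1.5-minute (90 s) cycles are assumptions, as are the function names.

```python
import numpy as np


def curve_auc(fluorescence, minutes_per_cycle=1.5):
    """Trapezoidal AUC of a fluorescence decay curve normalized to its first reading."""
    rel = np.asarray(fluorescence, dtype=float)
    rel = rel / rel[0]
    return float(((rel[1:] + rel[:-1]) / 2.0).sum() * minutes_per_cycle)


def net_auc(sample_curve, blank_curve, minutes_per_cycle=1.5):
    """Sample AUC minus blank (buffer-only) AUC."""
    return curve_auc(sample_curve, minutes_per_cycle) - curve_auc(blank_curve, minutes_per_cycle)


def trolox_equivalents(sample_net_auc, trolox_conc, trolox_net_auc, dilution_factor):
    """Convert a net AUC to Trolox equivalents via a linear standard curve."""
    slope, intercept = np.polyfit(trolox_conc, trolox_net_auc, 1)
    # Concentration in the diluted extract, scaled back to the undiluted extract.
    return float((sample_net_auc - intercept) / slope * dilution_factor)


def technical_replicates_ok(values, max_cv=0.20):
    """True if the coefficient of variation across technical replicates is below max_cv."""
    v = np.asarray(values, dtype=float)
    return bool(v.std(ddof=1) / v.mean() < max_cv)
```

For example, with a standard curve of net AUC = 0.5 × [Trolox] + 1, a sample net AUC of 6 corresponds to 10 μM Trolox in the well-level extract, or 200 μM after a 1:20 dilution is accounted for. Converting that to a per-gram value would additionally need the extraction volume and tissue weight.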
The total antioxidant capacity of a sample was calculated by subtracting the area under the curve (AUC) of the blank from the AUC of the sample to obtain the net AUC. A standard curve was generated using Trolox of known concentrations, and the total antioxidant capacity of each sample was expressed as Trolox equivalents. Each sample was run twice as two technical replicates, and the coefficient of variation between technical replicates was required to be less than 0.20. Biological replicates were run for all tissues in the fruit developmental series.

Berries from “Draper” were collected as described above. Approximately 100 mg of each frozen ground sample was resuspended in extraction solvent in a 2 mL tube. Ground tissue was immediately mixed thoroughly to prevent thawing during extraction and to prevent enzymatic metabolism of analytes in the samples. All tubes were centrifuged for 10 minutes at 13,000 × g to pellet protein and other insoluble material, and 1 mL of supernatant was then transferred to an autosampler vial. Anthocyanin content was evaluated by LC-MS as follows: 5 μL of sample extract was separated using a 10-minute gradient on a Waters Acquity HSS-T3 UPLC column on a Waters Acquity UPLC system interfaced with a Waters Xevo G2-XS quadrupole time-of-flight mass spectrometer. Column temperature was maintained at 40°C, and the flow rate was 0.3 mL/min with starting conditions of 100% solvent A and 0% solvent B. The gradient was as follows: hold at 100% A for 0.5 minutes, ramp to 50% B at 6 minutes, ramp to 99% B at 6.5 minutes, hold at 99% B until 8.5 minutes, return to 100% A at 8.51 minutes, and hold at 100% A until 10 minutes. Mass spectra were acquired using positive-ion electrospray ionization over m/z 50–1500 in continuum mode, with a data-independent MS^E method that acquires data under both low and high collision energy conditions; the high collision energy was ramped from 20 to 80 V.
Capillary voltage was 3 kV, desolvation temperature was 350°C, source temperature was 100°C, cone gas flow was 25 L/hr, and desolvation gas flow was 600 L/hr. Mass drift was corrected by continuous infusion of the lock mass compound leucine enkephalin. Anthocyanins and other related flavonoids were identified based on accurate mass and fragmentation pattern. Peak areas were determined using QuanLynx within the MassLynx software package. Relative anthocyanin content was calculated for each sample as (peak area of the compound / peak area of the internal standard) / weight of extracted tissue.

Cardiovascular disease (CVD) remains the leading cause of mortality in the United States and, after decades of decline, is rising coincident with the increase in obesity, insulin resistance, and diabetes that characterizes cardiometabolic risk (CMR). Notwithstanding hereditary predisposition, reducing identified, modifiable lifestyle risk factors can reverse CMR and CVD. An estimated 45.4% of all cardiometabolic deaths in the United States due to heart disease, stroke, and diabetes are associated with suboptimal intake of 10 dietary factors. Fewer than 1% of American children and adolescents meet the full recommended metrics for heart-healthy nutrition, falling especially short of recommended intakes of fruits, vegetables, fiber, and essential fatty acids. Intensive pediatric lifestyle interventions for obesity achieve significant reductions in body mass index but do not elicit stable changes in nutrition habits in children and adolescents. These findings suggest a critical need for innovative tools to improve diet quality in youth.
We have previously shown that consuming a whole-food-based nutrient bar twice daily for two weeks significantly increased high-density lipoprotein cholesterol (HDL-C) in generally healthy, insulin-sensitive lean and overweight adults, primarily because of a 28% increase in large HDL particles. The bar consisted of a blueberry, dark chocolate, red grape, and walnut matrix with soluble and insoluble fiber, supplemented with vitamins, minerals, and essential long-chain fatty acids.