Comparative analysis of the complete chloroplast genomes of six threatened subgenus Gynopodium (Magnolia) species | BMC Genomics | Full Text
In this study, the coverage depth of each organelle genome exceeded 100× (Magnolia omeiensis: 168×, M. sinica: 102×, M. nitida: 132×, M. kachirachirai: 103×). The six CPGs within the subgenus Gynopodium ranged in size from 160,027 bp (M. kachirachirai) to 160,114 bp (M. lotungensis) (Table 1). All CPGs had a typical quadripartite circular structure (Fig. 1), comprising an LSC region and an SSC region separated by a pair of IR regions (Fig. 1 and Table 1). The LSC region ranged from 88,130 bp (M. kachirachirai) to 88,170 bp (M. yunnanensis), the SSC region from 18,725 bp (M. kachirachirai) to 18,767 bp (M. lotungensis), and the IR regions from 26,571 bp (M. sinica) to 26,586 bp (M. kachirachirai) (Table 1). The GC content was similar in all six CPGs. The GC content of the whole plastid sequence was 39.3%; the GC content of the IR regions was 43.2%, higher than that of the LSC and SSC regions (38% and 34.3%, respectively) (Table S1). In addition, 131 genes were annotated in all six CPGs, including 37 transfer RNA (tRNA) genes, 8 ribosomal RNA (rRNA) genes, and 86 protein-coding genes (Fig. 1 and Table 1). Seven of the protein-coding genes, seven of the tRNA genes, and four of the rRNA genes were present in two copies; the other 95 genes were all represented by single copies. Eleven genes possessed introns: rps16, rps12, rpoC1, rpl2, rpl16, petB, petD, ndhB, ndhA, clpP1, and atpF (Table 2).
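The region lengths and gene tallies quoted above can be cross-checked with simple arithmetic. The Python sketch below is only an illustration: it uses the values reported for M. kachirachirai and the gene counts from the text, and the function and variable names are hypothetical, not part of the original study.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir

# Region lengths (bp) reported in the text for M. kachirachirai.
total = quadripartite_length(lsc=88_130, ssc=18_725, ir=26_586)
print(total)  # 160027

# Gene tallies reported in the text.
protein_coding, trna, rrna = 86, 37, 8
annotated = protein_coding + trna + rrna   # all annotated genes, both IR copies counted
duplicated = 7 + 7 + 4                     # genes present in two copies
single_copy = annotated - 2 * duplicated   # genes represented by a single copy
print(annotated, single_copy)  # 131 95
```

The computed total matches the 160,027 bp reported for M. kachirachirai, and the tallies reproduce the 131 annotated and 95 single-copy genes.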
Gene map of the CPGs of six subgenus Gynopodium species. The genes inside and outside of the circle are transcribed in the clockwise and counterclockwise directions, respectively. Genes belonging to different functional groups are shown in different colors. The darker gray area in the inner circle indicates the GC content and the lighter gray indicates the AT content of the genome. The thick lines indicate the extent of the inverted repeats (IRa and IRb) that separate the genomes into the small single-copy (SSC) and large single-copy (LSC) regions
The alignments indicated high sequence similarity among the CPGs of the six subgenus Gynopodium species. However, sequence divergence was greater in non-coding regions, such as trnH–psbA, rps2–rpoC2, ycf4–cemA, petA–psbJ, and ccsA–ndhD, than in coding regions (Fig. 2). The greatest variation among coding regions was observed in ycf1. No major genomic rearrangements or insertions were detected among the six CPGs relative to that of M. omeiensis (Fig. S1).
Large repeat sequences were identified using REPuter software [26]. A total of 300 repeats were identified. Palindromic repeats were the most common repeat sequences, and no complement repeats were found in the CPGs of the six subgenus Gynopodium species (Fig. 4). The numbers of palindromic and reverse repeats varied among the six CPGs. The lowest number of palindromic repeats (19) was observed in M. sinica, followed by M. omeiensis (20), M. lotungensis (21), M. nitida (22), M. kachirachirai (22), and M. yunnanensis (22). Reverse repeats were fewer in M. nitida, M. kachirachirai, and M. yunnanensis (9) than in M. lotungensis (10), M. omeiensis (12), and M. sinica (13). Among these repeats, nine were over 30 bp and 24 were 20–29 bp; the longest repeat was 39 bp. Over half of the repeats (60%) were located in non-coding regions, and some were located in the coding regions of genes, such as psaA, psaB, ndhC, ycf1, ycf2, rpoB, and rpoC2 (Table S2).
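To make the four repeat categories concrete, the following Python sketch classifies how two equal-length sequence copies relate to each other. It is a simplified illustration rather than REPuter's algorithm: it assumes exact matches only (mismatch tolerance is not handled), and the function name `classify_repeat` is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def classify_repeat(first: str, second: str) -> str:
    """Classify the relationship between two exact repeat copies.

    forward:     second copy is identical to the first
    palindromic: second copy is the reverse complement of the first
    reverse:     second copy is the first read backwards
    complement:  second copy is the base-wise complement of the first
    """
    first, second = first.upper(), second.upper()
    complement = first.translate(_COMPLEMENT)
    # Order matters only for ambiguous inputs (e.g. a copy equal to its own reverse).
    if second == first:
        return "forward"
    if second == complement[::-1]:
        return "palindromic"
    if second == first[::-1]:
        return "reverse"
    if second == complement:
        return "complement"
    return "none"

print(classify_repeat("AAGC", "GCTT"))  # palindromic
```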
Comparison of the numbers of repeats among the CPGs of six subgenus Gynopodium species: Magnolia omeiensis, Magnolia sinica, Magnolia nitida, Magnolia kachirachirai, Magnolia lotungensis, and Magnolia yunnanensis. (F: Forward, P: Palindromic, and R: Reverse repeats)
The nucleotide diversity (Pi) was calculated for all six CPGs using a 600-bp window and ranged from 0 to 0.008 (Fig. 6). There were five highly variable regions with Pi values greater than 0.004, including the ycf1 gene and four intergenic regions (psbA-trnH-GUG, petA–psbJ, rpl32-trnL-UAG and ccsA-ndhD). Pi was greatest (0.007) for the intergenic region between ccsA and ndhD. Highly variable regions were located in the LSC region (2) and SSC region (3); no highly variable region was detected in the IR regions (Fig. 6), consistent with the pattern of structural variability among the CPGs. In addition, we evaluated the potential utility of the five highly variable regions. The rpl32-trnL-UAG marker (π = 0.007) had the highest discriminatory power, distinguishing six haplotypes among the six subgenus Gynopodium species (Table 4). The psbA-trnH-GUG marker (π = 0.006) also showed high haplotype diversity and distinguished five haplotypes. The petA-psbJ (π = 0.005), ccsA-ndhD (π = 0.007), and ycf1 (π = 0.004) markers each distinguished three haplotypes (Table 4).
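A sliding-window estimate of nucleotide diversity can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: it assumes pre-aligned sequences of equal length, treats gaps like any other character, and uses a step size of 200 bp, which is an assumption because only the 600-bp window is stated in the text.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

def sliding_pi(seqs, window=600, step=200):
    """Yield (window midpoint, Pi) along an alignment; step size is an assumption."""
    length = len(seqs[0])
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        yield start + window // 2, nucleotide_diversity(chunk)

# Toy alignment (made-up sequences), scanned with a 4-bp window for illustration.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
for midpoint, pi in sliding_pi(toy, window=4, step=4):
    print(midpoint, round(pi, 3))
```

Windows whose Pi exceeds a chosen threshold (0.004 in the text) would then be flagged as highly variable.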
Phylogenetic relationships were reconstructed using both ML and BI approaches, based on the whole CPGs of 22 species covering all known sections within Magnoliaceae. Topologies of the ML and BI trees were concordant and confirmed that Magnoliaceae comprised two subfamilies (Liriodendroideae and Magnolioideae), each with one genus (Liriodendron and Magnolia, respectively). Within Magnolia, subgenus Gynopodium was sister to subgenus Yulania (BS = 100%, PP = 1.00) (Fig. 7). However, because subgenus Magnolia was not monophyletic, the three previously established subgenera in Magnolia were not supported (Fig. 7). Subgenus Gynopodium should be treated as a section of genus Magnolia following Wang et al. (2021) [27]. Within subgenus Gynopodium, M. sinica diverged first (PP = 1, BS = 100), followed by M. nitida, M. kachirachirai, and M. lotungensis (albeit with relatively low support values), and M. omeiensis was sister to M. yunnanensis (PP = 0.97, BS = 50) (Fig. 7, Fig. S3).
Phylogenetic relationships of Magnoliaceae based on the CPGs of 20 Magnolia species and two Liriodendron species. The phylogeny was inferred by Bayesian inference. Numbers above the lines indicate the posterior probabilities from the Bayesian inference.
The CPGs of most angiosperms vary in size from 120 to 160 kb [16]. Our results indicated that the CPGs of the six subgenus Gynopodium species are similar in size (ca. 160 kb) and structure (quadripartite circular structure) to those of other Magnolia species [28,29,30] as well as other higher plants [31]. The total number, order, and composition of genes in the CPGs were highly conserved within subgenus Gynopodium, which is also consistent with most Magnolia species [32, 33], suggesting that CPG structure is highly conserved in subgenus Gynopodium.
The overall GC content has been reported to be associated with phylogenetic position; specifically, the GC content tends to be higher in early-diverging lineages, such as magnoliids [34]. Our results are consistent with these previous findings. In the six subgenus Gynopodium species, the overall GC content of the CPGs was approximately 39.3%, which is similar to that of other Magnolia species, such as M. shiluensis [32], M. grandiflora [35], and M. zenii [36], but higher than the average GC content (35%) of most angiosperms [37]. The GC content also varies among different regions of the CPG [34, 38]. The IR regions (43.2%) have a considerably higher GC content than the LSC (38%) and SSC regions (34.3%) (Table S1), which can be attributed to the high GC content of the ribosomal RNA (rRNA) genes in the IR regions (Fig. 1). Similar findings have been reported in other species, such as Magnolia polytepala [39], Magnolia delavayi [40] and Datura stramonium [41].
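The relationship between regional and overall GC content can be illustrated with a short Python sketch. It is not the authors' pipeline: `gc_content` and `weighted_gc` are hypothetical helper names, the region lengths are those reported for M. kachirachirai, and the regional GC percentages are the rounded values quoted in the text, so the result is approximate.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted

def weighted_gc(regions):
    """Length-weighted mean GC (%) from (length_bp, gc_percent) pairs."""
    total_len = sum(length for length, _ in regions)
    return sum(length * gc for length, gc in regions) / total_len

# Region lengths reported for M. kachirachirai; GC values as quoted in the text.
regions = [
    (88_130, 38.0),  # LSC
    (18_725, 34.3),  # SSC
    (26_586, 43.2),  # IRa
    (26_586, 43.2),  # IRb
]
overall = weighted_gc(regions)
print(round(overall, 1))  # 39.3
```

The length-weighted mean of the regional values reproduces the reported whole-genome GC content of about 39.3%, which shows how the GC-rich IR copies raise the overall figure above that of the single-copy regions.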
Conservation of the CPGs
We compared the CPGs of six species within the subgenus Gynopodium. The results indicated that the SSC and LSC regions were more divergent than the IR regions, and that sequences in non-coding regions were more divergent than those in coding regions, which is consistent with previous findings in Magnolia species [29] and other flowering plants [42, 43]. In this study, we identified six regions with substantial variation in the CPGs of subgenus Gynopodium species, namely five intergenic regions (trnH–psbA, rps2–rpoC2, ycf4–cemA, petA–psbJ, and ccsA–ndhD) and one gene, ycf1 (Fig. 2). No major genomic rearrangements or insertions were detected among the six CPGs, which further corroborates the results of recently published studies of Magnoliaceae [27]. Previous studies also found that variation in the size of angiosperm CPGs might be largely driven by length variation in IR regions, intergenic regions, and the number of gene copies [44,45,46]. The structure of the six CPGs within subgenus Gynopodium was highly conserved; no major expansions or contractions were observed in the IR regions. However, variation in sequence length was observed in both the LSC and SSC regions, which may drive variation in the size of CPGs within subgenus Gynopodium, as reported in other species [29, 47, 48].
Large repeats and simple sequence repeats
Knowledge of genetic diversity within subgenus Gynopodium is necessary to develop sustainable conservation management that ensures long-term maintenance of the genetic diversity within these species [3, 49]. Repeat sequences, which are dispersed in CPGs, are an important source of structural variation and play a significant role in genomic evolution [16, 50]. In our study, 300 repeats were identified, of which palindromic repeats were the most common, while complement repeats were absent from the CPGs of subgenus Gynopodium. Differences in the numbers of forward, palindromic, and reverse repeats contribute to variation among CPGs [41]. Therefore, genetic variation in large repeats can provide useful information for phylogenetic research and population genetics. Previous studies have indicated that repeat sequences are mostly located in the intergenic spacer (IGS) regions, followed by the coding regions [14, 32]. Our findings are consistent with this general pattern; 61.22–65.31% of the repeats were located in IGS regions, followed by coding regions and introns (34.69–38.38%) (Table S2).
SSRs are useful molecular markers that have been widely used in species discrimination, breeding and conservation, and phylogenetic studies [51,52,53,54]. In the CPGs of the six subgenus Gynopodium species, SSRs located in the LSC and SSC regions accounted for 92.86% of all SSRs, and only ten SSRs were located in the IR regions (Table S3). Our findings are consistent with the general pattern in angiosperms that most repeats are located in the LSC and SSC regions of CPGs [36, 48]. The SSRs identified in the CPGs of the six subgenus Gynopodium species provide a valuable resource for developing primers for specific SSR loci and a useful tool for species identification.
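For readers unfamiliar with how SSRs are located, the Python sketch below finds perfect mono-, di-, and trinucleotide repeats with regular expressions. It is an illustration only and not the tool used in the study: the function name `find_ssrs` and the minimum repeat counts (10, 5, and 4) are assumptions, since the detection criteria are not given in this section.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 5, 3: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if unit_len > 1 and len(set(motif)) == 1:
                continue  # e.g. "AA" repeated is a mononucleotide run, counted above
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)

# Made-up test sequence: a 12-bp poly-A run and six copies of "AT".
example = "GG" + "A" * 12 + "CC" + "AT" * 6 + "GG"
print(find_ssrs(example))  # [(2, 'A', 12), (16, 'AT', 6)]
```

Tallying the hits by region (LSC, SSC, or IR) from their start coordinates would give the kind of regional distribution summarized in Table S3.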
Highly variable regions
Highly variable regions provide abundant phylogenetic information and can be used as potential molecular markers to delimit closely related taxa [55]. The Pi of highly variable regions within subgenus Gynopodium species was lower (< 0.008) than previously published values for other species [56, 57] and for some Magnolia species [29, 30]. The low genetic diversity of subgenus Gynopodium species and other Magnolia species, such as Magnolia ashei, may be related to their limited habitats and small populations as threatened species [54, 58, 59].
In the Magnoliaceae, several highly variable regions, such as matK, ycf1, psbA–trnH and atpB–rbcL, have been recognized as potential sites for DNA barcoding [39, 60]. In this study, we recognized five highly variable regions with Pi values greater than 0.004, including one gene (ycf1) and four intergenic regions (psbA-trnH-GUG, petA–psbJ, rpl32-trnL-UAG and ccsA-ndhD). These regions distinguished 6 (rpl32-trnL-UAG), 5 (psbA-trnH-GUG), 3 (petA-psbJ), 3 (ccsA-ndhD), and 3 (ycf1) plastid haplotypes among the six subgenus Gynopodium species (Table 4). They could therefore be considered as potential barcoding markers for species identification within subgenus Gynopodium.
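The discriminatory power of a marker can be summarized by the number of distinct haplotypes it resolves and by haplotype diversity. The Python sketch below is a minimal illustration that assumes Nei's estimator, Hd = n/(n − 1) × (1 − Σ p_i²); the source does not state which formula or software was used. The sequences are made up and `haplotype_stats` is a hypothetical helper name.

```python
from collections import Counter

def haplotype_stats(seqs):
    """Return (number of distinct haplotypes, haplotype diversity Hd).

    Hd follows Nei's estimator: n / (n - 1) * (1 - sum of squared frequencies).
    """
    counts = Counter(s.upper() for s in seqs)
    n = len(seqs)
    freq_sq = sum((c / n) ** 2 for c in counts.values())
    hd = n / (n - 1) * (1 - freq_sq)
    return len(counts), hd

# Made-up marker alignments for six samples.
all_distinct = ["AAT", "AAC", "AAG", "ATT", "ACT", "AGT"]
three_types = ["AAT", "AAT", "AAC", "AAC", "AAG", "AAG"]

print(haplotype_stats(all_distinct)[0])  # 6
print(haplotype_stats(three_types)[0])   # 3
```

A marker that separates all six samples reaches the maximum Hd of 1, whereas a marker resolving only three haplotypes yields a lower value.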
Phylogenetic relationship
CPGs have shown substantial power in resolving phylogenetic relationships among angiosperms [61]. However, the boundaries of the genera of Magnoliaceae remain controversial [1, 6]. Based on the whole CPGs of 22 species covering all known sections of Magnoliaceae, the ML and BI trees both supported the division of Magnoliaceae into two subfamilies, Magnolioideae and Liriodendroideae, each with one genus, Magnolia and Liriodendron, respectively. However, because subgenus Magnolia was not monophyletic, the three previously established subgenera in Magnolia were not supported. Our results supported the infrageneric circumscriptions reported by Wang et al., which classified Magnolia into 15 clades corresponding to 15 sections and treated subgenus Gynopodium as a section of Magnolia [27, 62]. Our results also supported merging section Manglietiastrum into section Gynopodium, as reported previously [62, 63].
Although we recovered the phylogenetic relationship within subgenus Gynopodium, some of the nodes were poorly supported (Fig. 7). The low nucleotide diversity and nucleotide substitution rate in the CPGs of subgenus Gynopodium species and other Magnolia species might contribute to the lack of phylogenetic resolution in Magnoliaceae [62, 64, 65]. Consequently, genetic markers from the mitochondrial and nuclear genomes should be developed to reconstruct more robust phylogenies of subgenus Gynopodium species.