genome ecology evolution etc


33 posts · 11,820 views

Collective blog of the tutorial "Genome Ecology Evolution etc." of the Doctoral school of Biology of the University of Lausanne


  • May 27, 2013
  • 06:24 AM
  • 40 views

Analyses of pig genomes provide insight into porcine demography and evolution

by Anna Kostikova in genome ecology evolution etc

Pig domestication started over 10,000 years ago and has had important consequences for human life, changing our agricultural and medical practices. Much has been argued about whether the pig was domesticated independently across multiple locations or whether it was adopted …

Groenen MA, Archibald AL, Uenishi H, Tuggle CK, Takeuchi Y, Rothschild MF, Rogel-Gaillard C, Park C, Milan D, Megens HJ.... (2012) Analyses of pig genomes provide insight into porcine demography and evolution. Nature, 491(7424), 393-8. PMID: 23151582  

  • May 15, 2013
  • 03:57 AM
  • 50 views

Genomic analysis of a key innovation in an experimental Escherichia coli population

by Romain Savary in genome ecology evolution etc

In this paper Richard E. Lenski and colleagues show an example of how efficient adaptation by natural selection is. For 20 years they have been growing twelve populations of Escherichia coli in a glucose medium that also contains abundant citrate; this …

  • May 7, 2013
  • 03:38 AM
  • 41 views

Genome Patterns of Selection and Introgression of Haplotypes in Natural Populations of the House Mouse (Mus musculus)

by Martha Serrano in genome ecology evolution etc

How do genomes evolve in natural populations? This is a long-standing question for geneticists, and one that recent molecular genomic approaches may help to answer. Their evolution among natural populations may be shaped by forces derived from …

  • April 23, 2013
  • 11:36 AM
  • 32 views

Flycatchers’ genomes bring new insights into the genomic basis of evolution

by Charlotte Récapet in genome ecology evolution etc

How exactly lineages diverge to the point that they can be considered separate species, and in particular reach reproductive isolation, is still an open question in evolutionary biology. Classical views of speciation hypothesize the existence of speciation genes, defined as …

Ellegren H, Smeds L, Burri R, Olason PI, Backström N, Kawakami T, Künstner A, Mäkinen H, Nadachowska-Brzyska K, Qvarnström A.... (2012) The genomic landscape of species divergence in Ficedula flycatchers. Nature, 491(7426), 756-60. PMID: 23103876  

  • March 27, 2013
  • 05:30 AM
  • 142 views

Genomic Variation in Seven Khoe-San Groups Reveals Adaptation and Complex African History.

by Romain Savary in genome ecology evolution etc

The origin of modern humans is clear, with evidence coming from many different disciplines. Africa is the continent where the highest genetic diversity is found; this clue associated …

Schlebusch, C., Skoglund, P., Sjodin, P., Gattepaille, L., Hernandez, D., Jay, F., Li, S., De Jongh, M., Singleton, A., Blum, M.... (2012) Genomic Variation in Seven Khoe-San Groups Reveals Adaptation and Complex African History. Science, 338(6105), 374-379. DOI: 10.1126/science.1227721  

  • March 19, 2013
  • 12:30 PM
  • 68 views

The Genetic Architecture of Adaptations to High Altitude in Ethiopia

by Guillaume Cossard in genome ecology evolution etc

Human populations have colonized high-altitude (HA) habitats (above 2,500 m) multiple times and independently. HA habitats are essentially characterized by lower biodiversity and low oxygen availability, also called hypoxia. Classically, organisms respond to this decreased oxygen …

Alkorta-Aranburu, G., Beall, C., Witonsky, D., Gebremedhin, A., Pritchard, J., & Di Rienzo, A. (2012) The Genetic Architecture of Adaptations to High Altitude in Ethiopia. PLoS Genetics, 8(12). DOI: 10.1371/journal.pgen.1003110  

  • December 18, 2012
  • 05:11 AM
  • 194 views

Genome-wide analysis of a long-term evolution experiment with Drosophila

by Nicla Loviglio in genome ecology evolution etc

For decades, researchers have drawn general insights into the nature of adaptation from asexually reproducing populations with small genomes, such as bacteria and yeast. They assumed that sexual species evolve the same way these populations do, i.e. that adaptation is driven by so-called selective sweeps, in which a newly arising beneficial mutation quickly becomes fixed together with the haplotype associated with it. In obligately sexual systems the picture is complicated by the fact that selection can act on standing variation: weak selection can act on many pre-existing genetic variants involved in fitness traits. The idea is that short-term evolution occurs through a so-called “soft sweep” model, in contrast to the “hard sweep” model, in which a strong selective sweep originates from a single mutation and variation at linked neutral sites is eliminated. Burke et al. compared outbred, sexually reproducing, replicated populations of D. melanogaster selected for accelerated development with their matched control populations on a genome-wide basis; this is the first time that such a study of a sexually reproducing species has been done.
As shown in Figure 1, they used the Illumina platform to obtain short-read sequences from three genomic DNA libraries. The libraries came from sets of replicated populations under different selection treatments, maintained since 1980 at large population size (N > 1,000) with discrete generations: 1) a pooled sample of five replicate populations that have undergone sustained selection for accelerated development and early fertility for over 600 generations (ACO); 2) a pooled sample of five replicate ancestral control populations, which experience no direct selection on development time (CO); 3) a single ACO replicate population (ACO1). Phenotypes were measured with longevity, starvation resistance, development time and dry weight assays.

Figure 1. Grey bars represent values measured in each of the five replicate populations in the ACO and CO treatments. Measures from the five baseline (B) replicate populations represent phenotypes typical of populations kept on two-week generation maintenance schedules. Only data for females are shown. Longevity and starvation resistance data were collected after at least 619 generations of ACO treatment, and both development time and dry weight data (dry weight values are mean masses of groups of ten females) were collected after 640 generations of ACO treatment. Error bars, s.e.m. for each replicate population.

As shown in Figure 2, a 100-kb genome-wide sliding-window analysis was carried out to identify regions that had diverged in allele frequency. A large number of genomic regions differed significantly between the ACO populations and their matched controls, whereas the comparison of the single replicate population (ACO1) with the pooled sample of all five ACO populations showed no significant divergence.
The apparent excess of diverged regions on the X chromosome was explained as a result of selection on initially rare recessive or partially recessive alleles. Another important point is that the adaptive response was highly multigenic: selection on development time affected not just one or a few regions but most likely a larger portion of the genome.

Figure 2. Sliding-window analysis (100 kb) of differentiation in allele frequency between the ACO and CO populations: the solid black line depicts L10FET5%Q scores at 2-kb steps (Methods). The dotted line is the threshold that any given window has a 0.1% chance of exceeding relative to the genome-wide level of noise. The grey line depicts L10FET5%Q scores for a difference in allele frequency between ACO1 and the ACO pooled sample. The five panels show the five major D. melanogaster chromosome arms (as indicated).

Heterozygosity across the genome showed the expected concordance with these results. Regions of reduced heterozygosity are expected to be strongly associated with regions of differentiated allele frequency. Accordingly, comparing Figure 2 with Figure 3 shows that the regions identified as diverged in allele frequency were also the ones with reduced heterozygosity.

Figure 3. Sliding-window analysis (100 kb) of heterozygosity in the CO pool (blue), the ACO pool (red) and ACO1 (grey), with a 2-kb step size. The panels show the five major chromosome arms of D. melanogaster. ...
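
The sliding-window heterozygosity scan described above (100-kb windows moved in 2-kb steps) can be sketched roughly as follows. This is only an illustration and not the pipeline of Burke et al.: the function names, the input format (a list of SNP positions with allele frequencies) and the choice to average the expected heterozygosity 2p(1-p) over the SNPs in each window are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def sliding_window_heterozygosity(
    sites: List[Tuple[int, float]],   # hypothetical input: (position, allele frequency)
    chrom_length: int,
    window: int = 100_000,            # 100-kb window, as in the text
    step: int = 2_000,                # 2-kb step, as in the text
) -> List[Tuple[int, Optional[float]]]:
    """Return (window_start, mean heterozygosity) for each window.

    Windows that contain no SNP get None instead of a mean.
    """
    results = []
    for start in range(0, chrom_length, step):
        end = start + window
        values = [expected_heterozygosity(p) for pos, p in sites if start <= pos < end]
        results.append((start, sum(values) / len(values) if values else None))
    return results


if __name__ == "__main__":
    # Toy data: two SNPs on a 6-kb "chromosome", scanned with small windows.
    toy_sites = [(1_000, 0.5), (3_000, 0.1)]
    for start, h in sliding_window_heterozygosity(toy_sites, 6_000, window=4_000, step=2_000):
        print(start, h)
```

A window in which selection has driven alleles towards fixation (p close to 0 or 1) yields a low mean, which is the pattern the post describes for the diverged regions.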

Burke, M., Dunham, J., Shahrestani, P., Thornton, K., Rose, M., & Long, A. (2010) Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature, 467(7315), 587-590. DOI: 10.1038/nature09352  

  • December 14, 2012
  • 07:51 AM
  • 180 views

Genome-wide analysis of a long-term evolution experiment with Drosophila

by Namrata Sarkar in genome ecology evolution etc


Burke, M., Dunham, J., Shahrestani, P., Thornton, K., Rose, M., & Long, A. (2010) Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature, 467(7315), 587-590. DOI: 10.1038/nature09352  

  • December 6, 2012
  • 05:31 PM
  • 255 views

Parallel evolution in adaptive phenotypes: the case of the threespine stickleback

by Francesco Nicola Carelli in genome ecology evolution etc

How do adaptive phenotypes evolve? Despite the increasing availability of genomic and other molecular data, this question remains largely unanswered. A major point of discussion is the relative contribution of coding versus non-coding variation to the evolution of new traits. Although many research groups have suggested that non-coding mutations might play a pivotal role because they can avoid pleiotropic effects, few examples are yet available to rule out a major contribution of coding variants to adaptive evolution.

The paper by Jones et al. that we discussed tried to answer this question by looking at the differences between distinct populations of threespine sticklebacks (Gasterosteus aculeatus). This species, originally found in marine habitats, colonized freshwater environments and evolved specific phenotypic traits there, while still maintaining the ability to hybridize with marine individuals. An important feature of this species, already known from previous studies, is that geographically unrelated populations share genomic variants that distinguish marine from freshwater populations. This finding suggested that phenotypic traits may have evolved adaptively in parallel through the reuse of standing genetic variation. To test this hypothesis, Jones et al. generated a reference genome assembly from a female freshwater stickleback (Sanger sequencing, 9.0x coverage, total gapped size: 463 Mb). This reference genome provided the basis for analyzing genomic differences between marine and freshwater populations collected at several locations around the world (Europe, North America and Japan).
For this purpose, a total of 20 individuals (classified as clearly marine or clearly freshwater based on multiple phenotypic features) were sequenced at 2.3x average coverage, and genome-wide single nucleotide polymorphisms (SNPs) were identified.

The data were analyzed using three different approaches, with the aim of finding genomic regions that are highly similar among the freshwater individuals and differ from the corresponding loci in the marine samples. The first approach was a self-organizing map-based iterative Hidden Markov Model (SOM/HMM), used to reconstruct common relationships (trees) among the individuals. Although most of the phylogenies recapitulated the geographical relationships among the samples, four of them separated most of the marine from most of the freshwater individuals, identifying genomic loci putatively involved in the differentiation of the ecotypes. The second and third approaches used a sliding-window analysis to detect divergence between the two ecotypes. The second calculated a cluster separation score (CSS) to quantify the distance between the marine and freshwater clusters; the third was an unguided, Bayesian model-based, data-driven clustering (DDC) that determines for each window the maximum number of clusters to which the individual samples can be assigned. In total, 242 genomic regions showing shared marine-freshwater divergence were identified by either method (0.5% of the genome). Testing these approaches on a genomic location known to have evolved adaptively (the EDA gene, fig. 1) demonstrated the reliability of all three and the power of combining them to spot loci that are putatively evolving adaptively.

Figure 1: Parallel divergence signals at known armour plate locus. a) Ensembl gene models around EDA.
b) Visual genotypes for sequenced fish (homozygous sites for most frequent allele in marine fish (red); homozygous for alternative allele (blue); heterozygous (yellow), or non-variable/missing/repeat-masked data (white)). c) DDC cluster assignments for marine (red) and freshwater populations (blue). Most fish are assigned to cluster k1, except in the boxed region, where freshwater fish are assigned to a distinct cluster (k2). d) SOM/HMM analysis supports patterns of divergence with a marine–freshwater-like tree topology in the centre, but not edges, of the window (trees a–d). e, f) Similar support is shown by CSS analysis (e) and its associated P-value (f). The combined analyses define a consensus 16-kb region shared in freshwater fish (vertical shaded box), matching the minimal haplotype known to control repeated low armour evolution in sticklebacks.

To test the extent of parallel reuse of these regions in adaptation to freshwater, as opposed to newly evolved adaptive loci, an independent pair of marine and freshwater individuals from the same geographical zone (River Tyne, Scotland) was sequenced and analyzed for SNPs. Within the most highly divergent windows of the genome, only a part (35.3% of the 0.1% most divergent windows) contained the globally shared loci (fig. 2). This result indicates that part of the divergence between the two ecotypes derives from shared standing variation, but also that new population-specific mutations can play a role in determining the specific traits.

Figure 2: How much of local marine–freshwater adaptation occurs by reuse of global variants? a) Classic marine and freshwater ecotypes are maintained in downstream and upstream locations of the River Tyne, Scotland, despite extensive hybridization at intermediate sites.
b) Pairwise sequence comparisons identify many genomic regions that show high divergence between upstream and downstream fish (x axis). Many, but not all, of these regions also show high global marine–freshwater divergence (y axis; red points indicate significant CSS, FDR < 0.05), indicating that both global and local variants contribute to formation and reproductive isolation of a marine–freshwater species pair.

Interestingly, the group found three loci showing clear marine-freshwater divergence within regions involved in chromosomal inversions (chromosomes I, XI and XXI, fig. 3). This finding supports the hypothesis that mechanisms that suppress recombination between adaptive loci, such as chromosomal inversions, can be favored by selection because they maintain contrasting ecotypes in hybridizing populations.

Figure 3: Genome-wide distribution of marine–freshwater divergence regions. Whole-genome profiles of SOM/HMM and CSS analyses reveal many loci distributed on multiple chromosomes (plus unlinked scaffolds, here grouped as ‘ChrUn’). Extended regions of marine–freshwater divergence on chromosomes I, XI and XXI correspond to inversions (red arrows). Marine–freshwater divergent regions detected by CSS are shown as grey peaks with grey points above chromosomes indicating regions of significant marine–freshwater divergence (FDR < 0.05). Genomic regions with marine–freshwater-like tree topologies detected by SOM/HMM are shown as green points below chromosomes.

Finally, the 64 genomic regions showing the strongest evidence of parallel evolution were investigated to determine the contribution of coding and non-coding variation to adaptation to a different environment. Only 17% of them could be classified as coding based on the presence of non-synonymous substitutions, while the rest could be attributed to regulatory or probably regulatory changes (fig. 4a).
To test whether regulatory changes could actually be linked to these regions, a whole-genome microarray expression analysis was performed on tissues from a marine and a freshwater sample. Genes mapping within or close to the loci identified by either method showed a general divergence in expression levels between the two ecotypes across different tissues (fig. 4b). ...
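
To make the idea of a per-window separation score concrete, here is a deliberately simplified sketch. It is not the CSS formula of Jones et al., which is not given in this post: the score below is just the mean genotype distance between marine and freshwater fish minus the mean distance within each group. The genotype encoding (0, 1 or 2 copies of the alternative allele per SNP) and all function names are assumptions for illustration.

```python
from itertools import combinations, product
from typing import Iterable, List, Sequence


def genotype_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute difference in alternative-allele count per SNP (0, 1 or 2)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean(values: Iterable[float]) -> float:
    """Arithmetic mean; returns 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def separation_score(marine: List[Sequence[int]], freshwater: List[Sequence[int]]) -> float:
    """Simplified, hypothetical separation score for one genomic window.

    Positive values mean fish differ more between ecotypes than within them.
    """
    between = mean(genotype_distance(m, f) for m, f in product(marine, freshwater))
    within_pairs = list(combinations(marine, 2)) + list(combinations(freshwater, 2))
    within = mean(genotype_distance(a, b) for a, b in within_pairs)
    return between - within


if __name__ == "__main__":
    # Toy window with three SNPs: the ecotypes are fixed for different alleles.
    print(separation_score([[0, 0, 0], [0, 0, 0]], [[2, 2, 2], [2, 2, 2]]))
```

Computed window by window along a chromosome, a score of this kind peaks where freshwater fish from different continents resemble each other more than they resemble nearby marine fish, which is the signal the three methods in the post are designed to detect.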

Jones FC, Grabherr MG, Chan YF, Russell P, Mauceli E, Johnson J, Swofford R, Pirun M, Zody MC, White S.... (2012) The genomic basis of adaptive evolution in threespine sticklebacks. Nature, 484(7392), 55-61. PMID: 22481358  

  • November 22, 2012
  • 10:21 AM
  • 213 views

Ecological success of recently emerged bacterial hybrids living in the wild

by Nicla Loviglio in genome ecology evolution etc

Microbes are among the most ubiquitous groups of living organisms in Earth's biosphere, with an incredible ability to thrive even in conditions at the limit of human endurance. Because they grow rapidly, bacteria are ideal for unraveling the molecular mechanisms of many evolutionary processes. Their rapid response to change has been attributed to the combined effects of evolutionary processes, shifts in species composition and shifts in gene expression. Most studies so far have focused on isolating and comparing cultured bacterial populations, while very few data are available on free-living bacteria. It therefore remains controversial how quickly, to what extent and by which mechanisms microorganisms evolve in their natural environment.

Two researchers at the University of California, Denef and Banfield, have tried to answer this question in a paper recently published in Science. Their work reports evolutionary rate estimates for bacterial populations living in a really challenging site, the hot, humid, low-pH, metal-rich and low-oxygen acid mine drainage of the Richmond Mine (Iron Mountain, CA), over the course of 9 years. This might not seem the ideal model system for such a study, since the sites are accessible only during limited periods of the year, but it perfectly fits the requirement for a discrete, reproducible and simple microbial community that receives very restricted input from other regions. In fact, the biofilm community at the air-solution interface consists of few organism types, normally four to six. In this case it is dominated by Leptospirillum group II, which comprises iron-oxidizing bacteria that can live in sulfuric acid. The authors tried to trace the lineage history of the group, starting from the data of a previous metagenomic study by Simmons et al., from the same lab, published in PLoS Biology in 2008.
These led to the reconstruction of the Leptospirillum group II type I and type VI “reference” genomes, which share about 94% average nucleotide identity. As shown in Figure 1, the two populations were sampled in 2002 and 2005 at the 5-way and UBA locations, respectively.

Figure 1. Adapted from Denef et al. Richmond Mine schematic map, with pie charts indicating genotype proportions in 24 samples, estimated on the basis of read recruitment.

The already assembled genomes were compared against total population DNA, which allowed the authors to find four other distinct Leptospirillum genotypes (types II to V). As indicated by proteomics-based results published by the same group in 2009, some sites, such as C75, were clearly dominated by the Leptospirillum type III genotype, which appeared to be a recombinant hybrid of genotypes I and VI. The recombination points, found by identifying discontinuities in the alignment of reads to the reference genomes, were all located within genes. The population sampled at C75 was ideal for calculating the substitution rate because of its low level of variation within the population and across space. The calculated substitution rate, 1.4 × 10^-9 per nucleotide per generation, is high compared with previous estimates of bacterial genome-wide short-term substitution rates, which have ranged from 7.2 × 10^-11 to 4.0 × 10^-9. Several factors can account for this unexpected result, which nevertheless matches the universal per-genome mutation rate predicted by Drake. For example, the authors used a unique approach combining a proteomics-inferred genotyping dataset, a population genomic time series and, for the first time, cultivation-independent population genotypes; they also sampled a unique model system, where human and natural perturbations, combined with the low biological complexity, can both affect the evolutionary rate estimates.
Population genomic analyses suggested that the six Leptospirillum genotypes are mosaics of type I and type VI genome blocks tens to hundreds of kb in length, probably recombined in a single cell and fixed in its descendants, as shown by the recurrence of the same transition points. The fixed mutations of each genotype were used to construct the phylogenetic tree shown in Figure 2, which suggests that the six Leptospirillum genotypes diverged from a common ancestor in a matter of decades (time of coalescence estimated between 2 and 44 years).

Figure 2. Adapted from Denef et al. Evolutionary history of the sampled genotypes, based on the variant loci inferred using the maximum parsimony method. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test is shown next to the branches. Dotted arrows indicate recombination events; circle schematics represent the regions affected. The timeline indicates the calculated time ranges of recombination events as well as historical events. Branch D* is presented as two strains, each assigned half the total number of UBA 06/05 SNPs, because low incidence of SNPs precluded their linkage. BP, years before the present.

Evidence of positive selection on the hybrid genotypes came from the finding of fixed non-synonymous substitutions in a high number of genes involved in signal transduction and transcriptional regulation, as well as in global regulators. This study suggests that the evolution of Leptospirillum was a mosaic of different events, comprising homologous recombination, fixation and selective sweeps, that generated the genotypes currently observed. One limitation of the paper lies in the authors' final speculation that selection between genotypes results from divergence at only a few nucleotides.
In fact, this result was likely biased by their approach, which relied only on comparing genes present in their reference genomes and can produce incomplete or erroneous interpretations if the genome assemblies are not correct. ...
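
As a rough illustration of what a rate "per nucleotide per generation" means, the sketch below divides a number of fixed substitutions by genome length times number of generations and checks the result against the range of earlier estimates quoted above. The input numbers in the example are invented purely to show the arithmetic; they are not the counts, genome size or generation number used by Denef and Banfield, and the function names are hypothetical.

```python
# Range of previous short-term bacterial estimates quoted in the post.
PREVIOUS_MIN = 7.2e-11
PREVIOUS_MAX = 4.0e-9


def substitution_rate(fixed_substitutions: int, genome_length_bp: int, generations: int) -> float:
    """Substitutions per nucleotide per generation."""
    return fixed_substitutions / (genome_length_bp * generations)


def within_previous_range(rate: float) -> bool:
    """True if the rate falls inside the range of earlier estimates."""
    return PREVIOUS_MIN <= rate <= PREVIOUS_MAX


if __name__ == "__main__":
    # Invented inputs, for illustration only: 7 fixed substitutions in a
    # 2.5-Mb genome over 2,000 generations.
    rate = substitution_rate(7, 2_500_000, 2_000)
    print(f"{rate:.2e} substitutions per nucleotide per generation")
    print("within previously reported range:", within_previous_range(rate))
```

The same arithmetic shows why the generation-time assumption matters: halving the assumed number of generations doubles the estimated rate.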

  • October 10, 2012
  • 05:44 AM
  • 269 views

What could our genomes actually tell about disease risk?

by Charlotte Récapet in genome ecology evolution etc

Despite recent advances in whole-genome sequencing, two recent studies suggest that we are far from uncovering the genetic basis of common disease risk. In fact, information relevant to complex diseases might hide within rare or even private genome variations, often too scarce to be studied statistically. We might thus have to radically change the way we think about gene-disease associations to take a step forward and make the DNA talk.

Whereas a few, usually rare and severe “genetic disorders” can be traced to variations at one or two locations, or “loci”, in the DNA sequence, most common diseases result from complex interactions between protein-coding genes, non-coding DNA and environmental effects. These aptly named “complex diseases” include cardiovascular, metabolic, neurologic and psychiatric conditions of great concern to health policy, such as early-onset stroke, myocardial infarction, diabetes, dyslipidemia, Alzheimer's, bipolar disorder and schizophrenia. Some of these complex diseases have a high heritability, which means that a large part of the individual differences in the probability of developing the disease can be explained by differences in genomes. For example, the heritability of early-onset myocardial infarction is about 60% [1]: genomes are more important than environment in explaining the differences in early-onset infarction between individuals. A lot of work has therefore gone into identifying the changes in DNA sequence involved in complex disease heritability. In particular, the development of new sequencing technologies has made it possible to compare hundreds of individual sequences and map them to various symptoms, a method known as “genome-wide association studies”. Hundreds of disease-related genetic variations have been identified this way.
However, they explain only a very small fraction of the heritability: in the case of early-onset myocardial infarction, only 2.8% of the heritability has so far been linked to particular genes [2]. To explain the low power of association studies to identify genetic variants contributing to complex diseases, it was hypothesized that most variation in disease predisposition is due to “high risk” variants, which have a strong negative impact on health but remain rare in a population because they are selected against [3]. Consequently, we would only need to increase sample size, and therefore our power to detect rare variants, to better explain the genetic basis of common diseases. With that aim, two studies published in the July issue of Science used large datasets (2,440 and 14,002 genomes, respectively) to investigate the potential role of rare variants, defined as loci at which one of the variants is present in less than 0.5% of the individuals sampled. The large sample sizes allowed the detection of many previously unknown variants, highlighting the limits of previous smaller-scale studies: 90% of rare variants, but only 5% of common variants, found in 202 drug-target genes were novel, and estimates of discovery rates showed that many new variants remain to be discovered (Fig. 1).

Nelson et al., An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People, Science 337, 2012. Fig. 1 Number of variants discovered per kilobase of sequence with sample sizes increasing to 5000 people for multiple populations.

The studies also confirmed that variants with a potential impact on health remained rare: the proportion of non-synonymous variants, which alter the protein synthesized, was higher among rare than among common variants (Fig. 2).
Nelson et al., An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People, Science 337, 2012. Fig. 2 Expected ratios of non-synonymous to synonymous variants in the absence of selection and observed ratios for rare to common alleles, from left to right. MAF (Minor Allele Frequency) is the frequency of the rarest version of a variant.

However, rare variants were found to be more numerous than previously thought: around 90% of variants were rare. Interestingly, individuals of African ancestry exhibited fewer rare variants, but more variants of intermediate frequency, than those of European ancestry. Moreover, most rare variants were population-specific (Fig. 3 and 4) and about 60% of all variants were present in only one individual. ...
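
The rare/common distinction used above can be illustrated with a short sketch. It assumes the 0.5% cutoff is applied to the minor allele frequency (MAF) computed from allele counts, which is one reading of the definition given in the post; the function names and the toy counts are hypothetical and are not data from the two studies.

```python
from typing import List, Tuple

RARE_MAF_THRESHOLD = 0.005  # 0.5%, the cutoff quoted in the post


def minor_allele_frequency(alt_count: int, total_alleles: int) -> float:
    """Frequency of the rarer of the two alleles at a biallelic site."""
    minor_count = min(alt_count, total_alleles - alt_count)
    return minor_count / total_alleles


def classify_variant(alt_count: int, total_alleles: int) -> str:
    """Label a variant 'rare' if its MAF is below the threshold, else 'common'."""
    maf = minor_allele_frequency(alt_count, total_alleles)
    return "rare" if maf < RARE_MAF_THRESHOLD else "common"


def rare_fraction(variants: List[Tuple[int, int]]) -> float:
    """Fraction of (alt_count, total_alleles) variants classified as rare."""
    labels = [classify_variant(alt, total) for alt, total in variants]
    return labels.count("rare") / len(labels)


if __name__ == "__main__":
    # Toy allele counts among 1,000 sampled chromosomes (made up for illustration).
    toy_variants = [(1, 1000), (3, 1000), (200, 1000), (999, 1000)]
    for alt, total in toy_variants:
        print(alt, total, classify_variant(alt, total))
    print("fraction rare:", rare_fraction(toy_variants))
```

With this definition, a variant seen on a single chromosome is always rare once more than 200 chromosomes are sampled, which is why larger samples keep uncovering new rare variants.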

  • October 3, 2012
  • 11:06 AM
  • 338 views

Evolutionary consequences of sex: It's not about what you're doing, but who you're doing it with...

by Charlotte Récapet in genome ecology evolution etc

Bacteria are among the most ubiquitous groups of living organisms and exhibit finely tuned adaptations to a wide range of habitats, even the most inhospitable ones. Their ability to evolve rapidly is at the root of many public health issues, such as the development of antibiotic resistance or the rapid evolution of seasonal diseases, but it can also be of great help to humans by creating new metabolic pathways that transform human-made pollutants and harmful substances. In the early 20th century, new bacterial genomes were still thought to result from mutations only and to be transmitted vertically within a clonal strain. In the 1940s, the discovery of bacterial DNA recombination through transformation (Avery, MacLeod and McCarty experiment in 1944) and conjugation (Lederberg and Tatum experiment in 1946) shed light on the processes responsible for the rapid ecological differentiation of bacterial strains: through recombination, an individual can acquire new genes or alleles that allow it to withstand new ecological conditions.

In eukaryotes, genetic exchange and recombination through sexual reproduction are considered the basis of gene-specific transmission and selection within a population. However, the importance of genetic exchange between bacteria in uncoupling selection on different genes remains controversial. In fact, contradictory observations have given rise to two models of selection:

On the one hand, the ecological clustering of bacterial biodiversity into genetically consistent ecotypes supports the traditional view that adaptive mutations are selected through whole-genome clonal selection. Moreover, the low measured levels of recombination are insufficient to unlink a gene from the rest of the genome.

On the other hand, the existence of environment-specific genes and alleles suggests that recombination can unlink parts of the genome.
Moreover, some loci exhibit low nucleotide diversity compared with the rest of the genome, which suggests purifying selection on these regions. Adaptive mutations thus seem to be selected quite independently of the rest of the genome.

To reconcile these apparently incompatible observations and assess the degree of gene uncoupling in bacteria, researchers at MIT examined, in a recently published study [i], the genomes of 20 strains representing two ecotypes of the marine species Vibrio cyclitrophicus. As the genomes of these ecotypes are extremely similar, they can be considered the result of recent ecological differentiation, giving us a snapshot of this evolutionary process. Based on the comparison of these sequences, the authors claim that gene-specific sweeps do occur and can lead to environment-specific genes on a short time scale, but also to ecological clustering, through preferential within-habitat recombination, on a longer time scale.

In fact, different parts of the genome have different evolutionary histories. In particular, ecotype-specific SNPs are found at only a few locations in the genome, whereas the rest of the polymorphic genome supports genetic intermingling between the ecotypes. Moreover, the two chromosomes of V. cyclitrophicus support different phylogenies, with chromosome 1 grouping the ecotypes, whereas chromosome 2 splits one ecotype into two groups. The phylogeny within one of these two groups is strongly supported by chr2 but not by chr1. Thus, habitat-specific genes evolve quite independently and do not drive genome-wide selective sweeps, an observation consistent with the environment-specific genes and alleles that have already been documented [ii].

These results highlight the need for high-quality sequencing data and fine-grained analysis to understand the evolutionary histories of different parts of the genome.
In fact, the authors show that a few loci with a consistent phylogeny, such as the ecotype-specific SNPs here, are sufficient to drive the whole-genome phylogeny if the signal of clonal ancestry in the rest of the genome has been blurred by homologous recombination (Fig. 1). Therefore, the ecotype theory might be based on phylogenies biased toward the history of a few loci under purifying, habitat-driven selection rather than on the neutral loci with inconsistent histories that account for most of the genome.

Fig. 1: A. Maximum-likelihood phylogeny for the core genome (genes present in all strains) of chromosome I in V. cyclitrophicus. Scale is substitutions per site. All nodes have 100% bootstrap support unless indicated. B. Genome regions with uninterrupted support for (black points) or against (grey points) the ecological split. ML trees for three major regions are shown. Adapted from Shapiro et al. 2012.

The most important point made by the authors, however, remains their evidence for preferential within-ecotype recombination. When examining recombination events affecting recently diverged pairs of strains, they found recombination rates to be higher within than between habitats. This is an essential step toward a unified theory of bacterial genome evolution. Such preferential recombination indeed explains how ecotypes, usually considered evidence of genome-wide genetic sweeps, can develop from gene-specific sweeps. Even if the mechanisms involved in gene transmission are quite different between Eubacteria and eukaryotes, they seem to converge in allowing gene-specific selective sweeps and in restricting genetic ... Read more »

Shapiro BJ, Friedman J, Cordero OX, Preheim SP, Timberlake SC, Szabó G, Polz MF, & Alm EJ. (2012) Population genomics of early events in the ecological differentiation of bacteria. Science (New York, N.Y.), 336(6077), 48-51. PMID: 22491847  

  • September 21, 2012
  • 03:19 AM
  • 279 views

The yak genome and adaptation to life at high altitude

by Nadja in genome ecology evolution etc

... Read more »

Qiu Q, Zhang G, Ma T, Qian W, Wang J, Ye Z, Cao C, Hu Q, Kim J, Larkin DM.... (2012) The yak genome and adaptation to life at high altitude. Nature genetics, 44(8), 946-9. PMID: 22751099  

  • September 19, 2012
  • 03:17 AM
  • 460 views

The evolutionary history of polar bears

by Sacha in genome ecology evolution etc

The study of the Ursus lineage, composed of the brown bear (Ursus arctos), the black bear (Ursus americanus) and the polar bear (Ursus maritimus), offers an opportunity to address adaptation to extreme (salty and glacial) environments in mammals. Moreover, in the last few decades, polar bears have won public and media attention as one of the most charismatic species endangered by global warming and the melting of Arctic ice. Two articles published in Science and Proceedings of the National Academy of Sciences in April and June 2012 provide new data and insights to trace the history of innovations in polar bears and determine how their populations responded to environmental change.

The absence of polar bear fossils older than the late Pleistocene (circa 126 000 years ago), together with mitochondrial data suggesting that polar bears are very closely related to a group of brown bears living on the Admiralty, Baranof and Chichagof (ABC) islands in Alaska, previously led to the belief that polar bears emerged recently from brown bears. The consequences of this hypothesis would be:

  • Polar bears underwent a very rapid and recent (less than 200 ky ago) adaptation to an extreme environment (previously not seen in mammals)
  • The brown bear is a paraphyletic taxon, as the polar bear is the sister species of the ABC bears (see Fig. 1)

Fig. 1: Miller et al., Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, PNAS 2012. Phylogeny of the bear lineage based on mitochondrial DNA and a Bayesian maximum clade credibility model. The blue box contains polar bear individuals from Svalbard and Alaska and an ancient sample 110 to 130 ky old, the yellow box ABC individuals and the pink box other brown bear individuals. The outgroup is made up of black bear individuals.

Nevertheless, neither fossil data, which can be incomplete, nor mitochondrial data, which are sensitive to hybridization, are sufficient to confirm this hypothesis.
Thus the two groups ran parallel projects to collect nuclear data and test their agreement with the mitochondrial data.

Hailer et al., in their work Nuclear Genomic Sequences Reveal that Polar Bears Are an Old and Distinct Bear Lineage published in Science, sequenced 9116 nucleotides from 14 independent introns in 45 individuals of black, brown and polar bears. Introns were sequenced to capture more variation between individuals: given the short time since the divergence of the last common ancestor of bears (estimated between 559 and 1 429 ky ago in their study), exons, whose evolution is more likely to be constrained by selection, would have yielded less information.

Using these data and various phylogenetic reconstruction methods (Bayesian multilocus coalescent approach, Bayesian inference on the concatenated data and neighbour-joining on the differentiation estimates between species), which all led to the same conclusion, they recovered the three bear species as monophyletic and observed in the species tree that the polar bear clade is sister to the brown bear clade. They estimated the divergence time of the two species at around 603 ky ago (338 to 934 ky being the 99% highest credibility range), which clearly revealed a discrepancy with the mitochondrial data.

The authors resolved this incongruence by proposing that the most probable scenario was a divergence between polar and brown bears 600 ky ago, followed by a hybridization event between 111 and 166 ky ago between polar bears and ABC bears that led to the complete replacement of the former's mtDNA by the latter's. The opposite phenomenon (several severe introgression events of polar bear mtDNA into brown bears, leading to all extant mtDNA being of polar origin) is judged very unlikely by the authors given the extended distribution range of the brown bear. The lack of older polar bear fossils was explained by their constantly changing living environment.
Despite the recent hybridization event, Hailer et al. found very few nuclear haplotypes common to polar and brown bears: out of the 35 polar and 79 brown haplotypes, only 6 were shared between the two species. Nevertheless, we must bear in mind that, given the relatively small amount of nuclear data analysed, these findings might not reflect the entire picture of polar and brown bear nuclear DNA ancestry.

In Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, published in PNAS by Miller et al., genome-wide sequencing was used to address the same problem. In this extensive study, the authors assembled a reference genome from a polar bear individual and deeply sequenced the genomes of two ABC bears, one black bear and one non-ABC brown bear (GRZ). Finally, they produced low-coverage data from 23 other polar bear individuals, one of them being an ancient specimen 110 to 130 ky old found in Svalbard. Having aligned all reads from every sample to the polar bear reference genome, they identified 12 million of what they called "SNPs" (even though they are dealing with three different species) and constructed the following phylogeny (Fig. 2).

Fig. 2: Miller et al., Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, PNAS 2012. Phylogeny based on the distance matrix of the 12 million SNPs, built with a neighbour-joining algorithm (probably chosen because of the amount of data and the computational time that more sophisticated algorithms would need).

We observe that, as in the previous paper, the nuclear data do not agree with the mitochondrial data. A scenario where polar bears emerged as a sister species of the brown bear and later experienced a single, massive event of mtDNA introgression from ABC bears (as the polar bear individuals form only one group in Fig. 1) is again strongly favoured.
Regarding the ancient polar bear specimen, both trees tell us that it postdates the mtDNA introgression event and that the modern individuals living in Svalbard are actually more closely related to the modern individuals in Alaska than to the ancient one.

Though up to this point both articles seem consistent, the following findings differ radically from the previous study. Indeed, Miller et al. used a coalescent hidden Markov model on four of their deeply covered genomes (one ABC bear, one polar bear, one brown bear, one black bear) to assess the history of the lineage. They estimated both the split of polar bears from brown bears and the split of the common ancestor of those two species from black bears to have occurred around 4 to 5 My ago, as shown in Fig. 3.... Read more »

Hailer F, Kutschera VE, Hallström BM, Klassert D, Fain SR, Leonard JA, Arnason U, & Janke A. (2012) Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science (New York, N.Y.), 336(6079), 344-347. PMID: 22517859  

Miller W, Schuster SC, Welch AJ, Ratan A, Bedoya-Reina OC, Zhao F, Kim HL, Burhans RC, Drautz DI, Wittekindt NE.... (2012) Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change. Proceedings of the National Academy of Sciences of the United States of America, 109(36). PMID: 22826254  

  • June 5, 2012
  • 02:39 AM
  • 141 views

The Molecular Diversity of Adaptive Convergence

by mrr in genome ecology evolution etc

The Article

The authors of this article wanted to find out what the mutational background of adaptation looks like. Specifically, they asked whether identical populations adapting to a fixed environment would do so via identical mutations or via various alternative pathways. To answer this question they experimentally evolved 115 populations of Escherichia coli at 42.2° Celsius for 2000 generations (6.64 generations of binary fission daily) and sequenced one clone from each population, which they call a “strain” or “line” throughout the paper. All populations originated from the same E. coli B REL1206 ancestral clone. Their system fulfills all of the requirements needed to answer the question: (i) a large number of replicates for statistical power, (ii) complete genome sequencing, so that mutations can be identified unambiguously, and (iii) a complex biological system, to ensure that the number of potential adaptive solutions is not trivial. As the experimental environmental change they chose temperature, a rather complex environmental variable since it affects different biological reactions such as respiration, growth and reproduction. Performance of the different strains was measured as fitness and yield. Fitness was defined as the density after competition of each of the evolved lines against a newly derived Ara+ mutant of REL1206 after 1 day of competition. Each competition trial was replicated 6 times. Yield was measured as the number of colony-forming units and also as the number of cells per volume, both also replicated 6 times. All 115 strains were fully sequenced using paired-end sequencing on an Illumina HiSeq 2000. A computational pipeline was developed to identify all de novo mutations. A total of 1331 mutations were found across 114 genomes; the ratio of non-synonymous to synonymous mutations per site was 5.75, as was the ratio of intergenic and non-synonymous mutations.
It was estimated that ~80% of intergenic and non-synonymous mutations were beneficial. 82 of 119 large (>30bp) deletions were identical between at least two lines. While almost no point mutations or indels were shared between two lines, several IS insertions, duplications and large deletions were shared among several lines. On average, two strains shared only 2.6% of mutations (excluding synonymous mutations) but shared 20% of modified genes and 24.5% of affected operons. Focusing on genes with > 5 mutations, genes were clustered based on the literature into 10 functional units containing 37.5% of the mutations. At this level, two lines shared an average of 31.5% of affected units. All synonymous mutations were singletons. At the level of individual mutations, two mutations had to be identical to be counted as convergent. At higher levels (gene, operon, functional unit), two mutations were assumed shared if at least 10% of the gene affected by the mutations were shared. The difference in convergence between point mutations (2.6%) and functional units (31%) suggests that the diversity of possible adaptive mutations was not fully explored. To estimate the number of sites that contribute to an adaptive response given their data, they developed what they call a “simple model”. The model is similar to the coupon collector’s problem [1]. This problem arises when you are collecting a predefined number of targets, for example Panini stickers of all the players of the European Championship in soccer. You buy them in packets containing a small number of stickers. In the beginning every packet contains many new players and you make fast progress. But as the number of collected players grows, you need more and more packets to find the ones still missing. The same goes for finding all the possible beneficial mutations: you need to sample a lot of mutated sites to track down all the beneficial mutations that contribute to an adaptive response.
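The sticker analogy can be made concrete with a minimal sketch. It assumes the classical coupon collector setting, in which every sticker (or site) is equally likely to be drawn, so it is not the authors' model; the function names and the choice of 50 stickers are purely illustrative.

```python
import random
from fractions import Fraction


def expected_draws(n_targets: int) -> Fraction:
    """Expected number of draws needed to see all n equally likely targets.

    Classical result: n * (1 + 1/2 + ... + 1/n).
    """
    return n_targets * sum(Fraction(1, k) for k in range(1, n_targets + 1))


def expected_distinct(n_targets: int, n_draws: int) -> float:
    """Expected number of distinct targets seen after n_draws uniform draws."""
    return n_targets * (1 - (1 - 1 / n_targets) ** n_draws)


def simulate_draws(n_targets: int, rng: random.Random) -> int:
    """Draw uniformly with replacement until every target was seen once."""
    seen = set()
    draws = 0
    while len(seen) < n_targets:
        seen.add(rng.randrange(n_targets))
        draws += 1
    return draws


if __name__ == "__main__":
    n = 50  # hypothetical album size, chosen only for illustration
    rng = random.Random(1)
    runs = [simulate_draws(n, rng) for _ in range(2000)]
    print("analytical expectation:", float(expected_draws(n)))
    print("simulated mean:        ", sum(runs) / len(runs))
    # The accumulation curve flattens: each new draw adds fewer new targets.
    for draws in (10, 50, 100, 200):
        print(draws, "draws ->", round(expected_distinct(n, draws), 1), "distinct")
```

The flattening accumulation curve is the reason a set of observed mutations with little overlap between lines points to a much larger pool of possible sites.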
Their model assumes a number L of beneficial mutations (sites) and, in addition to the Panini sticker model, it also contains a variance parameter V. This parameter is needed because the sampling probability differs among sites. V captures the compound effect of mutation rates and selection coefficients among sites. For each combination of L and V, and for each of 100 replicates, the sampling probability of each of the L sites was drawn from a shifted gamma distribution and stored in a table. Given this table, 20, 40, 60, 80, 100 and 114 strains were sampled (without replacement) and, for each strain, its exact number of mutations of interest was sampled (without replacement). For each sample size, the number of different sites was estimated. The curve was then averaged across the 100 replicates of the process, and the squared difference between the averaged curve and the one based on real data was used to identify the parameter space of L and V. With this model it was estimated that, with no variance in sampling probability among sites, 850 possible sites of beneficial mutations are required to yield the 400 observed point mutations (including > 3 convergent mutations). When the sampling probability of beneficial mutations varied across sites, the estimated number would rise to ~4500 sites. A further task that the authors tackled was to decipher the role of epistasis, that is, how beneficial mutations interact with each other. Epistasis was examined statistically using a resampling procedure. Assuming no epistasis, the presence of a mutation in a gene should not affect the probability of observing another mutation. The randomization pr... Read more »

Tenaillon, O., Rodriguez-Verdugo, A., Gaut, R., McDonald, P., Bennett, A., Long, A., & Gaut, B. (2012) The Molecular Diversity of Adaptive Convergence. Science, 335(6067), 457-461. DOI: 10.1126/science.1212986  

  • May 31, 2012
  • 07:34 AM
  • 323 views

Rapid Evolution of Enormous, Multichromosomal Genomes in Flowering Plant Mitochondria with Exceptionally High Mutation Rates

by mrr in genome ecology evolution etc

Theories

Variation in genome size and complexity has been debated for decades. In multicellular eukaryotes, genome expansion is a consequence of the proliferation of noncoding DNA [1]. Several theories have emerged to explain variation in genome size and complexity. Among them, the most generally accepted are the bulk-DNA hypothesis, followed by the selfish-DNA hypothesis [2]. However, these hypotheses only partially explain the divergent patterns observed in eukaryotes. The mutational burden hypothesis (MBH), which is mostly based on population genetics principles, is a unifying concept that attempts to reconcile different points of view. This hypothesis implies that “…noncoding element are generally deleterious but proliferate nonadaptively when small effective population reduce the effectiveness of selection relative to genetic drift”. In other words[3], the genome is constantly under two nonadaptive forces: random genetic drift and mutation pressure.

What was expected?

If the MBH is correct, a genome with a high mutation rate would be reduced in size and complexity.

A glimpse of plant mitochondrial genomes: what makes them special

Mitochondrial genomes exhibit a broad range of diversity in genome structure among eukaryotes [4]. Plant mitochondrial genomes usually contain more than 90% noncoding DNA and usually have low point mutation rates, whereas animal mitochondrial genomes seem refractory to such expansion of noncoding DNA [5]. The authors selected the genus Silene, which includes members with high mitochondrial mutation rates, while other members of the same genus have maintained low rates.

Findings and Interpretations

A massive genome expansion associated with a massive acceleration of mutation rates at the DNA level was clearly established in S. noctiflora and S. conica, as compared to S. vulgaris and S. latifolia (Figs 1, 2 and 3).
However, during our round table discussion, it was unclear how the branch lengths of the tree presented in figure 1 were computed. As no branch values were shown, was it based on pre-computed data? These observations were correlated with neither gene nor intron content. Usually genome growth depends largely on intronic and intergenic sequences. Intronic sequences did not show significant variation among Silene species (shown in Table 1). As expected, this massive genome expansion was mostly due to intergenic sequences, which constitute 99% of the total genome size. These intergenic sequences in S. conica and S. noctiflora lack detectable homology with other genomes. A possible explanation is that high mutation rates exerted such pressure that these sequences diverged significantly from their counterparts in other Silene species. A striking feature of S. conica and S. noctiflora was the large number of imperfect repeats observed, which was linked to the presence of a large number of small circular-mapping chromosomes. It is also worth noting that these chromosomes shared only short repeats with other parts of the genome. In contrast to what was found in S. vulgaris and S. latifolia, the fast-evolving genomes of S. conica and S. noctiflora had a reduced recombination rate (figure 6). The underlying idea is that a high mutation rate may favor changes in the repeats that make them less efficient for recombination. However, this argument has to be considered with caution, as recombination may also favor the formation of novel sequences or chimeras, which may contribute to genome instability rather than maintenance. It is still unclear whether this impaired recombination activity in fast-evolving genomes is responsible for the expansion, but it would at least partially agree with the MBH.
The authors investigated whether biparental inheritance and heteroplasmy may play a role in genome expansion and finally claim that there is no significant i... Read more »

  • May 25, 2012
  • 03:04 AM
  • 443 views

An Aboriginal Australian genome reveals separate human dispersals into Asia

by mrr in genome ecology evolution etc

This blog section concerns a topical debate in science, human population history, which extends into daily life as a matter of general public curiosity. Let’s see what the paper described here contributes.

Background

Modern human populations seem to derive from a single African ancestral population, under the well-supported “out of Africa” hypothesis (1). For the colonization of eastern Asia in particular, a “single-dispersal” model has been hypothesized (2), which suggests that Aboriginal Australians are a lineage that diversified recently within the Asian cluster. This hypothesis can be summarized as a topology, as drawn in figure 1A of the article: (Africans,(Europeans,(Asians,Australians))). Recent studies dated the split between Europeans and Asians to around 17K-43K years before the present (ybp). In addition, archaeological evidence supports the presence of modern humans in Australia as far back as ~50K ybp. These inferences are incompatible with the above-mentioned hypothesis, at least in terms of timing. A second scenario can be hypothesized, with an early branching and occupation of Australia and probable later genetic exchange between Asians and Australians, described as (Africans,(Australians,(Asians,Europeans))). This possibility has not been tested so far.
Using an ancient Aboriginal Australian genome free of recent admixture, SNP data from different human populations, and a background in molecular evolution and population genetics theory, this paper aims to distinguish between the competing hypotheses about the relatedness and migration history of ancient Australian populations.

The facts in brief

  • A 100-year-old lock of hair from an Aboriginal Australian male (from Museum of Archaeology and Ethnology, UK)
  • 31 institutions involved, on a worldwide scale
  • 58 authors, with the same geographical extent
  • An ancient genome sequenced with Illumina technology, and SNP-chip data from other human populations
  • Computational analyses (PCA, clustering methods, ABBA/BABA expectations)
  • A Science podcast interview (http://www.sciencemag.org/content/334/6052/94/suppl/DC2)

Discussion

We found the paper quite convincing in testing the two possible scenarios for human colonization of the Australian area. The next paragraphs describe and discuss the evidence and tests they used.

1. Testing the genetic clustering of the Aboriginal Australian genome

The principal component analysis illustrated in figure 1B shows the clustering pattern of SNP-chip data (449k SNPs) from 1220 individuals, covering 79 human populations. This figure reveals a close relationship between the Australian genome and the Highland Papua New Guinea (PNG), Bougainville and Aeta samples, all of them from the Australo-Melanesian region. This pattern could rule out European contamination of the sample, which was quite likely given its long handling by Europeans. We noted the geographical trend of a “continuous” colonization by human populations outside of Africa. I put continuous in quotes to make clear that we are not referring to a single wave of colonization, but to a geographical ordering of the populations. The PCA inset was confusing: it looks like a 3D box, but it is just a zoom-in on the same PCA graph.
A further look at the next PCA axes in the supplementary material showed a very clear differentiation of the Australo-Melanesian sequences on axis 4.

We speculated about the proportion of variance explained by the first two PCA axes, which is not reported. Contrary to our expectations, based on experience with other types of characters (such as morphology and climatic variables), the proportion of variance explained in this plot seems to be very low, as is usual for genomic studies. Then we briefly discussed the idea of a checklist of requirements for preparing a publication: if you are planning to present an analysis, have items i, ii, iii at hand and please do not forget to include them.

2. Testing admixture between the Aboriginal Australian genome and other populations

Figure 1C describes the ancestry proportions for the SNP set of all individuals, obtained by maximum likelihood estimation in the Admixture software. This clustering analysis resembles the Structure k-categories approach, in which each line in the plot corresponds to an individual and the colors represent the ancestral populations. The number of k-categories is assigned a priori and can modify the ancestry proportions of certain individuals, revealing admixture between populations. At first, with k=5, the Aboriginal Australian sample appears to belong to the same ancestral population as PNG and a higher proportion of the Bougainville individuals. Interestingly, South Asian populations seem to share a small proportion of SNPs with the ancestral Aboriginal Australian category. As we moved to higher k-values, up to k=20, the Aboriginal Australian genome appears more mixed with PNG, Bougainville, Aeta and South Asian populations.

We debated how accurately a single genome can represent admixture in the ancestral Aboriginal Australian population, and the unknown variability of that population at the time, which is not considered here.
We asked how the admixture patterns would be affected if this Aboriginal Australian genome represented the most or the least admixed individual in the ancestral population. We also wondered why there are no other, recent Australian samples, even though Aborigines currently live in Australia. At this point in the discussion, we moved on to more socio-political issues about the use of samples and information; as I stated at the beginning, this topic can be of general concern and discussion for several reasons.

The evidence presented so far, and an additional test below, can help to distinguish between single and multiple dispersals “out of Africa” and probably to estimate the proportion of admixture between the first established populations and the second wave of migration. Furthermore, questions about how or why the second migration almost completely replaced the first one are, from my point of view, largely "historical" statements and therefore difficult to derive and test from the available evidence. I consider it very difficult to go beyond the patterns and processes we are able to model and test.

3. D-test and ABBA/BABA hypothesis

We tried to identify the goal and configuration of this test to discriminate between the competing hypotheses. Complete information on the test can be found in references 3 and 4. I will try to summarize it in a nutshell. The D-test is a four-taxon configuration (see figure) in which only biallelic sites are considered (A and B variants). Two of the four taxa have fixed states, commonly including the outgroup sequence (here the Africans, but also the Europeans), and the other two taxa differ between groups (here Aboriginals and Asians). This configuration produces either BABA or ABBA patterns. The next step is to count the number of sites supporting one or the other pattern. The D test = ∑ (sites ABBA - sites BABA) / ∑ total sites.
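The counting step can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes that "total sites" in the formula means the informative ABBA plus BABA sites, as in the commonly used form of the statistic, and that each site is given as four alleles in the conventional order (P1, P2, P3, outgroup). The function names and the toy sites are hypothetical.

```python
from typing import Iterable, Tuple


def count_patterns(sites: Iterable[str]) -> Tuple[int, int]:
    """Count ABBA and BABA sites.

    Each site is a 4-character string of alleles ordered (P1, P2, P3, outgroup).
    'A' is whatever allele the outgroup carries, 'B' is the other allele.
    """
    n_abba = n_baba = 0
    for p1, p2, p3, out in sites:
        if len({p1, p2, p3, out}) != 2:
            continue  # keep biallelic sites only
        if p1 == out and p2 == p3 and p2 != out:
            n_abba += 1  # P2 and P3 share the derived allele
        elif p2 == out and p1 == p3 and p1 != out:
            n_baba += 1  # P1 and P3 share the derived allele
    return n_abba, n_baba


def d_statistic(n_abba: int, n_baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); 0 is expected without gene flow."""
    informative = n_abba + n_baba
    if informative == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (n_abba - n_baba) / informative


if __name__ == "__main__":
    # Toy alignment columns, invented for illustration only.
    toy_sites = ["ACCA", "CACA", "AAAA", "ACGA", "ACCA"]
    abba, baba = count_patterns(toy_sites)
    print("ABBA:", abba, "BABA:", baba, "D:", d_statistic(abba, baba))
```

In practice the significance of D is assessed by resampling blocks of the genome, which this sketch leaves out.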
Usually, the test is used to identify admixture between populations (with AB/BA sites), with the expectation of an equal number of the two types of sites. The D test can be considered more robust to sequencing errors because it compares nucleotides in more than one sequence, and the same error is unlikely to have occurred twice. The authors explicitly say that the test does not allow one to distinguish between the two models of origin, nor gene flow between Asian and Australian populations; however, I consider that the D-test performed here can support the multiple-dispersal model, owing to a statistically significant excess of sites grouping African and Aboriginal Australian genomes (sites with pattern 2 in the figure). Comparing expected and observed values of the D-test can help to discriminate between hypotheses (as they tried in Table 2); however, the expected values reported here for the single- and multiple-dispersal models are so close to each other (~50%), with no credible intervals, that it is difficult to support one hypothesis or the other with the observed patterns. Finally, when implementing the D-test, it is worth remembering that the patterns in current populations, given the hypothetical past events, may have been altered by many other evolutionary processes such as secondary gene flow, structure in the ancient population and incomplete lineage sorting, among others.

Figure 1. Grouping site patterns 1 and 2 used in the D-test. Note that African and European populations have fixed states, whereas Aboriginal Australian and Asian populations vary. This figure is a modification of figure 3 in reference 5. Even though the ABBA/BABA patterns are not clear, the different grouping patterns are based on the article text describing the two models of early dispersal used to perform the test. ... Read more »

Rasmussen, M., Guo, X., Wang, Y., Lohmueller, K., Rasmussen, S., Albrechtsen, A., Skotte, L., Lindgreen, S., Metspalu, M., Jombart, T.... (2011) An Aboriginal Australian Genome Reveals Separate Human Dispersals into Asia. Science, 334(6052), 94-98. DOI: 10.1126/science.1211177  

  • May 4, 2012
  • 04:35 PM
  • 378 views

Distinct signatures of diversifying selection revealed by genome analysis of respiratory tract and invasive bacterial populations (Shea et al, PNAS 2011)

by PierreMillon in genome ecology evolution etc

Diversifying selection is a form of natural selection where intermediate values of a trait become less represented within a population, in favour of extreme values; a process that may subdivide a population between specialized niches and eventually lead to speciation. For instance, it can be theorized that a pathogen colonising several sites of the human body, where it is exposed to wildly different conditions and selective pressures, would have greater chances of survival by expressing a multitude of site-appropriate phenotypes than by reaching an adaptive compromise. While this strategy could be achieved through phenotypic plasticity, it could also result from genetically distinct strains of the pathogen.

Streptococcus pyogenes, also known as the group A streptococcus, or GAS, is a Gram-positive human bacterial pathogen. It is responsible for diseases such as impetigo, a localized skin infection, or pharyngitis, the streptococcal “sore throat”, both of which are mild superficial infections. The same bacterium is involved in a wide range of “invasive” infections, i.e. infections of sterile sites such as blood, which can be severe. From an experimental standpoint, S. pyogenes is a useful model for studying bacterial clonal evolution, because its strains exhibit relatively limited amounts of horizontal transfer across portions of the core genome. This is in contrast to bacterial species that frequently exchange genetic material, thus complicating phylogenetic inference.

The authors of this paper compare S. pyogenes strains found in superficial infections, more precisely in pharyngitis cases, to strains found in invasive infections. The authors state several objectives:

First, they want to extend our limited knowledge about the genomes of pharyngitis strains.
Greater efforts have so far been expended to dissect the molecular basis of the more health-threatening invasive infections.

Secondly, they point out that very little is known about the precise genetic relationship between those two categories, and present their work as the first full genome analysis performed to address this issue. This analysis has been made possible thanks to high-throughput DNA sequencing technologies.

In particular, they want to test the widely accepted model, supported by epidemiologic studies, that most strains causing invasive infections arise from pharyngeal or other benign infections. In other words, do pharyngitis and invasive strains belong to the same genetic pool, provided they were collected from the same geographical location?

Finally, they try to make sense of the genetic differences between pharyngitis and invasive strains in the light of diversifying selection. Can a link be made between the genomic sequences and the selective forces expected from the host oropharynx or sterile-site environments?

On the origin of data

During the tutorial session, we discussed the notions of convenience sampling and reusing material from previous studies. The work presented in this paper is based on eighty-six serotype M3 GAS pharyngitis strains collected from six regional laboratories across Ontario from 2002 to 2010, as well as on two hundred fifteen serotype M3 GAS invasive strains collected from the same location as part of a prospective population-based surveillance study of invasive GAS infections from 1992 to 2009. Those invasive infections include unequal numbers of soft tissue infections, bacteremias, lower respiratory infections, unknown invasive infections, septic arthritides, necrotizing fasciitis, meningitides, toxic shock syndrome cases, peritonitis and other unspecified invasive infections.
We were unsure whether the different and long time periods involved, or the high number and diversity of invasive infections compared to pharyngitis, should be seen as strengths or weaknesses of the paper. Some of our concerns on the matter resurfaced while we were discussing figure 4, as will be explained later.

The DNA sequence data obtained from those strains were mapped to the genome sequence of the M3 reference strain MGAS315 (NC_004070). A different but related experiment, also described in this paper, involved strains obtained from experimentally inoculated nonhuman primates [1].

Go figure

As with previous sessions of this tutorial, we organised our discussion on a figure-by-figure basis. We found most of the figures in this paper to be largely confusing and/or unconvincing:

Figure 1 shows the distribution of Chi2 statistics for unique polymorphisms per gene. The corresponding Bonferroni-adjusted P values, which correct for multiple testing, are written next to the dots on the plot. The meaning of the x-axis is not indicated, making the figure difficult to understand.

Figure 3 shows two unrooted neighbour-joining phylogenetic trees assembled from the complete list of all core biallelic SNPs. One corresponds to the eighty-six pharyngitis strains and the other to one hundred temporally matched invasive strains. The two trees were assembled completely independently of each other. The authors claim that their remarkably similar overall structure suggests common evolutionary histories. First, we discussed whether or not it would have been possible to root the trees, and concluded that it probably would have been very difficult. We also had our doubts about the focus on SNPs that seems to run through this paper. 
But most importantly, we didn’t see any striking similarity between the shapes of those two trees.

Figure 4, a combined phylogenetic tree for pharyngitis and invasive strains, was much more convincing with regard to the last issue. Pharyngitis strains are not massed on one branch of the tree, nor are invasive strains. An invasive strain will often be closer to a pharyngitis strain than to another invasive strain, supporting the idea that the two kinds of strains belong to the same genetic pool. Still, we also had problems with figure 4. The meaning of “SC”, as in SC1 to SC10, is unclear. Assuming those represent ten different strain collections, it would further suggest that strains from the same geographical area are more closely related. But the figure should indicate more explicitly which strains belong to which collection and give us more information about those collections. Otherwise, we are left wondering, for example, why SC3, SC4 and SC7 are so close to each other. Another issue, r... Read more »
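
As a rough illustration of the kind of statistic Figure 1 reports, here is a minimal Python sketch of a per-gene chi-square test followed by a Bonferroni correction. It is not the authors' procedure: we assume a simple goodness-of-fit test asking whether a gene carries more polymorphisms than its share of the core genome would predict, and the gene names, counts and genome fractions are invented for the example.

```python
import math

def chi2_one_df(observed, total, fraction):
    """Goodness-of-fit chi-square (1 degree of freedom) for one gene.

    observed: polymorphisms seen in the gene
    total:    polymorphisms seen in the whole core genome
    fraction: share of the core genome occupied by the gene (assumed known)
    """
    expected_in = total * fraction
    expected_out = total - expected_in
    observed_out = total - observed
    return ((observed - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)

def p_value_one_df(chi2):
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))

def bonferroni(p_values):
    # Multiply each P value by the number of tests, capped at 1.
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Invented numbers for illustration only (not taken from the paper).
TOTAL_POLYMORPHISMS = 1000
genes = {
    "geneA": (30, 0.002),  # (observed polymorphisms, fraction of core genome)
    "geneB": (3, 0.002),
    "geneC": (5, 0.004),
}

names = list(genes)
raw = [p_value_one_df(chi2_one_df(obs, TOTAL_POLYMORPHISMS, frac))
       for obs, frac in genes.values()]
adjusted = bonferroni(raw)

for name, p, p_adj in zip(names, raw, adjusted):
    print(f"{name}: raw P = {p:.3g}, Bonferroni-adjusted P = {p_adj:.3g}")
```

With these made-up counts, only the hypothetical geneA remains significant after correction; the other two adjusted P values are capped at 1. The Bonferroni step is deliberately conservative, which is the usual trade-off when thousands of genes are tested at once.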

Shea, P., Beres, S., Flores, A., Ewbank, A., Gonzalez-Lugo, J., Martagon-Rosado, A., Martinez-Gutierrez, J., Rehman, H., Serrano-Gonzalez, M., Fittipaldi, N.... (2011) Distinct signatures of diversifying selection revealed by genome analysis of respiratory tract and invasive bacterial populations. Proceedings of the National Academy of Sciences, 108(12), 5039-5044. DOI: 10.1073/pnas.1016282108  

  • April 4, 2012
  • 08:37 AM
  • 291 views

Mouse genomic variation and its effect on phenotypes and gene regulation (Keane, Goodstadt, and Danecek et al., Nature 2011)

by Laetitia in genome ecology evolution etc

- Motivation: Documenting the genomic variation of 17 inbred strains of mice. Describing the distribution of variants between strains and its relation to phenotypes and gene regulation. Exploring the evolutionary origins of the subspecies that gave rise to the laboratory mouse.

- Structure: The article is divided into three main parts: i) description of genomic variants, ii) examination of functional consequences of allele-specific variation on transcript abundance, and iii) investigation of the molecular nature of functional variants and their position relative to genes.

- Experimental design: The 17 most widely used mouse strains (liver tissue) were selected for whole genome sequencing on the Illumina GAIIx sequencing platform. To estimate error rates and evaluate the method, a NOD/ShiLtJ BAC clone library was constructed. 107 BACs from seven loci on chromosomes 1, 6, 11 and 17 from this library were shotgun cloned and capillary sequenced. SNPs, structural variants (inversions, balanced translocations, CNVs), and transposable elements were identified based on a reference genome (the one that had already been sequenced before: C57BL/6J). Bayesian concordance analysis was used to construct gene trees across the genomes of M. m. musculus, M. m. domesticus and M. m. castaneus. M. spretus was used as the outgroup. Allele-specific expression was analyzed in liver, thymus, spleen, lung, hippocampus and heart using RNA sequencing. Each lane of transcriptome sequence was re-genotyped prior to downstream analysis. For this transcriptome analysis an F1 hybrid of two sequenced strains was used. To identify sequence variants that underlie quantitative traits and investigate their common molecular features and their position relative to coding genes, the complete genome sequences of eight inbred strains (founder haplotypes of lab strains) were used. The QTLs used were chosen based on previous literature (mainly [1]). 
For more details on methods, consult the supplementary information of the article.

- Main results: The whole genome sequences of 17 inbred laboratory mouse strains are reported. Ten times more variants than previously known were found. The phylogenetic history of laboratory mouse strains could not be completely resolved. 12% of allele-specific transcripts showed a significant tissue-specific expression pattern. The molecular nature of functional variants, as well as their position relative to coding genes, varies according to the effect size of the quantitative trait locus (QTL) and seems to have a significant effect on the function.

Oddities of the article

- 22 authors
- 3 really big guys in the end
- 3 guys sharing first author
- very condensed
- article represents the integrative nature of current science

Discussion among tutorial participants

- General discussion about hiring process

a) Generally, it is good to appear several times as a middle author if the PI, as well as the journal itself, has a good reputation.
b) For first authors, mostly the reputation of the journal counts.

--> The hiring process is different for different positions. For a technician it is good if a) applies, and for a PhD position or a post-doc position it is good if b) applies. For further steps in a career the criteria are more stringent.

- Experimental design

Would you repeat the experimental design of this study?

Yes) The results are influential for all kinds of inbred mammal studies, even humans; a lot of new information is produced and the study has a high impact.

No) It might be considered a waste of money to spend on 17 lab strains if only a subset is used in most analyses. Most of the lab strains look more or less the same. Fewer lab strains and more wild types could have been chosen.

--> Afterwards we always know more than beforehand!
In conclusion, for the lab strains, behavior, morphology and physiology are better studied than for any other species. This information can be used to explain small genetic differences. There is also a social constraint: you want to include as many people as possible to make it more interesting for the whole mouse community. This contributes to the Collaborative Cross, a community resource for the genetic analysis of complex traits. The Complex Trait Consortium aims to promote the development of resources that can be used to understand, treat and ultimately prevent pervasive human diseases [2].

- Figure 1…

…was difficult to understand. What is the reference genome? (C57BL/6J) What does “inaccessible” mean? (mostly LINEs, chr 17, chr X). There is more variation in outbred strains (more color, longer distance). A lot of people had problems with this figure, most probably because figure 1a contains a lot of information at different levels, and it takes the reader a long time and a good color printer to understand what the authors want to show. On the other hand, figure 1b is rather simple. From left to right the blue circle increases relative to the red one. Does that represent how variation evolves in the genome? The SNPs show a small blue circle. This could be explained by bottlenecking or by selection acting on SNPs. Transposable elements show a large blue circle. Are lab strains evolving faster in this class? Unfortunately this part of the figure is not touched upon in the discussion section of the article.

- The generation and sequencing of NOD/ShiLtJ bacterial artificial chromosomes was appreciated by most of the students. It is a nice way to estimate error rates of the new sequencing techniques and it evaluates and confirms the method used. Public databases contain lots of false negatives per se right now because not many individuals/strains/species were fully sequenced. It is compulsory to ... Read more »
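
To make the BAC-based error estimate discussed above more concrete, here is a minimal Python sketch of how variant calls from short reads could be scored against a trusted set obtained by capillary sequencing of the same regions. This is our own simplification and not the pipeline used in the article: the function name, the tuple layout and all coordinates are invented for the example.

```python
def error_rates(truth, calls):
    """Score variant calls against a trusted set covering the same regions.

    truth: set of (chromosome, position, alt_allele) tuples, here standing in
           for variants found in the capillary-sequenced BACs
    calls: set of the same kind of tuples from short-read variant calling,
           restricted to the BAC regions
    """
    shared = truth & calls
    missed = truth - calls        # false negatives: real variants not called
    unsupported = calls - truth   # false positives: calls the BACs do not support
    false_negative_rate = len(missed) / len(truth) if truth else 0.0
    false_call_fraction = len(unsupported) / len(calls) if calls else 0.0
    return false_negative_rate, false_call_fraction, len(shared)

# Invented coordinates for illustration only.
bac_variants = {
    ("chr1", 1000, "A"),
    ("chr1", 2500, "T"),
    ("chr6", 400, "G"),
    ("chr11", 90, "C"),
}
short_read_calls = {
    ("chr1", 1000, "A"),
    ("chr6", 400, "G"),
    ("chr11", 90, "C"),
    ("chr17", 777, "T"),
}

fn_rate, false_calls, n_shared = error_rates(bac_variants, short_read_calls)
print(f"shared variants: {n_shared}")
print(f"false negative rate: {fn_rate:.2f}")
print(f"fraction of unsupported calls: {false_calls:.2f}")
```

The comparison only makes sense inside the regions covered by both data sets, which is why the sketch assumes the short-read calls have already been restricted to the BAC loci.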

Keane, T., Goodstadt, L., Danecek, P., White, M., Wong, K., Yalcin, B., Heger, A., Agam, A., Slater, G., Goodson, M.... (2011) Mouse genomic variation and its effect on phenotypes and gene regulation. Nature, 477(7364), 289-294. DOI: 10.1038/nature10413  

  • March 30, 2012
  • 03:47 AM
  • 475 views

Cryptic genetic variation promotes rapid evolutionary adaptation in an RNA enzyme (Hayden et al, Nature, 2011)

by Diego in genome ecology evolution etc

Cryptic genetic variation (CGV) is defined as “standing genetic variation that does not contribute to the normal range of phenotypes observed in a population, but that is available to modify a phenotype that arises after environmental change or the introduction of novel alleles”... Read more »
