<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Contingency, predictability in the evolution of a prokaryotic pangenome on Superphysics</title>
    <link>https://www.superphysics.org/research/sciences/evolution/</link>
    <description>Recent content in Contingency, predictability in the evolution of a prokaryotic pangenome on Superphysics</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Mon, 01 Jan 0001 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://www.superphysics.org/research/sciences/evolution/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Contingency, predictability in the evolution of a prokaryotic pangenome</title>
      <link>https://www.superphysics.org/research/sciences/evolution/predictability/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://www.superphysics.org/research/sciences/evolution/predictability/</guid>
      <description>&lt;h3 id=&#34;significance&#34;&gt;Significance&lt;/h3&gt;&#xA;&lt;p&gt;Different strains of the same prokaryotic species often show significant variation in gene content.&lt;/p&gt;&#xA;&lt;p&gt;We do not know whether this variation is due to:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;genetic drift&lt;/li&gt;&#xA;&lt;li&gt;selection&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Under selection, sets of genes would be expected to be gained or lost together, or sequentially, in a consistent and repeatable way.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;We used machine learning to predict the presence or absence of variable genes in a large set of Escherichia coli strains, using other variable genes as predictors.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Results and Discussion</title>
      <link>https://www.superphysics.org/research/sciences/evolution/predictability2/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://www.superphysics.org/research/sciences/evolution/predictability2/</guid>
      <description>&lt;h3 id=&#34;results&#34;&gt;Results&lt;/h3&gt;&#xA;&lt;p&gt;A Substantial Subset of Accessory Genes in E. coli Can Be Predicted Accurately.&lt;/p&gt;&#xA;&lt;!-- The E. coli pangenome inferred from 2,241 genomes in this study contained accessory gene families with 12,840 unique presence–absence patterns (PAPs) that were present in more than 1% and fewer than 99% of genomes and were hence included in this study. &#xA;&#xA;In total, 56,579 gene families were inferred by Panaroo, but 28,774 genes were excluded from the analysis because they were present in more than 99% or fewer than 1% of genomes. These were mostly very rare genes. &#xA;&#xA;Of the remaining 27,805 genes, 19,137 had a presence–absence pattern that was shared by at least one other gene and were hence collapsed into 4,172 presence–absence patterns, in addition to 8,668 genes with unique distributions. &#xA;&#xA;The presence or absence of 3,922 (30.5%) PAPs could be accurately predicted (both F1 scores &gt;= 0.9) in the test set after the Random Forest model had been trained.&#xA;&#xA;Of this accurately predicted set, a total of 2,144 (54.7%) had an associated D-statistic greater than or equal to 0, meaning that they were distributed widely on the tree.  --&gt;&#xA;&lt;!-- The remaining 1,778 PAPs were “clumped” on the tree, and it is therefore more difficult to ascribe causality to their associations, since a simpler explanation is that they were acquired at more or less the same time and have been vertically inherited together ever since.  --&gt;&#xA;&lt;!-- SI Appendix, Fig. S3 shows that although the D score is not directly proportional to the parsimony score, the two correlate strongly, meaning that all 2,144 PAPs with a D score greater than or equal to zero also had a parsimony score of at least eight, though most had a much higher score (SI Appendix, Fig. S3). 
&#xA;&#xA;This means that we have only examined predicted genes that have been gained and/or lost at least eight times across the pangenome, and furthermore, we require that their distribution is widespread and not localised (35).&#xA;&#xA;We focus on this set of 2,144 PAPs because they manifest a broad, patchy distribution across the phylogeny, stemming from a combination of lateral gene transfer and loss, and we can accurately predict their presence or absence based on the other genes present in the genome.&#xA;&#xA;To evaluate whether the presence–absence matrix of the 12,840 uniquely distributed PAPs is any more structured than expected by chance, given the underlying phylogeny and the gene gain and loss rates inferred in this study, we compared results from the original data to those from datasets simulated using the inferred transition rate matrices. Simulated datasets were analysed in the same way as the empirical data. &#xA;&#xA;Treating this as our null hypothesis, we can evaluate the extent to which, even after filtering by the D statistic, the predictability of a gene’s presence or absence can be explained by chance. &#xA;&#xA;In each simulated dataset, several genes pass the F1 score thresholds, but the majority of these can be explained by a low D score and are hence removed from the set of accurately predicted genes. &#xA;&#xA;The proportion of genes that successfully pass both thresholds is between 1.0% and 1.7%, which can be thought of as a false discovery rate. The empirical analysis yielded 16.7% of genes accurately predicted with D &gt;= 0 (SI Appendix, Fig. S4). 
Accordingly, we can reject the hypothesis that the associations observed in our dataset arose solely by chance, or that the pangenome dataset contains no more gene–gene correlation structure than randomly assembled data.&#xA;&#xA;In principle, we would expect the number of accurate predictions to increase with increasing quantity of data, provided that the predictions being made are not artefactual. &#xA;&#xA;Hence, if downsampling the dataset reduces the number of accurate predictions, it is reasonable to infer that adding more data would increase it. Therefore, we carried out a sensitivity analysis on dataset size. &#xA;&#xA;We randomly eliminated 50%, 75%, 90%, and 95% of the genomes in the dataset and then repeated our Random Forest prediction 10 times per dataset. In each case, reducing the number of genomes substantially and significantly reduced the number of PAPs that were accurately predicted, while having a much smaller effect on the total number of PAPs that could be analysed. For example, the average number of accurately predicted PAPs over 10 repeated analyses, after filtering out PAPs with D score &lt; 0, was 1,650/12,642 (13.1%) using 50% of the genomes, compared with 2,144/12,840 (16.7%) in our full analysis. When only 5% of genomes were included, an average of 713/11,644 (6.1%) PAPs were predicted accurately (SI Appendix, Fig. S5). This suggests that predictions would be likely to improve with the addition of more genomes.&#xA;&#xA;The links between the 2,144 predictable PAPs were used to construct a network with 33,426 edges featuring all well-predicted target nodes and their predictors (Fig. 1). This network consisted of 243 connected components ranging in size from 2 to 248 nodes, featuring both coincident and avoidance edges sensu ref. 19. 
By considering only the coincident relationships (33,138 out of 33,426 edges), we found 240 connected components containing between 2 and 244 nodes. Taking only avoidance relationships, 28 connected components were generated, ranging in size from 2 to 22 nodes. As nonunique gene patterns are collapsed into one entity in both the analysis and the presentation of results, some nodes represent multiple genes. Of the well-predicted PAPs, 827 patterns were observed in more than one gene. In total, independent of whether they were well predicted by our Random Forest model or not, 19,137 genes had nonunique PAPs and were collapsed into 4,172 patterns, which were then used both as features for prediction and as patterns to predict. --&gt;&#xA;&lt;p&gt;The Random Forest approach is stochastic, so we repeated the analysis 100 times, each time using a different random split of the data into training and test sets.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
