The paper uses 16S rDNA sequencing, which is a bit old-fashioned now, but it was a good method when the paper was published. The steps basically involve:
1. Extract all DNA from poop, normally using a kit that basically makes DNA stick to tiny plastic beads. You wash the beads in a bunch of different chemical solutions to isolate DNA from the original sample and purify it. There are a lot of different methods to do this.
2. Amplify a small section of DNA (part of the 16S rRNA gene) that's found in all bacteria and archaea and can be used as a barcode. This barcode has some areas that change a lot across different species and some areas that don't change much.
3. Sequence the amplified DNA. The DNA sequencer determines the sequence of nucleotides in each DNA amplicon (an amplicon is an amplified piece of DNA). An example DNA sequence is ACCTGGCT.
4. The DNA sequencer produces millions of DNA sequences in parallel and stores them, along with some metadata (e.g. quality and confidence measurements), in text files.
5. When this paper was published, a friendly bioinformatician would have taken the text file and clustered the different sequences. Sequences that were at least 97% similar were binned together as a rough approximation of a species. Different taxonomic levels have different cutoffs, but it's all quite vague, and there are better methods now that denoise sequences using the quality measurements (e.g. the DADA2 method).
6. A count for each different bin is generated, and "representative sequences" for each bin are matched against taxonomic databases to see what species are present.
7. Normal ecological analysis is done on the count data to calculate alpha and beta diversity or do other types of analysis. Once you have counts, it doesn't matter that the data are from bacteria instead of sheep or penguins.
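If it helps to see the clustering, counting and diversity steps concretely, here's a toy Python sketch. It is not what any real pipeline does: it assumes all reads are the same length and already aligned, so "similarity" is just the fraction of matching positions, whereas real tools use proper sequence alignment. The function names (`identity`, `cluster_otus`, `shannon`, `bray_curtis`) are made up for illustration, and Shannon and Bray-Curtis are just two common choices of alpha and beta diversity measure.

```python
import math
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (assumes equal-length, pre-aligned reads)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, threshold=0.97):
    """Greedy binning: a read joins the first bin whose representative it
    matches at >= threshold identity, otherwise it starts a new bin.
    Returns {representative_sequence: read_count}."""
    counts = {}
    # Visit the most abundant unique sequences first so they become representatives.
    for seq, n in Counter(reads).most_common():
        for rep in counts:
            if identity(seq, rep) >= threshold:
                counts[rep] += n
                break
        else:
            counts[seq] = n
    return counts


def shannon(counts):
    """Shannon index (an alpha diversity measure) of one sample's bin counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity (a beta diversity measure) between two
    samples, given as count vectors over the same bins."""
    denom = sum(u) + sum(v)
    return sum(abs(a - b) for a, b in zip(u, v)) / denom if denom else 0.0


# Made-up 100-nucleotide reads: `near` differs from `base` at 2 positions
# (98% identity), `far` differs at 4 positions (96% identity).
base = "ACGT" * 25
near = "TT" + base[2:]
far = "TTTTT" + base[5:]

bins = cluster_otus([base] * 5 + [near] * 2 + [far] * 3)
print(list(bins.values()))            # read count per bin
print(shannon(list(bins.values())))   # alpha diversity of this sample
print(bray_curtis([7, 3], [3, 7]))    # beta diversity between two samples
```

Once the reads have been collapsed into counts, the last two functions know nothing about DNA, which is the point made above about sheep and penguins.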
Newer methods involve sequencing every single bit of DNA in a sample, not just a specific region. This is called metagenomics and it's very hard to do and requires very big computers and big DNA sequencers.
Great summary! Although I would argue that 16S is still a perfectly good (and cost-effective) method, especially with DADA2. There are also neat sequencing techniques like CCS (circular consensus sequencing) which give you really high resolution of a target region (amplicon) without sequencing a lot of redundant/uninformative DNA.
Very small amounts are sampled, on the order of grams. I'm not sure exactly, I worked on the bioinformatics side of things. I think an Illumina MiSeq requires 50 - 500 nanograms of DNA to work well.
Sampling and storage methods can significantly change the bacterial composition of an environmental sample (in this case, poop). The exact protocols will depend on the aims of the study. The gut microbiome is a gradient and very dynamic. Different parts of the gut will have different bacterial compositions. Some people might prefer to get a locally accurate sample from a biopsy of the intestine, but you won't manage to recruit many participants. Other studies may prefer to use faecal samples as a proxy for overall gut state, which lets you recruit more people. Some protocols may homogenise (blend) the poop before sampling, others might not. Here's a nice review:
> I think an Illumina MiSeq requires 50 - 500 nanograms of DNA to work well.
You can go as low as 10 ng depending on the library you use (or so the vendors say), but I'm not sure it's the case for these specific applications (my experience is with other, equally difficult samples, but from a single source).
Really cool, thanks for the details. It's reminiscent of a plot point in a sci-fi novel I'm reading, Zendegi: the protagonist is hell-bent on cracking the problem of simulating the brain in a computer, but funding is running out. Her coworker excitedly explains that he's been awarded a grant for his project: simulating the interactions of microbe species in latrines, for the purpose of preventing outbreaks of disease. He wants our protagonist to jump ship on the brain stuff. He says it will always be there, that she can do some real good right now, and that the skills will transfer over, so she won't be wasting her time learning how to simulate the microbial communities in poop.
(Alas, an eccentric billionaire with hopes of uploading himself to a supercomputer swoops in to fund the mind-mapping project...)