
Here we’re going to run through one way to process an amplicon dataset and then many of the standard, initial analyses. We’ll be working a little at the command line, and then primarily in R, so it’d be best if you already have some experience with both. If you’re new to either or both, there is a Unix crash course here and an intro to R here you may want to check out first 🙂

Before we get started here, an obligatory public service announcement: keep in mind that, as with everything on this site, none of this is meant to be authoritative. This is simply one example of one workflow. When working with your own data you should of course never follow any pipeline blindly. Pay attention to differences between your data and the tutorial dataset you are using, as these differences can often require changes to important parameters. Don't let anything here, or anywhere, constrain your science to doing only what others have done!

Now that that’s out of the way, let’s get to it!

We’re going to be using DADA2, which is a relatively new processing workflow for recovering single-nucleotide resolved Amplicon Sequence Variants (ASVs) from amplicon data – if you’re unfamiliar with ASVs, you can read more about ASVs vs OTUs on the amplicon home page here, along with some other introductory information. Developed and maintained by et al., DADA2 leverages sequencing quality and abundance information to a greater extent than previously developed tools. It generates an error model based on your actual data, and then uses this error model to do its best to infer the original, true biological sequences that generated your data. The paper can be found here, and the DADA2 R package can be found here. The DADA2 team has a great tutorial available here, and I learned my way around DADA2 by following that and reading through the manual.

This page builds upon that with: 1) heavier annotations and explanations which, in the style of this site all around, will hopefully help newcomers to bioinformatics, of course 🙂; and 2) examples of common analyses to do in R after processing amplicon data.
