Data Format Overview
Two main types of files are required: a taxonomic profile (containing abundance and taxonomy information) and a metadata file.
Taxonomic profile formats
Taxonomic profiles derived from either 16S rRNA amplicon data or whole-genome shotgun metagenomic data can be uploaded. The following formats are accepted (accompanying example metadata files are also provided below):
mothur
Two files are needed for a mothur taxonomic profile: a consensus taxonomy file (download) and a .shared file (download). The consensus taxonomy file can be created with mothur's classify.otu command, and the .shared file with mothur's make.shared command. The accompanying example metadata file can be downloaded here.
BIOM format
Since QIIME v1.5.0, QIIME has used the BIOM format for its OTU tables. A BIOM file can be generated with QIIME's make_otu_table.py script. An example BIOM file can be downloaded here.
Tab-separated (.txt) files
The tab-separated (.txt) format is used for taxonomic profiles. It consists of a data table of abundance values (for 16S data, raw read counts), with rows for features (OTUs) and columns for samples, saved as a tab-delimited text file. The file can be generated from any spreadsheet program, but it must follow the specific format described below:
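A mothur .shared file stores one sample per row (columns label, Group, numOtus, then one column per OTU), whereas the tab-delimited format described above expects OTUs as rows and samples as columns. The sketch below transposes one layout into the other. It assumes the .shared file holds a single distance label, and it writes "#NAME" as the first header cell, a convention borrowed from the metadata format rather than confirmed for taxonomic profiles. Taxonomy from the consensus taxonomy file is not merged in, and the function name is hypothetical.

```python
import csv
import io


def shared_to_otu_table(shared_text: str) -> str:
    """Transpose a mothur .shared table (one sample per row) into a
    tab-delimited table with OTUs as rows and samples as columns."""
    rows = [r for r in csv.reader(io.StringIO(shared_text), delimiter="\t") if r]
    header, records = rows[0], rows[1:]
    otu_ids = header[3:]               # columns after label, Group, numOtus
    samples = [r[1] for r in records]  # the Group column holds sample names

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["#NAME"] + samples)  # assumed header convention
    for j, otu in enumerate(otu_ids):
        # Collect this OTU's count from every sample row.
        writer.writerow([otu] + [r[3 + j] for r in records])
    return out.getvalue()


if __name__ == "__main__":
    shared = (
        "label\tGroup\tnumOtus\tOtu001\tOtu002\n"
        "0.03\tSample1\t2\t10\t0\n"
        "0.03\tSample2\t2\t3\t7\n"
    )
    print(shared_to_otu_table(shared), end="")
    # #NAME    Sample1  Sample2
    # Otu001   10       3
    # Otu002   0        7
```

If a .shared file contains several labels (for example, several OTU cutoffs), filter it down to one label first; otherwise the same sample name would appear in more than one column.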
Unrecognized terms (e.g., "uncultured" or strain identifiers) can be included in the taxonomic profile without causing problems. Information at multiple taxonomic ranks is not required, and there is no minimum or maximum taxonomic rank that must be included. Data cells can contain read counts (preferred), proportions, or percentages of taxa in each sample.
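Proportions and percentages can always be derived from read counts, but not the other way round. A minimal sketch for a single sample (the function and taxon names are illustrative only):

```python
def to_relative(counts: dict[str, int]) -> tuple[dict[str, float], dict[str, float]]:
    """Convert one sample's read counts into proportions and percentages."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    proportions = {taxon: n / total for taxon, n in counts.items()}
    percentages = {taxon: 100 * p for taxon, p in proportions.items()}
    return proportions, percentages


props, pcts = to_relative({"Bacteroides": 30, "Prevotella": 10})
print(props)  # {'Bacteroides': 0.75, 'Prevotella': 0.25}
print(pcts)   # {'Bacteroides': 75.0, 'Prevotella': 25.0}
```

Once values are normalized, the sample's total read count is lost, which is likely one reason counts are the preferred input.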
Notes when formatting your data:
Examples
Metadata file format (download)
The tab-delimited (.txt) format is also used for metadata files. Sample names/IDs go in the first column, whose header cell (first row) is "#NAME". Each sample occupies a row, and each metadata type (e.g., depth, temperature) occupies a column. Data values should be discrete, qualitative labels (e.g., HIGH, MED, LOW). Make sure the file contains no empty cells or NA values, and use the same sample names/IDs as in your taxonomic profile file. Neither metadata type names nor metadata labels may contain tab characters, since tabs delimit separate items.
Notes when formatting your data:
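Several of the metadata rules above can be checked mechanically before upload. The sketch below assumes the file has been read into a string and that the sample names from the taxonomic profile are available as a set; the function name and messages are hypothetical. It does not check that values are discrete labels, since that depends on the data.

```python
def check_metadata(text: str, profile_samples: set[str]) -> list[str]:
    """Return human-readable problems found in a tab-delimited metadata table."""
    # Split on tabs only; a stray tab inside a name shows up as an extra field.
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not rows or rows[0][0] != "#NAME":
        return ['first cell of the header row must be "#NAME"']

    problems = []
    width = len(rows[0])
    seen = set()
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"row {row_no}: expected {width} fields, found {len(row)}")
        if any(value.strip() in ("", "NA") for value in row):
            problems.append(f"row {row_no}: empty or NA cell")
        seen.add(row[0])

    # Sample IDs must match the taxonomic profile in both directions.
    for name in sorted(seen - profile_samples):
        problems.append(f"{name}: not in the taxonomic profile")
    for name in sorted(profile_samples - seen):
        problems.append(f"{name}: missing from the metadata")
    return problems


meta = "#NAME\tSampleType\nSample1\tskin\nSample2\tgut\n"
print(check_metadata(meta, {"Sample1", "Sample2"}))  # []
```

An empty list means no problems were detected; row numbers count non-blank lines, starting with the header as row 1.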
Example
#NAME	SampleType
Sample1	skin
Sample2	gut
Sample3	skin
Sample4	gut
Sample5	gut
Sample6	gut
Sample7	skin
Sample8	skin
Tree file format (optional) (download)
The tree file (.tre) must be in either Newick format (example below) or Nexus format. You can generate it from representative sequences using QIIME or other software. A phylogenetic tree is a diagram that represents evolutionary relationships among species, reflecting how they evolved from common ancestors: two organisms that share a more recent common ancestor are more closely related. The tree therefore encodes distances between organisms, which are used for UniFrac distance-based analysis, as implemented in the phylogenetic beta-diversity analysis.
Example
(((((589277:0.00067,580629:0.00014)0.459:0.09069,535375:0.02088)0.766:0.02036,589071:0.03177)0.870:0.56187,(968675:0.23223,(1060621:0.00076,355750:0.00014)0.726:0.21268)0.845:0.15546)0.772:0.05313, ((1078207:0.24993,1097208:0.13062)0.918:0.16344,938948:0.34514)0.890:0.02316)0.862;
Raw sequence data formats
Raw 16S sequence data (single- or paired-end) should be uploaded as demultiplexed, per-sample, compressed files together with a metadata file.
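Before uploading raw reads, it can help to confirm that each per-sample file really is compressed FASTQ. The sketch below assumes gzip compression (.gz), which is common for FASTQ but not stated on this page, and it inspects only the first record of each file; the function name is hypothetical.

```python
import gzip


def looks_like_gzipped_fastq(path: str) -> bool:
    """Return True if `path` is gzip-compressed and its first record is valid FASTQ."""
    try:
        with gzip.open(path, "rt") as handle:
            # A FASTQ record is four lines: @header, sequence, +separator, qualities.
            header, seq, plus, qual = (handle.readline().rstrip("\r\n") for _ in range(4))
    except (OSError, EOFError, UnicodeDecodeError):
        # gzip.BadGzipFile (not gzip data) is a subclass of OSError;
        # EOFError signals a truncated gzip stream.
        return False
    return (header.startswith("@") and plus.startswith("+")
            and len(seq) > 0 and len(seq) == len(qual))
```

Run it over every file in the upload folder; any file that returns False is either uncompressed, compressed with another tool, or not FASTQ.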
Example
A demo data set containing 10 FASTQ files can be downloaded here.
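Because UniFrac relates the optional tree file to the abundance table, the tree's tip labels presumably need to match the OTU IDs in the taxonomic profile (the example tree's tips are numeric OTU IDs such as 589277). That matching rule is an assumption; the page does not state it explicitly. The sketch below handles plain Newick only (not Nexus, quoted labels, or bracketed comments) and reports mismatches in both directions; the function names are hypothetical.

```python
import re

# An unquoted tip label follows "(" or "," and ends at ":", ",", ")" or ";".
# Internal-node labels (e.g. the 0.459 support values) follow ")" and are skipped.
TIP_LABEL = re.compile(r"[(,]\s*([^\s():,;\[\]']+)")


def tree_tips(newick: str) -> set[str]:
    """Return the set of unquoted tip labels in a Newick string."""
    return set(TIP_LABEL.findall(newick))


def compare_tips(newick: str, otu_ids: set[str]) -> tuple[list[str], list[str]]:
    """Return (tips absent from the profile, profile OTUs absent from the tree)."""
    tips = tree_tips(newick)
    return sorted(tips - otu_ids), sorted(otu_ids - tips)
```

Applied to the example tree, `tree_tips` yields the ten OTU IDs 589277, 580629, 535375, 589071, 968675, 1060621, 355750, 1078207, 1097208 and 938948; two empty lists from `compare_tips` mean the tree and the profile agree.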