How to download FASTQ data?
Here are four ways to download FASTQ files:
- Use the SRA Toolkit to download .sra files, then convert them to .fastq files
- Use the SRA Run Selector to get an accession list (a .txt file), then download in batches
- Search and download with SRA Explorer [recommended]; supports batch downloads
- Search and download from EMBL-EBI
For data preprocessing, please refer to "miRNA-seq preprocessing".
1. Use the officially recommended software, SRA Toolkit
Step1 Download NCBI SRA Toolkit
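# Download the SRA Toolkit
wget [chosen link]
tar zxvf sratoolkit.*.tar.gz
cd sratoolkit.*/
pwd    # prints the toolkit directory; its bin/ subdirectory goes on PATH
# Install: add the toolkit's bin directory to PATH
# Way 1: open ~/.bash_profile in an editor, add the export line shown in Way 2, then reload
vim ~/.bash_profile
source ~/.bash_profile
# Way 2: append the export line from the shell
echo 'export PATH=$PATH:YOUR_PATH/sratoolkit.2.9.6-ubuntu64/bin' >> ~/.bash_profile
source ~/.bash_profile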
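After reloading the profile, it is worth confirming that the tools are actually found on PATH. The helper below is a minimal sketch; `check_tools` is a hypothetical name introduced here, not part of the SRA Toolkit.

```shell
# Report which of the given commands can be found on PATH.
# Returns 0 when all of them are found, 1 if any is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use after editing ~/.bash_profile:
#   check_tools prefetch fastq-dump vdb-validate
```

If any tool is reported as missing, check that the path in the export line matches the directory printed by pwd, followed by /bin.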
Step2 Use the prefetch command to obtain .sra files
-O, --output-directory: directory in which to save the downloaded files
Step3 Use the fastq-dump command to convert .sra files to .fastq files
prefetch SRR13338290 -O <DIRECTORY>
fastq-dump SRR13338290.sra
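The download, validation, and conversion steps can be chained for one run, roughly as sketched below. This is a sketch under assumptions, not the toolkit's prescribed workflow: `fetch_run` and the `RUN` dry-run variable are hypothetical names, and the path `<outdir>/<accession>/<accession>.sra` assumes a recent toolkit release that creates one directory per accession. Older releases may place the file elsewhere, so adjust the `sra` variable to match your version.

```shell
# Download one run, check its integrity, then convert it to gzipped FASTQ.
# Set RUN=echo to print the commands instead of executing them (dry run).
fetch_run() {
    local acc="$1" outdir="${2:-.}"
    # Assumed location of the downloaded file (recent toolkit layout).
    local sra="$outdir/$acc/$acc.sra"
    ${RUN:-} prefetch "$acc" -O "$outdir" || return 1
    ${RUN:-} vdb-validate "$sra" || return 1
    ${RUN:-} mkdir -p "$outdir/fastq" || return 1
    # --split-3 handles both single-end and paired-end runs.
    ${RUN:-} fastq-dump --split-3 --gzip -O "$outdir/fastq" "$sra"
}

# Example (dry run first, then for real):
#   RUN=echo fetch_run SRR13338290 data
#   fetch_run SRR13338290 data
```

Running vdb-validate before the conversion catches truncated downloads early, which is cheaper than discovering a broken FASTQ file later.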
Frequently used tools in the SRA Toolkit:
fastq-dump: Convert SRA data into fastq format
prefetch: Allows command-line downloading of SRA, dbGaP, and ADSP data
sam-dump: Convert SRA data to sam format
vdb-validate: Validate the integrity of downloaded SRA data
Download in Batches
Go to the SRA Run Selector and search for a study, for example SRP299920.
SRP299920 includes 20 SRR* runs. Select the runs you are interested in, then choose "Accession List" to get a list of the selected SRR accessions (SRR_Acc_List.txt). You can then pass this file to prefetch with the --option-file option to download all of the runs.
Once downloaded, the data (.sra files) are stored in ~/ncbi/public/sra
prefetch --option-file SRR_Acc_List.txt
-------- File Details ------
SRR_Acc_List.txt
SRR13338271
SRR13338272
SRR13338273
SRR13338274
SRR13338275
SRR13338276
SRR13338277
SRR13338278
SRR13338279
SRR13338280
SRR13338281
SRR13338282
SRR13338283
SRR13338284
SRR13338285
SRR13338286
SRR13338287
SRR13338288
SRR13338289
SRR13338290
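Once the batch download finishes, every .sra file still has to be converted. A loop over the accession list is one way to do that; the sketch below assumes the files sit in ~/ncbi/public/sra as `<accession>.sra`. The names `convert_list`, `SRA_DIR`, and `RUN` are hypothetical and exist only for this illustration.

```shell
# Convert every accession listed in a file to gzipped FASTQ.
# SRA_DIR: where the .sra files live (default: ~/ncbi/public/sra).
# RUN=echo prints the fastq-dump commands instead of running them.
convert_list() {
    local list="$1" outdir="${2:-fastq}" acc
    local sra_dir="${SRA_DIR:-$HOME/ncbi/public/sra}"
    mkdir -p "$outdir"
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Strip whitespace and Windows line endings; skip blank lines.
        acc=$(printf '%s' "$acc" | tr -d '[:space:]')
        [ -z "$acc" ] && continue
        if [ ! -f "$sra_dir/$acc.sra" ]; then
            echo "skip $acc: $sra_dir/$acc.sra not found" >&2
            continue
        fi
        # </dev/null keeps the command from consuming the list on stdin.
        ${RUN:-} fastq-dump --gzip -O "$outdir" "$sra_dir/$acc.sra" </dev/null \
            || echo "failed: $acc" >&2
    done < "$list"
}

# Example:
#   convert_list SRR_Acc_List.txt fastq
```

Missing or failed runs are reported on stderr and skipped, so one bad accession does not stop the rest of the batch.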
2. Use SRA Explorer: you can download FASTQ files directly
SRA Explorer aims to make datasets within the Sequence Read Archive more accessible.
Step1 Search by accession number
Step2 Select the datasets and a download method
Available options:
- Raw FastQ Download URLs
- Bash script for downloading FastQ files
- Aspera commands for downloading FastQ files
- Cluster Flow FastQ download file (nice filenames)
- bcbio project file for FastQ downloads (nice filenames)
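For the "Raw FastQ Download URLs" option, the URLs can be saved to a text file (one per line) and handed to wget. The sketch below assumes wget is installed; `download_urls`, the `RUN` dry-run variable, and the file name urls.txt are hypothetical names used only for illustration.

```shell
# Download every URL listed in a file into an output directory.
# --continue resumes partially downloaded files after an interruption.
# Set RUN=echo to print the wget command instead of running it.
download_urls() {
    local list="$1" outdir="${2:-.}"
    if [ ! -s "$list" ]; then
        echo "empty or missing URL list: $list" >&2
        return 1
    fi
    ${RUN:-} wget --continue --input-file="$list" --directory-prefix="$outdir"
}

# Example:
#   download_urls urls.txt fastq
```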
Step3 Copy the commands into a script (e.g. script.sh) and run it in the background with "nohup bash script.sh &" (output goes to nohup.out by default)
Step4 Decompress SRR13338275.fastq.gz
gzip -d SRR13338275.fastq.gz
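When many files were downloaded, it helps to test each archive before decompressing it, since an interrupted download leaves a truncated .gz file. The following is a minimal sketch; `gunzip_all` is a hypothetical helper name, not a standard command.

```shell
# Test each gzip archive, then decompress it in place.
# Corrupt or incomplete files are reported and left untouched.
# Returns 1 if any file failed the integrity test.
gunzip_all() {
    local f status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            gzip -d "$f"
        else
            echo "corrupt or incomplete: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example:
#   gunzip_all *.fastq.gz
```

Note that many downstream tools read .fastq.gz directly, so decompressing is only needed when a tool requires plain FASTQ.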