Download SRA fastq Data

how to get raw sequence datasets in batches

Posted by Xuan on February 20, 2021

How to download fastq data?

Here are three ways for fastq files download

  1. use SRAtool Kit to download .sra files and then convert to .fastq files
    • use SRA selector to get accession list.txt then download in batches
  2. SRA Explorer search and download [!recommended] support batches
  3. EMBL-EBI search and download

For data preprocessing, please refer to miRNA-seq的预处理

Step1 Download NCBI SRA Toolkit

# Download sra tool 
wget [choosed link]
tar zxvf *.tar.gz
cd *
pwd

# Install & add PATH 

#. Way 1
vim ~/.bash_profile
source ~/.bash_profile

#. Way 2
echo 'export export PATH=$PATH:YOUR_PATH/sratoolkit.2.9.6-ubuntu64/bin' >> ~/.bash_profile
source ~/.bash_profile

Step2 Use prefetch command obtain .sra files

-O –output-directory save files to DIRECTORY

Step3 Use fastq-dump command obtain .fastq files

prefetch SRR13338290 -O <DIRECTORY>

fastq-dump SRR13338290.sra

Frequently Used Tools (in sratool Kit):

fastq-dump: Convert SRA data into fastq format

prefetch: Allows command-line downloading of SRA, dbGaP, and ADSP data

sam-dump: Convert SRA data to sam format

vdb-validate: Validate the integrity of downloaded SRA data

Download in Batches

Go to SRA Run Selector For example SRP299920

SRP299920: includes 20 SRR* files, you can selected SRR flies you interested, then choose “Accession List” to obtain all SRR files list (in SRR_Acc_List.txt). then you can use prefetch command with file option to download all files.

After Downloaded, data(.sra files) are stored in ~/ncbi/public/sra

prefetch --option-file SRR_Acc_List.txt

-------- File Details ------
SRR_Acc_List.txt
SRR13338271
SRR13338272
SRR13338273
SRR13338274
SRR13338275
SRR13338276
SRR13338277
SRR13338278
SRR13338279
SRR13338280
SRR13338281
SRR13338282
SRR13338283
SRR13338284
SRR13338285
SRR13338286
SRR13338287
SRR13338288
SRR13338289
SRR13338290

After Downloaded, data(.sra files) are stored in ~/ncbi/public/sra

2. USE SRA Explorer: you can download fastq file directly

SRA Explorer, This tool aims to make datasets within the Sequence Read Archive more accessible.

Step1 Search Accession number

Step2 Selected Datasets and way for downloading

optional ways:

  • Raw FastQ Download URLs
  • Bash script for downloading FastQ files
  • Aspera commands for downloading FastQ files
  • Cluster Flow FastQ download file (nice filenames)
  • bcbio project file for FastQ downloads (nice filenames)

Step3 Copy command into a script and run by “nohup script.sh &”

Step4 unzip SRR13338275.fastq.gz

gzip SRR13338275.fastq.gz -d ./

Reference

  1. 推荐参考Aspera快速下载部分. SRA Toolkit - prefetch 快速下载NCBI SRA数据
  2. 文件下载流程
  3. aspera 下载ENA 数据报错Error: Client unable to connect to server (check UDP port and firewall
  4. .bash_profile和.bashrc的什么区别及启动过程