mpiBlast

Purpose

Run mpiBlast on 4 node using 64 processors in total to blast TAIR10_cdna_2000seq.fa against nt formatted in 62 data volumes.

Query sequence: TAIR10_cdna_2000seq.fa mpiBlast configuration file: .ncbirc pre-formatted nt : /ptmp/LAS/BioDatabase/mpiBLASTdb/NCBI/4nodes-62fragments

Steps

Let us run mpiblast from your working directory on condo: /work/LAS/your-netID/

cd /work/LAS4/xfzhao
mkdir mpiblast-tair10
cd mpiblast-tair10
cp /work/LAS4/xfzhao/tutorial/mpiBLAST/TAIR10_cdna_2000seq.fa ./
cp /work/LAS4/xfzhao/tutorial/mpiBLAST/.ncbirc ./

Make your own blast database or use the pre-formatted database

The pre-formatted nt with 62 fragments is located in

/ptmp/LAS/BioDatabase/mpiBLASTdb/NCBI/4nodes-62fragments

which is updated monthly.

To build your own formatted nt, link the nt fasta file to your test directory:

ln -s /ptmp/LAS/BioDatabase/mpiBLASTdb/NCBI/nt ./

Copy the script for mpiformatdb and the qsub script to your test directory, check the two scripts.
Then submit the job to build the database locally.

cp /ptmp/LAS/BioDatabase/mpiBLASTdb/NCBI/4nodes-62fragments/mpiformatdb_run.sh ./
cp /ptmp/LAS/BioDatabase/mpiBLASTdb/NCBI/4nodes-62fragments/mpiformatdb.qsub ./
qsub mpiformatdb.qsub

mpiformatdb.qsub submits mpiformatdb_run.sh to run, which formats nt into 62 fragments.

Run mpiBLAST

Edit the path in .ncbirc if needed.

"Data" and "BLASTMAT" are the path of the NCBI blast matrices, BLASTDB is for the regular blast database, and "Shared" is the path for blast database formatted by mpiformatdb, and "Local" is the temporary location for mpiblast output.

The command for mpiblast in mpiblast_TAIR10_cdna_2000seq.qsub is:

mpiexec -n 64 mpiblast -p blastn -d nt -i TAIR10_cdna_2000seq.fa   -o tair10_blast_output.txt --use-parallel-write   --time-profile=time_profile.txt --removedb

Request 4 nodes for total 64 processors, mpiblast 2000 CDNA sequences of TAIR10 against nt which was fragmented into 62 parts. --removedb instructs mpiBLAST to do all temporary work in the local storage space and remove the local copy of the database before terminating execution.