Efficient Use of Resources¶
Picking the Best Partition For Your Job¶
On Pronto, servers are divided into partitions based on intended use case. Choosing the correct partition when you submit your job will maximize its performance. If you do not specify a partition when you submit your job, it will run on whatever server happens to be available. This can result in the job taking 5x longer or more.
The best way to choose a partition is to do a test run of your job on each partition with a small subset of your data.
If that is not possible, this page will give you a basic idea of which partition suits each use case.
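If you do run such tests, note that you can pick the partition at submission time without editing your script. Here is a minimal sketch, assuming a hypothetical batch script named job.sbatch; if job accounting is enabled, sacct lets you compare the elapsed times afterwards:
sbatch --partition=speedy job.sbatch
sbatch --partition=biocrunch job.sbatch
sacct -j <jobid> --format=JobID,Partition,Elapsed   # replace <jobid> with the ID sbatch printed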
Single-Threaded Jobs¶
If you are running a single-threaded job, we recommend that you use one of the speedy nodes.
Add this to your batch submission file:
#SBATCH --partition=speedy
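For context, that directive normally lives inside a complete submission script. A minimal single-threaded sketch is shown below; the job name, time limit, and my_program executable are placeholders, not Pronto-specific values:
#!/bin/bash
#SBATCH --partition=speedy       # run on the speedy nodes
#SBATCH --nodes=1                # single-threaded: one node is enough
#SBATCH --ntasks=1               # one task
#SBATCH --cpus-per-task=1        # one core
#SBATCH --time=01:00:00          # adjust to your expected run time
#SBATCH --job-name=my_serial_job

./my_program input.dat            # hypothetical program and input file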
MPI Compatible Jobs¶
For MPI-enabled jobs, consider either biocrunch or speedy, depending on the nature of the job.
Add one of these to your batch submission file:
#SBATCH --partition=speedy
or
#SBATCH --partition=biocrunch
If your job scales quite well, you could also consider running on nova.
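For reference, a minimal MPI submission script might look like the sketch below. The task counts, time limit, and my_mpi_app executable are placeholders, and depending on how your MPI library was built you may need mpirun instead of srun:
#!/bin/bash
#SBATCH --partition=speedy        # or biocrunch, depending on the job
#SBATCH --nodes=1                 # number of nodes
#SBATCH --ntasks-per-node=16      # MPI ranks per node; adjust to the hardware and your scaling tests
#SBATCH --time=02:00:00
#SBATCH --job-name=my_mpi_job

# Depending on how your MPI library was built, mpirun may be needed instead of srun.
srun ./my_mpi_app                  # hypothetical MPI executable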
OpenMP Based or Local Thread Jobs¶
If your job is multithreaded and scales well with 8 or more cores, use the biocrunch nodes.
Add this to your batch submission file:
#SBATCH --partition=biocrunch
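A minimal OpenMP-style submission script might look like the sketch below; the thread count, time limit, and program name are placeholders:
#!/bin/bash
#SBATCH --partition=biocrunch
#SBATCH --nodes=1                 # all threads must share one node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16        # number of threads; adjust to how well your code scales
#SBATCH --time=02:00:00
#SBATCH --job-name=my_openmp_job

# Tell OpenMP to use exactly the cores Slurm allocated.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_program              # hypothetical executable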
Massively Parallel Jobs¶
Consider the legion nodes: they have many threads (272) per node, but each thread is rather slow. If your job parallelizes extremely well, legion may be a good choice.
Add this to your batch submission file:
#SBATCH --partition=legion
If you find the legion nodes too slow for your application, then nova would be a good local choice.
RAM-Limited Jobs¶
Need more than 1.5TB of RAM¶
You should use the bigram partition.
Add this to your batch submission file:
#SBATCH --partition=bigram
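If you know roughly how much memory the job needs, you can also request it explicitly so the scheduler can place the job on a node with enough free RAM. A hedged example; the 2000G figure is only illustrative, not a bigram node limit:
#SBATCH --partition=bigram
#SBATCH --mem=2000G    # illustrative amount; request what your job actually needs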
Don't need quite that much, but still a lot¶
Either bigram or biocrunch may be appropriate.
Add one of these to your batch submission file:
#SBATCH --partition=bigram
or
#SBATCH --partition=biocrunch
Other options¶
One of the fat nodes under nova may also work.
Disk Bound Jobs¶
In this case, you may want to consider transferring your data to localtmp or ptmp and running against the local copy.
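A minimal staging pattern is sketched below. The /localtmp path, directory names, and my_io_heavy_program executable are assumptions for illustration only; confirm the correct scratch locations for your nodes before relying on them:
#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --job-name=io_heavy_job

# Stage data to node-local scratch, run there, and copy results back.
# The /localtmp path is an assumption; check the storage documentation for the exact location.
WORKDIR=/localtmp/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cp -r ~/mydata "$WORKDIR"                # hypothetical input directory
cd "$WORKDIR" || exit 1
~/my_io_heavy_program mydata             # hypothetical executable; assumed to write a results directory
cp -r results ~/                         # copy results back before the job ends
rm -rf "$WORKDIR"                        # clean up local scratch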
GPU Jobs¶
Add this to your batch submission file:
#SBATCH --gres=gpu:1 # request one GPU; change the number if you need more
#SBATCH --partition=gpu # specify the gpu partition
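Put together, a complete GPU submission script might look like the sketch below; the job name, CPU count, time limit, and train.py script are placeholders, and it assumes whatever CUDA/Python environment your code needs is already set up in your account:
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1              # one GPU; increase if your code can use more
#SBATCH --cpus-per-task=8         # CPU cores for data loading and preprocessing
#SBATCH --time=08:00:00
#SBATCH --job-name=my_gpu_job

python3 train.py                   # hypothetical script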
Interactive Jobs¶
GPU¶
Use this srun command:
srun --time=01:00:00 --nodes=1 --cpus-per-task=8 --partition=gpu-interactive --gres=gpu:1 --pty /usr/bin/bash
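Once the interactive shell starts, you can confirm that a GPU was actually allocated. On most Slurm GPU setups the assigned device is exposed through the CUDA_VISIBLE_DEVICES environment variable:
echo $CUDA_VISIBLE_DEVICES   # index of the GPU assigned to this job (on most Slurm GPU setups)
nvidia-smi                   # the allocated card should show up here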
Other¶
Use this srun command:
srun --time=01:00:00 --nodes=1 --cpus-per-task=8 --partition=interactive --pty /usr/bin/bash
I'm Not Sure¶
For a complete list of all the available hardware, please refer to this link.
If you are still not sure of the best place to run your job, please contact researchit@iastate.edu and we would be happy to assist you.
How Many Nodes to Use¶
The number of nodes to use depends on the problem size and how well the application has been parallelized.
The number of nodes requested will also affect your priority in the job queue, depending on how many jobs are already queued. Before deciding how many nodes to request, it may be useful to examine the queue and see how many jobs are ahead of you.
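A couple of standard Slurm commands help with this; the partition name below is only an example:
squeue -p speedy                   # all jobs currently in the speedy partition
squeue -p speedy --state=PENDING   # only the jobs still waiting to start
sinfo -p speedy                    # how many of the partition's nodes are idle, mixed, or allocated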
Often, applications will require a minimum number of nodes due to large memory requirements. Once that minimum is known, we can decide how many nodes to actually use. For example, let's take an MPI parallelization of the Jacobi iteration with N = 4*1024 and N = 64*1024 using differing numbers of nodes. Both of these problem sizes can be run on a single node, so the question is how many nodes one should use. The following numbers were obtained running on the ISU CyEnce cluster in the fall of 2013.
For N = 4*1024, it is best to use 1 node if one wants to make the best use of the allocation, and 8 nodes to minimize the execution time; see Table 1. For N = 64*1024, using 64 nodes gives the shortest time, and the cost against the allocation is not much different from the best value obtained using only 1 node; see Table 2.
Number of Nodes | Seconds for N = 4*1024 | Node-Seconds for N = 4*1024 |
---|---|---|
1 | 3.3 | 3.3 |
2 | 2.3 | 4.6 |
4 | 1.3 | 5.2 |
8 | 0.8 | 6.4 |
16 | 1.8 | 28.8 |
Table 1: Jacobi iteration with N = 4*1024 using different numbers of nodes.
Number of Nodes | Seconds for N = 64*1024 | Node-Seconds for N = 64*1024 |
---|---|---|
1 | 875.6 | 875.6 |
2 | 442.4 | 884.8 |
4 | 224.7 | 898.8 |
8 | 113.8 | 910.4 |
16 | 59.4 | 950.4 |
32 | 32.6 | 1,043.2 |
64 | 17.2 | 1,100.8 |
Table 2: Jacobi iteration with N = 64*1024 using different numbers of nodes.
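The node-seconds column is simply the number of nodes multiplied by the wall-clock time, which approximates the cost against your allocation: for example, the 8-node run in Table 1 costs 8 × 0.8 = 6.4 node-seconds, roughly twice the 3.3 node-seconds of the single-node run, even though it finishes about four times faster.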
Reference: "How to use the Condo and CyEnce Cluster" by Gleen R. Luecke, Summer 2015
Checking CPU/Memory utilization¶
We have the program htop installed on all of our servers, and we find it to be the friendliest way to see what's going on. Once you are logged in, just run the htop command and you'll get a view that looks something like this:
1 [||||||||||||||||||||||||||| 60.8%] 9 [||||||||||||||||||||||||||| 62.1%] 17 [|||||||||||||| 28.1%]
2 [|||||||||||||||||||||||||| 59.7%] 10 [||||||||||||||||||||||| 51.3%] 18 [|||||||||||||||||||| 43.8%]
3 [||||||||||||||||||||| 46.1%] 11 [||||||||||||||| 31.8%] 19 [||||||||||||||||||| 40.5%]
4 [||||||||||||||||| 37.7%] 12 [||||||||||| 23.7%] 20 [||||||||||||||||||||||||||||| 64.7%]
5 [|||||||||||||||||| 38.3%] 13 [|||||||||||||||||||| 45.8%] 21 [|||||||||||||||||||||||| 53.2%]
6 [||||||||||||||||| 37.9%] 14 [|||||||||||||||||||||||||| 60.0%] 22 [|||||||||||||||||||||| 48.7%]
7 [|||||||||||||||||||||||||||||||| 72.4%] 15 [|||||||||||||||| 33.8%] 23 [|||||||||| 20.4%]
8 [||||||||||||||||||||||||||||||||||| 78.1%] 16 [||||||||||||||||||| 42.2%] 24 [||||||||| 18.2%]
Mem[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 42.4G/252G] Tasks: 155, 67 thr; 12 running
Swp[| 169M/16.0G] Load average: 12.95 11.56 11.63
Uptime: 21 days, 19:31:25
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
113053 howard 20 0 191M 9076 3372 R 101. 0.0 36h46:47 /usr/bin/ssh -x -oForwardAgent=no -oPermitLocalCommand=no -oClearAllForwardings=yes
113048 howard 20 0 185M 7216 2244 R 30.6 0.0 11h21:22 scp -r phantoon.las.iastate.edu:/oldwork/LAS/jones-lab/copied_from_LSS .
24881 marshal 20 0 401M 197M 1636 R 27.3 0.1 0:00.42 /home/marshal/anaconda2/bin/trinity-2.0.6/Chrysalis/GraphFromFasta -i /home/marshal
25026 marshal 20 0 401M 274M 1640 R 26.7 0.1 0:00.41 /home/marshal/anaconda2/bin/trinity-2.0.6/Chrysalis/GraphFromFasta -i /home/marshal
The top section with bar graphs shows the percentage of each compute core that is in use. Directly below the CPU usage is a bar graph representing the amount of memory in use, with some statistics to the right.
Below the graphs and statistics is a list of processes. Press the 'u' key to filter the list if you want to see how your own processes are performing (this is a good way to confirm that your application is using the expected number of cores and amount of RAM).
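You can also start htop already filtered to your own processes by passing your username on the command line:
htop -u $USER   # show only processes owned by you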
Checking GPU utilization¶
On servers that contain Nvidia GPUs, you'll need to use the nvidia-smi command to gauge GPU utilization, in addition to htop, which will continue to show CPU and system RAM usage (separate from GPU memory).
The output of the nvidia-smi command is a point-in-time snapshot (it doesn't auto-update like htop), and will look something like this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37 Driver Version: 396.37 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:1B:00.0 Off | N/A |
| 25% 36C P2 71W / 250W | 10781MiB / 11178MiB | 32% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:1C:00.0 Off | N/A |
| 25% 31C P2 60W / 250W | 10781MiB / 11178MiB | 30% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:1D:00.0 Off | N/A |
| 25% 34C P2 87W / 250W | 10781MiB / 11178MiB | 29% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:1E:00.0 Off | N/A |
| 25% 33C P2 68W / 250W | 10781MiB / 11178MiB | 30% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX 108... Off | 00000000:3D:00.0 Off | N/A |
| 25% 26C P8 11W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce GTX 108... Off | 00000000:3F:00.0 Off | N/A |
| 25% 26C P8 11W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce GTX 108... Off | 00000000:40:00.0 Off | N/A |
| 25% 28C P8 16W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 GeForce GTX 108... Off | 00000000:41:00.0 Off | N/A |
| 25% 25C P8 11W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 122537 C python36 10771MiB |
| 1 122537 C python36 10771MiB |
| 2 122537 C python36 10771MiB |
| 3 122537 C python36 10771MiB |
+-----------------------------------------------------------------------------+
In this example, you can see that 4 of the 8 GPUs in the system are currently using approximately 30% of their GPU cores, and the memory on those cards is fully utilized. nvidia-smi doesn't show memory as a percentage, but we know that this model of GPU has approximately 11GB of memory per card.
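If you would rather have the numbers refresh on their own instead of re-running the command, either of these approaches works:
nvidia-smi -l 5         # re-print the nvidia-smi output every 5 seconds
watch -n 5 nvidia-smi   # or let watch redraw the output every 5 seconds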