Nova Hardware

Overview

The Nova cluster began in 2018 with compute nodes using Intel Skylake Xeon processors, 1.5 TB or 11 TB of fast NVMe local storage, and 192 GB, 384 GB, or 3 TB of memory. Five of those nodes also have one or two NVIDIA Tesla V100-32GB GPU cards. In 2021 the cluster was expanded with AMD-based nodes, each with two 32-core AMD EPYC 7502 processors, 1.5 TB of fast NVMe local storage, and 512 GB of memory; the GPU nodes from this expansion additionally have four NVIDIA A100 80GB GPU cards. The 2022 expansion consists of 54 regular compute nodes (each with two 32-core Intel 8358 processors, 1.6 TB of local storage, and 512 GB of memory) and 5 GPU nodes (each with two 24-core AMD EPYC 7413 processors, eight A100 GPU cards, 960 GB of local storage, and 512 GB of memory).

Detailed Hardware Specification

| Number of Nodes | Processors per Node | Cores per Node | Memory per Node | Interconnect | Local $TMPDIR Disk | Accelerator Card | Job Constraint Flags |
|---|---|---|---|---|---|---|---|
| 12 | Two 96-Core AMD EPYC 9655 | 192 | 2.3 TB | 100G IB | 2.9 TB | N/A | nova25, amd, zen5 |
| 15 | Two 96-Core AMD EPYC 9655 | 192 | 1.5 TB | 100G IB | 2.9 TB | N/A | nova25, amd, zen5 |
| 1 | Two 32-Core AMD EPYC 9335 | 64 | 1.5 TB | 100G IB | 1.7 TB | 8x NVIDIA L40S | nova25, amd |
| 1 | Two 64-Core AMD EPYC 9555 | 128 | 2.3 TB | 100G IB | 3.5 TB | 4x NVIDIA H200 | nova25, amd |
| 1 | Two 32-Core AMD EPYC 9335 | 64 | 1.5 TB | 100G IB | 1.7 TB | 4x NVIDIA H200 | nova25, amd |
| 72 | Two 18-Core Intel Skylake 6140 | 36 | 192 GB | 100G IB | 1.5 TB | N/A | nova18, intel, skylake, avx512 |
| 40 | Two 18-Core Intel Skylake 6140 | 36 | 384 GB | 100G IB | 1.5 TB | N/A | nova18, intel, skylake, avx512 |
| 28 | Two 24-Core Intel Skylake 8260 | 48 | 384 GB | 100G IB | 1.5 TB | N/A | nova18, intel, skylake, avx512 |
| 2 | Two 18-Core Intel Skylake 6140 | 36 | 192 GB | 100G IB | 1.5 TB | 2x NVIDIA Tesla V100-32GB | nova18, intel, skylake, avx512 |
| 1 | Two 18-Core Intel Skylake 6140 | 36 | 192 GB | 100G IB | 1.5 TB | 1x NVIDIA Tesla V100-32GB | nova18, intel, skylake, avx512 |
| 2 | Two 18-Core Intel Skylake 6140 | 36 | 384 GB | 100G IB | 1.5 TB | 2x NVIDIA Tesla V100-32GB | nova18, intel, skylake, avx512 |
| 1 | Four 16-Core Intel 6130 | 64 | 3 TB | 100G IB | 11 TB | N/A | nova18, intel, skylake, avx512 |
| 2 | Four 24-Core Intel 8260 | 96 | 3 TB | 100G IB | 1.5 TB | N/A | nova18, intel, skylake, avx512 |
| 40 | Two 32-Core AMD EPYC 7502 | 64 | 512 GB | 100G IB | 1.5 TB | N/A | nova21, amd, epyc-7502 |
| 15 | Two 32-Core AMD EPYC 7502 | 64 | 512 GB | 100G IB | 1.5 TB | 4x NVIDIA A100 80GB | nova21, amd, epyc-7502 |
| 56 | Two 32-Core Intel Icelake 8358 | 64 | 512 GB | 100G IB | 1.6 TB | N/A | nova22, intel, icelake, avx512 |
| 5 | Two 24-Core AMD EPYC 7413 | 48 | 512 GB | 100G IB | 960 GB | 8x NVIDIA A100 80GB | nova22, amd |
| 16 | Two 32-Core Intel Icelake 8358 | 64 | 512 GB | 100G IB | 1.5 TB | N/A | nova23, intel, icelake, avx512 |
| 11 | Two 96-Core AMD EPYC 9654 | 192 | 768 GB | 100G IB | 1.7 TB | N/A | nova24, amd, epyc-9654, avx512 |
| 3 | Two 96-Core AMD EPYC 9684X | 192 | 768 GB | 100G IB | 1.7 TB | N/A | nova24, amd, epyc-9684x, avx512 |
| 15 | Two 96-Core AMD EPYC 9655 | 192 | 1.5 TB | 100G IB | 3 TB | N/A | nova25, amd, zen5, avx512 |
| 10 | Two 96-Core AMD EPYC 9655 | 192 | 2.3 TB | 100G IB | 3 TB | N/A | nova25, amd, zen5, avx512 |
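The job constraint flags in the last column can be passed to Slurm to target a particular node generation or CPU feature. As a minimal sketch (the constraint flag names come from the table above; the job name, task counts, and time limit are illustrative placeholders, and any site-specific partition or account options are omitted):

```bash
#!/bin/bash
# Illustrative batch script requesting a node from the 2022 Intel Icelake
# expansion via a constraint flag from the table above.
#SBATCH --job-name=icelake-test
#SBATCH --nodes=1
#SBATCH --ntasks=64              # one task per core on a 64-core nova22 node
#SBATCH --time=01:00:00
#SBATCH --constraint=nova22      # could also be: intel, icelake, or avx512

srun hostname                    # report which node the job landed on
```

Multiple flags can be combined with Slurm's constraint operators, e.g. `--constraint="intel&avx512"` to require both features.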

The HPC group schedules regular maintenance between academic terms to update system software and to perform other tasks that require downtime.

The date of the next maintenance is listed in the message of the day displayed at login (when ssh-ing into the cluster).

Note: Queued jobs will not start if they cannot complete before the maintenance begins. In the output of the squeue command, the reason shown for those jobs will be (ReqNodeNotAvail, Reserved for maintenance). The jobs will start after the scheduled outage completes.
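To check why a pending job has not started (including the maintenance reservation reason above), the standard squeue options can be used. A sketch, assuming a typical Slurm setup:

```bash
# Show your pending jobs with their state reason and estimated start time
squeue -u $USER --state=PENDING --format="%.10i %.20j %.8T %.20r %S"

# Or ask the scheduler for estimated start times directly
squeue --start -u $USER
```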

Detailed GPU Specification

| NVIDIA Model | SLURM Code | Hosts | Total # of GPU Cards | Memory per Card (GB) | CUDA Cores | Tensor Cores |
|---|---|---|---|---|---|---|
| V100 SXM2 | v100-sxm2-32G | nova18-singularity, nova20-frost-1 | 8 | 32 | 5120 | 640 |
| V100 | v100 | nova18-gpu-[1-5], nova20-matrix | 14 | 32 | 5120 | 640 |
| A100 (40GB) | a100-pcie | nova20-amp-1, nova21-amp-[2,3], nova22-amp-4 | 20 | 40 | 6912 | 432 |
| A100 (80GB) | a100 | nova21-gpu-[1-15], nova22-gpu-[1-5], nova22-amp-[5-8], nova23-amp-9 | 124 | 80 | 6912 | 432 |
| L40S | l40s | nova24-gpu-1, nova25-gpu-1 | 16 | 48 | 18176 | 568 |
| RTX 2080 Ti | rtx_2080_Ti | nova20-frost-2 | 4 | 11 | 4352 | 544 |
| RTX 6000 | rtx_6000 | nova20-frost-[3-7] | 20 | 48 | 18176 | 568 |
| A40 | a40 | nova21-frost-8 | 4 | 48 | 10752 | 336 |
| H200 | h200 | nova25-gpu-[2-4] | 12 | 141 | 16896 | 528 |
| GH200 (ARM node) | gh200 | nova24-gh-1 | 1 | 96 | 14592 | 456 |
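The SLURM code column gives the name used to request a specific GPU model with Slurm's generic-resource option. A minimal sketch (GPU codes taken from the table above; counts are illustrative, and any required partition or account directives are omitted):

```bash
# Request one 80GB A100 card for the job
#SBATCH --gres=gpu:a100:1

# Or request two H200 cards on the same node
#SBATCH --gres=gpu:h200:2
```

Omitting the model name (e.g. `--gres=gpu:1`) typically lets the scheduler assign any available GPU type.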