Using Matlab Parallel Server

Introduction

Matlab Parallel Server is the MathWorks product that allows you to run parallel Matlab jobs on the cluster.  In fact, you must use Matlab Parallel Server if you want to run Matlab jobs that span multiple nodes.

In this guide, you will learn how to configure Matlab using Nova OnDemand so that you can submit and manage Matlab jobs on the cluster.

       Example 1: A Serial Job
       Example 2:  A Parallel Job on 1 Node
       Example 3:  A Distributed Parallel Job on Multiple Nodes

Prerequisites

  1. Log in to  Nova OnDemand
  2. Request a Nova Desktop session from the list of Interactive Apps.

Set up the Configuration Scripts

You will need to copy a directory containing the Matlab cluster configuration scripts to a location of your own, preferably somewhere under your group's work folder.  You will also need to choose a folder where your Matlab jobs will store their output.

  1. Copy the configuration scripts from /shared/hpc/matlab/matlab-cluster-config to a place under your work directory. 
    The following rsync command copies the directory /shared/hpc/matlab/matlab-cluster-config to a personal work directory along with the files it contains.  You should choose a path appropriate for you.
    $ rsync -av  /shared/hpc/matlab/matlab-cluster-config /work/ccresearch/jedicker/
  2. Next, under your work directory, make a directory for storing Matlab job results.  Use a path that makes sense for you:
    $ mkdir /work/ccresearch/jedicker/matlab-job-storage-location
  3. Navigate to where you copied the matlab-cluster-config directory.  Inside that directory, you should see a file called mdcs.rc.  Open that file in a text editor such as vim or nano.
  4. Find the line:
    RemoteJobStorageLocation = MATLAB_STORAGE_DIRECTORY

    Change the text MATLAB_STORAGE_DIRECTORY to the Matlab job storage path you created in step 2 above.
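    For example, using the storage path created in step 2, the edited line would read:
    RemoteJobStorageLocation = /work/ccresearch/jedicker/matlab-job-storage-location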

Launch Matlab

If you haven't already done so, launch Matlab.  (All versions of Matlab can use Matlab Parallel Server).   There are two ways to launch Matlab:
Select the version you want from the desktop menu under Applications -> Mathematics -> Matlab

or
Open the Terminal Emulator application from the desktop menu to open a shell window.  Then, from the shell, do:
$ module load matlab/R2024b   
$ matlab

Run the configCluster.m script from Matlab

From within the Matlab Command Window, navigate to your copy of the matlab-cluster-config directory that you made earlier and run the configCluster script.   The script will prompt you for the username you use to log in to Nova.  It will also remind you that whenever you submit a job you must be sure to set the job wall time and number of nodes, and it prints a snippet of Matlab code showing how to do that.
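For example, the commands below navigate to the copied directory and run the script, then set the two required properties on a cluster object.  (This is a minimal illustration using the example paths from this guide; adjust them for your own setup.)

>> cd /work/ccresearch/jedicker/matlab-cluster-config
>> configCluster
>> c = parcluster;                                 % cluster object from the new (default) profile
>> c.AdditionalProperties.WallTime = '1:00:00';    % job wall time (hh:mm:ss)
>> c.AdditionalProperties.NumNodes = 1;            % number of nodes to request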

Open the Cluster Profile Manager

If the configCluster script ran correctly, you should see the new cluster profile added to your Parallel environment.  To see this, select the Home tab in the Matlab desktop.   Find the ENVIRONMENT pane.  Click on the Parallel pull-down menu and select Create and Manage Clusters as shown below.  This will open the Cluster Profile Manager window.

[Screenshot: the Parallel menu with Create and Manage Clusters selected]

Note that the configCluster script automatically appends the version of Matlab you are using to the profile name.   If you run multiple versions of Matlab, or upgrade to another version, you will need to run configCluster again for each Matlab version you use.  Initially, the Cluster Profile Manager window should look similar to the screenshot below.   However, we will need to make several changes to the cluster profile.   To edit the profile, choose the profile you created above (in this case, the profile labeled nova R2024b (default)), then click the Edit button at the bottom of the page.

[Screenshot: the Matlab Cluster Profile Manager]

Settings for the Cluster Profile

Click the Edit button at the bottom of the Cluster Profile Manager to make changes.  Click Done to save any changes, or Cancel to discard any changes. 

The first properties we need to change are shown below:

Description (Description of this cluster):  nova R2024b  (this is the default)

JobStorageLocation (Folder where job data is stored on the client):  Set this to the same path you set for RemoteJobStorageLocation in the mdcs.rc file above, preferably under your work directory.

NumWorkers (Number of workers available to cluster):  200 (this is the default)

NumThreads (Number of computational threads on each worker):  Use default

ClusterMatlabRoot (Root folder of Matlab installation):   This is determined by the configCluster.m script and will differ depending on the version of Matlab.  For R2024b, the path is /opt/engr/apps/matlab/R2024b

RequiresOnlineLicensing (Cluster uses online licensing):  Use default

LicenseNumber (Optional):   Leave blank

 

[Screenshot: cluster properties, first page]

Scrolling down to the CLUSTER ENVIRONMENT properties in the editor, change the settings as shown:

[Screenshot: cluster environment properties]

Setting Additional Properties for Slurm

Scrolling down, the next properties appear under the SCHEDULER PLUGIN heading.   These options configure how Matlab interacts with the Slurm job scheduler and allow you to set default parameters for your cluster jobs. 

PluginScriptLocation (Folder containing scheduler plugin scripts):  This should already be set correctly after running the configCluster.m script in Matlab earlier.   The value should be the path to where you copied the matlab-cluster-config directory in your work directory.

You can use the AdditionalSubmitArgs property to set any default values for Slurm job parameters that you like.  For example, if you set AdditionalSubmitArgs to: 
      --time=8:00:00 --partition=nova --gres=gpu:1
then any Matlab Parallel Server jobs you submit would automatically include those Slurm job submit arguments.  Though, as you'll see, you can still override these submit values as needed for individual jobs. 

ClusterHost:  localhost
In the scenario we are setting up, the ClusterHost property should always be set to localhost.  This is because Matlab issues Slurm commands directly on the computer where your OnDemand session is running.  To Linux, localhost means "the local computer".

EmailAddress:   Set this to your preferred e-mail address. 
 

IdentityFile:   /home/<username>/.ssh/id_rsa      (Replace <username> with your username)
Explanation:  The compute nodes on the Nova cluster are set up to use SSH public key authentication without requiring a password.   Matlab uses this SSH public key authentication to submit jobs.   Basically, all users should have a private key called id_rsa and the matching public key file called id_rsa.pub stored in their ~/.ssh directory.   The contents of the id_rsa.pub file should have already been added to your ~/.ssh/authorized_keys file.     
IdentityFileHasPassphrase should be set to false.  (The SSH key pair does not use a passphrase.)
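If those key files are missing from your account, you can create a passphrase-less key pair and authorize it yourself from a shell.  (This is a standard OpenSSH recipe, shown here as a convenience; the file names match the IdentityFile setting above.)

$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys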

A sample image of the scheduler settings for this profile is shown below:

 

[Screenshot: additional properties for Slurm]

Scroll down in the AdditionalProperties table to see more properties that must be set:

MemUsage:   If your simulation doesn't require a lot of memory, you can leave this field blank and Slurm will assign roughly 8GB of memory per processor.   If you need a large amount of memory for your job, you can specify a value (e.g. 120GB).    Or, you can set MemUsage to 0, which tells Slurm that you want to use as much memory as is available.

NumNodes:  1 node is fine for many jobs.  If you're running a job across multiple nodes, you might want a higher value.

ProcsPerNode:  16 or 32 are good values to start with.

QueueName:  nova

RemoteJobStorageLocation:  Use the path for RemoteJobStorageLocation you used in the mdcs.rc file from above.

UseIdentityFile:  true  ( This tells Matlab to use the IdentityFile provided above with SSH public key authentication. )

The values should look like so:

[Screenshot: more scheduler options]

To use the new profile from the Matlab Command Window, first create a cluster object 'c' based on it.  Entering 'c' without a semi-colon displays the list of properties in the cluster object.   Notice that there is a very important subset of properties called AdditionalProperties.  You will need to modify some of these property values to actually submit a job to Slurm.   Let's examine the AdditionalProperties:


>> c = parcluster('nova R2024b');
>> c.AdditionalProperties

ans = 

  AdditionalProperties with properties:

        AdditionalSubmitArgs: ''
                 ClusterHost: 'localhost'
                EmailAddress: ''
                 EnableDebug: 0
                    MemUsage: ''
                    NumNodes: 0
                ProcsPerNode: 0
                   QueueName: ''
    RemoteJobStorageLocation: '/work/ccresearch/jedicker/matlab-job-storage-location'
             UseIdentityFile: 0
                     UseSmpd: 0
                    Username: 'jedicker'
                    WallTime: ''


As you can see, when you first create the cluster object, many of the properties do not have values.  At a minimum, you must set values for NumNodes, ProcsPerNode, and WallTime.  Optionally, you can set EmailAddress, QueueName, and AdditionalSubmitArgs.  AdditionalSubmitArgs is used to pass any additional options to the Slurm sbatch command.
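For instance, a minimal set of assignments before submitting a job might look like the following (the values shown are illustrative):

c.AdditionalProperties.NumNodes = 1;          % nodes to request from Slurm
c.AdditionalProperties.ProcsPerNode = 16;     % processors per node
c.AdditionalProperties.WallTime = '1:00:00';  % maximum wall time (hh:mm:ss)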

The examples below show how to use Matlab Parallel Server to run serial (single CPU) jobs, multi-processor jobs on a single machine, and distributed parallel jobs running on multiple nodes.  If you'd like to download Matlab scripts used in the examples, they are available here:

matlab_parallel_examples.zip

In the first example, we will run a simple Matlab job that uses a single processor.  This program isn't parallelized, but we can still execute the job on the cluster.   We'll begin by creating a Matlab script called mywave.m that simply calculates a million points on a sine wave in a for loop.   Create the file mywave.m in your current Matlab folder.  The script is pretty simple:


%--  mywave.m --%

%  a simple 'for' loop (non-parallelized):
for i = 1:1000000
    A(i) = sin(i*2*pi/102400);
end


As it is, this script just loops a million times. To run this job on the cluster, we'll create a new script called 'run_serial_job.m':


% run_serial_job
%   Run a serial (i.e. single CPU, non-parallelized) Matlab job on a cluster. 
%   Calls a script 'mywave' that executes a loop in serial.

 

% First initialize a cluster object based on an existing cluster profile.
c = parcluster('nova R2024b')

 

% Use the AdditionalProperties property of the cluster object to set job specific details:
c.AdditionalProperties.NumNodes = 1;                           % Number of nodes requested 
c.AdditionalProperties.EmailAddress = 'your_netid@iastate.edu';  % Your Email address (please modify accordingly).
c.AdditionalProperties.ProcsPerNode = 1;                       % Number of processors per node.
c.AdditionalProperties.WallTime = '2:00:00';                   % Max wall time
c.AdditionalProperties.QueueName = 'nova';                     % Slurm partition (queue) name.
c.AdditionalProperties.AdditionalSubmitArgs = '';

 

% Other properties you may need to set:
% To set a specific queue name (Slurm partition):
%        c.AdditionalProperties.QueueName = 'some_partition';
% To set the Slurm job name:
%        c.AdditionalProperties.AdditionalSubmitArgs = '--job-name=xxx';
%    NOTE:  if --job-name is not set here then Matlab assigns the job name itself as "JobN" where
%    N is determined by Matlab.

% Before starting the job, start a job timer to see how long the job runs:

tic

 

% Below, submit a batch job that calls the 'mywave.m' script.
% Also set the parameter AutoAddClientPath to false so that Matlab won't complain when paths on 
% your desktop don't exist on the cluster compute nodes (this is expected and can be ignored).

myjob = batch(c,'mywave','AutoAddClientPath',false)

 

% see https://www.mathworks.com/help/parallel-computing/batch.html for additional tips and examples for using the batch command.


% Wait for the job to finish (on a busy server, this might not be a good strategy).  
wait(myjob)

 

% display the job diary (i.e. the Matlab standard output text, if any)
diary(myjob)

 

% load the 'A' array (computed in mywave) from the results of job 'myjob':

load(myjob,'A');

 

%-- plot the results --%
plot(A);

 

% print the elapsed time for the job:
toc
 


When you run this script in Matlab, it should prompt you for the password you use to log in to the cluster.  You will not need your Google Authenticator verification code (since you are already connected through the VPN).  Matlab caches your password during this session, so you don't have to enter it again when submitting other jobs during the same Matlab session.
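Rather than blocking in wait(myjob), you can also check on the job from the Matlab prompt.  (A quick illustration using standard Parallel Computing Toolbox calls:)

>> myjob.State     % 'queued', 'running', or 'finished'
>> c.Jobs          % list all jobs known to this cluster profile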

In the next example, we will run a parallel job using 8 processors on a single node.  We will be using a parallelized version of mywave.m called parallel_mywave.m that uses the parfor statement to parallelize the previous for loop:


%--  parallel_mywave --%

% A parfor loop will use parallel workers if available.
parfor i = 1:10000000
    A(i) = sin(i*2*pi/2500000);
end
 


The parfor command will use a pool of workers to execute the loop in parallel if a worker pool exists.  You can learn more about the use of parfor loops here:  https://www.mathworks.com/help/parallel-computing/parfor.html

Now we'll use a new Matlab script run_parallel_job.m (shown below) to run this job.  Note that the batch command now requests a pool of 8 workers.  Because Matlab adds one extra worker of its own to manage the pool, the AdditionalProperties.ProcsPerNode property is set to 1 more than the number of pool workers:


% run_parallel_job
%   Run a parallel Matlab job on a 1 cluster node with a pool of 8 Matlab workers.

% First initialize a cluster object based on an existing cluster profile,
% in this case we are using the 'nova R2024b' profile.

c = parcluster('nova R2024b')

 

% Use the AdditionalProperties property of the cluster object to set job specific details:
c.AdditionalProperties.NumNodes = 1;                           % Number of nodes requested. 
c.AdditionalProperties.EmailAddress = 'your_netid@iastate.edu';  % Your Email address (please modify).
c.AdditionalProperties.ProcsPerNode = 9;                       % 1 more than number of Matlab workers per node.
c.AdditionalProperties.WallTime = '2:00:00';                   % The max wall time for the job.
c.AdditionalProperties.QueueName = 'nova';
c.AdditionalProperties.AdditionalSubmitArgs = '';

% Examples of other properties that you might need to set:
%    To set a specific queue name (Slurm partition):
%        c.AdditionalProperties.QueueName = 'some_partition';

%    To set the Slurm job name.  (if not set, Matlab will use "JobN" where N is determined by Matlab):
%        c.AdditionalProperties.AdditionalSubmitArgs = '--job-name=xxx';
%    NOTE: The value of AdditionalProperties.AdditionalSubmitArgs is simply appended to the sbatch command,
%          so it can be used to supply any additional options to sbatch.

 

% Start a job timer for recording the elapsed time for the job:


tic

 

% The batch command below creates a job object called 'myjob' that runs a
% Matlab job with 8 parallel pool workers.
% NOTE: Matlab will add an additional worker to the pool for its own use so be 
% sure that the number of processors requested from Slurm (NumNodes X ProcsPerNode)
% is greater than the total number of workers needed by Matlab.
% We also set the parameter AutoAddClientPath to false so that Matlab won't complain when paths on 
% your desktop don't exist on the compute node (this is typical and can be ignored).

 

myjob = batch(c,'parallel_mywave','pool', 8, 'AutoAddClientPath',false)

 

% see https://www.mathworks.com/help/parallel-computing/batch.html for additional tips and examples.

 

% Wait for the job to finish before continuing. 
wait(myjob)

 

% load the 'A' array from the job results. (The values for 'A' are calculated in parallel_mywave.m):

load(myjob,'A');

 

%-- plot the results --%
plot(A);

% print the elapsed time for the job:
toc
 


The next example, run_distributed_parallel_job.m, executes a function called parallel_eigen in parallel using a pool of 48 Matlab workers distributed across 4 nodes (12 workers per node, with 13 processors requested per node to leave room for the extra worker Matlab adds).  A sketch of the parallel_eigen function is shown first, followed by the run script.
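The parallel_eigen.m script itself is included in matlab_parallel_examples.zip above and is not reproduced in full here.  As a rough sketch of what such a function might contain (the loop count and matrix size below are illustrative assumptions, not the actual script):

%--  parallel_eigen.m (illustrative sketch) --%

% Each parfor iteration builds a random symmetric matrix and records its
% largest eigenvalue; the iterations are divided among the pool workers.
parfor i = 1:480
    M = rand(400);
    E(i) = max(eig(M + M'));   % symmetrize so the eigenvalues are real
end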


% run_distributed_parallel_job.m
%   Run a parallel Matlab job on a cluster with a pool of 48 Matlab workers across 4 compute nodes.

% First initialize a cluster object based on an existing cluster profile,
% in this case we are using the 'nova R2024b' profile.
c = parcluster('nova R2024b')

% Use the AdditionalProperties property of the cluster object to set job specific details:
c.AdditionalProperties.NumNodes = 4;                           % Number of nodes requested 
c.AdditionalProperties.EmailAddress = 'your_netid@iastate.edu';  % Your Email address (please modify).
c.AdditionalProperties.ProcsPerNode = 13;                      % Number of processors per node (1 more than Matlab workers per node).
c.AdditionalProperties.WallTime = '3:00:00';                   % Set a maximum wall time of 3 hours for this job.
c.AdditionalProperties.QueueName = 'nova';
c.AdditionalProperties.AdditionalSubmitArgs = '';

% Other properties often needed:
% To set a specific queue name (Slurm partition):
%        c.AdditionalProperties.QueueName = 'some_partition';
% To set the Slurm job name.  (if not set, Matlab will use "JobN" where N is determined by Matlab):
%        c.AdditionalProperties.AdditionalSubmitArgs = '--job-name=xxx';
% NOTE: The value of AdditionalProperties.AdditionalSubmitArgs is simply added on to the sbatch command. 

% Start a job timer for recording the elapsed time for the job:
tic

% The batch command below creates a Matlab job with 48 pool workers, one
% per CPU.  Each worker will do a portion of the computation in the parallel_eigen function.
% A handle to the job object is created called 'myjob'.
% NOTE: Matlab will add an additional worker to the pool for its own use so be 
% sure that the number of processors requested from Slurm (NumNodes X ProcsPerNode)
% is at least 1 greater than the total number of workers needed by Matlab.
% Also, set the parameter AutoAddClientPath to false so that Matlab won't complain when paths on 
% your desktop don't exist on the compute node (this is typical and can be ignored).

myjob = batch(c,'parallel_eigen','pool', 48, 'AutoAddClientPath',false)

 

% see https://www.mathworks.com/help/parallel-computing/batch.html for additional tips and examples.


% Wait for the job to finish (on a busy cluster, this may not be a good strategy).  
wait(myjob)

 

% load the 'E' array from the job results. (The values for 'E' are calculated in parallel_eigen.m):

load(myjob,'E');

 

%-- plot the results --%
plot(E);

 

% print the elapsed time for the job:
toc
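
Finally, once you have retrieved the results you need, you can remove a job's data from the job storage location.  (An optional cleanup step using the standard job object API:)

>> delete(myjob)   % delete the job's files from the job storage location
>> clear myjob     % remove the now-invalid handle from the workspace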