Running your applications on TSUBAME involves the following steps:
See here for how to log in to the system. For how to submit and manage your jobs, see below.
To run your applications on TSUBAME2, you first need to create a job script. A job script is a standard shell script file that lists the commands to run on the compute nodes. For example, to run a program named "Application" with the arguments "-arg foo", the script contains the command line "Application -arg foo".
Save the script as, say, "job.sh" and give it execute permission with "chmod +x job.sh". Note that the path to Application must be included in the $PATH environment variable. Standard commands such as mpirun can be invoked simply as "mpirun", but other programs may need to be specified with their full paths, such as /home/usr0/YOUR_ID/YOUR_PROGRAM.
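The two steps above (writing the job script and making it executable) can be sketched as follows. This is a minimal illustration, not the official sample: "Application" and "-arg foo" are the placeholder names from the text, and the final check is an optional convenience that only reports whether the program can be found through $PATH on the machine where you run it.

```shell
# Write the job script. "Application -arg foo" is the placeholder
# command from the text, not a real program.
cat > job.sh <<'EOF'
#!/bin/sh
Application -arg foo
EOF

# Give the script execute permission.
chmod +x job.sh

# Optional: report whether Application is reachable through $PATH.
# If it is not, use its full path inside job.sh instead.
if command -v Application >/dev/null 2>&1; then
    echo "Application found in PATH"
else
    echo "Application not in PATH; use its full path in job.sh"
fi
```

If the program lives in your home directory, replacing the command line with a full path such as /home/usr0/YOUR_ID/YOUR_PROGRAM avoids any dependence on $PATH.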
The exact list of commands depends on the particular application. For example, a job script for an MPI program launches the program through mpirun rather than invoking it directly.
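As a sketch of what an MPI job script might contain, the example below writes one to a file. It assumes an Open MPI-style mpirun that accepts "-n" and "-hostfile"; other MPI libraries may use different options. PBS_O_WORKDIR and PBS_NODEFILE are environment variables that PBS sets inside a job, while "./my_mpi_program" and the process count of 4 are hypothetical.

```shell
# Write an MPI job script (illustrative; adjust to your MPI library).
cat > mpi_job.sh <<'EOF'
#!/bin/sh
# Move to the directory the job was submitted from.
# PBS sets PBS_O_WORKDIR; fall back to the current directory otherwise.
cd "${PBS_O_WORKDIR:-.}"
# Launch 4 MPI processes on the nodes that PBS assigned to this job.
# ./my_mpi_program is a hypothetical executable name.
mpirun -n 4 -hostfile "$PBS_NODEFILE" ./my_mpi_program
EOF

# Give the script execute permission.
chmod +x mpi_job.sh
```

The quoted 'EOF' delimiter keeps $PBS_NODEFILE unexpanded while the file is written, so the variable is resolved only when the job actually runs on the compute nodes.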
The job script created as above can be passed, or submitted, to the job management system, called PBS. When a job is submitted, the job management system finds available compute nodes and lets your program run on them.
To share the 1000+ nodes of TSUBAME2 in a fair and efficient way, the system is partitioned into several groups of nodes with different usage modes. Each group is associated with a batch queue, to which user job scripts are submitted, so for users the batch queues are the interface to the node groups. See here for the available queues on TSUBAME.
To submit a job, use the "t2sub" command. This is a customized wrapper around the native PBS job submission command and provides several TSUBAME-specific options.
Note that, except for the free trial queue, you need to be a member of a TSUBAME group with enough TSUBAME points to run your job.
Here are some examples of t2sub command usage. See the command help for more details (run "t2sub -help" on TSUBAME).
Submitting a job script "job.sh" to a queue named "Q" by using TSUBAME points of group "t2gSomeGroup":
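A sketch of such a submission is shown below. It assumes that t2sub passes through the standard PBS options "-q" (queue) and "-W group_list=" (group), which is an assumption to confirm with "t2sub -help". The example only prints the command (a dry run), so it can be tried anywhere; run the printed command on TSUBAME to actually submit.

```shell
QUEUE=Q               # queue name used in the text
GROUP=t2gSomeGroup    # TSUBAME group whose points are used

# Assumed option names: -q and -W group_list= are standard PBS options.
SUBMIT_CMD="t2sub -q $QUEUE -W group_list=$GROUP ./job.sh"

# Dry run: print the command instead of submitting it.
echo "$SUBMIT_CMD"
```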
Like above, but to use 32 nodes and 12 processes per node, add this option:
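One way this request might be written is sketched below, assuming t2sub accepts the PBS Professional "-l select=" resource syntax with the "mpiprocs" keyword. Treat the exact option as an assumption and check "t2sub -help" for the form TSUBAME expects. As before, the command is only printed.

```shell
NODES=32             # number of nodes
PROCS_PER_NODE=12    # processes per node

# Assumed syntax: PBS Professional-style chunk selection.
RESOURCE_OPT="-l select=${NODES}:mpiprocs=${PROCS_PER_NODE}"

# Dry run: print the full command instead of submitting it.
echo "t2sub -q Q -W group_list=t2gSomeGroup $RESOURCE_OPT ./job.sh"

# Total number of processes this request implies.
TOTAL_PROCS=$((NODES * PROCS_PER_NODE))
echo "total processes: $TOTAL_PROCS"
```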
Note that by default there are several resource limits, such as limits on memory usage and execution time. To raise the memory limit, add a memory option to the t2sub command line that specifies MEMORY_SIZE, the maximum memory size per node (e.g., 32gb). The default is 1 GB. See the command help (t2sub -help) for how to specify the size value.
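As an illustration of the memory option described above, the sketch below assumes the PBS Professional convention of giving "mem=" as part of a "-l select=" request. The exact form on TSUBAME is an assumption here, so confirm it with "t2sub -help". The command is printed rather than executed.

```shell
MEMORY_SIZE=32gb    # maximum memory size per node (example value from the text)

# Assumed syntax: per-node memory given as part of the select request.
MEM_OPT="-l select=1:mem=${MEMORY_SIZE}"

# Dry run: print the full command instead of submitting it.
echo "t2sub -q Q -W group_list=t2gSomeGroup $MEM_OPT ./job.sh"
```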
To extend the execution-time limit, which defaults to 1 hour, add a "-l walltime=" option.
For example, "-l walltime=2:0:0" allows a job to run for up to 2 hours.
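The walltime option can be combined with the other submission options. The sketch below builds the 2-hour example from the text; the "-q" and "-W group_list=" options are assumed to be passed through to PBS, and the command is only printed.

```shell
HOURS=2    # requested run time in hours

# Build the walltime option in hours:minutes:seconds form.
WALLTIME_OPT="-l walltime=${HOURS}:0:0"

# Dry run: print the full command instead of submitting it.
echo "t2sub -q Q -W group_list=t2gSomeGroup $WALLTIME_OPT ./job.sh"
```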
The t2sub command has further options for finer job control. See the command help for more details. If you have a requirement but do not know how to express it with the command options, contact us here.
Use the t2stat command to display your submitted jobs. By default, this command shows only information about your own jobs. Use the option -all to check all the jobs currently running or waiting in the queues. The command displays the job ID number of each job, which can be used to control that specific job, for example to cancel it.
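The two forms of the command described above can be tried as follows. The guard is only there so that the sketch also runs on machines where t2stat is not installed; on a TSUBAME login node you would simply type the commands directly.

```shell
if command -v t2stat >/dev/null 2>&1; then
    T2STAT_AVAILABLE=yes
    t2stat         # show only your own jobs (the default)
    t2stat -all    # show all jobs running or waiting in the queues
else
    T2STAT_AVAILABLE=no
    echo "t2stat not found: run this on a TSUBAME login node"
fi
```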
If you submitted a job by accident, you can cancel it as long as it is not yet running. To do so, run the job cancellation command with JOBID, the ID number of the job to be cancelled, as shown by t2stat.
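A sketch of a cancellation is given below. It assumes the cancellation command is named "t2del", by analogy with t2sub and t2stat; the command name is an assumption here, so confirm it on the system before relying on it. The job ID 12345 is hypothetical, and the command is only printed.

```shell
JOBID=12345    # hypothetical job ID, as reported by t2stat

# Assumed command name: t2del.
CANCEL_CMD="t2del $JOBID"

# Dry run: print the command instead of cancelling anything.
echo "$CANCEL_CMD"
```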