Page en:centro:servizos:hpc, current revision 2024/03/13 10:37 by fernando.guillen (earlier revision: 2016/05/20 17:50, fernando.guillen).
====== High Performance Computing (HPC) cluster ctcomp3 ======
[[ https://web.microsoftstream.com/video/f5eba154-b597-4440-9307-3befd7597d78 | Video of the presentation of the service (7/3/22) (Spanish only) ]]
===== Description =====
  
The computing part of the cluster is made up of:
  * 9 servers for general computing.
  * 1 "fat node" for memory-intensive jobs.
  * 4 servers for GPU computing.

Users only have direct access to the login node, which has more limited resources and should not be used for computing. \\
All nodes are interconnected by a 10 Gb network. \\
Distributed storage with a capacity of 220 TB, connected through a dual 25 Gb fibre network, is accessible from all nodes. \\
  
\\
^  Name  ^  Model  ^  Processor  ^  Memory  ^  GPU  ^
|  hpc-login2  |  Dell R440  |  1 x Intel Xeon Silver 4208 CPU @ 2.10 GHz (8c)  |  16 GB  |  |
|  hpc-node[1-2]  |  Dell R740  |  2 x Intel Xeon Gold 5220 @ 2.20 GHz (18c)  |  192 GB  |  |
|  hpc-node[3-9]  |  Dell R740  |  2 x Intel Xeon Gold 5220R @ 2.20 GHz (24c)  |  192 GB  |  |
|  hpc-fat1  |  Dell R840  |  4 x Intel Xeon Gold 6248 @ 2.50 GHz (20c)  |  1 TB  |  |
|  hpc-gpu[1-2]  |  Dell R740  |  2 x Intel Xeon Gold 5220 CPU @ 2.20 GHz (18c)  |  192 GB  |  2 x Nvidia Tesla V100S  |
|  hpc-gpu3  |  Dell R7525  |  2 x AMD EPYC 7543 @ 2.80 GHz (32c)  |  256 GB  |  2 x Nvidia Ampere A100 40GB  |
|  hpc-gpu4  |  Dell R7525  |  2 x AMD EPYC 7543 @ 2.80 GHz (32c)  |  256 GB  |  1 x Nvidia Ampere A100 80GB  |
  
===== Accessing the cluster =====
Access to the cluster must be requested in advance through the [[https://citius.usc.es/uxitic/incidencias/add|incident form]]. Users without access permission will receive an "incorrect password" message.
  
Access is through an SSH connection to the login node:
<code bash>
ssh <username>@hpc-login2.inv.usc.es
</code>
  
===== Storage, directories and file systems =====
<note warning> None of the file systems in the cluster are backed up!!!</note>
The users' HOME directories are on the shared file system, so they are accessible from all nodes in the cluster. The path is stored in the environment variable %%$HOME%%. \\
Each node has a local 1 TB scratch partition, which is wiped at the end of each job. Job scripts can access it through the %%$LOCAL_SCRATCH%% environment variable. \\
For data shared by a group of users, you must request the creation of a folder in the shared storage that will only be accessible by members of the group. \\
^  Directory  ^  Variable  ^  Mount point  ^  Capacity  ^
|  Home  |  %%$HOME%%  |  /mnt/beegfs/home/<username>  |  220 TB*  |
|  Local scratch  |  %%$LOCAL_SCRATCH%%  |  varies  |  1 TB  |
|  Group folder  |  %%$GRUPOS/<name>%%  |  /mnt/beegfs/groups/<name>  |  220 TB*  |
%%*%% Shared storage: this capacity is common to all users and groups.
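
The %%$LOCAL_SCRATCH%% workflow can be sketched as a batch script that copies the input to the node-local scratch, works there and copies the result back to %%$HOME%% before the job ends. This is a sketch under stated assumptions: the file names input.txt and results/ are hypothetical, sort stands in for the real program, and the automatic run is guarded by %%$SLURM_JOB_ID%% (set by SLURM inside jobs) so the function can also be tried outside a job.

<code bash>
#!/bin/bash
#SBATCH --job-name=scratch_demo
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2gb
#SBATCH --time=01:00:00

# Copy an input file to the node-local scratch, process it there and copy the
# result back to shared storage. $LOCAL_SCRATCH is wiped when the job ends.
stage_and_run() {
    local input="$1" results_dir="$2"
    local workdir="${LOCAL_SCRATCH:?LOCAL_SCRATCH is not set}"

    cp "$input" "$workdir/" || return 1
    (
        cd "$workdir" || exit 1
        # Placeholder computation: replace "sort" with the real program.
        sort "$(basename "$input")" > result.txt
    ) || return 1

    mkdir -p "$results_dir" || return 1
    cp "$workdir/result.txt" "$results_dir/"
}

# Run automatically only inside a SLURM job.
# input.txt and results/ are hypothetical names.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    stage_and_run "$HOME/input.txt" "$HOME/results"
fi
</code>

Anything not copied back before the script finishes stays in the scratch partition and is lost.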
=== WARNING ===
The shared file system performs poorly when working with many small files. To improve performance in such scenarios, create a file system inside an image file and mount it to work directly on it. The procedure is as follows:
  * Create the image file in your home folder:
<code bash>
## truncate image.name -s SIZE   (SIZE accepts suffixes such as M or G)
truncate example.ext4 -s 20G
</code>
  * Create a file system in the image file:
<code bash>
## mkfs.ext4 -T small -m 0 image.name
## -T small: options optimized for small files
## -m 0: do not reserve capacity for the root user
mkfs.ext4 -T small -m 0 example.ext4
</code>
  * Mount the image (using sudo) with the script //mount_image.py//:
<code bash>
## By default it is mounted at /mnt/imagenes/<username>/ in read-only mode.
sudo mount_image.py example.ext4
</code>
  * To unmount the image, use the script //umount_image.py// (using sudo).
  
The mount script has these options:
<code>
--mount-point path   <-- (optional) Creates subdirectories under /mnt/imagenes/<username>/<path>
--rw                 <-- (optional) By default the image is mounted read-only; with this option it is mounted read-write.
</code>
<note warning> Do not mount the image file read-write from more than one node!!!</note>

The unmount script only accepts, as an optional parameter, the same path you used with the %%--mount-point%% option when mounting:
<code>
--mount-point path   <-- (optional)
</code>
===== Transferring files and data =====
=== SCP ===
From your local machine to the cluster:
<code bash>
scp filename <username>@hpc-login2:/<path>
</code>
From the cluster to your local machine (run on the cluster; your local machine must accept SSH connections):
<code bash>
scp filename <username>@<hostname>:/<path>
</code>
[[https://man7.org/linux/man-pages/man1/scp.1.html | SCP man page]]
=== SFTP ===
To transfer several files or to navigate through the file system:
<code bash>
<hostname>:~$ sftp <username>@hpc-login2
sftp>
sftp> ls
sftp> cd <path>
sftp> put <file>
sftp> get <file>
sftp> quit
</code>
[[https://www.unix.com/man-page/redhat/1/sftp/ | SFTP man page]]
=== RSYNC ===
[[ https://rsync.samba.org/documentation.html | RSYNC documentation ]]
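
A typical rsync transfer can be wrapped in two helper functions, one per direction. This is a sketch: the function names are hypothetical, while -a (archive mode), -v (verbose), -z (compression) and %%--partial%% (keep partially transferred files so a later run can reuse them) are standard rsync options. As with any rsync call, a trailing slash on the source copies the directory's contents rather than the directory itself.

<code bash>
#!/bin/bash
# Hypothetical helpers around rsync; the host is the cluster login node.
CLUSTER_HOST="hpc-login2.inv.usc.es"

# Upload a local file or directory to the cluster.
push_to_cluster() {
    local user="$1" src="$2" dest="$3"
    rsync -avz --partial "$src" "${user}@${CLUSTER_HOST}:${dest}"
}

# Download a file or directory from the cluster into a local path.
pull_from_cluster() {
    local user="$1" src="$2" dest="$3"
    rsync -avz --partial "${user}@${CLUSTER_HOST}:${src}" "$dest"
}
</code>

For example, push_to_cluster <username> ./data/ data/ copies the contents of ./data into ~/data on the cluster, since a remote path without a leading slash is relative to the remote home.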
=== SSHFS ===
Requires local installation of the sshfs package. \\
It allows, for example, mounting the user's local home directory on hpc-login2:
<code bash>
## Mount
sshfs <username>@ctdeskxxx.inv.usc.es:/home/<username> <mount_point>
## Unmount
fusermount -u <mount_point>
</code>
[[https://linux.die.net/man/1/sshfs | SSHFS man page]]
  
===== Available software =====
All nodes have the basic software installed by default in AlmaLinux 8.4, in particular:
  * GCC 8.5.0
  * Python 3.6.8
  * Perl 5.26.3
The GPU nodes also have:
  * NVIDIA driver 510.47.03
  * CUDA 11.6
  * libcudnn 8.7
To use software that is not installed on the system, or a different version of installed software, there are three options:
  - Use Modules with the modules that are already installed (or request the installation of a new module if it is not available).
  - Use a container (uDocker or Apptainer/Singularity).
  - Use Conda.
A module is the simplest solution for using software that needs no modifications and has no hard-to-satisfy dependencies. \\
A container is ideal when dependencies are complicated and/or the software is highly customised. It is also the best solution if you are looking for reproducibility, ease of distribution and teamwork. \\
Conda is the best solution if you need the latest version of a library or program, or packages that are not otherwise available. \\
  
==== Using Modules/Lmod ====
[[ https://lmod.readthedocs.io/en/latest/010_user.html | Lmod documentation]]
<code bash>
# See available modules:
module avail
# Load a module:
module load <module_name>
# Unload a module:
module unload <module_name>
# List the modules loaded in your environment:
module list
# ml can be used as a shorthand for the module command:
ml avail
# Get information about a module:
ml spider <module_name>
</code>
  
==== Running software containers ====
=== uDocker ===
[[ https://indigo-dc.gitbook.io/udocker/user_manual | uDocker manual ]] \\
udocker is installed as a module, so it needs to be loaded into the environment:
<code bash>
ml uDocker
</code>
  
=== Apptainer/Singularity ===
[[ https://sylabs.io/guides/3.8/user-guide/ | Apptainer/Singularity documentation]] \\
Apptainer/Singularity is installed on each node's system, so you don't need to do anything to use it.
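
A sketch of running a command inside an image, assuming the binary may be installed under either name (apptainer or singularity). The helper names and the image name ubuntu_22.04.sif are hypothetical; pull and exec, and the %%--nv%% flag that gives the container access to the NVIDIA GPUs, are standard Apptainer/Singularity subcommands and options.

<code bash>
#!/bin/bash
# Print the name of the installed container runtime (apptainer or singularity).
container_cmd() {
    local candidate
    for candidate in apptainer singularity; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Run a command inside a .sif image. Set USE_GPU=1 to add --nv (NVIDIA GPUs).
run_in_image() {
    local image="$1" cmd
    shift
    cmd=$(container_cmd) || { echo "apptainer/singularity not found" >&2; return 1; }
    if [ "${USE_GPU:-0}" = "1" ]; then
        "$cmd" exec --nv "$image" "$@"
    else
        "$cmd" exec "$image" "$@"
    fi
}

# Typical usage (hypothetical image name):
#   apptainer pull ubuntu_22.04.sif docker://ubuntu:22.04
#   run_in_image ubuntu_22.04.sif cat /etc/os-release
</code>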
  
  
==== Conda ====
[[ https://docs.conda.io/en/latest/miniconda.html | Conda Documentation ]] \\
Miniconda is the minimal version of Anaconda and only includes the conda environment manager, Python and a few necessary packages. From there on, each user only downloads and installs the packages they need.
<code bash>
# Get Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.11.0-Linux-x86_64.sh
# Install (the installer is a bash script)
bash Miniconda3-py39_4.11.0-Linux-x86_64.sh
# Initialize conda for the bash shell
~/miniconda3/bin/conda init bash
</code>
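
Inside a batch job, %%conda activate%% is usually not available until the conda shell functions are loaded, because %%conda init%% only edits ~/.bashrc, which non-interactive batch shells do not read. A common workaround, sketched here, is to source the etc/profile.d/conda.sh file shipped with the installation. The environment name myenv, the default path ~/miniconda3 and the CONDA_BASE variable are assumptions.

<code bash>
#!/bin/bash
#SBATCH --job-name=conda_job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4gb
#SBATCH --time=00:30:00

# Environment created beforehand on hpc-login2, for example:
#   conda create -n myenv python=3.9 numpy

# Load the conda shell functions, then activate the given environment.
activate_conda_env() {
    local env_name="$1"
    local base="${CONDA_BASE:-$HOME/miniconda3}"
    if [ ! -f "$base/etc/profile.d/conda.sh" ]; then
        echo "conda.sh not found under $base" >&2
        return 1
    fi
    # shellcheck source=/dev/null
    source "$base/etc/profile.d/conda.sh"
    conda activate "$env_name"
}

# Run automatically only inside a SLURM job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    activate_conda_env myenv || exit 1
    python -c 'import sys; print(sys.version)'
fi
</code>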
  
===== Using SLURM =====
The cluster's queue manager is [[ https://slurm.schedmd.com/documentation.html | SLURM ]]. \\
<note tip>The term CPU refers to a physical core in a socket. Hyperthreading is disabled, so each node has (number of sockets) x (number of physical cores per socket) CPUs available.</note>
== Available resources ==
<code bash>
# ver_estado.sh prints Spanish headers: NODO = node, ESTADO = state, CORES EN USO = cores in use,
# USO MEM = memory use, GPUS(Uso/Total) = GPUs (used/total), TOTALES = totals
hpc-login2 ~]$ ver_estado.sh
=============================================================================================================
  NODO     ESTADO                        CORES EN USO                           USO MEM     GPUS(Uso/Total)
=============================================================================================================
 hpc-fat1    up   0%[--------------------------------------------------]( 0/80) RAM:  0%     ---
 hpc-gpu1    up   2%[||------------------------------------------------]( 1/36) RAM: 47%   V100S (1/2)
 hpc-gpu2    up   2%[||------------------------------------------------]( 1/36) RAM: 47%   V100S (1/2)
 hpc-gpu3    up   0%[--------------------------------------------------]( 0/64) RAM:  0%   A100_40 (0/2)
 hpc-gpu4    up   1%[|-------------------------------------------------]( 1/64) RAM: 35%   A100_80 (1/1)
 hpc-node1   up   0%[--------------------------------------------------]( 0/36) RAM:  0%     ---
 hpc-node2   up   0%[--------------------------------------------------]( 0/36) RAM:  0%     ---
 hpc-node3   up   0%[--------------------------------------------------]( 0/48) RAM:  0%     ---
 hpc-node4   up   0%[--------------------------------------------------]( 0/48) RAM:  0%     ---
 hpc-node5   up   0%[--------------------------------------------------]( 0/48) RAM:  0%     ---
 hpc-node6   up   0%[--------------------------------------------------]( 0/48) RAM:  0%     ---
 hpc-node7   up   0%[--------------------------------------------------]( 0/48) RAM:  0%     ---
 hpc-node8   up   0%[--------------------------------------------------]( 0/48) RAM:  0%     ---
 hpc-node9   up   0%[--------------------------------------------------]( 0/48) RAM:  0%     ---
=============================================================================================================
TOTALES: [Cores : 3/688] [Mem(MB): 270000/3598464] [GPU: 3/ 7]
hpc-login2 ~]$ sinfo -e -o "%30N  %20c  %20m  %20f  %30G " --sort=N
# There is an alias for that command:
hpc-login2 ~]$ ver_recursos
NODELIST                        CPUS                  MEMORY                AVAIL_FEATURES        GRES
hpc-fat1                        80                    1027273               cpu_intel             (null)
hpc-gpu[1-2]                    36                    187911                cpu_intel             gpu:V100S:2
hpc-gpu3                        64                    253282                cpu_amd               gpu:A100_40:2
hpc-gpu4                        64                    253282                cpu_amd               gpu:A100_80:1(S:0)
hpc-node[1-2]                   36                    187645                cpu_intel             (null)
hpc-node[3-9]                   48                    187645                cpu_intel             (null)

# To see current resource usage (CPUS column: Allocated/Idle/Other/Total):
hpc-login2 ~]$ sinfo -N -r -O NodeList,CPUsState,Memory,FreeMem,Gres,GresUsed
# There is an alias for that command:
hpc-login2 ~]$ ver_uso
NODELIST            CPUS(A/I/O/T)       MEMORY              FREE_MEM            GRES                GRES_USED
hpc-fat1            80/0/0/80           1027273             900850              (null)              gpu:0,mps:0
hpc-gpu3            2/62/0/64           253282              226026              gpu:A100_40:2       gpu:A100_40:2(IDX:0-1)
hpc-gpu4            1/63/0/64           253282              244994              gpu:A100_80:1(S:0)  gpu:A100_80:1(IDX:0)
hpc-node1           36/0/0/36           187645              121401              (null)              gpu:0,mps:0
hpc-node2           36/0/0/36           187645              130012              (null)              gpu:0,mps:0
hpc-node3           36/12/0/48          187645              126739              (null)              gpu:0,mps:0
hpc-node4           36/12/0/48          187645              126959              (null)              gpu:0,mps:0
hpc-node5           36/12/0/48          187645              128572              (null)              gpu:0,mps:0
hpc-node6           36/12/0/48          187645              127699              (null)              gpu:0,mps:0
hpc-node7           36/12/0/48          187645              127002              (null)              gpu:0,mps:0
hpc-node8           36/12/0/48          187645              128182              (null)              gpu:0,mps:0
hpc-node9           36/12/0/48          187645              127312              (null)              gpu:0,mps:0
</code>
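
The CPUS(A/I/O/T) column printed by ver_uso packs four counters (allocated, idle, other, total) into one field. A small helper to extract the idle count can be handy in scripts, for instance to see which nodes could start a job right away. The helper names are hypothetical; %%-h%% (no header) and the %%-O%% field names in the usage comment are the standard sinfo options already shown above.

<code bash>
#!/bin/bash
# Print the idle CPU count from a CPUS(A/I/O/T) field, e.g. "36/12/0/48" gives 12.
idle_cpus() {
    local allocated idle other total
    IFS=/ read -r allocated idle other total <<< "$1"
    echo "$idle"
}

# Read "NODE A/I/O/T" lines on stdin and print the nodes with at least MIN idle CPUs.
nodes_with_idle() {
    local min="$1" node cpus
    while read -r node cpus; do
        if [ "$(idle_cpus "$cpus")" -ge "$min" ]; then
            echo "$node"
        fi
    done
}

# On hpc-login2:
#   sinfo -N -h -O NodeList,CPUsState | nodes_with_idle 8
</code>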
==== Nodes ====
A node is SLURM's computation unit and corresponds to a physical server.
<code bash>
# Show node info:
hpc-login2 ~]$ scontrol show node hpc-node1
NodeName=hpc-node1 Arch=x86_64 CoresPerSocket=18
   CPUAlloc=0 CPUTot=36 CPULoad=0.00
   AvailableFeatures=cpu_intel
   ActiveFeatures=cpu_intel
   Gres=(null)
   NodeAddr=hpc-node1 NodeHostName=hpc-node1 Version=21.08.6
   OS=Linux 4.18.0-305.el8.x86_64 #1 SMP Wed May 19 18:55:28 EDT 2021
   RealMemory=187645 AllocMem=0 FreeMem=166801 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=defaultPartition
   BootTime=2022-03-01T13:13:56 SlurmdStartTime=2022-03-01T15:36:48
   LastBusyTime=2022-03-07T14:34:12
   CfgTRES=cpu=36,mem=187645M,billing=36
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
</code>
==== Partitions ====
Partitions in SLURM are logical groups of nodes. In the cluster there is a single partition to which all nodes belong, so it is not necessary to specify it when submitting jobs.
<code bash>
# Show partition info:
hpc-login2 ~]$ sinfo
defaultPartition*    up   infinite     11   idle hpc-fat1,hpc-gpu[1-4],hpc-node[1-9]
</code>
==== Jobs ====
Jobs in SLURM are resource allocations granted to a user for a given time. Jobs are identified by a sequential number, the JOBID. \\
A JOB consists of one or more STEPS, each consisting of one or more TASKS that use one or more CPUs. There is one STEP for each program that runs sequentially in a JOB, and one TASK for each program that runs in parallel. Therefore, in the simplest case, such as a job that just runs the hostname command, the JOB has a single STEP and a single TASK.
  
==== Queue system (QOS) ====
The queue to which each job is submitted defines its priority, its limits and also the relative "cost" to the user.
<code bash>
# Show queues
hpc-login2 ~]$ sacctmgr show qos
# There is an alias that shows only the relevant info:
hpc-login2 ~]$ ver_colas
      Name    Priority                                  MaxTRES     MaxWall            MaxTRESPU MaxJobsPU MaxSubmitPU
----------  ---------- ---------------------------------------- ----------- -------------------- --------- -----------
   regular         100                cpu=200,gres/gpu=1,node=4  4-04:00:00       cpu=200,node=4        10          50
interactive        200                                   node=1    04:00:00               node=1                   1
    urgent         300                        gres/gpu=1,node=1    04:00:00               cpu=36                  15
      long         100                        gres/gpu=1,node=4  8-04:00:00                              1           5
     large         100                       cpu=200,gres/gpu=2  4-04:00:00                              2          10
     admin         500
     small         150                             cpu=6,node=2    04:00:00              cpu=400        40         100
</code>
# Priority: the relative priority of each queue. \\
# DenyOnLimit: the job will not be executed if it doesn't comply with the queue limits. \\
# UsageFactor: the relative cost for the user of running jobs in that queue. \\
# MaxTRES: limits applied to each job. \\
# MaxWall: maximum time the job can run. \\
# MaxTRESPU: global limits per user. \\
# MaxJobsPU: maximum number of jobs a user can have running simultaneously. \\
# MaxSubmitPU: maximum number of jobs a user can have in total, both queued and running. \\

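
The MaxWall values above use SLURM's days-hours:minutes:seconds notation. As a worked example, the 4-04:00:00 limit of the regular queue is 4 * 86400 + 4 * 3600 = 360000 seconds (100 hours). The hypothetical helpers below only handle the D-HH:MM:SS and HH:MM:SS forms that appear in the table; they convert such values so a planned %%--time%% can be checked against a queue limit before submitting.

<code bash>
#!/bin/bash
# Convert "D-HH:MM:SS" or "HH:MM:SS" into seconds.
# Other SLURM time formats (e.g. plain minutes) are not handled in this sketch.
slurm_time_to_seconds() {
    local value="$1" days=0 h m s
    if [[ "$value" == *-* ]]; then
        days="${value%%-*}"
        value="${value#*-}"
    fi
    IFS=: read -r h m s <<< "$value"
    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Exit status 0 if the requested time fits within the queue limit.
fits_walltime() {
    local requested="$1" limit="$2"
    [ "$(slurm_time_to_seconds "$requested")" -le "$(slurm_time_to_seconds "$limit")" ]
}
</code>

For example, fits_walltime 5-00:00:00 4-04:00:00 fails, so according to the table such a job would have to go to the long queue (8-04:00:00).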
==== Sending a job to the queue system ====
== Requesting resources ==
By default, if you submit a job without specifying anything, the system sends it to the default QOS (regular) and assigns it one node, one CPU and 4 GB of memory. The time limit for the job is that of the queue (4 days and 4 hours).
This is very inefficient; ideally, you should specify at least these three parameters when submitting jobs:
  -  %%Number of nodes (-N or --nodes), tasks (-n or --ntasks) and/or CPUs per task (-c or --cpus-per-task).%%
  -  %%Memory per node (--mem) or memory per CPU (--mem-per-cpu).%%
  -  %%Job execution time (--time).%%

The following parameters may also be useful:
|  -J  |  %%--job-name%%  |  Job name. Default: name of the executable  |
|  -q  |  %%--qos%%  |  Name of the queue to which the job is sent. Default: regular  |
|  -o  |  %%--output%%  |  File or file pattern to which all standard and error output is redirected  |
|      |  %%--gres%%  |  Type and/or number of GPUs requested for the job  |
|  -C  |  %%--constraint%%  |  To request nodes with Intel or AMD processors (cpu_intel or cpu_amd)  |
|      |  %%--exclusive%%  |  To request exclusive use of the allocated nodes, not shared with other jobs  |
|  -w  |  %%--nodelist%%  |  List of nodes to run the job on  |
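
Putting the recommended parameters together, a job script for a single multithreaded program might look like the sketch below. The values (8 CPUs, 16 GB, 2 hours) are illustrative, the program name my_threaded_program is hypothetical, and OMP_NUM_THREADS only matters for OpenMP programs; %x and %j in %%--output%% are SLURM's placeholders for the job name and the job ID.

<code bash>
#!/bin/bash
#SBATCH --job-name=threaded_job       # -J
#SBATCH --qos=regular                 # -q
#SBATCH --nodes=1                     # -N
#SBATCH --ntasks=1                    # -n
#SBATCH --cpus-per-task=8             # -c
#SBATCH --mem=16G                     # memory per node
#SBATCH --time=02:00:00               # wall-time limit (HH:MM:SS)
#SBATCH --constraint=cpu_intel        # -C, Intel nodes only
#SBATCH --output=%x-%j.out            # %x = job name, %j = job ID

# Use as many threads as CPUs were allocated (1 when run outside SLURM).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running on ${HOSTNAME} with ${OMP_NUM_THREADS} threads"

# Replace with the real program (hypothetical name):
# srun ./my_threaded_program
</code>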
  
== How resources are allocated ==
The default allocation method between nodes is block allocation (all available cores on a node are allocated before using another node). The default allocation method within each node is cyclic allocation (the required cores are distributed equally among the available sockets in the node).
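
To make the two default policies concrete, the hypothetical helpers below compute where a single-CPU task or a CPU would land under a simplified model: block placement across nodes (fill one node's CPUs before moving to the next) and cyclic placement across sockets within a node (round-robin). This only illustrates the arithmetic; the actual placement is decided by SLURM and can be changed with the srun/sbatch option %%--distribution%% (for example %%--distribution=cyclic:block%%).

<code bash>
#!/bin/bash
# Block distribution across nodes (0-based): with one CPU per task, task T
# lands on node T / CPUS_PER_NODE, so a node is filled before the next is used.
block_node_for_task() {
    local task="$1" cpus_per_node="$2"
    echo $(( task / cpus_per_node ))
}

# Cyclic distribution within a node (0-based): the Nth CPU allocated on the
# node comes from socket N % SOCKETS, alternating between sockets.
cyclic_socket_for_cpu() {
    local cpu="$1" sockets="$2"
    echo $(( cpu % sockets ))
}

# Example: 40 single-CPU tasks on 36-CPU, 2-socket nodes (hpc-node1/2).
for task in 0 1 35 36 39; do
    echo "task $task: node $(block_node_for_task "$task" 36)"
done
</code>

With 36 CPUs per node, tasks 0 to 35 fill the first node and tasks 36 to 39 go to the second one.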
== Priority calculation ==
When a job is submitted to the queuing system, the first thing that happens is that the requested resources are checked against the limits of the corresponding queue. If the request exceeds any of them, the submission is rejected. \\
If resources are available, the job runs immediately; if not, it is queued. Each job is assigned a priority that determines the order in which queued jobs run when resources become available. To determine the priority of each job, three factors are weighted: the time it has been waiting in the queue (25%), the fixed priority of the queue (25%) and the user's fairshare (50%). \\
The fairshare is a dynamic value that SLURM calculates for each user from the difference between the resources allocated to them and the resources they have consumed over the last 14 days.
<code bash>
hpc-login2 ~]$ sshare -l
      User  RawShares  NormShares    RawUsage   NormUsage   FairShare
---------- ---------- ----------- ----------- -----------  ----------
                         1.000000     2872400                0.500000
                    1    0.500000     2872400    1.000000    0.250000
user_name         100    0.071429        4833    0.001726    0.246436
</code>
# RawShares: the amount of resources allocated to the user in absolute terms. It is the same for all users. \\
# NormShares: the above amount normalised to the total allocated resources. \\
# RawUsage: the number of CPU-seconds consumed by all the user's jobs. \\
# NormUsage: RawUsage normalised to the total CPU-seconds consumed in the cluster. \\
# FairShare: the FairShare factor, between 0 and 1. The more the user uses the cluster, the closer it gets to 0 and the lower the priority. \\
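
The weighting described above can be written as a weighted sum of three factors, each between 0 and 1, which is how SLURM's multifactor priority plugin combines them; SLURM normalises the queue factor by the highest queue priority (500, admin, in the queue table). The sketch below only reproduces the 25/25/50 percentages from this page: the helper names are hypothetical, how the waiting-time factor is normalised on this cluster is not documented here, and the real priority is an integer obtained by multiplying the factors by the configured weights.

<code bash>
#!/bin/bash
# Weighted job priority with the weights described on this page:
# 25% waiting time (age), 25% queue (QOS) priority, 50% fairshare.
# All three factors are expected to be normalised to the range [0, 1].
job_priority() {
    local age="$1" qos="$2" fairshare="$3"
    awk -v age="$age" -v qos="$qos" -v fs="$fairshare" \
        'BEGIN { printf "%.4f\n", 0.25 * age + 0.25 * qos + 0.50 * fs }'
}

# Queue factor: queue priority divided by the highest queue priority (500).
qos_factor() {
    awk -v p="$1" 'BEGIN { printf "%.4f\n", p / 500 }'
}
</code>

For example, with an age factor of 0.5, the regular queue (100 / 500 = 0.2) and the FairShare of 0.246436 from the sshare output, job_priority 0.5 0.2 0.246436 gives 0.25 * 0.5 + 0.25 * 0.2 + 0.5 * 0.246436 = 0.2982.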

== Job submission ==
  - sbatch
  - salloc
  - srun

1. SBATCH \\
Used to submit a script to the queuing system. It runs in batch mode and is non-blocking.
<code bash>
# Create the script:
hpc-login2 ~]$ vim test_job.sh
    #!/bin/bash
    #SBATCH --job-name=test              # Job name
    #SBATCH --nodes=1                    # -N Run all processes on a single node
    #SBATCH --ntasks=1                   # -n Run a single task
    #SBATCH --cpus-per-task=1            # -c Use 1 CPU per task
    #SBATCH --mem=1gb                    # Job memory request
    #SBATCH --time=00:05:00              # Time limit hrs:min:sec
    #SBATCH --qos=urgent                 # Queue
    #SBATCH --output=test%j.log          # Standard output and error log

    echo "Hello World!"

hpc-login2 ~]$ sbatch test_job.sh
</code>
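
sbatch can also chain jobs. The sketch below submits a first script and makes a second one wait until the first finishes successfully, using the standard sbatch options %%--parsable%% (print only the job ID) and %%--dependency=afterok:<JOBID>%%. The helper name and the script names are hypothetical.

<code bash>
#!/bin/bash
# Submit FIRST, then SECOND so that it only starts if FIRST ends successfully.
# Prints the job ID of the second job.
submit_pipeline() {
    local first_script="$1" second_script="$2" first_id
    first_id=$(sbatch --parsable "$first_script") || return 1
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    first_id="${first_id%%;*}"
    sbatch --parsable --dependency=afterok:"$first_id" "$second_script"
}

# On hpc-login2 (hypothetical script names):
#   submit_pipeline preprocess.sh analysis.sh
</code>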
2. SALLOC \\
It is used to obtain an allocation of resources (nodes) immediately. As soon as the allocation is granted, the specified command or a shell is run.
<code bash>
# Get 5 nodes and launch a job:
hpc-login2 ~]$ salloc -N5 myprogram
# Get interactive access to a node (press Ctrl+D to exit):
hpc-login2 ~]$ salloc -N1
# Get interactive EXCLUSIVE access to a node:
hpc-login2 ~]$ salloc -N1 --exclusive
</code>
3. SRUN \\
It is used to launch a parallel job (preferable to using mpirun). It is interactive and blocking.
<code bash>
# Launch the hostname command on 2 nodes:
hpc-login2 ~]$ srun -N2 hostname
hpc-node1
hpc-node2
</code>
  
==== Using GPUs ====
To request GPUs for a job, add one of these options to sbatch or srun:
|  %%--gres%%  |  Request GPUs per NODE  |  %%--gres=gpu[[:type]:count],...%%  |
|  %%--gpus%% or %%-G%%  |  Request GPUs per JOB  |  %%--gpus=[type]:count,...%%  |
There are also the options %%--gpus-per-socket%%, %%--gpus-per-node%% and %%--gpus-per-task%%. \\
Examples:
<code bash>
## See the list of nodes and GPUs:
hpc-login2 ~]$ ver_recursos
## Request any 2 GPUs for a JOB, add:
--gpus=2
## Request a 40G A100 at one node and an 80G A100 at another node, add:
--gres=gpu:A100_40:1,gpu:A100_80:1
</code>
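
A GPU batch job combines %%--gres%% (or %%--gpus%%) with the usual parameters. In this sketch the GPU type A100_40 comes from the ver_recursos output; SLURM normally sets CUDA_VISIBLE_DEVICES to the GPUs granted to the job, and the program name is hypothetical. Note that, according to the queue table, the regular queue allows at most one GPU per job (gres/gpu=1), so requests for two GPUs appear to require the large queue.

<code bash>
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=04:00:00
#SBATCH --gres=gpu:A100_40:1          # one 40 GB A100 on the node

# SLURM normally exposes only the granted GPUs through CUDA_VISIBLE_DEVICES.
GPU_LIST="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: $GPU_LIST"

# Show the GPUs if the NVIDIA tools are available on the node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
fi

# Replace with the real program (hypothetical name):
# srun ./my_cuda_program
</code>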
  
 +==== Job monitoring ====
 +<code bash>
 +## List all jobs in the queue
 +hpc-login2 ~]$ squeue
 +## List a user's jobs:
 +hpc-login2 ~]$ squeue -u <login>
 +## Cancel a job:
 +hpc-login2 ~]$ scancel <JOBID>
 +## List of recent jobs:
 +hpc-login2 ~]$ sacct -b
 +## Detailed historical information for a job:
 +hpc-login2 ~]$ sacct -l -j <JOBID>
 +## Debug information of a job for troubleshooting:
 +hpc-login2 ~]$ scontrol show jobid -dd <JOBID>
 +## View the resource usage of a running job:
 +hpc-login2 ~]$ sstat -j <JOBID>
 +</code>
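In scripts, ''sacct'' output is easier to process when it is pipe-separated (''-P'') and has no header (''-n''). A minimal sketch, assuming you only want the State field of the job itself and not of its steps; the helper name ''job_state_from_sacct'' and the ''JOBID'' variable are hypothetical:

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: print the State of job $1 from sacct records of the
# form JobID|State|ExitCode (one per line). Only an exact JobID match is used,
# so steps such as 6547.batch or 6547.0 are skipped.
job_state_from_sacct() {
  local jobid="$1"
  # ($1 "") forces a string comparison, so "6547.0" never equals "6547"
  awk -F'|' -v id="$jobid" '($1 "") == id { print $2; exit }'
}

# On the cluster: set JOBID to a real job ID before running this.
if command -v sacct >/dev/null 2>&1 && [ -n "${JOBID:-}" ]; then
  sacct -j "$JOBID" -P -n --format=JobID,State,ExitCode | job_state_from_sacct "$JOBID"
fi
</code>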
 +==== Configure job output ====
 +== Exit codes ==
 +By default, the SLURM commands return these exit codes:
 +^  SLURM command  ^  Exit code  ^
 +|  salloc  |  0 on success, 1 if the user's command cannot be executed  |
 +|  srun  |  The highest exit code among all executed tasks, or 253 for an out-of-memory error  |
 +|  sbatch  |  0 on success; otherwise, the exit code of the failed process  |
 +
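Following the exit-code table, a batch script can pass srun's exit code on as its own exit code, so that a failing task also makes the job end as FAILED. A minimal sketch; the helper ''describe_srun_rc'' and ''./my_program'' are hypothetical, and 253 is reported only as a //possible// out-of-memory error because a task could also exit with that code itself:

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: turn srun's exit code into a short message,
# following the exit-code table for srun.
describe_srun_rc() {
  case "$1" in
    0)   echo "all tasks succeeded" ;;
    253) echo "possible out-of-memory error" ;;
    *)   echo "highest task exit code: $1" ;;
  esac
}

# Usage inside a batch script (./my_program is a placeholder):
#   srun ./my_program
#   rc=$?
#   echo "srun: $(describe_srun_rc "$rc")" >&2
#   exit "$rc"    # a non-zero exit marks the job as FAILED
</code>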
 +== STDIN, STDOUT and STDERR ==
 +**SRUN:**\\
 +By default, srun redirects stdout and stderr from all TASKS to its own stdout and stderr, and redirects its own stdin to all TASKS. This can be changed with:
 +|  %%-i, --input=<option>%%    | 
 +|  %%-o, --output=<option>%%   |
 +|  %%-e, --error=<option>%%   |
 +The available options are:
 +  * //all//: the default.
 +  * //none//: Nothing is redirected.
 +  * //taskid//: Redirects only to and/or from the specified TASK id.
 +  * //filename//: Redirects everything to and/or from the specified file.
 +  * //filename pattern//: Same as the filename option but with a file defined by a [[ https://slurm.schedmd.com/srun.html#OPT_filename-pattern | pattern ]].
 +
 +**SBATCH:**\\
 +By default, the script's stdin is connected to "/dev/null", and its stdout and stderr are both redirected to a file named "slurm-%j.out", where %j is the job ID. This can be changed with:
 +|  %%-i, --input=<filename_pattern>%%  |
 +|  %%-o, --output=<filename_pattern>%%  |
 +|  %%-e, --error=<filename_pattern>%%  |
 +The filename_pattern syntax is described [[ https://slurm.schedmd.com/sbatch.html#SECTION_%3CB%3Efilename-pattern%3C/B%3E | here ]].
 +
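A minimal sketch combining both mechanisms in one batch script, assuming a ''logs/'' directory in the submission directory: ''%x'' (job name) and ''%j'' (job ID) name the script's own output files, and ''%t'' (task ID) gives each srun task a separate file. The file and job names are hypothetical:

<code bash>
#!/usr/bin/env bash
# Sketch: batch script with separate stdout/stderr files under logs/,
# plus one output file per srun task. Names are hypothetical.
mkdir -p logs   # Slurm does not create missing output directories

cat > io_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=io-example
#SBATCH --ntasks=2
# %x = job name, %j = job ID
#SBATCH --output=logs/%x-%j.out
#SBATCH --error=logs/%x-%j.err

# %t = task ID: one file per task instead of merging into the job's stdout
srun --output=logs/task-%j-%t.out hostname
EOF

if command -v sbatch >/dev/null 2>&1; then
  sbatch io_job.sh
fi
</code>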
 +==== Sending mail ====
 +JOBS can be configured to send mail in certain circumstances using these two parameters (**BOTH ARE REQUIRED**):
 +|  %%--mail-type=<type>%%  |  Options: BEGIN, END, FAIL, REQUEUE, ALL, TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_50.  |
 +|  %%--mail-user=<user>%%  |  The destination email address.  |
 +
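A minimal sketch of a batch script that requests mail when the job ends or fails; ''user@example.com'' is a placeholder address and ''mail_job.sh'' a hypothetical file name:

<code bash>
#!/usr/bin/env bash
# Sketch: batch script that sends mail on END or FAIL.
# Replace the placeholder address with your own.
cat > mail_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=mail-example
# Both options are needed for mail to be sent
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@example.com

sleep 60
EOF

if command -v sbatch >/dev/null 2>&1; then
  sbatch mail_job.sh
fi
</code>

The same options can also be given on the command line, e.g. ''sbatch %%--mail-type=END,FAIL --mail-user=user@example.com%% mail_job.sh''.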
 +
 +
 +==== Status of Jobs in the queuing system ====
 +<code bash>
 +hpc-login2 ~]$ squeue -l
 +JOBID PARTITION     NAME     USER      STATE       TIME  NODES NODELIST(REASON)
 +6547  defaultPa  example <username>  RUNNING   22:54:55      1 hpc-fat1
 +
 +## Check overall queue usage:
 +hpc-login2 ~]$ estado_colas.sh
 +JOBS PER USER:
 +--------------
 +       usuario.uno:  3
 +       usuario.dos:  1
 +
 +JOBS PER QOS:
 +--------------
 +             regular:  3
 +                long:  1
 +
 +JOBS PER STATE:
 +--------------
 +             RUNNING:  3
 +             PENDING:  1
 +==========================================
 +Total JOBS in cluster:  4
 +</code>
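''estado_colas.sh'' summarizes the whole cluster. For a similar count limited to your own jobs, a minimal sketch (the helper name ''count_by_state'' is hypothetical) reads one state per line from ''squeue -h -o %T'', where ''-h'' drops the header and ''%T'' prints the state in long form:

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: count jobs per state, reading one state per line
# on stdin and printing "STATE: count" lines sorted by state.
count_by_state() {
  sort | uniq -c | awk '{ printf "%s: %d\n", $2, $1 }'
}

# On the cluster: summarize the current user's jobs
if command -v squeue >/dev/null 2>&1; then
  squeue -h -u "$USER" -o %T | count_by_state
fi
</code>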
 +Common job states:
 +  * R RUNNING Job currently has an allocation.
 +  * CD COMPLETED Job has terminated all processes on all nodes with an exit code of zero. 
 +  * F FAILED Job terminated with non-zero exit code or other failure condition.
 +  * PD PENDING Job is awaiting resource allocation.
 + 
 +[[ https://slurm.schedmd.com/squeue.html#SECTION_JOB-STATE-CODES | Full list of possible job statuses ]].\\
 +
 +If a job is not running, the REASON column shows why. See the [[ https://slurm.schedmd.com/squeue.html#SECTION_JOB-REASON-CODES | list of reasons ]] why a job may be waiting to run.
  
-  * Processes: Maximum number of processes per job in this queue.
-  * Nodes: Maximum number of nodes on which a job in this queue will run.
-  * Memory: Maximum amount of virtual memory used concurrently by all of the job's processes.
-  * Jobs/user: Maximum number of jobs per user in this queue, regardless of the state of those jobs.
-  * Maximum time (hours): Maximum real (wall-clock) time for which the job can be running.
-  * Priority: Priority of the queue relative to the others. A higher value means a higher priority. Note that this implies that, in the absence of other criteria, any job submitted with qsub without specifying parameters will run in np1, with that queue's limits.