Differences

This shows you the differences between two versions of the page.

en:centro:servizos:hpc [2023/03/03 13:59] – [Sending a job to the queue system] fernando.guillen
en:centro:servizos:hpc [2024/03/13 10:37] (current) – [Sending a job to the queue system] fernando.guillen
Line 163: Line 163:
 # Install
 sh Miniconda3-py39_4.11.0-Linux-x86_64.sh
 +# Initialize for bash shell
 +~/miniconda3/bin/conda init bash
 </code>
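To confirm that the initialization step took effect, one option is to look for the marker block that `conda init` writes to the shell startup file. This is a minimal sketch, assuming the startup file is the default `~/.bashrc`; the function name `conda_initialized` is hypothetical and not part of the page or of conda.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given startup file (default: ~/.bashrc)
# contains the opening marker of the block written by "conda init".
conda_initialized() {
    local rcfile="${1:-$HOME/.bashrc}"
    grep -q '^# >>> conda initialize >>>' "$rcfile" 2>/dev/null
}

if conda_initialized; then
    echo "conda initialize block found; open a new shell or run: source ~/.bashrc"
else
    echo "conda initialize block not found in ~/.bashrc"
fi
```

The change only applies to new shells, so either log in again or source the startup file before using `conda`.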
  
Line 281: Line 283:
 ==== Sending a job to the queue system ====
 == Requesting resources ==
-By default, if you submit a job without specifying anything, the system submits it to the default (regular) QOS and assigns it a node, a CPU and all available memory. The time limit for job execution is that of the queue (4 days and 4 hours).
+By default, if you submit a job without specifying anything, the system submits it to the default (regular) QOS and assigns it one node, one CPU and 4 GB of memory. The time limit for job execution is that of the queue (4 days and 4 hours).
 This is very inefficient; ideally, specify at least three parameters as precisely as possible when submitting jobs:
   -  %%Number of nodes (-N or --nodes), tasks (-n or --ntasks) and/or CPUs per task (-c or --cpus-per-task).%%
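As an illustration of the node, task and CPU options named above, the following sketch builds an `sbatch` command line with all three stated explicitly instead of relying on the defaults. The helper name `sbatch_cmd` and the script name `my_job.sh` are hypothetical; the command is only printed, not submitted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an sbatch command line that states nodes,
# tasks and CPUs per task explicitly.
# Arguments: <nodes> <ntasks> <cpus-per-task> <job script>
sbatch_cmd() {
    local nodes="$1" ntasks="$2" cpus="$3" script="$4"
    echo "sbatch --nodes=${nodes} --ntasks=${ntasks} --cpus-per-task=${cpus} ${script}"
}

# Example: one node, one task, eight CPUs for that task.
# my_job.sh is a placeholder for your own batch script.
sbatch_cmd 1 1 8 my_job.sh
```

Printing the command first makes it easy to review the request before running it on the login node.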
Line 348: Line 350:
 # Get interactive access to a node (Press Ctrl+D to exit):
 hpc-login2 ~]$ salloc -N1
 +# Get interactive EXCLUSIVE access to a node
 +hpc-login2 ~]$ salloc -N1 --exclusive
 </code>
 3. SRUN \\
Line 432: Line 436:
 JOBID PARTITION     NAME     USER      STATE       TIME  NODES NODELIST(REASON)
 6547  defaultPa  example <username>  RUNNING   22:54:55      1 hpc-fat1
 +
 +## Check status of queue use:
 +hpc-login2 ~]$ estado_colas.sh
 +JOBS PER USER:
 +--------------
 +       usuario.uno:  3
 +       usuario.dos:  1
 +
 +JOBS PER QOS:
 +--------------
 +             regular:  3
 +                long:  1
 +
 +JOBS PER STATE:
 +--------------
 +             RUNNING:  3
 +             PENDING:  1
 +==========================================
 +Total JOBS in cluster:  4
 </code>
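The per-user, per-QOS and per-state totals shown by `estado_colas.sh` can be approximated directly from `squeue`. The sketch below is not the cluster's script, whose implementation is not shown on this page; it assumes only the standard `squeue` format fields `%u` (user), `%q` (QOS) and `%T` (state), and the helper name `count_by_column` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines of the form "user qos state" on standard
# input and print how many lines share each value of the chosen column
# (1 = user, 2 = QOS, 3 = state), sorted by value.
count_by_column() {
    awk -v col="$1" '{ n[$col]++ } END { for (k in n) printf "%s: %d\n", k, n[k] }' | sort
}

# Only query the scheduler when squeue is available on this machine.
if command -v squeue >/dev/null 2>&1; then
    echo "JOBS PER STATE:"
    squeue -h -o "%u %q %T" | count_by_column 3
fi
```

Passing `1` or `2` instead of `3` gives the per-user or per-QOS counts from the same `squeue` output.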
 Common job states: