===== Service description =====

==== Servers with free access GPUs ====

  * ''ctgpgpu4'':
      * PowerEdge R730
      * 128 GB RAM (4 DDR4 DIMM  2400MHz)
      * 2 x Nvidia GP102GL 24GB [Tesla P40]
      * AlmaLinux 9.1
          * CUDA 12.0
          * **Mandatory use of the Slurm queue manager**

  * HPC cluster servers: [[ en:centro:servizos:hpc | HPC cluster ]]
  * CESGA servers: [[ en:centro:servizos:cesga | Access procedure info ]]

==== Restricted access GPU servers ====

  * ''ctgpgpu5'':
      * PowerEdge R730
      * 2 x [[https://ark.intel.com/products/92980/Intel-Xeon-Processor-E5-2623-v4-10M-Cache-2_60-GHz|Intel Xeon E52623v4]]
      * 128 GB RAM (4 DDR4 DIMM  2400MHz)
      * 2 x Nvidia GP102GL 24GB [Tesla P40]
      * Ubuntu 18.04
          * **Mandatory use of the Slurm queue manager**
          * **Modules for library version management**
          * CUDA 11.0
          * OpenCV 2.4 and 3.4
          * Atlas 3.10.3
          * TensorFlow
          * Caffe
  * ''ctgpgpu6'':
      * SIE LADON 4214 server
      * 2 x [[https://ark.intel.com/content/www/us/en/ark/products/193385/intel-xeon-silver-4214-processor-16-5m-cache-2-20-ghz.html|Intel Xeon Silver 4214]]
      * 192 GB RAM (12 DDR4 DIMM 2933MHz)
      * Nvidia Quadro P6000 24GB (2018)
      * Nvidia Quadro RTX8000 48GB (2019)
      * Centos 7.7 operating system
          * Nvidia driver 418.87.00 for CUDA 10.1
          * Docker 19.03
          * [[https://github.com/NVIDIA/nvidia-docker | Nvidia-docker ]] (see the container sketch after this list)
  * ''ctgpgpu9'':
      * Dell PowerEdge R750
      * 2 x [[ https://ark.intel.com/content/www/es/es/ark/products/215274/intel-xeon-gold-6326-processor-24m-cache-2-90-ghz.html |Intel Xeon Gold 6326 ]]
      * 128 GB RAM
      * 2 x NVIDIA Ampere A100 80 GB
      * AlmaLinux 8.6 operating system
          * NVIDIA driver 515.48.07 and CUDA 11.7
  * ''ctgpgpu10'':
      * PowerEdge R750
      * 2 x [[ https://ark.intel.com/content/www/es/es/ark/products/215272/intel-xeon-gold-5317-processor-18m-cache-3-00-ghz.html |Intel Xeon Gold 5317 ]]
      * 128 GB RAM
      * NVIDIA Ampere A100 80 GB
      * AlmaLinux 8.7 operating system
          * NVIDIA driver 525.60.13 and CUDA 12.0
  * ''ctgpgpu11'':
      * Gigabyte G482-Z54 server
      * 2 x [[ https://www.amd.com/es/products/cpu/amd-epyc-7413 | AMD EPYC 7413 @2.65 GHz 24c ]]
      * 256 GB RAM
      * 4 x NVIDIA Ampere A100 80 GB
      * AlmaLinux 9.1 operating system
          * NVIDIA driver 520.61.05 and CUDA 11.8
  * ''ctgpgpu12'':
      * Dell PowerEdge R760 server
      * 2 x [[ https://ark.intel.com/content/www/xl/es/ark/products/232376.html |Intel Xeon Silver 4410Y ]]
      * 384 GB RAM
      * 2 x NVIDIA Hopper H100 80 GB
      * AlmaLinux 9.2 operating system
          * NVIDIA driver 535.104.12 and CUDA 12.2
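
The container stack on ''ctgpgpu6'' can be used to run GPU workloads without touching the host libraries. Below is a minimal sketch, assuming your user is allowed to run Docker; ''nvidia/cuda:10.1-base'' is just a public example image matching the installed CUDA 10.1 driver:

<code bash>
# Check that the GPUs are visible from inside a container.
# Docker 19.03 exposes GPUs natively through the --gpus flag:
docker run --rm --gpus all nvidia/cuda:10.1-base nvidia-smi

# Equivalent invocation through the nvidia-docker 2.x runtime:
docker run --rm --runtime=nvidia nvidia/cuda:10.1-base nvidia-smi
</code>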
  
===== Activation =====
Not all servers are freely available: access must be requested by filling in the [[https://citius.usc.es/dashboard/enviar-incidencia|requests and problem reporting form]]. Users without access permission will receive an incorrect password error message.
  
===== User Manual =====
Use SSH. Hostnames and IP addresses are:
  
  * ctgpgpu4.inv.usc.es - 172.16.242.201:22
  * ctgpgpu5.inv.usc.es - 172.16.242.202:22
  * ctgpgpu6.inv.usc.es - 172.16.242.205:22
  * ctgpgpu9.inv.usc.es - 172.16.242.94:22
  * ctgpgpu10.inv.usc.es - 172.16.242.95:22
  * ctgpgpu11.inv.usc.es - 172.16.242.96:22
  * ctgpgpu12.inv.usc.es - 172.16.242.97:22

Connection is only possible from inside the CITIUS network. To connect from other places or from the RAI network it is necessary to use the [[https://wiki.citius.usc.es/en:centro:servizos:vpn:start | VPN]] or the [[https://wiki.citius.usc.es/en:centro:servizos:pasarela_ssh|SSH gateway]].
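
For example, a minimal connection sketch; ''youruser'' is a placeholder for your actual account name, and the gateway hostname below is also a placeholder (take the real one from the SSH gateway page):

<code bash>
# Direct connection from inside the CITIUS network:
ssh youruser@ctgpgpu9.inv.usc.es

# From outside, jump through the SSH gateway (-J needs OpenSSH 7.3 or later):
ssh -J youruser@gateway.example.org youruser@ctgpgpu9.inv.usc.es
</code>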
  
==== Job management with SLURM ====
  
On servers with queue management software installed, its use is mandatory for submitting jobs: it avoids conflicts between processes, since two jobs should not run at the same time.
  
To send a job to the queue, the command ''srun'' is used:
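
A minimal sketch, assuming the GPUs are declared in Slurm as a generic resource (gres) named ''gpu''; the node configuration can be checked with ''scontrol show node'':

<code bash>
# Run a short command through the queue, requesting one GPU:
srun --gres=gpu:1 nvidia-smi

# Longer jobs are better queued as batch jobs; "train.py" is a
# hypothetical script used only for illustration:
sbatch --gres=gpu:1 --wrap="python3 train.py"
</code>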