====== Cluster Big Data 1 ======

==== Service description ====

Cluster for experimentation and research in big data. It provides a hardware platform for Big Data jobs with a highly flexible configuration, so that users can easily carry out research and experiments on it.

==== Activation ====

To activate the service, register by filling in the [[https://citius.usc.es/dashboard/enviar-incidencia|requests and problem reporting form]]. This form is only available to CITIUS users.

==== Access ====

The cluster can only be accessed through the master node, which acts as a frontend for the cluster:

  ssh -X @master-bd1.inv.usc.es

To access the Ambari manager web interface, you need to tunnel port 8080 of ''nodo1'' over SSH. If you want to use this interface, connect with this command instead:

  ssh -L 8080:nodo1:8080 @master-bd1.inv.usc.es

==== Use ====

To manage the cluster from the Apache Ambari web interface, first connect through SSH with the option ''-L 8080:nodo1:8080'', then open http://localhost:8080/ in your browser. The user and password are ''admin/admin''.

==== Software ====

The following software can be managed from the Ambari console:

  * HDFS 2.7.3
  * YARN 2.7.3
  * Tez 0.7.0
  * Hive 1.2.1000
  * Pig 0.16.0
  * ZooKeeper 3.4.6
  * Storm 1.1.0
  * Spark 1.6.3
  * Spark2 2.2.0
  * Zeppelin Notebook 0.7.3
  * Slider 0.92.0

The following table lists other projects managed by users:

^ User ^ Software ^ Nodes ^ Notes ^
| David Luaces Cachaza | MongoDB Sharding | All | Until 12/19 |
| Cesar Piñeiro Pomar | GlusterFS | All | |
| Rodrigo Martinez Castaño | Docker | 3 and 4 | |

==== Hardware ====

16 Dell EMC PowerEdge R730 servers, each with the following configuration:

  * 2 x Intel Xeon E5-2630 v4 (2.2 GHz, 10 cores)
  * 384 GB RAM: 12 x 32 GB RDIMM 2400 MT/s
  * 32 TB HDD: 8 x 4 TB 7.2k SATA 6 Gbps in JBOD
  * 2 x 10Gb BaseT and 2 x 1Gb BaseT
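
As a rough guide for sizing jobs, the per-server hardware figures can be multiplied out to cluster-wide totals. The following Python sketch is only an illustration: the names ''NodeSpec'' and ''cluster_totals'' are hypothetical, it assumes that all 16 servers contribute their full resources, and the replication factor of 3 is the stock HDFS default, not a value confirmed for this cluster. Filesystem overhead and space reserved for other services are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-server figures taken from the hardware list."""
    sockets: int = 2             # 2 x Intel Xeon E5-2630 v4
    cores_per_socket: int = 10   # 10 cores per CPU
    ram_gb: int = 12 * 32        # 12 x 32 GB RDIMM = 384 GB
    disk_tb: int = 8 * 4         # 8 x 4 TB SATA in JBOD = 32 TB


def cluster_totals(nodes: int = 16,
                   spec: NodeSpec = NodeSpec(),
                   replication: int = 3) -> dict:
    """Aggregate raw resources over all nodes.

    `replication` is an ASSUMPTION (the HDFS default of 3); adjust it
    to the value actually configured on the cluster.
    """
    raw_disk_tb = nodes * spec.disk_tb
    return {
        "cores": nodes * spec.sockets * spec.cores_per_socket,
        "ram_gb": nodes * spec.ram_gb,
        "raw_disk_tb": raw_disk_tb,
        # Each HDFS block is stored `replication` times.
        "usable_hdfs_tb": raw_disk_tb / replication,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the cluster offers 320 physical cores, 6144 GB of RAM and 512 TB of raw disk, of which roughly a third would be usable HDFS capacity with triple replication.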