Wiki do CiTIUS

Service description

Cluster for experimentation and research in Big Data. Its goal is to provide a suitable hardware platform for Big Data jobs, with a highly flexible configuration that makes it easy to research and experiment on.

Activation

To activate the service, register by filling in the requests and problem reporting form. This form is only available to CiTIUS users.

Access

The cluster can only be accessed through the master node, which acts as its frontend:

ssh -X <citius_user>@master-bd1.inv.usc.es

To access the Ambari manager web interface, you need to tunnel port 8080 of nodo1 over SSH. If you want to use this interface, connect with this command instead:

ssh -L 8080:nodo1:8080 <citius_user>@master-bd1.inv.usc.es
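
If local port 8080 is already in use on your machine, the tunnel can be bound to a different local port. The following is a minimal sketch, not part of the official instructions: `ambari_tunnel_cmd` is a hypothetical helper that only prints the command, `alice` is a placeholder user name, and the `-N` option (open the tunnel without starting a remote shell) is an optional addition.

```shell
# Hypothetical helper (not provided by the cluster): print the ssh command
# that forwards a local port to the Ambari interface on nodo1:8080.
#   $1 = CiTIUS user name
#   $2 = local port to listen on (defaults to 8080)
ambari_tunnel_cmd() {
    printf 'ssh -N -L %s:nodo1:8080 %s@master-bd1.inv.usc.es\n' "${2:-8080}" "$1"
}

# Example: print the command for the placeholder user "alice" on local port 9090.
ambari_tunnel_cmd alice 9090
```

Running the printed command keeps the tunnel open until it is interrupted. With local port 9090, the interface would then be reachable at http://localhost:9090/ instead of port 8080.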

Use

To manage the cluster from the Apache Ambari web interface, first connect through SSH with the option -L 8080:nodo1:8080, then open http://localhost:8080/ in a browser on your local machine.

The user name and password are admin/admin.
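
Besides the web interface, Ambari also exposes a REST API under /api/v1 on the same port, so the same tunnel can be used from the command line. The following is a minimal sketch, assuming the tunnel on local port 8080 is open and curl is installed locally. `ambari_url`, `ambari_get` and the `AMBARI_BASE` variable are hypothetical names introduced only for this example.

```shell
# Base URL of the tunnelled Ambari server (assumption: local port 8080).
AMBARI_BASE="${AMBARI_BASE:-http://localhost:8080}"

# Build the URL of a resource under Ambari's REST API root (/api/v1).
# A leading slash in the argument is dropped to avoid a double slash.
ambari_url() {
    printf '%s/api/v1/%s\n' "$AMBARI_BASE" "${1#/}"
}

# Fetch a resource as JSON using HTTP basic authentication
# with the credentials given above.
ambari_get() {
    curl --silent --show-error --user admin:admin "$(ambari_url "$1")"
}

# Usage (requires the SSH tunnel to be open):
#   ambari_get clusters
```

The `clusters` resource is the usual entry point of the API; the cluster name it returns is not stated on this page, so further paths have to be taken from that response.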

Software

The following software can be managed from the Ambari console:

HDFS 2.7.3
YARN 2.7.3
Tez 0.7.0
Hive 1.2.1000
Pig 0.16.0
ZooKeeper 3.4.6
Storm 1.1.0
Spark 1.6.3
Spark2 2.2.0
Zeppelin Notebook 0.7.3
Slider 0.92.0

Other projects managed by users are listed in the following table:

User	Software	Nodes	Notes
David Luaces Cachaza	MongoDB Sharding	All	Until 12/19
Cesar Piñeiro Pomar	GlusterFS	All
Rodrigo Martinez Castaño	Docker	3 and 4

Hardware

16 Dell EMC PowerEdge R730 servers, each one with the following configuration:

2 x Intel Xeon E5-2630 v4 (2.2 GHz, 10 cores)
384 GB of RAM: 12 x 32 GB RDIMM, 2400 MT/s
32 TB HDD: 8 x 4 TB 7.2k SATA 6 Gbps in JBOD
2 x 10Gb BaseT and 2 x 1Gb BaseT
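
The per-server figures above can be multiplied out to estimate the raw totals of the cluster. This is a simple arithmetic sketch with illustrative variable names. It counts physical cores and raw disk space only, so the capacity actually usable in HDFS will be lower and depends on the replication factor, which is not stated here.

```shell
# Per-server figures taken from the hardware list above.
nodes=16
cores_per_node=$((2 * 10))     # 2 CPUs x 10 cores each
ram_gb_per_node=$((12 * 32))   # 12 DIMMs x 32 GB
disk_tb_per_node=$((8 * 4))    # 8 disks x 4 TB

# Raw cluster totals.
total_cores=$((nodes * cores_per_node))
total_ram_gb=$((nodes * ram_gb_per_node))
total_disk_tb=$((nodes * disk_tb_per_node))

echo "cores: $total_cores, RAM: $total_ram_gb GB, raw disk: $total_disk_tb TB"
# prints "cores: 320, RAM: 6144 GB, raw disk: 512 TB"
```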

Cluster Big Data 1