Amazon Elastic MapReduce
Amazon Elastic MapReduce (Amazon EMR) is a web service for configuring and deploying a cluster built from machine instances in the Amazon Elastic Compute Cloud (Amazon EC2) service and managed with Hadoop. Other popular distributed frameworks, such as Spark, can also be run on Amazon EMR, and it can interact with data held in other data stores, such as Amazon S3.
Creating a cluster with EMR
Storage with S3
Cluster configuration
Logs
Spark on EMR
Installing Spark
Running a job
- Step type: Custom JAR
- JAR Location:
s3://<CLUSTER_REGION>.elasticmapreduce/libs/script-runner/script-runner.jar
- Arguments:
/home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster --class <MAIN_CLASS> s3://<BUCKET>/<FILE_JAR> <JAR_OPTIONS>
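The step fields above can also be assembled for the AWS CLI rather than typed into the console. The following is a minimal sketch, assuming the `aws emr add-steps` shorthand syntax; the function name `build_spark_step`, the step name `SparkJob`, the cluster id `j-XXXXXXXX`, the region, the main class and the bucket path are all hypothetical values chosen for illustration. The command is only printed, not executed.

```shell
#!/bin/bash
# Build the --steps shorthand for `aws emr add-steps` from the fields above.
# region, main_class and jar_uri are placeholders supplied by the caller.
build_spark_step() {
  local region="$1" main_class="$2" jar_uri="$3"
  local runner="s3://${region}.elasticmapreduce/libs/script-runner/script-runner.jar"
  # In the shorthand syntax each spark-submit argument is one comma-separated item.
  local args="/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--class,${main_class},${jar_uri}"
  printf 'Type=CUSTOM_JAR,Name=SparkJob,ActionOnFailure=CONTINUE,Jar=%s,Args=[%s]\n' "$runner" "$args"
}

# Print the command instead of running it, so nothing is submitted by accident.
# j-XXXXXXXX, com.example.Main and the bucket path are hypothetical values.
echo aws emr add-steps --cluster-id j-XXXXXXXX \
  --steps "$(build_spark_step eu-west-1 com.example.Main s3://my-bucket/app.jar)"
```

Removing the `echo` would submit the step to the cluster. Any <JAR_OPTIONS> would be appended to the argument list as further comma-separated items.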
Java 8 on EMR
# Check java version (for example "1.7.0_71" becomes 17)
JAVA_VER=$(java -version 2>&1 | sed 's/java version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')

if [ "$JAVA_VER" -lt 18 ]
then
    # Download jdk 8
    echo "Downloading and installing jdk 8"
    wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8-b132/jdk-8-linux-x64.rpm"

    # Silent install
    sudo yum -y install jdk-8-linux-x64.rpm

    # Figure out how many versions of Java we currently have
    NR_OF_OPTIONS=$(echo 0 | alternatives --config java 2>/dev/null | grep 'There ' | awk '{print $3}' | tail -1)
    echo "Found $NR_OF_OPTIONS existing versions of java. Adding new version."

    # Make the new java version available via /etc/alternatives
    sudo alternatives --install /usr/bin/java java /usr/java/default/bin/java 1

    # Make java 8 the default
    echo $(($NR_OF_OPTIONS + 1)) | sudo alternatives --config java

    # Set some variables (JAVA_HOME points at the JDK root, not at the java binary)
    export JAVA_HOME=/usr/java/default
    export JRE_HOME=/usr/java/default/jre
    export PATH=$PATH:/usr/java/default/bin
fi

# Check java version again
JAVA_VER=$(java -version 2>&1 | sed 's/java version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
echo "Java version is $JAVA_VER!"
echo "JAVA_HOME: $JAVA_HOME"
echo "JRE_HOME: $JRE_HOME"
echo "PATH: $PATH"
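The version check in the script depends on how the `sed` expression collapses the first line of `java -version` into a two-digit number. The sketch below isolates that expression in a helper, hypothetically named `parse_java_ver`, and feeds it sample version strings instead of calling a real JVM. It assumes the Oracle-style `java version "1.x.y_zz"` banner. An OpenJDK banner starts with `openjdk version`, so the pattern would not match and the numeric comparison in the script would fail.

```shell
#!/bin/bash
# Same sed expression as in the script: keep the first two version
# components, drop the dot between them, and stop after the first line.
parse_java_ver() {
  sed 's/java version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q'
}

# Sample banners (illustrative strings, not output captured from a cluster).
echo 'java version "1.7.0_71"' | parse_java_ver   # prints 17
echo 'java version "1.8.0_05"' | parse_java_ver   # prints 18
```

With these values, `17 -lt 18` is true and triggers the JDK 8 installation, while `18 -lt 18` is false and skips it.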