Amazon Elastic MapReduce

Amazon Elastic MapReduce (Amazon EMR) is a web service for configuring and deploying a cluster of machine instances running on Amazon Elastic Compute Cloud (Amazon EC2) and managed with Hadoop. Amazon EMR can also run other popular distributed frameworks, such as Spark, and interact with data in other data stores, such as Amazon S3.
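As a rough illustration of the deployment described above, the following is a minimal sketch of creating such a cluster with the AWS CLI command aws emr create-cluster, asking for Hadoop and Spark and writing the cluster logs to S3. It assumes the AWS CLI is installed and configured and that the EMR default roles already exist (aws emr create-default-roles creates them). The cluster name spark-cluster, the release label emr-5.36.0, the instance type m5.xlarge, the instance count and the bucket my-bucket are hypothetical placeholders. By default the script only prints the command (DRY_RUN=1); set DRY_RUN=0 to actually create, and pay for, the cluster.

```shell
#!/usr/bin/env bash
# Sketch: create an EMR cluster with Hadoop and Spark, logging to S3.
# All defaults below are hypothetical placeholders; override them via
# environment variables (e.g. LOG_BUCKET=your-bucket ./create-cluster.sh).
set -euo pipefail

CLUSTER_NAME="${CLUSTER_NAME:-spark-cluster}"
RELEASE_LABEL="${RELEASE_LABEL:-emr-5.36.0}"
INSTANCE_TYPE="${INSTANCE_TYPE:-m5.xlarge}"
INSTANCE_COUNT="${INSTANCE_COUNT:-3}"
LOG_BUCKET="${LOG_BUCKET:-my-bucket}"

# Fill the global array CMD with the full create-cluster invocation.
build_create_cluster_cmd() {
  CMD=(aws emr create-cluster
    --name "$CLUSTER_NAME"
    --release-label "$RELEASE_LABEL"
    --applications Name=Hadoop Name=Spark
    --instance-type "$INSTANCE_TYPE"
    --instance-count "$INSTANCE_COUNT"
    --use-default-roles
    --log-uri "s3://${LOG_BUCKET}/logs/")
}

build_create_cluster_cmd
if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: print the command instead of creating a cluster.
  echo "${CMD[*]}"
else
  "${CMD[@]}"
fi
```

Clusters created from a release label install Spark as an application, so the location of spark-submit on them may differ from the /home/hadoop/spark/bin/spark-submit path used on this page; check it before reusing that path.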

Creating a cluster with EMR

Storage with S3

Cluster configuration

Logs

Spark on EMR

Installing Spark

Running a job

Add a step to the cluster with the following settings:

  • Step type: Custom JAR
  • JAR Location:
    s3://<CLUSTER_REGION>.elasticmapreduce/libs/script-runner/script-runner.jar
  • Arguments:
    /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster --class <MAIN_CLASS> s3://<BUCKET>/<FILE_JAR> <JAR_OPTIONS>
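The step settings above can also be submitted from the command line with aws emr add-steps, where a Custom JAR step is written in the CLI shorthand Type=CUSTOM_JAR,Name=...,Jar=...,Args=[...]. This is a minimal sketch, assuming the AWS CLI is installed and configured. The region us-east-1, the cluster ID j-XXXXXXXXXXXXX, the bucket my-bucket, the jar my-job.jar, the class com.example.Main and the step name SparkJob are hypothetical stand-ins for the <...> placeholders; any extra arguments given to the script become <JAR_OPTIONS>. By default it only prints the command; set DRY_RUN=0 to submit the step.

```shell
#!/usr/bin/env bash
# Sketch: submit a Spark job as an EMR "Custom JAR" step via script-runner.jar.
# Every default below is a hypothetical placeholder; override via env vars.
set -euo pipefail

CLUSTER_REGION="${CLUSTER_REGION:-us-east-1}"
CLUSTER_ID="${CLUSTER_ID:-j-XXXXXXXXXXXXX}"
BUCKET="${BUCKET:-my-bucket}"
FILE_JAR="${FILE_JAR:-my-job.jar}"
MAIN_CLASS="${MAIN_CLASS:-com.example.Main}"

# Join all arguments with commas, as the shorthand Args=[a,b,c] expects.
join_by_comma() {
  local IFS=','
  echo "$*"
}

# Build the value for --steps; extra arguments are appended as <JAR_OPTIONS>.
build_step() {
  local jar="s3://${CLUSTER_REGION}.elasticmapreduce/libs/script-runner/script-runner.jar"
  local args
  args=$(join_by_comma /home/hadoop/spark/bin/spark-submit \
    --deploy-mode cluster --master yarn-cluster \
    --class "$MAIN_CLASS" "s3://${BUCKET}/${FILE_JAR}" "$@")
  echo "Type=CUSTOM_JAR,Name=SparkJob,ActionOnFailure=CONTINUE,Jar=${jar},Args=[${args}]"
}

STEP=$(build_step "$@")
if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: only print the command so it can be reviewed first.
  echo aws emr add-steps --cluster-id "$CLUSTER_ID" --steps "$STEP"
else
  aws emr add-steps --cluster-id "$CLUSTER_ID" --steps "$STEP"
fi
```

Because the shorthand separates arguments with commas, an argument that itself contains a comma would need the JSON form of --steps (for example --steps file://steps.json) instead.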

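Java 8 on EMR

The following script checks the installed Java version and, if it is older than 1.8, downloads Oracle JDK 8, installs it with yum and makes it the default java command through the alternatives system.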

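#!/bin/bash
# Check java version: "1.7.0_71" becomes 17, "1.8.0_131" becomes 18
# (matches both "java version" and "openjdk version" output)
JAVA_VER=$(java -version 2>&1 | sed 's/.*version "\([0-9]*\)\.\([0-9]*\)\..*/\1\2/; 1q')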
 
if [ "$JAVA_VER" -lt 18 ]
then
    # Download jdk 8
    echo "Downloading and installing jdk 8"
    wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8-b132/jdk-8-linux-x64.rpm"
 
    # Silent install
    sudo yum -y install jdk-8-linux-x64.rpm
 
    # Figure out how many versions of Java we currently have
    NR_OF_OPTIONS=$(echo 0 | alternatives --config java 2>/dev/null | grep 'There ' | awk '{print $3}' | tail -1)
 
    echo "Found $NR_OF_OPTIONS existing versions of java. Adding new version."
 
    # Make the new java version available via /etc/alternatives
    sudo alternatives --install /usr/bin/java java /usr/java/default/bin/java 1
 
    # Make java 8 the default
    echo $(($NR_OF_OPTIONS + 1)) | sudo alternatives --config java
 
    # Set some variables (these exports only last for this script's process and its children)
    export JAVA_HOME=/usr/java/default
    export JRE_HOME=/usr/java/default/jre
    export PATH=/usr/java/default/bin:$PATH
fi
 
# Check java version again
JAVA_VER=$(java -version 2>&1 | sed 's/.*version "\([0-9]*\)\.\([0-9]*\)\..*/\1\2/; 1q')
 
echo "Java version is $JAVA_VER!"
echo "JAVA_HOME: $JAVA_HOME"
echo "JRE_HOME: $JRE_HOME"
echo "PATH: $PATH"
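The version test in the Java 8 script works by collapsing the version string into a number built from its first two components and comparing it with 18. A minimal sketch isolating that step in a hypothetical helper, java_ver_code, makes the mapping visible. It uses a sed expression that accepts both "java version" and "openjdk version" lines, which is an assumption about the output your JVM prints.

```shell
#!/usr/bin/env bash
# Sketch: isolate the version parsing used by the Java 8 install script.
# java_ver_code is a hypothetical helper name, not part of any tool.
java_ver_code() {
  # "1.7.0_71" -> 17, "1.8.0_191" -> 18, "11.0.2" -> 110
  printf '%s\n' "$1" | sed 's/.*version "\([0-9]*\)\.\([0-9]*\)\..*/\1\2/'
}

for line in 'java version "1.7.0_71"' 'openjdk version "1.8.0_191"'; do
  code=$(java_ver_code "$line")
  if [ "$code" -lt 18 ]; then
    echo "$line -> $code: older than Java 8, install JDK 8"
  else
    echo "$line -> $code: Java 8 or newer, nothing to do"
  fi
done
```

A version string with only a major number, such as "17", does not match the pattern, so the line is returned unchanged and the numeric -lt comparison would fail for it.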