Amazon Elastic MapReduce


Amazon Elastic MapReduce (Amazon EMR) is a web service for configuring and deploying a cluster that is built from machine instances in Amazon Elastic Compute Cloud (Amazon EC2) and managed with Hadoop. Other popular distributed frameworks, such as Spark, can also run on Amazon EMR and can interact with data held in other data stores, such as Amazon S3.

Creating a cluster with EMR

Storage with S3

Cluster configuration

Spark on EMR

Installing Spark

Running a job

  • Step type: Custom JAR
  • JAR Location:
    s3://<CLUSTER_REGION>.elasticmapreduce/libs/script-runner/script-runner.jar
  • Arguments:
    /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster --class <MAIN_CLASS> s3://<BUCKET>/<FILE_JAR> <JAR_OPTIONS>
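
The same step can also be submitted from the command line rather than through the console form. The following is a minimal sketch, assuming the AWS CLI is installed and configured. `build_spark_step` is a hypothetical helper, and the region, class name, bucket and step name are placeholder values that do not come from this page. It only assembles the shorthand step definition; the actual `aws emr add-steps` call is left commented out.

```shell
#!/bin/bash
# Hypothetical helper: builds the shorthand step definition for `aws emr add-steps`.
# Arguments: region, main class, S3 URI of the JAR, then any options for the JAR.
build_spark_step() {
    local region="$1" main_class="$2" jar_uri="$3"
    shift 3
    # Same command line as in the "Arguments" field, comma-separated.
    local args="/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--class,${main_class},${jar_uri}"
    local opt
    for opt in "$@"; do
        args="${args},${opt}"
    done
    echo "Type=CUSTOM_JAR,Name=SparkJob,ActionOnFailure=CONTINUE,Jar=s3://${region}.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[${args}]"
}

# Placeholder values, for illustration only.
STEP=$(build_spark_step "eu-west-1" "org.example.Main" "s3://my-bucket/app.jar" "--input" "s3://my-bucket/in")
echo "$STEP"

# To submit the step, set CLUSTER_ID to a real cluster id and uncomment:
# aws emr add-steps --cluster-id "$CLUSTER_ID" --steps "$STEP"
```

Printing the definition first makes it easy to check the JAR location and the argument order before anything is sent to the cluster.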

Java 8 on EMR

# Check java version
JAVA_VER=$(java -version 2>&1 | sed 's/java version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
 
if [ "$JAVA_VER" -lt 18 ]
then
    # Download jdk 8
    echo "Downloading and installing jdk 8"
    wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8-b132/jdk-8-linux-x64.rpm"
 
    # Silent install
    sudo yum -y install jdk-8-linux-x64.rpm
 
    # Figure out how many versions of Java we currently have
    NR_OF_OPTIONS=$(echo 0 | alternatives --config java 2>/dev/null | grep 'There ' | awk '{print $3}' | tail -1)
 
    echo "Found $NR_OF_OPTIONS existing versions of java. Adding new version."
 
    # Make the new java version available via /etc/alternatives
    sudo alternatives --install /usr/bin/java java /usr/java/default/bin/java 1
 
    # Make java 8 the default
    echo $(($NR_OF_OPTIONS + 1)) | sudo alternatives --config java
 
    # Set some variables (JAVA_HOME must be the JDK directory, not the java binary)
    export JAVA_HOME=/usr/java/default
    export JRE_HOME=/usr/java/default/jre
    export PATH=$PATH:/usr/java/default/bin
fi
 
# Check java version again
JAVA_VER=$(java -version 2>&1 | sed 's/java version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
 
echo "Java version is $JAVA_VER!"
echo "JAVA_HOME: $JAVA_HOME"
echo "JRE_HOME: $JRE_HOME"
echo "PATH: $PATH"
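
The version check at the top of the script can be tried in isolation. The sketch below reuses the script's sed expression on two sample banners. `parse_java_ver` and `needs_java8` are hypothetical helper names, and the banner strings are made up for illustration. It assumes the first line of `java -version` starts with `java version`.

```shell
# Hypothetical helper: reduce the first line of a `java -version` banner to a
# two-digit number (1.7.x becomes 17), using the same sed expression as the script.
parse_java_ver() {
    sed 's/java version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q'
}

# Hypothetical helper: succeeds when the parsed version is older than Java 8 (18).
needs_java8() {
    [ "$1" -lt 18 ]
}

# Sample banners, made up for illustration.
OLD=$(echo 'java version "1.7.0_71"' | parse_java_ver)
NEW=$(echo 'java version "1.8.0"' | parse_java_ver)

if needs_java8 "$OLD"; then echo "$OLD: would install JDK 8"; fi
if ! needs_java8 "$NEW"; then echo "$NEW: already Java 8 or newer"; fi
```

A banner that starts with something else, such as `openjdk version`, does not match the expression, so sed passes the line through unchanged and the numeric comparison is handed a non-numeric value. In that case the install branch is skipped.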