{"id":424,"date":"2017-08-17T17:30:23","date_gmt":"2017-08-17T15:30:23","guid":{"rendered":"http:\/\/whizzkit.nl\/?p=424"},"modified":"2019-05-15T15:31:19","modified_gmt":"2019-05-15T13:31:19","slug":"installing-sap-spark-controller-2-2-0-on-mapr-mep-3-0-1","status":"publish","type":"post","link":"https:\/\/whizzkit.nl\/?p=424","title":{"rendered":"Installing SAP Spark Controller 2.2.0 on MapR MEP 3.0.1"},"content":{"rendered":"<p><script src=\"https:\/\/cdn.rawgit.com\/google\/code-prettify\/master\/loader\/run_prettify.js?skin=desert\"><\/script><\/p>\n<style>\npre {\nbackground-color:#000000 !important;\ncolor:white;\nborder:1px solid #a0b842 !important;\nborder-radius:7px;\nfont-size:10px;\n}\nstrong {\nmargin-top:10px;\n}\n<\/style>\n<p>&copy; Notice: I copied this post&#8217;s image from one of SAP&#8217;s Bob tutorials!<br \/>\n<strong>The Challenge<\/strong><br \/>\nAt one of my customers we need to install the SAP Spark Controller to be able to connect our MapR cluster to HANA. If you arrived at this post, you will know why this is a great feature for a lot of use cases. But it also lead to a small challenge which may be clear from the URL&#8217;s below:<\/p>\n<ul>\n<li><a href=\"http:\/\/maprdocs.mapr.com\/home\/EcosystemRN\/MEP3.0PkgNames.html\" target=\"_blank\">http:\/\/maprdocs.mapr.com\/home\/EcosystemRN\/MEP3.0PkgNames.html<\/a><\/li>\n<li><a href=\"https:\/\/uacp2.hana.ondemand.com\/viewer\/3b120d20f0b54c91a3be763d387e61ac\/2.0.0.0\/en-US\/\" target=\"_blank\">https:\/\/uacp2.hana.ondemand.com\/viewer\/3b120d20f0b54c91a3be763d387e61ac\/2.0.0.0\/en-US\/<\/a><\/li>\n<\/ul>\n<p>In short: the 2.2.0 Spark Controller needs Spark 1.6 but MapR MEP3 provides Spark 2.1. Trust me, I have tried to get it running but due to the absence of the Spark assembly in 2.x versions and the absence of Akka stuff I think it&#8217;s nearly impossible to get it running unless SAP does a recompile. 
Luckily, to YARN a Spark job is &#8216;just another job&#8217;, as long as you provide the Spark assembly when submitting. Because of this, you can still set up the Spark Controller without having to hack too much or leaving the cluster messy.<\/p>\n<p><strong>A few words of warning and a prerequisite<\/strong><\/p>\n<ul>\n<li>Because I like to keep installations minimal, I sometimes cut corners or use a different approach than stated in the fine manuals. This may lead to situations that are unsupported by one vendor or the other. This goes in particular for the installation of the Spark Controller, which I didn&#8217;t install with the rpm command; instead I extracted the rpm and copied the files over. I did this to be able to easily switch between Spark Controller versions. In a production environment, nothing stops you from using the normal rpm installation approach<\/li>\n<li>With regards to MapR, I downloaded one rpm from their repository to extract <i>just<\/i> the jar file you need, no more. This prevents a full Spark installation from hanging around on your node(s)<\/li>\n<li>You will need a MapR 5.2 based cluster with MEP 3.0.1 installed. I will not cover that installation<\/li>\n<\/ul>\n<p><strong>Actions to be performed as root<\/strong><br \/>\nI upgraded my old MEP 1.1 to MEP 3.0.1 using yum -y update (after changing \/etc\/yum.repos.d\/mapr_ecosystem.repo to point its base url at http:\/\/package.mapr.com\/releases\/MEP\/MEP-3.0.1\/redhat), but this upgrade left my YARN installation unusable. To fix it, I had to restore the permissions on the container-executor executable as follows:<\/p>\n<pre>\r\nchmod -R 6050 \/opt\/mapr\/hadoop\/hadoop-2.7.0\/bin\/container-executor \r\nchown root.mapr \/opt\/mapr\/hadoop\/hadoop-2.7.0\/bin\/container-executor \r\n<\/pre>\n<p>From this point on it&#8217;s a straightforward install from the zip file downloaded from the SAP download site. 
Most of the comments will be in the code blocks.<\/p>\n<pre>\r\n# Add the hanaes user and sapsys group\r\n# Note that it MUST be hanaes because it seems to be hardcoded in the jars\r\ngroupadd sapsys\r\nuseradd -g sapsys -d \/home\/hanaes -s \/bin\/bash hanaes\r\n\r\n# Create some directories we need. These are the defaults when you use rpm to install the controller\r\n# You can change all of these, but then you'll have to configure much more later on\r\nmkdir -p \/var\/log\/hanaes\/ \/var\/run\/hanaes\/ \/usr\/sap\/spark\/\r\n\r\n# Extract the installation rpm from the 2.2.0 Spark Controller zip\r\nunzip HANASPARKCTRL02_0-70002101.zip sap.hana.spark.controller-2.2.0-1.noarch.rpm\r\nrpm2cpio sap.hana.spark.controller-2.2.0-1.noarch.rpm|cpio -idmv\r\n\r\n# Now remove a possibly existing installation (be careful if you have an install present!)\r\nrm -rf \/usr\/sap\/spark\/controller\r\n# Move the extracted files to the target directory\r\nmv usr\/sap\/spark\/controller\/ \/usr\/sap\/spark\/\r\n\r\n# I remove this template because I do not want it interfering in any way with the installation\r\nrm -f \/usr\/sap\/spark\/controller\/conf\/hanaes-template.xml\r\n\r\n# Now download the 'old' mapr-spark rpm package and extract the assembly jar we need\r\nwget http:\/\/archive.mapr.com\/releases\/MEP\/MEP-1.1\/redhat\/mapr-spark-1.6.1.201707241448-1.noarch.rpm\r\nmkdir -p .\/opt\/mapr\/spark\/spark-1.6.1\/lib\/\r\n# The command below leaves us with the assembly in .\/opt\/mapr\/spark\/spark-1.6.1\/lib\/\r\nrpm2cpio mapr-spark-1.6.1.201707241448-1.noarch.rpm| cpio -icv \"*spark-assembly-1.6.1-mapr-1707-hadoop2.7.0-mapr-1602.jar\"\r\n\r\n# Remove possibly present old jars from Hive or Spark.\r\n# We need fresh copies from the \/opt\/mapr\/hive directory and the extracted rpm file\r\nrm -f \/usr\/sap\/spark\/controller\/lib\/spark-assembly*jar \/usr\/sap\/spark\/controller\/lib\/datanucleus-*.jar \/usr\/sap\/spark\/controller\/lib\/bonecp-*.jar\r\n\r\n# Copy needed 
files to the lib dir of the Spark Controller installation\r\n# Because the controller uses Hive as the metadata provider, these libs are needed to access the Hive Thrift Server \r\n# (as per the Spark Controller installation instructions)\r\ncp \/opt\/mapr\/hive\/hive-2.1\/lib\/datanucleus-*.jar \/opt\/mapr\/hive\/hive-2.1\/lib\/bonecp-*.jar .\/opt\/mapr\/spark\/spark-1.6.1\/lib\/spark-assembly-1.6.1-mapr-1707-hadoop2.7.0-mapr-1602.jar \/usr\/sap\/spark\/controller\/lib\/\r\n\r\n# Set ownership of the files needed by hanaes\r\nchown -R hanaes.sapsys \/var\/log\/hanaes\/ \/var\/run\/hanaes\/ \/usr\/sap\/spark\r\n\r\n# Create a directory on the distributed filesystem for the hanaes user, mostly to stage the application to YARN\r\nhadoop fs -mkdir -p \/user\/hanaes\r\nhadoop fs -chown hanaes:sapsys \/user\/hanaes\r\n\r\n# Now allow the hanaes user to actually submit an app to YARN. Add the properties below:\r\nvi \/opt\/mapr\/hadoop\/hadoop-2.7.0\/etc\/hadoop\/core-site.xml\r\n  &#x3C;property&#x3E;\r\n    &#x3C;name&#x3E;hadoop.proxyuser.hanaes.hosts&#x3C;\/name&#x3E;\r\n    &#x3C;value&#x3E;*&#x3C;\/value&#x3E;\r\n  &#x3C;\/property&#x3E;\r\n  &#x3C;property&#x3E;\r\n    &#x3C;name&#x3E;hadoop.proxyuser.hanaes.groups&#x3C;\/name&#x3E;\r\n    &#x3C;value&#x3E;*&#x3C;\/value&#x3E;\r\n  &#x3C;\/property&#x3E;\r\n\r\n# You might need to tweak values like yarn.nodemanager.resource.memory-mb, but I'll leave that to the reader.\r\n\r\n# Restart the nodemanager and resourcemanager services to apply the change above\r\nmaprcli node services resourcemanager restart -nodes &lt;your cluster nodes with resource managers&gt;\r\nmaprcli node services nodemanager restart -nodes &lt;your cluster nodes with node managers&gt;\r\n<\/pre>\n<p><strong>Actions to be performed as the hanaes user<\/strong><br \/>\nOnce the root part is done, the hanaes user can configure the controller and start it. The most important part is the configuration. 
Pay special attention to the location of the Java home directory stored in the JAVA_HOME variable; it will almost certainly be different on your install.<\/p>\n<pre>\r\n# Become hanaes\r\nsu - hanaes\r\n\r\n# Configure the Spark Controller\r\ncd \/usr\/sap\/spark\/controller\/conf\/\r\n\r\n# Contents of log4j.properties. This may be too chatty in production; you can restrict the rootCategory from INFO to WARN or ERROR\r\n# The INFO level on the spark.sql packages will log the SQL requested and executed, which may come in handy when looking for errors.\r\n\r\nlog4j.rootCategory=INFO, console\r\nlog4j.appender.console=org.apache.log4j.ConsoleAppender\r\nlog4j.appender.console.target=System.err\r\nlog4j.appender.console.layout=org.apache.log4j.PatternLayout\r\nlog4j.appender.console.layout.ConversionPattern=%d{yy\/MM\/dd HH:mm:ss} %p %c{1}: %m%n\r\nlog4j.logger.com.sap=INFO\r\nlog4j.logger.org.apache.spark.sql.hana=INFO\r\nlog4j.logger.org.apache.spark.sql.hive.hana=INFO\r\n\r\n# Contents of hana_hadoop-env.sh. These are the bare minimum lines you will need to get started on MapR\r\n\r\n#!\/bin\/bash\r\nexport JAVA_HOME=\/usr\/lib\/jvm\/java-1.8.0-openjdk-1.8.0.141-1.b16.el7_3.x86_64\/\r\nexport HANA_SPARK_ASSEMBLY_JAR=\/usr\/sap\/spark\/controller\/lib\/spark-assembly-1.6.1-mapr-1707-hadoop2.7.0-mapr-1602.jar\r\nexport HADOOP_CONF_DIR=\/opt\/mapr\/hadoop\/hadoop-2.7.0\/etc\/hadoop\/\r\nexport HADOOP_CLASSPATH=`hadoop classpath`\r\nexport HIVE_CONF_DIR=\/opt\/mapr\/hive\/hive-2.1\/conf\r\n\r\n# Contents of hanaes-site.xml. 
Tweak to your needs, especially the executor instances and memory\r\n# The below is for my one-node testing VM\r\n\r\n&#x3C;?xml version=&#x22;1.0&#x22;?&#x3E;\r\n&#x3C;configuration&#x3E;\r\n  &#x3C;property&#x3E;\r\n    &#x3C;name&#x3E;sap.hana.es.server.port&#x3C;\/name&#x3E;\r\n    &#x3C;value&#x3E;7860&#x3C;\/value&#x3E;\r\n  &#x3C;\/property&#x3E;\r\n  &#x3C;property&#x3E;\r\n    &#x3C;name&#x3E;spark.executor.memory&#x3C;\/name&#x3E;\r\n    &#x3C;value&#x3E;1g&#x3C;\/value&#x3E;\r\n  &#x3C;\/property&#x3E;\r\n  &#x3C;property&#x3E;\r\n    &#x3C;name&#x3E;spark.executor.instances&#x3C;\/name&#x3E;\r\n    &#x3C;value&#x3E;1&#x3C;\/value&#x3E;\r\n  &#x3C;\/property&#x3E;\r\n  &#x3C;property&#x3E;\r\n    &#x3C;name&#x3E;spark.executor.cores&#x3C;\/name&#x3E;\r\n    &#x3C;value&#x3E;1&#x3C;\/value&#x3E;\r\n  &#x3C;\/property&#x3E;\r\n  &#x3C;property&#x3E;\r\n    &#x3C;name&#x3E;sap.hana.enable.compression&#x3C;\/name&#x3E;\r\n    &#x3C;value&#x3E;true&#x3C;\/value&#x3E;\r\n  &#x3C;\/property&#x3E;\r\n&#x3C;\/configuration&#x3E;\r\n\r\ncd ..\/bin\/\r\n\r\n# If you are on a memory-restricted node, you might want to change HANA_ES_HEAPSIZE\r\n# This had me baffled for a while with this latest Spark Controller version\r\n# The main but undescriptive error I got was something like 'network.Server did not start, check YARN logs'.\r\n# But the driver did not start at all, so there was nothing in YARN ;)\r\n\r\nvi hanaes (add the line below as the first line after the shebang line)\r\nexport HANA_ES_HEAPSIZE=2048\r\n\r\n# I think you WILL need the large amount of memory if you want to use the caching feature, but I am not sure about that.\r\n\r\n# Now start the controller and cross your fingers\r\n# I like to clear the log first when setting this up. 
In production situations this may not be handy.\r\necho > \/var\/log\/hanaes\/hana_controller.log && .\/hanaes restart\r\n\r\n# And monitor\r\ntail -f \/var\/log\/hanaes\/hana_controller.log\r\n\r\n<\/pre>\n<p><strong>The result<\/strong><br \/>\nWell, that wasn&#8217;t too hard! Spark Controller is up and running and accessible from HANA Studio.<\/p>\n<pre>\r\n[hanaes@mapr ~]$ cat \/var\/log\/hanaes\/hana_controller.log \r\n17\/08\/17 12:18:00 INFO Server: Starting Spark Controller\r\n17\/08\/17 12:18:02 INFO CommandRouter$$anon$1: Added JAR \/usr\/sap\/spark\/controller\/lib\/controller.common-2.1.1.jar at ...\r\n17\/08\/17 12:18:02 INFO CommandRouter$$anon$1: Added JAR \/usr\/sap\/spark\/controller\/lib\/spark1_6\/spark.shims_1.6.2-2.1.1.jar at ...\r\n17\/08\/17 12:18:02 INFO CommandRouter$$anon$1: Added JAR \/usr\/sap\/spark\/controller\/lib\/controller.core-2.1.1.jar at ...\r\n17\/08\/17 12:18:02 INFO ZooKeeper: Client environment:java.class.path=\/usr\/sap\/spark\/controller\/lib\/spark-assembly-1.6.1-mapr-1707-hadoop2.7.0-mapr-1602.jar: ...\r\n17\/08\/17 12:18:03 INFO Client: Uploading resource file:\/usr\/sap\/spark\/controller\/lib\/spark-assembly-1.6.1-mapr-1707-hadoop2.7.0-mapr-1602.jar -> maprfs:\/...\r\n17\/08\/17 12:18:03 INFO Client: Uploading resource file:\/usr\/sap\/spark\/controller\/lib\/spark.extension-2.1.1.jar -> maprfs:\/...\r\n17\/08\/17 12:18:03 INFO Client: Uploading resource file:\/usr\/sap\/spark\/controller\/lib\/controller.common-2.1.1.jar -> maprfs:\/...\r\n17\/08\/17 12:18:10 INFO CommandRouterDefault: Running Spark Controller on Spark 1.6.1 with Application Id application_1502967355788_0022\r\n17\/08\/17 12:18:10 INFO CommandRouterDefault: Connecting to Hive MetaStore!\r\n17\/08\/17 12:18:11 INFO HanaHiveSQLContext: Initializing execution hive, version 1.2.1\r\n17\/08\/17 12:18:17 WARN ObjectStore: Version information not found in metastore. 
hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0\r\n17\/08\/17 12:18:18 INFO HanaHiveSQLContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.\r\n17\/08\/17 12:18:18 INFO metastore: Trying to connect to metastore with URI thrift:\/\/mapr.whizzkit.nl:9083\r\n17\/08\/17 12:18:18 INFO metastore: Connected to metastore.\r\n17\/08\/17 12:18:18 INFO CommandRouterDefault: Server started\r\n\r\n[root@mapr ~]# yarn application -list\r\n17\/08\/17 15:44:58 INFO client.MapRZKBasedRMFailoverProxyProvider: Updated RM address to mapr.whizzkit.nl\/10.80.91.143:8032\r\nTotal number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1\r\n                Application-Id Application-Name                   Application-Type User   Queue       State   Final-State Progress Tracking-URL\r\napplication_1502967355788_0022 SAP HANA Spark Controller:sparksql SPARK            hanaes root.hanaes RUNNING UNDEFINED   10%      http:\/\/mapr.whizzkit.nl:4040\r\n<\/pre>\n<p>Have fun installing!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&copy; Notice: I copied this post&#8217;s image from one of SAP&#8217;s Bob tutorials! The Challenge At one of my customers we need to install the SAP Spark Controller to be able to connect our MapR cluster to HANA. 
If you arrived at this post, you will know why this is a great feature for a [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":425,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/whizzkit.nl\/index.php?rest_route=\/wp\/v2\/posts\/424"}],"collection":[{"href":"https:\/\/whizzkit.nl\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/whizzkit.nl\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/whizzkit.nl\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/whizzkit.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=424"}],"version-history":[{"count":30,"href":"https:\/\/whizzkit.nl\/index.php?rest_route=\/wp\/v2\/posts\/424\/revisions"}],"predecessor-version":[{"id":455,"href":"https:\/\/whizzkit.nl\/index.php?rest_route=\/wp\/v2\/posts\/424\/revisions\/455"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/whizzkit.nl\/index.php?rest_route=\/wp\/v2\/media\/425"}],"wp:attachment":[{"href":"https:\/\/whizzkit.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=424"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/whizzkit.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=424"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/whizzkit.nl\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=424"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}