Accessing Spark Systems

1. Find the following jars in the Spark and Hadoop installation directories and put them in the esProc external library folder, choosing the jars that match the Spark and Hadoop versions you are using. The directory containing the files of this external library is: installation directory\esProcext\lib\SparkCli. The Raqsoft core jar for this external library is SparkCli.jar.

antlr-runtime-3.4.jar

antlr4-runtime-4.5.jar

chill_2.11-0.8.0.jar

commons-cli-1.2.jar

commons-codec-1.4.jar

commons-compiler-2.7.6.jar

commons-configuration-1.6.jar

commons-lang-2.6.jar

commons-logging-1.1.3.jar

hadoop-auth-2.6.5.jar

hadoop-common-2.6.5.jar

hadoop-hdfs-2.6.5.jar

hadoop-mapreduce-client-core-2.6.5.jar

hadoop-mapreduce-client-jobclient-2.6.5.jar

hive-common-2.1.1.jar

hive-exec-2.1.1.jar

jackson-annotations-2.6.5.jar

jackson-core-2.6.5.jar

jackson-databind-2.6.5.jar

jackson-module-paranamer-2.6.5.jar

jackson-module-scala_2.11-2.6.5.jar

janino-2.7.8.jar

javax.ws.rs-api-2.0.1.jar

jersey-container-servlet-core-2.22.2.jar

jersey-server-2.22.2.jar

json4s-ast_2.11-3.2.11.jar

kryo-shaded-3.0.3.jar

log4j-1.2.17.jar

lz4-1.3.0.jar

metrics-json-3.1.2.jar

netty-all-4.0.29.Final.jar

paranamer-2.3.jar

scala-library.jar

scala-reflect-2.11.8.jar

scala-xml_2.11-1.0.2.jar

spark-catalyst_2.11-2.0.2.jar

spark-core_2.11-2.0.2.jar

spark-hive_2.11-2.0.2.jar

spark-launcher_2.11-2.0.2.jar

spark-network-common_2.11-2.0.2.jar

spark-network-shuffle_2.11-2.0.2.jar

spark-sql_2.11-2.0.2.jar

spark-unsafe_2.11-2.0.2.jar

xbean-asm5-shaded-4.4.jar

hive-cli-2.1.1.jar

hive-jdbc-2.1.1-standalone.jar

hive-metastore-1.2.1.spark2.jar

htrace-core-3.0.4.jar

jcl-over-slf4j-1.7.16.jar

Note: The third-party jars are provided within the package; users can choose the appropriate ones for their specific scenarios.
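Since the list above is long, it is easy to miss a jar. The shell sketch below defines a small helper that reports which of a given set of jars are missing from a directory; the folder path in the commented example is an assumption, not the literal installation path.

```shell
#!/bin/sh
# check_jars DIR JAR...  -- print each jar missing from DIR and
# return the number of missing jars (0 means all present).
check_jars() {
  dir="$1"
  shift
  missing=0
  for jar in "$@"; do
    if [ ! -f "$dir/$jar" ]; then
      echo "missing: $jar"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Illustrative call -- substitute your real external library folder
# and the full jar list from above:
# check_jars "/opt/esProc/extlib/SparkCli" SparkCli.jar spark-core_2.11-2.0.2.jar
```

Running the helper against the external library folder before starting esProc gives an early warning instead of a class-loading error at runtime.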

 

2. Download the following four files from the web and place them in installation directory\bin:

hadoop.dll

hadoop.lib

libwinutils.lib

winutils.exe

Note: The above files are required under Windows but not under Linux. winutils.exe is available in x86 and x64 builds; choose the one that matches your OS version.

 

3. SparkCli requires JRE 1.7 or above, while the JRE embedded in esProc is 1.6. Install a higher version and configure java_home in config.txt under installation directory\esProc\bin. If a JDK of version 1.7 or above was already selected when installing esProc, skip this step.
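A quick way to confirm that the JRE meets the minimum is to inspect its version banner. The sketch below parses a banner of the pre-Java-9 form java version "1.x..."; the banner shown is sample input only, so substitute the real output of java -version.

```shell
#!/bin/sh
# Extract the 1.x major version from a Java version banner and check
# it meets SparkCli's 1.7 minimum. The banner below is a sample;
# replace it with: banner=$(java -version 2>&1 | head -n 1)
banner='java version "1.7.0_11"'
ver=$(echo "$banner" | sed -n 's/.*"\(1\.[0-9]\).*/\1/p')
minor=${ver#1.}
if [ "$minor" -ge 7 ]; then
  echo "JRE $ver meets the SparkCli minimum"
else
  echo "JRE $ver is too old for SparkCli; install 1.7 or above"
fi
```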

 

4. Users can manually change the memory size if the default isn't large enough for their needs. Under Windows there are two ways: edit the memory settings in config.txt when esProc is started through the executable file, or edit them in the .bat file when it is started through the batch file. Under Linux, change the memory size in the .sh file.

Below is an example of changing the memory settings in config.txt under Windows:

java_home=C:\Program Files\Java\JDK1.7.0_11;esproc_port=48773;jvm_args=-Xms256m -XX:PermSize=256M -XX:MaxPermSize=512M -Xmx9783m -Duser.language=zh
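Under Linux the same JVM options appear on the java launch line of the startup .sh file. The excerpt below is a hypothetical sketch of such a line; the actual script shipped with your installation may be laid out differently. The option that matters most is -Xmx, which sets the maximum heap size:

```
# Hypothetical launch line in the esProc startup .sh file; only the
# JVM options are of interest -- keep the rest of the line as shipped.
java -Xms256m -XX:PermSize=256M -XX:MaxPermSize=512M -Xmx9783m ...
```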

 

5. On the machine where esProc is installed, edit the hosts file to add the IP address and hostname of the machine holding the Spark system. For example, if the IP address is 192.168.0.8 and the hostname is masters, the entry is:

192.168.0.8    masters
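After editing the hosts file, the entry can be verified with a shell check like the following sketch. Here masters is the example hostname from this step, and the Windows hosts path mentioned in the comment is the usual location, stated as an assumption:

```shell
#!/bin/sh
# Check that the Spark master's entry is present in the hosts file.
# HOSTS_FILE is parameterized so the same check can be pointed at the
# usual Windows location (C:\Windows\System32\drivers\etc\hosts).
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"
if grep -qE '^[0-9.]+[[:space:]]+masters([[:space:]]|$)' "$HOSTS_FILE"; then
  echo "masters entry found"
else
  echo "masters entry missing"
fi
```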

 

6. esProc provides four external library functions - spark_client(), spark_query(), spark_cursor() and spark_close() - to access Spark systems. Look them up in Help - Function reference to find their uses.