Problem:
Spark 2.4.x, as shipped with CDH 5.13.3, prevents users from obtaining a SparkContext in Anaconda Enterprise when used with Livy 0.5.0. An error similar to the following appears in the Spark logs:
ERROR repl.PythonInterpreter: Traceback (most recent call last):
  File "/data/02/yarn/nm/usercache/livy/appcache/application_1565648280838_0270/container_e183_1565648280838_0270_01_000001/tmp/7728029442563235757", line 700, in <module>
    sys.exit(main())
  File "/data/02/yarn/nm/usercache/livy/appcache/application_1565648280838_0270/container_e183_1565648280838_0270_01_000001/tmp/7728029442563235757", line 589, in main
    sc = SparkContext(jsc=jsc, gateway=gateway, conf=conf)
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 121, in __init__
ValueError: You are trying to pass an insecure Py4j gateway to Spark. This is not allowed as it is a security risk.
This version of Spark appears to require a token for the Py4J gateway, but it is not clear where that token is created or how it should be passed. The docs suggest setting:
PYSPARK_ALLOW_INSECURE_GATEWAY=1
in CDH, but this seems to have no effect. By contrast, Spark 1.6 and 2.3.x work fine with the same version of Livy.
Workaround:
Install Livy 0.6.0, which appears to handle the token issue out of the box and requires no additional configuration. The downside is that Livy 0.6.0 does not support Spark 1.6. That should not be a problem for most deployments, but it could be for organizations that need to switch between Spark versions.
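To confirm that the upgrade resolved the problem, one option is to create a PySpark session through the Livy REST API and watch its state. The following is a minimal sketch, not a tested procedure: the host name in LIVY_URL is a placeholder, the helper names (build_session_request, classify_state, wait_for_session) are illustrative, and it assumes Livy is reachable on its default port 8998 without authentication. A session that reaches the "idle" state has started its interpreter; a session that ends in "error", "dead", or "killed" most likely still hits the gateway error and the Spark logs should be checked.

```python
import json
import time
import urllib.request

# Hypothetical host; 8998 is Livy's default port.
LIVY_URL = "http://livy.example.com:8998"

# Livy session states that mean the session will never become usable.
FAILED_STATES = {"error", "dead", "killed"}


def build_session_request(livy_url, kind="pyspark"):
    """Build the POST /sessions request that asks Livy for a new session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        livy_url.rstrip("/") + "/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_state(state):
    """Map a Livy session state to 'ready', 'failed', or 'pending'."""
    if state == "idle":
        return "ready"
    if state in FAILED_STATES:
        return "failed"
    return "pending"


def wait_for_session(livy_url, timeout_s=300, poll_s=5):
    """Create a session and poll GET /sessions/{id} until it settles."""
    with urllib.request.urlopen(build_session_request(livy_url)) as resp:
        session = json.load(resp)
    state = session.get("state", "starting")
    url = "{}/sessions/{}".format(livy_url.rstrip("/"), session["id"])
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        with urllib.request.urlopen(url) as resp:
            state = json.load(resp)["state"]
        outcome = classify_state(state)
        if outcome != "pending":
            return outcome, state
        time.sleep(poll_s)
    return "pending", state


# Example (requires a reachable Livy server):
#   outcome, state = wait_for_session(LIVY_URL)
#   print(outcome, state)
```

Running the same check against the old Livy 0.5.0 endpoint should end in a failed state if the problem described above is present, which gives a simple before-and-after comparison.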