Commit 4dbb298

Update to pyspark package requirements (#798)
* separate SP creation: separated Service Principal creation for AKS due to issues on some machines with integrated SP creation; fixed typo in README
* Update deploy-sql-big-data-aks.py: removed JSON output
* adjusted for CU5
* change back
* update to CU5
* Updated requirements for pyspark installation in CU5
1 parent ee4cfac commit 4dbb298

1 file changed

Lines changed: 12 additions & 1 deletion

File tree

samples/features/sql-big-data-cluster/spark/config-install/installpackage_Spark.ipynb

@@ -106,9 +106,20 @@
 "The following code can be used to install packages on each executor node at runtime. \\\n",
 "**Note**: This functionality is not available on a non-root BDC deployment (including OpenShift). This installation is temporary, and must be performed each time a new Spark session is invoked.\n",
 "\n",
+"To use this from CU5 onwards, you must add two settings before deployment.\n",
+"\n",
+"In control.json, add (under security):\n",
+"\n",
+"_\"allowRunAsRoot\": true_\n",
+"\n",
+"In BDC.json, add (under spec.services.spark.settings):\n",
+"\n",
+"_\"yarn-site.yarn.nodemanager.container-executor.class\": \"org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor\"_\n",
+"\n",
 "``` Python\n",
 "import subprocess\n",
-"\n",
+"import os\n",
+"os.environ[\"XDG_CACHE_HOME\"]=\"/tmp\"\n",
 "# Install TensorFlow\n",
 "stdout = subprocess.check_output(\n",
 " \"pip3 install tensorflow\",\n",

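The truncated Python cell in the diff can be rounded out into a self-contained sketch of the runtime-install pattern it shows. As an assumption for portability, this sketch invokes `sys.executable -m pip` instead of the notebook's bare `pip3`, and wraps the call in a hypothetical helper `install_package` so the package name is a parameter rather than hard-coded `tensorflow`.

```python
import os
import subprocess
import sys

# Point pip's cache at /tmp, which is writable for the Spark user on
# executor nodes (the reason the notebook cell sets XDG_CACHE_HOME).
os.environ["XDG_CACHE_HOME"] = "/tmp"


def install_package(package):
    """Install a pip package on the current node; return pip's combined output.

    Uses `sys.executable -m pip` rather than the notebook's bare `pip3`
    so the sketch also runs where pip3 is not on PATH.
    """
    return subprocess.check_output(
        [sys.executable, "-m", "pip", "install", package],
        stderr=subprocess.STDOUT,
    ).decode("utf-8")
```

Because the installation is temporary, such a call has to be repeated in every new Spark session, e.g. `install_package("tensorflow")` as the first cell of the notebook.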
0 commit comments
