|
97 | 97 | { |
98 | 98 | "cell_type": "markdown", |
99 | 99 | "source": [ |
100 | | - "# Install Python Packages at Runtime for use with PySpark\n", |
101 | | - "\n", |
102 | | - "The following code can be used to install packages on each executor node at runtime. \\\n", |
103 | | - "**Note**: This functionality is not available on a non-root BDC deployment (including OpenShift). This installation is temporary, and must be performed each time a new Spark session is invoked.\n", |
104 | | - "\n", |
105 | | - "If you want to use this from CU5 upwards, you must add two settings pre-deployment.\n", |
106 | | - "\n", |
107 | | - "In contron.json, add (under security):\n", |
108 | | - "\n", |
109 | | - "_\"allowRunAsRoot\": true_\n", |
110 | | - "\n", |
111 | | - "In BDC.json, add (under spec.services.spark.settings): \n", |
112 | | - "\n", |
113 | | - "_\"yarn-site.yarn.nodemanager.container-executor.class\": \"org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor\"_\n", |
114 | | - "\n", |
115 | | - "``` Python\n", |
116 | | - "import subprocess\n", |
117 | | - "import os\n", |
118 | | - "os.environ[\"XDG_CACHE_HOME\"]=\"/tmp\"\n", |
119 | | - "# Install TensorFlow\n", |
120 | | - "stdout = subprocess.check_output(\n", |
121 | | - " \"pip3 install tensorflow\",\n", |
122 | | - " stderr=subprocess.STDOUT,\n", |
123 | | - " shell=True).decode(\"utf-8\")\n", |
124 | | - "print(stdout)\n", |
125 | | - "```" |
| 100 | + "# Install Python Packages at Runtime for use with PySpark\r\n", |
| 101 | + "\r\n", |
| 102 | + "This capability changed significantly after SQL Server Big Data Clusters CU10.\r\n", |
| 103 | + "\r\n", |
| 104 | + "For more information on this scenario, refer to [Spark library management](https://docs.microsoft.com/sql/big-data-cluster/spark-install-packages?view=sql-server-ver15)\r\n" |
126 | 105 | ], |
127 | 106 | "metadata": { |
128 | 107 | "azdata_cell_guid": "07944b55-7266-4fcd-8e9b-9fd6cb8cfef5" |
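
For anyone who still needs the pre-CU10 workflow quoted in the removed cell above, the two pre-deployment settings slot into the deployment profiles roughly as follows. These are sketches showing only the relevant keys; real control.json and bdc.json files carry many other settings, and JSON permits no inline comments, so the caveats live here rather than in the snippets. In control.json:

```json
{
  "security": {
    "allowRunAsRoot": true
  }
}
```

And in bdc.json, the YARN executor setting sits under spec.services.spark.settings:

```json
{
  "spec": {
    "services": {
      "spark": {
        "settings": {
          "yarn-site.yarn.nodemanager.container-executor.class": "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor"
        }
      }
    }
  }
}
```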
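On CU10 and later, the linked Spark library management article moves package installation to session creation instead of shelling out to pip3 from a running session. A minimal sketch of the first cell of a PySpark notebook, assuming the `spark.pypi.packages` and `spark.pypi.repository` session options described in that article (the package name and repository URL are illustrative):

```python
%%configure -f
{
    "conf": {
        "spark.pypi.packages": "tensorflow",
        "spark.pypi.repository": "https://pypi.org/simple"
    }
}
```

Because `%%configure` supplies options at session creation, it must run before any cell that touches Spark; packages requested this way reach every executor without the allowRunAsRoot workaround.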
|