Commit 86e2c8d

Merge pull request #564 from inchiosa/master
Add instruction for CTP 2.5
2 parents f767a69 + 673f55e commit 86e2c8d

1 file changed

Lines changed: 1 addition & 1 deletion

samples/features/sql-big-data-cluster/machine-learning/spark/h2o/h2o-automl-powerplant.ipynb

@@ -203,7 +203,7 @@
203 203
},
204 204
{
205 205
"cell_type": "markdown",
206-
"source": "# Configuration settings for scaling to larger data\n\n## Number and size of nodes in our Kubernetes cluster\nWe can control the number and size of nodes in our Kubernetes cluster via the node-vm-size and node-count switches in our `aks create` command:\n\n`az aks create --name mycluster --resource-group myrg --generate-ssh-keys --node-vm-size Standard_DS14_v2 --node-count 3 --kubernetes-version 1.10.9`\n\nMore information is available [here](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-on-aks?view=sqlallproducts-allversions#create-a-kubernetes-cluster).\n\n## Number of Spark pods\nWe can control the number of Spark pods via the CLUSTER_STORAGE_POOL_REPLICAS environment variable used by `mssqlctl create cluster`:\n\nSET CLUSTER_STORAGE_POOL_REPLICAS=2\n\n## YARN scheduler memory and cores\nWe can control the YARN scheduler memory and cores via the following environment variable used by `mssqlctl create cluster`:\n\n- YARN_SCHEDULER_MAX_MEMORY\n- YARN_SCHEDULER_MAX_VCORES\n- YARN_NODEMANAGER_RESOURCE_MEMORY\n- YARN_NODEMANAGER_RESOURCE_VCORES\n\nFurther information regarding mssqlctl environtment variables is available [here](https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-guidance?view=sqlallproducts-allversions#define-environment-variables).\n\n## Livy timeout\nThe Livy timeout sets a limit on the runtime of a cell in a PySpark3 Jupyter notebook. In SQL Server 2019 Big Data CTP 2.1, the Livy timeout defaults to 1 hour. In CTP 2.2, it defaults to 24 days. 
One can modify this as follows:\n\n- Log into the mssql-master-pool-0 pod using this command (requires permission to run kubectl):\n\n```\nkubectl exec -it mssql-master-pool-0 -n <your-cluster-name> -- /bin/bash\n```\n- To set the Livy timeout to 24 days, run the following command or edit /livy/conf/livy.conf accordingly:\n\n```\necho 'livy.server.session.timeout = 24d' | cat >> /livy/conf/livy.conf \n```\n- Then restart the Livy server by running the following command:\n\n```\nsupervisorctl restart livy\n```",
206+
"source": "# Configuration settings for scaling to larger data\n\n## Number and size of nodes in our Kubernetes cluster\nWe can control the number and size of nodes in our Kubernetes cluster via the node-vm-size and node-count switches in our `aks create` command:\n\n`az aks create --name mycluster --resource-group myrg --generate-ssh-keys --node-vm-size Standard_DS14_v2 --node-count 3 --kubernetes-version 1.10.9`\n\nMore information is available [here](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-on-aks?view=sqlallproducts-allversions#create-a-kubernetes-cluster).\n\n## Number of Spark pods\nWe can control the number of Spark pods via the CLUSTER_STORAGE_POOL_REPLICAS environment variable used by `mssqlctl create cluster`:\n\nSET CLUSTER_STORAGE_POOL_REPLICAS=2\n\n## YARN scheduler memory and cores\nWe can control the YARN scheduler memory and cores via the following environment variable used by `mssqlctl create cluster`:\n\n- YARN_SCHEDULER_MAX_MEMORY\n- YARN_SCHEDULER_MAX_VCORES\n- YARN_NODEMANAGER_RESOURCE_MEMORY\n- YARN_NODEMANAGER_RESOURCE_VCORES\n\nFurther information regarding mssqlctl environtment variables is available [here](https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-guidance?view=sqlallproducts-allversions#define-environment-variables).\n\nIn CTP 2.5 and later, these environment variables are replaced by similarly named properties in a JSON file. See [Custom configurations](https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-guidance?view=sqlallproducts-allversions#customconfig).\n\n## Livy timeout\nThe Livy timeout sets a limit on the runtime of a cell in a PySpark3 Jupyter notebook. In SQL Server 2019 Big Data CTP 2.1, the Livy timeout defaults to 1 hour. In CTP 2.2, it defaults to 24 days. 
One can modify this as follows:\n\n- Log into the mssql-master-pool-0 pod using this command (requires permission to run kubectl):\n\n```\nkubectl exec -it mssql-master-pool-0 -n <your-cluster-name> -- /bin/bash\n```\n- To set the Livy timeout to 24 days, run the following command or edit /livy/conf/livy.conf accordingly:\n\n```\necho 'livy.server.session.timeout = 24d' | cat >> /livy/conf/livy.conf \n```\n- Then restart the Livy server by running the following command:\n\n```\nsupervisorctl restart livy\n```",
207 207
"metadata": {}
208 208
},
209 209
{
