SQL Server Big Data Clusters bundle Spark and HDFS together with SQL Server. The built-in notebooks in Azure Data Studio enable data scientists and data engineers to run Spark notebooks and jobs in Python, R, Scala, or Spark SQL against the Big Data Cluster. This folder contains sample Spark notebooks for use with a SQL Server Big Data Cluster.
## Instructions to open a notebook from Azure Data Studio and execute the commands
1. Connect to the SQL Server master instance in a big data cluster.
2. Right-click the server name, select **Manage**, switch to the **SQL Server Big Data Cluster** tab, and open the notebook in Azure Data Studio. Wait for the "Kernel" and the target context ("Attach to") dropdowns to be populated. If required, set the relevant kernel (e.g. **PySpark3**); **Attach to** should be the IP address of your big data cluster endpoint.
This folder contains samples that show how to integrate Spark with other data sources.
The notebook `samples/features/sql-big-data-cluster/spark/data-virtualization/spark_to_sql_jdbc.ipynb` walks through the scenario below.
The notebook's opening markdown cell:

# Read and write from Spark to SQL

A typical big data scenario is large-scale ETL in Spark, after which the processed data is written out to SQL Server for access by LOB applications. This sample shows how to write to SQL Server from Spark. The main steps are:

- reading an HDFS file,
- doing some basic processing on it, and
- writing the processed data to a SQL Server table using JDBC.

Prerequisites:

- The sample uses a SQL database named "MyTestDatabase". Create it before you run the sample:

  ```sql
  CREATE DATABASE MyTestDatabase
  GO
  ```

- Download [AdultCensusIncome.csv](https://amldockerdatasets.azureedge.net/AdultCensusIncome.csv) to your local machine, create an HDFS folder named `spark_data`, and upload the file there.

The data-cleanup code cell:

```python
# Process this data. Very simple cleanup step: replace "-" with "_" in column names.
columns_new = [col.replace("-", "_") for col in df.columns]
df = df.toDF(*columns_new)
df.show(5)
```
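The cleanup cell assumes a DataFrame `df` has already been loaded from the uploaded CSV in an earlier cell. The rename itself is plain Python and can be tried standalone; the column names below are hypothetical examples in the style of AdultCensusIncome.csv, not read from the file:

```python
# Standalone sketch of the column-name cleanup used in the notebook.
# The column names here are hypothetical examples, not read from the actual file.
columns = ["age", "capital-gain", "hours-per-week", "native-country"]
columns_new = [col.replace("-", "_") for col in columns]
print(columns_new)
```

Underscored names avoid the bracket quoting that hyphenated identifiers would require on the SQL Server side.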
The JDBC write cell (the password in the original sample is masked here):

```python
# Write from Spark to a SQL Server table using JDBC.
print("Use built-in JDBC connector to write to the SQL Server master instance in the big data cluster")

servername = "jdbc:sqlserver://master-0.master-svc"
dbname = "MyTestDatabase"
url = servername + ";" + "databaseName=" + dbname + ";"

dbtable = "dbo.AdultCensus"
user = "sa"
password = "****"  # replace with your SQL login password

print("url is ", url)

try:
    df.write \
        .format("jdbc") \
        .mode("overwrite") \
        .option("url", url) \
        .option("dbtable", dbtable) \
        .option("user", user) \
        .option("password", password) \
        .save()
except ValueError as error:
    print("JDBC Write failed", error)

print("JDBC Write done")
```
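The URL concatenation above can be factored into a small helper; a minimal sketch (the function name is my own, not from the notebook):

```python
def build_jdbc_url(server: str, database: str) -> str:
    # Build a SQL Server JDBC URL in the same form the notebook concatenates by hand.
    return "jdbc:sqlserver://" + server + ";databaseName=" + database + ";"

print(build_jdbc_url("master-0.master-svc", "MyTestDatabase"))
```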
The write cell's recorded stdout:

```
Use built-in JDBC connector to write to the SQL Server master instance in the big data cluster
url is  jdbc:sqlserver://master-0.master-svc;databaseName=MyTestDatabase;
JDBC Write done
```

The JDBC read cell, which reads the table back into Spark:

```python
# Read from the SQL Server table back into Spark using JDBC.
print("read data from SQL Server table")
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", url) \
    .option("dbtable", dbtable) \
    .option("user", user) \
    .option("password", password) \
    .load()

jdbcDF.show(5)
```
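Since the write and read cells use the same connection settings, they could be collected once and reused; a sketch using the notebook's values (the dict itself is my addition, and the password is a placeholder):

```python
# Shared JDBC options mirroring the write/read cells above; the password is a placeholder.
jdbc_options = {
    "url": "jdbc:sqlserver://master-0.master-svc;databaseName=MyTestDatabase;",
    "dbtable": "dbo.AdultCensus",
    "user": "sa",
    "password": "****",
}

# With a live SparkSession this would be used as, e.g.:
#   spark.read.format("jdbc").options(**jdbc_options).load()
print(sorted(jdbc_options.keys()))
```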