# SQL Server big data clusters

A SQL Server big data cluster bundles Apache Spark and HDFS together with SQL Server. The built-in notebooks in Azure Data Studio let data scientists and data engineers run Spark notebooks and jobs written in Python, R, or Scala against the cluster. This folder contains sample notebooks that show how to use Spark in a SQL Server big data cluster.

## Folder contents

[PySpark Hello World](dataloading/hello_PySpark.ipynb)

[Scala Hello World](dataloading/hello_Scala.ipynb)

[SparkR Hello World](dataloading/hello_sparkR.ipynb)

[Data Loading - Transforming CSV to Parquet](dataloading/transform-csv-files.ipynb)

[Data Transfer - Spark to SQL using Spark JDBC connector](data-virtualization/spark_to_sql_jdbc.ipynb)

[Data Transfer - Spark to SQL using MSSQL Spark connector](spark_to_sql/mssql_spark_connector.ipynb)

## Instructions on how to run in Azure Data Studio

1. Open one of the sample notebooks, for example [data-loading/transform-csv-files.ipynb](dataloading/transform-csv-files.ipynb).
2. From Azure Data Studio, connect to the SQL Server master instance in the big data cluster.
3. Right-click on the server name, select **Manage**, switch to the **SQL Server Big Data Cluster** tab, and open the notebook. Wait for the “Kernel” and the target context (“Attach to”) to be populated. If required, set the relevant “Kernel” (e.g. **PySpark3**) and set **Attach to** to the IP address of your big data cluster endpoint.
4. Run each cell in the notebook sequentially.