Commit 5fdc3c6 ("Added README file to each folder")
1 parent: 09b207f
7 files changed: 60 additions & 51 deletions

samples/features/sql-big-data-cluster/data-pool/data-ingestion-spark.md renamed to samples/features/sql-big-data-cluster/data-pool/README.md

Lines changed: 16 additions & 2 deletions

```diff
@@ -1,6 +1,20 @@
-# Data ingestion using Spark streaming
+# Data pools in SQL Server 2019 big data cluster
 
-SQL Server Big Data clusters provide scale-out compute and storage to improve the performance of analyzing any data. Data from a variety of sources can be ingested and distributed across data pool instances for analysis. In this example, you are going to use Spark to read and transform data from HDFS and cache it in a data pool. Querying the external table created over this aggregated data stored in data pools will be much more efficient than going to the raw data always.
+SQL Server big data clusters provide scale-out compute and storage to improve the performance of analyzing any data. Data from a variety of sources can be ingested and distributed across data pool instances for analysis.
+
+## Data ingestion using SQL stored procedure
+
+In this example, we will insert data from a SQL query into an external table stored in a data pool and then query it.
+
+### Instructions
+
+1. Connect to the SQL Server Master instance.
+
+1. Execute the SQL script [data-ingestion-sql.sql](data-ingestion-sql.sql).
+
+## Data ingestion using Spark streaming
+
+In this example, you are going to use Spark to read and transform data from HDFS and cache it in a data pool. Querying the external table created over this aggregated data will be much more efficient than always going back to the raw data.
 
 ### Instructions
 
```

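The SQL ingestion steps in the README above generally boil down to creating an external table over the cluster's data pool and inserting query results into it. A minimal hedged sketch of that pattern; the table, database, and column names are illustrative assumptions, not necessarily what data-ingestion-sql.sql actually contains:

```sql
-- Point at the cluster's data pool (sqldatapool://controller-svc/default is
-- the documented default data pool endpoint inside a big data cluster).
IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'SqlDataPool')
    CREATE EXTERNAL DATA SOURCE SqlDataPool
    WITH (LOCATION = 'sqldatapool://controller-svc/default');
GO

-- External table whose rows are distributed round-robin across the
-- data pool instances. Names here are placeholders.
CREATE EXTERNAL TABLE [web_clickstream_agg]
    ([wcs_user_sk] BIGINT, [clicks] BIGINT)
    WITH (DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN);
GO

-- Ingest the result of a SQL query into the data pool, then query the
-- cached aggregate instead of re-scanning the raw data.
INSERT INTO [web_clickstream_agg]
SELECT [wcs_user_sk], COUNT(*) AS [clicks]
FROM [dbo].[web_clickstreams]
GROUP BY [wcs_user_sk];

SELECT TOP 10 * FROM [web_clickstream_agg] ORDER BY [clicks] DESC;
```

Round-robin distribution spreads inserted rows evenly across the pool; subsequent queries against the external table fan out to all data pool instances in parallel.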
samples/features/sql-big-data-cluster/data-pool/data-ingestion-sql.md

Lines changed: 0 additions & 9 deletions
This file was deleted.
Lines changed: 25 additions & 0 deletions

```diff
@@ -0,0 +1,25 @@
+# Data virtualization in SQL Server 2019 big data cluster
+
+In SQL Server 2019 big data clusters, the SQL Server engine can natively read HDFS files, such as CSV and Parquet files, by using SQL Server instances collocated on each of the HDFS data nodes to filter and aggregate data locally, in parallel, across all of the HDFS data nodes. SQL Server 2019 also introduces new ODBC connectors to data sources such as SQL Server, Oracle, MongoDB, and Teradata.
+
+## Query data in HDFS from SQL Server master
+
+In this example, you are going to create an external table in the SQL Server Master instance that points to data in HDFS within the SQL Server big data cluster. Then you will join the data in the external table with high-value data in the Master instance.
+
+### Instructions
+
+1. Connect to the SQL Server Master instance.
+
+1. Execute the SQL script [external-table-hdfs.sql](external-table-hdfs.sql).
+
+## Query data in Oracle from SQL Server master
+
+In this example, you are going to create an external table in the SQL Server Master instance over an inventory table that lives on an Oracle server.
+
+**Before you begin**, you need an Oracle instance and credentials. Execute the SQL script [inventory-ora.sql](inventory-ora.sql) in Oracle to create the table and import the "inventory.csv" file created by the bootstrap sample database.
+
+### Instructions
+
+1. Connect to the SQL Server Master instance.
+
+1. Execute the SQL script [external-table-oracle.sql](external-table-oracle.sql).
```

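The HDFS steps above rely on an external table over the cluster's built-in storage pool. A hedged sketch of what a script like external-table-hdfs.sql typically sets up; the HDFS path, format options, and column names are assumptions, not the sample's actual definitions:

```sql
-- sqlhdfs://controller-svc/default is the documented endpoint for the
-- cluster's HDFS storage pool; remaining identifiers are illustrative.
CREATE EXTERNAL DATA SOURCE SqlStoragePool
WITH (LOCATION = 'sqlhdfs://controller-svc/default');
GO

-- Describe how the CSV files are laid out.
CREATE EXTERNAL FILE FORMAT csv_file
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2)
);
GO

-- External table over CSV files stored under an assumed HDFS folder.
CREATE EXTERNAL TABLE [product_reviews_hdfs]
    ([pr_review_sk] BIGINT, [pr_review_content] VARCHAR(8000))
    WITH (
        DATA_SOURCE = SqlStoragePool,
        LOCATION = '/product_review_data',
        FILE_FORMAT = csv_file
    );
GO

-- Join the HDFS data with high-value data held in the Master instance
-- (the joined table is a placeholder).
SELECT TOP 10 r.[pr_review_sk], r.[pr_review_content]
FROM [product_reviews_hdfs] AS r
JOIN [dbo].[high_value_reviews] AS v ON r.[pr_review_sk] = v.[review_sk];
```

Filtering and aggregation over such a table are pushed down to the SQL Server instances collocated with the HDFS data nodes, which is what makes the query parallel.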
samples/features/sql-big-data-cluster/data-virtualization/external-table-hdfs.md

Lines changed: 0 additions & 11 deletions
This file was deleted.

samples/features/sql-big-data-cluster/data-virtualization/external-table-oracle.md

Lines changed: 0 additions & 12 deletions
This file was deleted.
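The Oracle steps, now folded into the data virtualization README above, follow the SQL Server 2019 PolyBase pattern for ODBC-backed sources. A hedged sketch; the server address, credential, and three-part Oracle location are placeholders, not the sample's real values:

```sql
-- Credential used to authenticate against Oracle; values are placeholders.
CREATE DATABASE SCOPED CREDENTIAL [OracleCredential]
WITH IDENTITY = 'oracle_user', SECRET = '<oracle_password>';
GO

-- <oracle-server> stands in for your Oracle host name or IP.
CREATE EXTERNAL DATA SOURCE OracleServer
WITH (LOCATION = 'oracle://<oracle-server>:1521', CREDENTIAL = [OracleCredential]);
GO

-- External table over the inventory table created by inventory-ora.sql;
-- the [service].[schema].[table] location must match your Oracle setup.
CREATE EXTERNAL TABLE [inventory_ora]
    ([inv_date_sk] INT, [inv_item_sk] INT, [inv_quantity_on_hand] INT)
    WITH (DATA_SOURCE = OracleServer, LOCATION = '[ORCLCDB].[ORA_USER].[INVENTORY]');
GO

SELECT TOP 10 * FROM [inventory_ora];
```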

samples/features/sql-big-data-cluster/machine-learning/spark/ml-spark.md renamed to samples/features/sql-big-data-cluster/machine-learning/README.md

Lines changed: 19 additions & 7 deletions

```diff
@@ -1,26 +1,38 @@
-# Machine learning with Spark on SQL Server 2019 big data cluster
+# Machine learning in SQL Server 2019 big data cluster
+
+## SQL Server Machine Learning Services on the SQL Master instance
+
+In this example, we build a machine learning model in R, using a logistic regression algorithm, for a recommendation engine on an online store. The model is trained on existing users' online click patterns, their interest in other categories, and their demographics; it is then used with the T-SQL PREDICT function to predict whether a visitor is interested in a given item category.
+
+### Instructions
+
+1. Connect to the SQL Server Master instance.
+
+1. Execute the SQL script [sql/book-click-prediction-r.sql](sql/book-click-prediction-r.sql).
+
+## Machine learning using Spark
 
 The new built-in notebooks in Azure Data Studio enable data scientists and data engineers to run Python, R, or Scala code against the cluster. This is a great way to explore the data and build machine learning models. Notebooks also facilitate collaboration between teammates working on a shared data set.
 
 This sample builds a machine learning model using AdultCensusIncome.csv, available [here](https://amldockerdatasets.azureedge.net/AdultCensusIncome.csv).
 
-## Instructions
+### Instructions
 
 In this example, you are going to run sample notebooks that build a machine learning model over a public data set.
 
 Follow the steps below to get up and running with the sample.
 
-## Upload the data for analysis
+#### Upload the data for analysis
 
 1. From Azure Data Studio, connect to the SQL Server big data cluster endpoint. Information about how to connect from Azure Data Studio can be found [here](https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-server-2019-extension?view=sql-server-ver15).
 
 2. Download the data from https://amldockerdatasets.azureedge.net/AdultCensusIncome.csv and save AdultCensusIncome.csv in a folder called spark_ml in HDFS.
 
-## Run notebook for data preparation
+#### Run notebook for data preparation
 As a first step, we'll load the data, do some basic cleanup, and choose the features we want to build the machine learning model with. Finally, we'll split the data set into training and test sets.
 
-1. Download and save the notebook file [1-data-prep.ipynb](1-data-prep.ipynb/) locally.
+1. Download and save the notebook file [spark/1-data-prep.ipynb](spark/1-data-prep.ipynb) locally.
 
 1. Open the notebook file in Azure Data Studio (right-click the SQL Server big data cluster server name -> **Manage** -> Open Notebook).
 
@@ -30,10 +42,10 @@ As a first step, we'll load the data, do some basic cleanup, and choose the features
 
 1. The training and test sets created will be stored as /spark_ml/AdultCensusIncomeTrain and /spark_ml/AdultCensusIncomeTest.
 
-## Run notebook to create a machine learning model and use it to predict
+#### Run notebook to create a machine learning model and use it to predict
 We'll now create the machine learning model, use it to predict results on the test set, and then save the created model to a file.
 
-1. Download and save the notebook (ipynb) file [2-build-ml-model.ipynb] (2-build-ml-model.ipynb/)
+1. Download and save the notebook (ipynb) file [spark/2-build-ml-model.ipynb](spark/2-build-ml-model.ipynb).
 
 1. Open the notebook file in Azure Data Studio (right-click the SQL Server big data cluster server name -> **Manage** -> Open Notebook).
```

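The R recommendation example above scores new data with the T-SQL PREDICT function. A hedged sketch of that scoring step; the model table, input table, and score column are illustrative assumptions, not book-click-prediction-r.sql itself:

```sql
-- Load a previously trained, serialized model (e.g. saved from R with
-- rxSerializeModel) from an assumed models table.
DECLARE @model VARBINARY(MAX) =
    (SELECT [model] FROM [dbo].[models] WHERE [model_name] = 'book_click_lr');

-- Score each visitor's click features natively in T-SQL, without calling
-- back into the R runtime.
SELECT d.[visitor_id], p.[Score]
FROM PREDICT(MODEL = @model, DATA = [dbo].[visitor_click_features] AS d)
WITH ([Score] FLOAT) AS p
ORDER BY p.[Score] DESC;
```

Because PREDICT runs inside the database engine, scoring happens next to the data and needs no external scoring service.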
samples/features/sql-big-data-cluster/machine-learning/sql/ml-master.md

Lines changed: 0 additions & 10 deletions
This file was deleted.
