samples/features/sql-big-data-cluster/README.md

## Samples Setup

**Before you begin**, run the CMD script [bootstrap-sample-db.cmd](bootstrap-sample-db.cmd) or the shell script [bootstrap-sample-db.sh](bootstrap-sample-db.sh), depending on your platform. The script performs the following operations:

1. Downloads the tpcx-bb 1GB sample database
1. Restores the database on the SQL Server master instance
1. Executes the bootstrap-sample-db.sql script
1. Exports the web_clickstreams, inventory, customer & product_reviews tables to CSV files
1. Uploads the web_clickstreams CSV file to HDFS inside the SQL Server 2019 big data cluster

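
The restore step above can be sketched in T-SQL. This is a hedged illustration of what the bootstrap script does on the master instance; the backup path and logical file names are assumptions, not the script's actual values:

```sql
-- Hedged sketch of the restore performed against the SQL Server master
-- instance; file paths and logical file names are assumptions.
RESTORE DATABASE tpcxbb_1gb
FROM DISK = '/var/opt/mssql/data/tpcxbb_1gb.bak'
WITH MOVE 'tpcxbb_1gb'     TO '/var/opt/mssql/data/tpcxbb_1gb.mdf',
     MOVE 'tpcxbb_1gb_log' TO '/var/opt/mssql/data/tpcxbb_1gb.ldf';
```
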
__[data-pool](data-pool/)__

SQL Server 2019 big data cluster contains a data pool, which consists of many SQL Server instances that store and query data in a scale-out manner.

### Data ingestion using Spark

The sample script [data-pool/data-ingestion-spark.sql](data-pool/data-ingestion-spark.sql) shows how to ingest data from Spark into data pool tables.

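
As a hedged sketch of the pattern involved, Spark jobs write into a table declared with the cluster's built-in `SqlDataPool` data source; the table and column names here are illustrative, not the sample's actual schema:

```sql
-- Illustrative data pool table that Spark jobs can write into.
-- Table and column names are assumptions, not the sample's schema.
CREATE EXTERNAL TABLE [web_clickstreams_spark_results]
    ("wcs_click_date_sk" BIGINT,
     "wcs_user_sk"       BIGINT,
     "wcs_item_sk"       BIGINT)
WITH (DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN);
```
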
### Data ingestion using SQL

The sample script [data-pool/data-ingestion-sql.sql](data-pool/data-ingestion-sql.sql) shows how to ingest data from T-SQL into data pool tables.

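
A minimal sketch of that pattern, assuming a data pool external table already exists (both table names below are illustrative assumptions):

```sql
-- Illustrative T-SQL ingestion into a data pool external table;
-- both table names are assumptions.
INSERT INTO [web_clickstreams_sql_results]
SELECT wcs_click_date_sk, wcs_user_sk, wcs_item_sk
FROM [dbo].[web_clickstreams];
```
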
__[data-virtualization](data-virtualization/)__

SQL Server 2019 and SQL Server 2019 big data cluster can use PolyBase external tables to connect to other data sources.

### External table over Storage Pool

SQL Server 2019 big data cluster contains a storage pool consisting of HDFS, Spark, and SQL Server instances. The [data-virtualization/storage-pool](data-virtualization/storage-pool) folder contains samples that demonstrate how to query data in HDFS inside a SQL Server 2019 big data cluster.

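
The general shape of those samples, sketched with an assumed HDFS path, format name, and column list (`SqlStoragePool` is the cluster's built-in storage pool data source):

```sql
-- Hedged sketch: expose a CSV directory in the cluster's HDFS as an
-- external table. Path, format name, and columns are assumptions.
CREATE EXTERNAL FILE FORMAT csv_file
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2));

CREATE EXTERNAL TABLE [web_clickstreams_hdfs]
    ("wcs_click_date_sk" BIGINT, "wcs_user_sk" BIGINT, "wcs_item_sk" BIGINT)
WITH (DATA_SOURCE = SqlStoragePool,
      LOCATION = '/clickstream_data',
      FILE_FORMAT = csv_file);

SELECT TOP 10 * FROM [web_clickstreams_hdfs];
```
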
### External table over Oracle

SQL Server 2019 uses new ODBC connectors to enable connectivity to SQL Server, Oracle, Teradata, MongoDB, and generic ODBC data sources.

The [data-virtualization/oracle](data-virtualization/oracle) folder contains samples that demonstrate how to query data in Oracle using external tables.

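
The overall shape of an Oracle external table looks like the following; the server, credential, and three-part Oracle object names are placeholders, and the samples in the folder define the real ones:

```sql
-- Hedged sketch; <oracle-server>, credentials, and the three-part
-- Oracle object name are placeholders.
CREATE DATABASE SCOPED CREDENTIAL [OracleCredential]
WITH IDENTITY = 'sales_user', SECRET = '<password>';

CREATE EXTERNAL DATA SOURCE [OracleSales]
WITH (LOCATION = 'oracle://<oracle-server>:1521',
      CREDENTIAL = [OracleCredential]);

CREATE EXTERNAL TABLE [inventory_ora]
    (inv_date_sk BIGINT, inv_item_sk BIGINT, inv_quantity_on_hand INT)
WITH (LOCATION = '[ORCL].[SALES].[INVENTORY]',
      DATA_SOURCE = [OracleSales]);
```
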

__[deployment](deployment/)__

The [deployment](deployment) folder contains the scripts for deploying a Kubernetes cluster for SQL Server 2019 big data cluster.

__[machine-learning](machine-learning/)__

SQL Server 2016 added support for executing R scripts from T-SQL, and SQL Server 2017 added the same for Python scripts. SQL Server 2019 adds support for executing Java code from T-SQL, and SQL Server 2019 big data cluster adds support for executing Spark code inside the cluster.

### SQL Server Machine Learning Services

The [machine-learning/sql](machine-learning/sql) folder contains sample SQL scripts that show how to invoke R, Python, and Java code from T-SQL.

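
The common entry point in such scripts is `sp_execute_external_script`. A minimal Python pass-through looks like this (it requires external scripts to be enabled on the instance; the query and column name are illustrative):

```sql
-- Minimal pass-through: run Python from T-SQL and return the input
-- row set unchanged. Swap @language for N'R' to run R instead.
EXEC sp_execute_external_script
    @language     = N'Python',
    @script       = N'OutputDataSet = InputDataSet',
    @input_data_1 = N'SELECT 1 AS sanity_check'
WITH RESULT SETS ((sanity_check INT));
```
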

### Spark Machine Learning

The [machine-learning/spark](machine-learning/spark) folder contains the Spark samples.

samples/features/sql-big-data-cluster/data-virtualization/oracle/README.md

# Data virtualization in SQL Server 2019

***Applies to:*** SQL Server 2019 on Windows or Linux, SQL Server 2019 big data cluster

SQL Server 2019 introduces new ODBC connectors to data sources such as SQL Server, Oracle, MongoDB, and Teradata. These connectors can be used from a stand-alone SQL Server 2019 on Windows or Linux or from a SQL Server 2019 big data cluster.

This folder contains scripts that can be executed on an Oracle server to create the objects needed for data virtualization in SQL Server 2019 or SQL Server 2019 big data cluster.

# Instructions

***Before you begin***, you need the Oracle instance name and credentials.

1. Execute [bootstrap-oracle.cmd](bootstrap-oracle.cmd) to create the necessary objects in Oracle.
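
As a hedged sketch, the bootstrap creates a sample sales user and an inventory table on the Oracle side; the user name, password placeholder, grants, and columns below are illustrative assumptions, not the script's exact definitions:

```sql
-- Hedged sketch of the objects created in Oracle; user name,
-- password placeholder, grants, and columns are assumptions.
CREATE USER sales IDENTIFIED BY "<password>";
GRANT CREATE SESSION, CREATE TABLE, UNLIMITED TABLESPACE TO sales;

CREATE TABLE sales.inventory (
    inv_date_sk          NUMBER,
    inv_item_sk          NUMBER,
    inv_warehouse_sk     NUMBER,
    inv_quantity_on_hand NUMBER
);
```
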