Commit 1d79ad8

Merge branch 'master' into pmasl

2 parents 7f6219a + bd3cf0e

25 files changed: 836 additions & 206 deletions

samples/features/sql-big-data-cluster/README.md

Lines changed: 12 additions & 6 deletions
```diff
@@ -9,7 +9,9 @@ Installation instructions for SQL Server 2019 big data clusters can be found [he
 ## Executing the sample scripts
 The scripts should be executed in a specific order to test the various features. Execute the scripts from each folder in below order:
 
-1. __[spark/dataloading/transform-csv-files.sql](spark/dataloading/transform-csv-files.sql)__
+1. __[spark/dataloading/transform-csv-files.ipynb](spark/dataloading/transform-csv-files.ipynb)__
+1. __[data-virtualization/generic-odbc](data-virtualization/generic-odbc)__
+1. __[data-virtualization/hadoop](data-virtualization/hadoop)__
 1. __[data-virtualization/storage-pool](data-virtualization/storage-pool)__
 1. __[data-virtualization/oracle](data-virtualization/oracle)__
 1. __[data-pool](data-pool/)__
@@ -28,16 +30,20 @@ The sample script [data-pool/data-ingestion-sql.sql](data-pool/data-ingestion-sq
 
 ## __[data-virtualization](data-virtualization/)__
 
-SQL Server 2019 or SQL Server 2019 big data cluster can use PolyBase external tables to connect to other data sources.
+SQL Server 2019 or SQL Server 2019 big data cluster can use PolyBase external tables to connect to other data sources.
 
-### External table over Storage Pool
-SQL Server 2019 big data cluster contains a storage pool consisting of HDFS, Spark and SQL Server instances. The [data-virtualization/storage-pool](data-virtualization/storage-pool) folder contains samples that demonstrate how to query data in HDFS inside SQL Server 2019 big data cluster.
+### External table over Generic ODBC data source
+The [data-virtualization/generic-odbc](data-virtualization/generic-odbc) folder contains samples that demonstrate how to query data in MySQL & PostgreSQL using external tables and a generic ODBC data source. The generic ODBC data source can be used only in SQL Server 2019 on Windows.
 
-### External table over Oracle
-SQL Server 2019 uses new ODBC connectors to enable connectivity to SQL Server, Oracle, Teradata, MongoDB and generic ODBC data sources.
+### External table over Hadoop
+The [data-virtualization/hadoop](data-virtualization/hadoop) folder contains samples that demonstrate how to query data in HDFS using external tables. This demonstrates the functionality available from SQL Server 2016 using the HADOOP data source.
 
+### External table over Oracle
 The [data-virtualization/oracle](data-virtualization/oracle) folder contains samples that demonstrate how to query data in Oracle using external tables.
 
+### External table over Storage Pool
+SQL Server 2019 big data cluster contains a storage pool consisting of HDFS, Spark and SQL Server instances. The [data-virtualization/storage-pool](data-virtualization/storage-pool) folder contains samples that demonstrate how to query data in HDFS inside SQL Server 2019 big data cluster.
+
 ## __[deployment](deployment/)__
 
 The [deployment](deployment) folder contains the scripts for deploying a Kubernetes cluster for SQL Server 2019 big data cluster.
```
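The generic-ODBC samples added in this commit follow the same external-table pattern as the other connectors. A minimal sketch of that pattern for a PostgreSQL source; the server name, database, credential, and table below are hypothetical placeholders, not taken from the sample, and this path requires SQL Server 2019 on Windows:

```sql
-- Hypothetical generic-ODBC example (placeholders throughout).
-- Assumes a database scoped credential [PgCredential] already exists.
CREATE EXTERNAL DATA SOURCE PostgresSrvr
WITH (LOCATION = 'odbc://pg-server:5432',
      CONNECTION_OPTIONS = 'Driver={PostgreSQL Unicode};Database=sales',
      CREDENTIAL = PgCredential);

CREATE EXTERNAL TABLE [dbo].[orders_pg]
(
    [order_id] INT,
    [order_total] DECIMAL(10,2)
)
WITH (DATA_SOURCE = PostgresSrvr, LOCATION = 'sales.public.orders');

SELECT TOP 10 * FROM [dbo].[orders_pg];
```

The actual scripts in [data-virtualization/generic-odbc](data-virtualization/generic-odbc) are the authoritative versions.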

samples/features/sql-big-data-cluster/bootstrap-sample-db.sql

Lines changed: 8 additions & 8 deletions
```diff
@@ -76,20 +76,20 @@ BEGIN
 
     -- Create default data sources for SQL Big Data Cluster
     IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'SqlDataPool')
-        IF SERVERPROPERTY('ProductLevel') = 'CTP2.5'
-            CREATE EXTERNAL DATA SOURCE SqlDataPool
-            WITH (LOCATION = 'sqldatapool://service-mssql-controller:8080/datapools/default');
-        ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
+        IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
             CREATE EXTERNAL DATA SOURCE SqlDataPool
             WITH (LOCATION = 'sqldatapool://controller-svc:8080/datapools/default');
+        ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.1'
+            CREATE EXTERNAL DATA SOURCE SqlDataPool
+            WITH (LOCATION = 'sqldatapool://controller-svc/default');
 
     IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'SqlStoragePool')
-        IF SERVERPROPERTY('ProductLevel') = 'CTP2.5'
-            CREATE EXTERNAL DATA SOURCE SqlStoragePool
-            WITH (LOCATION = 'sqlhdfs://nmnode-0-0.nmnode-0-svc:50070');
-        ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
+        IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
             CREATE EXTERNAL DATA SOURCE SqlStoragePool
             WITH (LOCATION = 'sqlhdfs://controller-svc:8080/default');
+        ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.1'
+            CREATE EXTERNAL DATA SOURCE SqlStoragePool
+            WITH (LOCATION = 'sqlhdfs://controller-svc/default');
 
     IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'HadoopData')
         CREATE EXTERNAL DATA SOURCE HadoopData
```
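The version branching in this script keys off `SERVERPROPERTY('ProductLevel')`. A quick way to check which branch a given instance would take (standard T-SQL, not part of the sample):

```sql
-- Returns e.g. 'CTP3.0' or 'CTP3.1' on 2019 previews;
-- 'RTM' or a service-pack level on released builds.
SELECT SERVERPROPERTY('ProductLevel')   AS product_level,
       SERVERPROPERTY('ProductVersion') AS product_version;
```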

samples/features/sql-big-data-cluster/data-pool/data-ingestion-sql.sql

Lines changed: 7 additions & 12 deletions
```diff
@@ -4,12 +4,12 @@ GO
 -- Create external data source for Data Pool inside a SQL big data cluster
 --
 IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'SqlDataPool')
-    IF SERVERPROPERTY('ProductLevel') = 'CTP2.5'
-        CREATE EXTERNAL DATA SOURCE SqlDataPool
-        WITH (LOCATION = 'sqldatapool://service-mssql-controller:8080/datapools/default');
-    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
-        CREATE EXTERNAL DATA SOURCE SqlDataPool
-        WITH (LOCATION = 'sqldatapool://controller-svc:8080/datapools/default');
+    IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
+        CREATE EXTERNAL DATA SOURCE SqlDataPool
+        WITH (LOCATION = 'sqldatapool://controller-svc:8080/datapools/default');
+    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.1'
+        CREATE EXTERNAL DATA SOURCE SqlDataPool
+        WITH (LOCATION = 'sqldatapool://controller-svc/default');
 
 -- Create external table in a data pool in SQL Server 2019 big data cluster.
 -- The SqlDataPool data source is a special data source that is available in
@@ -26,12 +26,6 @@ IF NOT EXISTS(SELECT * FROM sys.external_tables WHERE name = 'web_clickstream_cl
     );
 GO
 
--- Currently the create external table operation is asynchronous and there is no
--- way to determine completion of the operation. To prevent failures of the insert
--- into the external table, wait for few minutes.
-WAITFOR DELAY '00:02:00';
-GO
-
 -- Insert results of a SELECT statement into the external table created on the data pool.
 -- Store summary results for quick access instead of going to the source tables always.
 --
@@ -72,5 +66,6 @@ GO
 -- Cleanup
 /*
 DROP EXTERNAL TABLE [dbo].[web_clickstream_clicks_data_pool];
+DROP EXTERNAL DATA SOURCE SqlDataPool;
 GO
 */
```
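With the `WAITFOR DELAY` removed, the ingestion flow reduces to create-then-insert. A condensed sketch of that flow (column list and source table abbreviated; the full script in the repo is the authoritative version):

```sql
-- Distribute the external table round-robin across the data pool instances.
CREATE EXTERNAL TABLE [web_clickstream_clicks_data_pool]
    ("wcs_user_sk" BIGINT, "i_category_id" BIGINT, "clicks" BIGINT)
WITH (DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN);
GO

-- Insert aggregated results; the sample no longer waits before inserting.
INSERT INTO [web_clickstream_clicks_data_pool]
SELECT wcs_user_sk, i_category_id, COUNT_BIG(*) AS clicks
FROM web_clickstreams
GROUP BY wcs_user_sk, i_category_id;
GO
```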

samples/features/sql-big-data-cluster/data-virtualization/oracle/customer-oracle.sql

Lines changed: 9 additions & 9 deletions
```diff
@@ -22,23 +22,23 @@ IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'OracleSalesS
 CREATE EXTERNAL TABLE [dbo].[customer_ora]
 (
     [C_CUSTOMER_SK] DECIMAL(10,0),
-    [C_CUSTOMER_ID] NVARCHAR(16) COLLATE SQL_Latin1_General_CP1_CI_AS,
+    [C_CUSTOMER_ID] NVARCHAR(16) COLLATE Latin1_General_100_BIN2_UTF8,
     [C_CURRENT_CDEMO_SK] DECIMAL(10,0),
     [C_CURRENT_HDEMO_SK] DECIMAL(10,0),
     [C_CURRENT_ADDR_SK] DECIMAL(10,0),
     [C_FIRST_SHIPTO_DATE_SK] DECIMAL(10,0),
     [C_FIRST_SALES_DATE_SK] DECIMAL(8,0),
-    [C_SALUTATION] NVARCHAR(10) COLLATE SQL_Latin1_General_CP1_CI_AS,
-    [C_FIRST_NAME] NVARCHAR(20) COLLATE SQL_Latin1_General_CP1_CI_AS,
-    [C_LAST_NAME] NVARCHAR(30) COLLATE SQL_Latin1_General_CP1_CI_AS,
-    [C_PREFERRED_CUST_FLAG] NVARCHAR COLLATE SQL_Latin1_General_CP1_CI_AS,
+    [C_SALUTATION] NVARCHAR(10) COLLATE Latin1_General_100_BIN2_UTF8,
+    [C_FIRST_NAME] NVARCHAR(20) COLLATE Latin1_General_100_BIN2_UTF8,
+    [C_LAST_NAME] NVARCHAR(30) COLLATE Latin1_General_100_BIN2_UTF8,
+    [C_PREFERRED_CUST_FLAG] NVARCHAR COLLATE Latin1_General_100_BIN2_UTF8,
     [C_BIRTH_DAY] DECIMAL(8,0),
     [C_BIRTH_MONTH] DECIMAL(8,0),
     [C_BIRTH_YEAR] DECIMAL(8,0),
-    [C_BIRTH_COUNTRY] NVARCHAR(20) COLLATE SQL_Latin1_General_CP1_CI_AS,
-    [C_LOGIN] NVARCHAR(13) COLLATE SQL_Latin1_General_CP1_CI_AS,
-    [C_EMAIL_ADDRESS] NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS,
-    [C_LAST_REVIEW_DATE] NVARCHAR(20) COLLATE SQL_Latin1_General_CP1_CI_AS
+    [C_BIRTH_COUNTRY] NVARCHAR(20) COLLATE Latin1_General_100_BIN2_UTF8,
+    [C_LOGIN] NVARCHAR(13) COLLATE Latin1_General_100_BIN2_UTF8,
+    [C_EMAIL_ADDRESS] NVARCHAR(50) COLLATE Latin1_General_100_BIN2_UTF8,
+    [C_LAST_REVIEW_DATE] NVARCHAR(20) COLLATE Latin1_General_100_BIN2_UTF8
 )
 WITH (DATA_SOURCE=[OracleSalesSrvr],
       LOCATION='<oracle_service_name,nvarchar(30),xe>.SALES.CUSTOMER');
```
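The collation switch above only changes how string comparisons on those columns behave; once the external table exists it is queried like any local table. For illustration, a hypothetical query that is not part of the sample:

```sql
-- Predicates on external tables can be pushed down to the Oracle source by PolyBase.
SELECT TOP 10 [C_FIRST_NAME], [C_LAST_NAME], [C_EMAIL_ADDRESS]
FROM [dbo].[customer_ora]
WHERE [C_BIRTH_YEAR] > 1980
ORDER BY [C_LAST_NAME];
```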

samples/features/sql-big-data-cluster/data-virtualization/storage-pool/product-reviews-hdfs-csv.sql

Lines changed: 6 additions & 6 deletions
```diff
@@ -4,12 +4,12 @@ GO
 -- Create external data source for HDFS inside SQL big data cluster.
 --
 IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'SqlStoragePool')
-    IF SERVERPROPERTY('ProductLevel') = 'CTP2.5'
-        CREATE EXTERNAL DATA SOURCE SqlStoragePool
-        WITH (LOCATION = 'sqlhdfs://nmnode-0-0.nmnode-0-svc:50070');
-    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
-        CREATE EXTERNAL DATA SOURCE SqlStoragePool
-        WITH (LOCATION = 'sqlhdfs://controller-svc:8080/default');
+    IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
+        CREATE EXTERNAL DATA SOURCE SqlStoragePool
+        WITH (LOCATION = 'sqlhdfs://controller-svc:8080/default');
+    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.1'
+        CREATE EXTERNAL DATA SOURCE SqlStoragePool
+        WITH (LOCATION = 'sqlhdfs://controller-svc/default');
 
 -- Create file format for CSV separated file with appropriate properties.
 --
```
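After the data source is in place, each of these storage-pool scripts continues with a file format and an external table over an HDFS path. A condensed sketch for the CSV case, with paths and columns abbreviated from the sample:

```sql
-- Delimited-text format for comma-separated files with a header row.
CREATE EXTERNAL FILE FORMAT csv_file
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2));

-- External table over an HDFS directory exposed through the storage pool.
CREATE EXTERNAL TABLE [product_reviews_hdfs_csv]
    ("pr_review_sk" BIGINT, "pr_review_content" VARCHAR(8000))
WITH (DATA_SOURCE = SqlStoragePool,
      LOCATION = '/product_review_data',
      FILE_FORMAT = csv_file);

SELECT TOP 10 * FROM [product_reviews_hdfs_csv];
```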

samples/features/sql-big-data-cluster/data-virtualization/storage-pool/product-reviews-hdfs-parquet.sql

Lines changed: 6 additions & 6 deletions
```diff
@@ -4,12 +4,12 @@ GO
 -- Create external data source for HDFS inside SQL big data cluster.
 --
 IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'SqlStoragePool')
-    IF SERVERPROPERTY('ProductLevel') = 'CTP2.5'
-        CREATE EXTERNAL DATA SOURCE SqlStoragePool
-        WITH (LOCATION = 'sqlhdfs://nmnode-0-0.nmnode-0-svc:50070');
-    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
-        CREATE EXTERNAL DATA SOURCE SqlStoragePool
-        WITH (LOCATION = 'sqlhdfs://controller-svc:8080/default');
+    IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
+        CREATE EXTERNAL DATA SOURCE SqlStoragePool
+        WITH (LOCATION = 'sqlhdfs://controller-svc:8080/default');
+    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.1'
+        CREATE EXTERNAL DATA SOURCE SqlStoragePool
+        WITH (LOCATION = 'sqlhdfs://controller-svc/default');
 
 -- Create file format for parquet file with appropriate properties.
 --
```

samples/features/sql-big-data-cluster/data-virtualization/storage-pool/product-reviews-hdfs-tsv.sql

Lines changed: 6 additions & 6 deletions
```diff
@@ -4,12 +4,12 @@ GO
 -- Create external data source for HDFS inside SQL big data cluster.
 --
 IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'SqlStoragePool')
-    IF SERVERPROPERTY('ProductLevel') = 'CTP2.5'
-        CREATE EXTERNAL DATA SOURCE SqlStoragePool
-        WITH (LOCATION = 'sqlhdfs://nmnode-0-0.nmnode-0-svc:50070');
-    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
-        CREATE EXTERNAL DATA SOURCE SqlStoragePool
-        WITH (LOCATION = 'sqlhdfs://controller-svc:8080/default');
+    IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
+        CREATE EXTERNAL DATA SOURCE SqlStoragePool
+        WITH (LOCATION = 'sqlhdfs://controller-svc:8080/default');
+    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.1'
+        CREATE EXTERNAL DATA SOURCE SqlStoragePool
+        WITH (LOCATION = 'sqlhdfs://controller-svc/default');
 
 -- Create file format for tab separated file with appropriate properties.
 --
```

samples/features/sql-big-data-cluster/data-virtualization/storage-pool/web-clickstreams-hdfs-csv.sql

Lines changed: 6 additions & 6 deletions
```diff
@@ -4,12 +4,12 @@ GO
 -- Create external data source for HDFS inside SQL big data cluster.
 --
 IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'SqlStoragePool')
-    IF SERVERPROPERTY('ProductLevel') = 'CTP2.5'
-        CREATE EXTERNAL DATA SOURCE SqlStoragePool
-        WITH (LOCATION = 'sqlhdfs://nmnode-0-0.nmnode-0-svc:50070');
-    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
-        CREATE EXTERNAL DATA SOURCE SqlStoragePool
-        WITH (LOCATION = 'sqlhdfs://controller-svc:8080/default');
+    IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
+        CREATE EXTERNAL DATA SOURCE SqlStoragePool
+        WITH (LOCATION = 'sqlhdfs://controller-svc:8080/default');
+    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.1'
+        CREATE EXTERNAL DATA SOURCE SqlStoragePool
+        WITH (LOCATION = 'sqlhdfs://controller-svc/default');
 
 -- Create file format for CSV file with appropriate properties.
 --
```

samples/features/sql-big-data-cluster/data-virtualization/storage-pool/web-clickstreams-hdfs-parquet.sql

Lines changed: 6 additions & 6 deletions
```diff
@@ -4,12 +4,12 @@ GO
 -- Create external data source for HDFS inside SQL big data cluster.
 --
 IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'SqlStoragePool')
-    IF SERVERPROPERTY('ProductLevel') = 'CTP2.5'
-        CREATE EXTERNAL DATA SOURCE SqlStoragePool
-        WITH (LOCATION = 'sqlhdfs://nmnode-0-0.nmnode-0-svc:50070');
-    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
-        CREATE EXTERNAL DATA SOURCE SqlStoragePool
-        WITH (LOCATION = 'sqlhdfs://controller-svc:8080/default');
+    IF SERVERPROPERTY('ProductLevel') = 'CTP3.0'
+        CREATE EXTERNAL DATA SOURCE SqlStoragePool
+        WITH (LOCATION = 'sqlhdfs://controller-svc:8080/default');
+    ELSE IF SERVERPROPERTY('ProductLevel') = 'CTP3.1'
+        CREATE EXTERNAL DATA SOURCE SqlStoragePool
+        WITH (LOCATION = 'sqlhdfs://controller-svc/default');
 
 -- Create file format for parquet file with appropriate properties.
 --
```

samples/features/sql-big-data-cluster/deployment/aks/README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -18,7 +18,7 @@ Using this sample Python script, you will deploy a Kubernetes cluster in Azure u
 ```
 - Install the latest version of the mssqlctl CLI. Run the command below using elevated privileges (sudo or admin cmd window):
 ```
-pip3 install -r https://private-repo.microsoft.com/python/ctp3.0/mssqlctl/requirements.txt
+pip3 install -r https://private-repo.microsoft.com/python/ctp3.1/mssqlctl/requirements.txt
 ```
 1. Login into your Azure account. Run this command:
 ```
````
