Commit c677799

Split SQL19 & SQL19 big data cluster data-virtualization samples
1 parent 10199d1 commit c677799

17 files changed

Lines changed: 240 additions & 29 deletions

samples/features/sql-big-data-cluster/bootstrap-sample-db.cmd

Lines changed: 5 additions & 4 deletions
```diff
@@ -1,5 +1,5 @@
 @echo off
-REM CLICKSTREAM FILES
+REM bootstrap sample database CMD script
 setlocal enableextensions
 set CLUSTER_NAMESPACE=%1
 set SQL_MASTER_IP=%2
```
```diff
@@ -33,9 +33,10 @@ popd
 echo Configuring sample database...
 %DEBUG% sqlcmd -S %SQL_MASTER_INSTANCE% -Usa -P%SQL_MASTER_SA_PASSWORD% -i "%STARTUP_PATH%bootstrap-sample-db.sql" -o "%STARTUP_PATH%bootstrap.out" -I -b || goto exit

-for %%F in (web_clickstreams inventory) do (
+for %%F in (web_clickstreams inventory customer) do (
 echo Exporting %%F data...
-%DEBUG% bcp sales.dbo.%%F out "%STARTUP_PATH%%%F.csv" -S %SQL_MASTER_INSTANCE% -Usa -P%SQL_MASTER_SA_PASSWORD% -c -t, -o "%STARTUP_PATH%%%F.out" -e "%STARTUP_PATH%%%F.err" || goto exit
+if /i %%F EQU web_clickstreams (set DELIMITER=,) else (SET DELIMITER=^|)
+%DEBUG% bcp sales.dbo.%%F out "%STARTUP_PATH%%%F.csv" -S %SQL_MASTER_INSTANCE% -Usa -P%SQL_MASTER_SA_PASSWORD% -c -t"%DELIMITER%" -o "%STARTUP_PATH%%%F.out" -e "%STARTUP_PATH%%%F.err" || goto exit
 )

 echo Exporting product_reviews data...
```
```diff
@@ -59,7 +60,7 @@ goto :eof

 :exit
 echo Bootstrap of the sample database failed.
-exit /b %ERRORLEVEL%
+exit /b 1

 :usage
 echo USAGE: %0 ^<CLUSTER_NAMESPACE^> ^<SQL_MASTER_IP^> ^<SQL_MASTER_SA_PASSWORD^> ^<BACKUP_FILE_PATH^> ^<KNOX_IP^> [^<KNOX_PASSWORD^>]
```
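The export loop now picks a field terminator per table. `web_clickstreams` stays comma-separated. `inventory` and `customer` use `|`, which matches `fields terminated by '|'` in the Oracle control files added by this commit.

The Python sketch below models that choice. It assumes the same bcp flags the script passes (`-c`, `-t`, `-o`, `-e`). `EXPORT_TABLES`, `delimiter_for` and `bcp_export_args` are hypothetical names.

One cmd detail the sketch sidesteps: `%DELIMITER%` inside the parenthesized `for` body is expanded once, when the whole block is parsed. So bcp receives whatever `DELIMITER` held before the loop started. Switching it per iteration normally requires `setlocal enabledelayedexpansion` and `!DELIMITER!`.

```python
import os
from typing import List

# Tables exported by the loop in bootstrap-sample-db.cmd.
EXPORT_TABLES = ["web_clickstreams", "inventory", "customer"]


def delimiter_for(table: str) -> str:
    """Field terminator for a table: ',' for web_clickstreams, '|' otherwise."""
    # Case-insensitive comparison, like `if /i` in the batch script.
    return "," if table.lower() == "web_clickstreams" else "|"


def bcp_export_args(table: str, server: str, sa_password: str, out_dir: str) -> List[str]:
    """Build the bcp argument list for one table (hypothetical helper)."""
    base = os.path.join(out_dir, table)
    return [
        "bcp", f"sales.dbo.{table}", "out", f"{base}.csv",
        "-S", server, "-Usa", f"-P{sa_password}",
        "-c",                          # character mode
        f"-t{delimiter_for(table)}",   # field terminator chosen per table
        "-o", f"{base}.out", "-e", f"{base}.err",
    ]


if __name__ == "__main__":
    for name in EXPORT_TABLES:
        print(" ".join(bcp_export_args(name, "<SQL_MASTER_INSTANCE>", "<password>", ".")))
```

Passing the arguments as a list (for example to `subprocess.run`) keeps `|` literal. No shell parses it, so the `^|` escaping that cmd needs does not apply.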
Lines changed: 6 additions & 24 deletions
```diff
@@ -1,33 +1,15 @@
-# Data virtualization in SQL Server 2019 big data cluster
+# Data virtualization in SQL Server 2019 and SQL Server 2019 big data cluster

-In SQL Server 2019 big data clusters, the SQL Server engine has gained the ability to natively read HDFS files, such as CSV and parquet files, by using SQL Server instances collocated on each of the HDFS data nodes to filter and aggregate data locally in parallel across all of the HDFS data nodes. SQL Server 2019 introduces new ODBC connectors to data sources like SQL Server, Oracle, MongoDB and Teradata.
+In **SQL Server 2019 big data clusters**, the SQL Server engine has gained the ability to natively read HDFS files, such as CSV and parquet files, by using SQL Server instances collocated on each of the HDFS data nodes to filter and aggregate data locally in parallel across all of the HDFS data nodes. **SQL Server 2019** also introduces **new ODBC connectors** to data sources like SQL Server, Oracle, MongoDB and Teradata.

 ## Query data in HDFS from SQL Server master

-In this example, you are going to create an external table in the SQL Server Master instance that points to data in HDFS within the SQL Server Big data cluster. Then you will join the data in the external table with high value data in SQL Master instance.
+**Applies to: SQL Server 2019 big data cluster**

-### Instructions
-
-1. Connect to HDFS/Knox gateway from Azure Data Studio using SQL Server big data cluster connection type.
-
-1. Run the [../spark/spark-sql.ipynb](../spark/spark-sql.ipynb/) notebook to generate the sample parquet file(s).
-
-1. Connect to SQL Server Master instance.
-
-1. Execute the [web-clickstreams-hdfs-csv.sql](web-clickstreams-hdfs-csv.sql). This script demonstrates how to read CSV file(s) stored in HDFS.
-
-1. Execute the [web-clickstreams-parquet.sql](web-clickstreams-hdfs-parquet.sql). This script demonstrates how to read parquet file(s) stored in HDFS.
-
-1. Execute the [product-reviews-hdfs-csv.sql](product-reviews-hdfs-csv.sql). This script demonstrates how to read CSV file(s) stored in HDFS.
+In SQL Server 2019 big data cluster, the storage pool consists of HDFS data node with SQL Server & Spark endpoints. The [storage-pool](storage-pool) folder contains SQL scripts that demonstrate how to query data residing in HDFS data inside a big data cluster.

 ## Query data in Oracle from SQL Server master

-In this example, you are going to create an external table in SQL Server Master instance over the inventory table that sits on an Oracle server.
-
-**Before you begin**, you need to have an Oracle instance and credentials. Follow the instruction in the [oracle-setup\README.md](oracle-setup\README.md).
-
-### Instructions
-
-1. Connect to SQL Server Master instance.
+**Applies to: SQL Server 2019 on Windows or Linux, SQL Server 2019 big data cluster**

-1. Execute the SQL [inventory-oracle.sql](inventory-oracle.sql/).
+The [oracle](oracle) folder contains SQL scripts that demonstrate how to query data residing in Oracle instance.
```
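The README describes the storage pool as filtering and aggregating HDFS data locally on each data node, in parallel, before results are combined. The conceptual Python sketch below illustrates that pattern.

It is not how SQL Server implements it. Threads stand in for the SQL Server instances on the data nodes, and the `(category, quantity)` rows, `local_aggregate` and `distributed_sum` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # hypothetical (category, quantity) record


def local_aggregate(rows: Iterable[Row], min_qty: int) -> Counter:
    """Filter and aggregate one partition, as a single data node would."""
    totals: Counter = Counter()
    for category, qty in rows:
        if qty >= min_qty:            # filter applied where the data lives
            totals[category] += qty   # partial aggregate for this node only
    return totals


def distributed_sum(partitions: List[List[Row]], min_qty: int) -> Dict[str, int]:
    """Aggregate every partition in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda part: local_aggregate(part, min_qty), partitions)
        merged: Counter = Counter()
        for partial in partials:
            merged.update(partial)    # Counter.update adds counts together
    return dict(merged)
```

Only the small per-node partial totals are merged at the end. Reducing the data moved off the nodes in this way is the usual motivation for pushing filters and aggregates down to where the data is stored.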
Lines changed: 17 additions & 0 deletions
```diff
@@ -0,0 +1,17 @@
+# Data virtualization in SQL Server 2019
+
+**Applies to: SQL Server 2019 on Windows or Linux, SQL Server 2019 big data cluster**
+
+SQL Server 2019 introduces new ODBC connectors to data sources like SQL Server, Oracle, MongoDB and Teradata. These connectors can be used from stand-alone SQL Server 2019 on Windows or Linux and SQL Server 2019 big data cluster.
+
+## Query data in Oracle from SQL Server master
+
+In this example, you are going to create an external table in SQL Server Master instance over the inventory table that sits on an Oracle server.
+
+**Before you begin**, you need to have an Oracle instance and credentials. Follow the instruction in the [setup\README.md](setup\README.md).
+
+### Instructions
+
+1. Connect to SQL Server Master instance.
+
+1. Execute the SQL [inventory-oracle.sql](inventory-oracle.sql/).
```

samples/features/sql-big-data-cluster/data-virtualization/inventory-oracle.sql renamed to samples/features/sql-big-data-cluster/data-virtualization/oracle/inventory-oracle.sql

File renamed without changes.

samples/features/sql-big-data-cluster/data-virtualization/oracle-setup/README.md renamed to samples/features/sql-big-data-cluster/data-virtualization/oracle/setup/README.md

File renamed without changes.
Lines changed: 35 additions & 0 deletions
```diff
@@ -0,0 +1,35 @@
+@echo off
+REM bootstrap sample oracle tables CMD script
+setlocal enableextensions
+set ORACLE_SERVER=%1
+set ORACLE_USER=%2
+set ORACLE_PASSWORD=%3
+
+if NOT DEFINED ORACLE_SERVER goto :usage
+if NOT DEFINED ORACLE_USER goto :usage
+if NOT DEFINED ORACLE_PASSWORD goto :usage
+
+echo Verifying sqlplus.exe is in path & CALL WHERE /Q sqlplus.exe || GOTO exit
+echo Verifying sqlldr.exe is in path & CALL WHERE /Q sqlldr.exe || GOTO exit
+
+echo Creating user & tables...
+echo exit | sqlplus -S %ORACLE_USER%/%ORACLE_PASSWORD%@%ORACLE_SERVER% @sales-user.sql || GOTO exit
+echo exit | sqlplus -S %ORACLE_USER%/%ORACLE_PASSWORD%@%ORACLE_SERVER% @inventory.sql || GOTO exit
+echo exit | sqlplus -S %ORACLE_USER%/%ORACLE_PASSWORD%@%ORACLE_SERVER% @customer.sql || GOTO exit
+
+echo Loading tables data...
+sqlldr CONTROL=inventory.ctl userid=%ORACLE_USER%/%ORACLE_PASSWORD%@%ORACLE_SERVER% || GOTO exit
+sqlldr CONTROL=customer.ctl userid=%ORACLE_USER%/%ORACLE_PASSWORD%@%ORACLE_SERVER% || GOTO exit
+
+:: del /q *.out *.err *.csv
+endlocal
+exit /b 0
+goto :eof
+
+:exit
+echo Bootstrap of the sample tables failed.
+exit /b 1
+
+:usage
+echo USAGE: %0 ^<ORACLE_SERVER^> ^<ORACLE_USER^> ^<ORACLE_PASSWORD^>
+exit /b 0
```
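The batch script runs three sqlplus scripts and then two sqlldr loads, stopping at the first failure. The Python sketch below models that sequencing.

It assumes `sqlplus` and `sqlldr` are on PATH and the `.sql`/`.ctl` files are in the working directory, as in the batch file. `build_steps` and `bootstrap` are hypothetical names. `run` and `which` are injectable so the ordering can be checked without an Oracle client.

One cmd detail worth knowing: `&` is a command separator. As a result, `echo Creating user & tables...` prints `Creating user` and then tries to run `tables...` as a command. Writing `^&` prints the text literally.

```python
import shutil
import subprocess
from typing import Callable, List, Optional

# Order taken from the batch script: create user/tables, then load data.
SQL_SCRIPTS = ["sales-user.sql", "inventory.sql", "customer.sql"]
CONTROL_FILES = ["inventory.ctl", "customer.ctl"]


def build_steps(server: str, user: str, password: str) -> List[List[str]]:
    """Return the sqlplus and sqlldr command lines, in execution order."""
    connect = f"{user}/{password}@{server}"
    steps = [["sqlplus", "-S", connect, f"@{script}"] for script in SQL_SCRIPTS]
    steps += [["sqlldr", f"CONTROL={ctl}", f"userid={connect}"] for ctl in CONTROL_FILES]
    return steps


def bootstrap(server: str, user: str, password: str,
              run: Callable = subprocess.run,
              which: Callable[[str], Optional[str]] = shutil.which) -> int:
    """Run every step, stopping at the first failure; returns a process exit code."""
    for tool in ("sqlplus", "sqlldr"):
        if which(tool) is None:  # like `WHERE /Q tool || GOTO exit`
            print(f"{tool} was not found on PATH")
            return 1
    for argv in build_steps(server, user, password):
        # Equivalent of `echo exit | sqlplus ...`: send "exit" so sqlplus quits after the script.
        stdin_text = "exit\n" if argv[0] == "sqlplus" else None
        result = run(argv, input=stdin_text, text=True)
        if result.returncode != 0:
            print("Bootstrap of the sample tables failed.")
            return 1
    return 0
```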
Lines changed: 27 additions & 0 deletions
```diff
@@ -0,0 +1,27 @@
+options (readsize=2048000,bindsize=1600000, rows=100000, silent=(header, feedback) )
+load data
+infile '..\..\..\customer.csv' "str '\r\n'"
+append
+into table SALES.CUSTOMER
+fields terminated by '|'
+OPTIONALLY ENCLOSED BY '"' AND '"'
+trailing nullcols
+( C_CUSTOMER_SK,
+C_CUSTOMER_ID CHAR(16),
+C_CURRENT_CDEMO_SK,
+C_CURRENT_HDEMO_SK,
+C_CURRENT_ADDR_SK,
+C_FIRST_SHIPTO_DATE_SK,
+C_FIRST_SALES_DATE_SK,
+C_SALUTATION CHAR(10),
+C_FIRST_NAME CHAR(20),
+C_LAST_NAME CHAR(30),
+C_PREFERRED_CUST_FLAG CHAR(1),
+C_BIRTH_DAY,
+C_BIRTH_MONTH,
+C_BIRTH_YEAR,
+C_BIRTH_COUNTRY CHAR(20),
+C_LOGIN CHAR(13),
+C_EMAIL_ADDRESS CHAR(50),
+C_LAST_REVIEW_DATE CHAR(10)
+)
```
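The control file above defines these field rules:

- `|` separators
- optional `"` enclosure
- a `\r\n` record terminator
- `trailing nullcols`
- the listed `CHAR(n)` limits

The Python sketch below splits one `customer.csv` record according to those rules. It is an illustration, not SQL*Loader itself. For example, it raises an error on over-long or surplus fields instead of writing the record to a bad file.

`CUSTOMER_FIELDS` and `parse_customer_record` are hypothetical names. The limit of 255 for untyped fields assumes SQL*Loader's default length for delimited character fields.

```python
import csv
from typing import Dict, List, Optional, Tuple

# Field order and CHAR(n) limits from customer.ctl; 255 is assumed for untyped fields.
CUSTOMER_FIELDS: List[Tuple[str, int]] = [
    ("C_CUSTOMER_SK", 255), ("C_CUSTOMER_ID", 16), ("C_CURRENT_CDEMO_SK", 255),
    ("C_CURRENT_HDEMO_SK", 255), ("C_CURRENT_ADDR_SK", 255),
    ("C_FIRST_SHIPTO_DATE_SK", 255), ("C_FIRST_SALES_DATE_SK", 255),
    ("C_SALUTATION", 10), ("C_FIRST_NAME", 20), ("C_LAST_NAME", 30),
    ("C_PREFERRED_CUST_FLAG", 1), ("C_BIRTH_DAY", 255), ("C_BIRTH_MONTH", 255),
    ("C_BIRTH_YEAR", 255), ("C_BIRTH_COUNTRY", 20), ("C_LOGIN", 13),
    ("C_EMAIL_ADDRESS", 50), ("C_LAST_REVIEW_DATE", 10),
]


def parse_customer_record(line: str) -> Dict[str, Optional[str]]:
    """Split one customer.csv record following the control file's field rules."""
    # Records end with "\r\n"; fields are '|'-separated and may be enclosed in '"'.
    values = next(csv.reader([line.rstrip("\r\n")], delimiter="|", quotechar='"'))
    if len(values) > len(CUSTOMER_FIELDS):
        raise ValueError(f"expected at most {len(CUSTOMER_FIELDS)} fields, got {len(values)}")
    # TRAILING NULLCOLS: fields missing at the end of the record become NULL (None).
    padded: List[Optional[str]] = list(values) + [None] * (len(CUSTOMER_FIELDS) - len(values))
    record: Dict[str, Optional[str]] = {}
    for (name, limit), value in zip(CUSTOMER_FIELDS, padded):
        if value == "":
            value = None  # an empty field is loaded as NULL
        if value is not None and len(value) > limit:
            raise ValueError(f"{name} is longer than CHAR({limit})")
        record[name] = value
    return record
```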
Lines changed: 24 additions & 0 deletions
```diff
@@ -0,0 +1,24 @@
+-- Customer table over which the SQL Server external table will be defined
+CREATE TABLE "SALES"."CUSTOMER"
+(
+"C_CUSTOMER_SK" NUMBER(10,0),
+"C_CUSTOMER_ID" VARCHAR2(16 BYTE),
+"C_CURRENT_CDEMO_SK" NUMBER(10,0),
+"C_CURRENT_HDEMO_SK" NUMBER(10,0),
+"C_CURRENT_ADDR_SK" NUMBER(10,0),
+"C_FIRST_SHIPTO_DATE_SK" NUMBER(10,0),
+"C_FIRST_SALES_DATE_SK" NUMBER(8,0),
+"C_SALUTATION" VARCHAR2(10 BYTE),
+"C_FIRST_NAME" VARCHAR2(20 BYTE),
+"C_LAST_NAME" VARCHAR2(30 BYTE),
+"C_PREFERRED_CUST_FLAG" VARCHAR2(1 BYTE),
+"C_BIRTH_DAY" NUMBER(8,0),
+"C_BIRTH_MONTH" NUMBER(8,0),
+"C_BIRTH_YEAR" NUMBER(8,0),
+"C_BIRTH_COUNTRY" VARCHAR2(20 BYTE),
+"C_LOGIN" VARCHAR2(13 BYTE),
+"C_EMAIL_ADDRESS" VARCHAR2(50 BYTE),
+"C_LAST_REVIEW_DATE" VARCHAR2(20 BYTE)
+);
+
+CREATE INDEX "SALES"."CUSTOMER_C_CUSTOMER_SK" ON "SALES"."CUSTOMER"("C_CUSTOMER_SK");
```
Lines changed: 13 additions & 0 deletions
```diff
@@ -0,0 +1,13 @@
+options (readsize=2048000,bindsize=1600000, rows=100000, silent=(header, feedback) )
+load data
+infile '..\..\..\inventory.csv' "str '\r\n'"
+append
+into table SALES.INVENTORY
+fields terminated by '|'
+OPTIONALLY ENCLOSED BY '"' AND '"'
+trailing nullcols
+( INV_DATE,
+INV_ITEM,
+INV_WAREHOUSE,
+INV_QUANTITY_ON_HAND
+)
```

samples/features/sql-big-data-cluster/data-virtualization/oracle-setup/inventory.sql renamed to samples/features/sql-big-data-cluster/data-virtualization/oracle/setup/inventory.sql

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,4 +7,4 @@ CREATE TABLE "SALES"."INVENTORY"
 "INV_QUANTITY_ON_HAND" NUMBER(10,0)
 );

-CREATE INDEX INV_ITEM ON "SALES"."INVENTORY"("INV_ITEM");
+CREATE INDEX "SALES"."INVENTORY_INV_ITEM" ON "SALES"."INVENTORY"("INV_ITEM");
```
