# Data virtualization in SQL Server 2019 and SQL Server 2019 big data cluster

In **SQL Server 2019 big data clusters**, the SQL Server engine has gained the ability to natively read HDFS files, such as CSV and Parquet files, by using SQL Server instances collocated on each of the HDFS data nodes to filter and aggregate data locally, in parallel, across all of the HDFS data nodes. **SQL Server 2019** also introduces **new ODBC connectors** to data sources such as SQL Server, Oracle, MongoDB, and Teradata.

## Query data in HDFS from SQL Server master

**Applies to: SQL Server 2019 big data cluster**

In a SQL Server 2019 big data cluster, the storage pool consists of HDFS data nodes with SQL Server and Spark endpoints. The [storage-pool](storage-pool) folder contains SQL scripts that demonstrate how to query data residing in HDFS inside a big data cluster.

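As a sketch of the pattern those scripts use, the following T-SQL creates an external table over CSV files in HDFS through the built-in `SqlStoragePool` data source of a big data cluster. The database name, HDFS path, and column list here are illustrative assumptions, not the exact contents of the scripts:

```sql
-- Sketch: read CSV files in HDFS from the SQL Server master instance.
-- Database name, HDFS path, and columns are illustrative placeholders.
USE Sales;
GO

-- File format describing the CSV layout (skip the header row).
CREATE EXTERNAL FILE FORMAT csv_file
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2
    )
);
GO

-- External table over an HDFS directory, using the built-in
-- SqlStoragePool data source available in a big data cluster.
CREATE EXTERNAL TABLE [web_clickstreams_hdfs_csv]
    ([wcs_click_date_sk] BIGINT, [wcs_user_sk] BIGINT, [wcs_web_page_sk] BIGINT)
WITH (
    DATA_SOURCE = SqlStoragePool,
    LOCATION = '/tmp/web_clickstreams',
    FILE_FORMAT = csv_file
);
GO

-- The external table can now be queried, and joined with local
-- high-value tables, like any other table.
SELECT TOP 10 * FROM [web_clickstreams_hdfs_csv];
```

Because the data is filtered and aggregated on the storage pool nodes, only the reduced result set travels back to the master instance.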
## Query data in Oracle from SQL Server master

**Applies to: SQL Server 2019 on Windows or Linux, SQL Server 2019 big data cluster**

The [oracle](oracle) folder contains SQL scripts that demonstrate how to query data residing in an Oracle instance.
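A minimal sketch of that pattern, using the new ODBC-based Oracle connector, might look like the following. The server address, credential, and table/column names are placeholders, not the exact contents of the scripts, and the database is assumed to already have a master key:

```sql
-- Sketch: query a remote Oracle table from the SQL Server master instance.
-- Address, credential, and table/column names are illustrative placeholders.
USE Sales;
GO

-- Credential used to authenticate against the Oracle instance
-- (requires a database master key to already exist).
CREATE DATABASE SCOPED CREDENTIAL [OracleCredential]
WITH IDENTITY = 'admin', SECRET = '<password>';
GO

-- External data source pointing at the Oracle server.
CREATE EXTERNAL DATA SOURCE [OracleServer]
WITH (
    LOCATION = 'oracle://oracle-host:1521',
    CREDENTIAL = [OracleCredential]
);
GO

-- External table over the remote inventory table.
CREATE EXTERNAL TABLE [inventory_oracle]
    ([inv_date_sk] DECIMAL(10), [inv_item_sk] DECIMAL(10), [inv_quantity_on_hand] INT)
WITH (
    DATA_SOURCE = [OracleServer],
    LOCATION = 'TPCXBB.INVENTORY'
);
GO

-- The Oracle data can now be joined with local SQL Server tables.
SELECT TOP 10 * FROM [inventory_oracle];
```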