Commit f5f429e

Introducing Dataplat.Dbatools.Csv (#33)
1 parent 65c91d5 commit f5f429e

42 files changed

Lines changed: 10380 additions & 8 deletions

README.md

Lines changed: 30 additions & 0 deletions
@@ -1,6 +1,7 @@
 # dbatools.library

 [![PowerShell Gallery](https://img.shields.io/powershellgallery/v/dbatools.library)](https://www.powershellgallery.com/packages/dbatools.library)
+[![NuGet - Dataplat.Dbatools.Csv](https://img.shields.io/nuget/v/Dataplat.Dbatools.Csv.svg?label=nuget%20-%20Csv)](https://www.nuget.org/packages/Dataplat.Dbatools.Csv)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)

 The library that powers [dbatools](https://dbatools.io), the community module for SQL Server professionals.
@@ -13,10 +14,31 @@ dbatools.library is a .NET library that provides the core functionality for the
 - Microsoft.Data.SqlClient for SQL Server connectivity
 - DacFx for database deployment operations
 - Extended Events (XEvent) processing capabilities
+- **High-performance CSV reader** for bulk data import (also available as standalone NuGet package)
 - Multi-framework support (.NET Framework 4.7.2 and .NET 8.0)

 This library enables dbatools to work seamlessly across Windows PowerShell 5.1 and PowerShell 7+ on Windows, macOS, and Linux.

+## Standalone NuGet Packages
+
+### Dataplat.Dbatools.Csv
+
+[![NuGet](https://img.shields.io/nuget/v/Dataplat.Dbatools.Csv.svg)](https://www.nuget.org/packages/Dataplat.Dbatools.Csv)
+
+High-performance CSV reader and writer for .NET. **20%+ faster than LumenWorks CsvReader** with modern features:
+
+- Streaming `IDataReader` for SqlBulkCopy (~25,000 rows/sec)
+- Automatic compression support (GZip, Deflate, Brotli, ZLib)
+- Parallel processing for large files
+- Multi-character delimiters, smart quote handling
+- Robust error handling and security protections
+
+```bash
+dotnet add package Dataplat.Dbatools.Csv
+```
+
+See the [CSV package documentation](project/Dataplat.Dbatools.Csv/README.md) for full details.
+
 ## Installation

 Install from the PowerShell Gallery:
@@ -137,6 +159,8 @@ The library targets both:
 dbatools.library/
 ├── project/
 │   ├── dbatools/                 # Main C# library project
+│   │   └── Csv/                  # CSV reader/writer source
+│   ├── Dataplat.Dbatools.Csv/    # Standalone CSV NuGet package
 │   ├── dbatools.Tests/           # Unit tests
 │   └── dbatools.sln              # Solution file
 ├── build/                        # Build scripts
@@ -177,6 +201,12 @@ This library includes several major SQL Server components:
 | Microsoft.AnalysisServices | 19.101.1 | Analysis Services management |
 | Microsoft.SqlServer.XEvent.XELite | 2024.2.5.1 | Extended Events processing |

+### Standalone Packages
+
+| Package | Purpose |
+|---------|---------|
+| [Dataplat.Dbatools.Csv](https://www.nuget.org/packages/Dataplat.Dbatools.Csv) | High-performance CSV reader/writer for .NET |
+
 ## Contributing

 Contributions are welcome! This library is primarily maintained by the dbatools team.

build/build-csv.ps1

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
+param(
+    [string]$Version,
+    [switch]$Sign,
+    [switch]$Publish,
+    [string]$NuGetApiKey
+)
+
+$ErrorActionPreference = 'Stop'
+$ProgressPreference = 'SilentlyContinue'
+
+# Get script root and project root
+$scriptroot = $PSScriptRoot
+if (-not $scriptroot) {
+    $scriptroot = Split-Path -Path $MyInvocation.MyCommand.Path
+}
+$root = Split-Path -Path $scriptroot
+$csvProjectPath = Join-Path $root "project\Dataplat.Dbatools.Csv"
+$csvCsproj = Join-Path $csvProjectPath "Dataplat.Dbatools.Csv.csproj"
+$artifactsDir = Join-Path $root "artifacts"
+$csvArtifacts = Join-Path $artifactsDir "csv"
+
+Write-Host "=== Dataplat.Dbatools.Csv Build Script ===" -ForegroundColor Cyan
+Write-Host ""
+
+# Update or read version using XML parsing (safer than regex)
+[xml]$csproj = Get-Content $csvCsproj
+$propertyGroup = $csproj.Project.PropertyGroup | Where-Object { $_.Version } | Select-Object -First 1
+
+if ($Version) {
+    Write-Host "Updating version to: $Version" -ForegroundColor Yellow
+    $propertyGroup.Version = $Version
+    $csproj.Save($csvCsproj)
+} else {
+    $Version = $propertyGroup.Version
+    Write-Host "Building version: $Version" -ForegroundColor Yellow
+}
+
+# Clean and create artifacts directory
+if (Test-Path $csvArtifacts) {
+    Remove-Item $csvArtifacts -Recurse -Force
+}
+$null = New-Item -ItemType Directory -Path $csvArtifacts -Force
+
+# Clean previous builds
+Write-Host "Cleaning previous builds..." -ForegroundColor Yellow
+Push-Location $csvProjectPath
+try {
+    dotnet clean -c Release --nologo 2>$null
+} catch { }
+Pop-Location
+
+# Build first (without packing) so we can sign the DLLs
+Write-Host "Building project..." -ForegroundColor Yellow
+Push-Location $csvProjectPath
+try {
+    dotnet build -c Release --nologo
+    if ($LASTEXITCODE -ne 0) {
+        throw "dotnet build failed with exit code $LASTEXITCODE"
+    }
+} finally {
+    Pop-Location
+}
+
+# Sign the DLLs BEFORE packing if requested
+if ($Sign) {
+    if (Get-Command Invoke-DbatoolsTrustedSigning -ErrorAction SilentlyContinue) {
+        Write-Host ""
+        Write-Host "=== Signing with Azure Trusted Signing ===" -ForegroundColor Cyan
+
+        # Find built DLLs in the bin/Release folders
+        $binPath = Join-Path $csvProjectPath "bin\Release"
+        $dllsToSign = Get-ChildItem -Path $binPath -Filter "*.dll" -Recurse |
+            Where-Object { $_.Name -like "*Dbatools*" -or $_.Name -like "*Dataplat*" }
+
+        if ($dllsToSign) {
+            Write-Host "Signing $($dllsToSign.Count) DLL(s)..." -ForegroundColor Yellow
+
+            foreach ($dll in $dllsToSign) {
+                Write-Host " Signing: $($dll.Name)" -ForegroundColor Gray
+                $result = $dll.FullName | Invoke-DbatoolsTrustedSigning
+                if ($result.Status -ne 'Valid') {
+                    throw "Signing failed for $($dll.Name): Status '$($result.Status)'. Cannot continue with unsigned DLLs."
+                }
+                Write-Host " Signed (Thumbprint: $($result.Thumbprint))" -ForegroundColor Green
+            }
+        } else {
+            Write-Host "No DLLs found to sign" -ForegroundColor Yellow
+        }
+    } else {
+        Write-Warning "Invoke-DbatoolsTrustedSigning not found - skipping signing"
+    }
+}
+
+# Now pack (will include the signed DLLs)
+Write-Host "Packing NuGet package..." -ForegroundColor Yellow
+Push-Location $csvProjectPath
+try {
+    dotnet pack -c Release --nologo --no-build -o $csvArtifacts
+    if ($LASTEXITCODE -ne 0) {
+        throw "dotnet pack failed with exit code $LASTEXITCODE"
+    }
+} finally {
+    Pop-Location
+}
+
+# Find the generated packages
+$nupkg = Get-ChildItem -Path $csvArtifacts -Filter "*.nupkg" | Select-Object -First 1
+$snupkg = Get-ChildItem -Path $csvArtifacts -Filter "*.snupkg" | Select-Object -First 1
+
+if (-not $nupkg) {
+    throw "No .nupkg file found in $csvArtifacts"
+}
+
+Write-Host "Package created: $($nupkg.Name)" -ForegroundColor Green
+if ($snupkg) {
+    Write-Host "Symbols package: $($snupkg.Name)" -ForegroundColor Green
+}
+
+# Publish to NuGet if requested
+if ($Publish) {
+    if (-not $NuGetApiKey) {
+        $NuGetApiKey = $env:NUGET_API_KEY
+    }
+
+    if (-not $NuGetApiKey) {
+        Write-Warning "No NuGet API key provided. Set -NuGetApiKey or `$env:NUGET_API_KEY"
+    } else {
+        Write-Host ""
+        Write-Host "=== Publishing to NuGet ===" -ForegroundColor Cyan
+
+        dotnet nuget push $nupkg.FullName --api-key $NuGetApiKey --source https://api.nuget.org/v3/index.json --skip-duplicate
+
+        if ($snupkg) {
+            dotnet nuget push $snupkg.FullName --api-key $NuGetApiKey --source https://api.nuget.org/v3/index.json --skip-duplicate
+        }
+
+        if ($LASTEXITCODE -eq 0) {
+            Write-Host "Published to NuGet!" -ForegroundColor Green
+        }
+    }
+}
+
+Write-Host ""
+Write-Host "=== Build Complete ===" -ForegroundColor Cyan
+Write-Host "Output: $csvArtifacts" -ForegroundColor White
+Get-ChildItem $csvArtifacts | ForEach-Object {
+    Write-Host " $($_.Name)" -ForegroundColor Gray
+}
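
The version handling near the top of `build/build-csv.ps1` (load the `.csproj` as XML, pick the first `PropertyGroup` that carries a `Version`, then overwrite or read it) can be tried in isolation. What follows is a minimal sketch of that step against a throwaway project file in a temp folder. The file name `Demo.csproj`, the `$demo*` variable names, and the version `1.2.3` are made up for illustration and are not part of the commit.

```powershell
# Hypothetical demo project file in a temp folder (not part of the commit)
$demoDir = Join-Path ([IO.Path]::GetTempPath()) ([Guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $demoDir
$demoCsproj = Join-Path $demoDir 'Demo.csproj'
@'
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFrameworks>net472;net8.0</TargetFrameworks>
    <Version>1.0.0</Version>
  </PropertyGroup>
</Project>
'@ | Set-Content -Path $demoCsproj -Encoding UTF8

# Same approach as the build script: parse as XML, find the PropertyGroup with a Version
[xml]$demoXml = Get-Content -Path $demoCsproj -Raw
$group = $demoXml.Project.PropertyGroup | Where-Object { $_.Version } | Select-Object -First 1
$oldVersion = $group.Version

# Overwrite the version and save; Save() is given a full path here
$group.Version = '1.2.3'
$demoXml.Save($demoCsproj)

# Reload from disk to confirm the change was persisted
[xml]$reloaded = Get-Content -Path $demoCsproj -Raw
$newVersion = ($reloaded.Project.PropertyGroup | Where-Object { $_.Version } | Select-Object -First 1).Version
Write-Host "Version changed from $oldVersion to $newVersion"

Remove-Item $demoDir -Recurse -Force
```

Passing a full path to `Save()` matters because `XmlDocument.Save` resolves relative paths against the process working directory, which can differ from the current PowerShell location. The build script builds its path from `$PSScriptRoot`, so it is not affected.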

build/build.ps1

Lines changed: 0 additions & 1 deletion
@@ -215,7 +215,6 @@ if (-not $bogusCoreCopied) {
 Copy-Item (Join-Path $tempPath "LumenWorksCsvReader/lib/net461/LumenWorks.Framework.IO.dll") -Destination (Join-Path $libPath "desktop/third-party/LumenWorks/LumenWorks.Framework.IO.dll") -Force
 Copy-Item (Join-Path $tempPath "LumenWorksCsvReader/lib/netstandard2.0/LumenWorks.Framework.IO.dll") -Destination (Join-Path $libPath "core/third-party/LumenWorks/LumenWorks.Framework.IO.dll") -Force

-
 # Core files are already in place from dotnet publish

 # Copy var/misc files to appropriate locations

dbatools.library.psd1

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 #
 @{
 # Version number of this module.
-ModuleVersion = '2025.11.2'
+ModuleVersion = '2025.11.28'

 # ID used to uniquely identify this module
 GUID = '00b61a37-6c36-40d8-8865-ac0180288c84'

dbatools.library.psm1

Lines changed: 0 additions & 6 deletions
@@ -166,12 +166,6 @@ try {
     Write-Error "Could not import $assemblyPath : $($_ | Out-String)"
 }

-try {
-    $null = Import-Module ([IO.Path]::Combine($script:libraryroot, "third-party", "LumenWorks", "LumenWorks.Framework.IO.dll"))
-} catch {
-    Write-Error "Could not import LumenWorks.Framework.IO.dll : $($_ | Out-String)"
-}
-
 foreach ($name in $names) {
     # REMOVED win-sqlclient handling and mac-specific logic since files are in standard lib folder

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+# Changelog
+
+All notable changes to Dataplat.Dbatools.Csv will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+- Null and empty string validation for `Delimiter` property to prevent runtime errors
+- Decompression bomb protection tests verifying `LimitedReadStream` security feature
+
+### Changed
+- Build script now uses XML parsing for version updates instead of regex (safer for edge cases)
+
+## [1.0.0] - 2024-11-28
+
+### Added
+- High-performance CSV reader implementing `IDataReader` for seamless `SqlBulkCopy` integration
+- High-performance CSV writer with streaming support
+- Multi-character delimiter support (e.g., `::`, `||`, `\t`)
+- Automatic compression detection and handling (GZip, Deflate, Brotli, ZLib)
+- Decompression bomb protection via configurable `MaxDecompressedSize` limit (default 10GB)
+- Parallel processing pipeline for 2-4x performance on large files
+- String interning for reduced memory allocations on repeated values
+- Culture-aware type conversion with customizable converters
+- Configurable error handling: throw, skip, or collect parse errors
+- Quote handling modes: Strict (RFC 4180) and Lenient for malformed data
+- Mismatched field count handling: throw, pad with nulls, or truncate
+- Duplicate header handling options
+- Static column injection for adding computed values to each record
+- Column filtering (include/exclude)
+- `DistinguishEmptyFromNull` option for precise null vs empty string semantics
+- Smart/curly quote normalization
+- Skip rows feature for files with preamble content
+- SourceLink support for debugging NuGet packages
+
+### Performance
+- 20%+ faster than LumenWorks CsvReader in benchmarks
+- 64KB default buffer size (vs 4KB in LumenWorks)
+- Span-based parsing with `ArrayPool<T>` for minimal allocations
+- SIMD-optimized delimiter matching via `Span.SequenceEqual`
+- Zero-copy direct field parsing where possible
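
The changelog above mentions decompression bomb protection through a `MaxDecompressedSize` limit and a `LimitedReadStream`. The package's actual implementation is not visible in this view of the commit, so the following is only a sketch of the general idea, assuming a GZip input: count decompressed bytes while reading and abort once a cap is exceeded. The function name `Read-GzipWithLimit` and its parameters are hypothetical and are not the library's API.

```powershell
# Hypothetical helper: decompress GZip bytes, refusing to produce more than a given size
function Read-GzipWithLimit {
    param(
        [Parameter(Mandatory)] [byte[]]$CompressedBytes,
        [Parameter(Mandatory)] [long]$MaxDecompressedSize
    )
    $inputStream = New-Object System.IO.MemoryStream (,$CompressedBytes)
    $gzip = New-Object System.IO.Compression.GZipStream($inputStream, [System.IO.Compression.CompressionMode]::Decompress)
    $output = New-Object System.IO.MemoryStream
    $buffer = New-Object byte[] 4096
    $total = [long]0
    try {
        while (($read = $gzip.Read($buffer, 0, $buffer.Length)) -gt 0) {
            $total += $read
            # Stop as soon as the running total passes the cap
            if ($total -gt $MaxDecompressedSize) {
                throw "Decompressed size exceeds limit of $MaxDecompressedSize bytes"
            }
            $output.Write($buffer, 0, $read)
        }
        return ,$output.ToArray()
    } finally {
        $gzip.Dispose()
        $inputStream.Dispose()
        $output.Dispose()
    }
}

# Build a small, highly compressible payload (60,000 bytes of repeated CSV rows)
$plain = [Text.Encoding]::UTF8.GetBytes(('a,b,c' + "`n") * 10000)
$ms = New-Object System.IO.MemoryStream
$gz = New-Object System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Compress)
$gz.Write($plain, 0, $plain.Length)
$gz.Dispose()
$compressed = $ms.ToArray()

# A generous limit succeeds; a tight limit is rejected part-way through
$ok = Read-GzipWithLimit -CompressedBytes $compressed -MaxDecompressedSize 100000
$blocked = $false
try {
    $null = Read-GzipWithLimit -CompressedBytes $compressed -MaxDecompressedSize 1000
} catch {
    $blocked = $true
}
```

Checking the running total inside the read loop keeps the output bounded by roughly the limit plus one buffer, so the cap holds no matter how small the compressed input is.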
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+<Project Sdk="Microsoft.NET.Sdk">
+  <PropertyGroup>
+    <TargetFrameworks>net472;net8.0</TargetFrameworks>
+    <RootNamespace>Dataplat.Dbatools.Csv</RootNamespace>
+    <AssemblyName>Dataplat.Dbatools.Csv</AssemblyName>
+    <LangVersion>7.3</LangVersion>
+
+    <!-- NuGet Package Metadata -->
+    <PackageId>Dataplat.Dbatools.Csv</PackageId>
+    <Version>1.0.0</Version>
+    <Authors>dbatools team</Authors>
+    <Company>Dataplat</Company>
+    <Product>Dataplat.Dbatools.Csv</Product>
+    <Description>High-performance CSV reader and writer for .NET. Features streaming IDataReader for SqlBulkCopy, automatic compression (GZip, Deflate, Brotli, ZLib), multi-character delimiters, parallel processing, string interning, and robust error handling. 20%+ faster than LumenWorks CsvReader. From the trusted dbatools project.</Description>
+    <Copyright>Copyright (c) dbatools team</Copyright>
+    <PackageTags>csv;parser;reader;writer;datareader;idatareader;sqlbulkcopy;compression;gzip;brotli;dbatools;high-performance;parallel</PackageTags>
+    <PackageLicenseExpression>MIT</PackageLicenseExpression>
+    <PackageProjectUrl>https://github.com/dataplat/dbatools.library</PackageProjectUrl>
+    <RepositoryUrl>https://github.com/dataplat/dbatools.library</RepositoryUrl>
+    <RepositoryType>git</RepositoryType>
+    <RepositoryBranch>main</RepositoryBranch>
+    <PackageReadmeFile>README.md</PackageReadmeFile>
+    <PackageReleaseNotes>Initial release with high-performance CSV parsing, parallel processing support, and comprehensive edge case handling.</PackageReleaseNotes>
+    <PublishRepositoryUrl>true</PublishRepositoryUrl>
+    <EmbedUntrackedSources>true</EmbedUntrackedSources>
+    <IncludeSymbols>true</IncludeSymbols>
+    <SymbolPackageFormat>snupkg</SymbolPackageFormat>
+
+    <!-- Build settings -->
+    <GenerateDocumentationFile>true</GenerateDocumentationFile>
+    <NoWarn>CS1591</NoWarn>
+    <Nullable>disable</Nullable>
+    <ImplicitUsings>disable</ImplicitUsings>
+  </PropertyGroup>
+
+  <PropertyGroup Condition="'$(Configuration)' == 'Release'">
+    <DebugType>pdbonly</DebugType>
+    <Optimize>true</Optimize>
+  </PropertyGroup>
+
+  <!-- Link to CSV source files from main dbatools project -->
+  <ItemGroup>
+    <Compile Include="..\dbatools\Csv\**\*.cs" Link="%(RecursiveDir)%(Filename)%(Extension)" />
+  </ItemGroup>
+
+  <!-- Polyfill packages needed for net472 -->
+  <ItemGroup Condition="'$(TargetFramework)' == 'net472'">
+    <PackageReference Include="System.Buffers" Version="4.5.1" />
+    <PackageReference Include="System.Memory" Version="4.5.5" />
+  </ItemGroup>
+
+  <!-- Package README -->
+  <ItemGroup>
+    <None Include="README.md" Pack="true" PackagePath="\" />
+  </ItemGroup>
+
+  <!-- SourceLink for debugging support -->
+  <ItemGroup>
+    <PackageReference Include="Microsoft.SourceLink.GitHub" Version="8.0.0" PrivateAssets="All" />
+  </ItemGroup>
+</Project>
