Commit 163546d

dataroaring and claude committed
Add comprehensive migration documentation section
Create consolidated migration guides covering:

- Overview page with migration path comparison table
- PostgreSQL to Doris (JDBC Catalog, Flink CDC, Export/Import)
- MySQL to Doris (Flink CDC, JDBC Catalog, DataX)
- Elasticsearch to Doris (ES Catalog, inverted index migration)
- Other OLAP systems (ClickHouse, Greenplum, Hive/Iceberg/Hudi)

Each guide includes data type mappings, step-by-step instructions, and troubleshooting for common issues. Chinese translations included.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 9d323ee commit 163546d

11 files changed

Lines changed: 3638 additions & 0 deletions

File tree

docs/migration/elasticsearch-to-doris.md

Lines changed: 450 additions & 0 deletions
Large diffs are not rendered by default.

docs/migration/mysql-to-doris.md

Lines changed: 408 additions & 0 deletions
Large diffs are not rendered by default.

docs/migration/other-olap-to-doris.md

Lines changed: 457 additions & 0 deletions
Large diffs are not rendered by default.

docs/migration/overview.md

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
---
{
    "title": "Migration Overview",
    "language": "en",
    "description": "Guide to migrating data from various databases and data systems to Apache Doris"
}
---

Apache Doris provides multiple methods to migrate data from various source systems. This guide helps you choose the best migration approach based on your source system and requirements.

## Migration Paths

| Source System | Recommended Method | Real-time Sync | Full Migration | Incremental |
|---------------|-------------------|----------------|----------------|-------------|
| [PostgreSQL](./postgresql-to-doris.md) | JDBC Catalog / Flink CDC | Yes | Yes | Yes |
| [MySQL](./mysql-to-doris.md) | Flink CDC / JDBC Catalog | Yes | Yes | Yes |
| [Elasticsearch](./elasticsearch-to-doris.md) | ES Catalog | No | Yes | Manual |
| [ClickHouse](./other-olap-to-doris.md#clickhouse) | JDBC Catalog | No | Yes | Manual |
| [Greenplum](./other-olap-to-doris.md#greenplum) | JDBC Catalog | No | Yes | Manual |
| [Hive/Iceberg/Hudi](./other-olap-to-doris.md#data-lake) | Multi-Catalog | No | Yes | Yes |

## Choosing a Migration Method

### Catalog-Based Migration (Recommended)

Doris's [Multi-Catalog](../lakehouse/lakehouse-overview.md) feature allows you to directly query external data sources without data movement. This is the recommended approach for:

- **Initial exploration**: Query source data before deciding on migration strategy
- **Hybrid queries**: Join data across Doris and external sources
- **Incremental migration**: Gradually move data while keeping source accessible

```sql
-- Create a catalog to connect to your source
CREATE CATALOG pg_catalog PROPERTIES (
    'type' = 'jdbc',
    'user' = 'username',
    'password' = 'password',
    'jdbc_url' = 'jdbc:postgresql://host:5432/database',
    'driver_url' = 'postgresql-42.5.6.jar',
    'driver_class' = 'org.postgresql.Driver'
);

-- Query source data directly
SELECT * FROM pg_catalog.schema_name.table_name LIMIT 10;

-- Migrate data with INSERT INTO SELECT
INSERT INTO doris_db.doris_table
SELECT * FROM pg_catalog.schema_name.source_table;
```

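The hybrid-query use case listed above can be sketched as follows. This is a minimal illustration, not a prescribed pattern: `doris_db.orders`, `public.customers`, and their columns are hypothetical names, and `internal` refers to Doris's built-in catalog for native tables.

```sql
-- List the catalogs Doris knows about, including the built-in `internal` catalog
SHOW CATALOGS;

-- Hypothetical tables: doris_db.orders is a native Doris table,
-- public.customers lives in the PostgreSQL source behind pg_catalog
SELECT c.customer_name,
       COUNT(*) AS order_count
FROM internal.doris_db.orders AS o
JOIN pg_catalog.public.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.customer_name
ORDER BY order_count DESC
LIMIT 10;
```

Using fully qualified `catalog.database.table` names keeps the query independent of whichever catalog the session currently has selected.
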
### Flink CDC (Real-time Synchronization)

[Flink CDC](../ecosystem/flink-doris-connector.md) is ideal for:

- **Real-time data sync**: Capture changes as they happen
- **Full database migration**: Sync entire databases with automatic table creation
- **Zero-downtime migration**: Keep source and Doris in sync during transition

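As a rough sketch of the real-time sync case, a single MySQL table can be streamed into Doris with Flink SQL. This covers only one table, not the whole-database sync mentioned above. All host names, credentials, and table names are placeholders, and the connector options shown should be checked against the Flink CDC and Flink Doris Connector versions you deploy.

```sql
-- Run in the Flink SQL client, not in Doris.
-- Checkpointing must be enabled so the Doris sink can commit data.
SET 'execution.checkpointing.interval' = '10s';

-- Source: a MySQL table read through the mysql-cdc connector (placeholder values)
CREATE TABLE mysql_source (
    id INT,
    created_at TIMESTAMP(0),
    data STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'mysql-host',
    'port' = '3306',
    'username' = 'mysql_user',
    'password' = 'mysql_password',
    'database-name' = 'source_db',
    'table-name' = 'source_table'
);

-- Sink: an existing Doris table, addressed through the FE HTTP port (placeholder values)
CREATE TABLE doris_sink (
    id INT,
    created_at TIMESTAMP(0),
    data STRING
) WITH (
    'connector' = 'doris',
    'fenodes' = 'doris-fe-host:8030',
    'table.identifier' = 'doris_db.pilot_table',
    'username' = 'doris_user',
    'password' = 'doris_password',
    'sink.label-prefix' = 'mysql_source_sync'
);

-- Continuously apply the snapshot and subsequent changes to Doris
INSERT INTO doris_sink
SELECT id, created_at, data FROM mysql_source;
```

The target Doris table must already exist in this form of the job; a Unique Key table is the usual choice so that updates from the source overwrite earlier versions of the same row.
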
### Export-Import Method

For scenarios where direct connectivity is limited:

1. Export data from the source system to files (CSV, Parquet, JSON)
2. Stage the files in object storage or a distributed file system (S3, GCS, HDFS)
3. Load into Doris using [S3 Load](../data-operate/import/data-source/amazon-s3.md) or [Broker Load](../data-operate/import/import-way/broker-load-manual.md)

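Step 3 might look like the following for Parquet files staged in S3. This is a sketch under assumptions: the bucket path, label, region, and credentials are placeholders, and the S3 property names differ between Doris releases (older versions use keys such as `AWS_ENDPOINT` and `AWS_ACCESS_KEY`), so confirm them in the S3 Load documentation for your version.

```sql
-- Load staged Parquet files into an existing Doris table (placeholder names throughout)
LOAD LABEL doris_db.orders_export_20240101
(
    DATA INFILE("s3://your-bucket/exports/orders/*.parquet")
    INTO TABLE doris_table
    FORMAT AS "parquet"
)
WITH S3
(
    "s3.endpoint" = "s3.us-east-1.amazonaws.com",
    "s3.region" = "us-east-1",
    "s3.access_key" = "<access-key>",
    "s3.secret_key" = "<secret-key>"
);

-- The load runs asynchronously; check its state by label
SHOW LOAD WHERE LABEL = "orders_export_20240101";
```
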
## Migration Planning Checklist

Before migrating, consider the following:

1. **Data Volume Assessment**
   - Total data size and row count
   - Daily/hourly data growth rate
   - Historical data retention requirements

2. **Schema Design**
   - Choose an appropriate [Data Model](../table-design/data-model/overview.md) (Duplicate, Unique, Aggregate)
   - Plan [Partitioning](../table-design/data-partitioning/data-distribution.md) strategy
   - Define [Bucketing](../table-design/data-partitioning/data-bucketing.md) keys

3. **Data Type Mapping**
   - Review type compatibility (see migration guides for specific mappings)
   - Handle special types (arrays, JSON, timestamps with timezone)

4. **Performance Requirements**
   - Query latency expectations
   - Concurrent query load
   - Data freshness requirements

5. **Migration Window**
   - Acceptable downtime (if any)
   - Sync vs. async migration needs

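To make the schema design items concrete, here is a minimal sketch of a table that combines a data model, range partitioning, and hash bucketing. The table, its columns, the monthly partition boundaries, and the bucket count are illustrative assumptions, not recommendations for any particular workload.

```sql
-- Hypothetical append-only event table using the Duplicate model
CREATE TABLE doris_db.events (
    event_id BIGINT,
    event_time DATETIME,
    user_id BIGINT,
    payload VARCHAR(1024)
)
DUPLICATE KEY(event_id, event_time)
-- One partition per month; each range is closed on the left and open on the right
PARTITION BY RANGE(event_time) (
    PARTITION p202401 VALUES [('2024-01-01 00:00:00'), ('2024-02-01 00:00:00')),
    PARTITION p202402 VALUES [('2024-02-01 00:00:00'), ('2024-03-01 00:00:00'))
)
-- Rows within each partition are spread across 8 buckets by event_id
DISTRIBUTED BY HASH(event_id) BUCKETS 8;
```

Partitioning on the time column lets date-range batch loads and retention cleanup operate on whole partitions, while the bucketing key controls how evenly rows spread across nodes.
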
## Best Practices

### Start with a Pilot Table

Before migrating your entire database, test with a representative table:

```sql
-- 1. Create the Doris table with appropriate schema
CREATE TABLE pilot_table (
    id INT,
    created_at DATETIME,
    data VARCHAR(255)
)
UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 8;

-- 2. Migrate data
INSERT INTO pilot_table
SELECT id, created_at, data
FROM source_catalog.db.source_table;

-- 3. Validate row counts
SELECT COUNT(*) FROM pilot_table;
SELECT COUNT(*) FROM source_catalog.db.source_table;
```

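Row counts alone can hide problems, so a slightly broader check may be useful. The sketch below reuses the pilot table from the example above and compares a few aggregates on both sides. Because `pilot_table` uses `UNIQUE KEY(id)`, rows that share an `id` in the source collapse into one row in Doris, so the source's `COUNT(DISTINCT id)` is the figure to compare against the Doris row count.

```sql
-- Summary of the migrated table
SELECT COUNT(*)           AS row_count,
       COUNT(DISTINCT id) AS distinct_ids,
       MIN(id)            AS min_id,
       MAX(id)            AS max_id,
       MIN(created_at)    AS min_created_at,
       MAX(created_at)    AS max_created_at
FROM pilot_table;

-- Same summary computed on the source through the catalog
SELECT COUNT(*)           AS row_count,
       COUNT(DISTINCT id) AS distinct_ids,
       MIN(id)            AS min_id,
       MAX(id)            AS max_id,
       MIN(created_at)    AS min_created_at,
       MAX(created_at)    AS max_created_at
FROM source_catalog.db.source_table;
```

If `row_count` differs but `distinct_ids` matches, the gap comes from duplicate keys in the source rather than from lost rows.
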
### Batch Large Migrations

For tables with billions of rows, migrate in batches:

```sql
-- Migrate by date range
INSERT INTO doris_table
SELECT * FROM source_catalog.db.source_table
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
```

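One way to keep batches traceable is to give each one its own label and verify it before moving on. The sketch below assumes the same hypothetical tables as the example above; the label name is arbitrary, and the exact behavior when a label is reused should be confirmed against the `INSERT` documentation for your Doris version.

```sql
-- Batch for January 2024, tagged with an explicit label
INSERT INTO doris_table WITH LABEL migrate_2024_01
SELECT * FROM source_catalog.db.source_table
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';

-- Verify the batch before starting the next date range
SELECT COUNT(*) FROM doris_table
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';

SELECT COUNT(*) FROM source_catalog.db.source_table
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
```

Half-open ranges (`>=` start, `<` end) ensure that consecutive batches neither overlap nor skip rows at the boundary.
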
### Monitor Migration Progress

Track load jobs using:

```sql
-- Check active load jobs
SHOW LOAD WHERE STATE = 'LOADING';

-- Check recent load history
SHOW LOAD ORDER BY CreateTime DESC LIMIT 10;
```

## Next Steps

Choose your source system to see detailed migration instructions:

- [PostgreSQL to Doris](./postgresql-to-doris.md)
- [MySQL to Doris](./mysql-to-doris.md)
- [Elasticsearch to Doris](./elasticsearch-to-doris.md)
- [Other OLAP Systems to Doris](./other-olap-to-doris.md)