| 1 | +--- |
| 2 | +{ |
| 3 | + "title": "Migration Overview", |
| 4 | + "language": "en", |
| 5 | + "description": "Guide to migrating data from various databases and data systems to Apache Doris" |
| 6 | +} |
| 7 | +--- |
| 8 | + |
| 9 | +Apache Doris provides multiple methods to migrate data from various source systems. This guide helps you choose the best migration approach based on your source system and requirements. |
| 10 | + |
| 11 | +## Migration Paths |
| 12 | + |
| 13 | +| Source System | Recommended Method | Real-time Sync | Full Migration | Incremental | |
| 14 | +|---------------|-------------------|----------------|----------------|-------------| |
| 15 | +| [PostgreSQL](./postgresql-to-doris.md) | JDBC Catalog / Flink CDC | Yes | Yes | Yes | |
| 16 | +| [MySQL](./mysql-to-doris.md) | Flink CDC / JDBC Catalog | Yes | Yes | Yes | |
| 17 | +| [Elasticsearch](./elasticsearch-to-doris.md) | ES Catalog | No | Yes | Manual | |
| 18 | +| [ClickHouse](./other-olap-to-doris.md#clickhouse) | JDBC Catalog | No | Yes | Manual | |
| 19 | +| [Greenplum](./other-olap-to-doris.md#greenplum) | JDBC Catalog | No | Yes | Manual | |
| 20 | +| [Hive/Iceberg/Hudi](./other-olap-to-doris.md#data-lake) | Multi-Catalog | No | Yes | Yes | |
| 21 | + |
| 22 | +## Choosing a Migration Method |
| 23 | + |
| 24 | +### Catalog-Based Migration (Recommended) |
| 25 | + |
| 26 | +Doris's [Multi-Catalog](../lakehouse/lakehouse-overview.md) feature lets you query external data sources directly, without moving the data first. This is the recommended approach for: |
| 27 | + |
| 28 | +- **Initial exploration**: Query source data before deciding on a migration strategy |
| 29 | +- **Hybrid queries**: Join data across Doris and external sources |
| 30 | +- **Incremental migration**: Gradually move data while keeping the source accessible |
| 31 | + |
| 32 | +```sql |
| 33 | +-- Create a catalog to connect to your source |
| 34 | +CREATE CATALOG pg_catalog PROPERTIES ( |
| 35 | + 'type' = 'jdbc', |
| 36 | + 'user' = 'username', |
| 37 | + 'password' = 'password', |
| 38 | + 'jdbc_url' = 'jdbc:postgresql://host:5432/database', |
| 39 | + 'driver_url' = 'postgresql-42.5.6.jar', |
| 40 | + 'driver_class' = 'org.postgresql.Driver' |
| 41 | +); |
| 42 | + |
| 43 | +-- Query source data directly |
| 44 | +SELECT * FROM pg_catalog.schema_name.table_name LIMIT 10; |
| 45 | + |
| 46 | +-- Migrate data with INSERT INTO SELECT (columns are matched by position, so order and types must line up with the target table) |
| 47 | +INSERT INTO doris_db.doris_table |
| 48 | +SELECT * FROM pg_catalog.schema_name.source_table; |
| 49 | +``` |
| 50 | + |
| 51 | +### Flink CDC (Real-time Synchronization) |
| 52 | + |
| 53 | +[Flink CDC](../ecosystem/flink-doris-connector.md) is ideal for: |
| 54 | + |
| 55 | +- **Real-time data sync**: Capture changes as they happen |
| 56 | +- **Full database migration**: Sync entire databases with automatic table creation |
| 57 | +- **Zero-downtime migration**: Keep source and Doris in sync during transition |
| 58 | + |
| 59 | +### Export-Import Method |
| 60 | + |
| 61 | +For scenarios where direct connectivity is limited: |
| 62 | + |
| 63 | +1. Export data from the source system to files (CSV, Parquet, JSON) |
| 64 | +2. Stage the files in object storage or a distributed file system (S3, GCS, HDFS) |
| 65 | +3. Load into Doris using [S3 Load](../data-operate/import/data-source/amazon-s3.md) or [Broker Load](../data-operate/import/import-way/broker-load-manual.md) |
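| | + |
| | +As a minimal sketch of step 3, recent Doris versions also expose an `S3` table-valued function that can be combined with `INSERT INTO ... SELECT`. The bucket path, endpoint, region, credentials, and target table below are placeholders, and the exact property names can differ between Doris versions, so check the S3 Load guide for your release: |
| | + |
| | +```sql |
| | +-- Assumption: Parquet files were exported to a hypothetical bucket path |
| | +INSERT INTO doris_db.doris_table |
| | +SELECT * FROM S3( |
| | +    "uri" = "s3://your-bucket/export/source_table/*.parquet", |
| | +    "format" = "parquet", |
| | +    "s3.endpoint" = "s3.us-east-1.amazonaws.com", |
| | +    "s3.region" = "us-east-1", |
| | +    "s3.access_key" = "your_access_key", |
| | +    "s3.secret_key" = "your_secret_key" |
| | +); |
| | +``` |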
| 66 | + |
| 67 | +## Migration Planning Checklist |
| 68 | + |
| 69 | +Before migrating, consider the following: |
| 70 | + |
| 71 | +1. **Data Volume Assessment** |
| 72 | + - Total data size and row count |
| 73 | + - Daily/hourly data growth rate |
| 74 | + - Historical data retention requirements |
| 75 | + |
| 76 | +2. **Schema Design** |
| 77 | + - Choose an appropriate [Data Model](../table-design/data-model/overview.md) (Duplicate, Unique, Aggregate) |
| 78 | + - Plan a [Partitioning](../table-design/data-partitioning/data-distribution.md) strategy |
| 79 | + - Define [Bucketing](../table-design/data-partitioning/data-bucketing.md) keys |
| 80 | + |
| 81 | +3. **Data Type Mapping** |
| 82 | + - Review type compatibility (see migration guides for specific mappings) |
| 83 | + - Handle special types (arrays, JSON, timestamps with timezone) |
| 84 | + |
| 85 | +4. **Performance Requirements** |
| 86 | + - Query latency expectations |
| 87 | + - Concurrent query load |
| 88 | + - Data freshness requirements |
| 89 | + |
| 90 | +5. **Migration Window** |
| 91 | + - Acceptable downtime (if any) |
| 92 | + - Sync vs. async migration needs |
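| | + |
| | +For the data type mapping item, explicit casts in the migration query make conversions visible instead of relying on implicit mapping. The sketch below assumes a hypothetical PostgreSQL source table with a `timestamptz` column `created_at` and a `jsonb` column `payload`; the column names and chosen target types are illustrative only: |
| | + |
| | +```sql |
| | +-- Hypothetical columns: adjust names and target types to your schema |
| | +INSERT INTO doris_db.doris_table (id, created_at, payload) |
| | +SELECT |
| | +    id, |
| | +    CAST(created_at AS DATETIME),  -- DATETIME does not store a time zone |
| | +    CAST(payload AS STRING)        -- keep the JSON document as text |
| | +FROM pg_catalog.schema_name.source_table; |
| | +``` |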
| 93 | + |
| 94 | +## Best Practices |
| 95 | + |
| 96 | +### Start with a Pilot Table |
| 97 | + |
| 98 | +Before migrating your entire database, test with a representative table: |
| 99 | + |
| 100 | +```sql |
| 101 | +-- 1. Create the Doris table with appropriate schema |
| 102 | +CREATE TABLE pilot_table ( |
| 103 | + id INT, |
| 104 | + created_at DATETIME, |
| 105 | + data VARCHAR(255) |
| 106 | +) |
| 107 | +UNIQUE KEY(id) |
| 108 | +DISTRIBUTED BY HASH(id) BUCKETS 8; |
| 109 | + |
| 110 | +-- 2. Migrate data |
| 111 | +INSERT INTO pilot_table |
| 112 | +SELECT id, created_at, data |
| 113 | +FROM source_catalog.db.source_table; |
| 114 | + |
| 115 | +-- 3. Validate row counts |
| 116 | +SELECT COUNT(*) FROM pilot_table; |
| 117 | +SELECT COUNT(*) FROM source_catalog.db.source_table; |
| 118 | +``` |
| 119 | + |
| 120 | +### Batch Large Migrations |
| 121 | + |
| 122 | +For tables with billions of rows, migrate in batches: |
| 123 | + |
| 124 | +```sql |
| 125 | +-- Migrate by date range |
| 126 | +INSERT INTO doris_table |
| 127 | +SELECT * FROM source_catalog.db.source_table |
| 128 | +WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01'; |
| 129 | +``` |
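| | + |
| | +To confirm each batch before moving on, one option is to compare row counts for the same date range on both sides. This sketch reuses the table names and the `created_at` filter from the example above; matching counts are a quick sanity check, not proof that every value migrated correctly: |
| | + |
| | +```sql |
| | +-- Rows loaded into Doris for the batch |
| | +SELECT COUNT(*) FROM doris_table |
| | +WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01'; |
| | + |
| | +-- Rows in the source for the same range |
| | +SELECT COUNT(*) FROM source_catalog.db.source_table |
| | +WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01'; |
| | +``` |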
| 130 | + |
| 131 | +### Monitor Migration Progress |
| 132 | + |
| 133 | +Track load jobs using: |
| 134 | + |
| 135 | +```sql |
| 136 | +-- Check active load jobs |
| 137 | +SHOW LOAD WHERE STATE = 'LOADING'; |
| 138 | + |
| 139 | +-- Check recent load history |
| 140 | +SHOW LOAD ORDER BY CreateTime DESC LIMIT 10; |
| 141 | +``` |
| 142 | + |
| 143 | +## Next Steps |
| 144 | + |
| 145 | +Choose your source system to see detailed migration instructions: |
| 146 | + |
| 147 | +- [PostgreSQL to Doris](./postgresql-to-doris.md) |
| 148 | +- [MySQL to Doris](./mysql-to-doris.md) |
| 149 | +- [Elasticsearch to Doris](./elasticsearch-to-doris.md) |
| 150 | +- [Other OLAP Systems to Doris](./other-olap-to-doris.md) |