Remove how-to instructions, keep principles and type mappings

Strip step-by-step code examples from all migration docs (EN + ZH-CN), since their accuracy is unverified. Retain considerations/principles, data type mapping tables, reference tables (SQL conversion, DSL-to-SQL, table engine mapping), brief migration option descriptions with links to official Doris docs, and validation checklists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This guide covers migrating data from MySQL to Apache Doris.
3. **Full Database Sync**: The Flink Doris Connector supports synchronizing entire MySQL databases, including DDL operations.
4. **Auto-Increment Columns**: MySQL AUTO_INCREMENT columns can map to Doris's auto-increment feature. When migrating, you can preserve original IDs by explicitly specifying column values.
5. **ENUM and SET Types**: MySQL ENUM and SET types are migrated as STRING in Doris.
6. **Binary Data**: Binary data (BLOB, BINARY) is typically stored as STRING. Consider using HEX encoding for binary data during migration.
7. **Large Table Performance**: For tables with billions of rows, consider increasing Flink parallelism, tuning the Doris write buffer, and using batch mode for the initial load.
## Data Type Mapping
| MySQL Type | Doris Type | Notes |
| --- | --- | --- |
Flink CDC captures MySQL binlog changes and streams them to Doris.
- Full database migration with automatic table creation
- Continuous sync with schema evolution support
#### Prerequisites

- MySQL 5.7+ or 8.0+ with binlog enabled
- Flink 1.15+ with Flink CDC 3.x and the Flink Doris Connector
#### Step 1: Configure MySQL Binlog

Ensure these settings in MySQL:
```ini
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
expire_logs_days = 7
```
Create a user for CDC:
```sql
CREATE USER 'flink_cdc'@'%' IDENTIFIED BY 'password';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'flink_cdc'@'%';
FLUSH PRIVILEGES;
```
#### Step 2: Single Table Sync with Flink SQL
```sql
-- Source: MySQL CDC
CREATE TABLE mysql_orders (
    order_id INT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10, 2),
    status STRING,
    created_at TIMESTAMP(3),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'mysql-host',
    'port' = '3306',
    'username' = 'flink_cdc',
    'password' = 'password',
    'database-name' = 'source_db',
    'table-name' = 'orders',
    'server-time-zone' = 'UTC'
);

-- Sink: Doris
CREATE TABLE doris_orders (
    order_id INT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10, 2),
    status STRING,
    created_at DATETIME
) WITH (
    'connector' = 'doris',
    'fenodes' = 'doris-fe:8030',
    'table.identifier' = 'target_db.orders',
    'username' = 'doris_user',
    'password' = 'doris_password',
    'sink.enable-2pc' = 'true',
    'sink.label-prefix' = 'mysql_orders_sync'
);

-- Start synchronization
INSERT INTO doris_orders SELECT * FROM mysql_orders;
```
#### Step 3: Full Database Sync with Flink Doris Connector
The Flink Doris Connector provides a powerful whole-database sync feature.
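A whole-database sync is typically launched through the connector's bundled CdcTools entry point. The sketch below mirrors the multi-database example in the Multi-Tenant Migration section; the host names, credentials, and exact flag set are assumptions and should be checked against the Flink Doris Connector documentation for your version:

```shell
# Hypothetical single-database sync; all hosts and credentials are placeholders
<FLINK_HOME>/bin/flink run \
    -c org.apache.doris.flink.tools.cdc.CdcTools \
    flink-doris-connector.jar \
    mysql-sync-database \
    --database target_db \
    --mysql-conf hostname=mysql-host \
    --mysql-conf username=flink_cdc \
    --mysql-conf password=password \
    --mysql-conf database-name=source_db \
    --doris-conf fenodes=doris-fe:8030 \
    --doris-conf username=doris_user \
    --doris-conf password=doris_password \
    --including-tables ".*"
```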
### Option 2: JDBC Catalog

The [JDBC Catalog](../lakehouse/catalogs/jdbc-catalog.md) allows direct querying and batch migration from MySQL. This is the simplest approach for one-time or periodic batch migrations:

```sql
SELECT order_id, customer_id, order_date, total_amount, status
FROM mysql_catalog.source_db.orders;
```
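For reference, creating such a catalog and running a batch copy follows the `CREATE CATALOG` syntax from the Doris JDBC catalog documentation; the connection values and driver jar name below are assumptions for illustration:

```sql
-- Hypothetical connection values; adjust to your environment
CREATE CATALOG mysql_catalog PROPERTIES (
    "type" = "jdbc",
    "user" = "mysql_user",
    "password" = "mysql_password",
    "jdbc_url" = "jdbc:mysql://mysql-host:3306/source_db",
    "driver_url" = "mysql-connector-j-8.0.33.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);

-- One-time batch copy into an existing Doris table
INSERT INTO internal.target_db.orders
SELECT * FROM mysql_catalog.source_db.orders;
```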
### Option 3: DataX

[DataX](https://github.com/alibaba/DataX) is a widely-used data synchronization tool that supports MySQL to Doris migration via the `mysqlreader` and `doriswriter` plugins.

#### DataX Job Configuration
```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 4
      }
    },
    "content": [{
      "reader": {
        "name": "mysqlreader",
        "parameter": {
          "username": "mysql_user",
          "password": "mysql_password",
          "connection": [{
            "querySql": ["SELECT order_id, customer_id, order_date, total_amount, status FROM orders"]
          }]
        }
      }
    }]
  }
}
```
### Handling Auto-Increment Columns

MySQL AUTO_INCREMENT columns should map to Doris's auto-increment feature:

```sql
-- Doris table with an auto-increment key
CREATE TABLE users (
    user_id BIGINT NOT NULL AUTO_INCREMENT,
    username VARCHAR(64),
    email VARCHAR(128)
)
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 8;
```

For migration, you may want to preserve original IDs:

```sql
-- Listing the auto-increment column explicitly preserves the original IDs
INSERT INTO users (user_id, username, email)
SELECT user_id, username, email
FROM mysql_catalog.source_db.users;
```
### Handling ENUM and SET Types

MySQL ENUM and SET types are migrated as STRING in Doris:

```sql
-- MySQL source
CREATE TABLE products (
    id INT,
    status ENUM('active', 'inactive', 'pending'),
    tags SET('featured', 'sale', 'new')
);

-- Doris target
CREATE TABLE products (
    id INT,
    status VARCHAR(32),
    tags VARCHAR(128)
)
DISTRIBUTED BY HASH(id) BUCKETS 8;
```
### Handling Binary Data

Binary data (BLOB, BINARY) is typically stored as a hex-encoded STRING:

```sql
-- Use HEX encoding for binary data
INSERT INTO doris_table
SELECT
    id,
    HEX(binary_col) AS binary_hex
FROM mysql_catalog.source_db.table_with_binary;
```
### Large Table Migration Performance

For tables with billions of rows:

1. **Increase Flink parallelism**:

   ```sql
   SET 'parallelism.default' = '8';
   ```

2. **Tune the Doris write buffer**:

   ```sql
   -- In the Flink sink configuration
   'sink.buffer-size' = '1048576',
   'sink.buffer-count' = '3'
   ```

3. **Use batch mode for the initial load**:

   ```sql
   -- Flink sink batch configuration
   'sink.enable-2pc' = 'false',
   'sink.properties.format' = 'json'
   ```
## Multi-Tenant Migration

For MySQL instances with multiple databases:

```shell
# Sync multiple databases
<FLINK_HOME>/bin/flink run \
    -c org.apache.doris.flink.tools.cdc.CdcTools \
    flink-doris-connector.jar \
    mysql-sync-database \
    --database "db1|db2|db3" \
    --mysql-conf hostname=mysql-host \
    --mysql-conf database-name="db1|db2|db3" \
    --doris-conf fenodes=doris-fe:8030 \
    --including-tables ".*"
```
## Validation

After migration, validate data integrity:

```sql
-- Row count comparison
SELECT
    'mysql' AS source,
    COUNT(*) AS cnt
FROM mysql_catalog.source_db.orders
UNION ALL
SELECT
    'doris' AS source,
    COUNT(*) AS cnt
FROM internal.target_db.orders;

-- Checksum validation (sample)
SELECT
    SUM(order_id) AS id_sum,
    SUM(total_amount) AS amount_sum,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM internal.target_db.orders;

-- Compare with MySQL
SELECT
    SUM(order_id) AS id_sum,
    SUM(total_amount) AS amount_sum,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM mysql_catalog.source_db.orders;
```
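These checks can be automated by running the queries against each system and diffing the results. A minimal, connector-agnostic sketch in Python (the metric names follow the checksum queries above; fetching the values is left to whichever client library you use):

```python
def compare_metrics(mysql_metrics: dict, doris_metrics: dict) -> list[str]:
    """Return descriptions of any metric that differs between source and target."""
    mismatches = []
    for key, expected in mysql_metrics.items():
        actual = doris_metrics.get(key)
        if actual != expected:
            mismatches.append(f"{key}: mysql={expected!r} doris={actual!r}")
    return mismatches


# Example with hypothetical values gathered from the checksum queries
src = {"id_sum": 1_000_000, "amount_sum": 123456.78, "unique_customers": 4200}
dst = {"id_sum": 1_000_000, "amount_sum": 123456.78, "unique_customers": 4199}
print(compare_metrics(src, dst))  # → ['unique_customers: mysql=4200 doris=4199']
```

An empty list means the sampled metrics agree; any mismatch pinpoints which aggregate to investigate further.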