
Commit 23f1d2d

dataroaring and claude committed
Remove how-to instructions, keep principles and type mappings
Strip step-by-step code examples from all migration docs (EN + ZH-CN) since their accuracy is unverified. Retain considerations/principles, data type mapping tables, reference tables (SQL conversion, DSL-to-SQL, table engine mapping), brief migration option descriptions with links to official Doris docs, and validation checklists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 38f2640 commit 23f1d2d

File tree

10 files changed: +157 −2753 lines


docs/migration/elasticsearch-to-doris.md

Lines changed: 15 additions & 365 deletions
Large diffs are not rendered by default.

docs/migration/mysql-to-doris.md

Lines changed: 12 additions & 339 deletions
````diff
@@ -16,6 +16,14 @@ This guide covers migrating data from MySQL to Apache Doris. MySQL is one of the
 
 3. **Full Database Sync**: The Flink Doris Connector supports synchronizing entire MySQL databases including DDL operations.
 
+4. **Auto Increment Columns**: MySQL AUTO_INCREMENT columns can map to Doris's auto-increment feature. When migrating, you can preserve original IDs by explicitly specifying column values.
+
+5. **ENUM and SET Types**: MySQL ENUM and SET types are migrated as STRING in Doris.
+
+6. **Binary Data**: Binary data (BLOB, BINARY) is typically stored as STRING. Consider using HEX encoding for binary data during migration.
+
+7. **Large Table Performance**: For tables with billions of rows, consider increasing Flink parallelism, tuning Doris write buffer, and using batch mode for initial load.
+
 ## Data Type Mapping
 
 | MySQL Type | Doris Type | Notes |
@@ -54,352 +62,17 @@ Flink CDC captures MySQL binlog changes and streams them to Doris. This is the r
 - Full database migration with automatic table creation
 - Continuous sync with schema evolution support
 
-#### Prerequisites
-
-- MySQL 5.7+ or 8.0+ with binlog enabled
-- Flink 1.15+ with Flink CDC 3.x and Flink Doris Connector
-
-#### Step 1: Configure MySQL Binlog
-
-Ensure these settings in MySQL:
-
-```ini
-[mysqld]
-server-id = 1
-log_bin = mysql-bin
-binlog_format = ROW
-binlog_row_image = FULL
-expire_logs_days = 7
-```
-
-Create a user for CDC:
-
-```sql
-CREATE USER 'flink_cdc'@'%' IDENTIFIED BY 'password';
-GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'flink_cdc'@'%';
-FLUSH PRIVILEGES;
-```
-
-#### Step 2: Single Table Sync with Flink SQL
-
-```sql
--- Source: MySQL CDC
-CREATE TABLE mysql_orders (
-  order_id INT,
-  customer_id INT,
-  order_date DATE,
-  total_amount DECIMAL(10, 2),
-  status STRING,
-  created_at TIMESTAMP(3),
-  PRIMARY KEY (order_id) NOT ENFORCED
-) WITH (
-  'connector' = 'mysql-cdc',
-  'hostname' = 'mysql-host',
-  'port' = '3306',
-  'username' = 'flink_cdc',
-  'password' = 'password',
-  'database-name' = 'source_db',
-  'table-name' = 'orders',
-  'server-time-zone' = 'UTC'
-);
-
--- Sink: Doris
-CREATE TABLE doris_orders (
-  order_id INT,
-  customer_id INT,
-  order_date DATE,
-  total_amount DECIMAL(10, 2),
-  status STRING,
-  created_at DATETIME
-) WITH (
-  'connector' = 'doris',
-  'fenodes' = 'doris-fe:8030',
-  'table.identifier' = 'target_db.orders',
-  'username' = 'doris_user',
-  'password' = 'doris_password',
-  'sink.enable-2pc' = 'true',
-  'sink.label-prefix' = 'mysql_orders_sync'
-);
-
--- Start synchronization
-INSERT INTO doris_orders SELECT * FROM mysql_orders;
-```
-
-#### Step 3: Full Database Sync with Flink Doris Connector
+**Prerequisites**: MySQL 5.7+ or 8.0+ with binlog enabled; Flink 1.15+ with Flink CDC 3.x and Flink Doris Connector.
 
-The Flink Doris Connector provides a powerful whole-database sync feature:
-
-```shell
-<FLINK_HOME>/bin/flink run \
-    -c org.apache.doris.flink.tools.cdc.CdcTools \
-    flink-doris-connector-1.18-25.1.0.jar \
-    mysql-sync-database \
-    --database source_db \
-    --mysql-conf hostname=mysql-host \
-    --mysql-conf port=3306 \
-    --mysql-conf username=flink_cdc \
-    --mysql-conf password=password \
-    --mysql-conf database-name=source_db \
-    --doris-conf fenodes=doris-fe:8030 \
-    --doris-conf username=doris_user \
-    --doris-conf password=doris_password \
-    --doris-conf jdbc-url=jdbc:mysql://doris-fe:9030 \
-    --table-conf replication_num=3 \
-    --including-tables "orders|customers|products"
-```
-
-Key options:
-
-| Parameter | Description |
-|-----------|-------------|
-| `--including-tables` | Regex pattern for tables to include |
-| `--excluding-tables` | Regex pattern for tables to exclude |
-| `--multi-to-one-origin` | Map multiple source tables to one target |
-| `--create-table-only` | Only create tables without syncing data |
+For detailed setup, see the [Flink Doris Connector](../ecosystem/flink-doris-connector.md) documentation.
 
 ### Option 2: JDBC Catalog
 
-The JDBC Catalog allows direct querying and batch migration from MySQL.
-
-#### Step 1: Download MySQL JDBC Driver
-
-```bash
-wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.33/mysql-connector-java-8.0.33.jar
-cp mysql-connector-java-8.0.33.jar $DORIS_HOME/fe/jdbc_drivers/
-cp mysql-connector-java-8.0.33.jar $DORIS_HOME/be/jdbc_drivers/
-```
-
-#### Step 2: Create MySQL Catalog
-
-```sql
-CREATE CATALOG mysql_catalog PROPERTIES (
-  'type' = 'jdbc',
-  'user' = 'mysql_user',
-  'password' = 'mysql_password',
-  'jdbc_url' = 'jdbc:mysql://mysql-host:3306/source_db',
-  'driver_url' = 'mysql-connector-java-8.0.33.jar',
-  'driver_class' = 'com.mysql.cj.jdbc.Driver'
-);
-```
-
-#### Step 3: Query and Migrate
-
-```sql
--- Explore source data
-SWITCH mysql_catalog;
-SHOW DATABASES;
-USE source_db;
-SHOW TABLES;
-SELECT * FROM orders LIMIT 10;
-
--- Create target table in Doris
-SWITCH internal;
-CREATE TABLE target_db.orders (
-  order_id INT,
-  customer_id INT,
-  order_date DATE NOT NULL,
-  total_amount DECIMAL(10, 2),
-  status VARCHAR(32)
-)
-UNIQUE KEY(order_id, order_date)
-PARTITION BY RANGE(order_date) ()
-DISTRIBUTED BY HASH(order_id) BUCKETS 16
-PROPERTIES (
-  "dynamic_partition.enable" = "true",
-  "dynamic_partition.time_unit" = "DAY",
-  "dynamic_partition.end" = "3",
-  "dynamic_partition.prefix" = "p",
-  "replication_num" = "3"
-);
-
--- Migrate data
-INSERT INTO internal.target_db.orders
-SELECT order_id, customer_id, order_date, total_amount, status
-FROM mysql_catalog.source_db.orders;
-```
+The [JDBC Catalog](../lakehouse/catalogs/jdbc-catalog.md) allows direct querying and batch migration from MySQL. This is the simplest approach for one-time or periodic batch migrations.
 
 ### Option 3: DataX
 
-[DataX](https://github.com/alibaba/DataX) is a widely-used data synchronization tool that supports MySQL to Doris migration.
-
-#### DataX Job Configuration
-
-```json
-{
-  "job": {
-    "setting": {
-      "speed": {
-        "channel": 4
-      }
-    },
-    "content": [{
-      "reader": {
-        "name": "mysqlreader",
-        "parameter": {
-          "username": "mysql_user",
-          "password": "mysql_password",
-          "connection": [{
-            "querySql": ["SELECT order_id, customer_id, order_date, total_amount, status FROM orders"],
-            "jdbcUrl": ["jdbc:mysql://mysql-host:3306/source_db"]
-          }]
-        }
-      },
-      "writer": {
-        "name": "doriswriter",
-        "parameter": {
-          "feLoadUrl": ["doris-fe:8030"],
-          "jdbcUrl": "jdbc:mysql://doris-fe:9030/",
-          "database": "target_db",
-          "table": "orders",
-          "username": "doris_user",
-          "password": "doris_password",
-          "loadProps": {
-            "format": "json",
-            "strip_outer_array": true
-          }
-        }
-      }
-    }]
-  }
-}
-```
-
-Run the job:
-
-```bash
-python datax.py mysql_to_doris.json
-```
-
-## Handling Common Issues
-
-### Auto Increment Columns
-
-MySQL AUTO_INCREMENT columns should map to Doris's auto-increment feature:
-
-```sql
--- Doris table with auto increment
-CREATE TABLE users (
-  user_id BIGINT AUTO_INCREMENT,
-  username VARCHAR(64),
-  email VARCHAR(128)
-)
-UNIQUE KEY(user_id)
-DISTRIBUTED BY HASH(user_id) BUCKETS 8;
-```
-
-For migration, you may want to preserve original IDs:
-
-```sql
--- Disable auto increment during migration
-INSERT INTO users (user_id, username, email)
-SELECT user_id, username, email
-FROM mysql_catalog.source_db.users;
-```
-
-### Handling ENUM and SET Types
-
-MySQL ENUM and SET types are migrated as STRING in Doris:
-
-```sql
--- MySQL source
-CREATE TABLE products (
-  id INT,
-  status ENUM('active', 'inactive', 'pending'),
-  tags SET('featured', 'sale', 'new')
-);
-
--- Doris target
-CREATE TABLE products (
-  id INT,
-  status VARCHAR(32),
-  tags VARCHAR(128)
-)
-DISTRIBUTED BY HASH(id) BUCKETS 8;
-```
-
-### Handling Binary Data
-
-Binary data (BLOB, BINARY) is typically stored as base64-encoded STRING:
-
-```sql
--- Use HEX encoding for binary data
-INSERT INTO doris_table
-SELECT
-  id,
-  HEX(binary_col) as binary_hex
-FROM mysql_catalog.source_db.table_with_binary;
-```
-
-### Large Table Migration Performance
-
-For tables with billions of rows:
-
-1. **Increase Flink parallelism**:
-   ```sql
-   SET 'parallelism.default' = '8';
-   ```
-
-2. **Tune Doris write buffer**:
-   ```sql
-   -- In Flink sink configuration
-   'sink.buffer-size' = '1048576',
-   'sink.buffer-count' = '3'
-   ```
-
-3. **Use batch mode for initial load**:
-   ```sql
-   -- Flink sink batch configuration
-   'sink.enable-2pc' = 'false',
-   'sink.properties.format' = 'json'
-   ```
-
-## Multi-Tenant Migration
-
-For MySQL instances with multiple databases:
-
-```shell
-# Sync multiple databases
-<FLINK_HOME>/bin/flink run \
-    -c org.apache.doris.flink.tools.cdc.CdcTools \
-    flink-doris-connector.jar \
-    mysql-sync-database \
-    --database "db1|db2|db3" \
-    --mysql-conf hostname=mysql-host \
-    --mysql-conf database-name="db1|db2|db3" \
-    --doris-conf fenodes=doris-fe:8030 \
-    --including-tables ".*"
-```
-
-## Validation
-
-After migration, validate data integrity:
-
-```sql
--- Row count comparison
-SELECT
-  'mysql' as source,
-  COUNT(*) as cnt
-FROM mysql_catalog.source_db.orders
-UNION ALL
-SELECT
-  'doris' as source,
-  COUNT(*) as cnt
-FROM internal.target_db.orders;
-
--- Checksum validation (sample)
-SELECT
-  SUM(order_id) as id_sum,
-  SUM(total_amount) as amount_sum,
-  COUNT(DISTINCT customer_id) as unique_customers
-FROM internal.target_db.orders;
-
--- Compare with MySQL
-SELECT
-  SUM(order_id) as id_sum,
-  SUM(total_amount) as amount_sum,
-  COUNT(DISTINCT customer_id) as unique_customers
-FROM mysql_catalog.source_db.orders;
-```
+[DataX](https://github.com/alibaba/DataX) is a widely-used data synchronization tool that supports MySQL to Doris migration via the `mysqlreader` and `doriswriter` plugins.
 
 ## Next Steps
 
````
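One retained principle in this commit (item 6 of the considerations list: store MySQL BLOB/BINARY values as STRING, using HEX encoding during migration) can be illustrated without any of the removed, unverified SQL. The sketch below is a hypothetical Python helper, not part of any connector; it mirrors the byte-to-hex transform that MySQL's `HEX()` function applies, and its inverse for reading a migrated value back out.

```python
import binascii

def hex_encode(value: bytes) -> str:
    """Mirror MySQL's HEX(): two uppercase hex digits per input byte."""
    return binascii.hexlify(value).decode("ascii").upper()

def hex_decode(text: str) -> bytes:
    """Inverse transform, e.g. when decoding a value read back from Doris."""
    return binascii.unhexlify(text)

# The encoding doubles the size (1 byte -> 2 hex characters), which is
# worth budgeting for when sizing the target STRING/VARCHAR column.
blob = b"\x00\xffpayload"
encoded = hex_encode(blob)
assert hex_decode(encoded) == blob
assert len(encoded) == 2 * len(blob)
```

Round-tripping like this is also a cheap per-row integrity check of the kind the retained validation checklists call for.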

0 commit comments
