
Commit 538833e

Merge branch 'master' into nm-refactor-migration-context
2 parents 2ad65ee + 7d875b4 commit 538833e

17 files changed

Lines changed: 173 additions & 99 deletions

doc/command-line-flags.md

Lines changed: 51 additions & 17 deletions
@@ -2,6 +2,10 @@
22

33
A more in-depth discussion of various `gh-ost` command line flags: implementation, implication, use cases.
44

5+
### allow-master-master
6+
7+
See [`--assume-master-host`](#assume-master-host).
8+
59
### allow-on-master
610

711
By default, `gh-ost` would like you to connect to a replica, from where it figures out the master by itself. This wiring is required should your master execute using `binlog_format=STATEMENT`.
@@ -14,20 +18,20 @@ When your migration issues a column rename (`change column old_name new_name ...
1418

1519
`gh-ost` will print out what it thinks the _rename_ implied, but will not issue the migration unless you provide `--approve-renamed-columns`.
1620

17-
If you think `gh-ost` is mistaken and that there's actually no _rename_ involved, you may pass `--skip-renamed-columns` instead. This will cause `gh-ost` to disassociate the column values; data will not be copied between those columns.
21+
If you think `gh-ost` is mistaken and that there's actually no _rename_ involved, you may pass [`--skip-renamed-columns`](#skip-renamed-columns) instead. This will cause `gh-ost` to disassociate the column values; data will not be copied between those columns.
1822

1923
### assume-master-host
2024

2125
`gh-ost` infers the identity of the master server by crawling up the replication topology. You may explicitly tell `gh-ost` the identity of the master host via `--assume-master-host=the.master.com`. This is useful in:
2226

23-
- master-master topologies (together with `--allow-master-master`), where `gh-ost` can arbitrarily pick one of the co-master and you prefer that it picks a specific one
24-
- _tungsten replicator_ topologies (together with `--tungsten`), where `gh-ost` is unable to crawl and detect the master
27+
- _master-master_ topologies (together with [`--allow-master-master`](#allow-master-master)), where `gh-ost` can arbitrarily pick one of the co-masters and you prefer that it picks a specific one
28+
- _tungsten replicator_ topologies (together with [`--tungsten`](#tungsten)), where `gh-ost` is unable to crawl and detect the master
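The `--assume-master-host` value uses the format `some.host.com[:port]` (per the flag's help text in `main.go`). The sketch below shows one way such a value could be split into host and port. The function name `parseMasterHost` is hypothetical, and falling back to a default port when none is given is an assumption, not documented `gh-ost` behavior.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// parseMasterHost splits a value of the form some.host.com[:port].
// Hypothetical helper; using defaultPort when no port is present is an
// assumption of this sketch.
func parseMasterHost(value string, defaultPort int) (string, int, error) {
	if !strings.Contains(value, ":") {
		return value, defaultPort, nil
	}
	host, portStr, err := net.SplitHostPort(value)
	if err != nil {
		return "", 0, err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", 0, fmt.Errorf("invalid port in %q: %v", value, err)
	}
	return host, port, nil
}

func main() {
	host, port, err := parseMasterHost("the.master.com:3307", 3306)
	fmt.Println(host, port, err) // the.master.com 3307 <nil>
}
```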
2529

2630
### assume-rbr
2731

2832
If you happen to _know_ your servers use RBR (Row Based Replication, i.e. `binlog_format=ROW`), you may specify `--assume-rbr`. This skips a verification step where `gh-ost` would issue a `STOP SLAVE; START SLAVE`.
2933
Skipping this step means `gh-ost` would not need the `SUPER` privilege in order to operate.
30-
You may want to use this on Amazon RDS
34+
You may want to use this on Amazon RDS.
3135

3236
### conf
3337

@@ -41,21 +45,25 @@ password=123456
4145

4246
### concurrent-rowcount
4347

44-
See `exact-rowcount`
48+
Defaults to `true`. See [`exact-rowcount`](#exact-rowcount)
4549

46-
### critical-load-interval-millis
50+
### critical-load
51+
52+
Comma delimited list of `status-name=threshold` pairs, same format as [`--max-load`](#max-load).
4753

4854
`--critical-load` defines a threshold that, when met, `gh-ost` panics and bails out. The default behavior is to bail out immediately when meeting this threshold.
4955

5056
This may sometimes lead to migrations bailing out on a very short spike that, while itself impacting production and worth investigating, isn't reason enough to kill a 10-hour migration.
5157

58+
### critical-load-interval-millis
59+
5260
When `--critical-load-interval-millis` is specified (e.g. `--critical-load-interval-millis=2500`), `gh-ost` gives a second chance: when it meets the `critical-load` threshold, it doesn't bail out. Instead, it starts a timer (in this example: `2.5` seconds) and re-checks `critical-load` when the timer expires. If `critical-load` is met again, `gh-ost` panics and bails out. If not, execution continues.
5361

5462
This is somewhat similar to a Nagios `n`-times test, where `n` in our case is always `2`.
5563

5664
### cut-over
5765

58-
Optional. Default is `safe`. See more discussion in [cut-over](cut-over.md)
66+
Optional. Default is `safe`. See more discussion in [`cut-over`](cut-over.md)
5967

6068
### discard-foreign-keys
6169

@@ -74,7 +82,7 @@ The `--dml-batch-size` flag controls the size of the batched write. Allowed valu
7482

7583
Why is this behavior configurable? Different workloads have different characteristics. Some workloads have very large writes, such that aggregating even `50` writes into a transaction makes for a significant transaction size. On other workloads the write rate is so high that one just can't allow for a hundred more syncs to disk per second. The default value of `10` is a modest compromise that should probably work very well for most workloads. Your mileage may vary.
7684

77-
Noteworthy is that setting `--dml-batch-size` to higher value _does not_ mean `gh-ost` blocks or waits on writes. The batch size is an upper limit on transaction size, not a minimal one. If `gh-ost` doesn't have "enough" events in the pipe, it does not wait on the binary log, it just writes what it already has. This conveniently suggests that if write load is light enough for `gh-ost` to only see a few events in the binary log at a given time, then it is also light neough for `gh-ost` to apply a fraction of the batch size.
85+
Noteworthy is that setting `--dml-batch-size` to a higher value _does not_ mean `gh-ost` blocks or waits on writes. The batch size is an upper limit on transaction size, not a minimal one. If `gh-ost` doesn't have "enough" events in the pipe, it does not wait on the binary log, it just writes what it already has. This conveniently suggests that if write load is light enough for `gh-ost` to only see a few events in the binary log at a given time, then it is also light enough for `gh-ost` to apply a fraction of the batch size.
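The "upper limit, not a minimum" semantics map naturally onto a non-blocking channel drain. The sketch below is a simplified illustration, not `gh-ost`'s applier: `nextBatch` is a hypothetical name and plain strings stand in for binlog events. It waits only for the first event, then takes whatever else is already buffered, up to `batchSize`.

```go
package main

import "fmt"

// nextBatch blocks for the first event, then drains already-buffered events
// without waiting, stopping at batchSize. Strings stand in for real binlog
// entries in this sketch.
func nextBatch(events <-chan string, batchSize int) []string {
	batch := []string{<-events} // wait for at least one event
	for len(batch) < batchSize {
		select {
		case ev := <-events:
			batch = append(batch, ev)
		default:
			return batch // nothing more in the pipe: apply what we have
		}
	}
	return batch
}

func main() {
	events := make(chan string, 100)
	for i := 0; i < 3; i++ {
		events <- fmt.Sprintf("event-%d", i)
	}
	// Light write load: only 3 events buffered with a batch size of 10.
	fmt.Println(len(nextBatch(events, 10))) // 3
}
```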
7886

7987
### exact-rowcount
8088

@@ -84,8 +92,8 @@ A `gh-ost` execution need to copy whatever rows you have in your existing table
8492
`gh-ost` also supports the `--exact-rowcount` flag. When this flag is given, two things happen:
8593
- An initial, authoritative `select count(*) from your_table`.
8694
This query may take a long time to complete, but is performed before we begin the massive operations.
87-
When `--concurrent-rowcount` is also specified, this runs in parallel to row copy.
88-
Note: `--concurrent-rowcount` now defaults to `true`.
95+
When [`--concurrent-rowcount`](#concurrent-rowcount) is also specified, this runs in parallel to row copy.
96+
Note: [`--concurrent-rowcount`](#concurrent-rowcount) now defaults to `true`.
8997
- A continuous update to the estimate as we make progress applying events.
9098
We heuristically update the number of rows based on the queries we process from the binlogs.
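One plausible form of that heuristic, sketched under assumptions: start from the authoritative `count(*)` and nudge it per row event, counting inserts as `+1`, deletes as `-1` and updates as no change. The `DMLType` enum and `adjustEstimate` function are hypothetical; the exact rules `gh-ost` applies are not spelled out here.

```go
package main

import "fmt"

// DMLType is a hypothetical stand-in for the kind of row event read from the binlog.
type DMLType int

const (
	Insert DMLType = iota
	Update
	Delete
)

// adjustEstimate nudges a row-count estimate by the given row events:
// inserts add a row, deletes remove one, updates leave the count unchanged.
func adjustEstimate(estimate int64, events []DMLType) int64 {
	for _, ev := range events {
		switch ev {
		case Insert:
			estimate++
		case Delete:
			estimate--
		}
	}
	return estimate
}

func main() {
	initial := int64(1000) // e.g. the result of select count(*) from your_table
	fmt.Println(adjustEstimate(initial, []DMLType{Insert, Insert, Update, Delete})) // 1001
}
```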
9199

@@ -95,6 +103,10 @@ While the ongoing estimated number of rows is still heuristic, it's almost exact
95103

96104
Without this parameter, migration is a _noop_: testing table creation and validity of migration, but not touching data.
97105

106+
### heartbeat-interval-millis
107+
108+
Defaults to `100`. See [`subsecond-lag`](subsecond-lag.md) for details.
109+
98110
### initially-drop-ghost-table
99111

100112
`gh-ost` maintains two tables while migrating: the _ghost_ table (which is synced from your original table and finally replaces it) and a changelog table, which is used internally for bookkeeping. By default, it panics and aborts if it sees those tables upon startup. Provide `--initially-drop-ghost-table` and `--initially-drop-old-table` to let `gh-ost` know it's OK to drop them beforehand.
@@ -103,37 +115,55 @@ We think `gh-ost` should not take chances or make assumptions about the user's t
103115

104116
### initially-drop-old-table
105117

106-
See #initially-drop-ghost-table
118+
See [`initially-drop-ghost-table`](#initially-drop-ghost-table)
107119

108120
### max-lag-millis
109121

110122
On a replication topology, this is perhaps the most important migration throttling factor: the maximum lag allowed for migration to work. If lag exceeds this value, migration throttles.
111123

112-
When using [Connect to replica, migrate on master](cheatsheet.md), this lag is primarily tested on the very replica `gh-ost` operates on. Lag is measured by checking the heartbeat events injected by `gh-ost` itself on the utility changelog table. That is, to measure this replica's lag, `gh-ost` doesn't need to issue `show slave status` nor have any external heartbeat mechanism.
124+
When using [Connect to replica, migrate on master](cheatsheet.md#a-connect-to-replica-migrate-on-master), this lag is primarily tested on the very replica `gh-ost` operates on. Lag is measured by checking the heartbeat events injected by `gh-ost` itself on the utility changelog table. That is, to measure this replica's lag, `gh-ost` doesn't need to issue `show slave status` nor have any external heartbeat mechanism.
113125

114-
When `--throttle-control-replicas` is provided, throttling also considers lag on specified hosts. Lag measurements on listed hosts is done by querying `gh-ost`'s _changelog_ table, where `gh-ost` injects a heartbeat.
126+
When [`--throttle-control-replicas`](#throttle-control-replicas) is provided, throttling also considers lag on specified hosts. Lag measurements on listed hosts are done by querying `gh-ost`'s _changelog_ table, where `gh-ost` injects a heartbeat.
115127

116128
See also: [Sub-second replication lag throttling](subsecond-lag.md)
117129

130+
### max-load
131+
132+
List of metrics and threshold values; exceeding the threshold of any of them causes the throttler to kick in. See also: [`throttling`](throttle.md#status-thresholds)
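A minimal sketch of the `status-name=threshold` format shared by `--max-load` and `--critical-load`, e.g. `Threads_running=25,Threads_connected=500`. The helpers `parseLoadThresholds` and `exceedsAny` are hypothetical; integer thresholds and a strict "greater than" comparison are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLoadThresholds parses a comma delimited list of status-name=threshold pairs.
func parseLoadThresholds(spec string) (map[string]int64, error) {
	thresholds := make(map[string]int64)
	for _, item := range strings.Split(spec, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue
		}
		parts := strings.SplitN(item, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("expected status-name=threshold, got %q", item)
		}
		value, err := strconv.ParseInt(strings.TrimSpace(parts[1]), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid threshold in %q: %v", item, err)
		}
		thresholds[strings.TrimSpace(parts[0])] = value
	}
	return thresholds, nil
}

// exceedsAny reports whether any observed status value is above its threshold.
func exceedsAny(thresholds, observed map[string]int64) bool {
	for name, limit := range thresholds {
		if observed[name] > limit {
			return true
		}
	}
	return false
}

func main() {
	t, err := parseLoadThresholds("Threads_running=25,Threads_connected=500")
	if err != nil {
		panic(err)
	}
	fmt.Println(exceedsAny(t, map[string]int64{"Threads_running": 30})) // true
}
```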
133+
118134
### migrate-on-replica
119135

120136
Typically `gh-ost` is used to migrate tables on a master. If you wish to only perform the migration in full on a replica, connect `gh-ost` to said replica and pass `--migrate-on-replica`. `gh-ost` will briefly connect to the master but otherwise issue no changes on the master. Migration will be fully executed on the replica, while making sure to maintain a small replication lag.
121137

138+
### postpone-cut-over-flag-file
139+
140+
Indicate a file name, such that the final [cut-over](cut-over.md) step does not take place as long as the file exists.
141+
When this flag is set, `gh-ost` expects the file to exist on startup, or else tries to create it. `gh-ost` exits with error if the file does not exist and `gh-ost` is unable to create it.
142+
With this flag set, the migration will cut-over upon deletion of the file or upon `cut-over` [interactive command](interactive-commands.md).
143+
144+
### replica-server-id
145+
146+
Defaults to 99999. If you run multiple migrations then you must provide a different, unique `--replica-server-id` for each `gh-ost` process.
147+
Optionally involve the process ID, for example: `--replica-server-id=$((1000000000+$$))`.
148+
149+
It's on you to choose a number that does not collide with another `gh-ost` or another running replica.
150+
See also: [`concurrent-migrations`](cheatsheet.md#concurrent-migrations) on the cheatsheet.
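The shell idiom `--replica-server-id=$((1000000000+$$))` offsets the process ID by a large base so that concurrent `gh-ost` processes on one host get distinct IDs. The same arithmetic in Go, as a sketch with a hypothetical helper name; avoiding collisions with real replicas' server IDs is still up to you.

```go
package main

import (
	"fmt"
	"os"
)

// replicaServerID mirrors $((1000000000+$$)): a large base plus the process ID.
func replicaServerID(pid int) uint32 {
	return uint32(1000000000 + pid)
}

func main() {
	fmt.Printf("--replica-server-id=%d\n", replicaServerID(os.Getpid()))
}
```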
151+
122152
### skip-foreign-key-checks
123153

124-
By default `gh-ost` verifies no foreign keys exist on the migrated table. On servers with large number of tables this check can take a long time. If you're absolutely certain no foreign keys exist (table does not referenece other table nor is referenced by other tables) and wish to save the check time, provide with `--skip-foreign-key-checks`.
154+
By default `gh-ost` verifies no foreign keys exist on the migrated table. On servers with a large number of tables this check can take a long time. If you're absolutely certain no foreign keys exist (the table neither references other tables nor is referenced by other tables) and wish to save the check time, provide `--skip-foreign-key-checks`.
125155

126156
### skip-renamed-columns
127157

128-
See `approve-renamed-columns`
158+
See [`approve-renamed-columns`](#approve-renamed-columns)
129159

130160
### test-on-replica
131161

132-
Issue the migration on a replica; do not modify data on master. Useful for validating, testing and benchmarking. See [testing-on-replica](testing-on-replica.md)
162+
Issue the migration on a replica; do not modify data on master. Useful for validating, testing and benchmarking. See [`testing-on-replica`](testing-on-replica.md)
133163

134164
### throttle-control-replicas
135165

136-
Provide a command delimited list of replicas; `gh-ost` will throttle when any of the given replicas lag beyond `--max-lag-millis`. The list can be queried and updated dynamically via [interactive commands](interactive-commands.md)
166+
Provide a comma delimited list of replicas; `gh-ost` will throttle when any of the given replicas lag beyond [`--max-lag-millis`](#max-lag-millis). The list can be queried and updated dynamically via [interactive commands](interactive-commands.md).
137167

138168
### throttle-http
139169

@@ -142,3 +172,7 @@ Provide a HTTP endpoint; `gh-ost` will issue `HEAD` requests on given URL and th
142172
### timestamp-old-table
143173

144174
Makes the _old_ table include a timestamp value. The _old_ table is what the original table is renamed to at the end of a successful migration. For example, if the table is `gh_ost_test`, then the _old_ table would normally be `_gh_ost_test_del`. With `--timestamp-old-table` it would be, for example, `_gh_ost_test_20170221103147_del`.
175+
176+
### tungsten
177+
178+
See [`tungsten`](cheatsheet.md#tungsten) on the cheatsheet.

doc/interactive-commands.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ Both interfaces may serve at the same time. Both respond to simple text command,
4343

4444
### Querying for data
4545

46-
For commands that accept an argumetn as value, pass `?` (question mark) to _get_ current value rather than _set_ a new one.
46+
For commands that accept an argument as value, pass `?` (question mark) to _get_ current value rather than _set_ a new one.
4747

4848
### Examples
4949

doc/understanding-output.md

Lines changed: 3 additions & 3 deletions
@@ -24,15 +24,15 @@ Initial output lines may look like this:
2424
2016-05-19 17:57:11 INFO connection validated on 127.0.0.1:3306
2525
2016-05-19 17:57:11 INFO rotate to next log name: mysql-bin.002587
2626
2016-05-19 17:57:11 INFO connection validated on 127.0.0.1:3306
27-
2016-05-19 17:57:11 INFO Droppping table `mydb`.`_mytable_gst`
27+
2016-05-19 17:57:11 INFO Dropping table `mydb`.`_mytable_gst`
2828
2016-05-19 17:57:11 INFO Table dropped
29-
2016-05-19 17:57:11 INFO Droppping table `mydb`.`_mytable_old`
29+
2016-05-19 17:57:11 INFO Dropping table `mydb`.`_mytable_old`
3030
2016-05-19 17:57:11 INFO Table dropped
3131
2016-05-19 17:57:11 INFO Creating ghost table `mydb`.`_mytable_gst`
3232
2016-05-19 17:57:11 INFO Ghost table created
3333
2016-05-19 17:57:11 INFO Altering ghost table `mydb`.`_mytable_gst`
3434
2016-05-19 17:57:11 INFO Ghost table altered
35-
2016-05-19 17:57:11 INFO Droppping table `mydb`.`_mytable_osc`
35+
2016-05-19 17:57:11 INFO Dropping table `mydb`.`_mytable_osc`
3636
2016-05-19 17:57:11 INFO Table dropped
3737
2016-05-19 17:57:11 INFO Creating changelog table `mydb`.`_mytable_osc`
3838
2016-05-19 17:57:11 INFO Changelog table created

doc/why-triggerless.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Use of triggers simplifies a lot of the flow in doing a live table migration, bu
1616

1717
Triggers are stored routines which are invoked on a per-row operation upon `INSERT`, `DELETE`, `UPDATE` on a table.
1818
They were introduced in MySQL `5.0`.
19-
A trigger may contain a set of queries, and these queries run in the same transaction space as the query that manipulates the table. This makes for an atomicy of both the original operation on the table and the trigger-invoked operations.
19+
A trigger may contain a set of queries, and these queries run in the same transaction space as the query that manipulates the table. This makes the original operation on the table and the trigger-invoked operations atomic.
2020

2121
### Triggers, overhead
2222

go/base/utils.go

Lines changed: 35 additions & 0 deletions
@@ -11,6 +11,10 @@ import (
1111
"regexp"
1212
"strings"
1313
"time"
14+
15+
gosql "database/sql"
16+
"github.com/github/gh-ost/go/mysql"
17+
"github.com/outbrain/golib/log"
1418
)
1519

1620
var (
@@ -33,6 +37,15 @@ func FileExists(fileName string) bool {
3337
return false
3438
}
3539

40+
func TouchFile(fileName string) error {
41+
f, err := os.OpenFile(fileName, os.O_APPEND|os.O_CREATE, 0755)
42+
if err != nil {
43+
return err
44+
}
45+
defer f.Close()
46+
return nil
47+
}
48+
3649
// StringContainsAll returns true if `s` contains all non empty given `substrings`
3750
// The function returns `false` if no non-empty arguments are given.
3851
func StringContainsAll(s string, substrings ...string) bool {
@@ -50,3 +63,25 @@ func StringContainsAll(s string, substrings ...string) bool {
5063
}
5164
return nonEmptyStringsFound
5265
}
66+
67+
func ValidateConnection(db *gosql.DB, connectionConfig *mysql.ConnectionConfig) (string, error) {
68+
query := `select @@global.port, @@global.version`
69+
var port, extraPort int
70+
var version string
71+
if err := db.QueryRow(query).Scan(&port, &version); err != nil {
72+
return "", err
73+
}
74+
extraPortQuery := `select @@global.extra_port`
75+
if err := db.QueryRow(extraPortQuery).Scan(&extraPort); err != nil {
76+
// swallow this error. not all servers support extra_port
77+
}
78+
79+
if connectionConfig.Key.Port == port || (extraPort > 0 && connectionConfig.Key.Port == extraPort) {
80+
log.Infof("connection validated on %+v", connectionConfig.Key)
81+
return version, nil
82+
} else if extraPort == 0 {
83+
return "", fmt.Errorf("Unexpected database port reported: %+v", port)
84+
} else {
85+
return "", fmt.Errorf("Unexpected database port reported: %+v / extra_port: %+v", port, extraPort)
86+
}
87+
}
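The port check at the end of `ValidateConnection` can be isolated as a pure function and exercised without a database. The sketch below copies that decision logic; `checkReportedPort` is a hypothetical name, not part of `gh-ost`, and `extraPort` is `0` when the server lacks `extra_port`.

```go
package main

import "fmt"

// checkReportedPort mirrors the ValidateConnection decision: the configured
// port must match @@global.port, or @@global.extra_port when that is set.
func checkReportedPort(configuredPort, port, extraPort int) error {
	if configuredPort == port || (extraPort > 0 && configuredPort == extraPort) {
		return nil
	}
	if extraPort == 0 {
		return fmt.Errorf("Unexpected database port reported: %+v", port)
	}
	return fmt.Errorf("Unexpected database port reported: %+v / extra_port: %+v", port, extraPort)
}

func main() {
	fmt.Println(checkReportedPort(3306, 3306, 0))    // <nil>
	fmt.Println(checkReportedPort(3307, 3306, 3307)) // <nil>
	fmt.Println(checkReportedPort(3308, 3306, 0))    // Unexpected database port reported: 3306
}
```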

go/binlog/gomysql_reader.go

Lines changed: 3 additions & 3 deletions
@@ -55,12 +55,12 @@ func NewGoMySQLReader(migrationContext *base.MigrationContext) (binlogReader *Go
5555
// ConnectBinlogStreamer
5656
func (this *GoMySQLReader) ConnectBinlogStreamer(coordinates mysql.BinlogCoordinates) (err error) {
5757
if coordinates.IsEmpty() {
58-
return log.Errorf("Emptry coordinates at ConnectBinlogStreamer()")
58+
return log.Errorf("Empty coordinates at ConnectBinlogStreamer()")
5959
}
6060

6161
this.currentCoordinates = coordinates
6262
log.Infof("Connecting binlog streamer at %+v", this.currentCoordinates)
63-
// Start sync with sepcified binlog file and position
63+
// Start sync with specified binlog file and position
6464
this.binlogStreamer, err = this.binlogSyncer.StartSync(gomysql.Position{this.currentCoordinates.LogFile, uint32(this.currentCoordinates.LogPos)})
6565

6666
return err
@@ -112,7 +112,7 @@ func (this *GoMySQLReader) handleRowsEvent(ev *replication.BinlogEvent, rowsEven
112112
}
113113
}
114114
// The channel will do the throttling. Whoever is reading from the channel
115-
// decides whether action is taken sycnhronously (meaning we wait before
115+
// decides whether action is taken synchronously (meaning we wait before
116116
// next iteration) or asynchronously (we keep pushing more events)
117117
// In reality, reads will be synchronous
118118
entriesChannel <- binlogEntry

go/cmd/gh-ost/main.go

Lines changed: 3 additions & 2 deletions
@@ -46,7 +46,7 @@ func main() {
4646
migrationContext := base.NewMigrationContext()
4747

4848
flag.StringVar(&migrationContext.InspectorConnectionConfig.Key.Hostname, "host", "127.0.0.1", "MySQL hostname (preferably a replica, not the master)")
49-
flag.StringVar(&migrationContext.AssumeMasterHostname, "assume-master-host", "", "(optional) explicitly tell gh-ost the identity of the master. Format: some.host.com[:port] This is useful in master-master setups where you wish to pick an explicit master, or in a tungsten-replicator where gh-ost is unabel to determine the master")
49+
flag.StringVar(&migrationContext.AssumeMasterHostname, "assume-master-host", "", "(optional) explicitly tell gh-ost the identity of the master. Format: some.host.com[:port] This is useful in master-master setups where you wish to pick an explicit master, or in a tungsten-replicator where gh-ost is unable to determine the master")
5050
flag.IntVar(&migrationContext.InspectorConnectionConfig.Key.Port, "port", 3306, "MySQL port (preferably a replica, not the master)")
5151
flag.StringVar(&migrationContext.CliUser, "user", "", "MySQL user")
5252
flag.StringVar(&migrationContext.CliPassword, "password", "", "MySQL password")
@@ -121,14 +121,15 @@ func main() {
121121
version := flag.Bool("version", false, "Print version & exit")
122122
checkFlag := flag.Bool("check-flag", false, "Check if another flag exists/supported. This allows for cross-version scripting. Exits with 0 when all additional provided flags exist, nonzero otherwise. You must provide (dummy) values for flags that require a value. Example: gh-ost --check-flag --cut-over-lock-timeout-seconds --nice-ratio 0")
123123
flag.StringVar(&migrationContext.ForceTmpTableName, "force-table-names", "", "table name prefix to be used on the temporary tables")
124+
flag.CommandLine.SetOutput(os.Stdout)
124125

125126
flag.Parse()
126127

127128
if *checkFlag {
128129
return
129130
}
130131
if *help {
131-
fmt.Fprintf(os.Stderr, "Usage of gh-ost:\n")
132+
fmt.Fprintf(os.Stdout, "Usage of gh-ost:\n")
132133
flag.PrintDefaults()
133134
return
134135
}
