ddl: support ADMIN REPAIR TABLE to override bad tableInfo in meta & supply a REPAIR MODE for safe restart #12046

Merged: 49 commits, merged on Dec 3, 2019.

Commits:
650bb09 support admin repair table (AilinKid, Sep 5, 2019)
d402bdb fix config test (AilinKid, Sep 5, 2019)
3959828 fix comment (AilinKid, Oct 21, 2019)
102b277 fix go.mod (AilinKid, Oct 21, 2019)
e23929b add partition table repair (AilinKid, Oct 22, 2019)
8e55c92 add partition table repair (AilinKid, Oct 22, 2019)
a1d5cc2 fix parition id and index id assignment (AilinKid, Oct 23, 2019)
724c533 add comment (AilinKid, Nov 8, 2019)
f9bc798 address ming's comment (AilinKid, Nov 11, 2019)
4c22243 fix go.sum (AilinKid, Nov 11, 2019)
8e4a58b fix go.sum (AilinKid, Nov 11, 2019)
b18e8ba add column id reuse (AilinKid, Nov 11, 2019)
2cd11a6 add column id reuse (AilinKid, Nov 11, 2019)
bb3a6aa add create table name checkout (AilinKid, Nov 11, 2019)
da9402d extract domainutil (AilinKid, Nov 15, 2019)
6b8fdea fix import (AilinKid, Nov 15, 2019)
88c52cc add test (AilinKid, Nov 18, 2019)
fe209b4 handle domain reload diff (AilinKid, Nov 19, 2019)
6d05304 fix map data race (AilinKid, Nov 19, 2019)
95e8432 fix repairlist datarace (AilinKid, Nov 19, 2019)
dccb2ed add MODE check in reload diff (AilinKid, Nov 19, 2019)
356e83f change MODE from bool to atomic.value (AilinKid, Nov 19, 2019)
1a61e92 address comment and add partition test (AilinKid, Nov 20, 2019)
f1aa1ca add drop table if exists (AilinKid, Nov 20, 2019)
d0a57fe add drop table if exists. (AilinKid, Nov 20, 2019)
b1607f0 fix comment (AilinKid, Nov 20, 2019)
601b380 add hash partition (AilinKid, Nov 20, 2019)
5545413 address comment (AilinKid, Nov 21, 2019)
25fbc12 fix atomic value *struct copy write (AilinKid, Nov 21, 2019)
2feb238 address comment (AilinKid, Nov 25, 2019)
13503f0 fix autoIncID (AilinKid, Nov 27, 2019)
8b5eeb8 fix the keyword new (AilinKid, Nov 27, 2019)
b41e009 eliminate confusion in build tableInfo with ddl=nil (AilinKid, Nov 27, 2019)
8b912b4 eliminate the RepairedCallBack keyType (AilinKid, Nov 27, 2019)
ffc2bb8 extract checkAndOverridePartitionID in partition.go (AilinKid, Nov 29, 2019)
bc604c7 address comment (AilinKid, Nov 29, 2019)
26f126f address comment (AilinKid, Nov 30, 2019)
92303d1 address config test (AilinKid, Nov 30, 2019)
5bfbad3 fix data race (AilinKid, Nov 30, 2019)
f58baea Merge branch 'master' into admin_repair_table (AilinKid, Nov 30, 2019)
595874b add cmd parameter (AilinKid, Dec 2, 2019)
2365dbf add rollback/cancel support (AilinKid, Dec 2, 2019)
9de3312 Merge branch 'master' into admin_repair_table (AilinKid, Dec 2, 2019)
ccc4d34 Merge branch 'master' into admin_repair_table (AilinKid, Dec 2, 2019)
a03e642 address comment (AilinKid, Dec 3, 2019)
1941108 Merge branch 'master' into admin_repair_table (AilinKid, Dec 3, 2019)
86b202c address comment (AilinKid, Dec 3, 2019)
d2e578d Merge branch 'master' into admin_repair_table (AilinKid, Dec 3, 2019)
00cd67a Merge branch 'master' into admin_repair_table (AilinKid, Dec 3, 2019)
5 changes: 5 additions & 0 deletions config/config.go
@@ -100,6 +100,9 @@ type Config struct {
DelayCleanTableLock uint64 `toml:"delay-clean-table-lock" json:"delay-clean-table-lock"`
SplitRegionMaxNum uint64 `toml:"split-region-max-num" json:"split-region-max-num"`
StmtSummary StmtSummary `toml:"stmt-summary" json:"stmt-summary"`
// RepairMode indicates that the TiDB is in the repair mode for table meta.
RepairMode bool `toml:"repair-mode" json:"repair-mode"`

Review comment (Contributor):

I think it’s better to bootstrap repair mode with CLI arguments instead of configuration.

Reply (Contributor Author):

Addressed.

Review comment (Contributor):

Do we need this configuration if we have CLI parameters? @djshow832 @bb7133

Reply (Contributor):

It's OK here. We can offer users another choice.

RepairTableList []string `toml:"repair-table-list" json:"repair-table-list"`
}

// nullableBool defaults unset bool options to unset instead of false, which enables us to know if the user has set 2
@@ -440,6 +443,8 @@ var defaultConf = Config{
EnableTableLock: false,
DelayCleanTableLock: 0,
SplitRegionMaxNum: 1000,
RepairMode: false,
RepairTableList: []string{},
TxnLocalLatches: TxnLocalLatches{
Enabled: false,
Capacity: 2048000,
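The review thread above ends with keeping `repair-mode` and `repair-table-list` in the config file while also accepting CLI parameters. A common way to combine the two is to let a command-line flag win only when it is passed explicitly. The sketch below shows that pattern with Go's standard `flag` package. The flag names `-repair-mode` and `-repair-list`, the comma-separated list format, and the trimmed `Config` struct are hypothetical; this page does not show the PR's actual `cmd` changes.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// Config is a hypothetical, trimmed-down stand-in for config.Config that
// keeps only the two repair-related fields added in this PR.
type Config struct {
	RepairMode      bool
	RepairTableList []string
}

// overrideFromFlags parses args and copies a flag's value into cfg only if the
// flag was explicitly passed, so unset flags leave config-file values intact.
// The flag names are hypothetical.
func overrideFromFlags(cfg *Config, args []string) error {
	fs := flag.NewFlagSet("tidb-server", flag.ContinueOnError)
	repairMode := fs.Bool("repair-mode", false, "start TiDB in repair mode")
	repairList := fs.String("repair-list", "", `comma-separated tables, e.g. "db.t1,db.t2"`)
	if err := fs.Parse(args); err != nil {
		return err
	}
	// Visit walks only the flags that were actually set on the command line.
	fs.Visit(func(f *flag.Flag) {
		switch f.Name {
		case "repair-mode":
			cfg.RepairMode = *repairMode
		case "repair-list":
			cfg.RepairTableList = splitList(*repairList)
		}
	})
	return nil
}

// splitList turns "db.t1, db.t2" into []string{"db.t1", "db.t2"}, dropping empty items.
func splitList(s string) []string {
	out := []string{}
	for _, item := range strings.Split(s, ",") {
		if item = strings.TrimSpace(item); item != "" {
			out = append(out, item)
		}
	}
	return out
}

func main() {
	// Values as if they had been loaded from config.toml.
	cfg := Config{RepairMode: false, RepairTableList: []string{}}
	if err := overrideFromFlags(&cfg, []string{"-repair-mode", "-repair-list", "test.origin,test.t2"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.RepairMode, cfg.RepairTableList)
}
```

Using `FlagSet.Visit` rather than `VisitAll` is the point of the design: `Visit` only reaches flags present on the command line, so the flag's default `false` never overwrites `repair-mode = true` from the TOML file.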
7 changes: 7 additions & 0 deletions config/config.toml.example
@@ -74,6 +74,13 @@ split-region-max-num = 1000
# In order to support "drop primary key" operation , this flag must be true and the table does not have the pkIsHandle flag.
alter-primary-key = false

# Repair mode is used to repair broken table meta in TiKV in extreme cases.
repair-mode = false

# Repair table list lists the tables to repair in repair mode, in the format ["db.table",].
# In repair mode, repairing a table that is not in the repair list returns a wrong database or wrong table error.
repair-table-list = []

[log]
# Log level: debug, info, warn, error, fatal.
level = "info"
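`repair-table-list` entries use the `"db.table"` format, and the test further down matches `test.other_table2` against a table created as `otHer_tAblE2`, which suggests that name matching is case-insensitive. A minimal sketch of turning the list into a lookup set under that assumption; `buildRepairSet`, `repairSet`, and the rejection of entries without a dot are illustrative choices, not the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// repairSet maps a lower-cased database name to a set of lower-cased table names.
type repairSet map[string]map[string]struct{}

// buildRepairSet parses entries in the "db.table" format used by
// repair-table-list. Names are lower-cased so lookups are case-insensitive.
func buildRepairSet(list []string) (repairSet, error) {
	set := repairSet{}
	for _, entry := range list {
		parts := strings.SplitN(entry, ".", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("invalid repair table entry %q, want \"db.table\"", entry)
		}
		db, tbl := strings.ToLower(parts[0]), strings.ToLower(parts[1])
		if set[db] == nil {
			set[db] = map[string]struct{}{}
		}
		set[db][tbl] = struct{}{}
	}
	return set, nil
}

// contains reports whether db.table is in the set, ignoring case.
func (s repairSet) contains(db, table string) bool {
	// Indexing a missing inner map yields a nil map, so ok is simply false.
	_, ok := s[strings.ToLower(db)][strings.ToLower(table)]
	return ok
}

func main() {
	set, err := buildRepairSet([]string{"test.other_table2", "test.origin"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(set.contains("test", "otHer_tAblE2")) // true
	fmt.Println(set.contains("test", "t"))            // false
}
```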
2 changes: 2 additions & 0 deletions config/config_test.go
@@ -179,6 +179,7 @@ alter-primary-key = true
delay-clean-table-lock = 5
split-region-max-num=10000
enable-batch-dml = true
repair-mode = true
[performance]
txn-total-size-limit=2000
[tikv-client]
@@ -215,6 +216,7 @@ max-sql-length=1024
c.Assert(conf.StmtSummary.MaxStmtCount, Equals, uint(1000))
c.Assert(conf.StmtSummary.MaxSQLLength, Equals, uint(1024))
c.Assert(conf.EnableBatchDML, Equals, true)
c.Assert(conf.RepairMode, Equals, true)
c.Assert(f.Close(), IsNil)
c.Assert(os.Remove(configFile), IsNil)

236 changes: 236 additions & 0 deletions ddl/db_test.go
@@ -47,6 +47,7 @@ import (
"github.com/pingcap/tidb/tablecodec"
"github.com/pingcap/tidb/types"
"github.com/pingcap/tidb/util/admin"
"github.com/pingcap/tidb/util/domainutil"
"github.com/pingcap/tidb/util/israce"
"github.com/pingcap/tidb/util/mock"
"github.com/pingcap/tidb/util/testkit"
@@ -2003,6 +2004,241 @@ func (s *testDBSuite1) TestCreateTable(c *C) {
c.Assert(err.Error(), Equals, "[types:1291]Column 'a' has duplicated value 'B' in ENUM")
}

func (s *testDBSuite5) TestRepairTable(c *C) {
c.Assert(failpoint.Enable("github.com/pingcap/tidb/infoschema/repairFetchCreateTable", `return(true)`), IsNil)
defer func() {
c.Assert(failpoint.Disable("github.com/pingcap/tidb/infoschema/repairFetchCreateTable"), IsNil)
}()
s.tk = testkit.NewTestKit(c, s.store)
s.tk.MustExec("use test")
s.tk.MustExec("drop table if exists t, other_table, origin")

// Test repair table when TiDB is not in repair mode.
s.tk.MustExec("CREATE TABLE t (a int primary key, b varchar(10));")
_, err := s.tk.Exec("admin repair table t CREATE TABLE t (a float primary key, b varchar(5));")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: TiDB is not in REPAIR MODE")

// Test repair table when the repair list is empty.
domainutil.RepairInfo.SetRepairMode(true)
_, err = s.tk.Exec("admin repair table t CREATE TABLE t (a float primary key, b varchar(5));")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: repair list is empty")

// Test repair table when its database isn't in repairInfo.
domainutil.RepairInfo.SetRepairTableList([]string{"test.other_table"})
_, err = s.tk.Exec("admin repair table t CREATE TABLE t (a float primary key, b varchar(5));")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: database test is not in repair")

// Test repair table when the table isn't in repairInfo.
s.tk.MustExec("CREATE TABLE other_table (a int, b varchar(1), key using hash(b));")
_, err = s.tk.Exec("admin repair table t CREATE TABLE t (a float primary key, b varchar(5));")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: table t is not in repair")

// Test that the user can't access the repaired table.
_, err = s.tk.Exec("select * from other_table")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[schema:1146]Table 'test.other_table' doesn't exist")

// Test that a create statement can't use the same name as a table in repair.
_, err = s.tk.Exec("CREATE TABLE other_table (a int);")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:1103]Incorrect table name 'other_table'%!(EXTRA string=this table is in repair)")

// Test column lost in repair table.
_, err = s.tk.Exec("admin repair table other_table CREATE TABLE other_table (a int, c char(1));")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: Column c has lost")

// Test column type should be the same.
_, err = s.tk.Exec("admin repair table other_table CREATE TABLE other_table (a bigint, b varchar(1), key using hash(b));")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: Column a type should be the same")

// Test index lost in repair table.
_, err = s.tk.Exec("admin repair table other_table CREATE TABLE other_table (a int unique);")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: Index a has lost")

// Test index type should be the same.
_, err = s.tk.Exec("admin repair table other_table CREATE TABLE other_table (a int, b varchar(2) unique)")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: Index b type should be the same")

// Test a repair statement whose inner create statement uses the same table name.
_, err = s.tk.Exec("admin repair table other_table CREATE TABLE other_table (a int);")
c.Assert(err, IsNil)

// Test whether repair table name is case sensitive.
domainutil.RepairInfo.SetRepairMode(true)
domainutil.RepairInfo.SetRepairTableList([]string{"test.other_table2"})
s.tk.MustExec("CREATE TABLE otHer_tAblE2 (a int, b varchar(1));")
_, err = s.tk.Exec("admin repair table otHer_tAblE2 CREATE TABLE otHeR_tAbLe (a int, b varchar(2));")
c.Assert(err, IsNil)
repairTable := testGetTableByName(c, s.s, "test", "otHeR_tAbLe")
c.Assert(repairTable.Meta().Name.O, Equals, "otHeR_tAbLe")

// Test that memory and system databases are not for repair.
domainutil.RepairInfo.SetRepairMode(true)
domainutil.RepairInfo.SetRepairTableList([]string{"test.xxx"})
_, err = s.tk.Exec("admin repair table performance_schema.xxx CREATE TABLE yyy (a int);")
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: memory or system database is not for repair")

// Test the repair detail.
turnRepairModeAndInit(true)
defer turnRepairModeAndInit(false)
// The domain reloads the tableInfo and adds it into repairInfo.
s.tk.MustExec("CREATE TABLE origin (a int primary key, b varchar(10), c int auto_increment);")
// The repaired tableInfo has been filtered out of `domain.InfoSchema()`, so fetch it from repairInfo.
originTableInfo, _ := domainutil.RepairInfo.GetRepairedTableInfoByTableName("test", "origin")

hook := &ddl.TestDDLCallback{}
var repairErr error
hook.OnJobRunBeforeExported = func(job *model.Job) {
if job.Type != model.ActionRepairTable {
return
}
if job.TableID != originTableInfo.ID {
repairErr = errors.New("table id should be the same")
return
}
if job.SchemaState != model.StateNone {
repairErr = errors.New("repair job state should be none")
return
}
// Test whether the table is readable while the repair job is still in StateNone.
tkInternal := testkit.NewTestKitWithInit(c, s.store)
_, repairErr = tkInternal.Exec("select * from origin")
// The repaired tableInfo has been filtered out of `domain.InfoSchema()`, so an error is expected here because the user can't access it.
if repairErr != nil && terror.ErrorEqual(repairErr, infoschema.ErrTableNotExists) {
repairErr = nil
}
}
originalHook := s.dom.DDL().GetHook()
defer s.dom.DDL().(ddl.DDLForTest).SetHook(originalHook)
s.dom.DDL().(ddl.DDLForTest).SetHook(hook)

// Exec the repair statement to override the tableInfo.
s.tk.MustExec("admin repair table origin CREATE TABLE origin (a int primary key, b varchar(5), c int auto_increment);")
c.Assert(repairErr, IsNil)

// Check that the repaired tableInfo keeps the same tableID, indexID and colID as the old one.
// testGetTableByName extracts the Table from `domain.InfoSchema()` directly.
repairTable = testGetTableByName(c, s.s, "test", "origin")
c.Assert(repairTable.Meta().ID, Equals, originTableInfo.ID)
c.Assert(len(repairTable.Meta().Columns), Equals, 3)
c.Assert(repairTable.Meta().Columns[0].ID, Equals, originTableInfo.Columns[0].ID)
c.Assert(repairTable.Meta().Columns[1].ID, Equals, originTableInfo.Columns[1].ID)
c.Assert(repairTable.Meta().Columns[2].ID, Equals, originTableInfo.Columns[2].ID)
c.Assert(len(repairTable.Meta().Indices), Equals, 1)
c.Assert(repairTable.Meta().Indices[0].ID, Equals, originTableInfo.Indices[0].ID)
c.Assert(repairTable.Meta().AutoIncID, Equals, originTableInfo.AutoIncID)

c.Assert(repairTable.Meta().Columns[0].Tp, Equals, mysql.TypeLong)
c.Assert(repairTable.Meta().Columns[1].Tp, Equals, mysql.TypeVarchar)
c.Assert(repairTable.Meta().Columns[1].Flen, Equals, 5)
c.Assert(repairTable.Meta().Columns[2].Tp, Equals, mysql.TypeLong)

// Execute show create table to make sure the new tableInfo has been set.
result := s.tk.MustQuery("show create table origin")
c.Assert(result.Rows()[0][1], Equals, "CREATE TABLE `origin` (\n `a` int(11) NOT NULL,\n `b` varchar(5) DEFAULT NULL,\n `c` int(11) NOT NULL AUTO_INCREMENT,\n PRIMARY KEY (`a`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin")

}

func turnRepairModeAndInit(on bool) {
list := make([]string, 0, 0)
if on {
list = append(list, "test.origin")
}
domainutil.RepairInfo.SetRepairMode(on)
domainutil.RepairInfo.SetRepairTableList(list)
}

func (s *testDBSuite5) TestRepairTableWithPartition(c *C) {
c.Assert(failpoint.Enable("github.com/pingcap/tidb/infoschema/repairFetchCreateTable", `return(true)`), IsNil)
defer func() {
c.Assert(failpoint.Disable("github.com/pingcap/tidb/infoschema/repairFetchCreateTable"), IsNil)
}()
s.tk = testkit.NewTestKit(c, s.store)
s.tk.MustExec("use test")
s.tk.MustExec("drop table if exists origin")

turnRepairModeAndInit(true)
defer turnRepairModeAndInit(false)
// The domain reloads the tableInfo and adds it into repairInfo.
s.tk.MustExec("create table origin (a int not null) partition by RANGE(a) (" +
"partition p10 values less than (10)," +
"partition p30 values less than (30)," +
"partition p50 values less than (50)," +
"partition p70 values less than (70)," +
"partition p90 values less than (90));")
// Test a partition that can't be found in the old tableInfo.
_, err := s.tk.Exec("admin repair table origin create table origin (a int not null) partition by RANGE(a) (" +
"partition p10 values less than (10)," +
"partition p30 values less than (30)," +
"partition p50 values less than (50)," +
"partition p90 values less than (90)," +
"partition p100 values less than (100));")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: Partition p100 has lost")

// Test a partition whose name and range condition both changed.
_, err = s.tk.Exec("admin repair table origin create table origin (a int not null) partition by RANGE(a) (" +
"partition p10 values less than (10)," +
"partition p20 values less than (25)," +
"partition p50 values less than (50)," +
"partition p90 values less than (90));")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: Partition p20 has lost")

// Test a partition whose name changed.
_, err = s.tk.Exec("admin repair table origin create table origin (a int not null) partition by RANGE(a) (" +
"partition p10 values less than (10)," +
"partition p30 values less than (30)," +
"partition pNew values less than (50)," +
"partition p90 values less than (90));")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: Partition pnew has lost")

originTableInfo, _ := domainutil.RepairInfo.GetRepairedTableInfoByTableName("test", "origin")
s.tk.MustExec("admin repair table origin create table origin_rename (a int not null) partition by RANGE(a) (" +
"partition p10 values less than (10)," +
"partition p30 values less than (30)," +
"partition p50 values less than (50)," +
"partition p90 values less than (90));")
repairTable := testGetTableByName(c, s.s, "test", "origin_rename")
c.Assert(repairTable.Meta().ID, Equals, originTableInfo.ID)
c.Assert(len(repairTable.Meta().Columns), Equals, 1)
c.Assert(repairTable.Meta().Columns[0].ID, Equals, originTableInfo.Columns[0].ID)
c.Assert(len(repairTable.Meta().Partition.Definitions), Equals, 4)
c.Assert(repairTable.Meta().Partition.Definitions[0].ID, Equals, originTableInfo.Partition.Definitions[0].ID)
c.Assert(repairTable.Meta().Partition.Definitions[1].ID, Equals, originTableInfo.Partition.Definitions[1].ID)
c.Assert(repairTable.Meta().Partition.Definitions[2].ID, Equals, originTableInfo.Partition.Definitions[2].ID)
c.Assert(repairTable.Meta().Partition.Definitions[3].ID, Equals, originTableInfo.Partition.Definitions[4].ID)

// Test hash partition.
s.tk.MustExec("drop table if exists origin")
domainutil.RepairInfo.SetRepairMode(true)
domainutil.RepairInfo.SetRepairTableList([]string{"test.origin"})
s.tk.MustExec("create table origin (a varchar(1), b int not null, c int, key idx(c)) partition by hash(b) partitions 30")

// Test that the hash partition num in repair must be exactly the same as the old one; otherwise the partition semantics would break.
_, err = s.tk.Exec("admin repair table origin create table origin (a varchar(2), b int not null, c int, key idx(c)) partition by hash(b) partitions 20")
c.Assert(err, NotNil)
c.Assert(err.Error(), Equals, "[ddl:8215]Failed to repair table: Hash partition num should be the same")

originTableInfo, _ = domainutil.RepairInfo.GetRepairedTableInfoByTableName("test", "origin")
s.tk.MustExec("admin repair table origin create table origin (a varchar(3), b int not null, c int, key idx(c)) partition by hash(b) partitions 30")
repairTable = testGetTableByName(c, s.s, "test", "origin")
c.Assert(repairTable.Meta().ID, Equals, originTableInfo.ID)
c.Assert(len(repairTable.Meta().Partition.Definitions), Equals, 30)
c.Assert(repairTable.Meta().Partition.Definitions[0].ID, Equals, originTableInfo.Partition.Definitions[0].ID)
c.Assert(repairTable.Meta().Partition.Definitions[1].ID, Equals, originTableInfo.Partition.Definitions[1].ID)
c.Assert(repairTable.Meta().Partition.Definitions[29].ID, Equals, originTableInfo.Partition.Definitions[29].ID)
}

func (s *testDBSuite2) TestCreateTableWithSetCol(c *C) {
s.tk = testkit.NewTestKitWithInit(c, s.store)
s.tk.MustExec("create table t_set (a int, b set('e') default '');")
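The error strings asserted in `TestRepairTable` imply an order of checks: repair mode first, then an empty list, then memory or system databases, then the database, then the table. `test.other_table` is already in the list when the first lookup fails with `database test is not in repair`, and the lookup only gets further once `other_table` has been created. That suggests the checks consult the table infos that domain reload has actually collected, not the raw list. The sketch below models that behavior under stated assumptions: `repairInfo`, `register`, `checkRepairTarget`, and the set of system databases are hypothetical, and the mode lives in an `atomic.Value` only because the commit history mentions switching MODE to `atomic.Value`.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"sync/atomic"
)

// errRepairTableFail mimics the "Failed to repair table: %s" message asserted
// in the tests; the [ddl:8215] code prefix is omitted in this sketch.
func errRepairTableFail(reason string) error {
	return fmt.Errorf("Failed to repair table: %s", reason)
}

// systemDBs is an assumed set of memory/system databases that can't be repaired.
var systemDBs = map[string]bool{
	"mysql":              true,
	"information_schema": true,
	"performance_schema": true,
}

// repairInfo is a hypothetical model of the repair state: the configured
// list plus the table IDs that schema loading found for entries in it.
type repairInfo struct {
	mode   atomic.Value // holds a bool, readable without the mutex
	mu     sync.RWMutex
	list   []string
	loaded map[string]map[string]int64 // lower-cased db -> table -> table ID
}

func newRepairInfo(on bool, list []string) *repairInfo {
	r := &repairInfo{list: list, loaded: map[string]map[string]int64{}}
	r.mode.Store(on)
	return r
}

// register records a loaded table if "db.table" appears in the repair list.
func (r *repairInfo) register(db, table string, id int64) {
	ldb, ltbl := strings.ToLower(db), strings.ToLower(table)
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, entry := range r.list {
		if strings.ToLower(entry) != ldb+"."+ltbl {
			continue
		}
		if r.loaded[ldb] == nil {
			r.loaded[ldb] = map[string]int64{}
		}
		r.loaded[ldb][ltbl] = id
		return
	}
}

// checkRepairTarget runs the checks in the order the tests exercise them and
// returns the old table ID on success.
func (r *repairInfo) checkRepairTarget(db, table string) (int64, error) {
	if on, _ := r.mode.Load().(bool); !on {
		return 0, errRepairTableFail("TiDB is not in REPAIR MODE")
	}
	r.mu.RLock()
	defer r.mu.RUnlock()
	if len(r.list) == 0 {
		return 0, errRepairTableFail("repair list is empty")
	}
	ldb, ltbl := strings.ToLower(db), strings.ToLower(table)
	if systemDBs[ldb] {
		return 0, errRepairTableFail("memory or system database is not for repair")
	}
	tables, ok := r.loaded[ldb]
	if !ok {
		return 0, errRepairTableFail("database " + db + " is not in repair")
	}
	id, ok := tables[ltbl]
	if !ok {
		return 0, errRepairTableFail("table " + table + " is not in repair")
	}
	return id, nil
}

func main() {
	r := newRepairInfo(true, []string{"test.other_table"})
	_, err := r.checkRepairTarget("test", "t")
	fmt.Println(err) // nothing loaded yet: database test is not in repair
	r.register("test", "other_table", 42)
	_, err = r.checkRepairTarget("test", "t")
	fmt.Println(err) // table t is not in repair
	id, err := r.checkRepairTarget("test", "OTHER_TABLE")
	fmt.Println(id, err) // 42 <nil>
}
```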
3 changes: 3 additions & 0 deletions ddl/ddl.go
@@ -84,6 +84,8 @@ var (
errRunMultiSchemaChanges = terror.ClassDDL.New(mysql.ErrUnsupportedDDLOperation, fmt.Sprintf(mysql.MySQLErrName[mysql.ErrUnsupportedDDLOperation], "multi schema change"))
errWaitReorgTimeout = terror.ClassDDL.New(mysql.ErrLockWaitTimeout, mysql.MySQLErrName[mysql.ErrWaitReorgTimeout])
errInvalidStoreVer = terror.ClassDDL.New(mysql.ErrInvalidStoreVersion, mysql.MySQLErrName[mysql.ErrInvalidStoreVersion])
// ErrRepairTableFail is returned when repairing tableInfo in repair mode fails.
ErrRepairTableFail = terror.ClassDDL.New(mysql.ErrRepairTable, mysql.MySQLErrName[mysql.ErrRepairTable])

// We don't support dropping column with index covered now.
errCantDropColWithIndex = terror.ClassDDL.New(mysql.ErrUnsupportedDDLOperation, fmt.Sprintf(mysql.MySQLErrName[mysql.ErrUnsupportedDDLOperation], "drop column with index"))
@@ -244,6 +246,7 @@ type DDL interface {
UnlockTables(ctx sessionctx.Context, lockedTables []model.TableLockTpInfo) error
CleanupTableLock(ctx sessionctx.Context, tables []*ast.TableName) error
UpdateTableReplicaInfo(ctx sessionctx.Context, tid int64, available bool) error
RepairTable(ctx sessionctx.Context, table *ast.TableName, createStmt *ast.CreateTableStmt) error

// GetLease returns current schema lease time.
GetLease() time.Duration
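The tests above expect `RepairTable` to keep the old table ID, column IDs, and index IDs. They also expect it to reject a column or index that is missing from the old meta (`Column c has lost`) or whose type changed (`Column a type should be the same`), while accepting a statement that omits old columns or changes a `varchar` length. A minimal sketch of that name-based ID reuse, assuming a simplified `item` type in place of `model.ColumnInfo`/`model.IndexInfo`; `reuseIDs` is hypothetical and compares only a type tag, not lengths.

```go
package main

import (
	"fmt"
	"strings"
)

// item is a hypothetical, simplified stand-in for a column or index definition.
type item struct {
	Name string
	Tp   string // e.g. "int", "varchar"; length is deliberately not compared
	ID   int64
}

// reuseIDs copies IDs from oldItems into newItems, matching by case-insensitive
// name. Every new item must exist in the old meta with the same type; old
// items that the new statement omits are simply dropped.
func reuseIDs(kind string, newItems, oldItems []item) error {
	byName := make(map[string]item, len(oldItems))
	for _, o := range oldItems {
		byName[strings.ToLower(o.Name)] = o
	}
	for i := range newItems {
		o, ok := byName[strings.ToLower(newItems[i].Name)]
		if !ok {
			return fmt.Errorf("Failed to repair table: %s %s has lost", kind, newItems[i].Name)
		}
		if o.Tp != newItems[i].Tp {
			return fmt.Errorf("Failed to repair table: %s %s type should be the same", kind, newItems[i].Name)
		}
		newItems[i].ID = o.ID // keep the ID the existing data was written with
	}
	return nil
}

func main() {
	oldCols := []item{{"a", "int", 1}, {"b", "varchar", 2}, {"c", "int", 3}}
	newCols := []item{{"a", "int", 0}, {"b", "varchar", 0}, {"c", "int", 0}}
	if err := reuseIDs("Column", newCols, oldCols); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(newCols) // IDs 1, 2, 3 carried over from oldCols
	err := reuseIDs("Column", []item{{"a", "bigint", 0}}, oldCols)
	fmt.Println(err) // Failed to repair table: Column a type should be the same
}
```

Matching by name rather than by position matters because TiDB's row encoding keys values by column ID, so reusing the old IDs is what keeps existing rows readable after the repair. The partition tests follow the same pattern, matching definitions by lower-cased name (`Partition pnew has lost`).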