[07:13:59] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2826343 (10Marostegui) The transfer of s3 has started, from db1044 to db1095. [07:39:05] 10DBA, 06Operations, 13Patch-For-Review: db1092 crash - https://phabricator.wikimedia.org/T151272#2826351 (10Marostegui) 05Open>03Resolved I have repooled this server after a week of no issues. [07:48:49] 10DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2826356 (10Marostegui) ALTER running on db1069 s5 instance. [07:57:26] 10DBA, 06Operations, 10Wikidata, 10netops, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2824480 (10Marostegui) Holding connections on the master: if there are 5-10 jobs running it shouldn't be a big deal as I assume only 10 c... [09:00:09] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure: Rebuild old timestamp format tables - https://phabricator.wikimedia.org/T151607#2826415 (10Marostegui) MariaDB replied with a solution that works, however they admit that it is weird that an `ALTER TABLE force` doesn't work: https://jira... [09:02:27] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure: Rebuild old timestamp format tables - https://phabricator.wikimedia.org/T151607#2826417 (10Marostegui) The above comment obviously only works on `10.1` [09:11:43] 10DBA, 13Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2826426 (10Marostegui) The ALTER is running on the master (db1040) ``` ./software/dbtools/osc_host.sh --host=db1040.eqiad.wmnet --port=3306 --db=commonswiki --table=revision --method=ddl --no-replicate "add key p... 
[09:25:21] 10DBA, 06Operations, 10Wikidata, 10netops, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2826465 (10jcrespo) @Manuel, @Daniel Actually it is a problem, because masters have a limit of CPU# or 32 active threads on the pool of co... [09:27:09] 10DBA, 10Wikidata, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2826467 (10jcrespo) [09:54:49] 10DBA, 13Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2826506 (10Marostegui) The master is done ``` root@neodymium:/home/marostegui/git/software/dbtools# mysql -hdb1040 -A commonswiki -e "show create table revision\G" *************************** 1. row *************... [09:56:39] 10DBA: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#2826511 (10Marostegui) [09:56:43] 10DBA, 13Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2826509 (10Marostegui) 05Open>03Resolved All the servers are done (db2044|db2037|db1053|db1056 are rc ones): ``` root@neodymium:/home/marostegui/git/software/dbtools# for i in `cat s4.hosts | awk -F " " '{pri... 
[09:58:24] 10DBA, 06Operations, 10ops-codfw: db2042 disk predictive failure - https://phabricator.wikimedia.org/T150974#2826515 (10Marostegui) [10:31:06] 10DBA, 06Labs, 06Operations: fstrim: Operation not supported on Labs DBs - https://phabricator.wikimedia.org/T151746#2826574 (10Volans) [11:22:55] 10DBA, 06Labs: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752#2826725 (10MarcoAurelio) [11:23:50] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2826738 (10Marostegui) - data transferred - ran mysql_upgrade on it all good - replication is... [11:48:06] 10DBA, 06Labs: Prepare and check storage layer for new fi.wikivoyage.org - https://phabricator.wikimedia.org/T151756#2826837 (10MarcoAurelio) [11:49:00] 10DBA, 06Labs: Prepare and check storage layer for new fi.wikivoyage.org - https://phabricator.wikimedia.org/T151756#2826852 (10MarcoAurelio) Task created as per [[ https://wikitech.wikimedia.org/wiki/Add_a_wiki#Start | instructions ]] on Wikitech. [11:56:41] 10DBA, 06Labs: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752#2826870 (10MarcoAurelio) #Patch-for-review: https://gerrit.wikimedia.org/r/#/c/323814/2 - Added `arbcom_cswiki` to `$private_wikis` in `realm.pp`. 
[12:04:43] 10DBA, 06Labs, 15User-Urbanecm: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752#2826897 (10Urbanecm) [12:35:52] 10DBA, 06Labs, 15User-Urbanecm: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752#2826966 (10jcrespo) a:03jcrespo [13:04:36] 10DBA, 06Operations, 10ops-codfw, 13Patch-For-Review: db2049 overheated and restarted - https://phabricator.wikimedia.org/T150876#2827020 (10Marostegui) 05Open>03Resolved This server didn't have any other issue and not even when it gots its cpu burned for some hours. So I have pooled it back [14:23:04] 10DBA, 10Wikidata, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2827166 (10daniel) I would favor doing the locking using an alternative mechanism for which we already have infrastructure. Setting up a separate Maria server just to... [14:31:53] jynus: ping pong help :) pymysql.err.OperationalError: (1142, "CREATE VIEW command denied to user 'maintainviews'@'localhost' for table 'page_assessments'") on labsdb1003 even though it's fine on labsdb1001 [14:32:05] (same command same script etc) [14:32:34] let me see [14:32:59] same for command: maintain-views --databases testwiki enwikivoyage enwiki --table page_assessments_projects --debug [14:33:02] pymysql.err.OperationalError: (1142, "CREATE VIEW command denied to user 'maintainviews'@'localhost' for table 'page_assessments_projects'") [14:33:10] and is ok on labsdb1001 [14:35:07] I see no differences on grants [14:36:09] flush privs? 
[14:36:12] 10DBA, 06Operations, 10ops-codfw: Degraded RAID on db2068 - https://phabricator.wikimedia.org/T151763#2827191 (10Volans) p:05Triage>03Normal [14:36:43] yes, this gives no differences: diff <(pt-show-grants h=labsdb1001 | grep maintainviews) <(pt-show-grants h=labsdb1003 | grep maintainviews) [14:37:38] jynus: a command to replicate what I'm doing and see different behavior [14:37:39] maintain-views --databases testwiki --table page_assessments --debug --replace-all [14:37:50] ok on labsdb1001 not on 1003 and --debug outputs the sql run [14:38:13] yes, I got it. I run flush privileges, but that should not make any difference [14:38:18] *ran [14:38:27] huh [14:39:02] md5sum $(which maintain-views) seems the same on each fyi [14:39:18] run it again, just in case [14:39:36] hm no dice, same thing [14:39:44] yeah, that is expected [14:39:52] let me see if the other user has a difference [14:39:59] I remember already setting up views on 1003 for other db's / tables [14:40:06] so this /has/ worked for sure [14:40:12] kk [14:40:34] nothing has changed, except that I deleted a bunch of databases on 1001 (not on 1003) [14:41:30] can you print in which database it is failing, everyone? [14:44:08] GRANT ALL PRIVILEGES ON `%wik%\_p`.* TO 'maintainviews'@'localhost'; [14:44:58] are you sure it is trying to create the views on testwiki_p, and not on testwiki ? [14:45:10] I checked how the hosts resolve their localhost just in case but they are fine both of them [14:45:20] also there is no other maintainview users than the one for @localhost [14:45:30] and as Jaime said, they do have exactly the same privs [14:46:09] maybe the table doesn't exist [14:47:56] if the table doesn't exist the view usually succeeds in creation but is not useful (I think?) 
and the script also checks for existence of the underlying table and if it doesn't exist an attempt isn't made [14:48:15] so either way that shouldn't affect it [14:48:38] well, the tables exist on enwiki enwikivoyage and testwiki on labsdb1003 [14:48:43] not everywhere else [14:48:48] right [14:48:53] I'm only running this against 3 DBs [14:48:58] and only then the specific tables [14:49:05] and it fails for all 3 dbs the same on 1003 [14:49:25] this is the first command I tried [14:49:25] maintain-views --databases testwiki enwikivoyage enwiki --table page_assessments_projects --debug [14:49:45] that says, find these 3 DBs (make sure not private) and create views for these tables only if the underlying tables exist [14:49:58] if I run across the individual DBs only it fails on all 3 individually [14:50:04] can you print the full query and the current database at the time it fails? [14:51:01] https://phabricator.wikimedia.org/P4521 [14:51:58] https://phabricator.wikimedia.org/P4521 [14:52:24] that gives me a syntax error [14:52:52] what does? [14:54:01] nah, it was unrelated [14:54:04] the syntax is ok [14:58:01] marostegui: only an fyi it's using the local socket file to connect if that matters [14:58:02] yes, it is a db issue [14:58:13] an account db issue [14:58:19] what I do not know yet is why [14:59:07] 10DBA, 06Labs, 15User-Urbanecm: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752#2827258 (10MarcoAurelio) I guess we can remove #Labs here as it should not replicate in Labs? It's a private wiki after all.
[14:59:08] I can reproduce it, so it should be fast to fix [14:59:17] gotcha, thanks man [15:01:32] but it has to be a crazy issue [15:01:48] I have virtually no ideas at the moment :) [15:01:56] the only difference I noticed was the order of the privileges in the output of show grants for [15:01:59] seems bizarre [15:02:02] yes [15:02:04] but that should make _no_ difference [15:02:07] but that should not affect it [15:02:16] we can try dropping the user and recreating it? [15:02:25] "switching on and off" [15:02:27] We could, and if that works... then I am puzzled [15:02:40] drop, recreate, flush privs? [15:02:55] the flush privileges is not needed [15:03:23] I am going to drop the user [15:03:44] chasemp: normally flush privileges is only needed when you directly touch the users tables in the mysql database [15:04:02] I only mention it as last time we were in this pickle it was required [15:04:37] chasemp: yeah, that is why I said: normally XD [15:04:43] hehe [15:05:03] if there is a corner case that only affects 1% of 1%: Labs will find it [15:05:03] it still doesn't work [15:05:04] that's our motto [15:05:18] weird [15:06:25] it could be some weird issue because of the underscore table [15:07:28] nope [15:07:39] it doesn't work with enwiki.test [15:08:53] it doesn't work without definer either [15:09:25] some crazy max views or something totally unrelated but that could differ between servers by coincidence? [15:09:36] no [15:09:57] some grants we are missing, but why it worked in the past [15:10:47] I am going to try granting CREATE VIEW explicitly [15:10:58] and/or all on a specific database [15:13:03] it is the privilege \_ [15:13:36] I don't get why it works differently in the 1001 and 1003 case then?
[15:13:44] I know [15:13:50] there is a grant missing [15:14:04] SELECT ON `%wik%` [15:14:37] there is something wrong [15:14:41] let me fix it [15:14:45] but that grant is present on 1003 [15:14:57] it is not, [15:15:08] GRANT SELECT ON `%wik%`.* TO 'maintainviews'@'localhost' [15:15:08] or it was not [15:15:13] it is now [15:15:16] with all [15:15:20] on my tests [15:15:26] that is from my history [15:15:27] weird [15:16:21] I hear twilight zone music playing in my head [15:16:23] yes, not it works, chasemp [15:16:26] *now [15:16:55] sure does, thanks guys [15:17:03] that is weird [15:17:51] chasemp, one last thing [15:18:04] it is weird, that grant was there since the start, veeeery weird issue [15:18:08] I may move the operations/mediawiki-config to a common [15:18:13] role [15:18:23] because I will need it for a monitoring check [15:18:32] the sync of it on puppet [15:18:38] will ask for a +1 [15:19:01] it's all good to me, now that it's decoupled from the script itself, as long as the repo lands on the DBs somehow I am good w/ however you guys want to do it [15:19:05] let me know dude [15:20:33] 10DBA, 06Community-Tech, 06Labs, 10MediaWiki-extensions-PageAssessments, and 2 others: Replicate page_assessments and page_assessments_projects tables on Labs - https://phabricator.wikimedia.org/T150832#2827293 (10chasemp) 05Open>03Resolved This has been created. (let me know otherwise) [15:21:17] jynus: do you object to me running w/ https://phabricator.wikimedia.org/T150679#2818304 (attachment for custom view change)? 
[15:21:38] object no [15:21:45] I worry about angry users [15:22:10] me too but I'm sending them to bawolff's house [15:22:16] I am ok [15:22:28] I will try to come up with a solution, the same I did with watchlist_count [15:22:34] and puppetize both properly [15:23:15] but that can go now [15:26:10] I think what it could be [15:26:37] if tables change and they are wildcard-based [15:26:48] maybe mysql cache doesn't get it [15:27:01] and it requires a privilege change to get it [15:27:09] which is a problem that could reappear in the future [15:27:40] but you dropped the user [15:27:51] then, it is magic! [15:28:08] or a huge issue with custom mysql grant tables [15:33:22] 10DBA, 15User-Urbanecm: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752#2827317 (10jcrespo) Correct, this is production replication labs-only task. Normally we categorize those as subtasks of {T50930}, but I intend to fix this ASAP. [15:33:54] ^replication-only, I meant [15:55:24] 10DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2827363 (10Marostegui) >>! In T148967#2826356, @Marostegui wrote: > ALTER running on db1069 s5 instance. This was accidentally killed - after talking to @jcrespo we decided that it is really not worth the time, but we will do it on:... [16:18:51] 10DBA, 15User-Urbanecm: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752#2827396 (10jcrespo) So this is applied to db1069, which is the active sanitarium. However, right now, sanitarium2 (the next-generation server for this service)... 
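Note on the grant patterns quoted in the 14:44-15:15 discussion: in MySQL database-level grants, `%` matches any run of characters, `_` matches exactly one character, and `\_` is a literal underscore. The sketch below is only an illustration of that matching rule, not the server's implementation; the helper names `grant_pattern_to_regex` and `grant_matches` are hypothetical, and case handling (which depends on server settings) is ignored. It shows why the `%wik%\_p` grant covers the view database (`testwiki_p`) but not the underlying one (`testwiki`), which is consistent with the missing `SELECT ON %wik%` grant being what blocked CREATE VIEW.

```python
import re


def grant_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a MySQL database-name grant pattern into a regex.

    Hypothetical helper for illustration only:
      %   -> any sequence of characters
      _   -> exactly one character
      \\x -> the literal character x (e.g. \\_ is a plain underscore)
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: take the next one literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def grant_matches(pattern: str, database: str) -> bool:
    """Return True if the whole database name matches the grant pattern."""
    return grant_pattern_to_regex(pattern).fullmatch(database) is not None


if __name__ == "__main__":
    view_grant = r"%wik%\_p"   # GRANT ALL PRIVILEGES ON `%wik%\_p`.*
    base_grant = "%wik%"       # GRANT SELECT ON `%wik%`.*
    for db in ("testwiki_p", "testwiki", "enwikivoyage"):
        print(db, grant_matches(view_grant, db), grant_matches(base_grant, db))
```

Creating a view in `testwiki_p` that selects from `testwiki` needs privileges on both sides, so a check like this only explains which databases each pattern covers; it does not explain why the grant went missing on labsdb1003.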
[16:38:18] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2827470 (10jcrespo) [16:38:20] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2827467 (10jcrespo) 05Open>03Resolved a:03Marostegui I would say the testing is done, let's... [16:49:28] 10DBA, 10Wikidata, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2827519 (10daniel) Possibly relevant: https://github.com/mlanett/redis-lock [16:54:37] 10DBA, 10Wikidata, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2827546 (10daniel) Using memcached is probably not feasible, since locks may be dropped at any time. If Redis turns out not to be an option, we may consider using a m... [16:56:50] 10DBA, 10Wikidata, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2827557 (10daniel) >>! In T151681#2826465, @jcrespo wrote: > @Manuel, @Daniel Actually it is a problem, because masters have a limit of CPU# or 32 active threads on t... [16:57:56] 10DBA, 10Wikidata, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2827561 (10daniel) >>! In T151681#2826465, @jcrespo wrote: > @Marostegui, @Daniel Actually it is a problem, because masters have a limit of CPU# or 32 active threads... 
[17:10:09] 10DBA, 15User-Urbanecm: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752#2827580 (10MarcoAurelio) There's no need to rush here so it's fine to wait until the maintenance is done. [17:18:42] 10DBA, 10Wikidata, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2827592 (10jcrespo) max_connections is 5000, maximum active threads is 32 enforced on the connection pool. No connections should be open that are idle, and a typical... [17:39:13] 10DBA, 10Wikidata, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2827708 (10aaron) How often would locks be dropped? Using ScopedLock would handle exceptions in non-lock code. The shutdown handler usually catches SIGINT. I guess th... [17:44:16] 10DBA, 10Wikidata, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2827758 (10aaron) There is also a flip-side to automatically dropping on connection loss, which is that loss can happen (possibly due to the net_wait_timeout options)... [18:26:00] 10DBA, 13Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2827913 (10Jdforrester-WMF) Thank you! [18:54:38] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2828133 (10Marostegui) S3 and S1 are now replicating in db1095. There was some issues when rep... 
[18:57:36] 10DBA, 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review: Wikibase\Repo\Store\Sql\SqlEntityIdPager::fetchIds query slow - https://phabricator.wikimedia.org/T151356#2828165 (10hoo) [19:17:29] jynus: I'm about to run the fixup for user_properties on labsdb1001 fyi [19:19:03] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2828236 (10jcrespo) Basically, the heartbeat table is shared between shards, so the replace co... [19:19:23] that is ok [19:19:29] I am going to disconnect now [19:19:50] we will have to sync, maybe this week? about account handling and proxies [19:20:34] the bulk of DBA work will finish soon (TM), but there are lots of small pending tasks that need labs input [19:21:29] we could maybe have 2 shards replication as early as this or next week [19:21:52] and that is 800 wikis, including enwiki [19:21:59] sweet [19:22:14] Let's talk a bit friday morning? [19:22:17] my morning that is [19:22:24] ok for me [19:22:31] will ask manuel tomorrow [19:22:42] have a good evening jynus :) [19:29:57] 10DBA, 10MediaWiki-extensions-WikibaseRepository, 10Wikidata: Wikibase\Repo\Store\Sql\SqlEntityIdPager::fetchIds query slow - https://phabricator.wikimedia.org/T151356#2828329 (10hoo) 05Open>03Resolved [21:56:46] 10DBA, 06Labs, 10Labs-Infrastructure: Migrate existing labs users from the old servers, if possible using roles and start maintaining users on the new database servers, too - https://phabricator.wikimedia.org/T149933#2829103 (10chasemp) @jcrespo how do you feel about setting limits via http://dev.mysql.com/d... [22:44:09] 10DBA, 10Wikidata, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2829238 (10daniel) >>! In T151681#2827708, @aaron wrote: > How often would locks be dropped? 
Using ScopedLock would handle exceptions in non-lock code. The shutdown h... [22:44:42] 10DBA, 10Wikidata, 07Performance: DispatchChanges: Avoid long-lasting connections to the master DB - https://phabricator.wikimedia.org/T151681#2829240 (10daniel)