[00:33:56] DBA, Operations, Performance-Team, Platform Engineering, and 2 others: Document remaining database load groups - https://phabricator.wikimedia.org/T267077 (Krinkle) p:Triage→Medium
[01:21:28] DBA, CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (Huji)
[02:35:22] DBA: Remove muswiki and mhwiktionary from s3 - https://phabricator.wikimedia.org/T260112 (Urbanecm) Thanks Marostegui!
[06:05:44] DBA, CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (Marostegui) a:Marostegui
[06:06:05] DBA, CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (Marostegui) p:Triage→Medium
[06:06:15] DBA, CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (Marostegui)
[06:19:22] DBA, CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (Marostegui)
[08:06:28] DBA: Database for link recommendation service - https://phabricator.wikimedia.org/T267214 (Marostegui) p:Triage→Medium @kostajh thanks for the task. However, can we get some more details from your side? The clearer everything else, the faster we can try to get it accommodated on our backlog without m...
[08:31:18] DBA: Database for link recommendation service - https://phabricator.wikimedia.org/T267214 (kostajh) >>! In T267214#6605223, @Marostegui wrote: > @kostajh thanks for the task. > However, can we get some more details from your side? The clearer everything else, the faster we can try to get it accommodated on o...
[08:48:17] DBA, Operations: db2077 hung on reboot - https://phabricator.wikimedia.org/T267220 (Kormat) p:Triage→Medium
[08:49:37] DBA, Operations, ops-codfw: db2077 hung on reboot - https://phabricator.wikimedia.org/T267220 (Kormat) a:Papaul Hi @Papaul, Can you run a firmware upgrade on this host, please? Let me know a day that works for you, and i can have the host powered down safely.
[09:24:17] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[09:28:01] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[09:29:05] DBA, GrowthExperiments, Growth-Team (Current Sprint), Wikimedia-production-error, mariadb-optimizer-bug: Slow load times for Special:Homepage on cswiki - https://phabricator.wikimedia.org/T267216 (Marostegui) p:Triage→Medium
[09:29:14] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[09:30:33] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[09:35:10] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat) es1 eqiad: [x] es1012 [x] es1016 [] es1018 [x] es1027 [x] es1029
[09:35:38] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[09:35:56] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat) es2 eqiad: [x] es1011 [x] es1013 [] es1015 [x] es1026 [x] es1030
[09:36:24] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[09:36:58] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat) es3 eqiad: [x] es1014 [x] es1017 [] es1019 [x] es1028 [x] es1031
[09:37:15] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[09:39:06] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[09:40:39] DBA, GrowthExperiments, Growth-Team (Current Sprint), Wikimedia-production-error, mariadb-optimizer-bug: Slow load times for Special:Homepage on cswiki - https://phabricator.wikimedia.org/T267216 (Tgr) I'm still confused why this is only happening on cswiki (the DB server is not actually limi...
[09:42:23] DBA, GrowthExperiments, Growth-Team (Current Sprint), Wikimedia-production-error, mariadb-optimizer-bug: Slow load times for Special:Homepage on cswiki - https://phabricator.wikimedia.org/T267216 (Marostegui) Given that it looks like an optimizer bug, it could be just that particular set of v...
[09:49:28] marostegui: hello, could you please add me to P13194?
[09:50:02] Urbanecm: just made it public
[09:50:02] would adding an index hit be bad for queries on other wikis?
[09:50:10] thanks marostegui
[09:50:13] s/hit/hint/
[09:50:37] jynus: who knows
[09:50:42] he he
[09:50:52] jynus: we're already filtering by a primary key? (if we're talking about T267216)
[09:50:52] T267216: Slow load times for Special:Homepage on cswiki - https://phabricator.wikimedia.org/T267216
[09:51:01] it is weird it only happens on cswiki
[09:51:17] you have definitely more DB knowledge than I do, but I really don't see how that query should take more than a second
[09:51:33] Urbanecm: check the task, looks like an optimizer bug
[09:51:45] well, Manuel's explanation makes a lot of sense
[09:51:52] even if a bug
[09:52:15] marostegui: ok :-)
[09:53:04] but the hint may help (needs testing) as a shortcut until bug is fixed
[09:53:25] and deployed to WMF prod (or do we somehow backport important patches?)
[09:53:29] Urbanecm: I would focus on testing if that would work elsewhere
[09:53:43] if it doesn't break other wikis
[09:53:45] DBA: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Marostegui) es1029 pooled in es1 es1030 pooled in es2 es1031 pooled in es3
[09:53:55] DBA: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Marostegui)
[09:54:58] I loaded cs wiki and it wasn't slow, so it only happens under certain conditions?
[09:55:29] jynus: you will only see that at cs.wikipedia.org/wiki/Special:Homepage
[09:55:35] (and only if the bug applies)
[09:56:02] you need to also enable that, it's "Newcomer homepage" settings in Preferences.
[09:56:10] ah, so it is that
[09:56:44] so I would say to test if a temporary fix would make the query bad on a selection of other wikis
[09:57:09] we don't like index hints as they are technical debt, but we already have a few of them
[09:57:19] That is up to Urbanecm, but if it is not something very recurrent, I wouldn't spend time on it. Anyways, it is not my call
[09:58:46] I still cannot reproduce it BTW
[10:00:13] I am going to backup db1139 before shutdown
[10:01:56] I wasn't able to reproduce it as well, but it certainly happens - the logs don't lie :-)
[10:04:09] as for a workaround, that'd be a question for the team's engineers (Roan.Kattouw, tgr and kostajh)
[10:04:24] sure
[10:04:55] but it's pretty rare, https://logstash.wikimedia.org/goto/c46f756934a4329470eb14d53ad1666e, about 600 in last 30 days, and the homepage is the main landing page for cswiki newcomers, so it's visited by a lot of people (don't have numbers at hand right now)
[10:09:12] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo) Creating a backup before shutting them down, in case data got lost after maintenance.
[10:14:08] Urbanecm: I am going to keep dropping muswiki and friend
[10:14:19] From s3
[10:14:19] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[10:14:21] as long as you drop them from s3 only :)
[10:14:24] thanks marostegui !
[10:15:47] Urbanecm: yeah, I am triple checking things, including checking the ibd on the itself, to make sure they've not been touched since 11th aug
[10:15:57] (y)
[10:23:13] DBA: Remove muswiki and mhwiktionary from s3 - https://phabricator.wikimedia.org/T260112 (Marostegui) Open→Resolved This is done: ` # ./section s3 | while read host port; do echo "$host:$port"; mysql.py -h$host:$port -e "show databases like 'muswiki'; show databases like 'mhwiktionary'";done labsdb10...
[11:17:23] DBA: Database for link recommendation service - https://phabricator.wikimedia.org/T267214 (Marostegui) Thanks @kostajh - adding those to the original template.
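Editorial aside on the index hint idea raised for T267216 earlier in this log: the sketch below shows what an index hint does to a query plan and how one could compare the hinted and unhinted plans before deploying it. It is not the query from the task. It assumes SQLite (via Python's standard `sqlite3` module) and its `INDEXED BY` clause as a stand-in for the MariaDB index hints being discussed, and the table `demo_log` and indexes `demo_user` and `demo_ts` are hypothetical names invented for the illustration.

```python
import sqlite3


def plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    # The 4th column of EXPLAIN QUERY PLAN output is the human-readable detail.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table with two candidate indexes.
conn.executescript(
    """
    CREATE TABLE demo_log (id INTEGER PRIMARY KEY, user_id INTEGER, ts INTEGER);
    CREATE INDEX demo_user ON demo_log (user_id);
    CREATE INDEX demo_ts ON demo_log (ts);
    """
)
conn.executemany(
    "INSERT INTO demo_log (user_id, ts) VALUES (?, ?)",
    [(i % 10, i) for i in range(1000)],
)

query = "SELECT id FROM demo_log {hint} WHERE user_id = 3 AND ts > 500"

# Without a hint the optimizer is free to pick whichever index it prefers.
free_plan = plan(conn, query.format(hint=""))
# With a hint the named index is forced, whether or not it is the best choice.
hinted_plan = plan(conn, query.format(hint="INDEXED BY demo_ts"))

print("optimizer's choice:", free_plan)
print("with index hint:   ", hinted_plan)
```

Running the same comparison against a selection of other datasets is the kind of check suggested in the chat: a hint that helps one wiki pins the plan everywhere, which is why hints are described above as technical debt.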
[11:18:22] DBA: Database for link recommendation service - https://phabricator.wikimedia.org/T267214 (Marostegui)
[11:24:29] Blocked-on-schema-change, DBA, Operations, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat)
[11:40:46] Blocked-on-schema-change, DBA, Operations, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat)
[11:58:56] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo)
[12:00:06] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo)
[12:04:05] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo) db1139 is down, backed up, and ready for maintenance - I have downtime'd until Friday. Let us know either if you will need more time or when it has been done to put it back into p...
[12:04:18] ^FYI
[12:42:30] PROBLEM - MariaDB sustained replica lag on db2090 is CRITICAL: 3500 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2090&var-port=9104
[12:47:08] Blocked-on-schema-change, DBA, Operations, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat)
[12:55:22] DBA, Commons, Operations, Platform Engineering, and 2 others: Increase on database writes and deletes activity on Commonswiki leads to some replication lag - https://phabricator.wikimedia.org/T266432 (Marostegui) Open→Resolved a:Marostegui This has ceased and we are back to normal val...
[12:59:40] RECOVERY - MariaDB sustained replica lag on db2090 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2090&var-port=9104
[13:06:14] Blocked-on-schema-change, DBA, Operations, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat)
[13:25:40] Blocked-on-schema-change, DBA, Operations, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat)
[13:36:59] godog: finally got around to doing http://mystery.knightlab.com/. that was neat, thanks :)
[13:37:08] (you mentioned it on 2020-08-04)
[13:38:48] kormat: haha! glad you enjoyed it :)
[13:39:15] I must admit I only played a few queries
[13:39:22] kormat: https://www.slideshare.net/jynus/mysql-schema-design-in-practice next
[13:39:49] jynus: amusingly, that's the next tab i have open for this :)
[13:40:44] we can do live sessions if needed, some of the problems are difficult to solve without more explanation
[13:41:23] * sobanski takes notes...
[13:42:24] some of those now that I see are outdated, due to actor and MCR migrations
[13:54:30] I already disagree at the first example query... in SQLite you can just do .tables :-P
[13:54:53] (re: mystery game above, not the slides)
[13:56:04] volans: not an SQLite pro, but I would guess it is the same as SHOW TABLES; vs INFORMATION_SCHEMA.TABLES
[13:56:31] you do the first quickly on command line but use the I_S as good practice
[13:56:41] the output is actually the same, the full query one per line, the other in columns
[13:56:54] but yeah both are totally viable
[13:59:11] it could be worse: "SELECT * FROM cat"
[13:59:37] why not "SELECT * FROM dog" ??????
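Editorial aside on the `.tables` versus catalog-query exchange above: `.tables` is a convenience command of the `sqlite3` command-line shell, while a query against the `sqlite_master` catalog works from any client, much as `INFORMATION_SCHEMA.TABLES` does relative to `SHOW TABLES` in MySQL/MariaDB. The following is a minimal sketch using Python's standard `sqlite3` module; the `cat` and `dog` tables are borrowed from the joke in the chat and the helper name `list_tables` is made up for the illustration.

```python
import sqlite3


def list_tables(conn):
    """List user tables by querying the catalog, as a program would."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cat (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE dog (id INTEGER PRIMARY KEY, name TEXT)")

print(list_tables(conn))  # ['cat', 'dog']
```

The shell shortcut is quicker when typing interactively; the catalog query is what scripts and applications can rely on, since dot-commands are not part of SQL.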
[14:03:31] something is going on since 13:09 - lots of errors about lag
[14:03:43] ah, it is codfw only
[14:03:53] most likely the alter table
[14:04:09] ok, it wasn't clear at first which dc it was
[14:04:20] https://logstash.wikimedia.org/goto/7e9e06a0b0c6e569d26b03678367c6eb
[14:05:24] -host:mw2* on filter makes it look good :-D
[14:20:20] sobanski: if you want what looks like a decent intro to sql, https://selectstarsql.com/ seems good
[14:20:29] i'm doing a bit of it while a schema change runs
[14:21:00] I opened this one from the initial recommendation on the knight lab page :)
[14:21:21] :)
[14:39:30] DBA: Database for link recommendation service - https://phabricator.wikimedia.org/T267214 (kostajh)
[14:39:44] DBA: Add a link engineering: Database for link recommendation service - https://phabricator.wikimedia.org/T267214 (kostajh)
[14:44:19] DBA, Operations, ops-codfw: db2077 hung on reboot - https://phabricator.wikimedia.org/T267220 (Papaul) @Kormat you can do it now if you have time. Thanks.
[14:44:54] DBA, Operations, ops-codfw: db2077 hung on reboot - https://phabricator.wikimedia.org/T267220 (Kormat) Perfect. I'll bring it down now, and update here when done.
[14:47:26] DBA, Operations, ops-codfw: db2077 hung on reboot - https://phabricator.wikimedia.org/T267220 (Kormat) @Papaul: it's powering off now. Thanks!
[15:14:37] Blocked-on-schema-change, DBA: Drop default of protected_titles.pt_expiry - https://phabricator.wikimedia.org/T267335 (Ladsgroup)
[15:18:03] DBA, Operations, ops-codfw: db2077 hung on reboot - https://phabricator.wikimedia.org/T267220 (Papaul) a:Papaul→Kormat Before ` BIOS Version 2.4.3 Firmware Version 2.40.40.40 Lifecycle Controller Firmware 2.40.40.40 `` After `` BIOS Version 2.11.0 Firmware Version 2.75.75.75 Lifecy...
[15:24:50] DBA, Operations: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Kormat)
[15:24:55] DBA, Operations, ops-codfw: db2077 hung on reboot - https://phabricator.wikimedia.org/T267220 (Kormat) Open→Resolved Great, thanks :)
[15:59:41] DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (Cmjohnson) When these arrive they will be sitting on the floor until we have space to rack them. At this time I may be able to get 4 or 5 racked in 10G racks.
[16:17:06] DBA, Operations: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Marostegui)
[16:24:20] DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (Marostegui) @Cmjohnson we are going to decommission 9 2U hosts soon. I can prioritize to decommission at least 3 of them in the next 2 weeks (in different rows) so...
[16:27:51] DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (Cmjohnson) @Marostegui yes, db1091 is already gone from the racks. I did a more detailed count and right now, not removing any 1G servers from 10G racks I can fit...
[16:29:30] DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (Marostegui) Cool, we can plan for that. Count also with at least those 6U I am going to free up before those arrive
[17:22:32] marostegui/jynus john is working on the motherboard swap for db1139 now
[17:23:56] thanks
[17:25:43] just updating the task would be enough, as we may not be around later in the day :-)
[17:25:57] (when done)
[18:10:16] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (Jclark-ctr) @jcrespo mainboard replaced configured settings
[18:16:53] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (Jclark-ctr) Open→Resolved
[18:17:58] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (Jclark-ctr)
[19:31:57] Blocked-on-schema-change, DBA, Operations, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat)
[19:49:50] DBA, GrowthExperiments, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: Slow load times for Special:Homepage on cswiki - https://phabricator.wikimedia.org/T267216 (Tgr) a:Tgr