[01:05:56] DBA, Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (Jdforrester-WMF)
[01:06:16] DBA, Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (Jdforrester-WMF)
[02:32:19] DBA, Epic, Tracking-Neverending: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921 (Zoranzoki21)
[02:32:21] DBA, Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (Zoranzoki21)
[05:56:59] DBA, Patch-For-Review: Productionize es1020-es1025, es2020-es2025 - https://phabricator.wikimedia.org/T243052 (Marostegui)
[05:57:05] DBA, Epic, Goal: Setup es4 and es5 replica sets for new read-write external store service - https://phabricator.wikimedia.org/T226704 (Marostegui)
[05:57:07] DBA, Patch-For-Review: Productionize es1020-es1025, es2020-es2025 - https://phabricator.wikimedia.org/T243052 (Marostegui) Open→Resolved All hosts ready
[05:58:49] DBA, Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (Marostegui) p:Triage→Medium Let's truncate them, same as T227717#5806662
[06:50:04] DBA, MediaWiki-API, Pywikibot, Wikidata, and 3 others: Wikidata API fails with timeout when asking for 5 RC redirects - https://phabricator.wikimedia.org/T245989 (Marostegui) From what I have seen, the optimizer chooses the wrong index on all s8 hosts. I have also tried to optimize `page` to see...
[07:36:57] DBA: Slow query on 10.4: SpecialRecentChanges::doMainQuery - https://phabricator.wikimedia.org/T246069 (Marostegui)
[07:37:24] DBA, mariadb-optimizer-bug: Slow query on 10.4: SpecialRecentChanges::doMainQuery - https://phabricator.wikimedia.org/T246069 (Marostegui) p:Triage→Medium I am going to report this to MariaDB
[07:41:56] DBA, mariadb-optimizer-bug: Slow query on 10.4: SpecialRecentChanges::doMainQuery - https://phabricator.wikimedia.org/T246069 (Marostegui) Another comparison with the JSON output Normal run: ` "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 12905, "table": { "tabl...
[08:36:58] DBA, mariadb-optimizer-bug: Slow query on 10.4: SpecialRecentChanges::doMainQuery - https://phabricator.wikimedia.org/T246069 (Marostegui) Created this: https://jira.mariadb.org/browse/MDEV-21813
[08:47:58] DBA, Upstream, mariadb-optimizer-bug: Slow query on 10.4: SpecialRecentChanges::doMainQuery - https://phabricator.wikimedia.org/T246069 (Aklapper)
[08:50:11] DBA, Core Platform Team: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (Marostegui)
[08:50:44] DBA, Core Platform Team, Goal: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (Marostegui) p:Triage→Medium
[08:53:50] DBA, Upstream: Possibly disable optimizer flag: rowid_filter on 10.4 - https://phabricator.wikimedia.org/T245489 (Marostegui)
[09:11:37] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, Patch-For-Review, and 2 others: Wikibase client description API module results in 15k selected rows with new term storage - https://phabricator.wikimedia.org/T246005 (Addshore)
[10:00:34] I was checking the CHECK constraint on db1114, it is easy to implement
[10:48:25] same syntax in mariadb and mysql
[10:48:43] but if violated, mysql says: ERROR 3819 (HY000): Check constraint 'mycheck' is violated.
[10:48:49] and mariadb says:
[10:48:58] ERROR 4025 (23000): CONSTRAINT `mycheck` failed for `ops`.`mytable`
[12:18:57] jynus, marostegui: is it okay to leave tendril in the current state for another ~ 2 hrs? there's something not working in the combination of mod_rewrite and mod_auth_cas, will revert to mod_auth_ldap if no fix can be tracked down, but would like to collect some more logs for an upstream bug report
[12:19:30] moritzm: fine by me, but we should leave it running by the end of the day, in case we have issues during the evening, it is useful for us to debug
[12:20:07] definitely, will revert to mod_auth_ldap later for sure
[12:22:45] cool then
[12:22:47] thank you
[12:23:04] ack, thx
[12:45:12] marostegui if you can shutdown es1019 I can update f/w...not sure if you need more notice https://phabricator.wikimedia.org/T243963
[12:45:20] DBA, OTRS, Operations, Recommendation-API, Research: Upgrade and restart m2 primary database master (db1132) - https://phabricator.wikimedia.org/T246098 (Marostegui)
[12:45:28] cmjohnson1: Let me try to depool it and see if connections stop - thanks!
[12:45:50] DBA, OTRS, Operations, Recommendation-API, Research: Upgrade and restart m2 primary database master (db1132) - https://phabricator.wikimedia.org/T246098 (Marostegui) p:Triage→Medium
[12:52:58] cmjohnson1: es1019 is now off! once you are done with it, power it back on and I will take it from there. Thank you :)
[13:21:13] heads up, I will be (ab)using the test-s1 hosts db1114/db2102 for backup recovery testing (aka its intended usage)
[13:21:48] keep in mind that db1114 is percona
[13:24:25] I know, I set it up myself :-D, but it is nice because it allows me to test under far from ideal conditions
[13:24:51] I remember one thing that I should add to the 10.4 ticket
[13:25:40] we should create wmf-pt-heartbeat package on buster, or we may end up too later heartbeat doesn't work on debian 10
[13:26:10] *learning
[13:27:01] DBA: Test MariaDB 10.4 in production - https://phabricator.wikimedia.org/T242702 (jcrespo)
[13:27:10] ^I hope you are ok with this edit
[13:32:18] yes
[13:38:48] DBA: Test MariaDB 10.4 in production - https://phabricator.wikimedia.org/T242702 (jcrespo)
[14:05:34] DBA: Test MariaDB 10.4 in production - https://phabricator.wikimedia.org/T242702 (Marostegui) I have left a heartbeat running on db1107, it should have no problem, but let's give it a few days to make sure it doesn't die or crash or whatever ` root 25652 0.0 0.0 34892 17016 ? Ss 13:41 0:00...
[14:30:50] marostegui: the update keeps failing, I've tried locally and through web portal. I am out of ideas right now and a bit frustrated with it, I have other things that I need to take care of. It didn't go as easy as it was supposed. we can keep the task open but I'm not sure what to do next.
[14:32:22] cmjohnson1: manuel not around
[14:32:35] let me know how I should proceed, is this about es1019?
[14:32:42] okay...yes it is es1019
[14:32:50] ok, will pool it back
[14:32:56] okay! thanks
[14:32:57] if you think it is not easy
[14:33:10] and we can discuss how to proceed with more time
[14:33:14] okay
[14:33:43] when es4 and es5 hosts are in production, we may have more margin
[14:33:53] the main blocker would be the reimage of that host
[14:34:02] could you please update the ticket with the summary?
[14:34:07] so manuel is aware?
[14:34:31] if it is not working it is not working, nothing we can do :-D
[14:36:40] cmjohnson1: ok, let's forget about it but let's make sure the idrac is reachable. that host will be refreshed next FY
[14:37:05] cmjohnson1: so let's leave the host back on and with idrac reachable and let's forget the FW update anyways
[14:37:08] is that possible, cmjohnson1^?
[14:37:26] a power drain worked last time
[14:37:39] DBA, DC-Ops, Operations, ops-eqiad: es1019: reseat IPMI - https://phabricator.wikimedia.org/T243963 (Cmjohnson) I attempted to update the idrac f/w for es1019 but the update failed several times for not being able to verify package signature. The update was downloaded directly from dell's portal...
[14:38:23] jynus, yes the f/w update was supposedly going to correct the idrac from freezing periodically.
[14:39:39] what I mean, could it be drained so at least it is temporarily fixed?
[14:39:57] yes, that's what we've been doing
[14:40:17] thanks, that would be ok for now
[14:40:27] as the permanent solution is not easy
[14:40:56] maybe when we failover to codfw we will have extra time to debug more calmly
[14:52:14] cmjohnson1: just to be clear, do I have green light to put es1019 back into production service?
[14:52:23] (for now)
[14:53:21] jynus: yes sorry! all clear
[14:53:43] thanks, just confirming to avoid unscheduled service interruptions
[14:53:56] :-)
[14:55:29] I also confirm admin interface up (for now :-S)
[14:57:54] cool
[14:58:05] as I said that host is meant to be refreshed next FY anyways
[14:58:21] so hopefully it won't be long till we can remove it
[15:01:41] I will start es1019 so it catches up replication wise
[15:02:54] thanks
[15:03:05] I upgraded it before powering it off
[15:04:07] I can see, was wondering if to upgrade to 43-2
[15:06:38] done :)
[15:10:53] DBA, DC-Ops, Operations, ops-eqiad: es1019: reseat IPMI - https://phabricator.wikimedia.org/T243963 (Marostegui) Thanks Chris for tackling this. Let's not spend more time on this host, it has a big history of failing idrac :( So let's just make sure it is available and if it fails in a few months...
[15:13:41] DBA, DC-Ops, Operations, ops-eqiad: es1019: reseat IPMI - https://phabricator.wikimedia.org/T243963 (jcrespo) Will close this, then, once the host is fully back into production.
[15:45:32] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, MW-1.35-notes (1.35.0-wmf.20; 2020-02-18), and 2 others: Wikibase client description API module results in 15k selected rows with new term storage - https://phabricator.wikimedia.org/T246005 (Ladsgroup) https://grafana.wikimedia.org/d/000000273/...
[15:51:26] DBA, Upstream, mariadb-optimizer-bug: Slow query on 10.4: SpecialRecentChanges::doMainQuery - https://phabricator.wikimedia.org/T246069 (Anomie) I note that neither `tmp_2` nor `tmp_3` are in tables.sql. See also {T206103}. None of the possible queries here really seem all that good. `tmp_2` has pre...
[15:57:04] DBA, DC-Ops, Operations, ops-eqiad: es1019: reseat IPMI - https://phabricator.wikimedia.org/T243963 (Cmjohnson) Open→Resolved The reseat was completed but idrac f/w update failed. resolving the task and will do flea power drains if or when idrac freezes again.
[16:02:45] DBA, Upstream, mariadb-optimizer-bug: Slow query on 10.4: SpecialRecentChanges::doMainQuery - https://phabricator.wikimedia.org/T246069 (Marostegui) Yes, those indexes are meant to be removed, but last time I checked they are indeed being used (as we saw here). We can probably try to get a host and r...
[16:28:24] DBA, Core Platform Team Workboards (Clinic Duty Team), Goal, Patch-For-Review: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (Anomie)
[16:35:33] DBA, Core Platform Team Workboards (Clinic Duty Team), Goal, Patch-For-Review: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (jcrespo) We discussed this, Manuel and I, and unless someone can figure an ingenious way to test it, writes cannot...
[16:51:43] DBA, Core Platform Team Workboards (Clinic Duty Team), Goal, Patch-For-Review: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (Anomie) I'm not terribly familiar with this process either, but as I understand it the process might go something...
[16:57:37] DBA, Core Platform Team Workboards (Clinic Duty Team), Goal, Patch-For-Review: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (jcrespo) Can a cluster be set as read only on one mw server and read-write on others? Would that work well, as far...
[16:57:51] DBA, Core Platform Team Workboards (Clinic Duty Team), Goal, Patch-For-Review: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (Anomie) >>! In T246072#5916373, @jcrespo wrote: > We discussed this, Manuel and I, and unless someone can figure a...
[17:00:40] DBA, Core Platform Team Workboards (Clinic Duty Team), Goal, Patch-For-Review: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (jcrespo) Thanks, that solves my fears.
[17:11:20] DBA, Core Platform Team Workboards (Clinic Duty Team), Goal, Patch-For-Review: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (Anomie) >>! In T246072#5916561, @jcrespo wrote: > Can a cluster be set as read only on one mw server and read-writ...
[17:51:08] DBA, DC-Ops, Operations, ops-eqiad: es1019: reseat IPMI - https://phabricator.wikimedia.org/T243963 (jcrespo) es1019 is just pending the last config push back to normal traffic weights (and reducing the master's).
[17:58:55] DBA, Core Platform Team, GlobalUsage, StructuredDataOnCommons: Normalize globalimagelinks table - https://phabricator.wikimedia.org/T241053 (WDoranWMF) Moving to feature request review but may be destined for future initiatives to be prioritised
[18:20:30] DBA, Core Platform Team Workboards (Clinic Duty Team), Goal, Patch-For-Review: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (Marostegui) Thanks @anomie for the detailed steps. That's super helpful. I think we can try that with es4 and one...
[19:10:35] DBA, Core Platform Team Workboards (Clinic Duty Team), Goal, Patch-For-Review: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (Anomie) >>! In T246072#5917047, @Marostegui wrote: > I'm not familiar with `shell.php` so I'll have a look a how t...
[21:03:00] DBA, Core Platform Team Workboards (Clinic Duty Team), Goal, Patch-For-Review: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (Marostegui) >>! In T246072#5917275, @Anomie wrote: >>>! In T246072#5917047, @Marostegui wrote: >> I'm not familiar...
[21:49:20] DBA, Operations, ops-eqiad: db1095 backup source crashed: broken BBU - https://phabricator.wikimedia.org/T244958 (Jclark-ctr) @Marostegui Received replacement bbu. please message me on irc to schedule replacement
[21:59:56] DBA, Operations, ops-eqiad: db1095 backup source crashed: broken BBU - https://phabricator.wikimedia.org/T244958 (Jclark-ctr) Replaced BBU @jcrespo @Marostegui
[22:32:41] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, MW-1.35-notes (1.35.0-wmf.20; 2020-02-18), and 2 others: Wikibase client description API module results in 15k selected rows with new term storage - https://phabricator.wikimedia.org/T246005 (Addshore) Open→Resolved