[00:55:40] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (ifried) @Marostegui Watchlist Expiry has now been enabled on Metawiki, Enwikisource, Enwikivoyage, Enwikiversity, Eswiktionary, Eswikisource, Eswikivoyage, Hewik...
[05:16:23] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (Marostegui) Thank you @ifried - I will take a look at the graphs for the servers where those wikis live in case there're strange things.
[05:40:34] DBA, decommission-hardware, Patch-For-Review: decommission es2019.codfw.wmnet - https://phabricator.wikimedia.org/T264063 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `es2019.codfw.wmnet` - es2019.codfw.wmnet (**PASS**) - Downtimed host on Icinga...
[05:43:55] DBA, decommission-hardware: decommission es2019.codfw.wmnet - https://phabricator.wikimedia.org/T264063 (Marostegui) a:Marostegui→Papaul
[06:03:05] Blocked-on-schema-change, DBA: Schema change to drop three indexes from wb_changes - https://phabricator.wikimedia.org/T264109 (Marostegui) Progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1005 [] db2100 [] db2094 [] db2091 [] db2086 [] db2085 [x] db2084 [] db2083 [] db2082 []...
[06:03:43] Blocked-on-schema-change, DBA: Schema change to drop three indexes from wb_changes - https://phabricator.wikimedia.org/T264109 (Marostegui) @Ladsgroup I am going to deploy this first to a few hosts in codfw (as it is active now) to make sure we are totally fine, performance wise.
[06:29:43] DBA, Security: Database reboots for MD5 vulnerabilities - https://phabricator.wikimedia.org/T264154 (Marostegui)
[06:57:48] DBA, decommission-hardware: decommission es2016.codfw.wmnet - https://phabricator.wikimedia.org/T264156 (Marostegui)
[06:58:23] DBA, decommission-hardware: decommission es2016.codfw.wmnet - https://phabricator.wikimedia.org/T264156 (Marostegui)
[07:01:31] DBA, decommission-hardware, Patch-For-Review: decommission es2016.codfw.wmnet - https://phabricator.wikimedia.org/T264156 (Marostegui)
[07:05:37] DBA, decommission-hardware, Patch-For-Review: decommission es2016.codfw.wmnet - https://phabricator.wikimedia.org/T264156 (Marostegui)
[07:48:46] DBA: Failover DB masters in row D - https://phabricator.wikimedia.org/T186188 (Marostegui)
[07:48:52] DBA, Patch-For-Review: Failover s6 master, db1093 to db1131 - https://phabricator.wikimedia.org/T263227 (Marostegui) Open→Resolved This is all done - db1131 is the new s6 eqiad master
[07:51:42] DBA: Failover DB masters in row D - https://phabricator.wikimedia.org/T186188 (Marostegui) Open→Resolved a:Marostegui After failing over s6 eqiad master (T263227) and s8 eqiad master (T239238), the scope for this task, which was to balance the masters across rows for sX-x1, that's achieved: 2 mas...
[08:07:26] DBA, decommission-hardware, Patch-For-Review: decommission es2016.codfw.wmnet - https://phabricator.wikimedia.org/T264156 (Marostegui)
[08:10:23] DBA, Operations: db1080-95 batch possibly suffering BBU issues - https://phabricator.wikimedia.org/T258386 (Marostegui)
[08:29:00] marostegui: re: T264154, can i set it as blocked on the dc switchback? the remaining nodes are either in codfw, or are ES hosts that are getting replaced
[08:29:31] yeah, I was thinking about stalling it, so +1!
[08:30:38] hmm. is there a function in phabricator for "this task is blocked on this other task"? or is it just set to stalled, and mention it in the comment?
[08:31:03] yeah, you cannot set blocked by or anything specifically
[08:36:06] marostegui: is there a task for decomming the old es1* hosts? closest i can find is T261717, which doesn't mention that bit specifically
[08:36:07] T261717: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717
[08:36:33] kormat: no, not really, I am just using that one and creating subtasks, but es1* cannot be touched, as the replacements aren't ready
[08:36:44] I am hoping dc-ops will have them installed by 30th oct
[08:37:04] https://phabricator.wikimedia.org/T260370
[08:37:26] ok, i'll reference both of those then, thanks :)
[08:37:34] sweeet
[08:39:13] marostegui: i'm not sure i understand your action here: https://phabricator.wikimedia.org/T261389#6504219
[08:39:25] the parent task you removed was the codfw->eqiad _switchback_
[08:39:34] which surely we're still blocked on
[08:39:42] ohh
[08:39:49] i've misunderstood the point of the parent task
[08:39:52] haha
[08:39:53] yeah
[08:39:54] it's for things which block the switchback
[08:39:55] it is confusing
[08:39:59] not things which are blocked by the switchback
[08:40:00] 'yay'
[08:44:34] marostegui: is it clear to deploy my favourite schema change to s7/eqiad?
[08:44:51] kormat: Go for it!
[08:46:49] i'm really glad my script does conditional checks for each db. for s5 some of the dbs already had the change
[08:47:02] the new wikis I suppose?
[08:47:05] good luck with s3
[08:47:06] XD
[08:47:22] although there weren't many new wikis created since that schema change was merged I believe, so maybe they were all in s5
[08:48:12] kormat: this is a good example of a fun schema change in s3 https://phabricator.wikimedia.org/T260111
[08:48:26] yeah, all of the skipped dbs were created in the last month
[08:49:31] how lovely re: that task :)
[09:31:43] DBA, Upstream: Investigate possible memory leak on db1115 - https://phabricator.wikimedia.org/T231769 (Marostegui) Open→Resolved a:Marostegui I am going to close this, there's not much else we can do here, the bug is filed at https://jira.mariadb.org/browse/MDEV-22809 which indeed shows some...
[09:35:11] DBA, Operations: Migrate MySQLs to use ROW-based replication - https://phabricator.wikimedia.org/T109179 (Marostegui) @jcrespo I would like to close this - I don't think this is doable on long-term even, I would even say this is very long-long-long-long term for sX sections. There are many limitations he...
[09:38:30] DBA, Operations: Migrate MySQLs to use ROW-based replication - https://phabricator.wikimedia.org/T109179 (jcrespo) Open→Declined > This ticket is to decide if this change is worth it, how to do it, where (maybe not all servers require it), when and what blockers there are. In a way this is "done...
[09:38:46] DBA, Operations: Migrate MySQLs to use ROW-based replication - https://phabricator.wikimedia.org/T109179 (jcrespo) Declined→Resolved
[09:41:09] Blocked-on-schema-change, DBA: Schema change to drop three indexes from wb_changes - https://phabricator.wikimedia.org/T264109 (Ladsgroup) Thank you so much <3
[10:00:17] DBA, Sustainability: Look into Maria 10 parallel-replication - https://phabricator.wikimedia.org/T85266 (Marostegui) Open→Resolved a:jcrespo Closing this task, it's been a while and `Parallel_Mode: conservative` is the default everywhere. I don't think we are moving any time soon from any oth...
[10:06:08] marostegui: the schema change has replicated to everything in s7/eqiad apart from some of the labsdb hosts (which is expected). am i good to run against s4/eqiad?
[10:06:35] kormat: yeah, at the moment I am only touching s8
[10:06:41] grand so
[10:07:14] kormat: let's also do s4 in codfw, once it is deployed in eqiad
[10:07:22] (without replication in codfw, of course)
[10:07:30] * kormat grumbles
[10:07:35] so we can leave that big wiki completely done
[10:08:07] kormat: as it is the first big big wiki, let's make sure it is all ok on that front
[10:08:12] fiine
[10:08:16] so we avoid surprises with enwiki or wikidatawiki when switching back to eqiad
[10:25:40] DBA, Operations, ops-codfw, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) Just for the record, those CPU reset/error have been happening since the first crash.
[10:47:51] DBA, Operations, ops-codfw, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) There is one thing I have seen, which is that the temperature of this host, according to grafana is a lot higher than a host on the same section (db2126...
[14:08:18] DBA, Operations, SRE-swift-storage, Goal: Research storage solutions for media backups - https://phabricator.wikimedia.org/T264190 (jcrespo)
[14:13:10] DBA, Operations, ops-codfw, Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (Marostegui) The plan agreed with Papaul is to use an old disk from an es host that was decommissioned, and see if the controller recognizes the disk. If it does, the new disk is prob...
[14:46:31] DBA, Operations, ops-codfw, Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (Papaul) The disk from one of the decom es server works
[14:46:59] DBA, Operations, ops-codfw, Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (Papaul) Status Name State Slot Number Size Security Status Bus Protocol Media Type Hot Spare Remaining Rated Write Endurance Physical Disk 0:1:0 Online 0 1862.5 GB...
[14:51:31] DBA, Operations, ops-codfw, Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (Marostegui) I can see the disk now: ` Time: Wed Sep 30 14:44:09 2020 Code: 0x00000072 Class: 0 Locale: 0x02 Event Description: State change on PD 02(e0x20/s2) from OFFLINE(10) to RE...
[15:06:39] DBA, Operations, ops-codfw, Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (Marostegui) Not sure if it is actually going to work but at least the disk is seen: ` root@es2026:~# megacli -PDRbld -ShowProg -physdrv[32:2] -aALL Rebuild Progress on Device at En...
[15:11:43] DBA, Operations, ops-codfw, Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (Papaul) I create another dispatch to request a new disk and shipped the one received on 9/25/2020 back. ` Create Dispatch: Success You have successfully submitted request SR1038...
[15:13:24] DBA, Operations, ops-codfw, Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (Marostegui) Great! The rebuild is happening, slowly, but at least has started: ` root@es2026:~# megacli -PDRbld -ShowProg -physdrv[32:2] -aALL Rebuild Progress on Device at Enclosur...
[16:10:16] DBA, Operations, ops-codfw, Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (Papaul) Return tracking information {F32368964}
[17:44:25] DBA: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Papaul)
[17:45:38] DBA: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Papaul)
[18:33:04] DBA, Operations, Sustainability (Incident Followup), Wikimedia-Incident: S5 replication issue, affecting watchlist and probably recentchanges - https://phabricator.wikimedia.org/T263842 (RhinosF1) I just noticed the IR on wikitech says: >duplicate key for ip banning The ipblocks table actually s...
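
Editor's note on the 08:46 exchange: the log does not show the schema-change script itself, so what follows is only a minimal sketch of what "conditional checks for each db" could look like. It uses in-memory SQLite as a stand-in for the production MariaDB hosts, and the table `demo`, the index `demo_val`, and every function name are hypothetical. The idea is that each database is inspected first and the change is applied only where it is still missing, which is why databases that already have the change are skipped.

```python
import sqlite3


def index_exists(conn, index_name):
    """Return True if the named index is present in this database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?",
        (index_name,),
    ).fetchone()
    return row is not None


def apply_drop_index(conn, index_name):
    """Drop the index only if it is still there; report what happened."""
    if not index_exists(conn, index_name):
        return "skipped"  # this database already has the change
    conn.execute(f'DROP INDEX "{index_name}"')
    return "applied"


def run_on_all(dbs, index_name):
    """Apply the conditional change to every database in the mapping."""
    return {name: apply_drop_index(conn, index_name) for name, conn in dbs.items()}


def make_demo_db(with_index):
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, val TEXT)")
    if with_index:
        conn.execute("CREATE INDEX demo_val ON demo (val)")
    return conn


if __name__ == "__main__":
    # One older database that still has the index, one newer one without it.
    dbs = {"oldwiki": make_demo_db(True), "newwiki": make_demo_db(False)}
    print(run_on_all(dbs, "demo_val"))
    # Running it a second time is safe: nothing is left to change.
    print(run_on_all(dbs, "demo_val"))
```

Because the check runs before the change, the same run can be repeated across a whole section without failing on databases that were created after the schema change was merged.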