[04:57:53] Blocked-on-schema-change, DBA: Schema change to rename user_newtalk indexes - https://phabricator.wikimedia.org/T234066 (Marostegui)
[04:58:12] Blocked-on-schema-change, DBA, Analytics, Core Platform Team: Schema change for refactored actor and comment storage - https://phabricator.wikimedia.org/T233135 (Marostegui)
[04:59:26] DBA: Recompress special slaves across eqiad and codfw - https://phabricator.wikimedia.org/T235599 (Marostegui)
[05:42:44] DBA, Operations: decommission db1070.eqiad.wmnet - https://phabricator.wikimedia.org/T235464 (Marostegui)
[06:45:14] | 3251 | dump.s4.2019-10-22--04-40-14 | failed | db2099.codfw.wmnet:3314 | dbprov2001.codfw.wmnet | dump | s4 | 2019-10-22 04:40:14 | 2019-10-22 04:40:43 | NULL |
[06:45:22] Just saw that when looking for something else :)
[07:13:02] ok, will have a look
[07:13:43] ta
[07:22:14] jynus: In a bit I will stop mysql on db1116 for the PDU maintenance (backup source) - it runs s7 and s8, and I checked that their backups have been taken already.
[07:22:34] ok
[07:25:12] will you have to put down db2099?
[07:25:57] also feel free to suggest a different time for the backups if they affect you; they normally run from 0 to 7 UTC
[07:26:54] jynus: nope, the PDU work is only in eqiad
[08:06:32] akosiaris: https://wikitech.wikimedia.org/wiki/Bacula#Monitoring
[08:06:43] ^also volans
[08:07:40] jynus: thanks! was about to reply to the doc comment with "that would do" (link to the doc)
[08:08:21] volans: you would be a good test: read those and see if they are understandable by those of us that are not in the rabbit hole
[08:10:10] sure
[08:14:00] jynus: I fixed some typos. In general it looks good and is comprehensible, I have just two comments:
[08:14:15] 1) I'm not sure what [FD < (DIR == SD)] refers to?
[08:14:56] 2) given that the check would report only the first host in each category, it would be nice to have a way to list them all manually.
[08:16:35] yeah, I was thinking of creating a verbose mode
[08:16:52] --verbose with a couple of pages of info
[08:17:07] with all the names of the jobs that failed and when
[08:17:14] but not implemented yet
[08:17:27] ack, sounds like a good plan
[08:17:51] just think this is a quarterly plan, and we are in October!
[08:17:56] :-D
[08:18:39] I think we sometimes have misunderstandings because I like to work in small sprints, but each one feature-complete
[08:25:56] no prob
[09:13:26] DBA, conftool: specify group (api/vslow/etc) weights in terms of 0..100 instead of 0..1 - https://phabricator.wikimedia.org/T231018 (Marostegui)
[09:21:51] In around 40 minutes I will turn off mysql on db1115 (tendril) for the PDU maintenance
[09:22:09] So we won't have tendril for around 2h
[09:22:23] I will !log it as it will affect dbtree
[09:28:04] ok
[09:29:18] if you can wait until s4 for codfw gathers metadata
[09:29:30] it should be finishing soon
[09:30:02] | 3255 | dump.s4.2019-10-22--07-29-46 | ongoing | db2099.codfw.wmnet:3314 | dbprov2001.codfw.wmnet | dump | s4
[09:30:54] it should finish in around 30 minutes
[09:34:10] sure thing
[09:34:32] I will check with you before doing so
[10:06:35] jynus: I see db2099 still running, I was planning to go for lunch to be able to be back at 1pm, do you want to own db1115 and shut it down when the backup is done?
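The verbose mode discussed at 08:16 (listing every failed job and when it ran, instead of only the first one per category) could look roughly like the sketch below. It is a hypothetical helper, not the actual backup check: it assumes the pipe-delimited row layout pasted at 06:45:14 and 09:30:02, and the column names, the `BackupJob` type and `summarize_failures` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed column order, inferred from the status rows pasted in this log:
# id | name | status | source | host | type | section | start | end | size
COLUMNS = ["id", "name", "status", "source", "host", "type",
           "section", "start", "end", "size"]


@dataclass
class BackupJob:
    name: str
    status: str
    section: str
    start: str


def parse_row(line: str) -> BackupJob:
    """Parse one pipe-delimited status row; truncated rows get empty fields."""
    # Drop the outer pipes, then split on the inner ones and trim each cell.
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    record = dict(zip(COLUMNS, cells))
    return BackupJob(record.get("name", ""), record.get("status", ""),
                     record.get("section", ""), record.get("start", ""))


def summarize_failures(lines: List[str],
                       verbose: bool = False) -> Dict[str, List[BackupJob]]:
    """Group failed jobs by section.

    Default mode keeps only the first failure per section; verbose mode
    keeps every failed job together with its start time.
    """
    failures: Dict[str, List[BackupJob]] = {}
    for line in lines:
        job = parse_row(line)
        if job.status != "failed":
            continue  # ongoing/finished jobs are not reported
        bucket = failures.setdefault(job.section, [])
        if verbose or not bucket:
            bucket.append(job)
    return failures


if __name__ == "__main__":
    rows = [
        "| 3251 | dump.s4.2019-10-22--04-40-14 | failed | db2099.codfw.wmnet:3314 "
        "| dbprov2001.codfw.wmnet | dump | s4 | 2019-10-22 04:40:14 "
        "| 2019-10-22 04:40:43 | NULL |",
        "| 3255 | dump.s4.2019-10-22--07-29-46 | ongoing | db2099.codfw.wmnet:3314 "
        "| dbprov2001.codfw.wmnet | dump | s4",
    ]
    for section, jobs in summarize_failures(rows, verbose=True).items():
        for job in jobs:
            print(f"{section}: {job.name} failed (started {job.start})")
```

Collecting all failures and trimming in the default mode keeps a single code path, so `--verbose` would only change how much of the same data gets printed.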
[10:06:52] I have downtimed it already
[10:12:45] I can do that
[10:13:59] thanks
[10:15:29] https://phabricator.wikimedia.org/T227142#5594339
[10:26:23] it is finishing now (mydumper finished)
[10:28:04] it gets stuck on the image table because it is very large but does not have a large number of rows (probably the EXIF entries)
[10:28:18] commonswiki.globalimagelinks.sql.gz | 6543754153 | 2019-10-22 09:09:48
[10:28:26] commonswiki.image.sql.gz | 30858343572 | 2019-10-22 10:24:23
[10:28:32] metadata | 418 | 2019-10-22 10:24:23
[10:29:17] it finished now
[10:55:14] done, sorry for the delay
[11:04:31] what would you think about me doing T224589 while you monitor the PDU stuff?
[11:04:32] T224589: Migrate dbmonitor hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224589
[11:22:12] jynus: go for it!
[11:23:18] thanks, marostegui
[11:27:39] marostegui: godog: https://wikitech.wikimedia.org/wiki/Tendril#Service_dependencies
[11:28:44] jynus: <3 thanks!
[11:29:14] I think it is ok to make mistakes, as long as we document them and we don't make them twice!
[11:29:38] :)
[12:20:35] DBA, DC-Ops, Operations, ops-eqiad: db1105 rebooted itself - https://phabricator.wikimedia.org/T235877 (Cmjohnson) Updated all F/W on db1105 - RAID - BIOS - Backplane - iDRAC
[12:22:10] DBA, DC-Ops, Operations, ops-eqiad: db1105 rebooted itself - https://phabricator.wikimedia.org/T235877 (Marostegui) Thank you Chris!
[12:22:50] DBA, Operations: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Marostegui)
[12:26:47] tendril is back up
[12:27:55] DBA: Recompress special slaves across eqiad and codfw - https://phabricator.wikimedia.org/T235599 (Marostegui)
[12:28:18] I will check that dbmonitor2001 installed correctly, and if it did, I will do dbmonitor1001
[12:28:23] or we can leave it for later
[12:30:05] if 2001 works fine, maybe leave it 24h and do 1001 tomorrow?
[12:30:09] up to you really
[12:30:12] ok to me
[12:46:23] puppet seems fine
[12:46:52] actually, wrong host, it complains
[12:47:11] will fix it, and definitely delay dbmonitor1001 for later
[12:47:26] thanks for working on this
[13:06:32] DBA, Operations: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Marostegui)
[13:06:35] DBA, DC-Ops, Operations, ops-eqiad: db1105 rebooted itself - https://phabricator.wikimedia.org/T235877 (Marostegui) Open→Resolved Host fully repooled in production. Thanks Chris!
[14:36:10] DBA, Wikimedia-production-error: scap sync failed, database error - https://phabricator.wikimedia.org/T236166 (mmodell) Looks like a schema error: >Function: MediaWiki\Revision\RevisionStore::fetchRevisionRowFromConds >Error: 1146 Table 'labtestwiki.revision' doesn't exist (10.64.32.72) I'm confused by...
[14:39:17] DBA, Wikimedia-production-error: scap sync failed, database error: RevisionStore::fetchRevisionRowFromConds Error: 1146 Table 'labtestwiki.revision' doesn't exist - https://phabricator.wikimedia.org/T236166 (mmodell)
[20:00:10] Blocked-on-schema-change, DBA, Analytics, Core Platform Team: Schema change for refactored actor and comment storage - https://phabricator.wikimedia.org/T233135 (JAllemandou) >>! In T233135#5592951, @Nuria wrote: > Pinging analytics temporarily so we know these changes are happening, it should no...
[20:10:27] Blocked-on-schema-change, DBA, Core Platform Team: Schema change for refactored actor and comment storage - https://phabricator.wikimedia.org/T233135 (Nuria)
[22:22:42] bd808: Why does https://de.wikipedia.org/w/index.php?title=Kategorie:Vestfoldberge&action=info show subcategories == 0 when it contains one subcategory?
[22:24:43] doctaxon: no idea. I'm not a MediaWiki category expert.
[22:24:45] select cat_subcats from category where cat_title = 'Vestfoldberge'; --> gives "0" in dewiki
[22:26:06] I think you don't have to be a category expert, it's a thing with the database
[22:30:09] andrewbogott: maybe you can help?
[22:30:25] sorry, no idea
[22:30:33] who else?
[22:31:26] doctaxon: why are you randomly pinging people? Neither Andrew nor I are DBAs or MediaWiki developers
[22:31:53] because you are here on the channel and I need database help
[22:32:16] I don't know if you really are a developer
[22:32:47] You also do not know if this is a database issue or a MediaWiki issue
[22:32:54] and neither do I
[22:33:10] sorry
[22:33:26] so somebody is going to have to dig in deeper. we have Phabricator for reporting bugs
[22:34:21] pinging people on IRC often means you are ringing their phone or otherwise interrupting their life
[22:34:38] with great power comes great responsibility
[22:34:45] so use that power wisely
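The value read at 22:24:45 is a counter stored in the category table. The actual memberships live in categorylinks, in rows with cl_type = 'subcat'. Comparing the two is one way to show whether the counter has drifted before filing a Phabricator task. The sketch below uses an in-memory SQLite database. The two tables are a trimmed, assumed subset of the MediaWiki schema, the rows are illustrative (they only mirror the 0-versus-1 symptom reported above), and `subcat_counts` is a hypothetical helper. It does not explain why the counter drifted.

```python
import sqlite3
from typing import Tuple


def subcat_counts(conn: sqlite3.Connection, title: str) -> Tuple[int, int]:
    """Return (cached cat_subcats, live count of subcat rows in categorylinks)."""
    row = conn.execute(
        "SELECT cat_subcats FROM category WHERE cat_title = ?", (title,)
    ).fetchone()
    cached = row[0] if row else 0  # no category row means no cached counter
    (live,) = conn.execute(
        "SELECT COUNT(*) FROM categorylinks"
        " WHERE cl_to = ? AND cl_type = 'subcat'",
        (title,),
    ).fetchone()
    return cached, live


# Illustrative setup: a trimmed, assumed subset of the MediaWiki columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE category (cat_title TEXT PRIMARY KEY, cat_subcats INTEGER NOT NULL);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT);
    INSERT INTO category VALUES ('Vestfoldberge', 0);
    INSERT INTO categorylinks VALUES (101, 'Vestfoldberge', 'subcat');
    INSERT INTO categorylinks VALUES (102, 'Vestfoldberge', 'page');
    """
)

cached, live = subcat_counts(conn, "Vestfoldberge")
if cached != live:
    # prints: cat_subcats says 0, categorylinks has 1
    print(f"cat_subcats says {cached}, categorylinks has {live}")
```

A mismatch like this only shows that the stored counter and the link rows disagree. Whether that is a MediaWiki or a database problem is the question left open in the conversation above.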