[08:33:06] jynus: this is waving "good morning" at you https://phabricator.wikimedia.org/T244958
[09:33:18] I saw it from bed
[09:33:24] haha
[09:49:55] DBA, Cloud-Services: Prepare and check storage layer for ngwikimedia - https://phabricator.wikimedia.org/T240772 (Urbanecm) @Ammarpad Please ping here when you created users for the rest of your user group
[10:03:59] good news is that the latest s2 and s3 snapshots are minutes old
[10:04:25] Also, I just got shipped a new mouse and it is a great thing to have back a scroll wheel
[10:06:42] hahahaha
[10:06:42] however, I will set up both on db1140
[10:07:00] which is our backup backup source host
[10:07:26] is there space there?
[10:07:31] yeah
[10:07:32] for 2 more sections?
[10:07:58] yes for s3 and s2
[10:08:45] cool
[10:22:16] DBA, Operations: db1095 backup source crashed: broken BBU - https://phabricator.wikimedia.org/T244958 (jcrespo) created /srv/sqldata.s2 on db1140 and ran: ` transfer.py --type=decompress --no-encrypt --no-checksum dbprov1002.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.s2.2020-02-12--01-22-05.tar....
[10:22:48] ^Available: 5.0T
[10:23:17] ah sweet
[10:23:38] that was on purpose, we had x1 on it, but it was a "spare" for this
[10:23:58] if I have a backup plan for our backup host, do I have a backup squared?
[10:25:08] And do we have a backup plan for that backup plan?
[10:25:20] actually we do, we call it codfw
[10:25:50] but codfw is the backup plan
[10:25:50] or as we call it in the trade, "backup cubed"
[10:25:54] not the backup plan for it
[10:26:31] I think I got lost somewhat along the last plan :-D
[10:26:39] haha
[10:26:41] :)
[10:32:48] recovery is happening at around 600 MB/s after compression
[10:33:17] nice, I am curious to see how long the whole process takes
[10:33:42] it will slow down once the filesystem cache is exhausted
[10:33:56] plus there is a running mysql there taking cpu and memory
[10:49:47] copy should be finishing any time now
[10:50:13] that was fast!
[10:50:18] the copy+decompression?
[10:51:55] yes, happens at the same time
[10:52:14] cool, very fast!
[10:53:58] https://www.youtube.com/watch?v=g4mf35D7K_8
[10:54:11] haha
[11:14:57] DBA, Operations, Patch-For-Review: db1095 backup source crashed: broken BBU - https://phabricator.wikimedia.org/T244958 (jcrespo) Now running: ` transfer.py --type=decompress --no-encrypt --no-checksum dbprov1002.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.s3.2020-02-12--05-46-09.tar.gz db1140...
[11:17:21] Re next year planning, we should do an "audit" of space growth of not only core but other stuff (multiinstance, sanitarium, etc.) to preview needs, eg. if s4 grows a lot
[11:20:21] I have in mind things like dbprov requiring extra instances due to that
[11:21:06] yep, definitely
[11:22:23] it is a pain, because growth in a single place implies lots of needs in other places
[13:30:36] instances = 0 worked better than expected, it checks that the number of running mysql processes is 0
[13:32:37] "PROCS OK: 0 processes with command name 'mysqld'"
[13:40:07] DBA, Operations, Wikimedia-Etherpad: Upgrade and restart m1 master (db1135) - https://phabricator.wikimedia.org/T244238 (Marostegui) @jcrespo @akosiaris any tentative date?
[13:41:21] DBA, Operations, Phabricator, Release-Engineering-Team (Development services): Upgrade and restart m3 (phabricator) master (db1128) - https://phabricator.wikimedia.org/T244566 (Marostegui) 12th (the original date I suggested) has passed, any tentative date @mmodell you'd like to consider, there i...
[13:42:52] DBA, Operations, Wikimedia-Etherpad: Upgrade and restart m1 master (db1135) - https://phabricator.wikimedia.org/T244238 (jcrespo) Sorry, I thought I had answered, but apparently I did not hit submit. Any time during the UTC day, outside of the first 1 week of a month is ok for bacula. Preferably,...
[13:44:41] DBA, Operations, Wikimedia-Etherpad: Upgrade and restart m1 master (db1135) - https://phabricator.wikimedia.org/T244238 (Marostegui) Let's aim for Thursday 20th at 09:00AM UTC?
[13:45:09] DBA, Operations, Wikimedia-Etherpad: Upgrade and restart m1 master (db1135) - https://phabricator.wikimedia.org/T244238 (akosiaris) >>! In T244238#5876636, @Marostegui wrote: > @jcrespo @akosiaris any tentative date? Anytime is good for etherpad!
[13:46:13] DBA, Operations, Wikimedia-Etherpad: Upgrade and restart m1 master (db1135) - https://phabricator.wikimedia.org/T244238 (jcrespo) >>! In T244238#5876670, @Marostegui wrote: > Let's aim for Thursday 20th at 09:00AM UTC? Cool to me, send some invites this way! :-D
[13:47:24] DBA, Operations, Wikimedia-Etherpad: Upgrade and restart m1 master (db1135) - https://phabricator.wikimedia.org/T244238 (Marostegui) >>! In T244238#5876672, @jcrespo wrote: >>>! In T244238#5876670, @Marostegui wrote: >> Let's aim for Thursday 20th at 09:00AM UTC? > > Cool to me, send some invites th...
[13:50:19] marostegui: Re T241058, although not directly related
[13:50:45] while you were away, I changed the "definition" of s1 and s4 on zarcillo
[13:50:59] and pinged you about it, but want to make sure you remember it
[13:51:18] I don't :-), what do you mean by the definition?
[13:51:41] this was because serviceops got worried while doing stuff on the test hosts
[13:51:51] so test hosts are no longer s1 or s4
[13:51:58] they are test-s1 and test-s4
[13:52:03] ah cool
[13:52:06] so if you run section s1
[13:52:08] that matches their definition on hiera too
[13:52:13] it will NOT return test hosts
[13:52:17] this can be confusing
[13:52:42] because there is s1 "the category" and s1 "the replica set"
[13:52:45] yeah, I tend to ignore those
[13:52:55] I haven't found a good way to deal with that
[13:53:13] so just that you know, s1 will not include for now test-s1 hosts, etc.
[13:53:29] so not to appear on production grafana dashboards
[13:53:40] it should appear if you do ./section test-s1
[13:53:45] cool, thanks for the reminder, I have indeed forgotten :)
[13:53:52] I put it on a ticket
[13:54:01] ETOOMANY :)
[13:54:02] or somewhere, but I preferred to remind you again
[13:54:08] as you weren't around
[13:54:17] yeah, definitely, you did well, I didn't remember
[13:54:26] this mostly impacts schema changes
[13:54:38] but I thought that the worst that could happen is replication breaking on a test host
[13:54:39] DBA, Operations, Wikimedia-Etherpad: Upgrade and restart m1 master (db1135) - https://phabricator.wikimedia.org/T244238 (Marostegui) Email: https://lists.wikimedia.org/pipermail/wikitech-l/2020-February/093063.html
[14:17:50] DBA, Operations: db1095 backup source crashed: broken BBU - https://phabricator.wikimedia.org/T244958 (jcrespo) eqiad backup service has been restored on a different host, now to handle hw issues.
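Note on the s1 / test-s1 exchange above (13:50 to 13:54): the two readings of "s1" (exact section label vs. the broader category that also covers its test hosts) can be illustrated with a minimal sketch. This is not zarcillo's schema or the real ./section script; the host names, the Host type, and both helper functions are hypothetical and exist only to show why an exact match on "s1" no longer returns test hosts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str      # hypothetical host name, not a real inventory entry
    section: str   # section label as stored, e.g. "s1" or "test-s1"


# Hypothetical inventory, for illustration only.
HOSTS = [
    Host("db-prod-a", "s1"),
    Host("db-prod-b", "s1"),
    Host("db-test-a", "test-s1"),
    Host("db-prod-c", "s4"),
    Host("db-test-b", "test-s4"),
]


def hosts_in_section(hosts, section):
    """Exact match on the section label (the 'replica set' reading)."""
    return [h.name for h in hosts if h.section == section]


def hosts_in_category(hosts, category):
    """Match a section plus its test- counterpart (the 'category' reading)."""
    wanted = {category, f"test-{category}"}
    return [h.name for h in hosts if h.section in wanted]


if __name__ == "__main__":
    print(hosts_in_section(HOSTS, "s1"))       # production hosts only
    print(hosts_in_section(HOSTS, "test-s1"))  # test hosts only
    print(hosts_in_category(HOSTS, "s1"))      # both, if a schema change needs them
```

Under this assumed model, anything that iterates over "s1" (dashboards, schema changes) has to ask for "test-s1" explicitly if the test hosts should be included.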
[15:32:00] DBA, Operations: db1095 backup source crashed: broken BBU - https://phabricator.wikimedia.org/T244958 (Marostegui)
[16:00:54] DBA, conftool: Enforce in dbctl that core sections and es clusters always have at least two replicas - https://phabricator.wikimedia.org/T245036 (Krinkle)
[16:02:34] DBA, conftool: Enforce in dbctl that core sections and es clusters always have at least two replicas - https://phabricator.wikimedia.org/T245036 (Marostegui) right now this is what we have on es config: ` root@cumin1001:/home/marostegui# dbctl -s eqiad section es3 get { "es3": { "flavor": "ex...
[16:05:26] DBA, DC-Ops, Operations, ops-eqiad: es1019: reseat IPMI - https://phabricator.wikimedia.org/T243963 (Cmjohnson) @Marostegui We can upgrade the f/w. That can be anytime, please pick a convenient date for you.
[16:06:15] DBA, DC-Ops, Operations, ops-eqiad: es1019: reseat IPMI - https://phabricator.wikimedia.org/T243963 (Marostegui) >>! In T243963#5877162, @Cmjohnson wrote: > @Marostegui We can upgrade the f/w. That can be anytime, please pick a convenient date for you. Can we do it tomorrow at the most conveni...
[16:08:13] DBA, DC-Ops, Operations, ops-eqiad: es1019: reseat IPMI - https://phabricator.wikimedia.org/T243963 (Marostegui) @Cmjohnson I can have the host depooled and off tomorrow in the UTC morning so you can do it whenever you can tomorrow, and once done, just power it back on. Would that work?
[16:43:58] DBA, Operations, Phabricator, Release-Engineering-Team (Development services): Upgrade and restart m3 (phabricator) master (db1128) - https://phabricator.wikimedia.org/T244566 (mmodell) Hey @Marostegui, How about tomorrow? I can be around tomorrow, if you'd like. If you'd like to do it at your le...
[16:45:18] DBA, Operations, Phabricator, Release-Engineering-Team (Development services): Upgrade and restart m3 (phabricator) master (db1128) - https://phabricator.wikimedia.org/T244566 (Marostegui) >>! In T244566#5877381, @mmodell wrote: > Hey @Marostegui, How about tomorrow? I can be around tomorrow, if...
[16:52:39] DBA, Operations, ops-eqiad: db1095 backup source crashed: broken BBU - https://phabricator.wikimedia.org/T244958 (jcrespo) a:jcrespo→wiki_willy Battery of db1095, out of warranty, is toasted. It would be nice not to throw away the whole server for just the RAID battery. Could we order one? For...
[17:10:20] DBA, Growth-Team, Operations, StructuredDiscussions, WorkType-Maintenance: Setup separate logical External Store for Flow in production - https://phabricator.wikimedia.org/T107610 (Anomie) This blocks {T106363} which blocks {T106386}. If we want to do T106386 then this needs to be done. I do...
[17:11:36] DBA, Operations, ops-eqiad: db1095 backup source crashed: broken BBU - https://phabricator.wikimedia.org/T244958 (RobH) a:wiki_willy→Jclark-ctr Please note that we just ordered replacement raid batteries for HP Gen9 raid controllers via T243547. @jclark-ctr: Please use one of the batteries...
[17:40:21] DBA, Cloud-Services: Prepare and check storage layer for ngwikimedia - https://phabricator.wikimedia.org/T240772 (Ammarpad) Done. I created additional users.
[17:43:09] DBA, Cloud-Services: Prepare and check storage layer for ngwikimedia - https://phabricator.wikimedia.org/T240772 (jcrespo) I checked users table, looking good, but will let manuel close this.
[18:31:49] DBA, Operations, Phabricator, Release-Engineering-Team (Development services): Upgrade and restart m3 (phabricator) master (db1128) - https://phabricator.wikimedia.org/T244566 (mmodell) @Marostegui Thursday 6:00 AM works for me.
[18:33:01] DBA, Operations, Phabricator, Release-Engineering-Team (Development services): Upgrade and restart m3 (phabricator) master (db1128) - https://phabricator.wikimedia.org/T244566 (Marostegui) >>! In T244566#5877826, @mmodell wrote: > @Marostegui Thursday 6:00 AM works for me. Excellent! See you to...
[18:40:03] DBA, Growth-Team, Operations, StructuredDiscussions, WorkType-Maintenance: Setup separate logical External Store for Flow in production - https://phabricator.wikimedia.org/T107610 (kchapman) @jcrespo do you still want us to do the compression in T106386? Are the storage constraints still rele...
[18:47:24] DBA, Growth-Team, Operations, StructuredDiscussions, WorkType-Maintenance: Setup separate logical External Store for Flow in production - https://phabricator.wikimedia.org/T107610 (Marostegui) >>! In T107610#5877888, @kchapman wrote: > @jcrespo do you still want us to do the compression in T1...
[21:11:36] DBA, Growth-Team, Operations, StructuredDiscussions, WorkType-Maintenance: Setup separate logical External Store for Flow in production - https://phabricator.wikimedia.org/T107610 (MMiller_WMF) Open→Declined Given that it doesn't sound like this is needed. I'm declining the task. Pl...