[05:07:06] 10DBA, 10Parsoid, 10Parsoid-Tests: testreduce_vd database in m5 still in use? - https://phabricator.wikimedia.org/T245408 (10Marostegui) I have included that IP so it should be fixed now: ` | GRANT SELECT, INSERT, UPDATE, DELETE, ALTER ON `testreduce`.* TO 'ssastry'@'10.64.48.94' | `... [06:02:27] 10DBA, 10Data-Services, 10Patch-For-Review: Make watchlist table available as curated foo_p.watchlist_count on labsdb - https://phabricator.wikimedia.org/T59617 (10jcrespo) [07:01:36] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install backup1002 + array - https://phabricator.wikimedia.org/T250816 (10jcrespo) a:05Cmjohnson→03jcrespo Thanks, that made it boot. Thank you! Now I am only blocked by pending update of buster installer to la... [07:04:37] kormat: buster installer is broken ATM, do not try to reinstall any server [07:04:50] feel free to ping me if you have any issue [07:04:56] ah :) is there a task tracking this? [07:05:19] not yet, I just run into it [07:05:40] but I think I saw AndrewB complain about this on the weekend [07:06:25] we just need a command run on the installer, but I am unsure to do it without owner's ok [07:07:33] process documented at https://wikitech.wikimedia.org/wiki/Updating_netboot_image_with_newer_kernel [07:08:23] mmm, not sure if that is the right one, I want to remember there was a script [07:09:46] jynus: as i'm currently spending all my time doing reinstalls on d-i-test, can you give me some details on what breaks? 
[07:10:34] yeah, AFAIK [07:10:52] when a Debian point update gets released [07:10:57] the installer image has to be updated [07:11:16] ah - i see: http://bogott.net/misc/nokernel.png [07:11:27] yep [07:11:37] there is a script for that [07:11:45] but I have to find it [07:11:56] `modules/profile/manifests/puppetmaster/updatenetboot.pp` according to volans [07:12:04] (from the scrollback in #-sre) [07:12:18] I don't think it is the instructions above, that is when the kernel changes, this is only a normal installer update [07:12:25] yeah, that sounds like it [07:13:51] so I guess we can run update-netboot-image ? [07:14:04] sounds like it [07:15:08] I'm updating netboot images in a bit, unless someone's already on it [07:15:26] ok, then [07:15:32] last weekend's point update bumped the kernel ABI [07:15:37] we should wait, leave it to the experts [07:15:43] see-- that is why [07:15:51] "more things could break" [07:16:03] :-D [07:16:56] kormat: you sound like you only have 1 thing to do- reimage- that is dangerous to admit as you could quickly get assigned more work :-))) [07:17:40] * kormat hides [07:19:23] continue doing that, but could I suggest something to "just look at" until fixed? [07:20:18] i still have more investigation of d-i i can be doing, but what do you have in mind?
[07:20:44] I would like to have an extra reviewer/feedback for puppet cleaning up [07:21:01] nothing concrete, just looking at mariadb-related puppet classes [07:21:05] i can certainly look, but so far my puppet knowledge is veeery limited [07:21:15] netboot images updated [07:21:17] well, I have been writing puppet here for 5 years [07:21:19] moritzm: \o/ [07:21:22] and still know nothing [07:21:27] jynus: haha [07:21:42] my point is for you to look at it, as you will learn about our infra [07:22:01] no assignment for now, the bell saved you :-D [07:22:06] kormat: for background: we customize the netboot images with additional firmware needed to PXE boot on some models and these need to be updated after every point release in which the kernel changed the ABI [07:22:08] * kormat wipes brow [07:22:09] thanks, moritzm, you are the best! [07:22:18] ABI change essentially means that some kernel interface changed [07:22:38] moritzm: i feel like i'm missing something - do we automatically grab the latest netinst.iso from upstream? [07:22:39] oh, I thought the error was just a version mismatch? [07:22:48] so kernel modules built for an older ABI can no longer be loaded [07:22:56] or is the version mismatch DUE to abi changes? [07:23:01] or does a new point release break previous netinst.iso's? [07:23:16] the problem is: [07:23:27] when your install failed earlier this morning [07:23:36] it was booted with 10.3 (as it wasn't updated) [07:23:42] and after the initial PXE boot [07:24:11] the second stage install kicks in where d-i loads kernel modules from the mirror [07:24:45] but those are not versioned to the extent that a d-i image with kernel ABI 4 only attempts to download and fetch ABI 4 [07:24:58] ohh. *facepalm* [07:25:12] it only loads the latest version which at this point is ABI 5 and incompatible with the older image [07:25:24] yep yep.
that's not a gotcha i would have envisioned [07:25:30] the core issue here is some Debian vote a decade ago [07:25:37] I guess it is an installer limitation that really no one cares about because "just use the latest version installer" is the right path [07:25:42] :-D [07:25:48] ah, so it was "voted" [07:25:51] that explains a lot [07:25:52] :-D [07:26:14] moritzm: I hope our pings don't stress you much [07:26:26] consider this as your work being essential for us! [07:26:47] would a run of the script + move it to the right place have worked? [07:27:57] where it was decided (in a rather shady way which makes George Bush's reelection in Florida look good in comparison) that firmware shipped in the Linux kernel also needs to come with full source/full toolchain to be in main (and hence the official d-i images cannot boot with unmodified images on contemporary NICs) [07:28:04] jynus: it [07:28:28] jynus: it's perfectly fine! I would have it now anyway, opening a task for the 10.4 update in a bit as well [07:28:39] the general process is: [07:28:58] run sudo-update-netboot-images $DISTRO on puppetmaster1001 [07:29:06] check that everything went well [07:29:28] copy the d-i environment to the volatile dir of our puppet repo [07:29:42] yeah, I have a vague remembrance of when faidon used to do it, but it happens so few and far between that I forget the details [07:29:48] run puppet on install* servers to apply the volatile file changes to the install servers [07:31:37] kormat: you are saved... for now [07:31:45] thanks again, moritzm [07:32:25] jynus: i'll take what i can get :) [07:33:31] yw :-) [07:36:15] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install backup1002 + array - https://phabricator.wikimedia.org/T250816 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts: ` ['backup1002.eqiad.wmnet'] ` The log can be found in `/var/...
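A minimal sketch of the mismatch discussed above: the netboot image boots one kernel, the second-stage install pulls the latest modules from the mirror, and the two only work together when their ABI matches. The release strings and the function names `parse_release` and `modules_loadable` are hypothetical and only illustrate the "ABI 4" vs "ABI 5" example from the chat; this is not how d-i itself performs the check.

```python
import re

# Assumed release format for illustration: "<upstream version>-<ABI>-<flavour>",
# e.g. "4.19.0-4-amd64". The ABI numbers mirror the example in the chat.
_RELEASE_RE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)-(\w+)$")


def parse_release(release):
    """Split a kernel release string into (upstream version, ABI, flavour)."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def modules_loadable(installer_release, mirror_release):
    """Return True only if the mirror's modules match the booted installer kernel.

    Modules built for a different ABI cannot be loaded by the running kernel,
    which is the situation that produced "No kernel modules were found."
    """
    return parse_release(installer_release) == parse_release(mirror_release)


# Stale netboot image (ABI 4) versus a mirror that already carries ABI 5.
stale_image = "4.19.0-4-amd64"
mirror_latest = "4.19.0-5-amd64"
print(modules_loadable(stale_image, mirror_latest))
print(modules_loadable(mirror_latest, mirror_latest))
```

Under these assumptions, the fix described in the chat amounts to refreshing the netboot image so that both sides report the same release again.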
[07:42:43] mmm, got the same error, will check puppet logs [07:43:00] moritzm: i'm still getting "No kernel modules were found." [07:43:09] i've run puppet on install* [07:43:42] yeah, same [07:43:55] checking if it got the updates on log [07:44:32] it did, at May 11 07:19:33 install1003 [07:45:47] maybe apt rsync delays? [08:20:40] kormat: going for a cup of coffee, I guess we can still talk puppet on our weekly meeting 0:-D [08:21:37] lucky me :) [08:37:01] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install backup1002 + array - https://phabricator.wikimedia.org/T250816 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['backup1002.eqiad.wmnet'] ` Of which those **FAILED**: ` ['backup1002.eqiad.wmnet'] ` [10:01:26] kormat: meeting? [10:01:34] oops, yep [10:11:29] 10DBA: In-place conversion from LVM to normal partition - https://phabricator.wikimedia.org/T252195 (10Kormat) TODO: scan all db hosts to ensure that `dmsetup table` only has a single entry. [11:45:08] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install backup1002 + array - https://phabricator.wikimedia.org/T250816 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts: ` ['backup1002.eqiad.wmnet'] ` The log can be found in `/var/... [11:57:48] yay "Started first puppet run (sit back, relax, and enjoy the wait)" [11:58:09] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install backup1002 + array - https://phabricator.wikimedia.org/T250816 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['backup1002.eqiad.wmnet'] ` Of which those **FAILED**: ` ['backup1002.eqiad.wmnet'] ` [11:58:19] I celebrated too soon "Unable to run wmf-auto-reimage-host: Failed to puppet_first_run" [12:01:59] no puppet role [12:02:04] I know [12:04:14] https://gerrit.wikimedia.org/r/595509 [12:06:57] I was able to except spicerack, eh! 
[12:10:42] jynus: FYI the current role for WIP hosts is 'insetup' ;) see https://gerrit.wikimedia.org/r/c/operations/puppet/+/575485 [12:11:22] mmm [12:11:34] that's new to me [12:11:52] does it disable alerts? [12:12:15] don't know by heart, check with mo.ritz ;) [12:12:19] I see no changes in hiera for that [12:12:32] that's the main reason I use spare [12:12:52] I will send a patch if not [12:13:08] I just want to set up the RAID [12:46:23] vaolhttps://gerrit.wikimedia.org/r/#/c/operations/puppet/+/595519/ [12:46:31] volans: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/595519/ [12:46:43] I made as a dependency https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/595517/ [12:46:56] but will wait on that for feedback [13:21:09] 10DBA, 10Operations, 10User-notice: Upgrade and restart s4 (commonswiki) primary database master: Tue 12th May - https://phabricator.wikimedia.org/T251502 (10Marostegui) Package has been upgraded on db1138 [13:24:26] 10DBA, 10Operations, 10User-notice: Upgrade and restart s4 (commonswiki) primary database master: Tue 12th May - https://phabricator.wikimedia.org/T251502 (10Marostegui) Maintenance day: - Silence all hosts in s4 - Set read only on s4: ` dbctl --scope eqiad section s4 ro "Maintenance on s4 T251502" && dbct... [13:45:55] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10Jclark-ctr) [13:46:20] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10Jclark-ctr) a:05Jclark-ctr→03Cmjohnson name rack_name position switchport db1141 A3 1 7 db1142 A5 36 36 db1143 B3 32 26 db1144 B8 7 13 db1145 C5 9 8 db1146 C5 33...
[14:31:07] 10DBA, 10Operations, 10User-notice: Upgrade and restart s4 (commonswiki) primary database master: Tue 12th May - https://phabricator.wikimedia.org/T251502 (10Marostegui) Window reserved on the Deployment's calendar [14:54:00] "we should the tftpboot" split off, move? moritzm [14:54:14] didn't get that last comment [14:55:55] meh, I meant "should serve the", edited the Phab comment for clarification [14:57:02] thanks, sorry [17:23:54] 10DBA: Upgrade parsercache to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T252182 (10jcrespo) [17:24:11] 10DBA: Upgrade parsercache to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T252182 (10jcrespo) updating title to prevent confusion with Buster 10.4 release. [17:24:29] 10DBA, 10cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (10jcrespo) [17:32:25] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10Cmjohnson) [17:43:44] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10Marostegui) @Cmjohnson I have amended your patch for the DHCP to make sure they use the Buster installer. [17:44:40] marostegui: cool, they're ready for imaging [17:44:55] oh..wait, did you do the site.pp entry? [17:45:25] doesn't look like it, let me do that first [17:46:17] I made this change today: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/595517/ [17:46:27] "in setup" hosts can no longer generate alerts [17:46:50] cmjohnson1: yes, I did the site.pp change a few days ago [17:47:46] oh..cool!
[17:48:54] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` db1148.eqiad.wmnet ` The log can be... [17:53:11] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10Jdforrester-WMF) [17:55:59] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` db1149.eqiad.wmnet ` The log can be... [18:01:31] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` db1141.eqiad.wmnet ` The log can be... [18:02:07] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` db1142.eqiad.wmnet ` The log can be... [18:02:22] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` db1148.eqiad.wmnet ` The log can be... 
[18:02:43] there is lag on db1113 [18:02:46] s6 [18:02:56] inserts got a huge spike [18:03:18] https://grafana.wikimedia.org/d/000000273/mysql?panelId=2&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1113&var-port=13316&from=1589209393485&to=1589220193486 [18:12:31] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` db1143.eqiad.wmnet ` The log can be found in `/var/log/wmf... [18:17:03] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1149.eqiad.wmnet'] ` and were **ALL** successful. [18:17:27] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` db1144.eqiad.wmnet ` The log can be found in `/var/log/wmf... [18:18:10] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install backup1002 + array - https://phabricator.wikimedia.org/T250816 (10Cmjohnson) [18:18:58] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install backup1002 + array - https://phabricator.wikimedia.org/T250816 (10Cmjohnson) 05Open→03Resolved the ops-eqiad portion of this task has been completed. Thank you for finishing the install @jcrespo/@marostegui [18:23:38] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1141.eqiad.wmnet'] ` and were **ALL** successful. 
[18:23:52] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1142.eqiad.wmnet'] ` and were **ALL** successful. [18:23:57] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` db1145.eqiad.wmnet ` The log can be found in `/var/log/wmf... [18:24:08] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` db1146.eqiad.wmnet ` The log can be found in `/var/log/wmf... [18:25:28] 10DBA, 10Operations, 10Goal: Set up backup strategy for es clusters - https://phabricator.wikimedia.org/T79922 (10jcrespo) last host needed, backup1002 is finally fully setup, HW and OS-wise and ready to implement the last part of external storage backups (cross-dc redundancy). [18:25:32] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1148.eqiad.wmnet'] ` and were **ALL** successful. [18:25:42] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` db1147.eqiad.wmnet ` The log can be found in `/var/log/wmf... 
[18:32:31] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1143.eqiad.wmnet'] ` and were **ALL** successful. [18:38:59] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1144.eqiad.wmnet'] ` and were **ALL** successful. [18:45:18] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1146.eqiad.wmnet'] ` and were **ALL** successful. [18:47:51] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1147.eqiad.wmnet'] ` and were **ALL** successful. [18:51:32] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` db1145.eqiad.wmnet ` The log can be found in `/var/log/wmf... 
[18:53:21] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1145.eqiad.wmnet'] ` Of which those **FAILED**: ` ['db1145.eqiad.wmnet'] ` [18:54:47] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10Cmjohnson) [19:14:13] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1145.eqiad.wmnet'] ` and were **ALL** successful. [19:18:53] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10Cmjohnson) [19:19:26] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10Cmjohnson) 05Open→03Resolved These are all yours @Marostegui [19:39:38] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10Marostegui) Thank you! They look good: ` _____FORMATTED_OUTPUT_____ db1141.eqiad.wmnet: Filesystem Type Size Used Avail Use% Mounted on db1141.eqiad.wmne...