2021-03-18 01:46:46
|
<bstorm>
|
!log tools killed the toolschecker cron job, which had an LDAP error, and ran it again by hand
|
2021-03-18 01:46:50
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 03:35:36
|
<bstorm>
|
!log tools rebooting tools-sgecron-01 to try to clear up the ldap-related errors coming out of it
|
2021-03-18 03:35:41
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 03:37:11
|
<bstorm>
|
!log tools deleted a massive number of stuck jobs that misfired from the cron server
|
2021-03-18 03:37:13
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 03:49:30
|
<bstorm>
|
!log tools restarting sssd on tools-sgegrid-master
|
2021-03-18 03:49:34
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 03:59:13
|
<bstorm>
|
!log tools rebooting grid master. sorry for the cron spam
|
2021-03-18 03:59:17
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 04:12:52
|
<bstorm>
|
!log tools rebooted tools-sgeexec-0935.tools.eqiad.wmflabs because it forgot how to LDAP...likely root cause of the issues tonight
|
2021-03-18 04:12:56
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 06:43:12
|
<Majavah>
|
https://wiki.debian.org/DebianStretch says that Stretch should be on kernel 4.9, but somehow I have deployment-ms-be[05-06] on stretch and kernel 4.19, which is causing some issues with swift, any ideas how this happened? :/
|
2021-03-18 09:56:28
|
<arturo>
|
Majavah: no, sorry :-(
|
2021-03-18 10:28:34
|
<Majavah>
|
arturo: all stretch wmcs VMs I checked are on 4.19 for some reason, I guess I'll have to work around this specific issue, looks like it should be as simple as changing a condition in a puppet manifest
|
2021-03-18 10:37:30
|
<arturo>
|
Majavah: I'm not familiar with why that happened. Maybe open a phab task and ask for clarification from other deployment-prep admins?
|
2021-03-18 10:46:24
|
<Majavah>
|
arturo: it's not limited to deployment-prep, everything I checked on toolforge (bastion and a random sgeexec node) and the generic bastions are on 4.19, so I'm suspecting it's either some cloud-vps automatic update or in the base images itself
|
2021-03-18 10:46:56
|
<Majavah>
|
it can be worked around, just curious why cloud vps uses a different kernel than production on the same distro. should I still open a task, and where? cloud-vps and wmcs-kanban?
|
2021-03-18 10:49:00
|
<arturo>
|
Majavah: I see stretch-backports has 4.19.118-2+deb10u1~bpo9+1, so perhaps the VMs are using a backported kernel?
|
2021-03-18 10:50:59
|
<arturo>
|
Majavah: does this help?
|
2021-03-18 10:51:01
|
<arturo>
|
https://www.irccloud.com/pastebin/M2XfZh2I/
|
2021-03-18 10:51:32
|
<dcaro>
|
maybe https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/wmcs/instance.pp#77 ?
|
2021-03-18 10:51:46
|
<Majavah>
|
arturo: Installed: 4.9+80+deb9u13
|
2021-03-18 10:51:52
|
<Majavah>
|
now I'm just more confused
|
2021-03-18 10:52:12
|
<arturo>
|
the interesting commands are the last 2
|
2021-03-18 10:52:48
|
<arturo>
|
Majavah:
|
2021-03-18 10:52:51
|
<arturo>
|
https://www.irccloud.com/pastebin/LasFZpcY/
|
2021-03-18 10:53:08
|
<arturo>
|
somehow a backported kernel ended up in the stretch security repository
|
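The pastes arturo and Majavah trade here aren't preserved in the log, but the checks behind "the interesting commands are the last 2" can be sketched roughly as follows. The package name and the `command -v` guard are illustrative assumptions, not the exact commands pasted:

```shell
# Show the running kernel, then ask apt where the installed kernel package
# came from (stretch, stretch/updates security, or stretch-backports).
uname -r
# Guarded so this sketch no-ops on non-Debian machines.
if command -v apt-cache >/dev/null 2>&1; then
    apt-cache policy linux-image-amd64          # which repo's version wins
    dpkg -l 'linux-image-*' 2>/dev/null | awk '/^ii/ {print $2, $3}'
fi
```

The `apt-cache policy` output is what reveals the surprise discussed below: a 4.19 package version carrying a `~bpo9` backport suffix being served from the security repository.
|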
2021-03-18 10:53:40
|
<Majavah>
|
https://paste.toolforge.org/view/173d16d8
|
2021-03-18 10:55:04
|
<Majavah>
|
dcaro: I guess that's where the repo comes from, I just thought each package would need to have been pulled from backports manually
|
2021-03-18 10:55:08
|
<arturo>
|
Majavah: yeah, same: backported kernel in the security repository
|
2021-03-18 10:55:54
|
<dcaro>
|
https://www.debian.org/security/2021/dsa-4843 ?
|
2021-03-18 10:58:11
|
<arturo>
|
Majavah: in deployment-prep VMs you can simply downgrade the kernel. But again, that's something you deployment-prep folks should decide on
|
2021-03-18 11:00:05
|
<Majavah>
|
arturo: okay, thanks for the help, not sure yet what's the best way forward (fixing puppet manifests to support 4.19 on stretch or downgrading) but I'm sure we'll figure something out
|
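The "changing a condition in a puppet manifest" fix Majavah anticipates might look roughly like this. The class body and names are invented for illustration; only `versioncmp` and the `$facts` hash are standard Puppet:

```puppet
# Hypothetical: relax a manifest that assumed stretch only ships 4.9 kernels.
if $facts['os']['distro']['codename'] == 'stretch'
and versioncmp($facts['kernelversion'], '4.19') >= 0 {
  # Handle the backported 4.19 kernel instead of failing.
  include profile::base::kernel_workaround   # invented class name
}
```
|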
2021-03-18 11:08:23
|
<arturo>
|
đź‘Ť
|
2021-03-18 12:47:32
|
<arturo>
|
!log toolsbeta delete puppet prefix `toolsbeta-buster-grirdmaster` (no longer useful) T277653
|
2021-03-18 12:47:38
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 12:47:38
|
<stashbot>
|
T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
|
2021-03-18 12:48:13
|
<arturo>
|
!log toolsbeta destroy VM toolsbeta-buster-gridmaster (no longer useful) T277653
|
2021-03-18 12:48:19
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 12:50:10
|
<arturo>
|
!log toolsbeta added puppet prefix `toolsbeta-sgegrid-shadow`, migrate puppet config from VM to here
|
2021-03-18 12:50:13
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 12:51:30
|
<arturo>
|
!log toolsbeta rebuild toolsbeta-sgegrid-shadow instance as debian buster (T277653)
|
2021-03-18 12:51:33
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 12:53:11
|
<arturo>
|
!log toolsbeta create anti-affinity server group toolsbeta-sgegrid-master-shadow
|
2021-03-18 12:53:17
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 16:19:09
|
<bstorm>
|
!log tools added profile::toolforge::infrastructure class to puppetmaster T277756
|
2021-03-18 16:19:12
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 16:20:41
|
<andrewbogott>
|
!log tools disabling puppet tools-wide to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456
|
2021-03-18 16:20:44
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 16:21:30
|
<andrewbogott>
|
!log tools enabling puppet tools-wide
|
2021-03-18 16:21:33
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 16:24:10
|
<arturo>
|
!log toolsbeta live-hacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456
|
2021-03-18 16:24:13
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 16:52:53
|
<arturo>
|
andrewbogott: sorry I got distracted. I'm ready to pay attention to the grid stuff now
|
2021-03-18 16:53:27
|
<andrewbogott>
|
np, I'm in the process of building toolsbeta-sgeexec-0902, once it's up I'll have you check my work :)
|
2021-03-18 16:53:38
|
<arturo>
|
ok, that will take some time
|
2021-03-18 16:54:00
|
<andrewbogott>
|
yeah :/
|
2021-03-18 16:58:05
|
<andrewbogott>
|
maybe I will get some lunch while puppet runs. arturo if you want to go we can revisit this tomorrow, otherwise I'll ping you when it finishes
|
2021-03-18 16:59:28
|
<arturo>
|
andrewbogott: ok! I will disconnect then for today
|
2021-03-18 16:59:45
|
<arturo>
|
I think this doesn't block me for the grid buster upgrade stuff I plan to do tomorrow anyway
|
2021-03-18 16:59:56
|
<arturo>
|
(without the patch it's just a small puppet agent complaint)
|
2021-03-18 17:03:31
|
<mutante>
|
I am getting the "Puppet failure" mails for nodes like "node3.cluster.local" but I have no idea what they are. Normally you receive this mail if you are a project admin. But this one surprises me. Should I dig more?
|
2021-03-18 17:43:28
|
<andrewbogott>
|
mutante: that suggests that a VM has forgotten its hostname :(
|
2021-03-18 17:44:42
|
<andrewbogott>
|
let me see if I can figure out who is saying that.
|
2021-03-18 17:47:03
|
<andrewbogott>
|
mutante: are you a member of pontoon by chance? That would be my first guess...
|
2021-03-18 17:47:15
|
<mutante>
|
andrewbogott: aha! thank you. so actually there are 3 of them. node1, node2 and node3
|
2021-03-18 17:47:20
|
<andrewbogott>
|
Things in automation-framework are also chronically broken
|
2021-03-18 17:47:32
|
<mutante>
|
checking pontoon
|
2021-03-18 17:47:58
|
<andrewbogott>
|
When I last looked it seemed beyond help :)
|
2021-03-18 17:48:39
|
<mutante>
|
no, I don't see pontoon in my project list on Horizon
|
2021-03-18 17:49:06
|
<mutante>
|
maybe "puppet-diffs". let's see if I can leave that
|
2021-03-18 17:49:46
|
<mutante>
|
ah no.. then I probably can't sync compiler facts
|
2021-03-18 17:50:41
|
<mutante>
|
checking which of the projects I am in has exactly 3 nodes..hmm
|
2021-03-18 17:50:54
|
<mutante>
|
bastion does
|
2021-03-18 17:51:49
|
<mutante>
|
packaging does
|
2021-03-18 17:52:28
|
<mutante>
|
puppet-diffs does ... yea, those 3.. the others would not match
|
2021-03-18 17:55:09
|
<andrewbogott>
|
I'm doing some cumin searches for broken instances… no guarantee this will find whatever's emailing you though
|
2021-03-18 17:55:12
|
<andrewbogott>
|
https://www.irccloud.com/pastebin/PELdagPK/
|
2021-03-18 17:56:18
|
<mutante>
|
andrewbogott: I guess let's just see if it keeps doing it every day in the future or it stops
|
2021-03-18 17:56:59
|
<andrewbogott>
|
mutante: here are the other candidates:
|
2021-03-18 17:57:02
|
<andrewbogott>
|
https://www.irccloud.com/pastebin/eqg1ehF2/
|
2021-03-18 17:57:14
|
<andrewbogott>
|
although if the hostname is scrambled I have no idea if cumin can reach them
|
2021-03-18 17:57:23
|
<mutante>
|
if it mails me as the proper "k8splay" again I will ping Wolfgang and check why puppet is broken there
|
2021-03-18 17:58:09
|
<mutante>
|
if it keeps mailing me as "node1" though I can open a ticket
|
2021-03-18 17:59:11
|
<mutante>
|
andrewbogott: hmm, thanks but in that paste I see nothing that looks familiar. maybe it was k8splay though
|
2021-03-18 18:14:06
|
<andrewbogott>
|
mutante: ok! It's clearly not urgent, just depends on your tolerance for cronspam
|
2021-03-18 18:17:29
|
<mutante>
|
andrewbogott: tolerance is high enough :)
|
2021-03-18 18:18:03
|
<andrewbogott>
|
we all have lots of practice
|
2021-03-18 18:35:31
|
<arturo>
|
mutante: that domain sounds like k8s
|
2021-03-18 18:44:33
|
<arturo>
|
!log toolsbeta replacing toolsbeta-sgegrid-master with a Debian Buster VM (T277653)
|
2021-03-18 18:44:37
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 18:44:38
|
<stashbot>
|
T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
|
2021-03-18 18:49:29
|
<arturo>
|
!log toolsbeta deleting VM toolsbeta-workflow-test, no longer useful
|
2021-03-18 18:49:32
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 18:50:19
|
<arturo>
|
!log toolsbeta deleting VMs toolsbeta-paws-worker-1001 toolsbeta-paws-worker-1002 toolsbeta-paws-master-01 (testing for PAWS should happen in the paws project)
|
2021-03-18 18:50:22
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 18:55:55
|
<bstorm>
|
!log toolsbeta set profile::toolforge::infrastructure across the entire project with login_server set on the bastion prefix
|
2021-03-18 18:55:58
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 19:24:23
|
<bstorm>
|
!log tools set profile::toolforge::infrastructure across the entire project with login_server set on the bastion and exec node-related prefixes
|
2021-03-18 19:24:26
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
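The `profile::toolforge::infrastructure` rollout bstorm logs above is a Horizon/Hiera change. A prefix-level Hiera fragment for the bastions might look roughly like this; the `login_server` key comes from the log entry, but the exact value and layout are assumptions:

```yaml
# Illustrative only: Hiera data applied via a Horizon puppet prefix.
classes:
  - profile::toolforge::infrastructure
profile::toolforge::infrastructure::login_server: login.toolforge.org
```
|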
2021-03-18 19:52:23
|
<bstorm>
|
lunch
|
2021-03-18 19:52:37
|
<bstorm>
|
ugh, getting my tabs confused. moving that to the other channel
|
2021-03-18 20:02:26
|
<Cyberpower678>
|
bd808: andrewbogott: maybe it's just my monitoring application but I just mounted the new Cinder volume under /srv-new and my usage monitor already is claiming 19GB of used space.
|
2021-03-18 20:03:11
|
<Cyberpower678>
|
I haven't put anything in it yet.
|
2021-03-18 20:05:12
|
<andrewbogott>
|
what does df say?
|
2021-03-18 20:06:20
|
<Cyberpower678>
|
That my monitoring application is being dumb and I should switch to something better lol.
|
2021-03-18 20:06:57
|
<Cyberpower678>
|
Thanks
|
2021-03-18 20:08:04
|
<andrewbogott>
|
np! Glad you're trying out cinder.
|
2021-03-18 20:09:23
|
<Cyberpower678>
|
I've been eagerly waiting for this new feature. :D
|
2021-03-18 20:10:49
|
<Cyberpower678>
|
andrewbogott: I
|
2021-03-18 20:11:09
|
<Cyberpower678>
|
I'm reading the section on moving old srv data to the new cinder volume, but the instructions confuse me a little.
|
2021-03-18 20:11:37
|
<Cyberpower678>
|
Step 2 tells me to verify that the mounted drive for the new volume exists.
|
2021-03-18 20:11:50
|
<Cyberpower678>
|
So what am I mounting in Step 6?
|
2021-03-18 20:13:51
|
<Cyberpower678>
|
https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space_to_Cloud_VPS_instances#Moving_old_/srv_data_to_new_volume
|
2021-03-18 20:23:10
|
<andrewbogott>
|
Cyberpower678: you're reading the deprecated lvm section now, I think?
|
2021-03-18 20:23:17
|
<andrewbogott>
|
So not sure if it applies to what you're doing
|
2021-03-18 20:23:48
|
<Cyberpower678>
|
Yea, I noticed. I just adapted the command to copy from /srv to /srv-new
|
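The copy Cyberpower678 describes, adapted to the new mount points, can be rehearsed like this. The `/tmp` demo paths stand in for the real `/srv` and `/srv-new`; `cp -a` is used here for the sketch, though `rsync -a` does the same job and is restartable if the copy is large:

```shell
# Stand-ins for the real mount points.
src=/tmp/srv-demo
dst=/tmp/srv-new-demo
mkdir -p "$src/tool-data" "$dst"
echo 'example' > "$src/tool-data/state.txt"
# -a preserves permissions, ownership and timestamps; "$src/." copies the
# directory's contents rather than the directory itself.
cp -a "$src/." "$dst/"
ls "$dst/tool-data"
```
|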
2021-03-18 20:24:00
|
<andrewbogott>
|
'k
|
2021-03-18 20:24:22
|
<Cyberpower678>
|
But once I have it copied, how can I change the mountpoints to move srv to srv-old and srv-new to srv?
|
2021-03-18 20:25:02
|
<andrewbogott>
|
I would edit /etc/fstab and then reboot
|
2021-03-18 20:25:35
|
<andrewbogott>
|
although typically when people are moving from an lvm volume to cinder it's in anticipation of attaching that cinder volume to a fresh VM I would think
|
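The fstab edit andrewbogott suggests can be rehearsed on a scratch copy before touching the real file. The device paths below are made up; on the actual VM you would make the equivalent edit to `/etc/fstab` and reboot:

```shell
# Build a scratch fstab with hypothetical devices for the two volumes.
cat > /tmp/fstab.demo <<'EOF'
/dev/vd/second-local-disk /srv ext4 defaults 0 2
/dev/sdb /srv-new ext4 defaults 0 2
EOF
# Old LVM volume moves aside to /srv-old; the cinder volume takes over /srv.
# Writing to a new file rather than editing in place keeps a backup.
sed -e 's| /srv | /srv-old |' -e 's| /srv-new | /srv |' \
    /tmp/fstab.demo > /tmp/fstab.new
cat /tmp/fstab.new
```

The space-delimited patterns matter: ` /srv ` (with trailing space) cannot accidentally match ` /srv-new `, so the two substitutions never collide.
|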
2021-03-18 20:25:42
|
<Cyberpower678>
|
alright that's easy enough.
|
2021-03-18 20:26:20
|
<Cyberpower678>
|
Not inclined to setup the VM at this time. I've got so much to do right now. :-)
|
2021-03-18 20:27:40
|
<Cyberpower678>
|
Should I also remove the srv puppet role, or will it not matter?
|
2021-03-18 20:33:08
|
<andrewbogott>
|
You probably should remove it, otherwise they'll collide
|