2021-03-18 01:46:46
|
<bstorm>
|
!log tools killed the toolschecker cron job, which had an LDAP error, and ran it again by hand
|
2021-03-18 01:46:50
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 03:35:36
|
<bstorm>
|
!log tools rebooting tools-sgecron-01 to try to clear up the ldap-related errors coming out of it
|
2021-03-18 03:35:41
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 03:37:11
|
<bstorm>
|
!log tools deleted a massive number of stuck jobs that misfired from the cron server
|
2021-03-18 03:37:13
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 03:49:30
|
<bstorm>
|
!log tools restarting sssd on tools-sgegrid-master
|
2021-03-18 03:49:34
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 03:59:13
|
<bstorm>
|
!log tools rebooting grid master. sorry for the cron spam
|
2021-03-18 03:59:17
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 04:12:52
|
<bstorm>
|
!log tools rebooted tools-sgeexec-0935.tools.eqiad.wmflabs because it forgot how to LDAP...likely root cause of the issues tonight
|
2021-03-18 04:12:56
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 06:43:12
|
<Majavah>
|
https://wiki.debian.org/DebianStretch says that Stretch should be on kernel 4.9, but somehow I have deployment-ms-be[05-06] on stretch and kernel 4.19, which is causing some issues with swift, any ideas how this happened? :/
|
2021-03-18 09:56:28
|
<arturo>
|
Majavah: no, sorry :-(
|
2021-03-18 10:28:34
|
<Majavah>
|
arturo: all stretch wmcs VMs I checked are on 4.19 for some reason, I guess I'll have to work around this specific issue, looks like it should be as simple as changing a condition in a puppet manifest
|
2021-03-18 10:37:30
|
<arturo>
|
Majavah: I'm not familiar with why that happened. Maybe open a phab task and ask for clarification from other deployment-prep admins?
|
2021-03-18 10:46:24
|
<Majavah>
|
arturo: it's not limited to deployment-prep, everything I checked on toolforge (bastion and a random sgeexec node) and the generic bastions are on 4.19, so I'm suspecting it's either some cloud-vps automatic update or in the base images itself
|
2021-03-18 10:46:56
|
<Majavah>
|
it can be worked around, just curious why cloud vps uses a different kernel than production on the same distro. should I still open a task, and where? cloud-vps and wmcs-kanban?
|
2021-03-18 10:49:00
|
<arturo>
|
Majavah: I see stretch-backports has 4.19.118-2+deb10u1~bpo9+1, so perhaps the VMs are using a backported kernel?
|
2021-03-18 10:50:59
|
<arturo>
|
Majavah: does this help?
|
2021-03-18 10:51:01
|
<arturo>
|
https://www.irccloud.com/pastebin/M2XfZh2I/
|
2021-03-18 10:51:32
|
<dcaro>
|
maybe https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/wmcs/instance.pp#77 ?
|
2021-03-18 10:51:46
|
<Majavah>
|
arturo: Installed: 4.9+80+deb9u13
|
2021-03-18 10:51:52
|
<Majavah>
|
now I'm just more confused
|
2021-03-18 10:52:12
|
<arturo>
|
the interesting commands are the last 2
|
2021-03-18 10:52:48
|
<arturo>
|
Majavah:
|
2021-03-18 10:52:51
|
<arturo>
|
https://www.irccloud.com/pastebin/LasFZpcY/
|
2021-03-18 10:53:08
|
<arturo>
|
somehow a backported kernel ended up in the stretch security repository
|
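The pastes arturo and Majavah trade here aren't preserved in the log, but the checks behind "the interesting commands are the last 2" can be sketched roughly as follows. The package name and the `command -v` guard are illustrative assumptions, not the exact commands pasted:

```shell
# Show the running kernel, then ask apt where the installed kernel package
# came from (stretch, stretch/updates security, or stretch-backports).
uname -r
# Guarded so this sketch no-ops on non-Debian machines.
if command -v apt-cache >/dev/null 2>&1; then
    apt-cache policy linux-image-amd64          # which repo's version wins
    dpkg -l 'linux-image-*' 2>/dev/null | awk '/^ii/ {print $2, $3}'
fi
```

The `apt-cache policy` output is what reveals the surprise discussed below: a 4.19 package version carrying a `~bpo9` backport suffix being served from the security repository.
|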
2021-03-18 10:53:40
|
<Majavah>
|
https://paste.toolforge.org/view/173d16d8
|
2021-03-18 10:55:04
|
<Majavah>
|
dcaro: I guess that's where the repo comes from, I just thought each package would need to have been pulled from backports manually
|
2021-03-18 10:55:08
|
<arturo>
|
Majavah: yeah, same: backported kernel in the security repository
|
2021-03-18 10:55:54
|
<dcaro>
|
https://www.debian.org/security/2021/dsa-4843 ?
|
2021-03-18 10:58:11
|
<arturo>
|
Majavah: in deployment-prep VMs you can simply downgrade the kernel. But again, that's something you deployment-prep folks should decide on
|
2021-03-18 11:00:05
|
<Majavah>
|
arturo: okay, thanks for the help, not sure yet what's the best way forward (fixing puppet manifests to support 4.19 on stretch or downgrading) but I'm sure we'll figure something out
|
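The "changing a condition in a puppet manifest" fix Majavah anticipates might look roughly like this. The class body and names are invented for illustration; only `versioncmp` and the `$facts` hash are standard Puppet:

```puppet
# Hypothetical: relax a manifest that assumed stretch only ships 4.9 kernels.
if $facts['os']['distro']['codename'] == 'stretch'
and versioncmp($facts['kernelversion'], '4.19') >= 0 {
  # Handle the backported 4.19 kernel instead of failing.
  include profile::base::kernel_workaround   # invented class name
}
```
|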
2021-03-18 11:08:23
|
<arturo>
|
đź‘Ť
|
2021-03-18 12:47:32
|
<arturo>
|
!log toolsbeta delete puppet prefix `toolsbeta-buster-grirdmaster` (no longer useful) T277653
|
2021-03-18 12:47:38
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 12:47:38
|
<stashbot>
|
T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
|
2021-03-18 12:48:13
|
<arturo>
|
!log toolsbeta destroy VM toolsbeta-buster-gridmaster (no longer useful) T277653
|
2021-03-18 12:48:19
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 12:50:10
|
<arturo>
|
!log toolsbeta added puppet prefix `toolsbeta-sgegrid-shadow`, migrate puppet config from VM to here
|
2021-03-18 12:50:13
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 12:51:30
|
<arturo>
|
!log toolsbeta rebuild toolsbeta-sgegrid-shadow instance as debian buster (T277653)
|
2021-03-18 12:51:33
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 12:53:11
|
<arturo>
|
!log toolsbeta create anti-affinity server group toolsbeta-sgegrid-master-shadow
|
2021-03-18 12:53:17
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 16:19:09
|
<bstorm>
|
!log tools added profile::toolforge::infrastructure class to puppetmaster T277756
|
2021-03-18 16:19:12
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 16:20:41
|
<andrewbogott>
|
!log tools disabling puppet tools-wide to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456
|
2021-03-18 16:20:44
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 16:21:30
|
<andrewbogott>
|
!log tools enabling puppet tools-wide
|
2021-03-18 16:21:33
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
2021-03-18 16:24:10
|
<arturo>
|
!log toolsbeta live-hacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456
|
2021-03-18 16:24:13
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 16:52:53
|
<arturo>
|
andrewbogott: sorry I got distracted. I'm ready to pay attention to the grid stuff now
|
2021-03-18 16:53:27
|
<andrewbogott>
|
np, I'm in the process of building toolsbeta-sgeexec-0902, once it's up I'll have you check my work :)
|
2021-03-18 16:53:38
|
<arturo>
|
ok, that will take some time
|
2021-03-18 16:54:00
|
<andrewbogott>
|
yeah :/
|
2021-03-18 16:58:05
|
<andrewbogott>
|
maybe I will get some lunch while puppet runs. arturo if you want to go we can revisit this tomorrow, otherwise I'll ping you when it finishes
|
2021-03-18 16:59:28
|
<arturo>
|
andrewbogott: ok! I will disconnect then for today
|
2021-03-18 16:59:45
|
<arturo>
|
I think this doesn't block me for the grid buster upgrade stuff I plan to do tomorrow anyway
|
2021-03-18 16:59:56
|
<arturo>
|
(without the patch it's just a small puppet agent complaint)
|
2021-03-18 17:03:31
|
<mutante>
|
I am getting the "Puppet failure" mails for nodes like "node3.cluster.local" but I have no idea what they are. Normally you receive this mail if you are a project admin. But this one surprises me. Should I dig more?
|
2021-03-18 17:43:28
|
<andrewbogott>
|
mutante: that suggests that a VM has forgotten its hostname :(
|
2021-03-18 17:44:42
|
<andrewbogott>
|
let me see if I can figure out who is saying that.
|
2021-03-18 17:47:03
|
<andrewbogott>
|
mutante: are you a member of pontoon by chance? That would be my first guess...
|
2021-03-18 17:47:15
|
<mutante>
|
andrewbogott: aha! thank you. so actually there are 3 of them. node1, node2 and node3
|
2021-03-18 17:47:20
|
<andrewbogott>
|
Things in automation-framework are also chronically broken
|
2021-03-18 17:47:32
|
<mutante>
|
checking pontoon
|
2021-03-18 17:47:58
|
<andrewbogott>
|
When I last looked it seemed beyond help :)
|
2021-03-18 17:48:39
|
<mutante>
|
no, I don't see pontoon in my project list on Horizon
|
2021-03-18 17:49:06
|
<mutante>
|
maybe "puppet-diffs". let's see if I can leave that
|
2021-03-18 17:49:46
|
<mutante>
|
ah no.. then I probably can't sync compiler facts
|
2021-03-18 17:50:41
|
<mutante>
|
checking which of the projects I am in has exactly 3 nodes..hmm
|
2021-03-18 17:50:54
|
<mutante>
|
bastion does
|
2021-03-18 17:51:49
|
<mutante>
|
packaging does
|
2021-03-18 17:52:28
|
<mutante>
|
puppet-diffs does ... yea, those 3.. the others would not match
|
2021-03-18 17:55:09
|
<andrewbogott>
|
I'm doing some cumin searches for broken instances… no guarantee this will find whatever's emailing you though
|
2021-03-18 17:55:12
|
<andrewbogott>
|
https://www.irccloud.com/pastebin/PELdagPK/
|
2021-03-18 17:56:18
|
<mutante>
|
andrewbogott: I guess let's just see if it keeps doing it every day in the future or it stops
|
2021-03-18 17:56:59
|
<andrewbogott>
|
mutante: here are the other candidates:
|
2021-03-18 17:57:02
|
<andrewbogott>
|
https://www.irccloud.com/pastebin/eqg1ehF2/
|
2021-03-18 17:57:14
|
<andrewbogott>
|
although if the hostname is scrambled I have no idea if cumin can reach them
|
2021-03-18 17:57:23
|
<mutante>
|
if it mails me as the proper "k8splay" again I will ping Wolfgang and check why puppet is broken there
|
2021-03-18 17:58:09
|
<mutante>
|
if it keeps mailing me as "node1" though I can open a ticket
|
2021-03-18 17:59:11
|
<mutante>
|
andrewbogott: hmm, thanks but in that paste I see nothing that looks familiar. maybe it was k8splay though
|
2021-03-18 18:14:06
|
<andrewbogott>
|
mutante: ok! It's clearly not urgent, just depends on your tolerance for cronspam
|
2021-03-18 18:17:29
|
<mutante>
|
andrewbogott: tolerance is high enough :)
|
2021-03-18 18:18:03
|
<andrewbogott>
|
we all have lots of practice
|
2021-03-18 18:35:31
|
<arturo>
|
mutante: that domain sounds like k8s
|
2021-03-18 18:44:33
|
<arturo>
|
!log toolsbeta replacing toolsbeta-sgegrid-master with a Debian Buster VM (T277653)
|
2021-03-18 18:44:37
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 18:44:38
|
<stashbot>
|
T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
|
2021-03-18 18:49:29
|
<arturo>
|
!log toolsbeta deleting VM toolsbeta-workflow-test, no longer useful
|
2021-03-18 18:49:32
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 18:50:19
|
<arturo>
|
!log toolsbeta deleting VMs toolsbeta-paws-worker-1001 toolsbeta-paws-worker-1002 toolsbeta-paws-master-01 (testing for PAWS should happen in the paws project)
|
2021-03-18 18:50:22
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 18:55:55
|
<bstorm>
|
!log toolsbeta set profile::toolforge::infrastructure across the entire project with login_server set on the bastion prefix
|
2021-03-18 18:55:58
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
|
2021-03-18 19:24:23
|
<bstorm>
|
!log tools set profile::toolforge::infrastructure across the entire project with login_server set on the bastion and exec node-related prefixes
|
2021-03-18 19:24:26
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
|
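The `profile::toolforge::infrastructure` rollout bstorm logs above is a Horizon/Hiera change. A prefix-level Hiera fragment for the bastions might look roughly like this; the `login_server` key comes from the log entry, but the exact value and layout are assumptions:

```yaml
# Illustrative only: Hiera data applied via a Horizon puppet prefix.
classes:
  - profile::toolforge::infrastructure
profile::toolforge::infrastructure::login_server: login.toolforge.org
```
|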
2021-03-18 19:52:23
|
<bstorm>
|
lunch
|
2021-03-18 19:52:37
|
<bstorm>
|
ugh, getting my tabs confused. moving that to the other channel
|
2021-03-18 20:02:26
|
<Cyberpower678>
|
bd808: andrewbogott: maybe it's just my monitoring application but I just mounted the new Cinder volume under /srv-new and my usage monitor already is claiming 19GB of used space.
|
2021-03-18 20:03:11
|
<Cyberpower678>
|
I haven't put anything in it yet.
|
2021-03-18 20:05:12
|
<andrewbogott>
|
what does df say?
|
2021-03-18 20:06:20
|
<Cyberpower678>
|
That my monitoring application is being dumb and I should switch to something better lol.
|
2021-03-18 20:06:57
|
<Cyberpower678>
|
Thanks
|
2021-03-18 20:08:04
|
<andrewbogott>
|
np! Glad you're trying out cinder.
|
2021-03-18 20:09:23
|
<Cyberpower678>
|
I've been eagerly waiting for this new feature. :D
|
2021-03-18 20:10:49
|
<Cyberpower678>
|
andrewbogott: I
|
2021-03-18 20:11:09
|
<Cyberpower678>
|
I'm reading the section on moving old srv data to the new cinder volume, but the instructions confuse me a little.
|
2021-03-18 20:11:37
|
<Cyberpower678>
|
Step 2 tells me to verify that the mounted drive for the new volume exists.
|
2021-03-18 20:11:50
|
<Cyberpower678>
|
So what am I mounting in Step 6?
|
2021-03-18 20:13:51
|
<Cyberpower678>
|
https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space_to_Cloud_VPS_instances#Moving_old_/srv_data_to_new_volume
|
2021-03-18 20:23:10
|
<andrewbogott>
|
Cyberpower678: you're reading the deprecated lvm section now, I think?
|
2021-03-18 20:23:17
|
<andrewbogott>
|
So not sure if it applies to what you're doing
|
2021-03-18 20:23:48
|
<Cyberpower678>
|
Yea, I noticed. I just adapted the command to copy from /srv to /srv-new
|
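The copy Cyberpower678 describes, adapted to the new mount points, can be rehearsed like this. The `/tmp` demo paths stand in for the real `/srv` and `/srv-new`; `cp -a` is used here for the sketch, though `rsync -a` does the same job and is restartable if the copy is large:

```shell
# Stand-ins for the real mount points.
src=/tmp/srv-demo
dst=/tmp/srv-new-demo
mkdir -p "$src/tool-data" "$dst"
echo 'example' > "$src/tool-data/state.txt"
# -a preserves permissions, ownership and timestamps; "$src/." copies the
# directory's contents rather than the directory itself.
cp -a "$src/." "$dst/"
ls "$dst/tool-data"
```
|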
2021-03-18 20:24:00
|
<andrewbogott>
|
'k
|
2021-03-18 20:24:22
|
<Cyberpower678>
|
But once I have it copied, how can I change the mountpoints to move srv to srv-old and srv-new to srv?
|
2021-03-18 20:25:02
|
<andrewbogott>
|
I would edit /etc/fstab and then reboot
|
2021-03-18 20:25:35
|
<andrewbogott>
|
although typically when people are moving from an lvm volume to cinder it's in anticipation of attaching that cinder volume to a fresh VM I would think
|
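The fstab edit andrewbogott suggests can be rehearsed on a scratch copy before touching the real file. The device paths below are made up; on the actual VM you would make the equivalent edit to `/etc/fstab` and reboot:

```shell
# Build a scratch fstab with hypothetical devices for the two volumes.
cat > /tmp/fstab.demo <<'EOF'
/dev/vd/second-local-disk /srv ext4 defaults 0 2
/dev/sdb /srv-new ext4 defaults 0 2
EOF
# Old LVM volume moves aside to /srv-old; the cinder volume takes over /srv.
# Writing to a new file rather than editing in place keeps a backup.
sed -e 's| /srv | /srv-old |' -e 's| /srv-new | /srv |' \
    /tmp/fstab.demo > /tmp/fstab.new
cat /tmp/fstab.new
```

The space-delimited patterns matter: ` /srv ` (with trailing space) cannot accidentally match ` /srv-new `, so the two substitutions never collide.
|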
2021-03-18 20:25:42
|
<Cyberpower678>
|
alright that's easy enough.
|
2021-03-18 20:26:20
|
<Cyberpower678>
|
Not inclined to setup the VM at this time. I've got so much to do right now. :-)
|
2021-03-18 20:27:40
|
<Cyberpower678>
|
Should I also remove the srv puppet role, or will it not matter?
|
2021-03-18 20:33:08
|
<andrewbogott>
|
You probably should remove it, otherwise they'll collide
|