
Wikimedia IRC logs browser - #wikimedia-cloud

Displaying 126 items:

2021-03-18 01:46:46 <bstorm> !log tools killed the toolschecker cron job, which had an LDAP error, and ran it again by hand
2021-03-18 01:46:50 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
2021-03-18 03:35:36 <bstorm> !log tools rebooting tools-sgecron-01 to try to clear up the ldap-related errors coming out of it
2021-03-18 03:35:41 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
2021-03-18 03:37:11 <bstorm> !log tools deleted a massive number of stuck jobs that misfired from the cron server
2021-03-18 03:37:13 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
2021-03-18 03:49:30 <bstorm> !log tools restarting sssd on tools-sgegrid-master
2021-03-18 03:49:34 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
2021-03-18 03:59:13 <bstorm> !log tools rebooting grid master. sorry for the cron spam
2021-03-18 03:59:17 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
2021-03-18 04:12:52 <bstorm> !log tools rebooted tools-sgeexec-0935.tools.eqiad.wmflabs because it forgot how to LDAP...likely root cause of the issues tonight
2021-03-18 04:12:56 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
2021-03-18 06:43:12 <Majavah> https://wiki.debian.org/DebianStretch says that Stretch should be on kernel 4.9, but somehow I have deployment-ms-be[05-06] on stretch and kernel 4.19 which is causing some issues with swift, any ideas how this happened? :/
2021-03-18 09:56:28 <arturo> Majavah: no, sorry :-(
2021-03-18 10:28:34 <Majavah> arturo: all stretch wmcs vms I checked are on 4.19 for some reason, I guess I'll have to work around this specific issue, looks like it should be as simple as changing a condition on a puppet manifest
2021-03-18 10:37:30 <arturo> Majavah: I'm not familiar with why that happened. Maybe open a phab task and ask for clarification from other deployment-prep admins?
2021-03-18 10:46:24 <Majavah> arturo: it's not limited to deployment-prep, everything I checked on toolforge (bastion and a random sgeexec node) and the generic bastions are on 4.19, so I'm suspecting it's either some cloud-vps automatic update or in the base images itself
2021-03-18 10:46:56 <Majavah> it can be worked around, just curious why cloud vps uses a different kernel than production on the same distro. should I still open a task, and where? cloud-vps and wmcs-kanban?
2021-03-18 10:49:00 <arturo> Majavah: I see stretch-backports has 4.19.118-2+deb10u1~bpo9+1, so perhaps the VMs are using a backported kernel?
2021-03-18 10:50:59 <arturo> Majavah: does this help?
2021-03-18 10:51:01 <arturo> https://www.irccloud.com/pastebin/M2XfZh2I/
2021-03-18 10:51:32 <dcaro> maybe https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/wmcs/instance.pp#77 ?
2021-03-18 10:51:46 <Majavah> arturo: Installed: 4.9+80+deb9u13
2021-03-18 10:51:52 <Majavah> now I'm just more confused
2021-03-18 10:52:12 <arturo> the interesting commands are the last 2
2021-03-18 10:52:48 <arturo> Majavah:
2021-03-18 10:52:51 <arturo> https://www.irccloud.com/pastebin/LasFZpcY/
2021-03-18 10:53:08 <arturo> somehow a backported kernel ended up in the stretch security repository
2021-03-18 10:53:40 <Majavah> https://paste.toolforge.org/view/173d16d8
2021-03-18 10:55:04 <Majavah> dcaro: I guess that's where the repo comes, I just thought each package would need to have been pulled from backports manually
2021-03-18 10:55:08 <arturo> Majavah: yeah, same: backported kernel in the security repository
2021-03-18 10:55:54 <dcaro> https://www.debian.org/security/2021/dsa-4843 ?
2021-03-18 10:58:11 <arturo> Majavah: in deployment-prep VMs you can simply downgrade the kernel. But again, that's something you deployment-prep folks should decide on
2021-03-18 11:00:05 <Majavah> arturo: okay, thanks for the help, not sure yet what's the best way forwards (fixing puppet manifests to support 4.19 on stretch or downgrading) but I'm sure we'll figure something out
2021-03-18 11:08:23 <arturo> 👍
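Editor's sketch: the mismatch discussed above (stretch expected on kernel 4.9, VMs actually running 4.19) could be checked with a small Python helper like the one below. The function names, the lookup table, and the sample release strings in the usage note are illustrative assumptions; they are not taken from the pastes linked in the log.

```python
import platform
import re

# Stock kernel series per Debian release. The stretch value comes from the
# discussion above; buster shipping 4.19 matches the "deb10u1" backport
# version string quoted in the log. This table is a hypothetical helper.
EXPECTED_SERIES = {"stretch": "4.9", "buster": "4.19"}


def kernel_series(release: str) -> str:
    """Return the 'major.minor' part of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return f"{match.group(1)}.{match.group(2)}"


def is_stock_kernel(codename: str, release: str) -> bool:
    """True if the kernel series matches the stock one for the release."""
    return kernel_series(release) == EXPECTED_SERIES[codename]


if __name__ == "__main__":
    # platform.release() reports the running kernel, like `uname -r`.
    running = platform.release()
    print(f"running kernel series: {kernel_series(running)}")
```

For example, `is_stock_kernel("stretch", "4.19.0-0.bpo.14-amd64")` is false while `is_stock_kernel("stretch", "4.9.0-14-amd64")` is true. A puppet condition keyed on the distro alone would miss this case, which is why checking the kernel series directly may be the safer test.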
2021-03-18 12:47:32 <arturo> !log toolsbeta delete puppet prefix `toolsbeta-buster-grirdmaster` (no longer useful) T277653
2021-03-18 12:47:38 <stashbot> T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
2021-03-18 12:47:38 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
2021-03-18 12:48:13 <arturo> !log toolsbeta destroy VM toolsbeta-buster-gridmaster (no longer useful) T277653
2021-03-18 12:48:19 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
2021-03-18 12:50:10 <arturo> !log toolsbeta added puppet prefix `toolsbeta-sgegrid-shadow`, migrate puppet config from VM to here
2021-03-18 12:50:13 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
2021-03-18 12:51:30 <arturo> !log toolsbeta rebuild toolsbeta-sgegrid-shadow instance as debian buster (T277653)
2021-03-18 12:51:33 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
2021-03-18 12:53:11 <arturo> !log toolsbeta create anti-affinity server group toolsbeta-sgegrid-master-shadow
2021-03-18 12:53:17 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
2021-03-18 16:19:09 <bstorm> !log tools added profile::toolforge::infrastructure class to puppetmaster T277756
2021-03-18 16:19:12 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
2021-03-18 16:20:41 <andrewbogott> !log tools disabling puppet tools-wide to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456
2021-03-18 16:20:44 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
2021-03-18 16:21:30 <andrewbogott> !log tools enabling puppet tools-wide
2021-03-18 16:21:33 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
2021-03-18 16:24:10 <arturo> !log toolsbeta live-hacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456
2021-03-18 16:24:13 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
2021-03-18 16:52:53 <arturo> andrewbogott: sorry I got distracted. I'm ready to pay attention to the grid stuff now
2021-03-18 16:53:27 <andrewbogott> np, I'm in the process of building toolsbeta-sgeexec-0902, once it's up I'll have you check my work :)
2021-03-18 16:53:38 <arturo> ok, that will take some time
2021-03-18 16:54:00 <andrewbogott> yeah :/
2021-03-18 16:58:05 <andrewbogott> maybe I will get some lunch while puppet runs. arturo if you want to go we can revisit this tomorrow, otherwise I'll ping you when it finishes
2021-03-18 16:59:28 <arturo> andrewbogott: ok! I will disconnect then for today
2021-03-18 16:59:45 <arturo> I think this doesn't block me for the grid buster upgrade stuff I plan to do tomorrow anyway
2021-03-18 16:59:56 <arturo> (without the path is just a small puppet agent complaint)
2021-03-18 17:03:31 <mutante> I am getting the "Puppet failure" mails for nodes like "node3.cluster.local" but I have no idea what they are. Normally you receive this mail if you are a project admin. But this one surprises me. Should I dig more?
2021-03-18 17:43:28 <andrewbogott> mutante: that suggests that a VM has forgotten its hostname :(
2021-03-18 17:44:42 <andrewbogott> let me see if I can figure out who is saying that.
2021-03-18 17:47:03 <andrewbogott> mutante: are you a member of pontoon by chance? That would be my first guess...
2021-03-18 17:47:15 <mutante> andrewbogott: aha! thank you. so actually there are 3 of them. node1, node2 and node3
2021-03-18 17:47:20 <andrewbogott> Things in automation-framework are also chronically broken
2021-03-18 17:47:32 <mutante> checking pontoon
2021-03-18 17:47:58 <andrewbogott> When I last looked it seemed beyond help :)
2021-03-18 17:48:39 <mutante> no, I don't see pontoon in my project list on Horizon
2021-03-18 17:49:06 <mutante> maybe "puppet-diffs". let's see if I can leave that
2021-03-18 17:49:46 <mutante> ah no.. then I probably can't sync compiler facts
2021-03-18 17:50:41 <mutante> checking which of the projects I am in has exactly 3 nodes..hmm
2021-03-18 17:50:54 <mutante> bastion does
2021-03-18 17:51:49 <mutante> packaging does
2021-03-18 17:52:28 <mutante> puppet-diffs does ... yea, those 3.. the others would not match
2021-03-18 17:55:09 <andrewbogott> I'm doing some cumin searches for broken instances… no guarantee this will find whatever's emailing you though
2021-03-18 17:55:12 <andrewbogott> https://www.irccloud.com/pastebin/PELdagPK/
2021-03-18 17:56:18 <mutante> andrewbogott: I guess let's just see if it keeps doing it every day in the future or it stops
2021-03-18 17:56:59 <andrewbogott> mutante: here are the other candidates:
2021-03-18 17:57:02 <andrewbogott> https://www.irccloud.com/pastebin/eqg1ehF2/
2021-03-18 17:57:14 <andrewbogott> although if the hostname is scrambled I have no idea if cumin can reach them
2021-03-18 17:57:23 <mutante> if it mails me as the proper "k8splay" again I will ping Wolfgang and check why puppet is broken there
2021-03-18 17:58:09 <mutante> if it keeps mailing me as "node1" though I can open a ticket
2021-03-18 17:59:11 <mutante> andrewbogott: hmm, thanks but in that paste I see nothing that looks familiar. maybe it was k8splay though
2021-03-18 18:14:06 <andrewbogott> mutante: ok! It's clearly not urgent, just depends on your tolerance for cronspam
2021-03-18 18:17:29 <mutante> andrewbogott: tolerance is high enough :)
2021-03-18 18:18:03 <andrewbogott> we all have lots of practice
2021-03-18 18:35:31 <arturo> mutante: that domain sounds like k8s
2021-03-18 18:44:33 <arturo> !log toolsbeta replacing toolsbeta-sgegrid-master with a Debian Buster VM (T277653)
2021-03-18 18:44:37 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
2021-03-18 18:44:38 <stashbot> T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
2021-03-18 18:49:29 <arturo> !log toolsbeta deleting VM toolsbeta-workflow-test, no longer useful
2021-03-18 18:49:32 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
2021-03-18 18:50:19 <arturo> !log toolsbeta deleting VMs toolsbeta-paws-worker-1001 toolsbeta-paws-worker-1002 toolsbeta-paws-master-01 (testing for PAWS should happen in the paws project)
2021-03-18 18:50:22 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
2021-03-18 18:55:55 <bstorm> !log toolsbeta set profile::toolforge::infrastructure across the entire project with login_server set on the bastion prefix
2021-03-18 18:55:58 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
2021-03-18 19:24:23 <bstorm> !log tools set profile::toolforge::infrastructure across the entire project with login_server set on the bastion and exec node-related prefixes
2021-03-18 19:24:26 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
2021-03-18 19:52:23 <bstorm> lunch
2021-03-18 19:52:37 <bstorm> ugh, getting my tabs confused. moving that to the other channel
2021-03-18 20:02:26 <Cyberpower678> bd808: andrewbogott: maybe it's just my monitoring application but I just mounted the new Cinder volume under /srv-new and my usage monitor already is claiming 19GB of used space.
2021-03-18 20:03:11 <Cyberpower678> I haven't put anything in it yet.
2021-03-18 20:05:12 <andrewbogott> what does df say?
2021-03-18 20:06:20 <Cyberpower678> That my monitoring application is being dumb and I should switch to something better lol.
2021-03-18 20:06:57 <Cyberpower678> Thanks
2021-03-18 20:08:04 <andrewbogott> np! Glad you're trying out cinder.
2021-03-18 20:09:23 <Cyberpower678> I've been eagerly waiting for this new feature. :D
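Editor's sketch: a way to cross-check a monitoring tool against what `df` reports, as suggested above. This assumes Python's standard `shutil.disk_usage`, which on Unix reads the filesystem's statvfs counters. The helper name `usage_gib` is hypothetical, and `/srv-new` is the mount point mentioned in the log.

```python
import shutil


def usage_gib(path: str) -> tuple:
    """Return (total, used, free) for the filesystem holding `path`, in GiB."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return (usage.total / gib, usage.used / gib, usage.free / gib)


if __name__ == "__main__":
    # "/" is used here so the sketch runs anywhere; on the VM in question
    # the path of interest would be "/srv-new".
    total, used, free = usage_gib("/")
    print(f"total={total:.1f} GiB used={used:.1f} GiB free={free:.1f} GiB")
```

If these figures disagree with the monitoring tool, the tool is probably measuring something other than the mounted filesystem.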
2021-03-18 20:10:49 <Cyberpower678> andrewbogott: I
2021-03-18 20:11:09 <Cyberpower678> I'm reading the section on moving old srv data to the new cinder, but the instructions confuse me a little.
2021-03-18 20:11:37 <Cyberpower678> Step 2 tells me to verify that the mounted drive of the new volume exists.
2021-03-18 20:11:50 <Cyberpower678> So what am I mounting in Step 6?
2021-03-18 20:13:51 <Cyberpower678> https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space_to_Cloud_VPS_instances#Moving_old_/srv_data_to_new_volume
2021-03-18 20:23:10 <andrewbogott> Cyberpower678: you're reading in the deprecated lvm section now I think?
2021-03-18 20:23:17 <andrewbogott> So not sure if it applies to what you're doing
2021-03-18 20:23:48 <Cyberpower678> Yea, I noticed. I just adapted the command to copy from /srv to /srv-new
2021-03-18 20:24:00 <andrewbogott> 'k
2021-03-18 20:24:22 <Cyberpower678> But once I have it copied, how can I change the mountpoints to move srv to srv-old and srv-new to srv?
2021-03-18 20:25:02 <andrewbogott> I would edit /etc/fstab and then reboot
2021-03-18 20:25:35 <andrewbogott> although typically when people are moving from an lvm volume to cinder it's in anticipation of attaching that cinder volume to a fresh VM I would think
2021-03-18 20:25:42 <Cyberpower678> alright that's easy enough.
2021-03-18 20:26:20 <Cyberpower678> Not inclined to set up the VM at this time. I've got so much to do right now. :-)
2021-03-18 20:27:40 <Cyberpower678> Should I also remove the srv puppet role, or will it not matter?
2021-03-18 20:33:08 <andrewbogott> You probably should remove it, otherwise they'll collide
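Editor's sketch: the mount-point swap discussed above (move /srv to /srv-old and /srv-new to /srv by editing /etc/fstab, then rebooting) could be prepared with a helper like this. It only transforms fstab text and does not write any file. The function name and the device names in the usage note are hypothetical. Per the advice above, the srv puppet role would need to be removed first so puppet does not re-add the old entry.

```python
from typing import Mapping


def remap_mountpoints(fstab_text: str, mapping: Mapping[str, str]) -> str:
    """Rewrite the mount-point field (2nd column) of matching fstab entries.

    All renames are applied in a single pass, so a swap such as
    /srv -> /srv-old and /srv-new -> /srv does not chain.
    """
    out = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        fields = stripped.split()
        is_entry = bool(stripped) and not stripped.startswith("#")
        if is_entry and len(fields) >= 2 and fields[1] in mapping:
            fields[1] = mapping[fields[1]]
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Device names below are placeholders, not taken from the log.
    sample = (
        "/dev/vd/second-local-disk /srv ext4 defaults 0 2\n"
        "/dev/sdb /srv-new ext4 defaults 0 2\n"
    )
    print(remap_mountpoints(sample, {"/srv": "/srv-old", "/srv-new": "/srv"}), end="")
```

Reviewing the generated text before copying it into /etc/fstab is advisable, since a bad entry can stop the VM from booting cleanly. The /srv-old directory also has to exist before the reboot.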

This page is generated from SQL logs. Static txt files are also available for download.