[13:18:20] is this the right place to look for someone who can give a C+2 on the puppet repo?
[13:18:48] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1077803 is the other side of a config patch Lucas just deployed for me, but I only have C+1 rights on the puppet repo.
[13:22:40] ^ akosiaris
[13:23:42] it looks ok to merge, not sure the status of scandium though
[13:24:06] jayme, hnowlan, kamila_ - do you know by any chance? --^
[13:25:54] looking, although I don't know much about scandium
[13:28:43] most of that patch is just adding "...and parsoidtest1001" to the places that previously mentioned only scandium
[13:28:59] it looks like that visualdiff class isn't used anywhere in production, is that expected?
[13:30:00] our visualdiff servers are on parsoid-qa-02.wikitextexp.eqiad1.wikimedia.cloud
[13:30:03] https://www.mediawiki.org/wiki/Parsing/Visual_Diff_Testing
[13:31:01] i don't know the full history, but perhaps there was an attempt to move that class from wmcloud to production?
[13:31:16] ahh okay
[13:31:56] it needs to be in puppet even if it's just puppetising wmcloud hosts. change looks okay to me, shall I merge?
[13:32:37] (I really don't know anything beyond "looks syntactically correct", sorry)
[13:32:52] sure, works for me. should be harmless -- we just finished our testing of our tagged parsoid for this week, so we have until next Monday to fix scandium/parsoidtest1001 if we've broken it
[13:36:54] puppet compiler is still running, maybe you want to wait until it confirms that the patch doesn't break anything
[13:37:02] https://puppet-compiler.wmflabs.org/output/1077803/2021/
[13:38:07] ah, merged already
[13:38:18] based on the classes it touches I'm not too worried
[13:44:01] thanks!
[14:20:11] jhathaway, brett: calm morning, just a non-impacting crash. Also see https://phabricator.wikimedia.org/T303534#10206075
[14:21:28] thanks jynus
[14:24:46] moritzm: thanks. I've left a message in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1077803; summary is I already had the same patches up and was going to land them this week
[14:25:06] cscott: ^
[14:25:36] ok, no worries. sorry for jumping the gun :)
[14:25:55] is there anything else that needs to be landed at this point?
[14:30:07] Not from the SRE side, but if you spot anything problematic on your side, let us know so we can see how to fix it.
[14:37:17] jelto: the CI failure for the cookbook is clearly caused by a new release of prospector that happened ~1h ago and includes a pylint >3.0.0. I'll send a fix after the meetings, sorry for the noise.
[14:38:17] ah great, thanks. I was a bit surprised and thought it was a rebase issue. Then I'll wait for the fix
[15:25:44] patch sent
[16:00:39] <_joe_> volans: I think it's time we start freezing the prospector version; it's unacceptable that people can't merge changes because of linter changes, and we should be explicit about switching to new versions and fixing the issues when we do
[16:00:50] <_joe_> freezing all the CI requirements tbh
[16:01:01] volans: I don't know if it's known, but dbctl is really slow
[16:01:19] try running "sudo dbctl config commit -b -m "Depool for maint"" on cumin1002
[16:01:20] <_joe_> Amir1: uhm where are you running it from?
[16:01:36] basically unresponsive
[16:01:48] <_joe_> uhm I hope it's not conftool2git :)
[16:01:55] <_joe_> it should reply very quickly
[16:02:06] <_joe_> Amir1: can you run it with --debug and see where it gets stuck?
[16:02:11] sure
[16:03:01] I ctrl+c'ed it and now it says nothing to commit
[16:03:16] <_joe_> uhm
[16:03:36] <_joe_> so I see conftool2git just got restarted
[16:03:43] let me try depooling something from s6
[16:04:06] <_joe_> Oct 07 15:55:53 puppetserver1003 systemd[1]: Stopping conftool2git.service - Conftool2git service...
[16:04:08] <_joe_> uh
[16:04:13] <_joe_> that is strange
[16:05:24] now it's fast
[16:05:44] <_joe_> Amir1: yeah, looks like someone stopped conftool2git on puppetserver1003
[16:05:49] <_joe_> don't get what or why
[16:06:19] <_joe_> exactly while you were doing your commit, but it should in theory time out after a few seconds
[16:06:24] <_joe_> I'll re-check that
[16:07:21] <_joe_> Amir1: when you ctrl+c'ed it, what was the traceback you got?
[16:08:10] let me try to find it
[16:09:25] https://phabricator.wikimedia.org/P69491
[16:13:28] sorry, puppetserver1003 was rebooted, I was adding additional RAM modules to it with valerie
[16:13:47] I had depooled it via the SRV records, but didn't anticipate conftool changes
[16:14:47] 1002 and 1003 are done, and 1001 is for another time, so it shouldn't happen again for now
[16:16:55] <_joe_> moritzm: heh, unfortunate timing on our part
[16:17:07] <_joe_> that would explain the problem - the connect timeout is too high
[16:18:17] <_joe_> I can fix that; the design of the system is to resist reboots without users noticing, with no real consequence besides loss of change granularity in the audit log
[16:18:50] <_joe_> and yes, the trace confirms it
[16:18:54] <_joe_> File "/usr/lib/python3.9/socket.py", line 831, in create_connection
[16:22:09] always happy to provide random chaosmonkey testing, then :-)
[16:23:39] :D
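
For context on the fix _joe_ describes above (a lower connect timeout so dbctl fails soft while the conftool2git host is rebooting), here is a minimal Python sketch. It is not the actual conftool2git or dbctl code: the address, port, payload and function name are hypothetical. The only call taken from the log is socket.create_connection, which is the frame shown in the traceback at P69491.

    # Sketch: best-effort delivery of an audit-log entry with a short connect
    # timeout, so a stopped/rebooting conftool2git only costs audit-log
    # granularity instead of hanging the dbctl commit.
    # Address, port and payload below are placeholders, not the real endpoint.

    import logging
    import socket

    CONFTOOL2GIT_ADDR = ("conftool2git.invalid", 9999)  # hypothetical endpoint
    CONNECT_TIMEOUT = 2.0  # seconds; keep low so a down server can't stall dbctl

    def send_audit_entry(payload: bytes) -> bool:
        """Try to ship one audit entry; never block the caller for long."""
        try:
            # The timeout applies to the connect() call itself, which is where
            # the pasted traceback shows the client stuck in create_connection.
            with socket.create_connection(CONFTOOL2GIT_ADDR, timeout=CONNECT_TIMEOUT) as sock:
                sock.sendall(payload)
            return True
        except OSError as exc:  # ConnectionRefusedError, socket.timeout, etc.
            logging.warning("conftool2git unreachable, skipping audit entry: %s", exc)
            return False

    if __name__ == "__main__":
        send_audit_entry(b"config commit: Depool for maint")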