[05:54:56] https://www.youtube.com/playlist?list=PLbRoZ5Rrl5lcxarVFj9qeOOt9lO8stUxT SREcon APAC 2019, some interesting talks
[09:01:54] FYI https://gerrit.wikimedia.org/r/c/operations/puppet/+/523891 varnishreqstats deprecation, also removing alerts on absolute number of 5xx, PTAL
[09:56:11] jynus: there is a change staged from you, 'Remove sarin/neodymium from grant/mysql root hosts'; is it ok to merge?
[09:56:19] yes
[09:56:26] it is a noop to infra
[09:56:28] sorry
[09:56:39] np, it's going now
[10:06:23] do you know if we have a puppetdb in wmcs? context is https://phabricator.wikimedia.org/T228395
[10:07:34] godog: there are some within projects that have a standalone puppetmaster, AFAIK not a global one
[10:08:09] I'm sure the automation-framework project has it, and probably deployment-prep
[10:08:17] yeah, there's no puppetdb in WMCS in general
[10:08:57] thank you! did I get it right that a standalone puppetmaster nowadays gets a puppetdb too (by default?)
[10:09:28] I'm checking on horizon if I still have a standalone puppet master
[10:09:32] no, not by default AFAIK
[10:09:44] at least I have it on separate VMs
[10:09:58] hi all, i just hit one of the issues people get when running puppet 4.8 with ruby > 2.3. the fix is https://phabricator.wikimedia.org/P8772
[10:10:00] and last time I checked our puppetization requires 2 hosts (postgres master/slave)
[10:13:34] thanks, sounds like a bit more work than I was willing to invest in fixing the bug above right away
[10:13:40] at least for now, that is
[10:16:02] godog: from a (not so quick) cumin query on labpuppetmaster1001 I got 6 hosts with puppetdb in the name:
[10:16:05] af-puppetdb[01-02].automation-framework.eqiad.wmflabs,deployment-puppetdb02.deployment-prep.eqiad.wmflabs,jeh-puppetdb.testlabs.eqiad.wmflabs,tools-puppetdb-01.tools.eqiad.wmflabs,toolsbeta-puppetdb-01.toolsbeta.eqiad.wmflabs
[10:16:10] but people can be creative with names ;)
[10:16:47] 'name:puppetdb' is the query btw
[10:18:13] oh! thanks volans, worth a test on the deployment one
[10:18:21] err, deployment-prep
[10:21:24] heheh, of course it's not reproducible in there, good times
[10:21:56] ok, guesswork will have to do
[10:29:23] godog: i have added a list of servers with puppetdb installed to that task (had to look this up recently)
[10:30:25] jbond42: thanks! currently can't reproduce on the deployment-prep one, though I'm confident the review in the task will fix it
[10:33:50] ack, i can take a look at the change in a bit but will need to refresh myself on kmx first
[10:34:09] s/kmx/jmx/
[10:41:46] jbond42: thanks, sounds good, also no rush really
[10:42:11] ack
[11:17:27] fyi all, i just updated the puppet_coding guide to reference lookup instead of hiera https://wikitech.wikimedia.org/w/index.php?title=Puppet_coding&type=revision&diff=1832880&oldid=1832313
[11:26:21] jbond42: thanks! This is a perfect example of info I'd like to easily find when looking at "what happened during the last week" (re: discussion from yesterday's meeting, cc paravoid) :-)
[12:09:25] jbond42: so if
[12:09:40] "is frowned upon, but is still commonly used"
[12:09:48] what should we do to use fallback values?
[12:15:10] jijiki: hiera; now that the structure has been reorganised it should not cause issues storing defaults in labs.yaml and under common in hiera
[12:16:21] jijiki: it's worth saying that this has always been the policy, i have just s/hiera/lookup/
[12:29:24] ok I see
[12:29:28] thank you!
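For context on the lookup-vs-hiera exchange above, here is a minimal Puppet sketch of the convention being described; the class, parameter, and key names are invented for illustration and not from the actual repo. The fallback value lives in Hiera data (common.yaml, or labs.yaml for WMCS) rather than as an in-code default, and the manifest uses lookup() instead of hiera():

```puppet
# Hypothetical example: the default for profile::demo::retries is
# stored in hieradata/common.yaml (or labs.yaml for WMCS) as
#
#   profile::demo::retries: 3
#
# and the class fetches it with lookup(), with no hardcoded
# fallback in the manifest itself.
class profile::demo (
    Integer $retries = lookup('profile::demo::retries'),
) {
    notify { "configured retries: ${retries}": }
}
```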
[12:38:39] for those interested in wmcs/ops puppet evolution you may want to follow this ticket https://phabricator.wikimedia.org/T227029
[17:10:14] some easy +1 https://gerrit.wikimedia.org/r/c/operations/puppet/+/524277
[17:28:19] done. just check that the icinga config is ok after merge
[17:35:32] mw2250 - Unable to run wmf-auto-reimage-host - REIMAG END | retcode=2 (Still waiting for Puppet after 115.0 minutes)
[17:36:06] :( so also for appservers it doesn't just work out of the box with a host that currently has the role applied and just gets reimaged
[17:37:12] mutante: when did you reimage it?
[17:37:12] looking at what the puppet issue is.. oh. it's supposedly already/still running
[17:37:38] volans: started the command in screen last night (PST), checked the result just now
[17:37:41] I see all runs failed since the first one at 2019-07-18 13:07:02.667000+00:00
[17:37:47] https://puppetboard.wikimedia.org/node/mw2250.codfw.wmnet
[17:37:54] (puppet runs)
[17:38:06] also related to scap
[17:38:11] like the other one a few days ago
[17:38:21] '/usr/bin/scap pull' returned 1 instead of one of [0]
[17:38:33] oh. that is the bug i reported yesterday, lol
[17:38:46] it all comes back together :p
[17:39:08] yea, different puppet roles but both use scap
[17:39:16] and scap pull fails because mwscript doesn't exist
[17:39:33] bad order of deps?
[17:39:36] volans: https://phabricator.wikimedia.org/T228328
[17:40:30] yea, so it needs a deployment to this host from the deployment server
[17:40:56] I'd say it needs a fix in puppet
[17:41:02] it must work at the first puppet run
[17:41:18] or our reimage workflow is totally broken and no longer automated
[17:41:22] it probably usually does in this case
[17:41:49] if scap wasn't broken. but yes
[17:42:10] https://gerrit.wikimedia.org/r/523999
[17:42:58] the pull fails because of "Command '/usr/local/bin/mwscript extensions/WikimediaMaintenance/refreshMessageBlobs.php' returned non-zero exit status 1"
[17:43:09] so Tyler was going to remove that part from scap pull
[17:44:35] mwscript is not installed on all appservers (anymore)
[17:46:46] well, i have the fresh OS and we can use the server again after maintenance. it's alright
[17:48:20] i can wait to see how it is after that eventually gets merged
[17:51:15] volans: yea, confirmed the puppet error is really just because of the "scap pull" failure, nothing else. so usually the workflow is intact for appservers, except for the scap bug that is being worked on. the last case was different as it had a true puppet issue that needs fixing. different role
[17:54:22] ack
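As a rough illustration of the failure mode discussed above: a hypothetical Puppet sketch of the guard pattern, not the actual change under review (which removes the refreshMessageBlobs step from scap pull instead). The idea is to make the mwscript call conditional, so the first agent run on a freshly reimaged appserver can succeed before any MediaWiki code has been deployed:

```puppet
# Hypothetical sketch, not the real fix: only run the message-blob
# refresh when mwscript actually exists on the host, so a first
# puppet run after a reimage does not fail on hosts where MediaWiki
# has not been deployed yet.
exec { 'refresh-message-blobs':
    command => '/usr/local/bin/mwscript extensions/WikimediaMaintenance/refreshMessageBlobs.php',
    onlyif  => '/usr/bin/test -x /usr/local/bin/mwscript',
}
```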