[08:56:52] Acme-chief, cloud-services-team (Kanban): tools/toolsbeta: improve acme-chief integration - https://phabricator.wikimedia.org/T252762 (aborrero) Open→Resolved a:Andrew Thanks! We can close this task now and reopen if required later.
[09:24:07] hello folks
[09:24:16] hi elukey
[09:24:34] as FYI I am about to merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/603366/ and run puppet on cp-text to switch backend for piwik.wikimedia.org :)
[09:24:49] elukey: can this wait a few minutes?
[09:24:50] err we have puppet disabled on cp-text
[09:24:51] :)
[09:25:19] ahahhaha
[09:25:25] yes sure, sorry
[09:25:47] few minutes == 5/10 or 30?
[09:26:06] (to understand if I have to stop the migration or not)
[09:26:10] elukey: 5/10
[09:26:18] thanks :)
[09:28:08] elukey: the reason for having puppet disabled is that yesterday we hit a memory leak upon ats config update which caused all sorts of fun to be had. Today we've merged another config change, applied it only on two hosts, and now waiting some minutes to see if stuff breaks
[09:29:30] the fact that you need to apply a change fleet-wide comes exactly at the right time: in case of troubles we have a scapegoat \o/
[09:29:52] snap I should have asked, my bad
[09:30:22] vgutierrez: 3052 looks fine. OK to re-enable puppet everywhere?
[09:30:52] yeah, let's go for it
[09:31:53] elukey: you're all set!
[09:32:08] \o/
[09:32:10] merging
[09:33:52] change is merged
[09:34:11] * ema watches https://w.wiki/Tjs
[09:34:14] can I go ahead and run puppet?
[09:34:17] elukey: y
[09:34:59] elukey: you're exposing yourself too much, now if anything happens to ats with the new config will clearly be your fault :-P
[09:35:15] volans: ssshhhhh
[09:35:41] I know! I'll in turn blame matomo
[09:36:16] running on cumin1001 with -b 4
[09:36:21] ack
[09:46:30] done!
[09:49:03] elukey: excellent!
[09:51:53] hi, I have https://gerrit.wikimedia.org/r/c/operations/puppet/+/604613 to add two services to lvs eqiad low-traffic, ok to go ahead ?
[09:53:56] you're so lucky today... moar scapegoats :D
[09:54:32] haha! glad to help
[10:02:04] nice
[10:02:30] ema: we can ask them to fix some ATS "feature" every time that they need our services
[10:03:34] hello, you are doing maintenance on cp-text?
[10:03:41] i would like to run puppet on all of them
[10:03:47] to fix a mistake i made
[10:03:50] Traffic, Operations: ATS memory leak upon removing healthchecks.so from configuration - https://phabricator.wikimedia.org/T255120 (ema)
[10:04:13] mutante: I think so, but let's wait for ema
[10:04:14] I am not currently doing any work on cp-text, and elukey is done
[10:04:19] :)
[10:04:21] there you go
[10:04:33] +1!
[10:04:47] mistake i made was due to a bad rebase: https://gerrit.wikimedia.org/r/c/operations/puppet/+/599323/3/hieradata/common/profile/trafficserver/backend.yaml
[10:04:53] it added a second backend for piwik
[10:04:57] thanks all
[10:05:15] hmm that broke puppet?
[10:05:16] i am not sure if it tries to loadbalance between them like that
[10:05:25] or triggered an undesired behaviour?
[10:05:28] no, did not break puppet
[10:05:46] it's just that it is supposed to server only from matomo1002
[10:05:50] that's.. interesting
[10:05:58] elukey just switched it from 1001 to 1002 before
[10:06:13] yeah, I'm wondering what's puppet doing there
[10:06:16] then i made an unrelated change about planet
[10:06:18] it really is, you'd think that something at some point will complain if a hash in yaml has the same key twice
[10:06:40] i merged the fix https://gerrit.wikimedia.org/r/c/operations/puppet/+/604646/2/hieradata/common/profile/trafficserver/backend.yaml
[10:06:46] but have not run puppet via cumin yet
[10:06:50] cause the replacement parameter is a String
[10:06:52] should i do that now?
[10:06:58] not an Array[String]
[10:07:17] mutante: trigger a puppet run on cp-text please :)
[10:07:38] w/in 4
[10:07:43] not bad!
[10:07:45] vgutierrez: started!
[10:07:54] ema: msg leak? :)
[10:08:03] :)
[10:09:18] -map http://piwik.wikimedia.org https://matomo1002.eqiad.wmnet
[10:09:18] +map http://piwik.wikimedia.org https://matomo1001.eqiad.wmnet
[10:09:23] this was from a puppet run , btw
[10:09:44] remap.config just got the second one
[10:09:50] the previous one triggered by your rebase I guess
[10:10:11] i mean the manual puppet run i did before i merged the fix
[10:10:17] on a single machine
[10:10:28] that's when i noticed the mistake
[10:11:27] mutante: the example that you posted above is for the new puppet run or the old?
[10:11:42] for the old
[10:11:51] super just wanted to check :)
[10:11:57] it was to show what puppet did when it gets 2 lines at once
[10:12:10] it just used the second one
[10:13:11] ack ack
[10:13:17] right now, on cp1075, in /etc/trafficserver/remap.config there is 1002 and only that
[10:13:25] and the puppet run finished now
[10:13:36] on all of cp-text
[10:13:46] cool
[10:14:08] I am checking traffic on matomo1001, looks good so far
[10:14:30] 1001? as in no traffic?
[10:14:31] ;P
[10:14:57] Traffic, Operations: ATS memory leak upon removing healthchecks.so from configuration - https://phabricator.wikimedia.org/T255120 (ema)
[10:23:42] Traffic, Operations: ATS memory leak upon removing healthchecks.so from configuration - https://phabricator.wikimedia.org/T255120 (ema) p:Triage→Medium
[10:26:10] going ahead with https://gerrit.wikimedia.org/r/c/operations/puppet/+/604613 shortly, LMK otherwise!
[10:26:55] hmmm
[10:27:33] mutante: it looks like planet.wm.org is broken cause the servers don't have planet.wm.o as a valid SNI on the internal cert (the one used for planet.discovery.wmnet)
[10:27:54] godog: ack :D
[10:28:51] vgutierrez: i thought i had checked just that.. but no worries, it was broken before and i can fix it. the purpose of that change was to avoid the error message
[10:29:22] it was like this (when you dont specify a language prefix) all the time..that's what i wanted to avoid
[10:30:15] oh right, i had checked the unified cert but not the discovery cert.. yea, not a problem to fix
[10:40:57] vgutierrez: it turns out our good old icinga also is using "Host:varnishcheck": https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604653/
[10:41:42] interesting
[10:43:53] Traffic, Operations, Patch-For-Review: Varnish and ATS health-check improvements - https://phabricator.wikimedia.org/T255015 (ema)
[10:44:14] updating the ticket to add this new discovery :)
[10:47:30] Traffic, Operations, Patch-For-Review: Varnish and ATS health-check improvements - https://phabricator.wikimedia.org/T255015 (ema)
[12:22:20] might be interesting to this audience https://www.potaroo.net/ispcol/2020-05/futuretech.html
[12:53:56] Traffic, Operations, Patch-For-Review: Varnish and ATS health-check improvements - https://phabricator.wikimedia.org/T255015 (ema)
[13:28:36] vgutierrez: merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604364/ with puppet disabled on A:cp
[13:28:42] ack
[13:38:51] things look good on cp3050, icinga[12]001 now updated all checks definitions for all hosts to use varnishcheck.wikimedia.org, re-enabling puppet
[13:39:34] \o/
[13:40:17] Traffic, Operations, Patch-For-Review: Varnish and ATS health-check improvements - https://phabricator.wikimedia.org/T255015 (ema)
[13:45:39] Domains, Traffic, Operations: wikibase.org should redirect to wikiba.se - https://phabricator.wikimedia.org/T254957 (jbond) p:Triage→Medium
[14:26:30] Domains, Traffic, Operations: wikibase.org should redirect to wikiba.se - https://phabricator.wikimedia.org/T254957 (jbond) Currently wikiba.se is not managed by wikimedia foundation, we don't have control over the hosting environment or management of the DNS. wikibase.org is a Wikimedia foundation...
[14:27:53] Domains, Traffic, Operations: wikibase.org should redirect to wikiba.se - https://phabricator.wikimedia.org/T254957 (Dzahn) For the reason why wikiba.se is not under the control of the WMF you can read history on T99531.
[17:01:03] Traffic, Core Platform Team, Operations, Patch-For-Review: Configure purged in deployment-prep - https://phabricator.wikimedia.org/T254844 (ema) I have cherry-picked https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604743/ on deployment-puppetmaster04 and added `profile::cache::purge::kafka...
[17:06:47] Traffic, Continuous-Integration-Infrastructure, Operations: Caching of https://doc.wikimedia.org/cover/mediawiki-libs-IPUtils/IPUtils.php.html is inconsistent - https://phabricator.wikimedia.org/T252131 (Jdforrester-WMF) Open→Declined Working as expected.
[17:20:55] Traffic, Core Platform Team, Operations, Patch-For-Review: Configure purged in deployment-prep - https://phabricator.wikimedia.org/T254844 (ema) >>! In T254844#6216088, @ema wrote: > However, by looking at the actual PURGE requests generated by purged, it seems that we're only sending both kafka...
[19:04:55] netops, Operations, fundraising-tech-ops, WMF-NDA: Deploy pfw policy 1591901800 for T122104 - https://phabricator.wikimedia.org/T255185 (Dwisehaupt)