[07:09:55] Traffic, Operations, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3663502 (Ladsgroup) >>! In T99531#3663063, @Dzahn wrote: > Next we should figure out: > > - who should have Gerrit permissions for +2/mer...
[07:16:23] setting max_connections to 0 on cp3032's backend stopped the 503 spike this morning
[07:17:18] I've now done that on all text nodes https://github.com/wikimedia/puppet/commit/0506427952a0f19d68a935f5e3b10785ef44b539
[07:30:06] Traffic, Operations, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3663517 (Ladsgroup) wikibase.eqiad.wmflabs is ready for test (hasn't applied the puppet roles yet). Added the hiera var in https://wikitech...
[07:58:20] Traffic, Operations, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3663637 (Lydia_Pintscher) >>! In T99531#3663063, @Dzahn wrote: > - decide whether in puppet it should be "ensure => latest" (means merging...
[08:23:04] Traffic, Operations, Pybal, monitoring, Patch-For-Review: pybal: add prometheus metrics - https://phabricator.wikimedia.org/T171710#3663686 (ema) Open>Resolved a:ema PyBal 1.14.0, currently in prod, includes prometheus metrics. Closing.
[08:44:50] <_joe_> ema: I embarked on a fool's errand
[08:46:48] :)
[08:47:51] _joe_: so yeah I think the idea behind the too small alert is right
[08:47:57] total < (total * crd.lvsservice.getDepoolThreshold() + 1)
[08:48:07] however canDepool in pybal is different:
[08:48:12] return len(self.servers) - len(downServers) >= len(self.servers) * self.lvsservice.getDepoolThreshold()
[08:49:25] also, the alert says 'warning' but we raise an icinga critical
[08:51:44] <_joe_> ema: yeah that's the way our icinga check for pybal works, meh
[08:55:56] _joe_: this should be enough, right? https://gerrit.wikimedia.org/r/382667
[08:57:19] <_joe_> ema: once gerrit loads, my connection is dodgy :P
[08:58:25] Traffic, Operations, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3663768 (Addshore) >>! In T99531#3663637, @Lydia_Pintscher wrote: > Let's stick with ensure => latest. Having to find the person who can ac...
[09:26:28] mmh now we've got some UNKNOWNs from pybal backends health check in icinga
[09:26:37] I'll check that later today
[11:08:13] ema: what is that based on?
[11:08:30] (should we expose a 'healthy' status, as determined by pybal, to prometheus?)
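(Editor's note: the two expressions quoted at 08:47:57 and 08:48:12 can be compared side by side with a minimal sketch. The function names `too_small_alert` and `can_depool` and their plain-integer parameters are hypothetical stand-ins for illustration; only the two comparison expressions come from the chat, and this is not PyBal's actual code.)

```python
def too_small_alert(total: int, depool_threshold: float) -> bool:
    """Alert expression as quoted in the chat (hypothetical wrapper).

    Mirrors: total < (total * getDepoolThreshold() + 1)
    """
    return total < (total * depool_threshold + 1)


def can_depool(total: int, down: int, depool_threshold: float) -> bool:
    """canDepool expression as quoted in the chat (hypothetical wrapper).

    Mirrors: len(servers) - len(downServers) >= len(servers) * getDepoolThreshold()
    Here `total` stands in for len(self.servers) and `down` for len(downServers).
    """
    return total - down >= total * depool_threshold


if __name__ == "__main__":
    # Assumed example values: 10 servers, threshold 0.5.
    for down in range(0, 8):
        print(down, can_depool(10, down, 0.5))
    print("alert:", too_small_alert(10, 0.5))
```

The sketch makes the difference visible: the alert expression depends only on the pool size and the threshold, while the canDepool expression also takes the number of down servers into account.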
[12:12:16] mark: that was based on sad_trombone.wav :) https://gerrit.wikimedia.org/r/#/c/382676/
[12:21:26] bblack: so this morning cp3032 was affected by the failed fetches issue
[12:21:47] instead of restarting the backend, I've set max_connections to 0 by hand and reloaded the VCL
[12:22:16] backend connections went up to ~11k and then it recovered
[12:22:25] https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?orgId=1&var-server=cp3032&var-datasource=esams%20prometheus%2Fops&from=1507264202294&to=1507276726644&panelId=16&fullscreen
[12:23:21] see how the number of backend connections flattened at 8k, then spiked up and went back to normal
[12:35:09] so yeah I've set max_connections to 0 on text
[12:38:20] mark: how did prometheus take the stress btw? :)
[12:41:41] it churned through
[12:43:47] nice, IIRC there's also a query timeout somewhere
[12:51:31] Traffic, Operations, Goal, User-fgiunchedi: Add Prometheus client support for varnish/statsd metrics daemons - https://phabricator.wikimedia.org/T177199#3664279 (ema) p:Triage>Normal
[13:01:59] Traffic, Operations, Goal, User-fgiunchedi: Add Prometheus client support for varnish/statsd metrics daemons - https://phabricator.wikimedia.org/T177199#3664281 (fgiunchedi) IMO we could approach the problem of getting the stats above to Prometheus in at least two ways: 1. Import the Prometheus...
[13:16:59] ema: awesome
[13:22:05] netops, Analytics, Operations, User-Elukey: Review ACLs for the Analytics VLAN - https://phabricator.wikimedia.org/T157435#3664332 (elukey) Open>stalled
[14:17:50] Domains, Traffic, Wikimedia-Apache-configuration, AbuseFilter, and 11 others: Create a fibragratis.org with RequestFTTH extension - https://phabricator.wikimedia.org/T177606#3664455 (Joaquinito2018)
[14:27:34] Domains, Traffic, Wikimedia-Apache-configuration, AbuseFilter, and 12 others: Create a fibragratis.org with RequestFTTH extension - https://phabricator.wikimedia.org/T177606#3664533 (12345) a:12345
[14:28:17] Domains, Traffic, Wikimedia-Apache-configuration, AbuseFilter, and 14 others: Create a fibragratis.org with RequestFTTH extension - https://phabricator.wikimedia.org/T177606#3664537 (Joaquinito2018)
[14:38:11] Traffic, ApiFeatureUsage, Collaboration-Team-Triage, MediaWiki-API, and 8 others: Create a fibragratis.org with RequestFTTH extension - https://phabricator.wikimedia.org/T177606#3664586 (Joaquinito2018) Invalid>Open Register to byet.host.
[14:38:37] Traffic, ApiFeatureUsage, Collaboration-Team-Triage, MediaWiki-API, and 8 others: Create a fibragratis.org with RequestFTTH extension - https://phabricator.wikimedia.org/T177606#3664590 (Joaquinito2018) With free hosting, and free subdomain.
[14:41:43] Traffic, ApiFeatureUsage, Collaboration-Team-Triage, MediaWiki-API, and 7 others: Create a fibragratis.org with RequestFTTH extension - https://phabricator.wikimedia.org/T177606#3664611 (Joaquinito2018)
[14:54:46] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3664636 (elukey) One thing that I noticed now from https://github.com/varnishcache/varnish-cache/blob/master/doc/changes.rst is the following: ``` Varnish Cache 5.2-R...
[15:01:43] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3664657 (BBlack) We're moving to 5.1.3 with this upgrade. 5.2.0 is a little too bleeding-edge for now :)
[15:06:17] bblack: thanks, just wanted to make sure that we didn't jump to 5.2 for some reason in the nearish future :)
[15:06:38] I'll test varnishkafka with 5.1.3 and report back, that version should be fine afaics
[15:20:05] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3664709 (ema) >>! In T168529#3664636, @elukey wrote: > all the other python daemons will need to get reviewed (already seeing commits for 5.2 in https://github.com/xci...
[15:30:55] netops, Operations: Implement RPKI (Resource Public Key Infrastructure) - https://phabricator.wikimedia.org/T61115#3664762 (ayounsi) Open>Resolved We're all done here!
[15:52:46] <_joe_> so, my refactor is not breaking anything, apparently https://puppet-compiler.wmflabs.org/compiler02/8218/
[15:52:59] <_joe_> but I'll work on it a bit more on monday
[15:53:11] <_joe_> I'm leaving y'all now for the weekend
[15:59:20] thanks _joe_ :)
[16:09:16] _joe_: you're a hero! <3
[16:10:20] the weekend is starting right now for me too o/
[18:38:10] my bouncer rebooted for $unknown_raisins :P
[18:49:56] Damn raisins!
[21:23:45] always hysterical