[06:09:09] Traffic, Operations, Pybal: Alerts on LVS services with one single realserver - https://phabricator.wikimedia.org/T177815#3671128 (ema)
[06:09:19] Traffic, Operations, Pybal: Alerts on LVS services with one single realserver - https://phabricator.wikimedia.org/T177815#3671140 (ema) p:Triage>Normal
[06:58:08] Traffic, Operations, Pybal: Alerts on LVS services with one single realserver - https://phabricator.wikimedia.org/T177815#3671160 (Joe) I would suggest we need to add a condition to the alert so that it gets skipped when the pool size is one backend only.
[09:39:57] netops, Operations: Allow syslog (-tls) from both wezen and lithium in labs - https://phabricator.wikimedia.org/T177820#3671443 (fgiunchedi)
[10:15:31] netops, Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671561 (elukey)
[11:18:49] netops, Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3661358 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['druid1004.eqiad.wmnet'] ``` The log can be found...
[11:18:59] netops, Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671633 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['druid1004.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['druid1004.eqiad.wmnet'] ```
[14:06:59] so pcc looks happy about the v5 patch now: https://puppet-compiler.wmflabs.org/compiler02/8268/
[14:07:09] bblack: ok to merge or do you have further comments? :)
[14:57:29] ema: seems ok to me
[15:08:07] ok, merged
[15:11:14] Traffic, Analytics-Kanban, Operations, Patch-For-Review: Invalid "wikimedia" family in unique devices data due to misplaced WMF-Last-Access-Global cookie - https://phabricator.wikimedia.org/T174640#3672320 (Nuria)
[15:15:14] mmh looks like there's no updated version of libvmod-netmapper in jessie-wikimedia/experimental
[15:17:34] I thought they were both uploaded last week, maybe not!
[15:18:49] yeah, built on copper but not uploaded apparently
[15:23:45] say hello to varnish 5 running on pinkunicorn :)
[15:24:20] is varnishkafka happy?
[15:24:30] haven't tested it yet and I am curious :)
[15:24:44] elukey: it's running
[15:24:53] I'm not sure whether it's also happy though
[15:25:39] that's a good starting point :)
[15:26:11] indeed!
[15:38:02] > VCL names are restricted to alphanumeric characters, dashes (-) and underscores (_). In addition, the first character should be alphabetic. That is, the name should match "[A-Za-z][A-Za-z0-9_-]*"
[15:38:20] so yeah, reload-vcl needs to be patched, we currently use ':' in the VCL name
[15:43:55] I did upload libvmod-netmapper
[15:44:40] ema: but maybe with a wrong version number? the reprepro email says "replace" not add
[15:45:19] XioNoX: we currently still have 1.4-1 on apt.w.o, maybe try again?
[15:46:07] XioNoX: something like this
[15:46:18] reprepro -C experimental include jessie-wikimedia libvmod-netmapper-blabla.changes
[15:50:17] ema: .changes put in a distribution not listed within it!
[15:50:17] To ignore use --ignore=wrongdistribution.
[15:50:35] the .changes says Distribution: unstable
[15:51:18] XioNoX: sure, --ignore=wrongdistribution
[15:52:07] ema: Skipping inclusion of 'libvmod-netmapper' '1.5-1' in 'jessie-wikimedia|experimental|source', as it has already '1.5-1'.
[15:52:43] reprepro changes:
[15:52:43] replace jessie-wikimedia deb experimental amd64 libvmod-netmapper-dbg 1.5-1 1.4-1 -- pool/experimental/libv/libvmod-netmapper/libvmod-netmapper-dbg_1.5-1_amd64.deb -- pool/experimental/libv/libvmod-netmapper/libvmod-netmapper-dbg_1.4-1_amd64.deb
[15:52:44] XioNoX: and now indeed it's there
[15:52:51] ah, cool!
[15:52:55] thanks!
[15:53:01] thank you!
[15:53:25] ema: so max_connections seems to have fixed up the text/esams 503s?
[15:53:35] bblack: yes
[15:55:00] bblack: see eg https://grafana.wikimedia.org/dashboard/db/varnish-failed-fetches?orgId=1&from=1507605054947&to=1507627260472
[15:55:06] Traffic, Operations, Patch-For-Review: varnish backends start returning 503s after ~6 days uptime - https://phabricator.wikimedia.org/T145661#3672450 (BBlack) Open>Resolved a:BBlack The cache admission policy change seems to have gotten us over this for now. We should probably wait for t...
[15:55:45] Traffic, netops, Operations, ops-eqiad: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3672458 (Cmjohnson) Swapped the NIC card with a new one that HP sent.
[15:55:47] ema: right, we might want to eventually investigate the cause of those connection spikes, but at least it doesn't harm other traffic for now
[15:56:04] gonna close up the other 503 tickets so we can start fresher on future problems
[15:56:07] \o/
[15:56:15] Traffic, Operations, Patch-For-Review: varnish backends start returning 503s after ~6 days uptime - https://phabricator.wikimedia.org/T145661#3672461 (BBlack)
[15:56:19] Traffic, Operations, Patch-For-Review: Recurrent 'mailbox lag' critical alerts and 500s - https://phabricator.wikimedia.org/T174932#3672459 (BBlack) Open>Resolved a:BBlack
[15:56:43] Traffic, Operations, Patch-For-Review: Text eqiad varnish 503 spikes - https://phabricator.wikimedia.org/T175803#3672462 (BBlack) Open>Resolved a:BBlack ^ The above seems to have resolved the esams-specific 503s. Closing this up!
[16:24:18] Traffic, netops, Operations, Pybal, Patch-For-Review: Deploy pybal with BGP MED support (for primary/backup) in production - https://phabricator.wikimedia.org/T165584#3672568 (ayounsi) We need to cleanup this specific term, now that the LVS advertise the MED themselves. ```delete policy-optio...
[16:55:06] netops, Operations, fundraising-tech-ops, Patch-For-Review: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3672724 (Cmjohnson)
[18:29:00] netops, Operations, Patch-For-Review: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840#3673052 (ayounsi) After depooling esams the Telia link in eqiad started to saturate, I added the following terms to temporary ease out that link. ``` [edit policy-options as-path-group AVOID-P...
[21:58:53] netops, Operations, Patch-For-Review: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840#3674000 (ayounsi) Temporary AVOID-PATHS removed on cr2-eqiad. The maintenance is now completed, some notes: - It was not clear that the plan included removing OSPF on the trans-atlantic link...
[21:59:03] netops, Operations, Patch-For-Review: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840#3674001 (ayounsi) a:ayounsi
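Editor's note on the VCL naming rule quoted at 15:38:02: the pattern in that message can be checked mechanically. The following is a minimal sketch only, not the actual reload-vcl patch. The function names, the `vcl-` prefix, and the choice of `-` as the replacement character are all hypothetical and chosen for illustration; only the regular expression itself comes from the log.

```python
import re

# Pattern copied verbatim from the message quoted at 15:38:02.
VCL_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_valid_vcl_name(name: str) -> bool:
    """Return True if the whole name matches the quoted pattern."""
    return VCL_NAME_RE.fullmatch(name) is not None


def sanitize_vcl_name(name: str) -> str:
    """Hypothetical helper: map a name onto the allowed character set.

    Every disallowed character (such as ':') is replaced with '-'.
    If the result does not start with a letter, an illustrative
    'vcl-' prefix is added. Neither choice comes from the log.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "-", name)
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "vcl-" + cleaned
    return cleaned


if __name__ == "__main__":
    # Made-up example names; the real reload-vcl naming scheme is not shown in the log.
    for candidate in ("vcl:abc", "vcl-abc_1", "1abc"):
        print(candidate, is_valid_vcl_name(candidate), sanitize_vcl_name(candidate))
```

A name containing ':' fails the check, which is consistent with the remark at 15:38:20 that reload-vcl needs patching.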