[07:22:23] Traffic, Operations, Patch-For-Review, User-Elukey: prometheus-vhtcpd-stats cronspamming if vhtcpd is not running yet - https://phabricator.wikimedia.org/T157353#3015970 (elukey) Open>Resolved
[09:52:14] Traffic, Operations, Pybal, User-Joe: Pybal not happy with DNS delays - https://phabricator.wikimedia.org/T154759#2922958 (ema) Pybal not failing over to the next DNS server in resolv.conf has been mentioned in T83662 as well.
[09:57:09] Traffic, Operations, Pybal: Unhandled error stopping pybal: 'RunCommandMonitoringProtocol' object has no attribute 'checkCall' - https://phabricator.wikimedia.org/T157786#3016326 (ema)
[09:57:21] Traffic, Operations, Pybal: Unhandled error stopping pybal: 'RunCommandMonitoringProtocol' object has no attribute 'checkCall' - https://phabricator.wikimedia.org/T157786#3016339 (ema) p:Triage>Normal
[11:59:10] Traffic, Operations, Pybal: lvs servers report 'Memory allocation problem' on bootup - https://phabricator.wikimedia.org/T82849#3016610 (ema) Resolved>Open
[12:02:09] Traffic, Operations, Pybal: lvs servers report 'Memory allocation problem' on bootup - https://phabricator.wikimedia.org/T82849#3016616 (ema) This is still happening. @chasemp mentioned in T113597 that the error (from ipvsadm) can be reproduced referencing a pool that doesn't exist.
[12:16:13] Traffic, Monitoring, Operations: diamond crashing on hosts using systemd-timesyncd - https://phabricator.wikimedia.org/T157794#3016651 (ema)
[12:22:10] Traffic, Monitoring, Operations: diamond crashing on hosts using systemd-timesyncd - https://phabricator.wikimedia.org/T157794#3016652 (fgiunchedi) > Should we remove /usr/share/diamond/collectors/ntpd/ if systemd-timesyncd is in use? If that isn't too messy on the puppet level I think it'd make sen...
[12:29:53] netops, Analytics-Kanban, Operations: Review ACLs for the Analytics VLAN - https://phabricator.wikimedia.org/T157435#3016653 (elukey) Added kafka2003, fixed Archiva.
[12:36:30] netops, Analytics-Kanban, Operations: Review ACLs for the Analytics VLAN - https://phabricator.wikimedia.org/T157435#3016675 (elukey) Other batches: Fix logstash IPs: ``` set firewall family inet filter analytics-in4 term logstash from destination-address 10.64.0.122 set firewall family inet filter...
[13:21:26] netops, Analytics-Kanban, Operations: Review ACLs for the Analytics VLAN - https://phabricator.wikimedia.org/T157435#3016747 (elukey) Fixed logstash IPs, added install1002 (208.80.154.86/32) but not removed the other ones (for the moment).
[14:26:19] Traffic, Operations, Performance-Team, Reading-Web-Backlog, and 3 others: Performance review #2 of Hovercards (Popups extension) - https://phabricator.wikimedia.org/T70861#3016954 (ovasileva)
[15:33:12] Traffic, Operations, ops-eqiad: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891#3017160 (ema) ``` [Fri Feb 10 15:01:57 2017] bnx2x 0000:01:00.0 eth0: Warning: Unqualified SFP+ module detected, Port 0 from FINISAR CORP. part number FTLX1471D3BCL [Fri Feb 10...
[15:46:40] Traffic, Operations, Pybal: lvs servers report 'Memory allocation problem' on bootup - https://phabricator.wikimedia.org/T82849#906173 (BBlack) Yeah ipvsadm says "memory allocation problem" if you give it any kind of not-useful arguments (like delete a non-existent service, etc)
[15:49:17] <_joe_> ema: I have a couple of spare hours before my day ends, I might take a stab at rewriting the etcd connector for pybal
[15:49:39] <_joe_> ema: if you are already working on it, let me know, I'm happy to help
[15:50:30] _joe_: nice!
So far I've packaged 1.13.4 with the logging changes and upgraded lvs1007-12
[15:50:37] <_joe_> cool
[15:50:49] the plan was to keep an eye on them over the weekend and carry on with the others on Monday
[15:50:56] <_joe_> I really think we should go on to use the official python library with all its advantages
[15:51:04] <_joe_> via deferToThread
[15:51:16] makes sense, yeah
[15:51:20] <_joe_> I just have to understand how to pass the signals to such a child thread
[15:51:26] <_joe_> it's not immediately clear
[15:51:32] <_joe_> from the docs
[15:52:18] <_joe_> it still means we'll have long-running threads for every pool
[15:52:34] <_joe_> it might become unfeasible, let's test it anyways :)
[15:53:25] well we don't have that many pools anyways right?
[15:53:31] oh yeah, microservices! :P
[15:53:38] <_joe_> bblack: lvs1003/6 have quite a few
[15:54:57] <_joe_> speaking of microservices, it's time to activate my coffeoid!
[15:56:03] coffe OID, put some ASN.1 in your coffee
[15:57:44] let's do loadbalancing algorithms as a microservice
[15:57:54] then pybal and ipvs don't need all that messy logic inside them
[15:58:16] when a new client connection comes in, pybal makes an HTTPS microservice API call to ask the balancer-service where to route the request to
[15:58:38] of course, we'll need a farm of balancer-service nodes with pybal in front of it...
[16:01:03] <_joe_> I thought that was restbase
[16:06:27] netops, Operations: netops: switch all subnets to use install1002/2002 as DHCP - https://phabricator.wikimedia.org/T156109#3017241 (faidon) a:mark>faidon @Dzahn, change what specifically? DHCP relay? ACLs for TFTP? ACLs for webproxy, Ganglia etc.?
[16:07:51] netops, Operations: Add firewall exception to get to wdqs[12]003.(codfw|eqiad).wmnet:8888 from analytics cluster - https://phabricator.wikimedia.org/T157593#3017245 (faidon) Open>Resolved a:faidon Done!
[16:10:54] netops, Operations: netops: switch all subnets to use install1002/2002 as DHCP - https://phabricator.wikimedia.org/T156109#3017254 (Dzahn) @Faidon I would like the DHCP relay changed from install1001 to install1002 and from install2001 to install2002. TFTP/webproxy/Ganglia should not need changes (that...
[17:11:11] netops, Operations: netops: switch all subnets to use install1002/2002 as DHCP - https://phabricator.wikimedia.org/T156109#3017532 (faidon) Open>Resolved It actually required a bunch of ACL changes for TFTP/webproxy/Ganglia. All of these plus the DHCP relay have been adjusted now across eqiad/cod...
[17:13:41] https://censys.io/about
[17:15:59] https://zmap.io/
[17:19:47] netops, Operations: netops: switch all subnets to use install1002/2002 as DHCP - https://phabricator.wikimedia.org/T156109#3017550 (Dzahn) Thank you very much!