[03:35:37] !log restarting varnish-frontend on cp5008
[03:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:57:52] Traffic, Operations: varnish-fe is flooding the text backend caching layer with backend probe requests - https://phabricator.wikimedia.org/T236754 (Vgutierrez)
[05:54:24] Traffic, Operations: Enforce POST size limit on ats-tls - https://phabricator.wikimedia.org/T236755 (Vgutierrez)
[07:44:52] Traffic, Operations: ats-tls shows a huge amount of ESTABLISHED sockets even when the server is depooled - https://phabricator.wikimedia.org/T236458 (Vgutierrez) Resolved→Open Reopening cause the issue hasn't been solved as it can be seen here: https://grafana.wikimedia.org/d/ivPJtZAWz/t236458?or...
[07:44:54] Traffic, Operations: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (Vgutierrez)
[08:14:57] netops, Operations: cr3-esams crash - https://phabricator.wikimedia.org/T236598 (ayounsi) > We have found matching PR1179822, Chassisd might crash if lo0 filter is configured without allowing communication between RE and VM-host on RE. As a result,the internal interfaces are incorrectly examined by lo0 f...
[08:24:01] netops, Operations: cr3-esams crash - https://phabricator.wikimedia.org/T236598 (ayounsi) All the interfaces are back up and cr3-esams is now reachable and in service. One issue persists, re0 can't reach re1: ` ayounsi@re0.cr3-esams# commit check warning: Could not connect to re1 : No route to host war...
[10:46:41] Traffic, Operations: varnish-fe is flooding the text backend caching layer with backend probe requests - https://phabricator.wikimedia.org/T236754 (ema) On cache_text we have a fairly significant number of VCL files stuck in the "auto/busy" state after having been discarded by our reload script. As an ex...
[10:47:16] Traffic, Operations: varnish-fe is flooding the text backend caching layer with backend probe requests - https://phabricator.wikimedia.org/T236754 (ema) p:Triage→Normal
[10:48:06] Traffic, Operations: Discarded VCL files stuck in auto/busy state cause high number of backend probe requests - https://phabricator.wikimedia.org/T236754 (ema)
[12:47:09] netops, Operations: Configure conditional advertizing in eqdfw and knams - https://phabricator.wikimedia.org/T236785 (ayounsi)
[13:13:08] Traffic, DC-Ops, Operations, ops-esams: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (BBlack)
[13:14:22] Traffic, DNS, Operations, ops-esams: rack/setup/install dns300[12] - https://phabricator.wikimedia.org/T236217 (BBlack)
[13:14:34] Traffic, DNS, Operations, ops-esams: rack/setup/install dns300[12] - https://phabricator.wikimedia.org/T236217 (BBlack) Open→Resolved
[13:15:19] Traffic, Operations, ops-esams: rack/setup/install lvs300[567] - https://phabricator.wikimedia.org/T236294 (BBlack)
[13:15:30] Traffic, Operations, ops-esams: rack/setup/install lvs300[567] - https://phabricator.wikimedia.org/T236294 (BBlack) Open→Resolved
[13:17:43] Traffic, Operations, ops-esams: cp3036 PS Redundancy Lost - https://phabricator.wikimedia.org/T202627 (BBlack)
[13:17:48] Traffic, DC-Ops, Operations, decommission: decommission cp3030-3049 - https://phabricator.wikimedia.org/T236454 (BBlack)
[13:18:02] Traffic, Operations, ops-esams: cp3032 PS Redundancy Lost - https://phabricator.wikimedia.org/T202046 (BBlack)
[13:18:05] Traffic, DC-Ops, Operations, decommission: decommission cp3030-3049 - https://phabricator.wikimedia.org/T236454 (BBlack)
[13:19:27] Traffic, DC-Ops, Operations, decommission: decommission cp3030-3049 - https://phabricator.wikimedia.org/T236454 (BBlack)
[13:22:00] Traffic, DC-Ops, Operations: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (BBlack) Open→Resolved
[13:24:15] Traffic, Operations, decommission, ops-esams: Decommission cp3007-cp3010 - https://phabricator.wikimedia.org/T208585 (BBlack)
[13:28:35] Traffic, Operations, ops-esams: rack/setup/install cp30[50-65].esams.wmnet - https://phabricator.wikimedia.org/T233242 (BBlack)
[13:30:11] Traffic, Operations, ops-esams: rack/setup/install cp30[50-65].esams.wmnet - https://phabricator.wikimedia.org/T233242 (BBlack) Open→Resolved As a batch these servers are complete in general. Note cp3056 had an early hardware issue that prevented progress, but this is tracked separately in:...
[14:44:14] Traffic, Operations, Wikidata, Wikidata-Query-Service, and 2 others: large number of 504 errors from ulsfo - https://phabricator.wikimedia.org/T236500 (Ottomata) a:ema Ema looks like you are working on this. Assigning to you as part of clinic duty. Feel free to resolve if done.
[14:45:44] Traffic, Operations, Wikidata, Wikidata-Query-Service, and 2 others: large number of 504 errors from ulsfo - https://phabricator.wikimedia.org/T236500 (ema) Open→Resolved It is done, yes. Thanks @ottomata!
[15:33:11] Traffic, Operations, Patch-For-Review: track NIC firmware version numbers across the fleet - https://phabricator.wikimedia.org/T236744 (Ottomata) p:Triage→Normal a:CDanis CDanis: assigning to you as part of clinic duty
[15:38:40] Traffic, MediaWiki-REST-API, Operations, Parsoid-PHP, and 3 others: Varnish/ATS should not decode URIs for /w/rest.php - https://phabricator.wikimedia.org/T235478 (Ottomata) a:ema Ema this looks done, feel free to resolve.
[15:40:57] bblack: firmware_version fact is rolling out across the fleet
[15:41:03] er, the sub-key in net_driver
[15:41:06] cdanis: awesome :)
[15:49:22] Traffic, Core Platform Team, MediaWiki-extensions-CentralAuth, Operations, and 5 others: Consistent HTTP 503 Error on some urls for some logged-in users (CentralAuth Set-Cookie storm) - https://phabricator.wikimedia.org/T226840 (Ottomata) It sounds like this particular problem is fixed. Was TMH...
[16:00:09] Traffic, MediaWiki-REST-API, Operations, Parsoid-PHP, and 3 others: Varnish/ATS should not decode URIs for /w/rest.php - https://phabricator.wikimedia.org/T235478 (eprodromou) Open→Resolved Tested; it's done
[16:36:57] Traffic, Operations, CPT Initiatives (Core REST API in PHP), Core Platform Team Workboards (Green), Patch-For-Review: Implement basic routing for rest.php - https://phabricator.wikimedia.org/T235779 (eprodromou) Open→Resolved a:eprodromou
[20:25:49] Traffic, Operations, decommission, ops-esams: Decommission cp3007-cp3010 - https://phabricator.wikimedia.org/T208585 (Papaul)
[20:26:14] Traffic, Operations, decommission, ops-esams: Decommission cp3007-cp3010 - https://phabricator.wikimedia.org/T208585 (Papaul) Open→Resolved a:Papaul Complete
[20:49:40] Traffic, DC-Ops, Operations, ops-esams: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (Papaul) @BBlack indeed I was getting an error on the PCIe card and I did removed/ insert it and was not getting the error anymore. Please try to re-image the server and let me know. Thanks.
[23:07:38] Traffic, DC-Ops, Operations, ops-esams: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (BBlack) I've tried imaging, and things mostly work, but I have a hard time keeping it online long enough to get through an initial puppet agent run (or two or three), as the kernel keeps pan...