[05:19:20] netops, Operations: cr1-codfw:fpc0 failure - https://phabricator.wikimedia.org/T254110 (ayounsi) Open→Resolved The FPC is up and healthy. Interfaces are up as well. Netbox updated with the new serial#.
[05:27:47] fyi, going to depool esams soon https://gerrit.wikimedia.org/r/601951
[06:50:50] following along on https://grafana.wikimedia.org/d/myRmf1Pik/varnish-aggregate-client-status-codes?orgId=1
[06:57:17] esams is still getting ~1.7K rps from bots and other noise (nothing to worry about, just interesting to see which User-Agents don't follow DNS updates)
[07:03:08] no user impact should be expected, the depool is a "just in case"
[07:03:49] XioNoX: useful data though, WikipediaApp/6.5.1.1705 also isn't following DNS apparently
[07:04:07] ah yeah, that's not good :)
[07:04:19] nope! Will file a task shortly
[07:04:20] if we're talking about the official one
[07:18:43] Traffic, Analytics, Operations: Compare logs produced by atskfafka with those produced by varnishkafka - https://phabricator.wikimedia.org/T254317 (ema)
[07:18:51] Traffic, Analytics, Operations: Compare logs produced by atskfafka with those produced by varnishkafka - https://phabricator.wikimedia.org/T254317 (ema) p:Triage→Medium
[07:19:07] Traffic, Analytics, Operations: Compare logs produced by atskfafka with those produced by varnishkafka - https://phabricator.wikimedia.org/T254317 (ema)
[07:19:19] netops, Operations, ops-esams, Patch-For-Review: Amsterdam maintenance (June 2020) - https://phabricator.wikimedia.org/T254021 (ayounsi)
[07:22:52] Traffic, Analytics, Operations: Compare logs produced by atskfafka with those produced by varnishkafka - https://phabricator.wikimedia.org/T254317 (ema)
[07:24:18] netops, Operations, ops-esams, Patch-For-Review: Amsterdam maintenance (June 2020) - https://phabricator.wikimedia.org/T254021 (ayounsi) Open→Resolved Everything got done smoothly, no user impact.
T253970 and T244497 are still not solved. T245520 is solved. I also used the opportunity t...
[07:25:36] netops, Operations, Patch-For-Review: Configure management-instance on routers with Junos > 17.3 - https://phabricator.wikimedia.org/T247073 (ayounsi) Applied to cr2-esams and cr3-esams. Confirmed mgmt is still reachable.
[07:26:59] netops, Operations, ops-esams: 2*10G optics down on cr2-esams - https://phabricator.wikimedia.org/T245520 (ayounsi) Open→Resolved Links are back up after a reboot.
[07:32:22] netops, Operations: No LACP info for cr2-esams:ae2 - https://phabricator.wikimedia.org/T253970 (ayounsi) The reboot didn't help.
[07:32:54] netops, Operations: cr3-knams:xe-0/1/3 down - https://phabricator.wikimedia.org/T244497 (ayounsi) The upgrade and reboot of cr3-knams didn't help.
[07:37:37] netops, Operations: cr3-knams:xe-0/1/3 down - https://phabricator.wikimedia.org/T244497 (ayounsi) a:faidon→ayounsi
[08:20:15] shdubsh: hi! I see that puppet has been disabled on cp5011 for a while now for the mtail work, can we re-enable it?
[08:21:26] we rely on puppet running to distribute OCSP stapling data and the wikiworkshop cert itself
[08:21:37] not super urgent as the checks are still happy but still :)
[09:40:18] netops, Analytics, Operations: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (faidon)
[10:17:05] Traffic, Analytics, Operations: Compare logs produced by atskfafka with those produced by varnishkafka - https://phabricator.wikimedia.org/T254317 (elukey) Some extra context: Ema added prometheus monitoring for ATSKafka in https://grafana.wikimedia.org/d/1EUhPpzMz/atskafka?orgId=1, and the cp3050's...
[10:29:18] netops, Operations: Zayo link eqiad-codfw (OGYX/120003//ZYO) down (May 2020) - https://phabricator.wikimedia.org/T253610 (ayounsi) And TTN-0004131048 for the record, currently down.
[10:58:00] Traffic, Operations, conftool, Patch-For-Review, and 2 others: Figure out a security model for etcd - https://phabricator.wikimedia.org/T97972 (Joe) My tests went fine: - `mwdebug*` servers got the datacenter-appropriate `pool-$dc-testserver` user - the deployment servers got the `conftool` user...
[11:49:28] netops, Analytics, Operations: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (JAllemandou) Some more on the Druid aspect of things: - We have used multi-value dimensions in Druid without problem - Data needs to be an array and that's it. - We h...
[12:10:50] netops, Operations, Patch-For-Review, Sustainability (Incident Prevention): Juniper HA audit - https://phabricator.wikimedia.org/T191667 (ayounsi)
[12:10:52] netops, Operations, Patch-For-Review: Configure management-instance on routers with Junos > 17.3 - https://phabricator.wikimedia.org/T247073 (ayounsi) Open→Resolved a:ayounsi All done.
[14:33:38] ema: yes, the testing is complete. I re-enabled it and ran puppet. Thanks!
[14:33:49] shdubsh: ty!
[15:07:22] netops, Operations: Homer: manage transit BGP sessions - https://phabricator.wikimedia.org/T250136 (CDanis) LGTM
[15:25:23] netops, Operations: Zayo link eqiad-codfw (OGYX/120003//ZYO) down (May 2020) - https://phabricator.wikimedia.org/T253610 (CDanis) >>! In T253610#6173473, @ayounsi wrote: > Link is back up. > >> We found a defective fiber between nodes. >> Currently seeing service restored. >> Please advise the ONCC...
[16:36:25] Traffic, Analytics, Operations: missing wmf_netflow data, 18:30-19:00 May 31 - https://phabricator.wikimedia.org/T254161 (elukey) The missing data seems from May 31st 18:30 to 19:00. I did a quick check via Spark and on HDFS the data seems present: ` scala> spark.sql("select stamp_inserted from wmf....
[21:36:03] Wikimedia-Apache-configuration, Operations, Developer Productivity, Patch-For-Review, Performance-Team (Radar): VirtualHost for mod_status breaks debugging Apache/MediaWiki from localhost (on jobrunners) - https://phabricator.wikimedia.org/T190111 (Krinkle)