[02:13:56] Traffic, DNS, Operations, Services (next): icinga alerts on nodejs services when a recdns server is depooled - https://phabricator.wikimedia.org/T162818#3177301 (GWicke) Let's perhaps tackle T123854, so that icinga also keeps an eye on the action API?
[02:23:09] Traffic, MediaWiki-API, Monitoring, Operations, Services: Set up action API latency / error rate metrics & alerts - https://phabricator.wikimedia.org/T123854#3177305 (GWicke) Using grafana's new & spiffy alert feature, I set up a simple alert for the RESTBase backend request latency using the...
[10:40:21] paravoid: cp3038 is also broken, can we trigger smarthands for it? And cp3003 of course
[10:40:58] https://phabricator.wikimedia.org/T162132 https://phabricator.wikimedia.org/T157537
[10:57:50] Traffic, Operations, Patch-For-Review, User-Elukey: prometheus-vhtcpd-stats cronspamming if vhtcpd is not running yet - https://phabricator.wikimedia.org/T157353#3177891 (Volans) Resolved>Open Re-opening because this is happening when rebooting hosts, see last days root@ mails
[11:41:56] Traffic, Operations, ops-codfw: lvs2002 random shut down - https://phabricator.wikimedia.org/T162099#3152568 (faidon) >>! In T162099#3169146, @BBlack wrote: > @ayounsi Let's let it burn in with no traffic until tomorrow sometime, then sync up on reverting the router config hacks and watching the traf...
[12:38:27] after the reimage, achernar fails to initialise its remote servers; stratum 16 for ntp3.tamu.edu indicates the server is down (which I've confirmed). how were the peer_upstreams in modules/role/manifests.pp initially picked? tamu.edu seems to be a Texan uni, so should the replacement probably be a similar regional pick?
[12:39:37] moritzm: yeah, so, this is all related to why we want ntp 4.2.8, your earlier question :)
[12:39:53] yeah, just realised you were referring to "restrict source"
[12:40:11] yup
[12:40:39] so according to puppet commentary, I picked the current set in Sept 2014, it's been bitrotting since then and probably needs a wholesale re-review
[12:41:22] but in general it's not the end of the world that a bunch of that list are dead/invalid. So long as at least 1-2 per site are still working and all of ours peer with each other, one way or another things tend to barely work out
[12:41:48] but it's clearly past time to fix it up for the first time in a couple of years :)
[12:42:45] for achernar ATM both ntp3.tamu.edu and ip68-108-190.19 are shown with stratum 16, it's only syncing against our other five servers at this point
[12:42:51] (noticed it via the Icinga alert)
[12:43:59] yeah achernar actually has 4x external (like the rest), but the other two don't even resolve in DNS currently, so they don't show up in the peers list
[12:44:39] I can run down the process this morning I guess and update them. Should be the last time we need to before moving them to stretch eventually and switching to proper pool aliases and restrict source
[12:47:27] Traffic, netops, Operations, Patch-For-Review: knams equipment move - https://phabricator.wikimedia.org/T162601#3178079 (ayounsi) Work finished at 14:00 local time, interfaces confirmed up, LACP active. BGP re-enabled at ~14:30. Everything established, interfaces passing traffic. Will revert the...
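The stratum 16 readings and the unresolvable upstream names above are what a candidate-server check has to catch. The Python sketch below illustrates one possible approach and is not the tooling used in this channel. It sends a bare SNTP client request (the 48-byte NTPv4 header from RFC 5905) to every address a hostname resolves to, IPv4 and IPv6 alike, then reports each reply's stratum and round-trip time. This exposes dead, unsynchronized, unresolvable, or distant servers before they go into peer_upstreams. The function names and the "usable" rule (server mode, leap indicator not 3, stratum 1-15) are assumptions of this sketch.

```python
import socket
import sys
import time

NTP_PORT = 123


def build_request() -> bytes:
    """Return a 48-byte SNTP client request: LI=0, VN=4, Mode=3 (client)."""
    first_byte = (0 << 6) | (4 << 3) | 3
    return bytes([first_byte]) + bytes(47)


def parse_header(data: bytes) -> dict:
    """Extract leap indicator, version, mode and stratum from an NTP reply."""
    if len(data) < 48:
        raise ValueError("short NTP packet")
    first = data[0]
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,
        "stratum": data[1],
    }


def is_usable(header: dict) -> bool:
    """Assumed rule: accept only server-mode replies (mode 4) that are not
    flagged unsynchronized (leap indicator 3) and carry stratum 1-15.
    Stratum 0 is kiss-o'-death/unspecified; 16 means unsynchronized."""
    return header["mode"] == 4 and header["leap"] != 3 and 1 <= header["stratum"] <= 15


def query(host: str, timeout: float = 2.0):
    """Yield (address, stratum or None, round-trip seconds or None) for
    every IPv4/IPv6 address that `host` resolves to."""
    try:
        infos = socket.getaddrinfo(host, NTP_PORT, type=socket.SOCK_DGRAM)
    except socket.gaierror:
        # The name does not resolve at all, so there is nothing to query.
        yield host, None, None
        return
    seen = set()
    for family, socktype, proto, _, sockaddr in infos:
        addr = sockaddr[0]
        if addr in seen:
            continue
        seen.add(addr)
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            start = time.monotonic()
            try:
                sock.sendto(build_request(), sockaddr)
                data, _ = sock.recvfrom(512)
            except OSError:
                # Timeouts and unreachable networks (e.g. a server that only
                # answers on v4 being tried over v6) both land here;
                # socket.timeout is a subclass of OSError.
                yield addr, None, None
                continue
            rtt = time.monotonic() - start
            try:
                header = parse_header(data)
            except ValueError:
                yield addr, None, rtt
                continue
            yield addr, (header["stratum"] if is_usable(header) else None), rtt


if __name__ == "__main__":
    for name in sys.argv[1:]:
        for addr, stratum, rtt in query(name):
            print(name, addr, stratum, rtt)
```

For example, running it with `ntp3.tamu.edu` as an argument prints one line per resolved address. A `None` stratum marks an address that did not answer, answered as unsynchronized, or whose name did not resolve. Comparing the round-trip column across candidates is a rough stand-in for the "closer latency-wise" check.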
[12:52:28] thanks, http://support.ntp.org/bin/view/Servers/StratumTwoTimeServers lists various public servers from Texas and should be up-to-date
[12:53:42] yeah that's basically what I worked from last time, running through and checking locations and policies and then manually verifying they work and which are closer latency-wise, etc
[13:05:29] Traffic, Operations, ops-codfw: lvs2002 random shut down - https://phabricator.wikimedia.org/T162099#3178115 (ayounsi) a:ayounsi>BBlack Moving that one back to Brandon
[13:21:57] Traffic, netops, Operations, Patch-For-Review: knams equipment move - https://phabricator.wikimedia.org/T162601#3178159 (ayounsi) Open>Resolved esams reenabled in DNS, confirmed traffic is properly passing through knams.
[13:32:31] Traffic, Operations, ops-codfw: lvs2002 random shut down - https://phabricator.wikimedia.org/T162099#3178185 (ema) Traffic switched to lvs2002 properly: https://grafana.wikimedia.org/dashboard/db/load-balancers?panelId=8&fullscreen&orgId=1&from=1492081888998&to=1492085531622
[13:44:33] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3178221 (H-stt) No, moving this discussion to another forum on Meta is not an acceptable option. The board's resolution demands action by the ops, so this issue needs to be discussed with the ops, and this...
[14:00:59] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#2962020 (hashar) @H-stt a bit more context. Sustainability was one of the important selection criteria for the previous datacenter. See the request for comment from 2013 at https://wikimediafoundation.org/w...
[14:17:21] Traffic, Operations: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#3178320 (BBlack) This is the last time I'll respond to trolling on this ticket. >>! In T156029#3166703, @H-stt wrote: >>>! In T156029#3053235, @BBlack wrote: >> >>>>! In T156029#3053179, @Gnom1 wrote: >>...
[14:59:34] moritzm: ok the ntp network is all back in sync with each other and upstream now. 1x of the new servers in the EU failed to work out (I think because it's +UseDNS and has IPv4 + IPv6, but their ntp only responds on v4 and our server tried v6). So I'll do a followup and replace that one.
[15:03:59] nice
[18:47:09] Wikimedia-Apache-configuration, ArchCom-RfC, Wikidata, Services (watching): Canonical data URLs for machine readable page content - https://phabricator.wikimedia.org/T161527#3180167 (daniel)
[18:51:21] Wikimedia-Apache-configuration, ArchCom-RfC, Wikidata, Services (watching): Canonical data URLs for machine readable page content - https://phabricator.wikimedia.org/T161527#3180211 (daniel) This RFC was discussed in a public RFC meeting on the wikimedia-office channel on April 12. It was agreed...
[18:52:26] Wikimedia-Apache-configuration, ArchCom-RfC, Wikidata, Services (watching): Canonical data URLs for machine readable page content - https://phabricator.wikimedia.org/T161527#3180214 (daniel)
[19:20:56] Traffic, Operations, Page-Previews, Performance-Team, and 3 others: Performance review #2 of Hovercards (Popups extension) - https://phabricator.wikimedia.org/T70861#3180529 (ovasileva)
[20:45:22] netops, DBA, Operations, ops-codfw: db20[7-9][0-9] swith ports configuration - https://phabricator.wikimedia.org/T162944#3180944 (Papaul)
[20:45:56] netops, DBA, Operations, ops-codfw: db20[7-9][0-9] switch ports configuration - https://phabricator.wikimedia.org/T162944#3180963 (Luke081515)
[20:48:54] netops, DBA, Operations, ops-codfw: db20[7-9][0-9] swith ports configuration - https://phabricator.wikimedia.org/T162944#3180980 (Papaul)
[20:49:54] netops, DBA, Operations, ops-codfw: db20[7-9][0-9] switch ports configuration - https://phabricator.wikimedia.org/T162944#3180944 (Papaul)
[21:00:21] netops, DBA, Operations, ops-codfw: db20[7-9][0-9] switch ports configuration - https://phabricator.wikimedia.org/T162944#3181028 (Papaul)