[09:35:38] <wikibugs>	 Domains, Traffic, Operations: Change of nameservers for Wikimedia.org.tr - https://phabricator.wikimedia.org/T259792 (akosiaris) p:Triage→Medium
[09:36:43] <wikibugs>	 Traffic, DNS, Operations: Verify diff.wikimedia.org ownership for Facebook - https://phabricator.wikimedia.org/T259807 (akosiaris) p:Triage→Medium
[12:25:25] <wikibugs>	 Traffic, Operations: Generate ATS cache.config from software-agnostic data structures - https://phabricator.wikimedia.org/T259692 (ema) Open→Resolved Done, `profile::trafficserver::backend::caching_rules` is now gone. `cache.config` is generated by parsing `req_handling` and `alternate_domains`....
[13:01:19] <godog>	 I'm currently puzzled by the kibana-next alert on icinga: PYBAL CRITICAL - CRITICAL - kibana-next_443: Servers logstash1025.eqiad.wmnet are marked down but pooled
[13:01:39] <godog>	 host is pooled indeed, https://config-master.wikimedia.org/pybal/eqiad/kibana-next
[13:01:55] <godog>	 and proxyfetch thinks it is up, yet pybal /alerts doesn't
[13:02:18] <godog>	 lvs1015:~# curl -s localhost:9090/metrics | grep -i status.*logstash1025
[13:02:21] <godog>	 pybal_monitor_status{host="logstash1025.eqiad.wmnet",monitor="ProxyFetch",service="kibana-next_443"} 1.0
[13:02:28] <godog>	 lvs1015:~# curl  http://localhost:9090/alerts
[13:02:28] <godog>	 CRITICAL - kibana-next_443: Servers logstash1025.eqiad.wmnet are marked down but pooled
[13:10:17] <ema>	 mmh
[13:10:34] <ema>	 indeed according to pybal logs it's up
[13:10:35] <ema>	 Aug 07 10:44:23 lvs1015 pybal[11890]: [kibana-next_443] INFO: Server logstash1025.eqiad.wmnet (disabled/partially up/not pooled) is up
[13:11:23] <ema>	 well, 'partially up'
[13:12:05] <godog>	 true, also since then I've depooled/repooled and afaict now it should be fully up
[13:14:30] <godog>	 I'm tempted to restart pybal on lvs1016 and see if that "helps"
[13:17:10] <ema>	 godog: try first to disable/re-enable the host in etcd maybe
[13:17:59] <godog>	 I've done that already but no harm in trying again
[13:18:48] <godog>	 {{done}}, from the host via depool/pool scripts that is
[13:18:59] <ema>	 Aug 07 13:18:04 lvs1015 pybal[11890]: [kibana-next_443] INFO: Merged enabled server logstash1023.eqiad.wmnet, weight 10
[13:19:02] <ema>	 Aug 07 13:18:04 lvs1015 pybal[11890]: [kibana-next_443] INFO: Merged disabled server logstash1025.eqiad.wmnet, weight 10
[13:19:05] <ema>	 Aug 07 13:18:04 lvs1015 pybal[11890]: [kibana-next_443] INFO: Merged enabled server logstash1024.eqiad.wmnet, weight 10
[13:19:10] <ema>	 it is puzzling
[13:19:27] <ema>	 and then:
[13:19:32] <ema>	 Aug 07 13:18:27 lvs1015 pybal[11890]: [kibana-next_443] INFO: Merged enabled server logstash1025.eqiad.wmnet, weight 10
[13:20:19] <ema>	 now IdleConnection seems happy too
[13:20:29] <ema>	 pybal_monitor_status{host="logstash1025.eqiad.wmnet",monitor="IdleConnection",service="kibana-next_443"} 1.0
[13:21:06] <godog>	 indeed, and /alerts keeps reporting "down but pooled"
[13:23:11] <ema>	 I see 3 pooled servers according to the dashboard too: https://grafana.wikimedia.org/d/000000421/pybal?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-server=lvs1015&var-service=kibana-next_443 
[13:23:56] <ema>	 and indeed the host is pooled in IPVS:
[13:24:00] <ema>	 TCP  kibana-next.svc.eqiad.wmnet: sh -> logstash1023.eqiad.wmnet:htt Route   10     0          0         -> logstash1024.eqiad.wmnet:htt Route   10     0          0         -> logstash1025.eqiad.wmnet:htt Route   10     0          0         
[13:24:04] <ema>	 bleah
[13:24:05] <ema>	 TCP  kibana-next.svc.eqiad.wmnet: sh
[13:24:07] <ema>	   -> logstash1023.eqiad.wmnet:htt Route   10     0          0         
[13:24:10] <ema>	   -> logstash1024.eqiad.wmnet:htt Route   10     0          0         
[13:24:13] <ema>	   -> logstash1025.eqiad.wmnet:htt Route   10     0          0         
[13:25:59] <ema>	 godog: +1 for restarting, it looks like we're dealing with an instrumentation.py bug to me
[13:26:49] <godog>	 ema: ack! same here, I'll restart lvs1016
[13:29:04] <godog>	 and indeed a restart "fixed" /alerts on lvs1016
[13:29:34] <godog>	 I'll bounce pybal on lvs1015 too
[13:29:55] <cdanis>	 'hooray'
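
The mismatch debugged above (monitors reporting 1.0 while /alerts still says "marked down but pooled") can be checked mechanically. What follows is a minimal sketch, not part of pybal: the function names are hypothetical, the metric label order (host, monitor, service) and the alert wording are assumed from the lines pasted in this log, and the comma/space separator for multiple servers in one alert is a guess. It works on text already fetched from the two endpoints.

```python
import re

# Assumption: label order host, monitor, service, as in the samples pasted above.
METRIC_RE = re.compile(
    r'^pybal_monitor_status\{host="(?P<host>[^"]+)",'
    r'monitor="(?P<monitor>[^"]+)",service="(?P<service>[^"]+)"\}\s+(?P<value>\S+)$'
)
# Assumption: alert wording taken from the single /alerts line in this log.
ALERT_RE = re.compile(
    r'(?P<service>\S+): Servers (?P<hosts>.+?) are marked down but pooled'
)


def monitors_up(metrics_text):
    """Map (service, host) to True only if every monitor sample seen is 1.0."""
    status = {}
    for line in metrics_text.splitlines():
        m = METRIC_RE.match(line.strip())
        if not m:
            continue
        key = (m["service"], m["host"])
        status[key] = status.get(key, True) and float(m["value"]) == 1.0
    return status


def alerted_hosts(alerts_text):
    """Return the set of (service, host) pairs named in 'down but pooled' alerts."""
    found = set()
    for m in ALERT_RE.finditer(alerts_text):
        # Hypothetical separator: split several servers on commas or whitespace.
        for host in re.split(r"[,\s]+", m["hosts"].strip()):
            if host:
                found.add((m["service"], host))
    return found


def contradictions(metrics_text, alerts_text):
    """Hosts that /alerts calls down although all their monitors report up."""
    up = monitors_up(metrics_text)
    return sorted(key for key in alerted_hosts(alerts_text) if up.get(key) is True)
```

Feeding it the output of the two curl commands shown earlier (localhost:9090/metrics and localhost:9090/alerts) would list logstash1025 under kibana-next_443 as a contradiction, which is the state seen before the restart.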
[15:05:30] <wikibugs>	 Traffic, Operations, Phabricator, serviceops, and 2 others: Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (hashar)
[15:06:53] <wikibugs>	 Traffic, Operations, Phabricator, serviceops, and 2 others: Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (hashar) I filed a dupe of this task. The admin interface states the notification server is not reachable. At https...
[20:20:02] <wikibugs>	 netops, Operations, ops-eqiad: new cloudflare xconnect to cr1-eqiad - https://phabricator.wikimedia.org/T259923 (RobH) p:Triage→Medium
[20:46:25] <mutante>	 hi ema or vgutierrez ? thanks for the reviews and merge on the aphlict backend change!  we were debugging why it does not fully work yet and eventually came up with this follow-up
[20:46:29] <mutante>	 https://gerrit.wikimedia.org/r/c/operations/puppet/+/619036
[20:46:44] <mutante>	 so i guess we need to set "    caching: 'websockets'" too
[20:46:58] <mutante>	 that is from comparing it to the Etherpad setup
[20:47:18] <mutante>	 but we have both wss:// and https:// connections to the same host name
[20:47:41] <mutante>	 extra: even though it's set to "normal" and not "pass" currently.. it seems like it actually does not cache anything for phab 
[20:48:12] <mutante>	 not caching is what we want.. i just expected that it would have to be "pass" for that
[20:51:38] <mutante>	 the latter is because of the headers sent by phab
[22:02:39] <wikibugs>	 Traffic, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Operations: Geoip lookup - Misidentifying country due to travelling - https://phabricator.wikimedia.org/T175691 (Tgr) This also prevents me from making a card donation (via the donation link in the sidebar menu, but I imagine click...
[22:04:40] <wikibugs>	 Traffic, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Operations: Geoip lookup - Misidentifying country due to travelling - https://phabricator.wikimedia.org/T175691 (Tgr) Related: {T122097}
[22:26:17] <wikibugs>	 Traffic, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Operations: Geoip lookup - Misidentifying country due to travelling - https://phabricator.wikimedia.org/T175691 (Platonides) It could go both ways. If as an Hungarian with only Hungarian credit card, and temporarily visiting the US...