[10:48:05] re: grafana and SSO, now when you are on a dashboard on grafana.w.o and hit "sign in" at the bottom left you'll be redirected to the same url on the rw vhost, thanks cdanis for the suggestion :)
[12:48:20] godog: I just tested what you said, and it doesn't seem to work
[12:48:35] example: https://grafana.wikimedia.org/d/000000579/wmcs-openstack-eqiad1
[12:49:20] not sure if I'm doing something wrong
[12:49:51] I see however the url changes to include a `ticket=xxxxx-idp1001` parameter
[13:04:10] arturo: indeed I see the same, mmhh I think that happens when cas needs to refresh the session
[13:04:34] arturo: does it work if you visit grafana-rw.w.o first ?
[13:04:55] as in, go to grafana-rw first then repeat
[13:09:19] will try in a bit
[14:57:27] arturo: I think I got it, fun! https://phabricator.wikimedia.org/T267645
[14:57:36] * arturo reading
[14:57:40] I'll roll back that change for now
[14:59:16] godog: thanks for working on this. I find the original idea pretty good: redirecting to rw when required!
[15:00:21] yeah I think there'll be a couple more tweaks, I couldn't test this in an isolated manner just yet
[15:20:23] headsup, I have one mw api host using its onhost memcached and I will push a change for an app one
[15:20:55] I do not expect anything funny to happen as we route specific keys to be read from the onhost memcached
[15:21:07] but keep it in mind :)
[16:02:28] _joe_ FYI in https://phabricator.wikimedia.org/T267065 dcops asked for feedback on some host-rack moves (to free space for 10g hosts), and there are a couple of conf100x hosts. I think it should be fine as long as we do one move at a time, but lemme know if this is not ok (also others with context on zookeeper/etcd please chime in :)
[16:02:59] <_joe_> elukey: as long as it's one at a time, it's ok
[16:03:11] <_joe_> it's also unfortunate we moved one like 3 months ago for the same reason
[16:03:30] yeah there are also a lot of mw1xxx nodes in the list :(
[16:03:53] <_joe_> a lot of mc* boxes I'd say
[16:04:13] <_joe_> it's a bit hard to read that task
[16:04:45] there is a summary in https://phabricator.wikimedia.org/T267065#6606963
[16:05:19] I also added it to the description
[16:06:07] <_joe_> that's 100 hosts
[16:06:15] <_joe_> ok...
[16:06:33] I am not sure if we'll move all
[16:07:28] <_joe_> it's ironic that there are 4 mc servers that will be moved /away/ from 10g racks :P
[16:09:57] we also have to refresh all the mc* hosts this FY IIRC, so it will be interesting to find 10g space if we need it :D
[17:16:20] apergos: those look like local changes for testing, can local changes be discarded for them to work?
[17:16:41] those local changes might be necessary over there, that's what I'm afraid of
[17:17:17] if it is only that diff, that seems safe enough?
[17:17:32] oh, I see
[17:17:40] it may break confd
[17:18:08] but I don't understand why it wasn't applied on production if it breaks beta
[17:18:14] I don't know who was working on that, I was going to look at the puppet git repo but
[17:18:24] git pull is hanging, of course
[17:18:39] apergos: for the type of change I would bet either jbond42 or maybe moritzm
[17:19:01] but let me check production puppet's state
[17:19:01] moritz I think not, I've been talking to him about the testing
[17:19:07] maybe jbond42 though
[17:19:17] that would have been my first guess
[17:19:26] do you know whose local change it is?
[17:20:32] nope
[17:20:37] root:root, so much for that
[17:20:59] what's the issue? I don't see it in the backlog
[17:21:17] sorry: T264991
[17:21:18] T264991: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991
[17:21:29] given it's not staged there's no good way to tell
[17:21:34] * jbond42 looking
[17:21:40] yeah my last comment, puppet sync broken on deployment-prep etc
[17:21:42] but not necessarily related to you, it just affects that change
[17:22:25] my git pull is doing things but it is very slow. ugh
[17:22:32] git-upload-pack taking forever
[17:23:19] someone in -releng thinks it might have been tyler, they are checking around
[17:23:29] (yay someone is awake in that channel :-) )
[17:23:45] apergos: so I made a change to use the facts hash for that default
[17:23:46] Stdlib::Fqdn $srv_dns = $facts['domain'],
[17:24:19] but it was in June so it's unlikely to be that
[17:24:27] apergos: what if you get the patch, rebase, then reapply the change?
[17:24:34] that would be the safest bet?
[17:24:42] however I think it's safe to revert that change
[17:24:43] yes but it will still leave puppet sync broken
[17:24:46] that's not too cool
[17:24:53] yeah, not as a permanent measure
[17:24:57] (the local change that is)
[17:24:58] more like to unblock you
[17:25:11] let's see what they say in -releng
[17:25:13] or just doing what jbond says :-D
[17:25:33] I will consider reverting though if they don't get anywhere
[17:25:36] thank you both
[17:28:09] apergos: I took a quick look at the local repo and I can't think why one would change from String to Stdlib::Fqdn, nothing in the hiera data looks like it would fail the regex for Stdlib::Fqdn
[17:28:31] https://phabricator.wikimedia.org/T267439
[17:28:43] this is the deal, in case you care to join me in the -releng channel
[17:29:24] jbond42: maybe you would have some insight?
[17:29:29] looking
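For context on the change jbond42 describes above: it is a typed class parameter whose default comes from the facts hash. The sketch below is illustrative only; the surrounding class name and the second parameter are assumed, not taken from the log. The relevant point is that `Stdlib::Fqdn` is a type alias from puppetlabs-stdlib, so any value supplied for the parameter (e.g. from local hiera on deployment-prep) that does not look like a valid FQDN will fail catalog compilation, which is why the hiera data was checked against that regex.

```puppet
# Minimal sketch; the class name and $srv_label are hypothetical.
class profile::example_service (
    # Default taken from the facts hash, as in the change described above.
    # Stdlib::Fqdn rejects non-FQDN values at catalog compilation time,
    # whereas a plain String parameter would have accepted anything.
    Stdlib::Fqdn $srv_dns   = $facts['domain'],
    String       $srv_label = 'example',
) {
    notify { "srv_dns for ${srv_label} is ${srv_dns}": }
}
```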
[18:36:28] so something really weird happened for mc1035
[18:36:29] https://grafana.wikimedia.org/d/000000316/memcache?orgId=1&from=now-12h&to=now
[18:41:55] for anyone who cares to play with libicu63, deployment-prep has now been switched over.
[18:42:25] the majority of the bw usage seems to be ruwiki:pcache:idhash:922-0!canonical
[18:44:10] does anybody know how to trace this back to some event on ruwiki?
[18:47:17] yes confirmed, slab 134, https://grafana.wikimedia.org/d/000000317/memcache-slabs?viewPanel=60&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=memcached&var-instance=mc1035&var-slab=All&from=now-12h&to=now
[18:47:23] it contains the pcache key
[19:08:27] I am going off now, will check tomorrow, things are not on fire atm
[20:25:25] Super quick puppet question, in https://integration.wikimedia.org/ci/job/operations-puppet-tests-buster-docker/14754/console it fails on `modules/query_service/manifests/common.pp:25 wmf-style: class 'query_service::common' includes java::tools from another module`
[20:25:55] Is the error message saying that `query_service::common` *already* includes `java::tools` from another module? i.e. is it pointing out a redundancy?
[20:27:14] (https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/1c88f36be3323848983569006f7a28db78fe7086/modules/query_service/manifests/common.pp#25 is where I include `::java::tools` in my patch)
[20:37:32] ryankemper: this looks like a wmf style guide violation. cf. https://wikitech.wikimedia.org/wiki/Puppet_coding#Modules
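Reading the lint message against the linked style guide, it is most likely not flagging a redundancy: the objection is to a module class including a class from another module at all, since under the role/profile convention that cross-module composition belongs in a profile. Below is a rough sketch of the usual shape of the fix; the profile name, the parameter, and the resource bodies are assumed for illustration and are not the actual repo contents.

```puppet
# modules/query_service/manifests/common.pp -- module class stays self-contained
class query_service::common (
    Stdlib::Unixpath $data_dir = '/srv/query_service',  # hypothetical parameter
) {
    # No `include ::java::tools` here; a module class pulling in another
    # module's class is exactly what the wmf-style check rejects.
    file { $data_dir:
        ensure => directory,
    }
}

# modules/profile/manifests/query_service/common.pp -- hypothetical profile
class profile::query_service::common {
    # Cross-module composition happens at the profile layer instead.
    include ::java::tools
    include ::query_service::common
}
```

Whether `java::tools` ends up in an existing profile or a new one depends on how query_service is wired into its roles; the sketch only shows the layering the style check expects.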