[10:48:05] re: grafana and SSO, now when you are on a dashboard on grafana.w.o and hit "sign in" at the bottom left you'll be redirected to the same url on the rw vhost, thanks cdanis for the suggestion :)
[12:48:20] godog: I just tested what you said, and it doesn't seem to work
[12:48:35] example: https://grafana.wikimedia.org/d/000000579/wmcs-openstack-eqiad1
[12:49:20] not sure if I'm doing something wrong
[12:49:51] I see however the url changes to include a `ticket=xxxxx-idp1001` parameter
[13:04:10] arturo: indeed I see the same, mmhh I think that happens when cas needs to refresh the session
[13:04:34] arturo: does it work if you visit grafana-rw.w.o first ?
[13:04:55] as in, go to grafana-rw first then repeat
[13:09:19] will try in a bit
[14:57:27] arturo: I think I got it, fun! https://phabricator.wikimedia.org/T267645
[14:57:36] * arturo reading
[14:57:40] I'll roll back that change for now
[14:59:16] godog: thanks for working on this. I find the original idea pretty good: redirecting to rw when required!
[15:00:21] yeah I think there'll be a couple more tweaks, I couldn't test this in an isolated manner just yet
[15:20:23] headsup, I have one mw api host using its onhost memcached and I will push a change for an app one
[15:20:55] I do not expect anything funny to happen as we route specific keys to be read from the onhost memcached
[15:21:07] but keep it in mind :)
[16:02:28] _joe_ FYI in https://phabricator.wikimedia.org/T267065 dcops asked for feedback on some host-rack moves (to free space for 10g hosts), and there are a couple of conf100x hosts. I think it should be fine as long as we do one move at a time, but lemme know if this is not ok (also others with context on zookeeper/etcd please chime in :)
[16:02:59] <_joe_> elukey: as long as it's one at a time, it's ok
[16:03:11] <_joe_> it's also unfortunate we moved one like 3 months ago for the same reason
[16:03:30] yeah there are also a lot of mw1xxx nodes in the list :(
[16:03:53] <_joe_> a lot of mc* boxes I'd say
[16:04:13] <_joe_> it's a bit hard to read that task
[16:04:45] there is a summary in https://phabricator.wikimedia.org/T267065#6606963
[16:05:19] I also added it to the description
[16:06:07] <_joe_> that's 100 hosts
[16:06:15] <_joe_> ok...
[16:06:33] I am not sure if we'll move all
[16:07:28] <_joe_> it's ironic that there are 4 mc servers that will be moved /away/ from 10g racks :P
[16:09:57] we also have to refresh all the mc* hosts this FY IIRC, so it will be interesting to find 10g space if we need it :D
[17:16:20] apergos: those look like local changes for testing, can local changes be discarded for them to work?
[17:16:41] those local changes might be necessary over there, that's what I'm afraid of
[17:17:17] if it is only that diff, that seems safe enough?
[17:17:32] oh, I see
[17:17:40] it may break confd
[17:18:08] but I don't understand why it wasn't applied on production if it breaks beta
[17:18:14] I don't know who was working on that, I was going to look at the puppet git repo but
[17:18:24] git pull is hanging, of course
[17:18:39] apergos: for the type of change I would bet either jbond42 or maybe moritzm
[17:19:01] but let me check production puppet's state
[17:19:01] moritz I think not, I've been talking to him about the testing
[17:19:07] maybe jbond42 though
[17:19:17] that would have been my first guess
[17:19:26] do you know whose local change it is?
[17:20:32] nope
[17:20:37] root:root, so much for that
[17:20:59] what's the issue? I don't see it in the backlog
[17:21:17] sorry: T264991
[17:21:18] T264991: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991
[17:21:29] given it's not staged there's no good way to tell
[17:21:34] * jbond42 looking
[17:21:40] yeah my last comment, puppet sync broken on deployment-prep etc
[17:21:42] but not necessarily related to you, it just affects that change
[17:22:25] my git pull is doing things but it is very slow. ugh
[17:22:32] git-upload-pack taking forever
[17:23:19] someone in -releng thinks it might have been tyler, they are checking around
[17:23:29] (yay someone is awake in that channel :-) )
[17:23:45] apergos: so I made a change to use the facts hash for that default
[17:23:46] Stdlib::Fqdn $srv_dns = $facts['domain'],
[17:24:19] but it was in June so it's unlikely to be that
[17:24:27] apergos: what if you get the patch, rebase, then reapply the change?
[17:24:34] that would be the safest bet?
[17:24:42] however I think it's safe to revert that change
[17:24:43] yes but it will still leave puppet sync broken
[17:24:46] that's not too cool
[17:24:53] yeah, not as a permanent measure
[17:24:57] (the local change that is)
[17:24:58] more like to unblock you
[17:25:11] let's see what they say in -releng
[17:25:13] or just doing what jbond says :-D
[17:25:33] I will consider reverting though if they don't get anywhere
[17:25:36] thank you both
[17:28:09] apergos: I took a quick look at the local repo and I can't think why one would change from String to Stdlib::Fqdn, nothing in the hiera data looks like it would fail the regex for Stdlib::Fqdn
[17:28:31] https://phabricator.wikimedia.org/T267439
[17:28:43] this is the deal, in case you care to join me in the -releng channel
[17:29:24] jbond42: maybe you would have some insight?
[17:29:29] looking
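For context on the change jbond42 describes above: it is a typed class parameter whose default comes from the facts hash. The sketch below is illustrative only; the surrounding class name and the second parameter are assumed, not taken from the log. The relevant point is that `Stdlib::Fqdn` is a type alias from puppetlabs-stdlib, so any value supplied for the parameter (e.g. from local hiera on deployment-prep) that does not look like a valid FQDN will fail catalog compilation, which is why the hiera data was checked against that regex.

```puppet
# Minimal sketch; the class name and $srv_label are hypothetical.
class profile::example_service (
    # Default taken from the facts hash, as in the change described above.
    # Stdlib::Fqdn rejects non-FQDN values at catalog compilation time,
    # whereas a plain String parameter would have accepted anything.
    Stdlib::Fqdn $srv_dns   = $facts['domain'],
    String       $srv_label = 'example',
) {
    notify { "srv_dns for ${srv_label} is ${srv_dns}": }
}
```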
[18:36:28] so something really weird happened for mc1035
[18:36:29] https://grafana.wikimedia.org/d/000000316/memcache?orgId=1&from=now-12h&to=now
[18:41:55] for anyone who cares to play with libicu63, deployment-prep has now been switched over.
[18:42:25] the majority of the bw usage seems to be ruwiki:pcache:idhash:922-0!canonical
[18:44:10] does anybody know how to trace this back to some event on ruwiki?
[18:47:17] yes confirmed, slab 134, https://grafana.wikimedia.org/d/000000317/memcache-slabs?viewPanel=60&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=memcached&var-instance=mc1035&var-slab=All&from=now-12h&to=now
[18:47:23] it contains the pcache key
[19:08:27] I am going off now, will check tomorrow, things are not on fire atm
[20:25:25] Super quick puppet question, in https://integration.wikimedia.org/ci/job/operations-puppet-tests-buster-docker/14754/console it fails on `modules/query_service/manifests/common.pp:25 wmf-style: class 'query_service::common' includes java::tools from another module`
[20:25:55] Is the error message saying that `query_service::common` *already* includes `java::tools` from another module? i.e. is it pointing out a redundancy?
[20:27:14] (https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/1c88f36be3323848983569006f7a28db78fe7086/modules/query_service/manifests/common.pp#25 is where I include `::java::tools` in my patch)
[20:37:32] ryankemper: this looks like a wmf style guide violation. cf. https://wikitech.wikimedia.org/wiki/Puppet_coding#Modules
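Reading the lint message against the linked style guide, it is most likely not flagging a redundancy: the objection is to a module class including a class from another module at all, since under the role/profile convention that cross-module composition belongs in a profile. Below is a rough sketch of the usual shape of the fix; the profile name, the parameter, and the resource bodies are assumed for illustration and are not the actual repo contents.

```puppet
# modules/query_service/manifests/common.pp -- module class stays self-contained
class query_service::common (
    Stdlib::Unixpath $data_dir = '/srv/query_service',  # hypothetical parameter
) {
    # No `include ::java::tools` here; a module class pulling in another
    # module's class is exactly what the wmf-style check rejects.
    file { $data_dir:
        ensure => directory,
    }
}

# modules/profile/manifests/query_service/common.pp -- hypothetical profile
class profile::query_service::common {
    # Cross-module composition happens at the profile layer instead.
    include ::java::tools
    include ::query_service::common
}
```

Whether `java::tools` ends up in an existing profile or a new one depends on how query_service is wired into its roles; the sketch only shows the layering the style check expects.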