[00:00:09] jbond42: thank you! maybe I don't need a separate vhost :D
[00:03:34] so I shut down the linecard and it came back up on its own and it now looks healthy
[00:03:35] bahh
[00:17:35] cdanis: great i thought i may have been too late
[08:44:50] for everybody interested in mcrouter and tkos
[08:45:31] there was a bug in the exporter and the TKO state broken down by memcached shard was not really working (always zero)
[08:45:59] I am testing the new version on mwdebug1001: I have removed the old/wrong graph and added these two
[08:46:02] https://grafana.wikimedia.org/d/000000549/mcrouter?orgId=1&var-source=eqiad%20prometheus%2Fops&var-cluster=All&var-instance=mwdebug1001&var-memcached_server=All&fullscreen&panelId=39
[08:46:09] https://grafana.wikimedia.org/d/000000549/mcrouter?orgId=1&var-source=eqiad%20prometheus%2Fops&var-cluster=All&var-instance=mwdebug1001&var-memcached_server=All&fullscreen&panelId=40
[08:46:36] those count the number of appservers that have flagged a memcached shard with soft/hard tkos
[08:47:06] so in theory it should be more clear from now on to spot what shards are misbehaving
[08:48:01] (going to roll out the new version in a bit)
[09:19:51] <_joe_> good!
[12:02:10] elukey: there you go
[12:02:12] https://grafana.wikimedia.org/d/000000549/mcrouter?orgId=1&var-source=eqiad%20prometheus%2Fops&var-cluster=All&var-instance=mwdebug1001&var-memcached_server=All&fullscreen&panelId=40&from=1582717086178&to=1582718511589
[12:02:14] D:
[12:02:16] :D
[12:04:45] memcache errors seem to be flopping up and down
[12:05:03] I just saw your update on -ops
[12:28:37] What does "scb" stand for? (as in the servers with names starting scb)
[12:29:21] service b?
[12:29:37] service ops will know
[12:30:46] hnowlan: Service Cluster B https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions
[12:31:08] ah, thanks!
[12:31:29] but people will know if b is kubernetes only or something like that
[12:32:26] I don't think sca* exists any more judging by DNS and puppet
[12:35:52] <_joe_> hnowlan: correct
[12:36:13] <_joe_> scb is being emptied too
[12:38:51] what is the new services host called now?
[12:39:21] kubernetes ?
[12:41:22] kubernetes* are the workers, yes
[13:51:10] effie: nice thanks!
[13:51:15] seems working :)
[13:51:21] so it was hard-tko at the end
[13:51:36] yeah
[13:51:50] that was without the failover config
[14:01:27] oh, 0 failing services, 0 failing host on icinga
[14:01:32] congrats!
[14:01:49] let me see if I can get rid of warnings too
[14:05:45] eh, been working on them for 45min
[14:20:56] (down to 1)
[14:24:25] yay, XioNoX
[14:27:10] could I restart-php7.2-fpm mw1279 as a symbolic milestone?
[14:28:38] it may fix itself, hit rate is rising
[14:29:55] <_joe_> opcache hitrate?
[14:30:01] <_joe_> restarting the server will worsen it
[14:30:27] <_joe_> it's a warning, intended to happen without getting immediate action
[14:30:35] yeah, I double checked, I thought at first it was the other error
[14:31:12] not touching it
[16:10:42] o/