[07:20:39] interesting - the kafka mirror maker codfw -> eqiad raised an alert for low consume/produce rate
[07:21:10] starting from the 14th at around 15:00 UTC, that is one day before the kafka-main2003 outage IIUC
[07:21:15] it seems to coincide with
[07:21:16] https://grafana.wikimedia.org/d/000000234/kafka-by-topic?orgId=1&from=now-7d&to=now&refresh=5m&var-datasource=codfw%20prometheus%2Fops&var-kafka_cluster=main-codfw&var-kafka_broker=All&var-topic=codfw.change-prop.transcludes.resource-change
[07:23:40] nothing in the SAL, and it seems pretty sharp
[07:24:02] if it is expected we can review warning/crits for mirror maker codfw -> eqiad
[07:24:20] but it is weird that it fell so sharply on a saturday
[07:25:04] (earlier on I roll-restarted mirror maker on kafka-main100[1-3] to verify that it wasn't a weird state)
[07:25:53] (going afk for a while, dentist, will read later)
[08:55:10] back
[08:55:29] any thoughts?
[08:56:10] elukey: not sure who the question is for?
[08:56:53] anybody :D
[08:58:53] well, I have no idea :)
[09:05:56] some interesting metrics also in here https://grafana.wikimedia.org/d/000300/change-propagation?orgId=1&refresh=30s&from=now-7d&to=now
[09:11:34] very ignorant about this bit, since it is changeprop working on codfw topics
[09:57:45] opened https://phabricator.wikimedia.org/T268121
[13:34:48] re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/632552, anyone familiar with `modules/mediawiki/files/apache/sites/redirects/redirects.dat`?
[13:37:26] it's just data for code that generates apache config, what could possibly go wrong? :)
[13:39:08] i believe the appropriate answer is "everything"
[16:26:24] I had a question about puppet internals, asked it in #puppet, and one day later I got a reply from someone saying "that's actually my code" and a workaround. sometimes it just works :)
[18:14:36] Ganeti VMs talking about "I/O error on the floppy disk" are still kind of funny.
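(The "low consume/produce rate" alert discussed at 07:20-07:25 is, at its core, a per-second message rate derived from partition offset growth over a sampling interval. A minimal sketch of that arithmetic, with made-up offsets and interval — this is illustrative only, not the actual alert definition:)

```shell
# Hypothetical log-end offsets for one topic partition, sampled 60s apart.
offset_t0=1200000
offset_t1=1200300
interval_s=60

# Rate = offset delta / interval; an alert fires when this drops below a
# warning/critical threshold ("warning/crits for mirror maker" above).
awk -v a="$offset_t0" -v b="$offset_t1" -v s="$interval_s" \
    'BEGIN { printf "produce rate: %.1f msgs/s\n", (b - a) / s }'
```

In practice these samples come from broker/MirrorMaker metrics scraped into Prometheus, not from hand-run commands.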
[18:15:48] mutante: you can watch https://www.youtube.com/watch?v=tvVPDC86sYw :p
[18:17:29] effie: hah! I will. thanks. btw mwdebug1003 is in the process of becoming a canary. Will be done up to "pooled=no" by tomorrow's meeting.
[18:18:09] I thought that pooled does not matter in this case
[18:18:12] I think the very last step will be adding it to the x-wikimedia-debug header map https://gerrit.wikimedia.org/r/c/operations/puppet/+/641759/
[18:18:34] I meant to say "will do everything except sending actual traffic to it".
[18:19:18] I think we should reach out to the devs of WikimediaDebug, I do not know the details as to how to add this host to the extension
[18:19:23] but it would be useful
[18:19:52] It was new to me but I think that patch above does just that.
[18:19:59] ok
[18:21:48] I am not sure really
[18:22:11] I'll try to find some reviewers
[18:22:20] the patch will direct traffic to mwdebug1003 and the request will bypass the cache
[18:23:12] but only if the user specifies they want to use mwdebug1003, afaict
[18:23:26] but not sure enough, yea
[18:23:55] yes
[18:44:12] I have tried 'git rebase -i origin/production' something like a million times before, but this is a first:
[18:44:16] warning: inexact rename detection was skipped due to too many files.
[18:44:20] warning: you may want to set your merge.renamelimit variable to at least 4089 and retry the command.
[18:44:30] must be because this patch is ANCIENT
[18:44:44] I just changed my Chrome extension to have mwdebug1003 in the list. I had to edit the file background.js in my .config/... directory and then switch on developer mode and reload the extension as unpacked.
[18:44:55] https://www.irccloud.com/pastebin/bQgJgsUA/
[18:48:52] wkandek: cool, but don't expect it to work already. puppet was still running until a few min ago
[18:49:05] currently watching icinga status and missing packages
[18:49:39] ok, happy to test when you get there.
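(The fix for the 18:44 rebase warning is exactly what git suggests: raise `merge.renamelimit` to at least 4089 and retry. A quick sketch, demonstrated in a throwaway repo so it is safe to run anywhere; in practice you would set the config inside the real checkout before re-running the rebase:)

```shell
# Create a scratch repo just to demonstrate the setting.
repo=$(mktemp -d)
git -C "$repo" init -q

# Raise the rename-detection limit above the 4089 git asked for.
git -C "$repo" config merge.renamelimit 5000

# Verify the value git will now use; prints 5000.
git -C "$repo" config merge.renamelimit
```

After setting it, re-running `git rebase -i origin/production` should perform full rename detection instead of skipping it.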
[18:50:16] btw, editing the background.js file in place did not work, Chrome complained that the extension was corrupt.
[19:31:20] wkandek: heh, nice hack :) alternatively, you can also install it from git and plug it into Chrome directly from a more convenient directory if you prefer :)
[19:31:21] https://gerrit.wikimedia.org/g/performance/WikimediaDebug
[19:31:27] steps 2 and 3 of the readme
[19:33:27] krinkle: ok, will check it out. Looks like the better way...
[19:36:54] krinkle: that would be about adding a new mwdebug host and making it possible to use it.. while also not breaking stuff for existing users of the stretch mwdebug servers... https://gerrit.wikimedia.org/r/c/operations/puppet/+/641759
[19:37:15] not sure if we should add it already at this point or later
[19:38:01] It will be around for 3 months or so, and then go away?
[19:38:31] easy enough to add, no problem at all
[19:39:51] wkandek: I am thinking more like "mwdebug1003/1004 will exist, both on buster, at some point, then mwdebug1001/1002 will be removed" and "once we start testing on bullseye there will be mwdebug1005 and so on"
[19:40:05] Krinkle: sounds good
[19:41:00] by bullseye we will be on kubernetes :)
[19:41:25] heh, right
[19:42:07] It would be good if we have existing misc hardware to become mwmaint1003
[19:42:18] not sure how we will route to debug there, seems having "hosts" is not usable there, but a way to route to a certain set of pods is still needed
[19:42:24] we should have 2 mwmaint per DC anyway, but it's not a VM
[20:47:47] could it be a VM? https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=mwmaint1002&var-datasource=thanos&var-cluster=misc&from=now-7d&to=now
[20:47:58] utilization seems pretty low.
[21:23:32] Krinkle: do I ping you/your team for WikimediaDebug?
[21:23:42] I have been trying to find on phab who to ping :p
[21:23:54] filing a task will auto-tag Performance, yes.
[21:24:05] lovely, tx
[21:24:06] the repo is performance/WikimediaDebug
[21:31:41] done, thank you!