[15:57:31] volans: do you have any advice on detangling https://integration.wikimedia.org/ci/job/tox-docker/16531/console ?
[15:58:12] cdanis: looking
[15:58:24] the patch is https://gerrit.wikimedia.org/r/c/operations/software/klaxon/+/651846
[15:58:40] yeah got it from CI
[16:03:58] looking at tox-wikimedia first, give me a few min
[16:04:50] no rush!
[16:07:16] maybe this is a stupid question, sorry, bear with me. I see we have a prometheus-like output for smart disk information, but I don't see a grafana dashboard. Does anyone know if that exists?
[16:08:06] maybe we don't send it to prometheus in the first place?
[16:10:00] cdanis: fyi, I just had an email from python infrastructure incident alerts saying their cdn is playing up. Not sure if it's related but that might not help.
[16:11:51] hm, thanks RhinosF1 -- it didn't look like a download failure, more like an actual unsatisfiable constraint, but not 100% on that
[16:12:37] The weird thing I noticed was it mentioned trying to get multiple tox versions
[16:13:36] ERROR: Cannot install tox==3.10.0, tox==3.11.0, tox==3.11.1, tox==3.12.0, tox==3.12.1, tox==3.13.0, tox==3.13.1, tox==3.13.2, tox==3.14.0, tox==3.14.1, tox==3.14.2, tox==3.14.3, tox==3.14.4, tox==3.14.5, tox==3.14.6, tox==3.15.0, tox==3.15.1, tox==3.15.2, tox==3.16.0, tox==3.16.1, tox==3.17.0, tox==3.17.1, tox==3.18.0, tox==3.18.1, tox==3.19.0, tox==3.20.0, tox==3.20.1 and tox==3.21.0 because these package versions have conflicting
[16:13:36] dependencies.
[16:13:43] Why didn't that paste
[16:13:53] But that's obvious as you can only have one tox version
[16:15:00] cdanis: "it works"™ ;)
[16:15:11] works locally and with a recheck works, could have been a temporary issue
[16:15:26] ahahah okay
[16:15:32] if it works with a recheck that is good enough for me
[16:15:46] sorry for the trouble
[16:19:14] chaomodus: the automation-framework project is suffering some kind of puppet calamity; do you have time to work on repairing that or shall I? (If I do, I'll start with switching the puppetmasters over to being managed by a central master rather than self-puppetized)
[16:25:42] hm, what is the phab tag for the infra-foundations team? I bet cdanis knows
[16:26:07] there is none
[16:26:08] there is not one :)
[16:26:49] huh
[16:26:56] well, would one of you be willing to file this as needed? https://phabricator.wikimedia.org/T271827
[16:26:57] thanks!
[16:35:06] looks filed correctly to me, just adding a tag
[16:39:20] thank you
[16:39:50] just trying to avoid it showing up in Andre's untagged-ticket query :)
[17:46:52] asking in here too: if everyone is watching the quarterly review, then it will wait, but if someone is around to be my sre buddy for a backport of a train blocker, please lmk
[18:26:18] apergos: do you still need a buddy?
[18:26:27] yes, I sure do!
[18:26:42] I think people might have been at the quarterly review
[18:26:49] legoktm:
[18:26:51] * legoktm points to self :)
[18:26:55] sweet!
[18:27:00] ok lemme get stuff set up here
[18:27:29] what's the patch?
[18:28:22] it's already backported, and it's https://gerrit.wikimedia.org/r/c/mediawiki/core/+/655671
[18:28:44] but git status in the wmf.26 branch in mw-staging has stuffs
[18:28:45] meh
[18:29:14] modified: extensions/VisualEditor (new commits)
[18:29:14] modified: extensions/Wikibase (new commits)
[18:29:16] as well as
[18:29:28] Your branch is ahead of 'origin/wmf/1.36.0-wmf.26' by 1 commit.
[18:29:37] so i don't love any of that, nor am I sure what to do about it
[18:29:39] hey Reedy
[18:29:49] how about some deploy assist here
[18:30:02] put your pudding down and tell me what we can do about this
[18:31:18] * legoktm logs in
[18:31:39] I suppose for the extensions we can just leave them, since this doesn't touch either, and I can scap sync-file because it's just the one file /includes/api/ApiQueryInfo.php being altered
[18:31:51] (I can send around the phpunit test change after
[18:31:52] )
[18:31:53] the 1 commit is a security patch, that's to be expected
[18:32:15] ok, without doing it yourself, lemme know what I should be looking for to figure out how to handle it
[18:32:21] and then, how to handle it :-D
[18:32:58] I ran git log HEAD...origin/wmf/1.36.0-wmf.26 which shows that the only diff from Gerrit is a security patch
[18:33:20] I just git log in there to see the commit message, same
[18:33:26] but I don't know what to do about it
[18:33:33] oh
[18:33:36] you just rebase
[18:34:00] so git fetch, check the log that there's nothing unexpected, get rebase?
[18:34:02] *git
[18:34:20] https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Step_2:_get_the_code_on_the_deployment_host
[18:34:35] I've been camped on that page for an hour now :-P
[18:34:38] heh
[18:34:42] just making sure!!
[18:34:45] git rebase origin/wmf/...
[18:36:01] at that point /srv/mediawiki-staging has your patch in tree, you can now ssh to mwdebug1001 (or whichever) run `scap pull` to test your patch on that machine
[18:36:47] looks good in -staging. gonna do that next
[18:38:22] what directory ought I to be in on the mwdebug host? /srv/mediawiki? the specific branch?
[18:38:54] doesn't matter
[18:39:23] I run it from my home dir
[18:40:05] `scap pull` rsyncs all of `/srv/mediawiki-staging` from the deploy host so it updates both branches and wmf-config
[18:41:28] so I see
[18:41:35] sorry but I don't have my query lined up
[18:41:50] sec
[18:46:25] ok I think I got a request through that should have given an error
[18:46:33] repeating something like what was on
[18:46:40] https://phabricator.wikimedia.org/T271815
[18:46:44] and no error, which is good
[18:46:55] ugh, I should be saying this in -operations
[20:36:02] Heads up, we're likely to enable the last step of 'coalesceKeys' in Memcached for MW today. https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/607155
[20:36:25] already live since June 2020 for all wikis, but not yet for cross-wiki/shared/global cache keys
[20:44:51] This bucket of mw-memcached keys represents about 40-50% of memc gets, and about 50-60% of memc sets.
[20:45:31] would feel better if one of effie, rzl or _joe_ stand by, can schedule for later in the week as well, not a rush.
[20:45:51] I'm around 👋
[20:46:07] late in the day for the other two though
[20:47:38] rzl: sure, one can suffice, if you're comfortable with it now.
[20:48:34] it will effectively purge/rehash this portion of cache keys.
[20:48:48] yep, proceed when ready
[20:52:07] Krinkle: I am around too
[20:52:22] awesome, staging in a minute waiting for CI
[20:52:24] I stand corrected :)
[20:55:21] hehe
[20:56:54] would a few folks here be willing to click on https://toolserver.org/ and tell me if it loads a page or times out?
[20:57:09] It times out for bd.808 but we really want to blame his ISP
[20:57:40] rzl: effie thx, deploy cancelled, we've hit a bug that we thought was gone.
[20:57:55] Krinkle: ack, sorry to hear
[20:58:00] :(
[20:58:01] andrewbogott: WFM
[20:58:15] thank you rzl
[20:58:38] Krinkle: ping us when you attempt to deploy this again, I am looking forward to it
[21:14:15] andrewbogott: works here. I get the "we've moved" page.
[21:14:28] RhinosF1: great, thank you
[21:14:51] Np
[22:16:47] andrewbogott: toolserver.org times out for me (from AS20412).
[22:18:50] Times out from AS23005 as well.
[22:19:38] Works from AS49544.
[22:20:00] Lemme know if you want traceroutes from any of the above.
[22:25:41] Working client is connecting via esams, non-working ones {Level3,Telia}→ulsfo→codfw→eqiad.
[22:30:55] dpifke, andrewbogott: sticking it into https://tools.keycdn.com/ping shows failures from San Francisco, Singapore, Sydney, Tokyo, Bangalore but it works from Frankfurt, Amsterdam, London, New York, Dallas
[22:40:59] what the heck?
[22:41:17] Running that tool's /traceroute shows the same last IP
[22:41:29] * andrewbogott tries some other domains
[22:41:50] Which points to irb-1102.cloudsw1-c8-eqiad.wikimedia.org
[22:42:10] (208.80.154.211)
[22:43:17] The working ones go through 208.80.154.213 which is irb-1103.cloudsw1-d5-eqiad
[22:43:24] andrewbogott: does that make sense?
[22:44:11] I understand what you're saying but have no idea why it's acting that way :/
[22:44:20] Other wmcs domains seem fine...
[22:44:55] I know nothing about networking. I just know that tool exists.
[22:45:35] If there's a task you want me to dump the actual traceroutes on then I can
[22:46:05] seems to be just that one IP that's cursed
[22:46:07] I'll start a task
[22:47:48] RhinosF1: https://phabricator.wikimedia.org/T271867
[22:48:01] Looking
[22:51:01] Can confirm traceroute to tools works OK on the hosts where toolserver.org times out. I'll add my traceroutes to the task.
[22:52:44] Possibly an rp_filter issue if it's host-specific?
[22:52:57] andrewbogott: left a big paste + comment
[22:53:51] thanks
[22:54:31] hm, that IP seems to be out of range for the subnet it's supposed to be in
[22:54:36] so that would cause all kinds of hilarity
[22:54:53] All kinds of hilarity describes this week
[22:54:57] https://www.irccloud.com/pastebin/yCB5W7ou/
[22:55:12] pretty sure 185.15.56.245 is not in /25
[22:56:59] /25 ends at .127
[22:57:16] https://www.ultratools.com/tools/netMaskResult?ipAddress=185.15.56.0%2F25
[22:57:22] yeah
[23:02:02] hm… somehow this is also associated with 185.15.56.240/29 which contains that IP but seems like a weird special case
[23:08:15] RhinosF1: I know at least a little bit about what's happening now. Thank you!
[23:09:09] No problem! I'll be asleep as soon as I work out if a 3-month-old bug still makes sense
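
A few sketches of the commands behind the threads above follow; none of them are quoted from the log.

The tox-docker failure at 16:13 means pip's resolver tried every tox release in the allowed range and none satisfied all constraints together; since a recheck later passed, a local run is enough to confirm it was transient. A minimal sketch, assuming a fresh checkout of the klaxon repo (the clone URL and venv setup are assumptions) and that the CI job essentially runs tox against the repo's tox.ini:

    git clone https://gerrit.wikimedia.org/r/operations/software/klaxon   # assumed clone URL
    cd klaxon
    python3 -m venv .venv && . .venv/bin/activate
    pip install tox    # pip resolves exactly one tox release here
    tox                # if the envs pass locally, the CI error was likely transient
                       # (e.g. the PyPI CDN trouble mentioned at 16:10)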
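The rebase-and-test flow legoktm walks apergos through around 18:32 amounts to roughly the following. A sketch only: the staging path and the debug-host FQDN are assumed from convention, not quoted in the log.

    # on the deployment host, in the wmf.26 checkout under /srv/mediawiki-staging
    git fetch
    git log HEAD...origin/wmf/1.36.0-wmf.26   # only the local security patch should differ
    git rebase origin/wmf/1.36.0-wmf.26       # brings the backport from Gerrit into the tree

    # on a debug host (any directory works; scap pull rsyncs all of /srv/mediawiki-staging)
    ssh mwdebug1001.eqiad.wmnet
    scap pull

    # after verifying there, a single-file change can go out with something like
    #   scap sync-file php-1.36.0-wmf.26/includes/api/ApiQueryInfo.php 'Backport for T271815'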
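The subnet arithmetic at the end checks out: 185.15.56.0/25 only spans 185.15.56.0 through .127, so .245 falls outside it, while 185.15.56.240/29 spans .240 through .247 and does contain it. A quick way to verify, assuming ipcalc (or any similar calculator) is installed:

    ipcalc 185.15.56.0/25     # HostMax 185.15.56.126, broadcast .127 -- so .245 is out of range
    ipcalc 185.15.56.240/29   # HostMin .241, HostMax .246 -- so .245 is in range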