[04:29:55] I will be restarting enwiki master in 30 minutes
[07:44:33] <_joe_> quick heads up: I'm disabling puppet on the codfw jobrunners, trying to move them to envoy
[07:46:11] ack
[07:47:28] <_joe_> it will take some time, these are the first servers I move over for the jobrunner role, so something might be wrong
[09:01:32] 198.35.27.0/24 is now anycasted from all our sites. Including the 198.35.27.27 authdns VIP that bblack configured
[09:10:14] wow
[09:11:18] XioNoX: Anycast VIPs will have a backend in every DC or will transport links be used?
[09:11:29] (I don't have much context sorry for the n00b question)
[09:11:54] I mean, what kind of service can have an anycast VIP?
[09:12:34] elukey: the main one is for https://phabricator.wikimedia.org/T98006
[09:13:05] and yes, each pop has 2 backend servers
[09:17:31] okok got it, so no extra transport usage
[09:18:06] yeah, it's basically to trust BGP to route the user to their closest server
[09:18:42] elukey: virtually any service can have one, depending on the needs, there are different challenges though. For example with external anycast, we can only advertise /24s
[09:19:09] so if DNS and service X are on the same /24 if you want to "depool" one, you have to depool both
[09:19:25] ack
[09:19:28] or temporarily route it internally, etc...
[09:52:34] heads-up: i am going to switch the backend of people.wikimedia.org to a buster machine. all your files have been copied. i tested 2 example dirs with httpbb
[10:03:42] I'm going to run a quick test on icinga2001 for T253292 with puppet disabled, no impact expected
[10:03:43] T253292: check_http and SNI support - https://phabricator.wikimedia.org/T253292
[10:14:51] much to my surprise, there doesn't seem to be ill side effects yet
[10:43:42] may I get a review on https://gerrit.wikimedia.org/r/c/operations/puppet/+/597755?
basically I need to take authdns1001 off the authdns_servers while it is offline
[11:40:00] anyone have any experience digging into structured facts with cumin queries?
[11:40:35] ugh
[11:45:16] cdanis: tl;dr not yet supported, IIRC it's a different API endpoint on puppetdb
[11:45:39] I am looking at the "query" "language" for fact-contents now
[11:45:48] with 4.0 there will be support for native types
[11:45:53] also volans|off please respect your own day off
[11:45:57] bool int etc...
[11:46:23] that's why I will just say the tl;dr and not the full thing :-P
[13:05:31] I think I asked this a few days ago, but - is pxe booting known to be broken in eqiad? I just tried on my fourth server there, and every time...
[13:05:33] https://www.irccloud.com/pastebin/zYKW9G82/
[13:05:47] Who is the right person to ask for help with troubleshooting this?
[13:12:15] hm… moritzm maybe? I suspect you updated the Buster image last week, maybe that's somehow related
[13:12:40] although I guess this is most likely a networking thing
[13:14:10] cdanis,XioNoX - I need to turn off for a bit netflow realtime in druid/turnilo, ok?
[13:14:21] max one hour I think
[13:14:34] elukey: okay, will we lose data? or will it just sit in kafka waiting to be consumed later?
[13:14:35] if it is not a good moment let me know
[13:14:51] (either way is fine imo, just curious)
[13:15:07] cdanis: it follows the lambda architecture, we have batch jobs that override the "realtime" segments hourly
[13:15:20] ah neat
[13:15:21] plus all the data is in hive/hdfs
[13:15:26] yeah go for it
[13:15:30] from there we index it in druid etc..
[13:15:33] so yes no data loss :)
[14:04:34] kormat: o/ - I am really interested in T252027, did you find anything new recently?
[14:04:35] T252027: debian-installer: partman doesn't allow lvm LVs to be reused when reimaging - https://phabricator.wikimedia.org/T252027
[14:04:48] elukey: kormat is off today
[14:06:09] ah okok :)
[14:06:51] I am facing a similar issue for Druid's migration to Stretch, and we'll also have to deal with preserving /srv for kafka
[14:07:06] not urgent, but I can help if needed
[14:10:21] elukey: he's been working on it, but in the last few days we had some fires, but I recall he said during this week he might have found a way of doing it, but I reckon he hasn't been able to test it yet
[14:10:34] ahh nice!
[14:10:36] thanks
[14:36:23] herron: hi there!
[14:46:44] <_joe_> herron, wiki_willy: I was thinking that Ryan and Leo could be put on a similar schedule re: watching the introductory videos and doing Q&A sessions; have you already scheduled them?
[14:46:56] <_joe_> if not we maybe should sync them
[14:50:51] hey arturo!
[14:52:19] _joe_: I know it's on Ryan's radar but not sure where it sits on his schedule at the moment. makes a lot of sense to combine them to me
[14:52:22] ryankemper: ^^
[14:53:05] <_joe_> herron: in theory he should receive instructions from his mentor (not sure if it's you or guillame)
[15:03:24] Q&A session itself is still to be scheduled afaik, so +1 for combining
[15:07:01] <_joe_> to be clear, I wasn't volunteering for guiding people through the onboarding process myself, I've done it enough with the previous rounds of hires.
I would just like to see the process we've defined move forward
[15:11:32] +1 to whatever you all feel is good, I'm new here
[15:11:37] ;-)
[15:13:05] Yeah I'm still working through the videos and I think we were gonna do a q&a session next week
[15:14:26] <_joe_> ryankemper: in theory you should watch (probably) 00 through 05 and then have a Q&A
[15:14:33] <_joe_> else it's too much material
[15:17:02] <_joe_> ryankemper: in case you didn't see it, the slides 00 have a youtube video https://www.youtube.com/watch?v=2Ieq9z6-m5I
[15:17:44] I should (re-)watch some of those videos
[15:18:04] _joe_: that makes sense, I started with a couple like the cumin one that I wanted specific info on but beyond that I think it makes sense to do 0-5
[15:18:09] cuz it is quite a lot
[15:18:37] <_joe_> so, both for you and lmata, my suggestion is to watch the videos and pen down things you have doubts about
[17:02:08] jbond42: non-urgently, I'm curious why the yaml schema didn't catch my stupid mistake here: https://gerrit.wikimedia.org/r/597830
[19:41:51] I would not want to be in charge of demonstrating that a cloud VM can't be compromised via changes to hiera :)
[19:41:53] Sounds pretty easy to do?
[19:42:02] Just add an extra root private key through hiera right?
[19:42:10] uh, public key. what am I saying
[19:42:35] yeah, that would do it
[20:13:16] rzl: definitely a bug, will dig tomorrow and let you know
[20:14:36] thanks!
my best guess was something is mangling hyphens into underscores, maybe because of python identifier rules, but that's all I got
[20:16:15] I'm also wondering if `additionalProperties: false` is at the wrong level
[20:34:50] FAIL: unable to validate data.yaml:
[20:34:52] Additional properties are not allowed ('ssh-keys' was unexpected)
[20:37:45] so I think something is wrong at another level; notably I don't see it being invoked in https://integration.wikimedia.org/ci/job/operations-puppet-tests-buster-docker/3159/console which is jenkins output for https://gerrit.wikimedia.org/r/c/operations/puppet/+/597830
[20:37:52] er, sorry, for https://gerrit.wikimedia.org/r/c/operations/puppet/+/597618
[20:40:13] I suspect that taskgen.rb needs an update
[20:46:08] ... however, for whatever reason, run_ci_locally is bombing out for me right now
[20:57:11] cdanis: ahh so it's not being run at all by CI, guess there is something wrong in the tox config (taskgen doesn't call this script)
[20:57:15] thx
[20:57:47] jbond42: yeah I think I have patched it
[20:58:02] oh sweet i missed that thx
[21:04:18] +1 from me thanks
[21:09:47] run_ci_locally is months out of date from the prod image lol
[21:43:20] tbh i have not used it much but perhaps we could have a default or IMG_VERSION=latest?
[21:44:07] *IMG_VERSION=${IMG_VERSION:-"latest"}
[21:47:14] I think that might be reasonable but I'll let _joe_ weigh in
[21:47:27] it might also be possible to parse the committed releng config
[21:48:17] ack, sounds good
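
Note on the IMG_VERSION suggestion at 21:44:07: a minimal sketch of how that default would behave, assuming a wrapper script that reads IMG_VERSION from the environment. The function name resolve_img_version is hypothetical and is not taken from run_ci_locally; whether a "latest" tag actually exists for the CI image is also an assumption here.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only (not part of run_ci_locally).
# Prints the image version to use: the caller's IMG_VERSION if it is set
# and non-empty, otherwise the fallback "latest".
resolve_img_version() {
    # ${VAR:-default} expands to "default" when VAR is unset OR empty.
    printf '%s\n' "${IMG_VERSION:-latest}"
}

echo "using image version: $(resolve_img_version)"
```

The `:-` form falls back when the variable is unset or empty; the plain `-` form would fall back only when it is unset, so an accidentally empty IMG_VERSION would be passed through as-is.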