[10:50:51] [!!] The IP allocation migration to Netbox is happening in ~10 minutes. Please do not merge DNS patches until it's over. See https://wikitech.wikimedia.org/wiki/DNS/Netbox for context if you missed my email :)
[11:33:32] [!!] IP allocation migration to Netbox completed. You can resume normal merges in the DNS repo, but keep in mind https://wikitech.wikimedia.org/wiki/DNS/Netbox#Transition_FAQ
[12:08:02] jclark-ctr cmjohnson1, is the C2 and C3 PDU work still on for today?
[12:12:38] I am picking up where I left off last week, D4 & D5
[12:12:38] D4: iscap 'project level' commands - https://phabricator.wikimedia.org/D4
[12:12:38] D5: Ok so I hacked up ssh.py to use mozprocess - https://phabricator.wikimedia.org/D5
[12:14:35] @marostegui: I am out for a while, broke my hand, have surgery tomorrow
[12:15:04] cmjohnson1: thanks, per wiki_willy's email last week today we had C2 and C3, so I am wondering if that is happening or if it is postponed
[12:15:10] jclark-ctr: uffff, take care dude :*
[12:21:59] good luck and I hope everything goes well
[12:51:58] marostegui: sorry, it's racks d5 and d6 today
[12:52:34] do you have anything in d6? that is the 2nd one, getting ready to pull side b power in d5 now
[12:52:52] cmjohnson1: checking
[12:53:56] cmjohnson1: yeah, we have something on d6, let me stop mysql on that host
[12:54:32] cmjohnson1: so C2 and C3 won't happen today? Just to confirm I can bring those mysql back up
[12:55:05] Confirmed
[12:55:34] cmjohnson1: thanks. db1122 is stopped, so all good from my side with D6
[12:55:35] D6: Interactive deployment shell aka iscap - https://phabricator.wikimedia.org/D6
[13:32:43] d5 is all on the new pdu, moving to d6 now
[15:59:26] Hi all, got a bit of a weird issue I can't work out, so a second pair of eyes would be useful. I have tried to describe the issue at https://phabricator.wikimedia.org/T246890#6459242 but tl;dr is that to reboot the PDUs (at least the specific one tested) one needs to call the restart script from within the frame. Loading the form outside of the frame and posting seems to have no effect. Or more to the
[15:59:32] point, I can't work out how to post the form ...
[15:59:34] ... from python
[16:02:45] jbond42: is this a PDU that is safe to restart 'at will'?
[16:03:50] cdanis: yes
[16:06:41] FYI, I just completed the rollout of ferm rules on kafka jumbo, so now only a subset of production is able to pull data from it
[16:07:02] if you have an important client that uses jumbo please check that it works correctly
[16:07:20] I just checked on centrallog1001 and kafkatee seems to be working fine (but double checking is always appreciated)
[16:07:56] FYI Krinkle ^
[16:18:35] cdanis: yeah, it restarts the controller but doesn't touch the power ports
[16:19:26] jbond42: if I send a Chrome U-A it works
[16:20:41] can you get it to work every time? (and did you do any other updates to the snippet in the task?)
[16:21:34] cdanis: ^ ?
[16:22:20] sorry, you need *three* things to get it to work consistently
[16:22:22] wow
[16:22:56] just a minute, going to do one more reboot to see it work hopefully-consistently
[16:23:07] ack, thanks
[16:25:01] it didn't work once, and then it did
[16:26:03] we had over 50 crit alerts, acked all the ones on "testvirt" and the logstash hosts that had disabled notifications for some reason... the ps alerts are expected. more signal amongst the noise now.
[16:27:02] I can't tell if what I'm doing is just confirmation bias or not, lol
[16:27:32] cdanis: yes, I have had it work sporadically on other PDUs. This PDU is the one that seems to fail most reliably via python but not if you use the browser. The only time I have had this reboot with python was with this specific piece of madness, and only once: https://phabricator.wikimedia.org/P12581
[16:28:10] the "only once" is why I didn't want to post it earlier, as it could be some other random coincidence that got it to work
[16:30:41] okay
[16:31:16] * jbond42 would be very surprised if the Sec-Fetch-Dest header had an effect, as this is embedded FW, but was clutching at straws
[16:32:15] trying one more thing, I think I have it
[16:34:34] jbond42: updated your paste, this has worked three times in a row for me, and I believe it to be the minimal example
[16:36:04] four times in a row :)
[16:36:20] now I am kind of horrified to imagine what is going on inside the PDU's software
[16:36:58] I noticed that it worked every *second* time I tried it, and then in browser devtools that it was doing the fetch against /restarting.html twice for whatever reason, despite them both being 200s
[16:37:41] oh, it needs to escape the frame I think? interesting
[16:39:58] cdanis: thanks a lot, I had noticed a similar thing and wondered if there is some strange state machine. Doesn't help that I somehow managed to start working with the wrong endpoint, /Forms/reboot_1 instead of Forms/restart_1
[16:41:31] yeah, there almost must be some strange state machine
[16:41:48] and yeah :) fortunately I noticed that pretty quickly, easy thing to miss or typo
[16:42:41] ok, worked 5 times in a row
[16:46:56] thanks, I have done a bit of testing and think I can drop the first GET as well, so it seems it really is: POST Forms/restart_1 then GET /restarts.html
[16:48:30] yeah, I wasn't sure if the first was needed for the C5 cookie to be present on the first POST, but it seems like not
[16:49:27] might only be needed on a wednesday, who knows ;)
[16:51:28] ottomata: elukey: awesome (RE: kafka ferm rules), that's a great achievement
[16:51:52] also anecdotally confirmed from bastions and mw* hosts
[16:53:30] Krinkle: thanks! I hope to eventually move to a proper authn+z scheme with Kerberos, but it takes time :)
[19:11:03] anyone around to give https://gerrit.wikimedia.org/r/c/operations/puppet/+/627348/ a quick stamp?
[19:16:41] thanks cdanis
[19:21:51] rzl: their uid is actually sudhanshugautam
[19:23:42] jbond42: ah thanks, I copied it from the wrong ldap
[19:23:53] ah, good catch jbond42
[19:23:55] appreciate the catch
[19:24:38] np
[19:25:44] PS3 fixes
[19:26:00] +1
[19:28:24] anyone tried the Element client yet?
[19:32:05] just in-browser and iOS, but both were pretty nice
[19:53:46] cool
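
For reference, a minimal sketch in Python of the restart sequence worked out above: POST Forms/restart_1, then GET the restarting page from the same session, with a browser-like User-Agent. The PDU hostname, form field names, and the exact User-Agent string here are illustrative assumptions; the actual working snippet is in https://phabricator.wikimedia.org/P12581 and is not reproduced here.

# Sketch only: host, form fields, and UA string are placeholders, not the
# values from P12581.
import requests

PDU = "http://pdu-example.mgmt.example.org"  # hypothetical PDU address

session = requests.Session()
session.headers.update({
    # The PDU web UI only seemed to honour the restart when the request
    # looked like it came from a browser, hence a Chrome-style User-Agent.
    "User-Agent": ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/85.0 Safari/537.36"),
})

# Step 1: submit the restart form (note: Forms/restart_1, not Forms/reboot_1).
# The form body below is a placeholder; the real field names/values come from
# inspecting the PDU's restart page.
session.post(f"{PDU}/Forms/restart_1", data={"Restart": "Restart"}, timeout=10)

# Step 2: fetch the "restarting" page from the same session; per the discussion
# above, the controller restart only seems to take effect once this follow-up
# GET is made (the page path follows the devtools observation at 16:36:58).
session.get(f"{PDU}/restarting.html", timeout=10)

Per the testing described above, an initial GET of the main page (to pick up the C5 cookie before the POST) did not appear to be required, so it is omitted here.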