[07:46:45] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Manange fundraising network elements from Netbox - https://phabricator.wikimedia.org/T377996#10257838 (ayounsi) It's quite a big task overall, splitting it into several well defined sub-tasks will make it easier to accomplish. For example...
[08:00:22] Mail, Infrastructure-Foundations, SRE, Wikimedia-Mailing-lists: Replace Exim on lists.wikimedia.org with Postfix - https://phabricator.wikimedia.org/T378021#10257862 (Peachey88)
[08:07:23] Mail, Infrastructure-Foundations, SRE, vrts, Znuny: Replace Exim on VRTS servers with Postfix - https://phabricator.wikimedia.org/T378028#10257878 (Peachey88)
[08:11:25] FIRING: SystemdUnitFailed: netbox_report_vlan_migration_run.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:16:36] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Manange fundraising network elements from Netbox - https://phabricator.wikimedia.org/T377996#10257896 (cmooney) >>! In T377996#10257838, @ayounsi wrote: > It's quite a big task overall, splitting it into several well defined sub-tasks wil...
[08:18:11] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Manange fundraising network elements from Netbox - https://phabricator.wikimedia.org/T377996#10257898 (ayounsi) > If we don't want to use dummy interface names I think the simple way forward is Option 1, which seems like a big improvement...
[08:23:40] ^ that should recover
[08:26:25] RESOLVED: SystemdUnitFailed: netbox_report_vlan_migration_run.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:27:55] told ya
[08:49:45] Mail, collaboration-services, Infrastructure-Foundations, SRE, and 2 others: Replace Exim on VRTS servers with Postfix - https://phabricator.wikimedia.org/T378028#10257925 (LSobanski)
[09:05:25] Mail, Infrastructure-Foundations: Create a mail address for Russian Wikipedia oversighters - https://phabricator.wikimedia.org/T378069 (MBH) NEW
[09:11:46] Mail, Infrastructure-Foundations: Create a mail address for Russian Wikipedia oversighters - https://phabricator.wikimedia.org/T378069#10258005 (Bugreporter) Open→Stalled What do you want? * If you want a mailing list, see #Wikimedia-Mailing-lists * If you want a VRT queue, then Phabricator is no...
[09:12:17] FYI, I'm going to reboot the netboxdb hosts in a bit
[09:12:45] netops, Infrastructure-Foundations, SRE: Automate interface configuration for pfw firewalls using Netbox data - https://phabricator.wikimedia.org/T378070 (cmooney) NEW p:Triage→Medium
[09:21:35] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Automate interface configuration for pfw firewalls using Netbox data - https://phabricator.wikimedia.org/T378070#10258033 (cmooney)
[09:21:39] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Manange fundraising network elements from Netbox - https://phabricator.wikimedia.org/T377996#10258034 (cmooney)
[09:21:45] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Manange fundraising network elements from Netbox - https://phabricator.wikimedia.org/T377996#10258049 (cmooney)
[09:25:32] Mail, Infrastructure-Foundations: Create a mail address for Russian Wikipedia oversighters - https://phabricator.wikimedia.org/T378069#10258054 (MBH) @Bugreporter I want to create an email address like //oversight-en-wp@wikipedia.org//, see https://en.wikipedia.org/wiki/Wikipedia:Oversight
[10:06:14] jhathaway: swraid vs hdraid is totally a service owner decision. The decision tree is something like this: the service owner asks for quotes of a specific config (hopefully one of the standards we have or a custom one if needed). For hosts with few disks (typically 2) we don't need a disk/raid HW controller so we don't have it and naturally they get SW raid.
[10:08:08] for those with many disks usually you're forced to get a controller to handle them all, and the controller comes with HW raid capabilities, and again the service owner can choose what they need. As a rule of thumb all the DBs have HW RAID10, distributed systems that handle redundancy by themselves have either JBOD or one RAID0 per disk (emulating jbod) although AFAIK they are trying
[10:08:14] to avoid this in favor of pure JBOD.
[10:08:48] and then there are some specific cases that use HW raid with a different setup
[10:11:53] an easy way to see which hosts have a controller of which family can be via puppetboard's fact raid_mgmt_tools. Or via cumin queries like: 'F:raid_mgmt_tools ~ megaraid'
[10:15:40] the other way ofc is via partman's preseed.yaml file in puppet :)
[10:18:05] but in partman you will not see what HW RAID config was made, just how the resulting disks are used.
[10:25:31] that's a great summary volan.s TIL
[10:30:46] thanks :)
[10:32:33] I'm afraid I don't think we have an easily discoverable way to get the raid config, apart from querying the controller ofc, the generic one I can think of is looking at the rack/setup/install related task (click on procurement task from Netbox then click on the related setup task) that has a line for Partitioning/Raid. See T306928 for example
[10:32:34] T306928: Q3:(Need By: TBD) rack/setup/install db1185.eqiad.wmnet - db1195.eqiad.wmnet - https://phabricator.wikimedia.org/T306928
[10:35:50] * volans|off fading back off
[11:40:21] maybe it's exposed via redfish?
[12:14:03] I'm going to start messing around with idp-test1004, it shouldn't affect anything
[12:27:42] I am getting authentication errors trying to log in to my developer account (https://idp.wikimedia.org/login)
[12:27:54] Possibly started happening after resetting my password
[12:28:07] I have tried resetting my password a few times but none of them work
[12:28:11] Username: Dom Walden
[12:28:18] I'll take a look
[12:28:21] Thanks
[12:29:11] Did you use the "Forgot password" or did you update it from the "Change password page"?
[12:29:57] Okay, the password reset
[12:30:27] I used the "Forgot your password?" link
[12:35:02] How far do you get when trying to reset the password? You get an email, and enter a new password?
[12:36:48] I got the email, entered a new password and it appeared to submit correctly
[12:41:52] On the LDAP level the change seems to have happened, I can see the following in OpenLDAP:
[12:41:54] pwdChangedTime: 20241024120745Z
[12:41:56] modifiersName: cn=bitu,ou=profile,dc=wikimedia,dc=org
[12:41:57] modifyTimestamp: 20241024120745Z
[12:42:25] So far so good, the IDM logs a bit too little about the event
[12:45:10] I just tried to log in and it returned: "500 Internal Server Error - Your request caused an error on the server."
[12:45:39] The url says idp.wikimedia.org or idm.wikimedia.org?
[12:46:27] https://idm.wikimedia.org/complete/cas/
[12:46:54] Okay, perfect... well not perfect, but you know
[12:48:29] This page: https://idp.wikimedia.org/login says: Log In Successful?
[12:49:57] Yes
[12:50:13] Good, then it's "just" the IDM that's broken
[12:55:51] Okay, I don't think it's just you
[12:56:01] moritzm: Did you get an error accessing the IDM?
[13:07:07] dwalden: Could you try going to https://idm.wikimedia.org/keymanagement/
[13:07:32] Just to see if it's an issue with one page or something in general
[13:08:27] I go to the idm landing page, click the sign in button, was redirected to https://idm.wikimedia.org/ldapbackend/properties/
[13:08:35] If I go to https://idm.wikimedia.org/keymanagement/ again it seems to load fine
[13:08:56] Okay, but https://idm.wikimedia.org/ldapbackend/properties/ does not?
[13:09:20] Sorry, https://idm.wikimedia.org/ldapbackend/properties/ also loaded fine
[13:10:04] Oooh... I think you tried to sign in right as I did the server switch over. I don't feel it should break that much though
[13:11:01] I did find one issue though, it shouldn't make a difference, but it's still a bug, so that's good to find
[13:12:33] In any case it's working now, so I have a bit more time to debug :-)
[13:18:38] Thanks for investigating
[13:18:56] Anytime, thank you for reporting.
[13:19:32] We just updated the software today, so the risk of me breaking something is very real :-)
[13:21:07] And I did find a bug, so everyone wins
[13:36:47] slyngs: idm worked and works for me
[13:36:56] thanks for the software raid overview volans|off
[13:39:52] moritzm: I just noticed an exception right after a sign in from you, but that happens for everyone due to this: https://gerrit.wikimedia.org/r/c/operations/software/bitu/+/1082782