[02:30:26] harej, sorry, was AFK. Yeah. Johan worked with us briefly on managing outreach around Wikilabels campaigns.
[02:30:47] Keegan and I only worked on the CR perspective of how products should be socialized and deployed.
[02:31:45] Keegan might have worked with us on following his process for Jade. I don't clearly remember.
[02:37:32] Okay, thank you
[08:18:30] PROBLEM - puppet on ORES-web02.Experimental is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Service[uwsgi],Service[uwsgi-ores]
[08:46:30] RECOVERY - puppet on ORES-web02.Experimental is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[09:59:19] o/
[11:44:58] akosiaris: hey, when you have some time, can you tell me about config management in kubernetes? I know the configs are stored in etcd, but is there a version control system that writes to it? Will they get translated to env vars in kubernetes, or should I use an etcd client? (I guess it's the latter)
[12:12:50] wikimedia/wikilabels#471 (docker - 05b5d15 : Amir Sarabadani): The build was fixed. https://travis-ci.org/wikimedia/wikilabels/builds/495955117
[12:50:28] Amir1: https://kubernetes.io/docs/tutorials/configuration/
[12:50:43] we already use configmaps extensively
[12:51:24] using helm (https://helm.sh/) via templates to populate them and instantiate them
[12:52:04] Amir1: have a look at https://releases.wikimedia.org/charts/mathoid/templates/. The configmap and config.yaml templates have the actual file that is to be mounted in the pod and accessed by the pod
[12:52:30] deployment.yaml has the small little bits under volumeMounts: to glue it all together
[12:52:51] the TL;DR is that from the application's PoV it's still files
[12:53:03] so no need for you to do anything on that front
[12:53:42] Thanks. The thing is how the service can read these configs
[12:53:44] we are also working on keeping the helm values.yaml files in a git repo, to allow code review, history and all the other goodies a VCS gives us
[12:54:09] it's still files
[12:54:36] overall nothing changes for the service itself
[12:54:52] Noted
[12:55:00] I should look into this a little bit more
[12:55:23] the only thing that is important to remember is that there is a 1MB limit on those config files
[12:55:36] but ORES does not suffer from excessively long config files so no prob there
[12:57:05] The current one is actually pretty big
[12:57:18] how are we going to dissolve mediawiki-config IS.php then :D
[12:59:52] o/
[13:00:15] Today is a snow day and that means I need an extra couple of hours for errands tonight, so I'm starting early today.
[13:12:52] Amir1: yeah that one is an interesting one
[13:14:45] akosiaris: I have lots of questions and ideas, do you think we can talk for half an hour?
[13:14:52] sure
[13:16:17] akosiaris: is it fine in hangout?
[13:16:27] yup, want me to send a link?
[13:16:35] sure
[13:45:23] (PS1) Halfak: Updates submodules for rebuilt models + kowiki [services/ores/deploy] - https://gerrit.wikimedia.org/r/491747
[13:45:38] (PS2) Halfak: Updates submodules for rebuilt models + kowiki [services/ores/deploy] - https://gerrit.wikimedia.org/r/491747 (https://phabricator.wikimedia.org/T215406)
[13:45:55] ORES, Scoring-platform-team (Current), Patch-For-Review, User-Ladsgroup: Rebuild all models with revscoring 2.3.3 - https://phabricator.wikimedia.org/T215406 (Halfak) https://github.com/wikimedia/ores-wmflabs-deploy/pull/104
[13:46:14] Amir1, https://gerrit.wikimedia.org/r/#/c/mediawiki/services/ores/deploy/+/491747
[13:46:44] halfak: yup, I was working on it. There is an issue
[13:46:46] Also https://github.com/wikimedia/ores-wmflabs-deploy/pull/104
[13:46:50] Gotcha.
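The point made above, that a ConfigMap reaches the pod as an ordinary mounted file so the service needs no etcd client, can be illustrated with a minimal Python sketch. It assumes a hypothetical mount path (`/etc/ores/config.json`) and uses JSON only so that it runs with the standard library; the real chart mounts whatever path `deployment.yaml` declares under `volumeMounts:`, and ORES's actual config is YAML. The size check reflects the 1 MiB cap Kubernetes documents for ConfigMap data.

```python
import json

# Kubernetes documents a 1 MiB cap on the data stored in a single ConfigMap.
CONFIGMAP_LIMIT_BYTES = 1024 * 1024

# Hypothetical mount point; in practice this is whatever path the chart's
# deployment.yaml mounts the ConfigMap volume at.
CONFIG_PATH = "/etc/ores/config.json"


def fits_in_configmap(payload: bytes) -> bool:
    """Pre-deploy check: does the serialized config stay within the cap?"""
    return len(payload) <= CONFIGMAP_LIMIT_BYTES


def parse_config(raw: bytes) -> dict:
    """Decode config bytes (JSON here; a YAML file would need PyYAML instead)."""
    return json.loads(raw)


def load_mounted_config(path: str = CONFIG_PATH) -> dict:
    """Read config exactly as on a VM: open the file, no etcd client involved."""
    with open(path, "rb") as fh:
        return parse_config(fh.read())
```

Because the pod only ever sees a file, the same `load_mounted_config` call would work unchanged on a VM, under docker-compose, or in Kubernetes; only the path differs.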
[13:46:58] draftquality hasn't been updated from phabricator
[13:47:12] PROBLEM - ores grafana alert on icinga2001 is CRITICAL: CRITICAL: ORES ( https://grafana.wikimedia.org/d/000000255/ores ) is alerting: ORES CPU usage alert codfw.
[13:47:13] gerrit is not updated but the phabricator mirror is updated
[13:47:43] Oh weird.
[13:47:51] (CR) Ladsgroup: [V: +2 C: +2] Updates submodules for rebuilt models + kowiki [services/ores/deploy] - https://gerrit.wikimedia.org/r/491747 (https://phabricator.wikimedia.org/T215406) (owner: Halfak)
[13:48:27] I even removed the mirror URI from phabricator and added it again but it didn't work
[13:50:10] Amir1: so the phabricator mirror is fine?
[13:50:21] yup
[13:50:35] Amir1: What is the gerrit mirror? I didn't even know gerrit had mirrors
[13:50:57] it is mirrored to gerrit from phabricator. It's pushed from phab
[13:51:07] https://phabricator.wikimedia.org/source/draftquality/manage/uris/
[13:51:32] paladox: do you have any clue? ^
[13:51:32] It's observing wiki-ai/draftquality
[13:51:45] Maybe the redirect to wikimedia/draftquality is a problem?
[13:52:07] halfak: is phab looking at wiki-ai or wikimedia/?
[13:52:11] is gerrit a mirror too?
[13:52:14] halfak: no, in that case phab would be broken too but it's fine: https://phabricator.wikimedia.org/source/draftquality/history/master/
[13:52:18] Looks like editquality mirrors from wikimedia/...
[13:52:45] Oh I see. I misunderstood and thought phab was broken and gerrit was weird.
[13:52:46] halfak: plus it's observing from wikimedia: https://phabricator.wikimedia.org/source/draftquality/manage/uris/
[13:52:47] halfak: if phab is able to handle the redirect, gerrit should be able to as well, considering (imho) that's something handled by the protocol?
[13:52:52] not wiki-ai
[13:53:45] Amir1, yeah. I was looking at drafttopic I guess.
[13:53:46] why are there 2 URIs pointing to gerrit, Amir1?
[13:54:23] (phab is basically the proxy that pulls from github and pushes to gerrit). I have two ideas on what's wrong: 1- a network failure; a random commit should fix this. 2- the gerrit repo doesn't grant push access to phab's credentials
[13:54:38] RECOVERY - ores grafana alert on icinga2001 is OK: OK: ORES ( https://grafana.wikimedia.org/d/000000255/ores ) is not alerting.
[13:54:41] Amir1: how's phab pushing to gerrit?
[13:54:49] Zppix, with git
[13:54:51] Amir1: is it using some sort of bot account?
[13:54:58] Zppix: I disabled one to do a classic "turning it off and on again"
[13:55:15] halfak: well of course :P
[13:55:18] Zppix: yes, its credentials are stored in K18
[13:55:31] https://phabricator.wikimedia.org/source/draftquality/uri/view/21409/
[13:55:34] Amir1: what's the bot username so i can look at gerrit perms?
[13:56:04] https://phabricator.wikimedia.org/K18
[13:56:08] "phab"
[13:56:37] Amir1: looking
[13:57:27] Amir1: it has push rights
[13:57:37] Amir1: have you guys tried a dummy commit?
[13:58:34] I'm on it
[13:59:19] Amir1: if that doesn't work you may want to ask releng if it's possibly a phab issue, 'cause gerrit looks fine as far as i can see (which isn't much)
[14:00:05] I just "scheduled an update" with phab.
[14:00:10] Last update was over an hour ago.
[14:00:18] halfak: tried that already, didn't work
[14:00:23] That was me
[14:00:25] Over an hour ago?
[14:00:28] Gotcha.
[14:00:43] Yup, I've been trying to fix this for hours now
[14:00:45] Did you do that after you turned it off and on again ;P
[14:01:02] after :D
[14:01:06] Boo
[14:01:14] Amir1: do you know when it stopped working?
[14:01:24] after Jan 29
[14:01:32] based on commit log
[14:01:46] I just pushed a dummy commit
[14:01:51] we now just wait
[14:04:26] should the uri have .git at the end of it?
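The mirroring chain described above (Diffusion pulls from GitHub and pushes to gerrit) can be checked from the outside by comparing branch heads at both ends. This is a minimal Python sketch, assuming `git` is on the PATH; `git ls-remote --heads <url>` prints one `<sha>\t<ref>` line per branch. It only detects lag; it cannot tell you why a push failed (here, gerrit turned out to be rejecting the credentials).

```python
import subprocess


def parse_ls_remote(output: str) -> dict:
    """Map ref name -> commit SHA from `git ls-remote` output ("<sha>\t<ref>")."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        sha, ref = line.split("\t", 1)
        refs[ref] = sha
    return refs


def out_of_sync_refs(upstream: dict, mirror: dict) -> dict:
    """Refs whose SHA differs or is missing in the mirror: ref -> (upstream, mirror)."""
    return {
        ref: (sha, mirror.get(ref))
        for ref, sha in upstream.items()
        if mirror.get(ref) != sha
    }


def remote_refs(url: str) -> dict:
    """Query a remote's branch heads (needs git and network access)."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", url],
        capture_output=True, text=True, check=True,
    )
    return parse_ls_remote(result.stdout)
```

Usage would look like `out_of_sync_refs(remote_refs(github_url), remote_refs(gerrit_url))`, where the two URLs are assumptions built from the repo names in this log (GitHub `wikimedia/draftquality`, gerrit `scoring/ores/draftquality`), not verified clone URLs. An empty dict means the mirror has caught up.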
[14:05:32] nope
[14:06:11] halfak: I'm going for lunch but in the meantime we have this: https://github.com/wikimedia/wikilabels/pull/254
[14:06:28] it basically makes running wikilabels as easy as doing "docker-compose up"
[14:06:36] Amir1, before you go, did you make a task for the gerrit mirror issue?
[14:06:42] not yet
[14:06:49] OK. I'll make one and add some notes.
[14:06:53] thanks
[14:06:54] Have a good lunch :)
[14:07:18] Thanks. Burggggerrr 🍔🍔🍔🍔
[14:17:50] ORES, Scoring-platform-team (Current), Release-Engineering-Team, draftquality-modeling, artificial-intelligence: Gerrit mirror from phab broken for source/draftquality - https://phabricator.wikimedia.org/T216616 (Halfak)
[14:18:21] ORES, Scoring-platform-team (Current), Release-Engineering-Team, draftquality-modeling, and 2 others: Gerrit mirror from phab broken for source/draftquality - https://phabricator.wikimedia.org/T216616 (Zppix)
[14:19:45] ORES, Scoring-platform-team (Current), Release-Engineering-Team, draftquality-modeling, and 2 others: Gerrit mirror from phab broken for source/draftquality - https://phabricator.wikimedia.org/T216616 (Halfak) @Ladsgroup tried turning mirroring off and on again. We also tried scheduling an updat...
[14:21:33] ORES, Scoring-platform-team (Current), Release-Engineering-Team, draftquality-modeling, and 2 others: Gerrit mirror from phab broken for source/draftquality - https://phabricator.wikimedia.org/T216616 (Halfak)
[14:21:53] OK I think I have most of the details in there.
[14:22:31] I'm starting to get annoyed with all of these issues around mirroring when it comes to deploys. LFS mirroring has problems. Random repos refuse to mirror.
[14:22:33] Arg.
[14:24:05] halfak: This is why i hate git, let's just log changes on a notepad and call it a day eh?
[14:24:51] ha! I'd definitely prefer git over that :P
[14:25:34] halfak: so i should forget updating this notepad with all the changes logged?
[14:55:15] ORES, Scoring-platform-team (Current), Release-Engineering-Team, draftquality-modeling, and 2 others: Gerrit mirror from phab broken for source/draftquality - https://phabricator.wikimedia.org/T216616 (MarcoAurelio) So, Phab is the one out of sync, if I read this right?
[14:57:58] halfak: so ^^ it's Phab Diffusion that is out of sync, right?
[14:58:08] * hauskatze is trying to take a look
[14:58:15] Nope. Diffusion is fine. It's gerrit.
[14:58:23] Rather it could be that diffusion is not pushing to gerrit.
[14:58:25] hauskatze: no, gerrit is not getting changes from phab
[14:58:35] But diffusion has recent commits.
[14:58:51] aha, so I guess it might be some credential issue
[14:59:09] Was thinking that. We're using the exact same credentials in other repos, so I'm unsure.
[14:59:27] Regrettably, there's no error that is reported to us, so it's hard to say what is breaking down.
[14:59:39] I mean, the credentials phab uses to push to gerrit
[14:59:40] I think we might need releng to look into the diffusion logs to figure out what isn't happening.
[14:59:44] nothing on your end I think
[15:00:08] Right. We're using k18 which is managed by phab. Maybe draftquality expects different credentials. Hmm.
[15:00:25] halfak: i don't even know if it's diffusion per se, it could be gerrit auth, but who knows... if it's not broken then someone isn't doing their job right :P
[15:00:52] * halfak runs to a meeting.
[15:01:32] what is the gerrit repo for draftquality?
[15:03:04] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @Tonina_WMDE & @tgr - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:03:38] hauskatze: scoring/ores/draftquality
[15:03:54] ack
[15:04:06] looks correctly configured to get changes
[15:04:15] I think k18 might be out of sync
[15:05:11] hauskatze: its possible is there a way to make sure it's up to date then force it to try to update the repo again?
[15:05:15] is it*
[15:06:06] I know I can force replication gerrit-github and gerrit-phab but not vice-versa
[15:06:13] maybe there's a possibility
[15:06:29] hauskatze: isn't there a button on phab that makes it update?
[15:06:39] yup, I clicked it already
[15:07:07] hauskatze: did you ensure the credentials are up-to-date? (I'm trying to catch up)
[15:08:09] Zppix: I cannot see the contents of the credentials
[15:08:20] hauskatze: join the club :D
[15:08:20] only SRE have them
[15:08:37] hauskatze: so will SRE have to get involved then?
[15:08:42] if what you do doesn't fix it
[15:08:44] I can see that there are credentials and I can use them, but I cannot examine their contents
[15:08:57] I could make some calls :)
[15:09:09] but Phabricator is twentyafterfour
[15:09:17] hauskatze: just lmk if you need anything, I may be able to help until aaron gets back from his meetings
[15:09:28] (but i have limited access)
[15:09:34] ok, I'll try something, no promises though
[15:09:46] hauskatze: it's not like you can break it anymore :P
[15:15:39] using K19 and updating the repo didn't help
[15:15:56] maybe you could make a dummy commit upstream and see if that kicks the update?
[15:16:06] assuming you can merge on github
[15:20:45] we have tried a dummy commit
[15:21:03] I have someone taking a look too
[15:23:35] ORES, Scoring-platform-team (Current), Release-Engineering-Team, draftquality-modeling, and 2 others: Gerrit mirror from phab broken for source/draftquality - https://phabricator.wikimedia.org/T216616 (MarcoAurelio) I've tried using {K19} with URI `ssh://phabricator@gerrit.wikimedia.org:29418/sco...
[15:31:13] thanks hauskatze
[15:32:10] np
[15:32:48] did git lfs give you problems in the past? I recall hearing complaints about that but I can't quite remember
[15:33:18] hauskatze: I know halfak brought it up earlier that lfs has been nothing but issues, so i'd assume so, but i'd have to look through phab
[15:33:52] hauskatze: i see 24 said it's gerrit denying phab
[15:34:24] yup
[15:34:30] so it's the credentials
[15:34:35] I think that's easier to fix
[15:34:43] hauskatze: (TM)
[15:34:56] ORES, Scoring-platform-team (Current), Release-Engineering-Team, draftquality-modeling, and 2 others: Gerrit mirror from phab broken for source/draftquality - https://phabricator.wikimedia.org/T216616 (mmodell) It seems that gerrit is rejecting phabricator's credentials. I'm not sure what changed...
[15:38:51] ORES, Scoring-platform-team (Current), Release-Engineering-Team, draftquality-modeling, and 2 others: Gerrit mirror from phab broken for source/draftquality - https://phabricator.wikimedia.org/T216616 (MarcoAurelio) @mmodell if K18 credentials are being rejected as well maybe we should regenerate...
[15:52:43] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @Tonina_WMDE & @tgr - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:54:23] Zppix, it's not really git LFS that is an issue. But LFS mirroring to gerrit has been broken.
[15:54:31] gerrit has been the headache it seems
[15:54:46] hauskatze, ^
[15:54:54] it looks like this time it's gerrit refusing the phabricator credentials
[15:55:05] so no lfs this time, apparently
[15:55:12] it's being looked into afaics
[16:08:37] halfak: Ah, that's what it was, i knew it was something with git lfs
[16:09:07] Right. The big concern I have with git LFS is how it can leave your repo in a messy state if it fails.
[16:09:50] halfak: too bad there isn't a less craptastic alt
[16:11:39] Actually, LFS is head and shoulders better than the alternatives. Ha!
[16:14:15] halfak: that's... concerning lol
[16:14:21] I've never worked with lfs, nor am I sure how it really works, but that's for another debate
[16:14:23] :)
[16:14:41] the repos I work on are small compared to the ORES ones, so we probably don't need it
[16:14:45] hauskatze: i honestly don't see a difference but what do i know
[16:15:08] hauskatze, essentially, it's a CDN that is tied to git commands/commits. When it works, it works beautifully. You can version large files along with code without having to download the entire history of large files when cloning.
[16:15:21] aha
[16:15:39] All our models, deployment wheels, and data-assets are in LFS. But all of the code is in regular git.
[16:15:44] but aaron i just got done buying a whole datacenter so i could clone the repos :P
[16:16:44] now they'll ask you for funding Zppix :P
[16:17:01] hauskatze: you think tech has some room in its budget?
[16:17:26] I don't know
[16:17:48] hauskatze: someone's gotta pay for the dc i bought :P
[16:17:53] anyway...
[16:18:05] so what's the plan with the credentials?
[16:18:24] I think 20after4 is/was/will take a look
[16:29:03] halfak: SoS, anything to report?
[16:30:07] Amir1, checking..
[16:30:32] Oh! We're blocking Growth. Still working to deploy the itwiki goodfaith model fix.
[16:38:42] Noted
[18:07:46] Network is still failing. Won't be able to join bs call
[18:10:25] I'm working on setting up tethering.
[18:11:04] Oh I think it worked
[18:12:53] I'm now in the BS call if y'all are interested.
[18:19:35] * harej is reading a short novel about the Cloud outages
[18:25:34] okay, having read that novel, I will take a shower and then follow up with a nice speech. Very literary morning.
[18:27:39] I need to be afk
[18:27:42] I'll be back
[18:36:51] Whoops. Looks like I don't have time for lunch today.
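The "CDN tied to git" description above comes down to pointer files: with Git LFS, the commit stores a short text stub, and the large blob itself lives on the LFS server, addressed by its SHA-256 and fetched at checkout. That is why a clone does not pull the full history of every model file. Below is a minimal Python sketch of the pointer format as described in the Git LFS v1 spec; it illustrates the format only and is not how the `git-lfs` client is implemented. One plausible symptom of the "messy state" mentioned earlier is a working-tree file that still contains pointer text instead of the model binary, which `is_pointer` can spot.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def make_pointer(content: bytes) -> str:
    """Build the text stub that regular git history stores for a large file."""
    oid = hashlib.sha256(content).hexdigest()  # content address on the LFS server
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(content)}\n"


def parse_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def is_pointer(text: str) -> bool:
    """True if a checked-out file is still an unresolved LFS pointer."""
    return text.startswith(f"version {LFS_SPEC}\n")
```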
[18:36:58] I'll be grabbing something during our sync meeting.
[18:47:51] Amir1 & harej: Since we covered so much in the staff meeting, maybe we can also skip the sync make-up meeting. What do you think?
[18:48:09] Alternatively, Amir1 and I could use the time to work out consistency issues in our code bases.
[18:49:42] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL - No data received from host
[18:50:03] Up for me
[18:50:44] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 540 bytes in 0.347 second response time
[18:51:34] PROBLEM - ORES web node labs ores-web-02 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL - No data received from host
[18:51:55] You and Amir should take this time to work on... that
[18:52:00] harej, ^ is this cloud maintenance?
[18:52:24] Nagios? I don’t think we have a maintenance window right now
[18:52:49] Hmm
[18:53:48] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Cannot make SSL connection.
[18:54:16] https://ores.wmflabs.org/ is up for me. I can SSH to ores-web-02
[18:55:44] RECOVERY - ORES web node labs ores-web-02 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 540 bytes in 0.326 second response time
[18:55:54] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 540 bytes in 0.349 second response time
[18:56:01] -._o_.-
[18:56:44] Well looks like we don't have Amir1 so I'm jumping out of the call. Amir1, please ping when you get back.
[18:59:28] PROBLEM - ORES web node labs ores-web-01 on ores.wmflabs.org is CRITICAL: CRITICAL - Cannot make SSL connection.
[19:00:36] RECOVERY - ORES web node labs ores-web-01 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 540 bytes in 0.372 second response time
[19:12:57] halfak: hey, I'm around now
[19:13:02] sorry I was caught up
[19:13:06] OK I'll jump in the call.
[19:14:02] On it
[20:19:15] I'll go eat something and take a break. Will be back
[20:54:41] halfak: scoring-team is the backlog you want me to groom, not scoring-team-current, right?
[20:54:51] (though there's nothing to groom at the moment)
[20:55:06] Right.
[20:58:28] halfak: do you want me to take a stab at prioritization as well, or is that too subjective?
[20:58:49] Yeah. Take a try. Post here and we can discuss. :)
[21:14:02] harej, I'm going to need to run out the door soon. Snowpocalypse is calling and I started early so I could run away early :)
[21:14:24] I don't think you're allowed to call it Snowpocalypse if a foot of snow is a regular thing for you
[21:14:47] When it snowed 18 inches in DC when I lived there, that was absolutely not normal, and the worst storm in multiple years.
[21:15:02] But, have fun! Work those arms!
[21:15:08] I imagine shoveling is good core exercise too.
[21:15:42] Yup. Lower back and hamstrings definitely.
[21:22:02] back
[22:14:41] Okay I'm gone for the day