[06:17:55] good morning
[06:46:54] (CR) Nik Gkountas: [C:+2] Recommendations based on difficulty level (3 comments) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1171315 (https://phabricator.wikimedia.org/T399117) (owner: Sbisson)
[06:48:20] (Merged) jenkins-bot: Recommendations based on difficulty level [research/recommendation-api] - https://gerrit.wikimedia.org/r/1171315 (https://phabricator.wikimedia.org/T399117) (owner: Sbisson)
[06:55:12] (PS15) Gkyziridis: revertrisk-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172297 (https://phabricator.wikimedia.org/T400266)
[06:56:32] (CR) CI reject: [V:-1] revertrisk-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172297 (https://phabricator.wikimedia.org/T400266) (owner: Gkyziridis)
[06:58:40] good morning!
[07:01:12] good morning
[07:03:52] Lift-Wing, Machine-Learning-Team: revertrisk model servers should return a 400 response for non canonical language names - https://phabricator.wikimedia.org/T399437#11033673 (kevinbazira) So the [new dependencies](https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/8e5412476a2bc...
[07:09:10] (PS1) Kevin Bazira: RRML: pin numpy version to <2.0 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172515 (https://phabricator.wikimedia.org/T399437)
[07:12:50] (CR) Bartosz Wójtowicz: [C:+1] "LGTM, thank you for looking into this and solving it extremely fast <3" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172515 (https://phabricator.wikimedia.org/T399437) (owner: Kevin Bazira)
[07:13:46] things are looking good on the edit-check side. The first requests started coming in yesterday after the train deployment: https://grafana.wikimedia.org/goto/0i3ieiQHg?orgId=1
[07:15:53] (CR) Kevin Bazira: [C:+2] "Thanks for the fast review :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172515 (https://phabricator.wikimedia.org/T399437) (owner: Kevin Bazira)
[07:17:18] (Merged) jenkins-bot: RRML: pin numpy version to <2.0 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172515 (https://phabricator.wikimedia.org/T399437) (owner: Kevin Bazira)
[07:24:35] good morning
[07:26:12] (PS16) Gkyziridis: revertrisk-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172297 (https://phabricator.wikimedia.org/T400266)
[07:27:30] (CR) CI reject: [V:-1] revertrisk-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172297 (https://phabricator.wikimedia.org/T400266) (owner: Gkyziridis)
[07:29:53] (PS17) Gkyziridis: revertrisk-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172297 (https://phabricator.wikimedia.org/T400266)
[07:31:58] (CR) CI reject: [V:-1] revertrisk-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172297 (https://phabricator.wikimedia.org/T400266) (owner: Gkyziridis)
[07:37:36] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade revertrisk model server from the debian bullseye base image to bookworm. - https://phabricator.wikimedia.org/T400266#11033753 (gkyziridis) == Update Blubber Syntax == I tried to update the blubber syntax to the latest version of buildk...
[07:56:18] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade revertrisk model server from the debian bullseye base image to bookworm. - https://phabricator.wikimedia.org/T400266#11033793 (elukey) @gkyziridis o/ I think that the problem may be due to setting use-system-site-packages to false, sinc...
[08:33:55] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade revertrisk model server from the debian bullseye base image to bookworm. - https://phabricator.wikimedia.org/T400266#11033886 (gkyziridis) >>! In T400266#11033793, @elukey wrote: > @gkyziridis o/ I think that the problem may be due to s...
[08:36:25] (PS18) Gkyziridis: revertrisk-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172297 (https://phabricator.wikimedia.org/T400266)
[08:39:22] (CR) Gkyziridis: revertrisk-model: Update base image from bullseye to the latest bookworm image. (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172297 (https://phabricator.wikimedia.org/T400266) (owner: Gkyziridis)
[08:42:14] bartosz: I ended up using the older blubber syntax, since moving from version `0.21.0` to `1.3.1` needs a lot of changes. I will open a new phab task dedicated to blubber updates. Let's review this one so we can proceed with the remaining updates: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1172297
[08:54:26] (CR) Bartosz Wójtowicz: [C:+1] "Thank you for the work here <3" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172297 (https://phabricator.wikimedia.org/T400266) (owner: Gkyziridis)
[09:07:45] Machine-Learning-Team: Update blubber syntax - https://phabricator.wikimedia.org/T400446 (gkyziridis) NEW
[09:12:24] thanks for the reviews bartosz
[09:12:25] going to deploy the new RRML image on staging ...
[09:13:30] Machine-Learning-Team, Essential-Work: Upgrade langid model server from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400347#11033959 (gkyziridis) a:gkyziridis
[09:15:38] kevinbazira: I see that you are deploying the new image on staging. Should we merge the patch that updates revertrisk's base image (from bullseye -> bookworm)?
[09:17:53] georgekyz: deployment is complete. what patch are you referring to?
[09:18:10] kevinbazira: this one: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1172297
[09:18:42] kevinbazira: I think this is not affecting anything, but we can coordinate on deploying it, right?
[09:18:53] okok that's not yet merged. I have deployed this one: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1172527
[09:19:28] yeah no problem, we can coordinate whenever you're ready
[09:20:50] the next step was going to be running load tests. would you like me to run them after your patch is deployed?
[09:21:06] georgekyz: --^
[09:23:49] Do you think it would be a good idea to run the load tests after deploying the model with the bookworm image? That way we would run the load tests against the model on the bookworm image. What are your thoughts?
[09:25:04] let me proceed with this change so that we can isolate performance issues if they arise
[09:25:12] perfect alright
[09:25:35] Go ahead and run the tests. I will merge this patch for now, and then we can coordinate on the deployment phase
[09:25:46] thnx for the response @kevinbazira
[09:26:06] okok np!
[09:28:38] (CR) Gkyziridis: [C:+2] revertrisk-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172297 (https://phabricator.wikimedia.org/T400266) (owner: Gkyziridis)
[09:31:48] (Merged) jenkins-bot: revertrisk-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172297 (https://phabricator.wikimedia.org/T400266) (owner: Gkyziridis)
[09:35:59] (PS1) Gkyziridis: langid-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172597 (https://phabricator.wikimedia.org/T400347)
[09:36:47] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade langid model server from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400347#11034042 (gkyziridis) == Testing Bookworm on langid == Tested it on ml-testing machine. ` Total wall clock time: 41s Downloaded: 1 files,...
[09:46:08] Lift-Wing, Machine-Learning-Team: revertrisk model servers should return a 400 response for non canonical language names - https://phabricator.wikimedia.org/T399437#11034058 (kevinbazira) After fixing the numpy issue (T399437#11033673), load tests for both RRLA and RRML now show zero failures. ` Type...
[09:49:54] Bartosz, the load test failures we experienced in: https://phabricator.wikimedia.org/T399437#11031097
[09:49:54] have now been resolved, as shown in: https://phabricator.wikimedia.org/T399437#11034058
[09:49:54] both RRLA and RRML are running well on staging.
we will deploy them in prod on Monday
[10:01:49] Machine-Learning-Team, Essential-Work: Update blubber version in inference services images - https://phabricator.wikimedia.org/T400446#11034079 (isarantopoulos)
[10:05:50] That's amazing, thank you a lot Kevin <3
[10:10:56] (PS1) Kevin Bazira: locust: add RR locust load test results [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172605 (https://phabricator.wikimedia.org/T399437)
[10:19:59] Lift-Wing, Machine-Learning-Team, Essential-Work: Fix locust load test for edit-check - https://phabricator.wikimedia.org/T400460 (isarantopoulos) NEW
[10:44:21] (CR) Bartosz Wójtowicz: [C:+1] "Thank you for running the load tests! It seems that the model server became slightly faster with the new updates :D" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172605 (https://phabricator.wikimedia.org/T399437) (owner: Kevin Bazira)
[10:54:16] (PS1) Gkyziridis: ores-legacy-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172612 (https://phabricator.wikimedia.org/T400348)
[10:55:29] (PS2) Gkyziridis: ores-legacy-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172612 (https://phabricator.wikimedia.org/T400348)
[10:57:04] (CR) Gkyziridis: ores-legacy-model: Update base image from bullseye to the latest bookworm image. (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172612 (https://phabricator.wikimedia.org/T400348) (owner: Gkyziridis)
[11:08:35] (CR) Bartosz Wójtowicz: [C:+1] "LGTM, you're rocking the MLOps week! :D" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172612 (https://phabricator.wikimedia.org/T400348) (owner: Gkyziridis)
[11:15:15] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade ores-legacy from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400348#11034303 (gkyziridis) == Testing Bookworm on ores-legacy model == Try to build the image with the current buildkit: `buildkit:v0.15.0` and bullsey...
[11:16:04] (CR) Kevin Bazira: [C:+2] "yep, a much needed boost! :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172605 (https://phabricator.wikimedia.org/T399437) (owner: Kevin Bazira)
[11:16:39] (Merged) jenkins-bot: locust: add RR locust load test results [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172605 (https://phabricator.wikimedia.org/T399437) (owner: Kevin Bazira)
[11:20:34] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade ores-legacy from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400348#11034340 (gkyziridis) a:gkyziridis
[12:02:06] Machine-Learning-Team, Essential-Work: Upgrade articletopic-outlink model servers from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400349#11034463 (gkyziridis) == Testing bookworm for articletopic-outlink model == Create a Dockerfile with bookworm base image at ml-testing machine `...
[12:08:16] (PS1) Gkyziridis: articletopic-outlink-model: Update base image from bullseye to the latest bookworm image. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172620 (https://phabricator.wikimedia.org/T400349)
[12:27:08] (CR) Gkyziridis: ores-legacy-model: Update base image from bullseye to the latest bookworm image. (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1172612 (https://phabricator.wikimedia.org/T400348) (owner: Gkyziridis)
[13:11:47] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade revertrisk model server from the debian bullseye base image to bookworm. - https://phabricator.wikimedia.org/T400266#11034580 (elukey) >>! In T400266#11033886, @gkyziridis wrote: >>>! In T400266#11033793, @elukey wrote: >> @gkyziridis o...
[13:27:33] Machine-Learning-Team, Essential-Work: Update blubber version in inference services images - https://phabricator.wikimedia.org/T400446#11034609 (gkyziridis) == RevertRisk blubber Update == I am quoting here some comments from: https://phabricator.wikimedia.org/T400266#11034580 related to issues we faced...
[13:28:37] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade revertrisk model server from the debian bullseye base image to bookworm. - https://phabricator.wikimedia.org/T400266#11034611 (gkyziridis) >>! In T400266#11034580, @elukey wrote: >>>! In T400266#11033886, @gkyziridis wrote: >>>>! In T40...
[13:36:38] I will be off for the next 30-45 mins; I will be back afterwards if anything is needed.
[16:09:10] going afk folks o/