[05:30:20] (PS2) Kevin Bazira: article-country: normalize score based on categories and properties [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1089646 (https://phabricator.wikimedia.org/T371897)
[05:53:41] (PS1) Santhosh: logging: add more details and add generic exception handler [research/recommendation-api] - https://gerrit.wikimedia.org/r/1092458
[05:55:10] (CR) CI reject: [V:-1] logging: add more details and add generic exception handler [research/recommendation-api] - https://gerrit.wikimedia.org/r/1092458 (owner: Santhosh)
[05:56:20] (PS2) Santhosh: logging: add more details and add generic exception handler [research/recommendation-api] - https://gerrit.wikimedia.org/r/1092458
[05:56:59] (CR) CI reject: [V:-1] logging: add more details and add generic exception handler [research/recommendation-api] - https://gerrit.wikimedia.org/r/1092458 (owner: Santhosh)
[06:08:45] (PS3) Santhosh: logging: add more details and add generic exception handler [research/recommendation-api] - https://gerrit.wikimedia.org/r/1092458
[06:20:08] good morning o/
[08:11:04] Good morning o/
[08:50:49] (CR) Ilias Sarantopoulos: article-country: normalize score based on categories and properties (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1089646 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[09:03:44] (PS1) Kevin Bazira: test: update outlink transformer test image to support latest ci tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1092759 (https://phabricator.wikimedia.org/T360120)
[09:36:01] (PS1) Kevin Bazira: test: update outlink transformer test image to support latest ci tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1092759 (https://phabricator.wikimedia.org/T360120)
[09:38:36] (CR) Kevin Bazira: "This was tested with:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1092759 (https://phabricator.wikimedia.org/T360120) (owner: Kevin Bazira)
[10:15:23] (PS3) Kevin Bazira: article-country: normalize score based on categories and properties [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1089646 (https://phabricator.wikimedia.org/T371897)
[10:17:12] (CR) CI reject: [V:-1] article-country: normalize score based on categories and properties [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1089646 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[10:21:35] (PS4) Kevin Bazira: article-country: normalize score based on categories and properties [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1089646 (https://phabricator.wikimedia.org/T371897)
[10:23:30] (CR) Kevin Bazira: article-country: normalize score based on categories and properties (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1089646 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[10:28:12] Machine-Learning-Team, Data-Platform: Create an aiflow instance for ML - https://phabricator.wikimedia.org/T380258 (isarantopoulos) NEW
[10:29:20] Machine-Learning-Team, Data-Platform: Create an Airflow instance for ML - https://phabricator.wikimedia.org/T380258#10335155 (isarantopoulos)
[11:44:05] Machine-Learning-Team, Data-Platform, Data-Platform-SRE: Create an Airflow instance for ML - https://phabricator.wikimedia.org/T380258#10335314 (BTullis) Thanks @isarantopoulos - I'm sure that we can help you, here. > We would like our own instance on the dse-k8s cluster similar to what is set up fo...
[12:04:56] (PS3) AikoChou: reference-quality: update output schema [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1087446 (https://phabricator.wikimedia.org/T378939)
[12:15:13] (CR) AikoChou: reference-quality: update output schema (4 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1087446 (https://phabricator.wikimedia.org/T378939) (owner: AikoChou)
[12:34:58] * klausman late lunch
[12:38:05] (CR) Eamedina: [C:+1] logging: add more details and add generic exception handler [research/recommendation-api] - https://gerrit.wikimedia.org/r/1092458 (owner: Santhosh)
[12:41:09] (PS2) AikoChou: utils: decorators for logging the size of fetched and preprocessed data [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088231 (https://phabricator.wikimedia.org/T374034)
[12:41:10] (PS4) AikoChou: metric_utils.py: add prometheus metrics for fetched/preprocessed size [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088242 (https://phabricator.wikimedia.org/T374034)
[12:45:11] (CR) AikoChou: "Solved" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088242 (https://phabricator.wikimedia.org/T374034) (owner: AikoChou)
[12:47:04] (CR) AikoChou: utils: decorators for logging the size of fetched and preprocessed data (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088231 (https://phabricator.wikimedia.org/T374034) (owner: AikoChou)
[13:41:58] (CR) Sbisson: [C:+2] logging: add more details and add generic exception handler [research/recommendation-api] - https://gerrit.wikimedia.org/r/1092458 (owner: Santhosh)
[13:42:39] (Merged) jenkins-bot: logging: add more details and add generic exception handler [research/recommendation-api] - https://gerrit.wikimedia.org/r/1092458 (owner: Santhosh)
[14:52:04] (CR) AikoChou: [C:+2] "Thanks for the review! :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1087446 (https://phabricator.wikimedia.org/T378939) (owner: AikoChou)
[14:52:44] (CR) Klausman: [C:+1] metric_utils.py: add prometheus metrics for fetched/preprocessed size [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088242 (https://phabricator.wikimedia.org/T374034) (owner: AikoChou)
[14:52:56] (Merged) jenkins-bot: reference-quality: update output schema [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1087446 (https://phabricator.wikimedia.org/T378939) (owner: AikoChou)
[15:08:31] Machine-Learning-Team: Move Lab machines into analytics net for DL access and switch to homedirs on Ceph - https://phabricator.wikimedia.org/T380279 (klausman) NEW
[15:35:26] Machine-Learning-Team, sre-alert-triage: Alert in need of triage: HelmfileAdminNGPendingChanges (instance deploy1003:9100) - https://phabricator.wikimedia.org/T380024#10336291 (isarantopoulos) a:klausman
[15:36:26] Machine-Learning-Team: Move Lab machines into analytics net for DL access and switch to homedirs on Ceph - https://phabricator.wikimedia.org/T380279#10336308 (isarantopoulos) p:Triage→Medium
[15:37:11] Machine-Learning-Team, sre-alert-triage: Alert in need of triage: HelmfileAdminNGPendingChanges (instance deploy1003:9100) - https://phabricator.wikimedia.org/T380024#10336313 (isarantopoulos) p:Triage→Medium
[15:40:17] Machine-Learning-Team, sre-alert-triage: Alert in need of triage: HelmfileAdminNGPendingChanges (instance deploy1003:9100) - https://phabricator.wikimedia.org/T380024#10336333 (klausman) This is two things: - service updates in the std namespace - a broken change for limitRanges. I will push the former...
[15:44:03] Machine-Learning-Team, Goal: ml-lab: add ROCM 6.1 packages to WMF apt repo - https://phabricator.wikimedia.org/T375076#10336360 (klausman) Open→Resolved
[15:47:36] klausman: o/ there is an alert for ml-lab1001's disk space
[15:47:50] seen it passing by on #operations
[15:49:00] Machine-Learning-Team: Update kserve to 0.13.1 - https://phabricator.wikimedia.org/T367048#10336387 (isarantopoulos)
[15:50:46] elukey: ack, on it
[15:57:45] Machine-Learning-Team: ml-lab should have documentation - https://phabricator.wikimedia.org/T376974#10336475 (isarantopoulos) Open→Resolved
[16:06:02] (PS3) AikoChou: utils: decorators for logging the size of fetched and preprocessed data [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088231 (https://phabricator.wikimedia.org/T374034)
[16:08:47] (PS5) AikoChou: metric_utils.py: add prometheus metrics for fetched/preprocessed size [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088242 (https://phabricator.wikimedia.org/T374034)
[16:10:17] (CR) AikoChou: [C:+2] utils: decorators for logging the size of fetched and preprocessed data [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088231 (https://phabricator.wikimedia.org/T374034) (owner: AikoChou)
[16:11:36] (CR) Ilias Sarantopoulos: [C:+1] "Nice!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1089646 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[16:11:41] (PS5) Kevin Bazira: article-country: normalize score based on categories and properties [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1089646 (https://phabricator.wikimedia.org/T371897)
[16:17:11] (Merged) jenkins-bot: utils: decorators for logging the size of fetched and preprocessed data [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088231 (https://phabricator.wikimedia.org/T374034) (owner: AikoChou)
[16:19:19] (CR) AikoChou: [C:+2] "Thanks for the review!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088242 (https://phabricator.wikimedia.org/T374034) (owner: AikoChou)
[16:22:47]
[16:24:07] XD have a nice evening all!
[16:28:08] (Merged) jenkins-bot: metric_utils.py: add prometheus metrics for fetched/preprocessed size [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088242 (https://phabricator.wikimedia.org/T374034) (owner: AikoChou)
[16:30:04] (PS6) Kevin Bazira: article-country: normalize score based on categories and properties [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1089646 (https://phabricator.wikimedia.org/T371897)
[16:34:14] (CR) Kevin Bazira: [C:+2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1089646 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[16:35:55] (Merged) jenkins-bot: article-country: normalize score based on categories and properties [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1089646 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[16:42:45] Machine-Learning-Team, Data-Platform, Data-Platform-SRE: Create an Airflow instance for ML - https://phabricator.wikimedia.org/T380258#10336728 (isarantopoulos) Thanks for the info. The Spark operator seems like a great choice but we'll get to that when we actually start developing stuff. It also dep...
[16:42:53] night Aiko!
[16:46:27] (PS1) Ilias Sarantopoulos: Revert "llm: remove model-server" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1092873
[16:47:17] (PS2) Ilias Sarantopoulos: Revert "llm: remove model-server" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1092873
[18:26:53] * isaranto afk!
[18:44:49] Machine-Learning-Team, Data-Platform-SRE (2024.11.09 - 2024.11.29): Move Lab machines into analytics net for DL access and switch to homedirs on Ceph - https://phabricator.wikimedia.org/T380279#10337760 (BTullis)
[20:47:05] cmw
[20:47:11] oops :)