[04:54:38] o/
[04:54:38] going to deploy patch that enables multiprocessing in kowiki-damaging: https://gerrit.wikimedia.org/r/1170447
[05:02:06] kowiki-damaging pods up and running in both codfw and eqiad
[06:20:53] o/ nice
[06:57:50] good morning.
[06:59:47] good morning
[07:10:12] good morning!
[07:12:41] hi folks!
[08:18:04] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#11019463 (elukey) Thanks! I was able to provision the host and end up in Debian install, progress :) Of course now it fails since we didn't really configure any vali...
[09:06:15] Machine-Learning-Team, Essential-Work: Reimplement the model-upload script to take into consideration new use cases - https://phabricator.wikimedia.org/T394301#11019568 (BWojtowicz-WMF) As of this time, I've reimplemented the model-upload script in Python, tested its functionality and merged it into pupp...
[10:03:26] Machine-Learning-Team, Research: Score probability evaluation for languages without enough data - https://phabricator.wikimedia.org/T398930#11019774 (achou) Hi Miriam :) @diego has been working on a notebook for this. I created this task so we can link the finalized notebook here and have a place to summ...
[10:08:31] Machine-Learning-Team, Research: Score probability evaluation for languages without enough data - https://phabricator.wikimedia.org/T398930#11019783 (Miriam) a:diego
[10:09:03] Machine-Learning-Team, Research: Score probability evaluation for languages without enough data - https://phabricator.wikimedia.org/T398930#11019788 (Miriam) Thank you @AikoChou !
[10:43:34] (CR) AikoChou: "I'm not sure what the best way to handle this is.
Adding more reviewers who may have better insights :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1169195 (https://phabricator.wikimedia.org/T399437) (owner: Kevin Bazira)
[11:12:47] Machine-Learning-Team, Data-Platform-SRE, Prod-Kubernetes, serviceops, and 2 others: Update kserve to v0.15.2* on ML clusters - https://phabricator.wikimedia.org/T380722#11019987 (BWojtowicz-WMF) I've managed to spin up a local cluster with `minikube`, following our [[ https://wikitech.wikimedia....
[13:12:24] * georgekyz Hey folks, I am investigating this one: https://phabricator.wikimedia.org/T399733 and I have a question. Is there any way to find the exact input request? Do we keep a request history?
[13:23:05] oh sorry my bad, there is a ref id for revision, this is what we receive as request right?
[13:25:50] yess, I think we should receive only rev_id and lang in the request for the reference models
[13:31:18] yeap got it. thnx! I am trying to get familiar with the reference_quality model. I thought that the rev_id is an id_key which is connected with a full request (body, header, etc). But now I got it
[14:26:27] Machine-Learning-Team, Essential-Work: Investigate reference-need-predictor alert triggered by BrokenProcessPool error - https://phabricator.wikimedia.org/T399733#11020459 (gkyziridis) I tried to reproduce the error but it seems difficult. What I saw is that we received (almost) the same time 7 requests...
[14:54:17] (CR) Nik Gkountas: [C:+2] Add tests for page-collection-groups [research/recommendation-api] - https://gerrit.wikimedia.org/r/1155666 (owner: Sbisson)
[14:55:43] (Merged) jenkins-bot: Add tests for page-collection-groups [research/recommendation-api] - https://gerrit.wikimedia.org/r/1155666 (owner: Sbisson)
[15:11:53] georgekyz: plz use staging endpoints for such investigations, we don't want to mess with the production pods
[15:19:21] Machine-Learning-Team, Essential-Work: Investigate reference-need-predictor alert triggered by BrokenProcessPool error - https://phabricator.wikimedia.org/T399733#11020805 (isarantopoulos) Looking at the [[ https://grafana.wikimedia.org/goto/KFdR1wUNR?orgId=1 | resources of the pod in grafana ]] that the...
[15:20:06] isaranto: You are right!
[22:19:07] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#11022309 (jhathaway) @elukey, I was able to get ml-serve1012 to install the base os, after fixing the raid config, https://gerrit.wikimedia.org/...