[00:06:41] Machine-Learning-Team, Essential-Work: Orchestrate end-to-end tone-check pipeline using the TriggerDagRunOperator - https://phabricator.wikimedia.org/T406302#11252931 (kevinbazira) >>! In T406302#11250084, @BTullis wrote: > I have deployed the new version of the airflow chart to the airflow-ml instance....
[00:39:16] Machine-Learning-Team, Essential-Work: Orchestrate end-to-end tone-check pipeline using the TriggerDagRunOperator - https://phabricator.wikimedia.org/T406302#11252998 (kevinbazira) Now the `tone_check_retrain_dag` is failing with the error below: ` 'message': '0/21 nodes are available: 1 node(s) '...
[06:40:31] good morning
[06:54:26] good morning
[07:14:30] o/
[07:14:44] https://docs.lmcache.ai/index.html looks really interesting, especially when coupled with vLLM
[07:15:00] it can use an S3 backend for the cache blobs too, that is handy
[07:15:22] could be a new project to experiment with for the new SRE during the next months :)
[08:04:26] Machine-Learning-Team, Semantic Search: Semantic Search POC - In article QA - https://phabricator.wikimedia.org/T405359#11253532 (OKarakaya-WMF) Sharing the results for the larger dataset below. I used evaluation model and the query model as same due to the limits on the cloud models. {F66738153} {F667...
[08:04:43] Machine-Learning-Team, Semantic Search: Semantic Search POC - In article QA - https://phabricator.wikimedia.org/T405359#11253533 (OKarakaya-WMF)
[09:01:28] FIRING: [3x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[09:06:28] FIRING: [6x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[13:06:43] FIRING: [6x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[14:00:32] Machine-Learning-Team, Data-Platform-SRE, Patch-For-Review: Investigate Label functionality of AMD GPU device plugin on k8s - https://phabricator.wikimedia.org/T373806#11255268 (elukey) > The node labeller component is a K8s controller, so it needs to run as a pod (it is unlikely that we can run it l...
[17:06:43] FIRING: [6x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[17:14:54] Machine-Learning-Team, EditCheck, Editing-team (Tracking): Build Tone Check Model feedback-based retraining pipeline - https://phabricator.wikimedia.org/T393103#11256240 (ppelberg)
[21:06:43] FIRING: [6x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing