[06:45:15] (CR) Kevin Bazira: article-country: fix duplicate wikidata-related predictions and omitted category-related predictions (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1122904 (https://phabricator.wikimedia.org/T387275) (owner: Kevin Bazira)
[08:52:06] (CR) AikoChou: [C:+1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1122904 (https://phabricator.wikimedia.org/T387275) (owner: Kevin Bazira)
[09:03:21] (PS3) Kevin Bazira: article-country: fix duplicate wikidata-related predictions and omitted category-related predictions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1122904 (https://phabricator.wikimedia.org/T387275)
[09:07:48] (CR) Kevin Bazira: [C:+2] article-country: fix duplicate wikidata-related predictions and omitted category-related predictions (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1122904 (https://phabricator.wikimedia.org/T387275) (owner: Kevin Bazira)
[09:08:32] (Merged) jenkins-bot: article-country: fix duplicate wikidata-related predictions and omitted category-related predictions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1122904 (https://phabricator.wikimedia.org/T387275) (owner: Kevin Bazira)
[09:44:00] thanks for the reviews, Aiko! o/
[09:44:01] going to deploy the latest article-country image on LW staging ...
[09:50:20] klausman: o/, I am in the deployment server, specifically: `/srv/deployment-charts/helmfile.d/ml-services/article-models`
[09:50:50] when I run `helmfile -e ml-staging-codfw diff`, I was expecting to see only changes to the article-country image but I am seeing
[09:50:50] undeployed changes from `chart: kserve-inference-0.4.12` to `chart: kserve-inference-0.4.15`. should I proceed with the deployment?
[09:54:27] I think that may be stuff Luca was working on.
elukey ^^^
[09:55:18] So hold off for now. I suspect it may be fine (and in staging, breaking stuff isn't super critical), but it's better that we sync with his work
[09:56:05] okok ... pausing work on this for now until I get a green light
[09:59:41] hey folks!
[09:59:55] if you only see a chart diff, it is fine; I reverted yesterday and this is the result
[10:00:06] the important bit is that you don't see unexpected changes in configs etc.
[10:01:46] ack, ty!
[10:04:31] klausman: I've sent a patch for the knative images, hope it is the last one
[10:04:37] I found a way to run unit tests and they all pass
[10:20:03] (PS1) AikoChou: reference-quality: fix reference models from getting unnecessary data from mwapi [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1123304 (https://phabricator.wikimedia.org/T387019)
[10:26:46] (PS2) AikoChou: reference-quality: fix reference models from getting unnecessary data from mwapi [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1123304 (https://phabricator.wikimedia.org/T387019)
[10:38:33] thank you for clarifying Luca. klausman: should I proceed with the deployment?
[10:38:56] yes, go ahead, if the only change is the chart version (and whatever you're deploying)
[10:39:05] okok ...
[10:43:15] the article-country model-server is up and running in staging
[10:43:15] ```
[10:43:15] kube_env article-models ml-staging-codfw
[10:43:15] $ kubectl get pods
[10:43:15] NAME                                                          READY   STATUS    RESTARTS   AGE
[10:43:15] article-country-predictor-00007-deployment-5d5bb76656-pjncg   3/3     Running   0          109s
[10:43:15] ```
[10:43:25] nice work!
[10:49:50] Lift-Wing, Machine-Learning-Team: Fix duplicate wikidata-related predictions and omitted category-related predictions - https://phabricator.wikimedia.org/T387275#10586473 (kevinbazira) >>! In T387275#10583443, @Isaac wrote: > Works for me - thanks @kevinbazira ! np! this fix has been deployed into stagi...
[11:19:12] one thing that I am wondering is if the docker version diffs between bullseye and bookworm may have played a role in the production issue with seccomp
[11:19:19] https://github.com/moby/moby/commits/master/profiles/seccomp/default.json
[11:19:44] this should be what Docker uses, via eBPF, but in staging we have all bookworms afaics
[11:19:51] and in prod a mixture of bookworm and bullseye
[13:31:29] Yes, staging is all bookworm. In prod, only the GPU hosts are bookworm machines. I've been meaning to reimage the bullseye hosts to bookworm, but other things were more urgent
[13:57:19] yep I remember, I am wondering if the docker version impacts the istio issue
[13:57:49] namely, if a pod scheduled on bullseye behaves differently
[14:40:00] Lift-Wing, Machine-Learning-Team: Fix duplicate wikidata-related predictions and omitted category-related predictions - https://phabricator.wikimedia.org/T387275#10587126 (Isaac) Quick checks and skim of code look good to me @kevinbazira - thanks!
[15:24:27] Lift-Wing, Machine-Learning-Team: Fix duplicate wikidata-related predictions and omitted category-related predictions - https://phabricator.wikimedia.org/T387275#10587350 (kevinbazira) Thank you for the confirmation @Isaac. The fix has now been deployed in production. ` # pod running in eqiad $ kube_env...
[15:33:18] Good morning all
[16:18:18] o/ morning Chris
[22:19:21] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[22:24:22] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent