[06:53:42] hello folks
[13:56:47] I have knative pods running on my minikube with our docker images!
[13:56:54] still not sure if they work of course
[13:56:58] but they are running :D
[14:05:24] but I just realized that naming is not ok, so I need to delete all and rebuild
[14:05:29] * elukey cries in a corner
[14:18:05] there, there
[14:24:59] and here we are https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/692899
[14:25:10] I tested those on minikube, all containers up and running
[15:28:58] wc -l kfserving.yaml
[15:28:58] 66658 kfserving.yaml
[15:29:00] ahahhahaha
[15:29:31] ah no wait it is wrong
[15:31:09] 16351 kfserving.yaml
[15:33:10] That is slightly less horrific
[15:42:25] I am wondering if Andy is testing full kubeflow on minikube or only kfserving
[15:42:51] because I remember a discussion about version 1.3 having issues, but the last one for kfserving is 0.5.1
[15:46:38] Is Andy still using minikube on GCP?
[15:49:23] hey all -- nope im running MiniKF (minikube + kubeflow) on a EC2 vm on aws now
[15:54:36] we summon a
[15:54:41] Andy and there you go :D
[15:54:50] :)
[15:54:57] accraze: morning :)
[15:55:26] I was following https://github.com/kubeflow/kfserving/blob/master/hack/quick_install.sh#L68
[15:55:44] what is the plan for the final stack for lift wing?
[15:57:40] what I mean is - should me and Tobias keep working on Kfserving only or did we change idea and we want full kubeflow?
[16:00:00] ehh for lift wing I think we still want just KFServing for now
[16:01:01] its easier for me to spin up a full KF install on a cloud vm rather than setting up all up manually
[16:01:57] ah yes yes it makes sense, I just wanted to double check
[16:02:10] we are proceeding in parallel and I want to make sure that we don't end up working on different things :D
[16:02:20] haha yeah good call to chedck!
[16:02:23] (or with the wrong expectations about what the other is doing)
[16:02:23] check*
[16:02:27] *others are
[16:04:08] right now i have a small cluster running KFv1.3 and another one running KFv1.1 ... i'm jumping between the two studying the differences between the versions re: kfserving and the gateway stuff
[16:05:07] from what I see on the kfserving repo the last version is 0.5.1
[16:05:19] that I assume it is included in 1.3
[16:05:24] (kubeflow I mean)
[16:05:26] yep that's correct
[16:06:09] Kubeflow 1.3 includes KFServing v0.5.
[16:06:11] yep yep
[16:06:15] the latest version of kfserving is really nice
[16:06:34] so ok to target that one? Cc: klausman
[16:07:06] the other thing that I don't know is if we need istio 1.6.x or 1.9.x
[16:07:29] yeah that's what i've been trying to figure out
[16:08:03] what a mess
[16:08:09] it seems like 1.9.x might be more simple due to getting rid of the cluster-internal-gateway
[16:08:46] As long as we can satisfy all deps, we should use the latest versions available.
[16:09:00] I started from https://phabricator.wikimedia.org/T278194#6964746
[16:09:03] I don't think there is much in the way of a "proven" collection of versions
[16:09:35] klausman: agreed, especially since we're in a green fields project, we should avoid getting stuck on an older version
[16:09:50] yeah but check what it was suggested from an upstream maintainer :D
[16:10:13] this is true
[16:10:31] accraze: the cluster-internal-gw mentioned above is the "local" one that is mentioned in the phab task?
[16:11:07] because if so it seems replaced by a knative gw that is not shipped with 0.18 (that is the max version that we can use with k8s 1.16 IIUC)
[16:11:29] ahhh hmmm...
[16:12:05] now what I am not getting is why you are seeing the issue, since it should be related to knative
[16:12:20] and IIRC from the list of deps that you posted for the AWS instance we are not using 0.19+ right?
[16:13:40] from https://github.com/kubeflow/kfserving/releases/tag/v0.5.0 I see
[16:13:44] * accraze looks at his notes again
[16:13:53] The minimum required versions are Kubernetes 1.16 and Istio 1.3.1/Knative 0.14.3
[16:13:56] that is good
[16:15:29] ahhh yup you are right, so my KFv1.3 (newest version) sandbox is running istio 1.9.0 and knative 0.14.3 on k8s 1.16
[16:15:57] so we may not need istio v1.9.0 ...?
[16:16:34] nono the more recent the better
[16:16:55] but now I'd like to understand the problem with cluster-internal-gateway
[16:17:05] or local, not sure how it is called
[16:17:49] haha yeah.... lemme dig into it a bit here, i'm starting to get some details mixed up and my notes are semi-unintelligible lol
[16:18:22] Note: cluster-local-gateway is required to serve cluster-internal traffic for transformer and explainer use cases (unless we are running v0.19.0).
[16:18:25] Please follow instructions here to install cluster local gateway.
[16:18:27] this is in the knative task
[16:18:53] so IIUC we need to deploy the local gateway unless knative 0.19 is used
[16:18:53] yeah i must not be using the cluster-local-gw on the new sandbox
[16:21:05] okok lemme know if this may explain your current issues then
[16:21:19] if so we could in theory target
[16:21:32] knative 0.18.1 + istio 1.9.x + kfserving 0.5
[16:21:48] (in case I'll update my images/tests for a more recent istio version)
[16:24:59] elukey: i'll double check today, but those targets _should_ work
[16:25:11] (on k8s 1.16)
[16:25:39] perfect
[16:25:56] I cannot make 1.16 + minikube to run on bullseye for some reason, currently testing images on 1.20 sadly
[16:35:49] haha i just checked and i'm still running xenial (Ubuntu 16.04.7 LTS) on my sandbox vms
[16:50:24] Warning FailedMount 26s (x7 over 58s) kubelet MountVolume.SetUp failed for volume "cert" : secret "kfserving-webhook-server-cert" not found
[16:50:48] I thought that certmanager was optional..
[16:54:01] elukey:
[16:54:25] you still need to have a secret for the CA cert if we do the self-signed CA
[16:54:35] see: https://github.com/kubeflow/kfserving/blob/master/hack/self-signed-ca.sh
[16:55:02] that script can create the kfserving-webhook-server-cert
[16:57:53] also confirming that i am still using the istio cluster-local-gw on my KFv1.3 sandbox
[17:09:28] Lift-Wing, Machine-Learning-Team: Install certmanager on ml-serve cluster (if needed) - https://phabricator.wikimedia.org/T280661 (elukey)
[17:12:07] artificial-intelligence, Wikilabels, articlequality-modeling, Machine-Learning-Team (Active Tasks): Build article quality model for Ukrainian Wikipedia - https://phabricator.wikimedia.org/T251571 (calbon) Open→Resolved
[17:12:11] ORES, artificial-intelligence, articlequality-modeling, drafttopic-modeling, and 2 others: ORES deployment Late July 2020 - https://phabricator.wikimedia.org/T258435 (calbon)
[17:12:13] Machine-Learning-Team, artificial-intelligence, Outreach-Programs-Projects, Google-Summer-of-Code (2020): Proposal (GSoC 2020): Implement articlequality and draftquality model for ptwiki and apply insights to models for bs, uk, hi wikis - https://phabricator.wikimedia.org/T247847 (calbon)
[17:12:39] artificial-intelligence, Wikilabels, articlequality-modeling, Machine-Learning-Team (Active Tasks): Build article quality model for Ukrainian Wikipedia - https://phabricator.wikimedia.org/T251571 (calbon) I think was deployed. Marking as resolved. Let me know if this isn't true.
[17:24:01] Lift-Wing, Machine-Learning-Team (Active Tasks): Lift Wing proof of concept - https://phabricator.wikimedia.org/T272917 (calbon)
[17:24:17] Lift-Wing, Machine-Learning-Team (Active Tasks): Find a way to store models for Kubeflow - https://phabricator.wikimedia.org/T280025 (calbon) Open→Resolved a:calbon
[17:51:08] artificial-intelligence, Wikilabels, articlequality-modeling, Machine-Learning-Team (Active Tasks): Build article quality model for Ukrainian Wikipedia - https://phabricator.wikimedia.org/T251571 (Halfak) Confirmed. https://ores.wikimedia.org/v3/scores/ukwiki shows `articlequality` version 0.8.0.
[18:03:50] * elukey afk!
[18:05:43] WMF ML office hours live stream is happening now: https://www.twitch.tv/wikimediaml
[18:06:12] this week's topic is about data labeling
[22:17:36] Lift-Wing, artificial-intelligence, revscoring, Machine-Learning-Team (Active Tasks), Patch-For-Review: Load a revscoring model into KFServing - https://phabricator.wikimedia.org/T279000 (ACraze) @kevinbazira I reviewed your deployed inference service on the KFv1.1 sandbox. So far great progr...
[23:25:52] Lift-Wing, Machine-Learning-Team (Active Tasks): Load outlinks topic model model in to KFServing - https://phabricator.wikimedia.org/T276862 (ACraze) Quick update: I've been doing some testing over the past couple of days and have noticed a timeout issue when testing high throughput loads (like 50-100 ca...
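
The minimums quoted in the 16:13 messages (Kubernetes 1.16, Istio 1.3.1, Knative 0.14.3 for KFServing 0.5) can be checked against a candidate stack in a few lines of Python. This is only a minimal sketch and nothing like it was run in the channel: the names `MINIMUMS`, `parse_version`, `below_minimum` and `candidate` are hypothetical, and it covers the lower bounds only, not the Knative upper bound for k8s 1.16 that came up in the discussion.

```python
# Minimums as quoted in the log from the KFServing v0.5.0 release notes.
MINIMUMS = {
    "kubernetes": "1.16",
    "istio": "1.3.1",
    "knative": "0.14.3",
}


def parse_version(text):
    """Turn '1.9' or 'v1.9.0' into a 3-tuple like (1, 9, 0)."""
    parts = [int(part) for part in text.lstrip("v").split(".")]
    # Pad short versions with zeros so '1.16' compares equal to '1.16.0'.
    return tuple(parts + [0] * (3 - len(parts)))


def below_minimum(stack):
    """Return {component: (found, minimum)} for anything missing or too old."""
    problems = {}
    for name, minimum in MINIMUMS.items():
        found = stack.get(name)
        if found is None or parse_version(found) < parse_version(minimum):
            problems[name] = (found, minimum)
    return problems


# Hypothetical candidate stack, using the versions floated at 16:21.
candidate = {"kubernetes": "1.16", "istio": "1.9.0", "knative": "0.18.1"}
print(below_minimum(candidate))
```

An empty dict means every component meets its quoted minimum; any entry shows the version found next to the minimum it failed.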