[04:39:44] FIRING: LiftWingServiceErrorRate: ...
[04:39:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=ptwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:34:44] RESOLVED: LiftWingServiceErrorRate: ...
[06:34:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=ptwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[07:50:32] good morning folks o/
[09:14:53] Lift-Wing, Machine-Learning-Team, OKR-Work: Update the article-country isvc to use Wikilinks for predictions - https://phabricator.wikimedia.org/T385970 (kevinbazira) NEW
[09:47:16] Lift-Wing, Machine-Learning-Team, OKR-Work: Stop publishing events without article-country predictions - https://phabricator.wikimedia.org/T385771#10534722 (kevinbazira) Hi @Isaac, my understanding is that the current article-country predictions that are empty do not give the full picture because the...
[10:35:41] o/ Folks I think I am gonna stop experimenting with the `allenai/c4` dataset, I have pasted the results here -> https://phabricator.wikimedia.org/T384734#10532100
[10:35:41] I will continue the simulations using different and bigger datasets such as `wikitext-103-raw-v1`, which is used for LLM training for text generation but also for calibration and quantization.
[10:44:29] \o
[10:49:19] do you think we'll have better results with the wikitext dataset?
[10:51:12] tbh for our initial experimentation both seem fine
[11:02:56] yes they seem to be good indeed, but I would like to see how it goes with a big dataset
[11:08:55] ack
[11:09:49] isn't c4 bigger than the wikitext dataset? although since we use a small sample of data, the diversity and quality of these samples might matter more than size
[11:12:49] anyway it is an interesting topic with no right answer so we can chat about it after we have some results!
[11:12:53] thanks for sharing
[12:30:40] isaranto: I am referring to `wikitext-103-raw-v1` which is 1.81B rows. Check this:
[12:30:40] ```
[12:30:40] In [10]: c4
[12:30:40] Out[10]:
[12:30:40] Dataset({
[12:30:40] features: ['text', 'timestamp', 'url'],
[12:30:41] num_rows: 356318
[12:30:42] })
[12:30:42] In [11]: wiki2_raw_v1
[12:30:43] Out[11]:
[12:30:43] Dataset({
[12:30:44] features: ['text'],
[12:30:44] num_rows: 36718
[12:30:45] })
[12:31:00] https://www.irccloud.com/pastebin/febyEqV7/
[12:38:27] that is 1.8 M. the c4 dataset is much larger. I think en samples are ~300M
[12:38:45] perhaps in the code above you're looking at only one file/chunk
[12:45:14] yes we were training it using the data_file:
[12:45:20] https://www.irccloud.com/pastebin/HEMJq9wR/
[12:45:52] taken from their example
[12:46:40] I will run the simulation using the entire c4
[12:49:32] isaranto: If you have any other ideas about the dataset, ping me
[12:53:32] ok! I don't have anything specific in mind at the moment. getting a diverse sample of 1k would do for now. And indeed the wikitext dataset might be more suitable to our use cases
[13:09:54] Lift-Wing, Machine-Learning-Team, OKR-Work: Stop publishing events without article-country predictions - https://phabricator.wikimedia.org/T385771#10535342 (Isaac) @kevinbazira -- thanks for clarifying. That makes a lot of sense and apologies because I now see that I missed that piece in the task des...
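Editor's note on the 12:38:45 and 12:53:32 messages: one simple way to draw a small sample (for example 1k rows) from the whole of a large dataset, rather than from a single file or chunk, is reservoir sampling over a streamed iterator. The sketch below is a minimal illustration in plain Python and is not the code used in the experiments discussed here. The names `reservoir_sample` and `toy_rows`, the fixed seed, and the toy row format are assumptions made for this example. Uniform sampling avoids single-shard bias but does not by itself guarantee diversity.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k items drawn uniformly at random from a stream.

    The stream is consumed once and its length does not need to be known
    in advance, so it also works for datasets that do not fit in memory.
    """
    rng = random.Random(seed)  # fixed seed (assumption) for repeatable samples
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


def toy_rows(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for a streamed dataset with a 'text' column."""
    for i in range(n):
        yield {"text": f"document {i}"}


sample = reservoir_sample(toy_rows(100_000), k=1000)
print(len(sample))
```

In practice `toy_rows` would be replaced by whatever iterator yields rows from the real dataset; the sampling function itself does not depend on the row type.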
[15:22:56] good morning!
[15:40:34] morning Chris!
[16:25:08] https://docs.streamlit.io/develop/quick-reference/release-notes#version-1420-latest
[16:27:11] --^ Streamlit has added authentication which is great! perhaps we could use it for some internal POCs
[17:19:46] * isaranto afk
[20:56:48] Lift-Wing, Machine-Learning-Team, OKR-Work: Update the article-country isvc to use Wikilinks for predictions - https://phabricator.wikimedia.org/T385970#10536767 (Isaac) Thanks for kicking this off! Quick thoughts: * Here's the [[https://en.wikipedia.org/w/api.php?action=query&generator=links&titles=...
[21:40:22] artificial-intelligence: Use AI to automatically generate edit summaries - https://phabricator.wikimedia.org/T334598#10536905 (Pppery) →Duplicate dup:T14411
[21:40:25] artificial-intelligence, MediaWiki-Page-editing: Automatic edit summary generation based on analyzing the change made - https://phabricator.wikimedia.org/T14411#10536907 (Pppery)
[21:42:30] artificial-intelligence, MediaWiki-Page-editing: Automatic edit summary generation based on analyzing the change made - https://phabricator.wikimedia.org/T14411#10536913 (Pppery)
[21:42:51] artificial-intelligence, MediaWiki-Page-editing: Automatic edit summary generation based on analyzing the change made - https://phabricator.wikimedia.org/T14411#10536914 (Pppery)
[22:24:01] Lift-Wing, Machine-Learning-Team, OKR-Work: Update the article-country isvc to use Wikilinks for predictions - https://phabricator.wikimedia.org/T385970#10537055 (Isaac) Looping in @dcausse as well for context/guidance: as you can see, the plan for the article-country model is to add this additional...