[07:09:17] o/ thanks for the review, Ilias
[07:09:17] going to deploy now ...
[07:09:17] friday deployment 🤞
[07:32:41] 10Lift-Wing, 06Machine-Learning-Team, 07OKR-Work: Stop publishing events without article-country predictions - https://phabricator.wikimedia.org/T385771#10531220 (10kevinbazira) article-country model-server no longer publishes events when prediction results are empty - new model-server deployed in LiftWing pr...
[07:33:01] ^-- deployment looks good. now all article-country events have prediction results.
[07:51:27] o/ good morning Kevin
[08:30:02] o/ good morning folks
[08:37:22] o\ Goedemorgen George
[08:37:34] *\o
[08:50:31] kalimera ilias
[08:57:32] 10Lift-Wing, 06Machine-Learning-Team, 13Patch-For-Review: Quantize aya-expanse-32B with GPTQ (GPTQModel) - https://phabricator.wikimedia.org/T384734#10531276 (10isarantopoulos) The build/installation [[ https://github.com/ModelCloud/GPTQModel/issues/1222#issuecomment-2641758113 | issue has been resolved ]]....
[09:12:25] georgekyz: they fixed the issue with GPTQModel right away, I tested and it works!
[09:13:13] isaranto: that's cool! So now we can install GPTQModel from source without using the wheel, right?
[09:15:07] yes
[09:41:07] 10Lift-Wing, 06Machine-Learning-Team, 13Patch-For-Review: Quantize aya-expanse-32B with GPTQ (GPTQModel) - https://phabricator.wikimedia.org/T384734#10531327 (10isarantopoulos) GPTQModel is integrated in [[ https://github.com/huggingface/optimum/releases/tag/v1.24.0 | hf optimum in release v1.24.0 ]]. Copying...
[10:17:21] quantized aya has some sense of humour:
[10:17:21] ```
[10:17:21] - "question":"Wikimedia foundation is",
[10:17:21] - "answer":"Wikimedia foundation is the place for dating site.\nThe Wikimedia movement is a group of people who are passionate about"
[10:17:21] ```
[10:27:26] lollll
[11:09:00] lol that says a lot about the quantization quality I guess
[11:18:06] Btw, I found out what the BOS_TOKEN is: Beginning of sentence. It comes from tokenizers. Some have options to turn it off (there is also EOS_TOKEN, end of sentence)
[11:19:04] AIUI, it's also relevant to inputs, but I haven't dug into that
[11:43:39] klausman: Yeap, this is coming from the tokenizer, for now I am using the `AutoTokenizer` from the `transformers` library.
[11:47:55] georgekyz: I was wondering if the issue we got with all the padding tokens was related to the tokenizer, for example if we were loading a different one or sth similar
[11:52:57] * klausman lunch
[12:10:27] isaranto: To be honest I still do not understand what happened with the padding tokens. I am now using a technique which first tokenizes the data and then quantizes the model. Based on my recent experiments the padding issue happened only on a single experiment with this config:
[12:10:27] ```
[12:10:27] bits: 8
[12:10:27] batch: 4
[12:10:27] text_size: 512
[12:10:27] n_samples: 512
[12:10:28] data: allenai/c4
[12:10:28] ```
[12:11:13] ack!
[12:23:26] TIL https://docs.docker.com/build/bake/
[12:44:46] isaranto: o/ how do you check the resource usage for a pod?
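(Editorial aside, not part of the log: a quick way to see the BOS/EOS/padding special tokens discussed above. `AutoTokenizer` exposes them as attributes, and `add_special_tokens` controls whether they are added to encoded inputs. This is a minimal sketch; the model id is only a placeholder, not necessarily the one used in these experiments.)

```python
# Minimal sketch: inspect a Hugging Face tokenizer's special tokens.
# The model id is a placeholder assumption; any causal-LM tokenizer works the same way.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/aya-expanse-32b")

# BOS = beginning of sequence, EOS = end of sequence; pad_token may be unset for some models.
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.pad_token)

# With special tokens the tokenizer may prepend BOS (and/or append EOS) on its own ...
with_special = tokenizer("Wikimedia foundation is", add_special_tokens=True)
# ... without them you only get the raw text tokens.
without_special = tokenizer("Wikimedia foundation is", add_special_tokens=False)

print(with_special["input_ids"])
print(without_special["input_ids"])
```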
[12:45:16] grafana would be the easier, lemme find the link
[12:45:28] *easiest way
[12:46:42] https://grafana.wikimedia.org/d/-D2KNUEGk/kubernetes-pod-details?orgId=1&var-datasource=codfw+prometheus%2Fk8s-mlserve&var-namespace=revision-models&var-pod=reference-quality-predictor-00006-deployment-76769b7487-q7xjn&var-container=All&from=1738928787029&to=1738932387030
[12:47:26] in this dashboard you don't see the total per pod but you do get a breakdown per container
[12:47:49] georgekyz: don't remember if we've shown you these dashboards --^
[12:48:18] thanks!! that's what I'm looking for
[13:51:16] o/ I've added the example from yesterday's meeting to the LW stream docs: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Streams#Life_cycle_of_a_LiftWing_event
[13:51:16] here is the code that renders the video: https://gist.github.com/kevinbazira/51cc8289485670682cb2f216f6ffc289
[13:59:43] oh nice! I want to try manim. thank you!
[14:04:35] it's super cool!
[15:12:47] 06Machine-Learning-Team, 13Patch-For-Review, 10Release Pipeline (Blubber): Adding uv as a package manager on Lift Wing/blubber - https://phabricator.wikimedia.org/T384584#10531962 (10taavi)
[16:15:24] 10Lift-Wing, 06Machine-Learning-Team, 13Patch-For-Review: Quantize aya-expanse-32B with GPTQ (GPTQModel) - https://phabricator.wikimedia.org/T384734#10532100 (10gkyziridis) | Experiment | Dataset | Batch | Bits | TextSize | NSamples | Results | No-padding-tokens |...
[16:23:53] Nice work, George! the 2bit vs 8bit results are really interesting
[16:31:57] 10Lift-Wing, 06Machine-Learning-Team, 07OKR-Work: Stop publishing events without article-country predictions - https://phabricator.wikimedia.org/T385771#10532170 (10Isaac) @kevinbazira moving a slack conversation to here: what's the motivation behind not publishing these events? my two concerns (and let me k...
[16:52:17] isaranto: yeah... I really do not understand what happened over there... I will definitely rerun it with the same configuration to reproduce it, but after finishing the main experiments. In general I think we need to play around with the `batch size`, `nsamples` and `textsize`, keeping the quantization `bits` fixed at 4.
[17:32:24] ¯\_(ツ)_/¯
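(Editorial sketch, not part of the log: one way the `batch`, `text_size` and `n_samples` knobs from the config pasted earlier could map onto a GPTQModel calibration run, roughly following the GPTQModel README. The model id, the c4 field names, and the exact API calls are assumptions and may differ from the team's actual pipeline, which tokenizes the data before quantizing.)

```python
# Hypothetical calibration + quantization sketch, mirroring the bits/batch/text_size/
# n_samples values pasted earlier in the log. API names follow the GPTQModel README
# at the time of writing and may differ between versions.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig
from transformers import AutoTokenizer

model_id = "CohereForAI/aya-expanse-32b"  # placeholder model id
n_samples, text_size, batch = 512, 512, 4

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Stream allenai/c4 so only the calibration samples are downloaded.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
calibration = []
for row in c4:
    ids = tokenizer(row["text"], truncation=True, max_length=text_size)["input_ids"]
    # Decode back to text clipped at text_size tokens, dropping BOS/EOS/pad tokens.
    calibration.append(tokenizer.decode(ids, skip_special_tokens=True))
    if len(calibration) >= n_samples:
        break

model = GPTQModel.load(model_id, QuantizeConfig(bits=4))
model.quantize(calibration, batch_size=batch)
model.save("aya-expanse-32b-gptq-4bit")
```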