[07:09:17] o/ thanks for the review, Ilias
[07:09:17] going to deploy now ...
[07:09:17] friday deployment 🤞
[07:32:41] 10Lift-Wing, 06Machine-Learning-Team, 07OKR-Work: Stop publishing events without article-country predictions - https://phabricator.wikimedia.org/T385771#10531220 (10kevinbazira) article-country model-server no longer publishes events when prediction results are empty - new model-server deployed in LiftWing pr...
[07:33:01] ^-- deployment looks good. now all article-country events have prediction results.
[07:51:27] o/ good morning Kevin
[08:30:02] o/ good morning folks
[08:37:22] o\ Goedemorgen George
[08:37:34] *\o
[08:50:31] kalimera ilias
[08:57:32] 10Lift-Wing, 06Machine-Learning-Team, 13Patch-For-Review: Quantize aya-expanse-32B with GPTQ (GPTQModel) - https://phabricator.wikimedia.org/T384734#10531276 (10isarantopoulos) The build/installation [[ https://github.com/ModelCloud/GPTQModel/issues/1222#issuecomment-2641758113 | issue has been resolved ]]....
[09:12:25] georgekyz: they fixed the issue with GPTQModel right away, I tested and it works!
[09:13:13] isaranto: that's cool! So now we can install GPTQModel from source without using the wheel, right?
[09:15:07] yes
[09:41:07] 10Lift-Wing, 06Machine-Learning-Team, 13Patch-For-Review: Quantize aya-expanse-32B with GPTQ (GPTQModel) - https://phabricator.wikimedia.org/T384734#10531327 (10isarantopoulos) GPTQModel is integrated in [[ https://github.com/huggingface/optimum/releases/tag/v1.24.0 | hf optimum in release v1.24.0 ]]. Copying...
[10:17:21] quantized aya has some sense of humour:
[10:17:21] ```
[10:17:21] - "question":"Wikimedia foundation is",
[10:17:21] - "answer":"Wikimedia foundation is the place for dating site.\nThe Wikimedia movement is a group of people who are passionate about"
[10:17:21] ```
[10:27:26] lollll
[11:09:00] lol that says a lot about the quantization quality I guess
[11:18:06] Btw, I found out what the BOS_TOKEN is: Beginning of sentence. It comes from tokenizers. Some have options to turn it off (there is also EOS_TOKEN, end of sentence)
[11:19:04] AIUI, it's also relevant to inputs, but I haven't dug into that
[11:43:39] klausman: Yeap, this is coming from the tokenizer, for now I am using the `AutoTokenizer` from the `transformers` library.
[11:47:55] georgekyz: I was wondering if the issue we got with all the padding tokens was related to the tokenizer, for example if we were loading a different one or sth similar
[11:52:57] * klausman lunch
[12:10:27] isaranto: To be honest I still do not understand what happened with the padding tokens. I am now using a technique which first tokenizes the data and then quantizes the model. Based on my recent experiments the padding issue happened only on a single experiment with this config:
[12:10:27] ```
[12:10:27] bits: 8
[12:10:27] batch: 4
[12:10:27] text_size: 512
[12:10:27] n_samples: 512
[12:10:28] data: allenai/c4
[12:10:28] ```
[12:11:13] ack!
[12:23:26] TIL https://docs.docker.com/build/bake/
[12:44:46] isaranto: o/ how do you check the resource usage for a pod?
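(Editorial aside, not part of the log: a quick way to see the BOS/EOS/padding special tokens discussed above. `AutoTokenizer` exposes them as attributes, and `add_special_tokens` controls whether they are added to encoded inputs. This is a minimal sketch; the model id is only a placeholder, not necessarily the one used in these experiments.)

```python
# Minimal sketch: inspect a Hugging Face tokenizer's special tokens.
# The model id is a placeholder assumption; any causal-LM tokenizer works the same way.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/aya-expanse-32b")

# BOS = beginning of sequence, EOS = end of sequence; pad_token may be unset for some models.
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.pad_token)

# With special tokens the tokenizer may prepend BOS (and/or append EOS) on its own ...
with_special = tokenizer("Wikimedia foundation is", add_special_tokens=True)
# ... without them you only get the raw text tokens.
without_special = tokenizer("Wikimedia foundation is", add_special_tokens=False)

print(with_special["input_ids"])
print(without_special["input_ids"])
```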
[12:45:16] grafana would be the easier, lemme find the link
[12:45:28] *easiest way
[12:46:42] https://grafana.wikimedia.org/d/-D2KNUEGk/kubernetes-pod-details?orgId=1&var-datasource=codfw+prometheus%2Fk8s-mlserve&var-namespace=revision-models&var-pod=reference-quality-predictor-00006-deployment-76769b7487-q7xjn&var-container=All&from=1738928787029&to=1738932387030
[12:47:26] in this dashboard you don't see the total per pod but you do get a breakdown per container
[12:47:49] georgekyz: don't remember if we've shown you these dashboards --^
[12:48:18] thanks!! that's what I'm looking for
[13:51:16] o/ I've added the example from yesterday's meeting to the LW stream docs: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Streams#Life_cycle_of_a_LiftWing_event
[13:51:16] here is the code that renders the video: https://gist.github.com/kevinbazira/51cc8289485670682cb2f216f6ffc289
[13:59:43] oh nice! I want to try manim. thank you!
[14:04:35] it's super cool!
[15:12:47] 06Machine-Learning-Team, 13Patch-For-Review, 10Release Pipeline (Blubber): Adding uv as a package manager on Lift Wing/blubber - https://phabricator.wikimedia.org/T384584#10531962 (10taavi)
[16:15:24] 10Lift-Wing, 06Machine-Learning-Team, 13Patch-For-Review: Quantize aya-expanse-32B with GPTQ (GPTQModel) - https://phabricator.wikimedia.org/T384734#10532100 (10gkyziridis) | Experiment | Dataset | Batch | Bits | TextSize | NSamples | Results | No-padding-tokens |...
[16:23:53] Nice work, George! the 2bit vs 8bit results are really interesting
[16:31:57] 10Lift-Wing, 06Machine-Learning-Team, 07OKR-Work: Stop publishing events without article-country predictions - https://phabricator.wikimedia.org/T385771#10532170 (10Isaac) @kevinbazira moving a slack conversation to here: what's the motivation behind not publishing these events? my two concerns (and let me k...
[16:52:17] isaranto: yeah... I really do not understand what happened over there... I will definitely rerun it with the same configuration to reproduce it, but after finishing the main experiments. In general I think we need to play around with the `batch size`, `nsamples` and `textsize`, keeping the quantization `bits` fixed at 4.
[17:32:24] ¯\_(ツ)_/¯
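(Editorial sketch, not part of the log: one way the `batch`, `text_size` and `n_samples` knobs from the config pasted earlier could map onto a GPTQModel calibration run, roughly following the GPTQModel README. The model id, the c4 field names, and the exact API calls are assumptions and may differ from the team's actual pipeline, which tokenizes the data before quantizing.)

```python
# Hypothetical calibration + quantization sketch, mirroring the bits/batch/text_size/
# n_samples values pasted earlier in the log. API names follow the GPTQModel README
# at the time of writing and may differ between versions.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig
from transformers import AutoTokenizer

model_id = "CohereForAI/aya-expanse-32b"  # placeholder model id
n_samples, text_size, batch = 512, 512, 4

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Stream allenai/c4 so only the calibration samples are downloaded.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
calibration = []
for row in c4:
    ids = tokenizer(row["text"], truncation=True, max_length=text_size)["input_ids"]
    # Decode back to text clipped at text_size tokens, dropping BOS/EOS/pad tokens.
    calibration.append(tokenizer.decode(ids, skip_special_tokens=True))
    if len(calibration) >= n_samples:
        break

model = GPTQModel.load(model_id, QuantizeConfig(bits=4))
model.quantize(calibration, batch_size=batch)
model.save("aya-expanse-32b-gptq-4bit")
```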