[07:27:41] o/ morning!
[08:00:28] (PS1) Ilias Sarantopoulos: llm: try prebuilt flash attn package [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344)
[08:02:09] (CR) Ilias Sarantopoulos: "I'd like to make this attempt and see if it works. We will have to create a new base image with either PyTorch 2.5.1/ROCm 6.1 or PyTorch 2." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344) (owner: Ilias Sarantopoulos)
[08:02:22] (CR) CI reject: [V:-1] llm: try prebuilt flash attn package [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344) (owner: Ilias Sarantopoulos)
[08:53:15] (PS1) Ilias Sarantopoulos: llm: add aya to __init__ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100997 (https://phabricator.wikimedia.org/T379052)
[09:11:17] (CR) Kevin Bazira: [C:+1] llm: add aya to __init__ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100997 (https://phabricator.wikimedia.org/T379052) (owner: Ilias Sarantopoulos)
[09:12:12] (CR) Ilias Sarantopoulos: [C:+2] llm: add aya to __init__ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100997 (https://phabricator.wikimedia.org/T379052) (owner: Ilias Sarantopoulos)
[09:12:57] (Merged) jenkins-bot: llm: add aya to __init__ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100997 (https://phabricator.wikimedia.org/T379052) (owner: Ilias Sarantopoulos)
[09:13:19] o/
[09:13:22] TIL: AWQ-quantized models end up being converted to GPTQ-format models at load time:
[09:13:22] https://huggingface.co/docs/optimum/main/en/amd/amdgpu/overview#awq-quantization
[09:17:55] TIL as well
[09:18:53] the docs about AMD ROCm usage seem to be all over the place. It would be nice if we created something of our own from what we've learned
[09:50:43] yep, that would be nice.
[09:50:43] I've been looking at Flash Attention and IIRC you ran into issues with it. Did you manage to build it from source?
[09:50:43] https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html#getting-started
[10:02:48] yes we did https://phabricator.wikimedia.org/T371344#10368602
[10:03:29] my first attempt to deploy aya with bitsandbytes failed. need to fix the way the llm model server loads the class
[10:03:32] * isaranto afk for 1h
[10:08:04] T371344#10368602 nice!
[12:26:56] artificial-intelligence, Machine-Learning-Team, Bad-Words-Detection-System, revscoring: Language assets for Azerbaijani - https://phabricator.wikimedia.org/T162014#10386299 (Nemoralis) @Halfak Hi! I might help you with this. It looks like labels.wmflabs.org is not working
[12:30:11] artificial-intelligence, Machine-Learning-Team, Edit-Review-Improvements-RC-Page, editquality-modeling, and 2 others: Add new recent changes filters to az.wiki - https://phabricator.wikimedia.org/T310691#10386312 (Nemoralis) labels.wmflabs.org is not working
[12:43:15] (PS1) Ilias Sarantopoulos: llm: fix circular imports and LLM_CLASS paths [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101039
[12:44:24] (PS2) Ilias Sarantopoulos: llm: fix circular imports and LLM_CLASS paths [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101039
[12:48:17] the above module may need more work to make it more robust; I filed a fix for the time being, which I tested locally
[13:15:35] (CR) Nik Gkountas: "I have submitted an alternative approach in this patch here (I22466465eaceef43454f25b737c80291705265af), that includes a more extensive re" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1100055 (https://phabricator.wikimedia.org/T381366) (owner: Santhosh)
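[Editor's note] On the "fix circular imports and LLM_CLASS paths" patch above: resolving a class from a dotted path string is typically a couple of lines of importlib, and doing the import at call time rather than module-import time is a common way to sidestep circular imports. A minimal sketch, assuming LLM_CLASS holds a "package.module.ClassName"-style path; the helper name `load_class` is hypothetical, not the actual inference-services code:

```python
import importlib


def load_class(dotted_path: str):
    """Resolve a 'package.module.ClassName' string to the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    # Importing here, at call time, avoids import cycles that can occur
    # when two modules reference each other at module-import time.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


print(load_class("collections.OrderedDict"))
```

Usage: the model server could keep only the string in its config and call `load_class` when instantiating, so no module has to import the concrete class at the top level.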
[13:36:19] (CR) Kevin Bazira: [C:+1] llm: fix circular imports and LLM_CLASS paths [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101039 (owner: Ilias Sarantopoulos)
[13:46:08] kevinbazira: thank you for the review(s) and sorry for the noise!
[13:46:16] trying to give this a shot
[13:47:17] a little context on the flash attention front: I created a fork of the rocm/flash-attention repo with the sole purpose of making a release of the package I've built on ml-labs
[13:47:18] https://github.com/isaranto/flash-attention/releases/tag/v2.7.0-py3.11
[13:48:05] isaranto: no problem. can't wait to see this running! :)
[13:48:25] the upstream instructions + code produce an .egg release, which is an old build format. I want to convert that to a Python wheel (.whl) so that I can pip install it in https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1100995
[13:49:01] I did the conversion with `wheel convert PACKAGE_NAME.egg`, but for the moment I get: ERROR: flash_attn-2.7.0.post2-py311-cp311-linux_x86_64.whl is not a supported wheel on this platform
[13:49:55] I can check it again later, but if anyone wants to give it a shot as well you're more than welcome :)
[13:50:23] in any case I will update the flash attn task later with my recent findings, failures and successes :P
[13:52:13] (CR) Ilias Sarantopoulos: [C:+2] "Let's give it a try..." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101039 (owner: Ilias Sarantopoulos)
[13:52:58] (Merged) jenkins-bot: llm: fix circular imports and LLM_CLASS paths [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101039 (owner: Ilias Sarantopoulos)
[15:18:02] Machine-Learning-Team, Structured-Data-Backlog (Current Work): [M] Create the logo detection model card - https://phabricator.wikimedia.org/T370759#10386727 (mfossati) In progress → Resolved. Published at https://meta.wikimedia.org/wiki/Machine_learning_models/Production/gogologo. Closing.
[16:14:17] argh, I used a wrong argument: device instead of device_map
[16:17:39] (PS1) Ilias Sarantopoulos: llm: fix device_map argument [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101079
[16:20:27] (CR) Ilias Sarantopoulos: [C:+2] llm: fix device_map argument [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101079 (owner: Ilias Sarantopoulos)
[17:02:46] Lift-Wing, Machine-Learning-Team: [LLM] quantization: allow loading model weights as int8/int4 with HF - https://phabricator.wikimedia.org/T377848#10387148 (isarantopoulos) I took a shot at deploying aya8b with bitsandbytes on Lift Wing using the LLM model server and got the following failures which I wo...
[17:32:42] going afk folks, have a nice weekend!
[19:50:01] Lift-Wing, Machine-Learning-Team: [LLM] quantization: allow loading model weights as int8/int4 with HF - https://phabricator.wikimedia.org/T377848#10387649 (isarantopoulos) **bitsandbytes** It seems that `rocminfo` runs and there is no option for the user to provide the GPU architecture info otherwise....
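[Editor's note] The "is not a supported wheel on this platform" error above most likely comes from the tag triple that `wheel convert` writes into the filename: per PEP 427 a wheel name ends in {python tag}-{abi tag}-{platform tag}.whl, and pip only installs combinations it recognizes. A generic `py311` python tag paired with the `cp311` ABI tag is not in pip's compatibility set, whereas `cp311-cp311-linux_x86_64` is. Since the egg really was built under CPython 3.11, renaming the tag may be enough. A small stdlib-only sketch of that rename (the helper name `retag_wheel` is mine):

```python
def retag_wheel(filename: str, new_python_tag: str) -> str:
    """Rename the python tag in a PEP 427 wheel filename.

    Wheel filenames end in {python tag}-{abi tag}-{platform tag}.whl,
    so the python tag is always the third dash-separated component
    from the end, even when the version itself contains dots.
    """
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    parts[-3] = new_python_tag  # e.g. py311 -> cp311
    return "-".join(parts) + ".whl"


print(retag_wheel(
    "flash_attn-2.7.0.post2-py311-cp311-linux_x86_64.whl", "cp311"))
# flash_attn-2.7.0.post2-cp311-cp311-linux_x86_64.whl
```

Note the wheel's internal `WHEEL` metadata carries the same tags, so a rename alone may still need a matching metadata edit (`wheel tags` in newer versions of the wheel tool can do both).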
[21:36:28] Lift-Wing, Machine-Learning-Team, OKR-Work, Patch-For-Review: Request to host article-country model on Lift Wing - https://phabricator.wikimedia.org/T371897#10387931 (Isaac) @kevinbazira one last thing and then I think good to move onto the last stage: English seems to be hard-coded as the langua...
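[Editor's note] Circling back to the device/device_map mix-up and the bitsandbytes experiments earlier in the log: both come down to which keyword arguments reach Hugging Face's `from_pretrained`, which expects a `device_map` placement strategy (e.g. "auto", which requires accelerate), not a `device` keyword. A hypothetical sketch of assembling those kwargs; the helper name is mine, and `quantization_config` is shown as a plain dict so the sketch runs without transformers, where in real code it would be a `transformers.BitsAndBytesConfig`:

```python
def build_load_kwargs(quantize_8bit: bool = False) -> dict:
    """Assemble keyword arguments for a from_pretrained-style call.

    Hypothetical helper: the key point is that placement is controlled
    by `device_map`, not a `device` keyword.
    """
    kwargs = {"device_map": "auto"}  # let accelerate place the weights
    if quantize_8bit:
        # stand-in for transformers.BitsAndBytesConfig(load_in_8bit=True)
        kwargs["quantization_config"] = {"load_in_8bit": True}
    return kwargs


print(build_load_kwargs(quantize_8bit=True))
```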