[07:27:41] o/ morning!
[08:00:28] (PS1) Ilias Sarantopoulos: llm: try prebuilt flash attn package [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344)
[08:02:09] (CR) Ilias Sarantopoulos: "I'd like to make this attempt and see if it works. We will have to create a new base image with either PyTorch 2.5.1/ROCm 6.1 or PyTorch 2." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344) (owner: Ilias Sarantopoulos)
[08:02:22] (CR) CI reject: [V:-1] llm: try prebuilt flash attn package [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344) (owner: Ilias Sarantopoulos)
[08:53:15] (PS1) Ilias Sarantopoulos: llm: add aya to __init__ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100997 (https://phabricator.wikimedia.org/T379052)
[09:11:17] (CR) Kevin Bazira: [C:+1] llm: add aya to __init__ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100997 (https://phabricator.wikimedia.org/T379052) (owner: Ilias Sarantopoulos)
[09:12:12] (CR) Ilias Sarantopoulos: [C:+2] llm: add aya to __init__ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100997 (https://phabricator.wikimedia.org/T379052) (owner: Ilias Sarantopoulos)
[09:12:57] (Merged) jenkins-bot: llm: add aya to __init__ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100997 (https://phabricator.wikimedia.org/T379052) (owner: Ilias Sarantopoulos)
[09:13:19] o/
[09:13:22] TIL: AWQ-quantized models end up being converted to GPTQ-format models at load time:
[09:13:22] https://huggingface.co/docs/optimum/main/en/amd/amdgpu/overview#awq-quantization
[09:17:55] TIL as well
[09:18:53] the docs about AMD ROCm usage seem to be all over the place. It would be nice if we created something of our own from what we've learned
[09:50:43] yep, that would be nice.
[09:50:43] I've been looking at Flash Attention and IIRC you ran into issues with it. Did you manage to build it from source?
[09:50:43] https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html#getting-started
[10:02:48] yes we did https://phabricator.wikimedia.org/T371344#10368602
[10:03:29] my first attempt to deploy aya with bitsandbytes failed. need to fix the way the llm model server loads the class
[10:03:32] * isaranto afk for 1h
[10:08:04] T371344#10368602 nice!
[12:26:56] artificial-intelligence, Machine-Learning-Team, Bad-Words-Detection-System, revscoring: Language assets for Azerbaijani - https://phabricator.wikimedia.org/T162014#10386299 (Nemoralis) @Halfak Hi! I might help you with this. It looks like labels.wmflabs.org is not working
[12:30:11] artificial-intelligence, Machine-Learning-Team, Edit-Review-Improvements-RC-Page, editquality-modeling, and 2 others: Add new recent changes filters to az.wiki - https://phabricator.wikimedia.org/T310691#10386312 (Nemoralis) labels.wmflabs.org is not working
[12:43:15] (PS1) Ilias Sarantopoulos: llm: fix circular imports and LLM_CLASS paths [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101039
[12:44:24] (PS2) Ilias Sarantopoulos: llm: fix circular imports and LLM_CLASS paths [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101039
[12:48:17] the above module may need more work to make it more robust; I filed a fix for the time being, which I tested locally
[13:15:35] (CR) Nik Gkountas: "I have submitted an alternative approach in this patch here (I22466465eaceef43454f25b737c80291705265af), that includes a more extensive re" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1100055 (https://phabricator.wikimedia.org/T381366) (owner: Santhosh)
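[Editor's note] On the "fix circular imports and LLM_CLASS paths" patch above: resolving a class from a dotted path string is typically a couple of lines of importlib, and doing the import at call time rather than module-import time is a common way to sidestep circular imports. A minimal sketch, assuming LLM_CLASS holds a "package.module.ClassName"-style path; the helper name `load_class` is hypothetical, not the actual inference-services code:

```python
import importlib


def load_class(dotted_path: str):
    """Resolve a 'package.module.ClassName' string to the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    # Importing here, at call time, avoids import cycles that can occur
    # when two modules reference each other at module-import time.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


print(load_class("collections.OrderedDict"))
```

Usage: the model server could keep only the string in its config and call `load_class` when instantiating, so no module has to import the concrete class at the top level.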
[13:36:19] (CR) Kevin Bazira: [C:+1] llm: fix circular imports and LLM_CLASS paths [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101039 (owner: Ilias Sarantopoulos)
[13:46:08] kevinbazira: thank you for the review(s) and sorry for the noise!
[13:46:16] trying to give this a shot
[13:47:17] a little context on the flash attention front: I created a fork of the rocm/flash-attention repo with the sole purpose of making a release of the package I've built on ml-labs
[13:47:18] https://github.com/isaranto/flash-attention/releases/tag/v2.7.0-py3.11
[13:48:05] isaranto: no problem. can't wait to see this running! :)
[13:48:25] the upstream instructions + code produce an .egg release, which is an old build format. I want to convert that to a Python wheel (.whl) so that I can pip install it in https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1100995
[13:49:01] I did the conversion with `wheel convert PACKAGE_NAME.egg`, but for the moment I get: ERROR: flash_attn-2.7.0.post2-py311-cp311-linux_x86_64.whl is not a supported wheel on this platform
[13:49:55] I can check it again later, but if anyone wants to give it a shot as well you're more than welcome :)
[13:50:23] in any case I will update the flash attn task later with my recent findings, failures and successes :P
[13:52:13] (CR) Ilias Sarantopoulos: [C:+2] "Let's give it a try..." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101039 (owner: Ilias Sarantopoulos)
[13:52:58] (Merged) jenkins-bot: llm: fix circular imports and LLM_CLASS paths [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101039 (owner: Ilias Sarantopoulos)
[15:18:02] Machine-Learning-Team, Structured-Data-Backlog (Current Work): [M] Create the logo detection model card - https://phabricator.wikimedia.org/T370759#10386727 (mfossati) In progress → Resolved. Published at https://meta.wikimedia.org/wiki/Machine_learning_models/Production/gogologo. Closing.
[16:14:17] argh, I used a wrong argument: device instead of device_map
[16:17:39] (PS1) Ilias Sarantopoulos: llm: fix device_map argument [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101079
[16:20:27] (CR) Ilias Sarantopoulos: [C:+2] llm: fix device_map argument [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101079 (owner: Ilias Sarantopoulos)
[17:02:46] Lift-Wing, Machine-Learning-Team: [LLM] quantization: allow loading model weights as int8/int4 with HF - https://phabricator.wikimedia.org/T377848#10387148 (isarantopoulos) I took a shot at deploying aya8b with bitsandbytes on Lift Wing using the LLM model server and got the following failures which I wo...
[17:32:42] going afk folks, have a nice weekend!
[19:50:01] Lift-Wing, Machine-Learning-Team: [LLM] quantization: allow loading model weights as int8/int4 with HF - https://phabricator.wikimedia.org/T377848#10387649 (isarantopoulos) **bitsandbytes** It seems that `rocminfo` runs and there is no option for the user to provide the GPU architecture info otherwise....
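[Editor's note] The "is not a supported wheel on this platform" error above most likely comes from the tag triple that `wheel convert` writes into the filename: per PEP 427 a wheel name ends in {python tag}-{abi tag}-{platform tag}.whl, and pip only installs combinations it recognizes. A generic `py311` python tag paired with the `cp311` ABI tag is not in pip's compatibility set, whereas `cp311-cp311-linux_x86_64` is. Since the egg really was built under CPython 3.11, renaming the tag may be enough. A small stdlib-only sketch of that rename (the helper name `retag_wheel` is mine):

```python
def retag_wheel(filename: str, new_python_tag: str) -> str:
    """Rename the python tag in a PEP 427 wheel filename.

    Wheel filenames end in {python tag}-{abi tag}-{platform tag}.whl,
    so the python tag is always the third dash-separated component
    from the end, even when the version itself contains dots.
    """
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    parts[-3] = new_python_tag  # e.g. py311 -> cp311
    return "-".join(parts) + ".whl"


print(retag_wheel(
    "flash_attn-2.7.0.post2-py311-cp311-linux_x86_64.whl", "cp311"))
# flash_attn-2.7.0.post2-cp311-cp311-linux_x86_64.whl
```

Note the wheel's internal `WHEEL` metadata carries the same tags, so a rename alone may still need a matching metadata edit (`wheel tags` in newer versions of the wheel tool can do both).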
[21:36:28] Lift-Wing, Machine-Learning-Team, OKR-Work, Patch-For-Review: Request to host article-country model on Lift Wing - https://phabricator.wikimedia.org/T371897#10387931 (Isaac) @kevinbazira one last thing and then I think good to move onto the last stage: English seems to be hard-coded as the langua...
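[Editor's note] Circling back to the device/device_map mix-up and the bitsandbytes experiments earlier in the log: both come down to which keyword arguments reach Hugging Face's `from_pretrained`, which expects a `device_map` placement strategy (e.g. "auto", which requires accelerate), not a `device` keyword. A hypothetical sketch of assembling those kwargs; the helper name is mine, and `quantization_config` is shown as a plain dict so the sketch runs without transformers, where in real code it would be a `transformers.BitsAndBytesConfig`:

```python
def build_load_kwargs(quantize_8bit: bool = False) -> dict:
    """Assemble keyword arguments for a from_pretrained-style call.

    Hypothetical helper: the key point is that placement is controlled
    by `device_map`, not a `device` keyword.
    """
    kwargs = {"device_map": "auto"}  # let accelerate place the weights
    if quantize_8bit:
        # stand-in for transformers.BitsAndBytesConfig(load_in_8bit=True)
        kwargs["quantization_config"] = {"load_in_8bit": True}
    return kwargs


print(build_load_kwargs(quantize_8bit=True))
```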