[03:57:15] (CR) Santhosh: [C:-1] "I have difficulty in imagining our usecase as candidate for strategy pattern. In the past I had used strategy pattern successfully in situ" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1100512 (https://phabricator.wikimedia.org/T381366) (owner: Nik Gkountas)
[06:07:23] o/ In order to test the latest changes in the experimental ns, I edited the deployment config for the article-country isvc directly.
[06:07:23] I've pushed a patch for this change here: https://gerrit.wikimedia.org/r/1101741
[08:40:21] good morning o/
[08:45:22] (PS1) Ilias Sarantopoulos: llm: update torch image to 2.5.1 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101804 (https://phabricator.wikimedia.org/T377848)
[08:55:22] Machine-Learning-Team, Patch-For-Review: [LLM] Use Flash attention 2 for GPU inference - https://phabricator.wikimedia.org/T371344#10392725 (isarantopoulos) It seems that the package we built on ml-labs for flash-attention2 can't be easily installed on a different environment. (tested both by me and Muni...
[08:55:49] kevinbazira: here is more context on the flash attention errors --^
[08:58:23] isaranto: thanks for sharing the context. I've run into the same error and it seems to be caused by running the wheel with a different python version: https://github.com/carla-simulator/carla/discussions/5053
[08:59:39] still investigating... will share in case I find a solution
[09:00:03] Machine-Learning-Team, Patch-For-Review: [LLM] Use Flash attention 2 for GPU inference - https://phabricator.wikimedia.org/T371344#10392746 (isarantopoulos) @MunizaA was able to build a wheel which can be installed via the following way ` pip install build GPU_ARCHS=gfx90a PYTORCH_ROCM_ARCH=gfx90a python...
[09:00:25] the above seems to work --^
[09:07:35] https://pytorch.org/blog/vllm-joins-pytorch/?utm_content=318786429&utm_medium=social&utm_source=linkedin&hss_channel=lcp-78618366
[09:20:36] noiiiice!
[09:33:46] thanks for the reviews Ilias!
[09:33:46] I've synced the article-country deployment in the experimental ns
[09:33:46] going to deploy in the article-models ns
[09:38:20] ok, thank u!
[09:38:31] kevinbazira: is it ok if I merge this to try it out? https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1101804
[09:39:01] (CR) Kevin Bazira: [C:+1] llm: update torch image to 2.5.1 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101804 (https://phabricator.wikimedia.org/T377848) (owner: Ilias Sarantopoulos)
[09:39:19] isaranto: yes please, I've +1'ed.
[09:39:35] weebale!
[09:39:54] np! :D
[09:40:07] now I remember it :D
[09:40:19] (CR) Ilias Sarantopoulos: [C:+2] llm: update torch image to 2.5.1 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101804 (https://phabricator.wikimedia.org/T377848) (owner: Ilias Sarantopoulos)
[09:41:05] (Merged) jenkins-bot: llm: update torch image to 2.5.1 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1101804 (https://phabricator.wikimedia.org/T377848) (owner: Ilias Sarantopoulos)
[10:23:44] (PS4) Ilias Sarantopoulos: llm: try prebuilt flash attn package [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344)
[10:24:07] (CR) Ilias Sarantopoulos: "I am now using the new torch 2.5.1 image" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344) (owner: Ilias Sarantopoulos)
[10:26:02] (PS5) Ilias Sarantopoulos: llm: try prebuilt flash attn package [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344)
[10:52:26] (CR) Kevin Bazira: [C:+1] llm: try prebuilt flash attn package [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344) (owner: Ilias Sarantopoulos)
[10:58:14] * klausman lunch
[11:41:46] Lift-Wing, Machine-Learning-Team: Build and Publish ROCm-Compatible Python Packages - https://phabricator.wikimedia.org/T381859 (isarantopoulos) NEW
[11:45:13] Lift-Wing, Machine-Learning-Team: Build and Publish ROCm-Compatible Python Packages - https://phabricator.wikimedia.org/T381859#10393206 (isarantopoulos)
[11:45:21] Lift-Wing, Machine-Learning-Team: Build and Publish ROCm-Compatible Python Packages - https://phabricator.wikimedia.org/T381859#10393207 (isarantopoulos)
[11:49:28] * isaranto afk lunch
[13:00:38] Machine-Learning-Team, Data-Platform-SRE (2024.11.30 - 2024.12.20): Move Lab machines into analytics net for DL access and switch to homedirs on Ceph - https://phabricator.wikimedia.org/T380279#10393510 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=6d95cc0c-dd50-4765-a5b7-63aefaaa6960...
[13:12:29] (PS1) Nik Gkountas: performance: Process only the required number of collection articles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101854 (https://phabricator.wikimedia.org/T381366)
[13:17:05] I've been able to build a Flash Attention 2 wheel using:
[13:17:05] `GPU_ARCHS=gfx90a PYTORCH_ROCM_ARCH=gfx90a python3 setup.py bdist_wheel`
[13:17:05] more details: https://phabricator.wikimedia.org/P71677
[13:24:15] Neat!
[13:29:36] nice!
[13:30:30] (CR) Ilias Sarantopoulos: [C:+2] llm: try prebuilt flash attn package [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344) (owner: Ilias Sarantopoulos)
[13:31:15] (Merged) jenkins-bot: llm: try prebuilt flash attn package [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100995 (https://phabricator.wikimedia.org/T371344) (owner: Ilias Sarantopoulos)
[13:32:35] python -m build is probably the more modern approach we should stick to (it does use bdist_wheel internally iirc)
[13:33:19] I was able to install the flash attention package in other environments and used it also in CI
[13:34:22] although I didn't test this one -which I should
[13:39:20] Lift-Wing, Machine-Learning-Team: [LLM] quantization: allow loading model weights as int8/int4 with HF - https://phabricator.wikimedia.org/T377848#10393659 (isarantopoulos) **bitsandbytes** Deployed bitsandbytes with aya-expanse-8B on experimental ns in ml-staging-codfw: Since this uses the llm model s...
[13:57:58] kevinbazira: my approach failed. Where is the wheel u built located so I can try it?
[14:03:11] isaranto: it's located in:
[14:03:22] ```
[14:03:22] /home/kevinbazira/build_flash_attention_conda/install_wheel/from_bdist_wheel
[14:03:22] ```
[14:28:59] Machine-Learning-Team, Patch-For-Review: [LLM] Use Flash attention 2 for GPU inference - https://phabricator.wikimedia.org/T371344#10393805 (isarantopoulos) a:kevinbazira
[14:29:17] Lift-Wing, Machine-Learning-Team, Patch-For-Review: [LLM] Use vllm for ROCm in huggingface image - https://phabricator.wikimedia.org/T370149#10393809 (isarantopoulos) a:isarantopoulos→None
[14:34:33] Lift-Wing, Machine-Learning-Team: Requesting write access to ml-serve-{eqiad,codfq} for ML team - https://phabricator.wikimedia.org/T381883 (isarantopoulos) NEW
[14:47:49] Machine-Learning-Team, Data-Platform-SRE (2024.11.30 - 2024.12.20): Move Lab machines into analytics net for DL access and switch to homedirs on Ceph - https://phabricator.wikimedia.org/T380279#10393892 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by klausman@cumin1002 for hosts: `ml-la...
[15:01:27] (CR) Sbisson: "This patch is causing errors for me locally: https://phabricator.wikimedia.org/P71680" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1099849 (owner: Santhosh)
[15:20:09] (PS1) Nik Gkountas: Fix source and target for event logging of page-collections endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101890
[15:20:49] (CR) CI reject: [V:-1] Fix source and target for event logging of page-collections endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101890 (owner: Nik Gkountas)
[15:24:08] Machine-Learning-Team, MediaWiki-extensions-ORES, Edit-Review-Improvements-RC-Page, MediaWiki-Recent-changes, Moderator-Tools-Team: Expose ORES topics in recent changes filters - https://phabricator.wikimedia.org/T245906#10394016 (Samwalton9-WMF)
[15:26:06] Machine-Learning-Team: Debian hipcc package conflicts with hipcc from AMD's ROCm repository - https://phabricator.wikimedia.org/T381567#10394037 (isarantopoulos) Open→Resolved
[15:26:40] Lift-Wing, Machine-Learning-Team: Requesting write access to ml-serve-{eqiad,codfq} for ML team - https://phabricator.wikimedia.org/T381883#10394039 (isarantopoulos)
[15:53:37] (PS2) Sbisson: Fix source and target for event logging of page-collections endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101890 (owner: Nik Gkountas)
[15:55:02] (CR) Sbisson: [C:+2] performance: Process only the required number of collection articles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101854 (https://phabricator.wikimedia.org/T381366) (owner: Nik Gkountas)
[15:55:18] (CR) Sbisson: [C:+2] Fix source and target for event logging of page-collections endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101890 (owner: Nik Gkountas)
[15:55:41] (Merged) jenkins-bot: performance: Process only the required number of collection articles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101854 (https://phabricator.wikimedia.org/T381366) (owner: Nik Gkountas)
[15:56:58] (Merged) jenkins-bot: Fix source and target for event logging of page-collections endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101890 (owner: Nik Gkountas)
[15:59:57] (CR) Sbisson: "Can this be rebased on the main branch so it is not blocked on the refactoring?" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101569 (https://phabricator.wikimedia.org/T381777) (owner: Nik Gkountas)
[16:08:28] Machine-Learning-Team, MediaWiki-extensions-ORES, Edit-Review-Improvements-RC-Page, MediaWiki-Recent-changes, Moderator-Tools-Team: [SPIKE] How could we add topic filtering to Recent Changes? - https://phabricator.wikimedia.org/T381569#10394217 (Samwalton9-WMF)
[16:15:49] Machine-Learning-Team, MediaWiki-extensions-ORES, Edit-Review-Improvements-RC-Page, MediaWiki-Recent-changes, Moderator-Tools-Team: [SPIKE] How could we add topic filtering to Recent Changes? [16H] - https://phabricator.wikimedia.org/T381569#10394230 (Scardenasmolinar)
[16:26:10] Machine-Learning-Team, Data-Platform-SRE (2024.11.30 - 2024.12.20): Move Lab machines into analytics net for DL access and switch to homedirs on Ceph - https://phabricator.wikimedia.org/T380279#10394267 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin1002 for host m...
[16:51:41] Machine-Learning-Team, Data-Platform-SRE (2024.11.30 - 2024.12.20): Move Lab machines into analytics net for DL access and switch to homedirs on Ceph - https://phabricator.wikimedia.org/T380279#10394354 (BTullis)
[17:12:38] Machine-Learning-Team, Data-Platform-SRE (2024.11.30 - 2024.12.20): Move Lab machines into analytics net for DL access and switch to homedirs on Ceph - https://phabricator.wikimedia.org/T380279#10394479 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin1002 for host ml-la...
[17:43:27] Machine-Learning-Team, DC-Ops, ops-eqiad: hw troubleshooting: Stuck/bugged BMC on ml-lab1002.eqiad.wmnet - https://phabricator.wikimedia.org/T381902#10394680 (klausman)
[17:53:00] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[18:35:23] I can reinstall the flash attention package in an env with torch 2.4.1 but not with 2.5.1
[18:35:40] we might need another production image with torch 2.4.1 +rocm6.1 after all
[18:53:42] Machine-Learning-Team: [LLM] Use Flash attention 2 for GPU inference - https://phabricator.wikimedia.org/T371344#10394886 (isarantopoulos) I pip installed the wheel that Kevin built in a new environment that had torch 2.5.1/rocm 6.1 and was able to load it and run inference with it. I will update the llm dep...
[18:56:14] Machine-Learning-Team, Patch-For-Review: Test the feasibility of deployment of Aya-23 model in LiftWing - https://phabricator.wikimedia.org/T379052#10394897 (isarantopoulos) Made an attempt to load the 32B on a deployment on Lift Wing and got the following error on model load ` /opt/lib/venv/lib/python3...
[18:57:28] going afk folks, cu tomorrow
[18:59:52] (PS2) Nik Gkountas: collections: add recommendation to the list only if not already present [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101569 (https://phabricator.wikimedia.org/T381777)
[19:29:26] (CR) Sbisson: [C:+2] collections: add recommendation to the list only if not already present [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101569 (https://phabricator.wikimedia.org/T381777) (owner: Nik Gkountas)
[19:30:05] (Merged) jenkins-bot: collections: add recommendation to the list only if not already present [research/recommendation-api] - https://gerrit.wikimedia.org/r/1101569 (https://phabricator.wikimedia.org/T381777) (owner: Nik Gkountas)
[20:44:07] FIRING: ErrorBudgetBurn: liftwing - liftwing-revscoring-latency - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[20:49:07] FIRING: [2x] ErrorBudgetBurn: liftwing - liftwing-revscoring-latency - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[21:53:00] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[22:04:07] FIRING: [2x] ErrorBudgetBurn: liftwing - liftwing-readability-latency - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[22:14:07] FIRING: [4x] ErrorBudgetBurn: liftwing - liftwing-revscoring-latency - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[22:44:07] FIRING: [2x] ErrorBudgetBurn: liftwing - liftwing-readability-latency - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[23:29:07] FIRING: [4x] ErrorBudgetBurn: liftwing - liftwing-revscoring-latency - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn