[00:11:35] Machine-Learning-Team: Evaluate adding caching mechanism for article topic model to make data available at scale - https://phabricator.wikimedia.org/T401778#11100720 (Eevans) >>! In T401778#11097502, @BWojtowicz-WMF wrote: > Thank you for the quick answers @Eevans! I'll schedule a call for us, where I will s...
[06:47:06] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11100975 (OKarakaya-WMF) @brouberol thank you I think analytics-ml user is missing some more permissions: ` skein.exceptions.DriverError: Fa...
[07:04:33] hello!
[07:08:11] good morning
[07:23:36] good morning!
[07:26:06] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11101008 (brouberol) @JAllemandou am I right in thinking that we need to add `u:analytics-ml:production` to https://gerrit.wikimedia.org/r/plugi...
[07:30:02] Machine-Learning-Team, Goal, OKR-Work: Q1 FY2025-26 Goal: Apply the Tone Check model to published articles, to learn whether we can build a pool of high-quality structured tasks for new editors - https://phabricator.wikimedia.org/T392283#11101012 (achou) >>! In T392283#11099092, @Michael wrote: >>>!...
[08:01:06] Machine-Learning-Team, Goal, OKR-Work: Q1 FY2025-26 Goal: Apply the Tone Check model to published articles, to learn whether we can build a pool of high-quality structured tasks for new editors - https://phabricator.wikimedia.org/T392283#11101044 (dcausse) @achou thanks for the ping! Yes you're corr...
[08:32:22] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15), Editing-team (Tracking): Build model training pipeline for tone check using WMF ML Airflow instance - https://phabricator.wikimedia.org/T396495#11101114 (brouberol) @gkyziridis I'm going to take this over from @BTullis as he's going...
[08:59:41] Machine-Learning-Team: Evaluate adding caching mechanism for article topic model to make data available at scale - https://phabricator.wikimedia.org/T401778#11101198 (BWojtowicz-WMF) Thank you for the explanations @Eevans! I see that I have some confusion around existing Cassandra deployments, I'm sorry for...
[09:03:22] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15), Editing-team (Tracking): Build model training pipeline for tone check using WMF ML Airflow instance - https://phabricator.wikimedia.org/T396495#11101239 (brouberol)
[09:33:40] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11101404 (brouberol) @OKarakaya-WMF I merged a patch adding `u:analytics-ml:production` to `/etc/hadoop/conf/capacity-scheduler.xml`: `lang=dif...
[10:05:40] Machine-Learning-Team, Goal, OKR-Work: Q1 FY2025-26 Goal: Apply the Tone Check model to published articles, to learn whether we can build a pool of high-quality structured tasks for new editors - https://phabricator.wikimedia.org/T392283#11101474 (Michael) Yes, that is something that would work well...
[10:29:50] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05), Editing-team (Tracking): Build model training pipeline for tone check using WMF ML Airflow instance - https://phabricator.wikimedia.org/T396495#11101526 (BTullis)
[10:31:46] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11101588 (BTullis)
[10:37:20] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11101752 (OKarakaya-WMF) hi @brouberol Thank you for the patch 🤗 I've just re-tried the pipeline and I get the following error. Should we al...
[11:02:04] Machine-Learning-Team: Evaluate adding caching mechanism for article topic model to make data available at scale - https://phabricator.wikimedia.org/T401778#11101816 (Seddon) >>! In T401778#11086985, @Ottomata wrote: >> For the duration of Year in Review processing, we plan to not invalidate the cache to: >>...
[11:09:08] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11101851 (brouberol) my fault. Sorry, I'm not super versed in Hadoop, so I have to reverse engineer many things. I've identified a missin...
[11:55:52] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11101984 (OKarakaya-WMF) no worries at all.
I'm available whenever I can help :)
[11:59:00] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11101990 (brouberol) ` 25/08/20 11:58:21 INFO impl.YarnClientImpl: Submitted application application_1754906949114_260505...
[12:01:16] ozge_: I cleared https://airflow-ml.wikimedia.org/dags/add_a_link_pipeline/grid?task_id=generate_anchor_dictionary&tab=mapped_tasks&dag_run_id=manual__2025-08-20T10%3A33%3A29.093095%2B00%3A00&map_index=0 which seems to be humming along in YARN
[12:01:45] sorry, that was more iterative than I would have liked
[12:10:26] @brouberol no worries. this is awesome. I think permissions issues are fixed now: https://airflow-ml.wikimedia.org/dags/add_a_link_pipeline/grid?dag_run_id=manual__2025-08-20T12%3A03%3A30.126316%2B00%3A00&tab=mapped_tasks&task_id=generate_anchor_dictionary&map_index=1 It's still running but i think it is all right. we generally get permission issues at the very beginning of the pipeline.
[12:11:10] 😍 🚀
[12:20:34] nice, happy to see that y'all are unblocked
[12:24:56] FYI, I documented all the missing steps in https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/Airflow/Kubernetes/Operations#Hadoop_setup
[12:25:17] thank you so much brouberol ! <3
[12:28:00] anytime :)
[12:44:38] what is airflow-ml ?
[12:49:30] hi @gry airflow-ml is a new user that we have for the ml team to run airflow tasks.
[12:51:14] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11102099 (OKarakaya-WMF) Hi @brouberol, I think we have a new small issue. I can't see the logs for the pipeline anymore. Could you please he...
[12:54:24] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11102110 (brouberol) Funnily enough, I was just working on https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/161...
[12:55:02] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11102112 (brouberol) Looking at these logs, I'm seeing ` ... Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependen...
[13:02:29] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11102146 (OKarakaya-WMF) We get this error in filter_dict_anchor step. Tracing back the issue, I see some languages have duplicate anchor record...
[13:04:09] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11102156 (OKarakaya-WMF) thank you @brouberol , Can we get read permissions for the ml team members?
[13:19:20] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11102200 (brouberol) I can do that. Just to validate, what command did you run and from what host?
[13:20:44] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11102212 (OKarakaya-WMF) sure, host: `stat1010` command: `yarn logs -appOwner analytics-ml -applicationId application_1754906949114_261287`
[20:22:19] Machine-Learning-Team, Goal, OKR-Work: Analyze samples of articles to see how many structured tasks we might be able to generate - https://phabricator.wikimedia.org/T401968#11104136 (ldelench_wmf)
[23:14:48] Machine-Learning-Team: Non-English articles show autogenerated English summaries - https://phabricator.wikimedia.org/T395596#11104814 (Jdlrobson-WMF) In progress→Declined @Jdrewniak please reopen if I'm mistaken.