[02:43:22] (03PS5) 10Bmansurov: WIP: Add daily referrers Hive table and Oozie job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/655804 (https://phabricator.wikimedia.org/T270140) [02:45:48] (03CR) 10Bmansurov: "Dan, thanks for assigning it to yourself. I think the initial draft version is ready. Could you do a pass and check for obvious no-nos? Th" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/655804 (https://phabricator.wikimedia.org/T270140) (owner: 10Bmansurov) [02:55:44] PROBLEM - Yarn Nodemanagers in unhealthy status on an-worker1118 is CRITICAL: 3 ge 3 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-backup-hadoop&orgId=1&panelId=46&fullscreen [03:56:14] 10Analytics: Add time interval limits to pageview API - https://phabricator.wikimedia.org/T261681 (10lexnasser) @Milimetric Took a look at this over the weekend, and found that restricting the time interval for any given endpoint is very straightforward and simple to implement. But I was wondering what you mea... [04:11:26] 10Analytics: AQS tests that test for error responses don't fail if the response lacks an error - https://phabricator.wikimedia.org/T273404 (10lexnasser) [05:40:31] 10Analytics: Add time interval limits to pageview API - https://phabricator.wikimedia.org/T261681 (10JAllemandou) Using the first approach (hard-coding a global start-date per dataset) is good IMO. [06:06:00] 10Analytics: Presto should warn or prevent users from querying without Hive partition predicates - https://phabricator.wikimedia.org/T273004 (10JAllemandou) Setting the parameter above to number of hours in a month works fro hourly data, not for snapshot data. I don't think we'll have a better solution though. [06:09:05] (03CR) 10Joal: [C: 03+1] "LGTM!" 
[analytics/refinery] - https://gerrit.wikimedia.org/r/660031 (https://phabricator.wikimedia.org/T273083) (owner: Milimetric)
[07:23:11] Analytics: Presto should warn or prevent users from querying without Hive partition predicates - https://phabricator.wikimedia.org/T273004 (elukey) The option is also available from 0.159+ (and we have 226 now): https://github.com/prestodb/presto/blob/6ff3052e9df1a04613157746e7e779fd54c0c2a1/presto-docs/src...
[07:24:51] Analytics: Presto should warn or prevent users from querying without Hive partition predicates - https://phabricator.wikimedia.org/T273004 (elukey)
[07:24:53] Analytics, Analytics-Kanban: Analytics Presto improvements - https://phabricator.wikimedia.org/T266639 (elukey)
[07:25:52] good morning :)
[07:30:24] Analytics: Decide to move or not to PrestoSQL - https://phabricator.wikimedia.org/T266640 (elukey) Opened https://github.com/prestodb/presto/pull/15655 to prestodb, it contains all the info related to the problem that I am seeing. To test the fix, I have done the following: * cloned the prestodb git repo *...
[07:30:37] Good morning
[07:34:09] elukey: I'm sorry for the alarm on yarn-backup :S
[07:34:21] elukey: I don't know what happened, nor what actually went wrong
[07:36:01] joal: do you mean the yarn one?
[07:36:07] es
[07:36:10] +y
[07:37:01] ah yes I think it is due to the partitions filling up, and yarn not being able to write its files anymore
[07:37:13] of course - makes sense :)
[07:37:34] elukey: normally I'm done - I have a job running that updates the data - it's long
[07:37:49] joal: ah so we are "ok" with ~50TB left?
[07:37:58] (meaning that we don't need to fill everything up)
[07:38:29] except the data for this week, that will be some TBs
[07:38:31] elukey: The more we wait, the more we have to store (webrequest for instance I keep since 2021-01-01), as well as events or others
[07:38:58] Do we still agree we try to pull the trigger Tuesday next week?
[07:39:06] 50TB
[07:39:11] this is small :S
[07:39:38] * joal doesn't realize what he just wrote
[07:40:27] if we want to pull the trigger on Tuesday we should send the announcement today
[07:40:37] I wanted to double check with you first though :)
[07:40:41] agreed - Let's see if team agrees?
[07:40:45] ack
[07:41:03] there may be some opposition from other teams, one week of notice is not a lot
[07:41:08] but it is not short either :)
[07:41:56] It's not really one week - We have sent regular reminders on that front for weeks, even if no formal emails
[07:44:03] yep yep but I fear that people might not like it
[07:46:38] hm
[07:47:09] elukey: I just checked remaining DFS on the namenode UI, it states 40TB :S I'm a bit afraid we run out of space
[07:47:28] I will try to remove even more backup data
[07:49:51] Willy added some info about the possibility of new rack space in https://phabricator.wikimedia.org/T260445#6784657
[07:50:09] but I doubt that we'll get anything racked this week
[07:54:58] Analytics-Radar, SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (elukey) Cleaned up with `sudo ip addr flush ens5; sudo systemctl restart ifup@ens5` in tmux kafka-test1010, kafka-test1009 and kafka-test1007
[08:00:25] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 3 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (Gilles) loadEventEnd is just one of the fields recorded by the NavigationTiming schema. I notice that only th...
[08:07:52] joal: for Presto - I created an-test-presto1001 as single worker vm, with ~8G of ram, that is not a lot but hopefully we'll be able to test Alluxio
[08:08:31] (an-test-coord1001 is now only doing the query coordination part)
[08:11:51] I have seen that elukey - That's great :)
[08:12:18] elukey: I wondered - Is the difference between prestoDB and Trino big in terms of packaging?
[08:12:24] and/or config?
[08:13:31] a little yes
[08:13:55] ok - I was trying to get a feeling of how much effort we would need to move
[08:13:59] elukey: --^
[08:14:30] joal: in theory not a lot, a new package should be doable in a couple of hours (in theory)
[08:14:50] I would create a new one called "trino" rather than using the current one
[08:14:55] versioning is different etc..
[08:14:59] +1 to that
[08:15:07] different tools, different names
[08:15:12] depending on how this pull request goes, we might want to move or not
[08:15:20] no idea about the differences
[08:15:31] it seemed more conservative for the moment to stick with prestodb
[08:19:26] Analytics, SRE: rsyslog segfault on an-test-presto1001 - https://phabricator.wikimedia.org/T273412 (elukey)
[08:23:40] Analytics, SRE: rsyslog segfault on an-test-presto1001 - https://phabricator.wikimedia.org/T273412 (elukey)
[08:27:37] Analytics, SRE: rsyslog segfault on an-test-presto1001 - https://phabricator.wikimedia.org/T273412 (elukey)
[08:31:31] completely agreed on 'conservative' - less failure-prone it seems
[08:33:29] Analytics, SRE: rsyslog segfault on an-test-presto1001 - https://phabricator.wikimedia.org/T273412 (elukey) I was able to make rsyslog start by deleting the content of `/var/spool/rsyslog` on the host (looks similar to https://github.com/rsyslog/rsyslog/issues/2654). The old files were backed up in `/hom...
[08:42:00] Analytics-Clusters, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[18-41] - https://phabricator.wikimedia.org/T260445 (Marostegui) @wiki_willy db1111 can be moved somewhere else if needed. From our side our needs would be: - Choose a day/time so DBAs can depool the host in...
[09:48:52] Airflow's setup is very different from what we are used to
[09:50:23] IIUC adding a job is simply a matter of deploying the DAG to the airflow host, and it will be picked up by the scheduler after x mins
[09:50:45] so this means bypassing kerberos auth for these things
[09:52:18] so in the context of a single instance used by more than one team, we'd need to be gatekeepers of the scap repo in theory
[09:52:21] (with the DAGs)
[09:52:35] because I see that you can have hive/spark/etc.. hooks running as user X
[09:53:08] so if airflow runs as proxy for any other user, we'd need to be very vigilant about what is deployed
[09:53:42] and there is also the UI
[09:54:00] people may want to be able to connect to it and stop/start/etc.. dags
[09:54:17] but that needs auth too, and access rules
[09:54:35] I think that using LDAP auth it is possible to have custom views of DAGs for every user
[09:55:29] on the perf side, IIUC the scheduler consumes more resources when the DAGs increase in number and complexity
[09:56:45] that is a double-edged sword for separate stacks (one team per instance) and one multi-tenant stack
[09:57:00] if we create single stacks with little VMs they might not be enough very soon
[09:57:15] (good side is that Ganeti can resize memory/cpu for VMs easily)
[09:57:39] the more I think about it the better the one-instance-per-team solution looks
[09:57:59] going to add my thoughts to the task
[10:05:38] I am watching https://www.youtube.com/watch?v=u00wmcHe8ow&ab_channel=ApacheAirflow (Airflow at EA) as suggested by Erik, very interesting that they use Presto as well
[10:07:24] and they had oozie before airflow :D
[10:08:25] and they went on a single instance first, interesting
[10:11:33] (CR) WMDE-Fisch: [C: +1] Fix case of metric path [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/659291 (owner: Awight)
[10:12:28] https://airflow.apache.org/docs/apache-airflow/1.10.6/security.html?highlight=ldap#rbac-ui-security
[10:15:15] (03CR) 10Awight: "minor fixes suggested." (037 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/659230 (https://phabricator.wikimedia.org/T271902) (owner: 10Svantje Lilienthal) [10:36:57] (03CR) 10WMDE-Fisch: Added editor type preferences (031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/657362 (https://phabricator.wikimedia.org/T271902) (owner: 10Svantje Lilienthal) [10:52:23] 10Analytics, 10Analytics-Kanban: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (10elukey) @EBernhardson thanks a ton again for your insights, I watched quickly https://www.youtube.com/watch?v=u00wmcHe8ow (Airflow Summit 2020... [10:52:27] joal: --^ [10:52:39] looong read but let me know your thoughts [11:01:02] (03PS6) 10Awight: Added editor type preferences [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/657362 (https://phabricator.wikimedia.org/T271902) (owner: 10Svantje Lilienthal) [11:01:30] (03CR) 10Awight: [C: 03+1] "PS 6: s/Mediawiki/MediaWiki/g" (032 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/657362 (https://phabricator.wikimedia.org/T271902) (owner: 10Svantje Lilienthal) [11:03:36] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 3 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (10Gilles) All fixed: {F34082972} {F34082976} [11:33:30] 10Analytics, 10Analytics-Kanban: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (10JAllemandou) Thanks @elukey for the very interesting write. I agree on the idea of having airflow instances per team. One nice thing with that... 
[11:37:27] (03Abandoned) 10Awight: [WIP] Segment CodeMirror metrics by user edit count [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/656210 (https://phabricator.wikimedia.org/T269986) (owner: 10Awight) [11:42:49] (03Restored) 10Awight: [WIP] Segment CodeMirror metrics by user edit count [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/656210 (https://phabricator.wikimedia.org/T269986) (owner: 10Awight) [12:00:14] * elukey afk! lunch [12:57:40] good morning team :)) [13:02:22] Hi fdans [13:51:33] I'm taking a mental health day. The aforementioned family matter is stressing me out a lot more than it should :-/ [13:53:55] (03PS1) 10Gerrit maintenance bot: Add mni.wiktionary to pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/660829 (https://phabricator.wikimedia.org/T273457) [13:54:49] (03PS1) 10Gerrit maintenance bot: Add mni.wikipedia to pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/660833 (https://phabricator.wikimedia.org/T273456) [13:57:29] 10Analytics: Bug: Active Editors showing July numbers - https://phabricator.wikimedia.org/T273470 (10Milimetric) [13:58:10] morning yall [13:58:47] fdans: you think you have a quick fix for ^ ? I'm sending a patch to remove the metric groups to fix the swapping animation for now. Dirty fixes are ok until we re-do the state. 
[14:04:52] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for CodeMirror - https://phabricator.wikimedia.org/T273471 (10awight) [14:07:35] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for CodeMirror - https://phabricator.wikimedia.org/T273471 (10awight) [14:07:45] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for CodeMirror - https://phabricator.wikimedia.org/T273471 (10awight) [14:09:41] 10Analytics, 10Better Use Of Data, 10Product-Analytics, 10Product-Data-Infrastructure: Define event stream configuration syntax - https://phabricator.wikimedia.org/T273235 (10Ottomata) [14:15:35] (03PS1) 10Milimetric: Remove metric groups [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/660835 (https://phabricator.wikimedia.org/T262725) [14:17:16] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for VisualEditor, segment all metrics - https://phabricator.wikimedia.org/T273474 (10awight) [14:17:29] 10Analytics, 10Better Use Of Data, 10Product-Analytics, 10Product-Data-Infrastructure: Define event stream configuration syntax - https://phabricator.wikimedia.org/T273235 (10Ottomata) > I'm not sure whether it's best to treat data supplementation configuration for the intake service and the client librari... [14:22:03] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 4 others: KaiOS / Inuka Event Platform client - https://phabricator.wikimedia.org/T273219 (10Ottomata) > Are there any other Java apps or services around here that might realistically want to make use of a generic Java event p... 
[14:22:53] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-01-20): Add edit count bucketing to all metrics - https://phabricator.wikimedia.org/T269986 (10awight) [14:23:06] (03PS1) 10Milimetric: Bump refinery-source version to 0.1.0 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/660836 (https://phabricator.wikimedia.org/T273083) [14:24:21] (03CR) 10Milimetric: "note this expects refinery-source to finally get to that solid 0.1.0 release 😊" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/660836 (https://phabricator.wikimedia.org/T273083) (owner: 10Milimetric) [14:25:00] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for TemplateWizard, segment all metrics - https://phabricator.wikimedia.org/T273475 (10awight) [14:25:08] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for TemplateWizard, segment all metrics - https://phabricator.wikimedia.org/T273475 (10awight) [14:32:49] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 4 others: KaiOS / Inuka Event Platform client - https://phabricator.wikimedia.org/T273219 (10Ottomata) BTW, another JS client is [[ https://gerrit.wikimedia.org/r/c/research/recommendation-api/+/660658/1/recommendation/web/sta... [14:34:33] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for CodeMirror - https://phabricator.wikimedia.org/T273471 (10awight) a:03awight [14:41:51] hey team :] [14:41:56] hola :) [14:44:04] mforns: when you have a moment can I ask you a couple of questions about navtiming? 
[14:44:23] elukey: yes, if you want now
[14:44:42] I was looking at the backfill job I left running... it's stuck as well
[14:44:48] :[
[14:44:56] ack! So camus-eventlogging in hadoop test is behaving a little weirdly, I see in raw data stuff for year 2080 :D
[14:44:58] will kill it
[14:45:05] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 3 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (Ottomata) Yeehaw thank you!
[14:45:09] whaaaat
[14:45:24] I checked with spark and the events are in the new format for eventgate
[14:45:51] aha
[14:45:52] so I was wondering what was done migration-wise for prod
[14:46:25] elukey: what schema was the one with future dates?
[14:47:48] (CR) Ottomata: "> Patch Set 1:" [analytics/refinery] - https://gerrit.wikimedia.org/r/660836 (https://phabricator.wikimedia.org/T273083) (owner: Milimetric)
[14:58:23] (PS4) Awight: Segment CodeMirror metrics by user edit count [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/656210 (https://phabricator.wikimedia.org/T273471)
[14:58:31] (CR) Joal: "note to self: refinery counts in base 142" [analytics/refinery] - https://gerrit.wikimedia.org/r/660836 (https://phabricator.wikimedia.org/T273083) (owner: Milimetric)
[14:59:44] Analytics-Radar, WMDE-Templates-FocusArea, MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), Patch-For-Review, WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for CodeMirror - https://phabricator.wikimedia.org/T273471 (awight)
[15:02:11] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 4 others: KaiOS / Inuka Event Platform client - https://phabricator.wikimedia.org/T273219 (jlinehan) >>! In T273219#6785219, @Ottomata wrote: > Nice! Interesting, the KaiOS app is JS? Cool. > > @jlinehan maybe we should on...
[15:02:49] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for CodeMirror - https://phabricator.wikimedia.org/T273471 (10awight) [15:03:22] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for CodeMirror - https://phabricator.wikimedia.org/T273471 (10awight) a:05awight→03None [15:04:06] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for TemplateWizard, segment all metrics - https://phabricator.wikimedia.org/T273475 (10awight) [15:08:10] Heads-up, I've CR+2 this patch but am shy of doing the final "submit" because I'm ignorant of how deployment works: https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/656901 [15:23:17] 10Analytics-Clusters, 10DC-Ops, 10SRE, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[18-41] - https://phabricator.wikimedia.org/T260445 (10elukey) Had a chat with @hnowlan and maps1001 can be moved with some heads up time. I am available to work with John on moving the nodes with Manuel and Hu... [15:27:25] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 4 others: KaiOS / Inuka Event Platform client - https://phabricator.wikimedia.org/T273219 (10Mholloway) >>! In T273219#6792058, @jlinehan wrote: > the way we had broken down the architecture was to have a language-specific cor... 
[15:28:22] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for TemplateWizard, segment all metrics - https://phabricator.wikimedia.org/T273475 (10awight) [15:49:51] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 3 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (10Ottomata) FYI I just restricted the legacy eventlogging backend from accepting any of these events. For Navi... [15:52:07] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 3 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (10Ottomata) [15:52:22] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 3 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (10Ottomata) [15:58:18] 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Structured-Data-Backlog, 10Patch-For-Review: SuggestedTagsAction Event Platform Migration - https://phabricator.wikimedia.org/T267351 (10Ottomata) [15:58:32] 10Analytics, 10Better Use Of Data, 10Product-Data-Infrastructure: Define acceptable usage of the `meta` object in event schemas - https://phabricator.wikimedia.org/T273293 (10jlinehan) >>! In T273293#6787691, @Ottomata wrote: > BTW, here's the first reference to `meta` I can find: https://github.com/wikimedi... [16:02:43] fdans: standup? [16:02:54] sorry! 
[16:11:29] 10Analytics, 10Wikidata, 10Wikidata-Query-Service: Move wikidata RDF triples files/table to the wmf namespace in hdfs and hive - https://phabricator.wikimedia.org/T272961 (10Gehel) [16:21:46] (03PS2) 10Awight: Use edit count bucket sent by TemplateWizard [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/657634 (https://phabricator.wikimedia.org/T269986) [16:22:11] (03PS3) 10Awight: Use edit count bucket sent by TemplateWizard [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/657634 (https://phabricator.wikimedia.org/T273475) [16:23:17] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for TemplateWizard, segment all metrics - https://phabricator.wikimedia.org/T273475 (10awight) a:05awight→03None [16:29:04] 10Analytics, 10Analytics-Kanban: Replace Camus by Gobblin - https://phabricator.wikimedia.org/T271232 (10Ottomata) a:03JAllemandou [16:34:13] 10Analytics: AQS tests that expect error responses pass even if the response succeeds - https://phabricator.wikimedia.org/T273404 (10lexnasser) [16:35:35] elukey: razzi ops sync? [16:35:57] ottomata: I thought we were taking a short break, til :40 [16:36:02] OH sorry [16:36:04] ok sure [16:39:01] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for TemplateWizard, segment all metrics - https://phabricator.wikimedia.org/T273475 (10awight) [16:40:04] (03PS4) 10Awight: Use edit count bucket sent by TemplateWizard [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/657634 (https://phabricator.wikimedia.org/T273475) [16:50:14] 10Analytics: Filter out .wikidata requests from pageview definition - https://phabricator.wikimedia.org/T273215 (10fdans) p:05Triage→03Medium Ping @ema why do you think we are getting these requests? Shouldn't they be filtered upstream? 
[16:51:46] (03PS2) 10Awight: Use the edit count bucket sent by TemplateData [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/659227 (https://phabricator.wikimedia.org/T272569) (owner: 10Andrew-WMDE) [16:52:48] 10Analytics: Druid loading of navigationtiming gets stuck - https://phabricator.wikimedia.org/T273216 (10fdans) p:05Triage→03Unbreak! [16:56:05] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-01-20): Add edit count bucketing to all metrics - https://phabricator.wikimedia.org/T269986 (10awight) a:05awight→03None [16:57:15] (03PS4) 10Svantje Lilienthal: Added visual editor sessions [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/659230 (https://phabricator.wikimedia.org/T271902) [16:58:40] 10Analytics, 10FR-Tech-Analytics, 10Fundraising-Backlog: Whitelist Portal and WikipediaApp event data for (sanitized) long-term storage - https://phabricator.wikimedia.org/T273246 (10fdans) a:03mforns [16:59:05] (03CR) 10Svantje Lilienthal: "Thanks for the swift and detailed review. I updated the code accordingly." 
(037 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/659230 (https://phabricator.wikimedia.org/T271902) (owner: 10Svantje Lilienthal) [17:00:01] 10Analytics-Radar, 10User-bd808: Reduce partition granularity of hive tables - https://phabricator.wikimedia.org/T273310 (10fdans) [17:03:04] 10Analytics, 10Analytics-EventLogging: Uncaught TypeError: navigator.sendBeacon is not a function - https://phabricator.wikimedia.org/T273374 (10fdans) a:03Milimetric [17:06:11] 10Analytics, 10Analytics-Kanban: AQS tests that expect error responses pass even if the response succeeds - https://phabricator.wikimedia.org/T273404 (10fdans) [17:06:35] 10Analytics, 10Analytics-Kanban: AQS tests that expect error responses pass even if the response succeeds - https://phabricator.wikimedia.org/T273404 (10fdans) p:05Triage→03High [17:13:04] 10Analytics: Bug: Active Editors showing July numbers - https://phabricator.wikimedia.org/T273470 (10fdans) a:03fdans [17:17:42] 10Analytics, 10Analytics-Kanban: Bug: Active Editors showing July numbers - https://phabricator.wikimedia.org/T273470 (10fdans) p:05Triage→03High [17:18:16] 10Analytics-Clusters, 10SRE: rsyslog segfault on an-test-presto1001 - https://phabricator.wikimedia.org/T273412 (10fdans) [17:24:11] (03CR) 10Awight: [C: 03+1] Added visual editor sessions (033 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/659230 (https://phabricator.wikimedia.org/T271902) (owner: 10Svantje Lilienthal) [17:24:14] 10Analytics-Radar, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10Product-Analytics, and 2 others: Set up generation of JSON dumps for Wikimedia Commons - https://phabricator.wikimedia.org/T259067 (10Cparle) Hey @ArielGlenn this has been in our code review column for a long while now, you reckon you'... 
[17:25:33] (03CR) 10Awight: [C: 03+1] Use the edit count bucket sent by TemplateData [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/659227 (https://phabricator.wikimedia.org/T272569) (owner: 10Andrew-WMDE) [17:32:13] 10Analytics-Clusters, 10DC-Ops, 10SRE, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[18-41] - https://phabricator.wikimedia.org/T260445 (10wiki_willy) Thanks @Marostegui, I appreciate it. We discussed this during my staff meeting a bit last week, and @Cmjohnson will work with you and the othe... [17:40:17] 10Analytics, 10Better Use Of Data, 10Product-Data-Infrastructure: Define acceptable usage of the `meta` object in event schemas - https://phabricator.wikimedia.org/T273293 (10Ottomata) > Can you give an example maybe? There is the thing-being-measured (data), and the things-about-the-thing-being-measured (me... [18:09:11] !log rebalance kafka partitions for eqiad.resource_change [18:09:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:10:30] elukey, ottomata: I scheduled downtime for the expected kafka alerts for the next 4 hours [18:12:54] 10Analytics, 10Better Use Of Data, 10Product-Data-Infrastructure: Define acceptable usage of the `meta` object in event schemas - https://phabricator.wikimedia.org/T273293 (10jlinehan) >>! In T273293#6792956, @Ottomata wrote: >> Can you give an example maybe? There is the thing-being-measured (data), and the... [18:20:26] hey a-team (or perhaps mainly ottomata): I noticed that the MediaSearch schema stream is "mediawiki.mediasearch_interaction". Are all Event Platform streams expected to use that "mediawiki.*" naming convention, so we should expect data to flow into "mediawiki_*" in the Data Lake? 
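The log does not show an answer to the naming question above. As a minimal sketch of the mapping the question itself describes (a dotted stream name landing in an underscore-separated table name), the hypothetical helper below replaces every character that is not a letter, digit, or underscore with `_`. The function name and the exact character rule are assumptions for illustration, not the actual ingestion code.

```python
import re


def stream_to_table_name(stream: str) -> str:
    """Hypothetical helper: map a stream name to a table-style name.

    Assumption (not confirmed in the log): every character other than
    letters, digits and underscores becomes an underscore.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", stream)


if __name__ == "__main__":
    # Stream name taken from the question above.
    print(stream_to_table_name("mediawiki.mediasearch_interaction"))
```

If the convention holds, every `mediawiki.*` stream would show up with a `mediawiki_` prefix, which is what the question anticipates.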
[18:23:27] !log rebalance kafka partitions for codfw.mediawiki.recentchange
[18:23:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:36:21] Analytics-Radar, SRE, decommission-hardware, ops-codfw, serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (wiki_willy)
[18:36:34] razzi: I still see the alerts not downtimed in icinga :(
[18:37:11] elukey: yeah it's strange to me, looking into it
[18:38:34] elukey: is there a way to view scheduled downtimes?
[18:40:01] elukey: actually I think I understand the issue - I used the time picker to select the start time as "now" in my time zone, but should have kept the default time it has, which is now in UTC
[18:40:05] (PS1) Kosta Harlan: homepagemodule: Add missing "enabled" state [schemas/event/secondary] - https://gerrit.wikimedia.org/r/660872 (https://phabricator.wikimedia.org/T273084)
[18:41:29] razzi: ah perfect! So if downtime is active you should see a little icon in the row of the alert
[18:41:38] (in the page that Andrew showed to you earlier on)
[18:42:08] mforns: I have just seen the unbreak now for Druid, anything that you folks discussed that I should be aware of?
[18:42:13] next steps etc...
[18:42:24] I can work on it early tomorrow
[18:44:27] (dinner time, going to read in a bit)
[18:47:20] (CR) Ottomata: [C: +1] "I can merge if you don't have perms." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/660872 (https://phabricator.wikimedia.org/T273084) (owner: Kosta Harlan)
[18:49:35] (CR) Kosta Harlan: [C: -1] "Reconsidering for a little bit and discussing with the team. At the very least we should add "disabled" to this patch, but maybe we should" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/660872 (https://phabricator.wikimedia.org/T273084) (owner: Kosta Harlan)
[18:51:23] (CR) Ottomata: [C: +1] "K!"
[schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/660872 (https://phabricator.wikimedia.org/T273084) (owner: 10Kosta Harlan) [18:58:03] !log rebalance kafka partitions for codfw.mediawiki.job.recentChangesUpdate [18:58:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:58:27] !log rebalance kafka partitions for eqiad.mediawiki.job.recentChangesUpdate [18:58:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:01:01] !log rebalance kafka partitions for eventlogging_LayoutShift [19:01:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:03:58] 10Analytics-Clusters: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers - https://phabricator.wikimedia.org/T255973 (10razzi) As we get into the higher-volume topics, we are seeing some alerts about replica max lag and under-replicated partions. As I continue to run migrations, t... [19:04:23] 10Analytics-Clusters: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers - https://phabricator.wikimedia.org/T255973 (10razzi) [19:07:01] (03Abandoned) 10Kosta Harlan: homepagemodule: Add missing "enabled" state [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/660872 (https://phabricator.wikimedia.org/T273084) (owner: 10Kosta Harlan) [19:14:53] * razzi afk for lunch [19:16:59] (03PS1) 10Fdans: Fixes broken month-subtracting logic [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/660908 (https://phabricator.wikimedia.org/T273470) [19:17:17] (03PS2) 10Fdans: Fix broken month-subtracting logic [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/660908 (https://phabricator.wikimedia.org/T273470) [19:34:14] I have maybe a basic question, but I am just trying to understand the various AQS endpoints and what they mean. 
For example, why is the count in /mediarequests for a Commons file smaller than the count in /pageviews for a single page in which the Commons file is used? I expected the opposite. [19:37:56] Example: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Gautama_Buddha/daily/2021011000/2021011000 vs. https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file/all-referers/all-agents/%2Fwikipedia%2Fcommons%2F0%2F0a%2FShakyamuni_detail%2C_Clevelandart_1991.9_%28cropped%29.jpg/daily/2021011000/2021011000 [19:44:59] 10Analytics-Radar, 10SRE, 10decommission-hardware, 10ops-codfw, 10serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (10wiki_willy) a:03Papaul [19:49:15] Anyone from the a-team around to provide a clarification? [19:50:08] I'll do my best (but currently babysitting) [19:52:42] 10Analytics-Radar, 10SRE, 10decommission-hardware, 10ops-codfw, 10serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (10Papaul) @wiki_willy this is a VM it doesn't go to me [19:55:26] 10Analytics-Radar, 10SRE, 10decommission-hardware, 10serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (10wiki_willy) a:05Papaul→03None [19:58:19] milimetric: DominicBM's question is ^^^^ :) [19:59:19] 10Analytics-Radar, 10SRE, 10decommission-hardware, 10serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (10Dzahn) a:03Dzahn [20:00:09] ottomata, Hah, thanks. I thought milimetric was taking his time, but I guess I could have been clearer. [20:01:00] 10Analytics-Radar, 10SRE, 10decommission-hardware, 10serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (10Dzahn) a:05Dzahn→03None Oh, this ticket is actually not ready at all (VM or not), per previous comments. this is supposed to be stalled. 
[20:02:16] 10Analytics-Radar, 10SRE, 10decommission-hardware, 10serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (10Dzahn) 05Stalled→03Invalid I'll just close this as invalid because the template would not match a VM and the system is still in production. [20:02:20] 10Analytics, 10SRE, 10serviceops, 10vm-requests, 10User-Elukey: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (10Dzahn) [20:02:56] 10Analytics-Radar, 10SRE, 10decommission-hardware, 10serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (10wiki_willy) Thanks @Dzahn - I saw it listed under the "pending onsite steps (codfw)" column, so it threw me off for a sec. >! In T245279#6793672, @Dzahn wrote: >... [20:05:30] DominicBM: Hi - I think the difference is due to access-methods [20:06:14] DominicBM: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/desktop/user/Gautama_Buddha/daily/2021011000/2021011000 [20:06:59] DominicBM: the Wikipedia mobile view doesn't download the whole set of images, and the one you have taken as an example is at the very bottom of the page [20:09:12] 10Analytics-Radar, 10SRE, 10decommission-hardware, 10serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (10Dzahn) Yea, my bad, this is not fitting for a VM. But also when this was created we did not have the same decom cookbook and workflow yet. [20:11:13] Okay, that is very interesting information! Most of the widely used image-centric metrics tools, like GLAMorgan and BaGLAMa, are based on aggregating the page views of articles with images embedded, but it seems like that's technically an overcount, based on how mobile works. 
[20:16:07] !log rebalance kafka partitions for eventlogging_DesktopWebUIActionsTracking [20:17:05] DominicBM: I'm no expert on image counts, but I'm happy the data makes more sense :) [20:17:44] Gone for tonight team - see y'all tomorrow :) [20:23:26] DominicBM: that's definitely not always true though. I'd expect your original intuition to be true for leading images, for example. Or articles with a lot of traffic on desktop. (sorry, was taking my time because both kids decided to tantrum at the same time). But there are more subtle differences too, like agent type includes "automata" on the pageviews dataset: [20:23:45] https://github.com/wikimedia/analytics-aqs/blob/master/v1/mediarequests.yaml#L57 (this is mediarequests, which only detects self-identified spiders) [20:23:58] https://github.com/wikimedia/analytics-aqs/blob/master/v1/pageviews.yaml#L63 (this is pageviews, which detects automata as well) [20:27:12] there may be some patterns and rules between image traffic on all wikis vs. pageviews on commons (or at least rules of thumb) but we haven't done that research. If you're interested, it could be done on the Hadoop cluster. [20:30:57] Thanks, that's useful. I think what is interesting to me is that, except in rare cases (like many page views on Commons itself, or hot-linking to upload.wikimedia.org from external sites), it seems that, due to mobile traffic, you'll generally expect the /mediarequests total to be less than or equal to the aggregate of /pageviews of all articles where the image is embedded. [20:32:08] This means /mediarequests is probably the most accurate way to count if you are trying to measure access to images you uploaded, but it might be painful to try to change the logic of most existing tools if it leads to the reported numbers going down substantially. 
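(For anyone who wants to repeat the comparison discussed above: a minimal sketch, assuming only the two URL shapes pasted earlier in this log. The helper names `pageviews_url` and `mediarequests_url` are made up for illustration and are not part of AQS; the sketch only builds the request URLs and does not fetch anything.)

```python
from urllib.parse import quote

# Base path taken from the example URLs pasted in the channel.
BASE = "https://wikimedia.org/api/rest_v1/metrics"


def pageviews_url(project, article, day, access="all-access", agent="all-agents"):
    """Build a per-article pageviews URL for a single day (YYYYMMDDHH).

    Hypothetical helper: the parameter names are assumptions for this sketch.
    """
    title = quote(article, safe="")
    return (f"{BASE}/pageviews/per-article/{project}/{access}/{agent}/"
            f"{title}/daily/{day}/{day}")


def mediarequests_url(file_path, day, referer="all-referers", agent="all-agents"):
    """Build a per-file mediarequests URL for a single day (YYYYMMDDHH).

    The file path is percent-encoded in full, slashes included, as in the
    example URL from the log.
    """
    encoded = quote(file_path, safe="")
    return (f"{BASE}/mediarequests/per-file/{referer}/{agent}/"
            f"{encoded}/daily/{day}/{day}")


if __name__ == "__main__":
    day = "2021011000"
    print(pageviews_url("en.wikipedia", "Gautama_Buddha", day))
    # Desktop-only, user-only variant, as suggested in the reply above.
    print(pageviews_url("en.wikipedia", "Gautama_Buddha", day,
                        access="desktop", agent="user"))
    print(mediarequests_url(
        "/wikipedia/commons/0/0a/"
        "Shakyamuni_detail,_Clevelandart_1991.9_(cropped).jpg", day))
```

(Comparing the desktop/user pageviews figure with the mediarequests figure for the same day is one way to check the access-method explanation; whether the numbers line up for a given file is not something this sketch establishes.)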
[20:33:53] (03Abandoned) 10Mholloway: Add fragment/analytics/meta/mediawiki to analytics/test [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/659412 (https://phabricator.wikimedia.org/T271456) (owner: 10Mholloway) [20:34:02] (03Abandoned) 10Mholloway: Create fragment/analytics/meta/mediawiki [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/658677 (https://phabricator.wikimedia.org/T271456) (owner: 10Mholloway) [20:37:48] 10Analytics: Presto should warn or prevent users from querying without Hive partition predicates - https://phabricator.wikimedia.org/T273004 (10razzi) @JAllemandou what do you mean by snapshot data? [20:40:54] 10Analytics: Presto should warn or prevent users from querying without Hive partition predicates - https://phabricator.wikimedia.org/T273004 (10Ottomata) (answering since Joal is out for the eve) Joal is referring to Hive tables with 'snapshot' partitions, like the Sqooped mediawiki tables, and mediawiki_histor... [20:51:44] !log rebalance kafka partitions for eventlogging_PaintTiming [20:51:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [20:57:42] 10Analytics-Radar, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10Product-Analytics, and 2 others: Set up generation of JSON dumps for Wikimedia Commons - https://phabricator.wikimedia.org/T259067 (10ArielGlenn) It's been on my todo list every day for the same length of time. But, yes. soon. In parti... [21:04:20] 10Analytics, 10Analytics-Kanban: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (10mforns) Awesome summary @elukey Thanks! I'd add 1 thing to the 'Downsides' of single-instance approach: - In case that a dynamic DAG breaks A... 
[21:10:43] 10Analytics, 10Data-release, 10Privacy Engineering, 10Research, 10Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (10Isaac) > that's an amazing demo! I love this, thanks for your work and thoughtful analysi... [21:27:00] !log rebalance kafka partitions for codfw.mediawiki.job.cdnPurge [21:27:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [21:27:25] !log rebalance kafka partitions for eqiad.mediawiki.job.cdnPurge [21:27:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [21:31:46] 10Analytics-Clusters, 10DC-Ops, 10SRE, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[18-41] - https://phabricator.wikimedia.org/T260445 (10Marostegui) Thanks @wiki_willy - I will be off a few days next week but @LSobanski and @Kormat are on this task in case this needs to happen while I am away. [21:42:07] * razzi afk for a walk [22:21:41] 10Analytics-Clusters: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers - https://phabricator.wikimedia.org/T255973 (10razzi) [22:35:42] 10Analytics, 10Machine Learning Team: Get pytorch running on AMD GPU - https://phabricator.wikimedia.org/T245449 (10calbon) 05Open→03Declined [22:45:08] milimetric: could you quickly take a look at my latest question here: https://phabricator.wikimedia.org/T261681 [23:21:20] 10Analytics: Add time interval limits to pageview API - https://phabricator.wikimedia.org/T261681 (10Milimetric) Agreed with @JAllemandou, if I was thinking anything fancier than hard-coded start dates for each dataset, it's lost in some dusty corner of my brain. Thanks for taking this, glad it's straightforward. 
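(A rough sketch of the "hard-coded start date per dataset" idea agreed on above, written in Python purely for illustration; it is not the AQS implementation. The `EARLIEST_START` table, its dates, and the `check_interval` helper are all hypothetical, and rejecting the request rather than clamping it is an assumption of this sketch.)

```python
from datetime import datetime

# Hypothetical per-dataset start dates in the API's YYYYMMDDHH format.
# These values are placeholders, not the real limits.
EARLIEST_START = {
    "pageviews": "2015070100",
    "mediarequests": "2015010100",
}


def parse_ts(ts):
    """Parse a YYYYMMDDHH timestamp as used in the request paths."""
    return datetime.strptime(ts, "%Y%m%d%H")


def check_interval(dataset, start, end):
    """Return (start, end) as datetimes, or raise ValueError if out of range.

    Assumption: requests starting before the dataset's hard-coded start
    date are rejected rather than silently clamped.
    """
    earliest = parse_ts(EARLIEST_START[dataset])
    start_dt, end_dt = parse_ts(start), parse_ts(end)
    if start_dt > end_dt:
        raise ValueError("start must not be after end")
    if start_dt < earliest:
        raise ValueError(
            f"{dataset} has no data before {EARLIEST_START[dataset]}")
    return start_dt, end_dt


if __name__ == "__main__":
    print(check_interval("pageviews", "2021011000", "2021011000"))
```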
[23:41:20] 10Analytics, 10Privacy Engineering, 10Research, 10Patch-For-Review: Release dataset on top search engine referrers by country, device, and language - https://phabricator.wikimedia.org/T270140 (10Isaac) @JFishback_WMF I just wanted to check in to see whether there is anything I can get started to make the p...