[05:51:26] morning!
[06:28:18] Analytics, Analytics-Kanban, User-Elukey: Decouple analytics zookeeper cluster from kafka zookeeper cluster [2019-2020] - https://phabricator.wikimedia.org/T217057 (elukey)
[06:28:54] new zookeeper nodes ready!
[06:29:07] Analytics, Analytics-Kanban, User-Elukey: Decouple analytics zookeeper cluster from kafka zookeeper cluster [2019-2020] - https://phabricator.wikimedia.org/T217057 (elukey) Stalled→Open
[06:38:06] Analytics, Analytics-Kanban, User-Elukey: Decouple analytics zookeeper cluster from kafka zookeeper cluster [2019-2020] - https://phabricator.wikimedia.org/T217057 (elukey) Hosts are ready, and we have been testing the Hadoop Test cluster with the new Zk cluster for a while without any big issues. Ne...
[06:39:14] Analytics, Analytics-Kanban: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (elukey) @Isaac Adding also you as well, let me know if you are interested!
[07:22:04] Hi team - internet issues this morning :S
[07:26:46] o/
[07:26:56] bonjour!
[07:27:03] Bonjour elukey :)
[07:27:19] come stai?
[07:27:55] très bien, sa va?
[07:28:33] Va bene anche grazie :)
[07:28:51] * joal can't speak italian without help from google-translate ...
[07:30:04] ahahha same for me :)
[07:30:33] joal: do you have time tomorrow morning if I swap the zk cluster for Hadoop?
[07:31:07] elukey: I'll be teaching tomorrow morning, but will be there around noon
[07:31:16] elukey: can we late-start the process?
[07:31:37] sure sure!
We can do it even in the afternoon
[07:31:45] it should last not much
[07:32:17] I can drain the cluster while waiting, then we can do it even around 14:30 CEST
[07:32:25] thanks mate :) I'll ping you when here, I'd say depending on your schedule for lunch, starting around 1pm is great
[07:32:32] Ah - 14:30 :) no problem :)
[07:33:12] otherwise I can do it another day if it is a problem for you (don't want to force you to find a spot to connect only for this if you have to talk with people etc..)
[07:33:42] elukey: no no, really tomorrow is fine :)
[07:33:53] super, 14:30 CEST then :)
[07:34:06] going to send an email in a bit about the maintenance window
[07:34:31] \o/! Zookeepytics
[07:36:03] yes finally :)
[07:36:21] brb
[07:59:26] Analytics: MediaWiki history dumps have some events in 2025 - https://phabricator.wikimedia.org/T235269 (JAllemandou) ` spark.sql("select wiki_db, event_entity, event_type, count(1) as c from wmf.mediawiki_history where snapshot = '2019-09' and event_timestamp > '2020-01-01 00:00:00' group by wiki_db, event_...
[08:00:54] Analytics: Update mediawiki-history dumper to use project in file names - https://phabricator.wikimedia.org/T235409 (JAllemandou)
[08:18:41] joal: procedure for tomorrow in https://etherpad.wikimedia.org/p/analytics-zk-migration
[08:30:45] elukey: question for you - the procedure mentions cookbook - Are we using chef??
[08:31:01] or is cookbook related to cumin?
[08:32:43] cumin cookbooks!
[08:33:43] Ah
[08:34:07] This could be misleading :) Maybe we should have said cumin-recipes?
[08:34:09] :-P
[08:34:16] :)
[08:35:06] we have several cookbooks now for our stuff
[08:35:16] next round of hadoop reboots should be done with a cookbook
[08:35:37] need to create one for Druid as well
[08:36:10] I like the idea of a druid-cookbook - feels like magix
[08:37:24] I'd love to get to a point in which any sre can safely execute cookbooks for hadoop etc..
[08:37:43] without us being impacted or in need to schedule big works
[08:38:57] only thing missing from the procedure IMO is draining / restarting jobs on the cluster
[08:39:07] as for the rest, I do trust you elukey :)
[08:39:37] yes yes for draining there is, but not for restoring the prev state
[08:39:39] going to add it
[08:39:54] (basically enable/disable timers on an-coord)
[08:40:23] my idea is to quickly stop standby and master nodes, run puppet on an-master1001 with the new config and then on an-master1002
[08:40:38] Ah yes - a comment would be welcome ;)
[08:40:49] in theory the workers should complain a bit but not go down
[08:40:58] yup, if fast should be ok
[08:47:28] Analytics, Analytics-Kanban: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (Neil_P._Quinn_WMF) @elukey Yes, I'm happy to help with this! Just let me know what you'd like me to do.
[10:01:10] Analytics, Analytics-Kanban: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (elukey) >>! In T212258#5571956, @Neil_P._Quinn_WMF wrote: > @elukey Yes, I'm happy to help with this! Just let me know what you'd like me to do...
[10:09:54] Analytics, Operations: Add metadata to puppet about kerberos accounts - https://phabricator.wikimedia.org/T235418 (elukey)
[10:26:56] * elukey lunch!
[11:39:47] joal: hellooo if you think these queries are fine I can start backfilling
[11:39:47] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/541817/
[12:00:36] Hi fdans - looking RIGHT NOW
[12:00:47] fdans: please excuse me for not having done so already :S
[12:00:59] oh no joal no problem at all, thank you!
[12:03:48] fdans: just to be sure I'm not missing something: we have referers for special values only, not per project
[12:04:01] joal: correct
[12:04:23] fdans: do we have those special values in the new dataset, or only 'all' and per-project?
[12:04:43] fdans: asking for data-correctness in time
[12:04:43] joal: we do have them
[12:04:55] great
[12:05:15] joal: if you +1 I'll move the queries to their own directory and merge
[12:07:39] Ah fdans - I get it now - In currently loaded data in AQS, we don't serve wiki as referer, only aggregated values
[12:08:16] Oh actually no we don't !!!!
[12:08:18] My bad
[12:08:21] Man
[12:09:20] Ok - we use 'internal' as referer if we don't know the wiki - Same in that backfilled data - Hooray
[12:09:56] (CR) Joal: [C: +1] "LGTM! Thanks fdans" [analytics/refinery] - https://gerrit.wikimedia.org/r/541817 (https://phabricator.wikimedia.org/T228149) (owner: Fdans)
[12:37:38] thank you joal!!
[13:42:46] (CR) Joal: [C: -1] "A bunch of comments, mostly about comments (-1 nonetheless, as you prefer -1 if there are comments :). I did double check the blacklist ag" (9 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) (owner: Milimetric)
[13:44:05] Analytics, Analytics-Kanban, Performance-Team (Radar): Upgrade python-kafka to 1.4.7 - https://phabricator.wikimedia.org/T234808 (elukey) @Gilles how should we coordinate for the deployment?
[13:47:52] Analytics: Fix download-project-namespace-map script to send alert if it fails - https://phabricator.wikimedia.org/T203824 (elukey) Open→Resolved a:elukey This seems an old task...
[13:54:30] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Allow all Analytics tools to work with Kerberos auth - https://phabricator.wikimedia.org/T226698 (elukey)
[14:09:37] hi teammm :]
[14:10:01] mforns, hi! I'll have a question for you when you have time :)
[14:14:32] hey joal :] shoot!
[14:16:44] mforns: From me reading some reportupdater config files, I have inferred the fact that it runs requests AFTER the time-period selected as granularity - Just wanted to be sure I'm not messing up
[14:17:15] So, when selecting granularity month and start-date 2019-08-01, the first query will happen on 2019-09-01
[14:18:52] using start_date = 2019-08-01, and end_date 2019-08-31
[14:20:46] mforns: --^
[14:21:14] reading
[14:21:38] joal, yes exactly
[14:21:48] great :)
[14:22:05] in reportupdater all data-points are labeled with the timestamp of the start of the period
[14:22:26] (like most of scheduling tools no?)
[14:22:58] so RU executes a monthly report for 2019-09-01 on 2019-10-01
[14:23:15] or maybe a little later, if delay=... is specified
[14:24:11] joal, I think though that end_date is exclusive, so in your example: end_date would be 2019-09-01
[14:25:06] queries should treat end_date as exclusive
[14:28:29] !log matomo upgraded to 3.11 on matomo1001
[14:28:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:29:19] Analytics, Analytics-Kanban: Upgrade matomo to its latest upstream version - https://phabricator.wikimedia.org/T234607 (elukey) @Nuria matomo1001 has been upgraded with the latest upstream version. From a quick check everything looks ok, can you check as well?
[14:56:37] makes sense mforns - Thanks for that
[14:57:40] elukey: I may have asked this before, but why are some terminal commands not allowed in the Jupyter terminal?
[14:57:58] neilpquinn: like?
[14:58:05] nuria: like crontab
[14:58:31] neilpquinn: o/ - why do you need a crontab in there? :D
[14:58:34] nuria: or moving files to /srv/published-datasets
[14:58:49] elukey: to periodically run a notebook :)
[14:59:05] sure but that can/should be done with regular ssh
[14:59:15] to the notebook host
[14:59:21] elukey: yes, but why?
[15:00:00] elukey: my big issue is that it makes it a lot harder to script that
[15:00:02] I am not familiar a lot with the notebook shell, but I guess that there are limitations in what you can/cannot execute
[15:00:28] we can open a task and check why if you want
[15:00:53] elukey: well, the docs say "The terminals run on the system where the Jupyter server is running, with the privileges of your user." https://jupyterlab.readthedocs.io/en/stable/user/terminal.html
[15:01:39] neilpquinn: what error do you get?
[15:01:48] elukey: as far as I know, the whole point is to have arbitrary shell access next to your notebooks with arbitrary data access :)
[15:02:27] elukey: running `crontab -l` I get "crontabs/neilpquinn-wmf/: fopen: Permission denied"
[15:03:15] well arbitrary shell access seems a lot :D
[15:03:34] I'll check after meetings what happens, and if there are limits
[15:03:53] neilpquinn: is the error always the same?
[15:04:05] elukey: and running "mv ~/foo.txt /srv/published-datasets" gives me "mv: inter-device move failed: '/home/neilpquinn-wmf/foo.txt' to '/srv/published-datasets/foo.txt'; unable to remove target: Read-only file system"
[15:05:02] that sounds like a limitation of the shell to avoid you doing anything on the underlying OS simply logging as your user via jupyter :D
[15:05:22] but there might be some misconfig on our side
[15:05:27] so I promise I'll check :)
[15:07:52] elukey: thanks! What I'm trying to do is create a Python script that will export the notebook to HTML, move it to `/srv/published-datasets` so it gets published, and optionally add a cron job to keep doing the same thing in the future.
[15:08:39] would be happy to talk more about the need for this and the potential security implications
[15:10:40] neilpquinn: ack!
[15:11:06] and IIUC the script is more painful to write if not in a jupyter terminal?
[15:11:49] let's do this - can you open a task with a brief description of what you are trying to do?
So I'll be able to repro and possibly help with a solution
[15:12:01] Analytics, Product-Analytics: Hash all pageTokens or temporary identifiers from the EL Sanitization white-list for Android - https://phabricator.wikimedia.org/T226852 (mpopov) Open→Resolved
[15:12:04] Analytics, Product-Analytics, VisualEditor: Hash all pageTokens or temporary identifiers from the EL Sanitization white-list - https://phabricator.wikimedia.org/T220410 (mpopov)
[15:17:23] elukey: sounds good!
[15:22:30] thanks!
[15:23:17] (CR) Mforns: "LGTM! Just left a comment on the delay param." (2 comments) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/542419 (https://phabricator.wikimedia.org/T231529) (owner: Awight)
[15:25:01] Analytics, Operations: Add metadata to puppet about kerberos accounts - https://phabricator.wikimedia.org/T235418 (JAllemandou) p:Triage→Normal
[15:26:16] Analytics: Update mediawiki-history dumper to use project in file names - https://phabricator.wikimedia.org/T235409 (JAllemandou) p:Triage→High
[15:26:38] Analytics, Analytics-Kanban: Update mediawiki-history dumper to use project in file names - https://phabricator.wikimedia.org/T235409 (JAllemandou) a:JAllemandou
[15:28:45] Analytics: Add partition pruning for wmf.browser_general and interlanguage - https://phabricator.wikimedia.org/T235283 (JAllemandou)
[15:29:00] Analytics, Analytics-Kanban: Add partition pruning for wmf.browser_general and interlanguage - https://phabricator.wikimedia.org/T235283 (JAllemandou) a:JAllemandou
[15:29:25] Analytics, Analytics-Kanban: Add partition pruning for wmf.browser_general and interlanguage - https://phabricator.wikimedia.org/T235283 (JAllemandou) p:Triage→High
[15:30:12] Analytics: browser dashboards not updated since 09/29 - https://phabricator.wikimedia.org/T235278 (JAllemandou) Open→Declined
[15:30:27] Analytics, Analytics-Kanban: Superset not able to load a reading
dashboard - https://phabricator.wikimedia.org/T234684 (Nuria) Open→Resolved
[15:30:35] Analytics, Analytics-Kanban: MediaWiki history dumps have some events in 2025 - https://phabricator.wikimedia.org/T235269 (JAllemandou) a:JAllemandou
[15:30:46] Analytics, Analytics-Kanban: MediaWiki history dumps have some events in 2025 - https://phabricator.wikimedia.org/T235269 (JAllemandou) p:Triage→High
[15:31:16] Analytics, Analytics-Kanban, Patch-For-Review: HivePartition (refinery::Hive.py) does not allow partition values to have dots (.) - https://phabricator.wikimedia.org/T235268 (JAllemandou) p:Triage→High
[15:32:27] Analytics: Analytics Access for Grant - https://phabricator.wikimedia.org/T235260 (JAllemandou) p:Triage→High
[15:32:43] Analytics, Analytics-Kanban: Analytics Access for Grant - https://phabricator.wikimedia.org/T235260 (JAllemandou)
[16:04:54] Taking a b
[16:05:01] break for dinner - will be back later
[16:34:39] Analytics, Analytics-Kanban: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (Isaac) @elukey : also happy to help. thanks for reaching out!
[16:39:25] Analytics, Analytics-EventLogging, Analytics-Kanban, Wikimedia-Portals: Sudden drop in WikipediaPortal events - https://phabricator.wikimedia.org/T234461 (Nuria) Numbers make sense now: ` OK day _c1 1 20614 2 20917 3 20398 4 19691 5 18251 6 18924 7 20555 8 20100 9 20342 10 19808 11 18992 12 178...
[16:39:33] Analytics, Analytics-EventLogging, Analytics-Kanban, Wikimedia-Portals: Sudden drop in WikipediaPortal events - https://phabricator.wikimedia.org/T234461 (Nuria) Open→Declined
[16:40:04] Analytics, Analytics-Kanban, Operations, SRE-Access-Requests: Analytics Access for Grant - https://phabricator.wikimedia.org/T235260 (Nuria)
[16:41:46] Analytics, Analytics-Kanban, Operations, SRE-Access-Requests: Analytics Access for Grant - https://phabricator.wikimedia.org/T235260 (Nuria) @gsingers In order to get an ldap user you need to create a user at http://wikitech.wikimedia.org , paste that user in this ticket once you have it, someo...
[16:49:53] elukey: as far as i can see turnilo is doing fine
[16:50:00] elukey: as far as i can see PIWIK is doing fine
[16:50:03] elukey: sorry
[16:50:52] ack!
[17:24:40] Analytics, Analytics-EventLogging, QuickSurveys, Readers-Web-Backlog (Kanbanana-2019-20-Q2): QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (phuedx) a:phuedx
[17:46:06] * elukey off!
[18:13:45] !log Manually add ban.wikipedia.org to pageview whitelist (T234768)
[18:13:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:13:48] T234768: Create Balinese Wikipedia - https://phabricator.wikimedia.org/T234768
[18:19:22] git st
[18:19:24] oops
[18:21:17] (PS1) Joal: Add ban.wikipedia to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/542998 (https://phabricator.wikimedia.org/T234768)
[18:22:49] Analytics, Analytics-Kanban: Add Balinese wikipedia to analytics setup - https://phabricator.wikimedia.org/T235448 (JAllemandou)
[18:22:59] Analytics, Analytics-Kanban: Add Balinese wikipedia to analytics setup - https://phabricator.wikimedia.org/T235448 (JAllemandou) a:JAllemandou
[18:23:42] (PS2) Joal: Add ban.wikipedia to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/542998 (https://phabricator.wikimedia.org/T235448)
[18:25:21] (PS3) Joal: Add ban.wikipedia to pageview whitelist and sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/542998 (https://phabricator.wikimedia.org/T235448)
[18:25:31] Analytics, Analytics-Kanban, Patch-For-Review: Add Balinese wikipedia to analytics setup - https://phabricator.wikimedia.org/T235448 (JAllemandou)
[18:33:11] Analytics, Analytics-Kanban, Patch-For-Review: Add Balinese wikipedia to analytics setup - https://phabricator.wikimedia.org/T235448 (DannyS712)
[18:33:30] hey mforns where did you get that info that the anti harassment training was due today? it says nov 15 in my email
[18:49:19] (PS1) Fdans: Add mediarequests per referer metric [analytics/wikistats2] - https://gerrit.wikimedia.org/r/542999
[18:56:57] fdans, O.o
[18:58:04] fdans, no idea... my bad, so sorry
[18:58:31] mforns: nono, it's all good! just asking in case you had seen a different date somewhere
[18:59:07] I just checked and I saw it was 15th nov as you say, so..
I also wonder where I got that, it was probably me flying in the mayonnaise :]
[19:08:52] mforns: do you mind having a look at my comments here https://phabricator.wikimedia.org/T232671#5573559 ? Wanted to be sure I'm not making mistakes :)
[19:11:39] joal, hehe, I was writing a comment to that task, and your comment popped in, nothing missing, you explained all good. thanks :]
[19:12:19] joal, I think there's a typo in month = substr($1, 1, 7), $1 should be single-quoted no?
[19:12:38] RU replaces it by just the YYYY-MM string no quotes
[19:14:29] mforns: actually if RU gives a month, nothing is needed :)
[19:14:49] mforns: it should then be: `month = '$1'`
[19:14:51] joal, sorry, no
[19:14:57] no no, you did it right
[19:15:06] RU gives YYYY-MM-DD
[19:15:06] Ah - I get your point though
[19:15:11] up
[19:15:31] it needs quotes around $1 I think you're right
[19:15:33] :]
[19:15:38] thanks a lot mforns :)
[19:15:43] no problemo!
[19:19:48] (CR) Nuria: Add ban.wikipedia to pageview whitelist and sqoop (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/542998 (https://phabricator.wikimedia.org/T235448) (owner: Joal)
[19:37:55] (CR) Joal: Add ban.wikipedia to pageview whitelist and sqoop (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/542998 (https://phabricator.wikimedia.org/T235448) (owner: Joal)
[19:40:33] (PS4) Joal: Add ban.wikipedia to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/542998 (https://phabricator.wikimedia.org/T235448)
[19:41:17] Analytics, Analytics-Kanban, Patch-For-Review: Add Balinese wikipedia to analytics setup - https://phabricator.wikimedia.org/T235448 (JAllemandou)
[20:09:27] Analytics, Research: Taxonomy of new user reading patterns - https://phabricator.wikimedia.org/T234188 (JAllemandou) I looked at the code and have some comments, but not that many given the complexity of the analysis :) Good job @MGerlach! - When using date for partition pruning, I discourage going with...
- When using date for partition pruning, I discourage going with... [20:21:05] (03CR) 10Joal: New report for Reference Previews (031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/542419 (https://phabricator.wikimedia.org/T231529) (owner: 10Awight) [20:29:24] (03CR) 10Nuria: New report for Reference Previews (031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/542419 (https://phabricator.wikimedia.org/T231529) (owner: 10Awight) [20:30:01] (03CR) 10Nuria: [C: 03+2] Add ban.wikipedia to pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/542998 (https://phabricator.wikimedia.org/T235448) (owner: 10Joal) [20:37:33] Gone for tonight team - see you tomorrow [20:57:08] (03PS1) 10Srishakatux: Add hive query for wmcs edits [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/543008 (https://phabricator.wikimedia.org/T232671) [23:46:21] 10Analytics, 10Event-Platform, 10WMF-JobQueue, 10CPT Initiatives (Modern Event Platform (TEC2)), 10good first bug: EventBus extension must not send batches that are too large - https://phabricator.wikimedia.org/T232392 (10Johan) Another week when Tech News wouldn't deliver if the MassMessage target list... [23:55:45] 10Analytics, 10Analytics-Kanban, 10Operations, 10SRE-Access-Requests: Analytics Access for Grant - https://phabricator.wikimedia.org/T235260 (10gsingers) My user is `Grant Ingersoll`