[05:29:58] 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): HomepageVisit schema validation errors - https://phabricator.wikimedia.org/T269966 (10MMiller_WMF) @nettrom_WMF -- we'll talk about this task in our team meeting on Monday. Does this disturb your analysis for Variants C and D? [07:03:51] good morning! [07:04:08] The bigtop devs asked to us if it is ok to present our use case to (virtual) FOSDEM! [07:55:55] !log roll restart yarn daemons to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/649126 [07:55:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:47:58] (03PS6) 10Awight: Process EventLogging events and tally preferences for CodeMirror [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: 10Andrew-WMDE) [08:48:04] (03CR) 10Awight: Process EventLogging events and tally preferences for CodeMirror (033 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: 10Andrew-WMDE) [09:10:49] 10Analytics: Investigate showing realtime the eventlogging banner stream (currently sampled at 1%) - https://phabricator.wikimedia.org/T255446 (10kai.nissen) @Nuria This seems to have stalled. Are there plans to tackle this? [09:24:42] ah! [09:25:00] I found a trick to finally avoid the extra kinit inside jupyter notebooks [09:30:48] 10Analytics-Clusters: Kerberos credential cache location - https://phabricator.wikimedia.org/T255262 (10elukey) I re-analyzed the problem with fresh mind, and I found this very useful gh issue: https://github.com/jupyter-incubator/sparkmagic/issues/466 What I did was: 1) change krb.conf on an-test-client1001 a... [09:32:37] Good morning [09:32:58] bonjour! [09:38:00] elukey: not having to kinit from notebook is an awesome news! 
[09:40:10] It should work fine, I am working on a patch to enable it only on, say, stat1004
[09:40:22] team: I went to physio this morning and she asked me to take it easy today - I'm gonna follow the advice
[09:42:02] take care!
[10:07:54] Recently, my team has been needing low-multidimensional (4) segmentation, which makes Graphite and Grafana uncomfortable ("MediaWiki.VisualEditor.byDialog.template.byStat.opens.byEditCount.over10k.byWiki.dewiki"). Please remind me where I might find another visualization platform used by Wikimedia projects, with good multidimensional support?
[10:08:16] (also open to the public would be nice, but I know this is asking for a lot)
[10:17:34] ... or maybe the "impedance" mismatch is with reportupdater, because we would still be exporting to CSV and therefore have to somehow pass values in a Cartesian product of categories.
[10:18:30] awight: hi! If you can open a task with some details etc.. it will be reviewed by our best data engineers (so not me :P) asap :)
[10:20:55] elukey: I'm mostly just rambling, pardon me :-). I'm starting to think there's a problem with the way I'm segmenting along all dimensions at once. The questions we'll ask are supported by segmentation along just one dimension at a time, i.e. a set -".byWiki" and a set -"byEditCount", ...
[10:31:50] awight: nono no need for pardon, I wanted to make sure that your questions were seen/answered, most people would probably ask for more details etc.. so doing it once would have saved you time :)
[10:37:29] Thanks! (and muttered:) actually, ".byWiki" is handled by the reportupdater framework so what I described above has no benefit.
[10:43:08] (03CR) 10Awight: [WIP] Process EventLogging events for VisualEditor (031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/647742 (https://phabricator.wikimedia.org/T262209) (owner: 10Andrew-WMDE) [10:53:56] (03PS1) 10Awight: Glue for pure-setup.cfg project [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/649296 [11:21:11] (03PS2) 10Awight: Glue for pure-setup.cfg project [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/649296 [11:36:06] * elukey lunch! [11:52:46] Happy to report that the "funnel" feature of reportupdater already does exactly what I need, my script can output multiple rows and index columns and they will each be wired to a separate metric path. [12:17:01] (03PS3) 10Awight: [WIP] Process EventLogging events for VisualEditor [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/647742 (https://phabricator.wikimedia.org/T262209) (owner: 10Andrew-WMDE) [12:18:38] (03CR) 10Awight: [WIP] Process EventLogging events for VisualEditor (032 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/647742 (https://phabricator.wikimedia.org/T262209) (owner: 10Andrew-WMDE) [12:29:46] (03PS1) 10Fdans: Upgrade Webpack from 2 to 5 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/649311 (https://phabricator.wikimedia.org/T188759) [12:31:01] (03CR) 10jerkins-bot: [V: 04-1] Upgrade Webpack from 2 to 5 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/649311 (https://phabricator.wikimedia.org/T188759) (owner: 10Fdans) [12:32:33] (03PS4) 10Awight: Process EventLogging events for VisualEditor [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/647742 (https://phabricator.wikimedia.org/T262209) (owner: 10Andrew-WMDE) [12:54:06] (03PS5) 10Awight: Process EventLogging events for VisualEditor [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/647742 (https://phabricator.wikimedia.org/T262209) (owner: 10Andrew-WMDE) [12:56:44] 
10Analytics, 10Product-Analytics, 10Inuka-Team (Kanban): Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (10nshahquinn-wmf) >>! In T244548#6687703, @Jpita wrote: > If you need me to do some specific actions on a device please let me know. > I've been using the app a bit Thur... [13:06:47] good morning! [13:27:19] 10Analytics, 10Product-Analytics, 10Inuka-Team (Kanban): Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (10hueitan) > I also checked again for events that failed validation and this time found two from last Tuesday that were missing the required `source_title` property, but... [13:38:23] awight: nice! [13:38:27] fdans: hola hola [13:45:32] 10Analytics, 10Product-Analytics, 10Inuka-Team (Kanban): Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (10SBisson) @Jpita you need to look at the preview for a full second for us to log the event. Make sure you do that. [13:45:46] My hive event schema "templatewizard" has revision=null, is this something about the new event system? [13:46:48] awight: could this be old data being sanitized? [13:47:38] 10Analytics, 10Product-Analytics, 10Inuka-Team (Kanban): Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (10Jpita) >>! In T244548#6688837, @SBisson wrote: > @Jpita you need to look at the preview for a full second for us to log the event. Make sure you do that. yeah I never... [13:49:29] joal: That was an intriguing guess! 
But I don't think so, this is event.templatewizard [13:50:49] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Sanitize and keep TemplateDataEditor events (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/646670 (https://phabricator.wikimedia.org/T260343) (owner: 10Awight) [13:51:04] Hm awight - This schema is written to have been migrated recently to EventGate - I wonder if it could be related [13:53:01] (03CR) 10Awight: Sanitize and keep TemplateDataEditor events (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/646670 (https://phabricator.wikimedia.org/T260343) (owner: 10Awight) [13:54:29] joal: I'm curious but my blocker at the moment is that the schema was migrated to add an optional field, and I can't figure out how to ignore rows with a missing struct column, would like to do something like "where ... and event.new_field is not null" [13:55:16] Should I make a task for this / it's unexpected behavior? [13:55:24] awight: I just re-read your question - Please excuse me for the non-processing of the new-event system - :) [13:55:45] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] "There is so much going on here, in so many different languages (some of them even nested), I can't really review this in a way where I'm s" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: 10Andrew-WMDE) [13:55:48] awight: The query you write doesn't work ? [13:57:39] joal: It dies with "RuntimeException cannot find field user_edit_count". OH [13:58:51] Here's a wild guess, putting the clues you gave me together in random order: maybe new-event messages are being sanitized before they land in the `event` database? [13:59:48] So two things going on: my query cannot be made robust to the optional field sometimes not being present, and potentially there's something stripping the field from data before it lands in `event`. [13:59:57] I see it present in the raw kafka feed. 
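A minimal sketch of the "ignore rows where the optional field is missing" idea, applied to the raw JSON feed rather than to Hive. It assumes each raw message is one JSON object per line with a nested `event` object (mirroring the `event.new_field` access above); `has_field` and `keep_with_field` are hypothetical helper names, and this does not address the Hive `RuntimeException` itself.

```python
import json


def has_field(record, field):
    """True when record["event"] is an object carrying a non-null `field`."""
    event = record.get("event")
    return isinstance(event, dict) and event.get(field) is not None


def keep_with_field(lines, field):
    """Yield parsed records whose nested event has the optional field set."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        record = json.loads(line)
        if has_field(record, field):
            yield record
```

For example, `list(keep_with_field(open("events.jsonl"), "user_edit_count"))` would keep only events that carry the new field, where `events.jsonl` is a hypothetical dump of the feed.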
[14:01:05] Could very well be awight! [14:01:18] Let's ask mforns when he comes online [14:02:33] (03CR) 10Awight: "I've smoke-tested each query on the stat machines, but the piece that could break is the config file, and the way results are wired into g" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: 10Andrew-WMDE) [14:02:57] 10Analytics, 10Product-Analytics, 10Inuka-Team (Kanban): Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (10nshahquinn-wmf) >>! In T244548#6688846, @Jpita wrote: > yeah I never do that, I open it, read the title and go to the article. > that's my MO on all similar things in... [14:07:28] 10Analytics-Clusters, 10Patch-For-Review: Kerberos credential cache location - https://phabricator.wikimedia.org/T255262 (10Ottomata) Pretty coOOOoOL! [14:30:31] (03PS1) 10Awight: [WIP] Aggregate TemplateWizard metrics [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649351 (https://phabricator.wikimedia.org/T262209) [14:31:09] ottomata: o/ [14:31:16] the jupyter thing works! [14:31:29] \o/ [14:31:49] so elukey the same ticket used by the shell works for the notebooks? [14:31:57] exactly, and vice-versa [14:33:16] I tried to kinit on a bash session, use beeline and spark2-shell --master yarn, and then I moved to pyspark yarn on jupyter and it worked without a problem [14:33:23] (on an-test-client1001) [14:33:34] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [14:33:50] I'd do more test and move to stat1004 if you are ok [14:33:56] and ask people to test the settings.. 
[14:34:08] (I'm excited for this :-)
[14:34:51] ahahhaha :D
[14:35:33] java is, as always, as flexible as a piece of concrete when you have to do things outside the defaults
[14:35:42] but this should be the right setting
[14:39:02] gmodena,fkaelin - hi! qq: are you subscribed to https://lists.wikimedia.org/mailman/listinfo/analytics-announce ?
[14:39:26] it is useful if you work on the cluster, to get news/maintenance/etc..
[14:39:34] elukey yep
[14:39:41] perfect
[14:39:42] thanks for checking!
[14:41:10] yes I am too, thank you!
[14:43:13] elukey: +1 proceed please!
[14:43:19] ottomata: ack!
[14:43:40] isaacj: hello hello! Are you around? I am running an experiment on stat1004 and I see that you have a jupyter notebook
[14:44:02] elukey: yes -- let me check. it probably can be shut down
[14:44:43] yep -- it finished what it was doing. shut down now but thanks for checking!
[14:44:47] isaacj: the test is basically to have a kerberos ticket shared between notebook and ssh/bash, would you also have time to test it and report back if you see weirdnesses?
[14:45:10] oh yeah, i can help. i would also much appreciate that :)
[14:45:27] ack all right so if you give me the green light I'll deploy the change :)
[14:45:58] all good to go -- i'm not authenticated via kerberos on stat1004 right now either which i assume is what you want
[14:46:35] also in case you didn't see -- thanks for including the feedback on terminal about whether you're authenticated or not when i first ssh in!
[14:46:50] isaacj: nono I just need to shut down your current notebook, and kdestroy in case you have creds
[14:47:04] after that you are free to do anything you want
[14:47:31] deploying now!
[14:47:37] :thumbs up:
[14:52:55] isaacj: all done!
So now you have to kinit either on bash/ssh or on jupyter's terminal [14:53:00] and the session should be shared [14:54:36] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [14:55:58] elukey: thanks! quick check shows i can create / destroy kinit from SWAP and basic ssh and it works as expected (shared state) [14:56:07] i'll let you know if i see any weirdness though in the next few days [14:56:19] This is awesome --^ \o/ [14:57:00] isaacj: thanksssss! [14:57:12] no, thank you! [14:58:48] !log stat1004's krb credential cache moved under /run (shared between notebooks and ssh/bash) - T255262 [14:58:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:58:51] T255262: Kerberos credential cache location - https://phabricator.wikimedia.org/T255262 [15:00:00] 10Analytics-Clusters, 10Analytics-Kanban: Kerberos credential cache location - https://phabricator.wikimedia.org/T255262 (10elukey) p:05Triage→03Medium a:03elukey [15:00:16] 10Analytics-Clusters, 10Analytics-Kanban: Kerberos credential cache location - https://phabricator.wikimedia.org/T255262 (10elukey) Seems to work fine, Isaac is going to test the new settings on stat1004 during the next day and report if any weirdness comes up. Volunteers for testing on stat1004 are welcome! [15:09:32] nshahquinn: o/ [15:09:47] are you testing wikipediapreview_stats by any chance? 
[15:09:57] it is sending a lot of failure reports to analytics-alerts@
[15:11:22] elukey: yes, sorry about that
[15:11:25] let me kill it
[15:12:54] nshahquinn: all good I was just wondering :)
[15:13:38] elukey: well, I actually wasn't testing...that was the "real" launch after all the testing
[15:14:05] which is why it started actually using the email-on-failure workflow
[15:14:20] but obviously something's not working correctly :P
[15:15:10] (also, we really should tweak that workflow so that the destination address is configurable, so we can send ourselves the emails :)
[15:16:53] hello teammmm
[15:22:13] MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permi
[15:22:16] ssion denied: user=neilpquinn-wmf, access=WRITE, inode="/user/analytics-product":analytics-product:analytics-product:drwxr-xr-x
[15:22:18] hi mforns, hi team
[15:22:19] nshahquinn: --^
[15:23:22] nshahquinn: are you going to run the job with your own user or is the plan to use the analytics-product one? The latter will need some work since we don't have the kerberos cred deployed yet IIRC
[15:23:28] hello mforns and razzi :)
[15:23:32] elukey: thanks! yeah, I think I accidentally submitted it as myself rather than as analytics-product.
[15:24:02] elukey: oh, I thought it was all ready to use.
[15:24:16] I do want to use analytics-product
[15:26:33] yeah, looks like you're right. I tried to run a basic Presto query as analytics-product and it gave me a Kerberos error. I'll file a task for it.
[15:27:38] nshahquinn: there is one, lemme find it, I was waiting for more instructions in there :)
[15:28:37] elukey: ah, this? https://phabricator.wikimedia.org/T258970
[15:28:53] yep!
[15:38:37] nshahquinn: I misremembered, the analytics-product keytab is there (so past Luca already deployed it) but kerberos-run-command seems not liking it [15:40:54] nono ok it works, not on stat1004, where I have the new settings [15:40:55] kinit: Failed to store credentials: Credentials cache permissions incorrect (filename: /run/user/13926/krb_cred) while getting initial credentials [15:43:43] interesting, this is the weird use case that I was waiting for [15:48:07] nshahquinn: so on any stat100x except 1004, you can run [15:48:26] sudo -u analytics-product kerberos-run-command analytics-product oozie-coomand-etc.. [15:48:29] and it should work [15:51:58] ahhhh KRB5CCNAME is preserved by sudo [15:58:23] (03CR) 10Mforns: [V: 03+2 C: 03+2] "Merging for tomorrow's deployment train." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/645419 (https://phabricator.wikimedia.org/T231339) (owner: 10Mforns) [16:18:44] 10Analytics, 10Event-Platform: EventStreams error in logs: Error: Invalid number of arguments (for prometheus?) - https://phabricator.wikimedia.org/T263759 (10Ottomata) 05Open→03Resolved a:03Ottomata This has been fixed since upgrading EventStreams to a new version of service-runner that better supports... [16:20:58] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Deprecate the 'researchers' posix group - https://phabricator.wikimedia.org/T268801 (10Ottomata) a:03elukey [16:21:11] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (10Ottomata) a:03elukey [16:22:46] 10Analytics, 10Analytics-Kanban, 10Growth-Team, 10Product-Analytics: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (10Ottomata) I wonder if {T249745} is related. 
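The `KRB5CCNAME is preserved by sudo` observation can be illustrated with a small sketch: a child process inherits the caller's `KRB5CCNAME`, so a command run as another user ends up pointing at a credential cache it cannot write to. The helpers below (`env_without_ccache`, `run_clean`) are hypothetical names, and this is only an illustration of the inheritance problem, not the fix that was actually applied on stat1004.

```python
import os
import subprocess
import sys


def env_without_ccache(env=None):
    """Return a copy of the environment with KRB5CCNAME removed."""
    clean = dict(os.environ if env is None else env)
    clean.pop("KRB5CCNAME", None)  # child falls back to its own default cache
    return clean


def run_clean(argv, env=None):
    """Run argv with KRB5CCNAME stripped from the inherited environment."""
    return subprocess.run(
        argv,
        env=env_without_ccache(env),
        capture_output=True,
        text=True,
        check=True,
    )
```

Calling `run_clean` on a probe such as `[sys.executable, "-c", "import os; print(os.environ.get('KRB5CCNAME', 'unset'))"]` shows that the child no longer sees the caller's cache path.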
[16:27:17] elukey: razzi yoohoo [16:30:13] mforns: yoohoooo [16:30:26] uop [16:35:10] 10Analytics, 10DBA: mariadb on dbstore hosts, and specifically dbstore1004, possible memory leaking - https://phabricator.wikimedia.org/T270112 (10jcrespo) [16:35:49] 10Analytics-Clusters: Deprecate the anaytics-users POSIX group - https://phabricator.wikimedia.org/T269150 (10Ottomata) a:03elukey [16:35:54] 10Analytics, 10Product-Analytics, 10Epic: Readership Retention: New vs. Returning Unique devices - https://phabricator.wikimedia.org/T269815 (10fdans) the last access timestamp is present in webrequest so we should be able to perform this distinction, according to @mforns looking into it. [16:37:54] 10Analytics, 10DBA: mariadb on dbstore hosts, and specifically dbstore1004, possible memory leaking - https://phabricator.wikimedia.org/T270112 (10jcrespo) p:05Triage→03Low This is not a huge concern since we have memory monitoring T172490, but adding it here for tracking, so we can research at a later tim... 
[16:38:13] 10Analytics, 10DBA: mariadb on dbstore hosts, and specifically dbstore1004, possible memory leaking - https://phabricator.wikimedia.org/T270112 (10jcrespo) [16:39:45] 10Analytics-Clusters, 10Analytics-Kanban: Deprecate the anaytics-users POSIX group - https://phabricator.wikimedia.org/T269150 (10Ottomata) [16:43:36] 10Analytics, 10Product-Analytics, 10Product-Infrastructure-Data: Schema repository structure, naming - https://phabricator.wikimedia.org/T269936 (10sdkim) a:03jlinehan [16:45:52] 10Analytics: Wikistats map's choropleth shows the same color for 0 and minimum nonzero value - https://phabricator.wikimedia.org/T269883 (10fdans) p:05Triage→03High a:03fdans [16:48:39] 10Analytics-Radar, 10Product-Analytics, 10Growth-Team (Current Sprint): HomepageVisit schema validation errors - https://phabricator.wikimedia.org/T269966 (10fdans) cc @mforns [16:50:40] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Upgrade AMD ROCm drivers/tools to latest upstream - https://phabricator.wikimedia.org/T264408 (10Ottomata) a:03klausman [16:59:38] 10Analytics, 10Analytics-Kanban: AQS should be more resilient to druid nodes not available - https://phabricator.wikimedia.org/T268811 (10Ottomata) @fdans if we can get a patch for this and merged we can deploy this and the patch for {T268809} together. [17:00:39] 10Analytics-Clusters, 10Analytics-Kanban: Move the Analytics infrastructure to Debian Buster - https://phabricator.wikimedia.org/T234629 (10Ottomata) [17:00:43] 10Analytics-Radar, 10Product-Analytics, 10Growth-Team (Current Sprint): HomepageVisit schema validation errors - https://phabricator.wikimedia.org/T269966 (10nettrom_WMF) >>! In T269966#6687832, @MMiller_WMF wrote: > @nettrom_WMF -- we'll talk about this task in our team meeting on Monday. Does this disturb... 
[17:00:46] 10Analytics-Clusters: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10Ottomata) [17:03:42] 10Analytics-Clusters, 10Machine Learning Platform, 10ORES, 10Research: Desired packages to be installed/upgraded on the PySpark cluster (jupyterhub) - https://phabricator.wikimedia.org/T249078 (10Ottomata) 05Open→03Resolved a:03Ottomata Pretty sure all these packages are deployed as part of our base... [17:03:44] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Create anaconda .deb package with stacked conda user envs - https://phabricator.wikimedia.org/T251006 (10Ottomata) [17:03:46] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (10Ottomata) [17:06:02] 10Analytics, 10Analytics-SWAP, 10Product-Analytics: Enable widgets on Jupyter Labs on SWAP - https://phabricator.wikimedia.org/T227217 (10Ottomata) 05Open→03Resolved a:03Ottomata nodejs and different versions of should be available via https://wikitech.wikimedia.org/wiki/Analytics/Systems/Anaconda with... [17:07:29] 10Analytics, 10Analytics-SWAP: Support R Kernels by default for all users. - https://phabricator.wikimedia.org/T190453 (10Ottomata) 05Open→03Resolved a:03Ottomata R Kernels should be available. [17:07:32] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Create anaconda .deb package with stacked conda user envs - https://phabricator.wikimedia.org/T251006 (10Ottomata) [17:07:34] 10Analytics, 10Analytics-SWAP: Jupyter Notebooks TLC 2018-2019 - https://phabricator.wikimedia.org/T188275 (10Ottomata) [17:07:36] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (10Ottomata) [17:08:20] 10Analytics, 10Analytics-SWAP: Notebook machine to double as RStudio Server? 
- https://phabricator.wikimedia.org/T190769 (10Ottomata) [17:08:22] 10Analytics, 10Analytics-SWAP: RStudio web version on SWAP - https://phabricator.wikimedia.org/T180270 (10Ottomata) [17:08:58] 10Analytics, 10Analytics-SWAP: Users should be able to read their jupyter instance logs - https://phabricator.wikimedia.org/T198764 (10Ottomata) p:05Medium→03High [17:09:45] 10Analytics: Installing package graph-tool on one stat-machine - https://phabricator.wikimedia.org/T247266 (10Ottomata) 05Open→03Resolved a:03Ottomata This should be possible now via conda and newpyter. [17:22:55] I see that the TemplateWizard eventlogging schema was chosen as a pilot topic for the new infrastructure. [17:23:16] Can I delete metawiki:Schema:TemplateWizard? [17:24:24] Or is it possible that the eventlogging processor is still reading from there, so the advice in schemas-event-secondary is a bit wrong, and we still need the redundancy between metawiki and the schemas repo? [17:25:04] awight: previously it would be dangerous to delete the schema, buti think our process now is safe for that [17:25:12] i should have marked the schema as migrated in its talk page [17:25:13] sorry about that [17:25:19] let's delete it. [17:26:43] ottomata: great, thanks! [17:27:40] ottomata: If you have a minute, I also have a question about how to safely migrate these schemas. I need to add an optional field (will edit the yaml), but how can I detect the schema patchlevel in hive? [17:27:46] `revision` is null [17:28:50] (I can't delete the page, maybe you want to do it so you can monitor?) [17:35:08] 10Analytics, 10Analytics-Kanban: Can't use custom conda kernel in Newpyter within PySpark UDFs - https://phabricator.wikimedia.org/T269358 (10Isaac) To make this more relevant, I now find myself in a situation where I do want a Python lib on the workers that is not available in the standard conda environment:... 
[17:38:00] 10Analytics, 10Analytics-Kanban: Can't use custom conda kernel in Newpyter within PySpark UDFs - https://phabricator.wikimedia.org/T269358 (10awight) I've switched from custom kernels to the generic Python (not pyspark) kernel, and can install packages directly in the notebook's environment: https://gitlab.com... [17:42:26] 10Analytics, 10Analytics-Kanban: Can't use custom conda kernel in Newpyter within PySpark UDFs - https://phabricator.wikimedia.org/T269358 (10Isaac) > I've switched from custom kernels to the generic Python (not pyspark) kernel, and can install packages directly in the notebook's environment Thanks for chiming... [17:48:26] 10Analytics, 10Analytics-Wikistats: Last Caliph of Islam - https://phabricator.wikimedia.org/T270117 (10Faaizan) [17:48:55] 10Analytics, 10Analytics-Kanban: Can't use custom conda kernel in Newpyter within PySpark UDFs - https://phabricator.wikimedia.org/T269358 (10awight) >>! In T269358#6689561, @Isaac wrote: > the challenge is when I want Spark workers to also have access to the library so I can parallelize the computation. Oho!... [17:58:28] 10Analytics, 10Analytics-Wikistats: Last Caliph of Islam - https://phabricator.wikimedia.org/T270117 (10Majavah) 05Open→03Invalid Content issues are not handled in Phabricator. We have answered this question dozens of times now. The Wikipedia article says that Mirza Masroor Ahamd is the 5th caliph *of the... [17:59:23] elukey: do you have a sense of how long it will take to fix the analytics-product Kerberos issue? Just wondering whether I should wait for that or instead use my own account for now :) [18:00:17] nshahquinn: on what node are you testing? In theory it should be fixed [18:01:21] elukey: stat1008...I just tested the presto command and it failed again [18:02:03] nshahquinn: can you give me the command? 
[18:02:27] I just tried `sudo -u analytics-product kerberos-run-command analytics-produst presto --catalog analytics_hive` [18:02:38] That resulted in `The user keytab that you are trying to use (/etc/security/keytabs/analytics-produst/analytics-produst.keytab) doesn't exist or it isn't readable from your user, aborting...` [18:02:46] oh [18:02:48] wait [18:02:49] yep :) [18:02:52] typo [18:02:59] produst :) [18:03:09] hahaha yeah [18:12:26] ottomata: Is there a wiki page for developers using new-event? I'm wondering how to synchronize future schema upgrades: why was I able to send events using a different schema than the published one, without validation errors? [18:15:09] awight: have you seen https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To [18:15:14] but i don't think that answers your question [18:18:14] elukey: I actually think it's still not working. If I run `sudo -u analytics-product kerberos-run-command analytics-product presto --catalog analytics_hive` (proper spelling, right?), I get the Presto interface, but then when I try to run anything, I get `Error running command: Kerberos error for [presto@an-coord1001.eqiad.wmnet]: Unable to obtain password from user`. [18:21:15] nshahquinn: I can check it, it might be a specific thing for presto, I am pretty sure it works with oozie [18:21:18] have you tried? [18:22:28] ahhh yes indeed the presto wrapper needs some adjustment [18:28:15] ottomata: Thanks, that page does answer my question, we're supposed to send a `$schema` property! 
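A minimal sketch of what sending a `$schema` property could look like on the producer side, and how a version could be read back from it. The URI layout (a path ending in a version segment) and the example value used in the usage note are assumptions for illustration, as are the helper names `with_schema` and `schema_version`; the Instrumentation How To page linked above is the authoritative reference.

```python
def with_schema(event, schema_uri):
    """Return a copy of the event tagged with the schema it claims to follow."""
    tagged = dict(event)
    tagged["$schema"] = schema_uri
    return tagged


def schema_version(event):
    """Return the trailing segment of $schema, or None if it is absent."""
    uri = event.get("$schema")
    if not uri:
        return None
    return uri.rstrip("/").rsplit("/", 1)[-1]
```

With a hypothetical URI such as `/analytics/legacy/templatewizard/1.0.0`, `schema_version(with_schema({"action": "open"}, uri))` gives the version segment, which is one way a consumer could tell schema revisions apart when `revision` is null.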
[18:30:00] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10Mholloway) [18:30:17] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10Mholloway) p:05Triage→03High [18:33:02] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10Ottomata) Looks like this also happened in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventLogging/+/648288 ping @Ladsg... [18:33:12] nshahquinn: so I need to figure out a way to fix the presto cli, but oozie should work just fine [18:34:15] nshahquinn: do we really need to run presto as analytics-product thought? It is read only, so you username should be ok [18:36:18] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10Mholloway) [18:42:10] (03CR) 10Mforns: [C: 03+1] "Code looks good to me!" [analytics/aqs] - 10https://gerrit.wikimedia.org/r/647756 (https://phabricator.wikimedia.org/T268809) (owner: 10Razzi) [18:43:40] !log applying yarn config change via `sudo cumin "A:hadoop-worker" "systemctl restart hadoop-yarn-nodemanager" -b 10` [18:43:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:45:44] :) [18:46:10] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/644201 (https://phabricator.wikimedia.org/T264987) (owner: 10Gilles) [18:53:49] (03CR) 10Mforns: [C: 03+1] "I don't know about the graphite connector, probably Adam is right!" 
[analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: 10Andrew-WMDE) [18:54:44] !log restart hadoop-yarn-resourcemanager on an-master1001 [18:54:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:55:16] 10Analytics, 10Analytics-Kanban: Can't use custom conda kernel in Newpyter within PySpark UDFs - https://phabricator.wikimedia.org/T269358 (10Ottomata) @Isaac does https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark#pyspark_and_external_packages help at all? You could certainly pass those args... [18:56:10] 10Analytics, 10Analytics-Kanban: Can't use custom conda kernel in Newpyter within PySpark UDFs - https://phabricator.wikimedia.org/T269358 (10Ottomata) Alternatively, if you think shapely is something that would be nice for others to have on the workers in Anaconda, we can add it to our Anaconda package and de... [18:56:55] elukey: ah, no, no need for the Presto CLI. I was just using it as a test for Kerberos that didn't require starting an Oozie job that might fail and spam your alerts email address (I think I figured out how to override it with mine, but I'm still a bit wary). I'll give the Oozie job another try :) [18:58:02] nshahquinn: super :) [18:58:39] 10Analytics-Radar, 10Product-Analytics, 10Growth-Team (Current Sprint): HomepageVisit schema validation errors - https://phabricator.wikimedia.org/T269966 (10MMiller_WMF) We will address the fix for this specific issue in January. [19:01:26] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM!" 
[analytics/refinery] - 10https://gerrit.wikimedia.org/r/646670 (https://phabricator.wikimedia.org/T260343) (owner: 10Awight) [19:02:35] !log restart hadoop-yarn-resourcemanager on an-master1002 [19:02:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:03:24] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10Mholloway) Removing the eventgate-wikimedia dev dependency resolves the error (testing [[ https://gerrit.wikimedia.org/r/c/mediawik... [19:03:50] razzi: congrats on your first big config change deployed :) [19:05:17] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10Ottomata) I wonder if https://gerrit.wikimedia.org/r/c/eventgate-wikimedia/+/643519 is the cause. It does do some weird stuff with... [19:06:55] possibly why yarn.wikimedia.org/ is not reachable anymore? [19:07:57] fkaelin: yes, it will be back again momentarily [19:08:14] elukey: the analytics-product Oozie job seems to be working fine! Thank you! 
😁 [19:08:39] !log restarted hadoop-yarn-resourcemanager on an-master1001 again by mistake [19:08:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:09:31] !log restart restart hadoop-yarn-resourcemanager on an-master1002 to promote an-master1001 to active again [19:09:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:09:34] nshahquinn: \o/ [19:09:47] fkaelin: should be all set now [19:10:09] yay thank you razzi [19:10:13] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10Mholloway) It seems to be failing while trying to resolve dependencies, before anything is installed. Also, the eventgate-wikimedi... [19:12:25] 10Analytics-Clusters, 10Analytics-Kanban: Set yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds - https://phabricator.wikimedia.org/T269616 (10razzi) This has been deployed to the hadoop workers and master. To test, we can view a long-running job and see that its logs are aggregated at the 1-ho... [19:17:23] * elukey afk! [19:19:51] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10Mholloway) If the eventgate dev server stuff is mainly for the benefit of MediaWiki-Docker users, maybe we could work out a docker-... [19:21:57] (03CR) 10Mforns: [C: 03+1] "LGTM!" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/647742 (https://phabricator.wikimedia.org/T262209) (owner: 10Andrew-WMDE) [19:49:25] 10Analytics-EventLogging, 10Analytics-Radar, 10Product-Infrastructure-Team-Backlog, 10Epic: Explore an API for logging events sampled by session - https://phabricator.wikimedia.org/T168380 (10Jdlrobson) I believe this was addressed by the addition of mw.eventLog.inSample ? 
[19:52:13] 10Analytics, 10Analytics-Kanban, 10Growth-Team, 10Product-Analytics, 10Patch-For-Review: Migrate Growth EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T267333 (10kostajh) > The following schemas have been migrated successfully. [...] Thank you! > The other schemas have not b... [19:58:45] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10hashar) [19:59:08] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10hashar) [20:00:48] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10hashar) T270100#6689252 James pointed that it could be a cache corruption, I could not reproduce the failure using the same contai... [20:11:20] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10Mholloway) Yes, looks fixed, thanks @hashar and @Jdforrester-WMF. [20:12:31] mforns: Is it pioneering to use the new event platform with a PHP client? I don't see any provisions for $schema in ext-EventLogging. [20:23:50] awight: the PHP client is on the way, but might take a bit [20:24:23] I don't understand what you mean with pioneering, though [20:26:09] mforns: That explains it, yeah I just meant to ask if I would be the first one. [20:29:06] awight: ah! understand. Yes it would. Maybe hip can give you more details on the client status, goes by jlinehan on slack. [20:29:36] I believe it's in progress, but paused right now, until the session length work is finished. 
[20:32:40] mforns: Sorry, I just realized that the extension I'm working with, TemplateWizard, sends its events from JS. Not sure what I was thinking there. [20:33:26] But to continue diagnosing my problem of lacking event schema `revision` data in hive, I think that https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/TemplateWizard/+/refs/heads/master/resources/ext.TemplateWizard.js is missing some new-event glue, right? [20:33:38] It should send `$event`, at least. [20:34:16] (and I'm trying to tell if the mw.track abstraction is already compatible/equivalent to the new mw.eventLog.submit) [20:42:04] Okay, so mw.track should work. The presence of `$schema` is what switches between legacy / new event mode. [20:42:31] (should I update the examples to use `mw.track`?) [20:46:35] aah and $schema is populated transparently if the configured eventlogging schema ID is a path string. Nice! [20:47:43] I'll tweak the example in https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#Writing_instrumentation_code according to how I understand this, please feel free to revert or adjust if I'm wrong. [20:53:39] haha awight you found it all out! [20:53:42] awight: [20:53:49] that interface is just for legacy compatibility [20:53:50] for new stuff [20:54:00] you should not use mw.track OR mw.eventLog.logEvent [20:54:03] you should use mw.eventLog.submit [20:57:31] awight i have a meeting for the next 30 mins but would be happy to help you figure your stuff out after that [21:01:32] sorry awight had to afk for a bit [21:03:23] +1 all good, I'm psyched to be tinkering with eventgate. [21:04:51] Great idea to include {eventlogging,eventgate}-devserver in the EventLogging PHP extension for convenience. [21:12:34] possible gotcha: I'll need to set $wgEventLoggingServiceUri to something like 'http://localhost:8192/v1/events'.
[21:14:29] Another thing I'm finding difficult about the wiki page: there is talk about "setting $wgEventStreams" but that's not how we normally package extensions. In extension.json, there would be an {attributes: EventLogging: Schemas: ...} which defines the streams used by the extension. [21:14:55] In other words, the default configuration is encapsulated in the extension and we never set globals directly. [21:31:22] awight: we need to be able to query that configuration from other services [21:31:29] so it needed to be exported globally somehow [21:31:52] in the old EventLogging case, it was the only thing producing (and configuring) events [21:31:58] now, there are many producers and consumers [21:32:06] that may not even be part of MediaWiki [21:32:53] meeting over btw, lemme know if i can help ya [21:34:17] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [21:34:31] ! looking [21:35:48] ottomata: (and m-forns) Thanks for all the help and review! I dumped my notes here https://phabricator.wikimedia.org/T262209#6690235 [21:36:23] Don't feel obliged, but a sanity check would be helpful. [21:37:40] What you say about externally exporting events makes sense, and I am happy to accept the situation (for now :-). [21:38:11] (03PS3) 10Fdans: Add Active Editors per Country metric to Wikistats [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/647792 (https://phabricator.wikimedia.org/T188859) [21:39:03] 10Analytics-EventLogging, 10Analytics-Radar, 10Product-Infrastructure-Team-Backlog, 10Epic: Explore an API for logging events sampled by session - https://phabricator.wikimedia.org/T168380 (10mpopov) >>! In T168380#6689930, @Jdlrobson wrote: > I believe this was addressed by the addition of mw.eventLog.inS...
[21:39:48] I guess I'll also file a bug about the missing `revision` field, in case there's an obvious answer or it's in some layer I'll never figure out on my own. [21:43:36] 10Analytics, 10Event-Platform: jsonschema-tools should have option to require examples - https://phabricator.wikimedia.org/T270134 (10Ottomata) [21:43:57] looking awight [21:45:17] awight: why does the revision field need to be set? [21:45:58] ottomata: I'm wondering the same thing. Maybe I'm taking the wrong approach to my problem. [21:46:31] I'm trying to add new optional fields to the TemplateWizard schema. [21:46:58] awight: https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#Evolving [21:47:00] (03CR) 10jerkins-bot: [V: 04-1] Add Active Editors per Country metric to Wikistats [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/647792 (https://phabricator.wikimedia.org/T188859) (owner: 10Fdans) [21:47:00] should cover that, right? [21:47:14] My first attempt failed, because I didn't realize that the metawiki schema page has been deprecated. (Interestingly, I'm still able to send the events and they are valid events with extra fields?) [21:47:34] yeah metawiki: that's my fault (responding to your comment now...) [21:48:17] Well, the problem is that my aggregation query is looking for event.new_field, but hive structs don't support "exists" semantics as far as I can see. [21:48:18] valid with extra fields: probably. additionalProperties: false doesn't propagate to subobjects [21:48:45] awight: is that new_field in the schema? [21:48:48] the hive schema? [21:48:48] really... so you can send invalid fields in event.* [21:48:58] ottomata: Not yet [21:49:01] ok [21:49:08] awight: lemme respond to your comment [21:49:14] then want to jump in a hangout? [21:52:34] ottomata: Would you have a little time tomorrow, your morning? (I'm in UTC+1) [21:53:10] ah late for you [21:53:22] i'm not sure, i have a tooon of meetings and interviews this week, it's crazy!
[21:53:35] let's see if i can explain a little bit [21:54:09] if your event contains fields that are not in the schema, but additionalProperties is not false (at that object level), the event will validate [21:54:13] that is what is happening here [21:54:16] event.new_field is not in the schema [21:54:21] nooo <3 ah okay I will learn [21:54:35] but the event object in the schema does not set additionalProperties: false (perhaps it should...?) [21:54:38] anyway, it passes validation [21:54:40] but [21:54:51] the jsonschema is ALSO used for ingestion into hive [21:54:59] and to read the json data itself [21:55:00] so [21:55:09] if the schema does not have the field that is in the data [21:55:15] it'll just be ignored [21:55:32] That's very clear, thank you [21:55:41] the 'schema' here in this case is a bit more complicated than just the jsonschema, buuuut those are details :p [21:56:12] there are some bugs and issues with the schema we use for hive ingestion, but theoretically that is the way it (should) owrk [21:56:13] work* [21:57:17] So what I need to do is, a) update the schema and deploy it, b) wait for hive ingestion to include the field, c) set the start-date for my aggregation to fall the day after new events include the field. [21:57:39] hmm [21:57:57] i think once you merge the schema (and at least one new event has been ingested into hive) [21:58:14] the full hive schema will have the new field, and if you select it explicitly [21:58:15] e.g. [21:58:21] ooh the schema change is retroactive! nice. [21:58:21] select event.new_field where ... [21:58:37] if your where includes old partitions without event.new_field, it will just return null [21:58:43] That's perfect.
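The validation behaviour described above (additionalProperties only applies at the object level that declares it) can be sketched in a few lines of Python. This is a toy checker written for illustration, not the validator that EventGate actually uses; the function name `validate`, the two schemas, and the field names `action` and `new_field` are all hypothetical, and only `properties` and `additionalProperties: false` are handled.

```python
def validate(instance, schema):
    """Check only `properties` and `additionalProperties: false` (a toy subset of JSON Schema)."""
    errors = []
    if not isinstance(instance, dict):
        return errors
    properties = schema.get("properties", {})
    # additionalProperties applies only to the object level that declares it.
    if schema.get("additionalProperties", True) is False:
        errors += [f"unexpected field: {key}" for key in instance if key not in properties]
    # Recurse into declared sub-objects; each one consults its own setting.
    for key, subschema in properties.items():
        if key in instance:
            errors += validate(instance[key], subschema)
    return errors


# Hypothetical schema: strict at the top level, but the nested `event` object is open.
SCHEMA = {
    "additionalProperties": False,
    "properties": {
        "event": {"properties": {"action": {}}},
    },
}

# Same schema with the nested object closed as well.
STRICT_SCHEMA = {
    "additionalProperties": False,
    "properties": {
        "event": {"additionalProperties": False, "properties": {"action": {}}},
    },
}

payload = {"event": {"action": "open", "new_field": 1}}
print(validate(payload, SCHEMA))         # [] : the extra nested field passes
print(validate(payload, STRICT_SCHEMA))  # ['unexpected field: new_field']
```

With the open nested object the extra field validates, which matches what was observed in the channel; it would still be dropped at Hive ingestion because the schema does not declare it.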
[21:59:06] there is a gotcha of some kind that i can't totally remember...i think some select * or event.* type queries might error out across partitions with different underlying schemas [21:59:26] IIRC if you explicitly list the fields you are selecting though it shouldn't be a problem [22:00:01] O_O interesting. I'll be sure to brag if I trigger that feature. [22:01:05] That was a much simpler answer than I'd expected, thanks for saving me from a lot more exploration. [22:02:04] glad I could help! [22:02:14] apologies for not noting that the templatewizard schema was migrated [22:02:26] I edited its metawiki talk page and added that [22:02:44] Very minor, sorry that my hypothesizing included wild accusations ;-) [22:03:27] growing pains... hopefully I can pass some of this on to the rest of my team and save you from other confused devs wandering in here. [22:04:01] hmm EventLogging manages the Schema: namespace, right? [22:04:07] i wonder if we could automate the display of the migrated message... [22:10:09] Sounds fancy, but I think EventLogging only reads from the Schema namespace so far, never writes. (I support the idea, though!)
[22:11:19] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Product-Analytics, 10Product-Infrastructure-Data: Automate deprecation of schema on metawiki after migration to Event Platform - https://phabricator.wikimedia.org/T270136 (10Ottomata) [22:11:27] awight: something like that ^ [22:11:54] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Product-Analytics, 10Product-Infrastructure-Data: Automate deprecation of schema on metawiki after migration to Event Platform - https://phabricator.wikimedia.org/T270136 (10Ottomata) [22:13:56] 10Analytics, 10Analytics-EventLogging, 10Continuous-Integration-Config: mwgate-node10-docker fails for EventLogging with npm error - https://phabricator.wikimedia.org/T270118 (10Ottomata) > If the eventgate dev server stuff is mainly for the benefit of MediaWiki-Docker users I think it is for them...and also... [22:16:34] 10Analytics, 10Analytics-Kanban, 10Growth-Team, 10Product-Analytics, 10Patch-For-Review: Migrate Growth EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T267333 (10Ottomata) Yup!
{T253121} [22:31:13] 10Analytics, 10Event-Platform: produce_canary_events job should not fail if a schema is missing examples - https://phabricator.wikimedia.org/T270138 (10Ottomata) [22:36:35] 10Analytics, 10Research: Release dataset on top search engine referrers by country, OS, and language - https://phabricator.wikimedia.org/T270140 (10Isaac) [22:36:48] 10Analytics, 10Research: Release dataset on top search engine referrers by country, device, and language - https://phabricator.wikimedia.org/T270140 (10Isaac) [22:47:28] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [22:49:20] 10Analytics, 10Research: Release dataset on top search engine referrers by country, device, and language - https://phabricator.wikimedia.org/T270140 (10Isaac) Some data from December 10th to help us think about privacy. Raw data can be found in `isaacj.search_engine_data` in Hive and data pipeline in `stat1004...