[03:43:17] 10Analytics, 10Better Use Of Data, 10Desktop Improvements, 10Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (10Nuria) >Breakdown by wiki domain (derived from webpage url?). On my opinion this should not be needed, r... [03:47:46] joal: all right! [03:54:26] 10Analytics, 10Operations: Installing package graph-tool on one stat-machine - https://phabricator.wikimedia.org/T247266 (10Zoranzoki21) >>! In T247266#5954803, @Aklapper wrote: >> Therefore, I would like to inquire about the possibility to use graph-tool on one of the stat-machines (e.g. stat1005) via any of... [04:40:02] 10Analytics, 10ContentTranslation, 10Operations, 10SRE-Access-Requests, 10Language-Team (Language-2020-January-March): Request for access for stats machines for Santhosh - https://phabricator.wikimedia.org/T247246 (10santhosh) [04:40:26] 10Analytics, 10ContentTranslation, 10Operations, 10SRE-Access-Requests, 10Language-Team (Language-2020-January-March): Request for access for stats machines for Santhosh - https://phabricator.wikimedia.org/T247246 (10santhosh) >>! In T247246#5953952, @Nuria wrote: > @santhosh What is your LDAP user? Sa... [04:41:35] 10Analytics, 10ContentTranslation, 10Operations, 10SRE-Access-Requests, 10Language-Team (Language-2020-January-March): Request for access for stats machines for Santhosh - https://phabricator.wikimedia.org/T247246 (10Nuria) FYI that @santhosh is a WMF employee. Approved on my end [04:56:14] 10Analytics, 10ContentTranslation, 10Language-Team (Language-2020-January-March): Test Performance of Marian NMT translation in stat cluster - https://phabricator.wikimedia.org/T247245 (10santhosh) @JAllemandou I have filled https://phabricator.wikimedia.org/T247246 for access. Setting up MarianMT in that s... 
[04:58:18] 10Analytics, 10ContentTranslation, 10Language-Team (Language-2020-January-March): Test Performance of Marian NMT translation in stat cluster - https://phabricator.wikimedia.org/T247245 (10Nuria) pinging @elukey on this [07:59:09] elukey: I am trying to use the graph-tool python-package on stat; if you have the time, any chance you could have a look? https://phabricator.wikimedia.org/T247266 [08:00:02] 10Analytics: Installing package graph-tool on one stat-machine - https://phabricator.wikimedia.org/T247266 (10MGerlach) [08:01:34] 10Analytics: Installing package graph-tool on one stat-machine - https://phabricator.wikimedia.org/T247266 (10MGerlach) @Aklapper @Zoranzoki21 forgot to add tags before signing out yesterday. thanks for the reminder. (added to analytics for now). [08:03:36] mgerlach: hey! what cpp deps are missing when building with pip? [08:03:50] ah ok I see them in the task [08:05:27] elukey: manual installation via pip in a venv is very cumbersome and prone to failure, see here for a full list https://git.skewed.de/count0/graph-tool/-/wikis/installation-instructions/#manual-compilation [08:06:21] yes I was reading [08:06:51] we could think about importing the package on our apt repo, even if it is ubuntu-only (may work fine) [08:06:56] so just curious if there are any possibilities here [08:07:58] mgerlach: is it a on-off testing or something that is needed for a research project? I am asking because we can definitely find a solution, like importing those packages, but maintaining them has of course a cost. So if it is backed up by a project that you are doing I am all for it, if it is a one-off testing I'd be less inclined :) [08:08:36] it is part of a project, I have been using the package for some time [08:09:30] I remember at some point there was a discussion to also have conda on the stat-machines? 
[08:09:44] good morning folks - Happy compiling - https://xkcd.com/303/ [08:09:50] yes Andrew is working on it as part of the revamping of Notebooks :) [08:10:05] bonjour :) [08:10:26] mgerlach: how soon is it needed? I can try to prioritize it but the next couple of weeks are full of things [08:10:55] not very urgent, but the sooner the better ; ) [08:11:05] elukey: when you have a minute could we discuss the error I generated this weekend? I think it's interesting [08:12:45] mgerlach: ack so I'll add some info to the task and will talk with Nuria to prioritize it :) [08:12:53] joal: sure, can we do it in here? [08:13:00] yes :) [08:13:53] elukey: thanks, sounds good. [08:14:03] elukey: here is what happened - I ran a spark job reading/writing a lot of data (3 to 4 steps, each reading/writing ~1 TB) [08:14:34] The cluster was empty, so I used ~1/2 of it in RAM/CPU (1024 parallel tasks) [08:15:26] That led to the shuffle-services being overloaded by too much writing/reading [08:16:43] see https://grafana.wikimedia.org/d/000000585/hadoop?orgId=1&from=now-7d&to=now&fullscreen&panelId=19 and http://grafana.wikimedia.org/d/000000585/hadoop?orgId=1&from=now-7d&to=now&fullscreen&panelId=18 [08:17:01] You can guess from those charts when the app was running [08:18:03] The solution has been for me to decrease parallelism, which keeps the node-managers from suffering under the spark-shuffle-service - at the cost of time [08:18:43] This example makes me wonder about potential improvements: Should we grow the available RAM for node-managers? [08:19:05] Or should we grow the number of machines, reducing the disk-space on each, to better spread the load [08:19:08] elukey: --^ [08:19:25] elukey: You know it all - comments/questions/ideas welcome :) [08:20:28] joal: I am wondering if we could try G1 GC first, and see if it makes any difference [08:20:38] elukey: we surely could!!
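Editor's sketch of the trade-off joal describes at 08:18 (fewer parallel tasks, less pressure on the shuffle service, longer runtime). This is a rough illustration, not the job's actual configuration: the executor and core counts are made-up values, chosen only so that the first case reproduces the 1024 parallel tasks mentioned in the log. The Spark property names are real, and the arithmetic assumes dynamic allocation with spark.task.cpus at its default of 1.

```python
def max_parallel_tasks(conf):
    """Upper bound on concurrently running tasks for one Spark application."""
    executors = int(conf["spark.dynamicAllocation.maxExecutors"])
    cores = int(conf["spark.executor.cores"])
    task_cpus = int(conf.get("spark.task.cpus", 1))
    # Each executor runs at most (cores // task_cpus) tasks at a time.
    return executors * (cores // task_cpus)


def to_submit_args(conf):
    """Render a conf dict as spark-submit style '--conf key=value' arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Hypothetical settings: 256 executors x 4 cores = 1024 parallel tasks.
wide = {
    "spark.dynamicAllocation.maxExecutors": "256",
    "spark.executor.cores": "4",
}

# Same job with a lower executor cap: a quarter of the concurrent shuffle
# readers and writers, at the cost of a longer wall-clock time.
narrow = {**wide, "spark.dynamicAllocation.maxExecutors": "64"}

print(max_parallel_tasks(wide), max_parallel_tasks(narrow))
print(" ".join(to_submit_args(narrow)))
```

Capping executors is only one way to lower parallelism; the log does not say which knob was actually turned.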
[08:20:52] it is doing a good job for the namenodes [08:21:15] Let's do that!! [08:21:31] okok, we can do it :) [08:21:35] how urgent this work is? [08:21:41] elukey: not urgent at all :) [08:22:58] super, I'll open a task :) [08:23:04] elukey: can be deprioritized to next time you have time (probably end-of-fiscal-year-of-your-sixties) [08:23:45] ahhahahahaah [08:24:11] nah I think that I should be able to do it this week, super curious [08:24:20] <3 [08:24:38] elukey: I'm feeling less alone in the nerd-snipping area :-D [08:24:42] I also need to work on mgerlach's packages though otherwise he'll think that Analytics people are the worst [08:26:20] sure elukey [08:27:06] joal, speaking of nerd sniping - on stat1004 and stat1005 there are notebooks :) [08:27:13] with a caveat [08:27:15] huhuh [08:27:30] stat1005 has also the latest and greatest pypi packages for jupyterhub [08:27:51] but yesterday when me and nuria tried kerberos wasn't working with spark (no credentials found etc..) [08:27:58] hm [08:35:40] yes it seems as if it wasn't able to check credentials [08:36:01] they are in the private /tmp of the notebook's systemd unit [08:36:09] so I am wondering if anything changed [08:37:23] (03CR) 10Joal: "Bunch of comments :)" (036 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/570681 (https://phabricator.wikimedia.org/T243090) (owner: 10Mforns) [08:37:25] ah! I checked in the private tmp dir, there are no krb credentials even if I kinited [08:37:32] that would explain it [08:38:11] how would that even be possible elukey? write permission issue? 
[08:38:29] no idea [08:38:40] I think it is probably a kinit setting [08:38:59] you can specify where the credentials cache should be IIRC [08:39:14] but if it is not where spark looks for it, then spark will complain [08:39:14] Ah [08:39:18] I think [08:40:17] ok no my bad, now I can see them in the private tmp [08:40:38] the notebook was restarted to private tmp was wiped and recreated [08:40:42] klist shows them under /tmp [08:40:54] ok [08:44:34] gone for ~2h errand [08:44:52] o/ [08:47:16] very weird, I killed the test kernels, etc.. and created newer ones, all works [08:48:10] mgerlach: o/ - sorry to bother you, but i noticed that you have a notebook running on stat1005 [08:48:37] yesterday I deployed on stat1005 a jupyterhub server, last brand new version [08:48:49] if you want to help testing feel free to try it [08:49:19] ssh stat1005.eqiad.wmnet -L 8000:127.0.0.1:8000 or similar and you can access the UI [08:49:40] note: there might be some changes to do in your venv [09:00:44] elukey: thanks for letting me know, I will have a look. [09:00:50] thanks! [09:01:22] what the difference to running a 'normal' jupyter notebook in a venv ? [09:03:27] mgerlach: not much, I thought that you were one of the people finding difficult to work on restricted home space on notebook100[3,4] [09:04:34] I am also a little bit ignorant about the difference between what you have running your own venv/notebook vs using jupyterhub [09:04:46] I guess that with the latter there is a UI with some good tools etc.. 
[09:05:40] elukey: yes, that is why I ran the notebook on stat1005 [09:07:20] mgerlach: ah ok then I am working on unifying stat and notebooks, and currently stat1004 and stat1005 have swap [09:07:37] stat100[6,7] will follow hopefully soon [09:08:00] so the idea is that you guys will connect to stat100x and get the same config [09:08:13] but it is a long process :D [09:08:26] that sounds very good [09:24:04] !log removed /etc/mysql/conf.d/stats-research-client.cnf from all stat boxes (all file used for RU, now on an-launcher1001) [09:24:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:13:01] 10Analytics, 10Analytics-Wikistats, 10Product-Analytics: Contribution inequality graphs for Wikistats - https://phabricator.wikimedia.org/T195033 (10Nemo_bis) > Gini coefficient for edits In my experience that's needlessly expensive to compute, while the [Theil index](https://en.wikipedia.org/wiki/Theil_ind... [10:37:21] 10Analytics, 10ContentTranslation, 10Operations, 10SRE-Access-Requests, 10Language-Team (Language-2020-January-March): Request for access for stats machines for Santhosh - https://phabricator.wikimedia.org/T247246 (10elukey) [11:27:17] 10Analytics, 10Better Use Of Data, 10Desktop Improvements, 10Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (10fgiunchedi) >>! In T226986#5954880, @Krinkle wrote: > I've saved this as a dashboard called [mw-client-e... [11:55:44] * elukey lunch! [11:56:20] 10Analytics: Installing package graph-tool on one stat-machine - https://phabricator.wikimedia.org/T247266 (10GoranSMilovanovic) Just to confirm as a Data Scientist for Wikidata, WMDE: we absolutely want to have this available from our stat100* machines. @MGerlach a few days ago on the #wmde-analytics-engineeri... 
[11:58:06] 10Analytics: Installing package graph-tool on one stat-machine - https://phabricator.wikimedia.org/T247266 (10elukey) s/we absolutely want/we would love/ seems more appropriate, just a note :) I'll have a chat with @Ottomata about what is best in this case and report back! [12:02:13] 10Analytics: Installing package graph-tool on one stat-machine - https://phabricator.wikimedia.org/T247266 (10GoranSMilovanovic) @elukey :) Edited T247266#5955801. It now states: > ... we would absolutely love to have this available from our stat100* machines. [12:31:36] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Vet high volume bot spike detection code - https://phabricator.wikimedia.org/T238363 (10JAllemandou) Vetting heuristic -- One day of manually computed `automated` actors has the exact same number than the one in `predictions.actor_label_hourly`: ` spark.s... [12:33:09] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Virtual pageviews should set access_type to mobile if webhost is a mobile one - https://phabricator.wikimedia.org/T246309 (10nshahquinn-wmf) >>! In T246309#5953781, @Nuria wrote: > @hueitan Hello, Please be so kind to modify the VirtualPageview schema addi... [12:54:19] 10Analytics, 10Analytics-Kanban, 10Release Pipeline, 10Patch-For-Review, and 2 others: Migrate EventStreams to k8s deployment pipeline - https://phabricator.wikimedia.org/T238658 (10Ottomata) > The metric won't be a single one for all services though, when Prometheus pulls from k8s services it'll attach ta... 
[12:56:30] 10Analytics, 10Operations, 10Wikidata, 10Wikidata-Query-Service: Deployment strategy and hardware requirement for new Flink based WDQS updater - https://phabricator.wikimedia.org/T247058 (10Ottomata) While not a google doc, the parent ticket's description describes it pretty well: {T244590} [13:10:22] 10Analytics, 10Multimedia, 10Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (10BerndFiedlerWMDE) 05Resolved→03Open Dear all, if I should report this on any other channel, please tell me. Here it is: The file "presidential election" can no... [13:12:12] 10Analytics: Installing package graph-tool on one stat-machine - https://phabricator.wikimedia.org/T247266 (10Ottomata) My hope is that we will switch to conda envs in the near future (next quarter?), so hopefully we can resolve this then. [13:24:53] 10Analytics: Installing package graph-tool on one stat-machine - https://phabricator.wikimedia.org/T247266 (10MGerlach) @Ottomata conda-environments sound great. It would most likely resolve this issue without any additional support. So if on the (near?) horizon, I would add graph-tool as another use-case. Is t... [13:25:12] 10Analytics, 10Better Use Of Data, 10Desktop Improvements, 10Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (10Ottomata) > Breakdown by wiki domain (derived from webpage url?). If you do this, I'd recommend using se... [13:26:42] 10Analytics: Installing package graph-tool on one stat-machine - https://phabricator.wikimedia.org/T247266 (10Ottomata) Yup, as part of the 'newpyter' SWAP rewrite I've mentioned to you guys. This quarter I'll be writing a design doc on what we want to do. {T224658} [13:27:46] 10Analytics, 10Inuka-Team, 10Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (10nshahquinn-wmf) >>! 
In T244547#5953780, @Nuria wrote: > In order to count pageviews the UA of the app needs to be "wikipediaApp/" Unfortunately, it turns out th... [13:39:15] (03CR) 10Mforns: [V: 03+2] Add dimensions to druid's pageview_hourly (033 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/570681 (https://phabricator.wikimedia.org/T243090) (owner: 10Mforns) [13:43:59] (03CR) 10Joal: Add dimensions to druid's pageview_hourly (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/570681 (https://phabricator.wikimedia.org/T243090) (owner: 10Mforns) [14:04:10] 10Analytics, 10Better Use Of Data, 10Desktop Improvements, 10Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (10Ottomata) I've enabled this on haw.wikipedia.org :) [14:14:22] ottomata: o/ [14:14:24] gooood morning [14:14:41] ok if I deprecated statistics-users and statistics-privdata-users? [14:15:38] hello [14:15:40] go go go! [14:15:40] :) [14:16:26] ottomata: can I bother a sec here? [14:16:29] brainstorming [14:16:45] elukey: ya! [14:20:46] soo I this morning I was thinking to leave 'researchers' for this step [14:20:57] and possibly deploy it to all stat boxes [14:21:28] in this way stat1006 could become explorer today [14:21:42] and then we'd leave only 1007 to refactor [14:21:59] after we are done, then we could tackle 'researchers' [14:22:23] possibly with a new group scheme (after we check hadoop acls etc..) [14:22:56] I keep thinking that having a group for people able to grab data from mariadb (considering it a sort of dataset) could be useful [14:23:40] hm, sounds fine, but why a new group? are there too many 'researchers' to stick into an-privatedata-users ? 
[14:24:16] yep yep, there are also people outside priv-data though, and I'd like to think a bit more about them [14:24:19] without rushing [14:24:39] also, it would be great to avoid a huge single posix group like an-privdata [14:24:42] long term I mean [14:25:01] but this only if ACLs works as we expect [14:25:23] I like Leila's view of granting access to datasets, rather than "all" [14:29:44] if we can do that it makes the most sense, it is just hard to do it right, esp with no good way of declaring it somewhere, e.g. in puppet or something [14:29:55] yes I agree [14:30:57] PROBLEM - Check if active EventStreams endpoint is delivering messages. on icinga1001 is CRITICAL: CRITICAL: No EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration [14:47:37] 10Analytics, 10Analytics-Wikistats, 10Product-Analytics: Contribution inequality graphs for Wikistats - https://phabricator.wikimedia.org/T195033 (10awight) >>! In T195033#5955531, @Nemo_bis wrote: >> Gini coefficient for edits > > I've never tried anything very serious, but in my experience that's needless... 
[14:56:28] looking into the EventStreams alarm [14:57:11] mforns: o/ - Andrew is moving ES to kubernetes IIUC [14:57:16] oh mforns sorry didn't see that here [14:57:21] yeah we rolled back, there are complicated LVS issyues [14:57:25] oh ok ok [14:57:32] we were trtying to do incremental rollou [14:57:34] t [14:57:37] not really possible it seams [14:57:40] seems* [14:57:44] ok, thx [14:57:47] ah [15:00:46] Hi team - I'm gone for kids and will miss standup (time-change) today I have started to vet bots data - looks great so far :) [15:01:24] (03CR) 10Nuria: Add automated agent-type to pageview_hourly (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/578373 (https://phabricator.wikimedia.org/T238363) (owner: 10Joal) [15:01:33] RECOVERY - Check if active EventStreams endpoint is delivering messages. on icinga1001 is OK: OK: An EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration [15:02:04] (03CR) 10Mforns: [V: 03+2] Add dimensions to druid's pageview_hourly (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/570681 (https://phabricator.wikimedia.org/T243090) (owner: 10Mforns) [15:05:51] mforns: if you have time in these days I'd need some help in testing jupyter on stat100[4,5,6] [15:06:10] !log move stat1006 to role::statistics::explorer [15:06:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:06:13] * elukey dances [15:06:13] elukey, sure, will add that to ops week TODO [15:06:35] ack thanks! 
[15:07:28] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Virtual pageviews should set access_type to mobile if webhost is a mobile one - https://phabricator.wikimedia.org/T246309 (10Nuria) @nshahquinn-wmf the other thing to do is to make sure that kaiOS is sending this access_method bit cc @SBisson [15:08:31] I am running puppet on stat1006, only stat1007 is left at this point [15:20:41] !log remove the analytics user keytab from stat100[4,5] [15:20:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:20:59] this is not needed anymore --^ [15:21:06] we have an-launcher and an-coord [15:33:23] dsaez: hola! [15:33:42] jupyterhub is on stat1006 if you want to give a try [15:46:54] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Unify puppet roles for stat and notebook hosts - https://phabricator.wikimedia.org/T243934 (10elukey) Ok up to now stat100[4,5,6] have been unified under a single role, role::statistics::explorer. Jupyterhub was also added as well. Next steps: * move stat... [15:47:03] 10Analytics, 10Analytics-Kanban: Analytics Ops Technical Debt - https://phabricator.wikimedia.org/T240437 (10elukey) [15:47:05] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Unify puppet roles for stat and notebook hosts - https://phabricator.wikimedia.org/T243934 (10elukey) 05Stalled→03Open [15:47:27] 10Analytics, 10Analytics-Kanban, 10Research, 10User-Elukey: Add SWAP profile to stat1005 - https://phabricator.wikimedia.org/T245179 (10elukey) [16:00:16] a-team; will be a bit late for standup [16:03:59] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Patch-For-Review, 10Product-Infrastructure-Team-Backlog (Kanban): EventLogging MEP Upgrade - https://phabricator.wikimedia.org/T238544 (10Ottomata) In today's meeting, we agreed that as long as the size of the shipped configs is within some limit, sh... 
[16:11:37] 10Analytics, 10Better Use Of Data, 10Desktop Improvements, 10Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (10Ottomata) > Breakdown by browser family (does EventGate have support for EL's uaMap?). No, not as is. E... [17:24:08] hey nuria - can we take a moment now to talk about bots? [17:33:05] mforns: ok if we meet briefly in 15/20 mins for the notebook stuff? [17:33:11] just to plan what to test etc.. [17:34:21] 10Analytics, 10Product-Analytics (Kanban): SQL definition for structure data in commons metrics - https://phabricator.wikimedia.org/T247101 (10SNowick_WMF) p:05Triage→03High [17:34:23] 10Analytics, 10Product-Analytics (Kanban): SQL definition for structure data in commons metrics - https://phabricator.wikimedia.org/T247101 (10SNowick_WMF) [17:35:32] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Tech Tunning Session metrics - https://phabricator.wikimedia.org/T247100 (10SNowick_WMF) p:05Triage→03High [17:35:45] 10Analytics, 10Product-Analytics (Kanban): SQL definition for wikidata metrics for tunning session - https://phabricator.wikimedia.org/T247099 (10SNowick_WMF) [17:35:52] 10Analytics, 10Product-Analytics (Kanban): SQL definition for wikidata metrics for tunning session - https://phabricator.wikimedia.org/T247099 (10SNowick_WMF) p:05Triage→03High [17:38:34] 10Analytics, 10Analytics-Kanban, 10Product-Analytics (Kanban), 10User-Elukey: Learn how to make dashboard on top of data on hadoop/hive - https://phabricator.wikimedia.org/T247329 (10kzimmerman) [17:38:50] 10Analytics, 10Analytics-Kanban, 10Product-Analytics (Kanban), 10User-Elukey: Learn how to make dashboard on top of data on hadoop/hive - https://phabricator.wikimedia.org/T247329 (10kzimmerman) p:05Triage→03Medium [17:39:15] 10Analytics, 10Analytics-Kanban, 10Product-Analytics (Kanban), 10User-Elukey: Learn how to make dashboard on top of data on hadoop/hive - 
https://phabricator.wikimedia.org/T247329 (10Ottomata) [17:42:58] * elukey be back in a bit [17:51:27] joal: yes, i have 10 mins [17:51:33] joal: let me know if you are there [17:53:02] hip: mforns lemme know if you have any qs or thoughts about El patch [17:53:16] also, hip there are some comments for you to respond to on the error logging ticket [17:53:30] https://phabricator.wikimedia.org/T226986 [17:55:19] 10Analytics, 10Multimedia, 10Tool-Pageviews: Image files with quotes do not resolve on the mediarequest API - https://phabricator.wikimedia.org/T247333 (10Nuria) [17:55:20] ottomata, k [17:56:16] 10Analytics, 10wmfdata-python, 10Product-Analytics (Kanban): Update wmfdata to support multiple SQL engines for Hive databases - https://phabricator.wikimedia.org/T246060 (10Aklapper) [17:56:19] 10Analytics, 10wmfdata-python, 10Product-Analytics (Kanban): wmfdata cannot recover from a crashed Spark session - https://phabricator.wikimedia.org/T245713 (10Aklapper) [17:56:29] 10Analytics, 10Multimedia, 10Tool-Pageviews: Image files with quotes do not resolve on the mediarequest API - https://phabricator.wikimedia.org/T247333 (10Nuria) [17:57:15] 10Analytics, 10Multimedia, 10Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (10Nuria) @BerndFiedlerWMDE Yes, it is the quotes and it is a known problem. moved issue to a different ticket {T247333} [18:18:10] 10Analytics, 10Operations, 10User-Elukey: Refactor Analytics POSIX groups in puppet to improve maintainability - https://phabricator.wikimedia.org/T246578 (10elukey) Next steps: * decide what to do with the `researchers` posix group (fold it in `analytics-privatedata-users`, etc..) 
[18:20:56] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Add CPU quota to stat and notebook hosts - https://phabricator.wikimedia.org/T240440 (10elukey) To keep archives happy: the solution above was not enough, we had to do multiple things: 1) move the limits to `user.slice` to get applied to all the users logged i... [18:22:19] 10Analytics, 10Analytics-Kanban: Analytics Ops Technical Debt - https://phabricator.wikimedia.org/T240437 (10elukey) [18:22:21] 10Analytics: Kerberos credential cache expiry time on notebook is different than the OS one - https://phabricator.wikimedia.org/T247084 (10elukey) [18:26:02] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Investigate sporadic failures in oozie hive actions due to Kerberos auth - https://phabricator.wikimedia.org/T241650 (10elukey) To keep archives happy - it turned out, while working on Hive 2.2.3 on BigTop, that the Zookeeper setting (old... [18:35:38] heya nuria - sorry I missed your ping [18:35:45] nuria: anytime in the next few hours? [18:36:20] joal: sure, let's talk now [18:36:25] \o/ [18:36:27] batcave? [18:36:31] joal: ya [18:42:39] 10Analytics, 10ContentTranslation, 10Language-Team (Language-2020-January-March): Test Performance of Marian NMT translation in stat cluster - https://phabricator.wikimedia.org/T247245 (10elukey) [18:42:43] 10Analytics, 10ContentTranslation, 10Operations, 10SRE-Access-Requests, and 2 others: Request for access for stats machines for Santhosh - https://phabricator.wikimedia.org/T247246 (10elukey) 05Open→03Resolved a:03elukey @santhosh you have now access to stat100[4-7], and on 1005 we have a AMD GPU :) [18:45:29] 10Analytics, 10ContentTranslation, 10Language-Team (Language-2020-January-March): Test Performance of Marian NMT translation in stat cluster - https://phabricator.wikimedia.org/T247245 (10elukey) >>! In T247245#5955300, @santhosh wrote: > @JAllemandou I have filled https://phabricator.wikimedia.org/T247246... 
[18:48:10] ottomata: if you have time in these days can you try/test notebooks on stat1005? [18:48:38] we had some issues with kerberos yesterday but I can't repro anymore (spark failing to find the credentials) [18:48:59] since it is the jupyterhub 1.1.0 version etc.. [18:49:36] ottomata, I reviewed the patch, LGTM, no big questions really, except maybe: is the mw.track thing something required, or rather 'structural sugar'? [18:50:51] mforns: good q, i asked jason and he thinks both interfaces are needed. mw.track is nice because you can put it in places without necessarily depending on EventLogging. mw.eventLog.submit is nice because it skips the async fireing of the submit function, which will do things like format meta.dt asap, rather than whenever eventstream. fires [18:52:22] ottomata, but if your extension doesn't require eventLogging, mw.track('eventstream.blah', e) will not work right? so it does depend no? [18:53:11] * elukey off for today :) [18:53:36] ottomata, and AFAICS mw.eventLog.submit will enqueue to BackgroundQueue, so will be async no? [18:53:41] mforns: it just won't do anything [18:53:47] unless you do mw.trackSubscribe somewhere [18:53:48] I see [18:53:53] aha [18:53:54] mforns: ottomata: sorry I've been in meeting hell (and will be until ~4), the patch is fine just some questions I can add on the patchset [18:54:06] 10Analytics, 10Analytics-Kanban: Create UDF for action id generation - https://phabricator.wikimedia.org/T247342 (10Nuria) [18:54:11] mforns: and yes, the actual produuce is async because of backgroundqueue [18:54:15] mforns: that's a legitimate point about the async being forced by the output hehe, we're going to pretend that doesn't exist I guess for now [18:54:32] but the setting of meta.dt will be closer to correct, since it will be set as soon as submit() is called [18:54:38] so the event time will be more correct [18:54:46] but in practice? 
[18:54:50] that probably doesn't really matter ^ [18:54:50] :p [18:54:54] but it is tempting to just say that all stuff uses mw.track and it's always async, if we could have a thing that sticks dt on there or just decide it doesn't matter [18:55:45] (that the time diff doesn't matter) [18:55:56] I don't understand though, what is the advantage of mw.track [18:56:07] like you said, it doesn't introduce a hard dependency [18:56:21] any code that calls mw.eventLog.submit will throw if mw.eventLog isn't loaded [18:56:27] but mw.track decouples it because it's pub/sub [18:57:32] but if someone wants to produce to an eventstream, they will need the eventLog dependency anyway no? [18:57:32] so we can ship code to third parties with instrumentation still in it that just doesn't do anything; we don't need to require that they install the EventLogging extension [18:57:46] ah [18:57:51] yeah I'm talking about third party devs, in house yeah it will be there [18:58:41] although if it isn't there to draw down trackQueue, I feel like it's a memory leak [18:58:55] I haven't looked at the code lately but that's the impression I got last time I looked at it [19:02:14] 10Analytics, 10Analytics-Kanban: Automated deletion of actor data for bot prediction after 90 days - https://phabricator.wikimedia.org/T247344 (10Nuria) [19:04:07] hip, I understand [19:05:02] does this mean that we will prefer to use mw.track whenever possible so that more code can be shipped to 3rd parties without the eventLogging dependency? [19:05:08] nooo idea [19:05:22] I was just commenting on the patchset that we can just cut the mw.track for this patch and re-introduce it later [19:05:32] aha [19:05:53] I mean it may be the case that it's fine, or it may turn out that the timestamps are really messed up and there's no way to fix it, or the case that we really need a code path that isn't async, or we are blowing out the trackQueue with all our events, etc. etc. 
[19:05:59] no clue [19:10:19] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Vet high volume bot spike detection code - https://phabricator.wikimedia.org/T238363 (10Nuria) Per our conversation, we will take a look at: * translation requests * top pageview computation, like for example, this recent problem should disappear: {T247085... [19:24:04] hip, now that you say it, it would be cool to provide a way for mw.eventLog.submit to produce right away, otherwise it will be a pain for instrument developers to test, because the BackgroundQueue works with 30 second delay by default [19:25:24] yes it might be the case that we will need to support an additional optional argument submit(streamName, eventData, immediate=false) to put it in PHP style, or provide a second function submitNow.. blah blah [19:26:11] mforns: there is additional work that needs to be done to tune the BackgroundQueue parameters etc. but we're just putting that off for now I think [19:28:10] mforns: it's really tricky to make sure that thing can empty itself properly as well, in the event of a page unload, lots of issues to experiment with there. [19:37:39] heheh hip, I just had the exact same ideas as you and put them in the CR [20:48:52] Gone for tonight team :) [21:27:26] 10Analytics, 10Event-Platform, 10Services, 10Wikipedia-iOS-App-Backlog, and 2 others: Implementing the reliable event bus using Kafka - https://phabricator.wikimedia.org/T88459 (10Krinkle) [22:17:07] 10Analytics, 10Product-Analytics (Kanban): SQL definition for wikidata metrics for tunning session - https://phabricator.wikimedia.org/T247099 (10Isaac) Thanks @Nuria for looping me in on this. I'm sure that I'm missing some of the context here but wanted to add a few thoughts: * My interest in this task comes...
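Editor's sketch of the decoupling argument from the mw.track discussion (18:49 to 19:28). This is a toy Python model of the pattern only, not MediaWiki's JavaScript implementation: `Tracker`, `track`, `track_subscribe`, `EventLog` and the topic name are hypothetical stand-ins modelled on how mw.track, mw.trackSubscribe and mw.eventLog.submit are described in the log. Replaying missed events to a late subscriber is an assumption of the model.

```python
class Tracker:
    """Toy publish/subscribe bus; a hypothetical stand-in for mw.track."""

    def __init__(self):
        self.queue = []      # every published (topic, data) pair, kept for replay
        self.handlers = []   # (topic_prefix, callback) pairs

    def track(self, topic, data):
        # Publishing never fails, even when nobody is listening.
        self.queue.append((topic, data))
        for prefix, callback in self.handlers:
            if topic.startswith(prefix):
                callback(topic, data)

    def track_subscribe(self, prefix, callback):
        # Assumption: a new subscriber is replayed the events it missed.
        self.handlers.append((prefix, callback))
        for topic, data in self.queue:
            if topic.startswith(prefix):
                callback(topic, data)


class EventLog:
    """Hypothetical stand-in for the optional EventLogging dependency."""

    def __init__(self):
        self.submitted = []

    def submit(self, stream_name, event):
        self.submitted.append((stream_name, event))


bus = Tracker()
event_log = None  # the dependency is not loaded yet

# Instrumentation can publish without a hard dependency on the consumer.
bus.track("eventstream.blah", {"action": "click"})

# A direct call, by contrast, fails when the dependency is missing.
try:
    event_log.submit("blah", {"action": "click"})
    direct_call_failed = False
except AttributeError:
    direct_call_failed = True

# Once the dependency loads, it subscribes and receives the backlog.
event_log = EventLog()
prefix = "eventstream."
bus.track_subscribe(
    prefix,
    lambda topic, data: event_log.submit(topic[len(prefix):], data),
)

print(direct_call_failed, event_log.submitted, len(bus.queue))
```

In this model the queue grows with every event whether or not anyone subscribes, which is the memory concern raised at 18:58; whether the real trackQueue behaves that way is left open in the log.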