[00:05:36] Analytics, Multimedia, Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (Doc_James) Very nice.
[00:32:54] Analytics, Multimedia, Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (Nuria) Thanks! @MusikAnimal pinging #analytics so they know this has been done.
[07:05:59] morning!
[07:06:09] Spark RPC encryption seems to be working fine
[07:06:11] \o/
[07:07:26] Analytics, Analytics-Kanban, Patch-For-Review: Enable encryption in Spark 2.4 by default - https://phabricator.wikimedia.org/T240934 (elukey)
[07:10:10] Analytics, Analytics-Kanban, Patch-For-Review: Enable encryption in Spark 2.4 by default - https://phabricator.wikimedia.org/T240934 (elukey) Due to some issues with Spark Refine, we removed the spark.io.encryption settings (encryption of temporary shuffle files, spilled to disk) since it seems not w...
[07:23:01] ok so for Presto, I might know what is happening
[07:23:28] the HTTP port is not authenticated, since it is used by default (IIUC) by the workers to communicate with the coordinator
[07:23:51] so we need a TLS certificate to enable the TLS port in the coordinator
[07:23:59] and I think that then kerberos will be used
[07:24:24] also the workers need to use https as well, otherwise we have PII data flowing unencrypted
[07:24:27] sigh
[08:03:30] Good morning elukey
[08:09:33] Analytics, Analytics-Kanban, User-Elukey: New Hadoop hardware. Refreshes and hosts with space for GPUs - https://phabricator.wikimedia.org/T241190 (elukey) Just to confirm the amount of 10g vs 1g NICs: ` elukey@cumin1001:~$ sudo cumin 'A:hadoop-worker' 'cat /sys/class/net/*/speed 2>&1 | grep -v "In...
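The cumin one-liner quoted in T241190 reads every interface's `/sys/class/net/*/speed` file on each Hadoop worker and drops the "Invalid argument" errors the kernel returns for interfaces without a usable link speed. A minimal per-host sketch of the same check, assuming the file reports Mbps (10000 for a 10G NIC, 1000 for 1G); `read_nic_speeds` and `classify` are hypothetical names, not part of any WMF tooling.

```python
import glob
from collections import Counter


def read_nic_speeds(pattern: str = "/sys/class/net/*/speed") -> dict[str, int]:
    """Return {interface: speed_in_mbps} for interfaces whose speed is readable.

    Reading `speed` on an interface that is down or virtual (e.g. `lo`)
    usually fails with EINVAL ("Invalid argument"); the cumin one-liner
    hides those errors with `grep -v "Invalid"`, here they are skipped.
    """
    speeds: dict[str, int] = {}
    for path in sorted(glob.glob(pattern)):
        iface = path.split("/")[-2]  # /sys/class/net/<iface>/speed
        try:
            with open(path) as f:
                speeds[iface] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric speed: ignore it
    return speeds


def classify(speeds: dict[str, int]) -> Counter:
    """Count interfaces as '10g' (>= 10000 Mbps), '1g' (>= 1000 Mbps) or 'other'."""
    counts: Counter = Counter()
    for mbps in speeds.values():
        if mbps >= 10000:
            counts["10g"] += 1
        elif mbps >= 1000:
            counts["1g"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    print(dict(classify(read_nic_speeds())))
```

Summing the per-host counters would give a fleet-wide 10g vs 1g tally comparable to the grep pipeline's output.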
[08:11:40] joal: o/
[08:11:43] bonjour
[08:13:18] elukey: I thought I'd work yesterday and just crashed out - In bed at 8:30
[08:13:34] So no forking :)
[08:16:00] Analytics, Analytics-Cluster: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (elukey)
[08:16:02] Analytics, Analytics-Kanban, User-Elukey: New Hadoop hardware. Refreshes and hosts with space for GPUs - https://phabricator.wikimedia.org/T241190 (elukey)
[08:19:34] Analytics, Analytics-Cluster: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (elukey) Cross posting in here too: Just to confirm the amount of 10g vs 1g NICs: ` elukey@cumin1001:~$ sudo cumin 'A:hadoop-worker' 'cat /sys/class/net/*/speed 2>&1 | grep -v "Invalid"' 54 host...
[08:23:18] joal: ahhahahah :D
[08:23:45] :)
[08:32:52] elukey: just sent a new patch for wikidata-dumps import with 2 different syntaxes - please let me know the one you prefer :)
[08:34:40] ack will check it in a bit!
[09:01:19] Analytics, Analytics-Cluster: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (elukey)
[09:04:45] (PS1) Joal: Add oozie job converting wikidata dumps to parquet [analytics/refinery] - https://gerrit.wikimedia.org/r/569836 (https://phabricator.wikimedia.org/T209655)
[09:05:24] (CR) Joal: "Ping folks on this: this code has been run manually a few times over past 2 years. An oozie job is ready (see https://gerrit.wikimedia.org" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346726 (owner: Joal)
[09:14:12] Analytics: Refresh stat1004 with a new host and GPU - https://phabricator.wikimedia.org/T241187 (elukey) Open→Stalled The procurement task is T242149, setting the task to stalled until we have hardware.
[09:33:48] Analytics, Operations: Host analytics1073 is DOWN - https://phabricator.wikimedia.org/T244064 (elukey) Open→Resolved a: elukey The host has been stable since then, let's re-open if it happens again.
[09:36:59] Analytics, Operations: analytics1061 is down - https://phabricator.wikimedia.org/T244081 (elukey) Open→Resolved a: elukey Didn't re-occur, will re-open if needed.
[09:37:31] hi, I'm wondering if something has changed with AQS recently. (Context is T244164)
[09:37:32] T244164: Newcomer tasks: module preview showing "0 edits" - https://phabricator.wikimedia.org/T244164
[09:37:49] groceryheist: hi! did you see T232068 ?
[09:37:50] T232068: notebook1004 - /srv is full - https://phabricator.wikimedia.org/T232068
[09:37:51] specifically, https://wikimedia.org/api/rest_v1/metrics/edits/aggregate/cs.wikipedia/user/content/daily/20200103/20200104 used to show results
[09:38:58] mm, looks like the last data is from 31 December.
[09:39:29] kostajh: Hi! yes we still need to update aqs to the new snapshot
[09:39:38] (that includes January's data)
[09:40:03] does it work for data prior to that?
[09:40:06] elukey: ok, thanks. Is there a task I can follow for that?
[09:40:20] elukey: yes it does.
[09:41:10] the problem is that we show data from "1 month ago" so currently all of "number of edits" widgets show 0, e.g. https://phabricator.wikimedia.org/F31546308
[09:41:41] a super hacky workaround, since we're not too concerned about accuracy, is to show data from "2 months ago" (or 3 months ago), but it depends on the timeline of updating AQS
[09:42:40] kostajh: is this a feature for mobile or the apps? I wasn't aware of it.. problem is, we should be able to update aqs this week, but IIUC the problem will re-occur every month no?
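The "2 months ago" workaround only changes which day is requested from AQS. A minimal sketch, assuming the URL layout of the cs.wikipedia example above (project / editor type / page type / granularity / start / end, dates as YYYYMMDD) and a one-day window; `months_ago` and `daily_edits_url` are hypothetical helpers, and clamping e.g. March 31 to the last day of February is an illustrative choice, not something the log specifies.

```python
import calendar
from datetime import date, timedelta

AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics/edits/aggregate"


def months_ago(today: date, months: int) -> date:
    """Same day-of-month `months` months earlier, clamped to that month's length."""
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


def daily_edits_url(project: str, today: date, months: int = 2) -> str:
    """URL for one day of user edits to content pages, `months` months back."""
    start = months_ago(today, months)
    end = start + timedelta(days=1)  # one-day window, as in the example URL
    return (f"{AQS_BASE}/{project}/user/content/daily/"
            f"{start:%Y%m%d}/{end:%Y%m%d}")
```

With `months=1` and a request date of 2020-02-03 this reproduces the URL from the log; `months=2` shifts the window to 2019-12-03, which is still inside a snapshot ending on 31 December.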
[09:43:40] elukey: mobile web and desktop
[09:44:07] elukey: we haven't had a problem with it yet, the code went live in the third week of October
[09:44:24] or, maybe we have had a problem and no one noticed
[09:44:32] kostajh: yeah I think so :(
[09:45:26] elukey: is the process for updating AQS documented somewhere? if I understand, you manually load data each month, but sometimes this doesn't happen at the beginning of the month?
[09:46:11] so if we update our code to query for one day of edits from 2 months ago, we should be safe?
[09:47:45] kostajh: we have jobs in hadoop that every month create https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_reduced, but we need to make a code change in AQS to update the data source since it is not a simple update (we recalculate the whole dataset every month)
[09:47:50] so we need to test etc..
[09:48:00] k
[09:48:31] 2 months seems safe, but please note that AQS is not supported the way mediawiki is
[09:48:49] namely, we keep it up as best as possible following best SRE practices etc..
[09:49:02] sure
[09:49:06] but it is a service that can go down, and that doesn't page SRE etc..
[09:49:23] we get alarms of course :)
[09:49:36] but I wouldn't use it for something super impactful for users
[09:49:39] yeah
[09:49:53] so, we probably need something else in the UI to show when the number comes back as 0
[09:49:56] ok, thanks :)
[09:50:32] we can discuss the use case for sure, but I would reach out to nuria first (my manager)
[09:51:39] I think the existing setup is fine, we can make our UI a bit more resilient
[09:52:15] another question is how much traffic does this thing bring to AQS
[09:52:26] I didn't notice anything horrible in traffic graphs
[09:52:31] but do you expect it to grow?
[09:54:23] elukey: not very high traffic at the moment. It's shown to newly registered users on select wikis. https://www.mediawiki.org/wiki/Growth#Current_initiatives has a table of where we are deployed
[09:54:47] but, French Wiki and Commons are possible additions
[09:55:06] elukey: also, we cache the result for a day (for all users)
[09:55:27] so actually, we're just hitting that raw URL once per day per wiki.
[09:56:28] ack thanks
[09:56:52] kostajh: do you mind updating the task with a brief summary of this discussion? Otherwise I can do it
[09:57:06] just want to preserve this chat, seems useful :)
[09:57:12] elukey: I'll add a comment and see if I can summarize
[09:57:18] super
[10:05:30] Analytics, Operations, ops-eqiad, User-Elukey: replace onboard NIC in kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T236327 (elukey) @Cmjohnson @Jclark-ctr let's sync about next steps whenever you have time!
[10:05:50] elukey: posted the abbreviated chat in the task. thanks again
[10:07:27] Analytics: Analytics Hardware for Fiscal Year 2019/2020 - https://phabricator.wikimedia.org/T244211 (elukey) p: Triage→Normal
[10:08:53] kostajh: thank you!
[10:14:11] Analytics, Product-Analytics: Check home leftovers of dfoy - https://phabricator.wikimedia.org/T239571 (elukey) Everything cleaned up from stat/notebook homes. @Milimetric the last remaining action before closing is to decide whether or not to remove the data from HDFS.
[10:45:04] Analytics, stewardbots, User-Elukey: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712 (elukey) @Legoktm sorry for the late reply, thanks a lot for the help! @Krinkle thanks a lot for the info, can you add a bit more detail about: > I...
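Caching the AQS result for a day per wiki is what keeps the load at roughly one request per wiki per day. A minimal sketch of that pattern as an in-process TTL cache, assuming a hypothetical `fetch` callable that performs the AQS request; the actual GrowthExperiments caching code is not shown in this log and may work differently. The injectable `clock` exists only to make the expiry testable.

```python
import time
from typing import Callable, Dict, Tuple


class DailyCache:
    """Keep one fetched value per wiki for `ttl` seconds (one day by default)."""

    def __init__(self, fetch: Callable[[str], int], ttl: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch    # performs the (expensive) AQS request
        self._ttl = ttl
        self._clock = clock    # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, int]] = {}

    def get(self, wiki: str) -> int:
        now = self._clock()
        cached = self._entries.get(wiki)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]   # still fresh: no request to AQS
        value = self._fetch(wiki)
        self._entries[wiki] = (now, value)
        return value
```

Because entries are keyed by wiki, the number of upstream requests grows with the number of deployed wikis, not with the number of users viewing the widget.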
[11:03:33] (PS12) Joal: Add spark code for wikidata json dumps parsing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346726 (https://phabricator.wikimedia.org/T209655)
[11:07:49] Analytics, Operations: Add metadata to puppet about kerberos accounts - https://phabricator.wikimedia.org/T235418 (elukey) Open→Resolved a: elukey This has been done: users with existing kerberos principals have been backfilled, and now we have in place a procedure to add a flag to puppet each...
[11:07:51] Analytics: Enable Security (stronger authentication and data encryption) for the Analytics Hadoop cluster and its dependent services - https://phabricator.wikimedia.org/T211836 (elukey)
[11:39:04] * elukey lunch!
[12:10:23] (PS2) Joal: Add oozie job converting wikidata dumps to parquet [analytics/refinery] - https://gerrit.wikimedia.org/r/569836 (https://phabricator.wikimedia.org/T209655)
[12:57:11] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (BerndFiedlerWMDE) Nice! Great job! Is this the place to report issues? Trying t...
[12:58:33] https://hpc.guix.info/blog/2019/10/towards-reproducible-jupyter-notebooks/ looks nice!
[13:03:28] joal: https://fosdem.org/2020/schedule/track/graph_systems_and_algorithms/ :)
[13:03:49] \o/
[13:03:57] Thanks Luca! I should have gone there :)
[13:04:48] we all should have, but you know, ALL HANDS :D
[13:54:28] elukey: sent https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570050/ for you :)
[13:55:36] joal: ah ok, I thought that you had to re-run the mediawiki history job from yesterday's chat, but probably I didn't get it correctly
[13:56:04] elukey: history is correct, so is its reduced version - The dumped version is not however :)
[13:56:12] ahhh okok
[13:58:21] joal: aqs1004 depooled and ready for a test
[13:58:38] elukey: testing
[13:59:12] works for me elukey :)
[13:59:44] joal: shall I proceed with the roll restart?
[13:59:51] please elukey :)
[14:00:01] Analytics, Research, Privacy, Security: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (Miriam) Thanks @Nuria! @JAllemandou salut! As discussed at all hands, could you please share the folder where I should...
[14:04:20] done!
[14:04:36] ack elukey - testing UI
[14:06:36] elukey: all good on my side - Thanks a lot :)
[14:07:03] super :)
[14:08:02] joal: did you see https://phabricator.wikimedia.org/T244164 ?
[14:08:32] elukey: something else - Keeping streaming in our jars (used previously for druid, notably by tranquility) costs ~60M per jar version - I suggest removing that :)
[14:08:33] I wasn't aware of this use case
[14:09:44] elukey: I wasn't either - It feels related to the use-case we're discussing with core-platform, but we should be kept in the loop!
[14:11:00] yep
[14:11:18] ^ apologies for not pinging, I didn't know we were meant to tell you if using that endpoint
[14:12:01] kostajh: nah all good, we probably should have advertised it in a better way in the docs :)
[14:12:23] my point was more how we (as analytics) can advertise it in a better way
[14:12:25] kostajh: using it is great :)
[14:13:04] ottomata: goood morning
[14:13:07] ottomata: https://hpc.guix.info/blog/2019/10/towards-reproducible-jupyter-notebooks/
[14:13:27] found it today in the FOSDEM talks
[14:15:03] joal: also found out how to make presto work with kerberos
[14:15:09] \o/
[14:15:17] elukey: Today has been very productive indeed :)
[14:15:36] also a lot of swearing from my side
[14:15:41] :D
[14:15:55] elukey: May I ask for a quick review of my wikidata patch, so that I can finalize it (choice of syntax), please? Only if it doesn't kill the productivity :)
[14:16:16] doing it
[14:16:16] Analytics: Analytics Hardware for Fiscal Year 2019/2020 - https://phabricator.wikimedia.org/T244211 (Ottomata) I think we don't need the Kafka jumbo expansion this year.
[14:16:18] elukey: productively swearing feels not that bad :)
[14:16:44] Analytics: Analytics Hardware for Fiscal Year 2019/2020 - https://phabricator.wikimedia.org/T244211 (elukey)
[14:18:08] Analytics, Cite, Reference Previews, Research, and 2 others: Instrument Cite to record the nubmer of footnote marks and references list entries rendered in each article - https://phabricator.wikimedia.org/T241833 (Miriam) Hi @awight ! I believe @tizianopiccardi has worked on something similar for...
[14:18:39] Analytics, Research, Privacy, Security: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (JAllemandou) Hi @Miriam :) From stat1007, directories are synced from `/srv/published/datasets` to https://analytics.wi...
[14:21:43] Analytics, Research, Privacy, Security: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (Ottomata) Sounds good!
[14:22:13] will read luca ty!
[14:27:46] joal: there is a typo in the class name, just added a comment
[14:27:50] puppet compiler fails
[14:27:55] ack luca mweh
[14:28:10] very difficult to spot, took me a bit :D
[14:35:51] ottomata: would you mind reviewing an hdfs-rsync PR when you have time? https://github.com/wikimedia/hdfs-tools/pull/7
[14:36:06] elukey: corrected the patch
[14:38:48] joal: please don't kill me, the same typo is in the filename :(
[14:38:52] I just realized it
[14:38:57] hahahaha :)
[14:39:08] I'm sorry I didn't triple check - my bad
[14:40:05] pushing the patch elukey
[14:40:08] #j wikimedia-ai
[14:40:16] Hi kevinbazira :)
[14:40:57] Hi joal. Sorry for the interruption :)
[14:43:55] np kevinbazira - I was having fun pinging you :)
[14:46:29] :)
[14:49:53] joal: added some nits
[14:51:07] looking ottomata
[14:55:24] ottomata: commented as well and pushed an updated patch (minor changes)
[14:57:45] ok gone for kids, will be back later after dinner
[14:59:53] joal: shall I merge the patch or wait?
[15:01:19] elukey: we need to wait, hdfs-rsync needs to be fixed, and also, I'd like to make that patch with coherent syntax - I have used 2 different syntaxes for the 2 datasets to be rsynced - Can you please tell me which one you prefer?
[15:03:22] okok, let's chat about it when you are back
[15:03:31] sure elukey
[15:22:20] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (Tobi_WMDE_SW) >>! In T234590#5847798, @BerndFiedlerWMDE wrote: > Trying to measu...
[15:40:30] brb
[16:01:32] (CR) Nuria: [C: -1] "I think a bit of info as to the nature of every dataset will help." (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/569836 (https://phabricator.wikimedia.org/T209655) (owner: Joal)
[16:02:54] nuria, milimetric hellooo just a lil reminder of this change: https://gerrit.wikimedia.org/r/#/c/analytics/wikistats2/+/558702/
[16:03:21] fdans: indeed
[16:04:02] fdans: let's also not forget the switch of stats.wikimedia.org, will put this changeset on my queue for today
[16:04:36] nuria: yea I'm on that
[16:13:26] joal it looks like my main comment on that patch was lost...??? where'd it go?!?!
[16:18:27] re-added
[16:23:17] Analytics, Research, Privacy, Security: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (JFishback_WMF) @Miriam it was my understanding that @Michele.tizzoni 's home folder was removed at some point (I couldn...
[16:25:27] ottomata: to gerrit comment purgatory, neither here nor there
[16:34:33] fdans: on your change the bundle is not versioned
[16:34:44] fdans: see
[16:34:51] fdans: on index.html
[16:35:13] fdans: that would not work, as no new version of bundle.js can override an old one once cached
[16:35:37] fdans: makes senser?
[16:35:42] *sense?
[16:41:19] fdans: from what i see build.js is only being executed on the prod build
[16:42:15] nuria: index.html is only for the dev bundle
[16:42:30] the dev bundle only has the english strings
[16:42:43] fdans: index.html is used in both dev and prod
[16:42:46] nope
[16:42:50] nuria: index.ejs
[16:43:31] fdans: let's see
[16:43:36] fdans: this is the prod build now:
[16:43:40] fdans:
[16:43:43] https://www.irccloud.com/pastebin/fTTLpNhB/
[16:44:20] fdans: i run npm run build, right?
[16:45:07] nuria: hmmm do you have the latest patch? the main file names should have versions in them
[16:45:50] fdans: let me get code again and rebuild
[16:48:26] Analytics, Analytics-Kanban, Patch-For-Review: The guava error still persists in data quality bundles - https://phabricator.wikimedia.org/T241375 (mforns) Cool, moving this to Ready to Deploy. If I'm not the one deploying, remember there's deployment instructions in the etherpad: https://etherpad.wik...
[16:54:09] https://www.irccloud.com/pastebin/ZhLpeCRZ/
[16:54:15] fdans: sorry, see above
[16:54:53] nuria: yep, that looks more like it
[16:55:00] fdans: i also do not understand what you mean by index.ejs as apache is pointing the wikistats deploy to index.html so that file needs to exist, let's talk after standup
[16:55:24] nuria: src/index.ejs vs src/index.html
[16:56:03] the template when creating the bundle is, yes, rendered into an index.html
[16:56:21] fdans: ah i see, the build
[16:56:47] nuria: in the prod build you should see the correct thing
[16:57:08] fdans: still, in this case in index.html it says: var jsFile = "main.bundle.2.6.11.aa.js";
[16:57:56] nuria: yep, the language is substituted in the js there
[16:58:06] * fdans sweats as he explains his awful hacks
[17:22:36] ottomata: i realized i don't know the appropriate process for a report, but the event.mediawiki_revision_score/datacenter=eqiad/year=2020/month=1/day=26/hour=20 partition is missing in hive
[17:27:04] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (MusikAnimal) >>! In T234590#5847798, @BerndFiedlerWMDE wrote: > Nice! Great job!...
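Nuria's objection is about cache busting: a bundle served under a fixed name such as `bundle.js` can stay in browser caches after a deploy, so each release needs a new URL. A minimal sketch of the naming scheme visible in the prod build (`main.bundle.2.6.11.aa.js`: release version, then language code); `versioned_bundle_name` and `parse_bundle_name` are hypothetical helpers, and the real Wikistats build step (build.js / index.ejs) is not reproduced here.

```python
import re
from typing import Optional, Tuple

# Matches names like "main.bundle.2.6.11.aa.js": dotted version, then a language code.
_BUNDLE_RE = re.compile(r"^main\.bundle\.(\d+(?:\.\d+)*)\.([a-z][a-z-]*)\.js$")


def versioned_bundle_name(version: str, lang: str) -> str:
    """Filename that changes with every release, so an older cached copy is never served for it."""
    if not version or not lang:
        raise ValueError("version and lang are both required")
    return f"main.bundle.{version}.{lang}.js"


def parse_bundle_name(name: str) -> Optional[Tuple[str, str]]:
    """Return (version, lang) for a versioned bundle name, or None if it is unversioned."""
    m = _BUNDLE_RE.match(name)
    return (m.group(1), m.group(2)) if m else None
```

A check like `parse_bundle_name(...) is None` on the name referenced from index.html would flag the unversioned case Nuria spotted.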
[17:33:12] Analytics, Readers-Web-Backlog: Report unhandled jQuery $.Deferred errors in client side error logging - https://phabricator.wikimedia.org/T244261 (Niedzielski)
[17:33:13] Analytics, Readers-Web-Backlog: Report unhandled jQuery $.Deferred errors in client side error logging - https://phabricator.wikimedia.org/T244261 (Niedzielski) @tgr, let me know if this doesn't make sense and feel free to edit!
[17:33:15] Analytics, Analytics-Kanban, Patch-For-Review, Product-Analytics (Kanban): Add dimensions for Project type & language to Edits_hourly - https://phabricator.wikimedia.org/T232659 (Nuria) Open→Resolved
[18:00:00] Analytics, Analytics-Cluster, Operations, ops-eqiad: Degraded RAID on analytics1030 - https://phabricator.wikimedia.org/T243971 (Ottomata) Open→Declined analytics1030 is a node in the analytics-test cluster. We will be ordering replacement hardware this year so we aren't going to worry a...
[18:03:37] Analytics, Analytics-Kanban, Product-Analytics (Kanban): Add new dimensions to virtual_pageview_hourly and pageview_hourly - https://phabricator.wikimedia.org/T243090 (Nuria) a: Milimetric→mforns
[18:07:00] Analytics, stewardbots, User-Elukey: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712 (Krinkle) Get a sample from `#central` on IRC and from `wiki=login.wikimedia` on EventStreams, then in the latter look for `log/newusers` entries tha...
[18:14:31] Analytics, Cite, Reference Previews, Research, and 2 others: Instrument Cite to record the nubmer of footnote marks and references list entries rendered in each article - https://phabricator.wikimedia.org/T241833 (Ottomata) Interesting! I like this idea. This could either be added as new field(...
[18:15:19] Analytics, Readers-Web-Backlog: Report unhandled jQuery $.Deferred errors in client side error logging - https://phabricator.wikimedia.org/T244261 (Ottomata)
[18:53:51] Analytics, Readers-Web-Backlog (Tracking): Report unhandled jQuery $.Deferred errors in client side error logging - https://phabricator.wikimedia.org/T244261 (Niedzielski)
[18:54:57] nuria: forgot to mention https://phabricator.wikimedia.org/T244164 during standup
[18:55:02] interesting use case to know
[18:55:11] (at least I wasn't aware of it)
[19:00:18] * elukey off!
[19:27:43] ebernhardson: re missing rev score hour
[19:27:45] interesting.
[19:28:00] it is also missing in raw data, but we have no alerts, which indicates that the data was never in kafka....
[19:28:08] i wonder if there was a change-prop issue?
[19:28:09] hmmm
[19:29:37] IIRC that hour was the one with the ddos
[19:29:42] ottomata: --^
[19:34:15] nuria: Heya - would you have a minute?
[19:42:07] oohhhhh
[19:42:09] that makes sense
[19:42:11] phew yeah,
[19:42:24] in kafka i have no data between 19:542 and 21:01 that day
[19:42:26] ebernhardson: ^
[19:42:28] 19:52
[19:42:43] It makes sense indeed
[19:43:23] thanks joal
[19:43:31] ottomata: I just pushed the change you asked for - It's a favour I do to you: almost fully functional code and an imperative-style function :S
[19:43:34] :)
[19:44:19] np ottomata - when ebernhardson sent his message I didn't get it, and you reposted about data not being available in Kafka and I connected the dots :)
[19:44:38] ottomata: hmm, the problem is that doesn't interact well with jobs that wait for either paths or hive partitions to exist
[19:44:53] so a daily job for the same day will just sit around and not see its input data.
[19:45:37] this is in airflow, so i basically told it to lie and pretend that operator was a success, not sure how to lie to oozie in the same way
[19:46:20] ebernhardson: We fake a valid hour by manually touching the _SUCCESS file in the folder
[20:00:50] yeah there is a bug about that....
[20:02:19] i think this one
[20:02:20] https://phabricator.wikimedia.org/T214545
[20:02:36] https://phabricator.wikimedia.org/T214545#4947669
[21:03:52] hey team, I cannot find our program code for coupa expense reports, which one have you used?
[21:04:00] https://office.wikimedia.org/wiki/Spending_authority
[21:04:20] mforns: Heya - There is an email about that in fundation-all IIRC
[21:04:22] CR-2021 is not in coupa's dropdown
[21:04:44] joal, yes, but I'm following its instructions and there's no corresponding analytics program
[21:05:01] Meh - My bad mforns - Can't help then :(
[21:05:16] no problemo, will look
[21:11:04] (PS13) Joal: Add spark code for wikidata json dumps parsing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346726 (https://phabricator.wikimedia.org/T209655)
[21:18:04] mforns: i kind of saw an analytics one and put things there? CT-2021
[21:18:45] nuria, yes I also found CT-2021, but this code doesn't appear in coupa's dropdown options
[21:19:39] I put "Other programmatic activities"
[21:20:07] mforns: i found some drop down that included 'analytics' but might not be present in yours, do ask https://meta.wikimedia.org/wiki/User:Tle_(WMF)
[21:21:05] nuria, CT-2021 appears within the "Dept - Department" dropdown, yes
[21:21:31] but the email gives instructions to choose the option "Centralized Budget" there
[21:22:22] the one I'm trying to fill in is the "Program" dropdown
[21:22:34] and there's no analytics there
[21:25:08] mforns: ya, no program budget
[21:25:17] mforns: maybe that one can stay empty?
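The gap in `event.mediawiki_revision_score` (no Kafka data between 19:52 and 21:01 on 2020-01-26) blocks any daily job that waits for all 24 hourly partitions. A minimal sketch that lists a day's expected partitions in the `datacenter=/year=/month=/day=/hour=` layout from the log, finds the missing ones, and builds the commands for the workaround joal describes (creating an empty `_SUCCESS` flag). `BASE` is a hypothetical placeholder because the table's real HDFS location does not appear in the log; `hdfs dfs -mkdir -p` and `hdfs dfs -touchz` are standard HDFS shell commands.

```python
from datetime import datetime, timedelta
from typing import List, Set

# Hypothetical placeholder: the table's real HDFS location is not in the log.
BASE = "/path/to/mediawiki_revision_score"


def hourly_partitions(day: datetime, datacenter: str = "eqiad") -> List[str]:
    """All 24 hourly partition paths of `day`, unpadded like hour=20 in the log."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    paths = []
    for h in range(24):
        t = start + timedelta(hours=h)
        paths.append(f"datacenter={datacenter}/year={t.year}/month={t.month}"
                     f"/day={t.day}/hour={t.hour}")
    return paths


def missing(expected: List[str], present: Set[str]) -> List[str]:
    """Expected partitions that have no data, in hour order."""
    return [p for p in expected if p not in present]


def mark_complete_cmds(partition: str, base: str = BASE) -> List[List[str]]:
    """argv lists that create the partition directory and an empty _SUCCESS flag in it."""
    directory = f"{base}/{partition}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", directory],
        ["hdfs", "dfs", "-touchz", f"{directory}/_SUCCESS"],
    ]
```

These commands should only be run after confirming the data is really lost (here: absent from Kafka), since the flag tells downstream jobs that the hour is complete.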
[21:25:31] it has an asterisk, so it's mandatory
[21:25:50] I put "Other programmatic activities" and will add a comment on the report
[21:30:07] Analytics: Database creation in Hive - https://phabricator.wikimedia.org/T244292 (EYener)
[21:30:27] Analytics, Research, Privacy, Security: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (leila) @Miriam I just shared the link to where the data lives with you.
[21:33:49] (PS3) Joal: Add oozie job converting wikidata dumps to parquet [analytics/refinery] - https://gerrit.wikimedia.org/r/569836 (https://phabricator.wikimedia.org/T209655)
[21:34:34] (CR) Joal: "I added a readme file describing the dataset. @nuria: Is it enough or do you prefer in-file comments?" [analytics/refinery] - https://gerrit.wikimedia.org/r/569836 (https://phabricator.wikimedia.org/T209655) (owner: Joal)
[21:39:31] joal: i also sent another patch for bots (will work on oozie today) let me know what you think
[21:40:07] nuria: I missed that - Have you tagged me?
[21:41:07] joal: yessir: https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/562368/
[21:41:38] Ah yes - thanks for the link
[21:51:51] (CR) Joal: "Another bunch of comments" (9 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/562368 (https://phabricator.wikimedia.org/T238361) (owner: Nuria)
[21:58:40] Analytics, Analytics-Kanban, Product-Analytics (Kanban): Add new dimensions to virtual_pageview_hourly and pageview_hourly - https://phabricator.wikimedia.org/T243090 (mforns) @cchen (and @Nuria) I believe the namespace_id field is already in both data sets. Isn't that enough? Are you after the text...
[22:00:05] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (Nuria) What is happening here (cc @fdans) is that requests are for this filename...
[22:00:59] Analytics, Analytics-Kanban, Product-Analytics (Kanban): Add new dimensions to virtual_pageview_hourly and pageview_hourly - https://phabricator.wikimedia.org/T243090 (Nuria) @mforns i do not think it is present in the data ingested in turnilo, correct?
[22:01:43] nuria, oh, is it about druid only?
[22:01:58] the project_family field is not in Hive though
[22:02:01] mforns: ya, connie is most interested in the superset dashboards
[22:02:17] oh ok, the task didn't mention druid
[22:02:48] mforns: ah correct. the parent task does but not this one: https://phabricator.wikimedia.org/T232659
[22:03:06] Gone for tonight - See you tomorrow team
[22:03:24] nuria, in any case, we'd have to add project_family to pageview_hourly and virtualpageview_hourly to then load them into druid, no?
[22:03:58] or maybe parse it from the project, when loading to druid
[22:04:26] yea, I guess that's possible
[22:05:20] Analytics, Analytics-Kanban, Product-Analytics (Kanban): Add new dimensions to virtual_pageview_hourly and pageview_hourly - https://phabricator.wikimedia.org/T243090 (cchen) @mforns in edit_hourly in Superset and Turnilo, we have fields: namespace_is_content, namespace_is_talk and namespace_name. C...
[22:05:33] byeee joal :]]
[22:18:58] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (bd808) >>! In T234590#5849775, @Nuria wrote: > This is somewhat confusing for en...
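mforns' alternative is to derive `project_family` from `project` while loading into Druid instead of adding a Hive column. A minimal sketch, assuming `project` values of the form `<language>.<family>` as in `cs.wikipedia`; `split_project` is a hypothetical helper, and non-language prefixes such as `commons.wikimedia` would need an explicit mapping that is not shown here.

```python
from typing import Optional, Tuple


def split_project(project: str) -> Tuple[Optional[str], str]:
    """Split '<language>.<family>' into (language, family); no dot gives (None, project)."""
    language, sep, family = project.partition(".")  # split on the first dot only
    if not sep:
        return None, project
    return language, family
```

Splitting on the first dot keeps hyphenated language codes such as `zh-min-nan` intact.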
[22:22:19] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (Nuria) >("Präsidentschaftswahl_in_den_Vereinigten_Staaten.ogv" in this case) wou...
[22:29:24] joal: for labels I think we should leave the automated, user and null cases cause that is informative as to what is happening
[22:29:46] joal: so 3 possible labels (or two plus absence of label if that makes sense), let me know what you think
[22:46:00] Analytics, Privacy Engineering, Research, Privacy, Security: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (JFishback_WMF)
[23:13:44] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (bd808) Leaving the md5 hash bits out of the discussion, the problem with fetchin...
[23:29:27] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (MusikAnimal) > The expected URL on the API's side has double URL encoded the tit...
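The T234590 discussion, truncated in the log, says the API side expected a double URL-encoded file title. A sketch showing the difference for the filename from that ticket using `urllib.parse.quote`; whether the mediarequests endpoint really expects double encoding is taken from that truncated comment and is not verified here.

```python
from urllib.parse import quote, unquote

TITLE = "Präsidentschaftswahl_in_den_Vereinigten_Staaten.ogv"


def encode_once(title: str) -> str:
    # safe="" also escapes "/", so the title stays a single path segment
    return quote(title, safe="")


def encode_twice(title: str) -> str:
    # Encoding again turns every "%" from the first pass into "%25"
    return quote(encode_once(title), safe="")


if __name__ == "__main__":
    print(encode_once(TITLE))
    print(encode_twice(TITLE))
    print(unquote(unquote(encode_twice(TITLE))) == TITLE)
```

A client that encodes once while the server decodes twice (or the reverse) ends up looking up a different title, which would fit the symptom reported in the ticket.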