[06:19:07] morning!
[06:26:47] Analytics: Analytics datasets should be under a free license - https://phabricator.wikimedia.org/T244685 (Yair_rand)
[07:07:13] good morning :)
[07:48:17] Analytics, Product-Analytics: Enable shell access to presto from jupyter/stats machines - https://phabricator.wikimedia.org/T243312 (elukey) All the stat/notebooks have now the `presto` cli working with Kerberos. I tested the following python script and it seems working: ` #!/usr/bin/env python3 # -*- c...
[07:48:31] presto seems to work fine with python --^
[07:48:42] need to fix a couple of things in puppet but it seems working
[07:48:53] I'll test it on a notebook but I don't expect weird results :)
[08:02:26] good morning elukey :)
[08:02:38] o/
[08:03:16] elukey: I didn't manage to make notebooks work with pyhive, but it was expected (kerberos not supported yet for presto)
[08:03:23] It's great it seems
[08:03:32] to work with prestodb client :)
[08:06:20] yep very nice and easy
[08:07:07] Thanks again elukey for the wikidata-dumps merge :)
[08:07:25] thank you for the patience! I really like it, very tidy
[08:07:48] separation of concerns between ExecStart and bash is always nice
[08:07:48] elukey: There is a model (xml-dumps), so it was easy :)
[08:07:58] Right makes sense
[08:09:28] elukey: checking for data, I realized I made a typo in path :(((
[08:09:31] correcting that
[08:10:07] sorry
[08:18:43] np! didn't even realize while reviewing :)
[08:25:13] git br
[08:25:16] oops
[08:34:34] joal: better to absent the current one and create a new one with the new naming
[08:34:41] pff :(
[08:34:45] yeah - doing
[08:35:07] mmm lemme check first with pcc
[08:35:11] might not be needed
[08:35:16] joal: --^
[08:35:20] ack
[08:39:13] elukey: let's do it the correct way - Sending a new patch :)
[08:41:22] joal: absent should be first, otherwise jenkins complains
[08:41:38] indeed, just saw that elukey
[08:43:02] you know puppet guidelines
[08:43:03] -.-
[08:43:12] be patient :)
[08:54:39] joal: done!
[08:54:49] Thanks a lot elukey
[08:56:03] PROBLEM - Check the last execution of refinery-import-wikidata-all-tll-dumps on stat1007 is CRITICAL: NRPE: Command check_check_refinery-import-wikidata-all-tll-dumps_status not defined https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:56:57] ahhahaha
[08:57:01] yes of course
[08:57:02] :)
[08:57:19] should clean by itself or not really?
[09:03:31] joal: yes yes it will, next puppet run on the icinga host, but we'll not see the recovery msg
[09:03:39] ok
[09:16:07] elukey: shall I go with cleaning?
[09:28:14] (PS14) Joal: Add spark code for wikidata json dumps parsing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346726 (https://phabricator.wikimedia.org/T209655)
[09:29:09] (CR) Joal: "This is ready and tested - It'd be great to have that deployed this week if anyone can review." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346726 (https://phabricator.wikimedia.org/T209655) (owner: Joal)
[09:29:34] joal: yep!
[09:32:16] ack elukey - sending patch
[10:00:13] (PS4) Joal: Add oozie job converting wikidata dumps to parquet [analytics/refinery] - https://gerrit.wikimedia.org/r/569836 (https://phabricator.wikimedia.org/T209655)
[10:04:55] (CR) Joal: "Ready IMO. Would be great to be deployed this week." [analytics/refinery] - https://gerrit.wikimedia.org/r/569836 (https://phabricator.wikimedia.org/T209655) (owner: Joal)
[10:20:00] Analytics, Analytics-Kanban: Productionize item_page_link table - https://phabricator.wikimedia.org/T244707 (JAllemandou)
[10:20:11] Analytics, Analytics-Kanban: Productionize item_page_link table - https://phabricator.wikimedia.org/T244707 (JAllemandou) a:JAllemandou
[10:23:53] Analytics, Product-Analytics, Patch-For-Review: Enable shell access to presto from jupyter/stats machines - https://phabricator.wikimedia.org/T243312 (elukey) Updated the documentation: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto#Usage_on_analytics_cluster
[10:23:53] I have updated https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto#Usage_on_analytics_cluster
[10:24:11] ready to test if anybody has time
[10:24:17] testing elukey :)
[10:25:06] thanks :)
[10:25:21] a possible goal for next quarter is to add prometheus metrics for presto
[10:25:40] that'd be great :)
[10:26:04] also - maybe we could add `presto-python-client['kerberos']` to notebook machines as default?
[10:29:08] How come that presto thing needs me to be authenticated?
[10:29:10] :)
[10:29:14] Works like a charm
[10:29:23] And is BLAZING FAST
[10:29:31] :D
[10:29:38] MWAHAHAHA :)
[10:31:02] joal: I don't think there is a deb package for that
[10:31:14] so people will need to pull it via pip
[10:31:45] does it work even from a notebook? Was about to test but if you did I'll skip :)
[10:32:47] elukey: indeed it works (your pip3 command)
[10:33:16] it needed libkrb5-dev deployed everywhere (to get gssapi libs)
[10:33:35] and the ca cert available, but the rest wasn't a problem
[10:33:47] all right seems that the task is done then :)
[10:34:10] elukey: \o/ !!!!
[10:34:20] elukey: you know what I'm gonna ask soon ;)
[10:34:31] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Enable shell access to presto from jupyter/stats machines - https://phabricator.wikimedia.org/T243312 (elukey) a:elukey
[10:34:53] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Enable shell access to presto from jupyter/stats machines - https://phabricator.wikimedia.org/T243312 (elukey)
[10:35:19] Analytics, Analytics-Kanban: Presto access on jupyter notebooks - https://phabricator.wikimedia.org/T244505 (elukey)
[10:35:22] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Enable shell access to presto from jupyter/stats machines - https://phabricator.wikimedia.org/T243312 (elukey)
[10:35:52] elukey: I updated the doc page with a more meaningful comment on pip3 install
[10:36:21] ack thanks
[10:36:51] Analytics, Multimedia, Tool-Pageviews: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (fdans)
[10:38:10] Analytics, Multimedia, Tool-Pageviews: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (fdans)
[10:52:24] Analytics, Operations, vm-requests: Create a ganeti VM in eqiad: an-tool1008 - https://phabricator.wikimedia.org/T244717 (elukey)
[10:53:41] Analytics, Operations, vm-requests: Create a ganeti VM in eqiad: an-tool1008 - https://phabricator.wikimedia.org/T244717 (elukey)
[11:00:25] Analytics, Operations, vm-requests, User-Elukey: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (elukey)
[11:00:59] an-tool1008 should become the analytics team's client
[11:01:08] with all our timers etc..
[11:01:22] so the stat boxes will become all equal (hopefully)
[11:02:11] I guess that 4 cpu cores and 8G of ram should be enough for our use cases
[11:02:24] if more is needed a stat box is probably better to be used...
[11:02:40] (ideally the vm will run mostly timers or similar automated stuff)
[11:03:04] stat1006 should become a hadoop client eventually
[11:03:16] moar space for everybody
[11:03:43] does it make sense?
[11:17:12] need to run errands for a bit, so will take a long lunch break, bbl!
[11:50:56] Analytics, Operations, serviceops, vm-requests: Create a ganeti VM in eqiad: an-tool1008 - https://phabricator.wikimedia.org/T244717 (jijiki)
[11:51:04] Analytics, Operations, serviceops, vm-requests: Create a ganeti VM in eqiad: an-tool1008 - https://phabricator.wikimedia.org/T244717 (jijiki) p:Triage→Medium
[11:51:47] Analytics, Operations, serviceops, vm-requests, User-Elukey: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (jijiki) p:Triage→Medium
[13:17:37] djellel: Heya
[13:17:48] djellel: your running job is eating all of the cluster resources
[13:18:24] djellel: I have moved it to the nice queue, but even with that it's really heavy - Could we please try to limit it?
[13:22:15] djellel: I suggest trying to launch it with -Dmapreduce.job.running.map.limit=1024
[13:47:40] djellel: I'm killing your app as it prevents other jobs from proceeding
[13:50:08] * elukey approves judge joal
[13:54:04] elukey: I'm unhappy - commons-compress 1.20 has been released, but I can't find a way to use it in spark :(
[13:55:25] joal: elukey anyone want to have some fun deleting some partitions, pairing with me?
[13:56:54] (as per our policy of not deleting data alone™️)
[13:58:32] fdans: if you gimme 10 mins I'll be able to help
[13:58:37] elukey: thank youuu
[13:59:44] * joal won't intervene between a Spanish and an Italian :)
[14:10:54] hey I'm alive!
[14:11:01] hello milimetric !
[14:11:14] wooo hello milimetric, missed you :)
[14:11:30] likewise, I'm so sad I missed you all that Friday and all last week
[14:12:19] I'm catching up but lemme know of anything urgent
[14:13:12] milimetric: they say the best way of getting back on your feet after many days of sickness is hitting +2 on an i18n code review
[14:13:17] (jk, not urgent)
[14:13:24] Analytics, Analytics-Kanban, Event-Platform, Security Readiness Reviews, and 2 others: Security Review For EventStreamConfig extension - https://phabricator.wikimedia.org/T242124 (Ottomata) AH SORRY, somehow on Friday I read ', unless you wanted to take a crack at improving' as 'unless you wanted...
[14:14:09] :) sweet, will take a look. I gotta send Timo a patch first
[14:14:33] yeayea
[14:18:06] fdans: ok I am good
[14:18:09] batcave?
[14:18:18] elukey: ome
[14:18:19] omw
[14:18:20] Analytics, stewardbots, User-Elukey: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712 (Ottomata) > he wrote a Python prototype of a new irc.wikimedia.org backend that uses eventstreams rather than the UDP recent changes feed (that hope...
[14:20:07] Analytics, stewardbots, User-Elukey: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712 (elukey) >>! In T242712#5864925, @Ottomata wrote: >> he wrote a Python prototype of a new irc.wikimedia.org backend that uses eventstreams rather tha...
[14:20:13] elukey: oh wait shit luca
[14:20:24] I got a meeting in 10min, sorry
[14:20:38] ahahaha okok
[14:26:09] Analytics, stewardbots, User-Elukey: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712 (Ottomata) Hm. I didn't realize that. Moving to other ticket to discuss...
[14:28:07] joal, dedcode's job is still eating up the cluster no?
[14:28:32] oh, it was relaunched?
[14:29:55] yeah seems so
[14:30:38] should we kill again?
[14:30:58] mforns: yep doing so, plus I'll send an email to djellel, he is probably not reading the chan
[14:31:11] ok, thanks!
[14:31:20] !log kill application_1576512674871_246419 (eating a ton of ram on the cluster)
[14:31:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:32:13] Analytics, User-Elukey: Redesign architecture of irc-recentchanges on top of Kafka - https://phabricator.wikimedia.org/T234234 (Ottomata) [[ https://phabricator.wikimedia.org/T242712#5864926 | EventStreams or Kafka ]]!? Hm, the proposal in this ticket isn't specific, there are some mentions of backed by...
[14:32:21] mforns: all good
[14:32:53] thaaanks!
[14:33:00] my query is working now :]
[14:40:06] I know that this is a super n00b question, but how can I know when a spark query is running on a notebook?
[14:40:12] (after hitting "Run")
[14:49:18] elukey: it is pretty opaque, i know.
[14:49:24] but, if a cell has not returned
[14:49:30] i think it has an asterisk next to it instead of a number
[14:49:31] e.g.
[14:49:32] [*]
[14:50:00] ah okok
[14:50:23] I am trying to use notebooks and I am a total n00b
[14:53:05] but the absence of feedback compared to a spark shell is a big downside
[14:53:21] indeed
[15:26:59] !log kill application_1576512674871_246621 (consuming too much memory)
[15:27:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:27:02] sigh
[15:35:52] Analytics, Analytics-Cluster, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey)
[15:43:06] Analytics, Analytics-Kanban, Release Pipeline, Patch-For-Review, and 2 others: Migrate EventStreams to k8s deployment pipeline - https://phabricator.wikimedia.org/T238658 (Ottomata) Bump @akosiaris
[15:45:24] Analytics, Analytics-Kanban, Event-Platform, Security Readiness Reviews, and 2 others: Security Review For EventStreamConfig extension - https://phabricator.wikimedia.org/T242124 (sbassett) Heh, no problem, I thought that was kinda funny. Anyhow, I think the `README.md` clarification is helpful...
[15:53:26] Analytics, Analytics-Kanban, Event-Platform, Security Readiness Reviews, and 2 others: Security Review For EventStreamConfig extension - https://phabricator.wikimedia.org/T242124 (sbassett) Open→Resolved
[15:53:31] Analytics, Event-Platform, Wikimedia-Extension-setup, Patch-For-Review, and 2 others: Deploy EventStreamConfig extension - https://phabricator.wikimedia.org/T242122 (sbassett)
[15:55:14] Analytics, Analytics-Kanban, Event-Platform, Security Readiness Reviews, and 2 others: Security Review For EventStreamConfig extension - https://phabricator.wikimedia.org/T242124 (Ottomata) Actualy, something like that for stream name validation in general would be good. Stream names must follow...
[16:00:18] nuria, ottomata standuuup
[16:02:36] sorry mforns I missed your ping :(
[16:02:47] no problemo!
[16:03:02] * joal should not work with IDE masking chan
[16:24:21] Analytics, Operations, Security-Team, User-jbond, security_assessment_analytics_2018: Log / alert on too many failing logins / Throttling login attempts - https://phabricator.wikimedia.org/T233944 (chasemp)
[16:29:17] Analytics, Product-Analytics (Kanban): Develop a consistent rule for which special pages count as pageviews - https://phabricator.wikimedia.org/T240676 (Ghilt) As an arbcom, we need to see the Pageviews for Special:Contributions/USERNAME. For example, when we receive hounding claims. That's why we opened...
[16:38:51] joal: ok, thanks! i am trying your suggestions
[16:52:36] joal: is it better now? I am also having issues with "Split metadata size exceeded"
[16:53:15] djellel: in terms of usage, you're using half of the cluster instead of 90%, so yes, it's better :)
[16:53:47] joal: aie ..
[16:54:07] djellel: also, from a usage perspective, I guess the job uses 1 month of webrequest
[16:54:30] djellel: is the 1-month-at-a-time really needed, or could it be run every day with a day of data?
[16:57:00] joal: correct. good point. Is 1024 mappers half the cluster?
[16:57:48] djellel: correct in reading 1 month of webrequest I guess - Any answer on being able to split it?
[16:58:10] djellel: 1024 mappers of 2 GB each means 2+ TB, so yeah, half the cluster memory
[16:58:43] djellel: also, 1024 cores is a bit more than half of the 1866 cores we have
[17:01:48] Analytics, Multimedia, Tool-Pageviews: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (fdans) p:Triage→High
[17:02:13] joal: Ok, I will switch to day processing.
[17:02:49] djellel: if possible, it's a lot better to schedule daily (even hourly !), it spreads the load over time
[17:04:02] joal: I understand, I have to figure out how to merge my output.
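The sizing joal gives above (1024 mappers at 2 GB each, against 1866 cores in the cluster) can be checked with a few lines of arithmetic. This is a minimal sketch assuming one vcore per mapper container; `job_footprint` is a hypothetical helper, and the numbers are the ones quoted in the log rather than values read from YARN.

```python
def job_footprint(mappers: int, gb_per_mapper: float, cluster_cores: int):
    """Return (total memory in GB, fraction of cluster cores) for a job.

    Assumes each mapper container holds exactly one vcore (an assumption,
    not something stated in the log).
    """
    total_gb = mappers * gb_per_mapper
    core_fraction = mappers / cluster_cores
    return total_gb, core_fraction


if __name__ == "__main__":
    # Figures quoted in the chat: 1024 mappers, 2 GB each, 1866 cores.
    total_gb, frac = job_footprint(mappers=1024, gb_per_mapper=2, cluster_cores=1866)
    print(f"memory: {total_gb:.0f} GB (~{total_gb / 1024:.1f} TB)")
    print(f"cores:  {frac:.1%} of the cluster")
```

With these inputs it reports about 2 TB of memory and roughly 55% of the cores, which matches the "half the cluster" estimate. Capping concurrent mappers with `-Dmapreduce.job.running.map.limit`, as suggested earlier in the log, lowers the `mappers` input directly.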
[17:11:07] Analytics, Analytics-Kanban: Productionize item_page_link table - https://phabricator.wikimedia.org/T244707 (fdans) p:Triage→High
[17:12:27] Analytics: Analytics datasets should be under a free license - https://phabricator.wikimedia.org/T244685 (fdans) p:Triage→High
[17:12:46] Analytics: Analytics datasets should be under a free license - https://phabricator.wikimedia.org/T244685 (fdans) a:fdans
[17:13:39] Analytics, Pageviews-API: Pageviews for "Special:Contributions/USERNAME" not working: "Error querying Pageviews API - Not found" - https://phabricator.wikimedia.org/T244639 (fdans) a:Nuria
[17:13:46] Analytics, Pageviews-API: Pageviews for "Special:Contributions/USERNAME" not working: "Error querying Pageviews API - Not found" - https://phabricator.wikimedia.org/T244639 (fdans) p:Triage→High
[17:14:36] Analytics, Analytics-Data-Quality, Product-Analytics (Kanban): Bot field in edits_hourly dataset ignores username - https://phabricator.wikimedia.org/T244632 (fdans) a:Milimetric
[17:14:58] Analytics, Analytics-Data-Quality, Product-Analytics (Kanban): Bot field in edits_hourly dataset ignores username - https://phabricator.wikimedia.org/T244632 (fdans) p:Triage→High
[17:17:35] Analytics, Analytics-Wikistats: Canonical wikistats v2 URLs should be permalinks to the period the graph is referring to - https://phabricator.wikimedia.org/T244618 (fdans) Open→Declined This was the previous behavior. We decided that it made more sense from a user's perspective (based on our use...
[17:18:34] Analytics, Analytics-Kanban: Create intermediate table that holds public data for geoeditors dataset so it can be used to load cassandra - https://phabricator.wikimedia.org/T244597 (fdans) p:Triage→High
[17:23:00] Analytics, Inuka-Team, Product-Analytics: Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (fdans) @nshahquinn-wmf Speaking for the team: that approach sounds good!
[17:24:56] Analytics, Analytics-Kanban, Better Use Of Data, Product-Infrastructure-Team-Backlog: EventLogging MEP Upgrade Phase 1 - https://phabricator.wikimedia.org/T244521 (fdans)
[17:25:15] Analytics, Analytics-Kanban, Better Use Of Data, Product-Infrastructure-Team-Backlog: EventLogging MEP Upgrade Phase 1 - https://phabricator.wikimedia.org/T244521 (fdans) p:Triage→High
[17:31:10] Analytics, Analytics-Cluster: Refine failed for event.mediawiki_cirrussearch_request - https://phabricator.wikimedia.org/T244765 (EBernhardson)
[17:31:49] Analytics, Analytics-Cluster: Refine failed for event.mediawiki_cirrussearch_request - https://phabricator.wikimedia.org/T244765 (EBernhardson)
[17:38:44] Analytics, Inuka-Team, Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (fdans) The user agent of the app must be changed to "WikipediaApp" (at the beginning of the string). Additionally, if you can send us what the URLs look like when requesting d...
[17:39:57] Analytics, Analytics-Kanban, Product-Analytics: Superset aggregation across edit tags uses all tags - https://phabricator.wikimedia.org/T243552 (fdans)
[17:41:00] Analytics, Analytics-Cluster: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (fdans) p:Triage→High
[17:41:12] Analytics, Analytics-Kanban: Mediawiki history public release: tsv format is not correctly parsable - https://phabricator.wikimedia.org/T243427 (fdans) Open→Resolved
[17:41:54] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Enable shell access to presto from jupyter/stats machines - https://phabricator.wikimedia.org/T243312 (fdans) p:Triage→High
[17:42:16] Analytics, Analytics-Kanban: Add Presto to Analytics' stack - https://phabricator.wikimedia.org/T243309 (fdans)
[17:47:22] djellel: heya - you're still up taking 75% of the cluster with your two jobs - could you have them running one at a time?
[17:48:04] djellel: I think you still have the monthly one running, and you're testing daily
[17:48:46] Analytics, Fundraising-Backlog, WMDE-Analytics-Engineering, WMDE-FUN-Team, WMDE-Fundraising-Tech: Find a better way for WMDE to get impression counts for their banners - https://phabricator.wikimedia.org/T243092 (fdans) hellooo ping on this
[17:50:13] Analytics, Analytics-Kanban, Dumps-Generation: Some xml-dumps files don't follow BZ2 'correct' definition - https://phabricator.wikimedia.org/T243241 (fdans) p:Triage→Medium
[17:50:56] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Add new dimensions to virtual_pageview_hourly and pageview_hourly - https://phabricator.wikimedia.org/T243090 (fdans) p:Triage→High
[17:51:51] Analytics: Spike. Try to ML models distributted in jupyter notebooks with dask - https://phabricator.wikimedia.org/T243089 (fdans) p:Triage→Medium
[17:52:32] Analytics, Analytics-Kanban: Mediawiki history documentation for public dataset release - https://phabricator.wikimedia.org/T243426 (fdans) Open→Resolved
[17:55:19] Analytics, Fundraising-Backlog, WMDE-Analytics-Engineering, WMDE-FUN-Team, WMDE-Fundraising-Tech: Find a better way for WMDE to get impression counts for their banners - https://phabricator.wikimedia.org/T243092 (DStrine) I'm not sure what information you are looking for. pgehres is only acce...
[17:56:24] Analytics, Analytics-Wikistats: Wikimedia Statistics - Confusing number formatting/rounding on the vertical axis scale on the "Page to date" statistics graph - https://phabricator.wikimedia.org/T242790 (fdans) Thank you for filing! This is a regression, let's replace the rounding/scaling/number formattin...
[17:56:59] Analytics, Analytics-Wikistats: Wikimedia Statistics - Confusing number formatting/rounding on the vertical axis scale on the "Page to date" statistics graph - https://phabricator.wikimedia.org/T242790 (fdans) p:Triage→High
[17:57:42] Analytics, Analytics-Kanban: Data quality Dashboards 2.0 - https://phabricator.wikimedia.org/T242995 (fdans)
[17:57:47] Analytics, Fundraising-Backlog, WMDE-Analytics-Engineering, WMDE-FUN-Team, WMDE-Fundraising-Tech: Find a better way for WMDE to get impression counts for their banners - https://phabricator.wikimedia.org/T243092 (kai.nissen) I have no insight into the process on WMF's side. This is what is be...
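Moving from a monthly run to daily runs, as discussed above, means each day is processed on its own and the daily outputs are then combined. This is a minimal sketch assuming the per-day output is additive (for example, counts keyed by some dimension). `process_day` is a hypothetical stand-in for the real job, not anything from the log.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Callable, Iterator


def days_in_range(start: date, end: date) -> Iterator[date]:
    """Yield each day from start (inclusive) to end (exclusive)."""
    current = start
    while current < end:
        yield current
        current += timedelta(days=1)


def run_daily_and_merge(start: date, end: date,
                        process_day: Callable[[date], Counter]) -> Counter:
    """Run a (hypothetical) per-day job and merge the daily outputs.

    Merging by summation only works for additive results such as counts.
    """
    total: Counter = Counter()
    for day in days_in_range(start, end):
        total.update(process_day(day))  # Counter.update adds counts together
    return total
```

For example, `run_daily_and_merge(date(2020, 2, 1), date(2020, 3, 1), process_day)` runs 29 daily jobs for February 2020. Summation is only valid for additive metrics. Distinct counts or percentiles cannot be merged this way; the daily job would have to keep mergeable intermediate state instead.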
[17:57:56] Analytics-Kanban: Data quality Dashboards 2.0 - https://phabricator.wikimedia.org/T242995 (fdans)
[18:08:57] Analytics, DBA, Data-Services, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511 (bd808)
[18:09:00] Analytics, Data-Services: Document the process for importing a new "datasets_p" table - https://phabricator.wikimedia.org/T173514 (bd808)
[18:09:05] Analytics, Data-Services: Create a database on the wikireplica servers called "datasets_p" - https://phabricator.wikimedia.org/T173513 (bd808)
[18:10:12] djellel: ping again?
[18:10:49] ok - killing the monthly job djellel
[18:10:53] +1
[18:12:20] djellel: I'm gonna let the other one live, as it is smaller (and therefore will finish soon), but limitations should also apply to daily jobs please
[18:13:00] djellel: also please help us and keep an eye on your jobs, we are here to help but we asked you three times (plus an email) the same thing today :)
[18:17:13] Analytics: centralnotice events do not have data - https://phabricator.wikimedia.org/T244771 (Nuria)
[18:18:45] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: centralnotice events do not have data - https://phabricator.wikimedia.org/T244771 (AndyRussG)
[18:19:32] * elukey off!
[18:28:30] Hey everyone :)
[18:28:45] o/
[18:29:17] I'm not sure how to access Hive. I'm on stat1005 and `hive --database wmf` is throwing an error
[18:29:44] foks: do you have a kerberos account?
[18:29:53] oh hm. Is that new?
[18:30:01] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide#Get_a_password_for_Kerberos
[18:30:02] yeah
[18:30:03] Probably not heh. Been a while since I last had to use it
[18:30:11] you must have missed some announcements
[18:30:12] :)
[18:30:13] Cool thanks, I'll do that
[18:30:22] Yeah it's rare I have to pull stuff out of Hive :)
[18:30:45] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Refining is failing to refine centranoticeimpression events - https://phabricator.wikimedia.org/T244771 (Nuria)
[18:30:53] hmm, how do I check that my "shell username is in analytics-privatedata-users"
[18:31:07] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Refining is failing to refine centranoticeimpression events - https://phabricator.wikimedia.org/T244771 (Nuria) a:Ottomata
[18:31:17] I imagine it is since I've pulled in the past
[18:31:23] ottomata: please take a look at this: https://phabricator.wikimedia.org/T244771
[18:31:40] looking
[18:32:34] Analytics: Create a Kerberos identity for foks - https://phabricator.wikimedia.org/T244773 (jrbs)
[18:38:17] Analytics, Fundraising-Backlog, WMDE-Analytics-Engineering, WMDE-FUN-Team, WMDE-Fundraising-Tech: Find a better way for WMDE to get impression counts for their banners - https://phabricator.wikimedia.org/T243092 (Nuria) @kai.nissen I see so data is not sensitive in nature and seems that the b...
[18:38:52] Analytics-Kanban: Add Presto to Analytics' stack - https://phabricator.wikimedia.org/T243309 (Nuria)
[18:47:08] ottomata: let me know if you need another pair of eyes
[18:48:28] nuria:
[18:48:33] someone changed a field type
[18:48:34] https://meta.wikimedia.org/w/index.php?title=Schema%3ACentralNoticeImpression&type=revision&diff=19511351&oldid=19510146
[18:48:37] AndyRussG: ^
[18:49:01] i'm not sure why we aren't getting an error
[18:49:09] but it looks like there is some attempt at casting going on?
[18:49:11] in refine?
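On the question above about checking membership in analytics-privatedata-users: running `id` on a stat host lists your groups. The sketch below does the same check from Python. It assumes the groups are visible through the host's normal group database (NSS), which is an assumption about the stat hosts and is not confirmed in the log.

```python
import grp


def user_in_group(user: str, group: str) -> bool:
    """Return True if `user` is listed as a member of `group`.

    Reads the system group database. Note that gr_mem only lists
    supplementary members: a user whose *primary* group is `group`
    will not appear there.
    """
    try:
        entry = grp.getgrnam(group)
    except KeyError:  # the group does not exist on this host
        return False
    return user in entry.gr_mem
```

For example, `user_in_group(getpass.getuser(), "analytics-privatedata-users")` checks the current login. That call needs `import getpass`.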
[18:49:25] the hive schema has the field with an array
[18:49:40] | |-- campaignStatuses: array (nullable = true)
[18:49:41] | | |-- element: string (containsNull = true)
[18:49:42] campaignStatuses:array
[18:50:26] hehe, nuria mep schema CI won't let you do this! so we can look forward to that :)
[19:06:51] Analytics, Analytics-Data-Quality, Analytics-Kanban, Product-Analytics: Bot field in edits_hourly dataset ignores username - https://phabricator.wikimedia.org/T244632 (nshahquinn-wmf) @fdans I assume you meant to add your Kanban board? 🙂
[19:10:57] ottomata: Shouldn't alarms have fired?
[19:11:06] ottomata: cause it is a bit worrisome they did not
[19:11:10] nuria: i'm looking to see why they didn't
[19:11:13] but in this case there was no failure
[19:11:14] ottomata: k
[19:11:16] from refine's PoV
[19:11:24] it says it refined 1 or 2 datasets each time
[19:11:27] ping @AndyRussG
[19:11:38] ping AndyRussG
[19:12:07] Analytics, Better Use Of Data, Performance-Team, Product-Infrastructure-Team-Backlog, Product-Analytics (Kanban): Switch mw.user.sessionId back to session-cookie persistence - https://phabricator.wikimedia.org/T223931 (mpopov)
[19:12:20] AndyRussG: i think you need to revert your schema change and we can re-refine the data
[19:13:06] nuria: i don't think that will work
[19:13:22] assuming his clients have been emitting strings since october
[19:13:23] instead of arrays
[19:13:37] we can alter the table to the way it is now, and re-refine
[19:13:42] first i'm trying to find out what happened
[19:13:46] why not error
[19:15:06] Analytics, Better Use Of Data, Performance-Team, Product-Analytics, Product-Infrastructure-Team-Backlog: Switch mw.user.sessionId back to session-cookie persistence - https://phabricator.wikimedia.org/T223931 (mpopov)
[19:15:35] Analytics, Better Use Of Data, Performance-Team, Product-Analytics, Product-Infrastructure-Team-Backlog: Switch mw.user.sessionId back to session-cookie persistence - https://phabricator.wikimedia.org/T223931 (mpopov) a:jlinehan Jason and I talked about picking this up this quarter
[19:17:20] Analytics, Event-Platform, Wikimedia-Extension-setup, Patch-For-Review, and 2 others: Deploy EventStreamConfig extension - https://phabricator.wikimedia.org/T242122 (Jdforrester-WMF)
[19:18:31] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Refining is failing to refine centranoticeimpression events - https://phabricator.wikimedia.org/T244771 (Nuria) pinging @DStrine so he knows this is going on. @AndyRussG Issue can be tracked to this change: https://meta.wikimedia....
[19:21:00] ottomata: ok, i see, it was a non backwards compatible change
[19:30:52] Analytics, DBA, Data-Services, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511 (Superyetkin) Is there any estimate as to when we will start using this database on replica servers?
[19:31:12] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Refining is failing to refine centranoticeimpression events - https://phabricator.wikimedia.org/T244771 (Ottomata) @AndyRussG [[ campaignStatuses | changed the type of the campaignStatuses field ]]. He added the field at 15:40, 31 O...
[19:31:19] ah sorry nuria i had ^ writeup but didn't hit submit
[19:51:34] Analytics, Operations, serviceops, vm-requests, User-Elukey: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (Dzahn) Just to confirm. It should still have a public IP in wikimedia.org ?
[19:55:14] nuria: hi! sorry I was on a bunch of calls
[19:56:10] I'll dig in in a bit! Assuming this is only for the EL-based data pipeline, it's not that urgent... thanks so much ottomata nuria for flagging this!
[19:57:20] AndyRussG: there are other users of that data that are not only FR right? Cause banner impressions are printed for wiki loves documents /wmde and others correct?
[19:57:51] AndyRussG: and that banner data also ends up on that table? or is this data only restricted to FR banners?
[19:58:12] nuria: others are still using the old beacon/impression system too
[19:58:36] AndyRussG: is the data sent to this eventlogging schema sampled in any way?
[19:59:49] yep! client-side sampled at 1%, unless any CentralNotice campaign turns up the sample rate for that campaign specifically, which I don't think is happening...
[20:00:23] AndyRussG: I see, so it is of limited use
[20:01:05] AndyRussG: is there a reason why it cannot be unsampled so other (non FR use cases) also benefit from this data?
[20:01:42] Analytics, DBA, Data-Services, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511 (bd808) >>! In T173511#5866295, @Superyetkin wrote: > Is there any estimate as to when we will start using this da...
[20:01:52] nuria: sure it can be unsampled... I mean, the same is true for the old beacon/impression system. Sample rate can be adjusted on a per-campaign basis as required
[20:02:33] usually the sampled data has been good enough for non-FR campaigns, since they just need a general sense of how many people see the banners, and are not running A/B tests that need a degree of statistical significance
[20:03:05] joal: yt? and got a few mins for some fun spark schema stuff?
[20:03:11] Also there's the old wider issue of how to make the data more easily available to non-FR campaigns, but that's no different in the old vs new systems
[20:03:15] sure ottomata - batcave?
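The CentralNoticeImpression problem above came from a field changing type between schema revisions. The Hive table has `campaignStatuses` as an array of strings, while the clients have been emitting strings. The sketch below flags that kind of change by comparing the top-level `type` of each property across two JSON-schema revisions. It is a hypothetical helper, not the MEP schema CI that ottomata mentions.

```python
from typing import Dict, List


def incompatible_type_changes(old_schema: Dict, new_schema: Dict) -> List[str]:
    """List top-level properties whose JSON-schema "type" differs between
    two schema revisions.

    Added or removed properties are ignored; only properties present in
    both revisions are compared, and nested "items" types are not checked.
    """
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    problems = []
    for name in sorted(old_props.keys() & new_props.keys()):
        old_type = old_props[name].get("type")
        new_type = new_props[name].get("type")
        if old_type != new_type:
            problems.append(f"{name}: {old_type} -> {new_type}")
    return problems
```

Given an old revision with `"campaignStatuses": {"type": "array", ...}` and a new one with `"campaignStatuses": {"type": "string"}`, the function returns `["campaignStatuses: array -> string"]`. A CI step could then reject the edit, instead of letting Refine process the data without reporting any error.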
[20:03:18] ya
[20:05:55] nuria: I mean, with the new system in place, I imagine the question of making the data more widely available may be easier to unblock, but it's not something that's come up per se recently
[20:06:22] Seddon is the one who I think sometimes provides general feedback or aggregate data to community campaigns when they ask
[20:24:18] Analytics, Multimedia, Tool-Pageviews: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (Tgr) Old versions of images are also something not easily expressed as a file page URL. For now, you only see them when you visit the file page (when you uploa...
[20:39:13] Analytics, Inuka-Team, Product-Analytics: Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (nshahquinn-wmf) >>! In T244548#5865692, @fdans wrote: > @nshahquinn-wmf Speaking for the team: that approach sounds good! Okay, thank you!
[20:47:31] Analytics, Multimedia, Tool-Pageviews: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (MusikAnimal) > without relying on the API you can't tell whether `en.wikisource.org/wiki/File:Speed_Limit_50_Minimum_5_sign.svg` refers to a file on Commons or...
[21:26:57] Analytics, Analytics-EventLogging, Analytics-Kanban, Performance-Team, MW-1.35-notes (1.35.0-wmf.19; 2020-02-11): EventLogging needs to enque events to avoid draining users' battery on mobile - https://phabricator.wikimedia.org/T225578 (Krinkle) @Milimetric Is this task ready to resolve?
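On the mediarequests request above: a file page link can be split into the wiki host and the file name without any API call. As MusikAnimal points out, though, the URL alone does not say whether the file is hosted locally or on Commons. This is a minimal sketch assuming the English `File:` namespace prefix under a `/wiki/` path. Localized namespace prefixes and old file revisions are not handled, and `parse_file_page_url` is a hypothetical helper.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlsplit


class FilePageRef(NamedTuple):
    host: str                  # wiki the link points at, e.g. en.wikisource.org
    file_name: str             # title without the "File:" prefix, spaces as "_"
    repository: Optional[str]  # unknown from the URL alone (local vs Commons)


def parse_file_page_url(url: str) -> FilePageRef:
    """Split a /wiki/File:... URL into host and file name.

    Working out whether the file is local or on Commons would need an API
    lookup, so `repository` is always None here.
    """
    parts = urlsplit(url if "://" in url else "https://" + url)
    prefix = "/wiki/File:"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not a file page URL: {url}")
    name = unquote(parts.path[len(prefix):]).replace(" ", "_")
    return FilePageRef(host=parts.hostname or "", file_name=name, repository=None)
```

For the example in the ticket, `parse_file_page_url("en.wikisource.org/wiki/File:Speed_Limit_50_Minimum_5_sign.svg")` yields host `en.wikisource.org` and file name `Speed_Limit_50_Minimum_5_sign.svg`, with the repository left unresolved.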
[21:27:01] Analytics, Analytics-EventLogging, Analytics-Kanban, MW-1.35-notes (1.35.0-wmf.19; 2020-02-11), Performance-Team (Radar): EventLogging needs to enque events to avoid draining users' battery on mobile - https://phabricator.wikimedia.org/T225578 (Krinkle)
[21:28:46] Analytics, Analytics-EventLogging, Analytics-Kanban, MW-1.35-notes (1.35.0-wmf.19; 2020-02-11), Performance-Team (Radar): EventLogging needs to enque events to avoid draining users' battery on mobile - https://phabricator.wikimedia.org/T225578 (Milimetric) Moved to done, Nuria likes to look t...
[21:50:10] (PS1) Ottomata: [WIP] Warn when merging incompatible types; FAILFAST when reading JSON data with a schema [analytics/refinery/source] - https://gerrit.wikimedia.org/r/571365 (https://phabricator.wikimedia.org/T244771)
[21:50:33] joal: i'm not sure if this is how I want to do it yet
[21:50:41] i keep waffling on where to put canCast
[21:50:41] but
[21:50:51] i think just warning in HiveExtensions makes sense
[21:50:56] FAILFAST for json
[21:51:11] we'll fail on the convertToSchema in DFtoHive anyway
[21:51:20] it seems maybe more right to do what merge says it will
[21:51:23] AHHH i don't know
[21:51:26] still thinking about it
[21:51:29] but, i gotta run soon
[21:51:34] so i'll park that there for today
[21:51:37] thanks for your help :)
[21:51:53] Analytics, Analytics-Kanban, Patch-For-Review: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (Milimetric) I think it's worth thinking through a migration to a hypothetical v3, just to be careful. So I did that today and I reason that a v2 -> v3 m...
[23:24:30] Analytics, Multimedia, Tool-Pageviews: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (Nuria) Thanks for following up. >Old versions of images are also something not easily expressed as a file page URL.
It is worth having in mind that our core...
[23:33:58] Analytics, Fundraising-Backlog, WMDE-Analytics-Engineering, WMDE-FUN-Team, WMDE-Fundraising-Tech: Find a better way for WMDE to get impression counts for their banners - https://phabricator.wikimedia.org/T243092 (Nuria) @kai.nissen the data on events/centralnoticeimpression is sampled 1% W...
[23:36:06] Dear Analytics. Congrats on announcing the mediawiki history dumps. This is big. :)
[23:37:36] I have a question for you. I'm looking at yowiki.all-time dataset and in the first 10 rows I see dataset entries like "r2.7.3) (Bot: Ìfikún [[hy:(256) Վալպուրգա]]". I'm assuming the parenthesis is broken in this case. Is this a known issue?
[23:41:19] leila: do please file a ticket for that, it is likely a parsing issue that might appear in other lines as well
[23:41:39] nuria: on it.
[23:45:20] Analytics, Analytics-Wikistats: Mediawiki History Dumps - Possible parsing issue - https://phabricator.wikimedia.org/T244807 (leila)
[23:59:39] Analytics, Operations, Performance-Team, Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (Krinkle)
[23:59:58] Analytics, Operations, Performance-Team, Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (Krinkle)