[00:48:56] PROBLEM - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:56:34] PROBLEM - Check the last execution of monitor_refine_eventlogging_analytics_failure_flags on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_analytics_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:00:34] Analytics-Radar, MediaWiki-extension-requests: "Reverted edits" view for Contributions page - https://phabricator.wikimedia.org/T186536 (DannyS712) Not sure it needs to be an extension, {T248775} and {T254074}
[06:25:44] Analytics-Clusters, Discovery, Discovery-Search, Patch-For-Review: Move mjolnir kafka daemon from ES to search-loader VMs - https://phabricator.wikimedia.org/T258245 (elukey) The main blocker at the moment seems to be the fact that mjolnir runs in two places: * `role::elasticsearch::cirrus`, tha...
[06:44:37] !log truncate big log file on an-launcher1002 that is filling up the /srv partition
[06:44:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:45:43] Analytics: RU reportupdater-ee-beta-features keeps logging a lot of daily errors to its logs - https://phabricator.wikimedia.org/T256195 (elukey) @mforns the /srv partition on an-launcher1002 was already filling up, we should prioritize this and see if we can remove these errors from the logs :(
[07:14:00] elukey: could we get kerberos for agaduran? pinging you here as you suggested in https://phabricator.wikimedia.org/T258214 but can also open a separate ticket. thanks
[07:22:53] mgerlach: good morning! Standard fee is 5 euros! :D
[07:25:58] mgerlach: (done :)
[07:26:01] naah, Kerberos follows a freemium model, you get the Kerberos principal for free, but to run kinit more than once per week you need to buy the enterprise plan
[07:26:35] moritzm: ahhhh I was unaware of this, I am running my own illegal business though, don't tell researchers!
[07:27:44] elukey: :) thanks
[07:28:44] mgerlach: there should be an email with the tmp pass sent to agaduran's email, etc.. you know how it works :) if you need anything ping me!
[07:31:05] elukey: I will check with agaduran; thanks again
[08:19:38] !log reset-failed the monitor_refine_failures for eventlogging on an-launcher1002
[08:19:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:21:07] RECOVERY - Check the last execution of monitor_refine_eventlogging_analytics_failure_flags on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_analytics_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:23:54] RECOVERY - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:49:19] Analytics-Clusters: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (elukey) @Isaac in theory there shouldn't be a lot of issues, but if you want to make sure you can try to install them on stat1005/stat1008 that are already running debian 10 (just to double check tha...
[08:52:41] Analytics-Radar, MediaWiki-extension-requests: "Reverted edits" view for Contributions page - https://phabricator.wikimedia.org/T186536 (Ostrzyciel) If I understand how Special:Contributions' //Tag filter// field works, once T254074 is complete, it should be possible to show only edits that were reverted...
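The 06:44 `!log` entry truncates a big log file instead of deleting it. Below is a minimal Python sketch of that kind of clean-up, assuming shell access to the host; the helper names (`largest_files`, `truncate_in_place`, `usage_percent`) are hypothetical and not part of any Wikimedia tooling. Truncating in place matters when a daemon still holds the log open: the blocks are released at once, while `rm` leaves the space allocated until the writer closes the file.

```python
import os
import shutil
from pathlib import Path


def largest_files(root, limit=5):
    """Return up to `limit` (size_bytes, Path) pairs for the biggest regular files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.is_file() and not path.is_symlink():
                    sizes.append((path.stat().st_size, path))
            except OSError:
                # File vanished or is unreadable between listing and stat; skip it.
                continue
    sizes.sort(key=lambda item: item[0], reverse=True)
    return sizes[:limit]


def truncate_in_place(path):
    """Empty a file without unlinking it, so its blocks are freed even if a process keeps it open.

    Caveat: a writer that did not open the file with O_APPEND continues at its old
    offset, which leaves a sparse hole at the start of the file.
    """
    os.truncate(path, 0)


def usage_percent(mount):
    """Percentage of the filesystem holding `mount` that is in use."""
    usage = shutil.disk_usage(mount)
    return 100.0 * usage.used / usage.total
```

In practice you would check `usage_percent("/srv")`, inspect `largest_files("/srv")`, and only truncate a file once you are sure its contents are no longer needed.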
[08:56:30] Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T256356 (elukey) @Groceryheist of course, no problem! What we usually ask is either to delete or to move the files under somebody else's ownership, to have better management of PII data. I would prefer that the files would m...
[09:41:14] Analytics-Clusters, Operations, User-MoritzMuehlenhoff: Replace firejail use in superset with native systemd features - https://phabricator.wikimedia.org/T258700 (elukey) p:Triage→Medium
[12:14:56] * elukey lunch!
[12:53:36] hi teammm
[13:03:19] helooo
[13:08:12] hello!
[13:08:25] mforns: when you have a moment I'd need to discuss with you the druid upgrade :)
[13:08:49] elukey: sure! now?
[13:10:02] mforns: yep! So basically the new version is ready on analytics1041, in hadoop test
[13:10:20] aha
[13:10:24] I did some tests myself, and it looks like it's working, but I'd need somebody to double check
[13:10:27] what I did was
[13:10:40] 1) kafka supervisor ingestion for netflow (basically indexation from kafka)
[13:10:49] 2) simple parquet ingestion from webrequest
[13:11:02] 3) simple ingestion from webrequest (json)
[13:11:16] there are a lot of changes etc..
[13:11:40] the idea is to upgrade the druid public cluster first (the one that serves data to AQS)
[13:11:41] aha
[13:11:59] since worst case scenario we rollback to 0.12.3 and just re-index the history snapshot
[13:12:32] what I'd need, if/when you have time, would be to test it and see if I forgot anything
[13:12:48] aha, because all it contains for AQS is MediaWikiHistory-based, right?
[13:13:24] elukey: ok
[13:13:47] yes exactly
[13:14:14] Your tests seem pretty exhaustive
[13:14:19] I never did streaming-ingestion from kafka
[13:14:57] elukey: The only thing that I can think of that we could try is re-ingesting from already ingested data
[13:16:11] like recompacting a datasource into monthly segments, from already ingested data
[13:16:19] ah yes yes
[13:16:27] *from an already ingested daily datasource
[13:18:04] Also, being super nit-picky, testing slow schema increments, like 1st ingest datasource D with schema S (3 fields)
[13:18:41] and then 2nd ingest the next time partition of D with an incremented schema S+ (with 2 extra fields)
[13:18:51] and see if there are issues
[13:19:27] we can do those tests if you want, in test we have navtiming and webrequest sampled basically
[13:19:32] and maybe also (heh, things start coming to my mind) play a bit with Druid's yaml config
[13:19:47] ok
[13:19:51] ssh -N analytics1041.eqiad.wmnet -L 8081:localhost:8081
[13:19:53] this is the new UI
[13:21:03] ooooooh!
[13:21:22] this looks much better :D
[13:21:50] excited to see if they improved ingestion logs!
[13:22:46] this replaces overlord + coordinator ui, but it is not at full power (for example, no SQL available etc..) since we'd need a new daemon called "router"
[13:23:01] but if needed we can do it after the upgrade
[13:24:20] elukey: but still a huge improvement! the ingestion logs look the same, but they are accessible from the ui :D
[13:27:03] I'm seeing now that the UI is a bit buggy
[13:29:06] Analytics-Clusters, Patch-For-Review, User-Elukey: Upgrade Druid to its latest upstream version (currently 0.18.1) - https://phabricator.wikimedia.org/T244482 (elukey) Test that I did: * one off parquet ingestion with: ` curl -X 'POST' -H 'Content-Type:application/json' -d '{ "type" : "index_hado...
[13:29:23] mforns: --^ are the tests done with specs
[13:31:09] elukey: what do you mean? is that a question?
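mforns suggests re-ingesting an already ingested daily datasource into monthly segments. One way to express that in Druid 0.18 is a native batch `index_parallel` task whose input source is the existing datasource (the `druid` input source). The Python below only builds such a spec as a dict; the helper name and the datasource names are hypothetical, and the field names are believed to match the 0.18 native batch docs but should be verified there before anything is submitted (tasks go to the Overlord's `/druid/indexer/v1/task` endpoint). One design point: metrics that are already rolled up, such as a `count`, must be re-aggregated with `longSum`, otherwise the new task would count rolled-up rows instead of original events.

```python
import json


def monthly_reindex_spec(source, target, interval, dimensions, sum_metrics):
    """Build a hypothetical index_parallel task that reads an existing Druid
    datasource and rewrites it with MONTH segment granularity.

    Assumption: field names follow the Druid 0.18 native batch documentation;
    check them against the docs for your version before submitting.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": target,
                # Keep each row's existing Druid timestamp.
                "timestampSpec": {"column": "__time", "format": "millis"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                # Already rolled-up counters are summed, not re-counted.
                "metricsSpec": [
                    {"type": "longSum", "name": m, "fieldName": m} for m in sum_metrics
                ],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "MONTH",
                    "queryGranularity": "DAY",
                    "intervals": [interval],
                    "rollup": True,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                # Read back segments of the already ingested datasource.
                "inputSource": {"type": "druid", "dataSource": source, "interval": interval},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


if __name__ == "__main__":
    spec = monthly_reindex_spec(
        "example_daily", "example_monthly", "2020-06-01/2020-07-01", ["wiki"], ["count"]
    )
    print(json.dumps(spec, indent=2))
```

Calling the same helper a second time for the next interval with two more dimensions would be one way to exercise the schema-increment check (S with 3 fields, then S+ with 2 extra fields).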
[13:31:23] oh, you mean that those are the tests you did right?
[13:31:28] ok, thanks! :D
[13:32:27] yes yes sorry :D
[13:33:08] hey mforns
[13:33:26] I need some help thinking about history data when you have a chance
[13:33:27] heya milimetric
[13:33:41] sure, wanna bc?
[13:33:44] omw
[13:33:57] ok
[13:38:07] Analytics, Operations, Patch-For-Review: Move Hue to a Buster VM - https://phabricator.wikimedia.org/T258768 (herron) p:Triage→Medium
[13:41:58] Analytics, Operations, Patch-For-Review: Move Hue to a Buster VM - https://phabricator.wikimedia.org/T258768 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[13:55:14] Analytics-Clusters: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (Isaac) > but if you want to make sure you can try to install them on stat1005/stat1008 that are already running debian 10 (just to double check that nothing explodes etc..) Ahh good point -- done and...
[14:43:30] mforns: I was wrong about user names, my query must have not taken into account end_timestamp is null, there are zero problems with that
[14:43:50] and a tiny tiny percent of problems with groups overall, so it must be a handful of special cases like bureaucrats and admins
[14:43:57] (judging by the count, it's like 0.02%)
[14:54:33] milimetric: oh! feewwww :D
[14:55:28] so it's just the sysops group?
[15:01:52] ping ottomata
[15:03:40] AH
[15:03:48] Analytics-EventLogging, Analytics-Radar, QuickSurveys, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (phuedx) >>! In T220627#6283034, @Isaac wrote: > Just to clarify, there were two unexpl...
[15:25:57] Analytics-Clusters, Discovery, Discovery-Search, Patch-For-Review: Move mjolnir kafka daemon from ES to search-loader VMs - https://phabricator.wikimedia.org/T258245 (EBernhardson) We can kill the relforge installation of the daemons. The msearch daemon lets us run search queries, but the plan is...
[15:41:46] Analytics-Radar, Product-Analytics, Release-Engineering-Team, Repository-Admins: Create a repository and user for Product Analytics Oozie jobs - https://phabricator.wikimedia.org/T230743 (fdans)
[15:47:20] Analytics-EventLogging, Analytics-Radar, Contributors-Team, MobileFrontend: Schema:MobileWebEditing: What are commons sorts of errors? - https://phabricator.wikimedia.org/T118366 (fdans)
[15:52:15] Analytics-Radar, Operations, Patch-For-Review: Move Hue to a Buster VM - https://phabricator.wikimedia.org/T258768 (fdans)
[16:33:50] Analytics, Product-Analytics: Investigate accessing superset via internal VPN - https://phabricator.wikimedia.org/T258962 (Nuria)
[16:36:50] Analytics, Product-Analytics: Investigate accessing superset via internal VPN - https://phabricator.wikimedia.org/T258962 (Nuria) @MoritzMuehlenhoff We are concerned that being able to access superset just with a yubi key might be too big of a barrier for many of the users we need to support. Could we a...
[16:38:39] Analytics, Product-Analytics: Investigate accessing superset via internal VPN - https://phabricator.wikimedia.org/T258962 (elukey) @Nuria we decided not to do pursue the VPN road in other tasks, what kind of barrier a yubikey should represent? It will definitely more problematic to explain to people how...
[16:41:22] nuria: is there still time in the meeting to talk about superset?
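milimetric's 14:43 correction comes down to filtering a history table to each user's current state, where `end_timestamp is null` marks the latest row; counting every row also counts superseded states. The Python below is a minimal sketch of that filter over in-memory rows. The field names (`user_groups`, `end_timestamp`) and the sample rows are assumptions for illustration, not the real table schema or data.

```python
from collections import Counter


def current_group_counts(rows):
    """Count users per group using only rows whose end_timestamp is None (the current state).

    Assumption: each row is a dict with a list under "user_groups" and an
    "end_timestamp" that is None for the latest state of a user.
    """
    counts = Counter()
    for row in rows:
        if row["end_timestamp"] is None:
            for group in row["user_groups"]:
                counts[group] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical rows: user 1 was a sysop, then also became a bureaucrat.
    rows = [
        {"user_id": 1, "user_groups": ["sysop"], "end_timestamp": "2019-01-01"},
        {"user_id": 1, "user_groups": ["sysop", "bureaucrat"], "end_timestamp": None},
        {"user_id": 2, "user_groups": [], "end_timestamp": None},
    ]
    print(current_group_counts(rows))
```

Without the `end_timestamp` check, user 1 would be counted as a sysop twice.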
[16:42:06] elukey: yes, but kzeta is no longer here, i think having her on the meeting will be best
[16:42:58] nuria: ack then let's set up some specific meeting, the VPN solution is way harder in my opinion (for us and users)
[16:48:47] Analytics, Product-Analytics: Investigate accessing superset via internal VPN - https://phabricator.wikimedia.org/T258962 (elukey) For reference: https://phabricator.wikimedia.org/T242998 (discussion about VPN with the security team)
[16:54:53] Analytics, Product-Analytics: Investigate accessing superset via internal VPN - https://phabricator.wikimedia.org/T258962 (Nuria) @elukey Found old ticket https://phabricator.wikimedia.org/T242998
[16:56:15] Analytics, Product-Analytics, Structured Data Engineering, SDAW-MediaSearch (MediaSearch-Beta), Structured-Data-Backlog (Current Work): Instrument MediaSearch results page - https://phabricator.wikimedia.org/T258183 (CBogen)
[17:01:19] bearloga: o/ - analytics-product user approved by SRE, will try to set it up tomorrow :)
[17:01:34] elukey: yay!! :D thank you!!!!
[17:15:23] !log restart eventlogging on eventlog1002 to update the event whitelist (exclude MobileWebUIClickTracking)
[17:15:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:15:32] not sure if it was needed but I did it anyway :)
[17:29:06] * elukey off!
[17:31:20] Analytics, Analytics-Kanban: Update PageviewDefinition to only include /api/rest_v1/page/mobile-html requests with X-Analytics: pageview=1 in pageviews - https://phabricator.wikimedia.org/T257860 (JoeWalsh) > So we are all on the same page this will *reduce* the number of pageviews, see plot. For 2020/07...
[17:53:41] Analytics: History: mismatched historical and latest values - https://phabricator.wikimedia.org/T258967 (Milimetric)
[18:19:02] fyi im enabling instrumentation on some wikis today which will likely increase the volume of events to Schema:DesktopWebUIActionsTracking. I'll be monitoring https://grafana.wikimedia.org/d/000000566/overview?panelId=23&fullscreen&orgId=1 but please let me know if you see anything funky that requires a revert.
[18:26:30] hey hey - I'm trying to run a query on Superset and am just getting "502 Proxy Error" - a quick search of Phab reveals that this might be because the query is too broad?
[18:26:37] I'll try it with more limits
[18:37:38] yeah I think it's a limit issue. not sure if anyone's familiar otherwise
[18:50:57] milimetric: i missed your post above
[18:51:05] milimetric: ok, so one less thing to worry about
[18:51:36] nuria: wait what do you mean?
[18:51:39] I have to find the bug no?
[18:52:05] milimetric: i would file a ticket
[18:52:16] milimetric: for quality, given that it is 0.02% of data
[18:52:35] nuria: right but it's 50% of the sysop / bureaucrat data across wikis
[18:52:52] (sorry, I buried that at the bottom of the description)
[18:53:14] milimetric: not saying it is not important but it does not seem so urgent that we need to look at it right now
[18:53:38] milimetric: there is also another bug on user names that deals with encoding
[18:53:41] nuria: I should find a different topic for the blogpost then, I can't publish if this part of the data is not clean
[18:53:54] I have a few other options
[18:53:57] milimetric: agreed
[18:54:11] this would've been good though...
[18:54:32] nuria: the other option is I do incremental stuff now and we postpone until we can fix the bug
[18:56:23] *postpone the blog post
[18:56:29] milimetric: i think the data provides a lot of value even with that bug on quality, in my opinion we do not need to hold off on publishing the blogpost for that
[18:56:54] ok, sounds good, will finish it up with a different idea then
[19:09:44] Analytics-Radar, Product-Analytics (Kanban): Check Product Analytics team's standard datasets and remove COUNT(*) - https://phabricator.wikimedia.org/T256025 (cchen) Open→Resolved Removed COUNT(*) from data sources and data tables in Superset. Didn't see COUNT(*) as metrics in datasets in Turnilo.
[19:09:46] Analytics, Product-Analytics: Remove COUNT(*) from datasets when not useful in Superset & Turnilo - https://phabricator.wikimedia.org/T255725 (cchen)
[19:23:53] Analytics, Product-Analytics: Set up environment for Product Analytics system user - https://phabricator.wikimedia.org/T258970 (mpopov)
[19:24:40] Analytics, Product-Analytics: Set up environment for Product Analytics system user - https://phabricator.wikimedia.org/T258970 (mpopov)
[19:24:43] Analytics-Radar, Product-Analytics, Release-Engineering-Team, Repository-Admins: Create a repository and user for Product Analytics Oozie jobs - https://phabricator.wikimedia.org/T230743 (mpopov)
[19:34:16] tzatziki: o/ - can you open a task (when you have time) with the query that you used to break superset? I am interested in reproducing to see what went wrong
[19:35:17] but from https://grafana.wikimedia.org/d/pMd25ruZz/presto?orgId=1 (workers section) I see a ton of data moved
[19:41:16] Analytics, Product-Analytics: Set up environment for Product Analytics system user - https://phabricator.wikimedia.org/T258970 (mpopov)
[20:01:31] Analytics, Analytics-Kanban: Validation rules on eventgate should take max int values into account in order to validate data for an schema - https://phabricator.wikimedia.org/T258659 (Krinkle)
[20:08:55] elukey: yeah I can - I broke it down into 8-hour chunks and it worked
[20:18:48] (PS1) Nuria: Removing outdated IOS pageview code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/616591 (https://phabricator.wikimedia.org/T257860)
[20:46:58] (PS2) Nuria: Removing outdated IOS pageview code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/616591 (https://phabricator.wikimedia.org/T257860)
[21:09:15] Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T256356 (leila) >>! In T256356#6334346, @Groceryheist wrote: > Okay, > For the project with @halfak any risks would arise from the internal histories of historical ores scores of revisions. The rest of the data used in the...
[22:48:15] Analytics, MobileFrontend, Readers-Web-Backlog, XAnalytics: MobileFrontend should use XAnalytics extension - https://phabricator.wikimedia.org/T217859 (Jdlrobson) Adding analytics given conversation on referenced task.
[23:41:13] Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T256356 (Groceryheist) This all sounds fine with me. No CSCW reviews yet. I'll update this thread when the time comes.
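tzatziki got past Superset's "502 Proxy Error" by splitting the query into 8-hour chunks. Below is a minimal Python sketch of that splitting, assuming the query can be parameterized by a half-open time window `[start, end)`. The helper names and the query template are hypothetical. Chunking does not reduce the total data scanned; each smaller request is presumably just short enough to finish before the proxy gives up.

```python
from datetime import datetime, timedelta


def time_chunks(start, end, step=timedelta(hours=8)):
    """Yield consecutive half-open [chunk_start, chunk_end) windows covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end


def chunked_queries(template, start, end, step=timedelta(hours=8)):
    """Render one query per window; `template` uses str.format placeholders {start} and {end}."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return [
        template.format(start=s.strftime(fmt), end=e.strftime(fmt))
        for s, e in time_chunks(start, end, step)
    ]


if __name__ == "__main__":
    # Hypothetical table and column names.
    template = "SELECT count(*) FROM example_table WHERE ts >= '{start}' AND ts < '{end}'"
    for query in chunked_queries(template, datetime(2020, 7, 27), datetime(2020, 7, 28)):
        print(query)
```

A one-day range yields three queries. The results can then be combined client-side, which works directly for additive aggregates such as counts and sums.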
[23:55:52] (PS1) Nuria: For Android and iOS we only count pageviews with x-Analytics marker [analytics/refinery/source] - https://gerrit.wikimedia.org/r/616629 (https://phabricator.wikimedia.org/T257860)
[23:56:07] Analytics, Analytics-Kanban, Patch-For-Review: Update PageviewDefinition to only include /api/rest_v1/page/mobile-html requests with X-Analytics: pageview=1 in pageviews - https://phabricator.wikimedia.org/T257860 (Nuria) Current patch counts only requests with X-Analytics: pageview=1 for iOS and And...
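The T257860 patches restrict app pageviews for `/api/rest_v1/page/mobile-html` requests to those whose X-Analytics header carries `pageview=1`. The real change lives in refinery-source's `PageviewDefinition` (Java). The Python below is only an illustrative sketch of the rule, assuming the header is a semicolon-separated list of `key=value` pairs. The function names are hypothetical, and the sketch ignores every other condition the real definition checks.

```python
MOBILE_HTML_PREFIX = "/api/rest_v1/page/mobile-html"


def parse_x_analytics(header):
    """Parse an X-Analytics value such as 'ns=0;pageview=1' into a dict of strings."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key:
            fields[key] = value
    return fields


def counts_as_app_pageview(uri_path, x_analytics):
    """Hypothetical sketch of the T257860 rule, not the actual PageviewDefinition logic:
    a mobile-html request counts only when X-Analytics carries pageview=1."""
    if not uri_path.startswith(MOBILE_HTML_PREFIX):
        return False  # other request types are outside the scope of this sketch
    return parse_x_analytics(x_analytics or "").get("pageview") == "1"
```

For example, `counts_as_app_pageview("/api/rest_v1/page/mobile-html/Foo", "ns=0;pageview=1")` is true, while the same path with a header lacking `pageview=1` is not counted. This matches JoeWalsh's note that the change will reduce the number of pageviews.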