[00:48:50] PROBLEM - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:24:32] PROBLEM - Check the last execution of monitor_refine_event on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:20:13] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) Spark2 seems not able to run any Yarn jobs, I see the following in the application's logs: ` 20/08/12 07:22:09 ERROR ApplicationMast...
[10:38:33] /away afk! lunch!
[13:32:51] weird spark2 is only bundled with hadoop 2.7 maximum
[13:32:52] mmmm
[13:33:56] Analytics, Analytics-Kanban, Design-Research: Setup and integrate analytics for Design Research Website - https://phabricator.wikimedia.org/T259322 (Volker_E) Was wondering about access rights, but I guess that's a negligible question here. So we're just reusing the scripts of design.wikimedia.org…?!
[14:20:49] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Infrastructure-Data, and 3 others: Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (mforns) Hi @jlinehan! Here's the continuation of the Gerrit discussion about using seconds elapsed instead of t...
[14:57:29] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Infrastructure-Data, and 3 others: Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (jlinehan) Quick thoughts, let's discuss more today though 1. I think perhaps the presence of `tick_interval` is...
[15:03:42] nuria: standup?
[15:34:19] so I found the problem
[15:34:23] if I add --conf spark.yarn.am.memory=4g everything works
[15:34:42] and I remember Luca from the past discovering the same issue (and not writing it down sigh)
[15:34:59] by default it is 512m afaics
[15:36:42] https://spark.apache.org/docs/2.4.4/running-on-yarn.html
[15:36:50] the docs suggest that it is related to only client mode
[15:38:08] Analytics-Radar, Performance-Team, MW-1.36-notes (1.36.0-wmf.4; 2020-08-11): Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (Ottomata) Ok @Gilles `maximum` and `minimum` can now be added. I edited NavigationTiming and added them for the `connectStart` and `connectEnd` f...
[15:38:16] ahhh of course, spark2-shell deploy mode is client, and the druid indexation timers are client as well (due to the alarming problems)
[15:39:10] elukey: ops sync or skip?
[15:39:56] ottomata: we can do it quickly if you have time, this spark thing is interesting
[15:40:20] ok!
[15:40:52] Analytics-Radar, Epic, MW-1.35-notes (1.35.0-wmf.32; 2020-05-12), Platform Team Initiatives (Revision Storage Schema Improvements), Technical-Debt: Remove revision_comment_temp and revision_actor_temp - https://phabricator.wikimedia.org/T215466 (AMooney) a:Anomie→None
[15:55:21] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (mpopov) I have a strong suspicion that this bug also affects `mw.eventLog.submit()` b...
[16:05:48] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Infrastructure-Data, and 3 others: Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (mforns) @jlinehan > 1. I think perhaps the presence of tick_interval is confusing here. In practice, we are n...
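Editor's note on the spark.yarn.am.memory messages above (15:34 to 15:38): per the linked Spark "Running on YARN" page, `spark.yarn.am.memory` sizes the YARN Application Master only in client mode, while in cluster mode the AM hosts the driver and `spark.driver.memory` applies instead. A minimal sketch of that choice follows; the helper names and the `job.py` application are hypothetical and not taken from the log.

```python
from typing import List


def am_memory_conf(deploy_mode: str, memory: str) -> List[str]:
    """Return the spark-submit --conf pair that sizes the YARN AM.

    Hypothetical helper: in client mode the AM is a separate small
    container sized by spark.yarn.am.memory; in cluster mode the AM
    runs the driver, so spark.driver.memory is the relevant setting.
    """
    if deploy_mode == "client":
        key = "spark.yarn.am.memory"
    elif deploy_mode == "cluster":
        key = "spark.driver.memory"
    else:
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    return ["--conf", f"{key}={memory}"]


def spark_submit_command(deploy_mode: str, memory: str, app: str) -> List[str]:
    """Build an argument list for spark-submit on YARN (app path is illustrative)."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        *am_memory_conf(deploy_mode, memory),
        app,
    ]


# The workaround from the log: 4g for the AM in client mode.
print(" ".join(spark_submit_command("client", "4g", "job.py")))
```

This would explain why only client-mode workloads (spark2-shell, the druid indexation timers) were affected in the conversation.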
[16:18:17] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) After a lot of debugging, the issue seemed to be related to the AM getting killed for too much virtual memory used: ` 2020-08-12 14:...
[16:18:28] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (jlinehan) >>! In T259917#6379915, @mpopov wrote: > I have a strong suspicion that thi...
[16:28:02] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Infrastructure-Data, and 3 others: Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (jlinehan) >>! In T248987#6379951, @mforns wrote: > I mentioned that in Gerrit CR: The advantage I see is: inter...
[16:48:17] ottomata: I found another way to fix the spark issue, namely yarn.nodemanager.vmem-pmem-ratio
[16:48:41] default is 2.1 (2 to 1), I bumped it to 5 to 1
[16:48:43] it seems fine
[16:50:15] also druid indexing seems working fine
[16:51:36] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) yarn.nodemanager.vmem-pmem-ratio set to 5.1 (5 to 1, default 2 to 1) seems also to solve the issue!
[16:56:56] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (mpopov) Nevermind, `test.event` is a stream in `wgEventStreams` but is not registered...
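Editor's note on the two fixes above (16:18 and 16:48): both raise the virtual-memory ceiling that the NodeManager enforces on the AM container, which is roughly the container's physical allocation multiplied by `yarn.nodemanager.vmem-pmem-ratio`. The sketch below works through that arithmetic under stated assumptions that are not confirmed by the log: the stock defaults of a 384 MB minimum / 10% AM memory overhead, `yarn.scheduler.minimum-allocation-mb` of 1024, and a scheduler that rounds requests up to a multiple of that minimum. The test cluster's real values may differ.

```python
import math


def am_container_mb(am_memory_mb: int,
                    min_allocation_mb: int = 1024,
                    overhead_fraction: float = 0.10,
                    min_overhead_mb: int = 384) -> int:
    """Physical memory YARN would allocate for the AM container.

    Assumes default overhead (max of 10% and 384 MB) and that the
    scheduler rounds up to a multiple of the minimum allocation.
    """
    overhead_mb = max(int(am_memory_mb * overhead_fraction), min_overhead_mb)
    requested_mb = am_memory_mb + overhead_mb
    return math.ceil(requested_mb / min_allocation_mb) * min_allocation_mb


def vmem_limit_mb(container_mb: int, vmem_pmem_ratio: float) -> float:
    """Virtual memory allowed before the NodeManager kills the container."""
    return container_mb * vmem_pmem_ratio


# Compare the default setup with the two workarounds from the log.
for label, am_mb, ratio in [
    ("default AM (512m), ratio 2.1", 512, 2.1),
    ("AM bumped to 4g, ratio 2.1", 4096, 2.1),
    ("default AM (512m), ratio 5.1", 512, 5.1),
]:
    container = am_container_mb(am_mb)
    print(f"{label}: container={container} MB, "
          f"vmem limit={vmem_limit_mb(container, ratio):.1f} MB")
```

Under these assumptions either change lifts the limit well above the default case, which is consistent with both workarounds making the jobs run.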
[16:57:37] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (Ottomata) I don't think it is; it is not listed in `wgEventLoggingStreamNames`. BTW,...
[17:11:59] ottomata: sorry for the false alarm on the possible effect of the []/{} bug on EventLogging! we'll have to keep an eye on how common "stream configured but not registered with EL" ends up being for teams instrumenting on MediaWiki
[17:12:14] np!
[17:12:39] you know...we might be able to change that. we are tagging all stream config with destination_event_service now
[17:12:49] we could change EL to only request streams that have that set to eventgate-analytics-external...
[17:13:06] that means it will get stream config for streams that go to eventgate-analytics-external but not through EventLogging though (like iOS app streams)
[17:13:58] but ya hm, we could use other settings as constraints for eventlogging
[17:14:07] hmmm
[17:14:24] instead of a separate mw config, just stick in a setting that says this stream should be loaded by eventlogging ext.
[17:14:25] what about making it exclude streams with certain destinations?
[17:14:26] somehow
[17:14:52] hm the API doesn't support negative constraints, but since this is loaded by EL in PHP ResourceLoader hook
[17:14:55] you could do anything you wanted in there
[17:18:23] we could enforce all streams to have the destination configured, make EL auto-retrieve stream configs with specific destination, and that would also help lighten the payload when apps are downloading the stream configs because they likewise could specify that constraint (right now they don't because they need to retrieve streams where destination may not be configured, but that also includes non-analytics streams)
[17:19:18] so if we place a hard requirement that all streams must specify a destination, that would (1) make that []/{} moot, and (2) kind of make everything easier for EventLogging & apps clients
[17:19:37] hmmmmmmm indeed
[17:25:39] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (mpopov) >>! In T259917#6380167, @Ottomata wrote: > BTW, I am removing 'test.event' as...
[17:29:21] bearloga: all streams must specify a destination, but it is not included unless ?all_settings is specified via the API
[17:29:28] to reduce bandwidth for unneeded settings
[17:29:49] but yes that was my suggestion above, make EL request streams for which destination_event_service=eventgate-analytics-external
[17:30:02] and ya, all stream configs will now have to have a destination_event_service set
[17:30:13] that is how an eventgate instance knows it is allowed to produce a stream
[17:32:39] hm that kinda messes with that (vaguely hacky) solution for targeted deployment of streams to certain wikis
[17:33:19] wherein streams are deployed to every wiki's config but then registered with eventlogging only on the specific wikis those streams should be active for
[17:34:22] so if we go with destination filtering, i guess that makes the is_active: false/true setting idea for streams more appealing
[17:42:32] true!
[17:42:33] good point
[17:42:47] probably destination filtering isn't right
[17:42:55] el will pick up streams that it won't be producing if we do that
[17:43:04] streams that other things produce to eventgate-analytics-external
[17:48:06] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (Ottomata) There's nothing special about the test event streams, other than eventgate...
[17:51:30] so for now let's keep the explicit eventlogging registration until we observe that it's a massive PITA for users?
[17:51:39] yeah
[17:53:50] i am still thinking about how the stream configs being fetched by EPC-iOS right now are the full set of configs, including the mediawiki ones.
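Editor's note on the destination-filtering objection above (17:42 to 17:43): the concern is that selecting streams by `destination_event_service` also returns streams that other producers send to the same EventGate instance. A minimal sketch comparing the two selection strategies follows. All stream names and the `eventgate-main` destination value are made up for illustration, and the two helpers are hypothetical stand-ins, not the EventLogging extension's code.

```python
from typing import Dict, List

# Hypothetical stream configs keyed by stream name.
stream_configs: Dict[str, Dict[str, str]] = {
    "eventlogging_ExampleSchema": {
        "destination_event_service": "eventgate-analytics-external",
    },
    "ios.example_stream": {
        "destination_event_service": "eventgate-analytics-external",
    },
    "mediawiki.example_change": {
        "destination_event_service": "eventgate-main",
    },
}

# Explicit registration: only the streams EventLogging itself produces.
registered_with_eventlogging: List[str] = ["eventlogging_ExampleSchema"]


def select_by_destination(configs: Dict[str, Dict[str, str]],
                          destination: str) -> List[str]:
    """Pick every stream whose destination matches, whoever produces it."""
    return sorted(
        name for name, settings in configs.items()
        if settings.get("destination_event_service") == destination
    )


def select_by_registration(configs: Dict[str, Dict[str, str]],
                           registered: List[str]) -> List[str]:
    """Pick only streams that are both configured and explicitly registered."""
    return sorted(name for name in registered if name in configs)


by_destination = select_by_destination(stream_configs, "eventgate-analytics-external")
by_registration = select_by_registration(stream_configs, registered_with_eventlogging)

# Streams EventLogging would load but never produce under destination filtering.
extra = sorted(set(by_destination) - set(by_registration))
print("destination filter:", by_destination)
print("explicit registration:", by_registration)
print("picked up unnecessarily:", extra)
```

In this toy data the iOS stream is the one picked up unnecessarily, which mirrors the reason given in the log for keeping explicit registration for now.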
[17:54:09] bearloga: you can request specific streams
[17:55:56] ottomata: i know, but if we're going to bake in the uri the client uses to fetch stream configs from then we gotta be careful because if apps team decide to name their streams "mobile_apps.*" and use that for filtering, then if there are streams we want all platforms to produce to that'll be an issue
[17:57:14] hm
[17:57:45] bearloga: yeah sounds like we need a 'tag' type support with ability to use constraints param to filter on them
[17:57:53] we could use that for eventlogging streams too
[17:58:06] ottomata: i was actually thinking the same thing
[17:58:09] :D
[17:58:17] tags: {'is_eventlogging_stream': true, ...}
[17:58:18] or something
[17:58:46] tags: [ 'eventlogging', 'mobile app', 'inactive' ]
[17:59:01] or key-value pairs sure
[17:59:24] might be more flexible to have key vals
[17:59:33] but ya we'd need to figure out a way to make the constraints param smarter then
[17:59:46] annoying to have to encode complex filters in a url
[18:00:02] just thinking that if we have key-val pairs then might as well make it top-level configs like destination
[18:00:17] true that is easier for filtering
[18:00:31] constraints don't have or filtering right now though
[18:00:32] only and
[18:02:56] the funny thing is that adding negatives (e.g. ability to exclude) would also create support for or filtering due to !and :P
[18:03:26] a little compsci/logic humor for ya
[18:13:01] * elukey afk!
[18:26:59] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (mpopov) But what happens to the existing table in the database that was created from...
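Editor's note on the "!and" remark above (18:00 to 18:03): with only AND plus negation, OR can be derived through De Morgan's law, since `A or B` equals `not (not A and not B)`. The sketch below illustrates this with small predicate combinators over stream settings. The combinator names, the `platform` key, and the sample settings are hypothetical; `is_eventlogging_stream` is the key floated in the chat, and none of this is the StreamConfig API's actual constraints syntax.

```python
from typing import Any, Callable, Dict

Settings = Dict[str, Any]
Predicate = Callable[[Settings], bool]


def equals(key: str, value: Any) -> Predicate:
    """Constraint that a top-level setting has exactly this value."""
    return lambda settings: settings.get(key) == value


def all_of(*predicates: Predicate) -> Predicate:
    """AND: the only combination the chat says constraints support today."""
    return lambda settings: all(p(settings) for p in predicates)


def negate(predicate: Predicate) -> Predicate:
    """NOT: the 'negative constraint' proposed for excluding streams."""
    return lambda settings: not predicate(settings)


def any_of(*predicates: Predicate) -> Predicate:
    """OR, built only from AND and NOT via De Morgan's law."""
    return negate(all_of(*(negate(p) for p in predicates)))


# Hypothetical stream settings using top-level key-value pairs.
ios_stream = {"platform": "ios", "is_eventlogging_stream": False}
el_stream = {"platform": "web", "is_eventlogging_stream": True}
other_stream = {"platform": "web", "is_eventlogging_stream": False}

wanted = any_of(equals("platform", "ios"),
                equals("is_eventlogging_stream", True))

for name, settings in [("ios", ios_stream), ("el", el_stream), ("other", other_stream)]:
    print(name, wanted(settings))
```

Encoding such nested expressions in a URL parameter is the part the participants describe as annoying; the sketch only covers the logic, not the encoding.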
[18:29:12] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (Ottomata) I guess we can drop it, but we don't really care about the data in test.eve...
[18:29:40] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (Ottomata) > under a different schema? BTW, the schema is the same one (test/event),...
[18:35:02] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (mpopov) I was just using test.event as an immediate example, but the question I'm ask...
[18:43:48] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (Ottomata) > what happens in general when a stream name is re-used with a different, p...
[18:54:36] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (mpopov) >>! In T259917#6380598, @Ottomata wrote: >> what happens in general when a st...
[19:01:34] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (mpopov) Also this conversation has gone way off-topic from the actual phab task and w...
[19:03:25] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (jlinehan) >>! In T259917#6380640, @mpopov wrote: > Should streams specify owners in t...
[19:19:25] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (kzimmerman) Open→Declined Declining this task: - Too many data issues arose when trying to track/understand pagevie...
[19:19:28] Analytics, Analytics-Kanban, Product-Analytics, Epic: API pageview counts for 'Mobile app' are incorrect since switch to mobile-html - https://phabricator.wikimedia.org/T256508 (kzimmerman)
[19:43:03] Analytics, Event-Platform, Wikimedia-Extension-setup, Release-Engineering-Team-TODO (Release-Engineering-Team-TODO (2020-07-01 to 2020-09-30 (Q1))), Wikimedia-extension-review-queue: Deploy EventStreamConfig extension - https://phabricator.wikimedia.org/T242122 (Ottomata) Open→Resolved
[19:43:05] Analytics-Radar, Performance-Team, MW-1.36-notes (1.36.0-wmf.4; 2020-08-11): Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (Gilles) I can take care of the extension update, but I'm off all of next week, so can't be around to keep an eye on it when the change gets deploy...
[19:43:10] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 9 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (Ottomata)
[19:59:47] Analytics-Radar, Performance-Team, MW-1.36-notes (1.36.0-wmf.4; 2020-08-11): Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (Nuria) @Gilles the sooner the better cause it is about twice a day that the refine jobs fail due to the integer overflow problem (cc @Krinkle)
[20:30:46] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (Ottomata) This is one of the user stories listed in {T205319}, so indeed stream confi...
[20:32:32] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (Ottomata) WOOHOO I think we are done! I updated some docs: https://wikitech.wikimedia.org/wiki/Event_Platform/Stre...
[20:33:03] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (Ottomata)
[21:12:42] mostly verifying in evaluating options presented for future work, we still heavily discourage having analytics jobs that query the public mediawiki apis through the web proxies? As in, future projects shouldn't build on the idea of performing requests against public mw apis?
[21:17:59] ebernhardson : ya
[21:19:09] nuria: thanks
[23:16:24] Analytics-Clusters, Discovery, Discovery-Search: mjolnir-kafka-msearch-daemon dropping produced messages after move to search-loader[12]001 - https://phabricator.wikimedia.org/T260305 (EBernhardson)
[23:16:37] Analytics-Clusters, Discovery, Discovery-Search: mjolnir-kafka-msearch-daemon dropping produced messages after move to search-loader[12]001 - https://phabricator.wikimedia.org/T260305 (EBernhardson)
[23:20:06] Analytics-Clusters, Discovery, Discovery-Search: mjolnir-kafka-msearch-daemon dropping produced messages after move to search-loader[12]001 - https://phabricator.wikimedia.org/T260305 (EBernhardson)