[00:11:06] Analytics, Analytics-Kanban, Patch-For-Review: Set up automatic deletion for netflow datasource in Druid - https://phabricator.wikimedia.org/T229674 (Nuria) pinging @mforns
[00:12:53] Analytics, Analytics-Wikistats: Wikistats 2.0: Add statistics for the geographical origin of the contributors - https://phabricator.wikimedia.org/T188859 (Nuria)
[00:16:36] Analytics, Better Use Of Data, Epic, Performance-Team (Radar), Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (Nuria) Per our meeting we are going to submit a code patch for gerrit for this code to live in mw...
[00:19:05] Analytics, Better Use Of Data, Epic, Performance-Team (Radar), Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (Nuria) And @Tgr gets to be our well versed MW developer that can code review first patches!
[00:21:48] Analytics: Make hdfs-cleaner resilient to in-flight files deletion - https://phabricator.wikimedia.org/T238304 (Nuria) ping @mforns to look at this on his ops week
[00:22:07] Analytics, Analytics-Kanban: Make hdfs-cleaner resilient to in-flight files deletion - https://phabricator.wikimedia.org/T238304 (Nuria)
[00:24:59] Analytics, Analytics-Kanban: Hourly Feature extraction for bot detection from webrequest - https://phabricator.wikimedia.org/T238360 (Nuria)
[00:26:02] Analytics-Kanban, Cloud-Services: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950 (Nuria)
[00:27:22] Analytics, Analytics-EventLogging, Analytics-Kanban: Strategy to be able to test eventlogging in beta in the absence of mysql - https://phabricator.wikimedia.org/T223415 (Nuria) ping @Ottomata Please correct me if I am wrong but we decided to leave things on beta as they are for the time coming correct?
[00:49:49] (PS1) Srishakatux: Modify WMCS queries [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/551690 (https://phabricator.wikimedia.org/T232671)
[00:58:15] (PS2) Srishakatux: Modify WMCS queries [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/551690 (https://phabricator.wikimedia.org/T232671)
[02:41:18] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (Milimetric) I'm fine with just two repos, and the way you outline using them. Two thoughts on naming: * schema-e...
[02:41:32] Analytics, Analytics-EventLogging, Analytics-Kanban: Strategy to be able to test eventlogging in beta in the absence of mysql - https://phabricator.wikimedia.org/T223415 (Ottomata) Correct ya! I think we can re-evaluate this idea for both beta and prod in the future.
It'd be nice to have some topic...
[03:09:14] Analytics, Cloud-Services, Developer-Advocacy (Oct-Dec 2019): Setup Config:Dashiki:WMCSEdits on meta wiki - https://phabricator.wikimedia.org/T236223 (Milimetric) @srishakatux what's left to do here? Need any support updating the `config.yaml` for example?
[03:12:02] Analytics, Research: Taxonomy of new user reading patterns - https://phabricator.wikimedia.org/T234188 (Milimetric) >>! In T234188#5602707, @MGerlach wrote: > Most sessions start in ns=1 (article), but for new users the percentage is slightly smaller. just a quick note that ns=1 is article talk, ns=0 is...
[03:16:57] Analytics, CPT Initiatives (Revision Storage Schema Improvements), Epic, Technical-Debt: Remove revision_comment_temp and revision_actor_temp - https://phabricator.wikimedia.org/T215466 (Milimetric) I'm sorry I thought I confirmed this - yes, we were blissfully ignorant on top of the cloud db vie...
[03:21:36] Analytics, Multi-Content-Revisions (Tech Debt): Adapt mediawiki history for MCR - https://phabricator.wikimedia.org/T238615 (Milimetric)
[04:00:10] Analytics, Analytics-EventLogging, Analytics-Kanban: Strategy to be able to test eventlogging in beta in the absence of mysql - https://phabricator.wikimedia.org/T223415 (Nuria) Open→Resolved
[04:00:14] Analytics, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Move reportupdater reports that pull data from eventlogging mysql to pull data from hadoop - https://phabricator.wikimedia.org/T223414 (Nuria)
[04:13:30] Analytics, Analytics-Kanban: Tune Wikistats 2 Varnish caching - https://phabricator.wikimedia.org/T230136 (Milimetric) @DannyS712: could you tweak the batch to not leave a comment?
Otherwise subscribers get notified
[04:15:24] Analytics-Kanban: Productionize Edit History Reconstruction and Extraction - https://phabricator.wikimedia.org/T152035 (Milimetric)
[04:15:26] Analytics: vet edit data on the data lake - https://phabricator.wikimedia.org/T153923 (Milimetric) Open→Declined In the time since we made this task, the Product Analytics and Analytics Engineering teams have been working closely on this dataset. We made some quality improvements and continue to vet...
[04:16:36] Analytics, Analytics-Wikistats: Clean the code review queue of analytics/wikistats - https://phabricator.wikimedia.org/T113695 (Milimetric) Open→Declined Wikistats 1 is no longer maintained.
[04:20:04] (Abandoned) Milimetric: [Full dump analysis] Reduce edits_only and reverts_only intricacy [analytics/wikistats] - https://gerrit.wikimedia.org/r/118436 (owner: Nemo bis)
[04:20:17] (Abandoned) Milimetric: Archives are downloaded in .txt.gz format: fix matching and opening [analytics/wikistats] - https://gerrit.wikimedia.org/r/92066 (owner: Nemo bis)
[04:21:02] (Abandoned) Milimetric: Remove all trailing whitespace [analytics/wikistats] - https://gerrit.wikimedia.org/r/145862 (owner: Nemo bis)
[04:21:09] (Abandoned) Milimetric: Comment some path tests which overrode standard ones [analytics/wikistats] - https://gerrit.wikimedia.org/r/118261 (owner: Nemo bis)
[04:31:24] Analytics, Analytics-Kanban: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (Milimetric) @elukey - we can talk more tomorrow but this solution hides Wikistats 1's index.html, which is how most people navigated the old site. I think we should preserve...
[07:39:38] Analytics, Analytics-Kanban: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (elukey) >>!
In T237752#5674036, @Milimetric wrote: > @elukey - we can talk more tomorrow but this solution hides Wikistats 1's index.html, which is how most people navigated t...
[07:49:07] Analytics, Research: Taxonomy of new user reading patterns - https://phabricator.wikimedia.org/T234188 (MGerlach) >>! In T234188#5673893, @Milimetric wrote: >>>! In T234188#5602707, @MGerlach wrote: >> Most sessions start in ns=1 (article), but for new users the percentage is slightly smaller. > > just...
[07:51:06] !log restart hdfs-cleaner on an-coord1001
[07:51:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:52:00] joal: bonjour! Got a warning for disk space used in hdfs, we crossed the 2PB mark https://grafana.wikimedia.org/d/000000585/hadoop?orgId=1&panelId=25&fullscreen&from=now-7d&to=now
[07:53:06] 90d view is very interesting - https://grafana.wikimedia.org/d/000000585/hadoop?orgId=1&panelId=25&fullscreen&from=now-90d&to=now
[07:58:57] RECOVERY - Check the last execution of hdfs-cleaner on an-coord1001 is OK: OK: Status of the systemd unit hdfs-cleaner https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:35:18] 1.2 T 3.6 T /tmp
[08:35:18] 66.0 T 198.6 T /user
[08:35:18] 73.1 T 219.3 T /var
[08:35:18] 547.0 T 1.6 P /wmf
[08:35:49] I am wondering if it is our data growing (new datasets etc..) or if say /user is bigger
[08:44:09] it is difficult now to get what increased over time
[08:57:03] Analytics, Analytics-Kanban, Product-Analytics: Direct link generator to reports in Superset has the incorrect hostname - https://phabricator.wikimedia.org/T238461 (elukey) Hi Kate! Thanks for the report, I wasn't aware of this functionality.. I found https://github.com/apache/incubator-superset/pul...
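The four-line listing pasted at 08:35 looks like `hdfs dfs -du -h` output. A minimal Python sketch for reading it, assuming the first size column is the logical size and the second is the raw space including all replicas (as in recent Hadoop versions), and assuming binary unit multiples; `parse_du` and `UNITS` are hypothetical helper names, not part of any Hadoop tooling:

```python
# Assumption: units printed by `hdfs dfs -du -h` are binary multiples.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}

# Figures copied from the 08:35:18 messages in this log.
DU_OUTPUT = """\
1.2 T 3.6 T /tmp
66.0 T 198.6 T /user
73.1 T 219.3 T /var
547.0 T 1.6 P /wmf
"""


def parse_du(text):
    """Parse 'size unit raw unit path' rows; rows in any other shape are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            # e.g. empty directories print "0 0 /path" without units
            continue
        size, size_unit, raw, raw_unit, path = parts
        rows.append({
            "path": path,
            "logical_bytes": float(size) * UNITS[size_unit],
            "raw_bytes": float(raw) * UNITS[raw_unit],
        })
    return rows


rows = parse_du(DU_OUTPUT)
# Largest consumers of raw disk space first, with the implied replication ratio.
for row in sorted(rows, key=lambda r: r["raw_bytes"], reverse=True):
    ratio = row["raw_bytes"] / row["logical_bytes"]
    print(f'{row["path"]:<6} raw/logical = {ratio:.2f}')
```

The ratio column is a quick sanity check that every top-level directory sits close to the expected replication factor.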
[09:36:54] Analytics, Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (elukey) ` elukey@krb1001:~$ sudo manage_principals.py create goransm --email_address=goran.milovanovic_ext@wikimedia.de Principal successfully created. Successfully sent email to goran.milovanov...
[09:37:04] \o/ --^
[09:37:14] first users asking for credentials
[10:10:23] * fdans backfilling at a rate of 20 days per day we'll finish backfilling per file mediarequests in mid February
[10:15:44] fdans: o/
[10:16:28] one question - this morning I got a warning for HDFS space used, we crossed the 2PB mark.. nothing problematic since we have space, but then I checked the past 90d of space used:
[10:16:32] https://grafana.wikimedia.org/d/000000585/hadoop?orgId=1&panelId=25&fullscreen&from=now-90d&to=now
[10:17:06] we have been growing steadily from mid october onward, so I am wondering if this is mediarequests
[10:17:33] not sure how huge the dataset is/will-be, but it kinda fits timing wise
[10:26:37] Analytics, Analytics-Kanban, Operations, SRE-Access-Requests, Patch-For-Review: Add system user analytics-privatedata to the anaytics-privatedata-users group - https://phabricator.wikimedia.org/T238306 (elukey) During the SRE meeting no strong opposition to this task was raised, but it was su...
[10:54:03] sorry elukey, just looking at this now
[10:55:57] hmmm... it fits the timing of when we started backfilling per file
[10:56:07] elukey: maybe tmp files are not being deleted in hdfs?
[11:05:50] fdans: /tmp is ~3.6T, not much
[11:06:24] not pointing the finger to anything, just wanted to know if you have an idea about how big the datasets are currently for mediarequest
[11:07:29] we went from ~1.7P to ~2P, so in theory 300T during the last month if I am not mistaken
[11:07:36] (replicated 3 times)
[11:07:51] so possibly an increase of 100T of data, but seems really a lot
[11:14:14] hmmmm elukey
[11:14:22] https://www.irccloud.com/pastebin/cxz5JAVf/
[11:14:55] ah sorry I wasn't sure about the path to check but it was trivial :)
[11:15:09] so yes 10T is something but there is also more
[11:16:40] I'll open a task to track this down, just to be sure
[11:16:42] thanks :)
[11:16:54] yeah elukey it's weird, let me know if I can help look for this
[11:17:39] there are some big /user home dirs, some people might have added a lot of data
[11:18:26] but the trend seems more steady increase
[11:20:31] yep, that's what I was just checking elukey
[11:20:53] not 300T in the last month though
[11:25:07] fdans: what do you mean?
[11:25:17] in the meantime, I found
[11:25:18] elukey@stat1004:~$ sudo -u hdfs hdfs dfs -du -h /var
[11:25:18] 0 0 /var/lib
[11:25:18] 73.3 T 219.9 T /var/log
[11:25:24] that is extremely weird
[11:26:10] ah no hadoop yarn logs
[11:27:49] fdans: it should be 300T replicated, so ~100T to account
[11:28:01] does it make sense or am I doing wrong calculations?
[11:30:13] elukey: nono you're right
[11:31:46] Analytics: HDFS space usage steadily increased over the past month - https://phabricator.wikimedia.org/T238648 (elukey) p:Triage→High
[11:31:52] created --^
[11:31:57] let me know if it makes sense
[11:32:27] Analytics: HDFS space usage steadily increased over the past month - https://phabricator.wikimedia.org/T238648 (elukey)
[11:35:35] ok taking a break for lunch, ttl!
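The estimate agreed on at 11:27 to 11:30 (raw usage going from ~1.7P to ~2P with 3x replication, so roughly 100T of new data) can be written out as a small Python sketch. It treats 1P as 1000T, as the chat does for its rough figures; `logical_growth_tb` is a hypothetical helper name:

```python
REPLICATION_FACTOR = 3  # "(replicated 3 times)" per the chat


def logical_growth_tb(raw_before_tb, raw_after_tb, replication=REPLICATION_FACTOR):
    """Convert growth in raw (replicated) HDFS usage into growth of actual data."""
    raw_growth_tb = raw_after_tb - raw_before_tb
    return raw_growth_tb / replication


# ~1.7P -> ~2P of raw usage, expressed in T (rough figures quoted in the chat).
growth = logical_growth_tb(1700, 2000)
print(growth)  # 100.0
```

So the ~300T jump on the Grafana panel corresponds to about 100T of files to account for.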
[11:49:26] Analytics, Analytics-Kanban: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (Milimetric) That’s the ugly/cute part, since we’re copying and not moving / to /v1, all the urls will work, relative or absolute. This is totally fine with v2 because that’s...
[11:51:22] Analytics, Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (Addshore) Hello! here's my request for Kerberos credentials for Hadoop access on stat100X and notebook100X. My username is addshore.
[11:53:10] Analytics, Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (Addshore) Hello! here's my request for Kerberos credentials for Hadoop access on stat100X and notebook100X. My username is addshore.
[12:04:25] elukey: Hi! I was teaching this morning :)
[12:04:32] fdans: hi as well :)
[12:54:58] hellooo joal sorry, was out lunchin
[12:55:03] np fdans :)
[13:27:24] joal: o/
[13:27:43] Hi elukey - I'm doing some analysis on log usage
[13:29:40] Analytics, Analytics-Kanban: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (elukey) >>! In T237752#5674841, @Milimetric wrote: > That’s the ugly/cute part, since we’re copying and not moving / to /v1, all the urls will work, relative or absolute. Thi...
[13:29:55] elukey: there are things I don't understand
[13:30:11] elukey: would you have a minute in batcave?
[13:30:31] joal: sure
[13:44:26] Analytics, Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (elukey) >>! In T237605#5674861, @Addshore wrote: > Hello! here's my request for Kerberos credentials for Hadoop access on stat100X and notebook100X. My username is addshore. ` elukey@krb1001:~$...
[13:46:21] !log Deleting 100 heavier log-folders from analytics user (cassandra backfilling logs) -- T238648
[13:46:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:46:23] T238648: HDFS space usage steadily increased over the past month - https://phabricator.wikimedia.org/T238648
[13:46:51] !log Deleting old parquet wikitext data (new data is stored in Avro) -- T238648
[13:46:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:47:17] fdans: this backfilling is cursed :D
[13:47:54] elukey: whaat what happened now?
[13:48:27] oh I see
[13:48:39] fdans: the cassandra debug logs spamming hdfs
[13:54:45] !log Deleting 600 more log-folders from analytics user (cassandra backfilling logs) -- T238648
[13:54:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:54:48] T238648: HDFS space usage steadily increased over the past month - https://phabricator.wikimedia.org/T238648
[13:57:28] Analytics: HDFS space usage steadily increased over the past month - https://phabricator.wikimedia.org/T238648 (elukey) Joseph found the root cause, namely mediarequests backfilling creating huge files due to cassandra debug logging (T236698).
[14:03:36] Analytics, Analytics-Kanban: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (Milimetric) Sold!
[14:18:36] Analytics, Analytics-Kanban: Logging level of cassandra should be warning or error but not debug - https://phabricator.wikimedia.org/T236698 (elukey) p:Triage→High
[14:19:14] Analytics, Analytics-Kanban: Logging level of cassandra should be warning or error but not debug - https://phabricator.wikimedia.org/T236698 (elukey) Raising to High since this was the cause of T238648 (mediarequests backfilling creating huge log files on HDFS).
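The `!log` entries above mention deleting the 100 heaviest log folders first. The log does not show how they were picked; one way to select them from a listing of folder sizes is sketched below in Python. The folder paths and sizes are made up for illustration, and `heaviest` is a hypothetical helper:

```python
import heapq


def heaviest(folder_sizes, n):
    """Return the n (path, size) pairs with the largest size, biggest first."""
    return heapq.nlargest(n, folder_sizes.items(), key=lambda item: item[1])


# Hypothetical folder sizes in GB; not taken from the cluster.
sizes_gb = {
    "/var/log/example/app_a": 700,
    "/var/log/example/app_b": 12,
    "/var/log/example/app_c": 950,
    "/var/log/example/app_d": 3,
}

for path, size in heaviest(sizes_gb, 2):
    print(f"{size:>5} GB  {path}")
```

`heapq.nlargest` avoids sorting the whole listing, which helps when there are many thousands of application log folders.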
[14:26:50] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (Ottomata) I see nothing wrong with schemaS! Wait what is schemaE? A quick search just tells me its a latin plura...
[14:27:28] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (Ottomata) I think the naming we are having trouble with though is the production vs analytics dichotomy. I'm ok w...
[14:32:07] Analytics, Analytics-EventLogging, Better Use Of Data, Event-Platform, and 5 others: eventgate-wikimedia should support using remote stream configuration - https://phabricator.wikimedia.org/T238657 (Ottomata)
[14:34:25] Analytics, Analytics-EventLogging, Better Use Of Data, Event-Platform, and 5 others: eventgate-wikimedia should support using remote stream configuration - https://phabricator.wikimedia.org/T238657 (Ottomata) p:Triage→High
[14:37:07] Analytics, Analytics-EventLogging, Better Use Of Data, Event-Platform, and 7 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (Ottomata)
[14:37:28] Analytics, Analytics-Kanban, Patch-For-Review: Add MaxMind DB files on an-coord1001 for hive local-jobs using UDF to succeed - https://phabricator.wikimedia.org/T238432 (elukey) @JAllemandou I reverted the patch since I didn't realize that we already include class { 'geoip': } on an-coord1001: ` elu...
[14:42:47] Analytics, Analytics-Kanban, Patch-For-Review: Add MaxMind DB files on an-coord1001 for hive local-jobs using UDF to succeed - https://phabricator.wikimedia.org/T238432 (elukey)
[14:59:44] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Prepare the Hadoop Analytics cluster for Kerberos - https://phabricator.wikimedia.org/T237269 (elukey)
[15:03:39] (PS1) Fdans: Add granularity option to schedule monthly jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/551841
[15:11:51] Analytics, Release Pipeline, Patch-For-Review, Release-Engineering-Team (Pipeline), Services (watching): Migrate EventStreams to k8s deployment pipeline - https://phabricator.wikimedia.org/T238658 (Ottomata)
[15:23:18] (PS1) Mforns: Move back the start date for pingback php_drilldown [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/551847 (https://phabricator.wikimedia.org/T238389)
[15:24:03] (CR) Mforns: [V: +2 C: +2] "Self-merging for re-runs." [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/551847 (https://phabricator.wikimedia.org/T238389) (owner: Mforns)
[15:26:05] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/551621 (owner: Cicalese)
[15:29:07] Analytics, Analytics-Kanban, Patch-For-Review: Rerun pingback reports to categorize software versions correctly. - https://phabricator.wikimedia.org/T238389 (mforns) @CCicalese_WMF I merged your patch, and also fixed the start date of the php_drilldown reports in another change. This was needed for...
[15:41:54] (CR) Mforns: [C: +1] "LGTM! Man, you're a machine!"
[analytics/refinery] - https://gerrit.wikimedia.org/r/550945 (https://phabricator.wikimedia.org/T237269) (owner: Joal)
[16:07:47] Analytics, Better Use Of Data, Epic, Performance-Team (Radar), Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (Milimetric) Will comment on architecture next, but just jotting down thoughts about code as I rea...
[16:10:04] Analytics, Better Use Of Data, Epic, Performance-Team (Radar), Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (Milimetric) I started out thinking Sentry was the gold standard and I now prefer the way you're g...
[16:13:46] (CR) Mforns: "Looking good, I think!" (10 comments) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/551690 (https://phabricator.wikimedia.org/T232671) (owner: Srishakatux)
[16:16:06] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Prepare the Hadoop Analytics cluster for Kerberos - https://phabricator.wikimedia.org/T237269 (elukey) I was able to make Presto run with the following config settings on analytics1030: ` /etc/presto/catalog/analytics_test_hive.properti...
[16:16:31] Analytics: Output schema with mediawiki_history snapshots dumps - https://phabricator.wikimedia.org/T238668 (Milimetric)
[16:21:01] ottomata: o/ - presto in the testing cluster works with kerberos!
[16:21:09] (without TLS for the moment)
[16:23:39] Analytics, Analytics-Kanban: Logging level of cassandra should be warning or error but not debug - https://phabricator.wikimedia.org/T236698 (Nuria) I see, will work on this some more today.
[16:28:31] Analytics, Better Use Of Data, Epic, Performance-Team (Radar), Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (Nuria) Let's move this to gerrit and we can code review there. ua regexes already exist on http...
[16:37:51] ottomata: also another q - profile::hadoop::spark2::install_assembly: true is currently not working on the test coordinator, since it doesn't find the keytab for hdfs (was failing silently before)
[16:38:06] IIRC we discussed about moving it and/or changing the user to analytics?
[16:38:57] ya or moving that command to somewhere that has an hdfs keytab?
[16:39:20] yep like the hadoop masters
[16:39:26] ya
[16:39:37] hmm, oh but maybe we don't install spark on masters
[16:39:52] nope we don't
[16:39:57] yeah, i guess analytics keytab is fine...
[16:40:04] it just needs to write the file once, doesn't really matter who owns it
[16:40:14] we'd need to make /user/spark/share/lib writeable by it
[16:41:50] yep seems fine
[16:42:28] Analytics, Release Pipeline, Patch-For-Review, Release-Engineering-Team (Pipeline), Services (watching): Migrate EventStreams to k8s deployment pipeline - https://phabricator.wikimedia.org/T238658 (Ottomata) I can't seem to develop this locally due to a barrage of version problems. From what...
[16:42:34] ah nice job with presto::properties
[16:42:35] like it
[16:45:04] ottomata: or we can use the oozie user for the assembly, it seems consistent with what we do with spark2_oozie_sharelib_install
[16:45:19] ya but very different purposes
[16:45:26] the spark assembly file has nothing to do with oozie
[16:45:38] it'd be nice if oozie could just use it, but i guess it doesn't know how?
[16:46:17] yes my bad, it is more generic, I just wanted to avoid mixing 100 users in the same profile
[16:46:50] or we can just add the hdfs keytab and that's it
[16:47:03] elukey, 3 months have passed and we should activate druid sanitization for netflow: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/535924/
[16:47:10] elukey, I just rebased it
[16:47:24] mforns: nice!
[16:48:22] Analytics, Better Use Of Data, Epic, Performance-Team (Radar), Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (jlinehan) >>! In T235189#5675651, @Milimetric wrote: > Will comment on architecture next, but jus...
[16:52:20] Analytics, Better Use Of Data, Epic, Performance-Team (Radar), Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (jlinehan) >>! In T235189#5675723, @Nuria wrote: > Let's move this to gerrit and we can code revie...
[16:58:32] Analytics, Analytics-Kanban, Product-Analytics: Direct link generator to reports in Superset has the incorrect hostname - https://phabricator.wikimedia.org/T238461 (kzimmerman) Thanks for looking into this, Luca! One other note is that, in an earlier version of Superset, the link generator worked pr...
[17:00:32] ping ottomata , milimetric
[17:05:05] Analytics, Analytics-EventLogging, Better Use Of Data, Event-Platform, and 7 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (LGoto)
[17:24:25] Analytics, Better Use Of Data, Epic, Performance-Team (Radar), Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (jlinehan) p:Normal→High
[17:27:48] mforns: merged!
[17:27:54] elukey, :D
[17:27:55] it was marked "WIP"
[17:27:58] so I couldn't submit
[17:27:59] thanks!
[17:28:02] aaaaah
[17:28:04] I didn't know that :(
[17:28:05] my bad
[17:28:31] nono Gerrit tricked me
[17:28:35] learned a new thing!
[17:35:07] Analytics, Event-Platform, Gerrit, Release-Engineering-Team-TODO, and 3 others: Delete eventgate-ci repository from gerrit - https://phabricator.wikimedia.org/T229111 (MarcoAurelio) Considering that the `delete-project` gerrit plugin does cause some unwanted effects (massive logspam which cannot...
[18:03:07] fdans: I've forgotten two days in a row to ask you about internationalization, I'll try to remember tomorrow morning
[18:11:42] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Prepare the Hadoop Analytics cluster for Kerberos - https://phabricator.wikimedia.org/T237269 (elukey)
[18:13:10] Analytics, Analytics-Kanban, Product-Analytics: Direct link generator to reports in Superset has the incorrect hostname - https://phabricator.wikimedia.org/T238461 (elukey) Ah interesting! In a few days I should be able to test the new version, 0.35.1, I'll make sure to see if the report generator wo...
[18:21:24] (PS1) Nuria: Comenting debug logging [analytics/refinery/source] - https://gerrit.wikimedia.org/r/551889 (https://phabricator.wikimedia.org/T236698)
[18:22:13] elukey: https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/551889/
[18:22:51] FYI: I get a 502 Error when I go to https://datasets.wikimedia.org/
[18:23:17] nuria: nice!
[18:23:35] leila: ouch
[18:23:50] elukey: I'm not in a rush to access it. fyi only
[18:23:55] elukey: i was trying a more elegant solution but really cannot get to exclude from pom the loggers in a way that they work
[18:24:18] leila: that would be brandon's team who can fix that, let's see, they might be doing something one sec
[18:24:39] nuria: ok. let me know if you want me to report it.
[18:24:59] leila: nah, we just did
[18:25:05] (CR) Joal: [C: +2] "Merging for deploy tomorrow.
Thanks nuria :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/551889 (https://phabricator.wikimedia.org/T236698) (owner: Nuria)
[18:25:13] nuria: thanks
[18:25:28] Analytics, Analytics-Kanban, Patch-For-Review: Logging level of cassandra should be warning or error but not debug - https://phabricator.wikimedia.org/T236698 (Nuria)
[18:29:55] (Merged) jenkins-bot: Comenting debug logging [analytics/refinery/source] - https://gerrit.wikimedia.org/r/551889 (https://phabricator.wikimedia.org/T236698) (owner: Nuria)
[18:31:16] (PS1) Joal: Update cassandra jar in related jobs for logging [analytics/refinery] - https://gerrit.wikimedia.org/r/551893 (https://phabricator.wikimedia.org/T236698)
[18:36:50] (CR) Nuria: [C: +2] "Per https://archiva.wikimedia.org/#artifact/org.wikimedia.analytics.refinery.cassandra/refinery-cassandra jar version looks correct" [analytics/refinery] - https://gerrit.wikimedia.org/r/551893 (https://phabricator.wikimedia.org/T236698) (owner: Joal)
[18:48:14] (CR) Masumrezarock100: [C: +1] "Why Jenkinsbot is not working?" [analytics/refinery] - https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: MNeisler)
[18:54:31] leila: does it work now?
[18:57:08] nuria: the issue seems fixed, I'll open a task to add alarming
[19:00:49] !log regenerate TLS cert for yarn.wikimedia.org (containing SANs for all analytics UIs) to add datasets.w.o SAN (site was failing due to ATS not being able to contact thorium)
[19:00:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:00:56] * elukey off!
[19:02:04] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (Ottomata) Alternatives for 'production' schema repo - primary - major - cardinal - main - essential - critical - v...
[19:02:36] Analytics, Analytics-Kanban, Patch-For-Review: Logging level of cassandra should be warning or error but not debug - https://phabricator.wikimedia.org/T236698 (JAllemandou) I'm glad I double checked. Unfortunately the above 2 patches will probably not be enough. I've done a quick analysis on logs fro...
[19:02:40] nuria: --^ :(
[19:03:10] Analytics, Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (cchen) Hi there! here's my request for Kerberos credentials for Hadoop access on stat100X and notebook100X. My username is conniecc1.
[19:03:53] milimetric: https://phabricator.wikimedia.org/T206789#5676232 :)
[19:03:58] primary & supplementary ?
[19:04:26] you didn't like core?
[19:06:01] I am usually with you on naming bikesheds but I think these latest names are going a bit too far from what wmf folks would find intuitive.
[19:06:17] essential is subjective too
[19:06:51] same with most of them actually, and I think core/instrument are less subjective
[19:09:38] Analytics, Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (Mayakp.wiki) Hello! Requesting Kerberos credentials for Hadoop access on stat100X and notebook100X. My username is Mayakpwiki Thanks!
[19:13:27] nuria: if you expect datasets.wikimedia.org to redirect to https://analytics.wikimedia.org/published/datasets/archive/ : yes, it works now.
[19:47:00] Analytics, Event-Platform: Evaluate possible replacements for Camus: Gobblin, Marmaray, etc. - https://phabricator.wikimedia.org/T238400 (Ottomata) We should def consider these things as we think about refactoring sanitization: > At Uber, all Kafka data is stored in append-only format with date-level pa...
[19:50:11] joal: ayayaya
[19:50:40] joal: It seems that cannot be right, cause the logging per line of all records has to be more than 15% of the log
[19:51:45] nuria: actually nope, it doesn't :( A lot of log lines are about map-reduce own stuff
[19:52:07] leila: mmm.. that does not seem right
[19:52:52] leila: ah no it is, i am forgetting there is also dumpsblah domain
[19:53:08] joal: that can be fixed with
[19:53:26] https://www.irccloud.com/pastebin/OfpXUEMP/
[19:53:32] setting these two to ERROR
[19:55:26] hm
[19:55:51] ok - let's try that :)
[19:55:53] joal: in all but application log
[19:56:07] joal: that is, in all but cassandra log explicitly
[19:56:09] all but cassandra I guess
[19:56:22] joal: let me look at your benchmark
[19:56:43] joal: i see, no, that would not help
[19:56:58] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (jlinehan) I'd like to see the schema repositories either bear a direct relation to the name of the EventGate insta...
[19:59:58] joal: we are going to have to add the annoying logback file? man...
[20:00:42] milimetric: oh didn't see you suggested core
[20:01:33] what do you think?
[20:01:46] reading jason's comment...
[20:08:10] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (Ottomata) Hm, but remember that 'analytics' will $ref sub schemas out of 'production'. If we didn't have the requ...
[20:08:21] milimetric: ^
[20:08:24] i like core well enough
[20:08:38] don't really like analytics or instrumentation, for reasons in ^^
[20:08:38] k
[20:10:16] ottomata: I think core is good
[20:10:37] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (Ottomata) > I'm not sure why there is a problem with having the repositories named after the "genre" of schema(ta...
[20:11:30] I think the tier-1 tier-2 thing is useful too, like an event service level -- I'm just not sure why to see those levels label the schema repositories
[20:11:57] oh boy real time bike shedding time!
[20:11:59] my fav
[20:12:17] hip: we have to name them something, right?
[20:12:24] ottomata: so would people with merge rights to analytics schemas be working in the same repo as the machine learning schema example you gave? It seems like that would naturally fit in a third repo, where there's activity on production features but not as much risk of merging stuff that breaks core schemas
[20:12:37] milimetric: i don't know where it would go
[20:12:43] maybe a third repo if we had to
[20:12:46] right now we don't need that
[20:12:50] we can certainly support more repos
[20:12:54] but the more we have, the more confusing things get
[20:13:01] (and the harder naming gets :p)
[20:13:03] I agree about this -- it seemed like I didn't in my post
[20:13:06] yeah, just useful to consider when naming
[20:13:09] tomorrow it will be research wanting a repo, etc
[20:13:12] fundraising, etc
[20:13:17] it'll be silos all over again
[20:13:45] for the ML schema example
[20:13:53] we could consider cirrussearch-request
[20:13:55] so if we can keep a simple dichotomy I think that's okay, but we're overloading core, right? it's not just core, it's also things that both core and non-core need, no?
[20:13:59] yeah [20:14:02] like a dependency between core and non-core would be bad news, right? [20:14:04] Gone for diner - back after [20:14:12] which is kinda why i liked'primary' [20:14:24] hm, yeah i'd hope that we woudln't ref in that direction [20:14:25] I think we have to decide now - do we want to protect repos from being broken by others? Or do we trust our devs and optimize for sharing schemas [20:15:13] milimetric: i think we don't want to allow everyone in product analytics to change the mediawiki job schema [20:15:15] we can ref to stuff in core from everywhere - that's why it's core [20:15:18] there's definitely going to need to be sharing, pretty explicitly since we can $ref things and that's a pattern we're looking to promote I think [20:15:27] but we don't want to ref from core to satellite repos [20:15:42] i think that is right, it wouldnt' really hurt, justr be bad practice probably [20:16:09] but like we've discussed with the 'analytics' repo -- I'm going to use the words as we have them today -- [20:16:09] ok, so then question is: one repo outside core where we let everyone ML, fundraising, etc. be good citizens and take care of each other or do we silo it? [20:16:19] will we really ref into core? what would we need in there? [20:16:21] (ya have been using analytics and production as working terms) [20:16:28] if we have a separate common.json for example [20:16:41] hip: data dictionary use case? [20:16:42] hip: yeah, I think so, as we might want snippets defining users, wikis, etc. [20:16:52] yeah like [20:16:53] right, but if we do a data dictionary, should that be in core? [20:17:04] wouldn't that be a PITA for the platform folks? 
[20:17:34] I mean a data dictionary for analytics stuff -- I mean maybe we can all use the same one, but I'm assuming that will be tricky [20:17:37] https://github.com/wikimedia/mediawiki-event-schemas/blob/master/jsonschema/mediawiki/revision/common/current.yaml [20:17:57] coudl be in both, but if we had official fields to use for e.g. page_id [20:18:03] it'd be nice to use them in mediawiki core events [20:18:13] yeah I'd fully support having everybody use it [20:18:14] as well as in analytics ones [20:18:21] if possible -- is that possible? we think it is? [20:18:28] sure ya possible [20:18:32] I don't have much visibility into the production events side of the house [20:18:35] we could start by putting data dictionary in core [20:18:42] if it's annoying, we can break it out into core-definitions [20:18:51] sure, and analyticsy dicitionary things can go in analytics repo [20:18:51] but core-definitions would have the same rights as core [20:18:57] its just ones that are used by both shoudl go in core [20:19:28] so, i'm ok with core, but i like primary better, and here's why [20:19:37] *prepares to receive wisdom* [20:19:42] haha [20:19:47] its not going to be that great [20:19:57] core gives me the same feeling as platform or production even [20:20:11] it has a bit of meaning of the purpose of the schemas [20:20:24] which i think is kind of what we are trying to avoid; since we are trying to find names that imply the access rights [20:20:36] like, mediawiki-core [20:21:17] so the implication we want is... access rights, which also imply the "tier-ness", since presumably tier-1 needs access rights for tier-1 stuff, tier-2 needs rights for.. etc. So there's some equivalence there and we want to get both? [20:21:27] I think core in the mediawiki-core / core platform team sense is more about organization than about purpose. Like, that's where all the things go that are needed by all the other things [20:21:32] hm, yes. 
[20:21:46] but at the same time, also make it seem clear that there are possibly one-way dependencies between primary and other things? [20:22:05] er between other things and primary* [20:22:07] hm, yes. [20:22:30] I'd be fine with primary if it didn't imply order. Like, why is that first? Do I have to look there first? [20:23:35] I like that it captures the tier-1-ness and is somewhat neutral-sounding among the choices that rank stuff, but it doesn't really strike me as carrying an implication about dependencies [20:23:52] however it is NOT carrying any connotation to any team or software area etc. which I like, that's what I like the best about it [20:24:05] like I said on-ticket, I feel like we either need to go all-in or all-out on that mapping [20:24:26] how about we embrace a metaphor, like hub/spoke, and name it that way? [20:25:13] hahah [20:25:14] well if you do that it would be most sensible to have like, /shared, /production, /analytics, or something [20:25:20] we're kind of back in that situation where you can have too many spokes [20:25:23] or you want to make spokes [20:25:27] heh, that was the original propsoal :p [20:25:54] ok, so wrong metaphor, thinking of another [20:25:57] I mean for discoverability the easiest thing is to have it in 1 [20:25:59] you don't think primary carries info about dependencies? [20:26:10] but we don't want to do that because Product wants to be able to +2 schema, yeah? [20:26:19] so we don't want to hold things up in that use-case [20:26:22] ya [20:26:38] quick summary, it sounds like we are coelescing around core vs primary [20:26:46] what about the 'analytics' name? [20:27:14] supplemental (or -ary) ? [20:27:20] well if you want to emphasize completeness and close the door on more repos [20:27:26] (talkinga about this might inform core vs primary) [20:27:26] you'd want to compliment whatever you chose for the other one, so core/primary [20:27:33] with like, peripheral/secondary etc. 
[20:27:43] supplemental locks you in, we can never have another supplemental-extra or something awful [20:27:54] locks you into not creating more repos? [20:27:58] yeah [20:28:00] yeah, that would be by design hopefully but milimetric you raise lol an intersting point [20:28:01] same with primary [20:28:05] when our crystal city begins to crumble [20:28:12] like, if there was a schema repo named tertiary my head would explode [20:28:15] ya, but we don't want more repos; our only reason to create more is for merge rights [20:28:17] how ugly will it look to have primary-seriously-critical or something [20:28:51] well, all crystal cities look ugly, I guess it's better not to worry [20:28:52] right, so that's why we need to make that decision now. Two or many [20:28:58] 2. [20:28:59] haha [20:29:01] I think I see the benefit of keeping it to be 2 [20:29:04] in the current architecture [20:29:20] ok, done, so bi-modal metaphors... hmmm [20:29:21] i mean MAYBE more if there is a really good reason, like we need another level of merge rights or something... [20:29:26] if we could cheaply make repos discoverable and browsable and it didn't hurt any of the maintainability etc. then it'd be different but I think it's simplest now to stay small? [20:29:34] oh, we will do ^ [20:29:38] with schema browsing ui [20:29:43] https://schema-beta.wmflabs.org/ [20:29:45] will ahve them all [20:29:46] I know I advocated on-ticket potentially going with functional naming to map to EG instances, but I recognize that the arch just doesn't want that to happen [20:29:51] if yall want to improve that please do! [20:29:59] oh so that does it? hmm [20:30:12] yeah i don't wnat to name after EG [20:30:17] i think they are poorly named as is [20:30:18] well then it will be usability trouble because of the "where do I commit this" problem [20:30:38] yeah, i think 2 is best even so [20:30:57] MAYBE MAYBE MAYBE one day someone will have a use case for another repo, and we can deal with that then. 
[20:31:00] but for now we can say 2 only [20:31:26] I can buy the argument for 2 [20:31:28] i also dont' like 'secondary', i'm ok with primary just because it doesn't necessarily need a 1,2,3 ordering [20:31:30] it just means its the main one [20:31:55] central? [20:31:56] what about going more in-depth on the JSONSchema stuff and calling it like 'root' [20:32:24] ? [20:32:27] heh, the other one would be branches? [20:32:27] and have the other one be like, idk, 'child' sounds real dumb, as does 'nonroot' [20:32:36] yeah, I got nothing there lol [20:33:05] heh, no we do $ref from supplemental -> core, but that doesn't really mean it is a 'child' so explicilty i think [20:33:37] ok i'm ok with core [20:33:40] yeah, it's restricted and central, weird thing to find a good metaphor for [20:33:42] what about core/accessory ? [20:33:54] i'm ok with that too, why accessory over supplemental? [20:34:11] to me: supplemental sounds like good extra, accessory sounds like unncessary extra [20:34:17] really? it's the opposite for me [20:34:18] supplemental sounds optional to me [20:34:24] hm [20:34:31] not that this is the way to decide such things [20:34:32] lol [20:34:35] right [20:34:47] well, we do have 3 people, and we do represent 100% of the people who care about this [20:34:48] the definitions i found don't really help, they all say the same thing [20:34:53] hahahha [20:34:54] ottomata: LOL [20:35:04] thank you so much for caring by the way, i need carers [20:35:31] auxiliary accessory supplemental [20:35:42] ancillary [20:35:54] weird idea: how about "main"? [20:35:55] providing necessary support to the primary activities or operation of an organization, institution, industry, or system. [20:35:55] "the development of ancillary services to support its products" [20:36:00] like, I get core/main might be weird [20:36:06] but main is where most people need to go [20:36:07] main for 'analytics'? why? 
[20:36:16] ah no i thnik that will confuse things a lot more [20:36:17] because it's like the main schema repo anyone cares about [20:36:20] who is most? [20:36:23] product analysts? [20:36:34] everyone but us and platform team [20:36:36] or mediawiki developers [20:37:02] yeah, most mediawiki developers would go to this other thing. It feels weird to call it "second" or "other" or stuff along those lines, since it's only like that from our perspective [20:37:29] which other thing? [20:37:32] core would change only when core features change, hooks/new actions/new jobs, etc [20:37:34] but either way, agree. [20:37:54] the other (not core) repo would be where the vast majority of people using MEP would work out of [20:38:02] q: where would we put a webrequest schema? [20:38:04] i think in core? [20:38:05] so it has to be cooler than "secondary" I think [20:38:10] yes, core [20:38:21] we're thinking about turning the tables? [20:38:28] hah, no i hope not [20:38:30] mm, I dunno, doesn't that invert the tier-* thingy? [20:38:33] buut cooler than secondary, sure [20:38:40] not turning, just finding a word that isn't subjugated to core [20:38:55] accessory and synonyms are good, no? [20:38:56] but yeah I think having something that doesn't sound like a bag on the side of the system would be good for user state of mind [20:39:01] hm [20:39:01] accessory yea [20:39:09] ancillary? [20:39:18] too fancy [20:39:23] gets into limn territory :) [20:39:25] hehh [20:39:27] core/component would also be possible [20:39:32] lol I sense a past incident [20:39:41] limn is a great name [20:39:43] hip: I have to show you some code sometime, you'll love it [20:39:49] it is a great name, and EVERYONE hated it [20:39:51] the name was not its problem [20:39:52] haha [20:40:38] I like component, I'm cycling on that [20:40:47] really you guys think supplement is negative? 
[20:40:49] well I don't think it's possible to make the 'analytics' category sound cool, so the best is probably to make it not sound uncool [20:41:03] don't like component, it sounds like a building block [20:41:16] I'm trying to think of like a very organized city where the core is like the core police/fire/judicial services and the rest of the city is ... [20:41:26] ancillary? [20:41:30] :) [20:41:31] auxillary? [20:41:44] component, I have to agree, it's too software-y [20:41:50] auxiliary I think, no? [20:41:53] hard to spell [20:42:05] https://en.wikipedia.org/wiki/Ancillary_Justice#Setting_and_synopsis [20:42:06] yeah, but less fancy than anciliary, in the right direction [20:42:08] sounds like an ROTC squad, and kind of too fancy [20:42:20] common? [20:42:26] no [20:42:30] common could work if it was like [20:42:33] there is a common schema already [20:42:36] that will live in core [20:42:53] commons... [20:42:58] plebian [20:43:01] yeah, rabble [20:43:04] I think what I see in my head is like commons [20:43:06] lol [20:43:29] hello! where can I find the query that generates projectview_hourly? I would like to understand what the col "Project" is about. [20:43:39] djellel: one sec [20:43:41] so no aux, no ancillary, it soundsl ike you like suplemental :p [20:45:08] hm ,djellel https://github.com/wikimedia/analytics-refinery-source/blob/c6fab87b155c295f39af5c84e6ec06f417884e35/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java#L312-L374 [20:45:10] i think that is it? [20:45:24] djellel: it's here https://github.com/wikimedia/analytics-refinery/blob/master/oozie/projectview/hourly/aggregate_pageview_to_projectview.hql but that just selects "project" from the pageview_hourly table which is created here: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/hourly/pageview_hourly.hql#L34 which comes from pageview_info which is created by the UDF andrew linked to :) [20:45:49] accessory or supplemental eh? 
[20:46:12] public? [20:46:25] shared [20:46:54] protected/public, special/{regular,normal}, closed/open, internal/external, advanced/basic ... [20:49:08] hm protected [20:49:23] primary / accessory I like better than primary / supplementary because accessory seems to imply things like instrumentation etc. are non-essential peripherals -- accessories, if you will -- to the primary program [20:49:31] I like protected a lot, but I dunno about public [20:49:38] protected/shared might work [20:49:45] or protected/common [20:49:49] not common [20:50:19] ah right, then shared. but it does say the exact wrong thing about the references [20:50:22] don't love shared either, i know those are attempts at naming based on access privs, but it does sound like shared should be $refed from [20:50:27] ya [20:50:41] i'm ok with accessory too [20:50:49] core / accessory [20:50:50] primary / accessory ? [20:51:18] if I see primary, I want to put secondary there rather than accessory, so I'm biased for core / accessory, but [20:51:53] I do also see that 'core' seems to imply maybe too concretely a mapping between these schema and a certain piece of software [20:52:13] meh not too strongly tho [20:52:29] i don't feel the same way about primary -> secondary [20:52:30] yeah, it's not just "core" it's schema/event/core [20:52:41] ah that's true, that would loosen it up [20:52:53] wait it's event/core but the eventgate api is /v1/events? :p [20:53:00] primary just sounds like the primary stuff, doesn't mean there is necessarily a line of succession behind it [20:53:32] schemas/event/core..what's wrong with that? [20:53:36] with eg api? [20:53:56] oh plural vs singular you mean? [20:53:56] well eventgate uses event*s* and the schema uses event [20:53:59] ah [20:54:07] we could go either way on either one [20:54:19] i did plural on EG to make it clear you could post a list of events [20:54:24] you can?? [20:54:26] wait, really?
[20:54:27] ya [20:54:29] wait [20:54:35] so no client-side batching is just LARPing? [20:54:37] you actually can do that? [20:54:44] ya [20:54:45] I see.. [20:54:48] well then yeah events [20:55:05] I mean, let's shed one bike at a time [20:55:26] that's the main reason meta.stream has to be in the event [20:55:31] it would be more natural to make that part of the API [20:55:42] but we wanted to support posting to multiple streams in the same req [20:56:14] yeah, I like it. I feel like I knew this at one point and then forgot again [20:56:20] you probably have explained this before lol [20:56:29] ottomata, milimetric Thanks! [20:56:34] :) [20:56:54] anyway so ya; primary just sounds like the primary stuff, doesn't mean there is necessarily a line of succession behind it [20:57:32] schemas/event/core schemas/event/accessory [20:57:52] eeehhh it's so noun-y now that I see it [20:58:00] I wish we had a good complement for like, protected [20:58:10] schemas/event/protected and schemas/event/*** seems so much more businesslike [20:58:55] schemas/event/primary schemas/event/secondary, like [20:59:07] hm, so yeah the reason we have 2 is for access, but it'd be nice if we named it after the reason for the access differences, not just the access itself [20:59:14] in terms of documentation, i'm willing to say I like that even better than supplementary or accessory or anything fancy [20:59:18] just because it's dead simple [20:59:35] well, core and accessory is fine for that too [20:59:36] and i think none of us really like secondary [20:59:43] primary / secondary? 
[21:00:05] it'd just be nice if the things were obvious enough that they barely needed to be documented or explained [21:00:11] as to why they were that way [21:00:18] I feel like accessory, while I like it, needs explaining [21:00:22] hm [21:00:51] maybe it's fine [21:01:06] I can document that there are like two event levels, core/primary and accessory, or something [21:01:20] it can be real simple as long as they are seen together and after that you just know which is which [21:01:32] even if you don't remember the exact names, if the concept is there you'll get why there are 2 [21:02:27] i mean, hah, i like primary more than core, and i like accessory more than secondary :p [21:02:27] so I could go either way [21:02:39] if you and milimetric argued enough for one or the other [21:02:41] :) [21:03:05] I don't like any of them sadly, except maybe core [21:03:20] and I've been actively trying to like them for like an hour! [21:04:02] we could double down on the similarity with mediawiki and go with core/extensions [21:04:14] Yeah I only really am a fan of the pairings either core/accessory or primary/secondary. Failing a chance to communicate why the policy is what it is, I'd settle for protected/. Maybe there are other things out there, or these might look different to us tomorrow. [21:04:32] milimetric: yeah I considered extension, or core/extra, but it sounds weird to talk about 'extension events' or 'extra events' [21:04:39] yea [21:04:54] extension will confuse people i think [21:05:01] withdrawn [21:05:11] ok, so right now our top contenders are: primary/secondary or core/accessory [21:05:22] I understand that the 'core' repo isn't necessarily only for what we're today calling 'production' events, but I have to discuss that distinction enough that it'd be good to have better words that are documented and we are all using [21:05:25] i could go either way. 
let's mull on that for a day [21:05:42] k [21:05:54] sounds good [21:07:09] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (Ottomata) Wow this bike shed is going to look great! Just had a good IRC discussion with Jason and Dan. Our cont... [21:20:53] ottomata: milimetric: I'm also going to go there and say that I think /schema/events/core/Foo.json or whatever makes more sense than /schemas/event/core/Foo.json, since 'schemas' is a weak plural for schema in English, because 'schemata' is the only correct plural and it is totally inappropriate in almost all settings outside of writing philosophy papers, and more importantly because the thing after 'event' is not the schema for a parti [21:22:01] +1 philosophy is cute but I ain't got no time for that [21:33:05] hm, ok oh boy [21:33:22] am not too opinionated about this, but will offer a counter opinion for kicks [21:33:42] these are event schemas, vs i dunno, some other kind of schemas [21:33:58] putting schema first is just for hierarchy [21:34:07] so while schemas/event may read weird [21:34:13] really they are event schemas [21:34:23] where schemas is the thing they are, and event is just the type [21:34:32] database schemas, event schemas, table schemas, i dunno [21:34:36] we wouldn't do [21:34:42] schema/databases [21:34:47] schema/hives [21:39:49] joal: still dealing with logging? [21:39:56] nuria: trying [21:40:34] joal: did you have any new ideas? [21:40:58] nuria: nope - Trying to add a logback.xml to resources in jar [21:41:20] joal: all right! [21:41:34] joal: I think i can help with that, want to go to bed and let me try? [21:41:59] sure nuria, sounds good [21:42:01] Thanks for that [21:42:07] joal: k [21:42:15] going to bed then :) [21:42:31] ottomata: I agree with that.
what about /schema/event/core/foo then [21:43:37] ottomata: well let's take this up tomorrow if at all [21:45:05] k [21:45:16] ya i'd be good with that [22:02:41] joal: ok, going to try one more exclusion [22:25:18] (PS1) Milimetric: [WIP] What we need to do here disagrees with one of the core patterns of dashiki, I have to figure out how to bend/break it. [analytics/dashiki] - https://gerrit.wikimedia.org/r/551941 (https://phabricator.wikimedia.org/T236941) [23:12:38] ottomata: yt? [23:12:55] ottomata: off the top of your head, do you know if our logger is log4j or slf4j? [23:14:55] ottomata: from puppet seems like log4j