[09:47:36] (PS2) Awight: [WIP] Process EventLogging events for CodeMirror [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: Andrew-WMDE)
[09:53:48] (PS2) Awight: Sanitize and keep TemplateDataEditor events [analytics/refinery] - https://gerrit.wikimedia.org/r/646670 (https://phabricator.wikimedia.org/T260343)
[09:54:23] (CR) Awight: Sanitize and keep TemplateDataEditor events (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/646670 (https://phabricator.wikimedia.org/T260343) (owner: Awight)
[09:56:57] (CR) Awight: [C: +1] [WIP] Process EventLogging events for CodeMirror (4 comments) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: Andrew-WMDE)
[10:03:41] Is there a machine where regular NDA users can test new reportupdater-queries? This doc https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater#How_to_test? mentions a machine an-launcher1001.
[10:12:42] awight: hi! most EU people are off today, but I think that stat100x is probably the best way (checkout the repo, test, etc.)
[10:22:53] elukey: Thanks, that will be plenty to get me started :-)
[10:49:32] Hi awight - Sorry for missing the ping :S
[10:54:06] Quarry: some quarry fails silently with status 500 - https://phabricator.wikimedia.org/T269668 (Herzi.Pinki)
[11:14:27] Hi! Thanks joal, stat* is working well for me. I've found some query errors and inefficiencies in our patch to keep me busy, so I suppose it's not ready for analytics review anyway.
[11:15:10] ack awight - Feel free to ping, I'll try to be more responsive than last time!
[11:36:22] (CR) Awight: [WIP] Process EventLogging events for CodeMirror (1 comment) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: Andrew-WMDE)
[12:29:33] (PS3) Awight: [WIP] Process EventLogging events for CodeMirror [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: Andrew-WMDE)
[12:30:21] (CR) Awight: "PS 3 rewrites hive/sessions using common table expressions. Evaluation takes ~4 minutes of CPU time, so plenty fast." [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: Andrew-WMDE)
[12:54:54] (CR) Awight: [C: -1] [WIP] Process EventLogging events for CodeMirror (1 comment) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: Andrew-WMDE)
[14:23:02] (CR) Neil P. Quinn-WMF: "Marking all comments except one as resolved. Planning to fix the last one myself." (5 comments) [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/635578 (https://phabricator.wikimedia.org/T261953) (owner: Sbisson)
[15:01:09] (PS21) Neil P. Quinn-WMF: Oozie job for Wikipedia Preview stats [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/635578 (https://phabricator.wikimedia.org/T261953) (owner: Sbisson)
[15:01:43] Gone for kifs - back at standup
[15:16:05] (PS4) Awight: [WIP] Process EventLogging events for CodeMirror [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: Andrew-WMDE)
[15:17:01] (CR) Awight: "PS 4: Adds `date` and `wiki` columns, groups output by wiki."
[analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/645345 (https://phabricator.wikimedia.org/T260138) (owner: Andrew-WMDE)
[15:51:36] a-team: only thing on the deployment train is syncing pageview whitelist, shall I delay until tomorrow or anything?
[15:53:52] hmmm
[15:54:06] milimetric: i have a couple of refinery source changes it might be nice to get in
[15:54:09] but there's no hurry
[15:54:32] they are waiting for review
[15:54:34] ottomata: go ahead, not really worth it to do a whitelist deploy
[15:54:47] and you are a reviewer... :)
[15:54:48] I'll take a look, see if it's something I can review
[15:54:50] oh ok
[15:55:16] they touch pageview and webrequest code so i don't want to do it lightly
[15:59:32] (PS1) Lucas Werkmeister (WMDE): Sum up pp_sortkey instead of pp_value in lexemes.sql [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/647010
[16:01:18] (CR) Lucas Werkmeister (WMDE): "Testing on the stats machines, this changes the query time from ca. 1.6 to 0.3 seconds when the server is “warmed up” for the query." [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/647010 (owner: Lucas Werkmeister (WMDE))
[16:14:08] Analytics, Analytics-Kanban: Refine should always DROPMALFORMED but alert if records are dropped - https://phabricator.wikimedia.org/T266872 (Ottomata) Huh, this is actually pretty cool. It works! `lang=scala import spark.implicits._ val schema = StructType(Seq( StructField("a", StructType(Seq(Str...
[16:21:57] (PS1) Lucas Werkmeister (WMDE): Optimize SPARQL query for ranks [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/647012
[16:23:52] (PS2) Lucas Werkmeister (WMDE): Optimize SPARQL query for ranks [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/647012
[16:46:05] (sent e-scrum but I should be back for staff, just hopping out to pick up Ada)
[16:55:16] (CR) Neil P.
Quinn-WMF: [C: +2] "faw`" [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/635578 (https://phabricator.wikimedia.org/T261953) (owner: Sbisson)
[17:17:35] (CR) Ladsgroup: [C: +2] Optimize SPARQL query for ranks [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/647012 (owner: Lucas Werkmeister (WMDE))
[17:18:45] (Merged) jenkins-bot: Optimize SPARQL query for ranks [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/647012 (owner: Lucas Werkmeister (WMDE))
[17:19:50] (PS1) Lucas Werkmeister (WMDE): Optimize SPARQL query for ranks [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/646782
[17:21:31] (PS2) Ladsgroup: Optimize SPARQL query for ranks [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/646782 (owner: Lucas Werkmeister (WMDE))
[17:21:35] (CR) Ladsgroup: [C: +2] Optimize SPARQL query for ranks [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/646782 (owner: Lucas Werkmeister (WMDE))
[17:22:17] (CR) Ladsgroup: [C: +2] Sum up pp_sortkey instead of pp_value in lexemes.sql [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/647010 (owner: Lucas Werkmeister (WMDE))
[17:22:34] (PS1) Ladsgroup: Sum up pp_sortkey instead of pp_value in lexemes.sql [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/646783
[17:22:42] (Merged) jenkins-bot: Optimize SPARQL query for ranks [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/646782 (owner: Lucas Werkmeister (WMDE))
[17:22:46] (CR) Ladsgroup: [C: +2] Sum up pp_sortkey instead of pp_value in lexemes.sql [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/646783 (owner: Ladsgroup)
[17:23:04] (Merged) jenkins-bot: Sum up pp_sortkey instead of pp_value in lexemes.sql [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/647010 (owner: Lucas Werkmeister (WMDE))
[17:23:24] (Merged) jenkins-bot: Sum up pp_sortkey instead of pp_value in lexemes.sql
[analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/646783 (owner: Ladsgroup)
[17:55:43] Quarry, cloud-services-team (Kanban): Quarry down for logged in users - https://phabricator.wikimedia.org/T265997 (Andrew) Open→Resolved
[18:08:51] Hi Nettrom - Would you be available by any chance?
[18:09:07] joal: unfortunately in meetings for a couple of hours
[18:09:40] ack Nettrom
[18:10:18] Nettrom: My question was on usefulness of img_metadata field - No rush :)
[18:13:37] (CR) Neil P. Quinn-WMF: [V: +2 C: +2] Oozie job for Wikipedia Preview stats [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/635578 (https://phabricator.wikimedia.org/T261953) (owner: Sbisson)
[18:13:47] joal: I've jotted it down and will come back to it when I have a few minutes :)
[18:14:39] Thanks a lot Nettrom :) Actually let me keep archive happy and add that to the task
[18:17:48] (CR) Joal: "One last nit (test addition) then good for me." (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/632597 (owner: Fdans)
[18:17:52] fdans: --^
[18:17:59] Tell me if you think I'm too picky
[18:29:11] Analytics, Analytics-Kanban, Product-Analytics, Structured-Data-Backlog, Patch-For-Review: Add image table to monthly sqoop list - https://phabricator.wikimedia.org/T266077 (JAllemandou) After investigations based on @nettrom_WMF invaluable intuitions, the problem getting commonswiki image ta...
[18:54:51] joal: no worries if you're not around but I'm thinking more about the sanitization problem and realizing it's fairly substantial and we didn't budget for it in our goals yet.
[18:55:07] (maybe we should chat about it a bit)
[18:55:26] milimetric: sure let's do that
[18:55:53] To the cave
[19:46:29] joal: o/ ok tomorrow morning to restart hive to apply the DBTokenStore settings? If so I'll drain the cluster as soon as I join and restart the metastore/server
[20:06:59] hmm fkaelin scala q for you!
[20:07:01] or joal :)
[20:07:06] yt?
[20:08:19] ya?
[20:08:40] if I have a method on a case class that does not take parameters
[20:08:56] but it uses the case class's properties
[20:09:07] meaning its return value will always be the same for a given instance of that case class
[20:09:26] is there any reason not to make that method a val (or probably lazy val in this case)?
[20:10:00] that is, if the return value of a method is always the same...should I just make it a val?
[20:11:03] what do you mean by properties? making it a val would be fine, unless you need to be worried about initialization/serialization on the cluster.
[20:11:15] hmm in this case i don't think i do
[20:11:19] uhhh
[20:11:38] by properties i guess i mean like either the case class parameters, or ones I define in the class
[20:11:40] like
[20:12:13] case class A(propA: String) { val propB: String = propA + "_propB", ... }
[20:12:28] i'm referring to both propA and propB as 'properties', maybe that is not the right term
[20:13:20] val should be fine then. i call them instance variables
[20:13:39] aye
[20:13:39] cool
[20:13:48] would val be preferred?
[20:14:28] esp with lazy val, it's almost like memoizing the function
[20:18:28] if you see a val you know it is something that can be evaluated at class instantiation time, whereas with a def you signal that the value might change and needs to be evaluated when needed. that is useful information when reading code, so more often the benefit is that it signals intent
[20:18:39] hm
[20:18:58] i think in this case it can be evaluated at instantiation
[20:19:01] if it can...perhaps it should?
[20:19:05] actually
[20:19:10] i'm using lazy val already for some things
[20:19:11] for example
[20:19:16] loading a remote schema
[20:19:30] so, i guess my question is more about lazy vals
[20:19:54] i prefer to avoid using lazy val, usually only when i have serialization issues.
[20:20:24] * razzi afk for a bit to exercise!
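The val / lazy val / def distinction discussed above can be illustrated with a small sketch. It is not code from the patch under discussion: `SchemaJob`, `fetchSchema`, and `loadCount` are hypothetical names, and the "remote" fetch is simulated with a counter so the number of evaluations is visible.

```scala
// Hypothetical sketch: all names here are illustrative, not from refinery.
object LazyValDemo {
  // Counts how many times the simulated remote fetch has run.
  var loadCount = 0

  // Stand-in for an expensive request such as loading a remote schema.
  def fetchSchema(uri: String): String = {
    loadCount += 1
    s"schema-for-$uri"
  }

  case class SchemaJob(schemaUri: String) {
    // Strict val: computed exactly once, at instantiation time.
    val label: String = schemaUri + "_label"

    // lazy val: computed once on first access, then cached (memoized).
    lazy val schema: String = fetchSchema(schemaUri)

    // def: recomputed on every call.
    def schemaUncached: String = fetchSchema(schemaUri)
  }

  def main(args: Array[String]): Unit = {
    val job = SchemaJob("/example/schema")
    assert(loadCount == 0) // creating the instance fetched nothing

    val first = job.schema
    val second = job.schema
    assert(first == second)
    assert(loadCount == 1) // lazy val fetched only once

    val third = job.schemaUncached
    val fourth = job.schemaUncached
    assert(third == fourth)
    assert(loadCount == 3) // def fetched on each of its two calls

    println(s"total fetches: $loadCount")
  }
}
```

A strict `val` would have triggered the fetch inside the constructor, which is what the lazy val avoids here while still guaranteeing a single request per instance.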
[20:22:19] i guess i don't want to request the remote schema at instantiation
[20:22:28] but i also don't want to request it more than once
[20:22:37] so lazy val is an easy way to accomplish that
[20:23:49] if you create an instance of a class with `val` of a type that is not serializable, using this class in a spark job will fail. however, if you make it a lazy val and don't use the val before the class instance is serialized, it works since the val is only evaluated after deserialization on the remote host
[20:24:28] aye, in this case the class is more of a spark job launcher
[20:24:39] i'm using it to initialize the DataFrameReader
[20:24:43] among other things
[20:25:45] Heya was gone for dinner - ok for me elukey :)
[21:27:13] Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production - https://phabricator.wikimedia.org/T120242 (Clarakosi)
[21:45:16] Analytics, Event-Platform, Product-Infrastructure-Data: Client-side error logging should use Elastic Common Schema (ECS) fields when possible - https://phabricator.wikimedia.org/T267602 (Ottomata) > I think the http field might also be affected, and that one will be a bit trickier to reconcile. Just...
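The 20:23:49 remark about non-serializable vals can be reproduced without Spark. The sketch below assumes plain Java serialization as a stand-in for what happens when a class instance is shipped to an executor; `Connection`, `EagerHolder`, `LazyHolder`, and `roundTrip` are hypothetical names. It uses the common `@transient lazy val` form, which additionally keeps the field out of the serialized bytes even if it was already touched.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.util.Try

// Hypothetical sketch: Java serialization stands in for shipping to a remote host.
object LazySerializationDemo {
  // A resource type that is deliberately NOT Serializable.
  class Connection(val name: String)

  // Strict val: the Connection exists at instantiation, so serialization fails.
  class EagerHolder(val name: String) extends Serializable {
    val conn: Connection = new Connection(name)
  }

  // @transient lazy val: nothing is created until first access, and the
  // field is skipped when the holder is serialized.
  class LazyHolder(val name: String) extends Serializable {
    @transient lazy val conn: Connection = new Connection(name)
  }

  // Serialize to bytes and read back, as a stand-in for a network hop.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    // The eager holder carries a live, non-serializable Connection.
    assert(Try(roundTrip(new EagerHolder("eager"))).isFailure)

    // The lazy holder serializes fine; conn is built after deserialization.
    val copy = roundTrip(new LazyHolder("lazy"))
    assert(copy.conn.name == "lazy")
    println("lazy holder survived the round trip")
  }
}
```

Whether this matters for a class that only launches the job on the driver, as described at 20:24:28, depends on whether the instance is ever captured in a closure sent to executors.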
[22:10:21] (PS1) Ottomata: [WIP] Refine using PERMISSIVE mode and capture corrupt records [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872)
[22:11:51] Analytics, Analytics-Kanban, Patch-For-Review: Refine should report about malformed records and continue if possible - https://phabricator.wikimedia.org/T266872 (Ottomata)
[23:15:30] Analytics, Analytics-Dashiki: Chart data from analytics.wikimedia.org do not fully specify macOS versions - https://phabricator.wikimedia.org/T269722 (gh87)
[23:31:10] Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production - https://phabricator.wikimedia.org/T120242 (Ottomata) @claroski o/ Curious, is Platform Engineering not a relevant tag? Work on this would very likely affect things like ChangeProp a...