[06:42:00] Analytics, Pageviews-Anomaly: Manipulation of pageview statistics - https://phabricator.wikimedia.org/T232992 (Superbass) I'd suggest to remove the articles from the list for at least two months (I thought that happened already?). It is very unlikely that Mr. Sammet and co. will become famous in the next...
[08:12:56] Analytics, WMDE-Analytics-Engineering, WMDE-FUN-Funban-2019, WMDE-FUN-Sprint-2019-10-14, WMDE-New-Editors-Banner-Campaigns (Banner Campaign Autumn 2019): Implement banner design for WMDEs autum new editor recruitment campaign - https://phabricator.wikimedia.org/T235845 (awight) >>! In T235845...
[08:30:30] Analytics, Analytics-Kanban, User-Elukey: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (elukey) @Isaac @Neil_P._Quinn_WMF checking in, any issue with kerberos? Doubts/fears/etc.. ? :)
[08:38:57] Analytics, Analytics-Kanban: Enable TLS encryption for the MapReduce Shufflers in the Hadoop Analytics cluster - https://phabricator.wikimedia.org/T236995 (elukey) Added documentation to https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#New_Worker_Installation_(12_disk,_...
[08:39:08] Analytics, Analytics-Kanban: Enable TLS encryption for the MapReduce Shufflers in the Hadoop Analytics cluster - https://phabricator.wikimedia.org/T236995 (elukey)
[08:52:16] Analytics, Analytics-EventLogging, Analytics-Kanban, Performance-Team (Radar): Drop Navigationtiming data entirely from mysql storage? - https://phabricator.wikimedia.org/T233891 (elukey) @mforns When you are online can you ping me? I'd like to drop the above tables but with somebody triple check...
[09:02:35] Analytics, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Rerun sanitization before archiving eventlogging mysql data - https://phabricator.wikimedia.org/T236818 (elukey) High level plan that I have in mind: * review/merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/5481...
[10:28:26] * elukey bbiab
[10:59:44] Analytics, WMDE-Analytics-Engineering, WMDE-FUN-Funban-2019, WMDE-FUN-Sprint-2019-10-14, WMDE-New-Editors-Banner-Campaigns (Banner Campaign Autumn 2019): Implement banner design for WMDEs autum new editor recruitment campaign - https://phabricator.wikimedia.org/T235845 (GoranSMilovanovic) @aw...
[12:00:21] * elukey lunch!
[12:00:55] Analytics, WMDE-Analytics-Engineering, Wikidata, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Track WDQS updater UA in wikidata-special-entitydata grafana dashboard - https://phabricator.wikimedia.org/T218998 (Addshore)
[12:46:54] Analytics, Analytics-Kanban, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (JAllemandou) Thanks for the fast loop over format @Nuria and @BBlack. Indeed having a single field named `TLS` formatted as describe...
[13:26:56] Analytics, Analytics-Kanban: Add Mon Wikipedia to analytics setup - https://phabricator.wikimedia.org/T235747 (Urbanecm) @Nuria Wiki was created.
[13:37:20] Analytics, Better Use Of Data, Epic, Performance-Team (Radar), Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (fgiunchedi) >>! In T235189#5624242, @jlinehan wrote: >>>! In T235189#5623642, @Nuria wrote: >> Di...
[14:00:21] joal: o/
[14:11:39] Hi elukey :)
[14:13:21] bad news from the snakebite front, it doesn't support rpc encryption
[14:13:24] sigh
[14:13:43] :(
[14:14:15] it seems not super difficult to add the code, if we are lucky the spotify devs will give us some hints
[14:14:38] elukey: Seems worth it - Let's have a look
[14:16:08] also I've read today an article from 2016 (so relatively old) from Ebay, they were saying that RPC encryption caused a 5x slowdown in their namenodes
[14:16:16] WOW
[14:16:17] different jvms, etc..
[14:16:24] That's not so cool
[14:16:52] so nowadays it may be less impactful.. for example, we have TLS in shufflers and the hadoop docs say that it is impactful as well
[14:16:55] but it doesn't seem so
[14:17:19] so I am still inclined to keep rpc encryption, and remove it only if necessary
[14:17:37] works for me elukey
[14:17:51] Making me a bit afraid, but eh, encryption is good
[14:20:24] encrypt all the things!
[14:20:44] :)
[14:20:55] * joal wonders if he should encrypt himself
[14:29:43] Analytics, Better Use Of Data, Event-Platform, Product-Infrastructure-Team-Backlog, Epic: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (Ottomata) Thanks Jason!
[14:52:07] elukey: just sent a patch for AQS snapshot bump
[14:53:00] elukey: if you confirm puppet has run on all aqs machines, I can then apply using scap deploy --service-restart
[14:54:04] joal: can we do aqs1004 first? The last time Dan found an issue, namely two out of three druid brokers in a weird state
[14:54:09] we had to restart them
[14:54:37] I don't recall exactly what they were returning
[14:54:40] elukey: we can - just tested all new values using my local AQS, and found no issue - But I'm happy to do it gently
[14:54:55] ah okok, then it should be fine
[14:55:01] can we document all this?
[14:55:11] I am happy to remove ops from the picture as much as possible
[14:55:14] elukey: Please define "all this" :)
[14:55:42] the testing step that you did
[14:56:04] I guess that you connected your aqs to cassandra in prod right?
[14:56:10] and then tested some queries
[14:56:21] if so, it could be a good procedure to document
[14:56:23] heading home back in a bit!
[14:56:37] as pre-step before using scap deploy service restart
[14:56:51] (puppet ran on all aqs hosts in the meantime)
[14:57:05] elukey: I have a script doing cache warming up through plenty of querying ;)
[14:58:02] elukey: I'll document how I do that yes
[14:58:09] my point being: I would not be able to carefully test AQS as you do, I might have an idea but surely I'd miss some steps
[14:58:26] ideally if me/you are out people should be able to safely deploy
[14:58:29] !log restarting AQS using scap after snapshot bump (2019-10)
[14:58:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:58:31] thanks :)
[15:07:48] ok AQS looks good
[15:08:23] o/ morning
[15:09:43] Hi milimetric :)
[15:09:50] yall already did the snapshot, that was my job this week :)
[15:10:01] Arf- sorry for that :)
[15:10:05] heya teammmm
[15:10:05] I'll take a look at alarms and then who wants to pair to fix the geoeditors?
[15:10:29] milimetric: I need to drop for kids soon - I can do it later if you want :)
[15:10:39] np, maybe mforns?
[15:10:57] milimetric, sure!
[15:12:22] ok mforns, did you see this: https://phabricator.wikimedia.org/T237072#5627615
[15:12:45] milimetric, reading
[15:14:56] milimetric, ok, bc?
[15:15:03] omw
[15:15:07] k
[15:19:53] Analytics, Analytics-Kanban: Prepare the Hadoop Analytics cluster for Kerberos - https://phabricator.wikimedia.org/T237269 (elukey)
[15:26:55] Analytics, Analytics-Kanban: Prepare the Hadoop Analytics cluster for Kerberos - https://phabricator.wikimedia.org/T237269 (elukey)
[15:37:01] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (elukey) @Ottomata I am wondering if we could simply configure bacula to copy the meta backup's files as we do for Archiva. This will allow us to re...
[15:43:04] Analytics, Pageviews-Anomaly: Manipulation of pageview statistics - https://phabricator.wikimedia.org/T232992 (Der_Keks) That's why I suggested 2 month :)
[15:49:35] (PS2) MNeisler: Add the MobileWebUIActionsTracking schema to EventLogging whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563)
[15:55:18] Analytics, Analytics-Kanban, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Nuria) @BBlack: we can take a stab at modifying code on VCL if you can CR since that needs to happen before the varnishkafka changes
[15:57:15] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (Ottomata) 'the meta backup's files'? You mean the mysqldumps? In general yes if we have another way of storing versioned backups (mysqldump+bac...
[15:59:06] (PS2) Milimetric: Fix monthly insert and publish query [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072)
[16:00:47] ping ottomata fdans elukey
[16:01:50] ping ottomata elukey standdduppp
[16:02:00] coming :)
[16:04:55] (PS1) Milimetric: Add mnw.wikipedia as it graduates the incubator [analytics/refinery] - https://gerrit.wikimedia.org/r/548300
[16:05:20] (CR) Milimetric: [V: +2 C: +2] Add mnw.wikipedia as it graduates the incubator [analytics/refinery] - https://gerrit.wikimedia.org/r/548300 (owner: Milimetric)
[16:05:32] Analytics, Analytics-Kanban, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Ottomata) Hm, another idea... > I don't think sub-objects or arrays are supported by varnishkafka. We'll have to set each one as a...
[16:05:46] AHHHH
[16:06:54] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (elukey) >>! In T231208#5631975, @Ottomata wrote: > 'the meta backup's files'? You mean the mysqldumps? > > In general yes if we have another wa...
[16:11:45] (PS3) Milimetric: Fix monthly insert and publish query [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072)
[16:12:09] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (Ottomata) Ah, no. We didn't upload to HDFS until Nuria asked that we start keeping historical backups. We should def disable the HDFS upload part...
[16:13:10] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Correct namespace zero editor counts on geoeditors_monthly table on hive and druid - https://phabricator.wikimedia.org/T237072 (mforns) @JAllemandou > This retention period feels small!! Is the deletion scheme deleting more than...
[16:30:20] Analytics, Analytics-Kanban, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Ottomata) > - Changes the serialization code path we've been using to produce webrequest for years We discussed this in Analytics s...
[16:34:31] (PS1) Cicalese: Update to include 1.35 and 1.36. [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/548306
[16:35:43] Analytics, Analytics-Kanban, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (BBlack) Agreed, let's not go down that road right here (because we have a burning need for this data pronto), but side note to keep...
[16:40:24] (PS2) Cicalese: Update to include 1.35 and 1.36. [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/548306
[16:41:25] Analytics, Analytics-Kanban: Create a script to ease the Oozie work while enabling kerberos in Hadoop - https://phabricator.wikimedia.org/T237271 (fdans) a:JAllemandou
[16:41:35] Analytics, New-Readers: Add KaiOS to the list of OS query options for pageviews in Turnilo - https://phabricator.wikimedia.org/T231998 (SBisson) My PR on uap-core has been open for almost 2 months without any feedback. Anyone knows the maintainers of this project?
[16:41:55] (PS3) Cicalese: Update to include MW 1.35 and 1.36 and PHP 7.3. [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/548306
[16:43:21] (PS4) Cicalese: Update to include MW 1.35 and 1.36 and PHP 7.3 and 7.4. [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/548306
[16:44:04] Analytics, Analytics-Kanban, Patch-For-Review: Prepare the Hadoop Analytics cluster for Kerberos - https://phabricator.wikimedia.org/T237269 (fdans) p:Triage→High
[16:46:05] Analytics, Analytics-EventLogging, MediaWiki-ContentHandler: Allow YAML as an alternative for JSON on MediaWiki pages - https://phabricator.wikimedia.org/T237136 (Ottomata) Sounds fine to me however this is done! However, [[ http://schema-beta.wmflabs.org/#!//mediawiki/jsonschema/mediawiki/revision/...
[17:02:08] Analytics, Analytics-Kanban, Multimedia, Tool-Pageviews: Create script that returns oozie time intervals every time a coordinator is started from a cron job - https://phabricator.wikimedia.org/T237119 (fdans) p:Triage→High
[17:07:50] Analytics: Update data-purge for processed mediawiki_wikitext_history (6 snapshot kept, 3 would be sufficient) - https://phabricator.wikimedia.org/T237047 (fdans) p:Triage→Normal
[17:18:32] Analytics: Update data-purge for processed mediawiki_wikitext_history (6 snapshot kept, 3 would be sufficient) - https://phabricator.wikimedia.org/T237047 (mforns) @JAllemandou > Change proposal: Remove the lists from https://github.com/wikimedia/analytics-refinery/blob/master/bin/refinery-drop-mediawiki-sn...
[17:19:49] Analytics, Growth-Team, Product-Analytics: Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (Nuria) >Is Analytics Engineering able to apply a different timeline to some schemas and not others? >Can parts of the schema be purged at 90 days, and the rest at 270 days...
[17:25:34] mforns: can you check https://phabricator.wikimedia.org/T236818 if you have time later on? (mentioning it to avoid forgetting :)
[17:26:13] Analytics, Analytics-Kanban: Add Mon Wikipedia to analytics setup - https://phabricator.wikimedia.org/T235747 (Nuria) ping @Milimetric that will be adding this to pageview whitelist table.
[17:27:28] (CR) Nuria: [C: +2] Add the MobileWebUIActionsTracking schema to EventLogging whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: MNeisler)
[17:27:47] Analytics, Tool-Pageviews: Topviews Analysis of the Hungarian Wikipedia is flooded with spam - https://phabricator.wikimedia.org/T237282 (Bencemac)
[17:28:52] Analytics, Tool-Pageviews: Topviews Analysis of the Hungarian Wikipedia is flooded with spam - https://phabricator.wikimedia.org/T237282 (Bencemac)
[17:29:26] elukey, reading
[17:32:21] (CR) Nuria: "One comment, good catch on "distinct_editors > 0 " filtering" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072) (owner: Milimetric)
[17:33:30] Analytics, Analytics-Kanban: Add Mon Wikipedia to analytics setup - https://phabricator.wikimedia.org/T235747 (Nuria) codechange is done: https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/548300/ but record needs to be inserted on table still, I think.
[17:35:17] mforns: if you are ok I'll also run the sanitization script with --older-than 0
[17:35:44] elukey, yes, was reading that
[17:36:03] elukey, I don't see any issues in the plan! sounds good :]
[17:36:43] mforns: or I can just change the option in the systemd timer, and let it do its thing.. Then do a cleanup tomorrow of all the timers
[17:36:48] going to send a patch
[17:37:15] elukey, yes, makes sense also
[17:42:02] Analytics, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Rerun sanitization before archiving eventlogging mysql data - https://phabricator.wikimedia.org/T236818 (mforns) @elukey, LGTM!
[18:05:07] Analytics, Research: Taxonomy of new user reading patterns - https://phabricator.wikimedia.org/T234188 (MGerlach) == Documentation on meta added summary of results and approach on meta: https://meta.wikimedia.org/wiki/Research:New_user_reading_patterns
[18:07:37] Analytics, Desktop Improvements, Event-Platform, Readers-Web-Backlog (Kanbanana-2019-20-Q2): [SPIKE 8hrs] How will the changes to eventlogging affect desktop improvements - https://phabricator.wikimedia.org/T233824 (ovasileva) a:phuedx→None
[18:10:35] Analytics, Analytics-EventLogging, QuickSurveys, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (Jdlrobson) @Isaac let me know if I can help in any way with identifying to whether that miss...
[18:23:00] Analytics, Pageviews-Anomaly: Manipulation of pageview statistics - https://phabricator.wikimedia.org/T232992 (Superbass) I would like to add that we now regularly receive complaints in OTRS about this manipulation. It's a bit embarrassing that nothing has changed effectively so far.
[18:34:40] ok mforns, I'm back
[18:35:02] I ran some basic vetting on the nulled-out copied data
[18:35:18] same exact number of rows per wiki and per activity level
[18:35:49] so my thought was when you're ready to merge the change, to deploy and run jobs. Looking at the last comments on there
[18:38:21] hey milimetric
[18:39:36] ah, nuria that's a good point about the list of wikis. There's no explicit whitelist, I think we should create that.
[18:42:51] ok, milimetric, let's run the job then, after we solve this whitelist thing, isn't the geoeditors_monthly data already filtered to not include private wikis?
[18:43:32] doesn't it come from the mediawiki_history?
[18:43:45] mforns: no, this comes straight from cu_changes via geoeditors_daily
[18:43:51] ah right
[18:44:27] I'll look at the rest of the pipeline for whitelisting
[18:46:00] it should be a blacklist no?
[18:49:39] mforns: I don't think so, because then we'd have to remember to add things to it like this year's wikimania wiki or something. I see that we exclude private wikis from the project namespace map download: https://github.com/wikimedia/analytics-refinery/blob/master/bin/download-project-namespace-map#L130
[18:49:49] so maybe we can just join to that?
[18:50:47] join to the mediawiki_project_namespace_map? makes sense :]
[18:51:04] yes, that's great
[18:51:18] select distinct dbname, hostname from mediawiki_project_namespace_map where hostname like '%.wikipedia.org' and hostname not like 'test%' and snapshot='2019-09';
[18:51:54] ok, submitting patch
[18:51:59] yep
[18:53:05] anyone: why would I get a java.lang.OutOfMemoryError when querying `WHERE {start} <= timestamp <= {end}` but not when querying `WHERE day = {day}, month={month}, year={year}` if {start} and {end} are ten seconds apart
[18:54:56] Analytics, Analytics-Kanban, Operations, Traffic, and 2 others: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (BBlack) Patches above look sane? I went ahead and shortened the key names down to the minimum to prevent bloat at these layers. We can...
[18:56:09] * elukey off!
[18:58:30] lexnasser, in hive data is organized in partitions, each partition corresponds to a time range, i.e. 1 hour, or 1 month
[18:59:09] if a data set has monthly granularity
[18:59:29] it will organize data in a directory tree that looks like this:
[18:59:59] data/year=<year>/month=<month>/
[19:00:16] so, now I come to the point:
[19:00:55] when you specify in Hive sth like: WHERE year=2019, Hive does "partition pruning"
[19:01:20] it recognizes that you just want to query 2019 and only reads data/year=2019/*
[19:01:47] mforns: got it. Thanks for the detailed explanation. so is the best way to find data between two arbitrary timestamps to first partition by day/hour and then to filter that by timestamps?
[19:04:16] lexnasser, one possible solution is to specify some partition-pruning conditions that reduce the data that is going to be read, and then also use the {start} <= timestamp <= {end} condition to fine-filter the data
[19:04:27] depends on the size of the data set as well
[19:05:54] Analytics, Desktop Improvements, Event-Platform, Readers-Web-Backlog (Kanbanana-2019-20-Q2): [SPIKE 8hrs] How will the changes to eventlogging affect desktop improvements - https://phabricator.wikimedia.org/T233824 (Jdlrobson) Okay. I took a stab at this in Sam's absence. I've tried to simplify t...
[19:06:02] is your query going to need data across partitions? Meaning, does the query process data belonging to more than 1 partition at the same time?
[19:08:02] mforns: I haven't decided on the specifics yet, but I was planning on getting 24-hours of data across two days (ex. 8:34pm-11:59pm November 1, 12:00am-8:33pm November 2)
[19:09:04] lexnasser, I see
[19:09:57] (PS4) Milimetric: Fix monthly insert and publish query [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072)
[19:11:59] lexnasser, then I'd probably use: WHERE (year={start_year} AND month={start_month} AND day={start_day} OR year={end_year} AND month={end_month} AND day={end_day}) AND timestamp >= {start} AND timestamp < {end}
[19:12:30] ok nuria take a look at that approach to make sure no private wikis is released. Basically we're just using what's defined in the sitematrix. That would give us the best chance to stay up to date I think. I suppose if something is not marked public while we are processing, we could miss a wiki temporarily, but it's not something I would expect to happen - if something's private it would normally be marked that way from the
[19:12:31] beginning
[19:12:51] lexnasser, the (... OR ...) would make sure you only process 2 days of data, while the timestamp comparison would slice the exact time range.
[19:13:16] this assuming that start and end belong to two consecutive days
[19:14:08] mforns: if you merge that change, I'll deploy and rerun, unless nuria has any objections
[19:14:09] milimetric, +1 to using the sitematrix, that should be the single source of truth
[19:14:17] ok, looking
[19:14:36] in the meantime I'll try to become cool and use notebooks like the rest of the kids
[19:14:45] heh
[19:15:22] mforns: great, I'll try that out. Thanks so much!
[19:15:34] lexnasser, no problemo :]
[19:16:18] Analytics, Analytics-Kanban, Operations, Traffic, and 2 others: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Ottomata) Hm, would be ok with me, but likely whatever we choose we'll be stuck with forever. I tend to prefer descriptive names in gene...
[19:23:04] Analytics, Desktop Improvements, Event-Platform, Readers-Web-Backlog (Kanbanana-2019-20-Q2): [SPIKE 8hrs] How will the changes to eventlogging affect desktop improvements - https://phabricator.wikimedia.org/T233824 (Ottomata) A think a big conceptual difference is the distinction between a schema...
[19:23:41] (CR) Mforns: [C: +1] "LGTM!" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072) (owner: Milimetric)
[19:24:38] mforns: I can just not mention that param in the coordinator and it gets passed fine to the workflow?
[19:24:59] it gives me errors sometimes about required parameters blah blah
[19:25:31] but also, if that's true, that's called -1 Fix this shit not +1 LGTM!!!
[19:25:42] milimetric, yes, it gets passed anyway. But we put it there, so the job fails fast in case the param is missing: better to fail in the coordinator than in the workflow
[19:25:45] Analytics, Analytics-Kanban, Operations, Traffic, and 2 others: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (BBlack) >>! In T233661#5633090, @Ottomata wrote: > Hm, would be ok with me, but likely whatever we choose we'll be stuck with forever. I...
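A minimal HiveQL sketch of the partition-pruning pattern mforns describes above for lexnasser's cross-day time range; the table and column names (events, ts) and the timestamp format are illustrative placeholders, not the actual dataset being queried:

```sql
-- Prune to the two day-partitions first, then slice the exact time range.
-- Table name, column names and timestamp format are assumptions.
SELECT *
FROM events
WHERE (
        (year = 2019 AND month = 11 AND day = 1)
     OR (year = 2019 AND month = 11 AND day = 2)
      )
  AND ts >= '2019-11-01T20:34:00'   -- {start}
  AND ts <  '2019-11-02T20:34:00';  -- {end}
-- Hive only reads the two matching day partitions, so the timestamp filter
-- scans two days of data instead of the whole table.
```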
[19:26:01] xD no no you did it how it should be done
[19:26:06] I guess
[19:26:13] I see, then yeah it's better this way, it's just oozie sux
[19:26:13] ok
[19:26:29] I mean it should fail fast anyway, it's just silly
[19:26:35] yea
[19:26:45] Analytics, Analytics-Kanban, Operations, Traffic, and 2 others: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (BBlack) Nevermind, I see it in the gerrit comments
[19:27:43] milimetric, should I merge, or wait for nuria?
[19:28:01] should we test the job before merging milimetric?
[19:28:07] um... check again when you're about to leave, if she hasn't commented, let's merge
[19:28:16] ok
[19:28:17] mforns: I tested the query, the rest doesn't really change
[19:28:26] I'm confident in at least that level of oozie change :)
[19:29:05] but I tested the query with hive -f with -d params to make sure that's all good, and looked that the data was the same as the previous run which it mostly should be
[19:30:22] mforns: also, here's what I'm vetting so far:
[19:30:25] https://www.irccloud.com/pastebin/clJe4KkI/
[19:30:41] let me know what you think and if there are other rules you see
[19:30:59] Analytics, Analytics-Kanban, Operations, Traffic, and 2 others: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Ottomata) > I'm not sure if there's limitations on overall length of the varnishkafka inputs/outputs. Shouldn't be from varnishkafka, but...
[19:32:37] oh, cool, looking milimetric
[19:37:30] milimetric, maybe check that there are no nulls anywhere?
[19:37:43] in country or wiki?
[19:42:13] milimetric, yes, and also... I mean, all fields are non-nullable right?
[19:42:41] sure but the other rules take care of null checks, since null + 9 != null
[19:42:49] oh, of course
[19:43:53] milimetric, we could also query for some histograms of country frequency, or wiki frequency, or even activity_level
[19:44:52] those are not invariants, but usually let you see if something is wrong
[19:45:01] no?
[19:45:30] maybe: wiki LIKE '%wiki'?
[19:49:26] histograms... hm, see this is where I don't have any intuition. Like, I could see how you could spend a month coming up with checks like these, but I have no idea if that would be useful or not
[19:49:32] you think we should do histograms?
[19:49:45] the wiki like wiki... ok, sure
[19:49:52] (PS3) Ottomata: Bump spark.version to Spark 2.4.4 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/542226 (https://phabricator.wikimedia.org/T222253)
[19:51:03] Analytics, Analytics-Kanban, Operations, Traffic, and 2 others: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (JAllemandou) I checked for message length in one day of webrequest, and we top at 4916 bytes. I think Kafka will be fine as per message-s...
[19:51:16] (PS4) Ottomata: Bump spark.version to Spark 2.4.4 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/542226 (https://phabricator.wikimedia.org/T222253)
[19:51:50] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Correct namespace zero editor counts on geoeditors_monthly table on hive and druid - https://phabricator.wikimedia.org/T237072 (JAllemandou) Makes sense - Thanks for the explanation @mforns :)
[19:55:26] (CR) Joal: [C: -1] "The job misses the project_namespace_map dataset dependency in order not to start before it being available." [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072) (owner: Milimetric)
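A rough sketch of the "join to the namespace map" whitelist idea milimetric and mforns agree on above, built from the inline query; the geoeditors table and its wiki column name (wiki_db), as well as the snapshot value, are assumptions rather than the exact patch that was submitted:

```sql
-- Keep only wikis present in the sitematrix-derived namespace map,
-- which already excludes private wikis. Names are assumed, not verified.
SELECT g.*
FROM geoeditors_monthly g
JOIN (
    SELECT DISTINCT dbname
    FROM mediawiki_project_namespace_map
    WHERE hostname LIKE '%.wikipedia.org'
      AND hostname NOT LIKE 'test%'
      AND snapshot = '2019-09'
) public_wikis
  ON g.wiki_db = public_wikis.dbname;
```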
[19:58:42] doh, thanks jo
[19:58:45] (CR) Nuria: [C: +1] "nice" [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072) (owner: Milimetric)
[20:00:04] (CR) Nuria: [C: -1] "Correcting, after joseph's comment" [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072) (owner: Milimetric)
[20:01:09] Heya nuria - I'm reading the patch for varnishkafka, and I realize there are too many things I don't know about in that :)
[20:02:01] nuria: for instance, why `VCL_Log:` here? I assume it's because the field comes from a different place than the other ones (namely the VCL-log) - Is that correct?
[20:02:55] nuria: also, why `x` at the end of the field? It seems not to be related to value type
[20:03:41] oozie really does just waste all of our time :(
[20:05:44] (PS5) Milimetric: Fix monthly insert and publish query [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072)
[20:05:59] testing full oozie job now
[20:06:28] nuria: see my notebook here: /user/milimetric/notebooks/Vet%20Geoeditors%20Bucketed.ipynb
[20:06:53] (is there an easier way to share these?)
[20:12:17] (PS5) Joal: Bump spark.version to Spark 2.4.4 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/542226 (https://phabricator.wikimedia.org/T222253) (owner: Ottomata)
[20:12:48] ottomata: I updated your patch here having looked more closely at refinery-job --^ I hope you don't mind :)
[20:13:15] ottomata: We'll also need changes in some oozie jobs once released
[20:13:45] aye joal
[20:13:48] working on plan here
[20:13:49] https://etherpad.wikimedia.org/p/analytics-spark
[20:14:07] I can provide CR for the oozie changes (minimal)
[20:14:35] sure!
[20:14:54] joal: do we want this here?
[20:14:55] https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/542226/5/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/MediawikiXMLDumpsConverter.scala
[20:14:59] in our upgrade change?
[20:15:30] do we want to change the output as part of the upgrade (or is already set in caller somewhere?)
[20:15:43] ottomata: The already runs with avro - I can leave default as parquet if you prefer, but should be amended
[20:15:53] +oozie job
[20:15:55] sorry
[20:16:14] no no, is ok then. i'm ok with changing default here as long as we aren't changing actual behavior
[20:16:21] ack :)
[20:17:03] And actually the oozie patch will not be that minimal - I had forgotten about the explicit oozie_spark_lib param
[20:17:08] :)
[20:17:39] ottomata: Shall we go with spark2.4.0 as new oozie lib name?
[20:18:18] Or will it be spark2.4.4 actually
[20:22:15] joal:
[20:22:16] it will be
[20:22:19] spark-2.4.4
[20:23:08] since
[20:23:08] https://gerrit.wikimedia.org/r/c/operations/puppet/+/543474
[20:23:19] we had originally had spark2
[20:23:27] and then started versioning the spark2 version
[20:23:48] but recently i decided to just start versioning the 'spark version' independently of the package name
[20:23:58] will be more future proof if we get a spark 3 or whatever
[20:24:36] joal: do we have a good way of testing the spark shuffle service?
[20:24:37] ottomata: current value for lib is spark2.3.1
[20:24:41] ya
[20:24:44] changing to spark-2.4.4
[20:24:52] with hyphen
[20:25:07] ottomata: This would have been my comment? Hyphen hyphen?
[20:25:09] testing shuffle service; i'd like to run a test job on cluster before re-enabling jobs
[20:25:45] ottomata: we tested it on test cluster, but it'd be good to have another test before relaunching everything on prod cluster, yes
[20:31:09] milimetric: on meeting, will look at notebook in a bit
[20:31:11] (PS1) Joal: Update oozie jobs to use spark 2.4.4 [analytics/refinery] - https://gerrit.wikimedia.org/r/548494 (https://phabricator.wikimedia.org/T222253)
[20:31:23] (CR) Mforns: [C: -1] "I left some comments, please have a look :]" (4 comments) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/548306 (owner: Cicalese)
[20:39:10] (PS5) Cicalese: Update to include MW 1.35 and 1.36 and PHP 7.3 and 7.4. [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/548306
[20:39:33] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Upgrade Spark to 2.4.x - https://phabricator.wikimedia.org/T222253 (Ottomata) Plan here: https://etherpad.wikimedia.org/p/analytics-spark
[20:39:41] (CR) Cicalese: "Thanks for the review!" (2 comments) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/548306 (owner: Cicalese)
[20:48:40] CindyCicaleseWMF, thanks for the quick patch, will merge this now, and re-run version_simple since 2018-10-24
[20:48:54] @mforns Thank you!!
[20:52:40] @mforns I think you need to re-run php_drilldown since 2018-06-24 as well to convert the Others there into what I hope is PHP 7.3
[20:57:45] elukey: you gone?
[20:58:18] CindyCicaleseWMF, I see... This will take some computation resources: 66 weeks (approx) * 5 reports * 25 minutes = 6 days of computation non-stop
[20:58:50] @mforns: wow
[20:58:53] I will check with the team to see if this is ok
[20:59:09] mforns: thanks!
[20:59:45] CindyCicaleseWMF, no problem :] I'll let you know tomorrow
[21:00:57] mforns: no rush, if you have to wait for a quiet time or to do it in batches. But, it would be good to know if folks are starting to use later versions of PHP.
[21:01:13] CindyCicaleseWMF, of course!
[21:09:55] (CR) Milimetric: [C: +1] "oozie job tested (https://hue.wikimedia.org/oozie/list_oozie_coordinator/0004669-191031155137252-oozie-oozi-C/)" [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072) (owner: Milimetric)
[21:10:52] btw, sorry for whitelist alarms, I'm waiting to deploy the cluster with the geoeditors fix
[21:12:16] (CR) Joal: [C: +1] "Thanks milimetric for the addition" [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072) (owner: Milimetric)
[21:20:53] Analytics-Kanban, Better Use Of Data, Event-Platform, Product-Infrastructure-Team-Backlog, and 4 others: Create new eventgate-logging deployment in k8s with helmfile - https://phabricator.wikimedia.org/T236386 (Ottomata) Hm, another issue! For MirrorMaker replication reasons, we prefix topics in...
[21:52:25] Analytics, Analytics-EventLogging, Event-Platform, CPT Initiatives (Modern Event Platform (TEC2)), Services (watching): Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (Ottomata) Based on discussions in {T233432}, I am considering creating 2...
[21:58:36] Analytics, Analytics-Kanban, Operations, Traffic, and 2 others: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Nuria) @BBlack: once we deploy the VCL/varnish-kafka changes we need to change our refine pipeline to read these values, when we deploy t...
[22:10:20] milimetric: for the geoeditors data did we rerun the scripts for 2019-09?
[22:12:12] milimetric: i think we still need to re-run the data for 2019
[22:12:23] https://www.irccloud.com/pastebin/be9nEDWZ/
[22:12:29] as this returns no results
[22:16:00] milimetric: added section to document issue with 1-4 column : https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Geoeditors#Changes_and_known_problems_since_2019-09
[22:22:56] Analytics, Analytics-Cluster, Operations, ops-eqiad: analytics1062 lost one of its power supplies - https://phabricator.wikimedia.org/T237133 (Jclark-ctr) unfortunately not a loose power cord Submitted Tech direct ticket for replacement psu Service Request 1001998096
[22:34:09] milimetric: in dataset "namespace_zero_distinct_editors" always < distinct_editors right?
[22:38:20] nuria: no, we are waiting to merge/deploy, but it looks like everyone +1 right?
[22:38:40] so that'll populate the data for 2019-09 and -10, but rest of 2019 is there
[22:38:46] milimetric: yes, but one sec , want to check one thing
[22:38:53] thanks for the docs
[22:39:20] and nuria no, ns0 distinct is not always < distinct, that's the tricky part that I realized after you found the bug
[22:39:39] milimetric: if you sum for all activity levels it should be right?
[22:39:46] nuria: yes
[22:40:05] milimetric: so that is what i was checking, one sec
[22:58:05] milimetric: so i run:
[22:58:09] https://www.irccloud.com/pastebin/IKHqXoGl/
[23:00:38] for several snapshots and i could not find records where the sum across activity levels returns negative results (indicating that given a wiki and a country and a month ns0 editors > distinct editors)
[23:02:20] milimetric: so let's run some more tests once data for 2019-09 and 2019-10 is populated
[23:02:46] milimetric: my notebook is named the same as yours, on my homedir on notebook1003
[23:08:13] ok, nuria, I did this:
[23:08:15] https://www.irccloud.com/pastebin/2a3sMM2Z/
[23:08:35] I'll let you know if it returns any results (no results means all's good)
[23:08:49] so then I'll deploy and run jobs and check again after jobs are done
[23:10:16] (CR) Milimetric: [V: +2 C: +2] "looks like everyone's happy, self-merging to deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/547707 (https://phabricator.wikimedia.org/T237072) (owner: Milimetric)
[23:28:07] !log deployed refinery
[23:28:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
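A hedged sketch of the vetting invariant nuria and milimetric discuss above (the actual pastebin queries are not reproduced): per wiki, country and month, namespace-zero distinct editors summed across activity levels should not exceed the overall distinct editor sum. Column and partition names are assumptions based on the conversation, not the real table definition:

```sql
-- Returns offending (wiki, country, month) groups; an empty result means
-- the invariant holds ("no results means all's good").
SELECT wiki_db, country_code, month
FROM geoeditors_monthly
WHERE month = '2019-09'
GROUP BY wiki_db, country_code, month
HAVING SUM(distinct_editors) < SUM(namespace_zero_distinct_editors);
```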