[07:03:58] goood morning :)
[07:04:04] happy new year folks
[07:32:28] Heya! Happy new year :)
[07:40:07] bonjour bonjour
[07:42:36] GoranSM: o/ - I noticed the failed jupyter notebook on stat1007, if you haven't used it in a while you might need to recreate it via https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#Resetting_user_virtualenvs
[08:04:55] o/
[08:05:17] bonjour dcausse
[08:05:19] hope you all had a nice break :)
[08:06:06] we had some jobs that failed with "GSS initiate failed"
[08:06:17] not sure where to look at first
[08:14:14] dcausse: I bet money that it is due to the analytics-hive.eqiad.wmnet migration.. what jobs? Oozie-related?
[08:14:26] if they are still using an-coord1001's creds they'll fail
[08:14:33] I recall to have sent a code change for some
[08:14:39] but I may have missed the rest :(
[08:15:12] ah I vaguely remember something related to this
[08:15:35] yes we have one oozie job that failed but some others scheduled through airflow
[08:15:44] dcausse: basically now Hive works with a new kerberos credential, so if we have to failover to another node we don't have to change 10000 jobs
[08:16:29] dcausse: I don't know where the configs are but if you grep for an-coord1001 I am pretty sure you'll find the culprits
[08:19:18] elukey: ok I see your patch: https://gerrit.wikimedia.org/r/c/wikimedia/discovery/analytics/+/643823/1/oozie/query_clicks/daily/coordinator.properties#14
[08:23:17] (CR) Joal: [V: +2 C: +2] "Merging" [analytics/refinery] - https://gerrit.wikimedia.org/r/653209 (https://phabricator.wikimedia.org/T270987) (owner: Gerrit maintenance bot)
[08:23:39] (CR) Joal: "Thanks for the fix!" [analytics/refinery] - https://gerrit.wikimedia.org/r/653063 (owner: Milimetric)
[08:23:56] dcausse: is the job failing the query click one?
[08:24:05] elukey: yes
[08:24:30] I think Erik tried to deploy this patch a while ago but he reverted it
[08:24:56] I'm trying the new hive connection with airflow jobs
[08:25:12] dcausse: ah yes I see https://hue.wikimedia.org/oozie/list_oozie_coordinator/0005131-201127102807975-oozie-oozi-C/
[08:25:16] it shows the old creds
[08:25:43] but in theory the coordinator file is picked from localhost when launching the oozie job, and since Erik didn't revert the repo (but only the hdfs files) it should have worked
[08:26:21] ah snap look at all those failures, really sorry :(
[08:29:23] elukey: np, not your fault!
[09:25:11] dcausse: how is it going? Do you need any help?
[09:25:54] elukey: going well so far, airflow jobs seem to be fixed, trying to fix the oozie one now
[09:28:23] (CR) Joal: "Some comments inline. Please feel free to discard them as you wish." (4 comments) [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/651794 (https://phabricator.wikimedia.org/T261953) (owner: Neil P. Quinn-WMF)
[09:29:29] dcausse: super
[09:35:19] (CR) Joal: [C: +1] "LGTM - One follow up on caching discussion but this is ready IMO." (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872) (owner: Ottomata)
[09:59:01] Analytics: Check home/HDFS leftovers of kaldari - https://phabricator.wikimedia.org/T271089 (MoritzMuehlenhoff)
[10:21:03] Analytics: Check home/HDFS leftovers of dcipoletti - https://phabricator.wikimedia.org/T271092 (MoritzMuehlenhoff)
[10:21:52] elukey: happy new year :D
[10:22:37] Amir1: to you too :)
[10:28:05] Analytics-Clusters, Analytics-Kanban: Deprecate the 'researchers' posix group - https://phabricator.wikimedia.org/T268801 (kostajh) @elukey I still need access for the #add-link project. (Sorry for missing the deadline!)
[11:04:10] Analytics, Operations, ops-eqiad, Patch-For-Review: Degraded RAID on an-coord1002 - https://phabricator.wikimedia.org/T270768 (elukey) The host boots, see T215183#6718961, but we still need to get the new disk :)
[11:16:33] joal: I was able to get an-coord1002 to boot, the raid1 worked (one disk is sufficient to boot etc..) but of course the BIOS needed a tweak
[11:17:20] elukey: I assume the bootloader was also present on the other disk?
[11:17:49] elukey: do you think we should put it in service with a single disk?
[11:18:46] joal: it was (even if I thought it wasn't) but the BIOS was instructed not to check other disks if the first one was broken
[11:19:02] Ah!
[11:19:07] it is fine to have it in service, it doesn't really do much at the moment
[11:19:19] okey :)
[11:19:44] elukey: informal discussion to get your opinion on related thoughts (RAID)
[11:21:15] elukey: I had been told that usually RAID disks would break close together - and therefore putting a machine in service with a single disk if the 2 RAID disks were from the same original setup would probably not be a good idea
[11:21:24] elukey: any opinion?
[11:22:51] joal: sometimes it may happen, and it is definitely something to take into consideration.. In theory we have several disk spares in the DC, so usually (non-holiday time etc..) with hot-swap it should be relatively quick for dcops to fix the issue
[11:23:04] ack
[11:23:35] elukey: my question was more about the correctness of what I had been told - It was long ago :)
[11:24:30] joal: nono I completely get your point, and it is a good one
[11:24:36] disks are really sneaky
[11:29:25] elukey: disk sneak peek would be fun :)
[11:29:46] * joal has difficulty with puns in this new year
[11:40:19] * elukey lunch!
[12:02:00] for when you have time: https://gerrit.wikimedia.org/r/c/operations/puppet/+/651640
[12:08:18] Analytics-Kanban, Operations, ops-eqiad: Degraded RAID on an-coord1002 - https://phabricator.wikimedia.org/T271098 (Peachey88)
[12:09:07] Analytics, Operations, ops-eqiad: Degraded RAID on an-coord1002 - https://phabricator.wikimedia.org/T271098 (Peachey88)
[12:24:59] Aloha! Welcome to 2021, the year that everything will hopefully be better.
[13:35:42] helllooOoOo! :)
[13:44:29] Heyoooo
[13:49:06] Good afterning!
[13:49:53] First week back at work: four job interviews scheduled :-o
[13:50:19] /o\
[13:50:38] Well, one got canceled by the candidate, thankfully the one tomorrow at 0930 :D
[13:51:19] things do change it seems :)
[13:53:09] Plus ça change
[13:57:21] hello!!
[13:57:41] hello team!
[14:00:39] hola hola
[14:01:18] helloooo teamm! happy new year :]
[14:02:28] Hello!
[14:04:53] heya
[14:11:54] (CR) Ottomata: Refine using PERMISSIVE mode and log more info about corrupt records (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872) (owner: Ottomata)
[14:24:51] !log deprecate the analytics-users group
[14:24:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:29:40] (PS1) Mforns: Raise data quality alert threshold for useragent hourly metric [analytics/refinery] - https://gerrit.wikimedia.org/r/654256
[14:31:16] Analytics-Clusters, Analytics-Kanban: Deprecate the 'researchers' posix group - https://phabricator.wikimedia.org/T268801 (elukey) >>! In T268801#6718886, @kostajh wrote: > @elukey I still need access for the #add-link project. (Sorry for missing the deadline!) No problem at all, I haven't removed the g...
[14:35:24] Analytics-Clusters, Analytics-Kanban: Deprecate the 'researchers' posix group - https://phabricator.wikimedia.org/T268801 (kostajh) >>! In T268801#6719357, @elukey wrote: >>>!
In T268801#6718886, @kostajh wrote: >> @elukey I still need access for the #add-link project. (Sorry for missing the deadline!) >...
[14:38:46] (CR) Ottomata: [C: +2] Update junit and netty versions for github security alert [analytics/refinery/source] - https://gerrit.wikimedia.org/r/651553 (https://phabricator.wikimedia.org/T237774) (owner: Ottomata)
[14:39:57] Analytics-Clusters, Analytics-Kanban: Deprecate the 'researchers' posix group - https://phabricator.wikimedia.org/T268801 (elukey) >>! In T268801#6719402, @kostajh wrote: >>>! In T268801#6719357, @elukey wrote: >>>>! In T268801#6718886, @kostajh wrote: >>> @elukey I still need access for the #add-link pr...
[14:59:50] Analytics, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (Isaac) Hey all -- happy new year!! I had differential privacy on the mind and my feeling...
[15:02:26] (CR) Ottomata: [C: +2] Refine - Add is_wmf_domain transform function [analytics/refinery/source] - https://gerrit.wikimedia.org/r/646828 (https://phabricator.wikimedia.org/T256677) (owner: Ottomata)
[15:07:38] hi team!
[15:08:13] hello!
[15:11:32] heya!
[15:20:01] joal o/ yt?
[15:25:05] am testing https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/651542
[15:25:21] the _REFINED flag is written if there are 0 events but refine succeeds
[15:25:27] however, no hive partition is added
[15:25:34] i wonder what the correct behavior is
[15:25:38] probably to also add the partition, right?
[15:37:13] (PS3) Ottomata: Add Refine TransformFunction to remove canary events [analytics/refinery/source] - https://gerrit.wikimedia.org/r/651542 (https://phabricator.wikimedia.org/T251609)
[15:38:24] joal yeah I think that is the right thing to do, updated https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/651542
[15:49:21] Analytics, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (JFishback_WMF) This looks awesome @Isaac! Can't wait to try it out.
[15:52:24] Aw crud, I forgot that our shops all close at 1900 atm due to COVID. I have to go buy some groceries, so I'll miss the standup likely. Will send in short update after.
[15:52:36] klausman: ack!
[16:01:20] fdans: standuppp
[16:15:56] (CR) Milimetric: [C: +1] Raise data quality alert threshold for useragent hourly metric [analytics/refinery] - https://gerrit.wikimedia.org/r/654256 (owner: Mforns)
[16:24:38] Analytics, Event-Platform, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): wikimedia-event-utilities should provide tools for JVM based apps producing directly to kafka - https://phabricator.wikimedia.org/T270371 (CBogen)
[16:35:42] ottomata: are we skipping ops sync?
[16:35:50] klausman: we are here
[16:35:54] https://meet.google.com/qdx-cxwy-feo
[16:35:57] razzi: ^
[16:39:17] Analytics-Clusters: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Ottomata)
[16:44:38] Analytics-Clusters, Analytics-Kanban: analytics-admins should be able to sudo -u www-data in analytics systems - https://phabricator.wikimedia.org/T263272 (Ottomata) Open→Declined Re-open if we end up actually needing this.
[16:45:16] Analytics, Product-Analytics, Epic: Readership Retention: New vs.
Returning Unique devices - https://phabricator.wikimedia.org/T269815 (fdans) p:Triage→Medium
[16:47:05] Analytics, Event-Platform: jsonschema-tools should ensure schema examples exist - https://phabricator.wikimedia.org/T270134 (fdans) p:Triage→High
[16:48:35] Analytics, Analytics-EventLogging, Event-Platform, Product-Analytics, and 2 others: Automate deprecation of schema on metawiki after migration to Event Platform - https://phabricator.wikimedia.org/T270136 (fdans) @Ottomata can we close this?
[16:53:38] Analytics, Event-Platform, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): wikimedia-event-utilities should provide tools for JVM based apps producing directly to kafka - https://phabricator.wikimedia.org/T270371 (CBogen)
[16:56:00] Analytics-Clusters: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Ottomata) a:razzi We should use this also as an opportunity to reinstall as Debian Buster and adopt new Cloud VPS host naming conventions, e.g. clouddb10xx (ask Brooke what this hostnam...
[16:59:36] Analytics-Kanban: Move the Analytics infrastructure to Debian Buster - https://phabricator.wikimedia.org/T234629 (Ottomata)
[17:05:54] Analytics, Research: Release dataset on top search engine referrers by country, device, and language - https://phabricator.wikimedia.org/T270140 (Milimetric) p:Triage→Medium We groomed this today, and here are our thoughts: We think it'll take about a week to write the job, a week to deploy and...
[17:16:19] Analytics: Switch off skipTrash for some data purging - https://phabricator.wikimedia.org/T270431 (fdans) p:Triage→High a:mforns
[17:26:56] Analytics: Add logic to purging scripts that requires admin action if it's about to delete a lot of data - https://phabricator.wikimedia.org/T270433 (fdans) p:Triage→High
[17:39:23] elukey: :] can you please look at: https://gerrit.wikimedia.org/r/c/operations/puppet/+/649864
[17:44:29] wikipedian's 2020 in review is always good: http://thewikipedian.net/2020/12/31/the-top-10-wikipedia-stories-of-2020/
[17:46:07] Analytics, Analytics-EventLogging, Event-Platform, Product-Analytics, and 2 others: Automate deprecation of schema on metawiki after migration to Event Platform - https://phabricator.wikimedia.org/T270136 (Ottomata) Open→Declined Yes, let's decline. We are using auto-protect to edit lock...
[17:46:09] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 4 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[17:48:31] mforns: deployed, I am now running the job to see if it works or not :)
[17:48:43] elukey: thanks a lot!
[17:48:59] elukey: I will check that too
[17:50:07] joal: when you have a minute, plz re-review https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/651542 with the change to DataFrameToHive
[18:28:47] Analytics-Clusters, Patch-For-Review: Move Superset and Turnilo to an-tool1010 - https://phabricator.wikimedia.org/T268219 (razzi) Spoke with @elukey and we're thinking of leaving turnilo on an-tool1007 for now, rather than co-locating it with superset, so that issues with either service won't affect the...
[18:30:27] will review tonight ottomata
[18:31:34] Analytics-Clusters, Patch-For-Review: Move Superset and Turnilo to an-tool1010 - https://phabricator.wikimedia.org/T268219 (elukey) Turnilo on an-tool1007 [[ https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=an-tool1007&var-datasource=thanos&var-cluster=analytics | con...
[18:32:39] elukey: templatedata output looks good to me, thanks!
[18:33:16] nice!
[18:33:54] Analytics, SRE-tools, User-crusnov: Some Analytics clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271133 (crusnov)
[18:35:09] Analytics, SRE-tools, User-crusnov: Some Analytics clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271133 (crusnov) p:Triage→Medium
[18:37:41] ottomata: you wanted to sync on migration?
[18:38:28] mforns can we do in 20 mins so I can finish a thought?
[18:38:47] of course!
[18:38:50] coo
[18:45:47] * elukey afk!
[18:58:23] Analytics-Clusters, Patch-For-Review: Move Superset and Turnilo to an-tool1010 - https://phabricator.wikimedia.org/T268219 (Ottomata) Sure! fine with me.
[19:01:38] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 4 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[19:02:28] mforns: bc?
[19:05:55] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 4 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[19:12:14] (PS8) Ottomata: Refine using PERMISSIVE mode and log more info about corrupt records [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872)
[19:13:40] (CR) Ottomata: "@joal, please re-review Refine.scala" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872) (owner: Ottomata)
[19:15:17] Analytics, Event-Platform, Inuka-Team: InukaPageView Event Platform Migration - https://phabricator.wikimedia.org/T267344 (Ottomata) @nshahquinn-wmf FYI we will migrate this schema next week (week of Jan 11 2021).
[19:15:20] Analytics, Event-Platform, Inuka-Team: KaiOSAppFeedback Event Platform Migration - https://phabricator.wikimedia.org/T267345 (Ottomata) @nshahquinn-wmf FYI we will migrate this schema next week (week of Jan 11 2021).
[19:15:22] Analytics, Event-Platform, Inuka-Team: KaiOSAppFirstRun Event Platform Migration - https://phabricator.wikimedia.org/T267346 (Ottomata) @nshahquinn-wmf FYI we will migrate this schema next week (week of Jan 11 2021).
[19:17:19] Analytics, Event-Platform, Inuka-Team: InukaPageView Event Platform Migration - https://phabricator.wikimedia.org/T267344 (Ottomata) Actually, I take that back. We can't migrate this; it is an app event, not an EventLogging extension based one. Will postpone.
[19:17:23] Analytics, Event-Platform, Inuka-Team: KaiOSAppFeedback Event Platform Migration - https://phabricator.wikimedia.org/T267345 (Ottomata) Actually, I take that back. We can't migrate this; it is an app event, not an EventLogging extension based one. Will postpone.
[19:17:26] Analytics, Event-Platform, Inuka-Team: KaiOSAppFirstRun Event Platform Migration - https://phabricator.wikimedia.org/T267346 (Ottomata) Actually, I take that back. We can't migrate this; it is an app event, not an EventLogging extension based one. Will postpone.
[19:22:44] ottomata: back, bc?
[19:26:23] Analytics, Event-Platform, Product-Infrastructure-Data: PrefUpdate Event Platform Migration - https://phabricator.wikimedia.org/T267348 (Ottomata) @mpopov Let us know if this schema needs client IP and/or geocoded data? If not, it will be removed as part of this migration.
[19:26:38] mforns: got meeting with fdans in 3 mins
[19:27:24] ottomata: oh sorry, when you said 20 mins, I didn't realize it was the end of the hour, thought it was just an imprecise small amount of time
[19:27:39] hehe soryyYYyy
[19:27:40] ottomata: I'll be here later if you want to ping
[19:27:43] ok!
[19:27:48] ok
[19:28:13] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 4 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[19:30:28] mforns: i'm trying to find ones we can actually do now (EL JS based ones)
[19:30:33] it looks like the Structured Data one
[19:30:36] SuggestedTagsAction
[19:30:36] yea
[19:30:40] is doable and easy to do now
[19:30:42] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 4 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[19:30:42] can you take that one?
[19:30:47] https://phabricator.wikimedia.org/T267351
[19:30:57] notify them that you'll do it next week and proceed
[19:31:03] should be the same as the others we've done
[19:31:13] it looks like it is logged from the MachineVision extension
[19:31:25] gerrit.wikimedia.org:29418/mediawiki/extensions/MachineVision
[19:32:25] ottomata: sure, will do
[19:42:16] Analytics, SRE-tools, IPv6, User-crusnov: Some Analytics clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271133 (Aklapper)
[19:43:36] happy new year team! not sure who here was involved with the top-by-country endpoint design, but I was wondering why we don't filter out countries on the country protection list, like we do with geoeditors. is it just because the data is not that sensitive privacy wise? also, why do we not filter out the unknown country `--`?
[19:50:16] mforns: am done with fdans but i think that's probably good enough for sync, ya?
[19:50:31] once the EL PHP changes go out we should be able to finish the outstanding migrations
[19:50:36] hopefully we can do those next week too
[19:50:42] ottomata: yes, I can go ahead with that scheam
[19:50:44] schema
[19:50:46] k gr8, thank you!
[19:50:51] k!
[19:51:08] milimetric: ^^^^ for lexnasser's q
[19:51:35] Hi lexnasser - I have no good answer to your question
[19:52:13] lexnasser: good point, I have a bad explanation
[19:52:48] we just released it before geoeditors, and hadn't thought of the filter back then
[19:53:20] this is the feeling I have as well milimetric, but I can't recall --^
[19:53:50] I think we should do both, use the country blacklist in all country-based datasets, and filter out the unknown (--) country
[19:54:13] I will file a task for this later lexnasser, thanks!
[19:58:47] (CR) Joal: "Comment inline" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872) (owner: Ottomata)
[19:59:51] milimetric, joal: thanks!
[20:08:38] Analytics, Research: Release dataset on top search engine referrers by country, device, and language - https://phabricator.wikimedia.org/T270140 (Isaac)
[20:10:07] Analytics, Research: Release dataset on top search engine referrers by country, device, and language - https://phabricator.wikimedia.org/T270140 (Isaac)
[20:14:27] (CR) Joal: [C: +1] "LGTM as long as the empty-df case is tested :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/651542 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[20:15:26] ottomata: I reviewed DataframeToHive and refine to look for where to cache, and the fact that writing happens in DataframeToHive while count happens in Refine doesn't help :(
[20:17:49] joal: you mean your suggestion? or my recent change to work around the _corrupt_record thing?
[20:17:55] ottomata: would we add a parameter to PartitionedDataFrame: PartitionedDataFrame(df: DataFrame, partition: HivePartition, recordCount: Option[Long]) ?
[20:18:27] hm like lazy val it or something
[20:18:35] lazy val recordCount = df.count
[20:18:37] ?
[20:19:20] joal: oh
[20:19:26] i added the cache there not because of your suggestion
[20:19:45] but because of https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala#L110-L122
[20:21:05] so, I didn't add cache() there to help performance, just to make it actually work
[20:21:08] otherwise it throws an exception
[20:21:53] ottomata: I think caching at higher level (workingPartDf.df) would also work , and help perf :)
[20:22:02] oh hm
[20:22:17] but
[20:22:22] maybe i don't know how cache() works
[20:22:50] but the transform functions are applied to the df by DataFrameToHive
[20:22:52] Analytics-Clusters, Operations, SRE-tools, netops, and 2 others: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136 (Dzahn)
[20:23:04] so we can cache workingPartDf.df
[20:23:09] but what else will use it again?
[20:25:22] joal: i'm all for figuring out caching for perf enhancements, but maybe we can do in different patch?
[20:25:36] ottomata: works for me
[20:25:47] ottomata: caching materializes data in memory
[20:26:41] ottomata: so if you cache workingDf.df, you can use memory for both filtering-out and counting corrupted records, and counting records without errors (this is the -corrupt-record caching enhancement)
[20:27:27] ottomata: For refine, same idea: currently you compute the dataset (read + transform functions) for both writing data and counting data - If you materialize the data in memory, computation is only done once
[20:28:00] in that case, isn't the df already in memory? would have thought spark would optimize that
[20:28:40] ottomata: it might, but might not :)
[20:32:07] joal: do I need to use the returned df from cache()? or does it not matter?
[20:32:21] ottomata: doesn't matter
[20:32:33] you can do: df.cache() - it's not functional
[20:32:34] ok
[20:33:18] (PS9) Ottomata: Refine using PERMISSIVE mode and log more info about corrupt records [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872)
[20:33:39] (PS10) Ottomata: Refine using PERMISSIVE mode and log more info about corrupt records [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872)
[20:34:03] (PS11) Ottomata: Refine using PERMISSIVE mode and log more info about corrupt records [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872)
[20:34:16] ok joal pushed patch
[20:34:20] be back in a bit
[20:35:19] (CR) Joal: [C: +1] "LGTM! Thanks !" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872) (owner: Ottomata)
[20:41:21] * razzi afk for a bit
[20:54:58] any chance something kerberos related deployed around dec 21/22?
[20:55:47] yes ebernhardson
[20:55:57] specifically i'm getting 'GSS initiate failed' starting then on some airflow things, but oddly not all of them and not sure what the difference is
[20:56:12] ebernhardson: we changed the hive kerb principal to facilitate an-coord failover
[20:56:41] joal: hmm, that might be it. thanks i'll poke that area
[20:57:02] the difference would then be that we have an analytics-search user for most things, spark, etc. But some stuff must talk to hive directly
[20:57:04] ebernhardson: we talked to dcausse about that this morning, he mentioned he'd work on it - maybe it's already fixed?
[20:57:26] not yet, i'm coming back from a 3 week vacation and just looking at the 500 failure emails now :)
[20:57:39] talked to david this morning, but he hadn't gotten anywhere in particular
[20:57:46] ebernhardson: ack
[20:57:58] ebernhardson: give me a minute to find our patch as an example
[20:58:15] (CR) Ottomata: [C: +2] Add Refine TransformFunction to remove canary events [analytics/refinery/source] - https://gerrit.wikimedia.org/r/651542 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[20:58:22] we pre-fixed this in some places, but apparently missed others
[20:58:42] ebernhardson: it's easy to miss some :(
[20:58:54] here is our big change ebernhardson: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/647612
[20:59:24] excellent, thanks!
[20:59:46] ebernhardson: please note that spark jobs using the metastore also need a change!
[21:00:10] hmm, i don't think we tell spark anything about metastore it should be getting that from /etc/spark ?
[21:01:18] but i'll double check
[21:02:06] heh, restarting the scheduler and trying a task worked :S I wonder if we deployed the fix and forgot to have it load...
[21:02:28] ebernhardson: could it be related to hive-site.xml change?
[21:04:02] seems likely, it looks like these errors might be limited to where airflow talks to hive metastore directly. I suppose once those connections are setup they might last for some time, or the config for how those are setup might be loaded once and kept
[21:05:28] Sorry for the mess ebernhardson :( We moved late last month as a little christmas present to have finished the task
[21:05:51] ebernhardson: We hadn't guessed it would mean new-year issues for you folks :(
[21:06:18] no worries, we were informed about it but i suppose no one was around to see how it worked
[21:07:13] Gone for tonight a-team - see you tomorrow
[21:07:18] laters joal !
[21:07:22] byeeee
[21:07:31] o/
[21:10:41] (CR) Ottomata: [C: +2] Refine using PERMISSIVE mode and log more info about corrupt records [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872) (owner: Ottomata)
[21:10:50] (PS12) Ottomata: Refine using PERMISSIVE mode and log more info about corrupt records [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872)
[21:11:37] (CR) Ottomata: [C: +2] Refine using PERMISSIVE mode and log more info about corrupt records (7 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872) (owner: Ottomata)
[21:16:35] (Merged) jenkins-bot: Refine using PERMISSIVE mode and log more info about corrupt records [analytics/refinery/source] - https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872) (owner: Ottomata)
[21:18:51] (PS1) Ottomata: Update changelog.md for 0.0.143 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/654305
[21:22:41] Analytics-Clusters: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Bstorm) >>! In T269211#6719873, @Ottomata wrote: > We should use this also as an opportunity to reinstall as Debian Buster and adopt new Cloud VPS host naming conventions, e.g. clouddb10xx...
[21:24:16] (PS2) Ottomata: Update changelog.md for 0.0.143 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/654305
[21:30:02] Analytics, Event-Platform: ExternalLinksChange Event Platform Migration - https://phabricator.wikimedia.org/T271162 (Ottomata)
[21:31:27] Analytics, Event-Platform: ExternalLinksChange Event Platform Migration - https://phabricator.wikimedia.org/T271162 (Ottomata) Hi @Samwalton9, let us know if this schema needs client IP and/or geocoded data?
If not, it will be removed as part of this migration. Also, do you know what produces these even...
[21:35:37] Analytics, Event-Platform: ExternalLinksChange Event Platform Migration - https://phabricator.wikimedia.org/T271162 (Ottomata) Actually, @Samwalton9, this schema hasn't had any events in at least the last 90 days. Are you sure we need to migrate this? Can we just disable it instead?
[21:40:23] Analytics, Event-Platform: TranslationRecommendation* Schemas Event Platform Migration - https://phabricator.wikimedia.org/T271163 (Ottomata)
[21:41:06] Analytics, Operations, ops-eqiad: Degraded RAID on an-coord1002 - https://phabricator.wikimedia.org/T271098 (wiki_willy) Open→Resolved a:Cmjohnson Duplicate of T270768
[21:42:01] Analytics, Event-Platform: TranslationRecommendation* Schemas Event Platform Migration - https://phabricator.wikimedia.org/T271163 (Ottomata) @Isaac, let us know if this schema needs client IP and/or geocoded data? If not, it will be removed as part of this migration. Also, do you know what produces the...
[21:44:21] Analytics, Event-Platform: DesktopWebUIActionsTracking Event Platform Migration - https://phabricator.wikimedia.org/T271164 (Ottomata)
[21:45:09] Analytics, Event-Platform: DesktopWebUIActionsTracking Event Platform Migration - https://phabricator.wikimedia.org/T271164 (Ottomata) @MNeisler let us know if this schema needs client IP and/or geocoded data. If not, it will be removed as part of this migration.
[21:46:41] Analytics, Event-Platform: QuickSurveyInitiation Event Platform Migration - https://phabricator.wikimedia.org/T271165 (Ottomata)
[21:47:10] Analytics, Event-Platform: QuickSurveysResponses Event Platform Migration - https://phabricator.wikimedia.org/T271166 (Ottomata)
[21:49:59] Analytics, Event-Platform: QuickSurveysResponses Event Platform Migration - https://phabricator.wikimedia.org/T271166 (Ottomata) @ovasileva @phuedx let us know if this schema needs client IP and/or geocoded data. If not, it will be removed as part of this migration.
[21:50:06] Analytics, Event-Platform: QuickSurveyInitiation Event Platform Migration - https://phabricator.wikimedia.org/T271165 (Ottomata) @ovasileva @phuedx let us know if this schema needs client IP and/or geocoded data. If not, it will be removed as part of this migration.
[21:54:14] Analytics, Event-Platform, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): wikimedia-event-utilities should provide tools for JVM based apps producing directly to kafka - https://phabricator.wikimedia.org/T270371 (Mstyles) a:dcausse
[21:55:19] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 4 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[22:03:50] Analytics, Event-Platform: ExternalLinksChange Event Platform Migration - https://phabricator.wikimedia.org/T271162 (Samwalton9) @Ottomata Is this schema what defines the page-links-change EventStream? That stream appears to be functional and sending data. If not, could you elaborate on what this is?
[22:05:37] Analytics, Event-Platform: ExternalLinksChange Event Platform Migration - https://phabricator.wikimedia.org/T271162 (Ottomata) No, this is the legacy EventLogging schema https://meta.wikimedia.org/wiki/Schema:ExternalLinksChange [22:08:34] Analytics, Event-Platform: CentralNoticeBannerHistory Event Platform Migration - https://phabricator.wikimedia.org/T271168 (Ottomata) [22:11:17] Analytics, Event-Platform: CentralNoticeBannerHistory Event Platform Migration - https://phabricator.wikimedia.org/T271168 (Ottomata) @AndyRussG Let us know if this schema needs client IP and/or geocoded data. If not, it will be removed as part of this migration. [22:14:40] Analytics, Anti-Harassment, Event-Platform: AutoblockIpBlock Event Platform Migration - https://phabricator.wikimedia.org/T267340 (Ottomata) FYI @sdkim I'm declining this one and marking it as To Deprecate on our audit sheet. It hasn't had any data for years. Niharika if we should still migrate it a... [22:16:05] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 4 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata) [22:22:54] !log reboot an-test-coord1001 to upgrade kernel [22:22:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [22:23:07] (PS1) Milimetric: Clarify requirements for building [analytics/wikistats2] - https://gerrit.wikimedia.org/r/654319 [22:25:04] Analytics, Event-Platform: TranslationRecommendation* Schemas Event Platform Migration - https://phabricator.wikimedia.org/T271163 (Isaac) > let us know if this schema needs client IP and/or geocoded data? If not, it will be removed as part of this migration. If it's not hard, I'd ask to retain the geoco...
[23:00:32] Analytics, Event-Platform, Fundraising-Backlog: CentralNoticeBannerHistory Event Platform Migration - https://phabricator.wikimedia.org/T271168 (AndyRussG) [23:02:36] Analytics, Event-Platform, Fundraising-Backlog: CentralNoticeBannerHistory Event Platform Migration - https://phabricator.wikimedia.org/T271168 (AndyRussG) >>! In T271168#6721247, @Ottomata wrote: > @AndyRussG > Let us know if this schema needs client IP and/or geocoded data. If not, it will be remo... [23:19:24] Analytics, Analytics-Data-Quality: Unique devices numbers for all wikipedias missing for Agust and SEptember - https://phabricator.wikimedia.org/T271170 (Nuria) [23:34:14] (PS4) Milimetric: Upgrade Webpack from 2 to 5 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/649311 (https://phabricator.wikimedia.org/T188759) (owner: Fdans) [23:35:31] (CR) Milimetric: [C: -1] "This all looks great, but there were two very weird style and number formatting regressions. Maybe they were hidden by the bad style scop" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/649311 (https://phabricator.wikimedia.org/T188759) (owner: Fdans) [23:40:47] (CR) Ppchelko: [C: +1] [WIP] Add log-entry create schema (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/651635 (https://phabricator.wikimedia.org/T263055) (owner: Milimetric) [23:40:52] (CR) Milimetric: "Looks great" (1 comment) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/648376 (owner: Fdans) [23:44:36] (CR) Milimetric: [WIP] Add log-entry create schema (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/651635 (https://phabricator.wikimedia.org/T263055) (owner: Milimetric) [23:47:06] (CR) Ppchelko: [C: +1] [WIP] Add log-entry create schema (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/651635 (https://phabricator.wikimedia.org/T263055) (owner: Milimetric)