[00:53:55] PROBLEM - Check the last execution of monitor_refine_eventlogging_analytics_failure_flags on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_analytics_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:41:11] Analytics, Analytics-Kanban: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (Milimetric) side note: something weird's going on with kinit. I got this error again: `ExitCodeException exitCode=1: kinit: KDC can't fulfill requeste...
[05:00:16] Analytics, Analytics-Kanban: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (Milimetric) Started sqoop on the top 10 tables (well, 9 'cause I had already fetched one of them). I did a diff between what I tried to sqoop and what...
[05:01:45] Analytics, Analytics-Kanban, Patch-For-Review, Product-Analytics (Kanban): SQL definition for structure data in commons metrics - https://phabricator.wikimedia.org/T247101 (Nuria) Open→Resolved
[05:01:47] Analytics, Patch-For-Review, Product-Analytics (Kanban): SQL definition for wikidata metrics for tunning session - https://phabricator.wikimedia.org/T247099 (Nuria)
[05:02:22] Analytics, Analytics-Kanban, Patch-For-Review, Product-Analytics (Kanban): SQL definition for wikidata metrics for tunning session - https://phabricator.wikimedia.org/T247099 (Nuria)
[05:09:16] Analytics: SQL query failed on superset SQL lab - https://phabricator.wikimedia.org/T252225 (Nuria) Open→Resolved
[05:09:38] Analytics: SQL query failed on superset SQL lab - https://phabricator.wikimedia.org/T252225 (Nuria) closed, please reopen if help is still needed
[05:10:07] Analytics, Analytics-Kanban: Geoeditors job is faliing due to problems with geo udf - https://phabricator.wikimedia.org/T252205 (Nuria) Open→Resolved
[05:11:51] Analytics, Product-Analytics: Can't publish my draft dashboard on superset - https://phabricator.wikimedia.org/T248904 (Nuria) Open→Resolved
[05:12:46] Analytics, Analytics-Kanban: Add page_restrictions table to sqoop list - https://phabricator.wikimedia.org/T251749 (Nuria) p:Triage→High
[05:33:18] Analytics, Analytics-Kanban, Product-Analytics: Superset aggregation across edit tags uses all tags - https://phabricator.wikimedia.org/T243552 (Nuria) Maybe I am missing something but ...isn't revision_tags an array per revision?
[06:35:23] good morning :)
[06:51:56] Hi elukey - Need to go for dentist appointment in minutes - Will be back before lunch
[06:54:53] ack!
[07:35:31] Analytics, Analytics-Kanban: Add new Druid nodes to analytics and public clusters - https://phabricator.wikimedia.org/T252771 (elukey) p:Triage→High
[07:37:35] Analytics: reportupdater shoudl run with python3 - https://phabricator.wikimedia.org/T253418 (elukey) @Nuria can you list what it is not currently working? RU works with python3, but we might have missed some code paths that are not currently covered in prod jobs..
[08:38:37] RECOVERY - Check the last execution of monitor_refine_eventlogging_analytics_failure_flags on an-launcher1001 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_analytics_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:41:03] (CR) Fdans: "ping on this review :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/595152 (https://phabricator.wikimedia.org/T251777) (owner: Fdans)
[09:09:58] (CR) Joal: "All good except for a naming question :)" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/595152 (https://phabricator.wikimedia.org/T251777) (owner: Fdans)
[09:11:17] joal: hmmm at the end of the conversation dan said "or complete" and nuria said "yea complete, comprehensive, whatever"
[09:11:36] and then when writing the thing comprehensive sounded silly to me
[09:11:46] hm
[09:12:38] ok then fdans
[09:13:40] joal: I actually don't really care... this is why I'm always against being all smart naming things
[09:13:45] joal: which one do you prefer?
[09:14:32] fdans: I don't know! I just was with `comprehensive` left in mind - therefore my question - but if `complete` it is, then be it :)
[09:16:21] I think Pagecounts-FD was the better name :P
[09:16:28] joal: ok, so we keep complete
[09:16:47] let's merge?
[09:17:16] For sure it was fdans - Let's merge :)
[09:19:03] elukey: thanks a lot for the rerun - I wanted to do that now but as usual I'm too slow
[09:27:52] joal: np!
[09:52:52] (CR) Fdans: [V: +2 C: +2] "Merging as discussed with joal" [analytics/refinery] - https://gerrit.wikimedia.org/r/595152 (https://phabricator.wikimedia.org/T251777) (owner: Fdans)
[10:02:31] (PS1) Joal: Add page_restrictions table to sqoop script [analytics/refinery] - https://gerrit.wikimedia.org/r/598706 (https://phabricator.wikimedia.org/T251749)
[10:08:19] so this time the rollout of Bigtop failed, but the rollback was easy
[10:08:29] \o/
[10:08:39] well I am not that happy :D
[10:08:46] maybe the third time both things will be good
[10:08:53] I understand - But rollback is important as well ;)
[10:09:58] joal: this time I have used the "brutal one", namely the rollback to a known state that destroys data between rollout and rollback
[10:10:03] way better
[10:10:15] and it is the only one that works with the HA config that we have
[10:10:33] elukey: by destroy data you talk about namenode data, right, not block data?
[10:12:11] joal: I am not sure exactly how it works, since the datanodes are probably forced to go back to a known state too, otherwise blocks may be different no?
[10:12:23] something like the cassandra hard links
[10:12:39] elukey: if there is block conversion yes, if it's only meta-data conversion (namenode), nope :)
[10:13:14] joal: if a block is changed between the rollout and the rollback, it must go back to a previous state in theory
[10:13:29] otherwise the NN points to blocks that are not the ones saved during the snapshot
[10:18:57] there is in fact this
[10:18:58] Rollback the datanode to the previous version. This should be used after stopping the datanode and distributing the old hadoop version.
[10:19:22] something that wasn't included in my guide
[10:20:01] and the "finalize upgrade" on the NN causes
[10:20:02] Finalize upgrade of HDFS. Datanodes delete their previous version working directories, followed by Namenode doing the same. This completes the upgrade process.
[10:20:58] so is it needed? Probably yes
[10:21:05] what a mess the hadoop docs
[10:25:10] and now I have a similar question in mind about fsimage recovery, if needed
[10:25:29] we can go back to a previous known good state for metadata, but blocks data?
[10:40:23] ok going to lunch + interview later, ping me if needed!
[11:14:25] (PS2) Joal: Add page_restrictions table to sqoop script [analytics/refinery] - https://gerrit.wikimedia.org/r/598706 (https://phabricator.wikimedia.org/T251749)
[11:16:24] (PS1) Joal: Update sqoop table script for schema changes [analytics/refinery] - https://gerrit.wikimedia.org/r/598722 (https://phabricator.wikimedia.org/T252565)
[11:34:47] mforns: Hi :) When you're in and have some time, would you agree discussing with me the mediawiki_history_reduced things?
[13:12:47] Analytics, Product-Analytics: Can't publish my draft dashboard on superset - https://phabricator.wikimedia.org/T248904 (Esanders) Resolved→Open As I stated above, I managed to publish the dashboard via another route, but the Draft button didn't work. Not seeing any info here that suggests this i...
[13:22:24] Analytics, Better Use Of Data, Event-Platform: Document in-schema who sets which fields - https://phabricator.wikimedia.org/T253392 (Ottomata) Hm, I'm not sure in the schema is quite the right place to do this. The fact that EventGate does anything at all is just an implementation detail, but doesn'...
[13:35:34] ottomata: good morning, your email made my day thanks
[13:35:42] ahahhaah
[13:42:46] haha :p
[13:55:24] joal: so I had to explicitly rollback the datanodes too
[13:55:36] and everything went ok, really nice and easy
[13:55:42] fsck now shows healthy
[13:56:06] I'll write some notes in the task
[13:56:14] elukey: I'm wondering about the semantics of rollback here (same thoughts as with cassandra)
[13:56:41] I'm sure you'd need to rollback software
[13:57:01] but does this involve changing/copying/updating the data?
[13:59:12] joal: so from what I can see, when you tell the namenode to upgrade, it does two macro-things:
[13:59:25] 1) saves a copy of the fs image as "previous" in a known location
[13:59:37] 2) tells all the datanodes to do the same, using hardlinks
[13:59:50] and eventually there are two possibilities:
[13:59:56] hack elukey
[14:00:04] 1) finalize the upgrade, so the previous state is discarded
[14:00:15] 2) rollback, the current state is discarded
[14:00:28] but it needs to be explicitly mentioned to all the daemons
[14:00:30] sigh
[14:00:32] does it make sense?
[14:01:16] it does elukey - Now that also means data duplication if datanodes need to copy blocks for format change
[14:04:38] yeah
[14:04:54] elukey: we should check that among your tests :)
[14:05:31] gone for kids until standup - talk later
[14:13:23] Analytics, Event-Platform: Convert WikimediaEvents to use ResourceLoader packageFiles - https://phabricator.wikimedia.org/T253634 (Ottomata)
[14:14:26] Analytics, Event-Platform: Convert WikimediaEvents to use ResourceLoader packageFiles - https://phabricator.wikimedia.org/T253634 (Ottomata) This will also allow us to finish https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/576145, where we don't use the deprecated `onResourceLoade...
[14:14:57] Analytics, Event-Platform: Convert WikimediaEvents to use ResourceLoader packageFiles - https://phabricator.wikimedia.org/T253634 (Ottomata) @Krinkle WikimediaEvents uses skinScripts, but the [[ https://www.mediawiki.org/wiki/ResourceLoader/Package_modules#Incompatibility_with_'scripts' | packageFiles do...
[14:16:31] Analytics, Event-Platform: Convert WikimediaEvents to use ResourceLoader packageFiles - https://phabricator.wikimedia.org/T253634 (Ottomata) @Sbisson, do you think you could finish this patch? I think you have a lot more context here as to how WikimediaEvents is used.
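Editor's note: the upgrade / finalize / rollback flow elukey describes at 13:59-14:00 can be sketched as a toy state model. This is only an illustration under stated assumptions: the class name `UpgradableStore` and its methods are hypothetical, plain dict copies stand in for the hardlinked "previous" directories mentioned in the chat, and none of this is HDFS code.

```python
class UpgradableStore:
    """Toy model of one daemon's storage (namenode or datanode) during an upgrade."""

    def __init__(self, state):
        self.current = dict(state)
        self.previous = None  # snapshot, kept only while an upgrade is in flight

    def _require_upgrade(self):
        if self.previous is None:
            raise RuntimeError("no upgrade in progress")

    def upgrade(self):
        # Save the pre-upgrade state as "previous" before anything changes.
        if self.previous is not None:
            raise RuntimeError("upgrade already in progress")
        self.previous = dict(self.current)

    def finalize(self):
        # Keep the current state; the previous state is discarded.
        self._require_upgrade()
        self.previous = None

    def rollback(self):
        # Restore the previous state; the current state is discarded.
        self._require_upgrade()
        self.current = self.previous
        self.previous = None


# One namenode and two datanodes, each holding its own snapshot.
namenode = UpgradableStore({"fsimage": "v1"})
datanodes = [UpgradableStore({"blk_1": "a"}) for _ in range(2)]
daemons = [namenode, *datanodes]

for daemon in daemons:
    daemon.upgrade()

# Changes made between rollout and rollback.
namenode.current["fsimage"] = "v2"
datanodes[0].current["blk_2"] = "b"

# The rollback has to be issued to every daemon explicitly.
for daemon in daemons:
    daemon.rollback()
```

In this model, anything written between `upgrade()` and `rollback()` is lost, which matches the "destroys data between rollout and rollback" caveat from 10:09:58.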
[14:23:29] Analytics, Analytics-Kanban: MEP: canary events so we know events are flowing through pipeline - https://phabricator.wikimedia.org/T250844 (Ottomata) Since the implementation of this and T251609 are so similar, I'm going to merge this task into T251609 and redescribe that one to mention canary events.
[14:23:42] Analytics, Analytics-Kanban: MEP: canary events so we know events are flowing through pipeline - https://phabricator.wikimedia.org/T250844 (Ottomata)
[14:23:44] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (Ottomata)
[14:28:43] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (Ottomata)
[14:29:45] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (Ottomata) > I'm going to make T250844 a sub...
[14:30:10] Analytics: reportupdater should run with python3 - https://phabricator.wikimedia.org/T253418 (Nuria)
[14:42:16] milimetric: helllOoOo yt?
[14:51:26] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (Ottomata) > If it is, then consume the late...
[14:52:19] Analytics, Analytics-Cluster, Analytics-Kanban, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) I was able to rollback successfully but the caveat was that even the datanodes need to be rolledback. When the `upgrade` command is issued to...
[14:53:54] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (Ottomata) Ah right so if we don't have a st...
[14:56:44] ottomata: o/
[14:57:26] hello!
[14:57:29] do you think it'll be possible to increase the retention on a few MW event topics?
[14:57:45] say from 7days to 30days for edits stream
[14:59:51] in kafka main?
[14:59:57] or jumbo?
[15:00:05] well now on jumbo but later main
[15:00:13] jumbo yes, main no
[15:00:22] we used to have it at 30 days for main
[15:00:37] ok
[15:00:38] but revision-create contains data that is sometimes suppressed
[15:00:42] on wiki
[15:01:01] having 30 days exposed increases the number of edits that might be available at any given time that have been suppressed
[15:01:18] ok I see
[15:01:19] so we reduced it back to 7 days
[15:01:55] ah sorry, and main is used to export streams publicly via eventstreams
[15:01:56] so hm
[15:02:07] if we did need this data in kafka for more than 7 days
[15:02:16] we should probably just reproduce it to a different topic
[15:02:20] and adjust that retention
[15:02:29] probably would be better to have dedicated public topics
[15:02:34] than special internal topics
[15:02:44] buuut that would take some implementing...
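Editor's note: Kafka expresses per-topic retention in milliseconds through the `retention.ms` topic config, so the 7-day and 30-day figures discussed above translate as in the small sketch below. The helper name `retention_ms` is hypothetical, and the actual topic names and how the setting is applied on the jumbo or main clusters are not shown in the log.

```python
from datetime import timedelta


def retention_ms(days: int) -> int:
    """Convert a retention period in days to a value for Kafka's retention.ms."""
    # Dividing two timedeltas with // yields an exact integer, no float rounding.
    return timedelta(days=days) // timedelta(milliseconds=1)


seven_days = retention_ms(7)
thirty_days = retention_ms(30)
print(f"retention.ms={seven_days}")   # current setting discussed in the chat
print(f"retention.ms={thirty_days}")  # requested setting
```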
[15:02:53] or go back to hdfs but that was not super trivial from flink
[15:02:56] maybe if we get a prod flink up somewhere we can have a very simple replayer job for some topics
[15:04:42] I'll see if I can write a simple job that just rereads the purged events from hdfs directly instead of relying on kafka retention
[15:05:11] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (Ottomata)
[15:05:13] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Services (watching): Add examples to all event schemas - https://phabricator.wikimedia.org/T242454 (Ottomata)
[15:05:24] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (Ottomata) FYI {T242454} must be done for th...
[15:05:24] ok
[15:09:58] ottomata: I’m off today, enjoying some kind of old normal :)
[15:10:51] ok! enjoy!
[15:11:38] Analytics, Event-Platform: Convert WikimediaEvents to use ResourceLoader packageFiles - https://phabricator.wikimedia.org/T253634 (Krinkle) For `skinScripts` this is no longer an issue. The path of the package file can be defined by a callback and thus vary by the given skin name (in $context) the same s...
[15:11:47] Analytics, Event-Platform, Performance-Team (Radar): Convert WikimediaEvents to use ResourceLoader packageFiles - https://phabricator.wikimedia.org/T253634 (Krinkle)
[15:15:40] hey joal, just saw your ping! yt?
[15:24:10] Analytics, Event-Platform, Performance-Team (Radar): Convert WikimediaEvents to use ResourceLoader packageFiles - https://phabricator.wikimedia.org/T253634 (Ottomata) Oh, I guess the docs should be updated then?
[16:01:38] a-team: staddduppppp
[16:31:34] Analytics, Event-Platform: Write blog post(s) about MEP - https://phabricator.wikimedia.org/T253649 (Ottomata)
[16:34:19] Analytics, User-Elukey: Upgrade Druid to its latest upstream version (currently 0.18.1) - https://phabricator.wikimedia.org/T244482 (elukey) Highlights for release 0.13: https://github.com/apache/druid/releases/tag/druid-0.13.0-incubating Things to consider: * automatic compaction of segments has been a...
[16:35:11] Analytics, Analytics-Kanban, Event-Platform, Product-Analytics: Write and update Event Platform instrumentation documentation for Product teams - https://phabricator.wikimedia.org/T233329 (Ottomata) Today I joined the Tech Documentation office hours. Notes: - Don't need to justify changes so mu...
[16:40:56] elukey: :[ I think pyarrow's HadoopFileSystem is having issues with Kerberos when using LocalExecutor
[16:41:15] dunno, wanna bc?
[16:58:08] Analytics: Web publication doesn't work - https://phabricator.wikimedia.org/T253661 (jwang)
[17:12:26] Analytics, Event-Platform, Product-Analytics (Kanban): Product Analytics to review & provide feedback for Event Platform Instrumentation How-To - https://phabricator.wikimedia.org/T253269 (LGoto) p:Triage→Medium a:mpopov
[17:17:30] mforns: sorry! I am here
[17:17:40] I thought you needed more time and I went out a bit
[17:18:08] I have time now if you are around
[17:19:38] Analytics, Event-Platform, Product-Infrastructure-Team-Backlog, Performance-Team (Radar): Convert WikimediaEvents to use ResourceLoader packageFiles - https://phabricator.wikimedia.org/T253634 (Ottomata)
[17:21:52] Analytics, Inuka-Team, Language-strategy, Privacy Engineering, Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (JFishback_WMF)
[17:24:25] Analytics, Operations, Privacy Engineering, Traffic: Publishing project anomaly data for censorship researchers. Evaluate privacy threats - https://phabricator.wikimedia.org/T183990 (JFishback_WMF)
[17:26:43] Analytics, Analytics-Kanban, Patch-For-Review: Creation of canonical pageview dumps for users to download - https://phabricator.wikimedia.org/T251777 (Isaac) Makes sense @fdans -- I think page IDs + API is being tracked at T159046, so I'll hold off on creating a new task.
[17:43:27] Analytics, Event-Platform, Product-Infrastructure-Team-Backlog, Performance-Team (Radar): Convert WikimediaEvents to use ResourceLoader packageFiles - https://phabricator.wikimedia.org/T253634 (Krinkle)
[17:44:05] elukey: you still there?
[17:47:16] * addshore sits down now to move his notebooks
[17:51:42] mforns: yep, bc in 5 mins?
[17:51:51] ok
[18:00:09] I'm there
[18:03:26] mforns: joining
[18:11:31] https://usercontent.irccloud-cdn.com/file/RbdGqjIX/image.png
[18:11:34] ^^ anyone any ideas?
[18:14:22] addshore: did you copy the venv from a notebook to either 1005 or 1008?
[18:14:30] nope
[18:14:53] to what host then?
[18:14:57] I left the venv there, and the first time I ran this notebook on stat1007 it installed the package as expected
[18:15:20] I only had ~15 notebooks so i just downloaded and reuploaded them in the UI :P
[18:15:59] i vaguely remember something being wrong with my venv somewhere before though, perhaps when i started using notebook1003 in the first place
[18:16:33] I'm not a python pro :P
[18:19:42] weird
[18:19:43] (venv) elukey@stat1007:/home/addshore$ pip freeze | grep find
[18:19:43] findspark==1.3.0
[18:19:48] so I can see it in your venv
[18:20:11] I'm using a PySpark - YARN (large), do i need to do anything fancy to make it use the right venv?
[18:20:39] https://usercontent.irccloud-cdn.com/file/I68UK4AN/image.png
[18:22:55] https://www.irccloud.com/pastebin/MCZxtACh/%3F
[18:23:26] I tried doing help("modules") and got some errors and things / warnings
[18:23:56] and help("modules") didn't list findspark
[18:31:12] mhhym, well, the notebooks are all on stat1007 now, I just need to make them work again :D *no longer needed notebook1003*
[18:31:18] * addshore will look more tomorrow perhaps
[18:31:41] addshore: ping me so we can check together tomorrow :)
[18:31:45] yarp!
[18:51:42] * elukey off!
[19:51:42] nuria: yt? got a sec for a brain bounce maybe java refinery source related? :)
[19:59:40] ottomata: on meeting with recruiting , sorry
[20:04:08] np!
[21:16:42] ottomata: is eventbus stream config available anywhere for reading? similar to the stream-(secondary) schema pairs in wgEventStreams in wmf-config/InitialiseSettings.php but for primary schemas?
[21:17:42] bearloga: they just quit 4 mins ago
[21:18:09] Analytics, Analytics-Kanban, Product-Analytics: Technical contributors emerging communities metric definition, thick data - https://phabricator.wikimedia.org/T250284 (jwang) > let's just consider 'wikipedia' projects > let's remove any project with less than 5 editors or more than 40,000 (to include...
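Editor's note: for the findspark question at 18:11-18:23 (the package shows up in `pip freeze` inside the venv but not in the notebook kernel), one generic way to see which interpreter a kernel is running and whether it can find a module is sketched below. The helper name `describe_module` is hypothetical, and this is only a diagnostic sketch; the log does not say what the actual cause was.

```python
import importlib.util
import sys


def describe_module(name: str) -> dict:
    """Report the running interpreter and whether a top-level module is importable."""
    spec = importlib.util.find_spec(name)  # None when the module cannot be found
    return {
        "interpreter": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,           # points inside the venv when one is active
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


# Run inside the notebook kernel and compare with the same call from the venv shell.
print(describe_module("findspark"))
```

If `interpreter` and `prefix` differ between the kernel and the shell where `pip freeze` was run, the kernel is not using that venv.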
[22:30:40] Analytics, Operations, ops-eqiad: an-presto1004 down - https://phabricator.wikimedia.org/T253438 (wiki_willy) a:Cmjohnson
[22:33:21] Analytics: Decomission notebook hosts - https://phabricator.wikimedia.org/T249752 (nettrom_WMF) Moved everything off of notebook1003 and notebook1004. Shut down the Jupyter servers on both as well. Good to go as far as I'm concerned. Thanks for the work on maintaining these hosts!