[00:07:06] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MediaWiki-extensions-WikimediaEvents, and 2 others: Remove deprecated using schema.* syntax from WikimediaEvents - https://phabricator.wikimedia.org/T223285 (10Krinkle) @Milimetric Remember that while mw.track is encouraged overall (async, depend...
[00:57:15] PROBLEM - Check the last execution of refinery-import-page-history-dumps on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused
[00:59:07] PROBLEM - Check the last execution of reportupdater-browser on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused
[01:02:43] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Multimedia, and 3 others: Remove deprecated using schema.* syntax from MultimediaViewer - https://phabricator.wikimedia.org/T223284 (10Milimetric)
[01:03:07] PROBLEM - Check the last execution of reportupdater-interlanguage on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused
[01:08:13] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused
[01:08:29] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MediaWiki-extensions-WikimediaEvents, and 2 others: Remove deprecated using schema.* syntax from WikimediaEvents - https://phabricator.wikimedia.org/T223285 (10Milimetric)
[01:17:35] RECOVERY - Check the last execution of refinery-import-page-history-dumps on stat1007 is OK: OK: Status of the systemd unit refinery-import-page-history-dumps
[01:19:29] RECOVERY - Check the last execution of reportupdater-browser on stat1007 is OK: OK: Status of the systemd unit reportupdater-browser
[01:23:33] RECOVERY - Check the last execution of reportupdater-interlanguage on stat1007 is OK: OK: Status of the systemd unit reportupdater-interlanguage
[03:02:17] PROBLEM - Check the last execution of refinery-import-page-history-dumps on stat1007 is CRITICAL: CRITICAL: Status of the systemd unit refinery-import-page-history-dumps
[04:24:01] PROBLEM - Check the last execution of monitor_refine_eventlogging_analytics on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_analytics
[06:29:45] RECOVERY - Check the last execution of refinery-import-page-history-dumps on stat1007 is OK: OK: Status of the systemd unit refinery-import-page-history-dumps
[06:42:07] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1007 is OK: OK
[06:52:42] 10Analytics: Can't log into Superset - https://phabricator.wikimedia.org/T223335 (10elukey) @Nuria the user that you added missed the "Alpha" role, I think that this is the reason for the 401.. it should be good now! @kaldari please retry when you have time :)
[08:37:32] starting the superset migration
[08:37:37] superset is now not available
[08:43:40] (03CR) 10Elukey: [V: 03+2 C: 03+2] Add artifacts for Debian Buster and upgrade to 0.32rc2 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/495182 (https://phabricator.wikimedia.org/T212243) (owner: 10Elukey)
[08:49:06] of course I found something that breaks only now
[08:59:30] I am currently building another version of superset
[08:59:48] basically pyhive doesn't work anymore with python3.7 (the current version I mean)
[09:21:30] (03PS1) 10Elukey: Upgrade pyhive to 0.6.1 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/510447 (https://phabricator.wikimedia.org/T212243)
[09:21:32] (03PS1) 10Elukey: Rebuild with updated pyhive dependency [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/510448
[09:58:41] (03PS2) 10Elukey: Upgrade pyhive to 0.6.1 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/510447 (https://phabricator.wikimedia.org/T212243)
[09:58:43] (03PS2) 10Elukey: Rebuild with updated pyhive dependency [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/510448
[10:03:19] (03CR) 10Elukey: [V: 03+2 C: 03+2] Upgrade pyhive to 0.6.1 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/510447 (https://phabricator.wikimedia.org/T212243) (owner: 10Elukey)
[10:03:26] (03CR) 10Elukey: [V: 03+2 C: 03+2] Rebuild with updated pyhive dependency [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/510448 (owner: 10Elukey)
[10:09:58] ok so the issue seems solved, but it is best to wait a bit before merging the varnish/ats change since the traffic team is busy
[10:10:16] so I'll keep superset down for a bit more (until after lunch)
[10:10:19] and then I'll do the migration
[10:31:17] superset updated!
[10:34:19] https://superset.wikimedia.org/static/assets/version_info.json
[10:34:26] "version": "0.32.0rc2-wikimedia1"
[10:34:35] !log superset upgraded to 0.32
[10:34:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:34:48] the last missing step is to restart the hdfs masters
[10:34:57] to apply the new config for analytics-tool1004
[10:35:03] (namely being proxies for some users)
[10:35:07] will do it after lunch!
[10:35:18] * elukey lunch!
[14:42:06] 10Analytics: CamusPartitionChecker email alert should be more descriptive - https://phabricator.wikimedia.org/T223387 (10Ottomata)
[14:44:55] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up edit_hourly data set in Hive - https://phabricator.wikimedia.org/T220092 (10mforns)
[14:45:16] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up edit_hourly data set in Hive - https://phabricator.wikimedia.org/T220092 (10mforns)
[14:45:38] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up edit_hourly data set in Hive - https://phabricator.wikimedia.org/T220092 (10mforns) Added some docs on Wikitech: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Edit_hourly
[14:47:05] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up edit_hourly data set in Hive - https://phabricator.wikimedia.org/T220092 (10Nuria) Please @MNeisler and @Neil_P._Quinn_WMF take a look at docs and update as needed: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Edit_hourly
[15:21:24] 10Analytics, 10Analytics-Kanban, 10Operations, 10vm-requests, and 2 others: Create an-tool1005 (Staging environment for Superset) - https://phabricator.wikimedia.org/T217738 (10elukey) @MoritzMuehlenhoff I am ready to make this host a proper staging environment for superset, let me know if we can proceed o...
[15:50:54] 10Analytics: Make it (just a bit) easier to spin up Hadoop cluster in Cloud VMs - https://phabricator.wikimedia.org/T223389 (10Ottomata)
[15:52:26] 10Analytics, 10Analytics-Cluster: Make it (just a bit) easier to spin up Hadoop cluster in Cloud VMs - https://phabricator.wikimedia.org/T223389 (10Ottomata)
[16:02:24] ping joal , fdans standduppp
[16:02:54] sorry omw!
[16:03:50] (03PS2) 10Elukey: oozie/wikidata: move user to analytics [analytics/refinery] - 10https://gerrit.wikimedia.org/r/509016 (https://phabricator.wikimedia.org/T220971)
[16:04:06] (03CR) 10Elukey: [V: 03+2 C: 03+2] oozie/wikidata: move user to analytics [analytics/refinery] - 10https://gerrit.wikimedia.org/r/509016 (https://phabricator.wikimedia.org/T220971) (owner: 10Elukey)
[16:04:15] (03PS2) 10Elukey: oozie/mediawiki/geoeditors: move coordinators to the analytics user [analytics/refinery] - 10https://gerrit.wikimedia.org/r/509026 (https://phabricator.wikimedia.org/T220971)
[16:04:32] (03CR) 10Elukey: [V: 03+2 C: 03+2] oozie/mediawiki/geoeditors: move coordinators to the analytics user [analytics/refinery] - 10https://gerrit.wikimedia.org/r/509026 (https://phabricator.wikimedia.org/T220971) (owner: 10Elukey)
[16:37:29] elukey, you wanted to talk about RU?
[16:38:21] mforns: yep! but not super urgent.. basically I'd like to start running the hdfs based jobs with the 'analytics' user
[16:38:25] IIRC they are a couple
[16:38:48] and I'd need to know 1) what data are they reading 2) what are they creating/writing to
[16:38:56] so I can see if any chown etc.. is needed
[16:38:58] elukey, the only potential issue that I see is, we have to change the report files permissions
[16:41:44] ok elukey let me prepare a small doc and I will pass it to you
[16:41:53] mforns: we can bc-2 if you want
[16:41:54] I can see
[16:41:55] # Set up reportupdater.
[16:41:55] # Reportupdater here launches Hadoop jobs, and
[16:41:55] # the 'hdfs' user is the only 'system' user that has
[16:41:57] # access to required files in Hadoop.
[16:41:57] ok!
[16:42:05] omw
[16:42:30] elukey, batcave is empty if you want
[16:42:35] ah sure!
[16:45:07] hey a-team, qq: which is the IP for hdfs? I'm trying to use a python client to read directly from hadoop, but it asks for that parameter
[16:45:25] dsaez: there isn't one IP, since it is an HA setup
[16:45:38] if your python client can read the default hadoop conf in /etc/hadoop/conf
[16:45:46] you should be able to use
[16:45:52] hdfs://analytics-hadoop/
[16:45:58] thx ottomata, I'll try that
[16:49:42] ok I need to roll restart the HDFS masters
[16:51:09] starting with 1002 (the standby)
[17:01:13] addshore: ehm allocated vcores 1440 MB allocated 3964928
[17:01:22] are you cryptomining? :D
[17:01:44] I'm being evil aren't I
[17:01:56] it's 70%!
[17:02:08] if you want me to stop though I can! :)
[17:02:35] in theory no, if it's close to finishing, but please be more gentle next time :P
[17:05:31] :) I was trying to find a gif to represent my interactions with hadoop sometimes, but I can't find one that quite lives up to my image
[17:06:02] big bear pawing at a door trying to get in and failing, but successfully scratching everything, then falling backwards and walking away
[17:07:04] ahahaha
[17:08:27] addshore: https://grafana.wikimedia.org/d/000000585/hadoop?orgId=1&panelId=50&fullscreen
[17:08:35] look how beautiful it is
[17:09:18] :D looks like what happened a few hours ago to inbound traffic to the whole wmf cluster ;)
[17:11:08] all done!
[17:11:14] :)
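The HA setup ottomata describes above is why there is no single HDFS IP: clients use a logical nameservice (here `analytics-hadoop`) that hdfs-site.xml maps to several NameNode RPC addresses, and the client picks the active one at runtime. The sketch below shows how those HA properties fit together by parsing a miniature hdfs-site.xml; the XML content and the NameNode hostnames/ports in it are illustrative assumptions, not the real analytics-hadoop configuration.

```python
# Sketch: resolve an HDFS HA nameservice to its NameNode RPC addresses
# by reading the dfs.nameservices / dfs.ha.namenodes.* / 
# dfs.namenode.rpc-address.* properties of an hdfs-site.xml.
# The config below is a made-up example, NOT the real WMF config.
import xml.etree.ElementTree as ET

HDFS_SITE = """<configuration>
  <property><name>dfs.nameservices</name><value>analytics-hadoop</value></property>
  <property><name>dfs.ha.namenodes.analytics-hadoop</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.analytics-hadoop.nn1</name><value>an-master1001:8020</value></property>
  <property><name>dfs.namenode.rpc-address.analytics-hadoop.nn2</name><value>an-master1002:8020</value></property>
</configuration>"""

def ha_namenodes(xml_text):
    """Return {nameservice: [namenode rpc addresses]} from an hdfs-site.xml."""
    props = {p.find("name").text: p.find("value").text
             for p in ET.fromstring(xml_text).iter("property")}
    ns = props["dfs.nameservices"]
    nn_ids = props[f"dfs.ha.namenodes.{ns}"].split(",")
    return {ns: [props[f"dfs.namenode.rpc-address.{ns}.{i}"] for i in nn_ids]}

print(ha_namenodes(HDFS_SITE))
# → {'analytics-hadoop': ['an-master1001:8020', 'an-master1002:8020']}
```

This is also why a URI like `hdfs://analytics-hadoop/` works with no host:port in it: a client that loads `/etc/hadoop/conf` treats `analytics-hadoop` as the nameservice and fails over between the listed NameNodes on its own.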