[06:01:16] good morning!
[06:17:45] Analytics-Clusters: Deprecate Hue (if possible) - https://phabricator.wikimedia.org/T258799 (elukey) Open→Declined In T258768 the Analytics team is trying to upgrade Hue to a new version, packaging the upstream code directly rather than relying on distributions (like CDH). For the moment I am closing...
[06:17:47] Analytics, Analytics-Kanban: Analytics Ops Technical Debt - https://phabricator.wikimedia.org/T240437 (elukey)
[06:23:29] Analytics: Apply proper permissions to stat100x home directories - https://phabricator.wikimedia.org/T262183 (elukey)
[06:24:44] Analytics: Apply proper permissions to stat100x home directories - https://phabricator.wikimedia.org/T262183 (elukey)
[06:25:00] Analytics: Apply proper permissions to stat100x home directories - https://phabricator.wikimedia.org/T262183 (elukey)
[06:26:03] Good morning :)
[06:32:46] Analytics, Analytics-Kanban: Sort editors-by-country by descending editor-ceil value in cassandra - https://phabricator.wikimedia.org/T262184 (JAllemandou)
[06:33:25] Analytics: Apply proper permissions to stat100x home directories - https://phabricator.wikimedia.org/T262183 (elukey)
[06:34:51] Analytics: Sort editors-by-country by descending editor-ceil value in cassandra - https://phabricator.wikimedia.org/T262184 (JAllemandou)
[06:47:42] (PS1) Joal: Sort editors by-country for cassandra loading [analytics/refinery] - https://gerrit.wikimedia.org/r/625586 (https://phabricator.wikimedia.org/T262184)
[06:48:30] Analytics: Apply proper permissions to stat100x home directories - https://phabricator.wikimedia.org/T262183 (elukey) Checked briefly and we could add new parameters to each users of admin's data.yaml, and then read them in the related defines (`admin::hashuser` mostly) but there are some things to solve fir...
[06:48:40] Analytics, Analytics-Kanban, Patch-For-Review: Sort editors-by-country by descending editor-ceil value in cassandra - https://phabricator.wikimedia.org/T262184 (JAllemandou) a:JAllemandou
[07:20:55] Analytics-Radar, DC-Ops, Operations, ops-eqiad, Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (elukey) @Cmjohnson I'd need these to be on Stretch, I have updated dhcp accordingly, will try to reimage :)
[07:22:53] so we have the new GPU workers racked, I need to reimage them because they have a different number of disks
[07:23:31] for the regular hadoop worker, we have 2xSSD disks in a flexbay, set with hw raid1 (so we see it as one disk in the os, very handy)
[07:23:50] and 12x4TB disks (non ssds)
[07:26:12] the new GPU nodes, due to chassis space constraints (namely our dear GPUs) have 24x2T disks (not ssds) and no flexbay
[07:29:33] so both puppet + partman need to consider this use case (the datanode partitions may vary from 12 to 23/24)
[07:29:57] also, the worker nodes are on stretch, which is challenging for AMD drivers
[07:30:31] the work that Tobias is doing to add ROCm dkms drivers will surely be helpful, we'll see if applicable to stretch as well
[07:30:45] otherwise no gpus until we upgrade to bigtop + buster :(
[07:32:01] there's also a third option, though:
[07:33:14] since Stretch is now in LTS stage and people were previously using the stretch-backports kernel (which no longer exists), 4.19 is now an officially supported, second kernel for Stretch (based on the 4.19 security updates for Buster)
[07:33:19] https://lists.debian.org/debian-lts-announce/2020/08/msg00019.html
[07:33:47] so as long as the rocm userland stack can be installed on stretch, we can simply use the 4.19 kernel on stretch as well
[07:33:49] ah interesting!
[07:34:20] but ofc, moving to bigtop brings many other things, but at least wanted to mention the option
[07:34:21] so in theory the work for dkms could be re-used also on stretch if needed
[07:34:27] yeah
[07:34:46] the DKMS tool is older, but it should not make a difference
[07:35:02] yes sure sure bigtop is the priority, I contacted upstream to have a sense of when they'll release 1.5 (supporting debian 10) but no answer yet
[07:35:34] and even with 4.19 rocm should be the same as on buster (sans the kernel metrics added in 5.2)
[07:35:38] as I was telling Nuria, we'll have to shift our mentality with Bigtop, from "let's only get packages and deploy" to "we'll have to contribute to upstream actively"
[07:36:06] which is also nice, as we can shape things in the right direction then :-)
[07:40:54] yep I agree!
[07:41:21] I am a little bit worried about the hadoop 3 migration, since IIUC it is stalled due to work needed to port scripts etc..
[07:42:12] last year at the apachecon in berlin there was a talk from the hadoop team at CERN, they used to be on CDH and then migrated to their own packaging of hadoop upstream code
[07:42:18] (main hadoop, not bigtop etc..)
[07:42:28] I asked for some info via email but they never got back to me
[07:44:48] (for some reason upstream devs don't like me :P)
[07:55:52] Analytics: Create a cookbook to automate the bootstrap of new Hadoop workers - https://phabricator.wikimedia.org/T262189 (elukey)
[07:57:07] brb
[08:04:30] back from impromptu appointment - ping me when you're back elukey, I'd like to confirm my understanding of the new hadoop workers config/challenges :)
[08:04:38] elukey: and have a coffee :)
[08:05:46] joal: I am back :)
[08:05:53] \o/ :) Coffee?
[08:06:28] ah yes sorry I didn't get it, sure!
[09:01:31] (CR) Joal: [C: +1] "Change looks good." [analytics/refinery] - https://gerrit.wikimedia.org/r/624779 (owner: Milimetric)
[09:04:33] (CR) Joal: [C: +2] "LGTM !
To be deploy with its oozie companion https://gerrit.wikimedia.org/r/c/analytics/refinery/+/623456" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/612454 (https://phabricator.wikimedia.org/T257691) (owner: Nuria)
[09:08:01] (CR) Joal: [C: +1] "One nit on commit message, code looks good" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/623456 (https://phabricator.wikimedia.org/T257691) (owner: Nuria)
[09:08:57] (Merged) jenkins-bot: Chopping timeseries for noise detection [analytics/refinery/source] - https://gerrit.wikimedia.org/r/612454 (https://phabricator.wikimedia.org/T257691) (owner: Nuria)
[09:12:09] elukey: asking for superpowa help please
[09:12:24] elukey: in the context of https://phabricator.wikimedia.org/T262141
[09:12:49] sure
[09:12:54] how can I help?
[09:13:36] elukey: on stat1007, in /srv/dumps/pagecounts-ez/merged, there is a file named files_in_hdfs, of size 0, that we should delete IMO
[09:14:10] elukey: sorry, minor thing, but couldn't do it :S
[09:14:52] elukey: then I have a question - in that folder, we store BZ2 files only, while the external dumps-site shows an uncompressed version (see https://dumps.wikimedia.org/other/pagecounts-ez/merged/)
[09:15:18] elukey: do we know where the decompression happens? Cause I think the problem in the task comes from that
[09:15:37] ok file deleted
[09:15:41] thanks mate
[09:16:16] for the rest, it is probably something buried inside the perl code :D
[09:16:26] lemme check
[09:16:31] elukey: I can't imagine so
[09:17:01] elukey: perl would be generating the files - decompression must be done on the labstore once files are available
[09:18:20] that would be ideal but I dropped that concept after reading those intricate crons :D
[09:18:29] MEH !
[09:18:42] elukey: no uncompressed file on stat1007
[09:18:53] elukey: so they are not copied
[09:19:12] mmm where do you see uncompressed files?
[09:19:17] https://dumps.wikimedia.org/other/pagecounts-ez/merged/2014/2014-10/ shows compressed no?
[09:19:46] yes
[09:19:59] elukey: uncompressed shows only from 2018-04
[09:20:08] WEIRD!
[09:20:28] Ah! could it be that the data is first generated in that place, then compressed, then the file gets deleted
[09:20:49] yes that could be possible
[09:21:17] and depending on compression time, copy time etc, you end up with partially-uncompressed data over the internetz
[09:21:23] hm
[09:21:59] elukey: I don't know how we can do it, but deleting the uncompressed files and making sure they don't get released should be the way to go I assume
[09:22:34] or maybe not
[09:22:47] maybe it's on purpose :(
[09:22:52] no idea
[09:27:10] elukey: doing some checks and commenting on task
[09:39:41] Morning! (ish...)
[09:41:04] good morning :)
[09:47:35] Analytics-Radar, DC-Ops, Operations, ops-eqiad, Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1102.eqiad.wm...
[09:56:37] Hi klausman
[09:57:14] 'lo
[10:23:02] Analytics: pagecounts-ez of month 2020-08 is incomplete - https://phabricator.wikimedia.org/T262141 (JAllemandou) @Danilo: You should use the bz2 compressed version of the file, they are complete (I checked). The availability of the uncompressed version seems a bug. Compressed files are available on the com...
[10:28:43] Analytics-Radar, DC-Ops, Operations, ops-eqiad, Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1102.eqiad.wmnet'] ` and were **ALL** successful.
[10:30:22] Analytics: pagecounts-ez of month 2020-08 is incomplete - https://phabricator.wikimedia.org/T262141 (elukey) follow up: `dumps::web::fetches::stat_dumps` might be changed to force the copy only of the files that we want, so temporary ones are not transferred to labstore nodes..
[10:38:58] * elukey lunch! bb in ~2h
[10:38:58] (CR) Mforns: Removing seasonality cycle as it is fixed once granularity is set (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/623456 (https://phabricator.wikimedia.org/T257691) (owner: Nuria)
[10:41:17] (CR) Joal: [C: -1] "Actually I forgot something! This patch should contain the change of jar version for spark." [analytics/refinery] - https://gerrit.wikimedia.org/r/623456 (https://phabricator.wikimedia.org/T257691) (owner: Nuria)
[11:49:56] Analytics: Gather all data-purge into a single job - https://phabricator.wikimedia.org/T262201 (JAllemandou)
[11:52:24] Analytics, Analytics-Kanban, Product-Analytics: Technical contributors emerging communities metric definition, thick data - https://phabricator.wikimedia.org/T250284 (Jhernandez) Hi @jwang, I read through and did some format and copy edits, hopefully small tweaks. Let me know if something is inapprop...
[12:15:18] helloo team! :]
[12:15:30] Hi mforns :)
[12:15:48] hiiii
[12:23:32] mforns: heya
[12:23:45] hi klausman :]
[12:24:07] have we met before?
[12:24:18] definitely not in person :-P
[12:24:44] I only started at WMF last Monday, so I may have flown in under your radar
[12:25:06] oh cool :] we'll meet at standup then, probably
[12:25:21] Aye
[12:25:38] great
[12:34:30] Analytics, Event-Platform: Duplicated revision_create events - https://phabricator.wikimedia.org/T262203 (JAllemandou)
[12:55:33] Analytics, Event-Platform: Need for new event-type - `user_create` and `user_rename` - https://phabricator.wikimedia.org/T262205 (JAllemandou)
[12:56:35] Analytics, Event-Platform: Need for new event-type - `user_create` and `user_rename` - https://phabricator.wikimedia.org/T262205 (JAllemandou)
[12:56:38] Analytics, Analytics-Kanban, Patch-For-Review: [SPIKE] Prototype of incremental updates for mediawiki history for simplewiki , including reverts using apache hudi - https://phabricator.wikimedia.org/T258532 (JAllemandou)
[15:04:10] nuria, ottomata, milimetric, mforns - standup?
[15:04:38] ah only team europe :)
[15:04:39] sorry :)
[15:04:53] Analytics-Radar, Operations, Traffic, Patch-For-Review: Package varnish 6.0.x - https://phabricator.wikimedia.org/T261632 (Vgutierrez)
[15:04:56] mforns: you are not in the US! :D
[15:14:16] a-team aaaah!!! joining
[16:03:58] Analytics-Radar, DC-Ops, Operations, ops-eqiad, Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (elukey) @Cmjohnson I think that the two SSDs in the flex bay are not configured with hardware RAID1 (like all the other hadoop wo...
[16:05:27] klausman: this is the task that I was talking about: https://phabricator.wikimedia.org/T216528
[16:06:15] the GPU on stat1005 was very old, "enabled but not supported" in ROCm, so we had to change it
[16:06:40] now we have one on 1005 and one on 1008, plus one on each of the 6 hadoop worker nodes that we are racking
[16:23:22] Ah, good to know
[16:49:11] elukey: I'm hitting the same problem I had with Spark and hudi: MapReduce job for hive fails because of the hbase token missing
[16:49:42] * joal sighhhh looking at the ceiling :(
[16:52:54] back in a min sorry!
[16:53:57] And actually, I have no idea why, but the error is gone for spark :(
[16:55:44] MEH !
[16:56:06] And now queries in spark don't give correct results
[16:56:10] it is strange, is there any default or config file from hudi that could lead spark to think that it needs a token for hbase?
[16:56:42] elukey: I don't think so - hudi can use hbase as a backend for some indexing, but it's not set in our case
[16:57:01] Anyway - time to get dinner
[16:57:14] ack
[16:57:15] I'll see you tomorrow team (sorry for the ranting elukey)
[17:08:02] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (AMuigai) @Milimetric @Nuria Checking in again about the plans for this?
[17:11:57] joal: no rant, I wish I could help!
[17:20:27] * elukey off for today!
[19:59:48] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Nuria) We are wrapping up the work listed on the ticket above so it is likely we can get to this quarter, to be clear thi...
[20:00:06] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Nuria) a:lexnasser
[21:40:41] (CR) Nuria: Removing seasonality cycle as it is fixed once granularity is set (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/623456 (https://phabricator.wikimedia.org/T257691) (owner: Nuria)