[00:18:56] RECOVERY - Check the last execution of monitor_refine_eventlogging_analytics on an-launcher1001 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_analytics https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:45:58] RECOVERY - Check the last execution of monitor_refine_eventlogging_analytics_failure_flags on an-launcher1001 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_analytics_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:15:48] RECOVERY - Check the last execution of monitor_refine_mediawiki_job_events on an-launcher1001 is OK: OK: Status of the systemd unit monitor_refine_mediawiki_job_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:02:31] Analytics: Decomission notebook hosts - https://phabricator.wikimedia.org/T249752 (Iflorez) I deleted all files on nb3 and shutdown the server. I rsynced all files from nb4 and shutdown the server. Thank you!
[04:08:36] How far does EventStream go in terms of recent changes? 30 days? 90 days?
[05:09:06] RECOVERY - Check the last execution of analytics-dumps-fetch-mediawiki_history_dumps on labstore1007 is OK: OK: Status of the systemd unit analytics-dumps-fetch-mediawiki_history_dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:35:53] Analytics, Operations: Increase memory available for an-launcher1001 - https://phabricator.wikimedia.org/T254125 (elukey) There are still some important jobs running so I cannot reboot the instance, will do it hopefully tomorrow :)
[06:38:22] RECOVERY - Check the last execution of check_webrequest_partitions on an-launcher1001 is OK: OK: Status of the systemd unit check_webrequest_partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:41:37] elukey: have a nice day off!!
:)
[06:55:35] joal: o/ I'm around, in case you still have a question (you pinged me last friday)
[06:59:55] Hi folks )
[07:00:09] dcausse: hi! I actually found my answer by myself :)
[07:00:17] ok :)
[07:10:58] Analytics, Analytics-Kanban: Add sqooped imagelinks table to oozie load job for hive to show new snapshots - https://phabricator.wikimedia.org/T254191 (JAllemandou) To keep archives happy: There was a bug in loading the data in hive: https://gerrit.wikimedia.org/r/#/c/601387/ This has been manually fixed...
[07:46:12] djellel: Hi - Your spark job is eating all of the cluster resources
[07:47:49] Hi. Ok, I killed it.
[07:48:18] thanks djellel - you need to set limits in terms of executor number for your jobs please :)
[07:49:47] joal: I actually did. isn't 50 within limits?
[07:51:18] djellel: job was not applying that limit - I suspect you used `--num-executors` while your job was running in `dynamicAllocation` mode (true by default) - The setting when running in dynamic allocation mode is: `--conf spark.dynamicAllocation.maxExecutors=50`
[07:54:23] joal: this is correct.
[07:55:36] joal: Are you saying that my spark job doesn't scale back if there are other jobs claiming capacity?
[07:56:08] correct djellel - your job requests an unlimited number of executors and doesn't release them as long as there is work to be done
[07:57:12] djellel: some prod jobs are capable of killing some jobs resources to steal them, but it's best practice to not have to get there :)
[08:04:11] sounds good, I'll keep that in mind.
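A minimal sketch of the executor-cap point from the 07:51 message, assuming a job launched through `spark-submit`. The helper name `spark_submit_args` and the script name `my_job.py` are hypothetical; only the `--num-executors` flag and the `spark.dynamicAllocation.*` settings come from Spark itself. It just builds the argument list and does not launch anything.

```python
from typing import List


def spark_submit_args(app: str, max_executors: int,
                      dynamic_allocation: bool = True) -> List[str]:
    """Build a spark-submit argument list that caps the executor count.

    Hypothetical helper for illustration only; it does not run Spark.
    """
    args = ["spark-submit"]
    if dynamic_allocation:
        # Under dynamic allocation the upper bound is a Spark conf,
        # as described in the chat above.
        args += [
            "--conf", "spark.dynamicAllocation.enabled=true",
            "--conf", f"spark.dynamicAllocation.maxExecutors={max_executors}",
        ]
    else:
        # With dynamic allocation switched off, request a fixed count.
        args += [
            "--conf", "spark.dynamicAllocation.enabled=false",
            "--num-executors", str(max_executors),
        ]
    args.append(app)
    return args


if __name__ == "__main__":
    # my_job.py is a placeholder script name.
    print(" ".join(spark_submit_args("my_job.py", 50)))
```

Keeping the two branches separate makes it harder to pass `--num-executors` alone and assume it acts as a cap while dynamic allocation is still on.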
[09:10:40] Analytics: [Spike] Spark job for digests-only mediawiki-history-reduced - https://phabricator.wikimedia.org/T212928 (JAllemandou)
[09:10:42] Analytics, Analytics-Kanban, Analytics-Wikistats: Define reduce calculations needed to compute active editors per project family - https://phabricator.wikimedia.org/T249751 (JAllemandou)
[09:12:03] Analytics, Analytics-Kanban, Analytics-Wikistats: Define reduce calculations needed to compute active editors per project family - https://phabricator.wikimedia.org/T249751 (JAllemandou)
[09:15:22] (PS1) Joal: [SPIKE] Add mediawiki-reduced spark job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601662 (https://phabricator.wikimedia.org/T212928)
[09:15:43] Analytics, Analytics-Kanban, Patch-For-Review: [Spike] Spark job for digests-only mediawiki-history-reduced - https://phabricator.wikimedia.org/T212928 (JAllemandou)
[09:18:58] (CR) jerkins-bot: [V: -1] [SPIKE] Add mediawiki-reduced spark job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601662 (https://phabricator.wikimedia.org/T212928) (owner: Joal)
[09:36:26] joal: o/ my understanding is that nothing is exploding right?
[09:36:40] sqoop still running so I cannot reboot an-launcher to add moar memory
[09:36:44] all good so far elukey - you can enjoy time-off :)
[09:37:05] elukey: with new slots tables, sqoop takes longer (2/3 days)
[09:37:08] nice :)
[09:37:11] I hope it'll be done tomorrow
[09:37:21] elukey: see you tomorrow ;)
[09:56:50] (PS2) Joal: [SPIKE] Add mediawiki-reduced spark job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601662 (https://phabricator.wikimedia.org/T212928)
[10:07:14] Analytics, Event-Platform, Operations, Services (watching): Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561 (hnowlan) I'm okay with using `general.yaml` in this way but I would like to put the kafka service list under a general hierarchy of services (like `"service...
[10:35:34] Analytics, Wikidata, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Remove wb_terms from sqoop - https://phabricator.wikimedia.org/T249319 (JAllemandou) Open→Resolved
[12:32:15] Analytics: Order mediawiki_history dumps by event_timestamp - https://phabricator.wikimedia.org/T254233 (Milimetric)
[12:50:04] Analytics: Mediawiki History dumps unique editors feature request - https://phabricator.wikimedia.org/T254234 (Milimetric)
[12:59:14] Analytics: Mediawiki History dumps unique editors feature request - https://phabricator.wikimedia.org/T254234 (JAllemandou) We would need a heuristic about datasize as pages might have a big number of different users. If we implement the users-per page, we should probably implement the pages per-user :)
[13:08:23] dsaez: Hello!! I'm sorry I completely missed the time of our meeting - I'm in now if you want :)
[13:10:44] hi! sorry, just walked out from the computer. Let's reschedule for next week?
[13:10:59] as you wish dsaez - I have time now, or next week :)
[13:11:20] I would prefer next week
[13:11:27] no problem dsaez
[13:12:00] cool, thx I'll send an invitation
[13:12:28] ack dsaez - later :)
[13:23:43] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure, Wikimedia-production-error: [beta] EventLogging trying to fetch wrong Schema title - https://phabricator.wikimedia.org/T254058 (Ottomata) Hm, I can't see anything in the code that would cause this to happen, and it only happened for...
[14:07:31] heya teamm
[14:19:25] Analytics: Mediawiki History dumps unique editors feature request - https://phabricator.wikimedia.org/T254234 (mforns) One thing we suggested in the past to solve this, was having 2 alternative dumps: one ordered by user_id, the other one by page_id. Not sure if this would cover all use cases that you guys t...
[14:39:06] joal: hellooo joseph do you know of any docs for hdfs rsync? I can't find any reference to it on wikitech
[14:42:16] heya elukey, I was reviewing https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594773/ post-merge (sorry I didn't see it before), thanks for that
[14:49:48] (PS1) Ottomata: Refactor JsonSchemaLoader into JsonLoader to allow for easy loading of remote JSON blobs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601749 (https://phabricator.wikimedia.org/T251609)
[14:50:19] (CR) jerkins-bot: [V: -1] Refactor JsonSchemaLoader into JsonLoader to allow for easy loading of remote JSON blobs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601749 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[14:50:45] (PS2) Ottomata: Refactor JsonSchemaLoader into JsonLoader to allow for easy loading of remote JSON blobs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601749 (https://phabricator.wikimedia.org/T251609)
[14:51:10] (PS3) Ottomata: Refactor JsonSchemaLoader into JsonLoader to allow for easy loading of remote
JSON blobs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601749 (https://phabricator.wikimedia.org/T251609)
[14:52:03] (CR) jerkins-bot: [V: -1] Refactor JsonSchemaLoader into JsonLoader to allow for easy loading of remote JSON blobs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601749 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[15:02:59] UH OH
[15:03:37] AH what why is my calendar still not synced!
[15:04:19] Analytics, Product-Analytics, MW-1.35-notes (1.35.0-wmf.35; 2020-06-02), Patch-For-Review: [Spike] Should EventLogging support DNT? - https://phabricator.wikimedia.org/T252438 (jlinehan) >>! In T252438#6178811, @Milimetric wrote: > This is moot at this point, but sometimes I like to make sure I u...
[15:30:02] Analytics: Temporarily remove hourly traffic alarms from analytics-alerts - https://phabricator.wikimedia.org/T254256 (mforns)
[15:30:19] Analytics, Analytics-Kanban: Temporarily remove hourly traffic alarms from analytics-alerts - https://phabricator.wikimedia.org/T254256 (mforns)
[15:30:56] (PS1) Mforns: Send hourly traffic alarms to individuals only [analytics/refinery] - https://gerrit.wikimedia.org/r/601773 (https://phabricator.wikimedia.org/T254256)
[15:31:43] Analytics, Analytics-Kanban: adjust fundraising firewalls and kafkatee configuration to accomodate new kafka brokers kafka-jumbo100[789] - https://phabricator.wikimedia.org/T254257 (Jgreen)
[15:45:53] (PS4) Ottomata: Refactor JsonSchemaLoader into JsonLoader to allow for easy loading of remote JSON blobs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601749 (https://phabricator.wikimedia.org/T251609)
[16:00:48] a-team: standdduoppppppp
[16:01:14] ping milimetric elukey
[16:02:54] fdans: no doc on wikitech I think for hdfs-rsync - Everything in code, and in running -help
[16:15:03] Analytics: Unique devices, retrofit with bot detection code -
https://phabricator.wikimedia.org/T250744 (Nuria) a:JAllemandou
[16:15:29] Analytics, Analytics-Kanban: Unique devices, retrofit with bot detection code - https://phabricator.wikimedia.org/T250744 (Nuria)
[16:49:50] joal: oh wait it's an actual command!
[16:50:43] an-launcher1001 has all 4 vcores at 9x% of cpu usage
[16:51:04] pf :(
[16:51:45] but it seems under control, the java processes launched by sqoop require some cpu resource :)
[16:52:19] load 15m is ~14 atm
[16:52:19] elukey: I was looking at that - once every now and then there will be 4 parallel sqooped jobs
[16:54:47] better to have these separate from hive/presto/etc.., and also we have a better idea now about resource consumption
[16:54:52] Analytics: Mediawiki History dumps unique editors feature request - https://phabricator.wikimedia.org/T254234 (marcmiquel) What Dan is suggesting is the number of editors who intervened in an article. That's very useful. However, a dataset ordered by user_id would be very useful too to understand the commun...
[16:54:53] a new VM is probably needed
[16:54:58] for sqoop/RU
[16:55:01] and actually elukey I didn't realize - one reason for which there is so much load is because the number of cores is a lot smaller on an-launcher than it was on an-coord1001
[16:55:31] elukey: there is a param for sqoop about parallelization - it's set to 10 now - too much given the 4 CPUs of an-launcher
[16:55:31] yes yes
[16:55:49] elukey: I'm sorry I didn't realize :(
[16:56:12] joal: no problem, I think that we are learning a lot about performance habits of our tools that we didn't know before :)
[16:57:32] lowering it down to 4 might also cause sqoop to take longer to run, so $maybe sqoop needs to run from the coordinator or a similar host
[16:58:16] so one VM for RU jobs, one vm for launcher and sqoop on coordinator, even if the last one means causing potential problems to hive running
[16:58:24] but since it is once a month, seems acceptable
[16:58:34] elukey: sqoop running from a powerful enough machine is a good idea
[16:59:08] elukey: I'm trying to think of a way to make that run on yarn
[16:59:57] joal: we have enough on our plate, if it is easy then could be a big win, otherwise let's just move it back to the coordinator and open a task for some quarter in the future
[17:00:43] all right will think something and then tomorrow let's move some jobs :)
[17:01:57] ack elukey
[17:02:10] elukey: I'll have a look at skein - might be easy enough
[17:12:06] Analytics, Product-Analytics: /srv/published should be structured similarly, have identical README across stat hosts describing said structure - https://phabricator.wikimedia.org/T254189 (kzimmerman) p:Triage→Low This is not urgent and we also need to clarify the right approach; we're moving this...
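A rough sketch of the sizing concern raised at 16:55 (a parallelism setting of 10 on a 4-CPU host), assuming the simplest possible policy: never run more workers than there are cores. This is not the actual sqoop wrapper; `capped_workers` and `run_capped` are hypothetical names, and a thread pool stands in for whatever processes the real job launches.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def capped_workers(requested: int, cores: Optional[int] = None) -> int:
    """Return the requested parallelism, limited to the number of cores."""
    available = cores or os.cpu_count() or 1
    return max(1, min(requested, available))


def run_capped(tasks: Sequence[Callable[[], T]], requested: int,
               cores: Optional[int] = None) -> List[T]:
    """Run the tasks with at most capped_workers(...) of them in flight."""
    workers = capped_workers(requested, cores)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input tasks.
        return list(pool.map(lambda task: task(), tasks))


if __name__ == "__main__":
    # A setting of 10 on a 4-core host is reduced to 4 workers.
    print(capped_workers(10, cores=4))
```

As noted in the chat, a lower cap trades CPU pressure for a longer total run time, so the cap alone does not settle which host the job should run on.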
[17:18:04] Analytics, Discovery-Analysis, Product-Analytics (Kanban): Decomission maps metrics module from wikimedia/discovery/golden data retrieval - https://phabricator.wikimedia.org/T252365 (mpopov) Open→Resolved This is done
[17:45:45] Analytics, MediaWiki-API: mostviewed generator not returning any results - https://phabricator.wikimedia.org/T254211 (Pchelolo) I guess #analytics team is maintaining page views APIs, retagging.
[18:27:33] Analytics, Operations, ops-eqiad: an-presto1004 down - https://phabricator.wikimedia.org/T253438 (Cmjohnson) There is a larger issue with this server, replaced the CPU but noticed the power supplies are both failed. I could also smell burning in the server, swapped the power supplies with decom spar...
[18:48:43] Analytics, Product-Analytics, MW-1.35-notes (1.35.0-wmf.35; 2020-06-02), Patch-For-Review: [Spike] Should EventLogging support DNT? - https://phabricator.wikimedia.org/T252438 (Nuria) Really nice find @jlinehan
[18:49:33] (CR) Nuria: [C: +2] Send hourly traffic alarms to individuals only [analytics/refinery] - https://gerrit.wikimedia.org/r/601773 (https://phabricator.wikimedia.org/T254256) (owner: Mforns)
[18:49:35] (CR) Nuria: [V: +2 C: +2] Send hourly traffic alarms to individuals only [analytics/refinery] - https://gerrit.wikimedia.org/r/601773 (https://phabricator.wikimedia.org/T254256) (owner: Mforns)
[18:57:24] (CR) Nuria: [C: -1] "I do not think we are ready to take on this work quite yet, let's hold on until next quarter."
[analytics/refinery/source] - https://gerrit.wikimedia.org/r/601662 (https://phabricator.wikimedia.org/T212928) (owner: Joal)
[19:23:01] Analytics, Analytics-Kanban: Order mediawiki_history dumps by event_timestamp - https://phabricator.wikimedia.org/T254233 (mforns) a:mforns
[19:45:21] Analytics, Analytics-Kanban: Order mediawiki_history dumps by event_timestamp - https://phabricator.wikimedia.org/T254233 (mforns) Indeed, events are not strictly in order. For large and medium wikis, whose dumps are split in monthly or yearly files, each file does only contain event corresponding to the...
[20:11:02] Analytics, Product-Analytics (Kanban): Need libfontconfig1-dev on stat hosts for data visualization work - https://phabricator.wikimedia.org/T254278 (mpopov) p:Triage→Low
[20:11:22] joal: yt?
[20:20:25] musikanimal: yt?
[20:37:37] Analytics, Analytics-Kanban: Add new kafka brokers kafka-jumbo100[789] to the jumbo-eqiad Kafka cluster - https://phabricator.wikimedia.org/T252675 (Jgreen)
[21:24:16] (PS1) Ottomata: Refine - Make event transform functions smarter about choosing which possible column to use [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601865
[21:29:49] (CR) jerkins-bot: [V: -1] Refine - Make event transform functions smarter about choosing which possible column to use [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601865 (owner: Ottomata)
[21:34:56] Analytics, Analytics-EventLogging, Timeless, Patch-For-Review: EventLogging revision popup gets hidden behind content in Timeless - https://phabricator.wikimedia.org/T249557 (matmarex) Open→Resolved a:ashley