[02:58:26] PROBLEM - Check the last execution of monitor_refine_mediawiki_job_events_failure_flags on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_mediawiki_job_events_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:00:45] Analytics, ContentTranslation, Language-Team (Language-2020-Focus-Sprint): Test Performance of Marian NMT translation in stat cluster - https://phabricator.wikimedia.org/T247245 (santhosh) Open→Resolved >>! In T247245#6063019, @JAllemandou wrote: > Hi @santhosh, > I have thought of the projec...
[05:48:14] good morning!
[05:57:46] very strange, the RU jobs say that they cannot create the pid file
[06:00:02] RECOVERY - Check the last execution of reportupdater-cx on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-cx https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:02:28] RECOVERY - Check the last execution of monitor_refine_mediawiki_job_events_failure_flags on an-launcher1001 is OK: OK: Status of the systemd unit monitor_refine_mediawiki_job_events_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:05:30] (PS1) Elukey: reportupdater.py: add verbose log exceptions [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599159
[06:05:53] (CR) jerkins-bot: [V: -1] reportupdater.py: add verbose log exceptions [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599159 (owner: Elukey)
[06:06:15] whatt
[06:11:52] ah flake8 is not happy because
[06:11:53] /reportupdater/reportupdater.py:186:29: E741 ambiguous variable name 'l'
[06:11:58] but the error seems already there?
[06:12:40] PROBLEM - Check the last execution of reportupdater-cx on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-cx https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:22:04] (PS2) Elukey: reportupdater.py: add verbose log exceptions [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599159
[06:22:06] (PS1) Elukey: tox: run tests also with python 3.7 [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599160
[06:22:08] (PS1) Elukey: reportupdate.py: fix flake8 error E741 ambiguous variable name 'l' [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599161
[06:22:50] (CR) jerkins-bot: [V: -1] tox: run tests also with python 3.7 [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599160 (owner: Elukey)
[06:23:08] (PS2) Elukey: reportupdater.py: fix flake8 error E741 ambiguous variable name 'l' [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599161
[06:23:10] (PS3) Elukey: reportupdater.py: add verbose log exceptions [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599159
[06:23:36] ah sure the flake fix must go first sigh
[06:24:09] (PS3) Elukey: reportupdater.py: fix flake8 error E741 ambiguous variable name 'l' [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599161
[06:24:11] (PS2) Elukey: tox: run tests also with python 3.7 [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599160
[06:24:13] (PS4) Elukey: reportupdater.py: add verbose log exceptions [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599159
[06:28:45] so all the RU pid files have May 27 19:00
[06:29:47] that should be more or less when the issue happened
[06:30:21] and currently the RU pingback is the only one working
[06:30:34] so, I am going to delete those pid files and restart RU jobs
[06:32:30] !log delete old RU pid files with timestamp May 27 19:00 (scap deployment failed to an-launcher due to disk issues) except ./jobs/reportupdater-queries/pingback/.reportupdater.pid that was working fine
[06:32:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:34:18] RECOVERY - Check the last execution of reportupdater-cx on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-cx https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:40:22] !log slowly restarting all RU units on an-launcher1001
[06:40:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:45:28] I noticed that one RU job logs
[06:45:31] pymysql.err.ProgrammingError: (1146, "Table 'testwiki.user_properties' doesn't exist")
[06:45:34] but it doesn't fail
[06:45:37] not sure if expected
[06:45:46] RECOVERY - Check the last execution of reportupdater-browser on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-browser https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:46:24] RECOVERY - Check the last execution of reportupdater-edit-beta-features on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-edit-beta-features https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:46:26] RECOVERY - Check the last execution of reportupdater-ee-beta-features on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-ee-beta-features https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:51:40] RECOVERY - Check the last execution of reportupdater-interlanguage on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-interlanguage https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:52:31] also very interesting that airflow is consuming quite a lot of CPU
[06:53:30] RECOVERY - Check the last execution of reportupdater-language on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-language https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:53:35] also another error
[06:53:36] PermissionError: [Errno 13] Permission denied: '/srv/reportupdater/jobs/reportupdater-queries/structured-data/wikidata_usage_in_wikimedia_projects'
[06:53:47] that leads to no failure
[06:54:02] RECOVERY - Check the last execution of reportupdater-ee on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-ee https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:54:20] RECOVERY - Check the last execution of reportupdater-mt_engines on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-mt_engines https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:55:48] RECOVERY - Check the last execution of reportupdater-wmcs on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-wmcs https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:56:28] RECOVERY - Check the last execution of reportupdater-published_cx2_translations_mysql on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-published_cx2_translations_mysql https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:56:50] RECOVERY - Check the last execution of reportupdater-reference-previews on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-reference-previews https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:57:00] RECOVERY - Check the last execution of reportupdater-flow-beta-features on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-flow-beta-features https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:57:00] RECOVERY - Check the last execution of reportupdater-page-creation on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-page-creation https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:58:50] RECOVERY - Check the last execution of reportupdater-published_cx2_translations on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-published_cx2_translations https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:00:20] RECOVERY - Check the last execution of reportupdater-structured-data on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-structured-data https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:00:46] sending a number of hugs to elukey
[07:01:45] :D
[07:15:06] wow - kids morning today, can't stay long - I bluntly restarted services yesterday evening but didn't furt
[07:15:09] Analytics, Product-Analytics (Kanban): Create Druid tables for Druid datasources in Superset - https://phabricator.wikimedia.org/T251857 (cchen) @elukey I forgot to click "submit" button... There's no charts/query using this. I was checking each table to make sure they are the same as datasource. And ye...
[07:15:25] further - I should have been more careful :( My apologies elukey
[07:16:31] joal: nono it was a sneaky issue, RU left some garbage and it wasn't able to recover, not great :(
[07:16:55] I checked because icinga alarms were still in CRIT, as I do all mornings, nothing special :)
[07:17:48] elukey: when I restarted the units I didn't check any further but was unhappy of the jobs not saying "recovery" - I should have investigated more yesterday
[07:18:06] anyway, thanks again :) back to kids
[07:53:52] https://github.com/apache/incubator-superset/issues/2686 - very interesting
[08:27:33] Analytics, Wikidata, Wikidata-Query-Service: Increase retention for mediawiki.revision-create on the kafka jumbo cluster - https://phabricator.wikimedia.org/T253753 (dcausse) @JAllemandou I think that is an option as well, the thing is that is it is transitional to help to bootstrap a test of the ful...
[09:23:34] joal: o/
[09:24:06] if/when you're around, I'd have few questions about your hdfs rsync tool
[09:31:20] dcausse: early afternoon?
[09:31:44] joal: 2pm or 3pm I have a meeting at 2:30
[09:31:55] 2pm is fine
[09:32:02] err no
[09:32:11] hehe :) 3pm then :)
[09:32:27] joal: ok! thanks! :)
[10:18:02] (PS1) Elukey: Add gunicorn[gevent] dependency. [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/599295 (https://phabricator.wikimedia.org/T253545)
[10:35:24] May 28 10:35:08 an-tool1005 superset[8281]: [2020-05-28 10:35:08 +0000] [2] [INFO] Using worker: gevent
[10:35:27] \o/
[10:49:05] (PS24) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[10:50:24] Analytics, Analytics-Kanban: Update oozie SLAs for pageview-daily-dumps and wikidata-entities jobs - https://phabricator.wikimedia.org/T253847 (JAllemandou)
[10:50:29] Analytics, Analytics-Kanban: Update oozie SLAs for pageview-daily-dumps and wikidata-entities jobs - https://phabricator.wikimedia.org/T253847 (JAllemandou) a:JAllemandou
[10:54:09] Analytics, Patch-For-Review: Test superset running on gunicorn + gevent - https://phabricator.wikimedia.org/T253545 (elukey) ` May 28 10:35:08 an-tool1005 superset[8281]: [2020-05-28 10:35:08 +0000] [2] [INFO] Starting gunicorn 20.0.4 May 28 10:35:08 an-tool1005 superset[8281]: [2020-05-28 10:35:08 +0000...
[10:59:04] (PS1) Joal: Update oozie SLAs [analytics/refinery] - https://gerrit.wikimedia.org/r/599302 (https://phabricator.wikimedia.org/T253847)
[11:00:14] git up
[11:00:16] oops
[11:03:49] Analytics, Analytics-Kanban, Patch-For-Review: Test superset running on gunicorn + gevent - https://phabricator.wikimedia.org/T253545 (elukey) p:Triage→Medium a:elukey
[11:05:27] (PS25) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[11:07:22] (PS26) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[11:18:56] * elukey lunch!
[11:29:50] if you have a second I'd love to ask you a q about oozie datasets
[11:31:48] sorry joal ^
[11:31:59] sure fdans
[11:32:50] joal: I'm on the caveaux :)
[11:32:57] Ah!
[11:33:03] J'ARRIVE :)
[11:41:16] (PS27) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[11:45:26] Analytics, Analytics-Kanban: Fix oozie event dataset file - https://phabricator.wikimedia.org/T253855 (JAllemandou)
[11:45:38] Analytics, Analytics-Kanban: Fix oozie event dataset file - https://phabricator.wikimedia.org/T253855 (JAllemandou) a:JAllemandou
[11:46:08] (PS1) Joal: Fix event dataset padded values [analytics/refinery] - https://gerrit.wikimedia.org/r/599309 (https://phabricator.wikimedia.org/T253855)
[11:48:47] joal: you got some inspiration from our conversation? ^
[11:49:09] fdans: interestingly I was solving a pretty related problem :)
[11:55:23] (PS1) Joal: Add mediawiki_page_restrictions table [analytics/refinery] - https://gerrit.wikimedia.org/r/599310 (https://phabricator.wikimedia.org/T253803)
[13:15:41] Analytics, Analytics-Kanban: Wikistats time selector shifts backwards when selecting custom ranges - https://phabricator.wikimedia.org/T253861 (Milimetric) p:Triage→High
[13:26:58] joal: oops, I was wrong to be confident, https://hue.wikimedia.org/oozie/list_oozie_coordinator/0018287-200507064132789-oozie-oozi-C/ looks for the zero-padded date pieces like 2020/05/01 and the data is in folders like 2020/5/1
[13:28:37] milimetric: I sent a patch :)
[13:28:52] doh
[13:29:05] sorry & thx
[13:29:32] (CR) Milimetric: [V: +2 C: +2] Fix event dataset padded values [analytics/refinery] - https://gerrit.wikimedia.org/r/599309 (https://phabricator.wikimedia.org/T253855) (owner: Joal)
[13:29:36] np milimetric
[13:29:45] I'll deploy and restart the job?
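Editor's note on the 13:26:58 message: the mismatch is between a coordinator that looks for zero-padded date directories and data that was written to unpadded ones. A minimal Python sketch of the two layouts follows; the function names are hypothetical, and this illustrates only the path difference, not the actual oozie dataset definition fixed in https://gerrit.wikimedia.org/r/599309.

```python
from datetime import date


def padded_partition(day: date) -> str:
    """Zero-padded layout, e.g. 2020/05/01 (what the coordinator looked for)."""
    return f"{day.year}/{day.month:02d}/{day.day:02d}"


def unpadded_partition(day: date) -> str:
    """Unpadded layout, e.g. 2020/5/1 (where the data actually was)."""
    return f"{day.year}/{day.month}/{day.day}"


if __name__ == "__main__":
    d = date(2020, 5, 1)
    # The two strings differ, so a lookup by one never finds the other.
    print(padded_partition(d), unpadded_partition(d))
```

The two layouts only coincide when both month and day have two digits, which is why a mismatch like this can go unnoticed for part of a month.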
[13:30:17] milimetric: there is another bunch of patches that should be merged before we do that :)
[13:30:35] ok, I'll take a look at your other ones and merge what I can, let me know when and I can deploy
[13:31:06] (CR) Milimetric: [V: +2 C: +2] Update oozie SLAs [analytics/refinery] - https://gerrit.wikimedia.org/r/599302 (https://phabricator.wikimedia.org/T253847) (owner: Joal)
[13:33:31] (CR) Milimetric: [C: +2] "looks good but didn't test" [analytics/refinery] - https://gerrit.wikimedia.org/r/599310 (https://phabricator.wikimedia.org/T253803) (owner: Joal)
[13:38:53] * elukey afk for a bit! (need to run a quick errand)
[14:03:59] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (Ottomata) Hm, we should settle on a config key for identifying the eventgate instance. The obvious one to u...
[14:10:44] (PS28) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[14:15:57] (PS29) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[14:18:18] milimetric: shall we merge those 3 patches and deploy-hotfix?
[14:18:58] (PS30) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[14:19:18] joal: all for it but which 3? I merged 2
[14:19:38] the one you +2ed :)
[14:19:52] without test
[14:20:03] oh ok, I didn’t want to jump the test this time, is it ok?
[14:20:28] milimetric: I have not tested, and will take responsibility for that :)
[14:21:08] ok, will test then, at least dryrun
[14:21:46] elukey: I also have a merge request for you (the page_restrictions sqoop patch - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/599112/)
[14:22:11] thanks milimetric - I feel ashamed to let you test my patch - I should have stepped up
[14:22:14] joal: is this ready? https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/597541/
[14:22:23] psh, no prob
[14:22:56] milimetric: I don't think it is - We should check with fdans - we talked earlier today and he was still testing
[14:35:55] (CR) Milimetric: [V: +2 C: +2] Add mediawiki_page_restrictions table [analytics/refinery] - https://gerrit.wikimedia.org/r/599310 (https://phabricator.wikimedia.org/T253803) (owner: Joal)
[14:36:12] ok joal, shall I deploy then?
[14:36:22] milimetric: works for me!
[14:36:37] doing
[14:37:01] joal: wait, is this the weekly train or just special case refinery-only for those last 3 changes of yours?
[14:37:14] milimetric: hotfix - I deployed yesterday
[14:37:18] k
[15:00:49] joal: standup!
[15:00:55] oh hello!
[15:19:23] Analytics, Analytics-Kanban, Wikidata, Wikidata-Query-Service: Increase retention for mediawiki.revision-create on the kafka jumbo cluster - https://phabricator.wikimedia.org/T253753 (Milimetric) p:Triage→High a:Ottomata
[15:21:15] Analytics, Event-Platform: Write blog post(s) about MEP - https://phabricator.wikimedia.org/T253649 (Milimetric) p:Triage→High
[15:21:51] Analytics, Better Use Of Data, Event-Platform, Performance-Team (Radar): Convert WikimediaEvents to use ResourceLoader packageFiles - https://phabricator.wikimedia.org/T253634 (Milimetric) p:Triage→High
[15:23:05] Analytics, Analytics-Kanban: reportupdater should run with python3 - https://phabricator.wikimedia.org/T253418 (Milimetric) p:Triage→High a:elukey
[15:24:07] Analytics: analytics.wikimedia.org TLC - https://phabricator.wikimedia.org/T253393 (Milimetric) p:Triage→Medium
[15:24:19] Analytics, Analytics-Kanban: analytics.wikimedia.org TLC - https://phabricator.wikimedia.org/T253393 (Milimetric)
[15:24:57] Analytics, Better Use Of Data, Event-Platform: Document in-schema who sets which fields - https://phabricator.wikimedia.org/T253392 (Milimetric) p:Triage→Medium
[15:26:16] Analytics: Web publication doesn't work - https://phabricator.wikimedia.org/T253661 (Ottomata) The /srv/published sync thing is complicated in that it is multi-source, and if there are files with the same path names, which one gets synced is not defined (but I'd guess that the latest alphabetically sorted ho...
[15:26:36] Analytics: Grant not able to access superset - https://phabricator.wikimedia.org/T253281 (Milimetric) p:Triage→High
[15:27:07] Analytics: Make it easy to debug eventlogging instrumentation, add ability to send client canary events. - https://phabricator.wikimedia.org/T253239 (Milimetric) p:Triage→Medium
[15:28:32] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform: eventgate-wikimedia should expose runtime stream configuration - https://phabricator.wikimedia.org/T253157 (Milimetric) p:Triage→High
[15:30:22] Analytics, Event-Platform: DRY kafka broker declaration into helmfiles from puppet - https://phabricator.wikimedia.org/T253058 (Milimetric) p:Triage→Low
[15:31:22] Analytics, Analytics-Kanban, Operations, Patch-For-Review: Create a profile to standardize the deployment of JVM packages and configurations - https://phabricator.wikimedia.org/T253553 (Milimetric) p:Medium→High
[15:31:58] elukey: have you bumped turnilo?
[15:32:15] joal: nope
[15:32:50] anything on fire?
[15:33:53] nope elukey, we're triaging :)
[15:35:24] Analytics, Analytics-Kanban, Better Use Of Data, Event-Platform, and 3 others: Set up an instance of EventStreams in beta that will allow for consuming any stream - https://phabricator.wikimedia.org/T253069 (Milimetric) ping @Milimetric (yes that's myself) to write the GUI
[15:36:03] Analytics, Operations, Traffic: Remove ganglia leftovers from ops/puppet - https://phabricator.wikimedia.org/T253555 (Milimetric) p:Low→Medium a:Ottomata
[15:46:44] Hi!
[15:47:18] I have a job that reaches 99% reduce, and then fails the reduce tasks because of java.net.SocketTimeoutException: Read timed out
[15:47:46] https://yarn.wikimedia.org/jobhistory/logs/an-worker1094.eqiad.wmnet:8041/container_e11_1589903254658_34422_01_000192/attempt_1589903254658_34422_r_000015_0/dedcode/syslog.shuffle/?start=0
[15:58:53] hi djellel - we were in meeting
[16:00:13] joal: hello! no worries :)
[16:03:52] djellel: some suggestions: disable speculation, and put slow-start to 0
[16:04:38] djellel: slow-start parameter: mapreduce.job.reduce.slowstart.completedmaps should be set to 0.0 (default to 0.05)
[16:05:06] djellel: and speculative maps: mapreduce.map.speculative should be set to false
[16:05:34] You can pass those parameters to your job using -D settings (before the jar)
[16:05:38] djellel: --^
[16:14:19] Analytics, Product-Analytics, MW-1.35-notes (1.35.0-wmf.35; 2020-06-02), Patch-For-Review: [Spike] Should EventLogging support DNT? - https://phabricator.wikimedia.org/T252438 (kaldari) Open→Resolved a:kaldari Marking this as resolved (more or less). The conclusion is that EventLoggin...
[16:14:45] djellel: does it make sense what I'm suggesting?
[16:17:07] Analytics: Web publication doesn't work - https://phabricator.wikimedia.org/T253661 (SNowick_WMF) Not sure why I had a copy there but I have deleted the weekly_edits directory on stat1007.
[16:21:43] Analytics: Web publication doesn't work - https://phabricator.wikimedia.org/T253661 (jwang) Thank you @Ottomata, @SNowick_WMF How often do we auto sync now?
[16:28:01] joal: Yes! I will do that. thank you :)
[16:28:29] cool djellel :)
[16:41:28] gone for dinner - back after
[16:44:28] errand for a bit!
[17:13:07] ok mforns can help when you are ready
[17:13:18] hey! :] ok
[17:13:21] wanna bc?
[17:15:29] ya!
[17:15:33] k
[17:26:38] * elukey off!
[17:26:39] o/
[17:43:02] Analytics: Web publication doesn't work - https://phabricator.wikimedia.org/T253661 (mpopov) Perhaps related but @SNowick_WMF and I ran into a perplexing permissions issue. When I tried to run `published-sync` on stat1007 I got the following: ` /usr/bin/flock -n /var/lock/published-sync -c /usr/bin/rsync -r...
[18:31:09] !log after deployment, restarted four oozie jobs with new SLAs and fixed datasets definitions
[18:31:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:40:48] mforns: lostya
[18:41:17] milimetric: interesting failure for wikidata-item-page-link
[18:42:52] joal: oops, didn't start it with the right refinery
[18:43:04] arg, true of all the restarts, will redo.
[18:44:14] interesting milimetric - config says hdfs://analytics-hadoop/wmf/refinery/current/oozie as oozie folder
[18:44:24] should be correct no?
[18:44:58] no, it needs to be a specific one, got with the shell list/sort descending thing
[18:45:24] I always forget something with oozie, too much manual involvement needed
[18:45:27] true milimetric, but still, current should be the last update
[18:45:46] yeah, it's weird that current doesn't have the 0.125 jar, that was a couple versions ago
[18:46:02] nah 125 is the last created
[18:46:28] mforns: back in a bit, want to try to repro using https://arrow.apache.org/docs/python/generated/pyarrow.fs.HadoopFileSystem.html#pyarrow.fs.HadoopFileSystem instead
[18:46:58] apparently https://arrow.apache.org/docs/python/filesystems_deprecated.html#hadoop-file-system-hdfs is deprecated
[18:47:02] i think thats the one you are using
[18:47:47] hey ottomata sorry internet hiccup
[18:48:11] milimetric: I think version 0.0.125 of the jars has not been added to refinery
[18:48:21] milimetric: jars are not present and I can't find the commit
[18:48:27] oh...
[18:48:32] Analytics: Decomission notebook hosts - https://phabricator.wikimedia.org/T249752 (cchen) I've cleaned out my notebooks as well. Thank you!
[18:48:38] I did that deploy... pretty sure I ran the jenkins job...
[18:48:45] but no commit... no jars
[18:50:08] joal: yep, here it is: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/15/console
[18:50:14] no error, and yet... no jars
[18:50:18] weird...
[18:50:36] AH milimetric! Have you merged the PR?
[18:51:07] milimetric: https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/597534/
[18:51:10] what/!
[18:51:19] oh!!!! that's new... in the old process we didn't have that
[18:51:21] I had no idea
[18:51:28] milimetric: new job pushes a PR - indeed that's new
[18:51:43] Mwarf - We should have communicated better about that :(
[18:51:50] oh yeah, I had zero idea this was the case. That definitely needs to be in super bold in the docs - will update :)
[18:51:53] We updated docs but didn't mention in standup - definitely should
[18:52:08] sorry for the mess milimetric :(
[18:52:17] np at all, tiny mess, learned something, thx
[18:53:36] (CR) Milimetric: [V: +2 C: +2] "Didn't realize we had to do this as part of deploy now. Have updated docs to highlight that this is the case." [analytics/refinery] - https://gerrit.wikimedia.org/r/597534 (owner: Maven-release-user)
[18:53:58] will wait the 5-10 minutes now, then deploy again, then restart again
[18:54:03] I've killed all the jobs in the meantime
[18:54:09] ack milimetric
[19:01:03] milimetric, ottomata - We should keep an eye on this - https://iceberg.apache.org/ - It's not yet at the level we want, but version 2 (see specs https://iceberg.apache.org/spec/) might actually be what we're after, the apache way (instead of hudi/delta)
[19:01:56] woa "Schema evolution supports add, drop, update, or rename"
[19:17:43] btw back fdans
[19:17:44] oops
[19:17:46] wrong ping sorry!
[19:17:47] meant mforns
[19:18:00] not sure how to properly instantiate fs.HadoopFileSystem but am trying
[19:18:07] hey ottomata , same here
[19:18:14] can't find out what to put as namenode
[19:18:41] I tried hdfs://an-master1001.eqiad.wmnet:8020/?user=analytics, but I get an error
[19:19:06] well I tried many other things, and same
[19:22:50] mforns: got it
[19:22:54] export CLASSPATH=$(hadoop classpath --glob)
[19:23:01] from pyarrow import fs
[19:23:05] hdfs = fs.HadoopFileSystem("hdfs://analytics-hadoop/")
[19:24:33] ottomata: ok!!
[19:26:05] ottomata: can I overwrite my code in an-launcher or are you modifying it?
[19:26:29] nope go for it
[19:26:32] haven't touched it
[19:26:37] exited screen
[19:26:39] k!
[19:33:34] ooo
[19:33:41] joal: looking
[19:39:00] wow joal the partition transform on a timestamp with year,month,day,hour etc is really cool!
[19:39:09] It is!!!
[19:39:36] ottomata: row-deletion feature in v2 is the missing feature as of now IMO
[19:39:47] i mean, even without row deletion this is nice
[19:39:55] we don't really need row deletion that much, do we?
[19:40:36] row deletion = row update IIUC - And we need that for incremental updates
[19:41:45] WHY DOES EVERYONE INVENT THEIR OWN VERSION OF JSON SCHEMA
[19:41:47] WHYYYY????
[19:41:57] looking at https://iceberg.apache.org/spec/#appendix-c-json-serialization
[19:42:07] djellel: I still confirm slowstart should be closer to 1.0 as your mappers take long :)
[19:42:19] ottomata: thanks for your help! I started to modify the code of refine_plugin, but I found a small bug (not related), so will continue tomorrow, tired now. I hope the newer client works with parallel :]
[19:42:24] ok
[19:42:26] i hope so too!
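Editor's note on the 19:22:54 to 19:23:05 messages: the recipe that worked could be wrapped roughly as below. This is a sketch under assumptions, not the code that ran on an-launcher: `connect_hdfs` is a hypothetical helper name, the `hadoop` CLI is assumed to be on PATH, and setting CLASSPATH from inside Python is assumed to be equivalent to the shell `export` as long as it happens before pyarrow first loads libhdfs.

```python
import os
import subprocess


def connect_hdfs(uri="hdfs://analytics-hadoop/"):
    """Return a pyarrow HadoopFileSystem for `uri` (hypothetical helper)."""
    # Same effect as `export CLASSPATH=$(hadoop classpath --glob)` in the shell.
    classpath = subprocess.check_output(
        ["hadoop", "classpath", "--glob"], text=True
    ).strip()
    os.environ["CLASSPATH"] = classpath

    # Imported lazily so the module loads on hosts without pyarrow installed.
    from pyarrow import fs

    # Mirrors the call quoted in the log; the URI names the HA nameservice
    # rather than a single namenode host and port.
    return fs.HadoopFileSystem(uri)
```

Calling `connect_hdfs()` only makes sense on a host with a Hadoop client configuration that defines the nameservice, so the function is defined here but not invoked.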
[19:42:29] djellel: mappers are long, if reducers start early, they spend a long time waiting
[19:45:41] joal: I set slow-start to 0
[19:46:34] djellel: it got set to 0.8, cause 0.0 means start all reducers to 0 - This makes me realize that I might have given you the wrong value the last time - Did I say 0?
[19:47:26] yes
[19:48:08] Meh - Completely wrong I was djellel - 1.0 it should be, for reducers to start when all mappers are done - sorry for the mistake
[19:48:35] at least now the job makes progress djellel
[19:48:40] let's see if it finishes
[20:09:30] This is a bit of a long anecdote, but: I was looking at stats.wikimedia.org today and wanted to "split by access method" on the User Edits graph. This isn't possible, so I did my own query to find the split for the population I was interested in (people encountering the edit conflict interface)--but now I'm soul-searching about the assumptions behind what I did. And that's why I'm here,
[20:09:37] spamming your channel :-D
[20:10:07] Maybe product decisions about editing features should be made according to *overall* readership demographics, rather than an arbitrary and historically accidental snapshot of today's editor demographics?
[20:12:59] For example, mobile editorship is lower in absolute numbers than desktop, but it would be wrong if that fact led us to prioritize a feature for desktop, breaking parity for mobile. I'm sure there's a FAQ or an essay already?
[20:16:13] Analytics: Web publication doesn't work - https://phabricator.wikimedia.org/T253661 (Ottomata) > How often do we auto sync now? Every 15 minutes. Your files have been synced, but I think that varnish must just be caching the old ones. Let's check tomorrow. > a perplexing permissions issue Hm! Weird, the...
[20:21:19] joal: fyi, this is to compute the diff (old discussion we had). This job is processing one (!) part of the enwiki history. I have to compute for nearly all wikis o_O .. I need an efficient way to scale.
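Editor's note on the slow-start thread: the corrected advice at 19:48:08 is to set mapreduce.job.reduce.slowstart.completedmaps to 1.0 (reducers start only once all mappers are done) and, from 16:05:06, mapreduce.map.speculative to false. A minimal Python sketch that builds the matching `-D` options follows; `slowstart_flags` is a hypothetical helper, and where the flags go on the command line depends on how the job is launched, which the log does not show.

```python
def slowstart_flags(slowstart=1.0, speculative_maps=False):
    """Build -D options for the two settings discussed (hypothetical helper)."""
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart is a fraction of completed maps, in [0, 1]")
    properties = {
        # Fraction of map tasks that must finish before reducers are scheduled.
        "mapreduce.job.reduce.slowstart.completedmaps": str(slowstart),
        # Speculative execution of map tasks, disabled as suggested in the log.
        "mapreduce.map.speculative": "true" if speculative_maps else "false",
    }
    flags = []
    for key, value in properties.items():
        flags += ["-D", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    print(" ".join(slowstart_flags()))
```

With the defaults this yields the 1.0 value from the correction, not the 0.0 first suggested at 16:04:38.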
[20:44:05] Analytics, Cloud-Services, Developer-Advocacy (Jan-Mar 2020), Goal, Patch-For-Review: Create a WMCS edits dashboard via Dashiki - https://phabricator.wikimedia.org/T226663 (bd808)
[20:55:24] Analytics, Product-Analytics, Core Platform Team Workboards (Clinic Duty Team): Update mediawiki_user_blocks_change to log partial block parameters - https://phabricator.wikimedia.org/T252455 (Pchelolo) a:Pchelolo
[20:58:21] Analytics, Analytics-Kanban, Research, Patch-For-Review: Proposed adjustment to wmf.wikidata_item_page_link to better handle page moves - https://phabricator.wikimedia.org/T249773 (Milimetric) Ok, dumps took longer than expected but it's done now. There's a little snag: new schema / spark produc...