[00:03:29] PROBLEM - Check the last execution of analytics-dumps-fetch-pageview on labstore1006 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-pageview https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:31:34] Analytics, Event-Platform, MediaWiki-Core-Testing, Patch-For-Review: Integration tests die with "HTTP requests to blocked. Use MockHttpTrait." when EventBus extension is installed - https://phabricator.wikimedia.org/T270801 (Tgr) >>! In T270801#6711308, @Krinkle wrote: > Does this fail in C...
[04:38:23] Analytics, Event-Platform, MediaWiki-Core-Testing, Patch-For-Review: Integration tests die with "HTTP requests to blocked. Use MockHttpTrait." when EventBus extension is installed - https://phabricator.wikimedia.org/T270801 (Tgr) The EventLogging extension also has the same problem, althoug...
[04:42:20] Analytics, Event-Platform, MediaWiki-Core-Testing, Patch-For-Review: Integration tests die with "HTTP requests to blocked. Use MockHttpTrait." when EventBus extension is installed - https://phabricator.wikimedia.org/T270801 (Tgr) Looking at the EventLogging Vagrant config, it doesn't look l...
[08:18:44] Analytics: dumps::web::fetches::stats job should use a user to pull from HDFS that exists in Hadoop cluster - https://phabricator.wikimedia.org/T271362 (elukey)
[08:21:51] !log execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/geoeditors" to unblock labstore hdfs rsync
[08:21:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:22:31] RECOVERY - Check the last execution of analytics-dumps-fetch-geoeditors_dumps on labstore1006 is OK: OK: Status of the systemd unit analytics-dumps-fetch-geoeditors_dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:24:06] !log execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/pageview" to unblock labstore hdfs rsyncs
[08:24:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:25:39] RECOVERY - Check the last execution of analytics-dumps-fetch-pageview_complete_dumps on labstore1006 is OK: OK: Status of the systemd unit analytics-dumps-fetch-pageview_complete_dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:30:00] !log execute hdfs chmod o+x of /wmf/data/archive/projectview /wmf/data/archive/projectview/legacy /wmf/data/archive/pageview/legacy to unblock hdfs rsyncs
[08:30:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:31:37] !log restart the failed hdfs rsync timers on labstore100[6,7] to kick off the remaining jobs
[08:31:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:34:25] RECOVERY - Check the last execution of analytics-dumps-fetch-pageview on labstore1006 is OK: OK: Status of the systemd unit analytics-dumps-fetch-pageview https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:35:15] RECOVERY - Check the last execution of analytics-dumps-fetch-pageview on labstore1007 is OK: OK: Status of the systemd unit analytics-dumps-fetch-pageview https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:37:21] RECOVERY - Check the last execution of analytics-dumps-fetch-pageview_complete_dumps on labstore1007 is OK: OK: Status of the systemd unit analytics-dumps-fetch-pageview_complete_dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:37:23] the last timer seem to be pulling data without any errors
[08:37:35] the remaining bits are things like README files etc..
[08:38:18] for example /wmf/data/archive/pageview/legacy/README
[08:38:27] does it need to be readable from dumpsgen?
[08:38:59] ah even in mediacount
[08:39:53] RECOVERY - Check the last execution of analytics-dumps-fetch-geoeditors_dumps on labstore1007 is OK: OK: Status of the systemd unit analytics-dumps-fetch-geoeditors_dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:47:14] anyway looks good so far, we'll see later
[09:57:47] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (ArielGlenn) Or it might be because fetches from HDFS are currently broken;...
[10:09:27] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (elukey) @Xaosflux thanks a lot for reporting, the issue should be fixed no...
[10:11:36] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (Xaosflux) Open→Resolved a:Xaosflux @elukey yes, confirmed reso...
[10:12:21] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (Xaosflux) (Whomever actually solved this feel free to claim and note solut...
[10:52:49] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (Xaosflux) Resolved→Open
[10:54:03] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): 403 forbidden error on all newly created dump files - https://phabricator.wikimedia.org/T271616 (Xaosflux)
[10:54:46] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): 403 forbidden error on all newly created dump files - https://phabricator.wikimedia.org/T271616 (Xaosflux) a:Xaosflux→None
[10:55:16] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): 403 forbidden error on all newly created dump files - https://phabricator.wikimedia.org/T271616 (Xaosflux)
[10:57:02] ah snap
[10:58:47] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (Xaosflux) a:Xaosflux→None
[10:58:52] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (Xaosflux) reopened as verification can't proceed until T271616 is resolved
[10:59:35] so very interesting
[10:59:45] files are correctly owned on hdfs
[10:59:49] (I am checking mediacounts)
[11:00:01] but not on /srv on labstores, after hdfs-rsync runs
[11:00:43] they are missing o+r
[11:07:18] joal: bonjour :) if you are around, do you know if there are special cases for hdfs-rsync when assigning permissions?
[11:07:44] concrete example
[11:07:49] on labstore1006
[11:07:52] -rw-r----- 1 dumpsgen dumpsgen 703211144 Jan 9 00:32 mediacounts.2021-01-08.v00.tsv.bz2
[11:08:33] on hdfs
[11:08:38] -rwxrwxr-x 1 99 hdfs 703211144 Jan 9 00:32 mediacounts.2021-01-08.v00.tsv.bz2
[11:11:42] (so when trying to download a mediacount file users get 403 due to missing other perms on labstore)
[11:12:54] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): 403 forbidden error on all newly created dump files - https://phabricator.wikimedia.org/T271616 (elukey) Thanks a lot for reporting, there is a remaining issue with the permissions of files on labs...
[11:14:36] Analytics: dumps::web::fetches::stats job should use a user to pull from HDFS that exists in Hadoop cluster - https://phabricator.wikimedia.org/T271362 (elukey) There is still a problem with permissions on labstore nodes, so users get a HTTP 403 when trying to download files, as outlined in T271616. We are t...
[11:42:35] Analytics: Fix the remaining bugs open on for Hue next - https://phabricator.wikimedia.org/T264896 (nshahquinn-wmf) I found something that doesn't seem to be covered by one of the upstream issues you've opened. When I try to view the list of tables in my Hive database, I get the error: "java.lang.RuntimeExce...
[11:54:46] Analytics: Fix the remaining bugs open on for Hue next - https://phabricator.wikimedia.org/T264896 (elukey) >>! In T264896#6733556, @nshahquinn-wmf wrote: > I found something that doesn't seem to be covered by one of the upstream issues you've opened. When I try to view the list of tables in my Hive database...
[13:09:48] Analytics: Fix the remaining bugs open on for Hue next - https://phabricator.wikimedia.org/T264896 (nshahquinn-wmf) >>! In T264896#6733558, @elukey wrote: > Hello Neil, can you try the steps outlined in https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue to see if it works? This may be a problem with t...
[13:44:35] (PS2) Neil P. Quinn-WMF: Set up and document deployment strategy for jobs [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/651794 (https://phabricator.wikimedia.org/T261953)
[14:39:45] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): 403 forbidden error on all newly created dump files - https://phabricator.wikimedia.org/T271616 (Xaosflux) p:Triage→Medium
[14:40:20] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (Xaosflux) p:Triage→Medium
[14:45:12] hi elukey
[14:46:05] elukey: I think it could be that hdfs-rsync applies hdfs default umask even to regular fs
[14:46:32] elukey: hdfs-rsync has a mechanism to change perms when copying
[14:46:38] we can use that maybe?
[15:04:41] joal: bonjour! Do you mean the --chmod thing? Maybe o+r
[15:05:08] I see that we already add o-w, we could add --chmod o+r and that's it
[15:05:42] elukey: I sent a patch enforcing 755 for dirs and 644 for files - Is that good?
[15:05:49] (patch arriveS)
[15:07:12] joal: ah nice yes I think it could work, didn't know about the option
[15:07:15] shall I merge?
[15:07:35] elukey: please! I tested the syntax, so I'm kinda sure it won't break - Let's see if it works
[15:09:00] joal: running puppet and restarting one timer to see how it goes
[15:09:06] ack elukey
[15:10:14] -rw-r--r-- 1 dumpsgen dumpsgen 703211144 Jan 9 00:32 mediacounts.2021-01-08.v00.tsv.bz2
[15:10:17] \o/
[15:10:42] * joal is very happy to have gone almost full-feature for hdfs-rsync :)
[15:11:43] !log restart timers 'analytics-*' on labstore100[6,7] to apply new permission settings
[15:11:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:17:01] for some reason I can't restart all the analytics timers in one go
[15:18:53] joal: I see from systemctl list-timers that in ~ a couple of hours max we should be very good
[15:19:05] I'll check later, but the mediacounts use case is perfect
[15:19:06] thanks!
[15:19:11] we can go back to our weekend :)
[15:22:47] (CR) Neil P. Quinn-WMF: "Thank you very much for the review, Joseph!" (4 comments) [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/651794 (https://phabricator.wikimedia.org/T261953) (owner: Neil P. Quinn-WMF)
[15:33:18] \o/
[15:33:45] Thanks a lot elukey for triple checking and applying :)
[16:11:18] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): 403 forbidden error on all newly created dump files - https://phabricator.wikimedia.org/T271616 (elukey) We have just applied a fix for this issue, during the next hours every 403 should disappear!
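(Editor's note: the permission fix discussed above, 755 for directories and 644 for files, can be illustrated with a minimal local sketch. This is not hdfs-rsync's implementation and not the merged patch; the helper names `world_readable`, `normalize_tree`, and `demo`, and the example file name, are hypothetical. It only shows why a `-rw-r-----` file is unreadable by "other" users, such as a web server process, and what the mode looks like after normalisation.)

```python
import os
import stat
import tempfile

FILE_MODE = 0o644  # rw-r--r--: owner writes, everyone reads
DIR_MODE = 0o755   # rwxr-xr-x: everyone can list and traverse


def world_readable(mode):
    """Return True if the 'other' read bit is set in a permission mode."""
    return bool(mode & stat.S_IROTH)


def normalize_tree(root):
    """Hypothetical helper: set 755 on directories and 644 on files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        os.chmod(dirpath, DIR_MODE)
        for name in filenames:
            os.chmod(os.path.join(dirpath, name), FILE_MODE)


def demo():
    """Create a file with mode 640, normalise the tree, return both mode strings."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "mediacounts.example.tsv.bz2")  # made-up name
        with open(path, "wb") as fh:
            fh.write(b"")
        os.chmod(path, 0o640)  # reproduce the mode seen on labstore1006
        before = stat.filemode(os.stat(path).st_mode)
        normalize_tree(root)
        after = stat.filemode(os.stat(path).st_mode)
        return before, after
```

(On a POSIX system `demo()` is expected to report the same before/after mode strings that appear in the two `ls` outputs pasted in the channel.)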
[17:40:12] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (Xaosflux)
[17:40:49] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): 403 forbidden error on all newly created dump files - https://phabricator.wikimedia.org/T271616 (Xaosflux) Open→Resolved a:elukey Thank you @elukey - files seem to be accessible now!
[17:42:04] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (Xaosflux)
[17:42:29] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (Xaosflux) a:Xaosflux
[17:43:00] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): 403 forbidden error on all newly created dump files - https://phabricator.wikimedia.org/T271616 (Xaosflux)
[17:43:03] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (Xaosflux)
[17:46:11] Analytics, Analytics-Kanban, Dumps-Generation, Wikimedia-Portals, cloud-services-team (Kanban): 403 forbidden error on all newly created dump files - https://phabricator.wikimedia.org/T271616 (elukey) @Xaosflux thank you for the feedback and the reports, really appreciated!