[18:32:18] hello, anyone know what's up with https://phabricator.wikimedia.org/T150524 ?
[18:32:36] that phab is referring to the dump files, but this also seems to be affecting the API
[18:33:00] I got a few complaints about data missing since 9 November, when typically data for the 10th/11th would have shown up by now
[18:35:12] Analytics, Datasets-General-or-Unknown: pageviews files missing since yesterday 10th November - https://phabricator.wikimedia.org/T150524#2788564 (MusikAnimal) The issue also seems to be affected the API ([[ https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Cat/...
[19:18:11] musikanimal: I can see data from 10/11 now in dumps, and the example that you provided in the task seems to be returning data
[19:18:15] am I missing something?
[19:19:00] maybe it's caching, I only have data up to 11/9, and others are reporting the same
[19:19:14] I am very curious how that caching works, not the first time this has come up
[19:19:36] where some users see data and others do not, from the API directly
[19:20:27] ah you are right sorry
[19:20:42] (I was indeed missing something :)
[19:22:13] ah snap the cassandra jobs are waiting for data
[19:22:15] this is why
[19:23:40] !log Launch 0028421-161020124223818-oozie-oozi-B to cover for webrequest-load hours 19-20 missing on 2016-11-10
[19:23:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:23:42] so basically when data is refined (IIRC, I am more focused on ops stuff :) we set a _SUCCESS file in each hour dir
[19:23:47] hola joal !
[19:23:49] he is the man
[19:23:52] Ha !
[19:23:57] Didn't know you were here
[19:23:59] Just the mess
[19:24:08] I saw the cassandra load jobs waiting
[19:24:14] There are 2 hours missing for last Thursday
[19:24:23] gneee
[19:24:36] like this one?
[19:24:49] sorry, two days ago?
[19:24:58] because IIRC we deployed the refinery
[19:25:04] When nuria restarted the job, 2 hours were lost in translation
[19:25:16] all right makes sense
[19:25:39] and these are reported in the email
[19:25:59] If you backlog chan, she had trouble with oozie, and discussing with ottomata they restarted a job starting at 21:00, while hours 19:00 and 20:00 were not done
[19:26:08] correct elukey
[19:26:19] Should be fixed in 2 hours hopefully
[19:26:27] 2016-11-10T19 and 2016-11-10T20
[19:26:29] snap
[19:26:50] I am going to update the task
[19:26:53] thanks joal !
[19:26:55] Classical oozie mistake: you kill the job without having checked which of the materialized jobs had finished
[19:27:15] (I am waiting for my flight to Seville, two hours of delay :/)
[19:27:23] Mwarf elukey :()
[19:27:27] Best of luck with that
[19:27:47] I'll arrive tomorrow, around 20:00 (depending on delay and so on)
[19:27:58] If you agree elukey, let's have dinner tomorrow together :)
[19:29:55] Analytics, Datasets-General-or-Unknown: pageviews files missing since yesterday 10th November - https://phabricator.wikimedia.org/T150524#2788564 (elukey) Sorry for the inconvenience! We deployed the new Analytics refinery on Thursday and one of its task was to kill and restart the Oozie jobs responsib...
[19:30:09] joal: +2
[19:30:23] Yay elukey :)
[19:30:45] musikanimal: I commented on the task, really really sorry for the inconvenience. Joal just fixed it, we should get data back during the next hours
[19:32:50] also answered the email on analytics@
[19:34:09] Thanks a lot elukey for having managed communication
[19:34:23] elukey: I sent an email to alert to explain the thing.
[19:36:23] I missed it yesterday :(
[19:36:28] what are the materialized jobs?
[19:37:49] elukey: oozie materializes a job in waiting state at a given point in time
[19:38:10] You can look at any webrequest load coordinator, there are jobs in orange, waiting
[19:38:17] Those are materialized but not started jobs
[19:38:20] thanks!
[19:38:33] np musikanimal, sorry for the problem
[19:38:39] all good
[19:40:33] joal: ahh ok! The jobs waiting! So basically those were not restarted as well
[19:41:02] elukey: you need to set your start time at the point where the jobs were waiting (or were running, if you kill)
[19:43:03] Analytics, Datasets-General-or-Unknown: pageviews files missing since yesterday 10th November - https://phabricator.wikimedia.org/T150524#2790334 (elukey) p:Triage>Normal
[19:45:41] elukey: online checkin, done, boarding passes ready: getting there !
[19:46:31] :)
[19:51:51] good luck for your flight elukey
[19:53:02] thankssss