[05:19:05] Analytics-Kanban, EventBus, Patch-For-Review, Services (watching), and 3 others: Empty body in EventBus request - https://phabricator.wikimedia.org/T148251#2773531 (mobrovac) This is a good avenue to pursue, but we need to figure out what to do if/when the field in question is a required one, as...
[08:45:12] !log started 0019686-161020124223818-oozie-oozi-C to re-run wf-text-2016-11-4-19 -> wf-upload-2016-11-4-20
[08:45:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:47:02] !log started 0019693-161020124223818-oozie-oozi-C to re-run wf-text-2016-11-5-00 -> wf-upload-2016-11-5-07
[08:47:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:42:58] (PS29) Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[09:44:44] (PS30) Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[10:42:01] mmmm joal wf-text-2016-11-5-00 -> wf-upload-2016-11-5-07 ?
[10:42:12] all text right? These are new :(
[10:49:24] ok so text-ulsfo and text-codfw have been migrated to varnish 4
[10:49:35] and ulsfo now points to codfw rather than eqiad
[11:05:28] elukey: Correct, it was text partitions, wrong copy-paste
[11:08:49] joal: o/
[11:09:07] I am checking varnishlog output on ulsfo
[11:09:20] and it seems the same problem that was fixed yesterday with upload
[11:09:22] sigh
[11:09:40] :(
[11:16:24] elukey: I let you handle the varnish side (it probably can wait monday)
[11:16:33] elukey: I'll take care of this weekend jobs rerun
[11:16:52] elukey: easier if only one of us do it, no need for synchro etc
[11:17:05] sure!
[11:21:11] joal: do you think that I could run now an hive query in root.priority?
[11:22:26] the bright side is that upload seems quiet finally :)
[11:23:00] elukey: You can even use production for those things
[11:23:16] true, upload is quiet since yesterday ! Yay !
[11:26:02] joal: how can I set the queue? I know that you already told me :(
[11:26:12] no bother elukey :)
[11:26:25] job has already started or not?
[11:26:54] yes it is a hive query, I think it is in accepted atm
[11:27:06] application_1476969128131_44799
[11:27:12] ok, then it's on stat100[24] machine
[11:27:44] Done :)
[11:27:52] elukey: yarn application --movetoqueue application_1476969128131_44799 --queue production
[11:28:33] elukey: I'm sorry, edit-history job takes a lot of resources :(
[11:30:48] niceeee
[11:30:50] thanks :)
[11:30:57] * elukey takes notes
[11:31:21] ;)
[11:31:48] elukey: it makes me feel useful ;)
[11:33:24] don't say silly things :D
[11:35:11] so I suspect that the issue is in ulsfo again like yesterday
[11:35:54] I found a request that returns "Expires: Thu, 01 Jan 1970 00:00:00 GMT" :P
[11:38:24] wow
[11:38:40] That request has started a loooooooong time ago :)
[12:19:44] ok so the numbers are really low compared to the overall traffic
[12:20:01] I am re-running some queries but I'd say that we are seeing only corner cases in text
[12:20:10] creating the usual holes
[12:30:20] /w/index.php?title=Template%3AAdvancedSiteNotices%2Fajax&action=render for id.w.o seems the most problematic one, affecting ~60k req/host maximum
[12:30:35] (for a single hour)
[12:30:50] all right going afk!
[12:44:50] Thanks elukey for double checking that !
[18:05:25] !log started 0020254-161020124223818-oozie-oozi-C to re-run wf-text-2016-11-5-10
[18:05:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:53:18] Analytics, Discovery, Discovery-Analysis, RfC: RFC: Requirements for analytics stats processor - https://phabricator.wikimedia.org/T150028#2774047 (Yurik) @Nuria thanks, for web request, what are the monitoring tools you use to notify of failed jobs? Also, what mechanisms do you have for backfill...