[08:03:11] Labs-Team, Analytics-Engineering: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc. - https://phabricator.wikimedia.org/T76075#823759 (Eloquence)
[20:11:57] Analytics-Refinery: Make webrequest partition validation handle races between time and sequence numbers - https://phabricator.wikimedia.org/T71615#823935 (QChris) Happened again for: * 2014-12-07T15/2H (on bits)
[20:20:54] !log Marked raw bits webrequest partition for 2014-12-07T16/1H ok (See {{PhabT|71615#823935}})
[21:23:10] Analytics-Refinery: Make webrequest partition validation handle races between time and sequence numbers - https://phabricator.wikimedia.org/T71615#823991 (QChris) Happened again for: * 2014-12-06T19/2H (on upload)
[21:24:36] !log Marked raw bits webrequest partition for 2014-12-06T19/2H ok (See {{PhabT|71615#823991}})
[22:04:02] (PS1) Merlijn van Deen: Add meta info for run, rev and query [analytics/quarry/web] - https://gerrit.wikimedia.org/r/178128
[22:04:05] (PS1) Merlijn van Deen: Add metadata to json output [analytics/quarry/web] - https://gerrit.wikimedia.org/r/178129
[22:04:14] (CR) jenkins-bot: [V: -1] Add meta info for run, rev and query [analytics/quarry/web] - https://gerrit.wikimedia.org/r/178128 (owner: Merlijn van Deen)
[22:04:16] (CR) jenkins-bot: [V: -1] Add metadata to json output [analytics/quarry/web] - https://gerrit.wikimedia.org/r/178129 (owner: Merlijn van Deen)
[22:05:39] (CR) Yuvipanda: [C: -1] Add meta info for run, rev and query (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/178128 (owner: Merlijn van Deen)
[22:54:40] Analytics-Refinery: Raw webrequest partitions that were not marked successful due to single message being missing - https://phabricator.wikimedia.org/T76977 (QChris) NEW p:Triage
[22:56:08] Analytics-Refinery: Raw webrequest partitions that were not marked successful - https://phabricator.wikimedia.org/T72085#824047 (QChris)
[22:57:06] Analytics-Refinery: cp3009.esams.wikimedia.org lost a kafka message on 2014-12-06T12:53:43 - https://phabricator.wikimedia.org/T76978#824048 (QChris)
[22:58:38] Analytics-Refinery: Raw webrequest partitions that were not marked successful due to single message being missing - https://phabricator.wikimedia.org/T76977#824055 (QChris) Losing a single message is not much of an issue. But it's happening more and more, so it might hide some other unnoticed issue. Hence, we...
[22:58:58] Analytics-Refinery: Raw webrequest partitions that were not marked successful due to single message being missing - https://phabricator.wikimedia.org/T76977#824056 (QChris)
[22:59:20] Analytics-Refinery: Raw webrequest partitions that were not marked successful - https://phabricator.wikimedia.org/T72085#824057 (QChris)
[22:59:50] Analytics-Refinery: Raw webrequest partitions that were not marked successful due to single message being missing for unknown reason - https://phabricator.wikimedia.org/T76977#824058 (QChris)
[23:01:27] !log Marked raw upload webrequest partition for 2014-12-06T12/1H ok (See {{PhabT|76978}})
[23:27:23] Analytics-Refinery: cp4006.ulsfo.wmnet lost a kafka message on 2014-12-01T01:47:50 - https://phabricator.wikimedia.org/T76980 (QChris) NEW p:Triage
[23:28:14] !log Marked raw upload webrequest partition for 2014-12-01T01/1H ok (See {{PhabT|76980}})
[23:31:53] qchris: mind if I ask what exactly ^ means?
[23:32:41] YuviPanda: We have monitoring on the data that we ingest in the Analytics cluster.
[23:32:52] manually?
[23:32:56] oh
[23:32:57] I see
[23:33:03] so sometimes you have to check them manually?
[23:33:25] If some requests are missing (sequence numbers, ...), the partition does not get marked successful, and Oozie does not do its magic on the partitions.
[23:33:58] So if the automatic checks fail to assess that the partition is good, I look manually, to understand what is going on, and whether
[23:34:07] there is anything that needs more attention.
[23:34:37] Currently the monitoring is yes/no. But for consumers of the data, it is relevant how much data is missing. From which hosts etc.
[23:35:55] We're still learning on what issues this kafka thing has, and how it affects us :-)
[23:36:51] Like ... why is esams bits so often missing messages ~18:00?
[23:48:20] qchris: aaah, right :)
[23:48:23] qchris: makes sense now :)
[23:56:38] Analytics-Refinery: Make webrequest partition validation handle races between time and sequence numbers - https://phabricator.wikimedia.org/T71615#824100 (QChris)
[23:58:05] !log Marked raw upload webrequest partition for 2014-11-30T21/2H ok (See {{PhabT|71615|824100}})
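
The check described in the conversation (a partition is only marked successful when no sequence numbers are missing) can be sketched roughly as follows. This is a minimal illustration, not the Refinery code: the function names `missing_by_host` and `partition_ok`, the `(host, sequence_number)` input shape, the sample hosts, and the assumption that each host numbers its messages with consecutive integers are all hypothetical. It also returns a per-host count of missing messages, the detail that a plain yes/no verdict does not carry.

```python
from collections import defaultdict


def missing_by_host(records):
    """Count missing sequence numbers per host within one partition.

    `records` is an iterable of (hostname, sequence_number) pairs.
    Assumption: each host emits consecutive integer sequence numbers,
    so any gap between the smallest and largest number seen for a host
    counts as lost messages. Duplicate numbers are ignored.
    """
    seen = defaultdict(set)
    for host, seq in records:
        seen[host].add(seq)

    report = {}
    for host, seqs in seen.items():
        expected = max(seqs) - min(seqs) + 1
        report[host] = expected - len(seqs)
    return report


def partition_ok(records):
    """Yes/no verdict: True only if no host is missing any message."""
    return all(count == 0 for count in missing_by_host(records).values())


# Hypothetical sample data: host-a is missing sequence number 3.
sample = [
    ("host-a", 1),
    ("host-a", 2),
    ("host-a", 4),
    ("host-b", 10),
    ("host-b", 11),
]
print(missing_by_host(sample))
print(partition_ok(sample))
```

A sketch like this only sees gaps between the smallest and largest number present for a host, so messages lost at the very start or end of a partition would go unnoticed.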