[08:16:41] 10Analytics, 10Dumps-Generation, 05Security: Pageview dumps incorrectly formatted, looks like a result of possibly malicious activity - https://phabricator.wikimedia.org/T144100#2588550 (10Nemo_bis) >>! In T144100#2855217, @Anomie wrote: >>>! In T144100#2855187, @Tgr wrote: >> There is no HTTP response code... [09:03:00] joal: checking the sequence stats issue [09:03:08] from the error file: 249 requests (0.0% of total) have incomplete records. 6490442 requests (2.726% of valid ones) were lost [09:03:40] IIRC this is the first time that we restart kafka brokers with Marcel's new alarms [09:04:41] so there might be two possibilities: 1) there is something weird with the sequence holes calculation 2) we didn't see the issue before because our previous alarms were hiding it [09:08:59] I can see connections refused in varnishkafka's logs on cp3033 but nothing seems to be dropped, same thing from the metrics [09:23:24] and also we use kafka.topic.request.required.acks = 1 (so the broker needs to ack) and max.retries = 3 [09:26:01] ok opening a phab task [09:38:15] 06Analytics-Kanban: Webrequest dataloss registered during the last kafka restarts - https://phabricator.wikimedia.org/T152674#2856720 (10elukey) [09:38:24] 06Analytics-Kanban: Webrequest dataloss registered during the last kafka restarts - https://phabricator.wikimedia.org/T152674#2856733 (10elukey) p:05Triage>03High [09:40:38] elukey: Thanks for having looked [09:42:02] elukey: Given that loss happens at the same hour on every webrequest_source and that the incomplete records on the text source are roughly the same across various hours, I think they are not related [09:42:47] joal: not related? 
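The loss figures quoted from the error file come from comparing varnishkafka sequence numbers against the records that actually landed. A minimal Python sketch of that check (illustrative only; the real computation is a Hive job over webrequest sequence numbers, and the function name here is made up):

```python
def sequence_stats(seqs):
    """Toy version of the webrequest sequence-stats check for one
    (host, hour): compare the span of sequence numbers the host should
    have produced against the number of records actually present."""
    expected = max(seqs) - min(seqs) + 1   # seqs the host emitted in this span
    valid = len(seqs)                      # records that actually arrived
    lost = expected - valid                # holes, i.e. lost requests
    return lost, 100.0 * lost / valid      # count and "% of valid ones"

# a host that emitted seqs 0..99 but whose seqs 40..44 never reached Kafka
seen = [s for s in range(100) if not 40 <= s < 45]
lost, pct = sequence_stats(seen)           # lost == 5, pct ~ 5.26
```

Note this only sees holes inside the observed min..max span, which is why a broker restart shows up as "lost" requests rather than incomplete records.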
[09:43:00] incomplete records and data loss on that specific case [09:43:09] ah yes sorry, I agree [09:43:15] ok :) [09:43:36] It could be an issue with the new alarms or a pre-existing issue that we just unveiled :) [09:43:45] As we said yesterday with ottomata, and as you wrote in the phab, seems related to drops from vk to kafka at kafka restart [09:44:24] but I don't see any evident error supporting this from the metrics, might be something tricky with librdkafka [09:44:56] we have never questioned that part of varnishkafka [09:45:08] because we have been concentrating on the Varnish side [09:45:22] now that it is fixed, we might need to work on the Kafka side :P [09:45:30] anyhow, going back to vacation mode :) [09:46:01] o/ [09:46:14] elukey: Ooooh, sorry I forgot about vacations !!!! You shouldn't have come, it could have waited for monday [09:46:26] * joal apologizes to elukey [09:47:34] nono I would have checked anyway, you know me :) [09:48:04] byyyeee [14:27:17] 06Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 13Patch-For-Review, and 2 others: RecentChanges in Kafka - https://phabricator.wikimedia.org/T152030#2857048 (10Ottomata) @gwicke, let's move this discussion over to T149736. This ticket is about getting the Mediawiki Recent Change feed into Kafka. [14:57:20] 06Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 06Services (watching), 15User-mobrovac: Bikeshed what events should be exposed in public EventStreams API - https://phabricator.wikimedia.org/T149736#2857147 (10Ottomata) Bringing some comments from @gwicke over from T152030: > Revision-create is lower... [15:45:04] 06Analytics-Kanban: Webrequest dataloss registered during the last kafka restarts - https://phabricator.wikimedia.org/T152674#2857203 (10Ottomata) I ran [[ https://github.com/wikimedia/analytics-refinery/blob/master/hive/webrequest/select_missing_sequence_runs.hql | select_missing_sequence_runs ]] for webrequest... 
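The drop-at-restart suspicion is consistent with the settings quoted earlier: with request.required.acks = 1 the producer only needs the partition leader's ack, and with max.retries = 3 a message whose sends all fail while the leader is restarting gets dropped. A toy Python model of that window (not librdkafka's actual retry logic; purely to illustrate the failure mode):

```python
def send(broker_up_from_attempt, max_retries=3):
    """Toy model of one varnishkafka message send under
    kafka.max.retries = 3: the message survives only if the leader is
    back (and can ack, per required.acks = 1) within the initial
    attempt plus max_retries retries."""
    for attempt in range(1 + max_retries):
        if attempt >= broker_up_from_attempt:
            return True        # leader acked the produce request
    return False               # retries exhausted: message is lost

send(broker_up_from_attempt=2)    # True: leader back within the retry window
send(broker_up_from_attempt=10)   # False: leader down longer than 4 attempts
```

In this model a restart longer than the whole retry window loses messages silently, which would match "nothing seems to be dropped" in the producer-side metrics.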
[17:23:00] (03PS1) 10Ottomata: Fix select_missing_sequence_runs query to report seqs on edges of holes, not in [analytics/refinery] - 10https://gerrit.wikimedia.org/r/325961 [17:33:47] ashgrigas: are you in batcave? [17:33:57] i thought so [17:33:57] https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave [17:33:59] :) [17:34:01] is this not it [18:13:59] ok milimetric, joal fyi, i'm going to restart kafka1012 and watch some logs to repro again [18:14:19] * milimetric crosses fingers [18:15:22] !log restarting broker on kafka1012 to repro T152674 [18:15:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:15:24] T152674: Webrequest dataloss registered during the last kafka restarts - https://phabricator.wikimedia.org/T152674 [18:35:48] ottomata: no alarms thus far? Well, maybe we need to wait an hour [18:36:22] (03CR) 10Nuria: [C: 032] Fix select_missing_sequence_runs query to report seqs on edges of holes, not in [analytics/refinery] - 10https://gerrit.wikimedia.org/r/325961 (owner: 10Ottomata) [18:36:56] we need to wait yeah, but i'm poking around and i can't find anything wrong [18:37:16] !log preferred-replica-election on analytics kafka cluster to bring 1012 back as leader for its partitions [18:37:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:38:53] ottomata, nuria : Hour 18 refine jobs will start running in about 25 minutes (at the end of the hour) [18:41:27] milimetric: would you have a minute for an edge-case bug discussion? [18:41:50] joal: lemme know when you want me to start reviewing [18:41:55] the mw history spark stuff [18:42:06] ottomata: So far, documenting documenting [18:42:09] sure joal, cave? 
[18:42:16] milimetric: OMW [18:42:24] k no hurries, just let me know when you are ready for me to look [18:42:32] ottomata: I'll have the denormalized package ready soon [18:54:15] 10Analytics: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#2857802 (10Nuria) [18:54:50] 10Analytics: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#2857803 (10Ottomata) a:03Ottomata [18:55:00] 10Analytics: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#2857791 (10Ottomata) Guess we should get some quotes started on this, eh? [18:56:33] 10Analytics: Hadoop cluster expansion.Add Nodes - https://phabricator.wikimedia.org/T152713#2857806 (10Nuria) [18:57:40] 10Analytics: CDH upgrade. Value proposition: new spark for edit reconstruction - https://phabricator.wikimedia.org/T152714#2857818 (10Nuria) [18:58:12] 10Analytics: CDH upgrade. Value proposition: new spark for edit reconstruction - https://phabricator.wikimedia.org/T152714#2857830 (10Nuria) Hadoop Nodes to Jessie? (CDH has released Jessie debs) +1 (maybe with CDH upgrade?). [19:02:37] 10Analytics: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#2857833 (10Ottomata) Something I just thought of: If we waited until after we upgraded Hadoop and the nodes to Jessie in T152714, we could also install these as Jessie. If we replace these before, we'd have to install... 
[19:13:44] (03CR) 10Nuria: [V: 032 C: 032] Fix select_missing_sequence_runs query to report seqs on edges of holes, not in [analytics/refinery] - 10https://gerrit.wikimedia.org/r/325961 (owner: 10Ottomata) [19:25:52] milimetric: just checked page and user states - no special case exists [19:26:21] milimetric: I'll just put a filter to be sure that if it exists one day we don't corrupt, and will document [19:26:21] (03CR) 10Nuria: [V: 032 C: 032] Fix select_missing_sequence_runs query to report seqs on edges of holes, not in (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/325961 (owner: 10Ottomata) [19:26:39] Going for dinner a-team, later! [19:26:55] k, joal, nite [19:32:02] nuria: don't understand your q on [19:32:03] ^^ [19:32:56] ottomata: it was more a question to myself but do we understand (with that sql) how we are getting reports of "1" datapoint lost with the same end/start date for the interval? [19:33:04] yes [19:33:07] that's why i changed the query [19:33:17] say it looked like this [19:33:22] 1 2 4 5 [19:33:32] aham [19:33:34] the previous query, with the + 1 and - 1 [19:33:42] would show 2 + 1 == 3 [19:33:46] and 4 - 1 == 3 [19:34:00] so they would say, the first missing seq is 3, and the last missing seq is 3 [19:34:12] the size of the hole is 1 [19:34:17] which was confusing [19:34:22] so my change makes it say [19:34:47] the seq on the left of the hole is 2, and the seq on the right of the hole is 4 [19:35:15] wait... [19:35:19] does that make sense [19:35:20] ? 
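The hole reporting being discussed here can be condensed into a short sketch (illustrative Python, not the actual HQL in select_missing_sequence_runs): the old query derived the missing edges of each hole (left + 1, right - 1), which collapses to one confusing value for one-wide holes; the fix reports the present sequence numbers on either side instead.

```python
def holes(seqs):
    """Report each gap in a sorted list of sequence numbers as
    (before_missing, after_missing, missing_count): the *present*
    seqs on the edges of the hole, plus the hole's size."""
    return [(left, right, right - left - 1)
            for left, right in zip(seqs, seqs[1:])
            if right - left > 1]

holes([1, 2, 4, 5])   # [(2, 4, 1)]  old query would say start=3, end=3
holes([1, 2, 6, 7])   # [(2, 6, 3)]  old query: start=3, end=5, same count
```

Same hole sizes either way; only the reported edges change from "first/last missing seq" to "present seq before/after the hole".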
[19:35:30] if the hole was bigger [19:35:32] like [19:35:44] 1 2 _ _ _ 5 6 [19:35:46] sorry [19:35:50] 1 2 _ _ _ 6 7 [19:36:12] before it would have said: [19:36:12] missing_start: 3, missing_end: 5, missing_count: 3 [19:36:18] with my change, now it will say: [19:36:28] before_missing: 2, after_missing: 6, missing_count 3 [19:37:04] ottomata: ah yes, it will never report as a start a sequence number that is not present on table [19:37:08] ottomata: k [19:37:11] right [19:37:20] ottomata: comprendouuuu [19:37:30] before we were reporting the missing edge of the holes, now we report the present edge of the holes [20:07:24] 06Analytics-Kanban: Webrequest dataloss registered during the last kafka restarts - https://phabricator.wikimedia.org/T152674#2858000 (10Ottomata) BTW, the query was just a little wrong about missing_start and missing_end. The 1s in that list are actually missing. Fixed the query here: https://gerrit.wikimedia.... [20:14:25] 10Analytics, 06Commons, 06Multimedia, 10Tabular-Data, and 4 others: Review shared data namespace (tabular data) implementation - https://phabricator.wikimedia.org/T134426#2858027 (10Yurik) 05Open>03Resolved [21:33:07] 10Analytics, 10EventBus, 13Patch-For-Review, 06Services (watching): Allow multiple schemas in a single EventBus topic. - https://phabricator.wikimedia.org/T152557#2858291 (10mobrovac) 05Open>03declined Closing as invalid as per discussion. [21:37:41] 10Analytics, 10EventBus, 13Patch-For-Review, 06Services (watching): Allow multiple schemas in a single EventBus topic. - https://phabricator.wikimedia.org/T152557#2858300 (10Ottomata) Hm, well, we still need to talk about what we should do with the change-prop topic configs, since we had to revert that today. [21:39:10] 10Analytics, 10EventBus, 13Patch-For-Review, 06Services (watching): Allow multiple schemas in a single EventBus topic. 
- https://phabricator.wikimedia.org/T152557#2858304 (10GWicke) I expect the topic of supporting multiple event schemas to also come up in the context of the public RecentChanges feed, wher... [21:49:17] 10Analytics, 10EventBus, 10Wikimedia-Stream: Implement filtering (if we should) - https://phabricator.wikimedia.org/T152731#2858381 (10Ottomata) [21:49:20] 10Analytics, 10EventBus, 10Wikimedia-Stream: Implement server side filtering (if we should) - https://phabricator.wikimedia.org/T152731#2858397 (10Ottomata) [21:49:26] Hello! I am a GCI participant and I joined this channel because the mentor of my task is here. [21:55:00] Phantom42: what task are you trying to do? [21:55:47] nuria: I have just claimed this one: https://phabricator.wikimedia.org/T139934 [21:57:08] Phantom42: you can ping milimetric on that as needed, try to get familiar with this code repo: https://github.com/wikimedia/analytics-aqs [21:58:09] hi Phantom42, happy to help [21:58:44] nuria: thank you [21:59:01] Phantom42: I'm Dan Andreescu, listed on the task [21:59:47] 10Analytics, 10Pageviews-API, 07Easy, 03Google-Code-In-2016: Add monthly request stats per article title to pageview api - https://phabricator.wikimedia.org/T139934#2858448 (10Nuria) [22:00:00] milimetric: i have edited the task to remove some outdated info [22:01:09] milimetric: Nice to meet you. I am now studying that task and collecting information on that topic. I will contact you in case I have any questions. [22:01:55] Phantom42: ok, I'm happy to show you around the code if you like, it might make it a lot easier [22:05:20] ottomata: BTW, some (strong?) 
opinions here: https://phabricator.wikimedia.org/T152684#2858471 [22:08:37] nuria: sorry, gotta run out, will comment later [22:08:44] ottomata: np [22:09:47] 10Analytics, 10EventBus, 10Wikimedia-Stream, 13Patch-For-Review: Implement server side filtering (if we should) - https://phabricator.wikimedia.org/T152731#2858381 (10Pchelolo) > However, many nodejs jsonschema packages speed up validation by using code generation. We probably shouldn't generate code based... [22:14:53] milimetric: some introduction to code could be helpful, I think. Maybe we better discuss that in pm? [22:25:15] 10Analytics, 10Datasets-General-or-Unknown, 06Services: Many error 500 from pageviews API "Error in Cassandra table storage backend" - https://phabricator.wikimedia.org/T125345#2858578 (10Nuria) [22:52:16] (03PS5) 10Joal: Add mediawiki history spark jobs to refinery-job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/325312 (https://phabricator.wikimedia.org/T141548) [23:50:45] 10Analytics, 06Commons, 06Multimedia, 10Tabular-Data, and 3 others: Allow structured datasets on a central repository (CSV, TSV, JSON, GeoJSON, XML, ...) - https://phabricator.wikimedia.org/T120452#2858834 (10Yurik) [23:53:46] 10Analytics, 06Commons, 06Multimedia, 10Tabular-Data, and 3 others: Allow structured datasets on a central repository (CSV, TSV, JSON, GeoJSON, XML, ...) - https://phabricator.wikimedia.org/T120452#2858850 (10Yurik) [23:54:40] 10Analytics, 06Commons, 06Multimedia, 10Tabular-Data, and 3 others: Allow structured datasets on a central repository (CSV, TSV, JSON, GeoJSON, XML, ...) - https://phabricator.wikimedia.org/T120452#2108162 (10Yurik)