[00:07:20] neilpquinn: yt? [00:17:29] nuria_: hey there! [00:18:40] neilpquinn: from what i gather you were trying to report total article count using wikistats1 data, right? [00:18:47] neilpquinn: or do I have that wrong? [00:19:03] nuria_: yeah, that's right. now that Erik has sent the files, I should have everything I need. In the next month or so, I can look at moving the calculation to mediawiki_history. [00:19:43] neilpquinn: ya, numbers will not match exactly cause erik counts articles that have "at least 1 internal link" in main content namespace (joal dixit) [00:19:46] I was thinking that I should use the full article count definition, including the part about containing a link, which you can't see from mediawiki_history [00:20:27] neilpquinn: ya, that would not be possible in data lake for a while, we are working in a total article count metric for next quarter and we will document how much it differs from erik's [00:20:44] nuria_: well, actually, I thought I saw somewhere that wikistats 1 actually uses the stub dumps anyway, so it omits that criterion at least part of the time anyway [00:20:59] neilpquinn: the current metrics are within 1%, so we will see if the link/no link makes a difference [00:21:04] nuria_: but in practice I have a feeling that the difference is acceptable [00:21:23] neilpquinn: ah, not sure about stub dump thing [00:37:41] 10Analytics-Kanban: Current master code on wikistats breaks net bytes metric display - https://phabricator.wikimedia.org/T199285 (10Nuria) [00:37:54] 10Analytics-Kanban: Current master code on wikistats breaks net bytes metric display - https://phabricator.wikimedia.org/T199285 (10Nuria) {F23579543} [00:42:15] (03PS1) 10Nuria: Revert "Make bar-chart and line-chart resilient to breakdowns with null values" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/445047 [00:43:45] 10Analytics-Kanban: Current master code on wikistats breaks net bytes metric display - https://phabricator.wikimedia.org/T199285 
(10Nuria) Reverting change that fixes issue: https://gerrit.wikimedia.org/r/#/c/analytics/wikistats2/+/445047/ [00:47:54] (03PS2) 10Nuria: Revert "Make bar-chart and line-chart resilient to breakdowns with null values" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/445047 (https://phabricator.wikimedia.org/T199285) [00:55:57] 10Analytics-Tech-community-metrics, 10Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585 (10srishakatux) Removing #possible-tech-projects as we are planning on killing that workboard soon. In its current state, it is not a good fit for #outr... [00:55:59] 10Analytics-Tech-community-metrics, 10Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585 (10srishakatux) [02:50:22] 10Analytics, 10EventBus, 10JobRunner-Service, 10Operations, and 3 others: Stop and remove old job runners - https://phabricator.wikimedia.org/T198220 (10Krinkle) [03:35:04] (03Abandoned) 10Nuria: Revert "Make bar-chart and line-chart resilient to breakdowns with null values" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/445047 (https://phabricator.wikimedia.org/T199285) (owner: 10Nuria) [03:35:14] (03PS1) 10Nuria: Revert "Make bar-chart and line-chart resilient to breakdowns with null values" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/445061 [06:16:49] bonjour joal [06:17:02] if you want me to revert the snapshot let me know :) [07:07:00] 10Analytics, 10Analytics-Kanban, 10Operations, 10netops, 10Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (10elukey) [07:08:11] Hi elukey :) [07:08:14] o/ [07:08:23] Why should we revert snapshot? [07:08:26] any issue I missed? [07:08:33] Nuria sent an email to internal [07:10:11] addshore: thanks a lot for the quick fix! 
\o/ [07:10:54] elukey: I think it is related to WKS2-UI, not data [07:11:35] Arf - Just saw th [07:11:42] the email sorry :) [07:13:08] elukey: give me a minute [07:13:55] sure sure no rush, I only offered my help in case we need to rollback this morning :) [07:35:42] elukey: https://medium.com/hadoop-noob/druid-parquet-extension-on-array-list-type-ab9ddfb150e6 [07:35:51] * joal cries and cries [07:38:20] ok - I have a working patch, but it's mega-ugly [07:38:24] ahahahah [07:38:29] :) [07:38:35] elukey: batcave for a minute? [07:41:15] sure [07:58:16] elukey: https://github.com/apache/incubator-druid/issues/5433 [07:58:19] I'm gonna try that [07:59:24] ah nice exactly like us! [07:59:30] same versions [08:00:44] in the meantime, I am proceeding with the rollback [08:01:08] elukey: there is a solution at the end [08:01:11] elukey: will ry it [08:01:20] elukey: will try it [08:02:14] yep yep I saw it, it is a matter of reindexing right? [08:02:34] when we have a good snapshot (tested etc..) we'll deploy again [08:02:37] not a big deal [08:02:49] sounds good [08:02:51] Thanks mate :) [08:04:03] am I seeing cached content for https://stats.wikimedia.org/v2/#/en.wikipedia.org/contributing/new-registered-users/normal|bar|2-Year~2016060100~2018071100|~total by any chance? [08:04:07] still flat zero [08:04:27] ah no I've hit "all" and it seems to be working [08:04:30] ok rollback done :) [08:06:59] (PS1) Joal: Fix mediawiki_reduced parquet indexation [analytics/refinery] - https://gerrit.wikimedia.org/r/445091 [08:07:04] one of the things that we should work on, if not present already, is a checklist for aqs/wikistats deployments [08:07:12] things to check to verify that all is good [08:07:33] say for example that I need to do it for some obscure reasons (emergency rollback, etc..) [08:07:37] I'd have no idea what to check [08:07:44] (shame on you Luca! 
buuuu) [08:08:06] (03CR) 10Elukey: [C: 031] Fix mediawiki_reduced parquet indexation [analytics/refinery] - 10https://gerrit.wikimedia.org/r/445091 (owner: 10Joal) [08:12:22] elukey: Asking the permission to overwrite snapshot 2018-06 in public cluster [08:14:36] +2 [08:29:31] 10Analytics-Kanban: Fix mediawiki_reduced druid indexation (parquet arrays) - https://phabricator.wikimedia.org/T199299 (10JAllemandou) [08:29:44] 10Analytics-Kanban: Fix mediawiki_reduced druid indexation (parquet arrays) - https://phabricator.wikimedia.org/T199299 (10JAllemandou) a:03JAllemandou [08:30:18] (03PS2) 10Joal: Fix mediawiki_reduced parquet indexation [analytics/refinery] - 10https://gerrit.wikimedia.org/r/445091 (https://phabricator.wikimedia.org/T199299) [08:30:38] elukey: kids awake, I'll monitor indexation and test nonetheless :) [08:34:30] ack! [08:34:44] joal: can we do https://phabricator.wikimedia.org/T199109 together this evening when you have time? [08:58:09] elukey: I'll have time this evening for this yes :) [09:01:43] elukey: no problem :) [09:01:49] I like to fix quick things quickly :) [09:02:24] joal: I might try and write a hive query to do that I was trying to do then! :) [10:05:49] so I've just added a http.proxy setting to stat and root users on stat/notebook hosts [10:06:21] puppet works fine, git seems to work correctly [10:06:38] I hope that this fixes the last weird data flows in our vlan [10:36:54] 10Analytics, 10MinervaNeue, 10Readers-Web-Backlog, 10Design: Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (10ovasileva) @Aklapper - it seems I don't have the permissions to edit this task. Is there a way to check what is causing that? [10:42:03] * elukey afk for ~2h! 
[10:54:26] elukey: For when you're back -- We can redeploy again - Problem fixed :D [11:01:07] (CR) Joal: [V: 1 C: 1] "Tested on cluster" [analytics/refinery] - https://gerrit.wikimedia.org/r/445091 (https://phabricator.wikimedia.org/T199299) (owner: Joal) [11:08:55] (CR) Joal: [C: 2] "Merging!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/443069 (https://phabricator.wikimedia.org/T193641) (owner: Jonas Kress (WMDE)) [11:18:14] (Merged) jenkins-bot: Track number of editors from Wikipedia who also edit on Wikidata over time [analytics/refinery/source] - https://gerrit.wikimedia.org/r/443069 (https://phabricator.wikimedia.org/T193641) (owner: Jonas Kress (WMDE)) [12:41:57] hellooooo [12:43:41] Analytics, Analytics-Wikistats: [Wikistats 2.0] No data appears by default on New Registered Users Detail page - https://phabricator.wikimedia.org/T199324 (sahil505) [12:46:14] mforns: o/ [12:46:22] joal: if you are there we can deploy aqs [12:46:30] hey elukey, I've seen the alarm on the mysql purging script [12:47:10] ufffff I had it in my spam folder! Again [12:48:27] ahhh deployment-eventlog05! [12:48:38] it needs a deploy in there too [12:49:09] same for me, had that in my spam folder :( [12:49:50] and also piwik alerts [12:49:53] /o\ [12:49:56] yes [12:49:57] :( [12:50:50] ahhhh that's labs again [12:50:52] fiuuuu [12:51:02] ok all good :) [12:54:18] (PS5) Jonas Kress (WMDE): Introduce oozie job to schedule generating metrics for Wikidata co-editors [analytics/refinery] - https://gerrit.wikimedia.org/r/443409 (https://phabricator.wikimedia.org/T193641) [12:59:01] joal: ok to deploy then? 
[12:59:13] +1 elukey [13:08:11] (03CR) 10Elukey: [V: 032 C: 032] Fix mediawiki_reduced parquet indexation [analytics/refinery] - 10https://gerrit.wikimedia.org/r/445091 (https://phabricator.wikimedia.org/T199299) (owner: 10Joal) [13:11:13] joal: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/445163/1/hieradata/role/common/aqs.yaml [13:15:05] joal: restarting now [13:16:05] elukey: may I try aqs1004? [13:16:44] ehm I am deploying to all, it should be completed in a min [13:18:07] elukey: confirmed from aqs1004 [13:18:17] joal: deployment done [13:18:42] 10Analytics-Kanban, 10Patch-For-Review: Fix mediawiki_reduced druid indexation (parquet arrays) - https://phabricator.wikimedia.org/T199299 (10elukey) [13:19:41] looks good [13:20:12] elukey: YES [13:20:30] I checked every metric in the UI, looks good - Except for cached value for enwiki [13:20:33] :( [13:21:56] mforns: so one thing that I don't get is why eventlog05 in labs alarms, since we put the --no-whitelist-sanity-check [13:24:02] ah no wait nevermind, I confused the parameters [13:24:29] elukey, no idea [13:27:10] so the last commit in the refinery is from Jan 5 on that host :D [13:38:28] ah of course, on deployment-tin the refinery is super old [13:41:01] oh ok [13:46:57] also, elukey thanks for merging the puppet cron job for EL Sanitization :] [13:47:07] I think it's not working however... :[ [13:50:47] I'm confused... [13:51:00] there's a problem with the whitelist file [13:52:02] I thought /srv/deployment/analytics/refinery/static_data/eventlogging/whitelist.yaml was available in the nodes [13:52:40] it is on an1003 [13:52:47] what problem do you see? [13:53:00] mforns: --^ [13:53:17] elukey, the code expects it to be in the cluster nodes [13:53:48] because of -deploy-mode cluster [13:54:49] on all the worker nodes?? [13:54:51] isn't refinery deployed there as well? 
[13:54:52] yea [13:55:03] nono this is not possible :) [13:55:07] we can add it to hdfs [13:55:13] oh, sorry of course [13:55:15] in hdfs [13:55:52] is refinery code in hdfs? [13:57:15] in theory yes [13:58:49] so we have /srv/deployment/analytics/refinery/bin/refinery-deploy-to-hdfs that is meant to be done after each deployment [13:58:49] then the only thing needed is changing the path [13:58:58] aha [13:59:14] is the script able to read from hdfs though? [13:59:26] or will the path be passed to the spark stuff? [13:59:35] yes, I changed it to read from hdfs [13:59:52] no no, no spark [14:00:10] /wmf/refinery/current/static_data/eventlogging/whitelist.yaml [14:00:29] this is the path on hdfs [14:01:50] ok [14:02:02] will change that, thanks elukey! [14:02:31] :) [14:27:45] 10Analytics, 10Analytics-Kanban, 10Operations, 10netops, 10Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (10ayounsi) @elukey I still see flows to gerrit (2620:0:861:3:208:80:154:85) https from: 2620:0:861:108:10:64:53:26 2620:0:861:108:10:64:... [14:35:11] 10Analytics, 10Analytics-Wikistats: [Wikistats 2.0] No data appears by default on New Registered Users Detail page - https://phabricator.wikimedia.org/T199324 (10Nuria) [14:35:13] 10Analytics-Kanban, 10Patch-For-Review: Fix mediawiki_reduced druid indexation (parquet arrays) - https://phabricator.wikimedia.org/T199299 (10Nuria) [14:35:42] mforns: yt? want to talk about the problems on wikistats master? [14:36:08] nuria_, yes, I'm with sahil talking about that right now [14:36:12] wanna join? [14:36:18] https://hangouts.google.com/hangouts/_/wikimedia.org/mforns-sahil?authuser=0 [14:36:26] mforns: sure, one sec [14:42:02] elukey: does aqs needs re-started when we point to the other snapshot? 
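Side note on the whitelist path fix discussed above: the job runs with deploy-mode cluster, so a file that only exists under /srv/deployment/analytics/refinery on an1003 is not visible to it, while the copy under /wmf/refinery/current in hdfs is. Below is a minimal Python sketch of that path translation. It assumes refinery-deploy-to-hdfs mirrors the repository layout one to one (the two whitelist paths quoted above suggest this, but it is an assumption), and the helper name to_hdfs_path is hypothetical, not something that exists in refinery.

```python
from pathlib import PurePosixPath

# Both roots are quoted in the conversation above.
LOCAL_ROOT = PurePosixPath("/srv/deployment/analytics/refinery")
HDFS_ROOT = PurePosixPath("/wmf/refinery/current")


def to_hdfs_path(local_path: str) -> str:
    """Hypothetical helper: map a local refinery path to its assumed hdfs twin.

    Assumes the hdfs copy keeps the same relative layout as the local checkout.
    Raises ValueError if local_path is not under LOCAL_ROOT.
    """
    relative = PurePosixPath(local_path).relative_to(LOCAL_ROOT)
    return str(HDFS_ROOT / relative)


print(to_hdfs_path(
    "/srv/deployment/analytics/refinery/static_data/eventlogging/whitelist.yaml"
))
# prints /wmf/refinery/current/static_data/eventlogging/whitelist.yaml
```

Failing loudly on paths outside the local root is deliberate in this sketch: a path that was never deployed to hdfs should not be silently rewritten.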
[14:42:57] nuria_: yep [14:45:10] elukey: ok, what is happening is that today's range returning defective data is already cached on varnish, but looks like we have good data underneath [14:46:02] nuria_: yep, but it should expire eventually.. if we are in a hurry we can ask traffic if they can purge wikistats's entries [14:48:22] elukey: mmm.. i am not sure snapshot is reverted cause i still can query a june range https://wikimedia.org/api/rest_v1/metrics/edits/aggregate/all-projects/all-editor-types/all-page-types/daily/2018050100/2018061100 (this might be cached but unlikely, i just changed the end date) [14:48:50] elukey: this could be in cache since 06-11 but seems unlikely [14:49:02] right? [14:50:51] nuria_: we have re-deployed the correct snapshot later on in the day [14:59:44] elukey: mmm.. let's look at this later, my end date on url above should not return any data for june, given that i chose an arbitrary (end) date in the middle of june it is unlikely that the url is cached [15:03:26] PROBLEM - Varnishkafka Statsv Delivery Errors per second on einsteinium is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [5.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=statsv&var-host=All [15:25:05] elukey, I corrected the whitelist path: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/445187/ [15:30:48] a-team: Kafka main-eqiad exploded in a big set of fireworks, it seems that we found how to recover but pretty much everything is down now [15:30:59] I am surely skipping standup sorry :( [15:34:46] * fdans hugs elukey [15:39:06] RECOVERY - Varnishkafka Statsv Delivery Errors per second on einsteinium is OK: OK: Less than 1.00% above the threshold [1.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=statsv&var-host=All [15:51:08] elukey: I'm here now - ping if you think I can do anything [15:51:26] joal: it seems better now, but basically Kafka 
main eqiad went brutally down [15:51:33] :( [15:51:48] kafka ran out of open fds, OOMs, horror everywhere [15:52:04] elukey: any possible cause in mind? [15:52:06] (PS1) Mforns: Unbreak master [analytics/wikistats2] - https://gerrit.wikimedia.org/r/445197 (https://phabricator.wikimedia.org/T189619) [15:52:11] it seems that for some reason a ton of topics got created by change-prop with repetitive naming [15:52:18] like eqiad.change-prop.retry.change-prop.retry.change-prop.retry.change-prop.retry.change-prop.continue [15:52:29] precise 2086 [15:52:36] *precisely [15:52:50] recursive calls with longer and longer names, and numerous more topics :( [16:01:53] ping elukey [16:02:16] ping fdans [16:05:20] (PS1) Mforns: Fix topic selector routing [analytics/wikistats2] - https://gerrit.wikimedia.org/r/445199 (https://phabricator.wikimedia.org/T199343) [16:39:42] mforns: batcave? [16:39:48] nuria_, omw [16:44:13] (CR) Nuria: [C: 2] Unbreak master [analytics/wikistats2] - https://gerrit.wikimedia.org/r/445197 (https://phabricator.wikimedia.org/T189619) (owner: Mforns) [16:44:31] (Abandoned) Nuria: Revert "Make bar-chart and line-chart resilient to breakdowns with null values" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/445061 (owner: Nuria) [16:45:00] Gone for dinner, will be back after [16:49:56] PROBLEM - Varnishkafka Statsv Delivery Errors per second on einsteinium is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [5.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=statsv&var-host=All [16:59:46] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message produce rate in last 30m on einsteinium is CRITICAL: 0 le 0 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad [17:00:05] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad 
average message consume rate in last 30m on einsteinium is CRITICAL: 0 le 0 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad [17:17:28] (PS1) Mforns: Correct use of new Date() and Date.parse() [analytics/wikistats2] - https://gerrit.wikimedia.org/r/445205 (https://phabricator.wikimedia.org/T189619) [17:18:08] nuria_, ^ [17:18:45] nuria_, also this one, if you want to include it in the deploy: https://gerrit.wikimedia.org/r/#/c/analytics/wikistats2/+/445199/ [17:18:58] (CR) Nuria: Correct use of new Date() and Date.parse() (1 comment) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/445205 (https://phabricator.wikimedia.org/T189619) (owner: Mforns) [17:19:20] mforns: it can be let dates = this.data.map((d) => d.month); [17:19:23] mforns: right? [17:19:37] nuria_, not sure, because later operations can modify that date [17:19:37] mforns: d.month is a date object [17:19:54] and that would modify the underlying data [17:20:16] I haven't checked, but I assumed that was the reason we have that copy there [17:20:55] nuria_, actually, no, that array is readonly [17:20:58] will change [17:23:20] mforns: mmm no, I think that predated my changes on moving things to UTC [17:23:23] (PS2) Mforns: Correct use of new Date() and Date.parse() [analytics/wikistats2] - https://gerrit.wikimedia.org/r/445205 (https://phabricator.wikimedia.org/T189619) [17:23:38] nuria_, changed that ^ [17:23:48] mforns: before i think that was the point at which dates were being generated: https://github.com/wikimedia/analytics-wikistats2/commit/a18f2a59d32cdbf3bc5ba66eea6d89abf68a44dd#diff-25d902c24283ab8cfbac54dfa101ad31 [17:24:57] mforns: mmm.. 
no wait, we had dates being generated on graph model before, ya [17:25:21] yea, I guess so [17:25:29] mforns: let's test some more one sec [17:28:17] mforns: ya, it actually fixes a bug [17:28:24] mforns: i can show you in batcave [17:28:28] ok omw [17:33:15] RECOVERY - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message consume rate in last 30m on einsteinium is OK: (C)0 le (W)100 le 129 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad [17:36:36] (what a disaster today) [17:36:39] sigh [17:40:43] (03PS3) 10Mforns: Correct use of new Date() and Date.parse() [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/445205 (https://phabricator.wikimedia.org/T189619) [17:43:00] (03CR) 10Nuria: [V: 032 C: 032] Correct use of new Date() and Date.parse() [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/445205 (https://phabricator.wikimedia.org/T189619) (owner: 10Mforns) [17:49:35] elukey: if it helps we can talk :-S [17:49:55] (03CR) 10Nuria: [C: 032] Fix topic selector routing [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/445199 (https://phabricator.wikimedia.org/T199343) (owner: 10Mforns) [18:11:05] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message consume rate in last 30m on einsteinium is CRITICAL: 0 le 0 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad [18:11:56] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message produce rate in last 30m on einsteinium is CRITICAL: 0 le 0 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad [18:15:36] RECOVERY - Varnishkafka Statsv Delivery Errors per second on einsteinium is OK: OK: 
Less than 1.00% above the threshold [1.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=statsv&var-host=All [18:18:23] (03PS1) 10Nuria: Preparing for release 2.3.2 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/445215 [18:22:06] (03CR) 10Nuria: [V: 032 C: 032] "Self merging release" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/445215 (owner: 10Nuria) [18:24:27] (03PS1) 10Nuria: Release 2.3.2 [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/445217 [18:27:42] (03CR) 10Nuria: [V: 032 C: 032] Release 2.3.2 [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/445217 (owner: 10Nuria) [18:42:20] anyone know what the retention is on offsets in kafka? I tried looking it up but it seems its only in zookeeper, not available via the kafka api. And zookeeper is firewalled away [18:44:46] PROBLEM - Varnishkafka Statsv Delivery Errors per second on einsteinium is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=statsv&var-host=All [18:47:07] looks like we might use the default from the confluent puppet module, 10080 (1 week), but not sure [19:01:05] PROBLEM - Check if active EventStreams endpoint is delivering messages. on scb1002 is CRITICAL: CRITICAL: No EventStreams message was consumed from http://scb1002.eqiad.wmnet:8092/v2/stream/recentchange within 10 seconds. [19:17:49] ebernhardson: is what you call offset-retention the same as data-retention? [19:18:13] joal: no, there is a special topic in kafka that holds offsets. 
It's retention is configured separately from regular topics usually [19:18:24] ebernhardson: That's what I wondered :) [19:19:13] mostly i was trying to figure out if this job that would produce to kafka once a week might run into problems with offsets disapearing between runs (but maybe simply having daemons running refreshes the offsets ... still more investigation to do) [19:19:27] (03PS2) 10Amire80: WIP: Measure articles published using CX2 [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/442860 (https://phabricator.wikimedia.org/T196435) [19:19:33] s/daemons/consumers/ [19:21:57] ebernhardson: I see the point [19:22:41] (03CR) 10Amire80: "Thanks, Milimetric." [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/442860 (https://phabricator.wikimedia.org/T196435) (owner: 10Amire80) [19:23:44] ebernhardson: I would intuitively think that a good retentio [19:24:00] for such a topic would be max(user_topic_retention)? [19:31:05] RECOVERY - Check if active EventStreams endpoint is delivering messages. on scb1002 is OK: OK: An EventStreams message was consumed from http://scb1002.eqiad.wmnet:8092/v2/stream/recentchange within 10 seconds. 
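Side note on the offset retention question above: the two concerns raised (joal's max(user_topic_retention) rule of thumb, and ebernhardson's worry about a job that only runs once a week) can be written as two small checks. This is a minimal Python sketch, not a statement of how the brokers are actually configured. The 10080 value is the unconfirmed number from the conversation, read as minutes in the sense of Kafka's offsets.retention.minutes broker setting; the topic names and their retention values are made up for illustration; and the second check assumes nothing refreshes the committed offsets between runs, which the conversation leaves open.

```python
WEEK_MIN = 7 * 24 * 60  # 10080 minutes


def offsets_outlive_topics(offset_retention_min, topic_retention_min):
    """Rule of thumb from the chat: offset retention >= the longest topic retention."""
    return offset_retention_min >= max(topic_retention_min.values())


def offsets_survive_gap(offset_retention_min, run_interval_min):
    """True if an offset committed at one run is still retained at the next run.

    Assumes nothing refreshes the committed offsets between runs.
    """
    return offset_retention_min > run_interval_min


# Hypothetical topics and retention values, for illustration only.
topics = {"example.topic.a": WEEK_MIN, "example.topic.b": 3 * 24 * 60}

print(offsets_outlive_topics(10080, topics))   # prints True
print(offsets_survive_gap(10080, WEEK_MIN))    # prints False
```

With a one week retention and a one week interval the second check fails, which is the edge case behind the question: the offsets could expire right around the time the next run starts.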
[19:35:32] joal: indeed as long as offset retention is >= topic retention, and the consumer defaults to earliest possible offset when no offset is recorded it should "just work" [19:35:36] RECOVERY - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message produce rate in last 30m on einsteinium is OK: (C)0 le (W)100 le 108.4 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad [19:35:46] RECOVERY - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message consume rate in last 30m on einsteinium is OK: (C)0 le (W)100 le 1376 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad [19:47:06] RECOVERY - Varnishkafka Statsv Delivery Errors per second on einsteinium is OK: OK: Less than 1.00% above the threshold [1.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=statsv&var-host=All [20:08:29] (03PS2) 10Sahil505: Corrected negative number abbreviation in chart axis [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/444012 (https://phabricator.wikimedia.org/T198510) [20:11:13] (03CR) 10Sahil505: Corrected negative number abbreviation in chart axis (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/444012 (https://phabricator.wikimedia.org/T198510) (owner: 10Sahil505) [20:11:46] PROBLEM - Varnishkafka Statsv Delivery Errors per second on einsteinium is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=statsv&var-host=All [20:14:35] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message produce rate in last 30m on einsteinium is CRITICAL: 0 le 0 
https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad [20:14:45] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message consume rate in last 30m on einsteinium is CRITICAL: 0 le 0 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad [20:59:22] (03CR) 10Nuria: Corrected negative number abbreviation in chart axis (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/444012 (https://phabricator.wikimedia.org/T198510) (owner: 10Sahil505) [21:30:16] RECOVERY - Varnishkafka Statsv Delivery Errors per second on einsteinium is OK: OK: Less than 1.00% above the threshold [1.0] https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&var-instance=statsv&var-host=All [21:47:26] RECOVERY - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message produce rate in last 30m on einsteinium is OK: (C)0 le (W)100 le 359.6 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad [21:47:35] RECOVERY - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message consume rate in last 30m on einsteinium is OK: (C)0 le (W)100 le 281.1 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad [21:49:05] hello kafka main :) [21:49:12] glad that you are back [21:59:18] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: [Wikistats2] Fix topic selector routing - https://phabricator.wikimedia.org/T199343 (10Nuria) 05Open>03Resolved [21:59:33] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add interface::add_ip6_mapped 
{ 'main': } to all the Analytics hosts - https://phabricator.wikimedia.org/T199180 (10Nuria) 05Open>03Resolved [21:59:35] 10Analytics, 10Analytics-Kanban, 10Operations, 10netops, 10Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (10Nuria) [21:59:51] 10Analytics-Kanban, 10Outreach-Programs-Projects, 10Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210 (10Nuria) [21:59:53] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Disable filter & split radio switches for metrics & reduce the height of wiki div in sidebar of detail page - https://phabricator.wikimedia.org/T198183 (10Nuria) 05Open>03Resolved [22:00:45] 10Analytics, 10Analytics-Kanban, 10Operations, 10Patch-For-Review: Please install Text::CSV_XS at stat1005 - https://phabricator.wikimedia.org/T199131 (10Nuria) 05Open>03Resolved [22:01:01] * elukey going to sleep :P [22:01:25] 10Analytics, 10Analytics-Kanban: Upgrade Analytics infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T192642 (10Nuria) [22:01:27] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Upgrade AQS to Debian Stretch - https://phabricator.wikimedia.org/T196138 (10Nuria) 05Open>03Resolved [22:01:54] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Display of radio buttons in Wikistats 2 is somewhat confusing - https://phabricator.wikimedia.org/T183185 (10Nuria) 05Open>03Resolved [22:02:07] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Update piwik to latest stable - https://phabricator.wikimedia.org/T192298 (10Nuria) 05Open>03Resolved [22:02:20] 10Analytics-Kanban, 10Outreach-Programs-Projects, 10Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210 (10Nuria) [22:02:22] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 
10Patch-For-Review: Improve UI for ‘All wikis’ & ‘Explore Topics’ dropdown - https://phabricator.wikimedia.org/T196982 (10Nuria) 05Open>03Resolved [22:02:36] 10Analytics-Kanban: Private geo wiki data in new analytics stack - https://phabricator.wikimedia.org/T176996 (10Nuria) [22:02:38] 10Analytics, 10Analytics-Kanban: Archive old geowiki data (editors per country) and make it easily available at WMF - https://phabricator.wikimedia.org/T190856 (10Nuria) 05Open>03Resolved [22:03:04] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Correct mixed case naming of metrics WIkistats 2.0 - https://phabricator.wikimedia.org/T197103 (10Nuria) Super thanks for this work @sahil505 [22:03:06] 10Analytics-Kanban, 10Outreach-Programs-Projects, 10Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210 (10Nuria) [22:03:08] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Correct mixed case naming of metrics WIkistats 2.0 - https://phabricator.wikimedia.org/T197103 (10Nuria) 05Open>03Resolved [22:03:21] 10Analytics-Kanban, 10Outreach-Programs-Projects, 10Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210 (10Nuria) [22:03:23] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Correct spelling mistake in description meta tag of wikistats 2 - https://phabricator.wikimedia.org/T198122 (10Nuria) 05Open>03Resolved [22:03:49] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Correct order of metric types on Dashboard navbar/menu - https://phabricator.wikimedia.org/T198168 (10Nuria) Thnaks for the attention to detail here [22:04:14] 10Analytics-Kanban, 10Outreach-Programs-Projects, 10Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210 (10Nuria) [22:04:16] 10Analytics, 
10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Shortcut icon is not showing - https://phabricator.wikimedia.org/T197482 (10Nuria) 05Open>03Resolved [22:04:31] 10Analytics-Kanban, 10Outreach-Programs-Projects, 10Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210 (10Nuria) [22:04:33] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Correct order of metric types on Dashboard navbar/menu - https://phabricator.wikimedia.org/T198168 (10Nuria) 05Open>03Resolved [22:04:46] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Bar chart changes height when toggling splits - https://phabricator.wikimedia.org/T194431 (10Nuria) 05Open>03Resolved [22:06:50] 10Analytics-Kanban: Fix issue with prod/labs jars for sqoop - https://phabricator.wikimedia.org/T196737 (10Nuria) 05Open>03Resolved [22:08:18] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add interface::add_ip6_mapped { 'main': } to all the Analytics hosts - https://phabricator.wikimedia.org/T199180 (10Nuria) 05Resolved>03Open [22:08:20] 10Analytics, 10Analytics-Kanban, 10Operations, 10netops, 10Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (10Nuria) [22:08:45] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add interface::add_ip6_mapped { 'main': } to all the Analytics hosts - https://phabricator.wikimedia.org/T199180 (10ayounsi) Reopening this task to not forget to add the DNS PTR record on those new IPs. [22:15:17] 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: Remove AppInstallIId from EventLogging purging white-list - https://phabricator.wikimedia.org/T178174 (10Nuria) a:05Nuria>03mforns [22:22:02] 10Analytics, 10Analytics-Wikistats: Wikistats - move from numeral to numbro for better localization support - https://phabricator.wikimedia.org/T199386 (10Nuria)