[02:46:18] Analytics-Tech-community-metrics, Developer-Relations, Community-Tech-Sprint: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? - https://phabricator.wikimedia.org/T125459#2210925 (Niharika) >>! In T125459#2299256, @kaldari wrote: > Reply from Microsoft: "We do...
[02:58:27] Analytics-Tech-community-metrics, Developer-Relations, Community-Tech-Sprint: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? - https://phabricator.wikimedia.org/T125459#2299769 (Earwig) About half of all queries.
[03:48:17] Analytics-Kanban: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2299811 (Nuria) Where is this logging to?
[04:36:16] Analytics, Editing-Analysis, Notifications, Collab-Team-2016-Apr-Jun-Q4: Numerous Notification Tracking Graphs Stopped Working at End of 2015 - https://phabricator.wikimedia.org/T132116#2299871 (Nuria) Actually from what I can see ( @milimetric please correct me if I am wrong) these metrics are...
[07:40:56] Analytics, Graph, Pageviews-API, Patch-For-Review: Unable to get pageviews for the title with ' in the name - https://phabricator.wikimedia.org/T129346#2300124 (Yurik) Apparently wrapping {{#titleparts: {{ARTICLE*}} }} removes the problem. Waiting for some feedback on @matmarex 's change, and i...
[08:36:45] joal: o/
[08:36:53] Heya elukey :)
[08:36:59] how is it today?
[08:37:01] gooood morning :)
[08:37:10] \o !
[08:37:41] I am re-imaging aqs1004 with raid10 on "only" 4 disks rather than 8, this should be the final version.. we'll waste 15GB in 4 disks of unused partition
[08:37:57] I had a chat with robh and other opsen yesterday
[08:38:18] and the raid0 thing needs to be supported with data
[08:38:30] hm, not following
[08:38:43] I mean, the general question was: what happens if you lose a disk, and how soon you'll need a new one?
[08:38:44] Let me put it my way and correct me when I'm wrong :)
[08:38:46] etc..
[08:39:04] joal: I already answered with some points
[08:39:17] but I have also other things to investigate..
[08:39:22] if you want we can chat later on
[08:39:31] ok sure
[08:39:40] let me know when you have time
[08:39:57] let's say 10 min that I finish aqs1004 :)
[08:44:05] sure !
[08:46:46] mdadm: /dev/md/0 has been started with 4 drives.
[08:46:46] mdadm: /dev/md/1 has been started with 4 drives.
[08:46:47] mdadm: /dev/md/2 has been started with 4 drives.
[08:46:50] ahhh goood
[08:47:43] md0 has 28GB, md2/1 5.8TB
[08:48:29] we might want to have a larger root?
[08:48:31] just in case
[08:48:41] 30GB feels not much
[08:48:57] elukey: it really depends if logs are stored there or not ;)
[08:49:09] maybe grow it to 50GB ?
[08:49:48] joal: yeah 50GB seems better, logs should be sent to logstash
[08:49:56] mobrovac: aloha :D
[08:50:30] quick question - 30GB for the / partition seems not a lot, even if we stated it in the aqs rack/setup/deploy task.. what do you think?
[08:50:39] (it will be raid 10 on 4 disks)
[08:51:40] if you don't put cass' data dir there it should be ok
[08:51:59] but if the logs go onto / as well then it might get tight very quickly
[08:52:07] especially since cass is pretty verbose
[08:52:13] (gotta love java stack traces)
[08:53:42] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2300302 (elukey) All right 1005 booted after restarts, it might be a problem of md arrays taking too much time to bootstrap? Anyhow, after a chat with @robh we dec...
[08:54:23] mobrovac: no no data dirs will be on raid0 arrays
[08:54:32] I thought that logs were on logstash only
[08:54:40] will double check, thanks!
[08:54:57] no no, they're logged locally too
[08:55:04] in /var/log/cassandra/system.log
[08:56:07] * elukey plays various low tones on the piano to increase the gravity of Marko's statement
[08:56:16] :D
[08:56:20] will check then thanks!
[08:58:34] elukey@aqs1003:~$ ls -hl /var/log/cassandra
[08:58:35] total 162M
[08:58:51] looks good
[08:58:55] joal --^
[08:59:22] now 60 gb looks better then, JUST IN CASE
[08:59:53] but I'll waste 120GB of capacity in 4 disks for the partman recipe
[08:59:58] mmmm
[09:00:02] joal: batcave?
[09:13:16] oh sorry elukey, didn't get the ping
[09:13:23] elukey: ready, yes !
[09:13:27] oookkk
[09:13:34] * joal need some more coffee
[09:13:41] * elukey will wait
[09:13:55] * elukey will start the new install in the meantime
[09:14:51] elukey: no no, please arrive, was just joking on me not being awake enough :)
[10:48:08] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2300729 (elukey) All the hosts re-installed and working fine, the only issue seems to be occasionally md arrays not available during boot.
[11:07:52] * elukey lunch!
[11:57:08] joal: https://puppet-compiler.wmflabs.org/2823/ looks good
[12:12:40] o/ joal
[12:12:50] Just wanted to let you know that I can't make the live systems meeting today
[12:41:01] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, Patch-For-Review: Upgrade analytics-eqiad Kafka cluster to Kafka 0.9 - https://phabricator.wikimedia.org/T121562#2300922 (elukey) So from the past week I can see: - kafka1012 increased steadily its logsize from 12/05 ~20:00 UTC more o...
[12:42:59] so kafka1012 has a broker log size of almost 12TB
[12:43:14] that is weird
[12:43:19] handling too many partitions?
[12:57:50] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, Patch-For-Review: Upgrade analytics-eqiad Kafka cluster to Kafka 0.9 - https://phabricator.wikimedia.org/T121562#2300952 (elukey) Distribution of the leaders: ``` elukey@kafka1012:~$ kafka topics --describe | grep Leader | awk '{print...
[13:00:35] joal: batcave?
[13:02:35] milimetric: o/
[13:02:43] hey elukey :)
[13:03:39] Hi milimetric, I thought we have meeting with halfak?
[13:03:47] he cancelled above joal
[13:03:49] Oh, missed his ping
[13:03:56] let's batcave then :)
[13:04:00] yep
[13:19:24] Analytics-Kanban: Spike - Slowly Changing Dimensions on Druid - https://phabricator.wikimedia.org/T134792#2301014 (Milimetric) Two approaches: * try to load data from Altiscale that @JAllemandou already has in a single flat denormalized schema. We'll query this in Hive and Druid when the cluster is up. * t...
[13:30:45] Analytics-Kanban: Spike - Slowly Changing Dimensions on Druid - https://phabricator.wikimedia.org/T134792#2301072 (JAllemandou) Test data is revision oriented and based on this schema: ``` id BIGINT, timestamp STRING, page_id BIGINT, page_title STRING, page_namespace BIGINT, page_redi...
[14:05:11] (CR) Mforns: [C: 1] "LGTM!" [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/289005 (https://phabricator.wikimedia.org/T126549) (owner: Milimetric)
[14:06:44] (CR) Mforns: [C: 1] "LGTM!" [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/289004 (https://phabricator.wikimedia.org/T126549) (owner: Milimetric)
[14:12:06] (CR) Mforns: [C: 1] "LGTM!" [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/289006 (https://phabricator.wikimedia.org/T126549) (owner: Milimetric)
[14:16:53] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, Patch-For-Review: Upgrade analytics-eqiad Kafka cluster to Kafka 0.9 - https://phabricator.wikimedia.org/T121562#2301203 (Ottomata) The increase in log size correlates to the time at which I set `inter.broker.protocol.version=0.9.0.X`....
[14:18:04] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2301206 (Ottomata) 30GB for root should be fine, we do that on many other servers.
[14:25:02] ottomata: o/
[14:29:26] hiiii
[14:32:07] ottomata: I have two questions for you if you have time.. first one is https://puppet-compiler.wmflabs.org/2823/ - is the gmond config ok in your opinion?
It uses the same aqs stuff so it should be good for us to get metrics, but better super safe :)
[14:32:24] second one.. 12TB of broker log size for kafka1012???
[14:34:09] milimetric: I'm back !
[14:35:03] yeah elukey def a problem there
[14:35:09] haven't dove in yet but am about to
[14:35:54] elukey: if it uses the same stuff from aqs100x it should be fine
[14:35:57] ottomata: I checked partitions/broker distribution but it looks ok, then logs timestamps, etc.. but nothing really *wrong* that pops up
[14:36:00] joal: in the batcave
[14:36:17] oh, uh, mforns can join us... wait he's not on
[14:36:19] i also highly doubt there are any app level alerts based on ganglia
[14:36:19] ottomata: re: aqs - yeppa just didn't want to interfere with them
[14:36:26] so it should be fine
[14:36:42] in case you were worried about messing with some aggregate metrics alerts or something
[14:36:45] but ja, should be fine
[14:37:39] well going to merge them
[14:37:42] *then
[14:38:00] HMMM elukey i *think* this kafka log size might be fine
[14:38:06] i'm looking at timestamps on actual log files on disk
[14:38:09] and, for kafka1012
[14:38:16] there are a lot on may 12th, a lot more than others
[14:38:18] i'm going to guess
[14:38:24] that when switching to 0.9 protocol
[14:38:32] it has to touch all of the log files on disk and make a change
[14:38:53] since retention is based on mtime of log files, it just hasn't deleted stuff that actually was created before may 12th
[14:39:15] mmmm
[14:39:18] if my guess is right, log size should start going back down on may 19th
[14:39:50] so we should see the same thing for the other brokers too?
[14:39:51] well, may 19th at 19:03 (that's when these were last modified)
[14:39:54] yes i believe so
[14:40:01] except those would start going down on may 23rd
[14:40:26] okok makes sense, plus df -h shows 70% utilization only for some partitions, it is still fine
[14:49:35] Analytics, Revision-Slider, TCB-Team, Patch-For-Review, TCB-Team-Sprint-2016-05-19: Data need: User Behaviour when comparing article revisions - https://phabricator.wikimedia.org/T134861#2301289 (Lea_WMDE)
[15:06:54] Error: Execution of '/usr/bin/deploy-local --repo analytics/aqs/deploy -D log_json:False' returned 1: http://tin.eqiad.wmnet/analytics/aqs/deploy/.git
[15:07:03] I forgot to add aqs100[456] to scap :P
[15:07:40] elukey: i added Disk Free on 6 fullest partitions here
[15:07:41] https://grafana-admin.wikimedia.org/dashboard/db/kafka
[15:07:59] shows percent free and also bytes free when hovering
[15:08:05] on the 6 fullest partitions across the cluster
[15:08:11] so, we can see the worst case
[15:08:33] if those get too close to 0 we'll have to act
[15:08:37] ah partitions == disk partitions
[15:08:41] ah
[15:08:41] yes
[15:08:42] i should change that
[15:08:50] I was confused :P
[15:09:05] haha, what to call it.
[15:09:06] uhhhh
[15:09:12] disks?
[15:09:29] Free Space on 6 Fullest Disks
[15:09:30] ?
[15:10:32] Disk partitions?
[15:10:55] I mean, Free Space on ...
[15:11:12] that'll do
[15:11:13] done
[15:11:16] (PS1) Elukey: Add aqs100[456] to the list of production hosts. [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/289224 (https://phabricator.wikimedia.org/T135145)
[15:11:28] i'm annoyed though, because now my # of graphs per panel is not divisible by 3!
[15:11:30] GRRRRR
[15:11:39] need to think of 2 more graphs to put in there :)
[15:12:03] ahhahaah
[15:12:13] ottomata: https://gerrit.wikimedia.org/r/#/c/289224/1 looks good?
[15:12:49] (CR) Ottomata: [C: 1] Add aqs100[456] to the list of production hosts.
[analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/289224 (https://phabricator.wikimedia.org/T135145) (owner: Elukey)
[15:12:50] ja +1
[15:13:00] heya milimetric, yt?
[15:13:04] (CR) Elukey: [C: 2] Add aqs100[456] to the list of production hosts. [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/289224 (https://phabricator.wikimedia.org/T135145) (owner: Elukey)
[15:13:13] (CR) Elukey: [V: 2] Add aqs100[456] to the list of production hosts. [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/289224 (https://phabricator.wikimedia.org/T135145) (owner: Elukey)
[15:14:45] joal: hello! need to deploy aqs
[15:14:58] for a small config change to enable the new hosts
[15:15:09] anything against it?
[15:15:31] I can use --limit otherwise
[15:15:59] elukey: Heya
[15:16:24] elukey: There have been a code change so if you deploy, it'll include that one
[15:16:28] That's good for me :)
[15:16:48] joal: anything weird that I should be aware? :D
[15:17:11] Nope, I want to test after you deploy is all :)
[15:17:35] joal: do you want me to use --limit initially?
[15:17:35] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, Patch-For-Review: Upgrade analytics-eqiad Kafka cluster to Kafka 0.9 - https://phabricator.wikimedia.org/T121562#2301358 (Ottomata) Ok, I believe that when switching `inter.broker.protocol.version` and bouncing brokers, on startup they...
[15:18:33] elukey: I think we should in any case, to prevent global failure ;)
[15:18:48] right :P
[15:20:40] 15:20:27 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'analytics/aqs/deploy', '-g', 'default', 'rollback'] on aqs1001.eqiad.wmnet returned [255]: Agent admitted failure to sign using the key.
[15:20:44] Permission denied (publickey,keyboard-interactive)
[15:20:57] that was the annoying thing in beta
[15:21:29] joal: do you sudo -u by any chance?
[15:21:32] I guess no
[15:21:44] you just execute deploy (now scap deploy!)
[15:22:10] correct elukey
[15:41:30] o/ joal & milimetric.
Sorry for the late notice re. meeting this morning.
[15:41:55] np halfak, just hadn't noticed it :)
[15:43:46] milimetric: yt?
[15:44:13] yep, hi nuria
[15:45:21] !log Deploying aqs (deploy code only)
[15:46:08] milimetric: saw your changes for throttling but I think we should enable it right away rather than log it , cause where is the code logging the request ratios?
[15:46:09] PROBLEM - Analytics Cassanda CQL query interface on aqs1004 is CRITICAL: Connection refused
[15:46:48] PROBLEM - Analytics Cassanda CQL query interface on aqs1006 is CRITICAL: Connection refused
[15:46:50] PROBLEM - Analytics Cassanda CQL query interface on aqs1005 is CRITICAL: Connection refused
[15:46:51] elukey: joal ^^ we should prob disable those alerts, eh?
[15:46:52] while testing?
[15:47:08] doing it
[15:47:43] nuria_: that's just what gabriel was doing and he recommended it to start
[15:47:52] milimetric: but where are we logging?
[15:48:06] milimetric: cause as far as i know there are no logs of teh service anywhere
[15:48:10] *the
[15:48:10] same place it logs their filter, I'll ask
[15:48:18] gwicke: yt?
[15:48:20] nuria_: but this is the front end restbase
[15:48:31] not aqs
[15:48:43] milimetric: right, of course, but even then
[15:49:09] milimetric: i do not think they have any logs of requests as in "files you can look at"
[15:50:15] ACKNOWLEDGEMENT - Analytics Cassanda CQL query interface on aqs1004 is CRITICAL: Connection refused Elukey Testing environment
[15:50:19] ACKNOWLEDGEMENT - Analytics Cassanda CQL query interface on aqs1005 is CRITICAL: Connection refused Elukey Testing environment
[15:50:23] ACKNOWLEDGEMENT - Analytics Cassanda CQL query interface on aqs1006 is CRITICAL: Connection refused Elukey Testing environment
[15:51:04] milimetric: i'd like to load the druid cluster in hive with some data, to verify that my metadata store and hdfs deep storage configs work
[15:51:05] nuria_: no, they do, I'm sure, because they just rolled this out for their own stuff, so they must be logging it somewhere
[15:51:13] sorry
[15:51:17] s/in hive/in labs/
[15:51:39] can you help me? :D
[15:51:49] ottomata: sure, uh, lemme look if that big file is still around and then you can use my indexing job
[15:52:03] milimetric: ok, let's make sure we have access to the logs, let me know when you find out
[15:52:23] yeah, I was going to talk about all this at standup, nuria, no worries. Also, it's super easy to change
[15:54:40] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2301477 (RobH) We could put in a boot delay option (we used to have a similar issue as this on some models of Dells in the past.) I recall it simply applying to th...
[15:54:43] !log Deploying onto new aqs
[15:55:04] ok
[16:14:49] wikimedia/mediawiki-extensions-EventLogging#557 (wmf/1.28.0-wmf.2 - 83ffff3 : Mukunda Modell): The build has errored.
[16:14:50] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/83ffff38bcbc
[16:14:50] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/130876727
[16:37:38] gwicke: yt?
[16:55:23] joal: elukey i missed ops sync up too?!?!!?
[16:55:31] what is wrong with my calendar, i didn't see these on it this morn
[16:55:58] :)
[16:56:16] ottomata: do I need to be into a specific group to deploy AQS?
[16:56:53] elukey: yes i think deploy-service
[16:57:04] hmm, but dan and joal aren't in that
[16:58:03] yeah..
[16:59:57] hmm, elukey yeah dunno how they deploy then.
[17:00:04] but i thought that was the group
[17:00:17] maybe aqs-admins in some weird way?
[17:00:20] I am not in it
[17:01:47] maybe, but i don't see that the scap configs, dunno
[17:01:55] ok, closing compy, am on hangout
[17:33:39] joal: so not able only on 1006?
[17:33:59] mobrovac: any chance that you are still onlinez?
[17:34:58] also elukey, deployment said it worked, but the new code is actually running only on aqs1005
[17:35:25] elukey: on 4 and 6, old (previous patch) is still running, but code is present (weird state)
[17:36:42] joal: but 1001/2/3 are fine right?
[17:38:45] elukey: I haven't double check, doing now
[17:39:05] thanks..
I think that I made a mistake not following the first steps
[17:39:45] elukey: Yes, they are fine
[17:39:53] elukey: I don't think so
[17:40:22] I think scap is a bit susceptible
[17:40:24] :)
[17:40:38] :P
[17:50:48] sorry back, IRC issues
[18:00:01] joal: will check tomorrow morning, I don't think that cassandra is running correctly on the new nodes
[18:01:15] yes
[18:01:16] May 17 18:00:48 aqs1004 cassandra[7508]: Exception encountered during startup: Unable to gossip with any seeds
[18:01:29] this is from elukey@aqs1004:~$ sudo systemctl cassandra-a status
[18:01:46] May 17 18:00:48 aqs1004 cassandra[7508]: WARN 18:00:48 No local state or state is in silent shutdown, not announcing shutdown
[18:02:12] all right will bang my head against those host tomorrow
[18:02:17] bye a-team!
[18:02:31] bye elukey !
[18:02:40] Same for me, gone for today
[18:02:47] See you tomorrow a-team
[18:06:43] milimetric, mforns : i think the deferred unit tests do a good job of explaining how deferred works: https://github.com/knockout/knockout/blob/master/spec/asyncBehaviors.js
[18:36:40] Analytics, Analytics-EventLogging, MediaWiki-extensions-MultimediaViewer, Reading-Web-Backlog: Parse mediaviewer team's requirements for EventLogging {oryx} - https://phabricator.wikimedia.org/T90766#2302360 (MBinder_WMF)
[18:40:18] mforns: none of your if checks are due to the deferred I think, cc milimetric
[18:40:33] milimetric, mforns : all those error conditions were there before
[18:41:35] (CR) Nuria: [WIP] Fix unique devices bugs (4 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/288104 (https://phabricator.wikimedia.org/T122533) (owner: Mforns)
[18:43:58] nuria_, was in the meeting with dan
[18:44:01] k
[18:44:20] nuria_, looking
[18:44:48] mforns: I am going to try to rework test just for kicks but anyways, none of those ifs are due to teh deffered updates
[18:44:51] *the
[18:45:00] on my opinion that is
[18:45:07] nuria_, aha
[18:45:16] Analytics, Hovercards,
Reading-Web-Backlog, Reading-Web-Sprint-72-Ninety-nine-problems-but-Nirzar-aint-one: Verify X-Analytics: preview=1 in stable - https://phabricator.wikimedia.org/T133067#2302482 (Jdlrobson) @dr0ptp4kt can you sign off?
[18:45:20] mmm, I didn't get them before though
[18:45:46] only when I changed to deferUpdates
[18:51:29] nuria_, the jasmine.Clock thing is cool!
[18:51:39] mforns: just fixed test, will commit
[18:52:00] (PS12) Nuria: [WIP] Fix unique devices bugs [analytics/dashiki] - https://gerrit.wikimedia.org/r/288104 (https://phabricator.wikimedia.org/T122533) (owner: Mforns)
[18:52:46] mforns: the defer updates just surfaced teh conditions
[18:52:48] *the
[18:52:54] taht were there before
[18:52:56] *that
[18:53:08] let me give you an example
[18:53:51] nuria_, yes, the code before assumed some things were defined, because the synchronous execution guaranteed a certain order
[18:54:10] mforns: ah ok
[18:54:11] now there isn't any order
[18:54:15] yes that is my point
[18:54:21] aha
[18:55:42] nuria_, the test looks better now, thanks
[18:56:48] (CR) Nuria: "I think this is good to go, pending on @milimetric's review" [analytics/dashiki] - https://gerrit.wikimedia.org/r/288104 (https://phabricator.wikimedia.org/T122533) (owner: Mforns)
[18:57:11] joal: yt?
[18:57:49] Analytics, Wikipedia-Android-App-Backlog: Investigate recent decline in views and daily users - https://phabricator.wikimedia.org/T132965#2302573 (Tbayer) >>! In T132965#2288859, @dr0ptp4kt wrote: > Does https://gerrit.wikimedia.org/r/#/c/285051/ ({T133204}) address the header enrichment? It did not. Nu...
[18:57:55] milimetric: , yt?
[18:57:56] :)
[18:58:02] milimetric, BTW, feel free to self-merge the RU changes whenever you want, and ping me if you want me to copilot
[18:58:09] elukey: still stuck with aqs deployment?
[18:58:24] ottomata: yes but I need to eat lunch
[18:58:32] mforns: cool, thx
[18:59:26] Analytics, Wikipedia-Android-App-Backlog: Investigate recent decline in views and daily users - https://phabricator.wikimedia.org/T132965#2302592 (Nuria) Bug is fixed on code: https://gerrit.wikimedia.org/r/#/c/288458/ We are in the process of backfilling pageviews for apps: cc @JAllemandou
[18:59:54] milimetric: ok, when you get fed let's have some druid fun
[19:01:00] Analytics-Tech-community-metrics, Phabricator-Upstream, Upstream: List of Phabricator users - https://phabricator.wikimedia.org/T37508#2302596 (jayvdb) @Qgil, was there any #Phabricator-Upstream -iness about this task? Was the "People" a WMF requested feature? It seems to have existed long before W...
[19:31:34] ok ottomata, back
[19:31:44] I looked earlier for that big file, but it's gone
[19:31:47] ok
[19:31:52] wanna chat in the cave?
[19:31:58] ja
[19:55:07] ottomata: let me know when you guys are done
[20:09:10] (PS3) Nuria: Initial content of analytics.wikimedia.org [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/289062 (https://phabricator.wikimedia.org/T134506)
[20:27:27] a-team: visa done! They gave me an exact 19 day visa but I get to go!
[20:27:39] yeehaw!
[20:28:05] Analytics-Kanban: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2292830 (GWicke) As mentioned on IRC, you'll probably want to enable global rate limiting by adding a section in config.yaml like this: https://github.com/wikimedia/restbase/pull/613/files#diff-541b6e195e9da580...
[20:31:31] madhuvishy: all right! one less thing to worry about
[20:33:06] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun-2016): Play with Bitergia's Kabana UI (which might potential replace our current UI on korma.wmflabs.org) - https://phabricator.wikimedia.org/T127078#2302932 (Lcanasdiaz) Here you are a new version of the dashboards. Information is being upda...
[20:53:58] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [30.0]
[20:55:01] ^ fixing :(
[20:55:31] !log restarted eventlogging after kafka1013 flapped
[21:02:19] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 20.00% above the threshold [20.0]
[21:08:48] congrats madhuvishy, crisis averted :)
[21:15:34] thank y'all
[21:15:37] thanks
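
Note on the kafka1012 log-size discussion (14:38 to 14:40 above): the guess was that retention is keyed on each log file's mtime, so files touched on may 12th stay on disk until the retention window has passed again. A minimal sketch of that arithmetic follows. The 7-day window is an assumption inferred from the "may 12th" to "may 19th" dates in the chat (the cluster's actual setting is not stated in the log), and the function names and the sample creation date are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: 7-day retention, inferred from "may 12th" -> "may 19th" in the chat.
RETENTION = timedelta(days=7)


def eligible_for_deletion_at(mtime: datetime, retention: timedelta = RETENTION) -> datetime:
    """Return the moment a log file stops being protected by time-based retention."""
    return mtime + retention


def is_expired(mtime: datetime, now: datetime, retention: timedelta = RETENTION) -> bool:
    """True once the file's last-modified time is older than the retention window."""
    return now - mtime >= retention


# Hypothetical file: written on May 1st, then touched again at 19:03 on May 12th.
created = datetime(2016, 5, 1, 0, 0)
touched = datetime(2016, 5, 12, 19, 3)
now = datetime(2016, 5, 17, 14, 40)

print(is_expired(created, now))           # judged by creation time, already past retention
print(is_expired(touched, now))           # judged by mtime, still retained
print(eligible_for_deletion_at(touched))  # when it becomes deletable again
```

With these inputs the file would already be deletable if judged by creation time, but judged by mtime it stays until 19:03 on may 19th, which matches the estimate given in the channel.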