[08:59:43] hello!
[08:59:50] I was checking kafka1012
[08:59:52] /dev/sdf1 1.8T 1.5T 322G 83% /var/spool/kafka/f
[08:59:58] --^ is the new disk
[09:00:09] the new disk is already 83% full?
[09:00:23] /o\
[09:00:27] root@kafka1012:/var/spool/kafka/f/data# du -h
[09:00:27] 951M ./webrequest_mobile-10
[09:00:27] 831G ./webrequest_text-1
[09:00:28] 124K ./__consumer_offsets-22
[09:00:28] 5.6G ./eventlogging-valid-mixed-1
[09:00:30] 673G ./webrequest_upload-7
[09:00:32] 4.0K ./eventlogging_MobileWikiAppBannerClickThrough-0
[09:00:35] 171M ./eventlogging_WikipediaPortal-0
[09:00:37] 562M ./eventlogging_MobileWikiAppEdit-0
[09:00:40] 652M ./eventlogging_TestSearchSatisfaction2-0
[09:00:42] 212K ./__consumer_offsets-24
[09:00:45] 1018M ./webrequest_misc-0
[09:00:47] 159M ./eventlogging_ImageMetricsLoadingTime-0
[09:03:13] hm
[09:04:22] elukey: We need to talk to ottomata about EL message keys --> kafka partitions messages based on the message key, and IIRC on EL the key is schema based
[09:06:18] joal: yep I've heard you guys talking about it
[09:06:45] elukey: I'm not sure this is the thing, but it might
[09:07:42] elukey: Can you tell me more about eventlogging-valid-mixed-??? partitions on other disks (or other machines?)
[09:10:22] joal: I am missing something, sorry for the extra question. I am seeing webrequest_text/upload as major contributors for the disk saturation, so I thought it was more an imbalance due to varnishkafka
[09:10:35] or possibly due to the kafka broker catching up with too much data
[09:11:21] elukey: Yes you're absolutely right !
[09:11:24] My bad
[09:12:04] ahhhhh okkkk!
[09:12:16] elukey: last week webrequest_mobile has been merged into webrequest_text
[09:13:02] the load was therefore shared among 12 partitions (6 for mobile, 6 for text)
[09:13:12] Now, the load is shared among 6 partitions
[09:13:17] Creating issues.
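The observation that webrequest_text and webrequest_upload dominate the disk can be checked directly from the `du -h` paste. The following is a minimal Python sketch, assuming `du -h` reports 1024-based units; the helper names `to_bytes`, `partition_sizes` and `top_contributors` are made up for illustration, and the sizes are copied from the paste above.

```python
# Sizes copied from the `du -h` paste for /var/spool/kafka/f/data on kafka1012.
DU_OUTPUT = """\
951M ./webrequest_mobile-10
831G ./webrequest_text-1
124K ./__consumer_offsets-22
5.6G ./eventlogging-valid-mixed-1
673G ./webrequest_upload-7
4.0K ./eventlogging_MobileWikiAppBannerClickThrough-0
171M ./eventlogging_WikipediaPortal-0
562M ./eventlogging_MobileWikiAppEdit-0
652M ./eventlogging_TestSearchSatisfaction2-0
212K ./__consumer_offsets-24
1018M ./webrequest_misc-0
159M ./eventlogging_ImageMetricsLoadingTime-0
"""

# Assumption: du -h uses powers of 1024 for its K/M/G/T suffixes.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size):
    """Convert a du -h style size such as '5.6G' to a byte count."""
    if size[-1] in UNITS:
        return float(size[:-1]) * UNITS[size[-1]]
    return float(size)


def partition_sizes(du_text):
    """Map each partition directory name to its size in bytes."""
    sizes = {}
    for line in du_text.strip().splitlines():
        size, path = line.split()
        name = path[2:] if path.startswith("./") else path
        sizes[name] = to_bytes(size)
    return sizes


def top_contributors(sizes, n=2):
    """Return the n largest partitions as (name, share of total) pairs."""
    total = sum(sizes.values())
    ranked = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return [(name, size / total) for name, size in ranked[:n]]


if __name__ == "__main__":
    for name, share in top_contributors(partition_sizes(DU_OUTPUT)):
        print(f"{name}: {share:.1%} of the listed data")
```

On these numbers the two webrequest partitions account for nearly all of the listed data, which matches the reading that the eventlogging partitions are not what fills the disk.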
[09:14:09] elukey: I had discussed with ottomata already about the number of partitions being too small when merging everything in text, but we said we'd wait and see --> I think we are gonna take actions :)
[09:16:28] thanks for the clarification! but theoretically webrequest_mobile should go away right
[09:16:31] ?
[09:16:36] without it we are basically inline
[09:16:50] good point in checking the other brokers though
[09:17:04] webrequest_mobile is "away" means all its traffic is now handled by webrequest_text
[09:17:04] I am restarting hhvm atm, going to check them in a bit
[09:17:37] joal: yes, I meant that 951M ./webrequest_mobile-10 will not be needed anymore
[09:18:00] correct elukey, it's in kafka but we don't even import it in camus anymore
[09:18:06] \o/
[09:18:18] I am getting something right once in a while
[09:18:28] all due to your patience joal :D
[09:20:07] narf, I'm not patient, I'm thinking aloud :)
[09:20:24] And you get it right :)
[09:22:31] elukey@neodymium:~$ sudo salt kafka* cmd.run 'df -h | egrep "[6789].%"'
[09:22:34] kafka1013.eqiad.wmnet:
[09:22:36] kafka1012.eqiad.wmnet:
    /dev/sdb3 1.8T 1.5T 271G 86% /var/spool/kafka/b
    /dev/sdf1 1.8T 1.5T 320G 83% /var/spool/kafka/f
[09:22:39] kafka1014.eqiad.wmnet:
[09:22:42] kafka1018.eqiad.wmnet:
    /dev/sdg1 1.8T 1.1T 727G 61% /var/spool/kafka/g
    /dev/sdj1 1.8T 1.1T 729G 61% /var/spool/kafka/j
    /dev/sdb1 1.8T 1.2T 694G 63% /var/spool/kafka/b
[09:22:46] kafka1020.eqiad.wmnet:
[09:22:48] kafka1022.eqiad.wmnet:
    /dev/sdb3 1.8T 1.1T 684G 63% /var/spool/kafka/b
    /dev/sdi1 1.8T 1.1T 713G 62% /var/spool/kafka/i
    /dev/sdk1 1.8T 1.2T 703G 62% /var/spool/kafka/k
[09:23:04] so only kafka1012.eqiad.wmnet is really a bit overloaded
[09:23:08] bad mobile is bad
[09:25:31] ?
[09:25:59] elukey: not easily readable
[09:26:52] elukey: when having multiple lines to show, best practice is to use a paste (in phab, gist or whatever you prefer :)
[09:26:58] ahhh sorry, my bad
[09:27:01] np
[09:27:28] tl;dr - only kafka1012 has disks with > 80% of space used
[09:28:10] hm, kafka 1013?
[09:36:22] http://hastebin.com/jupexelobu.rb - better
[09:38:24] 1013 seems fine
[09:38:37] Ahhh, yes !
[09:38:40] Thanks :)
[09:39:10] hm, so 1012 takes a bigger
[09:39:16] hit ... Weird
[09:43:26] trying to check if 1012 is the only one with mobile
[09:45:46] elukey: mobile is not even 1% of text, don't bother
[09:50:51] joal sorry I confused mobile with upload :(
[09:50:59] np :)
[09:51:07] * joal is away for a bit
[10:01:03] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#2007384 (Kurisutina24) Hello! I would like to work on this project for Outreachy Round 12 and I have also started with the...
[10:09:02] * elukey commutes to the office
[10:54:19] I am in the office but.. internet is not working well (I am on mobile connection for the moment). I hope to be fully ready soon :(
[11:36:51] Analytics-Tech-community-metrics: Microtask: Create a very simple REST API for SortingHat - https://phabricator.wikimedia.org/T114838#2007546 (01tonythomas) >>! In T114838#1971390, @Saylikarnik wrote: > Hello,I am Sayli Karnik ,an Outreachy aspirant for the upcoming Round 12. I am proficient in HTML, CSS, Ja...
[11:39:50] all right back :)
[11:40:32] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#2007554 (01tonythomas) >>! In T60585#2007384, @Kurisutina24 wrote: > Hello! I would like to work on this project for Outre...
[12:20:55] ottomata: I am sure that https://gerrit.wikimedia.org/r/#/c/268682/13 is wrong, but I tried to refactor it a bit. Please be patient :D
[12:23:01] I included also nuria's change
[12:45:04] ottomata: we also have a problem with the new disk in kafka1012, namely
[12:45:08] /dev/sdf1 1.8T 1.6T 296G 84% /var/spool/kafka/f
[12:45:20] let me know when you'll be up and running :)
[13:15:51] * elukey grabs lunch
[13:31:14] Analytics-Tech-community-metrics: Microtask: Create a very simple REST API for SortingHat - https://phabricator.wikimedia.org/T114838#2007683 (Aklapper) Hi @Saylikarnik. Thanks for your interest! Apart from what @01tonythomas already wrote: As you commented on this task, do you have a [[ https://www.mediawik...
[13:33:26] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#2007687 (Aklapper) @Kurisutina24: Hi and welcome! Please also check https://www.mediawiki.org/wiki/How_to_become_a_MediaWik...
[13:38:19] * elukey back
[13:38:34] joal: /away
[13:38:40] ?
[13:41:27] elukey: wassup ?
[13:41:38] sorry Joseph! I wanted to tell you something and put my status not in away
[13:41:51] :D
[13:42:00] ah, no prob, just wondered :)
[13:42:48] anyhow, wanted to tell you that hhvm has been upgraded, but since kafka1012 is still not "green" I think it would be best to wait for andrew before restarting the brokers
[13:43:02] so not sure if we'll do the work today
[13:43:23] agreed: this disk thing should be sorted before we move forward I guess
[13:50:17] Analytics-Tech-community-metrics, DevRel-February-2016: top-contributors.html is not sorted by rank anymore - https://phabricator.wikimedia.org/T125797#2007722 (Aklapper) Open>declined a:Aklapper Cannot reproduce. Will reopen once I manage again.
[13:50:20] Analytics-Tech-community-metrics, DevRel-February-2016: Key performance indicator: Top contributors: Find good Ranking algorithm fix bugs on page - https://phabricator.wikimedia.org/T64221#2007725 (Aklapper)
[14:01:28] (PS1) Addshore: Fix WikimediaCurl @author tag [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/269120
[14:01:52] (CR) Addshore: [C: 2 V: 2] Fix WikimediaCurl @author tag [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/269120 (owner: Addshore)
[14:09:49] Analytics, DBA, WMDE-Analytics-Engineering: labtestwiki appears in the dblist but can not be found on analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2007754 (Addshore) NEW
[14:17:31] Analytics, DBA, WMDE-Analytics-Engineering: labtestwiki appears in the dblist but can not be found on analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2007762 (Krenair) The wiki exists, it's just not hosted by the normal MySQL servers. It's like labswiki which runs on silver onl...
[14:17:31] Hey! Is there a tool where I can see the load time stats for enwiki articles? It seems really slow over the past few days. (Not sure if this is the right channel to ask about this :)
[14:19:04] Analytics, DBA, WMDE-Analytics-Engineering: labtestwiki appears in the dblist but can not be found on analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2007765 (Krenair) See also T89548 which obviously should be dealt with before this
[14:21:02] Analytics, DBA, WMDE-Analytics-Engineering: labtestwiki appears in the dblist but can not be found on analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2007769 (Krenair) > I am making an incorrect assumption that dbs on the list should always be replicated to this servers I thin...
[14:21:26] (PS1) Addshore: Make minutely wdqs run for each host each min [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/269124 (https://phabricator.wikimedia.org/T126004)
[14:21:34] Analytics, DBA, WMDE-Analytics-Engineering: Replicate wikitech wikis to analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2007773 (Krenair)
[14:22:00] (CR) Addshore: [C: 2 V: 2] Make minutely wdqs run for each host each min [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/269124 (https://phabricator.wikimedia.org/T126004) (owner: Addshore)
[14:24:39] Analytics, DBA, WMDE-Analytics-Engineering: Replicate wikitech wikis to analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2007779 (Addshore) Thanks for all the info @Krenair and poking this ticket into the correct shape ;)
[15:19:35] elukey: good morning!
[15:19:43] goooood morning!
[15:19:49] y u change the $email_template = 'burrow/email.tmpl'? :)
[15:20:57] also, nuria has added lag_window in this patch: https://gerrit.wikimedia.org/r/#/c/268594/6/modules/burrow/manifests/init.pp
[15:22:37] not sure I felt that it was belonging to hiera with nuria's change, but I can revert :)
[15:23:13] so, about module defaults
[15:23:17] the module should always have sane defaults
[15:23:28] usually, those are what the package or service you are configuring has
[15:23:51] most of the time, you want to think of the module as 100% usable outside of the operations/puppet repo
[15:23:58] pretend you don't have hiera or role classes at all
[15:24:06] it should be totally decoupled
[15:24:22] ok, I was looking at the problem more like "I want to force people to be aware of those parameters"
[15:24:22] (most of the time) someone else should be able to take the module and use it in their own puppet repo if they wanted
[15:25:10] naw you want to make it easy to use. those parameters have defaults that will work in most cases. for special cases people can change them if they want, and in those cases they look up how to change them
[15:25:28] there may be occasions when you will want to force people to set things
[15:25:32] ah snap I didn't see https://gerrit.wikimedia.org/r/#/c/268594/6, but only the other one with the value :(
[15:26:00] but, not for changing simple defaults like lagcheck_intervals
[15:26:03] or the email template
[15:26:30] makes sense.
[15:26:45] I'll move everything from hiera to the module then
[15:26:56] but possibly after 268594
[15:27:45] heh, either way one will conflict with the other, and you'll have to resolve in a local rebase
[15:27:47] but that's ok
[15:28:13] we could also pack everything in mine
[15:28:27] naw, easiest to do it this way
[15:28:32] conflicts are easy enough to resolve
[15:28:44] and its better (even though i am bad at this) to have small commits that do one thing
[15:28:53] +!
[15:28:56] +1
[15:29:24] all right I'll wait for the code to be merged, then I'll resolve the conflict
[15:29:27] :)
[15:32:17] hehe, there's no conflict yet
[15:32:19] if we merge yours first
[15:32:24] nuria will have to resolve it :p
[16:50:17] holaaa
[16:50:30] Heya
[17:00:20] a-team: standddupppppp
[17:01:14] ops meetinggggg
[17:01:15] sorry!
[17:01:26] nuria_: me too!
[17:01:36] maybe we should change monday standup time? :)_
[17:06:38] (CR) Milimetric: [C: 2 V: 2] Add note in README about Hiera hostnames config [analytics/dashiki] - https://gerrit.wikimedia.org/r/268829 (owner: Madhuvishy)
[17:07:17] (CR) Milimetric: [C: 2 V: 2] Add friendly prints to the fab tasks [analytics/dashiki] - https://gerrit.wikimedia.org/r/268830 (owner: Madhuvishy)
[17:09:03] Analytics-Kanban, Patch-For-Review: Buurow Increase length of window to evaluate lag [1 pts] - https://phabricator.wikimedia.org/T125916#2008314 (Nuria) a:Nuria
[17:10:44] (CR) Milimetric: [C: -1] Updated result of validation after creating cohort. (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/263911 (owner: Wassan.anmol)
[17:14:47] Krinkle: yt?
[17:40:26] Analytics-Kanban, Patch-For-Review: Fabric-alize dashiki dashboard deployments {crow} [13 pts] - https://phabricator.wikimedia.org/T110351#2008424 (Milimetric)
[17:40:34] a-team: sorry for not having sent the e-scrum but I was fighting with a redis/memcached issue in prod :(
[17:40:44] nuria_: afk. Back in 1-2h
[17:41:05] Krinkle: ok, we need 15 mins of your time today if that is ok,
[17:41:15] elukey, np, you can send it later if you want no?
[17:41:43] sure!
[17:42:35] Analytics-Kanban: Productionize last access jobs for monthly calculations {bear} [8 pts] - https://phabricator.wikimedia.org/T124678#2008426 (JAllemandou)
[17:42:52] nuria_: ok
[17:44:23] Analytics-Kanban: Eventlogging should start with one bad kafka broker, retest that is the case {oryx} [5 pts] - https://phabricator.wikimedia.org/T125228#2008439 (Milimetric)
[17:45:10] Analytics: Cassandra Backfill July [5 pts] {melc} - https://phabricator.wikimedia.org/T119863#2008448 (Nuria)
[17:45:12] Analytics-Kanban: Projections of cost and scaling for pageview API. {hawk} [8 pts] - https://phabricator.wikimedia.org/T116097#2008447 (Nuria) Open>Resolved
[17:47:43] Analytics: Consider SSTable bulk loading for AQS imports - https://phabricator.wikimedia.org/T126243#2008467 (Eevans) NEW
[17:51:19] Analytics-Kanban: Make Dashiki get pageview data from pageview API {melc} [8 pts] - https://phabricator.wikimedia.org/T124063#2008513 (Milimetric)
[17:51:51] Analytics-Kanban, Patch-For-Review: Fabric-alize dashiki dashboard deployments {crow} [13 pts] - https://phabricator.wikimedia.org/T110351#2008514 (Nuria) Open>Resolved
[17:57:40] Analytics: Get piwik stats for dashiki - https://phabricator.wikimedia.org/T126247#2008529 (Nuria) p:Triage>Normal
[18:00:04] Analytics-Kanban: Have dashiki read and write GET params to pass stateful versions of dashboard pages {crow} - https://phabricator.wikimedia.org/T119996#2008544 (Milimetric) a:Nuria>None
[18:00:06] Analytics-Kanban: Have dashiki read and write GET params to pass stateful versions of dashboard pages {crow} - https://phabricator.wikimedia.org/T119996#2008546 (Nuria)
[18:00:18] Analytics: Have dashiki read and write GET params to pass stateful versions of dashboard pages {crow} - https://phabricator.wikimedia.org/T119996#1842220 (Nuria)
[18:01:01] Analytics, ArchCom-RfC, Discovery, EventBus, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#2008548 (Milimetric)
[18:08:48] nuria_: if you have a minute later on I'd like to talk about https://gerrit.wikimedia.org/r/#/c/268594/6
[18:09:05] elukey: we are on tasking , you can join in if you want
[18:09:26] yep I was about to, still working on some ops things :(
[18:09:27] elukey: ah, sorry
[18:09:40] Analytics-EventLogging, Analytics-Kanban: Add autoincrement id to EventLogging MySQL tables. {oryx} [8 pts] - https://phabricator.wikimedia.org/T125135#2008569 (Milimetric)
[18:09:42] elukey: let's talk about it yes, i thought it was your change not mine
[18:10:08] Analytics-Kanban: Lower parallelization on EventLogging to 1 consumer {oryx} [3 pts] - https://phabricator.wikimedia.org/T125225#2008571 (Milimetric)
[18:10:23] Analytics-Kanban: Lower parallelization on EventLogging to 1 consumer {oryx} [3 pts] - https://phabricator.wikimedia.org/T125225#1981933 (Milimetric) p:High>Unbreak! a:elukey
[18:10:42] elukey: we also assigned you one item in tasking, we can talk about it tomorrow
[18:10:59] Analytics-Kanban: Lower parallelization on EventLogging to 1 consumer {oryx} [3 pts] - https://phabricator.wikimedia.org/T125225#1981933 (Milimetric) p:Unbreak!>High
[18:12:04] Analytics: Consider SSTable bulk loading for AQS imports - https://phabricator.wikimedia.org/T126243#2008591 (Eevans)
[18:24:37] Analytics: Get piwik stats for dashiki - https://phabricator.wikimedia.org/T126247#2008673 (Johsthao)
[18:24:45] Analytics: Consider SSTable bulk loading for AQS imports - https://phabricator.wikimedia.org/T126243#2008677 (Johsthao)
[18:25:10] Analytics, DBA, WMDE-Analytics-Engineering: Replicate wikitech wikis to analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2008689 (Johsthao)
[18:32:18] Analytics: Get piwik stats for dashiki - https://phabricator.wikimedia.org/T126247#2008781 (matmarex) duplicate>Open
[18:32:28] Analytics: Consider SSTable bulk loading for AQS imports - https://phabricator.wikimedia.org/T126243#2008786 (matmarex) duplicate>Open
[18:33:04] Analytics, DBA, WMDE-Analytics-Engineering: Replicate wikitech wikis to analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2008803 (matmarex) duplicate>Open
[18:34:40] Analytics-EventLogging, Analytics-Kanban: Send raw server side events to Kafka using a PHP Kafka Client {oryx} [0 pts] - https://phabricator.wikimedia.org/T106257#2008848 (Milimetric)
[18:35:20] Analytics-EventLogging, Analytics-Kanban: Send raw server side events to Kafka using a PHP Kafka Client {oryx} [0 pts] - https://phabricator.wikimedia.org/T106257#2008851 (Nuria) Substaks: This is likely between 21 and 34. Substasks: - make sure we can publish json text with mediawiki mononlog (right...
[18:47:09] Analytics: Remove cron on wikimetrics instance that updates vital signs [1 pts] - https://phabricator.wikimedia.org/T125751#2008962 (Nuria)
[18:47:58] Analytics, Analytics-EventLogging: Send raw server side events to Kafka using a PHP Kafka Client {oryx} [0 pts] - https://phabricator.wikimedia.org/T106257#2008963 (Nuria)
[18:51:35] Analytics, Analytics-EventLogging: Send raw server side events to Kafka using a PHP Kafka Client {oryx} [0 pts] - https://phabricator.wikimedia.org/T106257#2008988 (Nuria) p:Normal>High
[18:52:07] Analytics, DBA, WMDE-Analytics-Engineering: Replicate wikitech wikis to analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2008995 (jcrespo) > I am making an incorrect assumption that dbs on the list should always be replicated to this servers There is some separation between lab...
[18:52:58] Analytics, Analytics-EventLogging: Send raw server side events to Kafka using a PHP Kafka Client {oryx} [0 pts] - https://phabricator.wikimedia.org/T106257#2009006 (Milimetric)
[18:53:00] Analytics: Server side eventlogging should publish to kafka and not use udp {stag} - https://phabricator.wikimedia.org/T124813#2009005 (Milimetric)
[18:55:07] Analytics, DBA, WMDE-Analytics-Engineering: Replicate wikitech wikis to analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2009011 (jcrespo) Oh, I missread labtestwiki vs. labswiki. If labstestwiki is on s3, and it is as small as I suppose, it should be already there. I will inves...
[18:56:20] oh mann i shoulda come to tasking
[18:56:22] sorry guys
[18:56:28] was helping jeff green with more kafka stuff
[18:56:59] Analytics, DBA, WMDE-Analytics-Engineering: Replicate wikitech wikis to analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2009014 (jcrespo) I should have read again, my previous comment apply, then T126218#2008995.
[18:57:25] Analytics: camus-wediawiki job should run in production (or essential?) queue [1 pts] - https://phabricator.wikimedia.org/T125967#2009015 (Nuria)
[18:57:28] Analytics: camus-wediawiki job should run in production (or essential?) queue {hawk} [1 pts] - https://phabricator.wikimedia.org/T125967#2009017 (Milimetric)
[18:59:28] Analytics: Use a new approach to compute monthly top 1000 articles (brute force probably works) [8 pts] - https://phabricator.wikimedia.org/T120113#2009020 (Nuria)
[19:00:07] Analytics: Use a new approach to compute monthly top 1000 articles (brute force probably works) {slug} [8 pts] - https://phabricator.wikimedia.org/T120113#2009021 (Milimetric)
[19:01:34] Analytics, DBA, WMDE-Analytics-Engineering: Replicate wikitech wikis to analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2009022 (Addshore) >>! In T126218#2008995, @jcrespo wrote: > I strongly suggest to iterate over "all - silver.list", if that makes sense. If there is real int...
[19:07:52] madhuvishy: btw, the wikimetrics deploy went perfectly smoothly
[19:08:02] i did staging then prod
[19:08:04] milimetric: oh yay :D
[19:09:13] milimetric: i should have put a line in the fabric readme about restarting puppet after changing hiera config or waiting ~20 minutes for the changes to take effect - should i do it or can you add it?
[19:10:01] Analytics, DBA, WMDE-Analytics-Engineering: Replicate wikitech wikis to analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2009045 (jcrespo) I have no idea where that is, but if its name is labstestweb*2*XXX, there is a high chance it is on a different datacenter (dallas).
[19:10:54] milimetric: should we do that restbase change?
[19:21:42] Analytics, DBA, WMDE-Analytics-Engineering: Replicate wikitech wikis to analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T126218#2009089 (Krenair) The dblist containing both wikis with DBs hosted externally is wikitech.dblist.
[19:22:58] ottomata: yeah, lemme eat something 'cause we just had meetings all day
[19:23:06] *all afternoon :)
[19:23:20] but I say all day 'cause i'm grumpy, so I need to eat :)
[19:24:13] elukey: I'll do this now - if that's fine with you. https://phabricator.wikimedia.org/T125225
[19:25:16] nuria: I'm not aware of any progress or change in browser reports. What exactly is this about? I'm a bit behind on mailing lists.
[19:27:34] madhuvishy: sure :)
[19:27:45] elukey: cool!
[19:28:09] Analytics-Kanban, Patch-For-Review: Lower parallelization on EventLogging to 1 consumer {oryx} [3 pts] - https://phabricator.wikimedia.org/T125225#2009149 (madhuvishy) a:elukey>madhuvishy
[19:28:10] madhuvishy: I've also found a solution for the burrow email template, but the CR is still not ready.. Andrew made some comments and I am waiting for Nuria's CR
[19:28:21] elukey: okay :)
[19:28:38] but I want to add stuff like the lagcheck interval, that are config dependent
[19:28:52] so anybody will be able to make calculation on the sliding window directly
[19:28:59] not making assumption
[19:29:47] right
[19:30:06] ottomata: https://gerrit.wikimedia.org/r/#/c/269185 can you CR this?
[19:33:23] looks good madhuvishy, shall I merge?
[19:33:28] ottomata: yup!
[19:35:54] elukey: i have +1 already
[19:36:19] all right shall I merge?
[19:37:49] elukey: ottomata will merge i think
[19:37:53] there is a comment in https://gerrit.wikimedia.org/r/#/c/268594/7/hieradata/role/common/analytics/burrow.yaml
[19:38:26] all right I'll amend mine tomorrow :)
[19:38:33] elukey: ah , sorry, not my patch, I thought we were talking about yours
[19:39:04] ottomata: i should restart puppet for the change to take effect - should i restart eventlogging?
[19:39:09] nope! Mine is not ready yet, I wanted to merge yours before to re-use the lagcheck interval in the email template
[19:39:16] puppet has run
[19:39:17] elukey: sorry, i need to submit 1 more
[19:39:19] madhuvishy: yes restart el
[19:39:24] ottomata: oh cool okay doing
[19:39:28] nuria: sure! I'll restart tomorrow :)
[19:39:41] elukey: can merge!
[19:39:42] :)
[19:39:49] he's got da powerrrrr
[19:40:02] DA POWA
[19:40:24] but Nuria needs to submit a change, so you'll do it later on :P :P
[19:40:52] oh ok hehh
[19:40:57] elukey: what about yours? looking... :)
[19:41:03] oh you want nuria's to go first?
[19:41:18] milimetric: i'm ready for aqs stuff whenever you are
[19:41:41] ottomata: yes! so I'll re-use the parameter and I'll remove hiera stuff
[19:42:07] ottomata: looks good - there's only one consumer running
[19:42:28] k
[19:42:29] perfect
[19:43:42] will keep an eye on grafana
[19:43:44] logging off, talk with you tomorrow!!
[19:43:49] laters!
[19:43:57] good night elukey :)
[19:44:32] Hm.. Kafka metrics in graphite broke. I guess the metric changed?
[19:44:57] Previously: kafka.kafka*.kafka.server.BrokerTopicMetrics
[19:45:01] Currently: kafka.cluster.analytics-eqiad.kafka.*.kafka.server.BrokerTopicMetrics
[19:45:27] it changed Krinkle
[19:45:35] to make it work with multiple clusters
[19:45:49] sorry, shoulda thought to notify you
[19:46:04] check https://grafana.wikimedia.org/dashboard/db/kafka for some usage
[19:46:09] did some templating stuff
[19:46:30] Hm.. k
[19:46:38] It doesn't go back more than a week or so
[19:46:55] the metrics weren't copied over, they are just new metrics now
[19:47:05] but, if you are looking at last week, it may be weird if you include kafka1012
[19:47:12] it was down for a while last week
[19:47:32] and for some reason grafana won't show data if it has to render for all brokers in the time period when it was down
[19:51:11] A-team, I'm off for tonight !
[19:51:17] See y'all tomorrow :)
[19:51:26] night joal :)
[19:58:11] ottomata: still looks down to me now?
[19:58:13] 0 messages
[19:58:54] anyway, the new pattern works fine
[19:59:02] will have to update a number of dashboards eventually
[19:59:06] It's quite a long property path
[19:59:53] yeah :/
[20:00:00] Krinkle: , quick grafana q for you
[20:00:16] i want to have that kafka messages per sec metric
[20:00:22] sorry messagesIn
[20:00:23] i can get
[20:00:27] OneMinuteRate
[20:00:36] or I can get count ( which is always increasing)
[20:00:40] i want to look at
[20:00:47] sum messages per minute
[20:01:02] I'm going to ask the services folks about deploying because that test script wasn't working, then we can deploy
[20:01:06] i think i'd want to do sum(count, 1m) or something
[20:01:07] ottomata: Check https://wikitech.wikimedia.org/wiki/Graphite#Counters first
[20:01:11] but i'm not sure
[20:01:13] oo k
[20:01:14] Use .rate always
[20:01:25] Which is average rate per second
[20:01:28] ah but these are not from statsd
[20:01:35] Hm. k
[20:01:43] well, i guess they are
[20:01:44] hm
[20:01:44] hang on
[20:01:47] Still, Im fairly sure OneMinuteRate is per second
[20:01:59] It's the avg rate / sec of one minute window
[20:02:03] no they aren't
[20:02:07] yeh it is
[20:02:17] :)
[20:02:21] ah scale
[20:02:22] ok trying
[20:02:33] Check https://grafana-admin.wikimedia.org/dashboard/db/eventlogging-schema
[20:02:48] ja that looks right
[20:02:52] https://grafana-admin.wikimedia.org/dashboard/db/eventlogging-schema?panelId=9&fullscreen&edit
[20:02:57] ahh yeah
[20:02:58] cool
[20:03:18] OneMinuteRate scale(60) if you want per min
[20:03:24] and always sumSeries() to add up from diff brokers
[20:03:31] great perfect
[20:04:02] it also has a MessagesInPerSec....FifteenMinuteRate for example
[20:04:09] k :)
[20:05:01] ok, ottomata, ready to plan the deploy
[20:05:10] so we have to sync puppet with code
[20:05:27] not sure how to do that, I haven't done the deployer patch for the code yet
[20:05:52] Krinkle: can't use sumSeries on this, because I've got 2 wildcards (broker, topic)
[20:05:55] trying to groupByNode...
[20:05:59] with scale
[20:06:00] not sure that works
[20:06:13] oh ja it does
[20:06:14] cool
[20:06:31] hmm maybe
[20:07:02] ottomata: That's fine. sumSeriesWithWildcards()
[20:07:09] To pick which one you want
[20:07:11] to expand
[20:07:15] oo
[20:07:31] in Grafana, you can click on a function to get a (?) visible, which points to Graphite documentation
[20:07:36] for e.g. signature params
[20:07:48] http://graphite.readthedocs.org/en/latest/functions.html
[20:08:02] (some listed there are not available in our install though as we have a slightly older version)
[20:08:37] ottomata: Graphite has many problems. But not having enough functions is not one of them.
[20:08:37] ja
[20:08:46] uhhh, hm, is the node 0 indexed?
[20:08:51] (milimetric 2 mins...)
[20:08:57] Maybe, on Monday? I think so.
[20:08:59] no rush
[20:09:05] ha no, i mean
[20:09:09] in the metric
[20:09:11] like
[20:09:12] a.b.c.
[20:09:14] is c 3 or 2?
[20:09:21] I know. I was joking. The point is, It's unpredictable.
[20:09:23] haha
[20:09:23] ok
[20:09:25] really?
[20:09:28] I think this one is
[20:09:28] depends on function?
[20:09:37] You sound so surprised !
[20:09:42] yes! haha
[20:10:02] uhh i'm confused because my aliasByNode is not what I expect.
[20:10:09] so i'm not sure if i'm doing sum with wildcards right
[20:10:11] Right. That one probably isn't.
[20:10:12] hang on, will save and link you
[20:10:16] Just try it :)
[20:10:27] well, sum with wildcards doesn't seem to care
[20:12:01] Krinkle: https://grafana-admin.wikimedia.org/dashboard/db/eventbus?panelId=3&fullscreen
[20:12:59] changing the number in sumSeriesWithWildcards doesn't do what i'd expect
[20:13:06] and i have no idea how aliasByNode thinks topic name is 8
[20:13:07] :)
[20:13:18] anyway, no worries if you don't have time to look at it, that looks about right to me
[20:13:21] milimetric: let's do it!
[20:13:45] ok milimetric so
[20:13:50] i merge change and run puppet on nodes
[20:13:59] no wait
[20:14:01] then, we do a deploy to just aqs1001
[20:14:10] restart restbase there
[20:14:12] then you test
[20:14:16] then if good, we proceed with others?
[20:14:22] i'm not sure the code change has what i was expecting: https://gerrit.wikimedia.org/r/#/c/269199/1,publish
[20:14:26] it's huge and hard to read
[20:14:36] ottomata: It seems these metrics are only on one broker each
[20:14:37] oh, ok
[20:14:39] so it's not very visible
[20:14:40] check with services?
[20:15:14] sum...(4) seems to do it
[20:15:23] oh ya, hm, because the topics only have one partition
[20:15:24] If you remove the alias() you'll see which one in the metric name disappears
[20:15:58] huh ok i see
[20:15:58] cool
[20:16:01] Which is why aliasByNode() changes depending on presence of sumSeriesWithWildcards
[20:16:02] ohhhh
[20:16:04] got it!
[20:16:07] that makes sense now
[20:16:07] yeeeeeah
[20:16:18] It's all piped
[20:16:47] You can even aliasByNode() and then aliasSub() to transform the chosen name
[20:16:55] ottomata: no, it's good, we can deploy
[20:16:58] (which is nicer than building a massive regex to catch the right property)
[20:17:00] anyway :)
[20:17:03] so the directions are the *deploy bullet: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/AQS#Deploying
[20:17:16] thanks Krinkle, that's all good then
[20:17:19]
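The Graphite exchange above (0-indexed nodes, `sum...(4)`, and aliasByNode then finding the topic at 8) can be sketched in plain Python. This is only an illustration of the naming behaviour, not Graphite's code: `drop_nodes`, `alias_by_node` and `per_minute` are hypothetical helpers, and the metric tail after `BrokerTopicMetrics` (`MessagesInPerSec.<topic>.OneMinuteRate`) plus the broker and topic names are assumptions chosen to be consistent with the numbers mentioned in the chat.

```python
def drop_nodes(path, *positions):
    """Mimic how sumSeriesWithWildcards names its result:
    the nodes at the given 0-indexed positions are removed."""
    nodes = path.split(".")
    return ".".join(n for i, n in enumerate(nodes) if i not in positions)


def alias_by_node(path, *positions):
    """Mimic aliasByNode: keep only the nodes at the given 0-indexed positions."""
    nodes = path.split(".")
    return ".".join(nodes[i] for i in positions)


def per_minute(rate_per_second):
    """OneMinuteRate is an average per-second rate; scale(60) turns it into per-minute."""
    return rate_per_second * 60


# Hypothetical series name: the prefix comes from the chat, the broker name,
# topic name and the three trailing nodes are assumed for illustration.
SERIES = (
    "kafka.cluster.analytics-eqiad.kafka.broker-a.kafka.server."
    "BrokerTopicMetrics.MessagesInPerSec.some_topic.OneMinuteRate"
)

if __name__ == "__main__":
    # Before summing, the topic is node 9 (counting from 0).
    print(alias_by_node(SERIES, 9))
    # Summing over the broker wildcard (node 4) removes that node...
    summed = drop_nodes(SERIES, 4)
    # ...so every later node shifts left by one and the topic is now node 8.
    print(alias_by_node(summed, 8))
    print(per_minute(2.5))
```

Because each function works on the name produced by the previous one, the index passed to aliasByNode has to be counted on the already-shortened path, which is the "it's all piped" point made above.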