[00:08:12] Analytics-Kanban, Patch-For-Review, WMF-deploy-2015-10-06_(1.27.0-wmf.2): Bug: client IP is being hashed differently by the different parallel processors {stag} [8 pts] - https://phabricator.wikimedia.org/T112688#1687831 (Ottomata) Phew, ok, I got this deployed, but I had to do a little hackery. Our pa... [00:10:01] Analytics-Backlog, Analytics-EventLogging: Upgrade eventlogging servers to Jessie - https://phabricator.wikimedia.org/T114199#1687834 (Ottomata) NEW [01:08:57] Analytics-Kanban, Patch-For-Review, WMF-deploy-2015-10-06_(1.27.0-wmf.2): Bug: client IP is being hashed differently by the different parallel processors {stag} [8 pts] - https://phabricator.wikimedia.org/T112688#1688102 (Jdforrester-WMF) [01:41:20] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1688147 (Jgreen) >>! In T97676#1687605, @awight wrote: > Furthermore, when we do make the change, the `count` c... [02:33:16] Analytics-Backlog, Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1688177 (Bawolff) Can't you already write db queries of this form using UNION and foreign table references? 
(Of course, that's not very user friendly) [03:22:24] Quarry: 'New query' highlighted when looking at existing queries - https://phabricator.wikimedia.org/T106411#1688191 (Ricordisamoa) Because of https://github.com/wikimedia/analytics-quarry-web/blob/2f23db6fa4c2891cefdb40fd2f13e00b6514ba9a/quarry/web/templates/query/view.html#L1 [04:32:50] Analytics-Backlog, Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1688272 (Bawolff) I mean, like http://quarry.wmflabs.org/query/5417 [04:38:46] Analytics-Backlog, Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1688278 (Ricordisamoa) >>! In T95582#1688177, @Bawolff wrote: > Can't you already write db queries of this form using UNION and foreign table references? (... [05:03:52] Analytics-Backlog, Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1688289 (Bawolff) >>! In T95582#1688278, @Ricordisamoa wrote: >>>! In T95582#1688177, @Bawolff wrote: >> Can't you already write db queries of this form us... [09:49:01] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1688646 (Anmolkalia) @jgbrah, thank you for the guidance. I am on it. Thanks. 
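The UNION-with-foreign-table-references idea from T95582 can be sketched with SQLite as a stand-in (the real Quarry replicas are MariaDB, and every table, database, and column name below is illustrative only): attach several databases and qualify table names with the database alias, repeating the same per-wiki query once per database.

```python
import sqlite3

# Stand-in sketch for querying several wiki databases in one statement:
# SQLite's ATTACH lets us qualify table names with a database alias,
# mirroring the enwiki_p.page-style cross-database references available
# on the replicas. All names here are hypothetical.
def union_across_dbs():
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH ':memory:' AS enwiki")
    conn.execute("ATTACH ':memory:' AS dewiki")
    for db, titles in (("enwiki", ["Foo", "Bar"]), ("dewiki", ["Baz"])):
        conn.execute(f"CREATE TABLE {db}.page (page_title TEXT)")
        conn.executemany(f"INSERT INTO {db}.page VALUES (?)",
                         [(t,) for t in titles])
    # The same query repeated per database and UNIONed together, with a
    # literal column identifying which wiki each row came from.
    rows = conn.execute(
        "SELECT 'enwiki' AS wiki, COUNT(*) FROM enwiki.page "
        "UNION ALL "
        "SELECT 'dewiki', COUNT(*) FROM dewiki.page"
    ).fetchall()
    conn.close()
    return rows
```

This also shows the "not very user friendly" part noted in the thread: the query text has to be duplicated once per database by hand.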
[11:50:43] Analytics-Tech-community-metrics: Remove deprecated repositories from korma.wmflabs.org code review metrics - https://phabricator.wikimedia.org/T101777#1688869 (Aklapper) [12:09:30] (PS3) Addshore: Split metrics up [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242199 [12:34:06] (PS4) Addshore: Split metrics up [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242199 [12:41:41] (PS1) Addshore: chmod +x all .sh files [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242529 [12:50:19] headsup, stat1001-stat1003 will be rebooted starting in about 10 minutes [13:10:22] ok, thx moritzm [13:13:56] all three rebooted into fresh kernels, you can use them again [13:18:31] great [13:20:07] :) [13:20:11] stat1002 looks so empty ;) [13:20:31] :) [13:24:28] (CR) Aude: Split metrics up (1 comment) [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242199 (owner: Addshore) [13:26:02] (CR) Addshore: Split metrics up (1 comment) [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242199 (owner: Addshore) [13:27:00] (CR) Aude: Split metrics up (1 comment) [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242199 (owner: Addshore) [13:35:47] Analytics-Backlog: Standardize Hive UDF code comments and generate documentation {flea} - https://phabricator.wikimedia.org/T114238#1689125 (Milimetric) NEW [13:52:03] (PS1) Addshore: Add READMEs for all metrics [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242546 [13:53:33] (CR) Addshore: [C: 2 V: 2] Add READMEs for all metrics [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242546 (owner: Addshore) [13:53:43] (CR) Addshore: [C: 2 V: 2] chmod +x all .sh files [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242529 (owner: Addshore) [13:54:03] (CR) Addshore: [C: 2 V: 2] Move create SQL from comments to own files [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242100 (owner: Addshore) 
[13:54:16] (CR) Addshore: [C: 2 V: 2] Classify wikidata_social.php [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242097 (owner: Addshore) [13:54:30] (CR) Addshore: [C: 2 V: 2] Move scripts to a src dir [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242095 (owner: Addshore) [13:54:50] (CR) Addshore: [C: 2 V: 2] Also copy tsv files to aggregate-datasets [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240725 (owner: Addshore) [13:55:28] (CR) Addshore: [C: 2 V: 2] Add sql to tsv script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240724 (owner: Addshore) [13:55:39] (CR) Addshore: [C: 2 V: 2] Add social stats tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240710 (owner: Addshore) [13:56:32] (CR) Addshore: Add social stats tracking script (2 comments) [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240710 (owner: Addshore) [13:56:46] (CR) Addshore: [C: 2 V: 2] Add getclaims property use tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240653 (owner: Addshore) [13:56:56] (CR) Addshore: [C: 2 V: 2] Script for tracking site_stats over time [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240652 (owner: Addshore) [13:56:59] #spam [13:57:40] *twiddles thumbs* ahh, jenkins isnt going to do the merging.... [14:00:50] (CR) Addshore: [C: 2 V: 2] Split metrics up [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242199 (owner: Addshore) [14:15:34] holaaa [14:15:46] ey nuria [14:23:53] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1689285 (Qgil) [14:25:01] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1027974 (Qgil) Thank you @jgbarah! Please update the description and propose some microtasks. 
When this is done, we will promote this project idea to #Outreachy-Round-11 candid... [14:29:43] Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1689330 (mforns) I'm with Nuria in that we have to evaluate whether the complexity in the... [14:35:02] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1689357 (JAllemandou) Hey Tilman, About bots no backfill was done. It could have been possible to do it from July 17th (2 month of refined webrequest are kept), but we t... [15:21:02] Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1689560 (BBlack) Logging events and stats without having significant complexity or perf is... [15:35:55] Analytics-Kanban, Patch-For-Review, WMF-deploy-2015-10-06_(1.27.0-wmf.2): Bug: client IP is being hashed differently by the different parallel processors {stag} [13 pts] - https://phabricator.wikimedia.org/T112688#1689630 (ggellerman) [15:38:52] Analytics-Kanban: Introduction to Hive class {flea} [13pts] - https://phabricator.wikimedia.org/T113545#1689645 (ggellerman) [15:39:06] Analytics-Kanban: Introduction to Hive class {flea} [13 pts] - https://phabricator.wikimedia.org/T113545#1689650 (Milimetric) [15:50:35] Analytics, operations, Patch-For-Review: Moving analysis data from flourine to analytics cluster - https://phabricator.wikimedia.org/T112744#1689684 (Addshore) Open>Resolved a:Addshore [15:55:32] joal, sorry for the late notice. [15:55:41] Would you be able to join us for a meeting re. altiscale cluster in 5 min [15:55:42] ? [15:56:05] halfak: I have a event-bus meeting in 5 :( [15:56:17] No worries! 
I'll have some stuff to talk to you about though. [15:56:22] It turns out that Nitin, our engineer, got an injury that will prevent him from doing substantial work. [15:56:43] So I was hoping to get a hand from you loading data into our "research cluster" hive instance. [15:57:02] halfak: Woould love to :) [15:57:20] :) Cool. ttyl [15:57:27] Will ask the team to ensure my resource is not bottlenexk anywhere, but I'd gladly help :) [15:57:30] later ! [17:11:28] Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1690049 (Nuria) There are many good points on @BBlack reply. We could give a little more... [17:33:23] hello nuria. non-technical comment: can you release R61 from the event we have at 11? madhuvishy added R31 but she doesn't have enough permission to remove R61. :D [17:33:38] leila: :) thanks [17:34:15] np, madhuvishy. as the person who complains about rooms not being available for meetings :D, I take responsibility for releasing this one. :D [17:39:44] PROBLEM - Check status of defined EventLogging jobs on eventlog1001 is CRITICAL: CRITICAL: Stopped EventLogging jobs: processor/client-side-11 processor/client-side-10 processor/client-side-09 processor/client-side-08 processor/client-side-07 processor/client-side-06 processor/client-side-05 processor/client-side-04 processor/client-side-03 processor/client-side-02 processor/client-side-01 [17:40:49] milimetric: i just turned on the 12 processors [17:40:50] lets check it out. [17:41:06] cool, i'm at scrum of scrums [17:41:11] so first i'll tell everyone we did that [17:41:14] and then we'll check :) [17:41:17] :D [17:41:23] RECOVERY - Check status of defined EventLogging jobs on eventlog1001 is OK: OK: All defined EventLogging jobs are runnning. [17:41:34] k [17:44:04] milimetric: Can i get a link to the hashing salt patch, whenever you get a chance? 
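The hashing-salt bug being fixed here (T112688) is that each parallel processor generated its own random salt, so the same client IP hashed to different values depending on which processor handled the event, and the mapping reset on every restart; the patch stores one shared salt in etcd instead. A minimal sketch of the idea, with the hash construction and names assumed rather than taken from the actual EventLogging code:

```python
import hashlib
import os

def hash_ip(client_ip: str, salt: bytes) -> str:
    """Keyed hash of a client IP; two processes agree only if they
    share the same salt."""
    return hashlib.sha256(salt + client_ip.encode("utf-8")).hexdigest()

# Old behaviour: each processor draws its own os.urandom salt, so the
# same IP hashes differently per processor.
salt_a, salt_b = os.urandom(16), os.urandom(16)
inconsistent = hash_ip("203.0.113.7", salt_a) != hash_ip("203.0.113.7", salt_b)

# Fixed behaviour: one salt loaded from a shared store (etcd in the
# actual patch), so all twelve parallel processors agree, and the
# hashes survive restarts because the salt outlives the processes.
shared_salt = b"salt-loaded-from-etcd"  # placeholder for the etcd value
consistent = all(
    hash_ip("203.0.113.7", shared_salt) == hash_ip("203.0.113.7", shared_salt)
    for _ in range(12)  # twelve processors, as turned on later in this log
)
```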
[17:44:29] Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1690191 (mforns) In the particular case of the CentralNoticeBannerHistory schema, I see th... [17:44:32] csteipp: for eventlogging? [17:45:34] leila: yes, let me get to it [17:45:51] leila: done [17:46:06] thanks nuria [17:47:13] csteipp: https://gerrit.wikimedia.org/r/#/c/238854/ [17:47:46] csteipp: i think that's the one, which puts a shared salt on /etcd [17:47:55] nuria: Cool, thanks [17:48:35] csteipp: is the same hash that was used before (os.urandom), just stored in etcd instead of in memory [17:48:46] so it can be shared by multiple processes [17:50:07] ottomata: back to libjars, will report accordingly, cc madhuvishy [17:50:25] k [17:50:29] GOOD LUCK [17:50:30] :) [17:53:36] nuria: no hangout in the bot mneeting :) [17:53:53] joal:ains.. [17:54:25] leila: alas, i'm working from home and cannot use both rooms :P [17:54:34] mm, same here. [17:54:42] then maybe nuria should cancel the other room, too? [17:54:44] a-team: IP hash frequencies look good! what's cool about this now too is that the hashed IPs stay the same even during eventlogging restarts, whereas before they'd reset whenever that happened [17:54:45] I guess no one is in the office [17:54:46] :-( [17:54:57] i am! [17:54:58] leila: i cancel 6th floor [17:55:01] oh for meeting [17:55:03] nawww [17:55:06] nm [17:55:08] ottomata: :) [17:55:11] leila: i should cancel 3rd floor too? [17:55:46] madhuvishy: do you know of a EL schema that is inactive? [17:55:55] Analytics [17:56:21] hm ok, cool [17:56:22] perfect. [17:56:22] https://meta.wikimedia.org/wiki/Schema:Analytics [17:56:32] ottomata: data in db looks good [17:56:43] I'll keep an eye on the dashboards just to be sure [17:57:10] madhuvishy: ok if I blacklist this schema from eventlogging-valid-mixed? 
it will then not go into MySQL [17:57:16] i guess I could blacklist temporarily [17:57:28] i'm going to blast data into the system with this schema [17:57:48] ottomata: it's just a test schema so you can do whatever, temporary or not [17:57:52] ottomata: no one uses this [17:57:54] yeah [17:58:08] ok [17:58:25] well ok, i will blacklist and leave it so, but just remember if you test with it in the future that it wont' go into mysql [17:59:16] ottomata: that's fine - we added it to the list of tables to delete permanently - so hopefully no one will use it [17:59:39] I'll add this to the talk page, just in case someone gets super confused :) [18:00:07] k [18:00:08] thanks [18:00:12] milimetric: it says obsolete etc [18:00:32] right, but just in case someone else wants to use it to test in the same way we are [18:00:39] and then gets confused when events don't go into sql [18:00:55] milimetric: right, okay [18:01:24] feeding the archives / leaving breadcrumbs is something I try to be better at since I worked with Christian :) [18:01:31] I've gotta run out for a bit, brb [18:33:09] mforns_gym: simple chnage for you to look at whenever [18:33:09] https://gerrit.wikimedia.org/r/#/c/242634/ [18:33:10] no hurry [18:48:29] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1690491 (JAllemandou) Tilman: I confirm the discrepancy we observe comes from a difference in the computation of the pageview boolean, and not the sampling (see below).... [18:58:05] Hi a-team ! [18:58:13] hey joal [18:58:14] Need to run, will see you tomorrow ! [18:58:23] good night :) [18:58:30] Thanks for very interesting meeting nuria and madhuvishy :) [18:58:47] nuria: Please let me know if you find the libjar siolution [18:59:35] nuria: an idea could be that package com.linkedin.camus is already existing in a jar, and therefore is not looked after in another for the example subpackage ... 
But it's a wild guess [19:00:44] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0] [19:02:34] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [19:03:54] PROBLEM - Throughput of event logging events on graphite1001 is CRITICAL: CRITICAL: 21.43% of data above the critical threshold [600.0] [19:05:53] Analytics-Cluster: Trouble with access to hue - https://phabricator.wikimedia.org/T114292#1690557 (Krenair) [19:09:03] RECOVERY - Throughput of event logging events on graphite1001 is OK: OK: Less than 15.00% above the threshold [500.0] [19:09:29] Analytics-Backlog, Analytics-Cluster: Trouble with access to hue - https://phabricator.wikimedia.org/T114292#1690567 (madhuvishy) a:madhuvishy>Ottomata [19:09:53] Analytics-Backlog, Analytics-Cluster: Trouble with access to hue - https://phabricator.wikimedia.org/T114292#1690550 (madhuvishy) @Ottomata Could it be that his LDAP account needs to be manually synced? [19:14:26] nuria: I ran the gist that you pasted [19:14:33] and it seems like the job ran for me [19:14:35] but [19:14:42] it says [CamusJob] - Discarding topic (Decoder generation failed) : test [19:14:52] so nothing got written [19:17:31] madhuvishy: but didn't you get a classnotfound exception? [19:17:39] nuria: Nope [19:17:43] waittt [19:17:54] the hadoop job ran and the logs say the job finished [19:18:08] but check the local log [19:18:18] log_camus_avro_test.txt [19:18:19] ? [19:18:20] yes [19:18:27] yup that's the one [19:18:30] let me paste [19:18:50] nuria: aaah yes [19:18:58] it does say decoder not found [19:19:01] madhuvishy: good, well not really [19:19:04] when i scroll up [19:19:09] madhuvishy: but consistent [19:19:19] ok, is joal arround still? 
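For reference while this is being debugged: `-libjars` is a Hadoop generic option, so it only takes effect when it appears after the main class and before the job's own arguments, and only if the tool honors generic options (hence the Camus jar modification mentioned above). A sketch of the invocation in Python, with the jar and properties file names as placeholders rather than the real paths from the gist:

```python
# Hypothetical reconstruction of the hadoop command under test; the jar
# and properties file names are placeholders, not real paths.
def camus_command(schema_jar: str, props: str) -> list:
    return [
        "hadoop", "jar", "camus-example.jar",
        "com.linkedin.camus.etl.kafka.CamusJob",
        # Generic option: ship an extra jar (e.g. one containing the
        # schema registry classes) to every task's classpath.
        "-libjars", schema_jar,
        # Camus reads its job configuration from a properties file.
        "-P", props,
    ]

cmd = camus_command("schemas.jar", "camus.properties")
```

Note that even with the jar on the classpath, the fully qualified class name configured for the registry still has to match the package inside the jar, which turns out to be the second failure below.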
[19:19:34] nuria: nope, he had to leave [19:20:11] ok, let's try to see if ottomata is there [19:21:03] cool, i'm poking around too to see what's going on [19:21:35] so, the -libjars is working (as in the hadoop job swallows the option) [19:21:55] madhuvishy: but the jar is not being loaded [19:22:10] madhuvishy: let me check for typos one more time [19:22:19] right [19:22:34] andrew mentioned yesterday may be it needs to be loaded by all workers [19:22:41] madhuvishy: waitttt [19:22:56] madhuvishy: i think i know what it is [19:23:10] oh? [19:23:42] madhuvishy: ya, the package must be wrong this time, cause libjars is working [19:25:12] madhuvishy: yes, trying again 'DummySchemaRegistry' had the wrong package [19:25:47] madhuvishy: after i modified the camus jar to get -libjars ... things should work [19:25:52] madhuvishy: let's see [19:25:55] okay [19:26:49] madhuvishy: batcave? [19:27:11] nuria: yup joining [19:37:17] ottomata: can you let yuvi in? :D [20:11:18] ottomata: there? [20:11:46] ottomata: is there a way we can flush the test topic on kafka? [20:35:50] Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 2 others: Beta Cluster EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#1690997 (DStrine) [20:42:59] ottomata: what is json encoded avro? [20:44:38] leila: ha! i needed help myself a few mins ago [20:44:41] i forgot my card today too [20:44:45] just made it back in [20:44:50] nuria: flush? [20:44:56] ottomata: ;-) [20:45:09] ottomata: as in 'delete all messages' [20:45:23] ah [20:45:23] no [20:45:25] no way [20:45:48] hm, maybe. could mayyybye alter topic configs and set a short retention time [20:46:03] but, not realy woth it [20:46:09] nuria: can you just configure to consume from latest? [20:46:24] so you don't consume old data that is bad format? 
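The "json encoded avro" question that comes up below can be unpacked: the default Camus decoder expects each Kafka message to be the record's fields serialized as plain JSON text (matching the Avro schema's field names), not Avro binary. A minimal stdlib illustration using the exact test message from this session:

```python
import json

# The message sent during this test session: a JSON object whose fields
# mirror the (assumed) Avro schema, rather than Avro binary encoding.
payload = '{"id":123456,"name":"pepito perez", "muchoStuff":{"a": "1"}}'

# What a JSON-expecting decoder does with it:
record = json.loads(payload)

def looks_like_json(raw: bytes) -> bool:
    """Avro *binary* for the same record would fail this check outright."""
    try:
        json.loads(raw.decode("utf-8"))
        return True
    except (UnicodeDecodeError, ValueError):
        return False
```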
[20:46:36] ottomata: yeah that's what we tried [20:46:46] but it looks like a different problem [20:47:12] they want us to send "json encoded avro" and i am not sure what that means [20:47:56] who is they? [20:49:29] ottomata: these people [20:49:30] https://groups.google.com/forum/#!topic/camus_etl/ipmjc02PrBY [20:49:49] the error they report here is the exact same one we get [20:51:28] !!!!! [20:51:28] The decoder by default expects json encoded messages coming from kafka, not avro [20:51:31] !!!! :? [20:51:35] YES [20:51:46] and i dont understand what that is even supposed to mean [20:53:14] oh, i know what it means, still reading.. [20:53:17] ottomata: {"id":123456,"name":"pepito perez", "muchoStuff":{"a": "1"}} is what we tried to send [20:55:41] oh you are trying the dummylog too [20:55:53] yes [20:56:06] ok, looking... [20:59:26] ottomata: ok, some progress today, we can launch a job and pass it a third party jar that contains schemas: https://gist.github.com/nuria/833fef6a74574125a3fc [21:00:25] ottomata: now, we just need to have a dummy schema registry or some registry that we can use to validate messages, the current example on camus is failing but i think we can fix it for reference [21:00:42] yes reading some code now. [21:02:26] yall wanna batcave? [21:02:28] nuria: , madhuvishy [21:02:29] ? [21:03:07] hey ottomata :] when you finish this discussion, do you have 5 mins to talk? [21:03:11] yes [21:03:12] oh [21:03:13] NO [21:03:13] ah [21:03:16] xD [21:03:17] RFC meeting is now, want to join that [21:03:18] sorry yall. [21:03:27] np [21:04:10] oh, mforns nm, that is just an IRC meeting? [21:06:35] mforns: what's up? [21:07:56] ottomata: nuria has meeting with Kevin [21:08:34] ottomata: where is RFC meeting happening? 
[21:09:13] in #wikimedia-office [21:10:13] ottomata, it's about the logstash puppet change [21:11:05] I should apply the logstash::eventlogging role to the deployment-logstash2 instance in the deployment-prep project [21:12:17] is this to be done in here? https://github.com/wikimedia/operations-puppet/blob/production/hieradata/labs/deployment-prep/host/deployment-logstash2.yaml [21:12:50] naw, i think here mforns [21:12:50] https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=configure&instanceid=8c2fe47d-6ea4-4faa-90f9-83a2baee3bae&project=deployment-prep®ion=eqiad [21:12:57] scroll down, see logstash roles checked [21:13:06] mmmmmmmm [21:13:25] i can add the role to that list if you like [21:13:34] role::logstash::eventlogging yes? [21:13:42] ok, or tell me how to do it [21:13:48] not sure if you can, try: [21:13:51] https://wikitech.wikimedia.org/wiki/Special:NovaPuppetGroup [21:13:54] you have to have super powers [21:13:57] yes role::logstash::eventlogging [21:13:58] scroll down, add class [21:14:17] oh [21:14:17] mforns: i don't think yo have super powers, so i am adding [21:14:31] no, I'm regular [21:14:36] the regular guy [21:14:43] :] [21:14:48] done. [21:14:52] thx [21:14:55] mforns: answering bd808's question about multiple consumers [21:15:06] aha [21:15:22] i don't know how the logstash kafka plugin works [21:15:29] but i think running on multiple hosts will cause duplicates, right? [21:15:38] unless there is some auto balancing stuff [21:15:46] plus, EventError is a single partition topic [21:15:54] aha [21:15:55] so, unless we increase partitions, you can't parallelize those [21:16:16] ok, so I should load the role only in one of the machines [21:16:20] yes, i think so [21:16:37] bd808, which one should I, any preference? [21:16:46] ja hm, not sure the best way to do that, maybe conditional [21:16:57] if $::hostname == 'logstash1001', maybe? 
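The single-host approach discussed here (one consumer only, because EventError is a single-partition topic and multiple logstash readers would duplicate it) can be sketched as a hostname guard inside the shared node block. This is a hypothetical site.pp fragment, not the actual patch; the node regex and host choice follow the conversation below:

```puppet
# Hypothetical sketch: keep one node block for all three logstash
# hosts, but apply the EventLogging reader on only one of them, since
# the single-partition EventError topic cannot be parallelized and
# extra consumers would produce duplicates.
node /^logstash100[1-3]\.eqiad\.wmnet$/ {
    if $::hostname == 'logstash1003' {
        include role::logstash::eventlogging
    }
}
```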
[21:16:58] dunno [21:17:31] node /^logstash1001\.eqiad\.wmnet$/ { [21:17:36] node /^logstash100[2-3]\.eqiad\.wmnet$/ { [21:18:15] mforns: we should look at the current server load and see if there a "best" candidate. 1001 would be the worst. It is already a SPoF for apache2+hhvm logs via syslog [21:18:24] aha [21:18:42] bd808: whichever you choose, should he just make that a conditional in the node blck? [21:18:50] if hostname ... [21:18:55] that's probably easiest [21:18:56] role logstash::eventlogging [21:18:56] ? [21:19:11] I think we have done things like that before in site.pp [21:19:24] aha [21:20:11] can't we just use regexp like it's done in node /^logstash100[1-3]\.eqiad\.wmnet$/ { [21:20:15] ? [21:21:37] oh I see, you mean within the section, ok [21:21:39] mforns: yes [21:21:46] probably not worth separating out the nodes [21:21:52] cool [21:22:20] btw, I added a nitpicky comment in input/kafka.pp [21:22:20] bd808, so 1002 then? [21:22:27] cool! [21:24:10] ottomata, I was looking for zookeeper_url in labs realm and couldn't find it [21:25:20] mforns: 1003 looks like it is taking less traffic right now [21:25:21] ottomata: I think we can merge https://gerrit.wikimedia.org/r/#/c/231574/ then. You should be able to +2 it, alex gave the go-ahead after that last fix [21:25:51] you said I could just use kafka::config::zookeeper_url, that it would use "deployment-zookeeper01.deployment-prep.eqiad.wmflabs:2181/kafka/deployment-prep" when in labs, but I could not find it in puppet, am I missing something? 
[21:25:59] ottomata, ^ [21:26:09] bd808, 1003, cool thx [21:27:00] ottomata: wait :) maybe we need to wait for the new aqs group [21:27:08] I'll ping you then or alex tomorrow [21:30:18] mforns: its in hiera [21:30:28] https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep [21:30:30] ottomata, I see [21:31:09] thx [21:37:14] ottomata, btw, when I try to configure https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-logstash2.deployment-prep.eqiad.wmflabs, it says "The specified resource does not exist." [21:43:25] hmmm [21:43:27] ? [21:43:37] this link doesn't wokr? [21:43:37] https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=configure&project=deployment-prep&instanceid=8c2fe47d-6ea4-4faa-90f9-83a2baee3bae®ion=eqiad [21:43:52] ottomata, no, same message [21:44:05] maybe I don't have permits to configure that [21:44:07] weird, mforns maybe log out and log back in [21:44:08] maybe not. [21:44:11] ok [21:44:11] want me to just check the box? [21:44:44] ottomata, mmm [21:44:58] the code is not there yet [21:45:04] ok [21:45:05] right. [21:45:21] bd808: ^ mforns cannot configure deployment-prep instances, you can help? [21:46:40] ottomata, bd808, I pushed the changes to gerrit, if you think that's OK, I'll pull puppet from deployment-logstash2 and apply the role in the machine [21:46:53] yeah. I can help at the top of the hour when the rfc meeting is done [21:46:54] we want to cherry-pick that patch, apply it and see what happens right? [21:47:04] aha, cool [21:47:27] let me know when you have some time, thx [21:53:03] Analytics-Tech-community-metrics, Phabricator, DevRel-November-2015: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#1691218 (Krenair) [22:02:28] dear beloved a-team, do you guys have use cases lined up for $real_hardware in labs. i.e. 
managing a box in labs like a vm that is hardware for mucho disk or mem or cpu [22:02:44] I'm trying to collect some actual use cases before we design a solution, seems sensible [22:05:17] chasemp: I think halfak has a bunch of those types of cases [22:05:36] o/ [22:05:38] * halfak reads [22:05:52] yeah. [22:05:56] So. I have two. [22:06:02] chasemp: we'd like a copy of the hadoop cluster we have in prod [22:06:16] so we can test, but nothing else huge. Most of the other stuff we do in labs is dashboards and those can use tiny instances [22:06:30] but that is just functional testing [22:06:31] not load testing [22:06:34] milimetric: how many boxes would that be? [22:06:36] so vms are fine, ja? [22:06:37] mforns: If my irc stays working, I have some time to help now. [22:06:48] bd808, cool [22:07:09] bd808, have you seen the latest changes? [22:07:16] are you ok with them? [22:07:33] https://gerrit.wikimedia.org/r/#/c/241984/ [22:07:41] * bd808 looks [22:10:52] mforns: will $role::analytics::kafka::config::zookeeper_url have the correct data in the beta cluster? [22:11:35] bd808, ottomata says this variable is stored in hiera and it has the correct value depending on the realm [22:11:52] perfect [22:12:22] halfak: if I make a ticket would you mind fleshing out the ideas you have? [22:12:29] Yes [22:12:32] Rather no [22:12:36] I would not mind ;) [22:13:24] right :) cool thanks [22:13:33] mforns: l cherry-pick to the beta cluster puppet master, apply it on deployment-logstash2 and then we can see if logstash starts up or not :) [22:13:47] *I'll [22:13:55] bd808, cool [22:15:15] bd808, I can send some invalid events to beta cluster's eventlogging instance to generate some contents in the topic in question [22:16:04] doing battle with puppet on logstash2 at the moment. :/ [22:23:03] mforns: ok. 
The role is applied and the generated logstash config looks like: https://phabricator.wikimedia.org/P2127 [22:23:12] I just realized that we will need a bit more logstash config to go with this [22:23:56] bd808, aha, what would we need? [22:25:28] we are going to need config to add the "es" tag to these messages to route them into Elasticsearch. We will probably end up wanting to do more transforms too once we see what the records look like. [22:25:50] So we need a filter file similar to https://github.com/wikimedia/operations-puppet/blob/production/files/logstash/filter-logback.conf [22:26:05] I can write it up and add it to the patch for you [22:26:07] easy peasy [22:26:39] aha, ok, if you're busy I can do this, as well [22:27:38] I've got time to help. No worries [22:27:45] ok bd808 thanks! [22:28:10] chasemp: Andrew's right, just functional testing, we don't need anything for load testing in labs that I know of [22:29:54] PROBLEM - Throughput of event logging events on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [600.0] [22:30:11] mmm [22:34:51] milimetric, can you see eventlogging metrics in graphite? [22:35:06] mforns: ottomata and I are doing some load testing :) [22:35:15] oh! ok :] [22:35:20] heheh [22:35:22] yes we are doing stuff [22:35:26] i thought i scheudled downtime for that [22:35:26] coool [22:35:41] I think that throughput is much higher now :) [22:35:49] woohoo [22:36:31] anyway, is this a reason why raw and valid metrics wouldn't show up? [22:36:36] Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 2 others: Beta Cluster EventLogging data is disappearing? 
- https://phabricator.wikimedia.org/T112926#1691338 (DStrine) [22:37:45] aj, never mind, I forgot, now we are looking at grafana [22:38:15] RECOVERY - Throughput of event logging events on graphite1001 is OK: OK: Less than 15.00% above the threshold [500.0] [22:40:51] Analytics-Backlog, Analytics-EventLogging, Fundraising-Backlog, Unplanned-Sprint-Work, and 2 others: Promise returned from LogEvent should resolve when logging is complete - https://phabricator.wikimedia.org/T112788#1691357 (Ejegg) Open>declined Seems to be a no-go in EventLogging, we'll jus... [22:41:21] mforns: ok. in theory we are ready to catch the error events. can you make some happen? [22:41:29] bd808, sure! [22:41:57] bd808, I've seen the new changes, awesome [22:42:47] mforns: we haz events! https://logstash-beta.wmflabs.org/#dashboard/temp/AVAga-y3a1EjumVdqYuC [22:43:09] now to tweak the formatting a bit in that new filter [22:43:21] woohoo! [22:44:47] !log testing Event Logging by sending large amounts of events up to 600k to see how fast they process [22:48:04] PROBLEM - Throughput of event logging events on graphite1001 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [600.0] [22:48:57] bd808, I've sent events with json parsing errors, too. I think a good improvement would be having the event.message field in the message column [22:49:10] *nod* [22:49:18] probably you alread knew, ok :] [22:50:16] Analytics-EventLogging, Fundraising-Backlog: Nested EventLogging data doesn't get copied to MySQL - https://phabricator.wikimedia.org/T112947#1691416 (Nuria) Let us know if this is still a problem . Thus far we have not supported nested schemas but if workarround does not suffice we can do needed changes. 
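One idea that comes up in the exchange below is regexing the schema name out of the raw event when enriching EventError records, covering both the plain-JSON form and the still-percent-encoded form from the query string. A sketch using the two patterns quoted in the discussion (the sample payloads are illustrative; real capsules carry more fields):

```python
import re

# Patterns from the discussion: one for plain JSON, one for payloads
# that are still percent-encoded as they arrived on the query string.
PLAIN = re.compile(r'"schema":"([^"]+)"')
ENCODED = re.compile(r'%22schema%22%3A%20%22([^%]+)%22')

def extract_schema(raw: str):
    """Best-effort schema name extraction from a raw event string."""
    for pattern in (PLAIN, ENCODED):
        m = pattern.search(raw)
        if m:
            return m.group(1)
    return None  # truncated payloads may have lost the schema entirely
```

The regex approach works even when the JSON is malformed or truncated mid-payload, which is exactly the case where `json.loads` on the raw event would fail; the trade-off, also noted below, is that a payload truncated before the schema field yields nothing either way.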
[22:50:19] mforns: https://gerrit.wikimedia.org/r/#/c/241984/8/files/logstash/filter-kafka-eventlogging.conf,unified [22:50:52] so event.message will be message, recvFrom goes into host and ERROR for the level [22:51:35] this is perfect [22:51:37] we don't really have the wiki value. it could be buried in the rawEvent but getting it out seems yucky [22:51:53] no, there are many schemas that do not have it [22:52:12] bd808: mforns, there are some errored events that we could probably parse more good info out of [22:52:20] especially if the problem is an invalid schema, and not invalid json [22:52:31] ok. I'll reapply and let you know when I'm ready for another batch of errors [22:52:32] schema would be particularly useful [22:52:36] we should make a ticket! [22:53:12] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1691440 (Tbayer) >>! In T108925#1690491, @JAllemandou wrote: > Tilman: I confirm the discrepancy we observe comes from a difference in the computation of the pageview bo... [22:53:21] bd808, I was wrong, the wiki field is in the eventCapsule, hence in all schemas [22:53:34] ottomata, yes schema would be sweet [22:53:57] we can parse that out of event.rawEvent easily I think right? [22:54:15] there is some number and then the next value is the schema name? [22:54:21] bd808, if the error is schema validation (and not json parsing) yes [22:54:21] !log sending 1200k events to Event Logging now, to see how fast they're processed [22:55:08] mforns: totally can do it, we just need to add more fields to EventError [22:55:17] and then populate them during error handling if we can [22:56:10] bd808, ottomata, we can also regexp for the schema like '"schema":"([^\"]+)"' [22:56:26] yeahhhhhh, could. 
maybe worth trying, I guess
[22:56:45] we can parse the json too (assuming it's not malformed)
[22:56:50] aha
[22:56:58] just thought it would be faster
[22:57:02] yeah, depends, sometimes the data is truncated
[22:57:10] and regexing for schema would work in both cases
[22:57:12] so meh?
[22:57:14] either way!
[22:58:18] if the data is truncated, we should look for '%22schema%22%3A%20%22([^%]+)%22'
[22:58:27] though
[22:59:13] but, if the data is truncated, it may not contain the schema...
[22:59:21] mforns: hit me with some more errors. We will be able to see them at https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/eventlogging
[22:59:29] bd808, OK
[22:59:48] sent 7 more json parsing errors
[23:00:28] hmm. message isn't getting filled in
[23:00:31] Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1691516 (Nuria) In the case of this schema: https://meta.wikimedia.org/wiki/Schema:Centra...
[23:00:42] and 7 more validation errors
[23:00:47] so that rename doesn't work. I'll do something else
[23:01:09] bd808, maybe it should be _message?
[23:01:14] instead of message?
[23:01:42] no, makes no sense
[23:02:47] Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1691538 (Nuria) You can keep track of client-side length errors on dashboard: https://graf...
[23:04:47] I think it's more likely that renaming a nested element doesn't work
[23:05:19] bd808, the format in the rename clause is strange though, it has 4 elements
[23:05:53] does it actually work like this? ["event.message", "message", "recvFrom", "host"]
[23:05:59] well it worked for the host
[23:06:05] their syntax is wacky.
that's really a hash with 2 from => to pairs
[23:06:10] ok
[23:06:20] crappy DSL
[23:06:32] ready for another burst
[23:06:41] running
[23:07:22] hmmm... that didn't work either
[23:07:46] I'm going to have to resort to reading documentation :)
[23:07:58] hehe
[23:11:04] RECOVERY - Throughput of event logging events on graphite1001 is OK: OK: Less than 15.00% above the threshold [500.0]
[23:14:30] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1691563 (Ottomata) Dan and I just pushed Analytics schema events through this system. During our largest test, we posted around 9000 valid events per second through by using ab to hammer bit...
[23:14:41] a-team: https://phabricator.wikimedia.org/T102225#1691563
[23:14:42] COool!
[23:15:49] hey ottomata, madhuvishy, do we put our HTTP response headers into hadoop? https://phabricator.wikimedia.org/T113672 notes that we don't log API failures, but we do return a MediaWiki-API-Error: HTTP header
[23:15:51] : )
[23:16:20] :O
[23:16:27] spagewmf: only specific headers that are considered by varnish get through, like x-analytics
[23:16:43] but there'd have to be custom code to add a header that we don't already grab
[23:17:10] as a matter of standard, we've been adding whatever we need on the analytics side to x-analytics instead of adding more headers
[23:18:19] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1691583 (Tbayer) >>! In T108925#1660700, @JAllemandou wrote: ... > Now some more interesting thing: spiders tagging. > > {F2624413} > > Here we can see a difference be...
[23:19:59] bd808, maybe we can try: rename => ["[event][message]", "message"], as done (inverted) here: https://groups.google.com/forum/#!topic/logstash-users/Fmqk3mK-2mY
[23:20:21] yup.
patch is almost ready
[23:21:03] the magic was documented at https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#logstash-config-field-references
[23:21:29] aha
[23:22:05] Analytics, MediaWiki-API: api.log does not indicate errors and exceptions - https://phabricator.wikimedia.org/T113672#1691610 (Spage) If - we analyzed api.log in Hadoop - and implemented {T113817} - and the Hadoop webrequest table captured the custom `MediaWiki-API-Error: `//error-code// HTTP header, the...
[23:22:21] milimetric: thanks ^ I noted the possibility of doing this in the bug.
[23:23:51] mforns: can I get another round of errors?
[23:23:55] sure
[23:23:55] Oh java thingy, why u crash? hive (wmf_raw)> select x_analytics from webrequest WHERE year=2015 AND month=9 AND day=30 AND hour=12 LIMIT 10;
[23:23:59] FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception nulljava.lang.NullPointerException
[23:24:33] mforns: w00t. much nicer
[23:24:46] oooh
[23:25:38] bd808, cool!
[23:26:17] I smell python here :) "u'14' is not of type u'integer'"
[23:26:43] hehehehe
[23:27:07] one question, can I filter by text within a field, for example rawEvent?
[23:27:14] this way I could filter by schema
[23:28:20] I retried that request in `use wmf;` and it printed a bunch of '-'. So I tried hive (wmf_raw)> select x_analytics from webrequest WHERE year=2015 AND month=9 AND day=30 AND hour=12 LIMIT 10;
[23:28:31] yeah.
if you want to isolate on the rawEvent field then your query would be rawEvent:"something"
[23:28:41] and got 'org.apache.hadoop.security.AccessControlException: Permission denied: user=spage, access=WRITE, inode="/user":hdfs:hadoop:drwxrwxr-x
[23:30:57] mforns: it uses Elasticsearch's "query string syntax" for searches -- https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax
[23:31:06] * bd808__ is having bouncer issues today
[23:31:16] spagewmf: why wmf_raw?
[23:31:18] bd808__, thx
[23:31:48] spagewmf: are you in the analytics-privatedata-users group now?
[23:32:57] naw, you aren't so you won't be able to read this stuff
[23:32:57] ottomata: wmf_raw because I thought that might have the raw HTTP header (and it reminds me of WWF Raw wrestlemania 2007 :-) ). I'll check my groups. There's no /mnt/hdfs/user/spage directory
[23:33:10] spagewmf: wmf has everything raw has + more
[23:33:15] and is stored in a more efficient format
[23:33:17] you shouldn't need to use raw
[23:33:22] also...
[23:33:46] wmf.webrequest has wrong file permissions.
[23:33:55] you shouldn't be able to read it at all in the groups you are in right now :o
[23:34:48] `groups`(1) returns "wikidev statistics-privatedata-users", not analytics-xxx
[23:35:35] Analytics-Cluster: Fix file permissions for wmf.webrequest data - https://phabricator.wikimedia.org/T114327#1691644 (Ottomata) NEW a:Ottomata
[23:35:40] yeah, spagewmf!
[23:35:44] it is wrong on our side
[23:35:49] https://phabricator.wikimedia.org/T114327?workflow=create
[23:39:50] bd808__, I think to get the schema, it will be a bit complicated, no? I can only imagine a hacky way to get that with regexps from the rawEvent
[23:40:16] we could populate the schema field in EventLogging code, when producing to kafka
[23:41:06] ottomata: ah, I'm in the three-day waiting period for getting entrée into le analytics-privatedata-users club, T114150. So exciting!
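[Editor's note: the "regexps from the rawEvent" approach mforns mentions above could be sketched as below. This is a hypothetical illustration; the payload shapes and the helper name `extract_schema` are assumptions, not actual EventLogging code. It combines the two regex ideas from earlier in the log: plain `"schema":"..."` matching and the URL-encoded `%22schema%22%3A` case.]

```python
import re
from urllib.parse import unquote

# Matches "schema": "Name" in a plain-JSON rawEvent payload.
SCHEMA_RE = re.compile(r'"schema"\s*:\s*"([^"]+)"')

def extract_schema(raw_event):
    """Best-effort schema name extraction from a raw event string.

    Works on plain JSON (even malformed JSON, since it never parses),
    and falls back to URL-decoding for percent-encoded payloads.
    Returns None if truncation cut the payload off before the schema field.
    """
    match = SCHEMA_RE.search(raw_event)
    if match:
        return match.group(1)
    # Truncated/encoded case discussed above: decode %22schema%22%3A... first.
    match = SCHEMA_RE.search(unquote(raw_event))
    return match.group(1) if match else None
```

[As noted in the conversation, this works whether the JSON is valid or not, but still returns nothing if truncation removed the schema field itself.]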
[23:41:32] mforns: it would probably be nicest if you sent it in from the EL python code
[23:43:37] bd808__, yes, agree.
[23:45:06] ottomata, do you think we can deploy EL_EventError in logstash without the "schema" field? we could always filter by schema using: type:eventlogging AND rawEvent:*SchemaName*
[23:45:31] and create a new task for this, adding the schema from eventlogging code
[23:58:44] kevinator, yt?
[23:58:59] mforns: yes, here
[23:59:19] kevinator, do you have 5 mins for questions on the quarterly review presentation?
[23:59:37] yes... batcave?
[23:59:41] sure!
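[Editor's note: for reference, the Logstash rename form the conversation converged on would look roughly like the sketch below. This is illustrative only, not the actual filter-kafka-eventlogging.conf from the gerrit patch; the `add_field` line for the ERROR level is an assumption based on the 22:50:52 message.]

```
filter {
  mutate {
    # The "4 elements" form is really two from => to pairs.
    # "event.message" (dotted) does not address the nested field;
    # "[event][message]" (Logstash field-reference syntax) does.
    rename => ["[event][message]", "message", "recvFrom", "host"]
    # Assumed: tag these events with ERROR for the level.
    add_field => { "level" => "ERROR" }
  }
}
```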