[00:23:30] ottomata: yt?
[00:35:57] nuria: hiya :)
[00:36:10] ottomata: working still?
[00:36:26] ottomata: asking as it is kind of late on your side of teh continent
[00:36:28] *the
[00:36:37] nope, but can help for a min
[00:37:20] ottomata: it can wait until tomorrow, remember that scala jar for testing?
[00:37:36] oh ja, sorry
[00:37:43] forgot about that today
[00:37:56] nuria, can you run mvn compile with that? if so, i will make it so things work
[00:38:55] ottomata: run mvn compile with my local copy of the jar, you mean?
[00:39:15] no, with uh, the dep in your pom and without your local copy
[00:39:38] i want you to trigger an attempt to dl from archiva
[00:39:41] (not yet, gotta do something)
[00:40:10] ahh, sorry
[00:40:56] ottomata: k, you let me know
[00:41:55] ok, try now nuria
[00:42:27] ottomata: done
[00:42:28] [ERROR] Failed to execute goal on project refinery-job: Could not resolve dependencies for project org.wikimedia.analytics.refinery.job:refinery-job:jar:0.0.10-SNAPSHOT: Failure to find org.scalatest:scalatest_2.10:jar:2.2.4 in http://archiva.wikimedia.org/repository/mirrored/ was cached in the local repository, resolution will not be reattempted until the
[00:42:28] update interval of wmf-mirrored has elapsed or updates are forced -> [Help 1]
[00:44:49] hmmm
[00:44:51] really?
[00:45:13] uhhh, can you delete ~/.m2/repository/org/scalatest and try again?
[00:45:54] i'm watching logs
[00:47:26] nuria: ?
[00:50:51] ottomata: yes sorry
[00:51:21] better?
[00:51:27] ottomata: now downloading
[00:51:40] cool
[00:52:05] it had cached the http url i guess?
[00:52:09] instead of https
[00:54:36] cooool good!
[00:54:45] ottomata: grasiassss
[00:54:54] nuria all good?
[00:55:08] cause dinner time!
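(Editor's note on the 00:42:28 error: Maven remembers a failed download in the local repository, which is why deleting ~/.m2/repository/org/scalatest let the next run retry; running mvn with -U is the usual way to force updates without deleting anything. Below is a minimal Python sketch for spotting such cached failures, assuming Maven's usual behaviour of leaving `*.lastUpdated` marker files next to artifacts it could not fetch. The function name `find_failed_downloads` and the example path are illustrative only and are not part of refinery.)

```python
from pathlib import Path


def find_failed_downloads(repo_root):
    """Return sorted paths of Maven '*.lastUpdated' marker files under repo_root.

    Assumption: a '*.lastUpdated' file marks an artifact whose download
    Maven attempted and cached as failed.
    """
    root = Path(repo_root).expanduser()
    if not root.is_dir():
        # Nothing to inspect, e.g. the directory was already deleted by hand.
        return []
    return sorted(p for p in root.rglob("*.lastUpdated") if p.is_file())


if __name__ == "__main__":
    # Hypothetical usage: look at the directory that was deleted in the chat.
    for marker in find_failed_downloads("~/.m2/repository/org/scalatest"):
        print(marker)
```

(Listing the markers first shows what is cached before anything is removed; it does not change the repository.)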
[00:55:25] ottomata: deps wise we should be, I do not think we need more than that for scala testing; joel's stuff for aaron requires a ton more things
[00:59:32] k
[00:59:35] byyyee
[10:31:41] (PS1) KartikMistry: Update for CX deployment 20150421 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/205573
[10:41:32] Analytics-Tech-community-metrics: "Who contributes code" page metrics are not updating - https://phabricator.wikimedia.org/T95166#1223989 (Acs) http://korma.wmflabs.org/browser/who_contributes_code.html fixed to include updated information. Once we update in korma all the dashboard, this issue could be rev...
[14:29:08] holaaaa
[14:29:13] Hi nuria
[14:40:17] joal: o/
[14:40:21] allo
[15:01:21] ottomata: there is another maven plugin we need to run scala tests:
[15:01:47] https://www.irccloud.com/pastebin/MjEnmj4P
[15:02:16] ottomata: http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin
[15:04:29] ok, nuria, gimme a few and then try to compile with that in pom...
[15:04:30] or test
[15:04:32] i guess?
[15:04:36] i tell you when
[15:07:15] Analytics-Tech-community-metrics, ECT-April-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1224592 (Aklapper)
[15:07:53] Analytics-Tech-community-metrics, ECT-April-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1167361 (Aklapper)
[15:09:59] nuria: try now if you can
[15:11:09] ottomata: working!
[15:11:10] Analytics-Tech-community-metrics, ECT-April-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1224625 (Aklapper)
[15:11:38] lemme know when it is done
[15:11:54] ottomata: the mvn test? done now
[15:12:06] k
[15:15:36] Analytics-Tech-community-metrics, ECT-April-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1224648 (Aklapper)
[15:19:52] (PS2) KartikMistry: Update for CX deployment 20150421 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/205573
[15:36:31] (CR) KartikMistry: [C: 2] Update for CX deployment 20150421 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/205573 (owner: KartikMistry)
[15:42:45] Analytics-Tech-community-metrics, ECT-April-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1224723 (Aklapper)
[15:43:59] (Merged) jenkins-bot: Update for CX deployment 20150421 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/205573 (owner: KartikMistry)
[15:57:08] nuria, internet dropped for me again
[15:57:09] :(
[15:57:09] Hopefully back soon
[16:00:08] joal, looks like I won't be able to just use the jar file you provided, see http://hadoop.apache.org/docs/r1.2.1/streaming.html#How+do+I+provide+my+own+input%2Foutput+format+with+streaming%3F
[16:00:40] It looks like we'll need to build a new streaming.jar
[16:01:47] mwarf ...
[16:01:50] Indeed
[16:02:53] Hmm... I think the docs might be lying
[16:03:05] Other people on the intertubes think it works with -libjars
[16:03:09] http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/
[16:03:21] yeah, differences between hadoop 0.14 and hadoop 2.4.1 ;)
[16:03:24] halfak: --^
[16:03:43] lol
[16:04:04] Docs were for 1.2.1 o.O
[16:05:36] Looks like the equivalent docs don't exist for 2.4.1
[16:05:53] So, for -libjars: do you suspect that is local or on HDFS?
[16:06:21] if you don't tell hdfs://, probably local, otherwise ...
[16:06:42] kk
[16:21:14] ottomata: can't hear you :(
[16:21:14] internet connection sicks today
[16:29:35] nuria: love your comment on reorgs :)
[16:30:04] joal: i am old and jaded you know...
[16:30:12] jaja
[16:30:16] yeah, I unfortunately know the feeling ;)
[16:43:52] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics, Patch-For-Review: Story: WikimetricsUser reads user names in a JSON report [8 pts] - https://phabricator.wikimedia.org/T74747#1224911 (Capt_Swing)
[16:43:53] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics, Patch-For-Review: Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646" - https://phabricator.wikimedia.org/T93023#1224912 (Capt_Swing)
[16:44:09] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics, Patch-For-Review: Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646" - https://phabricator.wikimedia.org/T93023#1126852 (Capt_Swing) It does. Thanks @Nuria!
[16:50:40] Analytics, Gather Sprint Forward, Mobile-Web, Patch-For-Review: Update main menu schema to include collections for limn graphs - https://phabricator.wikimedia.org/T93690#1224938 (Jdlrobson) @bmansurov could you review this? please pretty please! :)
[16:53:48] Analytics, Analytics-Kanban, WMF-Product-Strategy: Backfill pageview data for March 2015 from sampled logs before transition to UDF-based reports as of April - https://phabricator.wikimedia.org/T96169#1224962 (Ironholds) Done.
[16:53:52] Analytics, Analytics-Kanban, WMF-Product-Strategy: Backfill pageview data for March 2015 from sampled logs before transition to UDF-based reports as of April - https://phabricator.wikimedia.org/T96169#1224963 (Ironholds) Open>Resolved
[17:55:59] Analytics, Analytics-Kanban, WMF-Product-Strategy: Backfill pageview data for March 2015 from sampled logs before transition to UDF-based reports as of April - https://phabricator.wikimedia.org/T96169#1225081 (DarTar) Thanks, dude.
[18:44:47] folks who can't make it into the infrastructure hangout: "call office phone number and dial 2002"
[18:46:25] (415) 839-6885 x 2002
[18:57:01] Analytics, Gather Sprint Forward, Mobile-Web, Patch-For-Review: Update main menu schema to include collections for limn graphs - https://phabricator.wikimedia.org/T93690#1225143 (bmansurov) I will once I get limn working locally (hopefully tomorrow).
[19:32:43] milimetric: you look totally different than I remember!
[19:33:42] I try to metamorphose every once in a while
[19:34:05] joal: have you set up llama before, or just used it?
[19:34:36] ottomata: not personally --> worked with a devops who did it
[19:34:41] getting issues?
[19:34:47] yeah, i'm very close
[19:34:51] https://groups.google.com/a/cloudera.org/forum/#!topic/impala-user/C8hHA33qjUI
[19:34:53] just posted there
[19:35:34] there was a whole lot lacking in Cloudera's documentation on how to set this up
[19:35:43] first time i've had that experience, everything else on cloudera docs is great
[19:37:08] I have seen this short name issue before, I'm pretty sure
[19:37:27] I'll ask the ops friend who worked with me tomorrow :)
[19:37:56] oh awesome, ok thank you
[19:38:12] ottomata: On my side, I run into trouble with hive udfs
[19:38:19] First time it happens that bad
[19:38:25] joal, enwiki XML on altiscale at /hdfs/user/halfak/streaming/enwiki-20150304/xml-bz2
[19:38:34] niiiiiice halfak
[19:38:43] Will launch the job
[19:38:49] ?
[19:38:53] trouble with hive udfs?
[19:38:57] yeah
[19:39:01] whatcha mean?
[19:39:25] I'll push my code tomorrow and show you (time to go babysitting now)
[19:39:30] weirdo
[19:39:38] ok
[19:39:43] have good night
[19:39:50] Thx !
[19:39:52] You too
[19:41:30] ottomata:
[19:41:37] yt?, hola
[19:42:19] nuria, question on EL consumer... can you?
[19:42:25] mforns: sure
[19:43:00] yup hey
[19:43:07] nuria, in the current code, when grouping the events by schema and revision, the code also groups by event fields
[19:43:35] mforns: after validation?
[19:43:38] thus executing a separate insertion for each different field collection
[19:43:42] nuria, yes, in the consumer
[19:43:44] mforns: right
[19:43:53] as you cannot do insert values ()
[19:44:06] if those values are not the same for every record you are inserting right?
[19:44:28] nuria, mmmm
[19:45:16] mforns: "optional" fields will not be present
[19:45:30] nuria, couldn't we give a null value for optional fields?
[19:45:41] mforns: no, cause null carries value
[19:45:47] mforns: in many schemas
[19:47:09] nuria, isn't it possible to insert "undefined" values?
[19:47:44] mforns: by inferring the ones that are not filled in by schema, no, not really, default could be "0", "null", "empty"
[19:47:56] mforns: depends on schema and column type
[19:48:21] nuria, mmmmm, this could be a reason for the problems we have
[19:48:42] mforns: on top of buffer size?
[19:48:46] nuria, maybe just fixing this, we could get a significantly better insert rate
[19:49:09] nuria, anyway... there are lots of schemas with optional values
[19:49:31] nuria, we are right now splitting inserts not only by schema but by exact set of event values
[19:49:54] nuria, if a schema has lots of optional values, we could be having lots of inserts per schema
[19:50:08] mforns: yes, but think that before, we inserted events 1 by 1 w/o throughput issues (on our end)
[19:50:16] mforns: as you can see if you backfill 1 by 1
[19:50:18] nuria, I see
[19:50:43] mforns: the batching is not for EL code, the python is totally fine
[19:50:54] mforns: and would work great like it was before 1 by 1
[19:51:27] mforns: now, the batching was done to reduce the time we lock a table and thus halt replication (ahem... oversimplifying)
[19:51:50] nuria, aha
[19:52:00] mforns: so even if 30 events per second are grouped in 3 insert statements
[19:52:08] mforns: db wise it is more efficient
[19:52:56] nuria, maybe I can add a log that prints how many inserts are done per batch
[19:53:25] mforns: also it is not that every schema has n fields that are optional and n possible types of records you can be inserting, although given the monolith the edit schema is it might very well have several permutations of that
[19:53:37] mforns: ya, logging can only help
[19:54:51] nuria, maybe in the future we can specify to the teams (EL users) that optional values will translate to NULL in the tables.
[19:55:13] mforns: mmm.. why not zero?
[19:55:14] nuria, this would maybe give us 2x, 3x more writing speed, no?
[19:55:26] mforns: depends what that optional value stands for right?
[19:56:01] nuria, yes, sure: NULL or 0 or empty string, just a default
[19:56:02] mforns: we will know that when you have the log of what is going on
[19:56:09] nuria, or force the default!
[19:56:38] mforns: cause having optional fields is fine, makes a lot of sense, on our end we need to handle it better perhaps
[19:56:53] nuria, yes! sure, we have to support optional fields
[19:57:09] mforns: but let's get some logging around it to quantify if it is really a problem
[19:57:16] nuria, but having to split them in several inserts is not good
[19:57:26] nuria, yes makes total sense
[19:57:43] mforns: yes, it is less optimal, but if you are hitting 100 tables with each batch
[19:57:52] and 10 of those are a repeat
[19:58:03] that is not your bottleneck, makes sense?
[19:58:08] nuria, totally
[19:58:33] mforns: so quantifying the "spread" will give us an idea
[19:58:46] nuria, yes, will do
[19:58:51] nuria, thanks! that clarified it a lot :]
[19:59:07] mforns: the mantra "leave code better than you found it"
[19:59:17] nuria, :]
[19:59:24] mforns: but we shouldn't try to fix it all in the same pass
[20:00:24] nuria, ok
[21:51:46] Analytics-Tech-community-metrics, ECT-April-2015: Provide list of oldest open Gerrit changesets without code review - https://phabricator.wikimedia.org/T94035#1225526 (Dicortazar) Let's start with this first 100 issues with no code review. Just a comment, I've noticed in at least one of the issues, that...
[22:14:08] Sati, did you get into geowiki?
[22:14:31] @sshouston_wmf
[22:37:31] * milimetric mili|flight
[22:38:10] halfak: Hey, we did https://gerrit.wikimedia.org/r/205773 as you suggested to test the A/B bucketing for VE – I was thinking we could do it for 24 hours and then disable, to verify. Does that work for you? When should we do this?
[22:38:22] halfak: (Not crisis-urgent.)
[22:47:08] Analytics, Ops-Access-Requests, operations: Grant Sati access to geowiki - https://phabricator.wikimedia.org/T95494#1225949 (Shouston_WMF) Open>Resolved
[23:02:21] halfak: Config patch to enable it is https://gerrit.wikimedia.org/r/#/c/205778/ and is gated on your approval (of course).
[23:02:54] James_F, that's great, but I was hoping to run a full pilot.
[23:03:10] e.g. logging fixes in and we run a 1 day test that affects users
[23:03:20] halfak: …
[23:03:37] halfak: That will need announcing.
[23:04:00] halfak: That's why it writes to a different (fake) preference so we can detect the bucketing without affecting users.
[23:04:06] halfak: I thought that was the point?
[23:04:32] https://en.wikipedia.org/wiki/Pilot_experiment
[23:05:23] I'm not here to read articles. :-)
[23:05:25] Either we run a pilot and a full study or we risk running two full studies because of flaws in the first.
[23:05:32] James_F, I used the word "pilot"
[23:05:42] And I suggested we deploy the experiment for a day.
[23:05:49] Which is what the patch does.
[23:05:51] Why don't we make it part of "the experiment"
[23:05:56] James_F, it does not
[23:06:01] It doesn't?
[23:06:03] It changes something no one will see.
[23:06:07] Yes.
[23:06:29] Which is the test of the testing framework you asked for.
[23:06:58] Not what I asked for.
[23:07:09] Also, a pref change is not "the testing framework"
[23:07:14] Which is the test of the testing framework I understood that you asked for.
[23:07:20] The testing framework includes logging.
[23:07:24] No, the A/B bucketing is the testing framework.
[23:07:33] Indeed. I should have communicated more clearly.
[23:07:36] Sorry.
[23:07:38] So…
[23:07:41] Do we agree that I used the term "pilot study"?
[23:07:45] I have no idea.
[23:07:58] OK. Allow me to review the document.
[23:08:11] I recall you explaining that you wanted to test the testing framework, as it had gone wrong in the past.
[23:08:16] Which seemed totally sensible.
[23:08:27] "This will need to account for a pilot test and potential issues discovered"
[23:08:39] But announcing a test of a test will delay us for a few weeks.
[23:08:53] by definition, doesn't testing the framework mean actually running it through its paces by running a mock experiment?
[23:09:01] in the same way that we don't call unit tests done until we've hit "run" on them.
[23:09:03] Ironholds: This is a mock experiment.
[23:09:05] No it won't. We can run the pilot test two days before we run the real one.
[23:09:15] Ironholds: But halfak wants us to run the real experiment, but only for a day.
[23:09:31] A "pilot"
[23:09:33] okay. What's the problem?
[23:09:35] That is the technical term
[23:09:41] Ironholds: Timing. :-)
[23:10:03] halfak: OK, fine, you want to run the test but call it a pilot and then run the test some more, ideally having spotted errors first.
[23:10:18] Do we stop between the two runs of the test?
[23:10:52] (Either is fine, but if we do we need to announce them separately.)
[23:11:31] James_F, yes
[23:11:37] James_F, I don't see why.
[23:11:50] halfak: We don't do cowboy releases.
[23:11:51] We start the pilot on day 1 of the experiment.
[23:11:59] We start the proper test on day 3
[23:12:10] Assuming we didn't find a problem during the pilot.
[23:12:15] Hmm.
[23:12:31] OK, we can announce that in one go, I feel. CLs will have to sign off.
[23:13:29] I'm not too worried they won't
[23:13:31] :)
[23:13:42] I've already discussed this with whatamidoing
[23:16:06] would suggest discussing it with t'others too
[23:16:13] WAMID's approach to things tends to be very, uhm.
[23:16:19] "This is a good idea, therefore sod everything else"
[23:16:26] Ironholds: Be nice.
[23:16:39] halfak: OK.
[23:16:40] what? She's very dedicated to having good ideas deployed.
[23:16:43] It's a feature, not a bug
[23:16:53] it just means she doesn't necessarily represent more cowardly community folk such as myself ;p
[23:19:20] James_F, this test will also give us a sense for what to expect from the final test.
[23:19:26] "final"
[23:19:31] --> "full"
[23:19:33] halfak: You mean in terms of data?
[23:19:55] Indeed. I'll do a full first pass on the data for the metrics I can look at after 24h.
[23:20:49] Nice.
[23:21:03] OK, in that case I should amend things. :-)
[23:22:16] James_F, thanks. :) Let me know if you or the CLs need anything from me -- e.g. a blurb on why we do pilots.
[23:28:23] halfak: That'd be helpful.
[23:41:17] James_F, on it.
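(Editor's note on the EventLogging consumer thread, 19:43 to 19:58: the logging mforns proposes, i.e. counting how many separate inserts one batch turns into, can be sketched roughly as below. This is a minimal Python sketch and not the actual EventLogging consumer code: the event layout with `schema`, `revision` and `event` keys, the function names `group_for_insert` and `inserts_per_schema`, and the sample batch are all assumptions made for illustration.)

```python
from collections import defaultdict


def group_for_insert(events):
    """Group events so that each group could share one multi-row INSERT.

    Hypothetical event layout: {"schema": str, "revision": int, "event": dict}.
    """
    groups = defaultdict(list)
    for ev in events:
        # Events only batch together when they fill the same set of columns,
        # so the sorted field names are part of the grouping key.
        key = (ev["schema"], ev["revision"], tuple(sorted(ev["event"])))
        groups[key].append(ev)
    return dict(groups)


def inserts_per_schema(events):
    """Count how many insert statements one batch needs per (schema, revision)."""
    counts = defaultdict(int)
    for schema, revision, _fields in group_for_insert(events):
        counts[(schema, revision)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Made-up batch: the second event fills an optional field the others omit.
    batch = [
        {"schema": "Edit", "revision": 1, "event": {"action": "init"}},
        {"schema": "Edit", "revision": 1, "event": {"action": "save", "page": 5}},
        {"schema": "Edit", "revision": 1, "event": {"action": "abort"}},
    ]
    print(inserts_per_schema(batch))
```

(Logging this count per batch would quantify the "spread" nuria mentions before anyone decides whether optional fields need a forced default.)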