[00:01:02] (CR) Nuria: [C: 2] Split projectcounts tests in separate test cases [analytics/aggregator] - https://gerrit.wikimedia.org/r/182663 (owner: QChris) [00:01:09] (Merged) jenkins-bot: Split projectcounts tests in separate test cases [analytics/aggregator] - https://gerrit.wikimedia.org/r/182663 (owner: QChris) [00:02:45] (CR) Nuria: [C: 2] Get fixture name through test case [analytics/aggregator] - https://gerrit.wikimedia.org/r/182664 (owner: QChris) [00:02:53] (Merged) jenkins-bot: Get fixture name through test case [analytics/aggregator] - https://gerrit.wikimedia.org/r/182664 (owner: QChris) [00:12:10] (PS1) QChris: Fix ISO week numbering [analytics/aggregator] - https://gerrit.wikimedia.org/r/182963 [00:16:31] (CR) QChris: "> Computation of week number is faulty." (3 comments) [analytics/aggregator] - https://gerrit.wikimedia.org/r/182676 (owner: QChris) [00:19:08] (CR) Nuria: [C: 2] Split projectcounts tests into modules [analytics/aggregator] - https://gerrit.wikimedia.org/r/182665 (owner: QChris) [00:19:56] huh; that's interesting. tests can't find the class. [00:23:07] (CR) Nuria: [C: 2] Collect projectcounts test modules in separate package [analytics/aggregator] - https://gerrit.wikimedia.org/r/182666 (owner: QChris) [00:23:25] Ironholds: need help? [00:23:46] nuria, not sure how replicable the error is :( [00:23:50] I'll try googling for a bit first [00:23:54] Ironholds: udf unit tests? [00:24:08] this is actually the unit tests for the underlying raw Java class [00:24:39] Ironholds: look at package names, are both tests and code in the same package? [00:25:01] Ironholds: i can try to re-pro if you have a changeset [00:25:35] I'll push what I have so far, then [00:25:38] thanks! 
[00:27:59] (PS1) OliverKeyes: [WIP] Legacy pageviews definition UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/182971 [00:28:05] nuria, ^ [00:28:14] it appears to only throw those errors after the first test has failed, however [00:28:15] curious [00:28:34] Ironholds: ok, lemme get the changeset [00:29:12] ta! [00:32:02] Ironholds: compiling ... [00:33:43] Ironholds: test fail but they run, have you tried? mvn clean;mvn compile;mvn test [00:33:51] huh; will do so [00:34:20] same failures, same initialisation error [00:34:26] Ironholds: ah i see it [00:36:26] Ironholds: lemme start intelij and import this [00:36:50] (CR) Bmansurov: Integrated logging (1 comment) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180828 (owner: Rtnpro) [00:43:37] Ironholds: so mvn is not compiling the class to target directory [00:43:49] if you run mvn clean/mvn compile [00:44:28] Ironholds: analytics/refinery/source/refinery-core/target/classes/org/wikimedia/analytics/refinery/core/ has 1 class file [00:48:05] Ironholds: does mvn clean;mvn test work for you? [00:50:11] huh! [00:50:40] it does not [00:50:47] the clean process and compile process, however, report [INFO] Compiling 2 source files to /home/ironholds/Code/source/refinery-core/target/classes [00:50:51] so it should have both files. [00:51:15] the only explanationn I have is there's some problem in the definition that prevents it from compiling but doesn't trigger a compiler error, which is just confusing [00:52:41] Ironholds: sorry MY MISTAKE: i see two files there, that is working fine, it's just not in the classpath of the tests for some reason [00:54:14] weird [00:54:25] and yet it should be; it's in the same directory as the existing Pageviews class, same structure [00:54:53] it's just not making it to package org.wikimedia.analytics.refinery.core somehow? 
[00:58:39] Ironholds: ahhhhh, regex is bad [00:58:44] ohh [00:58:46] * Ironholds looks [00:59:04] well, ahem, lemme verify [00:59:20] "(commons|incubator|meta|outreach|quality|species|strategy|usability)(\\.m)?\\.wikimedia)\\.org$" excess ) [00:59:42] yess! [00:59:44] resolved! [00:59:46] thank you nuria :D [01:00:09] Ironholds: do you use an IDE? [01:02:13] Ironholds: i know otto doesn't like them but IntelliJ used to cost $400 and now its community version is free and it is pretty good [01:03:47] I've tried IntelliJ: it makes my eyes hurt :( [01:03:51] So I tend to use sublime and the terminal [01:04:23] Ironholds: does sublime mark the regex as a syntax error? [01:05:15] it does not do so very prominently [01:05:22] I'm going to look at better tooling around plugins and suchlike [01:06:57] (CR) Nuria: [C: 2] Call parents setUp and tearDown in test cases [analytics/aggregator] - https://gerrit.wikimedia.org/r/182667 (owner: QChris) [01:38:42] (CR) Nuria: Prevent crashing ui when metric data is missing (2 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/182946 (owner: Mforns) [01:44:04] (PS2) OliverKeyes: Legacy pageviews definition UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/182971 [01:44:36] nuria, https://gerrit.wikimedia.org/r/#/c/182971/ is now up and it's thanks to you :). Thank you so much for helping me out, particularly at this hour :) [01:45:03] Ironholds: no merit cause i am on the west coast man, merit goes to you for working so late [01:45:34] oh, I just don't have a life ;p [01:45:48] well, that's a lie. I could be hanging out with a 6 foot 2 botanist right now. [01:45:53] but committing is so fun.
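(Editor's note on the regex fix at 00:59:20.) The pattern quoted there has one closing parenthesis with no matching opener. A pattern like that compiles as Java source but fails when the regex engine parses it at run time, which would fit the "no compiler error, but the tests can't find the class" symptom if the pattern is compiled while the class is being loaded; the log does not confirm that detail. Below is a minimal Python sketch of how such a stray parenthesis can be located. The helper `unbalanced_parens` and the `FIXED` pattern are illustrative assumptions: the log does not show which repair was actually committed.

```python
import re


def unbalanced_parens(pattern: str) -> int:
    """Return the index of the first unmatched parenthesis, or -1 if balanced.

    Simplification: escaped characters are skipped, but character classes
    such as [()] are not treated specially.
    """
    open_positions = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if not open_positions:
                return i  # a closer with no opener
            open_positions.pop()
        i += 1
    return open_positions[0] if open_positions else -1


# The pattern from the log, written as a Python raw string (the log shows
# the Java string literal, hence the doubled backslashes there).
BROKEN = r"(commons|incubator|meta|outreach|quality|species|strategy|usability)(\.m)?\.wikimedia)\.org$"

# One possible repair (an assumption): drop the excess ")".
FIXED = r"(commons|incubator|meta|outreach|quality|species|strategy|usability)(\.m)?\.wikimedia\.org$"

position = unbalanced_parens(BROKEN)
print("stray parenthesis at index", position)

try:
    re.compile(BROKEN)
except re.error as exc:
    print("broken pattern rejected:", exc)

print(bool(re.search(FIXED, "commons.m.wikimedia.org")))
```

Compiling patterns in a small standalone check like this, before they are embedded in a class, surfaces the error directly instead of through a class-loading failure.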
[03:04:54] (PS5) OliverKeyes: [WIP] start of a generalised class of UDFs for handling the webrequests table [analytics/refinery/source] - https://gerrit.wikimedia.org/r/181939 [03:24:17] hum [03:24:26] it won't compile since I rebased [03:27:56] (PS6) OliverKeyes: [WIP] start of a generalised class of UDFs for handling the webrequests table [analytics/refinery/source] - https://gerrit.wikimedia.org/r/181939 [04:34:55] * Ironholds hugs nuria [04:34:58] CONGRATS! :D [04:42:17] (PS7) OliverKeyes: Generalised class of UDFs for handling the webrequests table [analytics/refinery/source] - https://gerrit.wikimedia.org/r/181939 [04:43:09] (PS8) OliverKeyes: Generalised class of UDFs for handling the webrequests table [analytics/refinery/source] - https://gerrit.wikimedia.org/r/181939 [06:49:56] YuviPanda|zzz, Yo o/ [06:50:47] * YuviPanda|zzz waves at rtnpro [06:51:34] YuviPanda|zzz, I was entangled in a lot of stuffs for the last few weeks [06:52:16] YuviPanda|zzz, waartaa demo instance is down, releases for office work :\ [06:52:56] ow [06:53:10] sounds like a lot of tangled things [06:53:55] YuviPanda|zzz, IRC ports are blocked from our office [06:54:12] YuviPanda|zzz, setup shout for myself in one of my servers :) [06:54:17] :D [06:54:20] shout isn’t so bad... 
[06:54:31] YuviPanda|zzz, shout is good [10:29:27] (PS1) QChris: Fix numbering of weeks [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/183011 [10:29:45] (CR) QChris: "recheck" [analytics/aggregator] - https://gerrit.wikimedia.org/r/182667 (owner: QChris) [10:30:20] (CR) QChris: "recheck" [analytics/aggregator] - https://gerrit.wikimedia.org/r/182680 (owner: QChris) [10:30:28] (CR) jenkins-bot: [V: -1] Teach the aggregation script from when on to expect mobile and zero counts [analytics/aggregator] - https://gerrit.wikimedia.org/r/182680 (owner: QChris) [11:42:32] (PS2) QChris: Add weekly aggregations [analytics/aggregator] - https://gerrit.wikimedia.org/r/182676 [11:42:32] (PS2) QChris: Add monthly aggregations [analytics/aggregator] - https://gerrit.wikimedia.org/r/182677 [11:42:32] (PS2) QChris: Add yearly aggregations [analytics/aggregator] - https://gerrit.wikimedia.org/r/182678 [11:42:32] (PS2) QChris: Stop treating missing values as 0 [analytics/aggregator] - https://gerrit.wikimedia.org/r/182679 [11:42:32] (PS2) QChris: Teach the aggregation script from when on to expect mobile and zero counts [analytics/aggregator] - https://gerrit.wikimedia.org/r/182680 [11:42:48] (Abandoned) QChris: Fix ISO week numbering [analytics/aggregator] - https://gerrit.wikimedia.org/r/182963 (owner: QChris) [11:43:16] (CR) QChris: Add weekly aggregations (3 comments) [analytics/aggregator] - https://gerrit.wikimedia.org/r/182676 (owner: QChris) [12:14:12] Analytics-Cluster: Raw webrequest partitions for 2015-01-05T17/1H not marked successful - https://phabricator.wikimedia.org/T85918#957264 (QChris) NEW [12:14:33] Analytics-Cluster: Raw webrequest partitions for 2015-01-05T17/1H not marked successful - https://phabricator.wikimedia.org/T85918#957264 (QChris) Varnishkafkas had their timeout adjusted on 2014-01-05T17:02 (f2218287bfa02332c132531c6cab70a8d46039d4) which made sequence numbers reset. This reset explains awa... 
[12:14:42] Analytics-Cluster: Raw webrequest partitions for 2015-01-05T17/1H not marked successful - https://phabricator.wikimedia.org/T85918#957272 (QChris) Open>Resolved a:QChris [12:14:43] Analytics-Cluster: Raw webrequest partitions that were not marked successful due to configuration updates - https://phabricator.wikimedia.org/T74300#957274 (QChris) [12:15:52] !log Marked raw mobile+text webrequest partitions for 2015-01-05T17/1H ok (See {{PhabT|85918}}) [12:31:33] (PS2) QChris: Add projectcounts for 2015-01-02 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/182689 [12:31:35] (PS2) QChris: Add projectcounts for 2015-01-01 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/182688 [12:31:37] (PS2) QChris: Add weekly projectcounts [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/182684 [12:31:39] (PS2) QChris: Add monthly projectcounts [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/182685 [12:31:41] (PS2) QChris: Add yearly projectcounts [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/182686 [12:31:43] (PS2) QChris: Backfill projectcounts for 2014 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/182687 [12:50:20] (Abandoned) QChris: Fix numbering of weeks [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/183011 (owner: QChris) [13:29:08] Analytics-Cluster: Find deployment host to deploy refinery from that has neither refinery-hive, nor passwords in hive-site.xml - https://phabricator.wikimedia.org/T76806#957320 (QChris) Open>Resolved a:QChris [14:46:31] woooOO, hey nuria [14:46:41] how long did you say your app uuid daily query was taking right now? [14:53:01] qchris: how are you launching things in the adhoc queue? i thought that didn't exist... [14:53:06] maybe it always exists by default? [14:54:46] ottomata: Sorry. I just used the queue_name of those old changes [14:54:58] ahhhh, ok, oh i see this is for tsvs! [14:54:58] It should be "default" now. :-/ [14:54:59] cOOOll! 
[14:55:01] :) [14:55:03] no worries, i think its fine [14:55:21] Yes, "adhoc" is one of the default queues of fail-scheduler IIRC. [14:58:15] ottomata: for 1 day of data it would be several hours [14:58:28] ARGH GOOGLE! [14:58:28] ottomata: for 1 hour is 10/15 mins [14:58:42] Now it's telling me "party is over" :-/ [14:59:08] Anyone having luck joining the hangout? [14:59:21] I just joined qchris [14:59:28] wonder why it hates you all of a sudden [14:59:37] * milimetric is starting to think hangouts are sentient [15:00:00] qchris: yesterday when you tried the third browser, it gave me a little "so and so is trying to access this hangout, allow?" dialog [15:00:15] Really? [15:00:16] :-D [15:00:16] what was different about that last time, I didn't see the dialog otherwise [15:00:51] It shows the batcave as empty for me. :-/ [15:00:53] nuria: parqeut is 36 minutes! [15:01:47] https://plus.google.com/hangouts/_/wikimedia.org/a-batcave [15:01:50] ottomata: nice! [15:01:53] that's empty?! [15:02:07] milimetric: Yes. [15:02:30] milimetric: I just noticed that me @gmail.com account is not added to the event. Could you please add it? [15:02:36] yes [15:02:41] i.d.rather right? [15:03:07] milimetric: I.d.rather.not.use@gmail.com [15:04:10] right, I know :) I was trying to not disclose that [15:19:36] Analytics, Multimedia: Track image context and pass information onto X-Analytics - https://phabricator.wikimedia.org/T85922#957402 (Gilles) NEW [15:19:56] qchris_meeting: what do you think is the proper way to add jars to hive queries in oozie jobs? [15:20:00] add jar with an hdfs path? [15:20:16] hdfs://analytics-hadoop/wmf/refinery/current/artifacts/? [15:20:35] add jar looks good to me. [15:20:42] But I am not sure about the ".../current/..." [15:21:16] I guess I'd make the url to the jar a parameter to the script. [15:21:36] So oozie can set it using the value of refinery.version [15:29:04] hmmm [15:29:13] is refinery.version also a parameter then? 
[15:29:18] The hangout is still empty for me [15:29:26] Are we meeting in the batcave as the invite says? [15:29:34] you are just early [15:30:01] Analytics, Multimedia: Track image context and pass information onto X-Analytics - https://phabricator.wikimedia.org/T85922#957417 (Gilles) I think the first decision to make is which contexts to track. Being too exhaustive means that users would needlessly have to reload the images, particularly the default... [16:13:15] milimetric, Hi [16:13:36] hi rtnpro [16:13:48] milimetric, I am sorry for the absence [16:14:04] milimetric, I saw your review, writing a reply to it now :) [16:14:07] no prob - Happy New Year [16:14:22] a few others reviewed, take your time [16:14:23] milimetric, Happy new year to you too :) [16:14:32] it's better to be thorough than fast around here [16:14:39] as changes we make tend to stick longer than we'd like [16:14:46] milimetric, +1 [16:15:22] milimetric [16:16:03] I mutated the LOGGING variable mutable, because I did not want to use copy.deepcopy until and unless necessary [16:16:34] and also, the LOGGING variable will be modified only during a script run [16:17:11] other simultaneous invocation of the script won't have the mutated value [16:18:17] milimetric, ^^ [16:30:51] (CR) Rtnpro: "I have gone through the latest review and update the patch set based on it." (6 comments) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180828 (owner: Rtnpro) [16:44:39] ottomata: Are we having the sync up meeting today? [16:51:35] Analytics, Multimedia: Track image context and pass information onto X-Analytics - https://phabricator.wikimedia.org/T85922#957666 (ezachte) What goes for thumbs goes for frames as well. I wouldn't mind if they get the same tag, they seems closely related in functionality. 
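(Editor's note on rtnpro's remark at 16:16 about mutating LOGGING instead of using copy.deepcopy.) The trade-off can be sketched as follows. This is not the limn-mobile-data code: the `LOGGING` contents and both helper functions are hypothetical and only illustrate the difference. An in-place change is visible to everything in the same process that reads the module-level dict, while separate invocations of the script each start from the unmodified value, as rtnpro says.

```python
import copy

# Hypothetical module-level logging configuration.
LOGGING = {
    "version": 1,
    "handlers": {"console": {"class": "logging.StreamHandler", "level": "INFO"}},
    "root": {"handlers": ["console"], "level": "INFO"},
}


def with_level_copied(config, level):
    """Return a modified deep copy; the caller's dict is left untouched."""
    new_config = copy.deepcopy(config)
    new_config["root"]["level"] = level
    return new_config


def with_level_in_place(config, level):
    """Modify the caller's dict directly and return the same object."""
    config["root"]["level"] = level
    return config


copied = with_level_copied(LOGGING, "DEBUG")
level_after_copy = LOGGING["root"]["level"]      # the original is unchanged here

mutated = with_level_in_place(LOGGING, "DEBUG")
level_after_mutation = LOGGING["root"]["level"]  # the original has changed now

print(level_after_copy, level_after_mutation, mutated is LOGGING)
```

For a short-lived script that reads the configuration once, the in-place version is usually harmless; the copy matters mainly when other code in the same process relies on the original value.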
[17:35:36] Analytics, Multimedia: Track image context and pass information onto X-Analytics - https://phabricator.wikimedia.org/T85922#957725 (Ottomata) Would it be possible to add this info directly to the X-Analytics header from Mediawiki using this extension? https://gerrit.wikimedia.org/r/#/c/157841/ [17:56:45] (CR) Mforns: Integrated logging (1 comment) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180828 (owner: Rtnpro) [17:59:22] (CR) Nuria: [C: 2] Document relation between csv and computed projects [analytics/aggregator] - https://gerrit.wikimedia.org/r/182668 (owner: QChris) [17:59:36] (Merged) jenkins-bot: Document relation between csv and computed projects [analytics/aggregator] - https://gerrit.wikimedia.org/r/182668 (owner: QChris) [18:04:38] (PS3) Rtnpro: Integrated logging [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180828 [18:16:57] Analytics-Cluster: Geo-coding UDF - https://phabricator.wikimedia.org/T77683#957759 (ggellerman) a:Ottomata>ananthrk [18:29:02] qchris_away: lemme know when (if) you are back... [18:49:54] nuria: back. [18:49:57] What's up? [18:50:25] qchris: i checked out the aggregator depot in 1002 to my homdir and run: python aggregate_projectcounts --date 2015-01-01 [18:50:40] but ...ahem...nothing happens [18:50:46] *homedir [18:51:02] tried also python aggregate_projectcounts --date 2015-01-04 --log ./log/log.txt --verbose [18:51:12] You created the empty csvs? [18:53:32] nuria: ^ [18:54:39] qchris, ahem, no, I though source was read from /mnt/hdfs/wmf/data/archive/webstats and initial files created by program [18:54:54] but i guess i have to create the files themselves right? [18:55:02] Run: [18:55:02] bin/aggregate_projectcounts --help [18:55:08] and see the description of the --target parameter. [18:55:23] It's part of the original design of milimetric [18:55:45] the script looks which csvs are there, and [18:55:54] updates only for those projects that have a csv. 
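(Editor's note on qchris's explanation at 18:55.) The behaviour described, where the script only updates projects that already have a csv in the target directory, explains why a run against an empty target appears to do nothing. A rough sketch of that selection step is below. It is not the aggregator's actual code: the function name `projects_to_update` is hypothetical, and only the `daily_raw/enwiki.csv` layout is taken from the help text quoted in the log.

```python
import os
import tempfile


def projects_to_update(target_dir):
    """List the projects that have a csv under TARGET_DIR/daily_raw."""
    daily_raw = os.path.join(target_dir, "daily_raw")
    if not os.path.isdir(daily_raw):
        return []
    return sorted(
        name[: -len(".csv")]
        for name in os.listdir(daily_raw)
        if name.endswith(".csv")
    )


with tempfile.TemporaryDirectory() as target_dir:
    os.makedirs(os.path.join(target_dir, "daily_raw"))

    # No csv files yet: nothing is selected, so a run would change nothing.
    before = projects_to_update(target_dir)

    # Creating an empty csv opts the English Wikipedia in.
    open(os.path.join(target_dir, "daily_raw", "enwiki.csv"), "w").close()
    after = projects_to_update(target_dir)

print(before, after)
```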
[18:56:42] qchris, i did read that but understood it differently ok, i see [18:57:01] qchris: I took empty to mean that script had run but found no data [18:57:14] But it says: [18:57:16] So for example to compute data for the English [18:57:20] Wikipedia, put a file 'daily_raw/enwiki.csv' in [18:57:23] TARGET_DIR. [18:57:40] Mhmmm... suggestions welcome to improve that wording. [18:58:44] qchris nevermind i though it was ..." So for example to compute data for the English Wikipedia, PUTS a file 'daily_raw/enwiki.csv'" [18:58:48] qchris: my fault [18:59:23] Oh, I did not mean to accuse. Sorry, if that came out wrong. [18:59:31] Really, if you have a suggestion to [18:59:40] qchris: np, i see what you mean [18:59:52] improve the wording, it's more than welcome. [19:01:47] k, i was going to submit a patch for tox.ini too so i will do that later. [19:02:19] qchris: now, this python aggregate_projectcounts --date 2015-12-10 --log ./log/log.txt --verbose [19:02:37] sorry [19:03:31] qchris: python aggregate_projectcounts --date 2014-12-10 --log ./log/log.txt --verbose works fine i see [19:04:32] I guess I should update the default for --source, since that is no longer "webstats" but "pagecounts-all-sites" [19:06:44] qchris: also other question, for the "monitoring" your thought was to set up that script in another cron? [19:08:18] nuria: Yes, it's in the puppet change that I mentioned in todays email. 
[19:08:23] Let me get a url [19:09:00] qchris: i see, makes a lot of sense, ok, i think i understand it well enough to take responsibility to merge it all now [19:09:21] https://git.wikimedia.org/blob/operations%2Fpuppet/977fe78e9a3cecd063e9789ea6b5965e449f34e9/manifests%2Fmisc%2Fstatistics.pp#L1178 [19:09:26] ^ there's the cron for monitoring [19:09:30] (CR) Nuria: [C: 2] Add separate test case for tests using a data directory [analytics/aggregator] - https://gerrit.wikimedia.org/r/182669 (owner: QChris) [19:09:38] (Merged) jenkins-bot: Add separate test case for tests using a data directory [analytics/aggregator] - https://gerrit.wikimedia.org/r/182669 (owner: QChris) [19:13:17] (PS1) QChris: Update default to new location of pagecounts-all-sites in the cluster [analytics/aggregator] - https://gerrit.wikimedia.org/r/183072 [19:14:17] qchris: and i take we are deploying puppet to 1002 also [19:14:54] (CR) Nuria: [C: 2] Fix error message if mobile count is too low [analytics/aggregator] - https://gerrit.wikimedia.org/r/182670 (owner: QChris) [19:15:02] (Merged) jenkins-bot: Fix error message if mobile count is too low [analytics/aggregator] - https://gerrit.wikimedia.org/r/182670 (owner: QChris) [19:15:11] Puppet will bring the aggregator repository to stat1002. [19:15:20] Does that address your question? 
[19:15:28] Or did you mean something else [19:16:24] (PS1) QChris: Remove debugging output from monitoring script [analytics/aggregator] - https://gerrit.wikimedia.org/r/183073 [19:17:29] (CR) Nuria: [C: 2] Fix comments in checks for too low readings [analytics/aggregator] - https://gerrit.wikimedia.org/r/182671 (owner: QChris) [19:17:36] (Merged) jenkins-bot: Fix comments in checks for too low readings [analytics/aggregator] - https://gerrit.wikimedia.org/r/182671 (owner: QChris) [19:17:54] (CR) Nuria: [C: 2] For monitoring script, check for expected issues [analytics/aggregator] - https://gerrit.wikimedia.org/r/182672 (owner: QChris) [19:18:01] (Merged) jenkins-bot: For monitoring script, check for expected issues [analytics/aggregator] - https://gerrit.wikimedia.org/r/182672 (owner: QChris) [19:19:00] qchris: that's it, i was not sure whether 1002 is plugged in into our puppet infrastructure already [19:19:22] yes, puppet is running on stat1002 [19:20:02] (CR) Nuria: [C: 2] Extracts CSV reading and writing into separate functions [analytics/aggregator] - https://gerrit.wikimedia.org/r/182673 (owner: QChris) [19:20:10] (Merged) jenkins-bot: Extracts CSV reading and writing into separate functions [analytics/aggregator] - https://gerrit.wikimedia.org/r/182673 (owner: QChris) [19:21:02] (CR) Nuria: [C: 2] Strip line ending from internal CSV data representation [analytics/aggregator] - https://gerrit.wikimedia.org/r/182674 (owner: QChris) [19:21:10] (Merged) jenkins-bot: Strip line ending from internal CSV data representation [analytics/aggregator] - https://gerrit.wikimedia.org/r/182674 (owner: QChris) [19:23:46] (CR) Nuria: [C: 2] Separate daily into daily_raw and daily [analytics/aggregator] - https://gerrit.wikimedia.org/r/182675 (owner: QChris) [19:23:54] (Merged) jenkins-bot: Separate daily into daily_raw and daily [analytics/aggregator] - https://gerrit.wikimedia.org/r/182675 (owner: QChris) [19:25:52] (CR) Nuria: [C: 2] Add weekly aggregations 
[analytics/aggregator] - https://gerrit.wikimedia.org/r/182676 (owner: QChris) [19:26:01] (Merged) jenkins-bot: Add weekly aggregations [analytics/aggregator] - https://gerrit.wikimedia.org/r/182676 (owner: QChris) [19:26:44] (CR) Nuria: [C: 2] Add monthly aggregations [analytics/aggregator] - https://gerrit.wikimedia.org/r/182677 (owner: QChris) [19:26:52] (Merged) jenkins-bot: Add monthly aggregations [analytics/aggregator] - https://gerrit.wikimedia.org/r/182677 (owner: QChris) [19:27:35] (CR) Nuria: [C: 2] Add yearly aggregations [analytics/aggregator] - https://gerrit.wikimedia.org/r/182678 (owner: QChris) [19:27:50] (Merged) jenkins-bot: Add yearly aggregations [analytics/aggregator] - https://gerrit.wikimedia.org/r/182678 (owner: QChris) [19:31:21] (CR) Nuria: [C: 2] Stop treating missing values as 0 [analytics/aggregator] - https://gerrit.wikimedia.org/r/182679 (owner: QChris) [19:31:30] (Merged) jenkins-bot: Stop treating missing values as 0 [analytics/aggregator] - https://gerrit.wikimedia.org/r/182679 (owner: QChris) [19:31:32] (Merged) jenkins-bot: Teach the aggregation script from when on to expect mobile and zero counts [analytics/aggregator] - https://gerrit.wikimedia.org/r/182680 (owner: QChris) [19:32:15] (CR) Nuria: [C: 2] Update default to new location of pagecounts-all-sites in the cluster [analytics/aggregator] - https://gerrit.wikimedia.org/r/183072 (owner: QChris) [19:32:35] (PS1) QChris: Fix line ending stripping for monitoring script [analytics/aggregator] - https://gerrit.wikimedia.org/r/183077 [19:33:33] (Merged) jenkins-bot: Update default to new location of pagecounts-all-sites in the cluster [analytics/aggregator] - https://gerrit.wikimedia.org/r/183072 (owner: QChris) [19:54:19] qchris: no reason I can't deploy refinery to hdfs, is there? [19:54:47] no reason to not do nothing not. [19:54:56] Or whatever the correct answer to double nogation is. [19:55:07] It's good to deploy from my point of view. 
[19:55:25] qchris: do you deploy from stat1002? [19:55:32] Yes. [19:55:36] k [20:14:53] (PS6) Ottomata: First draft of refinement phase for webrequest [analytics/refinery] - https://gerrit.wikimedia.org/r/182478 [20:36:23] ottomata: you around? [21:06:37] hey tnegrin! how are you, and do you have a moment to chat? [21:20:34] urgh. R has spoiled me for statistical calculations. [21:20:43] ottomata, how much have you played around with cascading? [21:21:42] not much at all. [21:21:53] hrm. Do we have anyone who has? [21:21:57] i hear amazing things about scalding, which is cascading in scala + more [21:21:59] no, i don't think so. [21:22:15] have you tried it? can I help? [21:22:55] hey nuria, you there? [21:22:56] I've been playing around but it's hard to get a good idea of how it works, I guess [21:23:04] I have a list of dumb questions ;p [21:23:18] ask me, maybe we can figure it out together [21:23:29] we can hangout if you wanna [21:24:09] eh, they're fairly simple. I'll write em out in an email :) [21:24:22] k [21:39:08] (CR) Mforns: "Cool!" (6 comments) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180828 (owner: Rtnpro) [21:43:56] ottomata: yessss, i'm here [21:46:00] (CR) Nuria: [C: 2] Fix line ending stripping for monitoring script [analytics/aggregator] - https://gerrit.wikimedia.org/r/183077 (owner: QChris) [21:46:11] (Merged) jenkins-bot: Fix line ending stripping for monitoring script [analytics/aggregator] - https://gerrit.wikimedia.org/r/183077 (owner: QChris) [21:50:29] ottomata? [21:52:12] (CR) Nuria: [C: 2] "I think it should be OK to self-merge changes of this type." [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/182681 (owner: QChris) [21:52:14] (PS1) QChris: Recompute the "total sum" column upon rescaling [analytics/aggregator] - https://gerrit.wikimedia.org/r/183148 [21:52:32] nuria, hiayaa [21:52:45] could you give this a once over and a +1 if you like? 
[21:52:45] https://gerrit.wikimedia.org/r/#/c/182478/ [21:52:53] looking [21:52:59] (PS7) Ottomata: First draft of refinement phase for webrequest [analytics/refinery] - https://gerrit.wikimedia.org/r/182478 [21:53:20] ottomata: this is going to be SO helpful. we need to update docs too [21:59:36] (PS1) QChris: Recompute data for upstream change on total sum computation [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/183150 [22:03:10] (PS8) Ottomata: First draft of refinement phase for webrequest [analytics/refinery] - https://gerrit.wikimedia.org/r/182478 [22:03:34] (Abandoned) QChris: Add carrier 'Hello/Malaysia Telcom Cambodia' [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112207 (owner: QChris) [22:04:01] (Abandoned) QChris: Add carrier 'GrameenPhone Bangladesh' [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112208 (owner: QChris) [22:04:11] (Abandoned) QChris: Add carrier 'Celcom Malaysia' [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112209 (owner: QChris) [22:04:27] (Abandoned) QChris: Stop excluding a carrier's start date from the treated dates [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112211 (owner: QChris) [22:04:33] ottomata, could you clarify something about the mental model I should be using for cascading? [22:04:44] (Abandoned) QChris: Drop three carriers from total sum [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112210 (owner: QChris) [22:05:00] maybe? [22:05:04] i can try? [22:05:05] Ironholds: ^ [22:05:09] (CR) Nuria: "Need to learn more about parquet to comment effectively but this is going to make our lives a lot better." (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/182478 (owner: Ottomata) [22:05:11] * Ironholds tries to think of how to phrase [22:05:12] (Abandoned) QChris: Use year to detect edge cases when downsampling months [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112212 (owner: QChris) [22:05:21] so, I can create a multiple-source tap. Cool. 
[22:05:24] (Abandoned) QChris: When counting bad days for a month, repspect the year [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112213 (owner: QChris) [22:05:34] (Abandoned) QChris: Updating bad days [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112214 (owner: QChris) [22:05:37] But, will Cascading know to have one mapper per source, or...? [22:05:43] (Abandoned) QChris: Connect days around bad_dates in the graph [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112215 (owner: QChris) [22:05:51] i doubt it would do just one mapper per source [22:05:53] (Abandoned) QChris: Fix treatment of good days, if no request was logged [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112216 (owner: QChris) [22:05:56] hrm :/ [22:05:59] likely it would do more [22:06:01] unless you tell it to [22:06:02] oh, that's fine too [22:06:05] (Abandoned) QChris: Updating bad dates [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112217 (owner: QChris) [22:06:12] I'm trying to come up with a mental model for how to treat cascading, and struggling [22:06:16] (Abandoned) QChris: Stop dropping Aircell India [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112218 (owner: QChris) [22:06:21] so, from what I can tell, the idea is that you specify the flow of the data in your job [22:06:26] (Abandoned) QChris: Remove overrides for foundation wiki table [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112219 (owner: QChris) [22:06:32] ottomata: ok, will read a bit more about parquet to contribute with more inteligent comments. Change looks great, now i think the IP bucketing might not be "random" enough. [22:06:34] (Abandoned) QChris: Add graph that sums monthly graphs [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112220 (owner: QChris) [22:06:34] and it then launches a chain of MR jobs based on that [22:06:41] gotcha. That makes sense! 
[22:06:42] (Abandoned) QChris: Stop dropping carrier's start date for summary graphs [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112221 (owner: QChris) [22:06:48] kind of like hive does, except you are planning the execution yourself and can write arbitrary java [22:06:50] (Abandoned) QChris: Mark 2013-08-14–2013-08-18 as bad days [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112222 (owner: QChris) [22:07:01] (Abandoned) QChris: Add carrier 'Orange Madagascar' [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112223 (owner: QChris) [22:07:09] (Abandoned) QChris: Add carrier 'Banglalink Bangladesh' [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112224 (owner: QChris) [22:07:14] then my next problem (and it's far in the future) is working out how to make sure all of the instances of [example_UUID] go to the same mapper, I guess. [22:07:17] (Abandoned) QChris: Add carrier 'Umniah Jordan' [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112225 (owner: QChris) [22:07:25] (Abandoned) QChris: Add carrier 'Airtel Kenya' [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112226 (owner: QChris) [22:07:34] (Abandoned) QChris: Strip markup from MCC-MNC loaded from wiki table [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112227 (owner: QChris) [22:07:45] qchris: I think I have tested & all your changes, do please let me know if there is something I might have missed it. [22:07:47] (Abandoned) QChris: Add carrier 'Beeline Kazakhstan' [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112228 (owner: QChris) [22:08:14] nuria: Thanks! That's great. [22:08:19] Ironholds: i just flipped through all of the slides from this presentation [22:08:19] https://vimeo.com/59610496 [22:08:23] i haven't watched the video [22:08:27] but, the slides seemed really good [22:08:42] nuria: Did you see the one about changing how the "total sum" for rescaled is computed? 
[22:08:51] nuria: https://gerrit.wikimedia.org/r/#/c/183148/ [22:08:53] Ironholds: that is probably a group by operation of some kind :) [22:09:05] nuria: (I uploaded that while you've been reviewing the others) [22:09:19] qchris: ok, i was missing that one. let me look [22:09:33] ottomata, that makes sense! [22:09:35] thanks :) [22:09:43] nuria: Awesome. Thanks. [22:10:05] nuria, I'm going to respond to your comments here, if that's alright [22:10:08] (Abandoned) QChris: Add carrier 'Tcell Tajikistan' [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112229 (owner: QChris) [22:10:15] ottomata: sure [22:10:18] ottomata: how do we fix someone's permissions to ssh into stat1003? [22:10:18] (Abandoned) QChris: Use link text for Zero link when stripping MCC-MNC markup [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112230 (owner: QChris) [22:10:26] (Abandoned) QChris: Mark 2014-01-05, and 2014-01-06 as bad dates [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112231 (https://bugzilla.wikimedia.org/59722) (owner: QChris) [22:10:26] a new PM for mobile web needs access and can only go to bastion [22:10:30] ok, re bucketing. [22:10:34] (Abandoned) QChris: Add carrier 'Grameenphone Bangladesh' [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112232 (owner: QChris) [22:10:40] the fields are hashed and then modded [22:10:47] (Abandoned) QChris: Mark 2014-02-05, and 2014-02-06 as bad dates [analytics/wp-zero] - https://gerrit.wikimedia.org/r/112233 (https://bugzilla.wikimedia.org/60955) (owner: QChris) [22:10:56] and then stored in a file based on the resulting value [22:11:53] ottomata: right [22:12:01] it doesn't quite work like partitioning, clustering won't do antyhing here if you did where ip = 1.2.3.4 or ip = 2.3.4.5 ' [22:12:38] so, i don't think that it really matters what field we choose, as long as there are a large number of values and the requests are relatively distributed across them. [22:12:40] oh.hm [22:12:42] that is what you are saying. 
[22:12:46] but if you have an ip that say, repeats 10% of the time, all those records are going to be bucketing to teh same bucket (same hash) [22:12:51] yes. [22:12:52] i see. [22:13:06] ottomata: that will happen in mobile [22:13:15] hmmm [22:13:16] ok. [22:13:18] ottomata: old times example is blackberry ips [22:13:20] will change that then. ok [22:13:20] yeah [22:13:38] we could just use dt i guess [22:13:43] hmm, dunno [22:13:54] ottomata: never examples are small virtual operators but even big ones like vodafone in spain used to use a very samll set of ips [22:14:17] what about ip+user_agent?' [22:14:19] yeah, opera mini does something funky too. [22:14:34] dt might work, ok, good point on that, i'll think about it (i have to run soon) [22:14:43] if you just +1ed it I would have merged and laucned the job! [22:14:46] GEEZ now I have to THINK! [22:14:47] :p [22:15:40] (Abandoned) QChris: Adapt monitoring to active editors' beginning of year increase [analytics/geowiki] - https://gerrit.wikimedia.org/r/119283 (owner: QChris) [22:16:15] right, ottomata opera mini and , i bet, amazon silk [22:16:24] although last one i do not know for sure [22:16:41] (CR) Ottomata: First draft of refinement phase for webrequest (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/182478 (owner: Ottomata) [22:16:57] ok, nuria, thank you. i will work on this more tomorrow then. [22:16:59] i gotsta go! 
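(Editor's note on the bucketing discussion from 22:10 to 22:14.) The concern is that hashing a field and taking it modulo the bucket count sends every record with the same value to the same bucket, so a heavily repeated IP skews one bucket. The sketch below simulates that with synthetic data. The bucket count, the record layout, and the use of CRC32 are all assumptions for illustration; Hive's own hash function is different, but the skew argument does not depend on which hash is used.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 8  # hypothetical bucket count


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Hash the key and take it modulo the bucket count."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def bucket_sizes(records, key_func):
    """Count how many records land in each bucket for a given key."""
    return Counter(bucket_for(key_func(r)) for r in records)


# Synthetic traffic: one proxy-like IP produces 10% of all requests,
# each with a different user agent; the rest come from distinct IPs.
HOT_IP = "10.0.0.1"
records = [{"ip": HOT_IP, "user_agent": f"ua-{i}"} for i in range(100)]
records += [
    {"ip": f"192.168.{i // 250}.{i % 250}", "user_agent": "ua-x"}
    for i in range(900)
]


def ip_key(record):
    return record["ip"]


def ip_ua_key(record):
    return record["ip"] + "|" + record["user_agent"]


by_ip = bucket_sizes(records, ip_key)
by_ip_and_ua = bucket_sizes(records, ip_ua_key)

# Buckets touched by the hot IP under each keying scheme.
hot_buckets_ip = {bucket_for(ip_key(r)) for r in records if r["ip"] == HOT_IP}
hot_buckets_ip_ua = {bucket_for(ip_ua_key(r)) for r in records if r["ip"] == HOT_IP}

print("keyed on ip:", sorted(by_ip.items()))
print("keyed on ip+user_agent:", sorted(by_ip_and_ua.items()))
print(len(hot_buckets_ip), "vs", len(hot_buckets_ip_ua), "buckets for the hot IP")
```

Keying on ip alone pins all of the hot IP's records to a single bucket, while the combined key spreads them out, which is the effect the ip+user_agent suggestion is after. A single dominant user agent behind a single IP would still concentrate in one bucket.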
[22:18:55] ottomata: ciaoooo [23:06:03] (CR) Mforns: Prevent crashing ui when metric data is missing (2 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/182946 (owner: Mforns) [23:09:49] (CR) Mforns: Prevent crashing ui when metric data is missing (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/182946 (owner: Mforns) [23:12:58] nuria, I replied to your review with no changes, let me know if you think something should be changed [23:13:59] (PS1) QChris: Remove dia backup files upon 'make clean' [analytics/refinery] - https://gerrit.wikimedia.org/r/183166 [23:17:22] nuria: ping re: refactor? [23:34:09] ori: yes. on it [23:34:18] mforns: will look after ori's review [23:36:52] ori: is this right for javascript unit tests? http://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing [23:37:06] yes [23:37:17] they pass, jenkins runs qunit too [23:46:06] ahhh [23:46:23] ori: on what port does mw install by default on vagrant? [23:46:30] 8080