[00:00:40] milimetric, does the test test_login_meta_mw pass in your machine for the mwoauth changeset? [00:01:20] Analytics-Wikimetrics: enwiki does not show up in my centralauth expanded cohort - https://phabricator.wikimedia.org/T78584#849591 (kevinator) [00:02:43] Analytics-Wikimetrics: enwiki does not show up in my centralauth expanded cohort - https://phabricator.wikimedia.org/T78584#849595 (mforns) This issue is not directly related to cohort user deletion, but with centralauth expansion. When separating the username and the project with a comma followed by a space... [00:04:49] Analytics-Wikimetrics: Story: WikimetricsUser deletes user from cohort [21 pts] - https://phabricator.wikimedia.org/T75350#849603 (kevinator) [00:05:21] Analytics-Wikimetrics: Wikimetrics User reads disclaimer on website [3 pts] - https://phabricator.wikimedia.org/T76107#849604 (kevinator) [00:05:32] Analytics-Wikimetrics: Wikimetrics auditor has read-only login to Wikimetrics DB [3 pts] - https://phabricator.wikimedia.org/T76109#849605 (kevinator) [00:05:43] Analytics-Wikimetrics: Fix oauth and do a quick pre-security review [13 pts] - https://phabricator.wikimedia.org/T76779#849607 (kevinator) [00:05:55] Analytics-Wikimetrics, Analytics-Dashiki: Vital Signs user sees annotations on graphs [13 pts] - https://phabricator.wikimedia.org/T78151#849608 (kevinator) [00:06:19] Analytics-Wikimetrics: Uploading cohort by copy-pasting breaks if names contain special characters [8 pts] - https://phabricator.wikimedia.org/T76105#849609 (kevinator) [00:06:50] Analytics-Engineering, Analytics-Wikimetrics: Re-run Wikimetrics data once Labs issues are fixed [8 pts] - https://phabricator.wikimedia.org/T78305#849610 (kevinator) p:High>Normal [00:12:59] (CR) Mforns: Strip project when uploading cohort (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180071 (owner: Mforns) [00:21:14] Analytics-Engineering, Analytics-Wikimetrics: Story: WikimetricsUser reports pages edited by cohort 
[13pts] - https://phabricator.wikimedia.org/T75072#849641 (kevinator) [00:22:28] ottomata, kicking off some test streaming jobs now. [00:22:31] :) [00:22:36] Your file structure is pretty [00:26:08] cool! heh [00:26:28] the parquet one may be a problem...due to some annoying hadoop streaming details (it uses the deprecated API...) [00:26:55] Honestly, I wasn't quite sure how to set up for that one. [00:27:09] Is parquet a compression format? [00:27:49] Oh! It's columnar. [00:28:08] ya, don't worry about that for now then [00:28:09] ui' [00:28:12] i'll have to figure that one out [00:28:15] kk [00:28:24] i was working on that today, but then i spent the day helping oliver instead :) [00:28:46] no worries. Pageviews is important. [00:28:48] halfak: http://grepalex.com/2014/05/13/parquet-file-format-and-object-model/ [00:28:53] I was working on other stuff all day anyway. [00:29:13] ottomata, I'm having trouble imagining how a columnar format would work for streaming. [00:29:34] Given that the scripts will expect to get a whole json thing at a time. [00:30:06] yeah, for streaming it might not make as much sense [00:30:13] but, i'm trying to benchmark more than just streaming [00:30:19] getting an idea for this in other use cases too [00:30:26] you can query this stuff with hive too [00:30:32] Totally. Hive makes a lot of sense. [00:31:31] ok, i'm out for the eve, ttyl [00:36:46] o/ ott [00:36:49] o [00:36:54] too late [00:39:13] I'm outta here too [00:46:54] Analytics-Engineering, Analytics-Wikimetrics: Story: WikimetricsUser reports pages edited by cohort [13pts] - https://phabricator.wikimedia.org/T75072#849730 (kevinator) p:High>Low After much discussion with the dev team, it became apparent that this is not a simple change to Wikimetrics. It is built t... [01:36:23] did we change how many days we stored in hive? 
[01:43:35] Analytics-EventLogging: find a better way to identify events that fail validation as early as possible - https://phabricator.wikimedia.org/T78355#849763 (kevinator) Resolved>Open reopening task because there are still issues to iron out. [04:00:08] (CR) Nuria: "Wow, code looks so much cleaner! Tests run clean in vagrant" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/179962 (owner: Milimetric) [07:43:08] (CR) Gergő Tisza: "mysql:research@analytics-store.eqiad.wmnet [log]> show index from NavigationTiming_10374055;" [analytics/multimedia] - https://gerrit.wikimedia.org/r/179872 (owner: Gergő Tisza) [08:15:50] Analytics-Tech-community-metrics: Key performance indicator: Bugzilla response time - https://phabricator.wikimedia.org/T63561#849864 (Qgil) [08:15:51] Analytics-Tech-community-metrics: Bugzilla response time: "Longest time without comment" is actually "Longest time without any comment by non-reporter"? - https://phabricator.wikimedia.org/T69589#849862 (Qgil) Open>declined Well, in fact this one refers to the Bugzilla stats page, which now is deprecate... [08:16:39] Analytics-Tech-community-metrics: Key performance indicator: code contributors new / gone - https://phabricator.wikimedia.org/T63563#849865 (Qgil) a:Qgil>None [08:16:51] Analytics-Tech-community-metrics: Key performance indicator: Top contributors - https://phabricator.wikimedia.org/T64221#849866 (Qgil) a:Qgil>None [09:55:56] Multimedia, Analytics, MediaWiki-extensions-MultimediaViewer: Create MediaViewer image varnish hit/miss ratio dashboard - https://phabricator.wikimedia.org/T78205#850032 (Gilles) a:Tgr [10:04:30] (CR) Gilles: [C: -1] "The data is there now, but the graph doesn't render the "cache miss ratio" curve, only the total request count. 
I think it's a limn bug, b" [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/179778 (owner: Gergő Tisza) [10:06:48] Multimedia, Analytics, MediaWiki-extensions-MultimediaViewer: Investigate if pre-rendering images is having an impact on performance - https://phabricator.wikimedia.org/T76035#788560 (Gilles) Initial poking at the data suggests that prerendering actually worsened performance. I'll work on creating graphs arou... [10:10:53] Multimedia, Analytics, MediaWiki-extensions-MultimediaViewer: Create MediaViewer image varnish hit/miss ratio dashboard - https://phabricator.wikimedia.org/T78205#850050 (Gilles) Note that pre-rendered thumbnails will appear as a varnish miss. The first time they're requested they're in swift, but not in varn... [10:29:33] Analytics-Refinery: Make webrequest partition validation handle races between time and sequence numbers - https://phabricator.wikimedia.org/T71615#850070 (QChris) Happened again for: 2014-12-15T15/2H (on text) [10:30:10] !log Marked raw text webrequest partition for 2014-12-15T15/2H ok (See {{PhabT|71615|850070}}) [10:30:53] (PS1) Gilles: Query image performance by upload time [analytics/multimedia] - https://gerrit.wikimedia.org/r/180136 [11:11:31] Multimedia, Analytics, MediaWiki-extensions-MultimediaViewer: Investigate if pre-rendering images is having an impact on performance - https://phabricator.wikimedia.org/T76035#850128 (Gilles) Actually I've just thought of checking that data by dividing varnish hits (shouldn't be affected by prerendering) and... 
[11:17:47] (PS2) Gilles: Query image performance by upload time [analytics/multimedia] - https://gerrit.wikimedia.org/r/180136 [11:34:21] Analytics-Engineering: Dedupe data while doing ETL for the Analytics cluster - https://phabricator.wikimedia.org/T76724#850163 (QChris) [11:47:59] Multimedia, Analytics, MediaWiki-extensions-MultimediaViewer: Investigate if pre-rendering images is having an impact on performance - https://phabricator.wikimedia.org/T76035#850186 (fgiunchedi) good question Gilles, it shouldn't happen AFAIK but it is possible, were those also direct fetches from swift? wha... [12:38:21] Phabricator, Analytics-Tech-community-metrics, Engineering-Community: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#850331 (Aklapper) >>! In T1003#844983, @kevinator wrote: > This might be a cheap way to get a visualization of the data. Thanks. We might play wit... [12:38:26] Phabricator, Analytics-Tech-community-metrics, Engineering-Community: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#850332 (Aklapper) Status update: * Daniel triggered a "Phabricator monthly statistics" test email to my account last week. * For unknown reasons (t... [13:01:46] Phabricator, Analytics-Tech-community-metrics, Engineering-Community: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#850358 (Dzahn) >>! In T1003#850332, @Aklapper wrote: > * For unknown reasons (those SQL queries work locally, maybe the //cut -d " " -f3// in the... 
[13:06:39] (PS1) QChris: Add requirements.txt [analytics/blog] - https://gerrit.wikimedia.org/r/180156 [13:06:41] (PS1) QChris: Fix flake8 errors [analytics/blog] - https://gerrit.wikimedia.org/r/180157 [13:06:43] (PS1) QChris: Allow to specify date to compute the report for [analytics/blog] - https://gerrit.wikimedia.org/r/180158 [13:06:45] (PS1) QChris: Add basic tox setup [analytics/blog] - https://gerrit.wikimedia.org/r/180159 [13:06:47] (PS1) QChris: Move blogreport code behind a __main__ guard [analytics/blog] - https://gerrit.wikimedia.org/r/180160 [13:06:49] (PS1) QChris: Add tests for parsing string to date [analytics/blog] - https://gerrit.wikimedia.org/r/180161 [13:13:26] Multimedia, Analytics, MediaWiki-extensions-MultimediaViewer: Investigate if pre-rendering images is having an impact on performance - https://phabricator.wikimedia.org/T76035#850363 (Gilles) I'm looking at our own data gathered from sampled clients that checks if the image was a varnish hit or not. It's pars... [13:32:47] Multimedia, Analytics, MediaWiki-extensions-MultimediaViewer: Investigate if pre-rendering images is having an impact on performance - https://phabricator.wikimedia.org/T76035#850380 (Gilles) Average performance experienced by users hitting the cp4xxx "varnish2" servers is slower: P160 is this expected? No... [14:25:31] (PS2) Dr0ptp4kt: Adding MCC-MNC mismatch script. [analytics/zero-sms] - https://gerrit.wikimedia.org/r/179926 [14:26:00] (CR) Dr0ptp4kt: [C: 2] Adding MCC-MNC mismatch script. [analytics/zero-sms] - https://gerrit.wikimedia.org/r/179926 (owner: Dr0ptp4kt) [14:27:26] (CR) Dr0ptp4kt: [V: 2] Adding MCC-MNC mismatch script. 
[analytics/zero-sms] - https://gerrit.wikimedia.org/r/179926 (owner: Dr0ptp4kt) [14:37:43] (PS1) Dr0ptp4kt: Update README [analytics/zero-sms] - https://gerrit.wikimedia.org/r/180182 [14:38:14] (CR) Dr0ptp4kt: [C: 2 V: 2] Update README [analytics/zero-sms] - https://gerrit.wikimedia.org/r/180182 (owner: Dr0ptp4kt) [14:41:43] Phabricator, Engineering-Community, Analytics-Tech-community-metrics: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#850592 (Aklapper) I've added a few more patchsets / revisions to https://gerrit.wikimedia.org/r/#/c/177792/ : * worked around (fixed?) the two empt... [14:46:26] (PS2) Mforns: Strip project when uploading cohort [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180071 [14:46:37] (CR) jenkins-bot: [V: -1] Strip project when uploading cohort [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180071 (owner: Mforns) [14:49:16] yay ottomata! [14:49:19] (PS3) Mforns: Strip project when uploading cohort [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180071 [14:50:31] mornin! [14:51:28] I have comments! comments on the UDF! [14:51:31] and also questions [14:51:35] Phabricator, Engineering-Community, Analytics-Tech-community-metrics: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#850622 (Dzahn) >>! In T1003#850592, @Aklapper wrote: > Works locally. Now waiting for Daniel to trigger another test email to me. yep, works for m... [14:51:48] like, should we also exclude based on GET versus POST, y'think? It seems to catch some silliness. [14:52:44] (CR) OliverKeyes: [C: -1] "Comments!" (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [14:53:52] Ironholds: that is up to the specification committee :) [14:54:16] but sure! and! 
I would be much obliged if you felt empowered to make new patches to the change :) [14:54:18] the specification committee thinks it's too finnicky a rule to be useful [14:54:22] but wanted to make sure [14:54:35] totally! I...have never actually patched an existing commit throug hgerrit before [14:54:42] do you use git review? [14:55:09] I read Patrick's tutorial to it? That counts. [14:55:31] well, i mean ,you can use straight git, but i use git review, and if you did I would know how to advise you [14:55:36] how do you usually push to gerrit for review? [14:55:40] oh, right. yes, I do! [14:55:46] ok cool [14:55:50] I mean, I've only patched MW a coupla times. [14:55:50] welp, git review makes this easy [14:56:22] okay, I've cloned the refinery project [14:56:25] gimme the tutorial! [14:56:25] so, the uhh the change number is the id in the url [14:56:29] ah, you actually need refinery-source [14:56:31] do you have that? [14:56:43] this one? [14:56:43] https://gerrit.wikimedia.org/r/#/admin/projects/analytics/refinery/source [14:57:23] once you do [14:57:32] you can take the change number from the url [14:57:33] indeed [14:57:37] https://gerrit.wikimedia.org/r/#/c/180023/ [14:57:38] in this ase [14:57:39] case [14:57:43] 180023 [14:57:43] and do [14:57:46] git review -d 180023 [14:57:48] aha [14:57:50] and it will checkout the change for you [14:57:58] then, make whatever changes you want [14:58:00] "no .gitreview file found" [14:58:01] and git commit --amend [14:58:03] ah [14:58:08] yeah i should fix that, id on't use git review files [14:58:08] um [14:58:09] do this [14:58:20] ahhhh, i'll just check in a git review file [14:58:23] kk [14:58:27] will wait and pull, in order [14:59:53] (CR) Mforns: Strip project when uploading cohort (7 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180071 (owner: Mforns) [14:59:59] (PS1) Ottomata: Add .gitreview file [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180188 [15:00:16] (CR) Ottomata: 
[C: 2 V: 2] Add .gitreview file [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180188 (owner: Ottomata) [15:00:20] ok Ironholds [15:00:21] pull [15:00:22] and try that again [15:02:58] shall do [15:03:32] We don't know where your gerrit is. Please manually create a remote [15:03:32] named "gerrit" and try again [15:03:45] hm [15:03:50] (sory in standup now) [15:03:51] hm [15:04:00] ok, then fine, don't use a .gitreview file [15:04:00] do [15:04:06] np! [15:04:23] git remote add gerrit ssh://ironholds@gerrit.wikimedia.org:29418/analytics/refinery/source [15:04:44] done and done [15:04:51] ok try again [15:04:56] oh maybe [15:04:57] do [15:04:57] first [15:04:59] git review -s [15:05:05] (setup) [15:05:43] Analytics-Wikimetrics: Fix oauth and do a quick pre-security review [13 pts] - https://phabricator.wikimedia.org/T76779#850637 (Milimetric) security review results: SQL in controllers and some forms / ajax improvements remain to be done. [15:05:46] permission denied [15:05:50] I'll make sure I'm using the right key [15:08:46] aha, wrong key [15:12:07] dammit [15:12:15] how the hell do I specify what key to use? [15:13:16] hm [15:13:22] git config, i'm sure uhm [15:13:48] hm maybe [15:14:33] hm, no? ok Ironholds in your ~/.ssh/config file [15:14:34] add [15:15:02] Host gerrit.wikimedia.org [15:15:02] User ironholds [15:15:02] IdentityFile /path/to/your/ssh/key [15:15:04] ta [15:19:31] aaand it can't find my key. hrm. [15:20:15] urgh [15:20:21] will finish setting up after the combined staff meeting [15:36:00] Phabricator, Engineering-Community, Analytics-Tech-community-metrics: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#850674 (Dzahn) merged PS10 https://gerrit.wikimedia.org/r/#/c/177792/ puppet-compiler: http://puppet-compiler.wmflabs.org/558/change/177792/html/i... [15:36:09] ottomata, got it working! [15:36:17] cool! 
[15:36:22] * Ironholds starts patching
[15:36:24] ok, yeah, so now that you have the change checked out
[15:36:29] you can commit new patches by doing
[15:36:32] git commit --amend
[15:36:44] (make sure you amend, otherwise things will get pretty annoying)
[15:36:46] then
[15:36:47] just
[15:36:50] git review
[15:37:18] make sure you only have this one commit that you are working on and amending at the head of this checked-out change-branch when you git review
[15:50:07] ottomata, is it okay if I expand the unit tests?
[15:50:17] ....I appreciate that's usually a stupid question
[15:50:30] please!
[15:50:31] haha
[15:50:48] that was a quick get-it-working proof of concept patch
[15:50:52] it will need MANY more unit tests
[15:50:59] btw, have you seen christian's webstatscollector tests?
[15:51:27] https://github.com/wikimedia/analytics-webstatscollector/tree/master/tests
[15:51:57] https://github.com/wikimedia/analytics-webstatscollector/blob/master/tests/test.sh#L265
[15:51:59] (PS4) Milimetric: Use the wonderful mwoauth [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/179962
[15:52:26] we should probably put together some canonical list of tests that can be reusable by different implementations of this
[15:53:01] even just text files, or json files, or something
[15:53:16] we could make json files with the same schema content of webrequest logs, and then maybe add an extra field in the tests
[15:53:22] is_pageview?
[15:53:29] and then just iterate through them all
[15:53:34] and assert that
[15:53:49] Webrequest.is_pageview(...) == line['is_pageview']
[15:53:51] or whatever
[15:54:06] i think qchris_meeting has opinions on this :)
[15:54:14] but we are all in the same meeting!
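The fixture idea floated above (labelled request records, each carrying an expected is_pageview value, iterated by one generic test) could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Fixture` fields, the sample records, and especially `isPageview`, which is a made-up stand-in and not the refinery's Webrequest logic. The fixtures are kept in code so the sketch stays self-contained; the proposal in the chat is to keep them in shared JSON or text files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PageviewFixtures {

    // One labelled case: request fields plus the expected classification.
    static final class Fixture {
        final String uriHost;
        final String uriPath;
        final String httpStatus;
        final boolean expectedPageview;

        Fixture(String uriHost, String uriPath, String httpStatus, boolean expectedPageview) {
            this.uriHost = uriHost;
            this.uriPath = uriPath;
            this.httpStatus = httpStatus;
            this.expectedPageview = expectedPageview;
        }
    }

    // Hypothetical stand-in classifier. These rules are invented for the sketch
    // and are NOT the refinery's actual pageview definition.
    static boolean isPageview(String uriHost, String uriPath, String httpStatus) {
        return "200".equals(httpStatus)
                && uriHost.endsWith(".wikipedia.org")
                && uriPath.startsWith("/wiki/");
    }

    // Illustrative fixtures only; a shared JSON or text file would replace this list.
    static List<Fixture> fixtures() {
        return Arrays.asList(
                new Fixture("en.wikipedia.org", "/wiki/Main_Page", "200", true),
                new Fixture("en.wikipedia.org", "/w/api.php", "200", false),
                new Fixture("en.wikipedia.org", "/wiki/Main_Page", "404", false));
    }

    // Returns every fixture whose expected label disagrees with the classifier.
    static List<Fixture> failures(List<Fixture> cases) {
        List<Fixture> failed = new ArrayList<>();
        for (Fixture f : cases) {
            if (isPageview(f.uriHost, f.uriPath, f.httpStatus) != f.expectedPageview) {
                failed.add(f);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<Fixture> cases = fixtures();
        System.out.println(failures(cases).size() + " mismatches out of " + cases.size());
    }
}
```

Because the loop only compares a classifier's answer with the stored label, the same fixture list could in principle be pointed at any implementation of the pageview definition, which is the reuse the chat is after.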
[15:54:16] shhhhh
[15:55:57] Phabricator, Engineering-Community, Analytics-Tech-community-metrics: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#850735 (Dzahn) needed one more small fix: https://gerrit.wikimedia.org/r/#/c/180193/ but now it works, puppet applied it and i tested it. the co...
[15:56:42] Ironholds: if you can put together a big list of test cases, lines that ARE pageviews, and lines that AREN'T pageviews, i can work on that
[15:56:58] ottomata, will do!
[15:57:00] at the moment I'm duplicating tests into the hive tests and the underlying java tests
[15:57:08] this is inefficient although largely trivial with find-and-replace :D
[15:57:08] ?
[15:57:12] yeah
[15:57:25] you have hive tests?
[15:59:02] sorry, tests of the resulting UDF
[15:59:27] refinery-hive/src/test/java/org/wikimedia/analytics/refinery/hive/TestIsPageviewUDF.java compared to refinery-core/src/test/java/org/wikimedia/mediawiki/TestWebrequest.java
[15:59:51] ah yes
[15:59:52] yeah
[16:00:27] Ironholds: if you want to, for now, just do TestWebrequest.java? we can duplicate to the UDF tests later? or meh, do whatever you want :)
[16:00:39] already wrote em!
[16:01:04] (PS3) OliverKeyes: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata)
[16:01:34] https://gerrit.wikimedia.org/r/#/c/180023/ boom
[16:01:35] of course, I make no promises that it will actually run, as I am unable to test it locally
[16:01:43] but it seems to syntactically match existing code
[16:02:52] why can't you run locally?
[16:02:54] do:
[16:02:57] mvn test
[16:03:08] hm, maybe you need maven, hmm?
[16:03:39] will get
[16:03:56] what OS are you Ironholds?
[16:04:17] xubuntu; sudo apt-get install maven
[16:04:23] ah easy peasy.
[16:05:51] running...
[16:06:45] incompatible types! aw man..
[16:07:36] okay, this makes no sense to me.
[16:07:52] looking
[16:07:58] apparently the new version of source/refinery-hive/src/test/java/org/wikimedia/analytics/refinery/hive/TestIsPageviewUDF.java has incompatible types everywhere
[16:08:02] I assume that means I borked the UDF, actually
[16:08:11] yes
[16:08:19] you are assigning instead of newing
[16:08:22] this
[16:08:22] Text uriHost = new Text("en.mobile.wikipedia.org");
[16:08:23] vs.
[16:08:26] Text uriHost = "en.wikipedia.org";
[16:08:36] in this case, Text is a
[16:08:39] import org.apache.hadoop.io.Text;
[16:08:39] javascript:;
[16:08:41] oops
[16:08:53] org.apache.hadoop.io.Text
[16:08:58] which is a special Hadoop object
[16:09:01] not a default Java String
[16:09:02] aha
[16:09:04] doy
[16:09:38] note that the refinery.core.Webrequest class works with regular Strings
[16:09:41] cause it is supposed to be generic
[16:09:54] since Hive will always run in hadoop context, it works with Hadoop objects
[16:10:02] yeah
[16:11:28] this is making my internal C++ programmer really antsy
[16:11:39] "you're calling new but there's no equivalent free() or delete! EVERYONE IS GOING TO DIE AND IT WILL BE THIS CODE'S FAULT"
[16:12:11] haha
[16:12:18] well, at least you feel that way!
[16:12:22] okay, fixed, testing again
[16:12:24] i think that is a good feeling
[16:12:28] I spent all of yesterday tracking down a single memory leak
[16:12:33] when people start with java they don't have that!
[16:12:40] heh
[16:14:18] hmn
[16:14:30] ottomata, if I just include "foo" in a match pattern in Java, will it match barfoobar?
[16:14:49] Ironholds: i think not, which is strange.
[16:14:57] that's why I prepended the .*s in the regexes
[16:15:04] hrrrm
[16:15:12] i use this to check: http://www.regexplanet.com/advanced/java/index.html
[16:15:13] :)
[16:15:38] it wants normal regexes there, not java style
[16:15:42] e.g. you don't need \\.
[16:15:43] just \.
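On the "foo" vs. barfoobar question above: in `java.util.regex`, `Matcher.matches()` (and `String.matches()`) succeeds only when the pattern covers the whole input, while `Matcher.find()` looks for the pattern anywhere in it. That is why a leading and trailing `.*` makes a whole-string match succeed. The sketch below uses only standard library calls; the class and helper names are made up for illustration. It also shows the escaping point: the regex `\.` has to be written as `"\\."` inside a Java string literal, and a bare `\.` in Java source is rejected by the compiler as an illegal escape.

```java
import java.util.regex.Pattern;

class RegexMatchDemo {

    // Whole-string match: true only if the pattern covers the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Substring search: true if the pattern occurs anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("foo", "barfoobar"));         // false
        System.out.println(wholeMatch(".*foo.*", "barfoobar"));     // true
        System.out.println(containsMatch("foo", "barfoobar"));      // true

        // An unescaped dot matches any single character, so this is too loose.
        System.out.println(containsMatch("api.php", "apiXphp"));    // true

        // "\\." in Java source is the regex \. and matches only a literal dot.
        System.out.println(containsMatch("api\\.php", "apiXphp"));    // false
        System.out.println(containsMatch("api\\.php", "/w/api.php")); // true
    }
}
```

Using `find()` avoids the need to wrap every pattern in `.*`; which of the two the patch under review should use is a design choice for its authors.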
[16:16:41] ohhh
[16:16:55] that'd do it
[16:17:55] nope
[16:18:01] "illegal escape character"
[16:18:55] ....wtf?
[16:18:59] ?
[16:19:00] ottomata, api.php matches api.php
[16:19:09] that is, a regex, api.php, matches a string, api.php
[16:19:16] you don't need to escape it at ALL? How can that be right?
[16:19:21] . == single character
[16:19:24] any character
[16:19:35] aha
[16:19:44] yet api\.php is an illegal escape, and api\\.php fails.
[16:20:04] wait, no, api\\.php works.
[16:20:06] ...sod it.
[16:20:09] all the tests pass
[16:20:12] want the code? :p
[16:20:19] api\.php works
[16:20:23] in the tester
[16:20:31] which means api\\.php should work
[16:20:33] aye
[16:20:34] (PS4) OliverKeyes: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata)
[16:20:36] yeah, submit, :)
[16:20:39] done!
[16:20:45] maven checks pass
[16:20:52] Ironholds: if you want to build it and run it in hadoop
[16:20:52] do
[16:20:57] mvn clean package
[16:20:57] then
[16:21:31] scp refinery-hive/target/refinery-hive-0.0.3-SNAPSHOT.jar stat1002.e:~/
[16:21:35] then you've got the .jar on stat1002
[16:21:41] and you can ADD JAR like you know how to already :)(
[16:21:49] oh
[16:21:50] sorry
[16:22:00] scp refinery-hive/target/refinery-hive-0.0.3-SNAPSHOT.jar stat1002.eqiad.wmnet:~/
[16:22:04] cool! Thankee :)
[16:22:05] but ja
[16:22:18] so I can just iterate on this endlessly?
[16:22:20] yup
[16:22:25] yay!
[16:31:39] ottomata, will be in the dev thingy in ~5. Just gimme the hangout URI :)
[16:31:41] Ironholds: https://plus.google.com/hangouts/_/wikimedia.org/aotto-jgerard
[16:31:44] k
[16:31:51] yay!
[16:31:55] also, Java is not as hard as I expected.
[16:32:01] I've been told it's a wretched hive of scum and villainy.
[16:32:26] naw, its easier than C++ Ironholds, for sure [16:32:32] but about the same [16:54:12] Analytics-Engineering, Analytics-EventLogging: EL office hours - https://phabricator.wikimedia.org/T76796#850892 (ggellerman) [16:56:58] (CR) Nuria: [C: 2] Use the wonderful mwoauth [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/179962 (owner: Milimetric) [16:58:28] ottomata, thoughts on making tests more exportable - https://gist.github.com/Ironholds/30c09e6b4402bfc0e967 [16:58:40] that's what I was trying to say but failing :D [17:02:29] Analytics-Wikimetrics: Fix oauth and do a quick pre-security review [13 pts] - https://phabricator.wikimedia.org/T76779#850902 (Milimetric) Open>Resolved [17:02:31] Analytics-Wikimetrics, Analytics-Engineering: EPIC: Productionizing Wikimetrics - https://phabricator.wikimedia.org/T76726#850903 (Milimetric) [17:09:05] qchris_meeting: what?! [17:09:16] tnegrin, okay, you may have been right. [17:09:33] "I'm not a java programmer! This is a one-off! Would anyone mind if I do some restructuring of the user-agent parser?" [17:09:34] about what? [17:09:39] heh [17:09:47] welcome what you cannot avoid [17:09:50] It'd make sense to have the actual parser as a Java function and the UDF Just do the type-checking and conversion [17:09:58] at the moment it's all one blob [17:10:15] so I figure, first I split, and then I can add Wikimedia-specific spider UAs to the Java function [17:10:16] * Ironholds nods [17:10:17] Ironholds: Java ain't so bad! come on in! [17:10:28] You say that now [17:10:44] and then in 6 months R&D see me as an engineer with ggplot2, and AnEng see me as a researcher with maven [17:10:59] and I have an existential crisis and flee to Peru, where I live under an assumed name and drink at the beach all day [17:11:04] it'll happen, I guarantee it. 
[17:11:15] sounds likea good life [17:11:16] (I may or may not have watched too many bad spy movies) [17:12:18] I don't think I could deal with the heat, though :( [17:16:17] gotta run to post office, bbiab [17:16:31] too hot? this looks cold to me :P http://www.weatherbase.com/weather/weather.php3?s=82648&cityname=Lima-Peru [17:26:45] Analytics-EventLogging: EventLogging ValidateSchemaTest::testValidEvent() fails under HHVM - https://phabricator.wikimedia.org/T78680#850963 (hashar) NEW [17:38:18] Analytics-Engineering: Split ua-parser UDF - https://phabricator.wikimedia.org/T78685#851017 (Ironholds) NEW [17:42:45] Analytics-Engineering: Include Wikimedia-specific spiders in ua-parser UDF - https://phabricator.wikimedia.org/T78686#851031 (Ironholds) NEW a:Ironholds [17:42:59] Analytics-Engineering: Include Wikimedia-specific spiders in ua-parser UDF - https://phabricator.wikimedia.org/T78686#851031 (Ironholds) [17:43:54] Analytics-Engineering: Oozie tutorial - https://phabricator.wikimedia.org/T78687#851042 (ggellerman) NEW [18:05:10] Analytics-EventLogging: find a better way to identify events that fail validation as early as possible - https://phabricator.wikimedia.org/T78355#851094 (kaldari) > It's possible (and easy) to set something up that watches invalid events in real-time and does something with them. The question is: what? E-mail... [18:11:04] ironholds: for the user agent parser udf you mean creating a static class that does the parsing? [18:11:28] Ironholds: that is done by UA parser already right? [18:11:42] nuria__, see ottomata|afk 's pageview identifier, where the identifier is one file and the UDF wrapper is another [18:12:02] If people think the current bundling of the tool is cool, that's cool :). 
I'll just add the WM-specific spiders and limit my changes to that [18:12:10] Ironholds: right, but in that case there is no external library like ua parser that is doing all the lifting [18:12:23] ahh, fair point [18:12:25] Ironholds: we are just wrapping ua parser a bit [18:12:27] okay, let's leave it :) [18:12:37] I'll incorporate the spiders as I would anyway and close that phab ticket [18:12:44] Ironholds: otherwise i feel we are wrapping the wrapper [18:12:51] so we can wrap while we wrap [18:12:54] Ironholds: also check Generic udf versus udf [18:13:01] will do! [18:13:23] Analytics-Engineering: Include Wikimedia-specific spiders in ua-parser UDF - https://phabricator.wikimedia.org/T78686#851222 (Ironholds) p:Triage>Normal [18:13:41] Analytics-Engineering: Include Wikimedia-specific spiders in ua-parser UDF - https://phabricator.wikimedia.org/T78686#851031 (Ironholds) [18:13:42] Analytics-Engineering: Split ua-parser UDF - https://phabricator.wikimedia.org/T78685#851223 (Ironholds) Open>declined a:Ironholds [18:24:46] Phabricator, Engineering-Community, Analytics-Tech-community-metrics: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#851249 (Aklapper) >>! In T1003#850735, @Dzahn wrote: > let me know if you don't receive it daily from now on Uh, //daily//? The idea was monthly (... [18:25:12] phew, man post office is busy! [18:26:14] nuria__: Ironholds, ja i remember this now [18:26:24] we didn't add a refinery.core class because ja, it was just a ua-parser wrapper [18:26:28] ottomata: yessss [18:26:29] but, Ironholds wants to add extra logic, right? 
[18:26:36] yup [18:26:46] so if that is the case, then there should be a new refinery.core class [18:26:51] just a pattern and then a check against it in the "else" of "if(device.family != NULL)" [18:26:54] hrm [18:26:54] so that tools other than hive can share Ironholds' new logic [18:27:14] Ironholds, ottomata [18:27:18] i see [18:27:37] oh, that makes sense [18:27:56] ottomata: but do all those calsses taht we wnat to share need to be static [18:27:57] ? [18:28:01] and then we have the ua-parser UDF inherit from that as well [18:28:02] sensible [18:28:06] that kind of does not sound so great [18:28:11] *that [18:28:22] (FWIW, in a health check, so may not have the best responsiveness) [18:28:31] Ironholds: no inherit please rather composition [18:28:33] no [18:28:36] gotcha [18:28:41] they do not need to be static necessarily [18:28:49] i just haven't seen a need for anything non-static yet [18:29:06] I created a Webrequest class for pageviews stuff just now, not sure that was a good idea [18:29:17] but, perhaps it will turn into a generic Webrequest object model, iunno [18:29:34] i have that for Revisions in the avro stuff i'm doing [18:29:49] if that were the case [18:29:56] then new Webrequest(..) might make sense [18:30:12] but, for hive UDFs (main use case right now), static functions is all we need [18:31:41] ottomata: last word goes to qchris in CR he can comment on that but unit testing of loads of static functionality might not be so friendly [18:32:11] ottomata: specially if some static functions end up using others, you will not be able to do partial testing [18:32:11] naw, more friendly! its all functional like! no need to share state. 
Inputs and outputs only :) [18:32:28] aye [18:32:42] ottomata: that you can do with classes non static that also keep ability to be mocked [18:32:50] ottomata: just as well [18:33:59] ottomata: no shared state is good agreed, but that does not imply static [18:36:06] s/shared/saved/ :) [18:36:19] but ja, ingeneral, i agree. wahtever the use case is makes sense [18:39:41] (CR) Milimetric: Strip project when uploading cohort (2 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180071 (owner: Mforns) [18:40:03] thanks milimetric, having a look [18:41:16] looks good Marcel, only some doc comments and i'll merge then [18:41:26] ok [18:41:32] oh, you know, i should run the tests myself. one sec [18:42:05] ok [18:42:28] oh, i guess mforns: rebase first and do the couple doc things [18:42:35] 'cause otherwise i have dependency annoyances [18:47:42] ottomata: ok, i was able to do what i wanted on puppet [18:47:53] ottomata: hopefully my changes do not make you cry [18:48:17] ottomata: (feel free to ask to re-do if needed) https://gerrit.wikimedia.org/r/#/c/180222/ [18:51:31] milimetric: you around? [18:52:06] trying to unwrap vital signs v. wikimetrics v. wikimetrics features for grantmaking [18:52:07] hi tnegrin yes [18:52:16] sure - batcave? [18:52:24] great [18:52:27] brt [18:53:06] old or new? [18:53:11] nuria__: can you explain the motivation for the change? [18:53:18] what's the readonly user for? [18:53:32] ottomata: requested by grantmaking [18:53:42] to access the database? [18:53:48] outside of wikimetrics app? [18:54:14] ottomata: yes, to run their "own" reporting [18:54:22] nuria__: I think you shouldn't puppetize it then. :/ [18:54:26] just run the grant [18:54:32] or, puppetize it outside of wikimetrics [18:54:43] if you like, maybe in the role class. but still, i dunno [18:55:06] ottomata: but wait, why not? if it is not there we will move machines and it will stop working [18:55:32] ottomata: is a security concern? 
[18:55:41] naw, just doesn't (yet) make sense to me [18:55:47] um [18:55:55] ${options} ${privileges} ON ${title}.* [18:56:00] here, title is 'research' [18:56:05] is there a database called 'research'? [18:57:33] ottomata: no arghhhhh [18:57:52] why do they want access to the wikimetrics database? [18:57:59] isn't there just wikimetrics cohort metadata there? [18:58:12] and i guess, stored results? [18:58:24] ottomata: yes, both, kevinator can answer that [18:58:52] ok, so I still think this shouldn't be in the wikimetrics module. you are granting access to a database for another 'app' or 'client' [18:59:27] what if researchers wanted access to the production mediawiki databases? you wouldnt' put that in the mediawiki module. [18:59:35] maybe the define is ok... [18:59:40] no tsure [18:59:42] i think not [18:59:53] but the usage of the define (for this research user) should be outside of the module [19:00:08] let's say grantmaking was actually building an app on top of wikimetrics database [19:00:21] the puppetization of that app would not go into the wikimetrics module [19:00:32] it would be ok for that puppetization to depend on the wikimetrics module [19:00:39] i see ottomata should be where then? in the role? [19:01:05] sure. i mean, the define is kinda ok. you are just abstracting out the creation of a mysql user. ideally that would be in a mysql module...buuuut, let's not go there :) [19:01:31] :-) [19:01:49] ottomata: i did looked at that , boy there are some sophisticated mysql puppet modules but i think we do not use those [19:02:15] ottomata: grantmaking wants DB access to that wikimetrics meta-data so they can get a sense of what features are used in wikimetrics [19:02:28] yeah, nuria__, indeed, they exist but we don't use them. [19:02:37] what features are used? [19:02:48] kevinator: is this just a one off so they issue some queries? [19:02:58] ottomata: yes [19:03:07] then ja, nuria__, just do it manually :) [19:03:28] kevinator: features? 
i thought you said data prior [19:03:39] ottomata: features are looking at average cohort size, how often a tag name is used etc [19:03:59] aye, but will they be 'productionizing' something using this access? or are they just currently curious? [19:04:07] like, they want to know, they'll have somebodoy run some queries and report back [19:04:09] replace features with stats [19:04:11] and then not mess with it? [19:04:21] ottomata: ok, will abandon change and run the grants now [19:04:35] nuria__: btw, nice work though! overall it looks like a good change! [19:04:40] they don’t want to productionize this. They’ll run queries every so often and create a manual report [19:04:50] will correct the $title cause is easy enough to do [19:04:58] :) [19:05:40] ja, if we were going to use this, i'd say: keep the new define, but get rid of the ::user { 'research' use in database.pp, and move that into a role [19:06:04] the define looks like a nice abstraction, but let's not push it through now if we don't need it [19:06:09] ottomata: agreed [19:06:27] cool thanks, hope that didn't take too much of your time [19:07:17] I’m glad I could help [19:08:09] ottomata: i learned a bunch, i think i understand i know 1% of what is going on so no worries [19:08:29] ottomata: you make a lot of progress when you do not know ANYTHING [19:10:30] hah [19:10:47] ottomata: corrected database for completition but will leave change unmerged [19:12:26] cool. [19:20:30] Analytics-Wikimetrics: Wikimetrics auditor has read-only login to Wikimetrics DB [3 pts] - https://phabricator.wikimedia.org/T76109#851445 (Nuria) Per ottomat's suggestion we just run the grants for the read only user in production (use is called "research") rather than puppetize teh setup. User is available... [19:21:27] Analytics-Wikimetrics: Wikimetrics auditor has read-only login to Wikimetrics DB [3 pts] - https://phabricator.wikimedia.org/T76109#851448 (Nuria) User "research" only has permits to "wikimetrics" database. 
[19:21:35] Analytics-Wikimetrics, Analytics-Engineering: Epic: Grantmaking User gets reports on Wikimetrics usage - https://phabricator.wikimedia.org/T76106#851452 (Nuria) [19:21:37] Analytics-Wikimetrics: Wikimetrics auditor has read-only login to Wikimetrics DB [3 pts] - https://phabricator.wikimedia.org/T76109#851451 (Nuria) Open>Resolved [19:23:03] (PS4) Mforns: Strip project when uploading cohort [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180071 [19:23:33] milimetric, I deployed into staging and tested it, everything seems to be working fine [19:23:44] if you want to have a look it's there [19:23:45] cool, testing [19:25:11] Analytics-Dashiki, Analytics-Wikimetrics, Analytics-Engineering: Remove the "confusing" under reported data for Edits and Pages Created in Vital Signs - https://phabricator.wikimedia.org/T75617#851474 (Milimetric) a:Milimetric [19:27:03] Analytics-Wikimetrics, Analytics-Engineering: Epic: WikimetricsUser deletes user from cohort - https://phabricator.wikimedia.org/T76421#851478 (Nuria) [19:27:05] Analytics-Wikimetrics: Story: WikimetricsUser deletes user from cohort [21 pts] - https://phabricator.wikimedia.org/T75350#851477 (Nuria) Open>Resolved [19:27:23] Analytics-Wikimetrics, Analytics-Engineering: Epic: WikimetricsUser deletes user from cohort - https://phabricator.wikimedia.org/T76421#799951 (Nuria) [19:27:24] Analytics-Wikimetrics: Story: WikimetricsUser deletes user from cohort [21 pts] - https://phabricator.wikimedia.org/T75350#764487 (Nuria) Resolved>Open [19:28:53] (PS5) Milimetric: Strip project when uploading cohort [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180071 (owner: Mforns) [19:31:55] kevinator: looking for suggestions what to pick up next. 
In the meantime I will add icon http://bits.wikimedia.org/static-1.25wmf11/skins/Vector/images/external-link-ltr-icon.png to dashiki [19:34:14] (CR) Milimetric: [C: 2] Strip project when uploading cohort [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180071 (owner: Mforns) [19:37:07] nuria__ / kevinator: batcave in a few minutes? [19:37:10] kevin's finishing something up [19:37:19] milimetric: sure [19:37:44] Analytics-Wikimetrics: Story: WikimetricsUser deletes user from cohort [21 pts] - https://phabricator.wikimedia.org/T75350#851489 (mforns) [19:37:45] Analytics-Wikimetrics: enwiki does not show up in my centralauth expanded cohort - https://phabricator.wikimedia.org/T78584#851488 (mforns) Open>Resolved [19:38:22] Analytics-Wikimetrics, Analytics-Engineering: Epic: WikimetricsUser deletes user from cohort - https://phabricator.wikimedia.org/T76421#851492 (mforns) [19:38:24] Analytics-Wikimetrics: Story: WikimetricsUser deletes user from cohort [21 pts] - https://phabricator.wikimedia.org/T75350#851491 (mforns) Open>Resolved [19:42:24] nuria__: milimetric ok, give me 10 minutes more… talking to toby [19:57:29] qchris: hey, do you have a couple of minutes? [19:57:49] Sure. [19:57:57] What's up? [19:58:07] so, redirects are not valid json, which is why you couldn't find any schema moves, but there have in fact been some [19:58:41] (PS1) Milimetric: Add link to privacy policy on all pages [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180242 [19:58:45] So the page table does not list them as redirects? [19:58:48] mhmm. [19:59:02] see [19:59:19] milimetric nuria__ I’m ready to meet in the batcave [19:59:39] I asked Kaldari about it yesterday, since he's both a prolific admin and a user of EventLogging, and his take was: preserve the ability to move schema pages if it's not too much of a pain in the ass, and remove it otherwise. [20:00:03] (CR) Milimetric: "not sure about how this looks, but it's nice that the policy is now linked everywhere." 
[analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180242 (owner: Milimetric) [20:00:05] so how to preserve it? [20:00:07] I played around with it yesterday and got here: https://gist.github.com/atdt/fb70ccbb342b9a9bb426 [20:00:09] kevinator: omw [20:00:30] Oh :-) [20:00:35] that's an awful query, but since it's scoped to the schema namespace on metawiki, it's not awful [20:00:45] but there are still some tricky edge-cases to work out [20:01:09] I guess I am wrong ... I figured we trimm logging after $SOME_NUMBER of days? [20:01:19] that may be true, I'm not sure [20:01:48] the other thing I tried was [20:02:09] JSON schema actually specifies a 'title' property, so we could save the title with the page [20:02:50] unfortunately when you move a page it creates a no-diff revision, and as best as i could determine there is no way to intercept that and get it to save the new title [20:03:14] Mhmmm ... maybe we can solve the problem on a different end. [20:03:19] so I'm kinda leaning towards "this is a pain in the ass.. let's not allow moves" [20:03:27] Oh? Do you have something in mind? [20:03:37] I mostly care about not using the unvalidated title that is stored in the event. [20:03:51] Maybe we can just infer [20:03:56] ottomata, got a sec? [20:03:59] the table name from the title that schema has? [20:04:02] Ironholds: sure [20:04:09] ori: Would that work for you? [20:04:13] so, can you tell me what the expected outcome of the pageviews UDF is? [20:04:26] by that I mean, is it going to be merged, or held, or...? [20:04:32] I want to work out what to do if I make subsequent modifications that aren't in the scope of patch amendments. [20:04:36] *commit amendments [20:04:50] ori: It might need a bit of shoehorning to get the schema's title there (I have not checked yet) [20:05:07] ori: But then we'd also have a known-good table name. [20:05:26] ori: And redirects would still work. 
[20:05:36] s/redirects/moves/ [20:06:10] Ironholds: I think it should be merged [20:06:22] it will need more review (from me and qchris) and more testing before it is merged [20:06:37] gotcha [20:06:43] okay, I will continue patching in the meantime :) [20:06:53] Ironholds: you can always make changes in a branch (local or remote) for yourself [20:06:55] but [20:06:56] and when it is merged I will use it as a basis for a UDF for the existing filters [20:06:57] yeah [20:07:00] the intention for these udfs is [20:07:03] they will be merged [20:07:11] and release jars will be built and added to archiva [20:07:15] and they will be deployed [20:07:16] yup [20:07:21] and then added to hive auxpath (i think) automaitically [20:07:31] also, in some newer version of hive [20:07:35] there are permanent functions for udfs [20:07:46] so, ideally, you wouldn't have to create the function all the time [20:07:49] you'd just start up hive [20:07:54] and have is_pageview() around for use [20:09:33] qchris: but what about: i create 'MySchema'; i log stuff; i rename it to 'MyRenamedSchema'. A database outage necessitates backfilling events that were logged before the rename. Is it OK for the events to go into a table named 'MyRenamedSchema' now? [20:09:55] (CR) Nuria: "I think it needs UX work, can we move the "privacy link" to the top bar? otherwise it really looks strange. See screenshoot on ticket: T76" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180242 (owner: Milimetric) [20:10:39] ori: I'd say yes. If the schema got moved, it ist still the same schema. So we should also have moved the table along with it. [20:10:55] Otherwise, they are not the same schema, and the should not get moved. [20:11:02] qchris: in that case, this should be pretty easy. the eventlogging api module (that serves schemas) interpolates the title into the response. [20:11:08] $result->addValue( null, 'title', $title->getText() ); [20:11:13] in ApiJsonSchema.php [20:11:37] ottomata, indeed. 
[20:11:40] also, a thought comes to mind [20:12:04] if we're going to be having to put the output of this somewhere so it can be compared to WSC's output, and also having to create a WSC UDF, we may want to just have an oozie task that runs both. [20:12:08] ori: Yes, that line is how I found out about that old bug around the title. [20:12:26] kevinator, ggellerman: can I ping someone on Phab? [20:12:28] qchris: so what we're saying is: we use the reported title to generate the table name, but we don't validate that the event's schema name matches the title reported by the api? [20:12:45] why do we have to have a WSC UDF? [20:13:17] ori: I'd say so. But I know little about EventLogging and their users :-) [20:13:38] ori: You think that approach does not make sense outside of qchris' head? [20:13:41] qchris: between that and not allowing renames, what would be your preference? [20:14:05] I am not fond of renames. I'd much rather treat names as cast in stone. [20:14:07] I think it makes sense, but it's not simple [20:14:09] yeah. [20:14:13] OK, I think I agree. [20:14:18] DarTar: you can add them to cc in a ticket. Depending on their settings, they will get an email with every change to the ticket [20:14:45] fungible immutability is no immutability at all [20:14:58] right, that’s one of the things I hate about BZ/Phab: spam [20:15:09] DarTar: your vacation autoresponder says you love e-mail! [20:15:11] pings are very convenient [20:15:25] ori: that was my summer 2014 autoresponder [20:15:26] ori: Agreement is on what. Forbidding renaming, or using the name from the revision's current title? [20:15:36] qchris: forbidding renaming. [20:15:45] I should use one of these gmail plugins to plot the size of my inbox over time [20:15:52] ori: Ok. Forbidding renaming it is. [20:16:06] ori: Thanks. [20:18:18] ottomata, the UDF takes 146 seconds to grab 1m pageviews [20:18:21] holy crap, Java. Y u so fast. 
[20:18:49] ha, cool [20:19:39] Ironholds: I'll quote you on that :-P [20:19:52] it's like someone took C++ and made it less than a massive pain in the ass to write in. [20:20:00] Ironholds: remember that script you have running over sample logs that tells you of app uniques? [20:20:05] yup? [20:20:19] only, not sampled logs, it's running over hadoop [20:20:20] just HQL [20:20:21] Ironholds: is all data needed to get those uniques in hive too? [20:20:29] oh, totally! It's running over that data now. [20:21:43] I can sense a rush of UDF reviews from Ironholds in our future :)_ [20:21:50] Ironholds: is it running from hadoop data [20:21:54] ? [20:22:01] now? [20:22:15] nuria__, yup! [20:22:26] frankly it'd probably be trivial to infrastructure-ise [20:22:28] I keep meaning to submit a card and then forgetting [20:22:44] Ironholds: i think we can do it this upcoming sprint, can you point me to teh code? [20:22:46] *the [20:22:58] totally! [20:23:11] let me see if I have a local copy... [20:23:42] ach, I do not. /home/ironholds/R/UUIDs/uuid.R [20:23:56] it's junky as hell because it's crontab-dependent and cron does not recognise "every 31 days" :/ [20:23:58] Ironholds: so ... ahem.... is the code committed somewhere? [20:24:07] Ironholds: junky .. jajajaj [20:24:53] so it is R on top of hadoop? [20:25:50] ah. ahahah. oh no. [20:25:50] R script launches, creates a temp file containing the query [20:26:02] makes a system() call launching hive -f temp_file > temp_out.tsv [20:26:05] reads in temp_out.tsv [20:26:08] ..and then processes [20:26:14] when I say junky I mean JUNKY :d [20:26:31] it's not even distributed over hadoop, it's just HQL with a bit of formatting for pretty printing and making sure it's running when it should. [20:26:40] and naw, it's not committed because it was such a simple script [20:26:42] ok, so there is a hive query right? [20:26:45] but I can send it to you if that'd help? 
[20:26:46] oh yes [20:27:34] Analytics-Dashiki, Analytics-Engineering: Vital Signs user reads description of metric - https://phabricator.wikimedia.org/T76741#851568 (Milimetric) a:Milimetric [20:27:53] Analytics-Wikimetrics, Analytics-Engineering: Re-run Wikimetrics data once Labs issues are fixed [8 pts] - https://phabricator.wikimedia.org/T78305#851569 (Milimetric) a:Milimetric [20:28:05] Ironholds: ok, i can pretty-fy (oozie +udf , not sure) it but i need the hive query as i have no idea as of the business rules of app uniques [20:28:48] totally [20:28:49] will send over! [20:29:15] Ironholds: put that in a gist, i would like to look at it too [20:29:27] okie! [20:29:34] Ironholds, ottomata , qchris: for this card (app uniques on hive) https://phabricator.wikimedia.org/T76534 [20:30:07] https://gist.github.com/Ironholds/428014d22edb7969ff5c have also emailed so it doesn't get lost in logs [20:30:07] * qchris does not look at cards with "app uniques" unless community said we should do it. [20:30:23] we implement a query (a la webstatscollector) that dumps files in a known location? [20:30:26] qchris is a sensible fellow avoiding work which gives me both the heebies and indeed the jeebies [20:30:41] nuria__, yep, stat1002:/a/aggregate-datasets/apps/ [20:30:52] and then that gets rsynced over to the same place we throw out all our datasets [20:31:20] Analytics-Engineering, Analytics-Refinery: Mobile Product manager has daily App Uniques report generated using Hive - https://phabricator.wikimedia.org/T76534#851578 (Ironholds) https://gist.github.com/Ironholds/428014d22edb7969ff5c boop [20:31:48] ehhh Ironholds those are not app uniques, right? [20:31:52] those are installs [20:32:09] Ironholds: ah no, wait [20:32:28] qchris comment makes me feel like a minion [20:33:15] nuria__: I don't want to make you feel like a minion. Sorry. Everyone got decide whether or not thon wants to do something. [20:33:37] I think for app uniques, we just feel different. That's ok. 
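[editor's note] The wrapper Ironholds describes earlier in the log (write the query to a temp file, call `hive -f temp_file`, read the tab-separated output back in) could be sketched in Python roughly as below. This is only an illustration of that pattern, not the actual uuid.R script: the function name `run_hive_query` and the injectable `runner` argument are hypothetical, and the only thing taken from the log is the `hive -f` invocation.

```python
import csv
import io
import os
import subprocess
import tempfile


def run_hive_query(query, runner=subprocess.check_output):
    """Write `query` to a temp file, run `hive -f` on it, return the TSV rows.

    `runner` is a hypothetical hook so the function can be exercised
    without a Hive installation; by default it shells out to `hive`.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".hql", delete=False) as handle:
        handle.write(query)
        query_path = handle.name
    try:
        output = runner(["hive", "-f", query_path])
    finally:
        # Always clean up the temp query file, even if hive fails.
        os.remove(query_path)
    text = output.decode("utf-8") if isinstance(output, bytes) else output
    return list(csv.reader(io.StringIO(text), delimiter="\t"))
```

Passing a stub as `runner` makes the parsing step checkable on a machine without Hive; scheduling (the "every 31 days" cron problem mentioned in the log) is deliberately left out.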
[20:34:20] Analytics-Dashiki, Analytics-Engineering: Vital Signs user knows to click on metric title to open definition - https://phabricator.wikimedia.org/T76741#851585 (kevinator) a:Milimetric>None [20:34:35] qchris: more like it is already done, as in already implemented. But I really wish there was some cohesion here [20:35:14] Analytics-Dashiki, Analytics-Engineering: Vital Signs user knows to click on metric title to open definition - https://phabricator.wikimedia.org/T76741#819194 (kevinator) Updated the title and description of this task per discussions with dev team on best way to achieve our intentions. [20:35:40] I cannot much control what other teams do. But I can control what I work one. If it's ethically not sound for me to work on something, I don't work on it. [20:39:20] yeah. I...do not have that freedom :/ [20:39:20] FWIW, I also find this intensely creepy [20:39:29] frankly every time I meet someone and they have the app I tell them to uninstall it. [20:40:07] Everyone has the freedom. Just do not let some rob you that freedom. [20:40:26] If I refuse to work on in, I might get fired. [20:40:38] If we all refuse to work on it, they don't get the work done. [20:40:55] Also I think we all get fired [20:41:04] ;p [20:41:42] :-) [20:43:26] Ironholds: you find creepy the appinstallId, right? [20:43:34] Ironholds: not the app itself [20:43:59] yep [20:45:38] Ironholds: how long does the query take to eun? [20:45:41] *run? [20:46:05] nuria__, I do not actually know! [20:46:14] "a day-ish". It's crunching through 31 days of data [20:46:18] ...wait, crap. [20:46:25] we've started storing more than 31 days in hive, haven't we. [20:46:51] i do not know, ottomata : are we storing more than 31 days in hive? 
[20:47:17] if we are, we will need some way to automatically generate BETWEEN statements for the times :/ [20:47:47] nope [20:47:49] shouldn't have [20:49:10] uhh [20:49:12] (CR) Ori.livneh: [C: 2 V: 2] Add requirements.txt [analytics/blog] - https://gerrit.wikimedia.org/r/180156 (owner: QChris) [20:49:17] well, we have partitions from the 10th of November, sooo... [20:50:08] wait, earlier than that [20:50:10] webrequest_source=upload/year=2014/month=11/day=5/hour=1 e.g. [20:50:11] (CR) Ori.livneh: [C: 2 V: 2] Fix flake8 errors [analytics/blog] - https://gerrit.wikimedia.org/r/180157 (owner: QChris) [20:52:08] (CR) Ori.livneh: [C: -1] "1) Document the expected date format in the argument help." [analytics/blog] - https://gerrit.wikimedia.org/r/180158 (owner: QChris) [20:52:43] ok ottomata Ironholds so we actually have more than 31 days [20:52:53] (CR) Ori.livneh: [C: 2] "Looks good, feel free to merge." [analytics/blog] - https://gerrit.wikimedia.org/r/180159 (owner: QChris) [20:53:09] yeah :( [20:53:18] yay, found a bug in the pageviews UDF :d [20:53:25] (not sarcastic. Bugs means patches) [20:53:25] (CR) Ori.livneh: [C: 2] Move blogreport code behind a __main__ guard [analytics/blog] - https://gerrit.wikimedia.org/r/180160 (owner: QChris) [20:53:26] Ironholds: did you try to "sample" (using hive's random sampling) [20:53:52] (PS2) Milimetric: Add link to privacy policy on all pages [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180242 [20:53:58] no? 
[20:54:01] I mean, the task runs once a week [20:54:03] Ironholds: I cannot find a good reason why you would need to look at every record to estimate the uniques [20:54:31] (CR) Ori.livneh: "Add tests for dates set in the future (after updating https://gerrit.wikimedia.org/r/#/c/180158/ to reject such dates)" [analytics/blog] - https://gerrit.wikimedia.org/r/180161 (owner: QChris) [20:54:32] because they wanted a precise count, and because when I checked the randomised sampling methods, they appeared to be of the form "grab the first N rows from each block" [20:54:41] and the problem with that is that we partition based on temporal data [20:55:12] hmn. wait. actually, that objection doesn't make sense. Stupid brain! [20:55:18] they wanted a precise count. I don't know if an estimate is okay. [20:57:52] OO, Ironholds, good catch [20:57:57] i see errors in the drop partitions script [20:57:58] logs [20:58:02] investigating [20:58:09] yay! [20:58:09] I mean, that it was caught [21:01:16] Ironholds: you can have quite a precise count with sampling, there is random sampling on hive [21:01:41] fair! But as said, I don't know how the product owner would feel [21:01:44] we can ask. Deskana? [21:04:44] (PS1) Fhocutt: Save history.json in the config folder [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180296 [21:05:15] fhocutt: looking [21:05:15] ah [21:05:17] qchris: ! [21:05:21] thanks, milimetric [21:05:38] Ironholds: You can reduce the number of records sampled until the estimation for unqiues you get is, say, 1% different than the one you have. [21:05:48] ottomata: wassup? 
[21:05:55] /wmf/data/raw/webrequest/webrequest_mobile/hourly/2014/12/03/17.duped/ [21:06:10] File "/usr/lib/python2.7/dist-packages/dateutil/parser.py", line 303, in parse [21:06:10] raise ValueError, "unknown string format" [21:06:10] ValueError: unknown string format [21:06:15] this [21:06:15] 2014/12/03/17.duped [21:06:20] :-) [21:06:23] is getting passed to dateutil_parse [21:06:24] Ironholds: but -given the amount of data- you can sample quite a bit and still report with a high degree of confidence [21:06:25] You can remove it. [21:06:31] nuria__, okay. I'm sort of confused, I guess; do you want me to make that change or are you saying you're going to make it as part of the productionisation, as it were? [21:06:31] Let me do that. [21:07:01] Ironholds: i will do it, my point was [21:07:03] k danke! the script shoudl be more robust, and not die. [21:07:09] but in future, it would be better to keep temp stuff like that out of the data dir [21:07:16] even if there isn't a partition on it [21:07:26] checking if I can fix script. [21:07:39] that you do not need a query that runs for a day when you can get those same results in less time using sampling, [21:07:42] (CR) Milimetric: "good, just one naming nitpick" (1 comment) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180296 (owner: Fhocutt) [21:07:44] fair point [21:08:12] ottomata: I kept it there on purpose, so people would find it, if they looked for it :-/ [21:08:16] thanks, milimetric [21:08:28] Ironholds: i was thinking performance as part of productionizing but i wanted to let you know for when you CR it [21:08:49] gotcha :) [21:09:35] (PS2) Fhocutt: Save history.json in the config folder [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180296 [21:10:45] hokay, patching complete, runs :D [21:10:57] ottomata, can I add a commit message with amends? [21:11:22] ottomata: where (in refinery depo, if pertains) should the productionized hive queries go for the app unique report? 
[21:11:45] (CR) Nuria: [C: 2] Add link to privacy policy on all pages [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180242 (owner: Milimetric) [21:11:55] (Merged) jenkins-bot: Add link to privacy policy on all pages [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/180242 (owner: Milimetric) [21:12:44] (PS5) OliverKeyes: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [21:12:48] wikimedia/mediawiki-extensions-EventLogging#292 (master - 65143ca : Christian Aistleitner): The build was broken. [21:12:48] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/compare/3cc71f8bcd58...65143ca758a4 [21:12:48] Build details : http://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/44254784 [21:13:26] (CR) Milimetric: Save history.json in the config folder (1 comment) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180296 (owner: Fhocutt) [21:13:58] Analytics-Wikimetrics: Wikimetrics User reads disclaimer on website [3 pts] - https://phabricator.wikimedia.org/T76107#851690 (Milimetric) Open>Resolved [21:13:59] Analytics-Wikimetrics, Analytics-Engineering: Epic: Grantmaking User gets reports on Wikimetrics usage - https://phabricator.wikimedia.org/T76106#851691 (Milimetric) [21:15:49] Analytics-Engineering, Analytics-Wikimetrics, Analytics-Dashiki: Remove the "confusing" under reported data for Edits and Pages Created in Vital Signs - https://phabricator.wikimedia.org/T75617#851692 (Milimetric) [21:15:51] ottomata: the .duped should be gone. [21:16:06] Analytics-Wikimetrics, Analytics-Engineering: Re-run Wikimetrics data once Labs issues are fixed [8 pts] - https://phabricator.wikimedia.org/T78305#851693 (Milimetric) [21:16:43] k thanks qchris! 
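[editor's note] The failure discussed above (a stray `2014/12/03/17.duped` directory reaching the date parser and killing the drop-partitions script with `ValueError: unknown string format`) is the kind of thing a more tolerant loop avoids. A minimal sketch, assuming hourly paths end in `YYYY/MM/DD/HH`: the helper names here are hypothetical, and it uses a regex plus the stdlib `datetime` rather than the `dateutil` parser the real script calls, whose surrounding code is not shown in this log.

```python
import re
from datetime import datetime

# Assumed layout: .../hourly/YYYY/MM/DD/HH (taken from the path quoted in the log).
_HOURLY_SUFFIX = re.compile(r"(\d{4})/(\d{2})/(\d{2})/(\d{2})$")


def partition_datetime(path):
    """Return the datetime encoded at the end of an hourly partition path.

    Raises ValueError if the path does not end in YYYY/MM/DD/HH.
    """
    match = _HOURLY_SUFFIX.search(path.rstrip("/"))
    if match is None:
        raise ValueError("unknown partition path format: %r" % path)
    year, month, day, hour = (int(group) for group in match.groups())
    return datetime(year, month, day, hour)


def split_datable(paths):
    """Return (parsed, skipped): skip unparseable paths instead of dying on them."""
    parsed, skipped = [], []
    for path in paths:
        try:
            parsed.append((path, partition_datetime(path)))
        except ValueError:
            skipped.append(path)
    return parsed, skipped
```

Collecting the skipped paths rather than silently dropping them keeps leftovers like the `.duped` directory visible to whoever reads the script's output.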
[21:16:48] Analytics-Engineering, Analytics-Refinery: Mobile Product manager has daily App Uniques report generated using Hive - https://phabricator.wikimedia.org/T76534#851694 (Nuria) a:Nuria [21:16:55] halfak: you need to invite me I think? [21:17:00] sorry for breaking it in first place :-( [21:17:09] np [21:17:13] i'm making script better [21:17:18] ottomata: where (in refinery depo, if pertains) should the productionized hive queries go for the app unique report? [21:17:32] YuviPanda, {{done}} [21:17:52] nuria__: likely in oozie/ somewhere, if they are going be ooziefied [21:18:24] ottomata: If we want them to run the "last day of the month" we need to ooziefy them right? [21:18:54] (PS3) Fhocutt: Save history.json in the config folder [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180296 [21:19:13] ja should [21:19:34] nuria__: if we are going to productionize refinery/hadoop jobs, then they will probably need to be ooziefied, unless we change something [21:19:43] thus far only qchris and I have done this. [21:19:49] it sounds like ellery might be trying to do somethign too [21:20:11] the oozie stuff that exists in refinery now is really good, nice and clean, but it isn't the most straightforward to understand [21:20:14] i blame oozie and xml [21:20:16] travis-ci: Nothing broken. Runs fine locally and in jenkins. 
:-p [21:20:20] and our (pretty) use of abstractions :) [21:20:32] ottomata: ok, i will make the hive queries which will be oliver's but using some sampling and after i will seek your guidance [21:20:39] ok [21:21:10] ottomata: maybe an oozie class will be good [21:23:18] (PS1) Ottomata: Catch ValueError raised by hive.partition_datetime_from_path [analytics/refinery] - https://gerrit.wikimedia.org/r/180305 [21:23:24] nuria__: ja [21:23:39] i'd have to prepare for that a bit, if folks want to just get together and talk about oozie, that would be good [21:23:47] otherwise...i'd have to prepare :) [21:25:16] (PS4) Fhocutt: Save history.json in the config folder [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180296 [21:29:07] (CR) Milimetric: [C: 2] Save history.json in the config folder [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180296 (owner: Fhocutt) [21:29:11] (PS5) Milimetric: Save history.json in the config folder [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/180296 (owner: Fhocutt) [21:33:03] Analytics-Engineering: [Volunteer] Generate.py needs to write to a separate history.json file per configured instance - https://phabricator.wikimedia.org/T77936#851718 (Milimetric) Open>Resolved a:Milimetric Fixed by @fhocutt [21:43:07] Engineering-Community, Analytics-Tech-community-metrics, Phabricator: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#17323 (Qgil) This is just great. I'm so happy. Thank you! [21:50:54] ottomata: any idea of the number of records we have for 1 month of data in hadoop? [21:54:30] ha, um [21:54:38] select count(*) ...:) [21:54:38] or [21:54:42] napkin guess: [21:55:24] ottomata: napkin sounds good [21:56:06] hmm, maybe ~135K / second average (that's a guess) so [21:56:19] 361584000 [21:56:19] ? 
[21:56:30] oh no [21:56:38] 361584000000 [21:56:39] that many ^ [21:56:52] 135000*60*60*24*31 [21:58:18] Phabricator, Engineering-Community, Analytics-Tech-community-metrics: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#851771 (chasemp) can I get on this list? [21:59:26] hrm [22:01:20] Phabricator, Engineering-Community, Analytics-Tech-community-metrics: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#851782 (Qgil) [22:04:37] (Abandoned) Terrrydactyl: Add ability to delete wiki users [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/142045 (owner: Terrrydactyl) [22:06:28] MediaWiki-extensions-MultimediaViewer, Analytics, Multimedia: Create MediaViewer image varnish hit/miss ratio dashboard - https://phabricator.wikimedia.org/T78205#851804 (Tgr) Yeah, I didn't think of that. The Last-Modified header of thumbnails seems match when they were generated (Swift also adds an X-Timest... [22:08:09] MediaWiki-extensions-MultimediaViewer, Analytics, Multimedia: Create MediaViewer image varnish hit/miss ratio dashboard - https://phabricator.wikimedia.org/T78205#851811 (Tgr) Or when the last-modified header is older than the date header minus the difference between local times for request and response? That... [22:08:50] Phabricator, Engineering-Community, Analytics-Tech-community-metrics: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#851812 (Qgil) Is it possible to add these to the report? > Number of accounts created > Number of users logging And I agree with @Nemo_bis, sendi... [22:20:45] qchris, ottomata [22:20:55] Mhmm? 
[22:20:56] so, the pageviews UDF as it stands is working and not actively producing the wrong result [22:21:11] there are some idiosyncrasies to work out in the definition but we knew that would happen [22:21:26] given that it successfully compiles and runs and produces pretty much the right result, could someone look at +2ing the latest changeset? [22:21:46] then I can use that as the basis for a WSC UDF, and we can look at how we go about building something in oozie to produce equivalent numbers [22:22:28] I won't today. Got to finish up some other things and then off to bed. [22:22:39] I can look tomorrow if it is still unreviewed then. [22:22:42] and then we get the fun of "hourX has a difference. Okay, SELECT *, WSC_pageviews() AS wsc_outcome, prototype_pageviews() AS prototype_outcome FROM webrequest WHERE ...and hour=X and WSC_pageviews() != prototype_pageviews(), or whatever [22:22:46] okie! Thank you :) [22:23:39] Ironholds: I'm going to have to work on the test case stuff i think before qchris lets me merge it [22:23:45] will try to do some of that tomorrow [22:23:48] i gotta run too [22:23:52] laterrrs [22:24:06] laters ottomata [22:24:22] ottomata, gotcha! [22:24:34] in that case, want me to write some more tests, or did you mean converting them so they are less duplicative? [22:24:35] aw [22:33:35] (PS2) QChris: Add basic tox setup [analytics/blog] - https://gerrit.wikimedia.org/r/180159 [22:33:37] (PS2) QChris: Allow to specify date to compute the report for [analytics/blog] - https://gerrit.wikimedia.org/r/180158 [22:33:39] (PS2) QChris: Move blogreport code behind a __main__ guard [analytics/blog] - https://gerrit.wikimedia.org/r/180160 [22:33:41] (PS2) QChris: Add tests for parsing string to date [analytics/blog] - https://gerrit.wikimedia.org/r/180161 [22:35:15] (CR) QChris: "> 1) Document the expected date format in the argument help." [analytics/blog] - https://gerrit.wikimedia.org/r/180158 (owner: QChris)
[22:38:16] (CR) QChris: "> Add tests for dates set in the future" [analytics/blog] - https://gerrit.wikimedia.org/r/180161 (owner: QChris) [22:46:26] qchris, around? [22:46:42] * qchris is scared to say yes :-P [22:47:23] qchris, i just need you to say "I do" ;) [22:47:28] I do [22:47:31] , sir! [22:48:07] dan said "just sign this blank check please" [22:49:15] qchris, basically i need you to respond to my email about my way of counting, and say that yes, it makes sense and should be used as a valid way to count pageviews [22:49:29] * qchris checks email [22:49:30] (for the purpose of zero) [22:49:36] qchris, that email is fairly old [22:50:39] "and say that yes, it makes sense" ... I said that both pageview definitions are known to be wrong :-) [22:50:53] But I guess we can find a middle ground. [22:51:03] Help me find the email you are referring to. [22:51:15] Do you have a date for me? [22:51:44] (I do not have unreplied email from you in my Inbox) [22:52:22] looking [22:52:49] qchris, "Statistics" 11 days ago [22:53:00] After an extensive analysis... [22:53:32] qchris, mostly look at the "procedure" section [22:53:33] l~A [22:53:44] yep [22:53:49] whatever that's supposed to be [22:53:59] Sorry. mutt shortcuts don't make much sense in IRC :) [22:54:49] Oh. I read the procedure part as a statement, not as a question. [22:54:51] Will respond. [22:59:41] yurikR: Currently, zero requests should come from mobile caches. Is checking the 'text' source just an oversight (that makes the query run longer), or is this preparation for the zero on desktop roll-out? [23:16:18] yurikR: ^ [23:19:53] yurikR: I am not sure if I am fine with saying that the procedure is fine. Who needs this sign off for what? [23:23:16] Meh. yurikR I am off to bed.
[23:24:25] (CR) Ori.livneh: [C: 2] Allow to specify date to compute the report for [analytics/blog] - https://gerrit.wikimedia.org/r/180158 (owner: QChris) [23:50:42] Analytics-Engineering, Analytics-EventLogging: EL office hours - https://phabricator.wikimedia.org/T76796#851964 (kevinator) Follow these steps to organize an event with a google hangout: https://www.mediawiki.org/wiki/Project:Calendar/How_to_schedule_an_event#Tech_Talks