[03:48:50] Analytics-EventLogging: Add sampling support in EventLogging - https://phabricator.wikimedia.org/T67500#994061 (Nuria) >mw.eventLog.logWithSampling( 'Foo', 100, data ) I certainly agree that having a method that eases logging with sampling on the js end is a must. >If I need to look up EL data from three m...
[04:54:36] Analytics-EventLogging: Add sampling support in EventLogging - https://phabricator.wikimedia.org/T67500#994095 (Tgr) In my experience with MediaViewer, we rarely cared about absolute numbers (there were cases when we did, but only a few), but we cared a lot about relative differences between various events (d...
[14:06:25] (PS2) QChris: Load and refine logs from 'misc' caches [analytics/refinery] - https://gerrit.wikimedia.org/r/184191
[14:06:27] (PS1) QChris: Refine bits and upload webrequests [analytics/refinery] - https://gerrit.wikimedia.org/r/186774
[14:06:29] (PS1) QChris: Fix legacy_tsvs' use of datasets after switch to refined tables [analytics/refinery] - https://gerrit.wikimedia.org/r/186775
[14:06:31] (PS1) QChris: Use bits and misc when producing legacy tsvs [analytics/refinery] - https://gerrit.wikimedia.org/r/186776
[14:06:33] (PS1) QChris: Sort webrequest_sources in legacy_tsvs' HiveQL files [analytics/refinery] - https://gerrit.wikimedia.org/r/186777
[14:06:35] (PS1) QChris: Clarify why there are no bits or misc requests in sampled-1000 [analytics/refinery] - https://gerrit.wikimedia.org/r/186778
[14:30:40] (CR) QChris: "The corresponding change to get webrequest_misc into kafka is at Ia3712a0d85ffa893912af2dff312017f03a7a935." [analytics/refinery] - https://gerrit.wikimedia.org/r/184191 (owner: QChris)
[14:44:06] Wikidata, Analytics, wikidata-query-service, operations, Services, MediaWiki-General-or-Unknown: Reliable publish / subscribe event bus - https://phabricator.wikimedia.org/T84923#994455 (JanZerebecki) >>! In T84923#993443, @GWicke wrote: > Since 0mq is not actually durable or replicated this does not cover th...
[14:55:42] Analytics: analytics-logbot is no longer in the #wikimedia-analytics channel - https://phabricator.wikimedia.org/T85698#994484 (QChris) Open>Resolved a:QChris The bot returned somewhen in the last few days. Not sure who restarted the service, but whoever did it: Thanks! :-)
[17:18:51] MediaWiki-Developer-Summit-2015, Analytics-EventLogging: Using EventLogging and Dashboards - https://phabricator.wikimedia.org/T85280#994594 (kevinator)
[17:50:35] MediaWiki-extensions-UniversalLanguageSelector, Analytics, Wikipedia-App-iOS-App, Wikipedia-App-Android-App, Language-Engineering, Mobile-Web, Mobile-Apps: there should be a comparison of clicks count on interlanguage on different platforms - https://phabricator.wikimedia.org/T78351#994626 (Amire80)
[18:00:00] is it just me or hadoop is ridiculously slow today? just launching a hive query takes forever
[18:22:34] (CR) Ottomata: [C: 2 V: 2] "HA! that's it! Amazing." [analytics/refinery] - https://gerrit.wikimedia.org/r/177522 (owner: QChris)
[18:29:22] (CR) Ottomata: [C: 1] "Just a few comments. Aside from those, LGTM. Just respond to them however you see fit, and then feel free to self merge this, just in ca" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/185708 (owner: QChris)
[18:52:52] Analytics-Dashiki: Commons page views in webstatscollector drop precipitously in 2015 - https://phabricator.wikimedia.org/T87589#994717 (Tnegrin) NEW a:kevinator
[19:01:59] (CR) Ottomata: "wmf.webrequest is external!" [analytics/refinery] - https://gerrit.wikimedia.org/r/185179 (owner: QChris)
[19:05:34] (CR) Ottomata: "Hm, I dunno. I actually really like this feature. Why should a job that doesn't need particular data depend on that data.
Even if in th" [analytics/refinery] - https://gerrit.wikimedia.org/r/186775 (owner: QChris)
[19:06:32] (CR) Ottomata: [C: 2 V: 2] Sort webrequest_sources in legacy_tsvs' HiveQL files [analytics/refinery] - https://gerrit.wikimedia.org/r/186777 (owner: QChris)
[19:06:54] (CR) Ottomata: [C: 2 V: 2] Clarify why there are no bits or misc requests in sampled-1000 [analytics/refinery] - https://gerrit.wikimedia.org/r/186778 (owner: QChris)
[19:10:23] (CR) Ottomata: "Perhaps the unneeded dependencies are just an artifact of the abstractions of all legacy_tsvs into a single coordinator. Should we consid" [analytics/refinery] - https://gerrit.wikimedia.org/r/186775 (owner: QChris)
[19:14:44] Analytics-Wikimetrics: Uploading cohort or running a large report fails - https://phabricator.wikimedia.org/T87596#994809 (kevinator) NEW
[19:42:52] Analytics, Wikidata, wikidata-query-service, operations, Services, MediaWiki-General-or-Unknown: Reliable publish / subscribe event bus - https://phabricator.wikimedia.org/T84923#994876 (mobrovac) Re: reliability, [RELP](http://www.rsyslog.com/doc/relp.html) might be of help on the application level.
[20:31:56] Analytics-Engineering: Add time range selection to Limn dashboards (or new Dashiki dashboards) - https://phabricator.wikimedia.org/T87603#994953 (Milimetric) NEW a:Milimetric
[20:35:59] Analytics-Engineering: Add ops-reportcard dashboard with analysis that shows the http to https slowdown on russian wikipedia - https://phabricator.wikimedia.org/T87604#994962 (Milimetric) NEW a:Milimetric
[21:36:29] ottomata: Thanks for the merge and the comments!
[21:38:46] qchris: hello!
[21:38:48] yup :)
[21:38:53] thanks for the poke :)
[21:49:43] Analytics, operations: Hadoop logs on logstash are being really spammy - https://phabricator.wikimedia.org/T87206#995149 (Ottomata) Merged, but Hadoop daemons will need to be restarted to pick up this change.
If you don't mind waiting, I will likely be restarting all of them soon (hopefully within the next f...
[21:57:21] (PS3) QChris: Explain why tables that should be external are internal [analytics/refinery] - https://gerrit.wikimedia.org/r/185179
[21:58:22] (PS4) Ottomata: Explain why tables that should be external are internal [analytics/refinery] - https://gerrit.wikimedia.org/r/185179 (owner: QChris)
[21:58:31] (CR) Ottomata: [C: 2 V: 2] Explain why tables that should be external are internal [analytics/refinery] - https://gerrit.wikimedia.org/r/185179 (owner: QChris)
[21:58:44] You so fast to merge!
[21:59:17] (CR) QChris: "> wmf.webrequest is external!" [analytics/refinery] - https://gerrit.wikimedia.org/r/185179 (owner: QChris)
[21:59:58] vrrroooooom
[22:04:49] :-D
[22:05:03] keep poking me, these are fun! :)
[22:05:07] and easy to do for the most part
[22:05:12] while i'm watching james F talk about the future of editing
[22:05:22] with 20 people all editing an etherpad behind him on the projector
[22:05:35] k. In a few minutes I'll convince you to merge the legacy_tsv dependency change :-)
[22:05:49] Whoa. Sounds cool.
[22:06:00] it is distracting and hilarious
[22:06:10] but interesting too!
[22:06:11] Link to the etherpad?
[22:06:20] https://etherpad.wikimedia.org/p/MWDS2015-FutureOfEditing
[22:06:39] Thanks
[22:06:56] 32 participants in the etherpad :-D
[22:13:37] (CR) QChris: "> Perhaps the unneeded dependencies are just an artifact of the" [analytics/refinery] - https://gerrit.wikimedia.org/r/186775 (owner: QChris)
[22:23:38] (CR) Ottomata: "> As there are no missing datasets, missing datasets can no longer block. So it's ok to depend on all of them and be on the safe side." [analytics/refinery] - https://gerrit.wikimedia.org/r/186775 (owner: QChris)
[22:27:52] qchris: maybe live chat about this one real quick?
[22:28:04] you can probably convince me, especially because I want this stuff running sooner rather than later :)
[22:28:15] I was just about to respond to it.
[22:28:23] ok
[22:28:26] das fine
[22:28:33] So ... you know ... I am not too fond of the way refined datasets are currently set up.
[22:28:48] But I figured that the current setup is the way you want it,
[22:28:52] so I wanted to roll with it.
[22:29:01] yes, because they are not verified and because the data quality is not currently visible
[22:29:10] right?
[22:29:13] Right.
[22:29:16] But that's ok.
[22:29:25] And if one turns off bits again,
[22:29:35] The refined partitions will get created,
[22:29:38] but they are empty.
[22:29:46] So they are ok in some sense,
[22:29:49] oh?
[22:29:50] really?
[22:29:51] and Oozie jobs should run on them.
[22:30:03] camus creates the directories even if it doesn't have any timestamps to put there?
[22:30:13] No.
[22:30:25] Oh. Sorry.
[22:30:42] I screwed up, by mixing the turning off of esams bits with turning off bits fully.
[22:30:59] For only esams bits turned off, it created the directories.
[22:31:10] Obviously, because the other bits were present.
[22:31:17] aye, yeah, no bits directories since the 14th :)
[22:32:13] If you turn off bits and want only those jobs to run that do not depend on bits,
[22:32:23] the trick I used for raw would not help.
[22:32:32] Hence we'd really need separate coordinators.
[22:32:45] why wouldn't the trick work?
[22:33:03] Because I'd need a "raw_unchecked" dataset to exist.
[22:33:10] And if you turn it off completely,
[22:33:25] the bits "raw_unchecked" would not be marked done either.
[22:33:49] So, we're basically back to 4 coordinators (which can share the same workflow).
[22:33:55] It's easy to set up.
[22:33:59] oh, yes.
[22:34:00] true
[22:34:01] But probably tedious to maintain.
[22:34:19] hmMMMmmmMMmmMmm
[22:34:32] Like having to propagate any change to all four coordinators.
[22:34:41] tedious in that we have to manage them separately in oozie
[22:34:44] restart all 4 of them
[22:34:46] etc.
[22:34:48] oh
[22:34:59] and edit all of them for changes in
[22:35:01] yeah
[22:35:02] :
[22:35:02] :?
[22:35:04] :/
[22:35:10] but, then again, oozie is already tedious
[22:35:11] heh
[22:35:31] Mhmmm ... I guess I could make the trick work if we add a dummy partition, that automatically gets added and marked done.
[22:35:42] But that's ugly.
[22:35:54] Well ... :-/
[22:36:08] We're building against a case that I doubt will become relevant.
[22:36:17] So it's hard to say how likely one will need it.
[22:36:20] aye
[22:36:30] i feel like the right thing to do is to have multiple coordinators :/
[22:36:44] k.
[22:36:48] i mean
[22:36:50] Multiple coordinators it is.
[22:36:52] maybe not? right?
[22:36:58] i feel ya on the tediousness
[22:37:07] but the artificial dependencies feel way worse to me
[22:38:04] By artificial you mean the fake ones on "raw_unchecked", or the unneeded ones (like bits for the mobile-sampled-100)?
[22:38:28] uh, unneeded ones
[22:38:34] k.
[22:38:58] Then it sounds like we want the multiple coordinators.
[22:39:01] haha
[22:39:05] i think so!?
[22:39:13] Yes, you want them!
[22:39:14] that sounds right to me, but I am not decreeing it!
[22:39:22] YOU WANT THEM !!!!!
[22:39:25] hahaha
[22:39:32] :-P
[22:40:19] If I've got you already ... about the pagecounts-raw workflow directories ...
[22:40:29] I do not find "output_archive_directory" or "dataset_archive_directory" too convincing, as they do not really convey in which way they differ from the "archive_directory" from the properties file.
[22:40:29] In general "archive_directory" is "/wmf/data/archive", and we want a subdirectory underneath that.
[22:40:29] What we want is the directory that is specific to the workflow's configuration (either pagecounts-all-sites or pagecounts-raw).
[22:40:29] Would "workflow_specific_archive_directory" work for you?
[22:40:31] "pagecounts_kind_archive_directory",
[22:40:33] "pagecounts_variant_archive_directory"?
[22:42:30] ottomata: ^
[22:43:58] how about just pagecounts_archive_directory
[22:44:04] ?
[22:44:27] Is that also ok for the jobs that generate projectcounts files?
[22:44:41] hm. they use the same property?
[22:44:48] Yes.
[22:44:51] hm, then no. :p
[22:45:11] It's where the workflow will put the generated file.
[22:45:40] And projectcounts is kind of a "pagecounts_variant" ...
[22:46:15] qchris, how about just
[22:46:18] output_directory?
[22:46:26] this is where the workflow's output will go
[22:46:37] Mhmm.
[22:46:55] Isn't that too generic.
[22:47:04] An "output_directory" could be anything.
[22:47:07] Meh.
[22:47:08] naw, why? it is a variable name
[22:47:13] I guess it's ok.
[22:47:18] yes!
[22:47:21] it is ok!
[22:47:30] it is not a directory of archived workflows
[22:47:33] After all ... this is not in the properties file, but only lives from the coordinator onwards.
[22:47:35] it is the output of some job!
[22:47:37] yes
[22:47:43] but can be overridden!
[22:47:49] Ok. output_directory it is.
[22:47:51] yes!
[22:47:52] Thanks!
[22:47:59] i am getting my way all over the place today!
[22:48:00] :)
[22:48:17] You're a winner!
:-D
[22:48:21] hahaha
[22:55:05] (PS2) QChris: Add pagecounts-raw computation to pagecounts-all-sites [analytics/refinery] - https://gerrit.wikimedia.org/r/185708
[22:55:41] (CR) QChris: Add pagecounts-raw computation to pagecounts-all-sites (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/185708 (owner: QChris)
[22:58:28] (PS2) QChris: Clarify why there are no bits or misc requests in sampled-1000 [analytics/refinery] - https://gerrit.wikimedia.org/r/186778
[23:03:02] (CR) QChris: [V: 2] Clarify why there are no bits or misc requests in sampled-1000 [analytics/refinery] - https://gerrit.wikimedia.org/r/186778 (owner: QChris)
[23:08:08] (CR) QChris: [C: -1] "Per IRC discussion on 2015-01-26 in wikimedia-analytics," [analytics/refinery] - https://gerrit.wikimedia.org/r/186775 (owner: QChris)
[23:09:05] (PS3) Ottomata: Add pagecounts-raw computation to pagecounts-all-sites [analytics/refinery] - https://gerrit.wikimedia.org/r/185708 (owner: QChris)
[23:09:15] (CR) Ottomata: [C: 2 V: 2] Add pagecounts-raw computation to pagecounts-all-sites [analytics/refinery] - https://gerrit.wikimedia.org/r/185708 (owner: QChris)
[23:10:06] Thanks!
[23:12:23] ottomata: How do you feel about the changes to add misc to kafka?
[23:12:28] (I'll call it a day soon and need something to work on tomorrow besides deploying and adding the coordinators for the legacy_tsvs :-) )
[23:14:53] hmm
[23:14:58] ja can work on that i think
[23:15:01] looking
[23:15:25] (Changes for it are already in gerrit)
[23:15:38] bits too?
[23:15:55] https://gerrit.wikimedia.org/r/#/c/186641/
[23:16:02] ottomata: ^ is bits re-enabling.
[23:16:20] (Hopefully :-) )
[23:17:48] yeah saw it, just was wondering if we really wanted to turn it back on
[23:17:59] i quieted most of the upload alerts so they wouldn't spam the ops room.
[23:18:19] The 5xx legacy tsvs would include bits.
[23:18:22] i guess we do
[23:18:23] yeah
[23:18:29] But we can keep them out for now if you prefer.
[23:18:40] dunno.
[23:18:56] i mean, i prefer to turn it back on, but i might prefer to do it when i am back on regular work hours
[23:19:01] so i can respond to things more easily
[23:19:11] misc no problem
[23:19:12] k.
[23:19:13] we can do that now
[23:19:20] Then let's leave bits out for now.
[23:19:22] but it depends on bits
[23:19:24] that change
[23:19:34] I'll rebase.
[23:20:06] k
[23:20:23] https://gerrit.wikimedia.org/r/184183
[23:20:25] Done ^
[23:21:58] I guess I'll leave splitting the corresponding refinery change for tomorrow (so I can test them a bit)
[23:22:23] ok
[23:24:30] there they go!
[23:24:44] qchris:
[23:24:51] we should merge the camus change
[23:24:53] whatcha think?
[23:25:15] Sure.
[23:25:20] But I guess that needs rebasing too.
[23:25:20] oh i see it
[23:25:21] coool
[23:25:23] ah and refine
[23:25:24] Let me check.
[23:25:25] yes i'm for it.
[23:25:31] it looks good i think
[23:25:34] just adds everything for misc
[23:25:39] ok.
[23:25:41] on its own, upload+bits refine are separate
[23:25:56] gonna do it, and git-deploy
[23:26:05] Cool!
[23:26:06] Thanks.
[23:26:14] oh, it has dependency
[23:26:17] on bits/upload
[23:26:18] rebase?
[23:26:25] will do.
[23:27:34] Meh. The dia graph dependency gets in the way.
[23:27:56] ?
[23:28:00] Let's keep that for tomorrow?
[23:28:13] I always update the oozie overview diagram with my changes.
[23:28:23] i think i would rather merge the camus changes asap, so we don't have to consume a bunch later
[23:28:35] And since that diagram is a binary file, git cannot rebase nicely for diagram changes.
[23:28:35] hm, unless it will be smart enough to start at end
[23:28:38] oh maybe it will
[23:29:06] Ok. Then I'll update the graph.
[23:29:57] meh i dunno what it does
[23:30:01] not clear from the properties
[23:33:08] (PS3) QChris: Load and refine logs from 'misc' caches [analytics/refinery] - https://gerrit.wikimedia.org/r/184191
[23:33:22] ottomata: rebased change ^
[23:33:36] thank you
[23:34:16] (PS4) Ottomata: Load and refine logs from 'misc' caches [analytics/refinery] - https://gerrit.wikimedia.org/r/184191 (owner: QChris)
[23:34:25] (CR) Ottomata: [C: 2 V: 2] Load and refine logs from 'misc' caches [analytics/refinery] - https://gerrit.wikimedia.org/r/184191 (owner: QChris)
[23:40:25] hm, wouldn't it be cool if i could start PART of a bundle without killing the running parts?
[23:40:36] e.g. start misc refinement without killing the whole bundle
[23:40:37] hm.
[23:40:41] i think not possible :p
[23:40:58] I also think it is not possible.
[23:41:12] There is a misc directory!
[23:41:24] /mnt/hdfs/wmf/data/raw/webrequest/webrequest_misc
[23:41:32] :)
[23:41:35] So camus seems to be doing its job.
[23:41:50] But do not worry about the refining.
[23:41:57] I can take care of that tomorrow.
[23:43:25] Content looks good too.
[23:43:27] Awesome.
[23:43:27] i am starting the refine job now :)
[23:43:33] Ok. :-D
[23:43:51] 3 coordinators now!
[23:43:52] :)
[23:44:06] woot Job Name : refine-webrequest_misc
[23:44:07] Awesome! What a productive evening.
[23:57:05] oh, qchris, i need to restart the load job too, eh?
[23:57:14] yup, doh.
[23:57:18] Yup.
[23:59:28] ok done
[23:59:32] now 5 load coordinators running
[23:59:41] Cool.