[07:44:49] Analytics-Kanban, EventBus, Services, Wikimedia-Stream, User-mobrovac: Public Event Streams - https://phabricator.wikimedia.org/T130651#2708172 (Tomayac) > I see around 60 msgs/sec there, which isn't much. This
[09:17:46] mforns: hola! You there?
[09:17:54] (also joal )
[09:17:56] https://pivot.wikimedia.org/#
[09:18:05] this is ehm temporarily live hacked
[09:18:07] Hi elukey, usually mforns is not here before lunch :)
[09:18:09] let's call it in this way :P
[09:18:20] Hey ! Here you go :)
[09:18:37] do we want to put descriptions?
[09:18:38] elukey: So datasources need to be preconfigured in pivot, right?
[09:18:55] only the datacubes, I put all in auto discovery
[09:19:02] you can list measures dimensions etc..
[09:19:14] ok
[09:19:21] About desc that would be great
[09:20:14] I can merge the puppet change with the config file and then ask the team
[09:20:22] it will be super quick to merge
[09:20:29] Particularly for the pageviews one
[09:20:33] ok
[09:20:56] Thanks elukey for the good looking pivot :)
[09:24:14] :)
[09:34:53] joal: https://gerrit.wikimedia.org/r/#/c/315480/2/modules/pivot/templates/config.yaml.erb
[09:34:59] this can of course change over time
[09:35:38] I also added some tunables to the puppet class
[10:18:05] all right joal, https://pivot.wikimedia.org final version :)
[10:18:17] (I mean, for today :)
[10:18:53] elukey: I don't view diffs with the previous ones you posted
[10:19:05] I guess it's puppet related things
[10:19:25] elukey: I'm going to suggest descriptions for the data cubes
[10:19:31] Should I do that in the CR you sent?
[10:20:37] nono it is already merged.. the diff is that the one before was a hacky solution :D
[10:20:59] what do you have in mind for the descriptions?
[10:21:02] elukey: Ahhhhm makes sense :)
[10:21:03] I can add them
[10:21:15] elukey: I'll suggest description in email then ;)
[10:46:30] Analytics-Kanban, Patch-For-Review: Productionize Pivot UI - https://phabricator.wikimedia.org/T138262#2708524 (elukey) @mforns: I added a simple config file to Pivot and now the labels are clearly visible, plus we can add a description to them (afaik it is not possible to add this data to druid). I ag...
[11:53:55] hi, in refinery-core I see a mix of org.junit and junit.framework imports is it ok to switch one of the tests to org.junit ?
[11:54:28] Hi dcausse
[11:54:35] joal: hi!
[11:54:38] :)
[11:54:40] Howdy?
[11:54:45] fine and you? :)
[11:55:16] basically I'd like to change some bits in the way we detect search api requests
[11:55:29] I'm good thanks :)
[11:55:42] and reading the tests I think I can clean it up with a Parameterized one
[11:55:51] dcausse: About tests, it really depends in my opinion on how many cases you want to test
[11:56:07] We use Right
[11:56:10] oops
[11:56:11] :)
[11:56:31] We used parameterized ones when there are many cases we want to try
[11:56:43] But I know nuria for instance prefers core junit ones
[11:56:58] I actually don't mind and prefer when it's kinda all over the same :)
[11:57:03] ah ok
[11:57:25] joal: basically this one: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/test/java/org/wikimedia/analytics/refinery/core/TestSearchRequest.java
[11:58:10] dcausse: Looks completely parameterizable in my opinion :)
[11:58:19] yes me too
[11:58:37] ok I'll do that, thanks! :)
[11:59:02] You have examples of parameterized tests using CSV for test data in the same package
[11:59:19] No prob, thank you for upgrading our codebase ;)
[13:02:51] o/ joal
[13:03:02] Hey halfak
[13:03:12] I've been out of the office for a week (Association of Internet Researchers in Berlin) so nothing new to discuss re.
live systems
[13:03:57] oki
[13:04:12] halfak: Currently fighting with some scala to better reconstruct our history :)
[13:04:21] halfak: Those are my only news :)
[13:04:49] joal, gotcha. Data cleanup is rough times. Good luck on your work. It's gonna be mega valuable.
[13:04:59] halfak: Thanks mate :)
[13:05:13] halfak: it's less cleanup than performance and correctness for the moment
[13:12:50] (PS1) DCausse: [search] Add support for generator api requests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/315503
[13:14:38] (PS2) DCausse: [search] Add support for generator api requests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/315503
[13:19:13] (PS14) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[13:32:29] Hey elukey, I don't know if we have a task for the spark UI issue in yarn
[13:38:22] joal: do you mean the spam logs?
[13:38:39] or the proxy rewrite html part?
[13:39:29] ok the latter, I can see in yarn.w.o that you are running a spark shell
[13:39:35] no I don't have a task :(
[13:40:05] elukey: ok, I'm going to create one then :)
[13:40:20] elukey: I might have said that before, but this is for real ;)
[13:42:07] Analytics: Make yarn.wikimedia.org correctly proxy to Spark UI - https://phabricator.wikimedia.org/T147927#2708768 (JAllemandou)
[13:43:33] Analytics: Make yarn.wikimedia.org correctly proxy to Spark UI - https://phabricator.wikimedia.org/T147927#2708780 (elukey) p:Triage>Normal a:elukey
[13:43:40] thanks :)
[14:27:21] elukey: heya, do you think we're ready for pivot to show up at Scrum of Scrums?
[14:28:15] oh yes, but with the disclaimer that it has not been tested by a super wide audience so far
[14:28:21] right
[14:28:28] elukey: I'll ask at standup later on
[14:50:41] yaaaay I have internet again yaaa
[14:50:43] :)
[14:52:31] \o/
[14:54:49] Hurray
[15:10:21] (CR) Milimetric: Improve build a bit more (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/314622 (owner: Milimetric)
[15:10:58] ottomata: if you have time today - https://phabricator.wikimedia.org/T147682#2703962 :)
[15:20:30] nuria: Forgot to ask - On recruiting, we have finished scanning, now receiving and checking tasks, correct?
[15:24:49] milimetric: another question - When we say deprecate limn1, is that the last instance of limn running for us?
[15:25:15] joal: yes
[15:25:22] ok thanks :)
[15:31:47] wait joal you wanna talk about the revert stuff?
[15:34:42] joal: no, we are still screening
[15:34:53] milimetric: In meeting now, but yes, after
[15:35:44] nuria: ok, I'll say we have sent some tasks
[15:49:56] nuria: can we take a minute discussing your CR?
[15:50:02] joal: yes
[15:50:08] batcave?
[15:51:35] (CR) Nuria: Improve build a bit more (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/314622 (owner: Milimetric)
[15:51:46] ok, i have a few mins
[16:29:46] joal: if you wanna talk in the cave :)
[16:29:58] milimetric: still in meeting, 1-1 :)
[16:30:11] nuria: will be 2 minutes late, kiss hello to Lino
[16:30:22] ok, I'll be in mine after, so we can catch up tomorrow :)
[16:30:27] joal: of course! let's meet at :40?
[16:34:30] nuria: Here !
[16:34:41] milimetric: sounds good :)
[16:38:05] Hey nuria, I can see you, but you can't hear me :)
[17:04:12] Analytics, Cassandra, Services (watching): Inconsistent Cassandra disk load shown in metrics and nodetool status - https://phabricator.wikimedia.org/T146130#2709379 (GWicke)
[17:06:18] Analytics-Kanban, EventBus, Wikimedia-Stream, Services (watching), User-mobrovac: Public Event Streams - https://phabricator.wikimedia.org/T130651#2709389 (Pchelolo)
[17:07:07] Analytics, EventBus, Services (watching): Check eventbus Kafka cluster settings for reliability - https://phabricator.wikimedia.org/T144637#2709391 (Pchelolo)
[17:09:45] milimetric: want to talk about dashiki CR?
[17:10:15] sure but I gotta eat and then SoS and then I have to run to the DMV and police
[17:10:31] I guess I can skip SoS... yeah, let's talk after I eat
[17:11:50] joal: you're on your own for SoS. My perspective on the last year is that SoS is a great meeting but we could use a more structured way to define inter-team dependencies
[17:12:24] so if someone is blocked on someone or wants to bring something to the attention of someone else, it's easy to miss if they're just saying it. On the other hand, it's hard to know who would be interested if you're the person talking
[17:12:59] but I'm not sure how to make that better. A specific example was the pageview API that was brought up a few times but didn't fall on the right ears
[17:13:39] milimetric: you gone for a bit? also wanted to talk to you about event stream stuff
[17:14:33] ottomata: yeah, but I'm back from the DMV after 4:30
[17:14:43] hm, ok
[17:16:27] Analytics: Puppetize job that saves old versions of geoIP database - https://phabricator.wikimedia.org/T136732#2709428 (Nuria) But wait, is this cron related to geowiki only? This is the GeoIp database backup. cc @Milimetric
[17:54:08] Analytics-Tech-community-metrics: Ratio of performed code reviews vs.
patches authored, for each Gerrit/Differential user - https://phabricator.wikimedia.org/T147948#2709778 (Aklapper)
[17:54:20] Analytics-Tech-community-metrics: Ratio of performed code reviews vs. patches authored, for each Gerrit/Differential user - https://phabricator.wikimedia.org/T147948#2709778 (Aklapper) p:Triage>Low
[18:00:17] nuria: ok, back, wanna chat?
[18:01:16] milimetric: I am in the "quiet" part of the library. For CR we can probably do irc?
[18:01:27] sure
[18:01:30] milimetric: please take a look at my latest comment.
[18:01:51] doh, I commented back but forgot to submit, doing now
[18:01:55] milimetric: we do not load dashiki "all async" , scripts.js is loaded by index.html
[18:01:56] (CR) Milimetric: Improve build a bit more (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/314622 (owner: Milimetric)
[18:02:58] yep, agreed, but bundles load async and I'm thinking we'll lean towards bundling more and more out of the main layout as dashiki gets more sophisticated.
That way it can load the chrome fast and add the rest as it's ready
[18:03:21] in either case, I agree that piwik is pretty fast and I'm ok with switching that with scripts in the order
[18:03:27] (as I said in my last comment there)
[18:03:30] milimetric: The request for scripts.js is not async and that is ko
[18:03:53] milimetric: all that is bundled and loaded together (is our build process doing that, not ko)
[18:04:31] require js is loaded 1st (after css) and after scripts.js which is everything minus the bundles we explicitly exclude
[18:05:22] yeah, I detailed all that exactly in my comment
[18:05:36] I agree with you, the order I mention there is:
[18:05:56] * CSS (this has not changed)
[18:05:56] * root layout component (lazy-loaded by ko)
[18:05:56] * main script (minus bundles which are lazy-loaded)
[18:05:57] * piwik
[18:06:18] ok, agreed on order, thus loading piwik async has no effect on performance on 1st page load
[18:06:20] we can switch the piwik and main script if you like
[18:06:35] but I'd like to keep the scripts under the root layout component
[18:06:53] in case we add static content there at some point, I want that to load as fast as humanly possible
[18:07:12] again, if piwik is loaded async it will not interfere
[18:07:21] not piwik, the script that loads it
[18:08:09] that creates a script "async" tag whose execution is deferred
[18:08:27] the other problem is that if we put the scripts below the root component and piwik in the head, we make the build a little more complicated 'cause we need two placeholders
[18:08:38] "The async attribute on the script tag provides two critical properties: it tells the browser to not block DOM construction, and it does not block script execution on CSSOM"
[18:08:57] yeah, but that's after it parses and executes the
[18:10:48] then the hello will be rendered first before executing the script
[18:11:08] milimetric: I think you will convince yourself if you measure the impact of your change
[18:11:24]
milimetric: do so as perf improvements shoudl always be driven by numbers
[18:11:28] *should
[18:12:14] yeah, my point is that I don't care if this is 10ms, my opinion is that static content should load first, it's more a style issue than performance
[18:13:00] but for what reason would we want to load piwik before the html? Like, what benefit does that bring?
[18:13:12] milimetric: moving it goes against the usual recommendation on how to load light async scripts
[18:13:19] milimetric: the recommendation is as follows
[18:13:53] milimetric: the inline script block should go on head as that way it is executed
[18:14:11] milimetric: and creates the script "async" tag
[18:14:17] milimetric: before css parsing starts
[18:14:31] milimetric: thus the very-tiny-inline block will be executed immediately
[18:14:46] milimetric: but the async tag will not stop rendering
[18:15:00] milimetric: piwik is not loaded
[18:15:23] milimetric: the browser will enqueue the async tag for execution at the more optimal time
[18:15:36] milimetric: let me find someone more articulate than me explaining this:
[18:15:43] milimetric: https://www.igvita.com/2014/05/20/script-injected-async-scripts-considered-harmful/
[18:20:09] I feel like this doesn't apply to our usage of piwik
[18:20:23] how so?
[18:20:31] it's a bunch of complication in the build code just to load analytics faster, and we don't really care at all whether analytics loads right away or like 20ms later
[18:21:23] ok, but that is not a performance argument when it comes to page rendering.
[18:21:26] but it's ok with me, we can have two js sections and put the piwik above everything in the head and the main scripts below the content
[18:22:05] my argument was never just performance, and this does include a performance component, it means that first script slows everything else down behind it for no apparent reason
[18:22:18] other than small scripts should execute first, but that sounds too much like scripture to me
[18:22:46] maybe if I actually care what the small script is doing, sure, like in the case of facebook where if they don't load their analytics they lose a billion dollars
[18:23:46] anyway, I'm happy to put the piwik script wherever you think it should go. But I'll keep the main script below the main component
[18:24:12] I've gotta run to the DMV and police now, but please comment which way you decide and I'll change and push a patch tonight
[18:25:06] "it means that first script slows everything else down behind it for no apparent reason"-> my point was that this is not correct. I can see the argument about the build regarding piwik but any performance argument has to come with numbers so as to measure impact. In this case I think if you try to measure impact on loading you are not going to find any.
[18:26:24] milimetric: impact of loading on moving piwik code around. If you want to move it for simplicity of build that is a different thing and I have nothing against it.
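For readers following the piwik ordering discussion above, here is a minimal sketch of the pattern nuria describes: a tiny inline block that creates a script tag marked async and appends it to the head. It is not dashiki's actual build output. The names `ScriptLike`, `DocLike`, `injectAsyncScript` and `makeFakeDocument` are hypothetical, and the small interfaces only stand in for the browser DOM so the sketch can also run outside a browser.

```typescript
// Minimal stand-ins for the browser DOM (hypothetical shapes, not the real DOM typings).
interface ScriptLike {
  src: string;
  async: boolean;
}

interface DocLike {
  createElement(tag: "script"): ScriptLike;
  head: { appendChild(node: ScriptLike): unknown };
}

// Hypothetical helper: roughly what a very small inline block in <head> would do.
// It only creates and appends the tag; fetching and executing the script is
// left to the browser, which is the point made in the discussion.
function injectAsyncScript(doc: DocLike, src: string): ScriptLike {
  const tag = doc.createElement("script");
  tag.async = true; // ask the browser not to block parsing on this script
  tag.src = src;
  doc.head.appendChild(tag);
  return tag;
}

// Fake document that records appended tags, so the helper can be exercised without a browser.
function makeFakeDocument(): DocLike & { appended: ScriptLike[] } {
  const appended: ScriptLike[] = [];
  return {
    appended,
    createElement: () => ({ src: "", async: false }),
    head: { appendChild: (node: ScriptLike) => appended.push(node) },
  };
}
```

Calling `injectAsyncScript(makeFakeDocument(), "https://example.org/piwik.js")` (a placeholder URL) returns a tag with `async` set and records one appended node. Whether this ordering matters for first render is exactly what the participants disagree on, and as nuria notes, only a measurement would settle it.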
[18:49:18] Analytics, EventBus, Services (later): Ensure that EventBus extension gracefully handles service failures - https://phabricator.wikimedia.org/T125394#2710080 (Pchelolo)
[18:56:40] milimetric: let me know if you want to talk about this in the cave
[19:12:22] Analytics-EventLogging, ArchCom-RfC, Discovery, Graphs, and 10 others: RFC: Use YAML instead of JSON for structured on-wiki content - https://phabricator.wikimedia.org/T147158#2710180 (RobLa-WMF)
[19:58:45] hello all, is there a plan to backfill more data into the pageviews API (earlier than July 2015)?
[20:24:01] musikanimal, that might not make sense. The old way of counting pageviews is very different from the new.
[20:24:09] But I agree that it would be useful.
[20:24:24] In the meantime, I need to hack together a way to gather old pageview counts.
[20:24:46] Maybe there could be a different path for requests pageview counts from old time periods.
[20:27:18] Analytics, Operations, Traffic: The WMF-Last-Access Set-Cokkie header should follow RFC 2965 syntax rather than the pre-RFC Netscape format - https://phabricator.wikimedia.org/T147967#2710558 (bd808)
[20:27:33] Analytics, Operations, Traffic: The WMF-Last-Access Set-Cookie header should follow RFC 2965 syntax rather than the pre-RFC Netscape format - https://phabricator.wikimedia.org/T147967#2710572 (bd808)
[20:40:47] Analytics, Operations, Traffic: The WMF-Last-Access Set-Cookie header should follow RFC 2965 syntax rather than the pre-RFC Netscape format - https://phabricator.wikimedia.org/T147967#2710588 (bd808) I've found some blog post from 2012 that discuss IE6, 7 & 8 not supporting `Max-Age` and suggesting s...
[20:44:27] musikanimal: no, we no longer can do that as we do not have that data.
[20:44:53] got it, that makes sense
[20:45:06] musikanimal: we literally cannot do it, as halfak said we might make those available but not through the api
[20:45:42] musikanimal: teh pageview counts exist but they count a very different thing than what we count now so they cannot be compared apples to apples
[20:45:45] *the
[20:45:50] right
[20:46:25] nuria, but they still could be in the API with fewer breakdowns available.
[20:46:29] E.g. no human/bot
[20:47:03] We'd probably want to put them behind a different (but similar) endpoint
[20:47:11] Analytics, Operations, Traffic: The WMF-Last-Access Set-Cookie header should follow RFC 2965 syntax rather than the pre-RFC Netscape format - https://phabricator.wikimedia.org/T147967#2710558 (BBlack) We use expires in our `CP` cookies as well (which track connection properties for HTTP/2 stats), so...
[20:47:27] This would have a lot of value for historical analysis, but I imagine it would be hard work too.
[20:47:47] But the world was very interesting before July 2015 :)
[20:47:50] halfak: tehy could be in an API, i'd say probably yes, but not mixed with current pageview data on current api
[20:47:57] +1
[20:47:57] *they
[20:48:02] milimetric: if you are back would love to chat a bit
[20:48:38] Analytics, Operations, Traffic: The WMF-Last-Access Set-Cookie header should follow RFC 2965 syntax rather than the pre-RFC Netscape format - https://phabricator.wikimedia.org/T147967#2710602 (BBlack)
[20:48:46] thanks for the info!
[21:28:58] (CR) EBernhardson: [C: 1] [search] Add support for generator api requests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/315503 (owner: DCausse)
[21:30:30] back, but ...
sigh, the dmv never fails
[21:30:36] I'll be taking more time off now
[22:22:50] halfak, musikanimal: the exception is that there is *project level* pageview data available according to the new definition back to 2013 https://meta.wikimedia.org/wiki/Research:Page_view#Data_sources_that_use_this_definition
[22:23:01] (this is what i used for https://commons.wikimedia.org/wiki/File:Wikimedia_pageviews_year-over-year_comparison_(since_May_2013).png , with some adjustments)
[22:56:13] HaeB: no, very different that dataset is sampled
[22:56:44] HaeB: besides other points that would be the main difference, you can never report absolute view_counts
[22:57:22] At this scale, the sampling doesn't really mean anything
[22:57:37] I mean to say that extrapolating is fine
[22:57:59] nuria: recall that we had a very long phabricator task examining the differences
[22:58:15] ... sampling was among the minor issues
[22:58:38] halfak: for big wikis yes, for mid size/small ones I am going to say no
[22:58:59] nuria, well, I wouldn't say no for my needs.
[22:59:05] halfak: jaja
[22:59:08] that can be
[22:59:08] Regardless, it'd be up to the user.
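The big-wiki versus small-wiki disagreement above can be made concrete with a little arithmetic. The sketch below assumes a simple binomial model in which each view is kept independently with a fixed probability. Both that model and the 1:1000 rate used in the note are assumptions for illustration, not figures stated in this log, and `relativeStdError` is a hypothetical name.

```typescript
// Relative standard error of a view count extrapolated from sampled logs.
// Assumed model: each view is kept independently with probability `rate`,
// so the sampled count is Binomial(trueViews, rate) and the estimate is
// sampledCount / rate. Its standard deviation divided by trueViews is
// sqrt((1 - rate) / (trueViews * rate)).
function relativeStdError(trueViews: number, rate: number): number {
  if (!(trueViews > 0) || !(rate > 0) || rate > 1) {
    throw new RangeError("trueViews must be > 0 and rate must be in (0, 1]");
  }
  return Math.sqrt((1 - rate) / (trueViews * rate));
}
```

Under these assumptions, with a hypothetical rate of 1/1000, a project with 1e8 views in the period has a relative error of roughly 0.3%, while one with 1e4 views is off by roughly 32%. That is consistent with both positions in the exchange: extrapolation is harmless at large scale and unreliable for small projects.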
[22:59:12] ...after we quantified the others with joal's help, it now works OK for the purpose i mentioned
[23:00:49] i agree the user needs to be aware that it's sampled
[23:19:16] Analytics, RESTBase, Services: REST API entry point web request statistics at the Varnish level - https://phabricator.wikimedia.org/T122245#2711275 (GWicke)
[23:21:07] Analytics-Cluster, Cassandra, RESTBase-Cassandra, Services: Standardized Cassandra dashboards - https://phabricator.wikimedia.org/T133403#2711325 (GWicke)
[23:21:29] Analytics-Kanban, Datasets-Webstatscollector, RESTBase-Cassandra, Services, Patch-For-Review: Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314#2711334 (GWicke)
[23:21:32] Analytics, Pageviews-API, RESTBase-API, Services: AQS: query multiple articles at the same time - https://phabricator.wikimedia.org/T118508#2711335 (GWicke)
[23:24:03] Analytics, Wikimedia-Stream, service-runner, Services (later): Support node cluster sticky-session in service-runner - https://phabricator.wikimedia.org/T145805#2711365 (Pchelolo)
[23:25:19] Analytics-Kanban, Datasets-Webstatscollector, RESTBase-Cassandra, Patch-For-Review, Services (watching): Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314#2711374 (GWicke)
[23:28:37] Analytics, Pageviews-API, RESTBase-API, Services: AQS: query multiple articles at the same time - https://phabricator.wikimedia.org/T118508#2711397 (Pchelolo) Open>declined So, I guess we all are on the same page that this will never be implemented. Closing as 'Declined'