[04:22:18] (CR) Mforns: [C: 1] "LGTM!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/278337 (https://phabricator.wikimedia.org/T130399) (owner: Madhuvishy) [05:19:54] Analytics, Pageviews-API, I18n: Message [[Wikimedia:Pageviews-select2-max-items/en]] needs PLURAL support. - https://phabricator.wikimedia.org/T130005#2138157 (Liuxinyu970226) [05:40:40] Analytics-Kanban: Communicate the WikimediaBot convention {hawk} - https://phabricator.wikimedia.org/T108599#2138158 (mforns) @jayvdb @bd808 @Anomie Here's the announcement email in wikitech-l: https://lists.wikimedia.org/pipermail/wikitech-l/2016-March/085083.html [05:52:20] Analytics-Kanban: browser_general table should have documenting page in wikitech - https://phabricator.wikimedia.org/T130060#2138165 (mforns) a:mforns [11:30:47] o/ [11:30:51] I am a bit puzzled by https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep [11:31:00] there is no zookeeper in there.. [11:33:56] elukey: there seems to exist deployment-zookeeper01 which has the role::zookeeper::server applied to it [11:37:08] mobrovac: that is not in Hiera! [11:37:12] ahhhh okkk [11:37:33] elukey: you also have hieradata/labs/deployment-prep/ in ops/puppet [11:37:53] that hiera data is also applied to the beta cluster instances [11:38:31] tl;dr deployment-prep functions a bit differently than other labs projects [11:42:06] mobrovac: always easy :D [11:42:17] :) [11:44:02] I guess I'll wait for ottomata :) [11:44:56] what do you want/need to achieve?
[11:45:51] I cherry picked https://gerrit.wikimedia.org/r/#/c/278713/ in the deployment-prep puppet master, and I'd need to test it on the hadoop namenodes [11:46:19] but I need zookeeper to be configured properly otherwise my code won't activate [11:46:33] and I've never done it :) [11:49:23] ah, this will need a bit of work in order to get it right [11:49:43] theoretically it should be something like [11:50:11] "cdh::hadoop::zookeeper_hosts": deployment-zookeeper01.deployment-prep.eqiad.wmflabs [11:51:52] ah so you need to point hadoop to zookeeper [11:52:35] yes exactly [11:53:04] mobrovac: would it be fine to add the line to Hiera:Deployment-prep? [11:55:11] possibly, i'm not entirely sure wikitech hiera is even used for deployment-prep [11:55:55] elukey: i'd advise to amend your patch and add that line to hieradata/labs/deployment-prep/common.yaml [11:56:55] elukey: also note that zookeeper_hosts should be an array: cdh::hadoop::zookeeper_hosts: ['deployment-zookeeper01.deployment-prep.eqiad.wmflabs'] [11:58:23] mobrovac: yes thanks :) [11:59:26] np [12:33:00] team, will be off for a couple hours and be back for standup [14:08:57] ottomata: o/ [14:09:15] goooood morning [14:09:19] hallloo [14:09:19] ! [14:09:21] question for you when you have time [14:09:23] good morn [14:09:24] yes ask me!@ [14:09:34] thankssss [14:09:53] so I cherry picked my changes in deployment-prep's puppet master [14:10:24] but then I realized that no zookeeper_hosts variable was configured, so I added it to http://www.google.com/patents/US8024441 [14:10:28] argh sorry [14:10:34] https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep [14:10:42] hmmmm [14:10:44] (the other one is a load balancing thing that I was reading :P) [14:10:54] Hm! hm. [14:11:02] but of course it doesn't work [14:11:05] no? 
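The hiera entry mobrovac suggests above would look roughly like this in hieradata/labs/deployment-prep/common.yaml — a sketch based only on the messages above, with the key and host taken from mobrovac's lines; the array form is the important part. (As the rest of the log shows, the analytics role actually reads a different top-level zookeeper_hosts key, so this reflects the suggestion, not the final configuration.)

```yaml
# Sketch of the suggested deployment-prep hiera entry; note the value
# is an array, as mobrovac points out, not a bare string.
"cdh::hadoop::zookeeper_hosts":
  - deployment-zookeeper01.deployment-prep.eqiad.wmflabs
```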
[14:11:22] nah forcing puppet on the name nodes doesn't trigger any change [14:11:42] Marco suggested to use hieradata/labs/deployment-prep/common.yaml but there's nothing about cdh in there [14:12:22] elukey: [14:12:23] zookeeper_hosts => keys(hiera('zookeeper_hosts', undef)), [14:12:59] in analytics_cluster hadoop/client.pp [14:13:06] that applies the cdh class [14:13:32] ah so it is not "cdh::hadoop::zookeeper_hosts": [14:13:38] HMMMm [14:13:42] well,i mean [14:14:36] analytics_cluster role defaults to using zookeeper_hosts value [14:15:20] i guess the parameter supplied at class declaration takes precedence if your hiera change in wikitech didn't work [14:15:22] hm [14:15:28] but, its gotta be set somewhere already for beta labs [14:15:29] for kafka [14:15:29] hm [14:16:46] ah snap I missed this piece, of course the analytics role [14:16:50] AH [14:16:51] elukey: [14:17:04] hieradata/labs/deployment-prep/common.yaml [14:17:11] zookeeper_hosts: [14:17:11] deployment-zookeeper01.eqiad.wmflabs: "1" [14:17:25] that's also where the beta kafka cluster is defined [14:17:33] i actually had started putting the hadoop cluster in beta in there too [14:17:34] but [14:17:47] i had to rebuild it a few times [14:17:53] and i got tired of making ops/puppet commits to change those values [14:18:31] so, i'm not sure why it isn't just set in labs arlday [14:18:32] already [14:18:34] for hadoop cluster [14:18:37] you shouldn't have to do anything [14:18:56] for some reason I didn't see the zookeeper reference in there, and Marco pointed out that file [14:18:59] sigh [14:19:22] ja and [14:19:25] on deployment-analytics101 [14:19:28] yarn-site.xml [14:19:30] yarn-site.xml: deployment-zookeeper01.eqiad.wmflabs [14:19:37] yarn-site.xml- yarn.resourcemanager.zk-address [14:19:38] yarn-site.xml: deployment-zookeeper01.eqiad.wmflabs [14:19:40] so, it has the value [14:19:50] yes yes I missed it, puppet always confuses me [14:19:56] hiera confuses me [14:20:03] there are so many places a 
value could come from [14:21:21] joal: hiiiii [14:21:27] i'm going to add the partitions to the wr topics! [14:22:13] ottomata: elukey@deployment-puppetmaster:/var/lib/git/operations/puppet/modules/cdh$ git log --> doesn't show my commit [14:22:29] hm! [14:22:37] hmmmm [14:22:43] oh you are cherry picking the module hm [14:22:44] right [14:22:46] oh [14:22:47] so I'd need to run git submodules update possibly? [14:22:52] oh [14:22:56] yes you probably do! [14:23:23] ja [14:23:24] elukey: [14:23:25] modified: modules/cdh (new commits) [14:23:32] so ja you gotta submodule update [14:24:09] yeah it worker [14:24:12] *worked [14:24:15] grrrrrr [14:25:51] nice [14:26:10] !log altering kafka topics webrequest_text and webrequest_upload, increasing each from 12 partitions to 24 partitions [14:32:16] veeeery nice join(): Requires array to work with at [14:32:43] line is $zookeeper_hosts_string = join($::cdh::hadoop::zookeper_hosts, ',') [14:33:27] hmmmm [14:33:31] its not an arraY? [14:33:38] keys(hiera('zookeeper_hosts', undef)) [14:33:56] i would think keys($hash) would give you an array [14:33:59] of keys [14:34:27] maybe puppet is dumb and only gives you an array if the hash has more than one key? dunno. puppet is often dumb.
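The speculation above — that keys() might only yield an array when the hash has more than one key — is easy to check in plain Ruby, the language Puppet's stdlib functions are written in: Hash#keys always returns an array, even for a single-key hash. A minimal sketch, using the one-host hash shown earlier in the channel:

```ruby
# Hash#keys returns an Array regardless of how many entries the hash
# has, so a single-host hash should not be why join() complains.
zookeeper_hosts = { 'deployment-zookeeper01.eqiad.wmflabs' => '1' }

hosts = zookeeper_hosts.keys
puts hosts.class        # Array, even with only one key
puts hosts.join(',')
```

If keys() really did return an array here, the join() failure more likely means the variable held something else entirely (a bare string, or undef) by the time it reached join.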
[14:34:44] elukey: i think it is fine to edit the cdh module manually on deployment-puppetmaster [14:34:49] no one else will mess with that [14:35:12] editing ops/puppet directly usually isn't good, because many folks use that repo, but since the cdh module is a submodule, you won't bother anyway [14:35:14] anyone [14:35:53] ottomata: yeah I was thinking the same [14:41:13] there is a nice any2array [14:44:56] not in the current stdlib [14:48:11] hm [14:48:16] ergh [14:48:18] hmmm [14:48:20] well you can do [14:49:05] $zookeeper_hosts = $::cdh::hadoop::zookeper_hosts [14:49:06] $zookeeper_hosts_string = inline_template('<%= Array(@zookeeper_hosts).join(",") %>') [14:49:18] here's a stupid feature of ruby [14:49:25] Array('string') == ['string'] [14:49:35] Array(['string']) == ['string'] [14:50:09] you can use that fact to use the Array constructor to not worry if something is an array or string [14:50:24] elukey: ^ [14:52:09] ottomata: I'm currently in the train, bad connection - Please go ahead, I'll be home during standup [14:53:16] a-team: Currently getting back home, will miss the beginning of standup - My update: further work on load job (only testing left) [14:55:57] (CR) Ottomata: [C: 2] Update mediawiki/event-schemas submodule [analytics/refinery/source] - https://gerrit.wikimedia.org/r/278346 (https://phabricator.wikimedia.org/T108618) (owner: BryanDavis) [14:56:37] (PS2) Ottomata: Upgrade to latest UA-Parser version [analytics/refinery/source] - https://gerrit.wikimedia.org/r/278337 (https://phabricator.wikimedia.org/T130399) (owner: Madhuvishy) [14:56:44] (CR) Ottomata: [C: 2 V: 2] Upgrade to latest UA-Parser version [analytics/refinery/source] - https://gerrit.wikimedia.org/r/278337 (https://phabricator.wikimedia.org/T130399) (owner: Madhuvishy) [14:57:55] ottomata: why do I need to use inline_template? Array() doesn't work?
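The Array() trick ottomata describes above can be sketched in plain Ruby, which is what actually runs inside an inline_template ERb string. The helper name and host values here are illustrative, not from the cdh module:

```ruby
# Kernel#Array wraps a bare string but leaves an existing array
# untouched, so the join works whether hiera supplied one host or many.
def zookeeper_hosts_string(zookeeper_hosts)
  Array(zookeeper_hosts).join(',')
end

puts zookeeper_hosts_string('deployment-zookeeper01.eqiad.wmflabs')
puts zookeeper_hosts_string(['zk01.eqiad.wmflabs', 'zk02.eqiad.wmflabs'])
```

This is the behavior the inline_template('<%= Array(@zookeeper_hosts).join(",") %>') line relies on. Wrapping with [$var] directly in Puppet, by contrast, depends on how Puppet handles a nested array, which is why the channel is unsure about that variant.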
[14:58:00] * elukey ignorant [14:58:08] Array() is ruby [14:58:10] puppet is not ruby [14:58:21] templates are 'ERb', embedded ruby [14:58:47] HMMM [14:58:49] actually though [14:58:50] uhhhh [14:59:18] you might be able to do join([$::cdh::zookeeper::hosts], ',') [14:59:20] not sure! [14:59:44] puppet does do funky stuff to accomplish array concatenation, so that might just give you an array too, even if the var is already an array [14:59:51] but, you'd have to check that, i'm not sure [15:01:50] join([$::cdh::zookeeper::hosts], ',') works for this use case, not sure what happens if you pass an array to [] thought [15:01:53] *though [15:01:54] ahhahaha [15:02:04] puppet I hate you a bit [15:02:39] Analytics, Analytics-Kanban, Patch-For-Review, Technical-Debt: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts] - https://phabricator.wikimedia.org/T116609#2138748 (Nuria) >AnEng's review of the MediaWiki change wasn't needed indeed; True.... [15:05:20] Analytics-Kanban: Get jenkins to update refinery with deploy of new jars {hawk} - https://phabricator.wikimedia.org/T130123#2138750 (Nuria) [15:05:49] Analytics-Kanban: Get jenkins to automate releases {hawk} - https://phabricator.wikimedia.org/T130122#2138752 (Nuria) [15:09:54] Analytics-Cluster, Analytics-Kanban, EventBus, Patch-For-Review: Camus job to import mediawiki.* eventbus data to Hadoop - https://phabricator.wikimedia.org/T125144#2138771 (Ottomata) Open>Resolved [15:22:48] Analytics-Kanban: Browser report has odd "Not named" labels - https://phabricator.wikimedia.org/T130415#2138834 (Nuria) [15:29:54] a-team: standduppp [15:36:50] (CR) Milimetric: "couple comments and a question about site-version vs. 
site" (5 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/277784 (https://phabricator.wikimedia.org/T129518) (owner: Joal) [15:52:48] Analytics: Landing page for unqiue devices datasets on dumps (like other datasets we have) - https://phabricator.wikimedia.org/T130542#2138910 (Nuria) [16:00:56] Analytics: Fix phab script to gather stats also from point field , not only from title - https://phabricator.wikimedia.org/T130543#2138954 (Nuria) [16:05:07] Analytics: Fix phab script to gather stats also from point field , not only from title - https://phabricator.wikimedia.org/T130543#2138954 (Milimetric) p:Triage>Normal [16:05:09] Analytics: Landing page for unqiue devices datasets on dumps (like other datasets we have) - https://phabricator.wikimedia.org/T130542#2138910 (Milimetric) p:Triage>Normal [16:09:39] Analytics: Add (and default to) a breakdown in percentages also for the line chart. - https://phabricator.wikimedia.org/T130406#2135172 (Milimetric) p:Triage>Normal [16:09:43] Analytics: Browser reports improvements (parent task) - https://phabricator.wikimedia.org/T130405#2135158 (Milimetric) p:Triage>High [16:10:08] Analytics: Prototype Data Pipeline on Druid - https://phabricator.wikimedia.org/T130258#2131128 (Milimetric) p:Triage>Normal [16:10:28] Analytics: Wikistats 2.0. 
Edit Reports: Source Historical Edit Data into hdfs {lama} - https://phabricator.wikimedia.org/T130256#2131074 (Milimetric) p:Triage>Normal [16:11:03] Analytics: Make metrics-by-project breakdown interactive and bookmarkable - https://phabricator.wikimedia.org/T130255#2131059 (Milimetric) p:Triage>Normal [16:11:23] Analytics: Count requests for all wikis/systems behind varnish - https://phabricator.wikimedia.org/T130249#2130851 (Milimetric) p:Triage>Normal [16:12:20] Analytics, Wikipedia-iOS-App-Product-Backlog: Fix iOS uniques in mobile_apps_uniques_daily after 5.0 launch - https://phabricator.wikimedia.org/T130432#2135848 (madhuvishy) Hi @Tbayer, the app now sends unique device ids only when users opt in to send their data - so it makes sense that this data will dro... [16:12:22] Analytics: Operational improvements and maintenance in EventLogging in Q4 {oryx} - https://phabricator.wikimedia.org/T130247#2130808 (Milimetric) p:Triage>Normal [16:14:42] Analytics: Deprecate reportcard - https://phabricator.wikimedia.org/T130117#2126320 (Milimetric) p:Triage>Normal [16:16:55] Analytics-Kanban, MediaWiki-API, Reading-Infrastructure-Team: Create wmf_raw.ApiAction table - https://phabricator.wikimedia.org/T129886#2139017 (Milimetric) [16:17:30] Analytics, Patch-For-Review: Add legends to every graph and let them filter by date - https://phabricator.wikimedia.org/T129497#2139018 (Milimetric) [16:18:14] Analytics, Patch-For-Review: Add legends to every graph and let them filter by date - https://phabricator.wikimedia.org/T129497#2107463 (Milimetric) p:Normal>High [16:22:03] Analytics, Analytics-Cluster: Story: Community has periodic browser stats report generated from Hadoop data - https://phabricator.wikimedia.org/T69053#2139024 (Milimetric) [16:23:24] Analytics: Run browser reports on hive monthly - https://phabricator.wikimedia.org/T118330#1797540 (Milimetric) a:Milimetric>None [16:26:48] Analytics, Analytics-Cluster: Story: Community has periodic browser stats report 
generated from Hadoop data - https://phabricator.wikimedia.org/T69053#2139040 (Nuria) Preliminary browser reports are deployed to: https://browser-reports-test.wmflabs.org/ See task https://phabricator.wikimedia.org/T1304... [16:30:05] Analytics-Kanban: Change the Pageview API's RESTBase docs for the top endpoint - https://phabricator.wikimedia.org/T120019#2139060 (Milimetric) a:JAllemandou [16:57:49] mforns: i have added a link to implementation in pageview definition: https://meta.wikimedia.org/wiki/Research:Page_view#Implementation [16:58:02] nuria, aha [16:58:42] mforns: so whoever is interested can read it, documentation + code is real descriptive, issue is that the pageview definition is complicated [16:58:53] mforns: but our system is not simple either [16:59:13] I was writing the response to John about this exact issue right now, thanks :] [16:59:29] nuria, ^ [16:59:33] k [17:00:52] mforns: I think that joal has some plots as to how our bot detection changed when we fixed our regex, latest additions have not changed matters much [17:01:03] aha [17:03:09] mforns: also we are not the rulers of what a pageview is, research has the last word but we expect teams to inform us as to what constitutes a pageview in their system. that would be very different for the iOS app and the mobile web [17:03:29] nuria, makes sense [17:03:43] mforns: we had a discussion with oliver (longggg) that should be on analytics archives [17:04:33] nuria, what should be on analytics archives? iOS and mobile web? [17:04:56] mforns: the "what is a pageview?" and "who owns definition?" [17:05:26] mforns: "[Analytics] Confusing pageviews" is the thread [17:05:27] nuria, you mean on analytics wikitech? [17:05:45] no analytics@ archives [17:06:03] nuria, you mean we should own that? [17:06:13] what do you mean with archives? :] [17:06:25] mforns: "e-mail list archives", sorry [17:06:36] nuria, ah!
[17:06:41] ok ok ok [17:06:56] sure [17:06:58] argh, sorry, i was too brief [17:19:39] milimetric: have you talked to neilpquinn about the data asaf is interested in? [17:31:21] mforns: I realized Erik z. was not cc-ed in our questions about wikistats [17:31:28] nuria, oh... [17:31:58] nuria, did you forward them to him? or should I do that? [17:32:06] mforns: i did [17:32:10] thanks! :] [18:03:34] milimetric: another question on aqs uniques [18:03:58] sure [18:04:10] So far our format for timestamps is hourly based (even if we only serve daily) [18:04:40] I changed it to daily for some of the pageviews endpoints and people objected [18:04:41] Do you think we should replicate that for uniques, or change and have daily timestamps? [18:05:16] milimetric: I think it's currently daily for per-article [18:05:34] top is not concerned, so it leaves us per-project that is hourly [18:06:19] isn't the format passed in still hourly for per-article though? We changed that so many times I forgot, looking [18:06:37] joal: I think timestamp should be daily for data with daily granularity, no? otherwise seems really confusing [18:06:55] milimetric: end-users pass daily formatted data (fakeHour trick) [18:07:15] right, but both daily and hourly can be passed [18:07:20] nuria: works for me - Day format data for daily (and therefore monthly) [18:07:27] going offline team! talk with you tomorrow :) [18:07:30] milimetric: correct! [18:07:37] Had not thought of that [18:07:39] I think keep it that way, let it not error out, but document it as daily [18:08:11] that way machines, consistency freaks, and correctness freaks are all satisfied [18:08:26] ok - About the cassandra data now, daily as well, or hourly ?
[18:08:27] I myself am all three of those things and therefore none :) [18:08:56] daily, definitely, we never plan hourly for this, it would intrude on privacy [18:11:50] milimetric: cool :) [18:12:25] milimetric: if backend data is daily, then no need for fakeHour :) [18:14:19] to validate we do need it, though, right? [18:14:39] milimetric: was reading code again, and indeed you're right, we need it [18:21:22] milimetric: funny result https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Star_wars/daily/2016030100/2016030200 -- https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Star_wars/daily/20160301/20160302 [18:48:29] joal: that's def. a bug. I tried to file it and phab is giving me grief [18:48:38] if you're not around I'll file it later [18:48:38] :) [18:49:07] milimetric: That is actually fun, the reason why it works nonetheless! [19:56:34] Analytics-Kanban, EventBus: Deploy mediawiki/event-schemas with scap separately instead of using submodule in eventlogging - https://phabricator.wikimedia.org/T127099#2032407 (Ottomata)