[01:24:38] Analytics, Editing-Analysis, Notifications, Collab-Team-2016-Apr-Jun-Q4: Numerous Notification Tracking Graphs Stopped Working at End of 2015 - https://phabricator.wikimedia.org/T132116#2321766 (Neil_P._Quinn_WMF) @Nuria, I have tweaked the SQL in that repo and it does use reportupdater, but it o...
[03:45:46] Analytics-Kanban, Datasets-General-or-Unknown, WMDE-Analytics-Engineering: Fix permissions on dumps.wm.o access logs synced to stats1002 - https://phabricator.wikimedia.org/T134776#2321868 (Nuria) Open>Resolved
[07:34:49] joal: ping me when you're online and have time to debug the failing queries?
[07:35:27] the fix for the first aqs firewall problem has been merged since yesterday
[08:01:38] moritzm: o/
[08:55:07] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Configure Spark YARN Dynamic Resource Allocation - https://phabricator.wikimedia.org/T101343#2322119 (elukey) >>! In T101343#2320978, @Ottomata wrote: > Hmmm, > > Just noticed 2 things. [[ http://spark.apache.org/docs/1.5.0/configuration.html#d...
[10:49:15] * elukey lunch!
[11:59:02] Analytics-Kanban, RESTBase, Services, RESTBase-API, User-mobrovac: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2322536 (mobrovac) a:Nuria>mobrovac >>! In T135240#2319092, @mobrovac wrote: > [Gerrit 290264](https://gerrit.wikimedia.org/r/#/c/290264/...
[12:09:50] Analytics, Analytics-Wikistats, Internet-Archive: Total page view numbers on Wikistats do not match new page view definition - https://phabricator.wikimedia.org/T126579#2322567 (ezachte) After adding Wp zero earlier, this week I added two more categories that were missing: mobile traffic to other pro...
[12:28:28] joal: you there?
[12:28:42] Analytics-Wikistats: Unexpected increase in traffic for 4 languages in same region, on smaller projects - https://phabricator.wikimedia.org/T136084#2322595 (ezachte)
[12:46:41] tried to execute joal's oozie load job to AQS test but
[12:46:42] Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: aqs1004-a.eqiad.wmnet/10.64.0.126:9042 (com.datastax.driver.core.TransportException: [aqs1004-a.eqiad.wmnet/10.64.0.126:9042] Cannot connect))
[12:46:46] same error
[12:49:59] !log stopping kafka on kafka1013 and rebooting the host for kernel upgrade
[12:50:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[12:50:47] hi team!
[12:50:55] Hi elukey !
[12:51:35] elukey: I saw you tried my query (it sends me an email ;), great
[12:52:02] ahhhh okok :D
[12:52:05] hellooo joal
[12:52:11] elukey, moritzm: I'm sorry to ask you for some more help because of that failure
[12:52:16] Hi elukey :)
[12:52:31] I backlogged IRC a bit, saw you had some trouble with kafka?
[12:52:51] yeah all solved for the moment, the partitions were filling up
[12:52:58] due to the migration
[12:53:06] elukey: how come?
[12:53:09] retention changed?
[12:53:17] https://grafana.wikimedia.org/dashboard/db/kafka?panelId=35&fullscreen
[12:53:48] so log sizes were increasing right after the migration, then got purged after the normal retention
[12:54:16] we checked on the host and there were more logs for the days of the switch
[12:55:23] Yeah, I can imagine: one week of data copied on a single day (because of real retention, and new version integration) --> leads to two actual weeks of data at the end of the first week
[12:55:30] PROBLEM - Check status of defined EventLogging jobs on eventlog1001 is CRITICAL: CRITICAL: Stopped EventLogging jobs: processor/client-side-11 processor/client-side-07 processor/client-side-04 processor/client-side-03 processor/client-side-00
[12:55:51] hello EL!
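joal's explanation at 12:55:23 (a week of history bulk-copied in one day stays on disk alongside normal traffic until retention expires it) can be sketched as a toy model; the helper name and the sizes below are made up purely for illustration:

```python
# Hypothetical model of the disk-usage effect described above: a migration
# copies `backfill_days` of history on day 0, and that burst only ages out
# once the normal retention window passes, so the broker briefly holds
# backfill + fresh data on top of its steady state.
def log_size_after(day, daily_gb=1.0, retention_days=7, backfill_days=7):
    """Total log size (GB) `day` days after the bulk copy on day 0."""
    # steady state: up to retention_days of normal traffic
    steady = min(day, retention_days) * daily_gb
    # the backfill burst survives until retention purges it
    burst = backfill_days * daily_gb if day < retention_days else 0.0
    return steady + burst

print(log_size_after(6))   # end of the first week: ~two weeks on disk (13.0)
print(log_size_after(8))   # burst purged, back to steady state (7.0)
```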
[12:55:54] let me restart you
[12:56:10] elukey: EL problems due to kafka?
[12:56:16] yep
[12:56:37] !log EL restarted after kafka1013 node stop (kernel upgrades)
[12:56:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[12:56:53] k makes sense elukey ;)
[12:57:32] RECOVERY - Check status of defined EventLogging jobs on eventlog1001 is OK: OK: All defined EventLogging jobs are running.
[12:57:56] yeah we need to fix that problem :(
[12:58:35] elukey: have you tried spark dynamic allocation?
[12:58:51] elukey: saw you merged and restarted, so I want to SEE IT !
[12:58:54] joal: nope! I was waiting for you or ottomata, but it works :)
[12:59:02] ergh, it is set correctly
[12:59:03] YAYYYY
[13:08:23] joal: I mistakenly set max executors in the spark defaults initially but removed it this morning
[13:08:44] elukey: meaning, maxExecutors back to infinity?
[13:08:53] yep
[13:09:07] ottomata suggested not to cap them for the moment
[13:09:19] k, thanks for letting me know, I would have tried to see if the restriction was working, and would have killed the cluster ;)
[13:09:22] and if we want to do it, only with proper variable substitution
[13:12:13] (CR) Mforns: [C: -1] "Great idea, I should have done this instead of the displayName. I -1'd it only because there's a missing semicolon." (2 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/290306 (https://phabricator.wikimedia.org/T122533) (owner: Nuria)
[13:16:14] mforns / joal: y'all wanna meet about edit data since we missed yesterday?
[13:16:24] milimetric: sure !
[13:16:41] we could chat a bit now and then again later if nuria and ottomata wanna join
[13:16:46] I'll be in the cave
[13:17:03] works for me milimetric
[13:19:11] mforns, milimetric: batcave now, is it?
[13:19:36] I'm in the cave but we could wait for mforns instead
[13:24:45] for some reason grafana is acting weird joal https://grafana.wikimedia.org/dashboard/db/kafka
[13:24:54] messages in shows a hole
[13:25:14] but if you put your mouse on it then it shows correct values
[13:25:24] this is one of the "mmmmmmm" situations
[13:26:23] grafana looks like a war zone and it was only a reboot
[13:27:17] milimetric: o/
[13:27:54] hi elukey
[13:29:23] milimetric, joal, I'm coming, give me 5 minutes please
[13:29:33] no rush mforns, we have all morning
[13:30:12] elukey: dynamic allocation tested with special parameters in the spark shell command
[13:30:19] elukey: works awesome :)
[13:30:20] :D
[13:31:19] wooooooooo
[13:31:48] joal: what did you set specifically?
[13:32:01] elukey: one caveat found (in spark config, need to be careful about how executors are released), but that's really great
[13:32:20] elukey: --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true
[13:32:48] joal: spark.dynamicAllocation.enabled should be set to true by CDH theoretically, does it work even without it?
[13:32:54] more lightweight CSS that we could use for some of our layouts maybe: https://github.com/picturepan2/spectre
[13:33:05] elukey: not set by default for us
[13:33:33] elukey: when not setting it, we get back to the default number of executors (2)
[13:34:04] this is weird, it should have been enabled :/
[13:35:42] milimetric: I also confirm that, in addition to being a better yarn citizen, spark is really faster in dynamic allocation mode (it can actually use more of the unused resources)
[13:36:06] sweet
[13:43:32] elukey: have you tried discussing the new-aqs loading issue with moritzm?
[13:44:01] joal, milimetric, do you want to batcave now?
[13:44:09] sure
[13:44:12] omw
[13:44:23] elukey: to me dynamic allocation is working !!!
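The two `--conf` flags joal pasted at 13:32:20 are real Spark property names; as a sketch, here is a hypothetical helper (not part of any tool mentioned above) that turns such a settings map into the `--conf` arguments for a spark-shell or spark-submit invocation:

```python
# Hypothetical helper: expand a dict of Spark properties into CLI arguments.
def spark_submit_args(conf):
    return [arg for k, v in sorted(conf.items()) for arg in ("--conf", f"{k}={v}")]

dynamic_alloc = {
    # external shuffle service, required for dynamic allocation
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.enabled": "true",
    # optional cap discussed in the channel; left unset means no upper bound
    # "spark.dynamicAllocation.maxExecutors": "10",
}
print(" ".join(spark_submit_args(dynamic_alloc)))
# --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true
```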
[13:45:08] elukey: we need some docs about how to manage memory and caching though, because there is a strong link between cached data and the number of live executors that will live for infinity
[13:45:18] joal: not yet, because we were upgrading kafka1013 with the 4.4 kernel, but I tried to re-run the job and found the same issue.. even after the firewall changes
[13:45:58] it took me a while because I didn't know where to get the info, and eventually I found the map logs :)
[13:46:55] joal: yeah I added 10 as max executors to have a standard config, so we might think about enforcing a limit rather than relying on people :)
[13:47:29] we can talk about it during the ops sync maybe?
[13:52:23] elukey, joal: we can look into it now, on which host is the query failing?
[13:52:37] moritzm: I was about to write :)
[13:52:53] moritzm: I have not launched it yet, and we still have the issue of not knowing where it's executed :(
[13:52:56] so I tried to run telnet aqs1004-a.eqiad.wmnet 9042 on analytics1052 (the last one to fail) and aqs1005.eqiad.wmnet
[13:53:17] on analytics1052 I see it hanging
[13:53:27] meanwhile on aqs1005.eqiad.wmnet it works fine (I get a session)
[13:54:05] joal: there is some info in the mappers about the host that emits errors, no?
[13:54:19] Yes, we can know it after, but not before :)
[13:54:28] ah yeah okok :)
[13:56:06] so I'd say I'll simply add logging rules to all analytics105* nodes and then a few queries should catch it, right?
[13:56:29] moritzm: hm, I hope so ;)
[13:56:43] if not at first try, at least at second
[13:56:48] moritzm: --^
[13:57:29] k, let me make tea and then I'll add the rules
[13:57:35] kk
[14:05:04] k, feel free to go ahead with one or a few queries now
[14:12:22] joal: started 0011493-160519124420827-oozie-oozi-C
[14:12:28] (from stat1004)
[14:12:40] moritzm: the job is running, should fail in a bit
[14:16:44] moritzm: analytics1053.eqiad.wmnet failed
[14:17:25] let me have a look
[14:17:42] mmm no sorry I might be confused, lemme check better
[14:18:11] ack, nothing dropped on 1053
[14:18:47] nothing on analytics105* in fact
[14:19:20] so analytics1056.eqiad.wmnet is the one
[14:19:37] can see in the logs Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: aqs1004-a.eqiad.wmnet/10.64.0.126:9042 (com.datastax.driver.core.TransportException: [aqs1004-a.eqiad.wmnet/10.64.0.126:9042] Cannot connect))
[14:21:31] moritzm: --^
[14:21:45] ok, so it's not a matter of the hadoop node not accepting the traffic (nothing is logged there), I added logging code to aqs1004, can you repeat the query?
[14:21:54] sure!
[14:22:08] 0011509-160519124420827-oozie-oozi-C started
[14:26:54] did you get an error so far?
[14:27:55] yep just checked
[14:28:21] analytics1028,1033,1055,1052 failed in reduce step
[14:30:14] then it's not something failing on the iptables level, there's no traffic dropped by iptables on either analytics1052 or aqs1004
[14:32:22] moritzm: so it is probably happening on application level
[14:32:37] but it looks weird
[14:32:52] I can telnet from analytics to aqs, but I can from aqs to aqs
[14:32:53] mmmm
[14:33:11] there's a "n't" missing somewhere above?
[14:33:34] yes there is sorry, long day :)
[14:33:58] so from where does telnet work and from where doesn't?
[14:34:17] I can't telnet from analytics to aqs but I can from aqs to aqs (port 9042)
[14:34:28] the last one specifically aqs1005 to aqs1004-a
[14:34:50] maybe I can retry from analytics1052
[14:35:01] where did you connect to, to the actual IP of the aqs host or to one of the IP addresses assigned to the Cassandra instances?
[14:35:33] cassandra instance, aqs1004-a
[14:36:03] so I just tried from analytics1052
[14:36:07] telnet aqs1004-a.eqiad.wmnet 9042
[14:36:11] telnet aqs1004.eqiad.wmnet 9042
[14:36:25] and I get Trying $IP1 and $IP2
[14:36:32] corresponding to the above domains
[14:36:53] meanwhile
[14:36:54] elukey@aqs1005:~$ telnet aqs1004-a.eqiad.wmnet 9042
[14:36:54] Trying 10.64.0.126...
[14:36:54] Connected to aqs1004-a.eqiad.wmnet.
[14:37:44] compared the IPs from the various telnets, all looks good
[14:40:23] the aqs hosts are in the standard eqiad network, while the hadoop nodes are in the analytics network. you should talk to Faidon, I'm pretty sure this needs changes on the router side, probably only aqs100[1-3] are currently granted there
[14:40:30] actually milimetric, while looking at our schema definition, the new_editor (+ productive and surviving) ones are present ...
[14:40:38] once that is checked/fixed we can revisit the iptables/ferm level
[14:40:41] moritzm: yes this is a very good point!
[14:40:50] I forgot about it!
[14:40:58] joal: present?
[15:23:27] joal: cassandra-daily-coord-pageviews_per_article_flat_LCS is now running like a charm :)
[15:23:38] elukey: YESSSSSS !
[15:23:53] elukey: in meeting, but I'll get back to you soon
[15:36:17] Thanks a lot moritzm for finding the issue :)
[15:45:16] Analytics, Editing-Analysis, Notifications, Collab-Team-2016-Apr-Jun-Q4: Numerous Notification Tracking Graphs Stopped Working at End of 2015 - https://phabricator.wikimedia.org/T132116#2323211 (Nuria) @Neil_P._Quinn_WMF Ah, I see, Thanks for looking into it, let us know if you need any help....
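The telnet probes above (hanging from the analytics VLAN, connecting fine from aqs to aqs) can be scripted so that a silently dropped packet shows up as a quick timeout instead of an indefinite hang; `can_connect` is a hypothetical helper, the port 9042 is Cassandra's real native-protocol port:

```python
# Hypothetical equivalent of the manual telnet checks: attempt a TCP
# connect with a short timeout and report success/failure instead of hanging.
import socket

def can_connect(host, port, timeout=3.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False

# e.g. running can_connect("aqs1004-a.eqiad.wmnet", 9042) from an analytics
# host vs. from aqs1005 would reproduce the asymmetry debugged above.
```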
[15:49:51] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Configure Spark YARN Dynamic Resource Allocation - https://phabricator.wikimedia.org/T101343#2323217 (Ottomata) Need to: - parameterize and set spark.shuffle.service.enabled true Consider removing explicit executor settings for production oozie...
[15:56:57] Analytics, Analytics-Wikistats, Internet-Archive: Total page view numbers on Wikistats do not match new page view definition - https://phabricator.wikimedia.org/T126579#2323297 (Nuria) >Also missing but negligible are mediawiki (6M/mon) and www.wikisource (0.5M/mon, the precursor of language-specific...
[16:00:59] ottomata / elukey: standup
[16:25:02] Analytics-Kanban, RESTBase, Services, RESTBase-API, User-mobrovac: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2323436 (mobrovac) >>! In T135240#2299055, @Milimetric wrote: > The PR for this change: https://github.com/wikimedia/restbase/pull/614 It's been...
[16:47:18] Analytics-Kanban, RESTBase, Services, RESTBase-API, User-mobrovac: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2323504 (Nuria) What i do not understand is what is exactly enabled here: - from @mobroac's comments it looks like limits per worker are enabled...
[16:48:12] elukey: here ?
[16:49:42] joal: yep
[16:51:43] elukey: batcave ?
[16:56:39] Analytics-Kanban, RESTBase, Services, RESTBase-API, User-mobrovac: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2292830 (Pchelolo) @Nuria The DST is enabled in production right now. All the limits are per endpoint, not per worker.
[16:57:26] Analytics, cassandra: AQS Cassandra cluster: Restart cassandra-metrics-collector - https://phabricator.wikimedia.org/T134513#2323530 (Eevans) Open>Resolved a:Eevans Since opening this issue, I've had Puppet updated to automatically restart cassandra-metrics-collector, and a subsequent run (on...
[16:59:58] a-team: retrroooooo
[17:01:07] ottomata: retrooo
[17:38:05] ottomata: I'm gonna poke around to see what's up with the druid community
[17:38:17] k
[17:42:24] ottomata: could we merge this one? https://gerrit.wikimedia.org/r/#/c/290284/
[17:43:05] nuria_: quotes around 'latest', and yes!
[17:43:26] ottomata: will do, i saw it both ways on puppet, wasn't sure, thank you
[17:45:52] (PS3) Nuria: MonthlyUniqueDevices metric should be bookmarkeable [analytics/dashiki] - https://gerrit.wikimedia.org/r/290306 (https://phabricator.wikimedia.org/T122533)
[17:45:53] joal: sorry if I wasn't super chatty but today was looong, my brain is segfaulting now :D
[17:46:28] (CR) Nuria: MonthlyUniqueDevices metric should be bookmarkeable (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/290306 (https://phabricator.wikimedia.org/T122533) (owner: Nuria)
[17:46:45] elukey: no worries, didn't even notice :)
[17:51:54] a-team logging off, talk with you tomorrow :)
[17:51:59] laters!
[17:52:00] bye elukey !
[17:52:02] elukey, bye!
[17:52:05] nite
[17:52:08] bye elukey :) good night!
[17:52:11] o/
[17:52:49] mforns: will go for dinner in a few minutes, is there anything I can help with rapidly?
[17:53:09] nuria puppet pulled!
[17:53:17] joal, I am doing a quick review of dashiki, and I will continue with scala
[17:53:27] but ja, looks cached :)
[17:53:42] k mforns, will come and help after dinner
[17:53:49] * joal is away for dinner
[17:53:54] so, thanks joal, we'll see later, bon appetit
[17:54:12] oh, nuria_ naw, i just had to hard refresh
[17:54:24] link is fixed
[17:54:51] ottomata: k, great!
[17:56:21] (CR) Mforns: [C: 2 V: 2] "LGTM!" [analytics/dashiki] - https://gerrit.wikimedia.org/r/290306 (https://phabricator.wikimedia.org/T122533) (owner: Nuria)
[18:00:02] ottomata: Do you think we should still do this? https://phabricator.wikimedia.org/T132177
[18:00:20] mforns: all right, let's deploy to labs for now. I will be removing vital-signs i think when we announce analytics.wikimedia.org
[18:00:36] nuria_: you could set up a redirect
[18:00:44] seems kinda abrupt to kill it
[18:00:59] madhuvishy: yes, that is what i will do
[18:01:04] cool
[18:01:11] ah, ja madhuvishy that is probably a good one to do
[18:01:13] madhuvishy: but not yet, we need erik to change links in wikistats
[18:01:19] do you need that soon?
[18:01:24] madhuvishy: also test domains will remain cause those are super useful
[18:01:32] Analytics-Kanban: Backfill Android Apps pageviews from May 2nd hour 21 - https://phabricator.wikimedia.org/T135299#2323872 (Tbayer) Great, thanks! Will take a closer look, but at least the percentage of app views among overall pageviews is back to previous levels (1.3% for the week until May 22). What abou...
[18:01:47] ottomata: if you can do that now, I can update that part of the job before getting it merged
[18:02:08] but it's not necessary to finish this project
[18:02:44] nuria_, I will deploy to staging, but remember that we can not see the unique devices there until we change the config wiki for production, they share the same config page
[18:02:45] nuria_: ah yeah cool - I've been wanting to ask - how are you deploying the dashboards to stat1001?
[18:02:57] madhuvishy: static git clone
[18:03:01] ok, i can make the user real easy, but storing the pw in the pwstore has always been a pain and i've put off doing that
[18:03:03] :(
[18:03:12] mforns: nah, i will deploy, no worries
[18:03:16] hm, oh, but it will be in puppet, ja?
[18:03:17] hmmmm
[18:03:18] that isn't so hard
[18:03:20] in puppet private
[18:03:32] madhuvishy: where will you save the login info?
[18:03:43] ottomata: cool - i'll need to save it in jenkins anyway
[18:03:46] let me give you the link
[18:03:46] mforns: i will re-test in the browsers i have and deploy
[18:03:55] madhuvishy: where/how does jenkins save it?
[18:04:09] ottomata: https://integration.wikimedia.org/ci/credential-store/
[18:04:30] ottomata: to be specific, here - https://integration.wikimedia.org/ci/credential-store/domain/_/
[18:04:48] all private keys, sensitive passwords etc are stored here
[18:04:59] huh, ok
[18:05:03] interesting
[18:05:17] it's still labs
[18:05:36] ottomata: I have the archiva creds available there now - but ideally I'd remove those and add new ones.
[18:05:49] and have it available in a global maven settings file
[18:05:51] aye
[18:06:01] and any one can use it
[18:06:01] is 'archiva-ci' a good name? you think?
[18:06:21] username
[18:06:28] ottomata: sure!
[18:06:32] k
[18:07:08] nuria_: why a static clone? we should probably extend the deployment process to stat1001 if we are going to have more dashboards
[18:08:16] ottomata: you can just add it here - https://integration.wikimedia.org/ci/credential-store/domain/_/newCredentials and don't have to send me the password (username with password, global scope)
[18:09:45] ok
[18:11:25] madhuvishy: git clone is the easiest way to deploy to 1001, is it not? it's all static for all dashboards so there are not several sites, just one that is cached by varnish.
[18:11:48] nuria_: but deployment should not be controlled by puppet ideally
[18:11:54] puppet sets up infrastructure
[18:12:02] deployments are handled by us
[18:12:48] madhuvishy: done
[18:12:55] ottomata: <3 thank you
[18:12:58] madhuvishy: in theory sure, in reality we do not have an easier way to deploy a static site that i know of; keep in mind that this could be deployed to s3, it's all client side
[18:13:17] madhuvishy: ja we all talked about this. deploying via scap would be fine too
[18:13:20] we might do it eventually
[18:13:26] for now a git clone of static files was pretty easy
[18:13:31] hmmm
[18:13:34] scap is overkill in my opinion
[18:13:40] finishing with a hammer
[18:13:58] nuria_: i mean, it's kinda more correct, but ja, in this case all scap would do is a git clone anyway
[18:14:00] git pull
[18:14:12] we just told puppet to do it every time it runs
[18:14:16] instead of doing it manually
[18:14:28] right - but this way - once merged, it's deployed
[18:14:44] madhuvishy: right
[18:15:31] madhuvishy: which for this site is no issue that i can see, for a real application (server+client+storage) it would be
[18:15:39] so scap i guess at least lets you control when something is deployed. i mean, we could still use fabric but then have to deploy from local
[18:16:10] madhuvishy: in an ideal world I would have a copySRC build that creates a package and a deploy process that pulls that package (no deploy from git)
[18:16:37] yeah i'm not a fan of the deploy from git for this case too
[18:16:58] madhuvishy: the fact that scap still deploys from git - which means i need to commit the build - makes me not think good things about that system
[18:17:21] i would buy it if it deployed not from source
[18:17:23] ah nuria_ I guess scap was not built with static sites in mind
[18:17:32] hmmm
[18:20:42] ya
[18:21:20] deploying from the depot makes me sad and also makes me think we are back in the PAST
[18:22:39] bbib
[18:23:53] wikimedia/mediawiki-extensions-EventLogging#558 (wmf/1.28.0-wmf.3 - 57f911c : Mukunda Modell): The build has errored.
[18:23:53] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/57f911c05ff8
[18:23:53] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/132637611
[18:29:26] Analytics-Kanban: Create separate archiva credentials to be loaded to the Jenkins cred store {hawk} - https://phabricator.wikimedia.org/T132177#2190727 (madhuvishy) archiva-ci user has been created, and is available in the Jenkins credential store
[18:29:28] Analytics-Kanban, RESTBase, Services, RESTBase-API, User-mobrovac: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2323937 (Nuria) @Pchelolo : so i should be able to see limits getting logged if i run a test using apache bench correct?
[18:30:42] Analytics: Have archiva server credentials available via the Config File Builder in global maven settings.xml - https://phabricator.wikimedia.org/T132178#2323940 (madhuvishy) Done. Available in https://integration.wikimedia.org/ci/configfiles/ as ArchivaCredentialsSettings
[18:30:58] Analytics-Kanban: Have archiva server credentials available via the Config File Builder in global maven settings.xml - https://phabricator.wikimedia.org/T132178#2323941 (madhuvishy)
[18:34:20] Analytics-Kanban: Figure out if the Changelog file can be updated in the release process by Jenkins {hawk} - https://phabricator.wikimedia.org/T132181#2323958 (madhuvishy) Wrote a script to compute changelog from commit messages and have it automatically committed and pushed as part of the release process. W...
[18:34:38] Analytics-Kanban, RESTBase, Services, RESTBase-API, User-mobrovac: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2323960 (Pchelolo) @Nuria not really, page views are varnish cached, so for `ab` only the first request will actually hit the servers. You need t...
[18:35:03] Analytics: Dashiki: move available.projects.json to a better location - https://phabricator.wikimedia.org/T136120#2323964 (Nuria)
[18:48:14] Analytics-Kanban, RESTBase, Services, RESTBase-API, User-mobrovac: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2324009 (Nuria) @Pchelolo : well, i was going to "replay" requests to article endpoint which is real easy to do as we have them all. Looking at...
[18:53:20] Analytics-Kanban, RESTBase, Services, RESTBase-API, User-mobrovac: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2324016 (Pchelolo) >>! In T135240#2324009, @Nuria wrote: > Should I send a PR to change it here: https://github.com/wikimedia/restbase/blob/maste...
[19:07:42] mforns: ok, tested a bunch, found a small issue with breakdowns that is not super easy to fix but that i think doesn't need fixing quite yet.
[19:07:59] nuria_, mmmmmm buf
[19:08:10] mforns: nothing to do with the new code though
[19:08:16] nuria_, can you summarize?
[19:08:17] mforns: no ui issue
[19:08:33] mforns: yes, for small projects in which there is no mobile-app breakdown
[19:08:54] the promise to request that data fails (as the req returns 404)
[19:08:58] nuria_, mmmmm I see
[19:09:09] mforns: and thus promises.all() falls into the rejected pattern
[19:09:23] nuria_, well
[19:09:28] mforns: it is fixable though
[19:09:39] sure, I can create a task and do it this week
[19:10:08] mforns: nah, i will do it. no rush
[19:10:16] Analytics-Kanban: Backfill Android Apps pageviews from May 2nd hour 21 - https://phabricator.wikimedia.org/T135299#2324062 (JAllemandou) @Tbayer : I forgot about those two (they don't depend on pageview). Currently backfilling daily, no need to do it for monthly.
[19:13:06] joal: yt?
[19:13:11] yes nuria_
[19:13:56] joal: i was looking at grafana and it seems that when we get about 30/40 requests on the aqs endpoint (per article) the number of 500s spikes
[19:14:07] joal: does that sound right?
</gr_55>
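The breakdown bug discussed above (one missing series 404s and the whole `promises.all()` rejects, discarding the breakdowns that did load, which is also why milimetric later suggests a `.always`-style handler) has a standard fix: settle every request instead of failing fast. A Python analogue under made-up names (`fetch`, `load_breakdowns`, `NotFound` are all hypothetical stand-ins for the Dashiki/AQS calls):

```python
# Sketch: gather all breakdown requests, keep the ones that succeeded,
# and drop the ones that 404'd, instead of rejecting everything at once.
import asyncio

class NotFound(Exception):
    pass

async def fetch(series):
    # stand-in for the AQS request; 'mobile-app' is missing on small wikis
    if series == "mobile-app":
        raise NotFound(series)
    return {series: [1, 2, 3]}

async def load_breakdowns(series_list):
    results = await asyncio.gather(
        *(fetch(s) for s in series_list), return_exceptions=True
    )
    # keep what loaded, drop what failed
    return [r for r in results if not isinstance(r, Exception)]

got = asyncio.run(load_breakdowns(["desktop", "mobile-web", "mobile-app"]))
print(len(got))  # 2 of the 3 series survive
```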
[19:14:20] joal: (i know issues with date ranges might be on top of this though)
[19:14:27] nuria_: correct
[19:14:53] nuria_: 500s begin to rise after 25, and higher after 30
[19:15:01] (in my looking)
[19:17:41] Analytics-Kanban: Create separate archiva credentials to be loaded to the Jenkins cred store {hawk} - https://phabricator.wikimedia.org/T132177#2324101 (Ottomata) I need to add this to the ops pwstore, but am having trouble committing. Keep this open.
[19:17:45] Analytics-Kanban, RESTBase, Services, RESTBase-API, User-mobrovac: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2324102 (Nuria) @Pchelolo : number for us at which we want to start throttling it would be ~25/30 request per whole cluster (which has three mach...
[19:19:34] joal: k
[19:25:59] !log deploying latest master to dashiki 08cc9a2545bcc0a183a3c00c18e81f21326a41b
[19:26:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[19:30:26] nuria_: just fyi, jquery 3.0 rc is out, if you work on that .all() promises bug, you can upgrade if you like their changes (they just re-worked deferred a bit, don't think it affects the bug you found)
[19:30:49] for that bug, probably just adding a .always handler instead of the .then
[19:31:25] milimetric: always? wait, let me see if that is on the native promises
[19:31:32] milimetric: ja, that might work
[19:31:43] these are jquery promises I think, not native
[19:31:53] though in jquery 3.0 it sounds like they'll be fully compatible
[19:32:05] milimetric: right, the native ones do not have always
[19:32:12] milimetric: k, noted, need to file a bug
[19:35:59] milimetric, mforns: I think "unique Devices" should probably "daily unique devices" though, for clarity
[19:36:08] *should probably be
[19:36:08] nuria_, makes sense
[19:36:28] my thought before was that every other metric is daily, so that was like the assumed default
[19:36:41] true
[19:36:43] and if we changed one to "daily ..." it might be confusing if we don't change them all
[19:36:57] right, very true, leaving as is then
[19:36:58] buut... I donno, I agree "daily unique devices" is clearer
[19:37:39] might make sense in the future to change all metrics to include the granularity or to have selectable granularity from a dropdown that knows all available choices
[19:37:52] ya readers-.daily->unique devices
[19:38:04] readers->daily->unique devices
[19:38:19] yep, and we could easily do that in the categorizedMetrics computable, it already does this kind of thing with the categories
[19:38:38] or maybe readers -> unique devices -> daily
[19:39:00] milimetric: ok, will file task for taht too
[19:39:02] *that
[19:39:21] mforns, milimetric: deployed, https://vital-signs.wmflabs.org/#projects=eswiki/metrics=MonthlyUniqueDevices
[19:39:41] mforns, milimetric: i think it looks good, am about to send e-mail about it
[19:39:41] yes!!!
[19:39:48] \o/
[19:39:53] woooohoooo!!!
[19:39:56] really nice job mforns + nuria_
[19:40:13] + milimetric :]
[19:40:30] omg this reminds me I have to change metrics-by-project to use a flex layout, so much better
[19:42:08] Analytics-Kanban, RESTBase, Services, RESTBase-API, User-mobrovac: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2324160 (Pchelolo) @Nuria this is per-endpoint limit. You have 3 endpoints, so 10 would be a good number
[19:46:13] milimetric: can you try IE?
[19:46:23] milimetric: i tried chrome and FF
[19:46:34] not yet, I have to be at home
[19:46:42] but I'll def. try it when I get home
[19:47:37] milimetric: will send two e-mails, one external one internal
[19:48:05] sweet, I'm happy to help with that if you want
[19:57:19] Analytics: Dashiki, Unique Devices and Pageview data breakdown doesn't work if any of the items are not available for the project - https://phabricator.wikimedia.org/T136125#2324192 (Nuria)
[20:00:26] Analytics: Dashiki: Better menu for metrics - https://phabricator.wikimedia.org/T136126#2324211 (Nuria)
[20:11:11] Analytics-Kanban, RESTBase, Services, RESTBase-API, User-mobrovac: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2324260 (Nuria) https://github.com/wikimedia/restbase/pull/618
[20:13:20] Analytics: Dashiki: Breakdowns should be bookmarkeable - https://phabricator.wikimedia.org/T136127#2324269 (Nuria)
[20:22:39] hi terrrydactyl :) hope all's well
[20:22:53] hi!
[20:33:08] Analytics, Analytics-Dashiki: Breakdowns should be bookmarkeable - https://phabricator.wikimedia.org/T136127#2324348 (Danny_B)
[20:34:01] Analytics, Analytics-Dashiki: Better menu for metrics - https://phabricator.wikimedia.org/T136126#2324354 (Danny_B)
[20:35:10] Analytics, Analytics-Dashiki: Dashiki, Unique Devices and Pageview data breakdown doesn't work if any of the items are not available for the project - https://phabricator.wikimedia.org/T136125#2324367 (Danny_B)
[20:35:26] Analytics, Analytics-Dashiki: move available.projects.json to a better location - https://phabricator.wikimedia.org/T136120#2324369 (Danny_B)
[20:37:27] Analytics, Analytics-Dashiki: Simplify readiness checking by making a ready computed - https://phabricator.wikimedia.org/T136025#2324373 (Danny_B)
[20:37:57] Analytics, Analytics-Dashiki: Searching for nl... does not bring nlwikipedia only nlwikidata - https://phabricator.wikimedia.org/T133718#2324375 (Danny_B)
[20:38:22] Analytics-Dashiki, Analytics-Kanban, Patch-For-Review: Add support to 'displayName' param in config - https://phabricator.wikimedia.org/T134924#2324377 (Danny_B)
[20:40:04] Analytics-Dashiki, Analytics-Kanban, Patch-For-Review: Visualize unique devices data {bear} - https://phabricator.wikimedia.org/T122533#2324380 (Danny_B)
[20:41:00] Analytics, Analytics-Dashiki: Add extension and category (ala Eventlogging) for DashikiConfigs - https://phabricator.wikimedia.org/T125403#2324390 (Danny_B)
[20:41:25] Analytics, Analytics-Dashiki: Upgrade Dashiki to semantic-2 for all layouts - https://phabricator.wikimedia.org/T125409#2324392 (Danny_B)
[20:41:42] Analytics, Analytics-Dashiki: Have dashiki read and write GET params to pass stateful versions of dashboard pages {crow} - https://phabricator.wikimedia.org/T119996#2324394 (Danny_B)
[20:42:07] Analytics, Analytics-Dashiki: Allow clicking on links in annotations - https://phabricator.wikimedia.org/T110459#2324395 (Danny_B)
[21:00:17] Analytics, Wikimedia-Site-requests: Need a Dashiki namespace so we can protect configs {crow} - https://phabricator.wikimedia.org/T112268#2324496 (Danny_B)
[21:32:44] ottomata: still around?
[21:33:03] ja hi
[21:33:20] quick question on your comment on https://gerrit.wikimedia.org/r/#/c/288210/6/jsonschema/mediawiki/revision_create/1.yaml
[21:34:04] ottomata: every field is named rev_* except for the page and user related ones ... So I guess if we change user, we change page?
[21:34:45] hmmmm
[21:34:57] oh no, i mean we should just reorder them so that they are next to each other in the schema
[21:35:00] no functional change
[21:35:03] just aesthetic
[21:35:18] OoooOOOh !
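The per-endpoint throttling discussed earlier in the evening (~25/30 req/s across the cluster before 500s rise, split over 3 endpoints, hence Pchelolo's "10 would be a good number" at 19:42:08) is typically implemented as a token bucket per endpoint. A sketch, not RESTBase's actual limiter; the endpoint names and `TokenBucket` helper are illustrative:

```python
# Hypothetical per-endpoint token-bucket limiter: each endpoint refills at
# `rate` tokens/second up to `burst`; a request passes only if a token is left.
import time

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# illustrative endpoint names; ~10 req/s each keeps 3 endpoints under ~30 total
limits = {ep: TokenBucket(rate=10, burst=10)
          for ep in ("per-article", "top", "aggregate")}

# an instantaneous burst of 15 requests to one endpoint: 10 pass, 5 throttled
passed = sum(limits["per-article"].allow() for _ in range(15))
print(passed)  # 10
```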
[21:35:23] Didn't get it :)
[21:35:35] Ok, aesthetic change it will be :)
[21:48:16] nuria: in dashiki, src/app/apis/wikimetrics.js has the comment: "This module returns an instance of an object that knows how to get reports run by WikimetricsBot on wikimetrics".
[21:48:22] What's WikimetricsBot?
[21:50:12] ottomata: did one last big change to the schemas (the page_move one)
[21:50:43] ottomata: after that we should discuss details/technical aspects with Mark, since we seem to be globally in agreement
[21:57:05] k cool
[22:07:31] joal: you still up? if so, how would you feel about a druid/hadoop brainbounce real quick?
[22:07:40] ottomata: let's go :)
[22:08:26] ottomata: in da cave
[22:26:22] bye a-team! tomorrow
[22:26:27] bye mforns !
[22:26:34] bye :)
[22:26:49] joal: do we have a refinery release sometime soon?
[22:26:59] madhuvishy: nothing planned yet
[22:27:07] joal: alright
[22:31:27] Analytics-Kanban, Patch-For-Review: Translate the analytics-release-test job to YAML config in integration/config {hawk} - https://phabricator.wikimedia.org/T132182#2324797 (madhuvishy)
[22:35:49] Analytics: Figure out the exact strategy for release {hawk} - https://phabricator.wikimedia.org/T132180#2324843 (madhuvishy) This will have to be: Go to https://integration.wikimedia.org/ci/job/analytics-refinery-release/m2release/. Verify the versions - Check the Specify custom SCM version checkbox - chang...
[22:36:11] Analytics-Kanban: Figure out the exact strategy for release {hawk} - https://phabricator.wikimedia.org/T132180#2324844 (madhuvishy)
[22:52:08] Analytics-Kanban: Backfill Android Apps pageviews from May 2nd hour 21 - https://phabricator.wikimedia.org/T135299#2324875 (JAllemandou) @Tbayer: Mobile apps uniques backfilled !
[23:28:54] off for tonight a-team, see you tomorrow !
[23:35:55] neilpquinn: still there?
[23:39:59] yep, I'm here!
[23:40:34] nuria ^ :)
[23:40:48] or actually nuria_ :)
[23:41:39] neilpquinn: to your dashiki question, did you get an answer?
[23:41:58] no, I didn't. nothing urgent but I'm curious :)
[23:42:16] neilpquinn: mostly it's that it does not matter for dashiki purposes, we have an automated user for wikimetrics
[23:42:23] which we call wikimetrics bot
[23:42:49] neilpquinn: it calculates the edit metrics like rolling active editors
[23:43:09] neilpquinn: but you know we are working on having edit data on hdfs and doing away with all those calculations, right?
[23:44:32] nuria_: Yeah, I'm generally aware of that work, although I don't really know what it consists of or how it will work with our use cases specifically. Who's the best person to talk with about that? Dan?
[23:45:57] nuria_: And I hear it's at least 3-6 months away so we still have to have editor metrics in the interim.
[23:46:08] nuria_: Anyway, what I'm doing right now is trying to understand Dashiki so I can add things to our dashboard :)
[23:46:14] neilpquinn: k
[23:52:43] nuria_: so who's the best person to talk to about the editing data project? :)