[02:15:12] <wikibugs>	 Analytics / Visualization: correct attribution to comScore in monthly report card - https://bugzilla.wikimedia.org/73344 (Kevin Leduc) NEW p:Unprio s:normal a:None WMF displays data from comScore: http://reportcard.wmflabs.org/  Per comScore's data usage policy: http://www.comscore.com/Insi...
[02:26:10] <wikibugs>	 Analytics / Visualization: correct attribution to comScore in monthly report card - https://bugzilla.wikimedia.org/73344#c1 (Kevin Leduc) where else is comScore data displayed?
[07:36:25] <wikibugs>	 Analytics / Dashiki: Story: VitalSignsUser selects Monthly Pageviews metric - https://bugzilla.wikimedia.org/73331 (Kevin Leduc)
[07:41:12] <wikibugs>	 Analytics / Wikimetrics: Story: WikimetricsUser deletes user from cohort - https://bugzilla.wikimedia.org/73350 (Kevin Leduc) NEW p:Unprio s:enhanc a:None A user can legally request to be removed from a cohort.  The creator of that cohort needs a simple mechanism to remove that user from th...
[07:42:10] <wikibugs>	 Analytics / Wikimetrics: Story: WikimetricsUser deletes user from cohort - https://bugzilla.wikimedia.org/73350#c1 (Kevin Leduc) collaborative tasking on etherpad: http://etherpad.wikimedia.org/p/analytics-73350
[07:50:55] <wikibugs>	 Analytics / General/Unknown: Story: Analyst uses an operationalized Saiku - https://bugzilla.wikimedia.org/73246 (Kevin Leduc) s:normal>enhanc
[07:53:10] <wikibugs>	 Analytics / General/Unknown: Story: Analyst uses an operationalized Saiku - https://bugzilla.wikimedia.org/73246#c1 (Kevin Leduc) collaborative tasking on etherpad: http://etherpad.wikimedia.org/p/analytics-73246
[08:58:48] <grrrit-wm>	 (CR) Hashar: "Adding the py27 env to Jenkins is straightforward: https://gerrit.wikimedia.org/r/#/c/172954/" [analytics/aggregator] - https://gerrit.wikimedia.org/r/172195 (https://bugzilla.wikimedia.org/72740) (owner: QChris)
[12:52:52] <springle>	 qchris: wow, I thought I logged that earlier. sorry :\
[12:53:07] <qchris>	 Maybe I overlooked it
[12:53:10] * qchris checks again.
[12:53:20] <springle>	 no, I think you're right
[12:54:07] <qchris>	 Anyways ... did you restart the consumers by hand, or did EventLogging pick the CNAME change up some other way?
[12:54:11] <springle>	 yes, I restarted the mysql-m2-master service, which didn't work. quizzed ori who brought up firewall issues, then reverted
[12:54:25] <qchris>	 k.
[12:54:41] <springle>	 why did that cause it to skip data? jfmi
[12:54:42] <qchris>	 Ja, I think firewall looks like a good scapegoat.
[12:54:56] <qchris>	 I tried to connect to dbproxy by hand and that timed out.
[12:55:00] <qchris>	 (from vanadium)
[12:55:39] <qchris>	 Because restarting the mysql consumer caused it to try to connect to dbproxy (i guess)
[12:55:46] <qchris>	 but it could not connect.
[12:55:53] <qchris>	 so it could not write the events to the db.
[12:56:10] <springle>	 so it writes thing synchronously, or not at all?
[12:56:12] <qchris>	 (But I hope it could write the events to fallback logs ... the check is still running)
[12:56:22] <qchris>	 right. synchronously or not at all.
[12:56:36] <springle>	 that's quite a fundamental misunderstanding of the wark the consumer works, on my part
[12:56:40] <springle>	 way*
[12:56:48] <qchris>	 I should have stayed up longer :-)
[12:57:28] <springle>	 So I guess even haproxy failover won't work for EL, if any disconnection will mean an outage, however brief
[12:58:07] <springle>	 (since a failover would still potentially incur a $tcp-timeout delay, depending on the client)
[12:58:12] <qchris>	 The other job just finished ... we're having good logs. So we can backfill the missing data if needed.
[12:58:46] <qchris>	 Well ... losing a few minutes of data during failover is probably better than losing all data.
[12:59:06] <qchris>	 And we have fallback logs that we can use to backfill. For cases just like this :-D
[13:00:15] <springle>	 what does it do if a deadlock occurs or txn fails?
[13:00:27] <qchris>	 Boooom! :-D
[13:00:33] <springle>	 oh
[13:00:39] <springle>	 damn
[13:00:45] <qchris>	 If things fail badly, the service dies, and restarts automatically.
[13:00:59] <qchris>	 Up to 30 times in 5minutes.
[13:01:19] <qchris>	 If it would have to restart more often, it just stays stopped.
[13:02:09] <qchris>	 Yes, that part would benefit from more robustness, but it basically just works. So it's not too bad.
[13:02:46] <qchris>	 Nuria is working on batching the writes to the db. The corresponding queuing up of events and such could help to improve the robustness,
[13:03:06] <qchris>	 but I am not sure how her final design will work ... so I don't know.
[13:03:12] <springle>	 fair enough
[13:04:19] <qchris>	 Thanks for taking care of looking at EventLogging and reverting while I was sleeping ;-)
[13:05:06] <springle>	 hehe.. well it turns out i actively broke it, so I figure thanks is not deserved ;-)
[13:05:33] <springle>	 how does the gap filling replay work? is that easy to trigger?
[13:06:10] <qchris>	 No, you did not break it. You repaired it!
[13:06:51] <qchris>	 To be honest, I have no idea how to backfill. I never did it before. I just know we have all the data in files. I just checked that.
[13:07:35] <qchris>	 Not sure if there is prebuilt machinery for filling in the missing data, but worst case, I can just feed it "somehow" into the database consumer.
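
The restart behaviour qchris describes above (up to 30 automatic restarts in 5 minutes, after which the service stays stopped) can be illustrated with a small sliding-window counter. This is only a sketch of that policy, not EventLogging's or the init system's actual code; the class name `RespawnLimiter` and its parameters are hypothetical.

```python
import time
from collections import deque


class RespawnLimiter:
    """Hypothetical helper: allow at most `limit` restarts per `interval` seconds."""

    def __init__(self, limit=30, interval=300.0, clock=time.monotonic):
        self.limit = limit          # 30 restarts ...
        self.interval = interval    # ... within 5 minutes (300 seconds)
        self.clock = clock          # injectable so the policy can be tested
        self.restarts = deque()     # timestamps of recent restarts
        self.stopped = False

    def allow_restart(self):
        """Return True if the service may restart, False if it stays stopped."""
        if self.stopped:
            return False
        now = self.clock()
        # Forget restarts that have fallen out of the sliding window.
        while self.restarts and now - self.restarts[0] >= self.interval:
            self.restarts.popleft()
        if len(self.restarts) >= self.limit:
            # Too many restarts in the window: give up until someone intervenes.
            self.stopped = True
            return False
        self.restarts.append(now)
        return True
```

A supervisor loop would call `allow_restart()` each time the consumer dies and only respawn it while the call returns True.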
[14:54:57] <wikibugs>	 Analytics / Refinery: pagecounts-all-sites files for 2014-11-12T21/1H not getting generated automatically - https://bugzilla.wikimedia.org/73369 (christian) NEW p:Unprio s:normal a:None The pagecounts files for 2014-11-12T21/1H did not get automatically generated [1].  What happened?    [1]...
[15:20:49] <nuria__>	 springle, the batching for now only deals with bursts of traffic
[15:21:26] <nuria__>	 springle: but the same principles can be used to delay writes in the presence of a db error
[15:21:39] <nuria__>	 springle: if you feel that is a must let us know and we can do it
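
The idea nuria__ mentions, reusing the batching queue to delay writes while the database is failing, might look roughly like the sketch below. It is not the batching patch under review; `BatchingWriter`, `write_batch`, and `batch_size` are hypothetical names, and the broad `except Exception` stands in for whatever error the real database driver raises.

```python
class BatchingWriter:
    """Hypothetical sketch: queue events and write them to the db in batches."""

    def __init__(self, write_batch, batch_size=100):
        self.write_batch = write_batch  # callable that takes a list of events
        self.batch_size = batch_size
        self.pending = []               # events not yet written

    def add(self, event):
        """Queue one event and flush once a full batch has accumulated."""
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Try to write all pending events; keep them queued if the write fails."""
        if not self.pending:
            return True
        try:
            self.write_batch(list(self.pending))
        except Exception:
            # Assumed db error: nothing is dropped, the next flush retries.
            return False
        self.pending.clear()
        return True
```

With this shape a short database outage only delays the write: the queued events go out with the next successful flush. A real consumer would still need a bound on the queue size and the fallback logs for longer outages.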
[15:24:42] <nuria__>	 qchris: the all events log in the beta EL machine was last hit in aug: 2415 -rw-r--r-- 1 eventlogging eventlogging 146632 Aug 21 20:56 all-events.log-20140822.gz
[15:33:53] <nuria__>	 kevinator: trying to join
[15:33:59] <kevinator>	 ok
[16:53:47] <wikibugs2>	 Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#21995 (Ottomata) I'm starting the upgrade of analytics1026 now.  Process:  - schedule downtime in icinga - disable puppet - start udp2log instance running sqstat on stat1002 ```/usr/bin/udp2log --con...
[17:01:01] <ottomata>	 hmm, i think i like phabricator.
[17:10:18] <wikibugs2>	 Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200 (chasemp)
[17:17:34] <ori>	 nuria__: i'll try my best to review the batching patch today
[17:17:40] <ori>	 sorry for the delay
[17:26:03] <nuria__>	 ori: thank youuuuuu
[17:26:18] <nuria__>	 ori: let's talk if things are not clear so we can iterate faster
[17:44:52] <wikibugs2>	 Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#22043 (Ottomata) analytics1026 is done.
[17:50:14] <wikibugs2>	 Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#22056 (Ottomata)
[17:55:29] <qchris>	 Ironholds: stat1002 again free to use?
[17:55:40] <Ironholds>	 qchris, yep, sorry. Should've sent update >.>
[17:55:49] <qchris>	 no worries. Thanks!
[19:13:40] <wikibugs>	 Analytics / Refinery: pagecounts-all-sites files for 2014-11-12T21/1H not getting generated automatically - https://bugzilla.wikimedia.org/73369#c1 (christian) NEW>RESO/FIX The corresponding partitions did not show errors, but an Oozie job (that is responsible for marking partitions as successful...
[19:34:41] <wikibugs2>	 Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#22071 (Ottomata) analytics1003 is done.
[19:47:28] <Ironholds>	 yay!
[19:47:34] <Ironholds>	 ottomata, the cluster is on Trusty?
[19:47:40] <ottomata>	 ha
[19:47:41] <ottomata>	 not yet!
[19:47:44] <Ironholds>	 aw
[19:47:46] <ottomata>	 but i've started the process
[19:49:52] <wikibugs2>	 Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#22072 (Ottomata) Going to start on Zookeepers now: analytics1023,1024,1025.  Will do them in that order, one at a time.   Process is simple, just do the upgrade and reboot.
[19:53:29] <Ironholds>	 ottomata, gotcha. Sending out an email?
[19:53:41] <ottomata>	 I'll let you know when it is done, i'm doing the easy pieces now
[19:53:54] <ottomata>	 the pieces that you don't actually care about :)
[19:55:33] <Ironholds>	 totally!
[19:55:44] <Ironholds>	 but the "reboot" bit may be something cluster users should know before they launch a query ;p
[19:55:59] <ottomata>	 jaja, sure, i'm not doing any disruptive parts right now
[19:56:08] <ottomata>	 I'll schedule the other parts
[20:36:46] <wikibugs2>	 Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#22089 (Ottomata) I just finished analytics1023, but accidentally upgraded from zookeeper 3.3.5 -> 3.4.5 in the process.  I had forgotten that we don't use the CDH zookeeper package, but the ones from...
[21:08:09] <ottomata>	 qchris_away: should I merge the /srv/log/eventlogging change?
[21:12:04] <grrrit-wm>	 (CR) Ottomata: [C: 2] Document that filter drops desktop site of wikimediafoundation.org [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/170673 (owner: QChris)
[21:15:30] <grrrit-wm>	 (Abandoned) Ottomata: [WIP] kraken-hive UDFs for parsing user agent strings. [analytics/kraken] - https://gerrit.wikimedia.org/r/96738 (owner: QChris)
[21:19:37] <jdlrobson>	 when does millimetric get back from hawaii tobie?
[21:21:36] <qchris>	 ottomata: That would be awesome!
[21:21:49] <qchris>	 (And if possible also the logrotate thing)
[21:23:21] <qchris>	 jdlrobson: milimetric should be back on 2014-11-19
[21:23:35] <qchris>	 ottomata: Thanks!
[21:23:50] <jdlrobson>	 thanks qchris :)
[21:28:59] <qchris>	 I guess now everything that can get merged around eventlogging is merged.
[21:29:02] <qchris>	 Thanks ottomata!
[21:29:18] <wikibugs>	 Analytics / EventLogging: Beta setup of event logging not working - https://bugzilla.wikimedia.org/73388 (nuria) NEW p:Unprio s:normal a:None There are no events being logged  for event logging in beta since August 22, also looks like EL was restarted about then.  host: deployment-eventlogg...
[21:31:04] <ottomata>	 qchris: this one?  yes or no?
[21:31:04] <ottomata>	 https://gerrit.wikimedia.org/r/#/c/172707/
[21:31:38] <ottomata>	 i think i need to restart eventlogging for the change to have effect, you want to do that part?
[21:31:41] <qchris>	 Oh. You're right.
[21:31:50] <qchris>	 That can get merged too.
[21:32:27] <qchris>	 Logrotate should pick up the change automatically.
[21:32:38] <qchris>	 But for the others, things need to get restarted.
[21:32:44] <qchris>	 I'll take care of that. Sure.
[21:34:33] <ottomata>	 oh, it moved to the template though, right?
[21:34:33] <ottomata>	 ok
[21:34:52] <ottomata>	 ah, i think we can abandon that
[21:34:55] <ottomata>	 since it is a change to the file
[21:34:57] <ottomata>	 qchris: ? right?
[21:35:00] <ottomata>	 the template already says 90
[21:35:04] <qchris>	 Argh. Sounds right.
[21:36:09] <qchris>	 I'll upload a new change, since the template still says 45 for me.
[21:36:30] <wikibugs>	 Analytics / EventLogging: Beta setup of event logging not working - https://bugzilla.wikimedia.org/73388#c1 (nuria) Looks like code deployed there is pretty old:  commit 395a1b1a9034ba413b7f9886923e08d734cb2ac7 Author: Ori Livneh <ori@wikimedia.org> Date:   Thu May 15 14:59:25 2014 -0700      Check tha...
[21:38:04] <ottomata>	 oh
[21:38:16] <ottomata>	 hm
[21:38:28] <ottomata>	 i thought i saw it at 90
[21:38:31] <ottomata>	 weird. uh, ok
[21:41:21] <qchris>	 https://gerrit.wikimedia.org/r/#/c/172707/4/modules/eventlogging/templates/logrotate.erb
[22:09:01] <wikibugs>	 Analytics / EventLogging: Beta setup of event logging not working - https://bugzilla.wikimedia.org/73388#c2 (nuria) I updated the code but I think the machine needs a reboot or at least, a re-start of all processes of EL for it to work properly. Looks like hashar is the owner, so I will let him know.
[22:09:17] <nuria__>	 YuviPanda: holaaaaa
[22:10:56] <YuviPanda>	 nuria__: 'sup
[22:10:58] <YuviPanda>	 only partly here
[23:02:16] <bmansurov>	 hey guys, can anyone help me with setting up limn locally? I've followed the installation instructions, but when I run 'npm start' I get the following error: https://gist.github.com/anonymous/6c9c0c7d00ddf10810cc