[02:15:12] Analytics / Visualization: correct attribution to comScore in monthly report card - https://bugzilla.wikimedia.org/73344 (Kevin Leduc) NEW p:Unprio s:normal a:None WMF displays data from comScore: http://reportcard.wmflabs.org/ Per comScore's data usage policy: http://www.comscore.com/Insi...
[02:26:10] Analytics / Visualization: correct attribution to comScore in monthly report card - https://bugzilla.wikimedia.org/73344#c1 (Kevin Leduc) where else is comScore data displayed?
[07:36:25] Analytics / Dashiki: Story: VitalSignsUser selects Monthly Pageviews metric - https://bugzilla.wikimedia.org/73331 (Kevin Leduc)
[07:41:12] Analytics / Wikimetrics: Story: WikimetricsUser deletes user from cohort - https://bugzilla.wikimedia.org/73350 (Kevin Leduc) NEW p:Unprio s:enhanc a:None A user can legally request to be removed from a cohort. The creator of that cohort needs a simple mechanism to remove that user from th...
[07:42:10] Analytics / Wikimetrics: Story: WikimetricsUser deletes user from cohort - https://bugzilla.wikimedia.org/73350#c1 (Kevin Leduc) collaborative tasking on etherpad: http://etherpad.wikimedia.org/p/analytics-73350
[07:50:55] Analytics / General/Unknown: Story: Analyst uses an operationalized Saiku - https://bugzilla.wikimedia.org/73246 (Kevin Leduc) s:normal>enhanc
[07:53:10] Analytics / General/Unknown: Story: Analyst uses an operationalized Saiku - https://bugzilla.wikimedia.org/73246#c1 (Kevin Leduc) collaborative tasking on etherpad: http://etherpad.wikimedia.org/p/analytics-73246
[08:58:48] (CR) Hashar: "Adding the py27 env to Jenkins is straightforward: https://gerrit.wikimedia.org/r/#/c/172954/" [analytics/aggregator] - https://gerrit.wikimedia.org/r/172195 (https://bugzilla.wikimedia.org/72740) (owner: QChris)
[12:52:52] qchris: wow, I thought I logged that earlier. sorry :\
[12:53:07] Maybe I overlooked it
[12:53:10] * qchris checks again.
[12:53:20] no, I think you're right
[12:54:07] Anyways ... did you restart the consumers by hand, or did EventLogging pick the CNAME change up some other way?
[12:54:11] yes, I restarted the mysql-m2-master service, which didn't work. quizzed ori who brought up firewall issues, then reverted
[12:54:25] k.
[12:54:41] why did that cause it to skip data? jfmi
[12:54:42] Yes, I think firewall looks like a good scapegoat.
[12:54:56] I tried to connect to dbproxy by hand and that timed out.
[12:55:00] (from vanadium)
[12:55:39] Because restarting the mysql consumer caused it to try to connect to dbproxy (i guess)
[12:55:46] but it could not connect.
[12:55:53] so it could not write the events to the db.
[12:56:10] so it writes things synchronously, or not at all?
[12:56:12] (But I hope it could write the events to fallback logs ... the check is still running)
[12:56:22] right. synchronously or not at all.
[12:56:36] that's quite a fundamental misunderstanding of the wark the consumer works, on my part
[12:56:40] way*
[12:56:48] I should have stayed up longer :-)
[12:57:28] So I guess even haproxy failover won't work for EL, if any disconnection will mean an outage, however brief
[12:58:07] (since a failover would still potentially incur a $tcp-timeout delay, depending on the client)
[12:58:12] The other job just finished ... we're having good logs. So we can backfill the missing data if needed.
[12:58:46] Well ... losing a few minutes of data during failover is probably better than losing all data.
[12:59:06] And we have fallback logs that we can use to backfill. For cases just like this :-D
[13:00:15] what does it do if a deadlock occurs or txn fails?
[13:00:27] Boooom! :-D
[13:00:33] oh
[13:00:39] damn
[13:00:45] If things fail badly, the service dies, and restarts automatically.
[13:00:59] Up to 30 times in 5 minutes.
[13:01:19] If it would have to restart more often, it just stays stopped.
[13:02:09] Yes, that part would benefit from more robustness, but it basically just works. So it's not too bad.
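The restart policy described above (the service dies on a failed write, restarts automatically up to 30 times in 5 minutes, and otherwise stays stopped) amounts to a sliding-window restart limit. The log does not say which supervisor enforces it; Upstart's `respawn limit COUNT INTERVAL` stanza has similar semantics. A minimal Python sketch of that policy, assuming a hypothetical `RestartLimiter` helper with an injectable clock so it can be tested:

```python
import time
from collections import deque


class RestartLimiter:
    """Allow at most `limit` restarts within any `interval`-second window.

    Hypothetical helper modelling the policy from the log
    (up to 30 restarts in 5 minutes, then stay stopped).
    """

    def __init__(self, limit=30, interval=300.0, clock=time.monotonic):
        self.limit = limit
        self.interval = interval
        self.clock = clock
        self.restarts = deque()  # timestamps of recent restarts
        self.stopped = False

    def should_restart(self):
        """Record a crash; return True if the service may be restarted."""
        if self.stopped:
            return False
        now = self.clock()
        # Forget restarts that have fallen out of the sliding window.
        while self.restarts and now - self.restarts[0] >= self.interval:
            self.restarts.popleft()
        if len(self.restarts) >= self.limit:
            self.stopped = True  # crash loop: give up and stay stopped
            return False
        self.restarts.append(now)
        return True
```

Spacing the crashes out lets the window drain, so a service that fails only occasionally keeps being restarted, while a tight crash loop stops being restarted once it has used up its 30 restarts inside the window.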
[13:02:46] Nuria is working on batching the writes to the db. The corresponding queuing up of events and such could help to improve the robustness,
[13:03:06] but I am not sure how her final design will work ... so I don't know.
[13:03:12] fair enough
[13:04:19] Thanks for taking care of looking at EventLogging and reverting while I was sleeping ;-)
[13:05:06] hehe.. well it turns out i actively broke it, so I figure thanks is not deserved ;-)
[13:05:33] how does the gap filling replay work? is that easy to trigger?
[13:06:10] No, you did not break it. You repaired it!
[13:06:51] To be honest, I have no idea how to backfill. I never did it before. I just know we have all the data in files. I just checked that.
[13:07:35] Not sure if there is prebuilt machinery for filling in the missing data, but worst case, I can just feed it "somehow" into the database consumer.
[14:54:57] Analytics / Refinery: pagecounts-all-sites files for 2014-11-12T21/1H not getting generated automatically - https://bugzilla.wikimedia.org/73369 (christian) NEW p:Unprio s:normal a:None The pagecounts files for 2014-11-12T21/1H did not get automatically generated [1]. What happened? [1]...
[15:20:49] springle, the batching for now only deals with bursts of traffic
[15:21:26] springle: but the same principles can be used to delay writes in the presence of a db error
[15:21:39] springle: if you feel that is a must let us know and we can do it
[15:24:42] qchris: the all events log in the beta EL machine was last hit in aug: 2415 -rw-r--r-- 1 eventlogging eventlogging 146632 Aug 21 20:56 all-events.log-20140822.gz
[15:33:53] kevinator: trying to join
[15:33:59] ok
[16:53:47] Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#21995 (Ottomata) I'm starting the upgrade of analytics1026 now. Process: - schedule downtime in icinga - disable puppet - start udp2log instance running sqstat on stat1002 ```/usr/bin/udp2log --con...
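The batching discussed above, together with the remark that the same principle could delay writes while the database is failing, can be sketched as a writer that keeps events in memory until a batch insert succeeds. The backfill case (feeding events from the fallback logs into the database consumer) then reduces to replaying log lines through the same writer. This is a hypothetical illustration, not the actual EventLogging consumer or Nuria's patch: `BatchingWriter`, `replay`, the `events` table, the `uuid` field and the one-JSON-event-per-line log format are all assumptions, and `sqlite3` stands in for MySQL.

```python
import json
import sqlite3


class BatchingWriter:
    """Buffer events and flush them to the database in batches.

    Hypothetical sketch: if the insert fails, the batch stays in the
    buffer and is retried on the next flush instead of being dropped.
    """

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []

    def add(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Try to write the whole buffer; return True on success."""
        if not self.buffer:
            return True
        rows = [(e["uuid"], json.dumps(e)) for e in self.buffer]
        try:
            with self.conn:  # commits on success, rolls back on error
                # OR IGNORE (SQLite syntax) skips events already stored,
                # so replaying a fallback log does not fail on duplicates.
                self.conn.executemany(
                    "INSERT OR IGNORE INTO events (uuid, body) VALUES (?, ?)",
                    rows,
                )
        except sqlite3.Error:
            return False  # keep the buffer; retry on a later flush
        self.buffer.clear()
        return True


def replay(lines, writer):
    """Backfill: feed JSON lines from a fallback log into the writer."""
    for line in lines:
        line = line.strip()
        if line:
            writer.add(json.loads(line))
    return writer.flush()
```

MySQL's closest equivalent to `INSERT OR IGNORE` is `INSERT IGNORE`. A real implementation would also need to bound the buffer, so that a long database outage cannot exhaust the consumer's memory.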
[17:01:01] hmm, i think i like phabricator.
[17:10:18] Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200 (chasemp)
[17:17:34] nuria__: i'll try my best to review the batching patch today
[17:17:40] sorry for the delay
[17:26:03] ori: thank youuuuuu
[17:26:18] ori: let's talk if things are not clear so we can iterate faster
[17:44:52] Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#22043 (Ottomata) analytics1026 is done.
[17:50:14] Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#22056 (Ottomata)
[17:55:29] Ironholds: stat1002 again free to use?
[17:55:40] qchris, yep, sorry. Should've sent update >.>
[17:55:49] no worries. Thanks!
[19:13:40] Analytics / Refinery: pagecounts-all-sites files for 2014-11-12T21/1H not getting generated automatically - https://bugzilla.wikimedia.org/73369#c1 (christian) NEW>RESO/FIX The corresponding partitions did not show errors, but an Oozie job (that is responsible for marking partitions as successful...
[19:34:41] Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#22071 (Ottomata) analytics1003 is done.
[19:47:28] yay!
[19:47:34] ottomata, the cluster is on Trusty?
[19:47:40] ha
[19:47:41] not yet!
[19:47:44] aw
[19:47:46] but i've started the process
[19:49:52] Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#22072 (Ottomata) Going to start on Zookeepers now: analytics1023,1024,1025. Will do them in that order, one at a time. Process is simple, just do the upgrade and reboot.
[19:53:29] ottomata, gotcha. Sending out an email?
[19:53:41] I'll let you know when it is done, i'm doing the easy pieces now
[19:53:54] the pieces that you don't actually care about :)
[19:55:33] totally!
[19:55:44] but the "reboot" bit may be something cluster users should know before they launch a query ;p
[19:55:59] jaja, sure, i'm not doing any disruptive parts right now
[19:56:08] I'll schedule the other parts
[20:36:46] Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#22089 (Ottomata) I just finished analytics1023, but accidentally upgraded from zookeeper 3.3.5 -> 3.4.5 in the process. I had forgotten that we don't use the CDH zookeeper package, but the ones from...
[21:08:09] qchris_away: should I merge the /srv/log/eventlogging change?
[21:12:04] (CR) Ottomata: [C: 2] Document that filter drops desktop site of wikimediafoundation.org [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/170673 (owner: QChris)
[21:15:30] (Abandoned) Ottomata: [WIP] kraken-hive UDFs for parsing user agent strings. [analytics/kraken] - https://gerrit.wikimedia.org/r/96738 (owner: QChris)
[21:19:37] when does millimetric get back from hawaii tobie?
[21:21:36] ottomata: That would be awesome!
[21:21:49] (And if possible also the logrotate thing)
[21:23:21] jdlrobson: milimetric should be back on 2014-11-19
[21:23:35] ottomata: Thanks!
[21:23:50] thanks qchris :)
[21:28:59] I guess now everything that can get merged around eventlogging is merged.
[21:29:02] Thanks ottomata!
[21:29:18] Analytics / EventLogging: Beta setup of event logging not working - https://bugzilla.wikimedia.org/73388 (nuria) NEW p:Unprio s:normal a:None There are no events being logged for event logging in beta since August 22, also looks like EL was restarted about then. host: deployment-eventlogg...
[21:31:04] qchris: this one? yes or no?
[21:31:04] https://gerrit.wikimedia.org/r/#/c/172707/
[21:31:38] i think i need to restart eventlogging for the change to have effect, you want to do that part?
[21:31:41] Oh. You're right.
[21:31:50] That can get merged too.
[21:32:27] Logrotate should pick up the change automatically.
[21:32:38] But for the others, things need to get restarted.
[21:32:44] I'll take care of that. Sure.
[21:34:33] oh, it moved to the template though, right?
[21:34:33] ok
[21:34:52] ah, i think we can abandon that
[21:34:55] since it is a change to the file
[21:34:57] qchris: ? right?
[21:35:00] the template already says 90
[21:35:04] Argh. Sounds right.
[21:36:09] I'll upload a new change, since the template still says 45 for me.
[21:36:30] Analytics / EventLogging: Beta setup of event logging not working - https://bugzilla.wikimedia.org/73388#c1 (nuria) Looks like code deployed there is pretty old: commit 395a1b1a9034ba413b7f9886923e08d734cb2ac7 Author: Ori Livneh Date: Thu May 15 14:59:25 2014 -0700 Check tha...
[21:38:04] oh
[21:38:16] hm
[21:38:28] i thought i saw it at 90
[21:38:31] weird. uh, ok
[21:41:21] https://gerrit.wikimedia.org/r/#/c/172707/4/modules/eventlogging/templates/logrotate.erb
[22:09:01] Analytics / EventLogging: Beta setup of event logging not working - https://bugzilla.wikimedia.org/73388#c2 (nuria) I updated the code but I think the machine needs a reboot or at least, a re-start of all processes of EL for it to work properly. Looks like harshar is the owner, so I will let him know.
[22:09:17] YuviPanda: holaaaaa
[22:10:56] nuria__: 'sup
[22:10:58] only partly here
[23:02:16] hey guys, can anyone help me with setting up limn locally? I've followed the installation instructions, but when I run 'npm start' I get the following error: https://gist.github.com/anonymous/6c9c0c7d00ddf10810cc