[00:23:27] nuria, analytics-store/EventLogging Q: what's the normal delay on EL events getting into there? [00:23:43] I'm not seeing anything past 10pm UTC in some of our tables and I'm trying to validate the launch of an A/B test worked [00:23:50] Ironholds: it depends, we were seeing quite a lag as of recent [00:23:55] *nod* [00:25:40] Ironholds: and some tables have more lag than others, the mobile and edit tables are huge and are seeing quite a bit of delay, now, all that data should be in hadoop [00:26:16] yeah, this is WikipediaPortal_14377354 [00:26:19] not tremendously big (afaik) [00:27:04] Ironholds: give me 5 mins and i can look at that a bit [00:27:41] nuria, thanks so much! [00:30:57] Ironholds: it's going to be 15 mins [00:32:47] nuria, np :) [00:43:37] Ironholds: I see max timestamp is 20151208002407 [00:44:05] madhuvishy, yeah, looks like it's come through now :). Thanks for pointing me to it! [00:44:06] that seems right? [00:44:08] (CC nuria ) [00:44:20] Ironholds: np :) [00:44:24] yeah, previously it was 20151207220703 until, well, some time after 10mins ago [00:44:39] aaah [00:44:47] Ironholds: nice, i was working on EL also but on labs, thanks to madhuvishy [00:47:47] thanks madhuvishy :) [00:50:35] Analytics-Kanban: {loon} Refactor Data Dumps - https://phabricator.wikimedia.org/T117141#1861094 (kevinator) [00:52:22] Analytics-Backlog, DBA: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1861101 (Nuria) Ahem ... just tried this: 'mysql_engine': 'TokuDB' on eventlogging on labs and no, table did not get created. But I am not su...
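For readers spot-checking EventLogging replication lag the way Ironholds does above (comparing a table's max timestamp against the clock), here is a minimal sketch. The helper name and the use of Python are illustrative, not part of the EventLogging tooling; the timestamps are the MediaWiki-style 14-digit UTC strings quoted in the log.

```python
from datetime import datetime

def el_lag_minutes(max_ts: str, now: datetime) -> float:
    """Minutes between an EventLogging MediaWiki-style timestamp
    (YYYYMMDDHHMMSS) and `now`, both assumed to be UTC."""
    latest = datetime.strptime(max_ts, "%Y%m%d%H%M%S")
    return (now - latest).total_seconds() / 60

# The two values from the conversation: before and after the backfill.
now = datetime(2015, 12, 8, 0, 44)
print(round(el_lag_minutes("20151207220703", now)))  # ~157 minutes behind
print(round(el_lag_minutes("20151208002407", now)))  # ~20 minutes behind
```

This matches the conversation: the table looked stuck at 22:07 UTC until the consumer caught up minutes before the check.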
[00:53:10] Analytics-Engineering, Wikimedia-Mailing-lists: home page for the analytics mailing list should link to gmane - https://phabricator.wikimedia.org/T116740#1861103 (Dzahn) Analytics list run by kleduc at wikimedia.org, mforns at wikimedia.org [10:56:20] Analytics-Backlog, Discovery, WMDE-Analytics-Engineering, Wikidata, and 3 others: Add firewall exception to get to wdqs*:8888 from analytics cluster - https://phabricator.wikimedia.org/T120010#1861764 (akosiaris) Open>Resolved a:akosiaris ACLs updated. Just tested it from `stat1003` and it... [11:14:36] Analytics-Backlog, Discovery, WMDE-Analytics-Engineering, Wikidata, and 3 others: Add firewall exception to get to wdqs*:8888 from analytics cluster - https://phabricator.wikimedia.org/T120010#1861805 (Addshore) Many thanks! :) [13:46:47] morning a-team [13:46:58] Hi ottomata [14:49:19] nuria: hiyaaaa [14:56:53] halfak: I'm overjoyed at the press coverage you're getting, especially that video [14:57:33] I think it's all really good timing too, because we're just talking about strategy and all that [14:58:05] video? [15:11:06] hey a-team, i'm going to deploy the service EL refactor code to eventlog1001 now [15:11:17] we merged it in yesterday, and its been running in beta for a while now [15:11:37] sounds ok to me, but I'm a little hesitant because we haven't figured out the slowness problem yet [15:11:43] slowness problem? [15:11:46] ottomata: service EL refactor code --> Please rephrase :) [15:11:46] oh with the db? 
[15:11:49] yea [15:11:55] btw, this is the video: http://shows.howstuffworks.com/now/wikipedia-trolls-video.htm [15:11:59] joal: the eventbus work i've been doing [15:12:14] ottomata: I had guessed so :) [15:12:15] changed some EL internals to support it, mostly around abstracting metadata [15:12:29] thanks ottomata, a lot clearer :) [15:13:06] ottomata: fine by me :) [15:31:17] !log deployed new eventlogging code, restarting eventlogging [15:31:39] ottomata: is /etc/mysql/conf.d/analytics-research-client.cnf deliberately not on stat1003? (and only on 1002?) [15:31:51] yes [15:32:09] on stat1002 you use ../research-client.cnf [15:32:16] but you have to be in the researchers group...i think [15:32:27] ahh, yes, I can't read that file [15:32:56] addshore: there is an annoying technical reason for that [15:33:09] but the same creds that are in the stat1002 analytics-research-file will work from stat1003 [15:33:19] that's okay! I'll just keep doing my stuff on 1002 :) [15:33:22] ok [15:33:47] just of course stat1003 is less loaded so I thought I might as well do stuff there ;) but it's not like 1002 doesn't have capacity etc :) [15:35:01] addshore: the same creds will work from stat1003 [15:35:09] it's just that the file isn't there [15:35:32] but then I can't just point my script at the file ;) [15:35:36] ja [15:35:48] if you are automating something, ya then not ideal [15:46:17] ottomata: holaaa [15:46:42] hiya!
[15:47:07] milimetric: i think slowness is likely due to: https://grafana.wikimedia.org/dashboard/db/server-board?panelId=17&fullscreen&from=1440501127031&to=1448280487032&var-server=db1046&var-network=eth0 [15:47:23] milimetric: we should focus on addressing capacity [15:47:52] milimetric: we tried to create tokudb tables yesterday with little success [15:47:53] yeah, nuria, I kind of disengaged from that discussion as it was getting really confusing to me [15:48:18] I'm not sure what kind of storage engine allows you to delete data without saving space [15:48:22] milimetric: ya, we need to talk to jaime cause from his ticket it was not clear to me what we needed to do, now i get it [15:48:26] nuria: am working on a thing, will help you with tokudb in beta after [15:48:38] jaime did finish the upgrade to the version of mysql we need [15:48:55] nuria: if you want, see if you can manually create a tokudb table with just mysql client [15:48:59] make sure that works there i guess [15:49:02] mysql doesn't give up disk space from deleting internally held data :) afaik [15:49:24] my opinion of that database was already bad... like, this is just insane though [15:50:02] ottomata: will try that, i tried creating one through the app yesterday, give me a sec to send e-scrum cause today i cannot attend [15:50:19] milimetric: it's an "oh ffs" moment for everyone at some point :) [15:50:46] thing is, I've already had at least two of those with mysql. No exaggeration [15:51:48] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [30.0] [15:52:28] HMM [15:52:32] everything looks fine though... [15:52:32] hmmm [15:52:57] milimetric: ha same post I read like 2 years ago https://www.percona.com/blog/2013/09/25/how-to-reclaim-space-in-innodb-when-innodb_file_per_table-is-on/ first result [15:53:21] OOok i definitely broke things [15:53:24] whaa [15:53:29] on it...
[15:54:16] ottomata: rollback rather than fixing right? [15:54:22] chasemp: yeah, I had read that before when freeing up space on beta, for I guess the same reason. [15:54:41] I just refused to believe that it was unavoidable in a production environment [15:54:55] like, I figured I was just dumb and didn't understand how to config it properly [15:55:50] ottomata: do you want another pair of eyes on EL? [15:56:18] sure, hang on, not sure what is wrong yet [15:56:28] looking in logs on eventlog1001 [15:56:34] i was when i deployed too [15:56:38] everything looked fine... [16:01:35] ok nuria rolling back [16:01:47] k [16:04:16] !log rolling back eventlogging to previously deployed version [16:05:07] oook [16:05:10] looks better now [16:05:15] things flowing through [16:05:17] and backfilling [16:05:42] inserts slow on mysqldb [16:05:43] i think [16:05:51] 2015-12-08 16:05:44,768 (Thread-15 ) Inserted 4845 MobileWebUIClickTracking_10742159 events in 21.089853 seconds [16:05:52] but its going [16:05:56] hm. 
[16:06:01] ding darnit [16:06:06] this worked for days on beta [16:06:09] sigh :p [16:22:18] ottomata: it's called scaling [16:22:47] haha [16:22:49] ottomata: do not feel bad, one thing is functional correctness on a test environment, another is working at prod scale [16:22:50] uh huh, i don't think that's it [16:22:56] i think something else is funky [16:23:15] the errors were about pykafka not being able to create extra consumers, it looks like it was trying to start more consumers than partitions [16:23:20] but, that may have been a side effect [16:23:38] hm, maybe you are right though, i don't think any of the topics in beta have more than 1 partition [16:23:42] will test that there [16:34:18] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 20.00% above the threshold [20.0] [16:35:05] Analytics-Tech-community-metrics, DevRel-December-2015, Easy, Google-Code-In-2015, Patch-For-Review: Clarify Demographics definitions on korma (Attracted vs. time served; retained) - https://phabricator.wikimedia.org/T97117#1862466 (Aklapper) Setting #Patch-For-Review as per https://github.com/Bi... [16:47:17] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0] [16:48:04] ottomata: i changed EL on beta yesterday for the toku db [16:48:16] ottomata: FYI so you can undo my changes [16:48:29] aye, just did :) [16:48:42] a-team: I am going to Tuesday's technology meeting, my e-scrum is on e-mail. [16:49:08] thx nuria, happy technologizing [16:49:35] Analytics-Backlog: Make eventlogging use jessie + systemd - https://phabricator.wikimedia.org/T120840#1862485 (Nuria) NEW [16:50:28] ottomata: still mysql consumer on prod not working, or is taht an alarm from before? [16:50:30] *that [16:50:53] nuria: https://phabricator.wikimedia.org/T114199 [16:51:02] it's not working? [16:51:08] it looks like it's working, watcha mean?
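The pykafka errors discussed above come down to a general Kafka rule: within a consumer group, each partition is consumed by at most one group member, so any consumers beyond the partition count end up with nothing to do. A rough sketch of that rule (not pykafka's actual balancing code):

```python
def assign_partitions(partitions, consumers):
    """Round-robin partition assignment within a consumer group.
    Consumers beyond the partition count receive no partitions and
    sit idle, which is roughly what pykafka was complaining about."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# One partition (as on the beta topics) but several consumer processes:
print(assign_partitions([0], ["proc-0", "proc-1", "proc-2"]))
# → {'proc-0': [0], 'proc-1': [], 'proc-2': []}
```

This is why the bug never showed up in beta: with single-partition topics there, the extra-consumer condition was never hit at scale.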
[16:51:11] oh [16:51:13] that alarm... [16:51:14] hm [16:51:24] nuria: that may be a consequence of slow backfilling? [16:51:48] yeah, inserted.rate is jumpy now [16:51:52] like it was when it backfilled before [16:52:14] ottomata: hugely SLOw then, 10 per sec is 1 order of magnitude less than what it should be, right? [16:52:45] nuria: i think it's the windowing, but yes [16:53:03] it pauses for a few minutes while it inserts? [16:53:04] hm [16:53:08] Analytics-Backlog: Make eventlogging use jessie + systemd - https://phabricator.wikimedia.org/T120840#1862502 (Nuria) [16:53:09] Analytics-Backlog, Analytics-EventLogging: Upgrade eventlogging servers to Jessie - https://phabricator.wikimedia.org/T114199#1862504 (Nuria) [16:53:15] Analytics-Backlog, Analytics-EventLogging: Upgrade eventlogging servers to Jessie - https://phabricator.wikimedia.org/T114199#1687834 (Nuria) [16:53:38] ottomata: taht sounds real fishy [16:53:41] *that [16:54:04] nuria: i think it is caught up, based on looking at logs, i think once window passes alert will right itself [16:54:22] not sure though, inserted.rate looks jumpy still [16:55:45] Analytics-Backlog, DBA: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1862512 (Nuria) a:Milimetric>Nuria [16:57:49] a-team, my wife is late, I need to babysit, I'll be late for standup :( [16:57:54] Sorry:( [16:58:01] np joal [16:58:08] joal: np [16:58:32] joal: you can send e-scrum if you cannot attend, that is fine.
[16:58:43] at a later time [16:59:36] Analytics-Kanban: Create fake data for beta AQS deployment - https://phabricator.wikimedia.org/T120841#1862546 (Milimetric) NEW a:Milimetric [16:59:47] Analytics-Kanban: Create fake data for beta AQS deployment - https://phabricator.wikimedia.org/T120841#1862557 (Milimetric) [16:59:48] Analytics-Backlog, Beta-Cluster-Infrastructure, Services, Patch-For-Review, and 2 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1862556 (Milimetric) [17:26:26] Analytics-Kanban: Create fake data for beta AQS deployment - https://phabricator.wikimedia.org/T120841#1862600 (Nuria) Does this deployment to beta include RESTBase as well? We need both. [17:30:32] !log restarting eventlogging with new code on eventlog1001 (again) (I fixed a config problem) [17:32:59] grrr [17:33:02] still not working... [17:38:39] PROBLEM - Check status of defined EventLogging jobs on eventlog1001 is CRITICAL: CRITICAL: Stopped EventLogging jobs: consumer/server-side-events-log consumer/mysql-m4-master-03 consumer/mysql-m4-master-02 consumer/mysql-m4-master-01 consumer/mysql-m4-master-00 consumer/client-side-events-log consumer/all-events-log processor/server-side-0 processor/client-side-11 processor/client-side-10 processor/client-side-09 processor/client-side- [17:38:43] that's me [17:39:30] Analytics-Backlog: Quaterly review 2016/22 - https://phabricator.wikimedia.org/T120844#1862635 (Nuria) NEW [17:39:48] ACKNOWLEDGEMENT - Check status of defined EventLogging jobs on eventlog1001 is CRITICAL: CRITICAL: Stopped EventLogging jobs: consumer/server-side-events-log consumer/mysql-m4-master-03 consumer/mysql-m4-master-02 consumer/mysql-m4-master-01 consumer/mysql-m4-master-00 consumer/client-side-events-log consumer/all-events-log processor/server-side-0 processor/client-side-11 processor/client-side-10 processor/client-side-09 processor/clie [17:39:50] ACKNOWLEDGEMENT - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: 
CRITICAL: 20.00% of data under the critical threshold [10.0] ottomata attempting deploy [17:40:10] Analytics-Backlog: Gather preliminary metrics of Pageview API usage for quaterly review - https://phabricator.wikimedia.org/T120845#1862643 (Nuria) NEW [17:47:09] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 46.67% of data above the critical threshold [30.0] [17:48:40] fixed it. [17:48:42] k [17:48:47] makes sense why we never saw this in beta... [17:49:07] RECOVERY - Check status of defined EventLogging jobs on eventlog1001 is OK: OK: All defined EventLogging jobs are runnning. [17:51:46] madhuvishy: burrow works so well now! [17:56:29] ottomata: let's sync up when i am out of meeting [17:56:31] k [17:57:06] nuria, one forgotten config change: https://gerrit.wikimedia.org/r/#/c/257644/ [17:57:06] and one bug: https://gerrit.wikimedia.org/r/#/c/257652/ [17:57:44] phew, ok! lunchtime. [18:01:01] ottomata: waitttt [18:01:12] ottomata: well lemme look at your changes [18:01:57] ottomata: i see another consumer lag email - is everything good now? [18:06:47] RECOVERY - Overall insertion rate from MySQL consumer on graphite1001 is OK: OK: Less than 20.00% under the threshold [100.0] [18:11:42] ottomata: ok, i see, rather than scaling, configuration [18:12:08] ottomata: "rather than having scaling issues we were having configuration ones", some things never change ... [18:14:55] Analytics-Kanban: Create fake data for beta AQS deployment - https://phabricator.wikimedia.org/T120841#1862797 (mobrovac) >>! In T120841#1862600, @Nuria wrote: > Does this deployment to beta include RESTBase as well? We need both. @Nuria, the blocked task aims at setting AQS entirely up in beta, which inclu...
[18:18:07] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 20.00% above the threshold [20.0] [18:35:30] yeah nuria back from lunch if you wanna sync up [18:36:53] Analytics-Backlog: Send burrow lag statistics to statsd/graphite - https://phabricator.wikimedia.org/T120852#1862859 (Ottomata) NEW [18:37:56] madhuvishy: i think all is well [18:38:03] ottomata: awesome [18:39:09] Analytics-Kanban: Create fake data for beta AQS deployment - https://phabricator.wikimedia.org/T120841#1862886 (Nuria) Nice! thanks for clarifying [18:45:32] ottomata: ok, saw changes, one was just a string causing a runtime error due to 2 params needed instead of 1 [18:45:59] ottomata: the other was due to puppet config... so things were working on beta cause we do not use puppetization of kafka right? [18:46:18] we do [18:46:20] you mean [18:46:23] puppetization of eventlogging [18:46:28] previously there was an if labs do this conditional [18:46:37] that i forgot needed removed when we deployed this to prod [18:46:54] ottomata: ah, ok [18:47:24] nuria, ja, and we didn't see that error in beta because likely that log message never happened [18:47:32] since i doubt the blacklisted events even flow through beta [18:49:04] ottomata: they do not [18:49:08] ottomata: cause they are FR [18:50:49] ottomata: i need to read with more care for sure. got it [18:51:09] ottomata: well, now monitor logs and throughput and errors for the whole day right? [18:51:16] ottomata: i will do some tailing [18:51:38] ja [18:51:44] ottomata: can we do the toku db stuff in like 30 ins? [18:51:47] *mins [18:58:54] Analytics-Engineering, Community-Tech, Community-Tech-fixes: Add page view statistics to page information pages (action=info) [AOI] - https://phabricator.wikimedia.org/T110147#1863012 (DannyH) [18:59:16] yes [18:59:19] that's perfect nuria [19:01:00] madhuvishy: just curious, have you had a chance to look into this at all?
[19:02:02] ottomata: look into what? [19:02:14] oops [19:02:14] https://phabricator.wikimedia.org/T118869 [19:03:01] ottomata: aah yes i want to work on this every day but i get hung up on some other wikimetrics thingy - today for sure later in the day? :) [19:03:33] i don't think there are existing solutions though - we probably have to hook into the setup methods and send data [19:03:50] hehe, ok [19:04:01] aye, yeah i was looking at some tornado code, and I think you are right [19:04:13] there is an application log_request() method that at least has response codes [19:04:16] that it might be ok to override [19:05:09] ah okay, i saw some example code on github - https://github.com/bassdread/tornado-statsd-example/blob/master/app.py - it adds it to prepare and on_finish [19:05:25] madhuvishy: ? http://sprocketsmixinsstatsd.readthedocs.org/en/latest/api.html [19:07:03] ottomata: sprockets is a library on top of tornado? [19:07:59] i guess? [19:08:03] https://github.com/sprockets [19:08:07] it seems to do similar things - initiate on 'prepare', write on 'on_finish' [19:09:16] guess we can do it ourselves without the dependency - i'll test it today - this is deployed on beta right?
i can test from there to labs statsd [19:09:43] yes [19:09:58] madhuvishy: actually, it's running on deployment-eventlogging04 on beta [19:09:59] not on 03 [19:10:08] ottomata: ok cool [19:11:58] ja madhuvishy either way [19:12:00] this looks pretty simple [19:12:00] https://github.com/sprockets/sprockets.mixins.statsd/blob/master/sprockets/mixins/statsd/__init__.py [19:12:03] could just take the code there [19:12:40] ya alright [19:13:04] or we could just package up the mixin...it looks kinda nice actually [19:13:18] there's more to it in the sprockets.clients.statsd package [19:13:21] either way, dunno [19:20:21] ottomata: ah okay i'll try to use the library directly first then [19:23:52] wikimedia/mediawiki-extensions-EventLogging#516 (wmf/1.27.0-wmf.8 - 706e521 : Tyler Cipriani): The build has errored. [19:23:52] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/706e521347a2 [19:23:52] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/95647583 [19:28:04] J-Mo: re. global metrics, is bytes contributed only ns 0 or all namespaces? [19:28:16] (or anyone here who may happen to know) [19:29:38] not sure, harej. ^milimetric? [19:32:36] ottomata: back, should we tackle toku? [19:34:25] yes! [19:34:25] okkkkkk [19:34:26] step 1, let's make a toku table directly [19:34:37] k, lemme try [19:34:48] k [19:37:10] nuria, create toku is fine for me [19:37:28] show create table PageMove_7495717_TOKU; [19:37:36] hmm ENGINE=Aria [19:37:38] is that normal? [19:38:31] ottomata: my internet is #^%^%#, labs is super slooow [19:39:06] oh, also, it appears my global metrics python wrapper works [19:39:26] https://github.com/harej/globalmetrics/blob/master/globalmetrics.py [19:39:51] you give it data and it returns a GlobalMetrics object. [19:39:55] ottomata: [19:39:57] ahem: [19:39:59] https://www.irccloud.com/pastebin/sCvm9FNC/ [19:41:24] hmm [19:41:26] Aria for both? [19:41:28] what is aria?
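The prepare/on_finish timing pattern madhuvishy and ottomata discuss above can be sketched without the tornado or sprockets dependency. `StatsdTimingMixin`, `FakeHandler`, and the `emit` hook below are hypothetical stand-ins for illustration, not the sprockets API: the mixin records a start time when the request begins and emits an elapsed-time metric when the request finishes.

```python
import time

class StatsdTimingMixin:
    """Sketch of the prepare/on_finish pattern (as in
    sprockets.mixins.statsd); `emit` stands in for a real statsd
    client and the handler base class is hypothetical."""

    def prepare(self):
        # Called at the start of a request.
        self._start = time.monotonic()

    def on_finish(self):
        # Called after the response is sent; report elapsed time
        # keyed by handler class and response status code.
        elapsed_ms = (time.monotonic() - self._start) * 1000.0
        self.emit(f"timers.{type(self).__name__}.{self.status_code}", elapsed_ms)

class FakeHandler(StatsdTimingMixin):
    """Minimal stand-in for a tornado RequestHandler."""
    status_code = 200

    def __init__(self):
        self.sent = []

    def emit(self, metric, value):
        self.sent.append((metric, value))

h = FakeHandler()
h.prepare()
h.on_finish()
print(h.sent[0][0])  # → timers.FakeHandler.200
```

In real tornado code the mixin would be combined with `RequestHandler` via multiple inheritance, which is exactly how the sprockets mixin is meant to be used.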
[19:41:43] ja ok, so that is not right [19:42:40] 2 warnings? [19:42:41] hmmm [19:42:51] where be da warnings...?! [19:45:54] ottomata: aria is mariadb looks like [19:46:10] so the engine="blah" is not taking effect [19:46:35] aye [19:46:41] create table toku_test2 (testing varchar(20)) ENGINE=TokuDB; [19:46:44] would like to know what those warnings are [19:46:49] but they don't seem to go to any logs [19:47:13] do we call jaime? [19:47:25] let me try this on an EL slave [19:49:42] ottomata: same thing works on slave: [19:49:45] nuria: sorry, hang on, fixing something for urandom [19:49:49] https://www.irccloud.com/pastebin/xw6ayJWx/ [19:49:59] ottomata: k [19:53:43] harej: I'm a little lost. I didn't know https://github.com/harej/globalmetrics/blob/master/globalmetrics.py was being developed... [19:54:46] Well, it *wasn't*, until around last week. [19:55:34] (also, I got an answer to my question.) [19:57:49] harej: i've been working on it too! [19:58:12] as part of wikimetrics though [19:58:21] yours is probably better than mine :) [19:58:21] mine is deliberately not a part of wikimetrics [19:59:48] harej: hmmm, that was the plan for our thing too I think - to have a public api - since the metrics are already defined well in wikimetrics we were using it. It's currently here - https://metrics-staging.wmflabs.org/reports/program-metrics/create/ [20:00:44] looks like i did something to it though [20:02:04] ok sorry nuria back [20:02:14] sooo, hm it looks like tokudb isn't fully avail, looking [20:06:00] nuria: am going to restart mysql server [20:06:04] in beta [20:06:07] ottomata: k [20:06:47] haha where the crap do logs go!? [20:08:28] ah, it was disabled.. [20:08:33] 151208 20:08:24 [ERROR] Plugin 'TokuDB' init function returned error.!
[20:09:08] there we goOoO [20:09:09] lets see [20:09:26] ok nuria, try now [20:10:08] ottomata: now working, let's try to create it through EL again [20:10:16] ottomata: let me do changes in place [20:10:30] k [20:10:31] go ahead [20:11:04] milimetric: around? [20:12:05] madhuvishy: yeah [20:16:39] ottomata: i modified files on /usr/local/lib/python2.7/dist-packages/eventlogging-0.9_20151207-py2.7.egg/eventlogging [20:16:40] milimetric: staging redis seems to be down [20:16:53] https://www.irccloud.com/pastebin/HffAVXQE/ [20:17:09] weird, checking [20:17:10] nuria: ok [20:20:49] nuria: how's it look? [20:21:05] ottomata: mmm i cannot create a table with either engine [20:21:15] ottomata: but yesterday i did it np with innodb [20:21:20] ottomata: let me redeploy [20:21:31] ottomata: how did you redeploy your code, running puppet? [20:21:37] madhuvishy: looks like something happened to mess up the permissions of the RDB file [20:22:02] milimetric: hmmm after i deployed my changes yesterday - everything was fine [20:22:09] not sure what transpired in between [20:22:14] yep: /srv/redis/wikimetrics-staging1-6379.rdb [20:22:19] weird, yea [20:22:27] nuria: i deployed from deployment-bastion [20:22:35] and then did sudo python setup.py install [20:22:43] nuria: did you restart eventlogging when you did that? [20:22:48] ottomata: yes [20:22:51] seems to me like puppet messed this up somehow [20:22:52] ottomata: to beta labs [20:23:04] maybe the /srv mount got unmounted and remounted or something... [20:23:09] it seems to me this would be broken in prod too [20:23:10] hm [20:23:18] milimetric: oh [20:24:12] milimetric: i think prod is fine - i can run a report [20:24:50] ok, madhuvishy i chowned that file wanna try it in staging again? 
[20:25:04] looks like it's not throwing an error in the log anymore [20:25:36] ottomata: question [20:25:43] ottomata: all python packages are deployed to: [20:25:46] https://www.irccloud.com/pastebin/pOh0knbB/ [20:26:18] ottomata: or does it run from elsewhere? [20:27:32] afaik, but i'm not sure [20:27:43] nuria, just edit the code in /srv/deployment... and then run sudo python setup.py install after [20:28:30] ottomata1: i just did that [20:29:38] ottomata1: but i am wondering how do we know to run from the newest package in: /usr/local/lib/python2.7/dist-packages [20:30:03] ? [20:30:11] oh! there are bunches in there [20:30:12] no idea. [20:30:19] i really dislike this global install [20:30:27] for the systemd stuff i'm doing for eventlogging-service [20:30:29] i've ditched it [20:30:34] code is run directly out of deploy dir [20:31:04] nuria [20:31:09] it's in the installed bin wrapper [20:31:15] cat /usr/local/bin/eventlogging-consumer [20:31:25] pkg_resources.run_script('eventlogging==0.9-20151208', 'eventlogging-consumer') [20:31:54] right [20:32:05] madhuvishy: I ran a report on staging, looked ok, seems like everything's working [20:35:32] milimetric: ah okay thanks, i had to step out to get the door sorry [20:35:58] np [20:40:49] ottomata: so now no tables are being created at all [20:41:17] when changing that config in jrm.py? [20:41:59] no, without changing anything [20:42:35] let me see again if it is taking a while [20:42:55] ahhh no i take it back it is just slow, so with innodb it works [20:43:14] ottomata: trying new config now [20:43:19] k [20:51:18] ottomata: success [20:51:21] https://www.irccloud.com/pastebin/keHDM291/ [20:51:27] yeehaw [20:51:29] ottomata: just took a while [20:51:36] so that little change in jrm.py does it? [20:51:43] yes [20:51:50] what's nice, is it looks like it falls back to InnoDB or whatever the default is if Toku is not avail, eh? [20:51:53] as you must have set up the compression right?
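The jrm.py change being tested above boils down to appending an ENGINE clause to the generated DDL. This is a hypothetical sketch of that idea, not the actual eventlogging code; the fallback behavior matches what the log shows: MariaDB warns on an unavailable engine and silently substitutes the default, which is how the Aria tables appeared earlier.

```python
def create_table_ddl(table, columns, engine="TokuDB"):
    """Build a CREATE TABLE statement with an ENGINE clause.
    Illustrative sketch of the jrm.py-style change: if the named
    engine's plugin is not loaded, MariaDB emits a warning and
    falls back to the server default engine."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return f"CREATE TABLE {table} ({cols}) ENGINE={engine}"

ddl = create_table_ddl("toku_test2", [("testing", "VARCHAR(20)")])
print(ddl)  # → CREATE TABLE toku_test2 (testing VARCHAR(20)) ENGINE=TokuDB
```

The silent fallback is a double-edged sword: it keeps table creation working when TokuDB is unavailable, but it also means the only signal that the engine clause was ignored is a warning that, as the log shows, is easy to lose.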
[20:52:10] that apparently is default for the wmf mariadb package [20:53:04] let me add that change and we can deploy it for real [20:54:49] k [20:57:30] have 30 min meeting, will do it right after, thank youu for your helpppp [21:06:32] np! :) [21:15:13] Hi all! What's your usual practice for exporting distilled data from stat1002 after a scheduled oozie job? [21:15:33] I'd love to see an example of somebody's scripts [21:15:47] ejegg: sometimes we rsync it to datasets.wikimedia.org via a cron there [21:15:55] but that is for public data [21:16:04] right-o [21:16:11] what's this data? where is it now? [21:16:49] This is counts of hits to donatewiki per hour, for combinations of utm_source, utm_campaign, and link_id [21:16:57] basically, email campaign performance stats [21:17:23] It's de-duped by a contact_id param, but that's not part of the export [21:17:43] I wouldn't think the counts need to be secret [21:17:54] but they wouldn't be useful to anyone except the FR team [21:18:40] ejegg: sometimes we use this https://github.com/wikimedia/analytics-refinery/tree/master/oozie/util/archive_job_output [21:18:54] but that just concats and copies files from one directory to another [21:19:02] what format do you want the data in? [21:19:15] Analytics-EventLogging, EventBus, Wikimedia-Logstash: eventlogging syslog message not properly recognized by logstash - https://phabricator.wikimedia.org/T120874#1863786 (hashar) NEW [21:19:39] ottomata: I was planning to just replace \x01 with commas [21:19:48] csv [21:20:01] but I can do that on the consuming side if need be [21:20:33] ejegg: you are inserting into a hive table, right? [21:20:40] and this is just stored as text data anyway? [21:20:50] you can make the table write the data as csv or tsv to begin with [21:21:10] ottomata: Oh right, I should do that! [21:21:35] something like ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' [21:21:42] when you create the table [21:22:00] yeah, that makes sense!
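ejegg's fallback plan above (replacing Hive's default `\x01` field separator with commas on the consuming side) might look like the sketch below; `soh_to_csv` is an illustrative helper name, and using the csv module rather than a bare `str.replace` is what keeps fields that themselves contain commas from corrupting the output.

```python
import csv
import io

def soh_to_csv(lines):
    """Convert Hive text output that uses the default \\x01 (SOH)
    field separator into CSV rows; csv.writer handles quoting for
    fields that contain commas."""
    out = io.StringIO()
    writer = csv.writer(out)
    for line in lines:
        writer.writerow(line.rstrip("\n").split("\x01"))
    return out.getvalue()

rows = ["email_1\x01campaign_a\x01link_3\x0142\n"]
print(soh_to_csv(rows), end="")  # → email_1,campaign_a,link_3,42
```

As ottomata points out, declaring the delimiter in the table DDL avoids this post-processing step entirely.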
[21:22:28] then after updating the counts table, copy that from hdfs to local filesystem [21:22:39] ja [21:22:46] ejegg: you can do that with hdfs dfs -get [21:22:57] or direct cp out of /mnt/hdfs/user/hive/warehouse/ejegg.db/... [21:23:08] but /mnt/hdfs is less reliable [21:23:52] ah, I see it lets me run shell scripts via oozie... [21:23:58] Analytics-Backlog, Wikipedia-iOS-App-Product-Backlog, operations, vm-requests, iOS-5-app-production: Request one server to suport piwik analytics - https://phabricator.wikimedia.org/T116312#1863827 (Nuria) Can we get an update on this? cc @joe We expect no support when it comes to uptime of piw... [21:25:36] I'm still not sure how to get it off the box automatically, without leaving a private key unlocked someplace. Not really an analytics problem, guess I just need unix advice! [21:26:15] Analytics-Kanban, DBA: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1863831 (Nuria) p:Normal>High [21:26:44] ejegg: where do you want to get it? [21:26:49] labs? [21:28:01] ottomata: sure, labs would work [21:28:14] ottomata: need to submit after module pointer update correct? [21:28:26] ottomata: after updating new eventlogging service module [21:28:37] ? [21:28:38] nuria: ? [21:28:47] ottomata: ah sorry [21:28:51] ejegg: https://wikitech.wikimedia.org/wiki/Analytics/FAQ [21:28:54] https://gerrit.wikimedia.org/r/#/c/257738/ [21:29:22] nuria: oh it's all deployed! [21:29:27] you can merge this and deploy at will [21:29:35] +1 [21:30:27] ejegg: does that help? [21:30:32] nuria: am heading out pretty soon... [21:30:42] maybe deploy that tomorrow? [21:31:48] ottomata: maybe? can the rsync run as part of an oozie job? Would I have to o+w the destination directory? [21:32:06] milimetric: can i see what has been written into redis somehow?
[21:32:14] hm, no i wouldn't run that as part of oozie job ejegg [21:32:25] ottomata: is this good info as to how to deploy? https://wikitech.wikimedia.org/wiki/Analytics/EventLogging/How_to#Deploying_EventLogging [21:32:32] milimetric: i'm having difficulty getting this controller test to work [21:32:49] nuria: yes [21:33:04] ejegg: i'd just make oozie do what it does to make the data in hdfs [21:33:10] then, in a cron on stat1002 [21:33:27] run an rsync regularly to sync the dir out of /mnt/hdfs/user/hive/warehouse/ejegg.db/... to labs [21:33:42] ah cool, that makes sense [21:33:44] thanks! [21:33:47] yup! [21:34:04] madhuvishy: wanna chat in the cave? [21:34:04] ottomata: i prefer to deploy this tomorrow morning, does that sound good? [21:34:11] milimetric: sure [21:37:06] yup [21:38:48] laters all! [23:28:15] madhuvishy, milimetric: abit just demoed the wikimetrics UX updates. Awesome work! [23:32:33] fhocutt: thanks :) [23:36:21] madhuvishy: i am looking at wikimetrics, no new pushes since yesterday right? [23:36:36] nuria: no - going to push in a little bit [23:36:50] have some changes post the meeting today [23:43:52] madhuvishy: how long did the last access jobs (the monthly ones) take to run last time? [23:44:06] i don't know how long, but they are finished [23:44:13] the daily ones are up to date too [23:45:01] they are in madhuvishy.last_access_uniques_daily_new and madhuvishy.last_access_uniques_monthly_new [23:45:39] reason for _new is i changed the table's schema and dropped some of the columns like zero and webrequest_source, and added uniques_estimate [23:46:35] code is here - https://gerrit.wikimedia.org/r/#/c/216341/ [23:47:10] daily should have data from oct 22, and monthly should have the nov data [23:47:38] nuria: ^ [23:48:11] madhuvishy: nice! [23:48:34] madhuvishy: you have been updating directly the excel from the table right? [23:48:59] nuria: i haven't looked at the tables yet. which excel?
[23:50:00] madhuvishy: i mean from our 1st runs with nocookie, you put together the excel spreadsheet from data already available in the tables right? [23:50:15] nuria: oh yes i did [23:50:27] that data should be in the old tables. [23:50:31] from May [23:51:17] madhuvishy: i will look at this in detail likely tomorrow after i deploy the changes for tokudb [23:51:46] nuria: ya, i can help then [23:51:56] madhuvishy: in the wikimetrics patch the form accepts usernames separated by commas? [23:52:21] i think it's separated by newline [23:52:28] it'll accept - [23:52:32] cause adding that i get an exception but i can wait to re-test on your newest patch [23:53:23] so [23:53:29] the format it expects is [23:54:06] https://www.irccloud.com/pastebin/cl5bIB2V/ [23:54:31] nuria: so it's understandable that it tried to parse something and it wasn't a valid wiki [23:55:19] madhuvishy: ok, let's talk about your next patch, did you test non-ascii chars? [23:55:33] madhuvishy: encoding in flask is real hard [23:55:45] nuria: no - but this form basically inherits the cohort form [23:55:50] which is tested for all that [23:56:49] madhuvishy: the uploading is but the ui i think still has issues, anyways let's talk after your next patch [23:57:00] okay