[00:00:09] I've built a connector between R and the C++ implementation of ua-parser, and this is a dependency. Womp womp :(
[00:04:39] Ironholds: done
[00:04:54] woah! Speedy :D. Thanks!
[00:07:24] whee, it installs
[00:09:55] ...holy hell
[00:10:30] ?
[00:10:49] I've been switching the Python in this library out for C++ wherever possible and getting 1-4 OOM improvements.
[00:10:58] ..this is a 6 OOM improvement in speed.
[00:11:04] eeeheeeee. God bless you, compiled languages.
[00:11:17] Sure, you segfault and tell me nothing useful when I give you the wrong stuff, but when you work you work /so fast/
[00:53:15] anybody know why there are four eventlogging consumers on m2-master?
[00:54:26] not me. I don't know if anyone else is going to be around until Monday :(
[00:56:08] ori: happen to hear anything about that^ from qchris? maybe related to backfilling?
[00:56:54] springle: no, but that sounds plausible. we could probably verify that by looking at the invocation.. i'll take a look
[00:58:09] springle: yeah, they're being run by qchris and they're reading from a file, so he must be backfilling
[00:58:38] ori: ok, thank you
[06:28:39] YuviPanda: Hi
[07:15:37] Analytics-Quarry: Implement pagination in recent queries page - https://phabricator.wikimedia.org/T75712 (rtnpro) NEW p:Normal a:rtnpro
[07:36:30] (PS1) Rtnpro: Paginate recent queries page. [analytics/quarry/web] - https://gerrit.wikimedia.org/r/175400
[07:36:38] (CR) jenkins-bot: [V: -1] Paginate recent queries page. [analytics/quarry/web] - https://gerrit.wikimedia.org/r/175400 (owner: Rtnpro)
[07:43:02] (PS1) Rtnpro: Flake8 cleanup in web/utils/pagination.py. [analytics/quarry/web] - https://gerrit.wikimedia.org/r/175402
[07:44:49] Analytics-Quarry: Implement pagination in recent queries page - https://phabricator.wikimedia.org/T75712#777545 (rtnpro) Gerrit review: https://gerrit.wikimedia.org/r/#/c/175402
[09:50:47] rtnpro: hey! Just getting off a flight, will look at it shortly.
[10:35:04] YuviPanda, no problem, bro :)
[14:22:24] * halfak looks around for milimetric
[14:22:35] Hmm. Not in the batcave
[15:11:50] Analytics-Features: test #2 - https://phabricator.wikimedia.org/T75742#779041 (kevinator)
[15:31:50] qchris_meeting: invite me to the hangout manually
[15:31:56] i think my hangouts app on my phone will ring
[15:31:56] I'll try
[15:32:31] mili-metric will help me out :-)
[16:03:00] qchris_meeting, pls ping when back )
[16:08:09] YuviPanda, you there?
[16:27:35] rtnpro: hey! yup, got home
[16:27:51] YuviPanda, :)
[16:28:05] YuviPanda, ok, you must be tired
[16:28:09] rtnpro: nah, working :)
[16:28:12] rtnpro: I'm looking at the patch
[16:28:18] YuviPanda, cool +1
[16:28:54] YuviPanda, I had some concerns about using the timestamp during filtering
[16:29:13] oh
[16:29:14] not ids?
[16:29:15] YuviPanda, what if two QueryRun instances have the same timestamp
[16:29:21] they definitely could, yeah
[16:29:25] YuviPanda, so I preferred using ids
[16:29:28] but they won't have the same queryrev ids
[16:29:31] and timestamp
[16:29:37] but I haven't looked at the patch yet
[16:29:54] YuviPanda, agree
[16:30:28] YuviPanda, when you have time, give it a look
[16:30:55] YuviPanda, please feel free to point out any concerns you have with the patch
[16:31:35] YuviPanda, also, I noticed that gerrit failed to update the ticket status in phabricator as it did for the bugzilla ticket last time
[16:31:53] YuviPanda, when I sent the patch for review
[16:32:04] oh
[16:32:09] you have to say Bug: T
[16:32:55] YuviPanda, IIRC, I did that
[16:32:59] oh
[16:33:01] not sure :|
[16:34:23] rtnpro: so jenkins gave you a -1. you would have to amend the same patchset instead of submitting a new one
[16:34:34] rtnpro: so you just modify, do 'git commit --amend' and then it creates a new patch
[16:34:42] rtnpro: git commit shas aren't used in gerrit, the Change-Id is used instead
[16:37:15] YuviPanda, yeah, I will squash the commits into one
[16:37:46] rtnpro: cool :) keep the Change-Id the same as the original change
[16:38:02] YuviPanda, ok
[16:46:23] yurikR: Back.
[16:46:25] What's up?
[16:47:00] Ah. Ok. I see that you'll email me :-)
[16:47:09] qchris, ok, basically - i have another weird one, i will email its number to you, and i need to figure out why it's different
[16:47:18] the data in limn does not match our calc
[16:47:51] the problem is that dan MUST have the same numbers
[16:47:59] so somehow i need to generate comparable results :(
[16:48:11] so i need to figure out what the difference in counting is :(
[16:48:21] one issue - the 10.xxx is understood, and can be accounted for
[16:48:38] will email you the number, let's pick a recent day, and see how it differs
[16:49:11] The pageview definition that you use is different from the one the limn graphs use. I do not think that you can get comparable numbers without using the same definition in both places :-(
[16:49:28] But!
[16:49:47] The pageview definition used for the limn graphs is known to be broken and off.
[16:50:02] So is the webstatscollector pageview definition (which you based your counts on).
[16:50:44] Anyways ... I'll have a look at your email. Hopefully there is a similarly simple explanation.
[16:56:22] rtnpro: getting dragged off for food, sorry. Will do when I'm back
[17:05:37] ok i need to go figure out a power cord solution, back asap
[17:06:01] YuviPanda, np
[17:30:17] (PS2) Rtnpro: Paginate recent queries page. [analytics/quarry/web] - https://gerrit.wikimedia.org/r/175400
[17:30:20] (PS1) Milimetric: [WIP] Break down pageview metric [analytics/dashiki] - https://gerrit.wikimedia.org/r/175458
[17:30:29] nuria__: very rough draft, but at least it's working ^
[17:30:50] i have to fix the tests and finish some thoughts but today's all meetings
[17:32:02] YuviPanda, updated https://gerrit.wikimedia.org/r/#/c/175400/
[17:55:36] milimetric: is the changeset published? man .. i do not see it
[17:56:19] if you click on the link above my ping?
[17:56:26] https://gerrit.wikimedia.org/r/#/c/175458/
[17:56:48] i can add you, but if you don't see changes unless I add you, there's a problem
[18:01:14] milimetric: all good now, i was not signed in ..... argh
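The id-versus-timestamp point from the 16:29 exchange above is keyset pagination: page on the primary key, which is unique, rather than on the timestamp, which two QueryRun rows can share. A minimal sketch, assuming an SQLAlchemy-style model; the model and column names are illustrative, not Quarry's actual code:

```python
# Keyset pagination sketch -- model and column names are illustrative,
# not taken from Quarry's actual codebase.
from sqlalchemy import Column, DateTime, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class QueryRun(Base):
    __tablename__ = 'query_run'
    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime)

def recent_runs(session, before_id=None, limit=25):
    """Return one page of runs, newest first.

    Paging on the primary key instead of the timestamp avoids skipping
    or duplicating rows when two runs share a timestamp: ids are
    unique, timestamps are not.
    """
    q = session.query(QueryRun).order_by(QueryRun.id.desc())
    if before_id is not None:
        q = q.filter(QueryRun.id < before_id)  # everything older than the last page
    return q.limit(limit).all()

# The id of the last row returned becomes before_id for the next page.
```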
[18:04:28] milimetric: i am going to look at changes with mforns in his dashiki code and will get back to yours right after, i can continue those if you want
[18:10:23] (PS1) Mforns: [WIP] [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/175467 (https://bugzilla.wikimedia.org/68448)
[18:43:55] milimetric: hey! got a minute for PM?
[18:44:04] pm?
[18:44:27] milimetric: Private Message, that is :)
[18:44:56] ha, sure
[18:45:11] Hey y'all, in case my teammates cannot help me, apparently the research user changed on the cluster, how can I haz read access to the eventlogging db?
[18:50:42] ottomata: I'm told I need to be added to the "researcher" group or something equally silly
[18:50:50] It turns out to not be as urgent as I thought, so no worries
[18:51:38] marktraceur: ah, yes, puppet change required. I'll let ottomata handle this one though, there has been some vagueness/weirdness around access requests lately even when they aren't access requests
[18:51:49] marktraceur: that's correct,
[18:51:53] yeah, YuviPanda, this one is weird
[18:52:06] because all this access grants is the ability to read a file
[18:52:27] marktraceur already has stat1002 access
[18:52:34] so this doesn't grant any more shell access to anyone
[18:52:52] and for those users that previously had this pw and already have access to stat1002, i just put them in the researchers group
[18:53:01] because, this isn't escalated access at all
[18:53:07] it just now lets us know who actually has this password
[18:54:47] ottomata: yup. and previously we've just merged those for people we knew already had the password
[18:58:09] Ta, ottomata
[19:13:57] trick question for analytics engineering
[19:14:18] how would you go about designing a hive query that'll always pick a seven-day slice of data, even if it crosses year bounds? ;p
[19:15:52] a specific 7 day slice of data that crosses a year bound?
[19:16:35] the most recent seven days
[19:16:46] Product asked for something to give them session data, see
[19:16:59] (I may just give it for a 31 day slice. It'll be more computationally intensive but less silly.)
[19:17:20] ah, like a sliding 7 day window kinda thing?
[19:17:35] good trick q! hm.
[19:17:51] Ironholds: calculate the days in the last week
[19:17:57] and make your where clause only include those days
[19:18:09] with ORs
[19:18:43] ... (year=2014 AND month=12 AND day=31) OR (year=2015 AND month=1 AND day=1) OR ...
[19:18:50] huh!
[19:19:05] does chaining ORs like that bear a cost?
[19:19:14] I hadn't thought of breaking it down that granularly, but it makes sense.
[19:19:28] i doubt it; in this case you are only telling hive where to look for data
[19:19:43] so, it figures out what to do when it is planning the mr jobs from your query
[19:20:02] it uses that part of the where to figure out how many mappers to launch, and to tell each one where to look
[19:20:16] so, i'd assume that
[19:20:21] year=2014 AND month=12 AND day=31
[19:20:24] would be the same as doing
[19:20:34] (year=2014 AND month=12 AND day=31 AND hour=0) OR (year=2014 AND month=12 AND day=31 AND hour=1) ...
[19:20:35] with the hours specified manually
[19:20:42] thanks :D
[19:20:42] i'm assuming here
[19:20:47] maybe it takes hive longer to plan the job
[19:20:53] but i don't think it should take longer to run the job
[19:22:15] eeexcellent
[20:14:01] (PS2) Milimetric: [WIP] Break down pageview metric [analytics/dashiki] - https://gerrit.wikimedia.org/r/175458
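The partition trick ottomata spells out at 19:17-19:20 is easy to generate rather than hand-write. A minimal sketch, assuming a table partitioned by year/month/day as in the chat; the function name is made up for illustration:

```python
# Build a sliding "last 7 days" partition predicate for Hive, as in the
# discussion above. One clause per day keeps partition pruning working
# even when the window crosses a month or year boundary.
from datetime import date, timedelta

def last_n_days_predicate(n=7, today=None):
    today = today or date.today()
    days = [today - timedelta(days=i) for i in range(1, n + 1)]
    clauses = ["(year={0} AND month={1} AND day={2})".format(d.year, d.month, d.day)
               for d in days]
    return "(" + " OR ".join(clauses) + ")"

# A window crossing a year bound, for AND-ing into a query's WHERE clause:
print(last_n_days_predicate(today=date(2015, 1, 3)))
# -> ((year=2015 AND month=1 AND day=2) OR ... OR (year=2014 AND month=12 AND day=27))
```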
[21:26:44] Hey ottomata. I was getting my homework done on Hadoop and it looks like I found a very straightforward way to do "partitioning" and "sorting" for a reducer in hadoop.
[21:26:44] http://blog.tomhennigan.co.uk/post/46330524717/secondary-sorting-flags-for-hadoop-0-20-2-streaming
[21:26:52] Have you seen this?
[21:26:56] (or something like it)
[21:28:38] This makes me happy because it looks like it mimics the argument structure of unix 'sort'.
[21:29:03] ori, i haven't!
[21:29:13] that looks awesome, you should try it, you know you can set job properties on the CLI, right?
[21:29:20] -Dstream.num.map.output.key.fields=2
[21:29:21] etc.
[21:29:47] i *think* if you start the hive CLI while doing that, it will apply them to the submitted job. or, there might be a way from hive syntax to set job properties
[21:29:54] oops
[21:29:55] i mean
[21:29:58] halfak: i haven't!
[21:30:02] (sorry, talking to ori in the other chatroom)
[21:30:09] No worries.
[21:30:33] I'm building up some code to do a quick test :)
[21:31:47] If this works the way I expect it to, then you wouldn't even need to store the diff in the revision records.
[21:31:53] I could work from the revisions directly.
[21:32:02] Then again, storing the diff wouldn't be terrible either.
[21:39:52] nm
[21:54:48] Then again, storing the diff wouldn't be terrible either. https://gist.github.com/halfak/6d1e7e2191eabeca079c
[21:54:59] Woops. Wrong message.
[21:55:23] ottomata, I'm having trouble rsync'ing from stat1002 to stat1003
[21:55:26] See gist https://gist.github.com/halfak/6d1e7e2191eabeca079c
[21:55:30] Am I doing it wrong?
[21:58:11] stat1002.eqiad.wmnet
[21:58:22] Yeah. I tried that. See the second command in the diff
[22:03:50] hm
[22:04:29] o/ ggellerman
[22:06:02] halfak: weird, i see the problem...
[22:06:10] \o/
[22:06:18] one min..
[22:06:20] kk
[22:08:00] hm, ok
[22:08:58] ok, halfak, i know what is wrong, and it is very annoying, but there is a workaround
[22:08:59] so.
[22:09:05] this is a /srv vs /a problem
[22:09:08] some tech debt.
[22:09:10] Gotcha
[22:09:17] historically, stat servers used /a
[22:09:28] we would prefer to deprecate that, and use /srv
[22:09:37] but, this change happened in the middle of provisioning these new stat servers
[22:09:46] anyway, i could fix this by making a /a module writeable on stat1002, but
[22:09:52] the easiest thing to do (if you don't mind)
[22:09:58] would be to pull from stat1002, rather than push from stat1003
[22:10:11] on stat1003, /a -> /srv
[22:10:16] so, you should do
[22:10:38] stat1002$ rsync -rv stat1003.wikimedia.org::srv/path/to/file /a/path/to/file
[22:10:44] Oh yeah. No worries. That should work just fine :)
[22:10:47] ok cool
[22:10:49] yeah, sorry about that
[22:10:50] I don't know why I didn't think to try that
[22:10:57] Thanks for taking a look :)
[22:10:59] yup
[22:18:48] does anyone know how to reference a phabricator bug from a gerrit commit msg?
[22:19:02] Check the Commit message guidelines :-)
[22:19:17] https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines
[22:19:26] It's "Bug: T4711"
[22:19:29] milimetric: ^
[22:19:31] woo!
[22:19:36] thx qchris
[22:19:39] yw
[22:19:57] that's the last place I would've thought to look because it's by far the most logical and sensible
[22:20:03] * milimetric forgot that qchris worked on this - doh
[22:20:20] Hahahaha :-P
[22:21:26] (But the bot's session broke during the weekend, and the bot didn't automatically get a new session. So if it is not working, it might be that the bot is choking. I could not fully reproduce the issue.)
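Back to the secondary-sort link halfak posted at 21:26: the streaming flags make Hadoop partition on one field and sort on another, so the reducer can stream through its input with no buffering. A minimal reducer sketch; the flag set in the comment is recalled from the Hadoop 0.20-era streaming docs rather than taken from the linked post, and the record format is an assumption:

```python
#!/usr/bin/env python
# Reducer sketch for a secondary-sort streaming job. Assumes the job was
# launched with flags along the lines of (per the Hadoop 0.20-era
# streaming docs; field names below are illustrative):
#
#   -D stream.num.map.output.key.fields=2
#   -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
#   -D mapred.text.key.comparator.options='-k1,1 -k2,2n'
#   -D mapred.text.key.partitioner.options=-k1,1
#   -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
#
# i.e. partition on field 1, sort numerically on field 2 -- mimicking
# unix sort's -k options, which is what makes this approach pleasant.
import sys
from itertools import groupby

def records(lines):
    # Assumed record format: user <tab> timestamp <tab> payload
    for line in lines:
        user, ts, rest = line.rstrip("\n").split("\t", 2)
        yield user, int(ts), rest

# Hadoop already delivers each user's records contiguously and in
# timestamp order, so the reducer never has to buffer or re-sort.
for user, group in groupby(records(sys.stdin), key=lambda r: r[0]):
    first = next(group)
    print("%s\t%d" % (user, first[1]))  # emit each user's earliest timestamp
```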
[22:35:47] (PS3) Milimetric: Implement breakdowns for metrics with CSV data [analytics/dashiki] - https://gerrit.wikimedia.org/r/175458
[22:36:03] we'll find out ^ :)
[23:10:54] laters, i'm out til monday, have a good TG all!
[23:11:06] ottomata: Enjoy your week off! :-D
[23:12:14] (PS4) QChris: Implement breakdowns for metrics with CSV data [analytics/dashiki] - https://gerrit.wikimedia.org/r/175458 (owner: Milimetric)
[23:15:31] qchris, yt?
[23:15:34] yup.
[23:15:48] regarding https://gerrit.wikimedia.org/r/#/c/175338/2 (the mysql consumers)
[23:17:32] qchris: i do not understand which part of the new code is the one that is closing the connection
[23:17:51] qchris: is it the clean(er) exit of the thread that terminates it?
[23:18:05] oh yeah, people get holidays
[23:18:29] nuria__: The worker.stop and worker.join in https://gerrit.wikimedia.org/r/#/c/175338/2/server/eventlogging/handlers.py
[23:18:38] signal to the writing thread that it should stop.
[23:18:53] There is no connection that needs closing.
[23:19:22] ... I mean ... there is the database connection. But closing that is something different.
[23:19:35] qchris: so it was not the "connection hanging" but rather the thread was idle
[23:19:41] w/o work
[23:19:48] qchris: is that so?
[23:20:18] Yup. That's what https://gerrit.wikimedia.org/r/#/c/175338/1//COMMIT_MSG says.
[23:20:45] But I mean ... prior to that change, the connection did not close,
[23:20:52] just because the thread was idling.
[23:22:37] qchris: ahhhh... ok, i understand now. That was in the "regular" case (no exception)
[23:23:24] Yup.
[23:23:25] qchris: the thread executed its workload (with the given connection) and, in the case where everything went well, it idled and thus retained the connection
[23:23:35] Right.
[23:24:04] qchris: in the case of an exception, however, both the main process and the thread were killed, so no issue was happening
[23:25:08] That depends a bit on which thread the exception happens in.
[23:25:28] Python + multithreading + exceptions is a bit tricky
[23:25:41] (Or at least it always was for me, when I had to deal with it)
[23:25:41] qchris: i was thinking the child thread
[23:27:17] qchris: ya... learning a lot about this
[23:27:34] I think the program would exit in that setting. But I guess each thread would exit for different reasons.
[23:27:43] just try it out.
[23:27:46] :-)
[23:27:59] qchris: i did already
[23:28:14] All the better!
[23:28:44] qchris: i tested on beta labs a bunch (the exception case), did not look at resources of the regular case though
[23:29:11] Cool.
[23:29:13] qchris: there is something i do not understand though... shouldn't the code on vanadium now be running out of connections?
[23:29:46] Running out of connections? Why should it?
[23:29:48] qchris: exhausting the threadpool for mysql?
[23:30:03] You mean the server-side thing that happened a few days ago?
[23:30:31] qchris: no, i mean now, connections are being held by the idling threads being created
[23:30:48] after they do their work, they are being left open, correct?
[23:31:34] The mysql consumer spawns a worker thread. But this worker thread currently never finishes its work.
[23:32:00] So there should only be one mysql worker thread on vanadium.
[23:33:18] The issue with the idling threads only becomes relevant if you spawn lots of mysql-consumers.
[23:33:42] For backfilling, I spawned a separate mysql-consumer for every packet of 64K events.
[23:34:17] In order to batch them, I needed to make sure that the first one exited correctly before the second started.
[23:34:32] That's why I care about the mysql-consumer exiting properly.
[23:35:28] But in the usual working mode of EventLogging on vanadium, whether/how the mysql-consumer exits is not much of a concern.
[23:35:50] qchris: i see. How can i see the spawned thread on vanadium? ps auxfw does *not* show a child process of the mysql consumer
[23:37:46] nuria__: Run pstree -lpa | grep -A 3 [m]ysql
[23:46:32] qchris: I see
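The worker.stop/worker.join shutdown qchris describes (from 23:18 on) is the classic stop-flag-then-join pattern: signal the writer thread, let it drain its queue, and join it before moving on, which is what lets batched backfilling consumers exit cleanly one after another. A minimal sketch of the pattern only, not EventLogging's actual code; all names here are made up:

```python
import queue
import threading

class BatchWriter(threading.Thread):
    """Toy version of the stop()/join() pattern discussed above.

    The consumer signals shutdown with stop(), then join()s so the
    thread drains its queue and exits before the process moves on.
    """
    def __init__(self, sink):
        super().__init__()
        self.sink = sink                        # e.g. a database insert function
        self.events = queue.Queue()
        self._stopping = threading.Event()

    def run(self):
        # Keep consuming until we are told to stop AND the queue is drained.
        while not (self._stopping.is_set() and self.events.empty()):
            try:
                event = self.events.get(timeout=0.1)
            except queue.Empty:
                continue
            self.sink(event)

    def stop(self):
        self._stopping.set()

# Usage: a backfilling script can run one writer per batch and wait for a
# clean exit before starting the next.
writer = BatchWriter(sink=print)
writer.start()
for i in range(3):
    writer.events.put({'event': i})
writer.stop()
writer.join()
```

This also answers nuria's ps question: the worker is a thread, not a child process, so ps auxfw shows nothing, while pstree lists threads (in curly braces with -p), which is why qchris's pstree command works.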