[16:17:48] brion: I need to get DB dumps for the optin_survey tables on all wikis (nkomura wants the data aggregated), but that info is not available through the toolserver. Could you 1) get me those dumps or 2) give your OK so someone else can get them for me without fear of your wrath?
[16:18:09] Or 3) just give me shell access :D but I'm not sure what the policies around that are
[16:18:23] hey brion, welcome back!
[16:18:30] howdy :D
[16:19:16] moment
[16:27:33] RoanKattouw: ok running em
[16:41:13] RoanKattouw: can you set me up an account on prototype box and i'll scp it over
[16:41:40] brion: I want them on the toolserver anyway, don't you have root there?
[16:41:53] ehhhh in theory but it's probably changed a few times ;)
[16:41:59] haven't logged into ts in years
[16:42:20] not even sure which hostname is the login host
[16:42:25] but i might have a working account :)
[16:42:38] nightshade.toolserver.org
[16:42:40] Your account has expired; please contact your system administrator
[16:42:51] heh yeah it expires like every 3 months
[16:43:04] Or 3 months after last usage, rather
[16:43:09] lemme try root ;)
[16:43:49] nope, neither my key nor the root key logs in there
[16:43:52] Hm
[16:44:03] I'll create you an account on prototype, gotta run after that
[16:44:48] brion: You already have an account on prototype
[16:44:55] Good luck figuring out the password to that one ;)
[16:44:57] *RoanKattouw has to go now
[16:45:16] aha i do have it :D
[16:47:32] RoanKattouw_away / TrevorParscal -- optin_survey sql dumps in my home dir on prototype server
[16:48:15] sweet
[16:48:41] hi brion
[16:48:58] how's it going? did you get sick too or just take some time off?
[16:49:17] amazingly i seem to have come home healthy for once ;)
[16:49:29] took a couple days off after the conf for extra sightseeing with some of the gang
[16:49:36] got back yesterday midday
[16:54:37] right on
[16:54:51] did you see my summary of the openweb irc meeting?
[16:55:39] nope. linky?
[16:55:42] *Fflapjac demands to know brion's current password
[16:55:48] I emailed it to you
[16:56:28] It's just a wrap up of what has gone down in the meeting and past 2 weeks
[16:57:04] I figured you were way too busy to be keeping up by reading all the mailing list and irc backlogs
[16:58:39] discussion has been taking place here though -> http://groups.google.com/group/openweb-group?hl=en
[17:17:25] Brion ... good to hear you had fun ... I realised that I have only seen seven streets in BA
[17:17:43] hehe
[17:18:41] Jet lag was a bummer on the way back... am still recovering from that one
[17:22:21] i somehow did ok :D slept on the overnight flight from BA to washington dc, then ended up feeling pretty normal once i reached california
[17:22:37] did wake up a little early but that just means i had a leisurely breakfast :D
[17:33:49] i was up 4am yesterday!
[17:33:55] it got better today
[17:34:45] he he
[18:34:16] TrevorParscal: remind me our current schedule on babaco?
[18:34:48] i think we are delaying a bit
[18:35:06] we are having an outside contractor do some rigorous testing and bug reporting
[18:35:18] and we want to have that done before deployment
[18:35:26] also, my wife may pop at any moment
[18:35:54] super
[18:35:55] :D
[18:35:57] a little earlier than originally expected - she's at a doc appt. right now, we will know more soon
[18:36:17] yeah, she's full term, but last time she was a week early, and this time is likely the same
[18:36:19] so yeah...
[18:36:28] just working on stabilizing the features
[18:37:11] have you seen the prototypes lately? the features we are actually going to deploy are explained in detail here ...
http://usability.wikimedia.org/wiki/Releases/Babaco/Compatibility_Matrix
[18:37:20] brion, yeah the babaco release schedule needs to be revisited
[18:37:42] trevor, nimish and roan have quite a few code review requests in the queue
[18:38:00] the dialogs are stable
[18:38:04] ready for review
[18:38:10] and the toc plugin as well
[18:38:40] more review intensive will be the click tracking stuff I think
[18:38:53] it touches the database and API
[18:39:16] my daughter is watching the same cartoon over and over
[18:39:17] i would like to revisit the date based on QA feedback and brion's code review feedback
[18:39:28] 9/9 is not attainable at this point
[18:39:31] sounds good to me
[18:40:12] *nod*
[18:41:24] i'm mostly squashing bugs today...
[18:42:34] TrevorParscal: may i remind you search suggest for new search box will be a big piece too? :)
[18:42:40] squish squish squish :D
[18:42:53] yes
[18:43:00] I will work on that too today
[19:10:05] brion, do you know if we have data dumps for ALL wikipedias?
[19:13:15] nimish_g: the general data dumps, or the optin survey dumps i made this morning?
[19:13:48] several wikis were missing the optin_survey table: http://pastebin.com/d2ec7c1a8
[19:14:29] the general data dumps
[19:17:52] nimish_g: yeah, they exist for everything i believe. (except full-history for enwiki)
[19:18:06] http://download.wikimedia.org/backup-index.html <- status looks clean
[19:22:46] *brion advocates for lunch
[19:23:24] i'm @ home
[19:23:49] brion, I'd be down
[19:23:57] woot
[19:24:11] nkomura: lunch? or are you still recovering @ home too :D
[19:24:54] i'm in the office
[19:24:57] sure
[19:25:12] i have 1:30 mtng at stillman though
[19:26:13] brion: i think the Vietnamese place is open
[19:26:19] if i'm not mistaken
[19:27:16] \o/
[19:27:19] let's totally do that
[19:27:25] meet you there in a few mins?
[19:27:32] sure!
[19:29:07] wheeeee
[19:30:32] *MC8 has never seen anyone invite someone else for lunch via IRC before :P
[19:35:12] you don't hang in this channel much then :)
[19:35:31] No, I obviously don't
[20:09:38] TrevorParscal: The link dialog is NOT stable, I'm working on adding the namespace dropdown and title suggestions
[20:09:57] k
[20:16:08] well, that's just an extra feature
[20:16:17] it doesn't really mean the code isn't reviewable
[20:16:31] it just means once that feature is added that patch will need review too
[20:18:34] Yeah
[20:18:41] Depends on what you mean by 'stable' I guess
[20:19:22] not broken?
[20:19:37] the other meaning would be "not changing"
[20:19:44] i can see both meanings
[20:20:10] I meant to communicate that it's ok for him to get started on review of the code since it's not changing that much between now and deployment, and what is there is working
[20:20:19] sorry if it was unclear
[20:20:32] going to lunch with wife and daughter
[20:20:38] bb in a bit
[20:20:41] Ah yes, it's no biggie
[20:20:43] Enjoy
[20:20:48] :)
[20:24:52] brion-lunch: Thanks, now I can start analyzing all that :)
[20:26:14] RoanKattouw the Swiss situation now has the two parties talking ....
[20:26:33] *RoanKattouw still has to read mail, one moment
[20:26:47] even if it's only that they know better where each other stands, that's enough
[20:28:33] whee
[20:29:03] brion: here's the code I was talking about... as far as I can tell it's not saving any information that should cause malloc() to fail after several million loops, but it does, and I'm still pretty convinced it's PHP
[20:29:04] http://pastebin.com/m3a3e87ce
[20:29:11] *brion peeks
[20:33:05] do you have profiling on, or anything like that that might be storing extra data in-memory?
offhand i don't see anything that should obviously leak, but i would make a few general recommendations:
[20:33:12] 1) consider using an actual xml parser :)
[20:33:51] XMLReader class should be very nice for this, as it lets you write your code fairly linearly, pulling through bits rather than the sax-based events stuff of the plain 'xml' module
[20:34:39] 2) the way you're saving to the database looks pretty inefficient, running an 'insert on duplicate key update' for every individual item
[20:35:02] the update adds to the count in the DB
[20:35:03] you can probably make it a hojillion times faster by batching things up
[20:35:31] as for the memory issue, check whether you get the same problem if you comment out the actual query
[20:35:56] so if we find "user 1, 9/2/09", we go count++, otherwise we create a new entry
[20:36:13] also it looks like you're wrapping everything into a single transaction?
[20:36:34] db server prolly won't like that very much
[20:36:47] I do, I tried it with just counting the number of revisions (which works), then I tried it with just counting revs and then finding the timestamp and doing ; with it, which fails
[20:37:17] i might consider saving initially into a table of (timestamp,user_id) pairs then let the db create the aggregates
[20:38:54] lemme peek at Database class to see if anything extra might get saved when we're doing the queries
[20:38:58] hm, ok, like in a temporary table somewhere?
[20:39:00] ok
[20:40:26] I'll try again without the querying, and I just upgraded my PHP to 5.3.0, we'll see if that does it
[20:40:55] wonder if you can get a breakdown of resource allocations with xdebug...
[20:45:54] I haven't used that before, I'll ask google abt it
[20:48:45] comes in very handy at times :D
[20:48:53] especially for profiling, but it's got lots of other debugging features
[20:49:02] can help track down memory leaks i think
[20:54:27] sans DB write and using php 5.3, it seems to be working great =)
[20:54:46] heh
[20:54:58] could still be something leaking with all the db writes...
[20:55:23] yeah
[21:16:25] brion: still a little premature but I think I'm gonna say PHP 5.3 FTW
[21:16:45] whee
[21:16:59] now...do we have that in the clusters? =)
[21:22:37] not yet :)
[21:23:18] hm, well, that shouldn't be much of a problem, this script just has to be run once ever for every project
[21:23:30] so it doesn't need to run on the clusters so much
[21:56:44] nimish_g: how's it going?
[21:57:36] in general or the data processing?
[22:03:26] the php memory intensive stuff first, then life..
[22:03:31] :)
[22:03:59] php 5.3 = the win
[22:04:12] btw - while you were at wikimania, I became a huge That Mitchell and Webb Look fan
[22:04:23] thanks for turning me onto that
[22:04:48] the soccer team fan one was really awesome - have you seen that one?
[22:04:55] I'm running ALL the data through it just to be sure, but it's running OK, I gave it 2GB of ram but it didn't even seem to need it
[22:06:01] thing is we don't have PHP 5.3 in the clusters. we don't need to b/c the script only needs to be run once ever. and no, I haven't seen that one yet! link me!
[22:25:21] http://www.youtube.com/watch?v=xN1WN0YMWZU
[22:36:44] win
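Editor's sketch of the recommendations made around 20:33 to 20:37 (pull-style XML parsing, batched inserts with a commit per batch, and storing raw (timestamp, user_id) pairs so the database builds the aggregates). This is only an illustration under loud assumptions: it is written in Python rather than the PHP discussed, SQLite stands in for the real database, and the sample XML, the `edits` table, and the function names are all hypothetical, since the pastebin script is not shown in the log.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from itertools import islice

# Hypothetical, simplified sample; real dump files are far larger.
SAMPLE_XML = b"""<mediawiki>
  <page>
    <revision><timestamp>2009-09-02T10:00:00Z</timestamp><contributor><id>1</id></contributor></revision>
    <revision><timestamp>2009-09-02T11:30:00Z</timestamp><contributor><id>1</id></contributor></revision>
    <revision><timestamp>2009-09-03T08:15:00Z</timestamp><contributor><id>2</id></contributor></revision>
  </page>
</mediawiki>"""


def iter_edits(stream):
    """Yield (day, user_id) for each <revision> while streaming the XML."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "revision":
            timestamp = elem.findtext("timestamp")
            user_id = elem.findtext("contributor/id")
            if timestamp and user_id:
                yield timestamp[:10], int(user_id)  # keep only YYYY-MM-DD
            elem.clear()  # drop the parsed children to limit memory growth


def load_and_aggregate(stream, conn, batch_size=1000):
    """Insert raw pairs in batches, then let the database do the counting."""
    conn.execute("CREATE TABLE edits (day TEXT, user_id INTEGER)")
    edits = iter_edits(stream)
    while True:
        batch = list(islice(edits, batch_size))
        if not batch:
            break
        conn.executemany("INSERT INTO edits VALUES (?, ?)", batch)
        conn.commit()  # one short transaction per batch, not one huge one
    return conn.execute(
        "SELECT user_id, day, COUNT(*) FROM edits "
        "GROUP BY user_id, day ORDER BY user_id, day"
    ).fetchall()


conn = sqlite3.connect(":memory:")
counts = load_and_aggregate(io.BytesIO(SAMPLE_XML), conn)
print(counts)
```

The per-row "insert or bump the count" step disappears: rows go in as plain batched inserts and a single GROUP BY produces the per-user, per-day totals at the end.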