[00:21:13] Ir :(
[00:21:26] pm
[00:21:31] * greg-g nods
[00:21:31] he's reachable that way
[01:46:00] halfak|Mobile: am loading the big dataset into it now
[01:46:06] have split it into 16 different bits
[01:46:08] and am loading them in
[01:52:42] halfak: ^
[02:02:04] hi milimetric, i see you're now in charge of structured data?
[02:06:45] yuvipanda, why do you split it up into bits?
[02:09:00] harej: haha, yes, the data shall be structured! sit data, sit. good data
[02:09:22] okay! i have an idea for you, and it is totally my idea: structured data for commons
[02:10:10] hah
[02:11:05] :) yeah, there are lots of proposals, I guess I'll do an RFC at some point but I wanna talk to Yuri more first
[02:11:22] maybe let this quarter end and get some of this organizational insanity behind us emotionally
[02:11:41] As for me, I have too many ideas and not enough staff.
[02:12:26] ("staff" broadly defined to include volunteers)
[02:13:04] halfak: because I'm pretty sure importing the whole thing in one go into mysql would end badly.
[02:13:38] halfak: and also less monitorable, etc
[02:27:03] yuvipanda, OK. I import much larger things into MySQL all the time --- but those are analytics slaves with fewer users.
[02:28:55] halfak: yep
[02:29:01] halfak: how long do those take?
[03:17:26] yuvipanda, sorry to not respond to your question. It depends on the size of the dataset, but I would expect this one to take less than an hour. maybe 10 minutes?
[03:17:32] Hard to say without testing directly.
[03:17:43] Anyway, I like your method just fine. Was more curious.
[03:17:44] :)
[03:18:05] I'm going to grab dinner and head to hotel/bed. Have a good one folks!
[03:18:07] o/
[16:14:34] halfak: hey; what's the best reference for your paramecium metaphor? I know you've presented it a few times and I'm wondering if you have a written version.
[16:15:15] guillom, regretfully, I don't have a nice written version.
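Note on the chunked import discussed above (02:13 to 03:17): the log does not show how the dataset was split into 16 pieces, so the following is only a minimal sketch of one way to cut a list of TSV rows into near-equal contiguous chunks. The function name `split_into_chunks` and the sample rows are hypothetical, not taken from the conversation.

```python
from typing import List


def split_into_chunks(rows: List[str], n_chunks: int) -> List[List[str]]:
    """Split rows into n_chunks contiguous pieces whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Every chunk gets `base` rows; the first `extra` chunks get one more.
    base, extra = divmod(len(rows), n_chunks)
    chunks: List[List[str]] = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(rows[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real TSV rows.
    rows = [f"user{i}\t20150801000000\n" for i in range(100)]
    pieces = split_into_chunks(rows, 16)
    print([len(piece) for piece in pieces])
```

Each chunk could then be written to its own file and loaded separately, which keeps every import step small enough to watch individually.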
[16:15:28] * halfak adds a task in phab
[16:16:25] halfak: Ok; Then I'll have to have a chat with you later :) I have a few questions about what inspired it.
[16:16:46] https://phabricator.wikimedia.org/T127978
[16:16:49] :) sounds good.
[16:17:00] Geat!
[16:17:03] Great, too.e
[16:17:09] Typing is hard.
[18:03:40] halfak: am importing the whole thing now, to see how that goes
[18:03:56] halfak: I found that importing 1/16th piece took same amount of time as importing a 1/4th piece?!
[18:03:58] OK. I noticed that there were no rows in the DB this morning :)
[18:04:06] so I truncated them all
[18:04:18] yeah, I'm not too surprised. I think that mysqlimport does some optimizations for large imports.
[18:04:20] halfak: also, start/end look like TIMESTAMPS, but we're using varbinary
[18:04:25] is that ok?
[18:04:26] E.g. rebuilding the index at the end rather than for every row.
[18:04:27] halfak: yeah
[18:04:40] yuvipanda, yeah. MediaWiki uses varbinary for timestamps.
[18:04:42] halfak: indeed, that's my hypothesis too (Index related)
[18:05:11] halfak: I'm also 99% done on 'fork' functionality, btw
[18:12:32] yuvipanda, "fork" functionality?
[18:12:49] Oh, BTW, I am struggling to use PAWS. I think I'll need to do the demo this evening from my laptop :/
[18:13:05] Getting 503s.
[18:13:19] I'm set though. Just wanted you to know why I won't be using PAWS
[18:13:52] halfak: yeah, is ok
[18:14:00] halfak: I haven't fully stabilized it, nor had the time to
[18:14:11] No worries. makes sense. Vacation and all ;)
[18:14:27] Did you see the emails from AJ/Kristen re new datasets?
[18:14:30] halfak: can you tell me which account you used and when you got a 503?
[18:14:37] EpochFail
[18:14:40] ok!
[18:14:43] I'll investigate late
[18:14:46] halfak: haven't yet
[18:15:00] I'm hoping to have those cleaned up and ready for you to load by OED.
[18:15:02] *EOD
[18:16:19] halfak: awesome
[18:16:25] halfak: hopefully none are too big :)
[18:18:10] halfak: may I ask you mention PAWS during your talk even when not using it? :D
[18:22:12] Sure!
[18:22:45] ty
[18:25:06] halfak: goddamnit, so you got a 503 because one of the underlying labs instances just died.
[18:25:19] * yuvipanda feels vaguely infuriated about us still not having this be solid :(
[18:26:11] * yuvipanda flips tables
[18:26:33] halfak: do you have any updates for SoS?
[18:26:44] Hey schana
[18:26:54] Yeah. Still blocked on Ops for ORES in production.
[18:27:06] This is now also blocking the deployment of ORES extension to fawiki and wikidata.
[18:27:08] * yuvipanda gives up on PAWS for today / CSCW
[18:27:38] * halfak helps yuvipanda put tables back and offers hugs.
[18:28:07] * halfak has been talking to a lot of people about PAWS recently.
[18:28:10] It's going to be awesome.
[18:28:13] yeah
[18:28:21] but not so much if it keeps falling over all the time :(
[18:28:28] Just needs some work.
[18:28:31] if I can't depend on VMs to not fall off...
[18:28:36] the problem isn't in PAWS itself
[18:28:39] anyway
[18:28:49] Also some backfilling for Yuvipanda moving on to web-based open infra
[18:28:49] thanks halfak
[18:29:07] halfak: I saw the other datasets email, if you can put them into a form where you have a table schema, I can help do imports
[18:29:28] halfak: for smaller ones I can even give you rights to do imports
[18:29:37] schana, if you could ask someone (probably akosiaris) to contact me about status re. Ops & ORES deployment timeline, that'd be really helpful.
[18:29:57] I'm still not sure how people expect me to engage with them.
[18:30:15] yuvipanda, either way is fine with me.
[18:30:18] um normally ops is pretty phab responsive
[18:30:44] halfak: alex is on vacation
[18:32:12] apergos, gotcha. In this case, it seems the work has fallen through the cracks
[18:32:31] Re.
alex being on vacation, does that mean the work just doesn't happen or does someone else take over?
[18:32:38] yuvipanda, ^
[18:33:08] good Q :)
[18:33:13] I'm not sure if my ask is even "work" or "volunteer" time for opsen.
[18:33:15] depends, is the answer.
[18:33:21] halfak: neither are we
[18:33:26] lol
[18:33:28] well, neither am I
[18:33:31] This is the biggest blocker
[18:33:36] I think I can use 'we' there
[18:33:38] yeah
[18:33:47] I counted it as my 'volunteer' time
[18:34:02] Yeah. The transition from skunkworks project to supported project sucks.
[18:34:36] partially we've never done it for python before...
[18:34:41] so we aren't sure how to handle dependencies
[18:34:50] and so it isn't a cut and dried 'follow these steps'
[18:34:54] as much as 'let us figure this out'
[18:44:05] halfak: I forwarded you an email
[18:47:22] halfak: word is to try contacting Faidon
[18:47:56] kk thanks schana
[18:48:02] yuvipanda, looking
[18:50:24] yuvipanda, one thing I want to make sure is totally clear. I don't want to complain or push people harder than makes sense for support here. I want to learn the norms/process/pattern and do what ops needs me to do to get these things done.
[18:50:47] I think we had/have a good dynamic of you saying "I need you to " and me working that out.
[18:50:55] I'd like to keep that going.
[18:51:16] halfak: +1 I totally agree
[18:51:52] halfak: I think it's just bad timing and some unclear responsibility shifting around - somehow I had assumed that alex would do the final prod deployment, but then his time off happened and then I got distracted...
[19:01:23] yuvipanda, kk. If the situation is "good" but "we'll need to wait a bit", that's cool with me.
[19:01:26] Just want to know.
[19:01:30] :)
[19:18:21] halfak: yeah :)
[19:19:27] halfak: not sure if I mentioned, but dataset is fully loaded
[19:19:34] halfak: do do some sample queries to check speeds
[19:19:41] halfak: I'll load them into another db as well, as backup
[19:26:33] OK.
it looks like I have spent about 445 hours editing English Wikipedia. http://quarry.wmflabs.org/query/7555
[19:29:10] halfak: cool!
[19:29:11] yay
[19:30:49] Krenair: do you think you can help review https://gerrit.wikimedia.org/r/273030
[19:31:01] Just updated the query. It looks like Ironholds kicks our butt!
[19:31:15] busy right now, sorry
[19:31:20] will look in an hour or so
[19:31:37] halfak: cool!
[19:31:44] halfak: that patch will allow 'forks'
[19:32:09] Oh! Quarry forks!
[19:32:54] This one is the real test: http://quarry.wmflabs.org/query/7560
[19:33:06] Total monthly labor hours.
[19:33:41] or maybe two
[19:34:47] Krenair: ok :)
[19:37:16] yuvipanda, what if I said I wanted to load in a dataset that is about the same size as enwiki's revision table?
[19:37:24] * halfak prepares to dodge attacks
[21:34:42] halfak: what if I told you I checked and that the one we just finished loading was already the size of enwiki revisions table?
[21:35:21] That's not possible.
[21:35:24] How is that possible
[21:35:39] revision table has more columns that are larger and it has 600 million rows
[21:35:54] Maybe a different storage engine?
[21:36:43] halfak: nevermind
[21:36:50] halfak: | revision | 78693.50 |
[21:36:54] revision table size in MB
[21:36:59] | enwiki_sessions_20150801 | 11207.70 |
[21:37:10] That's almost an order of magnitude smaller :P
[21:37:13] I was counting uncompressed TSV size of enwiki_sessions
[21:37:19] vs mysql size of revisions
[21:40:42] So, the table I want to have you load would be revision + archive + 3 additional columns.
[21:40:58] Could certainly be too much, but I thought I'd ask.
[21:41:12] let's just say I'd prefer to not do it :)
[21:41:14] The additional columns are a VARBINARY(14) and two UINT
[21:41:19] OK. Understood.
[21:41:21] we can give it a shot
[21:41:25] Not critical
[21:41:28] once we've other things done
[21:41:30] cool
[21:41:40] I'll focus on the datasets that Missouri crowd found.
[21:41:49] halfak: ok!
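Note on the VARBINARY(14) column and the earlier remark that MediaWiki uses varbinary for timestamps: MediaWiki stores timestamps as 14-character YYYYMMDDHHMMSS strings in UTC. The sketch below assumes a column holds values in that format; the helper names `parse_mw_timestamp` and `session_seconds` are made up for illustration and do not come from the conversation.

```python
from datetime import datetime, timezone

# MediaWiki's 14-character timestamp layout: YYYYMMDDHHMMSS (UTC).
MW_TS_FORMAT = "%Y%m%d%H%M%S"


def parse_mw_timestamp(raw: bytes) -> datetime:
    """Convert a 14-byte MediaWiki-style timestamp into an aware UTC datetime."""
    text = raw.decode("ascii")
    if len(text) != 14 or not text.isdigit():
        raise ValueError(f"expected 14 digits, got {text!r}")
    return datetime.strptime(text, MW_TS_FORMAT).replace(tzinfo=timezone.utc)


def session_seconds(start: bytes, end: bytes) -> float:
    """Length in seconds of a session given its start and end timestamps (hypothetical helper)."""
    return (parse_mw_timestamp(end) - parse_mw_timestamp(start)).total_seconds()


if __name__ == "__main__":
    print(parse_mw_timestamp(b"20150801120000"))
    print(session_seconds(b"20150801120000", b"20150801123000"))
```

Because the strings are fixed-width and zero-padded, they also sort chronologically as raw bytes, which is one reason the varbinary representation works for range queries.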
[21:41:58] halfak: I'd also like us to load that into the two remaining working labsdbs
[21:42:06] halfak: also did the queries you were running run within the 30min limit?
[21:42:16] halfak: and how many people are at the workshop?
[21:43:57] yes. Queries finished. 25 people in the workshop
[21:44:07] halfak: ok!
[21:44:17] halfak: I'm curious what'll happen if we have 25 of those running at the same time :D
[21:44:48] heh. Fun tests
[21:45:45] halfak: what I think we should do is to simulate that earlier on and see what happens
[21:45:56] have ten runs simultaneously and see how that goes
[21:46:08] Can do that.
[21:47:02] halfak: thank you!
[21:47:32] halfak: am going to deploy 'fork' now
[21:57:43] halfak: I forked your query! http://quarry.wmflabs.org/query/7568
[22:01:52] yuvipanda, oh, you merged it
[22:01:53] ok..
[22:09:43] Krenair: yeah, sorry :(
[22:09:51] on and off, sneaking time away when gf is doing other things...
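Note on the "ten runs simultaneously" idea (21:45): the log does not say how the simulation was or would be done. A minimal sketch, assuming each workshop query can be wrapped in a plain callable, is to start the runs in parallel threads and time each one. `fake_query` is a hypothetical stand-in that only sleeps; a real test would replace it with an actual database call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_concurrently(task: Callable[[], object], n_runs: int) -> List[float]:
    """Run task n_runs times in parallel threads; return each run's wall-clock seconds."""
    def timed() -> float:
        started = time.perf_counter()
        task()
        return time.perf_counter() - started

    # One worker per run so that all runs are in flight at the same time.
    with ThreadPoolExecutor(max_workers=n_runs) as pool:
        futures = [pool.submit(timed) for _ in range(n_runs)]
        return [future.result() for future in futures]


def fake_query() -> None:
    # Hypothetical placeholder for submitting one query; it only sleeps briefly.
    time.sleep(0.05)


if __name__ == "__main__":
    durations = run_concurrently(fake_query, 10)
    print(f"slowest of {len(durations)} runs: {max(durations):.2f}s")
```

Comparing the slowest concurrent run against a single solo run would give a rough sense of how much the queries slow each other down.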