[15:00:33] hey halfak, I have time to take a pass at the doc now, but not later. Shall I do so?
[15:00:45] Yes.
[15:00:59] You beat me to it. I was hoping to do an aggressive run to trim the length.
[15:01:16] Could you focus on that and I'll do a followup when you are done for copy editing and all that jazz?
[15:01:51] * halfak got side tracked by Research SE task submissions.
[15:02:12] sounds good, halfak
[15:05:25] Cool. Thank you.
[15:11:11] hey halfak, in this section: https://meta.wikimedia.org/wiki/Research:Infrastructure_for_open_community_science#Report_out.2Fsumming_up
[15:11:18] the text in yellow
[15:11:33] do you still need these questions answered in the proposal? They don't seem to logically follow here, to me.
[15:11:45] Yeah. Doesn't need to be there.
[15:11:58] I think that we'll want to answer the questions both in the intro and here though.
[15:12:40] cool.
[15:12:49] So in the report out/summing up, we'll "focus participants on discussing issues/opportunities related to methods, the tech. infra and metadata index" or something like that.
[15:13:17] If you want to hack another section, I can take a pass on that one right now.
[15:13:23] Or vice versa.
[15:21:46] J-Mo, ^
[15:22:08] * halfak watches the sweet sweet watchlist notifications come in.
[15:22:23] yeah, halfak, you can hit up "Summing up" and I'll work earlier in the doc
[15:22:29] Cool
[15:22:31] what do you think re: ditching bios?
[15:22:54] I think that's reasonable.
[15:23:09] It seemed to me that this was a soft requirement for workshop proposals.
[15:23:21] yah
[15:26:21] halfak: "As part of the workshop, participants who study data management and science infrastructure will observe, interview, and survey the workshop participants." I thought we agreed that the participants would all be participating, and we'd do the observing. Does this need to be updated?
[15:26:36] Yes. That.
[15:26:42] It needs to be updated.
[15:27:45] excellent
[15:41:53] task-switching now, Halfak. But a) in general, this looks great. I'll probably do a copy-edit pass this afternoon, but mostly minor stuff and all of it will be optional, not mandatory, b) I have some notions of where we can cut if we need to, and can perform that work during/after our meeting this afternoon, and c) I bet we don't actually NEED to cut much, unless you anticipate that the conclusion section will need to be.
[15:42:04] *will need to be very large
[15:42:50] All sounds good. I do not expect the conclusion to be large. Maybe we don't even need one for such a short document.
[15:44:15] Thanks J-Mo
[15:44:24] np. talk to you in a couple hours
[15:44:28] :)
[16:50:59] halfak: can haz OAuth consumer approval? This one really is just changing the callback url from the previous consumer: https://meta.wikimedia.org/w/index.php?title=Special:OAuthListConsumers/view/a29aecd8bbbf32db3f23ef9334d1bf3a&name=&publisher=Sage+%28Wiki+Ed%29&stage=0
[16:51:17] previous one: https://en.wikipedia.org/wiki/Special:OAuthListConsumers/view/2df1a77fd9d4fefa8211ec875ced46b1
[18:25:36] halfak: we are talking about https://github.com/halfak/MediaWiki-Streaming
[18:25:58] That library is a bit out of date.
[18:26:12] I've split up those utilities into the more specific MW packages
[18:26:32] so mwxml has the old dump2json utility (now called dump2revdocs)
[18:26:57] halfak: links? also the concept more or less - the event bus setup that ottomata is doing, and also maybe a standard for 'change' schemas that unifies dumps and a more augmented RCStream
[18:26:57] mwdiffs has the old json2diffs (now called revdocs2diffs)
[18:27:02] halfak: aaah
[18:27:28] Oh! maybe you were thinking of MediaWiki-Events?
[18:27:35] yeahhhh
[18:27:40] right
[18:27:50] https://github.com/mediawiki-utilities/python-mwevents
[18:28:09] https://meta.wikimedia.org/wiki/Research:MediaWiki_events:_a_generalized_public_event_datasource
[18:28:22] ^ for diagram loveliness
[18:30:51] coool
[18:31:13] halfak: I've invited apergos here
[18:31:21] responsible for the dumps and stuff
[18:32:19] o/ apergos
[18:32:23] hey
[18:32:30] I probably should have started camping in here a lot earlier
[18:32:54] :D it is an awesome channel for camping. Also, I should have been talking to you about dump stuff earlier too :S
[18:33:12] I figure you guys are busy with conversations. Let me know if/when you want to chat about stuff.
[18:33:35] yeah we're pretty much meeting mode right now
[18:33:52] hey I want to suck you into the dumps 2.0 planning session for wikidev summit
[18:34:14] there's a ticket somewhere in phab with an obvious name like dumps 2.0 plus pointers to a wiki page and another ticket or so
[18:37:57] apergos: yeah, I think halfak would want them in JSONL format :D
[18:38:08] json-lines where you get one line per entry of JSON data
[18:38:31] so there's lots to say about format but we need to start at a much higher level
[18:38:51] +1
[18:38:59] sounds good. I'll make sure I'm subscribed to the task
[18:39:04] great
[18:39:53] to give you an idea, we want to handle any format and any range of data in tiny jobs that are really really cheap
[18:39:56] https://phabricator.wikimedia.org/T114019
[18:39:58] so that if any job fails we just rerun it
[18:40:18] maybe we lose 1/2 hour for some particular dump, max
[18:41:14] That would be great. Hard to get something like that for one of the big text dumps though.
[18:41:17] and parceling out any given dump step (be it sql table or xml + compression or any other thing anyone dreams up) into little pieces that can be run remotely on whichever hosts
[18:41:28] recombined later
[18:41:33] progress reports and logging
[18:42:26] including live reports (your client at home can ask for current status of x dump on y project and it can tell you, rather like bittorrent, X pieces completed, estimate Y pieces to go, guess Z time left)
[18:43:00] and allow adding any format conversion or production in a unified framework
[18:43:24] as long as the dump step itself conforms to a specific architecture
[18:43:52] include in there: media, incrementals, whatever the hell we want.
[18:44:01] anyways you'll see
[18:44:29] Sounds like a lot of engineering.
[18:50:08] yes.
[18:50:17] the alternative is the mess we have now. I'm done with that ;-D
[18:50:31] yay!
[18:50:33] :D
[18:50:37] I have at least three levels of "run this in smaller pieces".. we should do it in one place.
[18:50:58] * halfak filed a post in the conversation on phab.
[18:51:03] Looking forward to the dev summit.
[18:52:08] hope they take it
[18:57:56] I'm sure they will.
[19:03:28] halfak: can haz OAuth consumer approval? This one really is just changing the callback url from the previous consumer: https://meta.wikimedia.org/w/index.php?title=Special:OAuthListConsumers/view/a29aecd8bbbf32db3f23ef9334d1bf3a&name=&publisher=Sage+%28Wiki+Ed%29&stage=0
[19:03:34] previous one: https://en.wikipedia.org/wiki/Special:OAuthListConsumers/view/2df1a77fd9d4fefa8211ec875ced46b1
[19:04:26] Arg! I wish that I could approve things from this link.
[19:04:36] One sec. I'll go find the admin interface
[19:05:23] ragesoss, {{done}}
[19:05:34] thanks much halfak!
[19:05:42] n/p godspeed :)
[20:03:29] * halfak wins the race to the J-Mo/halfak meeting
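
Note on the dumps discussion (18:37 to 18:44): the idea floated there is to cut a dump into tiny, cheap pieces, rerun any piece that fails, report progress as "X pieces completed, Y to go", and recombine the pieces afterwards, possibly as JSON Lines (one JSON document per line). A minimal sketch of that idea follows. It is not the dumps 2.0 design, and every name in it (split_into_pieces, run_piece, run_dump, the fetch callback, piece_size, max_attempts) is hypothetical and made up for illustration.

```python
import json


def split_into_pieces(page_ids, piece_size):
    """Cut the full list of ids into small, independent pieces."""
    for start in range(0, len(page_ids), piece_size):
        yield page_ids[start:start + piece_size]


def run_piece(piece, fetch):
    """Run one tiny job: emit one JSON document per line (JSON Lines)."""
    return [json.dumps(fetch(page_id), sort_keys=True) for page_id in piece]


def run_dump(page_ids, fetch, piece_size=2, max_attempts=3, report=print):
    """Run every piece, rerunning failed ones, then recombine in order.

    `fetch` is a hypothetical callback returning a dict for one page id.
    `report` receives progress messages (assumption: plain strings).
    """
    pieces = list(split_into_pieces(page_ids, piece_size))
    results = {}
    for index, piece in enumerate(pieces):
        for attempt in range(1, max_attempts + 1):
            try:
                results[index] = run_piece(piece, fetch)
                break  # this piece is done, no rerun needed
            except Exception as error:
                # A failed piece is cheap, so just rerun it.
                report(f"piece {index} failed on attempt {attempt}: {error}")
        else:
            raise RuntimeError(f"piece {index} failed {max_attempts} times")
        done = len(results)
        report(f"{done} pieces completed, {len(pieces) - done} to go")
    # Recombine the pieces in their original order.
    return [line for index in sorted(results) for line in results[index]]


if __name__ == "__main__":
    def example_fetch(page_id):
        # Stand-in for real dump work; the fields are invented.
        return {"page_id": page_id, "title": f"Page {page_id}"}

    for line in run_dump([1, 2, 3, 4, 5], example_fetch):
        print(line)
```

Because each piece only covers a couple of ids here, a failure costs one small rerun rather than the whole dump. A real system would presumably run pieces on separate hosts and persist their output, which this sketch does not attempt.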