[12:48:04] Hey hey science people.
[12:54:40] what science people?
[12:55:00] :)
[12:55:40] Hey YuviPanda.
[12:55:44] How's them mountains?
[12:56:21] halfak: AWESOME
[12:56:26] halfak: been doing some 'manual labor'
[12:56:37] helped build a compost box, then made food for everyone
[12:56:43] by food I mean sandwiches and cocktails but...
[12:57:29] * halfak would be quite happy with sandwiches and cocktails.
[12:57:46] halfak: most people here are! We experimented a lot with the sandwiches as well
[12:57:50] and with the coctails too
[12:57:56] *cocktails
[12:59:01] * halfak has to run to the first meeting
[12:59:05] o/
[13:05:03] halfak: have fun! I've none this week. You guys should reduce too :)
[13:30:49] YuviPanda, I've been considering the meeting load recently.
[13:31:18] I think I can draw a divide between "work meetings" and "meeting meetings".
[13:31:46] Work meetings are about a subject and progress is the goal. It's more like a hackathon via talking.
[13:32:24] Meeting meetings are about communication & negotiation...
[13:32:44] halfak: hmm, do you have a % split on either?
[13:32:46] I usually don't show up to a meeting meeting with an idea that I want to test out in someone else's brain.
[13:33:20] Today, I have 4 work meetings and 2 meeting meetings.
[13:33:39] are they all 1h each? 30m?
[13:34:05] usually 1 hour.
[13:34:11] 80% 1 hour
[13:34:26] 15% 30m and 5% > 1 hour
[13:34:28] hmm, so 6h of meetings.
[13:34:34] Yup.
[13:34:50] so I guess one of the solutions is to just... hire more people
[13:34:58] but then meeting overhead might increase
[13:35:27] I don't think we're dealing with Brooks's law here. Many of my meetings are not with WMF folks.
[13:35:37] oooh
[13:35:40] I see
[13:35:59] so I guess step 1 is to just... add more people
[13:36:52] Say, speaking of adding more people, how's your work this quarter looking? Can I borrow your time to help me secure and configure some server resources?
[13:37:29] I've been talking to people about trying to get a big instance inside of Labs for some computationally intensive data "products".
[13:39:23] halfak: sure! I'm moving to ops next week :D
[13:39:34] halfak: and labs focus, starting with monitoring + labsdb audit.
[13:39:49] halfak: but I don't know if Labs is the appropriate environment for that, due to the way our provisioning works...
[13:40:48] YuviPanda, I'd like to be able to operate within labs, but it wouldn't need to be all that integrated. I just want external collaborators to be able to work on these things with me.
[13:41:07] halfak: yeah, that I can definitely help with :) 'big' instance would be a bit of an issue...
[13:41:20] one thing we were considering is just setting up another OS cluster to replace stat* boxes...
[13:41:32] which will also free them from the 'puppetize everything' thing
[13:41:34] OS cluster?
[13:41:42] Oooh. Tell me more.
[13:42:16] (nothing against puppet, just that whenever I need something done, someone frowns and says they'll need to make time for puppet work)
[13:42:40] halfak: OpenStack
[13:42:45] halfak: so it'd be like a 'research labs' of sorts...
[13:42:59] halfak: it was only vaguely floating around, but IMO that's the correct solution...
[13:43:43] halfak: fwiw, I can help with puppet changes, esp. now I can +2 them myself next week :)
[13:43:49] halfak: but yeah, still an interruption to your work
[13:44:18] halfak: so idea being you will get a labs-like environment gated to researchers, *but* have access to the other stuff you guys use (hadoop, analytics-store, etc)
[13:44:20] I'm liking this "research labs" idea.
[13:44:22] halfak: and gated by people with an NDA
[13:44:25] *gated to
[13:44:45] so it can be smaller, but would still be a fair amount of money to get new hardware...
[13:44:57] This is something we can bug tnegrin about.
[13:45:02] well, I don't know exactly how much :) but it's something you need to bug toby about
[13:45:04] yeah, exactly
[13:45:51] uhh, don't know anything about that ^^^ but sounds awesome
[13:45:54] who's talking about that?
[13:46:17] o/ ottomata
[13:46:20] ottomata: at the moment, me :) but idea was planted by someone else at some point in a single sentence...
[13:46:23] a long time ago...
[13:47:03] ottomata: so idea being you set up a single machine with OpenStack on it, have a small interface for it gated to people with an NDA, and also expose Hadoop, analytics-store, etc to it
[13:47:28] ottomata: it'll have limited to no public accessibility (no public IPs, HTTP Proxies, etc) so exposing data will still need to be done in some other way
[13:47:39] ottomata: but this would mean you can have a simple OS installation that researchers can treat truly as 'cattle'
[13:47:44] ottomata: and use and throw whenever they want...
[13:49:03] YuviPanda: people wouldn't be able to use webproxy?
[13:49:05] ottomata: other options exist too, of course. we can run Mesos or something Docker based...
[13:49:24] ottomata: well, they can. idea is to limit the 'StatLabs' from the outside internet as much as possible.
[13:49:43] ottomata: by 'they can' I mean we can surely set one up, sure.
[13:50:16] aye, i think at the very least people will want to be able to use things like pip, etc.
[13:50:18] halfak: btw, what was your original request for the 500GB instance request?
[13:50:27] ottomata: oh of course! it will have access *to* the internet, same way as labs
[13:50:32] ottomata: just not access *from* the internet
[13:50:49] if that makes sense
[13:50:52] ok phew
[13:51:20] ottomata: heh :)
[13:51:28] ottomata: idea is for it to be way more fluid for researchers, not way less :)
[13:51:35] ottomata: so essentially it is 'labs with private data access'
[13:51:36] YuviPanda, for that instance, I would need it to be accessible from the internet.
I'd only be working with public data in that case.
[13:51:54] halfak: sure, but what exactly needed 500GB? MongoDB?
[13:52:15] Oh! Yeah. Some of the datasets I wanted to store. E.g. all diffs for English Wikipedia.
[13:52:37] I'd like to put a btree on it for an API and Mongo is a fine option in that regard.
[13:53:26] halfak: right, so for cases like this, we can also just set up Mongo (or RethinkDB (Or PG)) on a separate machine and make that available to labs
[13:53:32] halfak: btw, have you heard of gabriel's restbase project/
[13:53:33] ?
[13:53:43] i think he's going to have an API to revision data stored in cassandra
[13:53:45] halfak: labs already has access to a mysql database you can use, and also a postgres database you can use, which have a *lot* of space
[13:53:54] halfak: so if you can use postgres, what you wanted is already possible.
[13:54:36] YuviPanda, postgres can also store and index json docs. That could work.
[13:54:47] ottomata, we talked about it briefly.
[13:54:51] halfak: yup, and if you want I can get you a username/password for our labs postgres stuff.
[13:54:59] halfak: and you can start already, and can use the current instances themselves....
[13:55:01] ottomata, but I'm not sure what I'd do with it.
[13:55:38] YuviPanda, that would be great. Anyone going to get mad if I fill the DB with a few TB of rows and indexes?
[13:55:56] halfak: I'll check :) only others using it is OSM
[13:56:10] Open Street Map is on Labs!?
[13:56:25] halfak: no, it's not on a *labs* instance
[13:56:32] halfak: it's on a physical machine, which is accessible from labs
[13:56:39] halfak: not OSM itself, but our mirror
[13:57:26] ottomata, any links to docs around restbase?
[13:57:47] halfak: probably? but I don't have them.
[13:58:17] OK. I'll try to find gwicke when the west coast wakes up. :)
[13:59:31] halfak: not very useful:
[13:59:31] https://github.com/gwicke/restbase#usage
[13:59:32] but something
[14:00:53] This is very useful.
[14:01:06] Thanks ottomata.
[14:01:41] halfak: see -operations, you have postgres access with 4T of data storage...
[14:06:15] ottomata: looks like we might've found a solution for that now that's far simpler
[14:07:14] Indeed. This could work for me in the short- to mid-term.
[14:08:08] * halfak will need to re-read up about postgres' json storage system.
[14:10:43] Hey, check it out: I've generated a list of open access papers that aren't cited in Wikipedia anywhere, ordered by the number of times they're cited by other open access papers: https://tools.wmflabs.org/oar/bestuncitedlong2.html
[14:13:30] EdSaperia___, Oooh. I know what to do with the first few examples.
[14:13:55] The 4th article I should probably read.
[14:14:01] When this system is fully up and running, it will be a much longer list - and one that will constantly grow
[14:14:10] and ordered in a better way
[14:14:10] :)
[14:14:19] cool tool I think
[14:17:22] Agreed.
[16:55:53] Hello! I'm an engineer doing some research together with WMF Sweden. I have been running some database queries with the http://quarry.wmflabs.org/-tool and have gotten some really good results, but would like to run a few bigger queries that are timing out. I talked to Jan Ainali at WMF Sweden, and he said that perhaps someone here would be able to help me?
[16:56:23] * Emufarmers eyes YuviPanda.
[16:59:55] hey Icebear_!
[17:00:06] I've been trying to catch you.
[17:00:25] I have 15 minutes.
[17:00:37] I think you will want to get an account on Tool Labs.
[17:01:09] See https://meta.wikimedia.org/wiki/Research:Labs2/Getting_started_with_Tool_Labs
[17:01:38] That guide is intended to help you get from zero to running queries directly against LabsDB.
[17:02:39] morning!
[17:03:38] Hey Ironholds
[17:04:10] hey halfak :). How goes?
[17:04:26] Too many meetings. Not enough time in between.
[17:04:47] Hi Icebear_, note that "WMF Sweden" doesn't exist
[17:06:39] Nemo_bis, https://se.wikimedia.org/wiki/Huvudsida
[17:07:05] So, I'm confused.
[17:07:12] WMSE != WMFSE
[17:09:41] Gotcha.
[17:11:05] All right, thanks!
[17:12:25] I have some well-written queries, but the quarry-tool times out with the max-limit of 10 minutes. Is it possible to run more extensive queries with Tool Labs?
[17:14:24] Oh yes, sorry I was referring to WMFSE, not WMSE.
[17:19:05] Sob. WMSE, Wikimedia Sverige, exists and Jan is its president. WMFSE / WMF Sweden doesn't exist. :) https://meta.wikimedia.org/wiki/WMSE
[17:20:59] Haha oh I see. I had the abbreviations mixed up
[19:26:45] halfak, continuing our theme of intelligent, interesting hip-hop
[19:27:10] Astronautalis's "Do You Believe In Life After Thug?"
[19:27:25] That title
[19:27:32] :)
[19:44:30] halfak, yup! Although now every time I see it I get the original stuck in my head
[19:44:34] (also, C++ debugged YASS)
[21:18:29] o/ heatherw
[21:18:51] Ironholds, finally getting to listen to that song. Will report back.
[21:19:28] hi! halfak
[21:19:48] hey heatherw!
[21:19:52] halfak, cool!
[21:19:59] I don't know if you're in this channel often, but I just noticed you join and wanted to say hello. :)
[21:19:59] Ironholds: xo
[21:20:09] :)
[21:23:26] oy, this is a weird day
[21:23:30] everything is so /quiet/.
[22:05:00] Ironholds: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaah
[22:07:04] not sure that helps
[22:12:15] Ironholds, song approved.
[22:14:24] gewd
[22:23:52] halfak, I remembered what I was going to tell you!
[22:24:07] our internal search logs deliberately don't contain any PII so we can't fingerprint, so using that dataset is probably out. Womp womp :(
[22:43:21] leila, it's your birthday?!
[22:43:36] I wish. ;-) why?
[22:45:08] Ironholds, do you have plans for me?
[22:45:42] leila, see analytics-internal
[22:48:53] okay. I accepted the gift
[22:49:02] it's absolutely fantastic!