[02:19:05] OK. I'm outa here. Have a good night!
[02:19:07] o/
[14:52:08] Hello, researchers.
[15:45:31] o/ guillom
[15:47:14] hey halfak
[15:53:43] :) wonderful day for some sciencey technology work.
[16:09:02] "What I found interesting, methodologically, is that for the analysis they had to exclude two mega-jerks as outliers." http://reagle.org/joseph/pelican/social/the-skew-of-rotten-apple-jerks.html
[16:16:37] ha! That doesn't mean the statistical analysis would have been sound if they had kept them around. What we really need is a larger N to accurately estimate the low probability of MEGA-JERK.
[16:17:29] The word "mega-jerk" would be funny if it wasn't so depressing.
[16:52:40] I enjoyed that read too
[17:01:04] I was just reading the Wikipedia article on thought disorders; the examples are quite funny even though they're indicative of serious mental disorders!
[18:25:11] Ironholds, you poked? (I was sick/recovering, yesterday)
[18:25:37] quiddity, oh, it's about the email thread. I'll reply there.
[18:25:42] why does Moushira think I'm a product manager?
[18:26:14] because staff page https://wikimediafoundation.org/wiki/Template:Staff_and_contractors#Search_.26_Discovery_Product_Management
[18:26:17] i assume
[18:28:24] ironholds works in S&D product *management* but isn't a *manager*
[18:28:27] he helps the manager, i presume?
[18:28:48] no, more "there's nowhere else to put me under the existing structure"
[18:28:50] that's changing
[18:28:52] i view product community liaisons to be deputy product managers
[18:30:07] that's nice. I'm not a community liaison so.
[18:30:30] indeed you're not!
[18:35:07] halfak: guillom I'm trying to corral more hardware for labsdb!!!
[18:35:27] Cool! Will be helpful for loading in quality scores?
[18:35:34] Also, aren't you sick? Shouldn't you be resting :P
[18:36:33] halfak: I am resting!
[18:36:40] halfak: could be.
[18:36:48] Oh. Just so long as it is restful :)
[18:36:53] RESTful ;)
[18:37:34] YuviPanda provides complete biographical and contextual information in every statement he makes.
[18:38:22] halfak: the loading quality scores into db, I was wondering if I should use something massively parallel to process the dumps to do that :P
[18:38:27] probably mw-utilities I guess..
[18:38:32] and just call into revscoring
[18:38:55] YuviPanda, +1
[18:39:00] That's what I'd do.
[18:39:09] You can leave it to me to produce a dataset.
[18:39:10] halfak: I wonder if I can appropriate a real-hardware machine for this and similar use cases.
[18:39:23] We have stat1003
[18:39:24] like stat* but in the labs vlan
[18:39:27] Yeah
[18:39:37] That'd be cool. I have been using ores-compute like that
[18:45:11] halfak: would such a machine be memory bound or CPU bound?
[18:45:56] YuviPanda, a bit of both. It depends on the job. E.g. there's a content persistence job that I'm going to kick off in a little bit that needs 8GB per CPU -- so the 12 core machine needs 90GB!
[18:46:05] But most jobs will be more CPU bound.
[18:46:28] halfak: so I was going to get either
[18:46:29] Dell PowerEdge R410, Dual Intel Xeon X5650 (2.66 GHz), 48GB Memory, (2) 150GB Disks (Old lsearch)
[18:46:29] or
[18:46:34] Dell PowerEdge R310, Single Intel Xeon X3450, 8GB Memory (2) 500GB 3.5 SATA
[18:46:41] big difference in disk space.
[18:47:01] halfak: would 300G (at best) storage affect dumps processing?
[18:47:18] Yeah. That's not sufficient to host the bz2 dump of enwiki
[18:47:21] ~500 GB
[18:47:55] halfak: ok, looking.
[18:48:02] halfak: can you approve OAuth request 6da7c708634269ac3a02e6cf7fec8e68
[18:50:32] YuviPanda, {{done}}
[18:50:45] halfak: https://phabricator.wikimedia.org/T106731 (don't comment there yet) but I've asked for a min of 1TB
[18:52:29] halfak: ores is going to be integrated into tools.wmflabs.org/crosswatch soon
[18:52:37] Oooh :)
[18:52:53] * halfak has been working on better language import patterns.
[18:54:14] * YuviPanda waves at sitic
[18:54:19] halfak: sitic is working on crosswatch :)
[18:54:26] sitic: halfak is behind ORES :)
[18:54:30] halfak: we might get Dell PowerEdge R420, single Intel Xeon E5-2450 v2 2.50GHz, 16GB Memory, (2) 500GB Disks
[18:54:35] halfak: that good enough?
[18:54:44] 16GB would be good for many types of work.
[18:54:52] halfak: and disks?
[18:55:09] Disk I'm less worried about. 1TB should be good enough.
[18:55:14] ok
[18:55:20] halfak: https://wikitech.wikimedia.org/wiki/Server_Spares is the scavenge yard
[18:55:32] 8 core?
[18:55:45] 16 with HT
[18:55:53] That should be pretty good.
[18:57:05] halfak: cool.
[18:57:16] halfak: I guess you shall still prefer Dell PowerEdge R420, Dual Intel Xeon E5-2440, 32GB Memory, Dual 300GB SSD, Dual 500GB Nearline
[18:57:42] Yeah... that seems like it's a better machine in all regards
[18:57:47] (judging by the specs)
[18:58:11] 12 cores, 24 with HT
[18:58:22] halfak: no, the nearline NAS means you get worse disk IO on the non SSDs (AFAICT)
[18:58:45] Not too worried about that.
[18:58:55] Big data crunching usually means append-only
[18:58:59] and compress on write.
[18:59:01] like, an order of magnitude worse IO?
[18:59:03] I see.
[18:59:38] halfak: also if we RAID them we might get only 500GB, and if we don't RAID them we should be ok with disk failure (which is ok perhaps)
[19:00:39] YuviPanda, only real disk failure scenario I'm worried about is in the case of a long running job
[19:00:55] Where a failure in the middle of that or a transfer would result in days of lost work.
[19:01:10] halfak: ok. technically I'm requesting this as test machine for doing labs on hardware tests, but the first allocation I have in mind is you anyway :)
[19:01:49] Woot. I'm happy to run some tests with some XML processing jobs I have planned.
[19:02:08] * halfak just figured out how to solve language import issues.
[19:02:09] \o/
[19:02:22] Soon you will not have to install ALL THE LANGUAGES to use revscoring
[19:02:30] Just the ones you want to use.
[19:03:20] Now if I could only render this scipy issue irrelevant
[19:03:44] halfak: you should check the vagrant setup :D
[19:09:35] halfak: not having to install indonesian would be nice :)
[19:12:47] YuviPanda: Yay for more labs hardware!
[19:15:20] YuviPanda, yeah. That's what I was thinking.
[20:25:17] Bam https://github.com/wiki-ai/revscoring/pull/141
[22:03:58] halfak: running a bit late, joining us for grooming?
[22:04:32] DarTar, yup. hanging out with grace
[22:04:43] k coming up
[22:51:37] halfak: https://phabricator.wikimedia.org/T106760?workflow=create stub for you to fill out about enabling flow on the research namespace
[22:51:58] Thanks YuviPanda
[22:52:13] halfak: if you fill in rationale and reasons I can try pushing it through
[22:52:29] Going to need to RFC this, I think.
[22:52:35] Regretfully.
[22:52:54] halfak: I see. I shall stay away and leave it to the politicians then :)
[22:53:51] Meh either way, I'll get that rationale filled in.
[22:54:54] halfak: ok!
[22:58:30] * halfak jumps on bike to race home to feed dog.
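
The hardware sizing discussed in the log (memory per core, two small disks versus a ~500 GB dump, RAID versus no RAID) comes down to a few lines of arithmetic. The following is only a rough sketch: the function names are hypothetical, it assumes one job per core, it assumes "RAID" means mirroring (RAID 1), and it takes the "~500 GB" dump size from the log as an approximation.

```python
from typing import Sequence

# Approximate size of the enwiki bz2 dump, as quoted in the log ("~500 GB").
ENWIKI_BZ2_DUMP_GB = 500


def memory_needed_gb(cores: int, gb_per_core: float) -> float:
    """Total RAM needed if every core runs one job using gb_per_core (assumption)."""
    return cores * gb_per_core


def usable_disk_gb(disk_sizes_gb: Sequence[float], mirrored: bool) -> float:
    """Usable capacity: the smallest disk if mirrored (RAID 1 assumed), else the sum."""
    return min(disk_sizes_gb) if mirrored else sum(disk_sizes_gb)


def fits(required_gb: float, available_gb: float) -> bool:
    """True if the required space does not exceed what is available."""
    return required_gb <= available_gb


if __name__ == "__main__":
    # 12 cores at 8 GB each: 96 GB, roughly the "90GB" mentioned in the log.
    print(memory_needed_gb(12, 8))
    # Two 150 GB disks, not mirrored: 300 GB, too small for the dump.
    print(fits(ENWIKI_BZ2_DUMP_GB, usable_disk_gb([150, 150], mirrored=False)))
    # Two 500 GB disks: 500 GB mirrored, 1000 GB unmirrored.
    print(usable_disk_gb([500, 500], mirrored=True))
    print(usable_disk_gb([500, 500], mirrored=False))
```

Under these assumptions, mirroring the two 500 GB disks leaves room for the compressed dump and nothing else, which fits the request in the log for a minimum of 1TB.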