[09:50:11] Ironholds: email is easy to fake. Sender addresses cannot be trusted unless pgp signed
[14:09:23] o/ Ironholds
[14:09:34] morning halfak :). How goes?
[14:10:43] Pretty good. I slept for about 20 hours yesterday.
[14:10:46] I didn'
[14:10:59] nice!
[14:11:02] t realize I needed it
[14:11:15] * halfak ignores rogue enter keys
[14:11:18] How you doin?
[14:12:37] trying to work out why C++ has started acting weirdly
[14:12:43] apparently -1 is >= 0
[14:12:50] * Ironholds throws hands up
[14:13:05] but, look at the language stats on https://github.com/Ironholds/WMUtils if you please
[14:13:10] Wha? Could be unsigned int?
[14:13:12] I finally got the UA parser's C++ working
[14:13:33] see, y'd think, only I changed it from != -1 to >=0 to /avoid/ compilers shouting at me
[14:13:49] still, let's check
[14:14:07] ...
[14:14:10] that is THE STUPIDEST THING!
[14:14:27] Either I break my code or the compiler scolds me for not breaking my code? Screw you, C++!
[14:15:15] * Ironholds goes through adding "signed int" to things. Bleh.
[14:15:33] but we have C++ based geolocation and UA Parsing now, so that's nice.
[14:15:52] I might try to port Myers' diff algorithm too
[14:17:49] Myers'?
[14:18:01] https://neil.fraser.name/software/diff_match_patch/myers.pdf
[14:18:14] There's a set of implementations called diff-match-patch it might be nice to have.
[14:18:24] Oh yeah.
[14:18:37] * halfak wonders if there's more to it than when he last looked.
[14:20:19] Am I right to conclude that this is slightly more performant LCS?
[14:21:59] from what I've read, preeeetty much
[14:23:07] halfak, love your email
[14:23:20] "personal interest means I'll probably look at it by the end of the calendar year. Probably"
[14:23:26] but it's okay, I hear we're not overworked.
[14:24:43] Yeah. I really wish I didn't have to have so many qualifiers.
[14:25:29] just think about how fulfilling the final will be as a result, though!
[14:27:23] oh.
I wonder if it's "I can't compare a signed and unsigned int, unsigned is more basic, so convert -1 to 0"
[14:27:35] that'd make sense. It'd be stupid but it'd make sense.
[14:30:56] * YuviPanda sets up a day long meeting with halfak
[14:31:09] Ironholds: 'unsigned is more basic'?
[14:31:23] C++ defaults to signed.
[14:31:27] what exactly was it complaining?
[14:31:27] YuviPanda, hypothesising as to why the hell this is happening
[14:31:34] paste the warning?
[14:31:48] well, the complaint was "comparing signed and unsigned ints", which is true
[14:31:57] I had {range down to -1} != 0
[14:32:07] however, {range down to -1} >= 0 /always comes up as true.
[14:32:28] which, my only theory is some sort of conversion of the signed int for comparison purposes.
[14:32:52] ah, that sounds like compiler being stupid or something super subtle
[14:33:06] * YuviPanda suggests compiling with clang, which has much better error messages than gcc
[14:33:08] err, g++
[14:33:33] heh
[14:34:03] well, I just signed it by switching out if( possibly_unsigned != -1) for if((signed int) possibly_unsigned != -1)
[14:34:08] *solved it
[14:34:25] my entire weekend has been stuff like this. For a three hour period on Sunday, I actually forgot how to write R
[14:34:45] hmm, there's no possibly unsigned, I think.
[14:34:50] signed vs unsigned refers to layout of the bits in memory
[14:35:00] point me to commit?
[14:35:02] well, it also refers to the possible /range of values/
[14:35:02] * YuviPanda is just curious
[14:35:08] which I think is what it means
[14:35:14] true, but the compiler doesn't know about that.
[14:35:48] yeah. Who knows?
[14:35:58] It's simultaneously working and not complaining now, so I'll call that a win.
[14:36:04] (compiling with C++, but --pedantic)
[14:36:07] *G++
[14:36:22] heh
[15:06:58] o/ YuviPanda
[15:07:51] hi halfak!
[15:08:11] * halfak is surfing the email backlog this morning.
[15:08:17] How you doin' YuviPanda?
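A note on the "-1 is >= 0" puzzle from the exchange above, as a minimal sketch rather than the actual WMUtils code (which is not shown in this log). When a signed int is compared with an unsigned int, C++ converts the signed operand to unsigned, so -1 wraps to the largest unsigned value instead of becoming 0. That would explain why `>= 0` always came up true, while the original `!= -1` check probably behaved as intended despite the warning. All function names below are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <limits>

// Spells out the implicit conversion: a signed value converted to
// unsigned int wraps modulo 2^N, so -1 becomes the largest unsigned int.
unsigned int as_unsigned(int value) {
    return static_cast<unsigned int>(value);
}

// Mirrors a mixed comparison `lhs >= rhs` (int vs unsigned int) with the
// conversion written explicitly. A negative lhs turns into a huge value.
bool mixed_greater_equal(int lhs, unsigned int rhs) {
    return as_unsigned(lhs) >= rhs;
}

// Range-aware alternative: a negative lhs is never >= an unsigned rhs.
bool safe_greater_equal(int lhs, unsigned int rhs) {
    return lhs >= 0 && as_unsigned(lhs) >= rhs;
}

// The cast quoted in the chat: view the unsigned value as signed before
// comparing with -1. (Implementation-defined before C++20 for values above
// INT_MAX; two's complement platforms map UINT_MAX to -1.)
bool differs_from_minus_one(unsigned int possibly_unsigned) {
    return static_cast<int>(possibly_unsigned) != -1;
}

// Exercises the cases discussed in the chat.
void check_examples() {
    assert(as_unsigned(-1) == std::numeric_limits<unsigned int>::max());
    assert(mixed_greater_equal(-1, 0u));    // the surprising "-1 >= 0"
    assert(!safe_greater_equal(-1, 0u));    // what was actually meant
    assert(safe_greater_equal(3, 2u));
    assert(!differs_from_minus_one(as_unsigned(-1)));
}
```

Calling `check_examples()` from `main` runs the assertions. If the value comes from something like `std::string::find`, comparing against `std::string::npos` avoids the cast entirely, and C++20 offers `std::cmp_greater_equal` in `<utility>` for sign-safe comparisons.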
[15:22:47] halfak, apparently my rebuttal draft was exactly what Scott was going to say. VALIDATION.
[15:23:04] Woot!
[15:23:08] I had no idea I could make a career out of telling people with PhDs to suck it.
[15:23:12] halfak: pretty great :D
[15:23:16] halfak: back in a city now
[15:35:11] YuviPanda, which city?
[16:09:36] halfak: Bangalore
[16:10:18] Holy non-gridness.
[16:10:51] * halfak is used to the mathematical simplicity of midwestern grid cities.
[16:11:38] heh
[16:11:41] you'd like Boston!
[16:11:49] it's almost entirely organic growth
[16:15:18] hey halfak, could we push our 1:1? Got a doctor's appointment :(
[16:15:21] Check out Salt Lake City. It's the most griddy grid I've ever seen.
[16:16:01] They have big grids with organic-looking growth on the inside.
[16:16:06] It's an interesting strategy.
[16:20:50] huh!
[16:21:01] sounds almost Pattern Language-y
[16:28:17] halfak: heh :)
[16:28:24] halfak: this is terrible city, yes.
[16:28:28] halfak: before this I was in Kovalam
[16:29:57] 12 hour bus ride?
[16:30:11] Wow. gmaps shows a flight when I ask for directions
[16:30:30] I want that in the US.
[16:34:20] * halfak pulls up quarry to share a query.
[16:34:21] :)
[16:34:26] <3 quarry
[16:37:39] quarry is pretty friggin' awesome
[16:39:12] If YuviPanda had a nickel for every barnstar on his talk page, he'd not be able to carry them all at once.
[16:39:13] https://en.wikipedia.org/wiki/User_talk:Yuvipanda
[16:39:53] I could always use more of 'em
[16:40:01] halfak: it was a 12h bus ride. I got an AC sleeper bus last week, was fun
[16:40:12] halfak: rtnpro is sending patches to quarry in the analytics channel
[16:40:20] :D
[16:40:59] also, more barnstars please :)
[16:41:39] Just gave you a new one. I realized I hadn't yet.
[16:42:47] :D
[16:42:48] yay
[16:44:08] hmmm.. Pulling data from hdfs is pretty slow.
[17:14:53] n+1 barnstars
[17:25:17] :)
[17:25:27] Stars for all the barns
[17:32:52] Ironholds, are we meeting?
(there is no DarTar for now)
[17:53:54] DarTar, I'll be meeting with Mike Bernstein today. If you have specific projects/ideas you'd like me to discuss with him, let's chat or let me know here.
[17:55:03] nice – his team is doing cool stuff, I cannot think of any project they would be immediately a good fit for but do keep me in the loop
[17:55:29] okay. cool!
[18:42:49] hey halfak
[18:44:22] Check out the first gadget on https://test.wikipedia.org/wiki/Special:Preferences#mw-prefsection-gadgets
[18:44:37] and see the result here: https://test.wikipedia.org/w/index.php?diff=219084
[18:46:49] (and also here: https://test.wikipedia.org/w/index.php?diff=219084&uselang=pt )
[18:46:50] ooh
[19:02:15] ToAruShiroiNeko: ^
[19:18:16] halfak, Helder: can you guys mail me a quick status update on this project, this looks amazing but I haven’t followed the latest developments, specifically gadget-oriented
[19:36:59] DarTar, I think the status is that I made some mocks and Helder made a JS skeleton to demonstrate how it could work.
[19:37:41] Helder, do you have a repo I can submit pull requests to?
[19:39:41] halfak: I can create one right now
[19:40:02] I was just testing it locally and wanted to put it somewhere others could see it, so I added to testwiki
[19:40:35] Cool :)
[19:41:05] I want to explore how difficult it would be to make it highly modularized so that similar hand-coding work can be done with it as well.
[19:41:17] e.g. Mass Article Quality classifications.
[19:41:26] Or edit type classifications.
[19:43:52] halfak: https://github.com/he7d3r/mw-gadget-QualityCoding/blob/master/src/QualityCoding.js
[19:53:52] ottomata, re sharing a public ssh key with ops via gerrit. do you have link to instructions for that? for example, I'm not sure what repository should be used.
[19:56:12] leila, i think that means adding the key to puppet
[19:56:42] e.g.
[19:56:42] https://github.com/wikimedia/operations-puppet/blob/production/modules/admin/data/data.yaml#L1195
[19:56:46] adding another entry in this file
[19:56:53] i'm not really sure how that verifies the key though
[19:57:39] yeah. we can give this a try. thanks!
[19:58:18] er
[19:58:20] * Ironholds headscratches
[19:58:31] this got a lot harder some time after I joined
[19:58:38] I just IRCd it to Dario at like, 11pm at night
[19:58:42] Ironholds, talking to me?
[19:58:50] yup ;p
[19:59:12] yeah. I'm unclear about what we should do.
[19:59:57] halfak, Helder: awesome I’ll try and look this up later today
[20:09:55] Dartar: the code is now on github: https://github.com/he7d3r/mw-gadget-QualityCoding/commits/master
[20:11:20] tnegrin, so what format changes did mobile want? Just finishing up the UUID generator so I was going to start rebuilding the session stuff
[20:59:01] tnegrin, request?
[20:59:25] With the next budget, can we change my title to Arghitecht?
[21:00:03] also, and more seriously, what format changes does mobile want around session data, because I was gonna work on that next.
[21:01:12] If we are changing titles, I want to change mine to SCIENCE MASTER
[21:03:31] all-caps mandatory?
[21:03:36] also, that's rather gendered.
[21:03:48] you can have SCIENCE MONARCH
[21:03:54] and dario SCIENCE OVERSEER
[21:03:57] I feel like "master" is one of those gender-optional words.
[21:04:29] yeah, I don't think that's a uniformly held opinion, though, and we should pick the least-likely-to-offend.
[21:04:54] leila, what do you want for your title?
[21:05:05] hey DarTar we were just changing your job title.
[21:05:32] halfak, okay, you can have SCIENCE MASTER if I can call myself Count Logula, how's that.
[21:05:49] but I don't wanna let go of Arghitect. That's too good.
[21:05:49] what's the context Ironholds?
[21:05:50] :d
[21:06:08] leila, oh, I was bored and toby isn't responding to my query about session format changes and I'm waiting for code to compile.
[21:06:22] thus ARGH?
[21:06:37] Argh-itect
[21:06:54] One who constructs "Argh"s
[21:07:15] Or I suppose designs the construction of "Argh"s
[21:08:57] exactly
[21:09:07] although really I'm more of an ArghFactory than an ArghConstructor.
[21:11:02] * halfak cycles through Dessa, Royksopp & Coheed/Cambria
[21:17:11] * YuviPanda wonders if he can get a honorary title
[21:17:23] 'hit and run terrible person'?
[21:18:20] wat
[21:18:21] lol
[21:26:40] halfak: for the second time I found myself copying https://github.com/halfak/Revision-Scoring#examples line by line to Python interpreter just to find out a few moments later that example.py contained the same code ...
[21:26:50] o.O
[21:27:48] I guess I should be reading the source instead of the docs.. heh
[21:29:21] heh. I wonder if there's a good way to just link to the code there.
[21:29:32] rst has inter-document links.
[21:32:05] halfak: any reason to use .rst instead of .md?
[21:32:28] pypi understands rst, but not md.
[21:32:28] because there seems to be that feature for md: http://stackoverflow.com/questions/7653483/github-relative-link-in-markdown-file
[21:32:34] hmm
[21:32:54] I wouldn't mind switching to md if you like.
[21:33:14] I think we'll have more people viewing our github repo than the pypi listing anyway.
[21:33:45] I don't have an opinion about that
[21:49:05] All of stat3's CPU belongs to me.
[21:49:13] (with low priority)
[22:51:09] hhmnm
[22:51:25] I wonder what'd happen if I implemented a TSV parser for the sampled logs
[23:00:17] Ironholds: see udp-filter
[23:00:51] udp-filter -h
[23:00:51] on stat1002
[23:02:13] neat!
my use case is somewhat different
[23:02:38] I want to be able to read the things in rly rly quickly into R, while not giving a hoot about most of the columns to save memory
[23:02:50] and fastread, which does that, is not fully functional yet
[23:03:02] halfak, ^ - feature I forgot to mention, you can say "oh, and don't give me colX, colY..."
[23:03:41] nice.
[23:03:59] 'cut' is a nice unix utility for doing that to a dataset on the fly.
[23:05:06] But I can see the utility of leaving a column be and just not loading it into memory.
[23:06:42] halfak: have a minute?
[23:06:52] Sure. What's up?
[23:06:55] https://github.com/halfak/Revision-Scoring/issues/4#issuecomment-64281775
[23:07:00] two issues
[23:07:12] ignore the issue with the regex for now
[23:07:25] in the end of that comment I mentioned an error with ">>> from revscores.language import Portuguese"
[23:07:28] Ahh. Yes. I tried to dig into this at some point.
[23:07:32] I should have filed a bug.
[23:07:48] So, there's not a nice way to specify "word characters" in python's regex.
[23:08:15] no \w ?
[23:08:17] o.O
[23:08:55] Hmm... I can't find my notes. This should be pretty easy to test if you want to fix the issue.
[23:09:07] How do you feel about writing up a unit test and patching it?
[23:09:20] If you feel sad about that, I can have a look later.
[23:09:37] so... right now I was interested in knowing if my workaround for the "ImportError: cannot import name 'Portuguese'" was the correct thing to do
[23:10:19] Yes it is.
[23:10:44] halfak: what is that __init__.py supposed to do?
[23:10:54] It sets up a module space.
[23:11:06] So, when you have /foo/bar/derp.py
[23:11:15] should it have an import for each supported Language?
[23:11:48] Sure. I think that's a fine pattern to follow now.
[23:12:07] We can have a chat about the implications later, but following the pattern I did for English should be good for now.
[23:12:36] I assume English is imported from .english.
[23:12:39] * halfak pokes around.
[23:12:51] from .english import English
[23:12:51] from .portuguese import Portuguese
[23:12:55] halfak: ^
[23:13:11] Yup
[23:13:46] Eventually, I think we'll want to refactor this. But that's a problem for later -- when it's a Saturday and it seems like I have all of the time in the world. :)
[23:14:07] * halfak sighs wistfully
[23:17:47] ewulczyn: sorry, I need to move our 1:1 again :-/
[23:17:52] tomorrow afternoon ok?
[23:18:18] +DarTar:sure, no problem
[23:18:28] awesome, thanks
[23:19:08] halfak: I added the import for now to avoid having the same problem later
[23:19:08] https://github.com/halfak/Revision-Scoring/commit/5d51decc5fc96c525dfccea30f6e87337e79aced
[23:19:53] ewulczyn, where does your code live, out of interest?
[23:20:56] It lives on my machine and on whatever server it runs (lutetium, stats)
[23:21:08] aha
[23:21:16] Can I suggest throwing the more regularly used stuff up on github?
[23:21:49] +Ironholds I would love to! I just haven't been sure what is public
[23:22:11] I put all the analysis code I write on github
[23:23:17] In a public repo?
[23:23:26] Yup
[23:23:45] It's kind of a mandate at the WMF
[23:25:15] open-sourcing isn't open-sourcing if nobody can find it!
[23:25:23] obviously don't put datasets or your private key up there.
[23:25:38] Indeed.
[23:25:38] unless you want "Ryan" to be your nickname
[23:25:47] heh
[23:26:44] I do set a low threshold though. Once I work in a project once or twice it gets a repo. There's lots of disposable projects that I work on that are just a couple of queries. Those don't survive.
[23:26:52] Or go public.
[23:27:01] But if there's going to be a writeup, there's going to be a repo.
[23:27:19] Some of my code will reveal that there is a machine called lutetium, where on stat2 the .cnf file is, some database schemas etc. Is that ok?
[23:27:34] Yup
[23:27:43] wikitech reveals server names.
[23:28:22] db schemas are public
[23:28:27] Great, my work flow just got a lot easier.
[23:28:30] :)
[23:28:38] That's the spirit!
[23:30:49] aaargh, C++
[23:30:59] aaargh, Hadoop
[23:43:18] so, what's with the infinite perl on stat1002?
[23:43:32] I assumed they were Erik's scripts but they've been running constantly for the last 8 hours.
[23:43:52] (I mean, they're probably still Erik's scripts. It just seems unusual)
[23:44:03] also, what's "pigz"? ;p