[01:50:00] Quarry: Query counter increases but draft query is not accessible when window is closed and query doesn't have a title - https://phabricator.wikimedia.org/T101394#1791638 (XXN) A similar problem. There are some queries without title and placeholder 'None' does not appear instead and i can't access query even...
[22:31:57] halfak: I got a flow board for quarry! https://www.mediawiki.org/wiki/Talk:Quarry
[22:32:24] Woot!
[22:32:47] halfak: it's the wrong wiki (should be meta) but unfortunately I do not have a firstborn to sacrifice
[22:32:52] so it's to be mw.org
[22:33:23] YuviPanda, what do you think about embedding?
[22:33:48] Sounds like it would be difficult and buggy, but could also pull a lot more conversation.
[22:34:13] halfak: yeah, mostly I don't know how auth will work there
[22:34:29] halfak: I can also say 'fuck all of this' and embed an entirely different discussion system :)
[22:34:43] Ha. Yet another discussion option.
[22:35:00] Maybe we could go the other direction and embed a query on a wiki.
[22:35:18] like http://posativ.org/isso/
[22:35:29] yeah but that'll require I don't know what
[22:35:31] a gadget?
[22:36:21] Yeah. A gadget
[22:36:23] :)
[22:36:43] * halfak imagines the Quarry extension
[22:36:54] ;)
[22:37:23] halfak: there used to be a Special:Ask :)
[22:37:27] way baaaaack
[22:38:02] Oh yeah. I never used that but I talked to Staeiou about it.
[22:38:13] :)
[22:38:19] I need to spend some time on Quarry
[22:38:25] that probably counts as self care too
[22:39:48] :) "fun" work
[22:40:05] * halfak writes test cases for Amir's wikibase feature extractors
[22:40:30] It currently uses the API for its test cases.
[22:40:40] I do not like non-determinism.
[22:42:25] * YuviPanda is slowly getting back to doing 'development' too
[22:42:37] heard a podcast about sklearn earlier this week
[22:42:55] Cool. What bits did they talk about?
[22:43:43] halfak: it wasn't that great (it was the 'talk python to me' podcast, which has always felt a bit 'bro'y to me)
[22:44:02] halfak: but I guess I understood finally what 'feature extraction' means (I think?) and why everyone's talking about vectors :)
[22:44:25] :) We actually hide as much numpy stuff as possible in revscoring.
[22:44:36] I'd rather work the pythonic way in a system that needs to work.
[22:44:37] nice!
[22:44:39] +1
[22:44:54] I'll do vector math when I'm doing one-off statistical analysis.
[22:45:00] like, I understand 'so we transform text instead into a k/v pair of 'feature' and 'value'' or something
[22:45:16] Indeed.
[22:45:21] Just a collection of statistics
[22:45:26] Scalars and booleans
[22:45:28] yeah
[22:45:48] I think I kept thinking of vector as 'speed + direction' for a while and was super confused :)
[22:46:08] * YuviPanda blames his lack of 'real' schooling
[22:46:35] Yeah. That threw me off too. What's worse is that calc has both types of vectors.
[22:46:49] right
[22:46:53] and we think CS people suck at naming :D
[22:46:59] for some definition of 'CS people'
[22:47:39] "CS people" are mathematicians who program.
[22:47:54] "I'll just call this important variable 'b'"
[22:48:02] And this important variable 'c'
[22:48:06] mathematicians SUCK
[22:48:15] Except in this function that I hacked together.
[22:48:22] Mikhail taught me the basics of Bayesian analysis last week, and it involved maths
[22:48:27] And this code block that I copy-pasted from the internet
[22:49:10] lower-case a is a value, upper-case A is a variable, except when upper case A is an object space, in which case lower case a is the variable
[22:49:14] * Ironholds throws hands up
[22:49:36] nice
[22:50:04] lol @ math notion of a "variable"
[22:50:10] Seriously screwed me up.
[22:50:13] halfak, the worst is square brackets. Square brackets!
[22:50:18] they mean two different things in pure maths!
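The "transform text into a k/v pair of feature and value" idea can be sketched in a few lines of plain Python. This is a hypothetical illustration, not revscoring's actual API: the function name `extract_features` and the feature names (`chars_added`, `urls_added`, `is_blanking`, and so on) are assumptions chosen for the example. Each value is a scalar or a boolean, matching the "collection of statistics" description.

```python
import re
from typing import Dict, Union

# Hypothetical feature extractor (not revscoring's API): turns an old/new
# revision text pair into a flat dict of named scalars and booleans.
Feature = Union[int, float, bool]

URL_RE = re.compile(r"https?://\S+")


def extract_features(old_text: str, new_text: str) -> Dict[str, Feature]:
    upper = sum(1 for c in new_text if c.isupper())
    letters = sum(1 for c in new_text if c.isalpha())
    return {
        # scalars: signed differences between the two revisions
        "chars_added": len(new_text) - len(old_text),
        "words_added": len(new_text.split()) - len(old_text.split()),
        "urls_added": len(URL_RE.findall(new_text)) - len(URL_RE.findall(old_text)),
        # scalar in [0, 1]: share of letters in the new text that are upper case
        "uppercase_ratio": upper / letters if letters else 0.0,
        # boolean: non-empty text replaced by whitespace only
        "is_blanking": len(new_text.strip()) == 0 and len(old_text.strip()) > 0,
    }


if __name__ == "__main__":
    print(extract_features("Cats are mammals.", "Cats are mammals. See https://example.org"))
```

Because the function depends only on its inputs, it can be tested with fixed strings instead of live API calls, which avoids the non-determinism complained about earlier in the log.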
[22:50:30] I programmed before I had to deal with the "A is a random variable" thing.
[22:50:32] one of those things, also what square brackets mean in code! The other thing, not that! No way of easily telling the two apart!
[22:50:39] * Ironholds screams
[22:50:44] haha
[22:51:03] * YuviPanda has thankfully forgotten all these
[22:51:04] my_brain[mathematics] == lunacy
[22:51:20] YuviPanda, don't worry we're putting on a Bayesian analysis workshop for you at all-hands
[22:51:33] it'll aaaaaall come up again
[22:51:44] I'll bring my posterior
[22:52:07] Will we do some conjugating?
[22:52:11] ;)
[22:52:11] halfak, bring your prior instead
[22:52:20] we can inform it with the posterior and end up with a better prior for the next analysis!
[22:52:36] * Ironholds high-fives, pauses in mid-air, the word "SCIENCE!" appears on the screen, aaaaaand credits
[22:52:39] markov chains are the only thing I ended up enjoying
[22:52:45] YuviPanda, oh you'll love this shit then
[22:52:47] mostly because I generated a lot of crap text with it and it was super fun
[22:52:53] :D
[22:53:02] a lot of Bayesian analysis for complex datasets is heavily dependent on markov-chain based monte carlo methods
[22:53:15] I keep meaning to mix AN/I and Discordia into a Markov Chain and see what comes out
[22:53:26] Perl
[22:53:27] or AN/I and bible
[22:53:28] the answer is Perl
[22:53:29] hidden Markov chains, z0mg
[22:53:49] Ironholds: perl6 is a great language!
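The "generate crap text with a Markov chain" and "mix two corpora into one chain" ideas can be sketched as a word-level Markov chain. This is a minimal illustration under stated assumptions: the function names `build_chain` and `generate` are hypothetical, and the example uses placeholder sentences rather than AN/I or any real corpus. Mixing corpora simply means training one chain on both.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def build_chain(texts: List[str], order: int = 1) -> Dict[Tuple[str, ...], List[str]]:
    # Map each run of `order` words to every word observed right after it.
    # Passing several corpora in `texts` mixes them into one chain.
    chain: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for text in texts:
        words = text.split()
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain: Dict[Tuple[str, ...], List[str]], length: int = 20, seed: int = 0) -> str:
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))      # sorted so a given seed is reproducible
    order = len(state)
    out = list(state)
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if not followers:                  # dead end: no observed successor
            break
        out.append(rng.choice(followers))  # repeats in the list weight by frequency
    return " ".join(out)


if __name__ == "__main__":
    c = build_chain(["the cat sat on the mat", "the dog sat on the rug"])
    print(generate(c, length=10, seed=1))
```

Note that a hidden Markov model is a different thing: there the chain's states are not observed directly, whereas here the states are just the words.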
[22:53:53] no relation to perl5, thankfully
[22:54:04] is that the one they made great by making it unrelated to perl-yep
[22:54:14] yeah
[22:54:27] I love their replacement for Regexes
[22:54:34] it's super great, *actually* readable
[22:54:37] and I hope it catches on
[22:55:01] 'replacement for regexes'
[22:55:05] *googles regexes*
[22:55:08] *comes up with 'PCRE'*
[22:55:15] *googles 'PCRE'*
[22:55:17] P5CRE :D
[22:55:36] Larry Wall wrote this big post about how basically they fucked the world over by giving them PCRE
[22:55:39] my point is that Perl replacing regexes is like...Tony Blair coming up with a new system of government for Iraq
[22:55:40] which was great but also line noise
[22:55:51] you don't get to expect gratefulness when /you broke the thing/
[22:56:08] (Ironholds wins)
[22:56:12] at least Wall knows it. Google came up with a really interesting regex engine recently, actually
[22:56:32] it basically nixed backreferences and in exchange, runs consistently in polynomial time
[22:56:35] downside, C++11
[22:56:48] https://github.com/yuvipanda/perl6-Ident-Client/blob/master/lib/Ident/Client.pm6#L7
[22:56:55] that's the nicest parsing code I've ever written
[22:56:59] and parsing things sucks
[22:57:44] Would not mind having a powerful parser syntax in python
[22:57:51] +1 halfak
[22:58:05] halfak: there's pyparsing but it's suppperr slooow
[22:58:17] also, http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1447023445.069&target=tools.tools-proxy-02.reqstats.line_rate that's the requests per second for toollabs!
[22:58:25] apparently we average around 60 per second now
[22:58:28] which isn't too bad
[22:58:35] (raw requests)
[23:00:25] YuviPanda, requests from labs, eh?
[23:00:29] :P
[23:01:30] requests *to* labs, Ironholds :D
[23:01:35] I'm giving a talk about it on Tuesday
[23:01:40] so gathering actual stats!
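The Google engine mentioned here is RE2. Its time guarantee (linear in the input size) comes from simulating an automaton over the set of possible states instead of backtracking, which is also why backreferences are out. The toy below illustrates that set-of-states (Thompson-style) simulation for a tiny regex subset: literal characters, `.`, and postfix `*`, with whole-string matching. It is not RE2's code; the subset and the function names are assumptions for illustration.

```python
from typing import List, Set, Tuple

# Toy regex subset: literal chars, "." (any char), postfix "*" (zero or more).
# Whole-string matching. Illustrates automaton simulation, not RE2 itself.
Token = Tuple[str, bool]  # (char or ".", is_starred)


def compile_tokens(pattern: str) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens


def closure(states: Set[int], tokens: List[Token]) -> Set[int]:
    # A starred token may match zero times, so state i can skip ahead to i + 1.
    result = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        if s < len(tokens) and tokens[s][1] and s + 1 not in result:
            result.add(s + 1)
            stack.append(s + 1)
    return result


def full_match(pattern: str, text: str) -> bool:
    tokens = compile_tokens(pattern)
    current = closure({0}, tokens)
    for c in text:
        nxt: Set[int] = set()
        for s in current:
            if s < len(tokens):
                ch, starred = tokens[s]
                if ch == "." or ch == c:
                    nxt.add(s if starred else s + 1)  # a starred token may repeat
        current = closure(nxt, tokens)
        if not current:            # no live states left: fail early, no backtracking
            return False
    return len(tokens) in current   # accept when every token has been consumed
```

Each input character touches at most `len(tokens) + 1` states, so the work is bounded by `len(text) * len(pattern)`. A pattern like `a*a*a*a*a*b` against a long run of `a`s, which can blow up a naive backtracking matcher, finishes immediately here.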
[23:01:41] oh really
[23:01:49] yeah
[23:01:50] do you have any stats on requests from labs :D
[23:01:56] Eugh, I'm giving a talk tomorrow morning
[23:01:57] actually we don't :P
[23:02:00] and another one on the 20th. FML.
[23:02:02] which is kind of sad
[23:02:11] we can at best do 'maaaybe any one of these things'
[23:02:39] Ironholds: did you try to get people to bring back the 'no sending requests without UA or you will get blocked' thing?
[23:02:45] YuviPanda, I know, I'm just goading you about those WDQS requests
[23:02:50] oh hell no, I want people to send without UA
[23:02:57] oh?
[23:03:22] Operations is far too intransigent to actually do anything about the problem in terms of rate limiting, because hey, it's not an operational concern, it just happens to be only addressable by people who run the servers
[23:03:34] in the absence of that, blocking NULL UAs just guarantees we will have automata traffic we can't see
[23:03:35] Ironholds, did you ever respond re. presenting at the Nov. showcase?
[23:03:49] yeah, I guess if it isn't actually causing service disruption I'll tell Coren to look at it on Mon/Tue/Wed, and check back on Tuesday (when my conference ends) (re: WDQS)
[23:03:50] not blocking NULL UAs...'no UA' is a really good one-way heuristic for 'automated bullshit' :D
[23:04:02] YuviPanda, I mean, nothing has broken yet, except /all of our analytics/
[23:04:05] I quite like our analytics
[23:04:29] halfak, I did not! I don't have anything interesting these days :(. I never work on much interesting any more.
[23:04:30] I still don't fully know if I should drop prepping for my talk and track this down or not...
[23:04:33] Which reminds me to kick Nate about that paper
[23:04:33] should I?
[23:04:47] YuviPanda, naw, damage is already done, I can bollock them whenever
[23:04:53] cool cool
[23:05:06] but one of my red flags is people declaring something not a service disruption because hey, the lights on the server still blink ;p
[23:05:17] Ironholds, what about your recent work with Disco? It would be cool to dig into a dashboard or to discuss some A/B results.
[23:05:44] my definition is 'will I call someone up if it is 2AM if they are the only person who can fix this?' and if the answer is 'no' then not a service disruption, I guess
[23:05:49] Disco?
[23:06:09] YuviPanda, I mean, I could call you at 2am but I would be asleep, that's like 5am my time and I don't wake up before midday unless my bed is on fire
[23:06:21] and even then...how fast is the fire moving is my question
[23:06:39] halfak, Mikhail has some A/B results! I don't :(. I will have interesting portal commentary in early December, though?
[23:06:56] Gotcha. I'll need to prod Mikhail
[23:07:03] One of the reasons we had this workshop was me not knowing how to do bayesian analysis, which was a blocker on A/B testing reporting, so Mikhail has been doing all'o that
[23:07:05] I just...abide.
[23:07:15] Like a dude
[23:07:18] :D
[23:07:19] Ironholds: see, it's all complicated :)
[23:07:36] halfak, something like that ;p
[23:07:47] okay, slides almost done
[23:07:56] woah, the generic project proxy gets about 90 reqs/s
[23:08:02] halfak, see I've been publishing nothing interesting but I have to talk to two different colleges this month
[23:08:02] which is about 50% more than tools
[23:08:06] which I wasn't expecting
[23:08:08] * Ironholds throws hands up
[23:08:08] nice
[23:08:16] I'm not doing any work!
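The Bayesian A/B reporting mentioned here and the earlier prior/posterior joke fit together: for click-through or conversion metrics, one common Bayesian approach is a Beta-Binomial model, where the posterior after one experiment can serve as the prior for the next. This is a minimal sketch assuming a uniform Beta(1, 1) prior and made-up counts; it is not Mikhail's actual analysis, and the names `Beta` and `prob_b_beats_a` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "Beta":
        # Conjugacy: Beta prior + Binomial data -> Beta posterior, so updating
        # is just adding counts. The result can be reused as the next prior.
        return Beta(self.alpha + successes, self.beta + failures)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def prob_b_beats_a(a: Beta, b: Beta, draws: int = 20000, seed: int = 0) -> float:
    # Monte Carlo estimate of P(rate_B > rate_A) using independent posterior draws.
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(b.alpha, b.beta) > rng.betavariate(a.alpha, a.beta)
        for _ in range(draws)
    )
    return wins / draws


if __name__ == "__main__":
    prior = Beta(1, 1)                  # assumed uniform prior
    a = prior.update(120, 880)          # hypothetical: 120 clicks out of 1000
    b = prior.update(150, 850)          # hypothetical: 150 clicks out of 1000
    print(a.mean(), b.mean(), prob_b_beats_a(a, b))
```

With those made-up counts the estimated probability that B beats A comes out well above 0.9. The design choice here is to report that probability directly instead of a p-value.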
[23:08:20] * YuviPanda hasn't done anything interesting in one year now
[23:08:25] I think Quarry was the last 'interesting' thing I did
[23:08:34] rest has just been walking around a house cleaning up puke
[23:09:17] pretty much the same
[23:09:27] * YuviPanda sanitizes hand
[23:09:27] 'hey Oliver, what have you published this year?' 'code review'
[23:09:31] * YuviPanda hi5s Ironholds
[23:09:35] * Ironholds hi5
[23:09:38] * Ironholds sobs quietly
[23:09:45] I need to email people about papers, now I have some free time coming up
[23:09:55] I'm helping halfak with one!
[23:09:57] I think
[23:10:05] although I haven't really done much to help so far
[23:10:17] except to show up at meetings and go 'what exactly are you guys talking about?'
[23:10:29] that's pretty much what I did last time halfak and I published together
[23:10:43] more seriously, my man is a lean mean research machine. Not doing as much as him does not mean doing nothing
[23:11:01] it means you are human and not in fact a jolly bearded robot powered by LaTeX and CC-0 datasets :D
[23:11:04] * YuviPanda meets with halfak for a few days, goes on vacation for a while, comes back in a few months angrily demanding credit for all the papers halfak ever publishes
[23:11:31] Sounds strangely familiar ... :/
[23:11:40] heh
[23:11:53] what reference am I missing? :p
[23:12:14] Seriously though, you guys are awesome collaborators. I just work really hard to be a good project lead to work with.
[23:12:20] and you are a fantastic one!
[23:12:26] +1
[23:12:28] YuviPanda, unfortunately you cannot meet with him, take a week off and then demand credit
[23:12:29] Oh, some researchers who I was hoping to work with on the Wikidata vandalism detector
[23:12:32] halfak has already got his PhD
[23:12:38] you can't be his supervisor
[23:12:39] * Ironholds rimshots
[23:12:53] the ops team sees halfak as a good example of 'how to do SOA right'
[23:13:08] despite revscoring being like, the 7th-8th service we are going to deploy
[23:13:09] Lydia introduced us. I warned them that we were going to be able to deploy a model in "a couple months" so we're either working together or duplicating work.
[23:13:20] They wanted to work on their Java thing and not contribute to our live system.
[23:13:32] Java ruins everything
[23:13:42] ^
[23:13:44] So we met with them a couple times to help them develop their feature set (terribly naive or intractable) and continued our work.
[23:14:14] I hadn't heard from them in a while, but they were upset when they heard that we had deployed a working model for Wikidata.
[23:14:14] YuviPanda, oh you can't just ruin everything! Not without instantiating a BeanFrameRuinerConstructorFactory!
[23:14:17] halfak, wat
[23:14:18] one of the things I eventually want to do is 'given this patchset who are the most likely reviewers for it?' which I guess can be nicely machine-learned
[23:14:33] but...you said. you were going t.
[23:14:38] YuviPanda, woah there, woah there
[23:14:39] They wanted credit. I asked, "I'd love to have you contribute, but I don't see any contribution."
[23:14:49] YuviPanda, that is a very good example of where algorithms can do Evil.
[23:15:08] Long story short, there was an angry email thread and now we're not friends anymore.
[23:15:13] Boo :(
[23:15:14] * Ironholds hugs
[23:15:24] (it's not anyone I have to deal with, right?)
[23:15:27] Thanks Ironholds
[23:15:35] * YuviPanda provides hugs too
[23:15:45] :)
[23:15:54] * YuviPanda takes a tiny amount of credit for ORES and keeps it in a box
[23:16:03] YuviPanda, so, on selecting reviewers through a predictive model, things to keep in mind
[23:16:10] I've been troubled by this all week -- double checking everything with Amir to make sure we didn't make a mistake and overlook some contribution.
[23:16:22] 1. you are relying on pre-existing events, which means biasing against new blood
[23:16:24] YuviPanda gets a substantial amount of credit for ORES.
[23:16:35] 2. you are relying on pre-existing events, which means perpetuating systemic bias that may live in how humans interact
[23:16:50] halfak, has anyone made the obvious joke about ORES yet?
[23:17:01] In paper terms it's a real gold mine
[23:17:38] :D
[23:17:49] But more seriously, I am sorry you have been so troubled. I've not had that precise scenario but it can be very awkward and unhappy when collaborative relationships break down, particularly if you are a good person (as you are) and orient towards possibly being at fault
[23:18:00] Ironholds, this assumes that the machine gets a strong amount of agency.
[23:18:03] Ironholds: sure, hence only suggesting reviewers. because a good current problem is 'I did not know about that patch!' because people who are newcomers specifically do not know whom to add as reviewers.
[23:18:23] YuviPanda, I meant the inverse
[23:18:34] newcomers not being considered because they haven't been given patches before ;)
[23:18:40] halfak, it's true!
[23:18:49] sorry, I've been writing ethics slides all day and it has polluted my brain
[23:18:55] right, but this is targeted at newcomers who create patches and want non-newcomers to review them (since only non-newcomers can merge patches)
[23:19:01] I'm lecturing some happy excited students at Northeastern on Monday and I want to make sure they don't go all dark side
[23:19:15] No. It's good. Direct agency of black-box models is scary as hell.
[23:19:16] so you only want (people who have rights to merge said patch) to be considered as reviewers, I guess?
[23:19:20] E.g. Facebook's feed
[23:19:53] halfak: I've a nice chrome extension that blanks out the facebook news feed! that coupled with a 30min a day FB+HN+TWITTER+REDDIT+CRICKET limit, I've been doing better
[23:20:05] Cricket?
[23:20:07] YuviPanda, then instead of newcomers, consider people who are not newcomers but are new to a particular skill area, or are experienced in a particular skill area but (for whatever reason) not particularly thought of by meatbags
[23:20:21] yes would definitely bias against them
[23:20:44] and this could be good (strictly speaking I have merge rights to core. Do not ask me to write PHP) or bad (what if I learn to write PHP, and am pretty good at it, but it takes people a while to cotton on?)
[23:20:47] but isn't the 'current situation' that 'nobody gets added, and hence nobody looks at patch, patch rots and newcomer goes away'
[23:20:52] like I'm not saying don't do it
[23:21:00] I'm saying: be aware that a model trained on human events has some human limitations
[23:21:25] see also those abominable ML experiments in banking :/
[23:21:27] right, but if I modify file X in my patch, suggesting as reviewers people who have reviewed/written/merged code in X (or in patches that touched X and also touched Y and then there are people who just touched Y) seems like a useful start
[23:21:58] this also requires that I have 1. time, 2. understanding of how exactly to do this, which might both be unfortunately in short supply
[23:22:00] yeah, if you use as a baseline 'existing contributors' rather than 'existing reviewers', that sounds like it'd help
[23:22:20] yeah, both 'people who have merged code here' and 'people who have written code here'
[23:22:39] *thumbs up*
[23:22:40] I guess 'people who are probably familiar with the area of code you touched'
[23:22:54] in something like the Linux model this is very explicitly set up in the 'submaintainers' stuff
[23:22:58] we don't have anything like that
[23:23:11] honestly the last entity I want to copy any systemic or interaction design principle from is Linux
[23:23:13] in a lot of ways we are just a 'company that has code in the open' and less 'OSS project' anymore
[23:23:51] I mean we could copy from Linux but then our CR process would consist of an angry insecure scandinavian with no empathy rampaging around screaming 'fuck you and fuck your code, LINUS SMASH'
[23:24:00] and you don't really need a predictive model for that
[23:24:14] well, that's just confirmation bias IMO since there's a lot of patches that go around *without* that going on :)
[23:24:18] not defending Linus
[23:24:37] indeed, lots of patches work without him being an ass, which begs the question of why he feels any need to be an ass ;)
[23:24:38] but a lot of patches get merged and there's like 0 involvement from Linus at all
[23:24:45] right, so I think that's a problem with Linus
[23:24:47] agreed, and that's the way all projects should work
[23:25:07] but whatever I was saying I think was orthogonal - that there are clearly defined 'subsystem maintainers'
[23:25:09] but there are other elements of their culture and process that are reflective of this tolerance of intolerance
[23:25:11] for MW, for example
[23:25:14] *nod*
[23:25:15] if you want to touch EditPage.php
[23:25:22] there's no clear 'so who is going to look at this?'
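The "if I modify file X, suggest people who have reviewed/written/merged code in X" baseline can be sketched as a simple overlap count, with no machine learning at all. This is a hypothetical sketch: the history format (one record per past change, holding the files it touched and everyone who authored, reviewed, or merged it) and the name `suggest_reviewers` are assumptions, not an existing Gerrit or Wikimedia API.

```python
from collections import Counter
from typing import AbstractSet, Iterable, List, Set, Tuple

# Hypothetical history record: (files touched, people who wrote/reviewed/merged it).
Change = Tuple[Set[str], Set[str]]


def suggest_reviewers(
    patch_files: AbstractSet[str],
    history: Iterable[Change],
    exclude: AbstractSet[str] = frozenset(),
    top_n: int = 3,
) -> List[Tuple[str, int]]:
    scores: Counter = Counter()
    for files, people in history:
        overlap = len(files & patch_files)   # how many of the patch's files this change touched
        if overlap:
            for person in people - exclude:  # e.g. exclude the patch author
                scores[person] += overlap
    return scores.most_common(top_n)


if __name__ == "__main__":
    history = [
        ({"EditPage.php"}, {"alice", "bob"}),
        ({"EditPage.php", "Parser.php"}, {"bob", "carol"}),
        ({"Parser.php"}, {"dave"}),
    ]
    print(suggest_reviewers({"EditPage.php"}, history, exclude={"newbie"}))
```

Because it scores only past involvement, this baseline inherits exactly the biases raised above: people new to an area, or overlooked in it, never score. Treat its output as suggestions, not assignments.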
[23:25:24] *crickets*
[23:25:33] and then you have to spend time and energy shopping around
[23:25:42] *nod*
[23:25:46] and if you're new you don't even know *whom* to shop around with and where to shop around for
[23:25:51] oh god this is going to end wi- YuviPanda you need to stop this project.
[23:25:59] because I've written a lot of our Hive UDFs
[23:26:05] haha
[23:26:06] it's going to recommend I CR Java for money. Please no
[23:26:08] * Ironholds whimpers
[23:26:12] so you're responsible for that now :P
[23:26:19] but you are de facto responsible for the UDFs
[23:26:33] this reminds me I forgot to update the pageview documentation. Oops.
[23:26:36] * Ironholds goes to do that
[23:26:51] I wonder if there's value in identifying explicitly who 'de facto' maintainers of things are
[23:27:02] like Brad/Tim for Lua, Tim/JackMcBarn for parser stuff, etc
[23:27:05] absolutely
[23:27:16] but I know this from looking at patches pass through for ages
[23:27:21] most people don't
[23:27:43] how is it only 18:30?
[23:27:48] so having this be auto-generated (with appropriate - 'these are not the only people! THESE ARE JUST SUGGESTIONS BASED ON DATA!')
[23:28:09] * YuviPanda will ask Ironholds next time he has a Java question
[23:28:30] Ironholds: you should hang out in #wikimedia-ai sometimes :) Lots of ethics discussion there too
[23:28:46] YuviPanda, also make it a service that solves a general problem and let people decide how to use it.
[23:29:02] halfak: what do you mean
[23:29:06] how general?
[23:29:15] like, not 'wikimedia specific' general?
[23:29:17] So, if I wanted, I could intersect the review prediction with some minority signal to target good reviewers who don't get to review much.
[23:29:23] ah
[23:29:37] ^
[23:29:49] and make the system output not just the recommendations but the reasoning behind them
[23:29:54] +1
[23:30:02] Ironholds, that's basically impossible :(
[23:30:07] aw :(
[23:30:15] but another problem we have is that you add 15 people to review a patch and then bystander effect
[23:30:17] unless you do something very very simple and inelegant, I guess
[23:30:17] BUT that doesn't mean you can't also help people get an intuition for how the model works.
[23:30:23] Ironholds, yeah
[23:30:23] da
[23:30:32] YuviPanda, people do that? I add, like, 2.
[23:30:40] E.g. even with a black box model, you can ask "what would happen if I changed this one feature"
[23:30:43] One to review it and the other to feel bad enough that the first person didn't review it to make them
[23:30:47] Ironholds: there are patches that have the entire ops team added on them
[23:30:59] Where "this one feature" is "is newcomer" or "is south african"
[23:31:00] wat
[23:31:03] and ori and tim alongside for good measure
[23:31:25] yeah I aggressively either merge /review these or take myself out
[23:31:29] and most of the time it is the latter
[23:31:45] in that I do have the rights but fuck no I'm not touching Varnish
[23:31:47] halfak: +1
[23:31:52] * YuviPanda should be slowly learning these
[23:32:19] halfak: I keep being splintered between 'what should I spend my time on'
[23:32:31] halfak: spending time on things means less time for things like Quarry or the PyWikiBotAsAService
[23:32:48] *these
[23:32:57] * halfak really wants PyWikiBotAsAService
[23:33:02] me too
[23:33:08] I'd never use it, but other people would -- a lot
[23:33:11] I might actually do it with DockerSwarm to test that technology out
[23:33:14] yeah
[23:33:23] I've slowly developed a coherent theory of 'what I want to do'
[23:33:37] which is related to the 'commandline bullshit' article I shared with you, halfak
[23:34:02] ^ +1
[23:34:04] people want to answer questions, and that shouldn't require them to know: ssh, private/public key cryptography, screen, gridengine, rate limiting
[23:34:06] Incidental complexity
[23:34:16] so my hypothesis is that by removing each layer of incidental complexity
[23:34:22] you increase the pool of people who can do this
[23:34:24] YuviPanda, I wish someone would do that in Hadoop land.
[23:34:42] and also reduce the mental cost of doing this for the people currently doing it
[23:34:50] so Quarry does it for SQL, pwbAAS will do it for pwb
[23:34:59] PWBAAS is in fact just 'screen as a service'
[23:35:04] since that's what you're getting - a shell
[23:35:05] ^ +1
[23:35:12] halfak: have you seen try.jupyter.org?
[23:35:15] And some nice public code display
[23:35:41] halfak: so just like how there's an ipython notebook thing, there's also one for... bash
[23:35:44] * halfak clicks
[23:35:46] which I think is just 'screen as a service'
[23:36:00] halfak: try both their 'bash notebook' and their 'terminal'
[23:36:21] YuviPanda, will have to do it later. I need to run now.
[23:36:28] halfak: kk
[23:36:30] halfak: have fun!
[23:36:39] Have a good one YuviPanda & Ironholds
[23:36:40] o/
[23:36:44] take care two-rood!
[23:37:03] Ironholds: do also look at the R integration in try.jupyter.org if/when you have time
[23:37:08] ehhhh
[23:58:53] I'm running some stats against analytics-store
[23:59:00] if it fucks up stuff for other people you know whom to blame!
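The point that "even with a black box model, you can ask 'what would happen if I changed this one feature'" amounts to a one-feature counterfactual probe: score the same input twice, once with a single feature changed, and compare. This is a minimal sketch; `flip_effect` is a hypothetical helper, and `toy_model` is a made-up stand-in scorer, since a real model would be opaque. The probe measures sensitivity to one input at one point, not a full explanation of the model.

```python
from typing import Callable, Dict, Union

Value = Union[int, float, bool]


def flip_effect(
    model: Callable[[Dict[str, Value]], float],
    features: Dict[str, Value],
    name: str,
    new_value: Value,
) -> float:
    # Change one feature, hold everything else fixed, and report the score
    # difference. Only calls the model, so it works on any black box.
    changed = dict(features)   # copy: the caller's dict is left untouched
    changed[name] = new_value
    return model(changed) - model(features)


def toy_model(f: Dict[str, Value]) -> float:
    # Hypothetical stand-in for an opaque reviewer-suggestion scorer.
    score = 0.2 + 0.05 * f["past_reviews_in_area"]
    if f["is_newcomer"]:
        score -= 0.15
    return max(0.0, min(1.0, score))


if __name__ == "__main__":
    person = {"past_reviews_in_area": 4, "is_newcomer": True}
    print(flip_effect(toy_model, person, "is_newcomer", False))
```

A large effect from flipping a feature like "is newcomer" is the kind of signal that should prompt a closer look at how the model treats that group.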