[00:23:37] evening party people
[00:24:40] hi iron holds
[00:25:00] how goes, harej :)
[00:25:21] i am investing a lot of effort into something that probably is not as important as i think it is, but it's sunday so whatever
[00:26:05] what is it?
[00:26:37] Making Wikidata's coverage of WikiProjects all nice and consistent
[00:27:17] The goal is to have Wikidata serve as a suitable index of the different WikiProjects; part of that is fixing up what's currently there.
[00:27:22] But that reminds me.
[00:27:44] Have any of you developed any cool way to list WikiProjects that are related to other WikiProjects using some kind of automated function?
[00:28:10] * Ironholds thinks
[00:28:14] define "related to"
[00:29:15] so you go to WikiProject Medicine and it recommends WikiProject Biology, WikiProject Malaria, etc.
[00:29:27] But the list is generated automatically using computer magic.
[00:31:13] ohh
[00:31:23] uhm
[00:31:40] harej, what about using set theory?
[00:31:51] That's what I was thinking.
[00:31:56] so, you want, say...
[00:32:08] (1) a database of all the wikiproject tags on all the pages
[00:32:17] (2) take that and work out every permutation of wikiproject tags
[00:32:24] (3) work out how often each permutation happens
[00:32:25] (5) sort
[00:32:32] (6) a list of the projects that most commonly intersect.
[00:32:48] ?
[00:33:02] Yes
[00:33:35] Which seems fairly intensive; it'll probably be a lot of one-time generation and then updates every six months or whatever.
[00:34:38] and on a scale of 1 to 10 your technical skills sit..?
[00:34:49] "shitty PHP developer"
[00:35:03] "who hasn't written code in four years"
[00:35:20] python?
[00:35:29] Haven't learned Python. I guess I should at some point.
[00:35:39] * Ironholds nods
[00:35:42] oh, wait. I'm an idiot.
[00:35:52] this is just a table.
[00:36:03] well. If we set some filters.
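The set-theory approach sketched above can be tried in a few lines of Python. This is a minimal sketch, not anything the participants wrote: it assumes the tags have already been pulled into a mapping from each talk page to the set of WikiProject banners on it (the sample pages and tags are made up), and it counts co-occurring pairs rather than every permutation of tags, which keeps the counting tractable while still ranking each project's most common neighbours.

```python
from collections import Counter
from itertools import combinations


def pair_counts(page_tags):
    """Count how often each unordered pair of WikiProjects tags the same page."""
    counts = Counter()
    for tags in page_tags.values():
        # sorted() makes each pair canonical, so (A, B) and (B, A) share one key
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


def related_projects(project, counts, limit=5):
    """Return up to `limit` (other_project, shared_pages) tuples, most shared first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == project:
            related[b] += n
        elif b == project:
            related[a] += n
    return related.most_common(limit)


# Hypothetical sample data: talk page -> WikiProject banners on it
page_tags = {
    "Talk:Malaria": {"Medicine", "Biology", "Malaria"},
    "Talk:Quinine": {"Medicine", "Chemistry", "Malaria"},
    "Talk:Cell (biology)": {"Biology", "Medicine"},
    "Talk:Benzene": {"Chemistry"},
}

counts = pair_counts(page_tags)
print(related_projects("Medicine", counts))
```

Sorting by shared-page count corresponds to steps (5) and (6); filters such as project age or activity level would be applied to the ranked list afterwards.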
hrmrmrm
[00:36:18] age of the project; perhaps new projects get a nice shiny "new" button
[00:36:19] activity level
[00:36:41] Put it in Phabricator, and pitch it to your IEG overlords?
[00:37:08] I want to do IEG work. But, nobody's paying me to do IEG work, and I'm knee-deep in a paper. Hopefully I'll be able to convince someone to pay me to do IEG work at some point.
[00:38:12] "do IEG work" == you want to be awarded an IEG, or you want to volunteer for someone else's?
[00:39:01] the latter, only with less volunteer and more "it would be nice not to always answer random-ass questions for software peeps"
[00:39:08] it's a long-term rather than short-term goal
[00:41:46] so if i created this phabricator task, it would be part of the research project?
[00:43:12] (not that it would necessarily be automatically made your homework, but that it would in one way or another be a part of your department)
[00:44:42] harej, yeah, but whether it'd get dealt with...
[00:44:49] the R&D team just got a pile of unexpected work to deal with.
[00:44:58] oh?
[01:07:23] halfak, poke me if yer around later? :)
[04:07:47] evening leila
[04:07:58] evening Ironholds. :-)
[04:08:15] are you going to be in Philly or are you dealing with the gummint?
[04:08:35] I'm missing Philly.
[04:08:40] :-\
[04:08:49] aww
[04:08:52] well, I promise to drink your share
[04:08:56] * Ironholds alots himself an extra 0 beers ;p
[04:09:02] yeah, too bad we can't see each other
[04:09:14] it'll happen, I'm sure! I have to come back to SF eventually.
[04:09:16] haha, thanks!
[04:09:24] yeah, and then there is Wikimania
[04:10:28] I don't actually know if I'm going to Wikimania
[04:15:23] o/
[04:15:31] hey fhocutt :)
[04:15:41] hey Ironholds, how's it going?
[04:16:36] it goes okay!
Doing a random request for legal, which is always fun because they always want highly specific things that nobody else has asked for
[04:17:06] a result of which is me writing a bash script to simply run wc-l on log files but ONLY THOSE that match a particular regex and also they're all tarballed so can we unzip them first and remove them after and eeeeh.
[04:17:43] oh fun :P
[04:18:28] yerp. Yourself?
[04:19:51] finally went in and fixed the weird bug in the wikimetrics patch I've been theoretically-working-on for awhile now
[04:19:57] now it can be reviewed, yay
[11:48:12] ugggh
[11:48:16] Amtrack is entirely non-smoking.
[11:54:31] Ironholds: you stopped smoking too, no?
[11:54:32] also hi
[11:55:07] YuviPanda: I tried, but then my immune disorder kicked me in the ass.
[11:55:09] Literally!
[11:55:14] ow
[11:55:20] * YuviPanda gives Ironholds hugs etc
[11:55:25] I’ve been clean 3 months!
[11:55:32] and survived being part of groups of smokers
[11:57:04] nice!
[11:57:51] anyway, off to see sunset from a lighthouse with Alice before she flies back!
[11:57:54] cya later
[11:58:56] have fun!
[12:05:32] hey hare-j
[12:06:16] hi iron-holds
[12:07:16] harej, have you ever been on the amtrak?
[12:07:23] yes
[12:07:31] what is it like?
[12:08:09] i haven't taken intercity rail in other countries so i have no frame of reference. i consider it acceptable. it runs on time generally. you can go to the cafe car and buy alcohol
[12:08:18] sweet
[12:08:21] there is wi-fi
[12:08:45] amtrak is good in the northeast and nowhere else
[12:09:09] I'm only taking it to PA, so that's okay
[12:09:18] where in pennsylvania?
[12:11:19] Philadelphia!
[12:11:45] The City of Brotherly Love!
[12:11:53] so named because what the hell else is there to love about Philadelphia?
[12:11:55] ah. so basically you will be on a train for a few hours
[12:12:13] northeast regional i think will take you like 5-6 hours?
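The log-counting chore described above (count lines, as `wc -l` does, only in files whose names match a regex, where every file sits inside a tarball) can also be done without unpacking anything to disk. A minimal Python sketch using the standard `tarfile` and `re` modules; the archive paths and the pattern are placeholders, and matching the regex against member names inside each archive is an assumption about what "match a particular regex" meant.

```python
import re
import tarfile


def count_matching_lines(archive_paths, name_pattern):
    """Count newlines (what `wc -l` reports) in archive members whose names match name_pattern."""
    regex = re.compile(name_pattern)
    totals = {}
    for path in archive_paths:
        # "r:*" lets tarfile detect gzip, bzip2 or xz compression by itself
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if not member.isfile() or not regex.search(member.name):
                    continue
                handle = archive.extractfile(member)
                # Read the member in 1 MiB chunks straight from the archive, so nothing
                # is unpacked to disk and there is nothing to clean up afterwards
                newlines = 0
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    newlines += chunk.count(b"\n")
                totals[f"{path}:{member.name}"] = newlines
    return totals
```

For example, `count_matching_lines(["logs-2015-02.tar.gz"], r"\.log$")` (a hypothetical file name) returns a dict keyed by `archive:member` with one newline count per matching file.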
[12:12:48] (i was worried for a moment that pennsylvania-not-further-specified was referring to "some city that wasn't philadelphia," meaning you'd have to connect to the Keystone Light or whatever)
[12:13:03] excuse me, the "Keystone Service"
[12:13:26] there are cities in PA that aren't philly?
[12:13:39] several, depending on how liberally you want to define "city"
[12:13:48] pennsylvania is a very big state
[12:15:00] the city of York, Pennsylvania, is centuries younger than the New York. they thought they could pull a fast one on us. they failed.
[12:17:00] uhm
[12:17:27] you know the city of York, Yorkshire is milennia older, right?
[12:17:40] yes
[12:17:45] like, 8-7,000 BC milennia older
[12:17:48] presumaby that's what New York is named after.
[12:17:52] * Ironholds raises hand
[12:18:04] next question; do US history students have an unfair advantage on account of there being less of it?
[12:18:20] i'm not sure; i think we just study less history in more detail
[12:18:24] we all study you guys
[12:18:30] er, i don't know "all"
[12:18:31] i meant "also"
[12:18:53] ooh, which bits of our history did you study? ;p
[12:20:13] uhhh, somewhere between the 1400s and when we started having our own history to study
[12:20:53] cool!
[12:21:27] you're making me drudge up ancient memories, but i think we basically studied the colonial powers to put our own history into context
[12:22:25] that makes sense
[12:22:43] okay, back in a tick! Switching partitions
[13:50:35] morning milimetric :)
[13:50:39] do you have any wumphers yet?
[13:51:04] lol, what are wumphers
[13:52:30] WMF == wumph
[13:52:35] WMFer, by extension, == wumpher
[13:55:26] o/ Ironholds
[13:55:46] hey halfak! How goes it? :)
[13:55:58] Not bad. Not flying today :(
[13:56:09] I had my flight rescheduled for tomorrow morning.
[13:57:55] halfak: have you done any investigation into automated generation of "related wikiprojects" lists?
i discussed this with ironholds last night; he suggested using set theory.
[13:58:09] harej, aww, how come?
[13:58:18] halfak, rather
[14:02:40] Weather, I guess.
[14:03:01] You get in this evening, right?
[14:03:16] Ironholds, ^
[14:03:35] harej, can you link me to some discussion of "related wikiprojects"?
[14:03:45] halfak, yep!
[14:04:00] halfak: so you go to WikiProject Medicine and it recommends WikiProject Biology, WikiProject Malaria, etc.
[14:04:06] that is all i mean by "related"
[14:05:02] harej, does it seem like there's a need for that?
[14:05:02] my idea was pulling out the templatelinks table and building a big map of which projects most commonly appear on the same articles as which other projects
[14:05:16] so, Wikiproject:New Hampshire and Wikiproject:United States on 50,000 talkpages together? Neat.
[14:05:18] (yes, joseph is here)
[14:05:23] milimetric, cool! Hi Joseph!
[14:05:27] don't steal all the internet!
[14:05:37] [09:03:59] halfak: so you go to WikiProject Medicine and it recommends WikiProject Biology, WikiProject Malaria, etc.
[14:05:39] [09:04:06] that is all i mean by "related"
[14:06:15] harej, I think I get it. But do you *need* a related WikiProject browser?
[14:08:08] "need" is a subjective question. if i decide it is wikiproject x's product development priority, then yes.
[14:08:37] Well, why would you think this should be prioritized?
[14:08:43] Who would use this and for what purpose?
[14:09:11] Also, need is very much not subjective :)
[14:09:20] At least when it comes to the affordances of software.
[14:11:30] ...oh crap.
[14:12:05] So: Evidently my 1% skill is "having an autoimmune flareup just before a work trip"
[14:12:13] I would think it would be worth it so that WikiProjects could be more integrated together instead of being thousands of different silos. There is some overlap between projects and people should be readily connected to them. This would help other projects grow through a network effect.
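The templatelinks idea above amounts to grouping template transclusions by page. A sketch, assuming the rows have already been exported as `(page_id, template_title)` pairs and that WikiProject banners are templates whose titles start with "WikiProject"; both the row format and the prefix rule are assumptions, since real banner templates are not named uniformly. The hypothetical `shared_pages` helper gives the kind of figure mentioned in the log (New Hampshire and United States "on 50,000 talkpages together").

```python
from collections import defaultdict


def projects_by_page(rows, prefix="WikiProject"):
    """Group (page_id, template_title) rows into page_id -> set of WikiProject banners."""
    pages = defaultdict(set)
    for page_id, template in rows:
        # MediaWiki stores titles with underscores; show them with spaces instead
        title = template.replace("_", " ")
        if title.startswith(prefix):
            pages[page_id].add(title)
    return dict(pages)


def shared_pages(pages, project_a, project_b):
    """Number of pages whose banner set contains both projects (a set-subset test)."""
    wanted = {project_a, project_b}
    return sum(1 for banners in pages.values() if wanted <= banners)


# Hypothetical rows: (talk page id, transcluded template title)
rows = [
    (101, "WikiProject_New_Hampshire"),
    (101, "WikiProject_United_States"),
    (101, "Talk_header"),
    (102, "WikiProject_United_States"),
]
pages = projects_by_page(rows)
print(shared_pages(pages, "WikiProject New Hampshire", "WikiProject United States"))  # prints 1
```

Filtering on a title prefix keeps non-project templates such as `Talk_header` out of the map before any co-occurrence counting happens.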
[14:12:28] Ironholds: have you considered a causal relationship?
[14:13:39] I don't know how. I mean, I get more anxious around travelling but I don't think anxiety has been noted as a trigger
[14:14:11] anxiety triggers the product of chemicals, namely adrenaline
[14:14:50] ditto $HORMONES and triggers, though
[14:14:58] ulcerative colitis is stupidly vulnerable to many things but..
[14:15:08] I think you're looking for https://en.wikipedia.org/wiki/Cortisol
[14:15:19] Stress hormone.
[14:15:32] * milimetric has Cortisol in abundance, anyone need a donation?
[14:15:33] :)
[14:16:07] halfak: so i'm happy to batcave the whole time - joal and I are just hacking on "diff database" and related concepts
[14:16:16] harej, re. connecting WikiProjects, I buy the argument that WikiProjects ought to be more connected, but how does a WikiProject similarity algorithm help with that?
[14:16:23] right now I'm going to try that perfectly valid suggestion to use ujson
[14:16:28] milimetric, cool. I'll hop in shortly
[14:16:59] Oh! I did performance tests with ujson and it is slower :\
[14:17:04] believe it or not!
[14:17:49] " severe chronic stress can lead to increased
[14:17:49] inflammation."
[14:17:51] oh bollocks
[14:18:31] wait. HANGON A TICK
[14:21:42] halfak: there would be a list of related wikiprojects for whatever wikiprojects want to use it. the automation thing lets us do it faster than manually; it also lets use list wikiprojects we wouldn't have otherwise thought of.
[14:22:10] it could also help new wikiprojects potentially get off the ground; they could get a shiny "new" badge or something. this would be a list that wouldn't really be updated more than once a month.
[14:23:26] i will be right back.
[15:00:22] harej, it sounds like a WikiProject index would serve a similar need. Do we have one of those?
[15:01:05] the answer is "barely", but we are incidentally working on that.
[15:01:56] in fact, i would like for the index to live (in some form) on wikidata!
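On the question of how a similarity algorithm differs from raw counts: very broad projects share pages with almost everything, so raw co-occurrence tends to rank them first. One common normalisation, swapped in here and not something proposed in the conversation, is the Jaccard index: pages tagged by both projects divided by pages tagged by either. A Python sketch over a made-up page-to-projects map:

```python
def jaccard(pages, a, b):
    """Jaccard similarity of projects a and b over the pages they tag."""
    with_a = {page for page, tags in pages.items() if a in tags}
    with_b = {page for page, tags in pages.items() if b in tags}
    union = with_a | with_b
    # |A intersect B| / |A union B|, defined as 0.0 when neither project tags anything
    return len(with_a & with_b) / len(union) if union else 0.0


def most_similar(pages, project, limit=5):
    """Rank every other project by Jaccard similarity to `project`, ties broken by name."""
    others = set().union(*pages.values()) - {project}
    scored = ((other, jaccard(pages, project, other)) for other in others)
    ranked = sorted(scored, key=lambda item: (-item[1], item[0]))
    return [(name, score) for name, score in ranked if score > 0][:limit]


# Hypothetical sample data: talk page -> WikiProject banners on it
pages = {
    "Talk:A": {"Medicine", "Biology"},
    "Talk:B": {"Medicine", "Malaria"},
    "Talk:C": {"Biology"},
    "Talk:D": {"United States", "Medicine"},
    "Talk:E": {"United States"},
    "Talk:F": {"United States"},
}
print(most_similar(pages, "Medicine"))
```

This brute-force version rescans every page for each pair, which is fine for a once-a-month batch on a sample but would need the pair counts precomputed for a full wiki.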
[15:02:49] Every time someone mentions a WikiProject index, I get scared.
[15:03:06] How come?
[15:03:33] On it.wiki, when someone said "let's make an index!" they really meant "let's delete 200 WikiProjects so that the list looks better!"
[15:03:52] And even quoting Anthere rarely suffices https://meta.wikimedia.org/wiki/Keep_history
[15:04:41] Ah. Thankfully I am not in the business of deciding which WikiProjects live and which WikiProjects die.
[15:05:16] Well, everyone can propose deletion
[15:05:43] I didn't know that deleting a WikiProject would ever be desirable.
[15:06:04] You can always just replace the WikiProject page with a notice about the status and someone can revive if they want to .
[15:06:14] That's how I think it should work too.
[15:06:28] Yep
[15:06:50] Also, to the extent this is my job as a grantee, I am distinctly *not* in the business of making Community® decisions.
[16:11:00] YuviPanda|zzz, hey dude. You around?
[17:00:47] halfak: dinnertime
[17:00:49] But sup
[17:06:44] YuviPanda|zzz, milimetric and I are interested in that postgres instance in labs.
[17:06:50] Is that still a thing?
[17:06:58] How much space does it have?
[17:07:38] halfak: it has a lot of space
[17:07:41] Is still a thing
[17:07:50] I can get you guys an account if you would like
[17:07:54] Yes please :)
[17:07:55] yes please
[17:07:57] :)
[17:08:08] I'm out at dinner
[17:08:10] I'll make accounts soon
[17:08:14] Can you file a bug?
[17:08:14] eat! no worries
[17:08:15] no rush
[17:08:19] i'll phabricate it
[17:08:23] Put it under labs project
[17:08:27] Assign to me
[17:08:30] k
[17:10:25] Ty
[17:31:56] hey leila
[17:32:07] 2 minutes and I’ll be with you
[17:32:26] I understand Oliver is traveling, right?
[17:37:42] leila: I just saw your comment, happy to remove this from the calendaer
[17:55:54] thanks DarTar.
[18:09:02] leila: I just saw the UC spreadsheet, very interesting. Do you guys filter out bots and does this data include single page sessions?
[18:09:32] we don't consider sessions, DarTar, for now.
[18:09:56] bots are excluded the same way Ellery/Oliver exclude bots
[18:10:04] some of them may sneak through
[18:10:04] leila: to be clear, a client with a “trace” consisting of a single node would be included, correct?
[18:10:12] correct
[18:10:17] k cool
[18:10:40] what’s the next step with this data?
[18:10:46] I need a comScore data to compare the results. I only to Arabic's results and they are very close to comScore's
[18:10:50] I know tnegrin was curious about it
[18:11:23] got it, let me send out the new account request
[18:11:50] I'm definitely curious -- it would be good to have another source of data to compare with last-visited as well as comscore
[18:11:53] I just talk to tnegrin. The data is a backup for if things go wrong with last-month cookie implementation for whatever reason
[18:12:09] and that :)
[18:12:13] Otherwise, we know that that method is probably working better than what we've used, DarTar.
[18:12:19] makes sense
[18:12:31] leila: DM regarding comScore creds
[18:12:53] At this point, I'm curious, too. This data wasn't collected with the goal of counting, I"m curious how accurate it ends up being.
[18:13:17] thanks, DarTar.
[18:44:31] DarTar: should I let folks know that the google letter will be delayed due to illness?
[18:45:53] tnegrin: no, I can work on that, I’m slower than usual but around today. I asked Erik to chime in, I’m not sure his proposal will fly
[18:46:13] I also think that he wanted to sign it
[18:46:17] ok -- thanks Dario. we can chat on this if you want.
[18:47:06] tnegrin: prod Erik on your end if you see him, I want to close this asap
[18:47:22] will do
[18:49:10] "the google letter"
[18:49:26] sounds dire
[19:31:39] morning :/
[19:37:30] Ironholds, how are you feeling?
[19:37:53] lots of pain but nothing explicitly untoward currently happening
[19:37:54] so, progress!
[19:38:11] And at least I'm not my brother.
[19:38:21] You have a brother?
[19:38:29] I have two brothers; one of them has UC
[19:38:39] * halfak learns new things about Ironholds all the time :)
[19:38:43] he was admitted to hospital on Thursday because he's 5 foot 8, weighs <100 pounds.
[19:38:49] they're talking about removing his entire colon
[19:39:06] so..I mean, there are worse options than being houseridden, I guess?
[19:39:16] I have a good friend who went through something like that. Ended up keeping his GI tract, but lost A LOT of weight.
[19:39:16] halfak, do you have any siblings? I don't know the answer to this either.
[19:39:20] :(
[19:39:31] BTW, Ironholds, we're in the batcave hacking ig you want to join us.
[19:39:39] I have three sisters and a step brother :)
[19:39:46] you do? woah!
[19:39:57] were any of them at the wedding and I just missed it?
[19:40:11] Yup. Two sisters and a brother at my wedding.
[19:40:20] wow, I'm terrible.
[19:40:29] I might have failed to make the intro
[19:40:41] The whole day is sort of blurry
[19:40:58] the whisky and adrenaline and candle wax will do that to you
[19:41:16] that reminds me! I keep meaning to ask where the reading was from! It was beautiful and totally unfamiliar.
[20:04:18] Ironholds: you should get your project in 20-30min, I think
[20:04:42] Ironholds: wehave had multiple (2 of 12) hardware failures in labs so a little crunched on resources atm, but should easily support one machine with 4 cores and 8G of RAM
[20:05:22] YuviPanda, ack! Dan just added me to analytics project!
[20:05:24] save your machine!
[20:05:40] Ironholds: well, you will have to create a new instance there anyway.
[20:05:47] Ironholds: and I’d far prefer it on its own project.
[20:05:54] Ironholds: analytics already should be split into like 4-5 projects...
[20:05:59] fair!
[20:06:06] okay, I'll teat 4/8G as an upper bound :)
[20:06:22] (won't need that much, I think)
[20:18:27] YuviPanda, danke!
[20:21:15] Ironholds: done
[20:21:50] and now I’m offff
[20:22:29] take care!
[21:15:52] YuviPanda|zzz, lmk when you're around
[21:48:16] nuria, pingeth :)
[21:50:19] ottomata, we seem to have a corruption problem(?) with one of the webrequest files?
[21:50:27] Try running https://gist.github.com/Ironholds/13000a707a400a55d2ff
[21:57:47] haha
[21:57:50] Ironholds: you must read more emails.
[21:58:26] "ALSO: The webrequest table now has some new fields! client_ip, geocoded_data and record_version. WooT! This data will only be filled in for new partitions. It should be present for everything beginning at 2015-02-26T18:00. Anything before that will not have these fields. Also note that you can no longer use SELECT * on data older than this. This is a technical consequence of the way we import the new data.
[21:58:27] "
[21:58:37] no select *!
[21:58:39] or
[21:58:40] at least
[21:58:45] not before Feb 26
[21:59:07] but, i do not guaruntee that we will support select * for current data either, i can explain this to you if you are really interested in understanding why
[22:00:14] awww
[22:00:36] ottomata, I assume it's because we're storing the geodata as a Hive hashmap and it gets confused af trying to turn that into columns? ;p
[22:02:26] nope
[22:02:28] want to keep guessing?
[22:47:19] ottomata, if you get some free time, https://gerrit.wikimedia.org/r/#/c/193513/ ? *bats eyelids*
[22:52:13] ok!
[22:53:47] ta!
[22:55:34] yay! thanks ottomata :)
[23:02:46] ottomata, https://gerrit.wikimedia.org/r/#/c/193982/ you did this to yourself by clearing my review log ;p
[23:04:10] oh, I forgot to update the log. BOO
[23:04:19] naw you did, didn't you?
[23:04:34] oh that's a new one!
[23:05:00] nice Ironholds I almost suggested to do that for the wikidata one
[23:06:31] ottomata, yeah, future-proofing!
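The announcement quoted above implies that queries over webrequest partitions should name their columns rather than use `SELECT *`, and should only ask for the new fields where they exist. A minimal Python sketch that builds such a query; the three new field names and the 2015-02-26T18:00 cutoff come from the quote, while the `wmf.webrequest` table name, the `year`/`month`/`day`/`hour` partition columns and the base columns are assumptions for illustration.

```python
from datetime import datetime

# Cutoff taken from the announcement quoted in the log: the new fields are
# present for partitions beginning at 2015-02-26T18:00 and absent before that.
NEW_FIELDS_CUTOFF = datetime(2015, 2, 26, 18, 0)
NEW_FIELDS = ("client_ip", "geocoded_data", "record_version")


def columns_for(partition_hour, base_columns):
    """Explicit column list for one hourly partition, adding the new fields only where they exist."""
    columns = list(base_columns)
    if partition_hour >= NEW_FIELDS_CUTOFF:
        columns.extend(NEW_FIELDS)
    return columns


def build_query(partition_hour, base_columns, table="wmf.webrequest"):
    """Build a HiveQL query that names its columns instead of using SELECT *.

    The default table name and the year/month/day/hour partition columns are assumptions.
    """
    columns = ", ".join(columns_for(partition_hour, base_columns))
    return (
        f"SELECT {columns} FROM {table} "
        f"WHERE year = {partition_hour.year} AND month = {partition_hour.month} "
        f"AND day = {partition_hour.day} AND hour = {partition_hour.hour}"
    )


# Hypothetical base columns, for illustration only
print(build_query(datetime(2015, 3, 1, 12), ["uri_host", "uri_path", "uri_query"]))
```

Naming columns everywhere, not just before the cutoff, also covers the caveat that `SELECT *` may not stay supported on current data either.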
[23:06:37] * Ironholds high-fives
[23:07:17] hokay, I will have vun more patch for you
[23:07:21] as soon as I work out how to best structure it
[23:08:16] https://en.wikipedia.org/wiki/?search - there is no god
[23:14:37] TIL Romanian has an XOR word.
[23:14:43] I like Romanian. It's a rational language.
[23:18:28] https://gerrit.wikimedia.org/r/#/c/193985/
[23:22:01] Ironholds: curl --head http://en.wikipedia.org/wiki/Horseshoe_crab
[23:23:13] Ironholds: sorry i totally missed this ping
[23:24:44] Ironholds: and the bating of eye-lashes, jajaj
[23:30:13] nuria, that's okay! This is the last patch for the new definition :D
[23:30:29] who wants to +2 and make history?
[23:30:37] Ironholds: ottomata just merged it I think
[23:31:03] nuria, https://gerrit.wikimedia.org/r/#/c/193985/ ?
[23:31:24] Ironholds: ah wait, this is a different one
[23:31:47] yeah, there have been 3! Amending is so much easier than writing from scratch :D
[23:32:23] Ironholds: ok, let me look at this one
[23:32:35] thankee!
[23:34:16] Ironholds: so serach requests en up in the cluster for all users
[23:34:18] ?
[23:35:26] nuria, sorry
[23:35:27] ?
[23:35:36] argh, sorry
[23:36:42] Ironholds: From your change I assume we have search requests in hadoop and they appear like regular requests, true?
[23:37:09] that is, search page hits
[23:37:17] not actual search attempts; sorry, the commit message is vague
[23:37:31] so https://en.wikipedia.org/?search=arghleflargh for exampl
[23:37:31] e
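The distinction drawn at the end, between search page hits and actual search attempts, starts from spotting a `search` query parameter in the request URL, as in the two example URLs in the log. A hypothetical Python helper for that first step; it is not the logic in Gerrit change 193985, only an illustration using the standard `urllib.parse` module.

```python
from urllib.parse import parse_qs, urlsplit


def is_search_page_hit(url):
    """True if the request URL carries a `search` query parameter, even an empty one."""
    query = urlsplit(url).query
    # keep_blank_values=True so a bare "?search" or "?search=" still counts as a hit
    return "search" in parse_qs(query, keep_blank_values=True)


for url in (
    "https://en.wikipedia.org/?search=arghleflargh",
    "https://en.wikipedia.org/wiki/?search",
    "http://en.wikipedia.org/wiki/Horseshoe_crab",
):
    print(url, is_search_page_hit(url))
```

Parsing the query string instead of substring-matching "search" avoids false positives from parameters such as `research=1`.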