[00:03:08] drdee: two questions you might have access to the #s on quickly [00:03:16] shoot [00:03:19] Number of people who have edited Wikipedia ever? [00:03:33] > 1M. [00:03:39] anything more precise than that? [00:03:42] no :) [00:03:43] 2. Number of countries in which there are (or have ever been, doesn't matter which) editors? [00:03:45] that's a kraken job [00:03:49] yeah, it seems like it'd be complicated [00:04:03] since you'd want to basically apply the active-editors metric to all of time [00:04:06] another kraken job [00:04:08] with the requisite deduplication [00:04:13] so we don't have it on-hand? [00:04:16] no [00:04:26] k [00:04:33] and the 2nd question is actually really hard to answer accurately [00:05:10] I agree. [00:05:15] I think they're both kind of a bitch. [00:05:20] (They come from Sue.) [00:05:48] what was the original question? [00:05:54] i joined halfway through the response [00:06:08] 1. Number of people who have edited Wikipedia ever? [00:06:12] hey folks - have you guys seen http://apiary.io [00:06:14] 2. Number of countries in which there are (or have ever been, doesn't matter which) editors? [00:06:23] I think so. [00:06:31] oh, no. i saw ql.io [00:06:52] 2. metamx dashboard has an answer ready for the most recent edits [00:07:09] Neither number is "recent" [00:07:13] it's all of time on both. [00:07:22] 1. question around "people" is probably ill-framed ;) [00:07:26] I know. [00:07:26] inre 2: i have recent numbers easily available [00:07:33] That's part of the difficulty in both. [00:07:37] They're for an op-ed. [00:07:43] gotcha [00:07:47] So I think the standard is not really academic precision. [00:07:51] signpost or NYT? :) [00:07:54] Just something reasonably defensible. [00:07:56] LA Times [00:08:00] sweet [00:08:40] Any thoughts on reasonable definitions for the nouns in either question? [00:08:51] Basically I'm trying to figure out how long it'd take to write the pig script. 
[00:08:54] (for each) [00:08:55] well, that's a pretty hard question, determining how many *registered users* ever edited WP is way more tractable [00:09:04] *nod* [00:09:05] true. [00:09:08] trivial, even. [00:09:26] that's just a pretty simple select-unique, right? [00:09:32] users joined against edits? [00:09:35] well, be ready to enter the magical world of central auth [00:09:41] ugh. [00:09:42] right. [00:09:48] any better ideas? [00:09:57] nah, that's the best we can use [00:10:35] other question is: is the scope Wikipedia or Wikimedia projects? [00:10:35] 2. seems pretty impossible [00:10:43] Wikipedia. [00:10:51] erosen, you're a shining beacon of optimism. [00:10:55] hehe [00:11:01] i was going to say, [00:11:10] you could however use the non-logged-in edits [00:11:12] as a proxy [00:11:48] okay, how about this. [00:11:49] indeed [00:11:52] let's do some lateral thinking [00:12:10] what are some numbers *related* to those that are interesting, but we have on hand? [00:12:22] the goal of both numbers appears to be to show our breadth [00:12:27] our globe-spanning nature. [00:12:29] ya [00:12:39] gotta go now, but do keep me in the loop [00:12:45] like, what's the version of (2) for recent history? [00:13:02] using the ups for logged in users [00:13:11] IPs [00:13:13] do we have # countries from which edits came across all wikis? [00:13:22] i do [00:13:24] why not IPs for all edits? [00:13:36] we don't have them [00:13:38] if you had that number, i bet it would make it into the article. [00:13:40] ah. [00:13:53] it's a privacy thing, I think--we won't log your IP if you're logged in [00:14:11] yeah. [00:14:24] do you have access to the "staging"db? [00:14:35] i was hoping you just had this number :) [00:14:42] i want to avoid spending a lot of time on this [00:14:44] i'll have it in 20s [00:14:47] econds [00:14:52] sweet [00:15:37] i guess the query takes a while though... 
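The "pretty simple select-unique, users joined against edits" idea above can be sketched against a toy revision table. This is a hypothetical sqlite3 stand-in for MediaWiki's revision table (real per-wiki schemas differ, and the CentralAuth cross-wiki deduplication problem remains); the rows are made up.

```python
import sqlite3

# Toy stand-in for a MediaWiki revision table; schema and rows are
# hypothetical -- the real query would run against each wiki's database
# (and still face the CentralAuth deduplication problem mentioned above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER, rev_user INTEGER)")
conn.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [(1, 101), (2, 101), (3, 102), (4, 0), (5, 103)],  # rev_user = 0 is an anon edit
)

# The "simple select-unique": distinct registered users who ever edited.
(count,) = conn.execute(
    "SELECT COUNT(DISTINCT rev_user) FROM revision WHERE rev_user > 0"
).fetchone()
print(count)  # 3 distinct registered editors in this toy data
```

Anonymous edits (rev_user = 0) are excluded here, which is why the chat calls registered-user counting "way more tractable" than counting people.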
[00:15:46] so more like a 1m [00:15:52] s'ok :) [00:16:41] 231 [00:16:50] That's for the last 30 days? [00:17:02] Can you state in a sentence what that number exactly means? [00:17:19] yeah [00:19:10] hey ottomata, still there? [00:21:53] erosen: i await with bated breath. [00:22:03] number of countries from which an edit was made between 2012/7/20 and ... [00:22:10] (running query for end date) [00:23:04] and 2012/12/18 [00:23:12] aiight. [00:23:16] or 2012/12/17 [00:23:27] I'm just going to say "the last 5 months" [00:23:37] and I'll get the complete list of wikipedias for those counts right now [00:23:42] nice. [00:23:57] you think you could enlarge the country count to the last six months? [00:24:00] to make it a round number? [00:24:16] (that is, go back to 2012/6/17?) [00:24:22] or do we not have that? [00:24:22] nope [00:24:26] we don't have it [00:24:32] k [00:24:35] it was before I started collecting the data [00:25:16] For 231: what wikis does that count? [00:26:02] All language wikis for wikipedia? [00:26:05] (That is my hope) [00:26:07] yeah [00:26:10] k [00:26:23] And it only counts logged-in edits? [00:26:29] Or non-logged-in edits? [00:26:39] both [00:26:48] k [00:27:31] ... [00:27:32] http://geography.about.com/cs/countries/a/numbercountries.htm [00:27:40] "Ultimately, the best answer is that there are 196 countries in the world." [00:27:42] Ahem. [00:27:49] ultimate data source is http://www.mediawiki.org/wiki/Manual:Recentchanges_table [00:28:04] hmm [00:28:09] ask the maxmind db [00:28:10] ... [00:28:28] http://www.maxmind.com/en/iso3166 [00:29:11] maxmind has 252 [00:29:42] yeah. [00:29:47] 91.3% is pretty hot. [00:29:50] yeah [00:36:38] erosen, you're running a thing for the editor count now? 
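The country count discussed above boils down to geolocating the IP recorded for each edit and counting distinct countries against MaxMind's list. A minimal sketch, with a stub lookup table standing in for the real MaxMind GeoIP database and cu_changes data (all IPs and mappings here are made up):

```python
# Hypothetical stand-in for a MaxMind GeoIP lookup; the real pipeline
# geolocated IPs from the cu_changes table.
GEOIP_STUB = {
    "203.0.113.7": "AU",
    "198.51.100.2": "US",
    "192.0.2.55": "US",
    "198.51.100.99": "FR",
}

def country_coverage(edit_ips, total_countries=252):
    """Return (distinct countries seen, share of MaxMind's country list)."""
    seen = {GEOIP_STUB[ip] for ip in edit_ips if ip in GEOIP_STUB}
    return len(seen), len(seen) / total_countries

n, share = country_coverage(GEOIP_STUB.keys())
print(n)  # 3 distinct countries in this toy data
```

The denominator matters as much as the count: 252 MaxMind country codes versus the "196 countries in the world" figure gives quite different coverage percentages for the same 231.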
[00:37:11] i'm actually just rechecking the code to make sure non-logged-in users are in the checkuser_changes table [00:37:28] ah [00:40:19] back [00:41:07] if you guys need stats on anon edits by country over the last 6 months, metamx has the answer: https://dash.metamx.com/wikipedia_editstream/explore#e=2012-12-19&p=custom&s=2012-06-18&w=-a-&zz=3 [00:41:51] cool. [00:41:57] i'll check it out, DarTar [00:42:11] dschoon, sent you the lists [00:42:15] ty [00:42:17] i see them! [00:43:08] also, just to be clear, the original data source is the check user changes table (cu_changes) which seems to only be documented here: http://www.mediawiki.org/wiki/Extension:CheckUser [00:44:08] heh [00:44:19] any thoughts on the first question -- a sum total of editors? [00:44:24] to quote bosslady: [00:44:48] I want to kick off with some facts supporting a claim that essentially *everyone* edits. (Not true, but I want people to imagine themselves as possible editors.) So I want to say, you know, 1 million+, practically every country, that kind of thing. [00:45:10] hmm [00:45:43] I think she understates the case for "everyone". [00:45:54] hehe [00:45:57] As a fraction of the internet audience, we really are about as close to "everyone" as there is. [00:46:05] we're more "everyone" than Facebook or Google. [00:46:12] i mean logged in users would be a lower bound [00:46:19] ip addresses risk double counting [00:46:23] yeah. [00:46:30] well, if i could get a range, that'd be great. [00:46:54] i have to head out soon, as i locked myself out [00:46:59] hehe [00:47:03] and a housemate's bf will be around soon [00:47:07] k [00:47:22] i think counting all editors for every language could take a bit [00:47:32] but I can probably get a number within the hour [00:48:04] sweet. [00:48:08] we have until 7 Jan [00:48:08] heh [00:48:15] gotcha [00:49:26] aiight -- heading home to get unlockedout. 
brb45 [00:52:42] hi ottomata [00:57:06] just wanted to follow up on a request from Friday about hosting an API that exposes some metrics data on the stat cluster [00:58:05] just resent. [00:58:35] anytime you could chat would be great, it isn't super urgent [02:21:15] new sankey diagram up: http://visualizations.meteor.com/ [09:39:08] the Antoine :) [09:39:11] hashar: how are you man ? [09:39:30] hello :) [09:39:34] got to bed at like 2am luckily my wife let me sleep till 8am :-) [09:39:50] how are the jenkins jobs running for your team ? [09:40:18] hashar: I would like to know more about Zuul [09:40:25] hashar: but first, I want to ask if you talked to milimetric [09:40:34] hashar: he wants to set up CI for limn which is a node application [09:40:37] nodejs [09:40:50] hashar: I guess he's talked to you about it right ? [09:42:26] average_drifter: milimetric pinged me yesterday but I was about to leave [09:42:45] is he in the US ? [09:46:12] average_drifter: we indeed have nodejs on gallium, so that should be easy to set up. The only drawback is that we cannot use npm to install the node packages :-D [10:08:31] hashar: yes, that is the problem [10:08:42] hashar: it can be fixed if we could have access to gallium, just to set up npm [10:08:45] locally [10:08:50] we can set up npm locally [10:08:53] hashar: ^^ [10:09:21] let me correct that [10:09:30] use of npm is forbidden :-] [10:09:35] but [10:09:48] we can provide the package in a wikimedia git repository [10:10:01] hashar: what do you think of a frozen private npm repo ? 
[10:10:03] the reason is that we cannot trust the npm packages [10:10:33] hashar: for example we freeze a bunch of npm packages at a certain version and we keep them in our private repos [10:10:43] ah that might work :) [10:10:49] given the repo is hosted on wmf [10:10:50] hehe [10:10:57] hashar: yes [10:13:56] hashar: there is another problem with npm packages [10:13:59] hashar: a big one [10:14:19] hashar: you might have the following dependency chaos going on [10:14:58] packageA requires (packageB,packageC) . In turn packageB requires packageE_v1 and packageE_v2 [10:15:56] damn, let me rephrase that [10:16:07] packageA requires (packageB,packageC) . In turn packageB requires packageE_v1 and packageC requires packageE_v2 [10:16:16] now it looks right :) [10:16:28] well you get the node_modules dir that looks something like: [10:16:36] node_modules/B/node_modules/E_v1 [10:16:43] node_modules/C/node_modules/E_v2 [10:16:45] yes ! [10:16:50] exactly [10:17:08] then I assume B will include E_v1 and C will include E_v2 [10:17:12] hashar: what if we would make a big .deb with all packages needed for limn ? [10:17:17] then namespaces make it possible to use both versions at the same time :) [10:17:27] you could try packaging yeah [10:17:32] let me rephrase, what if we would make a big package limn_deps.deb with all npm stuff in there [10:17:41] or just add the deps in your repo :-) [10:17:45] brb coffee [10:18:16] hashar: adding deps in the repo is the quick and easy way out but it would make the repo heavy [10:26:36] average_drifter: indeed [10:27:02] average_drifter: if you feel like packaging the node dependencies go ahead :-D [10:27:28] average_drifter: but we will have to repackage every time you change them :) [10:29:35] hashar: I can do it to break the ice and then explain everything in a screencast to the people working on limn [13:24:33] milimetric: hi, I've talked to hashar. 
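The nested node_modules layout described above is exactly how Node sidesteps the E_v1/E_v2 conflict: require() walks up from the requiring file, checking each node_modules directory along the way, so B finds its private E_v1 while C finds E_v2. A toy resolver sketching that walk (the A/B/C/E packages and paths are the hypothetical example from the discussion, not real modules):

```python
from pathlib import PurePosixPath

# Pretend filesystem: each of B and C bundles its own copy of E, as in the
# nested layout quoted above. Two versions of E coexist without conflict.
INSTALLED = {
    "node_modules/B/node_modules/E",  # E_v1, private to B
    "node_modules/C/node_modules/E",  # E_v2, private to C
}

def resolve(requiring_dir, package):
    """Walk ancestors of requiring_dir looking for node_modules/<package>,
    mimicking Node's require() resolution order."""
    d = PurePosixPath(requiring_dir)
    for ancestor in [d, *d.parents]:
        candidate = str(ancestor / "node_modules" / package)
        if candidate in INSTALLED:
            return candidate
    raise ImportError(package)

print(resolve("node_modules/B", "E"))  # -> node_modules/B/node_modules/E
print(resolve("node_modules/C", "E"))  # -> node_modules/C/node_modules/E
```

This is also why vendoring the deps (in a git repo or a .deb) has to preserve the nested directory layout, not just a flat list of packages.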
we reached an agreement that we could basically throw all npm modules that are deps of limn into a big .deb and deploy it on gallium [13:25:03] milimetric: that would allow jenkins to run the limn unit tests [13:25:24] milimetric: however, if there are new npm modules that you need, you will need to rebuild the big .deb package [13:25:24] oh ok [13:25:39] well [13:25:57] milimetric: so basically that package comes with + and - [13:26:01] can't we just install nodejs and npm, then do "npm install" as part of build? [13:26:02] depends if the + outweigh the - :) [13:26:21] milimetric: node is already 0.8.16 on gallium [13:26:32] milimetric: the big problem is npm. if you know a way to install npm locally then you're done [13:26:44] oh, cool [13:26:53] npm can be installed globally [13:27:00] the npm packages limn needs can be installed locally though [13:27:04] hey :) [13:27:09] morning hashar [13:27:11] milimetric: actually scratch that. the problem hashar mentioned is that we are not allowed to have stuff from npmjs on gallium because it's unsafe code [13:27:20] milimetric: we do not use npm on production boxes because we can't trust the code that will be automatically installed [13:27:27] yes [13:27:30] that's what I wanted to say [13:27:41] oh i see [13:27:46] I don't have any specific way to fix that though [13:27:54] ok, we have a task to debianize the installation [13:28:01] the most straightforward way is to have your git repo provide all the required node modules [13:28:12] so one just git clone && run_script [13:28:53] hm, ok, I wonder if there's a way that's less brittle. [13:29:26] separating all the stuff in a repo just for deps ? 
[13:29:34] and using that as a submodule [13:29:51] git submodule [13:29:54] yeah, that might work [13:30:39] thing is, this makes experimenting with packages more expensive as it now takes up space in the repo forever [13:31:07] milimetric: yes but it's a separate repo [13:31:11] milimetric: just for deps [13:31:28] milimetric: so then that repo could even escape gerrit because you can add packages there as you please [13:31:37] milimetric: the real limn code stays in the limn repo [13:32:46] another option would be to have our own npm repository :-D [13:32:57] honestly I don't have any clean solution. Maybe that should be raised with ops [13:33:02] they are creative people [13:33:03] yeah, i remember that was tossed around. I'll catch up with David who's gone through this before [13:33:09] maybe he already had a solution [13:33:24] I have no idea how the parsoid team ended up managing their npm dependencies [13:33:49] could gallium have npm installed (since it's not production) and create a deb package for us out of the dependencies? [13:34:05] then it could deploy that package to dev/test and we could promote it to prod? [13:34:21] milimetric: gallium is production :-D [13:34:33] as a long term goal I will probably get it moved out of the prod cluster though [13:34:34] oh oops [13:34:53] or use a vagrant virtual machine to properly isolate the tests :) [13:36:04] ok, cool, thanks guys, I'll think about it now that I know the restrictions [13:37:32] oh hashar, one more question: does gallium have access to kripke? [13:37:44] milimetric: what is kripke ? :D [13:38:07] the machine we have dev and test deployments on [13:38:14] i think it's the analytics dev box [13:38:40] no idea, what is the hostname / IP ? :)D [13:38:41] so this is on kripke: http://dev-reportcard.wmflabs.org/ [13:38:44] ahh on labs [13:38:51] yeah that should work [13:39:15] ok, cool, so I can deploy there if the build succeeds. Thanks! 
[13:39:16] given that the security rules in labs let an outside machine access whatever service is hosted there [13:42:13] re [14:41:59] morning ottomata, milimetric, average_drifter [14:42:43] drdee: hello :) [14:44:11] any news from the bug hunting front? [14:44:40] I've constructed the test [14:45:03] to test...... [14:45:06] I've generated 10 tablet entries, and they all go into Mobile other [14:45:15] to test the decrease of mobile percentage [14:45:23] ok [14:45:23] I mean to find out why they have decreased [14:45:48] and now I want to revert the isMobile function which I made, so that I can see if it makes a difference in the percentage [14:48:27] why revert it? [14:49:33] i am not sure if we have to get to the bottom of this issue, if we have a good explanation then that's fine as well [14:49:45] i really want to get it deployed [14:51:06] I'm trying to revert it to see if I did not create a big problem by introducing the dichotomy with isMobile [14:55:40] oh oops, morning drdee [14:55:46] morning [14:55:47] i just figured out why my speakers weren't working! [14:55:54] linux is crazy :) [15:00:16] what happened [15:00:17] ? [15:13:55] brb reboot [17:48:58] hi drdee, how are you and how is the family? [17:50:13] I'm looking at https://www.mediawiki.org/wiki/Analytics and wondering whether there's a good place you want us to slap questions that we will someday want answers to (once Kraken is all ready) [17:50:32] I have some ideas for questions I want to ask once the oracle exists [17:51:05] drdee: you about? [17:51:24] dschoon: hi there! [17:51:31] hihi [17:51:33] I'm looking at https://www.mediawiki.org/wiki/Analytics and wondering whether there's a good place you want us to slap questions that we will someday want answers to (once Kraken is all ready) ... I have some ideas for questions I want to ask once the oracle exists [17:51:42] * sumanah may have missed a page [17:51:46] yeah, that's a great idea. 
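The isMobile question above can be illustrated with a toy classifier: if tablet entries that used to count as mobile are split into their own "Mobile other" style bucket, the reported mobile percentage drops even though the underlying traffic is unchanged. The request labels and counts here are made up for illustration, not the real test data:

```python
# Hypothetical traffic mix: 3 phone, 2 tablet, 5 desktop requests.
REQUESTS = ["phone"] * 3 + ["tablet"] * 2 + ["desktop"] * 5

def mobile_pct(requests, tablets_count_as_mobile):
    """Percentage of requests classified as mobile, with or without
    tablets included in the mobile bucket."""
    mobile = {"phone", "tablet"} if tablets_count_as_mobile else {"phone"}
    return 100.0 * sum(r in mobile for r in requests) / len(requests)

print(mobile_pct(REQUESTS, True))   # 50.0 before the isMobile split
print(mobile_pct(REQUESTS, False))  # 30.0 once tablets leave the bucket
```

If reverting the isMobile change restores the old percentage, that would point to the reclassification rather than a real traffic decrease, which is the hypothesis being tested in the chat.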
[17:52:09] maybe https://www.mediawiki.org/wiki/Analytics/FeatureRequests [17:52:09] ? [17:52:25] something to distinguish it from the normal, day-to-day data requests we get [17:52:48] I worry that that name makes it sound like we'll have well-thought-out ideas for components you should build [17:52:58] when in fact these are future queries we'll want to run [17:53:13] okay, true [17:53:14] how about [17:53:22] (and how you service those via software is way more in your court than in the asker's) [17:53:26] https://www.mediawiki.org/wiki/Analytics/Dreams ? [17:53:29] ok [17:53:31] :D [17:53:38] maybe https://www.mediawiki.org/wiki/Analytics/Aspirations [17:53:46] fine with me [17:53:49] or https://www.mediawiki.org/wiki/Analytics/StunningGrandoiseVisions [17:53:56] I like Dreams best :) [17:54:15] Dreams it is! [17:57:44] ok, dschoon https://www.mediawiki.org/wiki/Analytics/Dreams exists [17:59:12] yay [17:59:38] oh man, that second one is so good [18:00:21] here [18:00:30] did you edit the main page to link to it? [18:00:42] ^^ sumanah [18:01:25] I edited [[Analytics]] yup! [18:01:43] feel free to move my link around tho [18:02:14] ottomata, milimetric [18:02:16] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90 [18:02:19] doh, my bad [18:02:23] btw dschoon it was Rachel Farrand who originally asked about userscripts etc [18:02:27] nice [18:02:31] yeah, great question [18:03:05] now, if y'all know how to answer any of these already, that'd be good to know [18:03:15] heh. i don't think we do yet. [18:03:18] but we will! [18:03:31] rock [18:03:35] several of those are great tests for event tracking [18:03:57] (send an event whenever a gadget is enabled/disabled) [18:04:28] hmmm! [18:04:39] good for new gadgets especially, then [18:27:42] ottomata i get a 503 on hue [18:28:33] uh oh, servers down? 
[18:28:44] looks like networking prob [18:41:16] ok, internet here bad, need food, moving locations, be back in a bit [18:43:32] aight [18:44:07] ottomata: leslie was working on eqiad networking last night [18:44:11] you should check in with her [19:55:05] so, packet loss! [19:55:18] maybe it is better, but i am very skeptical [19:55:20] watching this for now [19:55:21] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=Analytics+Webrequest+Packet+Loss&vl=&x=&n=&hreg[]=analytics100%5B3456%5D.eqiad.wmnet&mreg[]=packet_loss_average&gtype=line&glegend=show&aggregate=1 [19:56:57] i'm also watching /proc/net/udp, so far so good [19:57:19] that is lower than it used to be :D [19:57:38] is this the partition by 4 solution? [19:58:25] yes [19:58:35] as long as it hovers close to 0, things are ok [19:58:42] normal range (IIRC) was 4.5% [19:58:51] yup [19:58:54] awesome! [19:59:19] so far there are no dropped packets in /proc/net/udp [19:59:24] but we'll see [19:59:24] if it stays zero during PST afternoon then we are fine because that's when we get the most requests [19:59:29] aye ok [20:01:25] ottomata: so those boxes are now consuming the incoming udp2log stream instead of an26? [20:02:55] brb getting food [20:12:12] yeah [20:12:18] an26 is just a test playground [20:12:23] i had been doing it just on an03 for a while [20:29:08] ottomata: yo, got a sec for https://gerrit.wikimedia.org/r/#/c/39161/ ? [20:30:13] ori-l, ottomata: that is very annoying [20:30:23] it's because suddenly pep8 has been enabled [20:30:34] for python checking [20:31:10] i think it's good [20:31:14] temporary annoyance, long-term gain [20:31:25] well prod ops ;) [20:33:57] or feel free to fix it yourself :D [20:36:31] ori-l, looks fine, did you also set up an rsyncd instance on vanadium? [20:37:59] ori-l, drdee: personally, i'm not a fan of pep8, but i like the principle. 
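"Watching /proc/net/udp" as mentioned above amounts to checking the trailing drops column of each socket line, which is the kernel's per-socket counter of packets dropped for that UDP socket. A minimal sketch with a hypothetical two-socket sample (on a live box you would read the file itself instead of the sample string):

```python
# Hypothetical /proc/net/udp contents: a header row plus one line per UDP
# socket; the last field of each socket line is the kernel's drop counter.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
 2078: 00000000:20FB 00000000:0000 07 00000000:00000000 00:00000000 00000000   113        0 8471 2 ffff8801 0
 2094: 00000000:220B 00000000:0000 07 00000000:00012A00 00:00000000 00000000   113        0 8490 2 ffff8802 17
"""

def total_udp_drops(text):
    """Sum the trailing 'drops' column over all socket lines."""
    lines = text.strip().splitlines()[1:]  # skip the header row
    return sum(int(line.split()[-1]) for line in lines)

print(total_udp_drops(SAMPLE))  # 17 drops in this sample
```

"No dropped packets" in the chat corresponds to every socket line ending in 0; any nonzero value there means the udp2log consumers are not keeping up with the incoming stream.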
(i think guido is crazy on a few things, like 2-space, lack of newlines in some places, etc) [20:38:15] i don't disagree with pep8 [20:38:20] i think it's okay [20:38:31] question is who is gonna clean it up [20:39:21] ottomata: no; can you help me w/that? [20:39:58] it's usually quite a bit of (tedious) work [20:40:55] drdee: the PEP8 doesn't apply; this is a puppet manifest. it's just a jenkins configuration issue. [20:42:03] bwerrr yeah, example, see misc/statistics.pp class misc::statistics::rsyncd [20:42:05] line 345 [20:43:20] looking [20:45:41] ottomata: i inherit from misc::statistics::base [20:45:43] i noticed the big FAILURE first but apparently it does not matter in this case, ignore my remarks [20:45:45] so i should have the same config [20:46:04] oh ok cool [20:46:12] then I think that should work [20:46:14] merging [20:46:25] thanks [20:46:55] oo actually [20:47:00] can you amend with one change before I do [20:47:08] add vanadium to the list of allowed hosts for that rsync module [20:47:10] line 39 [20:47:45] actually, hm, you can do that if you want, but you don't have to [20:48:13] that would allow the vanadium to copy files to the stat servers [20:48:36] your job is running on the stat1 server, so i think it's ok [20:48:42] i'll merge, you can change later if you need to [20:48:55] yeah, i'm doing about four different things atm so if i can defer this by a bit that would be great [20:48:58] k [20:49:05] i've been doing that all day too :p [20:50:07] /a/eventlogging doesn't exist on stat1; next puppetd run i guess? [20:51:22] ah, ori, can we copy the files to /a/eventlogging/archive [20:51:22] ? [20:51:34] yes, that's fine [20:55:56] cool, ori-l, merged and in place on stat1 [20:56:07] thanks! [21:13:52] ottomata: can you explain why packet loss was negative about half an hour ago? [21:14:02] we ... made up new, magical packets? 
[21:14:04] haha, barely, i don't really know what negative means [21:14:05] hehe [21:48:54] erosen, drdee, mind if i try to start a benchmark? [21:49:00] go for it [21:49:03] go for it [22:31:58] ottomata, are you running something on kraken? [22:32:36] oh you are [23:27:15] I'll be back in 30m, getting some late lunch.. [23:27:18] erm dinner [23:28:54] dschoon: hate to ask an old limn question, but can you spare a second to see help with a graph description problem? [23:29:36] Sure. I'll come up in a few minutes. [23:29:45] Need to finish writing up today's interview. [23:29:49] k [23:45:43] omw up, erosen [23:45:48] sweet [23:53:51] Hi [23:53:55] Hi Matt!