[00:31:19] man does my head hurt. [00:31:23] maybe i should go home or something [14:18:51] hi Andrew :) [14:24:13] morning [14:24:22] i'm about to run to a cafe in just a sec [14:24:25] but hi! [14:24:32] you know you can get into the analytics machines now, right? [14:31:22] wow :) nice [14:39:53] heeaaaayyyaaaa [14:39:57] it's very very very white here [14:40:04] a very thick layer of snow [14:40:17] has engulfed the city of toronto [14:48:54] wow :) [14:49:02] we don't have snow here.. weird [14:49:06] seasons gone crazy [14:49:24] they discovered grolar bears recently [14:49:24] ha, we have rain that is about to turn into snow [14:49:24] grolar?! [14:49:26] which is a cross between polar and grizzly bears [14:49:27] yes [14:49:28] grizzly/polar? [14:49:29] whaaaa [14:49:37] they claim it's an adaptation to climate changes [14:50:23] ha [14:50:29] and the bee population in .us has some problems apparently, they started disappearing in many areas [14:50:30] so is my scarf [14:50:37] hehe [14:50:40] yeah, that's been a problem for a while [14:50:47] i thought I heard they figured that out or something [14:50:48] dunno though [15:03:27] morning [15:03:35] merrrningghghghg [15:03:38] :) [15:03:52] flying to SFO is a pain - what flights do you usually take ottomata? [15:04:01] whatever, they get me i guess [15:04:13] looks like i'm going on delta [15:04:32] from JFK [15:04:39] what time you getting into sfo? [15:04:55] sunday 18.15 [15:05:22] ah! because it's not a weekday. That makes a lot of sense [15:05:47] I'd have to leave my house at like 3:15 to get there at a reasonable time - because I'm far from the airport [15:05:48] hm... [15:06:07] hey average_drifter [15:06:10] new reports ready? [15:11:15] drdee: they're running [15:11:22] drdee: but I have something else you can look at in the meantime [15:11:26] just december? 
[15:11:28] ok [15:11:37] a preview [15:11:38] http://stat1.wikimedia.org/spetrea/misc/preview_new_mobile_reports/pageviews.html [15:11:41] that's how they're going to look [15:11:47] hover your mouse over the cells please [15:12:00] the month cells have a "discarded piechart" [15:12:14] the cells in the table have a "breakdown piechart" in them [15:14:16] cool! [15:14:19] :) [15:14:22] but we need to figure out the 500M gap [15:14:26] that needs to be explained [15:14:36] and we need to have an explanation for amit and his country stuff [15:16:20] ottomata, have you seen this: http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-hdfs-httpfs/index.html [15:17:35] pretttyyyyyy sure that is how the proxy was working, just connecting to that [15:21:33] dumbass me [15:27:09] what can we do for Amit, I'm scratching my head [15:27:12] I wish I could do more [15:28:03] so basically the bug was that Uganda (I think) had half as many results as the previous report [15:28:11] and India too [15:28:19] right [15:28:37] drdee: and you told Amit that the issue was related to GeoIP, and that makes sense, it's possible that that was the problem [15:28:51] that's my suspicion [15:28:55] uhm, previously we were using that utility binary to geoip [15:28:56] but i am not sure either [15:29:16] right [15:29:19] so to prove that indeed that was the problem, I'd have to get that binary and find out which GeoIP.dat it's using [15:29:31] ok [15:29:32] and then get the data and geoip it with udp-filters and the old utility [15:29:39] and then compare [15:30:01] drdee: would you agree with this ? 
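A minimal sketch of the comparison step just proposed. The numbers and the `country_diff` helper are invented for illustration; the real per-country tallies would come from running the old geoiplogtag binary and udp-filters over the same slice of logs.

```python
# Tally per-country pageview counts from two geocoding runs over the same
# sample and report any gaps. Sample data below is hypothetical.

def country_diff(old_counts, new_counts):
    """Return {country: (old, new)} for every country whose counts differ."""
    diffs = {}
    for country in set(old_counts) | set(new_counts):
        o = old_counts.get(country, 0)
        n = new_counts.get(country, 0)
        if o != n:
            diffs[country] = (o, n)
    return diffs

old = {"IN": 1000, "UG": 400, "US": 2500}  # hypothetical: old utility's output
new = {"IN": 500, "UG": 200, "US": 2500}   # hypothetical: udp-filters' output

for country, (o, n) in sorted(country_diff(old, new).items()):
    print(f"{country}: {o} -> {n} ({n - o:+d})")
```

If the gap concentrates in a few countries (as reported for India and Uganda) while the rest match, that would support the GeoIP.dat suspicion discussed above rather than a parsing bug.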
[15:30:27] yes, but only for a small slice of data :) [15:30:38] ok, I'll get a sample [15:30:40] so let's hunt for the GeoIP.dat file on stat1 [15:30:46] ok [15:31:44] well probably better to read the source code of the latlong binary, whatever it's called [15:31:56] and see which GeoIP file it loads [15:33:44] average_drifter: source code is at http://svn.mediawiki.org/viewvc/mediawiki/trunk/tools/geoiplogtag/ [15:34:58] oink [15:35:08] it doesn't say where it loads the db from [15:35:09] oh nice, I was looking for that [15:35:10] thanks [15:35:29] ottomata: does puppet keep a log of packages installed and their versions on machines ? [15:36:03] ottomata: like if we previously had a different libgeoip-dev could puppet tell us what is the history of package versions that were present on stat1 and the times they were upgraded/changed ? [15:36:43] puppet does not, unless the version was specifically set in the puppet manifest [15:36:56] ok [15:36:57] it currently does this [15:36:57] class geoip::packages { [15:36:57] package { [ "libgeoip1", "libgeoip-dev", "geoip-bin" ]: [15:36:57] ensure => latest; [15:36:57] } [15:36:59] which I HATE [15:37:01] aahhh I hate latest! [15:37:04] should be present [15:37:13] yeah, definitely the explicit version would be more helpful [15:37:20] but, it could be in syslog [15:37:23] if we had that then we could go in the git repo and see when it changed [15:37:28] and track the versions across time [15:37:29] i don't mind not explicit version [15:37:36] but I don't like it when things change from underneath [15:37:43] also, the GeoIP.dat files are not packages [15:37:45] yeah me too [15:37:46] they are auto updated by a cron job [15:37:57] actually, not by a cron job [15:38:04] by puppet, but essentially the same [15:38:13] whenever maxmind releases new .dat files [15:38:15] they will be downloaded [15:38:30] downloaded by.. ? [15:38:33] cron ? 
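For reference, the alternative to `ensure => latest` discussed above, sketched against the manifest fragment quoted in the log (the example version string is a placeholder, not the version actually deployed on stat1):

```puppet
class geoip::packages {
    package { [ "libgeoip1", "libgeoip-dev", "geoip-bin" ]:
        # "present" stops packages from silently upgrading underneath us;
        # an explicit pin such as ensure => "1.4.8" (placeholder version)
        # would additionally leave a history of changes in the puppet git repo.
        ensure => present;
    }
}
```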
[15:38:42] so i am pretty sure that's the cause: an update to the maxmind geo ip db [15:39:04] /usr/bin/geoipupdate [15:39:11] it ships with the geoip package [15:39:14] oh, which is probably in cron [15:39:18] well no [15:39:21] it's run by puppet [15:39:26] on puppetmaster [15:39:38] hmmmm [15:39:38] nope [15:39:39] i'm sorry [15:39:40] doh [15:39:42] it is run by cron [15:39:46] ok [15:39:47] once a week [15:40:06] (i wrote this puppet manifest, I should know that :p) [15:40:10] does geoipupdate write about its update in some log file ? [15:40:15] hmmm [15:40:23] command => "$geoipupdate_command > /dev/null", [15:40:24] nope [15:40:28] hmm [15:40:33] that would be good though [15:40:40] yes, I agree [15:40:43] it would :) [15:40:59] i will make it do so [15:41:05] thank you :) [15:50:21] "Paul: are you logged into the Member Center now?" [15:50:31] So I'm talking to Maxmind customer support [15:50:40] and they say that they have an archive of GeoIP.dat databases [15:50:54] but I need to be logged into Member Center to have access to that [15:51:13] ha [15:51:13] uhhh [15:51:18] no idea [15:51:23] are we using commercial or free Maxmind ? [15:51:36] "Paul: After logging in, go to https://www.maxmind.com/en/download_files?show_all_dates=1 and that will list all of our back databases" [15:52:30] drdee: are we using commercial or free Maxmind ? [15:52:43] commercial [15:52:45] commercial [15:53:02] ok, how can I acquire the member credentials to it ? [15:53:07] 1 sec [15:53:09] ok [15:56:41] see pm [16:55:36] drdee, i'm sure you know, but both blog and event imports have been down since an01 was down [16:55:45] you might want to let tilman know (or I can) [16:56:05] can we re-enable those streams? 
[16:59:04] not til I have a public IP I can puppetize :( [16:59:11] i could do it manually [16:59:44] or, i could redo blog imports from the firehose [16:59:56] via grep / udp-filter -d [17:00:04] actually, probably grep on hostname for marmontel [17:00:14] those I don't mind doing manually [17:00:18] since they are already running [17:14:50] "There are 124 questions remaining in this section." [17:14:53] arrrgghhhh [17:15:02] hahaha [17:35:45] average_drifter new report ready? and news from the geo ip stuff? [17:36:19] I'm downloading all the geo ip databases [17:36:28] then I'm going to run reports against all of them with a sample dataset [17:36:39] and I'll find out on which of them India or Uganda is off [17:36:47] about the new report, it reached November 2012 [17:36:58] it's going to be ready quite soon [17:37:10] ~20-30m [17:37:19] max [17:39:26] guys, low on batt, gotta move [17:39:28] be back on for standup [17:59:02] yaaaawn [18:16:09] this is annoying: 17:20 UTC We are preparing to perform some unscheduled fileserver maintenance. This will affect a rolling set 0.2% of repositories, which will be unable to push or pull for the duration of the maintenance. [18:16:27] from https://status.github.com/messages [18:19:38] ha, aww [18:24:09] thanks preilly [18:24:58] drdee: np [18:56:21] erosen, tell me how to hit vumi! [18:56:28] k [18:56:39] so it is running a jabber / xmpp client [18:57:27] so all you should need to do is get an xmpp client pointed at wikipediavumi [18:57:28] ok [18:57:32] and wikipediavumitest [18:57:33] ok, is that the production one? [18:57:40] wikipediavumi@? [18:57:43] gmail [18:57:45] sorry [18:57:46] k [18:57:52] i'm going to try from gchat instead of adium [18:57:57] yeah that is how I've done it so far [18:58:07] you ping the test accnt with anything [18:58:10] like 'hi!' 
[18:58:18] and then it starts off the ussd dialog [18:58:38] then when you confirm that you want some article content, it sends it to you from the wikipediavumi accnt [18:58:55] ok yeah it works from gmail [18:59:00] actually i got those backwards [18:59:07] you ping wikipediavumi [18:59:08] to start it [18:59:14] and then the content is sent via vumitest [18:59:31] back [18:59:32] ? [18:59:39] so, is that the labs instance or the production one? [18:59:45] ooo [18:59:50] good q [18:59:52] i suspect the test [18:59:56] hm [19:00:00] hadn't thought about that [19:00:08] yeah labs [19:00:15] that works though [19:00:16] from gmail [19:00:19] just not from adium [19:00:21] so that was my problem [19:01:09] I have no idea how to actually use ussd, so it seems like we would just want to get the same xmpp thing set up for the prod instance [19:01:20] it might be easily reconfigurable [19:01:34] we just point it at a different server [19:01:37] do you know if it is live at all? [19:01:42] are people using it via sms [19:01:43] ? [19:02:00] not sure [19:02:06] i suspect it doesn't get much use [19:02:12] preilly is not far physically [19:02:15] maybe I will go grab him [19:02:25] oh he's around? cool [19:02:51] this seems like what we need: python /usr/bin/twistd -n --pidfile=/tmp/wikipedia_xmpp_transport_1.pid start_worker --worker_class=vumi.transports.xmpp.XMPPTransport --vhost=/develop [19:03:22] vhost [19:03:23] ? 
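For reference, the twistd invocation pasted above, reformatted with annotations. The reading of `--vhost` as an AMQP virtual host is an inference from vumi's RabbitMQ-based architecture, not something stated in the log:

```shell
# Run twistd in the foreground (-n, no daemonizing), write a pidfile,
# and boot vumi's start_worker plugin with the XMPP transport class.
python /usr/bin/twistd -n \
    --pidfile=/tmp/wikipedia_xmpp_transport_1.pid \
    start_worker \
    --worker_class=vumi.transports.xmpp.XMPPTransport \
    --vhost=/develop   # likely the AMQP (RabbitMQ) virtual host, not an HTTP vhost
```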
[19:08:29] k [19:08:31] just talked to preilly [19:08:53] says there is a config somewhere with some yaml which tells it which gmail accounts to listen on [19:09:00] hm ok [19:09:10] i'm actually trying to make udp2log work again on the labs instance now first [19:09:16] (now that I can test it is easier) [19:09:17] k [19:09:22] it looks like the metrics processes aren't doing anything [19:09:27] hmm [19:09:28] they don't even write to their own log files [19:09:31] hard to tell though [19:09:35] weird [19:10:31] hmmm wait oh yeah that's right [19:10:36] File "/home/jerith/vumitest/vumi/vumi/blinkenlights/metrics_workers.py", line 314, in startProtocol [19:10:36] self.transport.connect(self._ip, self._port) [19:10:36] File "/usr/lib/python2.7/dist-packages/twisted/internet/udp.py", line 195, in connect [19:10:36] self.socket.connect((host, port)) [19:10:36] File "/usr/lib/python2.7/socket.py", line 224, in meth [19:10:37] return getattr(self._sock,name)(*args) [19:10:37] socket.error: [Errno 13] Permission denied [19:10:38] hmm [22:08:27] drdee, ottomata, milimetric: please review https://www.mediawiki.org/wiki/Analytics/Kraken/Security_Review_Meeting [22:08:35] it's a work in progress, so add things as you see fit [22:09:27] dschoon, see also [22:09:27] https://rt.wikimedia.org/Ticket/Display.html?id=4433 [22:09:34] awesome, ty [22:31:33] i added it to the page. [22:31:43] it'd be good if people *actually* reviewed the content :) [22:31:46] i'm sure i didn't get everything [22:31:53] drdee, especially the data retention section [22:32:09] you are on point for our data release policy, so it'd be good if you revised that section [22:32:17] i can only comment on the technical specifics of our storage [22:38:43] dschoon - shouldn't I mention Limn on that page? Or is the scope exclusively back-end stuff? 
[22:39:04] let's keep it focused, it will be hard enough to keep the meeting focused [22:39:44] i agree with drdee [22:39:48] that's application security [22:40:01] this is a system architecture review, milimetric [22:40:18] good eve'n boys! [22:40:26] cheers, ottomata [22:40:28] man [22:40:28] i go to find potatoes and leeks [22:40:32] i forgot i did http://upload.wikimedia.org/wikipedia/mediawiki/1/1b/Kraken_flow_diagram-2.png [22:40:32] later bud [22:40:39] hot damn is that a beautiful diagram [22:40:51] system architecture - Limn servers would be part of that, no? [22:41:42] limn runs on port 80. [22:41:48] it is therefore covered in the ACL. [22:42:15] i agree it's important to do an application security review [22:42:22] but that's not really in the understood scope [22:42:33] and like drdee said, we need to make sure we get through the stuff already on our plate [22:43:22] cool, makes sense [22:59:41] milimetric, dschoon: what's the url to visualize a custom time series data source? I can't find any "Add new" button in http://test-reportcard.wmflabs.org/datasources [23:00:01] it's at the very top [23:00:02] it [23:00:02] wait, I'm an idiot [23:00:05] 's the title [23:00:06] yeah [23:00:06] right [23:00:13] i have a TODO to give that a proper UI [23:00:14] title blindness [23:00:17] yeah [23:00:25] bootstrap misleads you into thinking someone other than an engineer made the UI :) [23:02:59] dschoon, milimetric: I just gave a 30 sec demo to howie, that's pretty neat [23:05:20] how did it go? [23:53:38] DarTar: he like it? [23:54:06] yes he did :) [23:54:10] sweet