[00:33:54] am I allowed to use the mongodb on stat1 ? [00:47:00] got the bots showing up in the reports [00:47:09] I'll make a short and quick experiment with redis [00:47:17] locally [00:47:32] I want to make a producer that gunzips files and throws log lines in redis [00:47:36] and then fire up 4 consumers [00:47:52] and see how much time they take to consume one day of data [00:48:37] I'm curious what will happen in terms of runtime [00:48:55] is there any restriction on stat1 ? like normally only 2/16 cores are used [00:49:15] it's sort of bursty in my experience [00:49:16] and I was thinking if nobody else uses the other 14 cores, I might have something for those cores to do [00:49:22] erosen: bursty ? [00:49:22] like right now i'm using 15…;) [00:49:32] like most of the time nothing [00:49:42] and then sometimes a lot of use [00:49:48] usually just by one person [00:51:07] erosen: do you have background jobs of your own ? [00:51:22] not really [00:51:36] hm ok [00:51:37] usually i'm just trying to churn through some data, so I spin out a bunch of processes [00:51:43] (like now) [00:52:55] average_drifter: do you need some stat1 cores now? [00:53:11] I can hold back a bit [00:53:35] erosen: not right now, no [00:53:41] kk [00:53:54] right now I'll do it locally, but I will move to stat1 if I get positive results [00:53:59] cool [00:54:08] i'm out for the day [00:54:11] later [00:54:18] see you soon :) [01:05:39] dschoon you're gonna love this [01:05:40] # FIXME: donkey-punching timeseries module to parse dates better [01:05:41] CSVData.prototype.parseDate = (s) -> moment(s, "YYYY-MM-DD_HH").toDate() [01:40:38] average_drifter: am I allowed to use the mongodb on stat1 ? [01:40:57] yes you are but don't go crazy with optimizing this ;) [01:41:50] ok no problem [01:56:43] !log Limn just released a new version to production! Head over to http://reportcard.wmflabs.org/ and check it out. For a peek at hourly mobile pageview data and maps, check: http://dev-reportcard.wmflabs.org/graphs/pageviews_mobile_hourly and http://dev-reportcard.wmflabs.org/graphs/editors_by_geo [01:56:46] Logged the message, Master [02:06:01] milimetric: nice job on the hourly graph [02:06:09] thanks bud [02:06:15] credit's all yours though [02:06:21] that was easy data [02:06:37] oh, I made my own datasource though, for I do not speak that bizarre !!str language [02:06:46] hehe [02:06:59] that is because yaml treats the word "no" as false [02:07:03] and breaks limn [02:07:13] unless you explicitly give each value a type argument [02:07:34] i was gonna ask about the datasource actually [02:07:41] assuming it hadn't looked so weird [02:07:46] could you have even used the old format?
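A minimal sketch of the Redis experiment average_drifter outlines at 00:47 above: one producer gunzips a log file and pushes its lines onto a Redis list, and four consumers drain it. This assumes the redis-py client; the filename, key name, and sentinel are hypothetical placeholders, not anything named in the channel.

```python
# Sketch only: producer gunzips a log and RPUSHes lines; consumers BLPOP.
# "access.log.gz", the "loglines" key, and "__DONE__" are invented names.
import gzip
import multiprocessing

import redis  # redis-py, assumed installed

QUEUE = "loglines"
NUM_CONSUMERS = 4

def produce(path):
    r = redis.Redis()
    with gzip.open(path, "rt", errors="replace") as f:
        pipe = r.pipeline()
        for i, line in enumerate(f, 1):
            pipe.rpush(QUEUE, line)
            if i % 10000 == 0:  # batch pushes to cut network round trips
                pipe.execute()
        pipe.execute()
    for _ in range(NUM_CONSUMERS):  # one stop sentinel per consumer
        r.rpush(QUEUE, "__DONE__")

def consume(worker_id):
    r = redis.Redis()
    count = 0
    while True:
        _, line = r.blpop(QUEUE)  # blocks until a line is available
        if line == b"__DONE__":
            break
        count += 1  # real parsing/counting would happen here
    print(f"consumer {worker_id} handled {count} lines")

if __name__ == "__main__":
    workers = [multiprocessing.Process(target=consume, args=(i,))
               for i in range(NUM_CONSUMERS)]
    for w in workers:
        w.start()
    produce("access.log.gz")
    for w in workers:
        w.join()
```

Timing the whole run (e.g. with `time`) gives the consume-one-day-of-data figure the experiment is after.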
[02:08:12] yes with only one exception [02:08:16] type: timeseries is required [02:08:22] at the top level [02:08:32] that's the only thing we did to update it to the new limn [02:08:44] gotcha [02:08:51] cool [02:09:10] this is the thing about YAML that I never knew until I had some super weird bug http://blog.teamlazerbeez.com/2009/04/15/yaml-gotchas/ [02:09:38] ooh crazy [02:09:51] good to know, thank you [02:10:10] gonna go crash now :) [02:10:14] later [02:10:19] see you tomorrow [03:14:58] drdee: http://lists.gnu.org/archive/html/parallel/2011-11/msg00004.html [03:15:17] drdee: I tried unpigz -c on a sampled file from stat1 [03:15:26] unpigz vs gzip -dc [03:15:30] 10s vs 18s [03:15:57] although unpigz uses threads [03:16:08] and gzip -dc is just one process [03:18:39] http://zlib.net/pigz/ [03:31:48] 1 sec [03:32:17] so that's a quick win :) [03:36:53] yep, have to try it [05:16:57] !log restarting analytics udp2log instance on an03, an04, an05, an06…something is either weird with them or with the consumer [05:22:19] !log restarting analytics udp2log instance on an03, an04, an05, an06…something is either weird with them or with the consumer [12:47:56] morning! [13:42:51] moooooorning guys! [14:22:33] morning [14:22:41] we got 0 byte import file sizes, woohooo! [14:23:21] for which stream? all? [14:24:46] yeah looks like it [14:24:54] grumble [14:26:57] were you notified by ganglia or something? [14:34:22] no, i just looked, [14:34:40] saw it last night right before bed [14:34:45] i restarted udp2log instances hoping it would help [14:35:14] i'm not sure what's happening, all of the numbers look ok [14:35:24] there are bytes streaming into kafka, according to the monitor stuff I have going [14:35:30] k [14:35:47] same issue as after xmas? [14:39:49] ah no, they don't look the same….i'm not sure yet [14:39:53] investigating [14:40:18] no, not the same issue as then, at that time I was seeing dropped packets from the kernel queue [14:40:26] k [14:42:05] totally weird [14:42:05] http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&h=analytics1005.eqiad.wmnet&c=Analytics+cluster+eqiad&m=kafka_producer_KafkaProducerStats-webrequest-wikipedia-mobile.ProduceRequestsPerSecond [14:44:08] uuuuuhhhhhhhhh [14:46:29] yeah [14:46:41] my test udp2log instance on an08 seems just fine [14:46:55] !log stopping udp2log on analytics1005 to investigate [14:46:58] Logged the message, Master [14:47:39] doesn't seem to be a udp2log problem [15:01:15] found the problem, not really the cause though [15:01:27] udp2log tried to restart java processes, but for whatever reason the other ones didn't shut down [15:01:40] so it can't start them back up, because the ports are in use. [15:02:42] so it is a udp2log problem [15:02:44] sorta [15:05:02] welp, that sucked [15:05:09] they should be all back up and running now [15:05:52] weird that they all stopped at the same time, I guess either the kafka producers or the udp2log daemons all do something nasty either after a certain amount of time or a certain number of bytes [15:06:11] !log restart analytics udp2log instances. I had to kill all udp2log processes manually before restarting. [15:06:13] Logged the message, Master [15:09:48] mmmmmm, hopefully you can find out what causes this [15:14:41] milimetric: how was the geo_editors graph constructed? [15:15:01] /graphs/editors_by_geo? [15:15:02] or drdee ^^ [15:15:05] yeah [15:15:19] you mean from what datasource or how?
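The YAML gotcha milimetric describes above is easy to reproduce: YAML 1.1 parses bare no/yes/on/off as booleans, so a value like "no" (e.g. the Norwegian language code) silently becomes False unless it gets an explicit type tag or quotes. A three-line demo using PyYAML (the chat doesn't say which YAML library Limn's tooling uses, so treat this as illustrative):

```python
import yaml  # PyYAML, assumed available

print(yaml.safe_load("lang: no"))        # {'lang': False} -- the surprise
print(yaml.safe_load("lang: !!str no"))  # {'lang': 'no'}  -- explicit type tag
print(yaml.safe_load("lang: 'no'"))      # {'lang': 'no'}  -- quoting also works
```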
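The unpigz result above (roughly 10s vs 18s against gzip -dc on a sampled stat1 file) is also easy to fold into a pipeline. A sketch that streams a gzipped file through unpigz when it is installed and falls back to plain gzip otherwise; the filename is hypothetical:

```python
import shutil
import subprocess

def gunzip_lines(path):
    # Prefer unpigz (threaded decompression) over single-process gzip -dc.
    if shutil.which("unpigz"):
        cmd = ["unpigz", "-c", path]
    else:
        cmd = ["gzip", "-dc", path]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    for line in proc.stdout:
        yield line
    proc.wait()

for line in gunzip_lines("sampled.log.gz"):  # hypothetical file
    pass  # hand each line to whatever consumes it
```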
[15:15:26] which datasource [15:15:33] just curious where the data came from [15:15:36] fabian's editorship data [15:15:39] i see [15:15:43] that had editors5, editors100, and editors [15:15:45] so it is rather old then [15:15:48] yes [15:15:50] aah [15:16:09] david's not done with it, 'cause of his plague thing :) [15:16:17] right [15:16:20] well cool [15:16:25] he's gonna add an infobox [15:16:32] I just ask because I have a database with that info updated daily [15:16:38] :) [15:16:43] that's why we love you Evan [15:16:47] well, other reasons too [15:16:50] but mainly that [15:16:52] hehe [15:17:19] I was also thinking about the possibilities when using a url based datasource to just make a request to a webserver [15:17:31] so maybe one thing to work on would be porting that analysis to kraken and pointing the graph to hdfs [15:17:34] which sends back the appropriate datafile [15:17:47] yeah, definitely [15:17:59] cool [15:18:12] remote datasources open up a lot of possibilities. Though I'd think for now we want to focus on the Kraken -> Limn pipeline [15:18:17] and branch out from there [15:18:19] yeah I'm not sure if kraken is ready for edit events, but I'll keep an eye out [15:18:42] yeah, that priority list is above my paygrade [15:18:45] though they might be included in the event logging stream [15:18:49] hehe [15:18:58] yeah, so if we get kraken to consume event logging stuff, we'd be all set [15:19:02] I think that is a priority [15:19:36] indeed [15:20:09] kraken is already consuming event logging data [15:20:35] yup! [15:20:44] hit event.gif and you will get a log in kraken! [15:20:48] "using a url based datasource" - would json-ld be something to investigate? [15:20:50] drdee, ok, here's what I think happened [15:21:17] json-ld => http://json-ld.org/ [15:21:29] for some reason, the kafka producers were not able to connect to the brokers at around 2013-01-08 13:39 [15:21:29] just looking at it now [15:21:46] i see these in kafka.log [15:21:46] [2013-01-08 13:39:44,920] 15654440 [ProducerSendThread-836611963] ERROR kafka.producer.SyncProducer - Connection attempt to 10.64.36.121:9092 failed, next attempt in 10 ms [15:21:47] ottomata: i have some ideas on that [15:21:55] I was running a pig job last night [15:22:04] which weirdly took way longer than it did the last time I had run it [15:22:15] and I got a low memory warning from Pig [15:22:19] hm [15:22:27] ahum indeed [15:22:35] most likely it is a common cause [15:22:39] i don't see how that could be related to udp2log + kafka though [15:22:48] they run on different machines than your pig stuff [15:22:50] isn't an01 in the cluster?
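One way to line up the SyncProducer errors quoted above against the broker timeline: count connection failures per minute, per broker, straight out of kafka.log. The log path here is hypothetical, and the regex is keyed to the 13:39:44 line format shown in the chat:

```python
# Sketch: bucket SyncProducer connection failures by minute and broker.
import re
from collections import Counter

PATTERN = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}).*"
    r"ERROR kafka\.producer\.SyncProducer - Connection attempt to (\S+) failed")

failures = Counter()
with open("/var/log/kafka/kafka.log") as log:  # hypothetical path
    for line in log:
        m = PATTERN.match(line)
        if m:
            minute, broker = m.groups()
            failures[(minute, broker)] += 1

for (minute, broker), n in sorted(failures.items()):
    print(f"{minute}  {broker}  {n} failed attempts")
```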
[15:22:53] aaah [15:22:57] an02-05 [15:22:59] i meant [15:23:07] but these aren't running on an01, and even so, the pig script you run doesn't run on an01 when you submit it [15:23:18] it runs on an10-an21 [15:23:23] i seeeee [15:23:27] okay, nvm then [15:23:43] but, since all of the producers had a broker connection problem at the same time [15:23:50] that seems like either a networking issue, or a broker issue [15:24:39] ottomata: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=mem_free&s=by+name&c=Analytics+cluster+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [15:25:22] free memory is very low on most nodes [15:27:52] ok, an21's kafka broker restarted at the same time this started happening [15:29:18] http://ganglia.wikimedia.org/latest/?r=month&cs=&ce=&m=disk_free&s=descending&c=Analytics+cluster+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [15:29:34] shows free disk space; i am not sure if the y-axes are correct [15:29:49] Jan 8 13:39:31 analytics1021 kernel: [5146389.359914] init: kafka main process (19429) terminated with status 1 [15:29:50] Jan 8 13:39:31 analytics1021 kernel: [5146389.359967] init: kafka main process ended, respawning [15:30:44] ok, i'm not sure why the kafka broker restarted, but [15:30:49] even if it did, normally that would not be a problem [15:31:27] but, because it did, the kafka producers did something funky [15:31:35] udp2log tried to restart them, but they never really shut down [15:31:43] so the old processes were using the ports [15:31:49] so udp2log couldn't restart them [15:35:08] so the restarting of udp2log seems to be the weakest link in the chain, [15:35:18] that was also an issue right after xmas [15:35:56] ottomata, are you worried about the ganglia charts or is that info incorrect? [15:36:47] yeah, what, the free mem? [15:36:49] no i'm not worried about free mem [15:36:58] and free disk space? [15:37:03] kernel will allocate memory and not free it up until it needs to [15:37:12] k [15:37:43] what are those units? [15:37:53] that's my question as well :) [15:38:09] 19k of what :) [15:38:24] kb / mb / gb / :D [15:38:25] GB [15:38:45] well that's not obvious from the graph [15:38:54] yeah, i found it by going to the actual graph of a few [15:39:00] k [15:39:07] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Analytics+cluster+eqiad&h=analytics1027.eqiad.wmnet&v=128.702&m=disk_free&jr=&js=&vl=GB&ti=Disk+Space+Available [15:39:21] ok [15:45:12] hmmmmmmmmm drdee, i think I am wrong [15:45:14] the times don't match up [15:45:20] yes kafka broker on an21 did restart [15:45:25] but that was on jan 08 13:49 [15:48:10] not jan 10 05:24 (that's when we started losing produce events) [15:55:43] sigh [15:58:52] ? [16:00:21] pretty stumped at the moment [16:00:28] here's something super weird [16:01:13] between 2013-01-10_00:15 [16:01:13] 2013-01-10_02:51 [16:01:20] there weren't any consumed directories created [16:01:26] but [16:02:26] once they started up again [16:02:40] the directories for each 15 minute interval were created [16:04:28] for the missing time period? [16:05:20] yeah, and this is before we started seeing 0 filesizes [16:05:51] but, the directories for those 15 minute intervals between those times [16:05:55] each contain much more data than usual [16:06:34] because kafka held it in its buffer?
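The failure mode ottomata pins down above (old producer processes never exited, so the respawned ones couldn't bind their ports) can at least be detected before a restart with a bind test. A sketch, with 9092 (the Kafka default from the error above) standing in for whatever port the process actually uses:

```python
# Sketch: refuse to respawn while the listen port is still held.
import socket

def port_is_free(port, host="0.0.0.0"):
    """Try to bind the port; a failed bind means something still holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

if not port_is_free(9092):
    print("port 9092 still in use -- kill the old process before respawning")
```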
[16:07:32] basically, each of those 15 minute intervals between [16:07:33] 2013-01-10_00.30.00 [16:07:33] and [16:07:33] 2013-01-10_02.45.00 [16:07:33] contains data for logs between [16:07:34] 2013-01-10T00:15:25 [16:07:34] and [16:07:34] 2013-01-10T01:28:00 [16:07:53] odd [16:07:58] i know [16:08:00] (to say the least) [16:08:03] pretty stumped [16:08:13] then at 2013-01-10_03.00.00 [16:08:17] 0 data starts [16:08:18] another reason to go to storm? [16:08:37] that would help keep the buckets sorted properly [16:08:48] but, wouldn't solve the missing data problem [16:09:07] you know, flume has rolling timestamp-based bucketing for hdfs imports built in :p [16:09:18] maybe oozie recreated those folders? [16:09:28] hmmmmmmmmmMMmmmmm, nooooooo [16:09:32] because there is actual data in them [16:09:48] ok [16:10:14] you know, i hate to admit this, but in this hybrid situation we have right now (udp2log and no storm) [16:10:25] flume would be better suited for what we are doing (importing log data into hdfs) [16:10:35] HAHAHA [16:10:43] well, bring it up in scrum today :) [16:10:47] k [16:10:53] do we have scrum? is metrics meeting today? [16:11:08] yes but we should do a quick scrum [16:11:18] what time is metrics meeting? same time as scrum? [16:11:23] 30 minutes later [16:11:33] ok cool, then yeah for sure [16:11:36] 130 est [17:39:58] man i am so confused [17:40:11] there are so many pieces of this problem that seem to be out of place, I keep trying to summarize it but then run into other things [17:40:25] drdee, or dschoon (if you are up) can I call you and walk you through what I'm looking at? [17:40:37] i think it will help if I say it out loud [17:41:55] ottomata https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90 [17:43:39] ottomata ^^ [17:50:29] http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=analytics1003.eqiad.wmnet&c=Analytics%20cluster%20eqiad&m=kafka_producer_KafkaProducerStats-webrequest-all.100.ProduceRequestsPerSecond&r=day [18:00:59] morning [18:32:55] it really is a nice graph [18:39:39] ROFLOL: http://en.wikipedia.org/wiki/User:$5000_Heroin_Per_Day [18:39:49] and scroll to "What's up with the name?" [19:01:35] http://www.youtube.com/watch?v=Bt68Yd1mt_k&feature=youtu.be [19:01:42] that's the stream link, in theory [19:30:42] average_drifter: do you have a sec for a quick wikistats question? [19:31:01] i'm interested in /a/wikistats_git/dumps/csv/csv_wp/StatisticsMonthly.csv again [19:39:16] new drinking game - we drink every time someone mentions funnel analysis [19:58:25] really? but there are so many moments i wouldn't be drinking! [20:01:21] erosen: go ahead with the question [20:01:32] erosen: you can't find the csv ? [20:01:32] it looks like it was updated on jan 3 [20:01:37] i found it [20:01:44] ok [20:01:45] but I'm looking for december data [20:01:59] erosen: what are you interested in ?
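For reference, the directory names in this exchange are 15-minute import buckets, and the flooring arithmetic that should decide where a log line lands is trivial, which is part of what makes the mismatch above suspicious. One plausible reading (not confirmed in the chat) is that buckets were named by wall-clock flush time while the consumer caught up, rather than by the timestamps inside the lines. A sketch of the flooring:

```python
from datetime import datetime

def bucket_start(ts, minutes=15):
    """Floor a timestamp to the start of its 15-minute import bucket."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)

first = datetime(2013, 1, 10, 0, 15, 25)  # first log timestamp quoted above
print(bucket_start(first).strftime("%Y-%m-%d_%H.%M.%S"))  # 2013-01-10_00.15.00
```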
[20:02:01] looking through it, seems like some languages are finished and others aren't [20:02:02] ar [20:02:03] arabic [20:02:09] and a few others actually [20:02:23] i guess my question is more broadly about how this file gets generated [20:03:17] and what I can expect in terms of updates (as I am using the file for some graphs that I have on a dashboard) [20:03:30] updates *should* be monthly [20:03:40] poke erikz directly about this [20:03:41] spetrea@stat1:/a/wikistats_git/squids/perl$ ack "StatisticsMonthly.csv" [20:03:43] pure chutzpah and braggoticato [20:03:48] VizPrepJs.pl [20:03:53] erosen: VizPrepJs seems to do this [20:04:13] i already have the graphs: http://global-dev.wmflabs.org/graphs/ar_wp_active_editors [20:04:35] i'm just trying to figure out when I should tell people that it will be updated [20:12:49] okay, lunch. [21:04:50] drdee: https://github.com/TheWeatherChannel/dClass/commit/3f1cd74128fbe1f77ee3daf5016a83a1385d4a8d [21:04:53] drdee: Reza is a nice guy [21:05:01] drdee: do you know him personally ? [21:05:11] no [21:05:22] drdee: how did you find dClass ? [21:05:31] drdee: you sent me that e-mail, where did it come from ? :) [21:05:42] that's a secret my friend ;) [21:05:48] :D [21:06:04] it was on the mailing list of the apache device map project [21:06:15] oh, I understand [21:06:18] a project that i have been trying to help a little bit [21:06:24] it's an incubator project [21:06:26] very small [21:23:30] back [21:51:44] drdee: what is the status of x-carrier headers? [21:51:58] semi-deployed :) [21:52:05] that's something that we need to look into [21:52:11] it's not yet working as expected [21:52:29] not working, as expected…. hehe [23:24:33] hey drdee [23:24:40] yo [23:24:46] was waiting for you in the hangout [23:24:48] my calendar forgot to remind me of the meeting, do you have time now? [23:24:55] gah - sorry :( [23:26:57] yup [23:27:05] ok, let me find a room [23:35:39] http://meta.wikimedia.org/wiki/Research:Metrics#Funnel_metrics
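erosen's question above (which languages already have December rows in StatisticsMonthly.csv) is scriptable. The column layout below, language code first and month second, is purely an assumption for illustration; check the actual file on stat1 before trusting the output:

```python
# Sketch: report the latest month present per language in the CSV.
import csv
from collections import defaultdict

PATH = "/a/wikistats_git/dumps/csv/csv_wp/StatisticsMonthly.csv"

latest = defaultdict(str)
with open(PATH) as f:
    for row in csv.reader(f):
        if len(row) < 2:
            continue
        lang, month = row[0], row[1]  # assumed columns; verify the real layout
        if month > latest[lang]:      # string compare works if months are YYYY-MM
            latest[lang] = month

for lang in ("ar", "en", "de"):
    print(lang, latest.get(lang, "no rows found"))
```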