[00:39:21] Analytics / Wikimetrics: Admin script is really slow now that there are lots of reports - https://bugzilla.wikimedia.org/70775 (Kevin Leduc)
[00:40:06] Analytics / Visualization: Story: EEVSUser loads static site in accordance with Pau's design - https://bugzilla.wikimedia.org/67806 (Kevin Leduc)
[00:41:00] Analytics / Wikimetrics: Story: WikimetricsUser runs 'Rolling Recurring old active editors' report - https://bugzilla.wikimedia.org/69569 (Kevin Leduc)
[00:42:36] Analytics / Dashiki: Story: Bookmarks / Stateful URL. Define protocol and use it to bootstrap the dashboard and keep state - https://bugzilla.wikimedia.org/70887 (Kevin Leduc) p:Unprio>Immedi s:normal>enhanc
[05:40:37] Analytics / Tech community metrics: Allow contributors to update their own details in tech metrics directly - https://bugzilla.wikimedia.org/58585#c28 (tessyjoseph199@gmail.com) Hi, I would like to work on this project for OPW round 9. I could not find the github repo and also I would like to know the...
[07:59:48] (CR) Gilles: [C: 2] Dedicated page for UW funnel analysis [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/161290 (owner: Gergő Tisza)
[08:24:14] (CR) Gilles: [V: 2] Dedicated page for UW funnel analysis [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/161290 (owner: Gergő Tisza)
[09:11:22] Analytics / Tech community metrics: Allow contributors to update their own details in tech metrics directly - https://bugzilla.wikimedia.org/58585#c29 (Andre Klapper) Hi Tessy! Thanks for your interest! (In reply to tessyjoseph199@gmail.com from comment #28) > I could not find the github repo Please...
[10:57:06] Analytics / Tech community metrics: Graphs for median/average should report absolute numbers - https://bugzilla.wikimedia.org/66266#c3 (Alvaro) Quim, you now have the number of reviews used for computing the age of open changesets in: http://korma.wmflabs.org/browser/gerrit_review_queue.html There ar...
[11:07:39] Analytics / Tech community metrics: Allow contributors to update their own details in tech metrics directly - https://bugzilla.wikimedia.org/58585#c30 (Quim Gil) This search query points to Tech community metrics bugs with the "easy" keyword: https://bugzilla.wikimedia.org/buglist.cgi?component=Tech%20...
[13:50:38] qchris: hiya!
[13:51:13] hallo ottomata
[13:51:45] hm, i was about to offer a quick summary of magnus's and my inconclusive kafkatee findings yesterday, but i guess i'll just wait for standup... that's what standup is for!
[13:52:11] Either way is fine by me :-)
[13:52:35] (IRC has the advantage that I can grep for it ... no grep for hangouts)
[13:57:23] qchris: this affects us too, although I am unsure if it is related to our kafkatee problems
[13:57:23] https://github.com/edenhill/librdkafka/issues/147
[13:57:38] * qchris looks
[13:59:39] Interesting
[14:01:48] The issue feels way less awkward when seeing that others are suffering similar issues too.
[14:02:40] ottomata: Standup time \o/ Tell us about librdkafka issues :-)
[14:03:04] SNAKEY
[14:03:05] SNEAKY
[14:28:24] qchris: pybal has moved to an internal host
[14:28:34] http://config-master.eqiad.wmnet/pybal/
[14:28:58] Thanks!
[14:29:17] But why internal? :-(
[14:30:20] more like: why external, i think
[14:30:25] mark said we could proxy it if we need to
[14:30:33] i liked it public, but I don't really have a need for it to be... do you?
[14:30:36] you can curl it?
[14:30:51] I can curl it from stat1002, which is the immediate need.
[14:30:54] aye
[14:30:54] So it works.
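
For context, the pybal pool files under config-master are served as plain text over HTTP, so checking one from stat1002 is a single curl. A minimal sketch, assuming a hypothetical "eqiad/text" pool path (the real paths under /pybal/ are not shown in this log):

    # Fetch a pybal pool config from the internal config host and show
    # the first few entries; the pool path here is an assumption.
    curl -s http://config-master.eqiad.wmnet/pybal/eqiad/text | head
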
[14:30:56] yeah
[14:31:11] But external is nice, because people like to look at our services and their configs.
[14:31:21] yeah
[14:31:24] i liked it too
[14:31:29] I mean ... we had icinga open.
[14:31:42] i think ops would be ok with making it public, but it would take someone asking for it :)
[14:31:43] and only closed it because the webui has holes.
[14:31:45] ahem, you? :)
[14:32:04] get matanya or jeremyb to back you up :)
[14:32:07] I guess that's asking on the ops list ... or creating an RT ticket?
[14:32:12] ja, either
[14:32:20] is matanya on the ops list?
[14:32:27] i am
[14:32:32] ok :-)
[14:32:33] what do you need?
[14:32:56] Support when I ask ops to open up pybal again
[14:33:02] and make it readable to the world again.
[14:33:33] you have it :D
[14:33:44] Thanks :-D
[14:35:18] oh qchris
[14:35:18] https://gerrit.wikimedia.org/r/#/c/160467/1
[14:35:43] and _joe_ is saying he wants to make it public anyway, just hadn't done so yet
[14:36:09] Thanks.
[14:38:33] qchris: i'm going to try some things with analytics1021, then will look into syncing from hdfs/hive stuff
[14:39:32] Any chance we could switch them around, because I want to catch some of today's sunrays :-)
[14:39:42] ah!
[14:39:42] sure!
[14:39:46] we can!
[14:39:47] :)
[14:39:50] Cool.
[14:39:54] So about the syncing ...
[14:39:59] fyi, i just checked up on that zookeeper mismatch with 1021
[14:40:11] I really would want to produce the files through oozie.
[14:40:22] any partition for which 1021 is a replica but is not the leader has an ISR that does not include 1021
[14:40:27] That would help a ton when backfilling and catching up after issues.
[14:40:28] but, ok, onto oozie!
[14:40:40] hive can write to the plainfs.
[14:40:47] But that is plainfs on the datanode.
[14:40:56] as in, a separate oozie job that is triggered by the existence of the dataset? or just another action?
[14:41:05] a separate job.
[14:41:40] Because then failures are more isolated.
[14:42:53] So if we write to plainfs on the datanode ... how can we get the data from there to {dumps,datasets}.wikimedia.org
[14:42:55] ?
[14:43:02] aye cool, and then the job is reusable too!
[14:43:04] After all, we do not know which datanode wrote it.
[14:43:04] :)
[14:43:10] So I thought ...
[14:43:12] hm, there's got to be a better way
[14:43:18] * hashar doesn't understand anything
[14:43:20] Could we have an nfs mount that's mounted on all datanodes?
[14:44:02] eeeef, probably not :/
[14:44:04] hMMMM
[14:44:06] or, hm
[14:44:13] there is a dumps mount of some kind... but it might be read only
[14:44:17] and i think we are discouraged from using it
[14:44:32] Then (whatever datanode is in charge) could write to that mount, and we can pick it up from there and bring it to dumps.wikimedia.org
[14:44:36] e.g. stat1002:/mnt/data
[14:44:47] ja it is mounted ro though
[14:45:05] that is a good idea, but i think we are very discouraged from using nfs
[14:45:06] hmm
[14:45:11] It need not be an nfs mount from dumps.wikimedia.org
[14:45:15] aye
[14:45:17] but still
[14:45:20] i think we are discouraged
[14:45:22] but maybe
[14:45:41] Ok.
[14:45:47] What alternatives are there?
[14:46:04] We could use cron to run hive to extract the data.
[14:46:09] another more primitive idea: i betcha we could install a hadoop client on dumps and just run a cron hdfs dfs -get
[14:46:09] :/
[14:46:12] or hive
[14:46:13] But cron is not aware of _SUCCESS files.
[14:46:16] yeah
[14:46:19] oozie is better here for sure
[14:46:31] hadoop client on dumps?
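
The cron-plus-hadoop-client idea could look roughly like the sketch below. The HDFS and local paths are made up for illustration, and the _SUCCESS test stands in for the dataset-completeness awareness that oozie would otherwise provide:

    #!/bin/bash
    # Hypothetical paths; the real pagecount archive layout is not
    # shown in this log.
    src=/wmf/data/archive/pagecounts/2014-09
    dest=/srv/dumps/pagecounts/2014-09
    # Only copy once the producing job has written its _SUCCESS marker.
    if hdfs dfs -test -e "$src/_SUCCESS"; then
        mkdir -p "$dest"
        hdfs dfs -get "$src"/* "$dest"/
    fi

As noted above, this loses oozie's scheduling and rerun handling, which is why the discussion keeps circling back to oozie-produced files.
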
[14:46:36] Whoa. That'd be great.
[14:46:44] Oh wait.
[14:47:07] That does not help with oozie...
[14:47:15] Oh! Have hive produce
[14:47:21] the data files onto the cluster again.
[14:47:34] ?
[14:47:37] and then use the hadoop client to get them onto dumps.
[14:47:41] you're right.
[14:47:49] I misunderstood you before.
[14:48:11] hadoop clients on dumps.wikimedia.org would help.
[14:48:24] yeah, would help
[14:48:26] still annoying because you couldn't easily reschedule it
[14:48:30] and there's no hdfs rsync :/
[14:48:32] The files are currently snappy compressed, but we can turn compression off.
[14:48:44] (for the pagecount files)
[14:49:19] If we go down that road ... where does the gzip happen?
[14:49:26] maybe this could be useful somehow?
[14:49:26] I guess directly on dumps?
[14:49:26] https://wiki.apache.org/hadoop/MountableHDFS
[14:49:36] although i heard it isn't that reliable... but that was a while ago
[14:49:55] I would not want to rely on fuse.
[14:50:07] It broke many times on me.
[14:50:08] whoa
[14:50:08] https://github.com/cloudera/hdfs-nfs-proxy
[14:51:11] That would be nice.
[14:51:16] haha, if it worked
[14:51:19] Even comes from cloudera :-)
[14:51:43] http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_hdfs_mountable.html
[14:51:50] i dunno, i wonder if it wouldn't be so bad, just for reading
[14:52:08] use that on dumps to mount hdfs and then just rsync from it?
[14:53:30] Did you try fuse before?
[14:53:48] kevinator: did you want to go with '*' for all namespaces or just blank?
[14:54:34] qchris: not really, not for a long time. i think i used it for something on my mac a loooong time ago
[14:54:43] i've also heard that it is unreliable
[14:54:55] but, maybe since cloudera supports it and has for a while. AND if we use it as readonly
[14:54:58] it will work? :p
[14:55:00] I just installed hadoop-hdfs-fuse on my test-cluster.
[14:55:09] Gonna toy around with it a bit.
[14:55:19] ok
[14:55:25] that would be the easiest, right?
[14:55:32] just rsync -r a directory from hdfs
[14:55:49] Well ... we need to reformat the file a bit. But roughly an rsync. Yes.
[14:55:50] that way if we rerun jobs, the older files would be picked up by the rsync
[14:55:54] aja
[14:56:03] oh gzip... hive/oozie can't do gzip?
[14:56:15] probably can, ja?
[14:56:16] just set
[14:56:22] It can gzip.
[14:56:22] mapreduce.output.compression.codec or whatever?
[14:56:29] But gzip != gzip.
[14:56:31] :-/
[14:57:21] haha
[14:57:22] eh?
[14:57:52] org.apache.hadoop.io.compress.GzipCodec is not the same as gzip?
[14:58:51] I did not try specifically, but when programs use gzip algorithms on some structures, the output is different from
[14:58:58] writing the structures to disk and gzipping them.
[14:59:03] hm ok, i guess we have to test
[14:59:06] header/metadata/...
[14:59:11] qchris: if you are testing the fuse hdfs thing, check to see if you can mount a subdirectory in hdfs
[14:59:12] But the encoded content is the same.
[14:59:12] not just /
[14:59:27] Ok. Will do.
[15:35:20] ottomata: fuse does not fall apart immediately for hdfs.
[15:35:24] no subdir mounts
[15:35:38] but bind mounts on top of fuse obviously work
[15:35:46] (but I doubt they serve what you were after)
[15:36:07] And it can serve files of the size that we need.
[15:37:17] So should we try to use it for webstatscollector and see if it breaks?
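
A minimal sketch of the rsync-over-fuse idea, reusing the hadoop-fuse-dfs invocation that shows up later in this log; the namenode host and directory layout here are assumptions:

    # Mount HDFS via fuse (read-only use is all the sync needs), then
    # mirror the tree with plain rsync; rerun jobs just show up as
    # changed files on the next pass.
    hadoop-fuse-dfs dfs://namenode.example.wmnet /mnt/hdfs
    rsync -r /mnt/hdfs/wmf/data/archive/pagecounts/ /srv/dumps/pagecounts/

As for gzip, Hive can be pointed at the gzip codec with the standard Hadoop output-compression settings; whether the result is byte-identical to gzipping the plain text files (headers, metadata) is exactly what qchris suggests testing. The table and output path below are hypothetical:

    hive -e "SET hive.exec.compress.output=true;
             SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
             INSERT OVERWRITE DIRECTORY '/wmf/tmp/pagecounts-gz'
             SELECT * FROM wmf.pagecounts;"
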
[15:46:07] ottomata: ^ (the webstatscollector question above)
[15:46:27] in 1:1
[16:05:53] I'm gonna call it a day and head into the weekend.
[16:06:10] Let's continue the discussion next week.
[16:06:13] ok qchris
[16:06:15] sorry, long 1:1
[16:06:24] No worries :-)
[16:06:26] Have fun.
[16:06:39] qchris,
[16:06:44] what instance do you have stuff mounted on?
[16:06:46] i want to play with it too
[16:06:53] qchris-worker2
[16:06:55] ok awesome
[16:07:00] will try, thanks! have a good weekend!
[16:08:06] Running (as root)
[16:08:11] umount /mnt/hdfs ; echo ---------------------- && hadoop-fuse-dfs dfs://qchris-master.eqiad.wmflabs/wmf /mnt/hdfs && echo ---------------- && ls -l /mnt/hdfs/
[16:08:19] should have all the things you need.
[16:08:34] Bye.
[16:12:00] (PS1) Milimetric: Update NamespaceEdits metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/161470
[16:38:40] (PS1) Milimetric: Update PagesCreated metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/161472
[16:50:38] Analytics / Wikimetrics: Story: WikimetricsUser runs 'Rolling Recurring old active editors' report - https://bugzilla.wikimedia.org/69569 (Kevin Leduc)
[16:52:22] Analytics / Dashiki: Story: Bookmarks / Stateful URL. Define protocol and use it to bootstrap the dashboard and keep state - https://bugzilla.wikimedia.org/70887 (Kevin Leduc)
[16:52:40] Analytics / Dashiki: Story: Bookmarks / Stateful URL. Define protocol and use it to bootstrap the dashboard and keep state - https://bugzilla.wikimedia.org/70887 (Kevin Leduc)
[16:54:26] milimetric: any idea why this bug isn’t showing up in our sprint in scrumbugs: https://bugzilla.wikimedia.org/show_bug.cgi?id=70887
[17:00:08] Analytics / Wikistats: report SquidReportCountryData.htm does not show country names or demographics - https://bugzilla.wikimedia.org/57376#c6 (Erik Zachte) NEW>RESO/FIX Fixed, see http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm
[17:05:49] kevinator: it didn't have "bugwatcher@sb-mail.wmflabs.org" in the cc list
[17:05:58] not sure why - i'll brb, lunch
[17:08:53] Analytics / Dashiki: Story: EEVSUser loads dashboard from URI that specifies state / EEVSUser copies URI that recreates current dashboard state - https://bugzilla.wikimedia.org/70887 (Kevin Leduc)
[17:26:52] Analytics / Dashiki: Story: EEVSUser loads dashboard from URI that specifies state / EEVSUser copies URI that recreates current dashboard state - https://bugzilla.wikimedia.org/70887#c3 (Kevin Leduc) testing if scrumbugs will catch this and put it into current sprint.
[18:00:12] pizzzacat: I'm in the batcave (http://goo.gl/1pm5JI)
[18:01:10] pizzzacat: I'm in the batcave (http://goo.gl/1pm5JI)
[18:05:26] milimetric: https://www.youtube.com/watch?v=iaM_tYWBdyU
[18:05:32] ?
[18:09:22] yuvipanda: I am very confused
[18:09:47] oh
[18:09:48] lol
[18:10:20] obviously I'm not batman, I'm just the butler
[18:12:29] milimetric: but then who is batman?!
[18:12:43] i always thought it was Diederik
[18:13:46] but he hasn't been around the batcave for a while
[18:13:53] then again, that's *just* what the butler would say
[18:15:21] heh
[18:18:41] Analytics / General/Unknown: kafkatee not consuming for some partitions - https://bugzilla.wikimedia.org/71056 (Andrew Otto) NEW p:Unprio s:normal a:None Below is a paste of an email I sent to Christian and Magnus in an effort to describe this problem. I will add a comment with further ins...
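
For reference, kafkatee wires Kafka topics to sampled outputs through a small config file. The fragment below is a rough sketch based on kafkatee's sample config syntax; the topic names, partition range, and output are assumptions, not the actual analytics1003 setup:

    # Hypothetical kafkatee.conf fragment; treat the exact keywords as
    # an assumption from kafkatee's documented examples.
    input [encoding=json] kafka topic webrequest_text partition 0-11 from stored
    input [encoding=json] kafka topic webrequest_upload partition 0-11 from stored
    # Unsampled file output (rate 1); a rate of 1000 would keep 1/1000.
    output file 1 /var/log/kafkatee/webrequest.log

The bug above is about some partitions in inputs like these silently not being consumed.
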
[18:29:07] Analytics / General/Unknown: kafkatee not consuming for some partitions - https://bugzilla.wikimedia.org/71056#c1 (Andrew Otto) Magnus and I worked to try to figure out what was going on. We have upgraded librdkafka to 0.8.4 on analytics1003 (and also attempted to use broker offset storage). By only...
[18:40:44] !log reenabled kafkatee with all webrequest topics as input on analytics1003
[19:09:37] Analytics / General/Unknown: kafkatee not consuming for some partitions - https://bugzilla.wikimedia.org/71056#c2 (Andrew Otto) Hm, when I restarted kafkatee with all webrequest topics as input, it exhibited the same behavior as described in this bug's description. About 4 different partitions had lea...
[19:22:07] (PS2) Milimetric: Update NamespaceEdits metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/161470
[19:49:42] (PS1) Milimetric: Add Rolling Recurring Old Active Editor [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/161521
[19:49:52] Analytics / General/Unknown: Lines with “nan” for “Request service time” column and empty HTTP status code in cache logs - https://bugzilla.wikimedia.org/59645 (Andrew Otto) NEW>RESO/FIX
[19:50:00] kevinator: all three metrics have a corresponding patch in gerrit
[19:50:15] but i'm pretty skeptical about the recurring old active editors performance
[19:50:21] it's grabbing like the whole user table
[19:51:27] (PS2) Milimetric: Update PagesCreated metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/161472
[19:55:54] Analytics / Quarry: Quarry shows old username after user was renamed - https://bugzilla.wikimedia.org/71064 (Helder) NEW p:Unprio s:normal a:None The first time I logged in to Quarry was before my global account was renamed. Now, if I go to http://quarry.wmflabs.org/ and click in the login...
[20:09:39] Analytics / General/Unknown: Kafkatee zero files having 10% less requests than udp2log zero files - https://bugzilla.wikimedia.org/64181 (christian)
[20:09:39] Analytics / General/Unknown: kafkatee not consuming for some partitions - https://bugzilla.wikimedia.org/71056#c3 (christian) (In reply to Andrew Otto from comment #2) > I believe I can conclude that analytics1021 is not the source of these > kafkatee problems. That's great! (But analytics1021 would...
[20:25:05] kevinator: yt?
[20:25:08] I’m in Elder
[20:25:10] yes
[20:25:15] (texted toby that I was running late)
[20:25:28] I’m at my desk...
[20:25:31] on my way...
[20:25:35] wanna come over? we have the room
[20:25:37] cool
[21:30:30] kevinator: https://www.mediawiki.org/wiki/Template:In_progress