[01:57:12] Analytics / Quarry: Raise query limits - https://bugzilla.wikimedia.org/72342 (Aaron Halfaker) NEW p:Unprio s:normal a:None I'd like to raise the query limits because I want to run a big query and Quarry is a great place to SQL & dataset. http://quarry.wmflabs.org/query/794 returns all o...
[09:50:50] (PS1) Yurik: Adapted analytics pageview counter [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168057
[10:16:08] Analytics / Refinery: Make webrequest partition validation handle races between time and sequence numbers - https://bugzilla.wikimedia.org/69615#c13 (christian) Happened again for: 2014-10-21T14/2H (on upload)
[10:54:10] Analytics / Refinery: Raw webrequest partitions for 2014-10-21T11/1H not marked successful - https://bugzilla.wikimedia.org/72352 (christian) NEW p:Unprio s:normal a:None None of the webrequest partitions [1] for 2014-10-21T11/1H have been marked successful. What happened? [1] _____...
[10:54:12] Analytics / Refinery: Bits and mobile raw webrequest partitions for 2014-10-21T13/2H not marked successful - https://bugzilla.wikimedia.org/72353 (christian) NEW p:Unprio s:normal a:None Bits and mobile raw webrequest partitions [1] for 2014-10-21T13/1H have not been marked successful. Wh...
[10:55:11] Analytics / Refinery: Raw webrequest partitions that were not marked successful due to network issues - https://bugzilla.wikimedia.org/72298 (christian)
[10:55:12] Analytics / Refinery: Raw webrequest partitions for 2014-10-21T11/1H not marked successful - https://bugzilla.wikimedia.org/72352#c1 (christian) NEW>RESO/WON (2014-10-21T13/1H is handled in bug 72353) The affected period is 2014-10-21T11:41:19/2014-10-21T11:59:09. It affected only ulsfo caches,...
[10:55:54] Analytics / Refinery: Raw webrequest partitions that were not marked successful due to configuration updates - https://bugzilla.wikimedia.org/72300 (christian)
[10:55:55] Analytics / Refinery: Bits and mobile raw webrequest partitions for 2014-10-21T13/2H not marked successful - https://bugzilla.wikimedia.org/72353#c1 (christian) NEW>RESO/FIX (2014-10-21T11/1H is handled in bug 72352) (upload's 2014-10-21T14/2H is handled in bug 69615 comment 13) Commit 4f6ba147e...
[10:56:23] Analytics / Refinery: Raw webrequest partitions that were not marked successful but are too old to debug - https://bugzilla.wikimedia.org/72301 (christian) NEW>RESO/WON
[11:00:36] !log Marked webrequest partitions for 2014-10-21T11/1H ok (See {{bug|72352}})
[11:01:42] !log Marked bits and mobile webrequest partitions for 2014-10-21T13/2H ok (See {{bug|72353}})
[11:04:23] !log Marked upload webrequest partitions for 2014-10-21T14/2H ok (See {{bug|69615#c13}})
[11:40:25] Analytics / General/Unknown: "ulsfo <-> eqiad" network issue on 2014-10-21 affecting udp2log streams - https://bugzilla.wikimedia.org/72355 (christian) NEW p:Unprio s:normal a:None Ops reported [1] a network issue between ulsfo and eqiad (According to IRC logs [2], alerts started around 201...
[11:40:53] Analytics / General/Unknown: "ulsfo <-> eqiad" network issue on 2014-10-21 affecting udp2log streams - https://bugzilla.wikimedia.org/72355#c1 (christian) NEW>RESO/WON The udp2log pipeline showed the first sporadic ulsfo drop-outs on 2014-10-21T10:58 and continued to show ulsfo drop-outs until uls...
[12:02:54] Analytics / General/Unknown: "ulsfo <-> eqiad" network issue on 2014-10-21 affecting udp2log streams - https://bugzilla.wikimedia.org/72355#c2 (christian) (In reply to christian from comment #0) > We did not see alerts on the udp2log pipeline. That's wrong. There have been alerts [1]:
[11:54:29] Analytics / General/Unknown: "ulsfo <-> eqiad" network issue on 2014-10-20 affecting udp2log streams - https://bugzilla.wikimedia.org/72306#c3 (christian) (In reply to christian from comment #0) > We did not see alerts on the udp2log pipeline. That's wrong. There have been alerts [1]:
[13:19:04] Analytics / Refinery: Make webrequest partition validation handle races between time and sequence numbers - https://bugzilla.wikimedia.org/69615#c14 (christian) Happened again for: 2014-10-22T06/2H (on bits)
[12:07:06] !log Marked raw bits webrequest partitions for 2014-10-22T06/2H ok (See {{bug|69615#c14}})
[14:01:55] ottomata: Sneaky meeting is starting :-)
[14:03:06] qchris_meeting: NOT SO SNEAKY ANYMORE!
[14:03:54] OMG. YOU'RE RIGHT!
[14:37:41] (CR) Milimetric: Add a timerange validator (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/166157 (https://bugzilla.wikimedia.org/70714) (owner: Bmansurov)
[14:37:43] (CR) Nuria: Improves retrieval of user names on csv report (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/167356 (https://bugzilla.wikimedia.org/71255) (owner: Nuria)
[15:24:02] qchris: i just looked at rtt a bit, but have you noticed any changes from the ack=2 mobile varnishkafka merge?
[15:24:07] i don't see an obvious one there
[15:24:14] Nope.
[15:24:18] k, me neither, cool.
[15:24:50] I wanted to ask you about the schedule to deploy the remaining ACK changes.
[15:25:01] Would that be weekly ... daily ... some other schedule?
[15:28:27] I guess I'll just remove the CR-1 on the next one tomorrow, and we'll discuss then.
[15:29:22] hm
[15:29:29] I'm fine with daily
[15:29:37] let's just put them each through a high-load period and keep an eye on it
[15:29:43] qchris: shall I go ahead and do the next one now?
[15:29:49] Fine by me.
[15:29:51] k
[15:31:32] done.
[15:31:47] Thanks.
[15:31:57] Cool.
[16:00:00] * halfak finally gets a chance to check out https://www.mediawiki.org/wiki/Extension:Graph/Demo
[16:00:03] ottomata, ^
[16:00:06] milimetric, ^
[16:00:46] cool, ja? the contributions-by-diff example made me think of you halfak
[16:01:17] :)
[16:01:34] halfak: I'm going to try to help the vega guys to release 2.0, should make that extension that much cooler :)
[16:01:34] Looks like I'm going to need to learn a new graphing grammar.
[16:01:42] :)
[16:01:46] halfak: please ping me if you want to talk it over
[16:01:55] it's a weird grammar at first, with some idiosyncrasies
[16:03:08] I see that it is pulling from CSVs. :)
[16:42:53] qchris, ottomata: helllooo, yt?
[16:42:58] yes.
[16:43:01] qchris: ottomata: how easy would it be to set up CORS for pagecounts-all-sites?
[16:43:20] qchris: will it be possible to set up CORS on this endpoint: http://dumps.wikimedia.org/other/pagecounts-all-sites/2014/2014-09/
[16:43:21] not sure. It's lighttpd
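(Editor's aside, not part of the log: the mod_setenv approach qchris links to just below would look roughly like this for the pagecounts endpoint. A minimal sketch only; the config file path, URL match, and wildcard origin are assumptions, not the actual operations/puppet change.)

```bash
# Hypothetical snippet for the puppetized lighttpd config on the dumps
# host; setenv.add-response-header is mod_setenv's real directive for
# attaching extra response headers.
cat >> /etc/lighttpd/lighttpd.conf <<'EOF'
server.modules += ( "mod_setenv" )
$HTTP["url"] =~ "^/other/pagecounts-all-sites/" {
    setenv.add-response-header = ( "Access-Control-Allow-Origin" => "*" )
}
EOF
```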
[16:43:24] so we could fetch project counts without jumping through hoops
[16:43:25] oh ok
[16:43:26] interesting
[16:43:56] so dashiki can retrieve those files just like it does with apache in wikimetrics
[16:44:34] qchris: for example, see: https://metrics.wmflabs.org/static/public/datafiles/RollingNewActiveEditor/dewiki.json
[16:44:49] qchris: the headers that are sent when you request that file
[16:45:07] nuria__, milimetric: yeah, don't know lighttpd well, but i'm sure it is possible
[16:45:15] i think opsen would not mind seeing lighttpd going away
[16:45:21] replaced by nginx maybe
[16:45:26] if this is not possible in lighttpd
[16:45:29] (it probably is)
[16:45:30] :) but i love lighty :)
[16:45:38] ok, cool, thx
[16:45:58] https://git.wikimedia.org/blob/operations%2Fpuppet.git/0031aa374b64a9013a8fa232773ee7084811272d/modules%2Fdownload%2Ffiles%2Flighttpd.conf
[16:45:58] if you remember to ask someone if that would be *allowed*, let us know what they say
[16:46:03] ^ is the lighttpd config
[16:46:24] cool, looks easy enough to change, thanks qchris
[16:46:51] milimetric: nuria__, you are on the ops@ list, ja? ask :)
[16:47:29] ok, will do
[16:47:49] ottomata: will do, although sometimes i feel my questions there end up in /dev/null
[16:48:24] ottomata: understandably so, cause they are never urgent
[16:48:35] ottomata: urgent questions, that is
[16:49:22] nuria__: hm, yeah
[16:49:29] that happens to me sometimes too
[16:49:38] but, if you ask there, then I can reference it in the next ops meeting
[16:49:38] nuria__, milimetric: It seems http://redmine.lighttpd.net/projects/1/wiki/Docs_ModSetEnv allows setting headers.
[16:50:24] Analytics / EventLogging: Add test flag to EventLogging - https://bugzilla.wikimedia.org/72365 (Dario Taraborelli) NEW p:Unprio s:normal a:None EventLogging currently doesn't allow flagging specific events (or events originating from a specific client/IP address/IP range) as "test events"....
[16:50:39] heading to cafe, back shortly.
[16:58:12] qchris, ottomata: e-mail sent regarding CORS
[17:06:38] qchris: where do you guys get the "ab.d" and "en.m" codes for the project aggregation? i can go browse the code
[17:07:00] Strip the first two top levels.
[17:07:15] cool, thanks
[17:07:22] Like en.m.wikipedia.org -> strip org -> strip wikipedia.org -> en.m
[17:07:30] You might want to look at the wikipage:
[17:07:47] https://wikitech.wikimedia.org/w/index.php?title=Analytics/Pagecounts-all-sites
[17:08:03] It has a big table detailing the columns and values.
[17:08:09] (PS2) Yurik: Adapted analytics pageview counter [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168057
[17:08:18] qchris: in order to use those files for pageviews in dashiki
[17:08:27] we would need them by "project"
[17:08:40] as in "enwiktionary"
[17:08:44] enwiki, dewiki
[17:08:47] (CR) Yurik: [C: 2] Adapted analytics pageview counter [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168057 (owner: Yurik)
[17:08:47] so we'd need a mapping like 'en.d' => enwiktionary
[17:09:01] nuria__: Just grab the projectcounts files.
[17:09:15] milimetric: yes.
[17:09:16] ideally we'd have that done in hive so the clients don't all have to do it
[17:09:31] so if we made such a mapping file, would you guys be able to use it in hive?
[17:09:39] (CR) Yurik: [V: 2] Adapted analytics pageview counter [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168057 (owner: Yurik)
[17:09:43] milimetric: not sure I understand that.
[17:09:58] so, ideally we'd want a new dataset that is:
[17:10:18] project, day, pagecount
[17:10:29] enwiktionary, 10/01, 200
[17:10:33] enwiki, 10/01, 2000
[17:10:35] etc.
[17:10:38] I see.
[17:10:55] so the question is
[17:11:02] is it better to do this in Hive or in Dashiki
[17:11:02] That'd be easy with Hive + Oozie
[17:11:14] yeah, we'd love it if that was the case
[17:11:28] If dashiki can do it ... dashiki could just consume the existing files.
[17:11:30] we'd have to keep that mapping up to date, that's the only annoyance i see
[17:11:49] qchris: the less the client has to do, the faster it will be
[17:11:54] right
[17:12:17] qchris: also, for daily counts that seems like a lot of wasted work, right?
[17:12:27] qchris: the mapping will happen per file, per day
[17:12:40] The less time we have to invest in Hive, the faster we can fix the cluster :-)
[17:13:52] qchris: think about it: if we need to remap per day and we are showing 6 months of daily counts, we are doing the re-mapping 6*30 times for 1 project, really not optimal
[17:14:01] nuria__: The mapping may be most efficient on the parsed data clientside ... There it's updating a single key on the client ... in hive, it's updating every single row.
[17:15:13] qchris: but we are talking daily files, so it's 1 re-map per file, per request
[17:15:15] Gonna grab something to eat.
[17:15:16] qchris: it could be a second oozie job that works on the results of the first
[17:15:20] Let's discuss afterwards.
[17:15:20] np, later qchris
[17:15:24] no it's ok
[17:15:28] we can estimate around it
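(Editor's aside: a sketch of the two mappings discussed above. The suffix rule is qchris's "strip the top two labels"; of the code-to-project table, only 'en.d' => enwiktionary is stated in the log, so everything else is labeled as an assumption.)

```bash
# Hostname -> projectcounts code, e.g. en.m.wikipedia.org -> en.m
to_code() {
    echo "${1%.*.*}"    # drop the last two DNS labels
}

# Projectcounts code -> database-style project name. Only the ".d"
# (wiktionary) suffix comes from the log; folding ".m" (mobile) into the
# parent project and mapping bare codes to Wikipedia are assumptions.
to_project() {
    local lang=${1%%.*} suffix=${1#*.}
    case $suffix in
        d) echo "${lang}wiktionary" ;;   # en.d -> enwiktionary
        *) echo "${lang}wiki" ;;         # en, en.m -> enwiki (assumed)
    esac
}

to_code en.m.wikipedia.org   # prints: en.m
to_project en.d              # prints: enwiktionary
```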
[17:24:32] hey ottomata -- I'm not in favor of OpenTSDB on hadoop
[17:25:02] k, i don't know that much about it, ori just asked me what I thought, and I didn't see why not
[17:25:08] why not?
[17:25:10] hbase?
[17:25:13] support burden sucks
[17:25:21] don't want to commit to HA on hadoop
[17:25:27] ?
[17:25:36] ah, for a production service you mean?
[17:25:38] yeah
[17:25:40] right
[17:25:51] I want us to be able to take hadoop down
[17:25:54] although, there's no reason there couldn't be a separate smaller hadoop cluster for stuff like that
[17:26:03] if ops wanted to allocate it
[17:26:07] I think cassandra is a better choice
[17:26:17] there's a graphite backend that feeds into cassandra
[17:26:56] hadoop sucks operationally -- I have scars
[17:27:00] googling, cyanite?
[17:27:12] (I guess you talked to ori?)
[17:27:13] y
[17:27:19] in a meeting -- we'll need to discuss
[17:27:52] k
[17:29:31] ottomata, hi, i can't kill /usr/lib/hadoop/bin/hadoop job -kill job_1409078537822_50633
[17:29:43] first it complained about JAVA_HOME not being set
[17:29:45] yarn application -kill
[17:29:47] not job id
[17:29:58] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive#Killing_a_running_query
[17:30:08] i followed the console output :)
[17:31:54] thx, worked!
[17:36:32] ottomata: i second toby's concerns with making hadoop into tier-1
[17:38:38] Analytics / EventLogging: Add test flag to EventLogging - https://bugzilla.wikimedia.org/72365#c1 (nuria) This is due to the fact that EL does not have a server testing environment in which you can try your data setup. Adding a test flag to the production schemas is really not a solution to the riot...
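(Editor's aside on the kill exchange above: under YARN the old hadoop-job wrapper is deprecated, and a MapReduce job id maps onto an application id with the same numeric part, so the translation is mechanical. The id below is the one from the log; the wikitech page linked above covers the same steps.)

```bash
yarn application -list                                   # find the running application id
yarn application -kill application_1409078537822_50633   # job_... becomes application_...
```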
[17:50:09] Hi analytics!
[17:50:15] * AndyRussG waves
[17:50:33] I'm working on a possible minor change to this file (line 13): https://git.wikimedia.org/blob/operations%2Fpuppet/2b028c108fc3deaddd9e34620ad55ac08ab17ebd/templates%2Fudp2log%2Ffilters.erbium.erb
[17:51:00] Does anyone know where I could get some redacted log samples to test with?
[17:56:41] Hi AndyRussG
[17:56:48] hmm, redacted, eh?
[17:57:04] you couulllld get access to stat1002 (non-redacted) logs
[17:57:07] that would be easiest
[17:58:09] ottomata: Ah could I?
[17:58:31] Don't I need to sign something or other first?
[17:58:38] thanks btw
[17:58:43] AndyRussG: actually, Hi! not sure I know who you are!
[17:58:48] are you an employee or a volunteer?
[17:58:54] I'm a contractor
[17:59:08] aye, hm ok, likely your contract covers access to it
[17:59:17] you'd just need to double check that (with your manager?)
[17:59:20] I know AndyRussG. He's solid people.
[17:59:44] Thanks Ironholds :) (solider a name than "Ironholds" has none)
[17:59:50] you'd need an RT ticket with manager approval on it, and confirmation that your contract (NDA?) status covers access
[18:00:03] plan B, since RT tickets + managers == slow as hell
[18:00:18] email your manager checking on the NDA, CC ottomata and me, I'll grab you some raw lines when it comes through
[18:00:19] * Ironholds jazz hands
[18:00:35] magic
[18:00:51] (what are you actually planning on tweaking?)
[18:01:03] https://git.wikimedia.org/blob/operations%2Fpuppet/2b028c108fc3deaddd9e34620ad55ac08ab17ebd/templates%2Fudp2log%2Ffilters.erbium.erb
[18:01:20] Line 13
[18:02:09] We made a change in CentralNotice bannerController.js; that line for udp-filter relies on a set order of URL parameters, which is silly 8p
[18:04:34] Ironholds: sounds like a plan! thanks much :)
[18:04:43] * Ironholds reads
[18:04:59] Ironholds: health check!
[18:05:51] If you're interested, this is the Mingle card: https://wikimedia.mingle.thoughtworks.com/projects/online_fundraiser/cards/2066
[18:07:33] AndyRussG, awesome! I was investigating that issue a while back. Thumbs up.
[18:07:46] Lemme know when you've got approval and I'll throw you a random assortment of BannerRandom hits
[18:13:14] ottomata, do you have any hadoop sql that goes like "insert overwrite table ..." unless partition exists?
[18:14:35] Ironholds: fantastic, thanks! K I'll let you know :)
[18:14:44] cool!
[18:15:35] yurikR: i don't have any, but i think you can do that
[18:15:36] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
[18:16:04] oh, they have IF NOT EXISTS, thx!!!
[18:21:05] (PS1) Yurik: don't overwrite if exists [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168121
[18:21:34] (CR) Yurik: [C: 2 V: 2] don't overwrite if exists [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168121 (owner: Yurik)
[19:10:24] this channel is huge... I feel obligated to like tell a joke, entertain everyone
[19:10:39] What's Beethoven's favorite fruit?
[19:11:04] Ba-na-na-naaa
[19:14:09] Beethoven will never be the same
[19:14:17] O.o
[19:16:26] hah
[19:26:40] milimetric, nuria__: back. (In case you want to discuss the pagecounts-all-sites further)
[19:27:36] thanks qchris - we finished up our conversation
[19:27:51] we'll talk about it again when Kevin schedules it
[19:28:20] Mhmmm. But I guess we did not end up with a solution that we like, did we?
[19:28:24] Well. Ok.
[19:28:38] Let's discuss when the time comes.
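(Editor's aside: per the Hive DML manual yurik links at [18:15:36], the IF NOT EXISTS clause he found sits after the PARTITION spec. The table and columns below are hypothetical stand-ins for scripts/zero-counts.hql; and as qchris discovers later in this log, Hive of this era accepts the clause but happily ignores it.)

```bash
# Hedged sketch of the INSERT ... IF NOT EXISTS form; zero_counts, xcs,
# and the webrequest columns are assumptions for illustration only.
hive -e "
INSERT OVERWRITE TABLE zero_counts
PARTITION (date = '2014-10-21') IF NOT EXISTS
SELECT xcs, count(*)
FROM   wmf_raw.webrequest
WHERE  year = 2014 AND month = 10 AND day = 21
GROUP  BY xcs;
"
```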
[19:40:36] where do we have graphs for the number of thanks actions?
[19:40:59] say, the number of thanks per day
[19:43:49] Helder: I am not aware of such a graph. Maybe Ironholds or halfak know one? ^
[19:44:07] we definitely /did/ have one on the wmflabs limn dashboard
[19:44:13] So far I found this, which seems related to what I'm looking for: https://gerrit.wikimedia.org/r/#/c/162779/
[19:44:15] I am not sure if that still exists (or is still used) however :(
[19:44:39] and I didn't find anything on https://tools.wmflabs.org/directory/ either
[19:45:02] http://mobile-reportcard.wmflabs.org/graphs/thanks-daily
[19:45:14] Helder, we could write that in Quarry.
[19:45:25] No easy graphing, but producing a table should be simple.
[19:45:30] ^ is the Thanks graph from the mobile reportcard. Thanks, Ironholds.
[19:45:38] qchris how did you get to that page?
[19:45:53] Helder: Through the gerrit change you linked.
[19:46:14] It said mobile report card. And http://mobile-reportcard.wmflabs.org is the mobile reportcard.
[19:46:17] * Ironholds salutes
[19:46:33] I just checked all tabs for "Thanks".
[19:47:57] unfortunately it is not split by wiki
[19:49:03] indeed - and quarry won't help because this is looking at event data halfak
[19:49:27] Helder: we're trying to get that event logging data made public, but right now it's not
[19:49:33] milimetric, thanks are recorded in the log table.
[19:49:48] oh, then yea :)
[19:49:54] *logging
[19:49:55] :)
[19:49:58] I just checked the table, and it has a wiki column.
[19:50:06] Oh ... too late :-)
[19:50:30] actually, i'm gonna try grabbing that data from quarry then, about time I use it :)
[19:50:52] Woo. I have some queries from enwiki where I was playing around with thanks. Let me look.
[19:51:12] http://quarry.wmflabs.org/query/326
[19:51:23] * Helder is ashamed of not being able to write SQL (e.g. on quarry) yet
[19:52:09] (CR) Nuria: [C: 2] "Tested on staging for the month of October for ruwiki and eswiki. In the case of ruwiki there are about 50 less RollingActive editors per " [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/167064 (https://bugzilla.wikimedia.org/72134) (owner: Milimetric)
[19:52:26] (Merged) jenkins-bot: Improve performance by excluding 0 results [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/167583 (owner: Milimetric)
[19:58:02] Helder: http://quarry.wmflabs.org/query/802
[19:58:44] that's like the details of who thanked who for the last 30 days
[19:58:56] halfak: it defaults to enwiki, is there a way to make it work on other wikis?
[19:59:14] (CR) QChris: Add UAParserUDF from kraken (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/166142 (owner: Ottomata)
[19:59:24] Helder: if you want to write some sql, just phrase it as specifically as you can here and one of us might do it just for fun / curiosity
[20:00:03] Helder: also interesting is the logging table manual with all the different goodies you can find in that table: http://www.mediawiki.org/wiki/Manual:Logging_table
[20:00:05] thanks milimetric ! I really need to learn it...
[20:00:29] it is like LaTeX, once it is done the solution is kind of obvious... =/
[20:01:32] LaTeX looks like craziness to me, so yea
[20:01:33] BTW: I forked halfak's query to create the 802
[20:01:44] (your link above)
[20:01:48] cool - quarry's fun :)
[20:02:09] at first I was looking for a button "Fork me"
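(Editor's aside: a sketch of the per-day table Helder is after, modeled on the queries above and on the Manual:Logging_table page linked there. The SQL between the markers can be pasted into Quarry as-is; the labsdb host alias is an assumption about the 2014 setup, and swapping enwiki_p for another wiki's database answers halfak's multi-wiki question.)

```bash
mysql -h enwiki.labsdb enwiki_p <<'SQL'
-- Thanks per day over the last 30 days; Extension:Thanks writes
-- log_type = 'thanks' rows into the logging table.
SELECT LEFT(log_timestamp, 8) AS day,
       COUNT(*)               AS thanks
FROM   logging
WHERE  log_type = 'thanks'
  AND  log_timestamp >= DATE_FORMAT(NOW() - INTERVAL 30 DAY, '%Y%m%d%H%i%s')
GROUP  BY day
ORDER  BY day;
SQL
```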
[20:05:16] halfak: is there a difference between the pages-meta-history1, 2, 3 files?
[20:05:47] i guess not, they are all just different parts
[20:06:34] ottomata: different parts of the same compressed file
[20:06:57] but each individual file is a full xml document, right?
[20:07:01] (CR) Nuria: Add UAParserUDF from kraken (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/166142 (owner: Ottomata)
[20:08:20] from what I was told, yes :-)
[20:08:38] ok ja looks like it
[20:08:43] but, then they aren't different parts of the same file
[20:08:52] each one is a standalone file
[20:08:57] ah, ok
[20:09:00] makes sense
[20:09:04] (PS1) Yurik: create table if doesn't exist, runner script [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168168
[20:09:07] guess it has something to do with the content contained in the files, iunno
[20:09:10] ok i don't need to care, thanks!
[20:09:11] :)
[20:09:11] but the whole history is split between these files
[20:09:14] aye
[20:09:23] as long as i look at them all i have the whole history
[20:10:09] ottomata, could you take a look at https://gerrit.wikimedia.org/r/#/c/168168/1/scripts/zero-counts.hql -- why does it not skip existing partitions? it seems like it runs a partition, and later skips it
[20:11:07] (CR) Yurik: [C: 2 V: 2] create table if doesn't exist, runner script [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168168 (owner: Yurik)
[20:12:09] (CR) Milimetric: "tiny cleanup" (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/167356 (https://bugzilla.wikimedia.org/71255) (owner: Nuria)
[20:13:36] (PS3) Milimetric: Default to all namespaces for edits and pages [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/167214 (https://bugzilla.wikimedia.org/72114)
[20:13:41] yurikR: don't understand your question
[20:13:43] (CR) Milimetric: [C: 2] Default to all namespaces for edits and pages [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/167214 (https://bugzilla.wikimedia.org/72114) (owner: Milimetric)
[20:14:40] runs a partition, and later skips it
[20:14:41] ?
[20:15:25] ottomata, correct - it seems like it does all the work until it is done, and later looks at the partition and decides it doesn't need it
[20:15:40] doesn't need all that work it just did
[20:15:57] instead of simply skipping over partitions that already exist
[20:16:52] (PS10) Nuria: Improves retrieval of user names on csv report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/167356 (https://bugzilla.wikimedia.org/71255)
[20:17:03] oh, hm, i see, you are saying that it will do the select part, even if the partition already exists?
[20:17:12] (PS1) Yurik: chmod +x run-hivezero.sh [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168172
[20:17:31] (CR) Nuria: Improves retrieval of user names on csv report (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/167356 (https://bugzilla.wikimedia.org/71255) (owner: Nuria)
[20:17:33] (CR) Yurik: [C: 2 V: 2] chmod +x run-hivezero.sh [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168172 (owner: Yurik)
[20:17:35] hmmm, i wonder if it could be really smart and do the partition set dynamically, even if the partition itself doesn't know how to do that?
[20:17:52] like you could manually set which field to use for the partition in the query?
[20:17:53] dunno
[20:17:56] does something like
[20:18:39] PARTITION(date=printf('%d-%02d-%02d', year, month, day)) work?
[20:18:54] where year, month, day would be a value for each returned record?
[20:18:56] dunno.
[20:18:58] qchris: do you know?
[20:19:03] (PS11) Milimetric: Improves retrieval of user names on csv report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/167356 (https://bugzilla.wikimedia.org/71255) (owner: Nuria)
[20:19:03] yurikR is asking about https://gerrit.wikimedia.org/r/#/c/168168/1/scripts/zero-counts.hql
[20:19:11] * qchris reads scrollback
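(Editor's aside: what ottomata is reaching for at [20:17:35]-[20:18:54] exists as Hive dynamic partitioning, where the partition value comes from the last column of the SELECT rather than a constant in the PARTITION clause. A sketch under assumed table and column names; the log never settles whether it fit zero-counts.hql.)

```bash
hive -e "
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
-- The trailing 'date' column feeds PARTITION (date), much like the
-- printf() idea above; zero_counts and xcs are hypothetical names.
INSERT OVERWRITE TABLE zero_counts PARTITION (date)
SELECT xcs,
       count(*)                                   AS hits,
       printf('%04d-%02d-%02d', year, month, day) AS date
FROM   wmf_raw.webrequest
WHERE  year = 2014 AND month = 10
GROUP  BY xcs, year, month, day;
"
```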
[20:19:21] (CR) Milimetric: [C: 2] Improves retrieval of user names on csv report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/167356 (https://bugzilla.wikimedia.org/71255) (owner: Nuria)
[20:19:25] ottomata, nope, already tried that - it can only do eval() in the select :(
[20:20:06] ottomata, qchris, btw, would love a review of that code, see if it should be done differently
[20:20:28] it's already merged, but will fix any issues you may find
[20:21:52] Analytics / Wikimetrics: Story: WikimetricsUser downloads large CSV - https://bugzilla.wikimedia.org/71255#c10 (Dan Andreescu) PATC>RESO/FIX will be deployed after sprint demo
[20:22:07] Analytics / Wikimetrics: Story: VSUser has corrected historical edits/pages data - https://bugzilla.wikimedia.org/72114#c4 (Dan Andreescu) PATC>RESO/FIX will be deployed after sprint demo
[20:22:09] Analytics / Wikimetrics: Story: VSUser has bots filtered out of all metrics - https://bugzilla.wikimedia.org/72134#c5 (Dan Andreescu) PATC>RESO/FIX will be deployed after sprint demo
[20:24:25] yurik: Mhmm. OVERWRITE and IF NOT EXISTS looks ambiguous. I'll have to try that.
[20:24:59] (CR) Ottomata: Add UAParserUDF from kraken (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/166142 (owner: Ottomata)
[20:26:22] qchris: just commented on that UAparser change
[20:26:32] what part of the readme do you still consider unattributed/stolen content?
[20:27:14] The last part of the first line :-(( The way it got created it infects the whole repo :-(((
[20:27:26] I'll chime in there. Let me finish the hive test first.
[20:28:39] haha, ok
[20:39:23] Analytics / Wikimetrics: JSON reports printing user names incorrectly if they have non ascii chars - https://bugzilla.wikimedia.org/64555 (nuria) PATC>RESO/FIX
[20:41:37] Analytics / Wikimetrics: Cohort validation: text is confusing "0 invalid" - https://bugzilla.wikimedia.org/71842 (nuria) PATC>RESO/FIX
[21:06:33] yurikR: INSERT OVERWRITE accepts IF NOT EXISTS but it happily ignores it :-(
[21:06:49] Argh. Too late. No more yurikR.
[21:06:54] qchris, here
[21:07:05] Oh :-)
[21:07:08] so should i just skip the "overwrite" ?
[21:07:20] * yurikR1 has many faces
[21:07:20] I just saw the quit message for your nick without "1".
[21:07:32] auto-reconnect thingy
[21:07:43] Hive does not allow IF NOT EXISTS without OVERWRITE.
[21:07:53] (CR) Ottomata: [C: 2 V: 2] [webstatscollector] Add Makefile [analytics/metrics] - https://gerrit.wikimedia.org/r/99077 (owner: QChris)
[21:07:58] I guess that aspect is somewhat broken.
[21:08:05] (PS2) Ottomata: [webstatscollector] Cleanup whitespaces [analytics/metrics] - https://gerrit.wikimedia.org/r/99075 (owner: QChris)
[21:08:13] How do you run the query ... maybe you can check there
[21:08:17] if the partition exists?
[21:08:38] (And not run the query in case the partition already exists)
[21:08:45] (CR) Ottomata: [C: 2 V: 2] [webstatscollector] Cleanup whitespaces [analytics/metrics] - https://gerrit.wikimedia.org/r/99075 (owner: QChris)
[21:09:10] (PS2) Ottomata: [webstatscollector] Correct casing in build example [analytics/metrics] - https://gerrit.wikimedia.org/r/99076 (owner: QChris)
[21:09:15] (CR) Ottomata: [C: 2] [webstatscollector] Correct casing in build example [analytics/metrics] - https://gerrit.wikimedia.org/r/99076 (owner: QChris)
[21:09:20] (CR) Ottomata: [V: 2] [webstatscollector] Correct casing in build example [analytics/metrics] - https://gerrit.wikimedia.org/r/99076 (owner: QChris)
[21:10:41] qchris, the run script is in the same dir
[21:10:44] same patch
[21:11:13] qchris, i was looking for that info too - how do i check if a partition exists in HQL
[21:11:14] ?
[21:11:42] Let me look it up again ... describe table?
[21:12:35] SHOW PARTITIONS $table_name
[21:12:38] ^ yurikR1
[21:12:55] qchris, yes, but can i do an "if" on it?
[21:13:37] Since Hive's IF NOT EXISTS is broken for the use-case you have, I'd do the "if" in bash.
[21:13:55] From scripts/run-hivezero.sh
[21:14:38] qchris, you mean parse that blob in bash? eek
[21:15:18] That's easy ... it's just plain text.
[21:15:21] But!
[21:15:37] once we upgrade to Hive 0.13, Hive can do that.
[21:15:58] But there was a second command to do the same thing ... maybe that can do it already now ... let me find it.
[21:16:48] qchris, when are we upgrading? :D
[21:17:12] Ask the clusteroverlord ...
[21:17:15] how come i always hit the "it's implemented in the next version, but we don't have any plans to upgrade just yet" wall? :D
[21:17:20] haha
[21:17:25] There he is!
[21:17:30] yurikR1: we can get 0.13 whenever i have time to test it and make sure it works and upgrade
[21:17:36] now that there are trusty .debs for cdh 5.1
[21:17:45] ooh mighty overlord of the cluster...
[21:17:50] haha
[21:18:12] i do want to do that, i would guess, hmmmm, within a couple of months...?
[21:18:21] lovely
[21:18:26] sigh
[21:18:30] ha
[21:18:36] 5.1 just came out like a week or two ago!
[21:18:44] * yurikR1 leaves to strangle a puppy
[21:18:48] or a month or two ago
[21:19:07] i gotta run, laters yalls
[21:19:16] ltr :)
[21:19:26] puppy has been successfully strangled
[21:24:12] yurikR1: Let's revive the puppy.
[21:24:34] Hive can filter SHOW PARTITIONS if one does not use the fully qualified table name.
[21:24:39] qchris, puppy is dead, the code is to be written :)
[21:25:15] use $DATABASE; SHOW PARTITIONS $TABLE PARTITION(date=$DATE);
[21:25:26] Should show you the partition, if it exists.
[21:25:41] If the partition does not exist, it should not show partitions.
[21:25:42] qchris, come to think of it, i could simply do an if on the /mnt dir in bash
[21:25:44] much easier
[21:25:57] But that makes you dependent on the mount.
[21:26:17] You could also use "hdfs dfs -test -d".
[21:26:19] qchris, i'm already dependent on that - i look at those files from my python code
[21:26:35] Oh ... :-D
[21:26:54] they are in TSV format on purpose - this way i can do manual parsing of them afterwards
[21:27:09] and add a few extra columns based on magic
[21:27:22] i don't want to deal with running python code on hadoop
[21:27:34] But since you don't use an external table ... you're also dependent on the place where hive decides to store the files.
[21:27:36] and deal with outside dependencies (like calling zero portal configs)
[21:28:31] Mhmm ... note that Hive need not necessarily create a single file ... it might produce two, three or more of them.
[21:28:48] that's true, but i could check for dir existence
[21:28:55] true, hacky
[21:29:01] Meh. If it works ... it works.
[21:29:24] besides, if it breaks, not hard to fix
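(Editor's aside: stitching qchris's two suggestions above into one guard. The database name, table name, warehouse path, and the hivevar plumbing into zero-counts.hql are all assumptions modeled on the log.)

```bash
#!/bin/bash
# Skip the expensive SELECT when the target partition already exists.
DATE='2014-10-21'

# Option 1, per [21:25:15]: unqualified table name so the PARTITION
# filter works, then grep the plain-text output.
if hive -e "USE yurik; SHOW PARTITIONS zero_counts PARTITION (date='$DATE');" \
        | grep -q "date=$DATE"; then
    echo "partition $DATE exists, skipping"
    exit 0
fi

# Option 2, per [21:26:17]: probe the partition directory in HDFS
# directly (the warehouse path is an assumption about this table).
if hdfs dfs -test -d "/user/hive/warehouse/yurik.db/zero_counts/date=$DATE"; then
    echo "partition dir exists, skipping"
    exit 0
fi

hive --hivevar date="$DATE" -f scripts/zero-counts.hql
```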
[21:29:43] Btw ... I guess others told you the Hive-is-not-yet-productionized-yadda-yadda. So expect things to break/change without notice.
[21:29:46] but thx for looking into it! let me know if you spot any problems with the overall query
[21:29:56] actually....
[21:30:01] i was told it is prod ready :D
[21:30:08] Hahahaha. Sure.
[21:30:12] tobie & andrew :)
[21:30:26] Hahahaha.
[21:31:15] On second thought ... you might not be kidding ... well ... expect things to break/change without notice.
[21:31:47] sadly, i'm not :(
[21:31:53] the wmf_raw database will hopefully become inaccessible for us plain devs.
[21:32:18] it's ok if things change, as long as i can run similar queries on raw_rq :)
[21:32:57] Honestly ... I hope that raw will die for us.
[21:33:14] But there'll be some form of request logs.
[21:33:18] let me know if you think i could optimize that query btw, i use "DISTRIBUTE BY" to limit the number of files (no more tiny <1k files)
[21:33:27] we will need IPs :(
[21:33:42] mostly to do ip drift analysis
[21:33:46] I hope we can strip them.
[21:34:08] e.g. a carrier forgets to tell us that they have added a few new ip ranges, and we should detect that by looking at the traffic patterns
[21:34:30] thus we would need the XFF
[21:34:33] The community does not like us accessing IPs without need.
[21:34:46] true, but this IS a need :(
[21:35:08] we have found a number of times when a carrier didn't update us
[21:35:15] The carrier needs to get their act together and not forget to tell wmf about IP updates.
[21:35:17] and we had to ping them and change their ip settings
[21:35:29] tell that to 1000 carriers :D
[21:35:39] Tell that to a few million users :-D
[21:35:40] some of which are tiny
[21:35:43] hehe
[21:35:46] That is also true for users.
[21:35:50] I am tiny.
[21:36:09] not really, i'm sure you weigh more than i do :-P
[21:36:14] 150lbs
[21:36:26] Weigh ... yes :-D But in terms of tallness.
[21:36:34] 5.9
[21:36:41] :-D
[21:36:41] aaanyway
[21:36:53] You need IPs for drift computations.
[21:36:55] I know.
[21:37:16] that, and for bug detection - e.g. opera identifies it as zero, but we don't
[21:37:30] similar to drift, but could be misconfigs
[21:37:38] Yup.
[21:37:49] but i hear you about ips
[21:38:06] would be nice if we stopped any ip recording altogether
[21:38:36] We need to find a way to have both: detect drift/misconfigs and get rid of IPs for Hive.
[21:39:19] Yay ... more challenges :-)
[21:39:37] well, we can only do that if we figure out how to detect drift by looking at a single request
[21:40:08] at varnish level too
[21:40:32] Ironholds: it looks like we don't need those logs anymore for now, opted for a simpler solution :) thanks a lot anyway!
[21:40:36] ok, one problem at a time, i still need to get this stuff out
[21:40:46] AndyRussG, okay!
[21:41:17] yurikR1: Ok.
[21:41:46] qchris, and thx :)
[21:41:56] yurikR1: yw
[21:42:02] i am sure we will solve all of these problems...
[21:42:03] some day
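(Editor's aside on the DISTRIBUTE BY remark at [21:33:18]: routing all rows of one output partition through a single reducer is what collapses the many tiny files into one file per partition. A sketch with assumed names; the intermediate table is invented for illustration.)

```bash
hive -e "
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE zero_counts PARTITION (date)
SELECT * FROM (
    SELECT xcs, count(*) AS hits, dt AS date
    FROM   staged_counts        -- hypothetical intermediate table
    GROUP  BY xcs, dt
) agg
DISTRIBUTE BY date;  -- one reducer, hence one output file, per date
"
```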
[21:50:02] (PS1) Yurik: shell bug, weblogs2 to combine multi-sourced data [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168195
[21:50:24] (CR) Yurik: [C: 2 V: 2] shell bug, weblogs2 to combine multi-sourced data [analytics/zero-sms] - https://gerrit.wikimedia.org/r/168195 (owner: Yurik)
[21:56:54] nuria__: can I make an ssh tunnel from my laptop to s5.labsdb, but using a labs instance as a middle hop?
[22:02:20] nuria__: nvm
[22:07:46] (PS3) Bmansurov: Add a timerange validator [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/166157 (https://bugzilla.wikimedia.org/70714)
[22:30:11] (PS1) Bmansurov: Show namespace field placeholder next to the field [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/168202 (https://bugzilla.wikimedia.org/71582)
[22:40:06] milimetric: hi, is this bug (https://bugzilla.wikimedia.org/show_bug.cgi?id=72116) something I should work on? Or is the functionality in the bug not necessary?
[23:48:13] (CR) QChris: Add UAParserUDF from kraken (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/166142 (owner: Ottomata)
[23:48:21] (PS7) QChris: Add UAParserUDF from kraken [analytics/refinery/source] - https://gerrit.wikimedia.org/r/166142 (owner: Ottomata)
[23:49:04] (CR) QChris: Add UAParserUDF from kraken (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/166142 (owner: Ottomata)
[23:50:53] (CR) QChris: [C: -1] "There are still unaddressed comments from Patch Set 1." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/166142 (owner: Ottomata)
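(Editor's aside: milimetric's tunnel question at [21:56:54] resolved itself, but for the record a local port forward through a labs instance does it; the forwarded target is resolved and dialed from the middle hop, not from the laptop. The bastion host name and the credentials file are assumptions about the 2014 labs setup.)

```bash
# Local port 3307 -> s5.labsdb:3306, hopping through a labs bastion.
ssh -N -L 3307:s5.labsdb:3306 bastion.wmflabs.org &

# Then point a local client at the forwarded port.
mysql --defaults-file="$HOME/replica.my.cnf" -h 127.0.0.1 -P 3307 enwiki_p
```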