[00:12:48] (CR) Mattflaschen: "What license do you normally use for this kind of stuff?" [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/171454 (owner: Milimetric) [00:16:12] Analytics / EventLogging: Story: Identify and direct the purging of Event logging raw logs older than 90 days in stat1002 - https://bugzilla.wikimedia.org/72642 (Kevin Leduc) NEW>RESO/FIX a:Kevin Leduc [00:18:41] Analytics / EventLogging: Story: Identify and direct the purging of Event logging raw logs older than 90 days in stat1002 - https://bugzilla.wikimedia.org/72642#c5 (Kevin Leduc) Andrew's comment in RT: This is done for eventlogging: https://gerrit.wikimedia.org/r/#/c/171329/ I ran the job manually. I... [00:32:58] Analytics / Wikimetrics: Misleading search result displayed when filtering cohorts - https://bugzilla.wikimedia.org/73040 (Bahodir Mansurov) a:Bahodir Mansurov [00:43:41] Analytics / General/Unknown: Make sure 2013 traffic logs are gone from /a/squids/archive on stat1002 - https://bugzilla.wikimedia.org/63543#c8 (Kevin Leduc) Discussions started on what to do with each directory: /a/squid/archive api arabic-banner bannerImpressions blog edits edits-geocoded glam_nara mo... [01:30:29] (PS1) Bmansurov: Hide cohort details when filtering results no results. [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/171489 (https://bugzilla.wikimedia.org/73040) [01:40:42] Analytics / Wikimetrics: Can not delete tagged cohorts - https://bugzilla.wikimedia.org/72434 (Bahodir Mansurov) a:Bahodir Mansurov [06:23:11] Analytics / EventLogging: ext.eventLogging.subscriber.js broken - https://bugzilla.wikimedia.org/72197 (Matthew Flaschen) PATC>RESO/FIX [08:21:13] Analytics / EventLogging: Epic: ProductManager visualizes EL data - https://bugzilla.wikimedia.org/73068 (Kevin Leduc) NEW p:Unprio s:normal a:None ProductManager sets up a new schema and visualizing the incoming data is trivial. No need to set up CRON jobs or Limn dashboards. 
[08:23:26] Analytics / EventLogging: Epic: ProductManager visualizes EL data - https://bugzilla.wikimedia.org/73068#c1 (Kevin Leduc) s:normal>enhanc Collaborative design and tasking: http://etherpad.wikimedia.org/p/analytics-73068 [08:49:28] Analytics / Wikimetrics: Story: WikimetricsUser searches for cohort (filters) using tag name - https://bugzilla.wikimedia.org/73071 (Kevin Leduc) NEW p:Unprio s:enhanc a:None from https://metrics.wmflabs.org/cohorts/ type in a tag name in the search field result: your cohorts with that tag... [08:51:11] Analytics / Wikimetrics: Story: WikimetricsUser searches for cohort (filters) using tag name - https://bugzilla.wikimedia.org/73071#c1 (Kevin Leduc) Collaborative tasking on etherpad: [08:56:58] Analytics / Wikimetrics: Story: WikimetricsUser reports pages edited by cohort - https://bugzilla.wikimedia.org/73072 (Kevin Leduc) NEW p:Unprio s:enhanc a:None for each cohort, report should contain metadata like creation date, cohort creator, tags on the cohort need to know schema for re... [08:58:26] Analytics / Wikimetrics: Story: WikimetricsUser reports pages edited by cohort - https://bugzilla.wikimedia.org/73072#c1 (Kevin Leduc) Collaborative tasking on etherpad: http://etherpad.wikimedia.org/p/analytics-73072 [09:01:56] Analytics / Wikimetrics: Story: WikimetricsUser reports pages edited by cohort - https://bugzilla.wikimedia.org/73072#c2 (Kevin Leduc) Ignore description above. I cut and pasted the wrong thing. DESCRIPTION: Story As a Program leader or grant recipient Wikimetrics user, I want to be able to report on... 
[09:07:26] Analytics / Wikimetrics: Story: WikimetricsUser searches for cohort (filters) using tag name - https://bugzilla.wikimedia.org/73071#c2 (Kevin Leduc) Collaborative tasking on etherpad: http://etherpad.wikimedia.org/p/analytics-73071 [10:55:42] Analytics / Refinery: Make webrequest partition validation handle races between time and sequence numbers - https://bugzilla.wikimedia.org/69615#c18 (christian) Happened again for: 2014-11-05T14/2H (on bits) [10:57:23] !log Marked raw bits webrequest partition for 2014-11-05T14/2H ok (See {{bug|69615#c18}}) [12:47:23] (CR) Milimetric: "Thanks for the fix! One suggestion for the knockout change. (Computed observables are best thought of as single-purpose)" (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/171489 (https://bugzilla.wikimedia.org/73040) (owner: Bmansurov) [13:33:07] (PS6) Milimetric: Transform projectcounts hourly files [analytics/refinery] - https://gerrit.wikimedia.org/r/169974 (https://bugzilla.wikimedia.org/72740) [13:46:32] qchris: is there anything I should do to help speed along the review of that python patch? [13:46:41] milimetric: I am just reviewing it. [13:46:45] oh! :) [13:46:47] great [13:47:01] If you prefer, I can publish what I have right now. [13:47:10] no it's ok [13:47:28] i'll try to address everything today and maybe by tomorrow we can have the job running [13:47:29] (I doubt I finish before standup, because I can smell sunch arriving :-) ) [13:47:47] s/sunch/lunch/ [13:48:05] :) that's ok, if it's sometime this morning (for me) I should have enough time to fix it [13:48:15] k [15:18:02] milimetric: do you want review comments on that puppet stuff or do you want me to just fix it up? [15:18:16] for flow data sync? 
[15:19:02] whichever's easier for you ottomata, they're not in too much of a rush but we should probably shoot to merge it sometime after I get back [15:19:29] Analytics / EventLogging: Stale EventLogging data on vanadium - https://bugzilla.wikimedia.org/73084 (christian) NEW p:Unprio s:normal a:None It seems the recent efforts to clean up EventLogging data missed /srv/eventlogging-logs on vanadium [1]. Can this directory get deleted too?... [15:20:26] Analytics / EventLogging: List tables/schemas with data retention needs - https://bugzilla.wikimedia.org/72741#c2 (Kevin Leduc) schema of talk pages has been updated to include a purge schedule. ( https://trello.com/c/ioW5LdYl/523-el-schema-data-audit ) It is now possible to view schemas without a pu... [15:23:30] OOf, I'm going to fix a buncha things up... [15:23:37] mostly some old things [15:23:40] not your stuff [15:26:38] ottomata: cool, i'll learn either way, i'll check out your changes [15:27:13] well, there are some better things to do with the research db file now that i've made some changes [15:29:51] hm, milimetric, why not just make this more generic for limn dashboards overall [15:29:54] if they both do the same thing? [15:29:56] why a new repo? [15:31:17] (CR) QChris: Transform projectcounts hourly files (34 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/169974 (https://bugzilla.wikimedia.org/72740) (owner: Milimetric) [15:31:41] (CR) QChris: [C: -1] "Per Code-Review on Patch Set 5" [analytics/refinery] - https://gerrit.wikimedia.org/r/169974 (https://bugzilla.wikimedia.org/72740) (owner: Milimetric) [15:32:03] ottomata: tasking? [15:32:06] and i can splain there [16:00:48] milimetric: whatcha know about this? [16:00:48] http://datasets.wikimedia.org/limn-public-data/ee/ [19:32:39] qchris_meeting, when you get back: where would I go to look at the HQL used to generate pageviews under the legacy definition? [19:32:40] Ironholds: the pagecounts-all-sites thing? 
[19:32:40] yup [19:32:40] https://github.com/wikimedia/analytics-refinery/blob/master/oozie/webstats/insert_hourly_pagecounts/insert_hourly_pagecounts.hql [19:32:42] ta! [19:32:47] ottomata: that's the place where Dario puts output from some sql he runs in his personal cron I think [19:32:47] it's used by the ee dashboards: ee-dashboard.wmflabs.org [19:32:47] hm, so it needs to sync? [19:32:47] but it doesn't use generate.py? [19:32:48] it hasn't been updated since June [19:32:48] ottomata, so are we using that to generate the total pageview count or just the aggregate-by-page? [19:32:48] Ironholds: that is the by page [19:32:48] https://github.com/wikimedia/analytics-refinery/tree/master/oozie/webstats/generate_hourly_files [19:32:48] check out those two .hql scripts [19:32:48] those actually output the files [19:32:48] aha! [19:32:48] the one I linked to above selected data from webrequest table and inserted into webstats table [19:32:48] those two queries select from webstats table and output to files [19:32:48] so, [script] aggregates by page, https://github.com/wikimedia/analytics-refinery/blob/master/oozie/webstats/generate_hourly_files/generate_hourly_projectcounts_file.hql takes the result and outputs? [19:32:48] gotcha [19:32:48] so it goes from https://github.com/wikimedia/analytics-refinery/blob/master/oozie/webstats/insert_hourly_pagecounts/insert_hourly_pagecounts.hql to generate_hourly_projectcounts_file.hql to wikistats to consumer [19:32:48] to wikistats? [19:32:48] it goes from generate_hourly_projectcounts_file.hql -> files in hdfs -> http://dumps.wikimedia.org/other/pagecounts-all-sites/ [19:32:48] ottomata: yep, that needs to sync but it's weird it hasn't updated [19:32:48] milimetric: well, it isn't managed by puppet [19:32:48] the only reason it was syncing is because limn-mobile-data sync just synced the whole directory. [19:32:48] HMMM [19:32:48] i could make the sync its own cron job [19:32:48] yeahhHh that is probably better. 
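[Editor's note: the per-page to per-project rollup discussed above — what generate_hourly_projectcounts_file.hql does with the webstats data — can be sketched roughly in Python. This is an illustrative sketch only, not the actual refinery code; the "project page views bytes" line format is assumed from the pagecounts file convention, and `aggregate_projectcounts` is a hypothetical helper name.]

```python
from collections import defaultdict

def aggregate_projectcounts(pagecount_lines):
    # Hypothetical sketch of the projectcounts rollup: pagecounts lines
    # of the form "project page views bytes" are summed per project,
    # dropping the per-page breakdown.
    totals = defaultdict(lambda: [0, 0])  # project -> [views, bytes]
    for line in pagecount_lines:
        project, _page, views, nbytes = line.split(" ")
        totals[project][0] += int(views)
        totals[project][1] += int(nbytes)
    return {project: tuple(t) for project, t in totals.items()}
```

This is also why the by-page and overall numbers discussed later need not agree: they come from different stages of the same pipeline, with different filtering applied in between.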
[19:32:48] makes sense [19:32:49] that way if you generate limn stuff on your own, you can just put it in there [19:32:49] i guess like dar tar does [19:32:49] ottomata, okay. So, how do we generate http://reportcard.wmflabs.org/graphs/pageviews ? [19:32:49] Ironholds: good q. [19:32:49] um [19:32:49] Ez's old perlballs? [19:32:49] i think that comes from data that eric z prepares from the pagecounts-ez files? [19:32:49] yes think so [19:32:49] huh. qchris_meeting, mind chipping in when you come back? [19:32:49] ta ottomata :) [19:32:49] now I just need to remember where those live and learn to read perl. [19:32:49] why, you trying to make those from pagecounts-all-sites now? [19:32:50] no, I'm trying to work out why there's a 3B PV disparity between my count and stats.wikimedia's. [19:32:50] ah [19:32:50] making those from pagecounts-all-sites would be a terrible idea because it doesn't appear to kill Special:RecordImpression or Special:BannerRandom [19:32:50] (which may be a feature rather than a bug, iunno) [19:32:50] ok, milimetric, I've refactored a bit of the limn_data_sync stuff [19:32:50] (i'll take a look in a bit - still in design) [19:32:50] when you are ready [19:32:50] for flow [19:32:50] fingers crossed [19:32:50] https://github.com/wikimedia/operations-puppet/blob/production/manifests/misc/statistics.pp#L836 [19:32:50] and all you will have to do is add a line to that ^ [19:32:50] misc::statistics::limn::data::generate { 'flow': } [19:32:57] going to lunch! [19:33:00] Ironholds: Back. Yes, that's essentially ezachte's perl scripts. [19:33:00] qchris, hokay. Thanks! [19:33:00] There is some glue in between (like https://gerrit.wikimedia.org/r/#/admin/projects/analytics/reportcard [19:33:00] yup [19:33:00] and https://gerrit.wikimedia.org/r/#/admin/projects/analytics/reportcard/data ) [19:33:00] About the difference in counts ... yes. [19:33:00] so, for http://reportcard.wmflabs.org/graphs/pageviews - that's projectcount files, aggregated?
[19:33:00] They're not expected to agree. [19:33:00] yeah, by-page and overall shouldn't agree [19:33:00] I am not sure what ezachte's scripts do. [19:33:00] * Ironholds nods. Okay! I'll talk to him. [19:33:00] Thanks! [19:33:00] I know that he generates some files where some pv's are stripped. [19:33:00] (Like ones with <5 pageviews) [19:33:00] Yes, ezachte is the guy to ask. [19:33:04] ta! [19:33:14] milimetric, nuria__ and folks: are you participating in the metrics meeting? [19:33:19] mforns: you can see it streaming on YouTube [19:33:19] http://meta.wikimedia.org/wiki/WMF_Metrics_and_activities_meetings [19:33:20] yes, I'm there [19:33:29] (PS1) Gilles: Action schema update [analytics/multimedia] - https://gerrit.wikimedia.org/r/171560 [19:33:35] (CR) Gilles: [C: 2] Action schema update [analytics/multimedia] - https://gerrit.wikimedia.org/r/171560 (owner: Gilles) [19:33:37] (Merged) jenkins-bot: Action schema update [analytics/multimedia] - https://gerrit.wikimedia.org/r/171560 (owner: Gilles) [19:39:52] Ironholds: Is your stat1002 thing done? [19:40:16] qchris, no idea! I'll check [19:40:29] 215/590-ish [19:40:36] Argh :-( [19:40:41] what do you need to do? [19:40:59] I wanted to (maybe) break the webrequests table. [19:41:08] Are you using it? [19:41:10] oh! [19:41:29] I'm using it for an unrelated query but that'll be done in ~20 minutes [19:41:36] the stat1002 stuff is using the sampled logs [19:41:47] I should be offended you think I'd write a pageview counter that'd take 10 hours :P [19:42:03] Ok. So if I break the webrequest in say 40 minutes ... would that be ok for you? [19:42:08] * qchris does evil grin [19:42:40] totally! [19:42:44] Cool. [19:42:46] just be sure to put it back together at some point :P [19:43:09] If it explodes, I have a script that brings it back into shape :-) [19:43:16] But it takes ~20 minutes.
[19:47:04] ottomata: Do you think we could get the Range header in today (tomorrow is Friday :-( ), so we can see if things explode already this week? [19:47:25] I can take care of the Hive side of things, if you chime in on [19:47:30] https://gerrit.wikimedia.org/r/#/c/171267/ [19:48:35] (CR) Ottomata: [C: 2] Add Range header to webrequest table [analytics/refinery] - https://gerrit.wikimedia.org/r/171267 (https://bugzilla.wikimedia.org/73021) (owner: QChris) [19:48:43] hm, qchris, we have to add it to varnishkafka, is all, right? [19:48:49] https://gerrit.wikimedia.org/r/#/c/171268/ [19:48:50] ^ [19:48:55] But that has to wait a bit. [19:49:08] We need to adapt Hive first, otherwise the table falls apart. [19:49:25] OH [19:49:25] hm [19:49:30] Hive does not like to have the SerDe report columns that it cannot use. [19:49:31] if there is an extra json field, hive falls apart? [19:49:36] uh! [19:49:37] huh! [19:49:43] Yes. :-) [19:50:03] Ironholds wants to finish a query, then I'll take care of Hive. [19:50:14] that is dumb! [19:50:25] Ah nooo. Waiting is ok :-P [19:50:26] hm, i think if we were using avro, we would not have this problem :p [19:50:29] nono [19:50:30] i mean [19:50:31] dumb of hive [19:50:32] :-) [19:50:40] why should it care if there are unused fields? [19:50:41] ! [19:50:45] Agreed. Avro might help here. [19:50:53] hm, but it doesn't care if there are missing fields? [19:50:59] as in, the table has range [19:51:02] but the json doesn't? [19:51:03] No, it reports NULL. [19:51:06] hm, ok [19:51:07] cool. [19:51:10] at least it does that [19:51:17] At least that was the way it worked in my test cluster. [19:51:51] (CR) Ottomata: [V: 2] Add Range header to webrequest table [analytics/refinery] - https://gerrit.wikimedia.org/r/171267 (https://bugzilla.wikimedia.org/73021) (owner: QChris) [19:51:54] yeah, we can merge that hive one now [19:52:04] are you going to attempt to alter the table?
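[Editor's note: the asymmetry qchris describes — a column missing from the JSON is reported as NULL, while extra fields the table doesn't know about break it — can be illustrated with a hypothetical sketch. `TABLE_COLUMNS` and `read_row` are illustrative names, not Hive internals, and note that this sketch simply ignores extra JSON fields, which is exactly what the Hive SerDe at the time did not do.]

```python
import json

TABLE_COLUMNS = ["hostname", "uri", "range"]  # "range" is the newly added column

def read_row(json_line):
    record = json.loads(json_line)
    # A field absent from the JSON comes back as None, mirroring Hive
    # reporting NULL for a declared column the data does not have.
    return {col: record.get(col) for col in TABLE_COLUMNS}
```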
[19:52:06] But if that first step fails, we can just destroy the table and recreate it from scratch with the old schema. [19:52:14] Yes. I'll try alter table first. [19:52:17] aye cool [19:52:24] If that fails, I have a script to re-add all partitions. [19:53:42] aye cool [19:53:56] nuria__: re labs. it's partially up, right? [19:54:15] leila: should be fully back up now. [19:54:20] some things might need restarting [19:54:20] my API queries are working, but something like reasonator looks problematic. [19:54:27] leila: I just restarted reasonator... [19:54:27] YuviPanda: https://tools.wmflabs.org/reasonator/ [19:54:32] :D [19:54:42] ah, hmm [19:55:18] leila: back up now [19:55:33] yup! thanks! :-) [20:01:30] ottomata: so i should abandon that patch I put up and make that one line change right? [20:01:44] or just amend your patch to do only that [20:01:49] rebase and amend [20:01:50] but ja [20:01:54] k [20:17:31] milimetric: is that ready to be merged? [20:17:42] the flow repo config is all set up properly? [20:17:47] ottomata: is it ok if that job just runs every 30 minutes and fails? [20:17:53] no, it's basically an empty config right now [20:18:00] they've not started to add config [20:18:03] i guess so, but if it doesn't do anything yet we should just wait [20:18:29] ok, cool. spagewmf: could you just +1 that change when you guys are ready (have config + sql that you think will work)? [20:19:00] spagewmf: that change: https://gerrit.wikimedia.org/r/171465
[20:40:03] (PS2) Milimetric: Add Separated Values converter [analytics/dashiki] - https://gerrit.wikimedia.org/r/168488 [20:40:09] mforns: ^ that patch [20:40:10] well, mforns i think there is nothing we need to do now but it will be worth it to schedule an item for next sprint cc milimetric [20:40:17] ok milimetric [20:40:35] yeah, looking into those errors might not be a terrible idea. But it's to be expected - labs just failed HARD [20:41:05] ok [20:41:21] so, I'll review milimetric's patch [20:42:14] thanks [20:49:11] !log switched webrequest's time_firstbyte from float to double [20:50:06] !log added range column to webrequest table [20:50:39] :) [20:50:49] Zooooom... and the table is broken :-D [20:51:04] selects fail with: [20:51:06] Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating hostname [20:51:08] for me. [20:51:19] heh, iiinteresting [20:51:20] ok [20:51:33] That's also the error I got in cluster in 1/5 cases :-( [20:51:48] I guess I'll go rebuilding that table then :-) [20:52:12] btw mforns I love christian's reviews (example: https://gerrit.wikimedia.org/r/#/c/169974/5/python/refinery/projectcounts.py). They're so awesome, I try to use him as a role model when I do my reviews [20:52:34] :-D [20:52:35] ok milimetric, I'll have a look at them [20:52:38] hehehe [21:05:20] milimetric: qchris' review is really comprehensive, that's helpful! [21:05:43] Better just call me "notorious nagger" :-( [21:08:59] haha [21:17:42] Analytics / Refinery: Switching column type of time_firstbyte broke the table - https://bugzilla.wikimedia.org/73095 (christian) NEW p:Unprio s:normal a:None Updating the time_firstbyte column from float to double (bug 73018) using a plain ALTER TABLE webrequest CHANGE time_firstbyte tim...
[21:18:38] !log Rebuilt webrequest table, since changing the time_firstbyte from float to double broke it (See {{bug|73095}}) [21:19:00] Analytics / Refinery: Switching column type of time_firstbyte broke the table - https://bugzilla.wikimedia.org/73095#c1 (christian) NEW>RESO/FIX I rebuilt the webrequest table and added all partitions again. Now selecting again works as expected. [21:19:00] Analytics / Refinery: In Hive, change webrequest's time_firstbyte from float to double - https://bugzilla.wikimedia.org/73018 (christian) [21:19:12] Analytics / Refinery: In Hive, change webrequest's time_firstbyte from float to double - https://bugzilla.wikimedia.org/73018 (christian) PATC>RESO/FIX [21:19:58] qchris: xD [21:20:12] no, that was sincere, not sarcasm [21:20:32] weird [21:20:37] so qchris, altering the table breaks things [21:20:46] not always :-/ [21:20:49] but recreating it with the new schema works? [21:20:55] Yup. [21:21:20] (At least for me) [21:21:39] Yay for external tables :-) [21:23:58] ottomata: want to get rid of the NULLs in the range column? [21:23:59] https://gerrit.wikimedia.org/r/#/c/171268/ [21:24:09] (Or if that fails ... break this hour's data) [21:25:04] haha, yeah, i'm fine with merging that, I think it will be pretty harmless on the production side of things [21:25:10] who knows what will happen on the analytics side! :) [21:25:18] :-D [21:25:18] here we go! [21:25:58] I hope we only see the alerts about sequence number resets :-) [21:26:29] Thanks for merging! [21:26:37] running puppet on cp1052 [21:28:54] kafkatee is still reporting cp1052 lines [21:29:00] with reset sequence numbers. [21:30:05] lookin good [21:30:13] i see range in the messages [21:30:20] \o/ [21:31:04] milimetric: i'm going to CR your dashiki stuff, what is next after that?
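[Editor's note: the recovery path used above — drop the external Hive table, recreate it with the new schema, then re-attach every partition — works because an external table's data lives in HDFS independently of the table definition. A script like qchris's could generate the re-add statements along these lines; `readd_partition_statements` is a hypothetical helper, as the actual script is not shown in the log.]

```python
def readd_partition_statements(table, partitions):
    # Emit one ALTER TABLE ... ADD PARTITION statement per partition,
    # pointing each back at its existing HDFS location. Dropping and
    # recreating an external table never touches the underlying data.
    stmts = []
    for spec, location in partitions:
        clause = ", ".join("{}='{}'".format(k, v) for k, v in spec)
        stmts.append(
            "ALTER TABLE {} ADD PARTITION ({}) LOCATION '{}';".format(
                table, clause, location))
    return stmts
```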
[21:31:20] nuria__: I've gotta talk to Toby now [21:31:27] and then I'm doing my python changes [21:31:36] but i have one more patch coming on top of that dashiki one [21:31:52] so i was just having mforns review it so he gets more familiarity with dashiki and doesn't get bored :) [21:32:01] ok, milimetric feel free to code my EL patch, i have added test but just a few, will add more once methodology passes CR [21:32:21] milimetric: sure, i will just review it too, better if we both do it [21:32:36] milimetric: I think we should look at labs issues and data and errors [21:36:31] (PS1) Bmansurov: Delete cohort tags when a cohort is deleted [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/171726 (https://bugzilla.wikimedia.org/72434) [21:47:41] Analytics / Refinery: Add Range header to varnishkafka - https://bugzilla.wikimedia.org/73021 (christian) PATC>RESO/FIX [22:41:26] (PS7) Milimetric: Transform projectcounts hourly files [analytics/refinery] - https://gerrit.wikimedia.org/r/169974 (https://bugzilla.wikimedia.org/72740) [22:41:28] (CR) Milimetric: Transform projectcounts hourly files (34 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/169974 (https://bugzilla.wikimedia.org/72740) (owner: Milimetric) [22:53:42] Analytics / Tech community metrics: bugzilla_response_time.html not updating its data? - https://bugzilla.wikimedia.org/73101 (Andre Klapper) NEW p:Unprio s:normal a:None http://korma.wmflabs.org/browser/bugzilla_response_time.html looks stuck. Last item in middle table ("Longest time with... [22:53:55] Analytics / Tech community metrics: bugzilla_response_time.html not updating its data? - https://bugzilla.wikimedia.org/73101 (Andre Klapper) p:Unprio>Low [23:23:08] (CR) Milimetric: [C: 1] "seems like a fine fix, I don't have time to test so I can't merge, but it should go with the next deployment." 
[analytics/wikimetrics] - https://gerrit.wikimedia.org/r/171726 (https://bugzilla.wikimedia.org/72434) (owner: Bmansurov) [23:23:49] (CR) Bmansurov: "Do you think I should write some unit tests?" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/171726 (https://bugzilla.wikimedia.org/72434) (owner: Bmansurov) [23:33:16] (CR) Mforns: "Well, sorry if I've been too "tiquismiquis", it's your fault! :]" (12 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/168488 (owner: Milimetric)