[00:10:03] (PS1) QChris: Create empty CSVs for relevant wikis [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172181
[00:10:05] (PS1) QChris: Backfill daily projectcounts for 2008 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172182
[00:11:23] (PS1) QChris: Backfill daily projectcounts for 2009 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172183
[00:12:14] (PS1) QChris: Backfill daily projectcounts for 2010 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172185
[00:13:14] (PS1) QChris: Backfill daily projectcounts for 2011 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172186
[00:14:31] (PS1) QChris: Backfill daily projectcounts for 2012 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172187
[00:15:54] (PS1) QChris: Backfill daily projectcounts for 2013 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172188
[00:17:31] (PS1) QChris: Backfill daily projectcounts for 2014 up to 2014-09-22 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172189
[00:19:20] (PS1) QChris: Backfill daily projectcounts up to 2014-11-08 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172190
[02:10:43] Analytics / EventLogging: Event Logging tables that we can drop as of Nov 7th - https://bugzilla.wikimedia.org/73140#c1 (Sean Pringle) NEW>RESO/FIX Done.
[03:03:58] (PS5) Nuria: Add Separated Values (CSV, TSV) data converter [analytics/dashiki] - https://gerrit.wikimedia.org/r/168488 (owner: Milimetric)
[04:03:46] (PS1) QChris: Add basic python setup [analytics/aggregator] - https://gerrit.wikimedia.org/r/172194 (https://bugzilla.wikimedia.org/72740)
[04:03:51] (PS1) QChris: Add basic implementation of projectcount aggregation [analytics/aggregator] - https://gerrit.wikimedia.org/r/172195 (https://bugzilla.wikimedia.org/72740)
[04:03:55] (PS1) QChris: Add basic monitoring script for projectcount aggregates [analytics/aggregator] - https://gerrit.wikimedia.org/r/172196 (https://bugzilla.wikimedia.org/72740)
[04:04:00] (PS1) QChris: Allow additional logging to disk for projectcounts aggregation [analytics/aggregator] - https://gerrit.wikimedia.org/r/172197 (https://bugzilla.wikimedia.org/72740)
[04:04:05] (PS1) QChris: Add switch for automatic pushing of data repo for projectcounts aggregation [analytics/aggregator] - https://gerrit.wikimedia.org/r/172198 (https://bugzilla.wikimedia.org/72740)
[04:08:26] (PS2) QChris: Create empty CSVs for relevant wikis [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172181 (https://bugzilla.wikimedia.org/72740)
[04:08:30] (PS2) QChris: Backfill daily projectcounts for 2008 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172182 (https://bugzilla.wikimedia.org/72740)
[04:08:35] (PS2) QChris: Backfill daily projectcounts for 2009 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172183 (https://bugzilla.wikimedia.org/72740)
[04:11:04] (PS2) QChris: Backfill daily projectcounts for 2010 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172185 (https://bugzilla.wikimedia.org/72740)
[04:11:09] (PS2) QChris: Backfill daily projectcounts for 2011 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172186 (https://bugzilla.wikimedia.org/72740)
[04:11:14] (PS2) QChris: Backfill daily projectcounts for 2012 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172187 (https://bugzilla.wikimedia.org/72740)
[04:14:25] (PS2) QChris: Backfill daily projectcounts for 2013 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172188 (https://bugzilla.wikimedia.org/72740)
[04:14:30] (PS2) QChris: Backfill daily projectcounts for 2014 up to 2014-09-22 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172189 (https://bugzilla.wikimedia.org/72740)
[04:14:35] (PS2) QChris: Backfill daily projectcounts up to 2014-11-08 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172190 (https://bugzilla.wikimedia.org/72740)
[13:04:12] (PS3) QChris: Backfill daily projectcounts for 2008 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172182 (https://bugzilla.wikimedia.org/72740)
[13:04:14] (PS3) QChris: Backfill daily projectcounts for 2009 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172183 (https://bugzilla.wikimedia.org/72740)
[13:04:16] (PS3) QChris: Backfill daily projectcounts for 2010 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172185 (https://bugzilla.wikimedia.org/72740)
[13:08:15] (PS3) QChris: Backfill daily projectcounts for 2011 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172186 (https://bugzilla.wikimedia.org/72740)
[13:08:17] (PS3) QChris: Backfill daily projectcounts for 2012 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172187 (https://bugzilla.wikimedia.org/72740)
[13:08:19] (PS3) QChris: Backfill daily projectcounts for 2013 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172188 (https://bugzilla.wikimedia.org/72740)
[13:11:07] (PS3) QChris: Backfill daily projectcounts for 2014 up to 2014-09-22 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172189 (https://bugzilla.wikimedia.org/72740)
[13:11:09] (PS3) QChris: Backfill daily projectcounts up to 2014-11-08 [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/172190 (https://bugzilla.wikimedia.org/72740)
[13:14:00] (PS1) QChris: Add header line to CSVs for projectcounts aggregation [analytics/aggregator] - https://gerrit.wikimedia.org/r/172234
[13:14:02] (PS1) QChris: Switch to CRLF as line terminator for CSVs [analytics/aggregator] - https://gerrit.wikimedia.org/r/172235
[13:14:04] (PS1) QChris: Allow to force recomputation of existing data [analytics/aggregator] - https://gerrit.wikimedia.org/r/172236
[13:14:06] (PS1) QChris: Add total column for project aggregation CSVs [analytics/aggregator] - https://gerrit.wikimedia.org/r/172237
[13:14:08] (PS1) QChris: Add bad dates for projectcount aggregations [analytics/aggregator] - https://gerrit.wikimedia.org/r/172238
[13:14:34] (Abandoned) QChris: Add bad dates for projectcount aggregations [analytics/aggregator] - https://gerrit.wikimedia.org/r/172238 (owner: QChris)
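The CSV-format patches above (header line in r/172234, CRLF terminators in r/172235, total column in r/172237) describe the output conventions of the projectcounts aggregator. The following is a minimal Python sketch of those conventions only; the column names and the write_daily_csv helper are illustrative guesses, not the actual analytics/aggregator code.

```python
import csv

def write_daily_csv(path, rows):
    """Write daily projectcounts in the style the patches above describe.

    rows: iterable of (date, desktop_count, mobile_count) tuples.
    The desktop/mobile column split is hypothetical.
    """
    # newline="" keeps Python from rewriting our explicit terminators.
    with open(path, "w", newline="") as f:
        # CRLF line terminator, per "Switch to CRLF ..." (r/172235).
        writer = csv.writer(f, lineterminator="\r\n")
        # Header line, per "Add header line to CSVs ..." (r/172234).
        writer.writerow(["Date", "Desktop", "Mobile", "Total"])
        for date, desktop, mobile in rows:
            # Total column, per "Add total column ..." (r/172237),
            # assumed here to be the sum of the per-site counts.
            writer.writerow([date, desktop, mobile, desktop + mobile])

# Toy usage with made-up counts:
write_daily_csv("enwiki.csv", [("2014-11-08", 1200, 300)])
```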
[15:01:11] Analytics / Refinery: Make webrequest partition validation handle races between time and sequence numbers - https://bugzilla.wikimedia.org/69615#c21 (christian) Happened again for: 2014-11-09T21/2H (on upload)
[15:01:29] !log Marked raw upload webrequest partition for 2014-11-09T22/1H ok (See {{bug|69615#c21}})
[15:03:06] AHH
[15:03:17] having google trouble...
[15:18:28] (PS1) Gilles: Update mysql command to use shared config file [analytics/multimedia] - https://gerrit.wikimedia.org/r/172252
[15:18:51] (CR) Gilles: [C: 2] "Command tested during ssh session" [analytics/multimedia] - https://gerrit.wikimedia.org/r/172252 (owner: Gilles)
[15:18:56] (Merged) jenkins-bot: Update mysql command to use shared config file [analytics/multimedia] - https://gerrit.wikimedia.org/r/172252 (owner: Gilles)
[15:52:22] oo, hey, there's a thread on the kafka mailing list about adding people to the Powered By page
[15:52:26] https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
[15:52:41] I'm going to send them something. think I should say anything more than this?
[15:53:20] The Wikimedia Foundation uses Kafka as a log transport for analytics data from production webservers and applications. This data is consumed into Hadoop using Camus and to processors of analytics data.
[15:53:24] meh, needs edits ^ help!
[15:53:29] (qchris? ^)
[15:53:42] * qchris just loaded that page
[15:54:14] other
[15:54:15] That's fine.
[15:54:22] The Wikimedia Foundation uses Kafka as a log transport for analytics data from production webservers and applications. This data is consumed into Hadoop using Camus and to other processors of analytics data.
[15:54:32] hokay!
[15:59:25] Analytics / Refinery: Make webrequest partition validation handle races between time and sequence numbers - https://bugzilla.wikimedia.org/69615#c22 (christian) Happened again for: 2014-11-10T10/2H (on upload)
[16:00:20] !log Marked raw upload webrequest partition for 2014-11-10T10/2H ok (See {{bug|69615#c22}})
[16:08:17] mforns, who wants a present? ;p
[16:08:29] :]
[16:08:46] you have it?
[16:09:51] I have it! Or will in 30 seconds when I've aggregated to the month level as Toby requested.
[16:09:55] I think. It was month, right?
[16:10:54] yes, but if you have it like on friday and want me to do it with a select, I already have the query
[16:11:15] Ironholds: ^
[16:11:21] ahhh
[16:11:30] eh, easiest at my end. 'angon.
[16:12:46] ok, so where is it? the same place? what table?
[16:14:30] (PS1) Ottomata: Fix user in kafkatee.logrotate [analytics/kafkatee] - https://gerrit.wikimedia.org/r/172262
[16:14:41] (Abandoned) Ottomata: Fix user in kafkatee.logrotate [analytics/kafkatee] - https://gerrit.wikimedia.org/r/172262 (owner: Ottomata)
[16:15:09] (PS1) Ottomata: Fix user in kafkatee.logrotate [analytics/kafkatee] (debian) - https://gerrit.wikimedia.org/r/172263
[16:15:30] (CR) Ottomata: [C: 2 V: 2] Fix user in kafkatee.logrotate [analytics/kafkatee] (debian) - https://gerrit.wikimedia.org/r/172263 (owner: Ottomata)
[17:09:00] mforns_lunch, almost done!
[17:24:25] mforns_lunch, check out staging.pentahoviews
[18:00:52] (PS1) Gilles: Output script actually doesn't need the table in the SQL command [analytics/multimedia] - https://gerrit.wikimedia.org/r/172291
[18:01:13] (CR) Gilles: [C: 2] Output script actually doesn't need the table in the SQL command [analytics/multimedia] - https://gerrit.wikimedia.org/r/172291 (owner: Gilles)
[18:01:19] (Merged) jenkins-bot: Output script actually doesn't need the table in the SQL command [analytics/multimedia] - https://gerrit.wikimedia.org/r/172291 (owner: Gilles)
[18:01:24] ottomata: hmm, jdlrobson tried sudo again for the research password.
[18:06:39] ARGH
[18:06:56] YuviPanda: i am working with a new computer today and my ssh key and my puppet repo are in a weird state
[18:07:09] ottomata: want me to add him?
[18:07:15] yeah, add him to researchers group
[18:07:21] do you have merge rights?
[18:07:47] YuviPanda, why would he do that?
[18:07:56] because he is not in the researchers group
[18:08:02] ottomata: I do! :)
[18:08:06] yes, but if you have to sudo to get permissions to something...
[18:08:12] he can't sudo
[18:08:14] he is just trying
[18:08:15] you should probably not be accessing it.
[18:08:16] that too
[18:08:19] indeed.
[18:08:30] YuviPanda, do me a favour; when you land in SF, give him a look of disapproval? ;p
[18:08:55] Ironholds: doing similar things got me and milimetric a sternly worded letter from leslie a few years back :)
[18:09:05] yup
[18:09:15] I accidentally sudo'd once and got told off by, I think, Tim
[18:09:26] (I forgot which terminal window was to my local and which was to stat1)
[18:09:33] yeah, I'm wondering if I should send him an email with a link to https://xkcd.com/838/
[18:09:39] oh I had no excuse, I was just exploring :)
[18:09:55] ottomata: +1? https://gerrit.wikimedia.org/r/#/c/172292/
[18:10:15] YuviPanda, do it
[18:10:16] dooo iiiit
[18:10:54] merge away, thank you
[18:11:13] ottomata: cool, puppet-merging shortly.
[18:11:23] going to eat lunch before ops meeting, bbl
[18:16:47] Ironholds: done
[18:17:01] ta
[18:31:03] Ironholds: Thanks!
[18:31:10] np!
[19:03:53] mforns, any luck?
[19:08:21] Ironholds: haven't started yet, I'm finishing a CR, will get to it in a sec
[19:08:46] kk
[19:26:30] qchris: still working on pageviews, have not forgotten, but there are some major spikes in the data, I think; lemme do some plots
[19:27:03] Yes, there are spikes.
[19:27:15] there's not much we can do about it.
[19:27:31] I noted some issues in the commit messages of
[19:27:59] https://gerrit.wikimedia.org/r/#/projects/analytics/aggregator/data,dashboards/default:recent
[19:32:20] (CR) Mforns: "I think everything was fine!" (6 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/168488 (owner: Milimetric)
[19:33:00] Ironholds: starting now
[19:37:37] ta
[19:38:21] Ironholds: you speak portuguese?
[19:38:50] mforns, no! As a side-effect of there only being a limited number of sounds, "ta" is also British English for "thanks" :)
[19:39:03] xD
[19:39:08] I didn't know of its portuguese meaning (googled!)
[19:39:25] in portuguese tá is short for está
[19:39:27] oh! ok
[20:14:12] qchris: ok, did some plots, and the spikes are not as bad as vega shows them, soooo... something is going on
[20:14:43] nuria__: Not sure what you mean. Can you point me to a vega rendering?
[20:15:02] qchris: no, not outside of my desktop right now
[20:15:08] Screenshot?
[20:15:14] qchris: will investigate a bit
[20:15:26] Ok. That works too :-)
[20:15:31] Ping me if you need help.
[20:21:18] do point questions about the research pw to me today :)
[20:22:19] hi mforns
[20:22:29] hi tnegrin
[20:22:36] no pressure, just wondering about the ETA of the Pentaho stuff
[20:23:08] I think it will take like 2 hours
[20:23:33] is this OK?
[20:24:11] sure -- can you ping me when it's ready? it's a holiday here tomorrow so we want to look at it today if possible
[20:24:48] of course
[20:25:08] I'll ping you shortly, tnegrin
[20:25:14] thanks marcel!
[20:25:19] np
[20:55:56] ottomata, halfak: hmm, do we use mongodb at all on stat1003?
[20:56:05] I think puppet has it set to ensure that the service is stopped
[20:56:07] perhaps we could kill it
[20:56:17] * halfak is not here
[20:56:21] ok
[20:56:23] Also, I don't use mongodb on stat3 anymore
[20:56:24] :)
[20:56:25] :)
[20:56:28] alright
[21:01:03] YuviPanda: I think halfak was the only one who used it, not sure
[21:01:13] yeah, that's my impression too
[21:01:25] (CR) Hashar: [C: 1] "recheck" [analytics/aggregator] - https://gerrit.wikimedia.org/r/172194 (https://bugzilla.wikimedia.org/72740) (owner: QChris)
[21:06:32] (CR) Hashar: "Before merging this, one probably want to have the *flake8-bin job to be triggered by Zuul." [analytics/aggregator] - https://gerrit.wikimedia.org/r/172195 (https://bugzilla.wikimedia.org/72740) (owner: QChris)
[21:08:21] (CR) Hashar: "That is all good to me. I would merge it myself but I guess you probably review by other members of the analytics team :-]" [analytics/aggregator] - https://gerrit.wikimedia.org/r/172194 (https://bugzilla.wikimedia.org/72740) (owner: QChris)
[21:22:23] ottomata: Could I bribe you to have a quick look at the projectcounts puppet stuff to see if I should improve anything?
[21:22:48] (No need to merge, as the python code has not been reviewed ... just to have the puppet parts ready once the python code gets reviewed)
[21:23:16] will do, gimme just a few
[21:23:22] Awesome! Thanks.
[21:28:03] ottomata, what is the "stats" account and why is it sucking up everything stat1002 has?
[21:28:25] * YuviPanda checks
[21:28:47] YuviPanda is your man for the moment, Ironholds; I am having unexpected trouble using my new laptop
[21:57:57] ottomata, a friend suggests having sshd automatically at the highest niceness value (or, lowest?) to avoid this problem in the future. Not sure if that'd be useful? (not familiar with how we currently handle that)
[21:58:38] probably not a bad idea
[21:58:48] qchris: ok, review is easier via irc
[21:58:52] how about
[21:59:04] make your base class just called
[21:59:04] misc::statistics::aggregator
[21:59:06] and then
[21:59:31] don't set $data_path there, as it is the only thing that has anything to do with 'projectcounts' in that path
[21:59:34] then
[21:59:38] in your other class
[21:59:44] you can inherit from misc::statistics::aggregator
[21:59:54] and have all the variables from that class in local scope
[22:00:01] then you can set $data_path there
[22:00:05] i guess your subclass could be
[22:00:15] class misc::statistics::aggregator::pagecounts
[22:00:38] as all it has to do is make sure that the $data_path exists, and that the cron job for aggregated projectcounts runs
[22:00:59] Sorry, read up on stat1002 and its uptime :-)
[22:01:22] tch tch misc :)
[22:02:06] ottomata: k. thanks.
[22:02:14] What about the /mnt/hdfs ...
[22:02:17] YuviPanda: I don't want to talk about it :/
[22:02:20] :)
[22:02:20] Is that ok to hardcode?
[22:02:23] hehe :)
[22:02:31] ah, qchris, hm, it's ok, but yeah, it would be better to reference
[22:02:38] /srvvvvv!!! :)
[22:02:40] but, probably not that great to include it, so, you can just set up the dep without including!
[22:02:41] * YuviPanda stops fly-by reviewing
[22:03:52] that will set up an explicit dependency between the classes, but won't actually attempt to include the class [22:04:04] basically, puppet will fail if that class is not included elsewhere [22:04:40] It's better than it is now :-) Great! [22:04:40] and, qchris, as for cron time, any time is fine. [22:04:45] k. [22:05:26] And any comments on the "mount aggregated project counts files in wikimetrics' public directory" change? [22:06:00] https://gerrit.wikimedia.org/r/#/c/172285/ [22:06:01] lnk? [22:06:01] looking [22:06:01] link* [22:06:13] qchris: I have merged a flake8 Jenkins job for analytics/aggregator [22:06:23] qchris: flake8-bin is not triggered yet though [22:06:38] hashar: Yes, I saw it and commented. [22:06:47] The recheck worked :-) [22:07:04] Triggering is at https://gerrit.wikimedia.org/r/#/c/172399/ [22:07:36] Thanks for the quick merge of the first part :-) [22:07:59] qchris: yes, you have the /datafiles say ensure => directory and there is a target [22:08:12] qchris: nice :-] [22:08:23] qchris: I would have merged the change introducing tox.ini but refrained [22:08:23] Also, you don't need to enclose your require in an array [ ] if there is only one require [22:08:41] hashar: I read that comment ... you're joining us for standup .. you're part of the team :-) [22:08:57] hashar: But I'll get analytics people to review it. Thanks. [22:09:02] qchris: , will put comments in that change instead of here, that one is easier [22:09:11] ottomata: k. thanks. [22:17:20] qchris: thank you! [22:18:40] hashar: Thanks for making continuous integration so easy! It's the first job I set up for analytics ... and it is sooooo easy. [22:18:45] It's amazing! [22:18:58] And there he goes :-) Hahaha [22:31:27] Analytics / General/Unknown: Operationalize Saiku - https://bugzilla.wikimedia.org/73246 (Toby Negrin) NEW p:Unprio s:normal a:None We like this visualization platform. What would it take to put it in production? [22:34:19] hey qchris/ottomata, did we do anything to the sampled logs in 2013-10? [22:34:27] there's this TREMENDOUS jump in pageviews [22:34:39] 4bn to 15bn. What did we do differently? [22:34:56] Was'n that CentralAuth? [22:35:12] naw, Special pages are being filtered [22:35:30] hmn. Looks like it hit the desktop site specifically [22:35:32] * Ironholds goes down another level [22:35:47] Ironholds: actually look per project [22:36:05] good idea [22:36:08] Google referers changed a lot during that time. [22:36:13] in enwiki things kind of make sense over the timeperiod (desktop, mobile) [22:36:36] They jumped up sometime in autumn 2013. [22:36:52] Meh ... but not 11bn :-) [22:36:55] yeah [22:37:03] but in other projects (es.mobile) if you plot pageviews in the time period it jumps arround too much [22:37:04] if we had 4bn PVs prior to that we'd be in twubble [22:37:22] from 1 million a month to 100.000 a month a few months later [22:37:30] Ironholds: It's hard to tell without knowing what you filter / don't filter. Sorry. [22:37:47] yeah, I'll dig in and see what I find if I run it over two different days [22:38:09] nuria__: Where should the aggregated csvs go in the wikimetrics' public directory? [22:38:13] Something like "http://metrics.wmflabs.org/public/projectcounts/daily" [22:38:14] Ironholds: seems like the data 2013-10 to 2013-11 cannot be compared [22:38:15] I /have/ discovered that my URL-parsing solution is failing. Womp womp. [22:38:16] or "http://metrics.wmflabs.org/public/datafiles/DailyPageviews" [22:38:20] or something else? 
[22:38:29] nuria__, well, I'll see what I find
[22:38:39] if it turns out to be some historical artefact, I can easily run over 6 months of logs by wednesday
[22:38:51] if it's something from 2013-11 /onwards/ we're in twubble
[22:39:02] qchris: https://metrics.wmflabs.org/static/public/datafiles/
[22:39:22] qchris: _. name of metric
[22:39:44] nuria__: So since metrics are currently camelcased ...
[22:39:53] Does "http://metrics.wmflabs.org/public/datafiles/DailyPageviews" look right?
[22:40:04] sure, that is how i have it
[22:40:07] right now
[22:40:10] Cool.
[22:40:17] qchris: brb, need to go pick my daughter
[22:40:23] Sure.
[22:40:32] per-project isn't getting me anything
[22:40:48] I'm going to go into the hourly-level resolution sets and see what I can find there
[22:40:55] see if I can pick out a particular day/hour that dramatically changed.
[22:40:58] should narrow down the hunting
[22:42:06] tnegrin, for your reading pleasure: massive unexplained jump in PVs in my data from 2013-10 to 2013-11 (like, fourfold increase)
[22:42:06] Ironholds: With that increase, one would assume that the TSVs' size dramatically increased. But it did not.
[22:42:08] tracking down as to why.
[22:42:20] qchris, indeed. Alternately, that the things feeding into it were different?
[22:42:29] I'll go through and see.
[22:42:33] huh
[22:43:56] pulling out hourly-level aggregates to try and narrow it down
[22:44:09] I've also discovered my project aggregator didn't work for historical wikisource URLs. will track down.
[22:45:01] Ironholds, qchris: The TSV file has one line for each combination of dimensions; this should not necessarily grow with the increase in the value
[22:45:57] mforns, no, the sampled logs this is based on
[22:46:14] mforns: I was referring to the sampled-1000 TSVs on stat1002 at /a/squid/archive/sampled/sampled-1000.tsv.log-*
[22:46:24] oh! sorry
[22:47:32] Ironholds: the data looks ok after 11/13
[22:48:04] yup
[22:48:10] does 8 November 2013 mean anything to anyone?
[23:05:28] Is http://datasets.wikimedia.org/public-datasets/all/registrations/daily_reg_mobile.csv supposed to be empty?
[23:18:43] hey Ironholds -- what's going on?
[23:18:59] tnegrin, I'm digging into the datasets trying to work out where the drop-off happens and why
[23:19:09] pinned it down to starting around 8 November
[23:19:38] the data after that seems fine
[23:19:59] yup
[23:20:09] so I'm grabbing 1 day from before and 1 day from after and seeing what the difference is
[23:20:52] how about we just do december through october in the cube?
[23:21:19] sure
[23:21:30] but if I can work this out we can get at least a year, so I'd like to at least try.
[23:21:44] oh wow, that IS a drop
[23:21:45] huh
[23:21:57] we need to hurry
[23:21:57] after all of the filtering it should be left with 600k hits
[23:21:58] not 185524
[23:22:00] yes, I know.
[23:22:04] sorry dude
[23:22:09] so, go with December onwards for now, and I'll see if I can debug
[23:22:20] marcel might need to sleep
[23:22:28] if I can debug in time to run for the remaining time period - which will not take long in runtime terms, it's only around 6 months - great
[23:22:39] and then marcel can put it in tomorrow and we are go for wednesday
[23:22:43] otherwise, we have 11 months of data
[23:22:46] which is still really good
[23:23:52] tnegrin, no no, here it's fine
[23:24:06] I will stay here for 1h
[23:24:19] ok -- thanks mforns
[23:24:55] for half a second I misread > Ironholds: "qchris: brb, need to go pick my daughter"
[23:25:25] mforns, well, we won't have new data in an hour.
[23:25:37] not without breaking stat1002, and I leave that to other people *looks at ellery ;p*
[23:25:55] DarTar, that's... yeah: not going to be a sentence I utter, I think.
[23:29:12] Ironholds, OK
[23:30:55] FOUND IT.
[23:31:00] ...huh. this is the weirdest thi-OH FUCK.
[23:31:03] * Ironholds headdesks
[23:31:51] what was it?
[23:32:09] checking, checking...
[23:32:25] that is the /dumbest thing/
[23:32:32] so: none of the PV filters were killing anything wrong
[23:32:48] but, there is a filter to remove rows that are malformed before they do something stupid with future processing
[23:33:00] one great indicator of whether a row is malformed is whether the timestamp follows the right format
[23:33:12] 2013-11 was when we stopped storing Y-M-D-H-M-S-ms
[23:33:33] and of course my strptime function is designed to handle YYYY-MM-DDTHH:MM:SS, because that's all we've had for the last year.
[23:34:20] * Ironholds headslaps
[23:34:23] okay, can patch
[23:34:26] oh...
[23:34:51] fine!
[23:35:28] but, this will be done tomorrow.
[23:35:40] go sleep in the meantime
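The bug Ironholds describes is easy to reproduce: a malformedness filter that validates timestamps against only the current format silently drops every pre-2013-11 row, which used a dash-separated layout with milliseconds. A minimal Python sketch of the fix follows; both format strings and the parse_timestamp helper are guesses reconstructed from the discussion, not the actual filter code.

```python
from datetime import datetime

# Post-2013-11 sampled logs: ISO-style, e.g. "2013-11-08T23:31:00".
NEW_FORMAT = "%Y-%m-%dT%H:%M:%S"
# Pre-2013-11 logs stored Y-M-D-H-M-S-ms; the exact separators are a
# guess here, e.g. "2013-10-07-23-31-00-123".
OLD_FORMAT = "%Y-%m-%d-%H-%M-%S-%f"

def parse_timestamp(ts):
    """Accept both timestamp eras; return None only for real garbage.

    The broken filter tried only NEW_FORMAT, so every older row looked
    malformed and was discarded -- hence the apparent fourfold jump at
    the 2013-10/2013-11 boundary.
    """
    for fmt in (NEW_FORMAT, OLD_FORMAT):
        try:
            return datetime.strptime(ts, fmt)
        except ValueError:
            continue
    return None

assert parse_timestamp("2013-11-08T23:31:00") is not None
assert parse_timestamp("2013-10-07-23-31-00-123") is not None  # no longer dropped
assert parse_timestamp("not a timestamp") is None
```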
[23:42:47] tnegrin, fixed the bug upstream
[23:42:52] we're good. Gonna rerun for those months.
[23:42:59] ok
[23:49:07] mforns: looks like we can stand down
[23:49:17] Ironholds: disregard my last mail, Toby told me
[23:49:53] tnegrin, OK
[23:49:54] kk
[23:50:23] thanks for the help -- I think Ironholds will be done in 24 hours or so and we can reload the data on Wednesday
[23:51:25] Ironholds: to confirm -- the data in the existing series should be accurate b/t dec of 2013 and october 2014?
[23:52:00] yup
[23:52:05] Ironholds: awesome
[23:52:10] that appears to be when the remaining machines transitioned their TS format
[23:52:35] Ironholds, you're going to work tomorrow?
[23:52:41] even just having a couple of accurate months in 2014 can help us answer a number of questions
[23:52:54] mforns, I'm going to carefully monitor my scripts
[23:53:01] speaking of which, Ironholds: since you asked about new dimensions we may want to add to the cube, guess what the elephant in the room is
[23:53:02] and then I'm going to check my paper for WWW got submitted and works
[23:53:03] ok, I see
[23:53:05] and then I'm going to go drinking
[23:53:09] DarTar, oh god. what?
[23:53:11] xD
[23:53:45] Ironholds: the single biggest change impacting our traffic that we rolled out in summer 2013 and haven't seriously monitored ever since
[23:53:52] SSH usage?
[23:53:55] *HTTPS
[23:54:00] bingo
[23:54:05] I'll see if the x_analytics flags for that exist across all time
[23:55:24] Ironholds: I imagine that having a boolean isSSL dimension combined with geographic data *over time* will allow us to understand a lot more about what's happening with non-spider/non-robot PV trends
[23:55:42] sorry, the last one was non-spider/non-automated
[23:56:17] sure, if the field is intact. we'll see.
[23:56:18] Ironholds: let's add this to the wishlist for after Nov 17 ;)
[23:56:25] I'm seeing weird stuff happen with the mobile versions
[23:56:26] Ironholds: right
[23:56:54] meaning?
[23:56:59] note that this will be super useful to sanity-check the referred traffic data too