[00:00:12] Since it's getting late here ... ok for me to head to bed? You've got things covered, right?
[00:01:18] ottomata: ^
[00:02:15] yup, ees cool
[00:02:16] goodnight!
[00:02:18] thanks qchris
[00:02:24] night everyone!
[00:02:29] Thanks ottomata :-)
[00:06:01] Analytics-EventLogging: Provide a robust way of logging events without blocking until network request completes; use sendBeacon - https://phabricator.wikimedia.org/T44815#995499 (phuedx) Is there a follow-on task to implement the client-side storage feedback? I couldn't see one.
[00:19:43] Analytics, operations: Hadoop logs on logstash are being really spammy - https://phabricator.wikimedia.org/T87206#995529 (Gage) Where does this 3x number come from? Logstash reports that over the last 7 days Mediawiki has logged 3x as many events as Hadoop: mediawiki (241677764) MWException (86008559) Hadoo...
[00:51:19] Analytics-EventLogging: Provide a robust way of logging events without blocking until network request completes; use sendBeacon - https://phabricator.wikimedia.org/T44815#995583 (Nuria) No, at this time we do not have plans to implement a client-side storage mechanism.
[06:58:31] MediaWiki-Developer-Summit-2015, Analytics-EventLogging: Using EventLogging and Dashboards - https://phabricator.wikimedia.org/T85280#995790 (Qgil) Please update the description with the achievements of this session. Thank you in advance.
[13:18:18] (PS1) QChris: Harmonize names of raw webrequest load coordinators [analytics/refinery] - https://gerrit.wikimedia.org/r/186951
[15:26:26] (PS1) QChris: Add designated value for 'no extra filter' for pagecounts-all-sites jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/186956
[17:18:49] Analytics-EventLogging: Provide a robust way of logging events without blocking until network request completes; use sendBeacon - https://phabricator.wikimedia.org/T44815#996180 (phuedx) The `MobileFrontend` extension has some code that deals with deferring logging events until the next full page load. We cou...
[17:27:04] heads up folks, we are going to need to reboot hadoop cluster for a vulnerability in libc6 announced this morning
[17:27:31] please speak up if you have a big job running
[17:29:31] anyone seen otto?
[17:50:22] (PS2) QChris: Fix legacy_tsvs' use of datasets after switch to refined tables [analytics/refinery] - https://gerrit.wikimedia.org/r/186775
[17:50:24] (PS2) QChris: Refine upload webrequests [analytics/refinery] - https://gerrit.wikimedia.org/r/186774
[17:50:26] (PS2) QChris: Sort webrequest_sources in legacy_tsvs' HiveQL files [analytics/refinery] - https://gerrit.wikimedia.org/r/186777
[17:50:28] (PS2) QChris: Use misc when producing legacy tsvs [analytics/refinery] - https://gerrit.wikimedia.org/r/186776
[17:50:30] (PS1) QChris: Condense job starts for legacy_tsvs [analytics/refinery] - https://gerrit.wikimedia.org/r/186967
[17:50:32] (PS1) QChris: Drop unneeded 2 hour penalty from starting legacy_tsv jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/186968
[17:50:34] (PS1) QChris: Refine bits webrequests [analytics/refinery] - https://gerrit.wikimedia.org/r/186969
[17:50:36] (PS1) QChris: Use bits when producing legacy tsvs [analytics/refinery] - https://gerrit.wikimedia.org/r/186970
[17:56:32] Analytics, Multimedia, MediaWiki-extensions-MultimediaViewer: Create dashboard showing file namespace page views and MediaViewer views - https://phabricator.wikimedia.org/T78189#838837 (Gilles) The file page graph is still broken for me. Could we leave this aside for now? A lot of work has gone into it and it...
[17:57:23] (PS1) QChris: Add misc to webrequest status dump [analytics/refinery] - https://gerrit.wikimedia.org/r/186972
[17:58:35] Analytics: write a short document directing people who want us to use their predictive analytics tools ..... - https://phabricator.wikimedia.org/T76617#996262 (ggellerman)
[18:00:32] jgage: I have some jobs running, and also the cluster is backfilling pagecounts. Those are all ok to kill/ignore.
[18:13:19] Analytics-EventLogging: Provide a robust way of logging events without blocking until network request completes; use sendBeacon - https://phabricator.wikimedia.org/T44815#996268 (Nuria) >The MobileFrontend extension has some code that deals with deferring logging events until the next full page load. We >coul...
[18:26:51] Analytics-EventLogging: Change timestamp fields to reduce DB storage size - https://phabricator.wikimedia.org/T87660#996303 (Nuria) NEW
[18:30:14] Analytics-EventLogging: Remove autoincrement id from tables. - https://phabricator.wikimedia.org/T87661#996330 (Nuria)
[18:30:30] Analytics-EventLogging: Remove autoincrement id from tables. - https://phabricator.wikimedia.org/T87661#996332 (Nuria) p:Triage>High
[18:30:48] Analytics-EventLogging: Change timestamp fields to reduce DB storage size - https://phabricator.wikimedia.org/T87660#996333 (Nuria) p:Triage>High
[23:32:02] So we'd need to bring hive's pagecounts-raw files into that directory in that flat shape
[23:32:08] (so no leading 2015/2015-01)
[23:32:25] We cannot just use the
[23:32:44] /mnt/hdfs/wmf/data/archive/pagecounts-raw/2015/2015-01
[23:32:58] hm, 1. i think we should make it so we can use whatever directory we want, but i see your point about the flat dir now
[23:33:00] directory, as that would go wrong during the turn of month/year.
[23:33:23] ooof
[23:33:33] man this is so annoying, why do we need a nice html page? apache dir listing is fine :p
[23:33:42] Full ACK.
[23:34:08] But I think people are used to them. Also to having md5sums.
[23:34:16] yeah
[23:34:28] hm, ok, so, let's change the part that copies to incoming.
[23:34:37] Ok.
[23:34:38] hm
[23:34:42] i think
[23:34:50] I'll try to puppetize something around that.
[23:35:18] The second aspect that needs coverage before the switchover is the community.
[23:35:37] We should probably announce that and ask for opinions, should we?
[23:36:29] (The only noticeable difference should be that the files appear a few hours later, and that a few pages will go down in count, as some monitoring requests are no longer visible on the kafka pipeline)
[23:37:29] oh, qchris, since the hdfs layout is the same as the dumps layout
[23:37:38] it looks like you can probably just do a normal rsync
[23:38:24] haha
[23:38:25] you could
[23:39:03] you could flatten into the incoming dir :p, and let the script sort out unflattening :p
[23:39:21] Yup. That's what I would have done :-)
[23:39:45] Something like 'find -exec rsync'
[23:39:59] Adding a time filter to the find.
[23:40:21] And using the incoming directory as target for the rsync.
[23:40:29] (Too bad rsync cannot flatten on its own)
[23:41:03] however ... /mnt/data is read-only on stat1002.
[23:41:41] I'll have to look up if there is a suitable rsync target that works around it.
[23:41:49] Or I'll find some other way to make it work.
[23:42:08] I guess I can cheat by looking at how gadolinium is doing it.
[23:42:36] s/gadolinium/protactinium/
[23:43:01] The protactinium move was pretty smart and extremely fast btw.
[23:43:07] That was great.
[23:44:14] So back to the timeline.
[23:44:54] I guess the rsyncing part can be done tomorrow.
[23:45:34] hm?
[23:45:45] why is readonly a problem?
[23:45:59] Because it's hard to write to a readonly mount.
[23:45:59] rsync happens to dataset1001 ja?
[23:46:29] Mhmm.
[23:46:32] I guess so.
[23:46:34] Yes.
[23:46:43] That kills the readonly issue.
[23:46:44] incoming dir is on dataset1001
[23:46:45] :)
[23:46:46] Right :-)
[23:46:49] Thanks.
[23:46:52] yup :)
[23:47:38] That leaves notifying the community about the change.
[23:48:27] You wanna send out a heads-up about this?
[23:48:37] (I am too tired to do it today)
[23:49:27] are we ready?
[23:50:12] Ready in the sense that we want to switch over and can switch over in a few days, hence we can let the community know.
[23:50:24] Not ready in the sense of "let's switch over right now".
[23:50:27] ok, community == public analytics list?
[23:50:34] I'd say so. yes.
[23:50:37] ok
[23:50:40] Thanks.
[23:51:13] And I think the third step is turning off webstatscollector's filter+collector.
[23:51:25] Fourth and final step is celebrating.
[23:51:31] so: "The only noticeable difference should be that the files appear a few hours later, and that a few pages will go down in count, as some monitoring requests are no longer visible on the kafka pipeline"
[23:51:51] a few hours later means when?
[23:51:55] 2 hours later than usual?
[23:51:56] about?
[23:52:17] You're right. That's all too vague.
[23:52:27] Let's wait with the announcement until tomorrow.
[23:52:36] haha, ok!
[23:52:37] :)
[23:52:44] It should be 2 hours.
[23:52:55] But then ... the scripts take time to run.
[23:52:59] aye
[23:53:09] Might miss the cron jobs that pick them up ... etc etc.
[23:53:41] Are those 4 items enough of a timeline for you, or do you want concrete dates?
[23:54:34] naw that's perfect
[23:54:41] just wanted to know what was next and move it along :)
[23:54:53] cool :-)
[23:55:18] we are still waiting on this, eh?
[23:55:19] So I guess tomorrow I'll do some deploying of the changes that you merged today
[23:55:19] https://gerrit.wikimedia.org/r/#/c/168104/
[23:55:21] ok
[23:55:38] oh ariel reviewed!
[23:55:42] (Thanks for the merges again)
[23:55:50] Yes, he did.
[23:55:52] But!
[23:55:59] There is a parent change that needs review.
[23:56:00] wow long time ago
[23:56:04] You even pinged him.
[23:56:10] ah yes
[23:56:12] But it seems undecided :-)
[23:56:59] milimetric: https://gerrit.wikimedia.org/r/#/c/186356/ might need thought...
[23:57:55] ottomata: ^ also
[23:58:13] see that. aye, springle, should I let you merge such things?
[23:58:23] ok. I'll leave the db discussions to you and head to bed.
[23:58:25] Good night.
[23:58:45] ottomata: not necessarily :) just spreading more info around
[23:59:23] ok cool
[23:59:25] nighters qchris, thanks!
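
Editor's note on the 23:39 to 23:40 exchange: the "find with a time filter, flattened into the incoming directory" idea might look roughly like the sketch below. This is only an illustration, not what was deployed. It assumes a plain local copy in place of rsync, and the function name `flatten_recent`, its parameters, and any file names used with it are hypothetical. It also assumes the incoming directory lives outside the source tree.

```python
import shutil
import time
from pathlib import Path


def flatten_recent(src_root, incoming, max_age_seconds, now=None):
    """Copy recently modified files from a nested tree (e.g. YYYY/YYYY-MM/)
    into one flat directory, dropping the leading path components.

    Hypothetical helper: a local copy stands in for the rsync step.
    """
    now = time.time() if now is None else now
    src_root, incoming = Path(src_root), Path(incoming)
    incoming.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src_root.rglob("*")):
        if not path.is_file():
            continue
        # Time filter: the role a 'find' age test would play.
        if now - path.stat().st_mtime > max_age_seconds:
            continue
        # Flatten: keep only the file name, not the year/month directories.
        target = incoming / path.name
        # Skip files that were already copied with the same size.
        if target.exists() and target.stat().st_size == path.stat().st_size:
            continue
        shutil.copy2(path, target)
        copied.append(path.name)
    return copied
```

Because the filter looks at modification time rather than at the directory name, a file written just before the turn of a month or year is still picked up on the next run, which is the failure mode raised at 23:33:00.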