[04:53:52] Analytics-Backlog, Datasets-General-or-Unknown, operations: Requests to dumps.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116430#1760874 (Dzahn) yep, you are right. i don't know why i thought it was a duplicate, it's not. [07:00:32] Analytics-Backlog, Datasets-General-or-Unknown, Wikidata, operations: Requests to dumps.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116430#1761030 (Addshore) [08:25:10] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic, Outreachy-Round-11: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#1761094 (01tonythomas) @jgbarah : are you planning to mentor https://phabricator.wikimedia.org/T6058... [08:55:04] Analytics-General-or-Unknown: http://reportcard.wikimedia.org/ - redirect and delete old stuff - https://phabricator.wikimedia.org/T71625#1761133 (Addshore) [09:00:58] Analytics-Visualization: Mobile report card needs descriptions - https://phabricator.wikimedia.org/T66561#1761140 (Deskana) Open>declined a:Deskana I'm declining this because I doubt that this task is needed any more. Those dashboards do not see much use. [10:08:39] Analytics-Engineering, Beta-Cluster-Infrastructure, operations, Varnish: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1761231 (hashar) NEW [10:12:49] (PS1) Joal: Correct CamusPartitionChecker bug [analytics/refinery/source] - https://gerrit.wikimedia.org/r/249368 [10:50:49] Analytics-General-or-Unknown: http://reportcard.wikimedia.org/ - redirect and delete old stuff - https://phabricator.wikimedia.org/T71625#1761329 (JanZerebecki) The domain was removed in dd7c6e3b99cc6645d2dcabd9df65a78cd28112de. Is the old config on stat1001 still around? It can thus be deleted now. 
[10:53:47] (PS1) Joal: Update oozie load job to use _IMPORTED flag [analytics/refinery] - https://gerrit.wikimedia.org/r/249373 [11:14:20] Analytics-Backlog, Analytics-Kanban, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1761397 (Qgil) Hi @Nuria, this proposal is focusing on a Summit session but there is no indication about topics that could be discussed her... [11:15:07] Analytics, MediaWiki-extensions-General-or-Unknown, Reading-Infrastructure-Team: Track hook usage counts - https://phabricator.wikimedia.org/T106450#1761407 (TTO) [11:15:20] Analytics, MediaWiki-extensions-General-or-Unknown, Reading-Infrastructure-Team: Track hook usage counts - https://phabricator.wikimedia.org/T106450#1469404 (TTO) Seems to relate to WMF infrastructure?? [11:51:27] hm... why is the hue file browser like 1000 times faster than commandline hdfds -ls? [13:21:06] milimetric: have no idea :( [13:22:34] also trying to figure out what's up with the pageviews rsync. Files are in archive but not deployed [13:22:42] *not rsynced [13:23:40] milimetric: path issue in puppet [13:23:58] right, must be [13:24:06] milimetric: I sent a patch in CR (added you and otto as reviewers) [13:24:36] Oh you found it :) yay. ottomata has to merge but I'm looking [13:25:06] I also think we should provide the same service for the /wmf/data/archive/projectview/legacy/hourly folder [13:28:03] hm, I thought there was some weird symlinking that I didn't understand - but the other path is just not as complicated (pagecounts-all-sites) [13:28:13] I'll update your change with projectview joal [13:29:13] thx milimetric :) [13:30:26] seriously though, have you tried the file browser? https://hue.wikimedia.org/filebrowser/ [13:30:29] it's super fast! 
[13:30:42] I never use it, will try :) [13:35:10] k, patchset up [13:35:20] brb [13:46:52] ottomata: Hiii [13:47:08] I fixed the stuff with the burrow patch, and things seem to work fine [13:47:09] hallo! [13:47:17] awesome, cool, ja and I think we have a place to run it [13:47:23] ooh nice [13:47:24] going through email, then will check it out [13:47:29] cool [13:47:30] Analytics-Tech-community-metrics, Outreachy-Round-11: Outreachy Proposal for Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T116733#1761711 (Anmolkalia) [13:56:40] Analytics-Tech-community-metrics, Outreachy-Round-11: Outreachy Proposal for Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T116733#1761754 (Anmolkalia) [14:00:00] (PS1) Matthias Mullie: Add .gitreview [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/249393 [14:00:02] (PS1) Matthias Mullie: Measure the user responsiveness to notifications over time [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/249394 (https://phabricator.wikimedia.org/T108208) [14:01:30] (CR) Matthias Mullie: Measure the user responsiveness to notifications over time (1 comment) [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/249394 (https://phabricator.wikimedia.org/T108208) (owner: Matthias Mullie) [14:21:34] hm, weird, sorry, dunno why i was ofline [14:22:30] joal: i added a patch to https://gerrit.wikimedia.org/r/#/c/249378/3 [14:22:35] it bugged me too much :) [14:23:28] actually ottomata, rebase your patch on milimetric ones -- he made a change in between [14:23:51] on same change? [14:23:56] yeah I htink so [14:24:21] pretty sure it is on top of his [14:24:27] yes ottomata, your patches are 3 and 4, milimetric is 2 [14:24:30] aye [14:24:32] ok [15:38:42] oh ottomata, your script put both pageview and projectview into the same folder pageviews, correct ? [15:38:44] yes [15:38:45] and that works ? [15:38:45] i made it into one rsync job too [15:38:46] that's what you wanted, right? 
[15:38:46] hm, not sure :) [15:38:46] ja, did the rsync command with -rvn [15:38:46] hourly/2015/2015-09/projectviews-20150930-200000 [15:38:46] hourly/2015/2015-09/projectviews-20150930-210000 [15:38:46] hourly/2015/2015-09/projectviews-20150930-220000 [15:38:46] hourly/2015/2015-09/projectviews-20150930-230000 [15:38:46] hourly/2015/2015-10/ [15:38:46] hourly/2015/2015-10/pageviews-20151026-150000 [15:38:46] hourly/2015/2015-10/pageviews-20151026-160000 [15:38:46] hourly/2015/2015-10/pageviews-20151026-170000 [15:38:47] hourly/2015/2015-10/pageviews-20151026-180000 [15:38:47] I wanted projectviews to be on dumps, but where, I had no preference :) [15:38:48] you had your projectview class specifiying the same destination as the pageview class [15:38:48] into pageviews/ [15:38:48] ok, milimetric did that, so it's perfectly fine that way :) [15:38:49] which, makes sense, because that's how other ones are doing it, e.g. pagecounts-raw, pagecounts-all-sites [15:38:49] thanks :) [15:38:49] yep [15:38:49] also, joal, cOOl! hdfs webrequest report looking good [15:38:49] we got imported flags? [15:38:50] ottomata: we got flags, but wrongly set :) [15:38:50] (CR) Ottomata: [C: 2 V: 2] Correct CamusPartitionChecker bug [analytics/refinery/source] - https://gerrit.wikimedia.org/r/249368 (owner: Joal) [15:38:50] aye [15:38:50] joal: feel free to self release and review deploy that [15:38:50] ok, I don't like too much not to be reviewed before release [15:38:50] But ok :) [15:38:50] joal, why'd you take out the properties here [15:38:50] https://gerrit.wikimedia.org/r/#/c/249373/1/oozie/webrequest/load/coordinator.xml [15:38:50] joal: the stuff you'd have to release is just that change I merged, right? [15:38:50] that's all? 
[15:38:50] For refineru-source, yes it is [15:38:50] The rest is refinery [15:38:50] Currently starting a deploy a refinery-source [15:38:50] k [15:38:51] k, for refinery ja, just have the q about why you removed those properties [15:38:51] ottomata: removed the properties cause they were not needed, and since I was changing the file ... [15:38:51] But I can leave them if you prefere [15:38:51] they're not needed because they are already set? [15:38:51] or aren't used? [15:38:51] i think its fine, if you know what's going on, but update the commit message to say why please :) [15:38:51] Sure, will do [15:38:51] madhuvishy: this burrow patch looks great! [15:38:51] going to ammend to have it install on krypton [15:38:51] (PS1) Joal: Update changelog.md for v0.0.22 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/249407 [15:38:51] ottomata: --^ [15:38:51] (CR) Ottomata: [C: 2 V: 2] Update changelog.md for v0.0.22 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/249407 (owner: Joal) [15:38:51] sure, joal btw, you can update the changelog long with the actual change [15:38:51] in the same commit [15:38:51] either way :) [15:38:51] yeah, true :) [15:38:51] (PS2) Joal: Update oozie load job to use _IMPORTED flag [analytics/refinery] - https://gerrit.wikimedia.org/r/249373 [15:38:51] !log deploying refinery-source v0.0.22 [15:38:51] (CR) Milimetric: [C: -1] "really close, just one functional problem with the timeboxing, notes on the connection information and path conventions." (8 comments) [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/249394 (https://phabricator.wikimedia.org/T108208) (owner: Matthias Mullie) [15:38:52] also ottomata about yesterday's logging stuff [15:38:52] ? 
[15:38:53] it was not camus logging that is not present, it's the checker one [15:38:53] oh [15:38:53] The checker should print the actions it does, and actually doesn't [15:38:55] (CR) Ottomata: [C: 2 V: 2] Update oozie load job to use _IMPORTED flag [analytics/refinery] - https://gerrit.wikimedia.org/r/249373 (owner: Joal) [15:38:55] moving locations, back shortly [15:38:55] Is http://dumps.wikimedia.org/other/pageviews/ meant to be being filled with data yet? The puppet change was merged 20 hours ago. :-) [15:38:55] James_F: yes, I messed up the name though, so we have to merge a fix [15:38:55] and then there was some refactoring and stuff, should be merged and synced soon [15:38:56] milimetric: Aha, OK. Thank you! [15:38:56] (otto just quit but he'll merge it) [15:38:56] * James_F grins. [15:38:56] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0] [15:38:57] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [15:38:57] (CR) Matthias Mullie: Measure the user responsiveness to notifications over time (1 comment) [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/249394 (https://phabricator.wikimedia.org/T108208) (owner: Matthias Mullie) [15:38:59] milimetric: is the pivot ui deployed to druid1? http://druid.wmflabs.org/pivot [15:38:59] is druid the replacement for limn ? :D [15:38:59] it is really nice and very intuitive [15:38:59] as good as Excel :-} [15:39:03] Analytics-Backlog, Analytics-Kanban, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1761988 (Qgil) Please check the [[ https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit_2016#Scope_of_the_proposals | scope of the pro... 
[15:39:10] Analytics-Tech-community-metrics, DevRel-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1762001 (Aklapper) >>! In T112527#1757945, @Luiscanasdiaz wrote: > #5 we have a bug in our queries, that should return the same name alway... [15:46:44] (PS1) Joal: Upgrade refinery-job.jar to v0.0.22 [analytics/refinery] - https://gerrit.wikimedia.org/r/249425 [15:47:24] (CR) Joal: [C: 2 V: 2] "Self merging, bug correction." [analytics/refinery] - https://gerrit.wikimedia.org/r/249425 (owner: Joal) [15:48:32] !log Deploying refinery [15:48:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [16:06:18] Analytics-Kanban, Echo, Editing-Analysis, Flow, and 2 others: Measure the user responsiveness to notifications over time - https://phabricator.wikimedia.org/T108208#1762179 (Milimetric) [16:12:35] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Use Burrow for Kafka Consumer offset lag monitoring {hawk} - https://phabricator.wikimedia.org/T115669#1762203 (Nuria) [16:13:06] Analytics-Kanban, Patch-For-Review: Create deb package for Burrow {hawk} [5 pts] - https://phabricator.wikimedia.org/T116084#1762204 (kevinator) [16:21:21] madhuvishy: burrow running! [16:21:32] yay! where is it? [16:23:03] OH, need to open up firewall, ahng ont [16:25:27] ottomata: okay [16:27:32] oh madhuvishy, i see you got the hiera module param stuff to work, right? [16:27:37] do you know why it didn't work before? 
[16:27:39] just curious [16:28:24] Analytics-Backlog: Use spares to test Druid in production - https://phabricator.wikimedia.org/T116930#1762281 (Nuria) NEW [16:28:34] ottomata: yeah, i had a for loop in the template, and inside the for loop instead of using the local variable, i was doing @something and it was trying to look in the global scope [16:28:58] so the hiera stuff was working, just other silly things [16:29:02] ahhh, hm, weird ok [16:29:30] Analytics-Kanban: Browser Report. Small bugfixes - https://phabricator.wikimedia.org/T116931#1762294 (Nuria) NEW [16:32:52] ok, madhuvishy from bast1001 [16:33:04] (because port krypton 8000 not opened up from analytics vlan) [16:33:11] curl http://krypton.eqiad.wmnet:8000/v2/kafka/eqiad/consumer/eventlogging-00/topic/eventlogging-client-side | jq . [16:33:13] ah okay [16:34:08] ottomata: it seems like i can't get into bast1001 [16:34:22] ? [16:34:23] really? [16:34:27] hm, maybe it is your ssh config [16:34:27] https://www.irccloud.com/pastebin/SNlpmjPv/ [16:34:30] you must have an account there [16:34:34] aah [16:34:45] .wikimedia.org [16:34:49] hmmm [16:34:55] bast1001.wikimedia.org [16:35:04] cool that worked [16:35:16] Analytics-Tech-community-metrics, Developer-Relations, DevRel-October-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1762313 (Aklapper) https://github.com/Bitergia/mediawiki-repositories/pull/10/ and http... 
[16:35:29] ottomata: yay all the offsets showed up :D [16:35:40] will have to wait for an email [16:36:13] ha, have to wait for something to go wrong [16:36:17] madhuvishy: i wish this had a viz tool [16:36:18] ottomata: it'll only come if something goes wrong though, so not sure if i want that [16:36:22] was googling around for that [16:36:33] i guess we could write a script to query this and write to graphite [16:36:35] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1762316 (ellery) I have never used data on landing page impressions (I only use banner impressions and clicks). [16:36:38] lag. [16:36:44] ottomata: hmmm, yeah I was gonna look into posting to a http endpoint [16:36:48] no need to query no [16:36:52] topic latests offset - consumer offset [16:36:54] no, i mean, viz [16:36:58] if we can figure out where to post [16:37:01] the post just posts when there are problems, no? [16:37:08] we could use that for icinga alerts, instead of email [16:37:09] direct [16:37:12] but not for viz [16:37:45] ah yes [16:37:49] true [16:38:03] i can write a bot that will post it to graphite [16:38:27] madhuvishy: cool, that would be awesome, and i think very useful for many people [16:38:35] you could probably get it upstreamed into burrow project [16:38:35] yeah! [16:38:45] some simple little python client, make it work with statsd [16:38:46] that would be cool [16:38:49] yeah [16:39:04] make it read the burrow.cfg file to look for consumers to monitor [16:39:07] find out what topics they consume [16:39:13] do the offset subtraction [16:39:18] send lag metrics to graphite :) [16:39:23] statsd* [16:39:41] Orrr, maybe not burrow.cfg [16:39:49] maybe just cli opts, that way it can run anywhere [16:39:52] not on burrow host [16:39:57] yeah [16:42:56] ottomata: when you say lag, would it be per partition? 
[16:43:20] yes [16:43:39] (topic head offset - consumer committed offset) == lag [16:43:51] hm, that should be in the API though [16:43:54] show current lag [16:44:05] maybe I will make a burrow issue :) [16:44:22] ./v2/kafka/(cluster)/consumer/(group)/lag [16:44:25] Analytics-Tech-community-metrics, DevRel-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1762353 (Aklapper) @Luiscanasdiaz: Another problem I still see with the refreshed and deployed data: In the right panel of http://korma.w... [16:44:28] hm madhuvishy reading [16:44:29] OH [16:44:31] madhuvishy: [16:44:47] OH [16:44:50] didn't se it! [16:44:51] great! [16:45:07] awesome [16:45:37] all are lags are 0! [16:45:38] :) [16:45:49] that is awesome. [16:45:52] timestamp!!!! [16:45:58] if only camus committed to kafka! [16:46:02] joal wouldn't have had to do all that worok [16:46:05] I knowww [16:46:13] hm, madhuvishy do you know what the start/end pieces are? [16:46:39] oh, maybe those are where the consumer is (start) and the head of the topic (end) [16:46:40] ? [16:46:43] (sorry, tail of the topci) [16:46:53] (or head? iunno, whtaever, the end!) [16:46:58] ottomata: i think [16:47:08] it's the start and end of the window [16:47:25] window? 
[16:47:25] lagcheck evaluation blah window [16:47:40] https://github.com/linkedin/Burrow/wiki/Consumer-Lag-Evaluation-Rules [16:48:23] it has a sliding window and it keeps track of some 10 offsets and does all the consumer checks based on some rules [16:48:50] i guess start and end show the lags at the beginning and end of this sliding window [16:49:07] ok cool [16:49:12] got it, nice [17:07:42] Analytics-EventLogging, Editing-Department, Improving access, QuickSurveys, and 3 others: Schema changes - https://phabricator.wikimedia.org/T114164#1762475 (Jdlrobson) a:atgo>None [17:24:39] Analytics-EventLogging, Editing-Department, Improving access, QuickSurveys, and 3 others: Schema changes - https://phabricator.wikimedia.org/T114164#1762536 (DarTar) @csteipp I am confused about your last comment. We're talking of the possibility of logging IP addresses for //readers//, not contri... [17:29:10] milimetric, joal one advantage of ES is that it can index/query by http [17:29:21] Analytics-EventLogging, Editing-Department, Improving access, QuickSurveys, and 4 others: QuickSurveys: Schema changes - https://phabricator.wikimedia.org/T114164#1762575 (DarTar) [17:35:58] ottomata: is burrow stuff going to be reported to graphana? [17:36:02] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 46.67% of data above the critical threshold [30.0] [17:37:02] ottomata, milimetric , mforns : it seems that pivot ui takes a config file [17:37:21] so we can have an external ui with a "subset" of dimensions [17:37:26] right? 
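The lag definition given at 16:43:39, (topic head offset - consumer committed offset), together with the idea of a small Python client that pushes per-partition lag to statsd, can be sketched roughly as below. This is a minimal sketch, not Burrow's implementation. The function names, the metric prefix, and the plain-dict inputs are assumptions made for illustration, and fetching the offsets from Burrow or Kafka is deliberately left out. Only the `name:value|g` gauge line format comes from the statsd protocol.

```python
from typing import Dict, List


def partition_lag(head_offsets: Dict[int, int],
                  committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Return lag per partition: topic head offset minus committed offset.

    Both arguments map partition number -> offset (hypothetical input shape).
    Partitions with no committed offset are skipped rather than guessed at.
    """
    lag = {}
    for partition, head in head_offsets.items():
        committed = committed_offsets.get(partition)
        if committed is None:
            continue
        # Clamp at zero: a stale head reading can trail a fresh commit.
        lag[partition] = max(head - committed, 0)
    return lag


def statsd_gauges(prefix: str, lag: Dict[int, int]) -> List[str]:
    """Format each partition's lag as a statsd gauge line."""
    return [f"{prefix}.{partition}.lag:{value}|g"
            for partition, value in sorted(lag.items())]


# Example with made-up offsets for two partitions.
example_lag = partition_lag({0: 100, 1: 250}, {0: 100, 1: 240})
example_lines = statsd_gauges("kafka.eventlogging", example_lag)
```

Each resulting line could then be sent to a statsd host as a UDP datagram. Taking consumers and topics from command-line options, as suggested at 16:39:49, would let such a client run on any host, not only the Burrow one.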
[17:39:53] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [17:40:16] Analytics-Tech-community-metrics, DevRel-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1762594 (Aklapper) @ Bitergia: Generally, any identities with "email": "username@svn.wikimedia.org" got assigned to "organization": "Wikim... [17:50:16] ottomata: should I mention that the Event Bus is gonna be discussed at today's RFC review? [17:50:19] (in SoS) [17:59:26] yes [17:59:30] milimetric: ^ [17:59:36] done [17:59:47] coming to the druid hangout? [17:59:55] joal, I was wrong, the loading of the 1Million file was 3minutes [18:00:05] right [18:00:32] yes milimetric [18:06:06] joal, the data size seems to be 10% smaller (with index:no and doc_values:true and source:enabled:false) [18:06:19] k mforns :) [18:06:28] not that much smaller :) [18:07:39] Analytics-Backlog, Analytics-Kanban, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1762704 (bearND) Will this include impression data, too, e.g. for hovercards/link previews? [18:08:43] Analytics-Tech-community-metrics, Developer-Relations, DevRel-October-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1762705 (Aklapper) >>! In T103292#1746532, @saper wrote: > Wouldn't whitelisting of com... [18:12:23] ottomata: You don't want to expose that thing, he [18:25:11] Analytics, MediaWiki-extensions-General-or-Unknown, Reading-Infrastructure-Team: Track hook usage counts - https://phabricator.wikimedia.org/T106450#1762755 (Tgr) >>! In T106450#1761407, @TTO wrote: > Seems to relate to WMF infrastructure?? Not entirely, but definitely not a release blocker. The tag... [18:34:10] joal: ? 
[18:34:22] old reaction :) [18:34:42] when nuria was talking about UI config not to show everything [18:52:55] FYI, i just got this twice (while using "mysql ... -hanalytics-store.eqiad.wmnet") on stat1003: "ERROR 2013 (HY000): Lost connection to MySQL server during query" [18:53:08] it works again now though, so just leaving this here [18:53:41] a-team: anyone wanna de-brief on next steps in the cave? [18:53:51] sure a-team, I'm for it [18:54:40] fyi, afaict, there is no explicit linux setting for limiting page cache [19:00:14] lzia: I'll be a little late, feel free to start without me [19:00:43] thanks for the reminder milimetric. ;-) [19:06:40] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0] [19:07:55] !log Restart load job (based on IMPORTED flag) [19:07:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [19:08:30] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [19:12:10] Analytics-Engineering, Community-Tech: Add page view statistics to page information pages (action=info) [AOI] - https://phabricator.wikimedia.org/T110147#1762939 (DannyH) [19:15:32] milimetric: on cave? [19:15:46] sorry nuria had to run to another meeting, we can talk in 15 min? [19:15:55] milimetric: sure cc mforns_away [19:16:30] !log Downsizing cassandra replication from 3 to to 2 on per_article_flat keyspace [19:16:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [19:18:56] milimetric: joal, running rysnc command manually now, its happening! 
[19:18:59] http://dumps.wikimedia.org/other/pageviews/2015/ [19:19:02] sweet [19:19:03] :) [19:19:21] kevinator wanted to see that ^ [19:19:47] and James_F|Away: pageviews are starting to show up on dumps: http://dumps.wikimedia.org/other/pageviews/2015/ [19:24:03] Analytics-Engineering, Beta-Cluster-Infrastructure, Varnish: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1762992 (chasemp) [19:24:35] Analytics-Cluster, Analytics-Kanban: Browser Report. Small bugfixes - https://phabricator.wikimedia.org/T116931#1762995 (Nuria) [19:25:10] milimetric: rsync done. [19:25:11] http://dumps.wikimedia.org/other/pageviews/2015/2015-10/ [19:25:27] awesome, thx [19:25:36] milimetric: did we backfill just legacy projectviews or something? [19:25:42] or is pageviews just taking a long time to backfill? [19:26:17] ottomata: no backfilling on pageviews, so they're only available after the coordinator change Monday [19:26:38] aye makes sense, cool [19:26:50] Analytics-Backlog, Datasets-General-or-Unknown, Wikidata, operations: Requests to dumps.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116430#1763027 (chasemp) p:Triage>Normal [19:27:00] milimetric: i forget, how big is this data? [19:27:07] 17T avail on this drive [19:27:32] hm, projectviews are tiny [19:27:42] pageviews look like 200+ MB per hour [19:28:04] milimetric: Didn't see that before: data is not comporessed? [19:28:08] so about 2TB per year? [19:28:36] hm, yeah, pagecounts is zipped: http://dumps.wikimedia.org/other/pagecounts-all-sites/2014/2014-09/ [19:28:52] milimetric: we should do that don't you think ? [19:28:59] yeah, definitely [19:28:59] :( [19:29:07] it's zipped out of the coordinator or how? [19:29:14] milimetric: 51T per yera? [19:29:16] I didn't notice before [19:29:17] milimetric: hive [19:29:28] oh its not zipped [19:29:28] ha [19:29:35] woops! 
[19:29:41] milimetric: I'll do it first thing tomorrow morning, then backfill the old not-zipped stuff [19:29:45] ottomata: no, even unzipped it's only 1.7T per year [19:29:54] am I multiplying wrong? [19:29:55] ottomata: ok for you if I do that tomorrow morning ? [19:29:56] joal: ok, sorry [19:30:01] np milimetric :) [19:30:08] I think so - 200MB * 24 * 365 [19:30:08] 200*24*31*365/1024/1024 [19:30:15] ohDUH [19:30:16] no * 31 [19:30:16] hahaha [19:30:16] Analytics-Cluster, Analytics-Kanban: Browser Report. Small bugfixes - https://phabricator.wikimedia.org/T116931#1763072 (Nuria) Also, seems that our longtail is quite large, as for the report in 2015-09-27 reported percentages add up to 75%, maybe we should consider reporting up to 0.1%? [19:30:16] * joal is the BACKFILLER ! [19:30:24] ja ok, NP then [19:30:25] there aren't 365 months, thankfully [19:30:35] joal: don't worry about backfilling pageviews, right? [19:30:37] its fine. [19:30:50] the rsync runs witih --delete so if you remove the files from archive they will be removed from dumps [19:30:54] not backfilling pageviews, I think he means removing raw and replacing with zip [19:30:55] haha [19:31:11] ottomata: ok great [19:31:19] projectcounts aren't zipped? [19:31:19] I'll do that [19:31:37] the projectcounts are really tiny, but they could be zipped i guess [19:31:39] looks like projectcounts aren't event zipped anyway [19:31:51] well, they aren't on the other project* datasets [19:31:52] so meh? [19:32:03] I'll zip everything [19:32:09] so, that makes joal's job easy, just gotta make pageviews job zip, and then delete the unzipped ones [19:32:13] and restart job from current [19:32:16] joal: actually, nah, don't zip projectviews [19:32:24] cause then maybe you'd have to backfill them? 
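The disk estimate worked out between 19:28:08 and 19:30:25 can be checked with a few lines of arithmetic. This sketch assumes the roughly 200 MB per hour figure quoted for uncompressed pageviews files. The constant and function names are made up for illustration.

```python
MB_PER_HOUR = 200          # approximate size of one uncompressed hourly file
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
MB_PER_TB = 1024 * 1024


def yearly_tb(mb_per_hour: float) -> float:
    """Yearly volume in TB for a dataset that writes one file per hour."""
    return mb_per_hour * HOURS_PER_DAY * DAYS_PER_YEAR / MB_PER_TB


correct_estimate = yearly_tb(MB_PER_HOUR)

# The "51T" figure came from also multiplying by 31 days per month,
# which counts the days twice.
mistaken_estimate = correct_estimate * 31

print(f"correct:  {correct_estimate:.2f} TB per year")
print(f"mistaken: {mistaken_estimate:.2f} TB per year")
```

The first figure comes out at about 1.7 TB per year, matching the value quoted at 19:29:45. Against the 17T available on the drive, even the uncompressed files would fit for several years.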
[19:32:25] :) [19:32:31] that way any automated tools can use *views the same way as *counts [19:32:34] yeah, joal, the others aren't so don't, yeah [19:32:52] ok milimetric [19:33:00] zip pagevierws only :) [19:33:21] I'll zip the existing ones, not to loose the already computed data [19:33:33] joal good idea [19:33:33] k [19:33:46] tomorrow (tonight, too late :) [19:33:58] ottomata: I'll self merge that change tomorrow ;) [19:35:38] going to dinner, back to double everything is fine after, then sleep ! [19:35:50] Bye a-team, maybe later, if not, tomorrow :) [19:35:57] sleeeep! [19:36:17] goodnight! [20:00:46] nuria: ok, back [20:01:00] ottomata: yt? [20:01:10] Analytics-Engineering, Beta-Cluster-Infrastructure, Varnish: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1763189 (hashar) The reason is the role classes in `modules/role/manifests/cache/statsd.pp` all have: ``` statsd_server => 'statsd.e... [20:01:37] nuria: ya [20:02:01] ottomata, milimetric want to talk about next steps cc mforns_away [20:02:03] ? [20:02:09] sure [20:09:46] Analytics-Engineering, Beta-Cluster-Infrastructure, Patch-For-Review, Varnish, WorkType-Maintenance: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#1761231 (hashar) [20:18:47] Hey folks. madhuvishy around today? [20:21:17] halfak: she's in India this week and on vacation next week [20:21:40] Thanks kevinator [20:21:47] Middle of the night there, I imagine [20:21:48] halfak: she's on india time, so I think she signs off shortly after our standup that ends at 10:00am PST [20:21:59] yup [20:22:14] PDT or are you guys doing daylight savings early :P [20:22:28] ugh yes PDT [20:22:29] [20:22:32] Sorry [20:22:53] Anyway thanks for the info. [20:23:03] yw [20:23:12] * halfak is hunting for someone who isn't YuviPanda who can help him with some puppet. [20:30:29] milimetric: ut? 
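The plan settled between 19:32:09 and 19:33:33, to gzip the already computed pageviews files and leave projectviews as they are, could look roughly like the sketch below. It assumes the hourly files sit in an ordinary local directory. The real files live in the HDFS archive, so this only illustrates the selection and compression logic, and the function name is hypothetical.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def gzip_pageviews(directory: Path) -> List[Path]:
    """Gzip every uncompressed pageviews-* file in `directory`.

    projectviews-* files do not match the pattern and are left untouched.
    Each original is removed only after its .gz copy has been written.
    """
    written = []
    for path in sorted(directory.glob("pageviews-*")):
        if not path.is_file() or path.suffix == ".gz":
            continue  # skip directories and files that are already zipped
        target = path.with_name(path.name + ".gz")
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()
        written.append(target)
    return written
```

Because the rsync runs with --delete (19:30:50), removing the unzipped originals from the archive also removes them from dumps on the next sync, and the .gz copies are picked up in their place.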
[20:33:06] Analytics, Analytics-Backlog: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. - https://phabricator.wikimedia.org/T116609#1763297 (Nuria) [20:36:19] Analytics, Analytics-Backlog: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. - https://phabricator.wikimedia.org/T116609#1763300 (kevinator) [20:42:52] Analytics-EventLogging, Editing-Department, Improving access, QuickSurveys, and 4 others: QuickSurveys: Schema changes - https://phabricator.wikimedia.org/T114164#1763354 (leila) [20:43:18] milimetric: moved your task to done (about feeding the wikistats reports) [20:43:25] kevinator: you around ? [20:44:00] yes [20:44:06] have a minute for me ? [20:44:10] cave ? [20:44:14] on my way [20:44:25] oh, busy cave :) [20:48:52] kevinator: free when you are now [20:53:02] Analytics, Analytics-Backlog: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. - https://phabricator.wikimedia.org/T116609#1763375 (Nuria) We have a table in hive that has the sample logs loaded for a year, I will look a bit into this. N Now, bottom line... [20:53:19] hey [20:53:34] milimetric: batcave? [20:54:15] there kevinator [21:04:55] milimetric: Awesome, thanks! [21:11:52] James_F: one last small snag is that the per-article files aren't zipped right now, so those will be deleted and re-added as .gz [21:12:04] but the projectviews-* files will all stay the way they are [21:12:22] milimetric: Ha. OK. .gz not .bz or .7z? [21:12:34] * James_F is difficult. [21:12:48] uh... I'm a 7z man myself, but the other archives on there are gz so I'm guessing we'll just stick with that [21:12:56] in case people have automated tools or whatever [21:12:56] Sure. [21:13:10] Automated tools that magically work for a new format? ;-) [21:13:12] * James_F grins. [21:13:17] not a new format! 
[21:13:20] same legacy format [21:13:36] just new definition being aggregated, so the counts should be much better behaved around bots, weird spikes, etc. [21:14:11] we were just talking how to explain this crazy mess of different files being available, we have a rough plan, we'll be doing that soon [21:14:42] .... "talking how" ... wtf... I need sleep [21:14:59] Oh, right. [21:15:02] * James_F grins. [21:15:06] Thank you for everything! [21:15:35] np, I just deliver good news, otto and jo did all of this [21:23:48] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1763543 (MeganHernandez_WMF) We have used landing page impressions in the past and I think we do use this for e... [21:25:49] milimetric: Thank you anyway, and thanks to ottomata and joal. :-) [21:26:35] James_F: let us know if you want data backfilled at the per-article level, right now we're just generating that going forward [21:26:42] * James_F nods. [21:26:44] WFM. [21:38:55] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1763618 (Ottomata) >> If we adopt a convention of always storing schema name and/or revision in the schemas themselves, then we... [21:47:47] milimetric, nuria : updated doc page: https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI/DataStore [21:47:53] sleeeeeep! [21:47:54] :P [21:47:59] please let me know of any issue :) [21:48:03] I have an issue [21:48:06] sleeeeep [21:48:09] milimetric: right nooooooow :D [21:48:14] :) [21:48:16] have a nice night man [21:48:29] Thx ! [21:48:39] joal: i thought you were sleeping hours ago.. ay ay .. [21:48:42] thank you [21:48:54] I'm sleeping all day long nuria ;) [21:49:19] jajaj [21:49:34] Tomorrow ! [21:51:07] by joal [21:51:17] *bye! 
[21:59:10] nite everyone [22:02:50] Analytics-Cluster, Analytics-Kanban: Browser Report. Small bugfixes - https://phabricator.wikimedia.org/T116931#1763776 (mforns) a:mforns [22:06:35] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1763815 (ezachte) Thanks Dan! I'll do some sanity checks, and report back. [22:13:52] Analytics-Backlog, Analytics-Kanban, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1763855 (Milimetric) @bearND: the Pageview API right right now is a simple public-facing query into pre-aggregated data. What we want to g... [22:15:48] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1763870 (Milimetric) You're very welcome, Erik. There was one small last bug, the pageviews-* files got synced without being zipp... [22:22:47] is it known analytics1032 is messed up? [22:38:41] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1763953 (GWicke) @ottomata, I think understanding the semantics of an event primarily requires knowledge of the topic. The topic... 
[22:58:42] Analytics, Analytics-Cluster, Fundraising-Backlog, Unplanned-Sprint-Work, and 3 others: Impression log parsers should get sample rate from filenames - https://phabricator.wikimedia.org/T116800#1764074 (DStrine) [23:00:59] Analytics, Analytics-Cluster, Fundraising-Backlog, Unplanned-Sprint-Work, and 2 others: Impression log parsers should get sample rate from filenames - https://phabricator.wikimedia.org/T116800#1764161 (DStrine) [23:26:57] Analytics-EventLogging, Editing-Department, Improving access, QuickSurveys, and 4 others: QuickSurveys: Schema changes - https://phabricator.wikimedia.org/T114164#1764287 (csteipp) I'm also slightly confused, but let me try: >>! In T114164#1762536, @DarTar wrote: > @csteipp I am confused about yo... [23:29:11] Analytics-EventLogging, Editing-Department, Improving access, QuickSurveys, and 4 others: QuickSurveys: Schema changes - https://phabricator.wikimedia.org/T114164#1764288 (csteipp) s/make an edit/make a logged-in edit/ If they never log in, then it's not checkuser data. But #1 still is a concern... [23:59:26] (PS1) Mforns: Improve details in browser report [analytics/refinery] - https://gerrit.wikimedia.org/r/249659 (https://phabricator.wikimedia.org/T116931)