[00:01:21] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: MediaViewer open button clicks not logged on file page - https://phabricator.wikimedia.org/T76925#839483 (Tgr) [00:01:22] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: Track clicks on "Open in Media Viewer" on file page - https://phabricator.wikimedia.org/T77851#839484 (Tgr) [00:32:35] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: More Metrics Dashboards for Media Viewer - https://phabricator.wikimedia.org/T77264#839613 (Tgr) [00:33:01] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: More Metrics Dashboards for Media Viewer - https://phabricator.wikimedia.org/T77264#839615 (Tgr) Open>declined a:Tgr We never missed these stats so I'll just close this. [00:44:43] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: Make upload.wikimedia.org serve images with Timing-Allow-Origin header - https://phabricator.wikimedia.org/T76020#839654 (Tgr) [00:45:44] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: Make upload.wikimedia.org serve images with Timing-Allow-Origin header - https://phabricator.wikimedia.org/T76020#839657 (Tgr) p:High>Normal Lowering priority as we do not plan to work on this anytime soon. [00:45:51] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: Track clicks on "Open in Media Viewer" on file page - https://phabricator.wikimedia.org/T77851#839671 (Tgr) p:High>Normal Lowering priority as we do not plan to work on this anytime soon. [00:45:59] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: Viewing Options Dashboards - https://phabricator.wikimedia.org/T77670#839675 (Tgr) p:High>Normal Lowering priority as we do not plan to work on this anytime soon. 
[00:47:09] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: Investigate if pre-rendering images is having an impact on performance - https://phabricator.wikimedia.org/T76035#839739 (Tgr) p:Normal>High [01:06:59] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: Identify metrics which show changes in reader file view behavior - https://phabricator.wikimedia.org/T77612#839803 (Tgr) [01:28:24] Analytics-Engineering: [Volunteer] Generate.py needs to write to a separate history.json file per configured instance - https://phabricator.wikimedia.org/T77936#839944 (Fhocutt) It makes more sense to store history.json in the config folder, which is already passed in as a parameter. No change to statistics.p... [03:01:15] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Collect data on how many images have captions - https://phabricator.wikimedia.org/T77793#840145 (Tgr) [03:24:10] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Add shell scripts to analytics/multimedia repo - https://phabricator.wikimedia.org/T77455#840234 (Tgr) [03:31:37] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Measure the time won by JS preloading on hover - https://phabricator.wikimedia.org/T77439#840258 (Tgr) [03:32:06] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Measure the time won by JS preloading on hover - https://phabricator.wikimedia.org/T77439#840260 (Tgr) Open>declined a:Tgr [03:34:27] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Track the number of MMV usages vs. MMV preloads - https://phabricator.wikimedia.org/T77438#840262 (Tgr) [03:34:35] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Track the number of MMV usages vs. 
MMV preloads - https://phabricator.wikimedia.org/T77438#840264 (Tgr) Open>declined a:Tgr [03:44:59] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Log loading times directly - https://phabricator.wikimedia.org/T77353#840289 (Tgr) [03:46:51] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Show loading time charts on the dashboard - https://phabricator.wikimedia.org/T77354#840296 (Tgr) [03:47:16] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Show loading time charts on the dashboard - https://phabricator.wikimedia.org/T77354#840298 (Tgr) Open>Resolved a:Tgr Done a long time ago: multimedia-metrics.wmflabs.org/dashboards/mmv#media_viewer_vs_file_page-graphs-tab [04:14:51] MediaWiki-extensions-MultimediaViewer, Multimedia, MediaWiki-extensions-ImageMetrics, Analytics: Calculate the ratio of opted-out anonymous users of MediaViewer - https://phabricator.wikimedia.org/T78228#840369 (Tgr) NEW [04:15:15] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Viewing Options Dashboards - https://phabricator.wikimedia.org/T77670#840376 (Tgr) Open>Resolved a:Tgr Mostly done; the rest is T78228. 
[08:44:03] Analytics-Engineering: Epic: Data Warehouse Vet Data Round 2 - https://phabricator.wikimedia.org/T78097#840843 (kevinator) [08:44:04] Analytics-Wikimetrics, Analytics-Engineering: Eng has vetted data in Data Warehouse - https://phabricator.wikimedia.org/T78019#840844 (kevinator) [08:46:01] Analytics-Engineering: EPIC: data warehouse - https://phabricator.wikimedia.org/T76382#840854 (kevinator) p:Triage>High [08:46:56] Analytics-Engineering: EPIC: data warehouse - https://phabricator.wikimedia.org/T76382#799107 (kevinator) [08:49:12] Analytics-EventLogging, Analytics-Engineering: WMF reads announcement on simpler process to get a dashboard from EL data - https://phabricator.wikimedia.org/T76058#840866 (kevinator) [08:49:48] Analytics-EventLogging, Analytics-Engineering: WMF reads announcement on simpler process to get a dashboard from EL data - https://phabricator.wikimedia.org/T76058#840870 (kevinator) p:High>Normal [08:53:04] Analytics-Dashiki, Analytics-Engineering: Vital Signs user reads description of metric - https://phabricator.wikimedia.org/T76741#840878 (kevinator) While speaking to Dan, we came up with: * remove the title of the metric on the graph (it's redundant) * add a button on the top right of the graph "More Info"... [08:54:31] Analytics-Engineering: Epic: Data Warehouse. 
Investigate an automated way to run "vetting data scripts" - https://phabricator.wikimedia.org/T78098#840882 (kevinator) p:Triage>Normal [08:55:58] Analytics-Engineering: Community has a developer doc "Getting Started with Wikimetrics" - https://phabricator.wikimedia.org/T77075#840891 (kevinator) p:High>Normal [08:58:45] Analytics-Dashiki, Analytics-Engineering: Vital Signs user reads description of metric - https://phabricator.wikimedia.org/T76741#840913 (kevinator) [08:59:19] Analytics, Analytics-Engineering: Analytics User uses CentralNotice cookie in x-analytics field of web-request logs - https://phabricator.wikimedia.org/T75835#840921 (kevinator) p:Normal>Low [09:00:56] Analytics-Engineering: Epic: Data Warehouse Investigate an automated way to load the data into warehouse - https://phabricator.wikimedia.org/T78099#840930 (kevinator) p:Triage>High [09:01:51] Analytics-Engineering: Investigate an automated way to load the data into warehouse - https://phabricator.wikimedia.org/T78099#840933 (kevinator) [09:01:59] Analytics-Engineering: Investigate an automated way to run "vetting data scripts" - https://phabricator.wikimedia.org/T78098#840934 (kevinator) [09:17:10] Analytics-Engineering: Puppet Production role class for wikimetrics scheduler/queue - https://phabricator.wikimedia.org/T76791#840975 (kevinator) p:Triage>Normal [09:17:33] Analytics-Engineering: Data Warehouse manages schema migrations with alembic - https://phabricator.wikimedia.org/T76829#840976 (kevinator) p:Triage>Normal [09:17:47] Analytics-Engineering: [Ops] new group that would allow you to sudo to the wikimetrics user (to enable dev to run admin script) - https://phabricator.wikimedia.org/T76792#840978 (kevinator) p:Triage>Normal [09:18:02] Analytics-Engineering: user defined mysql functions on analytics-store - https://phabricator.wikimedia.org/T76366#840979 (kevinator) p:Triage>Normal [09:19:35] Analytics-Wikimetrics, Analytics-Engineering: labsdb issues forced wikimetrics-scheduler to be 
stopped - https://phabricator.wikimedia.org/T74281#840988 (kevinator) p:Triage>High [09:21:48] Analytics-Wikimetrics, Analytics-Engineering: Eng has vetted data in Data Warehouse - https://phabricator.wikimedia.org/T78019#841004 (kevinator) p:High>Normal [09:21:55] Analytics-Engineering: Investigate an automated way to load the data into warehouse - https://phabricator.wikimedia.org/T78099#841005 (kevinator) p:High>Normal [09:32:34] Analytics-Engineering: changing puppetization for scheduler/queue - https://phabricator.wikimedia.org/T76790#841027 (kevinator) p:Triage>Normal [11:29:16] Engineering-Community, Analytics-Tech-community-metrics, Phabricator: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#841296 (Nemo_bis) > start humble with sql queries Speaking of which. As the Bugzilla weekly report is no more :[, this update could be sent direct... [11:51:42] qchris: ping [11:51:46] pong [11:52:07] i found an odd thing on eventlogging [11:52:18] two tables: Mobil%WebClickTracking_5929948 and MobileWebclickTracking_5929948 [11:52:27] also MobileWabClickTracking_5929948 [11:53:26] the table with % in the name was causing wierdness on analytics-store; i had to drop it. it only had one record [11:53:51] however now i'm not sure if MobileWebclickTracking_5929948 is intact [11:53:53] The MobileWabClickTracking_5929948 also has only 1 entry. [11:54:00] I'll drop it too. [11:54:04] do we know if that is correct? [11:54:13] oh Wab with an a [11:54:29] do we know how much data MobileWebclickTracking_5929948, with correct spelling, should have? [11:54:30] I do not think that it is correct. Validation should fail for them. [11:54:54] But maybe ... maybe validation only happens through the revision Id. I'll have to check that. [11:55:12] If validation only happens through the revision id, that would be a bug. 
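(Editor's sketch of the check discussed above: an event should be rejected unless both its schema name and its revision id match a known schema, so that misspelled names such as `Mobil%WebClickTracking` never become tables. The function name, the allowed character set, and the `KNOWN_SCHEMAS` mapping are illustrative assumptions, not EventLogging's actual code.)

```python
import re

# Assumption: schema names may contain only letters, digits and underscores.
SCHEMA_NAME_RE = re.compile(r"[A-Za-z0-9_]+")

# Hypothetical registry mapping a revision id to the schema name it belongs to.
KNOWN_SCHEMAS = {5929948: "MobileWebClickTracking"}


def is_valid_schema_ref(name, revision, known=KNOWN_SCHEMAS):
    """Return True only if the name is well-formed AND matches the revision."""
    if not SCHEMA_NAME_RE.fullmatch(name):
        return False  # rejects names containing characters such as '%'
    # Checking the revision alone would let a misspelled name through,
    # so the name is compared against the registry entry as well.
    return known.get(revision) == name


print(is_valid_schema_ref("MobileWebClickTracking", 5929948))
print(is_valid_schema_ref("Mobil%WebClickTracking", 5929948))
print(is_valid_schema_ref("MobileWabClickTracking", 5929948))
```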
[11:55:44] * qchris adds it to his todo-list [11:55:48] :) [11:55:50] tnx [11:56:03] tnx for finding it :-) [11:56:28] But it seems that dropping the Mobil%Web... table did not help with the RT ticket. Bummer. [11:57:43] qchris: RT 9016? [11:58:00] i think % might have been a contributor [11:58:04] continuing investigation now [11:59:13] yes. RT 9016. [11:59:16] Thanks [12:14:05] Analytics-Tech-community-metrics: korma page returns 404 error 64 times - https://phabricator.wikimedia.org/T78268#841375 (Qgil) p:Triage>Low [13:52:18] springle: We're having EventLogging schemas that have - (dash) in their names. Is it ok to allow dashes? [13:53:27] qchris: prefer not if we have a choice in the matter. but tolerable [13:54:25] Ok. Then I'll allow dash for now, and discuss with schema owners to move away from the dash. [14:54:26] Analytics-Refinery: Raw webrequest partitions for 2014-12-09T20/2H not marked successful - https://phabricator.wikimedia.org/T78282#841670 (QChris) NEW [14:57:40] Analytics-Refinery: Raw webrequest partitions that were not marked successful due to configuration updates - https://phabricator.wikimedia.org/T74300#841694 (QChris) [14:57:41] Analytics-Refinery: Raw webrequest partitions for 2014-12-09T20/2H not marked successful - https://phabricator.wikimedia.org/T78282#841690 (QChris) Open>Resolved a:QChris These alerts have been caused by the deployments of 492b674c07aefe06206602ef46466dad787bc903, and dd8465508b7b880c2fe2e9f758af5b57... [15:31:37] !log Marked all raw webrequest partitions for 2014-12-09T20/2H ok (See {{PhabT|78282}}) [15:34:42] are we meeting? 
[15:36:50] (PS6) Nuria: Manage warehouse migrations with alembic [analytics/data-warehouse] - https://gerrit.wikimedia.org/r/177739 [15:52:40] Analytics-Engineering, Analytics-Wikimetrics: Re-run Wikimetrics data once Labs issues are fixed [8 pts] - https://phabricator.wikimedia.org/T78305#841895 (kevinator) NEW [15:56:58] Analytics-Engineering, Analytics-Wikimetrics: labsdb issues forced wikimetrics-scheduler to be stopped - https://phabricator.wikimedia.org/T74281#841916 (Milimetric) Open>Resolved a:Milimetric reports were disabled in the database by adding an old_recurrent field and migrating the value from the recu... [16:12:24] Analytics-Engineering, Analytics-Wikimetrics, Analytics-Dashiki: Vital Signs user sees annotations on graphs [13 pts] - https://phabricator.wikimedia.org/T78151#841933 (kevinator) [16:13:45] Analytics-Engineering, Analytics-Wikimetrics, Analytics-Dashiki: Vital Signs user sees annotations on graphs [13 pts] - https://phabricator.wikimedia.org/T78151#836977 (kevinator) [16:14:05] Analytics-Engineering, Analytics-Wikimetrics, Analytics-Dashiki: Vital Signs user sees annotations on graphs [13 pts] - https://phabricator.wikimedia.org/T78151#836977 (kevinator) [16:14:07] Analytics-Engineering, Analytics-Wikimetrics, Analytics-Dashiki: Vital Signs user sees annotations on graphs [13 pts] - https://phabricator.wikimedia.org/T78151#841939 (Milimetric) [16:16:56] Analytics-Refinery: Eng uses Mahout installed on Hadoop cluster - https://phabricator.wikimedia.org/T78016#841946 (Ottomata) Ok, I have 0 experience with mahout, but from what I can tell, it is just an executable that needs to be installed on client nodes, i.e. stat1002. DONE! Let me know if there is more!... [16:26:31] Analytics-Refinery: Eng uses Mahout installed on Hadoop cluster - https://phabricator.wikimedia.org/T78016#841969 (Ottomata) Open>Resolved Pretty sure this is resolved! Feel free to reopen if there is more ( that was too easy!) 
[16:41:46] Analytics-Engineering, Analytics-Wikimetrics: Wikimetrics User reads disclaimer on website - https://phabricator.wikimedia.org/T76107#842000 (kevinator) [16:42:11] Analytics-Engineering, Analytics-Wikimetrics: Wikimetrics User reads disclaimer on website [3 pts] - https://phabricator.wikimedia.org/T76107#842004 (kevinator) [16:43:44] (CR) Ottomata: "Still WIP, but mostly works. Schema to be discussed in more detail. Better tests needed." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/171056 (owner: Ottomata) [16:45:57] Analytics-Engineering, Analytics-EventLogging: WMF engineer follows steps to collect EL data - https://phabricator.wikimedia.org/T76679#842012 (kevinator) p:High>Low [16:51:12] Analytics-Engineering, Analytics-Wikimetrics: Eng has vetted data in Data Warehouse [13 pts] - https://phabricator.wikimedia.org/T78019#842024 (kevinator) [16:52:12] Analytics-Engineering, Analytics-Wikimetrics: Wikimetrics auditor has read-only login to Wikimetrics DB [3 pts] - https://phabricator.wikimedia.org/T76109#842029 (kevinator) [17:00:03] Analytics-Engineering: Investigate an automated way to load the data into warehouse - https://phabricator.wikimedia.org/T78099#842055 (kevinator) Track discussion with Sean [17:11:35] Phabricator, Analytics-Tech-community-metrics: SQL user/grant for phabricator statistics script - https://phabricator.wikimedia.org/T78311#842089 (Dzahn) NEW a:Springle [17:12:17] Analytics-Engineering: Puppet Production role class for wikimetrics scheduler/queue - https://phabricator.wikimedia.org/T76791#842101 (Nuria) [17:14:20] Phabricator, Analytics-Tech-community-metrics, Engineering-Community: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#842103 (Dzahn) [17:17:46] Multimedia, MediaWiki-extensions-MultimediaViewer, Analytics: Update the SQL queries for the new versions of the schemas - https://phabricator.wikimedia.org/T78312#842104 (Gilles) NEW a:Gilles [17:25:39] 
Phabricator, Analytics-Tech-community-metrics: SQL user/grant for phabricator statistics script - https://phabricator.wikimedia.org/T78311#842121 (Aklapper) p:Triage>Normal [17:29:54] Analytics-Engineering, Analytics-Dashiki: Vital Signs user reads description of metric - https://phabricator.wikimedia.org/T76741#842124 (kevinator) Even simpler: Add this icon at the end of the title so users know the title is clickable. https://wikitech.wikimedia.org/w/static-1.25wmf4/skins/Vector/images/ex... [17:31:47] Analytics: Fix Varnishkafka delivery error icinga warning - https://phabricator.wikimedia.org/T76342#842127 (fgiunchedi) I misunderstood "many metrics", what I meant with the previous suggestion applied only if the statsd traffic was going to be reduced by a local txstatsd, this doesn't seem to be the case ho... [17:32:38] Analytics-Engineering, Analytics-Wikimetrics: Uploading cohort by copy-pasting breaks if names contain special characters [8 pts] - https://phabricator.wikimedia.org/T76105#842129 (kevinator) [17:36:21] milimetric, Hi [17:36:54] milimetric, I am stuck with some urgent office work for a day or two [17:37:17] milimetric, is it OK if I deliver the logging integration feature during the weekend? 
[17:45:52] (PS1) Gilles: Update the SQL queries for the new versions of the schemas [analytics/multimedia] - https://gerrit.wikimedia.org/r/179159 [17:50:52] Analytics-Engineering, Analytics-Wikimetrics: Eng has vetted data in Data Warehouse [13 pts] - https://phabricator.wikimedia.org/T78019#842185 (kevinator) p:Normal>High [17:52:46] Analytics-Engineering: [Volunteer] Improve Generate.py [13 pts for the Analytics Eng team] - https://phabricator.wikimedia.org/T76407#842194 (kevinator) p:High>Normal [17:53:02] Analytics-Engineering: Write new Config Script [13 pts] - https://phabricator.wikimedia.org/T76408#842199 (kevinator) p:High>Normal [17:53:21] rtnpro: milimetric is in a meeting, I’ll have him get back to you [17:53:38] tnegrin, thanks :) [17:53:44] rtnpro: sorry - yes of course it's ok [17:53:51] as always thanks for the work [17:53:56] milimetric, thanks :) [17:54:02] Analytics-Engineering, Analytics-Wikimetrics: Uploading cohort by copy-pasting breaks if names contain special characters [8 pts] - https://phabricator.wikimedia.org/T76105#842213 (kevinator) p:Normal>High [17:55:15] Analytics-Engineering, Analytics-Wikimetrics: Fix oauth and do a quick pre-security review [13 pts] - https://phabricator.wikimedia.org/T76779#842224 (kevinator) p:Normal>High [17:55:36] Analytics-Engineering, Analytics-EventLogging: Automate pruning of sampled logs after 90 days [0 pts] - https://phabricator.wikimedia.org/T74743#842227 (kevinator) p:Normal>Low [17:55:48] Analytics-Engineering, Analytics-EventLogging: Automate purge of rows older than 90 days for select tables/schemas [0 pts] - https://phabricator.wikimedia.org/T74744#842231 (kevinator) p:Normal>Low [17:55:56] Analytics-Engineering, Analytics-Wikimetrics: Story: WikimetricsUser deletes user from cohort [21 pts] - https://phabricator.wikimedia.org/T75350#842234 (kevinator) p:Normal>High [18:01:17] Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#842251 
(Ottomata) From: https://wikitech.wikimedia.org/wiki/Distribution_upgrades "python-diamond has transitioned to diamond, after the trusty-wikimedia repo has been re-enabled upgrade with: ```ap... [18:23:25] Analytics-Engineering, Analytics-Wikimetrics: Story: WikimetricsUser deletes user from cohort [21 pts] - https://phabricator.wikimedia.org/T75350#842277 (kevinator) [18:25:23] Analytics-Engineering, Analytics-Wikimetrics: Wikimetrics auditor has read-only login to Wikimetrics DB [3 pts] - https://phabricator.wikimedia.org/T76109#842287 (kevinator) [18:26:30] Analytics-Engineering, Analytics-Wikimetrics: Fix oauth and do a quick pre-security review [13 pts] - https://phabricator.wikimedia.org/T76779#842294 (kevinator) [18:26:58] Analytics-Engineering, Analytics-Wikimetrics: Wikimetrics User reads disclaimer on website [3 pts] - https://phabricator.wikimedia.org/T76107#842295 (kevinator) [18:27:36] Analytics-Engineering, Analytics-Wikimetrics: Fix oauth and do a quick pre-security review [13 pts] - https://phabricator.wikimedia.org/T76779#819674 (kevinator) [18:27:47] Analytics-Engineering, Analytics-Wikimetrics: Wikimetrics auditor has read-only login to Wikimetrics DB [3 pts] - https://phabricator.wikimedia.org/T76109#842300 (kevinator) [18:27:52] Analytics-Engineering, Analytics-Wikimetrics: Wikimetrics User reads disclaimer on website [3 pts] - https://phabricator.wikimedia.org/T76107#789675 (kevinator) [18:27:57] Analytics-Engineering, Analytics-Wikimetrics: Story: WikimetricsUser deletes user from cohort [21 pts] - https://phabricator.wikimedia.org/T75350#842302 (kevinator) [18:30:35] milimetric: trying to re-join [18:32:36] Analytics-Engineering, Analytics-Wikimetrics, Analytics-Dashiki: Vital Signs user sees annotations on graphs [13 pts] - https://phabricator.wikimedia.org/T78151#842305 (kevinator) [18:32:56] Analytics-Engineering, Analytics-Wikimetrics: Uploading cohort by copy-pasting breaks if names contain special characters [8 pts] - 
https://phabricator.wikimedia.org/T76105#842306 (kevinator) [18:34:03] Analytics-Engineering, Analytics-Wikimetrics: Uploading cohort by copy-pasting breaks if names contain special characters [8 pts] - https://phabricator.wikimedia.org/T76105#789658 (kevinator) [18:34:36] Analytics-Engineering, Analytics-Wikimetrics, Analytics-Dashiki: Vital Signs user sees annotations on graphs [13 pts] - https://phabricator.wikimedia.org/T78151#836977 (kevinator) [18:41:08] Analytics-EventLogging: Engineer reads an email announcement about the documentation for creating a dashboard from EL data [1 pts] - https://phabricator.wikimedia.org/T76367#842319 (kevinator) [18:42:42] Analytics-EventLogging: Engineer reads an email announcement about the documentation for creating a dashboard from EL data [1 pts] - https://phabricator.wikimedia.org/T76367#798813 (kevinator) [18:54:32] Analytics-Engineering: Community has a developer doc "Getting Started with Wikimetrics" - https://phabricator.wikimedia.org/T77075#842352 (kevinator) There's a placeholder for documentation: https://www.mediawiki.org/wiki/Analytics/Volunteering The doc in the github readme is outdated. [19:09:23] milimetric: can you take a look at this one? 
https://gerrit.wikimedia.org/r/#/c/178887/ [19:09:41] looking [19:09:43] I think it will be worth it to deploy dashiki with this fix when we do update production [19:10:03] (CR) Milimetric: [C: 2 V: 2] Bumping up version of mediawiki-storage on dashiki [analytics/dashiki] - https://gerrit.wikimedia.org/r/178887 (owner: Nuria) [19:17:36] nuria__: I"m not sure about the css change though [19:17:45] I really really dislike manually changing the build system while developing [19:17:59] so the file name thing works [19:18:09] milimetric: we can do the other approach of 00_file 01_file [19:18:53] yeah, i meant to say - while the filename thing is a little awkward, i'd rather do that than change the build system [19:19:10] sure, i have used similar approach before [19:19:24] cool. One other side note there [19:19:33] aham [19:19:34] the symlinks don't work in windows (obviously) [19:19:59] so if someone's developing in windows, keep that in mind, they won't get the style in their local version [19:20:38] yes, i guess they have to use cygwin or similar [19:27:16] (CR) Milimetric: Concatenation of css files by build is deterministic (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/178758 (owner: Nuria) [19:44:45] hey folks, I’m trying to figure out what the heck is going on with some EL events that are missing [19:45:03] is it possible for some events to be available in SQL but not in the raw JSON logs? [19:46:50] milimetric, nuria__: any idea under which conditions this would happen? 
[19:47:03] ^ [19:58:47] Analytics-EventLogging: Calls to EventLogging::logEvent should either return false or an error message when schema validation fails - https://phabricator.wikimedia.org/T78324#842547 (kaldari) NEW [19:59:01] DarTar: hey - reading [19:59:22] milimetric: howdy [19:59:35] Analytics: Upgrade Analytics Cluster to Trusty, and then to CDH 5.2 - https://phabricator.wikimedia.org/T1200#842557 (Ottomata) Done today: 1033 1011 1019 1028 Only the Kafka Brokers (will be tracked in a different ticket) and the new Hadoop Namenodes are left. [19:59:44] DarTar: events will be on SQL but not in the logs if they are of a certain age [19:59:50] DarTar: so the one issue i know happened was invalid table names caused some event logging events to error out and not be inserted [20:00:22] DarTar: as we are purging logs more frequently than the db, so that case will happen often [20:00:33] hmm, I am looking at recent events (just generated) that are being logged in SQL but not visible in the all-events log [20:01:01] for example? [20:01:06] I might be doing something wrong, here’s an example [20:01:57] we should have several events with my event_userToken (a field in the WikiGrok logs): zvjZiSY9TGYy8vhPD9lWGYfx8J56wV4O [20:02:07] they show up in SQL [20:02:35] but can’t find them when I zgrep all-events.log or client-side-events.log on stat1003 [20:03:23] is there a lag in the raw logs on stat1003? [20:03:37] can you send me like say 5 or 10 db records? [20:03:43] Analytics-EventLogging: Cannot pass null to EventLogging::logEvent for optional boolean fields - https://phabricator.wikimedia.org/T78325#842558 (kaldari) NEW [20:04:00] DarTar: so i have more than 1 example to go by? [20:04:00] nuria__: sure, uuids ok? 
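(Editor's sketch: the zgrep check described above, looking for a user token in a compressed raw log, could be scripted roughly as follows. The helper name is hypothetical, and the log is assumed to be a gzip file with one event per line; any path passed in is the caller's choice.)

```python
import gzip


def find_token_lines(gz_path, token):
    """Yield every line of a gzip-compressed log that contains the token."""
    # "rt" opens the archive in text mode; undecodable bytes are replaced
    # rather than aborting the scan.
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as log:
        for line in log:
            if token in line:
                yield line.rstrip("\n")
```

An empty result only shows that the token is absent from that particular file; it says nothing about whether the events were ever logged elsewhere.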
[20:04:09] DarTar: sure [20:07:20] nuria__: mail for you [20:07:43] DarTar: ok, checking, give me couple mins [20:07:49] sure :) [20:08:55] interesting - i'll read along but i gotta step out for a bit [20:11:10] milimetric: np [20:26:25] DarTar: For xample, something that can happen is that events that have a timestamp of '20141130230235' on db [20:26:39] are actually present on the dec-1st all-raw logs [20:26:46] nuria__: right [20:27:00] so for events generated now (or 30 minutes ago), [20:27:08] rather than nov 30th as that timestamp will indicate [20:27:10] which log should I use? [20:27:19] log file, that is [20:28:22] nuria__: I am zgrepping all-events.log-20141211.gz [20:28:52] since now it's Thu Dec 11 20:27:31 the all-raw logs should not be rotating so i would expect events to be in 20141211 log [20:29:07] right [20:29:25] but that log real time [20:29:28] is not on 1003 [20:29:32] it's on vanadium [20:29:47] sure, and we can’t access it, but the lag should not be long right? [20:29:52] so are those missing uuids on vanadium [20:29:52] and gets rsync to 1003 periodically [20:30:07] milimetric: no, dartar does not have access to vanadium [20:30:09] how often does the rsync happen? [20:30:16] milimetric: DarTar is looking in 1003 [20:30:33] i know - my point is - to track these missing log lines to find out what happened to them [20:30:42] we can compare the logs from stat1003 with the ones from vanadium [20:31:35] the earliest event with usertoken zvjZiSY9TGYy8vhPD9lWGYfx8J56wV4O has timestamp 20141211191008 [20:31:53] that’s more than an hour ago [20:35:54] milimetric: right, that is what i am looking ta [20:35:56] *at [20:36:11] ottomata: which account do I use to log into Hue? 
My wikitech account doesn’t seem to work [20:36:45] shell username [20:36:52] and wikitech password [20:37:00] i should probably change that...to use wikitech username :/ [20:37:21] DarTar: If i know how to read puppet the rsync from vanadium to 1003 is happening once a day [20:37:44] nuria__: ha, interesting [20:38:14] for some reason I thought that data on 1003 would be virtually instantaneously available, that’s good to know [20:38:25] thanks ottomata, it worked [20:39:17] DarTar: ya, i can see some of the events you sent me in vanadium (did not checked all) [20:39:31] nuria__: cool, that’s reassuring [20:39:57] we still have this problem that for data QA (right after or before a deployment) a 24h lag is inconvenient [20:40:25] in other words: SQL events will generally be available with a short lag (assuming no replag), but they only capture valid events [20:40:26] (PS5) Ottomata: Add MapReduce job to convert Mediawiki XML Export Dumps into Revision Avro records [analytics/refinery/source] - https://gerrit.wikimedia.org/r/171056 [20:40:49] DarTar: If QA needs to test validity of events should do so in beta labs [20:40:53] not prod [20:40:53] while invalid events can only be identified in the raw/json logs, but these are available with a 24h log [20:41:20] DarTar: by the time we get to prod that can be tested in 1) vagrant and 2) beta labs [20:41:25] well, the reality is that we know that things often break when they hit prod [20:41:54] and, we have validation issues that we can only identify when a large number of users use a feature [20:42:03] DarTar: mmm... for validity of events... mmmm... 
that is unlikely [20:42:06] (due to specific clients/browsers) [20:42:16] DarTar: that can be tested on vagrant with a sampling of 1:1 [20:42:24] and user agent spoofing [20:42:37] DarTar: Same for beta labs which QA uses greatly [20:42:47] what I mean is that we cannot redirect live traffic from prod to beta labs [20:43:42] DarTar:developers can easily test validation in vagrant [20:44:07] DarTar: regardless of the event, as you can fake it, [20:44:27] DarTar: I just did that myself with Matt for sendbeacon yesterday [20:44:28] let me put it like this: there are types of validation that can be tested by developers by faking events [20:44:38] and that’s what we do all the time [20:44:47] but there are other issues that only occur in the wild [20:45:13] and typically are not captured by manual tests made by the devs at a small scale and with a limited set of browsers/clients [20:45:21] DarTar: ok, can you give me an example so i can better understand? [20:45:27] I wish we had automated unit testing for data across multiple platforms [20:45:42] sure: [20:46:17] DarTar: write up a card — we might be able to get it in in the Q3 project [20:46:21] we had one version of a feature used in an A/B test which worked well after all the developer tests [20:46:47] but as soon as we released it in production we started seeing missing responses [20:47:08] DarTar: but that indicates perhaps missing test cases on suite? [20:47:11] it was not clear if this was an instrumentation or a UI issue [20:47:36] well again, we don’t do automated unit testing for *data* :) [20:48:10] What was the common denominator in those missing responses? 
[20:49:13] we sorted out after the fact that it was an issue with how the event was generated (kaldari just filed a bug), but it would have been practical to realize earlier on that these events were failing validation [20:50:35] now, maybe a simple fix for this problem is to have invalid events written into SQL (a separate DB), we floated this idea a while ago and I think it would be immensely useful to catch these problems early [20:51:52] DarTar: The best plan of action, it seems to me, is to spot how to best test that on the beta labs development environment, in my opinion any effort that moves more testing to production is a misguided effort. By the time the code gets there we should not need to "test" it. [20:52:23] DarTar: Perhaps this case in point could benefit from a discussion with mobile on how they could have caught this earlier. [20:52:53] nuria__: I am not in the best position to answer that, but this is not a mobile-specific issue, I’ve been dealing with this problem since EL was first launched [20:52:59] agreed that it needs discussion [20:53:30] DarTar: but when it was 1st launched beta labs did not work, now it does [20:53:46] DarTar: we successfully troubleshot a couple of issues there recently [20:53:51] nuria__: but I also think there is some opportunity for you guys to facilitate the work of your customers [20:54:19] like I said, making invalid events available for data QA was one of the design principles that Ori and I had agreed upon [20:54:25] DarTar: that is why we went through all the trouble of getting beta labs in shape [20:54:39] DarTar: it took work from me, ori and Yuvi [20:54:53] DarTar: plus some of the labs folks [20:55:12] got it, looks like good material for a retrospective :) [20:55:29] DarTar: agree, all that data is available in the testing environment (for anyone) on beta labs which QA uses before every release [20:55:43] Now, it is true that some teams use it more than others [20:56:09] DarTar: given how useful 
it is, the idea would be that teams use it more and we vouch to keep it healthy
[20:56:13] still, my point that we don’t do automated *data* unit testing remains and is orthogonal to the beta labs vs prod issue
[20:56:31] so we need to address that
[20:57:36] DarTar: I agree with that as long as there has been sufficient testing for instrumentation for features in the testing environment, many data problems can be solved at that stage (but, of course, not all)
[20:57:42] nuria__: I have to run, but this is super-helpful information, thanks for checking, I’ll send a note to the analytics list
[20:57:50] DarTar: sure, thank you
[21:04:10] hey DarTar, do you know what is up with community-analytics.wikimedia.org?
[21:04:21] is that something we still use (it is not online), i don't think it's been updated since 2012
[21:04:44] ottomata: hmm I don’t know (in a meeting, will check it when I’m done)
[21:04:55] k
[21:05:14] just wondering if I can remove it
[21:13:05] (PS3) Nuria: Concatenation of css files by build is deterministic [analytics/dashiki] - https://gerrit.wikimedia.org/r/178758
[21:19:23] milimetric, mforns : see what the css files will look like: https://gerrit.wikimedia.org/r/178758
[21:19:37] not super bad, makes it pretty obvious what is going on
[21:19:38] i was just looking :)
[21:20:02] milimetric: you are a speedy ninja
[21:20:19] (CR) Milimetric: [C: 2 V: 2] Concatenation of css files by build is deterministic [analytics/dashiki] - https://gerrit.wikimedia.org/r/178758 (owner: Nuria)
[21:21:29] (PS2) Nuria: Bumping up version of mediawiki-storage on dashiki [analytics/dashiki] - https://gerrit.wikimedia.org/r/178887
[21:21:41] nuria__: I assumed the gulp glob always comes in alphabetical order
[21:21:52] but why is concatenating style.css still required?
[21:21:55] and isn't that wrong?
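The question about glob ordering can be illustrated with a small sketch, written in Python rather than gulp: sorting the matched file names explicitly before concatenating makes the output independent of whatever order the file system returns them in. The `concat_css` helper and the file names are hypothetical and are not part of the dashiki build.

```python
import glob
import os
import tempfile


def concat_css(directory, pattern="*.css"):
    """Concatenate matching files in sorted (deterministic) order."""
    # glob.glob makes no ordering guarantee, so sort explicitly.
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    parts = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            # Prefix each chunk with its file name so the output is easy to read.
            parts.append("/* " + os.path.basename(path) + " */\n" + handle.read())
    return "\n".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Files are written in reverse order on purpose.
        for name, body in [("b.css", "b {}"), ("a.css", "a {}")]:
            with open(os.path.join(tmp, name), "w", encoding="utf-8") as handle:
                handle.write(body)
        print(concat_css(tmp))
```

Labelling each chunk with its source file name is one way to make it obvious what is going on in the concatenated output; whether the real build does this is not stated in the discussion.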
[21:21:57] milimetric: ya, it does 20.000 convoluted node things
[21:22:04] and ta-tachannn runs "ls"
[21:22:12] (it's a separate issue, so I merged regardless)
[21:23:22] milimetric: that is the way the build has always worked, look at the network tab on: http://ncase.me/polygons/
[21:23:26] SORRY!
[21:23:33] https://metrics.wmflabs.org/static/public/dash/#projects=ruwiki,itwiki,dewiki,frwiki,enwiki,eswiki,jawiki/metrics=RollingActiveEditor
[21:23:41] there is only one css file
[21:23:50] :) I love that first link better, I'll just read that again, so good
[21:24:05] oh sorry I misread what that code is doing
[21:24:07] k, makes sense
[21:24:08] milimetric: it can of course be as many as we want but that is the current default
[21:24:16] I loved the polygons too
[21:33:16] milimetric: ahem... shouldn't this one have merged too? https://gerrit.wikimedia.org/r/#/c/178887/
[21:35:49] (CR) Milimetric: [V: 2] Bumping up version of mediawiki-storage on dashiki [analytics/dashiki] - https://gerrit.wikimedia.org/r/178887 (owner: Nuria)
[21:36:10] oh, it was ready to submit except the rebase took off my earlier review
[21:36:11] sorry
[21:44:19] Analytics-Wikimetrics, Analytics-Engineering: "Validate Again" functionality is broken - https://phabricator.wikimedia.org/T78339#842830 (mforns) NEW
[21:48:37] ottomata: re: community-analytics.wikimedia.org I have no idea what that’s for, if it’s not used by other teams (Grantmaking?) I think we can safely take it down
[21:48:52] DarTar, I think you had me create it a long time ago
[21:49:00] oops
[21:49:11] ottomata: usermetrics?
[21:49:17] naw, maybe? it was before that
[21:49:30] but maybe that was your intention? there looks to be some django code in the doc root
[21:49:32] I wasn’t even born before usermetrics
[21:49:53] hmm, where is the doc root?
[21:50:03] ah I found some emails!
[21:50:06] it is on stat1001
[21:50:07] I can quickly check it out
[21:50:08] ah
[21:50:12] i think it wasn't you, maybe just diederik
[21:50:15] def a ryan faulkner thing
[21:50:18] good memories, stat1001 :)
[21:50:23] https://rt.wikimedia.org/Ticket/Display.html?id=3001
[21:50:41] yup, user metrics
[21:50:50] version -1.0 i guess :p
[21:50:58] alright, then. We can take it down
[21:51:25] awesome.
[21:51:27] thank you!
[21:51:30] np
[21:53:41] (CR) Gergő Tisza: [C: 2] Update the SQL queries for the new versions of the schemas [analytics/multimedia] - https://gerrit.wikimedia.org/r/179159 (owner: Gilles)
[21:53:51] (Merged) jenkins-bot: Update the SQL queries for the new versions of the schemas [analytics/multimedia] - https://gerrit.wikimedia.org/r/179159 (owner: Gilles)
[22:04:47] (PS2) Mforns: Add cohort membership page [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/178746
[22:07:26] (CR) Mforns: Add cohort membership page (3 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/178746 (owner: Mforns)
[22:10:49] nuria__, I pushed your suggested changes, please let me know what you think when you have some time :]
[22:11:48] mforns: will do, i will be offline for some time in a bit, will get back to those later.
[22:11:54] Hello. May I ask where page views are stored, exactly? I see even Wikipedia has such data only for the last 90 days.
[22:12:02] nuria__, thanks!
[22:14:34] hi gry! I think qchris can help you, but he is offline, not feeling well today...
[22:15:13] ok; hope he gets better soon :)
[22:15:24] gry
[22:15:25] http://dumps.wikimedia.org/other/pagecounts-raw/
[22:15:30] and there is also
[22:15:34] http://dumps.wikimedia.org/other/pagecounts-all-sites/
[22:15:48] oh, there you are :]
[22:16:06] I already saw these -- I would like to know where they come from, i.e.
how they're obtained :)
[22:16:59] gry
[22:16:59] https://wikitech.wikimedia.org/wiki/Analytics/Webstatscollector
[22:17:10] (it mentions some squid server logs, but they're not unique visits, and I feel that's a little unfair)
[22:17:30] also
[22:17:33] https://wikitech.wikimedia.org/wiki/Analytics/Pagecounts-all-sites
[22:18:59] It looks like all these things count hourly without caring about multiple visits from the same IP or user. Is my understanding correct?
[22:21:08] gry: They are pageviews, not unique visits
[22:22:01] so every user visit will be counted even if it is for the same page
[22:22:39] gry: hopefully that makes sense
[22:24:57] gry: (the counting has some known caveats but its goal is to count pageviews)
[22:28:09] Aye. I would like to count visits and record user agent, please, so that we can get rid of bot visits properly.
[22:28:25] Is there any work in that direction? :-)
[22:38:45] gry, yes, some, but none that I can comment on. That is a tough one, since identifying or tracking users has privacy concerns.
[22:43:32] I believe that a program should be able to track and then anonymize such data, with only the end result available. Assuming the WMF may have internal access to such private data. Would be glad to hear about effort in that direction. :)
[22:52:15] yeah, I would like to see that as well! we will see. The 'anonymize' thing is tough. Aggregations I think are ok. There are folks thinking (and debating) about this, for sure.
[23:10:08] halfak / ottomata: just double checking - you don't think I'm blocking any changes you guys are trying to make right?
[23:10:12] I was just making a general point
[23:10:20] Not at all. :)
[23:10:23] YAY
[23:10:25] QCHRIS IS GLORIOUS
[23:10:27] k, cool
[23:10:27] We're just charging ahead either way.
[23:10:49] great, I'm very excited.
And halfak: though I was out of touch I've been fiendishly reading through your code and thinking things through
[23:11:17] :) I'm just about to push some changes that will make our decisions about naming easy to adjust for. :)
[23:11:21] I just think it'll take me longer than expected to come up with a diagram of the "fastest to market" system and "a maintainable nice" system
[23:11:51] milimetric, no worries.
[23:12:05] Right now, I'm focused on doing exploratory analysis.
[23:12:38] Analytics-Wikimetrics, Analytics-Engineering: Fix oauth and do a quick pre-security review [13 pts] - https://phabricator.wikimedia.org/T76779#843060 (Milimetric) a:Milimetric
[23:13:07] The streaming scripts I am working on represent the simplest possible version of the operations we'll need to implement in WikiCredit.
[23:13:18] Have you been looking at those or the diffengine hairball?
[23:14:08] i'm looking at both
[23:14:22] and was trying to think of everything from how the diffs are computed to how the pieces fit together
[23:14:54] halfak: let's talk real time for a second if you have one
[23:15:13] how often do you envision the score of each user would be updated
[23:15:26] score - ew, bad word
[23:15:45] milimetric, voip or IRC?
[23:15:52] up to you
[23:16:03] * halfak dials milimetric
[23:16:37] Or wait. Batcave!
[23:16:40] https://plus.google.com/hangouts/_/wikimedia.org/a-batcave
[23:20:35] ooo i am interested
[23:20:37] but i am busy!
[23:20:38] HMMM
[23:29:52] ori, what are the keywords I can use in Filter rules to figure out if EL is blocked via Adblock Plus?
[23:31:53] when I search for "event" I see ".com/log?event" in the list for example
[23:44:14] leila: you can look for bits.wikimedia.org
[23:44:44] I just sent an email: it's blocking /event.gif?
[23:44:53] ori, ^
[23:48:57] leila: you're noticing that it blocks URLs of that pattern, or you've identified the pattern in one of the rule lists?
[23:50:05] oh yeah, i just saw your e-mail
[23:50:11] yeah, it looks like easyprivacy filters event.gif
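To make the blocking concrete, here is a rough sketch of how a substring-style filter rule such as `/event.gif?` would match a beacon URL. It is a deliberately simplified matcher (plain substrings plus `*` wildcards only) and not Adblock Plus's real implementation; the example URLs and the `rule_matches` helper are assumptions made for illustration.

```python
import re


def rule_matches(rule, url):
    """Simplified filter matching: plain text matches anywhere, '*' is a wildcard."""
    # Escape each literal piece, then join the pieces with '.*' for the wildcards.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.search(pattern, url) is not None


# Hypothetical URLs, for illustration only.
beacon = "https://bits.wikimedia.org/event.gif?%7B%22event%22%3A%7B%7D%7D"
page = "https://en.wikipedia.org/wiki/Main_Page"

print(rule_matches("/event.gif?", beacon))
print(rule_matches("/event.gif?", page))
print(rule_matches("bits.wikimedia.org/*.gif?", beacon))
```

Real filter lists support more syntax than this (domain anchors, separators, options), so a check like this is only a first approximation of whether a given rule would catch the EventLogging request.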