[00:52:03] 10Analytics, 10Community-Tech, 10SVG Translate Tool, 10Community-Tech-Sprint: Integrate Piwik with SVG Translate to keep track of metrics - https://phabricator.wikimedia.org/T215478 (10Samwilson) Merged, and this is now live on the staging site — which is not good. Oops. I'd forgotten that we have the 'pro... [03:27:15] (03PS1) 10GoranSMilovanovic: re-engineer data sets [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/494156 [03:27:32] (03CR) 10GoranSMilovanovic: [V: 03+2 C: 03+2] re-engineer data sets [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/494156 (owner: 10GoranSMilovanovic) [03:29:39] 10Analytics, 10Analytics-Kanban, 10WMDE-Analytics-Engineering: Pyspark2 fails to read.csv when run with spark2-submit - https://phabricator.wikimedia.org/T217156 (10GoranSMilovanovic) @Ottomata Thank you for your support. I have actually used the approach and parts of code suggested in T217156#4991118, so yo... [06:02:05] 10Analytics, 10ExternalGuidance, 10Product-Analytics, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), 10Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (10chelsyx) Here is the link to the temporary dashboard/report: https://people.wikim... [06:26:39] 10Analytics, 10ExternalGuidance, 10Product-Analytics, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), 10Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (10chelsyx) Hello @santhosh , I think it would be helpful if we can add a field (`in... [06:50:57] morning people [06:51:02] analytics-store is gone :) [06:52:28] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) MySQL has been stopped on dbstore1002 and won't be started again, as this host will be decommissioned [06:52:38] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (10Marostegui) MySQL has been stopped on dbstore1002 and won't be started again, as this host will be decommissioned [06:52:40] Devoid of context that sounds like pretty bad news [06:52:42] 10Analytics, 10Product-Analytics, 10Research, 10WMDE-Analytics-Engineering, and 2 others: Replace the current multisource analytics-store setup - https://phabricator.wikimedia.org/T172410 (10Marostegui) MySQL has been stopped on dbstore1002 and won't be started again, as this host will be decommissioned [06:52:46] 10Analytics, 10Analytics-Kanban, 10User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (10Marostegui) MySQL has been stopped on dbstore1002 and won't be started again, as this host will be decommissioned [06:53:23] hare: nono we have replaced it with three brand new hosts and migrated people some time ago :) [06:53:35] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (10Marostegui) [06:53:46] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (10Marostegui) 05Stalled→03Open [07:52:25] Hi a-team. The data in Grafana–can it be accessed from mwmaint1002 or stat1007? Or only through the web at http://grafana.wikimedia.org/ ? [07:52:51] By "accessed from mwmaint1002 or stat1007" I mean "run a query in the shell". [07:53:28] aharoni: hi! 
so grafana pulls the data from either graphite (graphite.wikimedia.org) or prometheus [07:53:46] depending on how the metric is stored [07:54:12] I haven't done anything related in the past so I am fairly ignorant, but there might be some documented api to use [07:54:22] what is the use case? (To see if I can help) [07:59:19] elukey: hmm, I'm not sure. Something like this: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/476480/5/api/ApiContentTranslationPublish.php . Does this code make sense? Does it log to Grafana? [07:59:38] (If this code is too obscure and too far removed, I'll try to find a clearer example.) [08:06:02] Morning elukey [08:06:07] RIP analytics-store [08:13:38] aharoni: so that thing logs to graphite [08:13:50] do you want to see the metric? [08:14:40] joal: bonjour! [08:21:09] aharoni: so the full metric name should be (in graphite) MediaWiki.cx.publish.highmt.etc.. [08:21:35] then in grafana you can create graphs based on those graphite metrics [08:23:49] elukey: Ideally, I want to write a script that combines this data, per day or per week, with other queries that I run on mwmaint1002 and show on a Dashiki board. [08:24:46] aharoni: ah ok! [08:31:35] aharoni: https://graphite.wikimedia.org/S/0 is an example of graph generated for all wikis with 'mean' [08:31:41] you also have 'count' etc.. [08:31:50] I don't know how to fetch the data though [08:32:11] elukey: does anyone know? :) [08:32:47] aharoni: I can ask but it would be better in my opinion if all the metrics to display were in graphite [08:32:54] to easily combine them in grafana [08:33:21] it might be difficult to get time series from graphite and combine them with other data on mwmaint/stat etc.. [08:34:29] is it me, or do we have too many stats tools? :) [08:34:43] ahahah :) [09:06:45] 10Analytics, 10Performance-Team: ServerTiming schema value for duration is 0 - https://phabricator.wikimedia.org/T217111 (10Gilles) Currently we're only using Server-Timing to pass Varnish caching information, which doesn't use the duration field. Server-Timing is a freeform thing we can use. The spec that i... [09:07:55] 10Analytics, 10Analytics-Kanban, 10WMDE-Analytics-Engineering: Pyspark2 fails to read.csv when run with spark2-submit - https://phabricator.wikimedia.org/T217156 (10JAllemandou) I meant to comment last week and didn't - I completely support the idea of using HDFS for reading data files and not the `--file` o... [09:44:38] (03PS3) 10Thiemo Kreuz (WMDE): Make connections persistent in WikimediaDb lib class [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/493214 (https://phabricator.wikimedia.org/T216613) [09:45:13] (03PS1) 10Thiemo Kreuz (WMDE): Fix typo in author name "Adddshore" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494186 [09:45:20] (03CR) 10jerkins-bot: [V: 04-1] Fix typo in author name "Adddshore" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494186 (owner: 10Thiemo Kreuz (WMDE)) [09:47:58] 10Analytics, 10Analytics-Dashiki, 10CX-analytics: Dashiki: CX2 translations fails to load - https://phabricator.wikimedia.org/T217506 (10Petar.petkovic) I see `Uncaught TypeError: Cannot read property 'id' of undefined` [09:51:40] (03CR) 10Thiemo Kreuz (WMDE): "Oh. This sounds more like a bug in PDO or MySQL itself, and doesn't have much to do with the proposed caching.
The code in question here s" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/493214 (https://phabricator.wikimedia.org/T216613) (owner: 10Thiemo Kreuz (WMDE)) [09:52:36] mmmmm [09:52:55] joal: the CX2 dashiki issue concerns me, did we fix report updater ? [09:53:07] the task was created yesterday though [09:53:22] so now I am wondering if the fact that the staging db was readonly impacted the dashboard [09:54:28] elukey: I unfortunately have no idea [09:54:52] elukey: Can we wait for milimetric to have an eye, or do you prefer me to investigate? [09:57:05] nono it was just a brainbounce [09:57:40] elukey: I have not followed RU queries and usage at all, so I have no idea :S [10:23:26] 10Analytics-Kanban, 10Product-Analytics: Superset Updates - https://phabricator.wikimedia.org/T211706 (10elukey) [10:23:28] 10Analytics, 10User-Elukey: Staging environment for upgrades of superset - https://phabricator.wikimedia.org/T212243 (10elukey) 05Open→03Stalled This task is blocked by the Debian Installer for buster not available for amd64 - https://d-i.debian.org/daily-images/daily-build-overview.html [10:27:15] (03PS1) 10Thiemo Kreuz (WMDE): Add MediaWiki's PHP CodeSniffer rule set [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494190 [10:31:01] (03PS1) 10Thiemo Kreuz (WMDE): Update all code to use the short array syntax [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494192 [10:38:59] (03PS1) 10Thiemo Kreuz (WMDE): Apply all trivial auto-fixes to the PHP code style [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494194 [10:40:33] R.I.P MySQL on dbstore1002 [10:42:44] :) [10:44:01] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+2] "Trivial, comments-only fix in a codebase with a very limited audience." [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494186 (owner: 10Thiemo Kreuz (WMDE)) [10:44:10] (03Merged) 10jenkins-bot: Fix typo in author name "Adddshore" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494186 (owner: 10Thiemo Kreuz (WMDE)) [10:45:51] (03CR) 10Thiemo Kreuz (WMDE): Update all code to use the short array syntax (032 comments) [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494192 (owner: 10Thiemo Kreuz (WMDE)) [10:48:00] (03PS1) 10Thiemo Kreuz (WMDE): Add missing limits to explode() and fix PHPDoc tags [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494196 [10:54:28] (03CR) 10Ladsgroup: Apply all trivial auto-fixes to the PHP code style (031 comment) [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494194 (owner: 10Thiemo Kreuz (WMDE)) [11:10:44] (03CR) 10Thiemo Kreuz (WMDE): Apply all trivial auto-fixes to the PHP code style (031 comment) [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494194 (owner: 10Thiemo Kreuz (WMDE)) [11:28:21] * elukey lunch! [11:54:02] (03PS1) 10Ladsgroup: Change conditions on active_users [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494203 (https://phabricator.wikimedia.org/T213894) [12:12:19] 10Analytics, 10ExternalGuidance, 10Product-Analytics, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), 10Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (10Pginer-WMF) >>! In T212414#4996916, @chelsyx wrote: > Here is the link to the tem...
[12:43:43] 10Analytics, 10Dumps-Generation: pageviews dumps contain invalid lines - https://phabricator.wikimedia.org/T217071 (10ArielGlenn) p:05Triage→03Normal [13:03:20] (03CR) 10WMDE-leszek: [C: 03+2] "oops" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494203 (https://phabricator.wikimedia.org/T213894) (owner: 10Ladsgroup) [13:03:30] (03Merged) 10jenkins-bot: Change conditions on active_users [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494203 (https://phabricator.wikimedia.org/T213894) (owner: 10Ladsgroup) [13:32:37] (03PS1) 10Ladsgroup: Change conditions on active_users [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494217 (https://phabricator.wikimedia.org/T213894) [13:32:44] (03CR) 10Ladsgroup: [C: 03+2] Change conditions on active_users [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494217 (https://phabricator.wikimedia.org/T213894) (owner: 10Ladsgroup) [13:32:52] (03Merged) 10jenkins-bot: Change conditions on active_users [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494217 (https://phabricator.wikimedia.org/T213894) (owner: 10Ladsgroup) [13:54:50] 10Analytics, 10Patch-For-Review, 10User-Elukey: Enable encryption and authentication for TLS-based Hadoop services - https://phabricator.wikimedia.org/T217412 (10elukey) First attempt to make a sane/configurable TLS certificate deployment is in https://gerrit.wikimedia.org/r/493693. The following is essentia... [14:20:40] 10Analytics, 10Patch-For-Review, 10User-Elukey: Enable encryption and authentication for TLS-based Hadoop services - https://phabricator.wikimedia.org/T217412 (10Ottomata) Sounds great! Only comment I would make is to leave out the _eqiad bit in the private repo cergen/certificates dir hierarchy. It should... [14:40:40] milimetric: yt? [14:41:13] hey fdans [14:41:44] milimetric: can we batcave for a second to brainstorm the dashboard with the metric groups? [14:42:06] k, one sec [14:44:18] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10elukey) Next step is to copy data from labsdb1011 and then start mysql and configure analytics-specific things. [14:52:04] 10Analytics, 10Analytics-Dashiki, 10CX-analytics: Dashiki: CX2 translations fails to load - https://phabricator.wikimedia.org/T217506 (10Nikerabbit) Please update your bookmarks (or any place with outdated links). New url is https://language-reportcard.wmflabs.org/cx2/#cx-2-translations (it changes every tim... [14:53:07] 10Analytics, 10Analytics-Dashiki, 10CX-analytics: Dashiki: CX2 translations fails to load - https://phabricator.wikimedia.org/T217506 (10Nikerabbit) For anyone maintaining dashiki: it should not fail on non-existing anchors this way. [15:08:10] 10Analytics, 10Patch-For-Review, 10User-Elukey: Enable encryption and authentication for TLS-based Hadoop services - https://phabricator.wikimedia.org/T217412 (10elukey) >>! In T217412#4997826, @Ottomata wrote: > Sounds great! Only comment I would make is to leave out the _eqiad bit in the private repo cerge... 
[15:25:53] (03CR) 10Bearloga: Update whitelisting for Android-related schemas (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/493424 (https://phabricator.wikimedia.org/T209087) (owner: 10Bearloga) [15:27:43] 10Analytics, 10Patch-For-Review, 10User-Elukey: Enable encryption and authentication for TLS-based Hadoop services - https://phabricator.wikimedia.org/T217412 (10elukey) [15:31:36] (03PS1) 10Fdans: Add concept of metric groups, rotate between them in dashboard [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/494241 (https://phabricator.wikimedia.org/T187806) [15:31:46] (03CR) 10jerkins-bot: [V: 04-1] Add concept of metric groups, rotate between them in dashboard [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/494241 (https://phabricator.wikimedia.org/T187806) (owner: 10Fdans) [15:36:25] (03PS1) 10Ladsgroup: Rewrite user_langauges.php to use babel table and Inclusion–exclusion principle [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494245 (https://phabricator.wikimedia.org/T213894) [15:39:06] 10Analytics: Bug when toggling Chrome mobile view - https://phabricator.wikimedia.org/T217559 (10Milimetric) [15:39:13] 10Analytics: Bug when toggling Chrome mobile view - https://phabricator.wikimedia.org/T217559 (10Milimetric) p:05Triage→03Normal [15:43:43] (03PS1) 10Ladsgroup: Remove WikimediaDb::getPdo() [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494248 (https://phabricator.wikimedia.org/T213894) [15:56:36] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+2] "Confirmed. The only usage was removed in the parent patch." [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/494248 (https://phabricator.wikimedia.org/T213894) (owner: 10Ladsgroup) [16:08:21] 10Analytics, 10Analytics-Kanban, 10Discovery, 10EventBus, and 2 others: EventBus mediawiki outage 2019-02-28 - https://phabricator.wikimedia.org/T217385 (10Ottomata) [16:09:12] 10Analytics, 10Analytics-Kanban, 10Discovery, 10EventBus, and 2 others: EventBus mediawiki outage 2019-02-28 - https://phabricator.wikimedia.org/T217385 (10Ottomata) a:03Ottomata After this finished, I verified that last event in the file was produced. [16:09:50] 10Analytics, 10Analytics-Kanban, 10Operations, 10Wikimedia-Stream, and 2 others: Eventstreams build is broken - https://phabricator.wikimedia.org/T216184 (10Ottomata) [16:10:02] 10Analytics, 10Analytics-Kanban, 10Discovery, 10EventBus, and 2 others: EventBus mediawiki outage 2019-02-28 - https://phabricator.wikimedia.org/T217385 (10Ottomata) [16:10:05] 10Analytics, 10Knowledge-Integrity, 10Research, 10Epic, 10Patch-For-Review: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (10bmansurov) Hi @RyanSteinberg. Thanks for the analysis. - Page title has been removed in {T206083}. - As for missing page id, I th... 
[16:17:12] !log disable all report updater jobs via puppet (ensure => absent) due to dbstore1002 decom [16:17:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:32:49] fdans: yoohoo [16:34:24] 10Analytics, 10Knowledge-Integrity, 10Research, 10Epic, 10Patch-For-Review: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (10Aklapper) [16:34:42] 10Analytics, 10Analytics-Kanban, 10Discovery, 10EventBus, and 2 others: EventBus mediawiki outage 2019-02-28 - https://phabricator.wikimedia.org/T217385 (10Milimetric) p:05Triage→03High [16:35:19] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Enable encryption and authentication for TLS-based Hadoop services - https://phabricator.wikimedia.org/T217412 (10Milimetric) [16:35:56] 10Analytics, 10Dumps-Generation: pageviews dumps contain invalid lines - https://phabricator.wikimedia.org/T217071 (10Milimetric) p:05Normal→03High [16:36:51] 10Analytics, 10Analytics-Kanban: Explain with annotations start of new registered users data - https://phabricator.wikimedia.org/T215887 (10Milimetric) p:05Triage→03Normal a:03Milimetric [16:38:07] 10Analytics, 10Analytics-Kanban: Issues with page deleted dates on data lake - https://phabricator.wikimedia.org/T190434 (10Milimetric) a:05Milimetric→03JAllemandou [16:39:25] 10Analytics: Update datasets to have explicit timestamp for druid indexation facilitation - https://phabricator.wikimedia.org/T205617 (10Milimetric) [16:39:51] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Pixel ratio messed up on Windows Chrome - https://phabricator.wikimedia.org/T194428 (10Milimetric) a:03Milimetric [16:43:55] 10Analytics: Quantify volume of traffic on piwik with DNT header set - https://phabricator.wikimedia.org/T199928 (10Milimetric) [16:46:54] 10Analytics, 10Operations, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10Milimetric) p:05Triage→03High a:03RobH [16:47:25] 10Analytics: Test sqooping from the new dedicated labsdb host - https://phabricator.wikimedia.org/T215550 (10Milimetric) [16:48:09] 10Analytics, 10Operations, 10RESTBase, 10Traffic, and 2 others: Verify that hit/miss stats in WebRequest are correct - https://phabricator.wikimedia.org/T215987 (10jbond) p:05Triage→03Normal [16:48:40] 10Analytics: Make edit data lake data available as a snapshot on dump hosts that can be sourced by Presto - https://phabricator.wikimedia.org/T214043 (10Milimetric) [16:50:24] 10Analytics, 10Analytics-Kanban: Make edit data lake data available as a snapshot on dump hosts that can be sourced by Presto - https://phabricator.wikimedia.org/T214043 (10Milimetric) [16:51:56] 10Analytics, 10Pageviews-API: Pageview API: Better filtering of bot traffic on top enpoints - https://phabricator.wikimedia.org/T123442 (10Milimetric) [16:52:04] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Eventlogging's processors stopped working - https://phabricator.wikimedia.org/T200630 (10Milimetric) 05Open→03Declined Closing because we haven't seen the issue resurface. Our temporary fix seems fine until we transition off of EventLogging, hopefully... 
[16:52:13] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Eventlogging's processors stopped working - https://phabricator.wikimedia.org/T200630 (10Milimetric) [16:52:21] 10Analytics, 10Analytics-Kanban: Set a timeout for regex parsing in the Eventlogging processors - https://phabricator.wikimedia.org/T200760 (10Milimetric) 05Open→03Declined Closing because we haven't seen the issue resurface. Our temporary fix seems fine until we transition off of EventLogging, hopefully... [16:56:23] 10Analytics-Kanban: Enable automatic ingestion from eventlogging into druid for some schemas - https://phabricator.wikimedia.org/T190855 (10Milimetric) 05Open→03Declined not ready to think about this until we have a schema registry and more concrete thoughts on this kind of metadata. [16:57:00] 10Analytics-Kanban: Raise Edit Data Quality to the point where we can offer snapshots on Cloud (labs) environment - https://phabricator.wikimedia.org/T204953 (10Milimetric) a:03JAllemandou [16:57:53] 10Analytics: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (10Milimetric) [16:58:04] 10Analytics, 10Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (10Ottomata) a:05JAllemandou→03None [16:58:07] 10Analytics, 10Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (10Milimetric) [16:58:10] 10Analytics-Kanban: Raise Edit Data Quality to the point where we can offer snapshots on Cloud (labs) environment - https://phabricator.wikimedia.org/T204953 (10Milimetric) a:05JAllemandou→03None [16:58:30] 10Analytics: Make edit data lake data available as a snapshot on dump hosts that can be sourced by Presto - https://phabricator.wikimedia.org/T214043 (10Milimetric) [16:59:12] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Backlog (Watching / External), 10Services (watching): Modern Event Platform: Stream Intake Service - https://phabricator.wikimedia.org/T201068 (10Ottomata) [16:59:33] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Backlog (Watching / External), 10Services (watching): Modern Event Platform: Schema Registry - https://phabricator.wikimedia.org/T201063 (10Ottomata) [16:59:38] 10Analytics, 10Analytics-Kanban: reportupdater TLC - https://phabricator.wikimedia.org/T193167 (10Milimetric) [17:00:17] 10Analytics: reportupdater TLC - https://phabricator.wikimedia.org/T193167 (10Milimetric) [17:00:44] 10Analytics, 10Analytics-Wikistats: Wikistats Beta - https://phabricator.wikimedia.org/T186120 (10Ottomata) [17:00:57] 10Analytics, 10Analytics-Wikistats: Wikistats Beta - https://phabricator.wikimedia.org/T186120 (10Ottomata) p:05Normal→03High [17:02:46] 10Analytics-EventLogging, 10Analytics-Kanban: Sunset MySQL data store for eventlogging - https://phabricator.wikimedia.org/T159170 (10Milimetric) [17:02:48] 10Analytics-EventLogging, 10Analytics-Kanban: Find an alternative query interface for eventlogging on analytics cluster that can replace MariaDB - https://phabricator.wikimedia.org/T189768 (10Milimetric) 05Open→03Resolved a:03Milimetric We decided on a combination of Presto and Druid to handle this kind... [17:03:39] 10Analytics, 10Analytics-EventLogging: Sunset MySQL data store for eventlogging - https://phabricator.wikimedia.org/T159170 (10Milimetric) [17:04:23] 10Analytics, 10Analytics-Kanban: Measure Community Backlog. - https://phabricator.wikimedia.org/T155497 (10Milimetric) [17:04:28] 10Analytics: Measure Community Backlog. 
- https://phabricator.wikimedia.org/T155497 (10Milimetric) [17:06:40] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats 2.0 Remaining reports. - https://phabricator.wikimedia.org/T186121 (10Milimetric) [17:08:23] 10Analytics-Kanban, 10Cloud-Services: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950 (10Milimetric) [17:08:33] 10Analytics-Kanban, 10Cloud-Services: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950 (10Milimetric) [17:08:36] 10Analytics-Kanban, 10Cloud-Services: Provide mediawiki history data to Cloud Services users - https://phabricator.wikimedia.org/T169572 (10Milimetric) [17:15:58] (03CR) 10Milimetric: "ping @nuria, let's discuss and assign review" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/471722 (https://phabricator.wikimedia.org/T164020) (owner: 10Joal) [17:16:55] (03CR) 10Milimetric: [C: 03+2] Update the unit-test for Dataframe conversion [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/476093 (https://phabricator.wikimedia.org/T210465) (owner: 10Joal) [17:18:05] (03PS2) 10Fdans: Add concept of metric groups, rotate between them in dashboard [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/494241 (https://phabricator.wikimedia.org/T187806) [17:19:29] (03CR) 10jerkins-bot: [V: 04-1] Add concept of metric groups, rotate between them in dashboard [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/494241 (https://phabricator.wikimedia.org/T187806) (owner: 10Fdans) [17:20:38] (03CR) 10Milimetric: "@mforns will review this" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/491494 (https://phabricator.wikimedia.org/T216603) (owner: 10Joal) [17:21:18] (03CR) 10Milimetric: "@milimetric will review this" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492304 (https://phabricator.wikimedia.org/T178587) (owner: 10Joal) [17:21:48] (03CR) 10Milimetric: "@ottomata, Joseph says this is ready to merge if you agree" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/493237 (owner: 10Joal) [17:23:07] (03CR) 10Milimetric: "@fdans will do this review, after @mforns pings him that the bigger core review is done" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492320 (owner: 10Joal) [17:30:05] Hallo. [17:30:42] I tried asking earlier today, but now there are more people here, so maybe someone will answer :) [17:31:28] Can the data in Grafana be accessed from mwmaint1002 or stat1007? Or only through the web at http://grafana.wikimedia.org/ ? [17:31:53] I want to write a script that combines this data, per day or per week, with other queries that I run on mwmaint1002 and show on a Dashiki board. [17:37:34] aharoni: hi! [17:37:40] so I have some answers [17:38:00] as said earlier on, you'd need to pull data from graphite, not grafana [17:38:27] if you pick the link that I gave you this morning, and append "&format=json" or "&format=raw" you'll get the data that you need [17:38:31] namely the datapoints [17:39:48] OK... so that's through the web. Is there a way, by any chance, to get it from the shell? 
[17:40:26] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Operations, and 3 others: RFC: Modern Event Platform - Choose Schema Tech - https://phabricator.wikimedia.org/T198256 (10CCicalese_WMF) [17:40:30] 10Analytics, 10CirrusSearch, 10Discovery, 10Discovery-Search, and 4 others: Expose a metric that reflect EventBus queue pressure - https://phabricator.wikimedia.org/T190416 (10CCicalese_WMF) [17:40:31] aharoni: yeah you can use something like curl or libcurl or similar libraries (Python etc..) to collect the data [17:40:40] from the same link, they will basically make a http call [17:41:09] Hmm... OK, maybe I can try. [17:41:11] the format is graphite oriented though, so time series etc.., might be difficult to combine [17:41:16] with the data that you need [17:41:59] 10Analytics, 10EventBus, 10Core Platform Team Kanban (Done with CPT), 10Services (later), and 2 others: EventBus should not use service container in application logic - https://phabricator.wikimedia.org/T204296 (10CCicalese_WMF) [17:51:45] 10Analytics, 10Analytics-Kanban, 10Discovery, 10EventBus, and 2 others: EventBus mediawiki outage 2019-02-28 - https://phabricator.wikimedia.org/T217385 (10jcrespo) May I ask to help completing documentation (when possible, doesn't have to be now) https://wikitech.wikimedia.org/wiki/Incident_documentation/... [17:55:14] 10Analytics, 10Analytics-Kanban, 10Discovery, 10EventBus, and 2 others: EventBus mediawiki outage 2019-02-28 - https://phabricator.wikimedia.org/T217385 (10Ottomata) Ya will do today. [17:56:58] Gone for dinner team, back in a bit [18:13:19] (03PS1) 10GoranSMilovanovic: multi-instance SQL [analytics/wmde/WiktionaryCognateDashboard] - 10https://gerrit.wikimedia.org/r/494280 [18:15:05] (03CR) 10GoranSMilovanovic: [V: 03+2 C: 03+2] multi-instance SQL [analytics/wmde/WiktionaryCognateDashboard] - 10https://gerrit.wikimedia.org/r/494280 (owner: 10GoranSMilovanovic) [18:16:31] (03CR) 10Mforns: [C: 04-1] Update whitelisting for Android-related schemas (0310 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/493424 (https://phabricator.wikimedia.org/T209087) (owner: 10Bearloga) [18:17:32] Another question... This is more or less about the usual MediaWiki database, but it's the kind of usage that may be more familiar to analytics people, so I'm asking here. What is the best way to query revision content? It's stored separately in the text table, which, in turn, links to some other storage. It looks like "185151524 | DB://cluster25/39288397 | utf-8,gzip,external". [18:17:41] Should it be queried only in dumps? Or is there a way to look at it using SQL in some way? [18:20:20] * elukey off! [18:20:27] (sorry need to go!) [18:22:27] a-team ^ [18:23:39] aharoni: the most traditional way is likely mw api [18:23:49] dumps, or MW API [18:24:10] joal: has an import of dumps that he's parsed for use in Hive, but I'm not sure how up to date it is [18:24:11] (i was holding out to see if analytics had some special sauce for full-text search. not yet, apparently) [18:24:16] aharoni: yeah and Joseph has been working on some jobs that import dumps and content into Hadoop, so there’s a Hive table [18:24:30] there is one? [18:24:48] I think so [18:24:48] also Search and Cloud Services are working (gradually) to replicate ElasticSearch on Cloud Services [18:24:48] milimetric: Hive would be nice... but if he "has been working", then I guess it's not done yet?..
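A minimal sketch of the curl/Python approach elukey describes above for pulling datapoints out of graphite: hit the render endpoint with the "&format=json" trick and parse the result. The full metric path below is hypothetical; the chat only mentions the MediaWiki.cx.publish.highmt prefix and 'count' as one of the available aggregates.

    # Minimal sketch, assuming the graphite render API discussed above.
    # The target path is hypothetical; only the MediaWiki.cx.publish.highmt
    # prefix is mentioned in the conversation.
    import requests

    resp = requests.get(
        "https://graphite.wikimedia.org/render",
        params={
            "target": "MediaWiki.cx.publish.highmt.count",  # hypothetical full path
            "from": "-7d",      # last seven days
            "format": "json",   # the "&format=json" trick mentioned above
        },
    )
    resp.raise_for_status()
    # graphite returns a list of series; each datapoint is a [value, timestamp] pair
    for series in resp.json():
        for value, ts in series["datapoints"]:
            if value is not None:  # empty buckets come back as null
                print(series["target"], ts, value)

The same JSON can then be merged with the output of the mwmaint1002 queries before feeding a Dashiki board, which is the use case described above.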
[18:26:23] i see a joal.mediawiki_wikitext_history table [18:26:38] looks like it was created with a 2018-09 partition [18:26:44] use wmf; desc mediawiki_wikitext_history; [18:26:49] oh in wmf? [18:26:53] yeah ottomata [18:27:03] oh cool yeah [18:27:05] aharoni: I'm not sure the status but it's there, and it has a revision_text field which has the content [18:27:06] with a 2019-01 partition [18:27:14] I think there's a monthly job yeah [18:27:30] i don't think so milimetric i think it is run manually by jo [18:27:32] aharoni: but I think it's not done in the sense of it's not vetted or very publicized yet [18:27:57] it was ottomata but now it's auto: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0131429-181112144035577-oozie-oozi-C/ [18:28:18] what I'm not sure about is how vetted the data is and how tested actually querying it is [18:28:20] ah cool [18:28:55] oh right i remember, that's why he made the hdfs 'rsync' stuff [18:28:58] hare: to your point/question, I'm not sure how easy it is to actually do full-text search using this table, but give it a shot, let us know [18:29:20] aharoni: as long as you aren't looking for really up to date stuff, hive should have what you need [18:30:57] milimetric: I want to examine articles that are in a certain category, but the articles' presence in this category is very volatile. It's supposed to be removed. So I need to look at revisions, not just text. [18:31:27] It's probably OK if it's not super up to date. [18:31:38] aharoni: yeah in that table you have exactly what’s in dumps, text per revision [18:37:02] 10Analytics, 10ExternalGuidance, 10Product-Analytics, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), 10Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (10dr0ptp4kt) @chelsyx would it be possible to have two y-axes for so that it's easi... [18:37:17] 10Analytics, 10Analytics-Kanban, 10Discovery, 10EventBus, and 2 others: EventBus mediawiki outage 2019-02-28 - https://phabricator.wikimedia.org/T217385 (10Ottomata) Done [18:55:29] 10Analytics, 10MediaWiki-extensions-GrowthExperiments, 10Product-Analytics, 10Growth-Team (Current Sprint): Homepage: instrumentation - https://phabricator.wikimedia.org/T216586 (10Ottomata) [19:23:09] Thanks milimetric for the precise desc of current status of wikitext :) [19:25:44] aharoni, hare: if you play with the wikitext, 2 little things to keep in mind - 1) data is partitioned per project (field name `wiki_db`), be sure to filter for the projects you're after - 2) When using regexp, test them for efficiency (lazy vs greedy) as it makes a huge difference at the end (revisions are big, and there are a huge amount of them) [19:41:36] 10Analytics, 10EventBus, 10Operations, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019. - https://phabricator.wikimedia.org/T217359 (10jbond) p:05Triage→03Normal [19:48:32] milimetric: so can I actually use mediawiki_wikitext_history ? I tried running `select * from mediawiki_wikitext_history where wiki_db = 'enwiki' limit 1;` and got: "SLF4J: Class path contains multiple SLF4J bindings". [19:48:46] But I never learned hive properly, so I may be missing something basic.
[19:51:29] aharoni: The table contains a very big bunch of stuff, so the default hive client doesn't work as-is - You need: export HADOOP_HEAPSIZE=4096 && hive [19:51:35] Then: use wmf; [19:51:57] and finally: select * from mediawiki_wikitext_history where wiki_db = 'enwiki' and snapshot='2019-01' limit 1; [19:51:59] aharoni: that's just a warning [19:52:19] yeah, as Jo says, use the snapshot partition as well as the wiki one [19:52:31] the real error was that there was too much data [19:52:32] aharoni: don't forget the snapshot partition (otherwise you query multiple dumps), and also, maybe try on simplewiki (that's a lot smaller than enwiki for tests) [19:59:10] 10Analytics, 10Operations, 10RESTBase, 10Traffic, and 2 others: Verify that hit/miss stats in WebRequest are correct - https://phabricator.wikimedia.org/T215987 (10BBlack) The raw data should be accurate. I had thought we were already sending the summarized `X-Cache-Status` to hadoop as well, but apparent... [20:02:15] joal, milimetric : works, thanks. [20:10:09] Hello A-team! I want to schedule a cron job to update and publish a jupyter notebook on notebook1004 daily. However, the current publishing solutions (https://wikitech.wikimedia.org/wiki/SWAP#Sharing_Notebooks) all seem to require a password, which makes it impossible to publish automatically. I'm wondering if you can help set up a directory on notebook1004 like `stat1007:/srv/published-datasets/discovery/reports`, where we can put html [20:10:09] files converted from ipynb in it and share them with the public. [20:18:02] Another question: At https://grafana.wikimedia.org/d/000000598/content-translation?orgId=1 there's a section called "Published translations with high amount of unreviewed MT". It shows the total number for the interval of time that I select. [20:19:27] Can I look into how this data is stored? If I try to export CSV, I only get the same info that is shown on the website: language code and total number. [20:23:41] chelsyx: yeah we don't support public/shareable notebooks right now unfortunately, and doing so would be a lot of work, if even possible [20:24:00] there are security/access problems already with notebooks and the cluster [20:24:10] but, i think publishing files generated on notebook hosts could be fine [20:24:22] we can do just like we do on stat* boxes with the /srv/published-datasets stuff [20:24:24] would that work for you? [20:25:26] 10Analytics, 10ExternalGuidance, 10Product-Analytics, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), 10Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (10chelsyx) @dr0ptp4kt and @atgo , regarding T212414#4956321, you want to track the... [20:26:44] ottomata: that would be awesome! [20:31:36] 10Analytics, 10ExternalGuidance, 10Product-Analytics, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), 10Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (10atgo) @chelsyx I'm thinking specifically about our traffic from Google search. So... [20:52:56] joal: would it be worth adding the tip about heap size to https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/XMLDumps/Mediawiki_wikitext_history ?
(and perhaps also update the recommended number at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries#Out_of_Memory_Errors_on_Client ) [20:54:29] very much worth it HaeB :) [21:18:55] (03PS3) 10Joal: [WIP] Refactor mediawiki-page-history computation [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/493390 [21:19:40] \o/! Refactor of pageHistoryBuilder done :) Moar tests to be written (and possibly bugs to be corrected), but at least existing tests pass [21:43:54] (03PS7) 10Ottomata: Event(Logging) schema loader [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492399 (https://phabricator.wikimedia.org/T215442) [21:45:26] (03CR) 10Ottomata: "K, here's another refactor!" (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492399 (https://phabricator.wikimedia.org/T215442) (owner: 10Ottomata) [22:31:19] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Discovery, and 3 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Ottomata) [22:36:20] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Discovery, and 3 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Ottomata) @akosiaris, I deployed the new 0.0.7 eventgate-analytics chart to pr... [22:57:57] ottomata: should I create a ticket for the published directory request on notebook1003&1004? [22:59:33] 10Analytics, 10ExternalGuidance, 10Product-Analytics, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), 10Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (10chelsyx) >>! In T212414#4999143, @dr0ptp4kt wrote: > @chelsyx As it is the "Acces... [23:13:09] 10Analytics, 10ExternalGuidance, 10Product-Analytics, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), 10Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (10chelsyx) >>! In T212414#4999480, @atgo wrote: > @chelsyx I'm thinking specificall... [23:52:16] 10Analytics, 10Community-Tech, 10SVG Translate Tool, 10Community-Tech-Sprint: Integrate Piwik with SVG Translate to keep track of metrics - https://phabricator.wikimedia.org/T215478 (10Samwilson) PR83 is merged, and so now this is waiting to be deployed to production. QA is not possible before then (althou...
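For convenience, here is the wikitext-querying recipe from the exchange above collected in one place. The heap size, the snapshot partition, and the simplewiki tip are all from the chat; the specific columns selected are an assumption about the table layout, which mirrors the XML dumps.

    # run on a stat/notebook host; the bigger client heap is needed for this table
    export HADOOP_HEAPSIZE=4096 && hive

    -- then, inside the hive shell:
    use wmf;
    select page_title, revision_id
    from mediawiki_wikitext_history
    where wiki_db = 'simplewiki'   -- much smaller than enwiki, good for testing
      and snapshot = '2019-01'     -- always filter on a single snapshot partition
    limit 1;

Per joal's note above, any regexp applied to revision_text should also be tested for efficiency (prefer lazy over greedy quantifiers), since the table holds every revision of every page.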
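And for chelsyx's publishing question: once a published directory exists on the notebook hosts (the ticket ottomata asked about above), the daily job could be a plain cron entry around jupyter nbconvert. This is only a sketch; the notebook name and the output directory on notebook1004 are hypothetical, modeled on the stat* published-datasets layout rather than on anything that exists yet.

    # hypothetical crontab entry for notebook1004; both paths are assumptions
    # m h dom mon dow   command
    0 6 * * * jupyter nbconvert --to html --execute ~/reports/daily_report.ipynb --output-dir /srv/published-datasets/discovery/reports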