[04:35:59] PROBLEM - eventbus grafana alert on icinga2001 is CRITICAL: CRITICAL: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is alerting: EventBus POST Response Status alert. https://wikitech.wikimedia.org/wiki/EventBus
[04:38:23] RECOVERY - eventbus grafana alert on icinga2001 is OK: OK: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is not alerting. https://wikitech.wikimedia.org/wiki/EventBus
[07:01:27] morning folks :)
[07:12:56] so for the missing VirtualPageView hour, I'd re-run this
[07:12:58] sudo -u hdfs /usr/local/bin/refine_eventlogging_analytics --ignore_failure_flag=true --table_whitelist_regex=:VirtualPageView --since='2019-03-08T17:00:00Z' --until='2019-03-08T18:00:00Z' --verbose refine_eventlogging_analytics
[07:13:02] on an-coord1001
[07:13:16] and hopefully wait for the _REFINED flag
[07:13:33] since now there is a _REFINE_FAILED
[07:13:46] then in theory the coordinator should start fine
[07:14:07] now I get why restarting was not really worth-it/doble :)
[07:14:09] *doable
[07:44:17] (PS8) Elukey: Add artifacts for Debian Buster and upgrade to 0.29rc7 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/495182 (https://phabricator.wikimedia.org/T212243)
[07:49:03] superset seems to break due to a pandas error
[07:49:05] ufffff
[08:21:41] and forcing the current version seems not to work with python 3.7
[08:31:59] elukey: Hi ! I'm interested in understanding why no REFINED flags ...
[08:32:57] joal: bonjour!
[08:33:07] I think it may have been due to the hive issue
[08:33:13] I wanted to re-run the job to be sure
[08:33:52] (the alert was fired ~ when the hive issue happened IIUC)
[08:34:14] elukey: I assumed the issue was due to the hive downtime
[08:34:26] What I don't understand is the flag issue about rerunning
[08:34:46] sorry I didn't get the point
[08:43:52] (PS9) Elukey: Add artifacts for Debian Buster and upgrade to 0.29rc7 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/495182 (https://phabricator.wikimedia.org/T212243)
[09:14:12] I might have found a superset fix in the code for the issue that I am chasing
[09:14:16] but it is kinda weird
[09:14:17] sigh
[09:16:20] ahhhh https://github.com/apache/incubator-superset/pull/2708
[09:16:25] so they merged this in master
[09:17:13] what a mess
[09:21:18] any python pandas expert in the chan?
[09:24:29] elukey: I can try to help, but I'm no expert
[09:24:49] elukey: I have another question for you, on labsdb1012 this time if you have a minute
[09:25:35] joal: lets go with labsdb1012, I am live debugging code in the meantime :D
[09:25:51] you sure? this seems like risky !!!
[09:26:30] I am on staging :)
[09:26:38] elukey: I was wondering if, given the load I have put on the machine, I try to continue optimizing more, or if we think it's enough
[09:28:01] joal: the load didn't seem unbearable to me, the cpu was used as well as the network, but this is what the host is there for :)
[09:28:11] Ah also elukey - Let's deploy new AQS datasource :)
[09:28:17] we don't have to worry about others so it makes sense for me to keep tuning
[09:28:28] elukey: ok - will try sqooping HARDER :)
[09:30:00] joal: new AQS datasource?
[09:30:12] yessir
[09:30:18] 2019-02
[09:30:24] ah the druid one
[09:30:26] yes yes
[09:33:45] merging the change
[09:33:54] joal: can I re-run the virtualpage view refine?
[09:34:03] never done it so I am curious :)
[09:34:13] sure elukey
[09:37:15] joal: aqs1004 ready for testing
[09:40:02] ack!
[09:40:42] elukey: seems all good :)
[09:41:59] joal: https://yarn.wikimedia.org/cluster/app/application_1551342556468_54877 !!!
[09:42:13] Andrew was already on it no?
[09:42:33] WUT?
[09:42:47] I think so yes
[09:43:04] I clearly remember him discussing this case, but I don't recall if it was for eventgate or refine or something else
[09:46:09] aqs deploy completed
[09:56:25] oh sorry elukey, missed the ping - checking WKS2 UI
[09:57:39] sounds good :)
[10:03:04] (PS2) Joal: Update refinery sqoop to use dedicated labsdb host [analytics/refinery] - https://gerrit.wikimedia.org/r/495266 (https://phabricator.wikimedia.org/T215550)
[10:03:29] (PS6) Joal: Refactor mediawiki-page-history computation [analytics/refinery/source] - https://gerrit.wikimedia.org/r/493390 (https://phabricator.wikimedia.org/T190434)
[10:04:29] ah no I am stupid
[10:04:41] the yarn error is about my parameter
[10:04:44] what the hell
[10:04:46] I need coffee
[10:04:59] it is about your parameter, I nonetheless don't understand elukey
[10:10:01] so I have removed the 'Z' from the date and it succeeded, but not adding the _REFINED flag :D
[10:10:10] so I guess it went in another hour
[10:12:02] the flag?
[10:12:20] yes the file in the hdfs directory
[10:12:33] the oozie coordinator is waiting for that one to start no?
[10:12:35] I can't imagine why elukey - Sounds very not probable
[10:12:41] elukey: yes
[10:13:22] I don't know how to get logs for https://yarn.wikimedia.org/cluster/app/application_1551342556468_54926
[10:13:27] to gather what it did
[10:13:47] elukey: yarn logs --applicationId application_1551342556468_54926
[10:13:50] arf again elukey
[10:14:02] elukey: yarn logs --applicationId application_1551342556468_54926 --appOwner hdfs | less
[10:14:09] ah yes of course
[10:15:27] since = 2019-03-08T17:00:00.000Z,
[10:15:27] until = 2019-03-08T18:00:00.000Z,
[10:15:48] so it seems it doesn't like the 'Z' at the end of the since/until parameter
[10:15:55] but it adds it when not provided
[10:16:03] ???
[10:16:21] joal: I started the job with the same dates but without 'Z'
[10:16:23] at the end
[10:16:29] and the above job is the result
[10:16:35] in the logs though I can see the 'Z'
[10:16:46] when it dumps the Config used
[10:17:38] elukey: maybe it wants the .000 when using a Z?
[10:17:40] not sure
[10:29:00] I tried even with a larger date range but it doesn't do anything
[10:29:17] namely it succeeds quickly and then no _REFINED flag
[10:29:47] joal: does _REFINE_FAILED need to be removed manually?
[10:30:18] elukey: I don't think so, but I'm no expert :(
[10:30:32] elukey: have you tried looking at the wikitech doc on this?
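A sketch of joal's guess above ("maybe it wants the .000 when using a Z"): if the job parses --since/--until with a fixed pattern that requires fractional seconds before the literal Z (the Config dump shows `2019-03-08T17:00:00.000Z`), then a bare `...T17:00:00Z` would fail while `...T17:00:00.000Z` would not. This is an assumption about the Refine job's parser, illustrated with Python's strptime:

```python
from datetime import datetime

# Assumed pattern: fractional seconds are mandatory before the literal 'Z'.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

def parses(ts):
    """Return True if ts matches the assumed refine timestamp pattern."""
    try:
        datetime.strptime(ts, FMT)
        return True
    except ValueError:
        return False

print(parses("2019-03-08T17:00:00.000Z"))  # True: .000 satisfies %f
print(parses("2019-03-08T17:00:00Z"))      # False: %f needs digits before Z
```

This would also be consistent with the job accepting dates with no Z at all, if it falls back to a second pattern without the zone designator.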
[10:31:04] elukey: --ignore_failure_flag=true
[10:31:11] or manually remove the flag ;)
[10:31:12] yeah I am using it
[10:31:16] k
[10:31:21] but it doesn't seem to do anything
[10:31:39] then it might be an issue with dates or the whitelist of folders I guess
[10:32:00] the whitelist is table_whitelist_regex = Some(:VirtualPageView)
[10:32:10] and in theory it looks for input_path_regex = "eventlogging_(.+)/hourly/(\d+)/(\d+)/(\d+)/(\d+)",
[10:32:30] seems consistent with eventlogging_VirtualPageView that I can see on hdfs
[10:32:41] elukey: colons i
[10:33:17] elukey: ah - non-capturing I guess
[10:33:52] the colon might be it joal, lemme retry, I thought it was something added by spark
[10:35:14] ok now it seems to be working
[10:35:18] * elukey cries in a corner
[10:35:29] thanks joal
[10:35:53] brb
[10:38:41] (CR) Addshore: [C: -1] "needs to be added to one of the cron files so that this actually gets run :)" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/494772 (https://phabricator.wikimedia.org/T215796) (owner: Tonina Zhelyazkova)
[10:53:45] joal: worked!
[11:16:30] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Staging environment for upgrades of superset - https://phabricator.wikimedia.org/T212243 (elukey) 0.29rc7 is deployed on analytics-tool1004, so far the only issue that I found is that graphs showing data on a World Map are broken. To r...
[11:16:38] joal: --^ this is basically my issue with Pandas
[11:19:58] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Staging environment for upgrades of superset - https://phabricator.wikimedia.org/T212243 (elukey) To quickly test superset: `ssh -L 9080:analytics-tool1004.eqiad.wmnet:80 analytics-tool1004.eqiad.wmnet` and then localhost:9080
[11:36:58] * elukey lunch!
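The stray colon joal spotted can be reproduced with a one-line regex check. This is a sketch assuming the whitelist value is matched as a regex against the table name (the actual matching lives in the Refine Scala code): with a leading colon the pattern demands a literal `:` that the table name does not contain, so nothing is whitelisted.

```python
import re

table = "VirtualPageView"

# ':VirtualPageView' requires a literal ':' before the name, which the
# Hive table name does not have, so the whitelist matches nothing.
print(bool(re.match(r":VirtualPageView", table)))  # False
print(bool(re.match(r"VirtualPageView", table)))   # True
```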
[14:14:36] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Enable encryption and authentication for TLS-based Hadoop services - https://phabricator.wikimedia.org/T217412 (elukey) The next step is to figure out how to deploy the Java truststore (where the TLS CA's certificate is) and the keystore (...
[14:23:36] (CR) Bearloga: "@Mforns: Yup! :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/493424 (https://phabricator.wikimedia.org/T209087) (owner: Bearloga)
[14:37:15] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Staging environment for upgrades of superset - https://phabricator.wikimedia.org/T212243 (elukey) I added some logging on the main superset instance. This is what df[metric] looks like for the pageview dashboard: ` Mar 11 14:32:35 analy...
[14:37:35] brb!
[14:58:14] (CR) Mforns: [V: +2 C: +2] "Cool, thanks! Merging..." [analytics/refinery] - https://gerrit.wikimedia.org/r/493424 (https://phabricator.wikimedia.org/T209087) (owner: Bearloga)
[14:59:13] heya teammmmm
[15:22:29] Analytics, Operations, RESTBase, Traffic, and 2 others: Verify that hit/miss stats in WebRequest are correct - https://phabricator.wikimedia.org/T215987 (Pchelolo) Open→Resolved a:Pchelolo Thank you, everyone! :)
[15:25:20] (PS1) Nuria: Revert "Create metrics matrix component" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/495699
[15:27:15] Analytics, Analytics-Kanban, Product-Analytics: Event counts from Mysql and Hive don't match. Refine is persisting data from crawlers. - https://phabricator.wikimedia.org/T210006 (mforns)
[15:27:51] Analytics, Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (mforns) p:High→Normal
[15:27:54] Analytics: Upgrade matomo1001 to latest upstream - https://phabricator.wikimedia.org/T218037 (elukey)
[15:28:25] Analytics, User-Elukey: Upgrade matomo1001 to latest upstream - https://phabricator.wikimedia.org/T218037 (elukey)
[15:29:29] Analytics, Better Use Of Data, Product-Analytics: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (mforns) Hi @MNeisler! We'd like to have this done by the end of this quarter. Is there anything we can do, I can help you build...
[15:31:15] Analytics, User-Elukey: Enable Security (stronger authentication and data encryption) for the Analytics Hadoop cluster and its dependent services - https://phabricator.wikimedia.org/T211836 (mforns) p:High→Normal
[15:32:53] Analytics, Product-Analytics, Patch-For-Review: Standardize datetimes/timestamps in the Data Lake - https://phabricator.wikimedia.org/T212529 (mforns) p:High→Normal
[15:35:04] (PS3) Tonina Zhelyazkova: Count enables/disables for rollback conf prompt user setting [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/494772 (https://phabricator.wikimedia.org/T215796)
[15:36:59] Analytics, Analytics-Data-Quality, Tool-Pageviews: Anomalous statistics results in eu.wikipedia siteviews - https://phabricator.wikimedia.org/T212879 (mforns) p:High→Normal
[15:37:45] Analytics: Alarms for virtualpageview should exist (probably in oozie) for jobs that have been idle too long - https://phabricator.wikimedia.org/T213716 (mforns) a:Nuria→None
[15:40:02] Analytics: Make edit data lake data available as a snapshot on dump hosts - https://phabricator.wikimedia.org/T214043 (mforns) p:High→Normal
[15:40:54] Analytics, Analytics-Data-Quality, Datasets-Webstatscollector, Language-Team: Add alarms for high volume of views to pages with replacement characters - https://phabricator.wikimedia.org/T117945 (mforns) p:Normal→Low
[15:42:23] Analytics: Alarm on data quality issues - https://phabricator.wikimedia.org/T159840 (mforns)
[15:42:48] Analytics: Coarse alarm on data quality for refined data based on entrophy calculations - https://phabricator.wikimedia.org/T215863 (mforns)
[15:42:50] Analytics: Alarm on data quality issues - https://phabricator.wikimedia.org/T159840 (mforns)
[15:42:56] Analytics: Productionize analysis of editcount vs per_user_revision_count - https://phabricator.wikimedia.org/T168648 (mforns)
[15:42:59] Analytics: Alarm on data quality issues - https://phabricator.wikimedia.org/T159840 (mforns)
[15:43:28] Analytics, Discovery, Discovery-Analysis, Product-Analytics: Add referer to WebrequestData - https://phabricator.wikimedia.org/T172009 (mforns) p:Normal→Low
[15:43:58] Analytics, Analytics-Cluster, Patch-For-Review: Port Kafka clients to new jumbo cluster - https://phabricator.wikimedia.org/T175461 (mforns) p:Normal→High
[15:45:30] Analytics, Analytics-Cluster, Patch-For-Review: Port Kafka clients to new jumbo cluster - https://phabricator.wikimedia.org/T175461 (mforns) p:High→Normal
[15:45:54] leila: o/
[15:47:12] Analytics, Analytics-Wikistats: Feedback on hive table mediawiki_history by Erik Z - https://phabricator.wikimedia.org/T178591 (mforns) a:Nuria→JAllemandou
[15:47:20] Analytics, Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (elukey) ping @leila :)
[15:48:00] Analytics, Analytics-Wikistats: Feedback on hive table mediawiki_history by Erik Z - https://phabricator.wikimedia.org/T178591 (mforns) Hi @JAllemandou :] (in grosking) Can you outline what remains to be done here please?
[15:48:43] Analytics: Decomission old analytics kafka cluster - https://phabricator.wikimedia.org/T183303 (mforns) p:Normal→High
[15:51:06] Analytics, Analytics-Wikistats: Wikistats 2: New Pages split by editor type wrongly claims no anonymous users create pages - https://phabricator.wikimedia.org/T185342 (mforns) @Milimetric, @JAllemandou (grosking) Do we know what is the cause of this issue, after the recent quality work?
[15:54:32] Analytics, MediaWiki-Vagrant, Core Platform Team Backlog (Watching / External), Services (watching): Vagrant's /var/log/daemon.log filling up with kafka errors - https://phabricator.wikimedia.org/T187102 (mforns) a:elukey
[15:55:08] Analytics, MediaWiki-Vagrant, Core Platform Team Backlog (Watching / External), Services (watching): Vagrant's /var/log/daemon.log filling up with kafka errors - https://phabricator.wikimedia.org/T187102 (mforns) We have not been able to reproduce this issue. But maybe @elukey has an idea of how...
[15:59:17] Analytics: Mediawiki History: moves counted twice in Revision - https://phabricator.wikimedia.org/T189044 (mforns) @JAllemandou @Milimetric Was something done in this regard, during data quality work?
[16:02:56] Analytics, MediaWiki-Vagrant, Core Platform Team Backlog (Watching / External), Services (watching): Vagrant's /var/log/daemon.log filling up with kafka errors - https://phabricator.wikimedia.org/T187102 (elukey) @DLynch did you manage to solve the issue?
[16:03:36] Analytics, Analytics-Data-Quality: Spike: Quantify how many EventLogging requests we get from non-wiki* hostnames or apps - https://phabricator.wikimedia.org/T190840 (mforns) Much of this data may be coming from bots as well, see: T210006
[16:04:07] Analytics, MediaWiki-Vagrant, Core Platform Team Backlog (Watching / External), Services (watching): Vagrant's /var/log/daemon.log filling up with kafka errors - https://phabricator.wikimedia.org/T187102 (elukey) Open→Resolved
[16:14:13] Analytics, Product-Analytics, Patch-For-Review, SEO: Make various auth libraries available on stat* machines - https://phabricator.wikimedia.org/T197896 (mforns) @mpopov Hi! Do you need google-auth here, or you managed with the others?
[16:19:02] (CR) Addshore: [C: +2] Count enables/disables for rollback conf prompt user setting [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/494772 (https://phabricator.wikimedia.org/T215796) (owner: Tonina Zhelyazkova)
[16:19:08] (PS1) Addshore: Count enables/disables for rollback conf prompt user setting [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/495703 (https://phabricator.wikimedia.org/T215796)
[16:19:10] (Merged) jenkins-bot: Count enables/disables for rollback conf prompt user setting [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/494772 (https://phabricator.wikimedia.org/T215796) (owner: Tonina Zhelyazkova)
[16:19:14] (CR) Addshore: [C: +2] Count enables/disables for rollback conf prompt user setting [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/495703 (https://phabricator.wikimedia.org/T215796) (owner: Addshore)
[16:19:22] (Merged) jenkins-bot: Count enables/disables for rollback conf prompt user setting [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/495703 (https://phabricator.wikimedia.org/T215796) (owner: Addshore)
[16:19:46] (PS3) Addshore: Add MediaWiki's PHP CodeSniffer rule set [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/494190 (owner: Thiemo Kreuz (WMDE))
[16:19:50] (CR) Addshore: [C: +2] Add MediaWiki's PHP CodeSniffer rule set [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/494190 (owner: Thiemo Kreuz (WMDE))
[16:19:56] (PS1) Addshore: Add MediaWiki's PHP CodeSniffer rule set [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/495704
[16:19:59] (CR) Addshore: [C: +2] Add MediaWiki's PHP CodeSniffer rule set [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/495704 (owner: Addshore)
[16:20:04] Analytics, ChangeProp, EventBus, WMF-JobQueue, and 3 others: Consider disabling automatic topic creation in main-kafka - https://phabricator.wikimedia.org/T199432 (mforns)
[16:20:39] Analytics, ChangeProp, EventBus, WMF-JobQueue, and 3 others: Consider disabling automatic topic creation in main-kafka - https://phabricator.wikimedia.org/T199432 (mforns) @elukey we were here in grosking and agreed to look into this to be sure there are no action items left before closing.
[16:23:19] Analytics, Analytics-Wikistats: Contextualize wikistats metrics - https://phabricator.wikimedia.org/T187212 (Nuria) Like "pageviews per capita" or "pageviews per device" rather than pageviews per country.
[16:27:38] Analytics, Analytics-Wikistats: Issues with page view map in Wikistats 2 - https://phabricator.wikimedia.org/T200070 (mforns) @ezachte The issue 1 has changed since your initial description. Now we do not have intervals, rather we round to the 1000s. However, there is still some non-human values like 304...
[16:29:28] Analytics: Wikistats: Change Mercater Projection to Eckert IV - https://phabricator.wikimedia.org/T218045 (mforns)
[16:31:30] Analytics, Analytics-Wikistats: Issues with page view map in Wikistats 2 - https://phabricator.wikimedia.org/T200070 (mforns) Issues #2 and #3 are tackled in other tasks: T187212 and T218045 respectively. Let's keep this task about issue #1. @Jsc39, please feel free to take this task :] Ping me, if you n...
[16:32:27] Analytics, Analytics-Wikistats: Wikistats2: Values in map view show unnecessary decimal digits - https://phabricator.wikimedia.org/T200070 (mforns)
[16:50:36] elukey, are the labs alarms sth I should look into?
[16:51:36] mforns: what labs alarms?
[16:51:44] elukey, the shinken ones
[16:52:04] I don't think I receive them :(
[16:52:06] for what hosts?
[16:52:21] elukey, hadoop-coordinator2
[16:52:25] you can leave them
[16:52:45] CRITICAL: analytics.hadoop-coordinator-2.diskspace.root.byte_percentfree (<100.00%)
[16:52:54] WARNING: 100.00% of data above the warning threshold [2.0]
[16:53:00] ok
[16:53:19] elukey, thanks for looking into the virtualpageview rerun
[16:56:51] ah will check the disk space thing but not really pressing
[17:19:19] nuria: turned out that if I re-create the world maps in superset they work
[17:19:19] ahahhaah
[17:20:17] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Staging environment for upgrades of superset - https://phabricator.wikimedia.org/T212243 (elukey) Turned out that simply re-creating the world maps from scratch worked fine!
[17:20:18] elukey: nice
[17:20:30] elukey: and how did you figure that out?
[17:20:53] nuria: I tried to repro with a new chart and I couldn't
[17:21:01] elukey: man ...
[17:21:07] yeah
[17:23:44] now that I am looking better the world map graphs look a bit different across versions
[17:23:44] ah no wait!
[17:23:44] I was missing the "bubbles"
[17:23:44] those are the ones yielding the error
[17:23:45] /o\
[17:29:45] what a mess
[17:29:52] I think this is worth reporting upstream
[17:32:27] https://github.com/apache/incubator-superset/issues/4535 is similar but for 0.22..
[17:32:33] meanwhile 0.26 works for us
[17:42:39] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Staging environment for upgrades of superset - https://phabricator.wikimedia.org/T212243 (elukey) Created https://github.com/apache/incubator-superset/issues/7006
[17:42:47] created an issue for upstream, let's see if they answer (I have not a lot of hope)
[17:42:59] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Set up automated email to report completion of mediawiki_history snapshot and Druid loading - https://phabricator.wikimedia.org/T206894 (Neil_P._Quinn_WMF) Resolved→Open >>! In T206894#4957722, @Nuria wrote: > THis change s...
[18:05:46] Analytics-EventLogging, Analytics-Kanban: EventLogging sanitization - https://phabricator.wikimedia.org/T199898 (Nuria)
[18:05:54] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive - https://phabricator.wikimedia.org/T209503 (Nuria) Open→Resolved
[18:12:39] Analytics, Analytics-Kanban, Discovery, EventBus, and 2 others: EventBus mediawiki outage 2019-02-28 - https://phabricator.wikimedia.org/T217385 (Nuria) Open→Resolved
[18:13:19] Analytics, Analytics-Kanban, Operations, Wikimedia-Stream, and 2 others: Eventstreams build is broken - https://phabricator.wikimedia.org/T216184 (Nuria) Open→Resolved
[18:13:53] Analytics, Analytics-Kanban, Operations, User-Elukey: Review znodes on Zookeeper cluster to possibly remove not-used data - https://phabricator.wikimedia.org/T216979 (Nuria) Open→Resolved
[18:14:40] Analytics, Analytics-Kanban, EventBus, MW-1.33-notes (1.33.0-wmf.19; 2019-02-26), and 2 others: EventBus mediawiki extension should support multiple 'event service' endpoints - https://phabricator.wikimedia.org/T214446 (Nuria) Open→Resolved
[18:14:42] Analytics, Analytics-EventLogging, Analytics-Kanban, Discovery, and 4 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (Nuria)
[18:15:07] Analytics, Analytics-Kanban, User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (Nuria)
[18:15:09] Analytics, Analytics-Kanban, Patch-For-Review: update mw scooping to be able to scoop from new db cluster - https://phabricator.wikimedia.org/T215290 (Nuria) Open→Resolved
[18:25:58] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Staging environment for upgrades of superset - https://phabricator.wikimedia.org/T212243 (elukey)
[18:30:53] Analytics: Mediawiki History: moves counted twice in Revision - https://phabricator.wikimedia.org/T189044 (JAllemandou) IMO this issue is not data-quality as in a problem in the dataset generation, but rather a problem of data-semantics and how we interpret our data. By this I mean the effort on data quality...
[18:33:31] * elukey off!
[18:37:37] Analytics, CirrusSearch, EventBus, WMF-JobQueue, and 6 others: EventBus or CirrusSearch: DomainException from line 353 of /srv/mediawiki/php-1.33.0-wmf.9/vendor/firebase/php-jwt/src/JWT.php: Unknown JSON error: 5 - https://phabricator.wikimedia.org/T212335 (Pchelolo) Open→Resolved The cha...
[18:58:36] Hello! Anybody know how to interpret this error message? `2019-03-11 18:42:32,795 FATAL [IPC Server handler 4 on 39169] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1551342556468_57389_m_000005_0 - exited : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable@3efedc6f`
[18:58:45] https://hue.wikimedia.org/hue/jobbrowser/#!id=task_1551342556468_57389_m_000014
[19:02:13] Hi chelsyx - The interesting bit of it is a few lines further down: Caused by: org.apache.avro.AvroTypeException: Found long, expecting union
[19:04:12] joal: Does that mean the output is too long?
[19:04:19] chelsyx: I don't think so
[19:04:44] chelsyx: I can't access your hive query through hue (or I don't know how to) - Do you mind pasting it somewhere for me?
[19:05:20] https://www.irccloud.com/pastebin/X80NtSiD/
[19:05:32] Thanks
[19:06:18] I ran this query a month ago but didn't see the error
[19:07:00] chelsyx: we've been dealing with difficulties in the past few months with sqooping of mediawiki-tables, leading to schema incoherences
[19:07:11] I think you're hitting one of those - let me double check
[19:08:09] Thanks joal!
[19:11:04] confirmed chelsyx - Error comes from the log_user field being a long in the data (as expected) but being a String in the schema because of past months' issues
[19:11:07] Will correct
[19:11:58] Thank you joal!!!
[19:13:14] Hello hello
[19:13:24] For some reason, the following sqoop command:
[19:13:25] /usr/bin/sqoop import --connect jdbc:mysql://s1-analytics-replica.eqiad.wmnet/enwiki:3311 --password-file /user/goransm/mysql-analytics-research-client-pw.txt --username research -m 16 --query "select * from wbc_entity_usage where \$CONDITIONS" --split-by eu_row_id --as-avrodatafile --target-dir /user/goransm/wdcmsqoop/wdcm_clients_wb_entity_usage/wiki_db=enwiki --delete-target-dir
[19:13:44] (CR) Mforns: "Starting review now." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/491494 (https://phabricator.wikimedia.org/T216603) (owner: Joal)
[19:13:56] Fails to deliver
[19:14:01] 19/03/11 19:07:23 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
[19:14:14] 19/03/11 19:07:23 ERROR tool.ImportTool: Import failed: java.io.IOException: No columns to generate for ClassWriter
[19:15:13] (PS1) Joal: Correct mediawiki_logging log_user field [analytics/refinery] - https://gerrit.wikimedia.org/r/495734
[19:18:41] chelsyx: corrected in prod, see patch --^
[19:19:06] Thanks joal!
[19:19:36] To make it easier to follow the above mentioned sqoop failure: https://phabricator.wikimedia.org/T214539#5016107
[19:23:00] GoranSM: on what host are you executing it?
[19:23:22] can you telnet to s1-replica-analytics.eqiad.wmnet 3301 from there ?
[19:24:45] sorry s1-analytics-replica.eqiad.wmnet.
[19:24:50] GoranSM: I also have questions about usage, regularity etc of the data you're sqooping, so that we try to prevent multiplication of efforts
[19:26:43] GoranSM: I'd also try with jdbc:mysql://s1-analytics-replica.eqiad.wmnet:3311/enwiki
[19:26:54] elukey: stat1004
[19:27:05] in your example you put the port after the dbname
[19:27:23] sounds like a good catch elukey :)
[19:27:33] elukey: see https://phabricator.wikimedia.org/T212386#4957409
[19:28:04] GoranSM: sure but that might be wrong
[19:28:08] can you please try?
[19:28:08] :)
[19:28:29] and then also joal's question about usage :)
[19:28:53] elukey: I confirm I can telnet s1-analytics-replica.eqiad.wmnet 3311
[19:29:07] elukey: try what? telnet?
[19:29:13] :)
[19:29:24] GoranSM: confirmation
[19:29:52] GoranSM: /usr/bin/sqoop import --connect jdbc:mysql://s1-analytics-replica.eqiad.wmnet:3311/enwiki ....
[19:30:01] GoranSM: nono sqoop with the jdbc string above
[19:30:19] elukey: will do
[19:30:22] super
[19:30:23] joal: Thanks!
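The fix elukey suggested moves the port into the authority part of the connection URL. With the port written after the database name, it is silently swallowed into the path, so the client tries the default port and the connection fails. Treating the part after `jdbc:` as an ordinary URL, Python's urlparse illustrates the difference:

```python
from urllib.parse import urlparse

# Port misplaced after the database name: it ends up inside the path.
wrong = urlparse("mysql://s1-analytics-replica.eqiad.wmnet/enwiki:3311")
# Port in the authority, as in the suggested connect string.
right = urlparse("mysql://s1-analytics-replica.eqiad.wmnet:3311/enwiki")

print(wrong.port, wrong.path)  # None /enwiki:3311
print(right.port, right.path)  # 3311 /enwiki
```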
[19:30:41] You guys are great. Let me check this and I'll get back to you.
[19:33:00] (going afk :)
[19:35:20] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (JAllemandou) Following up on this: another viable solution to get monthly-coherence between dumps is to force a dump on the 1st o...
[19:36:09] joal: sqoop job is running, looks OK
[19:38:00] elukey: Thanks!, and:
[19:38:04] joal: Thanks!
[19:40:48] joal: Just let me know when you would like to discuss this sqooping operation (the only one that I maintain at all...)
[19:41:58] GoranSM: the wikidata-item-to-page is a recurring need that is partially solved with research using wikidata-json dumps
[19:42:38] GoranSM: I am trying to get an idea of how much and how often you're gathering data :)
[19:43:23] joal: Once weekly; but I also need to have the data organized by usage aspects (see: https://www.mediawiki.org/wiki/Wikibase/Schema/wbc_entity_usage)
[19:44:43] GoranSM: all wikis?
[19:45:07] joal: Almost all WMF projects (+800)
[19:49:09] joal: One question, please: in our new multi-instance MySQL layout we have exactly eight s1, s2, .., s8 shards, right?
[19:49:28] GoranSM: we have 3 shards
[19:49:42] joal: you mean: sX, x1, and staging?
[19:49:48] GoranSM: no
[19:49:52] joal: ?
[19:50:14] joal: I still do not get the new organization of our MySQL databases
[19:50:49] joal: sorry: s1, s2, .., s8 are hosts, right?
[19:50:56] GoranSM: no
[19:51:37] GoranSM: https://github.com/wikimedia/analytics-refinery/blob/master/python/refinery/util.py#L1075
[19:52:03] GoranSM: there is a mapping from s1->s8 as hosts to real hostnames + port
[19:52:15] joal: Got it.
[19:52:28] We have 3 db hosts, each of them containing multiple db-shards as sX
[19:52:41] joal: The docs (https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB_replicas) say: "... has been deprecated in favor of a multi instance/host solution, namely each section (sX or x1) is now running in a separate mysql database instance."
[19:52:54] Please update :)
[19:53:40] joal: Oh no - I am not updating anything that has to do with our Data Engineering :) - Thank god I don't have to manage our databases! :)
[19:55:52] joal: Seriously now, I enjoy updating the docs whenever I am sure that I understand what I am doing, but this thing - please, someone else.
[19:57:48] ok :)
[19:58:33] Analytics, Product-Analytics, Patch-For-Review: Standardize datetimes/timestamps in the Data Lake - https://phabricator.wikimedia.org/T212529 (JAllemandou) I have an opinion and some information here. We use string-encoded timestamps in mediawiki-history because our version of hive doesn't support ti...
[20:18:03] joal: looks like the latest sqoop of the mediawiki db hasn't completed yet? I ran the query on the `2019-02` snapshot but only see records between 2018/02 and 2018/10
[20:19:07] chelsyx: seems very bizarre
[20:19:35] chelsyx: logging table?
[20:19:45] joal: yes
[20:23:00] chelsyx: and from your query, commonswiki only
[20:23:59] joal: yes
[20:25:40] chelsyx: https://gist.github.com/jobar/3c1cad1e0bff0bd17a7d70e23e452e19
[20:25:56] chelsyx: seems related to something else
[20:27:42] Gone for tonight team - see you tomorrow
[20:30:45] Thanks joal! I will check again
[23:32:16] Analytics, Product-Analytics, Patch-For-Review: Standardize datetimes/timestamps in the Data Lake - https://phabricator.wikimedia.org/T212529 (Nuria) While a common standard makes sense, I really see a lot of value in being able to use hive udfs in mw history so I defer to @JAllemandou for what forma...
[23:55:26] (CR) Nuria: "Does this need to be changed on the sqooping as well?" [analytics/refinery] - https://gerrit.wikimedia.org/r/495734 (owner: Joal)
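The s1→s8 section-to-instance mapping joal pointed GoranSM at (19:51, the refinery util.py link) can be sketched as a tiny helper. This is a hypothetical simplification, not the real refinery code; the host naming and the 331x port convention are inferred from the s1/3311 connect string used earlier in this log:

```python
def analytics_replica(section):
    """Map a MediaWiki db section like 's1'..'s8' to (host, port).

    Hypothetical sketch: the authoritative mapping lives in
    analytics-refinery/python/refinery/util.py.
    """
    n = int(section.lstrip("s"))
    if not 1 <= n <= 8:
        raise ValueError("unknown section: %s" % section)
    # Each section runs as its own mysql instance, listening on 3310 + N.
    return ("%s-analytics-replica.eqiad.wmnet" % section, 3310 + n)

print(analytics_replica("s1"))  # ('s1-analytics-replica.eqiad.wmnet', 3311)
```

This also captures GoranSM's confusion above: sX names are sections (logical shards), not hosts, and several section instances can live on the same physical db host.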