[04:30:45] (PS4) Milimetric: [WIP] Update mediawiki-history comment and actor joins [analytics/refinery/source] - https://gerrit.wikimedia.org/r/480796 (https://phabricator.wikimedia.org/T210543) (owner: Joal)
[07:29:49] !log decom analytics103[7,8] from Analytics Hadoop
[07:29:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:35:58] nodes decomming now :)
[07:36:18] -3 and I am done, probably tomorrow evening?
[07:45:51] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Decommission old Hadoop worker nodes and add newer ones - https://phabricator.wikimedia.org/T209929 (elukey)
[07:58:11] not sure if the denormalize failures are my fault
[07:58:27] I checked the workflow and its task ran on an1039 afaics, that is not decom
[07:59:34] ah no wait need to check https://yarn.wikimedia.org/cluster/app/application_1544022186674_118925
[08:00:37] ok decom should not have interfered
[08:13:49] Morning
[08:14:00] Checking MWH job
[08:16:33] Analytics, Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, and 3 others: track number of editors from other Wikimedia projects who also edit on Wikidata over time - https://phabricator.wikimedia.org/T193641 (WMDE-leszek) @JAllemandou: as at WMDE we are in the need for the December data (D...
[08:19:43] bonjour joal
[08:20:46] elukey: given that the past 3 regular MWH jobs have failed, I'm inclined to think you're not part of the mess :)
[08:21:21] Analytics-Kanban, User-Elukey: Q1 2018/19 Analytics procurement - https://phabricator.wikimedia.org/T198694 (elukey) Open→Resolved
[08:23:31] Analytics, Beta-Cluster-Infrastructure, User-Elukey: TCP connections between analytics and deployment-prep - https://phabricator.wikimedia.org/T208870 (elukey) @thcipriani ping :)
[08:26:58] joal: yeah I saw it later on :P
[08:27:04] but usually I am part of the mess!
[08:31:25] Actually elukey, might be related :( There is a line about the error-type we're having in the article about scaling spark-jobs: Better Fetch Failure handling
[08:34:59] the timing was suspicious, right after the decom
[08:35:10] even if in theory it should have been graceful
[08:35:15] but probably it is not
[08:44:45] ahhh now I get what you're saying (I am watching the spark summit video from facebook)
[08:45:04] the 4 fetch failure limit might have hit a node under decomming right?
[08:45:39] I think that's right elukey
[08:46:25] joal: in this case we could simply re-run it right? Now it should be fine
[08:46:40] elukey: We use 4-cores per exec, meaning 4 tasks are run per executor - If one exec fails, well 4 tasks will possibly fail to fetch at the same time
[08:47:02] yep yep
[08:47:29] I have only 3 nodes left to decom (tomorrow), but I can wait
[08:48:38] elukey: I'm updating job conf to bump the number
[08:49:50] !log Rerun failed mediawiki-denormalize job with updated spark conf
[08:49:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:51:00] nice :)
[08:52:49] My first launch attempt failed, but it now succeeded - Babysitting the folk
[09:02:34] brb
[09:19:37] !log Deploying refinery onto HDFS so that refinery-job-0.0.82.jar is present on HDFS (needed to run mediawiki-history successfully)
[09:19:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:20:39] did we miss to update it?
[09:21:12] elukey: deploys of refinery-source and refinery have been made in december, but none made it to HDFS
[09:22:29] elukey: I suspected the job would fail because of deploy issues (said that in standup last week)
[09:22:47] ack :)
[09:32:23] Ok job started and seems ok for now
[09:32:45] One weird thing though, related to the original failure: it was not mentioning cast issues (while it should have been ...)
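Note on the fetch-failure exchange (08:45:04 to 08:48:38): the arithmetic joal describes can be sketched roughly as below. This is only an illustration of the reasoning in the log, not the job's actual configuration; the function names are hypothetical, and the log does not say which Spark setting was bumped or to what value.

```python
# Hypothetical helpers, not taken from the log: they only model the reasoning
# at 08:46:40 (4 cores per executor means 4 tasks run per executor).

def concurrent_fetch_failures(cores_per_executor: int, lost_executors: int) -> int:
    """Upper bound on tasks that may fail to fetch at the same moment."""
    return cores_per_executor * lost_executors


def may_hit_limit(cores_per_executor: int, lost_executors: int, failure_limit: int) -> bool:
    """True if losing the given executors could reach the failure limit at once."""
    return concurrent_fetch_failures(cores_per_executor, lost_executors) >= failure_limit


if __name__ == "__main__":
    # One lost executor with 4 cores can already reach a limit of 4.
    print(may_hit_limit(4, 1, 4))
    # A higher limit (the value 8 is an assumption for illustration) leaves headroom.
    print(may_hit_limit(4, 1, 8))
```

Under these assumptions a single executor lost during a decom is enough to reach a limit of 4, which matches the decision to bump the number before the rerun.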
[09:46:09] Analytics, Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, and 3 others: track number of editors from other Wikimedia projects who also edit on Wikidata over time - https://phabricator.wikimedia.org/T193641 (JAllemandou) Hi @WMDE-leszek - core data has not been computed yet (usually done a...
[09:46:10] * elukey nods in an attempt to support joal despite his ignorance on the subject
[09:59:33] Analytics, Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, and 3 others: track number of editors from other Wikimedia projects who also edit on Wikidata over time - https://phabricator.wikimedia.org/T193641 (WMDE-leszek) Very much appreciated @JAllemandou!
[10:35:09] Quarry, I18n: Quarry cannot save queries with emojies - https://phabricator.wikimedia.org/T196153 (jcrespo) Regarding the second error- binary strings are not text, so they must be converted to python strings explicitly after driver execution. For the first, I can help- most likely there is no need to u...
[10:44:59] (CR) Joal: "Comments inline - Many thanks for working on this milimetric!" (6 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/480796 (https://phabricator.wikimedia.org/T210543) (owner: Joal)
[11:26:05] !log move hue/oozie/hive password handling from auto-load to role lookup in the puppet private repo
[11:26:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:26:24] this was a no-op but let me know if you see anything strange later on --^
[11:27:21] Analytics, Analytics-Kanban, Patch-For-Review: Set up a Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one. - https://phabricator.wikimedia.org/T212256 (elukey) a:elukey
[11:44:24] * elukey lunch!
[13:57:46] joal: agreed with everything you said, it was half-finished, sorry. I'll push another patch later, but going to work with fdans now
[13:58:29] no prob milimetric - Thanks a lot for the work :)
[13:59:09] it's a lot easier to nitpick something than come up with it :)
[13:59:19] (you came up with this, to be clear)
[14:01:38] fdans: ping me when you're around
[14:03:49] milimetric: holiday today!
[14:04:19] ah sorry fdans!
[14:04:28] enjoy holiday!!!
[14:04:44] fdans: how dare you! :D
[14:19:16] joal: can you tell me what machine in labs do you use to connect to the db replicas
[14:19:21] joal: hola
[14:19:27] Hi nuria
[14:20:00] Confirming my understanding nuria: From which machines do I connect to the labs-analytics DB, for instance for sqoop?
[14:20:14] joal: or just to look at labs db
[14:20:30] same nuria - I do it from stat1004 usually
[14:20:38] Can probably be done from stat1005
[14:21:55] joal: wait then i am missing the connection string cause the one i have is just for prod replicas
[14:22:00] joal: makes sense?
[14:22:03] yup
[14:23:25] nuria: from stat1004 I do: mysql -h labsdb-analytics -u s53272 -p
[14:32:43] Analytics, Analytics-Kanban, Operations, Patch-For-Review: an-coord1001 almost out of disk - https://phabricator.wikimedia.org/T212915 (elukey) All right so from now on the analytics systemd timers by default will not log into syslog/daemon.log, this should help preventing this issue again. Good...
[14:38:42] Many thanks for that elukey --^
[14:41:57] joal: hope that fixes it!
[15:13:52] joal: qq - is it better if I hold off tomorrow morning with the decom of the last 3 hadoop workers?
[15:18:45] Analytics, Analytics-Kanban, Operations, Patch-For-Review: an-coord1001 almost out of disk - https://phabricator.wikimedia.org/T212915 (herron) Open→Resolved a:herron Good to close! (can always re-open if we need to follow up) Thanks @Elukey!
[15:20:59] Analytics, Operations, decommission, ops-eqiad, and 2 others: Decommission analytics100[1,2] - https://phabricator.wikimedia.org/T205507 (elukey) a:elukey→RobH
[15:22:49] milimetric: question if you may
[15:22:54] milimetric: about
[15:22:57] https://www.irccloud.com/pastebin/K1APB1VM/
[15:23:42] nuria: yeah, what's up
[15:23:52] hey team :]
[15:23:55] hi mforns
[15:23:59] hi!
[15:24:11] milimetric: in the revision table there is no wiki_db right?
[15:24:37] milimetric: so that code is for the revision table "once sqooped"?
[15:24:41] nuria: there is in wmf_raw, the sqoop partitions by wiki_db
[15:24:53] milimetric: k
[15:24:55] yeah, this is the history reconstruction, running on top of the sqoop
[15:25:51] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Replace the Analytics HDFS/Yarn masters (hardware refresh) - https://phabricator.wikimedia.org/T203635 (elukey)
[15:25:56] Analytics, Operations, decommission, ops-eqiad, and 2 others: Decommission analytics100[1,2] - https://phabricator.wikimedia.org/T205507 (elukey) Open→Stalled a:RobH→elukey Didn't realize that the task was still assigned to me, apologies :) This is a good thing though since Analy...
[15:27:50] ottomata: thanks a lot for the review :)
[15:28:23] I am pretty sure that this change will require a looong back and forth with SRE, I have this feeling :D
[15:28:39] I should have started the conversation earlier on
[15:30:53] (CR) Ottomata: [C: +1] "Great! Tests would be good, but I'd be fine with waiting if/until we move this logic to somewhere more generic like HivePartition." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/477295 (https://phabricator.wikimedia.org/T210099) (owner: Mforns)
[15:31:17] elukey: aye indeed!
[15:31:33] ottomata, I added some tests already, they are there in the latest patch
[15:31:35] what do you think of the idea of making the extra param just an ssh key blacklist, rather than using it as more groups
[15:31:39] oh mforns sorry, thanks!
[15:31:42] np!
[15:31:46] I didn't click through the new files
[15:31:52] NICE
[15:31:56] big ol +1 from me
[15:32:02] (CR) Ottomata: [C: +2] Allow for custom transforms in DataFrameToDruid [analytics/refinery/source] - https://gerrit.wikimedia.org/r/477295 (https://phabricator.wikimedia.org/T210099) (owner: Mforns)
[15:32:06] why not +2 even
[15:32:09] :D
[15:33:08] thanks! :]
[15:34:02] ottomata: I like it, it seems easy enough to avoid changing too much code. Let's see what the SRE team thinks about it!
[15:34:16] k
[15:35:23] I am going to send an email to ops@ explaining everything, it might help speeding up the review process :)
[15:36:06] mforns: have you seen or are you working on https://hue.wikimedia.org/oozie/list_oozie_workflow/0034383-181112144035577-oozie-oozi-W/?coordinator_job_id=0034382-181112144035577-oozie-oozi-C ?
[15:36:10] or joal ^
[15:36:17] (denormalize failed)
[15:36:24] !log Restarted turnilo to clear deleted test datasource
[15:36:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:36:45] milimetric, looking!
[15:37:14] yea I saw that, will have a look
[15:37:58] (Merged) jenkins-bot: Allow for custom transforms in DataFrameToDruid [analytics/refinery/source] - https://gerrit.wikimedia.org/r/477295 (https://phabricator.wikimedia.org/T210099) (owner: Mforns)
[15:38:05] mforns: oh it's ok, I was going to look, just making sure I'm not duplicating effort
[15:38:26] milimetric, it's my ops week! I will look :]
[15:38:42] thanks
[15:40:15] oh mforns maybe joal killed that and restarted with this: https://hue.wikimedia.org/oozie/list_oozie_workflow/0069776-181112144035577-oozie-oozi-W/?coordinator_job_id=0000049-181009135629101-oozie-oozi-C
[15:40:59] AH merge! i forgot that our jenkins is auto merging +2s
[15:41:01] milimetric, yea, I can see a couple logs in SAL from joal
[15:41:18] mforns: i think jenkins merged that change...i hope that was ok
[15:41:26] ottomata, I think so
[15:41:36] milimetric: are the changes for https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/480796 needed to run the denormalize this month? seems like they might
[15:41:45] I'm giving it a last test in turnilo to be sure, but should be ok, I tested it several times
[15:42:28] nuria: no, that's what we were saying in standup on Friday
[15:42:42] that we'll wait to get this change right
[15:42:56] run this month with the compat views approach that ran last month
[15:43:13] milimetric: ah i see.
[15:44:19] milimetric: let me see, are we sqooping comment and actor from prod now?
[15:44:59] nuria: no, I can go over the plan with you in cave if you like
[15:46:16] milimetric: my internet is terrible (total miscalculation on my part, today is a holiday here and place i was going to work at is closed)
[15:46:20] milimetric: let me try to join
[15:46:53] nuria: ah, no prob, we can chat, but we have standup soon, so let's test your connection too
[15:49:01] Analytics, Operations, decommission, ops-eqiad, and 2 others: Decommission analytics100[1,2] - https://phabricator.wikimedia.org/T205507 (RobH) >>! In T205507#4859289, @elukey wrote: > Didn't realize that the task was still assigned to me, apologies :) > > Would it be feasible to keep these two h...
[15:51:01] ottomata, HiveToDruid Turnilo test succeeded, everything good with the merged code
[15:51:57] awesome
[15:53:12] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Replace the Analytics HDFS/Yarn masters (hardware refresh) - https://phabricator.wikimedia.org/T203635 (elukey)
[15:53:19] Analytics, Operations, decommission, ops-eqiad, and 2 others: Decommission analytics100[1,2] - https://phabricator.wikimedia.org/T205507 (elukey) Stalled→Open a:elukey→RobH Nevermind then, I can easily use only analytics1028->41, we are good to decom. Thanks!
[15:55:41] a-team: there is a power problem on rack A2 in eqiad, some hadoop workers are going down
[15:55:53] this probably will cause some jobs to fail etc..
[16:00:31] a-team: will be 5 mins late at standup, the el master db is down
[16:01:15] ping ottomata mforns
[16:01:15] ping ottomata and mforns
[16:01:18] holaaa
[16:01:24] standuppp ottomata
[16:02:04] !log stop eventlogging mysql consumers on eventlog1002 - db1107 down
[16:02:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:02:21] elukey: mediawiki-history job didn't fail :)
[16:02:42] * elukey dances
[16:04:18] PROBLEM - Check status of defined EventLogging jobs on eventlog1002 is CRITICAL: CRITICAL: Stopped EventLogging jobs: eventlogging-consumer@mysql-m4-master-00 eventlogging-consumer@mysql-eventbus
[16:04:26] OH UH
[16:04:41] this is me :)
[16:04:48] forgot to downtime
[16:04:52] my comment was for standup lateness..
[16:06:27] Analytics, Analytics-EventLogging, Analytics-Kanban: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453 (Ottomata)
[16:06:52] PROBLEM - HDFS corrupt blocks on an-master1001 is CRITICAL: 16 ge 5 https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=39&fullscreen
[16:07:08] yeah yeah
[16:07:22] elukey: nodes decom, right?
[16:07:40] joal: nope, we have some workers down, rack A2 went without power :(
[16:07:49] a day full of joy
[16:07:51] Aouch - ok
[16:08:03] elukey: please ping if there's anything I can help with
[16:15:03] Analytics, Operations, ops-eqiad: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 (Cmjohnson) I replaced the fuse on the wrong side initially and caused an outage. I then replaced the fuses on the correct phase and the power was not restored, I tried replacing them...
[16:21:59] all right so the workers down are up
[16:22:01] should be good
[16:22:32] the blocks are still in that state but there is also decom in progress, let's wait for it to finish and worst case we run hdfs fsck
[16:25:25] about the EL master - Jaime is rebooting the host again to check that everything comes back cleanly
[16:25:39] from the eventlog1002's logs it seems that nothing started screaming
[16:25:54] probably the TCP conns to db1107 were timing out
[16:26:18] in theory simply restarting the consumers when db1107 is up should suffice..
[16:26:21] ottomata: thoughts?
[16:27:35] elukey: that sounds right to me
[16:29:50] super
[16:35:06] (PS1) Joal: Update big spark jobs settings [analytics/refinery/source] - https://gerrit.wikimedia.org/r/482661
[16:35:17] Analytics, Analytics-Kanban: [Spike] Spark job for digests-only mediawiki-history-reduced - https://phabricator.wikimedia.org/T212928 (Milimetric) p:Triage→High
[16:36:07] (CR) Joal: "Settings tested on cluster through CLI." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/482661 (owner: Joal)
[16:36:52] Analytics, Analytics-Data-Quality, Tool-Pageviews: Anomalous statistics results in eu.wikipedia siteviews - https://phabricator.wikimedia.org/T212879 (Milimetric) p:Triage→High
[16:39:18] (PS1) Joal: Use spark dynamic allocation in mediawiki-history [analytics/refinery] - https://gerrit.wikimedia.org/r/482663
[16:39:59] RECOVERY - Check status of defined EventLogging jobs on eventlog1002 is OK: OK: All defined EventLogging jobs are runnning.
[16:44:35] Analytics, Core Platform Team, EventBus, WMF-JobQueue, Wikimedia-production-error: EventBus error "Unable to deliver all events: (curl error: 28) Timeout was reached" - https://phabricator.wikimedia.org/T204183 (Ottomata) Hm, ok. That is a lot. The timeout issue is a known (and was difficul...
[16:44:50] Analytics, Analytics-SWAP, Contributors-Analysis, Product-Analytics: Provide Python 3.6 on SWAP - https://phabricator.wikimedia.org/T212591 (Milimetric) This isn't just a simple upgrade, so we're taking it as normal priority and it won't be done in the immediate future
[16:45:46] Analytics, Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (Milimetric) p:Normal→Low
[16:47:13] Analytics, Analytics-SWAP, Contributors-Analysis, Product-Analytics: Provide Python 3.6 on SWAP - https://phabricator.wikimedia.org/T212591 (Nuria) It requires to recompile all deps to 3.6 (including jupyter itself) so it is an upgrade of the whole platform that might not be trivial
[16:47:24] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream: Move KafkaSSE development from Differential to Gerrit - https://phabricator.wikimedia.org/T212420 (Milimetric)
[16:47:56] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream: Move KafkaSSE development from Differential to Gerrit - https://phabricator.wikimedia.org/T212420 (Ottomata) For this particular project, I'd like to move to github. This is a generic library that is non mediawiki specific. I'll try to...
[16:50:03] (PS1) Joal: Update wikidata-coeditor job data dependency [analytics/refinery] - https://gerrit.wikimedia.org/r/482664 (https://phabricator.wikimedia.org/T193641)
[16:50:06] Analytics, Contributors-Analysis, Product-Analytics, Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (Milimetric) p:Normal→High
[16:50:14] Analytics, Operations, ops-eqiad, User-Elukey: Degraded RAID on analytics1054 - https://phabricator.wikimedia.org/T213038 (Cmjohnson) The disk at slot 1 is failed, the server is out of warranty but I do have a spare 4TB SATA. cmjohnson@analytics1054:~$ sudo megacli -PDList -aALL |grep "Firmware...
[16:50:28] Analytics, Contributors-Analysis, Product-Analytics, Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (Milimetric) a:Ottomata
[16:51:15] Analytics, Contributors-Analysis, Product-Analytics: Start refining all blacklisted EventLogging streams - https://phabricator.wikimedia.org/T212355 (Ottomata) Ok! So just like the Edit schema data (which was migrated to EditAttemptStep), these are blacklisted because of schema nastiness or incompat...
[16:51:26] Analytics, Operations, ops-eqiad, User-Elukey: Degraded RAID on analytics1054 - https://phabricator.wikimedia.org/T213038 (Cmjohnson) @elukey the disk still shows failed do you have to manually add it back?
[16:51:52] Analytics, Analytics-Kanban: Create staging domain for turnilo to test config changes - https://phabricator.wikimedia.org/T212958 (Milimetric) p:Normal→High
[16:53:58] Analytics, Analytics-Kanban: Clean up staging db - https://phabricator.wikimedia.org/T212493 (jcrespo) May I suggest a different route? Let's migrate the mediawiki replicated tables first- then migrate the staging ones on a per-case basis. After all, it makes no sense to copy them to, which of the new se...
[16:54:12] Analytics, Analytics-Kanban, Operations, Patch-For-Review: Allow the deployment of users without SSH access - https://phabricator.wikimedia.org/T212949 (Milimetric) p:Normal→High
[16:55:40] Analytics, EventBus, Operations, Core Platform Team Backlog (Watching / External), Services (watching): eventbus should send statsd in batches - https://phabricator.wikimedia.org/T141524 (Milimetric) Open→Declined won't fix this because we're working on a new implementation
[16:56:03] Analytics, Core Platform Team, EventBus, WMF-JobQueue, Wikimedia-production-error: EventBus error "Unable to deliver all events: (curl error: 28) Timeout was reached" - https://phabricator.wikimedia.org/T204183 (Pchelolo) As an easy mitigation attempt, we could try to increase the HTTP timeou...
[16:56:25] Analytics, Analytics-Cluster, DBA, Operations: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (Milimetric) @Dzahn wikimetrics is going to be sunset this quarter, so you won't have to worry about that any more.
[16:57:09] Analytics, EventBus, Operations, Core Platform Team Backlog (Watching / External), Services (watching): eventbus should send statsd in batches - https://phabricator.wikimedia.org/T141524 (Pchelolo) And the new implementation is based on #service-runner which batch stats by default.
[16:57:18] Analytics, Analytics-Cluster, DBA, Operations: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (Ottomata) OK FINE I'LL DO IT
[16:57:29] Analytics, Analytics-Cluster, Analytics-Kanban, DBA, Operations: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (Milimetric) a:Ottomata
[16:57:37] Analytics, Analytics-Cluster, Analytics-Kanban, DBA, Operations: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (Milimetric) p:Normal→High
[17:01:53] Analytics, Analytics-Data-Quality, Growth-Team, Product-Analytics, Patch-For-Review: Add EditAttemptStep properties to the schema whitelist - https://phabricator.wikimedia.org/T208332 (Milimetric) p:Triage→High
[17:01:58] Analytics, Analytics-Data-Quality, Growth-Team, Product-Analytics, Patch-For-Review: Add EditAttemptStep properties to the schema whitelist - https://phabricator.wikimedia.org/T208332 (Milimetric) a:nettrom_WMF→mforns
[17:03:32] Analytics, Operations, ops-eqiad, User-Elukey: Degraded RAID on analytics1054 - https://phabricator.wikimedia.org/T213038 (elukey) >>! In T213038#4859708, @Cmjohnson wrote: > @elukey the disk still shows failed do you have to manually add it back? Sorry Chris didn't get the question - do you mea...
[17:03:40] a-team: green light for deployments etc..
[17:03:47] !log re-enabled eventlogging mysql consumers
[17:03:47] \o/ !
[17:03:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:03:49] elukey, cool thanks!
[17:03:53] Thanks elukey :)
[17:09:28] Analytics, Analytics-SWAP, Contributors-Analysis, Product-Analytics: Provide Python 3.6 on SWAP - https://phabricator.wikimedia.org/T212591 (Ottomata) The extra tricky thing about this SWAP setup is ALL user jupyterhub virtualenvs will have to be reinstalled. This means that every user of SWAP wi...
[17:10:56] Analytics, Core Platform Team, EventBus, WMF-JobQueue, Wikimedia-production-error: EventBus error "Unable to deliver all events: (curl error: 28) Timeout was reached" - https://phabricator.wikimedia.org/T204183 (Ottomata) Not a bad idea!
[17:13:25] Analytics, Product-Analytics: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (Milimetric) a:Neil_P._Quinn_WMF Assigning this to @Neil_P._Quinn_WMF to provide us the definition of the simplified version.
[17:14:37] Analytics, Operations, ops-eqiad, User-Elukey: Degraded RAID on analytics1054 - https://phabricator.wikimedia.org/T213038 (elukey) a:Cmjohnson
[17:15:26] Analytics, Operations, ops-eqiad, User-Elukey: Degraded RAID on analytics1054 - https://phabricator.wikimedia.org/T213038 (Cmjohnson) @elukey sorry, i replaced the disk and it is still showing failed, I don't know if the disk needs to be manually added back to the array?
[17:16:00] Analytics, Contributors-Analysis, Product-Analytics: Set up automated email to report completion of mediawiki_history snapshot and Druid loading - https://phabricator.wikimedia.org/T206894 (Milimetric) p:High→Low ping @Neil_P._Quinn_WMF, setting to low priority for now
[17:16:46] Analytics: [EventLoggingToDruid] Add support for ingesting subfields of map columns - https://phabricator.wikimedia.org/T208589 (Milimetric) p:High→Normal
[17:19:36] Analytics: Hide unavailable metrics from dashboard - https://phabricator.wikimedia.org/T204717 (Milimetric) p:High→Normal
[17:20:16] Analytics, Analytics-Wikistats: Beta: Provide easier mapping between Wikistats1 metrics and Wikistats2 metrics (example: "active editors") - https://phabricator.wikimedia.org/T187806 (Milimetric) p:High→Normal
[17:20:36] (PS1) Addshore: Add build for deployment [analytics/wmde/toolkit-analyzer-build] - https://gerrit.wikimedia.org/r/482668 (https://phabricator.wikimedia.org/T209399)
[17:20:45] (CR) Addshore: [V: +2 C: +2] Add build for deployment [analytics/wmde/toolkit-analyzer-build] - https://gerrit.wikimedia.org/r/482668 (https://phabricator.wikimedia.org/T209399) (owner: Addshore)
[17:20:51] (Merged) jenkins-bot: Add build for deployment [analytics/wmde/toolkit-analyzer-build] - https://gerrit.wikimedia.org/r/482668 (https://phabricator.wikimedia.org/T209399) (owner: Addshore)
[17:21:07] !log Manually repair hive table and add _PARTITIONED flag to project_namespace_map
[17:21:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:22:01] Analytics, Operations, ops-eqiad: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 (RobH) >>! In T212861#4859499, @Cmjohnson wrote: > 1. Do we want to leave these servers with non-redundant power until we can replace the PDU with a new one that should be ordered soon?...
[17:23:57] (CR) Mforns: [V: +2 C: +2] Update wikidata-coeditor job data dependency [analytics/refinery] - https://gerrit.wikimedia.org/r/482664 (https://phabricator.wikimedia.org/T193641) (owner: Joal)
[17:27:19] Analytics, ChangeProp, EventBus, RESTBase, and 5 others: Support change propagation for private wikis - https://phabricator.wikimedia.org/T137140 (Milimetric) @Aklapper: any idea why Herald is adding Analytics here? I tried in vain to search for any rules that would apply.
[17:36:11] ;la la la la la
[17:36:41] * elukey is happy to see addshore singing
[17:36:45] :D
[17:36:55] I hope it is a sign of joy and not craziness after a working day :D
[17:37:01] hahahha
[17:53:43] Analytics, Operations, ops-eqiad, User-Elukey: Degraded RAID on analytics1054 - https://phabricator.wikimedia.org/T213038 (elukey) @Cmjohnson so I got a different than usual output from: ` elukey@analytics1054:~$ sudo megacli -PDList -aAll | grep Firm Firmware state: Online, Spun Up Device Firmw...
[17:54:45] nuria: when were we thinking analytics offsite again?
[17:59:30] * elukey off!
[18:22:55] Analytics, Operations, ops-eqiad: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 (jcrespo) I am creating a subtask to fix db1082, which may have to be reimaged because the power loss.
[18:23:38] Analytics, Operations, ops-eqiad: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 (jcrespo) ^CC @Marostegui so you know why db1082 + db1124 + labsdb replication (s5) are broken or stopped
[18:27:15] (CR) Ottomata: [C: +1] Use spark dynamic allocation in mediawiki-history [analytics/refinery] - https://gerrit.wikimedia.org/r/482663 (owner: Joal)
[18:33:10] Analytics, Analytics-Kanban, Operations, Patch-For-Review: Allow the deployment of users without SSH access - https://phabricator.wikimedia.org/T212949 (Ottomata)
[18:33:49] Analytics, Analytics-Kanban, Operations, Patch-For-Review: Allow the deployment of users without SSH access - https://phabricator.wikimedia.org/T212949 (Ottomata)
[18:34:01] Analytics, Analytics-EventLogging, EventBus, Core Platform Team Backlog (Watching / External), Services (watching): Modern Event Platform: Stream Intake Service: Implementation: Deployment Pipeline - https://phabricator.wikimedia.org/T211247 (Ottomata)
[18:45:39] Analytics, Analytics-EventLogging, EventBus, Core Platform Team Backlog (Watching / External), Services (watching): Modern Event Platform: Stream Intake Service: Implementation: Deployment Pipeline - https://phabricator.wikimedia.org/T211247 (Ottomata) Saving some notes here I took while disc...
[18:52:20] Analytics, Operations, ops-eqiad, User-Elukey: Degraded RAID on analytics1054 - https://phabricator.wikimedia.org/T213038 (elukey) ` Enclosure Device ID: 32 Slot Number: 1 Drive's position: DiskGroup: 2, Span: 0, Arm: 0 Enclosure position: 1 Device Id: 1 WWN: 500003964b700233 Sequence Number: 3 M...
[18:58:22] Analytics, Analytics-Kanban: Create staging domain for turnilo to test config changes - https://phabricator.wikimedia.org/T212958 (elukey) p:High→Normal When we discussed this use case I was not aware (shame! shame!) about using SSH -L in the following way: ` ssh -N analytics-tool1002.eqiad.wmne...
[18:59:23] Analytics, Analytics-Kanban, User-Elukey: Create staging domain for turnilo to test config changes - https://phabricator.wikimedia.org/T212958 (elukey)
[19:00:07] nuria: --^
[19:00:18] if you are ok I'll update the docs and call it done (tomorrow)
[19:00:40] going to dinner, o/
[19:07:40] mforns: do you remember when were thinking for analytics offsite?
[19:07:57] ottomata, I think it was end of may or june
[19:08:15] (CR) Ottomata: [C: +1] "oook!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/482661 (owner: Joal)
[19:08:29] SRE is looking at doing an offsite in june, maybe in the US
[19:08:34] I believe fdans was in Texas until the 25th of may
[19:08:55] oh we have staff tomorrow i guess we will discuss
[19:09:03] ottomata, but I can not make it during 8th to 13th of june
[19:09:07] yea
[19:11:24] ok
[19:12:33] Analytics, Contributors-Analysis, Product-Analytics, Epic: Support all Product Analytics data needs in the Data Lake - https://phabricator.wikimedia.org/T212172 (Ottomata) a:Ottomata→None
[19:16:56] (PS1) Mforns: Update changelog.md for v0.0.83 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/482684
[19:17:05] (CR) Mforns: [V: +2 C: +2] Update changelog.md for v0.0.83 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/482684 (owner: Mforns)
[19:46:49] !log Deployed refinery-source using jenkins
[19:46:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:48:29] !log merging change to make rsync server modules pull only - T205157 , T205152
[19:48:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:02:44] (PS1) Joal: Correct wikidata-coeditors data dependencies (2nd) [analytics/refinery] - https://gerrit.wikimedia.org/r/482692 (https://phabricator.wikimedia.org/T193641)
[20:02:56] mforns: I'm sorry I just discovered that --^
[20:03:12] mforns: Let's wait until next deploy to restart wikidata-coeditors :S
[20:03:18] joal, refinery still not deployed :]
[20:03:23] \o/ !
[20:03:40] joal, I was wondering if we need to bump up any jars for specific jobs?
[20:03:51] interesting question mforns :)
[20:03:58] I think we do
[20:04:16] mforns: I let you review that one and during that I make my list ;)
[20:04:42] ok
[20:06:13] joal, if my deployment notes can help: https://pastebin.com/0abnG63B
[20:07:27] joal: interestingly weird thing, getting top articles for the whole year for a bunch of wikis only took 45 minutes: https://phabricator.wikimedia.org/T211827#4847998
[20:07:48] maybe that means we can add that to the api?
[20:09:10] milimetric: I'm not too surprised :)
[20:10:48] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/482692 (https://phabricator.wikimedia.org/T193641) (owner: Joal)
[20:10:56] milimetric: 1 year of pageviews is ~3days of webrequests in terms of datasize
[20:11:21] joal: I thought originally the query to crunch yearly tops was too big for the cluster
[20:12:06] milimetric: cluster also has got bigger :)
[20:12:31] that's what I meant, that it seems ok to run this job now
[20:12:39] milimetric: very much :)
[20:12:42] I'll update the associated task and maybe put it back in incoming
[20:12:45] we could do that
[20:13:38] Ah milimetric - question for you
[20:13:57] Analytics, Pageviews-API: Yearly endpoint for the /pageviews/top API - https://phabricator.wikimedia.org/T154381 (Milimetric) @Tbayer recently showed that performance of this query is pretty good: T211827#4847998, so we should re-consider this. Editing and prepping for grooming again.
[20:14:04] milimetric: have you computed the top using a windowing function?
[20:14:36] Analytics, Pageviews-API: Yearly endpoint for the /pageviews/top API - https://phabricator.wikimedia.org/T154381 (Milimetric)
[20:15:19] this is the query that was fast: https://phabricator.wikimedia.org/P7945
[20:15:57] I don't see anything fancy there
[20:16:08] milimetric: prefiltering is the trick :)
[20:16:10] Analytics, Analytics-Cluster, Analytics-Kanban, DBA, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (Ottomata) @Dzahn https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482693/
[20:16:26] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream: Move KafkaSSE development from Differential to Github - https://phabricator.wikimedia.org/T212420 (Ottomata)
[20:16:58] the ns0 filter, yeah, maybe, I mean, if we have to do that in the API, we could offer it as a parameter, might help that endpoint be more relevant in general
[20:17:03] milimetric: prefiltering for ns0view > 100 makes the results drastically smaller, therefore allowing to get windowing not failing
[20:17:31] milimetric: I was thinking in term of performance, not in term of functionality :)
[20:17:34] But why not
[20:17:57] mforns: I'm ready to talk with you about jobs to update :)
[20:18:04] ok! to the bc?
[20:18:09] OMW!
[20:21:26] Analytics, Pageviews-API: Yearly endpoint for the /pageviews/top API - https://phabricator.wikimedia.org/T154381 (Milimetric)
[20:23:22] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream, Patch-For-Review: Move KafkaSSE development from Differential to Github - https://phabricator.wikimedia.org/T212420 (Ottomata) Docs updated here: https://wikitech.wikimedia.org/wiki/EventStreams/Administration#Deployment
[20:25:19] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream, Patch-For-Review: Move KafkaSSE development from Differential to Github - https://phabricator.wikimedia.org/T212420 (Ottomata) The next time we build/deploy EventStreams, KafkaSSE from diffusion will no longer be used. Can we close...
[20:26:25] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 4 others: Prototype in node intake service - https://phabricator.wikimedia.org/T206815 (Ottomata)
[20:26:45] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream, Patch-For-Review: Move KafkaSSE development from Differential to Github - https://phabricator.wikimedia.org/T212420 (Ottomata)
[20:27:43] Analytics, Analytics-Kanban, Patch-For-Review: Presto cluster online and usable with test data pushed from analytics prod infrastructure accessible by Cloud (labs) users - https://phabricator.wikimedia.org/T204951 (Ottomata)
[20:28:53] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 2 others: Modern Event Platform: Stream Intake Service: Implementation: Deployment Pipeline - https://phabricator.wikimedia.org/T211247 (Ottomata)
[20:29:46] Analytics, Pageviews-API: Yearly endpoint for the /pageviews/top API - https://phabricator.wikimedia.org/T154381 (MusikAnimal) I meant to say this earlier: P7945 did not give me any results for 2016 (adjusting only the `year=` clause at https://phabricator.wikimedia.org/P7945$10). I tried three times. It...
[20:29:53] Analytics, Analytics-Wikistats: Wikistats New Feature - DB size - https://phabricator.wikimedia.org/T212763 (Milimetric) @TheSandDoctor I had read the page, and that section, which indeed itself says size is considered in many ways. My question is, which specific measurement do you think we should add t...
[20:29:58] chasemp: o/
[20:30:10] bump on https://phabricator.wikimedia.org/T208251 when you get a chance, how do we get that moving?
[20:30:25] we'd like to deploy this service this quarter, and a security review of this thing would be good before we do that
[20:32:21] Analytics, Pageviews-API: Yearly endpoint for the /pageviews/top API - https://phabricator.wikimedia.org/T154381 (Milimetric) @MusikAnimal that's because the `namespace_id` field was added later, so the first CTE would just be empty with the `>= 100` filter.
[20:36:28] Wow mforns - Just realized we forgot a complete pan of stuff :) Batcave again?
[20:36:35] joal, sure!
[20:41:00] Analytics, Fundraising-Backlog: Identify source of discrepancy between HUE query in Count of event.impression and druid queries via turnilo/superset - https://phabricator.wikimedia.org/T204396 (AndyRussG)
[20:41:02] Analytics, Pageviews-API: Yearly endpoint for the /pageviews/top API - https://phabricator.wikimedia.org/T154381 (Tbayer) >>! In T154381#4860630, @Milimetric wrote: > @MusikAnimal that's because the `namespace_id` field was added later, so the first CTE would just be empty with the `>= 100` filter. Inde...
[20:41:10] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream, Patch-For-Review: Move KafkaSSE development from Differential to Github - https://phabricator.wikimedia.org/T212420 (Legoktm) >>! In T212420#4859662, @Ottomata wrote: > For this particular project, I'd like to move to github. This i...
[20:45:02] Analytics, Pageviews-API: Yearly endpoint for the /pageviews/top API - https://phabricator.wikimedia.org/T154381 (MusikAnimal) Eek, so the data I have for 2017 may also be a little off? I see T156993 was resolved in February.
[20:48:28] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream, Patch-For-Review: Move KafkaSSE development from Differential to Github - https://phabricator.wikimedia.org/T212420 (Ottomata) Most of the time I'm mostly unopinionated and am all for gerrit, especially when the target audience of the...
[20:56:37] mforns: we can kill webrequest-load for upload now if you want - Shall I do that?
[20:56:45] joal, sure :]
[20:56:48] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream, Patch-For-Review: Move KafkaSSE development from Differential to Github - https://phabricator.wikimedia.org/T212420 (Reedy) >>! In T212420#4860661, @Ottomata wrote: > But for softwares that are very generic and intended to be used wi...
[20:57:13] (PS1) Mforns: Update job-specific jars to v0.0.83 [analytics/refinery] - https://gerrit.wikimedia.org/r/482702
[20:57:18] joal, ^
[20:59:08] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream, Patch-For-Review: Move KafkaSSE development from Differential to Github - https://phabricator.wikimedia.org/T212420 (Ottomata) > Are you likely to get many contributors? You never know! But for KafkaSSE, probably not. :)
[20:59:12] thanks!
[20:59:20] np :)
[21:00:41] (CR) Joal: [V: +2 C: +2] "LGTM - Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/482702 (owner: Mforns)
[21:01:29] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream, Patch-For-Review: Move KafkaSSE development from Differential to Github - https://phabricator.wikimedia.org/T212420 (Paladox) T37497#4860677 - We can install the GitHub plugin (which gerrithub) uses to pull requests in. So users cont...
[21:02:06] mforns: One job we didn't talk about that need to be-killed-restart: wikidata-Coeditor - Moving the task from CR to ready
[21:02:22] joal, ok!
[21:02:42] Thanks mforns - Heavy deploy this week !
[21:02:53] :] thank *you*
[21:03:50] Analytics, Analytics-Kanban, DBA, User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (Neil_P._Quinn_WMF) >>! In T212487#4839932, @elukey wrote: > `datasets` seems indeed not us...
[21:05:03] !log Starting deployment of refinery using scap and refinery-deploy-to-hdfs
[21:05:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:07:53] ottomata: that whole process is getting setup right now, for context there were 23 in backlog and we sorted out actionables on 13 today but this wasn't one. I'm making note and will ping folks here who would do the review (probably brian or sam) and get this moving in the next week or two hopefully. In theory there is a task template to use now but I'm not sure about it.
[21:08:02] tldr it's not you, it's us I'm sure
[21:08:56] Analytics, Analytics-EventLogging, EventBus, Security-Team, and 3 others: T206785: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (chasemp) @charlotteportero do we have everything we need here to assign this at the next meeting?...
[21:11:00] chasemp: thanks for the update much obliged :)
[21:14:30] mforns: killed webrequest text as well as the bundle - You can restart the bundle with start hour 20:00
[21:14:50] joal, sure will do once refinery is deployed
[21:15:02] great :)
[21:15:07] thx
[21:15:57] Analytics, Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, and 3 others: track number of editors from other Wikimedia projects who also edit on Wikidata over time - https://phabricator.wikimedia.org/T193641 (JAllemandou) Bug found and corrected (patches above). Data is available now and t...
[21:23:10] (CR) Milimetric: [WIP] Update mediawiki-history comment and actor joins (7 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/480796 (https://phabricator.wikimedia.org/T210543) (owner: Joal)
[21:24:35] !log Finished deployment of refinery using scap and refinery-deploy-to-hdfs, proceeding to restart oozie jobs
[21:24:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:25:20] (PS5) Milimetric: Update mediawiki-history comment and actor joins [analytics/refinery/source] - https://gerrit.wikimedia.org/r/480796 (https://phabricator.wikimedia.org/T210543) (owner: Joal)
[21:28:23] Analytics, Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, and 3 others: track number of editors from other Wikimedia projects who also edit on Wikidata over time - https://phabricator.wikimedia.org/T193641 (Addshore) I guess we can verify this after the next run!
[21:28:56] (CR) jerkins-bot: [V: -1] Update mediawiki-history comment and actor joins [analytics/refinery/source] - https://gerrit.wikimedia.org/r/480796 (https://phabricator.wikimedia.org/T210543) (owner: Joal)
[21:30:09] Analytics, Analytics-Kanban, Phabricator, Wikimedia-Stream, Patch-For-Review: Move KafkaSSE development from Differential to Github - https://phabricator.wikimedia.org/T212420 (Ottomata) If something like ^ worked nicely, my pro github arguments would be all moot and I'd be fine with gerrit a...
[21:36:55] joal, webrequest load bundle restarted, looks good
[21:43:45] \o/ mforns :)
[21:44:13] hm hiccup
[21:46:05] mediawiki-history-denormalize-coord restarted
[21:47:01] mforns_: Oh by the way - please log about restarts - Can help troubleshoot later on
[21:51:24] mediawiki-history-load-coord restarted
[21:55:54] webrequest-druid-hourly-coord restarted
[21:57:37] webrequest-druid-daily-coord restarted
[21:59:54] clickstream-coord restarted
[22:01:44] wikidata-coeditors_metrics-coord restarted
[22:02:01] !log Finished to restart oozie jobs after refinery deployment
[22:02:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:08:46] Analytics, Analytics-Kanban, Readers-Web-Backlog, Patch-For-Review: Print schema is whitelisting both session ids and page ids - https://phabricator.wikimedia.org/T209050 (pmiazga) @ovasileva @Tbayer looks like this task is done and we can resolve it, right?
[22:21:54] Analytics, Analytics-Data-Quality, Growth-Team, Product-Analytics, Patch-For-Review: Add EditAttemptStep properties to the schema whitelist - https://phabricator.wikimedia.org/T208332 (Neil_P._Quinn_WMF) We're waiting on Analytics to figure out what's going on here 😁
[22:22:44] ottomata, yt? It would be good to merge this patch: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/479847/
[22:23:07] ah cool mforns! can we do tomorrow?
[22:23:08] otherwise, tomorrow HiveToDruid is going to fail
[22:23:10] i'm about to sign off
[22:23:11] oh.
[22:23:14] did you update the jar?
[22:23:15] ok, no prob
[22:23:19] ottomata, yes
[22:23:19] in the cron?
[22:23:22] yes
[22:23:26] it's in the change
[22:23:51] there are two more changes...
[22:23:53] oh, but it won't fail
[22:23:54] right?
[22:23:59] since it will still use 0.0.79
[22:24:06] until we merge, right?
[22:24:25] hmmmm, makes sense...
[22:24:42] ok, will leave the 3 puppet patches prepared for tomorrow :]
[22:24:43] Analytics, Contributors-Analysis, Product-Analytics: Start refining all blacklisted EventLogging streams - https://phabricator.wikimedia.org/T212355 (Neil_P._Quinn_WMF) >>! In T212355#4859706, @Ottomata wrote: > Ok! So just like the Edit schema data (which was migrated to EditAttemptStep), these are...
[22:25:43] ok mforns sounds good,
[22:25:49] thanks!
[22:25:53] OH i have a dentist appointment tomorrow 10am my time
[22:25:59] i might make standup, not sure....
[22:26:03] will send escrum
[22:26:08] no prob, luca can merge
[22:26:18] k
[22:27:15] k
[22:27:51] Analytics, Core Platform Team, EventBus, WMF-JobQueue, and 2 others: EventBus error "Unable to deliver all events: (curl error: 28) Timeout was reached" - https://phabricator.wikimedia.org/T204183 (Pchelolo) > When searching on Logstash/mediawiki-errors for Unable to deliver all events the vast m...
[22:31:15] Analytics, Analytics-SWAP, Contributors-Analysis, Product-Analytics: Provide Python 3.6 on SWAP - https://phabricator.wikimedia.org/T212591 (Neil_P._Quinn_WMF) >>! In T212591#4859833, @Ottomata wrote: > The extra tricky thing about this SWAP setup is ALL user jupyterhub virtualenvs will have to b...
[22:33:30] Analytics, Analytics-SWAP, Contributors-Analysis, Product-Analytics: Provide Python 3.6 on SWAP - https://phabricator.wikimedia.org/T212591 (Ottomata) Hmm, I'm not sure if it would work, but it might. We should be able to at least install the python 3.6 binary (we have deb packages for it now :...
[22:44:48] bye team
[22:55:01] RECOVERY - HDFS corrupt blocks on an-master1001 is OK: (C)5 ge (W)2 ge 1 https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=39&fullscreen
[23:08:35] (PS1) GoranSMilovanovic: pyspark: new etl + refactor engine [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/482733
[23:09:14] (CR) GoranSMilovanovic: [V: +2 C: +2] pyspark: new etl + refactor engine [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/482733 (owner: GoranSMilovanovic)
[23:13:46] (PS1) GoranSMilovanovic: ignore params [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/482734
[23:14:14] (CR) GoranSMilovanovic: [V: +2 C: +2] ignore params [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/482734 (owner: GoranSMilovanovic)