[01:05:17] PROBLEM - Throughput of EventLogging NavigationTiming events on einsteinium is CRITICAL: 0 le 0 https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=6&fullscreen&orgId=1
[06:14:53] morningggg
[07:09:02] Analytics-Kanban, Patch-For-Review: Eventlogging's processors stopped working - https://phabricator.wikimedia.org/T200630 (Volans) >>! In T200630#4460196, @Nuria wrote: > Length of UA at fault is 1000 chars and regular UAs are about 200 chars, how about not running through UA parser anything bigger tha...
[07:13:03] hey elukey :]
[07:13:09] morniinng
[07:13:54] hola :)
[07:14:17] if you haven't seen https://phabricator.wikimedia.org/T200630 we have eventlogging down since saturday
[07:14:40] reading
[07:14:52] we can bc if you want
[07:24:09] elukey, done, yes, please, batcave?
[07:24:46] ack!
[08:08:24] RECOVERY - Check status of defined EventLogging jobs on eventlog1002 is OK: OK: All defined EventLogging jobs are runnning.
[08:09:04] RECOVERY - Throughput of EventLogging NavigationTiming events on einsteinium is OK: (C)0 le (W)1 le 1.434 https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=6&fullscreen&orgId=1
[08:30:47] Analytics-Kanban, Patch-For-Review: Eventlogging's processors stopped working - https://phabricator.wikimedia.org/T200630 (elukey) p:Unbreak!>High
[10:43:03] * elukey afk for a couple of hours!
[13:42:56] Analytics-Kanban, Patch-For-Review: Eventlogging's processors stopped working - https://phabricator.wikimedia.org/T200630 (elukey) Opened https://github.com/ua-parser/uap-core/issues/332 to upstream.
[14:02:41] PROBLEM - eventlogging Varnishkafka log producer on cp1075 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[14:03:57] cp1075 is having issues --^
[14:04:05] (so not only a vk problem)
[14:04:42] RECOVERY - eventlogging Varnishkafka log producer on cp1075 is OK: PROCS OK: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/eventlogging.conf
[14:08:17] Quarry: Add a possibility to delete a draft - https://phabricator.wikimedia.org/T135908 (Halfak) Right. I think it makes a lot of sense to *archive* old queries so that they don't clutter up useful, but not-worth-publishing queries. Given that queries don't take up a huge amount of drive space, it seems th...
[14:39:47] Analytics-Kanban, Patch-For-Review: Eventlogging's processors stopped working - https://phabricator.wikimedia.org/T200630 (elukey) Incident report started in https://wikitech.wikimedia.org/wiki/Incident_documentation/20180728-eventlogging
[15:55:02] Analytics, Analytics-Cluster, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Can't write from Spark to local FS - https://phabricator.wikimedia.org/T200609 (Milimetric) hi Goran, what are you trying to accomplish? It looks like somehow it was trying to write to "/" which makes sense is acce...
[15:57:17] Analytics, Analytics-Wikistats: [Wikistats 2] Provide metric area headings in the 'Explore Topics' dropdown - https://phabricator.wikimedia.org/T200498 (fdans) p:Triage>Low
[15:59:24] Analytics, Analytics-Wikistats: [Wikistats 2] Provide metric area headings in the 'Explore Topics' dropdown - https://phabricator.wikimedia.org/T200498 (mforns) @sahil505 Let's concentrate on the popups fix, and then if we have time, we can work on this. Is this OK?
[15:59:49] Analytics, Analytics-Wikistats: [Wikistats 2] Bug in time-range selector on detail page - https://phabricator.wikimedia.org/T200497 (Milimetric) p:Triage>High
[16:00:02] Analytics, Analytics-Wikistats: [Wikistats 2] Bug in time-range selector on detail page - https://phabricator.wikimedia.org/T200497 (Milimetric) a:mforns
[16:00:42] Analytics, Analytics-Wikistats: [Wikistats 2] Provide metric area headings in the 'Explore Topics' dropdown - https://phabricator.wikimedia.org/T200498 (sahil505) Absolutely. No problem at all.
[17:03:53] (PS1) Mforns: Add saltrotate, a script that manages cryptographic salts [analytics/refinery] - https://gerrit.wikimedia.org/r/449249 (https://phabricator.wikimedia.org/T199899)
[17:06:42] (CR) Mforns: [C: -1] "Still needs testing" [analytics/refinery] - https://gerrit.wikimedia.org/r/449249 (https://phabricator.wikimedia.org/T199899) (owner: Mforns)
[17:40:27] * elukey off!
[18:09:21] milimetric, yt?
[18:09:56] hi mforns, wassup
[18:10:36] milimetric, hey :] I added a couple methods to HdfsUtils in refinery, and can not manage to make the script that I'm writing to access that
[18:10:47] do I need to setup python in refinery?
[18:10:58] I mean, setup install
[18:11:18] it's probably executing from another path
[18:11:45] mforns: did you do the export PYTHON_PATH thing (not sure about the spelling)?
[18:12:04] no, forgot
[18:12:11] I'll point it at my copy
[18:12:41] yeah, that's the trick
[18:13:00] yessss, thanks :]
[18:20:58] milimetric, another quick question:
[18:21:33] is it possible with logging_setup.py to output INFO and DEBUG logs to stdout?
[18:21:43] or only file?
[18:24:02] not with the current way it's set up, mforns, you'd have to change the code to take another parameter or something, that would be mutually exclusive with log_file
[18:24:36] milimetric, ok I'm not sure I need this though
[18:24:37] right now it either sends INFO and DEBUG to log_file or doesn't configure a handler, so I guess then it would have the default handler
[18:24:54] try it without the log_file parameter, that should be fine
[18:25:46] yes, I was testing it by hand and wanted to see the logs, will pass a file name
[18:25:57] actually, this means that I need to add a log_file parameter to the script
[18:27:43] yeah. I think that's good, all of our scripts logging the same seems easier to deal with
[19:07:40] (PS2) Mforns: Add saltrotate, a script that manages cryptographic salts [analytics/refinery] - https://gerrit.wikimedia.org/r/449249 (https://phabricator.wikimedia.org/T199899)
[19:13:10] (PS3) Mforns: Add saltrotate, a script that manages cryptographic salts [analytics/refinery] - https://gerrit.wikimedia.org/r/449249 (https://phabricator.wikimedia.org/T199899)
[19:13:36] (CR) Mforns: "Ok, this is ready to review!" [analytics/refinery] - https://gerrit.wikimedia.org/r/449249 (https://phabricator.wikimedia.org/T199899) (owner: Mforns)
[19:13:52] Analytics, Analytics-Kanban: Scan npm dependencies for vulnerabilities - https://phabricator.wikimedia.org/T200717 (Milimetric)
[19:14:05] Analytics: Scan npm dependencies for vulnerabilities - https://phabricator.wikimedia.org/T200717 (Milimetric)
[19:33:07] can anybody help me with https://wikitech.wikimedia.org/wiki/Analytics/Ad_hoc_datasets ? It refers to /srv/published-datasets/ but files there and files on https://analytics.wikimedia.org/datasets/archive/public-datasets/ are very different
[19:33:17] so how one gets from here to there?
[19:42:37] SMalyshev: I can help with that, lemme look real quick and remember exactly how it's set up
[19:43:13] milimetric: thanks! looks like https://analytics.wikimedia.org/datasets/ has some of what /srv has but also some more directories
[19:43:52] so I wonder for example how to get something into one-off/
[19:43:54] SMalyshev: right, so https://analytics.wikimedia.org/datasets/ is the mirror of three things: /srv/published-datasets from two machines, and some old archived stuff
[19:44:12] milimetric: ah, there's two of them! so one is stat1005 and another?
[19:44:59] yeah, I think 1006, validating
[19:45:21] SMalyshev: yeah, 1006 has one
[19:45:28] ah, looks like I don't have access to 1006, I wonder what's the diff...
[19:45:39] and stat1004 has one
[19:46:11] hm, there must be some other one because I don't see a one-off in any of those three
[19:46:23] 1004 one is empty
[19:47:37] I wonder why I can't get to 1006... I'm in analytics-privatedata-users but looks like there's more needed
[19:50:45] ok, SMalyshev I got to the bottom of it
[19:51:05] so this folder was synced to from 1002, 1003, 1004, 1005, 1006
[19:51:19] so one-off must be from one of the old decommissioned hosts, like 1002 or 1004
[19:51:24] *1002 or 1003 I meant
[19:51:50] aha. so if I wanted to add something, I could just add it on 1004?
[19:52:08] as for access to 1006, you need statistics-users, as per https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups
[19:52:35] oh, complicated!
[19:52:39] SMalyshev: in theory, if you add a one-off folder on any of those boxes, including 1004, it should merge it with the existing one-off folder. It could also delete that folder completely
[19:52:46] maybe I don't need it if 1004 works out
[19:52:59] it'll work to keep your stuff for sure...
[19:53:05] milimetric: ok, thanks, I'll try it and see
[19:53:06] one sec, I'll copy that folder just to be safe :)
[19:53:41] milimetric: np, I don't have the final data yet anyway, I can wait a bit, just wanted to know where to place it
[19:53:47] probably will have it this week...
[19:54:18] gr, it's 7.9G
[19:54:38] I'll ping you when it's done
[19:56:11] the script that does it is this: https://github.com/wikimedia/puppet/blob/fca2647342fb393e596916b8f461a5f78a8ae2c4/modules/statistics/files/hardsync.sh
[19:58:03] ok SMalyshev I backed it up just in case, go ahead and let me know what happens :)
[20:06:51] milimetric: thanks!
[20:14:22] (PS1) Milimetric: Update lock file to exclude vulnerable package [analytics/dashiki] - https://gerrit.wikimedia.org/r/449286 (https://phabricator.wikimedia.org/T200695)
[20:14:23] Analytics, Operations, Discovery-Search (Current work), Patch-For-Review, Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (Pchelolo)
[20:14:40] (CR) Milimetric: [V: 2 C: 2] Update lock file to exclude vulnerable package [analytics/dashiki] - https://gerrit.wikimedia.org/r/449286 (https://phabricator.wikimedia.org/T200695) (owner: Milimetric)
[20:41:15] (PS2) Mforns: Add MobileApp fixes to EL sanitization whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/447088 (https://phabricator.wikimedia.org/T200095)
[20:45:38] Analytics, Analytics-Kanban, Patch-For-Review, Services (watching): Enable TLS and authorization for cross DC MirrorMaker - https://phabricator.wikimedia.org/T196081 (Pchelolo)
[20:45:51] Analytics, Analytics-Kanban, Patch-For-Review, Services (watching): Re-enable cross DC mirroring of job and change-prop Kafka topics over TLS - https://phabricator.wikimedia.org/T197254 (Pchelolo) Open>Resolved Seems like this was done. Resolving.
[21:01:03] Analytics, Analytics-Cluster, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Can't write from Spark to local FS - https://phabricator.wikimedia.org/T200609 (GoranSMilovanovic) @Milimetric Hi Dan, thanks for responding. The following is R/SparkR code of which only what follows the `# - write...
[21:54:43] byeee team
[22:18:36] Analytics, Analytics-Cluster: Upgrade spark 2.3.0 -> 2.3.1 on analytics cluster - https://phabricator.wikimedia.org/T200732 (EBernhardson)
[22:34:38] Analytics, Operations, Discovery-Search (Current work), Patch-For-Review, Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (EBernhardson) As requested, I've sent an email to ops list, cc'd to mobrovac, giving a...
[22:35:50] is anything happening with https://phabricator.wikimedia.org/T164020 ? it looks like everything was prepared for it to happen and then it didn't...
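
Note on the 18:21-18:27 exchange about logging_setup.py: the idea floated there (a stdout option that is mutually exclusive with log_file, and no handler configured when neither is given) might look roughly like the sketch below. This is not the code in analytics/refinery; the function name configure_logging and the to_stdout parameter are hypothetical, and only the log_file parameter name comes from the conversation.

```python
import logging
import sys


def configure_logging(logger, log_file=None, to_stdout=False, level=logging.DEBUG):
    """Hypothetical helper: attach a file handler OR a stdout handler, never both."""
    if log_file and to_stdout:
        # The two outputs are mutually exclusive, as suggested in the chat.
        raise ValueError("log_file and to_stdout are mutually exclusive")
    if log_file:
        handler = logging.FileHandler(log_file)
    elif to_stdout:
        handler = logging.StreamHandler(sys.stdout)
    else:
        # Nothing requested: configure no handler and leave the defaults alone.
        return None
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.setLevel(level)
    logger.addHandler(handler)
    return handler
```

With something like this, a script tested by hand could call configure_logging(logging.getLogger("saltrotate"), to_stdout=True), while scheduled runs would keep passing log_file.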