[00:31:45] 10Analytics, 10Cloud-Services, 10Developer-Advocacy (Oct-Dec 2019): Setup Config:Dashiki:WMCSEdits on meta wiki - https://phabricator.wikimedia.org/T236223 (10srishakatux) Update > I have the config here https://meta.wikimedia.org/wiki/Config:Dashiki:WMCSEdits, and I can use it to generate a dashboard with t... [06:53:53] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Rerun sanitization before archiving eventlogging mysql data - https://phabricator.wikimedia.org/T236818 (10elukey) As part of this task I'd also clean up `profile::mariadb::misc::eventlogging::sanitization` from the db110[78] hosts :) Script to run... [07:31:19] brb [08:06:31] 10Analytics, 10User-ArielGlenn: Spike [2019-2020 work] Oozie Replacement. Airflow Study / Argo Study - https://phabricator.wikimedia.org/T217059 (10ArielGlenn) [08:21:13] Good morning team [08:23:38] bonjour! [08:25:00] All good today elukey ? [08:28:12] joal: I hope so! I am about to create the TLS certs for the hadoop hosts, if this goes fine I'll deploy them and the map-reduce config later on [08:28:25] should be a no-op, and then next week we'll be ready :) [08:28:26] \o/ [08:29:19] Thanks a lot for all the effort you put elukey - This really is a marathon, and you're close to the finish line :) [08:29:37] joal: 30km mark more or less, the hardest! :D [08:29:58] ah joal https://github.com/internetarchive/snakebite-py3/issues/5 [08:30:11] I also pinged Jarek in the airflow issue [08:31:10] Awesome elukey :) [09:07:03] 10Analytics: Understand why SQL string pattern matching differ from Hive to Spark - https://phabricator.wikimedia.org/T236985 (10JAllemandou) [09:37:51] fdans: Please ping me if/when you join :) [09:38:17] joal: let's talk now about the jobs? [09:38:29] Indeed fdans :) [09:38:33] sorry, I was in but got entangled with the list admin [09:39:37] joal: I'm in bc [09:39:43] np fdans - joining ! 
[10:03:42] 10Analytics, 10Analytics-Kanban, 10Multimedia, 10Tool-Pageviews: Make job to backfill data from mediacounts into mediarequests tables in cassandra so as to have historical mediarequest data - https://phabricator.wikimedia.org/T234591 (10fdans) Per file failed jobs during backfilling: - 17 jul 2015 - 5 mar... [10:06:19] joal: deploying TLS certs! [10:06:25] Yes! [10:17:06] !log deploy TLS certificates for MapReduce Shufflers on Hadoop worker nodes (no-op change, no yarn-site config) [10:17:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:17:26] joal: all right when we enable the encrypted shuffler? Monday? [10:21:15] elukey: as you want :) [10:24:37] joal: something like 15:30? or earlier/later/etc..? [10:25:11] elukey: 14:00 would be better if feasible (I'm teaching in the morning and then get kids at 16:00) [10:25:18] Would that work for you? [10:29:39] ah snap I think I'll be off at that time, we can do Tue 14:00? [10:30:30] we also have to upgrade spark to 2.4 [10:30:34] on Tue [10:30:38] argh [10:33:04] ok I'll find another time, ideally I'd like to avoid mixing it with Spark 2.4 [10:33:19] and leave a couple of days between each of them [10:33:27] Thursday morning? [10:33:34] elukey: monday at 12:00? [10:33:52] Thursday morning works for me [10:34:02] all right then, so you'll not have to rush [10:34:14] let's say 10 AM CET indicatively [10:34:23] I'll send an email and open a task [10:34:24] :) [10:34:43] ack! 
[10:48:24] joal: OR, if we are brave, we do it today :P [10:48:51] elukey: Works for me - I'll be on/off tomorrow as it's a bank holiday in France [10:48:58] But fine by me to do it now :) [10:49:40] 10Analytics: Enable TLS encryption for the MapReduce Shufflers in the Hadoop Analytics cluster - https://phabricator.wikimedia.org/T236995 (10elukey) [11:03:24] joal: I'll wait a bit, now I am writing the code change to enable TLS but it might be trickier than I thought [11:03:31] Ah [11:03:33] ok :) [11:03:33] too eager to get this done [11:03:36] huhu [11:03:56] When you'll have a minute elukey my CR has been updated with data-purge change :) [11:04:04] yep next in my list :) [11:04:10] Many thanks :) [11:04:10] the following properties [11:04:11] core_site_extra_properties: [11:04:11] hadoop.ssl.enabled: true [11:04:11] hadoop.ssl.enabled.protocols: 'TLSv1.2' [11:04:21] should not cause any other side effects [11:04:24] but I want to be sure [11:04:28] I forgot about them [11:05:35] ah no I only need to set the second one [11:05:41] okok [11:05:49] well anyway, need to study :) [11:05:59] Deprecated. Use dfs.http.policy and yarn.http.policy instead. [11:06:17] elukey: --^ this is what https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/core-default.xml tells me about the first one [11:06:48] elukey: second one seems indeed just a declaration, not usage [11:07:40] yes even in https://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html it is not mentioned [11:13:24] (03CR) 10Elukey: [C: 03+1] "Looks good to me, Mforns what do you think?" 
[analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/547168 (owner: 10Awight) [11:31:46] I always forget how to check which snapshots are available :/ / which are the newest, [11:50:37] 10Analytics, 10Analytics-EventLogging, 10Beta-Cluster-Infrastructure, 10Wikimedia-production-error: ERROR: Additional properties are not allowed ('sort' was unexpected) - https://phabricator.wikimedia.org/T236999 (10MarcoAurelio) [11:51:29] 10Analytics, 10Analytics-EventLogging, 10Beta-Cluster-Infrastructure, 10Wikimedia-production-error: ERROR: Additional properties are not allowed ('sort' was unexpected) - https://phabricator.wikimedia.org/T236999 (10MarcoAurelio) Mostly logspam I guess but it's amounting to ~400 entries in logstash-beta.... [12:38:16] * elukey afk for a bit! [13:35:45] o/ Looking at https://phabricator.wikimedia.org/T235845 apparently centralnoticeimpression is not being populated? I don't know anything about this new banner impression event logging pipeline thing, but does anyone else have any pointers? [13:36:15] addshore: u can also try #wikimedia-fundraising [13:41:55] elukey: about setting the resource absent (file and timer), I assume they must be configured the exact same? [13:44:00] elukey: And in that case, does it mean I can't delete the template because puppet will try to render it? [13:49:01] +1 to addshore's question, this eventlogging table abruptly drops by ~500x on Oct 24th. [13:50:25] hmm, will look into it, need to finish something first... [13:54:20] ottomata: I think it's on the CentralNotice side, not analytics infra. [13:55:40] awight: Just looked at actions taken around Oct 24th - And nothing related to EL comes to mind [13:56:29] joal: I found a perfectly timed CentralNotice commit, so lemme check with that team first. Sorry for the noise!
[13:56:48] np awight :) Thanks for pointing the issue! [14:00:42] joal: here I am sorry [14:01:06] yes possibly, we can absent it and keep the template in the first go, then delete in a second pass (I'll do it) [14:01:25] ok elukey [14:03:31] awight: Well, I saw the event land on the event beacon and enter the webrequest stream, so afaik, it has to be between that and the event logging table? [14:03:43] * addshore isn't sure of the moving parts in between those 2 places though [14:04:36] addshore: It's a validation issue, see latest comments in the task... [14:05:00] Thanks for pushing, we caught the bug just in time :D [14:05:02] oooh, campaignStatuses [14:05:20] https://phabricator.wikimedia.org/T236627 ? [14:06:11] Nice connection! [14:06:18] 2 tickets actually, added a comment [14:06:45] Whenever one sees something as searchable as "campaignStatuses", one just has to search for it and see what comes up, after all, everything is connected ;) [14:07:09] Got that Wikidata :brain as machine: [14:15:53] hey teamm! [14:16:11] running home, be back in a bit! [14:18:09] (03PS2) 10Awight: Mock date to prevent massive backfilling [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/547190 [14:19:15] Hea fdans - Will you tchutchu this week? [14:21:00] elukey: weird - previous failure was telling me the thing was incorrectly indented (https://integration.wikimedia.org/ci/job/operations-puppet-tests-stretch-docker/25023/console) [14:21:08] joal: I’m in the hospital right now, will probably be in for standup, I can do train afterwards [14:21:26] elukey: And now it tells me it's incorrectly indented ... https://integration.wikimedia.org/ci/job/operations-puppet-tests-stretch-docker/25030/console [14:21:54] fdans: Arf :( Let us know how thing go for you, so we can cover for you if needed! 
[14:22:42] elukey: my complete bad, I should have read the thing better [14:22:47] please excuse my noise elukey [14:24:44] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/547184 (https://phabricator.wikimedia.org/T236687) (owner: 10Joal) [14:26:08] ottomata: I am now inspecting what has caused one of my R processes to hit 50GB of RAM on stat1007. It is an interactive R session, so if you notice something eating up too much memory do not worry, I will be there to stop the process immediately. I need to know the cause before I start refactoring the code; it might be the rendering from {ggplot2} that has caused the problem. Thank you. [14:26:54] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM! Thanks Awight for improving this!" [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/547168 (owner: 10Awight) [14:31:22] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM! Wow, of course, tests run much faster! Thanks a lot." [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/547190 (owner: 10Awight) [14:33:58] mforns: If you're still in the CR zone, thought I would mention that this one is ready for review as well: https://gerrit.wikimedia.org/r/#/c/analytics/reportupdater-queries/+/542419/ [14:34:15] awight, looking, thanks [14:34:54] Cool! Not a huge rush since it will backfill by design, but it would be great to have in the next week or two. [14:39:06] joal, ottomata - ok if I merge + roll restart NMs for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/547522/ ? [14:39:17] Yessir for me [14:39:30] the more I check it, the more it seems safe, and easy to quickly roll back in case [14:47:12] joal: the shufflers are now TLS-equipped [14:47:18] Yay ! [14:49:57] elukey: sure! [14:50:00] yeehaw!
[14:50:57] I am testing a map-reduce job now [14:51:01] disabled the timers [14:51:26] ok GoranSM [14:51:30] thanks for the heads up [14:51:47] let us know if there is a way we can help you move it to distributed spark stuff [14:52:02] joal: I have run a simple grep on a text file [14:52:03] https://yarn.wikimedia.org/jobhistory/attempts/job_1571142484661_62212/r/SUCCESSFUL [14:52:06] looks working [14:52:23] if you guys are ok I'd restart the timers [14:52:53] sure [14:52:54] sure elukey - first mapreduce jobs will test us :) [14:53:40] (03CR) 10Mforns: [C: 03+1] "LGTM! Minor nitpicky comments inline. But +1 otherwise!" (034 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/542419 (https://phabricator.wikimedia.org/T214493) (owner: 10Awight) [14:53:59] !log enabled encrypted shuffle option in all Hadoop Analytics Yarn Node Managers [14:54:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:54:39] * elukey loves systemctl stop '*.timer' [14:55:34] awight, I +1'ed the reportupdater-queries patch, and left some minor comments. Have a look and I can merge [15:08:25] mforns: Right on, thanks. I'll probably be able to get to the cleanup tomorrow. [15:14:32] (03CR) 10Awight: New reports for Reference Previews (033 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/542419 (https://phabricator.wikimedia.org/T214493) (owner: 10Awight) [15:15:08] (03PS10) 10Awight: New reports for Reference Previews [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/542419 (https://phabricator.wikimedia.org/T214493) [15:17:06] something is off [15:17:37] I can clearly see on port 13562 a TLS handshake with openssl s_client [15:17:57] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM! Thanks!" 
[analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/542419 (https://phabricator.wikimedia.org/T214493) (owner: 10Awight) [15:17:58] but then if I tcpdump the traffic I see plain HTTP requests [15:18:01] A lot of failed reduces [15:18:07] https://yarn.wikimedia.org/proxy/application_1571142484661_62237/mapreduce/job/job_1571142484661_62237 [15:18:29] elukey: stop timers again? [15:19:25] sure done [15:19:39] but not all the jobs reported issues no? [15:20:06] one thing that is missing may be mapreduce.ssl.enabled: true [15:20:19] I thought it was something not needed [15:20:20] Could be :) [15:20:38] elukey: all jobs fail IMO [15:20:45] elukey: more precisely, all jo [15:20:51] all jobs needing to have reduce [15:22:23] that makes sense, if they try to pull from a shuffler via plain http they fail [15:22:31] and it would explain my tcpdump [15:22:33] all right fixing [15:23:57] (03CR) 10Nuria: [C: 04-1] Refactor data_quality oozie bundle to fix too many partitions (033 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [15:27:44] job failures are my fault sorryyy [15:28:33] all right new setting applied joal [15:28:42] I'll restart a webrequest job [15:28:43] elukey: give me a minute to test please [15:29:38] joal: too late sorry [15:29:55] elukey: testing is as easy as a simple hive query :) [15:30:00] worked :) [15:30:14] I foresee this webrequest job succeeding :) [15:31:00] I should have done it before, my hadoop test job was of course stupid enough to not trigger the issue [15:31:01] !log Rerun webrequest jobs for hour 2019-10-31T14:00 after failure [15:31:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:31:03] let's see!
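For context, the properties being juggled above map onto the encrypted-shuffle configuration described in the Hadoop 2.6 docs linked earlier in the log. A sketch only (property names per those docs, not the exact WMF puppet change; the chat's shorthand "mapreduce.ssl.enabled" presumably corresponds to the documented mapreduce.shuffle.ssl.enabled):

```xml
<!-- core-site.xml: restrict Hadoop's SSL server/client to TLSv1.2 -->
<property>
  <name>hadoop.ssl.enabled.protocols</name>
  <value>TLSv1.2</value>
</property>

<!-- mapred-site.xml: the piece that was initially missing. Without it,
     reducers fetch map output over plain HTTP and fail against
     shufflers that now only speak TLS on port 13562. -->
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>true</value>
</property>
```

This matches the failure pattern seen above: map-only jobs (the grep test) succeed, while anything with a reduce phase fails at the shuffle fetch.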
[15:31:16] elukey: I have run a job using a proper reduce :-P [15:31:33] lesson learned :D [15:31:51] elukey: wordcount, or pi, as examples if you want, or a simple hive query :) [15:34:08] I tried wordcount, but it succeeded [15:42:58] joal: I think that webrequest is not really working [15:43:59] Ah! Crap :( [15:44:34] It feels like an oozie issue [15:47:49] elukey: restart oozie, to make it update its hadoop config? [15:48:46] really strange though, will do [15:49:13] ah so those settings are only in workers [15:49:16] not in the coordinator [15:49:25] ? [15:49:31] Ahhhh [15:49:36] no my bad [15:49:39] it is the common settings [15:51:50] ok so the setting was not in coordinator, for some reason [15:51:57] I just added it, and restarted oozie [15:52:05] joal: shall I start another test? [15:52:11] Please! [15:52:33] I was about to taunt you about this time the oozie-test being the correct one :) [15:55:34] same issue I think [15:55:35] weird [15:55:44] Meh :( [15:55:55] hive worked in your manual run right? [15:56:04] yes [16:00:41] ping mforns, fdans, joal [16:01:07] need to drop for kids - sent e-scrum [16:01:21] elukey: will be back in ~1h I think to help with TLS :( [16:01:55] yeah I think that I'll rollback if I don't find what's happening [16:22:30] mforns: I just noticed that the pingback graphs at https://pingback.wmflabs.org are empty. That's still the correct place to view them, right? [16:25:11] CindyCicaleseWMF, I think this might be caused by the rename of the reports directory in analytics.wikimedia.org/datasets [16:25:48] it should redirect transparently, but there might be an issue there, will discuss this with the team [16:26:47] mforns: thanks!
[16:28:10] (03PS2) 10Cicalese: Update to include 1.33 and 1.34 [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/545917 (https://phabricator.wikimedia.org/T223414) [16:33:53] joal: so I have restarted hive server, and now webrequest refine works [16:36:05] !log restart oozie and hive-server2 on an-coord1001 to pick up new TLS mapreduce settings [16:36:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:50:49] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM!" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/545917 (https://phabricator.wikimedia.org/T223414) (owner: 10Cicalese) [16:54:23] Of course elukey! I should have tested beeline (which would have failed) [16:56:46] elukey: hm another thought [16:56:55] we could prefix the topics in kafka: logging-eqiad [16:57:12] but still only use the datacenter=eqiad in the hive table [16:57:58] we'd have problems if there was ever another *eqiad topic with the same name in a different kafka cluster to import into the same table [16:58:10] yep exactly [16:58:27] but the partition naming would still be accurate [16:58:28] joal: just used tcpdump, now I can finally see TLS 1.2 [16:58:35] \o/! [16:58:40] Great elukey :) [16:58:43] datacenter=eqiad, even if the topic prefix was logging-eqiad [16:59:12] and some logic I guess to handle a name clash in topic names? [17:03:41] ottomata: as FYI if a rollback is needed (I hope not) https://phabricator.wikimedia.org/T236995#5624106 [17:04:34] 10Analytics, 10Desktop Improvements, 10Event-Platform, 10Readers-Web-Backlog (Kanbanana-2019-20-Q2): [SPIKE 8hrs] How will the changes to eventlogging affect desktop improvements - https://phabricator.wikimedia.org/T233824 (10phuedx) a:03phuedx [17:05:52] well elukey no special logic needed [17:05:56] we'd just have to do something different if that happened [17:05:59] which it probably won't.
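The topic-prefix idea discussed above can be sketched in a few lines. This is a hypothetical helper, not existing refinery code: the Kafka topic carries the full prefixed name (e.g. "logging-eqiad"), while the Hive partition keeps only the datacenter.

```python
# Hypothetical sketch of the naming scheme discussed above: strip the
# topic prefix and keep only the datacenter for the Hive partition.
# Assumes the datacenter is the final dash-separated segment.
def topic_to_partition(topic: str) -> str:
    _, _, datacenter = topic.rpartition("-")
    return f"datacenter={datacenter}"

print(topic_to_partition("logging-eqiad"))  # datacenter=eqiad
```

As noted in the chat, this leaves partition names accurate even with prefixed topics; a same-named *eqiad topic in a different Kafka cluster would still need manual handling.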
but we'd have to do something manual to start importing anyway [17:06:14] sure sure [17:06:14] so we'd know at that time to do something different [17:29:28] (03PS3) 10Joal: Update oozie datasets to match dumps import change [analytics/refinery] - 10https://gerrit.wikimedia.org/r/547169 (https://phabricator.wikimedia.org/T234333) [17:32:57] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, and 4 others: Create new eventgate-logging deployment in k8s with helmfile - https://phabricator.wikimedia.org/T236386 (10fgiunchedi) >>! In T236386#5621245, @Ottomata wrote: > @fgiunchedi, q for you. In T23... [17:36:59] (03PS1) 10Fdans: Update changelog for 104 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/547588 [17:39:29] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, 10Epic: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (10jlinehan) >>! In T228175#5606900, @Ottomata wrote: > My preferences: > > - Strong preference for Config-1. > - In favor of Pr... [17:40:33] (03CR) 10Fdans: [V: 03+2 C: 03+2] Update changelog for 104 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/547588 (owner: 10Fdans) [17:48:41] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/547169 (https://phabricator.wikimedia.org/T234333) (owner: 10Joal) [17:49:27] !log deploying refinery-source 0.0.104 [17:49:28] fdans: I have updated the train-doc quite a lot as I had made mistakes - Do you mind reviewing it while I'm here? [17:49:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:50:20] joal: looks good to me! [17:50:40] Cool - I thought no data move was needed but actually there is [17:50:44] fdans: --^ [17:50:56] yeah I saw, no problem!
[17:50:57] fdans: I'm going for dinner, will be back after to double-check you're all good [17:51:11] cool I'll be here joal [17:52:03] fdans: rollback steps added to https://etherpad.wikimedia.org/p/analytics-weekly-train [17:52:06] let me know if they are ok [17:52:17] saw that, thank you elukey :) [17:52:48] mforns: yt? [17:53:15] i'm seeing -Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://analytics.wikimedia.org/datasets/periodic/reports/metrics/pingback/count.tsv. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). [17:53:18] all right, logging off! [17:53:22] byeee luca! [17:53:26] o/ [17:53:26] at https://pingback.wmflabs.org/#unique-wiki-count [17:53:42] but we set CORS header at the doc root [17:53:43] .... [17:53:46] that didn't change [17:54:02] perhaps the redirect with the CORS thing is a problem? [17:58:59] ottomata: ah ya, developer tools cannot request via cors [17:59:11] !log deploying refinery [17:59:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:00:35] 10Analytics, 10WMDE-Analytics-Engineering, 10WMDE-FUN-Funban-2019, 10WMDE-FUN-Sprint-2019-10-14, 10WMDE-New-Editors-Banner-Campaigns (Banner Campaign Autumn 2019): Implement banner design for WMDEs autum new editor recruitment campaign - https://phabricator.wikimedia.org/T235845 (10AndyRussG) Scheduled t... [18:02:47] 10Analytics, 10Better Use Of Data, 10Epic, 10Performance-Team (Radar), 10Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (10jlinehan) >>! In T235189#5623642, @Nuria wrote: > Did we decided where is the error client code g... [18:03:13] i think the redirect is somehow squashing the cors header [18:04:47] ottomata: I'm having the git-fat shenanigans with scap deploy when deploying refinery :( [18:04:57] oh boy [18:04:59] what this time?
[18:05:06] i did just change git fat stuff [18:05:12] soooo let's see if it is caused by my change [18:05:20] https://www.irccloud.com/pastebin/zKPjRR0L/ [18:05:25] ottomata: [18:09:16] hm [18:10:46] hmmm [18:10:56] fdans: da39a3ee5e6b4b0d3255bfef95601890afd80709 is set for all of the artifacts [18:11:01] and is not a real binary sha [18:11:10] i think this happened to dan the other week [18:12:09] hmmmm [18:12:11] ottomata: hmmmm [18:12:30] fdans: how much time was there between when the refinery source build finished, and when you submitted the jenkins job to update the refinery artifacts? [18:13:05] hmmm like no more than 5 minutes? [18:13:14] i think this might be why [18:13:29] the cron to generate the git fat shas only runs every 5 minutes [18:13:41] i'm going to rerun that job and see if it makes a better commit [18:14:41] oh [18:14:41] Resolving archiva.wikimedia.org (archiva.wikimedia.org)... 404 Resource does not exist [18:14:42] hm [18:14:44] that's different [18:14:53] https://archiva.wikimedia.org/repository/releases/org/wikimedia/analytics/refinery/hive/refinery-hive/0.0.104/refinery-hive-0.0.104.jar [18:15:12] fdans: did the 0.0.104 refinery-source build job succeed? [18:15:39] i think it didn't! [18:15:40] https://integration.wikimedia.org/ci/job/analytics-refinery-release/212/ [18:15:46] ottomata: https://integration.wikimedia.org/ci/job/analytics-refinery-release/211/ [18:16:01] hm [18:16:02] what the hell [18:16:15] I don't recall starting 212 [18:16:27] 211 is using master [18:16:30] https://integration.wikimedia.org/ci/job/analytics-refinery-release/212/parameters/ [18:17:48] ottomata: should I start a new one?
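One clue above is worth unpacking: da39a3ee5e6b4b0d3255bfef95601890afd80709 is the SHA-1 of the empty string, so git-fat stubs carrying it point at zero-byte artifacts. That is consistent with the jars never having been fetched (as it turned out, the 0.0.104 release build had failed). A quick check:

```python
# da39a3ee... is the SHA-1 digest of zero bytes of input, which is why
# it appears whenever git-fat hashes an empty (never-downloaded) jar.
import hashlib

EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()
print(EMPTY_SHA1)  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```

Seeing this sha for every artifact is therefore a reliable signal that the binaries behind the stubs are missing, whether due to the sha-generation cron not having run yet or, as here, a failed upstream build.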
I'm not sure what I did wrong [18:18:03] [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-deploy-plugin:2.7:deploy (default-deploy) on project refinery: Failed to deploy artifacts: Could not transfer artifact org.wikimedia.analytics.refinery:refinery:pom:0.0.104 from/to archiva.releases (https://archiva.wikimedia.org/repository/releases/): Failed to transfer file: [18:18:03] https://archiva.wikimedia.org/repository/releases/org/wikimedia/analytics/refinery/refinery/0.0.104/refinery-0.0.104.pom. Return code is: 401, ReasonPhrase:Unauthorized. -> [Help 1] [18:18:05] ??? [18:18:09] in [18:18:09] https://integration.wikimedia.org/ci/job/analytics-refinery-release/212/console [18:18:59] fdans: yeah let's try again i'm not totally sure.... [18:19:06] going to delete the 0.0.104 branch [18:19:30] ottomata: and the commits in refinery-source? [18:20:16] hmm ya will reset to before [18:20:58] fdans: i'm going to launch it [18:21:12] ottomata: thx [18:29:02] huh. [18:29:09] still broken ok something is wrong...maybe with passwords??? [18:29:21] oh [18:29:24] archiva passwords expire!!!! [18:29:25] 😭 [18:33:58] ok trying again, i think i fixed. and maayyybe told it not to expire the pw [18:35:43] ottomata: awesome! [18:43:25] ottomata: failed :__ [18:43:28] yeha... [18:43:45] not sure why, same error, am in password/macos update/gpg hell right now [18:44:29] i need gpg to read our pws [18:44:33] but i updated macos [18:44:36] and https://gpgtools.tenderapp.com/ [18:44:43] many things happening [18:45:20] maybe that's just for mail [18:45:20] dunno [18:45:24] am trying to get it elsewhere [18:45:42] whatever it is my previously installed gpg binary doesn't work [18:51:14] GRR [18:51:18] × [18:51:19] Your password cannot match any of your previous 6 password(s).
[18:54:37] uff every time this mess :( [18:54:45] there must be some weird setting in there [18:54:55] it already happened 3 times that I can remember [18:55:03] ottomata: can I help with pwstore or anything? :( [18:57:19] elukey: i think i finally got it [18:57:33] trying again [18:57:39] ack [18:59:38] 10Analytics, 10service-runner, 10User-Elukey: Upgrade service-runner on AQS to unblock rsyslog logging - https://phabricator.wikimedia.org/T236757 (10Ottomata) Oh ho hoooo righto. [19:02:12] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Patch-For-Review: Create a reports directory under analytics.wikimedia.org - https://phabricator.wikimedia.org/T235494 (10Ottomata) So I'm pretty sure that ^ should fix, but the responses will have to expire out of varnish cache before they start retu... [19:04:15] * elukey afk again! [19:10:24] ottomata: YES [19:11:07] ottomata: ok to update the symlinks? [19:12:02] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, 10Epic: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (10Nuria) Several things that come to mind: I think the list of features that wpould be included in a 1st implementation is a bi... [19:13:10] 10Analytics, 10Better Use Of Data, 10Epic, 10Performance-Team (Radar), 10Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (10Nuria) Let's include the perf team as well, this is client side code cc @Gilles and @Krinkle [19:16:52] welp. 
I'm updating the symlinks [19:17:21] !log updating jar symlinks to 0.0.104 [19:17:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:20:39] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, 10Epic: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (10Nuria) Ah, sorry I see link to prototype code here: https://github.com/linehan/wmf-epc/blob/jason/src/js/epc.js [19:21:03] fdans: sorry was afk for a min [19:21:05] yes go for it! [19:21:59] ottomata: just finished, is there anything I should do before scapping again? [19:23:52] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, 10Epic: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (10jlinehan) >>! In T228175#5624465, @Nuria wrote: > * Activity identifiers (tag all events comprising an individual run of a fun... [19:29:46] 10Analytics, 10Analytics-Kanban, 10Operations, 10Traffic, 10observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (10JAllemandou) Some maths about datasize increase using approximated ratios for values TLS-Version, Key-Exchange, Auth and Cipher from... [19:33:43] 10Analytics: Add data-purge for processed mediawiki_wikitext_history - https://phabricator.wikimedia.org/T237047 (10JAllemandou) [19:36:37] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, 10Epic: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (10Ottomata) > The core functionality is being extended but the basic functionality has not changed, as such we would expect a l... [19:41:39] 10Analytics, 10Analytics-Kanban, 10Operations, 10Traffic, 10observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (10Ottomata) I don't expect addition of these fields to really impact Kafka or much else! 
:) > I think representing those values in a... [19:42:08] !log refinery deployment complete [19:42:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:43:25] !log restarting mediawiki-history-wikitext [19:43:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:43:48] !log (changing jar version first) [19:43:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:45:20] !log (actually no, no need) [19:45:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:51:57] joal: you're there by any chance? [19:54:17] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, 10Epic: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (10jlinehan) >>! In T228175#5624514, @Ottomata wrote: > From ResourceLoader's perspective, Producer-1 and Producer-2 are identica... [20:04:15] fdans: nice stuff [20:04:27] sorry i keep missing your pings! [20:04:43] ottomata: naaa thanks for your help, the cluster's deployed [20:08:28] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, 10Epic: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (10jlinehan) Talking some more with @Ottomata on IRC, we agreed we ought to brainstorm ways we can do this like a facade. We have... [20:08:37] I've got a problem trying to use a modern version of R with SWAP [20:09:28] I'm on notebook1004 [20:09:28] I installed R 3.6 and the IRkernel following https://github.com/IRkernel/IRkernel [20:09:31] I can start jupyter console using the R 3.6 kernel, it shows up in the jupyter-notebook [20:09:34] dropdown menu, but when I try to connect to it from the notebook I get "kernel died" [20:10:48] groceryheist: you are nathante, yes?
[20:10:59] i see this in your jupyter logs [20:11:02] https://www.irccloud.com/pastebin/yPjMNKyR/ [20:11:38] dunno what that is about, but maybe it can help! :) [20:20:40] yeah [20:22:23] 10Analytics, 10Analytics-Kanban, 10Operations, 10Traffic, 10observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (10Nuria) >If we could encode this in the same way as x-analytics, it'd be more easy to refine into a map in wmf.webrequest. +1 to this... [20:23:20] ooop [20:23:27] ottomata: thanks! I found the solution [20:23:36] It was here: https://stackoverflow.com/questions/37999772/how-to-run-jupyter-rkernel-notebook-with-inline-graphics-on-machine-without-disp#38021352 [20:25:26] gr8! [20:27:30] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, 10Epic: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (10Nuria) >The library itself is not determining when to enable or disable sending. It provides a way for the application to sign... [20:33:23] hey fdans [20:33:31] hellooo joal [20:33:32] sorry I was not here when you pinged [20:33:36] What's up? [20:33:39] not sure if you saw my email [20:33:55] I can't find the xmldumps directory in order to move it joal [20:33:56] Ah! [20:34:37] Crap I always forget a mid-folder :( Please excuse me fdans [20:34:45] Updating the doc [20:35:12] fdans: Do you want us to do it now, or shall I do it tomorrow morning? [20:35:31] Actually, I need to do it now as tomorrow the merged puppet stuff will run and expect the new folder [20:36:00] joal: I don't think the new path is correct though, no xmldumps there either [20:36:01] /wmf/data/wmf/mediawiki/ [20:36:25] Ah! double mistake [20:36:28] updated [20:36:42] /wmf/data/raw/mediawiki/xmldumps - pfff [20:37:04] joal: is the output directory correct, in wmf?
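The StackOverflow fix linked above boils down to pointing R's bitmap device at Cairo, since headless hosts like notebook1004 have no X11 display for the default png() device, which kills the IRkernel on inline plots. A sketch of applying it (assumption: this is the accepted answer's mechanism; the ~/.Rprofile path may vary per setup):

```python
# Append the Cairo option to ~/.Rprofile so IRkernel can render inline
# graphics without an X server. Idempotent: skips if already present.
from pathlib import Path

rprofile = Path.home() / ".Rprofile"
option = "options(bitmapType = 'cairo')\n"
existing = rprofile.read_text() if rprofile.exists() else ""
if option not in existing:
    rprofile.write_text(existing + option)
print(rprofile.read_text().splitlines()[-1])
```

Restarting the kernel after this change is what lets the R 3.6 kernel survive its first plot.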
fdans: same exact folder except for the xml bit in the last name [20:37:46] looks correct in the doc [20:38:08] ok joal thank you I'll carry on with the items on the list [20:38:17] Sorry for the delay fdans :( [20:38:19] Thanks a lot [20:38:34] no worries at all joal :) [20:44:29] joal, about the job: I restarted it before doing all this [20:44:40] since it doesn't materialize until October this is fine right? [20:44:49] i mean until tomorrow [20:45:18] pls ignore if you're already out [20:46:28] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, 10Epic: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (10jlinehan) >>! In T228175#5624674, @Nuria wrote: > Let's talk more about this, in this case (offline/online) the eventlogging c... [22:38:58] I just went down the rabbit hole of EventLogging validation errors, and had some thoughts. Most of which are probably completely redundant, so apologies in advance for the noise... [22:40:47] Is there a process in place to monitor EventErrors and route them to product owners? I don't see a dashboard in logstash, which is why I ask. [22:42:11] This is just theory, but perhaps there's a difference between occasional schema errors, and a high proportion like 99% of a schema's events getting rejected. [22:43:02] For example, the type of query here https://phabricator.wikimedia.org/T227018#5324825 should maybe be a regular dashboard, and include a column (count / total). [22:43:59] Maybe we start a weekly thing where an email goes to the top 3 schemas by volume and top 3 by proportion of bad messages. [22:45:41] We might consider adding a "maintainer" field to the event schema schema? Or at least compiling a list somewhere, be it email addresses or phab tags.
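The "count / total" column awight proposes is easy to illustrate: it separates schemas with an occasional validation error from ones where nearly every event is rejected. All names, numbers, and the threshold below are made up for illustration:

```python
# Hypothetical dashboard logic: flag schemas whose validation-error
# proportion exceeds a threshold, per awight's suggestion above.
def high_error_schemas(errors, totals, threshold=0.5):
    """Return schemas where bad/total exceeds threshold, sorted by name."""
    return sorted(
        name
        for name, bad in errors.items()
        if totals.get(name, 0) and bad / totals[name] > threshold
    )

# Illustrative counts only: one near-total failure, one trickle of errors.
error_counts = {"CentralNoticeImpression": 49500, "Popups": 12}
total_counts = {"CentralNoticeImpression": 50000, "Popups": 480000}
print(high_error_schemas(error_counts, total_counts))  # ['CentralNoticeImpression']
```

A near-100% rejection rate usually means a stream has gone dead (as awight found), whereas a low proportion is the background noise of old app versions.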
https://www.irccloud.com/pastebin/CllgZyWO/ [23:47:57] awight: we have maintainers for schemas in the talk pages, we try to notify schema owners of events with error rates that are really high, most of those come from the apps, from older versions that are deployed and no longer updated [23:48:33] awight: other than that when eventerror spikes we create tickets for those events and follow progress with owners [23:51:47] Talk pages--of course, that makes perfect sense! [23:52:43] awight: but little is known about the errors until spikes show up, media viewer errors have no owner and old app events will not be fixed [23:53:40] awight: we could surface those more but what we really want to focus our efforts on is making sure new spikes on errors alarm properly and are easily findable [23:57:23] awight: do file a ticket though, there is usefulness in that [23:58:01] thanks, it sounds like a hard monitoring problem. Asking cos I ran across a stream that had gone dead last week, and as a layperson it took a minute to find the validation error. [23:58:59] Ah I found something useful to fix :-) There's an "eventlogging" dashboard in logstash, but it's empty. [23:59:04] gtg!