[00:43:45] (PS1) Razzi: Update superset to f19f2c3 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/677055
[02:02:59] nuria: EventBus? https://github.com/wikimedia/mediawiki-extensions-eventbus
[02:45:04] Analytics-Clusters, Analytics-Kanban: Upgrade to Superset 1.0 - https://phabricator.wikimedia.org/T272390 (razzi) Alright, I reported the annotation text one upstream at https://github.com/apache/superset/issues/13959. I'll look into the other issue another day.
[03:05:40] ottomata: i forgot EVERYTHING
[05:45:11] Analytics-Clusters, Analytics-Kanban, DBA, Data-Services, and 2 others: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Marostegui) That's excellent, are we good to close this?
[06:21:35] good morning
[06:28:29] Analytics-Clusters, Analytics-Kanban, DBA, Data-Services, and 2 others: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (elukey) Almost! There are a couple of things left: - clouddb1021 is still running with icinga notifications disabled, plus t...
[06:34:50] Analytics, WMDE-Analytics-Engineering, Patch-For-Review: wmde-toolkit-analyzer-build.service fails on stat1007 - https://phabricator.wikimedia.org/T278665 (elukey) I have removed the systemd timer for the time being, it was constantly failing and causing alerts.
[06:35:35] Analytics-Clusters, Analytics-Kanban, DBA, Data-Services, and 2 others: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Marostegui) Sounds good @elukey - thanks!
[07:21:36] Analytics-Clusters, Analytics-Kanban, Technical-blog-posts: Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop - https://phabricator.wikimedia.org/T277133 (elukey) @srodlund hi! Thanks a lot, I am currently still writing the first draft, it got deprioritized due to...
[07:25:18] Hi team - today I'll work only from standup onward, it's my round caring for the kids
[07:29:50] joal: bonjour, ack :)
[08:28:39] (CR) Awight: "I would like to have this bucketing and escaping stuff kept somewhere reusable. A java plugin for hive?" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/676297 (https://phabricator.wikimedia.org/T279046) (owner: Awight)
[08:53:37] this is very nice
[08:53:38] Caused by: org.apache.hadoop.yarn.exceptions.YarnException: User dr.who does not have privilege to see this application application_1617698841597_0003
[08:53:48] dr.who is the user for yarn.wikimedia.org
[08:54:05] of course we don't enforce any auth in there so we use "dr.who" as proxy :D
[09:02:00] fixed, dr.who needs to be a yarn admin :)
[09:03:03] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Review the Yarn Capacity scheduler and see if we can move to it - https://phabricator.wikimedia.org/T277062 (elukey) @JAllemandou these are the settings now, lemme know what needs to be tuned :) ` # Global config # Maximum numbe...
[09:11:30] all right new settings for the capacity scheduler deployed in hadoop test, basic tests are all working
[09:11:40] standby for Joseph's and Andrew's review :)
[09:16:19] Analytics-Radar, Patch-For-Review, WMDE-TechWish-Sprint-2021-03-31: Broken reportupdater queries: edit count bucket label contains illegal characters - https://phabricator.wikimedia.org/T279046 (awight)
[09:32:52] Analytics-Radar, Patch-For-Review, Unplanned-Sprint-Work, WMDE-TechWish-Sprint-2021-03-31: Broken reportupdater queries: edit count bucket label contains illegal characters - https://phabricator.wikimedia.org/T279046 (awight) I wasn't sure whether this is unplanned or planned work, so errored tow...
[09:36:33] Analytics-Clusters, Patch-For-Review: Upgrade the rest of the Hadoop test cluster to Buster - https://phabricator.wikimedia.org/T278422 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-coord1002.eqiad.wmnet'] ` The log can be found in `/var/lo...
[09:37:10] !log reimage an-coord1002 to Debian Buster
[09:37:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:37:23] (this is the standby one, not the active :)
[09:48:42] (PS2) Hnowlan: Add makefile and dockerfile for local tests [analytics/aqs] - https://gerrit.wikimedia.org/r/676402
[10:14:42] (CR) Hnowlan: Add makefile and dockerfile for local tests (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/676402 (owner: Hnowlan)
[10:19:26] Analytics-Clusters, Patch-For-Review: Upgrade the rest of the Hadoop test cluster to Buster - https://phabricator.wikimedia.org/T278422 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-coord1002.eqiad.wmnet'] ` and were **ALL** successful.
[10:19:43] perfect, an-coord1002 up and running with debian
[10:53:18] * elukey lunch!
[12:20:06] Analytics, netops: Audit analytics firewall filters - https://phabricator.wikimedia.org/T279429 (elukey)
[12:27:46] Analytics, netops: Audit analytics firewall filters - https://phabricator.wikimedia.org/T279429 (ayounsi) There is also a term permitting UDP fragments, I added a "count" to know if/why we're using it.
[12:52:01] (CR) Fdans: "last changes applied, merging" [analytics/refinery] - https://gerrit.wikimedia.org/r/658348 (https://phabricator.wikimedia.org/T265732) (owner: Fdans)
[12:52:10] (CR) Fdans: [V: +2 C: +2] Add monthly pageview complete job [analytics/refinery] - https://gerrit.wikimedia.org/r/658348 (https://phabricator.wikimedia.org/T265732) (owner: Fdans)
[13:06:04] Analytics, Event-Platform, Inuka-Team: InukaPageView Event Platform Migration - https://phabricator.wikimedia.org/T267344 (SBisson) @Ottomata events from local dev environment (with version 0.0.0) should all be coming in through the new system but this hasn't been released to the stores yet. I'll let...
[13:07:42] Analytics-Clusters, Patch-For-Review: Upgrade the Hadoop coordinators to Debian Buster - https://phabricator.wikimedia.org/T278424 (elukey) The an-coord1002 host is now running Buster, puppet looks working (already tested in Hadoop test). The only thing that I had to do was to chown mysql:mysql /srv/sqld...
[13:46:29] Analytics-Clusters, Patch-For-Review: Upgrade the Hadoop coordinators to Debian Buster - https://phabricator.wikimedia.org/T278424 (Ottomata) > I know that @Ottomata has some reservations about /srv so I'll not proceed :) Oh oh oh, no no my reservations are not valid here. We use /srv at WMF and we shou...
[13:49:45] hey all
[13:51:46] hewo
[13:53:25] Analytics-Clusters, Analytics-Kanban, DBA, Data-Services, and 2 others: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Milimetric) > - @JAllemandou may have some performance questions to add related to indexes IIRC, leaving a note in here to re...
[13:54:50] Analytics-Clusters, Analytics-Kanban, DBA, Data-Services, and 2 others: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Marostegui) >>! In T269211#6974963, @elukey wrote: > Almost! There are a couple of things left: > > - clouddb1021 is still r...
[13:56:37] Analytics, Cloud-Services, Data-Persistence (Consultation): Sqoop on multi-instance clouddb1021 is very slow for some tables - https://phabricator.wikimedia.org/T279095 (Marostegui) We can try to give wikidata (s8) more memory and remove it from some other sections, ie (from s5 and s6) That table its...
[14:53:44] Analytics-Clusters, Analytics-Kanban, Technical-blog-posts: Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop - https://phabricator.wikimedia.org/T277133 (srodlund) Sound great! Looking forward to reading it!
[15:00:56] Analytics: Data drifts between superset_production on an-coord1001 and db1108 - https://phabricator.wikimedia.org/T279440 (elukey)
[15:05:10] elukey: when you said that git is used to deploy aqs in the analytics namespace rather than scap, how does that work?
[15:08:57] hnowlan: IIRC puppet runs and ensures latest on the repo
[15:13:43] ah nice
[15:16:28] Hi team, good morning
[15:20:20] Analytics: Review the usage of dns_canonicalize=false for Kerberos - https://phabricator.wikimedia.org/T278353 (elukey) Summary for puppet: We need to avoid canonicalization for krb service principals like analytics-hive.eqiad.wmnet, since they are CNAMEs to specific hosts. For example, hosts behind `analyt...
[15:20:24] mornin!
[15:21:48] dcausse: o/ have a fix for event-utilities. ResourceLoader + BasicHttpClient didn't actually work properly, because non 2xx response body was being used!
[15:22:04] so a 404 to a schema uri would try to make a Json schema out of a 404 response body
[15:22:05] https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/677287
[15:22:43] ottomata: o/, looking
[15:33:29] actually got a couple of changes to make to BasicHttpClient after reading again, but basically the same idea
[15:35:59] trying to understand why it happened, I guess it's because previously it used the guava Resources and failed on 404?
[15:36:08] yeah
[15:36:13] i think so
[15:37:36] dcausse: pushed new patch, slight change to BasicHttpClient
[15:37:54] refreshing
[15:40:44] heya a-team, it looks like many of us have an SRE interview loop meeting at the same time as standup today
[15:40:53] perhaps we should just cancel?
[15:41:06] ah, didn't know, sure
[15:41:19] whenever joal is around I can sync up on gobblin with him
[15:41:22] it goes into staff, so I guess cancel that too? fdans?
[15:58:10] Heya milimetric - let's sync after the circle-up meeting?
[15:58:18] yep, ping me
[16:05:31] So no standup meeting?
[16:05:46] correct klausman - a lot of the team is in the circle-up
[16:08:06] Alrighty, see you in the staff meeting, then?
[16:09:00] klausman: we'll be 15mins late, but I'd like to have this as I have arrangements to make in the near future that I'd like the team to know about
[16:11:14] Sure, np. I can always stare at Puppet and k8s some more :)
[16:15:23] klausman: you staring at k8s makes me feel I'm not alone in seeing the emoji in this name :)
[16:38:47] dcausse: why UncheckedIOException for an unknown exception in getAsBytes?
[16:39:58] are we doing some quick standup or just skip to tomorrow?
[16:40:07] and or staff?
[16:40:19] brb in 5 :)
[16:40:27] also dcausse, q about the constructors that read from a resource comment. not sure I understand that one
[16:40:34] we're in staff meeting with Tobias
[16:56:18] milimetric: cave?
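[editor's note] The 15:21-15:22 messages describe a non-2xx response body being handed to the JSON schema parser. A minimal sketch of the kind of guard being discussed follows. It is not the actual wikimedia-event-utilities code: `SchemaFetchSketch`, `Response` and `bodyOrThrow` are hypothetical names made up for illustration, and the real BasicHttpClient API may look quite different.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SchemaFetchSketch {

    /** Hypothetical stand-in for an HTTP response: a status code plus the raw body. */
    public static final class Response {
        final int status;
        final byte[] body;

        public Response(int status, byte[] body) {
            this.status = status;
            this.body = body;
        }

        /** True only for 2xx status codes. */
        public boolean isSuccess() {
            return status >= 200 && status < 300;
        }
    }

    /**
     * Returns the body as a UTF-8 string, but only for 2xx responses.
     * Any other status is turned into an IOException, so that an error page
     * (for example a 404 body) is never passed on to a JSON schema parser.
     */
    public static String bodyOrThrow(String uri, Response response) throws IOException {
        if (!response.isSuccess()) {
            throw new IOException(
                "Request to " + uri + " failed with HTTP status " + response.status);
        }
        return new String(response.body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical URI, used only to label the error message.
        String uri = "https://schema.example.org/missing/1.0.0";
        Response notFound = new Response(404, "Not Found".getBytes(StandardCharsets.UTF_8));
        try {
            bodyOrThrow(uri, notFound);
        } catch (IOException e) {
            System.out.println("refused to parse error body: " + e.getMessage());
        }
    }
}
```

Throwing a checked IOException here is only one option; the log itself goes on to weigh IOException against UncheckedIOException.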
[17:02:50] ottomata: for UncheckedIOException I think it's never an "unknown exception", httpClient always throws an IOException (this comment only applies if you decide to restrict BasicHttpResponse#exception to IOException)
[17:06:08] for the constructor thing, when I read: new Thing(resource); I understand that "resource" is kept as a reference
[17:06:33] which feels wrong if resource is closed just after "new Thing(resource)"
[17:07:23] (it's all about code-style) nothing is incorrect in what you wrote
[17:10:47] dcausse: ah ok, what if i change that from a constructor to a static factory method?
[17:10:53] that probably makes more sense eh?
[17:11:03] same args, just not a constructor?
[17:11:32] ottomata: yes it would
[17:11:45] cool, and i'll see about just doing IOException. ty
[17:16:15] Hello everyone, I tried the new Jupyter hub, but I can't get past `Error: [Errno 30] Read-only file system: '/run/jupyter-urbanecm-singleuser'`. What am I doing wrong? :-)
[17:22:02] Analytics: Produce a list of wiki projects ranked by number of eligible voters in Board elections - https://phabricator.wikimedia.org/T278815 (JAllemandou) There you go @Qgil :) ` spark.sql(""" WITH base_data AS ( SELECT wiki_db, event_user_id, MAX(event_user_revision_count) AS...
[17:27:42] Urbanecm: that's a strange error, at what point do you get that?
[17:27:49] and on which stat box?
[17:28:39] ottomata: stat1005. I gave it my LDAP password, clicked "Start", and then the error showed
[17:28:56] when i clicked start here, the error showed https://usercontent.irccloud-cdn.com/file/Xof1r8j9/image.png
[17:29:09] so it looks like this https://usercontent.irccloud-cdn.com/file/3i9kDks3/image.png
[17:29:20] ottomata: is that helpful?
[17:32:52] Urbanecm: can you try again ? i'm watching logs
[17:32:59] sure.
[17:33:17] ottomata: done, same error.
[17:33:31] i can try to log out log in if you think that can be helpful
[17:33:58] i see it, very strange. Can you try on a different stat box?
[17:35:04] ottomata: sure. Any particular box you want me to use?
[17:35:23] any is fine, just want to see if it happens to you in multiple places or if this is just a stat1005 issue
[17:36:07] okay, trying with stat1006
[17:36:41] and i have the same error there
[17:36:56] and i also tried it in incognito window, so it should not be affected by stat1005-issued cookies
[17:37:24] huh very weird
[17:38:38] ok i have to go afk for a bit, very strange that it works for me and others, i wonder if there is some reason your account is using /run...maybe some user specific setting you have? not sure, will be back and try it
[17:38:41] in a little while
[17:38:47] sure
[17:38:56] ottomata: should i create a phab ticket about it?
[18:32:28] * elukey afk!
[18:32:53] razzi: I am going to open the tasks tomorrow if you don't mind, got caught up in an incident in #operations, going to dinner now!
[18:33:15] razzi: do I see you trying jupyter on stat1005 too?
[18:33:22] elukey: ok have a good dinner!
[18:33:34] ottomata: yeah, figured I'd see if I could repro the error, and I did!
[18:33:37] yeah!
[18:33:38] great!
[18:33:41] let's figure that out then
[18:33:42] Read-only file system: '/run/jupyter-razzi-singleuser'
[18:33:52] yeah i see that too was still tailing logs
[18:34:45] hmm ok so this is def jupyterhub failing...dunno why for me it isn't
[18:34:46] hm
[18:35:27] oh i got it to repro too after logging out and back in!
[18:35:49] i can see that systemd is not configured to allow jupyterhub-conda to write to /run
[18:35:54] but why was it working before but not now?
[18:36:15] so it sounds that I'm not the only one who can't use the new system?
[18:36:22] yup, all 3 of us now :D
[18:36:28] that's good in a way :D
[18:36:34] ya
[18:36:36] * Urbanecm doesn't like Urbanecm-specific bugs
[18:36:45] :)
[18:39:49] weird i just restarted jupyterhub-conda and now am getting traitlets.traitlets.TraitError: 'options_form' is not a trait of 'CondaEnvProfilesSpawner' instanc
[18:39:51] what has changed??
[19:00:28] * razzi taking a lunch break
[19:13:04] Analytics, Analytics-Kanban, Discovery, Product-Analytics, Research: New anaconda-wmf release with updated packages - https://phabricator.wikimedia.org/T271960 (Ottomata) Open→Resolved
[19:13:06] Analytics, Patch-For-Review: Prep for replacing jupyter conda migration - https://phabricator.wikimedia.org/T262847 (Ottomata)
[20:06:42] Analytics-Clusters: Icinga/MegaRAID alert on an-worker1100 - https://phabricator.wikimedia.org/T279475 (razzi)
[20:26:59] (CR) Razzi: "This was to see if the latest superset had fixed some formatting issues; unfortunately this is not the case, so I opened https://github.co" [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/677055 (owner: Razzi)
[20:27:23] (Abandoned) Razzi: Update superset to f19f2c3 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/677055 (owner: Razzi)
[20:33:41] razzi: Hi, could you hang out in -operations. There is an alert related to the latest merge
[20:33:52] the victorops-analytics group thing
[20:34:01] icinga doesn't like the config
[20:34:08] yep, thanks mutante, on my way
[20:34:16] thanks
[21:01:53] Analytics, Analytics-Kanban: Jupyter conda bug: cannot spawn new server - https://phabricator.wikimedia.org/T279480 (Ottomata)
[21:02:00] Urbanecm: i filed https://phabricator.wikimedia.org/T279480
[22:15:04] Analytics, Analytics-Kanban: Jupyter conda bug: cannot spawn new server - https://phabricator.wikimedia.org/T279480 (Ottomata) Ah, this is why /run is being used (since the recent anaconda-wmf release) https://github.com/jupyterhub/systemdspawner/commit/1b83c2738f4a3fa34b5242da9cb098a04901e6dc
[22:34:32] !log rebalance kafka partitions for webrequest_text_13,14
[22:34:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[23:08:02] Analytics, Better Use Of Data, Performance-Team, Product-Analytics, and 2 others: Switch mw.user.sessionId back to session-cookie persistence - https://phabricator.wikimedia.org/T223931 (Krinkle) Resolved→Open >>! In T223931#6976622, @TheDJ wrote: > Shouldn't this cookie have been added t...