[00:36:04] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Cleanup, 10Operations, 10Traffic, and 4 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10Jdforrester-WMF) [00:53:55] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Cleanup, 10Operations, 10Traffic, and 4 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10Jdforrester-WMF) [00:58:19] (03PS1) 10Jforrester: production: Drop SkinPerPage, EUCopyrightCampaign, and EUCopyrightCampaignSkin [tools/release] - 10https://gerrit.wikimedia.org/r/574617 (https://phabricator.wikimedia.org/T238803) [01:00:19] (03CR) 10Jforrester: [C: 03+2] "I9bef107bf534fbf9fed943e8931312a8e60bce56 is live." [tools/release] - 10https://gerrit.wikimedia.org/r/574617 (https://phabricator.wikimedia.org/T238803) (owner: 10Jforrester) [01:00:47] (03Merged) 10jenkins-bot: production: Drop SkinPerPage, EUCopyrightCampaign, and EUCopyrightCampaignSkin [tools/release] - 10https://gerrit.wikimedia.org/r/574617 (https://phabricator.wikimedia.org/T238803) (owner: 10Jforrester) [01:05:16] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Cleanup, 10Operations, 10Traffic, and 4 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10Jdforrester-WMF) [01:06:31] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Cleanup, 10Operations, 10Traffic, and 4 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10Jdforrester-WMF) [01:07:39] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Cleanup, 10Operations, 10Traffic, and 4 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10Jdforrester-WMF) [01:09:07] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Cleanup, 10Operations, 10Traffic, and 4 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10Jdforrester-WMF) [01:12:36] 
10Release-Engineering-Team-TODO, 10Cleanup, 10Operations, 10Traffic, and 3 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10CCicalese_WMF) Thank you for all of your work on this, @Jdforrester-WMF! [01:12:50] 10Release-Engineering-Team-TODO, 10Cleanup, 10Operations, 10Traffic, and 3 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10Jdforrester-WMF) a:05Jdforrester-WMF→03None This is now done as much as we can at RelEng's side. Assigning back over to CPT for the task tr... [01:15:40] 10Release-Engineering-Team-TODO, 10Cleanup, 10Operations, 10Traffic, and 3 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10CCicalese_WMF) a:03CCicalese_WMF [01:19:53] 10Deployments, 10Release-Engineering-Team, 10serviceops, 10Performance-Team (Radar): Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (10Jdforrester-WMF) The idea of using mtime was always to avoid having to inspect the contents of the IS file itself. M... [02:39:02] 10Continuous-Integration-Infrastructure: beta-mediawiki-config-update-eqiad no works - https://phabricator.wikimedia.org/T245770 (10Zoranzoki21) 05Resolved→03Open This happens again. See: {F31629986} [02:52:10] Hey, I was wondering if i could be added to the deployment-prep group? Basically I just want access to logstash-beta.wmflabs.org in order to check to see if there are any csp error reports i don't know about on beta wikis [03:26:03] bawolff: it's viral. anyone in this group can do so via horizon. https://openstack-browser.toolforge.org/project/deployment-prep#admins [03:26:47] Cool, as someone in the group, can you infect me? 
:P [03:38:08] 10Deployments, 10Release-Engineering-Team, 10serviceops, 10Performance-Team (Radar): Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (10Krinkle) If I understand @thcipriani 's conclusion correctly, the problem is that code execution from disk (opcache)... [03:55:52] 10Phabricator, 10Release-Engineering-Team (Development services), 10Release-Engineering-Team-TODO, 10User-brennen: Dockerize our Phabricator development environment - https://phabricator.wikimedia.org/T245575 (10mmodell) Do you think we have a good base-image to use for this? Phab works with Apache or ngi... [07:59:18] 10Continuous-Integration-Infrastructure, 10Wikimedia-GitHub, 10User-MarcoAurelio: Reenable tests for github.com/wikimedia/texvcjs - https://phabricator.wikimedia.org/T245344 (10Physikerwelt) 05Open→03Resolved a:03MarcoAurelio @MarcoAurelio thank you. Works great. @Jdforrester-WMF for me Travis works... [08:55:12] 10Deployments, 10Release-Engineering-Team, 10serviceops, 10Performance-Team (Radar): Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (10Joe) >>! In T236104#5914485, @Krinkle wrote: > If I understand @thcipriani 's conclusion correctly, the problem is t... [09:01:10] 10Deployments, 10Release-Engineering-Team, 10serviceops, 10Performance-Team (Radar): Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (10Joe) I think we need to go at this another way. We should refresh the json cache only when `filemtime('InitialiseSet... 
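(Editorial sketch, not part of the log.) The T236104 comments above discuss keying the settings cache on the mtime of the InitialiseSettings file rather than on its contents. Joe's exact condition is truncated in the log, so the following is only a minimal illustration of the general mtime-keyed cache idea, in Python rather than the PHP used by wmf-config. The names `load_settings`, `build`, and the cache file layout are hypothetical.

```python
import json
import os
import tempfile


def load_settings(source_path, cache_path, build):
    """Return settings, rebuilding the JSON cache only when the source mtime differs.

    `build` is a hypothetical callable that parses the source file into a
    JSON-serialisable dict; it is the expensive step the cache avoids.
    """
    source_mtime = os.stat(source_path).st_mtime_ns
    try:
        with open(cache_path, encoding="utf-8") as fh:
            cached = json.load(fh)
        if (
            isinstance(cached, dict)
            and cached.get("mtime") == source_mtime
            and "settings" in cached
        ):
            return cached["settings"]  # cache matches the current source mtime
    except (OSError, ValueError):
        pass  # missing or corrupt cache: fall through and rebuild

    settings = build(source_path)

    # Write to a temporary file and rename it into place, so a concurrent
    # reader never sees a half-written cache file.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"mtime": source_mtime, "settings": settings}, fh)
    os.replace(tmp_path, cache_path)
    return settings
```

This sketch only covers the "compare mtimes, rebuild on mismatch" step. It does not model the opcache interaction that the task describes, where the code executed from memory may lag behind the file on disk.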
[09:01:15] 10Project-Admins, 10Community-Tech: Create a Phabricator project tag for "Commons deletion notification bot" - https://phabricator.wikimedia.org/T229759 (10Aklapper) Sigh, I cannot even find out where its code base is when looking at https://meta.wikimedia.org/wiki/Community_Tech/Commons_deletion_notification_bot [10:04:18] 10Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (10Tobi_WMDE_SW) >>! In T706#5906592, @Aklapper wrote: >>>! In T706#5880750, @Tobi_WMDE_SW wrote: >> can you please add @Lena_WMDE to the #acl_project-admins group? > > Hi, I've added... [10:35:35] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Port scap to Python 3 - https://phabricator.wikimedia.org/T246025 (10LarsWirzenius) p:05Triage→03Medium [10:56:12] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Fix parent permissions; inherit from `operations/software` [software/nss-dnsdc] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/574147 (owner: 10MarcoAurelio) [10:56:17] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] "Thanks!" [software/nss-dnsdc] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/574147 (owner: 10MarcoAurelio) [11:05:40] 10Release-Engineering-Team, 10Operations, 10serviceops: mcrouter proxies and scap proxies - https://phabricator.wikimedia.org/T245841 (10jijiki) If there are no objections, I would like to proceed with this [11:13:06] 10Project-Admins, 10Community-Tech: Create a Phabricator project tag for "Commons deletion notification bot" - https://phabricator.wikimedia.org/T229759 (10Samwilson) The source code is https://github.com/wikimedia/CommonsNotifier I've added a few links on Meta to make it easier. 
[12:16:48] 10Release-Engineering-Team, 10MediaWiki-General, 10Regression: All logged out UI is English - https://phabricator.wikimedia.org/T246095 (10TheDJ) [13:09:00] 10Release-Engineering-Team, 10MediaWiki-General, 10Regression: All logged out UI is English - https://phabricator.wikimedia.org/T246095 (10Pcoombe) [13:20:41] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, 10Regression: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Pcoombe) [13:23:24] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, 10Regression: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Lucas_Werkmeister_WMDE) >>! In T246071#5915117, @Pcoombe wrote: > https://gerrit.wikime... [13:34:50] !log set number of replicas for es indices to 0 on deployment-logstash03 [13:34:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:51:56] 10Phabricator: Weekly phabricator-reports mail: List active Herald rules authored by users not recently active - https://phabricator.wikimedia.org/T246105 (10Aklapper) p:05Triage→03Low [14:01:24] Would be very great if someone could look at / deploy the UBN patch in https://phabricator.wikimedia.org/T246071 [14:02:22] hmm, that sounds very bad if it happens to logged out users [14:02:45] it is very bad, I'd say it should be deployed ASAP [14:03:27] 10Phabricator: Weekly phabricator-reports mail: List active Herald rules authored by users not recently active - https://phabricator.wikimedia.org/T246105 (10Aklapper) (Realized this when looking at `H313` whose author was last active in April 2019.) 
[14:03:35] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Bawolff) Is this causing cache pollution? If so, this should be merged ASAP to limit the... [14:06:40] yikes, didn’t think about the cache impact [14:07:11] I think I can deploy it now, jouncebot says nothing else happening at the moment [14:07:30] Lucas_WMDE: that would be great... I was waiting for someone to say go ahead, but this seem severe enough to just do it [14:07:57] Thank you. [14:09:33] Hmm, not sure if its actually causing cache pollution. Haven't been able to reproduce cache pollution so far. maybe vary: accept-language actually is respected by varnish [14:10:11] in that case it sounds like it might cause the cache to blow up though [14:10:19] if it has to cache many copies of the same page for different Accept-Language headers… [14:11:35] Well the site hasn't gone down, so it can't be that bad ;) [14:13:14] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Nikerabbit) Either caches are being polluted, or they are getting split (Vary: Accept-Lan... [14:14:02] bawolff: yeah, I just came to the same conclusion [14:14:21] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Bawolff) I just did some quick testing with curl. At a glance, it looks like varnish is r... 
[14:14:31] bawolff: interesting to look at metrics afterwards, how much of an impact this caused [14:15:39] I’d also be interested [14:15:52] because in principle “respect Accept-Language” sounds like a nice thing ^^ [14:15:53] +1 [14:16:36] Accept-language does have some problems to being used in general. Often its different from what users expect it to be, and they don't know how to change it [14:17:02] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Lucas_Werkmeister_WMDE) Hm. The config change was deployed, and it seemed to do the right... [14:17:06] Although maybe better than nothing for a multilingual site like meta [14:17:10] + it needs normalization [14:18:05] I do like the commons approach though where they put up a little notice, "do you want to switch to language X?" [14:18:21] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Nemo_bis) >>! In T246071#5915756, @Lucas_Werkmeister_WMDE wrote: > Hm. The config change... [14:20:23] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Lucas_Werkmeister_WMDE) Yup, now it’s Chinese too. [ruwiki](https://ru.wikipedia.org/wiki... [14:24:30] I’m still seeing English UI when I visit the start pages of several Wikipedias in private mode :S [14:24:47] I wonder if there’s cache pollution only *now*, when MediaWiki is (I assume) no longer sending that Vary: Accept-Language? 
[14:25:03] https://grafana.wikimedia.org/d/000000550/mediawiki-application-servers?orgId=1&from=now-24h&to=now increased CPU usage, slow response rate up, more responses coming through
[14:25:19] Lucas_WMDE: hmm quite possible depending how it is done... does ?action=purge fix it?
[14:25:31] Lucas_WMDE: Does the x-cache header show it as a cached response?
[14:25:32] huh
[14:25:34] it does not
[14:25:44] *purge doesn’t fix it
[14:25:56] x-cache does show it as a hit
[14:26:00] And is it a 304 not modified or a 200 ok
[14:26:18] 200 ok
[14:26:26] fawiki is the one I’m trying it with atm
[14:26:37] Lucas_WMDE: do you have a language cookie in that session?
[14:27:12] I don’t think so
[14:27:27] Can confirm, i also get https://fa.wikipedia.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C as english language :(
[14:27:44] hm, but with `curl` it seems to be in Farsi as it should be
[14:27:52] farsi to me as well
[14:28:56] this makes no sense
[14:29:00] but I didn't visit it before
[14:29:00] https://fa.wikipedia.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C?action=abc gives me an English error message
[14:29:01] are you using same accept-encoding with curl?
(I have no idea what curl default is)
[14:29:11] the pages I visited before still seem to serve bad version
[14:29:15] surely that error page isn’t cached anywhere
[14:29:20] bawolff: good point
[14:29:41] yeah, if I add -H 'Accept-Language: en', I get English too
[14:29:45] with curl
[14:30:02] so if something was cached with Accept-Language, it is still being served from cache, it looks
[14:30:40] hmm no, also happens on random page feature
[14:30:46] not sure what is happening
[14:30:57] no, this is definitely an uncached page for the error message
[14:31:11] and it still seems to be respecting accept-language
[14:31:29] $wgULSLanguageDetection is false on fawiki according to shell.php though
[14:31:35] huh
[14:31:35] I still see vary: Accept-Encoding,Accept-Language,Cookie,Authorization
[14:32:06] I’ll check an actual mw* appserver
[14:32:43] hm, do those not have mwscript?
[14:32:50] I seem to pretty consistently being served by mw1332
[14:33:34] I’m getting different servers, I think
[14:33:41] e. g. mw1267, mw1327
[14:34:35] curl -I -H 'accept-language: fr' 'https://fa.wikipedia.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C?action=abcd' seems to be working ok though
[14:34:45] but just abc is not
[14:35:27] I guess i'm leaning towards the patch not being on all servers?
[14:35:41] I think so
[14:35:55] *now* I’m getting Russian interface on ruwiki with ?action=garbage
[14:36:00] except with ?action=abc
[14:36:16] maybe ?action=abc is cached. I see age: 254
[14:36:19] so I guess ?action=abc is cached somewhere
[14:36:21] yeah
[14:36:50] anyone from traffic team online?
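(Editorial sketch, not part of the log.) The discussion above turns on how a response sent with `Vary: Accept-Language` is stored once per header value, and on how such stored variants can keep being served after the origin stops sending that Vary header. The toy model below illustrates that behaviour under simplifying assumptions: it is not Varnish's implementation, it ignores TTLs and purges, and `ToyCache` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def _lower_keys(headers: Dict[str, str]) -> Dict[str, str]:
    """Header names are case-insensitive, so compare them in lowercase."""
    return {name.lower(): value for name, value in headers.items()}


@dataclass
class CachedObject:
    body: str
    # Request header values captured at store time for each header named in Vary.
    variant: Dict[str, str]


class ToyCache:
    """Toy HTTP cache: one list of stored variants per URL."""

    def __init__(self) -> None:
        self._objects: Dict[str, List[CachedObject]] = {}

    def store(self, url: str, request_headers: Dict[str, str],
              body: str, vary: List[str]) -> None:
        req = _lower_keys(request_headers)
        variant = {name.lower(): req.get(name.lower(), "") for name in vary}
        self._objects.setdefault(url, []).append(CachedObject(body, variant))

    def lookup(self, url: str, request_headers: Dict[str, str]) -> Optional[str]:
        req = _lower_keys(request_headers)
        for obj in self._objects.get(url, []):
            # A stored object matches when the request carries the same values
            # for every header that object varied on.
            if all(req.get(name, "") == value for name, value in obj.variant.items()):
                return obj.body
        return None

    def variant_count(self, url: str) -> int:
        return len(self._objects.get(url, []))


if __name__ == "__main__":
    cache = ToyCache()
    url = "/wiki/Main_Page"
    # While the origin sends Vary: Accept-Language, each language is a separate entry.
    cache.store(url, {"Accept-Language": "en"}, "English UI", ["Accept-Language"])
    cache.store(url, {"Accept-Language": "de"}, "German UI", ["Accept-Language"])
    # After the fix the origin no longer varies on Accept-Language.
    cache.store(url, {"Accept-Language": "fr"}, "Content-language UI", [])
    # The older per-language entry still matches an "en" request.
    print(cache.lookup(url, {"Accept-Language": "en"}))
    print(cache.lookup(url, {"Accept-Language": "fr"}))
```

In this model the old per-language entries remain until something removes them, which is one plausible reading of why some URLs kept returning the wrong interface language for a while after the config change.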
[14:36:53] oh yeah, that response still has Vary: Accept-Language too
[14:37:13] and ?action=abcd doesn’t (only Vary Accept-Encoding,Cookie,Authorization
[14:37:18] Yeah, and for example, language de is returning farsi: curl -H 'accept-language: de' 'https://fa.wikipedia.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C?action=abc'
[14:37:45] ok, so maybe Varnish doesn’t revalidate those cached entries?
[14:38:15] and ?action=purge won’t send enough purges to Varnish for all the languages?
[14:38:33] i think purges only do canonical urls, so it wouldn't hit ?action=abc anyways
[14:38:37] oh right, it most likely won't purge vary: accept-language variants
[14:38:55] bawolff: yeah but I still get English on arwiki main page for instance
[14:39:09] (in browser, haven’t tried curl)
[14:39:09] Looking at https://grafana.wikimedia.org/d/000000550/mediawiki-application-servers?orgId=1&from=now-2d&to=now there is no drop similar to the increase from when this was introduced
[14:39:32] in fact it's looking as if it is getting worse atm
[14:39:56] I don't anymore (Arwiki main page) in my browser
[14:40:47] then we probably have different Accept-Language headers
[14:41:05] Lucas_WMDE: What's your accept-language and accept-encoding headers?
[14:41:45] in Firefox, en-US,en;q=0.5 and gzip, deflate, br
[14:41:57] Nikerabbit: on the other hand icinga reports being happier for https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=appserver&var-method=GET
[14:41:57] with curl and just Accept-Language: en I get Arabic
[14:42:42] Nemo_bis: promising
[14:43:42] aaand arwiki is Arabic to be now
[14:43:42] Krinkle: FYI discussion above, it likely affects performance metrics as well
[14:44:04] I can confirm that curl -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' 'https://ar.wikipedia.org/wiki/%D8%A7%D9%84%D8%B5%D9%81%D8%AD%D8%A9_%D8%A7%D9%84%D8%B1%D8%A6%D9%8A%D8%B3%D9%8A%D8%A9' | gunzip - | less gives me arabic
[14:44:26] (s/to be/to me/ in my last message)
[14:44:41] interestingly it only seemed to start going down after second sync-file by Amir some minutes ago
[14:44:53] hm
[14:44:58] https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-24h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=appserver&var-method=GET
[14:45:05] hmm, maybe it just took a while to fall out of front caches.
I'm not sure I still really understand the varnish purging architecture as much as i used to [14:45:05] it's very clear from this graph [14:45:27] bawolff: yeah, everything seems to use the content language for me now [14:45:32] unless it's 30 min cache thing [14:45:34] https://fa.wikipedia.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C?action=abc needed a Ctrl+F5 first but then it went back [14:45:53] (back to Farsi, that is, from an English version that was presumably cached in-browser) [14:46:55] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Lucas_Werkmeister_WMDE) Current status seems to be that the appservers behave correctly n... [14:47:11] I did sync the right file though, I think [14:47:17] wmf-config/InitialiseSettings.php [14:48:08] and Amir synced a Wikibase dir, not sure why that would affect wmf-config o_O [14:48:45] unless that was what actually reduced the load [14:48:48] Race condition with some sort of opcode cache perhaps? [14:49:40] T236104 [14:49:41] ? [14:49:41] T236104: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 [14:50:48] sounds like that’s it [14:51:07] pheew [14:51:21] thanks Lucas_WMDE and bawolff, I guess I monitor the graphs for a while just in case [14:52:35] thank you. [14:52:38] avg. response time is now almost back at the pre-issue levels [14:52:52] any follow-ups needed? [14:53:26] does this deserve incident documentation? 
[14:53:41] (very foolish of me to ask, I don’t really want to write it ^^) [14:54:07] Lucas_WMDE: prolly good idea for learning, didn't cause outage but could have [14:57:18] ok, I’ll start [14:57:32] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Nikerabbit) It looks like we hit {T236104}. Looking at graphs things seems to be returni... [14:58:12] Lucas_WMDE: okay, I can fill in the rest [14:58:46] 10Beta-Cluster-Infrastructure, 10Operations, 10observability, 10serviceops: Stream a subset of mediawiki apache logs to logstash - https://phabricator.wikimedia.org/T244472 (10jijiki) After a lot of fiddling with @herron, we are finally at this https://phabricator.wikimedia.org/P10513 ! The resource field... [15:02:21] page is at https://wikitech.wikimedia.org/wiki/Incident_documentation/20200225-mediawiki_interface_language, though I’m still editing it [15:02:40] Thanks everyone! lmk if you want any input from me for the incident documentation. I was looking at the bug but didn't realise it had potential impact on site stability or would have tried to escalate further [15:08:40] Lucas_WMDE: let me know when you have stopped [15:10:57] I think I’m done for now [15:11:27] Nikerabbit: ^ [15:11:53] oops, I have the wrong task number in the Conclusions [15:20:16] 10Deployments, 10Release-Engineering-Team, 10serviceops, 10Performance-Team (Radar), 10Wikimedia-Incident: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (10Lucas_Werkmeister_WMDE) Adding #wikimedia-incident for [20200225-mediawiki interface languag... 
[15:30:16] the timeline still confuses me [15:30:28] the spike in Grafana is at 00:46 [15:30:48] which is *before* the SAL entry “Remove all IS config related to the fixcopyrightwiki wiki”, 00:53 [15:31:26] Lucas_WMDE: there may be spikes during deployments, didn't they deploy something before that already? [15:31:59] well, “spike” is maybe the wrong word [15:32:11] e. g. avg response time went from ~180ms to ~230ms at that time [15:32:15] and then stayed there for a long time [15:32:18] https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&from=1582581600000&to=1582642800000&var-datasource=eqiad%20prometheus%2Fops&var-cluster=appserver&var-method=GET [15:32:56] Nikerabbit: there were some other syncs before that, yes, I just wouldn’t have expected those to cause the issue [15:33:00] 0:46 is the dblists/ sync [15:33:08] https://tools.wmflabs.org/sal/production?p=0&q=Synchronized&d=2020-02-25 [15:34:23] hmm [15:36:34] Lucas_WMDE: you're right, I can't explain [15:37:07] I’ll add it to the timeline, at least [15:40:10] unless sync-wikiversions also synced the settings files somehow [15:40:40] or if that sync caused something else to be slow [15:41:02] 10Deployments, 10Release-Engineering-Team, 10serviceops, 10Performance-Team (Radar), 10Wikimedia-Incident: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (10Krinkle) >>! In T236104#5914734, @Joe wrote: >>>! In T236104#5914485, @Krinkle wrote: >> * I... [15:41:05] and why did avg time really start going up at 12.30 [15:45:52] no idea [15:45:56] added that to the timeline too, though [15:46:03] 10Deployments, 10Release-Engineering-Team, 10serviceops, 10Performance-Team (Radar), 10Wikimedia-Incident: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (10Krinkle) >>! In T236104#5914744, @Joe wrote: > We should refresh the json cache only when `f... 
[15:46:52] Lucas_WMDE: logging happens when the sync is done. it starts earlier on most servers. [15:47:07] plus there's the delay of the config cache which means some servers only get it later or on the second attempt. [15:47:20] not sure if that helps :) [15:47:34] Krinkle: the log entry says the duration was 55s though, that wouldn’t be enough to explain the full difference [15:47:58] I could see the “second attempt” part, but then the response time should rise later, not earlier :D [15:48:06] I don’t think scap can violate causality ;) [15:51:27] (03CR) 10Jforrester: [C: 03+2] Make changes to MW_VERSION, not $wgVersion [tools/release] - 10https://gerrit.wikimedia.org/r/573646 (https://phabricator.wikimedia.org/T212738) (owner: 10Jforrester) [15:51:45] Lucas_WMDE: each sync there updates IS.php timestamp, giving it another chance at being applied [15:56:51] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Jdforrester-WMF) Eurgh. Sorry about this, all. I assumed ULS's defaults were deploy-safe... [15:57:57] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Lucas_Werkmeister_WMDE) Incident documentation is here: https://wikitech.wikimedia.org/wi... [16:00:57] 10Deployments, 10Release-Engineering-Team, 10serviceops, 10Performance-Team (Radar), 10Wikimedia-Incident: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (10Joe) >>! In T236104#5916074, @Krinkle wrote: >>>! In T236104#5914734, @Joe wrote: >>>>! In T... 
[16:01:03] I set the incident doc to review status btw [16:01:19] I think I’m done with it now [16:02:57] thanks again Lucas_WMDE [16:03:21] np :) [16:03:22] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Jdforrester-WMF) >>! In T246071#5916140, @Lucas_Werkmeister_WMDE wrote: > Incident docume... [16:03:23] glad to help [16:03:48] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Jdforrester-WMF) 05Open→03Resolved a:03Lucas_Werkmeister_WMDE [16:05:01] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Lucas_Werkmeister_WMDE) > None of the affected users will ever read Tech/News, or have an... [16:06:48] Lucas_WMDE: my guess is that the earlier rise isn't due to the ULS load regression, but rather either an unrelated elevation, or a "standard" elevation after a deployment whilst some caches get repopulated [16:07:07] but yeah it's mysterious indeed [16:07:14] https://grafana.wikimedia.org/d/000000580/apache-backend-timing?orgId=1&from=1582591339587&to=1582592060118 [16:07:23] scroll down there for a more detailed view [16:09:37] it’s interesting that the % of requests ≤ 50ms also went up at the same time [16:11:12] anyways, lunch, I’m back in ~40mins if anyone needs me ^^ [16:13:20] 10Release-Engineering-Team, 10Language-Team, 10MediaWiki-Interface, 10I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (10Nikerabbit) >>! In T246071#5916155, @Jdforrester-WMF wrote: >>>! 
In T246071#5916140, @Luc... [16:22:01] 10Continuous-Integration-Config, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Editing-team, 10VisualEditor: Firefox CI tests keep failing in VE - https://phabricator.wikimedia.org/T240955 (10JTannerWMF) Looks like the Release Engineering... [16:22:23] 10Continuous-Integration-Config, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10VisualEditor, 10Editing-team (Tracking): Firefox CI tests keep failing in VE - https://phabricator.wikimedia.org/T240955 (10JTannerWMF) [16:24:51] (03PS7) 10Krinkle: Make changes to MW_VERSION, not $wgVersion [tools/release] - 10https://gerrit.wikimedia.org/r/573646 (https://phabricator.wikimedia.org/T212738) (owner: 10Jforrester) [16:24:54] (03CR) 10Krinkle: [C: 03+2] Make changes to MW_VERSION, not $wgVersion [tools/release] - 10https://gerrit.wikimedia.org/r/573646 (https://phabricator.wikimedia.org/T212738) (owner: 10Jforrester) [16:25:24] (03Merged) 10jenkins-bot: Make changes to MW_VERSION, not $wgVersion [tools/release] - 10https://gerrit.wikimedia.org/r/573646 (https://phabricator.wikimedia.org/T212738) (owner: 10Jforrester) [16:25:51] (03PS4) 10Jforrester: Drop merge-wmf-branch [tools/release] - 10https://gerrit.wikimedia.org/r/563498 [16:29:12] 10Release-Engineering-Team-TODO, 10Performance-Team, 10Core Platform Team Workboards (Clinic Duty Team), 10MW-1.35-notes (1.35.0-wmf.20; 2020-02-18), and 3 others: Performance regression from Apcu/ExtensionRegistry::loadFromQueue on PHP7 - https://phabricator.wikimedia.org/T187154 (10Krinkle) [16:33:53] 10Beta-Cluster-Infrastructure, 10MediaWiki-API, 10Pywikibot, 10Core Platform Team Workboards (Clinic Duty Team), and 2 others: Attempt to login fails several times - https://phabricator.wikimedia.org/T224712 (10Xqt) >>! In T224712#5721064, @Xqt wrote: > Again in https://api.travis-ci.org/v3/job/621982242/l... 
[16:39:42] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Security Concept Review, 10Security-Team, 10SecTeam Discussion: Security Concept Review For new CI - https://phabricator.wikimedia.org/T240943 (10dduvall) Sounds good, @chasemp. Looking forward to giving this another go when we're ready. [16:41:15] (03PS1) 10Jforrester: layout: [mediawiki/extensions/Popups] Publish PHP documentation [integration/config] - 10https://gerrit.wikimedia.org/r/574790 (https://phabricator.wikimedia.org/T242779) [16:43:35] (03CR) 10Jforrester: [C: 03+2] layout: [mediawiki/extensions/Popups] Publish PHP documentation [integration/config] - 10https://gerrit.wikimedia.org/r/574790 (https://phabricator.wikimedia.org/T242779) (owner: 10Jforrester) [16:44:43] (03Merged) 10jenkins-bot: layout: [mediawiki/extensions/Popups] Publish PHP documentation [integration/config] - 10https://gerrit.wikimedia.org/r/574790 (https://phabricator.wikimedia.org/T242779) (owner: 10Jforrester) [16:52:21] !log Zuul: [mediawiki/extensions/Popups] Publish PHP documentation T242779 [16:52:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:52:25] T242779: Add Vector and Popups PHP documentation to doc.wikimedia.org - https://phabricator.wikimedia.org/T242779 [16:54:09] (03PS1) 10Jforrester: Link Popup's PHP documentation [integration/docroot] - 10https://gerrit.wikimedia.org/r/574797 (https://phabricator.wikimedia.org/T242779) [17:25:59] 10Continuous-Integration-Infrastructure, 10cloud-services-team (Kanban): Old cloudvirt (with Intel Xeon) are twice slower than new ones (Intel Sky Lake) - https://phabricator.wikimedia.org/T223971 (10Bstorm) p:05High→03Medium [17:27:09] 10Continuous-Integration-Infrastructure, 10cloud-services-team (Kanban): Old cloudvirt (with Intel Xeon) are half the speed of newer ones (Intel Sky Lake) - https://phabricator.wikimedia.org/T223971 (10Reedy) [17:27:35] 10Continuous-Integration-Infrastructure, 10cloud-services-team (Kanban): Old 
cloudvirt (with Intel Xeon) are half the speed of newer ones (Intel Sky Lake) - https://phabricator.wikimedia.org/T223971 (10Bstorm) This is likely to be fixed with the introduction of a number of things. One is to change to the perf... [17:47:52] PROBLEM - Parsoid on deployment-mediawiki-parsoid10 is CRITICAL: connect to address 172.16.0.141 and port 8000: Connection refused [17:47:52] PROBLEM - Parsoid on deployment-parsoid09 is CRITICAL: connect to address 172.16.5.63 and port 8000: Connection refused [17:57:49] (03CR) 10VolkerE: [C: 03+1] Link Popup's PHP documentation [integration/docroot] - 10https://gerrit.wikimedia.org/r/574797 (https://phabricator.wikimedia.org/T242779) (owner: 10Jforrester) [17:59:24] (03CR) 10Krinkle: [C: 03+2] Link Popup's PHP documentation [integration/docroot] - 10https://gerrit.wikimedia.org/r/574797 (https://phabricator.wikimedia.org/T242779) (owner: 10Jforrester) [18:00:11] (03Merged) 10jenkins-bot: Link Popup's PHP documentation [integration/docroot] - 10https://gerrit.wikimedia.org/r/574797 (https://phabricator.wikimedia.org/T242779) (owner: 10Jforrester) [18:41:11] thcipriani hi, around? :) [19:03:37] Krinkle: Ha. Thanks, was waiting to test and fix that patch. :-P [19:05:00] (03PS1) 10Jforrester: Make Popup's JS documentation go to the sub-directory [integration/docroot] - 10https://gerrit.wikimedia.org/r/574844 [19:05:24] (03CR) 10Jforrester: [C: 03+2] Make Popup's JS documentation go to the sub-directory [integration/docroot] - 10https://gerrit.wikimedia.org/r/574844 (owner: 10Jforrester) [19:06:51] (03Merged) 10jenkins-bot: Make Popup's JS documentation go to the sub-directory [integration/docroot] - 10https://gerrit.wikimedia.org/r/574844 (owner: 10Jforrester) [19:07:45] !log Ran `sudo -u doc-uploader git -C /srv/docroot pull` on doc1001. 
[19:07:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:09:40] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [19:09:48] Yeah yeah. [19:16:54] 10Beta-Cluster-Infrastructure, 10MediaWiki-API, 10Pywikibot, 10Core Platform Team Workboards (Clinic Duty Team), and 2 others: Attempt to login fails several times - https://phabricator.wikimedia.org/T224712 (10Dvorapa) The loop in line 3148 does seem correct to me [19:17:17] 10Release-Engineering-Team, 10Release-Engineering-Team-TODO, 10MediaWiki-Docker, 10Developer Productivity, 10User-brennen: Command-line wrapper for interacting with core's docker-compose stack - https://phabricator.wikimedia.org/T246111 (10brennen) p:05Triage→03Medium Yeah, I'm interested in this. B... [19:18:21] 10Beta-Cluster-Infrastructure, 10Privacy: Flush private data on Beta Cluster - https://phabricator.wikimedia.org/T189541 (10Tgr) 05Open→03Declined @Jcross {T77858} is probably the more important / urgent privacy work. Given that no one was bothered about this for the last two years, once T77858 happens we... [19:23:12] 10Deployments, 10Release-Engineering-Team, 10serviceops, 10Performance-Team (Radar), 10Wikimedia-Incident: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (10thcipriani) Example code above for anyone interested in fiddling with this: https://github.c... [19:26:40] 10Beta-Cluster-Infrastructure, 10MediaWiki-API, 10Pywikibot, 10Core Platform Team Workboards (Clinic Duty Team), and 2 others: Attempt to login fails several times - https://phabricator.wikimedia.org/T224712 (10Anomie) >>! In T224712#5913343, @Dvorapa wrote: > Beta Cluster is fixed, but still fails for zh.... 
[19:30:53] 10Release-Engineering-Team, 10Release-Engineering-Team-TODO, 10MediaWiki-Docker, 10Developer Productivity, 10User-brennen: Command-line wrapper for interacting with core's docker-compose stack - https://phabricator.wikimedia.org/T246111 (10kostajh) > Whoops... not sure what happened to that task, it was... [19:35:08] 10Release-Engineering-Team, 10Release-Engineering-Team-TODO, 10MediaWiki-Docker, 10Developer Productivity, 10User-brennen: Command-line wrapper for interacting with core's docker-compose stack - https://phabricator.wikimedia.org/T246111 (10kostajh) > Go seems fine to me, unless anyone has super strong fe... [19:35:26] 10Release-Engineering-Team, 10Release-Engineering-Team-TODO, 10MediaWiki-Docker, 10Developer Productivity, and 2 others: Command-line wrapper for interacting with core's docker-compose stack - https://phabricator.wikimedia.org/T246111 (10kostajh) [19:47:36] 10Deployments, 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10serviceops, and 2 others: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (10greg) [19:50:55] 10Gerrit, 10Operations: gerrit1002 running out of space - https://phabricator.wikimedia.org/T243808 (10thcipriani) 05Open→03Resolved a:03thcipriani Moved all the lfs files to a symlinked path under new disk on `/srv/lfs` (thanks @Dzahn): ` thcipriani@gerrit1002:~$ df -h Filesystem Size Used Avail... [19:50:58] 10Gerrit, 10Operations, 10vm-requests: Gerrit VM to test data migration - https://phabricator.wikimedia.org/T239151 (10thcipriani) [20:01:24] 10Release-Engineering-Team, 10Release-Engineering-Team-TODO, 10MediaWiki-Docker, 10Developer Productivity, and 2 others: Command-line wrapper for interacting with core's docker-compose stack - https://phabricator.wikimedia.org/T246111 (10brennen) > PHP has advantages in that it would be a lot easier for pe... 
[20:03:02] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [20:17:58] !log Beta jobs on Jenkins backed up; doing the restart dance. [20:18:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:21:21] Hmm. [20:21:35] James_F: Usually don't even need to restart [20:21:43] Just kill the jobs [20:22:02] https://integration.wikimedia.org/ci/monitoring/nodes/deployment-deploy01? lists deployment-deploy01 as running `quibble-vendor-mysql-php72-docker` job for 13 days. [20:22:10] Reedy: Yeah, I know. :-) [20:23:50] Possibly in this case we actually do need to restart it. [20:37:28] James_F: Could i be added to deployment-prep project so i can see logstash-beta.wmflabs.org ? [20:37:37] Err. [20:37:39] Probably. [20:37:47] That's a thing I do through Horizon, right? [20:37:53] I believe so [20:37:55] * James_F doesn't use WMCS stuff much. [20:38:06] Apparently anyone in the project has the right to add other people [20:38:19] https://openstack-browser.toolforge.org/project/deployment-prep [20:38:30] err, wait [20:38:33] Maybe someone added me [20:38:38] As I am listed as a user there [20:38:50] * bawolff whined about it last night but i didn't think anyone responded [20:39:44] So umm, sorry, never mind :) [20:39:48] * James_F grins. [20:39:55] Happy to, err, not help. :-) [20:40:56] lol [20:41:00] You were so fast, you solved it in negative time ;) [20:43:38] !log Running a manual sync for Beta Cluster. [20:43:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:47:47] Huh, of course as soon as I resort to a manual run, beta-scap-eqiad actually triggers.
[20:48:53] Project beta-scap-eqiad build #289401: 15ABORTED in 2 min 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/289401/ [21:00:06] !log gerrit: replication start mediawiki/extensions/MediaModeration --wait # New repo [21:00:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:08:28] Project beta-scap-eqiad build #289402: 04FAILURE in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/289402/ [21:19:06] Project beta-scap-eqiad build #289403: 04STILL FAILING in 8 min 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/289403/ [21:29:31] 10Release-Engineering-Team, 10Security-Team, 10Security, 10user-sbassett: Wikimedia deployers audit - https://phabricator.wikimedia.org/T237696 (10sbassett) [21:37:06] Project beta-scap-eqiad build #289404: 04STILL FAILING in 16 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/289404/ [21:37:55] 10Release-Engineering-Team, 10Security-Team, 10Patch-For-Review, 10Security, 10user-sbassett: Wikimedia deployers audit - https://phabricator.wikimedia.org/T237696 (10sbassett) @MoritzMuehlenhoff - patch is up, see above. I wanted to make this task public and give those being removed of their deployment... [21:39:25] 10Release-Engineering-Team, 10Security-Team, 10Patch-For-Review, 10Security, 10user-sbassett: Wikimedia deployers audit - https://phabricator.wikimedia.org/T237696 (10sbassett) [21:49:10] Yippee, build fixed! [21:49:11] Project beta-scap-eqiad build #289405: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/289405/ [21:49:17] 10Release-Engineering-Team, 10Security-Team, 10Patch-For-Review, 10Security, 10user-sbassett: Wikimedia deployers audit - https://phabricator.wikimedia.org/T237696 (10Dzahn) So should we add those users as subscribers here on the ticket so that they get notified? 
[22:09:11] 10Beta-Cluster-Infrastructure, 10MediaWiki-API, 10Pywikibot, 10Core Platform Team Workboards (Clinic Duty Team), and 2 others: Attempt to login fails several times - https://phabricator.wikimedia.org/T224712 (10Dvorapa) It happens only sometimes, which is weird, I'll investigate it further [22:14:48] deploying a change to update MW_VERSION to 1.35.0-wmf.21 [22:18:44] (03PS1) 10Jforrester: Follow-up 71454310b0: Fix regex to not double-up the 's [tools/release] - 10https://gerrit.wikimedia.org/r/574880 [22:40:27] 10Continuous-Integration-Infrastructure: beta-mediawiki-config-update-eqiad was stuck on 2020-02-19 - https://phabricator.wikimedia.org/T245770 (10Jdforrester-WMF) 05Open→03Resolved [22:41:00] 10Continuous-Integration-Infrastructure: beta-mediawiki-config-update-eqiad was stuck on 2020-02-19 - https://phabricator.wikimedia.org/T245770 (10Jdforrester-WMF) >>! In T245770#5914410, @Zoranzoki21 wrote: > This happens again. See: > {F31629986} Fixed, but this was a different issue. Please don't re-open tic... [22:44:54] (03Abandoned) 10Jeena Huneidi: Use the restbase chart from releases.wikimedia.org/charts [releng/local-charts] - 10https://gerrit.wikimedia.org/r/537562 (https://phabricator.wikimedia.org/T228915) (owner: 10Jeena Huneidi) [22:55:05] 10Release-Engineering-Team, 10Security-Team, 10Patch-For-Review, 10Security, 10user-sbassett: Wikimedia deployers audit - https://phabricator.wikimedia.org/T237696 (10sbassett) >>! In T237696#5917908, @Dzahn wrote: > So should we add those users as subscribers here on the ticket so that they get notified... 
[22:56:01] 10Release-Engineering-Team, 10Security-Team, 10Patch-For-Review, 10Security, 10user-sbassett: Wikimedia deployers audit - https://phabricator.wikimedia.org/T237696 (10sbassett) [23:02:09] (03PS1) 10Reedy: White Ori's personal email [integration/config] - 10https://gerrit.wikimedia.org/r/574890 [23:03:01] (03CR) 10Reedy: [C: 03+2] White Ori's personal email [integration/config] - 10https://gerrit.wikimedia.org/r/574890 (owner: 10Reedy) [23:04:03] (03Merged) 10jenkins-bot: White Ori's personal email [integration/config] - 10https://gerrit.wikimedia.org/r/574890 (owner: 10Reedy) [23:04:22] (03CR) 10Jforrester: [C: 03+2] White Ori's personal email [integration/config] - 10https://gerrit.wikimedia.org/r/574890 (owner: 10Reedy) [23:04:38] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/574890 [23:04:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [23:18:59] !log Added mayakpwiki to deployment-prep, user is already in the nda group and is working for WMF Product Analytics. Wants access to eventlog05 [23:19:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [23:57:53] (03PS3) 10Jforrester: jjb, layout: Add a job to test new phan changes on MW core [integration/config] - 10https://gerrit.wikimedia.org/r/574231 (https://phabricator.wikimedia.org/T226117) (owner: 10Daimona Eaytoy) [23:59:03] (03CR) 10Jforrester: [C: 03+2] "Let's give it a go." [integration/config] - 10https://gerrit.wikimedia.org/r/574231 (https://phabricator.wikimedia.org/T226117) (owner: 10Daimona Eaytoy) [23:59:56] (03Merged) 10jenkins-bot: jjb, layout: Add a job to test new phan changes on MW core [integration/config] - 10https://gerrit.wikimedia.org/r/574231 (https://phabricator.wikimedia.org/T226117) (owner: 10Daimona Eaytoy)