[00:00:34] this was the buster support so far https://gerrit.wikimedia.org/r/q/topic:%22phab-buster%22+(status:open%20OR%20status:merged)
[00:01:52] ok got a clean puppet run other than that lvs package
[00:04:18] :))
[00:04:29] how did you fix it?
[00:07:07] arr, almost forgot i still need to fix phab1003/2001 too
[00:07:15] puppet runs
[00:10:16] paladox: oh man.. it says "if !$enable_php_fpm" and there is the !
[00:10:31] hmm?
[00:10:39] so when it's NOT enabled
[00:10:50] and the version is buster
[00:10:54] THEN we get the new packages
[00:11:06] oh
[00:11:07] yup
[00:11:13] and if it's not enabled and also not buster then we would get the apt::repo
[00:11:20] but it's enabled and not buster...
[00:11:29] just like in the other place there are 4 cases now
[00:11:51] well having enable_php_fpm set to false would do both
[00:11:58] since the if else is inside of $enable_php_fpm
[00:13:16] i don't like "if not enable" anyways
[00:14:30] whether we need those packages or not is not even depending on fpm or not
[00:15:49] we should remove that
[00:15:53] and just default to php-fpm
[00:15:58] since we are now stretch+
[00:21:30] paladox: removing the part that breaks it, leaving more for you too :)
[00:22:28] heh
[00:24:30] also not working :(
[00:24:46] duplicate declaration.. why did i start this :)
[00:24:58] i really should have known it's always 5 follow-ups
[04:06:41] Phabricator, User-DannyS712: Disable global herald rule H322 - https://phabricator.wikimedia.org/T235145 (DannyS712)
[04:08:43] Phabricator, User-DannyS712: Disable global herald rule H322 - https://phabricator.wikimedia.org/T235145 (DannyS712) @Aklapper would you be willing to action this, given that you created the rule, and that https://www.mediawiki.org/wiki/Phabricator/Help/Herald_Rules suggests that you are the go-to contac...
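The 00:10 to 00:11 exchange about `if !$enable_php_fpm` describes a buster/non-buster `if`/`else` nested inside a "php-fpm not enabled" check, which yields four combinations. The Python model below is a hypothetical reconstruction of that logic from the chat. It is not the real puppet manifest, and the function names and result labels are illustrative assumptions.

```python
from itertools import product


def packages_nested(enable_php_fpm: bool, is_buster: bool) -> str:
    """Hypothetical model of the conditional as described in the chat:
    the buster if/else only runs when php-fpm is NOT enabled."""
    if not enable_php_fpm:
        if is_buster:
            return "buster packages"
        return "apt::repo"
    # php-fpm enabled: the whole block is skipped, so nothing is declared
    return "nothing"


def packages_flat(is_buster: bool) -> str:
    """Hypothetical shape of the 00:14 point: pick packages by
    distribution only, independent of php-fpm."""
    return "buster packages" if is_buster else "apt::repo"


if __name__ == "__main__":
    # Print all four (fpm, buster) combinations side by side.
    for fpm, buster in product([False, True], repeat=2):
        print(f"fpm={fpm!s:5} buster={buster!s:5} -> "
              f"nested: {packages_nested(fpm, buster):15} "
              f"flat: {packages_flat(buster)}")
```

With php-fpm enabled, the nested model declares nothing on either distribution, which matches the "but it's enabled and not buster..." observation. The flat model removes the dependency on fpm entirely.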
[04:20:13] Phabricator, User-DannyS712: Disable global herald rule H322 - https://phabricator.wikimedia.org/T235145 (bd808) Open→Resolved a:bd808 @DannyS712 `{{done}}`
[04:20:42] Project beta-update-databases-eqiad build #37315: FAILURE in 41 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/37315/
[04:23:04] Phabricator, User-DannyS712: Disable global herald rule H322 - https://phabricator.wikimedia.org/T235145 (DannyS712) @bd808 I just went to recreate this as a personal rule, so that I could edit it, and found out that I couldn't, because it would need to be a global rule. Can you please * Re-enable H322...
[04:36:35] Phabricator, User-DannyS712: Disable global herald rule H322 - https://phabricator.wikimedia.org/T235145 (bd808) {F30612725, layout=right, float, size=thumb, alt="[[File:Rainbow_trout_transparent.png]]"} [x] re-enable rule [x] action set to "only the first time this rule matches" [ ] Change this task to...
[04:37:59] Phabricator, User-DannyS712: Disable global herald rule H322 - https://phabricator.wikimedia.org/T235145 (DannyS712) Thanks
[05:21:16] Yippee, build fixed!
[05:21:17] Project beta-update-databases-eqiad build #37316: FIXED in 1 min 15 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/37316/
[06:35:06] (PS1) Jeena Huneidi: Use new mediawiki-dev chart with apache server. [releng/local-charts] - https://gerrit.wikimedia.org/r/542009
[06:38:03] Yippee, build fixed!
[06:38:03] Project mwcore-phpunit-coverage-master build #227: FIXED in 3 hr 38 min: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/227/
[06:41:02] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[06:54:39] (PS2) Jeena Huneidi: Use new mediawiki-dev and parsoid charts [releng/local-charts] - https://gerrit.wikimedia.org/r/542009
[06:58:50] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[07:18:28] Phabricator, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations, serviceops: package wikimedia-lvs-realserver for buster - https://phabricator.wikimedia.org/T235140 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff The package was alread...
[07:18:35] Phabricator, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations, and 2 others: Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (MoritzMuehlenhoff)
[08:07:36] Phabricator, Operations: List of recent most active Phab "Priority" field setters - https://phabricator.wikimedia.org/T235153 (Aklapper) p:Triage→Low
[08:08:27] Just to make sure I'm not duplicating work: do we have any examples of C/C++ test coverage jobs, in this case for a PHP extension?
[08:10:29] awight: not that I'm aware of, Scribunto maybe?
[08:11:02] twentyafterfour: Good suggestion, thanks!
[08:13:57] ah too bad, Lua is included as binaries so there doesn't seem to be any C glue.
[08:18:31] awight: the lua binary is a fallback iirc
[08:18:36] we have a zend extension
[08:18:41] php-luasandbox
[08:19:01] mediawiki/php/luasandbox ;)
[08:19:35] then that is a zend extension, so something like phpize && make
[08:19:49] then tests are executing php code with the extension loaded
[08:20:01] oh and hi :]
[08:34:17] hashar: twentyafterfour: It turns out to be easy and rewarding! https://gerrit.wikimedia.org/r/#/c/mediawiki/php/wikidiff2/+/542063/
[08:37:39] I could make a new docker image "php-compile-coverage" perhaps, it would just add the "lcov" package and include a new entrypoint script. But it's a tiny package, I might as well just add this to the existing "php-compile" template.
[09:20:05] Project beta-update-databases-eqiad build #37320: FAILURE in 4.3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/37320/
[09:49:30] (PS1) Awight: [WIP] Coverage report for PHP extensions [integration/config] - https://gerrit.wikimedia.org/r/542071 (https://phabricator.wikimedia.org/T185583)
[09:58:39] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201910), Developer-Advocacy, wikimedia.biterg.io: biterg.io Gerrit crawling probably stresses the server too much - https://phabricator.wikimedia.org/T234328 (hashar) Even after I activated the account aga...
[09:58:47] Do we expose the docker registry API?
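hashar's outline at 08:19 (`phpize && make`, then run tests with the extension loaded) and awight's mention of the `lcov` package point to a gcov/lcov coverage pipeline for a Zend extension. The Python sketch below only assembles and prints one plausible command sequence. It is not the job from change 542071, and the `--coverage` compiler flags, the `coverage.info` file name and the `coverage` output directory are assumptions.

```python
import shlex


def coverage_build_steps(src_dir: str = ".", report_dir: str = "coverage") -> list[list[str]]:
    """Assemble (without running) one plausible build/test/report sequence
    for gcov-based coverage of a phpize-built PHP extension."""
    # GCC's --coverage instruments compiled objects and links in gcov support.
    flags = "--coverage"
    return [
        ["phpize"],                                              # generate ./configure for the extension
        ["./configure", f"CFLAGS={flags}", f"LDFLAGS={flags}"],  # build with instrumentation
        ["make"],
        ["make", "test", "NO_INTERACTION=1"],                    # run the .phpt tests, no prompts
        ["lcov", "--capture", "--directory", src_dir, "--output-file", "coverage.info"],
        ["genhtml", "coverage.info", "--output-directory", report_dir],
    ]


if __name__ == "__main__":
    # Dry run: print each command as a shell-quoted line.
    for step in coverage_build_steps():
        print(shlex.join(step))
```

Inside a prepared build container, passing each step to `subprocess.run(step, check=True)` would execute it. The dry run keeps the sketch side-effect free.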
[10:02:33] (PS2) Awight: [WIP] Coverage report for PHP extensions [integration/config] - https://gerrit.wikimedia.org/r/542071 (https://phabricator.wikimedia.org/T185583)
[10:03:41] (CR) jerkins-bot: [V: -1] [WIP] Coverage report for PHP extensions [integration/config] - https://gerrit.wikimedia.org/r/542071 (https://phabricator.wikimedia.org/T185583) (owner: Awight)
[10:20:15] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201910), Developer-Advocacy, wikimedia.biterg.io: biterg.io Gerrit crawling probably stresses the server too much - https://phabricator.wikimedia.org/T234328 (hashar) After flushing the `accounts` cache, t...
[10:21:14] Yippee, build fixed!
[10:21:14] Project beta-update-databases-eqiad build #37321: FIXED in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/37321/
[10:35:35] !log gerrit: added onimisionipe and gehel to wikidata-query-gui groups # T235159
[10:35:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:35:38] T235159: Enable write access for Mathew.onipe(onimisionipe) and gehel on wikidata gui repo - https://phabricator.wikimedia.org/T235159
[10:35:51] hashar: thanks!
[10:35:54] Release-Engineering-Team (CI & Testing services), Cloud-VPS, Tools, Wikimedia-Incident, cloud-services-team (Kanban): Various user visible errors in Cloud VPS projects following OpenStack upgrade on 2019-10-07 - https://phabricator.wikimedia.org/T234834 (aborrero)
[10:38:13] gehel: onimisionipe: and Jenkins automatically publishes the result of npm build to a different repository: wikidata/query/gui-deploy
[10:38:17] it has a bunch of pending changes https://gerrit.wikimedia.org/r/#/q/project:wikidata/query/gui-deploy
[10:38:24] Stas was typically +2ing them
[10:38:32] most probably the changes should be abandoned now
[10:38:41] hashar: we need to check that with Luca
[10:39:16] we're in the process of cleaning up how we do releases
[10:41:02] gehel: I thought about moving the CI job to use Docker containers but eventually gave up on that task ( https://phabricator.wikimedia.org/T210286 )
[10:41:41] gehel: I would be happy to be in the loop if I can help. I would love to get rid of that awkward build flow for sure
[10:41:53] me too :)
[10:42:13] at the moment, we're just making sure we still know how to deploy :)
[10:46:11] gehel: sounds like a good first step
[10:46:39] what might be worth looking at is migrating to the new Deployment pipeline, which effectively means moving the service to Docker containers generated by CI and deployed to Kubernetes
[10:46:58] but I would guess the production Kubernetes cluster would not have access to the wdqs data backend
[10:58:37] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201910), Developer-Advocacy, wikimedia.biterg.io: biterg.io Gerrit crawling probably stresses the server too much - https://phabricator.wikimedia.org/T234328 (hashar) Open→Resolved Verified with Va...
[11:40:26] (PS3) Awight: Coverage report for PHP extensions [integration/config] - https://gerrit.wikimedia.org/r/542071 (https://phabricator.wikimedia.org/T185583)
[11:57:23] PROBLEM - Host integration-agent-docker-1006 is DOWN: CRITICAL - Host Unreachable (172.16.2.94)
[11:57:58] PROBLEM - Host deployment-sessionstore01 is DOWN: CRITICAL - Host Unreachable (172.16.2.178)
[12:00:13] PROBLEM - Host deployment-poolcounter05 is DOWN: CRITICAL - Host Unreachable (172.16.5.160)
[12:02:24] RECOVERY - Host integration-agent-docker-1006 is UP: PING OK - Packet loss = 0%, RTA = 1.15 ms
[12:03:00] RECOVERY - Host deployment-sessionstore01 is UP: PING OK - Packet loss = 0%, RTA = 1.56 ms
[12:05:14] RECOVERY - Host deployment-poolcounter05 is UP: PING OK - Packet loss = 0%, RTA = 1.13 ms
[12:08:28] (CR) Thiemo Kreuz (WMDE): "I'm not familiar enough with all this to make a call. Where would the coverage report(s) (multiple?) end? At https://doc.wikimedia.org/cov" [integration/config] - https://gerrit.wikimedia.org/r/542071 (https://phabricator.wikimedia.org/T185583) (owner: Awight)
[12:09:30] (CR) Awight: "> I'm not familiar enough with all this to make a call. Where would" [integration/config] - https://gerrit.wikimedia.org/r/542071 (https://phabricator.wikimedia.org/T185583) (owner: Awight)
[13:17:54] Is there a reasonable way to build an experimental docker image and use it from WMF CI Jenkins?
[13:37:27] nvm, I'm reading https://www.mediawiki.org/wiki/Continuous_integration/Docker and it doesn't seem helpful to jump in there.
[13:48:44] Release-Engineering-Team (CI & Testing services), Cloud-VPS, Tools, Wikimedia-Incident, cloud-services-team (Kanban): Various user visible errors in Cloud VPS projects following OpenStack upgrade on 2019-10-07 - https://phabricator.wikimedia.org/T234834 (Andrew)
[13:50:50] (CR) Awight: "We only need coverage for php7.0, so I'm going to simplify the patch... Might as well package as a new image, while I'm at it." [integration/config] - https://gerrit.wikimedia.org/r/542071 (https://phabricator.wikimedia.org/T185583) (owner: Awight)
[14:07:15] (PS1) Octfx: Edit Project Config [extensions/WikiSEO] (refs/meta/config) - https://gerrit.wikimedia.org/r/542111
[14:07:17] (CR) Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [extensions/WikiSEO] (refs/meta/config) - https://gerrit.wikimedia.org/r/542111 (owner: Octfx)
[14:13:33] (PS1) Octfx: Edit Project Config [extensions/WikiSEO] (refs/meta/config) - https://gerrit.wikimedia.org/r/542114
[14:28:26] (PS4) Awight: Coverage report for PHP extensions [integration/config] - https://gerrit.wikimedia.org/r/542071 (https://phabricator.wikimedia.org/T185583)
[14:55:46] error: Server does not allow request for unadvertised object 09682421cca509dc3f9a7eb42553f26b5fda493a
[14:55:46] Fetched in submodule path 'Jade', but it did not contain 09682421cca509dc3f9a7eb42553f26b5fda493a. Direct fetching of that commit failed.
[15:52:23] Uh oh gerrit's gonna explode, I think? https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?panelId=16&fullscreen&orgId=1
[15:53:04] it does seem that way
[15:53:58] Oh yes
[15:53:59] I have a proposed workaround upstream
[15:54:00] I think we are going to deploy that soon
[15:57:47] cc thcipriani ^
[15:58:24] I have some requests to gerrit for the suggest_reviewers endpoint that have not completed even after a minute
[15:58:45] I'm wondering if that hits some of the same lock contention as sending email, if the lock that's being hit is actually around its storage of contact information or something
[15:58:55] of course I have no idea but figured I'd mention it :)
[16:00:01] * thcipriani looks
[16:00:02] cdanis: yeah they are locked on a reentrant lock
[16:00:04] which is never released
[16:00:04] It Is
[16:00:05] cdanis: it uses the account cache endpoint
[16:00:07] So is hitting the lock
[16:00:07] I asked guava upstream who gave me a workaround
[16:00:08] (Not a nice hack but a workaround)
[16:00:24] could maybe do a thread dump and check whether the jvm finds any deadlock
[16:00:32] but most probably that is just the good "old" issue
[16:00:43] which server is the active gerrit server presently?
[16:00:45] is it gerrit1001?
[16:00:50] Cobalt
[16:00:56] blurg. Yep same issue https://fastthread.io/ft-thread-report.jsp?dumpId=1
[16:01:07] er wait
[16:01:15] https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMTAvMTAvLS1hcGktMDZlMmFhZjMtMzM1NS00ODI4LWFiNDgtMWY5NzIzNWQ1M2JmNGUwYWI3MmYtNDhlNi00N2Y0LTlmYWEtZTMyODc4ODY0N2Y0LnR4dC0t&
[16:01:21] *actual url
[16:01:58] fun
[16:02:18] it shows SendEmail-2 thread as owning the lock
[16:02:20] Workaround: https://gerrit-review.googlesource.com/c/gerrit/+/239436
[16:02:30] but I cannot see those threads in gerrit show-queue -q -q :-\
[16:02:57] yeah, I've noticed that
[16:03:03] I think the threads are done executing
[16:03:15] but they still hold the lock for whatever reason
[16:03:35] so gerrit thinks they are all done with their work, but the jvm still sees their lock
[16:03:57] hashar: do you want to take any other dumps before I restart?
[16:04:06] I wonder if it throws silently?
[16:04:25] that has been my suspicion
[16:04:26] thcipriani: nop, do restart :)
[16:04:31] * thcipriani does
[16:05:18] https://phabricator.wikimedia.org/P9307 as usual
[16:06:44] If we change https://github.com/GerritCodeReview/gerrit/blob/stable-2.15/gerrit-server/src/main/java/com/google/gerrit/server/account/AccountCacheImpl.java#L86 to Exception then it should in theory catch all exceptions, right?
[16:07:23] In theory
[16:09:00] although that seems to do some magic on accounts in the exception handling that we maybe should avoid triggering
[16:09:09] in the missing method
[16:09:23] or in the SendEmail execution
[16:10:08] another piece of the puzzle: upstream doesn't have this problem, which may mean it has something to do with one of our plugins, or something to do with ldap.
[16:10:58] Yeh
[16:10:59] Upstream also run gerrit master
[16:11:00] So maybe the issue was fixed without anyone realising?
[16:11:02] paladox: the folks who ripped out guava and replaced it with caffeine: do you remember if they used ldap authentication?
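The hypothesis discussed between 16:03 and 16:04 is that the threads have finished but the JVM still reports them as owning the lock, perhaps because something "throws silently". A general mechanism that produces this symptom is an exception path that skips the unlock. The sketch below is a Python analogy only, not Gerrit or Guava code, and the worker functions are hypothetical. Java's `ReentrantLock` likewise stays held if its owning thread ends without calling `unlock()`, which is why the usual Java idiom is `lock()` followed by `try { ... } finally { unlock(); }`.

```python
import threading


def failing_step() -> None:
    raise ValueError("simulated failure while holding the lock")


def leaky_worker(lock: threading.RLock) -> None:
    try:
        lock.acquire()
        failing_step()
        lock.release()  # skipped when failing_step() raises
    except ValueError:
        pass  # swallowed: the thread exits "cleanly" but still owns the lock


def safe_worker(lock: threading.RLock) -> None:
    try:
        with lock:  # released on every exit path, including exceptions
            failing_step()
    except ValueError:
        pass


def lock_free_after(worker) -> bool:
    """Run worker in a thread, wait for it to finish, then try the lock."""
    lock = threading.RLock()
    t = threading.Thread(target=worker, args=(lock,))
    t.start()
    t.join()  # the worker thread is done executing
    acquired = lock.acquire(timeout=0.5)
    if acquired:
        lock.release()
    return acquired


if __name__ == "__main__":
    print("leaky worker, lock free afterwards:", lock_free_after(leaky_worker))  # False
    print("safe worker, lock free afterwards:", lock_free_after(safe_worker))    # True
```

Both workers catch the exception, so catching it is not what frees the lock in this analogy. What matters is that the release sits on every exit path.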
[16:11:13] I can ask
[16:11:28] might be an interesting data point
[16:11:42] if they use oauth and they still have this problem it might help narrow down possibilities
[16:12:04] (PS1) Jforrester: layout: Add CI configuration for new DiscusionTools repo [integration/config] - https://gerrit.wikimedia.org/r/542166 (https://phabricator.wikimedia.org/T234481)
[16:12:33] (PS2) Jforrester: layout: Add CI configuration for new DiscussionTools repo [integration/config] - https://gerrit.wikimedia.org/r/542166 (https://phabricator.wikimedia.org/T234481)
[16:12:37] (CR) Jforrester: [C: +2] layout: Add CI configuration for new DiscussionTools repo [integration/config] - https://gerrit.wikimedia.org/r/542166 (https://phabricator.wikimedia.org/T234481) (owner: Jforrester)
[16:13:45] Did we install any plugins around December / January time?
[16:14:14] not that I remember
[16:14:17] (Merged) jenkins-bot: layout: Add CI configuration for new DiscussionTools repo [integration/config] - https://gerrit.wikimedia.org/r/542166 (https://phabricator.wikimedia.org/T234481) (owner: Jforrester)
[16:14:18] * thcipriani looks at deploy repo
[16:15:35] !log Zuul: Add CI configuration for new DiscussionTools repo T234481
[16:15:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:15:38] T234481: Create a stub extension "DiscussionTools" - https://phabricator.wikimedia.org/T234481
[16:16:20] javamelody was September. I think that 2.15.8 is what may have introduced the issue (for us at least). That was Jan 15th.
[16:17:50] prior to that we were on 2.15.6 for 2 months and before that 2.15.3 for like 6 months
[16:18:29] ok
[16:18:58] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops-radar, Upstream: Gerrit account cache has a faulty reentrant lock causing http/sendemail threads to stall completely - https://phabricator.wikimedia.org/T224448 (thcipriani) >>! In T224448#55635...
[16:20:02] (PS1) Jforrester: layout: [DiscussionTools] Add phan and phan-seccheck too [integration/config] - https://gerrit.wikimedia.org/r/542167 (https://phabricator.wikimedia.org/T234481)
[16:20:21] (CR) Jforrester: [C: +2] layout: [DiscussionTools] Add phan and phan-seccheck too [integration/config] - https://gerrit.wikimedia.org/r/542167 (https://phabricator.wikimedia.org/T234481) (owner: Jforrester)
[16:20:46] thcipriani actually i don't think this was introduced into core :(
[16:20:53] seeing as it hit 2.14 (which is before 2.15
[16:20:55] )
[16:21:08] hit 2.14 for who?
[16:21:13] Ericsson
[16:21:20] they run 2.14 atm
[16:21:26] with hopes of going with 2.15+ soon
[16:22:02] (Merged) jenkins-bot: layout: [DiscussionTools] Add phan and phan-seccheck too [integration/config] - https://gerrit.wikimedia.org/r/542167 (https://phabricator.wikimedia.org/T234481) (owner: Jforrester)
[16:22:38] !log Zuul: Also enable phan and phan-seccheck on DiscussionTools T234481
[16:22:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:22:40] T234481: Create a stub extension "DiscussionTools" - https://phabricator.wikimedia.org/T234481
[16:26:21] thcipriani Ericsson use ldap
[16:27:48] interesting
[16:28:46] thcipriani though it appears i've been logged out again
[16:29:11] even though i just logged in a few days ago after the last one caused us to restart
[16:29:23] * paladox remembered to press the "remember me" button
[16:29:38] so it looks like either the cache on disk is getting corrupted or we need to boost it
[16:30:07] hrm, it's confusing, the cache on disk is hugely oversized for what we ever use
[16:30:20] it's per session though
[16:30:24] people can have many sessions
[16:30:31] e.g. phone, desktop, other electronics
[16:33:57] 92% hit mem for websessions
[16:34:09] 27% ratio disk
[16:34:51] also, it's been the same size for quite a while: https://phabricator.wikimedia.org/P9308
[16:35:49] and we have a 256m disk limit
[16:36:55] oh
[16:37:02] so not corrupted then i guess
[16:39:13] thcipriani apparently you can set [cache "accounts"]...
[16:39:21] i'm trying to find out what the default is
[16:39:34] ref https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#cache_names
[16:39:44] that cache is used in the file we are having problems with
[16:42:03] FWIW: here's the output of show caches for websessions over time https://phabricator.wikimedia.org/P9308#55722
[16:42:30] dropped a bunch from memory following restart (as you'd expect)
[16:42:38] disk stayed ~stable
[16:44:20] thanks!
[16:49:31] ssh -p 29418 paladox@gerrit.wikimedia.org gerrit show-caches --show-threads
[16:49:35] that's interesting!
[16:50:52] thcipriani that's a basic version of the jvm logs you collect
[16:54:06] thcipriani i say let's up the account cache
[16:54:23] looking at https://gerrit-review.googlesource.com/c/gerrit/+/140711 (and using it as guidance), the account cache is limited.
[17:00:42] thcipriani hasharAway https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/542174/
[17:01:08] (PS1) Jforrester: build: Upgrade mediawiki-codesniffer to v28.0.0, drop PHPUnit 4, increase PHP to 7.2+ [integration/docroot] - https://gerrit.wikimedia.org/r/542175
[17:01:14] I never see the cache hit ratio below 99% on the accounts cache. That seems to be the key. If you see a cache that has used up all its entries (1024 for accounts) and the hit ratio is below 99% it means it has to hit the disk too much
[17:01:37] (CR) jerkins-bot: [V: -1] build: Upgrade mediawiki-codesniffer to v28.0.0, drop PHPUnit 4, increase PHP to 7.2+ [integration/docroot] - https://gerrit.wikimedia.org/r/542175 (owner: Jforrester)
[17:02:00] for the accounts cache, I often see it at 1024; however, the hit ratio is never below 99%
[17:03:20] oh
[17:05:08] thcipriani should i abandon?
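thcipriani's rule of thumb at 17:01 is a two-condition check. A cache only looks undersized when it is at its entry limit and its hit ratio has also dropped below 99%. Below is a minimal Python sketch, assuming the figures are read by hand from `gerrit show-caches`. The class name, function name and sample numbers are illustrative, not real readings. Only the 1024-entry accounts limit and the 99% threshold come from the discussion. In Gerrit, the in-memory size is bounded by `cache.<name>.memoryLimit`.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    name: str
    entries: int      # entries currently held in memory
    limit: int        # configured in-memory limit (entries, for the accounts cache)
    hit_ratio: float  # in-memory hit ratio, 0.0 to 1.0


def looks_undersized(stats: CacheStats, min_hit_ratio: float = 0.99) -> bool:
    """Full cache AND too many misses => probably worth raising the limit."""
    is_full = stats.entries >= stats.limit
    return is_full and stats.hit_ratio < min_hit_ratio


if __name__ == "__main__":
    # Illustrative numbers only, not readings from the production server.
    samples = [
        CacheStats("accounts", entries=1024, limit=1024, hit_ratio=0.995),
        CacheStats("example-full-and-missing", entries=1024, limit=1024, hit_ratio=0.90),
        CacheStats("example-not-full", entries=300, limit=1024, hit_ratio=0.80),
    ]
    for s in samples:
        print(f"{s.name}: undersized={looks_undersized(s)}")
```

The 17:02 description of the accounts cache is full at 1024 entries with a hit ratio never below 99%. For those figures the check returns False: a full cache alone is not a reason to enlarge it.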
[17:05:34] yeah, I don't think it's necessary
[17:06:18] ok
[17:07:04] I say we should deploy https://gerrit-review.googlesource.com/c/gerrit/+/239436 which should get us through until upstream manage to switch to caffeine.
[17:08:54] thcipriani offtopic: gerrit supports git protocol v2 now!
[17:08:55] (https://groups.google.com/forum/#!topic/repo-discuss/nEkvNVCZzNM)
[17:10:42] gerrit 3.1 gets branched next week
[17:12:26] nice
[17:12:30] that's a good incentive
[17:14:21] Is the beta cluster all hunky dory post the cloud changes?
[17:14:22] [XZ9mVKwQBGoAAB-499MAAAAC] Exception caught: Request to parsoid for "html" to "wikitext" conversion of content connected to title "Topic:V8xht5pchemxmn4g" failed: (curl error: 28) Timeout was reached
[17:19:07] https://phabricator.wikimedia.org/T234242 seemingly
[17:19:21] Beta-Cluster-Infrastructure, Growth-Team, StructuredDiscussions: [betalabs-regression] Cannot create a topic on Structured discussion - https://phabricator.wikimedia.org/T234242 (Reedy)
[17:47:53] PROBLEM - Parsoid on deployment-parsoid09 is CRITICAL: connect to address 172.16.5.63 and port 8000: Connection refused
[17:47:54] PROBLEM - Parsoid on deployment-mediawiki-parsoid10 is CRITICAL: connect to address 172.16.0.141 and port 8000: Connection refused
[17:49:14] I wonder...
[17:53:26] Reedy: Oh dear. Yeah, plausibly broken.
[17:55:21] Gerrit, Release-Engineering-Team, serviceops: Set gerrit1001 master switch date - https://phabricator.wikimedia.org/T234866 (thcipriani) I think some upcoming Monday would be best for this. We want to avoid making a big change in prod on a Friday. Tuesday–Thursday risks running into train/other deplo...
[18:05:36] Gerrit, Release-Engineering-Team, serviceops: Set gerrit1001 master switch date - https://phabricator.wikimedia.org/T234866 (Paladox) I'm available anytime as @thcipriani work day starts as the UK work day finishes.
[18:11:00] (PS4) Jforrester: jjb: Replace docker-ci-src-setup-mw with docker-zuul-cloner followed by docker-ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/539987 (https://phabricator.wikimedia.org/T234062)
[18:26:48] SVGs are now highlighted in PolyGerrit's ui (using the xml language)
[18:35:05] (PS5) Jforrester: jjb: Replace docker-ci-src-setup-mw with docker-zuul-cloner followed by docker-ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/539987 (https://phabricator.wikimedia.org/T234062)
[18:35:52] Gerrit, Release-Engineering-Team, serviceops: Set gerrit1001 master switch date - https://phabricator.wikimedia.org/T234866 (Dzahn) @thcipriani Sounds good and Mondays work for me (from around 10am PST). This coming one is "Wikimedia holiday email / Monday, October 14 US holiday" though. Unless you...
[18:36:52] (CR) Jforrester: [C: +2] jjb: Replace docker-ci-src-setup-mw with docker-zuul-cloner followed by docker-ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/539987 (https://phabricator.wikimedia.org/T234062) (owner: Jforrester)
[18:39:37] (Merged) jenkins-bot: jjb: Replace docker-ci-src-setup-mw with docker-zuul-cloner followed by docker-ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/539987 (https://phabricator.wikimedia.org/T234062) (owner: Jforrester)
[18:44:07] (PS3) Jforrester: dockerfiles: Drop ci-src-setup, unused [integration/config] - https://gerrit.wikimedia.org/r/539989
[18:46:45] Gerrit, Release-Engineering-Team, serviceops: Set gerrit1001 master switch date - https://phabricator.wikimedia.org/T234866 (Dzahn) We agreed on Monday, October 21st. Is this ticket just about setting the date? Then it's resolved. But it also has the check boxes for the migration steps.
[18:53:41] (CR) Jforrester: [C: +2] dockerfiles: Drop ci-src-setup, unused [integration/config] - https://gerrit.wikimedia.org/r/539989 (owner: Jforrester)
[18:54:55] Continuous-Integration-Config, Release-Engineering-Team-TODO (201910), Patch-For-Review, phan: ci-src-setup job (used by mediawiki-core-php72-phan-docker) is still running on PHP 7.0.33 - https://phabricator.wikimedia.org/T234062 (Jdforrester-WMF) Open→Resolved a:Jdforrester-WMF
[18:54:58] Release-Engineering-Team-TODO (201910), MediaWiki-General: Bump PHP support version in composer.json - https://phabricator.wikimedia.org/T234767 (Jdforrester-WMF)
[18:56:09] (Merged) jenkins-bot: dockerfiles: Drop ci-src-setup, unused [integration/config] - https://gerrit.wikimedia.org/r/539989 (owner: Jforrester)
[19:15:01] (Abandoned) Octfx: Edit Project Config [extensions/WikiSEO] (refs/meta/config) - https://gerrit.wikimedia.org/r/542114 (owner: Octfx)
[19:15:07] (Abandoned) Octfx: Edit Project Config [extensions/WikiSEO] (refs/meta/config) - https://gerrit.wikimedia.org/r/542111 (owner: Octfx)
[19:23:16] Release-Engineering-Team-TODO (201910), MediaWiki-General, Patch-For-Review: Bump PHP support version in composer.json - https://phabricator.wikimedia.org/T234767 (Jdforrester-WMF) Open→Resolved
[19:25:09] Release-Engineering-Team-TODO (201910), MediaWiki-General, Patch-For-Review: Bump PHP support version in composer.json - https://phabricator.wikimedia.org/T234767 (Reedy)
[19:25:41] Release-Engineering-Team-TODO (201910), MediaWiki-General, Patch-For-Review: Bump PHP support version in composer.json - https://phabricator.wikimedia.org/T234767 (Reedy)
[19:32:00] Gerrit, Release-Engineering-Team, serviceops: Set gerrit1001 master switch date - https://phabricator.wikimedia.org/T234866 (Paladox)
[19:34:02] Gerrit, Release-Engineering-Team, serviceops: Set gerrit1001 master switch date - https://phabricator.wikimedia.org/T234866 (Paladox)
[19:34:14] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Paladox)
[19:34:36] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Paladox)
[19:35:29] Gerrit, Release-Engineering-Team, serviceops: Set gerrit1001 master switch date - https://phabricator.wikimedia.org/T234866 (Dzahn) a:Paladox Announcement text as agreed on P9309. Paladox is sending mail to wikitech :)
[19:36:45] Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201910), Quibble, Patch-For-Review: Create an integration test running Quibble with mediawiki/core - https://phabricator.wikimedia.org/T235118 (Jdforrester-WMF) a:hashar
[19:37:17] Release-Engineering-Team (Local Dev), Release-Engineering-Team-TODO (201910), Developer Productivity, local-charts, and 2 others: Add pipeline config and publish step for restbase dev docker image - https://phabricator.wikimedia.org/T234580 (Jdforrester-WMF) a:jeena
[19:38:44] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201910), Quibble: Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git" - https://phabricator.wikimedia.org/T233143 (Jdforrester-WMF...
[19:39:18] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Quibble, and 3 others: CI: Create a way to share a secret between MediaWiki and the testing framework. - https://phabricator.wikimedia.org/T233092 (hashar) Open→Resolved a:ha...
[19:39:21] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, CPT Initiatives (API Integration Tests), Core Platform Team Workboards (Purple): Set up CI for mediawiki/tools/api-testing - https://phabricator.wikimedia.org/T230340 (hashar)
[19:44:12] (PS1) Hashar: Use $wgSecretKey in api-testing job [integration/config] - https://gerrit.wikimedia.org/r/542210 (https://phabricator.wikimedia.org/T230340)
[19:52:36] (CR) Hashar: [C: +2] "jobs updated ! ;)" [integration/config] - https://gerrit.wikimedia.org/r/542210 (https://phabricator.wikimedia.org/T230340) (owner: Hashar)
[19:54:20] Release-Engineering-Team (Pipeline), Release-Engineering-Team-TODO (201910), Release Pipeline, Maps (Kartotherian): Deployment Pipeline fails with CPS error for Kartotherian - https://phabricator.wikimedia.org/T233316 (dduvall) @mathew.onipe no problem! I suspect there's definitely a better way...
[19:54:38] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, CPT Initiatives (API Integration Tests), and 2 others: Set up CI for mediawiki/tools/api-testing - https://phabricator.wikimedia.org/T230340 (hashar) Adam Wight eventually found out a c...
[19:55:05] (Merged) jenkins-bot: Use $wgSecretKey in api-testing job [integration/config] - https://gerrit.wikimedia.org/r/542210 (https://phabricator.wikimedia.org/T230340) (owner: Hashar)
[19:55:21] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Quibble, and 3 others: CI: Create a way to share a secret between MediaWiki and the testing framework. - https://phabricator.wikimedia.org/T233092 (hashar) a:hashar→awight @awig...
[20:00:03] Project mwcore-phpunit-coverage-master build #228: FAILURE in 5 hr 0 min: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/228/
[20:04:58] Gerrit, Release-Engineering-Team, serviceops: Set gerrit1001 master switch date - https://phabricator.wikimedia.org/T234866 (Paladox) Open→Resolved Sent https://lists.wikimedia.org/pipermail/wikitech-l/2019-October/092664.html
[20:09:36] Gerrit, Release-Engineering-Team, serviceops: Set gerrit1001 master switch date - https://phabricator.wikimedia.org/T234866 (Dzahn)
[20:09:44] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Dzahn)
[20:10:34] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Dzahn)
[20:11:14] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Dzahn)
[20:11:59] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Dzahn)
[20:18:14] (CR) Brennen Bearnes: [C: +2] Use new mediawiki-dev and parsoid charts [releng/local-charts] - https://gerrit.wikimedia.org/r/542009 (owner: Jeena Huneidi)
[20:36:00] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201910), CPT Initiatives (API Integration Tests), Core Platform Team Workboards (Purple): Set up CI for mediawiki/tools/api-testing - https://phabricator.wikimedia.org/T230340 (Jdfor...
[20:36:17] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201910), CPT Initiatives (API Integration Tests), Core Platform Team Workboards (Purple): Set up CI for mediawiki/tools/api-testing - https://phabricator.wikimedia.org/T230340 (Jdfor...
[20:39:59] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Quibble, and 3 others: CI: Create a way to share a secret between MediaWiki and the testing framework. - https://phabricator.wikimedia.org/T233092 (daniel) Has the new version of quibbl...
[20:45:58] (PS1) Jforrester: layout: Make mediawiki-quibble-api-testing* voting [integration/config] - https://gerrit.wikimedia.org/r/542223 (https://phabricator.wikimedia.org/T230340)
[20:50:04] (CR) Jforrester: [C: +2] layout: Make mediawiki-quibble-api-testing* voting [integration/config] - https://gerrit.wikimedia.org/r/542223 (https://phabricator.wikimedia.org/T230340) (owner: Jforrester)
[20:52:25] (Merged) jenkins-bot: layout: Make mediawiki-quibble-api-testing* voting [integration/config] - https://gerrit.wikimedia.org/r/542223 (https://phabricator.wikimedia.org/T230340) (owner: Jforrester)
[20:52:57] !log Zuul: Make mediawiki-quibble-api-testing* voting T230340
[20:53:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:53:01] T230340: Set up CI for mediawiki/tools/api-testing - https://phabricator.wikimedia.org/T230340
[20:57:19] (CR) Brennen Bearnes: [V: +2 C: +2] Use new mediawiki-dev and parsoid charts [releng/local-charts] - https://gerrit.wikimedia.org/r/542009 (owner: Jeena Huneidi)
[20:59:09] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki configuration Error - string 'Wikipedia' not found on 'https://en.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 2397 bytes in 0.014 second response time
[20:59:18] James_F: <3
[20:59:27] hashar: Keep being awesome.
[21:01:47] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki configuration Error - string 'Wikipedia' not found on 'https://en.m.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 2379 bytes in 0.009 second response time
[21:02:06] hashar fyi https://lists.wikimedia.org/pipermail/wikitech-l/2019-October/092664.html :)
[21:02:08] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki configuration Error - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 1790 bytes in 0.003 second response time
[21:02:41] MediaWiki 1.35 requires at least PHP version 7.2.9, you are using PHP 7.2.8-1+0~20180725124257.2+stretch~1.gbp571e56.
[21:02:47] James_F, ^
[21:03:05] James_F: yeah trying trying, but it is more and more challenging. There is a lot more competition nowadays so it is harder to stand out :!
[21:03:07] Krenair: Oh, sorry, beta cluster?
[21:03:09] yep
[21:03:13] damn it
[21:03:13] Krenair: Meeeeh.
[21:03:21] maybe I can just press some update buttons
[21:03:22] Reedy: You gone dun broke Beta again.
[21:03:24] Why is it on such an old version?
[21:03:27] Let's leave it broken
[21:03:31] gotta apt upgrade I guess
[21:03:40] Reedy: Until last week and Krenair's heroics it was running HHVM.
[21:03:51] and make sure component/php72 is somewhere in apt-cache policy output / /etc/apt/sources.list.d/
[21:03:53] James_F: You say that like prod hasn't been running HHVM for months or years :)
[21:03:55] My heroics consisted of finding a hiera flag and turning it on :P
[21:04:02] Reedy: Indeed.
[21:04:12] Release-Engineering-Team (Local Dev), Release-Engineering-Team-TODO (201910), dev-images, local-charts: Point deployment-charts/mediawiki-dev at latest dev image published by pipeline - https://phabricator.wikimedia.org/T234391 (brennen) Open→Resolved [21:04:16] Krenair: And bashing it repeatedly until it works. [21:05:05] paladox: ohhh! I have noticed Daniel and you patching up for the new gerrit server :D [21:05:08] We could pin back master to .0 for a bit until you can fix? [21:05:20] hashar yup! [21:06:04] Everyone hold your horses [21:06:10] I'm slightly amused that beta is literally on a version that's 0.0.1 too low [21:06:11] I ran apt update and apt upgrade [21:06:20] RESTART ALL THE THINGS [21:06:29] What could possibly go wrong? [21:06:48] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39230 bytes in 1.328 second response time [21:06:54] shinken should wake up at some point but from the looks of things it is back up. [21:06:58] oh there it is [21:07:03] magic [21:07:09] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-07 is OK: HTTP OK: HTTP/1.1 200 OK - 48494 bytes in 0.591 second response time [21:07:17] 7.2.22-1+0~20190902.26+debian9~1.gbpd64eb7+wmf1 (fpm-fcgi) [21:07:19] Nice. [21:07:34] Maybe we should have something work out when Beta Cluster and prod drift platforms?
[21:08:22] !log Ran apt upgrades on deployment-mediawiki-0[79] to bring beta cluster back online, MW started requiring a version higher than we were running [21:08:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:09:11] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 48974 bytes in 0.634 second response time [21:10:21] Shouldn't need to upgrade PHP on the beta cluster for at least a couple of major MW versions now [21:10:59] Release-Engineering-Team: php-composer-security-docker currently failing with "curl: command not found" error - https://phabricator.wikimedia.org/T235221 (sbassett) [21:11:14] Release-Engineering-Team: php-composer-security-docker currently failing with "curl: command not found" error - https://phabricator.wikimedia.org/T235221 (sbassett) p:Triage→High [21:22:40] Reedy: Unless we find a major upstream bug and bump to .23 or whatever. [21:23:44] Phabricator, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations, serviceops: package wikimedia-lvs-realserver for buster - https://phabricator.wikimedia.org/T235140 (Dzahn) Oh, that was quick and easier than i thought. Thank you! [21:26:29] Release-Engineering-Team: php-composer-security-docker currently failing with "curl: command not found" error - https://phabricator.wikimedia.org/T235221 (sbassett) Seems to fix it locally - Dockerfile: `lang=docker FROM docker-registry.wikimedia.org/releng/composer-php72:0.2.1-s1 USER root RUN apt-get upda...
[21:38:41] (PS1) Samwilson: Add WikiSEO extension [integration/config] - https://gerrit.wikimedia.org/r/542235 [21:42:22] (CR) Jforrester: [C: +2] Add WikiSEO extension [integration/config] - https://gerrit.wikimedia.org/r/542235 (owner: Samwilson) [21:44:17] (Merged) jenkins-bot: Add WikiSEO extension [integration/config] - https://gerrit.wikimedia.org/r/542235 (owner: Samwilson) [21:47:01] Release-Engineering-Team (Local Dev), Release-Engineering-Team-TODO (201910), Developer Productivity, local-charts, Epic: Add pipeline config and publish step for parsoid dev docker image - https://phabricator.wikimedia.org/T234578 (brennen) Open→Resolved a:jeena [21:47:03] Release-Engineering-Team (Local Dev), Release-Engineering-Team-TODO, Developer Productivity, local-charts, Epic: Create official docker images for Mediawiki and services used in the local development environment - https://phabricator.wikimedia.org/T217872 (brennen) [22:28:24] Beta-Cluster-Infrastructure, MediaWiki-extensions-CentralAuth, Security-Team, Beta-Cluster-reproducible, Performance-Team (Radar): Beta Cluster cross-wiki login request would be blocked by CSP - https://phabricator.wikimedia.org/T211539 (Jdforrester-WMF) Eurgh, no. I'm going to have to wade...
[22:53:42] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), serviceops, Test-Coverage: Upgrade our php-xdebug package for php7.2 - https://phabricator.wikimedia.org/T234418 (Jdforrester-WMF) [22:54:07] twentyafterfour: phab1001 (buster) - puppet running all green now with phab role :) [22:54:39] phab1003/phab2001 fixed puppet runs and we switched it to always default to php-fpm and removed support for mod_php [23:26:30] Deployments, Release-Engineering-Team-TODO, WMF-JobQueue, Core Platform Team Workboards (Clinic Duty Team), Wikimedia-production-error: Review removal of ukwikimedia wiki - https://phabricator.wikimedia.org/T218170 (Krinkle) Okay, so what can we do about about? Who should take the next step?... [23:43:32] Deployments, Release-Engineering-Team, VisualEditor, Wikimedia-Logstash, and 2 others: Logstash discards messages from MediaWiki if they contain uncommon keys in the $context array - https://phabricator.wikimedia.org/T234564 (Krinkle) [23:43:52] Deployments, Release-Engineering-Team, VisualEditor, Wikimedia-Logstash, and 2 others: Logstash discards messages from MediaWiki if they contain uncommon keys in the $context array - https://phabricator.wikimedia.org/T234564 (Krinkle) p:Triage→High [23:45:05] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 78.57% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [23:49:03] (Abandoned) Jforrester: [DNM,WIP] dockerfiles: [ci-src-setup] Move to php72/stretch [integration/config] - https://gerrit.wikimedia.org/r/539984 (https://phabricator.wikimedia.org/T234062) (owner: Jforrester) [23:52:06] (CR) Jforrester: [C: -1] "(Still blocked.)" [integration/config] - https://gerrit.wikimedia.org/r/516570 (owner: Jforrester)
[23:56:44] Deployments, Release-Engineering-Team, VisualEditor, Wikimedia-Logstash, and 2 others: Logstash discards messages from MediaWiki if they contain uncommon keys in the $context array - https://phabricator.wikimedia.org/T234564 (Krinkle) >>! In T234564#5560719, @herron wrote: > > ` > … Could not in...