[01:14:18] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 30.77% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[01:21:40] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[01:23:08] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0]
[02:38:08] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[05:09:09] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0]
[05:36:31] Phabricator: Remove duplicate user account @abi for Abijeet Patro - https://phabricator.wikimedia.org/T216978 (Arrbee)
[06:05:58] Phabricator: Remove duplicate user account @abi for Abijeet Patro - https://phabricator.wikimedia.org/T216978 (Aklapper) Open→Resolved a:Aklapper Done
[06:48:54] Phabricator (Upstream), WMSE-Bug-Reporting-and-Translation-2019, Upstream: Cancel button in Create Task form always leads to Maniphest - https://phabricator.wikimedia.org/T215508 (Aklapper) p:Triage→Lowest
[06:50:23] Phabricator, Privacy: Difficulties registering a Phab account if third-party cookies are not enabled; email address requirement contradicts Privacy Policy - https://phabricator.wikimedia.org/T214251 (Aklapper)
[06:51:39] Phabricator: Global search results list one closed task, though I searched for only open tasks - https://phabricator.wikimedia.org/T205793 (Aklapper) Open→Declined Cannot reproduce anymore with https://phabricator.wikimedia.org/search/query/j3W65uoTFhAW/ - might have been a caching issue.
[07:10:18] Release-Engineering-Team (Backlog), Quibble: Using Quibble with different backend than gerrit - https://phabricator.wikimedia.org/T216819 (ItSpiderman) Thank you for your support and willingness to help. yes, i have experience in python I will get back to you once i have made some progress Thank you a...
[07:16:18] MediaWiki-Codesniffer: Replace casting functions with direct casting - https://phabricator.wikimedia.org/T216971 (Mainframe98) a:Mainframe98
[08:56:33] Phabricator-Bot-Requests, WMSE-Organisational-Development-2019, User-Sebastian_Berlin-WMSE: Create bot for adding WMSE's projects - https://phabricator.wikimedia.org/T211351 (Sebastian_Berlin-WMSE) I see, but it doesn't have to be an email address that actually goes somewhere, is that what "invalid"...
[09:35:46] (PS1) Hashar: Register mediawiki/extensions/SaveSpinner [integration/config] - https://gerrit.wikimedia.org/r/492616
[09:36:02] (CR) Hashar: [C: +2] Register mediawiki/extensions/SaveSpinner [integration/config] - https://gerrit.wikimedia.org/r/492616 (owner: Hashar)
[09:37:25] (Merged) jenkins-bot: Register mediawiki/extensions/SaveSpinner [integration/config] - https://gerrit.wikimedia.org/r/492616 (owner: Hashar)
[09:46:10] Continuous-Integration-Infrastructure, MediaWiki-Core-Testing, HHVM, Language-Team (Language-2019-January-March), Wikimedia-production-error (Shared Build Failure): quibble-vendor-mysql-hhvm-docker in gate fails for most merges (exit status -11) - https://phabricator.wikimedia.org/T216689 (Nik...
[10:13:04] (PS1) Mainframe98: Replace casting functions with casts [tools/codesniffer] - https://gerrit.wikimedia.org/r/492634 (https://phabricator.wikimedia.org/T216971)
[10:23:17] Phabricator: Remove duplicate user account @abi for Abijeet Patro - https://phabricator.wikimedia.org/T216978 (Arrbee) Thanks @Aklapper
[10:26:29] Phabricator-Bot-Requests, WMSE-Organisational-Development-2019, User-Sebastian_Berlin-WMSE: Create bot for adding WMSE's projects - https://phabricator.wikimedia.org/T211351 (Aklapper) Created @WMSE-project-start-bot. Please see P8126. Please resolve this task once it works as expected. Thanks!
[10:32:25] (PS2) Mainframe98: Replace casting functions with casts [tools/codesniffer] - https://gerrit.wikimedia.org/r/492634 (https://phabricator.wikimedia.org/T216971)
[10:37:15] MediaWiki-Codesniffer: Enforce @covers… tags to have full qualified class names starting with backslash - https://phabricator.wikimedia.org/T215144 (Mainframe98)
[10:37:18] MediaWiki-Codesniffer: PHPCS should make sure @covers tags are absolute - https://phabricator.wikimedia.org/T183218 (Mainframe98)
[10:44:12] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[11:18:02] (CR) Thiemo Kreuz (WMDE): [C: -1] "I'm not decided yet. Before actually discussing and reviewing this I would like to ask for two things:" [tools/codesniffer] - https://gerrit.wikimedia.org/r/492634 (https://phabricator.wikimedia.org/T216971) (owner: Mainframe98)
[11:22:40] (PS1) Giuseppe Lavagetto: Add golang image [integration/config] - https://gerrit.wikimedia.org/r/492649
[11:46:05] (CR) Mainframe98: "> Patch Set 2: Code-Review-1" [tools/codesniffer] - https://gerrit.wikimedia.org/r/492634 (https://phabricator.wikimedia.org/T216971) (owner: Mainframe98)
[11:52:24] (CR) Mainframe98: "> Patch Set 2:" [tools/codesniffer] - https://gerrit.wikimedia.org/r/492634 (https://phabricator.wikimedia.org/T216971) (owner: Mainframe98)
[12:30:53] Release-Engineering-Team (Kanban), Code-Health-Metrics, User-zeljkofilipin: Code health metrics spike - https://phabricator.wikimedia.org/T207046 (zeljkofilipin)
[12:45:50] Release-Engineering-Team (Kanban), Code-Health-Metrics, User-zeljkofilipin: Code health metrics spike - https://phabricator.wikimedia.org/T207046 (zeljkofilipin) Open→Resolved a:zeljkofilipin As far as I can see, this is resolved. Example: - [[ https://gerrit.wikimedia.org/r/c/mediawiki...
[12:52:52] (PS3) Mainframe98: Replace casting functions with casts [tools/codesniffer] - https://gerrit.wikimedia.org/r/492634 (https://phabricator.wikimedia.org/T216971)
[12:55:06] (PS1) Mainframe98: Prohibit aliases is_long, is_double and is_real [tools/codesniffer] - https://gerrit.wikimedia.org/r/492663
[13:09:28] (CR) Hashar: "Few nitpicks regarding ZUUL_CHANGE_IDS :)" (6 comments) [integration/pipelinelib] - https://gerrit.wikimedia.org/r/492225 (owner: Thcipriani)
[13:10:43] Project beta-scap-eqiad build #239217: FAILURE in 4.6 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239217/
[13:15:30] Project mediawiki-core-doxygen-docker build #4993: FAILURE in 11 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/4993/
[13:16:49] (PS1) Giuseppe Lavagetto: Use stretch packages on sury-php [integration/config] - https://gerrit.wikimedia.org/r/492666
[13:17:38] <_joe_> hashar: ^^ this change is pretty fundamental
[13:18:03] <_joe_> I can't even imagine how things worked until today
[13:21:21] Continuous-Integration-Config, Code-Health-Metrics, User-zeljkofilipin: Report results from SonarCloud to Gerrit - https://phabricator.wikimedia.org/T217008 (hashar)
[13:24:18] Yippee, build fixed!
[13:24:19] Project beta-scap-eqiad build #239218: FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239218/
[13:24:40] (CR) Hashar: [C: +2] Use stretch packages on sury-php [integration/config] - https://gerrit.wikimedia.org/r/492666 (owner: Giuseppe Lavagetto)
[13:25:43] Project beta-scap-eqiad build #239219: FAILURE in 0.99 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239219/
[13:26:03] (Merged) jenkins-bot: Use stretch packages on sury-php [integration/config] - https://gerrit.wikimedia.org/r/492666 (owner: Giuseppe Lavagetto)
[13:34:20] (CR) Thiemo Kreuz (WMDE): [C: +1] "Thanks a lot for the split!" [tools/codesniffer] - https://gerrit.wikimedia.org/r/492663 (owner: Mainframe98)
[13:35:28] <_joe_> hashar: this was generated with the new "update" action of docker-pkg btw
[13:35:38] <_joe_> which I'm almost done writing :)
[13:38:06] Release-Engineering-Team, Analytics, EventBus, MediaWiki-Core-Testing, and 4 others: Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins - https://phabricator.wikimedia.org/T216069 (CCicalese_WMF)
[13:38:12] (CR) Hashar: [C: -1] "Blubberoid and Kask run on production, as such as I would prefer to have the golang base image to be maintained by SRE under operations/do" [integration/config] - https://gerrit.wikimedia.org/r/492649 (owner: Giuseppe Lavagetto)
[13:39:19] !log Rebuilding some CI Docker images using PHP sury.org to switch the sury.org component from jessie to stretch ( https://gerrit.wikimedia.org/r/#/c/integration/config/+/492666/ )
[13:39:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:39:48] (CR) Hashar: [C: +2] "(now rebuilding, it will take a while)" [integration/config] - https://gerrit.wikimedia.org/r/492666 (owner: Giuseppe Lavagetto)
[13:39:51] Release-Engineering-Team, MediaWiki-Containers, Operations, Core Platform Team Kanban (Done with CPT), and 4 others: FY2017/18 Program 6 - Outcome 2 - Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456 (CCicalese_WMF)
[13:40:49] (CR) Giuseppe Lavagetto: "> Patch Set 1: Code-Review-1" [integration/config] - https://gerrit.wikimedia.org/r/492649 (owner: Giuseppe Lavagetto)
[13:40:55] (Abandoned) Giuseppe Lavagetto: Add golang image [integration/config] - https://gerrit.wikimedia.org/r/492649 (owner: Giuseppe Lavagetto)
[13:44:06] Yippee, build fixed!
[13:44:06] Project beta-scap-eqiad build #239220: FIXED in 9 min 47 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239220/
[13:57:40] (CR) Hashar: [C: +2] "Done" [integration/config] - https://gerrit.wikimedia.org/r/492666 (owner: Giuseppe Lavagetto)
[14:12:07] Yippee, build fixed!
[14:12:08] Project mediawiki-core-doxygen-docker build #4994: FIXED in 8 min 6 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/4994/
[14:31:12] Project beta-scap-eqiad build #239225: FAILURE in 0.97 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239225/
[14:39:02] Project-Admins: Kartographer project should be a subproject instead of a milestone - https://phabricator.wikimedia.org/T161691 (Aklapper)
[14:39:15] Phabricator: Kartographer project should be a subproject instead of a milestone - https://phabricator.wikimedia.org/T161691 (Aklapper)
[14:44:16] Yippee, build fixed!
[14:44:16] Project beta-scap-eqiad build #239226: FIXED in 9 min 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239226/
[14:45:38] Project beta-scap-eqiad build #239227: FAILURE in 2.2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239227/
[14:58:14] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (Ottomata) Any updates here? I see beta still in readonly...
[14:59:11] Yippee, build fixed!
[14:59:11] Project beta-scap-eqiad build #239228: FIXED in 9 min 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239228/
[15:00:50] Project beta-scap-eqiad build #239229: FAILURE in 20 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239229/
[15:08:32] Hi everyone! Is it normal that I cannot run queries on deploymentwiki? I keep getting an exception
[15:08:41] Note to channel admins: Channel has got an outdated link to logs in the description
[15:13:26] Daimona, run queries?
[15:13:48] Krenair: yes. Upon launching sql deploymentwiki I get an excep
[15:13:55] Same with other DBs
[15:14:10] If that's not known, I can give you a paste
[15:14:26] I'm guessing it's trying to use the slave that is down?
[15:14:34] Yippee, build fixed!
[15:14:35] Project beta-scap-eqiad build #239230: FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239230/
[15:14:42] that's probably one of the problems caused by the problems we've got with deployment-db* hosts right now
[15:14:51] haven't had time to properly fix them
[15:14:54] I suspected it as well
[15:15:01] I'm pasting the trace just in case
[15:15:02] sql --write $dbname
[15:15:07] Should use the master now IIRC
[15:15:55] Project beta-scap-eqiad build #239231: FAILURE in 1.2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239231/
[15:16:31] https://phabricator.wikimedia.org/P8127 NDA restricted just for paranoia
[15:16:48] Uhhh
[15:16:59] That sounds like the mysqli? extension isn't loaded in the cli context
[15:17:40] reedy@deployment-deploy01:~$ sql enwiki --write
[15:17:40] ERROR 2003 (HY000): Can't connect to MySQL server on '172.16.5.23' (111 "Connection refused")
[15:17:45] Connecting with --write yields ERROR 2003 (HY000): Can't connect to MySQL server on '172.16.5.23' (111 "Connection refused")
[15:17:51] ^
[15:25:58] Continuous-Integration-Infrastructure, MediaWiki-Core-Testing, HHVM, Language-Team (Language-2019-January-March), Wikimedia-production-error (Shared Build Failure): quibble-vendor-mysql-hhvm-docker in gate fails for most merges (exit status -11) - https://phabricator.wikimedia.org/T216689 (Nik...
[15:27:31] Yippee, build fixed!
[15:27:31] Project beta-scap-eqiad build #239232: FIXED in 9 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239232/
[15:42:35] Reedy should I file a phab task?
[15:58:41] Release-Engineering-Team (Kanban), TimedMediaHandler, Patch-For-Review, User-zeljkofilipin: The first Selenium test for TimedMediaHandler - https://phabricator.wikimedia.org/T214480 (zeljkofilipin) a:zeljkofilipin→None
[16:44:19] Phabricator-Bot-Requests, WMSE-Organisational-Development-2019, User-Sebastian_Berlin-WMSE: Create bot for adding WMSE's projects - https://phabricator.wikimedia.org/T211351 (Sebastian_Berlin-WMSE) I can't get the test on https://www.mediawiki.org/wiki/Phabricator/Bots to work. I set `"user"` to `"WM...
[16:48:28] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (RazShuty) Is there any time estimation for this? not to annoy, I'm just trying to understand the timeline
[16:52:17] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (RhinosF1) >>! In T216067#4981587, @RazShuty wrote: > Is there any time estimation for this? not to annoy, I'm just trying to understand the timeline >>! In T216067#4981157, @Ot...
[16:54:29] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (greg) Repeatedly asking for ETAs does not help get the issue fixed faster. We are working on it, but it is not a simple issue.
[16:55:56] Release-Engineering-Team (Watching / External), Scap, Operations, Patch-For-Review: Deploy scap 3.9.0-1 - https://phabricator.wikimedia.org/T216666 (thcipriani) Open→Resolved Thanks for the merge @jijiki !
[17:03:03] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (RazShuty) @greg I understand and I'm sure it's hard and I'm sorry, just to get a time guesstimation, are we talking about hours/days/weeks? Again, I don't want to annoy, it's j...
[17:03:36] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (greg) There is no explicit ETA at this time.
[17:33:49] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (mmodell) I plan to work on this today (and tomorrow if I can't get it running today) @krenair: how far did you get with the slave?
[17:38:32] (PS1) Giuseppe Lavagetto: Explicitly pass the config to OpcacheManager.invalidate_all() [tools/scap] - https://gerrit.wikimedia.org/r/492738
[17:40:34] (CR) jerkins-bot: [V: -1] Explicitly pass the config to OpcacheManager.invalidate_all() [tools/scap] - https://gerrit.wikimedia.org/r/492738 (owner: Giuseppe Lavagetto)
[17:46:20] Phabricator-Bot-Requests, WMSE-Organisational-Development-2019, User-Sebastian_Berlin-WMSE: Create bot for adding WMSE's projects - https://phabricator.wikimedia.org/T211351 (Aklapper) Not that I knew.
[17:46:24] Scap: Track scap syncs that are part of a given SWAT window - https://phabricator.wikimedia.org/T193311 (mmodell)
[17:46:27] Release-Engineering-Team (Kanban), Scap, User-MModell: Document scap swat command - https://phabricator.wikimedia.org/T196411 (mmodell) Open→Stalled p:High→Normal I still intend to work on this but it's currently stalled due to big failures in the beta cluster that need my attention t...
[17:46:47] https://groups.google.com/forum/#!topic/repo-discuss/A9dGOppvgGA
[17:47:07] "GerritHub multi-site plugin goes OpenSource"
[17:47:57] Release-Engineering-Team (Kanban), Patch-For-Review, Release Pipeline (Blubber): Unify configuration for local build-context copies and variant artifacts - https://phabricator.wikimedia.org/T211625 (thcipriani) Open→Resolved I think @dduvall accomplished everything outlined on this task. Clos...
[17:50:13] Release-Engineering-Team (Kanban), Scap, Patch-For-Review: l10nupdate is still using HHVM - https://phabricator.wikimedia.org/T205313 (thcipriani) Open→Stalled p:Normal→Low The l10nupdate is currently disabled in production. Resetting priority.
[17:50:32] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), LDAP-Access-Requests: Disable nodepoolmanager user in LDAP - https://phabricator.wikimedia.org/T217064 (hashar)
[17:52:14] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Patch-For-Review: Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 (hashar)
[17:52:40] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Patch-For-Review: Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 (hashar) >>! In T209361#4929205, @hashar wrote: > I caught up with some more cleanup. Last thing to do is to disable th...
[17:53:41] Oh apparently renaming a repo is easy (with NoteDB) just as simple as renaming on disk and reindexing it.
[17:57:18] paladox: but renaming projects is not yet allowed in our gerrit install are they?
[17:57:43] Well we don't do that nope, but it's possible.
[17:58:02] could be useful, a request like that was lodged last week iirc, on mediawiki
[17:58:15] we had to archive the former and create a new one
[17:58:21] old fashioned style :)
[17:58:26] heh
[18:06:09] Beta-Cluster-Infrastructure, Release-Engineering-Team, Parsoid: Parsoid fails to deploy to beta cluster - https://phabricator.wikimedia.org/T216534 (Arlolra) Open→Resolved a:Arlolra This wasn't an issue deployed today.
[18:12:54] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (mmodell) ok it looks like the data copy finished without errors.
[18:23:02] (CR) Thcipriani: "update seems sane." (2 comments) [tools/scap] - https://gerrit.wikimedia.org/r/492738 (owner: Giuseppe Lavagetto)
[18:26:41] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (mmodell) Mariadb is now running on db05! `lang=shell-session root@deployment-db05:/srv/sqldata# chown mysql.mysql * . root@deployment-db05:/srv/sqldata# chown mysql.mysql * -R...
[18:46:23] hashar: Should the 'mediawiki-quibble-*' jobs be running on wmf/swat branches?
[18:46:28] See https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/491795/
[18:46:33] That seems unexpected/redundant.
[18:55:45] hey all
[18:55:48] since beta is readonly
[18:56:01] Krinkle: no
[18:56:06] are there other suggested ways to smoke test mw config changes and mw core/extension patches?
[18:56:29] Krinkle: mediawiki-quibble-* is for release branches , for wmf/deployment branches the jobs are wmf-quibble.*
[18:56:31] but
[18:56:35] there are bunch of issues with filters
[18:58:26] gotta reboot + deal with kids *sorry*
[19:11:01] I guess dealing with kids is the worst part *g*
[19:15:29] Phabricator, Proton, Reading-Infrastructure-Team-Backlog: Update Herald (H228) to include project [insert project number here] - https://phabricator.wikimedia.org/T217078 (Jhernandez)
[19:25:19] Project beta-scap-eqiad build #239253: FAILURE in 8.9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239253/
[19:37:06] Phabricator: Unable to edit Herald rule, "unhandled exception" - https://phabricator.wikimedia.org/T217082 (MBinder_WMF)
[19:37:29] Phabricator: Unable to edit Herald rule, "unhandled exception" - https://phabricator.wikimedia.org/T217082 (MBinder_WMF)
[19:37:35] Phabricator, Proton, Reading-Infrastructure-Team-Backlog: Update Herald (H228) to include project [insert project number here] - https://phabricator.wikimedia.org/T217078 (MBinder_WMF)
[19:38:19] Phabricator, Proton, Reading-Infrastructure-Team-Backlog: Update Herald (H228) to include project [insert project number here] - https://phabricator.wikimedia.org/T217078 (MBinder_WMF) I'm blocked by T217082 :( Someone else may have better luck?
[19:38:41] Yippee, build fixed!
[19:38:41] Project beta-scap-eqiad build #239254: FIXED in 9 min 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239254/
[19:39:23] Phabricator: Unable to edit Herald rule, "unhandled exception" - https://phabricator.wikimedia.org/T217082 (MarcoAurelio) Confirmed. I get the same error.
[19:41:54] twentyafterfour: Argument 1 passed to HeraldTokenizerFieldValue::setValueMap() must be of the type array, object given, called in /srv/deployment/phabricator/deployment/phabricator/src/applications/herald/field/HeraldField.php on line 137 and defined
[19:42:02] on T217082
[19:42:03] T217082: Unable to edit Herald rule, "unhandled exception" - https://phabricator.wikimedia.org/T217082
[19:42:11] weird
[19:42:23] * hauskatze dinner
[19:44:53] Phabricator: Unable to edit Herald rule, "unhandled exception" - https://phabricator.wikimedia.org/T217082 (MarcoAurelio) p:Triage→High Well this is happening not only on the rule above, but in all of them Herald rules. Checked with H90 and H267 both return the same error message. Editting existing H...
[19:48:27] Phabricator, Release-Engineering-Team (Kanban): Unable to edit Herald rule, "unhandled exception" - https://phabricator.wikimedia.org/T217082 (mmodell) a:mmodell
[20:15:34] Project beta-scap-eqiad build #239258: FAILURE in 5.7 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239258/
[20:16:05] (PS1) Thcipriani: Train notes: automatically upload changelog [integration/config] - https://gerrit.wikimedia.org/r/492758
[20:17:06] (CR) jerkins-bot: [V: -1] Train notes: automatically upload changelog [integration/config] - https://gerrit.wikimedia.org/r/492758 (owner: Thcipriani)
[20:19:15] hiya, is there some special place I need to set a new config in order for CI tests to pass?
[20:19:17] I'm getting
[20:19:18] Uncaught exception 'ConfigException' with message 'MultiConfig::get: undefined option: 'EventServices'
[20:19:31] the config is defined in wmf-config
[20:19:41] but the patch is just in the EventBus extension
[20:19:45] and is failing in CI there
[20:22:24] (PS2) Thcipriani: Train notes: automatically upload changelog [integration/config] - https://gerrit.wikimedia.org/r/492758
[20:23:37] ottomata: I don't know offhand, honestly, but I can dig a bit: which test is failing?
[20:23:46] sure
[20:23:50] https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/36701/console
[20:23:53] for https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventBus/+/492023/
[20:23:58] ottomata: You need to define EventServices in extension.json
[20:24:03] Config can't only exist in wmf-config
[20:24:14] ah ha!
[20:24:17] Reedy: thanks :)
[20:24:18] Even if it's only as an empty array
[20:24:30] great. thank you
[20:24:37] this is what I was missing re. Pchelolo ^
[20:25:17] thcipriani: I think Reedy got it
[20:25:36] :)
[20:34:08] Yippee, build fixed!
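The 20:23 to 20:24 exchange above says the `EventServices` option has to be declared in the extension's extension.json, even if only as an empty array, since a setting that exists only in wmf-config is not enough. A minimal Python sketch of checking for such a declaration follows. The manifest text is a hypothetical example (assuming the manifest_version 2 layout, where each option carries a "value" key), not the actual EventBus file, and `declares_config` is an illustrative helper name.

```python
import json

# Hypothetical manifest text, NOT the real EventBus extension.json.
# Assumes manifest_version 2, where each config option is an object
# with a "value" key holding its default.
EXAMPLE_MANIFEST = """
{
    "name": "EventBus",
    "manifest_version": 2,
    "config": {
        "EventServices": {
            "value": []
        }
    }
}
"""


def declares_config(manifest_text: str, option: str) -> bool:
    """Return True if `option` appears in the manifest's "config" section."""
    manifest = json.loads(manifest_text)
    # A manifest without a "config" section declares no options at all.
    return option in manifest.get("config", {})


if __name__ == "__main__":
    print(declares_config(EXAMPLE_MANIFEST, "EventServices"))
```

A check like this only confirms that the key is present; whether an empty array is a sensible default is up to the extension.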
[20:34:09] Project beta-scap-eqiad build #239259: FIXED in 9 min 51 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239259/
[20:35:36] Project beta-scap-eqiad build #239260: FAILURE in 1.4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239260/
[20:36:54] heh, sad
[20:43:00] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (Krenair) >>! In T216067#4981728, @mmodell wrote: > @krenair: how far did you get with the slave? I wasn't able to complete the documented apply-log instruction due to the issu...
[20:54:14] Yippee, build fixed!
[20:54:15] Project beta-scap-eqiad build #239261: FIXED in 9 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/239261/
[21:02:52] Continuous-Integration-Config, Fresnel, Performance-Team: Add "Console section" pattern to Jenkins to make Fresnel results easy to find - https://phabricator.wikimedia.org/T216572 (Krinkle) p:Triage→Normal a:Krinkle
[21:03:02] Continuous-Integration-Config, Fresnel, Performance-Team: For the Fresnel job, distinguish system failure from assert failure - https://phabricator.wikimedia.org/T216574 (Krinkle) p:Triage→Low
[21:24:36] heya thcipriani do you know? I need to clone a git repo (for event schemas) into a docker image during build with blubber
[21:25:00] looking at blubber docs, not seeing much other than installing apt packages
[21:30:11] y'all maybe know this already but contint1001 has a disk space alert
[21:30:47] Release-Engineering-Team (Watching / External), Operations, Release Pipeline, Core Platform Team Backlog (Watching / External), Services (watching): Track and install additional npm packages for all service container images - https://phabricator.wikimedia.org/T205911 (akosiaris) Oops, sorry f...
[21:34:01] Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog), Release Pipeline: disk space alert on contint1001 - https://phabricator.wikimedia.org/T217094 (Andrew)
[21:34:09] thcipriani, looks like you may have handled this last time: https://phabricator.wikimedia.org/T217094
[21:34:36] Release-Engineering-Team, Scap, Parsoid: Aborted deploy does not revert config changes - https://phabricator.wikimedia.org/T217095 (Arlolra)
[21:35:09] andrewbogott: yep, I'll take a look
[21:36:52] ottomata: yeah, there's not really support in blubber for git clone, blubber copies in repos setup by CI (that is, it just copies to the image whatever is in the working directory)
[21:37:43] thcipriani: going to try making a variant that uses builder.command ?
[21:37:47] and then including it?
[21:37:51] that could work as well
[21:46:04] Continuous-Integration-Infrastructure, docker-pkg, Patch-For-Review: Pruning docker-pkg images - https://phabricator.wikimedia.org/T207703 (thcipriani) I think this just needs to be deployed at this point (which looking at `/srv/deployment` maybe hasn't happened in a while).
[21:46:23] hm thcipriani you might not know but
[21:46:38] https://www.irccloud.com/pastebin/aIq5kxFJ/
[21:46:54] docker can't create dir in /srv when building image?
[21:49:27] hrm, we have a 3-tiered system of users when creating a dockerfile. When you're in the builder step you're on the 2nd tier (which is "somebody" [like "nobody" but with a shell]) so you can't create files that would need root perms. You can only create files under your "lives: {in:}"
[21:49:47] hm
[21:49:51] ok i guess that's fine.
[22:03:51] (PS1) Alexandros Kosiaris: Install heapdump and gc-stats when env production [blubber] - https://gerrit.wikimedia.org/r/492922 (https://phabricator.wikimedia.org/T205911)
[22:06:10] (CR) jerkins-bot: [V: -1] Install heapdump and gc-stats when env production [blubber] - https://gerrit.wikimedia.org/r/492922 (https://phabricator.wikimedia.org/T205911) (owner: Alexandros Kosiaris)
[22:14:12] !log docker rmi images without "latest" tag on contint1001 to free space -- should have kept all current docker-pkg images as well as images with children -- T217094
[22:14:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:14:16] T217094: disk space alert on contint1001 - https://phabricator.wikimedia.org/T217094
[22:16:49] Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog), Release Pipeline: contint1001:/var/lib/docker growth - https://phabricator.wikimedia.org/T207702 (thcipriani)
[22:16:51] Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog), Release Pipeline: disk space alert on contint1001 - https://phabricator.wikimedia.org/T217094 (thcipriani) Open→Resolved a:thcipriani Did this manually. Disk space issue should be resolved. This time I did this the...
[22:25:11] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (mmodell) @krenair: it seems like your transfer completed (in screen) but I'm not sure which procedure you were following?
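The 22:14 !log entry above describes removing Docker images that lack a "latest" tag. The exact command that was run is not in the log, so the following is only a rough Python sketch of the selection step, assuming the input is one `repository:tag` reference per line (as printed by `docker images --format '{{.Repository}}:{{.Tag}}'`). The image names are made up, `images_to_remove` is an illustrative helper name, and the sketch does not check for child images or for current docker-pkg images, which the log entry says were kept.

```python
from typing import List


def images_to_remove(listing: str, keep_tag: str = "latest") -> List[str]:
    """Select image references whose tag differs from `keep_tag`.

    `listing` holds one "repository:tag" reference per line. Nothing is
    deleted here; the function only returns the candidates.
    """
    selected = []
    for line in listing.splitlines():
        ref = line.strip()
        if not ref:
            continue
        # Split on the last colon so registry hosts with ports still work.
        _repo, _sep, tag = ref.rpartition(":")
        if tag != keep_tag:
            selected.append(ref)
    return selected


if __name__ == "__main__":
    # Hypothetical listing, not taken from contint1001.
    sample = "example/a:latest\nexample/a:0.1.0\nexample/b:0.2.0\n"
    for ref in images_to_remove(sample):
        print(ref)
```

Keeping selection separate from deletion makes it possible to review the candidate list before anything is passed to `docker rmi`.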
[22:26:38] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (Krenair) the one in the description, https://wikitech.wikimedia.org/wiki/Setting_up_a_MySQL_replica#Transferring_Data the bit it says to do before starting mysql
[23:15:05] (CR) Thcipriani: maintenance: Cleanup old Docker images at a lower threshold (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/490505 (https://phabricator.wikimedia.org/T177867) (owner: Dduvall)
[23:32:10] !log root@deployment-db05# mariabackup --innobackupex --apply-log --use-memory=10G /srv/sqldata # T216067
[23:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:32:13] T216067: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067
[23:41:05] Beta-Cluster-Infrastructure: Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (mmodell) So I figured out how to do `innobackupex` with our current installed packages: `mariabackup` has a innobackupex compatibility mode. [1] So after running that, I got:...
[23:42:06] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban): Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (mmodell) a:Krenair→None @krenair: hope you don't mind me claiming this one?
[23:42:12] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban): Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (mmodell) a:mmodell
[23:43:03] twentyafterfour, np, have an evil spooky haunted tree token :p
[23:43:12] good luck
[23:45:05] Krenair: thanks, I think I've got it now, after chasing forks of forks of xtrabackup to finally arrive at `mariabackup --innobackupex`
[23:46:34] fun
[23:46:42] sorry I didn't have the time for that
[23:47:01] had forgotten I had left it assigned
[23:51:00] Krenair: it's ok, thanks for what you already did on it, you did the hard part I think
[23:58:16] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban): Recover from corrupted beta MySQL slave (deployment-db04) - https://phabricator.wikimedia.org/T216067 (mmodell) Change master on `db05` to point to `db04` `CHANGE MASTER to MASTER_USER='repl', MASTER_PASSWORD='...', MASTER_PORT=3306, MASTER...