[03:42:42] Release-Engineering-Team, Collection, Readers-Web-Backlog, Readers-Web-Kanban-Board: Collection tests do not run properly - https://phabricator.wikimedia.org/T177672#3671066 (Tgr) Looks like the test discovery relies on extension registration which Collection does not use. You'll probably have to...
[04:18:08] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #542: FAILURE in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/542/
[05:37:01] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<10.00%)
[06:13:48] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:38:49] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[07:12:02] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:13:50] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[07:50:11] Continuous-Integration-Infrastructure (shipyard), Operations, User-Joe: Unify production and CI docker image build process - https://phabricator.wikimedia.org/T177276#3671190 (Joe) >>! In T177276#3666812, @Legoktm wrote: > Some requirements of this build process: > * Basic macro support: > ** `{{ apt...
[08:48:52] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[08:57:12] (CR) Thiemo Mättig (WMDE): "You might be interested in https://github.com/wmde/WikibaseCodeSniffer/pull/8. I strongly suggest to not make this already super-complicat" [tools/codesniffer] - https://gerrit.wikimedia.org/r/383168 (owner: Umherirrender)
[10:45:16] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations, Jenkins: Upgrade ci ssh key to ecdsa - https://phabricator.wikimedia.org/T177826#3671597 (Paladox)
[10:48:37] Release-Engineering-Team, Jenkins: Allow users to view build history in jenkins - https://phabricator.wikimedia.org/T177827#3671612 (Paladox)
[11:46:30] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 1342 bytes in 0.004 second response time
[11:53:11] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - string 'Wikipedia' not found on 'https://en.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 1953 bytes in 0.047 second response time
[11:56:30] RECOVERY - App Server Main HTTP Response on deployment-mediawiki04 is OK: HTTP OK: HTTP/1.1 200 OK - 46514 bytes in 1.644 second response time
[12:17:29] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 1342 bytes in 0.005 second response time
[12:22:36] RECOVERY - App Server Main HTTP Response on deployment-mediawiki04 is OK: HTTP OK: HTTP/1.1 200 OK - 46564 bytes in 4.613 second response time
[12:22:45] Project selenium-GettingStarted » firefox,beta,Linux,BrowserTests build #551: FAILURE in 44 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/551/
[12:23:08] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:51:28] Release-Engineering-Team (Watching / External), Contributors-Team, MobileFrontend, Operations, and 3 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3671838 (Addshore) @MoritzMuehlenhoff and I just spent some time trying to re...
[12:57:51] Project beta-scap-eqiad build #176878: FAILURE in 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/176878/
[13:00:41] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[13:01:29] Release-Engineering-Team, Collection, Readers-Web-Backlog, Patch-For-Review, Readers-Web-Kanban-Board: Collection tests do not run properly - https://phabricator.wikimedia.org/T177672#3666355 (phuedx) a: phuedx
[13:01:56] tests are failing because labs dns is down.
[13:02:37] should be recovering now
[13:03:18] PROBLEM - Puppet errors on saucelabs-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[13:06:08] Project beta-scap-eqiad build #176879: STILL FAILING in 2 min 28 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/176879/
[13:07:09] PROBLEM - Puppet errors on integration-slave-jessie-1004 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:07:17] 13:06:07 sudo -u mwdeploy -n -- /usr/bin/rsync -l deployment-tin.deployment-prep.eqiad.wmflabs::common/wikiversions*.{json,php} /srv/mediawiki on deployment-mediawiki05.deployment-prep.eqiad.wmflabs returned [255]: Connection closed by 10.68.22.21
[13:07:17] 13:06:07
[13:09:23] PROBLEM - Puppet errors on deployment-videoscaler01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[13:09:43] PROBLEM - Puppet errors on deployment-kafka04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[13:10:39] PROBLEM - Puppet errors on deployment-db04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[13:11:13] PROBLEM - Puppet errors on deployment-memc04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:12:21] Project beta-scap-eqiad build #176880: STILL FAILING in 2 min 23 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/176880/
[13:12:50] seems it is down again
[13:12:59] Release-Engineering-Team, Collection, Readers-Web-Backlog, Patch-For-Review, Readers-Web-Kanban-Board: Collection tests do not run properly - https://phabricator.wikimedia.org/T177672#3671894 (phuedx) >>! In T177672#3671066, @Tgr wrote: > Looks like the test discovery relies on extension regi...
[13:13:22] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:13:26] PROBLEM - Puppet errors on deployment-changeprop is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:16:14] Yippee, build fixed!
[13:16:15] Project beta-scap-eqiad build #176881: FIXED in 2 min 33 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/176881/
[13:18:45] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:18:59] PROBLEM - Puppet errors on deployment-mediawiki07 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[13:20:29] PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[13:30:55] PROBLEM - Puppet errors on integration-cumin is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[13:33:26] PROBLEM - Puppet errors on castor02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:40:40] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:42:10] RECOVERY - Puppet errors on integration-slave-jessie-1004 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:43:16] Release-Engineering-Team, Collection, Readers-Web-Backlog, Patch-For-Review, Readers-Web-Kanban-Board: Collection tests do not run properly - https://phabricator.wikimedia.org/T177672#3672016 (phuedx) Latest build output with the changes from {T177801} applied: https://integration.wikimedia.o...
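[Editor's aside on the T177672 thread above: MediaWiki CI discovers an extension's PHPUnit tests through the extension registration system, i.e. an extension.json manifest loaded via wfLoadExtension(); Collection still used a legacy PHP entry point, so its tests were never picked up. Jdlrobson confirms this later in the log (16:23:05). A minimal sketch of such a manifest follows; the field values are illustrative, not Collection's actual metadata.]

```json
{
	"name": "ExampleExtension",
	"license-name": "GPL-2.0+",
	"type": "other",
	"AutoloadClasses": {
		"ExampleExtensionHooks": "ExampleExtension.hooks.php"
	},
	"MessagesDirs": {
		"ExampleExtension": [ "i18n" ]
	},
	"manifest_version": 1
}
```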
[13:43:20] RECOVERY - Puppet errors on saucelabs-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:47:09] Release-Engineering-Team (Watching / External), Contributors-Team, MobileFrontend, Operations, and 3 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3672019 (jkroll) The wikidiff2 patch introduced new parameters with default v...
[13:48:23] RECOVERY - Puppet errors on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[13:48:29] RECOVERY - Puppet errors on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[13:49:21] RECOVERY - Puppet errors on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:49:45] RECOVERY - Puppet errors on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:50:39] RECOVERY - Puppet errors on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:51:13] RECOVERY - Puppet errors on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:53:42] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:58:58] RECOVERY - Puppet errors on deployment-mediawiki07 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:00:29] RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:07:15] Release-Engineering-Team (Watching / External), Contributors-Team, MobileFrontend, Operations, and 3 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3672066 (MoritzMuehlenhoff) >>! In T176637#3672019, @jkroll wrote: > It shoul...
[14:08:26] RECOVERY - Puppet errors on castor02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:10:55] RECOVERY - Puppet errors on integration-cumin is OK: OK: Less than 1.00% above the threshold [0.0]
[14:26:53] Release-Engineering-Team (Watching / External), Contributors-Team, MobileFrontend, Operations, and 3 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3672117 (jkroll) >>! In T176637#3672066, @MoritzMuehlenhoff wrote: >>>! In T1...
[15:10:16] PROBLEM - Host integration-slave-docker-1705 is DOWN: CRITICAL - Host Unreachable (10.68.20.124)
[15:11:06] Release-Engineering-Team (Kanban), Release Pipeline, Patch-For-Review: Find CI container build location - https://phabricator.wikimedia.org/T173128#3672316 (thcipriani) a: thcipriani Current plan is to build and push containers from the CI hosts in production (currently contint1001).
[15:12:54] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team: Remove integration-slave-docker-1705.integration.eqiad.wmflabs - https://phabricator.wikimedia.org/T177743#3672321 (thcipriani) Open→Resolved a: thcipriani Removed. Was only needed while upgrading docker-ce and packagi...
[15:16:07] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:30:43] (CR) Thcipriani: "inline bikeshedding" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/380551 (https://phabricator.wikimedia.org/T175297) (owner: Dduvall)
[15:32:45] PROBLEM - Puppet errors on deployment-kafka03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[15:51:09] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[16:07:43] RECOVERY - Puppet errors on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:17:25] PROBLEM - Puppet errors on deployment-kafka-jumbo-1 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:23:05] Release-Engineering-Team, Collection, Readers-Web-Backlog, Patch-For-Review, Readers-Web-Kanban-Board: Collection tests do not run properly - https://phabricator.wikimedia.org/T177672#3666355 (Jdlrobson) I confirm the patch fixes the problem. I had no idea extension.json was needed for automa...
[16:29:19] !log adding "Ladsgroup" to admins in wikidatawiki in beta cluster
[16:29:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:37:25] RECOVERY - Puppet errors on deployment-kafka-jumbo-1 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:42:40] Hi releng! I notice we have an exciting new php7/docker test environment.
[16:43:26] How can I configure the SmashPig test env to install the sqlite driver?
[16:48:13] The tests currently all fail, unfortunately: https://integration.wikimedia.org/ci/job/composer-php70-docker/8/console
[17:03:45] Release-Engineering-Team, Collection, Proton, Readers-Web-Backlog, and 2 others: Collection tests do not run properly - https://phabricator.wikimedia.org/T177672#3672761 (ovasileva)
[17:10:26] Release-Engineering-Team, Collection, Proton, Readers-Web-Backlog, and 2 others: Collection tests do not run properly - https://phabricator.wikimedia.org/T177672#3672800 (phuedx)
[17:16:04] * Krinkle is confused why Jenkins/Zuul is clogged.
[17:16:18] There is 1 mediawiki patch and 1 wikidata patch in the 'test' pipeline, executing about 3 jobs
[17:16:22] and nothing else is happening anywhere.
[17:16:28] What is it waiting for?
[17:18:44] right, also 1 patch in submit. and that's 3x7=21 jobs, and we've got a limit of 25. Somehow I don't recall <= 3 patches holding up CI, but I guess it's fine.
[17:18:56] Docker here we come :)
[17:19:17] the waiting time is for deleting and creating the instances
[17:19:30] see https://grafana.wikimedia.org/dashboard/db/nodepool
[17:31:42] ejegg: uhoh, my bad. could you file a bug in #ci-config ?
[17:32:07] I thought I had run tests for all repositories to make sure they pass
[17:32:09] sure thing!
[17:32:28] Not to worry, nothing urgent to merge there just now
[17:33:04] ejegg: Also if the tests depend upon sqlite3, it could be added as a "require-dev": { "ext-sqlite3": "*" } so it fails much earlier
[17:51:43] !log add "Ladsgroup" to oversight members in enwiki in beta cluster to test T177705
[17:51:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:51:48] T177705: REGRESSION - Option to suppress user accounts on blocks (hideuser) disappeared after migrating to OOjs UI - https://phabricator.wikimedia.org/T177705
[17:57:26] legoktm: good call, i'll add that!
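[Editor's aside, making legoktm's 17:33:04 suggestion concrete: only the "ext-sqlite3" requirement comes from the log; the rest of this composer.json fragment is an assumed minimal sketch.]

```json
{
	"require-dev": {
		"ext-sqlite3": "*"
	}
}
```

[Composer checks "ext-*" platform packages against the extensions loaded in the running PHP interpreter, so on an image without the SQLite driver `composer install` aborts immediately, instead of every test failing later as in build #8 above.]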
[17:57:36] (PS1) Legoktm: Add php-sqlite3 extension to php docker image [integration/config] - https://gerrit.wikimedia.org/r/383387
[17:59:48] Amir1: Just saw ^^, I somewhat feel it might be related to what I did in T133036.
[17:59:48] T133036: Field "hideuser" on Special:Block should be hidden when time field not indefinite - https://phabricator.wikimedia.org/T133036
[18:01:10] eddiegp: It might be, they might just notice it
[18:01:21] but overall, everything looks fine to me
[18:02:28] ejegg: https://integration.wikimedia.org/ci/job/composer-php70-docker/9/console
[18:02:46] (CR) Legoktm: [C: 2] Add php-sqlite3 extension to php docker image [integration/config] - https://gerrit.wikimedia.org/r/383387 (owner: Legoktm)
[18:03:03] legoktm: quick work!
[18:03:07] thank you
[18:03:37] np
[18:03:55] The require-dev patch is here if you want a fast-fail test: https://gerrit.wikimedia.org/r/383388
[18:03:59] (Merged) jenkins-bot: Add php-sqlite3 extension to php docker image [integration/config] - https://gerrit.wikimedia.org/r/383387 (owner: Legoktm)
[18:04:13] Amir1: I meant that I wondered whether changing to OOUI might have changed some css classes/ids (making the js logic hide the field and not show it again when expiry is set to infinity).
[18:05:13] Don't know why it works on beta though.
[18:08:34] Continuous-Integration-Infrastructure (shipyard), Operations, User-Joe: Unify production and CI docker image build process - https://phabricator.wikimedia.org/T177276#3673007 (Legoktm) >>! In T177276#3671190, @Joe wrote: > * There is no need for cache busters as we ignore cache at image build time. T...
[18:16:47] eddiegp: we had a patch to fix all of those js/css class names which it fixed apparently
[18:19:28] Nice :)
[18:23:22] greg-g: Hey, I added this to run a maintenance script to clean up ores tables: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=1772456&oldid=1772454 I hope that's fine, if not let me know so I revert it
[18:24:02] Amir1: I hope so too :)
[18:24:55] the script is easy, just takes lots of time to work so it doesn't cause replication lag
[18:25:22] kk
[18:25:29] DBAs know about it?
[18:27:37] greg-g: yes
[18:30:46] alrighty then
[18:41:51] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[18:42:20] Beta-Cluster-Infrastructure, MediaWiki-Authentication-and-authorization, MediaWiki-extensions-CentralAuth, MW-1.30-release-notes (WMF-deploy-2017-08-08_(1.30.0-wmf.13)), Patch-For-Review: "Loss of session data" on Beta Cluster - https://phabricator.wikimedia.org/T172560#3673157 (Ryasmeen) Thi...
[18:43:11] Beta-Cluster-Infrastructure, Release-Engineering-Team, MediaWiki-Parser, Readers-Web-Backlog (Tracking), User-Jdlrobson: Templates rendering as links on beta cluster - https://phabricator.wikimedia.org/T173576#3673159 (Jdlrobson) Using expanded templates doesn't seem to help completely. There...
[18:44:10] Beta-Cluster-Infrastructure, Release-Engineering-Team, MediaWiki-Parser, Readers-Web-Backlog (Tracking): Templates rendering as links on beta cluster - https://phabricator.wikimedia.org/T173576#3533588 (Jdlrobson) p: Triage→Low
[18:46:49] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[18:54:37] (PS4) Krinkle: mwconf: Use built-in DevelopmentSettings.php instead when available [integration/jenkins] - https://gerrit.wikimedia.org/r/383040 (https://phabricator.wikimedia.org/T177669)
[19:43:05] Continuous-Integration-Config, MediaWiki-extensions-MultimediaViewer, Multimedia, Readers-Web-Backlog, and 3 others: Drop old "jshint" job from MediaViewer extension test/etc. pipelines - https://phabricator.wikimedia.org/T140652#3673463 (MBinder_WMF)
[19:54:53] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:29:52] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[21:53:39] Release-Engineering-Team (Kanban), Phabricator: Add support for task types - https://phabricator.wikimedia.org/T93499#3673992 (mmodell)
[22:28:03] Release-Engineering-Team (Watching / External), Contributors-Team, MobileFrontend, Operations, and 3 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3674044 (Addshore) So. I think we might be okay to close this ticket now?
[22:36:49] Release-Engineering-Team (Watching / External), Contributors-Team, MobileFrontend, Operations, and 3 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3674078 (Jdforrester-WMF) Open→Resolved Provisionally marking as Reso...
[23:40:32] (PS1) Aude: Update Wikidata - wmf/1.31.0-wmf.3 [tools/release] - https://gerrit.wikimedia.org/r/383490
[23:41:28] (CR) Aude: [C: 2] Update Wikidata - wmf/1.31.0-wmf.3 [tools/release] - https://gerrit.wikimedia.org/r/383490 (owner: Aude)
[23:42:01] (Merged) jenkins-bot: Update Wikidata - wmf/1.31.0-wmf.3 [tools/release] - https://gerrit.wikimedia.org/r/383490 (owner: Aude)
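[Editor's aside on Krinkle's 18:54:37 change (Gerrit 383040): the idea is for CI's generated settings to reuse MediaWiki core's bundled development defaults on branches that ship them. A rough sketch under that assumption; the guard, path, and fallback values are illustrative, not the actual patch.]

```php
<?php
// Sketch only: prefer core's bundled development defaults when the
// checked-out branch provides them, otherwise fall back to settings
// the CI config would have to define itself.
if ( is_readable( "$IP/includes/DevelopmentSettings.php" ) ) {
	require_once "$IP/includes/DevelopmentSettings.php";
} else {
	// Older branch without the file: apply roughly equivalent settings.
	error_reporting( -1 );
	ini_set( 'display_errors', '1' );
	$wgShowExceptionDetails = true;
	$wgDevelopmentWarnings = true;
}
```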