[00:02:35] I'll have another in a moment when I've fixed the corruption in my mediawiki-config.git
[00:07:48] andrewbogott, https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/474823
[00:12:43] andrewbogott, is the etherpad still up to date on status?
[00:12:56] nope, I'll update it shortly
[00:13:21] ok
[00:14:56] wmf-config/wikitech.php:$wgOpenStackManagerProxyGateways = [ 'eqiad' => '208.80.155.156' ];
[00:15:01] this config will be unused these days right?
[00:15:06] all proxy management will be going through horizon?
[00:15:14] PROBLEM - SSH on deployment-memc06 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:15:22] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0]
[00:15:43] PROBLEM - SSH on deployment-ircd is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:15:44] PROBLEM - SSH on deployment-elastic05 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:15:47] PROBLEM - SSH on deployment-etcd-01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:15:47] PROBLEM - Host deployment-puppetmaster03 is DOWN: CRITICAL - Host Unreachable (10.68.23.29)
[00:16:08] PROBLEM - Host deployment-webperf12 is DOWN: CRITICAL - Host Unreachable (10.68.20.36)
[00:16:35] PROBLEM - SSH on deployment-fluorine02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:16:39] PROBLEM - SSH on deployment-zookeeper02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:16:52] PROBLEM - SSH on deployment-sca04 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds
[00:17:05] heh
[00:17:11] PROBLEM - SSH on deployment-changeprop is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:17:13] why does it start notifying now? :)
[00:17:17] Krenair ^^
[00:17:24] PROBLEM - Host deployment-memc07 is DOWN: CRITICAL - Host Unreachable (10.68.17.171)
[00:17:25] PROBLEM - Host deployment-puppetdb02 is DOWN: CRITICAL - Host Unreachable (10.68.19.126)
[00:17:25] PROBLEM - SSH on deployment-db04 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:17:29] looking
[00:17:30] PROBLEM - SSH on deployment-db03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:17:30] PROBLEM - SSH on deployment-mcs01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:17:30] PROBLEM - Host deployment-deploy02 is DOWN: CRITICAL - Host Unreachable (10.68.23.98)
[00:17:37] PROBLEM - Host deployment-dumps-puppetmaster02 is DOWN: CRITICAL - Host Unreachable (10.68.20.216)
[00:17:42] PROBLEM - Host deployment-kafka-jumbo-1 is 
DOWN: CRITICAL - Host Unreachable (10.68.23.243)
[00:17:44] I just restarted it, after merging a patch that (I thought) excluded that stuff from shinken
[00:17:51] I'll just kill it again :)
[00:17:52] PROBLEM - SSH on deployment-kafka-jumbo-2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:17:52] PROBLEM - SSH on deployment-sca02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:17:53] PROBLEM - SSH on deployment-mathoid is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:18:00] PROBLEM - SSH on deployment-cache-upload04 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:18:01] PROBLEM - SSH on deployment-memc05 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:20:05] test
[00:20:37] deleted shinken-02:/etc/shinken/generated/deployment-prep.cfg
[00:20:41] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[00:20:48] should be fine now hopefully paladox 
andrewbogott
[00:20:53] :)
[00:20:54] :)
[00:35:24] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[00:38:25] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[00:52:44] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (Dzahn)
[00:53:09] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (Dzahn) @jeena ^ Done, i marked the check box.
[01:09:09] we should probably have some check that deployment-puppetmaster03 has no unsigned certs waiting
[01:15:40] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:18:07] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 2783 bytes in 7.526 second response time
[01:21:22] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[01:46:22] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[01:49:04] andrewbogott, ^ that you?
[01:49:44] looks like db04 is moving
[01:54:44] twentyafterfour, ^
[01:54:48] it's almost 2AM here
[01:55:03] I need to go
[01:55:12] maybe take a look at wmf-config/db-labs.php
[01:55:20] could just change the IPs and hope db04 comes up soon enough
[01:55:28] 1:55am to be precise :)
[01:55:30] might have to look at making it use the master as a replica
[01:55:54] I have a feeling upping the sectionLoads DEFAULT value from 0 might do the trick but I'm not sure
[02:52:10] * twentyafterfour is on it
[05:41:13] Release-Engineering-Team (Kanban), User-greg: Share results of the Developer Satisfaction survey - https://phabricator.wikimedia.org/T209913 (greg) p:Triage>Normal
[05:42:02] Release-Engineering-Team (Kanban): Summarize Developer Satisfaction survey data - https://phabricator.wikimedia.org/T209914 (greg) p:Triage>Normal
[05:42:16] Release-Engineering-Team (Kanban): Share results of the Developer Satisfaction survey - https://phabricator.wikimedia.org/T209913 
(greg)
[05:44:03] Release-Engineering-Team (Kanban), Surveys: Complete first round of the Release Engineering Developer Satisfaction survey - https://phabricator.wikimedia.org/T197635 (greg)
[05:45:47] twentyafterfour: I'm about to go to sleep. I updated https://etherpad.wikimedia.org/p/deployment-prep-to-neutron with the latest batch of IPs; there are another dozen or so VMs that should migrate overnight, then a few stragglers I'll figure out in the morning.
[05:45:56] Release-Engineering-Team (Kanban): Summarize Developer Satisfaction survey data - https://phabricator.wikimedia.org/T209914 (greg) Survey closes on Nov 26th: https://lists.wikimedia.org/pipermail/engineering/2018-November/000642.html
[05:46:07] andrewbogott: cool I'm still messing with mariadb trying to fix replication
[05:46:30] andrewbogott: thanks for all the effort you've put into it so far
[05:46:48] get some rest ;)
[05:49:41] Release-Engineering-Team (Kanban), Surveys: Create the Release Engineering Developer Satisfaction survey - https://phabricator.wikimedia.org/T209916 (greg) 
Open>Resolved p:Triage>Normal
[05:49:43] Release-Engineering-Team (Kanban), Surveys: Create the Release Engineering Developer Satisfaction survey - https://phabricator.wikimedia.org/T209916 (greg) https://lists.wikimedia.org/pipermail/engineering/2018-October/000629.html
[05:50:15] Release-Engineering-Team, Epic: Complete first round of the Release Engineering Developer Satisfaction survey - https://phabricator.wikimedia.org/T197635 (greg)
[05:50:57] don't mind the weird phab spam/rearrangement there, I got a little OCD before bed
[05:51:18] "_
[05:51:22] also :)
[05:52:43] gotta love resolving tasks right after creating them. it's just satisfying somehow. Plus it would totally improve some metrics (which we aren't tracking) if only we were tracking them.
[05:53:31] average time to resolve: 2 minutes. median time to resolve: 8 years.
[05:53:34] something like that
[06:00:24] :)
[06:17:05] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[08:54:12] so deployment-db04 seems to have a different host key (!?) and deployment-db03 is unreachable
[09:02:57] so the appserver has the old db replica IP
[09:03:09] but the deployment server has the new one
[09:03:11] not sure why
[09:03:13] maybe scap is upset
[09:05:08] Krenair: I think that’s expected
[09:05:15] Re host key change
[09:29:12] paladox, no I don't think so
[09:29:16] other hosts have not had a change
[09:29:59] (PS1) QChris: Allow “Gerrit Managers” to import history [extensions/LDAPProvider] (refs/meta/config) - https://gerrit.wikimedia.org/r/474852
[09:30:01] (CR) QChris: [V: 2 C: 2] Allow “Gerrit Managers” to import history [extensions/LDAPProvider] (refs/meta/config) - https://gerrit.wikimedia.org/r/474852 (owner: QChris)
[09:30:33] (PS1) QChris: Import done. 
Revoke import grants [extensions/LDAPProvider] (refs/meta/config) - https://gerrit.wikimedia.org/r/474853
[09:30:35] (CR) QChris: [V: 2 C: 2] Import done. Revoke import grants [extensions/LDAPProvider] (refs/meta/config) - https://gerrit.wikimedia.org/r/474853 (owner: QChris)
[09:39:56] (PS1) QChris: Allow “Gerrit Managers” to import history [extensions/LDAPAuthentication2] (refs/meta/config) - https://gerrit.wikimedia.org/r/474858
[09:39:57] (CR) QChris: [V: 2 C: 2] Allow “Gerrit Managers” to import history [extensions/LDAPAuthentication2] (refs/meta/config) - https://gerrit.wikimedia.org/r/474858 (owner: QChris)
[09:40:20] (PS1) QChris: Import done. Revoke import grants [extensions/LDAPAuthentication2] (refs/meta/config) - https://gerrit.wikimedia.org/r/474859
[09:40:22] (CR) QChris: [V: 2 C: 2] Import done. 
Revoke import grants [extensions/LDAPAuthentication2] (refs/meta/config) - https://gerrit.wikimedia.org/r/474859 (owner: QChris)
[09:45:21] (PS1) QChris: Allow “Gerrit Managers” to import history [extensions/LDAPUserInfo] (refs/meta/config) - https://gerrit.wikimedia.org/r/474862
[09:45:23] (CR) QChris: [V: 2 C: 2] Allow “Gerrit Managers” to import history [extensions/LDAPUserInfo] (refs/meta/config) - https://gerrit.wikimedia.org/r/474862 (owner: QChris)
[09:45:46] (PS1) QChris: Import done. Revoke import grants [extensions/LDAPUserInfo] (refs/meta/config) - https://gerrit.wikimedia.org/r/474863
[09:45:48] (CR) QChris: [V: 2 C: 2] Import done. 
Revoke import grants [extensions/LDAPUserInfo] (refs/meta/config) - https://gerrit.wikimedia.org/r/474863 (owner: QChris)
[10:03:06] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 36932 bytes in 6.219 second response time
[10:03:11] I brought it back up
[10:04:07] !log manually fixed deployment-mediawiki-09:/srv/mediawiki/wmf-config/db-labs.php to match deployment copy, not sure why it didn't deploy properly yet
[10:04:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:05:33] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 47994 bytes in 3.421 second response time
[10:12:54] changed profile::openstack::main::region to eqiad1-r so cumin targets the new hosts instead of the old ones
[10:12:57] for the record old ones
[10:13:10] 
deployment-cache-text04.deployment-prep.eqiad.wmflabs,deployment-elastic[05-07].deployment-prep.eqiad.wmflabs,deployment-maps05.deployment-prep.eqiad.wmflabs,deployment-rd3-cptest-master01.deployment-prep.eqiad.wmflabs,deployment-redis[05-06].deployment-prep.eqiad.wmflabs
[10:14:34] armed keyholder on cumin
[10:18:28] deployment-db04 has different host keys, as has memc05, imagescaler01, sca01, mcs01, ircd, poolcounter04, aqs01, restbase02, restbase01, pdfrender02, mathoid, aqs02, parsoid09, fluorine02, changeprop, sca04, aqs03, sentry01, memc04, conf03, cache-upload04, prometheus01
[10:18:34] all have new host keys for some reason
[10:18:40] the rest seem to have kept the correct ones
[10:25:08] (PS1) QChris: Allow “Gerrit Managers” to import history [extensions/LDAPGroups] (refs/meta/config) - https://gerrit.wikimedia.org/r/474871
[10:25:10] (CR) QChris: [V: 2 C: 2] Allow “Gerrit Managers” to import history [extensions/LDAPGroups] (refs/meta/config) - https://gerrit.wikimedia.org/r/474871 (owner: QChris)
[10:25:32] (PS1) QChris: Import done. 
Revoke import grants [extensions/LDAPGroups] (refs/meta/config) - https://gerrit.wikimedia.org/r/474872
[10:25:34] (CR) QChris: [V: 2 C: 2] Import done. Revoke import grants [extensions/LDAPGroups] (refs/meta/config) - https://gerrit.wikimedia.org/r/474872 (owner: QChris)
[10:54:19] (PS1) Hashar: Grant permission to push tags to labs-toollabs [labs/toollabs] (refs/meta/config) - https://gerrit.wikimedia.org/r/474880
[10:55:27] (CR) Hashar: [V: 2 C: 2] "lets try" [labs/toollabs] (refs/meta/config) - https://gerrit.wikimedia.org/r/474880 (owner: Hashar)
[10:58:34] !log Clearing out deployment-deploy01 disk space. Went offline due to disk space consumption
[10:58:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:59:08] err
[11:00:34] !log deployment-deploy01 got migrated to a new region but the Jenkins configuration had not been updated. 
Adjusting IP address from 10.68.23.38 to 172.16.4.18 | T208101
[11:00:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:00:37] T208101: Migrate deployment-prep to eqiad1 - https://phabricator.wikimedia.org/T208101
[11:01:28] Project beta-update-databases-eqiad build #29897: FAILURE in 4.6 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29897/
[11:02:40] Yippee, build fixed!
[11:02:40] Project beta-code-update-eqiad build #225525: FIXED in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/225525/
[11:03:31] (PS1) Hashar: Fix grammar in tests/test_coverage.py [integration/config] - https://gerrit.wikimedia.org/r/474881
[11:13:10] Deployments, Release-Engineering-Team (Kanban), HHVM, Wikimedia-Incident: Figure out why HHVM kept running stale code for hours - https://phabricator.wikimedia.org/T181833 (ArielGlenn) Can we think about ways to detect the problem within short period of time? 
I'd like to see that happen at least.
[11:13:19] Krenair: it happened to me
[11:14:57] Project beta-scap-eqiad build #228235: FAILURE in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228235/
[11:15:10] (CR) Ladsgroup: [C: 1] readme: how to reproduce a CI build [integration/quibble] - https://gerrit.wikimedia.org/r/452335 (https://phabricator.wikimedia.org/T200991) (owner: Pablo Grass (WMDE))
[11:20:04] Project beta-update-databases-eqiad build #29898: STILL FAILING in 4.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29898/
[11:27:32] Project beta-scap-eqiad build #228236: STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228236/
[11:37:17] Continuous-Integration-Infrastructure (Slipway), Release-Engineering-Team (Kanban): Report CI jobs not running on Docker slaves - https://phabricator.wikimedia.org/T209935 (hashar)
[11:38:59] Project 
beta-scap-eqiad build #228237: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228237/
[11:40:47] (PS1) Hashar: QA report: list jobs not on Docker slaves [integration/config] - https://gerrit.wikimedia.org/r/474886 (https://phabricator.wikimedia.org/T209935)
[11:43:21] If deployment-ircd got moved why is irc.beta.wmflabs.org. still 208.80.155.209 in openstack-browser, but not actually in DNS?
[11:47:23] !log Armed keyholder on deployment-deploy01 Got shutdown while being migrated a new cloud region # T208101
[11:47:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:47:26] T208101: Migrate deployment-prep to eqiad1 - https://phabricator.wikimedia.org/T208101
[11:52:57] (CR) jerkins-bot: [V: -1] QA report: list jobs not on Docker slaves [integration/config] - https://gerrit.wikimedia.org/r/474886 (https://phabricator.wikimedia.org/T209935) (owner: Hashar)
[11:53:24] hashar, probably want to do deploy02 too?
[11:55:35] hashar, https://gerrit.wikimedia.org/r/474890
[11:56:42] (PS2) Hashar: QA report: list jobs not on Docker slaves [integration/config] - https://gerrit.wikimedia.org/r/474886 (https://phabricator.wikimedia.org/T209935)
[12:00:26] (CR) Hashar: [C: 2] "Will then run https://integration.wikimedia.org/ci/job/integration-config-qa/" [integration/config] - https://gerrit.wikimedia.org/r/474886 (https://phabricator.wikimedia.org/T209935) (owner: Hashar)
[12:01:21] Project beta-scap-eqiad build #228238: STILL FAILING in 21 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228238/
[12:01:37] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 2765 bytes in 8.167 second response time
[12:04:05] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 2765 bytes in 3.108 second response time
[12:04:47] (Merged) jenkins-bot: QA report: list jobs not on Docker slaves [integration/config] - https://gerrit.wikimedia.org/r/474886 (https://phabricator.wikimedia.org/T209935) (owner: Hashar)
[12:05:25] Continuous-Integration-Infrastructure (Slipway), Release-Engineering-Team (Kanban), Patch-For-Review: Report CI jobs not running on Docker slaves - https://phabricator.wikimedia.org/T209935 (hashar) The report gives me: | Job | Tied to |--|-- | analytics-refinery-release | contintLabsSlave && Debian...
[12:06:53] hmm, so on deployment-mediawiki-07 it seems that mwdeploy has a different uid/gid in /etc/passwd vs the uid/gid owning /var/lib/mwdeploy
[12:07:22] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [10.0]
[12:07:24] is the /etc/passwd and /etc/group entry even supposed to exist? is mwdeploy an ldap user?
[12:07:50] scap is failing with "Could not chdir to home directory /var/lib/mwdeploy: Permission denied"
[12:11:21] !log scap failures on deployment-mediawiki-07 are related to uid/gid mismatch of the mwdeploy user, specifically the owner of that user's home dir is uid 603 but /etc/passwd|group have a different uid/gid for the same username. T208101
[12:11:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:11:24] T208101: Migrate deployment-prep to eqiad1 - https://phabricator.wikimedia.org/T208101
[12:15:07] (PS1) Hashar: Add cache dir for integration-config-qa [integration/config] - https://gerrit.wikimedia.org/r/474896 (https://phabricator.wikimedia.org/T209935)
[12:15:24] Project beta-scap-eqiad build #228239: STILL FAILING in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228239/
[12:15:27] (CR) Hashar: [C: 2] "That fixed it" [integration/config] - https://gerrit.wikimedia.org/r/474896 (https://phabricator.wikimedia.org/T209935) (owner: Hashar)
[12:16:37] Continuous-Integration-Infrastructure (Slipway), Release-Engineering-Team (Kanban), Patch-For-Review: Report CI jobs not running on Docker slaves - https://phabricator.wikimedia.org/T209935 (hashar) Open>Resolved a:hashar First build is https://integration.wikimedia.org/ci/job/integration...
[12:18:53] (Merged) jenkins-bot: Add cache dir for integration-config-qa [integration/config] - https://gerrit.wikimedia.org/r/474896 (https://phabricator.wikimedia.org/T209935) (owner: Hashar)
[12:19:47] lunch &
[12:20:09] Project beta-update-databases-eqiad build #29899: STILL FAILING in 9.3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29899/
[12:28:18] Project beta-scap-eqiad build #228240: STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228240/
[12:40:56] Project beta-scap-eqiad build #228241: STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228241/
[12:53:48] Project beta-scap-eqiad build #228242: STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228242/
[12:53:57] (PS6) Mainframe98: Forbid usage of is_null() [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[12:58:42] (CR) Mainframe98: Forbid usage of is_null() (7 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[13:01:47] !log PHP Startup: Unable to load dynamic library '/usr/lib/php/20151012/luasandbox.so' - /usr/lib/php/20151012/luasandbox.so: cannot open shared object file: No such file or directory T208101
[13:01:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:01:50] T208101: Migrate deployment-prep to eqiad1 - https://phabricator.wikimedia.org/T208101
[13:06:43] Project beta-scap-eqiad build #228243: STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228243/
[13:07:26] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0]
[13:07:47] Project beta-scap-eqiad build #228244: STILL FAILING in 0.77 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228244/
[13:08:26] !log removed local unix user mwdeploy from deployment-mediawiki-07 because it was shadowing the real mwdeploy user in ldap
[13:08:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:08:29] Beta-Cluster-Infrastructure, Cloud-VPS (Ubuntu Trusty Deprecation): cloudvps: deployment-prep project trusty deprecation - https://phabricator.wikimedia.org/T204500 (aborrero) The only Trusty instance is in SHUTOFF state. Deadline is approaching (2018-12-18). Shall we just delete the instance?
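[editor's note, not part of the log] The uid/gid mismatch diagnosed above (home directory owned by uid 603 while the account database reports a different uid for mwdeploy) can be checked by comparing the two values directly. What follows is only a minimal Python sketch for a Unix host. The function name and the injectable `lookup` and `stat` parameters are illustrative assumptions, not tooling that was used in the channel.

```python
import os
import pwd


def home_owner_mismatch(username, lookup=pwd.getpwnam, stat=os.stat):
    """Compare the account's uid with the uid owning its home directory.

    Returns (account_uid, owner_uid) when they differ, otherwise None.
    `lookup` and `stat` are injectable only so the check can be exercised
    without touching a real system (an assumption made for this sketch).
    """
    entry = lookup(username)                # resolved via the system account database
    owner_uid = stat(entry.pw_dir).st_uid   # uid that actually owns the home directory
    if owner_uid != entry.pw_uid:
        return (entry.pw_uid, owner_uid)
    return None
```

Called as `home_owner_mismatch("mwdeploy")` on an affected host, a non-None result would point at the kind of shadowing local account that was later removed.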
[13:09:03] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 36971 bytes in 3.676 second response time
[13:10:26] Beta-Cluster-Infrastructure, Cloud-VPS (Ubuntu Trusty Deprecation): cloudvps: deployment-prep project trusty deprecation - https://phabricator.wikimedia.org/T204500 (Krenair) >>! In T204500#4761814, @aborrero wrote: > The only Trusty instance is in SHUTOFF state. Deadline is approaching (2018-12-18). Sha...
[13:11:34] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 48018 bytes in 4.503 second response time
[13:17:26] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[13:20:07] Project beta-update-databases-eqiad build #29900: STILL FAILING in 7.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29900/
[13:22:50] Project beta-scap-eqiad build #228245: STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228245/
[13:22:57] bah
[13:23:47] deployment-mediawiki07.deployment-prep.eqiad.wmflabs is down apparently
[13:24:11] hashar, I think you need a hyphen before the 07
[13:24:25] I also think there used to be one without the hyphen?
[13:24:57] horizon shows deployment-mediawiki-07 in the new region https://horizon.wikimedia.org/project/instances/4bff32b7-316a-40d4-9de4-3a13a7ebf62a/
[13:25:11] yep
[13:25:15] but -07 instead of 07
[13:25:37] krenair@deployment-mediawiki-07:~$ date
[13:25:37] Tue Nov 20 13:25:34 UTC 2018
[13:25:59] hashar@bastion-01:~$ dig +short ANY deployment-mediawiki07.deployment-prep.eqiad.wmflabs
[13:25:59] hashar@bastion-01:~$
[13:26:01] ah
[13:27:35] which is confusing since hosts in the old region did not have a dash between the name and the number :D
[13:27:59] but yeah working now thanks
[13:28:38] no
[13:28:45] they had a dash
[13:28:48] hhvm.service: Ignoring invalid environment assignment 'RUN_AS_GROUP=www-data
[13:28:48] :D
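[editor's note, not part of the log] The empty `dig` answer above came from querying the name without the hyphen. A quick way to see which spelling of a hostname actually resolves is to try each candidate and record the result. This is a hedged Python sketch; the function name and the injectable `resolve` parameter are assumptions made for illustration, and the real lookup depends on the resolver of the host it runs on.

```python
import socket


def resolvable(names, resolve=socket.gethostbyname):
    """Map each candidate hostname to its address, or None if lookup fails."""
    result = {}
    for name in names:
        try:
            result[name] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            result[name] = None
    return result
```

Passing both `deployment-mediawiki07.deployment-prep.eqiad.wmflabs` and `deployment-mediawiki-07.deployment-prep.eqiad.wmflabs` from a host inside the project would show at a glance which of the two exists in DNS.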
[13:28:50] the dash has been there for quite a while
[13:29:06] I think it dates back to when people were moving the appserver hosts to stretch
[13:29:09] modules/trafficserver/manifests/init.pp:# 'replacement' => 'http://deployment-mediawiki05.deployment-prep.eqiad.wmflabs/' }, ],
[13:29:09] :)
[13:29:11] ah yeah
[13:29:20] I haven't followed beta cluster for a year at least
[13:29:23] I am outdated :)
[13:29:51] Krenair, puppet is working on the deployment-elastic* servers now, so I'm going to move them shortly.
[13:30:23] cool
[13:30:29] andrewbogott, I also noted maps05 as having OK puppet
[13:30:38] andrewbogott, see the etherpad, I've been updating it
[13:30:40] Trying to find someone who will admit to owning/caring about those redis boxes.
[13:30:42] great!
[13:32:37] andrewbogott, what's the status of zotero01?
[13:32:52] that's our remaining trusty host and it doesn't seem to have come back up
[13:32:57] Beta-Cluster-Infrastructure: hhvm systemd service on deployment-prep reports: hhvm.service: Ignoring invalid environment assignment 'RUN_AS_GROUP=www-data - https://phabricator.wikimedia.org/T209946 (hashar)
[13:33:04] it still appears to exist in the old region
[13:33:17] destination host unreachable
[13:33:35] I pinged about that in -cloud
[13:33:46] it uses a lost base image (which is almost certainly my fault)
[13:33:59] I can probably hack up a fix but want confirmation that it's actually useful first.
[13:34:07] Since it's Trusty I assume it's due for a rebuild anyway?
[13:34:39] well
[13:34:50] see https://phabricator.wikimedia.org/T204500 and subtasks
[13:35:35] Project beta-scap-eqiad build #228246: STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228246/
[13:36:33] Beta-Cluster-Infrastructure, HHVM, Patch-For-Review: hhvm systemd service on deployment-prep reports: hhvm.service: Ignoring invalid environment assignment 'RUN_AS_GROUP=www-data - https://phabricator.wikimedia.org/T209946 (hashar) Due to a typo that has been there since we added support for systemd/...
[13:37:11] this thing is also our oldest host
[13:37:26] march 2015
[13:37:41] dunno why it's lasted this long
[13:39:38] instances should be replaced more often than that
[13:40:45] I'm trying to revive it with an old image, will see how it goes.
[13:41:31] well, I guess, a much newer image, but still an old one :)
[13:41:43] just another trusty image right?
[13:42:01] yeah
[13:42:16] So far the word on redis in _security is 'iirc changeprop in deployment-prep is not using redis at all'
[13:42:41] but I'm still not clear on if that means those VMs can be deleted. Filippo made them but didn't have much to do with them after that.
[13:43:44] alright well they would've been running the MW job queue
[13:43:50] but apparently that got obsoleted
[13:43:53] ?
[13:46:33] I'm pretty sure that's right
[13:47:47] Beta-Cluster-Infrastructure, Cloud-VPS (Ubuntu Trusty Deprecation): cloudvps: deployment-prep project trusty deprecation - https://phabricator.wikimedia.org/T204500 (Mvolz) >>! In T204500#4761814, @aborrero wrote: > The only Trusty instance is in SHUTOFF state. Deadline is approaching (2018-12-18). Shall...
[13:51:53] Krenair: is deployment-cache-text04 totally replaced now?
[13:53:19] Project beta-scap-eqiad build #228247: STILL FAILING in 16 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228247/
[13:54:24] Krenair: Giuseppe says about redis "we're not using the old jobqueue, we should remove those vms"
[13:55:30] andrewbogott, not yet
[13:55:34] been a bit busy
[13:55:51] Krenair: so I see :)
[13:56:06] andrewbogott, okay, sounds like we can delete the old redis VMs then
[13:56:24] yep, done
[13:56:52] andrewbogott, so you wanna move the elastic nodes to group 5 and get started on that?
[13:57:16] elastic05 and elastic06 are already on the move
[13:57:28] can you keep the etherpad up to date please?
[13:57:46] yes, sorry
[13:58:42] did you find the zotero01 image?
[14:03:18] Krenair: I'm trying to copy it over a more modern Trusty base. I've done that once before and it worked OK. We'll know in 5 or so minutes.
[14:03:34] ok
[14:03:36] do you know why the SSH host keys got overwritten on a lot of instances?
[14:06:35] I don't! I've seen that in a few places but can't think of a good reason why it would happen.
[14:06:47] Project beta-scap-eqiad build #228248: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228248/
[14:07:05] Was there any pattern to where you saw it? e.g. only on hosts with public IPs?
[14:07:36] no we don't have many hosts with floating IPs but we had a lot of hosts with broken SSH host keys
[14:07:45] hm
[14:07:47] we have maybe like 4 hosts with floating IPs I think?
[14:07:52] yeah
[14:08:01] deployment-db04 has different host keys, as has memc05, imagescaler01, sca01, mcs01, ircd, poolcounter04, aqs01, restbase02, restbase01, pdfrender02, mathoid, aqs02, parsoid09, fluorine02, changeprop, sca04, aqs03, sentry01, memc04, conf03, cache-upload04, prometheus01
[14:08:01] all have new host keys for some reason
[14:08:01] the rest seem to have kept the correct ones
[14:10:07] I don't see any pattern in there but it's a *lot* of instances
[14:13:52] Beta-Cluster-Infrastructure, Operations, HHVM, Patch-For-Review: hhvm systemd service on deployment-prep reports: hhvm.service: Ignoring invalid environment assignment 'RUN_AS_GROUP=www-data - https://phabricator.wikimedia.org/T209946 (hashar) + #operations since that affects production as well (...
[14:18:58] Project beta-scap-eqiad build #228249: STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228249/
[14:20:02] Project beta-update-databases-eqiad build #29901: STILL FAILING in 2.1 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29901/
[14:22:12] Krenair: ready for me to move deployment-elastic07? 06 is done.
[14:22:18] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Operations, HHVM, Patch-For-Review: hhvm systemd service on deployment-prep reports: hhvm.service: Ignoring invalid environment assignment 'RUN_AS_GROUP=www-data - https://phabricator.wikimedia.org/T209946 (hashar) a:hashar
[14:22:30] (CR) Thiemo Kreuz (WMDE): "The code looks pretty awesome! Often the code that does the cleanups in these sniffs is pretty hard to read. Not this one. Good job!" (5 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[14:22:50] andrewbogott, are you sure they're done?
[14:23:15] 06 is. 05 still in progress.
[14:23:42] well search doesn't work so
[14:23:57] just do it and we'll clean up later
[14:24:35] ok
[14:25:05] bah I got distracted
[14:26:40] zotero remains a problem. I'm going to have some breakfast and then try again
[14:27:05] !log deployment-mediawiki-07 : sudo rm -fR /srv/mediawiki/.~tmp~/
[14:27:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:30:15] (PS2) MSantos: Allowing JsonConfig to be tested with Kartographer [integration/config] - https://gerrit.wikimedia.org/r/464827 (https://phabricator.wikimedia.org/T163274)
[14:31:42] Yippee, build fixed!
[14:31:43] Project beta-scap-eqiad build #228250: FIXED in 11 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228250/
[14:31:47] (CR) MSantos: "@Hashar what do you think about my previous comment?" [integration/config] - https://gerrit.wikimedia.org/r/464827 (https://phabricator.wikimedia.org/T163274) (owner: MSantos)
[14:34:14] !log beta-scap job fixed by deleting a file on deployment-mediawiki-07
[14:34:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:41:02] Project-Admins, User-Jayprakash12345: Create a new component project: Indic-TechCom - https://phabricator.wikimedia.org/T209952 (Jayprakash12345)
[14:49:57] uh andrewbogott
[14:50:06] root@deployment-cache-text05:/etc/acme# ping 172.16.4.21
[14:50:07] PING 172.16.4.21 (172.16.4.21) 56(84) bytes of data.
[14:50:07] 64 bytes from 172.16.4.21: icmp_seq=1 ttl=64 time=0.024 ms
[14:50:09] no reverse DNS?
[14:50:32] (PS7) Mainframe98: Forbid usage of is_null() [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[14:50:36] this IP is deployment-cache-text05.deployment-prep.eqiad.wmflabs
[14:50:58] (CR) Mainframe98: Forbid usage of is_null() (4 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[14:51:34] 172.16.4.95 works - I assume the cron just hasn't fired yet
[14:51:47] oh, wait, that should be set up by sink shouldn't it
[14:52:18] https://www.irccloud.com/pastebin/7MQqMHst/
[14:52:47] works for me
[14:54:53] weird
[14:55:03] for some reason varnish on cache-text05 is not listening on the ports nginx expects
[14:55:14] (3120 through 3127)
[14:56:06] must be missing something. will be back later
[14:59:24] !log created integration-castor03.integration.eqiad.wmflabs intended as a replacement for castor02 | T208803
[14:59:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:59:28] T208803: Migrate the Integration cloud project to eqiad1-r - https://phabricator.wikimedia.org/T208803
[15:01:36] (CR) Thiemo Kreuz (WMDE): Forbid usage of is_null() (2 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[15:06:20] Release-Engineering-Team (Kanban), Cloud-Services, Epic, Patch-For-Review: Migrate the Integration cloud project to eqiad1-r - https://phabricator.wikimedia.org/T208803 (hashar) I will prepare the migration to integration-castor03 later tonight and attempt to do the migration tomorrow or on Thurs...
[15:12:25] (PS8) Mainframe98: Forbid usage of is_null() [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[15:12:27] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'https://en.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 2482 bytes in 0.114 second response time
[15:14:27] (PS9) Mainframe98: Forbid usage of is_null() [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[15:14:45] (CR) Mainframe98: Forbid usage of is_null() (2 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[15:15:00] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'https://en.m.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 2483 bytes in 0.159 second response time
[15:18:23] PHP fatal error:
[15:18:23] Stack overflow
[15:18:25] great.....
[15:18:36] awe
[15:18:50] where do you see that?
[15:19:20] at https://en.wikipedia.beta.wmflabs.org/
[15:19:26] can't dig into it right now
[15:19:32] weird it was just working a minute ago
[15:19:36] I can look into it
[15:20:06] Project beta-update-databases-eqiad build #29902: STILL FAILING in 5.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29902/
[15:25:31] Release-Engineering-Team (Kanban), User-greg: Summarize Developer Satisfaction survey data - https://phabricator.wikimedia.org/T209914 (greg)
[15:26:07] Release-Engineering-Team (Kanban): Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[15:32:01] Krenair: Everything is moved now save deployment-cache-text04 and zotero (which I'm still messing with)
[15:32:11] ok
[15:32:16] I'll try to fix cache-text later
[15:32:24] Release-Engineering-Team (Backlog), User-greg: Expand Onboarding page - https://phabricator.wikimedia.org/T129050 (greg) Open>Resolved a:greg https://www.mediawiki.org/w/index.php?title=Wikimedia_Release_Engineering_Team%2FOnboarding&type=revision&diff=2974185&oldid=2825738
[15:51:31] Krenair, twentyafterfour, I'm getting nowhere with moving deployment-zotero01. It's not immediately in my way so I think the right solution is to just leave it in eqiad until akosiaris replaces it.
[15:51:48] It might need some security group changes in the meantime
[15:58:07] ok
[16:13:16] (CR) Michael Große: [C: 1] "Haven't tried it, but from reading through this, it seems clear enough." [integration/quibble] - https://gerrit.wikimedia.org/r/452335 (https://phabricator.wikimedia.org/T200991) (owner: Pablo Grass (WMDE))
[16:13:46] Release-Engineering-Team, Operations, Release Pipeline: Design pipeline image versioning scheme - https://phabricator.wikimedia.org/T209088 (akosiaris) >>! In T209088#4759458, @thcipriani wrote: >>>! In T209088#4745829, @akosiaris wrote: >> I think we should support multiple tags per image (docker an...
[16:20:02] Project beta-update-databases-eqiad build #29903: STILL FAILING in 1.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29903/
[16:23:01] andrewbogott: I'm trying to make sense of the php errors on deployment-mediawiki-* ...
[16:23:25] one seems to be running php 7.0 while the other is php 7.2 and neither are fully/properly installed...
[16:23:55] andrewbogott: I'd say leave zotero alone indeed
[16:25:40] twentyafterfour: I'm done breaking things for now :) let me know if you wind up with patches you'd like me to merge.
[16:26:04] ok
[16:26:07] thanks andrewbogott
[16:56:36] MediaWiki-Releasing, Release-Engineering-Team (Kanban), MW-1.32-release, Patch-For-Review: Release MW 1.32 RC0 - https://phabricator.wikimedia.org/T209105 (RobH)
[17:09:21] Logging into Beta Cluster seems to be broken
[17:09:31] https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:UserLogin&returnto=Main+Page
[17:09:57] looks like y'all are aware :)
[17:10:36] Please ping me when it's fixed.
[17:18:07] twentyafterfour: well, maybe y'all don't know. Sounds like some things were broken, but maybe unrelated to the current outage.
[17:18:24] Basically all special pages are broken on beta
[17:20:02] Project beta-update-databases-eqiad build #29904: STILL FAILING in 1.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29904/
[17:31:49] kaldari, when maintenance is over everyone should get a wikitech-l mail
[17:31:58] thanks
[17:32:17] all pages are broken on beta right now and twentyafterfour was looking into why
[17:32:43] it's not just special pages unless for some reason you're logged out
[17:37:07] Release-Engineering-Team (Kanban), Patch-For-Review, Release Pipeline (Blubber): Retain YAML support by converting to JSON in Blubber - https://phabricator.wikimedia.org/T207696 (dduvall)
[17:38:07] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Refactor validation system to use jsonschema - https://phabricator.wikimedia.org/T207695 (dduvall)
[17:38:10] Release-Engineering-Team (Kanban), Patch-For-Review, Release Pipeline (Blubber): Adopt JSON as blubber's internal configuration format - https://phabricator.wikimedia.org/T207694 (dduvall)
[17:39:39] (CR) Legoktm: [V: 2 C: 2] Give access to l10n-bot on wikimedia-cz/tracker [wikimedia-cz/tracker] (refs/meta/config) - https://gerrit.wikimedia.org/r/473078 (https://phabricator.wikimedia.org/T209308) (owner: Urbanecm)
[17:40:08] Gerrit, Repository-Ownership-Requests, WMCZ-Tracker, Patch-For-Review, User-Urbanecm: Give access to l10n-bot on wikimedia-cz/tracker - https://phabricator.wikimedia.org/T209308 (Legoktm) Open>Resolved
[18:18:50] !log deployment-mcs01 deployed [mobileapps/deploy@7553087]
[18:18:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:20:07] Project beta-update-databases-eqiad build #29905: STILL FAILING in 7.1 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29905/
[18:39:52] Project-Admins, User-Jayprakash12345, User-MarcoAurelio: Create a new component project: Indic-TechCom - https://phabricator.wikimedia.org/T209952 (MarcoAurelio) a:MarcoAurelio
[18:44:20] Project-Admins, User-Jayprakash12345, User-MarcoAurelio: Create a new component project: Indic-TechCom - https://phabricator.wikimedia.org/T209952 (MarcoAurelio) Open>Resolved Done: #indic-techcom created. Description added and initial member set as @Jayprakash12345. You can add more members...
[18:49:08] Release-Engineering-Team (Kanban): Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[18:49:39] Project-Admins, User-Jayprakash12345, User-MarcoAurelio: Create a new component project: Indic-TechCom - https://phabricator.wikimedia.org/T209952 (Jayprakash12345) @MarcoAurelio Thanks, I read the comment carefully. And will not rename the Project :)
[18:55:23] Krenair: did you have a plan for deployment-cache-text04 other than holding off until last?
[18:57:27] also does anyone know which version of php we are supposed to be running in beta? We've got half broken versions of 7.0 and 7.2 depending on which mediawiki node you check
[19:00:37] twentyafterfour: I think it should be 7.2 for the appservers
[19:00:40] Project-Admins, User-Jayprakash12345, User-MarcoAurelio: Create a new component project: Indic-TechCom - https://phabricator.wikimedia.org/T209952 (Aklapper) So I'm a little bit wondering about the workflow: I guess users can bring up issues at https://meta.wikimedia.org/wiki/Indic-TechCom/Requests...
[19:01:10] basing that off of the version in hieradata/role/common/mediawiki/appserver.yaml
[19:02:01] which, last I checked, may not be in the hieradata hierarchy in beta? which, if that's the case, might explain some weirdness.
[19:11:23] hey releng, hope all is good :) I have a patch I want to rerun CI on. Is dropping and re-adding jenkins enough? https://gerrit.wikimedia.org/r/#/c/wikimedia/fundraising/crm/+/474951/-1..2
[19:14:28] jgleeson: just comment with "recheck" (no quotes)
[19:18:20] greg-g, worked a treat! thanks
[19:18:30] np!
[19:19:30] the feature is documented but in such a way as to hide it almost completely: https://www.mediawiki.org/wiki/Continuous_integration/Whitelist
[19:20:02] Project beta-update-databases-eqiad build #29906: STILL FAILING in 1.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29906/
[19:28:31] Project-Admins, User-Jayprakash12345, User-MarcoAurelio: Create a new component project: Indic-TechCom - https://phabricator.wikimedia.org/T209952 (Jayprakash12345) https://meta.wikimedia.org/wiki/Indic-TechCom/Requests is different things, It is using for user scripts and gadget mostly. We will us...
[19:35:33] ErrorException from line 0 of : PHP Notice: fwrite(): send of 41 bytes failed with errno=32 Broken pipe
[19:35:47] wth is that supposed to mean?
[19:40:58] twentyafterfour, was hoping to get the mediawiki problems sorted before sorting out deployment-cache-text05
[19:41:17] then we should be able to just swap the DNS entry to point at text05 instead of text04 and avoid the problem of migrating the cache-text box
[19:43:20] Krenair: ok cool
[19:43:40] mediawiki's problem seems like it might be database related
[19:43:50] both db nodes seem to think they are slaves
[19:43:57] huh
[19:44:14] that or I'm reading the slave status / master status incorrectly
[19:44:31] db04 says Access denied for user 'repl'@'172.16.5.5' (using password: YES)
[19:44:34] Gerrit 3.0 is being released in spring 2019 :)
[19:44:44] db03 says Fatal error: Invalid (empty) username when attempting to connect to the master server. Connection attempt terminated.
[19:45:19] hang on we can handle the access denied thing
[19:45:32] oh no wait
[19:45:37] | repl | % |
[19:45:48] never mind
[19:45:55] where is it specified which one is supposed to be the master? is that in puppet somewhere? Conftool?
[19:46:11] this level of puppetisation around database servers? :))
[19:46:27] I think it's set up at runtime
[19:46:36] just by people logging in as root and running commands
[19:46:54] that's what I thought but I was just being sure I hadn't missed something ;)
[19:47:22] surely sre has something a bit more formal set up for prod though I hope...
[19:48:48] Project-Admins, User-Jayprakash12345, User-MarcoAurelio: Create a new component project: Indic-TechCom - https://phabricator.wikimedia.org/T209952 (Jayprakash12345) >you'd copy them to the Requests wiki page? I will never copy them to the Requests wiki page. We will guide the user to request about...
[19:54:22] twentyafterfour, so db04 seems to know its supposed to replicate from db03
[19:54:35] but is stuck connecting?
[19:54:58] yeah but does db03 know it's master?
[19:55:20] yes
[19:55:33] show master status shows deployment-db03-bin.000103
[19:56:18] and show master status on db04 shows something similar:
[19:56:27] deployment-db04-bin.000126 | 626 |
[19:56:44] yeah but let's not worry about that for a moment
[19:56:48] why is db04 unable to connect to db03
[19:56:48] ok
[19:57:18] well it could be name resolution (cached ip ) or firewall rules?
[19:57:36] actually it's access denied
[19:58:09] can we find out what credentials its trying?
[20:00:12] I think I found that in /srv/sqldata/master.info
[20:00:36] but it doesn't work
[20:01:17] (CR) Umherirrender: Forbid usage of is_null() (4 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[20:01:26] Krenair: master.info yeah
[20:01:42] so we need to grant it on the master?
[20:03:49] could try changing the repl password and updating the slave with the new one I guess
[20:04:05] where did you find that access denied error?
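Editor's sketch: the "name resolution or firewall rules?" guess in the exchange above could be checked from the replica before looking at credentials. The Python below is a minimal illustration only. `check_tcp` and its three return labels are hypothetical names, not part of any tool used in this channel.

```python
import socket


def check_tcp(host, port, timeout=3.0):
    """Return 'ok', 'dns-failure', or 'unreachable' for a TCP endpoint.

    Hypothetical helper: it separates a name-resolution failure from a
    blocked or closed port. It says nothing about MySQL credentials.
    """
    try:
        candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved at all.
        return "dns-failure"
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            # Refused, filtered, or timed out; try the next address.
            continue
    return "unreachable"
```

Usage, with the host and port taken from the error quoted in the log: `check_tcp("deployment-db03.eqiad.wmflabs", 3306)`. An "ok" result alongside a persisting "Access denied" error would point at credentials rather than DNS or security groups, which is where the discussion ended up.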
[20:04:33] it's in the slave status on db04
[20:04:42] oh hang on
[20:04:43] show slave status\G
[20:04:54] last io error
[20:05:03] right but how long ago was that?
[20:05:10] Release-Engineering-Team (Kanban): Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[20:05:15] I don't know
[20:05:16] I just looked in the service logs for mysqld and saw replication failures
[20:05:38] probably caused by my attempts to fix stuff last night
[20:05:50] I tried running the create user commands on both hosts but actually should have just run them on the master and let them get replicated
[20:06:02] now the slave is complaining about the create user commands
[20:06:37] master log position is deployment-db03-bin.000102 : 891687561
[20:07:28] and on the master it's already on log 000103 position 336
[20:07:44] I've deleted the users I made on db04
[20:07:44] so the slave is behind, not sure how far behind
[20:07:48] restarted mysql a couple times
[20:08:07] and now...still problems. bah.
[20:08:24] slave is "Connecting"
[20:09:00] right okay
[20:09:05] we're back to Nov 20 20:07:18 deployment-db04 mysqld: 181120 20:07:18 [ERROR] Slave I/O: error connecting to master 'repl@deployment-db03.eqiad.wmflabs:3306' - retry-time: 60 maximum-retries: 86400 message: Access denied for user 'repl'@'172.16.5.5' (using password: YES), Internal MariaDB error code: 1045
[20:09:51] on master: show slave hosts; empty set
[20:10:38] hmm "Slave has read all relay log; waiting for the slave I/O thread to update it"
[20:12:00] tcp 0 0 172.16.5.5:59320 172.16.5.23:3306 TIME_WAIT
[20:12:03] (PS10) Mainframe98: Forbid usage of is_null() [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[20:12:21] that looks like the tcp connection from slave to master
[20:12:28] yes
[20:12:42] I think we just need to change the password
[20:13:03] (CR) Mainframe98: Forbid usage of is_null() (3 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[20:15:59] twentyafterfour, okay
[20:16:03] I think I may have done it
[20:16:09] root@BETA[mysql]> show slave status \G
[20:16:09] *************************** 1. row ***************************
[20:16:09] Slave_IO_State: Queueing master event to the relay log
[20:16:49] twentyafterfour, can you confirm replication looks okay to you?
[20:19:49] looking
[20:20:02] Project beta-update-databases-eqiad build #29907: STILL FAILING in 2.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29907/
[20:20:48] Krenair: yeah it looks good I think
[20:21:18] why is MediaWiki still broken? :/
[20:21:24] grr I don't know
[20:21:27] I'll store the repl password
[20:21:28] I see a ton of ErrorException from line 0 of : PHP Notice: fwrite(): send of 41 bytes failed with errno=32 Broken pipe
[20:21:40] maybe in a labs/private cherry-pick?
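Editor's sketch: the "does replication look okay?" check and the "slave is behind, not sure how far behind" comparison from the exchange above can be written down as a small script. The Python below is an illustration under assumptions, not what anyone in the channel ran. The helper names are hypothetical, and only the first two lines of `sample` come from the log; the remaining field values are assumed for the example.

```python
def parse_vertical(text):
    """Parse `SHOW SLAVE STATUS\\G` style output into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        # Skip the "***** 1. row *****" separator lines.
        if line.lstrip().startswith("*"):
            continue
        # Split on the first colon only, so values containing ':' survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replication_problems(fields):
    """List reasons the replica looks unhealthy; an empty list means none found."""
    problems = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if fields.get(thread) != "Yes":
            problems.append(f"{thread} is {fields.get(thread, 'missing')}")
    for err in ("Last_IO_Error", "Last_SQL_Error"):
        if fields.get(err):
            problems.append(f"{err}: {fields[err]}")
    return problems


def binlog_coords(log_file, position):
    """Turn ('deployment-db03-bin.000102', 891687561) into a sortable tuple."""
    return (int(log_file.rsplit(".", 1)[1]), int(position))


# First two lines are from the log; the other fields are assumed values.
sample = """*************************** 1. row ***************************
               Slave_IO_State: Queueing master event to the relay log
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                Last_IO_Error:
"""

if __name__ == "__main__":
    print(replication_problems(parse_vertical(sample)))
    # Replica coordinates vs. master coordinates quoted in the channel.
    print(binlog_coords("deployment-db03-bin.000102", 891687561)
          < binlog_coords("deployment-db03-bin.000103", 336))
```

Comparing the tuples only says which side is ahead. How many events separate the two positions cannot be read off the file numbers and offsets alone, which matches the "not sure how far behind" remark.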
[20:21:58] which seems to be related to using a pipe after fork (according to a superficial googling of that error)
[20:22:06] Krenair: yeah that seems fine
[20:23:29] !log Changed deployment-prep mysql repl password in attempt to get replication working again, have stored it at deployment-puppetmaster03:/var/lib/git/labs/private/modules/secret/secrets/mysql/repl_password
[20:23:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:24:35] and with that I really need to do other things
[20:25:04] Krenair: ok thanks for helping out! I'll keep poking mediawiki
[20:45:02] ok narrowed the php error down to redis
[20:54:12] Continuous-Integration-Infrastructure, Release-Engineering-Team (Next), Wikimedia-Logstash, Jenkins: Send Jenkins daemon logs to logstash - https://phabricator.wikimedia.org/T143733 (herron)
[20:55:22] why don't we have any deployment-redis servers? are those no longer relevant?
[21:05:00] Release-Engineering-Team (Kanban): Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[21:11:07] greg-g hi, i wonder should liw and longma be voiced?
[21:12:16] twentyafterfour, those were made obsolete and deleted earlier today
[21:12:32] twentyafterfour, what IPs is MediaWiki trying to hit?
[21:12:58] there was a bit of discussion in here and other channels about getting rid of those
[21:20:03] Project beta-update-databases-eqiad build #29908: STILL FAILING in 2.2 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29908/
[21:27:51] paladox: yeah, getting to it, low priority :)
[21:28:18] Krenair: it's going into a deep recursive loop starting at #531 /srv/mediawiki/php-master/includes/libs/redis/RedisConnectionPool.php(249): Redis->auth(string)
[21:28:37] ah ok :)
[21:28:37] and the only config related to redis that I can find mentions those redis04 and redis05 nodes
[21:28:41] interesting
[21:28:57] it's possible someone declared them obsolete
before they had config references removed :)
[21:29:04] will look later
[21:29:22] indeed ... so if I remove the config then it'll stop trying to use redis for session storage?
[21:29:24] hmm
[21:31:34] it might use memc?
[21:31:43] it might use $magical_new_service ?
[21:33:40] I'm not at all sure how any of this works. I'm still very far from being a mediawiki expert
[21:33:54] or even an amateur
[21:35:12] twentyafterfour: yet you manage to do fantastic work
[21:36:37] Hauskatze: I'm much more comfortable in scap and phabricator codebases these days
[21:37:02] twentyafterfour: I can understand that.
[21:59:37] twentyafterfour: you still around?
[22:15:15] maintenance-disconnect-full-disks build 22003 integration-slave-jessie-1003 (/: 95%): OFFLINE due to disk space
[22:25:16] maintenance-disconnect-full-disks build 22005 integration-slave-jessie-1003: OFFLINE due to disk space
[22:26:12] Project beta-update-databases-eqiad build #29909: STILL FAILING in 2.4 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29909/
[22:29:43] * thcipriani checks out 1003
[22:30:50] Hauskatze: sorry I went afk for a bit
[22:31:13] twentyafterfour: don't worry, I managed to get the attention of dereckson in -operations :)
[22:31:20] I am going to bed
[22:39:36] where does https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep come from? Horizon?
[22:41:50] maybe openstackmanager?
[22:42:02] ^
[22:42:30] !log repooling integration-slave-jessie-1003 after cleaning mvn and gradle cache
[22:42:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:44:43] hey guys, do you know why scap still tries to use deployment-tin by any chance?
[22:44:56] i have puppet running: scap deploy-local
[22:45:09] and then scap does: Fetch from: http://deployment-tin.deployment-prep.eqiad.wmflabs...
[22:45:17] but that is gone
[22:45:35] and i don't see where scap gets that default from.. i grepped scap and puppet repo and Hiera and so on
[22:45:51] it grabs the deployment server from the last time it deployed
[22:46:06] so it's probably in /srv/deployment/[repo]/.git/DEPLOY_HEAD somewhere
[22:46:09] Release-Engineering-Team (Kanban): Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[22:46:12] but i just created this instance and ran puppet the first time
[22:46:20] it never deployed
[22:46:30] what's the repo?
[22:46:35] i just want to test the puppet role and it happens to use scap
[22:46:50] Execution of '/usr/bin/scap deploy-local --repo scholarships/scholarships -D log_json:False' returned 70: 22:39:45 Fetch from: http://deployment-tin.deployment-prep.eqiad.wmflabs/scholarships/scholarships/.git
[22:47:29] i am trying to test if the puppet role "webserver_misc_apps" works on stretch
[22:47:37] and this is a profile inside that
[22:47:39] mutante: so on deployment-deploy01:/srv/deployment/scholarships/scholarships/.git/DEPLOY_HEAD is where it's finding that info
[22:47:46] mutante / thcipriani - something perhaps stupid but have you checked the private repos (labs/private/etc)?
[22:48:49] if you swap out the value of "git_server" in that file (or run `scap deploy --init` in deployment-deploy01:/srv/deployment/scholarships/scholarships to regenerate that file), that should fix the problem.
[22:50:58] thcipriani: oh.. so ..
that would mean i am already talking to the new deployment server and there it gets the info which server it used last time ..and then it gets the old name and fails to continue ? slightly confused. but let me just try that
[22:52:59] is this another case of an instance outside deployment-prep trying to use deployment-prep infra? :/
[22:54:15] yeah, it is confusing, this comes up when we switch deployment servers (as you might imagine :)). I forget why we implemented that way, but we should fix it :\
[22:54:55] eh.. are you saying i need to create my own deployment server or everything must always be inside deployment-prep if a class uses scap?
[22:55:09] thcipriani: ok, thank you!
[22:56:26] i see how it already created files in /srv/deployment/ ack
[22:57:10] and not using deployment-tin would be needed either way
[23:01:41] Release-Engineering-Team (Kanban): Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[23:03:15] thcipriani: it worked..
after i also deleted the /srv/deployment/scholarships on my instance and did it again
[23:04:12] Release-Engineering-Team (Kanban): Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[23:04:34] mutante: hrm, well, glad it's working now at least
[23:04:59] thanks
[23:05:04] and Krenair, i will delete it again soon
[23:05:13] i just want to fix the role
[23:05:51] hey, what's up with beta cluster? is it down because of maintenance?
I just deployed beta cluster config change
[23:05:54] Release-Engineering-Team (Kanban): Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[23:06:02] and I'm not sure if this is me who broke it, or is it down for maintenance
[23:06:16] it may be down due to the maint that is ongoing
[23:06:18] Release-Engineering-Team (Kanban), LDAP-Access-Requests: LDAP requests for Jeena Huneidi - https://phabricator.wikimedia.org/T210028 (greg) p:Triage>Normal
[23:06:26] it's moving to a new network in the cloud (labs)
[23:07:52] ok, thx paladox for info
[23:08:51] Release-Engineering-Team (Kanban): Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[23:11:09] greg-g - can I get access to the deployment-prep please?
[23:11:22] :)
[23:11:54] raynor: deployment-prep is still broken and I'm still trying to figure out the cause so it might be a while before it's back to normal
[23:12:19] thanks, btw, twentyafterfour :/
[23:12:24] greg-g: np
[23:12:37] ah, it can wait, like 30 mins ago I deployed a config change for beta cluster (https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/473835/) but of course I didn't check that beta works correctly
[23:12:50] I need a consultation with an expert on mediawiki session storage / object cache
[23:12:54] "no one said it would be easy" is a song lyric and movie title from a band I love. Boy were they right.
[23:13:32] after I deployed that I saw that beta is down, and now you know...
my pants are on fire
[23:18:22] (CR) Urbanecm: [V: 2 C: 2] wikimedia-cz/tracker rights should inherit from wikimedia-cz [wikimedia-cz/tracker] (refs/meta/config) - https://gerrit.wikimedia.org/r/474649 (owner: Urbanecm)
[23:20:07] Project beta-update-databases-eqiad build #29910: STILL FAILING in 7.3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/29910/
[23:23:23] Beta-Cluster-Infrastructure, Release-Engineering-Team: RedisBagOStuff is broken on beta - https://phabricator.wikimedia.org/T210030 (mmodell) p:Triage>Normal
[23:23:44] * twentyafterfour is going to consult wikitech-l
[23:27:55] twentyafterfour maybe this https://github.com/php-amqplib/php-amqplib/issues/424 will help?
[23:28:01] Also can it reach the redis servers?
[23:41:39] Release-Engineering-Team (Kanban): Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[23:45:11] paladox: the redis servers went away
[23:45:18] oh
[23:45:21] so I think what we want is to remove them from the config
[23:45:46] because I tried adding them back and that didn't work - plus they were removed because the consensus was that they shouldn't be needed
[23:45:57] twentyafterfour https://github.com/wikimedia/puppet/blob/848070990ce601484a456b07deda10b5994a9ab1/hieradata/labs/deployment-prep/common.yaml#L320
[23:46:04] did the same thing as earlier for repo "iegreview" , replaced deployment-tin
[23:48:45] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Patch-For-Review: contint1001 build cleanup - https://phabricator.wikimedia.org/T209123 (greg) >>! In T209123#4739288, @hashar wrote: > I think I got the low hanging fruits solved: > > * Quibble jobs retain the log for 7 days...
[23:49:13] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): contint1001 build cleanup - https://phabricator.wikimedia.org/T209123 (greg)
[23:49:37] Release-Engineering-Team (Backlog), Discovery-Search (Current work): Use maven wrapper (mvnw) to build maven based project from search platform team - https://phabricator.wikimedia.org/T208938 (greg)
[23:49:52] MediaWiki-Releasing, Release-Engineering-Team (Next), Patch-For-Review, Security: Install docker on releases-jenkins - https://phabricator.wikimedia.org/T208529 (greg)
[23:50:03] MediaWiki-Releasing, Release-Engineering-Team (Next), Security: Create fab task to deploy jenkins job to either ci-jenkins or releases-jenkins - https://phabricator.wikimedia.org/T208528 (greg)
[23:50:12] MediaWiki-Releasing, Release-Engineering-Team (Next), Patch-For-Review, Security: Create jenkins job to build MediaWiki tarball releases - https://phabricator.wikimedia.org/T208527 (greg)
[23:50:19] Release-Engineering-Team (Backlog), Wikidata: Stop branching & deploying WikibaseQuality extension - https://phabricator.wikimedia.org/T208499 (greg)
[23:54:09] (CR) Thcipriani: [C: 2] Fix grammar in tests/test_coverage.py [integration/config] - https://gerrit.wikimedia.org/r/474881 (owner: Hashar)
[23:55:57] (Merged) jenkins-bot: Fix grammar in tests/test_coverage.py [integration/config] - https://gerrit.wikimedia.org/r/474881 (owner: Hashar)
[23:59:28] Release-Engineering-Team (Watching / External), Scap, Operations: Scap deployers should have the ability to depool and restart HHVM - https://phabricator.wikimedia.org/T208813 (greg)