[02:12:04] PROBLEM - Puppet staleness on deployment-maps05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[05:34:14] Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (Krishna_Chaitanya_Velaga) I been involved in organising various outreach activities for Wikimedia projects, include large projects such as conferences and campaigns. My previous expe...
[09:13:34] zeljkof: looks like the merge of https://gerrit.wikimedia.org/r/c/integration/config/+/474173 failed (cc hashar)
[09:13:54] so we don't have the publication to sonarcloud working yet
[09:14:12] well, we do :)
[09:14:26] the jobs are deployed, this is just the config file used to deploy the jobs
[09:14:50] I'll re-merge it, looks like just one job was aborted for whatever reason, I'll check
[09:15:40] https://integration.wikimedia.org/ci/job/integration-config-tox-docker/5108/console
[09:15:49] Build timed out (after 3 minutes). Marking the build as aborted.
[09:16:23] (CR) Zfilipin: Add sonarcloud publication to maven site publish template [integration/config] - https://gerrit.wikimedia.org/r/474173 (https://phabricator.wikimedia.org/T207046) (owner: Hashar)
[09:16:30] (CR) Zfilipin: [C: 2] Add sonarcloud publication to maven site publish template [integration/config] - https://gerrit.wikimedia.org/r/474173 (https://phabricator.wikimedia.org/T207046) (owner: Hashar)
[09:16:55] it should work now, looks like the failing job times out after three minutes
[09:19:36] zeljkof: strange, I've seen failing post merge
[09:20:18] looks like they were not using the more recent docker image (at least, they were still failing on not being able to write to the cache directory)
[09:20:59] for example: https://integration.wikimedia.org/ci/job/wikidata-query-rdf-maven-java8-docker-site-publish/130/console
[09:21:30] (Merged) jenkins-bot: Add sonarcloud publication to maven site publish template [integration/config] - https://gerrit.wikimedia.org/r/474173 (https://phabricator.wikimedia.org/T207046) (owner: Hashar)
[09:23:04] Oh, that job uses a different docker image for some reason
[09:25:30] I think I'll wait for hashar to be around, my head is already hurting from trying to find my way around JJB
[09:27:05] gehel: maybe I've made a mistake in deploying...
[09:27:17] but I don't think so, I've deployed all jobs together
[09:27:27] so if one works, all should work
[09:27:28] no, looking at the conf, it does specify another docker image, which we have not updated
[09:27:44] (PS1) Hashar: Add GraphQL MediaWiki extension [integration/config] - https://gerrit.wikimedia.org/r/474647
[09:27:49] there is no reason (that I know of) to use a different image just for this project
[09:28:18] gehel: I'll re-deploy it with master, now that the commit is merged
[09:28:51] https://github.com/wikimedia/integration-config/blob/master/jjb/wikidata.yaml#L23-L27
[09:29:15] ^ that seems to specify a different docker image
[09:30:17] (PS1) Urbanecm: wikimedia-cz/tracker rights should inherit from wikimedia-cz [wikimedia-cz/tracker] (refs/meta/config) - https://gerrit.wikimedia.org/r/474649
[09:30:23] but the config seems sufficiently different from the other projects that I have no idea how to reconcile them
[09:30:57] (PS1) Hashar: Add some BlueSpice MediaWiki extensions [integration/config] - https://gerrit.wikimedia.org/r/474650
[09:31:03] good morning
[09:31:05] uh, I have no clue, let's wait for hashar :D
[09:31:14] sorry I have contractors at home to change the heating system :D
[09:31:16] hashar: ah, we're just talking about you
[09:31:18] but I am there at least
[09:31:21] hashar: Hello!
[09:31:40] how is the Sonar thingie going on?
[09:31:51] Oh, that docker image adds phantomjs, but not sure if we actually need it
[09:31:54] (CR) Hashar: [C: 2] Add some BlueSpice MediaWiki extensions [integration/config] - https://gerrit.wikimedia.org/r/474650 (owner: Hashar)
[09:32:01] (CR) Hashar: [C: 2] Add GraphQL MediaWiki extension [integration/config] - https://gerrit.wikimedia.org/r/474647 (owner: Hashar)
[09:32:15] gehel: usually PhantomJS is frowned upon
[09:32:22] it is a legacy thing that should not be used
[09:32:40] that part is way out of my expertise!
[09:32:51] partly because it is a fork of some old Webkit which is no longer used by any browser (opera switched to chromium)
[09:33:07] probably used by wdqs-gui for testing, so indirectly used by the wdqs build
[09:33:10] phantomJS had its use for headless testing, but firefox/chromium do have support for headless mode nowadays
[09:33:29] so for sure, any software still relying on phantomjs should switch to chromium headless (and potentially firefox as well)
[09:34:03] dockerfiles/java8-wikidata-query-rdf/Dockerfile.template:RUN {{ "phantomjs" | apt_install }}
[09:34:15] which is fine, but legacy :)
[09:34:18] yep, that's what I was reading as well
[09:34:41] and I think that container/repo have maven shell out to install nodejs/npm etc
[09:35:19] (Merged) jenkins-bot: Add GraphQL MediaWiki extension [integration/config] - https://gerrit.wikimedia.org/r/474647 (owner: Hashar)
[09:35:21] (Merged) jenkins-bot: Add some BlueSpice MediaWiki extensions [integration/config] - https://gerrit.wikimedia.org/r/474650 (owner: Hashar)
[09:36:23] yep, it does install node / npm and then delegates to whatever node build chain to do the build
[09:36:46] not sure if that would take care of installing phantom itself if needed (and not even sure it is needed)
[09:38:10] seems phantomjs got added to the container, this way mvn/npm or whatever needs it should be able to reuse the locally installed version
[09:38:28] that saves CI from having to download the binary over and over and over :)
[09:47:21] (PS1) Hashar: Fix up BlueSpiceQrCode repo name [integration/config] - https://gerrit.wikimedia.org/r/474655
[09:47:42] (CR) Hashar: [C: 2] "I am tired :/" [integration/config] - https://gerrit.wikimedia.org/r/474655 (owner: Hashar)
[09:48:09] * addshore hands hashar a covfefe
[09:48:29] (PS2) Hashar: Fix up BlueSpice repo names [integration/config] - https://gerrit.wikimedia.org/r/474655
[09:48:45] (CR) Hashar: [C: 2] Fix up BlueSpice repo names [integration/config] - https://gerrit.wikimedia.org/r/474655 (owner: Hashar)
[09:50:48] gehel: the git log for releng/java8-wikidata-query-rdf container explicitly states the project requires npm and phantomjs. That is the reason for the different container
[09:51:13] and apparently it hasn't been updated to catch up with changes made to releng/java8
[09:51:43] mjolnir / xgboost as well
[09:51:52] so maybe we can just rebuild them
[09:52:04] (Merged) jenkins-bot: Fix up BlueSpice repo names [integration/config] - https://gerrit.wikimedia.org/r/474655 (owner: Hashar)
[09:59:10] hashar: yeah, probably
[09:59:38] mjolnir / xgboost don't have the sonar post-build steps, so no need to rebuild those
[10:00:11] mjolnir / xgboost are anemic projects, with a very minimal maven configuration
[10:00:30] we might want to align them at some point, but there isn't much value there
[10:02:12] ok :)
[10:02:23] note I have absolutely no idea what those code sources are for :)
[10:02:48] hashar: any chance you could do the rebuild of the wikidata image? It's probably going to take you as much time to explain it to me as to do it yourself
[10:03:14] I can explain mjolnir / xgboost if you want to know :)
[10:04:34] oh
[10:04:41] well it is "easy"
[10:04:54] the containers are all defined in integration/config.git in the dockerfiles/ directory
[10:05:05] they are built by a tool named docker-pkg, written by Giuseppe
[10:05:21] to rebuild a container one "just" has to add a new entry in the changelog file of the container to be rebuilt
[10:05:37] I can create the patch!
[10:05:47] dch -i -c dockerfiles/java8-wikidata-query-rdf/changelog
[10:06:01] add something meaningful to describe the change like: Rebuild to catch up with parent container changes
[10:06:35] being obsessive one can look at which parent container (releng/java8) was being used and add the list of changes from parent
[10:06:44] + possibly try to figure out which packages got updated
[10:06:53] but it is surely easier to just rebuild and live with it
[10:07:02] then the change gets approved/CR+2
[10:07:38] and we rebuild it on the contint1001.wikimedia.org machine using docker-pkg. I can't remember the process
[10:07:57] but we have a Fabric task to ease it, so one of us with shell access to contint1001 can just: fab deploy_docker
[10:07:59] and hope
[10:07:59] :)
[10:08:47] (PS1) Gehel: rebuild wikidata-query-rdf docker image [integration/config] - https://gerrit.wikimedia.org/r/474660 (https://phabricator.wikimedia.org/T207046)
[10:09:31] hashar: ok, patch created, I'll let you review and fabric it further along
[10:10:18] Oh, and I should probably update the job configuration to use the latest tag
[10:11:13] (PS2) Gehel: rebuild wikidata-query-rdf docker image [integration/config] - https://gerrit.wikimedia.org/r/474660 (https://phabricator.wikimedia.org/T207046)
[10:17:05] (CR) Hashar: [C: 2] rebuild wikidata-query-rdf docker image [integration/config] - https://gerrit.wikimedia.org/r/474660 (https://phabricator.wikimedia.org/T207046) (owner: Gehel)
[10:17:20] gehel: +2ed :)
[10:17:34] thanks!
[10:17:49] any other step needed from me to deploy it?
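The rebuild recipe described above starts with a new changelog entry carrying a bumped version. A minimal sketch of that bump, assuming a Debian-style changelog whose first line reads `name (version) suite; urgency=level` and a purely numeric dotted version. `bump_version`, `bump_changelog`, the sample changelog content, the `wikimedia` suite name, the maintainer and the dates are all hypothetical; in practice `dch -i` does this work.

```python
import re

# Header of a Debian-style changelog entry: "name (version) rest-of-line".
HEADER = re.compile(r"^(?P<name>\S+) \((?P<version>[\d.]+)\) (?P<rest>.+)$")


def bump_version(version):
    """Increment the last numeric component, e.g. '0.1.2' -> '0.1.3'."""
    parts = version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def bump_changelog(changelog, message, maintainer, date):
    """Return the changelog text with a new top entry prepended."""
    first_line = changelog.splitlines()[0]
    match = HEADER.match(first_line)
    if match is None:
        raise ValueError("unrecognised changelog header: %r" % first_line)
    entry = (
        "{name} ({version}) {rest}\n\n"
        "  * {message}\n\n"
        " -- {maintainer}  {date}\n\n"
    ).format(
        name=match.group("name"),
        version=bump_version(match.group("version")),
        rest=match.group("rest"),
        message=message,
        maintainer=maintainer,
        date=date,
    )
    return entry + changelog


# Hypothetical existing changelog; the real file content is not shown in the log.
EXAMPLE = (
    "java8-wikidata-query-rdf (0.1.2) wikimedia; urgency=medium\n\n"
    "  * Previous entry\n\n"
    " -- Someone <someone@example.org>  Mon, 01 Jan 2018 00:00:00 +0000\n"
)

updated = bump_changelog(
    EXAMPLE,
    "Rebuild to catch up with parent container changes",
    "Someone <someone@example.org>",
    "Tue, 20 Nov 2018 10:00:00 +0000",
)
print(updated.splitlines()[0])
```

The sketch only covers the text edit; the review, merge and `fab deploy_docker` steps from the conversation are still needed for the image to be rebuilt.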
[10:19:40] once merged you can try fab deploy_docker
[10:20:48] ok, will try
[10:22:42] (Merged) jenkins-bot: rebuild wikidata-query-rdf docker image [integration/config] - https://gerrit.wikimedia.org/r/474660 (https://phabricator.wikimedia.org/T207046) (owner: Gehel)
[10:23:45] !log Updating docker-pkg files on contint1001 for wikidata-query-rdf image
[10:23:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:24:12] hashar: Permission denied: '/etc/docker-pkg/integration.yaml'
[10:24:43] yeah, I'm not in contint-admins
[10:24:59] hashar / zeljkof: I'll let you do the deploy
[10:36:51] hashar / zeljkof: did you manage to deploy that new image?
[10:37:18] gehel: sorry, I was in a meeting with hashar, so both of us were busy :D
[10:37:35] hashar: are you deploying the image? should I?
[10:37:52] no problem, just wanted to make sure we don't stay with a change merged, but not deployed
[10:37:59] I'm not sure I've ever deployed an image, or container
[10:38:28] zeljkof: according to hashar, it should be a simple `fab deploy_docker`
[10:39:41] ok, what's the patch?
[10:40:05] patch is already merged: https://gerrit.wikimedia.org/r/c/integration/config/+/474660
[10:41:22] ok, looking
[10:42:31] found the docs https://www.mediawiki.org/wiki/Continuous_integration/Docker#Publishing_docker-pkg_images
[10:45:36] ok, so looks like that's something hashar will have to do
[10:45:53] my python installation is constantly fighting me :/
[10:45:58] ahah
[10:45:59] I've managed to install fabric
[10:46:02] pip3 install fabric==1.14.0
[10:46:16] but `fab deploy_docker` fails
[10:46:23] can you run just "fab" ?
[10:46:27] should show a help message
[10:46:37] https://www.irccloud.com/pastebin/m9uYlFrJ/
[10:46:50] fun
[10:46:52] yeah, the same error message
[10:46:54] :P
[10:47:05] maybe it is not compatible with python3.7 :/
[10:47:13] I'll go stretch for 5 minutes, to get the python tension out of me ;P
[10:47:18] try: pip3 uninstall fabric ; pip install fabric==1.14.0
[10:47:23] should get you the python2 version
[10:47:29] ok, will try
[10:47:40] https://stackoverflow.com/questions/29306752/fabric-import-error-cannot-import-name-ismappingtype
[10:47:49] fabric 1.x doesn't support Python 3:
[10:47:52] Fabric is a Python (2.5-2.7) library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
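The failure discussed above comes from running Fabric 1.x under Python 3. A small guard along these lines could make that explicit before a fabfile is invoked. This is only a sketch: `fabric1_supported` is a hypothetical helper, and the 2.5 to 2.7 range is taken from the Fabric description quoted in the conversation.

```python
import sys


def fabric1_supported(version_info=None):
    """Return True if the interpreter is in Fabric 1.x's documented 2.5-2.7 range."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return major == 2 and 5 <= minor <= 7


if __name__ == "__main__":
    if fabric1_supported():
        print("Interpreter looks compatible with Fabric 1.x")
    else:
        print(
            "Python %d.%d is outside Fabric 1.x's supported range; "
            "use a Python 2.7 environment instead" % sys.version_info[:2]
        )
```

Passing the version tuple as an argument keeps the check easy to exercise without switching interpreters.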
[10:47:53] so yeah
[10:47:56] python2 via pip
[10:47:57] :)
[10:49:02] aaaaand looks like my python 2 does not even have pip
[10:49:06] I hate pyhon
[10:49:09] python
[10:53:03] Phabricator: Create subtask should not open new task with deadline form - https://phabricator.wikimedia.org/T209799 (Aklapper) If I click "Create Subtask" in this task, form 3 opens for me (basically form 1 without Priority field)...
[10:54:00] (PS5) Tarrow: readme: how to reproduce a CI build [integration/quibble] - https://gerrit.wikimedia.org/r/452335 (https://phabricator.wikimedia.org/T200991) (owner: Pablo Grass (WMDE))
[10:55:17] (CR) jerkins-bot: [V: -1] readme: how to reproduce a CI build [integration/quibble] - https://gerrit.wikimedia.org/r/452335 (https://phabricator.wikimedia.org/T200991) (owner: Pablo Grass (WMDE))
[10:57:45] Phabricator: Create subtask should not open new task with deadline form - https://phabricator.wikimedia.org/T209799 (MarcoAurelio) If I click "Create Subtask" in this task, form 10 opens for me: https://phabricator.wikimedia.org/maniphest/task/edit/form/10/?parent=209799&template=209799&status=open
[11:01:04] zeljkof: you are on a mac aren't you?
[11:01:19] maybe they only provide python3
[11:03:12] zeljkof: I will rebuild the container for gehel
[11:05:56] Homebrew made python3 the default at one point then changed it back to python2
[11:06:28] ...
[11:06:43] docker-registry.discovery.wmnet/releng/java8-wikidata-query-rdf:0.1.3 is building
[11:12:22] hashar: thanks, I'll fight with my python setup later
[11:13:23] !log updating jobs wikidata-query-rdf-maven-java8-docker wikidata-query-rdf-maven-java8-docker-site-publish for https://gerrit.wikimedia.org/r/#/c/integration/config/+/474660/
[11:13:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:13:54] (CR) Hashar: [C: 2] "Jobs updated wikidata-query-rdf-maven-java8-docker and wikidata-query-rdf-maven-java8-docker-site-publish" [integration/config] - https://gerrit.wikimedia.org/r/474660 (https://phabricator.wikimedia.org/T207046) (owner: Gehel)
[11:13:58] gehel: jobs updated :)
[11:14:05] hashar: thanks!
[11:20:34] (PS6) Tarrow: readme: how to reproduce a CI build [integration/quibble] - https://gerrit.wikimedia.org/r/452335 (https://phabricator.wikimedia.org/T200991) (owner: Pablo Grass (WMDE))
[11:33:39] Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog): Zuul Clone frequently fails in php70-phan-docker job - https://phabricator.wikimedia.org/T188958 (hashar) Open>Resolved a:hashar I have looked at the build log and the only match I found is August 20 2018 build https:/...
[11:59:35] (CR) D3r1ck01: [C: 1] "LGTM! Great work @zD/Zd1LqRH ;)" [integration/config] - https://gerrit.wikimedia.org/r/474549 (https://phabricator.wikimedia.org/T200778) (owner: Nils ANDRE)
[12:14:37] (CR) Hashar: [C: 2] "Will be deployed in a few minutes" [integration/config] - https://gerrit.wikimedia.org/r/474549 (https://phabricator.wikimedia.org/T200778) (owner: Nils ANDRE)
[12:18:02] (Merged) jenkins-bot: Add zD/Zd1LqRH to the CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/474549 (https://phabricator.wikimedia.org/T200778) (owner: Nils ANDRE)
[13:10:53] Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (Aklapper) @Krishna_Chaitanya_Velaga : For the start, could you [request a project](https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=Project-Admins) by following...
[13:21:10] !log Created integration-publishing02 172.16.4.5 for WMCS region migration # T208803 [13:21:10] !log Created integration-publishing02 172.16.4.5 for WMCS region migration # T208803 [13:21:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:21:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:21:20] T208803: Migrate the Integration cloud project to eqiad1-r - https://phabricator.wikimedia.org/T208803 [13:21:21] T208803: Migrate the Integration cloud project to eqiad1-r - https://phabricator.wikimedia.org/T208803 [13:21:37] Project beta-scap-eqiad build #228211: 04FAILURE in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228211/ [13:21:37] Project beta-scap-eqiad build #228211: 04FAILURE in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228211/ [13:36:28] Yippee, build fixed! [13:36:28] Yippee, build fixed! [13:36:28] Project beta-scap-eqiad build #228212: 09FIXED in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228212/ [13:36:28] Project beta-scap-eqiad build #228212: 09FIXED in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/228212/ [13:40:02] (03PS1) 10Hashar: Switch to integration-publishing02 [integration/config] - 10https://gerrit.wikimedia.org/r/474689 (https://phabricator.wikimedia.org/T208803) [13:40:02] (03PS1) 10Hashar: Switch to integration-publishing02 [integration/config] - 10https://gerrit.wikimedia.org/r/474689 (https://phabricator.wikimedia.org/T208803) [13:46:22] (03CR) 10Hashar: [C: 032] Switch to integration-publishing02 [integration/config] - 10https://gerrit.wikimedia.org/r/474689 (https://phabricator.wikimedia.org/T208803) (owner: 10Hashar) [13:46:22] (03CR) 10Hashar: [C: 032] Switch to integration-publishing02 [integration/config] - 10https://gerrit.wikimedia.org/r/474689 (https://phabricator.wikimedia.org/T208803) (owner: 10Hashar) [13:47:40] !log Shutdown integration-publishing , 
replaced by integration-publishing02 # T208803 [13:47:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:47:43] T208803: Migrate the Integration cloud project to eqiad1-r - https://phabricator.wikimedia.org/T208803 [13:48:42] (03Merged) 10jenkins-bot: Switch to integration-publishing02 [integration/config] - 10https://gerrit.wikimedia.org/r/474689 (https://phabricator.wikimedia.org/T208803) (owner: 10Hashar) [13:52:28] 10Phabricator: Create subtask should not open new task with deadline form - https://phabricator.wikimedia.org/T209799 (10Dvorapa) This is weird. Also try edit task (form 10 too). I don't know, where deadline is needed (form 10), but we get this form almost on every occasion possible (except new tasks' simple for...
[13:52:58] PROBLEM - Host integration-publishing is DOWN: CRITICAL - Host Unreachable (10.68.23.254) [14:30:53] 10Release-Engineering-Team (Kanban), 10Cloud-Services, 10Epic, 10Patch-For-Review: Migrate the Integration cloud project to eqiad1-r - https://phabricator.wikimedia.org/T208803 (10hashar) I have switched the CI jobs from integration-publishing to integration-publishing02 in the new region. I have shutdown... [15:06:39] !log ores:e957b24 is going beta [15:06:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:12:55] RECOVERY - SSH on deployment-logstash2 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u7 (protocol 2.0) [15:29:36] (03CR) 10Hashar: [C: 032] Add LukBukkit to the CI whitelist (GCI student) [integration/config] - 10https://gerrit.wikimedia.org/r/474453 (https://phabricator.wikimedia.org/T200778) (owner: 10LukBukkit) [15:32:53] 10Continuous-Integration-Infrastructure
(shipyard), 10Wikidata, 10Wikidata Query UI: WDQS GUI build fails on CI - https://phabricator.wikimedia.org/T209776 (10hashar) [15:33:00] (03Merged) 10jenkins-bot: Add LukBukkit to the CI whitelist (GCI student) [integration/config] - 10https://gerrit.wikimedia.org/r/474453 (https://phabricator.wikimedia.org/T200778) (owner: 10LukBukkit) [15:40:42] 10Continuous-Integration-Infrastructure (shipyard), 10Wikidata, 10Wikidata Query UI: WDQS GUI build fails on CI - https://phabricator.wikimedia.org/T209776 (10hashar) `grunt-contrib-qunit` 0.3.0 migrates from PhantomJS to Chrome using Puppeteer. So we can probably drop PhantomJS from the container. Puppetee... [15:49:28] 10Continuous-Integration-Infrastructure (shipyard), 10Wikidata, 10Wikidata Query UI: WDQS GUI build fails on CI - https://phabricator.wikimedia.org/T209776 (10hashar) The change got updated to pass options to puppeteer / Chrome: https://gerrit.wikimedia.org/r/#/c/wikidata/query/gui/+/472969/1..2/Gruntfile.js... [15:53:06] (03CR) 10Addshore: [C: 031] "Looks good."
[integration/quibble] - 10https://gerrit.wikimedia.org/r/452335 (https://phabricator.wikimedia.org/T200991) (owner: 10Pablo Grass (WMDE)) [15:53:34] !log cherry-picking gerrit:474694/1 in beta puppetmaster [15:53:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:54:59] hashar, thanks [15:58:12] 10Gerrit, 10MediaWiki-Vagrant, 10Patch-For-Review: git-review fails with "The requested URL /changes/ was not found on this server." - https://phabricator.wikimedia.org/T163242 (10hashar) Well done @tgr , indeed git-review forge the Gerrit API url based on the git remote URL. Thank you to have updated git-r... [16:19:08] (03PS1) 10Gehel: Add a docker image for sonar-scanner. [integration/config] - 10https://gerrit.wikimedia.org/r/474729 (https://phabricator.wikimedia.org/T209849) [16:21:00] Krenair: monday is my tahnks day :))) [16:21:13] * hashar takes some time off. bb [16:21:13] l [16:21:45] (03CR) 10jerkins-bot: [V: 04-1] Add a docker image for sonar-scanner.
[integration/config] - 10https://gerrit.wikimedia.org/r/474729 (https://phabricator.wikimedia.org/T209849) (owner: 10Gehel) [16:22:37] hmmmmmm [16:22:37] Scribunto is not available [16:22:46] hashar: https://integration.wikimedia.org/ci/job/wikibase-repo-docker/61/console [16:22:52] did you change something ? :D [16:23:14] (03PS2) 10Gehel: Add a docker image for sonar-scanner. [integration/config] - 10https://gerrit.wikimedia.org/r/474729 (https://phabricator.wikimedia.org/T209849) [16:30:40] 10Phabricator, 10Wikimedia-Logstash: Get phabricator error logs into logstash - https://phabricator.wikimedia.org/T141895 (10herron) [16:35:42] twentyafterfour: was your suggestion that we move deployment-prep this week in earnest? I have most of Tuesday and Wednesday available to work on that if so. [16:38:31] andrewbogott: Indeed, I can help out if that's when you want to do it. Do you think it's enough time to work out all the issues? I'm not intimately familiar with most of the beta cluster configs but I can usually make sense of puppet code without much struggling.
[16:39:06] It's hard to say if it's enough time. If I start right now then it probably is :) [16:39:36] I actually need to leave early on Wednesday but in theory by that time it would just be cleanup (chasing misplaced IPs) that can be done without me [16:40:45] should I prepare a deployment-cache-text05 to move traffic to? [16:40:46] And of course it's not a short week for Krenair or arturo or giovanni [16:40:47] andrewbogott: ok. I'm flying to california on Saturday but maybe one of my team mates can take care of any loose ends next week if there are any. [16:41:28] we had previously discussed timing this so it would fit in between deployment cycles… is deployment frozen this week? [16:42:14] not entirely [16:42:20] there's SWAT windows and stuff [16:42:22] our team meeting is in 20 minutes so I'll discuss it with the boss man and I'll give you a committment after the meeting if Greg doesn't object.
[16:42:26] looks like no train though [16:42:41] twentyafterfour: sounds great, just let me know [16:42:48] speaking of which, there's still a pre-train sanity break window? [16:43:23] And, I'm sorry if I'm pushing this too hard… we definitely /can/ just wait until December to do this if that turns out to be what people want. [16:43:38] also the next SWAT window has over 6 patches in it :/ [16:44:37] andrewbogott: no problem, you aren't pushing too hard, if we can avoid doing things twice that seems like the way to go... [16:44:42] Krenair: :-/ [16:45:44] I don't know about the swats or the train this week, I've been deep phabricator code recently [16:45:53] I just looked at the calendar [16:52:37] no train, just swats [16:53:04] I kept the pre-train sanity break because... lazy?
no reason, feel free to remove :) [16:53:17] I'm not bothered by it, it just felt like maybe someone forgot about it [16:53:35] indeed [17:20:19] andrewbogott: greg says to go for it. So you've got my attention for the short week. [17:20:35] Great! I'll start moving things as soon as I'm out of this meeting. [17:20:43] Do you have opinions about things you'd like to see moved first or last? [17:21:40] andrewbogott: not sure really, Krenair might have an opinion about that [17:22:25] andrewbogott, I'd suggest not touching cache-text [17:22:30] for the moment [17:22:57] ok [17:23:11] never do two hosts of the same type at the same time [17:23:19] the db hosts...
might be best to stop mysql gracefully first [17:24:47] some of the migrations will temporarily break certain functionality but the only one I would expect to totally take the whole of beta out would be cache-text04 [17:25:02] Krenair: I'm still in a meeting but if you want to re-order the list here, https://etherpad.wikimedia.org/p/deployment-prep-to-neutron [17:25:20] I can do them in order (although of course will be moving three or four in parallel at any one time) [17:25:26] there doesn't necessarily need to be a strict order [17:25:36] ok, mostly just save cache-test04 for last?
[17:26:08] that doesn't have to be last if we decide to spin up cache-text05 to take over its traffic [17:26:37] ah, that would be good [17:26:54] theoretically the same applies to cache-upload but it's probably lower impact [17:27:26] let's just break them into groups of 4, keep cache-text out of it for the moment and ensure the groups don't contain multiple of the same type [17:29:39] looks good [17:30:03] deployment-rd3-cptest-master01 is a Redis Misc master (redis::misc::master) [17:30:04] The last Puppet run was at Thu Nov 1 17:10:22 UTC 2018 (25939 minutes ago). Puppet is disabled. Disabled by disable-puppet [17:30:08] disabled by disable-puppet? [17:31:05] hm [17:31:14] best to get that enabled and running before we start :) [17:32:22] oh, ms-fe will probably also break upload [17:32:34] but again I don't think that's a massive deal if it's temporary [17:33:13] Krenair: actually, I'm going to rearrange this list again so I can script the groups (so I don't have to wait for all four to finish before starting another four..)
will ping you when I have that [17:33:20] okay... [17:40:31] -maps05 is also not running puppet [17:41:08] The last Puppet run was at Mon Nov 12 14:07:48 UTC 2018 (10293 minutes ago). Puppet is disabled. disabled puppet for osm auth test [17:53:35] Krenair: so, I'd like to script each group listed on the etherpad and then run all four group scripts at the same time. Feel free to rearrange things as needed if you see VMs in different groups that would be disastrous to move at the same time. The groups are based on your organization below. [17:54:09] maps04 and maps05 in group 1? [17:54:20] elastic06 and elastic07 in group 1? [17:54:31] kafka-main-1 and kafka-jumbo-1 in group 1? [17:54:47] sca01 and sca02 in group 1? [17:55:07] I don't know how disastrous any of these would be but I would avoid it. [17:56:04] so, sorry if this was unclear...
[17:56:13] each thing in group 1 will be done in sequence, for that group [17:56:46] right so these are not simultaneous [17:56:50] right [17:57:13] the idea being that the script does one and then moves straight onto the next one [17:58:53] I am worried that these groups are large [17:59:00] correct. So having maps04 and maps05 in group1 together is a good thing right? [17:59:06] don't really want to end up debugging three things at once [17:59:10] yes [17:59:49] ok — I can do the first few sets by hand so that you don't get hit by a firehose :) Then we can evaluate whether scripting is appropriate. [18:03:08] twentyafterfour: I'm going to send the following to wikitech-l. Anything you'd like me to add? [18:03:20] https://www.irccloud.com/pastebin/VjgaI8Fq/ [18:07:23] maybe mention that this is the infrastructure behind beta.wmflabs.org [18:09:32] yep, adding. [18:09:33] andrewbogott: lgtm [18:11:35] andrewbogott, shall we remove maps05 and rd3-cptest-master01 from the list until someone fixes their puppet?
[18:11:58] yes, probably [18:12:05] puppet is also broken on the elastic nodes: [18:12:11] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Missing title. The title expression resulted in undef at /etc/puppet/modules/elasticsearch/manifests/init.pp:112:35 on node deployment-elastic05.deployment-prep.eqiad.wmflabs [18:12:26] I don't quite know what that is yet [18:12:32] don't worry about it [18:12:57] just banish them from the new region until someone fixes it [18:13:01] ok :) [18:16:13] Krenair: ok, I think I'm ready to move the first item in each group. Any reason to wait? [18:16:28] the first item in each group?
[18:17:20] thought you were starting with group 1 and having a script go through them sequentially [18:17:41] ok, sorry, I'm communicating poorly :) [18:17:49] the ones you've marked in progress should be fine [18:18:12] but don't end up moving e.g. memc06 and memc07 at the same time [18:18:17] ok. I'm not scripting for now, going to just do these by hand until we're convinced that there aren't too many surprises. [18:18:21] ok [18:22:20] 10Release-Engineering-Team, 10Release Pipeline: Initial production image build fails helm test - https://phabricator.wikimedia.org/T209871 (10thcipriani) [18:22:34] andre__, Does the 'Remember me' box in Horizon do anything? [18:22:38] andrewbogott, * sorry ! [18:22:52] Krenair: it does! [18:23:00] It should preserve the session for 7 days [18:23:02] vs.
like 30 minutes [18:23:05] I seem to have to log in every time :/ [18:23:08] 10Release-Engineering-Team, 10Release Pipeline: Initial production image build fails helm test - https://phabricator.wikimedia.org/T209871 (10thcipriani) p:05Triage>03Normal [18:23:14] interesting [18:23:38] It works for me, I'm pretty sure? But I don't know what all it does to preserve the session. [18:23:47] maybe I just don't use it often enough [18:23:52] andrewbogott, is eqiad1-r enabled for deployment-prep now? [18:23:55] PROBLEM - Host deployment-maps04 is DOWN: CRITICAL - Host Unreachable (10.68.19.18) [18:23:57] I went to launch instance and it's greyed out :/ [18:23:59] longma: hello! [18:24:20] Krenair: I didn't force a puppet run on the host so it might be a few minutes behind [18:24:25] ah [18:24:26] I'll do that now [18:24:34] no rush [18:24:57] PROBLEM - Host deployment-urldownloader02 is DOWN: CRITICAL - Host Unreachable (10.68.19.117) [18:25:33] hm, guess I should kill shinken for the duration...
[18:26:02] probably [18:26:14] andrewbogott, could drop deployment-prep from shinkengen temporarily [18:32:17] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [18:32:21] !log creating deployment-cache-text05 to replace deployment-cache-text04 [18:32:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [18:33:03] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [18:38:18] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [18:38:20] Project beta-code-update-eqiad build #225524: 04FAILURE in 3.1 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/225524/ [18:39:04] 10Gerrit, 10Release-Engineering-Team (Next), 10DBA, 10Operations: Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532 (10Paladox) In theory we could fix this with the upgrade to 2.16 (as nothing uses the db anymore but
it's still... [18:39:15] 10Gerrit, 10Release-Engineering-Team (Next), 10DBA, 10Operations: Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532 (10Paladox) [18:39:18] 10Gerrit, 10Patch-For-Review: Upgrade to Gerrit 2.16 - https://phabricator.wikimedia.org/T200739 (10Paladox) [18:41:31] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [18:43:54] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Cloud-Services, 10Epic, 10Patch-For-Review: Migrate deployment-prep to eqiad1 - https://phabricator.wikimedia.org/T208101 (10Andrew) This is now underway!
[18:47:03] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [18:49:22] Krenair: a couple of not-very-interesting things are moved; will be ~ half an hour before anything else finishes. [18:49:35] ok [18:49:55] I take it none of the moves instances have any matches in the known config places [18:50:20] moved* [18:52:38] I don't think so [18:53:59] I've noticed one thing that's interesting [18:54:10] I spun up a new instance in eqiad1-r [18:54:13] [ 131.311137] rc.local[395]: E: Failed to fetch http://deployment-deploy01.deployment-prep.eqiad.wmflabs/repo/dists/stretch-deployment-prep/main/binary-amd64/Packages Unable to connect to deployment-deploy01.deployment-prep.eqiad.wmflabs:http: [IP: 10.68.23.38 80] [18:54:16] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [18:54:28] should probably allow eqiad1-r IPs into
that though it's not critical [18:54:46] for this anyway [18:54:46] for this anyway [18:56:55] looks like it also fails to talk to puppet? [18:56:55] looks like it also fails to talk to puppet? [19:01:59] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [19:01:59] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [19:09:36] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [19:09:36] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [19:30:21] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [19:30:21] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [19:32:00] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [19:32:00] 10Release-Engineering-Team (Kanban), 10User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (10greg) [19:43:36] Krenair, twentyafterfour, want me to keep moving things, or should I pause after this set for you to evaluate things? [19:43:36] Krenair, twentyafterfour, want me to keep moving things, or should I pause after this set for you to evaluate things? [19:44:16] keep going [19:44:16] keep going [19:44:24] ok [19:44:24] ok [19:44:40] yeah [19:44:40] yeah [19:44:46] 10Release-Engineering-Team, 10Operations, 10Release Pipeline: Design pipeline image versioning scheme - https://phabricator.wikimedia.org/T209088 (10thcipriani) >>! In T209088#4745829, @akosiaris wrote: > I think we should support multiple tags per image (docker anyway does support that and they cost next to... 
[20:06:48] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[20:10:16] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[20:12:47] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[20:14:34] Krenair: did you set up puppetdb on deployment-puppetmaster03? It's having trouble, I'm not quite sure where to start.
[20:14:39] Is puppetdb its own service?
[20:14:48] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[20:14:50] Warning: Error 500 on SERVER: Server Error: Could not retrieve facts for deployment-puppetmaster03.deployment-prep.eqiad.wmflabs: Failed to find facts from PuppetDB at puppet:8140: Failed to execute '/pdb/query/v4/nodes/deployment-puppetmaster03.deployment-prep.eqiad.wmflabs/facts' on at least 1 of the following 'server_urls': https://deployment-puppetdb02.deployment-prep.eqiad.wmflabs
[20:15:52] oh, nevermind, I should've read to the end :)
[20:15:59] It's just a security group thing
[20:17:23] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[20:20:20] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[20:22:25] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[20:24:26] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[20:26:51] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[20:27:11] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg) a:greg>jeena
[20:31:03] Does gerrit not create the local account until after the person logs in the first time (after they've created their ldap/developer account)?
[20:33:16] greg-g: correct afaik
[20:33:21] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[20:33:30] it creates an account on-login afaik
[20:33:57] so just use the ldap to log in and you shouldn't need to separately request a gerrit account, it's just automatic on first login
[20:34:07] * twentyafterfour isn't 100% sure but that's how I think it works
[20:35:38] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[20:36:03] twentyafterfour: right, that's what I thought. I was looking for jeena's account in gerrit but it's not there since she hasn't logged in yet
[20:38:57] Though upstream have added some new functionaility
[20:39:08] that can create a account based on which rest api you use
[20:39:16] this affects LDAP
[20:41:03] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[20:42:45] https://gerrit-review.googlesource.com/c/gerrit/+/200897
[20:51:28] greg-g: yeah, as you may have guessed by now most of the services that share the developer account system create local accounts on demand as the backing LDAP data is used to authenticate.
[20:52:02] this actually includes Wikitech if the developer account is created via toolsadmin rather than on wikitech originally
[20:54:30] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[20:56:28] gerrit's db being dropped in https://gerrit-review.googlesource.com/c/gerrit/+/205196
[20:59:18] Gerrit, Release-Engineering-Team (Next), DBA, Operations: Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532 (Paladox) Gerrit's db support is being removed in https://gerrit-review.googlesource.com/c/gerrit/+/205196 :)
[21:03:46] Krenair: I've fixed all the things I know to fix but still can't get puppetdb to start up on deployment-puppetdb02. I suspect there's an IP hard-coded someplace but can't find it.
[21:03:55] Want to have a look, if puppetdb is something you've worked on?
[21:17:45] andrewbogott, sorry I got completely distracted
[21:17:47] bd808, also includes wikitech if you had an SVN account
[21:17:49] andrewbogott, I'll have a look at puppetdb
[21:18:15] Scap, Operations, User-jijiki: Introduce state to Scap - https://phabricator.wikimedia.org/T209881 (thcipriani)
[21:19:31] I've never seen this error
[21:19:37] and it's very unhelpful too
[21:21:29] Nov 19 21:18:47 deployment-puppetdb02 puppetdb[3214]: Caused by: java.net.BindException: Cannot assign requested address
[21:21:34] this is interesting
[21:21:42] I wonder if it's trying to bind to the old IP or something
[21:21:46] it has 172.16.4.104/21 now
[21:22:04] Project-Admins, Anti-Harassment (AHT Sprint 33), User-Luke081515: Create project for Community Health Metrics Kit - https://phabricator.wikimedia.org/T191556 (TBolliger)
[21:22:19] Krenair is it binding to the host name
[21:22:22] or ip?
[21:22:28] host name could be cached to the old ip
[21:22:32] check /etc/hosts
[21:23:06] interestingly this thing does have an /etc/hosts entry for itself
[21:23:18] which is correct
[21:25:07] Nov 19 21:24:35 deployment-puppetdb02 puppetdb[3636]: #011at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.(HTTPServer.java:145)
[21:25:07] Nov 19 21:24:35 deployment-puppetdb02 puppetdb[3636]: #011at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:49)
[21:25:19] hmm
[21:25:21] so this is the prometheus agent stuff
[21:26:03] is it trying to connect to prod?
[21:26:37] why would it start doing that?
[21:27:00] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[21:27:22] Krenair the point class adds that stuff
[21:27:25] *prod
[21:28:12] what?
[21:28:14] so it's fighting with prometheus over the port?
[21:28:42] it's not clear yet which port puppetdb is trying to bind to
[21:31:18] Yeah, I was confused by that. The conf file says 8081, but the puppetmaster is configured to talk to it on 443
[21:31:53] that's nginx
[21:32:09] nginx has a site set up for puppetdb that proxies stuff through to port 8080
[21:32:21] Krenair the prod class is adding prometheus to puppetdb.
[21:32:35] paladox, what does this have to do with eqiad -> eqiad1-r migration?
[21:33:22] Krenair it would probaley try to connect to the prod prometheus instance?
[21:33:41] why would it be any more likely to do that now compared to when it was in the old region?
[21:33:52] and why would that break binding to a port?
[21:35:08] It could be a race that was waiting to break on the next reboot
[21:35:14] and unrelated to the migration apart from rebooting
[21:35:34] ^ yeah that's possible. Was that stuff recently changed
[21:35:41] [pid 3876] bind(18, {sa_family=AF_INET6, sin6_port=htons(9400), inet_pton(AF_INET6, "::ffff:10.68.19.126", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
[21:35:45] caught this in strace
[21:35:54] ah that looks like a culprit
[21:35:58] not the port at all
[21:36:00] why is it trying to bind a 10. address?
[21:36:12] that can't be right
[21:36:25] andrewbogott, was that its old IP by any chance?
[21:36:45] in progress -> | cac3fa7b-3b03-450c-bccf-ffbc65a20dc8 | deployment-puppetdb02 | ACTIVE | public=10.68.19.126 |
[21:36:48] yes
[21:37:01] so where is that cached and why
[21:37:01] hm, so it's cached someplace that survives a reboot
[21:37:08] it's possible that a puppet run would fix it, if we could run puppet :)
[21:37:20] I could point it to the main puppetmaster for a single run, just in case
[21:37:32] * andrewbogott tries
[21:37:37] 10.68.19.126 doesn't show up in operations/puppet ... hmm
[21:37:53] found stuff
[21:38:06] root@deployment-puppetdb02:/etc/puppetlabs/puppetdb# grep 10.68.19.126 /etc/* -r
[21:38:06] ok, holding off...
[21:38:09] grep: /etc/wmcs-imageversion: No such file or directory
[21:38:11] root@deployment-puppetdb02:/etc/puppetlabs/puppetdb#
[21:38:13] bah IRC
[21:38:16] /etc/default/puppetdb:JAVA_ARGS="-Xmx4G -javaagent:/usr/share/java/prometheus/jmx_prometheus_javaagent.jar=10.68.19.126:9400:/etc/puppetdb/jvm_prometheus_puppetdb_jmx_exporter.yaml"
[21:38:22] /etc/nagios/nrpe_local.cfg:server_address=10.68.19.126
[21:38:26] /etc/postgresql/9.6/main/pg_hba.conf:host puppetdb puppetdb 10.68.19.126/32 md5
[21:38:30] /etc/ssh/ssh_known_hosts:deployment-puppetdb02.deployment-prep.eqiad.wmflabs,deployment-puppetdb02,10.68.19.126 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEjqxAqSCjBr1tBKgacSmXZIcTMyZ4wgt7fow+wRTNqWsr83b6thxUtrhasV/tabTlB0yr9BuiuOdGXv5tr0Zck=
[21:38:45] it's that /etc/default/puppetdb
[21:38:50] which puppet would probably fix if puppet could work
[21:38:54] but since this is puppetdb this is chicken and egg
[21:39:00] will fix by hand
[21:39:04] nice find
[21:39:42] grep usually does the trick :)
[21:40:02] ah, sorry, I might be stepping on your toes
[21:40:06] (or rather, puppet might be)
[21:40:15] puppetdb is thinking about starting...
[21:40:20] or was?
[21:40:25] hm, it's sitting there
[21:41:20] I'm guessing that postgresql line will need fixing too
[21:41:25] but still
[21:42:05] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (jeena)
[21:42:16] ok, I got puppet to do a little bit but now it's back to failing in the same place
[21:42:36] are you changing stuff at the same time as me?
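The recursive grep for the old address under /etc, which turned up the stale entry in /etc/default/puppetdb, can be sketched in Python. This is a minimal illustration rather than anything run during the migration; the function name `find_address_mentions` is hypothetical, and it does plain substring matching, so a search for 10.68.19.126 would also match a longer address that merely starts with it.

```python
import os


def find_address_mentions(root, address):
    """Return sorted (path, line_number, line) tuples for lines containing address."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" stops binary-looking files from raising decode errors.
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if address in line:
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                # Unreadable entries (dangling symlinks, permission errors) are skipped.
                continue
    return sorted(hits)
```

Calling `find_address_mentions("/etc", "10.68.19.126")` on a migrated host would list every file that still hard-codes the pre-migration address, which is the same information the grep output gave here.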
[21:42:43] I was but I'm stopping :)
[21:42:56] so interesting
[21:43:16] although I'm sitting there looking at `service puppetdb start` with a blinking cursor
[21:43:23] status tells me it's running
[21:43:27] oh it finished
[21:43:55] and it's up!
[21:44:12] woah
[21:44:28] this can't be good
[21:44:52] it replaced /etc/puppetdb/ssl/cert.pem ?
[21:44:59] and /etc/puppetdb/ssl/server.key
[21:45:09] and /etc/nginx/ssl/cert.pem
[21:45:15] and /etc/nginx/ssl/server.key
[21:45:21] hm
[21:45:38] hm but puppet does still work
[21:45:47] okay
[21:45:53] it appears to have gotten away with that
[21:46:01] weird
[21:46:55] puppet is working on other hosts
[21:47:05] let's consider puppetdb working
[21:47:12] great :)
[21:47:19] sweet
[21:47:28] I got into this because I was trying to understand who/what broke deployment-elasticXX
[21:47:40] It's surely a hiera change, but I don't know who is working on elastic stuff right now
[21:47:42] but of course puppet wouldn't run because puppetdb wasn't coming up :)
[21:50:50] bd808: oh neat (re wikitech being an on-demand thing)
[21:52:44] andrewbogott, so where are we with the other migrations?
[21:53:27] I'll update the pad
[21:55:21] ok, updated
[21:56:41] mm just realised I should probably depool mediawiki-07 before you shut it down, send its traffic to mediawiki-09
[21:58:49] andrewbogott, oh did you already get that one?
[21:59:08] lol yep that took out the site
[21:59:12] one sec
[21:59:14] Yeah, it was already in the process of shutting down when you said that :/
[22:00:58] 17 VMs copied, 53 to go.
[22:01:30] WTF
[22:01:35] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: SSL_connect returned=1 errno=0 state=error: certificate verify failed: [certificate revoked for /CN=deployment-puppetdb02.deployment-prep.eqiad.wmflabs]
[22:01:38] ?????
[22:01:42] ok, try again!
[22:01:50] I saw that, it happened on the first try and then righted itself.
[22:02:06] it didn't fix it
[22:02:10] dammit I need puppet to fix the site
[22:02:16] oh :(
[22:02:21] (CR) Umherirrender: [C: 1] "Looks good, but I have no idea if php has some edge cases the sniff should be worry about." (7 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/446271 (https://phabricator.wikimedia.org/T199768) (owner: Prtksxna)
[22:02:21] what host is saying that?
[22:02:26] cache-text04
[22:02:38] okay I'll just have to try and do this manually
[22:02:44] not like I can break it any more than it already is
[22:03:53] I can also cancel out of the 07 move and restart it in eqiad
[22:03:56] if necessary
[22:04:04] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (greg)
[22:04:07] and now varnish won't reload
[22:04:12] this is not good
[22:04:33] Nov 19 22:04:24 deployment-cache-text04 varnish[22877]: Backend host "deployment-mediawiki-07.deployment-prep.eqiad.wmflabs": resolves to too many addresses.
[22:04:34] Is it maybe the case that puppet is broken on eqiad hosts and working on eqiad1 hosts?
[22:04:38] okay I can just remove that backend entirely
[22:04:56] Yeah, when in transition each host resolves to two different IPs.
[22:05:15] Because I don't want to risk wiping out old state until we confirm that the eqiad1 VM is up and running and reachable.
[22:05:30] okay so I fixed varnish I think
[22:05:35] -07 is huge, it'll be almost 2 hours until it finishes moving
[22:05:39] I have no idea if it'll be able to get traffic through to -09
[22:06:26] okay
[22:06:31] it gets through to an appserver but then
[22:06:35] (Cannot access the database: No working replica DB server: Unknown error (10.68.18.35:3306))
[22:07:18] huh I assumed it would fall back to db03 if it couldn't get that
[22:07:18] okay
[22:07:45] so maybe db04's rules don't permit eqiad1-r hosts?
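The varnish reload failure above ("resolves to too many addresses") follows from each in-transition host resolving to both its old and new IP. A rough Python sketch of that check is below. It assumes, as far as I know, that Varnish accepts at most one address per family for a backend host; the function names are hypothetical, and the two addresses used in the usage note are deployment-puppetdb02's old and new IPs from earlier in the log, borrowed purely as an illustration since mediawiki-07's addresses are not shown.

```python
import ipaddress


def addresses_by_family(addresses):
    """Group IP address strings into {'ipv4': [...], 'ipv6': [...]}, de-duplicated and sorted."""
    grouped = {"ipv4": set(), "ipv6": set()}
    for text in addresses:
        parsed = ipaddress.ip_address(text)
        family = "ipv4" if parsed.version == 4 else "ipv6"
        grouped[family].add(str(parsed))
    return {family: sorted(values) for family, values in grouped.items()}


def is_ambiguous_backend(addresses):
    """True when a name maps to more than one address within a single family."""
    grouped = addresses_by_family(addresses)
    return any(len(values) > 1 for values in grouped.values())
```

For example, `is_ambiguous_backend(["10.68.19.126", "172.16.4.104"])` is true, while a host with a single record is not flagged. The address list for a real hostname could be collected from `socket.getaddrinfo` before a backend is added back to the pool.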
[22:07:53] I'll look
[22:07:57] what port are we probably talking about?
[22:08:06] 3306
[22:08:07] tcp
[22:08:25] no iptables, what about security groups?
[22:08:49] I'm fixing that, one moment...
[22:09:12] so it was security groups then?
[22:09:14] ok, done. Is that better?
[22:09:26] apparently not
[22:09:34] https://en.wikipedia.beta.wmflabs.org/ still says (Cannot access the database: No working replica DB server: Unknown error (10.68.18.35:3306))
[22:09:49] I don't know how long it takes security group rules to spread… should be quick though
[22:09:59] should be virtually instant?
[22:10:25] I'd think
[22:11:25] oh, is it the grant?
[22:11:36] wrong address in the @ ?
[22:12:15] oh good call
[22:12:27] network looks fine I can talk from mediawiki-09 to the mysql service on db04
[22:12:54] | wikiadmin | 10.% |
[22:12:54] | wikiuser | 10.% |
[22:12:56] yeah
[22:12:59] nice
[22:13:10] now how the hell do you replicate this stuff properly for new ranges
[22:13:26] and why was that causing 'Unknown error'?
[22:13:58] also lol, 10.4 stuff in here in 2018:
[22:14:02] I need to go run an errand in a minute. I'm going to leave these copies running unless you object.
[22:14:02] | root | 10.4.0.53 |
[22:14:02] | ops | 10.4.0.85 |
[22:14:05] ok
[22:14:20] (And, for that matter, I might let them run overnight too; seems better to just get everything in one place)
[22:14:23] copies of the instances?
[22:14:34] I mean, copies from region to region
[22:14:38] right
[22:14:39] andrewbogott: agreed, let's just get it all in one place if possible
[22:14:44] ok
[22:15:07] I'm going to try to re-create those mysql users for the new range
[22:15:08] I can poke at some of it this evening if you fire up some instances when you get back
[22:15:21] (/me works late, usually)
[22:15:28] When I'm back I'll try to understand what's happening with the puppetdb cert, if you haven't already fixed it by then :)
[22:18:44] think I did it
[22:18:58] (mysql, maybe)
[22:19:12] seriously though why do we still have pmtpa stuff lurking
[22:19:21] okay beta.wmflabs.org is back up
[22:19:32] nice
[22:20:07] yep, site looks good.
[22:20:14] I'm out for a while, driving
[22:20:36] I'm going afk for a bit as well, I'll check in when I get back
[22:23:10] !log duplicated wiki(user|admin) mysql users on deployment-db0[34] - previous hosts 10.%, new hosts 172.16.%
[22:23:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:24:54] I think after all this is done I'm gonna have to go through the IRC logs and just file bugs en masse
[22:25:31] like why does deployment-db03 have wikiuser grants on `%a%`.* ?
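Editor's note: the root cause above was that the accounts were defined for host pattern `10.%`, so clients arriving from the new `172.16.` range matched no account. The sketch below illustrates that matching rule under simplifying assumptions: it models only the `%` (any run of characters) and `_` (any single character) wildcards of MySQL account host patterns, and ignores netmask notation and escaping. The function names are hypothetical, and the client addresses are illustrative values based on addresses that appear in this log.

```python
import re


def host_pattern_to_regex(pattern: str):
    """Translate a MySQL-style account host pattern into a compiled regex.

    Simplified model: '%' matches any run of characters, '_' matches any
    single character, everything else is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def host_matches(pattern: str, client_host: str) -> bool:
    """Return True if client_host is covered by the account host pattern."""
    return host_pattern_to_regex(pattern).fullmatch(client_host) is not None


if __name__ == "__main__":
    print(host_matches("10.%", "10.68.18.35"))       # True: old range
    print(host_matches("10.%", "172.16.4.100"))      # False: new range, no account
    print(host_matches("172.16.%", "172.16.4.100"))  # True: after duplicating users
```

Under this model, duplicating each account with a `172.16.%` host, as the `!log` entry describes, is what lets clients in the new range authenticate again.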
[22:27:12] FWIW I installed mariadb-client on deployment-imagescaler01 and deployment-mediawiki-09 to debug this problem, should clean up later
[22:29:10] I'm going to try sorting the puppet cert for deployment-puppetdb02
[22:30:21] WTF
[22:30:29] root@deployment-puppetmaster03:~# puppet cert list
[22:30:29] "deployment-puppetdb02.deployment-prep.eqiad.wmflabs" (SHA256) 5D:B2:96:51:8D:49:63:30:B5:51:27:2D:78:35:8B:2F:E2:FC:3A:88:5C:F9:AE:64:49:E6:ED:03:73:6B:9D:03
[22:30:29] "host-172-16-4-100.deployment-prep.eqiad.wmflabs" (SHA256) 28:A8:84:1C:29:EC:08:03:9E:A4:D5:C3:25:3B:A4:3D:C1:6E:D9:F2:61:B3:EE:DC:24:68:E7:34:E8:11:73:22
[22:30:29] "host-172-16-4-106.deployment-prep.eqiad.wmflabs" (SHA256) A4:A6:93:2F:0D:1A:FD:7A:73:B2:14:48:BF:2E:33:AE:E8:22:68:15:5B:B2:FA:3F:4D:23:2D:55:33:AD:51:AC
[22:30:30] D5: Ok so I hacked up ssh.py to use mozprocess - https://phabricator.wikimedia.org/D5
[22:30:30] D9: Remap all submodules to tin - https://phabricator.wikimedia.org/D9
[22:30:31] "host-172-16-4-116.deployment-prep.eqiad.wmflabs" (SHA256) 76:52:82:18:AF:CD:79:7D:50:63:5E:82:99:E9:6D:D7:D6:69:6F:6D:B6:A9:CD:01:BB:9E:83:9D:9C:B7:83:CB
[22:30:32] D6: Interactive deployment shell aka iscap - https://phabricator.wikimedia.org/D6
[22:30:32] D7: Testing: DO not merge - https://phabricator.wikimedia.org/D7
[22:30:36] "host-172-16-4-19.deployment-prep.eqiad.wmflabs" (SHA256) 15:05:C5:7D:86:10:BA:ED:68:73:D2:DD:00:13:52:FB:CB:C5:BD:5A:E1:82:C6:D5:92:51:AC:AB:FA:F0:51:F2
[22:30:39] root@deployment-puppetmaster03:~# puppet cert sign deployment-p
[22:30:40] D2: Add .arcconfig for differential/arcanist - https://phabricator.wikimedia.org/D2
[22:31:08] those must be mid-migration instances but why are they starting up and trying to get puppet certs with the wrong hostname?
[22:31:37] heh this is a bit of a catch-22
[22:31:50] can't run puppet because the puppetdb cert is revoked
[22:31:57] can't fix the puppetdb cert because puppet won't run
[22:32:15] gonna have to try to insert the deployment-puppetdb02 cert by hand
[22:32:48] doesn't puppetdb include some magic to copy the puppet cert?
[22:38:23] okay I think I fixed it
[22:38:31] no wait what's it doing
[22:38:39] okay it also changed /etc/postgresql/ssl/cert.pem
[22:38:45] and /etc/postgresql/ssl/server.key
[22:38:58] now was that correct...
[22:39:49] seems to be fine
[22:40:24] !log manually sorted certs for deployment-puppetdb02
[22:40:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:42:54] okay so puppet on deployment-cache-text04 just cleaned up my comments I left when fixing stuff by hand
[23:29:23] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.33.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T206657 (greg) Open>Resolved
[23:35:39] Release-Engineering-Team (Kanban), Education-Program-Dashboard, MediaWiki-extensions-EducationProgram, Epic, User-greg: Deprecate and remove the EducationProgram extension from Wikimedia servers after June 30, 2018 - https://phabricator.wikimedia.org/T125618 (greg) >>! In T125618#4620545, @In...
[23:41:57] Deployments, Release-Engineering-Team (Kanban), HHVM, Wikimedia-Incident: Figure out why HHVM kept running stale code for hours - https://phabricator.wikimedia.org/T181833 (greg) I don't know how much further we can go with this after so long... propose to decline.
[23:42:32] Release-Engineering-Team (Kanban), User-greg: Onboarding Jeena Huneidi - https://phabricator.wikimedia.org/T209722 (Dzahn) Hi, welcome @jeena I can handle the Gerrit checkbox. I can see you already have a Wikitech/LDAP user, but in Gerrit itself i can't see it yet. Let me know when you have logged in on...
[23:42:52] thank you for fixing puppet, Krenair. Those puppet certs with bogus hostnames will probably keep creeping in (the migrated hosts boot once with the wrong hostname before getting fixed.) I can clean them up at the end unless they're actively breaking things in the meantime.
[23:44:15] andrewbogott, they're not actively breaking things but is it really necessary to boot them with the wrong names?
[23:47:20] I don't know why it happens. It's not a race, since dhcp has had the whole copy time to get up to date with the right names.
[23:48:15] they come up with their old IP, ask dhcp for a name and get host-172-whatever along with their new, correct IP. Then after a reboot they get the right hostname (and already have the right IP from before).
[23:48:24] It's ugly but seems mostly harmless
[23:56:45] andrewbogott, https://gerrit.wikimedia.org/r/474820
[23:57:16] nice
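Editor's note: the cleanup mentioned above (finding the certificate requests filed under placeholder `host-172-...` names) could be scripted roughly as follows. This is a sketch only, assuming the `puppet cert list` output has the shape pasted earlier in the log (`"name" (SHA256) fingerprint`); the function name and the regular expressions are hypothetical, and the script only reports names, it does not sign or clean anything.

```python
import re
from typing import List

# Quoted certificate name at the start of a line, as in the pasted output.
CERT_NAME = re.compile(r'^\s*"([^"]+)"')
# Placeholder names in the log look like host-172-16-4-100.<domain>.
PLACEHOLDER = re.compile(r"^host-\d{1,3}(?:-\d{1,3}){3}\.")


def placeholder_cert_names(cert_list_output: str) -> List[str]:
    """Return certificate names that look like DHCP placeholder hostnames."""
    names = []
    for line in cert_list_output.splitlines():
        match = CERT_NAME.match(line)
        if match and PLACEHOLDER.match(match.group(1)):
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    # Two entries copied from the log; the real hostname is left alone.
    sample = (
        '"deployment-puppetdb02.deployment-prep.eqiad.wmflabs" (SHA256) '
        "5D:B2:96:51:8D:49:63:30:B5:51:27:2D:78:35:8B:2F:E2:FC:3A:88:5C:F9:AE:64:49:E6:ED:03:73:6B:9D:03\n"
        '"host-172-16-4-100.deployment-prep.eqiad.wmflabs" (SHA256) '
        "28:A8:84:1C:29:EC:08:03:9E:A4:D5:C3:25:3B:A4:3D:C1:6E:D9:F2:61:B3:EE:DC:24:68:E7:34:E8:11:73:22\n"
    )
    for name in placeholder_cert_names(sample):
        print(name)
```

Keeping the script read-only leaves the decision about what to do with each request to whoever runs the cleanup.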