[00:00:03] Yippee, build fixed!
[00:00:03] Project language-screenshots-VisualEditor » chrome,Windows 10,ci-jessie-wikimedia build #51: FIXED in 7 hr 45 min: https://integration.wikimedia.org/ci/job/language-screenshots-VisualEditor/BROWSER=chrome,PLATFORM=Windows%2010,label=ci-jessie-wikimedia/51/
[00:01:19] Yes, because that was one paragraph
[00:01:42] Oh
[00:01:44] The bug was that we were trying to pass stuff like "

Foo

Bar

Baz

" to an XML parser, which (correctly) causes an exception
[00:01:53] oh
[00:01:57] But the XML parser is only used in an IE-specific code path
[00:02:11] oh
[00:02:41] Anyway, the other patches were just deployed, so it should be fixed now
[00:02:52] (but you may need to wait 5-10 mins and/or hard-refresh for the updated JS to come through)
[00:02:56] Ok
[00:02:58] yeh
[00:03:05] since it is still not working
[00:03:43] Ah works
[00:03:43] now
[00:04:07] RoanKattouw ^^
[00:04:16] thanks for fixing it and explaining the cause of the bug
[00:04:17] :)
[00:04:57] Yay
[00:05:03] I'm very happy to have finally solved these VE issues
[00:05:23] yep, thank you :)
[00:05:30] Because they've been there for a long time, and I was afraid of having to spend a lot of time investigating them
[00:05:39] Then I suddenly made a breakthrough two days ago :)
[00:05:46] oh
[00:16:30] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[00:51:28] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:55:45] Project performance-webpagetest-wpt-org build #1855: FAILURE in 23 min: https://integration.wikimedia.org/ci/job/performance-webpagetest-wpt-org/1855/
[02:16:48] Yippee, build fixed!
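The XML-parser failure described at 00:01:44 can be illustrated outside VisualEditor. What follows is a minimal sketch, assuming the string passed to the parser was a run of sibling `<p>` elements (the log only preserves the text Foo, Bar, Baz, so the exact markup is an assumption). It uses Python's standard `xml.etree.ElementTree`, not the IE-specific JavaScript code path mentioned in the conversation, and `parses_as_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if `markup` is a well-formed XML document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Assumed reconstruction of the input: several sibling paragraphs.
# There is no single root element, so a strict XML parser rejects it.
fragment = "<p>Foo</p><p>Bar</p><p>Baz</p>"

# One paragraph has exactly one root element and parses fine, which
# matches the remark in the log that a single paragraph worked.
single = "<p>Foo</p>"

# Wrapping the siblings in one container also yields a valid document.
wrapped = "<div>" + fragment + "</div>"

for label, markup in [("fragment", fragment), ("single", single), ("wrapped", wrapped)]:
    print(label, parses_as_xml(markup))
```

An XML document must have exactly one root element, which is why the multi-paragraph input raises while the single paragraph does not. The log does not say whether the deployed patches wrapped the markup or avoided the XML parser altogether.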
[02:16:48] Project selenium-QuickSurveys » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #141: 09FIXED in 3 min 46 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/141/ [02:56:38] PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [03:31:35] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [04:05:46] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #129: 04FAILURE in 9 min 45 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/129/ [04:18:52] Yippee, build fixed! [04:18:52] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #129: 09FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/129/ [04:34:29] PROBLEM - Puppet staleness on deployment-salt02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [43200.0] [04:48:37] Yippee, build fixed! [04:48:38] Project performance-webpagetest-wpt-org build #1856: 09FIXED in 16 min: https://integration.wikimedia.org/ci/job/performance-webpagetest-wpt-org/1856/ [06:50:47] 10Gerrit: Short commit hashes with all numbers cannot be searched using gerrit - https://phabricator.wikimedia.org/T86035#2604268 (10Aklapper) @Paladox: If you're "doing a test" you **must** explain properly everything single step of your test, in a list, leaving no room for interpretation. Otherwise we're all j... 
[07:44:05] PROBLEM - Puppet run on deployment-db1 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [08:06:51] PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [08:23:10] 06Release-Engineering-Team, 06Community-Tech: updateCollation.php on terbium still run code from 1.28.0-wmf.16 against enwiki ( LoadBalancer::reallyOpenConnection: 402+ connections made (master=db1057) LoadBalancer.php line 850 ) - https://phabricator.wikimedia.org/T144580#2604370 (10hashar) [08:23:16] 06Release-Engineering-Team, 06Community-Tech: updateCollation.php on terbium still run code from 1.28.0-wmf.16 against enwiki ( LoadBalancer::reallyOpenConnection: 402+ connections made (master=db1057) LoadBalancer.php line 850 ) - https://phabricator.wikimedia.org/T144580#2604382 (10hashar) p:05Triage>03Hi... [08:24:05] RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0] [08:34:58] 06Release-Engineering-Team, 06Community-Tech: updateCollation.php on terbium still run code from 1.28.0-wmf.16 against enwiki ( LoadBalancer::reallyOpenConnection: 402+ connections made (master=db1057) LoadBalancer.php line 850 ) - https://phabricator.wikimedia.org/T144580#2604395 (10hashar) updateCollation.ph... [08:41:04] Puppet run on deployment-redis01 [08:41:05] bah [08:41:52] RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [08:43:16] transient issue [09:20:31] (03PS2) 10Zfilipin: Remove job selenium-Wikidata [integration/config] - 10https://gerrit.wikimedia.org/r/307940 (https://phabricator.wikimedia.org/T144487) (owner: 10Tobias Gritschacher) [09:21:37] (03CR) 10Zfilipin: [C: 032] "I have deleted selenium-Wikidata job." 
[integration/config] - 10https://gerrit.wikimedia.org/r/307940 (https://phabricator.wikimedia.org/T144487) (owner: 10Tobias Gritschacher) [09:22:39] (03Merged) 10jenkins-bot: Remove job selenium-Wikidata [integration/config] - 10https://gerrit.wikimedia.org/r/307940 (https://phabricator.wikimedia.org/T144487) (owner: 10Tobias Gritschacher) [09:23:26] 10Browser-Tests-Infrastructure, 10Wikidata, 15User-Tobi_WMDE_SW: Retire wikidata/browsertests.git - https://phabricator.wikimedia.org/T144486#2604487 (10zeljkofilipin) [09:23:29] 10Browser-Tests-Infrastructure, 10Wikidata, 13Patch-For-Review, 15User-Tobi_WMDE_SW: Remove https://integration.wikimedia.org/ci/job/selenium-Wikidata/ - https://phabricator.wikimedia.org/T144487#2604485 (10zeljkofilipin) 05Open>03Resolved a:03zeljkofilipin [09:24:09] 10Browser-Tests-Infrastructure, 10Wikidata, 13Patch-For-Review, 15User-Tobi_WMDE_SW: Remove https://integration.wikimedia.org/ci/job/selenium-Wikidata/ - https://phabricator.wikimedia.org/T144487#2601255 (10zeljkofilipin) I am not sure why the task was assigned to me when I resolved it. 
:| [09:24:59] (03PS44) 10Zfilipin: Run language screenshots script for VisualEditor in Jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613) [09:44:42] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [09:53:07] (03CR) 10Zfilipin: [C: 032] Run language screenshots script for VisualEditor in Jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613) (owner: 10Zfilipin) [09:54:06] (03Merged) 10jenkins-bot: Run language screenshots script for VisualEditor in Jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613) (owner: 10Zfilipin) [10:24:43] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [10:31:29] 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Upstream: timeouts with rubygems.global.ssl.fastly.net causing jobs to fail - https://phabricator.wikimedia.org/T144325#2604545 (10hashar) a:05hashar>03None [10:31:41] (03PS1) 10Hashar: rake: tweak files filter [integration/config] - 10https://gerrit.wikimedia.org/r/308150 (https://phabricator.wikimedia.org/T144325) [10:32:05] (03CR) 10Hashar: "I have added a test to cover the rake files filter with https://gerrit.wikimedia.org/r/308150 and added a few more paths such as /spec/." 
[integration/config] - 10https://gerrit.wikimedia.org/r/307670 (https://phabricator.wikimedia.org/T144325) (owner: 10Krinkle) [10:32:53] (03CR) 10Hashar: [C: 032] rake: tweak files filter [integration/config] - 10https://gerrit.wikimedia.org/r/308150 (https://phabricator.wikimedia.org/T144325) (owner: 10Hashar) [10:33:37] zeljkof: I am deploying your change on zuul [10:34:06] (03CR) 10Hashar: "Deployed" [integration/config] - 10https://gerrit.wikimedia.org/r/307940 (https://phabricator.wikimedia.org/T144487) (owner: 10Tobias Gritschacher) [10:34:08] (03Merged) 10jenkins-bot: rake: tweak files filter [integration/config] - 10https://gerrit.wikimedia.org/r/308150 (https://phabricator.wikimedia.org/T144325) (owner: 10Hashar) [10:34:15] zeljkof: https://gerrit.wikimedia.org/r/#/c/307940/2 :D [10:34:19] it changed zuul/layout.yaml [10:34:39] hashar: oops, sorry, I have forgot to deploy it [10:35:16] I did remember to delete the job, but completely forgot about zuul :| [10:35:45] AHARHGH rubocop [10:36:09] what's wrong? [10:36:21] I have forgot to set it up in vim :D [10:36:28] modules/role/spec/spec_helper.rb:25:14: C: %w-literals should be delimited by ( and ). [10:36:28] levels = %w{debug info notice warning err alert emerg crit} [10:36:28] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [10:41:05] :) [11:05:20] 10Gerrit: Short commit hashes with all numbers cannot be searched using gerrit - https://phabricator.wikimedia.org/T86035#2604554 (10Paladox) Ok I went to https://gerrit-review.googlesource.com/#/c/84672/ I then used 85d0492080e9242cd24879bedd37f920c237ede5 and took 8 caractors from it ie 85d04920 I then type... 
[11:46:16] 06Release-Engineering-Team, 06Community-Tech: updateCollation.php on terbium still run code from 1.28.0-wmf.16 against enwiki ( LoadBalancer::reallyOpenConnection: 402+ connections made (master=db1057) LoadBalancer.php line 850 ) - https://phabricator.wikimedia.org/T144580#2604615 (10hashar) [13:09:13] 10MediaWiki-Codesniffer: Update squizlabs/PHP_CodeSniffer to 3.x - https://phabricator.wikimedia.org/T142474#2604741 (10Paladox) 3.0.0rc1 has been release now, getting closer to stable release https://github.com/squizlabs/PHP_CodeSniffer/releases/tag/3.0.0RC1 [13:11:38] (03Draft1) 10Paladox: Update squizlabs/php_codesniffer to 2.7.0 [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/308162 [13:11:59] (03PS2) 10Paladox: Update squizlabs/php_codesniffer to 2.7.0 [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/308162 [13:15:33] (03CR) 10jenkins-bot: [V: 04-1] Update squizlabs/php_codesniffer to 2.7.0 [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/308162 (owner: 10Paladox) [13:16:35] (03CR) 10Paladox: "@Legoktm not sure why it fails, could you fix the errors please?" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/308162 (owner: 10Paladox) [13:26:58] 05Gerrit-Migration, 10Diffusion, 10GitHub-Mirrors, 06Repository-Admins: Have Phabricator take over replication to Github - https://phabricator.wikimedia.org/T115624#2604755 (10Paladox) @demon could you create a script we could use that will add the mirrors to all the repos please? Also exclude the ones th... [13:31:44] 10Continuous-Integration-Infrastructure, 07Upstream, 07Zuul: Zuul-cloner failing to acquire .git lock sometimes - https://phabricator.wikimedia.org/T86730#2604759 (10Paladox) Is this fixed in https://phabricator.wikimedia.org/rCIZUe489cf2a1a97870c55abd4279a9bd8eeac0cb8b7 since it looks like that patch fixes... [13:32:38] hashar im wondering could you import from upstream into https://phabricator.wikimedia.org/diffusion/ODEV/ please? 
[13:32:40] nodepool [13:34:24] hashar Since the last update was novemeber, but they are on version 0.3.0 and were on 0.1.1 [13:34:32] https://github.com/openstack-infra/nodepool [13:37:46] ? [13:41:12] 03Scap3, 06Services, 10service-runner, 10service-template-node, and 2 others: Enable config deploys for service::node services - https://phabricator.wikimedia.org/T144542#2602980 (10mobrovac) [13:42:10] 03Scap3, 10ChangeProp, 10EventBus, 06Services, 15User-mobrovac: Enable Scap config deploys for Change Propagation - https://phabricator.wikimedia.org/T144595#2604784 (10mobrovac) [13:44:25] 03Scap3, 10Parsoid, 06Services, 15User-mobrovac: Enable Scap3 config deploys for Parsoid - https://phabricator.wikimedia.org/T144596#2604807 (10mobrovac) [13:44:50] 03Scap3, 10Parsoid, 06Services, 15User-mobrovac: Enable Scap3 config deploys for Parsoid - https://phabricator.wikimedia.org/T144596#2604821 (10mobrovac) p:05Triage>03Normal a:03mobrovac [13:45:01] 03Scap3, 06Services, 10service-runner, 10service-template-node, and 2 others: Enable config deploys for service::node services - https://phabricator.wikimedia.org/T144542#2602980 (10mobrovac) [13:45:05] 03Scap3, 10Parsoid, 06Services, 15User-mobrovac: Enable Scap3 config deploys for Parsoid - https://phabricator.wikimedia.org/T144596#2604807 (10mobrovac) [13:46:58] 03Scap3, 10Citoid, 06Services, 10VisualEditor, 15User-mobrovac: Enable Scap3 config deploys for Citoid - https://phabricator.wikimedia.org/T144597#2604835 (10mobrovac) [13:47:10] 03Scap3, 10Citoid, 06Services, 10VisualEditor, 15User-mobrovac: Enable Scap3 config deploys for Citoid - https://phabricator.wikimedia.org/T144597#2604851 (10mobrovac) p:05Triage>03Normal [13:57:16] 10Continuous-Integration-Infrastructure, 06Operations, 07Jenkins, 13Patch-For-Review, 07Wikimedia-Incident: Jenkins files under /var/lib/jenkins/config-history/config need to be garbage collected - https://phabricator.wikimedia.org/T126552#2604862 (10hashar) 
https://gerrit.wikimedia.org/r/#/c/300092/ had... [14:00:16] 03Scap3, 10Mobile-Content-Service, 06Services, 15User-mobrovac: Enable Scap3 config deploys for MCS - https://phabricator.wikimedia.org/T144598#2604863 (10mobrovac) [14:00:24] 03Scap3, 10Mobile-Content-Service, 06Services, 15User-mobrovac: Enable Scap3 config deploys for MCS - https://phabricator.wikimedia.org/T144598#2604877 (10mobrovac) p:05Triage>03Normal [14:00:38] 03Scap3, 06Services, 10service-runner, 10service-template-node, and 2 others: Enable config deploys for service::node services - https://phabricator.wikimedia.org/T144542#2602980 (10mobrovac) [14:00:41] 03Scap3, 10Mobile-Content-Service, 06Services, 15User-mobrovac: Enable Scap3 config deploys for MCS - https://phabricator.wikimedia.org/T144598#2604863 (10mobrovac) [14:06:22] 10Continuous-Integration-Infrastructure, 07Upstream, 07Zuul: Zuul-cloner failing to acquire .git lock sometimes - https://phabricator.wikimedia.org/T86730#2604901 (10hashar) e489cf2a1a97870c55abd4279a9bd8eeac0cb8b7 is the Zuul merger it deals with the file lock sticking when modifying the git configuration... [14:23:46] 10Continuous-Integration-Infrastructure, 07Upstream, 07Zuul: Zuul-cloner failing to acquire .git lock sometimes - https://phabricator.wikimedia.org/T86730#2604945 (10Paladox) Oh, @hashar would you know how to implement the fix please? 
[14:26:21] (03PS1) 10Gehel: elasticsearch-tool: adding project to continuous integration [integration/config] - 10https://gerrit.wikimedia.org/r/308175 [14:26:22] 10Continuous-Integration-Infrastructure, 07Nodepool: Update nodepool to upstream master branch - https://phabricator.wikimedia.org/T144601#2604952 (10Paladox) [14:26:54] (03CR) 10Gehel: [C: 04-1] "wait for initial commit in elasticsearch-tool repo before merging this change" [integration/config] - 10https://gerrit.wikimedia.org/r/308175 (owner: 10Gehel) [14:27:54] (03PS1) 10KartikMistry: Add jenkins job for apertium-ita, apertium-srd, apertium-srd-ita [integration/config] - 10https://gerrit.wikimedia.org/r/308176 [14:27:58] 05Continuous-Integration-Scaling, 07Jenkins, 07Nodepool: Postmortem: Nodepool can't add slaves to Jenkins due to config plugin directory reaching 32k inodes - https://phabricator.wikimedia.org/T127131#2604972 (10hashar) [14:28:01] 10Continuous-Integration-Infrastructure, 06Operations, 07Jenkins, 13Patch-For-Review, 07Wikimedia-Incident: Jenkins files under /var/lib/jenkins/config-history/config need to be garbage collected - https://phabricator.wikimedia.org/T126552#2604970 (10hashar) 05Open>03Resolved From the Gerrit change:... [14:29:01] 03Scap3, 06Services, 10service-runner, 10service-template-node, and 2 others: Enable config deploys for service::node services - https://phabricator.wikimedia.org/T144542#2604980 (10mobrovac) [14:29:04] 03Scap3, 10Citoid, 06Services, 10VisualEditor, 15User-mobrovac: Enable Scap3 config deploys for Citoid - https://phabricator.wikimedia.org/T144597#2604979 (10mobrovac) [14:29:28] 05Continuous-Integration-Scaling, 07Jenkins, 07Nodepool: Postmortem: Nodepool can't add slaves to Jenkins due to config plugin directory reaching 32k inodes - https://phabricator.wikimedia.org/T127131#2604981 (10Paladox) 05stalled>03Open I guess we can now go forward with this task, reopening it now. 
[14:42:17] 03Scap3, 10Citoid, 10ContentTranslation-CXserver, 10Graphoid, and 5 others: Depool and repool SCB services during deploys - https://phabricator.wikimedia.org/T144602#2604993 (10mobrovac) [14:42:32] 03Scap3, 10Citoid, 10ContentTranslation-CXserver, 10Graphoid, and 5 others: Depool and repool SCB services during deploys - https://phabricator.wikimedia.org/T144602#2605007 (10mobrovac) p:05Triage>03Normal [14:43:30] 05Continuous-Integration-Scaling, 07Jenkins, 07Nodepool: Postmortem: Nodepool can't add slaves to Jenkins due to config plugin directory reaching 32k inodes - https://phabricator.wikimedia.org/T127131#2605008 (10hashar) [14:45:26] 10Continuous-Integration-Infrastructure, 07Nodepool, 13Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2605027 (10hashar) [14:49:01] Project language-screenshots-VisualEditor » chrome,Windows 10,ci-jessie-wikimedia build #52: 04FAILURE in 8 hr 0 min: https://integration.wikimedia.org/ci/job/language-screenshots-VisualEditor/BROWSER=chrome,PLATFORM=Windows%2010,label=ci-jessie-wikimedia/52/ [14:59:46] (03PS3) 10Hashar: Revert "Move tox-jessie & co. off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306725 (https://phabricator.wikimedia.org/T143938) [14:59:48] (03PS1) 10Hashar: [labs/striker] port job to Nodepool Jessie instance [integration/config] - 10https://gerrit.wikimedia.org/r/308183 (https://phabricator.wikimedia.org/T143938) [15:01:02] (03CR) 10jenkins-bot: [V: 04-1] Revert "Move tox-jessie & co. 
off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306725 (https://phabricator.wikimedia.org/T143938) (owner: 10Hashar) [15:01:05] (03CR) 10jenkins-bot: [V: 04-1] [labs/striker] port job to Nodepool Jessie instance [integration/config] - 10https://gerrit.wikimedia.org/r/308183 (https://phabricator.wikimedia.org/T143938) (owner: 10Hashar) [15:06:18] (03PS2) 10Hashar: [labs/striker] port job to Nodepool Jessie instance [integration/config] - 10https://gerrit.wikimedia.org/r/308183 (https://phabricator.wikimedia.org/T143938) [15:06:20] (03PS4) 10Hashar: Revert "Move tox-jessie & co. off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306725 (https://phabricator.wikimedia.org/T143938) [15:07:19] (03CR) 10jenkins-bot: [V: 04-1] [labs/striker] port job to Nodepool Jessie instance [integration/config] - 10https://gerrit.wikimedia.org/r/308183 (https://phabricator.wikimedia.org/T143938) (owner: 10Hashar) [15:07:33] (03CR) 10jenkins-bot: [V: 04-1] Revert "Move tox-jessie & co. off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306725 (https://phabricator.wikimedia.org/T143938) (owner: 10Hashar) [15:08:54] ahhh [15:09:45] 10Gerrit: Short commit hashes with all numbers cannot be searched using gerrit - https://phabricator.wikimedia.org/T86035#2605051 (10Aklapper) @Paladox: Your steps have nothing to do with the steps in the initial description. [15:10:04] (03PS5) 10Hashar: Revert "Move tox-jessie & co. off of nodepool" [integration/config] - 10https://gerrit.wikimedia.org/r/306725 (https://phabricator.wikimedia.org/T143938) [15:10:16] (03Abandoned) 10Hashar: [labs/striker] port job to Nodepool Jessie instance [integration/config] - 10https://gerrit.wikimedia.org/r/308183 (https://phabricator.wikimedia.org/T143938) (owner: 10Hashar) [15:10:24] 10Gerrit: Short commit hashes with all numbers cannot be searched using gerrit - https://phabricator.wikimedia.org/T86035#2605055 (10Paladox) >>! 
In T86035#2605051, @Aklapper wrote: > @Paladox: Your steps have nothing to do with the steps in the initial description. No I'm talking about gerrit 2.13, which looks...
[15:17:23] !log Bringing tox jobs to Nodepool with https://gerrit.wikimedia.org/r/#/c/306725/
[15:17:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:17:41] (CR) Hashar: [C: 2] Revert "Move tox-jessie & co. off of nodepool" [integration/config] - https://gerrit.wikimedia.org/r/306725 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[15:18:40] (Merged) jenkins-bot: Revert "Move tox-jessie & co. off of nodepool" [integration/config] - https://gerrit.wikimedia.org/r/306725 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[15:19:24] Continuous-Integration-Infrastructure, Nodepool, Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2605078 (hashar) To bring the tox job back to permanent slaves: * revert https://gerrit.wikimedia.org/r/#/c/306725/ * Rephrase the commit message one line summar...
[15:21:59] Continuous-Integration-Infrastructure, Nodepool, Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2605080 (hashar)
[15:41:53] I have added to https://grafana.wikimedia.org/dashboard/db/continuous-integration a pie chart of the current nodepool capacity.
[15:42:01] there is a tiny box with an arrow that spawns a button which links to the history graph
[15:42:12] also nodepool needs updating to 0.3.0?
[15:42:41] I have brought back the tox jobs to Nodepool after discussion with chase. Revert instructions are on https://phabricator.wikimedia.org/T143938#2605078 if needed
[15:42:48] paladox: can't update it
[15:43:00] Oh
[15:43:02] why?
[15:43:08] paladox: there is a missing dependency and all the commits have to be reviewed
[15:43:15] Oh
[15:58:23] Gerrit: Short commit hashes with all numbers cannot be searched using gerrit - https://phabricator.wikimedia.org/T86035#2605158 (Aklapper) >>! In T86035#2605055, @Paladox wrote: >>>! In T86035#2605051, @Aklapper wrote: >> @Paladox: Your steps have nothing to do with the steps in the initial description. > >...
[15:59:15] Gerrit: Short commit hashes with all numbers cannot be searched using gerrit - https://phabricator.wikimedia.org/T86035#2605177 (Paladox) Yes they do, this task is about short commit hashes, so what I wrote has to do with this task.
[16:01:49] Project selenium-CentralNotice » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #134: FAILURE in 48 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/134/
[16:14:02] thcipriani hashar: o/ hey there! just wanted to ping you folks on whether it's ok for me to upgrade the android emulator plugin
[16:19:35] niedzielski: I was hoping hashar would respond to your email :P I *think* it should be fine (particularly since you folks are the only people using the android emulator plugin) as long as the plugin maintains compatibility with our (somewhat outdated) version of jenkins. I'm not clear if this requires a jenkins restart: probably depends on the nature of the plugin update and I'm sure the UI will
[16:19:37] notify.
[16:20:33] thcipriani: ah thanks! is it best to upgrade through the jenkins gui itself or should i modify a config somewhere?
[16:21:33] niedzielski: yeah, upgrade through the gui [16:21:44] thcipriani: cool :) [16:22:48] niedzielski: yeah, looks like this will require a restart :\ [16:23:15] thcipriani: :/ yeah i was just checking if one of hte other views exposed a "just give me the beans" button [16:23:26] :D [16:24:34] thcipriani: i guess i'll push the button if that's cool with you. do i need to coordinate a reboot? [16:25:50] greg-g: I feel like I should tap someone with opinions besides myself :) Is there any policy for a quick jenkins restart for a plugin upgrade? [16:26:18] or is that mostly a "whenever we feel like it's fine" sort of thing? [16:26:44] CI is in a pretty quiet time, should be minimally disruptive is my feeling here. [16:26:48] 06Release-Engineering-Team, 15User-greg: Create P&T offsite slides (due 9/12) - https://phabricator.wikimedia.org/T144511#2605270 (10demon) [16:27:05] on the bright side, if anything goes wrong, we'll be famous! [16:27:21] oh boy :) [16:30:58] niedzielski: thcipriani: please be bold :] [16:31:27] i think austin powers has a line where he says his "middle name is bold." [16:31:44] niedzielski: that's enough of a blessing for me: go for it! [16:31:47] android emulator is fairly low impact. It is only used by the android apps jobs [16:31:54] but make sure to read the changelog :] [16:32:06] and sometime JJB / the job got to be updated to reflect changes in the plugin [16:34:32] got to leave [16:36:55] thcipriani: i think will need to manually delete the emulator snapshots. i'll be handling changes needed as part of T133183 [16:37:28] hm, no stashbot in this channel [16:37:41] I found it [16:37:49] i believe it was bob marley who said "no stashbot, no cry" [16:38:19] Abraham Lincoln, actually [16:38:38] :) [16:38:53] thcipriani: anyway, i'm gonna push the button to install it. i think i should then get a prompt to restart [16:38:58] cool? [16:39:06] niedzielski: do it :) [16:39:44] thcipriani: i guess it's downloaded already. 
i'm gonna flip on "Restart Jenkins when installation is complete and no jobs are running" [16:39:56] then one day jenkins *should* reboot [16:40:03] niedzielski: kk, I'll log in -operations [16:40:36] thcipriani hashar: sweet! thanks for the help [16:41:01] niedzielski: thanks for pushing all the buttons :) [16:41:14] thcipriani: it was real hard and stuff but i found a way [16:41:22] :D [16:41:37] thcipriani: hope it all works and doesn't require too many changes. should be good on the android side i'm sure [16:42:28] indeed. reading the changelog it looked like lots of improvements without too many changes [16:43:48] thcipriani: yeah, the known issues section has more of the negative stuff in it but it sounded overall pretty nice [16:44:27] thcipriani: our old API 15 emulator configuration was pretty stable but we need some of the newer API functionality and we can't even get the API 15 emulator easily any more (google seems to have removed it from the sdk manager) [16:47:44] Project mediawiki-core-code-coverage build #2238: 15ABORTED in 1 hr 47 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/2238/ [16:48:10] Project language-screenshots-VisualEditor » chrome,Windows 10,ci-jessie-wikimedia build #53: 15ABORTED in 1 hr 9 min: https://integration.wikimedia.org/ci/job/language-screenshots-VisualEditor/BROWSER=chrome,PLATFORM=Windows%2010,label=ci-jessie-wikimedia/53/ [16:49:00] niedzielski: restarting [16:49:14] thcipriani: yayy! [16:50:42] niedzielski: and back [16:51:38] thcipriani: thanks for the help! i'll make any changes needed in androidland [16:51:54] ack, sounds good [17:04:19] Yippee, build fixed! 
[17:04:19] Project selenium-CentralNotice » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #135: 09FIXED in 39 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/135/ [17:19:25] ostriches did you manage to package gerrit 2.12.4? [17:19:34] I started, haven't finished. [17:19:44] ok thanks :) [19:05:28] 05Continuous-Integration-Scaling, 06Operations, 07Nodepool, 07WorkType-NewFunctionality: Backport python-shade from debian/testing to jessie-wikimedia - https://phabricator.wikimedia.org/T107267#2605675 (10hashar) [19:05:30] 10Continuous-Integration-Infrastructure, 07Nodepool: Update nodepool to upstream master branch - https://phabricator.wikimedia.org/T144601#2605674 (10hashar) [19:06:47] 10Continuous-Integration-Infrastructure, 07Nodepool: Update Nodepool to catch up with upstream master branch - https://phabricator.wikimedia.org/T144601#2604952 (10hashar) p:05Triage>03Low [19:08:25] 10Continuous-Integration-Infrastructure, 07Nodepool: Update Nodepool to catch up with upstream master branch - https://phabricator.wikimedia.org/T144601#2605689 (10hashar) It is not going to happen anytime soon. We would need: * get python-shade at an appropriate version T107267 * review logs * rebase custom... [19:15:36] 05Continuous-Integration-Scaling, 06Operations, 07Nodepool, 07WorkType-NewFunctionality: Backport python-shade from debian/testing to jessie-wikimedia - https://phabricator.wikimedia.org/T107267#2605705 (10Paladox) @MoritzMuehlenhoff hi, could you do this please? [19:17:31] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [19:22:06] paladox: stop spamming random people really :D [19:22:17] Oh sorry. 
[19:22:26] upgrading nodepool is NOT a priority
[19:22:31] Ok
[19:22:35] and most probably we would want a test setup
[19:22:39] Ok
[19:22:45] which can't really be made on top of labs instances
[19:22:51] Ok
[19:23:00] but ops have an openstack test cluster. So potentially we could get an updated Nodepool there and play with it
[19:23:12] Ok
[19:23:15] at least there is now a task to upgrade nodepool :] thx for that!!
[19:23:24] Ok and you're welcome :)
[19:23:40] * paladox goes back to fixing puppet-lint
[19:30:13] (PS1) Legoktm: Whitelist harej [integration/config] - https://gerrit.wikimedia.org/r/308227
[19:31:44] (PS1) Hashar: zuul: button to CI grafana board [integration/docroot] - https://gerrit.wikimedia.org/r/308228
[19:32:30] (CR) Hashar: [C: 1] Whitelist harej [integration/config] - https://gerrit.wikimedia.org/r/308227 (owner: Legoktm)
[19:32:38] (CR) Hashar: [C: 2] zuul: button to CI grafana board [integration/docroot] - https://gerrit.wikimedia.org/r/308228 (owner: Hashar)
[19:33:01] (Merged) jenkins-bot: zuul: button to CI grafana board [integration/docroot] - https://gerrit.wikimedia.org/r/308228 (owner: Hashar)
[19:42:30] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:34:51] (CR) Legoktm: [C: 2] Whitelist harej [integration/config] - https://gerrit.wikimedia.org/r/308227 (owner: Legoktm)
[20:35:56] (Merged) jenkins-bot: Whitelist harej [integration/config] - https://gerrit.wikimedia.org/r/308227 (owner: Legoktm)
[20:36:18] !log deploying https://gerrit.wikimedia.org/r/308227
[20:36:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:56:47] Project performance-webpagetest-wpt-org build #1860: FAILURE in 24 min: https://integration.wikimedia.org/ci/job/performance-webpagetest-wpt-org/1860/
[21:23:44] Deployment-Systems: Make mwscript.php also look in WikimediaMaintenance -
https://phabricator.wikimedia.org/T35284#2605998 (demon)
[21:24:53] Deployment-Systems: Make mwscript.php also look in WikimediaMaintenance - https://phabricator.wikimedia.org/T35284#370972 (demon) Open>declined Per @aaron above, this actually is not trivial to do. Plus there's lots of extensions that could have maintenance scripts, not just WikimediaMaintenance. Dec...
[21:27:10] Deployment-Systems: Make mwscript.php also look in WikimediaMaintenance - https://phabricator.wikimedia.org/T35284#2606008 (aaron) My bash completion scripts just use enwiki's directory...which is almost always good enough, heh.
[21:46:11] ostriches, wandering did you manage to finish packaging gerrit 2.12.4?
[21:47:51] wandering = wondering
[21:49:20] paladox: I presume he will respond on the relevant task, you aren't his manager, you don't get to speed up his work on anything ;)
[21:49:35] ok
[21:49:35] s/speed up/focus/
[21:50:08] ok
[21:50:19] sorry
[21:51:06] it's ok, just respect that people are working on many things, not just the things you are watching/working on, and sometimes priorities aren't what you think they should be
[21:51:51] ok
[22:04:56] * Platonides thinks it was a polite inquiry
[22:21:18] let's try this again
[22:21:27] scap is complaining, Host key verification failed.
[22:21:36] for deployment-parsoid09.deployment-prep.eqiad.wmflabs
[22:21:48] arlolra: so in beta cluster when you are using scap3 you need to accept the host keys as your own user
[22:22:15] so from the deploy server, do `ssh deployment-parsoid09.deployment-prep.eqiad.wmflabs`
[22:22:17] and then accept the key
[22:22:46] your ssh will fail (no forwarded key) but ~/.ssh/known_hosts will get updated
[22:23:23] this is an annoyance in beta cluster due to our lack of puppet collection for host keys
[22:23:51] hmm, ok, let me try that
[22:25:05] now i'm getting
[22:25:06] Agent admitted failure to sign using the key.
[22:25:06] Permission denied (publickey,keyboard-interactive).
[22:25:53] any idea what that's about? or did i mess up step 1 above
[22:26:37] hmmm... let's check to see if the agent is armed
[22:27:25] it's got a lot of keys added...
[22:27:29] alrighty
[22:27:47] do you know which ssh user parsoid is using?
[22:27:59] I think it would be printed in the log message
[22:28:15] something@host...
[22:29:07] hmmm or maybe not?
[22:29:44] nah, not seeing that in the scrollback
[22:30:15] yeah I checked `scap deploy-log --verbose` and it's not there
[22:30:29] https://github.com/wikimedia/mediawiki-services-parsoid-deploy/tree/master/scap
[22:30:46] ssh_user: deploy-service
[22:31:19] `sudo keyholder status` claims that key is loaded
[22:32:32] * bd808 tries SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -vvv deploy-service@deployment-parsoid09.deployment-prep.eqiad.wmflabs
[22:32:56] the key is not being accepted on the target host...
[22:33:45] arlolra: is this the first scap3 deploy to that host?
[22:34:16] no, mobrovac had done it previously
[22:34:27] maybe the second though
[22:34:54] scap3 is new to us
[22:34:54] I'm going to ssh over there and see if I can spot anything in the auth log
[22:35:14] thanks for the help
[22:35:39] are you in boise today?
[22:35:46] Sure! Using new deploy tools can be frustrating
[22:35:51] yup
[22:36:04] that's a real nice room you've got there
[22:36:46] thanks :)
[22:36:59] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T144644#2606121 (greg)
[22:37:23] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2564761 (greg)
[22:38:06] bd808, arlolra, real nice house even. :)
[22:40:24] thanks subbu. we are pretty happy with it :)
[22:40:44] :)
[22:40:52] arlolra: so I'm seeing the ssh server reject a long list of keys from the agent. Now I guess the mystery is why?
[22:42:59] i'm sure your guess is better than mine
[22:43:22] whoop a bit behind on this
[22:43:27] I think I know what's happening
[22:43:35] agent admitted failure to sign key?
[22:43:48] that's a keyholder thing...
[22:43:48] thcipriani: {{sofixit}}
[22:43:52] :D
[22:44:15] ssh daemon on deployment-parsoid09 is rejecting the keys
[22:45:06] oh
[22:45:43] and why did it work for Marko?
[22:45:44] uhh, I can get in
[22:45:51] arlolra: try it now
[22:45:59] you weren't in the deploy-service group
[22:46:05] so I added you
[22:46:05] ah!
[22:46:21] workin'
[22:46:24] thanks!
[22:46:34] Finished Deploy: parsoid/deploy (duration: 00m 18s)
[22:46:37] absotively, sorry about the hassle :)
[22:46:38] thcipriani: maybe I'm not in beta either?
[22:46:41] \o/
[22:46:47] * thcipriani looks
[22:47:13] is subbu there? 'cause he'll want to update as well
[22:47:48] arlolra: what's subbu's labs username?
[22:48:45] good question
[22:50:05] bd808: you weren't in that group either. I added you. Currently this is a local group on deployment-tin. Which is horrible and should be moved to ldap, but I'm unclear how to do that. Do you know if there are any docs for that?
[22:50:30] i imagine it's the same as prod
[22:50:30] https://github.com/wikimedia/operations-puppet/commit/046f1f4ba3c993fcfa464057869bfc20e4bf4db1
[22:50:38] ssastry, cscott, arlolra
[22:50:48] * thcipriani adds
[22:51:03] beta doesn't use admin/data.yaml afaik...
[22:51:46] thcipriani: hmmm.. there's not really a good way to manage that sort of thing in ldap that works 100% of the time
[22:51:49] arlolra: cool, found all those users and added.
[22:51:58] much thanks
[22:52:25] thcipriani: in theory that's what service groups are for, but last I heard they were sketchy outside of the tools project
[22:52:26] looks like that deploy went well from my manual tests in labs. calling it {{done}}
[22:52:42] bd808, thcipriani: thank you both. have a nice weekend
[22:52:53] arlolra: great, you too!
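Note: the exchange above traces the "Agent admitted failure to sign using the key" error to the deploying user not being in the deploy-service group on the deploy server. A minimal shell sketch of checking that membership before a deploy might look like the following. The helper names has_group and check_deploy_group are hypothetical (they are not part of scap or keyholder), and the idea that group membership is what gates use of the shared key is taken from this conversation only.

```shell
#!/bin/sh
# Hypothetical helper: succeed if group name $2 appears as a whole word
# in the space-separated group list $1.
has_group() {
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Hypothetical helper: report whether a user is in the group that, per the
# conversation, is allowed to use the shared deploy key.
# The default group name "deploy-service" comes from the log; adjust as needed.
check_deploy_group() {
    user=$1
    group=${2:-deploy-service}
    # id -nG prints the names of all groups the user belongs to.
    user_groups=$(id -nG "$user" 2>/dev/null) || {
        echo "unknown user: $user"
        return 2
    }
    if has_group "$user_groups" "$group"; then
        echo "$user is in $group"
    else
        echo "$user is NOT in $group"
        return 1
    fi
}
```

On the deploy server this could be run as `check_deploy_group arlolra` for each person expected to deploy. It only reports membership; adding a user to the group remains a manual or puppet-managed step.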
[22:53:44] bd808: hrm, well, I guess what I'm doing works until deployment-tin is no longer a thing... :((
[22:53:56] (not that we're planning on that)
[22:54:07] (but that's mostly what I worry about)
[22:54:22] thcipriani: you could probably manage the group in tin with hiera somehow
[22:54:57] yeah, that sounds smarter.
[22:55:15] arlolra: you too. thanks for being patient
[22:56:01] would that my patience was always so lightly tested (that was nothing)
[22:56:15] thcipriani: it would be really nice to figure out how to solve the host key collection thing too, even if it's a bit of a hack
[23:00:44] Beta-Cluster-Infrastructure, Scap3: Fixup beta scap3 keyholder problems - https://phabricator.wikimedia.org/T144647#2606197 (thcipriani)
[23:00:52] ^ bd808
[23:01:44] :)
[23:02:10] I'll try to look at this next week. It's one of those weird things where the solutions are so quick that we haven't gotten around to "fixing" it :\
[23:02:40] "quick" but also opaque to everyone who hasn't hit this error before.
[23:02:47] *these errors
[23:03:17] Beta-Cluster-Infrastructure, Scap3: Fixup beta scap3 keyholder problems - https://phabricator.wikimedia.org/T144647#2606210 (bd808) In my Striker project I solved the group thing by putting this in hiera: ``` scap::server::keyholder_agents: deploy-service: trusted_groups: - wikide...
[23:20:54] Release-Engineering-Team, User-greg: "publish" draft Q2 goals (due 9/2) - https://phabricator.wikimedia.org/T144549#2606233 (greg) draft done for now, will improve/clarify more next week
[23:21:03] Release-Engineering-Team, User-greg: "publish" draft Q2 goals (due 9/2) - https://phabricator.wikimedia.org/T144549#2606234 (greg) Open>Resolved
[23:25:22] Release-Engineering-Team, Operations, User-greg, Wikimedia-Incident: Institute quarterly(?) review of incident reports and follow-up - https://phabricator.wikimedia.org/T141287#2606273 (greg) >>! In T141287#2565426, @greg wrote: > One easy thing to do would be to ping all of the very old tasks (e...