[00:00:53] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39416 bytes in 0.834 second response time [00:14:28] 10Deployment-Systems, 3Scap3, 3releng-201516-q3, 7WorkType-NewFunctionality: [keyresult] Migrate the MW weekly train deploy to scap3 - https://phabricator.wikimedia.org/T114313#1946373 (10greg) [00:15:04] 10Continuous-Integration-Infrastructure, 10Differential, 5Gerrit-Migration, 3releng-201516-q2, 3releng-201516-q3: [keyresult] Connect Differential code review with continuous integration - https://phabricator.wikimedia.org/T31#1946378 (10greg) [00:18:24] 10Continuous-Integration-Infrastructure, 3releng-201516-q3, 7Jenkins: [keyresult] Migrate Jenkins to Jessie (gallium -> cobalt) - https://phabricator.wikimedia.org/T124121#1946384 (10greg) 3NEW [00:18:36] 10Continuous-Integration-Infrastructure: Phase out gallium.wikimedia.org - https://phabricator.wikimedia.org/T95757#1946392 (10greg) [00:18:38] 10Continuous-Integration-Infrastructure, 3releng-201516-q3, 7Jenkins: [keyresult] Migrate Jenkins to Jessie (gallium -> cobalt) - https://phabricator.wikimedia.org/T124121#1946391 (10greg) [00:20:42] 6Release-Engineering-Team, 3releng-201516-q2, 15User-greg: QR action item: Phabricator - https://phabricator.wikimedia.org/T115176#1946406 (10greg) 5Open>3Invalid a:3greg This stuff is better tracked in other tasks already. [00:21:31] 5Continuous-Integration-Scaling, 3releng-201516-q2, 15User-greg: [keyresult] CI cluster responds to spike in queued builds by starting and registering additional jenkins slaves - https://phabricator.wikimedia.org/T111106#1946410 (10greg) 5Open>3Resolved a:3greg Per confirmation with @hashar, this is done. 
[00:21:42] 5Continuous-Integration-Scaling, 3releng-201516-q2, 15User-greg: [keyresult] CI cluster responds to spike in queued builds by starting and registering additional jenkins slaves - https://phabricator.wikimedia.org/T111106#1946415 (10greg) a:5greg>3hashar [00:22:50] 3releng-201516-q2, 15User-greg: [keyresult] Code review RFC (Differential) - Write, publish, publicize, and respond to/incorporate feedback. - https://phabricator.wikimedia.org/T114311#1946416 (10greg) 5Open>3Resolved a:3greg This step (Q2 quarterly goal) is done, the work on the actual RFC is at T119908. [00:23:11] 3releng-201516-q2: [keyresult] Code review RFC (Differential) - Write, publish, publicize, and respond to/incorporate feedback. - https://phabricator.wikimedia.org/T114311#1946420 (10greg) a:5greg>3demon [00:23:19] 5Continuous-Integration-Scaling, 3releng-201516-q2: [keyresult] CI cluster responds to spike in queued builds by starting and registering additional jenkins slaves - https://phabricator.wikimedia.org/T111106#1594588 (10greg) [00:46:00] 5Gerrit-Migration, 7Documentation: Update Commit Message Guidelines for phab - https://phabricator.wikimedia.org/T123081#1946499 (10greg) [00:46:02] 5Gerrit-Migration, 7Documentation: Update Code Review related documentation on wiki pages - https://phabricator.wikimedia.org/T207#1946498 (10greg) [00:48:31] 10Differential, 5Gerrit-Migration, 15User-greg: Phabricator and Arcanist versions must be kept in sync because of occasional breaking api changes between phabricator versions. - https://phabricator.wikimedia.org/T91422#1946516 (10greg) 5Open>3Resolved a:3greg This task is a statement of fact and not an... 
[00:48:33] 5Gerrit-Migration, 15User-greg: Identify Arcanist showstoppers for wikimedians - https://phabricator.wikimedia.org/T597#1946519 (10greg) [01:05:39] 01:04:10 ERROR:zuul.Repo:Unable to initialize repo for https://gerrit.wikimedia.org/r/p/mediawiki/extensions/DataValues [01:06:05] https://integration.wikimedia.org/ci/job/mwext-Validator-testextension-php53/3/console [01:09:40] (03PS1) 10Reedy: Remove dependancy on DataValues as it's been removed from Gerrit... [integration/config] - 10https://gerrit.wikimedia.org/r/265179 [01:10:08] 6Release-Engineering-Team, 15User-greg: Write up idea from talk with Quim (meta "don't forget" task) - https://phabricator.wikimedia.org/T124125#1946614 (10greg) 3NEW a:3greg [01:11:19] (03CR) 10jenkins-bot: [V: 04-1] Remove dependancy on DataValues as it's been removed from Gerrit... [integration/config] - 10https://gerrit.wikimedia.org/r/265179 (owner: 10Reedy) [01:12:01] Oh [01:12:39] (03PS2) 10Reedy: Remove dependancy on DataValues as it's been removed from Gerrit... [integration/config] - 10https://gerrit.wikimedia.org/r/265179 [01:20:15] Project beta-update-databases-eqiad build #5915: 04FAILURE in 14 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/5915/ [02:21:57] Yippee, build fixed! 
[02:21:57] Project beta-update-databases-eqiad build #5916: 09FIXED in 1 min 55 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/5916/ [03:07:45] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:08:13] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:12:38] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 31574 bytes in 2.312 second response time [03:13:04] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 39398 bytes in 0.593 second response time [03:32:44] Yippee, build fixed! [03:32:44] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #948: 09FIXED in 50 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/948/ [09:51:41] hashar: Please review https://gerrit.wikimedia.org/r/#/c/264333/ [09:52:25] hashar: Also would you know how we could add support for submodules. Would we add it to the unittests and qunittests [10:05:03] paladox: good morning [10:05:11] sorry I am catching up with emails from last two days [10:05:28] hashar: Ok. [10:05:32] I had lost access to my dev mail address so there is a rather long backlog :-} [10:05:53] if you added me as a reviewer in Gerrit, it sends a mail notification [10:06:02] so I got the change somewhere in my huge mail inbox [10:09:52] PROBLEM - Puppet failure on deployment-ms-be01 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0] [10:14:50] RECOVERY - Puppet failure on deployment-ms-be01 is OK: OK: Less than 1.00% above the threshold [0.0] [10:16:41] hashar: Ok. 
[10:25:45] PROBLEM - Puppet failure on deployment-ms-be01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [10:49:14] (03PS1) 10Hashar: [OATHAuth] add composer-test [integration/config] - 10https://gerrit.wikimedia.org/r/265236 [10:49:26] (03CR) 10Hashar: [C: 032] [OATHAuth] add composer-test [integration/config] - 10https://gerrit.wikimedia.org/r/265236 (owner: 10Hashar) [10:50:47] (03Merged) 10jenkins-bot: [OATHAuth] add composer-test [integration/config] - 10https://gerrit.wikimedia.org/r/265236 (owner: 10Hashar) [10:50:48] 10Continuous-Integration-Infrastructure, 5Patch-For-Review, 7Technical-Debt, 7Tracking: All repositories should pass jshint test (tracking) - https://phabricator.wikimedia.org/T62619#1947384 (10TheDJ) [10:52:42] (03CR) 10Hashar: "Validated on source change https://gerrit.wikimedia.org/r/#/c/265154/ ;-} Thanks!" [integration/config] - 10https://gerrit.wikimedia.org/r/265236 (owner: 10Hashar) [12:03:35] 6Release-Engineering-Team, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release8, 3LE-CX8-Sprint 1, and 2 others: Review and create CX Parallel corpora table - https://phabricator.wikimedia.org/T120815#1947513 (10hashar) Jaime has agreed on the table schema and adding a new table is not a... [12:05:53] 6Release-Engineering-Team, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release8, 3LE-CX8-Sprint 1, and 2 others: Review and create CX Parallel corpora table - https://phabricator.wikimedia.org/T120815#1861819 (10hashar) (to clarify: I am not going to handle the table creation but we can... [12:50:41] 6Release-Engineering-Team, 6Phabricator, 10Phabricator-Sprint-Extension: Let's all stay in the loop on the Projects V3 update - https://phabricator.wikimedia.org/T120276#1947628 (10Aklapper) Is there anything actionable in this task which is not already covered by T120772? Or is this blocked on T120772? I w... 
[13:22:05] (03PS1) 10Hashar: [ReaderFeedback] Drop jslint job [integration/config] - 10https://gerrit.wikimedia.org/r/265256 (https://phabricator.wikimedia.org/T63622) [13:27:11] (03CR) 10Hashar: [C: 032] [ReaderFeedback] Drop jslint job [integration/config] - 10https://gerrit.wikimedia.org/r/265256 (https://phabricator.wikimedia.org/T63622) (owner: 10Hashar) [13:29:23] (03Merged) 10jenkins-bot: [ReaderFeedback] Drop jslint job [integration/config] - 10https://gerrit.wikimedia.org/r/265256 (https://phabricator.wikimedia.org/T63622) (owner: 10Hashar) [13:37:28] 6Release-Engineering-Team, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release8, 3LE-CX8-Sprint 1, and 2 others: Review and create CX Parallel corpora table - https://phabricator.wikimedia.org/T120815#1947756 (10Nikerabbit) The table belongs to the shared `wikishared` or `extension1` (II... [13:45:25] 6Release-Engineering-Team, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release8, 3LE-CX8-Sprint 1, and 2 others: Review and create CX Parallel corpora table - https://phabricator.wikimedia.org/T120815#1947782 (10jcrespo) Maybe I can modify `sql.php` to allow this on a separate issue? It... 
[14:11:34] 10Beta-Cluster-Infrastructure: Setup a Swift cluster on beta-cluster to match production - https://phabricator.wikimedia.org/T64835#1947857 (10fgiunchedi) [14:14:22] 10Beta-Cluster-Infrastructure, 6Commons, 10MediaWiki-File-management, 6Multimedia: Thumbnail generation should happen via the same setup in the beta cluster and in production (tracking) - https://phabricator.wikimedia.org/T84950#1947870 (10hashar) [14:14:26] 10Beta-Cluster-Infrastructure, 6Release-Engineering-Team, 7Tracking: Use Beta cluster as a true canary for code deployments (tracking) - https://phabricator.wikimedia.org/T53494#1947871 (10hashar) [14:14:28] 10Beta-Cluster-Infrastructure: Setup a Swift cluster on beta-cluster to match production - https://phabricator.wikimedia.org/T64835#1947867 (10hashar) 5stalled>3Open @fgiunchedi working on it. Hence the task is no more stalled. [14:16:40] Project beta-scap-eqiad build #87017: 04FAILURE in 6 min 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87017/ [14:25:01] 10Deployment-Systems, 6Release-Engineering-Team: sync-dir doesn't like Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #760: 04FAILURE in 2 min 41 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/760/ [14:28:52] 10Beta-Cluster-Infrastructure: Setup a Swift cluster on beta-cluster to match production - https://phabricator.wikimedia.org/T64835#1947921 (10fgiunchedi) I've setup a swift cluster with backends `deployment-ms-be01` / `deployment-ms-be02` and the frontend at `deployment-ms-fe01`. Each backend is a xlarge instan... 
[14:30:08] 10Deployment-Systems, 6Release-Engineering-Team, 5Patch-For-Review: (03PS1) 10Hashar: Add puppet/tox to kafkatee/varnishkafka puppet modules [integration/config] - 10https://gerrit.wikimedia.org/r/265276 [14:51:25] (03CR) 10Hashar: [C: 032] Add puppet/tox to kafkatee/varnishkafka puppet modules [integration/config] - 10https://gerrit.wikimedia.org/r/265276 (owner: 10Hashar) [14:51:29] 10Deployment-Systems, 6Release-Engineering-Team, 5Patch-For-Review: sync-dir doesn't like (03Merged) 10jenkins-bot: Add puppet/tox to kafkatee/varnishkafka puppet modules [integration/config] - 10https://gerrit.wikimedia.org/r/265276 (owner: 10Hashar) [15:01:53] Yippee, build fixed! [15:01:54] Project beta-scap-eqiad build #87023: 09FIXED in 7 min 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87023/ [15:02:53] (03PS2) 10Hashar: [RandomRootPage] Archived extension [integration/config] - 10https://gerrit.wikimedia.org/r/264975 (owner: 10Paladox) [15:03:03] (03PS3) 10Hashar: [RandomRootPage] Archived extension [integration/config] - 10https://gerrit.wikimedia.org/r/264975 (https://phabricator.wikimedia.org/T109809) (owner: 10Paladox) [15:04:41] (03CR) 10Hashar: [C: 032] "Got merged in core ( T109809 )" [integration/config] - 10https://gerrit.wikimedia.org/r/264975 (https://phabricator.wikimedia.org/T109809) (owner: 10Paladox) [15:05:48] (03Merged) 10jenkins-bot: [RandomRootPage] Archived extension [integration/config] - 10https://gerrit.wikimedia.org/r/264975 (https://phabricator.wikimedia.org/T109809) (owner: 10Paladox) [15:09:00] (03PS2) 10Hashar: [SiteSettings] Replace jslint test with npm and add composer-test [integration/config] - 10https://gerrit.wikimedia.org/r/265266 (owner: 10Paladox) [15:09:16] (03CR) 10Hashar: [C: 032] [SiteSettings] Replace jslint test with npm and add composer-test [integration/config] - 10https://gerrit.wikimedia.org/r/265266 (owner: 10Paladox) [15:10:30] (03Merged) 10jenkins-bot: [SiteSettings] Replace jslint test with npm 
and add composer-test [integration/config] - 10https://gerrit.wikimedia.org/r/265266 (owner: 10Paladox) [15:54:03] 6Release-Engineering-Team, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release8, 3LE-CX8-Sprint 1, and 2 others: Review and create CX Parallel corpora table - https://phabricator.wikimedia.org/T120815#1948184 (10hashar) Indeed MediaWiki `maintenance/sql.php` relies on a $wgDBName. So i... [16:00:05] 6Release-Engineering-Team, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release8, 3LE-CX8-Sprint 1, and 2 others: Review and create CX Parallel corpora table - https://phabricator.wikimedia.org/T120815#1948202 (10Krenair) `mwscript sql.php testwiki --cluster extension1` WFM. [16:03:06] 6Release-Engineering-Team, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release8, 3LE-CX8-Sprint 1, and 2 others: Review and create CX Parallel corpora table - https://phabricator.wikimedia.org/T120815#1948208 (10jcrespo) Oh, so no need to change it. @santosh, follow Krenair's advice. Let... [16:03:46] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:03:48] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:08:38] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 31574 bytes in 1.230 second response time [16:08:38] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39760 bytes in 1.292 second response time [16:10:02] (03CR) 10Paladox: "Thanks." [integration/config] - 10https://gerrit.wikimedia.org/r/265266 (owner: 10Paladox) [16:17:32] 6Release-Engineering-Team, 6Phabricator, 10Phabricator-Sprint-Extension: Let's all stay in the loop on the Projects V3 update - https://phabricator.wikimedia.org/T120276#1948225 (10greg) This task is mostly superseded by Z308. 
I encouraged @DStrine to create tasks for tracking upcoming issues in Phab deploys... [17:02:48] hashar: It seems that test [17:02:48] beta-mediawiki-config-update-eqiad [17:02:49] has gone down again. it did this yesterday and now there is a queue at https://integration.wikimedia.org/zuul/ [17:03:23] https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ [17:03:29] https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update-eqiad/ [17:03:42] https://integration.wikimedia.org/ci/computer/deployment-bastion.eqiad/ [17:06:49] !log clearing files from beta-cluster to prepare for Swift migration. python pwb.py delete.py -family:betacommons -lang:en -cat:'GWToolset Batch Upload' -verbose -putthrottle:0 -summary:'Clearing out old batched upload to save up disk space for Swift migration' [17:06:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [17:20:04] 10Beta-Cluster-Infrastructure: beta equivalent of noc.wikimedia.org/conf - https://phabricator.wikimedia.org/T42819#1948508 (10Krenair) [17:32:59] Reedy: nice, you got D100 :) [17:33:13] heh, not particularly intentional :D [17:33:58] 10Deployment-Systems, 3Scap3: Bug in scap3 git submodule url rewriting - https://phabricator.wikimedia.org/T121884#1948571 (10mmodell) a:3thcipriani Tyler says this is fixed but he forgot to close it. @thcipriani: can you link to the commit if it's not too difficult to find? 
:) [17:34:10] 10Deployment-Systems, 3Scap3: Bug in scap3 git submodule url rewriting - https://phabricator.wikimedia.org/T121884#1948581 (10mmodell) p:5Triage>3Normal [17:34:41] 10Deployment-Systems, 3Scap3, 7WorkType-NewFunctionality: default lock file for scap3 should be repo-dependent - https://phabricator.wikimedia.org/T116208#1948584 (10mmodell) p:5Normal>3High [17:39:24] still deleting a bunch of files on the beta cluster [17:39:28] will take a while [17:39:48] idling / dinner etc while the bots delete [17:40:14] Oh and good news: there is some Swift cluster on beta cluster thanks to godog !!!!!! [17:40:48] 10Deployment-Systems, 6Release-Engineering-Team, 5Patch-For-Review: 3Resolved a:3Reedy This is resolved by the patch above being merged. Still the discrepancy to look at in T124171 [17:40:57] 10Deployment-Systems, 6Release-Engineering-Team: sync-dir doesn't like scap seemingly allows PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:59:45] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:07:28] 6Release-Engineering-Team, 6Phabricator, 10Phabricator-Sprint-Extension: Let's all stay in the loop on the Projects V3 update - https://phabricator.wikimedia.org/T120276#1948681 (10Aklapper) /me agrees with @greg [18:08:54] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39424 bytes in 3.892 second response time [18:09:38] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 31573 bytes in 0.576 second response time [18:11:12] (03PS1) 10Gilles: Remove last thumbor repos [integration/config] - 10https://gerrit.wikimedia.org/r/265298 [18:13:08] 6Release-Engineering-Team, 6Phabricator, 10Phabricator-Sprint-Extension: Let's all stay in the loop on the Projects 
V3 update - https://phabricator.wikimedia.org/T120276#1948694 (10DStrine) Z308 is really helpful. I think we can resolve this task. [18:15:56] 6Release-Engineering-Team, 6Phabricator, 10Phabricator-Sprint-Extension: Let's all stay in the loop on the Projects V3 update - https://phabricator.wikimedia.org/T120276#1948736 (10Aklapper) 5Open>3Resolved a:3Aklapper [18:25:19] 6Release-Engineering-Team, 6Phabricator, 10Phabricator-Sprint-Extension: Let's all stay in the loop on the Projects V3 update - https://phabricator.wikimedia.org/T120276#1948816 (10greg) {icon thumbs-o-up} [18:30:01] 10Deployment-Systems, 5codfw-rollout, 3codfw-rollout-Apr-Jun-2015: Selecting configuration files depending on the realm of the current (bastion) server isn't always sensible - https://phabricator.wikimedia.org/T46889#1948872 (10Aklapper) [18:42:36] 10Beta-Cluster-Infrastructure: Document how and when the configuration is propagated to the beta cluster - https://phabricator.wikimedia.org/T124198#1948921 (10Dereckson) 3NEW [18:54:36] 10Beta-Cluster-Infrastructure: Document how and when the configuration is propagated to the beta cluster - https://phabricator.wikimedia.org/T124198#1948999 (10hashar) It is mostly Jenkins driven via a job that runs every 10 or so minutes. There is some overview at https://wikitech.wikimedia.org/wiki/Nova_Resour... [18:55:31] 10Beta-Cluster-Infrastructure: Document how and when the configuration is propagated to the beta cluster - https://phabricator.wikimedia.org/T124198#1949002 (10hashar) For the root cause, I noticed the Jenkins jobs running on beta cluster are deadlocked. Which is hitting us from time to time :( Fixing it. [18:55:48] !log disconnecting Gearman plugin to remove deadlock for beta cluster rjobs [18:55:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [18:56:08] so I just ran into a weird situation with bootstrapping a brand new repo for use with differential. 
When you create the repo in diffusion it's completely empty, so no master branch, no commits. [18:56:56] then you try to use differential to create the patch which adds .arcconfig, but when you try to `arc land` the patch, arc fails because it can't push to a remote branch that doesn't exist [18:56:59] !log beta cluster code has been stalled for roughly 2h30 [18:57:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [18:57:29] I don't know if it's a bug in arc or if we should just be sure to always push the first commit, bypassing differential. [18:57:34] ostriches: ^ [19:02:02] Probably the latter. [19:02:17] Assuming the first commit is .arcconfig :) [19:02:49] Feature request: let Phab create the new repo on disk with a master branch & initial commit with .arcconfig populated :p [19:03:11] I'm sure we could hook into the repo creation process [19:13:53] that would be a really nice feature. [19:14:01] as long as it's somehow optional anyway [19:14:02] !log beta: sudo rm /data/project/upload7/*/*/lockdir/* [19:14:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:19:50] !log beta: sudo find /data/project/upload7/*/*/temp -type f -delete [19:19:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:25:02] Project beta-code-update-eqiad build #89370: 04FAILURE in 2 min 7 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/89370/ [19:26:36] !log Nuked all files from http://commons.wikimedia.beta.wmflabs.org/wiki/Category:GWToolset_Batch_Upload [19:26:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:40:46] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:43:41] 10Deployment-Systems, 6Release-Engineering-Team: sync-dir doesn't like !log beta : foreachwiki deleteArchivedRevisions.php -delete [19:49:57] Logged 
the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:50:14] !log beta: on commons ran deleteArchivedFile.php : Nuked 7130 files [19:50:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:50:39] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39741 bytes in 1.150 second response time [20:05:51] !log beta sudo find /data/project/upload7/math -type f -delete (probably some old left over) [20:05:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:33:35] thcipriani: magic! https://phabricator.wikimedia.org/P2496 [20:34:29] Now, if I committed that and cloned it, I'm curious what it'd look like transitively. [20:34:37] ie: would it even work if the path isn't identical? [20:34:50] (in my local setup, it isn't) [20:41:50] so it doesn't just substitute the absolute url in the gitmodules file? [20:42:01] and what do you mean the path isn't identical? which path? [20:42:38] ostriches: ^ [20:43:01] No, the .gitmodules file keeps the ../ [20:43:57] that is some interesting magic. git is full of obscure conveniences [20:44:17] Ah yes, my theory is correct. [20:44:18] it's almost as if that feature is made for gerrit [20:45:00] https://phabricator.wikimedia.org/P2497 [20:45:38] It totally works in just vanilla git, but you've gotta keep the directory structure the same or git can't figure it out. [20:46:16] So if you clone from ../foo into bar/baz, when you clone bar over to bar2/, bar/../foo has to exist [20:46:24] weird. [20:46:46] well, I suppose that's expected behavior, it's just not a feature I knew about until this morning :) [20:48:54] I'm was mainly just curious if we could avoid the submodule rewriting, but we can't ensure our structure on tin matches that of gerrit or Phab really... 
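Note on the relative submodule URLs discussed above: when `.gitmodules` keeps a URL starting with `../`, git resolves it against the superproject's remote URL, which is why the neighbouring directory has to exist relative to wherever the clone came from. The following is only a simplified sketch of that resolution in Python, not git's actual code. The function name `resolve_submodule_url` and the example URLs and paths are hypothetical, and cases such as scp-style `host:path` remotes are deliberately ignored.

```python
def resolve_submodule_url(parent_url: str, sub_url: str) -> str:
    """Resolve a submodule URL against the superproject's remote URL.

    Simplified illustration: only leading "./" and "../" segments
    are handled, and the parent URL is assumed to use "/" separators.
    """
    # URLs that are not relative are used exactly as written.
    if not sub_url.startswith(("./", "../")):
        return sub_url

    base = parent_url.rstrip("/")
    rest = sub_url
    while True:
        if rest.startswith("../"):
            # Each "../" drops the last path component of the parent URL.
            rest = rest[3:]
            base = base.rsplit("/", 1)[0]
        elif rest.startswith("./"):
            rest = rest[2:]
        else:
            break
    return f"{base}/{rest}"


if __name__ == "__main__":
    # Hypothetical remote: the sibling repository is looked up next to the parent.
    print(resolve_submodule_url("https://git.example.org/r/bar", "../foo"))
    # Hypothetical local clone source: /srv/foo must exist beside /srv/bar.
    print(resolve_submodule_url("/srv/bar", "../foo"))
```

Under these assumptions, a clone whose origin is `/srv/bar` looks for the submodule at `/srv/foo`, which matches the observation that `bar/../foo` has to exist and that the layout cannot be relied on when the deploy host's structure differs from the code host's.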
[20:50:17] thcipriani, can you check https://grafana.wikimedia.org/dashboard/db/varnish-aggregate-client-status-codes?from=1453312183647&to=1453322983647&var-site=All&var-cache_type=text&var-status_type=5 [20:50:29] * thcipriani looks [20:50:37] it happened just after the latest mediawiki update [20:51:13] I cannot reproduce the 503, but commons is not very happy [20:51:35] hmmm, that does seem to correlate very closely with the update [20:52:03] is text- commons, all datacenters [20:52:36] so I am assuming application-related [20:53:48] huge spike in dbperformance logging at same time. [20:53:55] ostriches: Can you help with https://phabricator.wikimedia.org/T119588 ? Kill that SyntaxHighlight_Pygments repo in gerrit and phab. It was created as copy of SyntaxHighlight_Geshi by mistake and never used. And ironically, SH_Geshi is now Pygments and SH_Pygments uses Geshi. SH_Geshi is canonical and latest. To be renamed to plain "SyntaxHighlight" soon. [20:54:18] Cite https://phabricator.wikimedia.org/T103614 for that [20:54:24] I already deleted SyntaxHighlight_Pygments in gerrit. [20:54:35] There was some task [20:54:49] lots of session errors in logstash for commonswiki, that's about all I can see there. [20:56:03] Thousands of "Can neither load the session nor create an empty session" [20:56:35] 10Deployment-Systems, 6Release-Engineering-Team: sync-dir doesn't like I think there is a session issue site wide [21:03:06] 10Deployment-Systems, 6Release-Engineering-Team: sync-dir doesn't like ostriches: hello :) Do you have some doc about deleting Gerrit repos ? [21:07:09] just wondering, I guess I missing the appropriate permission [21:08:14] spike in 5xxs seems related to the session problems, seemingly no logs of them prior to the deployment. I don't think it's a coincidence, especially considering the error doesn't exist in the fatalmonitor for enwiki [21:09:00] security issue, too? 
https://phabricator.wikimedia.org/T124224 [21:09:21] was session stuff deployed at the same time? [21:10:28] hashar: Any gerrit admin can do it, just requires command line [21:10:37] (ssh to gerrit command line, that is) [21:10:49] https://logstash.wikimedia.org/#dashboard/temp/AVJg3_N7ptxhN1XaPLH5 [21:11:04] `ssh -p 29418 gerrit.wikimedia.org deleteproject delete --yes-really-delete foo/bar` [21:11:30] oh [21:11:42] I was trying to get the help ` gerrit deleteproject delete --help` [21:11:46] gerrit: deleteproject: not found [21:11:47] :D [21:11:57] so assumed the plugin is disabled / or I lack perm [21:12:08] will give it a try tomorrow with the full delete comman [21:12:08] d [21:33:13] (03PS1) 10Paladox: [SemanticSifter] Update jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/265393 [21:33:55] (03PS2) 10Paladox: [SemanticSifter] Update jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/265393 [21:36:28] (03CR) 10jenkins-bot: [V: 04-1] [SemanticSifter] Update jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/265393 (owner: 10Paladox) [21:40:39] (03CR) 10Paladox: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/265393 (owner: 10Paladox) [21:44:25] (03CR) 10Hashar: [C: 032] Remove last thumbor repos [integration/config] - 10https://gerrit.wikimedia.org/r/265298 (owner: 10Gilles) [21:44:27] Reedy: Hi im getting an error in https://integration.wikimedia.org/ci/job/tox-jessie/3859/console the error has nothing todo with the patch i uploaded. 
It is saying the error is comming from FAIL: test_zuul_project_in_gerrit.test_zuul_project_in_gerrit('thumbor/multi-handler',) [21:44:35] FAIL: test_zuul_project_in_gerrit.test_zuul_project_in_gerrit('thumbor/video-loader',) [21:49:31] (03Merged) 10jenkins-bot: Remove last thumbor repos [integration/config] - 10https://gerrit.wikimedia.org/r/265298 (owner: 10Gilles) [21:50:26] (03CR) 10Hashar: "Deployed, they are gone from CI" [integration/config] - 10https://gerrit.wikimedia.org/r/265298 (owner: 10Gilles) [22:01:36] (03CR) 10Paladox: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/265393 (owner: 10Paladox) [22:16:52] 10Deployment-Systems, 3Scap3: Bug in scap3 git submodule url rewriting - https://phabricator.wikimedia.org/T121884#1949925 (10thcipriani) 5Open>3Resolved Yup, should be fixed. Problem was the bad assumption that the name and path of a submodule will always be the same. This is fixed in master. Not sure i... [22:24:12] (03Abandoned) 10Paladox: [SacredText] Update Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/245967 (owner: 10Paladox) [22:30:00] (03PS4) 10Paladox: [BlueSky] Update Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/247309 [22:39:53] 6Release-Engineering-Team, 6Team-Practices: One year review of RelEng offsite outcomes (April 2016) - https://phabricator.wikimedia.org/T112763#1950058 (10KLans_WMF) [22:41:19] (03PS4) 10Paladox: [examples] Update jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/244747 [22:46:03] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:46:28] (03PS8) 10Paladox: Update extensions using the deprecated qunit test [integration/config] - 10https://gerrit.wikimedia.org/r/245495 [22:46:45] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:47:49] (03CR) 10Paladox: "Adding @Florianschmidtwelzow and 
@Jdlrobson since they work on MobileFrontend." [integration/config] - 10https://gerrit.wikimedia.org/r/245495 (owner: 10Paladox) [22:48:09] (03PS4) 10Paladox: [CodeMirror] Update Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/255141 [22:48:58] (03CR) 10jenkins-bot: [V: 04-1] Update extensions using the deprecated qunit test [integration/config] - 10https://gerrit.wikimedia.org/r/245495 (owner: 10Paladox) [22:50:40] (03PS9) 10Paladox: Update extensions using the deprecated qunit test [integration/config] - 10https://gerrit.wikimedia.org/r/245495 [22:51:37] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39754 bytes in 0.660 second response time [22:52:28] (03CR) 10jenkins-bot: [V: 04-1] Update extensions using the deprecated qunit test [integration/config] - 10https://gerrit.wikimedia.org/r/245495 (owner: 10Paladox) [22:54:01] (03PS10) 10Paladox: Update extensions using the deprecated qunit test [integration/config] - 10https://gerrit.wikimedia.org/r/245495 [22:55:52] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39406 bytes in 0.841 second response time [22:59:07] (03PS1) 10Paladox: [MobileFrontend] Add some more experimental tests [integration/config] - 10https://gerrit.wikimedia.org/r/265420 [23:03:14] Is anyone here familiar with setting up zuul? I'm trying to find someone to help me figure out why my Zuul queue isn't being processed by any Jenkins workers despite zuul printing logs showing that gearman and Jenkins are communicating with each other. [23:04:27] (03CR) 10Paladox: "Thanks test passes the new qunit test. 
And I think mobilefrontend will because actually in the new test it uses two extensions as dependac" [integration/config] - 10https://gerrit.wikimedia.org/r/245495 (owner: 10Paladox) [23:05:05] (03PS2) 10Paladox: [Offline] Make test extension vote false [integration/config] - 10https://gerrit.wikimedia.org/r/258993 [23:12:12] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #392: 04FAILURE in 15 min: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/392/ [23:26:57] zxiiro: our zuul expert isn't here right now, but I would ask in the OpenStack CI channel (I presume they have one) [23:35:06] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1950387 (10mmodell) git-ssh.wikimedia.org has an ipv6 address in DNS, however, it's not yet active due to lack of time to work on this. We need t... [23:35:13] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:40:04] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 39406 bytes in 0.535 second response time [23:43:50] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1950423 (10Dzahn) >>! In T100519#1950387, @mmodell wrote: > git-ssh.wikimedia.org has an ipv6 address in DNS, however, it's not yet active due to...