[00:10:06] 10Beta-Cluster, 6operations, 7Database: Possible to run writes (e.g. UPDATE) on slave - https://phabricator.wikimedia.org/T110115#1569721 (10Mattflaschen) [00:55:39] (03PS1) 10BryanDavis: Revert "Run mw-install-mysql.sh with statement tracing" [integration/jenkins] - 10https://gerrit.wikimedia.org/r/233651 [00:55:47] (03CR) 10BryanDavis: [C: 032] Revert "Run mw-install-mysql.sh with statement tracing" [integration/jenkins] - 10https://gerrit.wikimedia.org/r/233651 (owner: 10BryanDavis) [00:56:23] (03Merged) 10jenkins-bot: Revert "Run mw-install-mysql.sh with statement tracing" [integration/jenkins] - 10https://gerrit.wikimedia.org/r/233651 (owner: 10BryanDavis) [00:58:16] !log Updated tin:/srv/deployment/integration/slave-scripts to b287e93 (Revert "Run mw-install-mysql.sh with statement tracing") and synced via trebuchet [00:58:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [02:42:07] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #796: FAILURE in 6.8 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/796/ [04:31:32] Yippee, build fixed! 
[04:31:33] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #541: FIXED in 39 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/541/ [08:30:22] 6Release-Engineering, 6Zero, 7Mobile, 7Technical-Debt: Pull WikipediaMobileFirefoxOS from mediawiki-config - https://phabricator.wikimedia.org/T107172#1570365 (10hashar) p:5Triage>3Low Thanks @dr0ptp4kt :-) [08:51:08] 10Continuous-Integration-Infrastructure, 6Labs, 10Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore - https://phabricator.wikimedia.org/T110052#1570375 (10hashar) 5Open>3Resolved a:3hashar integration-slave-precise-1014 was affected as wel... [08:52:13] !log pooling back integration-slave-precise-1014 , integration-slave-trusty-1014 and integration-slave-trusty-1017 . labvirt1007 missed disk space ( https://phabricator.wikimedia.org/T110052 ) [08:52:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [08:58:56] (03PS1) 10Hashar: Archive ClickTracking extension [integration/config] - 10https://gerrit.wikimedia.org/r/233681 (https://phabricator.wikimedia.org/T63591) [08:59:06] (03CR) 10Hashar: [C: 032] Archive ClickTracking extension [integration/config] - 10https://gerrit.wikimedia.org/r/233681 (https://phabricator.wikimedia.org/T63591) (owner: 10Hashar) [09:01:16] (03Merged) 10jenkins-bot: Archive ClickTracking extension [integration/config] - 10https://gerrit.wikimedia.org/r/233681 (https://phabricator.wikimedia.org/T63591) (owner: 10Hashar) [09:24:14] 5Continuous-Integration-Isolation, 3releng-201516-q1: [keyresult] boot instances from OpenStack API - https://phabricator.wikimedia.org/T109913#1570478 (10hashar) p:5Triage>3Low This was solved as part of setting up the OpenStack account for Nodepool. 
Andrew did the configuration and we validated it back i... [09:26:15] 10Browser-Tests, 6Release-Engineering, 5Testing Initiative 2015: Improve documentation around running/writing (with lots of examples) browser tests - https://phabricator.wikimedia.org/T108108#1570485 (10hashar) [09:26:15] 6Release-Engineering, 6Team-Practices, 5Testing Initiative 2015, 7Tracking: Follow up workshop & brown bag ideas from Testing: Where does it hurt? - https://phabricator.wikimedia.org/T108122#1570484 (10hashar) [09:26:26] 6Release-Engineering, 6Team-Practices, 5Testing Initiative 2015, 7Tracking: [tracking] Follow up workshop & brown bag ideas from Testing: Where does it hurt? - https://phabricator.wikimedia.org/T108122#1570488 (10hashar) [09:26:40] 10Browser-Tests, 6Release-Engineering, 5Testing Initiative 2015: Improve documentation around running/writing (with lots of examples) browser tests - https://phabricator.wikimedia.org/T108108#1570489 (10hashar) p:5Triage>3Normal [09:33:09] 10Continuous-Integration-Config, 5Patch-For-Review: Remove mediawiki/extensions/DataTypes from CI - https://phabricator.wikimedia.org/T108759#1570503 (10hashar) p:5Triage>3Low [09:42:15] Yippee, build fixed! [09:42:16] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #702: FIXED in 1 hr 32 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/702/ [09:43:15] Yippee, build fixed! [09:43:15] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #590: FIXED in 6 min 14 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/590/ [09:44:52] (03CR) 10Hashar: "That was done on purpose in the initial version RobLa wrote." 
[tools/release] - 10https://gerrit.wikimedia.org/r/231317 (owner: 10Florianschmidtwelzow) [09:58:59] 10Browser-Tests, 10MobileFrontend: Chrome window resizes without height in @integration tests - toggling test fails - https://phabricator.wikimedia.org/T109794#1570538 (10Jhernandez) [09:59:31] RECOVERY - Puppet staleness on integration-slave-trusty-1017 is OK Less than 1.00% above the threshold [3600.0] [10:02:12] !log deleted job https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-8-sauce/ Was disabled and no more in our JJB config [10:02:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [11:08:09] 10Browser-Tests, 10MediaWiki-Vagrant, 5Patch-For-Review, 5Testing Initiative 2015: Vagrant command for running browser tests - https://phabricator.wikimedia.org/T96283#1570681 (10zeljkofilipin) I have just stumbled upon this: https://forge.puppetlabs.com/p0deje/display Puppet module which sets up Xvfb + x... 
[11:38:24] (03PS1) 10Hashar: Throttle mediawiki jobs to one per node [integration/config] - 10https://gerrit.wikimedia.org/r/233689 [11:39:14] (03CR) 10Hashar: [C: 032] Throttle mediawiki jobs to one per node [integration/config] - 10https://gerrit.wikimedia.org/r/233689 (owner: 10Hashar) [11:41:24] (03Merged) 10jenkins-bot: Throttle mediawiki jobs to one per node [integration/config] - 10https://gerrit.wikimedia.org/r/233689 (owner: 10Hashar) [11:47:43] !log Upgraded a bunch of Jenkins plugins [11:47:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [11:49:41] (03Abandoned) 10Hashar: Throttle mediawiki core jobs to one per node [integration/config] - 10https://gerrit.wikimedia.org/r/227234 (owner: 10Hashar) [11:57:15] (03Abandoned) 10Hashar: Add BlogPage to testextension [integration/config] - 10https://gerrit.wikimedia.org/r/227217 (owner: 10Paladox) [11:59:44] (03PS14) 10Hashar: Always fail the vector extension [integration/config] - 10https://gerrit.wikimedia.org/r/228133 (owner: 10Paladox) [12:05:40] (03CR) 10Hashar: Always fail the vector extension (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/228133 (owner: 10Paladox) [12:05:45] (03PS15) 10Hashar: Always fail the vector extension [integration/config] - 10https://gerrit.wikimedia.org/r/228133 (owner: 10Paladox) [12:06:56] (03CR) 10Hashar: [C: 032] "Deleting:" [integration/config] - 10https://gerrit.wikimedia.org/r/228133 (owner: 10Paladox) [12:08:27] (03PS16) 10Hashar: Always fail the vector extension [integration/config] - 10https://gerrit.wikimedia.org/r/228133 (owner: 10Paladox) [12:08:37] (03CR) 10Hashar: [C: 04-2] Always fail the vector extension [integration/config] - 10https://gerrit.wikimedia.org/r/228133 (owner: 10Paladox) [12:09:52] (03PS17) 10Hashar: Always fail the vector extension [integration/config] - 10https://gerrit.wikimedia.org/r/228133 (owner: 10Paladox) [12:10:14] (03CR) 10Hashar: [C: 032] "Made the Zuul layout to use the 
'archived' template." [integration/config] - 10https://gerrit.wikimedia.org/r/228133 (owner: 10Paladox) [12:12:07] (03Merged) 10jenkins-bot: Always fail the vector extension [integration/config] - 10https://gerrit.wikimedia.org/r/228133 (owner: 10Paladox) [12:14:32] (03CR) 10Hashar: [C: 04-1] "Filled T110178" (032 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/232726 (owner: 10Amire80) [12:14:41] (03PS2) 10Hashar: Remove Narayam and WebFonts [integration/config] - 10https://gerrit.wikimedia.org/r/232726 (https://phabricator.wikimedia.org/T110178) (owner: 10Amire80) [12:14:43] (03CR) 10jenkins-bot: [V: 04-1] Remove Narayam and WebFonts [integration/config] - 10https://gerrit.wikimedia.org/r/232726 (https://phabricator.wikimedia.org/T110178) (owner: 10Amire80) [12:15:55] 10Continuous-Integration-Config, 10MediaWiki-extensions-UniversalLanguageSelector, 5Patch-For-Review: Archive extensions Narayam and WebFonts - https://phabricator.wikimedia.org/T110178#1570799 (10hashar) [12:16:07] 10Continuous-Integration-Config, 10MediaWiki-extensions-UniversalLanguageSelector, 5Patch-For-Review: Archive extensions Narayam and WebFonts - https://phabricator.wikimedia.org/T110178#1570801 (10hashar) p:5Triage>3Low [12:20:49] (03CR) 10Hashar: "Was still in the mediawiki-extensions-{phpflavor} and mediawiki-extensions-qunit jobs." [integration/config] - 10https://gerrit.wikimedia.org/r/228877 (owner: 10Paladox) [12:20:54] (03PS10) 10Hashar: Archive Mantle extension [integration/config] - 10https://gerrit.wikimedia.org/r/228877 (owner: 10Paladox) [12:21:51] (03CR) 10Hashar: "Potentially we still need Mantle for the release branches :-/" [integration/config] - 10https://gerrit.wikimedia.org/r/228877 (owner: 10Paladox) [12:23:58] (03CR) 10Paladox: "Oh ok thanks." 
[integration/config] - 10https://gerrit.wikimedia.org/r/228877 (owner: 10Paladox) [12:27:11] (03PS8) 10Hashar: Add check for json in TwitterLogin [integration/config] - 10https://gerrit.wikimedia.org/r/225711 (owner: 10Paladox) [12:32:02] (03PS2) 10Paladox: Add jshint to check: to Slate Skin [integration/config] - 10https://gerrit.wikimedia.org/r/229178 [12:32:10] (03PS2) 10Paladox: Add composer-test to Slate Skin [integration/config] - 10https://gerrit.wikimedia.org/r/229180 [12:32:18] (03PS2) 10Paladox: Update Slate skin test [integration/config] - 10https://gerrit.wikimedia.org/r/229176 [12:40:27] (03PS14) 10Hashar: Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) (owner: 10Paladox) [12:42:04] (03CR) 10jenkins-bot: [V: 04-1] Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) (owner: 10Paladox) [12:53:36] (03CR) 10Hashar: "The skins and extensions are very similar. We clone the repositories and run a core testsuite." [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) (owner: 10Paladox) [13:00:19] Why is jenkins failing all of the mediawiki-config patches? 
[13:00:22] https://integration.wikimedia.org/ci/job/operations-mw-config-phplint/1083/console [13:00:26] 04:12:53 stderr: error: object file .git/objects/38/ba7f77676e6afe4b26f8e1aaf9e0e96c26a908 is empty [13:00:26] 04:12:53 fatal: loose object 38ba7f77676e6afe4b26f8e1aaf9e0e96c26a908 (stored in .git/objects/38/ba7f77676e6afe4b26f8e1aaf9e0e96c26a908) is corrupt [13:06:49] 6Release-Engineering, 6operations, 7Database: Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388#1570937 (10akosiaris) p:5Triage>3Normal [13:12:45] 10Continuous-Integration-Config, 10Incident-20150312-whitespace: add a check for whitespace before leading 3Low [13:14:05] (03PS3) 10Hashar: Drop mediawiki/extensions/DataTypes for repo deletion [integration/config] - 10https://gerrit.wikimedia.org/r/230936 (https://phabricator.wikimedia.org/T108759) (owner: 10QChris) [13:14:19] (03CR) 10Hashar: [C: 032] "It is being deleted, not archived." 
[integration/config] - 10https://gerrit.wikimedia.org/r/230936 (https://phabricator.wikimedia.org/T108759) (owner: 10QChris) [13:15:56] (03Merged) 10jenkins-bot: Drop mediawiki/extensions/DataTypes for repo deletion [integration/config] - 10https://gerrit.wikimedia.org/r/230936 (https://phabricator.wikimedia.org/T108759) (owner: 10QChris) [13:16:56] 10Continuous-Integration-Config, 5Patch-For-Review: Remove mediawiki/extensions/DataTypes from CI - https://phabricator.wikimedia.org/T108759#1570962 (10hashar) 5Open>3Resolved a:3hashar [13:18:17] 10Continuous-Integration-Config, 10MediaWiki-extensions-OpenSearchXml, 10Wikimedia-Git-or-Gerrit: Archive the OpenSearchXml extensions - https://phabricator.wikimedia.org/T108213#1570970 (10hashar) Made it read only in Gerrit https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/OpenSearchXml [13:19:27] (03PS1) 10Hashar: Archive extension OpenSearchXml [integration/config] - 10https://gerrit.wikimedia.org/r/233699 (https://phabricator.wikimedia.org/T108213) [13:21:30] (03CR) 10Hashar: [C: 032] Archive extension OpenSearchXml [integration/config] - 10https://gerrit.wikimedia.org/r/233699 (https://phabricator.wikimedia.org/T108213) (owner: 10Hashar) [13:22:09] 10Continuous-Integration-Config, 10MediaWiki-extensions-OpenSearchXml, 10Wikimedia-Git-or-Gerrit, 5Patch-For-Review: Archive the OpenSearchXml extensions - https://phabricator.wikimedia.org/T108213#1570981 (10hashar) 5Open>3Resolved a:3hashar Archived in CI [13:24:11] (03Merged) 10jenkins-bot: Archive extension OpenSearchXml [integration/config] - 10https://gerrit.wikimedia.org/r/233699 (https://phabricator.wikimedia.org/T108213) (owner: 10Hashar) [13:40:17] (03PS1) 10Aude: Update Wikidata deployment branch to wmf/1.26wmf20 [tools/release] - 10https://gerrit.wikimedia.org/r/233705 [13:40:24] (03CR) 10JanZerebecki: "The error you linked to was a temporary error on the build slave. Not related to the extension or the test configuration." 
[integration/config] - 10https://gerrit.wikimedia.org/r/227217 (owner: 10Paladox) [13:42:59] (03CR) 10Aude: [C: 032] Update Wikidata deployment branch to wmf/1.26wmf20 [tools/release] - 10https://gerrit.wikimedia.org/r/233705 (owner: 10Aude) [13:43:51] Krenair: jenkins is failing some patches i'm working on too [13:44:00] have you managed to resolve your issue? [13:44:51] By ignoring Jenkins, sure :p [13:44:55] lol [13:45:02] I doubt this actually fixes the repository [13:47:12] (03Merged) 10jenkins-bot: Update Wikidata deployment branch to wmf/1.26wmf20 [tools/release] - 10https://gerrit.wikimedia.org/r/233705 (owner: 10Aude) [13:53:10] (03PS1) 10Hashar: tests: speed up TestZuulLayout [integration/config] - 10https://gerrit.wikimedia.org/r/233709 [13:53:12] (03PS1) 10Hashar: tests: zuul config is no more a class propery [integration/config] - 10https://gerrit.wikimedia.org/r/233710 [13:54:00] (03CR) 10Hashar: [C: 032] tests: zuul config is no more a class propery [integration/config] - 10https://gerrit.wikimedia.org/r/233710 (owner: 10Hashar) [13:55:26] (03CR) 10Hashar: "The Zuul scheduler is not written to, so there is no need to have a new instance for each test :-)" [integration/config] - 10https://gerrit.wikimedia.org/r/233709 (owner: 10Hashar) [14:17:25] Is anyone around who has opinion/authority over the ‘staging’ project? I’d like to migrate a few instances, it will cause those instances to be offline for a few minutes. [14:18:23] thcipriani|afk: ^ ? 
[14:22:43] andrewbogott: a few minutes is not a big deal—go for it [14:23:06] thanks [14:25:46] Jenkins is unhappy, see https://integration.wikimedia.org/ci/job/mwext-testextension-zend/7104/console ("fatal: loose object 25cda1d8a3c1c9e59ca901e12d0df8b7171e396e (stored in .git/objects/25/cda1d8a3c1c9e59ca901e12d0df8b7171e396e) is corrupt") [14:33:56] 10Continuous-Integration-Infrastructure, 3Collaboration-Team-Current: Jenkins job can't clone VisualEditor, blocking merge - https://phabricator.wikimedia.org/T110184#1571097 (10Mattflaschen) 3NEW [14:34:09] thcipriani|afk: how about deployment-cache-mobile04 — can it tolerate a bit of downtime as well? [14:34:41] Yes [14:36:14] 10Continuous-Integration-Infrastructure, 7Upstream, 7Zuul: zuul status page has double underline in Firefox due to abbr styles - https://phabricator.wikimedia.org/T109747#1571117 (10Krinkle) [14:36:21] ostriches: ok, I’ll do that one next, thank you. [14:36:35] yw [14:36:37] 10Continuous-Integration-Infrastructure, 7Upstream, 7Zuul: zuul status page has double underline in Firefox due to abbr styles - https://phabricator.wikimedia.org/T109747#1558086 (10Krinkle) Indeed. Upstream bootstrap, which in turn upstreams to Normalise.css at https://github.com/necolas/normalize.css/pull/... [14:38:42] hashar: What is the currently deployed Zuul version? Is the upstream patch for https://phabricator.wikimedia.org/T94796 (which is merged) deployed? [14:39:54] looking at the response heeaders for status.json, it doesn't seem to be deployed [14:41:23] 10Beta-Cluster, 10MediaWiki-API, 10MediaWiki-Unit-tests, 7Pywikibot-tests: prevent modules with broken paraminfo being deployed to production wikis - https://phabricator.wikimedia.org/T108322#1571125 (10Krinkle) [14:42:37] 10Continuous-Integration-Infrastructure, 3Collaboration-Team-Current: [Regression] Jenkins job can't clone repos: git/objects/* is corrupt - https://phabricator.wikimedia.org/T110184#1571132 (10Krinkle) p:5Triage>3Unbreak! 
[14:43:33] Continuous-Integration-Infrastructure, Collaboration-Team-Current: [Regression] Jenkins job can't clone repos: git/objects/* is corrupt - https://phabricator.wikimedia.org/T110184#1571097 (Krinkle) The same is happening for other extensions and mediawiki-core: See https://integration.wikimedia.org/ci/job...
[14:46:45] Krinkle: looking at https://integration.wikimedia.org/ci/job/mwext-testextension-zend/7104/console
[14:46:58] Krinkle: Zuul is deployed using Debian packages
[14:47:11] maybe I screwed up something :/
[14:47:39] so that is currently 2.0.0-327-g3ebedde-wmf3precise1
[14:48:48] the cache patch is not included apparently
[14:50:02] hashar_: #wikimedia-dev
[14:50:37] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 31202 bytes in 0.624 second response time
[14:50:38] currently blocking deployments, MediaWiki can't clone extensions in https://integration.wikimedia.org/ci/job/mwext-qunit/4402/console
[14:55:54] !log dropping all workspaces from integration-slave-precise-1014 . Some .git repos in workspaces might be corrupted
[14:55:57] Krinkle: ^^^
[14:55:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:55:59] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure, operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1571173 (Krinkle)
[14:56:01] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: Reenable resolv.conf ndots:2 option on CI instances - https://phabricator.wikimedia.org/T105297#1571171 (Krinkle) Open>declined It seems this is obsolete with the new DNS infrastructure. See T92351 for details. I've undeployed ou...
[14:57:23] hashar: the failing mwext-qunit job ran on integration-slave-trusty-1017, not precise
[14:57:33] And other failures on other slaves, too.
[14:57:46] labvirt1007 was full yesterday
[14:57:51] and three instances were paused
[14:58:02] I put them back online this morning, but I guess they have some disk corruption :(
[14:58:18] andrewbogott: seems the instances resumed from yesterday ended up with some disk corruption
[14:58:24] hashar: Which three?
[14:58:33] Let's simply destroy the instances and re-create them?
[14:58:42] !sal
[14:58:42] https://tools.wmflabs.org/sal/releng
[14:58:53] k integration-slave-precise-1014 , integration-slave-trusty-1014 and integration-slave-trusty-1017 . labvirt1007 missed disk space ( https://phabricator.wikimedia.org/T110052 )
[14:58:54] hashar: dang :( what kind of corruption?
[14:58:57] I've updated https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Setup for you.
[14:58:57] those three
[14:59:11] I would’ve thought that things would just error out with disk full
[14:59:11] andrewbogott: we have a bunch of jobs apparently failing solely on those instances
[14:59:13] example is https://integration.wikimedia.org/ci/job/mwext-qunit/4403/console
[14:59:20] where git considers some objects to be empty / corrupt
[14:59:33] stderr: 'error: object file .git/objects/a3/3b082c13c0f154fc498e51977f380423c4572f is empty
[14:59:46] maybe the VM created the file but failed to actually write the content
[15:00:04] yeah, I could believe that that’s just a symptom of disk full
[15:00:07] likely. that's how git would complain if you have an empty object file.
[15:00:11] Can you just re-do the git checkout?
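(Editorial note, not part of the channel log.) The symptom discussed above, a loose object file that exists but has zero bytes, can be looked for without running git at all. The following is a minimal sketch under stated assumptions: the function name find_empty_loose_objects is hypothetical, it only inspects loose objects in the two-hex-digit fan-out directories under .git/objects, and it does not detect other kinds of corruption (truncated but non-empty objects, damaged packs).

```python
import os

HEX_DIGITS = "0123456789abcdef"


def find_empty_loose_objects(git_dir):
    """Return a sorted list of zero-byte loose object files under git_dir/objects.

    Hypothetical helper for illustration only; it reports files and deletes nothing.
    """
    objects_dir = os.path.join(git_dir, "objects")
    empty = []
    if not os.path.isdir(objects_dir):
        return empty
    for fanout in sorted(os.listdir(objects_dir)):
        # Loose objects live in two-hex-digit directories such as "a3";
        # other entries like "pack" and "info" are skipped.
        if len(fanout) != 2 or any(c not in HEX_DIGITS for c in fanout):
            continue
        fanout_dir = os.path.join(objects_dir, fanout)
        if not os.path.isdir(fanout_dir):
            continue
        for name in sorted(os.listdir(fanout_dir)):
            path = os.path.join(fanout_dir, name)
            # A valid loose object is never empty, so size 0 signals a failed write.
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return empty
```

Calling find_empty_loose_objects on the .git directory of a workspace checkout would list files like the .git/objects/a3/... entry quoted above. Whether to delete and re-fetch, or simply reclone, is a separate decision the sketch does not make.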
[15:00:27] $ git fsck
[15:00:27] error: object file .git/objects/00/3124078a04dce9370de7f1eca062437ef88fe2 is empty
[15:00:27] error: object file .git/objects/00/3124078a04dce9370de7f1eca062437ef88fe2 is empty
[15:00:27] fatal: loose object 003124078a04dce9370de7f1eca062437ef88fe2 (stored in .git/objects/00/3124078a04dce9370de7f1eca062437ef88fe2) is corrupt
[15:00:28] :D
[15:00:38] -r--r--r-- 1 jenkins-deploy wikidev 0 Aug 24 14:19 .git/objects/a3/3b082c13c0f154fc498e51977f380423c4572f
[15:00:43] [15:00 UTC] krinkle at integration-slave-trusty-1017.integration.eqiad.wmflabs in /mnt/jenkins-workspace/workspace/mwext-qunit/src/extensions/VisualEditor (*)
[15:01:23] hashar: Try rm'ing all the corrupt/empty objects, then re-fetching from origin.
[15:01:29] That'll either fix it.
[15:01:31] Or fubar it
[15:01:43] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore - https://phabricator.wikimedia.org/T110052#1571184 (hashar) Resolved>Open So apparently some git repos in Jenkins workspace ended up bei...
[15:01:52] andrewbogott: Easier said than done. We've got dozens of slaves and close to 1000 workspaces, each with between 1 and 20 git repos cloned inside.
[15:01:52] In which case you'll have to reclone anyway, which is your only other option really
[15:02:44] !log trashing workspaces on integration-slave-trusty-1014 and integration-slave-trusty-1017 ( https://phabricator.wikimedia.org/T110052#1571184 )
[15:02:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:04:11] !log soft rebooting integration-slave-trusty-1014 (ssh dead)
[15:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:04:41] andrewbogott: yeah I am deleting all the workspaces on the affected instances.
The Jenkins jobs will just reclone them
[15:04:53] hashar: ok, let me know how it goes
[15:07:42] hehe
[15:07:45] /bin/sh: 1: exec: cloud-init: not found
[15:07:55] andrewbogott: seems the instances are corrupted beyond repair :-}
[15:08:14] dang
[15:08:32] hashar: they can be rebuilt, right?
[15:09:24] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore - https://phabricator.wikimedia.org/T110052#1571198 (hashar) integration-slave-trusty-1014 on boot reports: /bin/sh: 1: exec: cloud-init:...
[15:09:52] andrewbogott: yeah we can rebuild them just fine. Everything is in Puppet and the bits that can't be puppetized are well documented
[15:10:03] ok. Sorry for the mess :(
[15:10:07] andrewbogott: but it seems any instance on labvirt1007 might well suffer from corruption
[15:10:14] especially if they got paused at some point
[15:10:24] yeah. We’ll have to see what shakes out
[15:10:52] !log unpooling and deleting integration-slave-trusty-1014 integration-slave-trusty-1017 and integration-slave-precise-1014 . They are most probably corrupted ( https://phabricator.wikimedia.org/T110052#1571184 )
[15:10:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:12:39] andrewbogott: not a big deal.
We have a large farm :-} [15:13:40] PROBLEM - Host integration-slave-precise-1014 is DOWN: CRITICAL - Host Unreachable (10.68.18.38) [15:13:55] PROBLEM - Host integration-slave-trusty-1014 is DOWN: CRITICAL - Host Unreachable (10.68.18.29) [15:14:46] 10Browser-Tests, 10CirrusSearch, 6Discovery: Upgrade CirrusSearch browser tests to use mediawiki_selenium 1.x - https://phabricator.wikimedia.org/T99653#1571209 (10dduvall) a:5dduvall>3None [15:15:31] PROBLEM - Host integration-slave-trusty-1017 is DOWN: CRITICAL - Host Unreachable (10.68.17.136) [15:17:38] 10Browser-Tests, 10Gather: Upgrade Gather browser tests to use mediawiki_selenium 1.x - https://phabricator.wikimedia.org/T99654#1571225 (10dduvall) 5Open>3Resolved a:3dduvall This was done by the Reading team as part of {T100293}. [15:17:39] 6Release-Engineering, 7Tracking: Update repositories that use mediawiki_selenium Ruby gem 1.x (tracking) - https://phabricator.wikimedia.org/T94083#1571230 (10dduvall) [15:18:56] 10Browser-Tests, 5Patch-For-Review, 5Testing Initiative 2015: [Spike] Decouple MW-Selenium from Cucumber - https://phabricator.wikimedia.org/T108273#1571236 (10dduvall) 5Open>3Resolved [15:19:09] 10Continuous-Integration-Infrastructure, 6Labs, 10Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore - https://phabricator.wikimedia.org/T110052#1571237 (10hashar) I have poked the #releng team mailing list to pair the rebuild with someone. [15:27:44] 10Browser-Tests, 10Continuous-Integration-Infrastructure, 5MW-1.26-release, 5Patch-For-Review: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1571244 (10zeljkofilipin) This is a problem for wikidata jobs, that take hours to run in CI. Things to do:... [15:33:46] thcipriani: So, T109861 is kind of vague. I get the general idea, but it's hard to get an idea of where/how to start. 
[15:34:06] (python is also like my 4th language, so there's that) [15:34:38] yeah, I was thinking about how to break that out as well. [15:35:45] so in order to keep the ability to run all tasks on hosts serially it might be best to keep them as one remote action (deploylocal) [15:36:02] which would mean you could just start looking into the better output task [15:36:35] 5Continuous-Integration-Isolation, 3releng-201516-q1: [keyresult] subset of jobs run in disposable instances - https://phabricator.wikimedia.org/T109914#1571282 (10greg) [15:36:35] 5Continuous-Integration-Isolation, 3releng-201516-q1: [keyresult] boot instances from OpenStack API - https://phabricator.wikimedia.org/T109913#1571278 (10greg) 5Open>3Resolved p:5Low>3Normal a:3hashar [15:36:37] 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Isolation, 7Epic, 3releng-201415-Q3, and 3 others: [keyresult] Run CI jobs in disposable VMs - https://phabricator.wikimedia.org/T47499#1571284 (10greg) [15:36:57] 6Release-Engineering, 6Phabricator, 10Phabricator-Sprint-Extension, 5Patch-For-Review: Create a continuous integration plan for Wikimedia Phabricator patches - https://phabricator.wikimedia.org/T85123#1571290 (10dduvall) [15:36:59] T109858 this was what I ended up thinking might be the right idea after futzing with rearraging code for a while [15:36:59] 10Browser-Tests, 6Phabricator, 10Phabricator-Sprint-Extension, 7Upstream: Create Browser Tests for Phabricator - https://phabricator.wikimedia.org/T87359#1571287 (10dduvall) 5Open>3Resolved a:3dduvall FWICT, the commit associated with this task was completed. If you want to do more with browser tests... [15:37:47] ostriches: but I definitely think you had the right idea in breaking out service deploy from main, in terms of code maintainability. 
[15:38:00] I think basically everything should leave main [15:38:08] 6Release-Engineering: Organize browsertests/Selenium training - https://phabricator.wikimedia.org/T100170#1571299 (10zeljkofilipin) [15:38:10] 10Browser-Tests: Write the first browsertests/Selenium test - https://phabricator.wikimedia.org/T94024#1571296 (10zeljkofilipin) 5Open>3Resolved a:3zeljkofilipin We had this session at #Wikimedia-Hackathon-2015. We did not have the session at #Wikimania-Hackathon-2015. Nothing left to do here. [15:38:52] 6Release-Engineering: Organize browsertests/Selenium training - https://phabricator.wikimedia.org/T100170#1307198 (10zeljkofilipin) [15:38:54] 10Browser-Tests: Fix broken browsertests/Selenium Jenkins jobs - https://phabricator.wikimedia.org/T94299#1571301 (10zeljkofilipin) 5Open>3Resolved a:3zeljkofilipin We had this session at #Wikimedia-Hackathon-2015. We did not have the session at #Wikimania-Hackathon-2015. Nothing left to do here. [15:38:55] 17 classes and counting :p [15:40:18] 10Browser-Tests: Investigate distribution of browser test run time - https://phabricator.wikimedia.org/T104396#1571312 (10hashar) One of the slowdown are the network roundtrips to SauceLabs. In some cases there are thousands of them and that is tracked at T92613 [15:41:25] heh, yeah, I think that restructuring probably needs to happen. Breaking out deploylocal into multiple tasks might be the right thing, but, again, you'd have to figure out how run all tasks in serial against an n-sized batch of hosts [15:41:32] 10Browser-Tests, 6Collaboration-Team-Backlog, 10Flow: Flow's Edit existing post test fails if post not in view - https://phabricator.wikimedia.org/T59702#1571315 (10zeljkofilipin) 5Open>3Invalid a:3zeljkofilipin No reply in more than a month. Resolving as invalid. Please reopen if this is still a problem. [15:42:20] I think it might be easier to checkout the logging and reporting of tasks: maybe a restructure is not blocking better output. 
[15:42:49] 10Browser-Tests: Investigate using the sikuli-like Applitools framework for visual testing - https://phabricator.wikimedia.org/T90884#1571319 (10zeljkofilipin) p:5Low>3Normal [15:46:49] thcipriani: I'll have a look at that, good idea [15:46:58] 10Browser-Tests, 6Collaboration-Team-Backlog, 10WikiLove: Update WikiLove repository to mediawiki_selenium Ruby gem 1.1 - https://phabricator.wikimedia.org/T99660#1571336 (10zeljkofilipin) p:5High>3Normal [15:47:02] 10Browser-Tests, 6Collaboration-Team-Backlog, 10WikiLove: Update WikiLove repository to mediawiki_selenium Ruby gem 1.1 - https://phabricator.wikimedia.org/T99660#1571339 (10zeljkofilipin) p:5Normal>3High a:3zeljkofilipin [15:49:17] 10Browser-Tests: Investigate distribution of browser test run time - https://phabricator.wikimedia.org/T104396#1571355 (10zeljkofilipin) p:5Triage>3Normal [15:49:29] 10Browser-Tests: Investigate distribution of browser test run time - https://phabricator.wikimedia.org/T104396#1571357 (10zeljkofilipin) a:3zeljkofilipin [15:50:00] 10Browser-Tests, 10MobileFrontend: Chrome window resizes without height in @integration tests - toggling test fails - https://phabricator.wikimedia.org/T109794#1571361 (10hashar) Can this be a dupe of {T88288} ? 
[15:50:18] Browser-Tests, MobileFrontend: Chrome window resizes without height in @integration tests - toggling test fails - https://phabricator.wikimedia.org/T109794#1571365 (hashar) p:Triage>Normal
[15:50:22] Browser-Tests: Use the API URL as a single entry point and infer the rest by querying siteinfo - https://phabricator.wikimedia.org/T103763#1571366 (zeljkofilipin) p:Triage>Low
[15:50:53] Browser-Tests, Wikidata: adapt wikidata_api gem to be compatible with mediawiki_api gem version 0.4 - https://phabricator.wikimedia.org/T106811#1571369 (zeljkofilipin) p:Triage>High
[15:51:09] Browser-Tests, Wikidata: adapt wikidata_api gem to be compatible with mediawiki_api gem version 0.4 - https://phabricator.wikimedia.org/T106811#1571371 (dduvall) a:dduvall
[15:52:08] Browser-Tests, MobileFrontend: Chrome window resizes without height in @integration tests - toggling test fails - https://phabricator.wikimedia.org/T109794#1571377 (dduvall)
[15:52:37] Browser-Tests, Testing Initiative 2015: Experiment with browser testing in other software languages - https://phabricator.wikimedia.org/T108874#1571382 (zeljkofilipin) p:Triage>Low
[15:53:36] Browser-Tests, Testing Initiative 2015: Add documentation for release process of MW-Selenium - https://phabricator.wikimedia.org/T108873#1571390 (zeljkofilipin) p:Triage>Low a:zeljkofilipin
[15:54:08] Browser-Tests, Testing Initiative 2015: Add documentation for release process of MW-Selenium - https://phabricator.wikimedia.org/T108873#1532722 (zeljkofilipin) p:Low>Normal
[15:54:33] Browser-Tests, Release-Engineering, Testing Initiative 2015: Improve browser testing page with templates : Emphasize testing documentation on mediawiki.org - https://phabricator.wikimedia.org/T108110#1571396 (zeljkofilipin) p:Triage>Normal
[15:55:35] Browser-Tests: Browser tests should support Ubuntu Chromium - https://phabricator.wikimedia.org/T63262#1571400 (hashar) @dduvall aren't the browser tests triggered from Gerrit already using Chromium? Seems it is all fine nowadays.
[16:01:37] hashar: welcome back :)
[16:03:30] legoktm: \O/
[16:04:08] legoktm: I still have a pull request at https://github.com/legoktm/tools-ci :-))
[16:04:14] that was is on my radar
[16:04:27] I would love to have it run from our Jenkins and have the output generated automatically
[16:04:35] that would be awesome
[16:04:42] I have a bunch of other use cases in mind such as checking the REL branches
[16:04:49] and the ruby gems versions :D
[16:05:13] "bug reports" are currently on the talk page: https://www.mediawiki.org/wiki/User_talk:Legoktm/ci
[16:06:32] oh
[16:06:50] I guess that can be moved under Gerrit / Phabricator :-}
[16:06:56] (CR) Legoktm: [C: 2] "42s → 11s on my system. \o/" [integration/config] - https://gerrit.wikimedia.org/r/233709 (owner: Hashar)
[16:07:05] * hashar needs a better computer
[16:07:11] :P
[16:07:47] (Merged) jenkins-bot: tests: speed up TestZuulLayout [integration/config] - https://gerrit.wikimedia.org/r/233709 (owner: Hashar)
[16:07:48] (Merged) jenkins-bot: tests: zuul config is no more a class propery [integration/config] - https://gerrit.wikimedia.org/r/233710 (owner: Hashar)
[16:08:46] meeting time
[16:11:09] Browser-Tests: Investigate using the sikuli-like Applitools framework for visual testing - https://phabricator.wikimedia.org/T90884#1571450 (zeljkofilipin) p:Normal>Low
[16:15:25] RelEng-Admin, User-greg: Create KPIs for #releng-201516-Q2 - https://phabricator.wikimedia.org/T107905#1571453 (greg)
[16:26:43] Release-Engineering: (To discuss) Implement "Request time to response" KPI - https://phabricator.wikimedia.org/T108993#1571524 (hashar) From our team meeting, we have a bunch of feature/enhancement requests which we know will take a while to respond to. We could potentially just track the maintenance / defec...
[16:29:20] Continuous-Integration-Infrastructure, Release-Engineering, Zuul: Implement "Jenkins queue wait" KPI - https://phabricator.wikimedia.org/T108750#1571627 (hashar)
[16:30:54] Continuous-Integration-Infrastructure, Release-Engineering, Zuul: Implement "Jenkins/Zuul queue wait" KPI - https://phabricator.wikimedia.org/T108750#1571634 (hashar)
[16:34:36] Beta-Cluster, Reading-Web-Sprint-54-28-Days-Later: Get QuickSurveys enabled on beta cluster - https://phabricator.wikimedia.org/T110199#1571710 (Jdlrobson) NEW
[16:36:35] Beta-Cluster, Reading-Web-Sprint-54-28-Days-Later: Get QuickSurveys enabled on beta cluster - https://phabricator.wikimedia.org/T110199#1571710 (Jdlrobson)
[16:38:26] Continuous-Integration-Infrastructure: Track and graph mean time to merge - https://phabricator.wikimedia.org/T70114#1571777 (hashar) Additional note: I am not sure what `resident_time` is. Potentially that is the difference between enqueue and dequeue time. If a change is reenqueued in the gate-and-submit...
[16:50:44] Continuous-Integration-Config, Patch-For-Review, Puppet: Setup rubocop for operations/puppet ruby code lints - https://phabricator.wikimedia.org/T102020#1571918 (greg) >>! In T102020#1567220, @zeljkofilipin wrote: > @hashar: any idea on which folders contain third party code? redirect this question to...
[17:03:33] bd808: anomie: my bus is delayed, I'll be 10 min late
[17:04:16] Release-Engineering: Organize browsertests/Selenium training - https://phabricator.wikimedia.org/T100170#1572000 (zeljkofilipin) a:zeljkofilipin
[17:04:18] Continuous-Integration-Config, MediaWiki-extensions-ContentTranslation, ContentTranslation-Release6, LE-CX6-Sprint 2, Patch-For-Review: ContentTranslation is not running PHPUnit structure tests - https://phabricator.wikimedia.org/T109670#1572001 (Amire80)
[17:05:34] Browser-Tests, I18n, Patch-For-Review: Hacking: Load i18n messages from MediaWiki to browser tests - https://phabricator.wikimedia.org/T90577#1572011 (zeljkofilipin) @vikassy: Is there a reason this is still WIP? Didn't we finish this during wikimania?
[17:06:52] RelEng-Admin, User-greg: Create KPIs for #releng-201516-Q2 - https://phabricator.wikimedia.org/T107905#1572021 (greg)
[17:07:28] Release-Engineering: (To discuss) Implement "Request time to response" KPI - https://phabricator.wikimedia.org/T108993#1572023 (greg)
[17:07:30] Continuous-Integration-Infrastructure, Release-Engineering, Jenkins: Implement "Jenkins uptime" KPI - https://phabricator.wikimedia.org/T108769#1572024 (greg)
[17:07:31] Deployment-Systems, Release-Engineering: Implement "new weekly release deploy duration" KPI - https://phabricator.wikimedia.org/T108742#1572026 (greg)
[17:07:33] Continuous-Integration-Infrastructure, Release-Engineering: Implement "CI overhead" KPI - https://phabricator.wikimedia.org/T108751#1572025 (greg)
[17:07:35] RelEng-Admin, User-greg: Create KPIs for #releng-201516-Q2 - https://phabricator.wikimedia.org/T107905#1506866 (greg)
[18:19:26] Beta-Cluster, Reading-Web-Sprint-54-28-Days-Later: Get QuickSurveys enabled on beta cluster - https://phabricator.wikimedia.org/T110199#1572615 (Jdlrobson)
[18:52:03] Continuous-Integration-Infrastructure: [Regression] Jenkins job can't clone repos: git/objects/* is corrupt - https://phabricator.wikimedia.org/T110184#1572708 (SBisson)
[18:58:09] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore, ended up corrupted. Need rebuild - https://phabricator.wikimedia.org/T110052#1572710 (hashar)
[18:59:00] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore, ended up corrupted. Need rebuild - https://phabricator.wikimedia.org/T110052#1567858 (hashar)
[18:59:13] Continuous-Integration-Infrastructure: Jenkins job can't clone repos: git/objects/* is corrupt - https://phabricator.wikimedia.org/T110184#1572713 (hashar)
[18:59:24] Continuous-Integration-Infrastructure: Jenkins job can't clone repos: git/objects/* is corrupt - https://phabricator.wikimedia.org/T110184#1571097 (hashar)
[18:59:25] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore, ended up corrupted. Need rebuild - https://phabricator.wikimedia.org/T110052#1567858 (hashar)
[18:59:48] Continuous-Integration-Infrastructure: Jenkins job can't clone repos: git/objects/* is corrupt - https://phabricator.wikimedia.org/T110184#1571097 (hashar) Some instances have been corrupted ( T110052 ) :-(
[19:07:06] ostriches: have you ever tried to rename repositories in Gerrit? (including updating changes?)
[19:08:08] I have the use case for CI , since it is kind of hard to differentiate between live and obsolete repositories
[19:08:14] Yes.
[19:08:24] And no, I won't do it again.
[19:08:43] I am volunteering to suffer the pain :D
[19:09:19] It's a manual process, requires root to ytterbium, and leaves no redirect for the old name behind, so any old links *will break*
[19:09:26] We did it once. Maybe twice.
[19:09:39] old links in Gerrit you mean ?
[19:09:46] or on git.wm.o / github
[19:09:59] Oh yeah, github will have to be updated to the new repo too
[19:10:01] for obsoletes repos I am fine breaking links
[19:10:08] git.wm.o can go die in a fire
[19:10:10] or even stop replicating to github
[19:10:11] * ostriches doesn't care about it
[19:10:12] hehe
[19:10:12] :p
[19:10:25] I noticed git.wm.o never have been updated
[19:10:41] anyway that is going to be replaced by Phabricator thingy, so not an issue
[19:11:09] why do you need root access on ytterbium for ? isn't it enough to have gerrit2 access?
[19:12:38] Eh, gerrit2 might be enough
[19:12:52] I'm mainly thinking permissions on /var/lib/gerrit2/review_site/git/
[19:12:57] Which *should* be gerrit2:gerrit2
[19:13:39] I guess I can sort that out with european ops :-}
[19:14:44] find . -not -user gerrit2 -exec ls -ld {} \;
[19:14:44] -rw-r--r-- 1 root root 91 Mar 25 10:26 ./operations/debs/statsite.git/config
[19:14:44] \O/
[19:15:29] will keep that for renaming thingy for later on though.
[19:15:38] was just wondering what kind of wall / madness you ended up with
[19:21:15] It's just incredibly manual and error prone
[19:21:26] ....Three tables you have to update?
[19:21:27] I think
[19:21:29] Maybe 4.
[19:22:08] the openstack doc ( http://docs.openstack.org/infra/system-config/gerrit.html#renaming-a-project ) only mentions account_project_watches and changes
[19:22:19] but yeah need to be verified I guess
[19:24:14] changes, projects, account_project_watches
[20:20:07] Project beta-update-databases-eqiad build #2444: FAILURE in 6.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/2444/
[20:54:37] Continuous-Integration-Infrastructure: Track and graph mean time to merge - https://phabricator.wikimedia.org/T70114#1573197 (hashar) I looked a bit more at it. When a change enter a Zuul pipeline, the time is saved. When the change leave the pipeline (the jobs are completed and the job reported, the change...
[20:56:47] Continuous-Integration-Infrastructure: Track and graph mean time to merge - https://phabricator.wikimedia.org/T70114#1573202 (hashar)
[20:56:48] Continuous-Integration-Infrastructure, Release-Engineering, Zuul: Implement "Jenkins/Zuul queue wait" KPI - https://phabricator.wikimedia.org/T108750#1573201 (hashar)
[20:57:09] Continuous-Integration-Infrastructure, Release-Engineering, Zuul: Implement "Jenkins/Zuul queue wait" KPI - https://phabricator.wikimedia.org/T108750#1529519 (hashar) The metric investigation spam is under sub task T70114
[21:01:25] Continuous-Integration-Config, Mobile-App-Sprint-64-Android-Gadolinium, Patch-For-Review, Wikipedia-Android-App: Jenkins should run tests for the Wikipedia app before merge - https://phabricator.wikimedia.org/T62720#1573240 (Niedzielski)
[21:08:53] greg-g: KPI KPI https://phabricator.wikimedia.org/T70114#1573197 ! :-D
[21:09:13] seems Zuul emits more or less metrics we could use for the queue time KPI
[21:12:08] hashar: sweet, I'll look at more deeply in a bit
[21:12:53] potentially one could create a nice dashboard on https://grafana.wikimedia.org/
[21:12:58] for the KPI
[21:13:06] yeah, that'd be awesome
[21:13:38] I never noticed the dashboards dashboard on that homepage
[21:13:39] :)
[21:14:18] that interface could use some training sessions :D
[21:16:03] I don't even understand how to add a graph to it
[21:16:09] it might be in puppet?
[21:16:48] https://www.mediawiki.org/w/index.php?search=grafana&title=Special%3ASearch&go=Go :( :(
[21:17:19] ah https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[21:20:27] adding a grafana dashboard is easy. It's basically the same as saving a search in Kibana. Grafana was originally a fork of Kibana (hence the similar name)
[21:21:00] I think that landing page is driven by "tags" on the dashboards
[21:21:41] oh even easier, it's just html in widgets
[21:22:23] I think I will just rant about grafana/kibana terrible UX until someone build the dashboard :-D
[21:23:00] https://commons.wikimedia.org/wiki/File:ELK_Tech_Talk_2015-08-20.pdf :)
[21:24:48] bd808: yeah still have to watch your youtube video :-}
[21:37:14] bd808: https://grafana.wikimedia.org/#/dashboard/db/releng-zuul !!!
[21:37:16] thank you :)
[21:39:21] Continuous-Integration-Infrastructure: Track and graph mean time to merge - https://phabricator.wikimedia.org/T70114#1573359 (hashar) Or in Grafana: https://grafana.wikimedia.org/#/dashboard/db/releng-zuul
[21:43:23] neat :)
[21:48:19] and sleep
[22:20:08] Continuous-Integration-Infrastructure: Track and graph mean time to merge - https://phabricator.wikimedia.org/T70114#1573538 (greg) a:hashar
[22:21:12] Continuous-Integration-Infrastructure: Track and graph mean time to merge - https://phabricator.wikimedia.org/T70114#734749 (greg) a:hashar>None I assigned to Antoine to run point, but now I'm unassigning. I'll let @hashar, @dduvall, and @zeljkofilipin figure it out :)
[23:08:26] Yippee, build fixed!
[23:08:27] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #241: FIXED in 11 min: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/241/
[23:24:48] Browser-Tests: Browser tests should support Ubuntu Chromium - https://phabricator.wikimedia.org/T63262#1573688 (dduvall) Open>Resolved a:dduvall Yes, we're reliably executing browser tests against Gerrit patches using chromium + chrome driver. (See {T103039})
[23:33:22] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<11.11%)
[23:43:06] !log reboot deployment-puppetmaster unreachable from other vms (labvirt1007 thing, probably)
[23:43:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[23:44:07] RECOVERY - Host deployment-puppetmaster is UPING OK - Packet loss = 0%, RTA = 0.81 ms
[23:47:13] !log deployment-puppetmaster showing signs of a corrupt disk "error: object file .git/objects/cc/026ba0cdc872490ef6a616b2bac4bb829639cd is empty" shutting it off for now.
[23:47:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[23:48:51] !log stopping puppetmaster and disabling puppet runs on deployment-puppetmaster until we get a chance to diagnose/rebuild (tomorrow)
[23:48:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[23:50:45] PROBLEM - Puppet staleness on deployment-puppetmaster is CRITICAL 42.86% of data above the critical threshold [43200.0]
[23:53:20] Release-Engineering, MediaWiki-Debug-Logger, Reading-Infrastructure-Team, Wikimedia-Logstash, and 2 others: Log php fatals with full backtraces again (fatal.log on fluorine) - https://phabricator.wikimedia.org/T89169#1573731 (bd808) I spent some time last weekend digging around in the HHVM source c...
[23:56:08] RECOVERY - Puppet failure on deployment-urldownloader is OK Less than 1.00% above the threshold [0.0]
[23:56:25] RECOVERY - Puppet failure on deployment-zotero01 is OK Less than 1.00% above the threshold [0.0]
[23:57:17] RECOVERY - Puppet failure on deployment-apertium01 is OK Less than 1.00% above the threshold [0.0]
[23:59:09] RECOVERY - Puppet failure on deployment-zookeeper01 is OK Less than 1.00% above the threshold [0.0]
[23:59:15] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0]
[23:59:29] RECOVERY - Puppet failure on deployment-elastic06 is OK Less than 1.00% above the threshold [0.0]