[00:44:43] Beta-Cluster-Infrastructure, Scap3: Automate beta scap3/keyholder setup - https://phabricator.wikimedia.org/T144647#2665966 (thcipriani)
[00:56:20] Scap3: scap deploy-local should make fewer assumptions about server/directories - https://phabricator.wikimedia.org/T146602#2665967 (thcipriani)
[00:57:31] Scap3: scap deploy-local should make fewer assumptions about server/directories - https://phabricator.wikimedia.org/T146602#2665979 (thcipriani) p:Triage>High a:thcipriani Current plan: ``` scap deploy-local [repo-path] [local-checkout-path] ``` This makes scap look more like `git clone`. Exa...
[07:02:06] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:08:38] Beta-Cluster-Infrastructure, scap: On beta, puppet keeps downgrading scap ! - https://phabricator.wikimedia.org/T146618#2666309 (hashar)
[08:09:37] PROBLEM - Puppet run on deployment-jobrunner02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[08:14:37] RECOVERY - Puppet run on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:58:27] Beta-Cluster-Infrastructure, Deployment-Systems: mediawiki::users::mwdeploy_pub_key hiera key should be purge - https://phabricator.wikimedia.org/T145495#2666447 (hashar) p:Triage>Low
[09:58:34] Beta-Cluster-Infrastructure, Mathoid: Move mathoid to deployment-sca* hosts in Beta Cluster - https://phabricator.wikimedia.org/T142255#2666449 (hashar) p:Triage>Normal
[09:58:42] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, Easy, Puppet: "Connect to 'deployment.eqiad.wmnet' instead" when you ssh into deployment-tin on Beta - https://phabricator.wikimedia.org/T146505#2666450 (hashar) p:Triage>Low
[09:58:49] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Patch-For-Review: Build an apt repository on deployment-prep (for testing packages from jenkins) - https://phabricator.wikimedia.org/T146497#2666451 (hashar) p:Triage>Normal
[09:59:07] Beta-Cluster-Infrastructure, scap: On beta, puppet keeps downgrading scap ! - https://phabricator.wikimedia.org/T146618#2666453 (hashar)
[09:59:09] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Patch-For-Review: Build an apt repository on deployment-prep (for testing packages from jenkins) - https://phabricator.wikimedia.org/T146497#2663278 (hashar)
[09:59:27] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Patch-For-Review: Build an apt repository on deployment-prep (for testing packages from jenkins) - https://phabricator.wikimedia.org/T146497#2663278 (hashar) Doesn't work quite well with `scap` which is pinned to a specific version i...
[09:59:42] Beta-Cluster-Infrastructure: beta cluster: Warning: failed to mkdir "/srv/mediawiki/php-master/images/thumb/... - https://phabricator.wikimedia.org/T145496#2666457 (hashar) p:Triage>High
[10:00:21] Beta-Cluster-Infrastructure, Release-Engineering-Team, Wikimedia-General-or-Unknown: Allow to test a mediawiki-config change to the beta cluster - https://phabricator.wikimedia.org/T136828#2666458 (hashar) p:Triage>Low a:hashar>None
[10:05:37] Beta-Cluster-Infrastructure, CirrusSearch, Discovery, Operations: Puppet sslcert::ca does not refresh the certificate symlinks when a .crt is updated - https://phabricator.wikimedia.org/T145609#2666467 (hashar)
[10:06:01] Beta-Cluster-Infrastructure, scap: On beta, puppet keeps downgrading scap ! - https://phabricator.wikimedia.org/T146618#2666471 (hashar) p:Triage>Normal
[10:06:07] Beta-Cluster-Infrastructure, CirrusSearch, Discovery, Operations: Puppet sslcert::ca does not refresh the certificate symlinks when a .crt is updated - https://phabricator.wikimedia.org/T145609#2635904 (hashar) p:Triage>Normal
[10:09:04] Beta-Cluster-Infrastructure, Goal, Tracking: Consolidate, remove, and/or downsize Beta Cluster instances to help with [[wikitech:Purge_2016]] - https://phabricator.wikimedia.org/T142288#2666473 (hashar)
[10:12:45] Beta-Cluster-Infrastructure, Labs, Labs-Infrastructure, Tracking: Log files on labs instance fill up disk (/var is only 2GB) (tracking) - https://phabricator.wikimedia.org/T71601#2666479 (hashar)
[10:12:46] Beta-Cluster-Infrastructure: Diamond logstash monitor fills /var/log/apache2 access log - https://phabricator.wikimedia.org/T74175#2666476 (hashar) Open>Resolved a:hashar That no longer appears. The main reason was /var being too small, which is no longer the case today.
[10:15:33] PROBLEM - Puppet run on integration-slave-trusty-1003 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[10:20:53] Beta-Cluster-Infrastructure, Operations: Check status of under_NDA group - https://phabricator.wikimedia.org/T142822#2666497 (hashar) Open>Resolved a:hashar I have removed the group. Every project member already had root access anyway.
[10:23:35] Beta-Cluster-Infrastructure, Release-Engineering-Team, Monitoring, Jenkins, Technical-Debt: Create metrics of Beta Cluster stability using a Jenkins job - https://phabricator.wikimedia.org/T106421#2666504 (hashar) Open>declined I am declining this, seems it is not much of a subject an...
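The T146602 plan above proposes `scap deploy-local [repo-path] [local-checkout-path]`, modelled on `git clone`. Below is a minimal sketch of how such a git-clone-style interface could be parsed. It rests on two assumptions, since the plan is truncated at "Exa...": repo-path is required, and an omitted checkout path defaults to the repository name with `.git` removed. `default_checkout_path` and `parse_args` are hypothetical helpers, not scap code.

```python
import argparse
import os


def default_checkout_path(repo_path: str) -> str:
    """Hypothetical git-clone-style default: last path component minus '.git'."""
    name = os.path.basename(repo_path.rstrip("/"))
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


def parse_args(argv=None) -> argparse.Namespace:
    # Hypothetical parser for the proposed interface:
    #   scap deploy-local [repo-path] [local-checkout-path]
    # Assumption: repo-path is required, local-checkout-path is optional.
    parser = argparse.ArgumentParser(prog="scap deploy-local")
    parser.add_argument("repo_path", help="repository to deploy from")
    parser.add_argument(
        "local_checkout_path",
        nargs="?",
        default=None,
        help="checkout directory (defaults to the repository name)",
    )
    args = parser.parse_args(argv)
    if args.local_checkout_path is None:
        args.local_checkout_path = default_checkout_path(args.repo_path)
    return args
```

For example, `parse_args(["/srv/repos/myservice.git"])` yields a checkout path of `myservice`, the same way `git clone` would name the directory.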
[10:24:18] Beta-Cluster-Infrastructure: Setup multiversion on Beta Cluster for nightly build browser testing support - https://phabricator.wikimedia.org/T67127#2666508 (hashar)
[10:24:20] Beta-Cluster-Infrastructure, Continuous-Integration-Config, WorkType-NewFunctionality: use a deployment branch for beta - https://phabricator.wikimedia.org/T130045#2666510 (hashar)
[10:25:34] Beta-Cluster-Infrastructure, Wikimedia-Site-requests: On beta metawiki, a mix of the beta enwiki and the production metawiki logos show - https://phabricator.wikimedia.org/T125942#2666511 (hashar) p:Low>Lowest
[10:27:32] Continuous-Integration-Infrastructure, DBA, MediaWiki-Database: Enable MariaDB/MySQL strict mode on CI slaves - https://phabricator.wikimedia.org/T119371#2666528 (hashar)
[10:28:47] !log beta: on deployment-pdf01 rm -fR /home/cscott/.npm/ T145343
[10:28:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:29:37] !log deployment-pdf01 apt-get upgrade / cleaning files left over etc
[10:29:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:32:54] !log beta: on deployment-pdf01 rm -fR /home/cscott/tmp/npm*
[10:32:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:35:02] Beta-Cluster-Infrastructure: deployment-pdf01 low free space warning - https://phabricator.wikimedia.org/T145343#2666565 (hashar) Open>Resolved a:hashar
[10:36:03] Beta-Cluster-Infrastructure, Discovery, Wikimedia-Portals, Patch-For-Review, Puppet: beta-mediawiki-config-update-eqiad failing with merge conflict in portals - https://phabricator.wikimedia.org/T129427#2666573 (hashar) Open>Resolved Haven't seen this one happening again. I am assuming...
[10:36:47] Beta-Cluster-Infrastructure, Operations, HHVM, Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2666578 (hashar)
[10:38:26] Beta-Cluster-Infrastructure, Operations, HHVM, Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2666597 (hashar) The deployment servers have been reimaged to Jessie: * deployment-mira * deployment-tin02 Last patch to land is https://gerrit.wiki...
[10:40:04] Continuous-Integration-Config: jenkins debian-glue job should use Wikimedia debian mirror - https://phabricator.wikimedia.org/T145508#2666599 (hashar) p:Triage>Low
[10:40:40] Continuous-Integration-Infrastructure: PHP7 support in CI (tracking) - https://phabricator.wikimedia.org/T144964#2666604 (hashar)
[10:40:42] Continuous-Integration-Config: Run MediaWiki tests on PHP 7 - https://phabricator.wikimedia.org/T144962#2666602 (hashar) Open>stalled This is stalled until we get more capacity for CI.
[10:41:09] Continuous-Integration-Scaling, Labs, Labs-Infrastructure, Patch-For-Review: Bump quota of Nodepool instances (contintcloud tenant) - https://phabricator.wikimedia.org/T133911#2666606 (hashar)
[10:41:10] Continuous-Integration-Config: Run MediaWiki tests on PHP 7 - https://phabricator.wikimedia.org/T144962#2666605 (hashar)
[10:42:20] Continuous-Integration-Infrastructure (phase-out-gallium), Operations, hardware-requests, netops, Patch-For-Review: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2666607 (hashar) Open>Resolved We had contint1001 allocated. It has a p...
[10:43:25] Continuous-Integration-Infrastructure: On Trusty PHP yields: PHP Deprecated: Comments starting with '#' are deprecated in /etc/php5/cli/conf.d/20-xhprof.ini on line 2 - https://phabricator.wikimedia.org/T135338#2666609 (hashar)
[10:44:45] Continuous-Integration-Infrastructure: On Trusty PHP yields: PHP Deprecated: Comments starting with '#' are deprecated in /etc/php5/cli/conf.d/20-xhprof.ini on line 2 - https://phabricator.wikimedia.org/T135338#2295773 (hashar) Open>declined We will live with that warning. Upstream hasn't replied on...
[10:45:16] Continuous-Integration-Infrastructure, Patch-For-Review: Disable core dumps generation on CI labs slaves - https://phabricator.wikimedia.org/T96025#2666626 (hashar) stalled>Resolved Closing again. The core dumps are actually quite useful and they are automatically garbage collected by Puppet.
[10:50:32] RECOVERY - Puppet run on integration-slave-trusty-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:50:57] Beta-Cluster-Infrastructure, Operations, Puppet: Make deployment-prep puppetmaster more similar to Production puppetmaster - https://phabricator.wikimedia.org/T146627#2666629 (yuvipanda)
[10:51:19] Continuous-Integration-Infrastructure, Epic, Patch-For-Review: Provide (pre-merge) code coverage reports on patchsets - https://phabricator.wikimedia.org/T101544#2666641 (hashar) a:Krinkle>None Stalled pending {T101545}
[10:52:46] Beta-Cluster-Infrastructure, Operations, Puppet: Make deployment-prep puppetmaster more similar to Production puppetmaster - https://phabricator.wikimedia.org/T146627#2666647 (yuvipanda)
[10:54:34] Continuous-Integration-Infrastructure, Release-Engineering-Team: Implement "CI overhead" KPI - https://phabricator.wikimedia.org/T108751#2666651 (hashar) Open>Resolved a:hashar Zuul has been upgraded and reports the time to wait for a change to be triggered / running. We also have several das...
[10:56:46] Beta-Cluster-Infrastructure, Labs, Operations, Puppet: Implement role based hiera lookups for labs - https://phabricator.wikimedia.org/T120165#2666657 (hashar)
[10:56:48] Beta-Cluster-Infrastructure, Operations, Puppet: Make deployment-prep puppetmaster more similar to Production puppetmaster - https://phabricator.wikimedia.org/T146627#2666656 (hashar)
[10:57:07] Beta-Cluster-Infrastructure, Operations, Puppet: Make deployment-prep puppetmaster more similar to Production puppetmaster - https://phabricator.wikimedia.org/T146627#2666629 (hashar) p:Triage>Normal
[10:57:50] Continuous-Integration-Infrastructure, ArchCom-RfC, Patch-For-Review, RfC: [RFC] Optional Travis integration for Jenkins - https://phabricator.wikimedia.org/T114421#2666662 (hashar)
[10:57:52] Continuous-Integration-Infrastructure: use Mac Mini for iOS Mobile app signing as a MW CI Jenkins slave - https://phabricator.wikimedia.org/T114569#2666660 (hashar) Open>declined We had that discussion months ago. Mobile team is using a third party since maintaining Mac / XCode is not possible for us.
[11:02:18] Continuous-Integration-Infrastructure, ArchCom-RfC, Patch-For-Review, RfC: [RFC] Optional Travis integration for Jenkins - https://phabricator.wikimedia.org/T114421#2666663 (hashar) stalled>declined There is no champion for it and almost all use cases are covered by our current CI setup....
[11:02:51] Continuous-Integration-Infrastructure, ArchCom-RfC, Patch-For-Review, RfC: [RFC] Optional Travis integration for Jenkins - https://phabricator.wikimedia.org/T114421#2666670 (hashar)
[11:02:53] Continuous-Integration-Infrastructure: ability to run CI for iOS Mobile app - https://phabricator.wikimedia.org/T114570#2666665 (hashar) Open>Resolved a:hashar >>! In T114569#2666660, @hashar wrote: > We had that discussion months ago. Mobile team is using a third party since maintaining Mac / X...
[11:03:40] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Patch-For-Review: Build an apt repository on deployment-prep (for testing packages from jenkins) - https://phabricator.wikimedia.org/T146497#2663278 (hashar) a:mmodell
[11:04:23] Continuous-Integration-Infrastructure: PHP7 support in CI (tracking) - https://phabricator.wikimedia.org/T144964#2666673 (hashar) Open>stalled Pending more capacity which is {T133911}
[11:05:01] Continuous-Integration-Config: Update puppet-lint to 2.* - https://phabricator.wikimedia.org/T144667#2666677 (hashar)
[11:05:15] Continuous-Integration-Config: Update puppet-lint to 2.* - https://phabricator.wikimedia.org/T144667#2606791 (hashar) p:Triage>Low
[11:05:27] Continuous-Integration-Infrastructure, Jenkins: Upgrade Jenkins from 1.x to 2.7.2 - https://phabricator.wikimedia.org/T144106#2666680 (hashar) p:Triage>High
[11:05:36] Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins: Upgrade Jenkins from 1.x to 2.7.2 - https://phabricator.wikimedia.org/T144106#2588690 (hashar)
[11:05:57] Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins: Upgrade Jenkins to 1.651.3 - https://phabricator.wikimedia.org/T144105#2666683 (hashar) p:Triage>High
[11:06:16] Continuous-Integration-Infrastructure, Patch-For-Review, Technical-Debt: Migrate CI labs slaves to use /srv instead of /mnt - https://phabricator.wikimedia.org/T146381#2666687 (hashar) p:Triage>Normal
[11:07:15] Browser-Tests-Infrastructure, Continuous-Integration-Config, Wikidata: Run Wikibase browser tests on gerrit triggered with keyword - https://phabricator.wikimedia.org/T145190#2666689 (hashar) p:Triage>Normal
[11:07:34] Browser-Tests-Infrastructure, Continuous-Integration-Config, Wikidata: Run Wikibase browser tests on gerrit triggered with keyword - https://phabricator.wikimedia.org/T145190#2622727 (hashar) Aren't we already triggering the job mwext-mw-selenium? But I am not sure how well it will play with Wikibas...
[11:07:52] Continuous-Integration-Infrastructure: PHP7 support in CI (tracking) - https://phabricator.wikimedia.org/T144964#2666696 (hashar) p:Triage>Normal
[11:08:01] Continuous-Integration-Infrastructure: Allow people with +2 rights to trigger recheck - https://phabricator.wikimedia.org/T144113#2666697 (hashar) p:Triage>Low
[11:08:24] Continuous-Integration-Infrastructure, Nodepool: 2016-08-10 CI incident follow-ups - https://phabricator.wikimedia.org/T142952#2552198 (hashar) p:Triage>Normal
[11:11:12] Continuous-Integration-Infrastructure, MediaWiki-extensions-UserMerge: Error: 1137 Can't reopen table: 'w1' - https://phabricator.wikimedia.org/T142240#2666708 (hashar) Open>declined That is similar to T101702, namely you can't join twice on temporary tables: https://dev.mysql.com/doc/refman/5.0/e...
[11:11:26] Continuous-Integration-Infrastructure, Zuul: Fix Zuul package "postinst called with unknown argument `triggered' - https://phabricator.wikimedia.org/T146084#2650129 (hashar) p:Triage>Normal
[11:11:58] Continuous-Integration-Infrastructure: Install php7 and the php-ast extension so etsy/phan can be run from jenkins - https://phabricator.wikimedia.org/T132636#2205146 (hashar) p:Triage>Low
[11:12:21] Continuous-Integration-Infrastructure, MediaWiki-Unit-tests: Use strict mode when running hhvm tests on jenkins - https://phabricator.wikimedia.org/T132270#2193039 (hashar) p:Triage>Low
[11:13:27] Continuous-Integration-Infrastructure, HHVM: CI slaves should have HHVM call the exception user handler so we have useful stack trace on fatal errors - https://phabricator.wikimedia.org/T126473#2015381 (hashar) p:Triage>Normal
[11:13:49] Continuous-Integration-Infrastructure: QUnit and npm jobs sometimes fail with "npm ERR! code ENOENT" - https://phabricator.wikimedia.org/T130895#2666726 (hashar) Open>Resolved a:hashar I guess it was a transient issue.
[11:13:53] Continuous-Integration-Infrastructure: QUnit and npm jobs sometimes fail with "npm ERR! code MODULE_NOT_FOUND" - https://phabricator.wikimedia.org/T129711#2666730 (hashar) Open>Resolved a:hashar I guess it was a transient issue.
[11:15:20] Continuous-Integration-Config, Patch-For-Review, Upstream, Zuul: free some repositories from their unintended chain to mediawiki/core - https://phabricator.wikimedia.org/T107529#2666761 (hashar)
[11:15:22] Continuous-Integration-Infrastructure: Merge tests even though a repo is being tested before it - https://phabricator.wikimedia.org/T122602#2666763 (hashar)
[11:15:42] Continuous-Integration-Infrastructure: Find out which jobs are using ElasticSearch as a backend - https://phabricator.wikimedia.org/T112667#1641905 (hashar) p:Triage>Low
[11:16:00] Continuous-Integration-Infrastructure, Deployment-Systems, Release-Engineering-Team: Unify deployment of integration/config.git changes using the official Wikimedia deployment system - https://phabricator.wikimedia.org/T111559#1610383 (hashar) p:Triage>Normal
[11:17:27] Continuous-Integration-Infrastructure, Patch-For-Review, WorkType-Maintenance: Jenkins job mediawiki-extensions-qunit Karma timeout on odd-numbered build slaves - https://phabricator.wikimedia.org/T122449#2666776 (hashar)
[11:17:29] Continuous-Integration-Config, Analytics-EventLogging: EventLogging CI jobs hit meta.wikimedia.org - https://phabricator.wikimedia.org/T122463#2666774 (hashar) Open>declined Apparently does not cause much harm.
[11:18:36] Continuous-Integration-Config, Ruby, User-zeljkofilipin: create and use rubocop-wikimedia package - https://phabricator.wikimedia.org/T134895#2666780 (hashar) p:Triage>Low
[11:18:58] Continuous-Integration-Config, Utilities-mwdumper, Jenkins, Patch-For-Review: Re-add mwdumper builds to continuous integration / jenkins - https://phabricator.wikimedia.org/T133456#2666781 (hashar) Open>Resolved a:hashar
[11:20:00] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Documentation, Jenkins: Jenkins-mwext-sync needs documentation - https://phabricator.wikimedia.org/T62793#2666785 (hashar) Open>declined That script and the associated job have been removed. That was merely to work arou...
[11:24:45] !log beta: mass upgrading all debian packages on all instances
[11:24:49] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:27:15] PROBLEM - Puppet run on deployment-sca01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[11:28:28] PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[11:29:38] PROBLEM - Host deployment-salt02 is DOWN: CRITICAL - Host Unreachable (10.68.17.58)
[11:30:11] bah
[11:30:12] salt died
[11:30:22] PROBLEM - Puppet run on deployment-conftool is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[11:31:20] !log rebooting deployment-salt02, it has a kernel soft lock while hitting the disk
[11:31:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:34:37] RECOVERY - Host deployment-salt02 is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms
[11:35:22] Browser-Tests-Infrastructure, Continuous-Integration-Config, Wikidata, User-zeljkofilipin: Run Wikibase browser tests on gerrit triggered with keyword - https://phabricator.wikimedia.org/T145190#2666832 (zeljkofilipin) I think the task is about running a subset of test if you reply to a gerrit
[11:35:49] !log deployment-salt02 : autoremoving a bunch of java related packages
[11:35:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:37:31] PROBLEM - Puppet run on deployment-tmh01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[11:41:19] PROBLEM - Puppet run on deployment-salt02 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[11:42:14] RECOVERY - Puppet run on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:45:24] RECOVERY - Puppet run on deployment-conftool is OK: OK: Less than 1.00% above the threshold [0.0]
[11:46:18] RECOVERY - Puppet run on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:50:08] PROBLEM - SSH on deployment-cache-text04 is CRITICAL: Connection refused
[11:50:26] PROBLEM - SSH on deployment-cache-upload04 is CRITICAL: Connection refused
[11:55:07] RECOVERY - SSH on deployment-cache-text04 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[11:55:27] RECOVERY - SSH on deployment-cache-upload04 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[11:57:29] RECOVERY - Puppet run on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:59:00] errors above are all from me
[12:03:30] RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:03:47] Gerrit, Developer-Relations: Add a welcome bot to Gerrit for first time contributors - https://phabricator.wikimedia.org/T73357#2666867 (Aklapper) a:Legoktm >>! In T73357#2627021, @Legoktm wrote: > If someone drafts up message text, I'd be happy to do the technical implementation. See above. :) @Le...
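The PROBLEM/RECOVERY lines above report the share of recent datapoints that exceeded a threshold of 0.0. The percentages fit small whole-number counts: 2 of 9 gives 22.22%, 3 of 8 gives 37.50%, 1 of 8 gives 12.50%. Below is a minimal sketch of that arithmetic. It assumes null datapoints are ignored and that the check reports OK only when fewer than 1% of datapoints are above the threshold. The real check's sampling window and cut-offs are not visible in this log, and `percent_above` and `check_status` are hypothetical names.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above `threshold`."""
    values = [v for v in datapoints if v is not None]  # assumption: nulls ignored
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def check_status(datapoints, threshold: float = 0.0, ok_below_percent: float = 1.0) -> str:
    # Assumed rule: OK below 1% (as the RECOVERY lines read), CRITICAL otherwise.
    pct = percent_above(datapoints, threshold)
    if pct < ok_below_percent:
        return f"OK: Less than {ok_below_percent:.2f}% above the threshold [{threshold}]"
    return f"CRITICAL: {pct:.2f}% of data above the critical threshold [{threshold}]"
```

With nine samples where two failed puppet runs were recorded, `check_status([0, 0, 0, 0, 0, 0, 0, 1, 3])` produces the same text as the deployment-sca01 alert at 11:27:15.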
[12:32:52] (Abandoned) Hashar: Add rm -fR "$WORKSPACE/modules/*/bin" to jenkins job operations-puppet-doc [integration/config] - https://gerrit.wikimedia.org/r/307654 (https://phabricator.wikimedia.org/T143233) (owner: Paladox)
[12:37:25] zeljkof: poke
[12:37:35] zeljkof: I am going to get Firefox upgraded from v46 to v49 on trusty
[12:37:37] hashar: peek?
[12:37:42] for T137561
[12:37:45] uh oh
[12:37:50] just removing https://gerrit.wikimedia.org/r/#/c/293739/3/modules/contint/manifests/browsers.pp
[12:38:18] hashar: but wait
[12:38:24] that will break all selenium tests
[12:38:42] 47.0 was broken
[12:38:46] got fixed by 47.0.1
[12:40:11] zeljkof: or is selenium webdriver broken in 49 as well ? :D
[12:41:33] hashar: yes, the situation is complicated
[12:41:39] our current selenium stack will work up to firefox 47.0.1
[12:41:53] it will be completely broken with firefox 48+
[12:41:59] have they dropped/broken selenium with 48+ really ?
[12:42:00] for 48+ we need geckodriver
[12:42:10] and my research says it is not ready
[12:42:26] https://phabricator.wikimedia.org/T137540#2656073
[12:42:47] selenium needs firefoxdriver for firefox 47 and earlier
[12:42:57] and it needs geckodriver for firefox 48+
[12:43:52] oh come on, it is not even mentioned in https://www.mozilla.org/en-US/firefox/48.0/releasenotes/ ? :(
[12:44:19] hashar: let me check
[12:44:39] well, it does
[12:44:46] but you have to read between the lines :(
[12:44:48] "Add-ons that have not been verified and signed by Mozilla will not load"
[12:44:54] firefoxdriver is not signed
[12:44:57] geckodriver is signed
[12:45:17] ahhh
[12:45:28] and they can't sign the webdriver version, right?
[12:46:16] hashar: according to a post on a mailing list, firefoxdriver cannot be signed, since it uses unsafe apis, or something like that
[12:46:33] and geckodriver is a complete rewrite, so it is safe, and can be signed
[12:48:39] bah I lost phabricator
[12:48:41] or internet
[12:49:31] zeljkof: ok so I am holding my changes
[12:49:53] hashar: please do
[12:50:08] zeljkof: so should we just drop Firefox entirely ?
[12:50:08] if you need to upgrade firefox, up to 47.0.1 is fine
[12:50:19] (excluding 47.0.0, that one is broken)
[12:50:23] or spend time to get Marionette support + create a debian package for geckodriver ?
[12:50:38] hashar: geckodriver is currently not ready :(
[12:50:45] 40% of our tests fail
[12:50:50] tested
[12:51:08] if you need firefox 48+ we can switch selenium tests to chrome/chromium
[12:53:51] oh it was just to get firefox upgraded
[12:53:53] but yeah
[12:54:03] most probably we should just drop firefox entirely and save us a bunch of time
[12:54:44] I don't think I have the energy to get geckodriver / rust properly packaged
[13:00:28] Browser-Tests-Infrastructure, Continuous-Integration-Config, Upstream, User-zeljkofilipin: Firefox v47 breaks mediawiki_selenium - https://phabricator.wikimedia.org/T137561#2666979 (hashar) From a quick check-in with zelko. Since v48 ([[ https://www.mozilla.org/en-US/firefox/48.0/releasenotes/ |...
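The compatibility rules zeljkof lays out above can be condensed into a small lookup: firefoxdriver up to 47.0.1, with 47.0.0 known to be broken, and geckodriver from 48 on. This is a sketch of those stated rules only. `pick_driver` is a hypothetical helper that accepts dotted numeric versions. It also says nothing about whether geckodriver is usable yet; the thread reports about 40% of tests failing with it.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '47.0.1' into (47, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_driver(firefox_version: str) -> str:
    """Return the Selenium driver the thread says a Firefox version needs."""
    v = parse_version(firefox_version)
    # Pad to three components so '47', '47.0' and '47.0.0' compare alike.
    v = v + (0,) * (3 - len(v))
    if v[0] >= 48:
        # 48+ refuses unsigned add-ons, and firefoxdriver is an unsigned add-on.
        return "geckodriver"
    if v == (47, 0, 0):
        raise ValueError("Firefox 47.0.0 is broken with selenium; use 47.0.1")
    return "firefoxdriver"
```

Raising on 47.0.0, instead of silently returning a driver, reflects the advice in the thread to skip that release entirely.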
[13:03:37] (Abandoned) Hashar: Merge branch 'debian/precise-wikimedia' into debian/jessie-wikimedia [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/307276 (owner: Paladox)
[13:03:40] (Abandoned) Hashar: New release 2.5.0-8-gcbc7f62-wmf2jessie1 [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/307277 (owner: Paladox)
[13:03:49] (Abandoned) Hashar: Merge branch 'upstream' [integration/zuul] - https://gerrit.wikimedia.org/r/307308 (owner: Paladox)
[13:08:02] (Abandoned) Hashar: Add env params['RDOCOPT'] = '--exclude=/modules/[^/]*/bin/.*$' [integration/config] - https://gerrit.wikimedia.org/r/309327 (https://phabricator.wikimedia.org/T143233) (owner: Paladox)
[13:08:57] (Abandoned) Hashar: Update some libs [integration/docroot] - https://gerrit.wikimedia.org/r/311345 (https://phabricator.wikimedia.org/T109747) (owner: Paladox)
[13:11:07] so there was a brief phabricator outage
[13:11:19] I am going to create a ticket
[13:14:33] (PS4) Hashar: Add checkstyle publisher for mediawiki-core-phpcs job [integration/config] - https://gerrit.wikimedia.org/r/243869 (https://phabricator.wikimedia.org/T113865) (owner: Legoktm)
[13:20:10] twentyafterfour_: ostriches phab search is down?
[13:20:30] phab itself seems to be down now
[13:21:51] madhuvishy: chasemp: yeah poke -operations about it. It seems to flap due to some mysql error
[13:22:19] hashar: I don't think it's on mysql's side, it's phab doing unwise things causing this issue
[13:22:40] long queries that lock and cause phab itself to choke
[13:22:57] it would be good if twentyafterfour_ was around
[13:23:07] greg-g_: ^
[13:24:20] Relatedly, it feels (no data to support my impression) like it's become way easier to get a 504 timeout when running a search on Phab. https://phabricator.wikimedia.org/T146562
[13:25:17] I am pretty sure it's the traditional 'all of ops are in one room with crappy internet, let us blow something up' that happens every year. Thanks, universe
[13:27:16] (CR) Hashar: "the other publisher got overridden." [integration/config] - https://gerrit.wikimedia.org/r/243869 (https://phabricator.wikimedia.org/T113865) (owner: Legoktm)
[13:27:26] (PS5) Hashar: Add checkstyle publisher for mediawiki-core-phpcs job [integration/config] - https://gerrit.wikimedia.org/r/243869 (https://phabricator.wikimedia.org/T113865) (owner: Legoktm)
[13:30:16] thcipriani: good morning :]
[13:30:34] Continuous-Integration-Config, MediaWiki-Codesniffer: Add cache support to PHPCode_Sniffer jobs - https://phabricator.wikimedia.org/T146644#2667077 (hashar)
[13:30:35] thcipriani: I am pretty sure the puppet git autorebase ends up in a race condition with puppetmaster
[13:30:49] That would make some sense
[13:30:54] but I have no understanding as to why it started being an issue only recently
[13:31:30] for the integration puppetmaster? It is running completely new code as of last Monday, so if you run into any weird issue please file phab tickets and cc me :)
[13:31:52] (CR) Hashar: [C: 2] "Thank you Kunal. I have deployed your change and it is live as shown on https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs-trus" [integration/config] - https://gerrit.wikimedia.org/r/243869 (https://phabricator.wikimedia.org/T113865) (owner: Legoktm)
[13:32:42] yuvipanda: that is mostly on beta
[13:32:45] I think this happens on deployment-puppetmaster (exclusively?)
[13:32:56] (Merged) jenkins-bot: Add checkstyle publisher for mediawiki-core-phpcs job [integration/config] - https://gerrit.wikimedia.org/r/243869 (https://phabricator.wikimedia.org/T113865) (owner: Legoktm)
[13:33:00] ah, ok. then it's different.
[13:33:02] yuvipanda: the CI puppetmaster is apparently all fine. Thanks for having switched it to whatever new stuff/code :]
[13:33:05] me and krenair will change that too in a few weeks
[13:33:49] and eventually deployment-prep will have similar puppet infra as prod (including exported resources)
[13:34:02] yeah seen your task
[13:34:57] also, one random thought, the git rebase wouldn't be mangling the same files at the same time since there has to be a puppet run on the deployment-puppetmaster to put those files in place, right?
[13:35:23] ("same files" meaning the files being used during puppet runs)
[13:35:33] it is a 10-minute cron
[13:35:52] with X instances asking puppetmaster for catalog compilation
[13:36:09] I guess a noticeable chunk of those catalog requests ends up being executed when the rebase occurs
[13:36:10] what is the problem y'all are talking about?
[13:36:15] and thus has files missing / disappearing maybe
[13:36:21] yeah, but I mean, /var/lib/git/operations/puppet isn't where the puppetmaster is looking for files
[13:36:36] hmm
[13:36:40] that must be there ? :D
[13:36:50] thcipriani: hashar we are going to get invasive with phabricator here out of necessity, and by we I mean mostly DBAs, but the search index operations are causing havoc
[13:36:54] or else how does the puppet master know about the cherry pick one does in /var/lib/git/operations/puppet ?
[13:37:11] greg-g_: ^ it would be good if twentyafterfour_ or ostriches were around
[13:37:30] chasemp: invasive? Do you mean use this channel for sync?
[13:38:00] chasemp: I don't think greg-g will be on IRC now, it's not even 7AM yet on a Monday
[13:38:31] chasemp: sounds reasonable, I'm not too familiar with it, if phab is having problems. ostriches should be around shortly, he's an early riser typically.
[13:40:14] hashar: my understanding (that might be wrong) is that deployment-puppetmaster has to have a puppet-run before changes show up since it's looking in /etc/puppet for modules, manifests, et al.
[13:40:36] our entire Spanish DBA dept is looking into it
[13:41:32] * hashar orders a large bowl of taps
[13:41:34] tapas
[13:41:35] grr
[13:42:04] chasemp: maybe it is a large amount of requests due to a bot crawling the site
[13:42:10] or some bot doing heavy searches
[13:42:14] * andre__ thanks everybody investigating and tries to find non-Phab work :P
[13:42:21] I don't have access to the server or phabricator logs so can't really look into it
[13:42:47] thcipriani: ahh I see the confusion. The /etc/puppet/{manifests,hieradata,modules} are symlinked to /var/lib/git/operations/puppet equivalents
[13:42:53] thcipriani: so a pull / cherry pick is instantly live
[13:43:12] and I don't think we can pause / hold the puppetmaster while doing it
[13:46:27] Yippee, build fixed!
[13:46:28] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #157: FIXED in 2 min 27 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/157/
[13:48:57] ah, that makes sense :)
[13:54:05] we have wiped the search table
[13:54:12] and converted to innodb
[14:10:16] Release-Engineering-Team, Developer-Relations, Wikimedia-Blog-Content: blog.wikimedia.org post on Phabricator improvements - https://phabricator.wikimedia.org/T141457#2667267 (MelodyKramer) @EdErhart-WMF and @jeffelder are the contacts for the blog! I would email the two of them about your plans!
[14:18:07] Release-Engineering-Team, Monitoring, Operations, Performance-Team, Wikimedia-Incident: MediaWiki load time regression should trigger an alarm / page people - https://phabricator.wikimedia.org/T146125#2651529 (ori) #performance-team is considering making this the focus of our off-site.
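hashar's explanation above is that `/etc/puppet/{manifests,hieradata,modules}` are symlinks into `/var/lib/git/operations/puppet`. A pull or cherry-pick is therefore live immediately, and a catalog compile that overlaps the 10-minute rebase cron could see files vanish. The sketch below reproduces that in a temporary directory on a POSIX system. The module name `example` and its file contents are made up for illustration, and deleting the file only stands in for whatever a rebase does to the working tree.

```python
import os
import tempfile
from typing import Tuple


def demo_symlinked_modules() -> Tuple[str, bool]:
    """Return (content read via the symlinked path, still visible mid-'rebase')."""
    with tempfile.TemporaryDirectory() as root:
        git_modules = os.path.join(root, "var", "lib", "git", "operations", "puppet", "modules")
        etc_puppet = os.path.join(root, "etc", "puppet")
        manifest_dir = os.path.join(git_modules, "example", "manifests")
        os.makedirs(manifest_dir)
        os.makedirs(etc_puppet)
        # Stand-in for the /etc/puppet/modules symlink described in the log.
        os.symlink(git_modules, os.path.join(etc_puppet, "modules"))

        # A cherry-pick lands in the git checkout...
        git_file = os.path.join(manifest_dir, "init.pp")
        with open(git_file, "w") as fh:
            fh.write("class example {}\n")

        # ...and is readable through the /etc/puppet path straight away.
        etc_file = os.path.join(etc_puppet, "modules", "example", "manifests", "init.pp")
        with open(etc_file) as fh:
            content = fh.read()

        # If a rebase briefly removes the file, a catalog compile running at
        # that moment would not find it through /etc/puppet either.
        os.remove(git_file)
        visible_mid_rebase = os.path.exists(etc_file)
        return content, visible_mid_rebase
```

Because the symlink adds no copy step, no puppet run on deployment-puppetmaster is needed before a change shows up. This is the point that resolved the confusion at 13:42:47.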
[14:25:00] thcipriani: our Phab boards are never getting clean :(((( eg https://phabricator.wikimedia.org/tag/wikimedia-log-errors/
[14:25:07] been solving / fixing a few boards this morning
[14:25:10] but that is really long and tedious
[14:38:04] hashar: yeah, board triage has really fallen apart.
[14:42:53] chasemp: I'm around now
[14:43:25] twentyafterfour_: good morning. apparently the Phabricator search index went crazy/stall
[14:43:35] DBAs have converted the database table to innodb
[14:43:41] and regenerated the index
[14:43:45] ok
[14:48:55] Beta-Cluster-Infrastructure, Labs, Operations, Puppet: Implement role based hiera lookups for labs - https://phabricator.wikimedia.org/T120165#2667411 (yuvipanda) I'm no longer convinced we have to do this. https://phabricator.wikimedia.org/T91990 should cover most of the things we want from this.
[14:56:29] thcipriani: looks like a bunch of wikimedia-log-errors tasks are actually fixed :]
[14:58:05] hashar hi, i found a new skin for gerrit for us https://github.com/shellscape/OctoGerrit/
[14:58:16] It has the icons to tell if it is open or merge
[14:59:01] Im definitely going to try it now
[14:59:09] it looks more better than our current skin
[14:59:19] I use that board for 2 things: (1) new mid-intensity info-level (or similar low impact) errors I throw a task in phab on that board. (2) things that cause me to roll-back the train, I add a task to that board and add it as a blocker for the wmf.xx task.
[14:59:30] I never look at that board :\
[15:01:11] IIRC for a while we were triaging that board on wednesdays and then we stopped because reasons.
[15:01:39] Check this out
[15:01:39] http://gerrit-test.wmflabs.org/gerrit/#/dashboard/self
[15:15:34] Scap3: remove hard-coded upstart commands - https://phabricator.wikimedia.org/T146656#2667455 (thcipriani)
[15:15:48] Scap3: remove hard-coded upstart commands - https://phabricator.wikimedia.org/T146656#2667467 (thcipriani) p:Triage>Normal
[15:19:48] paladox: are they skins build on top of the Gerrit REST API ?
[15:20:03] No
[15:20:07] using the gerrit.css file
[15:20:25] We can customise it, maybe even make it mobile efficient
[15:20:30] oh mean https://phabricator.wikimedia.org/tag/mediawiki-parser/ has more than 500 bugs :(
[15:22:23] thcipriani: I guess you can attach https://phabricator.wikimedia.org/T146656 and https://gerrit.wikimedia.org/r/#/c/312705/
[15:22:32] thcipriani: I will do
[15:52:36] chasemp: btw, there is the contact list on officewiki if you need people in an emergency (I lost my backscroll, just saw your ping before I restarted) *(really I just don't plan to look at my irclogs in less unless I need to)
[15:56:31] Hey on phone, I wasnt trying to sandbag greg-g. My thinking was as tyler and hashar were around to let them know directly and make the call on who to call
[15:58:08] My process was something like notify releng and let releng allocate internally, not sure where that falls on the contact ppl continuum. We were ok afa jynus and I working things out but it was an obv fyi need
[15:58:35] but terrible internet here and it was all a struggle
[15:59:54] I meant and let them make the call :)
[16:00:01] I'm actually not sure how I could help with the database issue but would have been able to help diagnose I suppose
[16:00:37] basically and also to babysit as I kicked off a full search reindex
[16:00:48] and that hasnt been done in a long time
[16:00:57] so....possible weirdness
[16:00:57] yeah ...
[16:01:47] so I kind of expected folks within releng to make that figurative call since we were mid fire fight but ok
[16:02:21] is that a bad approach?
[16:02:31] Its kind of untested waters
[16:02:40] chasemp: yup yup, just reminding if you needed someone and I'm not here :)
[16:02:51] thanks for responding (I still don't know what happened, just mentally booting up still)
[16:03:50] chase probably still has my number if he needs it
[16:03:53] chasemp: if you want someone familiar with the service from releng to make a call, give 'em a call ;)
[16:09:01] Yeah fair, I have a few thoughts but its not urgent. I would have started calling pretty soon if no resolution etc. I do kind of feel like if someone from a team is around they are on the hook for marshalling their own ppls. Totally undefined and I should have been explicit either way, just my thinking in the moment
[16:32:58] PROBLEM - Keyholder status on deployment-tin02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:35:59] (CR) Paladox: "@hashar but this breaks nothing, this includes up to date css changes, to apply to current browsers." [integration/docroot] - https://gerrit.wikimedia.org/r/311345 (https://phabricator.wikimedia.org/T109747) (owner: Paladox)
[16:36:17] (CR) Paladox: "This may fix the bug linked other wise we can use the css I have wrote in the comment" [integration/docroot] - https://gerrit.wikimedia.org/r/311345 (https://phabricator.wikimedia.org/T109747) (owner: Paladox)
[17:00:32] chasemp: oh, totally (re marshalling own peeps)
[17:00:40] I'd expect the same
[17:01:00] I do when there are release blockers "This is now a teamX issue, take care of it and report back, now"
[17:06:06] Release-Engineering-Team, Phabricator: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2667974 (jcrespo)
[17:07:01] Release-Engineering-Team, Phabricator: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2667988 (jcrespo)
[17:13:37] Release-Engineering-Team, Phabricator: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2668002 (jcrespo) In particular see how after converting the table, all table locking issues get converted into row locking instances which will b...
[17:14:46] Release-Engineering-Team, Phabricator: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2668011 (jcrespo) Also, re indexing is still in progress: https://grafana-admin.wikimedia.org/dashboard/db/mysql?panelId=2&fullscreen&from=1474886...
[17:18:38] twentyafterfour: is that something we want to ping evan about directly? ^
[17:32:24] Release-Engineering-Team, Phabricator: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2668118 (mmodell) We previously abandoned elasticsearch for phabricator. It might be time to look into that again.
[17:32:49] greg-g: I can mention it and see what he has to say about myisam vs innodb
[17:33:41] Release-Engineering-Team, Phabricator: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2668122 (mmodell) p:Triage>High
[17:37:24] Release-Engineering-Team, Phabricator: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2668131 (mmodell) Someone asked upstream about why the table isn't innodb and epriestley [[ https://secure.phabricator.com/T4130#65039 | responded...
[18:11:41] Release-Engineering-Team, Phabricator: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2668203 (mmodell) It looks like innodb does not support stemming, among other limitations.
[18:13:46] Release-Engineering-Team, Phabricator: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2668206 (Paladox) @mmodell is stemming a full text search if so MySQL 5.6 or 5.7+ support full text searches. Also mariadb supports full search i...
[18:21:35] Release-Engineering-Team, Wikimedia-log-errors: Error: Couldn't find trailer dictionary - https://phabricator.wikimedia.org/T145772#2668241 (hashar)
[18:22:00] Continuous-Integration-Config, MediaWiki-Codesniffer: Add cache support to PHPCode_Sniffer jobs - https://phabricator.wikimedia.org/T146644#2668245 (Legoktm)
[18:22:02] MediaWiki-Codesniffer: Update squizlabs/PHP_CodeSniffer to 3.x - https://phabricator.wikimedia.org/T142474#2668244 (Legoktm)
[18:24:53] Release-Engineering-Team, Wikimedia-log-errors: Error: Couldn't find trailer dictionary - https://phabricator.wikimedia.org/T145772#2668251 (hashar)
[18:43:24] Release-Engineering-Team, Wikimedia-log-errors: Error: Couldn't find trailer dictionary - https://phabricator.wikimedia.org/T145772#2668329 (hashar)
[19:51:40] (PS1) Mholloway: Whitelist Neslihan [integration/config] - https://gerrit.wikimedia.org/r/312868
[20:00:39] (CR) Paladox: [C: 1] Whitelist Neslihan [integration/config] - https://gerrit.wikimedia.org/r/312868 (owner: Mholloway)
[20:07:43] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[20:09:33] PROBLEM - Host deployment-tin is DOWN: CRITICAL - Host Unreachable (10.68.17.240)
[20:43:27] Release-Engineering-Team, AbuseFilter, Wikimedia-Logstash, Wikimedia-log-errors: Should we keep StashEdit logs in AbuseFilter? - https://phabricator.wikimedia.org/T146697#2668717 (hashar)
[20:45:22] Release-Engineering-Team, AbuseFilter, Wikimedia-Logstash, Wikimedia-log-errors: Should we keep StashEdit logs in AbuseFilter? - https://phabricator.wikimedia.org/T146697#2668732 (hashar)
[21:44:08] anybody knows what Generic.Metrics.CyclomaticComplexity.TooHigh is actually doing? it told me cyclomatic complexity (21) exceeds allowed maximum of 20
[21:44:19] I refactored the code, and now it says 20 exceeds maximum of 16
[21:44:25] is there some way to shut it up?
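For context on the question above: in McCabe's formulation, cyclomatic complexity is 1 plus the number of decision points in a function, so every branch (an `if` or `elseif`, a loop, a `case`, a `catch`) adds one, and some variants also count short-circuit boolean operators. The sketch below applies that idea to Python source with the standard `ast` module. It is an illustration of the metric only: PHP_CodeSniffer's `Generic.Metrics.CyclomaticComplexity` sniff works on PHP tokens, and the exact set of constructs it counts depends on the PHP_CodeSniffer version.

```python
import ast

# Node types treated as one extra independent path each. This is an
# approximation of McCabe's metric for Python, not the PHPCS sniff's rules.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Return 1 + the number of decision points found in `source`."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            # An `elif` is parsed as a nested If, so it is counted too.
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` can short-circuit at two places.
            complexity += len(node.values) - 1
    return complexity
```

Under this counting, splitting one large function into helpers lowers each function's score, because the decision points are distributed across several bodies, each starting again from 1.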
[21:58:43] Project selenium-PageTriage » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #158: FAILURE in 42 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/158/
[22:09:27] Release-Engineering-Team, User-greg: Create agenda outline for 2016 RelEng team offsite - https://phabricator.wikimedia.org/T138437#2669027 (greg) And now at: https://docs.google.com/document/d/1lmxtQkAuDJY4Vv8oFWihSmhz1y-JgUzsb11ebFCOz6g/edit# :)
[22:10:01] Release-Engineering-Team, User-greg: Create agenda outline for 2016 RelEng team offsite - https://phabricator.wikimedia.org/T138437#2669043 (greg)
[22:13:15] Scap3, Services, User-mobrovac: Scap config management: Jinja2 fills templates with Pythonic values - https://phabricator.wikimedia.org/T145510#2632657 (mmodell) http://stackoverflow.com/a/17661969/1672995
[22:14:43] twentyafterfour, thcipriani: around?
[22:14:55] Krenair: what's up
[22:15:05] either of you know keyholder much?
[22:15:10] yes
[22:15:23] I know it all too well
[22:15:25] I noticed /usr/lib/nagios/plugins/check_keyholder is now failing
[22:15:48] Because 'ssh-add -l' is not returning lines with file paths in it, like we expect
[22:15:54] instead we have lines like this from it
[22:15:57] 4096 5d:ae:1b:24:40:20:89:b3:e1:74:51:9a:e7:64:a3:5d rsa w/o comment (RSA)
[22:16:04] so it cuts to 'rsa'
[22:16:17] yeah ... because of switching to jessie I think?
[22:16:18] ugh and before we were checking key names vs files in the directory :\
[22:18:36] hrm, I guess the easiest way to fix it would be to have configured_keys do ssh-keygen -l -f
[22:18:53] we have openssh-client 1:6.7p1-5+deb8u3
[22:21:53] what's the version of openssh with the regression?
/me looks for ticket
[22:25:06] Continuous-Integration-Config, RESTBase, Wikipedia-Android-App-Backlog: Kick off periodic Android CI tests when RESTBase is updated on beta labs - https://phabricator.wikimedia.org/T146488#2662808 (greg) 1) work on getting it consistently green first :) https://integration.wikimedia.org/ci/job/apps-...
[22:25:08] might be https://github.com/openssh/openssh-portable/commit/141efe49542f7156cdbc2e4cd0a041d8b1aab622 ?
[22:25:43] hm, no
[22:29:17] https://github.com/openssh/openssh-portable/commit/1195f4cb07ef4b0405c839293c38600b3e9bdb46
[22:30:08] Release-Engineering-Team (Deployment-Blockers), MediaWiki-extensions-WikibaseRepository, Wikidata, Beta-Cluster-reproducible, and 7 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2669090
[22:30:32] openssh thing found via https://gerrit.wikimedia.org/r/#/c/310793/
[22:32:46] Release-Engineering-Team, Developer-Relations, Wikimedia-Blog-Content: blog.wikimedia.org post on Phabricator improvements - https://phabricator.wikimedia.org/T141457#2669094 (greg) Quick thought before I (or someone) emails them: This draft is so far pretty much just a boiler plate plus a list of li...
[22:32:53] so I guess that's just the way things are now. Either way, in the short term we can change the icinga check to use: `SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh-add -l | cut -d' ' -f 2 | sort` and `for key in /etc/keyholder.d/*.pub; do ssh-keygen -l -f "$key" | cut -d' ' -f2; done | sort`
[22:36:09] wanna upload a puppet patch?
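The short-term fix proposed above stops relying on the comment field (which used to be a file path and is now `rsa w/o comment`) and compares key fingerprints, field 2 of both `ssh-add -l` and `ssh-keygen -l -f` output. A minimal Python sketch of the same comparison follows. It assumes the `/run/keyholder/proxy.sock` socket and `/etc/keyholder.d/*.pub` layout from the chat; the function names are hypothetical, and this is not the actual `check_keyholder` plugin.

```python
import glob
import os
import subprocess

# Paths from the proposed icinga check; function names are hypothetical.
PROXY_SOCK = "/run/keyholder/proxy.sock"
KEY_DIR = "/etc/keyholder.d"


def fingerprints(output: str) -> set:
    """Collect field 2 (the fingerprint) of each `-l` output line.

    Lines look like "<bits> <fingerprint> <comment> (<type>)", so the
    fingerprint position does not depend on what the comment contains.
    """
    found = set()
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and messages such as "The agent has no identities."
        if len(fields) >= 2 and fields[0].isdigit():
            found.add(fields[1])
    return found


def loaded_fingerprints() -> set:
    """Fingerprints of the identities held by the keyholder proxy agent."""
    env = dict(os.environ, SSH_AUTH_SOCK=PROXY_SOCK)
    result = subprocess.run(["ssh-add", "-l"], env=env,
                            capture_output=True, text=True)
    return fingerprints(result.stdout)


def configured_fingerprints() -> set:
    """Fingerprints of the public keys configured in KEY_DIR."""
    found = set()
    for path in sorted(glob.glob(os.path.join(KEY_DIR, "*.pub"))):
        result = subprocess.run(["ssh-keygen", "-l", "-f", path],
                                capture_output=True, text=True)
        found |= fingerprints(result.stdout)
    return found


def missing_keys(configured: set, loaded: set) -> set:
    """Configured fingerprints that the agent does not hold (ideally empty)."""
    return configured - loaded
```

On the deployment host, `missing_keys(configured_fingerprints(), loaded_fingerprints())` returning an empty set would correspond to the two sorted lists in the shell version matching, apart from duplicate handling.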
[22:36:45] yup, will do :)
[22:37:35] k
[22:44:24] deployment-aqs01 does not look happy, looking into it
[22:45:57] Apparently it is no longer possible for the python interpreter to give me a prompt
[22:46:12] here you go: >>>
[22:47:04] thank
[22:47:06] thanks
[22:48:19] strace is also not working
[22:52:18] so is `ls`
[22:55:26] yeah, console log shows [11512081.931154] INFO: task java:3323 blocked for more than 120 seconds.
[22:55:26] [11512081.931893] Not tainted 3.16.0-4-amd64 #1
[22:55:26] [11512081.932425] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[22:58:41] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Unit-tests, MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2669154 (greg) @hashar you set this to...
[22:58:59] is phabricator search more useless than usual today?
[23:00:44] Krenair: looks like the reindex is still going: https://grafana-admin.wikimedia.org/dashboard/db/mysql?panelId=2&fullscreen&from=1474886388229&to=1474902821657&var-dc=eqiad%20prometheus%2Fops&var-server=db1048
[23:00:52] context: https://phabricator.wikimedia.org/T146673
[23:00:55] ah
[23:05:45] Release-Engineering-Team, User-greg: Create SOW for contractor - https://phabricator.wikimedia.org/T146711#2669183 (greg)
[23:26:41] oh well, that's dealt with
[23:26:54] deployment-cache-upload04 has an almost-full /srv/vdb
[23:31:11] PROBLEM - Free space - all mounts on deployment-cache-upload04 is CRITICAL: CRITICAL: deployment-prep.deployment-cache-upload04.diskspace._srv_vdb.byte_percentfree (<100.00%)
[23:31:41] RECOVERY - Puppet staleness on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[23:33:53] /var/lib/varnish/deployment-cache-upload04/ is missing and varnishlog is not happy about it
[23:52:39] Beta-Cluster-Infrastructure: deployment-fluorine02 does not have logs - https://phabricator.wikimedia.org/T146723#2669408 (Mattflaschen-WMF)
[23:55:18] Beta-Cluster-Infrastructure: deployment-fluorine02 does not have logs - https://phabricator.wikimedia.org/T146723#2669432 (Mattflaschen-WMF) p:Triage>High
[23:58:46] !log Started udp2log-mw on deployment-fluorine02 for T146723
[23:58:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[23:59:42] puppet wasn't ensuring it was running bd808?
[23:59:57] it looks like the systemd unit is borked