[00:42:40] (03CR) 10Legoktm: [C: 032] Make rules for footer contents less strict [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/304496 (https://phabricator.wikimedia.org/T142804) (owner: 10BryanDavis) [00:43:08] (03Merged) 10jenkins-bot: Make rules for footer contents less strict [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/304496 (https://phabricator.wikimedia.org/T142804) (owner: 10BryanDavis) [00:49:03] (03PS3) 10Legoktm: Add script to test already merged commits in a repository [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/304583 [00:49:28] (03CR) 10Legoktm: [C: 032] Add script to test already merged commits in a repository [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/304583 (owner: 10Legoktm) [00:49:56] (03Merged) 10jenkins-bot: Add script to test already merged commits in a repository [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/304583 (owner: 10Legoktm) [01:13:06] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [01:15:14] (03PS1) 10Legoktm: Release 0.4.0 [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/306085 [01:16:53] (03CR) 10Legoktm: [C: 032] Release 0.4.0 [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/306085 (owner: 10Legoktm) [01:18:35] (03Merged) 10jenkins-bot: Release 0.4.0 [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/306085 (owner: 10Legoktm) [01:28:27] PROBLEM - Puppet staleness on deployment-imagescaler01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [02:25:26] RECOVERY - Host deployment-parsoid05 is UP: PING OK - Packet loss = 0%, RTA = 0.72 ms [02:31:08] PROBLEM - Host deployment-parsoid05 is DOWN: CRITICAL - Host Unreachable (10.68.16.120) [03:17:09] (03CR) 10Krinkle: [C: 04-1] "Pending outcome of upstream bug." [integration/config] - 10https://gerrit.wikimedia.org/r/305993 (https://phabricator.wikimedia.org/T142964) (owner: 10Phedenskog) [03:30:50] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10MobileFrontend, 13Patch-For-Review, and 3 others: Jenkins complains on MobileFrontend commits with Could not read gem at /var/lib/gems/2.1.0/cache/rake-10.5.0.gem. It may be corrupt... - https://phabricator.wikimedia.org/T143601#2574258 [04:09:44] is zuul dead again? :( [04:11:32] ah, its probably wmf26 [04:16:50] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #117: 04FAILURE in 20 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/117/ [04:20:41] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574298 (10demon) a:03demon [04:21:15] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574301 (10Joergi123) T137264 added the according change. Interestingly, the changeset in Ger... 
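(Context for the T143635 thread above: PHP only gained the short array syntax `[...]` in 5.4, while the 1.23 and 1.26 release branches still supported PHP 5.3, so a backport written with the short syntax is a parse error there. A minimal sketch of the breakage and of the workaround demon describes just below — the variable name is made up for illustration:)

<?php
// Short array syntax: a parse error on PHP 5.3.
$options = [ 'foo' => true ];

// Long array syntax -- the workaround: swap `[` for `array(` and
// `]` for `)`. Parses on PHP 5.3 and later.
$options = array( 'foo' => true );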
[04:21:35] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574303 (10demon) As soon as the tags are finished, I'll build some new tarballs. [04:22:21] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574306 (10demon) >>! In T143635#2574301, @Joergi123 wrote: > T137264 added the according cha... [04:22:41] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574307 (10demon) p:05Triage>03Unbreak! [04:24:19] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574310 (10demon) @Joergi123 As a quick workaround, you can swap `[` for `array(` and `]` for... [04:26:07] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574311 (10Joergi123) >>! In T143635#2574310, @demon wrote: > @Joergi123 As a quick workaroun... [04:26:48] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574312 (10demon) Yeah I'll wrap it up shortly :) [05:17:21] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574344 (10demon) 05Open>03Resolved [[https://lists.wikimedia.org/pipermail/mediawiki-ann... [05:41:52] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574389 (10Joergi123) Actually the tarballs for 1.26 are still unchanged... [05:44:20] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574392 (10demon) Looking at mediawiki-1.26.4.patch.gz I'm seeing the correct array syntax i... [05:49:14] (03CR) 10Phedenskog: "The upstream bug: It works for mobile devices but it fails for desktop, so I've configured so we only run the mobile devices as a proxy, s" [integration/config] - 10https://gerrit.wikimedia.org/r/305993 (https://phabricator.wikimedia.org/T142964) (owner: 10Phedenskog) [05:53:15] (03CR) 10Krinkle: "The bug you saw with broken timeline captures, does that affect wpt-reporter as well, or only the web interface?" 
[integration/config] - 10https://gerrit.wikimedia.org/r/305993 (https://phabricator.wikimedia.org/T142964) (owner: 10Phedenskog) [05:53:34] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574393 (10Joergi123) I have tried with mediawiki-1.26.4.tar.gz and also a new download still... [06:00:51] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574283 (10Legoktm) Maybe it is cached somewhere? ``` km@km-tp ~/p/sandbox> wget https://rele... [06:15:22] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574442 (10Joergi123) Obviously it was cached, but not on **my** side. I even downloaded to t... [06:54:34] 10Beta-Cluster-Infrastructure, 06Operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#2574547 (10yuvipanda) [06:55:44] 10MediaWiki-Releasing, 10MediaWiki-General-or-Unknown, 05MW-1.23-release, 05MW-1.26-release, 05Release: Regression: MediaWiki 1.26.4 is using incompatible array syntax - https://phabricator.wikimedia.org/T143635#2574549 (10demon) It does sit behind varnish, which I suppose is nice except in rare instance... [07:00:48] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10MobileFrontend, 13Patch-For-Review, and 3 others: Jenkins complains on MobileFrontend commits with Could not read gem at /var/lib/gems/2.1.0/cache/rake-10.5.0.gem. It may be corrupt... - https://phabricator.wikimedia.org/T143601#2573107 [07:14:28] /tmp/hudson1687932244205747632.sh: line 5: /usr/bin/lintian-junit-report: No such file or directory - hmm. again. [08:04:23] I'm not sure why the zuul queue is so backlogged, it looks like the jobs just aren't running even though there are free executor slots? [08:07:17] good morning hashar [08:07:23] hello :-} [08:07:50] legoktm: thank you for all the nodepool/CI baby sitting you have done ! [08:08:07] no problem :) [08:08:37] I'm not sure why it's stuck right now [08:08:43] [01:04:23] I'm not sure why the zuul queue is so backlogged, it looks like the jobs just aren't running even though there are free executor slots? [08:08:56] there was a giant backlog because of the security release, but it cleared through that pretty well [08:09:14] but there are trusty slaves just sitting there, doing nothing... [08:12:01] legoktm: oh I will deal with it [08:12:12] do you know what's wrong? [08:12:19] most probably no trusty slaves can be spawned [08:12:30] or Jenkins is deadlocked somehow [08:12:44] no, these are normal slaves - not nodepool [08:13:01] all the nodepool jobs on trusty ran just fine :) [08:13:10] oh [08:13:41] all the jessie jobs on permanent slaves and nodepool also ran fine. 
It seems to just be trusty permanent slaves having issues [08:14:11] * hashar tries disabling and reenabling gearman client [08:14:29] that sometimes fixes deadlocks [08:15:00] legoktm: solved by disabling/enabling gearman client [08:15:18] the thing is that there is a bad interaction between the Jenkins plugin that throttles builds to one per node and the Gearman plugin [08:15:29] sometimes they race for executors and end up deadlocked on each other [08:15:31] ah [08:15:36] disabling gearman removes the deadlock [08:16:06] !log disabled/enabled Jenkins Gearman client to remove deadlock with Throttle plugin [08:16:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [08:16:18] whenever we get most stuff moved to Nodepool we will be able to drop the Throttle plugin [08:18:32] !log reboot integration-slave-trusty-1014 [08:18:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [08:22:48] !log running puppet on integration-slave-trusty-1014 [08:22:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [08:25:43] PROBLEM - Puppet staleness on integration-slave-trusty-1014 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [43200.0] [08:27:02] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10MobileFrontend, 13Patch-For-Review, and 3 others: Jenkins complains on MobileFrontend commits with Could not read gem at /var/lib/gems/2.1.0/cache/rake-10.5.0.gem. It may be corrupt... - https://phabricator.wikimedia.org/T143601#2574626 [08:35:46] RECOVERY - Puppet staleness on integration-slave-trusty-1014 is OK: OK: Less than 1.00% above the threshold [3600.0] [08:54:39] 10Beta-Cluster-Infrastructure: New wiki cluster wikipedia indonesian language - https://phabricator.wikimedia.org/T143557#2574672 (10hashar) p:05Triage>03Normal Each wiki added to the beta cluster adds overhead to the ongoing maintenance of the small infrastructure. In most cases it is better to just use on... [08:56:05] 10Beta-Cluster-Infrastructure, 07Beta-Cluster-reproducible, 07I18n: On Beta Cluster, MediaWiki namespace override is inconsistently applied - https://phabricator.wikimedia.org/T142863#2574675 (10hashar) 05Open>03Resolved a:03hashar Assuming it is fixed for now. Can't reproduce reliably for now, we can a... [09:04:04] 05Continuous-Integration-Scaling, 07Nodepool: Nodepool should send metrics to statsd - https://phabricator.wikimedia.org/T111496#2574696 (10hashar) 05stalled>03Resolved a:03chasemp Nodepool now reports statistics to statsd. Has been done via https://gerrit.wikimedia.org/r/#/c/305529/ We will want to pat...
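(A note on the statsd reporting resolved in T111496 above: upstream Nodepool of this era turns on statsd emission simply when the standard statsd environment variables are set for the daemon. A sketch, assuming the conventional WMF statsd endpoint — the host, port, and file path here are illustrative, not taken from the puppet change linked above:)

# In the environment of the nodepoold process (e.g. /etc/default/nodepool):
STATSD_HOST=statsd.eqiad.wmnet
STATSD_PORT=8125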
[09:36:54] 05Continuous-Integration-Scaling, 07Nodepool: Nodepool should send metrics to statsd - https://phabricator.wikimedia.org/T111496#2574751 (10hashar) And I have created a basic dashboard in Grafana at https://grafana-admin.wikimedia.org/dashboard/db/nodepool [09:43:26] (03PS25) 10Zfilipin: WIP Run language screenshots script for VisualEditor in Jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613) [10:40:49] (03PS1) 10Hashar: Link to Nodepool grafana dashboard [integration/docroot] - 10https://gerrit.wikimedia.org/r/306175 [10:42:42] 10Browser-Tests-Infrastructure, 10VisualEditor, 10VisualEditor-MediaWiki, 13Patch-For-Review, 15User-zeljkofilipin: Fix font support on SauceLabs VE screenshots - https://phabricator.wikimedia.org/T141369#2574814 (10zeljkofilipin) The job is running using Chrome browser. | | [[ https://integration.wiki... [10:44:08] (03CR) 10Hashar: [C: 032] Link to Nodepool grafana dashboard [integration/docroot] - 10https://gerrit.wikimedia.org/r/306175 (owner: 10Hashar) [10:44:25] (03Merged) 10jenkins-bot: Link to Nodepool grafana dashboard [integration/docroot] - 10https://gerrit.wikimedia.org/r/306175 (owner: 10Hashar) [11:36:13] zeljkof: look at http://logstash-beta.wmflabs.org ? :) [11:37:03] hashar: thanks, looking [11:37:07] I am getting this: [11:37:24] /usr/local/lib/ruby/gems/2.3.0/gems/mediawiki_api-0.7.0/lib/mediawiki_api/client.rb:211:in `send_request': [V7wzDQpEEH8AAHb1ERgAAAAD] Exception Caught: Could not acquire lock for "Array". (internal_api_error_LocalFileLockError) (MediawikiApi::ApiError) [11:37:32] works fine on vagrant and production [11:37:39] broken only on beta [11:38:31] Failed to lock 'VisualEditor_toolbar-ta.png' [11:38:36] does not say much [11:38:41] at least something I can report [11:38:45] thanks hashar [11:46:33] zeljkof: well report that as a task for VE folks + beta-cluster-infra [11:46:40] there must be some more traces in logstash [11:46:52] reporting [11:47:01] I do not think it is related to VE [11:47:02] make sure to dig in logs :} [11:47:12] might be resource loader [11:47:16] logs are all spanish to me :| [11:47:24] I am using the API [11:47:28] not the web [11:53:16] hashar: https://phabricator.wikimedia.org/T143655 [11:53:39] 10Beta-Cluster-Infrastructure, 06Commons, 10MediaWiki-API, 07Beta-Cluster-reproducible, 15User-zeljkofilipin: internal_api_error_LocalFileLockError while uploading file via API to commons.wikimedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T143655#2575020 (10zeljkofilipin) [12:27:16] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T141551#2575183 (10hashar) [12:35:02] 10MediaWiki-Releasing, 10MediaWiki-Containers: Ready-to-use Docker package for MediaWiki - https://phabricator.wikimedia.org/T92826#2575219 (10mobrovac) [PR #1](https://github.com/wikimedia/mediawiki-node-services/pull/1) updates the Node services container with the latest developments. 
[12:35:25] 10MediaWiki-Releasing, 10MediaWiki-Containers, 06Services, 15User-mobrovac: Ready-to-use Docker package for MediaWiki - https://phabricator.wikimedia.org/T92826#2575220 (10mobrovac) [12:39:09] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T141551#2575224 (10hashar) I have applied the security patch For core most of got merged in master yesterday and I have dropped them from /srv/patches/1.28.0-wmf.16/core. One... [12:49:57] 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T141551#2575265 (10hashar) Next step is to sync to cluster https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Sync_to_cluster_and_v... [13:20:31] HMmMMm [13:20:37] puppet compiler says: [13:20:38] [ 2016-08-23T13:12:19 ] CRITICAL: Build run failed: [Errno 28] No space left on device: '/mnt/jenkins-workspace/puppet-compiler/3799' [13:20:57] https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/3799/console [13:25:12] ottomata: task fill please :) [13:27:01] 10Beta-Cluster-Infrastructure, 06Commons, 10MediaWiki-API, 10MediaWiki-Uploading, and 3 others: internal_api_error_LocalFileLockError while uploading file via API to commons.wikimedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T143655#2575366 (10Anomie) [13:27:34] ottomata: ema had the same issue earlier [13:27:38] the slave needs to be cleanedup [13:33:40] Project beta-code-update-eqiad build #118280: 15ABORTED in 39 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/118280/ [13:34:00] 10Continuous-Integration-Infrastructure, 06Operations: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2575396 (10ema) [13:34:18] ottomata: https://phabricator.wikimedia.org/T143671 [13:34:19] :D [13:36:47] thanks! [13:36:54] 10Continuous-Integration-Infrastructure, 06Operations: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2575396 (10Ottomata) +1 [13:39:14] 10Continuous-Integration-Infrastructure, 06Operations, 10puppet-compiler: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2575460 (10hashar) [13:51:11] 10Beta-Cluster-Infrastructure, 06Commons, 10MediaWiki-API, 10MediaWiki-Uploading, and 3 others: internal_api_error_LocalFileLockError while uploading file via API to commons.wikimedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T143655#2575547 (10Anomie) Looking at other log messages in that reque... [14:02:54] European SWAT went just well. [14:03:23] 10Beta-Cluster-Infrastructure, 06Commons, 10MediaWiki-API, 10MediaWiki-Uploading, and 3 others: internal_api_error_LocalFileLockError while uploading file via API to commons.wikimedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T143655#2575625 (10zeljkofilipin) >>! In T143655#2575545, @Anomie wrot... 
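(On the 13:20 "No space left on device" failure above — a generic way to confirm which filesystem filled up and which old compiler runs are eating it, using the workspace path from the error message; a sketch only:)

$ df -h /mnt
$ du -sh /mnt/jenkins-workspace/puppet-compiler/* | sort -h | tail
# then remove the oldest numbered run directories once confirmed stale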
[14:03:49] 10Continuous-Integration-Config: Remove mediawiki/extensions/PdfBook from /zuul/layout.yaml - https://phabricator.wikimedia.org/T143683#2575628 (10Aklapper) [14:06:48] PROBLEM - Puppet run on integration-slave-trusty-1014 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [14:07:33] 10Beta-Cluster-Infrastructure, 06Commons, 10MediaWiki-API, 10MediaWiki-Uploading, and 3 others: internal_api_error_LocalFileLockError while uploading file via API to commons.wikimedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T143655#2575653 (10Anomie) Do the reproductions lack the "Redis is loa... [14:08:08] PROBLEM - Puppet run on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [14:10:00] (03PS26) 10Zfilipin: WIP Run language screenshots script for VisualEditor in Jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613) [14:11:04] 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T141551#2575673 (10hashar) https://test.wikipedia.org/wiki/Special:Version | MediaWiki | 1.28.0-wmf.16 (88beb39) | 11:39, 23 August 2016 [14:14:02] PROBLEM - Puppet run on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [14:16:50] PROBLEM - Puppet run on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [14:20:04] (03PS1) 10Hashar: Drop PdfBook [integration/config] - 10https://gerrit.wikimedia.org/r/306213 (https://phabricator.wikimedia.org/T143683) [14:20:19] (03CR) 10Hashar: [C: 032] Drop PdfBook [integration/config] - 10https://gerrit.wikimedia.org/r/306213 (https://phabricator.wikimedia.org/T143683) (owner: 10Hashar) [14:21:17] (03Merged) 10jenkins-bot: Drop PdfBook [integration/config] - 10https://gerrit.wikimedia.org/r/306213 (https://phabricator.wikimedia.org/T143683) (owner: 10Hashar) [14:23:32] 10Continuous-Integration-Config, 13Patch-For-Review: Remove mediawiki/extensions/PdfBook from /zuul/layout.yaml - https://phabricator.wikimedia.org/T143683#2575740 (10hashar) 05Open>03Resolved a:03hashar [14:24:07] PROBLEM - Puppet run on integration-slave-jessie-1005 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [14:28:52] puppet broken will be fixed with https://gerrit.wikimedia.org/r/306214 [14:29:42] PROBLEM - Puppet run on integration-slave-jessie-1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [14:31:44] PROBLEM - Puppet run on integration-slave-jessie-1003 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [14:41:29] 10Continuous-Integration-Infrastructure, 06Labs, 07Wikimedia-Incident: Investigate upgrade of OpenStack python module for labnodepool1001 - https://phabricator.wikimedia.org/T143013#2575765 (10hashar) [14:41:49] RECOVERY - Puppet run on integration-slave-trusty-1014 is OK: OK: Less than 1.00% above the threshold [0.0] [14:43:06] 10Continuous-Integration-Infrastructure, 06Labs, 07Wikimedia-Incident: Investigate upgrade of OpenStack python module for labnodepool1001 - https://phabricator.wikimedia.org/T143013#2554094 (10hashar) I am merging this one in the other task T137217 they are close dupes [14:43:11] RECOVERY - Puppet run on integration-slave-trusty-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [14:43:35] 10Continuous-Integration-Infrastructure, 07Nodepool: 
Clean up apt:pin of python modules used for Nodepool - https://phabricator.wikimedia.org/T137217#2575769 (10hashar) [14:43:38] 10Continuous-Integration-Infrastructure, 07Nodepool: Clean up apt:pin of python modules used for Nodepool - https://phabricator.wikimedia.org/T137217#2361186 (10hashar) Updated the task details with the list of current vs jessie-backports python modules. I guess we can first upgrade `python-novaclient` , rest... [14:44:01] RECOVERY - Puppet run on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [14:44:29] 10Continuous-Integration-Infrastructure, 07Nodepool: Clean up apt:pin of python modules used for Nodepool - https://phabricator.wikimedia.org/T137217#2575796 (10hashar) [14:44:31] 10Continuous-Integration-Infrastructure, 06Labs, 07Wikimedia-Incident: Investigate upgrade of OpenStack python module for labnodepool1001 - https://phabricator.wikimedia.org/T143013#2575798 (10hashar) [14:49:19] 10Beta-Cluster-Infrastructure, 06Commons, 10MediaWiki-API, 10MediaWiki-Uploading, and 3 others: internal_api_error_LocalFileLockError while uploading file via API to commons.wikimedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T143655#2575805 (10zeljkofilipin) Sorry, I am not familiar with logsta... [14:52:02] 10Beta-Cluster-Infrastructure, 06Commons, 10MediaWiki-API, 10MediaWiki-Uploading, and 3 others: internal_api_error_LocalFileLockError while uploading file via API to commons.wikimedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T143655#2575808 (10Anomie) In the search bar near the top of the page,... [14:56:52] RECOVERY - Puppet run on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [14:57:31] 10Continuous-Integration-Infrastructure, 06Labs, 07Nodepool, 13Patch-For-Review: Clean up apt:pin of python modules used for Nodepool - https://phabricator.wikimedia.org/T137217#2575815 (10hashar) https://gerrit.wikimedia.org/r/306220 `nodepool: bump nova client and openstack CLI` should do it. That would... 
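(For readers unfamiliar with the apt:pin mechanism being cleaned up in T137217 above: pinning a single package to jessie-backports looks roughly like this — an illustrative hand-written preferences file, not the actual puppet-managed one:)

# /etc/apt/preferences.d/python-novaclient (illustrative)
Package: python-novaclient
Pin: release a=jessie-backports
Pin-Priority: 1001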
[14:57:58] 10Continuous-Integration-Infrastructure, 06Labs, 07Nodepool, 13Patch-For-Review: Clean up apt:pin of python modules used for Nodepool - https://phabricator.wikimedia.org/T137217#2575822 (10hashar) p:05Low>03High [14:58:30] PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [14:58:31] 10Continuous-Integration-Infrastructure, 07Nodepool: 2016-08-10 CI incident follow-ups - https://phabricator.wikimedia.org/T142952#2575826 (10hashar) [14:58:33] 10Continuous-Integration-Infrastructure, 06Labs, 07Nodepool, 13Patch-For-Review: Clean up apt:pin of python modules used for Nodepool - https://phabricator.wikimedia.org/T137217#2575825 (10hashar) [14:59:10] RECOVERY - Puppet run on integration-slave-jessie-1005 is OK: OK: Less than 1.00% above the threshold [0.0] [14:59:14] PROBLEM - Puppet run on deployment-sca01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:06:45] RECOVERY - Puppet run on integration-slave-jessie-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [15:09:43] RECOVERY - Puppet run on integration-slave-jessie-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [15:31:26] 10Continuous-Integration-Infrastructure, 06Labs, 13Patch-For-Review, 07Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2575912 (10hashar) > we are doing a lot less and getting a lot more done it seems like Most of the jobs have been mov... [15:57:55] PROBLEM - Puppet run on integration-slave-trusty-1013 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:59:43] 10Continuous-Integration-Infrastructure, 06Labs, 13Patch-For-Review, 07Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2575937 (10hashar) Gave it a try with: * max-server 12 * jessie 8 * trusty 4 And at 11 instances, attempt to spawn a... [16:04:38] 10Continuous-Integration-Infrastructure, 10MediaWiki-Unit-tests, 13Patch-For-Review, 07Regression: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2575953 (10EBernhardson) Looking at the overnight runs of mediawiki-extensions-php55, th... [16:08:35] 10Continuous-Integration-Infrastructure, 06Labs, 13Patch-For-Review, 07Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2575961 (10hashar) Via the openstack command line tool I got 8 instances: | ef6ece68-5d79-4ba0-b2f0-e6d0fe308201 | ci-... [16:08:55] thcipriani: I am moving out but found three ghost instances in the contintcloud tenant https://phabricator.wikimedia.org/T143016#2575961 :D [16:09:07] thcipriani: something prevents them from showing up / being deleted so they are hidden [16:09:17] but they still count against the tenant instances quota :( [16:09:19] ah ha! [16:09:26] and magic https://grafana.wikimedia.org/dashboard/db/nodepool [16:09:33] so statsd is going to be helpful [16:09:43] and we gotta sort out with labs whatever prevents instances from being properly deleted [16:09:56] then we can surely bump the quota / restore the rate :} [16:10:37] * hashar disappears [16:12:10] changing the rate at which openstack is pinging nodepool seems to have cleared a lot of issues from the nodepool logs, FWIW [16:13:45] but that 2 instance find is amazing.
with a max-server of 10 we were only getting 6, with the allocation currently at 15 we were getting 10 sometimes. With that 2 instance removal we're now getting 12 sometimes. [16:13:57] I guess [16:14:04] * hashar bicycles [16:14:07] but, I suppose, max-servers has something to do with it. [16:20:32] 10scap, 06Operations: Scap::server::sources is out of sync with the repositories actually present on tin/mira - https://phabricator.wikimedia.org/T143692#2576004 (10Joe) [16:21:06] 10scap, 06Operations: Scap::server::sources is out of sync with the repositories actually present on tin/mira - https://phabricator.wikimedia.org/T143692#2576016 (10Joe) p:05Triage>03High [16:21:31] 10scap, 06Operations: Scap::server::sources is out of sync with the repositories actually present on tin/mira - https://phabricator.wikimedia.org/T143692#2576004 (10Joe) [16:30:36] (03CR) 10EBernhardson: "Looks reasonable to me. Mentioning the tickets in the commit message content is good, but they should also be in the structured section at" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/305762 (owner: 10Lethexie) [16:31:14] 03Scap3, 10scap, 06Operations, 15User-mobrovac: Scap::server::sources is out of sync with the repositories actually present on tin/mira - https://phabricator.wikimedia.org/T143692#2576089 (10mobrovac) [16:31:25] (03CR) 10EBernhardson: [C: 032] Add usage to forbid superglobals like $_GET,$_POST [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/296395 (owner: 10Lethexie) [16:38:23] !log Fixed ops/puppet sync by removing stale cherry-pick of https://gerrit.wikimedia.org/r/#/c/305996/ [16:38:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [16:39:00] blerg, sorry :(( [16:39:11] this is when I giggle because j.oe and I were talking about the puppet cherry-pick mess in deployment-prep yesterday and today his patch breaks it :) [16:39:51] well, the patch that I cherry-picked :P [16:40:24] we are up to 14 picks which is a bit crazy [16:40:28] some of them are really old too [16:41:27] 10Continuous-Integration-Infrastructure, 06Operations, 10puppet-compiler: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2576152 (10greg) p:05Triage>03High [16:41:56] it would be neat if there were arbitrary tags in gerrit so we could easily find all of the things that are cherry-picked [16:43:12] 10Continuous-Integration-Infrastructure, 06Operations, 10puppet-compiler: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2575396 (10greg) (Set it to High, but I assume @fgiunchedi fixed it by removing the old compilations.) [16:46:04] 10Continuous-Integration-Infrastructure, 06Labs, 13Patch-For-Review, 07Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2576175 (10chasemp) @hashar I'm uncomfortable with you changing these values without a notice here before hand, a SAL e... [16:48:35] PROBLEM - Puppet run on deployment-redis02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:01:14] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta puppetmaster cherry-pick process - https://phabricator.wikimedia.org/T135427#2576224 (10thcipriani) p:05High>03Normal Lowering priority from high since nothing has happened here in a while. Here's what's currently cherry picked: | Bryan Davis |...
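(The cherry-picks thcipriani enumerates above live as local commits on the beta puppetmaster's operations/puppet checkout, sitting ahead of origin/production, so plain git can list them; a sketch — the repository path on deployment-puppetmaster is an assumption:)

$ cd /var/lib/git/operations/puppet   # path may differ
$ git log --oneline origin/production..HEAD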
[17:02:12] ^ bd808 threw out an idea there that incorporates "tags" in some way [17:02:21] (probably not the way you were thinking :)) [17:03:13] ah. yeah something in phab could be made to work [17:03:57] I'm going to bump on each of them that looks stalled in gerrit and make sure there are reviewers too [17:04:13] gergo's sentry stuff has been rotting for a long time :/ [17:04:47] yeah, that's not the only thing :(( [17:08:01] 06Release-Engineering-Team: Preload TestingAccessWrapper in production mwrepl - https://phabricator.wikimedia.org/T143607#2576264 (10Mattflaschen-WMF) [17:10:27] RECOVERY - Puppet staleness on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [3600.0] [17:34:18] 06Release-Engineering-Team, 06Operations, 07Puppet: Preload TestingAccessWrapper in production mwrepl - https://phabricator.wikimedia.org/T143607#2576402 (10greg) [17:42:14] 10Continuous-Integration-Infrastructure, 06Labs, 13Patch-For-Review, 07Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2576452 (10hashar) Asked sorry @chasemp :-( [17:52:18] 06Release-Engineering-Team, 15User-greg, 07Wikimedia-Incident: Identify "first responders" for "all" "components" deployed on Wikimedia servers - https://phabricator.wikimedia.org/T141066#2576530 (10greg) [17:52:33] (03PS10) 10Legoktm: Add usage to forbid superglobals like $_GET,$_POST [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/296395 (owner: 10Lethexie) [17:52:39] (03CR) 10Legoktm: [C: 032] Add usage to forbid superglobals like $_GET,$_POST [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/296395 (owner: 10Lethexie) [17:55:24] (03Merged) 10jenkins-bot: Add usage to forbid superglobals like $_GET,$_POST [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/296395 (owner: 10Lethexie) [17:58:15] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta puppetmaster cherry-pick process - https://phabricator.wikimedia.org/T135427#2576550 (10bd808) >>! In T135427#2576224, @thcipriani wrote: > Here's what's currently cherry picked: > > | Tyler Cipriani | scap: bump version to 3.2.3-1 merged > | Bra... [17:59:53] thcipriani: I got it back down to 11 cherry-picks. Not great, but minor progress [18:00:26] bd808: awesome! thanks for your help [18:20:09] (03PS1) 10Legoktm: Enable composer-test for TemplateSandbox [integration/config] - 10https://gerrit.wikimedia.org/r/306257 (https://phabricator.wikimedia.org/T143703) [18:20:38] (03CR) 10Legoktm: [C: 032] Enable composer-test for TemplateSandbox [integration/config] - 10https://gerrit.wikimedia.org/r/306257 (https://phabricator.wikimedia.org/T143703) (owner: 10Legoktm) [18:21:38] (03Merged) 10jenkins-bot: Enable composer-test for TemplateSandbox [integration/config] - 10https://gerrit.wikimedia.org/r/306257 (https://phabricator.wikimedia.org/T143703) (owner: 10Legoktm) [18:21:41] !log deploying https://gerrit.wikimedia.org/r/306257 [18:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [18:53:08] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster is OK: OK: Less than 100.00% above the threshold [0.0] [18:59:25] 10Continuous-Integration-Infrastructure, 06Labs, 13Patch-For-Review, 07Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2576881 (10chasemp) >>! In T143016#2576452, @hashar wrote: > Asked sorry @chasemp :-( No worries, thanks for understan... 
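(On the superglobals sniff merged at 17:55 above: the MediaWiki convention it nudges code toward is reading request data through WebRequest rather than $_GET/$_POST directly. A minimal sketch, with a made-up parameter name:)

<?php
// What the sniff flags -- direct superglobal access:
$target = $_GET['target'];

// Conventional MediaWiki style -- go through the WebRequest object:
$target = RequestContext::getMain()->getRequest()->getVal( 'target' );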
[19:12:46] (03CR) 10Ejegg: "Thanks awight! Looks like it needs a manual rebase" [integration/config] - 10https://gerrit.wikimedia.org/r/301025 (https://phabricator.wikimedia.org/T141309) (owner: 10Awight) [19:34:00] 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T141551#2577111 (10hashar) There are some undefined index from SpamBlacklist but that is a known issue: T138429 [19:35:54] 10Continuous-Integration-Config: Update jenkins job builder - https://phabricator.wikimedia.org/T143731#2577130 (10Paladox) [19:39:58] hashar: noteworthy since you're doing train this week. I setup fatalmonitors for each group https://logstash.wikimedia.org/app/kibana#/dashboard/group0 and https://logstash.wikimedia.org/app/kibana#/dashboard/group1 [19:41:58] oh man [19:42:00] thank you for that [19:42:17] are the errors properly flagged with the version nowadays ? [19:43:23] I'm unsure how that version graph is being generated, actually [19:43:39] I added it iirc [19:44:02] no idea how i did it though [19:44:09] yeah, mwversion.raw :) [19:44:10] the new web interface really confuses me [19:44:28] takes some getting used to. [19:47:29] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10Wikimedia-Logstash: Send Jenkins daemon logs to logstash - https://phabricator.wikimedia.org/T143733#2577168 (10hashar) [19:50:45] legoktm: thcipriani: if you get around still. I am curious why you got the various jenkins jobs back to the permanent slaves [19:51:02] but I guess it was because there was no/not enough instances available on nodepool [19:51:10] or was there some other oddity that needs proper fixing? [19:51:10] hashar since the tests were slow on nodepool [19:51:11] nodepool slaves were spawning really slowly or not at all [19:51:23] ^ [19:51:29] but once the build ran on them they worked just fine weren't they? 
[19:51:34] and the jobs I moved took less than a minute, it didn't really make sense to keep them on nodepool [19:51:42] yes, the jobs were running fine, once they ran [19:51:49] great :] [19:51:59] but they'd just burn through the available vms really quickly with little benefit [19:52:02] particularly last Tuesday and Wednesday, nodepool couldn't seem to spin up new machines or enough machines [19:52:24] also since Ch4se enabled statsd reporting we now have a basic dashboard at https://grafana.wikimedia.org/dashboard/db/nodepool [19:52:25] logs were filled with 403 errors and no actual servers were getting built [19:52:41] yup [19:53:39] when we get a bigger pool, the spike of jobs will be triggered in a few seconds [19:53:49] :) [19:53:50] also have to look again at the time it takes for an instance to boot [19:54:09] last low hanging fruit I have seen is grub waiting 5 seconds for user to choose a kernel to boot from [19:54:44] I think in this instance, labs tripped, and then nodepool couldn't right itself, took a couple reboots [19:55:16] then the next day at midnight utc wait times started to go up again [19:55:24] oh [19:55:43] midnight utc seems to be the peak demand (I guess) [19:56:01] that is a bit earlier usually [19:56:12] cause midnight UTC is 2am in europe [19:56:29] this was rough, based on looking at https://grafana.wikimedia.org/dashboard/db/releng-zuul?panelId=25&fullscreen for a few days [19:56:52] the huge spike over night was the bunch of security patches [19:57:23] cause if we get 7 patches over 3 release branches that is 21 changes [19:57:31] each entering test then gate and submit or 42 events [19:57:40] each having roughly 10 builds (mediawiki/core) [19:57:41] or 420 jobs [19:57:51] all sent over a few minutes [19:58:15] they are run in parallel but that really exhausts the infra :( [19:58:16] hashar actually 1am and 2am in europe LOL [19:58:50] ah yeah it would be 1am in Portugal [19:58:50] yeah, mostly looking at last week's stuff, it wasn't a ton of patches, but nodepool hit a deficit of like 40 machines and just couldn't recover. This was pre all the tweaks that were made last week: moving to permanent slaves and tweaking nodepool's refresh rate [19:59:01] and london ie uk [19:59:02] too [19:59:18] we're +0 UTC in the winter and +1 BST in the summer [19:59:24] and no more in europe :D [19:59:49] LOL, nope just +1 and +2, and +0 in winter [19:59:50] thcipriani: if it still managed to spawn / allocate instances it would have dealt with the queue but that would surely take a loooong time [20:00:08] part of the problem I think is that the jobs are still optimized for permanent slaves and we haven't combined them like we planned for nodepool to run travis-ci style [20:00:21] with a deficit of 40 machines each job taking say 5 minutes but only 4 instances // that would take an hour :/ [20:00:44] this is the first security release in a long time that I can remember where we didn't have issues with slaves running out of disk space and totally falling over - and I think that was because all the large jobs were on nodepool [20:01:07] yeah that is one of the intents of disposable instances [20:01:39] a discussion we had in spring with Dan was to have a few instances that would have lxc based containers added in slaves for the small / light jobs [20:01:42] legoktm: That and I was doing it at a low-traffic time. [20:01:52] Like 6-10pm pacific.
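(hashar's load math at 19:57 and his drain-time estimate at 20:00, spelled out with shell arithmetic — all the round numbers are his:)

$ echo $(( 7 * 3 ))       # security patches x release branches = changes
21
$ echo $(( 21 * 2 ))      # each change runs test, then gate-and-submit = pipeline events
42
$ echo $(( 42 * 10 ))     # ~10 builds per event (mediawiki/core) = jobs
420
$ echo $(( 40 / 4 * 5 ))  # 40-job deficit, 4 instances, ~5 min per job
50
# => ~50 minutes, i.e. roughly the hour he estimates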
or even for everything if that turns out to be a good thing [20:02:22] we could just use docker or something [20:03:47] so in short, a nodepool-light :D [20:04:05] :) [20:04:13] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10Wikimedia-Logstash: Send Jenkins daemon logs to logstash - https://phabricator.wikimedia.org/T143733#2577218 (10hashar) p:05Triage>03Normal [20:26:21] What does it mean that gerrit is suddenly asking me for a username when I do 'git review'? Did something change or did my repo somehow degrade? [20:26:30] This has worked without a prompt the last 600 times I've done it [20:30:22] ostriches: sorry, do you have a moment to help me understand ^ ? [20:30:40] This is surely user error :( [20:30:50] 10Continuous-Integration-Config: Update jenkins job builder - https://phabricator.wikimedia.org/T143731#2577322 (10hashar) The process I follow is: * generate the XML config with the current jb * git rebase * generate XML config and do a diff -ur If all happy, push (maybe force push if we ahv... [20:31:24] andrewbogott: git-review --verbose ? [20:31:29] that often helps [20:31:41] also look at the list of git remotes [20:31:57] especially the push url with git remote -v [20:32:14] https://www.irccloud.com/pastebin/3p4C8Mjl/ [20:32:15] should have something like: gerrit ssh://hashar@gerrit.wikimedia.org:29418/mediawiki/core.git (push) [20:32:45] yeah the push url is over https [20:33:01] Also in my .git/config I have [20:33:05] https://www.irccloud.com/pastebin/NKlab4wb/ [20:33:32] maybe that repo has the remote named 'origin' ? [20:34:14] how would I have changed that? [20:34:23] with recent versions of git-review + an option, it is smart enough to reuse origin with a remote named origin and a push url set to ssh. Mine has: [20:34:25] origin https://gerrit.wikimedia.org/r/p/mediawiki/core.git (fetch) [20:34:25] origin ssh://hashar@gerrit.wikimedia.org:29418/mediawiki/core.git (push) [20:34:36] but I digress [20:36:22] hashar: I could tell if that was advice :) [20:36:25] *couldn't [20:36:32] :D [20:36:40] so what is your 'git remote -v' saying ? [20:37:05] https://www.irccloud.com/pastebin/pvWTmSVz/ [20:37:44] so I need to tell this somehow to always use the ssh url when pushing [20:38:16] well looks like your git-review uses the origin remote [20:38:27] Yeah, as of 30 minutes ago and for no reason [20:38:35] you updated it maybe ? :D [20:39:00] or it is your local branch 'production' that is set to track origin/production [20:39:13] and maybe git-review uses the remote tracked branch as the remote repo [20:39:19] git branch -vv would tell [20:40:09] I have about 100 branches, all of them tracking origin/production [20:40:38] and https://www.irccloud.com/pastebin/3p4C8Mjl/ shows it is using origin [20:40:48] so maybe just change the pushurl for the 'origin' remote [20:41:13] git remote set-url --push origin ssh://andrew@gerrit.wikimedia.org:29418/operations/puppet.git [20:41:27] and git remote -v will then tell you: [20:41:38] origin https://gerrit.wikimedia.org/r/p/operations/puppet (fetch) [20:41:38] origin ssh://andrew@gerrit.wikimedia.org:29418/operations/puppet.git (push) [20:42:00] Project selenium-Echo » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #126: 04FAILURE in 58 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/126/ [20:42:02] that seems to have done it.
Why it broke will forever remain a mystery [20:42:05] Project selenium-Echo » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #126: 04FAILURE in 1 min 4 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/126/ [20:42:05] thank you! [20:42:22] andrewbogott: also for your gitconfig, you are only changing the url for the 'gerrit' remote [20:42:38] there is a trick which is to have git replace the https url regardless of the remote name [20:43:21] using something like: url.https://gerrit.wikimedia.org/.pushInsteadOf = ssh://andrew@gerrit.wikimedia.org:29418/ [20:43:57] 06Release-Engineering-Team, 06Operations, 07Puppet: Preload TestingAccessWrapper in production mwrepl - https://phabricator.wikimedia.org/T143607#2577379 (10Mattflaschen-WMF) [20:44:08] [url "ssh://@gerrit.wikimedia.org:29418"] [20:44:08] pushInsteadOf = git://git.wikimedia.org [20:44:13] andrewbogott: ^^ [20:44:19] got it wrong sorry [20:44:27] that comes from https://www.mediawiki.org/wiki/Gerrit/TortoiseGit_tutorial#Using_TortoiseGit [20:44:53] and adjust the git:// url to the https://gerrit url [20:45:05] with that git magically does the rewriting behind the scene [20:45:59] ok, will try next time I have something to review [20:46:22] :) [20:48:28] PROBLEM - Puppet run on mira is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:57:06] PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:17:46] Project selenium-Wikidata » firefox,test,Linux,contintLabsSlave && UbuntuTrusty build #95: 04FAILURE in 2 hr 27 min: https://integration.wikimedia.org/ci/job/selenium-Wikidata/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/95/ [21:22:59] andrewbogott: Oh herp derp, looks like hashar helped you (sorry, was out for my afternoon walk with the dog) [21:23:07] Generally, I prefer https over ssh with gerrit ;-) [21:23:25] ostriches: yeah, I have it working again; no idea why it broke [21:24:21] Friends don't let friends use git review ;-) [21:24:24] Or ssh ;-) [21:28:30] RECOVERY - Puppet run on mira is OK: OK: Less than 1.00% above the threshold [0.0] [21:31:47] PROBLEM - Puppet run on deployment-aqs01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:37] PROBLEM - Puppet run on integration-slave-precise-1012 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:43] PROBLEM - Puppet run on integration-slave-precise-1011 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:37:08] RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [21:58:33] Project selenium-Core » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #130: 04FAILURE in 6 min 32 sec: https://integration.wikimedia.org/ci/job/selenium-Core/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/130/ [22:22:04] Yippee, build fixed! 
[22:22:05] Project selenium-CentralAuth » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #124: 09FIXED in 2 min 3 sec: https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/124/ [22:54:01] (03CR) 10EBernhardson: "my previous comment should have read (T prefixed bug ids):" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/305762 (owner: 10Lethexie) [23:07:12] (03PS8) 10Awight: Use composer in DonationInterface hhvm tests [integration/config] - 10https://gerrit.wikimedia.org/r/301025 (https://phabricator.wikimedia.org/T141309) [23:23:34] RECOVERY - Puppet run on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0] [23:24:35] Project beta-code-update-eqiad build #118338: 04FAILURE in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/118338/ [23:27:44] PROBLEM - Puppet run on integration-slave-jessie-1003 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [23:34:30] Yippee, build fixed! [23:34:30] Project beta-code-update-eqiad build #118339: 09FIXED in 1 min 29 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/118339/