[01:51:00] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 20.00% of data above the critical threshold [0.0]
[01:56:40] PROBLEM - Puppet failure on deployment-test is CRITICAL 30.00% of data above the critical threshold [0.0]
[01:57:40] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[01:58:10] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL 33.33% of data above the critical threshold [0.0]
[01:58:10] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 33.33% of data above the critical threshold [0.0]
[01:58:14] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 20.00% of data above the critical threshold [0.0]
[01:58:22] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 22.22% of data above the critical threshold [0.0]
[01:58:30] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 20.00% of data above the critical threshold [0.0]
[01:58:54] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 40.00% of data above the critical threshold [0.0]
[01:58:58] PROBLEM - Puppet failure on deployment-db2 is CRITICAL 20.00% of data above the critical threshold [0.0]
[01:59:14] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL 22.22% of data above the critical threshold [0.0]
[01:59:26] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL 40.00% of data above the critical threshold [0.0]
[01:59:28] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 20.00% of data above the critical threshold [0.0]
[01:59:30] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 20.00% of data above the critical threshold [0.0]
[01:59:38] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL 60.00% of data above the critical threshold [0.0]
[01:59:40] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 30.00% of data above the critical threshold [0.0]
[02:00:07] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 22.22% of data above the critical threshold [0.0]
[02:00:21] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL 50.00% of data above the critical threshold [0.0]
[02:00:39] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL 30.00% of data above the critical threshold [0.0]
[02:00:43] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 40.00% of data above the critical threshold [0.0]
[02:01:37] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[02:01:39] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL 70.00% of data above the critical threshold [0.0]
[02:01:49] PROBLEM - Puppet failure on deployment-salt is CRITICAL 30.00% of data above the critical threshold [0.0]
[02:02:03] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL 30.00% of data above the critical threshold [0.0]
[02:02:05] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 44.44% of data above the critical threshold [0.0]
[02:02:07] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL 44.44% of data above the critical threshold [0.0]
[02:02:07] PROBLEM - Puppet failure on deployment-upload is CRITICAL 44.44% of data above the critical threshold [0.0]
[02:02:07] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL 44.44% of data above the critical threshold [0.0]
[02:03:09] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL 33.33% of data above the critical threshold [0.0]
[02:03:23] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 50.00% of data above the critical threshold [0.0]
[02:03:23] PROBLEM - Puppet failure on deployment-stream is CRITICAL 60.00% of data above the critical threshold [0.0]
[02:13:10] RECOVERY - Puppet failure on deployment-elastic06 is OK Less than 1.00% above the threshold [0.0]
[02:13:22] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0]
[02:13:32] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0]
[02:14:26] RECOVERY - Puppet failure on deployment-parsoidcache02 is OK Less than 1.00% above the threshold [0.0]
[02:15:39] RECOVERY - Puppet failure on deployment-zookeeper01 is OK Less than 1.00% above the threshold [0.0]
[02:16:37] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0]
[02:16:39] RECOVERY - Puppet failure on deployment-sca01 is OK Less than 1.00% above the threshold [0.0]
[02:16:40] RECOVERY - Puppet failure on deployment-test is OK Less than 1.00% above the threshold [0.0]
[02:17:05] RECOVERY - Puppet failure on deployment-mathoid is OK Less than 1.00% above the threshold [0.0]
[02:17:07] RECOVERY - Puppet failure on deployment-upload is OK Less than 1.00% above the threshold [0.0]
[02:17:39] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0]
[02:18:06] RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0]
[02:18:12] RECOVERY - Puppet failure on deployment-apertium01 is OK Less than 1.00% above the threshold [0.0]
[02:18:22] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[02:18:56] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[02:19:36] RECOVERY - Puppet failure on deployment-fluoride is OK Less than 1.00% above the threshold [0.0]
[02:20:40] RECOVERY - Puppet failure on deployment-pdf02 is OK Less than 1.00% above the threshold [0.0]
[02:21:00] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[02:22:08] RECOVERY - Puppet failure on deployment-mediawiki02 is OK Less than 1.00% above the threshold [0.0]
[02:23:12] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0]
[02:23:26] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0]
[02:24:14] RECOVERY - Puppet failure on deployment-restbase01 is OK Less than 1.00% above the threshold [0.0]
[02:24:29] RECOVERY - Puppet failure on deployment-bastion is OK Less than 1.00% above the threshold [0.0]
[02:24:41] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
[02:25:07] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[02:25:23] RECOVERY - Puppet failure on deployment-elastic05 is OK Less than 1.00% above the threshold [0.0]
[02:26:47] RECOVERY - Puppet failure on deployment-salt is OK Less than 1.00% above the threshold [0.0]
[02:27:03] RECOVERY - Puppet failure on deployment-mediawiki03 is OK Less than 1.00% above the threshold [0.0]
[02:27:03] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0]
[02:28:59] RECOVERY - Puppet failure on deployment-db2 is OK Less than 1.00% above the threshold [0.0]
[02:29:27] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0]
[04:02:24] PROBLEM - SSH on deployment-logstash1 is CRITICAL - Socket timeout after 10 seconds
[04:07:21] RECOVERY - SSH on deployment-logstash1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[04:40:59] Yippee, build fixed!
[04:40:59] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #447: FIXED in 33 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/447/
[04:41:39] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL - Socket timeout after 10 seconds
[04:46:34] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 47739 bytes in 2.448 second response time
[05:35:22] Yippee, build fixed!
[05:35:23] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #422: FIXED in 33 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/422/
[06:17:49] PROBLEM - Puppet failure on deployment-salt is CRITICAL 20.00% of data above the critical threshold [0.0]
[06:43:45] RECOVERY - Free space - all mounts on deployment-bastion is OK All targets OK
[06:47:49] RECOVERY - Puppet failure on deployment-salt is OK Less than 1.00% above the threshold [0.0]
[07:31:16] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused
[07:34:17] Continuous-Integration-Infrastructure, Parsoid, VisualEditor: VisualEditor doesn't work on the beta-cluster - https://phabricator.wikimedia.org/T99496#1292767 (Amire80) NEW
[07:36:16] RECOVERY - Content Translation Server on deployment-cxserver03 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.019 second response time
[08:05:51] zeljkof: can you see my message?
[08:06:02] etonkovidova: yes! :)
[08:06:11] zeljkof: perfec!
[08:07:00] hola!
[08:10:51] etonkovidova: https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/graphs/contributors
[08:10:59] greg-g: welcome to irccloud! :)
[08:13:41] zeljkof: I'm not on irccloud, I'm on my bouncer (via ssh) ;)
[08:13:52] greg-g: cool :)
[08:13:58] irssi ftw ;)
[08:21:43] greg-g: you are such a geek
[08:22:13] Continuous-Integration-Infrastructure, Parsoid, VisualEditor: VisualEditor doesn't work on the beta-cluster - https://phabricator.wikimedia.org/T99496#1292834 (mobrovac) This is caused by a problem in the VE <-> RESTBase communication. There seem to be two problems: 1. VE in Beta is trying to use th...
[08:22:16] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused
[08:25:25] zeljkof: https://phabricator.wikimedia.org/T97364
[08:27:57] Beta-Cluster, RESTBase, VisualEditor: VisualEditor doesn't work on the beta-cluster - https://phabricator.wikimedia.org/T99496#1292841 (mobrovac) p:Triage>Unbreak! a:mobrovac
[08:37:17] RECOVERY - Content Translation Server on deployment-cxserver03 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.020 second response time
[08:39:32] Continuous-Integration-Infrastructure: PHP Warning: Module 'apc' already loaded in Unknown on line 0 - https://phabricator.wikimedia.org/T99413#1292854 (Legoktm)
[08:39:49] Continuous-Integration-Infrastructure: PHP Warning: Module 'apc' already loaded in Unknown on line 0 on zend slaves - https://phabricator.wikimedia.org/T99413#1292857 (Legoktm)
[08:48:52] Continuous-Integration-Infrastructure: PHP Warning: Module 'apc' already loaded in Unknown on line 0 on zend slaves - https://phabricator.wikimedia.org/T99413#1292872 (Nemo_bis) Legoktm said: > ...no, it was just failing on "08:11:01 42 | WARNING | Line exceeds 100 characters; contains 107 characters" and c...
[08:56:40] PROBLEM - Content Translation Server on deployment-sca02 is CRITICAL: Connection refused
[09:03:17] PROBLEM - Puppet failure on deployment-sca02 is CRITICAL 100.00% of data above the critical threshold [0.0]
[09:10:06] Release-Engineering, Project-Creators: SWAT Project (Tag) - https://phabricator.wikimedia.org/T99411#1292920 (mmodell) @krenair: that was me testing out a sprint, this is proposal for a real permanent project.
[09:12:25] Beta-Cluster, Parsoid: Parsoid not binding to any port in Beta Cluster - https://phabricator.wikimedia.org/T99505#1292925 (mobrovac) NEW
[09:14:14] Beta-Cluster, RESTBase, VisualEditor: VisualEditor doesn't work on the beta-cluster - https://phabricator.wikimedia.org/T99496#1292935 (mobrovac)
[09:14:18] Beta-Cluster, Parsoid: Parsoid not binding to any port in Beta Cluster - https://phabricator.wikimedia.org/T99505#1292936 (mobrovac)
[09:16:51] Release-Engineering, Project-Creators: SWAT Project (Tag) - https://phabricator.wikimedia.org/T99411#1292945 (Legoktm) > It looks like we will start organizing SWAT deployments on maniphest Says who and why? Where was this discussed with SWAT deployers / people who request SWAT deploys / people who someho...
[09:20:22] Release-Engineering, Project-Creators: SWAT Project (Tag) - https://phabricator.wikimedia.org/T99411#1292948 (greg) Don't worry :) I just started an email thread with the people listed on Deployments page as SWATers after a suggestion from Jon R. Nothing decided yet, we just started the conversation. When...
[09:22:24] PROBLEM - Puppet failure on deployment-mx is CRITICAL 100.00% of data above the critical threshold [0.0]
[09:55:48] Beta-Cluster, RESTBase, VisualEditor: VisualEditor doesn't work on the beta-cluster - https://phabricator.wikimedia.org/T99496#1292985 (mobrovac) >>! In T99496#1292834, @mobrovac wrote: > This is caused by a problem in the VE <-> RESTBase communication. There seem to be two problems: > > 1. VE in Be...
[09:56:07] RECOVERY - Puppet staleness on deployment-restbase02 is OK Less than 1.00% above the threshold [3600.0]
[10:12:30] Deployment-Systems, Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1293000 (JeanFred) Just in case that’s useful, I’m a long time Ansible user and use it for deployments − happy to help out if I can.
[10:31:21] PROBLEM - Free space - all mounts on deployment-eventlogging02 is CRITICAL deployment-prep.deployment-eventlogging02.diskspace._var.byte_percentfree (<30.00%)
[11:11:43] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL - Socket timeout after 10 seconds
[11:28:02] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 33.33% of data above the critical threshold [0.0]
[11:35:39] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL 30.00% of data above the critical threshold [0.0]
[11:39:55] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 50.00% of data above the critical threshold [0.0]
[11:40:25] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL 70.00% of data above the critical threshold [0.0]
[11:48:02] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0]
[11:50:24] RECOVERY - Puppet failure on deployment-parsoidcache02 is OK Less than 1.00% above the threshold [0.0]
[11:55:35] RECOVERY - Puppet failure on deployment-fluoride is OK Less than 1.00% above the threshold [0.0]
[11:59:52] Project browsertests-CentralAuth-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #112: FAILURE in 2 min 52 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralAuth-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/112/
[11:59:55] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[12:17:33] (CR) TheDJ: "Note to self, link that page from the relevant: https://www.mediawiki.org/wiki/Manual:Coding_conventions" [integration/config] - https://gerrit.wikimedia.org/r/209991 (owner: TheDJ)
[12:38:22] greg-g: if the brain trust in Annecy has any input on https://phabricator.wikimedia.org/T99505 , it would be highly appreciated
[12:49:50] also, does anybody know what's the status of deployment-logstash1 ?
[12:50:31] Continuous-Integration-Infrastructure, MediaWiki-extensions-Translate: mediawiki-extensions-hhvm: MessageGroupStatesUpdaterJobTest::testHooks is intermittent failing - https://phabricator.wikimedia.org/T88554#1293385 (Nikerabbit) p:High>Low
[13:01:52] Beta-Cluster, Labs: No DNS entry for deployment-logstash1.eqiad.wmflabs ? - https://phabricator.wikimedia.org/T99521#1293401 (mobrovac) NEW
[13:08:27] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds
[13:08:50] Beta-Cluster, Labs: No DNS entry for deployment-logstash1.eqiad.wmflabs ? - https://phabricator.wikimedia.org/T99521#1293422 (mobrovac)
[13:09:16] Beta-Cluster, Labs: No DNS entry for deployment-logstash1.eqiad.wmflabs ? - https://phabricator.wikimedia.org/T99521#1293401 (mobrovac)
[13:12:10] Beta-Cluster, Parsoid: Parsoid not binding to any port in Beta Cluster - https://phabricator.wikimedia.org/T99505#1293428 (mobrovac) Heh, turns out the actual problem has nothing to do with Parsoid per se. As in T99506, Parsoid tries to create a gelf logger which connects to `deployment-logstash1`, but i...
[13:12:11] RECOVERY - Parsoid on deployment-parsoid05 is OK: HTTP OK: HTTP/1.1 200 OK - 1476 bytes in 0.078 second response time
[13:13:14] any take / info on https://phabricator.wikimedia.org/T99521 ?
[13:13:21] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 31029 bytes in 0.802 second response time
[13:13:26] parsoid and restbase are failing in beta because of it
[13:22:43] Beta-Cluster, Release-Engineering, Labs: No DNS entry for deployment-logstash1.eqiad.wmflabs ? - https://phabricator.wikimedia.org/T99521#1293441 (Aklapper)
[13:35:09] Beta-Cluster, Release-Engineering, Labs: No DNS entry for deployment-logstash1.eqiad.wmflabs ? - https://phabricator.wikimedia.org/T99521#1293479 (Joe) We narrowed down the problem to be local to the ldap-backed dns, @Coren is looking into it at the moment.
[13:37:56] Deployment-Systems, Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1293487 (Joe) @Gwicke >>! In T93433#1207219, @mmodell wrote: > Although ansible looks really coo l** I think it's going to be a tough sell as long as we are using puppet and salt as our 'offica...
[14:06:46] Beta-Cluster, Release-Engineering, Labs: No DNS entry for deployment-logstash1.eqiad.wmflabs ? - https://phabricator.wikimedia.org/T99521#1293546 (coren) The issue may actually lie within dnsmasq after all; the instance is properly listed in the list it should be serving the name of, yet gives a SRVFAI...
[14:13:22] Deployment-Systems, Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1293559 (GWicke) > A tough one indeed. We are primarily interested in rolling deployments here, which puppet doesn't support; there shouldn't be any direct competition vs. puppet at this point....
[14:42:28] PROBLEM - Puppet failure on integration-puppetmaster is CRITICAL 50.00% of data above the critical threshold [0.0]
[14:50:53] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 30.00% of data above the critical threshold [0.0]
[15:07:27] RECOVERY - Puppet failure on integration-puppetmaster is OK Less than 1.00% above the threshold [0.0]
[15:17:31] * Nemo_bis upgraded to fedora 21, now going through https://phabricator.wikimedia.org/diffusion/MWVA/browse/master/support/README-lxc.md or rather https://github.com/fgrehm/vagrant-lxc/wiki/Usage-on-fedora-hosts#fedora-21
[15:20:54] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[15:25:45] Lines 17–47 can hopefully be skipped
[15:26:24] Beta-Cluster, Parsoid: Parsoid not binding to any port in Beta Cluster - https://phabricator.wikimedia.org/T99505#1293752 (Andrew)
[15:26:26] Beta-Cluster, Release-Engineering, Labs: No DNS entry for deployment-logstash1.eqiad.wmflabs ? - https://phabricator.wikimedia.org/T99521#1293748 (Andrew) Open>Resolved a:Andrew No idea why this broke, but a reboot fixed it.
[15:28:57] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #537: FAILURE in 56 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/537/
[15:29:43] PROBLEM - Puppet staleness on deployment-logstash1 is CRITICAL 22.22% of data above the critical threshold [43200.0]
[15:34:43] RECOVERY - Puppet staleness on deployment-logstash1 is OK Less than 1.00% above the threshold [3600.0]
[16:29:10] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[16:36:56] no luck...
[16:36:56] - Gem::Ext::BuildError: ERROR: Failed to build gem native extension.
[16:44:16] Beta-Cluster, Release-Engineering, Labs: No DNS entry for deployment-logstash1.eqiad.wmflabs ? - https://phabricator.wikimedia.org/T99521#1293890 (coren) As a further note (so that we can recognize the bug if it recurs), dnsmasq /did/ have a valid lease for that IP and had the correct name associated w...
[16:46:00] PROBLEM - Host integration-slave-trusty-1015 is DOWN: CRITICAL - Host Unreachable (10.68.18.30)
[16:46:12] PROBLEM - Host deployment-bastion is DOWN: CRITICAL - Host Unreachable (10.68.16.58)
[16:47:18] PROBLEM - Host deployment-test is DOWN: CRITICAL - Host Unreachable (10.68.16.149)
[16:47:34] PROBLEM - Host deployment-memc03 is DOWN: PING CRITICAL - Packet loss = 100%
[16:47:44] PROBLEM - Host deployment-parsoid05 is DOWN: PING CRITICAL - Packet loss = 100%
[16:47:46] PROBLEM - Host deployment-restbase01 is DOWN: PING CRITICAL - Packet loss = 100%
[16:48:38] PROBLEM - Host deployment-rsync01 is DOWN: PING CRITICAL - Packet loss = 100%
[16:48:44] PROBLEM - Host integration-slave-trusty-1013 is DOWN: PING CRITICAL - Packet loss = 100%
[16:49:04] PROBLEM - Host deployment-urldownloader is DOWN: PING CRITICAL - Packet loss = 100%
[16:49:16] PROBLEM - Host deployment-elastic08 is DOWN: CRITICAL - Host Unreachable (10.68.17.188)
[16:49:30] PROBLEM - Host integration-raita is DOWN: CRITICAL - Host Unreachable (10.68.16.53)
[16:49:48] PROBLEM - Host deployment-cache-text02 is DOWN: CRITICAL - Host Unreachable (10.68.16.16)
[16:50:21] PROBLEM - Host deployment-pdf01 is DOWN: PING CRITICAL - Packet loss = 100%
[16:50:23] PROBLEM - Host deployment-salt is DOWN: CRITICAL - Host Unreachable (10.68.16.99)
[16:50:25] PROBLEM - Host Generic Beta Cluster is DOWN: CRITICAL - Host Unreachable (en.wikipedia.beta.wmflabs.org)
[16:54:13] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[16:56:56] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 60.00% of data above the critical threshold [0.0]
[16:57:04] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 50.00% of data above the critical threshold [0.0]
[16:57:40] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL 22.22% of data above the critical threshold [0.0]
[16:58:10] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL 40.00% of data above the critical threshold [0.0]
[16:59:18] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 30.00% of data above the critical threshold [0.0]
[16:59:27] PROBLEM - Puppet failure on deployment-stream is CRITICAL 50.00% of data above the critical threshold [0.0]
[17:00:03] PROBLEM - Puppet failure on deployment-db2 is CRITICAL 33.33% of data above the critical threshold [0.0]
[17:00:31] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 20.00% of data above the critical threshold [0.0]
[17:00:42] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 44.44% of data above the critical threshold [0.0]
[17:01:08] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 40.00% of data above the critical threshold [0.0]
[17:01:24] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL 60.00% of data above the critical threshold [0.0]
[17:01:46] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 100.00% of data above the critical threshold [0.0]
[17:02:38] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL 40.00% of data above the critical threshold [0.0]
[17:03:02] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL 30.00% of data above the critical threshold [0.0]
[17:04:12] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL 20.00% of data above the critical threshold [0.0]
[17:04:24] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 60.00% of data above the critical threshold [0.0]
[17:06:28] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL 40.00% of data above the critical threshold [0.0]
[17:07:39] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 11.11% of data above the critical threshold [0.0]
[17:08:11] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL 30.00% of data above the critical threshold [0.0]
[17:08:11] PROBLEM - Puppet failure on deployment-upload is CRITICAL 20.00% of data above the critical threshold [0.0]
[17:08:57] RECOVERY - Host deployment-elastic08 is UPING OK - Packet loss = 0%, RTA = 0.59 ms
[17:09:03] RECOVERY - Host deployment-memc03 is UPING OK - Packet loss = 0%, RTA = 0.69 ms
[17:09:15] RECOVERY - Host deployment-parsoid05 is UPING OK - Packet loss = 0%, RTA = 0.86 ms
[17:09:39] RECOVERY - Host deployment-rsync01 is UPING OK - Packet loss = 0%, RTA = 0.60 ms
[17:09:49] RECOVERY - Host deployment-cache-text02 is UPING OK - Packet loss = 0%, RTA = 0.62 ms
[17:09:57] RECOVERY - Host deployment-bastion is UPING OK - Packet loss = 0%, RTA = 0.63 ms
[17:10:07] RECOVERY - Host deployment-pdf01 is UPING OK - Packet loss = 0%, RTA = 0.60 ms
[17:10:13] RECOVERY - Host deployment-urldownloader is UPING OK - Packet loss = 0%, RTA = 0.55 ms
[17:10:17] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[17:10:23] RECOVERY - Host deployment-restbase01 is UPING OK - Packet loss = 0%, RTA = 0.76 ms
[17:10:33] RECOVERY - Host deployment-test is UPING OK - Packet loss = 0%, RTA = 0.75 ms
[17:11:21] RECOVERY - Host Generic Beta Cluster is UPING OK - Packet loss = 0%, RTA = 0.45 ms
[17:11:39] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL 40.00% of data above the critical threshold [0.0]
[17:11:43] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL 66.67% of data above the critical threshold [0.0]
[17:11:51] RECOVERY - Host deployment-salt is UPING OK - Packet loss = 0%, RTA = 0.48 ms
[17:12:00] eh?
[17:12:01] RECOVERY - Host integration-slave-trusty-1013 is UPING OK - Packet loss = 0%, RTA = 0.82 ms
[17:12:02] RECOVERY - Host integration-slave-trusty-1015 is UPING OK - Packet loss = 0%, RTA = 0.55 ms
[17:13:43] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 55.56% of data above the critical threshold [0.0]
[17:14:07] RECOVERY - Host integration-raita is UPING OK - Packet loss = 0%, RTA = 0.71 ms
[17:14:09] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[17:14:33] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 80.00% of data above the critical threshold [0.0]
[17:15:29] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 71.43% of data above the critical threshold [0.0]
[17:24:18] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0]
[17:24:26] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0]
[17:25:00] RECOVERY - Puppet failure on deployment-db2 is OK Less than 1.00% above the threshold [0.0]
[17:25:30] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0]
[17:25:44] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
[17:26:08] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[17:26:22] RECOVERY - Puppet failure on deployment-elastic05 is OK Less than 1.00% above the threshold [0.0]
[17:27:14] heh, was alone today in the future-of-deployment meeting
[17:27:38] RECOVERY - Puppet failure on deployment-logstash1 is OK Less than 1.00% above the threshold [0.0]
[17:28:07] RECOVERY - Puppet failure on deployment-mediawiki03 is OK Less than 1.00% above the threshold [0.0]
[17:29:13] RECOVERY - Puppet failure on deployment-elastic06 is OK Less than 1.00% above the threshold [0.0]
[17:31:27] RECOVERY - Puppet failure on deployment-parsoidcache02 is OK Less than 1.00% above the threshold [0.0]
[17:32:35] mobrovac: everyone is out at for a walk/doing some shopping for things they forgot. I'm stuck at the hotel in a budget meeting
[17:32:39] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0]
[17:32:55] well, all of my people that is ;)
[17:33:00] greg-g: yeah, no pb, figured as much
[17:33:02] lucky you
[17:33:05] fight fight fight
[17:33:06] :)
[17:33:07] RECOVERY - Puppet failure on deployment-mathoid is OK Less than 1.00% above the threshold [0.0]
[17:34:33] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0]
[17:34:44] mobrovac: :P
[17:36:39] RECOVERY - Puppet failure on deployment-fluoride is OK Less than 1.00% above the threshold [0.0]
[17:36:41] RECOVERY - Puppet failure on deployment-pdf02 is OK Less than 1.00% above the threshold [0.0]
[17:36:41] RECOVERY - Puppet failure on deployment-zookeeper01 is OK Less than 1.00% above the threshold [0.0]
[17:36:57] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[17:37:43] RECOVERY - Puppet failure on deployment-sca01 is OK Less than 1.00% above the threshold [0.0]
[17:38:13] RECOVERY - Puppet failure on deployment-upload is OK Less than 1.00% above the threshold [0.0]
[17:38:41] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0]
[17:39:09] RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0]
[17:39:11] RECOVERY - Puppet failure on deployment-apertium01 is OK Less than 1.00% above the threshold [0.0]
[17:39:25] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[17:40:20] RECOVERY - Puppet failure on deployment-restbase02 is OK Less than 1.00% above the threshold [0.0]
[17:40:32] RECOVERY - Puppet failure on deployment-bastion is OK Less than 1.00% above the threshold [0.0]
[17:42:04] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[17:43:14] RECOVERY - Puppet failure on deployment-mediawiki02 is OK Less than 1.00% above the threshold [0.0]
[18:21:33] PROBLEM - Puppet failure on integration-zuul-packaged is CRITICAL 100.00% of data above the critical threshold [0.0]
[18:22:36] Beta-Cluster, RESTBase, VisualEditor: VisualEditor doesn't work on the beta-cluster - https://phabricator.wikimedia.org/T99496#1294203 (mobrovac)
[18:22:39] Beta-Cluster, Parsoid: Parsoid not binding to any port in Beta Cluster - https://phabricator.wikimedia.org/T99505#1294201 (mobrovac) Open>Resolved a:mobrovac
[18:43:14] Beta-Cluster, RESTBase, VisualEditor: VisualEditor doesn't work on the beta-cluster - https://phabricator.wikimedia.org/T99496#1294339 (mobrovac) Open>Resolved
[18:47:15] Continuous-Integration-Infrastructure, Jenkins: Let Jenkins-mwext-sync clean up own open unmergable patch sets - https://phabricator.wikimedia.org/T99552#1294357 (Umherirrender) NEW
[19:18:44] Beta-Cluster, RESTBase-Cassandra: Cassandra gets the wrong IPs in deployment-prep - https://phabricator.wikimedia.org/T99564#1294541 (mobrovac) NEW
[20:26:21] RECOVERY - Free space - all mounts on deployment-eventlogging02 is OK All targets OK
[20:41:40] Deployment-Systems, Collaboration-Team, Thanks, I18n: "Thanks" button in language which is not the interface one - https://phabricator.wikimedia.org/T99575#1294860 (Nemo_bis)
[20:56:10] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL - Socket timeout after 10 seconds
[20:56:10] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL - Socket timeout after 10 seconds
[20:59:39] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 46625 bytes in 4.356 second response time
[21:00:53] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 47738 bytes in 0.767 second response time
[21:07:13] Deployment-Systems, Collaboration-Team, Thanks, I18n: "Thanks" button in language which is not the interface one - https://phabricator.wikimedia.org/T99575#1294906 (Nemo_bis) Open>Resolved a:Nemo_bis Actually the search failed me https://translatewiki.net/w/i.php?title=MediaWiki:Thanks-tha...
[21:17:26] PROBLEM - Puppet staleness on deployment-eventlogging02 is CRITICAL 100.00% of data above the critical threshold [43200.0]
[21:38:45] Continuous-Integration-Infrastructure, Release-Engineering, Wikimedia-Hackathon-2015, Wikipedia-Android-App, Wikipedia-iOS-App: Hacking: Create end-to-end automated test for Wikipedia native app(s) - https://phabricator.wikimedia.org/T90177#1294965 (BGerstle-WMF) a:BGerstle-WMF
[22:18:23] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #123: ABORTED in 3 hr 0 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/123/
[22:47:17] Browser-Tests, Release-Engineering: Net::ReadTimeout shouldn't mark a test as failed - https://phabricator.wikimedia.org/T98968#1295118 (Jdlrobson) This is hitting us a lot today. Examples: https://integration.wikimedia.org/ci/view/Mobile/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrom...
[23:37:48] Browser-Tests, Release-Engineering, Hackathon-Mexico-City-2015, Wikimedia-Hackathon-2015: Create pool of user accounts on beta cluster for browser test builds in Jenkins - https://phabricator.wikimedia.org/T90964#1295145 (hashar) Open>declined a:hashar I don't see a good use case for now....