[00:00:03] Labs, Continuous-Integration-Infrastructure: Designate should support split horizon resolution to yield private IP of instances behind a public DNS entry - https://phabricator.wikimedia.org/T95288#1286740 (Andrew) I'm going to solve this by setting up a labs-specific recursor which will swizzle IPs the wa... [00:00:13] Labs, Continuous-Integration-Infrastructure: Designate should support split horizon resolution to yield private IP of instances behind a public DNS entry - https://phabricator.wikimedia.org/T95288#1286741 (Andrew) [00:00:16] Labs, hardware-requests, operations: New server for labs dns recursor - https://phabricator.wikimedia.org/T99133#1286742 (Andrew) [00:01:33] Labs, hardware-requests, operations: New server for labs dns recursor - https://phabricator.wikimedia.org/T99133#1285990 (Andrew) Let's call it labdns1003 [00:27:13] !log tools cleared graphite data for /var/* mounts on tools-redis [00:27:17] Logged the message, Master [00:28:26] Tool-Labs: Shinken: make sure 'Free space - all mounts' can handle no-longer-existing mounts - https://phabricator.wikimedia.org/T99077#1286820 (yuvipanda) Cleaned out /var, /var/lib and /var/lib/redis mount data from tools-redis on graphite. This involves deleting the folders for these from /srv/carbon on l... [03:26:50] !ping [03:26:51] !pong [03:26:58] Hi Wikimedia labs [03:27:51] !ping [03:27:52] !pong [03:27:55] ha ha [03:30:36] its used for speed testing. I was getting lag [03:32:37] !ping [03:32:37] !pong [03:32:50] What is expected typically for speed? [03:35:51] Greater than 0ms [03:37:12] ok, so basically, if you can notice?
[03:37:17] because [03:37:22] !ping [03:37:22] !pong [03:37:26] I can notice [05:36:42] Labs, operations: salt does not run reliably for toollabs - https://phabricator.wikimedia.org/T99213#1287107 (yuvipanda) NEW [05:40:54] Labs, operations: salt does not run reliably for toollabs - https://phabricator.wikimedia.org/T99213#1287124 (ArielGlenn) as a first step I need to turn on the config setting on the master that forces a ping of all clients after salt master key rotation (which happens every 24 hours or after any key is del... [06:01:34] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds [06:01:35] waaaat [06:01:50] no shinken, it's up! [06:06:20] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 769826 bytes in 3.383 second response time [06:33:33] PROBLEM - Puppet failure on tools-master is CRITICAL 50.00% of data above the critical threshold [0.0] [06:33:54] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 60.00% of data above the critical threshold [0.0] [06:58:33] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [06:58:53] RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0] [07:35:41] Labs, Patch-For-Review: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1287271 (jcrespo) @Andrew With a bit of cleanup we have reduced the 90GB-single-file to 2.5 GB of space without deleting any row. I would like to consider also migrating fully to InnoDB.... [08:08:10] hi all, I have a little question, I've setup a test wiki on a labs-instance for testing my extension, however the help page says that labs-vagrant has many settings tuned for development, which can leave the wiki insecure [08:08:38] my question is, if the wiki is only for testing, should I take additional steps to make it secure?
[08:23:00] codezee: They are mostly disclosure settings (verbose error messages, etc). [08:25:10] Coren: alright, then I guess I need not worry [08:25:43] codezee: https://www.mediawiki.org/wiki/Manual:Security goes into detail about sane settings for an exposed wiki. You might also want to turn $wgShowSQLErrors and $wgDebugDumpSql to false when you don't need them. [08:26:22] Coren: thanks for the link! I'll go through it [10:53:47] andrewbogott_afk: Ping me when you get around. [11:56:35] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 60.00% of data above the critical threshold [0.0] [12:01:53] !log tools webgrid-lighttpd-1402 puppet failure caused by major memory usage; tools.kmlexport is running heavy perl scripts [12:02:01] Logged the message, Master [12:06:15] !log tools killed those perl scripts; kmlexport's lighttpd is also using excessive memory (5%), so restarting that [12:06:19] Logged the message, Master [12:11:04] Tool-Labs: kmlexport perl script memory usage - https://phabricator.wikimedia.org/T99236#1287584 (valhallasw) NEW [12:21:34] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0] [12:29:56] it would be so convenient if all SUL accounts were on phab :/ [12:45:16] Tool-Labs: Investigate alternatives to dedicated exec node for gifti's tools - https://phabricator.wikimedia.org/T99130#1287632 (valhallasw) p:Triage>Normal [12:45:22] Tool-Labs: deduplicate compute::general and compute::dedicated roles - https://phabricator.wikimedia.org/T99131#1287634 (valhallasw) p:Triage>Normal [12:45:30] Tool-Labs: kmlexport perl script memory usage - https://phabricator.wikimedia.org/T99236#1287635 (valhallasw) p:Triage>High [12:49:35] Tool-Labs, Wikidata, Patch-For-Review, Wikidata-Sprint-2015-05-05: Add wb_changes_subscription and wbc_entity_usage to labs db replication - https://phabricator.wikimedia.org/T98748#1287653 (coren) This was merged and applied.
I do not know if this was expected, but I note that most databases had... [12:50:54] Tool-Labs, Wikidata, Patch-For-Review, Wikidata-Sprint-2015-05-05: Add wb_changes_subscription and wbc_entity_usage to labs db replication - https://phabricator.wikimedia.org/T98748#1287655 (aude) coren: only wikis in the "wikidataclient.dblist" have wbc_entity_usage and also most are still empty s... [12:55:58] Tool-Labs, Wikidata, Patch-For-Review, Wikidata-Sprint-2015-05-05: Add wb_changes_subscription and wbc_entity_usage to labs db replication - https://phabricator.wikimedia.org/T98748#1287661 (coren) Open>Resolved a:coren [13:11:13] andrewbogott_afk: You around? [13:17:00] Off for coffee + smoke [14:04:32] Just a warning: Jaime and I are changing the db setup for labs openstack; this means that wikitech logins will be broken for a few minutes. [14:10:10] PROBLEM - Puppet failure on tools-exec-1210 is CRITICAL 22.22% of data above the critical threshold [0.0] [14:10:48] PROBLEM - Puppet failure on tools-exec-1208 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:11:40] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:11:46] PROBLEM - Puppet failure on tools-trusty is CRITICAL 30.00% of data above the critical threshold [0.0] [14:12:32] andrewbogott: [14:12:36] "Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to fetch instance ID at /etc/puppet/modules/base/manifests/init.pp:21 on node i-00000bc0.eqiad.wmflabs" is relatex [14:12:38] related?
[14:12:45] probably [14:12:46] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1205 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:13:44] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:13:50] PROBLEM - Puppet failure on tools-mail is CRITICAL 50.00% of data above the critical threshold [0.0] [14:14:33] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:14:33] PROBLEM - Puppet failure on tools-exec-1410 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:14:35] PROBLEM - Puppet failure on tools-exec-1409 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:14:43] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:14:49] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:15:25] PROBLEM - Puppet failure on tools-static-01 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:15:31] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:17:09] PROBLEM - Puppet failure on tools-exec-1408 is CRITICAL 66.67% of data above the critical threshold [0.0] [14:17:35] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:18:58] PROBLEM - Puppet failure on tools-checker-01 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:19:18] PROBLEM - Puppet failure on tools-webgrid-generic-1404 is CRITICAL 33.33% of data above the critical threshold [0.0] [14:19:52] PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:19:57] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL 30.00% of data above the critical threshold [0.0] [14:19:59] PROBLEM - Puppet failure on 
tools-exec-1209 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:20:46] PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:21:14] PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:21:22] PROBLEM - Puppet failure on tools-shadow is CRITICAL 40.00% of data above the critical threshold [0.0] [14:23:31] * valhallasw wonders whether we should maybe just not have shinken-wm here at all [14:23:46] after all, our mailboxes get spammed extensively as well [14:25:01] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1404 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:25:43] PROBLEM - Puppet failure on tools-services-02 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:25:43] PROBLEM - Puppet failure on tools-exec-1213 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:25:49] PROBLEM - Puppet failure on tools-exec-1203 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:26:01] PROBLEM - Puppet failure on tools-services-01 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:26:10] PROBLEM - Puppet failure on tools-submit is CRITICAL 22.22% of data above the critical threshold [0.0] [14:26:35] PROBLEM - Puppet failure on tools-precise-dev is CRITICAL 20.00% of data above the critical threshold [0.0] [14:26:37] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1405 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:27:00] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1204 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:27:21] PROBLEM - Puppet failure on tools-exec-1218 is CRITICAL 22.22% of data above the critical threshold [0.0] [14:27:21] PROBLEM - Puppet failure on tools-exec-1205 is CRITICAL 33.33% of data above the critical threshold [0.0] [14:27:21] PROBLEM - Puppet failure on tools-static-02 is CRITICAL 33.33% of data 
above the critical threshold [0.0] [14:27:23] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1409 is CRITICAL 33.33% of data above the critical threshold [0.0] [14:27:37] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:27:37] PROBLEM - Puppet failure on tools-exec-1402 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:28:11] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL 44.44% of data above the critical threshold [0.0] [14:28:21] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1210 is CRITICAL 44.44% of data above the critical threshold [0.0] [14:28:27] PROBLEM - Puppet failure on tools-redis is CRITICAL 50.00% of data above the critical threshold [0.0] [14:28:37] PROBLEM - Puppet failure on tools-redis-slave is CRITICAL 40.00% of data above the critical threshold [0.0] [14:28:44] I'm mostly wondering what calue those puppet failiure messages have in the general case. A status page would be good. Alerts? Not so much. 
[14:28:46] PROBLEM - Puppet failure on tools-webgrid-generic-1402 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:28:50] value* [14:29:00] PROBLEM - Puppet failure on tools-exec-1214 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:29:11] PROBLEM - Puppet failure on tools-checker-02 is CRITICAL 55.56% of data above the critical threshold [0.0] [14:29:19] PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:29:21] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1401 is CRITICAL 55.56% of data above the critical threshold [0.0] [14:29:22] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 55.56% of data above the critical threshold [0.0] [14:29:28] PROBLEM - Puppet failure on tools-exec-1201 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:29:29] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL 22.22% of data above the critical threshold [0.0] [14:29:32] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL 44.44% of data above the critical threshold [0.0] [14:29:33] PROBLEM - Puppet failure on tools-master is CRITICAL 20.00% of data above the critical threshold [0.0] [14:29:33] Coren: yeah, I agree. 
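(Editor's aside on the "status page instead of alerts" idea raised above: a minimal sketch, assuming the alert lines keep the "[HH:MM:SS] PROBLEM|RECOVERY - <check> on <host> is <STATE>" shape seen in this log. The names ALERT_RE, current_status and summarize are hypothetical and are not part of shinken-wm or any Wikimedia tooling. The sketch collapses a stream of alerts into the latest state per check and host, which is roughly the information a status page would show.)

```python
import re

# Matches the alert shape seen in this log, for example:
# "[14:10:10] PROBLEM - Puppet failure on tools-exec-1210 is CRITICAL ..."
# Assumption: the check name and host sit on one line, and hosts contain no spaces.
ALERT_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\] (PROBLEM|RECOVERY) - (.+?) on (\S+) is (\w+)"
)


def current_status(log_text):
    """Return {(check, host): (state, time)}, keeping the latest alert per pair."""
    status = {}
    for time, _kind, check, host, state in ALERT_RE.findall(log_text):
        # Later alerts overwrite earlier ones, so only the newest state survives.
        status[(check, host)] = (state, time)
    return status


def summarize(status):
    """Count how many (check, host) pairs are currently in each state."""
    counts = {}
    for state, _time in status.values():
        counts[state] = counts.get(state, 0) + 1
    return counts
```

(Feeding the channel log through current_status and printing summarize(...) would give one short line per polling interval rather than one message per host; whether that fits the actual Shinken setup is not something this log confirms.)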
[14:29:35] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:29:36] PROBLEM - Puppet failure on tools-exec-1404 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:29:53] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:29:59] PROBLEM - Puppet failure on tools-exec-1219 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:30:05] PROBLEM - Puppet failure on tools-exec-1217 is CRITICAL 55.56% of data above the critical threshold [0.0] [14:30:09] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL 66.67% of data above the critical threshold [0.0] [14:30:23] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL 55.56% of data above the critical threshold [0.0] [14:30:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1209 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:30:33] PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:30:51] PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:31:01] PROBLEM - Puppet failure on tools-exec-1407 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:31:03] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:32:09] PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL 44.44% of data above the critical threshold [0.0] [14:32:11] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 55.56% of data above the critical threshold [0.0] [14:32:17] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1202 is CRITICAL 44.44% of data above the critical threshold [0.0] [14:32:27] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:32:41] PROBLEM - Puppet failure on 
tools-webgrid-lighttpd-1406 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:32:53] PROBLEM - Puppet failure on tools-exec-1406 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:33:59] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1408 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:39:35] RECOVERY - Puppet failure on tools-exec-1410 is OK Less than 1.00% above the threshold [0.0] [14:39:37] RECOVERY - Puppet failure on tools-exec-1409 is OK Less than 1.00% above the threshold [0.0] [14:40:07] RECOVERY - Puppet failure on tools-exec-1210 is OK Less than 1.00% above the threshold [0.0] [14:40:27] RECOVERY - Puppet failure on tools-static-01 is OK Less than 1.00% above the threshold [0.0] [14:40:43] RECOVERY - Puppet failure on tools-exec-1208 is OK Less than 1.00% above the threshold [0.0] [14:41:41] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0] [14:41:45] RECOVERY - Puppet failure on tools-trusty is OK Less than 1.00% above the threshold [0.0] [14:42:11] RECOVERY - Puppet failure on tools-exec-1408 is OK Less than 1.00% above the threshold [0.0] [14:42:34] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0] [14:42:46] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1205 is OK Less than 1.00% above the threshold [0.0] [14:43:45] RECOVERY - Puppet failure on tools-bastion-02 is OK Less than 1.00% above the threshold [0.0] [14:43:53] RECOVERY - Puppet failure on tools-mail is OK Less than 1.00% above the threshold [0.0] [14:43:59] RECOVERY - Puppet failure on tools-checker-01 is OK Less than 1.00% above the threshold [0.0] [14:44:16] RECOVERY - Puppet failure on tools-webgrid-generic-1404 is OK Less than 1.00% above the threshold [0.0] [14:44:32] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0] [14:44:44] RECOVERY - Puppet failure on tools-exec-1202 is OK 
Less than 1.00% above the threshold [0.0] [14:44:50] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [0.0] [14:44:52] RECOVERY - Puppet failure on tools-webgrid-generic-1401 is OK Less than 1.00% above the threshold [0.0] [14:44:58] RECOVERY - Puppet failure on tools-exec-1209 is OK Less than 1.00% above the threshold [0.0] [14:45:02] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1404 is OK Less than 1.00% above the threshold [0.0] [14:45:32] RECOVERY - Puppet failure on tools-exec-1215 is OK Less than 1.00% above the threshold [0.0] [14:45:46] RECOVERY - Puppet failure on tools-exec-1207 is OK Less than 1.00% above the threshold [0.0] [14:46:10] RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0] [14:48:21] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold [0.0] [14:49:15] RECOVERY - Puppet failure on tools-checker-02 is OK Less than 1.00% above the threshold [0.0] [14:49:23] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1401 is OK Less than 1.00% above the threshold [0.0] [14:49:57] RECOVERY - Puppet failure on tools-exec-catscan is OK Less than 1.00% above the threshold [0.0] [14:50:05] RECOVERY - Puppet failure on tools-exec-1217 is OK Less than 1.00% above the threshold [0.0] [14:50:19] RECOVERY - Puppet failure on tools-exec-cyberbot is OK Less than 1.00% above the threshold [0.0] [14:50:59] RECOVERY - Puppet failure on tools-services-01 is OK Less than 1.00% above the threshold [0.0] [14:51:09] RECOVERY - Puppet failure on tools-submit is OK Less than 1.00% above the threshold [0.0] [14:51:19] RECOVERY - Puppet failure on tools-shadow is OK Less than 1.00% above the threshold [0.0] [14:51:39] RECOVERY - Puppet failure on tools-precise-dev is OK Less than 1.00% above the threshold [0.0] [14:52:01] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1204 is OK Less than 1.00% above the threshold [0.0] 
[14:52:17] RECOVERY - Puppet failure on tools-static-02 is OK Less than 1.00% above the threshold [0.0] [14:52:18] RECOVERY - Puppet failure on tools-exec-1205 is OK Less than 1.00% above the threshold [0.0] [14:52:23] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1409 is OK Less than 1.00% above the threshold [0.0] [14:52:31] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [0.0] [14:52:38] RECOVERY - Puppet failure on tools-exec-1402 is OK Less than 1.00% above the threshold [0.0] [14:53:18] RECOVERY - Puppet failure on tools-exec-1401 is OK Less than 1.00% above the threshold [0.0] [14:53:22] RECOVERY - Puppet failure on tools-redis is OK Less than 1.00% above the threshold [0.0] [14:53:34] RECOVERY - Puppet failure on tools-redis-slave is OK Less than 1.00% above the threshold [0.0] [14:53:46] RECOVERY - Puppet failure on tools-webgrid-generic-1402 is OK Less than 1.00% above the threshold [0.0] [14:54:02] RECOVERY - Puppet failure on tools-exec-1214 is OK Less than 1.00% above the threshold [0.0] [14:54:21] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [14:54:29] RECOVERY - Puppet failure on tools-exec-1201 is OK Less than 1.00% above the threshold [0.0] [14:54:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [0.0] [14:54:37] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1403 is OK Less than 1.00% above the threshold [0.0] [14:54:37] RECOVERY - Puppet failure on tools-exec-1404 is OK Less than 1.00% above the threshold [0.0] [14:54:57] RECOVERY - Puppet failure on tools-exec-1219 is OK Less than 1.00% above the threshold [0.0] [14:55:11] RECOVERY - Puppet failure on tools-exec-1212 is OK Less than 1.00% above the threshold [0.0] [14:55:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1209 is OK Less than 1.00% above the threshold [0.0] [14:55:33] RECOVERY - Puppet failure on tools-webproxy-02 
is OK Less than 1.00% above the threshold [0.0] [14:55:43] RECOVERY - Puppet failure on tools-services-02 is OK Less than 1.00% above the threshold [0.0] [14:55:43] RECOVERY - Puppet failure on tools-exec-1213 is OK Less than 1.00% above the threshold [0.0] [14:55:51] RECOVERY - Puppet failure on tools-exec-1203 is OK Less than 1.00% above the threshold [0.0] [14:56:43] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1405 is OK Less than 1.00% above the threshold [0.0] [14:57:07] RECOVERY - Puppet failure on tools-exec-1403 is OK Less than 1.00% above the threshold [0.0] [14:57:12] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [0.0] [14:57:16] RECOVERY - Puppet failure on tools-exec-1218 is OK Less than 1.00% above the threshold [0.0] [14:57:17] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1202 is OK Less than 1.00% above the threshold [0.0] [14:57:27] RECOVERY - Puppet failure on tools-webproxy-01 is OK Less than 1.00% above the threshold [0.0] [14:57:43] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1406 is OK Less than 1.00% above the threshold [0.0] [14:57:53] RECOVERY - Puppet failure on tools-exec-1406 is OK Less than 1.00% above the threshold [0.0] [14:58:55] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1408 is OK Less than 1.00% above the threshold [0.0] [14:59:19] RECOVERY - Puppet failure on tools-exec-1211 is OK Less than 1.00% above the threshold [0.0] [14:59:21] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1210 is CRITICAL 22.22% of data above the critical threshold [0.0] [14:59:27] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1201 is OK Less than 1.00% above the threshold [0.0] [14:59:33] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [14:59:53] RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0] [15:00:20] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1401 is CRITICAL 22.22% 
of data above the critical threshold [0.0] [15:00:53] RECOVERY - Puppet failure on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [0.0] [15:00:57] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL 20.00% of data above the critical threshold [0.0] [15:01:01] RECOVERY - Puppet failure on tools-exec-1407 is OK Less than 1.00% above the threshold [0.0] [15:01:05] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [0.0] [15:01:19] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL 22.22% of data above the critical threshold [0.0] [15:01:59] PROBLEM - Puppet failure on tools-services-01 is CRITICAL 30.00% of data above the critical threshold [0.0] [15:02:10] PROBLEM - Puppet failure on tools-submit is CRITICAL 33.33% of data above the critical threshold [0.0] [15:02:10] PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 55.56% of data above the critical threshold [0.0] [15:02:20] PROBLEM - Puppet failure on tools-shadow is CRITICAL 50.00% of data above the critical threshold [0.0] [15:02:36] PROBLEM - Puppet failure on tools-precise-dev is CRITICAL 30.00% of data above the critical threshold [0.0] [15:03:20] PROBLEM - Puppet failure on tools-exec-1205 is CRITICAL 44.44% of data above the critical threshold [0.0] [15:03:32] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 40.00% of data above the critical threshold [0.0] [15:03:40] PROBLEM - Puppet failure on tools-exec-1402 is CRITICAL 60.00% of data above the critical threshold [0.0] [15:04:46] PROBLEM - Puppet failure on tools-webgrid-generic-1402 is CRITICAL 50.00% of data above the critical threshold [0.0] [15:06:04] PROBLEM - Puppet failure on tools-exec-1217 is CRITICAL 66.67% of data above the critical threshold [0.0] [15:06:32] PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL 20.00% of data above the critical threshold [0.0] [15:06:39] PROBLEM - Puppet failure on tools-exec-1213 is CRITICAL 
20.00% of data above the critical threshold [0.0] [15:06:41] PROBLEM - Puppet failure on tools-services-02 is CRITICAL 20.00% of data above the critical threshold [0.0] [15:06:49] PROBLEM - Puppet failure on tools-exec-1203 is CRITICAL 20.00% of data above the critical threshold [0.0] [15:07:37] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1405 is CRITICAL 20.00% of data above the critical threshold [0.0] [15:08:02] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1204 is CRITICAL 20.00% of data above the critical threshold [0.0] [15:08:07] PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL 22.22% of data above the critical threshold [0.0] [15:08:17] PROBLEM - Puppet failure on tools-exec-1218 is CRITICAL 22.22% of data above the critical threshold [0.0] [15:08:19] PROBLEM - Puppet failure on tools-static-02 is CRITICAL 33.33% of data above the critical threshold [0.0] [15:08:21] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1409 is CRITICAL 40.00% of data above the critical threshold [0.0] [15:08:43] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1406 is CRITICAL 20.00% of data above the critical threshold [0.0] [15:08:53] PROBLEM - Puppet failure on tools-exec-1406 is CRITICAL 40.00% of data above the critical threshold [0.0] [15:09:13] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL 44.44% of data above the critical threshold [0.0] [15:09:27] PROBLEM - Puppet failure on tools-redis is CRITICAL 40.00% of data above the critical threshold [0.0] [15:09:37] PROBLEM - Puppet failure on tools-redis-slave is CRITICAL 50.00% of data above the critical threshold [0.0] [15:10:02] PROBLEM - Puppet failure on tools-exec-1214 is CRITICAL 50.00% of data above the critical threshold [0.0] [15:10:17] PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL 50.00% of data above the critical threshold [0.0] [15:10:21] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 55.56% of data above the critical threshold [0.0] [15:10:23] PROBLEM - Puppet 
failure on tools-exec-1201 is CRITICAL 50.00% of data above the critical threshold [0.0] [15:10:31] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL 55.56% of data above the critical threshold [0.0] [15:10:32] PROBLEM - Puppet failure on tools-master is CRITICAL 40.00% of data above the critical threshold [0.0] [15:10:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL 60.00% of data above the critical threshold [0.0] [15:10:37] PROBLEM - Puppet failure on tools-exec-1404 is CRITICAL 60.00% of data above the critical threshold [0.0] [15:10:38] PROBLEM - Puppet failure on tools-exec-1409 is CRITICAL 20.00% of data above the critical threshold [0.0] [15:10:57] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 40.00% of data above the critical threshold [0.0] [15:11:00] PROBLEM - Puppet failure on tools-exec-1219 is CRITICAL 60.00% of data above the critical threshold [0.0] [15:11:07] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL 66.67% of data above the critical threshold [0.0] [15:11:08] PROBLEM - Puppet failure on tools-exec-1210 is CRITICAL 22.22% of data above the critical threshold [0.0] [15:11:29] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1209 is CRITICAL 60.00% of data above the critical threshold [0.0] [15:11:51] PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL 30.00% of data above the critical threshold [0.0] [15:12:01] PROBLEM - Puppet failure on tools-exec-1407 is CRITICAL 30.00% of data above the critical threshold [0.0] [15:12:03] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL 55.56% of data above the critical threshold [0.0] [15:13:11] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 66.67% of data above the critical threshold [0.0] [15:13:21] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1202 is CRITICAL 66.67% of data above the critical threshold [0.0] [15:13:27] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL 50.00% 
of data above the critical threshold [0.0] [15:13:35] PROBLEM - Puppet failure on tools-exec-wmt is CRITICAL 50.00% of data above the critical threshold [0.0] [15:14:17] Those warnings are all me, and I’m working on it [15:14:27] PROBLEM - Puppet failure on tools-exec-1206 is CRITICAL 60.00% of data above the critical threshold [0.0] [15:14:57] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1408 is CRITICAL 60.00% of data above the critical threshold [0.0] [15:15:25] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL 66.67% of data above the critical threshold [0.0] [15:16:59] andrewbogott: thanks! [15:17:13] (thanks you're working on it, and thanks for letting us know) [15:29:45] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL 20.00% of data above the critical threshold [0.0] [15:29:57] PROBLEM - Puppet failure on tools-checker-01 is CRITICAL 20.00% of data above the critical threshold [0.0] [15:30:11] PROBLEM - Puppet failure on tools-checker-02 is CRITICAL 22.22% of data above the critical threshold [0.0] [15:30:13] PROBLEM - Puppet failure on tools-webgrid-generic-1404 is CRITICAL 30.00% of data above the critical threshold [0.0] [15:30:34] PROBLEM - Puppet failure on tools-exec-1410 is CRITICAL 30.00% of data above the critical threshold [0.0] [15:30:35] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 30.00% of data above the critical threshold [0.0] [15:30:44] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 30.00% of data above the critical threshold [0.0] [15:30:50] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 30.00% of data above the critical threshold [0.0] [15:30:52] PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL 30.00% of data above the critical threshold [0.0] [15:31:01] PROBLEM - Puppet failure on tools-exec-1209 is CRITICAL 30.00% of data above the critical threshold [0.0] [15:31:03] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1404 is CRITICAL 30.00% of 
data above the critical threshold [0.0] [15:31:25] PROBLEM - Puppet failure on tools-static-01 is CRITICAL 33.33% of data above the critical threshold [0.0] [15:31:33] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL 40.00% of data above the critical threshold [0.0] [15:31:45] PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL 40.00% of data above the critical threshold [0.0] [15:31:47] PROBLEM - Puppet failure on tools-exec-1208 is CRITICAL 40.00% of data above the critical threshold [0.0] [15:32:43] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 50.00% of data above the critical threshold [0.0] [15:32:45] PROBLEM - Puppet failure on tools-trusty is CRITICAL 50.00% of data above the critical threshold [0.0] [15:33:11] PROBLEM - Puppet failure on tools-exec-1408 is CRITICAL 55.56% of data above the critical threshold [0.0] [15:33:32] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 60.00% of data above the critical threshold [0.0] [15:33:44] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1205 is CRITICAL 50.00% of data above the critical threshold [0.0] [15:34:52] PROBLEM - Puppet failure on tools-mail is CRITICAL 60.00% of data above the critical threshold [0.0] [15:45:12] RECOVERY - Puppet failure on tools-checker-02 is OK Less than 1.00% above the threshold [0.0] [15:46:47] RECOVERY - Puppet failure on tools-exec-1207 is OK Less than 1.00% above the threshold [0.0] [15:47:09] RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0] [15:47:19] RECOVERY - Puppet failure on tools-shadow is OK Less than 1.00% above the threshold [0.0] [15:48:23] RECOVERY - Puppet failure on tools-exec-1205 is OK Less than 1.00% above the threshold [0.0] [15:48:39] RECOVERY - Puppet failure on tools-exec-1402 is OK Less than 1.00% above the threshold [0.0] [15:49:24] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold [0.0] [15:50:19] RECOVERY - 
Puppet failure on tools-webgrid-lighttpd-1401 is OK Less than 1.00% above the threshold [0.0] [15:50:57] RECOVERY - Puppet failure on tools-exec-catscan is OK Less than 1.00% above the threshold [0.0] [15:51:07] RECOVERY - Puppet failure on tools-exec-1217 is OK Less than 1.00% above the threshold [0.0] [15:51:21] RECOVERY - Puppet failure on tools-exec-cyberbot is OK Less than 1.00% above the threshold [0.0] [15:51:33] RECOVERY - Puppet failure on tools-webproxy-02 is OK Less than 1.00% above the threshold [0.0] [15:51:41] RECOVERY - Puppet failure on tools-services-02 is OK Less than 1.00% above the threshold [0.0] [15:52:00] RECOVERY - Puppet failure on tools-services-01 is OK Less than 1.00% above the threshold [0.0] [15:52:10] RECOVERY - Puppet failure on tools-submit is OK Less than 1.00% above the threshold [0.0] [15:52:39] RECOVERY - Puppet failure on tools-precise-dev is OK Less than 1.00% above the threshold [0.0] [15:52:39] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1405 is OK Less than 1.00% above the threshold [0.0] [15:53:01] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1204 is OK Less than 1.00% above the threshold [0.0] [15:53:18] RECOVERY - Puppet failure on tools-static-02 is OK Less than 1.00% above the threshold [0.0] [15:53:24] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1409 is OK Less than 1.00% above the threshold [0.0] [15:53:34] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [0.0] [15:54:12] RECOVERY - Puppet failure on tools-exec-1401 is OK Less than 1.00% above the threshold [0.0] [15:54:22] RECOVERY - Puppet failure on tools-redis is OK Less than 1.00% above the threshold [0.0] [15:54:38] RECOVERY - Puppet failure on tools-redis-slave is OK Less than 1.00% above the threshold [0.0] [15:54:44] RECOVERY - Puppet failure on tools-webgrid-generic-1402 is OK Less than 1.00% above the threshold [0.0] [15:55:18] RECOVERY - Puppet failure on tools-exec-1211 is OK Less 
than 1.00% above the threshold [0.0] [15:55:22] RECOVERY - Puppet failure on tools-exec-1201 is OK Less than 1.00% above the threshold [0.0] [15:55:23] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [15:55:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [0.0] [15:55:35] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1403 is OK Less than 1.00% above the threshold [0.0] [15:55:37] RECOVERY - Puppet failure on tools-exec-1404 is OK Less than 1.00% above the threshold [0.0] [15:55:59] RECOVERY - Puppet failure on tools-exec-1219 is OK Less than 1.00% above the threshold [0.0] [15:56:07] RECOVERY - Puppet failure on tools-exec-1212 is OK Less than 1.00% above the threshold [0.0] [15:56:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1209 is OK Less than 1.00% above the threshold [0.0] [15:56:41] RECOVERY - Puppet failure on tools-exec-1213 is OK Less than 1.00% above the threshold [0.0] [15:56:49] RECOVERY - Puppet failure on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [0.0] [15:56:51] RECOVERY - Puppet failure on tools-exec-1203 is OK Less than 1.00% above the threshold [0.0] [15:57:03] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [0.0] [15:58:07] RECOVERY - Puppet failure on tools-exec-1403 is OK Less than 1.00% above the threshold [0.0] [15:58:10] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [0.0] [15:58:16] RECOVERY - Puppet failure on tools-exec-1218 is OK Less than 1.00% above the threshold [0.0] [15:58:18] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1202 is OK Less than 1.00% above the threshold [0.0] [15:58:26] RECOVERY - Puppet failure on tools-webproxy-01 is OK Less than 1.00% above the threshold [0.0] [15:58:32] RECOVERY - Puppet failure on tools-exec-wmt is OK Less than 1.00% above the threshold [0.0] 
[15:58:44] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1406 is OK Less than 1.00% above the threshold [0.0] [15:58:54] RECOVERY - Puppet failure on tools-exec-1406 is OK Less than 1.00% above the threshold [0.0] [15:59:24] RECOVERY - Puppet failure on tools-exec-1206 is OK Less than 1.00% above the threshold [0.0] [15:59:59] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1408 is OK Less than 1.00% above the threshold [0.0] [16:00:01] RECOVERY - Puppet failure on tools-exec-1214 is OK Less than 1.00% above the threshold [0.0] [16:00:25] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1201 is OK Less than 1.00% above the threshold [0.0] [16:00:33] RECOVERY - Puppet failure on tools-exec-1410 is OK Less than 1.00% above the threshold [0.0] [16:00:33] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [16:00:37] RECOVERY - Puppet failure on tools-exec-1409 is OK Less than 1.00% above the threshold [0.0] [16:00:43] RECOVERY - Puppet failure on tools-exec-1202 is OK Less than 1.00% above the threshold [0.0] [16:00:53] RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0] [16:01:07] RECOVERY - Puppet failure on tools-exec-1210 is OK Less than 1.00% above the threshold [0.0] [16:01:27] RECOVERY - Puppet failure on tools-static-01 is OK Less than 1.00% above the threshold [0.0] [16:01:41] 6Labs, 5Patch-For-Review: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1288200 (10Andrew) We tried to implement this today, and failed due to lack of connectivity with labnet1001, where the nova api lives. @faidon, can you adjust the network to permit such tr... 
[16:01:43] RECOVERY - Puppet failure on tools-exec-1208 is OK Less than 1.00% above the threshold [0.0] [16:02:01] RECOVERY - Puppet failure on tools-exec-1407 is OK Less than 1.00% above the threshold [0.0] [16:02:39] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0] [16:02:45] RECOVERY - Puppet failure on tools-trusty is OK Less than 1.00% above the threshold [0.0] [16:03:09] RECOVERY - Puppet failure on tools-exec-1408 is OK Less than 1.00% above the threshold [0.0] [16:03:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0] [16:03:45] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1205 is OK Less than 1.00% above the threshold [0.0] [16:04:44] RECOVERY - Puppet failure on tools-bastion-02 is OK Less than 1.00% above the threshold [0.0] [16:04:50] RECOVERY - Puppet failure on tools-mail is OK Less than 1.00% above the threshold [0.0] [16:04:58] RECOVERY - Puppet failure on tools-checker-01 is OK Less than 1.00% above the threshold [0.0] [16:05:17] RECOVERY - Puppet failure on tools-webgrid-generic-1404 is OK Less than 1.00% above the threshold [0.0] [16:05:33] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0] [16:05:49] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [0.0] [16:05:53] RECOVERY - Puppet failure on tools-webgrid-generic-1401 is OK Less than 1.00% above the threshold [0.0] [16:06:01] RECOVERY - Puppet failure on tools-exec-1209 is OK Less than 1.00% above the threshold [0.0] [16:06:01] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1404 is OK Less than 1.00% above the threshold [0.0] [16:06:33] RECOVERY - Puppet failure on tools-exec-1215 is OK Less than 1.00% above the threshold [0.0] [16:10:17] (03CR) 10Merlijn van Deen: "Thanks for those suggestions. 
I've put discussing this on the lists of things to discuss during the lyon hackathon: https://phabricator.wi" [labs/toollabs] - 10https://gerrit.wikimedia.org/r/209968 (https://phabricator.wikimedia.org/T98641) (owner: 10Merlijn van Deen) [16:10:48] Coren: any things you'd like to discuss in Lyon? https://phabricator.wikimedia.org/T98912 [16:11:53] I'll follow your lead; keep in mind that I have a full plate for Lyon with the Wikisource community so I'll have limited time for "internal" stuff (I'll make sure to make at least /some/ time for it though) [16:14:28] Sure; that's why I'm checking in advance, so we can be super efficient in Lyon :-) [16:14:35] what are you going to do with WS? [16:21:13] They need help with a bunch of critical tools that are unmaintained/abandoned [16:22:54] https://fr.wikisource.org/wiki/Wikisource:Outils_%C3%A0_developper_%28Hackathon_2015_Lyon%29 is the rough agenda (in French) [16:23:27] Coren: Any update on abandoned tools 'usurpation'? [16:23:55] I'd still like to be added for bibleversefinder. Still waiting on legal? [16:24:01] T13|inClass: Not really; but I'm having a sitdown with Luis this weekend so I'll put it on the agenda [16:24:13] Thank you. 
") [16:24:15] (Community relations vs legal) [16:24:44] :) [17:12:00] 10Tool-Labs, 10Hackathon-Lyon-2015: Tool-labs meeting agenda for Lyon Hackathon - https://phabricator.wikimedia.org/T98912#1288363 (10fgiunchedi) I don't have much sight in toollabs but happy to provide packaging help, ping me in Lyon in case [17:31:01] 10Tool-Labs, 10Continuous-Integration-Config: Set up lint checks for labs/toollabs - https://phabricator.wikimedia.org/T65687#1288423 (10JanZerebecki) [17:52:24] 6Labs, 5Patch-For-Review: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1288501 (10jcrespo) @Andrew, it is not an exhaustive investigation, but from netstat I can only see 2 hosts going in right now to virt1000: virt1000.wikimedia labnet1001.equiad I have als... [17:54:32] Fun fact: Moritz and I are applying a security patch to virt hosts. Each instance will hesitate for a second or two as it restarts with the new code. [17:55:17] * bd808 waves goodbye to virtual floppy disks [18:01:17] 6Labs, 5Patch-For-Review: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1288514 (10Andrew) I'm not shocked if it's only those two, since originally I thought it was only one :) [18:32:05] Are all WMF servers in eqiad? [18:32:05] None anywhere else? [18:32:29] no [18:32:34] there are other datacenters [18:33:55] HoloIRCUser1: https://wikitech.wikimedia.org/wiki/Clusters [18:38:47] legoktm: Oh, nice. [19:09:32] yuvipanda|zzz, where can I find uwsgi python stack traces...? 
[19:14:41] oh, it's apparently flask swallowing them [19:34:28] yuvipanda|zzz: https://merlijn.vandeen.nl/2015/flask-mwoauth-on-tools.html :> [19:35:46] finally found time to finish it [19:37:51] 10Tool-Labs, 10Continuous-Integration-Config: Set up lint checks for labs/toollabs - https://phabricator.wikimedia.org/T65687#1288860 (10hashar) Currently Zuul triggers: ``` - name: labs/toollabs test: - labs-toollabs-debian-glue - phplint gate-and-submit: - phplint ``` For JavaScript... [20:50:07] !log deployment-prep rebooted deployment-bastion due to inconsistent run state after suspend/resume [20:50:16] Logged the message, dummy [23:06:51] andrewbogott: Coren: https://phabricator.wikimedia.org/T99304?workflow=create [23:07:05] Looks like there is over a minute clockdrift in labs between some instances and prod hosts. [23:07:19] And also lots of ping and socket host unreachable errors past hour. [23:07:41] Krinkle: I have been suspending and resuming instances, although I wouldn’t expect that to cause a minute’s worth [23:08:37] I thought (perhaps incorrectly) that they were running ntp and would re-sync themselves. Untrue [23:08:38] ? [23:08:42] Whatever it is, it's not specific to CI I imagine, I expect errors in all kinds of unforeseeable contexts in other instances. [23:08:57] I don't know. [23:09:10] I'm not sure how to compare it well, and I'm on a no-good connection at the moment. [23:09:29] I think jenkins tolerates up to 5 seconds at most. [23:09:46] Although I've never seen more than 2 in the past 3 years. [23:10:31] It got worse just now [23:10:52] The trusty instances were in sync and are now a minute and a half behind as well. [23:11:44] Krinkle: does ‘sudo ntpd -q’ fix anything? [23:12:36] I shouldn't be doing that, but I guess no-one in RelEng is online at the moment? 
[23:12:53] * Krinkle tries [23:13:26] * bd808 hands Krinkle one of the honorary RelEng badges laying on his desk [23:13:41] andrewbogott: No difference it seems, after that, still a minute or so behind. [23:13:47] ok [23:13:50] no error output or exit code either [23:15:40] Krinkle: yeah, that’s what the -q does :) [23:15:42] Try this: sudo service ntp stop ; sudo ntpdate -s time.nist.gov ; sudo service ntp start [23:17:51] Krinkle: ^ ? [23:18:08] andrewbogott: What server does it normally update with and how often? [23:18:23] Krinkle: as far as I know it only updates on startup. [23:18:28] I'd rather not change an individual instance. [23:18:33] Do you not observe this on any other instances? [23:18:35] ? [23:18:38] I'd rather not change ci instances from base [23:18:57] I'm currently playing on integration-slave-trusty-1017 which is 1-2 minutes behind. [23:19:02] from my local workstation clock [23:19:28] Oh, I thought NTP runs like much more often in the background? [23:19:36] Do we have our own NTP server inside wmf? [23:20:40] I thought it ran periodically as well, but since you’re seeing drift I conclude that this is not the case :( [23:21:09] Anyway, I updated the ticket with a solution — I’ll force a sync everywhere after everything is resumed. [23:21:30] (and, the clock is right on 1017 now, right?) [23:30:12] andrewbogott: It is [23:30:29] andrewbogott: Is there a salt command I can run now? [23:30:51] every other job is failing because of unexpected results in all kinds of places [23:30:52] I put the command on the ticket [23:30:56] Gotta go, though. [23:31:11] Ah, I see. OK [23:31:20] Without the time.nist.gov hardcode [23:31:21] perfect [23:31:54] Um, sorry — I presume that the jenkins boxes have their own salt master? Or are they hosted centrally? [23:32:20] there's an integration-saltmaster in labs [23:32:25] the prod servers are fine [23:32:40] Yeah, that’s what I thought. So you can run the salt command yourself, right? 
[23:32:49] The beta cluster instances are also affected, but they only affect beta update jobs which don't depend on clock much [23:32:51] Ye [23:32:52] p [23:32:57] Sorry to duck out, my ride is literally idling out on the curb :)