[02:14:33] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2885563 (Andrew)
[02:14:35] Labs: Request increased quota for etytree labs project - https://phabricator.wikimedia.org/T152417#2885562 (Andrew) Open>Resolved
[02:15:35] Labs: Increase resource quota for dwl - https://phabricator.wikimedia.org/T152456#2848878 (Andrew) Can you tell me more about what resources you're running short of? Is it possible this can be addressed via logfile rotation or query optimization, etc.?
[02:36:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[02:39:09] PROBLEM - Free space - all mounts on tools-proxy-01 is CRITICAL: CRITICAL: tools.tools-proxy-01.diskspace._public_dumps.byte_percentfree (No valid datapoints found) tools.tools-proxy-01.diskspace.root.byte_percentfree (<50.00%)
[03:09:07] RECOVERY - Free space - all mounts on tools-proxy-01 is OK: OK: tools.tools-proxy-01.diskspace._public_dumps.byte_percentfree (No valid datapoints found)
[03:20:19] Labs, Puppet: Migrate references from $instance.eqiad.wmflabs to $instance.$project.eqiad.wmflabs - https://phabricator.wikimedia.org/T153608#2885600 (scfc)
[03:20:34] Labs, Puppet: Migrate references from $instance.eqiad.wmflabs to $instance.$project.eqiad.wmflabs - https://phabricator.wikimedia.org/T153608#2885612 (scfc) p:Triage>Lowest
[03:36:41] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:08:20] (PS1) BryanDavis: Allow deleting SSH keys [labs/striker] - https://gerrit.wikimedia.org/r/328117 (https://phabricator.wikimedia.org/T144711)
[04:08:47] bd808: needs some help with web dev?
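The Puppet-run alerts report what share of recent datapoints sit above a threshold of 0.0, and the free-space alert falls back to "No valid datapoints found" when a metric has no samples. A minimal sketch of how a percentage-over-threshold check of that kind might classify one window of samples, assuming the metric is something like a per-run failure count (0 on a clean run) and a hypothetical 20% critical cutoff; the real check's window size and cutoffs are not shown in this log.

```python
from typing import Optional, Sequence, Tuple


def classify_window(
    datapoints: Sequence[Optional[float]],
    threshold: float = 0.0,
    critical_pct: float = 20.0,  # hypothetical cutoff, not taken from the real check
) -> Tuple[str, Optional[float]]:
    """Return (state, percent_above) for one window of metric samples.

    `None` stands for a missing sample, the way a Graphite render marks gaps.
    """
    valid = [v for v in datapoints if v is not None]
    if not valid:
        # Mirrors the "No valid datapoints found" wording in the alerts.
        return "UNKNOWN", None
    above = sum(1 for v in valid if v > threshold)
    percent = 100.0 * above / len(valid)
    if percent >= critical_pct:
        return "CRITICAL", percent
    return "OK", percent


if __name__ == "__main__":
    # Hypothetical window of nine runs, three of which reported failures.
    state, pct = classify_window([0, 0, 1, 0, 2, 0, 0, 1, 0])
    print(f"{state}: {pct:.2f}% of data above the critical threshold [0.0]")
    # prints "CRITICAL: 33.33% of data above the critical threshold [0.0]"
```

Counting only valid samples is a design choice of this sketch: a window that is entirely missing becomes UNKNOWN rather than OK, which matches how the free-space alert treats the absent `_public_dumps` metric.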
[04:09:26] (PS2) BryanDavis: Allow deleting SSH keys [labs/striker] - https://gerrit.wikimedia.org/r/328117 (https://phabricator.wikimedia.org/T144711)
[05:02:42] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[05:03:49] Labs, Tool-Labs, Patch-For-Review: Move aptly backups to a cron rather than puppet - https://phabricator.wikimedia.org/T150726#2885652 (scfc) After testing, my idea to hinge the backup on `publish-aptly-repo-${title}` is wrong: `/usr/bin/aptly -architectures=amd64,all publish -skip-signing repo ${tit...
[05:06:23] Labs, Tool-Labs, Patch-For-Review: Move aptly backups to a cron rather than puppet - https://phabricator.wikimedia.org/T150726#2885666 (scfc) (JFTR: AFAICT `rsync` is way faster than Puppet in backing up, but the logic of "a backup is only //needed// when a package has been added" applies there as we...
[05:13:45] Labs, Tool-Labs, Patch-For-Review: Move aptly backups to a cron rather than puppet - https://phabricator.wikimedia.org/T150726#2885667 (scfc) For reference: The working `rsync` invocation by Puppet was `/usr/bin/rsync --chmod 440 --chown root:${::labsproject}.admin -ilrt /srv/packages/ /data/project/...
[05:23:02] Labs, Tool-Labs: tools-redis is down - https://phabricator.wikimedia.org/T66150#2885670 (scfc)
[05:23:04] Labs: labs_lvm can clash file resource for mount point with other packages - https://phabricator.wikimedia.org/T91225#2885668 (scfc) Open>declined Technically, the problem still persists; but without people hitting that error regularly, there is no real point in fixing it.
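scfc's rule that "a backup is only needed when a package has been added" can be sketched as a cron-side guard that skips the copy when nothing under the package directory is newer than the last successful backup. This is a minimal sketch, not the change proposed in T150726: the stamp-file approach and the function names are hypothetical, and only files that appear or change are detected (a removed package would not trigger a backup).

```python
import os


def newest_file_mtime(root: str) -> float:
    """Latest modification time of any file under `root` (0.0 if there are none)."""
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            newest = max(newest, os.lstat(os.path.join(dirpath, name)).st_mtime)
    return newest


def needs_backup(src_dir: str, stamp_path: str) -> bool:
    """True if no backup was ever recorded or some file changed since the last one."""
    if not os.path.exists(stamp_path):
        return True
    return newest_file_mtime(src_dir) > os.stat(stamp_path).st_mtime


def mark_backed_up(stamp_path: str) -> None:
    """Record a successful backup by creating or touching the (hypothetical) stamp file."""
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, None)
```

A cron job would call `needs_backup("/srv/packages", stamp)`, run the existing `rsync` copy only when it returns True, and call `mark_backed_up(stamp)` only after `rsync` exits successfully, so a failed copy is retried on the next run.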
[05:39:32] Labs, Puppet: Retire and remove module labs_debrepo - https://phabricator.wikimedia.org/T153612#2885673 (scfc)
[05:39:52] Labs, Puppet: Retire and remove module labs_debrepo - https://phabricator.wikimedia.org/T153612#2885686 (scfc)
[05:39:56] Labs, Discovery, Wikidata, Wikidata-Query-Service: Sunset of WDQ - https://phabricator.wikimedia.org/T153439#2885685 (scfc)
[05:44:16] Labs, Puppet: Move all labs-only puppet roles to manifests/role/labs - https://phabricator.wikimedia.org/T107167#2885702 (scfc) Open>Resolved
[06:18:06] Quarry: Remove dependency on labs_debrepo - https://phabricator.wikimedia.org/T153615#2885716 (scfc)
[06:42:43] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:17:13] Labs, wikitech.wikimedia.org: Extension:SyntaxHighlight_GeSHi reports for pages with syntax highlighting errors are bogus - https://phabricator.wikimedia.org/T153616#2885737 (scfc)
[07:23:12] Labs, wikitech.wikimedia.org: Job queue has 119663 entries - https://phabricator.wikimedia.org/T153618#2885763 (scfc)
[07:28:45] Labs, wikitech.wikimedia.org: Job queue has 119663 entries - https://phabricator.wikimedia.org/T153618#2885785 (Legoktm) p:Triage>Unbreak! ``` legoktm@terbium:~$ mwscript runJobs.php --wiki=labswiki --maxjobs=5 Warning: Memcached::touch(): touch is only supported with binary protocol in /srv/medi...
[08:02:30] Labs, Discovery, Wikidata, Wikidata-Query-Service: Sunset of WDQ - https://phabricator.wikimedia.org/T153439#2885866 (Esc3300)
[08:36:06] (Draft1) Paladox: Send nick serv password when you change the nick back to original [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/328139
[08:36:09] (Draft2) Paladox: Send nick serv password when you change the nick back to original [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/328139
[09:23:31] PROBLEM - Puppet run on tools-docker-registry-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[09:23:49] PROBLEM - Puppet run on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[09:24:09] PROBLEM - Puppet run on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:24:15] PROBLEM - Puppet run on tools-redis-1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:24:23] PROBLEM - Puppet run on tools-exec-1219 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:24:45] PROBLEM - Puppet run on tools-exec-1405 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:25:11] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:25:37] PROBLEM - Puppet run on tools-webgrid-lighttpd-1204 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:25:47] PROBLEM - Puppet run on tools-worker-1010 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[09:26:08] PROBLEM - Puppet run on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[09:27:02] PROBLEM - Puppet run on tools-exec-1415 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:27:20] PROBLEM - Puppet run on tools-worker-1019 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:27:30] PROBLEM - Puppet run on tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[09:27:36] PROBLEM - Puppet run on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:27:38] PROBLEM - Puppet run on tools-docker-registry-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[09:27:40] PROBLEM - Puppet run on tools-worker-1022 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[09:27:56] PROBLEM - Puppet run on tools-proxy-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:28:04] PROBLEM - Puppet run on tools-exec-1218 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:28:10] PROBLEM - Puppet run on tools-exec-1413 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:28:14] PROBLEM - Puppet run on tools-logs-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[09:28:30] PROBLEM - Puppet run on tools-static-10 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:28:32] PROBLEM - Puppet run on tools-webgrid-lighttpd-1208 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:28:42] PROBLEM - Puppet run on tools-exec-1404 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:28:54] PROBLEM - Puppet run on tools-worker-1005 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[09:29:02] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[09:29:16] PROBLEM - Puppet run on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:29:38] PROBLEM - Puppet run on tools-exec-1408 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:30:13] PROBLEM - Puppet run on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[09:31:11] PROBLEM - Puppet run on tools-exec-1419 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[09:32:07] PROBLEM - Puppet run on tools-exec-1221 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:32:17] PROBLEM - Puppet run on tools-worker-1015 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[09:32:23] PROBLEM - Puppet run on tools-worker-1004 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[09:32:55] PROBLEM - Puppet run on tools-exec-1213 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:33:21] PROBLEM - Puppet run on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:33:27] PROBLEM - Puppet run on tools-flannel-etcd-03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[09:33:49] PROBLEM - Puppet run on tools-bastion-05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:33:50] PROBLEM - Puppet run on tools-exec-1402 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:34:14] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:34:22] PROBLEM - Puppet run on tools-docker-builder-03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[09:34:30] PROBLEM - Puppet run on tools-worker-1014 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:34:42] PROBLEM - Puppet run on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:35:50] PROBLEM - Puppet run on tools-elastic-03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:35:58] PROBLEM - Puppet run on tools-worker-1003 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:36:06] PROBLEM - Puppet run on tools-puppetmaster-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[09:36:07] PROBLEM - Puppet run on tools-exec-1401 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:36:08] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:36:12] PROBLEM - Puppet run on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:37:31] PROBLEM - Puppet run on tools-worker-1007 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:37:33] PROBLEM - Puppet run on tools-exec-1414 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:37:39] PROBLEM - Puppet run on tools-exec-1416 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:37:39] PROBLEM - Puppet run on tools-exec-1417 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:37:43] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:37:43] PROBLEM - Puppet run on tools-proxy-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[09:38:15] PROBLEM - Puppet run on tools-webgrid-lighttpd-1206 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:38:25] PROBLEM - Puppet run on tools-checker-02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[09:38:39] PROBLEM - Puppet run on tools-exec-1406 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[09:38:43] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:38:47] PROBLEM - Puppet run on tools-worker-1025 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:39:15] PROBLEM - Puppet run on tools-redis-1002 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:39:55] PROBLEM - Puppet run on tools-prometheus-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:40:50] PROBLEM - Puppet run on tools-worker-1013 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:40:58] PROBLEM - Puppet run on tools-flannel-etcd-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:41:00] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:41:54] PROBLEM - Puppet run on tools-worker-1002 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:41:58] PROBLEM - Puppet run on tools-k8s-master-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[09:42:18] PROBLEM - Puppet run on tools-worker-1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:42:32] PROBLEM - Puppet run on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:42:38] PROBLEM - Puppet run on tools-mail-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[09:43:12] PROBLEM - Puppet run on tools-webgrid-lighttpd-1205 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:43:42] PROBLEM - Puppet run on tools-bastion-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:43:54] PROBLEM - Puppet run on tools-exec-1214 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:44:51] PROBLEM - Puppet run on tools-exec-1217 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:45:17] PROBLEM - Puppet run on tools-worker-1009 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:45:19] PROBLEM - Puppet run on tools-exec-1410 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:45:21] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:45:29] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[09:46:10] PROBLEM - Puppet run on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:46:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1210 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[09:46:52] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:46:58] PROBLEM - Puppet run on tools-exec-1418 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[09:46:59] PROBLEM - Puppet run on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:46:59] PROBLEM - Puppet run on tools-worker-1012 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[09:47:14] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:47:21] PROBLEM - Puppet run on tools-elastic-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:47:21] PROBLEM - Puppet run on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:47:21] PROBLEM - Puppet run on tools-mail is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:47:25] PROBLEM - Puppet run on tools-worker-1011 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:47:29] PROBLEM - Puppet run on tools-worker-1017 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:47:29] PROBLEM - Puppet run on tools-worker-1018 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:47:43] PROBLEM - Puppet run on tools-exec-1216 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:47:43] PROBLEM - Puppet run on tools-worker-1016 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:48:13] PROBLEM - Puppet run on tools-exec-1420 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:48:18] (PS1) Legoktm: Subsume grrrit-wm [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147
[09:48:35] PROBLEM - Puppet run on tools-k8s-etcd-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:48:39] (CR) jenkins-bot: [V: -1] Subsume grrrit-wm [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm)
[09:49:09] PROBLEM - Puppet run on tools-webgrid-lighttpd-1202 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:49:11] (PS2) Legoktm: Subsume grrrit-wm [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147
[09:49:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:49:21] PROBLEM - Puppet run on tools-worker-1006 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:49:39] PROBLEM - Puppet run on tools-webgrid-lighttpd-1415 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:49:53] PROBLEM - Puppet run on tools-exec-1220 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:50:03] PROBLEM - Puppet run on tools-grid-shadow is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:50:11] PROBLEM - Puppet run on tools-worker-1021 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:50:12] PROBLEM - Puppet run on tools-cron-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:50:36] PROBLEM - Puppet run on tools-k8s-etcd-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:50:56] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:51:06] PROBLEM - Puppet run on tools-grid-master is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[09:51:58] PROBLEM - Puppet run on tools-worker-1020 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:52:24] PROBLEM - Puppet run on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:53:26] PROBLEM - Puppet run on tools-webgrid-lighttpd-1401 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:54:10] PROBLEM - Puppet run on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:54:14] PROBLEM - Puppet run on tools-checker-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[10:00:46] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:01:00] PROBLEM - Puppet run on tools-exec-1403 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[10:04:08] RECOVERY - Puppet run on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:09:29] RECOVERY - Puppet run on tools-worker-1014 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:10:07] PROBLEM - Puppet run on tools-worker-1023 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[10:12:05] RECOVERY - Puppet run on tools-exec-1221 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:13:47] RECOVERY - Puppet run on tools-worker-1025 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:14:15] RECOVERY - Puppet run on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[10:14:42] RECOVERY - Puppet run on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:14:56] RECOVERY - Puppet run on tools-prometheus-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:15:58] RECOVERY - Puppet run on tools-flannel-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:16:06] RECOVERY - Puppet run on tools-puppetmaster-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:16:06] RECOVERY - Puppet run on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:16:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:17:17] RECOVERY - Puppet run on tools-worker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:17:33] RECOVERY - Puppet run on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:17:41] RECOVERY - Puppet run on tools-exec-1417 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:17:41] RECOVERY - Puppet run on tools-mail-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:17:45] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:17:45] RECOVERY - Puppet run on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:18:15] RECOVERY - Puppet run on tools-webgrid-lighttpd-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:18:25] RECOVERY - Puppet run on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:18:37] RECOVERY - Puppet run on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:18:53] RECOVERY - Puppet run on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:19:17] RECOVERY - Puppet run on tools-redis-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:20:49] RECOVERY - Puppet run on tools-worker-1013 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:21:11] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:21:54] RECOVERY - Puppet run on tools-worker-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:21:58] RECOVERY - Puppet run on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:22:00] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:22:20] RECOVERY - Puppet run on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0]
[10:22:30] RECOVERY - Puppet run on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:22:42] RECOVERY - Puppet run on tools-worker-1016 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:22:44] RECOVERY - Puppet run on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:23:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1205 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:23:36] RECOVERY - Puppet run on tools-k8s-etcd-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:23:44] RECOVERY - Puppet run on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:24:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:24:50] RECOVERY - Puppet run on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:25:20] RECOVERY - Puppet run on tools-worker-1009 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:25:23] RECOVERY - Puppet run on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:25:24] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:25:29] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:26:09] RECOVERY - Puppet run on tools-grid-master is OK: OK: Less than 1.00% above the threshold [0.0]
[10:26:51] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:26:59] RECOVERY - Puppet run on tools-exec-1418 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:26:59] RECOVERY - Puppet run on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:26:59] RECOVERY - Puppet run on tools-worker-1012 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:27:19] RECOVERY - Puppet run on tools-elastic-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:27:21] RECOVERY - Puppet run on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:27:25] RECOVERY - Puppet run on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:27:29] RECOVERY - Puppet run on tools-worker-1018 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:27:33] Labs, SyntaxHighlight, wikitech.wikimedia.org: Extension:SyntaxHighlight_GeSHi reports for pages with syntax highlighting errors are bogus - https://phabricator.wikimedia.org/T153616#2886236 (Paladox)
[10:28:13] RECOVERY - Puppet run on tools-exec-1420 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:29:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:29:15] RECOVERY - Puppet run on tools-checker-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:29:17] RECOVERY - Puppet run on tools-worker-1006 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:29:41] RECOVERY - Puppet run on tools-webgrid-lighttpd-1415 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:29:52] RECOVERY - Puppet run on tools-exec-1220 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:30:02] RECOVERY - Puppet run on tools-grid-shadow is OK: OK: Less than 1.00% above the threshold [0.0]
[10:30:12] RECOVERY - Puppet run on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:30:14] RECOVERY - Puppet run on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:30:36] RECOVERY - Puppet run on tools-k8s-etcd-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:30:56] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:31:02] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:32:26] RECOVERY - Puppet run on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:32:36] RECOVERY - Puppet run on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:32:38] RECOVERY - Puppet run on tools-docker-registry-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:33:02] RECOVERY - Puppet run on tools-exec-1218 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:33:16] RECOVERY - Puppet run on tools-logs-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:33:24] RECOVERY - Puppet run on tools-webgrid-lighttpd-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:33:33] RECOVERY - Puppet run on tools-docker-registry-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:33:33] RECOVERY - Puppet run on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:33:47] RECOVERY - Puppet run on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:33:57] RECOVERY - Puppet run on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:33:58] (CR) Paladox: [] "Could we bring over the commands that are in the bot like grrrit-wm: nick please and grrrit-wm: restart please?" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm)
[10:34:09] (CR) Paladox: [C: -1] Subsume grrrit-wm [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm)
[10:34:11] RECOVERY - Puppet run on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:34:15] RECOVERY - Puppet run on tools-redis-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:34:25] RECOVERY - Puppet run on tools-exec-1219 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:34:39] RECOVERY - Puppet run on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:34:45] RECOVERY - Puppet run on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:35:11] RECOVERY - Puppet run on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:35:13] RECOVERY - Puppet run on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:35:35] RECOVERY - Puppet run on tools-webgrid-lighttpd-1204 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:36:07] RECOVERY - Puppet run on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:36:09] RECOVERY - Puppet run on tools-exec-1419 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:36:40] (CR) Paladox: [C: -1] "Probably also want to when this is merged bring over the password for nick serv so the bot can get its cloak." [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm)
[10:37:02] RECOVERY - Puppet run on tools-exec-1415 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:37:18] RECOVERY - Puppet run on tools-worker-1015 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:37:18] RECOVERY - Puppet run on tools-worker-1019 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:37:22] RECOVERY - Puppet run on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:37:30] RECOVERY - Puppet run on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:37:40] RECOVERY - Puppet run on tools-worker-1022 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:37:54] RECOVERY - Puppet run on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:38:12] RECOVERY - Puppet run on tools-exec-1413 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:38:20] RECOVERY - Puppet run on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:38:30] RECOVERY - Puppet run on tools-static-10 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:38:42] RECOVERY - Puppet run on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:39:02] RECOVERY - Puppet run on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:39:16] RECOVERY - Puppet run on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:39:21] (CR) Legoktm: [] "I intentionally didn't implement those commands - wikibugs will automatically take over the nick when available, and restarts will be requ" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm)
[10:39:22] RECOVERY - Puppet run on tools-docker-builder-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:40:59] RECOVERY - Puppet run on tools-worker-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:42:39] RECOVERY - Puppet run on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:42:53] RECOVERY - Puppet run on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:43:29] RECOVERY - Puppet run on tools-flannel-etcd-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:43:36] (CR) Paladox: [] "Oh why carnt we implement the nick command? Why does wikibugs take over the same nick?" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm)
[10:43:47] RECOVERY - Puppet run on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:43:49] RECOVERY - Puppet run on tools-exec-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:44:40] (CR) Legoktm: [] "Just to be clear, the Gerrit events are now going to be output by "wikibugs". grrrit-wm won't be used anymore. And if there's a wikibugs_," [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm)
[10:45:05] RECOVERY - Puppet run on tools-worker-1023 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:45:52] RECOVERY - Puppet run on tools-elastic-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:45:58] RECOVERY - Puppet run on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:46:08] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:47:32] RECOVERY - Puppet run on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:47:34] RECOVERY - Puppet run on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:51:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:52:15] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:08:42] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:57:41] PROBLEM - Host tools-k8s-master-alextest is DOWN: CRITICAL - Host Unreachable (10.68.18.100)
[12:00:02] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[12:22:56] PROBLEM - Puppet run on tools-k8s-master-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[12:25:47] * akosiaris looking ^
[12:25:54] I am the cause of this anyway
[12:26:40] akosiaris: ah, are you playing with the tools k8s?
[12:26:47] υεσ
[12:26:48] yes
[12:26:52] actually failing
[12:27:01] my biggest mistake was running deploy-master
[12:27:03] akosiaris: heh, ok :)
[12:27:09] which truncated kube-apiserver
[12:27:14] I could use a bit of help
[12:27:22] what arguments does it take ?
[12:28:21] akosiaris: oh? what happened?
[12:28:28] akosiaris: usually if you run puppet it should bring it up
[12:28:33] akosiaris: is the k8s master down now?!
[12:28:45] yes
[12:28:52] -rwxr-xr-x 1 kubernetes staff 0 Dec 19 12:13 kube-apiserver
[12:29:05] aaah
[12:29:10] this is what happened ... after merging and checking everything was working fine
[12:29:11] that kinda truncate
[12:29:18] I ran deploy-master
[12:29:24] without reading it first :-(
[12:29:36] so it did wget -O /usr/local/bin/kube-apiserver truncating the binary :-(
[12:29:43] what were you trying to do?
[12:29:43] right
[12:29:51] because I'm pretty sure there's no build on the server rn
[12:29:58] argh
[12:30:06] so this doesn't exist
[12:30:15] so we got to build it first ?
[12:30:17] fuck
[12:30:22] I'll kick off a build
[12:30:30] akosiaris: what were you trying to do?
[12:31:01] undo a kube-scheduler version screwup
[12:31:08] which I did successfully
[12:31:26] thanks to puppet clientbucket in the end, not deploy-master
[12:31:40] but somewhere along my effort I managed to truncated the apiserver binary
[12:31:46] to get truncated*
[12:33:02] akosiaris: kicked off a build
[12:33:09] thanks.. how long ?
[12:33:14] 45-60 mins ?
[12:33:24] akosiaris: going to take a while.
[12:33:27] akosiaris: don't remember, it's been a while :) I skipped tests [12:33:32] ok [12:33:35] sorry [12:33:40] for messing up so badly [12:33:58] damn deploy-master was a mine just waiting to be stepped on [12:34:23] akosiaris: yup :) I was planning on not touching it at all until we move to the debs [12:35:10] akosiaris: so... what happened? were you pushing a change to the k8s module that caused issues? [12:35:18] yes [12:35:28] wrong puppet resource [12:35:42] https://gerrit.wikimedia.org/r/#/c/326429/4/modules/k8s/manifests/scheduler.pp [12:35:57] source and target in the symlink are vice versa [12:36:07] and was trying to fix that and run deploy-master :-( [12:36:19] :D [12:36:55] and even then if I had not reverted my changes [12:37:01] everything would have been ok [12:37:15] but that was the perfect trigger for the mine I stepped on right before [12:37:25] well, better now than in the future anyway [12:37:55] someone was bound to have it explode on him anyway, better me than anyone else [12:39:00] akosiaris: well, better me :P [12:39:22] akosiaris: but in general, do ping me before cherry-picking on so I can warn you of traps :) [12:39:44] yeah, I was trying to save you the trouble.. turns out that backfired [12:39:51] badly in fact [12:40:02] RECOVERY - Puppet run on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [12:40:59] akosiaris: hmm, the packages are for 1.4 right? [12:41:03] akosiaris: not 1.5? [12:41:06] yes 1.4 [12:41:17] akosiaris: right, so we've to do 1.3 -> 1.4 first before doing 1.4 -> 1.5 [12:45:13] well, at least pods are still running [12:45:48] what happens to any that crash btw ? does kubelet require the API to be live to restart them ? 
[12:46:30] akosiaris: nope, for individual crashes it'll be fine [12:46:34] akosiaris: new actions will fail [12:46:44] akosiaris: and if any nodes fail then they won't be recreated elsewhere [12:46:56] akosiaris: so this isn't really an emergency [12:47:54] akosiaris: ofc build failed because it's out of disk space [12:49:35] PROBLEM - Free space - all mounts on tools-docker-builder-03 is CRITICAL: CRITICAL: tools.tools-docker-builder-03.diskspace.root.byte_percentfree (<44.44%) [12:49:51] akosiaris: cleaned up and trying again now [12:52:02] I 've tried finding a copy of kube-apiserver in puppet's clientbucket but the version I found was v1.2.0-alpha.3.7+9495c676ea9175-dirty [12:52:05] :-( [12:52:15] akosiaris: right, that's from a long time ago :) [12:52:55] RECOVERY - Puppet run on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0] [12:52:59] yeah, just trying to think of alternatives to waiting for the build to finish [12:53:01] !log tools cleaned out pbuilder from tools-docker-builder-01 to clean up [12:53:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [12:53:50] I got rid of the pbuilder cache [12:53:54] which should give us enough space [12:59:36] RECOVERY - Free space - all mounts on tools-docker-builder-03 is OK: OK: All targets OK [13:02:19] akosiaris: hmm, failed again [13:02:36] Error response from daemon: open /var/lib/docker/devicemapper/mnt/65940bcac9154d1577dcd19ed09d098af33b9d51390417062d5bfd0088e41d39/rootfs/bin/[: no such file or directory [13:03:27] akosiaris: nvm, it passed 'enough' to get us our binaries [13:03:35] akosiaris: am going to run deploy-master now [13:04:52] :-) [13:05:54] btw the above error was in my cause because of disk space failures. In my case it was a bit more mysterious.
The "meta" volume of the LVM thinpool was running out of disk space which was intriguing to figure out [13:06:19] !log tools run /usr/local/bin/deploy-master http://tools-docker-builder-03.tools.eqiad.wmflabs v1.3.3wmf1 on tools-k8s-master-01 [13:06:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:06:49] akosiaris: right. long term, I really want to use overlayfs [13:08:00] akosiaris: forcing a puppet run now, my build didn't get as far as kubelet [13:08:01] overlayfs has that beautiful rename (2) limitation [13:08:19] and the open limitation [13:08:23] https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/ [13:08:39] which means applications need some auditing that they don't do either of the 2 [13:08:51] but there's always the chance somebody shows up and does it [13:09:24] ah!! it runs :-) [13:09:30] yay! [13:09:36] yuvipanda: thanks! [13:10:01] sorry for messing so badly [13:10:10] akosiaris: uhm, it can't find any nodes or pods [13:10:11] this is concerning [13:10:21] hmm [13:10:45] akosiaris: is your patch still cherry-picked? [13:10:51] no, reverted [13:10:55] all 3 of them [13:10:57] akosiaris: hmm, ok [13:11:18] isn't all this info in etcd ? [13:12:05] yes [13:12:40] ooookay... 
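The mine akosiaris stepped on was `deploy-master` running `wget -O /usr/local/bin/kube-apiserver` straight onto the live path. GNU wget treats `-O file` like shell redirection, so the file is truncated immediately, and when no build existed on the server the result was a 0-byte `kube-apiserver`. A minimal sketch of a safer pattern follows, in Python and assuming nothing about how `deploy-master` is actually written: fetch first, refuse an empty result, write to a temporary file in the same directory, then swap it in with an atomic rename so the old binary survives any failure. The helper names `install_binary` and `find_empty_executables` are hypothetical, not part of any Wikimedia tooling.

```python
import os
import stat
import tempfile
import urllib.request
from typing import Callable, List, Optional


def install_binary(url: str, dest: str,
                   fetch: Optional[Callable[[str], bytes]] = None) -> None:
    """Install the file at `url` as `dest` without ever leaving `dest` truncated.

    `fetch` is injectable (hypothetical hook) so the logic can be tested offline.
    """
    if fetch is None:
        def fetch(u: str) -> bytes:
            # urlopen raises HTTPError on 404 etc., before dest is touched
            with urllib.request.urlopen(u) as resp:
                return resp.read()
    data = fetch(url)
    if not data:
        raise ValueError(f"refusing to install empty download from {url}")
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Same directory => same filesystem, so os.replace() is an atomic rename.
    fd, tmp = tempfile.mkstemp(dir=dest_dir, prefix=".install-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, 0o755)
        os.replace(tmp, dest)  # old binary stays intact until this point
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything went wrong
        raise


def find_empty_executables(directory: str) -> List[str]:
    """Return sorted names of regular files in `directory` that are executable but 0 bytes."""
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    empty = []
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            st = entry.stat(follow_symlinks=False)
            if st.st_size == 0 and st.st_mode & exec_bits:
                empty.append(entry.name)
    return sorted(empty)
```

Running something like `find_empty_executables("/usr/local/bin")` after a deploy would have flagged both the truncated `kube-apiserver` and the 0-byte `kubectl`, which, as noted in the log, still "executed" without an obvious error.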
[13:13:58] kube-apiserver's journal seems fine to me up to now btw [13:14:48] the controller manager is complaining about grrrit "Warning' reason: 'FailedCreate' Error creating: Attempt to use docker image not in approved registry" [13:14:51] not sure if it's related [13:16:41] yuvipanda: kubectl on k8s-master-01 is 0 bytes btw [13:17:02] akosiaris: haha, yes, I just found that [13:17:15] akosiaris: my laptop decided now is a great time to freeze (been happening frequently now) [13:17:22] akosiaris: yeah, kubectl on tools-login is fine [13:17:22] it the exact same mine I stepped on right before [13:17:26] cool [13:17:32] ok lemme copy it from there then [13:17:41] akosiaris: yup [13:18:11] akosiaris: I didn't realize that execing a 0len binary would succeed [13:18:18] hehe [13:18:24] akosiaris: anyway, things seem to be sane now [13:18:32] (once the kubectl copy finishes) [13:18:59] akosiaris: am going to restart docker-builder-03 and try to get it to finish doing a build [13:19:04] it's a damn huge binary .. [13:19:09] akosiaris: yup. [13:21:04] done [13:21:12] copying via NFS and it took for ever... [13:21:21] akosiaris: :D ok [13:21:43] kubectl cluster-info [13:21:44] Kubernetes master is running at http://localhost:8080 [13:21:45] :-) [13:21:46] ok [13:21:49] thanks for helping on this [13:22:11] akosiaris: np! sorry about leaving unexploded mines around :) [13:22:41] I have now tested https://gerrit.wikimedia.org/r/#/c/326429 https://gerrit.wikimedia.org/r/#/c/326430 and https://gerrit.wikimedia.org/r/#/c/326441 [13:22:46] take a look if you want [13:23:02] should be in the exact same spirit as the previous ones [13:24:17] akosiaris: is the symlink the right way around? [13:24:36] oh, I see. so deploy-master would put things in /usr/bin [13:24:46] ? 
[13:24:53] since I think right now it puts things in /usr/local/bin [13:25:22] PROBLEM - Puppet run on tools-docker-builder-03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [13:26:45] yeah I uploaded the correct symlink patch and have fixed manually the mistake [13:26:55] akosiaris: ah, ok. I'll take a look in a bit [13:26:56] patch not applied obviously yet [14:00:51] (CR) Paladox: [] "Could we fix wikibugs keeps creating new threads (ie creates two usernames like wikibugs and wikibugs_) first please before doing this, ot" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [14:18:37] Labs, Labs-Infrastructure, DBA: Create a cronjob/check to run check_private_data data script and report back - https://phabricator.wikimedia.org/T153680#2887002 (Marostegui) [14:49:54] Labs, Labs-Infrastructure, DBA: Create a cronjob/check to run check_private_data data script and report back - https://phabricator.wikimedia.org/T153680#2887047 (jcrespo) [15:57:45] hi, when ssh warns about a change in ECDSA host key for wmflabs instance, should I be paranoid and double check somewhere what the key should be? [15:59:27] sorry, ignore this ^ pebkac [16:01:56] dcausse: :) [16:08:13] chasemp: ... instead of being paranoid I should stop being stupid :) [16:28:44] Labs: Request creation of twl-staging labs project - https://phabricator.wikimedia.org/T153549#2883959 (Andrew) You already have a project, right? Can staging be accomplished on an additional instance (or instances) within your existing project? [16:30:10] Hey bd808 are you busy [16:30:54] Zppix: a bit. having a local network crisis at home. router won't route [16:31:07] Ok nevermind i can wait [16:41:55] Labs: Announce sunsetting of Precise on Labs as of March 31st - https://phabricator.wikimedia.org/T153686#2887247 (Andrew) [16:45:38] Rip precise [16:59:34] (CR) Legoktm: [] "It won't send duplicate messages."
[labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [17:20:06] hi valhallasw`cloud :) [17:21:01] hi legoktm [17:21:23] valhallasw`cloud: would you like to review https://gerrit.wikimedia.org/r/328147 ? [17:23:02] (CR) Paladox: [] "Oh but I saw wikibugs doing it today, so if wikibugs does it wont grrrrit do it?" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [17:23:30] (CR) Legoktm: [] "Can you pastebin a log of wikibugs duplicating messages? I haven't seen any." [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [17:24:53] (CR) Paladox: [] "https://phabricator.wikimedia.org/P4651" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [17:25:28] (CR) Paladox: [] "That's from #wikimedia-releng" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [17:26:19] (CR) Paladox: [] "16:11:28 and 16:12:22pm utc +0" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [17:27:32] (CR) Legoktm: [] "If you look at the log carefully, you'll see that the links are different. hashar edited the task twice quickly, so the two messages are c" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [17:28:51] (CR) Paladox: [] "Oh, could we fix it so that it dosent create two wikibugs so that it sends the message from one username instead of splitting one message " [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [17:36:13] (CR) Legoktm: [] "No, that's a feature. It adds additional flood control." [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [17:38:14] (CR) Paladox: [] "Oh ok."
[labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [18:00:37] Labs, Wikispeech: Request creation of Wikispeech labs project - https://phabricator.wikimedia.org/T153430#2887634 (Andrew) Open>Resolved a:Andrew @Lokal_Profil, this project has been created. You are currently the only member; as a projectadmin you can create new VMs and add other members or... [18:00:41] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2887638 (Andrew) [18:02:30] Labs, Wikispeech: Request creation of Wikispeech labs project - https://phabricator.wikimedia.org/T153430#2887641 (Andrew) by the way, your username is actually lokal-profil :) [18:03:03] (CR) Merlijn van Deen: [C: -1] "Cool. Some minor comments inline." (4 comments) [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/328147 (owner: Legoktm) [18:07:55] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2887656 (Andrew) [18:07:57] Labs: Request creation of hound labs project - https://phabricator.wikimedia.org/T148573#2887653 (Andrew) stalled>Resolved a:Andrew There has now been a huge amount of discussion about the weird React license. Everyone (including the OSI people) are mostly in agreement that we hate the special p... [18:14:17] Labs, Puppet: Retire and remove module labs_debrepo - https://phabricator.wikimedia.org/T153612#2885673 (Multichill) How exactly is this related to T153439 Tim? A bit more info than one line would be nice. [19:02:21] Labs: Increase resource quota for dwl - https://phabricator.wikimedia.org/T152456#2887806 (Giftpflanze) We need more RAM for data processing. We are not aware of other possibilities. Afais, "bigram" has even more RAM, so we'd like to enough room for 1 small and 1 bigram instance.
[19:13:08] (CR) Merlijn van Deen: [C: -1] Combine account settings screens (1 comment) [labs/striker] - https://gerrit.wikimedia.org/r/328056 (owner: BryanDavis) [19:13:10] (CR) Merlijn van Deen: [C: -1] Add FontAwesome css, fonts, and templatetag helper (2 comments) [labs/striker] - https://gerrit.wikimedia.org/r/328057 (owner: BryanDavis) [19:13:11] (CR) BryanDavis: [] Combine account settings screens (1 comment) [labs/striker] - https://gerrit.wikimedia.org/r/328056 (owner: BryanDavis) [19:16:46] (CR) Merlijn van Deen: [C: -1] Display existing SSH -keys (5 comments) [labs/striker] - https://gerrit.wikimedia.org/r/328058 (https://phabricator.wikimedia.org/T144711) (owner: BryanDavis) [19:20:30] (CR) BryanDavis: [] Add FontAwesome css, fonts, and templatetag helper (2 comments) [labs/striker] - https://gerrit.wikimedia.org/r/328057 (owner: BryanDavis) [19:26:50] (CR) Merlijn van Deen: [C: -1] Check and enforce OATH account protection (4 comments) [labs/striker] - https://gerrit.wikimedia.org/r/327786 (https://phabricator.wikimedia.org/T144712) (owner: BryanDavis) [19:27:53] (CR) Merlijn van Deen: [C: -1] Add FontAwesome css, fonts, and templatetag helper (1 comment) [labs/striker] - https://gerrit.wikimedia.org/r/328057 (owner: BryanDavis) [19:28:55] (CR) BryanDavis: [] Display existing SSH -keys (4 comments) [labs/striker] - https://gerrit.wikimedia.org/r/328058 (https://phabricator.wikimedia.org/T144711) (owner: BryanDavis) [19:31:41] (CR) Merlijn van Deen: [C: -1] Allow deleting SSH keys (2 comments) [labs/striker] - https://gerrit.wikimedia.org/r/328117 (https://phabricator.wikimedia.org/T144711) (owner: BryanDavis) [19:36:38] (CR) Merlijn van Deen: [C: -1] Display existing SSH -keys (3 comments) [labs/striker] - https://gerrit.wikimedia.org/r/328058 (https://phabricator.wikimedia.org/T144711) (owner: BryanDavis) [19:36:54] (CR) BryanDavis: [] 
Allow deleting SSH keys (2 comments) [labs/striker] - https://gerrit.wikimedia.org/r/328117 (https://phabricator.wikimedia.org/T144711) (owner: BryanDavis) [19:39:04] (CR) BryanDavis: [C: -1] Display existing SSH -keys (1 comment) [labs/striker] - https://gerrit.wikimedia.org/r/328058 (https://phabricator.wikimedia.org/T144711) (owner: BryanDavis) [19:56:35] * halfak curses NFS [20:01:44] NFS on tools is super slow [20:01:55] *tools-bastion-03 [20:01:59] *Home directory NFS [20:02:31] $ time ls [20:02:33] real 0m7.954s [20:07:41] !log tools killed gps_exif_bot2.py (tools.gpsexif), was using 50MB/s io, lagging all of tools-bastion-03 [20:07:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:08:56] again? frack. yuvipanda killed the same thing on 12-17 -- https://tools.wmflabs.org/sal/tools.gpsexif [20:09:31] !log tools.gpsexif Second time we killed gps_exif_bot2.py (tools.gpsexif), was using 50MB/s io, lagging all of tools-bastion-03 [20:09:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gpsexif/SAL [20:17:40] bd808: if its not to much to ask whats tools.zppixbot cpu usage? [20:20:23] Zppix: that question is impossible to answer [20:20:44] Correction i meant MB [20:20:52] MB of what? [20:21:06] Io [20:22:20] Zppix: as far as I can see, there are no zppixbot processes running? [20:22:57] Try by the name of sopel its orgin is tools.zppixbot [20:24:03] Zppix: I'm not going to play hide and seek with processes. [20:24:19] there are no tools.zppixbot jobs on the grid, and nothing running on tools-bastion-03 [20:24:39] Im a dumbass i forgot i moved it to kubectl [20:25:42] in that case, use whatever k8s gives you in terms of resource management.
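For the gps_exif_bot2.py case, one way to find which process is hammering IO on a Linux host such as tools-bastion-03 is to sample the cumulative counters in `/proc/<pid>/io` twice and divide the difference by the interval. This is a sketch of that technique, not the method actually used above. It assumes the `read_bytes`/`write_bytes` fields (bytes the kernel attributes to storage IO); swapping in `rchar`/`wchar` would count all `read()`/`write()` traffic instead. Reading another user's `/proc/<pid>/io` normally needs root, so unreadable processes are skipped. The function names are hypothetical.

```python
import os
import time
from typing import Dict, List, Tuple


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value.strip())
    return counters


def storage_bytes(counters: Dict[str, int]) -> int:
    """Cumulative read_bytes + write_bytes for one process (0 if absent)."""
    return counters.get("read_bytes", 0) + counters.get("write_bytes", 0)


def _snapshot() -> Dict[int, int]:
    """Map pid -> cumulative storage bytes for every readable process."""
    snap = {}
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/io") as f:
                snap[int(name)] = storage_bytes(parse_proc_io(f.read()))
        except OSError:
            continue  # permission denied, or the process already exited
    return snap


def top_io_processes(interval: float = 1.0, limit: int = 5) -> List[Tuple[int, float]]:
    """Return up to `limit` (pid, bytes_per_second) pairs, busiest first."""
    before = _snapshot()
    time.sleep(interval)
    after = _snapshot()
    rates = [(pid, (after[pid] - before[pid]) / interval)
             for pid in after if pid in before]
    rates.sort(key=lambda pair: pair[1], reverse=True)
    return rates[:limit]
```

A process sustaining about 50MB/s would show up as roughly `5e7` bytes per second in the returned list. Taking the difference between two samples matters because the counters are cumulative since process start.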
[20:31:42] Labs: Request increased quota for services-test labs project - https://phabricator.wikimedia.org/T153711#2888152 (Eevans) [20:51:11] !log deployment-prep upgrading to python-requests_2.12.3-1_all.deb ./python-urllib3_1.19.1-1_all.deb on deployment-mediawiki04 and deployment-tin [20:51:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL [20:57:02] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:20:09] !log deployment-prep upgrading to python-jsonschema_2.5.1-5~bpo8+1_all.deb on deployment-eventlogging03 [21:20:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL [21:21:00] !log deployment-prep and also python-functools32_3.2.3.2-3~bpo8+1_all.deb [21:21:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL [21:32:00] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:39:24] !log git created instance puppet-paladox so he can use puppet::self for testing his suggested pupppet changes [21:39:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL [21:40:22] Hi it seems that the gerrit reviewer bot is not working.
Hashar was added to https://gerrit.wikimedia.org/r/#/c/328238/ automatically, i had to add him manually [22:15:28] (CR) BryanDavis: [] Check and enforce OATH account protection (4 comments) [labs/striker] - https://gerrit.wikimedia.org/r/327786 (https://phabricator.wikimedia.org/T144712) (owner: BryanDavis) [22:24:58] (CR) BryanDavis: [] Add FontAwesome css, fonts, and templatetag helper (1 comment) [labs/striker] - https://gerrit.wikimedia.org/r/328057 (owner: BryanDavis) [22:35:39] Labs: Upgrade Labs to OpenStack Mitaka - https://phabricator.wikimedia.org/T145919#2888448 (Andrew) [22:35:39] Labs, Labs-Infrastructure, Patch-For-Review: keystone: Deprecated: Direct import of auth plugin 'keystone.auth.plugins.wmtotp.Wmtotp' is deprecated as of Liberty in favor of its entrypoint from 'keystone.auth.wmtotp' and may be removed in N - https://phabricator.wikimedia.org/T150773#2888446 (Andrew... [22:37:20] paladox: maybe make a ticket for that [22:38:32] mutante not sure what you mean [22:39:04] paladox: paladox> Hi it seems that the gerrit reviewer bot is not working.... [22:39:15] oh [22:39:17] ah [22:40:47] Labs, Labs-Infrastructure, Gerrit: Gerrit reviewer bot dosent seem to be working - https://phabricator.wikimedia.org/T153719#2888474 (Paladox) [22:43:00] Labs, Labs-Infrastructure, Gerrit: Gerrit reviewer bot dosent seem to be working - https://phabricator.wikimedia.org/T153719#2888490 (Paladox) [22:43:01] mutante ^^ [22:44:26] alrighty [22:45:57] ok [22:54:46] Labs, Labs-Infrastructure, Gerrit: Gerrit reviewer bot dosent seem to be working - https://phabricator.wikimedia.org/T153719#2888520 (greg) @valhallasw Kinda offtopic, but where should gerrit-reviewer-bot tasks be filed? This current list of projects doesn't make sense to me. I don't see one in Phab...
[23:03:32] Labs, Labs-Infrastructure, Gerrit: Gerrit reviewer bot dosent seem to be working - https://phabricator.wikimedia.org/T153719#2888548 (Legoktm) Issues should be filed at . [23:05:05] Labs, Labs-Infrastructure: Gerrit reviewer bot dosent seem to be working - https://phabricator.wikimedia.org/T153719#2888565 (greg) [23:07:32] Labs, Labs-Infrastructure: Gerrit reviewer bot dosent seem to be working - https://phabricator.wikimedia.org/T153719#2888574 (Paladox) Open>declined https://github.com/valhallasw/gerrit-reviewer-bot/issues/16 [23:10:00] Labs: Upgrade Labs to OpenStack Mitaka - https://phabricator.wikimedia.org/T145919#2888595 (Andrew) [23:10:03] Labs, Labs-Infrastructure: nova: clean up deprecated config options for Mitaka - https://phabricator.wikimedia.org/T150775#2888593 (Andrew) Open>Resolved a:Andrew [23:31:06] Hello. I'd like to know if it is possible to do an SQL query to the globalrenamequeue - I see there's a renameuser_status in centralauth_p but it's just for GlobalRenameProgress [23:31:17] I cannot find the table to query [23:31:20] logging? [23:41:42] TabbyCat: the table would be renameuser_queue in the centralauth db, but I bet it is not exposed on the labs replicas [23:42:11] I was looking at logging_logindex [23:42:44] bd808: do you have access to that table? I need to know the last action by Vriullop fo check for inactivity [23:43:08] whether approval or denial [23:43:53] yeah... give me a minute to get on the right machine [23:44:19] sure, no need to hurry [23:44:37] be sure to turn of the gas from the teapot before leaving ;) [23:47:19] TabbyCat: I think that 2015-03-24 was the last action there by Vriullop [23:47:30] okay, thanks [23:47:40] the removal for inactivity is correct then [23:48:38] We could probably get that table exposed on the replicas too.
I guess we might want to redact the reason column [23:49:10] You could open a phab task asking for access to the table if you are interested in pursuing [23:50:09] access for myself or for the general public? [23:50:47] I'd like to know the 'describe globalrename_queue' to know which could be shown [23:51:02] because searching in Special:GlobalRenameQueue is a PITA [23:54:03] bd808: can I have a Phab paste with the results of "describe renameuser_queue;"? [23:54:49] problem is that even the page is not avalaible for anyone, so maybe it makes no sense to have the table visible in replicas [23:57:03] not sure what is gblrename | promote in logging_logindex