[00:10:31] PROBLEM - Free space - all mounts on tools-exec-1206 is CRITICAL tools.tools-exec-1206.diskspace.root.byte_percentfree (<55.56%) [01:21:16] 10Tool-Labs-tools-Global-user-contributions: Global user contributions: Support sorting results chronologically - https://phabricator.wikimedia.org/T70358#1615371 (10Krinkle) [02:02:49] Coren, I need some help. [02:03:12] I'm trying to start some scripts on xTools but they just die without any errors. :/ [02:04:58] YuviPanda, ^ [04:23:37] PROBLEM - Puppet staleness on tools-exec-1218 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:24:47] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1204 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:25:07] PROBLEM - Puppet staleness on tools-webproxy-02 is CRITICAL 22.22% of data above the critical threshold [43200.0] [04:25:56] PROBLEM - Puppet staleness on tools-exec-1219 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:26:00] PROBLEM - Puppet staleness on tools-services-01 is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:26:02] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1402 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:26:48] PROBLEM - Puppet staleness on tools-exec-1404 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:27:06] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1407 is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:27:36] PROBLEM - Puppet staleness on tools-exec-1205 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:27:50] PROBLEM - Puppet staleness on tools-redis-02 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:27:58] PROBLEM - Puppet staleness on tools-exec-1405 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:28:48] PROBLEM - Puppet staleness on tools-exec-1204 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:29:10] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1208 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:29:14] PROBLEM - Puppet staleness on tools-exec-1203 is CRITICAL 55.56% of data above the critical threshold [43200.0] [04:29:44] PROBLEM - Puppet staleness on tools-webgrid-generic-1403 is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:29:46] PROBLEM - Puppet staleness on tools-master is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:30:16] PROBLEM - Puppet staleness on tools-exec-1408 is CRITICAL 55.56% of data above the critical threshold [43200.0] [04:30:24] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1207 is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:30:25] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1406 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:31:55] PROBLEM - Puppet staleness on tools-mailrelay-02 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:32:29] PROBLEM - Puppet staleness on tools-exec-1402 is CRITICAL 11.11% of data above the critical threshold [43200.0] [04:32:31] PROBLEM - Puppet staleness on tools-exec-1213 is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:32:59] PROBLEM - Puppet staleness on tools-exec-1211 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:33:31] PROBLEM - Puppet staleness on tools-webgrid-generic-1404 is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:35:09] PROBLEM - Puppet staleness on tools-exec-1202 is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:36:03] PROBLEM - Puppet staleness on tools-exec-1401 is CRITICAL 22.22% of data above the critical threshold [43200.0] [04:36:25] PROBLEM - Puppet staleness on tools-exec-1403 is CRITICAL 55.56% of data above the critical threshold [43200.0] [04:36:55] PROBLEM - Puppet staleness on tools-precise-dev is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:37:15] PROBLEM - Puppet staleness on tools-exec-1206 is CRITICAL 50.00% of data above the critical threshold [43200.0] [04:38:12] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1405 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:38:32] PROBLEM - Puppet staleness on tools-checker-02 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:39:24] PROBLEM - Puppet staleness on tools-services-02 is CRITICAL 22.22% of data above the critical threshold [43200.0] [04:40:18] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1206 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:40:40] PROBLEM - Puppet staleness on tools-exec-1407 is CRITICAL 50.00% of data above the critical threshold [43200.0] [04:41:50] PROBLEM - Puppet staleness on tools-bastion-01 is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:42:12] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1403 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:42:52] PROBLEM - Puppet staleness on tools-exec-1214 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:43:02] PROBLEM - Puppet staleness on tools-exec-1406 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:43:12] PROBLEM - Puppet staleness on tools-exec-1201 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:43:30] PROBLEM - Puppet staleness on tools-redis-01 is CRITICAL 55.56% of data above the critical threshold [43200.0] [04:44:01] PROBLEM - Puppet staleness on tools-exec-1217 is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:44:19] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1205 is CRITICAL 22.22% of data above the critical threshold [43200.0] [04:44:31] PROBLEM - Puppet staleness on tools-mail is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:44:41] PROBLEM - Puppet staleness on tools-exec-1409 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:44:41] PROBLEM - Puppet staleness on tools-submit is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:44:49] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1209 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:45:09] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1210 is CRITICAL 11.11% of data above the critical threshold [43200.0] [04:45:21] PROBLEM - Puppet staleness on tools-exec-1208 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:45:27] PROBLEM - Puppet staleness on tools-exec-gift is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:45:39] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1409 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:45:51] PROBLEM - Puppet staleness on tools-exec-1410 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:46:15] PROBLEM - Puppet staleness on tools-shadow is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:46:43] PROBLEM - Puppet staleness on tools-exec-1212 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:47:07] PROBLEM - Puppet staleness on tools-bastion-02 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:47:25] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1202 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:48:03] PROBLEM - Puppet staleness on tools-web-static-01 is CRITICAL 55.56% of data above the critical threshold [43200.0] [04:48:24] PROBLEM - Puppet staleness on tools-exec-1216 is CRITICAL 50.00% of data above the critical threshold [43200.0] [04:48:30] PROBLEM - Puppet staleness on tools-webgrid-generic-1402 is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:48:40] PROBLEM - Puppet staleness on tools-exec-cyberbot is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:48:44] PROBLEM - Puppet staleness on tools-webproxy-01 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:49:08] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1408 is CRITICAL 55.56% of data above the critical threshold [43200.0] [04:49:22] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1410 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:50:36] PROBLEM - Puppet staleness on tools-exec-1209 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:53:22] PROBLEM - Puppet staleness on tools-checker-01 is CRITICAL 50.00% of data above the critical threshold [43200.0] [04:53:54] PROBLEM - Puppet staleness on tools-web-static-02 is CRITICAL 50.00% of data above the critical threshold [43200.0] [04:54:46] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1201 is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:55:46] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1203 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:56:23] PROBLEM - Puppet staleness on tools-exec-1207 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:56:29] PROBLEM - Puppet staleness on tools-exec-1215 is CRITICAL 44.44% of data above the critical threshold [43200.0] [06:20:24] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1401 is CRITICAL 33.33% of data above the critical threshold [43200.0] [06:31:19] PROBLEM - Puppet staleness on tools-exec-1210 is CRITICAL 44.44% of data above the critical threshold [43200.0] [06:40:31] RECOVERY - Free space - all mounts on tools-exec-1206 is OK All targets OK [07:52:32] YuviPanda: ^ [07:52:36] your aptly broke all the things [07:52:46] valhallasw`cloud: oh [07:52:52] valhallasw`cloud: why at 7AM!? [07:53:14] I don't know :( [07:53:46] Cyberpower678: please file a bug [07:55:23] valhallasw`cloud: manual puppet runs work ok [07:55:28] I guess that means apt-get update is broken [07:55:30] yeah it's the apt-get update [07:55:45] which is why there are no puppet failure warnings [07:56:11] yup [07:56:39] also [07:56:40] W: Failed to fetch http://tools-services-01/repo/dists/precise-tools/main/binary-i386/Packages 404 Not Found [07:56:41] wut [07:56:43] i386?! [07:58:09] tools-login complains about http://tools-services-01/repo/dists/trusty-tools/main/source/Sources [07:58:13] which sounds saner [07:59:01] yeah [07:59:16] I think it's just we haven't actually published any packages [07:59:26] hm, maybe [08:00:01] did you do aptly repo create? [08:00:23] it seems so [08:01:27] shall I just add everything in /data/project/.system/etc? [08:02:39] valhallasw`cloud: ya. but I think the problem is elsewhere, moment [08:02:45] we don't actually have any source packages [08:02:51] oh! [08:02:53] right. [08:02:58] ok, ran sudo aptly repo add trusty-tools . for trusty [08:03:20] !log tools added all packages in data/project/.system/deb-trusty to aptly repo trusty-tools [08:03:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [08:04:00] valhallasw`cloud: cool! https://wikitech.wikimedia.org/wiki/Aptly#Adding_Packages you need to an update as well. [08:04:15] !log tools added all packages in data/project/.system/deb-precise to aptly repo precise-tools [08:04:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [08:04:29] why do we skip signing? signing is awesome! :-p [08:05:01] !log tools Publish for local repo ./trusty-tools [all, amd64] publishes {main: [trusty-tools]} has been successfully updated.
Publish for local repo ./precise-tools [all, amd64] publishes {main: [precise-tools]} has been successfully updated. [08:05:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [08:06:15] valhallasw`cloud: heh, should set it up [08:06:27] RECOVERY - Puppet staleness on tools-exec-1215 is OK Less than 1.00% above the threshold [3600.0] [08:08:27] valhallasw`cloud: ok, just the i386 error now [08:08:29] which makes me go [08:08:30] wtf [08:08:33] in that [08:08:38] I thought we only had amd64 [08:08:56] YuviPanda: no, apt-get update still fails on bastion-01 [08:09:07] W: Failed to fetch http://tools-services-01/repo/dists/trusty-tools/main/source/Sources 404 Not Found [08:09:10] valhallasw`cloud: did you do a puppet run first? [08:09:18] oh :-p [08:09:20] the puppet run gets rid of the deb-src line [08:09:38] but puppet won't run because of the failing apt! :-p [08:09:43] via cron, anyway [08:10:11] yeah [08:10:16] valhallasw`cloud: I have a script! [08:10:20] I can say 'salt' but fuck salt [08:10:25] :-p [08:10:41] https://github.com/yuvipanda/personal-wiki [08:10:42] tada [08:11:22] valhallasw`cloud: so does this mean that our precise instances are actually a different virtual arch than our trusty ones?! [08:11:44] no, don't think so. Linux tools-exec-1201 3.2.0-75-virtual #110-Ubuntu SMP Tue Dec 16 19:24:01 UTC 2014 x86_64 [08:12:03] then why is it looking for i386?! [08:12:43] multiarch stuff? [08:12:48] not sure [08:13:09] or maybe it's just a fallback when it can't find the x64 one? [08:13:13] valhallasw`cloud: so I've created only an amd64 repo [08:13:16] hm no [08:13:22] by default [08:13:38] well... this is where aptly makes a mess [08:14:20] "When repository is first published, list of architectures is stored in the database and can’t be changed.' [08:14:28] By default aptly would guess list of architectures from the contents of the snapshot or local repository being published. [08:15:25] valhallasw`cloud: ah. I can just change it in puppet and delete the repo, it'll get recreated [08:15:38] yeah, but we don't want i386 :P [08:15:38] valhallasw`cloud: see -operations [08:21:00] RECOVERY - Puppet staleness on tools-services-01 is OK Less than 1.00% above the threshold [3600.0] [08:21:50] RECOVERY - Puppet staleness on tools-bastion-01 is OK Less than 1.00% above the threshold [3600.0] [08:28:13] RECOVERY - Puppet staleness on tools-exec-1201 is OK Less than 1.00% above the threshold [3600.0] [08:32:19] 6Labs: Disable multiarch support in all Labs precise instances - https://phabricator.wikimedia.org/T111760#1615743 (10yuvipanda) 3NEW [08:44:24] PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 30.00% of data above the critical threshold [0.0] [08:53:49] RECOVERY - Puppet staleness on tools-exec-1204 is OK Less than 1.00% above the threshold [3600.0] [08:53:57] (03PS1) 10Jean-Frédéric: Extract method is_template_present_in_page [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236749 (https://phabricator.wikimedia.org/T111757) [08:54:24] (03CR) 10Jean-Frédéric: [C: 032] Extract method is_template_present_in_page [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236749 (https://phabricator.wikimedia.org/T111757) (owner: 10Jean-Frédéric) [08:55:15] (03Merged) 10jenkins-bot: Extract method is_template_present_in_page [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236749 (https://phabricator.wikimedia.org/T111757) (owner: 10Jean-Frédéric) [08:59:29] RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0] [09:19:03] 6Labs, 6operations, 10wikitech.wikimedia.org: Rename specific account in LDAP, Wikitech, Gerrit and Phabricator - https://phabricator.wikimedia.org/T85913#1615861 (10adrianheine) I'm back, and I'm happy to walk through the process with someone on IRC if that's necessary :) [09:19:33] (03PS1) 10Jean-Frédéric: Fix bug introduced in 50e3ce9d [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236752 [09:19:49] (03CR) 10Jean-Frédéric: [C: 032] Fix bug introduced in 50e3ce9d [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236752 (owner: 10Jean-Frédéric) [09:19:55] (03Merged) 10jenkins-bot: Fix bug introduced in 50e3ce9d [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236752 (owner: 10Jean-Frédéric) [09:27:47] 6Labs, 10Beta-Cluster, 10Labs-Infrastructure, 7Graphite, 7Shinken: Delete more specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T111540#1615880 (10fgiunchedi) agreed, what would be the easiest way to get a map of project -> list of instances? @yuvipanda @andrew ? [09:43:12] valhallasw`cloud: BTW did you add all the things on nfs to appropriate aptly repo? [09:43:26] YuviPanda: yes [09:43:36] precise to precise-tools, trysty to trusty-tools [09:43:43] YuviPanda: but we still need a preferences.d for aptly [09:44:05] Hmm? [09:44:22] so that aptly > apt.wm.o > ubuntu default [09:44:27] Can't we just get rid of the labsdebrepo role [09:44:34] Wouldn't ensure latest take care of that [09:44:53] I don't know [09:45:24] and in that case, I don't know if we want latest? :P [09:45:52] e.g. if we have some custom version in aptly, we don't want a newer version from ubuntu to supersede it [09:46:12] valhallasw`cloud: we already have ensure latest on exec environment and dev environ [09:46:30] But yeah a preference file seems good idea anyway [09:46:32] I know [09:46:54] * YuviPanda is still in a train [10:08:28] s51053 and s52261, killing your multi-hour queries because they are starting to affect the performance of labsdb1001 [10:10:09] valhallasw`cloud: doing your patches now! [10:16:02] valhallasw`cloud: done [10:44:10] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 66.67% of data above the critical threshold [0.0] [10:56:44] YuviPanda: oh dear. [10:56:52] what did I break this time [10:57:11] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: Package[python3-scipy] is already declared in file /etc/puppet/modules/toollabs/manifests/genpp/python_exec_trusty.pp:130; cannot redeclare at /etc/puppet/modules/toollabs/manifests/exec_environ.pp:360 on node tools-bastion-01.tools.eqiad.wmflabs [10:57:13] raah. [10:57:45] 6Labs, 10Beta-Cluster, 10Labs-Infrastructure, 7Graphite, 7Shinken: Delete more specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T111540#1616099 (10yuvipanda) eb3e3dbd81d263791d2ba1909f64f8a84531c65e for my revert of my original garbage collector script. It failed because t... [10:58:02] valhallasw`cloud: need to kill them from exec_environ I guess? [10:58:06] valhallasw`cloud: all the python modules [10:58:12] I thought I did [10:58:54] ah there it is [10:59:09] am going to go for food with the WMDE folks now :( [10:59:11] I'll cya in a bit [10:59:19] ok [11:01:17] this is also fixed with re quire_package thing of course... [11:03:49] I should add apt.wm.o and aptly as sources to genpp at some point [11:04:48] YuviPanda: https://gerrit.wikimedia.org/r/#/c/236762/ should fix it [12:05:31] valhallasw`cloud: merged [12:06:04] valhallasw`cloud: we should maybe move the list of packages into a segmented yaml file [12:15:20] what's the advantage of that? [12:15:49] or do you mean all packages? and then use some genpp magic to install the right ones? [12:17:23] valhallasw`cloud: all of them. then we can build docker containers for kubernetes that can have the right set of packages [12:17:29] achso [12:18:02] ....wouldn't that make the docker containers huge? [12:18:15] yeah so you can compose them [12:18:24] php packages separate from python ones etc [12:18:30] mmm [12:18:33] and of course, eventually narrow those down [12:18:40] force venv for almost all of the things [12:18:57] for python I would just not supply any default packages. venv all the things [12:19:01] yeah [12:19:06] the system-wide packages are just for one-off scripts I'd say [12:19:07] there is autobuilders [12:19:08] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0] [12:19:11] \o/ [12:20:20] valhallasw`cloud: https://hub.docker.com/_/python/ see the onbuild one [12:20:30] valhallasw`cloud: those autobuild by setting up requirements.txt [12:20:50] YuviPanda: we should prebuild wheels though [12:20:58] that would make everyones life so much easier [12:21:05] not entirely sure how to do that though [12:21:16] because debian thinks wheels are bad, mkay [12:25:27] YuviPanda: could you force the puppet run on all the hosts? thanks :-) [12:25:41] valhallasw`cloud: yeah, let me do that [12:36:42] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1405 is CRITICAL 20.00% of data above the critical threshold [0.0] [12:37:16] bah [12:37:52] valhallasw`cloud: am forcing runs now [12:37:59] but something is failing? [12:38:30] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 30.00% of data above the critical threshold [0.0] [12:38:31] I think it might've been a pssh timeout [12:38:36] ok or not [12:38:46] but that's probably it - killed the run halfway through [12:39:16] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL 33.33% of data above the critical threshold [0.0] [12:39:52] yeah, I think that's it [12:39:56] PROBLEM - Puppet failure on tools-exec-1217 is CRITICAL 40.00% of data above the critical threshold [0.0] [12:40:06] sigh [12:42:18] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 75.00% of data above the critical threshold [0.0] [12:42:32] PROBLEM - Puppet failure on tools-exec-1218 is CRITICAL 66.67% of data above the critical threshold [0.0] [12:42:42] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 60.00% of data above the critical threshold [0.0] [12:43:33] RECOVERY - Puppet staleness on tools-redis-01 is OK Less than 1.00% above the threshold [3600.0] [12:43:47] PROBLEM - Puppet failure on tools-web-static-01 is CRITICAL 30.00% of data above the critical threshold [0.0] [12:44:11] PROBLEM - Puppet failure on tools-redis-02 is CRITICAL 33.33% of data above the critical threshold [0.0] [12:45:29] PROBLEM - Puppet failure on tools-mailrelay-02 is CRITICAL 30.00% of data above the critical threshold [0.0] [12:45:49] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [3600.0] [12:45:51] PROBLEM - Puppet failure on tools-web-static-02 is CRITICAL 20.00% of data above the critical threshold [0.0] [12:45:58] * YuviPanda hates everything now [12:46:03] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [3600.0] [12:48:09] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1405 is OK Less than 1.00% above the threshold [3600.0] [12:48:21] RECOVERY - Puppet staleness on tools-exec-1216 is OK Less than 1.00% above the threshold [3600.0] [12:48:35] RECOVERY - Puppet staleness on tools-exec-1218 is OK Less than 1.00% above the threshold [3600.0] [12:48:59] RECOVERY - Puppet staleness on tools-exec-1217 is OK Less than 1.00% above the threshold [3600.0] [12:51:52] RECOVERY - Puppet staleness on tools-mailrelay-02 is OK Less than 1.00% above the threshold [3600.0] [12:52:50] RECOVERY - Puppet staleness on tools-redis-02 is OK Less than 1.00% above the threshold [3600.0] [12:53:04] RECOVERY - Puppet staleness on tools-web-static-01 is OK Less than 1.00% above the threshold [3600.0] [12:53:22] RECOVERY - Puppet staleness on tools-checker-01 is OK Less than 1.00% above the threshold [3600.0] [12:53:48] RECOVERY - Puppet failure on tools-web-static-01 is OK Less than 1.00% above the threshold [0.0] [12:53:52] RECOVERY - Puppet staleness on tools-web-static-02 is OK Less than 1.00% above the threshold [3600.0] [12:54:14] RECOVERY - Puppet failure on tools-redis-02 is OK Less than 1.00% above the threshold [0.0] [12:54:54] RECOVERY - Puppet failure on tools-exec-1217 is OK Less than 1.00% above the threshold [0.0] [12:55:08] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold [3600.0] [12:55:30] RECOVERY - Puppet failure on tools-mailrelay-02 is OK Less than 1.00% above the threshold [0.0] [12:55:48] RECOVERY - Puppet failure on tools-web-static-02 is OK Less than 1.00% above the threshold [0.0] [12:56:16] valhallasw`cloud: ^ is doing ok now [12:56:17] the pssh [12:56:22] <3 [12:56:43] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1405 is OK Less than 1.00% above the threshold [0.0] [12:56:43] valhallasw`cloud: now as long as I Don't lose network :) [12:56:55] RECOVERY - Puppet staleness on tools-precise-dev is OK Less than 1.00% above the threshold [3600.0] [12:57:41] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [12:58:29] RECOVERY - Puppet staleness on tools-webgrid-generic-1404 is OK Less than 1.00% above the threshold [3600.0] [12:59:11] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [3600.0] [12:59:43] RECOVERY - Puppet staleness on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [3600.0] [12:59:47] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1201 is OK Less than 1.00% above the threshold [3600.0] [12:59:51] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1209 is OK Less than 1.00% above the threshold [3600.0] [13:00:13] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [3600.0] [13:00:23] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [3600.0] [13:02:25] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1202 is OK Less than 1.00% above the threshold [3600.0] [13:02:34] RECOVERY - Puppet failure on tools-exec-1218 is OK Less than 1.00% above the threshold [0.0] [13:03:28] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [0.0] [13:03:30] RECOVERY - Puppet staleness on tools-webgrid-generic-1402 is OK Less than 1.00% above the threshold [3600.0] [13:04:20] RECOVERY - Puppet failure on tools-exec-1215 is OK Less than 1.00% above the threshold [0.0] [13:04:20] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1205 is OK Less than 1.00% above the threshold [3600.0] [13:04:22] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [3600.0] [13:04:50] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1204 is OK Less than 1.00% above the threshold [3600.0] [13:05:36] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1409 is OK Less than 1.00% above the threshold [3600.0] [13:05:59] valhallasw`cloud: it's taking a while becaus it's also installing all the py3 packages [13:06:05] yeah [13:06:07] that's fine [13:07:06] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [3600.0] [13:07:14] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0] [13:07:16] YuviPanda: https://cdn.rawgit.com/wikimedia/operations-puppet/production/modules/toollabs/manifests/genpp/report-python.html \o/ [13:07:31] valhallasw`cloud: nice [13:07:36] we should expand that to all packages :D [13:07:39] not complete, though, and partially wrong [13:07:50] e.g. python-requests on precise is 2.0 from apt.wm.o [13:07:51] 6Labs, 10Beta-Cluster, 10Labs-Infrastructure, 7Graphite, 7Shinken: Delete more specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T111540#1616217 (10fgiunchedi) a:3fgiunchedi I'll take this, setting to low [13:07:58] 6Labs, 10Beta-Cluster, 10Labs-Infrastructure, 7Graphite, 7Shinken: Delete more specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T111540#1616219 (10fgiunchedi) p:5Triage>3Low [13:08:30] RECOVERY - Puppet staleness on tools-checker-02 is OK Less than 1.00% above the threshold [3600.0] [13:08:44] YuviPanda: also, can we make apt.tools.wmflabs.org link to the tools apt repo? or is that a bad idea? [13:08:57] I can get the package list in another way, I suppose [13:09:10] hm, I'm wondering if apt.wm.o has the package list in the format I need at all... [13:09:11] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1408 is OK Less than 1.00% above the threshold [3600.0] [13:10:27] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1406 is OK Less than 1.00% above the threshold [3600.0] [13:10:27] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1401 is OK Less than 1.00% above the threshold [3600.0] [13:10:45] valhallasw`cloud: can't do sssl on *.tools.wmflabs.org [13:10:55] don't need ssl if we sign the packages :D [13:10:57] RECOVERY - Puppet staleness on tools-exec-1219 is OK Less than 1.00% above the threshold [3600.0] [13:11:13] valhallasw`cloud: still feels icky :P we want to be ssl only at some point [13:11:43] hrm, so I suppose I will have to parse packages.gz [13:12:09] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1403 is OK Less than 1.00% above the threshold [3600.0] [13:12:31] RECOVERY - Puppet staleness on tools-exec-1213 is OK Less than 1.00% above the threshold [3600.0] [13:12:39] 10MB gz. might be ok, actually [13:12:49] RECOVERY - Puppet staleness on tools-exec-1214 is OK Less than 1.00% above the threshold [3600.0] [13:13:03] still 10x more than the files I use now :( [13:15:21] RECOVERY - Puppet staleness on tools-exec-1408 is OK Less than 1.00% above the threshold [3600.0] [13:15:47] RECOVERY - Puppet staleness on tools-exec-1410 is OK Less than 1.00% above the threshold [3600.0] [13:16:46] RECOVERY - Puppet staleness on tools-exec-1212 is OK Less than 1.00% above the threshold [3600.0] [13:17:58] RECOVERY - Puppet staleness on tools-exec-1405 is OK Less than 1.00% above the threshold [3600.0] [13:18:00] RECOVERY - Puppet staleness on tools-exec-1211 is OK Less than 1.00% above the threshold [3600.0] [13:18:04] RECOVERY - Puppet staleness on tools-exec-1406 is OK Less than 1.00% above the threshold [3600.0] [13:19:38] RECOVERY - Puppet staleness on tools-exec-1409 is OK Less than 1.00% above the threshold [3600.0] [13:20:40] RECOVERY - Puppet staleness on tools-exec-1407 is OK Less than 1.00% above the threshold [3600.0] [13:21:04] RECOVERY - Puppet staleness on tools-exec-1401 is OK Less than 1.00% above the threshold [3600.0] [13:21:24] RECOVERY - Puppet staleness on tools-exec-1403 is OK Less than 1.00% above the threshold [3600.0] [13:21:44] RECOVERY - Puppet staleness on tools-exec-1404 is OK Less than 1.00% above the threshold [3600.0] [13:22:30] RECOVERY - Puppet staleness on tools-exec-1402 is OK Less than 1.00% above the threshold [3600.0] [13:24:17] RECOVERY - Puppet staleness on tools-exec-1203 is OK Less than 1.00% above the threshold [3600.0] [13:25:07] RECOVERY - Puppet staleness on tools-exec-1202 is OK Less than 1.00% above the threshold [3600.0] [13:25:23] RECOVERY - Puppet staleness on tools-exec-1208 is OK Less than 1.00% above the threshold [3600.0] [13:25:39] RECOVERY - Puppet staleness on tools-exec-1209 is OK Less than 1.00% above the threshold [3600.0] [13:26:23] RECOVERY - Puppet staleness on tools-exec-1210 is OK Less than 1.00% above the threshold [3600.0] [13:26:23] RECOVERY - Puppet staleness on tools-exec-1207 is OK Less than 1.00% above the threshold [3600.0] [13:27:07] RECOVERY - Puppet staleness on tools-bastion-02 is OK Less than 1.00% above the threshold [3600.0] [13:27:19] RECOVERY - Puppet staleness on tools-exec-1206 is OK Less than 1.00% above the threshold [3600.0] [13:27:33] RECOVERY - Puppet staleness on tools-exec-1205 is OK Less than 1.00% above the threshold [3600.0] [13:28:47] RECOVERY - Puppet staleness on tools-webproxy-01 is OK Less than 1.00% above the threshold [3600.0] [13:29:25] RECOVERY - Puppet staleness on tools-services-02 is OK Less than 1.00% above the threshold [3600.0] [13:29:31] RECOVERY - Puppet staleness on tools-mail is OK Less than 1.00% above the threshold [3600.0] [13:29:36] RECOVERY - Puppet staleness on tools-submit is OK Less than 1.00% above the threshold [3600.0] [13:29:48] RECOVERY - Puppet staleness on tools-master is OK Less than 1.00% above the threshold [3600.0] [13:30:10] RECOVERY - Puppet staleness on tools-webproxy-02 is OK Less than 1.00% above the threshold [3600.0] [13:30:26] RECOVERY - Puppet staleness on tools-exec-gift is OK Less than 1.00% above the threshold [3600.0] [13:31:14] RECOVERY - Puppet staleness on tools-shadow is OK Less than 1.00% above the threshold [3600.0] [13:33:40] RECOVERY - Puppet staleness on tools-exec-cyberbot is OK Less than 1.00% above the threshold [3600.0] [13:35:12] 6Labs, 10Tool-Labs, 10pywikibot-core, 5Patch-For-Review: Install python-enum34 on toollabs - https://phabricator.wikimedia.org/T111602#1616257 (10jayvdb) So it is available on trusty now, but not precise. This should be fairly easy to package for precise. Is there a guide for how to get a package into the... [13:35:23] 6Labs, 10Tool-Labs, 10pywikibot-core: Install python-enum34 on toollabs - https://phabricator.wikimedia.org/T111602#1616258 (10jayvdb) [13:40:48] 6Labs, 10Tool-Labs, 10pywikibot-core: Install python-enum34 on toollabs - https://phabricator.wikimedia.org/T111602#1616261 (10yuvipanda) I don't think we should maintain any new python packages that aren't already being maintained upstream for that version of ubuntu. Why does it need to be in precise? [13:43:47] valhallasw`cloud, about what? [13:52:44] 6Labs, 10Tool-Labs, 10pywikibot-core: Install python-enum34 on toollabs - https://phabricator.wikimedia.org/T111602#1616285 (10valhallasw) Because many people are using the default settings for jsub, which means they run their stuff on precise hosts. [13:53:05] YuviPanda: I thought backports were relatively easy? [13:54:08] valhallasw`cloud: yes but I thought we decided to not do any of it ourselves? and backport updates when they happen... [13:55:19] oh, I interpreted it as 'we're not going to build any more python packages that are not already packaged' [13:55:32] but aiui, backporting is a oneliner (rather than fooling around with debuild for two hours) [13:55:47] and keeping up the updates. [13:56:58] python packages? updates? :-p [13:57:36] I also don't know what ubuntu's rules on backports are [13:58:28] 6Labs, 10Tool-Labs, 10pywikibot-core: Install python-enum34 on toollabs - https://phabricator.wikimedia.org/T111602#1616298 (10jayvdb) The latest patch adds a simple enum class so enum34 is optional. I would like to add an ImportWarning telling users to install enum34. That will just annoy toollab users who... [13:58:39] 10Quarry: Time limit on quarry queries - https://phabricator.wikimedia.org/T111779#1616299 (10Jarekt) 3NEW [14:00:13] 10Quarry: Time limit on quarry queries - https://phabricator.wikimedia.org/T111779#1616307 (10yuvipanda) Those aren't actually running for months, I think - those happen when for some reason there's an unhandled exception in the query results serializer, preventing it from updating the status accordingly... Nee... [14:02:19] valhallasw`cloud: oh, did you build a 'tools-packages' instance? [14:02:24] YuviPanda: no? [14:02:38] valhallasw`cloud: > tools-packages [14:02:42] I wonder who that is from [14:03:58] image id: (missing) [14:07:40] YuviPanda: scfc tried to log in aug 31, but otherwise I don't see any logins [14:07:43] apart from you and I today [14:08:04] valhallasw`cloud: so I suppose that's his [14:28:56] fixing up salt version on all labs instances that are either behind (2014.1.11) or ahead (2015.5.1), one instance at a time, interruption to salt services should be negligible [14:29:47] apergos: ty apergos [14:30:20] yw (it's all automated now) [14:30:30] after that I can check salt process count and fix those up [14:30:44] then I can see if any are still stuck on the authentication error and fix those up [14:30:51] (all automated as well) [14:31:20] lemme put this on a ticket now that I have the script going [14:34:00] 6Labs, 10Salt: clean up old ec2id-based salt keys on labs - https://phabricator.wikimedia.org/T103089#1616387 (10ArielGlenn) still a few more: language-dev.language.eqiad.wmflabs i-00000585.language.eqiad.wmflabs last puppet run: Sep 7 09:04 marathon-master-01.marathon.eqiad.wmflabs i-00000c08.eqiad.wmflabs... [14:46:05] 6Labs, 10Labs-Infrastructure, 3Labs-sprint-112, 3labs-sprint-113: Update Labs to OpenStack Kilo - https://phabricator.wikimedia.org/T110045#1616457 (10Andrew) [14:46:36] 6Labs, 3Labs-Sprint-108, 3Labs-Sprint-109, 3labs-sprint-113: Have catchpoint checks for all labs services (Tracking) - https://phabricator.wikimedia.org/T107058#1616459 (10Andrew) [14:47:38] 6Labs, 10wikitech.wikimedia.org, 3Labs-sprint-112, 5Patch-For-Review, and 2 others: Can't list instances on Special:NovaInstance - https://phabricator.wikimedia.org/T110629#1616462 (10Andrew) 5Open>3Resolved I'm pretty sure the cause of the problem was my bad cache-refresh code, which is now reverted. [14:47:38] 6Labs, 10Tool-Labs, 5Patch-For-Review: Labs_lvm::Volume[separate-tmp] is noisy on execution hosts - https://phabricator.wikimedia.org/T109933#1616464 (10Andrew) [14:47:41] 6Labs, 10Salt, 6operations: salt does not run reliably for toollabs - https://phabricator.wikimedia.org/T99213#1616465 (10ArielGlenn) I made a full pass on these and labs looked ok. But here we are in September and labs is generally in bad shape. This includes toollabs. Let me list here the issues: 1) mor... [14:47:58] 6Labs, 10Salt, 6operations: salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1616466 (10ArielGlenn) [14:48:07] yuvipanda: documented [14:48:30] thanks [14:48:49] you know how you suffer with your ssh loops? multiply that by however many labs instances there are [14:48:54] :D [14:49:09] I don't envy you [14:49:16] me either :-D [15:37:51] 6Labs, 3Labs-Sprint-107, 3Labs-Sprint-108, 3Labs-Sprint-109, and 3 others: Evaluate kubernetes for use on Tool Labs - https://phabricator.wikimedia.org/T107993#1616722 (10yuvipanda) [15:41:25] YuviPanda: hm, why aren't we just using a PPA for our toollabs packages? [15:42:04] i.e. we publish to our own PPA and use that PPA as source as well [15:42:35] jessie, and ppas live outside labs [15:42:37] also meetinsg [15:42:44] jessie, yes, fair point [15:42:53] 6Labs, 3Labs-Sprint-107, 3Labs-Sprint-108, 3Labs-Sprint-109, 3labs-sprint-113: Setup monitoring and reporting for disk space usage of each project on NFS - https://phabricator.wikimedia.org/T106476#1616739 (10coren) [15:43:11] and yes, they live outside labs, that's the point. Giving the rest of the world access to our packages [15:43:33] without much extra effort [15:43:35] 6Labs, 3Labs-Sprint-108, 3Labs-Sprint-109, 3labs-sprint-113: Have catchpoint checks for all labs services (Tracking) - https://phabricator.wikimedia.org/T107058#1616745 (10Andrew) a:5coren>3Andrew [15:43:39] 6Labs, 3Labs-sprint-112, 3ToolLabs-Goals-Q4, 3labs-sprint-113: Fix documentation & puppetization for labs NFS - https://phabricator.wikimedia.org/T88723#1616746 (10coren) [15:43:42] 6Labs, 10Analytics, 10Labs-Infrastructure, 3Labs-Sprint-108, 5Patch-For-Review: Set up cron job on labstore to rsync data from stat* boxes into labs. - https://phabricator.wikimedia.org/T107576#1616747 (10ellery) Thanks Otto! [15:45:43] 6Labs, 3labs-sprint-113: Evaluate gridengine's use of NFS and (possibly) move it to a different module - https://phabricator.wikimedia.org/T111797#1616757 (10coren) 3NEW [15:46:10] 6Labs, 3labs-sprint-113: Evaluate gridengine's use of NFS and (possibly) move it to a different volume - https://phabricator.wikimedia.org/T111797#1616764 (10coren) [15:50:38] 6Labs, 3labs-sprint-113: Separate scratch and tools NFS volumes to separate physical devices - https://phabricator.wikimedia.org/T111802#1616811 (10coren) 3NEW [15:53:22] valhallasw`cloud: first k8s node added to toollabs!!!! [15:55:33]