[00:10:31] PROBLEM - Free space - all mounts on tools-exec-1206 is CRITICAL tools.tools-exec-1206.diskspace.root.byte_percentfree (<55.56%) [01:21:16] 10Tool-Labs-tools-Global-user-contributions: Global user contributions: Support sorting results chronologically - https://phabricator.wikimedia.org/T70358#1615371 (10Krinkle) [02:02:49] Coren, I need some help. [02:03:12] I'm trying to start some scripts on xTools but they just die without any errors. :/ [02:04:58] YuviPanda, ^ [04:23:37] PROBLEM - Puppet staleness on tools-exec-1218 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:24:47] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1204 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:25:07] PROBLEM - Puppet staleness on tools-webproxy-02 is CRITICAL 22.22% of data above the critical threshold [43200.0] [04:25:56] PROBLEM - Puppet staleness on tools-exec-1219 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:26:00] PROBLEM - Puppet staleness on tools-services-01 is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:26:02] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1402 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:26:48] PROBLEM - Puppet staleness on tools-exec-1404 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:27:06] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1407 is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:27:36] PROBLEM - Puppet staleness on tools-exec-1205 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:27:50] PROBLEM - Puppet staleness on tools-redis-02 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:27:58] PROBLEM - Puppet staleness on tools-exec-1405 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:28:48] PROBLEM - Puppet staleness on tools-exec-1204 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:29:10] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1208 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:29:14] PROBLEM - Puppet staleness on tools-exec-1203 is CRITICAL 55.56% of data above the critical threshold [43200.0] [04:29:44] PROBLEM - Puppet staleness on tools-webgrid-generic-1403 is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:29:46] PROBLEM - Puppet staleness on tools-master is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:30:16] PROBLEM - Puppet staleness on tools-exec-1408 is CRITICAL 55.56% of data above the critical threshold [43200.0] [04:30:24] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1207 is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:30:25] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1406 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:31:55] PROBLEM - Puppet staleness on tools-mailrelay-02 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:32:29] PROBLEM - Puppet staleness on tools-exec-1402 is CRITICAL 11.11% of data above the critical threshold [43200.0] [04:32:31] PROBLEM - Puppet staleness on tools-exec-1213 is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:32:59] PROBLEM - Puppet staleness on tools-exec-1211 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:33:31] PROBLEM - Puppet staleness on tools-webgrid-generic-1404 is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:35:09] PROBLEM - 
Puppet staleness on tools-exec-1202 is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:36:03] PROBLEM - Puppet staleness on tools-exec-1401 is CRITICAL 22.22% of data above the critical threshold [43200.0] [04:36:25] PROBLEM - Puppet staleness on tools-exec-1403 is CRITICAL 55.56% of data above the critical threshold [43200.0] [04:36:55] PROBLEM - Puppet staleness on tools-precise-dev is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:37:15] PROBLEM - Puppet staleness on tools-exec-1206 is CRITICAL 50.00% of data above the critical threshold [43200.0] [04:38:12] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1405 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:38:32] PROBLEM - Puppet staleness on tools-checker-02 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:39:24] PROBLEM - Puppet staleness on tools-services-02 is CRITICAL 22.22% of data above the critical threshold [43200.0] [04:40:18] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1206 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:40:40] PROBLEM - Puppet staleness on tools-exec-1407 is CRITICAL 50.00% of data above the critical threshold [43200.0] [04:41:50] PROBLEM - Puppet staleness on tools-bastion-01 is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:42:12] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1403 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:42:52] PROBLEM - Puppet staleness on tools-exec-1214 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:43:02] PROBLEM - Puppet staleness on tools-exec-1406 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:43:12] PROBLEM - Puppet staleness on tools-exec-1201 is CRITICAL 44.44% of data above the critical threshold [43200.0] [04:43:30] PROBLEM - Puppet staleness on tools-redis-01 is CRITICAL 55.56% of data above the critical threshold [43200.0] [04:44:01] PROBLEM - Puppet staleness on tools-exec-1217 is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:44:19] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1205 is CRITICAL 22.22% of data above the critical threshold [43200.0] [04:44:31] PROBLEM - Puppet staleness on tools-mail is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:44:41] PROBLEM - Puppet staleness on tools-exec-1409 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:44:41] PROBLEM - Puppet staleness on tools-submit is CRITICAL 20.00% of data above the critical threshold [43200.0] [04:44:49] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1209 is CRITICAL 10.00% of data above the critical threshold [43200.0] [04:45:09] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1210 is CRITICAL 11.11% of data above the critical threshold [43200.0] [04:45:21] PROBLEM - Puppet staleness on tools-exec-1208 is CRITICAL 40.00% of data above the critical threshold [43200.0] [04:45:27] PROBLEM - Puppet staleness on tools-exec-gift is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:45:39] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1409 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:45:51] PROBLEM - Puppet staleness on tools-exec-1410 is CRITICAL 30.00% of data above the critical threshold [43200.0] [04:46:15] PROBLEM - Puppet staleness on tools-shadow is CRITICAL 33.33% of data above the critical threshold [43200.0] [04:46:43] PROBLEM - Puppet staleness on 
tools-exec-1212 is CRITICAL 30.00% of data above the critical threshold [43200.0]
[04:47:07] PROBLEM - Puppet staleness on tools-bastion-02 is CRITICAL 44.44% of data above the critical threshold [43200.0]
[04:47:25] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1202 is CRITICAL 10.00% of data above the critical threshold [43200.0]
[04:48:03] PROBLEM - Puppet staleness on tools-web-static-01 is CRITICAL 55.56% of data above the critical threshold [43200.0]
[04:48:24] PROBLEM - Puppet staleness on tools-exec-1216 is CRITICAL 50.00% of data above the critical threshold [43200.0]
[04:48:30] PROBLEM - Puppet staleness on tools-webgrid-generic-1402 is CRITICAL 33.33% of data above the critical threshold [43200.0]
[04:48:40] PROBLEM - Puppet staleness on tools-exec-cyberbot is CRITICAL 20.00% of data above the critical threshold [43200.0]
[04:48:44] PROBLEM - Puppet staleness on tools-webproxy-01 is CRITICAL 40.00% of data above the critical threshold [43200.0]
[04:49:08] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1408 is CRITICAL 55.56% of data above the critical threshold [43200.0]
[04:49:22] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1410 is CRITICAL 44.44% of data above the critical threshold [43200.0]
[04:50:36] PROBLEM - Puppet staleness on tools-exec-1209 is CRITICAL 10.00% of data above the critical threshold [43200.0]
[04:53:22] PROBLEM - Puppet staleness on tools-checker-01 is CRITICAL 50.00% of data above the critical threshold [43200.0]
[04:53:54] PROBLEM - Puppet staleness on tools-web-static-02 is CRITICAL 50.00% of data above the critical threshold [43200.0]
[04:54:46] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1201 is CRITICAL 33.33% of data above the critical threshold [43200.0]
[04:55:46] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1203 is CRITICAL 40.00% of data above the critical threshold [43200.0]
[04:56:23] PROBLEM - Puppet staleness on tools-exec-1207 is CRITICAL 44.44% of data above the critical threshold [43200.0]
[04:56:29] PROBLEM - Puppet staleness on tools-exec-1215 is CRITICAL 44.44% of data above the critical threshold [43200.0]
[06:20:24] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1401 is CRITICAL 33.33% of data above the critical threshold [43200.0]
[06:31:19] PROBLEM - Puppet staleness on tools-exec-1210 is CRITICAL 44.44% of data above the critical threshold [43200.0]
[06:40:31] RECOVERY - Free space - all mounts on tools-exec-1206 is OK All targets OK
[07:52:32] YuviPanda: ^
[07:52:36] your aptly broke all the things
[07:52:46] valhallasw`cloud: oh
[07:52:52] valhallasw`cloud: why at 7AM!?
[07:53:14] I don't know :(
[07:53:46] Cyberpower678: please file a bug
[07:55:23] valhallasw`cloud: manual puppet runs work ok
[07:55:28] I guess that means apt-get update is broken
[07:55:30] yeah it's the apt-get update
[07:55:45] which is why there are no puppet failure warnings
[07:56:11] yup
[07:56:39] also
[07:56:40] W: Failed to fetch http://tools-services-01/repo/dists/precise-tools/main/binary-i386/Packages 404 Not Found
[07:56:41] wut
[07:56:43] i386?!
[07:58:09] tools-login complains about http://tools-services-01/repo/dists/trusty-tools/main/source/Sources
[07:58:13] which sounds saner
[07:59:01] yeah
[07:59:16] I think it's just we haven't actually published any packages
[07:59:26] hm, maybe
[08:00:01] did you do aptly repo create?
[08:00:23] it seems so
[08:01:27] shall I just add everything in /data/project/.system/etc?
[08:02:39] valhallasw`cloud: ya. but I think the problem is elsewhere, moment
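For reference, the 404s quoted above can be reproduced directly against the repo host; a minimal sketch (the URLs are the ones from the apt errors, and curl -sI simply issues a HEAD request):

    curl -sI http://tools-services-01/repo/dists/trusty-tools/main/source/Sources
    curl -sI http://tools-services-01/repo/dists/precise-tools/main/binary-i386/Packages
    # Both returning 404 is consistent with a repo that was created but never
    # populated and published for those components.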
[08:02:45] we don't actually have any source packages
[08:02:51] oh!
[08:02:53] right.
[08:02:58] ok, ran sudo aptly repo add trusty-tools . for trusty
[08:03:20] !log tools added all packages in data/project/.system/deb-trusty to aptly repo trusty-tools
[08:03:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[08:04:00] valhallasw`cloud: cool! https://wikitech.wikimedia.org/wiki/Aptly#Adding_Packages you need to do an update as well.
[08:04:15] !log tools added all packages in data/project/.system/deb-precise to aptly repo precise-tools
[08:04:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[08:04:29] why do we skip signing? signing is awesome! :-p
[08:05:01] !log tools Publish for local repo ./trusty-tools [all, amd64] publishes {main: [trusty-tools]} has been successfully updated. Publish for local repo ./precise-tools [all, amd64] publishes {main: [precise-tools]} has been successfully updated.
[08:05:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
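A minimal sketch of the add-then-publish cycle being described (repo and distribution names are the ones from the log; the package directory, sudo use, and the assumption that the repo has already been published once are mine):

    cd /data/project/.system/deb-trusty
    sudo aptly repo add trusty-tools .        # import the .deb files into the local repo
    sudo aptly publish update trusty-tools    # regenerate the published dists/trusty-tools tree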
[08:06:15] valhallasw`cloud: heh, should set it up
[08:06:27] RECOVERY - Puppet staleness on tools-exec-1215 is OK Less than 1.00% above the threshold [3600.0]
[08:08:27] valhallasw`cloud: ok, just the i386 error now
[08:08:29] which makes me go
[08:08:30] wtf
[08:08:33] in that
[08:08:38] I thought we only had amd64
[08:08:56] YuviPanda: no, apt-get update still fails on bastion-01
[08:09:07] W: Failed to fetch http://tools-services-01/repo/dists/trusty-tools/main/source/Sources 404 Not Found
[08:09:10] valhallasw`cloud: did you do a puppet run first?
[08:09:18] oh :-p
[08:09:20] the puppet run gets rid of the deb-src line
[08:09:38] but puppet won't run because of the failing apt! :-p
[08:09:43] via cron, anyway
[08:10:11] yeah
[08:10:16] valhallasw`cloud: I have a script!
[08:10:20] I can say 'salt' but fuck salt
[08:10:25] :-p
[08:10:41] https://github.com/yuvipanda/personal-wiki
[08:10:42] tada
[08:11:22] valhallasw`cloud: so does this mean that our precise instances are actually a different virtual arch than our trusty ones?!
[08:11:44] no, don't think so. Linux tools-exec-1201 3.2.0-75-virtual #110-Ubuntu SMP Tue Dec 16 19:24:01 UTC 2014 x86_64
[08:12:03] then why is it looking for i386?!
[08:12:43] multiarch stuff?
[08:12:48] not sure
[08:13:09] or maybe it's just a fallback when it can't find the x64 one?
[08:13:13] valhallasw`cloud: so I've created only an amd64 repo
[08:13:16] hm no
[08:13:22] by default
[08:13:38] well... this is where aptly makes a mess
[08:14:20] "When repository is first published, list of architectures is stored in the database and can’t be changed."
[08:14:28] By default aptly would guess list of architectures from the contents of the snapshot or local repository being published.
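Since aptly fixes the architecture list at first publish (as quoted above), one way to drop the stray i386 lookup is to recreate the publish with an explicit list; a sketch under that assumption, reusing the names from earlier:

    sudo aptly publish drop trusty-tools                          # remove the existing publish
    sudo aptly -architectures="amd64" publish repo trusty-tools   # re-publish for amd64 only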
[08:15:25] valhallasw`cloud: ah. I can just change it in puppet and delete the repo, it'll get recreated
[08:15:38] yeah, but we don't want i386 :P
[08:15:38] valhallasw`cloud: see -operations
[08:21:00] RECOVERY - Puppet staleness on tools-services-01 is OK Less than 1.00% above the threshold [3600.0]
[08:21:50] RECOVERY - Puppet staleness on tools-bastion-01 is OK Less than 1.00% above the threshold [3600.0]
[08:28:13] RECOVERY - Puppet staleness on tools-exec-1201 is OK Less than 1.00% above the threshold [3600.0]
[08:32:19] 6Labs: Disable multiarch support in all Labs precise instances - https://phabricator.wikimedia.org/T111760#1615743 (10yuvipanda) 3NEW
[08:44:24] PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 30.00% of data above the critical threshold [0.0]
[08:53:49] RECOVERY - Puppet staleness on tools-exec-1204 is OK Less than 1.00% above the threshold [3600.0]
[08:53:57] (03PS1) 10Jean-Frédéric: Extract method is_template_present_in_page [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236749 (https://phabricator.wikimedia.org/T111757)
[08:54:24] (03CR) 10Jean-Frédéric: [C: 032] Extract method is_template_present_in_page [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236749 (https://phabricator.wikimedia.org/T111757) (owner: 10Jean-Frédéric)
[08:55:15] (03Merged) 10jenkins-bot: Extract method is_template_present_in_page [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236749 (https://phabricator.wikimedia.org/T111757) (owner: 10Jean-Frédéric)
[08:59:29] RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0]
[09:19:03] 6Labs, 6operations, 10wikitech.wikimedia.org: Rename specific account in LDAP, Wikitech, Gerrit and Phabricator - https://phabricator.wikimedia.org/T85913#1615861 (10adrianheine) I'm back, and I'm happy to walk through the process with someone on IRC if that's necessary :)
[09:19:33] (03PS1) 10Jean-Frédéric: Fix bug introduced in 50e3ce9d [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236752
[09:19:49] (03CR) 10Jean-Frédéric: [C: 032] Fix bug introduced in 50e3ce9d [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236752 (owner: 10Jean-Frédéric)
[09:19:55] (03Merged) 10jenkins-bot: Fix bug introduced in 50e3ce9d [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236752 (owner: 10Jean-Frédéric)
[09:27:47] 6Labs, 10Beta-Cluster, 10Labs-Infrastructure, 7Graphite, 7Shinken: Delete more specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T111540#1615880 (10fgiunchedi) agreed, what would be the easiest way to get a map of project -> list of instances? @yuvipanda @andrew ?
[09:43:12] valhallasw`cloud: BTW did you add all the things on nfs to appropriate aptly repo?
[09:43:26] YuviPanda: yes
[09:43:36] precise to precise-tools, trusty to trusty-tools
[09:43:43] YuviPanda: but we still need a preferences.d for aptly
[09:44:05] Hmm?
[09:44:22] so that aptly > apt.wm.o > ubuntu default
[09:44:27] Can't we just get rid of the labsdebrepo role
[09:44:34] Wouldn't ensure latest take care of that
[09:44:53] I don't know
[09:45:24] and in that case, I don't know if we want latest? :P
[09:45:52] e.g. if we have some custom version in aptly, we don't want a newer version from ubuntu to supersede it
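The preferences.d idea above amounts to an apt pin that ranks the local aptly repo above apt.wikimedia.org and the Ubuntu archive; a hypothetical sketch (the file name, origin and priority are assumptions, following standard apt_preferences semantics where a priority above 1000 wins even over newer versions):

    sudo tee /etc/apt/preferences.d/toollabs-aptly <<'EOF'
    Package: *
    Pin: origin "tools-services-01"
    Pin-Priority: 1001
    EOF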
[09:46:12] valhallasw`cloud: we already have ensure latest on exec environment and dev environ
[09:46:30] But yeah a preference file seems good idea anyway
[09:46:32] I know
[09:46:54] * YuviPanda is still in a train
[10:08:28] s51053 and s52261, killing your multi-hour queries because they are starting to affect the performance of labsdb1001
[10:10:09] valhallasw`cloud: doing your patches now!
[10:16:02] valhallasw`cloud: done
[10:44:10] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 66.67% of data above the critical threshold [0.0]
[10:56:44] YuviPanda: oh dear.
[10:56:52] what did I break this time
[10:57:11] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: Package[python3-scipy] is already declared in file /etc/puppet/modules/toollabs/manifests/genpp/python_exec_trusty.pp:130; cannot redeclare at /etc/puppet/modules/toollabs/manifests/exec_environ.pp:360 on node tools-bastion-01.tools.eqiad.wmflabs
[10:57:13] raah.
[10:57:45] 6Labs, 10Beta-Cluster, 10Labs-Infrastructure, 7Graphite, 7Shinken: Delete more specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T111540#1616099 (10yuvipanda) eb3e3dbd81d263791d2ba1909f64f8a84531c65e for my revert of my original garbage collector script. It failed because t...
[10:58:02] valhallasw`cloud: need to kill them from exec_environ I guess?
[10:58:06] valhallasw`cloud: all the python modules
[10:58:12] I thought I did
[10:58:54] ah there it is
[10:59:09] am going to go for food with the WMDE folks now :(
[10:59:11] I'll cya in a bit
[10:59:19] ok
[11:01:17] this is also fixed with the require_package thing of course...
[11:03:49] I should add apt.wm.o and aptly as sources to genpp at some point
[11:04:48] YuviPanda: https://gerrit.wikimedia.org/r/#/c/236762/ should fix it
[12:05:31] valhallasw`cloud: merged
[12:06:04] valhallasw`cloud: we should maybe move the list of packages into a segmented yaml file
[12:15:20] what's the advantage of that?
[12:15:49] or do you mean all packages? and then use some genpp magic to install the right ones?
[12:17:23] valhallasw`cloud: all of them. then we can build docker containers for kubernetes that can have the right set of packages
[12:17:29] achso
[12:18:02] ....wouldn't that make the docker containers huge?
[12:18:15] yeah so you can compose them
[12:18:24] php packages separate from python ones etc
[12:18:30] mmm
[12:18:33] and of course, eventually narrow those down
[12:18:40] force venv for almost all of the things
[12:18:57] for python I would just not supply any default packages. venv all the things
[12:19:01] yeah
[12:19:06] the system-wide packages are just for one-off scripts I'd say
[12:19:07] there are autobuilders
[12:19:08] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0]
[12:19:11] \o/
[12:20:20] valhallasw`cloud: https://hub.docker.com/_/python/ see the onbuild one
[12:20:30] valhallasw`cloud: those autobuild by setting up requirements.txt
[12:20:50] YuviPanda: we should prebuild wheels though
[12:20:58] that would make everyone's life so much easier
[12:21:05] not entirely sure how to do that though
[12:21:16] because debian thinks wheels are bad, mkay
[12:25:27] YuviPanda: could you force the puppet run on all the hosts? thanks :-)
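The "force the puppet run on all the hosts" request is the pssh run referred to just below; a rough sketch, assuming a plain host-list file and working ssh/sudo access (flags as in parallel-ssh):

    # tools-hosts.txt: one hostname per line (hypothetical file)
    pssh -h tools-hosts.txt -p 10 -t 900 -i 'sudo puppet agent --test'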
[12:25:41] valhallasw`cloud: yeah, let me do that
[12:36:42] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1405 is CRITICAL 20.00% of data above the critical threshold [0.0]
[12:37:16] bah
[12:37:52] valhallasw`cloud: am forcing runs now
[12:37:59] but something is failing?
[12:38:30] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 30.00% of data above the critical threshold [0.0]
[12:38:31] I think it might've been a pssh timeout
[12:38:36] ok or not
[12:38:46] but that's probably it - killed the run halfway through
[12:39:16] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL 33.33% of data above the critical threshold [0.0]
[12:39:52] yeah, I think that's it
[12:39:56] PROBLEM - Puppet failure on tools-exec-1217 is CRITICAL 40.00% of data above the critical threshold [0.0]
[12:40:06] sigh
[12:42:18] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 75.00% of data above the critical threshold [0.0]
[12:42:32] PROBLEM - Puppet failure on tools-exec-1218 is CRITICAL 66.67% of data above the critical threshold [0.0]
[12:42:42] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 60.00% of data above the critical threshold [0.0]
[12:43:33] RECOVERY - Puppet staleness on tools-redis-01 is OK Less than 1.00% above the threshold [3600.0]
[12:43:47] PROBLEM - Puppet failure on tools-web-static-01 is CRITICAL 30.00% of data above the critical threshold [0.0]
[12:44:11] PROBLEM - Puppet failure on tools-redis-02 is CRITICAL 33.33% of data above the critical threshold [0.0]
[12:45:29] PROBLEM - Puppet failure on tools-mailrelay-02 is CRITICAL 30.00% of data above the critical threshold [0.0]
[12:45:49] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [3600.0]
[12:45:51] PROBLEM - Puppet failure on tools-web-static-02 is CRITICAL 20.00% of data above the critical threshold [0.0]
[12:45:58] * YuviPanda hates everything now
[12:46:03] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [3600.0]
[12:48:09] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1405 is OK Less than 1.00% above the threshold [3600.0]
[12:48:21] RECOVERY - Puppet staleness on tools-exec-1216 is OK Less than 1.00% above the threshold [3600.0]
[12:48:35] RECOVERY - Puppet staleness on tools-exec-1218 is OK Less than 1.00% above the threshold [3600.0]
[12:48:59] RECOVERY - Puppet staleness on tools-exec-1217 is OK Less than 1.00% above the threshold [3600.0]
[12:51:52] RECOVERY - Puppet staleness on tools-mailrelay-02 is OK Less than 1.00% above the threshold [3600.0]
[12:52:50] RECOVERY - Puppet staleness on tools-redis-02 is OK Less than 1.00% above the threshold [3600.0]
[12:53:04] RECOVERY - Puppet staleness on tools-web-static-01 is OK Less than 1.00% above the threshold [3600.0]
[12:53:22] RECOVERY - Puppet staleness on tools-checker-01 is OK Less than 1.00% above the threshold [3600.0]
[12:53:48] RECOVERY - Puppet failure on tools-web-static-01 is OK Less than 1.00% above the threshold [0.0]
[12:53:52] RECOVERY - Puppet staleness on tools-web-static-02 is OK Less than 1.00% above the threshold [3600.0]
[12:54:14] RECOVERY - Puppet failure on tools-redis-02 is OK Less than 1.00% above the threshold [0.0]
[12:54:54] RECOVERY - Puppet failure on tools-exec-1217 is OK Less than 1.00% above the threshold [0.0]
[12:55:08] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold [3600.0]
[12:55:30] RECOVERY - Puppet
failure on tools-mailrelay-02 is OK Less than 1.00% above the threshold [0.0] [12:55:48] RECOVERY - Puppet failure on tools-web-static-02 is OK Less than 1.00% above the threshold [0.0] [12:56:16] valhallasw`cloud: ^ is doing ok now [12:56:17] the pssh [12:56:22] <3 [12:56:43] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1405 is OK Less than 1.00% above the threshold [0.0] [12:56:43] valhallasw`cloud: now as long as I Don't lose network :) [12:56:55] RECOVERY - Puppet staleness on tools-precise-dev is OK Less than 1.00% above the threshold [3600.0] [12:57:41] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [12:58:29] RECOVERY - Puppet staleness on tools-webgrid-generic-1404 is OK Less than 1.00% above the threshold [3600.0] [12:59:11] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [3600.0] [12:59:43] RECOVERY - Puppet staleness on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [3600.0] [12:59:47] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1201 is OK Less than 1.00% above the threshold [3600.0] [12:59:51] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1209 is OK Less than 1.00% above the threshold [3600.0] [13:00:13] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [3600.0] [13:00:23] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [3600.0] [13:02:25] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1202 is OK Less than 1.00% above the threshold [3600.0] [13:02:34] RECOVERY - Puppet failure on tools-exec-1218 is OK Less than 1.00% above the threshold [0.0] [13:03:28] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [0.0] [13:03:30] RECOVERY - Puppet staleness on tools-webgrid-generic-1402 is OK Less than 1.00% above the threshold [3600.0] [13:04:20] RECOVERY - Puppet failure on tools-exec-1215 is OK Less than 1.00% above the threshold [0.0] [13:04:20] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1205 is OK Less than 1.00% above the threshold [3600.0] [13:04:22] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [3600.0] [13:04:50] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1204 is OK Less than 1.00% above the threshold [3600.0] [13:05:36] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1409 is OK Less than 1.00% above the threshold [3600.0] [13:05:59] valhallasw`cloud: it's taking a while becaus it's also installing all the py3 packages [13:06:05] yeah [13:06:07] that's fine [13:07:06] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [3600.0] [13:07:14] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0] [13:07:16] YuviPanda: https://cdn.rawgit.com/wikimedia/operations-puppet/production/modules/toollabs/manifests/genpp/report-python.html \o/ [13:07:31] valhallasw`cloud: nice [13:07:36] we should expand that to all packages :D [13:07:39] not complete, though, and partially wrong [13:07:50] e.g. 
python-requests on precise is 2.0 from apt.wm.o
[13:07:51] 6Labs, 10Beta-Cluster, 10Labs-Infrastructure, 7Graphite, 7Shinken: Delete more specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T111540#1616217 (10fgiunchedi) a:3fgiunchedi I'll take this, setting to low
[13:07:58] 6Labs, 10Beta-Cluster, 10Labs-Infrastructure, 7Graphite, 7Shinken: Delete more specific deployment-prep graphite datapoints - https://phabricator.wikimedia.org/T111540#1616219 (10fgiunchedi) p:5Triage>3Low
[13:08:30] RECOVERY - Puppet staleness on tools-checker-02 is OK Less than 1.00% above the threshold [3600.0]
[13:08:44] YuviPanda: also, can we make apt.tools.wmflabs.org link to the tools apt repo? or is that a bad idea?
[13:08:57] I can get the package list in another way, I suppose
[13:09:10] hm, I'm wondering if apt.wm.o has the package list in the format I need at all...
[13:09:11] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1408 is OK Less than 1.00% above the threshold [3600.0]
[13:10:27] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1406 is OK Less than 1.00% above the threshold [3600.0]
[13:10:27] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1401 is OK Less than 1.00% above the threshold [3600.0]
[13:10:45] valhallasw`cloud: can't do ssl on *.tools.wmflabs.org
[13:10:55] don't need ssl if we sign the packages :D
[13:10:57] RECOVERY - Puppet staleness on tools-exec-1219 is OK Less than 1.00% above the threshold [3600.0]
[13:11:13] valhallasw`cloud: still feels icky :P we want to be ssl only at some point
[13:11:43] hrm, so I suppose I will have to parse packages.gz
[13:12:09] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1403 is OK Less than 1.00% above the threshold [3600.0]
[13:12:31] RECOVERY - Puppet staleness on tools-exec-1213 is OK Less than 1.00% above the threshold [3600.0]
[13:12:39] 10MB gz.
might be ok, actually [13:12:49] RECOVERY - Puppet staleness on tools-exec-1214 is OK Less than 1.00% above the threshold [3600.0] [13:13:03] still 10x more than the files I use now :( [13:15:21] RECOVERY - Puppet staleness on tools-exec-1408 is OK Less than 1.00% above the threshold [3600.0] [13:15:47] RECOVERY - Puppet staleness on tools-exec-1410 is OK Less than 1.00% above the threshold [3600.0] [13:16:46] RECOVERY - Puppet staleness on tools-exec-1212 is OK Less than 1.00% above the threshold [3600.0] [13:17:58] RECOVERY - Puppet staleness on tools-exec-1405 is OK Less than 1.00% above the threshold [3600.0] [13:18:00] RECOVERY - Puppet staleness on tools-exec-1211 is OK Less than 1.00% above the threshold [3600.0] [13:18:04] RECOVERY - Puppet staleness on tools-exec-1406 is OK Less than 1.00% above the threshold [3600.0] [13:19:38] RECOVERY - Puppet staleness on tools-exec-1409 is OK Less than 1.00% above the threshold [3600.0] [13:20:40] RECOVERY - Puppet staleness on tools-exec-1407 is OK Less than 1.00% above the threshold [3600.0] [13:21:04] RECOVERY - Puppet staleness on tools-exec-1401 is OK Less than 1.00% above the threshold [3600.0] [13:21:24] RECOVERY - Puppet staleness on tools-exec-1403 is OK Less than 1.00% above the threshold [3600.0] [13:21:44] RECOVERY - Puppet staleness on tools-exec-1404 is OK Less than 1.00% above the threshold [3600.0] [13:22:30] RECOVERY - Puppet staleness on tools-exec-1402 is OK Less than 1.00% above the threshold [3600.0] [13:24:17] RECOVERY - Puppet staleness on tools-exec-1203 is OK Less than 1.00% above the threshold [3600.0] [13:25:07] RECOVERY - Puppet staleness on tools-exec-1202 is OK Less than 1.00% above the threshold [3600.0] [13:25:23] RECOVERY - Puppet staleness on tools-exec-1208 is OK Less than 1.00% above the threshold [3600.0] [13:25:39] RECOVERY - Puppet staleness on tools-exec-1209 is OK Less than 1.00% above the threshold [3600.0] [13:26:23] RECOVERY - Puppet staleness on tools-exec-1210 is OK Less than 1.00% above the threshold [3600.0] [13:26:23] RECOVERY - Puppet staleness on tools-exec-1207 is OK Less than 1.00% above the threshold [3600.0] [13:27:07] RECOVERY - Puppet staleness on tools-bastion-02 is OK Less than 1.00% above the threshold [3600.0] [13:27:19] RECOVERY - Puppet staleness on tools-exec-1206 is OK Less than 1.00% above the threshold [3600.0] [13:27:33] RECOVERY - Puppet staleness on tools-exec-1205 is OK Less than 1.00% above the threshold [3600.0] [13:28:47] RECOVERY - Puppet staleness on tools-webproxy-01 is OK Less than 1.00% above the threshold [3600.0] [13:29:25] RECOVERY - Puppet staleness on tools-services-02 is OK Less than 1.00% above the threshold [3600.0] [13:29:31] RECOVERY - Puppet staleness on tools-mail is OK Less than 1.00% above the threshold [3600.0] [13:29:36] RECOVERY - Puppet staleness on tools-submit is OK Less than 1.00% above the threshold [3600.0] [13:29:48] RECOVERY - Puppet staleness on tools-master is OK Less than 1.00% above the threshold [3600.0] [13:30:10] RECOVERY - Puppet staleness on tools-webproxy-02 is OK Less than 1.00% above the threshold [3600.0] [13:30:26] RECOVERY - Puppet staleness on tools-exec-gift is OK Less than 1.00% above the threshold [3600.0] [13:31:14] RECOVERY - Puppet staleness on tools-shadow is OK Less than 1.00% above the threshold [3600.0] [13:33:40] RECOVERY - Puppet staleness on tools-exec-cyberbot is OK Less than 1.00% above the threshold [3600.0] [13:35:12] 6Labs, 10Tool-Labs, 10pywikibot-core, 5Patch-For-Review: Install python-enum34 on toollabs 
- https://phabricator.wikimedia.org/T111602#1616257 (10jayvdb) So it is available on trusty now, but not precise. This should be fairly easy to package for precise. Is there a guide for how to get a package into the... [13:35:23] 6Labs, 10Tool-Labs, 10pywikibot-core: Install python-enum34 on toollabs - https://phabricator.wikimedia.org/T111602#1616258 (10jayvdb) [13:40:48] 6Labs, 10Tool-Labs, 10pywikibot-core: Install python-enum34 on toollabs - https://phabricator.wikimedia.org/T111602#1616261 (10yuvipanda) I don't think we should maintain any new python packages that aren't already being maintained upstream for that version of ubuntu. Why does it need to be in precise? [13:43:47] valhallasw`cloud, about what? [13:52:44] 6Labs, 10Tool-Labs, 10pywikibot-core: Install python-enum34 on toollabs - https://phabricator.wikimedia.org/T111602#1616285 (10valhallasw) Because many people are using the default settings for jsub, which means they run their stuff on precise hosts. [13:53:05] YuviPanda: I thought backports were relatively easy? [13:54:08] valhallasw`cloud: yes but I thought we decided to not do any of it ourselves? and backport updates when they happen... [13:55:19] oh, I interpreted it as 'we're not going to build any more python packages that are not already packaged' [13:55:32] but aiui, backporting is a oneliner (rather than fooling around with debuild for two hours) [13:55:47] and keeping up the updates. [13:56:58] python packages? updates? :-p [13:57:36] I also don't know what ubuntu's rules on backports are [13:58:28] 6Labs, 10Tool-Labs, 10pywikibot-core: Install python-enum34 on toollabs - https://phabricator.wikimedia.org/T111602#1616298 (10jayvdb) The latest patch adds a simple enum class so enum34 is optional. I would like to add an ImportWarning telling users to install enum34. That will just annoy toollab users who... [13:58:39] 10Quarry: Time limit on quarry queries - https://phabricator.wikimedia.org/T111779#1616299 (10Jarekt) 3NEW [14:00:13] 10Quarry: Time limit on quarry queries - https://phabricator.wikimedia.org/T111779#1616307 (10yuvipanda) Those aren't actually running for months, I think - those happen when for some reason there's an unhandled exception in the query results serializer, preventing it from updating the status accordingly... Nee... [14:02:19] valhallasw`cloud: oh, did you build a 'tools-packages' instance? [14:02:24] YuviPanda: no? 
[14:02:38] valhallasw`cloud: > tools-packages
[14:02:42] I wonder who that is from
[14:03:58] image id: (missing)
[14:07:40] YuviPanda: scfc tried to log in aug 31, but otherwise I don't see any logins
[14:07:43] apart from you and I today
[14:08:04] valhallasw`cloud: so I suppose that's his
[14:28:56] fixing up salt version on all labs instances that are either behind (2014.1.11) or ahead (2015.5.1), one instance at a time, interruption to salt services should be negligible
[14:29:47] apergos: ty apergos
[14:30:20] yw (it's all automated now)
[14:30:30] after that I can check salt process count and fix those up
[14:30:44] then I can see if any are still stuck on the authentication error and fix those up
[14:30:51] (all automated as well)
[14:31:20] lemme put this on a ticket now that I have the script going
[14:34:00] 6Labs, 10Salt: clean up old ec2id-based salt keys on labs - https://phabricator.wikimedia.org/T103089#1616387 (10ArielGlenn) still a few more: language-dev.language.eqiad.wmflabs i-00000585.language.eqiad.wmflabs last puppet run: Sep 7 09:04 marathon-master-01.marathon.eqiad.wmflabs i-00000c08.eqiad.wmflabs...
[14:46:05] 6Labs, 10Labs-Infrastructure, 3Labs-sprint-112, 3labs-sprint-113: Update Labs to OpenStack Kilo - https://phabricator.wikimedia.org/T110045#1616457 (10Andrew)
[14:46:36] 6Labs, 3Labs-Sprint-108, 3Labs-Sprint-109, 3labs-sprint-113: Have catchpoint checks for all labs services (Tracking) - https://phabricator.wikimedia.org/T107058#1616459 (10Andrew)
[14:47:38] 6Labs, 10wikitech.wikimedia.org, 3Labs-sprint-112, 5Patch-For-Review, and 2 others: Can't list instances on Special:NovaInstance - https://phabricator.wikimedia.org/T110629#1616462 (10Andrew) 5Open>3Resolved I'm pretty sure the cause of the problem was my bad cache-refresh code, which is now reverted.
[14:47:38] 6Labs, 10Tool-Labs, 5Patch-For-Review: Labs_lvm::Volume[separate-tmp] is noisy on execution hosts - https://phabricator.wikimedia.org/T109933#1616464 (10Andrew)
[14:47:41] 6Labs, 10Salt, 6operations: salt does not run reliably for toollabs - https://phabricator.wikimedia.org/T99213#1616465 (10ArielGlenn) I made a full pass on these and labs looked ok. But here we are in September and labs is generally in bad shape. This includes toollabs. Let me list here the issues: 1) mor...
[14:47:58] 6Labs, 10Salt, 6operations: salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1616466 (10ArielGlenn)
[14:48:07] yuvipanda: documented
[14:48:30] thanks
[14:48:49] you know how you suffer with your ssh loops? multiply that by however many labs instances there are
[14:48:54] :D
[14:49:09] I don't envy you
[14:49:16] me either :-D
[15:37:51] 6Labs, 3Labs-Sprint-107, 3Labs-Sprint-108, 3Labs-Sprint-109, and 3 others: Evaluate kubernetes for use on Tool Labs - https://phabricator.wikimedia.org/T107993#1616722 (10yuvipanda)
[15:41:25] YuviPanda: hm, why aren't we just using a PPA for our toollabs packages?
[15:42:04] i.e. we publish to our own PPA and use that PPA as source as well
[15:42:35] jessie, and ppas live outside labs
[15:42:37] also meetings
[15:42:44] jessie, yes, fair point
[15:42:53] 6Labs, 3Labs-Sprint-107, 3Labs-Sprint-108, 3Labs-Sprint-109, 3labs-sprint-113: Setup monitoring and reporting for disk space usage of each project on NFS - https://phabricator.wikimedia.org/T106476#1616739 (10coren)
[15:43:11] and yes, they live outside labs, that's the point. Giving the rest of the world access to our packages
[15:43:33] without much extra effort
[15:43:35] 6Labs, 3Labs-Sprint-108, 3Labs-Sprint-109, 3labs-sprint-113: Have catchpoint checks for all labs services (Tracking) - https://phabricator.wikimedia.org/T107058#1616745 (10Andrew) a:5coren>3Andrew
[15:43:39] 6Labs, 3Labs-sprint-112, 3ToolLabs-Goals-Q4, 3labs-sprint-113: Fix documentation & puppetization for labs NFS - https://phabricator.wikimedia.org/T88723#1616746 (10coren)
[15:43:42] 6Labs, 10Analytics, 10Labs-Infrastructure, 3Labs-Sprint-108, 5Patch-For-Review: Set up cron job on labstore to rsync data from stat* boxes into labs. - https://phabricator.wikimedia.org/T107576#1616747 (10ellery) Thanks Otto!
[15:45:43] 6Labs, 3labs-sprint-113: Evaluate gridengine's use of NFS and (possibly) move it to a different module - https://phabricator.wikimedia.org/T111797#1616757 (10coren) 3NEW
[15:46:10] 6Labs, 3labs-sprint-113: Evaluate gridengine's use of NFS and (possibly) move it to a different volume - https://phabricator.wikimedia.org/T111797#1616764 (10coren)
[15:50:38] 6Labs, 3labs-sprint-113: Separate scratch and tools NFS volumes to separate physical devices - https://phabricator.wikimedia.org/T111802#1616811 (10coren) 3NEW
[15:53:22] valhallasw`cloud: first k8s node added to toollabs!!!!
[15:55:33] YuviPanda: hurray
[15:56:20] valhallasw`cloud: I am thinking I'll migrate grrrit-wm first
[15:56:33] YuviPanda: mmm.
[15:56:46] the plan was to migrate the irc bots to a separate project
[15:56:53] but I suppose k8s also works
[15:57:01] yeah
[15:57:04] k8s without nfs
[16:30:04] !log ores deployed ores-wikimedia-config:ca10888, ores==0.4.0 and revscoring==0.5.0
[16:30:22] Log it robot!
[16:30:27] halfak: it has!
[16:30:29] OK
[16:30:31] :)
[16:30:34] Good robot
[16:31:50] halfak: tools.wmflabs.org/sal
[16:33:22] YuviPanda: well, that one has
[16:33:37] but yay, now we have three places to find adminlogs?
[16:34:29] !log ores should work now
[16:34:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL, Master
[16:34:40] heh
[16:35:15] YuviPanda, so it didn't work.
[16:35:17] Try again?
[16:35:25] !log ores deployed ores-wikimedia-config:ca10888, ores==0.4.0 and revscoring==0.5.0
[16:35:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL, Master
[16:36:07] halfak: it did. there are now two places
[16:36:09] and one worked
[16:36:13] we should deprecate one of them soon
[16:36:22] Oh.
[16:36:24] Huh
[16:36:36] * halfak starts writing post-mortem
[16:36:46] YuviPanda, got a good template for me?
[16:36:51] yes
[16:37:06] halfak: https://wikitech.wikimedia.org/wiki/Incident_documentation
[16:37:16] halfak: there's a template linked from there and tons of examples
[16:41:08] * halfak gets all wikipedian on this
[16:45:23] Coren, ping
[17:01:11] 17:01 -!- idle : 0 days 18 hours 33 mins 56 secs [signon: Sun Aug 16 07:05:38 2015]
[17:01:20] Cyberpower678: In meeting.
[17:01:48] YuviPanda, https://wikitech.wikimedia.org/wiki/Incident_documentation/20150908-ores
[17:08:46] halfak: cool :)
[17:21:14] Cyberpower678: both your xtools issue and your extra exec node thing
[17:22:23] valhallasw`cloud, there's more to the exec node thing now. The WMF has taken an interest in this bot, so it's better for me to talk to Coren than phab it.
[17:22:53] it's better to document what's happening so it's not just in irc logs half a year from now ;-)
[17:22:57] As for xTools, I'm not sure it's a bug in the infrastructure.
[17:23:19] valhallasw`cloud, true, but I don't have all the details needed to even create a phab ticket.
[17:23:54] valhallasw`cloud, The WMF wants my bot to run on the 30 top wikis. That's going to need a lot of resources and research.
[17:24:19] Cyberpower678: when you say 'The WMF' can you clarify who you are talking about?
[17:24:29] 6Labs: fix labs jessie instances to have correct salt version - https://phabricator.wikimedia.org/T104849#1617380 (10ArielGlenn) No takers? I have cleaned up almost all of the current issues, but I need to check if new instances get set up incorrectly.
[17:24:30] YuviPanda, Occasi
[17:24:34] Cyberpower678: ok
[17:25:08] Cyberpower678: I'm confused why it can't be on phab? file a ticket with everything you know, add coren and occasi?
[17:25:26] you can edit tickets afterwards as well
[17:25:45] valhallasw`cloud, I want to have my information first. I don't want to create a wishy-washy request.
[17:26:15] Cyberpower678: and the tool labs project on phab is also for support requests, so it's fine if it's not clear whether it's infra or configuration
[17:26:19] Besides, I'm still working to improve resource usage of my bot so my numbers aren't final yet.
[17:26:44] Cyberpower678: that's fine. The phab ticket is where it can be discussed -- it doesn't have to be the final request
[17:26:47] A bot this big needs to be efficient.
[17:28:10] valhallasw`cloud, alright. But I would still like to have a word with Coren though. I will prepare a phab ticket soon.
[17:28:29] thanks :-)
[17:28:38] also, did the logs issue resolve itself?
[17:29:00] I'm not sure. I have run my citation bot again
[17:29:06] I'm still hammering out bugs.
[17:29:17] and resource drains.
[17:29:30] ok, sure. Also there, if you see the issue again, please file a phab ticket with as much info as you have -- it sounds like a serious issue.
[17:29:56] lemme pull up another log where I saw this issue crop up.
[17:31:49] Looks like that log is normal again too.
[17:32:05] valhallasw`cloud, It appears to be an intermittent issue.
[17:33:36] It looks like the toolserver.org redirects double encode URL parameters
[17:33:37] All my logs look normal at the moment.
[17:34:14] Coren, can you ping me when you're done with the meeting.
[17:37:42] hi
[17:37:57] Hi there, is here anyone interested to talk with me?
[17:38:25] Jackwiki: about what?
[17:39:34] too late... he is gone
[17:44:21] 6Labs, 10Tool-Labs: Make tools-mail route mail for @tools-*.pmtpa.wmflabs correctly - https://phabricator.wikimedia.org/T63484#1617464 (10scfc) Rereading the man page for `mailname`, of course the other approach to try would be to set `/etc/mailname` to `tools.wmflabs.org` and see what happens. Issues to look...
[18:11:29] Betacommand: can you file a bug? On mobile so a bit hard for me
[18:13:01] valhallasw`cloud: what would that be filed under?
[18:14:39] Toollabs should be ok, not sure if there's anything more specific
[18:18:00] 6Labs, 10Tool-Labs: toolserver redirects double encode params - https://phabricator.wikimedia.org/T111839#1617554 (10Betacommand)
[18:34:54] 10Quarry: Time limit on quarry queries - https://phabricator.wikimedia.org/T111779#1617653 (10Jarekt) Ok So you are saying I should stop waiting for my http://quarry.wmflabs.org/query/5045 query results? ;) If they are all killed after 20 min., as queries from other tools like CatScan2, than no need for " easy...
[18:46:07] is k8s ready for testing by end-users?
[18:47:08] gifti: I think Yuvi is planning a public announcement in a few days.
[18:48:18] yay
[18:57:34] gwicke: Is it ok if I cause some downtime on services instance ‘appservice’?
[18:57:41] Not urgent, just doing some rebalancing.
[18:59:09] ebernhardson: Can estest1002 tolerate some downtime? It might be 40 minutes or so.
[19:02:27] andrewbogott: yes it's no problem
[19:02:41] andrewbogott: thanks for asking, there are other times it's in the middle of a 4 hour import operation :)
[19:02:54] ebernhardson: great, I will do now. Thanks
[19:05:30] !log estest migrating estest1002 to labvirt1004
[19:05:30] estest is not a valid project.
[19:06:14] !log search migrating estest1002 to labvirt1004
[19:06:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Search/SAL, dummy
[19:11:34] PROBLEM - Free space - all mounts on tools-exec-1206 is CRITICAL tools.tools-exec-1206.diskspace.root.byte_percentfree (<11.11%)
[19:15:38] (03PS3) 10Niedzielski: Copy alpha build from merge job instead of making [labs/tools/wikipedia-android-builds] - 10https://gerrit.wikimedia.org/r/231697 (https://phabricator.wikimedia.org/T99115)
[19:16:38] YuviPanda: would you mind reviewing this guy when you get a chance? ^^
[19:21:27] hey friends, where do I complain about unmerged changes on deployment-puppetmaster
[19:21:27] ?
[19:21:44] YuviPanda: ?
[19:23:49] bd808: ?
[19:24:22] unmerged as in the cherry-picks or manual hacks?
[19:24:28] uh, i think cherry picks?
[19:24:35] looks like a merge or rebase conflict
[19:24:45] ah. I can take a look
[19:24:50] rebase in progress; onto 7bb7543
[19:24:51] You are currently rebasing branch 'production' on '7bb7543'.
[19:25:01] danke
[19:28:17] ottomata: fixed
[19:28:27] yay danke
[19:31:57] 6Labs, 6operations, 5Patch-For-Review: labs salt master on jessie fails to install salt-master - https://phabricator.wikimedia.org/T110032#1617944 (10Andrew) I rebuilt new images last week, so this should be fixed. I have not directly verified this though.
[19:54:29] Why does the uwsgi server sometimes refuse to find files even though the permissions are correct. I'm using webservice2 uwsgi-python command.
[19:55:06] It first happened with a module, then with an html template and now with app.py
[20:02:39] ashwinpp: Are you certain there is no change to the working directory?
[20:02:51] yes
[20:03:20] Without more information, I can't think of any reason why that would happen.
[20:03:50] I changed some content in the file and saved it
[20:04:10] and the file it says it can't access, exists and is accessible by me
[20:15:31] !log integration disconnecting integration-slave-trusty-1011 and migrating to a new virt host
[20:15:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/SAL, dummy
[20:44:23] 6Labs, 6Release-Engineering, 10wikitech.wikimedia.org: Missing messages on Wikitech - https://phabricator.wikimedia.org/T101753#1618241 (10greg)
[20:45:32] I wonder how that was releng
[20:45:38] andrewbogott, hey
[20:46:11] Krenair: hello!
[20:46:44] andrewbogott, did you see https://gerrit.wikimedia.org/r/#/c/236491/ ?
[20:47:23] yes, although I haven't thought about it much :) Seems like a good idea overall.
[20:47:55] Okay
[20:47:59] Some people just ignore all gerrit mail
[20:48:28] So I've become more annoying and ask people if they've seen
[20:49:27] yeah, it never hurts to nag
[20:49:57] !log integration re-enabled integration-slave-trusty-1011
[20:50:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/SAL, dummy
[21:02:05] Coren, how do I look up the job that just died?
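For the gridengine exchange that follows, a minimal sketch of the lookup and of decoding the exit status (the job name is a placeholder):

    qstat -j myjob          # while the job still exists in the scheduler
    qacct -j myjob          # accounting record after the job has been reaped
    # exit_status 137 = 128 + 9, i.e. the process was killed with SIGKILL:
    kill -l $((137 - 128))  # prints KILL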
[21:02:35] YuviPanda, ^
[21:02:42] If it really /just/ died, it'll show in qstat; after it's been collected by the engine, it'll show in qacct
[21:06:22] Coren, using qacct -j ### is causing a flood of data I'm not interested in. I only want my job that I just now submitted and watched unexpectedly die.
[21:07:49] And providing it a name just causes it to hang up
[21:08:12] It may be a *long* time before it responds. IIRC, it was never purged.
[21:08:24] Oh wait it just loaded.
[21:08:30] What's exit 137?
[21:08:55] Coren, ^
[21:09:09] Signal 9 - sigkill. (128+9) That 99.9% means that it was killed by the shepherd because it ran out of memory.
[21:09:22] You should also get the actual stats in there.
[21:10:22] Coren, that shouldn't be happening. The bot is only spawning additional php workers.
[21:11:02] There shouldn't be anything that uses memory beyond a few hundred megs.
[21:11:17] Especially not maxvmem 1.282G
[21:12:39] Coren, ^ could spawning using popen and pclose be rapidly zapping my memory?
[21:13:59] It'll affect it, certainly; the maximum is cumulative for all of the processes in the group.
[21:17:21] Coren, what is the shepherd enforcing again? PHP mem limit or jsub or physical hardware limits?
[21:33:33] (03PS1) 10Jean-Frédéric: Catch pywikibot.exceptions.InvalidTitle [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236963 (https://phabricator.wikimedia.org/T111865)
[21:33:57] (03CR) 10Jean-Frédéric: [C: 032] Catch pywikibot.exceptions.InvalidTitle [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236963 (https://phabricator.wikimedia.org/T111865) (owner: 10Jean-Frédéric)
[21:34:02] (03Merged) 10jenkins-bot: Catch pywikibot.exceptions.InvalidTitle [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/236963 (https://phabricator.wikimedia.org/T111865) (owner: 10Jean-Frédéric)
[22:24:35] andrewbogott, so... if you're not going to +2 this, who is?
[22:49:54] nuria: , bd808, another deployment/beta q for you
[22:50:06] i'm setting up a new eventlogging host in deployment-prep
[22:50:08] deprecating the old one
[22:50:52] do you know how/where mediawiki is configured to send raw udp eventlogging events?
[22:58:28] ottomata,
[22:58:30] wmf-config/CommonSettings-labs.php: $wgEventLoggingFile = 'udp://deployment-eventlogging02.eqiad.wmflabs:8421/EventLogging';
[22:58:42] ah!
[22:58:53] operations/mediawiki-config.git
[22:58:56] great
[22:58:58] got it
[22:59:01] like basically all other wikimedia-specific mediawiki config
[22:59:07] (aye, i never change that stuff)
[22:59:11] how does it get deployed?
[22:59:47] in beta?
[22:59:59] well in production, you'd have to get it merged and deployed
[23:00:21] for beta you have to get it merged there (by a production deployer), then beta will pick it up a few minutes later
[23:00:35] production deployer will also need to merge the change on tin the normal way, but not sync it
[23:01:32] aye, this change won't affect production, it is a beta labs config change
[23:01:38] shame that is tied to production! :/
[23:01:44] not really
[23:01:52] can you imagine what a state beta would be in if it wasn't tied to production?
[23:02:06] i mean, it should be like production, but configs should be separate, no?
[23:02:14] ... no
[23:02:42] then people would be able to make a change to production without replicating it in beta
[23:02:44] that would be bad.
[23:02:52] Krenair: what if i wanted to set up a 3rd beta like environment
[23:02:54] staging
[23:02:55] and then a 4th
[23:03:02] i dunno, supertest
[23:03:17] there is a staging otw ;)
[23:03:18] should all those hostname configs go in that repo?
[23:04:00] the logic to build beta should be tied to production, and both should be used, but the configs for particular environments should be smarter than that
[23:04:16] i dunno if this is relevant to you, but for elasticsearch things like hostnames go into hiera
[23:04:29] so labs uses the production elasticsearch puppet, but the labs hiera data overrides some stuff including hostnames
[23:04:31] ebernhardson: does mw read those from hiera?
[23:04:44] i think the issue is that this config is a MW PHP config
[23:04:51] ottomata: oh, i totally missed that
[23:04:55] YuviPanda: any chance you could take a look at this guy? https://gerrit.wikimedia.org/r/#/c/231697/
[23:05:09] ottomata: yea we don't have anything special for MW config, besides a CirrusSearch-common.php file in the mediawiki-config repo where we put stuff
[23:05:28] labs just gets hardcoded into a CirrusSearch-labs.php file
[23:05:41] aye
[23:05:50] so there's gonna be a CirrusSearch-staging.php too?
[23:05:54] sadly, yes
[23:05:59] indeed :)
[23:06:46] i don't think any kind of solution for that is being baked into the mediawiki centralized configuration rfc something something
[23:07:01] :S i'll make a note on the RFC i guess :)
[23:07:38] What is this staging stuff?
[23:07:38] oh, there is an RFC?
[23:08:11] Krenair: who should I add as reviewer on this patch?
[23:08:25] um
[23:08:31] usually you just either deploy it yourself the normal way
[23:08:34] ottomata: well, there is https://www.mediawiki.org/wiki/Requests_for_comment/Configuration_database_2 but re-reading it it's only tangentially related
[23:08:47] or put it up on the deployment calendar in a SWAT window
[23:08:49] ottomata: it's the same thing, about the mediawiki configuration and setup, but doesn't take into account the independent cluster concerns afaik
[23:08:53] ah
[23:08:55] there's one going on at the moment actually
[23:09:32] I imagine if you ask Roan in -operations he'll do it for you
[23:09:48] ebernhardson: cool, i mean, if the thing is able to change a single db url or something for the config, that might be fine
[23:09:56] puppet could render a single db config url for an environment
[23:14:28] ottomata, what is 'staging' then?
[23:14:46] Krenair: i don't know, ebernhardson just told me about it
[23:20:05] Krenair, http://gitready.com/beginner/2009/01/18/the-staging-area.html
[23:20:22] Platonides, not git staging
[23:23:06] I miss the staging family, then :)
[23:37:17] is there a way to make ?.wmflabs.org a redirect to something that is not in labs?