[00:35:15] Labs, Tool-Labs, User-Bd808-Test: Create template PHP application for use on Tool Labs based on Slim, Twig and Wikimedia libraries - https://phabricator.wikimedia.org/T90092#1711296 (bd808)
[02:58:15] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[03:33:17] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:59:14] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[04:46:28] * legoktm lols at https://tools.wmflabs.org/sal/Krenair
[05:34:14] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:23:44] Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1711664 (valhallasw) We should probably correlate the graph in http://tools.freeside.sk/monitor/http-kmlexport.html with the puppet failures we've seen to see if they are related (i.e., does kmlexport also die when puppet ca...
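The recurring PROBLEM/RECOVERY lines read like a threshold check over a window of metric datapoints: it reports what share of recent samples exceed a threshold (0.0 failures for "Puppet failure", 43200 seconds, i.e. 12 hours, for "Puppet staleness"). The sketch below is a hypothetical reconstruction of that arithmetic only, not the actual check script. The function names, the handling of missing samples, and the assumption that 1% is the CRITICAL cut-off (inferred from the "Less than 1.00%" recovery text) are all guesses. For example, 5 failing runs out of 9 would produce the 55.56% seen at 02:58:15.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Share (in %) of non-null datapoints strictly greater than `threshold`."""
    # Assumption: missing samples (None) are ignored rather than counted.
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def check_status(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    max_percent: float = 1.0,  # assumed cut-off, inferred from "Less than 1.00%"
) -> str:
    """Return a status line mimicking the alert text seen in the log."""
    pct = percent_above(datapoints, threshold)
    if pct >= max_percent:
        return f"CRITICAL: {pct:.2f}% of data above the critical threshold [{threshold}]"
    return f"OK: Less than {max_percent:.2f}% above the threshold [{threshold}]"


if __name__ == "__main__":
    # Hypothetical Puppet failure counts per run: 5 of 9 runs failed.
    print(check_status([0, 1, 1, 0, 1, 1, 0, 0, 1], 0.0))
    # -> CRITICAL: 55.56% of data above the critical threshold [0.0]
    # Hypothetical staleness samples (seconds since last run), all over 12 hours.
    print(check_status([50000.0, 51800.0, 53600.0], 43200.0))
    # -> CRITICAL: 100.00% of data above the critical threshold [43200.0]
```

Under this model, a host flapping between PROBLEM and RECOVERY, as tools-webgrid-lighttpd-1402 does throughout the day, simply means some runs in each window fail while others succeed.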
[07:55:13] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[08:35:12] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:46:37] PROBLEM - Puppet staleness on tools-k8s-bastion-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[10:24:57] Labs, Labs-Infrastructure, Swift: Provide Swift object store(s) for the labs projects - https://phabricator.wikimedia.org/T114998#1712021 (hashar) NEW
[10:25:19] Labs, Labs-Infrastructure, Swift: Provide Swift object store(s) for the labs projects - https://phabricator.wikimedia.org/T114998#1712021 (hashar)
[10:26:26] Labs, Labs-Infrastructure, Swift: Provide Swift object store(s) for the labs projects - https://phabricator.wikimedia.org/T114998#1712021 (hashar)
[10:31:10] Labs, Labs-Infrastructure, Swift: Provide Swift object store(s) for the labs projects - https://phabricator.wikimedia.org/T114998#1712072 (hashar) I found a previous update by @Andrew from November 2014: >>! In T64835#781789, @Andrew wrote: > To support Swift in labs I want to allow keystone/swift aut...
[10:40:23] Labs, wikitech.wikimedia.org, Labs-Sprint-105: remove nutcracker from wikitech - https://phabricator.wikimedia.org/T102993#1712082 (Revi)
[10:55:12] valhallasw`cloud: I'd rather not change the code of the application; how do I override the php fcgi configuration?
[10:57:48] Labs, Tool-Labs: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php5) - https://phabricator.wikimedia.org/T115000#1712090 (Nemo_bis) NEW
[10:57:54] Filed as https://phabricator.wikimedia.org/T115000 before I forget
[11:06:29] (PS1) Alexandros Kosiaris: maps: Add a codfw postgresql dummy pass [labs/private] - https://gerrit.wikimedia.org/r/244428
[11:06:47] Nemo_bis: I think you can copy the fcgi definition from the docs (the 'default lighttpd configuration' part) and then adapt the fcgi invocation
[11:06:55] (CR) Alexandros Kosiaris: [C: 2 V: 2] maps: Add a codfw postgresql dummy pass [labs/private] - https://gerrit.wikimedia.org/r/244428 (owner: Alexandros Kosiaris)
[11:06:56] if you put that in .lighttpd.conf, that should override the default config
[11:10:42] valhallasw`cloud: didn't work for me, maybe I misinterpreted the examples
[11:11:51] Labs, Tool-Labs: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php5) - https://phabricator.wikimedia.org/T115000#1712124 (valhallasw) p:Triage>High The log file also mentions ``` error.log:2015-10-07 10:49:20: (server.c.1044) W...
[11:13:13] Nemo_bis: oh, that error message is probably from you adding session.save_path = ... to the .lighttpd.conf at some point?
[11:16:10] Labs, Tool-Labs: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php5) - https://phabricator.wikimedia.org/T115000#1712133 (valhallasw) `/var/lib/php5` contains a few cookies, but all owned by tools.widar. The permissions on `/var/lib/...
[11:19:16] valhallasw`cloud: which error message? I added that, yes, but after the error message appeared, though maybe I pasted it incorrectly
[11:19:29] I also tried the various formats mentioned on the help pages
[11:22:10] Labs, Tool-Labs: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php5) - https://phabricator.wikimedia.org/T115000#1712139 (valhallasw) OK, I think I know what the issue is. Both `wlm-jury-at` and `widar` (and maybe some other tools) s...
[11:22:35] Labs, Tool-Labs: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php5) - https://phabricator.wikimedia.org/T115000#1712140 (valhallasw)
[11:22:37] Labs, Tool-Labs: Session cookies (and data) being shared between web services cause issues - https://phabricator.wikimedia.org/T67891#1712141 (valhallasw)
[11:24:02] Labs, Tool-Labs: Session cookies (and data) being shared between web services cause issues - https://phabricator.wikimedia.org/T67891#1712147 (valhallasw) The advantage of `~/.phpsessions` is also that cookies are shared between hosts, which means people are not suddenly logged out when the webservice is...
[11:31:26] Labs, Tool-Labs: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php5) - https://phabricator.wikimedia.org/T115000#1712161 (scfc) `strace -f /usr/bin/php-cgi` shows: ``` […] open("./php-cgi-fcgi.ini", O_RDONLY) = -1 ENOENT (No such...
[11:37:26] Labs, Tool-Labs: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php5) - https://phabricator.wikimedia.org/T115000#1712166 (valhallasw) Yes, that's probably the case. The fact php-cgi tries to read files in ./ is actually an interesti...
[11:38:20] Nemo_bis: ^ you could try ~/php.ini
[12:09:58] Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1712220 (scfc) Or the other way around: If `kmlexport` uses all the memory it requests, the calculation for memory shared between tools goes wrong, the host runs out of memory and Puppet fails (cf. also T107665).
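scfc's strace shows php-cgi opening `./php-cgi-fcgi.ini`, a path relative to the process's working directory, so a `~/php.ini` is only picked up if php-cgi actually runs with `~` as its cwd. A minimal Python sketch of that resolution step, assuming (per the PHP manual's description of `php-SAPI.ini` files) that the SAPI-specific name is tried before plain `php.ini`. It models only this cwd-relative lookup, not PHP's full search order; the function name and the tool directory are hypothetical.

```python
import os
from typing import List


def cwd_relative_ini_candidates(cwd: str, sapi: str = "cgi-fcgi") -> List[str]:
    """Files php-cgi would try relative to its working directory (simplified model)."""
    # Relative names, as seen in the strace output ("./php-cgi-fcgi.ini").
    relative_names = [f"./php-{sapi}.ini", "./php.ini"]
    # A relative path is resolved against the process cwd, not against $HOME.
    return [os.path.normpath(os.path.join(cwd, name)) for name in relative_names]


if __name__ == "__main__":
    for cwd in ("/data/project/example-tool", "/usr/bin"):  # first path is hypothetical
        print(cwd, "->", cwd_relative_ini_candidates(cwd))
```

If php-cgi is started from any directory other than the tool's home, a `php.ini` placed in that home is never consulted, whatever its contents.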
[12:26:14] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[12:45:59] Labs, Tool-Labs: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php5) - https://phabricator.wikimedia.org/T115000#1712271 (Nemo_bis) >>! In T115000#1712166, @valhallasw wrote: > Yes, that's probably the case. > > The fact php-cgi tri...
[12:46:02] valhallasw`cloud: done
[13:01:17] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:15:05] Nemo_bis: thanks. I don't have time to look into it now, but will do later
[13:17:21] Sure. Thanks for the help.
[13:17:33] It's not urgent at all, because it was just a test tool and we're now using another one
[13:17:58] ok
[13:18:16] yeah, it seems the php-cgi current working dir actually isn't ~ but it's /usr/bin
[13:18:16] gah.
[13:21:10] Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1712330 (valhallasw) According to @coren the puppet issue was not actually a memory issue (file system cache was not flushed, indicating the issue wasn't actually a lack of memory, if I remember correctly).
[13:27:12] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [0.0]
[13:35:01] andrewbogott: Do we keep a general-use role::package::builder instance around? I often create my own discardable one but that seems impractical. :-)
[13:37:16] Coren: I usually build packages on my local VM. So, not really.
[13:37:35] andrewbogott: Do you think it makes sense to make one, then?
[13:37:52] sure.
[13:38:02] There might even be a ‘packaging’ project already
[13:40:09] Oh, hah. I'll have to wait a few minutes either way - my phone is drained.
[13:48:08] andrewbogott: There is indeed one, with no instances.
[13:49:04] Coren: I added you to the project… is there an instance now? :)
[13:49:19] Oh hah. Yep. :-)
[13:49:58] ... and it has the role::package::builder role. So, in fact, the answer to my original question was "yes, yes there is!" :-)
[13:50:29] great!
[14:02:12] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:38:51] PROBLEM - Puppet failure on tools-services-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[14:46:13] yuvipanda: new version of webservicemonitor has not asploded. I'm keeping an eagle eye trained on it until I leave and will wait for tomorrow when I'm at the hotel to do the drain-reboot dance.
[14:46:36] Just to make sure we have time to catch any regression before we stress the thing.
[15:03:19] andrewbogott: as a reminder, I'm heading to the airport early afternoon. Anything that can't wait until tomorrow should be mentioned now. :-)
[15:03:38] Coren: ok… I can’t think of anything.
[15:03:42] Safe travels!
[15:06:33] Labs, Tool-Labs, Labs-Sprint-115, Patch-For-Review, and 2 others: Attribute cache issue with NFS on Trusty - https://phabricator.wikimedia.org/T106170#1712538 (coren) New version of webservicemonitor is deployed, removing the last impediment to the drain-reboot cycle. Planned for Fri Oct 9 with no o...
[15:12:16] anyone know who operates the archiving cluebot?
[16:03:16] Labs, Tool-Labs, Patch-For-Review, labs-sprint-117: Setup a way to store secrets and access them from puppet inside the Tool Labs project - https://phabricator.wikimedia.org/T112005#1712679 (coren) With modulepath, what happens when one makes a local clone of the puppet tree?
[16:28:14] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[16:49:25] Labs, Labs-Infrastructure, Patch-For-Review, labs-sprint-117: Give 'novaobserver' keystone account rights to read everything, everywhere, write or change nothing - https://phabricator.wikimedia.org/T104588#1712838 (Andrew) ...and... downgraded.
[16:50:15] Labs, Labs-Infrastructure, Patch-For-Review, labs-sprint-117: Support a multi-domain model in keystone - https://phabricator.wikimedia.org/T115026#1712846 (Andrew) NEW a:Andrew
[16:50:25] Labs, Labs-Infrastructure, Patch-For-Review, labs-sprint-117: switch to keystone api v3 - https://phabricator.wikimedia.org/T115027#1712853 (Andrew) NEW a:Andrew
[16:55:08] Labs, Labs-Infrastructure, Patch-For-Review, labs-sprint-117: Move project membership/assignment from ldap to keystone mysql - https://phabricator.wikimedia.org/T115029#1712891 (Andrew) NEW a:Andrew
[16:55:32] Labs, Labs-Infrastructure, labs-sprint-117: switch to keystone api v3 - https://phabricator.wikimedia.org/T115027#1712898 (Krenair)
[16:55:41] Labs, Labs-Infrastructure, labs-sprint-117: Support a multi-domain model in keystone - https://phabricator.wikimedia.org/T115026#1712900 (Krenair)
[16:59:35] Labs, Labs-Infrastructure, labs-sprint-117: Move project membership/assignment from ldap to keystone mysql - https://phabricator.wikimedia.org/T115029#1712921 (Krenair)
[17:00:11] Labs, Labs-Infrastructure, labs-sprint-117: Move project membership/assignment from ldap to keystone mysql - https://phabricator.wikimedia.org/T115029#1712891 (Krenair) > How, then, will pam/ssh determine project membership? I don't know. Nova API?
[17:33:13] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:59:17] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[18:28:32] Labs, Labs-Infrastructure, hardware-requests, operations, labs-sprint-117: Labs test cluster in codfw - https://phabricator.wikimedia.org/T114435#1713186 (chasemp) p:Triage>Normal
[18:28:49] Labs, MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org, Patch-For-Review: Adding a user to a project results in a blank page with the user added to the project but no shell access - https://phabricator.wikimedia.org/T114229#1713187 (Krenair) Because nobody reviewed the patch, this will...
[18:30:43] Labs, MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: https://wikitech.wikimedia.org/w/api.php?action=novaprojects&subaction=getall times out - https://phabricator.wikimedia.org/T115034#1713198 (Krenair)
[18:34:12] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:53:28] Labs, Tool-Labs: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php5) - https://phabricator.wikimedia.org/T115000#1713272 (valhallasw) Apparently the cwd of the php-cgi process is /usr/bin, not ~ :(
[19:19:44] Labs, Tool-Labs: Session cookies (and data) being shared between web services cause issues - https://phabricator.wikimedia.org/T67891#1713387 (valhallasw) From PHP 5.3 on, PHP supports '.user.ini' files, which can be used to override php.ini directives. I suggest the following setup: For each tool, creat...
[19:19:57] Nemo_bis: ^ .user.ini *does* work :-)
[19:20:13] (see /data/project/gerrit-reviewer-bot/public_html/.user.ini)
[19:24:15] uuh
[19:27:10] (make sure to create the session directory with 700 permissions, though)
[21:04:23] (PS2) Hashar: On Jenkins, skip missing interpreters [labs/tools/forrestbot] - https://gerrit.wikimedia.org/r/242348
[21:04:35] (CR) Hashar: "check experimental" [labs/tools/forrestbot] - https://gerrit.wikimedia.org/r/242348 (owner: Hashar)
[21:08:24] (CR) Hashar: "I knew we could set different parameters for tox when it finds out it is running under Jenkins (it detects whether JENKINS_URL env variable" [labs/tools/forrestbot] - https://gerrit.wikimedia.org/r/242348 (owner: Hashar)
[21:33:01] (CR) Hashar: "check experimental" [labs/tools/forrestbot] - https://gerrit.wikimedia.org/r/242348 (owner: Hashar)
[21:35:47] (CR) Hashar: "Example result: https://integration.wikimedia.org/ci/job/tox-jessie/321/console" [labs/tools/forrestbot] - https://gerrit.wikimedia.org/r/242348 (owner: Hashar)
[22:44:40] (CR) John Vandenberg: On Jenkins, skip missing interpreters (1 comment) [labs/tools/forrestbot] - https://gerrit.wikimedia.org/r/242348 (owner: Hashar)
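The workaround confirmed at 19:19 is a per-directory `.user.ini` (read by PHP's CGI/FastCGI SAPI since PHP 5.3) that points `session.save_path` at a private session directory with 700 permissions. The Python sketch below prepares that layout for one tool. It assumes the `~/.phpsessions` directory name from T67891 and a `.user.ini` in `public_html`, as in the gerrit-reviewer-bot example. The log does not show that file's contents, so the single directive written here is an assumption, and `setup_php_sessions` is a hypothetical helper that overwrites any existing `.user.ini`.

```python
import os
from pathlib import Path


def setup_php_sessions(tool_home: Path) -> Path:
    """Create a private session directory and a .user.ini pointing PHP at it.

    Hypothetical helper: overwrites any existing public_html/.user.ini.
    """
    session_dir = tool_home / ".phpsessions"
    session_dir.mkdir(mode=0o700, exist_ok=True)
    # mkdir's mode is filtered by the umask (and ignored if the dir exists),
    # so enforce 700 explicitly: only the tool account may read session files.
    os.chmod(session_dir, 0o700)

    user_ini = tool_home / "public_html" / ".user.ini"
    user_ini.parent.mkdir(parents=True, exist_ok=True)
    # session.save_path may be set per directory, so .user.ini can override it.
    user_ini.write_text(f'session.save_path = "{session_dir}"\n')
    return user_ini
```

Run as the tool account, it would be called with the tool's home directory. The explicit `os.chmod` is there because the permission requirement is the part that is easy to get wrong: a session directory created under a permissive umask would let other tools read the session files, which is the kind of cross-tool sharing T67891 is about.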