[01:06:14] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:30:11] Labs, Tool-Labs: Rewrite the meta_p table populating code to python and have it run on a cron - https://phabricator.wikimedia.org/T107094#1700304 (Krenair) I think has_echo and has_flaggedrevs are quite different.
[02:41:18] Labs, Tool-Labs: Rewrite the meta_p table populating code to python and have it run on a cron - https://phabricator.wikimedia.org/T107094#1700312 (Krenair) a:Krenair>yuvipanda has_echo has the wrong default in "production" (current labs, as opposed to my dev copy). And actually has_flaggedrevs might...
[07:57:16] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:32:15] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:35:30] Labs, Tool-Labs: nodejs jobs fail with a v8 fatal error - https://phabricator.wikimedia.org/T113826#1700542 (valhallasw) This could be a memory issue; the default memory allocation for jobs is 256MB, and not all VMs handle malloc() failing well. I suspect the node VM might be one of them ;-) For referenc...
[11:32:50] Labs, Tool-Labs: Please install hugin-tools and pillow - https://phabricator.wikimedia.org/T108210#1700585 (valhallasw) p:Triage>Normal
[11:35:53] Labs, Tool-Labs: Unable to boot Ruby app on tool labs - https://phabricator.wikimedia.org/T109322#1700590 (valhallasw) p:Triage>Low
[11:37:36] Labs, Tool-Labs, Patch-For-Review: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#1700592 (valhallasw) p:Triage>High
[11:37:56] Labs, Tool-Labs: Make webservicemonitor support doing a HTTP health check - https://phabricator.wikimedia.org/T109719#1700593 (valhallasw) p:Triage>Low
[11:38:29] Labs, Tool-Labs: Add monitoring for SGE queue status - https://phabricator.wikimedia.org/T109730#1700595 (valhallasw) p:Triage>High @Yuvipanda, could you take a look at this?
[11:39:34] Labs, Tool-Labs: Add monitoring for expected load issues - https://phabricator.wikimedia.org/T109732#1700598 (valhallasw) p:Triage>High
[11:40:02] Labs, Tool-Labs, Database: tools.citationhunt can't access databases - https://phabricator.wikimedia.org/T109972#1700601 (valhallasw) p:Triage>High
[11:40:38] Labs, Tool-Labs, Database: tools.citationhunt can't access databases - https://phabricator.wikimedia.org/T109972#1565044 (valhallasw) @jcrespo, could you take a look at this?
[11:41:28] Labs, Tool-Labs: Create a fonts CDN for use on Tool Labs - https://phabricator.wikimedia.org/T110027#1700611 (valhallasw) p:Triage>Low
[11:42:00] Labs, Tool-Labs: libc version error when running python 3.4 - https://phabricator.wikimedia.org/T110812#1700615 (valhallasw) Open>Resolved a:valhallasw
[11:42:59] Labs, Tool-Labs-tools-Other, Commons: Provide service to filter over categorization from a list of Commons categories - https://phabricator.wikimedia.org/T110833#1700617 (valhallasw)
[11:43:31] Labs, Tool-Labs: nginx puppet manifest requires nfs so error page cannot be updated over puppet - https://phabricator.wikimedia.org/T110836#1700619 (valhallasw) p:Triage>Low
[11:45:47] Labs, Tool-Labs: SGE queues all overloaded / jobs not submitting although load averages are low - https://phabricator.wikimedia.org/T110994#1700620 (valhallasw) Open>Resolved a:valhallasw Hm. I'm not sure what we can do to find the root cause of this issue at the moment. Let's just hope it doesn'...
[11:47:37] Labs, Tool-Labs: Make tools-instances that don't need NFS not have NFS - https://phabricator.wikimedia.org/T111716#1700625 (valhallasw) p:Triage>Lowest
[11:47:52] Labs, Tool-Labs, Patch-For-Review: deploy package_builder on tool labs - https://phabricator.wikimedia.org/T111730#1700627 (valhallasw) p:Triage>Lowest
[11:48:04] Labs, Tool-Labs: Initial Deployment of Kubernetes to Tool Labs (Tracking) - https://phabricator.wikimedia.org/T111885#1700628 (valhallasw) p:Triage>Low
[11:48:54] Labs, Tool-Labs: Create debian packages for kubernetes - https://phabricator.wikimedia.org/T111888#1700630 (valhallasw) p:Triage>Normal
[11:49:18] Labs, Tool-Labs, Labs-Sprint-114: Make sure that docker0 bridge comes up after flannel network is established - https://phabricator.wikimedia.org/T111893#1700632 (valhallasw) Open>Resolved a:valhallasw
[11:49:19] Labs, Tool-Labs: Initial Deployment of Kubernetes to Tool Labs (Tracking) - https://phabricator.wikimedia.org/T111885#1618964 (valhallasw)
[11:49:35] yuvipanda: we should probably have a separate k8s project on phab
[11:49:45] it's sort of getting in the way of normal tool labs things
[11:50:13] and the prioritization in terms of tool labs is very different as compared to how you probably want to prioritize for the k8s project
[11:50:30] Labs, Tool-Labs, Labs-Sprint-114: Setup and verify authentication for Kubernetes - https://phabricator.wikimedia.org/T111904#1700636 (valhallasw) p:Triage>Normal
[11:50:45] Labs, Tool-Labs: Setup DNS for kubernetes services - https://phabricator.wikimedia.org/T111914#1700638 (valhallasw) Open>Resolved a:valhallasw
[11:50:46] Labs, Tool-Labs: Initial Deployment of Kubernetes to Tool Labs (Tracking) - https://phabricator.wikimedia.org/T111885#1700640 (valhallasw)
[11:51:00] Labs, Tool-Labs, Labs-Sprint-114, Labs-Sprint-115, and 2 others: Add support to dynamicproxy for kubernetes based web services - https://phabricator.wikimedia.org/T111916#1700641 (valhallasw) p:Triage>Normal
[11:51:25] Labs, Tool-Labs: Setup a way to store secrets and access them from puppet inside the Tool Labs project - https://phabricator.wikimedia.org/T112005#1700646 (valhallasw) p:Triage>Low
[11:52:04] Labs, Tool-Labs: Static server returns HTTP 403 Forbidden for valid files in some cases - https://phabricator.wikimedia.org/T112388#1700654 (valhallasw) p:Triage>High
[11:52:24] Labs, Tool-Labs, Icinga, Monitoring: Add a monitoring check for *.wmflabs.org certificate - https://phabricator.wikimedia.org/T112645#1700656 (valhallasw) p:Normal>High
[11:52:37] Labs, Tool-Labs, Labs-Sprint-115, labs-sprint-116: Write admission controller disabling mounting of unauthorized volumes - https://phabricator.wikimedia.org/T112718#1700658 (valhallasw) p:Triage>Low
[11:53:16] Labs, Tool-Labs, Labs-Sprint-115, labs-sprint-116: Write admission controller disabling mounting of unauthorized volumes - https://phabricator.wikimedia.org/T112718#1643779 (valhallasw) Not sure if I agree with that. Projects can currently read other files on NFS if permissions allow, and I'm not su...
[11:53:28] Labs, Tool-Labs: Kubernetes Beta Signup List - https://phabricator.wikimedia.org/T112824#1700664 (valhallasw) p:Triage>Lowest
[11:53:38] Labs, Tool-Labs, Labs-Sprint-115: Decide on Docker image policies for Tool Labs Kubernetes - https://phabricator.wikimedia.org/T112855#1700666 (valhallasw) Open>Resolved a:valhallasw
[11:53:51] Labs, Tool-Labs, Patch-For-Review: Remove modules/toollabs/files/host_aliases - https://phabricator.wikimedia.org/T109485#1700670 (valhallasw)
[11:53:52] Labs, Tool-Labs: qmaster chokes on old jobs from hosts that have been renamed - https://phabricator.wikimedia.org/T113614#1700668 (valhallasw) Open>Resolved a:valhallasw
[11:54:00] Labs, Tool-Labs: nodejs jobs fail with a v8 fatal error - https://phabricator.wikimedia.org/T113826#1700671 (valhallasw) p:Triage>Normal
[11:54:50] Labs, Tool-Labs, Patch-For-Review, labs-sprint-116: Allow direct ssh access to tools - https://phabricator.wikimedia.org/T113979#1700680 (valhallasw) p:Triage>Low
[11:55:02] Labs: Fix often reported problems from the Tool Labs Survey (Tracking) - https://phabricator.wikimedia.org/T114442#1700681 (valhallasw) p:Triage>Normal
[11:57:09] Labs, Tool-Labs: install python-ldap dependencies - https://phabricator.wikimedia.org/T114388#1700685 (valhallasw) p:Triage>Low
[11:57:49] Labs, Tool-Labs: allow tool users to attach strace to their processes (at least on exec hosts) - https://phabricator.wikimedia.org/T114401#1700686 (valhallasw) p:Triage>Low
[11:57:58] Labs, Tool-Labs: provide easier way to contact people abusing resources - https://phabricator.wikimedia.org/T114560#1700687 (valhallasw) p:Triage>Low
[11:59:35] Labs, Tool-Labs, Patch-For-Review: HBA not configured correctly for tools-bastion-02 - https://phabricator.wikimedia.org/T104613#1700693 (valhallasw) p:High>Normal a:valhallasw>None
[16:24:51] Labs, Tool-Labs: nodejs jobs fail with a v8 fatal error - https://phabricator.wikimedia.org/T113826#1700858 (DanielFriesen) You're right. I also tested with 500MB before, but it looks like Node.js requires 1GB of ram even for something as simple as this script: ``` console.log('...'); ```
[16:58:07] doctaxon: hey. You're running a few tcl scripts on tools-login that have been running there for a while (several hours). Could you move them to the grid?
[17:01:07] Labs, Tool-Labs: nodejs jobs fail with a v8 fatal error - https://phabricator.wikimedia.org/T113826#1700883 (valhallasw) Open>Resolved a:valhallasw Yes, this seems to be a more generic thing with VM-based languages; the same happens for Java and (less so) for Ruby and Python. For Node, the memory...
[17:04:23] Labs, Tool-Labs: Static server returns HTTP 403 Forbidden for valid files in some cases - https://phabricator.wikimedia.org/T112388#1700887 (scfc) @Coren: Assuming this is the attribute cache issue of T106170, we would need to reboot the servers `tools-web-static-01` and `tools-web-static-02` to resolve t...
[17:07:09] Labs, Tool-Labs, Labs-Sprint-115, labs-sprint-116: Write admission controller disabling mounting of unauthorized volumes - https://phabricator.wikimedia.org/T112718#1700891 (yuvipanda) p:Low>Normal Sure! But they still should be disallowed from mounting /etc as rw on the host, since there are...
[17:07:33] Labs, Tool-Labs, Labs-Sprint-115, labs-sprint-116: Write admission controller disabling mounting of unauthorized volumes - https://phabricator.wikimedia.org/T112718#1700893 (yuvipanda)
[18:47:36] Labs, Beta-Cluster, Tracking: Beta Cluster <-> WMF Labs policy compliance (tracking) - https://phabricator.wikimedia.org/T114615#1700939 (greg) NEW
[18:47:49] Labs, Beta-Cluster, Tracking: Beta Cluster <-> WMF Labs policy compliance (tracking) - https://phabricator.wikimedia.org/T114615#1700946 (greg)
[18:49:48] Labs, Beta-Cluster, Tracking: Beta Cluster <-> WMF Labs policy compliance (tracking) - https://phabricator.wikimedia.org/T114615#1700939 (greg)
[19:15:05] Labs, Beta-Cluster, Tracking: Beta Cluster <-> WMF Labs policy compliance (tracking) - https://phabricator.wikimedia.org/T114615#1700962 (Krenair) We should probably have different robots.txt files that disallow indexing everything, do the "If my tools have account creation..." bit, and make sure the p...
[19:15:15] Labs, Tool-Labs, Patch-For-Review, labs-sprint-116: Allow direct ssh access to tools - https://phabricator.wikimedia.org/T113979#1700964 (valhallasw) This also needs an update to access.conf, as tool users are not allowed to log in even if they could authenticate (see T104613)
[19:16:49] Labs, Beta-Cluster, Tracking: Beta Cluster <-> WMF Labs policy compliance (tracking) - https://phabricator.wikimedia.org/T114615#1700967 (greg)
[19:30:00] yuvipanda: I'm wondering if we need access.conf at all. We disallow non-tools users using the ldap code already, right?
[19:30:36] do we?
[19:35:49] yuvipanda: actually no, we don't. >_<
[19:37:16] yeah maybe we should do that as well
[19:45:07] Labs, Tool-Labs: Make a decommissioning checklist - https://phabricator.wikimedia.org/T97904#1701010 (valhallasw)
[19:48:45] Labs, Tool-Labs: Make a decommissioning checklist for toollabs VMs - https://phabricator.wikimedia.org/T97904#1701016 (valhallasw)
[19:53:27] Labs, Tool-Labs, Patch-For-Review, labs-sprint-116: Allow direct ssh access to tools - https://phabricator.wikimedia.org/T113979#1701017 (valhallasw) After discussing with @yuvipanda a bit more: The easiest way to handle this entire situation is by moving the tools-or-not-tools auth to the ldap han...
[19:57:00] Labs, Tool-Labs: Make a decommissioning checklist for toollabs VMs - https://phabricator.wikimedia.org/T97904#1701029 (valhallasw)
[22:59:43] Labs, wikitech.wikimedia.org: Adding a user to a project results in a blank page with the user added to the project but no shell access - https://phabricator.wikimedia.org/T114229#1701122 (Krenair) I got an HTTP 500 when I tried to add `Alex Monk (Test 1)`. Going to ask if I can get access to `silver.eqia...