[00:08:56] 6Labs, 5Patch-For-Review: Replicate data between codfw and eqiad - https://phabricator.wikimedia.org/T85606#1147239 (10coren) [00:08:57] 6Labs, 5Patch-For-Review: Upgrade labstore2001 to Jessie - https://phabricator.wikimedia.org/T93740#1147237 (10coren) 5Open>3Resolved This is *finally* done after spending a couple of hours trying to make it boot after the H700 "helpfully" reordered the drives. [00:09:09] 6Labs: Upgrade labstore2001 to Jessie - https://phabricator.wikimedia.org/T93740#1147240 (10coren) [01:23:13] 6Labs, 6Phabricator, 7Puppet: Disable by default Phabricator alternate file domain on Labs - https://phabricator.wikimedia.org/T93837#1147422 (10Negative24) 3NEW a:3Negative24 [01:40:58] twentyafterfour: What is the easiest way to see if the security extension is installed on a phab installation? [01:54:53] Negative24: you just need it checked out somewhere and then point to that location in the load-libraries config value [01:55:30] so is the libext dir a wmf thing or phab by default? [01:55:57] that name is something we made up [01:56:19] so is the security ext in there on production? [01:56:35] Negative24: yes [01:57:29] it seems that I didn't install it on phab-02 but when the security drop down is selected it is blocked by anon users [01:59:04] Negative24: everything should have been installed by puppet on phab-02. What do you mean blocked by anon users? [01:59:58] Blocked as in the user can't view it as if the security extension is functional. So by everything does it include the security ext? [02:00:18] I can't tell so I asked if you had a way of checking [02:02:17] the extension isn't shown in phab's configs [02:03:08] oh wait all is fine. 
[02:03:16] I configured it that way [02:04:00] yeah default visibility isn't part of that extension [02:05:03] yeah not used to that [02:13:43] 6Labs, 6Phabricator, 7Puppet: Disable by default Phabricator alternate file domain on Labs - https://phabricator.wikimedia.org/T93837#1147460 (10Negative24) [02:35:55] twentyafterfour: How is the puppet settings object applied to phabricator in init.pp? [02:37:04] Negative24: it's in manifests/roles/phabricator.pp ...not in init.pp [02:37:46] the settings object or the application of the object? [02:38:24] the settings object is altered in phabricator.pp not created [02:55:21] twentyafterfour: so it is init.pp with local.json.erb. sorry for bothering you. I need to look harder next time [03:10:21] PROBLEM - Host tools-webproxy-test is DOWN: (Host Check Timed Out) [03:10:24] PROBLEM - Host tools-exec-09 is DOWN: (Host Check Timed Out) [03:10:37] PROBLEM - Host tools-webgrid-02 is DOWN: (Host Check Timed Out) [03:10:41] PROBLEM - Host tools-exec-03 is DOWN: (Host Check Timed Out) [03:10:42] PROBLEM - Host tools-exec-10 is DOWN: (Host Check Timed Out) [03:10:42] PROBLEM - Host tools-exec-08 is DOWN: (Host Check Timed Out) [03:10:42] PROBLEM - Host tools-exec-12 is DOWN: (Host Check Timed Out) [03:10:43] PROBLEM - Host tools-webgrid-04 is DOWN: (Host Check Timed Out) [03:10:43] PROBLEM - Host tools-mail is DOWN: (Host Check Timed Out) [03:10:44] PROBLEM - Host tools-exec-14 is DOWN: (Host Check Timed Out) [03:10:44] PROBLEM - Host tools-webgrid-01 is DOWN: (Host Check Timed Out) [03:10:45] PROBLEM - Host tools-trusty is DOWN: (Host Check Timed Out) [03:10:45] PROBLEM - Host tools-exec-04 is DOWN: (Host Check Timed Out) [03:10:47] PROBLEM - Host tools-exec-07 is DOWN: (Host Check Timed Out) [03:10:48] PROBLEM - Host tools-submit is DOWN: (Host Check Timed Out) [03:10:48] PROBLEM - Host tools-bastion-01 is DOWN: (Host Check Timed Out) [03:10:49] PROBLEM - Host tools-exec-13 is DOWN: (Host Check Timed Out) [03:10:50] PROBLEM - 
Host tools-exec-01 is DOWN: (Host Check Timed Out) [03:10:54] PROBLEM - Host tools-exec-gift is DOWN: (Host Check Timed Out) [03:10:55] PROBLEM - Host tools-webgrid-06 is DOWN: (Host Check Timed Out) [03:10:56] PROBLEM - Host tools-webgrid-03 is DOWN: (Host Check Timed Out) [03:11:00] RECOVERY - Host tools-webproxy-test is UP: PING OK - Packet loss = 0%, RTA = 0.93 ms [03:11:01] PROBLEM - Host tools-exec-cyberbot is DOWN: CRITICAL - Plugin timed out after 15 seconds [03:11:01] RECOVERY - Host tools-webgrid-01 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms [03:11:02] RECOVERY - Host tools-exec-07 is UP: PING OK - Packet loss = 0%, RTA = 0.69 ms [03:11:02] RECOVERY - Host tools-exec-13 is UP: PING OK - Packet loss = 0%, RTA = 0.47 ms [03:11:06] PROBLEM - Host tools-exec-11 is DOWN: CRITICAL - Plugin timed out after 15 seconds [03:11:07] PROBLEM - Host tools-webproxy-01 is DOWN: CRITICAL - Plugin timed out after 15 seconds [03:11:07] RECOVERY - Host tools-exec-09 is UP: PING OK - Packet loss = 0%, RTA = 1.47 ms [03:11:09] RECOVERY - Host tools-exec-gift is UP: PING OK - Packet loss = 0%, RTA = 6.93 ms [03:11:42] RECOVERY - Host tools-submit is UP: PING OK - Packet loss = 0%, RTA = 0.85 ms [03:12:07] RECOVERY - Host tools-exec-03 is UP: PING OK - Packet loss = 0%, RTA = 1.43 ms [03:12:07] RECOVERY - Host tools-bastion-01 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms [03:12:09] RECOVERY - Host tools-exec-14 is UP: PING OK - Packet loss = 0%, RTA = 0.78 ms [03:12:11] RECOVERY - Host tools-webgrid-06 is UP: PING OK - Packet loss = 0%, RTA = 1.50 ms [03:12:57] RECOVERY - Host tools-mail is UP: PING OK - Packet loss = 0%, RTA = 0.88 ms [03:13:44] RECOVERY - Host tools-exec-10 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms [03:14:14] RECOVERY - Host tools-webgrid-04 is UP: PING OK - Packet loss = 0%, RTA = 66.99 ms [03:14:14] RECOVERY - Host tools-exec-04 is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms [03:14:16] RECOVERY - Host tools-webgrid-03 is UP: PING OK - 
Packet loss = 0%, RTA = 0.55 ms [03:14:22] RECOVERY - Host tools-exec-08 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms [03:14:32] RECOVERY - Host tools-exec-11 is UP: PING OK - Packet loss = 0%, RTA = 0.44 ms [03:14:34] RECOVERY - Host tools-webproxy-01 is UP: PING OK - Packet loss = 0%, RTA = 1.28 ms [03:14:38] RECOVERY - Host tools-trusty is UP: PING OK - Packet loss = 0%, RTA = 1.37 ms [03:14:40] RECOVERY - Host tools-exec-01 is UP: PING OK - Packet loss = 0%, RTA = 0.71 ms [03:15:31] RECOVERY - Host tools-exec-12 is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms [03:20:02] RECOVERY - Host tools-exec-cyberbot is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms [03:20:06] RECOVERY - Host tools-webgrid-02 is UP: PING OK - Packet loss = 0%, RTA = 0.57 ms [06:35:40] PROBLEM - Puppet failure on tools-webgrid-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [06:37:04] PROBLEM - Puppet failure on tools-login is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0] [07:00:42] RECOVERY - Puppet failure on tools-webgrid-02 is OK: OK: Less than 1.00% above the threshold [0.0] [07:07:08] RECOVERY - Puppet failure on tools-login is OK: OK: Less than 1.00% above the threshold [0.0] [08:38:25] Hi, I’ve discovered that the MediaWiki-Vagrant role for Parsoid only adds the primary wiki to Parsoid. I need it to also work on a secondary wiki. 
[08:39:12] I’m trying to figure out how to change this – but I can’t determine how to, from the localsettings.js.erb template, get a list of wikis in the multiwiki setup [08:39:17] besides this horrible hack: https://groups.google.com/forum/?fromgroups=#!topic/puppet-users/xOY-oPkS5Uw [08:39:28] maybe YuviPanda knows something here, he’s the puppetmaster (haw haw) [08:41:24] it sounds like I want a Resource Collector (“Mediawiki::Wiki <| |>”), but apparently you can’t pass that to a template [08:48:48] 10Wikimedia-Labs-wikitech-interface, 6operations, 7HTTPS: wikitech.wikimedia.org SSL certificate considered "outdated security" in Chrome - https://phabricator.wikimedia.org/T92709#1147929 (10Krinkle) >>! In T92709#1118731, @Dzahn wrote: > this should be T73156 (SHA1 needs to be replaced with a SHA256 cert)... [09:19:46] 10Wikimedia-Labs-wikitech-interface, 6operations, 7HTTPS: wikitech.wikimedia.org SSL certificate considered "outdated security" in Chrome - https://phabricator.wikimedia.org/T92709#1147996 (10yuvipanda) p:5Triage>3Normal [09:25:35] RECOVERY - Puppet failure on tools-trusty is OK: OK: Less than 1.00% above the threshold [0.0] [09:35:52] 6Labs, 10Wikimedia-Labs-Infrastructure: graphite.wmflabs.org no longer purges data for deleted instances - https://phabricator.wikimedia.org/T93861#1148054 (10Krinkle) 3NEW [09:36:22] 6Labs, 10Wikimedia-Labs-Infrastructure: graphite.wmflabs.org no longer purges data for deleted instances - https://phabricator.wikimedia.org/T93861#1148064 (10Krinkle) [09:42:15] 6Labs, 10Wikimedia-Labs-Infrastructure: graphite.wmflabs.org no longer purges data for deleted instances - https://phabricator.wikimedia.org/T93861#1148088 (10yuvipanda) Yeah, I reverted my fix because there were bugs in that script and it ended up filling the disk with 'archived' data. See Id5f026abe1ac2de99d... 
[09:42:42] 6Labs, 10Wikimedia-Labs-Infrastructure: graphite.wmflabs.org no longer purges data for deleted instances - https://phabricator.wikimedia.org/T93861#1148096 (10yuvipanda) Also you should use the wikitech API (wikitech.wikimedia.org/w/api.php?action=query&list=novainstances&niproject=deployment-prep&niregion=eqi... [09:43:29] 6Labs, 10Wikimedia-Labs-Infrastructure: graphite.wmflabs.org no longer purges data for deleted instances - https://phabricator.wikimedia.org/T93861#1148101 (10Krinkle) Also, this is not limited to entire instances. It also applies to individual data points. For example, this graph for disk space on integration... [09:44:12] 6Labs, 10Wikimedia-Labs-Infrastructure: graphite.wmflabs.org no longer purges data for deleted instances - https://phabricator.wikimedia.org/T93861#1148107 (10Krinkle) >>! In T93861#1148096, @yuvipanda wrote: > Also you should use the wikitech API (wikitech.wikimedia.org/w/api.php?action=query&list=novainstanc... [09:58:23] No webservice https://tools.wmflabs.org/catscan2/ [09:59:16] Nemo_bis: just restarted it. [10:01:17] Thanks [10:49:52] RECOVERY - Puppet failure on tools-dev is OK: OK: Less than 1.00% above the threshold [0.0] [11:52:45] Can someone tell me how to get around the error "(network.c.358) can't bind to port: 14002 Address already in use" when starting the webservice? [11:53:11] Setting a port in lighttpd makes the service start, but nothing is displayed and an error about duplicate keys is displayed [11:53:41] Sumurai8: heya! which tool is this? [11:53:51] Sumurai8: and what command are you using to start the webservice? [11:54:08] webservice start to start it [11:54:12] and this is about wikilinkbot [11:54:28] 6Labs, 10Wikimedia-Labs-Infrastructure: graphite.wmflabs.org no longer purges data for deleted instances - https://phabricator.wikimedia.org/T93861#1148382 (10yuvipanda) p:5Triage>3Low (Needsvolunteer :)). 
Also replacing txstatsd might help [11:54:51] I also don't seem to be able to display anything via tools-static.wmflabs.org instead [11:56:27] So it's unclear to me what directory I have to make readable to get that working. It doesn't seem to be ~/www/static and it doesn't seem to be ~/public_html/static or ~/public_html/www/static [11:56:40] Sumurai8: ah, looks like you need to make your tool’s home directory accessible [11:56:44] drwxrws--- 8 tools.wikilinkbot tools.wikilinkbot 4096 Mar 25 11:32 wikilinkbot [11:56:52] needs a x for others there [11:58:36] why an x? [11:58:42] What does it need to execute? [11:59:17] Sumurai8: doing +x for directories means that they can traverse the directory [11:59:29] http://unix.stackexchange.com/questions/21251/why-do-directories-need-the-executable-x-permission-to-be-opened [11:59:55] I see [11:59:56] Sumurai8: alright, so I did ‘webservice2 start’ for you and that works :) [12:00:04] let me file a bug about the other issue. [12:00:16] 10Tool-Labs: Memory Exhausted Near / Tool labs error while querying with Python - https://phabricator.wikimedia.org/T93074#1148392 (10marcmiquel) Dear Springle, Thanks for your comments. I will check these things you say. I want to run a query to enwiki and I am afraid it will break (It has run for 7 hours by... [12:00:24] Thanks [12:01:18] 10Tool-Labs: Webservice start failing with duplicate port allocation from portgranter - https://phabricator.wikimedia.org/T93875#1148393 (10yuvipanda) 3NEW [12:01:25] Sumurai8: ^ [12:06:06] 6Labs, 10hardware-requests, 6operations: Replace virt1000 with a newer warrantied server - https://phabricator.wikimedia.org/T90626#1148421 (10faidon) I don't think that the "under warranty" bit is a dealbreaker. The point of my comment on IRC is that we should be prepared for a catastrophic event for one th... 
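A minimal sketch of the directory-permission point made above (11:56 to 11:59), in Python. It does not touch any real tool directory: it creates a throwaway directory with mode rwxrwx--- (the setgid bit from the listing is left out for portability), then adds the execute ("search") bit for others, which is what lets a process running as a different user traverse into the directory. The function names `others_can_traverse` and `allow_traversal` are made up for illustration.

```python
import os
import stat
import tempfile


def others_can_traverse(path):
    """Return True if the 'others' class has the x (search) bit on path."""
    return bool(os.stat(path).st_mode & stat.S_IXOTH)


def allow_traversal(path):
    """Rough equivalent of `chmod o+x path`: add only the search bit for others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXOTH)


def demo():
    """Create a directory without o+x, add it, and report before/after."""
    with tempfile.TemporaryDirectory() as parent:
        tool_home = os.path.join(parent, "wikilinkbot")  # stand-in directory
        os.mkdir(tool_home)
        os.chmod(tool_home, 0o770)  # like drwxrwx---: no access for others
        before = others_can_traverse(tool_home)
        allow_traversal(tool_home)
        after = others_can_traverse(tool_home)
        return before, after
```

Adding only the x bit (and not r) means other users can reach files inside whose names they already know, without being able to list the directory.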
[12:08:12] 6Labs, 10hardware-requests, 6operations: Replace virt1000 with a newer warrantied server - https://phabricator.wikimedia.org/T90626#1148424 (10yuvipanda) +1 on having a hot spare. I remember the close-to-heart-attack several people got when we thought virt1000's motherboard had fried when only one of the lig... [13:50:07] hi. is there a way to start/stop a webservice from a grid job? [13:51:17] I tried calling |webservice2 uwsgi-python stop| from the script that runs on the grid, but that just prints out |webservice2: not found| [13:52:13] ggp: hey! afaik, no there is no way - grid job nodes aren’t submit hosts, so you can’t manipulate jobs from them [13:53:09] YuviPanda: oh :( ok, I'll see what I can do here [13:53:22] ggp: what are you using that for, btw? [13:54:46] YuviPanda: my grid job generates a sqlite3 db that's used by my tool. when the new database is ready, I'd like to stop the service, mv it over the old one, and restart it [13:55:08] ah, I see. [13:55:14] YuviPanda: I can probably just dump the new database and import it into the old in a single transaction, but that's a bit harder than just mv :) [13:55:17] you can do some stat based magic to see mtime, I guess [13:55:27] well, the ‘mv’ is also over NFS, and I’m not sure how atomic that is :) [14:12:14] 10Wikimedia-Labs-Infrastructure, 5Patch-For-Review: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1148826 (10coren) p:5Triage>3Normal My original approach was to tackle this at the prod DNS level, which had issues. Now that we are nearing having a read DNS server for labs, this is... [14:12:57] 10Wikimedia-Labs-Infrastructure: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1148828 (10coren) [14:13:11] 10Wikimedia-Labs-Infrastructure: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1148829 (10yuvipanda) Another option is to point all *.labsdb to localhost, and then put a mysql proxy there that intelligently routes things. 
This lets us dynamically adjust weights, and even do failover very... [14:18:38] 10Wikimedia-Labs-Infrastructure: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1148847 (10coren) That //adds// a moving part. I'm pretty sure that's not something we should be gunning for. [14:20:04] 10Wikimedia-Labs-Infrastructure: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1148848 (10yuvipanda) Oh totally :) I just remember @springle asking (a long time ago) for an easy way to shift traffic around easily. However, I just realized that if we use DNS properly, *that* will work fi... [14:21:44] 10Tool-Labs: watchlist table not available on labs - https://phabricator.wikimedia.org/T59617#1148857 (10coren) p:5Triage>3Normal Counts only with a minimum is okay, but the performance will not be extraordinary. This should not be an issue in practice, however. [14:27:12] 6Labs: dhclient overwrites /etc/resolv.conf - https://phabricator.wikimedia.org/T93691#1148868 (10coren) p:5Low>3High Ew. Forgot that labs actually use dhclient unlike prod. This will need to be fixed before we can switch to designate. 
[14:30:00] YuviPanda: fwiw, I think I'll go with |sqlite3 old.sqlite3 '.restore new.sqlite3'|, which basically does this: https://www.sqlite.org/backup.html [14:30:05] and looks safe enough I guess [14:35:43] ggp: yeah, fair enough :) [14:39:36] 10Tool-Labs: Replicate watchlist to labs - https://phabricator.wikimedia.org/T93887#1148888 (10coren) 3NEW a:3Springle [14:40:00] 6Labs: Replicate watchlist to labs - https://phabricator.wikimedia.org/T93887#1148896 (10coren) [14:40:32] 6Labs: watchlist table not available on labs - https://phabricator.wikimedia.org/T59617#1148899 (10coren) a:5coren>3None [14:43:46] 10Tool-Labs: Harden mail server against incoming spam - https://phabricator.wikimedia.org/T67629#1148912 (10coren) p:5Triage>3Low a:5coren>3None [14:44:43] 10Tool-Labs: Harden mail server against incoming spam - https://phabricator.wikimedia.org/T67629#691141 (10coren) Running Spamassasin (or an equivalent) on the MX is a possibility, but the risk of false positives remains. This requires further evaluation of possible solutions. [14:46:04] 6Labs, 10Wikimedia-Labs-Infrastructure: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1148926 (10coren) [14:48:25] 6Labs: watchlist table not available on labs - https://phabricator.wikimedia.org/T59617#1148943 (10Harej) Will it necessarily need to be real-time? My use case, at least, would not check more frequently than once per week. Or would it be too slow for even *that*? [14:51:25] 6Labs: watchlist table not available on labs - https://phabricator.wikimedia.org/T59617#1148958 (10coren) The way our replication is setup it is considerably harder to do anything /but/ normal live replication. Then again, when I speak of lackluster performance I mean that the implicit group by clause in the vi... [14:51:44] werdna: Puppet resource collectors don't really do what anyone thinks they do in my experience. 
Probably what we need to do for the mw-v parsoid is add some javascript to the config that can read in a collection of config files and make each ::mediawiki::wiki invocation add a file to setup that wiki. [14:52:08] bd808: nod, I did it a horrible way :P [14:52:34] Resource collectors really send data back to the puppet master that can be sent back out to other hosts. They really don't track things on the local machine [14:52:42] bd808: read it and weep: https://gerrit.wikimedia.org/r/#/c/199591/1/puppet/modules/mediawiki/templates/parsoid.localsettings.js.erb [14:53:10] and they make things non-deterministic from run to run which really sucks [14:53:17] nod [14:53:44] heh. that will work I suppose [14:53:50] well, not quite [14:54:02] It blows up if you make a wiki with any other server_url [14:54:17] ugh yeah [14:54:17] but I guess it’s better than the status quo [14:55:03] so we need a parsoid equivalent of settings.d/wikis/*/wgConf.php [14:55:10] right [14:55:13] or… something :D [14:55:50] I dunno, I also didn’t really want to pollute the mediawiki::wiki class with parsoid related stuff [14:55:53] to me that is a design failure [14:56:07] puppet is a design failure :) [14:56:11] probably ideally we could just install a lightweight script that prints all wikis and their settings which could be queried from NodeJS [14:56:46] *nod* Shouldn't be too hard to make some sort of json config dump I guess [14:58:17] 10Tool-Labs, 7Tracking: Missing Toolserver features in Tools (tracking) - https://phabricator.wikimedia.org/T60791#1148967 (10coren) a:5coren>3None [14:58:39] Something that runs puppet/modules/mediawiki/templates/multiwiki/LoadWgConf.php.erb and then dumps the $wgConf internal state maybe [14:58:53] does that include script path etc? 
[14:59:04] yeah [14:59:12] it's got all the per-wiki config in there [14:59:15] 6Labs: centralauth_p is missing tables - https://phabricator.wikimedia.org/T68533#1148982 (10coren) p:5Triage>3Low a:5coren>3None [15:12:49] 10Tool-Labs: Document how to turn shadow into master - https://phabricator.wikimedia.org/T91133#1149072 (10coren) I'm not sure what could be added to https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Admin#Redundancy (I've added a note to use `service` to start/stop the master)? [15:14:16] 10Tool-Labs: Random "can't get password entry for user "tools.liangent-py". Either the user does not exist or NIS error!" error - https://phabricator.wikimedia.org/T71529#1149077 (10coren) 5Open>3Resolved No activity on the task for over a month; presuming the issue was transient. [15:15:12] 10Tool-Labs: Can't send email from tools-exec-07, -14 or -15 - https://phabricator.wikimedia.org/T73097#1149085 (10coren) 5Open>3Resolved New instances no longer have the `/var/log` issue because the building process has changed. [15:17:43] 10Tool-Labs, 7Puppet: Puppetize adding new node to OGE - https://phabricator.wikimedia.org/T88712#1149092 (10coren) p:5Triage>3Low Filing this in the "would be nice to have, lots of work" category for now. 
[15:18:40] 10Tool-Labs, 7Puppet: Puppetize adding new node to OGE - https://phabricator.wikimedia.org/T88712#1149096 (10coren) a:5coren>3None [15:20:16] 10Tool-Labs, 7Puppet: Fully puppetize Grid Engine (Tracking) - https://phabricator.wikimedia.org/T88711#1149098 (10coren) a:5coren>3None [15:20:37] 10Tool-Labs, 5Patch-For-Review, 7Puppet: Puppetize adding a host to a particular queue - https://phabricator.wikimedia.org/T88713#1149102 (10coren) p:5Triage>3Low a:5coren>3None [15:21:44] 10Tool-Labs: Tool Labs: jsub starts multiple instances of tasks declared as "once" - https://phabricator.wikimedia.org/T62862#1149106 (10coren) [15:21:45] 10Tool-Labs: -once is not checked correctly by jstart randomly - https://phabricator.wikimedia.org/T60145#1149104 (10coren) [15:23:06] 10Tool-Labs: Tool Labs: jsub starts multiple instances of tasks declared as "once" - https://phabricator.wikimedia.org/T62862#1149109 (10coren) p:5Triage>3High a:5coren>3None [15:24:00] 10Tool-Labs: Tool Labs: jsub starts multiple instances of tasks declared as "once" - https://phabricator.wikimedia.org/T62862#1149112 (10coren) Timing shows this indeed happens in cases of NFS stalls or other systemic congestion (such as the recent hardware outages). Adding locking to jstart should fix this for... [15:29:44] 10Tool-Labs: Cannot start java processes using the grid engine - https://phabricator.wikimedia.org/T69588#1149118 (10coren) I'm unable to reproduce the issue; I can start a JVM without difficulty provided enough memory is requested. There are some numbers on that topic at https://wikitech.wikimedia.org/wiki/He... [15:29:52] 10Tool-Labs: Cannot start java processes using the grid engine - https://phabricator.wikimedia.org/T69588#1149120 (10coren) p:5Triage>3Normal [16:20:59] (03PS1) 10John F. Lewis: add sockets directory [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/199626 [16:21:42] (03CR) 10John F. 
Lewis: [C: 032 V: 032] add sockets directory [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/199626 (owner: 10John F. Lewis) [16:38:25] 10Quarry: Database dump for analysis - https://phabricator.wikimedia.org/T93907#1149417 (10Halfak) 3NEW [16:40:07] 10Quarry: Database dump for analysis - https://phabricator.wikimedia.org/T93907#1149427 (10yuvipanda) Hmm, so *this* would be somewhat hard to actually produce. The internal DB format is in https://github.com/wikimedia/analytics-quarry-web/blob/master/tables.sql, and I am not sure off the top of my head how to... [16:55:22] 6Labs, 6Phabricator, 5Patch-For-Review, 7Puppet: Disable by default Phabricator alternate file domain on Labs - https://phabricator.wikimedia.org/T93837#1149454 (10Negative24) [16:58:32] (03PS1) 10John F. Lewis: connect to metawiki.labsdb not ruwiki.labsdb [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/199635 [16:59:09] (03CR) 10John F. Lewis: [C: 032 V: 032] "makes more sense and matches new set up config" [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/199635 (owner: 10John F. Lewis) [16:59:30] 6Labs: centralauth_p is missing tables - https://phabricator.wikimedia.org/T68533#1149478 (10Legoktm) p:5Low>3Normal @coren: So, if you're not planning to work on this, who else should it be assigned to? [17:19:41] 6Labs, 10Wikimedia-Labs-Infrastructure: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1149543 (10scfc) At Toolserver, all connections went through a central HA load-balancing proxy, and inter alia because that HA proxy had to be rebooted daily (!) and took down long-running queries with... [17:20:35] 6Labs: centralauth_p is missing tables - https://phabricator.wikimedia.org/T68533#1149545 (10yuvipanda) Looks like nobody is planning on working on this atm, so is assigned to nobody :) @legoktm If this is important enough to you, we can huddle together for an hour next week and get this done. 
[17:21:30] 6Labs, 10Wikimedia-Labs-Infrastructure: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1149548 (10yuvipanda) Yes, please ignore my comment about the proxy :) I realized when writing my second comment that DNS will give us what we want (specifically, the ability to slowly drain connection... [17:24:05] 6Labs, 10Wikimedia-Labs-Infrastructure: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1149563 (10yuvipanda) However, note that if we *did* have a proxy, it wouldn't be a SPOF at all - it would run on each exec / submit host, and would be running n localhost only. So it would go down onl... [17:37:27] twentyafterfour: Are there some production specific configs in the security ext that would need to be changed for phab-02? [17:37:44] I've already populated the projects [17:49:29] should be no [17:49:49] why do you ask? [17:56:27] Negative24: you need projects matching the names of the projects in production. create one named operations and one named security... I should make that configurable somewhere [17:57:20] twentyafterfour: is there a place to see output from the Security Event listeners? or other php files? [17:58:08] Negative24: you can put phlog() calls in the code and they show up in darkconsole (enable that in the developer settings in phab) [17:58:23] hit ` key to show the console once it's enabled [17:59:33] I believe it is already enabled on labs. thanks [18:25:45] Coren: I’m mired in the many complications of switching our domain scheme (mostly having to do with puppet.) It occurs to me that I could probably just write a designate driver that creates multiple dns entries for each host — new school, old school, ec2id-school. [18:26:02] Does that strike you as good, or bad? It means that there would still be some weird behaviors for duplicate hostnames. [18:26:14] But it would make the change to designate largely invisible to existing users. 
[18:26:21] Duplicate hostnames shouldn't be an issue provided their A records all agree. [18:26:39] Oh, I should explain. [18:26:42] And yes, that would make things invisible for most things. [18:27:09] If I create ‘foo’ in the ‘bar’ project and later you create ‘foo’ in the ‘baz’ project [18:27:23] We’d have two correct records: foo.bar.eqiad.wmflabs and foo.baz.eqiad.wmflabs [18:27:29] But probably just one foo.eqiad.wmflabs record [18:27:33] which would point to… one or the other [18:27:40] that’s what I mean about duplicate hostnames. [18:27:43] Ah. Right. [18:27:45] We already have that problem, of course. [18:27:49] * Coren ponders. [18:28:57] That should still be "mostly" harmless for the general case: given that instances will find hostnames in their project first, the only "real" issue would be conflicting names between projects that are being used to reach from another one. [18:29:13] yeah [18:29:15] The i-* hostnames are globally unique of course. [18:29:24] I’m pretty sure it would still be much better. [18:29:24] but we should probably prevent that anyway, at least for now, until the ‘old style’ hostnames go away... [18:29:27] they aren’t prevented atm. [18:29:40] IMO that's not a concern. Tell people to always and only use the qualified names when they mean some instance in another project. [18:29:55] And, arguably, that's very rare in the first place. [18:31:13] +1 [18:31:32] I’ll see if I can write a custom driver then - if it’s easy, this seems the less painful route. [18:31:58] It's certainly the route that will cause the fewest things to mysteriously break. [18:32:38] YuviPanda: Also, are you still here? [18:32:40] as you might expect, self-hosted puppet breaks in a million ways when you change domains :) [18:32:50] It might be that /nothing/ else breaks but it’ll be hard to tell [18:32:58] Coren: on and off. 
sorting out staying places, so on atm :) also looking through forms to fill, etc [18:33:00] without blindly merging changes into production [18:33:40] andrewbogott: Self-hosted puppet is... the evil. [18:33:50] yep! [18:34:13] 10Tool-Labs: jsub: add a -quiet option when using -once - https://phabricator.wikimedia.org/T59596#1149906 (10coren) p:5Triage>3Low a:5coren>3None [18:36:11] 10Tool-Labs: Upgrade git-review to >= 1.22 - https://phabricator.wikimedia.org/T65243#1149929 (10coren) 5Open>3Resolved Trusty (the current default, and also the release on the bastions) packages 1.23. [18:38:16] 10Tool-Labs: set jsub (jstart qcronsub) parameters by environment variable - https://phabricator.wikimedia.org/T64156#1149943 (10coren) p:5Triage>3Low There is a patch in need of some review love to provide a .jsubrc for defaults; perhaps that does the trick? (T56054) [18:40:23] 10Tool-Labs: Make tools-mail route mail for @tools-*.pmtpa.wmflabs correctly - https://phabricator.wikimedia.org/T63484#1149955 (10coren) p:5Triage>3Normal a:5coren>3None [18:42:58] 10Tool-Labs: Audit security groups - https://phabricator.wikimedia.org/T62144#1149982 (10coren) 5Open>3declined With no comment on the relevance of an audit in general, the security groups include the allow from source rule which lets traffic through when it comes from the right source. That applies //befor... [18:44:32] 6Labs, 5Patch-For-Review: Write a custom designate-sink handler - https://phabricator.wikimedia.org/T93928#1149990 (10Andrew) 3NEW [18:44:41] 10Tool-Labs: Set up alerts for mail queue - https://phabricator.wikimedia.org/T60871#1149997 (10coren) p:5Triage>3Normal a:5coren>3None [18:45:12] 10Tool-Labs: Set up alerts for mail queue - https://phabricator.wikimedia.org/T60871#606525 (10coren) Queue length is now monitored; but (afaict) there is no alerting. Changed topic accordingly. 
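The multiple-records idea discussed at 18:25 to 18:31 can be sketched roughly as follows, in Python. This is not the designate driver itself and uses no designate API; it only models the naming. Everything concrete here is an assumption for illustration: the helper names `records_for` and `build_zone`, the instance ids and IP addresses, the exact form of the ec2id-style name, and the "last writer wins" behaviour for the short name (the log only says it would point to "one or the other").

```python
def records_for(name, project, ec2_id, ip, domain="eqiad.wmflabs"):
    """Return the three A-record names proposed for one instance."""
    return {
        f"{name}.{project}.{domain}": ip,  # new school: project-qualified
        f"{name}.{domain}": ip,            # old school: may collide across projects
        f"{ec2_id}.{domain}": ip,          # ec2id school: assumed globally unique
    }


def build_zone(instances):
    """Merge records for all instances and report names claimed by two IPs."""
    zone = {}
    conflicts = []
    for inst in instances:
        for fqdn, ip in records_for(**inst).items():
            if fqdn in zone and zone[fqdn] != ip:
                conflicts.append(fqdn)
            zone[fqdn] = ip  # assumption: the most recently created instance wins
    return zone, conflicts


# Hypothetical instances: the same short name 'foo' in two projects.
EXAMPLE = [
    {"name": "foo", "project": "bar", "ec2_id": "i-00000001", "ip": "10.0.0.1"},
    {"name": "foo", "project": "baz", "ec2_id": "i-00000002", "ip": "10.0.0.2"},
]
```

With these inputs only the unqualified `foo.eqiad.wmflabs` shows up as a conflict, which matches the advice in the log to use qualified names when referring to an instance in another project.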
[18:45:57] 10Tool-Labs: Provide replication lag as a database function - https://phabricator.wikimedia.org/T50628#1150016 (10coren) p:5Triage>3Low a:5coren>3None [18:47:19] 10Tool-Labs: imagemapedit folder on http://tools.wmflabs.org does not open hence problem accessing files - https://phabricator.wikimedia.org/T75361#1150020 (10coren) 5Open>3Resolved Closing ticket because of inactivity. Reopen if the issue recurs. [18:50:20] 10Tool-Labs: qsub failed from jlocal crontab entry - https://phabricator.wikimedia.org/T73517#1150048 (10coren) 5Open>3Resolved Both cases report `unable to contact qmaster using port 6444 on host "tools-master.eqiad.wmflabs"` which is definitely a symptom of the intermitent DNS resolution issues we have bee... [18:50:43] 10Tool-Labs, 7Tracking: Toolserver migration to Tools (tracking) - https://phabricator.wikimedia.org/T60788#1150052 (10coren) a:5coren>3None [18:50:59] 10Tool-Labs, 7Tracking: Toolserver migration to Tools (tracking) - https://phabricator.wikimedia.org/T60788#597328 (10coren) p:5Triage>3Low [18:53:45] 10Tool-Labs, 7Tracking: Missing Toolserver features in Tools (tracking) - https://phabricator.wikimedia.org/T60791#1150067 (10coren) [18:53:47] 10Tool-Labs, 5Patch-For-Review: Tool Labs: Provide anonymized view of the user_properties table - https://phabricator.wikimedia.org/T60196#1150065 (10coren) 5Open>3Resolved The view exists, and is working. Whitelisting other properties is fairly easy, but requests for this should be made in separate tasks. [18:55:01] 10Tool-Labs: Database replicas: replicate user.user_touched - https://phabricator.wikimedia.org/T92841#1150088 (10coren) [18:57:31] YuviPanda: aww, can't use sqlite3 from grid jobs either :( [18:57:40] ggp: oh? why not? [18:58:07] YuviPanda: same thing: "sqlite3: not found" [18:58:22] ggp: You really, /really/ don't want to run sqlite3 on an NFS system anyways. [18:58:30] ggp: Why not use the actual databases? [18:58:40] ggp: ^ why not? we make mysql available... 
[18:59:13] Coren: yeah, I suppose I'll have to try that next. just trying solutions in increasing order of complexity :) [19:00:10] ggp: I think it's the first time that I hear someone find using an already installed and maintained database more complex than setting one up. :-) [19:03:01] using mysql requires more setup than sqlite [19:03:13] Coren: yeah, switching is probably for the best anyways. it's just that sqlite3 is what the tool currently uses (and used before being hosted on tools labs) [19:03:28] 10Tool-Labs: Can't delete NovaProxy instance with malformed DNS hostname - https://phabricator.wikimedia.org/T69927#1150115 (10coren) p:5Triage>3High a:5coren>3None [19:03:29] plus, its much easier to use sqlite locally and then copy over your already populated database to labs :P [19:03:50] 10Tool-Labs: Can't delete NovaProxy instance with malformed DNS hostname - https://phabricator.wikimedia.org/T69927#713661 (10coren) If this is still an issue, perhaps @Andrew has insight? [19:04:43] Maybe, but running a database on a networked filesystem is... an exercise in pain, frustration, and slowness. At the best of times. :-) [19:05:33] 10Tool-Labs: Sorting by CPU/VMEM columns doesn't sort by their value on http://tools.wmflabs.org/?status - https://phabricator.wikimedia.org/T69737#1150120 (10coren) p:5Triage>3Low a:5coren>3None [19:06:25] 10Tool-Labs: create procedure to show all granted permission to foreign user for own user databases - https://phabricator.wikimedia.org/T69552#1150126 (10coren) p:5Triage>3Low a:5coren>3None [19:06:47] 10Tool-Labs: create procedure to show all granted permission to foreign user for own user databases - https://phabricator.wikimedia.org/T69552#732619 (10coren) It is not clear to me that this is actually //desirable//. What use case do you perceive? 
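For readers following the sqlite-on-NFS exchange above: a minimal sketch of one way to keep SQLite's working copy off the networked filesystem, assuming the tool is written in Python (the log does not say) and only needs to read an already populated database. The file name `tool.db` and the `pages` table are hypothetical. Anything that writes would still be better served by the MySQL databases recommended in the chat.

```python
import os
import sqlite3
import tempfile


def load_into_memory(path):
    """Copy an on-disk SQLite database into an in-memory connection.

    After the copy, queries no longer touch the (possibly NFS-backed) file.
    Changes made to the in-memory copy are NOT written back to disk.
    """
    disk = sqlite3.connect(path)
    mem = sqlite3.connect(":memory:")
    try:
        disk.backup(mem)  # Connection.backup is available from Python 3.7
    finally:
        disk.close()
    return mem


def demo():
    # Build a small throwaway database file (hypothetical schema and data).
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "tool.db")
    seed = sqlite3.connect(path)
    seed.execute("CREATE TABLE pages (title TEXT)")
    seed.executemany("INSERT INTO pages VALUES (?)", [("Foo",), ("Bar",)])
    seed.commit()
    seed.close()

    # Load it into memory and query the in-memory copy.
    mem = load_into_memory(path)
    rows = mem.execute("SELECT title FROM pages ORDER BY title").fetchall()
    mem.close()
    return rows
```

Note that this uses Python's bundled `sqlite3` module, which is separate from the `sqlite3` command-line binary reported as "not found" on the grid nodes.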
[19:07:07] db='memory' :P [19:09:33] 10Tool-Labs: jsub -continuous not compatible with qalter -notify - https://phabricator.wikimedia.org/T67842#1150165 (10coren) 5Open>3declined `jsub` is specifically intended as a simplified interface to start jobs for users who do not need most of the facilities of using `qsub` directly. If you find yoursel... [19:11:38] (03PS1) 10Awight: Correct Fundraising project tag regex [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/199665 [19:12:22] 10Tool-Labs: running job with SGE doesn't get the right environment when using -once -continuous - https://phabricator.wikimedia.org/T67491#1150182 (10coren) 5Open>3declined [19:16:08] 10Tool-Labs: Crontab fixer should workaround bug 48811 when adding jsub call - https://phabricator.wikimedia.org/T66830#1150196 (10coren) 5Open>3declined Like the underlying issue (closed declined) it is not possible for a simple script to determine the proper amount of quoting to use because qsub will unquo... [19:17:07] 10Tool-Labs: lighttpd redirects URLs of directories without a trailing slash from https to http - https://phabricator.wikimedia.org/T66627#1150200 (10coren) 5Open>3Resolved This is no longer an issue since the redirection is handled by the proxy, now. [19:18:00] 6Labs: Let error responses pass through the proxy if they contain contents - https://phabricator.wikimedia.org/T66393#1150203 (10coren) a:5coren>3None [19:18:46] 6Labs: Let error responses pass through the proxy if they contain contents - https://phabricator.wikimedia.org/T66393#700815 (10coren) p:5Triage>3Normal @yuvipanda: is the current proxy setup able to do that? [19:27:17] 10Tool-Labs: lighttpd redirects URLs of directories without a trailing slash from https to http - https://phabricator.wikimedia.org/T66627#1150277 (10scfc) 5Resolved>3Open No, it is not: ``` [tim@passepartout ~]$ curl -I https://tools.wmflabs.org/typoscan HTTP/1.1 301 Moved Permanently Server: nginx/1.4.6 (... 
[19:31:49] 10Tool-Labs: lighttpd redirects URLs of directories without a trailing slash from https to http - https://phabricator.wikimedia.org/T66627#1150296 (10coren) Ah, hm. My understanding from the config was that this was (supposed to) be done by the proxy. @yuvipanda: is that a bug or expected behaviour? [19:32:26] 10Tool-Labs: lighttpd redirects URLs of directories without a trailing slash from https to http - https://phabricator.wikimedia.org/T66627#1150300 (10yuvipanda) The proxy should do it, but it doesn't atm and I haven't had time to do it properly yet. [19:32:53] 10Tool-Labs: lighttpd redirects URLs of directories without a trailing slash from https to http - https://phabricator.wikimedia.org/T66627#1150303 (10yuvipanda) Although, note that the new uwsgi services don't have this problem... [19:33:59] YuviPanda: That does very little for people using PHP, etc. :-) It's arguably a bug in lighty, but would be made irrelevant if the proxy did the redirect. [19:34:26] Coren: +1 on all points :) I was just curious if uwsgi did it right, and checked... [19:34:44] YuviPanda: protocol-relative? [19:34:56] Coren: yeah, tools.wmflabs.org/faces redirects to https properly [19:35:21] Coren: oh, it doesn’t actually.. [19:35:35] Coren: that’s just my httpseverywhere… >_> [19:35:47] Hah! [19:35:57] 10Tool-Labs: lighttpd redirects URLs of directories without a trailing slash from https to http - https://phabricator.wikimedia.org/T66627#1150329 (10yuvipanda) Scratch that, uwsgi also has this problem. It 'worked for me' because of httpseverywhere... 
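The redirect problem discussed above (T66627) can be checked mechanically, without a browser extension masking it. This is a rough sketch, not the project's own tooling: given the URL that was requested and the `Location` header that came back, it reports whether the redirect drops from https to http. The function name and the sample `Location` values are illustrative assumptions; the actual header from the truncated `curl -I` output in the log is not reproduced here.

```python
from urllib.parse import urljoin, urlsplit


def downgrades_https(request_url, location):
    """Return True if following `location` moves an https request to http.

    Relative and protocol-relative Location values are resolved against
    the request URL first, so they inherit its scheme.
    """
    target = urljoin(request_url, location)
    return (
        urlsplit(request_url).scheme == "https"
        and urlsplit(target).scheme == "http"
    )


# Illustrative values only (hypothetical Location headers):
request = "https://tools.wmflabs.org/typoscan"
print(downgrades_https(request, "http://tools.wmflabs.org/typoscan/"))  # absolute http URL
print(downgrades_https(request, "//tools.wmflabs.org/typoscan/"))       # protocol-relative
print(downgrades_https(request, "/typoscan/"))                          # path only
```

A check like this looks at the raw header, so it would not be fooled by a client-side rewrite such as the HTTPS Everywhere case mentioned in the chat.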
[19:36:53] 10Tool-Labs: Provide resource for db access in grid - https://phabricator.wikimedia.org/T70881#1150335 (10coren) p:5Triage>3Low a:5coren>3None [19:37:26] YuviPanda: moral of the story: force https everywhere instead of having httpseverything :p [19:37:29] 6Labs: lighttpd redirects URLs of directories without a trailing slash from https to http - https://phabricator.wikimedia.org/T66627#1150340 (10coren) a:5coren>3None [19:38:59] 6Labs: lighttpd redirects URLs of directories without a trailing slash from https to http - https://phabricator.wikimedia.org/T66627#1150346 (10scfc) a:3scfc I had intended to fix this in `nginx`; as I'll be diving into that for T88216 anyhow, licking the cookie. [19:40:46] 10Tool-Labs: Errors in e-mail pipes should go to local error log, not e-mail sender - https://phabricator.wikimedia.org/T72003#1150365 (10coren) p:5Triage>3Normal a:5coren>3None [19:47:47] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Thibaut120094 was created, changed by Thibaut120094 link https://wikitech.wikimedia.org/wiki/Nova+Resource%3aTools%2fAccess+Request%2fThibaut120094 edit summary: Created page with "{{Tools Access Request |Justification=Dev some webtools for the French Wikipedia. |Completed=false |User Name=Thibaut120094 }}" [19:49:14] 10Tool-Labs: Grid engine "swallows" quotation marks (double and single quotation marks) and does not recognize pages at cs.wiki (more at "Additional Information") - https://phabricator.wikimedia.org/T74092#1150408 (10coren) p:5Triage>3Normal If the result is the same when you run it via a wrapper script, the... [19:55:23] 10Tool-Labs: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1150429 (10yuvipanda) # Bastion Hosts ## SSH works ## CPU / Memory usage within sane levels, so interactive use is possible # Grid engine (one off jobs) ## Job starts executing... 
[19:56:14] 10Tool-Labs: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1142166 (10yuvipanda) [19:56:27] 10Tool-Labs: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1142166 (10yuvipanda) ^ what I could think of. Others? please add in comments. [19:56:46] Coren: ^ ‘services provided by toollabs’. Do comment if I’ve missed any. I’m emailing scfc as well. [19:57:02] hmm, I cc’d him, should be enough [19:58:41] 10Tool-Labs: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1150443 (10coren) Job start delays are probably a poor metric to measure unless you mean a specific, test job with well-constrained requirements. One of the things gridengine d... [19:59:56] YuviPanda: I'll need to have this running in the background for a while to see if I can think of anything missing - that looks fairly complete. [20:00:19] 10Tool-Labs: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1150446 (10yuvipanda) @Coren Oh yeah, totally. The way to measure a lot of these would be with references - have a reference test job, reference sql queries, reference webservic... [20:01:13] Coren: \o/ cool. [20:02:00] 10Tool-Labs, 3ToolLabs-Goals-Q4: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1150451 (10yuvipanda) [20:02:31] 10Tool-Labs, 3ToolLabs-Goals-Q4: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1150452 (10coren) Plausible additions: Grid Engine: check that the number of errored out jobs is under some threshold Grid Engine: check that the number of... 
[20:02:44] 10Tool-Labs, 3ToolLabs-Goals-Q4: Get rid of portgranter - https://phabricator.wikimedia.org/T93046#1150453 (10yuvipanda) [20:04:12] 10Tool-Labs, 3ToolLabs-Goals-Q4: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1150461 (10yuvipanda) [20:04:18] 10Tool-Labs: Grid engine "swallows" quotation marks (double and single quotation marks) and does not recognize pages at cs.wiki (more at "Additional Information") - https://phabricator.wikimedia.org/T74092#1150464 (10valhallasw) The "page doesn't exist" might actually be caused by T93474 (i/o is assumed to be la... [20:04:50] 10Tool-Labs, 3ToolLabs-Goals-Q4: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1142166 (10yuvipanda) [20:07:04] 10Tool-Labs: logging table appears to have different indexes from what vanilla MW does (e.g no type_action index) - https://phabricator.wikimedia.org/T74167#1150501 (10coren) 5Open>3Invalid Those indexes are available on the views with restricted rows, `logging_userindex` will have those incices (because it... [20:10:04] 10Tool-Labs: Move tools-login and tools-dev to trusty - https://phabricator.wikimedia.org/T91863#1150508 (10yuvipanda) tools-login moved :) [20:10:06] 10Tool-Labs: program created by proprietary compiler allowed on labs? - https://phabricator.wikimedia.org/T74253#1150509 (10coren) 5Open>3Resolved The conclusion is that the Terms of Use do not allow tools the source of which cannot be //effectively// reused because it depends on a specific proprietary toolc... 
[20:11:13] 10Tool-Labs: Status page should automatically refresh data - https://phabricator.wikimedia.org/T54275#1150512 (10coren) a:5coren>3None [20:12:21] 10Tool-Labs: Document how to install Python modules in a tool's home directory/virtual environment - https://phabricator.wikimedia.org/T63824#1150523 (10coren) a:5coren>3None Leaving for a volunteer virtualenv guru. [20:13:31] 10Tool-Labs: Replicate the Phabricator database to labsdb - https://phabricator.wikimedia.org/T52422#1150529 (10coren) a:5coren>3None [20:17:06] 10Tool-Labs, 5Patch-For-Review: Enable OpenJDK 8 - https://phabricator.wikimedia.org/T68171#1150533 (10coren) 5stalled>3declined While I can see the appeal of the latest-and-greatest, it is not reasonable for us to maintain a major revision of a JDK if it has no support from the distribution for the forese... [20:23:26] 10Tool-Labs: Move tools-login and tools-dev to trusty - https://phabricator.wikimedia.org/T91863#1150559 (10yuvipanda) tools-login is now tools-bastion-01, and maybe I can create tools-bastion-02, and *that* is dev, and they share ssh certificates and we can switch them around if needed (for redundancy!). Afte... [20:23:55] 10Tool-Labs: Move tools-login and tools-dev to trusty - https://phabricator.wikimedia.org/T91863#1150561 (10yuvipanda) Hmm, actually we probably need to provide a precise instance as long as we have tools running on precise. [20:24:24] 10Tool-Labs: Clean up list of projects on Tool Labs home page and add Tomcat tools - https://phabricator.wikimedia.org/T51937#1150571 (10coren) a:5coren>3None Giving up for grabs because I cannot realistically make that change in the short term. That said, the source used to generate those pages is avaliabl... [20:24:42] YuviPanda: https://gerrit.wikimedia.org/r/#/c/198535/ should be pretty easy :) [20:28:19] Negative24: indeed [20:28:34] :) [20:28:35] YuviPanda, do you know anything about varnish on betalabs? 
[20:28:37] not restarting on mobile03 [20:28:54] yurik: none at all, sorry [20:29:08] Negative24: done [20:29:13] thanks [20:29:14] np, YuviPanda, do you know who might? [20:29:22] yurik: uh, bblack? :) [20:29:47] yurik: ^demon|lunch is getting up to speed on it, I believe [20:30:08] <^demon|lunch> I started reading some manifests [20:30:12] <^demon|lunch> And then went to fix tin [20:30:15] bblack knows everything :) - like why i get this when restarting varnihs Error: (-smalloc) size "0G": too small, did you forget to specify M or G? [20:30:21] ^demon|lunch, ^ [20:33:15] heh [20:34:38] it's most likely related to the various refactors that happened in m/r/c.pp recently with the jessie switch, e.g.: [20:34:41] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/cache.pp#L865 [20:37:55] 6Labs, 3ToolLabs-Goals-Q4: Puppetize & fix tools-db - https://phabricator.wikimedia.org/T88234#1150644 (10yuvipanda) [20:38:46] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Make sure tools-db is replicated somewhere - https://phabricator.wikimedia.org/T88718#1150647 (10yuvipanda) [20:39:08] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Puppetize & fix tools-db - https://phabricator.wikimedia.org/T88234#1006649 (10yuvipanda) [20:39:28] 6Labs: Make sure tools-db is backed up in some form - https://phabricator.wikimedia.org/T88716#1150659 (10yuvipanda) [20:39:42] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Make sure tools-db is backed up in some form - https://phabricator.wikimedia.org/T88716#1018754 (10yuvipanda) [20:40:10] oh, it's the frontend storage that's broken in this case, the malloc [20:40:41] because the machine only has 2G of ram, and we do the math for the fraction of system memory to use for the frontend cache in whole GBs :) [20:41:24] y’know, if we’ve switched prod to jessie then deployment-prep should also switch. [20:45:05] probably :) [20:45:28] I'll add doing that to my list of things I'll get to next decade [20:56:32] chasemp: Would this work? 
https://gerrit.wikimedia.org/r/#/c/199690/1 [20:58:00] chasemp: nope. just a sec [20:59:50] chasemp: there [21:01:52] actually it looks like beta cache boxes are broken in multiple ways, just nobody ever updates/restarts or looks at puppet output [21:01:57] Starting nginx: nginx: [emerg] SSL_CTX_use_PrivateKey_file("/etc/ssl/private/star.wmflabs.org.key") failed (SSL: error:0B080074:x509 certificate routines:X509_check_private_key:key values mismatch) [21:02:14] something has gone wrong with priv/pub key mismatch there too [21:06:11] twentyafterfour: Could you also look at that change ^^ [21:06:56] Negative24: what is it you are trying to do? [21:07:06] I mean end goal [21:07:17] get security extension managed on phab-02 [21:07:20] syslog only goes back to march18, but puppet's been failing to start nginx since then [21:07:28] I guess nobody tests SSL on beta :P [21:08:17] Negative24: sure but why, that class as it stands there won't work for the next guy because as we noted there is extra configuration / required projects etc [21:08:24] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/PhantomTech was created, changed by PhantomTech link https://wikitech.wikimedia.org/wiki/Nova+Resource%3aTools%2fAccess+Request%2fPhantomTech edit summary: Created page with "{{Tools Access Request |Justification=Hosting my bot. The bot's code will likely not be made public because it is used to fight more persistent vandals on the English Wikipedi..." [21:08:43] so it is a partial configuration, so I'm asking why you are trying to get it on phab-02 because at the moment it doesn't need to go through puppet [21:08:52] and to do it right is more complicated than that change [21:09:07] so just configure manually on phab-02? [21:09:33] if we need it there but why do we need it there? [21:09:57] I have no objections but I'm trying to be more helpful than harmful and what you are doing I don't understand the why [21:11:36] we don't need it there. 
it's just a way to set up labs a little more like prod [21:16:32] in that case the right thing to do is to write a seed file that supplies the needed configuration and just have security extension be default in labs [21:18:49] 6Labs, 10Tool-Labs, 5Patch-For-Review, 3ToolLabs-Goals-Q4: Setup a redis slave for toollabs as backup / redundancy - https://phabricator.wikimedia.org/T91239#1150830 (10yuvipanda) [21:18:58] 10Tool-Labs: Move tools-login and tools-dev to trusty - https://phabricator.wikimedia.org/T91863#1150831 (10scfc) I think since we moved the `crontab`s from `tools-login`, there is no real compelling reason to differentiate between different bastions; users should be nice to each other everywhere :-). I still h... [21:19:04] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Have bigbrother run on multiple nodes to provide redundancy against tools-submit failure - https://phabricator.wikimedia.org/T91237#1150832 (10yuvipanda) [21:19:24] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Move toollabs instances around to minimize damage from a single downed virt* host - https://phabricator.wikimedia.org/T91072#1150834 (10yuvipanda) [21:19:37] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Set up a schedule for doing failover exercises for toollabs - https://phabricator.wikimedia.org/T91068#1150835 (10yuvipanda) [21:20:09] 6Labs, 10Tool-Labs, 5Patch-For-Review, 3ToolLabs-Goals-Q4: Retire 'tomcat' node, make Java apps run on the generic webgrid - https://phabricator.wikimedia.org/T91066#1150839 (10yuvipanda) [21:20:17] (03PS1) 10John F.
Lewis: web: change staff page [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/199722 [21:20:53] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Monitor bigbrother - https://phabricator.wikimedia.org/T90850#1150840 (10yuvipanda) [21:21:21] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4, 7Tracking: Make dumps syncing to Labs NFS reliable enough (Tracking) - https://phabricator.wikimedia.org/T90848#1150841 (10yuvipanda) [21:21:45] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4, 7Tracking: Replace bigbrother and ssh-cron-thingy with service manifests - https://phabricator.wikimedia.org/T90561#1150843 (10yuvipanda) Startin [21:21:56] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4, 7Tracking: Make sure that toollabs can function fully even with one virt* host fully down - https://phabricator.wikimedia.org/T90542#1150850 (10yuvipanda) [21:22:21] (03CR) 10Alpha: [C: 032 V: 032] web: change staff page [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/199722 (owner: 10John F. Lewis) [21:26:00] 6Labs, 10Tool-Labs: Define expected service level agreement for tools - https://phabricator.wikimedia.org/T90535#1150860 (10yuvipanda) 5Open>3Resolved a:3yuvipanda So T93622 is up for defining the actual services provided, and according to the ops quarterly goal for labs, uptime SLA is 99.5% (https://www... [21:26:01] 6Labs, 10Tool-Labs, 7Tracking: Make toollabs reliable enough (Tracking) - https://phabricator.wikimedia.org/T90534#1150865 (10yuvipanda) [21:26:15] 6Labs, 10Tool-Labs: Define expected service level agreement for tools - https://phabricator.wikimedia.org/T90535#1150866 (10yuvipanda) Note that exact details of what constitutes uptime, etc are happening in the other bug. [21:28:54] 10Tool-Labs: Move tools-login and tools-dev to trusty - https://phabricator.wikimedia.org/T91863#1150888 (10yuvipanda) ah, right. I wonder if toolwatcher should actually be on tools-master or something? I guess we can put that up on both tools-master and tools-shadow, so it has redundancy... 
[21:29:24] 10Tool-Labs: Support uwsgi web servers directly on toollabs - https://phabricator.wikimedia.org/T85202#1150889 (10yuvipanda) 5Open>3Resolved a:3yuvipanda webservice2 uwsgi-python start ;) [21:30:03] 10Tool-Labs: Investigate using monit to replace bigbrother - https://phabricator.wikimedia.org/T76840#1150893 (10yuvipanda) 5Open>3Invalid a:3yuvipanda Yeah, I guess monit doesn't really make much sense here. But we'll end up with something, and I wonder if that'll just be hand built as well... [21:31:13] chasemp: seed file? [21:31:18] 10Tool-Labs, 3ToolLabs-Goals-Q4: Make webservice2 default webservice implementation - https://phabricator.wikimedia.org/T90855#1150901 (10yuvipanda) [21:32:02] twentyafterfour: and I have talked about this, basically a .sql file that has a default user / password / initial projects / etc [21:32:10] that is loaded post vanilla install [21:32:12] 6Labs, 10Tool-Labs: Implement 'webservice2 status' - https://phabricator.wikimedia.org/T93560#1150905 (10yuvipanda) [21:32:14] 10Tool-Labs, 3ToolLabs-Goals-Q4: Make webservice2 default webservice implementation - https://phabricator.wikimedia.org/T90855#1069456 (10yuvipanda) [21:32:28] that + security extension default would be a complete picture [21:32:56] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Document labsdb replication set up - https://phabricator.wikimedia.org/T85868#1150906 (10yuvipanda) [21:33:26] chasemp: ah thanks [21:34:00] 10Tool-Labs, 3ToolLabs-Goals-Q4: The proxylistener service isn't puppetized - https://phabricator.wikimedia.org/T93121#1150909 (10yuvipanda) [21:34:11] 10Tool-Labs, 3ToolLabs-Goals-Q4: The portgranter service isn't puppetized - https://phabricator.wikimedia.org/T93120#1150918 (10yuvipanda) [21:39:26] 6Labs, 10Tool-Labs, 7Tracking: Make dumps syncing to Labs NFS reliable enough (Tracking) - https://phabricator.wikimedia.org/T90848#1150951 (10yuvipanda) [21:43:51] 6Labs, 10Tool-Labs: Document labsdb replication set up - 
https://phabricator.wikimedia.org/T85868#1150993 (10yuvipanda) [21:44:24] 6Labs, 10Wikimedia-Labs-Infrastructure, 3ToolLabs-Goals-Q4: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1150999 (10yuvipanda) [21:48:20] 10Tool-Labs, 3ToolLabs-Goals-Q4: Show replication lags in Graphite - https://phabricator.wikimedia.org/T50694#1151027 (10yuvipanda) [21:49:29] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Implement 'webservice2 status' - https://phabricator.wikimedia.org/T93560#1151030 (10yuvipanda) [21:51:47] 10Tool-Labs: Move tools-login and tools-dev to trusty - https://phabricator.wikimedia.org/T91863#1151052 (10yuvipanda) @scfc: So the benefits of having bastion-01 and bastion-02 is that when a virt node goes down we can just switch the floating IP as well (hmm, need to see if you can have two IPs refer to the sa... [21:52:59] 10Tool-Labs, 3ToolLabs-Goals-Q4: Make tools-login / bastion hosts redundant and move them to trusty - https://phabricator.wikimedia.org/T91863#1151069 (10yuvipanda) [21:54:30] 10Tool-Labs, 3ToolLabs-Goals-Q4: Set up alerts for mail queue - https://phabricator.wikimedia.org/T60871#1151081 (10yuvipanda) [21:54:54] 10Tool-Labs, 3ToolLabs-Goals-Q4: Make webservice2 activities blocking - https://phabricator.wikimedia.org/T93334#1151084 (10yuvipanda) [21:55:29] 10Tool-Labs, 3ToolLabs-Goals-Q4: Monitor that proxylistener is accepting new connections - https://phabricator.wikimedia.org/T91958#1151088 (10yuvipanda) [21:56:39] 10Tool-Labs, 3ToolLabs-Goals-Q4: Document / get rid of jobkill.pl - https://phabricator.wikimedia.org/T91233#1151094 (10yuvipanda) [21:58:10] 10Tool-Labs, 3ToolLabs-Goals-Q4: Make list.php not rely on portgranter - https://phabricator.wikimedia.org/T93197#1151103 (10yuvipanda) [21:58:46] 10Tool-Labs, 3ToolLabs-Goals-Q4: Provide a status page (list) of all active proxy definitions - https://phabricator.wikimedia.org/T88216#1151107 (10yuvipanda) [22:01:19] 10Tool-Labs, 3ToolLabs-Goals-Q4: Provide a status page 
(list) of all active proxy definitions - https://phabricator.wikimedia.org/T88216#1151137 (10yuvipanda) @scfc: The reason we have the socket setup is to prevent one particular class of race conditions that perhaps let one tool pretend to be another tool fo... [22:01:44] 10Tool-Labs, 3ToolLabs-Goals-Q4: Provide a status page (list) of all active proxy definitions - https://phabricator.wikimedia.org/T88216#1151142 (10yuvipanda) Of course, if we have a reliable way of executing code when the tool's webprocess ends, that changes everything:) [22:07:02] 10Tool-Labs, 3ToolLabs-Goals-Q4: Make tools-login / bastion hosts redundant and move them to trusty - https://phabricator.wikimedia.org/T91863#1151184 (10scfc) I //think// `toolwatcher` could run on two instances with low risk of race conditions. Regarding redundancy for bastions, sure, but I do remember vivi... [22:13:05] Looks like https://tools.wmflabs.org/catscan2/catscan2.php is down again [22:23:20] 10Tool-Labs, 3ToolLabs-Goals-Q4: Provide a status page (list) of all active proxy definitions - https://phabricator.wikimedia.org/T88216#1151250 (10scfc) Well, at the moment the stale proxy entries are up for grabs by anyone until an admin intervenes. I thought about putting the shutdown code in an SGE epilog... [23:50:10] twentyafterfour: If you have a min. could you see something? [23:50:46] Negative24: sure? [23:51:20] ok. I have configured load-libraries with sprint and security and updated the tables [23:51:26] on phab-02 [23:51:41] and I have placed a phlog in the register function in the listener [23:52:12] but I'm not seeing anything in the error logs nor any security enforcement actions [23:52:49] Negative24: ok I'll take a look [23:53:02] twentyafterfour: ok thanks [23:55:40] twentyafterfour: I feel like I did something really stupid and obvious [23:56:38] sudo bin/config set events.listeners '["SecurityPolicyEventListener"]' [23:56:50] (I just did that, to add the event listener) [23:57:19] hmm. 
I actually remember seeing that [23:58:19] restarting daemons... [23:58:27] not sure if that fixed it [23:58:55] daemons report that SecurityPolicyEventListener isn't a part of the phutil lib maps [23:59:19] that has to do with the way .arcconfig is set up