[00:34:36] Labs, Tool-Labs: allow tool users to attach strace to their processes (at least on exec hosts) - https://phabricator.wikimedia.org/T114401#1695903 (scfc)
[04:21:58] Labs, Tool-Labs: Initial Deployment of Kubernetes to Tool Labs (Tracking) - https://phabricator.wikimedia.org/T111885#1696090 (yuvipanda) tools.wmflabs.org/nagf is now running on kubernetes! \o/ So is grrrrit-wm.
[05:59:59] Labs, Tool-Labs, Patch-For-Review, labs-sprint-116: Allow direct ssh access to tools - https://phabricator.wikimedia.org/T113979#1696213 (yuvipanda)
[06:43:43] Labs, labs-sprint-116: Make labs domainproxies fully redundant - https://phabricator.wikimedia.org/T98556#1696269 (yuvipanda) a:yuvipanda
[06:44:12] Labs, Tool-Labs, Labs-Sprint-114, Labs-Sprint-115, and 2 others: Add support to dynamicproxy for kubernetes based web services - https://phabricator.wikimedia.org/T111916#1619805 (yuvipanda) tools.wmflabs.org/nagf is now running on kubernetes!!!! \o/ \o/ \o/ HI5
[09:30:51] Labs, Tool-Labs, ToolLabs-Goals-Q4, labs-sprint-116: Make sure tools-db is backed up in some form - https://phabricator.wikimedia.org/T88716#1696460 (jcrespo) Once T88718 is finished (most of the work has been done already), backups can be taken from the slave consistently. A slave replica will pre...
[09:51:37] Labs, Tool-Labs, Database, Labs-Q4-Sprint-1, and 5 others: Make sure tools-db is replicated somewhere - https://phabricator.wikimedia.org/T88718#1696524 (jcrespo) My apologies, @akosiaris, I've actually seen that ferm has been already applied here via the postgres role, so you are not blocking me.
[10:02:23] (CR) John Vandenberg: [C: 1] "tox.ini's env list is for CI, and the official support matrix, which usually go hand in hand." [labs/tools/forrestbot] - https://gerrit.wikimedia.org/r/242348 (owner: Hashar)
[10:17:25] Tool-Labs-tools-Erwin's-tools: [delete.php] Project filtering: select namespace, sort by number - https://phabricator.wikimedia.org/T114475#1696576 (Nemo_bis) NEW a:Nemo_bis
[10:17:32] Tool-Labs-tools-Erwin's-tools: [delete.php] Project filtering: select namespace, sort by number - https://phabricator.wikimedia.org/T114475#1696584 (Nemo_bis) Open>Resolved
[12:00:36] Labs, Tool-Labs, Labs-Sprint-114, Labs-Sprint-115, and 2 others: Add support to dynamicproxy for kubernetes based web services - https://phabricator.wikimedia.org/T111916#1696734 (Joe) What do we still need: - Download requests with pip3 (needs a pip3 provider or an ugly exec) - Provide the kube to...
[12:44:13] Labs, Tool-Labs, Patch-For-Review, labs-sprint-116: Allow direct ssh access to tools - https://phabricator.wikimedia.org/T113979#1696821 (scfc)
[13:24:02] Labs, Tool-Labs, ToolLabs-Goals-Q4, labs-sprint-116: Make sure tools-db is backed up in some form - https://phabricator.wikimedia.org/T88716#1696919 (scfc) >>! In T88716#1185933, @coren wrote: > @scfc: This is DR backups, not partially restorable backups. and I'd repeat my comment T88716#1181437:...
[13:44:40] yuvipanda: When you get a few minutes, can you give https://gerrit.wikimedia.org/r/#/c/239377/ a bit of tlc? :-)
[13:45:07] Labs, Tool-Labs, ToolLabs-Goals-Q4, labs-sprint-116: Make sure tools-db is backed up in some form - https://phabricator.wikimedia.org/T88716#1696957 (jcrespo)
[13:45:09] Labs, Tool-Labs, Database, Labs-Q4-Sprint-1, and 5 others: Make sure tools-db is replicated somewhere - https://phabricator.wikimedia.org/T88718#1696958 (jcrespo)
[13:46:47] Labs, Tool-Labs, ToolLabs-Goals-Q4, labs-sprint-116: Make sure tools-db is backed up in some form - https://phabricator.wikimedia.org/T88716#1018754 (jcrespo) Just a comment: I would make sure to announce the no-recovery guarantee to labs list (and intend to).
[14:25:23] (PS1) Andrew Bogott: Added dummy secrets/labstore/id_labstore to make puppet compiler happy. [labs/private] - https://gerrit.wikimedia.org/r/243183
[14:25:37] (CR) Andrew Bogott: [C: 2 V: 2] Added dummy secrets/labstore/id_labstore to make puppet compiler happy. [labs/private] - https://gerrit.wikimedia.org/r/243183 (owner: Andrew Bogott)
[14:29:35] (PS1) Andrew Bogott: Added dummy user/password passwords::mysql::labsdb [labs/private] - https://gerrit.wikimedia.org/r/243184
[14:30:01] (CR) Andrew Bogott: [C: 2 V: 2] Added dummy user/password passwords::mysql::labsdb [labs/private] - https://gerrit.wikimedia.org/r/243184 (owner: Andrew Bogott)
[14:56:11] Labs, Labs-Infrastructure, labs-sprint-117: Give 'novaobserver' keystone account rights to read everything, everywhere, write or change nothing - https://phabricator.wikimedia.org/T104588#1697105 (Andrew)
[15:02:35] (CR) Hashar: "recheck" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/241568 (owner: Jean-Frédéric)
[15:03:31] (CR) Hashar: "Fails with:" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/241568 (owner: Jean-Frédéric)
[15:33:14] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:17] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[16:09:47] Labs, Patch-For-Review, labs-sprint-116: Fix check_disk bogus alerts on labstore1002 - https://phabricator.wikimedia.org/T113435#1697419 (coren) The issue was annoyingly well-hidden: the check_disk nrpe option `-i` takes a //regular expression// as its argument, but the patch had provided a //glob// ins...
[16:10:03] Labs, Patch-For-Review, labs-sprint-116: Fix check_disk bogus alerts on labstore1002 - https://phabricator.wikimedia.org/T113435#1697420 (coren) Open>Resolved
[16:39:23] Labs, Labs-Infrastructure, Labs-Sprint-114: Ironic on Labs - https://phabricator.wikimedia.org/T110556#1697494 (chasemp) **Preamble** I fell down the rabbit hole on this Ironic and Neutron stuff and I want to relay my thoughts. I am repeating things we all know, but I want to put it all in context....
[16:48:45] Labs, Labs-Infrastructure, labs-sprint-116: Make sure nova is re-using old private IPs - https://phabricator.wikimedia.org/T113648#1697532 (Andrew) The schema says: schema.UniqueConstraint( "address", "deleted", name="uniq_fixed_ips0address0deleted"), So, that could explain why th...
[16:49:40] Labs, Labs-Infrastructure, labs-sprint-116: Make sure nova is re-using old private IPs - https://phabricator.wikimedia.org/T113648#1697533 (Andrew) The IP is allocated by this: fixed_ip_ref = model_query(context, models.FixedIp, session=session, read_deleted=...
[16:54:56] Labs, Database, Patch-For-Review: Delete dbname = 'centralauth' from meta_p.wikis table in labs replicas - https://phabricator.wikimedia.org/T101750#1697554 (Krenair) Open>Resolved
[17:02:30] Tool-Labs-tools-Article-request: Click targets for checkboxes should include the labels - https://phabricator.wikimedia.org/T114496#1697566 (APerson) NEW a:Matthewrbowker
[17:19:36] Labs, Tool-Labs: Rewrite the meta_p table populating code to python and have it run on a cron - https://phabricator.wikimedia.org/T107094#1697631 (Krenair)
[17:20:12] Labs, Tool-Labs: Rewrite the meta_p table populating code to python and have it run on a cron - https://phabricator.wikimedia.org/T107094#1486552 (Krenair) This might need a follow-up change, I seem to recall we found differences in the output. Might be better since the perl script has been re-run to get...
[17:55:54] Labs, Tool-Labs: Slow running query on cawiki.labsdb; incomplete/missing table stats? - https://phabricator.wikimedia.org/T114513#1697892 (Tb) NEW a:jcrespo
[17:56:02] Labs, Tracking: Labs project: popcorn - https://phabricator.wikimedia.org/T114514#1697901 (brion) NEW
[18:04:16] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:04:44] Labs, Labs-Infrastructure, operations: install/setup labservices1001 - https://phabricator.wikimedia.org/T106584#1697943 (Andrew) Open>Resolved
[18:04:59] Labs, Labs-Infrastructure, operations: install/setup labservices1001 - https://phabricator.wikimedia.org/T106584#1472236 (Andrew)
[18:06:24] Labs, Labs-Infrastructure, Labs-Sprint-111: Labs virt capacity expansion - https://phabricator.wikimedia.org/T107624#1697954 (Andrew)
[18:06:46] Labs, Tool-Labs: Slow running query on cawiki.labsdb; incomplete/missing table stats? - https://phabricator.wikimedia.org/T114513#1697955 (jcrespo) I think it should work now, can you confirm it?
``` mysql> EXPLAIN SELECT sug_orig_ns, sug_orig, sug_new_ns, sug_new -> FROM p50380g50491__rlrl_cawiki_...
[18:07:17] ^that is 10 minutes to read, fix and report results about a bug
[18:08:36] Labs, Labs-Infrastructure: Investigate keystone lockups - https://phabricator.wikimedia.org/T104884#1697961 (Andrew) Open>Invalid this hasn't happened in ages.
[18:08:53] Labs, Tool-Labs, Database: Slow running query on cawiki.labsdb; incomplete/missing table stats? - https://phabricator.wikimedia.org/T114513#1697965 (jcrespo)
[18:10:40] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#1697974 (Andrew)
[18:10:42] Labs, Tracking: Labs project: popcorn - https://phabricator.wikimedia.org/T114514#1697971 (Andrew) Open>Resolved a:Andrew Done. Don't forget to set up your services groups before you create instances :) -A
[18:14:12] Labs, Tool-Labs, Database: Slow running query on cawiki.labsdb; incomplete/missing table stats? - https://phabricator.wikimedia.org/T114513#1697991 (Tb) Perfect ta. Runtime for the query has reduced from ~840 hours to 0.89 seconds.
[18:17:01] Labs, Tool-Labs, Database: Slow running query on cawiki.labsdb; incomplete/missing table stats? - https://phabricator.wikimedia.org/T114513#1698000 (jcrespo) Open>Resolved > ~840 hours to 0.89 seconds. Lol. I will put it on my resume. Happy to help. Keep reporting issues you find! :-)
[18:25:16] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[18:46:07] Labs, Tool-Labs, Database: Another slow running query on - lvwiki.labsdb this time. - https://phabricator.wikimedia.org/T114519#1698113 (Tb) NEW a:jcrespo
[18:49:50] Labs, Tool-Labs, Database: Another slow running query on - lvwiki.labsdb this time. - https://phabricator.wikimedia.org/T114519#1698139 (jcrespo) Can you send a list of all wikis you potentially query? That way we avoid the back and forth and fix them all at once.
[18:57:14] Labs, Tool-Labs, Database: Another slow running query on - lvwiki.labsdb this time. - https://phabricator.wikimedia.org/T114519#1698206 (Tb) The tool generating these queries is configured for: enwiki, enwiktionary, enwikt, enwikq, ennews, dewiki, itwiki, frwiki, plwiki, eswiki, ruwiki, nlwiki, jawiki...
[19:14:40] Labs, Tool-Labs, Labs-Sprint-115, Patch-For-Review, labs-sprint-116: Attribute cache issue with NFS on Trusty - https://phabricator.wikimedia.org/T106170#1698250 (coren) Change is pushed; reboot of nodes required before it can take effect.
[19:17:20] Labs, Tool-Labs, Database: Another slow running query on - lvwiki.labsdb this time. - https://phabricator.wikimedia.org/T114519#1698253 (jcrespo) Open>Resolved That should be it: ``` *************************** 1. row *************************** id: 1 select_type: PRIMARY ta...
[19:19:55] (CR) Jean-Frédéric: "recheck" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/241568 (owner: Jean-Frédéric)
[19:24:46] Labs, Tool-Labs, Database: Another slow running query on - lvwiki.labsdb this time. - https://phabricator.wikimedia.org/T114519#1698275 (Tb) A vast improvement once again - many thanks.
[19:35:18] (CR) Merlijn van Deen: "recheck" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/241568 (owner: Jean-Frédéric)
[19:35:31] (CR) Merlijn van Deen: "(let's see if it likes me more? :/)" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/241568 (owner: Jean-Frédéric)
[19:39:35] ok let's see if i can set up my instance correctly the first time round on this project :D
[19:44:13] whoever is doing "SELECT * FROM image" (no where), please, don't. Use a dump.
[19:44:25] you are making the database slow for everyone else
[19:47:28] also, if you send your queries several times, thinking that some will go through, think again: they are more likely to be rejected. Thanks.
[20:05:15] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:08:46] Labs, Labs-Infrastructure, labs-sprint-116: Make sure nova is re-using old private IPs - https://phabricator.wikimedia.org/T113648#1698373 (Andrew) Open>Resolved After reading a lot of code, I'm mostly convinced that this is fine.
[20:50:57] You know maybe if you bothered using a query killer you wouldn't have these issues
[20:56:12] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [0.0]
[20:56:43] Dispenser: a more friendly tone would be appreciated.
[21:02:20] Dispenser: you just got unbanned. Adopting a more friendly tone would definitely be more appreciated.
[21:04:09] All those in favour of killing databases and going to gitipedia say aye!
[21:04:24] jynus: ^ ;)
[21:05:55] The ban wasn't exactly just
[21:11:19] dispenser you aren't helping yourself here.
[21:12:14] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[22:01:15] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:16:38] MediaWiki-extensions-OpenStackManager, Librarization, MediaWiki-extensions-Translate: Bring in spyc for OpenStackManager and Translate via composer - https://phabricator.wikimedia.org/T75945#1698838 (Reedy) >>! In T75945#1471245, @Nikerabbit wrote: > Just for FYI phpyaml is now the recommended librar...
[22:21:28] MediaWiki-extensions-OpenStackManager: Replace spyc with phpyaml in OSM - https://phabricator.wikimedia.org/T114539#1698852 (Reedy) NEW
[22:22:17] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:49:03] Labs, Discovery, Elasticsearch, labs-sprint-116: Replicate production elasticsearch indices to labs - https://phabricator.wikimedia.org/T109715#1698937 (chasemp) I allowed labs instances to talk to port 80 on this box as requested... ``` + term nobelium-elastic { + from { +...
[22:50:49] Is icinga.icinga.eqiad.wmflabs broken?
[22:51:10] Configure links don't work for me, trying to log in returns "Permission denied (publickey)."
[22:52:14] urgh, everything is showing blank now... guess I need to log out and in again
[22:52:31] I really need to debug that stupid bug at some point
[22:54:18] Okay so I can configure, no login
[22:54:31] Krenair: the instance might be dead
[22:57:13] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[23:03:22] Labs, Labs-Infrastructure, operations: add logrotate for designate logs - https://phabricator.wikimedia.org/T114544#1698962 (Dzahn)
[23:03:37] Labs, Labs-Infrastructure, operations: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1698966 (Dzahn)
[23:04:07] Labs, Labs-Infrastructure, operations: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1698955 (Dzahn)
[23:05:05] yuvipanda: fyi https://phabricator.wikimedia.org/T114544
[23:05:49] or andrewbogott:
[23:06:18] mutante: whoops.
[23:08:08] mutante: think itll survive the weekend?
[23:08:14] I'll check in a few hours
[23:09:23] yuvipanda: yes, i'll move the gzipped file to /srv
[23:09:29] then there's even a bit more
[23:09:37] mutante: yup! thanks :D
[23:11:04] Labs, Labs-Infrastructure, operations: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1698983 (Dzahn) moved the gzipped log to /srv which has lots of free space and is not used
[23:54:46] yuvipanda, so is it dead?
[23:55:17] Krenair: what is dead?
[23:55:45] I don't know, I was asking you if icinga.icinga.eqiad.wmflabs is dead
[23:56:04] oh
[23:56:12] it could be? I've never used that instance
[23:56:33] would you be able to check?
[23:57:39] sure
[23:58:37] Krenair: can't get in with root key
[23:58:39] I declare it dead
[23:58:52] heh, ok