[00:01:17] Labs, Beta-Cluster, Labs-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1629101 (Dzahn)
[00:15:33] andrewbogott, is wikitech behaving badly for you as well?
[00:16:03] ah, nope
[00:16:05] my bad
[00:16:11] thought login was broken
[00:20:46] helloo YuviPanda. I think we are very close to finishing the survey. just sent you an email about it. we can wrap it up tomorrow if you'll be around, or later tonight.
[00:23:32] leila: looking!
[00:48:35] Labs, Tool-Labs, Patch-For-Review: new labs host sends out "mpt raid status change" emails - https://phabricator.wikimedia.org/T104779#1629216 (scfc) stalled>Resolved a:scfc On a new Trusty instance, there was no `mpt-statusd` process, so I think this is resolved.
[00:51:56] leila: replied
[00:53:22] leila: <3 thank you very much!
[01:08:09] Labs, Shinken: Newly created instance is in ERROR state - https://phabricator.wikimedia.org/T111988#1629267 (scfc) Open>Resolved a:Andrew
[01:08:50] Labs, Shinken: Newly created instance is in ERROR state - https://phabricator.wikimedia.org/T111988#1621894 (scfc) (I didn't try to "salvage" the newly created instance, but did just create another one, and that went fine as usual.)
[01:19:50] andrewbogott: I created sentry-alpha4.sentry.eqiad.wmflabs during the openstack upgrade (sorry about that), and it seems to be stuck in some half-existing state
[01:19:57] can you remove it?
[01:41:13] Labs, Discovery, Maps: Replacements for a.toolserver.org, b.toolserver.org, c.toolserver.org not available - https://phabricator.wikimedia.org/T103272#1629295 (Krinkle) This is causing SSL certificate warnings on production wikis at three levels: https://nl.wikipedia.org/wiki/Amsterdam -> "Kaart" (Map...
[03:31:28] !log ores restart redis server on ores-redis-02 to apply tcp-keepalive
[03:31:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL, Master
[09:16:42] Labs, Tool-Labs, Continuous-Integration-Config, Patch-For-Review: Change sid pbuilder image name to 'unstable' - https://phabricator.wikimedia.org/T111097#1629825 (hashar) https://gerrit.wikimedia.org/r/#/c/237604/ creates a symlink for unstable to sid as suggested by @akosiaris above.
[09:18:04] Labs, Tool-Labs, Continuous-Integration-Config, Patch-For-Review: Change sid pbuilder image name to 'unstable' - https://phabricator.wikimedia.org/T111097#1629829 (hashar) a:akosiaris>hashar
[09:18:18] Labs, Tool-Labs, Continuous-Integration-Infrastructure, Patch-For-Review: Change sid pbuilder image name to 'unstable' - https://phabricator.wikimedia.org/T111097#1594382 (hashar)
[09:18:37] Labs, Tool-Labs, Continuous-Integration-Infrastructure, Patch-For-Review: Change sid pbuilder image name to 'unstable' - https://phabricator.wikimedia.org/T111097#1594382 (hashar) p:Triage>Normal
[09:40:55] Labs, Labs-Infrastructure, Continuous-Integration-Scaling: curl http://169.254.169.254/latest/meta-data/public-keys/ is unavailable - https://phabricator.wikimedia.org/T112001#1629847 (hashar) It works now! On the `contintcloud` project I have generated a ssh key pair `hashar-cloudinit-keypair`. Boot...
[09:41:06] Labs, Labs-Infrastructure, Continuous-Integration-Scaling: curl http://169.254.169.254/latest/meta-data/public-keys/ is unavailable - https://phabricator.wikimedia.org/T112001#1629848 (hashar) Open>Resolved a:hashar @andrew I am not sure whether you fixed it over night / Juno upgrade fixed...
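For readers who want to repeat the T112001 check without curl, here is a minimal Python sketch. It assumes the endpoint answers with an EC2-style listing of `index=keyname` lines, which the log does not confirm; the function names `parse_key_listing` and `fetch_key_listing` are hypothetical, and only the URL comes from the task title.

```python
import urllib.request

# Link-local metadata endpoint quoted in the T112001 task title.
METADATA_KEYS_URL = "http://169.254.169.254/latest/meta-data/public-keys/"


def parse_key_listing(body):
    """Parse 'index=keyname' lines into a dict (assumed EC2-style format)."""
    keys = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the response
        index, _, name = line.partition("=")
        keys[int(index)] = name
    return keys


def fetch_key_listing(url=METADATA_KEYS_URL, timeout=5):
    """Fetch and parse the listing; only reachable from inside an instance."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_key_listing(response.read().decode("utf-8"))
```

Calling `fetch_key_listing()` on an instance booted with a key pair should then return that key pair's name if the metadata service is up, and raise `urllib.error.URLError` if it is unreachable.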
[09:43:56] Labs, Tool-Labs, Continuous-Integration-Config: Job labs-toollabs-debian-glue is failing for labs/toollabs repository - https://phabricator.wikimedia.org/T110939#1629857 (hashar)
[09:43:58] Labs, Tool-Labs, Continuous-Integration-Infrastructure, Patch-For-Review: Change sid pbuilder image name to 'unstable' - https://phabricator.wikimedia.org/T111097#1629855 (hashar) Open>Resolved Solved by using a symlink from unstable to sid. Thank you @akosiaris for the suggestion.
[09:48:45] Howdy, can anybody help me. Having issues ssh(ing) to gerrit... Error: Unable to negotiate with 208.80.154.81: no matching key exchange method found. Their offer: diffie-hellman-group1-sha1
[10:40:02] Cblair91: yeah we have a bug for that
[10:40:24] Cblair91: https://phabricator.wikimedia.org/T112025 "Wikimedia Gerrit doesn't work if OpenSSH version is higher than 7.0"
[10:40:54] Cblair91: so in your ~/.ssh/config you want to fall back to an older algo:
[10:40:57] Host gerrit.wikimedia.org
[10:40:57] KexAlgorithms +diffie-hellman-group1-sha1
[10:47:47] hashar: Thanks, will try that later. Decided to fall back and use a box server I have to handle my git stuff instead :P
[11:11:04] Labs, Salt, operations: salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1630013 (ArielGlenn) so the reason that keys don't get deleted from salt via this script when the instance is deleted is that (some of) them stay around in ldap. Is that intentio...
[12:50:38] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Wikiscan was created, changed by Wikiscan link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Wikiscan edit summary: Created page with "{{Tools Access Request |Justification=Statistics for wikiscan.org |Completed=false |User Name=Wikiscan }}"
[13:07:19] Labs, Labs-Sprint-107, Labs-Sprint-108, Labs-Sprint-109, labs-sprint-113: Setup monitoring and reporting for disk space usage of each project on NFS - https://phabricator.wikimedia.org/T106476#1630316 (coren) It turns out that the scheme I had thought of is considerably less useful than I had init...
[13:23:19] What's a reasonable time to wait for a reboot to complete (ordered via Special:NovaInstance)?
[13:24:40] Nemo_bis: That really depends on a lot of things. There is a puppet run at boot by default which can add minutes to this - especially if it hasn't been run in a while. In general, though it should be a bit below the 5 minute mark at worst.
[13:24:56] Best case is about 1 minute.
[13:27:32] If it seems to have been stuck for longer than that, the console output might point at a specific issue.
[13:32:08] Coren: that's what worries me, the console output is blank. :) https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&instanceid=20853100-12c6-480d-9eb7-a3d9e1864280&project=pagemigration&region=eqiad
[13:32:41] But this instance was supposed to be rebooted in one of the past maintenances IIRC, perhaps it failed back then and is now irrecoverable. Can I just delete it?
[13:33:30] That points to an issue, although not a very specific one. :-) Yeah, you can delete it if it is disposable. In fact, that's probably the best option unless you have valuable data on it. If you want, though, I can give a quick try to a forcible manual restart first.
[13:34:28] Nah, not worth it.
[13:34:47] Fair 'nuff.
[13:35:22] Labs, Labs-sprint-112, ToolLabs-Goals-Q4, labs-sprint-113: Fix documentation & puppetization for labs NFS - https://phabricator.wikimedia.org/T88723#1630393 (mark) I think a diagram would be really helpful too...
[14:38:43] tgr|away: alpha4 is cleaned up now… nova was just waiting for that virt node to come back online.
[14:40:57] Labs, Labs-Infrastructure, Labs-sprint-112, Patch-For-Review, labs-sprint-113: Update remaining virt nodes to kilo - https://phabricator.wikimedia.org/T112200#1630574 (Andrew) labvirt1002 done
[14:49:36] Labs, Labs-Sprint-107, Labs-Sprint-108, Labs-Sprint-109, labs-sprint-113: Setup monitoring and reporting for disk space usage of each project on NFS - https://phabricator.wikimedia.org/T106476#1630590 (scfc) What about not monitoring disk usage after the fact, but instead (always) creating an volu...
[14:51:21] !log ores removed ores-web-02 from ores-lb-02 pool
[14:51:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL, Master
[14:57:19] (CR) Niedzielski: "@Yuvipanda, please review :)" [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/231697 (https://phabricator.wikimedia.org/T99115) (owner: Niedzielski)
[15:00:53] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Wikiscan was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=177179 edit summary:
[15:15:52] (CR) Tim Landscheidt: "recheck" [labs/toollabs] - https://gerrit.wikimedia.org/r/234934 (https://phabricator.wikimedia.org/T91231) (owner: Tim Landscheidt)
[16:03:27] Labs, Beta-Cluster, Labs-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1630870 (BBlack)
[16:59:52] Labs, Tool-Labs: role::relic - changes not applied by puppet? on which node or instance is it?
- https://phabricator.wikimedia.org/T104537#1631214 (coren) Open>Resolved role::relic is currently enabled on the instance: https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=conf...
[17:02:07] Labs, Shinken: Newly created instance is in ERROR state - https://phabricator.wikimedia.org/T111988#1631240 (scfc) Resolved>Open http://permalink.gmane.org/gmane.org.wikimedia.labs/4039 said that the issue should have been resolved, but it has reappeared for me when I try to create new instances. I...
[17:15:52] Labs: Bring toolserver.org redirects back - https://phabricator.wikimedia.org/T109488#1631324 (scfc) Open>Resolved a:scfc http://permalink.gmane.org/gmane.org.wikimedia.labs.announce/76: > As an update: The security team has completed their review, and the > redirects are back online. Thank you for...
[17:16:02] Labs: Bring toolserver.org redirects back - https://phabricator.wikimedia.org/T109488#1631327 (scfc) a:scfc>None
[18:23:47] YuviPanda: hello! do we have a grafana for labmon1001 statsd ?
[18:24:36] No hashar
[18:25:01] YuviPanda: and I guess servers on the labs host subnet can't reach statsd.eqiad.wmnet but should send to labmon1001 right ?
[18:25:14] Yup
[18:26:32] though labmon1001 has
[18:26:33] ./puppet/statsd.yaml::statsd_host: 'statsd.eqiad.wmnet'
[18:26:33] ./puppet/statsd.yaml::statsd_port: 8125
[18:26:55] and I can't find where its diamond metrics are sent to :/
[18:27:41] ah to statsd.eqiad.wmnet
[18:27:43] :-D
[18:28:36] :)
[18:28:38] yesss
[18:47:25] Labs, Patch-For-Review: Create a catchpoint check for labs puppetmaster - https://phabricator.wikimedia.org/T107456#1631817 (Andrew) This is done. Yuvi, check my work?
[19:31:50] Labs: Setup checkpoint check for private DNS - https://phabricator.wikimedia.org/T107453#1631995 (Andrew) I've verified that socket.gethostbyname_ex errors out in absence of upstream dns, even if called on the current host's fqdn.
So this check should be as simple as socket.gethostbyname_ex(socket.getfqdn())
[19:45:49] Labs: Have checkpoint check for public labs DNS - https://phabricator.wikimedia.org/T107451#1632077 (Andrew) Open>declined a:Andrew I don't think this is useful... if public dns fails then /all/ of our catchpoint alerts will fail.
[19:45:50] Labs, Labs-Sprint-108, Labs-Sprint-109, labs-sprint-113: Have catchpoint checks for all labs services (Tracking) - https://phabricator.wikimedia.org/T107058#1632080 (Andrew)
[19:59:44] Labs, Labs-Sprint-108, Labs-Sprint-109, labs-sprint-113: Have catchpoint checks for all labs services (Tracking) - https://phabricator.wikimedia.org/T107058#1632152 (Andrew)
[19:59:44] Labs: Have a checkpoint check for labs proxies - https://phabricator.wikimedia.org/T107450#1632150 (Andrew) Open>Resolved a:Andrew
[20:00:02] Labs, Labs-Sprint-108, Labs-Sprint-109, labs-sprint-113: Have catchpoint checks for all labs services (Tracking) - https://phabricator.wikimedia.org/T107058#1485532 (Andrew)
[20:06:55] Labs, Shinken: Newly created instance is in ERROR state - https://phabricator.wikimedia.org/T111988#1632208 (Andrew) I'm in the process of trying to convince the scheduler to fill up virt node harddrives up to 100%. The disk checks are a bit erratic, though -- it'll run twice in a row, the first time decl...
[20:18:50] Labs: Have checkpoint check for public labs DNS - https://phabricator.wikimedia.org/T107451#1632254 (coren) >>! In T107451#1632077, @Andrew wrote: > I don't think this is useful... if public dns fails then /all/ of our catchpoint alerts will fail. I think that's a bug - is there a //requirement// to have che...
[20:19:49] Labs: Have checkpoint check for public labs DNS - https://phabricator.wikimedia.org/T107451#1632257 (yuvipanda) If we want to fix that we can make all the checks use IPs instead of DNS from catchpoint's side.
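The private DNS check proposed in T107453, `socket.gethostbyname_ex(socket.getfqdn())`, could be wrapped roughly as follows. This is a sketch under assumptions, not the check that was actually deployed: the function name `check_private_dns` and its `resolve` and `fqdn` parameters are hypothetical and exist only so the logic can be exercised without a live resolver. It also relies on the behaviour Andrew reports in the task (the lookup raising when upstream DNS is absent), which is not verified here.

```python
import socket


def check_private_dns(resolve=socket.gethostbyname_ex, fqdn=None):
    """Return (ok, detail) for a lookup of this host's own FQDN."""
    name = fqdn or socket.getfqdn()
    try:
        hostname, _aliases, addresses = resolve(name)
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False, "lookup of %s failed: %s" % (name, exc)
    return True, "%s resolves to %s" % (hostname, ", ".join(addresses))
```

A monitoring wrapper would then only need to map the boolean to an exit status or HTTP response code.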
[20:29:19] Labs: Have checkpoint check for public labs DNS - https://phabricator.wikimedia.org/T107451#1632291 (coren) I think that would be a good idea - being able to distinguish between "labs broke" and "DNS has failed" is important to direct recovery efforts.
[20:31:49] Labs, Discovery, Elasticsearch: Replicate production elasticsearch indices to labs - https://phabricator.wikimedia.org/T109715#1632310 (demon) >>! In T109715#1627648, @yuvipanda wrote: > We'd also need to make sure that deleted / revdelled content doesn't show up. Deleted content disappears from the pr...
[20:42:42] Hi, what happened to https://tools.wmflabs.org/ ??
[20:43:08] Gerghwww: What're you trying to locate? :)
[20:43:28] what's wrong with it?
[20:43:41] i still see the list of tools
[20:43:49] oh.. or not
[20:43:54] nothing special... but the page is skipped after "addbot"
[20:44:07] yea, just scrolled down .. eh
[20:44:11] So it is :P
[20:44:15] that's odd
[20:44:27] take a look at the source code. stopping at line 156
[20:47:03] Seems to be an issue somewhere located here: http://git.wikimedia.org/blob/labs%2Ftoollabs.git/f275d97d7010b3bb2709d4a5211e2530df178447/www%2Fcontent%2Flist.php#L101
[20:47:28] Presumably throwing an exception =]
[20:50:02] reload the page, guys
[20:50:12] Coren fixed
[20:50:42] Yeah, the actual webservice was cray-cray.
[20:51:41] Gerghwww: works again
[20:52:00] looks good. thx
[21:01:39] hello yuvipanda.
[21:01:47] shall we chat about the survey some time today yuvipanda?
[21:02:51] leila: yes. are you in the office?
[21:02:55] leila: I also responded on the etherpad!
[21:03:00] I'm working remotely yuvipanda.
[21:03:08] ah! lemme check your responses first then, yuvipanda.
[21:03:58] Labs, Tool-Labs: Make tools-mail route mail for @tools-*.pmtpa.wmflabs correctly - https://phabricator.wikimedia.org/T63484#1632479 (scfc) I set `/etc/mailname` to `tools.wmflabs.org`, restarted `gridengine-master`, submitted a job on `toolsbeta-master` and `qstat -j 4` still gave: ``` […] mail_list:...
[23:40:04] (CR) Yuvipanda: "Sorry about the delay!" (2 comments) [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/231697 (https://phabricator.wikimedia.org/T99115) (owner: Niedzielski)
[23:57:49] !log cvn Restore localised messages (nl) for CVNBot12
[23:57:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cvn/SAL, Master