[00:04:08] i see it in Special:NovaServiceGroup but not in tools-bastion-03 shell
[00:04:32] give it some time
[00:17:21] well i created it 8 hours ago
[00:17:34] Labs, Tool-Labs, WMF-Legal: Install unrar on Tool Labs - https://phabricator.wikimedia.org/T151794#3170195 (zhuyifei1999) Open>declined Sure, we [[https://commons.wikimedia.org/wiki/Commons:Bots/Requests/Embedded_Data_Bot_(aggressive_algorithm)|are working towards this direction]].
[00:18:02] i mean i made the request 8 hours ago
[00:21:33] wait did you request for permission to get shell access?
[00:29:49] where do we request it?
[00:36:53] when i created other tools, it was instant and i got shell access right away, so i don't know if there's a bug or something with my request today.
[00:38:33] or some kind of maintenance
[00:38:45] that's why i'm asking :P i don't mind waiting
[01:50:34] PROBLEM - Free space - all mounts on tools-docker-builder-04 is CRITICAL: CRITICAL: tools.tools-docker-builder-04.diskspace.root.byte_percentfree (<10.00%)
[02:06:36] thib: I'm a bit confused about your bug report here. Did you get an error and then resubmit and get no error?
[02:12:20] thib: I do see the LDAP records for robokobot have been created with user thibaut120094 as a maintainer
[02:13:40] but it looks like the /data/project/robokobot directory has not been created.
let's open a bug
[02:16:53] Labs, Tool-Labs: /data/project/robokobot directory not created for tool robokobot - https://phabricator.wikimedia.org/T162652#3170295 (bd808)
[02:19:54] Labs, Tool-Labs: /data/project/robokobot directory not created for tool robokobot - https://phabricator.wikimedia.org/T162652#3170309 (Peachey88)
[02:20:35] PROBLEM - Free space - all mounts on tools-docker-builder-04 is CRITICAL: CRITICAL: tools.tools-docker-builder-04.diskspace.root.byte_percentfree (<10.00%)
[02:35:43] !log tools Restarted maintain-kubeusers on tools-k8s-master-01
[02:35:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[02:39:37] Labs, Tool-Labs, User-bd808: /data/project/robokobot directory not created for tool robokobot - https://phabricator.wikimedia.org/T162652#3170319 (bd808) Open>Resolved a:bd808 ``` $ journalctl -u maintain-kubeusers -f -- Logs begin at Mon 2017-04-10 15:18:37 UTC. -- ^C $ systemctl restart...
[02:40:28] thib: It should be fixed now. the process that creates the homedirs got stuck somehow. sadly I don't see any error logs explaining what went wrong.
[02:49:39] Tool-Labs-tools-Other, Community-Tech-Tool-Labs, Epic: Convert all Labs tools to use cdnjs for static libraries and fonts - https://phabricator.wikimedia.org/T103934#3170327 (bd808) This would be a good project for #tool-labs-standards-committee to get involved in to help with outreach and documentat...
[04:51:19] bd808: it works now, thanks :)
[04:51:50] yw.
glad I saw your shout for help
[04:59:28] Labs, Tool-Labs, User-bd808: Record hangout of basic Tool Labs access and use - https://phabricator.wikimedia.org/T162654#3170364 (bd808)
[05:01:44] Labs, Tool-Labs, User-bd808: Record hangout of basic Tool Labs access and use - https://phabricator.wikimedia.org/T162654#3170378 (bd808) @Jane023 feel free to edit the checklist I've started in the task description to add/remove things that you think we should cover. Also, what operating system are...
[06:50:08] Labs, Tool-Labs, User-bd808: Record hangout of basic Tool Labs access and use - https://phabricator.wikimedia.org/T162654#3170530 (Jane023)
[06:52:19] PROBLEM - Puppet run on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:27:20] RECOVERY - Puppet run on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:35:33] PROBLEM - Free space - all mounts on tools-docker-builder-04 is CRITICAL: CRITICAL: tools.tools-docker-builder-04.diskspace.root.byte_percentfree (<10.00%)
[10:04:01] Labs, Tool-Labs-tools-stewardbots, Discovery, Wikimedia-Portals, wikitech.wikimedia.org: Access changes for MarcoAurelio - https://phabricator.wikimedia.org/T162541#3170754 (MarcoAurelio) In these days I've received several emails from people asking me not to continue with this request, that...
[10:27:49] Labs, Monitoring, Shinken: Create an Icinga instance for monitoring labs instances - https://phabricator.wikimedia.org/T162629#3170786 (hashar)
[10:31:47] Labs, Monitoring, Shinken: Create an Icinga instance for monitoring labs instances - https://phabricator.wikimedia.org/T162629#3170790 (hashar) The labs instances have some monitoring based on Shinken at http://shinken.wmflabs.org/ I think its configuration is auto generated from LDAP. It provides so...
[10:39:17] Labs, Monitoring, Shinken: Create an Icinga instance for monitoring labs instances - https://phabricator.wikimedia.org/T162629#3170795 (Paladox) Yep. icinga2 is really easy to start monitoring a host. You run a command, enter the hostname, and then it will monitor everything.
[12:12:04] Tool-Labs-tools-Other: s52584 is taking over half of the available connections on toolsdb - https://phabricator.wikimedia.org/T162677#3171013 (jcrespo)
[13:04:36] chasemp andrewbogott hi, i deleted an instance this morning and it is still showing as in deleting state a few hours later.
[13:04:43] I wanted to recreate the instance
[13:04:58] as i messed up the puppetmaster so i wanted to recreate that instance
[13:05:04] the instance is puppet-paladox
[13:09:22] chasemp andrewbogott also, i am wondering if a new project could be created called "icinga" or icinga2 please? It says at https://wikitech.wikimedia.org/wiki/Help:Getting_Started#Create_Projects that i could ask on irc.
[13:09:59] it's for T162629
[13:09:59] T162629: Create an Icinga instance for monitoring labs instances - https://phabricator.wikimedia.org/T162629
[13:17:33] paladox: that's not the format for a new project request, and it doesn't outline the intention. We currently have a shinken project that is in theory used for this same purpose. It seems more practical to set up an icinga2 instance there and gauge interest and feasibility as the labs admin folks won't have time to manage this. It would be entirely your deal.
[13:17:59] oh ok
[13:18:42] chasemp how can i create an icinga2 instance in the shinken project.
[13:18:45] please
[13:19:23] paladox: change the request to asking for becoming an admin in the shinken project and describe what you want to do and we'll get to it this week I imagine
[13:19:31] oh ok
[13:19:55] thanks
[13:19:59] I knew people would love icinga. :P
[13:20:18] the icinga2 ui is very nice.
[13:20:30] + better account management
[13:22:32] Labs, Monitoring, Shinken: Admin request for user paradox in the project shinken - https://phabricator.wikimedia.org/T162629#3171312 (Paladox)
[13:22:35] chasemp something like ^^
[13:22:55] paladox: I don't have any problem with you wanting to manage a general use icinga server, and it sounds neat but you probably want to get some other maintainers up front if you can
[13:23:06] Oh
[13:23:09] Tool-Labs-tools-Other: s52584 is taking over half of the available connections on toolsdb - https://phabricator.wikimedia.org/T162677#3171315 (PeterBowman) Hi, @jcrespo. Per your comment at T138283#2412013, I disabled connection pooling only for the replicas and -somehow- convinced myself that the tools-db c...
[13:25:29] chasemp also it seems that puppet-paladox instance hasn't been deleted. It is showing in the ui as deleting but it has said that for the last few hours
[13:25:42] i would like to recreate it
[13:25:49] as i messed up puppet master.
[13:27:55] paladox: can you file a quick task on that and we'll look today?
[13:27:59] assign to me is ok
[13:28:14] ok thanks
[13:28:55] Labs, Monitoring, Shinken: Admin request for user paradox in the project shinken - https://phabricator.wikimedia.org/T162629#3171327 (chasemp) @paladox wants to take a shot at managing a shared icinga2 instance and I suggested doing so in the existing shinken project would be fine.
[13:29:21] Tool-Labs-tools-Other: s52584 is taking over half of the available connections on toolsdb - https://phabricator.wikimedia.org/T162677#3171328 (jcrespo) Open>Resolved a:PeterBowman > Excuse my (severe) lack of understanding No, clearly it was my fault for not being 100% clear. Also, the toolsdb s...
[13:31:06] Labs: Deleting puppet-paladox instance is not deleting it immediately - https://phabricator.wikimedia.org/T162691#3171332 (Paladox)
[13:31:12] chasemp ^^ done :)
[13:31:29] Labs: Deleting puppet-paladox instance is not deleting it - https://phabricator.wikimedia.org/T162691#3171345 (Paladox)
[13:33:10] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure, Patch-For-Review: OpenStack instances stuck in deletion state - https://phabricator.wikimedia.org/T162529#3171346 (chasemp) a:chasemp>Andrew @andrew was looking into this. Another instance was reported on IRC in {T162691}...
[13:34:55] our documentation is horrible
[13:35:15] I want to check how to connect to replicas on labs, I go to https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Connection_handling_policy
[13:35:37] and there is a brick of text with very little practical information
[13:35:48] Labs: Deleting puppet-paladox instance is not deleting it - https://phabricator.wikimedia.org/T162691#3171359 (chasemp)
[13:35:50] Labs, Labs-Infrastructure: labvirt1002 ignoring some messages - https://phabricator.wikimedia.org/T162640#3171360 (chasemp)
[13:35:52] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure, Patch-For-Review: OpenStack instances stuck in deletion state - https://phabricator.wikimedia.org/T162529#3171358 (chasemp)
[13:36:05] Labs: Deleting puppet-paladox instance is not deleting it - https://phabricator.wikimedia.org/T162691#3171332 (chasemp) p:Triage>Normal
[13:36:07] I just want to connect, tell me first how, then you can tell me the least important stuff
[13:36:39] Labs: Deleting puppet-paladox instance is not deleting it - https://phabricator.wikimedia.org/T162691#3171332 (chasemp) a:chasemp>Andrew I am pretty sure a restart of nova-compute will fix this but I'm going to let it be so you can take a look @Andrew
[13:36:51] Labs, Labs-Infrastructure: labvirt1002 ignoring some messages -
https://phabricator.wikimedia.org/T162640#3169799 (chasemp) p:Triage>High
[13:37:22] jynus: no doubt docs are the worst thing we do right now
[13:38:32] better docs == less questions == more time for maintenance
[13:38:54] I will try to give that page a go
[13:39:21] Labs, Monitoring, Shinken: Admin request for user paladox in the project shinken - https://phabricator.wikimedia.org/T162629#3171371 (Paladox)
[13:39:42] jynus: I think we are all on the same wavelength, it's a negative feedback loop of needing docs and spending time doing support atm, it's basically our #1 wishlist item outside of things that have to be done to keep the lights on
[13:39:51] also 2 ways of doing the same thing may seem like a shortcut
[13:40:03] but we should stop doing that
[13:41:02] there is one supported way (and that can be community decided), but the rest should not work
[13:41:33] I agree wholeheartedly
[13:42:08] what is the past experience with discussing ideas with thousands of people?
[13:42:22] as in, the list is not used by everyone
[13:42:34] neither is IRC
[13:42:57] are polls used?
[13:43:06] as in, do people respond to them?
[13:43:26] I'm not sure I understand the question
[13:43:39] are you asking, what's the best way to solicit feedback from the labs community?
[13:43:44] yes
[13:44:01] the cloud community
[13:44:13] we do a yearly survey (did you not know that?) and it's got a reasonable response rate to these things comparatively
[13:44:22] that's more or less how we are sure that docs are the burning needs
[13:44:23] when does that happen?
[13:44:25] need
[13:44:42] annually for the last 2 years at this point
[13:44:59] so there will be one in 2017?
[13:45:05] at some point, right?
[13:45:07] right
[13:45:08] yep
[13:45:36] jynus: we put it in our annual plan that we aim for a better "docs help and I can find what I need" response percentage
[13:46:28] jynus: so generally labs-l is the best thing atm (TM) unless it's an admin broadcast then labs-announce but it's true that's not even close to all
[13:46:48] we've resorted to hitting people directly in the past in case of need but there isn't a clear best practice atm
[13:46:51] and it makes sense
[13:47:03] if I make a tool, and it just works, I do not want to touch it again
[13:47:11] because it works
[13:47:36] I proposed briefly that registration on wikitech comes with labs-announce subscription
[13:47:44] but we need to figure out what we want to do honestly
[13:47:58] too many cracks in the system as-is
[13:48:24] but on the other side, doing backwards-incompatible things can have a huge payoff
[13:52:01] (PS1) Gehel: maps - add new dummy passwords to follow refactoring to role / profile [labs/private] - https://gerrit.wikimedia.org/r/347610
[13:53:55] (CR) Gehel: [C: 2] maps - add new dummy passwords to follow refactoring to role / profile [labs/private] - https://gerrit.wikimedia.org/r/347610 (owner: Gehel)
[13:54:03] (CR) Gehel: [V: 2 C: 2] maps - add new dummy passwords to follow refactoring to role / profile [labs/private] - https://gerrit.wikimedia.org/r/347610 (owner: Gehel)
[14:07:51] Labs: Deleting puppet-paladox instance is not deleting it - https://phabricator.wikimedia.org/T162691#3171452 (Andrew) Open>Resolved Yep, this is the same issue that 1002 is having all the time. This instance should be cleared up now.
[14:07:54] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure, Patch-For-Review: OpenStack instances stuck in deletion state - https://phabricator.wikimedia.org/T162529#3171454 (Andrew)
[14:08:07] Labs: Deleting puppet-paladox instance is not deleting it - https://phabricator.wikimedia.org/T162691#3171455 (Paladox) Thanks :)
[14:15:49] !log tools emptied /srv/pbuilder to make space on tools-docker-04
[14:15:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:25:35] RECOVERY - Free space - all mounts on tools-docker-builder-04 is OK: OK: All targets OK
[14:26:42] andrewbogott: hey thanks man
[14:28:04] chasemp: I'm not 100% sure that's the right thing to do, but there were log messages from yuvi where he 'cleaned out' that dir to make space
[14:28:27] notably, that particular host, -04, doesn't have a /srv partition, /srv is just on /
[14:28:33] so that's probably a mistake...
[14:28:55] hm yeah right
[14:29:03] Lemme check and see if they're all like that
[14:29:21] oh! there aren't any others
[14:29:35] so if we're feeling brave sometime we should rebuild that one with a proper /srv
[14:32:46] Labs, Labs-Infrastructure: labvirt1002 ignoring some messages - https://phabricator.wikimedia.org/T162640#3171500 (Andrew) After sleeping on this, it occurred to me to look and see if there were extra nova-compute processes in contention for messages. ``` root@labvirt1002:~# service nova-compute stop n...
[14:35:19] agreed
[14:38:34] Labs, Labs-Infrastructure: labvirt1002 ignoring some messages - https://phabricator.wikimedia.org/T162640#3171517 (Andrew) I only see one nova-compute proc per labvirt on the others. I will repool labvirt1002 when I have time to sit and watch it for a few hours.
[14:52:51] chasemp or andrewbogott hi, after applying the puppetmaster class and running puppet agent.
It seems to fail restarting apache with this error
[14:52:52] Apr 11 14:51:42 puppet-paladox3 apache2[18257]: AH00526: Syntax error on line 9 of /etc/apache2/sites-enabled/50-puppetmaster-wikimedia-org.conf:
[14:52:52] Apr 11 14:51:42 puppet-paladox3 apache2[18257]: SSLCertificateFile: file '/var/lib/puppet/server/ssl/certs/puppet-paladox3.git.eqiad.wmflabs.pem' does not exist or is empty
[14:53:02] It seems puppet is not generating the cert
[14:53:32] paladox: there is a task open that puppet standalone is broken atm or has been
[14:53:38] but otherwise I'm not sure
[14:53:39] oh
[14:53:42] oh
[14:53:55] hm, that looks like a different issue than the one I knew about...
[14:53:56] ah maybe not the same issue
[14:53:57] https://phabricator.wikimedia.org/T162462
[14:54:01] but yeah, it was broken last time I looked
[14:54:20] oh
[14:54:58] I'll try to take a look sometime today, see what the latest behavior is
[14:54:59] andrewbogott chasemp rerunning puppet i now get the error specified on task
[14:55:03] (mis)behavior
[14:55:30] puppet-master : Depends: puppet (= 4.8.2-3~bpo8+1) but it is not going to be installed
[14:56:49] Do we by chance enable jessie-backports on new puppetmasters?
[14:58:48] paladox: best I can say is read that task as I'm not sure what the status is
[14:58:54] ok
[15:00:18] Labs, Operations: Undo special tools-home and tools-project share definitions for NFS - https://phabricator.wikimedia.org/T161834#3171608 (madhuvishy) > This should really be done in two parts: > > - refactoring so that the paths used in tools for the share links are common to the rest of the projects Th...
[15:09:02] Labs, Operations: Standalone puppet masters are broken (uninstallable packages) - https://phabricator.wikimedia.org/T162462#3171696 (Paladox) Workaround is to download the debs from https://packages.debian.org/jessie/all/puppet/download
[15:09:32] andrewbogott chasemp workaround is to manually download the debs from debian
[15:09:36] as i have done it
[15:09:42] puppet seems to pass
[15:15:50] Labs, Labs-Infrastructure, LDAP: Investigate and document NSS LDAP interactions - https://phabricator.wikimedia.org/T162701#3171741 (bd808)
[15:17:54] andrewbogott it seems when i try to do ssh jenkins-slave-01 or ssh phabricator from the gerrit-test it just hangs. It seems it can't connect to the host. But doing ssh puppet-paladox3 it returns a permission denied which is expected.
[15:18:16] debug2: ssh_connect: needpriv 0
[15:18:16] debug1: Connecting to jenkins-slave-01 [10.68.22.0] port 22.
[15:18:22] stuck on that
[15:18:39] Labs, LDAP: Document LDAP structure unambiguously - https://phabricator.wikimedia.org/T138151#3171792 (bd808)
[15:18:45] Labs, Labs-Infrastructure, LDAP: Investigate and document NSS LDAP interactions - https://phabricator.wikimedia.org/T162701#3171791 (bd808)
[15:19:00] Labs, Operations: Undo special tools-home and tools-project share definitions for NFS - https://phabricator.wikimedia.org/T161834#3171795 (chasemp) By `refactoring so that the paths used in tools for the share links are common to the rest of the projects` I meant this :) > - Currently the mount paths fo...
[15:21:29] Labs, Documentation, LDAP: Document LDAP structure unambiguously - https://phabricator.wikimedia.org/T138151#3171815 (bd808)
[15:28:19] PROBLEM - Puppet run on tools-exec-1433 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:29:18] strange jenkins-slave-01 can reach gerrit-test
[15:29:27] but gerrit-test can't reach jenkins-slave-01
[15:43:35] PROBLEM - Puppet run on tools-exec-1434 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:58:17] RECOVERY - Puppet run on tools-exec-1433 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:58:33] RECOVERY - Puppet run on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:13:48] Labs, Datasets-General-or-Unknown, Dumps-Generation, Operations, hardware-requests: Eqiad: Hardware request for labstore1006/7, dataset1002/3 - https://phabricator.wikimedia.org/T161311#3128341 (RobH) Open>stalled a:RobH I'm working on quotes in the #procurement S4 space for this...
[16:17:29] Labs: Request creation of Discourse for Wiki Asian Month labs project - https://phabricator.wikimedia.org/T162134#3153506 (chasemp) I am opposed to spaces and capitals in a project name in openstack :) but `discourse-for-wiki-asian-month` seems more than fine to create. +1
[16:21:21] Labs, Labs-Infrastructure, WikiApiary: Requesting more disk space a Wikiapiary project instance - https://phabricator.wikimedia.org/T162534#3172099 (chasemp)
[16:24:06] Labs, Monitoring, Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3172113 (Paladox)
[16:26:17] Labs, Labs-Infrastructure, WikiApiary: Requesting more disk space a Wikiapiary project instance - https://phabricator.wikimedia.org/T162534#3172139 (chasemp) storage is not quota'd in the same fashion as RAM or CPU. I think what you guys want is described in https://wikitech.wikimedia.org/wiki/Help:...
[16:27:54] Labs: Request creation of Discourse for Wiki Asian Month labs project - https://phabricator.wikimedia.org/T162134#3153506 (madhuvishy) @fantasticfears We're happy to create this project. One question, would you like the project name to be as long as discourse-for-wiki-asian-month, or would you prefer somethi...
[16:30:19] Labs, Operations: Standalone puppet masters are broken (uninstallable packages) - https://phabricator.wikimedia.org/T162462#3172165 (akosiaris) The upgrade went fine on all the jessie hosts, now looking into how easy is to do trusty as well.
[16:46:13] !log tools added exec nodes tools-exec-1430, 31, 32, 33, 34.
[16:46:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:47:33] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure, Patch-For-Review: OpenStack instances stuck in deletion state - https://phabricator.wikimedia.org/T162529#3172219 (hashar) Well I guess this issue is fixed since labvirt1002 is out of the pool and that is being debugged via T162640...
[16:50:05] Labs, Labs-Infrastructure: labvirt1002 ignoring some messages - https://phabricator.wikimedia.org/T162640#3172234 (hashar) Would it make sense to add an Icinga probe to ensure there is only one nova-compute process? I did that for Jenkins when it sometime managed to spawn twice: ``` lang=ruby nrpe::...
[16:51:04] PROBLEM - Host tools-exec-1433 is DOWN: CRITICAL - Host Unreachable (10.68.22.87)
[17:04:09] (CR) Dereckson: [C: -1] Add package.json and README.md (4 comments) [labs/tools/Wikimedia-Emoji-Bot] - https://gerrit.wikimedia.org/r/347304 (owner: D3r1ck01)
[17:04:12] andrewbogott: ^ is this known?
[17:04:18] Host tools-exec-1433 is DOWN
[17:06:12] madhuvishy: yeah, I couldn't get that one to queue properly so I'm rebuilding it
[17:06:18] andrewbogott: okay :)
[17:25:45] Labs, Operations: Investigate alternative RAID strategies for labstore1001/2 - https://phabricator.wikimedia.org/T162090#3172457 (chasemp) If performance allows it would be great to get `RAID 50` esp since this is a 2 node HA cluster. We could finally do the beginnings of real (but limited) user backups.
[17:30:38] Labs: iowait alerts for grid engine nodes - https://phabricator.wikimedia.org/T161898#3172479 (chasemp) We did reintroduce these and it seems to have had a positive effect. We are still seeing the alerts and it does seem like a systemic issue across. If I had to guess we have at least a few factors here bu...
[17:32:23] Labs, Tool-Labs, InternetArchiveBot: tools.iabot is overloading the grid by running too many workers in parallel - https://phabricator.wikimedia.org/T161951#3172482 (chasemp) p:Triage>Normal >>! In T161951#3158193, @Cyberpower678 wrote: > I wasn't aware that big brother can work on jobs other...
[17:33:20] Labs: iowait alerts for grid engine nodes - https://phabricator.wikimedia.org/T161898#3172485 (chasemp) AFAICT this is the check that is alerting: ```# Check that no nodes have more than 50% iowait (warn) / 80% iowait (crit) for over 5 minutes define service { check_command check_graphi...
[17:40:30] PROBLEM - Puppet run on tools-exec-1430 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[17:55:30] RECOVERY - Puppet run on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:11:35] Hmm i wonder why gerrit-mysql can access gerrit-test but not phabricator.
[18:12:14] Could it be related to depooling labvirt1002
[18:12:16] andrewbogott chasemp ^^
[18:12:24] can = cannot
[18:12:29] woops
[18:12:48] i meant i can access gerrit-test from gerrit-mysql but can't access phabricator
[18:12:53] from gerrit-mysql
[18:12:57] those things do not seem at all related
[18:13:05] ok
[18:13:43] I could access jenkins-slave-01 yesterday. But today it seems it won't connect from gerrit-test. Is a connection broken somewhere?
[18:14:36] I have no idea what you would mean but no I don't believe anything like that is currently happening
[18:15:38] ok
[18:16:17] i mean when trying to ssh from gerrit-test to jenkins-slave-01 it is not working. Yesterday that was working as i have a jenkins test setup which ssh into the instance.
[18:16:51] generally you shouldn't be able to ssh from one machine to another, ssh-agent forwarding is considered a bad idea™
[18:18:10] paladox: 'it won't connect' is also extremely vague.
[18:18:50] By "it won't connect" i meant that it gets stuck on ssh jenkins-slave-01
[18:19:00] again, that is extremely vague.
[18:19:03] so i decided to do ssh jenkins-slave-01 -vvv which showed:
[18:19:30] https://phabricator.wikimedia.org/P5246
[18:19:33] Labs, Operations: Standalone puppet masters are broken (uninstallable packages) - https://phabricator.wikimedia.org/T162462#3163952 (jcrespo) I am commenting this here, please tell me if completely unrelated and I will create a new ticket: db1090 keeps failing to run puppet according to icinga since Apr...
[18:19:56] ok, so the connection is blackholed. Check your firewall settings.
[18:20:14] or rather, the firewall settings of the jenkins-slave-01 instance
[18:20:28] Oh ok
[18:21:43] But the strange part is jenkins-slave-01 can ssh gerrit-test.
[18:21:49] root@jenkins-slave-01:/var/log# ssh gerrit-test
[18:21:49] Permission denied (publickey).
[18:22:33] no, that's not strange. Firewall rules are not symmetric.
[18:23:31] oh
[18:26:33] i've checked and port 22 is in there.
[18:26:53] and that all the instances were using the default group
[18:27:01] Ingress - TCP 22 (SSH) 0.0.0.0/0
[18:36:46] paladox: sudo tcpdump host 10.68.23.58 shows ssh connections from tools-bastion are received by the host. This means the firewall on the host is likely dropping the packets.
[18:37:02] oh
[18:37:54] 18:37:43.536933 ARP, Request who-has 10.68.17.229 tell tools-bastion-03.tools.eqiad.wmflabs, length 42
[18:38:35] https://en.wikipedia.org/wiki/Address_Resolution_Protocol
[18:42:24] I'm not sure where the firewall is specified, but sudo iptables -L shows it's indeed the local firewall.
[18:42:32] it seems that ssh on localhost
[18:42:34] doesn't work
[18:42:38] but it works on gerrit-test
[18:42:46] so maybe that could be it?
[18:42:51] (where the firewall is specified --> which config file(s) determine the firewall settings)
[18:43:28] oh phabricator localhost doesn't work but jenkins-slave-01 does.
[18:51:07] it shows 0 0 ACCEPT tcp -- * * 10.68.20.111 0.0.0.0/0 tcp dpt:22 on the phabricator instance
[18:51:14] and the ip is for gerrit-mysql
[18:51:27] but gerrit-mysql can't access phabricator either
[18:57:55] it works
[18:57:57] after doing
[18:57:58] iptables -A INPUT -p tcp -d 0/0 -s 0/0 --dport 22 -j ACCEPT
[19:15:52] (PS10) D3r1ck01: Add package.json and README.md [labs/tools/Wikimedia-Emoji-Bot] - https://gerrit.wikimedia.org/r/347304
[19:19:33] PROBLEM - Puppet run on tools-exec-1434 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:22:31] paladox: that won't survive a reboot (and maybe not even a puppet run)
[19:22:50] oh
[19:23:10] But the port should be active in Horizon
[19:23:57] there are two firewalls. One is in the openstack network layer, and is configured in Horizon. The other one is iptables running on the VM.
[19:24:35] iptables is configured when the instance boots; it doesn't remember the commands you put in manually.
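The difference discussed above between a connection that hangs (packets silently dropped, "blackholed") and one that is answered can be checked from the client side without reading `ssh -vvv` output. The following is a minimal Python sketch, not something used in the channel: the function name `classify_tcp` and the host name in the usage comment are illustrative assumptions. It only tells you how the TCP handshake ended; it cannot say which of the two firewalls (the OpenStack security group or iptables on the VM) is responsible.

```python
import socket


def classify_tcp(host, port, timeout=3.0):
    """Try one TCP connection and report how the attempt ended.

    'open'    - the handshake completed (a later "Permission denied
                (publickey)" from ssh happens after this stage)
    'refused' - the peer answered with a reset: nothing is listening,
                or a firewall rule actively rejects the connection
    'timeout' - no answer within `timeout` seconds, which is what a
                rule that silently drops packets looks like
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks and similar errors.
        return "error: {}".format(exc)


if __name__ == "__main__":
    # Hypothetical target; replace with the instance being debugged.
    print(classify_tcp("jenkins-slave-01", 22))
```

Run from both ends (for example gerrit-test to jenkins-slave-01 and back), different results in each direction would be consistent with the remark that firewall rules are not symmetric.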
[19:25:29] oh
[19:54:32] RECOVERY - Puppet run on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:20:21] Labs, Operations: Standalone puppet masters are broken (uninstallable packages) - https://phabricator.wikimedia.org/T162462#3173260 (Andrew) Fixing the puppetmaster issue requires changing (well, removing) the pinning in the puppet manifest, right? Is there a reason not to do that right away?
[20:29:33] Labs, Monitoring, Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3173271 (Dzahn) >>! In T162629#3170790, @hashar wrote: > We used to have an Icinga instance that created additional services monitoring In this particular case...
[21:02:15] Labs, Operations, procurement: eqiad: (2) hardware access request for labvirt1019 and labvirt1020 (refresh) - https://phabricator.wikimedia.org/T162486#3173351 (RobH)
[21:06:06] !log tools.iabot Shutdown 2 guiworker.php processes
[21:06:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.iabot/SAL
[21:08:05] Labs, Tool-Labs, InternetArchiveBot: tools.iabot is overloading the grid by running too many workers in parallel - https://phabricator.wikimedia.org/T161951#3173369 (bd808) Open>Resolved I setup this .bigbrotherrc file, commented out all of the cron job lines, and killed the worker2 and worke...
[21:21:38] Change on www.mediawiki.org a page Wikimedia Labs was modified, changed by BDavis (WMF) link https://www.mediawiki.org/w/index.php?diff=2442978 edit summary: [+105] Mark page as [[Template:Historical]]
[21:37:06] bd808: historical?
[21:37:48] Sagan: https://www.mediawiki.org/wiki/Wikimedia_Cloud_Services_team
[21:38:09] ah, ok
[21:38:10] the labs team is done.
that project page was an obsolete wreck
[21:38:30] we are not shutting down the OpenStack cloud :)
[21:40:01] !log tools.precise-tools Replaced with static page
[21:40:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.precise-tools/SAL
[21:40:15] so it is the same as before with another name, or is there another difference?
[21:40:44] a good question. let's call it "same plus"
[21:41:06] starting in July we will have our own budget for the first time
[21:41:28] ah :)
[21:41:48] we are also now an official team reporting to the CTO instead of a project inside techops
[21:42:27] we will be kicking off some renaming discussions soon too to try and fix the https://wikitech.wikimedia.org/wiki/Labs_labs_labs
[21:42:52] heh :D
[21:43:12] but we are going to keep doing all the things we have been doing to run the OpenStack cluster and the Tools project and DB replicas and ... all that jazz
[21:43:20] Responsibilities point 5: fix labs labs labs and beta beta beta :D
[21:43:43] beta is someone else's problem ;)
[22:21:05] PROBLEM - Puppet run on tools-exec-1432 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[22:21:35] Labs, Wikidata, wikitech.wikimedia.org: Lift IP cap for 186.179.xxx.xx (ongoing hackathon) - https://phabricator.wikimedia.org/T162751#3173532 (Daniel_Mietchen)
[22:29:42] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Add a server-side caching service for the new XTools - https://phabricator.wikimedia.org/T161057#3173585 (kaldari) Open>Resolved
[22:30:29] Tool-Labs-tools-Xtools, Community-Tech-Sprint: XTools: Clean up "Pages created" tool - https://phabricator.wikimedia.org/T162752#3173587 (MusikAnimal)
[22:30:43] Tool-Labs-tools-Xtools, Community-Tech-Sprint: XTools: Clean up "Pages created" tool - https://phabricator.wikimedia.org/T162752#3173560 (MusikAnimal)
[22:30:44] Tool-Labs-tools-Xtools, Community-Tech: Epic: Rewriting XTools -
https://phabricator.wikimedia.org/T153112#3173590 (MusikAnimal)
[22:31:04] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Fix caching problems with XTools - https://phabricator.wikimedia.org/T162753#3173591 (kaldari)
[22:31:11] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Fix caching problems with XTools - https://phabricator.wikimedia.org/T162753#3173603 (kaldari) p:Triage>Normal
[22:37:32] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Create XTools API with namespace endpoint, using JS to update to namespace selector - https://phabricator.wikimedia.org/T162754#3173629 (MusikAnimal)
[22:43:13] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Create XTools API with namespace endpoint, using JS to update to namespace selector - https://phabricator.wikimedia.org/T162754#3173667 (MusikAnimal) Relevant commit: https://github.com/x-tools/xtools-rebirth/commit/65577854b2ef7250e60d3e14930dd7816ef9d8af...
[22:44:27] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Rrajasek95 was created, changed by Rrajasek95 link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Rrajasek95 edit summary: Created page with "{{Tools Access Request |Justification=To create open source instructional software for Wikiversity, Wiktionary, Wikipedia, and Moodle proposals |Completed=false |User Name=Rra..."
[22:44:42] Tool-Labs-tools-Xtools, Community-Tech-Sprint: XTools: Clean up "Pages created" tool - https://phabricator.wikimedia.org/T162752#3173672 (MusikAnimal) Moving back to "In Development" until we resolve T162753
[22:44:50] Tool-Labs-tools-Other, Community-Tech-Tool-Labs, Epic: Convert all Labs tools to use cdnjs for static libraries and fonts - https://phabricator.wikimedia.org/T103934#3173676 (Krinkle)
[22:45:25] Tool-Labs-tools-Xtools, Community-Tech: [Epic] Rewrite XTools: Articleinfo - https://phabricator.wikimedia.org/T157602#3173678 (MusikAnimal)
[22:45:29] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Create XTools API with namespace endpoint, using JS to update to namespace selector - https://phabricator.wikimedia.org/T162754#3173677 (MusikAnimal)
[22:45:51] Tool-Labs-tools-Xtools, Community-Tech: [Epic] Rewrite XTools: Articleinfo - https://phabricator.wikimedia.org/T157602#3010525 (MusikAnimal)
[22:45:54] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Create XTools API with namespace endpoint, using JS to update to namespace selector - https://phabricator.wikimedia.org/T162754#3173629 (MusikAnimal)
[22:46:03] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Create XTools API with namespace endpoint, using JS to update to namespace selector - https://phabricator.wikimedia.org/T162754#3173629 (MusikAnimal)
[22:46:04] Tool-Labs-tools-Xtools, Community-Tech: Epic: Rewriting XTools - https://phabricator.wikimedia.org/T153112#3173686 (MusikAnimal)
[22:46:22] Tool-Labs-tools-Xtools, Community-Tech: Epic: Rewriting XTools - https://phabricator.wikimedia.org/T153112#2870106 (MusikAnimal)
[22:47:37] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Fix caching problems with XTools - https://phabricator.wikimedia.org/T162753#3173689 (MusikAnimal)
[22:47:38] Tool-Labs-tools-Xtools, Community-Tech: Epic: Rewriting XTools - https://phabricator.wikimedia.org/T153112#2870106 (MusikAnimal)
[22:56:05]
RECOVERY - Puppet run on tools-exec-1432 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:48:07] Tool-Labs-tools-Pageviews, Patch-For-Review: Add ability to query for legacy pageviews for projects - https://phabricator.wikimedia.org/T149358#3173810 (Nuria)