[00:10:07] Labs, Labs-Infrastructure, Monitoring, Patch-For-Review: keystone: Monitor existence and membership for certain projects and accounts - https://phabricator.wikimedia.org/T152708#2866830 (Andrew)
[01:45:29] PROBLEM - Puppet run on tools-worker-1017 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[02:03:49] PROBLEM - Free space - all mounts on tools-worker-1017 is CRITICAL: CRITICAL: tools.tools-worker-1017.diskspace._var_lib_docker.byte_percentfree (No valid datapoints found)tools.tools-worker-1017.diskspace.root.byte_percentfree (<100.00%)
[03:50:02] Labs, Labs-Infrastructure, Monitoring, Patch-For-Review: keystone: Monitor existence and membership for certain projects and accounts - https://phabricator.wikimedia.org/T152708#2867174 (Andrew)
[03:50:16] Labs, Labs-Infrastructure, Monitoring, Patch-For-Review: keystone: Monitor existence and membership for certain projects and accounts - https://phabricator.wikimedia.org/T152708#2857711 (Andrew) Open>Resolved All done, including the 'maybe'
[04:06:32] Labs, Horizon: Horizon has no logging or watchlist for changes to Puppet/Hiera data - https://phabricator.wikimedia.org/T153036#2867184 (scfc)
[04:09:43] Labs, Horizon: Horizon loses credentials every day - https://phabricator.wikimedia.org/T145703#2638694 (scfc) @brion: I believe this is the same issue as T130621 which was fixed. Is Horizon still losing credentials every day for you? Otherwise, please merge this task into T130621.
[09:48:14] hello!
cannot ssh into etytree-b.etytree.eqiad.wmflabs
[10:14:19] Tool-Labs-tools-Other, Wikisource, Bengali-Sites: OCR scripts need updating at tools labs by updating the "tesseract-ben" package - https://phabricator.wikimedia.org/T117711#2867649 (MarcoAurelio)
[12:37:30] Labs, Operations, video2commons: Consider mounting labs NFS labstore1003.eqiad.wmnet:/scratch for server-side uploads - https://phabricator.wikimedia.org/T153068#2868444 (zhuyifei1999)
[12:45:08] Labs, Operations, video2commons: Consider mounting labs NFS labstore1003.eqiad.wmnet:/scratch for server-side uploads - https://phabricator.wikimedia.org/T153068#2868471 (Dereckson) Current upload volume is 2 to 30 Gb per week. A dedicated volume for v2c and a read only mount would decrease labs/pro...
[12:53:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:27:01] Labs, Operations, video2commons: Consider mounting labs NFS labstore1003.eqiad.wmnet:/scratch for server-side uploads - https://phabricator.wikimedia.org/T153068#2868728 (zhuyifei1999)
[13:28:51] Labs, Operations, video2commons: Consider mounting labs NFS labstore1003.eqiad.wmnet:/scratch for server-side uploads - https://phabricator.wikimedia.org/T153068#2868444 (zhuyifei1999) Security issue found; don't do this yet.
[13:33:42] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:43:03] PROBLEM - Puppet staleness on tools-worker-1017 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0]
[14:32:25] hi I don't seem to be able to ssh into etytree-b.etytree.eqiad.wmflabs
[14:34:35] anyone can help?
[14:36:13] Epantaleo: it also refuses me, did anything happen here recently? andrewbogott ^ mind trying when you're about
[14:37:00] I changed some permissions (made epantaleo owner of some folders) that could cause this problem?
[14:37:10] seems unlikely
[14:37:44] thanks for your help... let me know if you have any idea
[14:40:07] I would like andrewbogott to try his key before I do anything more drastic, I'm curious
[14:43:14] ok
[14:43:58] in the meantime how can i access as root (once I can access again...)?
[14:44:36] you should be able to ssh in and sudo -i
[14:53:46] maybe interactive console works for you chase?
[14:59:46] ah mornin Krenair, yeah maybe so hm
[15:00:02] thanks, will try that
[15:04:09] hey
[15:04:17] yeah, if that fails, sometimes salt still works
[15:05:00] chasemp: I need to open doors 1111 and 8890, I'm cerating a security group
[15:05:05] *creating
[15:05:38] s/doors/ports Epantaleo but sure it's you're project no worries
[15:05:39] do I select CIDR or security group when adding a rule?
[15:05:43] this is local to your stuff
[15:05:54] thanks. not sure how to do that
[15:06:05] security group most likely but I'm not sure we have good docs on this atm
[15:06:33] what should I do?
[15:07:26] let's wait to see if we can get into the instance first so we can actually test the changes, no luck so far
[15:07:35] but we can help you out
[15:08:01] we usually use CIDR
[15:08:20] what machines do you want to allow access from?
[15:08:22] (PS1) Giuseppe Lavagetto: Snakeoil secrets for all pools (needed for PCC) [labs/private] - https://gerrit.wikimedia.org/r/326964
[15:10:38] (CR) Giuseppe Lavagetto: [V: 2 C: 2] Snakeoil secrets for all pools (needed for PCC) [labs/private] - https://gerrit.wikimedia.org/r/326964 (owner: Giuseppe Lavagetto)
[15:16:06] * andrewbogott waking up slowly
[15:17:00] morning andrew
[15:17:18] chasemp: I am installing a virtuoso server and I need to access it from the etytree-b at localhost:8890
[15:17:50] the machines don't use labs networking to route to localhost
[15:18:07] so no security groups are needed for that
[15:19:08] Epantaleo: I can't reach that instance either. Is that VM valuable or can you recreate it?
[15:19:38] hmmm...I would like not to kill it...
[15:19:52] I'll see if I can access it on the console
[15:19:57] thanks
[15:20:13] When you changed ownership of things, did you do any recursive changes? Like chown -R etc. ?
[15:20:20] yes
[15:20:56] ( andrewbogott my root key failed me so I assumed that was unlikely the cause -- but maybe )
[15:21:08] ok, maybe something related to ssh got changed… ssh does security checks to make sure ownerships/permissions are correct on keys &c.
[15:21:14] but I'm just guessing.
[15:21:51] not with our ldap integration
[15:22:28] you're right, I can't think of a chmod that would break root keys AND ldap keys
[15:22:39] well, maybe our AuthorisedKeyCommand program thing is checked
[15:22:47] you could -x the authkeys lookup script or something
[15:22:54] yeah
[15:29:53] yeah, the problem is 'bad ownership or modes for directory /usr'
[15:30:01] for both root key and and ldap key
[15:30:31] you're in with interactive console?
[15:30:34] Epantaleo: what's your goal here? Changing ownership of things outside your homedir is unusual...
[15:30:35] Krenair: yes
[15:31:14] I tried to become root but coudn't manage to somehow
[15:31:26] I'm installig a Virtuoso server in the /opt folder
[15:32:29] I don't know that I'm going to be able to save this
[15:32:36] and wasn't able to run scripts or create folders
[15:32:48] you did chown -R epantaleo * in /?
[15:32:58] yes- sorry...
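(Editor's sketch, not part of the log.) The 'bad ownership or modes for directory /usr' message above is the kind of refusal a path-safety check produces after a recursive chown. The Python below is a rough illustration only, assuming a directory counts as safe when it is owned by root and not writable by group or others. The names `component_is_safe` and `first_unsafe_component` are hypothetical and this is not sshd's actual code.

```python
import os
import stat


def component_is_safe(uid: int, mode: int, trusted_uids=(0,)) -> bool:
    """Return True if the owner is trusted and neither group nor others can write.

    Assumption for this sketch: only uid 0 (root) is trusted by default.
    """
    writable_by_others = bool(mode & (stat.S_IWGRP | stat.S_IWOTH))
    return uid in trusted_uids and not writable_by_others


def first_unsafe_component(path: str, trusted_uids=(0,)):
    """Walk from `path` up to the filesystem root.

    Returns the first directory that fails the check, or None if all pass.
    """
    current = os.path.abspath(path)
    while True:
        st = os.stat(current)
        if not component_is_safe(st.st_uid, st.st_mode, trusted_uids):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent
```

After something like `chown -R epantaleo *` in `/`, a check of this shape would flag `/usr` because its owner is no longer root, which would explain why both the root key and the LDAP key were rejected.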
[15:33:09] ok, this is never going to work properly again
[15:33:10] oh well, that is solves it
[15:33:12] you'll need to rebuild it
[15:33:23] ok
[15:33:28] will do that
[15:33:39] thanks for your help anyway, you tried
[15:33:40] I don't think just changing things back to root will fix anything, since not everything there was owned by root previously
[15:34:00] no it's hosed for sure after that
[15:34:24] I've tried to recover from similar in past lives and it's never a reasonable path
[15:49:34] chasemp: I'm lauching a new instance and I'm not sure what "Instance Boot Source" means, I can choose either "Boot from Image" or "Boot from snapshot"
[15:50:39] Epantaleo: 'snapshot' doesn't work, just use 'image'
[15:50:47] and select Jessie unless you know for sure you need Ubuntu
[15:50:47] thanks!
[15:50:55] ok
[15:58:52] (PS1) Andrew Bogott: Keystone: Publish credentials for novaobserver account [labs/private] - https://gerrit.wikimedia.org/r/326979 (https://phabricator.wikimedia.org/T150092)
[15:59:19] Krenair: ^ is scary but I think I've been procrastinating long enough
[15:59:30] probably
[16:00:59] yesterday I added an icinga check that will alert if anyone (read: me) accidentally adds an actual functional role to the novaobserver account
[16:01:18] (CR) Andrew Bogott: [V: 2 C: 2] Keystone: Publish credentials for novaobserver account [labs/private] - https://gerrit.wikimedia.org/r/326979 (https://phabricator.wikimedia.org/T150092) (owner: Andrew Bogott)
[16:25:42] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Celia sumbane was created, changed by Celia sumbane link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Celia_sumbane edit summary: Created page with "{{Tools Access Request |Justification=i will use it to write and edit |Completed=false |User Name=Celia sumbane }}"
[17:18:58] (PS1) Giuseppe Lavagetto: Rename the apaches.svc keys [labs/private] -
https://gerrit.wikimedia.org/r/327002
[17:20:03] (CR) Giuseppe Lavagetto: [V: 2 C: 2] Rename the apaches.svc keys [labs/private] - https://gerrit.wikimedia.org/r/327002 (owner: Giuseppe Lavagetto)
[17:48:07] Labs, Operations: Initial OpenStack Neutron PoC deployment in Labtest - https://phabricator.wikimedia.org/T153099#2869590 (chasemp)
[17:49:17] Labs, Labs-Infrastructure: Deprecate precise instances in Labs - https://phabricator.wikimedia.org/T143349#2869602 (chasemp)
[18:20:39] heya
[18:20:47] i'm trying to spawn some new instances in the math project
[18:20:54] but they seem to be having some puppet cert error
[18:20:56] and are getting stuck
[18:21:02] Error: Could not request certificate: Connection refused - connect(2)
[18:21:25] looks like before it got stuck, it ran:
[18:21:25] + puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff --waitforcert=10 --certname=hadoop001.math. --server=
[18:21:57] andrewbogott: ^ ?
[18:22:49] ottomata: they're using a local puppetmaster?
[18:23:08] don't think so, they are brand new instances
[18:23:35] i don't see any special project puppet configuration for that
[18:23:37] looking...
[18:24:25] andrewbogott: yeah, no special puppetmaster as far as I can tell
[18:25:37] andrewbogott: https://horizon.wikimedia.org/project/instances/092e1e17-3e16-4e02-a70f-44295792122b/console
[18:25:56] hostname: Name or service not known
[18:25:56] + domain=
[18:25:56] + fqdn=hadoop001.math.
[18:25:56] ??
[18:26:25] i'm going to leave that one running, will hard reboot hadoop003 and see what happens
[18:26:50] chasemp: I created a new instance etytree-2 and I'm trying to ssh into it
[18:27:00] but I get channel 0: open failed: administratively prohibited: open failed
[18:27:11] yeah, its dhcp is screwed up somehow
[18:27:14] I'm looking
[18:27:17] ok
[18:32:40] andrewbogott: do you know why I cannot ssh?
[18:32:56] Epantaleo: were you able to access the instance previously?
[18:33:11] I created it some hours ago
[18:33:16] i never accessed it
[18:33:28] andrewbogott, hey didn't we make a DHCP change recently?
[18:33:55] Epantaleo: ok, this is probably something on my end, give me a few minutes
[18:34:00] and then you'll need to recreate
[18:34:11] Krenair: yes, but I don't think this is related, it looks like we're hitting some arbitrary quota
[18:34:11] oh ok thanks
[18:34:19] ok
[18:56:39] Krenair: ok, I see my dns entries in the designate records table...
[18:56:46] https://www.irccloud.com/pastebin/umY1CSeL/
[18:57:02] But dig doesn't seem to see them...
[18:57:08] is that your experience as well?
[18:57:43] alex@alex-laptop:~$ dig +short dnsqt-1.testlabs.eqiad.wmflabs @labs-ns0.wikimedia.org
[18:57:43] alex@alex-laptop:~$
[18:57:44] indeed
[18:57:57] hm
[18:59:01] andrewbogott, can you also copy the field names?
[18:59:08] pdns says Unable to AXFR zone '68.10.in-addr.arpa' from remote '208.80.155.117:5354' (AhuException): GSQLBackend unable to feed record: Failed to execute mysql_query, perhaps connection died? Err=1: Out of range value for column 'id' at row 1
[18:59:37] field names are | id | created_at | updated_at | version | data | domain_id | managed | managed_resource_type | managed_
[19:00:24] well, that'd be the reverse entry
[19:00:29] does it have one for eqiad.wmflabs too?
[19:00:42] it's certainly the kind of error I'd be expecting in such a situation
[19:01:10] Labs, Tool-Labs, Operations, Phabricator, and 2 others: Install Arcanist in toollabs::dev_environ - https://phabricator.wikimedia.org/T139738#2869974 (Dereckson) Open>Resolved Arcanist is available on main tools. server and we don't have a use case for the tasks grid to use arc, so we can...
[19:03:02] Yes, also for eqiad.wmflabs
[19:03:09] that's pdns being unable to query the db, right?
[19:03:27] being unable to insert I think?
[19:04:01] oh, hm, seems so
[19:04:11] the drive isn't full...
[19:05:57] andrewbogott, I don't get this though
[19:06:08] the relevant entries exist in the database
[19:06:14] so what is it trying to insert?
[19:06:41] I think that inserting into designate is working — but then designate sends an update alert to pdns and pdns updates its local database (which is just a cache basically)
[19:06:44] and I think it's that local update that's failing
[19:06:56] ah
[19:07:10] lemme take a look at that on the labtest machines
[19:07:29] Do you know how I can ask this db to log activity?
[19:08:02] you want to log queries going through mysql?
[19:08:23] actually, I've always wondered the details of that. something something binlogs? the DBAs would know
[19:08:56] I once had Sean pull a bunch of write queries out of the logs from the MW production database to repair things with
[19:09:06] godog: the error from pdns is
[19:09:07] Unable to AXFR zone '68.10.in-addr.arpa' from remote '208.80.155.117:5354' (AhuException): GSQLBackend unable to feed record: Failed to execute mysql_query, perhaps connection died? Err=1: Out of range value for column 'id' at row 1
[19:09:19] that's pdns trying to sync to a database table that's on the same host as pdns
[19:09:22] labservices1001.wikimedia.org
[19:09:39] so, on labtestservices2001
[19:09:47] records table
[19:09:48] | id | int(11) | NO | PRI | NULL | auto_increment |
[19:11:48] ah, I suppose the errors started recently (?)
[19:12:09] yes
[19:12:17] you know something about this godog?
[19:12:50] I'm only guessing the id column overflowed
[19:12:52] godog: yes, just today as best I can tell
[19:12:57] seems possible, yeah
[19:13:07] yes
[19:13:14] and I'm assuming this is the record table
[19:13:28] how many entries are in there andrewbogott?
[19:14:06] 4229
[19:14:10] not exactly a round number
[19:14:19] and max(id) ?
[19:14:36] 2147483543
[19:14:43] and Auto_increment: 2147483648
[19:14:47] yeah that'll be integer overflow
[19:14:48] from show table status
[19:14:59] that looks like 2^31
[19:14:59] hm? 2147483543 is a round number?
[19:15:02] ah
[19:15:13] huh
[19:15:22] in fact that auto_increment value is exactly 2^31
[19:15:57] https://dev.mysql.com/doc/refman/5.7/en/integer-types.html
[19:16:30] I guess the question is, how have we got to 2.1 billion dns records?
[19:16:44] CI?
[19:17:16] guess we could change it to BIGINT
[19:17:24] I can't imagine how we got this high
[19:17:31] but yes, just changing the type on the table is probably the right thing
[19:17:37] signed int is Max = +2,147,483,647
[19:17:39] (although in theory that table was created by pdns :( )
[19:17:59] each instance has what, 2 entries?
[19:18:16] maybe more like 4 or 5
[19:18:21] but still that doesn't get us to the billions
[19:18:30] that is a stupidly big number to hit
[19:18:46] godog: do you know offhand how to change the type of that column?
[19:18:49] eqiad.wmflabs A and 68.10.in-addr.arpa PTR?
[19:18:57] it'll be an alter table statement I guess andrewbogott
[19:19:08] I bet that when it gets an axfr notice it refreshes every record
[19:19:10] rather than just the new one
[19:19:22] andrewbogott: yeah what Krenair said, alter table
[19:19:27] that would get us closer, although 'billions' still seems unlikely
[19:19:40] godog: having never done this before, can I ask you to make the change?
[19:20:17] We have 4229 current entries, but have hit the 2^31 mark over 5 years
[19:20:31] or not, I think pdns is much newer right?
[19:20:54] yeah, more like 2 years I think.
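(Editor's sketch, not part of the log.) The overflow diagnosis above can be checked with a few lines of arithmetic. This Python snippet uses only the numbers quoted in the chat; the helper name `fits_signed_int32` is made up for illustration.

```python
INT32_MAX = 2**31 - 1             # largest signed 32-bit value: 2147483647
INT64_MAX = 2**63 - 1             # largest signed 64-bit value (BIGINT range)

max_id = 2147483543               # max(id) quoted in the log
next_auto_increment = 2147483648  # Auto_increment from `show table status`


def fits_signed_int32(value: int) -> bool:
    """True if `value` is representable as a signed 32-bit integer."""
    return -(2**31) <= value <= INT32_MAX


print(next_auto_increment == 2**31)            # True: exactly 2^31
print(fits_signed_int32(max_id))               # True: the last id still fit
print(fits_signed_int32(next_auto_increment))  # False: the next id does not
print(next_auto_increment <= INT64_MAX)        # True: a 64-bit column has room
```

The next id that auto_increment wants to hand out is one past the signed 32-bit maximum, which matches the 'Out of range value for column id' error from pdns.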
[19:20:54] andrewbogott: not sure I feel confident enough to make the change myself or what that would mean to existing data
[19:21:02] godog: ok, I'll do it then
[19:21:11] (as it happens this db is largely a cache so I'm not worried)
[19:21:41] Krenair: if we refresh once per minute, that's 5.7 million records per day
[19:21:52] So 2 billion isn't out of the question, it's the right order of magnitute
[19:22:01] if it flushes every record each time
[19:22:08] wait
[19:22:13] on every change
[19:22:19] it truncates and re-inserts everything?
[19:22:24] I don't know
[19:22:35] I'm just saying — that would explain us getting an ID of that magnitude
[19:23:41] So we're just talking alter table pdns modify id BIGINT;
[19:23:42] yes?
[19:23:55] andrewbogott: you have pdns in labtest to run it on first?
[19:24:00] oh, wait, alter table record modify id bigint
[19:24:04] chasemp: yes, trye :)
[19:24:06] true :)
[19:24:29] yes I think that's right
[19:24:33] also if it is largely a cache it can be nuked too ?
[19:24:37] PROBLEM - Puppet run on tools-docker-registry-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:24:39] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:24:44] godog: also true
[19:24:48] maybe take a copy of the table, then change it on labtest
[19:25:27] I'm going to create a new VM on labtest, make sure everything is happy
[19:25:31] reading teh docs it seems like
[19:25:35] ALTER TABLE pdns MODIFY id BIGINT NOT NULL;
[19:26:33] what does the NOT NULL do?
[19:26:58] The default for columns is NULLABLE, so if you have a NOT NULL column, don't forget to use "MODIFY columnname INTEGER NOT NULL" or else you will change your column from NOT NULL to NULL.
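(Editor's sketch, not part of the log.) The back-of-the-envelope estimate above can be reproduced in Python. It assumes, as the chat speculates, that every one of the 4229 records is re-inserted on each refresh and that refreshes happen once per minute; neither is confirmed in the log. The function names are hypothetical.

```python
RECORDS = 4229                # current row count quoted in the log
REFRESHES_PER_DAY = 24 * 60   # assumption: one full refresh per minute
ID_LIMIT = 2**31              # first id that no longer fits a signed INT


def ids_consumed_per_day(records: int, refreshes_per_day: int) -> int:
    """Ids burned per day if every record is re-inserted on every refresh."""
    return records * refreshes_per_day


def days_to_exhaust(limit: int, per_day: int) -> float:
    """Days until the auto-increment counter reaches `limit`."""
    return limit / per_day


per_day = ids_consumed_per_day(RECORDS, REFRESHES_PER_DAY)
print(per_day)                             # roughly 6 million ids per day
print(days_to_exhaust(ID_LIMIT, per_day))  # about a year
```

With exactly 4229 records this comes out near 6 million ids per day rather than the 5.7 million quoted, but the order of magnitude is the same, and it puts exhaustion at around a year of such refreshes, which is consistent with a deployment that is "more like 2 years" old.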
[19:27:55] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2870074 (scfc)
[19:29:24] pdns on labtest now says
[19:29:25] Unable to AXFR zone '196.10.in-addr.arpa' from remote '208.80.153.48:5354' (AhuException): GSQLBackend unable to feed record: Failed to execute mysql_query, perhaps connection died? Err=1: Duplicate entry '0' for key 'PRIMARY'
[19:31:33] bah, and changing it back to 'INT' that still happens
[19:32:37] Oh, I bet I need to set it to BIGINT AUTO_INCREMENT
[19:32:42] rather than BIGINT NOT NULL
[19:32:57] oh
[19:32:58] yes
[19:33:02] BIGINT NOT NULL AUTO_INCREMENT
[19:33:29] the pdns guide about setting up the db doesn't mention NOT NULL
[19:33:44] hey yall, not following along, let me know if I can try to get into my new instances (or if I have to recreate them)
[19:33:50] well, the existing one had 'NO' under the 'Null' column
[19:33:56] so probably best keep it
[19:33:59] ok
[19:34:06] yep, that fixed axfr
[19:34:19] andrewbogott: I can text Manuel?
[19:34:21] ok
[19:34:23] in labtest?
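(Editor's sketch, not part of the log.) The "Duplicate entry '0' for key 'PRIMARY'" error above fits the pattern where a MODIFY that omits AUTO_INCREMENT silently drops it, so rows inserted without an explicit id all get the same default. The toy Python model below only simulates that behaviour. It is not MySQL, the class `ToyTable` is invented for illustration, and the implicit default of 0 is an assumption about how the server was configured.

```python
class ToyTable:
    """Minimal stand-in for a table with an integer primary key `id`."""

    def __init__(self, auto_increment: bool, next_id: int = 1):
        self.auto_increment = auto_increment
        self.next_id = next_id
        self.rows = {}

    def insert(self, data, row_id=None):
        """Insert `data`; when no id is given, pick one like the column would."""
        if row_id is None:
            if self.auto_increment:
                row_id = self.next_id
                self.next_id += 1
            else:
                # Assumed implicit default for a NOT NULL integer column.
                row_id = 0
        if row_id in self.rows:
            raise KeyError(f"Duplicate entry '{row_id}' for key 'PRIMARY'")
        self.rows[row_id] = data
        return row_id


# Column modified to BIGINT NOT NULL, auto-increment attribute lost:
broken = ToyTable(auto_increment=False)
broken.insert("first record")        # stored with id 0
try:
    broken.insert("second record")   # also wants id 0
except KeyError as err:
    print(err)

# Column modified to BIGINT NOT NULL AUTO_INCREMENT:
fixed = ToyTable(auto_increment=True)
print(fixed.insert("first record"), fixed.insert("second record"))
```

This is why the column definition in the MODIFY has to restate every attribute it should keep, including NOT NULL and AUTO_INCREMENT.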
[19:34:35] it makes sense that it needs auto_increment of course
[19:34:58] yeah, everything is good in labtest
[19:35:01] I don't think us mucking up labtest pdns caching is something that calls for paging dbas
[19:35:08] yep
[19:35:21] I'm going to apply that change in production now (after dumping the dbs)
[19:35:44] lgtm
[19:36:35] sure I meant to page in hte case of we didn't know what to do for prod
[19:36:54] ok
[19:36:57] bah, if only I could make mysqldump work
[19:37:48] I'm going to get dinner now
[19:38:05] ok
[19:38:07] FWIW I remember having sort-of the same problem in production too with too big ids, maybe it was eventlogging
[19:38:42] godog: guatemala day deviantart google that :)
[19:39:40] Tool-Labs-tools-Xtools: Epic: Rewriting XTools - https://phabricator.wikimedia.org/T153112#2870106 (MusikAnimal)
[19:40:09] chasemp: oh my
[19:40:13] Tool-Labs-tools-Xtools: Epic: Rewriting XTools - https://phabricator.wikimedia.org/T153112#2870124 (MusikAnimal)
[19:40:16] Tool-Labs-tools-Xtools, Collaboration-Team-Triage, Flow: Add Flow contributions to Xtools - https://phabricator.wikimedia.org/T136950#2870123 (MusikAnimal)
[19:40:33] that must have been a fun one to postrtem
[19:40:35] Tool-Labs-tools-Xtools: Epic: Rewriting XTools - https://phabricator.wikimedia.org/T153112#2870106 (MusikAnimal)
[19:40:37] Tool-Labs-tools-Xtools, Collaboration-Team-Triage, Flow, Tools-Global-user-contributions: Add Flow contributions to GUC and Xtools - https://phabricator.wikimedia.org/T114777#2870125 (MusikAnimal)
[19:40:51] Tool-Labs-tools-Xtools: Epic: Rewriting XTools - https://phabricator.wikimedia.org/T153112#2870106 (MusikAnimal)
[19:40:53] Tool-Labs-tools-Xtools: Add new semi-automated tools - https://phabricator.wikimedia.org/T97647#2870128 (MusikAnimal)
[19:44:55] andrewbogott: any luck in prod?
[19:45:25] dns is working now! But new instances still aren't coming up properly.
Much to my surprise
[19:46:01] andrewbogott: is it time to suspend new instance creation until we are sure things are working?
[19:46:35] I don't know yet… let me make sure that things are genuinely broken
[19:46:46] it seems so unlikely that there would be two unrelated problems at the same time...
[19:46:46] k
[19:46:50] agreed
[19:50:38] ok, pdns is still screwed up on labservices1002
[19:50:42] Not sure what I missed, looking
[19:57:30] ottomata, Epantaleo, things are still slightly broken but if you recreate your instances now I'd expect them to work.
[19:57:41] ok, recreate, not just restart?
[19:57:48] andrewbogott: if i delete, can I recreate with the same names?
[19:57:58] sure, just give it a few minutes between
[19:58:00] ok
[20:04:36] RECOVERY - Puppet run on tools-docker-registry-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:04:40] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:06:00] godog, eventlogging possibly yeah, a *lot* of data goes in there
[20:10:43] ok
[20:13:01] Labs-project-Wikistats: all kinds of mixed issues with miraheze table (was: allthetropes is not updating on wikistats) - https://phabricator.wikimedia.org/T146712#2870312 (Dzahn) @Dzahn Please let me know if you need anything else from us Just working API URLs under the miraheze.org domain.
[20:14:54] ottomata: working ok?
[20:17:17] andrewbogott: sorry, i haven't tried, deleted instances and was waiting
[20:17:19] making new ones...
[20:18:51] Labs-project-Wikistats: all kinds of mixed issues with miraheze table (was: allthetropes is not updating on wikistats) - https://phabricator.wikimedia.org/T146712#2870331 (Dzahn) So this ticket has become a little too general. Is it about fixing the special wikis that don't have URLs under miraheze? Well,...
[20:26:45] ottomata: andrewbogott verdict on new instances?
[20:29:55] chasemp: andrewbogott looks good, i logged into a new instance
[20:33:25] andrewbogott: ECDSA host key for etytree-b.etytree.eqiad.wmflabs has changed and you have requested strict checking.
[20:33:32] Krenair, godog, thanks for help with that weird problem
[20:33:35] Host key verification failed.
[20:33:39] Epantaleo: yep, that's because you're connecting to a different host
[20:33:46] your ssh client stored a record of the key for the older host
[20:33:54] this will happen anytime you create a new server with the same name as an old one
[20:34:06] oh ok thanks
[20:35:05] works! great, thanks
[20:50:13] andrewbogott chasemp hi, it seems i carn't ssh to login.tools.wmflabs.org.
[20:50:27] ah never mind, it is taking longer then usual.
[20:50:49] it is taking a long time
[20:51:16] Yep
[20:51:36] carn't do any commands.
[20:52:55] nfs is fine I think
[20:53:06] ok
[20:54:13] thanks for your help earlier today guys
[20:54:18] !log tools reboot bastion-03 as unresponsive
[20:54:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:54:22] it looks like one of my new nodes just had a busted first puppet run
[20:54:31] I was in the process of logging in
[20:54:31] because of stuff that was applied to the old node with the same name
[20:54:34] had the motd but no prompt
[20:54:38] is there a way to force a puppet run?
[20:54:43] i think the next puppet run will fix things
[20:54:46] ottomata, without being able to ssh in?
[20:54:47] but atm i can't log in
[20:54:48] yea
[20:54:51] wait, you're ops right?
[20:54:55] ja
[20:55:01] you can use the interactive console
[20:55:05] !
[20:55:05] hello :)
[20:55:07] load average: 13.24, 12.09, 9.21
[20:55:09] you can use salt
[20:55:12] bastinon-03
[20:55:30] Krenair: yeah?
[20:55:33] (actually if it had a custom saltmaster you may be able to do that without being ops, depending on how far puppet got I suppose)
[20:55:36] don't know anything about interactive console
[20:55:51] ottomata, https://wikitech.wikimedia.org/wiki/OpenStack#Get_a_web-based_console_and_root_password
[20:56:30] labcontrol1001
[20:56:30] ?
[20:56:32] where it says 'testlabs' in that example, use your project name
[20:56:33] yes
[20:56:52] .eqiad.wmnet?
[20:56:54] .eqiad.wmflabs?
[20:57:10] can't get into either
[20:57:24] ah
[20:57:26] .wikmiedia.org
[20:57:26] haha
[20:57:26] ok
[20:57:48] what happens:
[20:57:49] Broadcast message from root@tools-bastion-03 (unknown) at 20:54 ...
[20:57:50] The system is going down for power off NOW!
[20:58:11] NICE thanks Krenair
[20:58:13] very cool
[20:58:16] ottomata, yeah labcontrol1001 is a production host
[20:58:23] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:58:27] that's right doctaxon
[20:58:47] can't remember an announce
[20:58:55] doctaxon, chase logged it above
[20:59:38] Hm, its still asking my user for a password
[20:59:42] sigh, i'll delete and recreate again :)
[20:59:48] for what do we have a mailing list?
[21:00:11] doctaxon: for announcements, not for 'things need to be fixed right now'
[21:00:21] ottomata, you got puppet succeeding but login not working?
[21:00:24] yeah
[21:00:58] oh, it's not planned? okay...
[21:02:16] ottomata, that's strange...
[21:02:26] but nothing I can do about it, no labcontrol access
[21:02:52] aye, s'ok, hopefully it'll just come up this next time
[21:02:52] thanks
[21:02:55] i learned somethign new!
[21:04:10] Labs, Labs-Kubernetes, Tool-Labs, Community-Tech-Tool-Labs: Develop evaluation criteria for comparing Platform as a Service (PaaS) solutions - https://phabricator.wikimedia.org/T136265#2870527 (bd808) Tenant oriented criteria: * Ability to customize container by installing additional applications...
[21:08:22] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:12:01] andrewbogott: not sure you are the right person to ask.. I want to put some proxy passes in the apache conf file
[21:12:19] any suggestin on which file to edit?
[21:12:41] I'm probably not the right person
[21:12:43] /etc/apache2/apache2.conf would be ok?
[21:12:58] Maybe, it depends on if you want to host multiple sites on that one host
[21:13:00] :) ok thanks anyway
[21:13:08] only one
[21:15:24] a different thing: I want to open ports 8890 and 1111 and then access page http://localhost:8890/conductor/
[21:16:32] so I'm adding a rule
[21:16:49] custom tcp rule
[21:17:21] what do I do for CDIR?
[21:17:28] CIDR
[21:18:12] what hosts do you want to open it to?
[21:18:29] wait
[21:18:30] stop
[21:18:43] why are you adding a security group rule to have the machine access localhost? didn't we have this conversation already?
[21:19:07] etytree-b
[21:19:39] we interrupted the conversation
[21:19:48] I understand that
[21:20:16]