[01:01:08] Labs, Wikimedia-Labs-wikitech-interface: Wikitech registration requires labs shell access - https://phabricator.wikimedia.org/T88092#1139741 (scfc) There's a lot of cruft in wikitech; I think you're right that the message can just be trimmed down.
[01:28:54] is it possible to remove your original email from gerrit?
[02:02:01] is it possible to remove your original email from gerrit?
[02:26:36] is it possible to remove your original email from gerrit?
[02:27:30] you said it three times
[02:27:35] you do not need to repeat yourself
[02:27:41] if someone is around, they will answer
[02:27:55] what do you mean by your "original email"?
[02:28:50] i've been repeating myself about every half hour because people are joining. what i mean is the first email on the list in the "identities" tab in settings, which is still searchable after adding another email
[02:29:36] I doubt it.
[02:29:55] Who are you?
[02:30:16] i'd rather not say for privacy reasons
[02:31:37] hm
[02:31:41] I think it might be possible actually
[02:31:59] It's probably your email in LDAP... which means it'd be the email you set in wikitech.wikimedia.org? probably.
[02:33:04] it is set by that email but changing the email on wikitech doesn't do anything to gerrit
[02:35:05] Tool-Labs: Provide a status page (list) of all active proxy definitions - https://phabricator.wikimedia.org/T88216#1139805 (scfc) On second thought … `proxylistener` has the benefit to delete a proxy entry when the attached web service is terminated and closes the socket. On the other hand it has the disad...
[02:35:49] oh, is it the "Preferred Email" setting in Contact Information?
[02:36:33] changing that changes the default email shown when someone searches you with user: but it keeps the original in the list of identities and you can still do an email search with user on the old address
[02:38:00] so people with your email can find your gerrit account? zomg
[02:39:13] the concern is that people can confirm a link between me and my otherwise anonymous account
[02:55:54] PROBLEM - Free space - all mounts on tools-dev is CRITICAL: CRITICAL: tools.tools-dev.diskspace._var.byte_percentfree.value (<12.50%)
[03:01:59] !log tools wiped out atop.log on tools-dev because /var was filling up
[03:02:04] Logged the message, dummy
[03:10:53] RECOVERY - Free space - all mounts on tools-dev is OK: OK: All targets OK
[04:01:53] PROBLEM - Free space - all mounts on tools-dev is CRITICAL: CRITICAL: tools.tools-dev.diskspace._var.byte_percentfree.value (<22.22%)
[04:49:10] Tool-Labs: Memory Exhausted Near / Tool labs error while querying with Python - https://phabricator.wikimedia.org/T93074#1139900 (Springle) Are you using prepared statements at all?
[06:37:36] Labs, Tool-Labs: Implement 'webservice2 status' - https://phabricator.wikimedia.org/T93560#1139967 (Krinkle) NEW
[06:37:50] Labs, Tool-Labs: Make webservice2 write out a bigbrotherrc file - https://phabricator.wikimedia.org/T90574#1139976 (Krinkle) The `webservice2` program should also have a `status` method. ---- {T93560}
[06:43:01] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[06:49:49] Tool-Labs: Toolserver tilserver (toolserver.org/tiles/) redirect - https://phabricator.wikimedia.org/T86739#1139979 (Krinkle) It seems http://a.www.toolserver.org/tiles/osm -> http://a.tiles.wmflabs.org/osm/ is already in place. These are in place too, the redirects work http://a.www.toolserver.org/tiles/hi...
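The tools-dev disk alerts above compare the share of free bytes on /var against a threshold (12.50% at 02:55, 22.22% at 04:01). What follows is a minimal Python sketch of that kind of check. It assumes a single "CRITICAL below N%" rule and uses the standard library's `shutil.disk_usage`. It is not the Shinken/Graphite check that actually produced these alerts.

```python
import shutil


def percent_free(total_bytes: int, free_bytes: int) -> float:
    """Free space as a percentage of the filesystem's total size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def classify(pct_free: float, critical_below: float = 12.5) -> str:
    """Return CRITICAL when free space drops under the threshold, OK otherwise."""
    return "CRITICAL" if pct_free < critical_below else "OK"


def check_mount(path: str, critical_below: float = 12.5) -> str:
    """Check one mount point; shutil.disk_usage reports total/used/free in bytes."""
    usage = shutil.disk_usage(path)
    pct = percent_free(usage.total, usage.free)
    return f"{classify(pct, critical_below)}: {path} {pct:.2f}% free"


if __name__ == "__main__":
    # "/" keeps the example portable; the alerts above watch /var.
    print(check_mount("/"))
```

Raising the threshold (the 04:01 alert uses 22.22%) only changes `critical_below`, so the same mount can flip to CRITICAL without any change on disk.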
[06:51:18] PROBLEM - Puppet failure on tools-webgrid-06 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [0.0]
[06:56:48] PROBLEM - Puppet failure on tools-exec-gift is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:07:53] RECOVERY - Puppet failure on tools-exec-catscan is OK: OK: Less than 1.00% above the threshold [0.0]
[07:21:19] RECOVERY - Puppet failure on tools-webgrid-06 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:21:51] RECOVERY - Puppet failure on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[07:27:37] Warning: mysqli_connect(): (HY000/1129): Host '10.68.16.30' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts' in /data/project/wikitrends/projects/wikitrends/render/cache.php on line 20
[07:27:42] guys...
[07:29:11] tools-exec-01, if anyone cares
[09:49:11] Labs: Ensure that opsen are paged on failure of labstore1001's NFS service - https://phabricator.wikimedia.org/T76402#1140159 (mark) @Coren: why is this still not done 4 months after the fact?
[10:21:43] Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1140219 (Teslaton) And 4+ hour outage right now: http://tools.freeside.sk/monitor/http-kmlexport.html
[10:40:39] Tool-Labs: Toolserver tilserver (toolserver.org/tiles/) redirect - https://phabricator.wikimedia.org/T86739#1140254 (Nemo_bis) >>! In T86739#1139979, @Krinkle wrote: > However the target is 404 Not Found, so it looks like something the tiles project needs to add? What makes you think that http://a.tiles.wmf...
[11:58:44] Labs: Ensure that opsen are paged on failure of labstore1001's NFS service - https://phabricator.wikimedia.org/T76402#1140337 (yuvipanda) Also needs alerts for network saturation, IO saturation, CPU spikes, etc etc etc. The only alerts we seem to have now are the ones about network port saturation on the swit...
[12:04:29] hi YuviPanda. I've been trying to mount my tool labs space as an ftp disk as I do normally when I want to code. This time I haven't been able to do it during the entire morning. is it related to the host change from yesterday? Thanks.
[12:04:55] marcmiquel: that’s possible, yeah. I am not sure how you mount something as an ftp disk, however - we don’t support ftp at all (only sftp)
[12:05:20] marcmiquel: you might have to dig down where ssh fingerprints are checked in the tool you are using and ask it to verify the new one...
[12:05:39] yeh, sorry, i mean sftp. i use 'transmit' with mac
[12:05:50] it works perfectly to check the sftp
[12:05:50] ah, right. I haven’t used it so I’m not sure
[12:05:52] but it fails to mount it as a disk
[12:05:58] marcmiquel: consider emailing labs-l?
[12:06:02] or ask transmit support?
[12:06:08] "Mount as Disk"
[12:06:17] * YuviPanda has to get on a flight now
[12:06:20] very comfortable to code locally
[12:06:26] yeah, I could guess :)
[12:06:35] ok
[12:06:40] anyway, off now. Email labs-l and / or transmit support :)
[12:07:02] okkkk
[12:07:08] have a good flight
[13:18:55] Coren: https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=labstore1001.eqiad.wmnet&m=cpu_report&s=descending&mc=2&g=network_report&c=Labs+NFS+cluster+eqiad
[13:19:05] why don't we have an alert for that still?
[13:19:11] YuviPanda|flight: too :)
[13:19:50] Sigh. True true true. I have been doing way too many things.
[13:19:57] It flipped during the weekend too
[13:20:01] paravoid: Because setting that up has always ended up being just below the top list. :-( Lemme up the priority of this so that it gets done next.
[13:20:06] I'll do it today if coren doesn't get to it
[13:21:27] Right now I'm trying to fiddle things so that rsync doesn't explode trying to replicate between DCs.
[13:21:59] That's my last step to have fully working replication.
[13:46:01] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:50:57] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 755582 bytes in 7.211 second response time
[13:58:13] Who actually runs the 'maps' project?
[14:03:56] is that what creates the labstore spikes?
[14:05:54] paravoid: I don't know - atm I'm working on the replication and I wanted to know if there are cache directories that can safely be excluded from the rsync in there because rebuildable. The maps project accounts for 52 million files and is one of the bigger causes of rsync's woes.
[14:06:30] labstore's network is overloaded every 10'
[14:06:38] *saturated
[14:06:52] the alert can wait, this probably shouldn't ;)
[14:07:06] paravoid: Yeah, I'm trying to keep an eye to catch whatever is doing it in the act.
[14:07:27] Is there any staff connected?
[14:08:10] NeoMahler: Many, but "staff" is wide. What do you need help with?
[14:08:36] I need somebody to kill all process in rc-vikidia project...
[14:09:08] When I try it it says me that no process is currently running...
[14:09:12] but there is!
[14:09:51] NeoMahler: I'll need a bit more context to figure out how I can help you. What, exactly, are you trying to do on which instance?
[14:10:51] In rc-vikidia there are 3 process running (all named rc-vikidia, I don't know why there are 3 and not one...)
[14:10:58] and I need to stop them
[14:11:29] but when I do qdel it says me that there isn't any job named rc-vikidia
[14:11:54] Oh! You mean jobs in tool labs!
[14:11:59] yes XD
[14:12:29] When you have more than one job with the same name, it's usually better to qdel them by job id (the number in the first column of qstat output)
[14:12:42] Try this?
[14:12:45] ok, I'm trying
[14:13:18] PROBLEM - SSH on tools-exec-01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:13:51] * Coren wonders why shinken is on crack.
[14:14:40] oh!
[14:14:43] ok coren :)
[14:15:23] uhm...
[14:16:00] is it immediately the deletion?
[14:16:21] ah it has deleted all jobs
[14:16:23] thanks ;)
[14:16:32] NeoMahler: It may take several seconds; the gridengine schedules the deletion immediately but it may take some time for the process to die and for the exec node to collect it.
[14:16:41] ok
[14:18:09] RECOVERY - SSH on tools-exec-01 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[14:26:23] paravoid: It's a job that's started by cron; but every time it starts it does so from a different place so it's really hard to catch it.
[14:41:34] Wikimedia-Labs-wikitech-interface: Include role::analytics::hadoop roles in default list of labs puppet groups - https://phabricator.wikimedia.org/T70391#1140566 (Ottomata) Yes! The Hadoop ones should be good to go for this. Hm, except, they probably should have better hiera intergration. I am for it eith...
[15:29:17] hi guys
[15:29:46] i had a script failing with a memoryerror and i was wondering whether it was because of my toollabs instance went out of memory or the mysql
[15:34:32] extract_cira_abroadrank(mysql_cur, list_of_lang_to_examine, llengua, articlelangs_titles[langu], page_title, output_file7)
[15:34:32] File "cira_abroad.py", line 156, in extract_cira_abroadrank
[15:34:32] mysql_cur.execute(query,tuple(user_asstring))
[15:34:33] File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
[15:34:36] self.errorhandler(self, exc, value)
[15:34:38] File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
[15:34:40] raise errorclass, errorvalue
[15:34:42] MemoryError
[15:34:43] anyone has any idea?
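At 14:12:29 Coren advises deleting duplicate jobs by the job id in qstat's first column, not by name, and this can be scripted. What follows is a minimal sketch. It assumes gridengine's default `qstat` table layout (job-ID, prior, name, user, ...). The sample output and job ids are invented for illustration, and the script only prints a `qdel` command rather than running one.

```python
from typing import List

# Hypothetical qstat output; real column widths and values will differ.
SAMPLE_QSTAT = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1234567 0.30013 rc-vikidia tools.rc-vik r     03/23/2015 14:00:01 task@tools-exec-07.eqiad.wmfla     1
1234570 0.30013 rc-vikidia tools.rc-vik r     03/23/2015 14:00:05 task@tools-exec-09.eqiad.wmfla     1
1234571 0.30010 otherbot   tools.other  r     03/23/2015 14:00:06 task@tools-exec-02.eqiad.wmfla     1
1234588 0.30013 rc-vikidia tools.rc-vik r     03/23/2015 14:01:10 task@tools-exec-07.eqiad.wmfla     1
"""


def job_ids_by_name(qstat_output: str, job_name: str) -> List[str]:
    """Collect the job ids (first column) of every row whose name column matches."""
    ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric id; this skips the header and the dashed rule.
        if len(fields) >= 3 and fields[0].isdigit() and fields[2] == job_name:
            ids.append(fields[0])
    return ids


if __name__ == "__main__":
    ids = job_ids_by_name(SAMPLE_QSTAT, "rc-vikidia")
    print("qdel " + " ".join(ids))  # qdel 1234567 1234570 1234588
```

The tabular view may truncate long job names, so a name longer than the column would not match. In that case, the machine-readable `qstat -xml` output is a sturdier input than this whitespace split.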
[15:40:39] 10Tool-Labs: Memory Exhausted Near / Tool labs error while querying with Python - https://phabricator.wikimedia.org/T93074#1140688 (10marcmiquel) Yes. I do. Now I got a memory error after days working well. marcmiquel@tools-bastion-01:~$ cat scriptabroad.err Traceback (most recent call last): File "cira_abro... [15:43:05] 6Labs: Allow labstores to hot or warm swap in case of failure - https://phabricator.wikimedia.org/T93589#1140690 (10coren) 3NEW [16:12:15] 6Labs: Allow labstores to hot or warm swap in case of failure - https://phabricator.wikimedia.org/T93589#1141373 (10coren) Another caveat to note is that this not not provide standby against a //shelf// failing: that would still requires disk to be physically moved to the backup shelf and it being wired to repla... [16:18:58] 10Tool-Labs-tools-Other: data-* span classes in WSexport ePub produce errors - https://phabricator.wikimedia.org/T84884#1141401 (10Aklapper) @Tpt: As you are set as assignee here, do you plan to work on this? [16:30:17] 10Tool-Labs-tools-Other: data-* span classes in WSexport ePub produce errors - https://phabricator.wikimedia.org/T84884#1141465 (10Tpt) 5Open>3Invalid Yes, I should work on it. But issues for this tool are currently tracked on GitHub. I have just opened an issue there: https://github.com/wsexport/tool/issues/48 [16:32:35] YuviPanda: does your proxy refer to instances by name or by ip? and if by name, is it fqdn? [16:33:07] andrewbogott: for tools or generic? [16:33:12] both [16:33:25] https://phabricator.wikimedia.org/T93087 [16:35:20] andrewbogott: tools uses fqdn, and general one uses whatever is given it by wikitech. [16:35:25] let me see what is given to it [16:35:59] andrewbogott: there’s also lots of random code in places that’s got hardcoded X.eqiad.wmflabs... 
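The MemoryError in marcmiquel's traceback is raised inside `MySQLdb`'s `cursor.execute`. This log does not confirm the cause, but one plausible explanation is that MySQLdb's default cursor buffers the whole result set on the client during `execute`. By contrast, `MySQLdb.cursors.SSCursor` streams rows from the server.

The sketch below shows the streaming side of that pattern: the DB-API `fetchmany` loop. It uses the standard library's `sqlite3` as a stand-in so it runs without a MySQL server, and the table and column names are hypothetical. It is written for Python 3, whereas the traceback is from Python 2.7.

```python
import sqlite3
from typing import Iterator, Sequence, Tuple


def iter_rows(cursor, query: str, params: Sequence = (), batch_size: int = 1000) -> Iterator[Tuple]:
    """Run a query and yield its rows batch by batch instead of calling fetchall()."""
    cursor.execute(query, params)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            break
        yield from batch


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE page (page_id INTEGER, page_title TEXT)")
    cur.executemany("INSERT INTO page VALUES (?, ?)", [(i, f"Title_{i}") for i in range(2500)])
    count = sum(1 for _ in iter_rows(cur, "SELECT page_id FROM page WHERE page_id >= ?", (500,)))
    print(count)  # 2000
    conn.close()
```

With MySQLdb, the loop only saves memory if it is paired with an unbuffered cursor, for example `conn.cursor(MySQLdb.cursors.SSCursor)`. On the default cursor, `fetchmany` merely slices a result that is already in memory. An unbuffered cursor also has to be read to the end (or closed) before the same connection runs another query.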
[16:36:04] beta has a fair bit of ‘em, for example
[16:36:46] > 1) "http://mediahandler-tests-static.eqiad.wmflabs:80"
[16:37:16] Labs, Patch-For-Review: Move to a new dns scheme for labs: hostname.projectname.eqiad.wmflabs - https://phabricator.wikimedia.org/T93087#1141485 (yuvipanda) So tools-proxy uses fqdn, and dynaimcproxy also seems to use fqdn.
[16:37:40] Labs, Patch-For-Review: Move to a new dns scheme for labs: hostname.projectname.eqiad.wmflabs - https://phabricator.wikimedia.org/T93087#1141486 (yuvipanda) There's plenty of random code all around on labs that assumes it's hostname.eqiad.wmflabs, however...
[16:37:46] YuviPanda: I think what will happen is there will be a pre-announced cut off day — on that day I’ll import all existing instances with the old naming scheme into designate...
[16:37:56] andrewbogott: hmm, I see.
[16:37:56] So it’ll only be /new/ instances thereafter that need to use the new scheme.
[16:38:03] well, references to new instances
[16:38:07] oh...
[16:38:13] so we’ll have a mix of both?
[16:38:14] that sounds bad.
[16:38:40] YuviPanda: you can look at labs-ns2 now and see what it looks like. All instances will support the new scheme (including old instances)
[16:38:55] but new instances (after the magic day) will only appear with the new scheme
[16:39:01] andrewbogott: oh, hmm, I see.
[16:39:06] And when an instance is deleted it’ll clean up one or both.
[16:39:14] andrewbogott: so, tools-bastion-01.tools.eqiad.wmflabs and tools-bastion-01.eqiad.wmflabs will both be valid
[16:39:37] YuviPanda: we can discuss this further if you think that it’s not worth changing over. But every once in a while people are tripped up by duplicate instance names in different projects.
[16:39:40] but if I create tools-bastion-02. after the cutoff, then only tools-bastion-02.tools.eqiad.wmflabs will be valid?
[16:39:44] YuviPanda: right.
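andrewbogott's plan (16:37 to 16:39) gives pre-cutoff instances two valid names and post-cutoff instances only the project-qualified one. Below is a small Python sketch of that rule. It is purely illustrative: the function, its parameters, and the `created_before_cutoff` flag are hypothetical. The default base domain follows the `eqiad.wmflabs` spelling used in this part of the log.

```python
from typing import List


def valid_names(instance: str, project: str, created_before_cutoff: bool,
                base: str = "eqiad.wmflabs") -> List[str]:
    """Return the DNS names an instance answers to under the transition plan."""
    names = [f"{instance}.{project}.{base}"]  # new scheme: valid for every instance
    if created_before_cutoff:
        names.append(f"{instance}.{base}")  # legacy name, kept only for pre-cutoff instances
    return names


if __name__ == "__main__":
    print(valid_names("tools-bastion-01", "tools", created_before_cutoff=True))
    print(valid_names("tools-bastion-02", "tools", created_before_cutoff=False))
```

Under this plan, nothing that already exists loses a name. Only references to new instances must use the project-qualified form, which is andrewbogott's point at 16:38:03.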
[16:39:58] andrewbogott: yeah, I totally think this is worth changing over, just going to be a bit of a pain to get everything through...
[16:40:15] but yeah, that scheme looks quite nice to me :)
[16:40:41] Because designate can manage the old-school entries, there will never be a need to clean up the old ones.
[16:40:52] right
[16:41:01] ‘tall sounds sane.
[16:41:04] YuviPanda: can you comment on that phab ticket, and rattle off all the cases you can think of where code will need updating?
[16:41:09] yeah, sure
[16:41:11] thx
[16:41:22] doesn’t have to happen right now, I won’t get to that for a bit anyway
[16:42:08] Wikimedia-Labs-Other, Phabricator: Email not working on phab-01.wmflabs.org - https://phabricator.wikimedia.org/T76427#1141506 (chasemp) Open>Invalid a:chasemp works as of now
[16:43:47] Labs, Patch-For-Review: Move to a new dns scheme for labs: hostname.projectname.eqiad.wmflabs - https://phabricator.wikimedia.org/T93087#1141510 (yuvipanda) Places that will need updating here, off top of my head: 1. Scap config for beta / staging 2. Tools infrastructure. I wonder how OGE will deal with...
[16:43:56] andrewbogott: yeah, and the b/c scheme is very clever...
[16:45:38] Labs, Patch-For-Review: Move to a new dns scheme for labs: hostname.projectname.eqiad.wmflabs - https://phabricator.wikimedia.org/T93087#1141516 (Andrew) Right, this will only break references to new instances.
[16:50:57] Tool-Labs, Patch-For-Review: Enable "Access-Control-Allow-Origin: *" header on tools-static.wmflabs.org - https://phabricator.wikimedia.org/T93466#1141531 (yuvipanda) Open>Resolved a:yuvipanda Done
[16:51:09] Coren: ping me when you have a few minutes to talk about dns?
[16:51:59] Tool-Labs, Patch-For-Review: Puppetize /usr/local/bin/jobkill - https://phabricator.wikimedia.org/T90331#1141534 (scfc) @yuvipanda: Why the moves for resolved tasks? Don't you have "Filter: Open Tasks" enabled on the #Tool-Labs workboard?
[16:52:31] Tool-Labs, Patch-For-Review: Puppetize /usr/local/bin/jobkill - https://phabricator.wikimedia.org/T90331#1141536 (yuvipanda) I found it just after I did, and then I realized I've moved them to the wrong column...
[16:53:41] andrewbogott: I'm back from lunch in ~5. Works for you?
[16:55:57] Tool-Labs-tools-Other: Migrate to Tool Labs: https://toolserver.org/~magnus/filterdone.php - https://phabricator.wikimedia.org/T63187#1141545 (Aklapper) p:Triage>Lowest
[16:58:32] Coren: yep
[16:58:53] andrewbogott: The timing; most impeccable. I'm all yours.
[16:59:07] OK — so...
[16:59:17] first, I’m not totally sure how things work now :)
[16:59:29] Tool-Labs-tools-Other: Migrate to Tool Labs: https://toolserver.org/~siebrand/tnstats.php - https://phabricator.wikimedia.org/T63034#1141550 (Aklapper) p:Triage>Lowest
[16:59:31] Tool-Labs-tools-Other: Migrate https://toolserver.org/~magnus/coord_gen.php - https://phabricator.wikimedia.org/T63188#1141555 (Aklapper) p:Triage>Lowest
[16:59:32] Tool-Labs-tools-Other: Migrate https://toolserver.org/~magnus/files_in_category.php - https://phabricator.wikimedia.org/T63181#1141557 (Aklapper) p:Triage>Lowest
[16:59:33] Tool-Labs-tools-Other: Migrate https://toolserver.org/~magnus/image_pages_without_image.php - https://phabricator.wikimedia.org/T63180#1141556 (Aklapper) p:Triage>Lowest
[16:59:33] Tool-Labs-tools-Other: Migrate https://toolserver.org/~magnus/missing_links.php - https://phabricator.wikimedia.org/T63191#1141559 (Aklapper) p:Triage>Lowest
[16:59:34] Tool-Labs-tools-Other: Migrate http://toolserver.org/~purodha/sample/dbswithuser.php to Tool Labs - https://phabricator.wikimedia.org/T63028#1141560 (Aklapper) p:Triage>Lowest
[16:59:35] Tool-Labs-tools-Other: Migrate https://toolserver.org/~magnus/wiki2xml/w2x.php - https://phabricator.wikimedia.org/T63195#1141554 (Aklapper) p:Triage>Lowest
[16:59:35] Labs instances have 10.68.16.1 in their resolv.conf
[16:59:36] Tool-Labs-tools-Other: Migrate https://toolserver.org/~magnus/pushforcommons.php - https://phabricator.wikimedia.org/T63185#1141558 (Aklapper) p:Triage>Lowest
[16:59:37] Tool-Labs-tools-Other: Migrate https://toolserver.org/~magnus/pdf.php - https://phabricator.wikimedia.org/T63194#1141553 (Aklapper) p:Triage>Lowest
[16:59:38] Tool-Labs-tools-Other: Migrate http://toolserver.org/~slakr/* to Tool Labs - https://phabricator.wikimedia.org/T60887#1141562 (Aklapper) p:Triage>Lowest
[16:59:39] Tool-Labs-tools-Other: Migrate http://toolserver.org/~Emijrp to Tool Labs - https://phabricator.wikimedia.org/T62887#1141564 (Aklapper) p:Triage>Lowest
[16:59:40] Tool-Labs-tools-Other: Migrate to Tool Labs: http://toolserver.org/~vvv/yaec.php - https://phabricator.wikimedia.org/T63369#1141561 (Aklapper) p:Triage>Lowest
[16:59:41] Tool-Labs-tools-Other: Migrate https://toolserver.org/~bawolff/en-wn-editor-stats.php to Tool Labs - https://phabricator.wikimedia.org/T60867#1141563 (Aklapper) p:Triage>Lowest
[16:59:42] Tool-Labs-tools-Other: Migrate to Tool Labs: https://toolserver.org/~emw/index.php%3Fc%3Dwikistats - https://phabricator.wikimedia.org/T63038#1141565 (Aklapper) p:Triage>Lowest
[16:59:43] ok, wikibugs, take your time
[16:59:44] Tool-Labs-tools-Other: Migrate to Tool Labs: https://toolserver.org/~nikola/articlesby.php - https://phabricator.wikimedia.org/T63031#1141569 (Aklapper) p:Triage>Lowest
[16:59:45] Tool-Labs-tools-Other: Migrate to Tool Labs: https://toolserver.org/~mzmcbride/stalker/ - https://phabricator.wikimedia.org/T63040#1141570 (Aklapper) p:Triage>Lowest
[16:59:46] Tool-Labs-tools-Other: Migrate steward tools by jyothis https://toolserver.org/~jyothis/tools/stewtools.html - https://phabricator.wikimedia.org/T63196#1141568 (Aklapper) p:Triage>Lowest
[16:59:47] Tool-Labs-tools-Other: Migrate https//toolserver.org/~wiegels/wikipedia-termine.php to Tool Labs - https://phabricator.wikimedia.org/T62888#1141572 (Aklapper) p:Triage>Lowest
[16:59:48] Tool-Labs-tools-Other: Migrate to Tool Labs: https://toolserver.org/~vvv/grep.php - https://phabricator.wikimedia.org/T63035#1141571 (Aklapper) p:Triage>Lowest
[16:59:50] andrewbogott: "poorly"
[17:00:02] wikibugs, done? Can we talk now?
[17:00:09] OK
[17:00:15] So, instances have 10.68.16.1 in their resolv.conf
[17:00:22] that’s labnet1001, I presume
[17:00:29] Which is running dnsmasq
[17:00:34] does that sound right so far?
[17:00:37] afaik, right.
[17:01:06] ok. So dnsmasq only knows about labs instances, right? So it resolves things like ‘instancename’ and ‘instancename.eqiad.wmnet’
[17:01:09] but nothing else
[17:01:34] That is my understanding also. It is set to recurse, however.
[17:01:35] And then things that it doesn’t know about fail over to wherever labnet1001’s resolv.conf points, right?
[17:02:17] Or do they fail over to something explicitly set in the dnsmasq config?
[17:02:26] maybe I should be saying ‘recurse’ rather than ‘fail over'
[17:02:26] andrewbogott: I don't know offhand whether it uses resolv.conf; it might be explicit also.
[17:03:26] ok
[17:03:37] It's not in the .conf. Lemme check the commandline
[17:03:45] so, an example of how to replace dnsmasq would be…
[17:04:21] 1) change labs resolv.conf to point to holmium
[17:04:35] 2) remove dnsmasq from labnet1001 and install pdns there instead
[17:04:47] And in either case, that pdns would have to know to recurse for things outside of labs.
[17:04:51] MediaWiki-extensions-OpenStackManager: Convert OpenStackManager to use extension registration - https://phabricator.wikimedia.org/T87950#1141596 (Aklapper) p:Triage>Low
[17:04:53] Which is apparently possible with pdns but highly discouraged
[17:04:57] MediaWiki-extensions-OpenStackManager: Weird parsing of openstackmanager-configureproject-serviceuserinfo message on Special:NovaProject&action=configureproject - https://phabricator.wikimedia.org/T89467#1141598 (Aklapper) p:Triage>Low
[17:05:05] MediaWiki-extensions-OpenStackManager: Allow to better customise RAM/CPU/Disk during virtual machine creation process - https://phabricator.wikimedia.org/T91976#1141609 (Aklapper) p:Triage>Low
[17:05:32] andrewbogott: Discouraged for public DNS servers no doubt, but it's quite normal to do so for an internal server.
[17:05:42] ok
[17:06:32] And I’m guessing there’s no easy way to have labs instances use holmium for ..eqiad.wmnet but dnsmasq for .eqiad.wmnet
[17:06:43] because we’d have to enumerate all the tenant domains someplace
[17:07:00] or is there some way to make dnsmasq recurse for those cases?
[17:09:58] There probably is, but I'd much rather have pdns do so.
[17:10:12] ok
[17:10:24] So option 2) is probably what we ought to do.
[17:10:26] There /is/ a way that should be doable.
[17:11:00] If you create an actual subdomain for each project with an SOA, then you can delegate eqiad.wmnet
[17:11:18] (Well, more accurately, it delegates the subdomains to /you/ but you get the idea)
[17:11:21] Yeah — that makes sense but is maybe not worth the trouble.
[17:11:49] So, suppose that tomorrow pdns is authoritative for .eqiad.wmnet
[17:11:54] I would expect not. Since most /use/ of instance names are within the same project, having .eqiad.wmnet in the search order will suffice.
[17:12:17] There will be a few things that might get confused, but if we search
[17:12:27] Wait --
[17:12:30] search .eqiad.wmnet eqiad.wmnet
[17:12:39] Ah, I see. OK.
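At 17:12:30 Coren suggests a resolv.conf `search` line that lists a project-specific domain before `eqiad.wmnet`. A moment later (17:13:32) he adds that instances use `ndots:2`. The sketch below approximates how a glibc-style stub resolver expands a name under those settings, following resolv.conf(5):

- A name with at least `ndots` dots is tried as given first.
- Otherwise the search domains are tried first.
- A trailing dot disables the search list.

The project `tools` is a hypothetical stand-in for the project domain, which the log does not show.

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Order in which a stub resolver would try names, per resolv.conf(5)'s ndots rule."""
    if name.endswith("."):
        return [name.rstrip(".")]  # trailing dot: fully qualified, search list not used
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as given first
    return expanded + [name]  # too few dots: search domains first, bare name last


if __name__ == "__main__":
    search = ["tools.eqiad.wmnet", "eqiad.wmnet"]  # hypothetical project "tools"
    for query in ("tools-exec-01", "tools-exec-01.tools"):
        print(query, "->", candidate_names(query, search, ndots=2))
```

With `ndots:2`, a two-label name such as `tools-exec-01.tools` has only one dot, so the search list is tried first. Its second candidate, `tools-exec-01.tools.eqiad.wmnet`, is the full name, and no fqdn has to be typed.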
[17:12:40] It should cover 99% cases
[17:13:05] For cases outside of that 99% we’ll be specifying the fqdn, right? So that’ll just work
[17:13:32] We have ndots:2 so not even the fqdn is needed. Just . would also work.
[17:13:45] ok
[17:13:53] So — next question, public dns.
[17:14:24] with either solution 1 or 2 the recursion goes: instance, pdns, ns0, labs-ns0
[17:14:33] and labs-ns0 is authoritative for .wmflabs.org so all is well.
[17:14:52] What happens when we make pdns authoritative for wmflabs.org instead?
[17:15:14] Is that in the cards?
[17:15:22] I hope so!
[17:15:41] Not immediately, but eventually I’d like designate and its dns server to know both private and public stuff.
[17:15:46] * Coren nods.
[17:15:51] that’s part of de-ldaping.
[17:16:05] It shouldn't be an issue, though if pdns is authoritative then it needs to be accessible to the 'net.
[17:16:23] Now, /personally/ I'd set labs-ns0 to be a slave instead.
[17:16:24] Yeah...
[17:16:49] So, labs-ns2/holmium is already public and running designate/pdns
[17:17:39] ;; ANSWER SECTION:
[17:17:39] tools-exec-01.tools.eqiad.wmflabs. 120 IN A 10.68.16.30
[17:17:45] I’m confused at this point, but I think my question is: Does planning to use designate for public dns in the future mean that we should /not/ use labnet1001 for private dns in the meantime?
[17:17:50] Aha! Indeed it is, and working fine too!
[17:18:24] andrewbogott: I'm not sure I follow you.
[17:18:28] mostly it’s the recursion question that I don’t understand. Can a server be authoritative for two unrelated domains (eqiad.wmnet and wmflabs.org) and recurse for everything else?
[17:18:43] andrewbogott: Yes. That's what recursion /is/. :-)
[17:19:04] andrewbogott: But you'll want to configure labs-ns2 to not recurse for queries coming from outside the local network.
[17:19:09] And there’s no reason to send wmflabs.org queries out to ns0 and /back/ to labs-ns2 afterwards, I suppose.
[17:19:32] Yeah, ok.
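At 17:18:28 andrewbogott asks whether one server can be authoritative for eqiad.wmnet and wmflabs.org and recurse for everything else. Coren answers yes, and adds that queries from outside the local network should not get recursion. Each incoming query therefore needs a routing decision. Below is a toy Python sketch of that decision. It is not pdns configuration. The zone list and the internal network `10.68.16.0/21` are assumptions chosen for illustration.

```python
import ipaddress
from typing import Iterable, Optional

EXAMPLE_ZONES = ["eqiad.wmnet", "wmflabs.org"]
EXAMPLE_INTERNAL_NETS = ["10.68.16.0/21"]  # assumed labs instance range, illustration only


def matching_zone(qname: str, zones: Iterable[str]) -> Optional[str]:
    """Longest zone that qname falls under, matched on whole labels; None if none."""
    name = qname.rstrip(".").lower()
    best = None
    for zone in zones:
        z = zone.rstrip(".").lower()
        if (name == z or name.endswith("." + z)) and (best is None or len(z) > len(best)):
            best = z
    return best


def route_query(qname: str, client_ip: str, zones: Iterable[str], internal_nets: Iterable[str]) -> str:
    """Decide how a combined authoritative/recursive server should handle one query."""
    zone = matching_zone(qname, zones)
    if zone is not None:
        return f"authoritative:{zone}"  # answered from local zone data, for any client
    client = ipaddress.ip_address(client_ip)
    if any(client in ipaddress.ip_network(net) for net in internal_nets):
        return "recurse"  # internal clients get full recursion
    return "refuse"  # do not act as an open resolver for outside clients


if __name__ == "__main__":
    for qname, ip in [("tools-exec-01.tools.eqiad.wmnet", "10.68.16.30"),
                      ("www.example.com", "10.68.16.30"),
                      ("www.example.com", "198.51.100.7")]:
        print(qname, ip, "->", route_query(qname, ip, EXAMPLE_ZONES, EXAMPLE_INTERNAL_NETS))
```

The ACL applies only to recursion. Authoritative answers for the server's own zones are still served to outside clients, which matters once the server is public (17:16:05).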
[17:19:49] I think this is mostly making sense, I’ll see if I can figure out how to configure all this.
[17:19:50] andrewbogott: Well, IMO, the better way to do it is to have ns0 be a slave to labs-ns2 and not let the 'net hit that directly at all.
[17:20:18] I maybe don’t know what ‘be a slave’ means
[17:20:26] ns0 knows about /all/ wikimedia domains, not just wmflabs
[17:21:01] andrewbogott: A slave server has a domain configured, is authoritative, but does not have the zone file locally. It talks to the master and transfers the zone file from it, and caches it.
[17:21:16] andrewbogott: So it can answer authoritatively, but isn't the source of the zone data.
[17:21:32] ah, ok. So, yeah, that sounds like what we want.
[17:21:44] For all I know it’s doing that with labs-ns0 already.
[17:21:51] andrewbogott: Interestingly enough, that's not what we do for production - we have only masters and synchronize zone files.
[17:22:03] hm
[17:22:15] andrewbogott: It either slaves to it or recurses to it. Lemme check.
[17:23:48] Coren: I think the only reason to install pdns on labnet1001 is to avoid having to hand-edit resolv.conf on all the unpuppetized instances.
[17:23:54] But that’s probably not a good reason.
[17:24:10] It sounds like a really bad one even. :-)
[17:24:28] great.
[17:24:36] OK — so, one more question.
[17:24:47] Instances right now think their domain is ‘eqiad.wmnet’
[17:24:56] With the new scheme their domain will be .eqiad.wmnet
[17:25:04] What will that affect?
[17:25:14] ... ohcrap.
[17:25:28] Well, /that/ may have more drastic effects.
[17:25:32] * Coren ponders.
[17:25:43] does changing the domain in resolv.conf cause a system to lose its mind?
[17:26:09] No, but it may cause things that look things up by hostname to have some trouble.
[17:26:27] tools will need some tweaking, for instance, because the gridengine speaks fqdn
[17:26:56] We probably want to keep both names in parallel for a while and not rename them.
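At 17:21 Coren describes a slave (secondary) server but leaves out how it decides that its cached copy is stale. In standard DNS practice, the secondary periodically asks the master for the zone's SOA record. It requests a zone transfer when the master's serial is newer, comparing serials with RFC 1982 serial-number arithmetic so that the 32-bit counter can wrap around. Below is a minimal Python sketch of just that comparison. The example serials are made up.

```python
SERIAL_BITS = 32
MODULUS = 2 ** SERIAL_BITS
HALF = 2 ** (SERIAL_BITS - 1)


def serial_lt(s1: int, s2: int) -> bool:
    """RFC 1982 'less than' for SOA serials; undefined (False here) when they differ by exactly 2**31."""
    s1 %= MODULUS
    s2 %= MODULUS
    return (s1 < s2 and s2 - s1 < HALF) or (s1 > s2 and s1 - s2 > HALF)


def needs_transfer(secondary_serial: int, master_serial: int) -> bool:
    """A secondary should fetch the zone when the master's serial is newer than its own."""
    return serial_lt(secondary_serial, master_serial)


if __name__ == "__main__":
    print(needs_transfer(2015032301, 2015032302))  # True: master has a newer zone
    print(needs_transfer(2015032302, 2015032302))  # False: already up to date
    print(needs_transfer(4294967295, 5))  # True: the serial wrapped around
```

Production, as Coren notes at 17:21:51, instead runs only masters and synchronizes zone files. In that setup this check never happens in DNS itself.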
[17:27:09] Yeah — that is https://phabricator.wikimedia.org/T93087
[17:27:13] So, here’s how that will happen:
[17:27:20] It can all be done, but it may have project-specific effects that will need to be carefully managed.
[17:27:35] Today, if you look, you’ll see that labs-ns2 knows all instances by two names, instance.eqiad.wmnet and instance.tenant.eqiad.wmnet
[17:27:59] at a day-to-be-named-later, I’ll do one last sync, and after that new instances will only have the full instance.tenant.eqiad.wmnet name
[17:28:21] So worst case things will only break for newly-created instances.
[17:28:28] Labs, Patch-For-Review: Move to a new dns scheme for labs: hostname.projectname.eqiad.wmflabs - https://phabricator.wikimedia.org/T93087#1141736 (coren) New instances should not be an issue as any reference to them will be new. OGE can have its instances renamed without many issues, but it does need to b...
[17:28:58] Yeah, new instances shouldn't be a concern - nobody is going to refer to them by the old scheme of names.
[17:29:20] But it does mean we can't change the domain line in the resolv.conf willy nilly
[17:29:38] I’m pretty sure I don’t understand what that domain line does, at all.
[17:29:42] What’s an example of something that uses it?
[17:30:24] Anything that wants the host's fqdn. hostname -f for instance
[17:30:53] hm
[17:30:55] It just means 'my fqdn is my hostname followed by that domain'
[17:31:17] I guess if I get holmium recursing correctly then I can point test instances at it and see what breaks.
[17:32:15] You really mustn't change a host's idea of its own fqdn without some inspection and testing case-by-case.
[17:32:29] So it's important that the resolv.conf domain line not be changed for existing instances.
[17:32:51] (New instances, we don't care)
[17:33:10] Also, will we get reverse now? :-)
[17:33:41] Coren: ok, I get annoyed everytime someone asks me that.
[17:33:45] But not annoyed at the asker
[17:34:05] * Coren chuckles.
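Coren sums it up at 17:30:55: "my fqdn is my hostname followed by that domain". That is why editing the `domain` line effectively renames a host for fqdn-based software such as gridengine. Below is a simplified Python sketch of that relationship, assuming Coren's model. Real `hostname -f` goes through the system resolver and /etc/hosts, so its result can differ. The resolv.conf contents and the project domain `tools.eqiad.wmnet` are hypothetical examples.

```python
from typing import Optional


def resolv_domain(resolv_conf: str) -> Optional[str]:
    """Return the value of the last 'domain' line in resolv.conf text, if any."""
    domain = None
    for raw in resolv_conf.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):  # resolv.conf comment styles
            continue
        fields = line.split()
        if fields[0] == "domain" and len(fields) >= 2:
            domain = fields[1]
    return domain


def fqdn_from(hostname: str, resolv_conf: str) -> str:
    """Hostname followed by the configured domain, per Coren's simplified description."""
    domain = resolv_domain(resolv_conf)
    return f"{hostname}.{domain}" if domain else hostname


OLD_CONF = "nameserver 10.68.16.1\ndomain eqiad.wmnet\n"
NEW_CONF = "nameserver 10.68.16.1\ndomain tools.eqiad.wmnet\n"  # hypothetical new-scheme domain

if __name__ == "__main__":
    print(fqdn_from("tools-exec-01", OLD_CONF))  # tools-exec-01.eqiad.wmnet
    print(fqdn_from("tools-exec-01", NEW_CONF))  # tools-exec-01.tools.eqiad.wmnet
```

The two outputs differ for the same host. That is the silent rename Coren warns against applying automatically to existing instances.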
[17:34:06] Here’s the thing — if a dns server has a record for forward dns, it KNOWS everything it needs to know for reverse, right?
[17:34:11] So why does it need another damn entry?
[17:34:22] Because it's not a 1:1 map. :-)
[17:34:29] Hm
[17:34:34] I guess not :(
[17:34:42] Anyway, nope, designate-sink doesn’t add reverse entries.
[17:34:59] No reason why it couldn’t, but I’ll probably have to write it myself
[17:35:03] That's sad, but okay. Wishlist material. :-)
[17:35:25] So, you just said “So it's important that the resolv.conf domain line not be changed for existing instances."
[17:35:28] Do you mean, ever?
[17:35:46] I guess since we’re not removing the foo.eqiad.wmnet entries there’s no reason to change them retroactively.
[17:35:58] Leaving them alone may make for some ugly puppet code though
[17:36:00] I mean, not automatically. Doing so /will/ break things. Not hard to fix, in general, but requires sysadmin care beforehand.
[17:36:30] andrewbogott: Why not just a hiera variable "old instance" set, for the last time, at cutoff date?
[17:36:47] Then just removing it once the instance is ready and *win*
[17:37:47] yeah, that could work.
[17:37:52] For instance, in gridengine, it's not a hard process: you have to drain the exec node of jobs, rename it in gridengine, change its fqdn, and put it back online.
[17:38:01] Although I’m starting to wonder if there’s any reason to ever bring those instances into the present.
[17:38:35] andrewbogott: Mixed-scheme projects. You want to allow project admins to not have them; that means bringing the older ones into the present.
[17:38:50] That way it can be done gradually.
[17:39:32] ok
[17:40:00] I’ll make a new phab case with a step-by-step plan and we’ll see if it looks possible :)
[17:40:18] * Coren nods.
[17:49:48] Hello everyone, I have a problem with my bot. It uses java. After I start the job with jsub I always get "Could not create the Java Virtual Machine.". Can anyone help me?
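At 17:34:22 Coren says forward and reverse DNS are "not a 1:1 map". The transition plan itself shows why: while both schemes coexist, two forward names point at one address, so anything that generates PTR records from A records has to choose between them. Below is a small Python sketch using the standard library's `ipaddress` `reverse_pointer` property. It only groups forward names by PTR name and says nothing about how designate-sink would handle the choice. The record set reuses the address from the 17:17:39 dig answer, with the legacy name added for illustration.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List


def ptr_name(ip: str) -> str:
    """The in-addr.arpa (or ip6.arpa) name under which a PTR record for ip would live."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_candidates(a_records: Dict[str, str]) -> Dict[str, List[str]]:
    """Group forward names by the PTR name of their address."""
    by_ptr: Dict[str, List[str]] = defaultdict(list)
    for name, ip in a_records.items():
        by_ptr[ptr_name(ip)].append(name)
    return {ptr: sorted(names) for ptr, names in by_ptr.items()}


if __name__ == "__main__":
    records = {
        "tools-exec-01.tools.eqiad.wmflabs": "10.68.16.30",  # new-scheme name from the dig output
        "tools-exec-01.eqiad.wmflabs": "10.68.16.30",  # legacy name kept during the transition
    }
    for ptr, names in reverse_candidates(records).items():
        print(ptr, "->", names)
```

Any PTR list with more than one name is a case where the reverse record cannot be derived mechanically from the forward records.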
[17:50:56] tkaspar: Java is a hog with virtual memory, and you almost always need to tweak the memory settings. Check https://wikitech.wikimedia.org/wiki/Help_talk:Tool_Labs#300-350M.3F_Really.3F [17:52:12] Coren: reminder to get the labstore check in :) [17:52:23] I’m going to head off now. been a crazy few days, going to be a crazy week... [17:52:29] thank you. I try it :) [17:53:10] YuviPanda: I should be done with what I'm on in ~60m; it's next on my list unless labs catches fire. Then it'd be third on my list. :-) [17:53:33] Coren: cool. soon hopefully all these lists will be public as well so others can take things off your list... [17:53:57] wheee :) I’ll email out though, and we can try out our ‘next week planning thing’ as well in tomorrow’s ops meeting [18:04:42] (03CR) 10Awight: "I found there actually is another syntax, with a different notation for the optional, missing value elements:" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/196852 (owner: 10Awight) [18:23:14] 10Tool-Labs: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1142166 (10yuvipanda) 3NEW [18:34:08] When connecting to login.tools.wmflabs.org, my SSH client exits with a warning that they key sent by the server has changed to one with the fingerprint: 80:37:58:71:84:99:54:e7:17:dd:c4:be:54:48:41:57. This fingerprint is different from the one listed in topic. Can anyone tell me what might be going on? [18:34:44] ("they"="the") [18:35:08] afeder: heya! see /topic :) this changed yesterday [18:35:45] YuviPanda: Thanks, but the fingerprint in /topic is different from the one I am seeing. What am I missing? [18:36:07] afeder: oh, uhm. strange. 
looking [18:36:14] thanks [18:39:00] (03PS1) 10Legoktm: Send Commons to #wikimedia-commons-tech, per Steinsplitter [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/198777 [18:39:44] (03CR) 10Steinsplitter: [C: 031] Send Commons to #wikimedia-commons-tech, per Steinsplitter [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/198777 (owner: 10Legoktm) [18:40:28] (03CR) 10Legoktm: [C: 032] Send Commons to #wikimedia-commons-tech, per Steinsplitter [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/198777 (owner: 10Legoktm) [18:41:02] (03Merged) 10jenkins-bot: Send Commons to #wikimedia-commons-tech, per Steinsplitter [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/198777 (owner: 10Legoktm) [18:44:10] !log tools.wikibugs Updated channels.yaml to: 90eed2a902164a9a1cf7930c7d9fb599ec9ae660 Send Commons to #wikimedia-commons-tech, per Steinsplitter [18:44:12] Logged the message, Master [19:21:12] 6Labs, 7Tracking: Project for WMT - https://phabricator.wikimedia.org/T93629#1142456 (10Andrew) Project 'wmt' now exists, with one user and admin: JohnFLewis. You'll want to review security group settings before you create any new instances. Also please notify someone when your tools node can be removed. -A [19:22:40] 6Labs, 7Tracking: Project for WMT - https://phabricator.wikimedia.org/T93629#1142465 (10JohnLewis) Thanks Andrew. Will review and do basic set up so things can be migrated over and then I'll prod about the node when no longer needed. [19:25:21] Coren: if we use holmium as the labs DNS server then it can’t tell the difference between internal and external queries since it’s surely only reachable via the public IP. [19:25:37] o_O [19:25:52] Why couldn't labs access it via its internal IP? [19:26:28] Well, at the moment I don’t think it /has/ an internal IP. [19:26:42] But also, isn’t direct traffic from labs instances to production servers blocked for everything outside of labs-private? [19:26:50] or not so much blocked as not routed?
[19:27:49] Blocked; but opening it for DNS would have been okay. But I'm a little surprised holmium doesn't have an internal IP. Surely it makes no sense to expose wmnet. to the public Internet. In fact, I'm sure we absolutely do not want to. [19:28:25] I was under the impression holmium was meant to be an internal server. [19:29:12] (In fact, if I had to guess, I'd have thought it would have been in labs-support) [19:29:34] It having only a public IP complicates matters. [19:30:53] Coren: it will eventually do both, though. [19:31:22] Probably we can give it an internal IP — I don’t immediately know how but it should be pretty easy [19:36:09] 6Labs, 5Patch-For-Review: Provide internal IP access to holmium from labs instances - https://phabricator.wikimedia.org/T93639#1142495 (10Andrew) 3NEW a:3mark [19:36:29] 6Labs, 7Tracking: Project for WMT - https://phabricator.wikimedia.org/T93629#1142506 (10JohnLewis) 5Open>3Resolved a:3JohnLewis Access is good, reviewed basic set up and currently looks good. [19:36:30] 6Labs, 7Tracking: New Labs project requests (Tracking) - https://phabricator.wikimedia.org/T76375#1142509 (10JohnLewis) [19:48:39] @RC+ wikitech.wikimedia.org Nova_Resource:Tools/Access_Request/* [19:48:39] Inserted new item to feed of changes [19:48:42] \o/ [19:51:33] @recentchanges-on [19:51:33] Channel had already feed enabled [19:51:56] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Merlijn van Deen was modified, changed by Merlijn van Deen link https://wikitech.wikimedia.org/w/index.php?diff=149665 edit summary: [19:52:07] cool! 
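The holmium exchange (19:25 to 19:31) is about split-horizon DNS: one server gives internal clients answers that external clients should not get. A server can only tell the two apart by what it sees of the client, typically the source address, which is why reachability over an internal IP matters here. The Python sketch below picks a view by source address with the standard `ipaddress` module. The internal range, the zone contents and the addresses are hypothetical placeholders, not holmium's actual configuration. 198.51.100.0/24 is a documentation range.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical: clients in these networks get the internal view.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]


def choose_view(client_ip: str) -> str:
    """Pick a view from the query's source address."""
    addr = ipaddress.ip_address(client_ip)
    return "internal" if any(addr in net for net in INTERNAL_NETS) else "external"


def answer(views: Dict[str, Dict[str, str]], client_ip: str, qname: str) -> Optional[str]:
    """Look qname up only in the client's view (None stands in for NXDOMAIN)."""
    return views[choose_view(client_ip)].get(qname)


views = {
    "internal": {"foo.tools.eqiad.wmflabs": "10.68.16.5"},
    "external": {},  # private names are simply absent from the public view
}
print(answer(views, "10.68.16.5", "foo.tools.eqiad.wmflabs"))
print(answer(views, "198.51.100.7", "foo.tools.eqiad.wmflabs"))
```

If internal clients could reach the server only through address translation that rewrites their source address, both kinds of query would look alike to it, and a rule like `choose_view` would have nothing to go on.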
[19:52:33] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Merlijn van Deen was modified, changed by Merlijn van Deen link https://wikitech.wikimedia.org/w/index.php?diff=149666 edit summary: [19:59:22] 10Tool-Labs: Setup an easy to use logrotate based system for rotating tools logs - https://phabricator.wikimedia.org/T68623#1142615 (10yuvipanda) a:5yuvipanda>3None [19:59:51] 10Tool-Labs: Setup an icinga instance to monitor tools on tool-labs - https://phabricator.wikimedia.org/T53434#1142616 (10yuvipanda) 5Open>3declined Not going to happen. We will probably end up doing some monitoring as part of the service manifests work, however. [20:00:05] 10Quarry: Make "Home" navlink go to profile for logged-in users. - https://phabricator.wikimedia.org/T85175#1142621 (10yuvipanda) a:5yuvipanda>3None [20:09:10] (03CR) 10Merlijn van Deen: "Hm, I suppose this could work:" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/196852 (owner: 10Awight) [20:10:13] (03CR) 10Merlijn van Deen: "Shouldn't these messages also go to #wikimedia-dev? I'm not sure if it makes sense to remove them there just because they are tagged with " [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/198777 (owner: 10Legoktm) [20:11:34] (03CR) 10Legoktm: "Well AIUI #Commons is just like an "fyi this bug affects commons", all bugs should also be associated with a different component like Medi" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/198777 (owner: 10Legoktm) [20:12:26] legoktm: hm, right. [20:12:44] legoktm: I think we maybe need to rethink the concept of the default channel ;-) [20:13:04] YESSS [20:13:34] legoktm: maybe we should just check if all projects are matched by one or more channels explicitly? [20:14:42] * Coren does not believe YuviPanda ever sleeps. [20:14:56] hehe [20:15:01] it’s called ‘just one more patch’ syndrome... [20:30:41] (03CR) 10Awight: "Thanks for thinking about the alternatives. 
I'm still leaning towards the "key: " syntax, but I'm happy to defer to other people working " [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/196852 (owner: 10Awight) [20:32:39] PROBLEM - Puppet failure on tools-trusty is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:35:47] PROBLEM - Puppet failure on tools-dev is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:41:31] Coren: this part of the puzzle turns out to be easy: https://gerrit.wikimedia.org/r/#/c/198820/ That means that .wmflabs entries are still visible to the outside world but that seems harmless if incorrect. [20:41:37] * andrewbogott waits to be told why it is not harmless [20:41:54] !log deployment-prep updated OCG to version 11f096b6e45ef183826721f5c6b0f933a387b1bb [20:41:58] Logged the message, Master [20:41:59] PROBLEM - Puppet failure on tools-login is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0] [20:42:41] It is ostensibly harmless. It's very much /wrong/, from a dns-as-infrastructure POV (we are claiming publicly to be authoritative for a domain of a fictive tld) but unless someone *asks* it has no effect. [20:44:03] labstore spikes still happening [20:44:50] valhallasw`cloud: what's the problem exactly? [20:45:34] legoktm: well, if an issue is tagged 'commons' and has another project without explicit channel (and is thus maybe expected to show up in wikimedia-dev), it won't be shown there [20:46:00] oh hmmmmmm [20:46:08] I guess because I just use -feed I never noticed :P [20:46:24] which we could either solve by making sure everything is matched [20:46:36] or by having two types of matches ('remove from -dev' and 'keep in -dev') [20:47:04] I think requiring that every project has a channel set is a good idea, but unsure how practical it'll be [20:47:06] 10Tool-Labs: Look into role::labs::tools::bastion vs. 
role::labs::bastion - https://phabricator.wikimedia.org/T93661#1142830 (10scfc) 3NEW [20:47:32] 6Labs, 5Patch-For-Review: Provide internal IP access to holmium from labs instances - https://phabricator.wikimedia.org/T93639#1142838 (10Andrew) OK, actually, this is mainly resolved by https://gerrit.wikimedia.org/r/#/c/198820/ So, I'm not sure if this matters. It's weird to have a public dns server that... [20:48:00] legoktm: me neither. I could add it to the project list script, I guess [20:48:15] a second list which lists projects and the channels they will be reported [20:53:47] 10Tool-Labs: Look into role::labs::tools::bastion vs. role::labs::bastion - https://phabricator.wikimedia.org/T93661#1142857 (10scfc) 5Open>3Resolved a:3scfc And as https://gerrit.wikimedia.org/r/#/c/198812/ now leads to Puppet failures on `tools-login`, `tools-dev` and `tools-trusty` due to the duplicate... [21:03:43] 10Tool-Labs: Look into role::labs::tools::bastion vs. role::labs::bastion - https://phabricator.wikimedia.org/T93661#1142884 (10scfc) 5Resolved>3Open The configuration pages don't work at the moment. [21:08:03] 10Tool-Labs: wikitech configuration pages don't work - https://phabricator.wikimedia.org/T93663#1142893 (10scfc) 3NEW a:3scfc [21:10:32] 10Tool-Labs: wikitech configuration pages don't work - https://phabricator.wikimedia.org/T93663#1142905 (10Andrew) This isn't the bug you think it is :) That page has two entries for role::labs::bastion and you're only unchecking one of them... when the code goes through and turns on everything selected, that c... [21:12:55] I'm new here, I am trying to set up a proxy to login to drmf-ecf.wmflabs.org and I am having trouble. Can someone help? [21:14:45] Ankita: when you say ‘set up a proxy’ [21:14:49] do you mean an ssh proxy command? [21:15:57] I want to be able to type "ssh drmf-ecf.eqiad.wmflabs" and get into the instance [21:16:12] yes, I think that is it. [21:16:46] 10Tool-Labs: Look into role::labs::tools::bastion vs. 
role::labs::bastion - https://phabricator.wikimedia.org/T93661#1142928 (10scfc) [21:16:47] 10Tool-Labs: wikitech configuration pages don't work - https://phabricator.wikimedia.org/T93663#1142926 (10scfc) 5Open>3Invalid Well … let's consider it invalid for the moment :-). It looks as if this is the only class where this can happen, so if someone wants to fix it, sure, but I don't intend to uncheck... [21:16:59] Ankita: Ah, so that ‘org’ on the end up there was a typo? [21:17:11] ah yes [21:17:36] 10Tool-Labs: wikitech configuration pages don't work - https://phabricator.wikimedia.org/T93663#1142929 (10Andrew) You could also just change the 'puppet classes' config for the tools project so that it only appears once in the list :) [21:17:44] cool, then you are doing a normal thing :) [21:17:47] And you’re on linux? [21:17:53] yes I am on linux [21:18:01] CentOS 6 [21:18:26] ok. So that ssh command works for me, let’s see what I’ve got [21:19:08] In my .ssh/config the useful bits are [21:19:09] Host *.eqiad.wmflabs [21:19:10] ProxyCommand ssh -a -W %h:%p bastion-restricted-eqiad.wmflabs.org [21:19:11] Host *.wmflabs [21:19:12] User andrew [21:19:13] IdentityFile /Users/andrew/.ssh/labs [21:19:14] you’d want something similar [21:19:26] oh, except you shouldn’t use bastion-restricted, just bastion [21:20:58] Ankita: sorry, too much information? [21:21:11] one sec [21:22:44] I keep getting permission denied [21:23:03] Host bastion1.eqiad.wmflabs Hostname bastion.wmflabs.org ProxyCommand none Host bastion2.eqiad.wmflabs Hostname bastion2.wmflabs.org ProxyCommand none Host bastion3.eqiad.wmflabs Hostname bastion3.wmflabs.org ProxyCommand none Host drmf-ecf.eqiad.wmflabs Hostname bastion3.wmflabs.org ProxyCommand none Host *.eqiad.wmflabs ProxyCommand ssh -a -W %h:%p bastion.eqiad.wmflabs.org Host *.wmflab [21:23:20] this is what I have in my ~/.ssh/config file [21:23:46] 10Tool-Labs: Look into role::labs::tools::bastion vs. 
role::labs::bastion - https://phabricator.wikimedia.org/T93661#1142968 (10scfc) 5Open>3Resolved T93663 was PEBKAC :-). [21:25:17] Ankita: it’s hard for me to tell without line breaks, but that seems ok… what is your username on your local box, and what is it on labs? [21:25:38] username on local box is "ans23" [21:25:49] on labs it is "sharmaans" [21:26:12] 10Tool-Labs: wikitech configuration pages don't work - https://phabricator.wikimedia.org/T93663#1142982 (10scfc) No, `role::labs::bastion` appears twice in the "//Global// groups" part, once under "roles", once under "ssh". [21:26:32] Ankita: ok, so you need to specify a User somewhere in your config [21:26:54] as it is you’re trying to ssh as ans23 and of course you don’t have an account by that name… [21:27:14] Host *.wmflabs [21:27:21] User sharmaans [21:27:44] I’d expect that to help [21:28:34] I am typing "ssh sharmaans@drmf-ecf.eqiad.wmflabs" [21:28:56] error: ssh: Could not resolve hostname drmf-ecf.wmflabs.org: Name or service not known [21:29:48] ? [21:29:52] where’d that .org come from? [21:30:09] hmmm [21:30:11] I don't know [21:31:38] Also you have bastion.eqiad.wmflabs.org which I’m pretty sure doesn’t exist [21:31:43] was that from the docs someplace? [21:31:56] I’d think you want bastion.wmflabs.org [21:32:03] https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_instances_with_ProxyCommand_ssh_option_.28recommended.29 [21:33:02] ok, but your config does not resemble that... [21:33:11] Your proxy command is different [21:34:44] https://gist.github.com/sharmaans/1cabdab04582edc0882b [21:34:47] this is my config [21:36:11] looks right to me. Try again? I’m looking at a logfile [21:36:38] Ok I just did [21:37:40] RECOVERY - Puppet failure on tools-trusty is OK: OK: Less than 1.00% above the threshold [0.0] [21:37:57] can you start by just connecting directly to bastion1? I want to see what that looks like. [21:39:22] I tried "ssh bastion1.wmflabs.org" ...
"permission denied (publickey) [21:39:54] same thing with "ssh sharmaans@bastion1.wmflabs.org" [21:42:03] RECOVERY - Puppet failure on tools-login is OK: OK: Less than 1.00% above the threshold [0.0] [21:43:23] sorry, I'm going to have to run, I will try again on friday when I come back. thanks andrewbogott [21:45:25] SPF|Cloud: Have you been able to access instances on other occasions? [21:45:36] Is there reason to think this is specific to the new wmt project? [21:46:17] 1) I've never accessed any other wmflabs server except than those with an external ip (bastion, tools-login, tools-dev, tools-trusty) [21:46:29] 2) this is indeed for the new wmt project, so that could be the issue too [21:46:58] ok [21:47:05] So, what have you tried so far? [21:47:19] this is the standard approach: https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_instances_with_ProxyCommand_ssh_option_.28recommended.29 [21:47:38] I've seen that, but "local system" confuses me [21:47:56] local system is, like, the computer you are sitting at [21:48:06] the system you are ssh’ing from [21:48:10] that is Windows + PuTTY so that'll be difficult [21:48:21] ah! [21:48:25] With windows I cannot help you at all [21:48:30] But I think there are some docs for windows [21:48:45] hm, still thanks for the help [21:48:47] https://wikitech.wikimedia.org/wiki/Help:Access_to_instances_with_PuTTY_and_WinSCP [21:49:03] I haven’t ever done it, but those docs have worked for many [21:49:09] You have to forward a key with pageant I think [21:51:36] doesn't work [21:51:53] There might be other people around who can help troubleshoot — more likely earlier tomorrow when Europeans are still about. [21:51:56] Permission denied (publickey). <- I can connect to bastion but can't ssh to other instances from there [21:52:56] Yes, probably because the key isn’t being forwarded properly. [21:56:44] YuviPanda|zz: still up? 
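The ProxyCommand troubleshooting (21:12 to 21:52) turns on how OpenSSH picks settings from `~/.ssh/config`. Every `Host` block whose pattern matches is consulted in file order, and for each option the first value obtained wins. The Python sketch below models only that rule, using `fnmatch` for the `*` wildcards. It ignores negated patterns, `Match` blocks and command-line overrides, so it is an illustration only. On OpenSSH 6.8 and later, `ssh -G host` prints the configuration ssh actually evaluated.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

Block = Tuple[List[str], Dict[str, str]]  # (Host patterns, options) in file order


def resolve(host: str, blocks: List[Block]) -> Dict[str, str]:
    """Simplified ssh_config lookup: the first value obtained for each option wins."""
    settings: Dict[str, str] = {}
    for patterns, options in blocks:
        # Compare case-insensitively.
        if any(fnmatchcase(host.lower(), p.lower()) for p in patterns):
            for key, value in options.items():
                # Option names are case-insensitive; keep the earliest value.
                settings.setdefault(key.lower(), value)
    return settings


# Modelled on the config Andrew suggested, plus the User fix from 21:27.
config: List[Block] = [
    (["*.eqiad.wmflabs"], {"ProxyCommand": "ssh -a -W %h:%p bastion.wmflabs.org"}),
    (["*.wmflabs"], {"User": "sharmaans"}),
]
print(resolve("drmf-ecf.eqiad.wmflabs", config))
```

Because the earliest match wins, a host-specific block placed above the wildcards overrides them. The config pasted at 21:23:03 has such a block for `drmf-ecf.eqiad.wmflabs`, with `Hostname bastion3.wmflabs.org` and `ProxyCommand none`. Under this rule, those values would take precedence over the `*.eqiad.wmflabs` ProxyCommand.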
[21:56:50] Hm, autocomplete says no [21:57:32] did the SSH certificate of tools-login.wmflabs.org change? [21:57:52] the fingerprint on https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/tools-login.wmflabs.org doesn’t match the fingerprint shown by ssh [21:57:55] ireas: yes. [21:58:02] There’s an email thread about it on the labs mailing list. [21:58:11] ah, okay. thanks! [21:58:23] but, that page was updated... [21:58:32] so there’s a new fingerprint but it should match the one on wikitech [21:59:50] i am getting the fingerprint 80:37:58:71:84:99:54:e7:17:dd:c4:be:54:48:41:57, which does not seem to match [21:59:52] andrewbogott: pageant did the trick [22:00:01] great! [22:00:10] thank you!! [22:00:34] SPF|Cloud: woo :p [22:01:05] I’m getting the same fingerprint as afeder [22:01:28] afeder: yeah, I’ve seen that, it’s something to do with which version of ssh I think... [22:01:32] * andrewbogott tries to remember [22:01:44] hmm [22:02:39] our fingerprint is the ECDSA fingerprint, not RSA [22:02:43] maybe that’s the reason [22:03:02] Yes, I think that’s why [22:03:08] I see RSA key fingerprint is f9:c1:53:72:f0:57:85:dd:6f:4d:cb:f6:d0:ce:84:aa [22:03:13] What are you using for ssh? [22:04:04] i am using OpenSSH [22:04:08] Try -oHostKeyAlgorithms='ssh-rsa' [22:04:12] OpenSSH_6.6.1p1 [22:04:24] Probably we ops are running old-timey versions [22:05:05] yes, that’s it [22:05:06] andrewbogott: ah, that seems to do the trick. i get the right fingerprint now. [22:05:11] thanks andrewbogott & afeder! [22:07:53] andrewbogott: How python are you? [22:08:05] Coren: fairly, what’s up? [22:09:24] andrewbogott: I've written the new version of the snapshot management thang in python, I could use a review for style. Ima just add you as reviewer when it moves to a changeset later. [22:09:39] ok [22:11:01] I've been working on it and using it for my testing for quite some time, so I know it *works*, but it's probably got style issues.
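The fingerprint confusion (21:57 to 22:05) comes down to each host key type having its own fingerprint. A client that negotiates the ECDSA host key shows a different value from a page that lists the RSA one, and `-oHostKeyAlgorithms='ssh-rsa'` forces the RSA key. The colon-separated form seen here is the legacy MD5 fingerprint: the MD5 digest of the base64-decoded key blob in a public key line. The minimal Python sketch below can be used to check a `.pub` file by hand. `ssh-keygen -l -f` is the usual tool for this.

```python
import base64
import hashlib
from typing import Tuple


def md5_fingerprint(pubkey_line: str) -> Tuple[str, str]:
    """Return (key type, colon-separated MD5 fingerprint) for an OpenSSH public key line.

    Expects the usual "<type> <base64 blob> [comment]" layout of a .pub file.
    """
    fields = pubkey_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64 key> [comment]'")
    key_type, blob = fields[0], base64.b64decode(fields[1])
    digest = hashlib.md5(blob).hexdigest()
    return key_type, ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

Run it over each of a host's public key files (typically one per type, such as /etc/ssh/ssh_host_rsa_key.pub and /etc/ssh/ssh_host_ecdsa_key.pub). Then compare the value for the same key type the client reports.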
[22:13:50] Coren: you suggested that hiera setting for old instances… do we have per-instance hiera? [22:13:57] or just per-project? [22:14:04] (which, per-project will work almost as well) [22:18:42] andrewbogott: If we don't have per-instance hiera, we should - that's the mechanism by which we want to get rid of setting variables for puppet in osmanager. [22:18:53] yeah [22:18:59] I guess I need yuvi to catch me up on that. [22:19:12] I guess there’s no reason I can’t do it in ldap for now :) [22:19:23] 6Labs, 6Phabricator: Mysql suggestions in Labs Phabricator - https://phabricator.wikimedia.org/T93677#1143279 (10Negative24) 3NEW [22:20:57] 6Labs, 6Phabricator: Mysql suggestions in Labs Phabricator - https://phabricator.wikimedia.org/T93677#1143305 (10chasemp) p:5Triage>3Normal [22:23:50] someone familiar with custom webservices? [22:31:56] how do I figure out why a tools webservice dies a few seconds after start? the only note in the log is "server stopped by uid 0" [22:38:02] para: if that's the only error, it's almost certainly caused by an out-of-vmem error. You can make sure by checking accounting info with 'qacct -j ' [22:40:46] 6Labs, 5Patch-For-Review: Move to a new dns scheme for labs: hostname.projectname.eqiad.wmflabs - https://phabricator.wikimedia.org/T93087#1143395 (10Andrew) Coren suggests that we do this by adding a puppet setting labs_old_dns=true for everything. After that, we can pull projects or instances off of dnsmas... [22:41:08] I don't know... the webservice died several times before qacct returned anything [22:51:41] If I clone from a Gerrit repo does it include all unmerged commits? 
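Coren's suggestion (17:36, revisited at 22:13 and in the T93087 update at 22:40) is a flag such as `labs_old_dns=true`. It would be set for existing instances at the cutoff, then cleared per project or per instance once each is migrated. A first-match lookup over instance, project and default layers, roughly how a hiera-style hierarchy resolves a plain key, captures the idea. The Python sketch below is hypothetical: the layer structure, the instance names and the default are illustrative, not the actual labs hiera setup.

```python
from typing import Any, Mapping

Layer = Mapping[str, Mapping[str, Any]]


def lookup(key: str, instance: str, project: str,
           per_instance: Layer, per_project: Layer, defaults: Mapping[str, Any]) -> Any:
    """Return the first value found, checking instance, then project, then defaults."""
    for layer in (per_instance.get(instance, {}), per_project.get(project, {}), defaults):
        if key in layer:
            return layer[key]
    raise KeyError(key)


defaults = {"labs_old_dns": False}                         # new instances: new scheme only
per_project = {"tools": {"labs_old_dns": True}}            # set once at the cutoff
per_instance = {"tools-exec-01": {"labs_old_dns": False}}  # migrated; overrides the project flag

for name in ("tools-exec-01", "tools-exec-02"):
    print(name, lookup("labs_old_dns", name, "tools", per_instance, per_project, defaults))
```

Per-instance overrides let a mixed-scheme project be migrated gradually, which is the case Coren raises at 17:38:35.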
[22:52:21] because my local repo of ops puppet isn't matching whats on Differential [22:55:50] 6Labs: Make a fact for project_id on labs instances - https://phabricator.wikimedia.org/T93684#1143429 (10Andrew) [22:55:59] 6Labs: Make a fact for project_id on labs instances - https://phabricator.wikimedia.org/T93684#1143431 (10Andrew) a:3Andrew [22:59:47] (03PS1) 10John F. Lewis: add web contents from toollabs [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/199152 [23:00:18] andrewbogott: Can you explain why in my local repo of ops puppet I can see yours and a bunch of dzahn's commits but they aren't on the Differential repo [23:00:29] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Ori.livneh was created, changed by Ori.livneh link https://wikitech.wikimedia.org/wiki/Nova+Resource%3aTools%2fAccess+Request%2fOri.livneh edit summary: Created page with "{{Tools Access Request |Justification=Harassment, spam, general racketeering |Completed=false |User Name=Ori.livneh }}" [23:00:33] I don’t know what a Differential repo is [23:01:03] just the main repo on git.wikimedia.org [23:01:37] I don’t know where that comes from. Gerrit is canonical, git.wikimedia.org may have a lag or be unmaintained. [23:02:18] ok i'll just merge then. I don't know why they would be out of sync [23:05:06] (03CR) 10Alpha: [C: 032 V: 032] "Looks good!" [labs/tools/WMT] - 10https://gerrit.wikimedia.org/r/199152 (owner: 10John F. Lewis) [23:09:04] what is an acceptable memory usage for a custom webservice on the webgrid-generic queue? [23:09:50] ireas: By default, vmem is limited to 4G. [23:11:07] Coren: with the default settings, I’m unable to run java (even java -version fails). It worked with "jsub -mem 10g ...", but that’s a lot ^^ [23:11:24] Hi everyone [23:12:31] ireas: That's because java is a SERIOUS hog. Check https://wikitech.wikimedia.org/wiki/Help_talk:Tool_Labs#300-350M.3F_Really.3F for hints. 
[23:12:44] It seems setlocale under php has no effect, in particular for character sorting, but on other servers, it works. Is there something to do about it? [23:12:45] ireas: In particular, using -jamvm makes things *much* more frugal [23:13:35] (context: I'm working in tools-login) [23:14:46] Coren: I’ll have a look at it, thanks! [23:15:26] jem: I'm not a locale expert, but I remember there are lots of caveats to get setlocale to work right. Lemme see if I can find something in my email archive. [23:16:17] Thanks, Coren [23:22:18] jem: What locale are you trying to set? [23:30:00] Coren: es_ES, but I've just solved it thanks to your question [23:30:27] I was writing just es_ES but es_ES.utf8 is needed [23:30:51] (It isn't needed on the other servers, but now I know) [23:31:26] Thanks again :) [23:35:09] 6Labs: dhclient overwrites /etc/resolv.conf - https://phabricator.wikimedia.org/T93691#1143549 (10scfc) 3NEW a:3scfc [23:36:36] 10Wikimedia-Labs-Infrastructure, 5Patch-For-Review: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1143557 (10scfc) [23:36:36] 6Labs: dhclient overwrites /etc/resolv.conf - https://phabricator.wikimedia.org/T93691#1143558 (10scfc) [23:50:25] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Ori.livneh was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=149741 edit summary:
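jem's fix at 23:30 is a common one. `setlocale()` only accepts names the system actually has installed. On glibc systems a UTF-8 locale is typically listed by `locale -a` as something like `es_ES.utf8`, while a bare `es_ES` may not exist. PHP's `setlocale()` returns false on failure and also accepts several candidate names, trying each in turn. The sketch below shows the same idea in Python: try candidates in order, keep the first that works, then sort with `locale.strxfrm`. The candidate spellings are examples, not a list of what tools-login has installed.

```python
import locale
from typing import Iterable, List, Optional


def set_first_available(category: int, candidates: Iterable[str]) -> Optional[str]:
    """Try each locale name in order; return the one that was set, or None."""
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            continue  # not installed under this name; try the next spelling
    return None


def collate(words: Iterable[str]) -> List[str]:
    """Sort using the current LC_COLLATE rules."""
    return sorted(words, key=locale.strxfrm)


chosen = set_first_available(locale.LC_COLLATE, ["es_ES.utf8", "es_ES.UTF-8", "es_ES"])
if chosen is None:
    print("no Spanish collation locale installed; check `locale -a`")
else:
    print(chosen, collate(["zapato", "ñandú", "nube", "árbol"]))
```

Checking the return value, rather than assuming the call succeeded, surfaces the "no effect" symptom from 23:12:44 immediately.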