[00:56:52] Labs, Phabricator, Puppet: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428544 (Negative24) NEW [00:57:22] Labs, Phabricator, Puppet: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428551 (Negative24) a:Negative24 [00:57:43] Labs, Phabricator, Puppet: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428544 (Negative24) [00:57:46] Labs, Gerrit-Migration, Phabricator: Stabilize vcs-user owned files and directories in Phab-02 - https://phabricator.wikimedia.org/T95982#1428552 (Negative24) [00:58:18] Labs, Phabricator, Puppet: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428544 (Negative24) [01:29:55] YuviPanda: I'm trying to add a puppet group in WT (role::phabricator::labs::differential) and it's returning "Bad resource name provided. Resource names start with a-z, and can only contain a-z, 0-9, and - characters." [01:32:12] ^ puppet groups always contain : [01:50:36] ^ nm. Figured it out and updated docs [02:25:11] MediaWiki-extensions-OpenStackManager: Puppet group management interface doesn't allow all characters allowed for puppet variables and classes - https://phabricator.wikimedia.org/T38044#1428580 (Negative24) I updated some confusing documentation [[ https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetm... [07:54:54] Labs, Tool-Labs, Pywikibot-OAuth: Add OAuth to https://tools.wmflabs.org/pywikibot-testwiki/ - https://phabricator.wikimedia.org/T104291#1428775 (jayvdb) [08:14:05] s51053 is abusing his/her access to replica dbs and creating lag for other users. His/her queries are to be terminated.
[08:20:33] ^lag immediately improved upon performing the termination [08:40:34] jynus: please log such actions, so uses can know what happened to their process [08:40:40] *users [08:43:25] I do not know which project that user belongs to [08:45:13] I need an easy way to do that [08:45:37] jynus: can you see what ip he was coming from ? [08:47:49] no, that is not logged [08:48:13] I see, thanks [08:52:25] I got an IP from my console history, but it does not resolve to any known service [08:54:19] Tool-Labs-tools-Other: Https bug with Contributions by User to a page - https://phabricator.wikimedia.org/T104812#1429008 (Aklapper) p:Unbreak!>Triage Confirming, however the bottom of that page says > Bugs, suggestions, questions? Contact the author at User talk:Scottywong @Rsrikanth05: Did you do tha... [08:56:17] Tool-Labs-tools-Other: usersearch on Tool-Labs lists invalid URLs for Contributions by User to a page - https://phabricator.wikimedia.org/T104812#1429010 (Aklapper) [09:09:54] Labs, Tool-Labs, Database: mixnmatch_p on wikidatawiki.labsdb slow - https://phabricator.wikimedia.org/T104833#1429051 (Nemo_bis) NEW [09:10:48] Tool-Labs-tools-Other: usersearch on Tool-Labs lists invalid URLs for Contributions by User to a page - https://phabricator.wikimedia.org/T104812#1429059 (Rsrikanth05) @Aklapper Scottywong has retired on the English Wikipedia, hence I posted it here. Forgive me for the priority levels. This is the first time... [09:24:16] Labs, Tool-Labs, Database, Easy: mixnmatch_p on wikidatawiki.labsdb slow - https://phabricator.wikimedia.org/T104833#1429083 (jcrespo) p:Triage>Low [09:26:34] ^you, yes, YOU can help! [09:48:38] jynus: ldaplist -l passwd | grep -C 15; ldaplist -l servicegroups | grep -C 15 --> tools.jackbot [09:49:04] !log tools 10:14 s51053 is abusing his/her access to replica dbs and creating lag for other users. His/her queries are to be terminated.
(= tools.jackbot / user jackpotte) [09:49:08] Logged the message, Master [09:50:09] (the s in front of the uid means it's a service group as well, I think) [09:56:36] will !log tools.jackbot work? [09:57:42] jynus: yeah, but that probably won't reach the owners if they don't use the SAL. I'd just mail to tools.jackbot@tools.wmflabs.org [09:59:10] (that only works for tools users, though) [10:25:18] valhallasw`cloud, https://etherpad.wikimedia.org/p/dear-tool-user [10:53:22] jynus: fixed two typos, changed one sentence (the one on correcting measures), looks OK otherwise [10:53:34] jynus: I also realized a user talk page message might be more practical, as that's logged [10:54:18] https://wikitech.wikimedia.org/wiki/User_talk:JackPotte [10:56:12] jynus: and if it's obvious why the query is causing excessive load, it'd be good to mention that, as most users would not necessarily understand why a certain query is bad [10:56:29] in any case, thanks for taking care of load issues [10:56:36] thank you, valhallasw`cloud [10:56:53] problem is, valhallasw`cloud, is always time [10:57:13] if I could, I would sit a bit with the user and mentor it [10:57:22] *him/her [10:57:39] but I really *cannot* [10:58:15] I cannot even do it with production [10:59:01] of course :-) but I guess by looking at the query you'd be able to say 'this is an issue because it's an insert that takes ages', or 'don't do outer joins', or something like that?
If there's no obvious cause, it's of course perfectly fine to not put time in it to figure out why the query was an issue [11:00:02] not really, "too many and too long" was the obvious thing here [11:00:30] adding that is already helpful [11:00:55] the only thing that can be done is refer to EXPLAIN, but that is more useless if it is not known by the user [11:08:07] thanks, for the help and suggestions, much appreciated [11:38:52] Negative24: yeah unfortunately [12:36:34] Labs: clean up old ec2id-based salt keys on labs - https://phabricator.wikimedia.org/T103089#1429454 (ArielGlenn) So we have a variety of hosts using the old names. I have a cut and paste list here; these are not salt related issues but should be fixed. +not found: android-build.mobile.eqiad.wmflabs (i-000... [12:37:34] Labs, Labs-Sprint-101, Labs-Sprint-102: Sort out remaining virt1000 salt minions - https://phabricator.wikimedia.org/T103010#1429462 (ArielGlenn) Still three left for whatever reason. android-build.mobile.eqiad.wmflabs mobile-hierator2.mobile.eqiad.wmflabs test-carbon-c-relay.graphite.eqiad.wmflabs [13:22:04] Labs, wikitech.wikimedia.org: Cannot log into wikitech - https://phabricator.wikimedia.org/T103939#1429594 (JanZerebecki) Open>Resolved a:JanZerebecki Works now. [13:34:56] Labs, Tool-Labs, Patch-For-Review: Create process for 'tool labs is down' notifications on tools.wmflabs.org/* - https://phabricator.wikimedia.org/T102971#1429662 (valhallasw) Need to update this to https://s.codepen.io/Krinkle/debug/domOQK?#errorpage [13:49:00] Labs, Incident-20150617-LabsNFSOutage, Labs-Sprint-103, Labs-Sprint-104: Labs: Make a new backup of the Labs storage to codfw - https://phabricator.wikimedia.org/T103356#1429675 (yuvipanda) a:coren>None The broken mount is done, and there's a new copy of a new snapshot going to /srv/backup--*...
[13:49:22] Labs, Labs-Sprint-104: Recover wikidata.php from magnustools - https://phabricator.wikimedia.org/T104337#1429680 (yuvipanda) a:coren>yuvipanda [13:49:34] Labs, Labs-Sprint-104: Recover files from old corrupted file system (Tracking) - https://phabricator.wikimedia.org/T104334#1429681 (yuvipanda) a:coren>yuvipanda [13:49:45] Labs, Labs-Sprint-104: Recover GND bot from wikidata-todo - https://phabricator.wikimedia.org/T104336#1429685 (yuvipanda) a:coren>yuvipanda [13:52:17] YuviPanda: regarding mail and dns in labs. if i set up a host and give it a dns name, and configure a mail server on it, will i be able to get mail to matanya@wmflabs.org ? [13:54:07] matanya: no, you can't configure MX records from wikitech [13:54:24] i guess so, thanks valhallasw`cloud [13:54:32] *guessed [13:55:25] matanya: what do you need it for? would it be OK to use @tools.wmflabs.org for it? [13:56:17] just thinking out loud. i thought about something like people.wikimedia.org but for labs [13:59:07] "waiting for available socket" what? http://tools.wmflabs.org/glamtools/baglama2/#gid=172&month=201505 [13:59:30] Nemo_bis: {{worksforme}}?
[13:59:41] maybe browser overload [13:59:50] matanya: there should not be a people.wikimedia.org if wikimedia people can't use it [14:00:02] yeah, I wouldn't expect a message in your browser if that were a server-side wit [14:00:25] Nemo_bis: define wikimedia people :) [14:01:28] PROBLEM - Puppet failure on tools-checker-02 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:01:36] I love it when people think of RTL: https://tools.wmflabs.org/glamtools/baglama2/#gid=172&month=201505&giu=hewiki&server=he.wikipedia.org [14:01:39] matanya: doesn't matter :) [14:01:56] PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:03:02] PROBLEM - Puppet failure on tools-exec-1402 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:03:16] PROBLEM - Puppet failure on tools-webgrid-generic-1404 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:03:18] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:03:40] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:03:48] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1409 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:04:18] PROBLEM - Puppet failure on tools-mail is CRITICAL 22.22% of data above the critical threshold [0.0] [14:04:22] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL 30.00% of data above the critical threshold [0.0] [14:04:32] PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:04:33] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:04:38] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:04:52] PROBLEM - Puppet failure on tools-exec-1208 is CRITICAL 30.00% of data above the critical threshold [0.0] 
[14:04:58] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:05:20] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1210 is CRITICAL 33.33% of data above the critical threshold [0.0] [14:05:32] PROBLEM - Puppet failure on tools-precise-dev is CRITICAL 20.00% of data above the critical threshold [0.0] [14:05:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:06:04] wahhh [14:06:29] No such file or directory - /etc/puppet/modules/labstore/files/projects-nfs-config.yaml [14:06:30] andrewbogott: I'm pretty sure ^ is me [14:06:50] I wasn’t worried :) [14:07:10] andrewbogott: I changed a parser function, and apparently those require a puppetmaster restart in labs but not in prod? [14:07:29] prod uses passenger, so no puppetmaster there. [14:07:34] So, seems normal. [14:07:52] andrewbogott: oh, I see. the labs / prod puppetmasters are htat different? [14:07:54] I guess we could make the auto update detect w/not there was a change and restart the master if there was? [14:07:59] labcontrol1001 vs palladium [14:08:09] Oh — no, those should be the same as far as I know. [14:08:16] And I don’t think either one uses a puppetmaster. [14:08:21] It’s only ::self that does. [14:08:39] PROBLEM - Puppet failure on tools-shadow is CRITICAL 60.00% of data above the critical threshold [0.0] [14:08:55] PROBLEM - Puppet failure on tools-master is CRITICAL 33.33% of data above the critical threshold [0.0] [14:08:57] Is it possible that palladium ‘needs’ it too but there just aren’t any symptoms of it being behind? 
[14:09:01] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:09:04] 6Labs, 10Tool-Labs: lighttpd does not correctly close connections (CLOSE_WAIT) - https://phabricator.wikimedia.org/T104799#1429722 (10valhallasw) [14:09:54] andrewbogott: hmm, but when I made changes to ipresolve function earlier nothing needed restarting... [14:10:04] 6Labs, 10Tool-Labs: lighttpd does not correctly close connections (CLOSE_WAIT) - https://phabricator.wikimedia.org/T104799#1427257 (10valhallasw) [14:10:15] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 11.11% of data above the critical threshold [0.0] [14:10:24] I can never follow when it does and doesn’t need it. [14:10:29] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:10:36] PROBLEM - Puppet failure on tools-exec-1406 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:11:00] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL 60.00% of data above the critical threshold [0.0] [14:11:06] One thing to keep in mind is that puppet-merge doesn’t push to labcontrol1001 — it rebases with a cron. So palladium and labcontrol1001 can be out of sync for a minute or two. never more than that though. [14:11:18] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL 66.67% of data above the critical threshold [0.0] [14:11:42] PROBLEM - Puppet failure on tools-exec-1209 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:12:10] YuviPanda: in any case, looks like recoveries are on the way. [14:12:18] andrewbogott: yes, I rebooted. [14:12:22] PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:12:27] andrewbogott: err, restarted apache [14:12:29] not rebooted... [14:16:15] elee: heh. we must be on at different times. still around? 
[14:20:13] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0] [14:23:27] Negative24: yessir [14:23:31] caught me at a good time [14:24:16] Labs, Incident-20150617-LabsNFSOutage, Labs-Sprint-102, Labs-Sprint-103, Labs-Sprint-104: Audit projects' use of NFS, and remove it where not necessary - https://phabricator.wikimedia.org/T102240#1429772 (yuvipanda) [14:24:18] Labs, Patch-For-Review: Remove NFS mounts from project cephtest - https://phabricator.wikimedia.org/T102381#1429770 (yuvipanda) Open>Resolved Done! [14:24:45] elee: good [14:24:50] Labs: Archive NFS data for projects that no longer have NFS - https://phabricator.wikimedia.org/T104857#1429775 (yuvipanda) NEW [14:25:10] Negative24: I can't think of additional troubleshooting steps so have given up =p [14:25:18] elee: where's your ssh config again? [14:25:32] https://github.com/leee/.dotfiles/blob/master/sshconfig [14:25:36] also: [14:25:51] Labs: Archive NFS data for projects that no longer have NFS - https://phabricator.wikimedia.org/T104857#1429795 (yuvipanda) Also some projects used to have /home and /project and have only one now. Those should be archived as well.
[14:25:54] http://web.mit.edu/leee/Public/ssh-stawp.txt [14:25:56] and http://web.mit.edu/leee/Public/ssh-stawp2.txt [14:26:00] I have a tiny idea that might work but I don't know why someone else wouldn't have caught it [14:26:06] sure hit me [14:26:17] elee: very copy and pastey :p [14:26:21] ^_^ JohnFLewis [14:26:23] RECOVERY - Puppet failure on tools-checker-02 is OK Less than 1.00% above the threshold [0.0] [14:26:43] yeah I probably should think of a better machine name than the current one [14:27:23] ok so bastion resolves but not tunneling [14:27:25] also for some really funky reason I kept thinking harej was indian [14:27:28] Negative24: correct [14:28:29] elee: fyi this is mine: https://github.com/jdloft/dotfiles/blob/master/link/.ssh/config [14:28:58] its significantly simpler and everything below ln 38 is shorthand [14:29:33] oh wait what about gerrit? [14:29:41] or do you use gerrit over https [14:29:52] 6Labs, 3Labs-Sprint-102, 3Labs-Sprint-104, 5Patch-For-Review: Labs: manage-nfs-volumes-daemon rewrite - https://phabricator.wikimedia.org/T102782#1429817 (10yuvipanda) Lots of patches later, the stuff left to do is to support new projects: # Create folders under /srv/others # Bind mount that onto /exp I'... [14:30:40] elee: I haven't thought about gerrit. Its always worked on ssh for me [14:30:46] hrm... [14:31:12] I wish wmflabs had a bastion host that people could dev on [14:31:45] elee: https://gist.github.com/JohnFLewis/e65febe9bc15b00b2510 mine if it helps too [14:31:54] hrm should I just rewrite this [14:31:55] RECOVERY - Puppet failure on tools-exec-1403 is OK Less than 1.00% above the threshold [0.0] [14:32:19] oh I've got a really good question for you al [14:32:20] all* [14:32:24] elee: the config? yes as like 99% of the top half is root-only stuff [14:32:26] how do you guys keep keys portable? [14:32:44] physically portable? 
[14:32:51] not necessarily [14:33:01] RECOVERY - Puppet failure on tools-exec-1402 is OK Less than 1.00% above the threshold [0.0] [14:33:07] elee: mines encrypted so I don't worry about it as much as other people [14:33:15] like I could always store my private keys in my dotfiles repo [14:33:19] I store by keys in an encrypted usb [14:33:21] but people consider that a Bad Idea [14:33:30] like I see nothing wrong with it though [14:33:32] elee: comment out line 45 for me [14:33:37] no don't do that [14:33:40] RECOVERY - Puppet failure on tools-shadow is OK Less than 1.00% above the threshold [0.0] [14:33:42] RECOVERY - Puppet failure on tools-exec-1202 is OK Less than 1.00% above the threshold [0.0] [14:33:46] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1409 is OK Less than 1.00% above the threshold [0.0] [14:33:50] Negative24: roger, standby [14:34:02] oh wait that makes sense [14:34:02] RECOVERY - Puppet failure on tools-exec-1212 is OK Less than 1.00% above the threshold [0.0] [14:34:12] (that was to the dotfiles storage comment, still comment out ln 45) [14:34:22] RECOVERY - Puppet failure on tools-exec-catscan is OK Less than 1.00% above the threshold [0.0] [14:34:22] RECOVERY - Puppet failure on tools-mail is OK Less than 1.00% above the threshold [0.0] [14:34:24] oh irc [14:34:32] you make things so clear :P [14:34:32] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [14:34:33] RECOVERY - Puppet failure on tools-exec-1207 is OK Less than 1.00% above the threshold [0.0] [14:34:38] RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0] [14:34:52] RECOVERY - Puppet failure on tools-exec-1208 is OK Less than 1.00% above the threshold [0.0] [14:34:59] RECOVERY - Puppet failure on tools-bastion-02 is OK Less than 1.00% above the threshold [0.0] [14:35:02] but no cigar [14:35:18] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold 
[0.0] [14:35:30] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [0.0] [14:35:34] RECOVERY - Puppet failure on tools-precise-dev is OK Less than 1.00% above the threshold [0.0] [14:35:34] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [0.0] [14:35:42] elee: do what I do - just have a host bastion.wmflabs.org definition separately with user and identityfile [14:36:01] wait the problem here is being able to use bastion for what it does [14:36:12] RECOVERY - Puppet failure on tools-webproxy-01 is OK Less than 1.00% above the threshold [0.0] [14:36:13] see http://web.mit.edu/leee/Public/ssh-stawp.txt [14:36:34] wait hah JohnFLewis != James_F [14:36:55] how did you know? :o [14:37:01] =p [14:37:23] though anyway, do as I said. keep the bastion definition simple [14:37:26] RECOVERY - Puppet failure on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [0.0] [14:38:01] 6Labs: fix labs jessie instances to have correct salt version - https://phabricator.wikimedia.org/T104849#1429845 (10Krenair) [14:38:16] RECOVERY - Puppet failure on tools-webgrid-generic-1404 is OK Less than 1.00% above the threshold [0.0] [14:38:17] elee: James_F is James_F [14:38:20] RECOVERY - Puppet failure on tools-exec-1401 is OK Less than 1.00% above the threshold [0.0] [14:38:20] RECOVERY - Puppet failure on tools-exec-1402 is OK Less than 1.00% above the threshold [0.0] [14:38:21] =p [14:38:32] okay uh let me just make it simple [14:38:36] I'm also not Negative24. 
[14:38:38] :-) [14:38:43] gasp [14:38:46] WHAT [14:38:54] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [14:39:56] okay hrm [14:40:03] JohnFLewis: I did just do that [14:40:09] Host bastion.wmflabs.org [14:40:15] with the appropriate user and identity file [14:40:16] no cigar [14:40:38] Labs: Archive NFS data for projects that no longer have NFS - https://phabricator.wikimedia.org/T104857#1429864 (Andrew) I periodically run archive-managed-projects. From the comments there: ### puppet:///files/ldap/scripts/archive-managed-projects ###########################################################... [14:40:38] RECOVERY - Puppet failure on tools-exec-1406 is OK Less than 1.00% above the threshold [0.0] [14:40:44] Positive24: hold on one sec. I'm making a simplified ssh config that I know works [14:40:47] keep getting denied (publickey)? [14:40:50] WAIT [14:40:52] UH [14:40:55] hold on [14:41:00] RECOVERY - Puppet failure on tools-exec-cyberbot is OK Less than 1.00% above the threshold [0.0] [14:41:09] Quarry: Unicode in query results in strange behavior - https://phabricator.wikimedia.org/T71224#1429869 (Halfak) Open>Resolved [14:41:18] Quarry: Unicode in query results in strange behavior - https://phabricator.wikimedia.org/T71224#742694 (Halfak) Yup. Works for me now.
[14:41:36] HAHA [14:41:42] oh god that's hilarious [14:41:42] RECOVERY - Puppet failure on tools-exec-1209 is OK Less than 1.00% above the threshold [0.0] [14:41:48] okay uh [14:41:51] my dotfiles [14:41:54] my sshconfig rather [14:41:58] look at line 47 [14:42:07] 6Labs, 10Tool-Labs, 10Wikimania-Hackathon-2015: Conduct a research tools workshop at wikimania hackathon 2015 - https://phabricator.wikimedia.org/T91062#1429873 (10Halfak) [14:42:09] I define proxycommand, but not my identityfile, let alone my username [14:42:12] that's pure gold [14:42:48] and wait I don't need to define identityfile because it gets automatically asked for [14:42:53] remember people: leee != elee [14:43:13] yeup it works [14:43:37] that was it [14:43:40] just add User elee [14:43:43] wow I'm a dumbass [14:44:38] but uh wow [14:44:41] thanks JohnFLewis and Negative24 [14:44:57] working configuration at https://github.com/leee/.dotfiles/blob/master/sshconfig [14:45:09] I should clean up my dotfiles, hrm. [14:45:13] yeah [14:45:16] heh [14:45:27] I was getting to the same point in my investigation [14:45:43] lines 4 to 40 will likely never be needed for you anyway [14:45:56] yeah exactly [14:46:03] uh what are bastions 1-4 for anyways [14:46:39] which ones? [14:46:48] was looking at your dotfiles [14:46:51] orr ather, Negative24's [14:47:16] I do however want to talk to harej at some point about getting a new cluster running [14:47:37] out of curiosity, how large is eqiad? [14:48:59] elee: I'm updating my config [14:49:12] check in a min. [14:49:14] as am I =p [14:49:22] elee: around 500 servers I think. I could be way out of date though. [14:49:40] hrm, that's a lot, I wouldn't be able to ask for that many, let alone 100 [14:49:48] 10? 20? =p [14:49:54] but we do have tons of external IPs [14:50:30] elee: now check/refresh [14:51:25] bastion 2 is er [14:51:26] ? [14:51:38] I remember tools has two as well [14:51:44] the latter being larger at 8GB of ram [14:51:48] and 4 cores methinks? 
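(Editor's sketch of the fix elee describes above: a ProxyCommand stanza that never set the login name, so the hop through the bastion was attempted with the wrong user. This is a minimal illustration, not elee's actual file. The `*.eqiad.wmflabs` pattern is inferred from the instance names that appear in this log, and the IdentityFile path is a hypothetical placeholder; only `User elee` and `Host bastion.wmflabs.org` come from the conversation itself.)

```
# ~/.ssh/config -- illustrative only, see the assumptions in the note above

# Keep the bastion definition simple: just the user and key.
Host bastion.wmflabs.org
    User elee
    # hypothetical key path
    IdentityFile ~/.ssh/id_rsa_labs

# Instances behind the bastion. The missing piece in the log was "User":
# without it, ssh falls back to the local login name and the server
# answers "Permission denied (publickey)".
Host *.eqiad.wmflabs
    User elee
    # hypothetical key path
    IdentityFile ~/.ssh/id_rsa_labs
    ProxyCommand ssh -a -W %h:%p bastion.wmflabs.org
```

(With a stanza like this, `ssh <instance>.<project>.eqiad.wmflabs` tunnels through the bastion in one step; `-W %h:%p` forwards stdin/stdout to the target host and port, and `-a` disables agent forwarding for the hop.)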
[14:52:00] oh right, login.tools vs dev.toolsw [14:52:03] the second one is for a hot swap [14:52:13] only in outages [14:52:17] reasonable [14:52:32] on tools its different. the second is for big dev stuff such as compiling [14:52:45] since on bastion you don't do any dev stuff [14:56:23] Negative24: are there hosts that one can do dev work on? [14:57:12] elee: generic dev stuff? [14:57:39] yeah - I keep all the things I do on seperate machines and would like to keep wikimedia related stuff elsewhere [14:58:48] andrewbogott: elee: I was bored and well, 668 servers in ganglia for eqiad :) [14:59:16] uh well [14:59:24] jesus christ [14:59:24] you must be very bored [14:59:34] that's fun [14:59:51] I honestly would have written a script to do it if I was bored [14:59:53] Gmetad must totally be happy [15:00:16] elee: which means the other 3 datacenters have 371 collectively [15:00:25] wait, there are other ones? [15:00:36] I thought it was just eqiad and something internal with wmf [15:00:51] elee: labs doesn't really want to just give out machines for misc stuff. In fact a lot of the docs say do everything on your local machine and then push to labs for testing [15:00:59] yeah, eqiad (Ashburn), ulsfo (San Fran), esams (Haarlem) and codfw (Dallas) [15:01:16] eqiad is mostly for labs, right? [15:01:31] reasonable. I have a machine that I remotely do dev stuff on but meh [15:01:35] I need to rethink my workflow [15:01:36] eqiad has the only labvirt machines which there are 8 I think? [15:02:09] wait, kvm machines? 
[15:02:19] I thought labs was a openstack shop [15:02:28] no its openstack which is kvm under the hood [15:02:29] Negative24: ^ + eqiad is one of the main dcs where all traffic goes though codfw is opening as a second main dc [15:02:46] ah [15:02:57] not opening, slowly being deployed as so [15:03:26] ah [15:03:58] sorry for the lack of expertise with nova [15:04:08] I'm hoping to twiddle with mesos when I have more time over the summer though [15:05:43] just wait until we get horizon. that's a whole different ballpark. [15:06:05] come to thing of it, anyone know how to track the progress of horizon [15:06:10] Negative24: labs isn't the only visualized environment in eqiad too :p [15:06:13] horizon? [15:07:10] elee: openstack horizon [15:07:52] oh wait [15:07:54] this looks great [15:08:27] wait [15:08:30] WHOA [15:09:48] elee: where are you looking? [15:09:54] docs [15:10:12] okay wow uh I wonder if they know about this [15:10:14] ganeti ftw imo ;) [15:10:26] elee: they know. Its installed already [15:10:43] oh no this is for someone else [15:11:03] but wow [15:11:42] JohnFLewis: ganeti is cool but I don't know of anyone (other than Google!) that uses it [15:11:51] Negative24: I do :) [15:11:56] and you [15:11:57] the WMF [15:12:00] I'm playing with it. [15:12:09] JohnFLewis: where? [15:12:15] in production [15:12:28] https://wikitech.wikimedia.org/wiki/Ganeti [15:14:39] I wasn't aware of that [15:16:04] elee: where are you playing with it [15:16:26] Negative24: personal cluster [15:16:44] oh. is this MIT? 
[15:17:05] sure [15:17:15] lets just say its 5 esxi hosts inside a dorm room [15:17:23] rather [15:17:32] I'm seeing if I like esxi over kvm or xen [15:17:35] or something else basically [15:18:22] MIT's great though - literally labs left and right retire machines every two to three years, we go dumpster diving, and find really specced out machines for free [15:18:37] get drives since drives are typically destroyed, and we're good to go [15:19:08] this current cluster are 5 identical poweredges (I think, I haven't seen them for a long time now because summer and stuff) with 128GB each [15:21:30] wow [15:34:03] sorry but I sent emails to wrong addresses date@tools.wmflabs.org, by@tools.wmflabs.org, User-Liangent@tools.wmflabs.org, and@tools.wmflabs.org, User-Liangent-adminbot@tools.wmflabs.org, from@tools.wmflabs.org, edits@tools.wmflabs.org, Template-Dyk@tools.wmflabs.org, -s@tools.wmflabs.org because of bad commands used [15:34:38] liangent: :D they should just bounce, I think [15:34:39] any matched existent user? [15:35:19] from the looks of it probably not [15:36:01] okay [15:56:00] I'm not sure if this is the appropriate place or if this is even an issue or if it is intentional, but the service running on tiles.wmflabs.org is no longer functioning for me. For example, the following URL results in a 500 internal server error: http://a.tiles.wmflabs.org/bw-mapnik/13/2099/3040.png whereas previously this would have returned an image file. Possibly related -- going to http://a.tiles.wmflabs.org/bw-mapnik/ results in a file not [15:56:01] found and http://tiles.wmflabs.org/ just shows a default directory listing as if the server is missing the appropriate website. I'm simply not sure if this is a known issue, if this service was intentionally shut down, or if I'm doing something wrong (although to my knowledge, no changes have been made on my side and these requests previously succeeded). Any information is appreciated. Thanks! 
[16:00:55] Lee-, there was a recent issue, see topic [16:04:52] jynus, I did read that, but I misunderstood it. I took it to mean that the restoration of the June 8th backup was completed on June 19th. Do you happen to know if this project is expected to be restored or if I should seek an alternative / try to host my own? [16:05:48] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds [16:06:40] Lee-: it's expected to be restored, but it's nontrivial due to the size [16:07:52] OK thanks for the information. I really appreciate the services offered by the wikimedia labs group. [16:08:45] YuviPanda|afk: new formatting: http://tools-beta.wmflabs.org/errorpage-generic.html and http://tools-beta.wmflabs.org/ :-) [16:11:13] Krinkle: ^ :-) [16:11:14] Lee-: I can't find the relevant phabricator task at the moment, though :/ [16:11:37] YuviPanda|afk: do you know from the top of your head which task restoring tiles/NFS is? [16:15:19] hmm found the bounces [16:16:50] but they're in Spams [16:16:50] it says "Why is this message in Spam? We've found that lots of messages from mail.tools.wmflabs.org are spam." [16:16:51] in Gmail [16:16:51] :/ [16:16:52] people marking cron mail as spam? [16:18:00] * CP678|away is away: This is a manual computer virus. Please copy paste me in your away message. I'm not here right now. [16:18:12] CP678|away: please turn off that script [16:18:23] valhallasw`cloud, what script? [16:18:31] 18:17 — CP678|away is away: This is a manual computer virus. Please copy paste me in your away message. I'm not here right now. [16:19:09] Hmm... I'm not running a script to do that though. [16:20:21] valhallasw`cloud: Hm.. 
seems http://tools.wmflabs.org/err is still weirdly blank [16:20:24] not sure what happened there [16:20:32] been blank for a few days at least in tool labs [16:20:33] error pages [16:20:42] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 822767 bytes in 2.636 second response time [16:20:44] It used to have a more informative page there [16:22:13] Krinkle: I'm not sure what should be there, as there's no tool called 'err' [16:22:24] Yes? [16:22:34] a 404 error? [16:22:42] ohhhh [16:22:57] yeah, I see your point [16:23:23] In fact, to my great frustration, it has error pages even for things it shouldn't (e.g. when you return 500 or 404 from a tool app, it will overwrite it) [16:23:32] and in the small case where it has its own territory, it doesn't. [16:23:41] the one place I'd say it's okay to have its own error page [16:25:01] yeah, it should only do something with 5XX's if there's no content, I think [16:25:02] https://phabricator.wikimedia.org/T66393 is about 404s with content (e.g. mediawiki) [16:25:02] https://phabricator.wikimedia.org/T103662 [16:25:10] anyway, I'm not sure what was the actual error page beforehand [16:29:24] yuvipanda: About? [16:29:47] I seem to be having connectivity issues to labs in general now [16:29:55] Labs, Tool-Labs: 'webservice not available' message no longer shown - https://phabricator.wikimedia.org/T104870#1430484 (valhallasw) NEW [16:32:25] Krinkle: hm, it does work if the user exists, e.g. https://tools.wmflabs.org/abibot/ [16:33:27] Labs, Tool-Labs: 'webservice not available' message no longer shown - https://phabricator.wikimedia.org/T104870#1430515 (valhallasw) [16:35:13] Labs, Tool-Labs: Mails from tools are being marked as 'spam' by gmail - https://phabricator.wikimedia.org/T104871#1430526 (valhallasw) NEW [16:36:26] liangent, could you post the headers of the bounces in that bug?
[16:38:32] (feel free to leave out your real e-mail address, of course) [16:38:33] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds [16:38:48]