[00:04:28] Labs, Tool-Labs, Labs-Infrastructure, Labs-Sprint-115: Can't delete rule in default security group - https://phabricator.wikimedia.org/T112492#1668981 (jcrespo) @Andrew, I can run the following SQL on m5: `ALTER TABLE nova.security_group_rules CHANGE deleted deleted int;` This will leave the tabl...
[00:59:52] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/JustBerry was created, changed by JustBerry link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/JustBerry edit summary: Created page with "{{Tools Access Request |Justification=Design tools for searching similar contribs as already-existing socks (searching pages and sub-pages of pages commonly edited by known-so..."
[01:05:08] Gentle push for: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/JustBerry.
[01:08:02] (PS1) Greg Grossmeier: Add #releng-epics to -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/240627
[01:51:54] andrewbogott or yuvipanda : can you make VolerE and Spage admins of Nova_Resource:Design ? We need to reboot an instance
[01:52:00] *VolkerE
[01:54:41] @ spagewmf @ andrewbogott @ yuvipanda Request above.
[02:15:42] spagewmf: I can do that in about 30nin
[02:15:43] Min
[02:15:58] JustBerry: for you too
[02:16:07] yuvipanda: ?
[02:16:11] yuvipanda: Ah, all right.
[02:16:13] yuvipanda: nan, not a problem.
[02:16:15] yuvipanda: No rush.
[02:16:22] Unless spagewmf can.
[02:16:29] @ yuvipanda ^^
[02:17:02] He can't
[02:17:31] JustBerry: I doubt I'm a tool labs admin, too much shooting foot in mouth :)
[02:18:23] spagewmf: Lol.
[02:18:35] yuvipanda: Earwig brought me over here :)
[02:18:42] err
[02:18:44] wugga2k15 ^^
[02:18:48] :)
[02:18:56] He does not endorse me though, lol.
[02:18:58] @ Earwig ^^
[02:18:59] yep
[02:19:33] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/JustBerry was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=184439 edit summary:
[02:19:55] yuvipanda: If that was you, thanks. If not, no need for ^^ :)
[02:19:59] @ Earwig Too late ^^ lol
[02:20:21] alas, tim is not yuvi
[02:21:12] Earwig: tim = wugga?
[02:21:17] what? no
[02:21:24] I don't have the ability to do that stuff
[03:26:02] JustBerry: FYI, Tim is scfc on IRC
[03:26:10] Negative24: K.
[03:26:36] Probably a totally different person than you are working with. He handles a lot of WT requests
[04:00:59] "Kubernetify the Vagrants and set Dockers to LXC" "Aye-aye cap'n" goodnight
[04:38:30] (CR) Legoktm: "Should we use Release-Engineering(.*) ?" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/240627 (owner: Greg Grossmeier)
[04:57:39] Negative24: legoktm: How do you remove channel logs from http://wm-bot.wmflabs.org if you're the channel owner or have done @logoff
[04:58:06] Earwig: ^^
[04:58:14] yuvipanda: If online, see above.
[04:58:27] I don't know
[04:59:26] You have to ask petan
[05:01:08] yuvipanda: Done.
[06:11:48] Labs, Tool-Labs: create diamond reporter & shinken alert for /var/log/account/pacct and/or pacct.1 size - https://phabricator.wikimedia.org/T107617#1669683 (greg) See also: T91354#1669678
[06:51:24] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:26:23] RECOVERY - Puppet failure on tools-exec-cyberbot is OK: OK: Less than 1.00% above the threshold [0.0]
[08:47:13] Labs, Beta-Cluster, Labs-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1669896 (Chmarkine) [[ https://letsencrypt.org/ | Let's Encrypt ]] provides free trusted(*) DV non-wildcard certs. We have 31 domains lists [[...
[09:20:42] Labs, Maps, Patch-For-Review: maps-warper /mnt vbd partition errored, turned read only and went missing after reboot - https://phabricator.wikimedia.org/T112641#1669961 (Chippyy) Very Many Thanks! Parted got installed, and the /srv partition now shows up as /dev/mapper/vd-second--local--disk Should...
[09:44:47] Labs, Tool-Labs: Install composer on tools-login - https://phabricator.wikimedia.org/T104789#1670054 (valhallasw) A few things I tried. - building an fpm package from the integration/composer repo works: https://gerrit.wikimedia.org/r/#/c/240451/ - but might not be an improvement over just using git c...
[11:44:51] admin?
[12:12:27] PROBLEM - SSH on tools-exec-1217 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:17:22] RECOVERY - SSH on tools-exec-1217 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[12:31:00] Labs, Beta-Cluster, Labs-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1670489 (Krenair) I think we'd also want upload.beta.wmflabs.org, maybe stream.wmflabs.org, all of the m./zero. variants? What about mx.beta.w...
[12:38:14] yuvipanda: :O
[13:09:59] Labs, Beta-Cluster, Labs-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1670596 (Lixxx235) Chmarkine, there's always StartCom/StartSSL which has free certs, and they're already trusted by default in all major brows...
[13:47:47] (Abandoned) Tim Landscheidt: WIP: Update rmtool [labs/toollabs] - https://gerrit.wikimedia.org/r/122274 (owner: Tim Landscheidt)
[13:56:43] Can someone help me run 'select page_id, YEAR(rev_timestamp),MONTH(rev_timestamp) from page join revision_userindex on page.page_id = revision_userindex.rev_page where page_namespace=0 and page_is_redirect=0 and rev_user !=0 and lower(CONVERT(rev_user_text USING latin1)) not like '%bot' group by rev_page, YEAR(rev_timestamp), MONTH(rev_timestamp) having count(*) > 4;' on en db
[13:57:24] It always fails with '_mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')'
[13:57:53] I've tried setting the interactive timeout & the wait timeout, but it still fails
[13:58:05] any help will be much appreciated :-)
[13:59:23] Jeph, Coren: I get a similar error using MySQL workbench. It seems the MySQL server gets bored on us.
[13:59:44] But that happens for queries that take forever
[14:01:08] I tried dumping the db, to try and create an index on rev_page, but I can't dump the table either, permissions I guess.
[14:01:40] I looked for a dump with the revisions table, couldn't find one either
[14:04:02] Jeph: You may be hitting into memory or runtime limits. Jaime should be able to tell you for sure.
[14:05:26] Jeph: But from a pure SQL pov, what kills you is that lower(convert()) on rev_user_text. You're almost certainly better off doing the group by on all the users and filter out bots on the result instead.
[14:07:48] Is Jaime here? I ran both versions on smaller wiki's , fi, ja etc. lower(convert()) and join user_groups, and they gave me almost identical results, joining gave me the extra trouble of having to combine all the rows for a given editor, extra rows for every role.
[14:08:28] So I stayed put with the lower(convert()) version
[14:09:09] spagewmf: I missed your request yesterday; did Yuvi already take care of it?
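Coren's suggestion at 14:05:26 (group by every user in SQL, then drop bots from the result) can be sketched in Python. This assumes the query is rewritten to return one row per page, user, year and month with a revision count, i.e. `GROUP BY rev_page, rev_user_text, YEAR(rev_timestamp), MONTH(rev_timestamp)`, keeping the cheap `rev_user != 0` condition but dropping the `lower(CONVERT(...))` filter. The row layout and the function names are hypothetical, and the "name ends in bot" test is the same rough heuristic as the original `not like '%bot'`, no more accurate.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple, Union

# Hypothetical row shape: (page_id, rev_user_text, year, month, revision_count),
# as returned by a query grouped on page, user, year and month.
Row = Tuple[int, Union[str, bytes], int, int, int]


def is_bot_name(user_text: Union[str, bytes]) -> bool:
    """Same heuristic as the original query: the user name ends in 'bot', ignoring case."""
    if isinstance(user_text, bytes):
        # rev_user_text is stored as binary, so decode it before lowercasing.
        user_text = user_text.decode("utf-8", errors="replace")
    return user_text.lower().endswith("bot")


def pages_with_active_months(rows: Iterable[Row], min_revisions: int = 5) -> List[Tuple[int, int, int]]:
    """Sum non-bot revision counts per (page, year, month) and keep the busy months.

    min_revisions=5 matches the original HAVING count(*) > 4.
    """
    totals = defaultdict(int)
    for page_id, user_text, year, month, count in rows:
        if is_bot_name(user_text):
            continue  # bots are filtered here instead of in the WHERE clause
        totals[(page_id, year, month)] += count
    return sorted(key for key, total in totals.items() if total >= min_revisions)
```

The trade-off is that the database returns more rows (one per user rather than one per page-month), but it no longer has to convert and lowercase every `rev_user_text` before grouping.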
[14:22:42] Labs, Tool-Labs, Patch-For-Review: Remove modules/toollabs/files/host_aliases - https://phabricator.wikimedia.org/T109485#1670738 (scfc) After restarting the grid engine master and execd on `tools-exec-1201`, `qstat -f` still showed the queues for that instance as `au`, but: ``` scfc@tools-bastion-01:...
[14:28:10] Labs, Tool-Labs, Patch-For-Review: Remove modules/toollabs/files/host_aliases - https://phabricator.wikimedia.org/T109485#1670772 (scfc) `tools-bastion-02`: ``` scfc@tools-master:~$ qconf -as tools-bastion-02 tools-bastion-02.tools.eqiad.wmflabs added to submit host list scfc@tools-master:~$ qconf -ah...
[14:31:07] Labs, Tool-Labs, Patch-For-Review: Remove modules/toollabs/files/host_aliases - https://phabricator.wikimedia.org/T109485#1670783 (scfc) `tools-checker-01` was readded as a submit host (?), but I was bold: ``` scfc@tools-bastion-01:~$ qconf -as tools-checker-01 tools-checker-01.tools.eqiad.wmflabs add...
[14:33:20] Labs, Tool-Labs, Patch-For-Review: Remove modules/toollabs/files/host_aliases - https://phabricator.wikimedia.org/T109485#1670798 (scfc) `tools-services-02` was still a submit host, but `tools-services-01` was not: ``` scfc@tools-bastion-01:~$ qconf -ds tools-services-02.eqiad.wmflabs scfc@tools-basti...
[14:34:58] Labs, Tool-Labs, Patch-For-Review: Remove modules/toollabs/files/host_aliases - https://phabricator.wikimedia.org/T109485#1670804 (scfc) `tools-webgrid-generic-1404` has been reenabled (?), so I'll disable it, reschedule the jobs, fix the host name in the host groups, restart execd and reenable the queue.
[14:38:50] Labs, Tool-Labs, Labs-Sprint-115: Attribute cache issue with NFS on Trusty - https://phabricator.wikimedia.org/T106170#1670826 (coren)
[14:39:58] Labs, Beta-Cluster, Labs-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1670836 (Chmarkine) >>! In T50501#1670596, @Lixxx235 wrote: > Chmarkine, there's always StartCom/StartSSL which has free certs, and they're al...
[14:45:45] Labs, Tool-Labs, Labs-Infrastructure, Labs-Sprint-115: Can't delete rule in default security group - https://phabricator.wikimedia.org/T112492#1670862 (Andrew) @jcrespo: yep, that sounds like what we want. Thank you!
[14:46:07] Labs, Tool-Labs, Patch-For-Review: Remove modules/toollabs/files/host_aliases - https://phabricator.wikimedia.org/T109485#1670863 (scfc) I misread the process table: The queue seems to be disabled and no jobs running on this host, but there were processes for the tools `clickstream-api`, `faces` and `l...
[14:51:38] yuvipanda : around?
[14:51:43] It's an emergency
[14:51:51] http://tools.wmflabs.org/languageproofing/
[14:51:57] My server seems to be down
[14:52:04] What could be the possible reason?
[14:53:06] ankita-ks: there are lots of possible reasons, but… have you tried restarting it?
[14:53:30] andrewbogott : no, I have just panicked. Trying that now. Thanks!
[14:54:47] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[14:56:32] ankita-ks: you see the error that it’s throwing? java.lang.IllegalArgumentException: Missing 'text' parameter
[14:57:20] I can't login. I changed my ssh-keys
[14:57:25] Importing them now
[15:00:32] Labs, Tool-Labs: croptool creates huge temporary files - https://phabricator.wikimedia.org/T107328#1670878 (scfc) Open>Invalid a:scfc I'm not aware of any recurrences, so closing as invalid.
[15:01:28] andrewbogott : okay, where can you see this error?
[15:01:42] ankita-ks: your tool writes a .out and a .err log
[15:01:50] the .err is pretty clear about what’s wrong, I think
[15:05:26] andrewbogott : A restart fixed it.
[15:05:42] ankita-ks: ok then :)
[15:05:50] ThanksI am sorry for acting so dumb. the IRC showcase for GSoC is about to start and this happened.
[15:05:57] I completely panicked.
[15:05:58] :|
[15:09:46] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:12:18] Labs, Tool-Labs: qmaster chokes on old jobs from hosts that have been renamed - https://phabricator.wikimedia.org/T113614#1670900 (scfc) NEW
[15:34:12] Labs, Tool-Labs: qmaster chokes on old jobs from hosts that have been renamed - https://phabricator.wikimedia.org/T113614#1670981 (valhallasw) Shutting down gridengine-exec on the host before restarting qmaster might help. Or maybe we should just rebuild exec hosts and delete the old ones if this gives to...
[16:10:27] (CR) Greg Grossmeier: "Sure?" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/240627 (owner: Greg Grossmeier)
[16:13:14] yuvipanda : around?
[16:13:29] When I log in to a labs instance
[16:13:34] and try to do a vagrant up
[16:13:48] I am prompted for mwvagrant password
[16:13:56] This has not happened before
[16:14:09] How do I find out what my mwvagrant password is?
[16:15:06] (PS2) Greg Grossmeier: Add #releng-epics to -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/240627
[16:30:19] yuvipanda: I'm thinking we could do https://beingasysadmin.wordpress.com/2014/12/07/automating-debian-package-management/
[16:31:04] yuvipanda: but for the short term the git checkout might be the best option?
[16:36:22] Labs, Labs-Infrastructure, Continuous-Integration-Scaling: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1671200 (hashar) NEW
[16:39:25] (CR) Legoktm: [C: 2] Add #releng-epics to -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/240627 (owner: Greg Grossmeier)
[16:39:38] (Merged) jenkins-bot: Add #releng-epics to -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/240627 (owner: Greg Grossmeier)
[16:40:48] !log tools.wikibugs Updated channels.yaml to: eca76c2669d6b0d308b2457b146dbef7f5f91d26 Add #releng-epics to -releng
[16:40:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[17:16:06] (PS1) Niedzielski: Set executable bit on script [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/240753
[17:18:21] yuvipanda: looks like i broke hte alpha build the other day :| i forgot to set the executable bit. would you mind +2ing: https://gerrit.wikimedia.org/r/#/c/240753/
[17:20:41] (PS1) Greg Grossmeier: Add #MediaWiki-Releasing to -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/240756
[17:21:20] (CR) Mholloway: [C: 1] Set executable bit on script [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/240753 (owner: Niedzielski)
[17:25:02] (CR) BearND: [C: -1] "I'm confused. The commit message does not seem to match the code change." [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/240753 (owner: Niedzielski)
[17:27:45] (PS2) Niedzielski: Set executable bit on script [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/240753
[17:28:29] (CR) Niedzielski: "fixed, thanks. i accidentally committed debug code" [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/240753 (owner: Niedzielski)
[17:36:47] (CR) BearND: [C: 2] Set executable bit on script [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/240753 (owner: Niedzielski)
[17:40:31] (CR) Legoktm: [C: 2] "These will continue to show up in #wikimedia-dev" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/240756 (owner: Greg Grossmeier)
[17:40:48] (Merged) jenkins-bot: Add #MediaWiki-Releasing to -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/240756 (owner: Greg Grossmeier)
[17:41:31] !log tools.wikibugs Updated channels.yaml to: bf1ac0ad9fd2f358aeb62335516c6ca4304b649d Add #MediaWiki-Releasing to -releng
[17:41:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[18:09:43] yuvipanda: do you have the ability to approve OAuth apps again?
[18:09:50] or anyone else around who can?
[18:10:34] ragesoss: there's a user group on mw.org that can do that
[18:10:44] ragesoss: but you should be able to test with your own user
[18:10:58] We just switched to https for dashboard.wikiedu.org, and we need to update our approved consumers: https://meta.wikimedia.org/wiki/Special:OAuthListConsumers?name=&publisher=Ragesoss&stage=0
[18:11:01] and the queue is typically handled pretty quickly (few days tops I think?)
[18:11:07] ah.
[18:11:09] ragesoss: I used to be able to and now I can't because of my account merge
[18:11:34] legoktm: I can't actually login to yuvipanda anymore can I?
[18:11:52] yuvipanda: no, uppercase
[18:12:03] legoktm: ya but that doesn't have my OAuth rights
[18:12:07] legoktm: do you have oauth rights?
[18:13:21] nope
[18:13:26] yuvipanda: the rights didn't get moved over?
[18:13:40] legoktm: don't think so
[18:13:42] let me find out
[18:14:10] ah
[18:14:10] it did
[18:14:12] neverming
[18:14:14] ragesoss: I can approve
[18:14:35] maybe
[18:14:40] once I figure out the UI
[18:14:47] :)
[18:15:09] admin?
[18:15:35] !ask
[18:15:35] Hi, how can we help you? Just ask your question.
[18:15:40] UA31_: ^
[18:16:11] yuvipanda:Grid problems
[18:16:41] ok
[18:16:43] go on
[18:17:13] legoktm: how do I find my rights?
[18:17:31] yuvipanda: special:preferences says which groups you're in, and special:listgrouprights says what rights those groups have
[18:17:56] legoktm: yup the rights didn't transfer
[18:17:57] When I submit a script with jsart, the script init correctly but in some time, stops suddenly without the d flag
[18:17:59] ragesoss: I can't unfortunately
[18:18:08] Coren: can you help UA31_?
[18:18:18] yuvipanda: on what wiki?
[18:18:23] mw.o?
[18:18:23] legoktm: meta
[18:18:30] legoktm: OAuth approvals moved to meta
[18:18:50] I see.
[18:18:51] one sec
[18:20:19] yuvipanda: fixed
[18:20:23] Hi. A labs instance I run is giving 502: Bad Gateway error. Ideas how to fix this?
[18:20:33] https://grantreview-dev.wmflabs.org
[18:22:11] ragesoss: approved
[18:22:23] Niharika: it means that whatever's running on your instance isn't responding
[18:22:30] so login to the instance, hit localhost and see if it works?
[18:22:32] yuvipanda and legoktm: thanks much!
[18:22:37] yuvipanda: Okay.
[18:27:15] Labs, Labs-Infrastructure, Continuous-Integration-Scaling: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1672061 (hashar) From nodepool debug log, three instances in the 10.68.20.0/24 range are unreachable: ``` Creating server with hostname ci-...
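legoktm's tip at 18:17:31 points at Special:Preferences (groups) and Special:ListGroupRights (rights per group). The same information can be requested from the MediaWiki Action API with `action=query&list=users&usprop=groups|rights`, which returns a user's groups and effective rights in one call. A minimal sketch that only builds the request URL; using the meta.wikimedia.org endpoint is an assumption taken from the remark that OAuth approvals moved to meta, and `user_rights_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def user_rights_url(username: str, api: str = "https://meta.wikimedia.org/w/api.php") -> str:
    """Build an Action API URL that lists a user's groups and effective rights."""
    params = {
        "action": "query",
        "list": "users",
        "ususers": username,
        "usprop": "groups|rights",  # '|' separates multiple values in one API parameter
        "format": "json",
    }
    return api + "?" + urlencode(params)
```

Fetching that URL (for example with `urllib.request.urlopen`) returns JSON whose `query.users` entry carries `groups` and `rights` lists, which is enough to check whether a right such as OAuth approval survived an account merge.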
[18:33:09] Labs, Labs-Infrastructure, Continuous-Integration-Scaling: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1672078 (hashar) Potential instance for testing: ci-jessie-wikimedia-1443110708 10.68.20.123 I can ping it from labs bastion but not from...
[18:40:10] yuvipanda: I am not able to ssh in. I get "channel 0: open failed: connect failed: No route to host
[18:40:10] ssh_exchange_identification: Connection closed by remote host" - I'm pretty sure my keys are fine, I'm able to ssh in to other instances except this one.
[18:40:44] ow
[18:40:48] what instance
[18:40:55] https://grantreview-dev.wmflabs.org/
[18:43:34] Niharika: that's the proxy address :)
[18:43:38] Niharika: what's the name of the instance?
[18:43:51] that you created and are trying to ssh to?
[18:44:11] yuvipanda: grantreview-dev.eqiad.wmflabs is what I am trying to ssh into.
[18:44:16] ok
[18:45:36] ok it's in 'Paused' state
[18:45:45] andrewbogott: ^ any idea why an instance would go into a paused state by itself?
[18:46:24] yuvipanda: no. Maybe a side-effect of the virt host being troubled...
[18:46:27] what host is it on?
[18:46:32] It hasn't been used in about six months. Maybe that's why?
[18:46:40] 1007?
[18:46:44] Niharika: nah shouldn't affect it
[18:46:46] andrewbogott: 1007
[18:47:18] looks fine to me
[18:47:22] Shall I resume it?
[18:47:27] Yes, please!
[18:56:38] andrewbogott: did you resume it?
[18:56:52] yuvipanda: not yet; something is odd
[18:56:58] ok
[18:57:09] they were paused outside of nova, like libvirt spontaneously decided to pause a bunch of things
[18:59:59] Niharika: is that better?
[19:01:11] hi yuvipanda. I updated the etherpad, https://etherpad.wikimedia.org/p/email-tools, please review and once you are happy and made all the changes you see fit, I'll email Manprit.
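The 502 advice at 18:22:30 ("login to the instance, hit localhost and see if it works") can be scripted with the standard library. A minimal sketch, assuming the web service on the instance listens on a known local port; the port and the helper name `probe_local` are hypothetical. A status code means the backend is answering locally, while `None` means nothing answered, which is consistent with the proxy returning 502. In this case the instance turned out to be paused, so even SSH failed.

```python
import http.client
from typing import Optional


def probe_local(port: int, path: str = "/", timeout: float = 5.0) -> Optional[int]:
    """Send GET to 127.0.0.1:port and return the HTTP status, or None if nothing answers."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except OSError:
        # Connection refused, reset or timed out: the backend behind the proxy is not serving.
        return None
    finally:
        conn.close()
```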
[19:01:46] yuvipanda: there were 15 paused instances on 1007, and none anywhere else
[19:01:56] I’m going to resume them, and we should watch out to see if that happens again
[19:01:56] leila: +1
[19:02:01] andrewbogott: ugh. yeah, ok
[19:02:09] andrewbogott: should we write an icinga check for them?
[19:02:22] not unless it happens again
[19:02:40] andrewbogott: ok
[19:02:52] andrewbogott: I'll file a bug anyway
[19:02:54] andrewbogott: Much better. Thank you.
[19:02:58] thanks
[19:03:19] Labs, Labs-Infrastructure, Continuous-Integration-Scaling: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1672156 (hashar) I think somewhere you (or I) have a netmask of 10.68.16.0/22 when it should be 10.68.16.0/21 I can't see a...
[19:03:42] Labs: Instances spontaneously suspended - https://phabricator.wikimedia.org/T113646#1672157 (yuvipanda) NEW
[19:04:46] leila: it lgtm :D
[19:20:00] Labs, Labs-Infrastructure: Make sure nova is re-using old private IPs - https://phabricator.wikimedia.org/T113648#1672224 (Andrew) NEW
[19:20:31] Labs, Labs-Infrastructure: Make sure nova is re-using old private IPs - https://phabricator.wikimedia.org/T113648#1672232 (Andrew) a:Andrew
[19:21:11] PROBLEM - Puppet failure on tools-checker-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:21:40] Coren:When I submit a script with jsart, the script init correctly but in some time, stops suddenly without the d flag
[19:24:47] I'm getting publickey issues trying to ssh into a new instance. I looked at the console output, and it did get through the ssh key stuff, but after that there's a lot of errors like 'could not request certificate'
[19:31:13] ragesoss: hmm, what instance is this again?
[19:31:27] ragesoss: also did you just delete an instance with the same name just before this?
[19:31:34] yuvipanda: yes.
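Yuvi's question at 19:02:09 about an Icinga check for spontaneously paused instances could be answered with a small plugin. A minimal sketch, assuming it is fed the text output of `virsh list --all` on the virt host and follows the usual Nagios/Icinga plugin exit codes (0 = OK, 2 = CRITICAL); the function names and the sample domain names are illustrative, not something the Labs team deployed.

```python
import re
from typing import List, Tuple

OK, CRITICAL = 0, 2  # Nagios/Icinga plugin exit codes

# One data row of `virsh list --all`: Id, Name, State (State may be two words, e.g. "shut off").
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(.+?)\s*$")


def paused_domains(virsh_output: str) -> List[str]:
    """Return the names of domains whose state column reads 'paused'."""
    names = []
    for line in virsh_output.splitlines():
        match = ROW.match(line)
        # The dashed separator has no whitespace, so it never matches; skip the header row.
        if not match or match.group(1) == "Id":
            continue
        if match.group(3) == "paused":
            names.append(match.group(2))
    return names


def check_paused(virsh_output: str) -> Tuple[int, str]:
    """Build an (exit code, status line) pair in the style of an Icinga check."""
    paused = paused_domains(virsh_output)
    if paused:
        return CRITICAL, f"CRITICAL: {len(paused)} paused instance(s): {', '.join(paused)}"
    return OK, "OK: no paused instances"
```

Wiring it up as a real plugin would mean capturing `virsh list --all` with `subprocess.run`, printing the status line and calling `sys.exit` with the code; that part is left out here.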
[19:31:44] wikiedu-backups.globaleducation
[19:32:09] I guess that was a bad idea, deleting and then recreating with the same name?
[19:32:46] ragesoss: if you wait >5mins it's ok
[19:32:53] otherwise cache and stuff fucks things up :(
[19:33:11] okay. so, delete again, wait more than 5 mins, and then try once more?
[19:34:46] ragesoss: yeah should work
[19:35:06] okay, thanks
[19:37:35] one more thing, if you don't mind... for some reason, even though I have the exact same ssh config on my laptop and my desktop (following the examples on wikitech), on my desktop I always get 'Could not resolve hostname outreachdashboard.globaleducation.eqiad.wmflabs', even though it works properly on my laptop.
[19:37:43] Labs, Labs-Infrastructure, Continuous-Integration-Scaling: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1672262 (Andrew) seems to be fixed by Chase's updating of the router config from /22 to /21.
[19:37:46] any idea on what I've misconfigured?
[19:39:55] Labs, Labs-Infrastructure, Continuous-Integration-Scaling: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1672267 (Andrew) Open>Resolved a:Andrew
[19:55:59] ragesoss: can you pastebin your config?
[20:00:10] yuvipanda: http://pastebin.com/7QBnxBLt
[20:00:53] yuvipanda: and the -vvv output: http://pastebin.com/tK6xcKkd
[20:05:00] ragesoss: couple of ancillary points first - get rid of all bastions outside of bastion.wmflabs.org :)
[20:05:29] none of them work anymore outside of that one
[20:05:34] ragesoss: are they running same version of ssh?
[20:06:02] no. the working one is Ubuntu.
[20:06:14] the non working one?
[20:06:20] can you paste output of -vvv from both?
[20:06:23] Debian
[20:06:46] OpenSSH_6.9p1 Debian-2, OpenSSL 1.0.2d 9 Jul 2015
[20:07:43] (newer than what ubuntu provides)
[20:08:41] ragesoss: can you ssh to bastion.wmflabs.org from the non-working one?
[20:09:17] yuvipanda: yes.
[20:14:01] UA31_: The most common case is running out your memory allocation. What does qacct tell you is the cause of exit?
[20:16:27] doesn't it say "bastion1.eqiad.wmflabs" yuvipanda
[20:16:47] dunno if relevant
[20:17:09] hey yalls, i don't seem to be able to create a new instance in deployment-prep
[20:17:19] it creates, but seems misconfigured
[20:17:24] doesn't have a proper puppetmaster set
[20:17:40] chasemp: ya but that's also 'aliased' to bastion.wmflabs.org in another stanza
[20:17:48] bastion1.eqiad.wmflabs hasn't worked for a few months now
[20:17:56] 2015-09-24T20:16:14.037032+00:00 deployment-conf02 puppet-agent[1291]: Could not request certificate: Connection refused - connect(2) for "" port 8140
[20:17:56] right, well I thought I had something there
[20:18:20] maybe the aliasing is bad
[20:18:20] idk
[20:18:49] ottomata: did you just delete an instance and re-create it soon after?
[20:19:17] yes, but no.
[20:19:23] i created one yesterday, and it did the same thing
[20:19:35] so, i deleted it, then created a new one with a different name
[20:19:55] so that error message is consistent with 'goddamn fucking DNS cache aaaaarrrggh'
[20:20:03] haha, eh?
[20:20:04] which happens if you recreate instances too quickly
[20:20:08] no, it has a differen tname
[20:20:13] and yesterday it did the same thingm
[20:20:14] right so am not sure what's going on
[20:20:20] andrewbogott: ^ ?
[20:20:31] it also just has the wrong value set for puppetmaster
[20:23:00] Coren: 100 : assumedly after job
[20:24:13] exit_status 137
[20:24:45] *130
[20:24:59] yuvipanda: ?
[20:25:40] ottomata: can you try creating *another* instance?
[20:25:43] and see what that says?
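UA31_ reports `exit_status 137` (then 130) from qacct at 20:24. Under the common Unix convention that a status above 128 means "killed by signal N", with N being the status minus 128, 137 is SIGKILL (9), which fits Coren's explanation of a job killed for exceeding its memory allocation, and 130 is SIGINT (2). A minimal sketch of that decoding; that qacct's `exit_status` follows the 128 + N convention here is an assumption, and `describe_exit_status` is a hypothetical helper.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status, where values above 128 mean 128 + signal number."""
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name} ({signum})"
        except ValueError:
            # The number does not correspond to a signal known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with code {status}"
```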
[20:26:01] ottomata: andrewbogott is the one who usually futzes with DNS and stuff so was just waiting for him to come by
[20:26:18] k
[20:26:33] deployment-conf03 comin in
[20:29:07] MediaWiki-extensions-OpenStackManager: OpenStackManager fails the mwext-testextension-zend test - https://phabricator.wikimedia.org/T113655#1672458 (Paladox) NEW
[20:29:45] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Infrastructure: OpenStackManager fails the mwext-testextension-zend test - https://phabricator.wikimedia.org/T113655#1672467 (Paladox)
[20:35:57] yuvipanda: ottomata: new instances were broken for a bit. They should be working now.
[20:36:19] andrewbogott: what had happened?
[20:36:46] The long story is: Some routers were misconfigured with a /22 subnet which meant we were only actually set up for half as many IPs as intended. Yesterday we hit the lucky number of consuming 50% of private IPs
[20:36:53] Chase fixed the router configs and all should be well now
[20:37:50] (This was, like, half an hour ago that Hashar noticed and chase fixed.)
[20:37:58] ah, ouch
[20:37:59] ok
[20:38:29] ah ok so this one...came up
[20:38:30] ok thanks!
[20:38:49] sorry I would ahve noted ottomata and yuvipanda but I couldn't figure how a bad puppetmaster setting woudl relate
[20:39:29] no network = failed dhcp = now puppet cert (which is basically the first networky thing a new instance does)
[20:39:40] *no puppet cert
[20:40:13] it was the noise not the instrument, so noted dude thanks
[20:40:39] tahnks yall!
[20:40:44] time to airport, laters!
[20:44:34] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Infrastructure, Patch-For-Review: OpenStackManager fails the mwext-testextension-zend test - https://phabricator.wikimedia.org/T113655#1672556 (hashar) They have been renamed in MediaWiki core since MediaWiki 1.24.0 by https://gerrit.wikime...
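The routing fault andrewbogott describes at 20:36:46 (and hashar's netmask observation at 19:03:19) can be checked with Python's standard `ipaddress` module. 10.68.16.0/22 spans 1,024 addresses (10.68.16.0 to 10.68.19.255), so an instance such as 10.68.20.123 from T113623 falls outside it; 10.68.16.0/21 spans 2,048 addresses and does include it. That factor of two is the "half as many IPs as intended", and it explains why trouble began once allocations reached the upper half of the range.

```python
import ipaddress

configured = ipaddress.ip_network("10.68.16.0/22")  # the mistaken router config
intended = ipaddress.ip_network("10.68.16.0/21")    # the corrected config
instance = ipaddress.ip_address("10.68.20.123")     # unreachable CI instance from T113623

print(configured.num_addresses, configured[-1])      # 1024 10.68.19.255
print(intended.num_addresses, intended[-1])          # 2048 10.68.23.255
print(instance in configured, instance in intended)  # False True
```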
[20:49:30] MediaWiki-extensions-OpenStackManager, Continuous-Integration-Infrastructure, Patch-For-Review: OpenStackManager fails the mwext-testextension-zend test - https://phabricator.wikimedia.org/T113655#1672572 (Paladox) Oh I have copied the tree images that are required and added them to the extension dire...
[20:52:41] Coren:Solved
[21:01:28] Why I can't use fork in grid?
[21:18:15] yuvipanda:How I can use fork in grid?
[21:21:02] I do not understand the question?
[21:24:00] When I execute the script with fork, it fails and delete from grid, but if I disable fork, works
[21:27:10] I do not think the fork is the problem. you are probably running out of memory
[21:27:20] you can get more memory by passing the -mem option to jsub / jstart
[21:27:24] try -mem 1G
[21:30:21] Fork 10 fails with 1g but works with 5
[21:30:46] You should audit your process and make it not use that.much memory then :)
[21:31:15] Krenair: andrewbogott do you know which database has the wikitech db?
[21:31:27] yeah
[21:31:31] Yuvi|Panda: it’s hosted locally on silver
[21:31:32] labswiki
[21:31:34] ah ok
[21:31:34] on silver
[21:34:42] jesus, I have forgotten how to write JOINs
[21:37:28] Yuvi|Panda: you're too young for that man
[21:37:30] :)
[21:40:10] now I know!
[21:40:11] Krenair: how do I output the result of a query in the 'sql' tool onto a file?
[21:40:11] >
[21:40:11] I mean
[21:40:12] I can't seem to do 'sql '
[21:40:12] oh god
[21:40:12] nevermind me
[21:40:12] you need < ;)
[21:40:13] sql xxwiki < in.sql > out.tsv
[21:44:14] yuvipanda: fyi, forking /is/ a problem because it can cause orphan processes
[21:44:40] so the jobs might actually be running, but unsupervised...
[21:45:29] valhallasw`cloud: yes but dying suggested that there wasn't enough memory for whatever was going on
[21:45:55] legoktm: I think the cat laying on me has made me very dumb today
[21:46:54] yuvipanda: or it was a process started as daemon process...
[21:47:26] Could be. Without more info...
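valhallasw's point at 21:44:14 is that forked children which are never waited for can outlive their parent as unsupervised orphans. A minimal POSIX-only sketch (the helper `run_forked` is hypothetical, not a Tool Labs API) that forks one child per task and reaps every child before returning. It does not help if the parent itself is killed, for example by the memory limit, and every child is an extra process that also uses memory. That may explain why 10 forks failed under `-mem 1g` while 5 worked, though the log does not confirm it.

```python
import os
from typing import Callable, List


def run_forked(tasks: List[Callable[[], object]]) -> List[int]:
    """Run each task in its own forked child and reap every child before returning.

    Returns the children's exit codes in task order (0 = success, 1 = the task raised).
    POSIX only: os.fork() is not available on Windows.
    """
    pids = []
    for task in tasks:
        pid = os.fork()
        if pid == 0:
            # Child process: run the task, then leave via os._exit so the child never
            # returns into the parent's code or runs the parent's cleanup handlers.
            code = 0
            try:
                task()
            except BaseException:
                code = 1
            finally:
                os._exit(code)
        pids.append(pid)

    # Parent process: wait for every child so none is left running unsupervised.
    exit_codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        exit_codes.append(os.waitstatus_to_exitcode(status))
    return exit_codes
```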
[21:47:28] Labs, Tool-Labs, Design Research Backlog, Learning-and-Evaluation, and 2 others: Organize a (annual?) toollabs survey - https://phabricator.wikimedia.org/T95155#1672800 (leila)
[21:47:40] Yuvi|Panda: you are forgiven IF you post pics in #wikimedia-kawaii
[21:47:50] nothing running on tools-exec-12*, though, so probably fine.
[21:48:01] Labs, Tool-Labs, Design Research Backlog, Learning-and-Evaluation, and 2 others: Organize a (annual?) toollabs survey - https://phabricator.wikimedia.org/T95155#1181554 (leila)
[21:48:37] anyway, bed.
[22:08:49] might dropping core files in public_html be a possible password leak?
[22:10:47] 'core files'?
[22:11:39] memory files after a crash
[22:15:25] ok, i could find my mysql password in it
[22:17:02] that's nothing i necessarily want to be exposed to the public
[22:18:18] Coren: ^ if a (my?) web tool crashes it leaves core dumps in public_html (with sensitive (?) data in it); does that constitute a problem?
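On the core-dump question at 22:08:49: a core file is a copy of the crashed process's memory, so any credential the tool had loaded (here, the MySQL password) can end up in it. With the default kernel `core_pattern`, the file is written to the process's working directory, which here appears to have been public_html, where the web server can serve it. One mitigation, offered as an assumption rather than Tool Labs policy, is for the tool to disable core dumps for itself early at startup. Resource limits are inherited by child processes, and the shell equivalent in a launcher script is `ulimit -c 0`. Neither removes dumps that already exist.

```python
import resource


def disable_core_dumps() -> None:
    """Set the soft and hard core-file size limits to 0 so a crash writes no core file.

    Lowering the hard limit needs no privileges, and child processes inherit it.
    """
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


if __name__ == "__main__":
    disable_core_dumps()
    print(resource.getrlimit(resource.RLIMIT_CORE))  # (0, 0)
```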