[05:06:14] Change on www.mediawiki.org a page OAuth/For Developers was modified, changed by BDavis (WMF) link https://www.mediawiki.org/w/index.php?diff=1867249 edit summary: Change registration and consumer lists to point at canonical data on meta (T59336)
[05:13:18] Change on www.mediawiki.org a page OAuth/For Developers was modified, changed by BDavis (WMF) link https://www.mediawiki.org/w/index.php?diff=1867256 edit summary: Fixup MediaWiki-Vagrant instructions
[09:14:52] !log tools.heritage Updated ~/pywikibot to latest version, but still getting a FamilyMaintenanceWarning
[09:14:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL, Master
[09:29:16] Did the file system just die on us?
[09:37:06] https://phabricator.wikimedia.org/T110827
[09:37:06] Awww fuck
[09:37:06] * multichill pokes YuviPanda, Coren & andrewbogott
[09:37:06] Good whatever timezone you are YuviPanda
[09:37:07] Sorry for the rude awakening
[09:37:07] multichill: nah, got a page
[09:37:07] paravoid: around?
[09:37:17] Labs, Tool-Labs: No file system on toollabs, unable to login - https://phabricator.wikimedia.org/T110827#1587427 (Multichill) NEW
[09:37:45] YuviPanda: Filed https://phabricator.wikimedia.org/T110827#1587427 in case you didn't notice
[09:37:52] multichill: thanks
[09:37:55] multichill: kernel bug lockup
[09:38:25] I have some interactive mysql sessions. These don't seem to use the file system so those still work :P
[09:38:38] heh
[09:38:39] ok
[09:40:07] Kernel bug lockup in the vm or in the hypervisor?
[09:40:38] multichill: in the labs NFS host
[09:44:05] Fun.
YuviPanda, maybe we should just buy a Netapp to get this shit over with :P
[09:47:54] Labs, Tool-Labs: No file system on toollabs, unable to login - https://phabricator.wikimedia.org/T110827#1587460 (Multichill) Yuvi is working on this
[09:51:56] Labs, Tool-Labs: No file system on toollabs, unable to login - https://phabricator.wikimedia.org/T110827#1587476 (yuvipanda) The NFS server is down due to some kernel issues, we're working on it.
[10:27:22] * matanya came to ask what happened to tools, but now he knows
[10:35:04] YuviPanda: Dude, don't start talking about backups, that scares me. The last time we had to restore was no fun :P Anyway, I hope you're able to fix it.
[10:35:13] something up with labs?
[10:35:16] multichill: heh :)
[10:35:27] multichill: yeah, me too. but that was just a gutcheck to make sure :)
[10:35:33] multichill: godo.g is also helping atm
[10:35:46] * sDrewthedoff is logged in, then kicked out
[10:35:55] sDrewthedoff: NFS died, see https://phabricator.wikimedia.org/T110827
[10:36:14] ok, probably worth putting it into the channel
[10:36:23] GMTA
[10:36:38] sDrewthedoff: I guess you didn't read the topic? :P
[10:36:52] sDrewthedoff: it was in the topic before ('toollabs problems')
[10:36:58] which is probably why the bot died
[10:37:26] let me see if wikibugs and grrrit are still on the webserver
[10:37:38] YuviPanda: oh, we should do the error page thing :P
[10:37:50] let's see how thta worked again
[10:37:55] heh
[10:38:00] I dunno if puppet'll run
[10:38:08] I'll take a look at it
[10:38:17] then try to get grrit/wikibugs online from a secondary host
[10:41:41] YuviPanda: gah, tools-webproxy-01 is not taking my root key
[10:41:57] can you unmount /home there?
[10:42:21] valhallasw`cloud: done
[10:48:55] gaah, of course the active one is -02 >_<
[10:49:15] valhallasw`cloud: hah.
let me do that too
[10:49:32] although somehow -01 is used a lot internally
[10:49:55] valhallasw`cloud: done
[10:50:01] thans
[10:50:11] valhallasw`cloud: yw!
[10:50:20] so the puppet trick indeed doesn't work, the nginx manifest needs /data/project
[10:50:32] but luckily I added comments on how to do it manually :P
[10:51:26] https://tools.wmflabs.org/ <-- there we go!
[10:53:37] valhallasw`cloud: <3
[10:53:45] !log tools Set error page on tools webserver via Hiera + some manual hacking (https://wikitech.wikimedia.org/wiki/Hiera:Tools)
[10:54:11] valhallasw`cloud: https://stashbot.wmflabs.org/#/dashboard/elasticsearch/default
[10:54:27] that still works
[10:54:30] even if morebots doesn'
[10:54:31] t
[10:55:20] !log tools restarted grrrit-wm from tools-webproxy-01
[10:57:23] !log tools started wkibugs from tools-webproxy-01 as well, still need to check if the phab<->redis part is still alive
[10:59:26] huh.
[10:59:31] for some reason redis is also not working
[11:00:04] valhallasw`cloud: let me unmount home there as well
[11:00:09] done
[11:01:11] Wikibugs: wikibugs test bug part II - https://phabricator.wikimedia.org/T90594#1587637 (valhallasw) Dumdiedum.
[11:01:17] ah, thre we go.
[11:01:32] Labs, Tool-Labs: No file system on toollabs, unable to login, web service broken - https://phabricator.wikimedia.org/T110827#1587638 (doctaxon) I wonder why you haven't a redundant NFS server system not yet.
[11:15:43] hmm, I seem to be having an issue sshing to tools-login ....
[11:15:56] yep, subject
[11:16:02] ;_;
[11:16:19] thats sad, I thought it was something wrong my end, as I left on building and it was working fine, and got to another and nothing would connect :D
[11:16:20] <- same blindness
[12:53:54] looks like Yuvi is working on something though :)
[12:53:55] I ma
[12:53:55] yes
[12:53:56] *am
[12:53:56] D
[12:53:57] Labs, Tool-Labs, user-notice: No file system on toollabs, unable to login, web service broken - https://phabricator.wikimedia.org/T110827#1587649 (Josve05a)
[12:53:57] :D
[12:53:58] labs dead?
[12:53:58] oh, bug.
[12:54:00] YuviPanda: I replied to your email to the announce list with the phab id, but it's in moderation now
[12:54:01] multichill: thanks
[12:54:04] any idea when systems will be back
[12:54:05] ??
[12:54:05] GerardM-: no ETA yet, no.
[12:54:06] YuviPanda: Could you please accept the message in moderation? ;-)
[12:54:06] multichill: I did!
[12:54:07] I thought I did at least
[12:54:07] hmm, don't see it on https://lists.wikimedia.org/pipermail/labs-l/2015-August/thread.html
[12:54:08] Right, it's at https://lists.wikimedia.org/pipermail/labs-announce/2015-August/000068.html :-)
[12:54:08] Oh, it might still be stuck in moderation in labs-l?
[12:54:08] :D
[12:54:09] Implicit destination umbrella list crap
[12:54:09] done
[12:54:09] YuviPanda: For later, there is some hidden mailman setting you have to do on labs-l so it sees labs-announce as being an acceptable alias
[12:54:09] multichill: I wouldn't say it is hidden :)
[12:54:09] It's mailman, everything is hiding in plain sight
[12:54:10] JohnFLewis: can you turn it on? :)
[12:54:10] YuviPanda: Nope because you guys aren't letting me have the site password ;) but https://lists.wikimedia.org/mailman/admin/labs-l/?VARHELP=privacy/recipient/acceptable_aliases is it
[12:55:57] Why is http://bots.wmflabs.org/~wm-bot giving me a 403?
[12:56:32] multichill: we recovered, btw.
coming bac up [12:56:57] slowly [13:01:03] !log tools rebooted tools-bastion-01 to see if that remounts NFS [13:03:36] YuviPanda: Crap, I logged in again and it created a new homedir for me [13:03:50] multichill: yeah, 'tis ok, NFS will come back soon [13:04:00] you aren't the only one [13:04:13] And some reason, not having a homedir means that the motd suddenly works ;-) [13:04:47] multichill: heh :) [13:04:53] multichill: I've disabled ssh for non root login now [13:20:16] godog: valhallasw`cloud can help verify things on tools [13:20:17] !log disabled puppet on tools-bastion-01 and restricted it to root only [13:20:17] YuviPanda: how do we revert root only on bastion? [13:20:17] godog: enable puppet [13:20:17] YuviPanda: I don't have root access [13:20:18] valhallasw`cloud: godog does! alternative is to switch DNS to point tools-login to tools-bastion-02 [13:20:18] YuviPanda: and I can only help verify if I know what to verify [13:20:18] ok scratch and home are back on tools-bastion [13:20:18] woo [13:20:18] wooo [13:20:19] i enabled puppet [13:20:19] :> [13:20:19] login on tools-bastion-02 hangs, so NFS is definitely not back there :-p [13:20:21] disabled is not a valid project. [13:20:27] heh [13:20:30] things are coming back up! [13:20:39] !log tools disabling 503 error page [13:20:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [13:20:46] valhallasw`cloud: login enabled on -bastion-01 [13:20:50] * godog will name his next project 'disabled' [13:20:57] hehe [13:21:03] epic work guys :D [13:21:07] all to godog [13:21:13] oh, bastion-02 is also back online now [13:21:17] woooo! [13:21:18] nfs fixing itself like magic [13:21:20] sweet [13:21:20] multichill: ^ [13:21:21] :> [13:21:26] valhallasw`cloud: can you email labs-l? [13:21:27] (well, mostly yu two fixing stuff) [13:21:28] left is tools ? [13:21:36] +49 1516 6026630 is my phone number, tw [13:21:44] in case I ned to be reached in the next... 
6h
[13:21:54] godog: left?
[13:21:57] YuviPanda: I can login again and this time I get a real homedir :-D
[13:21:59] http://tools.wmflabs.org/?status <
[13:22:02] whee
[13:22:13] valhallasw`cloud: err, 'left to recover'
[13:22:26] godog: can you email ops@?
[13:22:29] YuviPanda: oh, you ran a puppet agent -tv already? :P
[13:22:35] or was this a lucky cron timing
[13:22:53] valhallasw`cloud: I ran one yeah
[13:22:56] k
[13:23:04] YuviPanda: sure, has everything recovered?
[13:23:18] !log tools killed wikibugs-backup and grrrit-wm on tools-webproxy-01
[13:23:22] looks like
[13:23:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[13:23:30] multichill: can you check if maps is back?
[13:23:30] Yuvipanda, I'm here but across town from my laptop. Reading back scroll...
[13:23:42] I take it things are still broken?
[13:23:43] andrewbogott: haha, you missed all the fun
[13:23:56] YuviPanda: Maps? I wouldn't know where to look
[13:24:08] Really? Great!
[13:24:08] !log lolrrrit-wm force-restarting grrrit-wm
[13:24:08] lolrrrit-wm is not a valid project.
[13:24:16] !log lolrrit-wm force-restarting grrrit-wm
[13:24:17] lolrrit-wm is not a valid project.
[13:24:23] !log tools.lolrrit-wm force-restarting grrrit-wm
[13:24:25] :>
[13:24:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lolrrit-wm/SAL, Master
[13:24:29] I checked some of the Magnus toys and all seem to be working again
[13:24:51] Wikibugs: wikibugs test bug part II - https://phabricator.wikimedia.org/T90594#1587759 (valhallasw) test
[13:24:57] ok, wikibugs is also alive
[13:25:03] Labs, operations: labstore1002 not mounting all LVs after reboot - https://phabricator.wikimedia.org/T110832#1587760 (fgiunchedi) assuming the missing drives are on purpose, activating the lv worked with `lvchange -ay labstore/others` and `lvchange -ay labstore/tools` (and nuking some 'others' snapshots so...
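Editor's note on the T110832 message above: the reported fix was to activate the logical volumes by hand with `lvchange -ay`. As a rough sketch only (not what was actually run on labstore1002), the Python below parses text in the shape of `lvs --noheadings -o vg_name,lv_name,lv_attr` output and lists an `lvchange -ay` command for every volume whose state flag is not active. The sample text, the `maps` volume and both function names are hypothetical; the sketch relies on the fifth character of `lv_attr` being `a` for an active volume.

```python
def inactive_volumes(lvs_output):
    """Return (vg, lv) pairs whose lv_attr state flag (5th character) is not 'a'."""
    result = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        vg, lv, attr = fields
        if len(attr) >= 5 and attr[4] != "a":
            result.append((vg, lv))
    return result


def activation_commands(lvs_output):
    """Build an `lvchange -ay VG/LV` command string for each inactive volume."""
    return [f"lvchange -ay {vg}/{lv}" for vg, lv in inactive_volumes(lvs_output)]


# Hypothetical sample; real attribute strings on the server will differ.
SAMPLE = """
  labstore tools  -wi-------
  labstore others -wi-------
  labstore maps   -wi-ao----
"""

if __name__ == "__main__":
    for command in activation_commands(SAMPLE):
        print(command)
```

The sketch only builds command strings instead of running them, so the list can be reviewed before anything touches the storage host.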
[13:26:00] Labs, Tool-Labs, user-notice: No file system on toollabs, unable to login, web service broken - https://phabricator.wikimedia.org/T110827#1587763 (valhallasw) The NFS server and tool labs are back online.
[13:27:07] valhallasw`cloud: andrewbogott I'm gonna lose internet now
[13:27:17] YuviPanda: ok! thanks for your work!
[13:27:27] YuviPanda: Good timing!
[13:27:42] I left my phone number
[13:27:44] On channel
[13:27:46] And keep up the good work :-)
[13:27:49] valhallasw`cloud: cool, I take it things are recovering?
[13:27:49] Won't have internet
[13:27:51] Do call
[13:27:57] And all the love to godog
[13:27:59] godog: I think so, yes
[13:28:04] webservices are on line
[13:28:10] sge is (see wikibugs and grrrit-wm)
[13:28:18] Yuvipanda nothing needed immediately right?
[13:28:24] Safe travels!
[13:28:45] andrewbogott: yeah I think we should be ~fine now
[13:28:49] godog: Already have an idea about the root cause?
[13:29:14] multichill: for this morning's lockup of labstore1002, no not yet
[13:29:30] That's a bit scary
[13:30:07] YuviPanda: mail sent
[13:30:17] Labs, operations: labstore1002 not mounting all LVs after reboot - https://phabricator.wikimedia.org/T110832#1587764 (Aklapper) p:Triage>Unbreak!
[13:30:22] I agree, most of the time has been spent trying to exclude hw problems and bring things back up
[13:31:28] there's a few SGE emails coming in about jobs failing to start in the previous hours, but that's expected (cronjobs not running)
[13:37:13] no, there's still stuff broken
[13:37:31] or is it just gmail not being clear about email times
[13:37:46] valhallasw`cloud: what's broken?
[13:37:53] I
[13:38:04] I'm getting cron emails about failed job submissions
[13:38:13] on my phone the timestamp was 'now'
[13:38:21] but in my browser it's '30 minutes ago'
[13:38:41] so it's probably fine
[13:38:50] kk
[13:41:41] valhallasw`cloud: It's GMAIL
[13:41:46] I'm getting that crap too
[13:41:46] :D
[13:41:57] All timestamped earlier this afternoon
[13:42:05] I'm now at 13:10
[13:42:38] You could check the number of queued messages and just flush it to get it over with
[13:43:04] Owh, another batch, now we seem to be complete
[13:44:18] still 600 mails to go :-p
[13:45:31] kk, I have to run now but looks like we're back
[13:45:33] http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1440942295.355&target=tools.tools-mail.exim.queue.size
[13:45:52] ...we had a million e-mails queued? O_o
[13:46:03] Labs, Tool-Labs, user-notice: No file system on toollabs, unable to login, web service broken - https://phabricator.wikimedia.org/T110827#1587769 (Multichill) Tool operators are getting a lot of failed jobs emails. All because of the outage. Emails should all have a timestamp that falls in the outage p...
[13:46:28] oh, no, a megabyte of emails
[13:51:08] oh, of course it couldn't send emails because it has to read ~/.forward which is on NFS. Right.
[14:22:13] valhallasw`cloud: Mail queue seems to be zero again.
[14:38:35] !log tools.heritage Made local change to unused_images.py to get it to work, see https://phabricator.wikimedia.org/T110829
[14:38:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL, Master
[16:20:40] Labs, Tool-Labs: nginx puppet manifest requires nfs so error page cannot be updated over puppet - https://phabricator.wikimedia.org/T110836#1587876 (scfc) At the moment, we include the "local" NFS Debian repository everywhere. 1. I believe for the proxy we nowadays only need the package `lua-json_1.3.2-1...
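Editor's note on the failed-job mail flood above: Gmail showed 'now' on one client and '30 minutes ago' on another, so one way to tell delayed outage mail from fresh failures is to compare each message's `Date` header with the outage window. The following is a minimal sketch under assumptions: the window (09:29 to 13:21 on 30 August 2015) is read off this log's timestamps, which are assumed here to be UTC, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed outage window, taken from the timestamps in this log (assumed UTC).
OUTAGE_START = datetime(2015, 8, 30, 9, 29, tzinfo=timezone.utc)
OUTAGE_END = datetime(2015, 8, 30, 13, 21, tzinfo=timezone.utc)


def sent_during_outage(date_header):
    """True if an RFC 2822 Date header falls inside the assumed outage window."""
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return OUTAGE_START <= sent <= OUTAGE_END


if __name__ == "__main__":
    print(sent_during_outage("Sun, 30 Aug 2015 13:10:00 +0000"))
    print(sent_during_outage("Sun, 30 Aug 2015 14:05:00 +0000"))
```

Comparing timezone-aware datetimes keeps the check independent of whatever local time the mail client chooses to display.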
[16:34:47] Labs, operations: labstore1002 not mounting all LVs after reboot - https://phabricator.wikimedia.org/T110832#1587886 (jeremyb-phone)
[18:16:51] Cyberpower678: you can't login *where*?
[18:16:57] tools-login?
[18:17:03] Yea
[18:17:29] valhallasw`cloud, no it's working
[18:19:57] Cyberpower678: "Aug 30 17:09:48 tools-bastion-01 sshd[32059]: Invalid user cyberpowr678 from "
[18:21:50] Cyberpower678: that's the only connection request I see from your IP today, and it's being denied because of a typo in the user name
[18:22:06] (apart for the succesful one just a few minutes ago)
[18:24:26] Oh.
[18:24:29] My bad.
[18:24:50] can you send an email your issue is resolved? thanks :-)
[18:39:25] valhallasw`cloud: back in internetland now. Thanks for dealing with emails and stuff
[18:39:28] <3
[20:24:53] Labs, Tool-Labs: nginx puppet manifest requires nfs so error page cannot be updated over puppet - https://phabricator.wikimedia.org/T110836#1588005 (valhallasw)
[20:35:24] Labs, Tool-Labs: nginx puppet manifest requires nfs so error page cannot be updated over puppet - https://phabricator.wikimedia.org/T110836#1588008 (valhallasw) Yes, I think you're right the root cause is the labsdebrepo. Specifically, I think the dependency chain is ``` Notice: /Stage[main]/Dynamicproxy...
[21:50:10] Labs, Tool-Labs: nginx puppet manifest requires nfs so error page cannot be updated over puppet - https://phabricator.wikimedia.org/T110836#1588095 (scfc) I think the `require` in: ``` class { 'nginx': variant => 'extras', require => Class['misc::labsdebrepo'], } ``` is an artifact when the prox...
[22:18:16] I use AWB and in X!'s tools I can't see my (Semi-)automated edits. Why?
[22:21:08] aram1985: Cyberpower678 is the xtools maintainer, maybe he knows?
They also have a mailing list I think
[22:22:50] * aram1985 slaps Cyberpower678 around a bit with a large fishbot
[22:37:28] * Cyberpower678 incinerates aram1985
[23:27:42] Change on www.mediawiki.org a page Talk:Developer access was modified, changed by 58.187.166.10 link https://www.mediawiki.org/w/index.php?diff=1868296 edit summary: [-980] Sua loi chinh ta
[23:29:49] Change on www.mediawiki.org a page Talk:Developer access was modified, changed by 58.187.166.10 link https://www.mediawiki.org/w/index.php?diff=1868297 edit summary: [-1388] Sua loi chinh ta
[23:31:40] Change on www.mediawiki.org a page Talk:Developer access was modified, changed by 58.187.166.10 link https://www.mediawiki.org/w/index.php?diff=1868302 edit summary: [-1332] Sua loi ching ta
[23:32:29] Change on www.mediawiki.org a page Talk:Developer access was modified, changed by Krenair link https://www.mediawiki.org/w/index.php?diff=1868303 edit summary: [+3700] Reverted edits by [[Special:Contributions/58.187.166.10|58.187.166.10]] ([[User talk:58.187.166.10|talk]]) to last revision by [[User:Nemo bis|Nemo bis]]