[03:00:36] <spagewmf>	 when was Toolserver decommissioned?  I'm updating https://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Migration_of_Toolserver_tools
[03:01:56] <Krenair>	 you mean when were the servers physically powered off and removed from the racks?
[03:03:22] <Krenair>	 or do you really only care about when ordinary users stopped being able to access it? IIRC these were different days
[03:04:04] <spagewmf>	 Krenair: I guess the latter.  I'll just say decommissioned in 2014.
[03:04:18] <Krenair>	 On 3rd October 2014, 10:43 mark: Shutdown amaranth.toolserver.org's switchport on asw-d-pmtpa
[03:04:47] <spagewmf>	 Krenair: thanks.  https://en.wikipedia.org/wiki/Wikipedia:Toolserver#Migration_from_Toolserver_to_Wikimedia_Labs says both "Final decommissioning of the toolserver was tentatively scheduled to be complete by December 2014." and "On July 1, 2014, the Toolserver was shut down."
[03:11:46] <wikibugs>	 Tool-Labs, Wikimedia-Hackathon-2015, Documentation: Re-organize Tool Labs documentation - https://phabricator.wikimedia.org/T91509#1267834 (Spage) @YuviPanda and I met, see http://etherpad.wikimedia.org/p/tool_labs-doc . We discussed other pages and made a bunch of immediate cleanups , but we didn't g...
[03:59:01] <wikibugs>	 Tool-Labs: xtools-ec has multiple webservices running - https://phabricator.wikimedia.org/T98432#1267859 (yuvipanda) NEW
[04:29:24] <legoktm>	 is someone rebooting integration slaves?
[04:37:30] <yuvipanda>	 legoktm: not me
[04:58:06] <HaeB>	 yuvipanda: https://wikitech.wikimedia.org/w/index.php?title=Help:Tool_Labs&diff=157896&oldid=157894
[05:39:02] <wikibugs>	 Tool-Labs: xtools-ec has multiple webservices running - https://phabricator.wikimedia.org/T98432#1267940 (yuvipanda) ```tools.xtools-ec@tools-bastion-01:~$ qstat | wc -l 112```
[06:11:12] <wikibugs>	 Tool-Labs: Unify / simplify webservice code - https://phabricator.wikimedia.org/T98440#1267954 (yuvipanda) NEW a:yuvipanda
[06:28:12] <wikibugs>	 Tool-Labs: Convert lighttpd-starter from bash to python - https://phabricator.wikimedia.org/T98441#1267965 (yuvipanda) NEW a:yuvipanda
[06:28:37] <wikibugs>	 Tool-Labs: Convert tomcat-starter to python - https://phabricator.wikimedia.org/T98442#1267973 (yuvipanda) NEW a:yuvipanda
[06:33:44] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]  
[06:46:27] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]  
[06:58:43] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0]  
[07:11:30] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0]  
[08:38:14] <shinken-wm>	 PROBLEM - Puppet staleness on tools-shadow is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]  
[09:33:49] <wikibugs>	 Labs, operations: Investigate ways of getting off raid6 for labs store - https://phabricator.wikimedia.org/T96063#1268231 (mark) p:Normal>High
[09:36:51] <wikibugs>	 Labs, operations: Investigate ways of getting off raid6 for labs store - https://phabricator.wikimedia.org/T96063#1268232 (mark) >>! In T96063#1207581, @coren wrote: > Raid 6 is a performance bottleneck but gives us 66% more effective storage than raid10 would in the current configuration.  It doesn't mean...
[09:56:22] <wikibugs>	 Labs: Increase RAID6 sync_speed_min to a sensible level - https://phabricator.wikimedia.org/T98456#1268271 (mark) NEW a:coren
[10:02:34] <apergos>	 I'm going to start upgrading salt on labs instances to 2014.7.5 now.  I'll be doing it in batches, it will be slow and tedious, etc.
[10:03:05] <apergos>	 Probably take several hours.  I'll try to keep interruptions in service down. 
[10:03:21] <apergos>	 (might affect: salt commands, trebuchet/git deploy)
[10:20:30] <shinken-wm>	 PROBLEM - Puppet staleness on tools-exec-catscan is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]  
[10:21:26] <apergos>	 on virt100 I see this:
[10:21:28] <apergos>	 root     32664  0.1  0.0 443696 25688 pts/14   Sl+  Mar31  86:05 /usr/bin/python /usr/bin/salt -b 25% -G lsb_distrib_codename:jessie cmd.run sudo umount -f -l /home && sudo mount /home
[10:21:51] <apergos>	 I am going to terminate it with prejudice, no salt job should be running since March 31
[10:22:23] <mark>	 indeed
[10:24:14] <shinken-wm>	 PROBLEM - Puppet staleness on tools-exec-cyberbot is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]  
[10:31:53] <shinken-wm>	 PROBLEM - Puppet staleness on tools-exec-15 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]  
[10:32:15] <shinken-wm>	 PROBLEM - Puppet staleness on tools-exec-gift is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]  
[10:33:11] <shinken-wm>	 PROBLEM - Puppet staleness on tools-exec-wmt is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]  
[10:34:35] <shinken-wm>	 PROBLEM - Puppet staleness on tools-exec-13 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]  
[10:36:08] <shinken-wm>	 PROBLEM - Puppet staleness on tools-exec-08 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]  
[10:36:50] <shinken-wm>	 PROBLEM - Puppet staleness on tools-exec-14 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]  
[10:37:02] <shinken-wm>	 PROBLEM - Puppet staleness on tools-exec-07 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]  
[10:40:26] <shinken-wm>	 PROBLEM - Puppet staleness on tools-exec-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]  
[11:58:49] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]  
[12:01:57] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]  
[12:25:24] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1217 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]  
[12:26:56] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0]  
[12:28:50] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1207 is OK: OK: Less than 1.00% above the threshold [0.0]  
[12:50:23] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0]  
[13:01:44] <apergos>	 fyi the first round of precise and trusty instance salt upgrades is done, there will be many instances of those to be done manually or to be left for folks to fix their dpkg issues 
[13:01:55] <apergos>	 I see  GPG error: http://nova.clouds.archive.ubuntu.com trusty Release: The following signatures were invalid: BADSIG 40976EAF437D05B5 Ubuntu Archive Automatic Signing Key <ftpmaster@ubuntu.com>   on a lot of hosts: anyone know about this?
[13:02:17] <apergos>	 Coren: ?
[13:02:53] <Coren>	 apergos: First I hear of it; and first time I see that particular error.  Do you have an example instance?
[13:02:58] <apergos>	 sure
[13:03:16] <apergos>	 i-0000048d.eqiad.wmflabs
[13:03:27] <apergos>	 I see it during an apt-get update and of course it blocks installs also
[13:03:38] <Coren>	 Ima go take a look.
[13:03:41] <apergos>	 thanks!
[13:06:14] <Coren>	 ... that's a deleted instance according to wikitech.
[13:09:08] <Coren>	 Yet it clearly exists.  Hm.
[13:09:23] <Coren>	 apt-get update
[13:21:25] <Coren>	 apergos: I'm a little at a loss how that could have happened, but it's easily fixed by blasting away the /var/lib/apt/lists cache and redoing an apt-get update.
[13:21:36] <apergos>	 hm ok
[13:21:39] <Coren>	 Tries to google for possible causes.
[13:21:46] <apergos>	 did you do that on the particular host then?
[13:21:52] <Coren>	 On that one host, yes.
[13:21:56] <apergos>	 ok great
[13:22:28] <apergos>	 the other thing I have a lot of, without apparently apt or dpkg running, is 
[13:22:30] <apergos>	     E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
[13:22:38] <Coren>	 It looks like one of the signature files was truncated but because the timestamp was okay it never uploaded it anew.
[13:22:56] <Coren>	 downloaded*
[13:23:14] <Coren>	 apergos: Same hosts or different ones?
[13:23:19] <apergos>	 nah different hosts
[13:23:31] <apergos>	 but I have an idea about that so I'll ping you if my idea doesn't pan out
[13:23:35] <Coren>	 kk
[13:23:45] <apergos>	 thanks for fixing that one thing up
[13:24:47] <Coren>	 Do you have a list of hosts for the gpg error?  I could fix 'em.
[13:34:52] <mark>	 Coren: see PM
[13:41:48] <apergos>	 Coren, no but I could fix em too so no worries there
[13:42:21] <Coren>	 apergos: We're definitely not the first to run into that issue but hypotheses about the root cause are all over the map and, since the fix is trivial, it doesn't look like a serious investigation has even taken place.
[13:42:34] <apergos>	 yeah ignore for now
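For reference, the fix Coren describes above (clearing the /var/lib/apt/lists cache and re-running apt-get update) could be scripted roughly as below. This is only a minimal sketch, not what was actually run on the host: the function name `clear_apt_lists` and its `run_update` flag are hypothetical, and on a real instance it would need root.

```python
import shutil
import subprocess
from pathlib import Path


def clear_apt_lists(lists_dir="/var/lib/apt/lists", run_update=False):
    """Remove cached index and signature files so apt must download them again.

    Hypothetical helper; returns the names of the entries it removed.
    """
    removed = []
    for entry in sorted(Path(lists_dir).iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # e.g. the partial/ download directory
        else:
            entry.unlink()  # Release, Release.gpg, Packages, lock, ...
        removed.append(entry.name)
    if run_update:
        # Re-fetch all package lists; requires root on a real host.
        subprocess.run(["apt-get", "update"], check=True)
    return removed
```

Calling `clear_apt_lists(run_update=True)` as root would approximate the manual fix; leaving `run_update` at its default only empties the directory.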
[13:45:37] <wikibugs>	 Labs, Labs-Infrastructure: korma.wmflabs.org server not reachable - https://phabricator.wikimedia.org/T98470#1268759 (Acs) NEW
[13:59:11] <Zhaofeng-Li>	 Hi. Where can I find the source of the /usr/local/bin/tool-* files on toollabs's webgrid?
[14:03:49] <MaxSem>	 hey, I created an instance with 160G storage, however there's only 20G root partition. do I need to manually mount something?
[14:08:10] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]  
[14:09:09] <wikibugs>	 Labs, Wikimedia-Hackathon-2015: iPython for Labs: call for an interactive coding plattform - https://phabricator.wikimedia.org/T92506#1268857 (Qgil) Are you still planning to run this session? Would you prefer to have it scheduled in advanced (i.e. to promote it in our circles)?
[14:19:23] <apergos>	 "most" precise and trusty instances are updated. now starting on jessie.
[14:21:52] <wikibugs>	 Labs, Tool-Labs, Wikimedia-Hackathon-2015: Organize Wikimedia Labs activities at the Wikimedia Hackathon 2015 - https://phabricator.wikimedia.org/T92274#1268911 (Qgil) If there are any sessions that you would prefer to schedule in advance (i.e. a training session for newcomers), please let me know. Th...
[14:31:22] <wikibugs>	 Labs, Wikimedia-Hackathon-2015: iPython for Labs: call for an interactive coding plattform - https://phabricator.wikimedia.org/T92506#1268960 (daniel) I'd very much like to have this session, but @yuvipanda knows far more about this than I do. It would probably be best if he'd run the show.
[14:32:42] <wikibugs>	 Labs, Labs-Infrastructure, ToolLabs-Goals-Q4: Limit NFS bandwith per-instance - https://phabricator.wikimedia.org/T98048#1268966 (coren) That's not what I meant; I meant that any tc rule on the instance can be changed/removed by anyone who has root there - preventing /that/ is a social problem.
[14:33:05] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0]  
[14:34:28] <wikibugs>	 Labs, Labs-Infrastructure, ToolLabs-Goals-Q4: Limit NFS bandwith per-instance - https://phabricator.wikimedia.org/T98048#1268977 (coren) Currently in effect on a few picked instances, with measurable but limited success.  I'm going to make a changeset to deploy this (with conservative limits) on all in...
[14:45:11] <wikibugs>	 Labs, Labs-Infrastructure: When uploading files to web, in network activity is twice of out - https://phabricator.wikimedia.org/T45060#1269029 (Nemo_bis) Well, nethogs or vnstat could confirm whether this is still happening. I'll check next time I do some transfers.
[15:20:34] <github>	 [intuition] Krinkle pushed 2 new commits to master: https://github.com/Krinkle/intuition/compare/213cbba886eb...0820c35a170c
[15:20:34] <github>	 intuition/master ce457b6 Niklas Laxström: Convert messages to json
[15:20:35] <github>	 intuition/master 0820c35 Timo Tijhof: Add JSON support and refactor internal domain handling...
[15:28:30] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]  
[15:31:22] <github>	 [intuition] Krinkle pushed 1 new commit to master: https://github.com/Krinkle/intuition/commit/b2a607cc859ebbf6364974e6a51a21dcf5406380
[15:31:22] <github>	 intuition/master b2a607c Timo Tijhof: Follow-up 0820c35: Ensure "en" is loaded in listMsgs()...
[15:37:33] <github>	 [intuition] Krinkle closed pull request #38: Intuition.php: Add ensureLoaded() in listMsgs() (json...json) https://github.com/Krinkle/intuition/pull/38
[15:42:02] <wikibugs>	 Labs, Labs-Infrastructure: Instances CPU being stuck on at least a couple instances - https://phabricator.wikimedia.org/T97520#1269202 (Cmjohnson)
[15:42:05] <wikibugs>	 Labs, Labs-Infrastructure, operations, ops-eqiad: labvirt1005 memory errors - https://phabricator.wikimedia.org/T97521#1269198 (Cmjohnson) Open>Resolved a:Cmjohnson The system board has been changed, everything posted as it should.  Updated the bios and iLom settings. Verified MAC address...
[15:46:26] <wikibugs>	 Quarry: JSON output should be one row per JSON blob - https://phabricator.wikimedia.org/T98492#1269220 (Halfak) NEW
[15:58:30] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0]  
[16:18:54] <wikibugs>	 Tool-Labs, Wikimedia-General-or-Unknown: Missing information template links in templatelinks database - https://phabricator.wikimedia.org/T89441#1269377 (Jarekt) I just tried a CatScan query [1] and I do not see any problems. I think this issue is resolved  [1] http://tools.wmflabs.org/catscan3/catscan2....
[16:25:11] <andrewbogott>	 Coren: are you about?
[16:33:31] <andrewbogott>	 !log maps stopping maps-tiles3 as I suspect it of flooding NFS
[16:33:39] <labs-morebots>	 Logged the message, dummy
[16:36:48] <Coren>	 andrewbogott: Just back from lunch.  What up?
[16:36:59] <Coren>	 Hm
[16:37:02] <andrewbogott>	 Coren: labstore1001 is maxed out
[16:37:14] <Coren>	 maps-tiles3?  Damn.  It has a tight choke on network.
[16:37:15] <andrewbogott>	 I killed maps-tiles3, in theory the pressure should be off but the graphs are still pegged.
[16:37:26] <andrewbogott>	 Maybe it takes a while to recover due to pending file requests?
[16:37:43] <andrewbogott>	 that or I killed the wrong thing :)
[16:37:52] <Coren>	 That's generally the case, but not more than a couple minutes.
[16:38:02] <apergos>	 I'm still working on some stragglers in labs; "most" instances have been updated to 2014.7.5.  but I'm going to take a lunch/dinner/brain rest break now
[16:38:08] <andrewbogott>	 It’s only been a couple of minutes.
[16:38:43] <Coren>	 I'm not seeing any network outliers.  I think it's just combo of increased resync speed + generally highish load.
[16:39:04] * Coren reduces resync speed.
[16:39:49] <Coren>	 That said, IO is very high, but I'm not seeing impact on instances beyond generally sluggish.  Perhaps the worst has passed?
[16:40:11] <andrewbogott>	 Maybe.  Icinga is still very upset.
[16:41:38] <Coren>	 The iowait is expected because Mark increased the resync speed this morning; that's going to necessarily increase baseline a lot.  Although I see a /lot/ of variability.
[16:42:22] <apergos>	 and while on my break, anyone know why I get failure of dns resolution for scrumbugz-mail.eqiad.wmflabs from bast-restricted? need to be able to ssh into the host
[16:42:36] <apergos>	 (to do the update)
[16:42:37] <andrewbogott>	 Coren: do you think I should restart maps-tiles3?
[16:42:43] <andrewbogott>	 apergos: I’ll look.  What project is that?
[16:42:48] <apergos>	 uh
[16:43:03] <Coren>	 andrewbogott: I'd like to know for sure whether it is the root cause, and see if I can choke its network further if it is.
[16:43:05] <apergos>	 https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000130.eqiad.wmflabs  
[16:43:11] <andrewbogott>	 apergos: ok
[16:43:16] <apergos>	 thanks, you rock
[16:43:19] <andrewbogott>	 Coren: is that a ‘yes’?
[16:43:25] <Coren>	 andrewbogott: It's a yes.
[16:43:26] <Coren>	 :-)
[16:43:35] <apergos>	 ah coren I found one cause for the gpg signature/corruption thing heh
[16:43:40] <apergos>	 "out of space on /var"
[16:43:50] <Coren>	 aha.  That explains the truncated file.
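Since a full /var turned out to be one cause of the truncated signature file, a quick free-space check before running apt-get update could catch this earlier. The snippet below is a hedged sketch only: the helper name `low_on_space` and the 100 MiB default threshold are assumptions for illustration, not anything used on these hosts.

```python
import shutil


def low_on_space(path="/var", min_free_bytes=100 * 1024 * 1024):
    """Return True if the filesystem holding `path` has less free space
    than `min_free_bytes` (hypothetical default threshold: 100 MiB)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free < min_free_bytes


if __name__ == "__main__":
    if low_on_space("/var"):
        print("/var is nearly full; apt list downloads may get truncated")
```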
[16:43:57] <andrewbogott>	 !log maps restarting maps-tiles3 for further investigation
[16:44:01] <labs-morebots>	 Logged the message, dummy
[16:44:06] <apergos>	 I tossed some logs and forced some gzips, if they were using those logs for stats, tooooo bad
[16:44:32] <apergos>	 yep, when in doubt go for the easy 
[16:44:38] <apergos>	 I am... going for icecream!
[16:44:53] <Coren>	 Yeay icecream!
[16:45:28] <andrewbogott>	 !log scrumbugz restarting scrumbugz-mail as it seems to be frozen
[16:45:31] <labs-morebots>	 Logged the message, dummy
[16:46:26] * Coren limits maps-tiles3 to 20mbps of NFS
[16:47:39] <Coren>	 andrewbogott: I see no visible swing on the graphs; either that wasn't the culprit or it was just the straw that broke the camel's back.
[16:47:49] <andrewbogott>	 Most likely the latter
[16:48:04] <andrewbogott>	 aude: are you involved in the ‘scrumbugz’ project?
[16:48:28] <andrewbogott>	 Coren: can you ack that icinga warning if you think it’s not important?
[16:48:35] <aude>	 andrewbogott: not really
[16:48:42] <andrewbogott>	 aude: do you know who is?
[16:48:53] <andrewbogott>	 There’s a clearly broken instance there, I’d like to delete it but probably shouldn’t do that unilaterally.
[16:48:55] <Coren>	 andrewbogott: I haven't decided how important it is quite yet.  I'm still keeping an eye on the numbers for a bit.
[16:49:01] <aude>	 christopher, although not sure he's involved either now
[16:49:16] <andrewbogott>	 I don’t know who christopher is, do you have an email address?
[16:49:19] <aude>	 if it's broken i think it can be deleted, but you could ask Tobi_WMDE_SW_NA 
[16:49:30] <andrewbogott>	 um… same question :)
[16:50:13] <aude>	 https://wikitech.wikimedia.org/wiki/User:Christopher_Johnson_(WMDE)
[16:50:42] <aude>	 which one is broken?
[16:50:54] <andrewbogott>	 scrumbugz-mail
[16:51:02] <andrewbogott>	 it was frozen; I rebooted it and it dropped into busybox
[16:51:45] <aude>	 let me see if lydia is around and knows
[16:52:52] <andrewbogott>	 aude: thanks
[16:54:28] * apergos thanks you too
[16:58:35] <andrewbogott>	 apergos: you should ignore that instance for now, it’s hopeless.
[16:58:47] <apergos>	 ok thanks
[17:01:31] <apergos>	 I have two more of those:
[17:01:37] <apergos>	 https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000363.eqiad.wmflabs#Labs_Instance:_incident-test_.28active.29  
[17:01:46] <andrewbogott>	 Coren, yuvipanda, I’m about to merge a labs network patch.  It shouldn’t cause an interruption, but… if something goes awry feel free to blame me.
[17:02:20] <Coren>	 Do we get to blame you if everything works fine too?  :-)
[17:02:29] <andrewbogott>	 I hope so!
[17:02:38] <apergos>	 https://wikitech.wikimedia.org/wiki/Nova_Resource:I-000003be.eqiad.wmflabs#Labs_Instance:_wikiviajesve-jurytool_.28active.29
[17:02:53] <apergos>	 then I have two which deny me for public-key but I'm just going to ignore those
[17:03:04] <yuvipanda>	 andrewbogott: oh the snat thing? What effect do you think we will have from that?
[17:03:37] <wikibugs>	 Labs, Labs-Infrastructure, operations, ops-eqiad: labstore1002 issues while trying to reboot - https://phabricator.wikimedia.org/T98183#1269523 (coren) @mark suggest it might be worthwhile to ensure that the labstores and their shelves are all on the same phase to avoid the possibility of an electr...
[17:05:03] <andrewbogott>	 yuvipanda: I think that it will cause some host-key mismatch complaints...
[17:05:05] <andrewbogott>	 and fix routing.
[17:05:09] <andrewbogott>	 Have a look and see if you agree
[17:05:37] <yuvipanda>	 andrewbogott: 'fix routing' as in for private / public IP issues?
[17:05:47] <andrewbogott>	 Oh, damn it, it seems to be intermittent.
[17:05:53] <yuvipanda>	 I'm still in bed on my phone, will be in office soon.
[17:05:54] <andrewbogott>	 It worked when I was looking a second ago but now isn’t working anymore
[17:05:55] <andrewbogott>	 wtf
[17:06:38] <Coren>	 heisenbugs ftw
[17:10:21] <shinken-wm>	 PROBLEM - Puppet failure on tools-redis-slave is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]  
[17:15:36] <wikibugs>	 Tool-Labs, Wikimedia-General-or-Unknown: Missing information template links in templatelinks database - https://phabricator.wikimedia.org/T89441#1269538 (Aschroet) @Aklapper, @Umherirrender, @Springle: Could someone of you close that ticket please?
[17:17:22] <wikibugs>	 Labs, Labs-Infrastructure, operations, ops-eqiad: labstore1002 issues while trying to reboot - https://phabricator.wikimedia.org/T98183#1269542 (Cmjohnson) I verified that both labstores and their shelves are on the same phase.   -CJ
[17:23:41] <Tobi_WMDE_SW_NA>	 andrewbogott: I'm pretty sure that can be deleted.. 
[17:23:56] <andrewbogott>	 Tobi_WMDE_SW_NA: The project or the instance?
[17:25:22] <shinken-wm>	 RECOVERY - Puppet failure on tools-redis-slave is OK: OK: Less than 1.00% above the threshold [0.0]  
[17:26:05] <Tobi_WMDE_SW_NA>	 andrewbogott: all of it. I'm going to write Christopher a warning though I'm sure it isn't in use anymore 
[17:26:23] <ebernhardson>	 odd error in Special:NovaProxy, it's not listing any actual proxies although i know they exist (and i logged out and logged back in ;).  On creating a proxy it says "Successfully added commons-phantomcirrus entry for IP address 208.80.155.156. Failed to create new proxy commons-phantomcirrus.wmflabs.org."
[17:26:28] <andrewbogott>	 Tobi_WMDE_SW_NA: great!  Shall I delete now, or do you want to do some followup first?
[17:27:11] <andrewbogott>	 yuvipanda: did the proxy server api crash again?
[17:27:27] <yuvipanda>	 andrewbogott: that's... Possible 
[17:27:46] <yuvipanda>	 Can you restart it? I'm still getting ready to go to office 
[17:28:58] <Tobi_WMDE_SW_NA>	 andrewbogott: just wrote Christopher a Mail, if possible from your side I want to give him some time to follow-up 
[17:30:26] <andrewbogott>	 Tobi_WMDE_SW_NA: sure, no problem.
[17:30:50] <Tobi_WMDE_SW_NA>	 andrewbogott: thx! 
[17:35:37] <apergos>	 you wanna look at incident-test and wikiviajesve-jurytool ?
[17:36:02] <apergos>	 andrewbogott: (no good deed goes unpunished)  but those are the only other ones I will ask about
[17:36:12] <apergos>	 anything else will go on a summary in a phab ticket
[17:36:24] <andrewbogott>	 apergos: all I can do is… confirm that they are unreachable for me as well.
[17:36:29] <apergos>	 ah
[17:36:57] <apergos>	 ok I'll just add em to the ticket and someone can "figure it out" along with the rest of the problems
[17:37:11] <apergos>	 like "I won't fix your broken mysql packages in order to upgrade salt, sorry"
[17:37:55] <andrewbogott>	 that sounds right
[17:38:11] <T13|away>	 +1 ^^
[17:41:28] <wikibugs>	 Tool-Labs, Wikimedia-General-or-Unknown: Missing information template links in templatelinks database - https://phabricator.wikimedia.org/T89441#1269588 (Steinsplitter) Open>Resolved no replag now. seems resolved.
[18:01:28] <SMalyshev>	 I'm trying to delete an instance on labs and I get "The requested host does not exist." Anybody know what's up?
[18:01:35] <SMalyshev>	 https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=delete&project=wikidata-query&instanceid=29704829-3c25-4f79-b6f6-ed985946ec1f&region=eqiad
[18:04:34] <andrewbogott>	 SMalyshev: what project?
[18:04:54] <SMalyshev>	 andrewbogott: wikidata-query
[18:05:32] <SMalyshev>	 ok, I logged out and logged back in, seems to show the right info now. Weird
[18:05:47] <andrewbogott>	 I just deleted it, worked for me.
[18:05:51] <andrewbogott>	 No idea what happened
[18:06:16] <SMalyshev>	 andrewbogott: yeah, works for me now too, but before I logged out and logged back in, it didn't... strange
[18:06:59] <SMalyshev>	 looks like it lost some of my auth while still showing me as logged in
[18:20:55] <apergos>	 all right, I declare I have done what I could on salt upgrades. "done", exceptions will go on a ticket and I'll add a few likely lab suspects to it as a fyi
[18:37:50] <spagewmf_>	 https://wikitech.wikimedia.org/wiki/Help:Instances#Initial_settings suggests my new instance should be named developer-doc-devhub. Can I give it a shorter name in URLs using the proxy like http://devhub.wmflabs.org ?
[18:49:01] <spagewmf_>	 yuvipanda andrewbogott :  https://wikitech.wikimedia.org/wiki/Help:Instances#Initial_settings suggests my new instance should be named developer-doc-devhub. Can the proxy give it a shorter name like http://devhub.wmflabs.org ?
[18:49:28] <yuvipanda>	 Totally can!
[18:49:35] <andrewbogott>	 spagewmf_: the proxy name is totally arbitrary, you’ll be prompted for it
[19:01:33] <apergos>	 so where I said I was really done, I have just 11 more instances to recheck and then I'll be done. :-D
[19:14:30] <yurik>	 :(  Successfully added karta entry for IP address 208.80.155.156.
[19:14:30] <yurik>	 Failed to create new proxy karta.wmflabs.org. 
[19:14:44] <yurik>	 any idea why i get this when setting up a new webproxy?
[19:15:05] <yurik>	 i also tried it for "maps.wmflabs.org"
[19:15:48] <yurik>	 yuvipanda, i know you know everything about labs :)
[19:17:34] <ebernhardson>	 i mentioned that this morning, someone was supposed to kick the daemon :)
[19:17:55] <ebernhardson>	 i imagine it just hasn't happened yet, yuvi was about to head into the office, then metrics now lunch
[19:18:15] <yuvipanda>	 I'm still at home eating cereal...
[19:18:22] <yuvipanda>	 Delicious cereal :p
[19:18:25] <ebernhardson>	 :)
[19:18:39] <yuvipanda>	 And don't have my laptop with me 
[19:19:12] <ebernhardson>	 for now i'm just faking things with an ssh tunnel and a hacked /etc/hosts ;)
[19:19:18] <yuvipanda>	 sudo service dynamicproxy-api restart 
[19:19:41] <yuvipanda>	 Coren: andrewbogott can either of you do that on dynamicproxy-gateway??
[19:19:56] <Coren>	 yuvipanda: Sure, gimme a sec
[19:20:11] <yurik>	 yuvipanda, will i need to delete them somehow?
[19:20:25] <yurik>	 it said something about adding an entry for an ip
[19:20:35] <yurik>	 successfully
[19:20:40] <Coren>	 dynamicproxy-api stop/waiting
[19:20:40] <Coren>	 dynamicproxy-api start/running, process 2190
[19:20:46] <yuvipanda>	 Thanks Coren 
[19:20:50] <yuvipanda>	 ebernhardson: try?
[19:20:57] <ebernhardson>	 the proxy list has returned!
[19:21:23] <ebernhardson>	 Successfully created new proxy commons-phantomcirrus.wmflabs.org for backend phantomcirrus.eqiad.wmflabs:80.
[19:21:30] <ebernhardson>	 preports to work :)
[19:21:39] <ebernhardson>	 s/pre/pur/
[19:21:46] <Coren>	 yuvipanda: You be in bart?
[19:22:09] <ebernhardson>	 yurik: doesn't look like anything needs to be deleted, just worked
[19:22:11] <wikibugs>	 Labs, OpenStreetMap: Enable OSM Postgres machine access in labs - https://phabricator.wikimedia.org/T98382#1269894 (Reedy)
[19:22:22] <yuvipanda>	 Coren: not yet. Going to in a bit.
[19:22:28] <yurik>	 ebernhardson, just added http://maps.wmflabs.org/, but it is not resolving :(
[19:22:36] <ebernhardson>	 yurik: usually takes a minute
[19:22:39] <yuvipanda>	 Coren: had a late night yesterday so going to be in the office later than usual.
[19:22:42] <yurik>	 ah, ok
[19:23:13] <ebernhardson>	 yurik: yours resolves now from here, might also be a browser cache?
[19:23:52] <ebernhardson>	 yurik: resolves, but doesn't work ;) http://i.imgur.com/gGxDUjx.png
[19:24:17] <yurik>	 ebernhardson, perfect!
[19:24:26] <yurik>	 but still not working from my side
[19:24:46] <ebernhardson>	 join the dark side, point dns to 8.8.8.8 ;)
[19:25:07] <yurik>	 ebernhardson, do you know how to do it for ubuntu?
[19:25:10] <yurik>	 for all conn
[19:25:18] <yurik>	 there are a lot of conflicting suggestions
[19:25:53] <ebernhardson>	 yurik: does your /etc/resolv.conf say something about 'resolvconf' ?
[19:26:28] <yurik>	 nope
[19:26:49] <ebernhardson>	 yurik: dunno then.  /etc/resolv.conf is where linux sources the information, but it's auto-generated by different things in different versions of linux 
[19:26:59] <yurik>	 exactly ;)
[19:27:03] <yurik>	 oh well, will see
[19:27:23] <ebernhardson>	 mine says resolvconf, so i just followed those instructions: # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
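The check ebernhardson suggests (does /etc/resolv.conf mention resolvconf, and which nameservers does it list) can be sketched in a few lines. This is an illustrative sketch only: `inspect_resolv_conf` is a hypothetical helper name, and it just looks for the "generated by resolvconf" header quoted above rather than covering every tool that may generate the file.

```python
def inspect_resolv_conf(text):
    """Return (generated_by_resolvconf, nameservers) for resolv.conf content.

    Hypothetical helper: it only recognises the resolvconf(8) header comment
    and plain 'nameserver <address>' lines.
    """
    generated = False
    nameservers = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            if "generated by resolvconf" in stripped:
                generated = True
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            nameservers.append(fields[1])
    return generated, nameservers


if __name__ == "__main__":
    with open("/etc/resolv.conf") as handle:
        print(inspect_resolv_conf(handle.read()))
```

If the first value is True, manual edits to the file are likely to be overwritten, so the nameserver would have to be changed through whatever generates it.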
[19:53:34] <apergos>	 [8631886.916274] EXT4-fs error (device dm-0): ext4_find_dest_de:1648: inode #391: block 8552: comm dpkg: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[19:53:34] <apergos>	 root@wdq-mm-02:~# date
[19:53:34] <apergos>	 Thu May  7 19:53:19 UTC 2015
[19:53:38] <apergos>	 that is instance um
[19:53:59] <apergos>	 i-000007a1.eqiad.wmflabs
[19:54:07] <apergos>	 there are a pile of errors like that
[19:54:22] <apergos>	 my feeble attempt to install salt there resulted in an Input/Output error 
[19:54:31] <apergos>	 hmm I guess I'll ping Coren :-P
[19:54:41] <Coren>	 pongs
[19:54:48] <apergos>	 see backread
[19:54:55] <apergos>	 dmesg is full o that crap
[19:55:14] <Coren>	 apergos: That looks like an instance that didn't survive a migration.
[19:55:32] <apergos>	 heh
[19:55:58] <apergos>	 well I'm done done now, as in I can tell you what's wrong with each host that didn't upgrade, and there's not too many
[19:56:07] <apergos>	 but too tired to do the writeup :-P
[19:56:12] <apergos>	 tomorrow am!
[19:56:17] <apergos>	 still didn't eat....
[19:56:56] <Coren>	 Go eat then.  I expect most of those need destruction rather than fixin'
[19:57:47] <apergos>	 well it's mostly broken packages now
[19:57:57] <apergos>	 only the one fs issue
[19:58:12] <apergos>	 and some where they probably got ferm rules that are wrong but I could ssh in and fix em
[19:58:14] <apergos>	 shrug
[19:58:36] <apergos>	 escaped a lot cause either shut off or they run their own salt master, not touching those :-P
[19:58:42] <apergos>	 gone hunting for food!  happy trails
[20:10:47] <spagewmf>	 yuvipanda andrewbogott : Thanks, I updated that page.  I also trimmed mention of "choose m1" (they're all m1) and old ubuntus.
[20:26:27] <Coren>	 spagewmf: Heh.  If it mentioned anything but m1, the page predates even my arrival at the wmf.  :-)
[20:27:45] <spagewmf>	 Coren: I just removed /* ‎Moving a tool from Toolserver to Tool Labs */ from Help:Tool_Labs. The treacly weight of lore :)
[20:28:24] <Coren>	 Yeah, full of hysterical carps.  :-)
[20:29:51] <spagewmf>	 "The instance will be created... You should receive an email when the instance is ready (note: this is broken, currently, for all non-wikimedia email addresses)."  I never got an e-mail, can I remove that sentence?
[20:31:43] <yuvipanda>	 spagewmf: yes
[20:31:50] <yuvipanda>	 That predates me 
[20:32:07] <Coren>	 spagewmf: Hm.  The non-wikimedia thing is not true, for sure, but you /can/ turn on echo notification for new instances including email.
[20:32:21] <Coren>	 IIRC, though, it's off by default because spammy.
[20:35:07] <spagewmf>	 Coren yuvipanda : ah, notifications, I used to get a bunch of them.  I'm configured to send notifications but wikitech doesn't seem to have any "Notify me about these events" for Nova activity.  I don't see anything in https://wikitech.wikimedia.org/wiki/Special:Notifications since January, so I guess the whole feature was turned off.
[20:35:46] <Coren>	 I'm still seeing deleted instance messages, etc.
[20:36:33] <spagewmf>	 Coren: interesting (lucky you!). Do you have something in https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-echo ?
[20:36:38] <Coren>	 But you're right - I don't see them in preferences anymore.  andrewbogott: chime in?
[20:37:05] <andrewbogott>	 Yeah, probably got turned off during the move to silver.
[20:37:07] <spagewmf>	 Coren: do you mean you get them in the Echo [NN] badge next to your username on wikitech pages?
[20:37:25] <spagewmf>	 anyway, I removed the mention in the doc
[20:37:27] <Coren>	 Yeah, I turned off email like almost two years ago.  :-)
[20:37:58] <Coren>	 Dduvall deleted instance "integration-raita" in project Nova Resource:Integration - 9 days ago
[20:38:02] <Coren>	 etc
[20:38:36] <Coren>	 Interestingly enough, now that I look at it, I only see *deletions* now.
[20:39:17] <spagewmf>	 Ideally the Nova integration would let you configure notifications, like how Flow appears in https://www.mediawiki.org/wiki/Special:Preferences#mw-prefsection-echo
[20:57:39] <Betacommand>	 Coren: what did you find out about glamtools?
[20:58:03] <Coren>	 Betacommand: I'm sorry, brain overflow.  Remind me of context?
[20:59:08] <Betacommand>	 26 copies of the same task
[21:02:33] <apergos>	 I'm outa this channel in case my connection drops overnight (autojoin with > 10 channels fails)
[21:02:37] <apergos>	 have a good one
[21:02:47] <apergos>	 back here again tomorrow at some point
[21:02:49] <Coren>	 o/ @ apergos
[21:03:10] <Coren>	 Betacommand: Right; sorry.  That one I didn't get the time to handle yet.
[21:03:24] <Coren>	 Betacommand: NFS is occupying my brain atm
[22:01:57] <L235>	 hey, I'm getting Forbidden: checkSpider. (timeout 10 min) .Please inform the tool maintainer if this isn't correct. when visiting https://tools.wmflabs.org/xtools/agent/config.php
[22:06:29] <afeder>	 MusikAnimal T13|mobile ^
[22:06:56] <grrrit-wm>	 (03PS1) 10Mhurd: This is a test throw-away commit. [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/209640 
[22:09:03] <MusikAnimal>	 L235: hmm, I'm not seeing that?
[22:09:36] <L235>	 MusikAnimal: well, I'm still getting it
[22:09:47] <L235>	 I'll PM you IP, user-agent etc
[22:11:28] <MusikAnimal>	 now I'm seeing it too
[22:12:14] <L235>	 ah, ok
[22:12:52] <csteipp>	 Coren: dang, sorry. I think I just took down the labs web proxy
[22:13:01] <csteipp>	 yuvipanda: ^
[22:14:03] <yuvipanda>	 csteipp: ouch. looking
[22:17:40] <spagewmf>	 Coren: o/t, can I use your Staff and Contractors {{staff member }} as an example in https://wikimediafoundation.org/wiki/Template:Staff_member/doc ?  The Sue Gardner example is old and misleading
[22:18:59] <ebernhardson>	 oh, ok its not just me, labs proxy down :)
[22:22:13] <csteipp>	 yuvipanda: Thanks! I won't do that again... 
[22:23:18] <spagewmf>	 Coren: I Wuz Bold, feel free to switch the example to someone else. Nobody reads the docs anyway :)
[22:23:51] <yuvipanda>	 csteipp: what exactly did you do? :)
[22:23:54] <grrrit-wm>	 (03CR) 10Legoktm: "In the future there are test repos like mediawiki/extensions/examples designed for this purpose..." [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/209640 (owner: 10Mhurd)
[22:23:58] <grrrit-wm>	 (03Abandoned) 10Legoktm: This is a test throw-away commit. [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/209640 (owner: 10Mhurd)
[22:25:09] <grrrit-wm>	 (03CR) 10Yuvipanda: "No :P" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/209640 (owner: 10Mhurd)
[22:25:15] <yuvipanda>	 legoktm: yeah, forgot about that
[22:25:23] <yuvipanda>	 !log project-proxy restarted dynamicproxy-gateway
[22:25:30] <labs-morebots>	 Logged the message, Master
[22:26:26] <Coren>	 spagewmf: Meh, it's unkind to the people that will have to see my ugly mug but I care not.  :-)
[22:33:24] <wikibugs>	 6Labs, 7Tracking: Create Labs Project for WikidataPageBanner extension - https://phabricator.wikimedia.org/T98537#1270793 (10Sumit) 3NEW
[22:50:12] <MaxSem>	 yuvipanda, is instance proxy down?
[22:50:38] <yuvipanda>	 MaxSem: yes, is being rebooted
[22:50:45] <MaxSem>	 ah
[22:50:46] <yuvipanda>	 uh oh, did that not come back up
[22:52:58] <yuvipanda>	  MaxSem ouch, it’s down and staying down
[22:53:23] <yuvipanda>	 andrewbogott: so dynamicproxy-gateway is in “ERROR” state and doesn’t start back up, from commandline nor from interface
[22:53:26] * yuvipanda sees which host it’s at
[22:53:44] <yuvipanda>	 andrewbogott: aaah, it’s on labvirt1008
[22:53:53] <andrewbogott>	 ok, I’ll start it shortly
[22:54:18] <yuvipanda>	 andrewbogott: cool. let me setup a bug to make that redundant.
[22:55:53] <andrewbogott>	 yuvipanda: better?
[22:56:00] <andrewbogott>	 there is a bug already for that, I think
[22:56:54] <yuvipanda>	 andrewbogott: checking
[22:57:04] <yuvipanda>	 andrewbogott: oh? it shouldn’t be too hard, I think.
[22:57:23] <yuvipanda>	 andrewbogott: much better, yes
[22:57:24] <yuvipanda>	 thanks
[22:57:26] <yuvipanda>	 MaxSem: is baack
[22:57:40] <MaxSem>	 thanks:)
[23:08:48] <T13|mobile>	 L235 MusikAnimal afeder: assuming the cause has been figured out and corrected or working on being corrected?
[23:09:20] <T13|mobile>	 Assuming it was csteipp's fault?
[23:43:50] <harej>	 Hi! What is the single largest MySQL insert query I can run?
[23:45:18] <Coren>	 harej: Your question contains insufficient information for a sane answer.
[23:46:15] <Coren>	 harej: Define "large", "single", and "insert" in context.  The usual answer is "pretty large, but that's probably not what you want to do"  :-)
[23:46:17] <harej>	 Okay. I have a script that inserts rows into my own database. I currently have it doing one at a time because I'm stupid. I would like to try to merge some of these together so that I could do in one query what I currently do in ten, or 100, or 1000.
[23:46:48] <harej>	 I am interested in consolidation because this script adds several million rows to a database.
[23:47:45] <Coren>	 Ah.  It's a balancing act and depends on what columns you've got in there and especially if you have any kind of blob - transaction cost vs index updates vs locality.  A good rule of thumb for batching is to start trying around 100 rows and measure from there.
[23:48:09] <spagewmf>	 yuvipanda: https://wikitech.wikimedia.org/wiki/Help:Labs-vagrant#Setting_up_your_instance_with_labs-vagrant says "Run an initial `labs-vagrant provision`". No mention of `sudo`, but later labs-vagrant commands prepend `sudo` or `sudo su labs-vagrant`. Did you or bd808 fix labs-vagrant to not require sudo?
[23:48:16] <harej>	 I have two columns (plus an ID column, but that's auto-generating). Both columns are VARCHAR(255).
[23:48:29] <yuvipanda>	 spagewmf: I haven’t touched it in a long long time, so maybe bd808 did?
[23:48:54] <bd808>	 I think you can run it without sudo...
[23:49:13] <Coren>	 harej: You might be able to get away with 1000 rows in a single transaction without running into transaction size issues.  It's probably best if you try, say, 10, 100 and 1000 and measure.
[23:50:01] <bd808>	 yuvipanda: `labs-vagrant provision` at least works without an explicit sudo because it does sudo for the puppet command
[23:50:05] <Coren>	 harej: The trick is that batching is - as a rule - beneficial but that as transaction size grows the cost of the rollback and indexing can start to weigh heavily.
[23:50:27] <harej>	 Sure. Eventually you have very unwieldy queries that take more time to execute than if they were (relatively) atomized.
[23:51:07] <Coren>	 harej: I found that a good rule of thumb is that, for simple tables, you can generally count on being able to insert about 16k worth of tuples in a single transaction with neglectable cost.
[23:51:38] <harej>	 neglectable, or negligible?
[23:52:18] <Coren>	 ESL, but http://grafana.wikimedia.org/#/dashboard/db/labs-monitoring
[23:52:25] <Coren>	 Err, bad paste.
[23:52:29] <Coren>	 http://english.stackexchange.com/questions/202832/is-there-a-difference-between-negligible-and-neglectable
[23:52:53] <harej>	 I like the Wiktionary citation :)
[23:53:34] <Coren>	 But yeah, I suppose negligible is the more "hip" way of saying it.  :-)
[23:54:08] <Coren>	 BTW, I didn't have that ready - I was genuinely wondering if I kept using the wrong word when you asked so I googled it.  :-)
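The batching advice Coren gives harej above (try 10, 100 and 1000 rows per transaction and measure) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere, and the table name `pairs`, the column names `a` and `b`, and the helpers `batched` and `insert_in_batches` are hypothetical, not anything harej's script is known to use. With a MySQL driver the placeholder style would typically be `%s` rather than `?`.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(conn, rows, batch_size=100):
    """Insert rows one batch per transaction; return the number of batches."""
    batches = 0
    for chunk in batched(rows, batch_size):
        # `with conn` commits on success and rolls back if the batch fails,
        # so a bad batch never leaves a half-written transaction behind.
        with conn:
            conn.executemany("INSERT INTO pairs (a, b) VALUES (?, ?)", chunk)
        batches += 1
    return batches


# Hypothetical table shaped like harej's description:
# an auto-generated ID plus two VARCHAR(255) columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pairs ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, a VARCHAR(255), b VARCHAR(255))"
)

# A generator keeps memory flat even for several million rows.
rows = ((f"key{i}", f"value{i}") for i in range(2500))
n = insert_in_batches(conn, rows, batch_size=1000)
```

To follow the "measure from there" part, the same function could be timed with batch_size set to 10, 100 and 1000 against the real table, since the best value depends on the columns and indexes involved.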
[23:54:18] <spagewmf>	 bd808: sudo-less `labs-vagrant provision` works! bd808 for CTO!
[23:54:51] <bd808>	 heh. nomination denied
[23:55:00] <spagewmf>	 I followed https://wikitech.wikimedia.org/wiki/Help:Proxy , Successfully added devhub entry for IP address 208.80.155.156.
[23:55:04] <spagewmf>	 Failed to create new proxy devhub.wmflabs.org. 
[23:55:29] <yuvipanda>	 boooo
[23:55:34] * yuvipanda checks
[23:55:41] <spagewmf>	 ^ not sure why that part didn't work, does my instance need a special role?
[23:56:05] <yuvipanda>	 !log project-proxy restarted dynamicproxy API again
[23:56:10] <labs-morebots>	 Logged the message, Master
[23:56:16] <yuvipanda>	 spagewmf: want to try again now?
[23:56:41] <spagewmf>	 yuvipanda you mean resubmit the "Create proxy" form?
[23:57:21] <spagewmf>	 that worked.
[23:59:10] <spagewmf>	 ... but "504 Gateway Time-out" from nginx/1.5.0
[23:59:40] <Coren>	 spagewmf: Most likely, you don't have port 80 open in your security groups