[02:16:43] Coren: fyi (in case it wasn’t obvious in the weekly etherpad) I’m out tomorrow and Friday. I should be emailable throughout and never more than a few hours from my laptop though.
[06:44:10] PROBLEM - Puppet failure on tools-exec-04 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[07:09:06] RECOVERY - Puppet failure on tools-exec-04 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:24:11] (PS1) Alexandros Kosiaris: akosiaris: Add passwords::tendril class [labs/private] - https://gerrit.wikimedia.org/r/190174
[09:27:10] (CR) Alexandros Kosiaris: [C: 2 V: 2] akosiaris: Add passwords::tendril class [labs/private] - https://gerrit.wikimedia.org/r/190174 (owner: Alexandros Kosiaris)
[11:29:24] So, I apologize, but it seems I have broken something on wikitech wiki https://wikitech.wikimedia.org/wiki/Special:Ask/-5B-5BResource-20Type::project-5D-5D/-3F/-3FDescription/format%3Dbroadtable/link%3Dall/headers%3Dshow/mainlabel%3D-2D/searchlabel%3Dprojects/offset%3D0
[11:29:39] we don't see all projects anymore but I don't get why... !?
[11:37:00] PROBLEM - Puppet staleness on tools-exec-15 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[11:41:06] Labs: Move wikitech web interface to a dedicated server - https://phabricator.wikimedia.org/T88300#1033882 (akosiaris) Well, it is a shame to have wikitech users unable to login because of this. Apart from the OOM that we should anyway fix, we should monitor keystone anyway. Looking at icinga I see that we do...
[11:45:28] Labs: Monitor openstack components - https://phabricator.wikimedia.org/T89344#1033895 (akosiaris) NEW
[14:27:15] Labs: Move wikitech web interface to a dedicated server - https://phabricator.wikimedia.org/T88300#1034214 (coren) T89266 tracks adding memory to the server to reduce the probability of OOM, so this can indeed be closed.
[14:32:42] Project-Creators, Wikimedia-Labs-wikistats, MediaWiki-extensions-OpenStackManager, Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface, Labs, Labs-Vagrant, Tool-Labs-tools-Article-request, Tool-Labs, Wikimedia-Labs-Other, Beta-Cluster, Wikimedia-Labs-extdist, Wikimedia-Labs-General: Labs' Phabricator...
[14:40:07] Labs, Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1034230 (coren) Inspection of the networking stack's live state show that @yuvipanda's tweak to the size of connection tracking helps a lot for the typical case, but remains vulnera...
[15:23:34] Labs, Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1034297 (coren) p:Unbreak!>High (The above patch is hotfixed on labnet1001) This stabilizes things, and DNS resolves properly in Labs at this time. Still needs monitoring, a...
[15:38:45] Wikimedia-Labs-Other, Project-Creators, Wikimedia-Labs-General: Labs-General vs Labs-Other - https://phabricator.wikimedia.org/T87372#1034344 (Aklapper)
[15:38:48] Project-Creators, Wikimedia-Labs-wikistats, MediaWiki-extensions-OpenStackManager, Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface, Labs, Labs-Vagrant, Tool-Labs-tools-Article-request, Tool-Labs, Wikimedia-Labs-Other, Beta-Cluster, Wikimedia-Labs-extdist, Wikimedia-Labs-General: Labs' Phabricator...
[18:17:57] !log nginx recreate instance nginx-dev2 with ubuntu-14.04-trusty image
[18:18:01] Logged the message, Master
[19:47:09] can I "upgrade" an instance from m1.small to m1.medium?
[19:58:48] werdna: Not really.
[20:08:29] guess I'll just copy everything over
[20:10:58] I'm applying the labs-vagrant role to my new 'reflex2' host, but it doesn't seem to be taking even if I force a puppet run
[20:11:37] werdna: Meaning /vagrant hasn't been provisioned for you?
[20:12:07] bd808: huh, /vagrant does exist, but > mysql does nothing
[20:12:07] werdna@reflex2:~$ mysql
[20:12:07] The program 'mysql' can be found in the following packages:
[20:12:20] so since I saw no puppet output, assumed it hadn't been provisioned
[20:12:23] but yes, there is a /vagrant
[20:12:27] Try running `labs-vagrant provision`
[20:12:43] got it, thanks
[20:13:09] except this one: Error: invalid byte sequence in US-ASCII at /vagrant/puppet/modules/hhvm/manifests/init.pp:1 on node reflex2.eqiad.wmflabs
[20:13:13] had it last time, I don't remember why
[20:13:17] I guess I need to use sudo
[20:13:30] hmmm... it should take care of that
[20:13:40] nope, still doesn't work under sudo
[20:13:55] yuvi fixed that one for me last time but I don't remember how
[20:14:04] werdna: Is this a Trusty host or Precise?
[20:14:09] trusty
[20:15:02] trusty "should" work fine. I setup a labs-vagrant host on friday
[20:15:28] cheated by removing some binary characters from init.pp comments
[20:15:34] I guess my locale is wrong
[20:16:17] echo $LANG : en_US.UTF-8
[20:16:23] That works for me
[20:16:49] Sounds like a semi-common puppet problem -- https://ask.puppetlabs.com/question/3241/invalid-byte-sequence-in-us-ascii-when-automating-puppet/
[20:17:11] en_US.UTF-8 me too
[20:17:19] huh
[20:21:19] and now I get this one: Notice: /Stage[main]/Mediawiki/Php::Composer::Install[/vagrant/mediawiki]/Exec[composer-install--vagrant-mediawiki]/returns: [Thu Feb 12 20:20:27 2015] [hphp] [21947:7f17bd292bc0:0:000001] [834dc6:835050:9ca744:9caa13:9cf12a:9d070f:a6e441:856a5d:7f17b4f26ec5:909e04]
[20:21:19] Notice: /Stage[main]/Mediawiki/Php::Composer::Install[/vagrant/mediawiki]/Exec[composer-install--vagrant-mediawiki]/returns: Fatal error: unexpected St13runtime_error: locale::facet::_S_create_c_locale name not valid
[20:21:19] Error: composer install --optimize-autoloader --prefer-dist returned 255 instead of one of [0]
[20:21:29] which looks related
[20:21:44] Labs, hardware-requests, ops-eqiad, operations: Can virt1000 take more ram? - https://phabricator.wikimedia.org/T89266#1035032 (Cmjohnson) Great News! The old cp box that just broke (idrac related) has 4 sticks of 8GB RAM that will work. Let me know when you want to add/swap.
[20:22:41] hhvm died when running something. Let me peek at what that code does
[20:22:58] bd808: https://stackoverflow.com/questions/19100708/mongodb-mongorestore-failure-localefacet-s-create-c-locale-name-not-valid
[20:23:03] bd808: that looks like the problem
[20:23:08] I did echo $LC_ALL and it was empty
[20:23:45] so setting LC_ALL and rerunning works
[20:24:23] werdna: cool
[20:24:31] sheesh, now what
[20:24:58] Notice: /Stage[main]/Role::Labs_initial_content/Mediawiki::Import_dump[labs_privacy]/Exec[import_dump_labs_privacy]/returns: [0e863277] [no req] Exception from line 92 of /srv/vagrant/mediawiki/includes/jobqueue/JobQueueRedis.php: Non-daemonized mode is no longer supported. Please install the mediawiki/services/jobrunner service and update $wgJobTypeConf as needed.
[20:29:07] werdna: I'm seeing the same error on an instance I'm working on right now
[20:29:15] looking into it
[20:29:35] I created a file in settings.d with the default $wgJobTypeConf to work around
[20:30:53] $wgJobTypeConf['default'] is defined in /vagrant/LocalSettings.php
[20:31:28] Does it have bad settings for the latest MW code?
[20:31:54] I think it's a race condition
[20:32:17] that is my guess
[20:32:19] though, not sure
[20:32:50] That message throws when "if ( empty( $params['daemonized'] ) ) {"
[20:33:08] So it looks like the defaults need to be updated somehow
[20:33:16] yeah, I guess so
[20:33:20] * bd808 looks for the core change
[20:34:05] boom! https://gerrit.wikimedia.org/r/#/c/186410/
[20:34:14] yeah, I found it googling for the error message
[20:34:32] I'll make an mw-vagrant patch
[20:34:43] will be blowing things up for lots of users
[23:16:21] Labs, Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1035592 (scfc) Slightly related, whenever OpenStack's dnsmasq is manually restarted, there is a short interval where various stuff complains because the sole DNS server isn't there....
[23:39:25] Labs, Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1035664 (mmodell) +1 for a second DNS server.
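
Note on the 20:23 locale fix: the log shows $LANG set to en_US.UTF-8 and $LC_ALL empty, and reports that setting LC_ALL let the provisioning run get past the "locale::facet::_S_create_c_locale name not valid" error. A minimal shell sketch of that step follows. The helper name ensure_lc_all is hypothetical, and falling back to $LANG (or to en_US.UTF-8) is an assumption, since the log does not say which value was actually used.

```shell
#!/bin/sh
# Sketch only: give LC_ALL a value when it is unset or empty.
# Assumption: reuse $LANG if present, otherwise en_US.UTF-8
# (the $LANG value quoted in the log). The log does not record
# the value that was really set.
ensure_lc_all() {
    if [ -z "${LC_ALL:-}" ]; then
        LC_ALL="${LANG:-en_US.UTF-8}"
        export LC_ALL
    fi
    # Print the effective value so the caller can confirm it.
    printf '%s\n' "$LC_ALL"
}

ensure_lc_all
# Then rerun provisioning from the same shell, for example:
#   labs-vagrant provision
```

An LC_ALL that is already set is left untouched, so the sketch only changes the empty case described in the log.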