[00:39:49] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1010307 (Springle) The title refers to 'misc' which for production shards usually means m[123]. Unless it refers to a new box procured from the misc pool? My understanding is that wikitech was to be remain largely sep...
[00:51:21] Labs: New disk partition scheme for labs instances - https://phabricator.wikimedia.org/T87003#1010334 (yuvipanda) How is this resolved? Did Andrew build the new images and make them available via glance?
[00:52:00] Are there known issues with labs that make it extremely slow to interact with instances? I tried over a dozen different instances but it takes like 2-5 seconds (!) for an average character to appear in the terminal
[00:52:14] I'm in the office, ethernet hardwired in, wifi disabled
[00:58:24] Also taking like 10 minutes to rm a simple file
[00:59:02] Krinkle: I think it has to be a local issue for you - things are working for me and I’m on a fairly crappy shared wifi connection at a coffee house
[00:59:33] Maybe something perverse is happening w/routing?
[00:59:48] If you ping a labs instance do you have a lot of packet loss?
[01:05:59] PROBLEM - Free space - all mounts on tools-exec-14 is CRITICAL: CRITICAL: tools.tools-exec-14.diskspace.root.byte_percentfree.value (<11.11%)
[01:07:34] guenthermi, whoever you are, I’m about to delete your dump files in /tmp on tools-exec-14 because of ^
[01:30:56] RECOVERY - Free space - all mounts on tools-exec-14 is OK: OK: All targets OK
[02:04:03] operations, Labs: MySQL on wikitech keeps dying - https://phabricator.wikimedia.org/T88256#1010431 (Andrew) Aw, when the system didn't go down today I thought I might have fixed it with https://gerrit.wikimedia.org/r/#/c/188263/ but now I see it was most likely Sean's work. 'novaold' can almost certainly go....
[02:14:28] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1010434 (Springle) andrewbogott clarified on IRC that this only pertains to the wiki db itself. I just assumed it meant everything currently on virt1000, including openstack stuff. Moving just the wiki db over sounds...
[04:09:34] operations, Labs: MySQL on wikitech keeps dying - https://phabricator.wikimedia.org/T88256#1010548 (Andrew) a:Andrew
[04:42:53] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1010584 (Andrew)
[04:44:00] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1008962 (Andrew) Sam, can you comment on what db host is best for this? Sean is concerned that we have yet to host a wiki on m[123]. Also, will this have upgrade implications?
[04:53:25] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1010594 (Springle) Wikitech ("labswiki") uses the text table heavily, which is ~20G. It feels like it should stay separate from normal wikis or confirm properly and for eg, use external storage. If we keep it separate...
[05:08:00] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1010610 (Springle) Also, how will this affect the dumps? Snapshot hosts have been trying to connect to virt1000 for a while now since labswiki appeared, but have been blocked by the firewall. They **would** be allowed...
[05:18:22] (PS1) Mattflaschen: Add MoodBar, WikiLove to #wikimedia-collaboration [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/188302
[05:21:44] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1010614 (Springle) Regarding security, virt1000 db current has connections from virt1000 and labnet1001. Will more hosts need to connect? Which ones? Is the list acceptable for the current production network security?
[05:22:59] (PS1) Mattflaschen: Add MoodBar to wikimedia-collaboration [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/188303
[05:25:49] (CR) Legoktm: [C: 2] Add MoodBar to wikimedia-collaboration [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/188303 (owner: Mattflaschen)
[05:26:07] (Merged) jenkins-bot: Add MoodBar to wikimedia-collaboration [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/188303 (owner: Mattflaschen)
[05:39:28] !log tools.wikibugs Updated channels.yaml to: 9054845f4a69a7364f5270e2ada574f696e4f70f Add MoodBar to wikimedia-collaboration
[05:39:33] Logged the message, Master
[06:05:51] !log tools.wikibugs legoktm: Deployed 9054845f4a69a7364f5270e2ada574f696e4f70f Add MoodBar to wikimedia-collaboration wb2-phab
[06:05:55] Logged the message, Master
[06:08:19] Wikibugs, Echo: wikibugs test bug - https://phabricator.wikimedia.org/T1152#1010644 (Legoktm)
[06:08:40] Wikibugs: wikibugs test bug - https://phabricator.wikimedia.org/T1152#19999 (Legoktm)
[06:32:15] Wikimedia-Labs-Infrastructure: "The specified resource does not exist" when you try to configure an instance and are not a projectadmin - https://phabricator.wikimedia.org/T67379#1010668 (Nemo_bis) p:Triage>Normal
[07:40:20] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1010735 (Reedy) We have a script for moving stuff to external storage. Probably not tested recently though. Maybe we could run a neutered version that doesn't update the text table to make sure it migrates properly....
[07:47:24] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1010747 (Joe) I think we should for sure keep labswiki separated from production wikis. Not sure if we should move things to ES either, I'd like wikitech to be treated as a separated entity from the rest of prod as mu...
[07:50:03] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1010759 (Joe) We should definitely disable dumps if they are not for backup purposes as well.
[07:55:48] YuviPanda|zzz: if it expected that applying role::labs::lvm::srv erases /srv entirely?
[07:55:52] * is it
[08:12:47] operations, Labs: MySQL on wikitech keeps dying - https://phabricator.wikimedia.org/T88256#1010779 (Springle) novaold errors have been fixed.
[08:13:36] Nemo_bis: yeah. it is recoverable if needed, though
[08:17:08] YuviPanda|zzz: ok; if it takes minutes rather than hours, it would be nice for us to recover /srv/mediawiki
[08:17:28] Nemo_bis: sure. which instance is this?
[08:17:33] Otherwise we can recreate the few things which were not in version control
[08:17:39] YuviPanda: ttmserver-mediawiki01:
[08:17:40] it does take minutes, yeah
[08:17:42] moment
[08:18:04] YuviPanda: mind you, I'm not sure puppet is done
[08:18:23] You might want to check that https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=configure&project=ttmserver&instanceid=456b8070-f46b-4a4e-a2bf-7a60ce6ceb92&region=eqiad corresponds to what you see in the instance
[08:19:17] Or just copy the data to /data/scratch or whatever and let me put it in place, whatever takes less for you
[08:19:26] Nemo_bis: I copied the old /srv/mediawiki to the new /srv
[08:19:51] Nemo_bis: check it out?
[08:21:15] YuviPanda: ls: cannot open directory mediawiki/: Permission denied
[08:21:27] Nemo_bis: oh, it's still copying
[08:21:28] ok, easy to fix
[08:21:29] must've been huge
[08:21:34] A few GBs
[08:21:38] right
[08:21:52] Ok, I'll wait a bit to avoid doing damage
[08:21:53] instance iops have been terrible on some hosts lately
[08:22:04] Nemo_bis: yeah, I'll let you know when it's done
[08:22:22] RECOVERY - Puppet staleness on tools-exec-07 is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:24:17] Nemo_bis: strange. it's still copying
[08:25:43] disk 100 % busy around 10 MB/s of IO activity
[08:25:54] hmm, that sounds low
[08:42:40] Looks done and working. Thanks!
[08:43:02] Nemo_bis: whee! cool
[08:43:12] Nemo_bis: should file a bug about low local disk IO performance though. can you?
[08:44:45] Ok
[08:45:11] Part of the issue seems to be that it's swapping to disk even though there is free memory, just because mysql reserved a lot of virtual memory
[08:45:16] aaaah
[08:45:21] that might've been the underlying cause
[08:51:48] YuviPanda: that said, /var often capped at some 6 MB/s write
[08:51:58] At least according to "atop"
[09:01:38] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<11.11%)
[09:11:37] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK
[09:31:44] Labs: New disk partition scheme for labs instances - https://phabricator.wikimedia.org/T87003#1010937 (yuvipanda) Resolved>Open
[11:50:30] Tool-Labs, Labs: Fix Labs' PAM config mess - https://phabricator.wikimedia.org/T85910#1011132 (faidon) p:Triage>High
[11:50:44] Tool-Labs, Labs: Fix Labs' PAM config mess - https://phabricator.wikimedia.org/T85910#1011134 (faidon) a:coren
[11:51:27] Tool-Labs-tools-Other: Move Gerrit reports crontabs to Tools - https://phabricator.wikimedia.org/T88384#1011135 (Nemo_bis)
[12:24:12] Labs: New disk partition scheme for labs instances - https://phabricator.wikimedia.org/T87003#1011157 (yuvipanda) I'm building the glance images for precise and trusty atm.
[13:51:41] hey, anybody around? Having some trouble
[13:51:53] Going to Special:NovaInstance, I see the "design" project heading but no instances show up
[13:52:06] likewise at https://wikitech.wikimedia.org/wiki/Special:NovaProxy
[13:52:18] and then running labs-vagrant provision gives me this error on reflex.eqiad.wmflabs
[13:52:23] Error: invalid byte sequence in US-ASCII at /vagrant/puppet/modules/hhvm/manifests/init.pp:1 on node reflex.eqiad.wmflabs
[13:58:46] werdna: for the first, log out and back in
[13:58:48] that would work
[14:01:23] YuviPanda: Did that, and it worked. Thanks!
[14:02:04] YuviPanda: any thoughts on the second issue?
[14:04:40] werdna: good question. I'm not sure.
[14:15:53] Labs: New disk partition scheme for labs instances - https://phabricator.wikimedia.org/T87003#1011344 (yuvipanda)
[14:16:35] Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1011345 (yuvipanda) Update from @akosiaris - he said he will try to get to this after Tuesday, which is today!
[14:17:04] YuviPanda: also not having much luck with the site itself: http://reflex.wmflabs.org/w/index.php
[14:17:21] werdna: hmm, maybe I can take a look. looking
[14:18:03] I guess the two issues could be related
[14:18:36] werdna: hmm, I just ran labs-vagrant provision and it seems to work ok
[14:18:50] werdna: the web server won't work until labs-vagrant provision completes
[14:22:30] YuviPanda: also tried running it as root but I don't know
[14:22:53] werdna: hmm, so I did 'sudo su' and then ran labs-vagrant provision
[14:22:57] I wonder if that made any difference?
[14:23:01] dunno
[14:23:04] *shrug*
[14:23:09] werdna: but it's running now...
[14:23:09] let me know when it finishes
[14:23:25] werdna: labs-vagrant is a very useful hack, but still a hack. Need to spend some time tidying it up properly...
[14:25:59] working now :D
[14:26:22] werdna: yup :)
[14:30:17] YuviPanda: <3 thanks
[14:30:27] Tool-Labs-tools-Other: Move Gerrit reports crontabs to Tools - https://phabricator.wikimedia.org/T88384#1011373 (MZMcBride) >>! In T88384#1011125, @Nemo_bis wrote: > MZ, as you assigned this to me... I have created a gerrit-reports tool and cloned the code there; at your convenience, for maximum smoothness,...
[14:30:46] werdna: :) yw!
[14:33:40] Tool-Labs-tools-Other: Move Gerrit reports crontabs to Tools - https://phabricator.wikimedia.org/T88384#1011393 (MZMcBride) Oh, sorry, I'm still waking up. The configuration portion is probably what's actually needed of me, at least to continue using the same bot account. I'll take a look.
[16:12:14] Staging, Labs: Create staging project - https://phabricator.wikimedia.org/T88439#1011648 (yuvipanda) NEW a:yuvipanda
[16:15:02] !log staging created project, added me and ^d as projectadmins
[16:15:04] Logged the message, Master
[16:23:33] Tool-Labs: Shorten update interval of lighttpd error logs - https://phabricator.wikimedia.org/T87562#1011689 (jkroll) valhallasw: works for me, thanks.
[16:28:46] $ qdel 4801445 -- job 4801445 is already in deletion
[16:29:08] seems to have died finally
[16:29:30] !log tools-jouncebot Killed and restarted jouncebot
[16:29:30] tools-jouncebot is not a valid project.
[16:29:37] !log tool-jouncebot Killed and restarted jouncebot
[16:29:38] tool-jouncebot is not a valid project.
[16:30:45] YuviPanda: what's the !log syntax for a tool?
[16:31:30] !log tools.jouncebot Killed and restarted jouncebot
[16:31:32] Logged the message, Master
[16:31:38] success!
[18:11:40] Labs: New disk partition scheme for labs instances - https://phabricator.wikimedia.org/T87003#1012018 (yuvipanda) Open>Resolved So, precise and trusty images seem to work! I've deprecated the old images and made new ones. And there was much rejoicing!
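In the `!log` attempts between 16:29 and 16:31 above, the bot rejected `tools-jouncebot` and `tool-jouncebot` as project names and accepted `tools.jouncebot`. The following is a minimal sketch of that naming rule, assuming it generalizes to other tools as `tools.<toolname>`. The helper names `parse_log_command` and `tool_log_command` are hypothetical and are not taken from the bot's actual code.

```python
import re

# Hypothetical helpers; this is NOT the logging bot's real implementation.
# Assumption drawn from the log above: a tool's project is "tools.<toolname>".
_LOG_LINE = re.compile(r"^!log\s+(\S+)\s+(.+)$")


def parse_log_command(line):
    """Split '!log <project> <message>' into (project, message), or None."""
    match = _LOG_LINE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def tool_log_command(tool, message):
    """Build a !log line for a tool using the dotted 'tools.<tool>' form."""
    return "!log tools.{} {}".format(tool, message)


print(tool_log_command("jouncebot", "Killed and restarted jouncebot"))
print(parse_log_command("!log staging created project"))
```

Whether a given project name is valid is still decided by the bot itself; the sketch only reproduces the syntax that worked in this session.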
[18:20:25] Labs: Investigate / remove swap from labs instances - https://phabricator.wikimedia.org/T88450#1012034 (yuvipanda) NEW
[20:50:48] I have a problem with shell access to bastion, could anyone help me?
[20:52:10] what's the problem?
[20:54:48] zdzislaw_: we can help you but only if you tell us what the problem is :)
[20:56:12] in three days I try to login to bastion, every time I get " Server refused our key"
[20:56:30] You did upload it to wikitech?
[20:56:41] I checked the key on other servers - and it's ok
[20:56:53] yes I upload public key
[20:57:00] to wikitech
[20:57:03] zdzislaw_: what’s an example of another server?
[20:57:49] my university servers
[20:58:04] oh, so not any labs servers
[20:58:20] zdzislaw_: what is your username?
[20:58:28] So I am sure that the key is correct
[20:58:32] zdzislaw
[20:58:38] what does the following tell you? ssh -vvv bastion.wmflabs.org
[20:59:41] zdzislaw_: I think you don’t have a shell account. Looking, give me a minute
[20:59:44] I am using PuTTY (windows)
[21:00:17] View the event log?
[21:01:19] yes
[21:01:35] Event Log: Offered public key Incoming packet #0x7, type 51 / 0x33 (SSH2_MSG_USERAUTH_FAILURE) ....publickey. Event Log: Server refused our key
[21:02:35] zdzislaw_: something is wrong on our end, I’m investigating.
[21:06:07] Probably no new users have worked since the 31st.
[21:06:54] Yeah, we hit a magic number of users (5000) and the query isn’t working anymore :(
[21:07:23] we have 5000 users!?
[21:07:23] ok, thanks... I spend a loooooooot of time trying to login
[21:10:04] Reedy: for not-entirely-up-to-date stats, look down and to the right on the wikitech front page
[21:11:04] 4,837
[21:11:19] Isn't that just wiki user accounts?
[21:15:17] andrewbogott: is a chance to fix the problem today?
[21:16:31] zdzislaw_: try now?
[21:16:46] ok
[21:17:29] works! thanks!
[21:17:44] andrewbogott: thank you
[21:18:05] cool. Sorry for the trouble.
[21:33:01] (PS1) BearND: Updated Android SDK and gitignore [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/188433
[22:13:21] (Abandoned) BearND: Updated Android SDK and gitignore [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/188433 (owner: BearND)
[22:15:37] (PS1) BearND: Update Android SDK version [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/188478
[22:26:21] (PS2) BearND: Update Android SDK version [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/188478
[22:26:46] (CR) BearND: [C: 2] Update Android SDK version [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/188478 (owner: BearND)
[22:52:53] in the annual report, the "How does it work?" pop-up explaining the edit counter, has an _Edit Counter homepage_ link to http://tools.wmflabs.org/wmcounter/ says "No webservice. The URI you have requested, /wmcounter/, is not currently serviced." http://tools.wmflabs.org/?tool=wmcounter also links to that URL as "(Web interface)". Is it a wrong URL or a transient problem?
[22:56:04] someone with access to the tool would need to turn it on, I think
[22:56:30] so Emijrp or a tools admin
[22:57:24] Krenair: thanks. I'll alert Emijrp. Or maybe the wmcounter tool doesn't have a Web interface and this is expected. Is this the IRC channel for tools admin?
[22:58:00] this channel is good for almost all labs stuff, including tools
[23:38:39] (PS1) BearND: Don't need ANDROID_BUILD_TOOLS env var [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/188485
[23:39:03] (CR) BearND: [C: 2] Don't need ANDROID_BUILD_TOOLS env var [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/188485 (owner: BearND)
[23:59:29] (CR) BearND: [V: 2] Update Android SDK version [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/188478 (owner: BearND)
[23:59:53] (CR) BearND: [V: 2] Don't need ANDROID_BUILD_TOOLS env var [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/188485 (owner: BearND)