[06:23:21] 06Labs, 10DBA, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3299408 (10Marostegui) (As I was expecting) I have seen some issues when importing compressed tablespaces. I am troubleshooting it. [06:41:14] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1426 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [07:16:14] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1426 is OK: OK: Less than 1.00% above the threshold [0.0] [07:23:56] Hi all. I'm running a Python 3 web app via Kubernetes. The web app needs another Python process, a Celery worker, running 24/7 for it to work. I'm having an issue with the latter: the Python venv shell seems to get killed every few hours. How can I have a constantly running process in a virtualenv on Kubernetes, accessible to the web app? [07:40:31] 06Labs, 10Tool-Labs: A tool queries urwiki recentchanges 6 times per second - https://phabricator.wikimedia.org/T166531#3299446 (10hashar) I am impressed how fast you managed to find the script! If at all possible, can you hack the script so it sleeps whenever nothing has changed? ``` --- watch-orig.py 2017-0... [07:58:21] 06Labs, 10DBA, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3299482 (10Marostegui) While troubleshooting, db1095 gave a nasty error and I do not trust it much now. I have read a few posts that compression + transporta... [08:10:29] (03PS1) 10Lokal Profil: Add address to handled wikidata fields [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/356148 [08:11:53] (03PS1) 10Lokal Profil: Add address to handled wikidata fields.
[labs/tools/heritage] (wikidata) - 10https://gerrit.wikimedia.org/r/356149 [08:13:24] (03Abandoned) 10Lokal Profil: Add address to handled wikidata fields [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/356148 (owner: 10Lokal Profil) [08:58:49] (03PS1) 10Lokal Profil: Rename -lang argument -langcode [labs/tools/heritage] (wikidata) - 10https://gerrit.wikimedia.org/r/356152 (https://phabricator.wikimedia.org/T166528) [09:02:39] (03CR) 10Lokal Profil: "I have some other patches out addressing some of the issues encountered in this patch (such as the -lang parameter and there being no addr" [labs/tools/heritage] (wikidata) - 10https://gerrit.wikimedia.org/r/354961 (https://phabricator.wikimedia.org/T165988) (owner: 10Jean-Frédéric) [11:13:58] 06Labs, 13Patch-For-Review: Replace custom ec2id fact with facter's ec2 - https://phabricator.wikimedia.org/T86297#3299834 (10faidon) 05Open>03Resolved a:03faidon Fixed since 1a9f7b6573b2d85bd503ba95b10eb4b670692fe9, 2 years ago, I think :) [13:47:12] 06Labs, 10wikitech.wikimedia.org, 13Patch-For-Review: Move wikitech-static to Chicago - https://phabricator.wikimedia.org/T164271#3300063 (10Andrew) Actually migrating the VM turned out to be a nightmare, so I built a new install (running stretch) on https://wikitech-static-ord.wikimedia.org/wiki/Main_Page... [13:48:53] 06Labs, 10wikitech.wikimedia.org, 13Patch-For-Review: Move wikitech-static to Chicago - https://phabricator.wikimedia.org/T164271#3300078 (10Andrew) [13:48:55] 06Labs, 10wikitech.wikimedia.org: Set up external DNS record for wikitech-static - https://phabricator.wikimedia.org/T164290#3300077 (10Andrew) [13:49:12] 06Labs, 10wikitech.wikimedia.org: Set up external DNS record for wikitech-static - https://phabricator.wikimedia.org/T164290#3228802 (10Andrew) No point in doing this until the new server is in Chicago, otherwise we'll have to do it twice. 
[13:49:21] 06Labs, 10wikitech.wikimedia.org: Set up external DNS record for wikitech-static - https://phabricator.wikimedia.org/T164290#3300083 (10Andrew) p:05Triage>03Normal [13:49:43] 06Labs, 10wikitech.wikimedia.org, 13Patch-For-Review: Move wikitech-static to Chicago - https://phabricator.wikimedia.org/T164271#3228092 (10Andrew) p:05Triage>03Normal [13:50:58] 06Labs, 10wikitech.wikimedia.org, 05MW-1.30-release-notes (WMF-deploy-2017-05-30_(1.30.0-wmf.3)), 13Patch-For-Review: Disable/redirect instance instance management on wikitech - https://phabricator.wikimedia.org/T164875#3300091 (10Andrew) p:05Triage>03Normal [13:56:20] 06Labs: nova-fullstack is losing instances on creation - https://phabricator.wikimedia.org/T165555#3300129 (10Andrew) Still no failures. [14:10:16] andrewbogott hi, im getting this error when puppet runs [14:10:16] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed when searching for node 10.68.20.204: Failed to find 10.68.20.204 via exec: Execution of '/usr/local/bin/puppet-enc 10.68.20.204' returned 255: Invalid hostname 10.68.20.204 [14:10:20] on gerrit-test [14:10:29] it only just started happening [14:11:27] Hmm [14:11:28] strange [14:11:29] root@10:/home/paladox# ping 10.68.20.204 [14:11:36] it shows the number 10 instead of the host [14:12:57] 06Labs, 10DBA, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3300140 (10Marostegui) db1095 has been restored from backups and I have started replication there and on labsdb hosts and db1070 let them catchup. Probably... [14:15:51] or chasemp ^^ [14:16:29] paladox: I'll look [14:16:35] thanks [14:16:54] is that a new instance or one that's been around for a while? [14:18:13] it's been around for a while [14:18:19] and… did you mess with the hostname or anything? 
[14:19:01] Nope [14:19:29] I was trying to update zuul from https://gerrit.wikimedia.org/r/#/c/356181/ [14:19:45] 06Labs, 10Labs-Infrastructure, 10Tool-Labs: Rollout prometheus-node-exporter 0.14 in labs - https://phabricator.wikimedia.org/T166561#3300186 (10fgiunchedi) [14:24:27] 06Labs, 10Labs-Infrastructure: labvirt1006 super busy right now - https://phabricator.wikimedia.org/T165753#3300230 (10hashar) Graphs over 24 hours: [[https://grafana.wikimedia.org/dashboard/db/labs-capacity-planning?panelId=91&fullscreen&orgId=1&from=now-24h&to=now&var-labvirt=labvirt1006 | CPU % x 2 1 day m... [14:25:39] 06Labs, 10Labs-Infrastructure: labvirt1006 super busy right now - https://phabricator.wikimedia.org/T165753#3300236 (10hashar) [[ https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=labvirt1006&var-network=eth0&from=now-90d&to=now | Server board over 90 days ]] [14:28:39] andrewbogott works now, thanks :) [14:30:10] paladox: I'm glad you brought it up, I think that lots of VMs were about to fall off the network [14:30:19] Not sure what was broken, I restarted a bunch of the dhcp components and things got better [14:30:20] oh [14:30:25] please let me know if you see anything like that again! [14:30:29] ah :), thanks. [14:30:35] Ok. [14:30:44] Thanks for responding quickly and fixing it :) [14:45:20] Xelgen: someone else was working on a similar thing in the last few weeks. I don't think they have made a tutorial on it, but there is some info at https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Kubernetes#Kubernetes_continuous_jobs on how to run a custom continuous job on our Kubernetes cluster [14:46:00] * bd808 looks at irc logs to see if he can figure out who else was trying to get celery working [14:50:03] Xelgen: I think that Zhaofeng_Li was the other person looking into a celery worker to go with their k8s python app. Maybe the two of you can compare notes and add documentation to wikitech on how to make it work. 
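A continuous worker like the Celery process discussed above is typically expressed on Kubernetes as a Deployment, so the cluster restarts the pod whenever it dies instead of the process silently disappearing. A minimal, hypothetical sketch only — the image, tool name, and command below are placeholders, not the actual refill-api configuration mentioned later in the log:

```yaml
# Hypothetical sketch of a continuous Celery worker as a Kubernetes
# Deployment (apiVersion matches the k8s versions in use at the time).
# Image, names, and command are placeholders, not any real tool's config.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: celery-worker
spec:
  replicas: 1                 # keep exactly one worker pod running
  template:
    metadata:
      labels:
        name: celery-worker
    spec:
      containers:
        - name: worker
          image: example/celery-app:latest          # assumed image
          command: ["celery", "worker", "--app", "mytool.tasks"]
          workingDir: /data/project/mytool          # assumed tool home
```

With a spec like this, the worker is restarted by Kubernetes itself rather than depending on a long-lived shell session surviving inside the cluster.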
[14:53:32] Xelgen: the custom deployment that Zhaofeng_Li made is at /data/project/refill-api/refill.yaml . It looks like they created a deployment file that runs both the celery and webservice pods together. [14:54:51] "someday" we will have a proper platform as a service wrapper that will make these sorts of things easier, but that day is not going to be soon. We need some volunteers to help with that project if it is going to move forward in the foreseeable future. [15:29:07] 06Labs: Request increased quota (floating ip) for getstarted labs project - https://phabricator.wikimedia.org/T166324#3292471 (10Andrew) This is approved, I'll update the quota shortly. Make sure you limit project access so your certs don't leak :) [15:45:24] 06Labs: Make a WMCS 'clinic duty' doc page - https://phabricator.wikimedia.org/T166572#3300562 (10Andrew) [15:46:19] 06Labs, 15User-bd808: Make a WMCS 'clinic duty' doc page - https://phabricator.wikimedia.org/T166572#3300575 (10bd808) a:03bd808 [15:46:36] 06Labs, 15User-bd808: Make a WMCS 'clinic duty' doc page - https://phabricator.wikimedia.org/T166572#3300578 (10bd808) Start with notes that Madhu made during offsite [15:59:10] any labs folk around? I'd like to chat about T166203#3298177 [15:59:10] T166203: Upgrade facter to version 2.4.6 - https://phabricator.wikimedia.org/T166203 [16:01:28] labs has been upgraded i think [16:01:29] root@gerrit-test:/home/paladox# facter -v [16:01:29] 2.4.6 [16:01:37] volans ^^ [16:02:23] paladox: any >2 version should be good for now, but to avoid issues it should be in *all* labs instances, including trustys [16:03:00] Oh, i see, havent tested trusty. [16:13:29] volans shows the same for trusty [16:14:16] old one or newly started? [16:14:18] o/ [16:14:36] the new ones [16:14:40] oh [16:14:44] wait, it's an old instance [16:15:01] shows 2.4.6 [16:15:09] volans: andrewbogott often handles cross-project forced upgrade stuff for us in the VMs. 
[16:15:40] He can probably at least help you figure out what percentage is up to date and what needs intervention [16:15:50] bd808: great, thanks [16:16:00] but make me a phab ticket please :) [16:16:22] cannot T166203 works andrewbogott ? [16:16:24] T166203: Upgrade facter to version 2.4.6 - https://phabricator.wikimedia.org/T166203 [16:16:40] ok [16:16:55] so to keep the previous discussion too, I can add you and labs [16:17:12] actually faidon already did add labs :D [16:19:20] andrewbogott should that https://phabricator.wikimedia.org/phame/post/view/14/manage_instance_on_horizon_only/ be public or All Users? [16:19:34] It seems only logged in users will be able to view that. [16:19:51] it's whatever the defaults are for that blog I think… I'll look thugh [16:20:00] ok thanks [16:20:52] paladox: it was set to "all users" but I just changed it to "public" [16:21:01] thanks bd808 :) [16:21:19] bd808, andrewbogott: the next step in prod being T166372 where we're ready to do it with a temporary "if prod", see gerrit 356062 [16:21:19] T166372: Puppet: test non stringified facts across the fleet - https://phabricator.wikimedia.org/T166372 [16:21:49] just to give some context ;) [16:23:40] PROBLEM - Puppet errors on tools-exec-1437 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [16:27:43] andrewbogott: generally speaking, what can we use to run a command across all labs instances? [16:29:32] 06Labs, 06Operations: Upgrade facter to version 2.4.6 - https://phabricator.wikimedia.org/T166203#3300848 (10Volans) a:05Volans>03None [16:29:37] paravoid: I think there's a clush setup but I haven't used it. Last time I did it it was with salt from labcontrol. [16:29:37] clush I *think*. Salt will kind of work too except its salt and some projects may have local salt masters [16:29:49] I thought clush was tools-only? [16:32:27] paravoid: when c.hase is around tomorrow I think he can confirm or deny clush across everything. 
I think that it is possible but not trivial today outside of the tools project. [16:32:54] 06Labs, 06Operations, 10ops-eqiad: Degraded RAID on labstore1003 - https://phabricator.wikimedia.org/T165220#3300869 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson Resolving this [16:37:05] root@labcontrol1001:~# salt map.oxygenguide.eqiad.wmflabs cmd.run 'lsb_release -a' [16:37:08] map.oxygenguide.eqiad.wmflabs: No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 12.04.4 LTS Release: 12.04 Codename: precise [16:37:11] sorry, that was mangled a little bit [16:37:14] but TL;DR [16:37:21] map.oxygenguide.eqiad.wmflabs is online and running precise [16:38:18] huggle.huggle.eqiad.wmflabs doesn't have puppet installed whatsoever [16:39:03] rc puppet 3.8.5-2~bpo8 all configuration management system, [16:39:06] un facter (no description available) [16:39:07] (etc) [16:39:18] I can't ssh to map.oxygenguide.eqiad.wmflabs, but salt works [16:40:00] that project doesn't even exist...? [16:40:05] and then we have 13+9+6 (jessie+stretch+trusty) systems running facter 2.4.6 [16:40:23] 290 jessie running 2.2, 238 trusty running 1.7.5 [16:40:51] bd808: no idea, I'm running commands against it just fine though :) [16:41:10] map.oxygenguide.eqiad.wmflabs: [16:41:11] eth0 Link encap:Ethernet HWaddr fa:16:3e:39:14:27 [16:41:11] inet addr:10.68.16.181 Bcast:10.68.23.255 Mask:255.255.248.0 [16:42:05] instance seems to be up but not accepting my root ssh key [16:42:10] yup [16:42:15] salt works :) [16:42:23] you can add your via salt :D [16:42:30] paravoid: open a ticket plz? [16:43:04] for which of the two? [16:43:31] map.oxygenguide.eqiad.wmflabs. 
I'm poking around in the huggle instance now [16:45:13] *sigh* looks like puppet was deliberately uninstalled on huggle [16:47:19] yeah how else would it get uninstalled :) [16:50:54] !log huggle Puppet removed from huggle.huggle.eqiad.wmflabs on 2017-05-19 13:17:32; this breaks Cloud Services management abilities [16:50:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Huggle/SAL [16:52:14] petan: not having Puppet installed on huggle.huggle.eqiad.wmflabs is a problem [16:52:43] 06Labs, 10Tool-Labs: A tool queries urwiki recentchanges 6 times per second - https://phabricator.wikimedia.org/T166531#3301013 (10Framawiki) p:05Triage>03High I'm just an user of tools labs, I can read but can't edit. Still running. Perhaps @yuvipanda or an other admin can stop this job ? First, just `qde... [16:57:56] 06Labs, 10Huggle: huggle.huggle.eqiad.wmflabs does not have puppet installed - https://phabricator.wikimedia.org/T166588#3301060 (10bd808) [16:58:42] RECOVERY - Puppet errors on tools-exec-1437 is OK: OK: Less than 1.00% above the threshold [0.0] [17:03:13] PROBLEM - Host tools-worker-1006 is DOWN: CRITICAL - Host Unreachable (10.68.17.89) [17:07:47] 06Labs, 10Huggle: huggle.huggle.eqiad.wmflabs does not have puppet installed - https://phabricator.wikimedia.org/T166588#3301098 (10bd808) Once detached from regular Puppet runs, a virtual machine is in a state where the #cloud-services-team cannot reliably support it. [17:14:04] !log tools.wiki-retweet-bot Removed $HOME/service.manifest. webservice was stuck in restart loop. 
(T163355) [17:14:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wiki-retweet-bot/SAL [17:14:07] T163355: webservice stop says service not running but service.manifest not cleared - https://phabricator.wikimedia.org/T163355 [17:15:06] !log tools restarted catmon tool to clean up stray files [17:15:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:15:17] !log tools depooled, rebooted, and repooled tools-exec-1412 [17:15:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:15:24] !log tools restarted croptool to clean up stray files [17:15:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:15:31] !log tools Drained and rebooted tools-worker-1006 [17:15:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:16:08] !log tools Killed tool videoconvert on tools-exec-1440 in debugging labstore disk space issues [17:16:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:17:04] !log tools.wiki-retweet-bot Cleaned up error.log and service.log [17:17:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wiki-retweet-bot/SAL [17:18:41] RECOVERY - Host tools-worker-1006 is UP: PING OK - Packet loss = 0%, RTA = 1.21 ms [17:20:56] !log tools Uncordoned tools-worker-1006 [17:21:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:21:35] Krinkle: I assume I've reported this in the wrong place? https://github.com/Krinkle/intuition-web/issues/2 [17:21:51] hopefully simple! 
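The "stray files" and "leaked files" cleanups in the log entries above work because an unlinked file's disk space is reclaimed only when the last process holding it open exits; restarting the tool closes those file descriptors. A minimal Linux demonstration of the effect (temporary paths, assumes /proc is available):

```shell
# Why restarting frees space: an unlinked file stays allocated while any
# process still holds it open. Minimal Linux demo using /proc; all paths
# are temporary.
tmpfile=$(mktemp)
tail -f "$tmpfile" & pid=$!   # background process holding the file open
sleep 1                       # give tail time to open the file
rm "$tmpfile"                 # name removed, but the inode is still live
# open fds of the holder show the unlinked file marked "(deleted)"
deleted_fds=$(ls -l /proc/"$pid"/fd 2>/dev/null | grep -c deleted)
kill "$pid"                   # once closed, the space is actually reclaimed
```

The same state is what `lsof | grep deleted` surfaces when hunting for which process to restart.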
[17:22:45] !log tools restarting vltools to clean up leaked files [17:22:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:29:01] !log tools restarting ytcleaner webservice to clean up leaked files [17:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:31:11] !log tools restarting onetools to clean up file leaks [17:31:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:32:38] !log restarting excel2wiki to clean up file leaks [17:32:38] andrewbogott: Unknown project "restarting" [17:36:43] !log tools restarting idwiki-welcome in kenrick95bot to free up leaked files [17:36:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:36:50] !log tools restarting excel2wiki to clean up file leaks [17:36:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:37:25] bd808: I didn't file that task and I'm about to go/late, sorry [17:37:26] maybe later [17:37:53] also, would be sweet if you guys upgraded facter across the lab fleet, should be just one salt command really + babysitting [17:37:59] but it's already rolled out in prod without any issues [17:38:12] I can do it, but I thought you'd prefer it if I didn't :) [17:38:18] ttyl! [17:41:36] !log phabricator upgraded phab-tin to stretch a few weeks ago (jessie -> stretch) [17:41:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL [17:44:15] andrewbogott: hi, can you take a look at https://phabricator.wikimedia.org/T166531 (or another admin)? 
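The facter guidance earlier in the log ("any >2 version should be good") boils down to a major-version check. A hedged pure-shell sketch of that check; the version strings are hard-coded here to keep it self-contained, where in practice the input would come from `facter --version`:

```shell
# Sketch of the "any >2 version should be good" facter check discussed
# above. Pure-shell major-version test; feed it $(facter --version).
facter_ok() {
  major=${1%%.*}      # keep everything before the first dot
  [ "$major" -ge 2 ]  # 2.x and up is fine; trusty's stock 1.7.5 is not
}
facter_ok 2.4.6 && new_enough=yes || new_enough=no
facter_ok 1.7.5 && old_ok=yes || old_ok=no
```

This is only the per-host predicate; rolling it out fleet-wide is the salt/clush question debated elsewhere in the log.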
[17:52:18] !log git upgrading jenkins-slave-01 to stretch (jessie -> stretch upgrade using this guide https://linuxconfig.org/how-to-upgrade-debian-8-jessie-to-debian-9-stretch) [17:52:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL [17:55:56] wow that's a lot of packages to upgrade [17:55:56] APT WARNING: 1747 packages available for dist-upgrade (0 critical updates). warnings detected. [17:58:46] Tools admins: this task needs an action https://phabricator.wikimedia.org/T166531 [18:15:41] !log tools restarted robokobot virgule to free up leaked files [18:15:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [18:27:09] 06Labs, 10Tool-Labs: A tool queries urwiki recentchanges 6 times per second - https://phabricator.wikimedia.org/T166531#3299145 (10Andrew) I have applied hashar's patch, and restarted the tool. I will also email the maintainer and direct them to this discussion. [18:27:56] !log git running apt-get dist-upgrade on jenkins-slave-01 [18:27:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL [18:29:55] Sagan, done. [18:30:05] andrewbogott: thx :) [18:30:15] framawiki: cc ^ task done [18:30:33] I applied the patch blindly so no guarantees that I didn't totally break the tool… I emailed the maintainer to double-check. [18:31:24] 06Labs: Request increased quota (floating ip) for getstarted labs project - https://phabricator.wikimedia.org/T166324#3301509 (10Andrew) a:03Andrew [18:32:30] 06Labs, 07Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#3301525 (10Andrew) [18:32:32] 06Labs: Request increased quota (floating ip) for getstarted labs project - https://phabricator.wikimedia.org/T166324#3292471 (10Andrew) 05Open>03Resolved ok, ip assigned.
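The "1747 packages available for dist-upgrade" figure in the APT warning above is a count of upgrade candidates. A sketch of deriving such a count from apt-get's simulation output; the sample lines (with hypothetical version strings) are inlined so the snippet is self-contained, where a real run would pipe `apt-get -s dist-upgrade` instead:

```shell
# Counting dist-upgrade candidates, as in the APT warning above.
# apt-get's simulation prints one "Inst <pkg> ..." line per package it
# would upgrade; sample output inlined, version strings hypothetical.
sample='Inst libc6 [2.19-18+deb8u9] (2.24-11+deb9u1 Debian:9.0/stable [amd64])
Conf libc6 (2.24-11+deb9u1 Debian:9.0/stable [amd64])
Inst coreutils [8.23-4] (8.26-3 Debian:9.0/stable [amd64])'
# real run: upgrades=$(apt-get -s dist-upgrade | grep -c '^Inst ')
upgrades=$(printf '%s\n' "$sample" | grep -c '^Inst ')
```

The simulation is also a cheap dry run before committing to the jessie -> stretch dist-upgrade logged above.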
[18:33:34] 06Labs: Request creation of jessie-stretch labs project - https://phabricator.wikimedia.org/T165633#3272271 (10Andrew) We discussed this but aren't convinced that it's useful to have a dedicated project for this -- you can do upgrade attempts within existing projects can't you? [18:35:27] Thanks andrewbogott ! [18:35:57] 06Labs: Request creation of jessie-stretch labs project - https://phabricator.wikimedia.org/T165633#3301545 (10Paladox) Yep. But may have problems with certain roles which testing before could fix that. Some of the instances have to be like prod to test changes on. [18:37:09] and Sagan :) [18:42:12] 06Labs: Instance map.oxygenguide.eqiad.wmflabs somehow orphaned on the cluster - https://phabricator.wikimedia.org/T166602#3301566 (10bd808) [18:43:46] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Other: A tool queries urwiki recentchanges 6 times per second - https://phabricator.wikimedia.org/T166531#3301579 (10bd808) [18:44:51] 06Labs: Instance map.oxygenguide.eqiad.wmflabs somehow orphaned on the cluster - https://phabricator.wikimedia.org/T166602#3301566 (10Paladox) It pings ping 10.68.16.181 PING 10.68.16.181 (10.68.16.181) 56(84) bytes of data. 64 bytes from 10.68.16.181: icmp_seq=1 ttl=64 time=2.85 ms 64 bytes from 10.68.16.181:... [18:47:07] 06Labs: Instance map.oxygenguide.eqiad.wmflabs somehow orphaned on the cluster - https://phabricator.wikimedia.org/T166602#3301566 (10Andrew) Indeed, the project was deleted: https://wikitech.wikimedia.org/wiki/Purge_2016#Deleted_oxygenguide So, this instance is just a leak (project deletion isn't as comprehen... [18:51:42] 06Labs: Instance map.oxygenguide.eqiad.wmflabs somehow orphaned on the cluster - https://phabricator.wikimedia.org/T166602#3301629 (10Andrew) 05Open>03Resolved a:03Andrew https://www.youtube.com/watch?v=gXdv_BJBvew [18:53:45] paravoid: map.oxygenguide.eqiad.wmflabs is nuked. 
leaked from a project delete last fall apparently [18:55:30] andrewbogott: can you close the task https://phabricator.wikimedia.org/T166531 if it's ok for you, please? Thanks! [18:55:45] framawiki: I've referred it to the tool owner to close [18:56:12] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Other: A tool queries urwiki recentchanges 6 times per second - https://phabricator.wikimedia.org/T166531#3301662 (10Andrew) I've asked the maintainer to verify the change and then close this ticket. [19:00:01] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Other: A tool queries urwiki recentchanges 6 times per second - https://phabricator.wikimedia.org/T166531#3301668 (10Framawiki) p:05High>03Normal Perhaps the tool owner can take a look on [[ https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Grid#Submitting_continuo... [19:08:44] andrewbogott: what happened with robokobot? [19:09:22] There were a ton of files that were deleted but still held open by running processes. Restarting the bot caused those files to finally go away, which freed up some disk space. [19:14:14] Setting up mariadb-server-10.1 (10.1.23-8) ... [19:14:15] 17870 Segmentation fault | $MYSQL_BOOTSTRAP 2>&1 [19:14:15] 17871 | $ERR_LOGGER [19:14:17] that's a new one [19:14:26] never had a segmentation fault with mariadb.
[19:14:37] but guessing there's problems with mysql -> mariadb 10.1 [19:39:16] 06Labs, 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team: Fix ci puppet role to support stretch - https://phabricator.wikimedia.org/T166611#3301885 (10Paladox) [19:40:31] 06Labs, 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 13Patch-For-Review: Fix ci puppet role to support stretch - https://phabricator.wikimedia.org/T166611#3301910 (10Paladox) [20:03:38] 06Labs, 10MediaWiki-extensions-OpenStackManager, 10wikitech.wikimedia.org, 05MW-1.30-release-notes (WMF-deploy-2017-05-30_(1.30.0-wmf.3)), 13Patch-For-Review: Remove OpenStackManager from Wikitech - https://phabricator.wikimedia.org/T161553#3135041 (10Andrew) [20:04:17] 06Labs, 10MediaWiki-extensions-OpenStackManager, 10wikitech.wikimedia.org, 05MW-1.30-release-notes (WMF-deploy-2017-05-30_(1.30.0-wmf.3)), 13Patch-For-Review: Remove OpenStackManager from Wikitech - https://phabricator.wikimedia.org/T161553#3135041 (10Andrew) [20:04:19] 06Labs, 10MediaWiki-extensions-OpenStackManager, 10wikitech.wikimedia.org, 13Patch-For-Review: Create a Horizon panel for managing per-project sudo policies - https://phabricator.wikimedia.org/T162097#3152421 (10Andrew) 05Open>03Resolved p:05Triage>03Normal [20:56:15] I trust: .*@wikimedia/.* (2trusted), .*@mediawiki/.* (2trusted), .*@wikimedia/Ryan-lane (2admin), .*@wikipedia/.* (2trusted), .*@nightshade.toolserver.org (2trusted), .*@wikimedia/Krinkle (2admin), .*@[Ww]ikimedia/.* (2trusted), .*@wikipedia/Cyberpower678 (2admin), .*@wirenat2\.strw\.leidenuniv\.nl (2trusted), .*@unaffiliated/valhallasw (2trusted), .*@mediawiki/yuvipanda (2admin), .*@wikipedia/Coren (2admin), .*@wikimedia/BDavis-WMF (2admin), .*@wikimedia/Krenair (2admin), .*@wikimedia/mviswanathan-wmf (2admin), [20:56:15] @trusted [20:57:06] We have lots of admins. 
;p [21:33:34] !help [21:33:34] !documentation for labs !wm-bot for bot [21:33:54] !documenatation [21:35:03] 06Labs, 15User-bd808: Make a WMCS 'clinic duty' doc page - https://phabricator.wikimedia.org/T166572#3302303 (10bd808) 05Open>03Resolved https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/On-call_duties [22:11:16] !documentation | bd808 [22:11:21] :( [22:11:28] good luck! [22:32:00] !log tools migrating tools-webgrid-lighttpd-1406, tools-exec-1410 from labvirt1006 to labvirt1009 to balance cpu usage [22:32:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [22:36:04] PROBLEM - Host tools-exec-1410 is DOWN: CRITICAL - Host Unreachable (10.68.18.18) [22:36:34] PROBLEM - Host tools-webgrid-lighttpd-1406 is DOWN: CRITICAL - Host Unreachable (10.68.17.195) [22:42:50] 06Labs, 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 06Release-Engineering-Team (Next): Fix ci puppet role to support stretch - https://phabricator.wikimedia.org/T166611#3302488 (10greg) [23:00:27] 06Labs: Request creation of jessie-stretch labs project - https://phabricator.wikimedia.org/T165633#3302562 (10Dzahn) @Paladox You should say which specific instances you want to test upgrading and then see if it's an issue of quota in that specific project. I think creating a new instance in parallel would ofte... [23:24:22] RECOVERY - Host tools-webgrid-lighttpd-1406 is UP: PING OK - Packet loss = 0%, RTA = 0.92 ms [23:35:14] 06Labs: Request creation of jessie-stretch labs project - https://phabricator.wikimedia.org/T165633#3302638 (10Paladox) >>! In T165633#3302562, @Dzahn wrote: > @Paladox You should say which specific instances you want to test upgrading and then see if it's an issue of quota in that specific project. I think crea...