[03:06:06] PROBLEM - Puppet run on tools-exec-1401 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[03:43:31] Labs, WikiApiary: New Public IP for WikiApiary on WMFLabs - https://phabricator.wikimedia.org/T160862#3113506 (Dzahn) If it's just for running a webserver you don't really need one. Then it can just be behind the web proxy like most projects with web servers. You can create a new proxy by clicking in th...
[03:46:05] RECOVERY - Puppet run on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:52:31] Labs, WikiApiary: New Public IP for WikiApiary on WMFLabs - https://phabricator.wikimedia.org/T160862#3113510 (Dzahn) If you still think you need one, then see here how to request it: https://wikitech.wikimedia.org/wiki/Help:Addresses#Request_a_Public_IP_address
[03:56:13] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2480369 (Dzahn)
[03:56:15] Labs, WikiApiary: New Public IP for WikiApiary on WMFLabs - https://phabricator.wikimedia.org/T160862#3113513 (Dzahn)
[03:59:43] Labs-project-other, Developer-Relations, WikiApiary: move WikiApiary to Labs - https://phabricator.wikimedia.org/T149874#3113516 (Dzahn)
[03:59:45] Labs, WikiApiary: New Public IP for WikiApiary on WMFLabs - https://phabricator.wikimedia.org/T160862#3113515 (Dzahn)
[05:20:47] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Have Edit Counter use same architecture and front-end as the other pieces that have been re-written - https://phabricator.wikimedia.org/T160481#3113581 (Samwilson) a:Samwilson
[06:52:33] Labs: Request creation of getstarted labs project - https://phabricator.wikimedia.org/T160884#3113617 (Freddy2001)
[09:08:09] Labs, Labs-Infrastructure, Patch-For-Review: Deprecate precise instances in Labs by 2017-03-31 - https://phabricator.wikimedia.org/T143349#3113723 (hashar)
[09:09:53] Labs, Labs-Infrastructure, Patch-For-Review: Deprecate precise instances in Labs by 2017-03-31 - https://phabricator.wikimedia.org/T143349#2991466 (hashar) I have finally deleted all three Precise instances from the `integration` labs project and updated the task detail to reflect it. The sub task T...
[09:38:52] Labs, Operations: labtestcontrol2001: cron-spam from invoke-rc.d atop _cron - https://phabricator.wikimedia.org/T159532#3113756 (elukey) Open>Resolved a:elukey Found `echo "Do the thing"` in /lib/lsb/init-functions, probably added manually. Just removed it.
[10:22:56] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[10:57:53] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:04:16] This user is now online in #wikimedia-operations. I'll let you know when they show some activity (talk, etc.)
[11:04:16] @notify andrewbogott let me know when you are gonna be around for ~2 hours :P
[11:23:31] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3114061 (ops-monitoring-bot) Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts: ``` ['labsdb1006.eqiad.wmnet'] ```...
[12:26:59] Labs, Operations: Add monitoring for nfs-exportd on active labstore specifically - https://phabricator.wikimedia.org/T160838#3114170 (chasemp) https://gerrit.wikimedia.org/r/#/c/343624/
[12:30:37] Labs, Labs-Infrastructure, Patch-For-Review: Deprecate precise instances in Labs by 2017-03-31 - https://phabricator.wikimedia.org/T143349#3114178 (chasemp)
[12:34:20] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3114181 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['labsdb1006.eqiad.wmnet'] ``` Of which those **FAILED**: ``` set(['labsdb1006....
[12:34:26] Labs, Tool-Labs, Patch-For-Review: Provision and test tools-mailrelay-02 - https://phabricator.wikimedia.org/T97574#3114185 (chasemp)
[12:34:28] Labs, Labs-Team-Backlog, Tool-Labs, Mail: Set up A-based SPF for tools.wmflabs.org - https://phabricator.wikimedia.org/T104733#3114184 (chasemp)
[12:54:06] Labs, Operations: openstack instance creation sometimes takes >480s - https://phabricator.wikimedia.org/T159459#3114234 (chasemp) Open>Resolved This is still super important but the immediate issue of this task is resolved it seems and followed up by {T159721}
[12:54:17] Labs, Labs-Infrastructure: labvirt1001 and 1002 cannot launch new VMs - https://phabricator.wikimedia.org/T159721#3076053 (chasemp)
[12:56:47] Tool-Labs-tools-Other: Unknown "cewbot" user lurking in channels - https://phabricator.wikimedia.org/T160907#3114242 (Nemo_bis)
[12:58:24] Labs, Operations: Instance creation stalls before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3114255 (chasemp)
[12:58:31] Labs, Operations: Instance creation stalls before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3114269 (chasemp) p:Triage>High
[12:59:10] Labs, Operations: Instance creation stalls before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3114255 (chasemp) a:Andrew
[12:59:53] Labs, Operations: Instance creation fails before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3114255 (chasemp)
[13:04:05] Labs, Operations: Instance creation fails before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3114282 (chasemp)
[13:56:13] Labs, WikiApiary: New Public IP for WikiApiary on WMFLabs - https://phabricator.wikimedia.org/T160862#3114367 (MarkAHershberger) Dzahn writes: > If it's just for running a webserver you don't really need one. Then it can just be behind the web proxy like most projects with web servers. You can create a...
[13:58:49] Labs, WM-Bot: Move wm-bot instance to Trusty - https://phabricator.wikimedia.org/T157838#3114371 (Petrb) MariaDB is just as bad as MySQL, anyway I've decided to take the approach of live update, it's most easy atm
[14:07:23] petan: I'm here now. I have a short meeting in 1:15, after that I'm free again for a while.
[14:08:13] I'm not likely to be much help w/db migration but I'm certainly happy to do what I can
[14:11:00] Labs, Labs-Infrastructure: Weird state of /data/project for dumps (semi-missing files) - https://phabricator.wikimedia.org/T87224#3114399 (Nemo_bis) The situation is still identical though, so I'd appreciate suggestions on what to do: ``` nemobis@dumps-stats:/data/project$ ls -lh wikistats/ ls: cannot a...
[14:27:23] andrewbogott: I don't want to migrate the DB, which is why I need you :)
[14:27:27] I want just OS upgrade
[14:27:31] and leave DB as it is, where it is
[14:28:34] hmm I am just wondering, is it even necessary to upgrade it?
[14:28:37] it's 14.04
[14:30:08] which VM are we talking about?
[14:31:02] huggle-pg
[14:31:42] yeah, looks like that's a Trusty instance already
[14:31:46] so should be fine?
[14:32:18] ah ok
[14:32:22] in that case I think we are done :D
[14:32:23] yay
[14:33:33] Labs, Labs-Infrastructure: Weird state of /data/project for dumps (semi-missing files) - https://phabricator.wikimedia.org/T87224#986466 (zhuyifei1999) @Nemo_bis have you `stat wikistats/`? This is similar to the issue where you don't have the +x permission on the directory: ``` $ cd /tmp $ mkdir test $...
[14:34:12] Labs, Huggle: Labs instance huggle.huggle.wmflabs needs to be replaced or deleted - https://phabricator.wikimedia.org/T157710#3114447 (Petrb) Open>Resolved soo, it seems we are done :)
[14:34:20] petan: great!
[14:34:32] petan: is wm-bot done also? Or is that still in a testing period or something?
[14:35:12] Labs: Request creation of wm-bot labs project - https://phabricator.wikimedia.org/T157879#3114451 (Petrb)
[14:35:17] Labs, WM-Bot: Move wm-bot instance to Trusty - https://phabricator.wikimedia.org/T157838#3114449 (Petrb) Open>Resolved I just nuked wm-bot instance, in project bots. There is one more instance "botbot" that I will look in, maybe there is something I would like to archive for future, and then we c...
[14:35:21] Labs, Labs-Infrastructure, Patch-For-Review: Deprecate precise instances in Labs by 2017-03-31 - https://phabricator.wikimedia.org/T143349#3114452 (Petrb)
[14:35:31] Labs, WM-Bot: Move wm-bot instance to Trusty - https://phabricator.wikimedia.org/T157838#3114453 (Petrb) btw botbot is 14.04 so it doesn't block stuff
[14:35:35] petan: woo!
[14:35:43] thanks for handling those
[14:36:59] Tool-Labs-tools-Other: Unknown "cewbot" user lurking in channels - https://phabricator.wikimedia.org/T160907#3114455 (zhuyifei1999) This name reminds me of https://meta.wikimedia.org/wiki/User:Cewbot so ping @kanashimi
[14:37:42] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[14:42:50] Labs, Labs-Infrastructure, Patch-For-Review: Deprecate precise instances in Labs by 2017-03-31 - https://phabricator.wikimedia.org/T143349#3114460 (Andrew)
[14:44:29] Labs, Labs-Infrastructure: Weird state of /data/project for dumps (semi-missing files) - https://phabricator.wikimedia.org/T87224#3114462 (chasemp) thanks @zhuyifei1999 ignores permissions and does not reproduce `root@dumps-stats:/data/project# ls -lh wikistats` reproduces `root@dumps-stats:/data/proje...
[14:48:19] Labs, Labs-Infrastructure: Weird state of /data/project for dumps (semi-missing files) - https://phabricator.wikimedia.org/T87224#3114464 (Nemo_bis) Ok. From the description I thought I had checked permissions...
[14:54:29] Labs: Revert temporary quota increase for fastcci project (when ready) - https://phabricator.wikimedia.org/T160798#3114504 (Andrew)
[14:55:33] Labs-project-other, Developer-Relations, WikiApiary: move WikiApiary to Labs - https://phabricator.wikimedia.org/T149874#3114509 (Andrew)
[14:55:36] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#3114510 (Andrew)
[14:55:38] Labs, WikiApiary: New Public IP for WikiApiary on WMFLabs - https://phabricator.wikimedia.org/T160862#3114507 (Andrew) Open>Invalid It sounds like you don't need a quota change, so I'm closing this for now. Feel free to open a quota request if you turn out to really need the IP -- I suspect you'...
[15:17:43] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:37:52] Labs: Revert temporary quota increase for fastcci project (when ready) - https://phabricator.wikimedia.org/T160798#3114641 (dschwen) Andrew, I managed to get my existing VM running again. You can lower the quota again.
[15:38:51] Labs: Request creation of getstarted labs project - https://phabricator.wikimedia.org/T160884#3113617 (chasemp) +1
[15:42:50] Labs: Request creation of getstarted labs project - https://phabricator.wikimedia.org/T160884#3113617 (zhuyifei1999) Just wondering, what makes this not-doable on tool labs?
[15:52:09] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#3114714 (Dzahn) I see that labs1005/1006/1007 are all either re-installed or down. They don't show up as precise anymore when checking with salt....
[15:54:36] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#3114717 (jcrespo) Dzhan- the "reinstall as jessie" part is done, but the setup of the passive replica is not 100% complete. It will take one commit...
[15:57:58] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#3114742 (Dzahn) Got it, and thank you very much.
[16:13:16] !log tools migrating tools-exec-1408 to labvirt1010 to reduce load on labvirt1001
[16:13:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:17:17] PROBLEM - Host tools-exec-1408 is DOWN: CRITICAL - Host Unreachable (10.68.18.14)
[16:30:29] PAWS: Enable batch download - https://phabricator.wikimedia.org/T160922#3114877 (Capt_Swing)
[16:42:40] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/GoranSMilovanovic was created, changed by GoranSMilovanovic link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/GoranSMilovanovic edit summary: Created page with "{{Tools Access Request |Justification=I have started working as a Data Analyst, Wikimedia Deutschland. I need the access to the Tools project for various Data Science tasks...."
[16:42:53] !log tools migrating tools-webgrid-generic-1404 to labvirt1011 to reduce load on labvirt1001
[16:42:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:49:41] PROBLEM - Host tools-webgrid-generic-1404 is DOWN: CRITICAL - Host Unreachable (10.68.18.53)
[17:02:31] RECOVERY - Host tools-webgrid-generic-1404 is UP: PING OK - Packet loss = 0%, RTA = 2.39 ms
[17:05:29] !log tools migrating tools-webgrid-lighttpd-1410 to labvirt1012 to reduce load on labvirt1001
[17:05:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:11:13] PROBLEM - Host tools-webgrid-lighttpd-1410 is DOWN: CRITICAL - Host Unreachable (10.68.18.44)
[17:21:20] RECOVERY - Host tools-exec-1408 is UP: PING OK - Packet loss = 0%, RTA = 2.52 ms
[17:31:45] !log tools migrating tools-exec-1417 to labvirt1013
[17:31:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:32:48] PROBLEM - Host tools-exec-1417 is DOWN: CRITICAL - Host Unreachable (10.68.23.172)
[17:52:31] RECOVERY - Host tools-exec-1417 is UP: PING OK - Packet loss = 0%, RTA = 0.88 ms
[18:03:39] !log tools Applied openstack::clientlib on tools-checker-01 and forced puppet run
[18:03:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:09:16] PROBLEM - Puppet run on tools-checker-01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[18:14:13] RECOVERY - Puppet run on tools-checker-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:21:31] Labs: Request creation of getstarted labs project - https://phabricator.wikimedia.org/T160884#3115291 (Freddy2001) Setting up a clean wiki sandbox environment on demand in the learning process with some "challanges" and tasks to lern for each subject and students, which eliminates itself after completing the...
[18:22:07] RECOVERY - Host tools-webgrid-lighttpd-1410 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms
[18:24:55] Labs, PAWS: Track PAWS user storage - https://phabricator.wikimedia.org/T160114#3115307 (yuvipanda) Open>Resolved a:madhuvishy This is done!
[18:33:28] Labs: Provision novaobserver credentials on all Labs hosts - https://phabricator.wikimedia.org/T160929#3115341 (bd808)
[18:36:12] !log tools Applied openstack::clientlib on tools-checker-02 and forced puppet run
[18:36:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:14:30] !log deployment-prep migrating deployment-puppetmaster02 to a different labvirt
[20:14:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[20:18:43] Labs: Provision novaobserver credentials on all Labs hosts - https://phabricator.wikimedia.org/T160929#3115907 (Andrew) I think that the already-existing openstack::observerenv class is just what we want here. I just forgot that it was separate.
[20:26:37] Labs-project-Wikistats: wikistats: add new wikipedias: kbp, khw - https://phabricator.wikimedia.org/T160947#3115930 (Dzahn)
[20:26:52] Labs-project-Wikistats: wikistats: add new wikipedias: kbp, khw - https://phabricator.wikimedia.org/T160947#3115946 (Dzahn) p:Triage>Normal
[20:27:19] Labs, User-bd808: Provision novaobserver credentials on all Labs hosts - https://phabricator.wikimedia.org/T160929#3115948 (bd808) a:Andrew>bd808 Yeah, `::openstack::observerenv` looks like exactly the right thing. I'll post a patch to add that to `role::labs::instance`.
[20:45:21] !log deployment-prep migrating deployment-pdf01 to labvirt1011
[20:45:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[20:51:17] !log deployment-prep migrating deployment-urldownloader to labvirt1013
[20:51:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[21:10:38] Labs, Labs-Infrastructure, Tool-Labs, Wikimedia-Incident: Write a simple script that handles failovering proxies - https://phabricator.wikimedia.org/T143639#2574425 (Andrew) This only barely warrants a script, since I just now did it with a single command: OS_TENANT_NAME=project-proxy openstack...
[21:11:57] !log project-proxy switching primary proxy to novaproxy-02
[21:12:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[21:12:07] !log project-proxy migrating novaproxy-01 to labvirt1010
[21:12:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[21:21:56] !log project-proxy switching primary back to novaproxy-01
[21:21:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[21:25:09] !log paws migrating paws-base-01 to labvirt1013
[21:25:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[21:39:32] Labs, Operations, hardware-requests: Codfw: (1) hardware access request for labtest - https://phabricator.wikimedia.org/T154706#2921133 (RobH) @chasemp: Is there a specific existing server that meets this requirement to base a new spec off of? Also for 1TB is that 1TB of space post raid10? So jus...
[21:42:39] Labs, Operations, hardware-requests: Eqiad: (2) hardware access request for labcontrol1003/1004 - https://phabricator.wikimedia.org/T158207#3029754 (RobH) Is there a specific cpu seed we have to stick to? 24 cores without HT is dual 12 core CPUs. Anything between 2-2.6 ok?
[21:43:33] Labs, Operations, hardware-requests: Eqiad: (2) hardware access request for labnet1003/1004 - https://phabricator.wikimedia.org/T158204#3029672 (RobH) Is there a specific cpu seed we have to stick to? 24 cores without HT is dual 12 core CPUs. Anything between 2-2.6 ok? Disks: 100G means only need 10...
[22:47:02] !log tools disable puppet on all k8s workers to test https://gerrit.wikimedia.org/r/#/c/343708/
[22:47:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:44:57] PROBLEM - Puppet run on tools-k8s-master-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[23:45:49] ^ is me
[23:59:57] RECOVERY - Puppet run on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0]