[00:17:44] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 1.12 ms
[00:19:20] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[00:41:39] RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 1.33 ms
[00:49:50] PROBLEM - Host secgroup-lag-102 is DOWN: PING CRITICAL - Packet loss = 100%
[00:53:36] andrewbogott: chasemp: yuvipanda: Having trouble associating a public IP to an instance in cvn. https://phabricator.wikimedia.org/T150209
[00:54:09] I used to have a quota of 2, used on cvn-app4 and 5. Then I got a third one, and assigned it to the new app6, which worked. I migrated all app4 stuff to it and terminated that one
[00:54:18] Now I created app7 but I'm unable to assign it a public IP
[00:54:31] I suspect one of the first two IPs didn't get reclaimed
[00:54:41] Horizon just tells me "Error: Unable to associate floating IP."
[00:54:44] yeah
[00:54:54] Overview also still says I'm using 3/3
[00:55:07] go to Access & Security
[00:55:11] under Project -> Compute
[00:55:26] then click the Floating IPs tab
[00:55:31] Aha
[00:55:33] I have to release it here
[00:55:34] Cool
[00:55:44] yeah
[00:55:45] or associate it
[00:55:56] Thanks, that did it
[00:56:36] now it's time for a humanoid puppet-ish run on the new instance - https://github.com/countervandalism/infrastructure/blob/master/setup.yaml#L19
[00:57:43] ew
[01:02:58] I have a Kubernetes-based webservice that I am trying to start back up but don't seem to be successful in doing?
[01:03:53] It's mediaplaycounts and when doing the proper shell command it doesn't start back up; tail -f uwsgi.log confirms this
[01:04:34] hmm
[01:04:40] uwsgi.log is useless because we didn't turn that back on yet
[01:04:47] ...o
[01:04:59] let me turn that back on now
[01:05:06] `webservice --backend=kubernetes python start` is the command i want in any instance yes?
[01:05:13] yeah
[01:05:15] although if you've done it once
[01:05:24] webservice restart should do the right thing
[01:05:32] https://tools.wmflabs.org/mediaplaycounts/api/1 returns error 502
[01:05:35] but without uwsgi.log we've no idea why it could be broken
[01:05:38] so let me fix that
[01:06:40] yuvipanda: do the stdout/stderr buffer in k8s for things using webservice? Tailing those buffers for stashbot is pretty awesome and saves me from puking stuff to disk
[01:07:56] bd808: no because I reroute stdout from there to disk manually
[01:08:08] yuvipanda: *nod*
[01:08:13] bd808: mostly as a 'do not break this'
[01:09:18] bd808: need a way to collect logs at some point
[01:19:17] yuvipanda: let me know once you've re-enabled uwsgi.log
[01:19:27] yup will do
[01:29:35] PROBLEM - Free space - all mounts on tools-docker-builder-03 is CRITICAL: CRITICAL: tools.tools-docker-builder-03.diskspace.root.byte_percentfree (<11.11%)
[01:34:06] Hm.. does it matter whether Horizon shows "project-id" in Metadata?
[01:34:15] All instances I created before yesterday have it
[01:34:17] but not the latest one
[01:43:28] !log tools cleanup old images on tools-docker-builder-03
[01:43:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[01:47:41] coward: just as an update, the new images are continuing to build...
[01:47:46] should be there in about 10-15mins
[01:47:58] okay thank you
[01:54:36] RECOVERY - Free space - all mounts on tools-docker-builder-03 is OK: OK: All targets OK
[02:05:22] !log tools rebooting tools-docker-registry-01, can't ssh in
[02:05:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[02:05:35] coward: ofc, we seem to have triggered some sort of deep hidden monster, so I'm investigating
[02:05:40] lovely
[02:06:09] coward: might be a while
[02:39:13] coward: several hours at least :( sorry
[02:39:25] It's okay! I'll be working tomorrow too :P
[02:39:40] PROBLEM - Puppet run on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[02:39:43] Impressive, I've uncovered a massive techno-nightmare.
[02:40:08] coward: it looks like.
[02:52:25] i just moved my webservice to kubectl
[02:52:27] :)
[03:06:33] RECOVERY - Host tools-puppetmaster-01 is UP: PING OK - Packet loss = 0%, RTA = 2.31 ms
[03:14:40] RECOVERY - Puppet run on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:33:10] PROBLEM - Host tools-puppetmaster-01 is DOWN: CRITICAL - Host Unreachable (10.68.22.61)
[04:12:51] [nagf] Krinkle created cpu-100 (+1 new commit): https://github.com/wikimedia/nagf/commit/30ddf00366c1
[04:12:52] nagf/cpu-100 30ddf00 Timo Tijhof: graphs: Limit yMax to 100 for CPU graphs...
[04:14:41] wikimedia/nagf#50 (cpu-100 - 30ddf00: Timo Tijhof) The build passed. - https://travis-ci.org/wikimedia/nagf/builds/176266847
[04:15:20] [nagf] Krinkle created 30ddf00 (+1 new commit): https://github.com/wikimedia/nagf/commit/47c5ebaa1f72
[04:15:21] nagf/30ddf00 47c5eba Timo Tijhof: build: Remove PHP 5.3 and PHP 5.4 from Travis CI matrix
[04:15:30] [nagf] Krinkle deleted 30ddf00 at 47c5eba: https://github.com/wikimedia/nagf/commit/47c5eba
[04:15:56] wikimedia/nagf#51 (30ddf00 - 47c5eba: Timo Tijhof) The build has errored. - https://travis-ci.org/wikimedia/nagf/builds/176267199
[04:16:05] [nagf] Krinkle opened pull request #14: graphs: Limit yMax to 100 for CPU graphs (master...cpu-100) https://github.com/wikimedia/nagf/pull/14
[04:17:41] [nagf] Krinkle pushed 1 new commit to master: https://github.com/wikimedia/nagf/commit/a433087bf488b906dd5ea7ba0eb89e9f0fe0168b
[04:17:42] nagf/master a433087 Timo Tijhof: graphs: Limit yMax to 100 for CPU graphs (#14)...
[04:18:05] [nagf] Krinkle created travis (+1 new commit): https://github.com/wikimedia/nagf/commit/bbf2f73a2ba8
[04:18:05] nagf/travis bbf2f73 Timo Tijhof: build: Remove PHP 5.3 and PHP 5.4 from Travis CI matrix
[04:19:10] wikimedia/nagf#54 (travis - bbf2f73: Timo Tijhof) The build has errored. - https://travis-ci.org/wikimedia/nagf/builds/176267597
[04:19:27] wikimedia/nagf#53 (master - a433087: Timo Tijhof) The build passed. - https://travis-ci.org/wikimedia/nagf/builds/176267538
[04:23:52] wikimedia/nagf#56 (master - aec2c2e: Timo Tijhof) The build passed. - https://travis-ci.org/wikimedia/nagf/builds/176267670
[05:44:56] Labs: Horizon prefix puppet dialog puts you in wrong prefix after you create a new prefix - https://phabricator.wikimedia.org/T150828#2798007 (yuvipanda)
[05:47:48] Labs, Tool-Labs: Tools Docker Registry is Dead - https://phabricator.wikimedia.org/T150829#2798021 (yuvipanda)
[05:47:57] Labs, Tool-Labs: Tools Docker Registry is Dead - https://phabricator.wikimedia.org/T150829#2798036 (yuvipanda) p:Triage>High
[07:52:54] !log deployment-prep the new mysql root password for -db04 is at /tmp/newmysqlpass as well as in a new file in the puppetmaster's labs/private.git
[07:53:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[08:13:36] Labs, Community-Tech, DBA, MediaWiki-extensions-PageAssessments: Replicate page_assessments and page_assessments_projects tables on Labs - https://phabricator.wikimedia.org/T150832#2798143 (kaldari)
[08:14:03] Labs, Labs-Infrastructure, DBA: LabsDB replica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767#2798156 (kaldari)
[08:14:07] Labs, Community-Tech, DBA, MediaWiki-extensions-PageAssessments: Replicate page_assessments and page_assessments_projects tables on Labs - https://phabricator.wikimedia.org/T150832#2798155 (kaldari)
[08:15:23] Labs, Community-Tech, DBA, MediaWiki-extensions-PageAssessments: Replicate page_assessments and page_assessments_projects tables on Labs - https://phabricator.wikimedia.org/T150832#2798143 (Marostegui) Note: I checked the tables on enwiki and they are present in labs (ie: labsdb1003) already, but...
[08:15:36] Labs, Community-Tech, DBA, MediaWiki-extensions-PageAssessments: Replicate page_assessments and page_assessments_projects tables on Labs - https://phabricator.wikimedia.org/T150832#2798159 (kaldari) Apparently, they are already being replicated, but need to be added to the maintain-views config i...
[09:05:42] Labs, Beta-Cluster-Infrastructure, Wikimedia-General-or-Unknown, Patch-For-Review: rename -labs.php to -beta.php - https://phabricator.wikimedia.org/T150268#2780096 (hashar) p:Triage>Low
[09:09:58] Labs, Labs-Infrastructure, LDAP: Remove shell user "80686" - https://phabricator.wikimedia.org/T63967#2798200 (MoritzMuehlenhoff) I suppose this caused problems in OSM at some point and then validnames was tweaked to not allow that in LDAP? If it's no longer a problem in OSM, we can simply loosen the...
[10:16:28] Labs, Tool-Labs, Community-Tech-Tool-Labs, Developer-Relations, and 2 others: Developing community norms for vital bots and tools - https://phabricator.wikimedia.org/T149312#2798384 (Qgil)
[10:23:15] Labs, Tool-Labs-tools-Other, DBA, Patch-For-Review: High replication activity filled up labsdb1004 with binlogs - https://phabricator.wikimedia.org/T150553#2798400 (jcrespo) Open>stalled
[10:23:30] Labs, Tool-Labs-tools-Other, DBA, Patch-For-Review: High replication activity filled up labsdb1004 with binlogs - https://phabricator.wikimedia.org/T150553#2789420 (jcrespo) p:Triage>Low
[10:52:16] Labs, Tool-Labs, Commons, Pywikibot-Commons, and 3 others: Pywikibot : Fix Commons scripts broken by toolserver.org to labs migration - https://phabricator.wikimedia.org/T78462#2798483 (XXN)
[11:47:08] Labs, Tool-Labs, Datasets-General-or-Unknown, Privacy: Information leak on wikidata-externalid-url - https://phabricator.wikimedia.org/T150803#2798688 (Multichill)
[11:53:28] Labs, Tool-Labs, Datasets-General-or-Unknown, Wikidata, Privacy: Information leak on wikidata-externalid-url - https://phabricator.wikimedia.org/T150803#2798700 (Sjoerddebruin)
[13:27:19] Labs, MediaWiki-extensions-TwoFactorAuthentication, Operations, wikitech.wikimedia.org: Can't login wikitech - https://phabricator.wikimedia.org/T144805#2798872 (Shizhao)
[13:35:40] hi madhuvishy and yuvipanda! i can't view static paws notebooks anymore (I can create and write in notebooks though) i'm getting a 500 internal server error. http://paws-public.wmflabs.org/paws-public are you aware of this?
[13:49:57] PROBLEM - Puppet run on tools-exec-1418 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[13:50:07] PROBLEM - Puppet run on tools-webgrid-lighttpd-1202 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:50:13] PROBLEM - Puppet run on tools-exec-1420 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:50:41] PROBLEM - Puppet run on tools-exec-1209 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[13:50:51] PROBLEM - Puppet run on tools-exec-1220 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:51:05] PROBLEM - Puppet run on tools-grid-master is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:51:12] PROBLEM - Puppet run on tools-cron-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:51:14] PROBLEM - Puppet run on tools-worker-1021 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:51:20] PROBLEM - Puppet run on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:51:40] PROBLEM - Puppet run on tools-webgrid-lighttpd-1415 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:52:02] PROBLEM - Puppet run on tools-grid-shadow is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:54:08] PROBLEM - Puppet run on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[13:54:46] PROBLEM - Puppet run on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[13:55:36] PROBLEM - Puppet run on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[13:56:00] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 531 bytes in 0.008 second response time
[13:56:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[13:56:10] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[13:56:12] PROBLEM - Puppet run on tools-exec-1205 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:56:14] PROBLEM - Puppet run on tools-exec-1203 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:57:39] PROBLEM - Puppet run on tools-worker-1022 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[13:57:43] PROBLEM - Puppet run on tools-exec-1405 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:58:01] PROBLEM - Puppet run on tools-exec-1415 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[13:58:21] PROBLEM - Puppet run on tools-worker-1019 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[13:58:28] $HOME isnt home anymore :(
[13:58:29] PROBLEM - Puppet run on tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[13:58:33] PROBLEM - Puppet run on tools-webgrid-lighttpd-1208 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:58:37] PROBLEM - Puppet run on tools-webgrid-lighttpd-1204 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[13:58:55] PROBLEM - Puppet run on tools-worker-1005 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[13:59:07] PROBLEM - Puppet run on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:59:13] PROBLEM - Puppet run on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:59:29] PROBLEM - Puppet run on tools-static-10 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:59:31] PROBLEM - Puppet run on tools-exec-1204 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:59:31] nevermind. user fail
[13:59:41] PROBLEM - Puppet run on tools-exec-1404 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[14:00:11] PROBLEM - Puppet run on tools-exec-1419 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[14:00:38] PROBLEM - Puppet run on tools-exec-1408 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[14:00:44] Betacommand: There's no place like ::1
[14:01:02] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.026 second response time
[14:03:35] multichill: forgot to log into the tool, from my account
[14:05:08] Yesterday at work I completely mixed up servers and accounts while doing sudo / su. That might have triggered a mail or two ;-)
[14:09:57] RECOVERY - Puppet run on tools-exec-1418 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:14:41] RECOVERY - Puppet run on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:25:56] Labs, Tool-Labs: Tools Docker Registry is Dead - https://phabricator.wikimedia.org/T150829#2799163 (chasemp) You didn't put down the name of the node you tried to create :) I'm guessing: > | 3e0dc2bb-4e99-4e84-aca4-f3be6a4134b0 | tools-docker-registry-02 | ACTIVE | public=10.68.21.176 > | OS-SRV-US...
[14:26:09] RECOVERY - Puppet run on tools-grid-master is OK: OK: Less than 1.00% above the threshold [0.0]
[14:26:13] RECOVERY - Puppet run on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:26:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:27:03] RECOVERY - Puppet run on tools-grid-shadow is OK: OK: Less than 1.00% above the threshold [0.0]
[14:30:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:30:13] RECOVERY - Puppet run on tools-exec-1420 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:30:41] RECOVERY - Puppet run on tools-exec-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:30:53] RECOVERY - Puppet run on tools-exec-1220 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:31:02] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:31:12] RECOVERY - Puppet run on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:31:40] RECOVERY - Puppet run on tools-webgrid-lighttpd-1415 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:33:32] RECOVERY - Puppet run on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:33:38] RECOVERY - Puppet run on tools-webgrid-lighttpd-1204 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:33:54] RECOVERY - Puppet run on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:34:08] RECOVERY - Puppet run on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:34:08] RECOVERY - Puppet run on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:34:30] RECOVERY - Puppet run on tools-static-10 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:34:48] RECOVERY - Puppet run on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:35:37] RECOVERY - Puppet run on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:35:39] RECOVERY - Puppet run on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:36:15] RECOVERY - Puppet run on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:36:15] RECOVERY - Puppet run on tools-exec-1205 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:36:15] RECOVERY - Puppet run on tools-exec-1203 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:37:39] RECOVERY - Puppet run on tools-worker-1022 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:37:45] RECOVERY - Puppet run on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:38:01] RECOVERY - Puppet run on tools-exec-1415 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:38:21] RECOVERY - Puppet run on tools-worker-1019 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:38:29] RECOVERY - Puppet run on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:39:13] RECOVERY - Puppet run on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:39:29] RECOVERY - Puppet run on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:40:09] RECOVERY - Puppet run on tools-exec-1419 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:01:16] I can't mysql via labs, gives me error but I have a replica.my.cnf file
[15:01:48] forget it, sql works
[15:27:10] Labs, Tool-Labs, Patch-For-Review: Tools Docker Registry is Dead - https://phabricator.wikimedia.org/T150829#2799320 (chasemp) The VM is up and running. I'm not resolving and I'm not sure what remains here re: the registry itself.
[15:27:52] Labs, Tool-Labs, Datasets-General-or-Unknown, Wikidata, Privacy: Information leak on wikidata-externalid-url - https://phabricator.wikimedia.org/T150803#2799322 (ArthurPSmith) Ha, if I'd actually looked at the logs I would have known that. Yes all the IP addresses in the file are a 10.68 addr...
[15:28:07] Labs, Operations, Patch-For-Review, Tracking: Migrate tools to secondary labstore HA cluster (Scheduled on 11/14) [tracking] - https://phabricator.wikimedia.org/T146154#2799324 (chasemp)
[15:29:20] >>> [typeof(localStorage), window.localStorage]
[15:30:10] Labs, Operations, Patch-For-Review, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#2799327 (chasemp)
[15:30:12] Labs, Operations, Tracking: Sync data for tools-project from labstore1001 to labstore1004/5 - https://phabricator.wikimedia.org/T144255#2799325 (chasemp) Open>Resolved This was done on sunday for a sync within 24 hours of main maint for Tools. The actual outage period sync took around 5h for...
[15:31:47] Labs, Operations, Patch-For-Review, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#2004220 (chasemp)
[15:31:49] Labs, Operations, Patch-For-Review: revise/fix labstore replicate backup jobs - https://phabricator.wikimedia.org/T127567#2799330 (chasemp) Open>Resolved A bit of monitoring improvements ongoing in {T144633} but generally this is done.
[15:32:28] Labs, Operations, Patch-For-Review, Tracking: Migrate tools to secondary labstore HA cluster (Scheduled on 11/14) [tracking] - https://phabricator.wikimedia.org/T146154#2799335 (chasemp)
[15:43:53] Labs, Operations, Patch-For-Review, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#2799355 (chasemp)
[15:43:54] Labs, Operations, Tracking: Performance test new secondary labstore HA cluster - https://phabricator.wikimedia.org/T146153#2799352 (chasemp) Open>Resolved a:chasemp This work did not get persisted to the task here so I will attempt a brief outline for posterity. The main difficulty here...
[15:44:51] Labs, Operations, Patch-For-Review, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#2004220 (chasemp)
[15:44:53] Labs, Operations, Patch-For-Review, Tracking: Migrate tools to secondary labstore HA cluster (Scheduled on 11/14) [tracking] - https://phabricator.wikimedia.org/T146154#2799373 (chasemp) Open>Resolved a:chasemp Some fallout here {T150829} and I'm looking at addressing an issue w/ wher...
[15:45:03] Labs, Labs-Sprint-103, Labs-Sprint-104: Labs: Make a new backup of the Labs storage to codfw - https://phabricator.wikimedia.org/T103356#2799381 (Papaul)
[15:46:43] Labs, Operations, Patch-For-Review, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#2799386 (chasemp)
[15:46:45] Labs, Operations, Patch-For-Review, Tracking: Migrate tools to secondary labstore HA cluster (Scheduled on 11/14) [tracking] - https://phabricator.wikimedia.org/T146154#2799384 (chasemp) Resolved>Open On second thought this should remain open until {T149946} is done (and reverted)
[15:47:25] Labs, Operations, Patch-For-Review, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#2004220 (chasemp)
[15:47:30] Labs, Operations, Patch-For-Review: Move maps share to labstore1003 - https://phabricator.wikimedia.org/T147657#2799389 (chasemp) Open>Resolved This is done and we need to find a new home for maps as we fixup labstore1001 but the scope of this task is itself completed
[15:48:04] Ok so Firefox just throw a warning about localstorage disabled even for vanilla HTML
[15:50:10] Labs, Operations, Patch-For-Review, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#2799399 (chasemp)
[15:51:02] I should stop reading w3school, typeof(Storage) !== "undefined" is wrong use !window.localStorage
[16:34:59] https://twitter.com/VanamoMedia/status/798785827706765312 That's pretty good
[16:38:27] Labs, Labs-Infrastructure, DBA: LabsDB replica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767#2799553 (chasemp)
[16:38:34] Labs, Labs-Infrastructure, DBA, MediaWiki-extensions-ORES, and 3 others: Replicate ores_classification and ores_model tables in labs - https://phabricator.wikimedia.org/T148561#2799550 (chasemp) Open>Resolved ```--- /etc/maintain-views.yaml 2016-11-02 19:12:41.326458322 +0000 +++ /tmp/pup...
[16:38:51] zareen: madhu is off and yuvi should be around shortly but is not atm :)
[16:40:28] USE metawiki_p; SELECT COUNT (*) FROM ipblocks WHERE ipb_expiry = 'infinity'; <-- what am I doing bad?
[16:43:20] thanks for letting me know chasemp
[16:52:40] PROBLEM - Free space - all mounts on tools-docker-registry-01 is CRITICAL: CRITICAL: tools.tools-docker-registry-01.diskspace.root.byte_percentfree (<100.00%)
[16:53:57] Labs, Operations, wikitech.wikimedia.org: Can't login wikitech - https://phabricator.wikimedia.org/T144805#2799605 (Krenair) @Shizhao, we don't use Extension:TwoFactorAuthentication for 2FA, we use Extension:OATHAuth. But either way I have no reason to believe this is a problem with the software itself.
[16:55:47] !log tools clush -g all "puppet agent --disable 'trail run for changeset 321786 handling /var/lib/gridengine'"
[16:55:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:12:14] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[17:22:14] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:23:14] !log tools reboot tools-exec-1212 (converted via 321786 testing for recovery on boot)
[17:23:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:29:07] Labs, Labs-Infrastructure, LDAP: Remove shell user "80686" - https://phabricator.wikimedia.org/T63967#2799796 (Andrew) As far as I can tell, that error message is not about the username, but about the shell name. Numeric logins are technically permitted on Debian, but any time I google this I find t...
[17:34:48] Does anyone happen to know who 'manuel@mirabilis' aka 80686 is? I'd like to contact them about an account issue.
[17:37:18] Manuel Schneider?
[17:37:44] hm, yep, looks like that's him
[17:37:51] I just found him on-wiki, I'll try to contact there
[17:38:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[17:39:42] Labs, Labs-Infrastructure, LDAP: Remove shell user "80686" - https://phabricator.wikimedia.org/T63967#2799826 (Andrew) I don't think it would be crazy to leave the restriction for account creation but let nslcd be more permissive on the hosts. I'll try contacting Manuel first and see what he thinks.
[17:49:15] I noticed that my tools using the kubernetes backend are down. Is there a known outage occurring?
[18:06:08] ewulczyn: I think readmore just ran out of RAM
[18:06:19] or are other tools down too?
[18:07:00] coward: around?
[18:09:51] logs are going to be back shortly, which will help
[18:19:38] !log reboot tools-exec-1403
[18:19:39] Unknown project "reboot"
[18:19:42] !log tools reboot tools-exec-1403
[18:19:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:26:06] coward: uh, your virtualenv is weird
[18:26:08] there's no 'python' there?!
[18:29:00] coward: you and ewulczyn's 'readmore' have the same strange problem
[18:29:05] > ImportError: No module named site
[18:29:10] idk what's going on there
[18:31:03] !log tools reboot tools-exec-1404 (already depooled)
[18:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:34:19] ewulczyn: it's back up now
[18:34:35] it was trying to load itself as python2 instead of 3, I rebuilt the images and it's fine now
[18:36:25] ewulczyn: coward you all should be back now
[18:36:30] apologies for the downtimes
[18:42:34] going to restart all k8s webservice pods now
[18:42:38] actually
[18:42:46] I'm going to wait until chasemp is done with his thing
[18:42:49] before doing that
[18:45:03] yuvipanda is there a way i can get grrrit-wm to use npm 2 or 3 without having to work around it by using the home dir of the user to store npm
[18:45:07] and nodejs please?
[18:45:26] not that i know of. we won't support anything not in debian, and if you do you're on your own.
[18:45:27] sorry.
[18:45:33] Ok
[19:06:07] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[19:21:07] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:40:50] yuvipanda: did you ever fix that thing?
[19:41:45] Harej: yeah
[19:41:50] whee.
[19:41:52] should work now
[19:42:04] and my apey eye is back up!
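The 18:26 to 18:34 exchange puts the breakage down to a virtualenv with no 'python' in it and a service being started under python2 instead of 3. As a rough sketch only, and not what was actually run on the images, two quick sanity checks in Python might look like the following. The function names are made up for illustration, and the `bin/` layout assumes a POSIX virtualenv.

```python
import sys
from pathlib import Path


def find_venv_python(venv_dir):
    """Return the interpreter path inside a virtualenv, or None if missing.

    Hypothetical helper: assumes the usual POSIX layout <venv>/bin/python.
    """
    for name in ("python", "python3"):
        candidate = Path(venv_dir) / "bin" / name
        if candidate.exists():
            return candidate
    return None


def check_major(expected, actual=sys.version_info.major):
    """Raise if the interpreter's major version is not the expected one."""
    if actual != expected:
        raise RuntimeError(
            "expected Python %d but running under Python %d" % (expected, actual)
        )
```

Calling `check_major(3)` at the top of a service entry point would make a python2/python3 mix-up fail with an explicit message rather than a bare import error.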
[19:57:44] !log deployment-prep taking mysql master down to fix perms
[19:57:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[20:02:20] !log deployment-prep mysql master back up, root identity is now unix socket based rather than password
[20:02:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[20:11:29] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Isomorphyc was created, changed by Isomorphyc link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Isomorphyc edit summary: Created page with "{{Tools Access Request |Justification=I operate a quite active bot at Wiktionary, User:OrphicBot. I would like to see if hosting it at WM Tool Labs is a less error-prone envi..."
[20:14:53] !log tools upgrade toollabs-webservice to 0.30 on all webgrid nodes
[20:14:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:15:46] Krenair: do you remember why we didn't merge https://gerrit.wikimedia.org/r/#/c/309705? Was it just because it was part of a bigger patchset?
[20:16:07] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Isomorphyc was modified, changed by Isomorphyc link https://wikitech.wikimedia.org/w/index.php?diff=983021 edit summary: [20:17:32] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Isomorphyc was modified, changed by Isomorphyc link https://wikitech.wikimedia.org/w/index.php?diff=983022 edit summary: [20:21:37] andrewbogott, you started merging it but I think jenkins wasn't working, then we decided to leave it until the next week or something [20:21:49] ok :) [20:22:02] I'll just merge it now then [20:22:35] well, after I rebase by hand, apparently [20:26:13] PROBLEM - Puppet run on tools-exec-1420 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:26:42] * andrewbogott chases the tip [20:28:36] andrewbogott: wrong time heh [20:28:45] I think I've finished fixing my fuckup [20:33:06] PROBLEM - Puppet run on tools-exec-1401 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:35:31] PROBLEM - Puppet run on tools-exec-1204 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:35:41] PROBLEM - Puppet run on tools-exec-1416 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:35:43] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:36:14] RECOVERY - Puppet run on tools-exec-1420 is OK: OK: Less than 1.00% above the threshold [0.0] [20:36:34] PROBLEM - Puppet run on tools-exec-1414 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:37:08] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:37:12] PROBLEM - Puppet run on tools-exec-1205 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:39:41] PROBLEM - Puppet run on tools-exec-1406 is CRITICAL: CRITICAL: 22.22% of data above the 
critical threshold [0.0] [20:40:19] PROBLEM - Puppet run on tools-exec-1201 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:40:39] PROBLEM - Puppet run on tools-exec-1417 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:42:21] PROBLEM - Puppet run on tools-exec-1410 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:42:30] Labs, Tool-Labs, Datasets-General-or-Unknown, Wikidata, Privacy: Information leak on wikidata-externalid-url - https://phabricator.wikimedia.org/T150803#2800492 (ArthurPSmith) Open>Invalid @jeblad I'm resolving this as invalid as the initial claim of an information leak seems to be in... [20:43:29] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:48:16] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:55:30] RECOVERY - Puppet run on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [20:57:15] RECOVERY - Puppet run on tools-exec-1205 is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:10] Labs, Labs-Infrastructure: secondary.bastion.wmflabs.org is dead - https://phabricator.wikimedia.org/T150896#2800569 (yuvipanda) [21:00:21] RECOVERY - Puppet run on tools-exec-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:49] Labs, Tool-Labs, Patch-For-Review: Tools Docker Registry is Dead - https://phabricator.wikimedia.org/T150829#2800584 (yuvipanda) Open>Resolved a:yuvipanda This is all sorted out now. For some reason my ssh setup borked exactly at the same time as this happened (see T150896). I was able to... 
[21:02:35] Labs, Operations, Patch-For-Review, Tracking: Add config option in tools webservice debian package to write logs to /dev/null - https://phabricator.wikimedia.org/T149946#2800590 (yuvipanda) I've reverted and built package and pushed new images. we need to: 1. Install package on all webgrid nodes... [21:13:07] RECOVERY - Puppet run on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [21:13:15] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0] [21:15:40] RECOVERY - Puppet run on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [21:15:43] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [21:16:33] RECOVERY - Puppet run on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [21:17:07] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [21:19:38] RECOVERY - Puppet run on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [21:20:40] RECOVERY - Puppet run on tools-exec-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [21:21:50] !log tools.gridengine-status Working on UTF8 decoding issue for qstat xml output [21:21:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gridengine-status/SAL [21:22:22] RECOVERY - Puppet run on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [21:23:28] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [21:23:55] why the hell isnt my bot working on kubectl atm it was fine yesterday [21:39:32] (PS1) BryanDavis: www: clean non-UTF8 chars from qstat output [labs/toollabs] - https://gerrit.wikimedia.org/r/321953 [21:40:09] (CR) BryanDavis: "I've hot patched https://tools.wmflabs.org/?status with this fix." 
[labs/toollabs] - https://gerrit.wikimedia.org/r/321953 (owner: BryanDavis) [21:40:47] valhallasw`cloud: ?status is fixed with this ^ [21:43:10] Hello, I am not sure if this is the correct channel for this question. If not, please do let me know. [21:43:28] what question? [21:43:48] I would like to get a (histogram) distribution of the 'article age' (last_edit - NOW()). Is something like that readily available? [21:44:35] bd808: nice! [21:45:19] bd808: although really more of a https://www.youtube.com/watch?v=TAryFIuRxmQ [21:45:22] ferkotaraba do you mean an edit history? [21:46:01] Zppix: yes, basically when was the article last edited [21:46:06] (CR) Merlijn van Deen: [C: 2] www: clean non-UTF8 chars from qstat output [labs/toollabs] - https://gerrit.wikimedia.org/r/321953 (owner: BryanDavis) [21:46:31] (Merged) jenkins-bot: www: clean non-UTF8 chars from qstat output [labs/toollabs] - https://gerrit.wikimedia.org/r/321953 (owner: BryanDavis) [21:49:45] Zppix: like I said, I am not sure whether this is the right channel to ask about it. Moreover, I am not sure if my question even makes sense, so please do let me know [21:50:42] ferkotaraba: I don't know of anything like that off the top of my head, but it does seem like the sort of thing that someone would have made a tool for [21:51:23] If I understand your question you are looking for a nice chart of article last edit time [21:51:42] bd808: Understood, yes, that is basically what I am looking for [21:51:54] bd808: so my best bet is to process the whole dump then? [21:51:59] the raw data needed would be in the revision tables for the various wikis [21:52:40] you could probably make a quarry query to figure it out [21:53:33] ferkotaraba are you trying to use an API for a Wikimedia Foundation-run wiki such as English wikipedia? 
[21:53:50] revision.rev_timestamp is the edit time [21:54:29] Zppix: currently, I am just trying to figure out how would I go about doing that, but I am particularly interested in English Wikipedia [21:54:29] so you'd want a query that found the newest rev_timestamp for each rev_page [21:54:54] thanks bd808, I'll try to check that out [21:55:21] it's not going to be fast to do for all of enwiki. lots of pages and revisions there [21:56:08] you'll probably need to partition your search based on rev_page ranges [21:56:23] say 1000 at a time or something [21:56:49] ferkotaraba i believe theres a tool already made to find revision details i may be wrong [22:04:35] ferkotaraba: https://quarry.wmflabs.org/query/14096 is a place to start from [22:05:11] thanks a ton bd808 [22:18:26] PROBLEM - Puppet run on tools-checker-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [22:26:23] Labs, Community-Tech, DBA, MediaWiki-extensions-PageAssessments, Patch-For-Review: Replicate page_assessments and page_assessments_projects tables on Labs - https://phabricator.wikimedia.org/T150832#2800790 (kaldari) [22:28:27] RECOVERY - Puppet run on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:36:49] PROBLEM - Puppet run on tools-bastion-05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [22:51:46] RECOVERY - Puppet run on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0] [22:52:08] PROBLEM - Puppet run on tools-grid-master is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [23:02:07] RECOVERY - Puppet run on tools-grid-master is OK: OK: Less than 1.00% above the threshold [0.0] [23:02:29] bd808: are there any limits on the size of query/time of execution? [23:04:48] ferkotaraba: there's a time limit. I think something like 20 minutes? [23:05:32] bd808: makes sense, thanks. 
My query over every single revision won't really work then =) [23:07:01] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 385 bytes in 0.002 second response time [23:07:39] something tells me that toolslab home page isnt supposed to be 503 [23:09:49] seems to have been a blip [23:09:52] or a bad check [23:10:35] ferkotaraba: ha. no it won't via quarry. you could script something and run it on the job grid. I would guess that it will take on the order of days to list them all out. [23:11:54] once you had a baseline you could make subsequent runs faster by joining back in the other direction and only considering revisions added since the last run [23:12:02] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.130 second response time [23:13:29] ferkotaraba: oh! page.page_touched might really be all you need! [23:13:42] bd808: there is such a thing? [23:13:52] as I recall that's almost always the same as the last edit time [23:14:09] yeah. it's a column in the page table [23:15:40] although I think it changes when a transcluded template is updated too [23:16:33] bd808: nah, that's good enough [23:16:43] bd808: it's way faster too =) [23:33:06] PROBLEM - Puppet run on tools-grid-master is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [23:35:39] !log tools.meetbot Submitted request for bot cloak [23:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.meetbot/SAL [23:35:50] !log tools.jouncebot Submitted request for bot cloak [23:35:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.jouncebot/SAL [23:38:07] RECOVERY - Puppet run on tools-grid-master is OK: OK: Less than 1.00% above the threshold [0.0]
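The approach discussed above (find the newest rev_timestamp for each rev_page, walk rev_page in ranges of about 1000, then bucket pages by age) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested Quarry query or grid job: it runs against an in-memory SQLite table with made-up rows standing in for the revision table, the helper names (`last_edit_per_page`, `age_histogram`, `demo_connection`) are hypothetical, and the 30-day bucket width is arbitrary. Only the rev_page and rev_timestamp columns come from the conversation; timestamps are assumed to be MediaWiki-style `YYYYMMDDHHMMSS` strings.

```python
import sqlite3
from collections import Counter
from datetime import datetime


def parse_mw_timestamp(ts):
    """Parse a MediaWiki-style YYYYMMDDHHMMSS timestamp string."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def last_edit_per_page(conn, batch_size=1000):
    """Yield (rev_page, newest rev_timestamp), scanning rev_page in ranges."""
    (max_page,) = conn.execute("SELECT MAX(rev_page) FROM revision").fetchone()
    if max_page is None:
        return  # empty table, nothing to yield
    start = 0
    while start <= max_page:
        # One bounded query per rev_page range keeps each query short.
        rows = conn.execute(
            "SELECT rev_page, MAX(rev_timestamp) FROM revision "
            "WHERE rev_page >= ? AND rev_page < ? GROUP BY rev_page",
            (start, start + batch_size),
        )
        yield from rows
        start += batch_size


def age_histogram(conn, now, bucket_days=30, batch_size=1000):
    """Count pages per age bucket; bucket k covers ages [k*bucket_days, (k+1)*bucket_days)."""
    hist = Counter()
    for _page, ts in last_edit_per_page(conn, batch_size):
        age_days = (now - parse_mw_timestamp(ts)).days
        hist[age_days // bucket_days] += 1
    return dict(hist)


def demo_connection():
    """Build a tiny stand-in for the revision table (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revision (rev_page INTEGER, rev_timestamp TEXT)")
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?)",
        [
            (1, "20161101000000"),
            (1, "20161115000000"),
            (2, "20160101000000"),
            (1500, "20161116000000"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(age_histogram(demo_connection(), now=datetime(2016, 11, 16)))
```

If page.page_touched is close enough, as suggested above, the per-page MAX step is unnecessary and the same bucketing could be applied to that column directly, keeping in mind the caveat that it can also change when a transcluded template is updated.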