[00:21:01] hey guys [00:21:16] Guest88196: Hi [00:21:31] Allah is doing [00:21:42] sun is not doing allah is doing [00:21:44] to accept Islam say that i bear witness that there is no deity worthy of worship except Allah and Muhammad peace be upon him is his slave and messenger [04:02:37] 06Labs, 10Tool-Labs, 10community-labs-monitoring: Implement a system to monitor tools on tool-labs - https://phabricator.wikimedia.org/T53434#2951500 (10Matthewrbowker) Hello! My apologies for the delay. Based on this information, I'm going to split this task into two parts. First part will be just for... [05:57:23] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/I JethroBT was created, changed by I JethroBT link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/I_JethroBT edit summary: Created page with "{{Tools Access Request |Justification=I'll be working with [[User:Jmorgan]] in maintaining and updating GrantsBot whose files are hosted on Tool Labs. See [https://github.com..." [07:22:57] !log shinken Bringing back shinken post T154336 [07:22:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Shinken/SAL [07:22:59] T154336: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336 [07:32:25] taxonbot.dwl.eqiad.wmflabs:/home/taxonbot <- is the restoring process of yesterday still running? I'm missing still some files. [07:33:52] checking [07:34:11] doctaxon: ah yes it is still running [07:34:52] can you say, how long it lasts, so about? [07:35:34] doctaxon: yours is the longest process so far - i can't really say because i'm not sure how many more files are pending [07:35:55] okay thanks [07:37:02] doctaxon: if you have access, you can follow along the restore by doing tail -f /var/log/the-great-debucketing [07:37:55] tail to what? 
[07:41:00] 10Tool-Labs-tools-LTA-Knowledgebase: Create password change function - https://phabricator.wikimedia.org/T155675#2951700 (10Samtar) @Legoktm not really, I was modelling the account request/login structure off of UTRS - to be honest I don't understand how OAuth would be used in a tool where you must first request... [07:41:05] to see logs of the home restore happening [07:43:17] 10Tool-Labs-tools-LTA-Knowledgebase: Require confirmation diff on account request - https://phabricator.wikimedia.org/T155704#2951704 (10Samtar) [07:44:57] no such file or directory [07:45:54] legoktm: if/when you're about I'd appreciate any information you have on how I could use OAuth - I'm not opposed to it, I'm just yet to use it anywhere so don't really understand it! [07:59:57] madhuvishy: can you give me back 3 important files manually, so that I can work on? [08:06:53] doctaxon: not really, sorry [08:09:53] no prob [09:24:58] 06Labs: Request creation of wikidata-federation labs project - https://phabricator.wikimedia.org/T154659#2951928 (10WMDE-leszek) @Andrew @chasemp Thanks! [09:54:54] 06Labs, 10Labs-Infrastructure, 10DBA, 13Patch-For-Review: Create a cronjob/check to run check_private_data data script and report back - https://phabricator.wikimedia.org/T153680#2952014 (10Marostegui) I would like to get this deployed by Monday so we can watch its behaviour during the week - worst case sc... [10:11:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [11:16:02] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [11:52:19] New wiki has published [11:53:16] Watch out for http error codes [11:54:04] Danger! 
DO NOT USE THIS WIKI [11:55:00] If an channel operator might be stupid [11:58:33] Try reconnecting again [11:59:17] Try disabling your web cam [12:07:01] !log quarry run chown -R 998:998 quarry/ on labstore1004 [12:07:02] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [12:07:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL [14:17:03] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:36:19] !log restarting apache/puppetmaster on labcontrol1001 to try to fix 'invalid byte sequence in US-ASCII' error [14:36:19] Unknown project "restarting" [14:36:31] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/I JethroBT was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1350711 edit summary: [15:03:26] 10Tool-Labs-tools-LTA-Knowledgebase: Localization - https://phabricator.wikimedia.org/T155738#2952841 (10Samtar) [15:14:39] 06Labs, 10wikitech.wikimedia.org, 05MW-1.29-release-notes, 13Patch-For-Review, 05WMF-deploy-2017-01-17_(1.29.0-wmf.8): LinksUpdate::acquirePageLock error with SMW enabled - https://phabricator.wikimedia.org/T153618#2952882 (10scfc) The job queue seems to be steadily decreasing now; @bd808 (thank you), do... [16:08:04] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [16:43:28] jynus: Got a minute? Mind taking a look at https://phabricator.wikimedia.org/T155529 if it makes any sense? ;-) [16:46:54] is the restoring process of yesterday still working? any idea, when it will be finished? [16:51:45] doctaxon: Are you on the labs mailing list? [16:52:12] yes [16:52:30] https://lists.wikimedia.org/pipermail/labs-l/2017-January/004863.html was the last update [16:53:30] So most of it should be working, except that list. 
If something isn't working for you, you should be specific about it
[16:54:13] this mail is from 7:20 utc
[16:55:09] i think i have to wait furthermore
[16:55:30] we're not on that list, though
[16:55:44] Wait for what doctaxon? Is your instance on that list?
[16:56:38] taxonbot.dwl.eqiad.wmflabs:/home/taxonbot <- my instance
[16:57:32] i think, data.eqiad.wmflabs mentions it on the mail
[16:58:54] but there are coming files bit by bit, so I think, the process is still running. i only asked for a circa duration
[17:05:05] PROBLEM - Puppet run on tools-exec-1221 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[17:09:38] Labs: Request deletion of mediawiki-core-team labs project - https://phabricator.wikimedia.org/T155748#2953258 (bd808)
[17:10:17] Labs: Request deletion of mediawiki-core-team labs project - https://phabricator.wikimedia.org/T155748#2953275 (bd808) It took years, but the project is finally empty so lets kill it before I accidentally add new VMs there.
[17:13:02] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:22:45] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2953307 (Andrew)
[17:22:48] Labs: Request deletion of mediawiki-core-team labs project - https://phabricator.wikimedia.org/T155748#2953304 (Andrew) Open>Resolved a:Andrew done
[17:38:50] Tool-Labs-tools-LTA-Knowledgebase: Simplify editing method - https://phabricator.wikimedia.org/T155751#2953366 (DatGuy)
[17:39:20] Tool-Labs-tools-LTA-Knowledgebase: Finish editing method - https://phabricator.wikimedia.org/T155338#2953394 (DatGuy)
[17:39:22] Tool-Labs-tools-LTA-Knowledgebase: Simplify editing method - https://phabricator.wikimedia.org/T155751#2953393 (DatGuy)
[17:41:28] Tool-Labs-tools-LTA-Knowledgebase: Require confirmation diff on account request - https://phabricator.wikimedia.org/T155704#2951704 (DatGuy) A goal of just converting to OAuth could be more helpful. OAuth confirmation would lead to the "requested account" class. The way described in the description is ineffi...
[17:43:18] doctaxon: only two instances are still being restored
[17:43:39] yours, and pole.wikidata-query
[17:45:06] RECOVERY - Puppet run on tools-exec-1221 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:47:38] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Facilitate Volunteer NDA application process for potential Tool Labs standards committee appointees - https://phabricator.wikimedia.org/T154625#2953467 (Aklapper)
[17:47:43] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Facilitate Volunteer NDA application process for potential Tool Labs standards committee appointees - https://phabricator.wikimedia.org/T154625#2918138 (Aklapper)
[17:47:48] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Facilitate Volunteer NDA application process for potential Tool Labs standards committee appointees - https://phabricator.wikimedia.org/T154625#2918138 (Aklapper)
[17:55:24] madhuvishy: what about the missing files?
[17:55:45] autoarchiv0.tcl is still missing
[17:55:58] and rc.tcl
[17:57:15] doctaxon: there seems to be a directory called cat-db that's being restored (this process has been running for about 20 hours now and still going)
[17:59:07] ya, cat-db.tcl is missing too
[18:00:32] oh, i misunderstood your comment 17:43 utc here
[18:00:38] sorry
[18:00:51] it's still restoring
[18:01:02] okay
[18:03:01] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Facilitate Volunteer NDA application process for potential Tool Labs standards committee appointees - https://phabricator.wikimedia.org/T154625#2953532 (bd808)
[18:03:57] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Facilitate Volunteer NDA application process for potential Tool Labs standards committee appointees - https://phabricator.wikimedia.org/T154625#2918138 (bd808) Open>Resolved
[18:04:01] Labs, Tool-Labs, Community-Tech-Tool-Labs, Developer-Relations, and 4 others: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#2953538 (bd808)
[18:08:38] Labs, Tool-Labs, Community-Tech-Tool-Labs, Developer-Relations, and 4 others: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#2953580 (bd808) The final step in this initial process is for me to write up the outcome of the various votes and poll...
[18:09:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[18:20:12] Labs, SyntaxHighlight, wikitech.wikimedia.org: Extension:SyntaxHighlight_GeSHi reports for pages with syntax highlighting errors are bogus - https://phabricator.wikimedia.org/T153616#2953647 (bd808)
[18:20:14] Labs, wikitech.wikimedia.org, MW-1.29-release-notes, Patch-For-Review, and 2 others: LinksUpdate::acquirePageLock error with SMW enabled - https://phabricator.wikimedia.org/T153618#2953644 (bd808) Open>Resolved a:bd808 New jobs seem to be moving through the queue. There are a still a...
[18:25:36] Labs, MediaWiki-Vagrant, User-Ladsgroup, User-bd808: Vagrant 1.9.1 provision failure on Trusty using role::labs:mediawiki_vagrant - https://phabricator.wikimedia.org/T155196#2953671 (bd808) a:bd808
[18:37:43] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2953719 (yuvipanda) Update: Since I'll be travelling on the 25th, I'm going to push this out to early February instead. I'll ping @jcrespo when he's...
[18:44:56] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2953763 (jcrespo) +1, let's meet before to clarify impact.
[18:52:02] PROBLEM - Puppet run on tools-puppetmaster-02 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [0.0]
[18:57:05] RECOVERY - Puppet run on tools-puppetmaster-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:57:16] PAWS: PAWS lacks packages needed to use the Pandas to_clipboard function - https://phabricator.wikimedia.org/T155763#2953806 (Neil_P._Quinn_WMF)
[19:14:04] PAWS: PAWS lacks packages needed to use the Pandas to_clipboard function - https://phabricator.wikimedia.org/T155763#2953900 (yuvipanda) Unfortunately even if you do install them it won't work - `to_clipboard` will only work when the python process is running on *your* computer, while with PAWS the python pr...
[19:29:20] PAWS: PAWS lacks packages needed to use the Pandas to_clipboard function - https://phabricator.wikimedia.org/T155763#2953951 (Neil_P._Quinn_WMF) >>! In T155763#2953900, @yuvipanda wrote: > Unfortunately even if you do install them it won't work - `to_clipboard` will only work when the python process is runni...
[19:30:00] PAWS: Pandas to_clipboard function does not work in remote environments like PAWS - https://phabricator.wikimedia.org/T155763#2953954 (Neil_P._Quinn_WMF) p:Triage>Low
[20:02:56] (CR) BryanDavis: [C: 2] Update for SSH key management and password change [labs/striker/staticfiles] - https://gerrit.wikimedia.org/r/328701 (owner: BryanDavis)
[20:03:03] (CR) BryanDavis: [C: 2] Add js and css for password strength meter [labs/striker/staticfiles] - https://gerrit.wikimedia.org/r/329019 (https://phabricator.wikimedia.org/T153935) (owner: BryanDavis)
[20:03:04] (Merged) jenkins-bot: Update for SSH key management and password change [labs/striker/staticfiles] - https://gerrit.wikimedia.org/r/328701 (owner: BryanDavis)
[20:03:09] (Merged) jenkins-bot: Add js and css for password strength meter [labs/striker/staticfiles] - https://gerrit.wikimedia.org/r/329019 (https://phabricator.wikimedia.org/T153935) (owner: BryanDavis)
[20:03:20] (CR) BryanDavis: [C: 2] Add wheels for sshpubkeys; upgrade cryptography [labs/striker/wheels] - https://gerrit.wikimedia.org/r/328711 (https://phabricator.wikimedia.org/T144711) (owner: BryanDavis)
[20:03:27] (Merged) jenkins-bot: Add wheels for sshpubkeys; upgrade cryptography [labs/striker/wheels] - https://gerrit.wikimedia.org/r/328711 (https://phabricator.wikimedia.org/T144711) (owner: BryanDavis)
[20:04:04] (CR) BryanDavis: [C: 2] Bump wheels submodule for SSH public key management [labs/striker/deploy] - https://gerrit.wikimedia.org/r/328715 (https://phabricator.wikimedia.org/T144711) (owner: BryanDavis)
[20:04:10] (Merged) jenkins-bot: Bump wheels submodule for SSH public key management [labs/striker/deploy] - https://gerrit.wikimedia.org/r/328715 (https://phabricator.wikimedia.org/T144711) (owner: BryanDavis)
[20:21:41] Tool-Labs-tools-Pageviews: Provide a yearly "Data type" option for topviews - https://phabricator.wikimedia.org/T154446#2954154
(10EdErhart-WMF) I can confirm that this would be very useful, as I got several requests for similar data after the Wikimedia blog published a 'most-read English Wikipedia articles of... [20:23:51] (03PS1) 10BryanDavis: Bump static and striker submodules [labs/striker/deploy] - 10https://gerrit.wikimedia.org/r/333095 (https://phabricator.wikimedia.org/T144712) [20:26:00] (03CR) 10BryanDavis: [C: 032] Bump static and striker submodules [labs/striker/deploy] - 10https://gerrit.wikimedia.org/r/333095 (https://phabricator.wikimedia.org/T144712) (owner: 10BryanDavis) [20:26:06] (03Merged) 10jenkins-bot: Bump static and striker submodules [labs/striker/deploy] - 10https://gerrit.wikimedia.org/r/333095 (https://phabricator.wikimedia.org/T144712) (owner: 10BryanDavis) [21:14:00] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:21:22] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:21:49] ^ not sure if related but getting timeouts SSHing in [21:22:09] hello [21:22:10] we're looking [21:22:11] PROBLEM - High iowait on tools-grid-master is CRITICAL: CRITICAL: tools.tools-grid-master.cpu.total.iowait (>22.22%) [21:22:15] PROBLEM - Puppet run on tools-checker-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:22:20] yuvipanda: <3 [21:22:31] PROBLEM - High iowait on tools-exec-gift is CRITICAL: CRITICAL: tools.tools-exec-gift.cpu.total.iowait (>20.00%) [21:22:40] you guys don't get enough credit, so thanks :-) [21:22:57] PROBLEM - Puppet run on tools-worker-1008 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:23:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:23:26] PROBLEM - Puppet run on tools-webgrid-lighttpd-1401 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:26:04] RECOVERY - ToolLabs Home Page on 
toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.027 second response time [21:26:24] PROBLEM - Puppet run on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:27:12] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:29:28] PROBLEM - Puppet run on tools-static-10 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:30:18] PROBLEM - Puppet run on tools-worker-1015 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:31:21] PROBLEM - Puppet run on tools-worker-1004 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:31:41] PROBLEM - Puppet run on tools-worker-1022 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:32:11] RECOVERY - High iowait on tools-grid-master is OK: OK: All targets OK [21:32:31] RECOVERY - High iowait on tools-exec-gift is OK: OK: All targets OK [21:35:37] PROBLEM - Puppet run on tools-worker-1003 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:35:38] PROBLEM - Puppet run on tools-worker-1014 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:36:23] PROBLEM - Puppet run on tools-docker-builder-03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:37:02] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 531 bytes in 0.009 second response time [21:37:16] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:38:08] PROBLEM - Puppet run on tools-worker-1023 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:38:14] PROBLEM - High iowait on tools-grid-master is CRITICAL: CRITICAL: tools.tools-grid-master.cpu.total.iowait 
(>30.00%) [21:39:30] PROBLEM - Puppet run on tools-static-11 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:39:36] PROBLEM - Puppet run on tools-exec-1414 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:40:02] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [21:40:08] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:40:12] PROBLEM - Puppet run on tools-webgrid-lighttpd-1206 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:40:26] PROBLEM - Puppet run on tools-checker-02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:40:32] PROBLEM - Puppet run on tools-worker-1007 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:40:41] PROBLEM - Puppet run on tools-exec-1417 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:40:41] PROBLEM - Puppet run on tools-exec-1416 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:40:43] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:40:51] PROBLEM - Puppet run on tools-worker-1013 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:41:13] Is tools down as well now? 
[21:41:17] PROBLEM - Puppet run on tools-worker-1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:41:25] RECOVERY - Puppet run on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [21:41:39] PROBLEM - Puppet run on tools-mail-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:41:53] PROBLEM - Puppet run on tools-worker-1002 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:42:21] Zppix: yes [21:43:33] PROBLEM - Puppet run on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:44:07] yuvipanda i just love the spam [21:44:20] Zppix: the spam loves you too! [21:44:23] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:44:29] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:44:39] PROBLEM - Puppet run on tools-exec-1406 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:44:51] PROBLEM - Puppet run on tools-exec-1217 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:45:13] PROBLEM - Puppet run on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:45:17] PROBLEM - Puppet run on tools-worker-1009 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:47:14] yuvipanda as long as i dont have to eat it [21:47:15] PROBLEM - Puppet run on tools-worker-1016 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:47:15] PROBLEM - Puppet run on tools-bastion-02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:47:15] PROBLEM - Puppet run on tools-exec-gift is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:47:16] PROBLEM - Puppet run on tools-k8s-master-01 is CRITICAL: CRITICAL: 
50.00% of data above the critical threshold [0.0] [21:47:17] PROBLEM - Puppet run on tools-exec-1412 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:47:18] PROBLEM - Puppet run on tools-worker-1018 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:47:18] PROBLEM - Puppet run on tools-worker-1017 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:47:18] PROBLEM - Puppet run on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:47:19] PROBLEM - Puppet run on tools-worker-1012 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:47:19] PROBLEM - Puppet run on tools-webgrid-lighttpd-1210 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:47:19] PROBLEM - Puppet run on tools-webgrid-lighttpd-1209 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:47:51] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:47:58] PROBLEM - Puppet run on tools-exec-1418 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:48:14] PROBLEM - Puppet run on tools-webgrid-lighttpd-1205 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:48:14] RECOVERY - High iowait on tools-grid-master is OK: OK: All targets OK [21:48:22] PROBLEM - Puppet run on tools-exec-1410 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:48:23] PROBLEM - Puppet run on tools-mail is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:48:24] PROBLEM - Puppet run on tools-worker-1011 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:49:56] PROBLEM - Puppet run on tools-worker-1020 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:50:06] PROBLEM - Puppet run on tools-grid-master is CRITICAL: 
CRITICAL: 22.22% of data above the critical threshold [0.0] [21:50:10] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:50:56] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:51:12] PROBLEM - Puppet run on tools-cron-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:52:00] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.036 second response time [21:57:13] RECOVERY - Puppet run on tools-checker-01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:58:25] RECOVERY - Puppet run on tools-webgrid-lighttpd-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [21:59:21] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0] [22:02:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [22:02:27] Zppix: can you try now? should be back [22:02:30] bd808__: ^ [22:03:00] RECOVERY - Puppet run on tools-worker-1008 is OK: OK: Less than 1.00% above the threshold [0.0] [22:03:05] yuvipanda: bastion-02 is working for me and stashbot came back to life [22:03:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [22:03:48] bd808 do i have to restart my kubectl stuff? [22:04:30] RECOVERY - Puppet run on tools-static-10 is OK: OK: Less than 1.00% above the threshold [0.0] [22:04:31] Zppix: not sure. 
stashbot was self-healing [22:05:24] Zppix: if it's not coming back for you then I'd try restarting manually [22:05:42] RECOVERY - Puppet run on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:06:14] !log deployment-prep added nuria to deploy-service group on deployment-tin [22:06:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL [22:06:23] RECOVERY - Puppet run on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [22:06:40] RECOVERY - Puppet run on tools-worker-1022 is OK: OK: Less than 1.00% above the threshold [0.0] [22:07:04] bd808 okay a different question was kubectl affected [22:07:36] kubectl is just the cli api wrapper to talking to kubernetes. you need to be more specific [22:08:19] correction: kubernetes [22:08:33] i abbreviate kubernetes to kubectl cause its easier for me to remember :P [22:08:40] k8s :) [22:08:58] RECOVERY - Puppet run on tools-worker-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [22:09:18] containers would have lost NFS while the server failed over. that may or may not have crashed a particular container [22:09:30] RECOVERY - Puppet run on tools-worker-1014 is OK: OK: Less than 1.00% above the threshold [0.0] [22:09:38] if the whole container crashed then it should be automatically restarted [22:10:00] if the app in the container just freaked out but did not die then you will probably need a manual restart [22:10:18] RECOVERY - Puppet run on tools-worker-1015 is OK: OK: Less than 1.00% above the threshold [0.0] [22:10:27] so TL;DR is "maybe" [22:11:00] !log deployment-prep added bunch of others to the same group per request. 
we should figure out how to make this process sane somehow [22:11:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL [22:11:22] RECOVERY - Puppet run on tools-docker-builder-03 is OK: OK: Less than 1.00% above the threshold [0.0] [22:11:38] Krenair: did you implement a 3 day wait? [22:11:49] Krenair: service-group? Or is that how we already set it up? [22:11:59] yuvipanda: :) [22:12:04] bd808, deploy-service [22:12:07] local group [22:12:27] would be admin module controlled but we don't run that in labs [22:12:38] on my striker project I just set the group to wikidev in hiera I think [22:12:59] 10Tool-Labs-tools-LTA-Knowledgebase: Create password change function - https://phabricator.wikimedia.org/T155675#2954530 (10Samtar) 05Open>03Resolved [22:13:07] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-LTA-Knowledgebase: tools.lta missing replica.my.cnf - https://phabricator.wikimedia.org/T155317#2954532 (10Samtar) [22:13:09] 10Tool-Labs-tools-LTA-Knowledgebase: Finish editing method - https://phabricator.wikimedia.org/T155338#2954531 (10Samtar) 05Open>03Resolved [22:14:23] some time ago, tools-exec-gift/the giftbot queue could hold 200 simultaneous tasks of a job array (or simply speaking: 200 jobs), right now there are 25. did something change? should i just expect less? 
[22:14:29] RECOVERY - Puppet run on tools-static-11 is OK: OK: Less than 1.00% above the threshold [0.0] [22:14:32] bd808 i found an answer to my question incase someone else asks some things will require a pod deletion and webservices on k8s will need to have webservice restart ran for good measure [22:14:33] RECOVERY - Puppet run on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [22:14:55] annika the servers for tools just got back to normal and still are doing so [22:15:07] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:15] RECOVERY - Puppet run on tools-webgrid-lighttpd-1206 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:19] yes, but this is without outages, too [22:15:33] RECOVERY - Puppet run on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:39] RECOVERY - Puppet run on tools-exec-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:39] RECOVERY - Puppet run on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:41] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:49] RECOVERY - Puppet run on tools-worker-1013 is OK: OK: Less than 1.00% above the threshold [0.0] [22:16:03] Zppix: like I said, it depends. 
https://tools.wmflabs.org/versions/ for example is fine with no manual restarts [22:16:40] RECOVERY - Puppet run on tools-mail-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:16:44] bd808 but are they on k8s or the grid [22:16:53] k8s [22:17:00] i run all my stuff for my tool on k8s i recently moved my webservice to k8s [22:17:15] RECOVERY - Puppet run on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0] [22:17:33] my irc bot needed manual intervention i didnt check to see if my did before i restarted, i just went ahead and did both just to be safe [22:17:56] *nod* it won't hurt anything for sure [22:18:09] RECOVERY - Puppet run on tools-worker-1023 is OK: OK: Less than 1.00% above the threshold [0.0] [22:18:11] except in-process requests [22:18:35] bd808: wotcha, thanks for the help the other day - replica.my.cnf all generated :) quick question you may know the answer to, do you know who I'd bother reference getting a mailing list set up for the tool? [22:18:35] RECOVERY - Puppet run on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:05] samtar: I think there is a phabricator project for mailman list requests [22:19:28] samtar: https://phabricator.wikimedia.org/project/view/190/ [22:19:37] RECOVERY - Puppet run on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:44] bd808: you're a star, thanks :) [22:19:55] file a task there and the fine folks who handle mailman will help you out [22:20:06] * bd808 is a star! 
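bd808's rule of thumb from 22:09 to 22:10, about what happens to a Kubernetes container when NFS goes away during a failover, can be sketched as a small decision function. This is only an illustration of the reasoning in the chat: the `ContainerState` names and the `recovery_action` helper are hypothetical and are not part of kubectl, Kubernetes, or any Tool Labs tooling.

```python
from enum import Enum


class ContainerState(Enum):
    """Hypothetical states a container might be in after the NFS failover."""
    UNAFFECTED = "unaffected"  # never noticed the NFS outage
    CRASHED = "crashed"        # the whole container died
    WEDGED = "wedged"          # the app misbehaves but its process is still alive


def recovery_action(state: ContainerState) -> str:
    """Map a container state to the follow-up described in the chat."""
    if state is ContainerState.CRASHED:
        # Per the chat: a fully crashed container should be restarted automatically.
        return "none: restarted automatically"
    if state is ContainerState.WEDGED:
        # Nothing exited, so nothing triggers a restart; the maintainer has to do it.
        return "manual restart"
    return "none: still healthy"


for state in ContainerState:
    print(f"{state.value}: {recovery_action(state)}")
```

For the wedged case, Zppix reports at 22:14:32 that deleting the pod and running webservice restart was what it took for their tool.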
[22:20:27] RECOVERY - Puppet run on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:20:56] RECOVERY - Puppet run on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:21:18] RECOVERY - Puppet run on tools-worker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [22:21:54] RECOVERY - Puppet run on tools-worker-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [22:22:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0] [22:23:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1205 is OK: OK: Less than 1.00% above the threshold [0.0] [22:23:21] RECOVERY - Puppet run on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [22:23:23] RECOVERY - Puppet run on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [22:23:25] RECOVERY - Puppet run on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:29] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:51] RECOVERY - Puppet run on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:11] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:17] RECOVERY - Puppet run on tools-worker-1009 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:44] RECOVERY - Puppet run on tools-worker-1016 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:52] RECOVERY - Puppet run on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:54] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [22:26:10] RECOVERY - Puppet run on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:26:20] RECOVERY - Puppet run on 
tools-exec-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [22:26:28] RECOVERY - Puppet run on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0] [22:26:29] RECOVERY - Puppet run on tools-worker-1018 is OK: OK: Less than 1.00% above the threshold [0.0] [22:26:59] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [22:26:59] RECOVERY - Puppet run on tools-worker-1012 is OK: OK: Less than 1.00% above the threshold [0.0] [22:27:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0] [22:27:51] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [22:28:01] RECOVERY - Puppet run on tools-exec-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [22:29:48] shinken-wm tools was down [22:29:51] we get it [22:29:54] :p [22:29:57] RECOVERY - Puppet run on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0] [22:29:58] ( yes i know its a bot) [22:30:05] RECOVERY - Puppet run on tools-grid-master is OK: OK: Less than 1.00% above the threshold [0.0] [22:40:48] Jamesofur: hi! duplet is down again, please fix ASAP [22:40:58] :-/ [22:41:07] * Jamesofur has no good way to stop that from happening [22:41:17] will restart it as soon as I can [22:41:24] do you know why it stops? [22:41:29] nope [22:41:47] what are the symptoms that you see? [22:43:13] James_F what is duplet first off? [22:43:19] sorry James_F i meant Jamesofur [22:43:23] Jamesofur: and where is its source code? I may be able to run a copy [22:44:48] gry: restarted, it seems to be running out of connections somehow and then killing itself, what I don't know is what is causing it to keep connections open when it shouldn't (or doesn't need them anymore). I'm not sure if it's a build up issue or specific types of queries that are doing it or what. 
[22:45:01] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:45:14] Jamesofur: have you uploaded its source code anywhere? [22:45:34] Zppix: http://tools.wmflabs.org/dupdet/ tool to help detect duplication in two webpages (to help find copyright violations) [22:46:19] 10Tool-Labs-tools-Pageviews, 10Analytics: Disable queries for recent data on stats.grok.se - https://phabricator.wikimedia.org/T155785#2954564 (10Krinkle) [22:46:23] i and another user actively use it for reviewing articles at a wiki [22:46:25] Hi! I am new here, can anyone help me with phabrictor? [22:46:28] gry: yup, I didn't write it originally (just take over since it didn't have a maintainer anymore) but have a copy of the source on https://github.com/jamesryanalexander/Duplication-Detector [22:46:44] being in the middle of review and not being able to do a copyvio check using this wonderful tool is a bit hair-pulling experience :p [22:46:50] thanks :) [22:47:08] I want to spend some time digging into it to find the main issue (there are also some undefined indexes that log errors which I'd really like to just fix ;) so that I don't have to see them in the error logs) [22:47:27] I just haven't had the time to sit down for a weekend and work on it or something :( [22:47:32] if you're interested I'm also very happy to have you added as a maintainer too :) [22:47:41] in no way protective of it! [22:47:56] other then wanting to make sure it doesn't go bye bye [22:49:53] Jamesofur: how do you know that it's the problem with connections and not with something else? 
[22:50:21] if you would like to add me as maintainer, i should be 'gryllida' or 'Svetlana Tkachenko'; given that it's php, i may be able to run a copy within the same account in a subdirectory [22:51:02] gry: nope, certainly can't guarantee that for sure, but I do know that whenever it crashes it does so with something similar to "2017-01-19 15:56:09: (server.c.1444) [note] sockets disabled, connection limit reached " [22:51:04] on wikitech it is my full name as username, and it's Gryllida on every other wiki [22:51:14] ah ok :) that sounds a bit useful [22:53:36] yup, the only other errors in the log are some undefined indexes that should get fixed but shouldn't be causing this [22:54:30] i have a debian laptop at home so i should be able to configure the webserver limit to something ridiculously small and attempt to reproduce it perhaps [22:54:41] granted i figure out which webserver it's running on at present, first [22:56:46] gry: you're added as a maintainer now [23:00:07] 06Labs, 10Tool-Labs: Explain/Investigate low number of giftbot queue jobs - https://phabricator.wikimedia.org/T155789#2954669 (10Giftpflanze) [23:00:56] Jamesofur: thanks, i've added a few URLs to my todo so i'll take a look soon [23:01:07] Jamesofur: how do i restart it when it falls over next time? [23:03:19] gry: login to tools-login, become dupdet, webservice restart [23:03:49] ok , i did the first two, it works, i'll do the third one any time it falls over again [23:03:51] thanks Jamesofur :) [23:04:47] awesome!, course :) [23:07:05] (i added a ~/.description with source link and a description) [23:07:20] (it'll show up at https://tools.wmflabs.org/?list last column soon) [23:08:25] gry it should be almost instant usually [23:08:48] not yet :) [23:09:07] Jamesofur if we have earwig's copyvio tool couldnt we just use that instead of dupedetector? 
[23:09:30] Zppix: dupdet has the advantage of showing context, so scuffed up passages can also be identified [23:09:49] Zppix: gry can speak better about it but my understanding is that there are some features on dupdet that earwig's doesn't have which are useful [23:09:51] yes that :) [23:09:55] and perhaps a personal preference for the interface and analyzing in pairs rather than in percentages [23:10:04] having competing tools is not a bad thing :) [23:11:31] Jamesofur is it running on grid or k8s [23:13:15] it was only after being linked that source URL that i understood that it's not a duplet but is a dupdet, and understood what the name stands for [23:15:09] the .description now works on the list of tools too. woo. :) [23:15:27] gry what are you using for webservice k8s or grid [23:16:00] grid engine with lighttpd [23:16:47] not sure what that does or how to change it. the service.manifest file asks to not edit it by hand. will need to read more docs before touching that :) [23:18:08] gry well next time webservice --backend=kubernetes start is how you change it [23:18:22] i see how that is [23:19:05] now don't ask how to make something non webservice related run on k8s that i have 0 clue, now i can delete a pod and get list of pods but that's about it xD [23:23:42] for now k8s is only webservices [23:24:53] chasemp i run my irc bot using the kubectl commands on shell [23:25:18] ah, I'm not sure that's an intended use case atm but I suppose it works [23:25:55] madhuvishy: did the /home restoring finish? [23:26:45] doctaxon: yup [23:26:54] but I have no permission [23:27:30] chasemp well whatever it works and no one's yelled at me yet xD [23:27:31] doctaxon: what is your username?
[23:27:48] taxonbot [23:27:50] chasemp not to mention lolrrit (before the merge to wm-bot) ran on k8s [23:27:58] doctaxon: it all looks good - drwxr-xr-x 11 taxonbot wikidev 4096 Jan 19 22:32 taxonbot [23:28:02] i just `jsub` my irc bot [23:28:06] brb [23:28:12] Zppix: I'm not mad at you :) cool to see you leveraging k8s [23:28:40] -bash: ./wkat2.tcl: Permission denied [23:29:14] it seems to be in read only mode [23:29:38] doctaxon: try doing chmod +x wkat2.tcl [23:30:11] doctaxon: I'm not sure we made things x if it wasn't already set that way and recovery may have changed exec perms [23:30:30] some permissions may have gotten messed up [23:30:32] yeah [23:31:09] yeah <- ? [23:31:28] was responding to chase's comment [23:31:34] did the chmod work? [23:32:27] yes, but the interpreter redirect doesn't work [23:32:46] but I'll do this tomorrow [23:32:53] have a good nitgh [23:32:55] night [23:32:56] bye [23:32:59] doctaxon: not sure what that means, what was the command? [23:33:00] and thanks [23:33:24] okay [23:33:28] np! [23:34:01] madhuvishy: see -ops [23:34:39] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0] [23:41:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [23:41:42] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [23:51:36] madhuvishy: did you somehow overwrite/delete /data/project on taxonbot.dwl? [23:52:01] 06Labs, 06Operations, 13Patch-For-Review, 07Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2907785 (10Ocaasi_WMF) Having trouble with Wikipedia Library Card platform suddenly. It wasn't on the planned list, but we're getting server errors: Failure: htt... 
[23:52:11] i guess it's mounted but it shouldn't be [23:53:10] i didn't order nfs for that ;) [23:54:00] annika: it's probably not nfs [23:54:17] lrwxrwxrwx 1 root root 35 Jan 18 18:10 /data/project -> /mnt/nfs/labstore-secondary-project [23:54:23] yup [23:54:27] it's a broken symlink [23:54:35] verifying [23:55:00] it's probably a very rare condition to use /data/project not on nfs but in that case madhuvishy I imagine it got clobbered [23:55:02] https://www.irccloud.com/pastebin/hN0TqXHM/ [23:55:12] can modify the repair script to take that on? [23:55:25] chasemp: no, when the home symlinks got created ubiquitously [23:55:31] /data/project did too [23:55:40] just a bunch of broken symlinks to clean up [23:55:46] i hope [23:55:47] sure, but almost no one uses the /data/project path for their non-nfs work [23:55:53] yeah [23:55:55] so it's not damaging per se in nearly all cases [23:55:58] is my thought [23:56:09] i'll check with the restore script anyway [23:56:36] annika: did you have things in /data/project? [23:56:39] i probably should store that somewhere else when i set up that instance with a bigger image again [23:56:43] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [23:56:55] yeah, our interpreter + environment [23:57:00] aah [23:57:03] i could copy it over again [23:57:15] it's a tools leftover [23:57:15] okay if not let me know I'll look into restoring it [23:57:49] i can delete the broken symlink and you can recreate the directory [23:57:51] annika: no wrongdoing on your part I think, just unexpected [23:57:56] yeah [23:57:59] phew [23:59:55] ok, i found the tar file
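For anyone on a non-NFS instance wanting to check for the kind of dangling link seen above (/data/project pointing at a mount that is not there), a rough sketch in Python follows. It is not the repair or restore script mentioned in the channel; the function name find_broken_symlinks is made up for illustration, and it only reports links, it does not delete anything.

```python
import os


def find_broken_symlinks(root):
    """Return a sorted list of symlinks at or under root whose target is missing.

    Hypothetical helper for illustration only; it is not part of any Labs tooling.
    """
    # The starting path itself may be the dangling link (as with /data/project).
    if os.path.islink(root) and not os.path.exists(root):
        return [root]

    broken = []
    # followlinks=False keeps the walk from descending into symlinked directories.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

Calling find_broken_symlinks("/data/project") on an instance in the state described above would be expected to return the link itself; calling it on a parent directory lists every dangling link below it, which can then be reviewed by hand before removing anything.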