[00:29:51] so should adminbot run on exec nodes or on login.. if, as i assume, exec nodes are the correct place, let's just remove the package from the login box [00:30:09] just for less confusion where it runs.. because they are also different distro version [00:30:20] and it was confusing which one we even need to build the package for [00:30:47] precise or trusty [00:31:08] nothing should run on login nodes [00:31:59] i agree. i call it the fenari effect [00:33:14] "snitch", pircbot [00:34:47] ok, so regarding adminbot which i care for right now [00:34:51] modules/toollabs/manifests/exec_environ.pp: 'adminbot', [00:35:11] so puppetized to be on exec nodes.. right.. then i know what for [00:35:41] what to use when changing the package files [00:38:25] still doesnt mean i saw the actual process called "adminbot" running on either of them [00:38:42] tried just one random exec node though.. how do i tell [01:11:06] 10Tool-Labs-tools-Morebots, 5Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1443724 (10Dzahn) https://gerrit.wikimedia.org/r/#/c/180890/ (add URL to log output) https://gerrit.wikimedia.org/r/#/c/223735/ (deb package 1.7.9) merged bo... [01:13:40] 10Tool-Labs-tools-Morebots, 5Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1443726 (10Dzahn) ``` @carbon:/srv/wikimedia# reprepro ls adminbot adminbot | 1.6.3 | lucid-wikimedia | amd64, source adminbot |... 
[01:25:42] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/WMDE-leszek was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=169834 edit summary: [01:26:38] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Sdia45 was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=169835 edit summary: [01:39:40] Krenair: [01:39:45] root@tools-exec-1203:/etc/apt/sources.list.d# apt-get -s install adminbot [01:39:52] The following packages will be upgraded: adminbot [01:39:57] Inst adminbot [1.7.6] (1.7.9 Wikimedia:12.04/precise-wikimedia [all]) [01:39:57] Conf adminbot (1.7.9 Wikimedia:12.04/precise-wikimedia [all]) [01:41:48] ... [01:41:52] chmod: cannot access `/usr/lib/adminbot/README': No such file or directory [01:42:00] dpkg: error processing adminbot (--configure): subprocess installed post-installation script returned error exit status 1 [01:42:03] Errors were encountered while processing: adminbot [01:42:06] ... :p [01:43:02] why are you sending me this? [01:43:28] I don't know much about adminbot [01:44:43] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 20.00% of data above the critical threshold [0.0] [01:45:01] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL 20.00% of data above the critical threshold [0.0] [01:45:05] Krenair: i'm confused. 
should have been valhallasw`cloud [01:45:31] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 20.00% of data above the critical threshold [0.0] [01:45:31] PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL 50.00% of data above the critical threshold [0.0] [01:45:37] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1204 is CRITICAL 30.00% of data above the critical threshold [0.0] [01:45:51] PROBLEM - Puppet failure on tools-exec-1208 is CRITICAL 40.00% of data above the critical threshold [0.0] [01:46:17] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1210 is CRITICAL 33.33% of data above the critical threshold [0.0] [01:46:33] PROBLEM - Puppet failure on tools-precise-dev is CRITICAL 30.00% of data above the critical threshold [0.0] [01:46:36] 10Tool-Labs-tools-Morebots, 5Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1443758 (10Dzahn) so i made a mistake being in the wrong environment on the repo host when using reprepro commands which meant it did not regenerate indices whic... [01:47:30] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 11.11% of data above the critical threshold [0.0] [01:48:11] 10Tool-Labs-tools-Morebots, 5Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1443759 (10Dzahn) ``` chmod: cannot access `/usr/lib/adminbot/README': No such file or directory dpkg: error processing adminbot (--configure): subprocess insta... 
[01:49:38] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL 22.22% of data above the critical threshold [0.0] [01:52:42] PROBLEM - Puppet failure on tools-exec-1209 is CRITICAL 30.00% of data above the critical threshold [0.0] [01:53:14] PROBLEM - Puppet failure on tools-exec-1219 is CRITICAL 11.11% of data above the critical threshold [0.0] [01:53:37] PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL 44.44% of data above the critical threshold [0.0] [01:53:43] PROBLEM - Puppet failure on tools-exec-1214 is CRITICAL 40.00% of data above the critical threshold [0.0] [01:53:59] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL 60.00% of data above the critical threshold [0.0] [01:54:24] 10Tool-Labs-tools-Morebots, 5Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1443760 (10Dzahn) ``` cd /usr/lib/adminbot touch README apt-get remove adminbot apt-get install adminbot root@tools-exec-1203:/usr/lib/adminbot# dpkg -l | grep... 
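The postinst failure quoted above (a `chmod` on a README that was not there) was worked around by hand with `touch README` followed by a remove and reinstall. A minimal sketch of how a maintainer script could avoid the abort is shown below. It assumes the script simply calls `chmod` on a fixed path; the real adminbot postinst is not shown in the log, and `chmod_if_present` is a hypothetical helper name.

```shell
# Hypothetical postinst-style helper; the actual adminbot maintainer
# script does not appear in this log.

# chmod a file only if it exists, so a missing optional file
# does not make the script exit with a non-zero status.
chmod_if_present() {
    mode="$1"
    path="$2"
    if [ -e "$path" ]; then
        chmod "$mode" "$path"
    else
        echo "skipping missing file: $path" >&2
    fi
}

# Example call (path taken from the error message in the log):
#   chmod_if_present 644 /usr/lib/adminbot/README
```

With a guard like this, a missing README would only produce a note on stderr instead of failing `dpkg --configure`.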
[01:54:48] 10Tool-Labs-tools-Morebots, 5Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1443762 (10Dzahn) 5Open>3Resolved a:3Dzahn [01:55:12] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL 44.44% of data above the critical threshold [0.0] [01:55:14] PROBLEM - Puppet failure on tools-submit is CRITICAL 22.22% of data above the critical threshold [0.0] [01:56:22] PROBLEM - Puppet failure on tools-exec-1213 is CRITICAL 60.00% of data above the critical threshold [0.0] [01:56:40] PROBLEM - Puppet failure on tools-exec-1217 is CRITICAL 40.00% of data above the critical threshold [0.0] [01:56:58] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1209 is CRITICAL 30.00% of data above the critical threshold [0.0] [01:57:46] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL 60.00% of data above the critical threshold [0.0] [01:57:50] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 60.00% of data above the critical threshold [0.0] [01:57:52] PROBLEM - Puppet failure on tools-exec-1205 is CRITICAL 40.00% of data above the critical threshold [0.0] [01:57:56] PROBLEM - Puppet failure on tools-exec-1206 is CRITICAL 40.00% of data above the critical threshold [0.0] [01:58:28] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1205 is CRITICAL 55.56% of data above the critical threshold [0.0] [01:59:09] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 44.44% of data above the critical threshold [0.0] [01:59:33] PROBLEM - Puppet failure on tools-exec-1218 is CRITICAL 60.00% of data above the critical threshold [0.0] [01:59:57] PROBLEM - Puppet failure on tools-exec-1201 is CRITICAL 11.11% of data above the critical threshold [0.0] [02:00:03] PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 40.00% of data above the critical threshold [0.0] [02:00:27] PROBLEM - Puppet failure on tools-exec-gift is CRITICAL 66.67% of data above the critical 
threshold [0.0] [02:01:58] PROBLEM - Puppet failure on tools-exec-wmt is CRITICAL 50.00% of data above the critical threshold [0.0] [02:02:56] PROBLEM - Puppet failure on tools-exec-1210 is CRITICAL 60.00% of data above the critical threshold [0.0] [02:03:00] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1202 is CRITICAL 50.00% of data above the critical threshold [0.0] [02:15:19] 10Tool-Labs-tools-Morebots, 5Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1443798 (10Dzahn) that broke puppet across all exec nodes.. with the help of AndrewBogott applied the fix above on all of them. making puppet happy again. i did... [02:17:04] 6Labs, 6Discovery, 10Maps: WikiMiniAtlas (wma.wmflabs.org) is still down - https://phabricator.wikimedia.org/T104417#1443799 (10dschwen) Is it possible to reboot the maps-wma1 instance while the restoration is ongoing? I wonder when I can start logging back in. [02:19:37] RECOVERY - Puppet failure on tools-exec-1218 is OK Less than 1.00% above the threshold [0.0] [02:20:13] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1201 is OK Less than 1.00% above the threshold [0.0] [02:20:53] !log test [02:20:53] Message missing. Nothing logged. [02:21:03] !log good [02:21:04] Message missing. Nothing logged. 
[02:22:45] RECOVERY - Puppet failure on tools-exec-1215 is OK Less than 1.00% above the threshold [0.0] [02:22:47] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [0.0] [02:22:57] RECOVERY - Puppet failure on tools-exec-1210 is OK Less than 1.00% above the threshold [0.0] [02:23:15] RECOVERY - Puppet failure on tools-exec-1219 is OK Less than 1.00% above the threshold [0.0] [02:23:29] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1205 is OK Less than 1.00% above the threshold [0.0] [02:23:43] RECOVERY - Puppet failure on tools-exec-1214 is OK Less than 1.00% above the threshold [0.0] [02:25:15] RECOVERY - Puppet failure on tools-submit is OK Less than 1.00% above the threshold [0.0] [02:25:29] RECOVERY - Puppet failure on tools-exec-gift is OK Less than 1.00% above the threshold [0.0] [02:26:36] RECOVERY - Puppet failure on tools-exec-1217 is OK Less than 1.00% above the threshold [0.0] [02:27:00] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1209 is OK Less than 1.00% above the threshold [0.0] [02:27:52] RECOVERY - Puppet failure on tools-exec-1205 is OK Less than 1.00% above the threshold [0.0] [02:27:58] RECOVERY - Puppet failure on tools-exec-1206 is OK Less than 1.00% above the threshold [0.0] [02:28:00] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1202 is OK Less than 1.00% above the threshold [0.0] [02:29:56] RECOVERY - Puppet failure on tools-exec-1201 is OK Less than 1.00% above the threshold [0.0] [02:31:16] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold [0.0] [02:31:58] RECOVERY - Puppet failure on tools-exec-wmt is OK Less than 1.00% above the threshold [0.0] [02:34:42] RECOVERY - Puppet failure on tools-exec-1202 is OK Less than 1.00% above the threshold [0.0] [02:35:02] RECOVERY - Puppet failure on tools-exec-1212 is OK Less than 1.00% above the threshold [0.0] [02:35:11] 6Labs, 6Phabricator, 7Puppet: On labs phabricator references 
security extension even though it isn't present - https://phabricator.wikimedia.org/T104904#1443812 (10Negative24) Possibly related: {P934} Only got these errors on a newly created instance. [02:35:34] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [02:35:35] RECOVERY - Puppet failure on tools-exec-1207 is OK Less than 1.00% above the threshold [0.0] [02:35:36] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1204 is OK Less than 1.00% above the threshold [0.0] [02:35:50] RECOVERY - Puppet failure on tools-exec-1208 is OK Less than 1.00% above the threshold [0.0] [02:36:33] RECOVERY - Puppet failure on tools-precise-dev is OK Less than 1.00% above the threshold [0.0] [02:38:39] RECOVERY - Puppet failure on tools-exec-1211 is OK Less than 1.00% above the threshold [0.0] [02:41:21] RECOVERY - Puppet failure on tools-exec-1213 is OK Less than 1.00% above the threshold [0.0] [02:42:43] RECOVERY - Puppet failure on tools-exec-1209 is OK Less than 1.00% above the threshold [0.0] [02:55:47] !log testlabs test log [02:55:50] Logged the message, dummy [02:57:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [0.0] [02:59:39] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [0.0] [05:47:26] 6Labs: Create plagiabot labs project - https://phabricator.wikimedia.org/T105442#1443951 (10eranroz) 3NEW [06:25:42] mutante: ugh. Thanks again for putting time into it [06:27:19] 10Tool-Labs-tools-Morebots, 5Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1444010 (10valhallasw) >>! In T105169#1443798, @Dzahn wrote: > i did restart the morebots copy running in the production channel.. it came back but logging itsel... 
[08:00:30] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-103, 3Labs-Sprint-104, 3Labs-Sprint-105: Labs: increase size of the volume for the maps project and restore - https://phabricator.wikimedia.org/T103358#1444149 (10Kghbln) @coren Affirmative. I just checked if the Maps on e.g. [[ https://de.wikivoyage... [08:51:04] 6Labs, 10Labs-Infrastructure: Tool database p50380g50851__mixnmatch_p no longed with wikidatawiki_p - https://phabricator.wikimedia.org/T102105#1444225 (10Magnus) 5Open>3Resolved a:3Magnus [08:51:52] 6Labs: Cannot ssh into wdq-mm-02.eqiad.wmflabs - https://phabricator.wikimedia.org/T101102#1444227 (10Magnus) 5Open>3Resolved a:3Magnus [09:01:42] 10Quarry: Add a list/table of popular queries - https://phabricator.wikimedia.org/T71266#1444241 (10Edgars2007) @He7d3r don't think this is a good idea to get top N queries by number of times they were executed. When one is testing the query, he probably executes it //to much//. But I support the idea itself to... [09:11:30] 6Labs, 7Tracking: New project: WikidataLDF - https://phabricator.wikimedia.org/T105457#1444248 (10CristianCantoro) 3NEW [09:13:35] that's my fault... ^_^ [10:23:44] 6Labs, 10Tool-Labs: Lost connection to MySQL server during query when executing large query's - https://phabricator.wikimedia.org/T105468#1444393 (10Steinsplitter) 3NEW [10:35:54] 6Labs, 10Tool-Labs: Lost connection to MySQL server during query when executing large query's - https://phabricator.wikimedia.org/T105468#1444440 (10jcrespo) We need more info: which query (give some example)? Which server? Which user? 
[10:38:47] (PS1) Giuseppe Lavagetto: Moving the nodepool mock private data to the appropriate place [labs/private] - https://gerrit.wikimedia.org/r/224045 [10:40:57] (PS2) Giuseppe Lavagetto: Moving the nodepool mock private data to the appropriate place [labs/private] - https://gerrit.wikimedia.org/r/224045 (https://phabricator.wikimedia.org/T105406) [10:41:09] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Moving the nodepool mock private data to the appropriate place [labs/private] - https://gerrit.wikimedia.org/r/224045 (https://phabricator.wikimedia.org/T105406) (owner: Giuseppe Lavagetto) [10:54:11] (PS1) Hashar: Ssh pub key for nodepool [labs/private] - https://gerrit.wikimedia.org/r/224047 [10:59:48] Labs: Create plagiabot labs project - https://phabricator.wikimedia.org/T105442#1444466 (yuvipanda) Can you not just do this on a different account in tool labs? [11:00:45] (CR) Hashar: "Puppet patch is https://gerrit.wikimedia.org/r/#/c/224049/" [labs/private] - https://gerrit.wikimedia.org/r/224047 (owner: Hashar) [11:02:39] (Abandoned) Hashar: Ssh pub key for nodepool [labs/private] - https://gerrit.wikimedia.org/r/224047 (owner: Hashar) [11:16:16] lunch [11:19:54] Labs, Tool-Labs: Lost connection to MySQL server during query when executing large query's - https://phabricator.wikimedia.org/T105468#1444539 (Steinsplitter) >>! In T105468#1444440, @jcrespo wrote: > We need more info: which query (give some example)? cat /data/project/steinsplitter/rig.sql >Which ser... [11:26:30] Quarry: Add a list/table of popular queries - https://phabricator.wikimedia.org/T71266#1444564 (He7d3r) ...maybe the most accessed (i.e. most page views)? [11:32:13] Steinsplitter: jcrespo doesn't have access to tool labs, so it helps if you paste the relevant sql [11:33:50] Steinsplitter: as for the server -- the host you connect to, e.g. enwiki.labsdb. 
The user jcrespo is interested in is the uXXXX value in your replica.cnf [11:52:58] thank you, valhallasw`cloud. I do not own labs, so any clarification is welcome [12:05:19] jynus/valhallasw`cloud: https://phabricator.wikimedia.org/T105468#1444539 i changed the comment [12:05:40] Steinsplitter: thanks [12:16:00] valhallasw`cloud: maybe the querry is too expensive, i don't know. :P [12:16:48] Steinsplitter: if the database server would kill queries automatically, this should be documented (and preferrably have an email sent as with TS's query killer) [12:17:57] ok :) [13:11:43] 6Labs, 3Labs-Sprint-105: Automate snapshots / backups of labstore - https://phabricator.wikimedia.org/T105027#1444718 (10coren) Test run of the snapshot-and-backup code is in progress over the tools filesystem. [13:20:37] the main issue I can see now is that labdb1002 is overloaded and swapping [13:21:14] I wouldn't be surprised if the server was simply killed with an OOM itself [13:26:06] no, the server does not get killed, but it kills the queries using more memory [13:31:09] jynus: If you find users that are outliers in their usage, feel free to tell any of the labs admins and we'll sit down with those users to review their queries and propose alternatives. [13:32:19] Coren, I am going to write some pieces of advice for Steinsplitter to improve his query spead, as he took the time to create a ticket [13:32:38] however, i do not see right now a specific abuse [13:32:45] from any user [13:32:57] jynus: Yeay. That's a *good* thing. :-) [13:33:00] just too many users connecting right now :-) [13:33:15] How are 1001 and 1003? Would spreading the load help? [13:33:16] well, i spend some time to find a way to run it faster. 
but it runs always forever [13:33:30] it is my longest query, all others are really fast :) [13:33:41] logging is always slow :( [13:33:49] no, Steinsplitter, it gets killed without having any specific blocks [13:33:56] let me give you some tips on the ticket [13:34:04] and sorry for the inconveniences [13:34:10] the resouces are limited [13:34:11] thanks, highly appricated :) [13:36:00] 6Labs, 10Tool-Labs: Lost connection to MySQL server during query when executing large query's - https://phabricator.wikimedia.org/T105468#1444809 (10jcrespo) @Steinsplitter: Your query is using too much memory (why it worked before, I suppose that it has to do with the number of records having increased plus t... [13:37:17] Coren, yes, redirecting some queries right now to labsdb1003 will help [13:37:29] to ease labsdb1002 load [13:38:02] http://ganglia.wikimedia.org/latest/?c=MySQL%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [13:39:22] jynus: how you mean store on a temp table? [13:39:47] not really a temp table, a user table in one of your databases [13:40:02] i hate doing that :/ [13:40:04] so you update that instead of doing the full query every time [13:40:19] i run it only once at day [13:45:12] not sure if using logging_logindex make it faster [13:46:04] 6Labs, 10Tool-Labs, 10Wikimania-Hackathon-2015: Workshop: Doing Research on Wikimedia things as a volunteer - tools and communities - https://phabricator.wikimedia.org/T91062#1444829 (10yuvipanda) [13:48:29] 6Labs, 10Tool-Labs, 10Wikimania-Hackathon-2015: Workshop: Doing Research on Wikimedia things as a volunteer - tools and communities - https://phabricator.wikimedia.org/T91062#1444840 (10yuvipanda) @halfak have scheduled this for wednesday 3pm :) [13:48:36] 6Labs, 10Tool-Labs, 10Wikimania-Hackathon-2015: Conduct a Tool Labs Workshop in Wikimania hackathon - https://phabricator.wikimedia.org/T91061#1444841 (10yuvipanda) I've scheduled this for thursday 3pm. 
[13:58:49] 6Labs, 10Tool-Labs: Lost connection to MySQL server during query when executing large query's - https://phabricator.wikimedia.org/T105468#1444861 (10Steinsplitter) >>! In T105468#1444809, @jcrespo wrote: > Where $m and $n are variables so you do not read the same events twice. You can store those on a temporar... [14:24:20] 10Tool-Labs-tools-Other, 7Epic: Convert all Labs tools to use cdnjs for static libraries - https://phabricator.wikimedia.org/T103934#1445012 (10Ricordisamoa) [14:28:27] hi guys! i'm having some problems in deleting-creating tables in my datbase. i'm trying: DROP TABLE IF EXISTS u3532__.itwiki_page_groundtruthwithredirects. it says: unknown table. but rightafter i check and it does exist... [14:33:42] YuviPanda: any idea? [14:33:48] marmick: are you looking on the same server? [14:34:10] i always connect to the same server. [14:34:17] i mean, to the one corresponding to the language. [14:34:24] I can see you table with no problems, and your grants are also ok [14:34:36] in this case: itwiki.labsdb [14:35:16] then, the sql query addresses my database: u3532__ [14:35:45] which i could drop and create, but it gives the error of unknown table [14:35:47] it's very weird [14:37:01] so drop works or does it produce an error? [14:39:52] jynus: u seen me reply? [14:40:22] jynus: drop doesn't work. [14:40:26] actually, i use 'drop if exists'. [14:40:41] it does give the error using command line and python script connection [14:43:01] Steinsplitter, I suppose something could be developed, but not for short term [14:43:31] jynus: :'-( every months something is broken because of tools labs infrastructure. [14:45:18] Steinsplitter, resources are limited and there are over 8000 users. I do not even admin labs but was trying to help with I think reasonable suggestions. 
[14:46:04] wmf has so much money, that shouldn't be a big deal [14:48:19] Steinsplitter: don't bite the hand that feeds you [14:49:30] valhallasw`cloud: it is ironic, you - the wmf gets money because of volunteers' work. [14:50:15] surprise: we don't get paid like the Google admins either [14:51:42] volunteers feeding wmf, so i expect at least a stable infrastructure. [14:52:21] PROBLEM - Puppet failure on tools-exec-1213 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:52:27] PROBLEM - Puppet failure on tools-mailrelay-02 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:52:27] PROBLEM - Puppet failure on tools-exec-1409 is CRITICAL 22.22% of data above the critical threshold [0.0] [14:53:32] admins doing labs support are usually not the people making budget decisions. just sayin' [14:53:44] You're misinterpreting 'a stable infrastructure' as 'I should be able to always run this specific thing'. [14:53:45] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:53:47] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:53:51] so they should stop biting volunteers. [14:54:04] marmick, can you double check the server? SELECT @@global.server_id; [14:54:09] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1404 is CRITICAL 33.33% of data above the critical threshold [0.0] [14:54:11] i do understand the issue with stability, but that's why it has been made a goal to make that better [14:54:16] and that you get an error, not a warning? [14:54:32] i am upset, especially because of valhallasw's reply... 
[14:54:39] that's funny, because I'm not even wmf [14:54:57] PROBLEM - Puppet failure on tools-exec-1410 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:55:15] the wmf provides you with infrastructure, which unfortunately has limits [14:55:29] I can understand it's frustrating if something that works before now is broken [14:55:50] but the problem seems to be the query is too heavy, rather than the infra being a mess [14:55:59] | @@global.server_id | 171975940 | [14:56:01] jynus: [14:56:08] in all likelihood, this query would not have been allowed to run on the actual database servers [14:56:21] (as in: the production ones) [14:56:24] marmick, you are on the right place :-) [14:56:33] because that would impact the wikis themselves [14:57:28] jynus: but still...doesn't work [14:57:35] neither console or python connector [14:57:38] it's weird. it worked before. [14:57:42] i didn't touch anything. [14:57:53] mmm, it smells of corruption [14:58:18] Steinsplitter, I'm getting tired of this. [14:58:22] You realise that labs issues can affect not just volunteers, right? [14:58:50] WMF has engineers which work on labs every day. [14:59:19] marmick, 1) is it MyISAM and 2) can I delete the table, has something valuable? [14:59:53] Don't pretend that random issues with labs are some huge mistreatment of volunteers by WMF [15:00:34] marmick, I can see that it is efectivelly aria [15:01:00] aria and myisam are prone to corruption [15:01:31] 6Labs, 6operations, 10wikitech.wikimedia.org: intermittent wikitech (or nutcracker) failures - https://phabricator.wikimedia.org/T105131#1445155 (10fgiunchedi) [15:01:50] but I think I can delete/repair the table if you confirm me it has no valuable data [15:02:09] We don't have some sort of inherent right to not have issues with using labs [15:02:21] It's technology, sometimes things don't work as we expect, that's reality. 
[15:03:15] 6Labs, 10wikitech.wikimedia.org, 5Patch-For-Review: remove nutcracker from wikitech - https://phabricator.wikimedia.org/T102993#1445174 (10Andrew) (from IRC): godog: any particular reason why we're special casing wikitech with/without nutcracker btw? andrewbogott: godog: mostly because nutcracker has failed... [15:03:53] ^ marmick [15:04:53] jynus: the table is empty. the one i told you. [15:05:05] DROP TABLE IF EXISTS u3532__.itwiki_page_groundtruthwithredirects [15:05:19] 6Labs, 10Labs-Infrastructure, 3Labs-Sprint-105: Investigate keystone lockups - https://phabricator.wikimedia.org/T104884#1445182 (10Andrew) Current theory is that this was not keystone but rather nutcracker, which is T102993 [15:05:28] 6Labs, 10Labs-Infrastructure: Investigate keystone lockups - https://phabricator.wikimedia.org/T104884#1445184 (10Andrew) [15:05:43] 6Labs, 10wikitech.wikimedia.org, 3Labs-Sprint-105, 5Patch-For-Review: remove nutcracker from wikitech - https://phabricator.wikimedia.org/T102993#1379664 (10Andrew) [15:06:17] jynus: you can delete those with this ending: _page_groundtruthwithredirects [15:06:43] ok, it should work now, i think, care to try? [15:07:26] RECOVERY - Puppet failure on tools-exec-1409 is OK Less than 1.00% above the threshold [0.0] [15:07:32] MariaDB [u3532__]> DROP TABLE IF EXISTS u3532__.itwiki_page_groundtruthwithredirects; [15:07:32] ERROR 1051 (42S02): Unknown table 'u3532__.itwiki_page_groundtruthwithredirects' [15:08:59] give me a sec, there are some issue on prod [15:10:28] 6Labs, 6operations, 10wikitech.wikimedia.org: intermittent nutcracker failures - https://phabricator.wikimedia.org/T105131#1445193 (10fgiunchedi) [15:10:30] 6Labs, 3Labs-Sprint-105: Do a manual backup of labstore1002 - https://phabricator.wikimedia.org/T104882#1445195 (10Andrew) I think this is moot since automatic backups are running now. 
[15:12:00] Hi I want to get a query from database but it doesn't work i wrote [15:12:02] jsub -once -N zsql -mem 1g sql fawiki_p encat.txt [15:12:05] or [15:12:12] sql fawiki_p encat.txt [15:12:17] 6Labs, 6operations, 10wikitech.wikimedia.org: intermittent nutcracker failures - https://phabricator.wikimedia.org/T105131#1437053 (10fgiunchedi) note some nutcracker problems have been observed in production too in the past, what was the situation there is unknown though ``` #wikimedia-operations_2015-03.l... [15:12:30] both of them doesnt work [15:12:41] what do they do, and what should they do? [15:13:16] Krenair: they will get a query by sql [15:14:09] 6Labs, 10wikitech.wikimedia.org, 3Labs-Sprint-105, 5Patch-For-Review: remove nutcracker from wikitech - https://phabricator.wikimedia.org/T102993#1445219 (10fgiunchedi) To expand on that, I think that the failures we've seen with nutcracker on silver have manifested in production too, see also {T105131} t... [15:15:56] reza1615: ok, and what happens? [15:16:07] no thing [15:16:18] tools.rezabot@tools-bastion-02:~$ jsub -once -N zsql -mem 1g sql fawiki_p encat.txt tools.rezabot@tools-bastion-02:~$ [15:16:24] also when you just run sql fawiki_p encat.txt on the command line? [15:16:44] yes, the jsub one won't work this way, because the piping (> and <) are local, while the jsub submits the job somewhere else [15:17:03] how can i get [15:17:07] doesn't that query return 0 data? [15:17:15] without jsub it doesn't work also [15:17:22] no [15:17:32] it will send you the query [15:17:42] reza1615: with jsub, use jsub -once N zsql -mem 1g -o encat.txt -i /data/project/rexabot/encat.sql sql fawiki_o [15:17:52] reza1615: but sql on the command line should just work [15:17:54] what's the issue there? 
[15:18:16] do yu know any online nodepade [15:18:26] MariaDB [fawiki_p]> SELECT cl_to FROM categorylinks WHERE cl_from IN (SELECT DISTINCT ll_from FROM langlinks WHERE ll_lang = "fa" ) AND cl_to NOT IN (SELECT DISTINCT page_title FROM langlinks LEFT JOIN page ON page_id = ll_from WHERE ll_lang = "fa" AND page_namespace = 14) GROUP BY cl_to; [15:18:26] Empty set (0.01 sec) [15:18:26] i want to send you the codes [15:19:06] MariaDB [fawiki_p]> works [15:19:09] but [15:19:20] sql fawiki_p encat.txt [15:19:46] doesn't work it worked some month ago [15:21:31] reza1615: as Krenair points out, the query doesn't return anything. [15:21:59] compare echo "select * from page where page_id=0" | sql fawiki_p [15:22:06] to echo "select * from page where page_id=10" | sql fawiki_p [15:22:21] RECOVERY - Puppet failure on tools-exec-1213 is OK Less than 1.00% above the threshold [0.0] [15:22:21] the first one returns nothing, while the second one returns a header and a single entry [15:22:29] RECOVERY - Puppet failure on tools-mailrelay-02 is OK Less than 1.00% above the threshold [0.0] [15:23:06] jynus: i see bigger problems... apparently, the tests i am running are giving wrong results. they depend on other tables from my database which end with _pagelinks. [15:23:11] i'm afraid they got corrupted [15:23:44] RECOVERY - Puppet failure on tools-exec-1215 is OK Less than 1.00% above the threshold [0.0] [15:23:46] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [0.0] [15:24:14] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1404 is OK Less than 1.00% above the threshold [0.0] [15:24:22] jynus: so, for some I can do the drop-create, but then i get wrong results because of the other table. for others, i cannot even start because of the unknown table. 
[15:24:41] ok, marmick, please add a ticket with all the details [15:25:00] RECOVERY - Puppet failure on tools-exec-1410 is OK Less than 1.00% above the threshold [0.0] [15:25:05] and we will try to attend to it as soon as possible [15:25:15] could you give me the address? [15:25:32] marmick: https://phabricator.wikimedia.org [15:25:34] Labs, Labs-Sprint-101, Labs-Sprint-102, Patch-For-Review: Kill off virt1000 - https://phabricator.wikimedia.org/T102005#1445247 (Andrew) [15:25:35] Labs, Labs-Sprint-101, Labs-Sprint-102, Labs-Sprint-105: Sort out remaining virt1000 salt minions - https://phabricator.wikimedia.org/T103010#1445246 (Andrew) Open>Resolved [15:25:38] yes, go to https://phabricator.wikimedia.org/ [15:26:10] and "create ticket" (you may need to create a user beforehand) [15:26:17] valhallasw`cloud: sql enwiki_p encat.txt [15:26:18] works [15:26:28] marmick: tags would be #labs and #database, and please add jcrespo (= jynus) as cc [15:26:29] but the jsub doesn't work [15:26:38] valhallasw`cloud and Krenair: oh, it wasn't meant to be rude (i should add :-) to my comments and improve my english ;o) [15:26:39] yes, please [15:26:40] ok [15:26:42] uff.. [15:26:54] reza1615: yes, because the query does return results for enwiki. 
[15:27:01] reza1615: as for jsub, see my earlier comment [15:27:11] 17:17 reza1615: with jsub, use jsub -once N zsql -mem 1g -o encat.txt -i /data/project/rexabot/encat.sql sql fawiki_o [15:27:23] (so use -o instead of > and -i instead of < ) [15:27:40] hmm thanks [15:35:59] valhallasw`cloud: I wrote http://kl1p.com/tGym [15:36:08] it shows error [15:36:26] also I replaced -o instead of > and -i instead of < and it doesnt work [15:36:34] reza1615: -N, not N [15:37:04] the -o and -i have to be before the 'sql fawiki_p' because otherwise jsub doesn't know the commands are meant for jsub rather than sql [15:40:05] I think I have a problem with puppet running automatically [15:40:11] it's a self-hosted puppet master [15:40:26] doing this: [15:40:29] E: Some index files failed to download. They have been ignored, or old ones used instead. [15:40:29] W: Failed to fetch http://nova.clouds.archive.ubuntu.com/ubuntu/dists/precise/universe/binary-i386/Packages 404 Not Found [IP: 91.189.91.23 80] [15:42:48] valhallasw`cloud: http://kl1p.com/tGym [15:43:06] 17:36 the -o and -i have to be before the 'sql fawiki_p' because otherwise jsub doesn't know the commands are meant for jsub rather than sql [15:44:21] 6Labs, 7Database: Tables corrupted or impossible to work with them - https://phabricator.wikimedia.org/T105503#1445291 (10marcmiquel) [15:44:34] jynus: https://phabricator.wikimedia.org/T105503 [15:47:54] milimetric: that link on ubuntu.com it is looking for to get package updates, it's really 404. it seems either a problem on their side or they stopped supporting those packages for precise [15:48:47] hmm. no wait [15:48:53] http://nova.clouds.archive.ubuntu.com/ubuntu/dists/precise/universe/binary-i386/ [15:54:32] valhallasw`cloud: I did introduce the problem: https://phabricator.wikimedia.org/T105503 [15:54:38] what should I do now? 
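The jsub advice above comes down to three points: the job name flag is `-N`, stdin and stdout are passed with `-i` and `-o` rather than shell redirection, and all jsub options must come before the command to run. A minimal sketch of that ordering is below. `build_jsub_cmd` is a hypothetical helper that only prints the command line, and the file names are placeholders, not the actual paths used in the channel.

```shell
# Hypothetical helper: prints a jsub command line with every jsub option
# placed before the command that the job should run.
# Flags (-once, -N, -mem, -o, -i) are the ones quoted in the log.
build_jsub_cmd() {
    job_name="$1"
    in_file="$2"
    out_file="$3"
    shift 3
    # "$*" is the command to run on the grid, e.g. "sql fawiki_p"
    printf 'jsub -once -N %s -mem 1g -o %s -i %s %s\n' \
        "$job_name" "$out_file" "$in_file" "$*"
}

# Placeholder file names; use the real query and output paths instead.
build_jsub_cmd zsql encat.sql encat.txt sql fawiki_p
```

Printing the command first makes it easy to check the option order before submitting the job for real.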
[15:55:53] marmick: wait :-) jynus will hopefully have time to look at it soon [15:56:16] hopefully today...but during the weekend impossible, right? [15:56:19] valhallasw`cloud: [15:59:44] unfortunately, yeah. [16:03:36] mutante: so I'm not really sure what that means but basically what it looks like is that puppet is not running automatically and /var/log/puppet.log shows that error over and over [16:05:19] milimetric: it's not puppet itself, it's that it's trying to automatically upgrade packages and something needs to be updated about the sources list it is checking to get upgrades from [16:06:11] milimetric: i'd try first to see if it goes away when you comment a line in the file that configures these [16:06:42] in /etc/apt/sources.list.d [16:06:59] there should be files, one of them containing that URL that shows up in the error you pasted [16:07:12] try to comment out that one line, save the file, then run apt-get update [16:07:16] then puppet again [16:18:39] Labs: Create plagiabot labs project - https://phabricator.wikimedia.org/T105442#1445423 (eranroz) Open>Resolved [16:19:07] Labs: Create plagiabot labs project - https://phabricator.wikimedia.org/T105442#1443951 (eranroz) different account in tool labs is just fine. I created one. thanks. [16:32:25] hi [16:33:29] Could somebody help me with one question. I have generated a public key and uploaded it on the OpenStack [16:33:39] but I cant connect [16:33:51] Its pending when I try to do [16:34:03] Ruths-MacBook-Pro:.ssh ruthgarcia$ ssh ruthgavi@tools-login.wmflabs.org The authenticity of host 'tools-login.wmflabs.org (208.80.155.130)' can't be established. ECDSA key fingerprint is SHA256:OfgR6GTw8ObBQ1LbS+6NBVik1eEXrpSUvRkKOueUnQc. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added 'tools-login.wmflabs.org,208.80.155.130' (ECDSA) to the list of known hosts. 
[16:34:24] It doesnt ask me for password [16:34:27] nothing [16:35:19] Hello [16:36:28] mutante: that worked, thank you, puppet runs clean now [16:38:11] :) [16:42:15] Ruthgavi: are you a member of a project? [16:42:23] yes [16:42:41] ok, let me check, one moment... [16:42:48] When I use the ssh-v [16:42:53] it hangs in [16:43:02] debug1: SSH2_MSG_SERVICE_ACCEPT received [16:43:11] then it does not move forward anymore [16:43:17] for the password [16:43:33] It definitely should not prompt you for a password, that’s what the keypair is for. [16:43:59] Ah ok [16:44:03] but then [16:44:15] it still hangs [16:44:23] debug1: SSH2_MSG_SERVICE_REQUEST sent debug1: SSH2_MSG_SERVICE_ACCEPT received [16:44:38] does not move from here :( [16:44:44] Yesterday I was able to connect [16:44:55] andrewbogott: YuviPanda: fyi, not too bad: a rsync to catch tools up to the last backup = 22h [16:44:58] Ruthgavi: it looks to me like your username is ruthgavi and not ruthgarcia [16:45:04] ±1h [16:45:13] oh, wait, that’s what you did, sorry [16:45:39] yes [16:45:42] I did [16:45:45] ssh -v ruthgavi@tools-login.wmflabs.org [16:45:46] yeah, I see your logins from yesterday. [16:45:52] Can you ping tools-login.wmflabs.org? [16:46:12] I did [16:46:34] Ruths-MacBook-Pro:PageViews ruthgarcia$ ping tools-login.wmflabs.org PING tools-login.wmflabs.org (208.80.155.130): 56 data bytes 64 bytes from 208.80.155.130: icmp_seq=0 ttl=48 time=96.272 ms 64 bytes from 208.80.155.130: icmp_seq=1 ttl=48 time=91.580 ms [16:46:54] ok [16:48:03] Ruthgavi: please ssh again once more while I watch the log? [16:48:20] ok [16:48:33] I just did it [16:48:34] ssh -v ruthgavi@tools-login.wmflabs.org [16:49:10] do you think its something with the IP address? [16:51:03] Ruthgavi: just for an additional test, can you try ssh ruthgavi@bastion.wmflabs.org? [16:51:20] ok [16:51:45] it still hangs [16:52:05] andrewbogott: it still hangs [16:52:38] Yeah, same behavior there. [16:53:23] I don’t know what’s happening. 
Please try ssh -vvv ruthgavi@tools-login.wmflabs.org, wait a good long time, and then paste the total output to dpaste.org or pastebin or something? [16:53:46] Also do you know what version of OSX you’re using? [16:54:08] Ruthgavi: can you try to run "unset SSH_AUTH_SOCK" and then ssh again? I had a similar problem on OSX before (basically ssh-agent not working properly) [16:55:17] yosemite [16:55:34] ok give me some seconds [16:55:53] its strange because yesterday it worked [16:56:33] ooooooh! [16:56:45] sitic: thank you! [16:57:09] the secret solution is "unset SSH_AUTH_SOCK" [16:57:27] thank you andrewbogott [16:57:32] as well [16:57:33] sitic: would you be willing to add a little note about that here? https://wikitech.wikimedia.org/wiki/Help:Access#Troubleshooting I run macos but I’ve never seen that. [16:57:40] Ruthgavi: glad you’re unstuck [16:57:46] thanks! [16:57:49] yes do that! [16:57:59] thanks [17:01:42] andrewbogott: I'm not sure how common that issue is, I had it a year ago (not sure what caused it, if I remember corrently ssh-agent didn't ended up in some strange state and just blocked on new connections). I'll leave a short note there [17:01:52] thanks. 
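The fix reported above was `unset SSH_AUTH_SOCK`, so that ssh stops waiting on an ssh-agent that no longer answers. For anyone who starts ssh from a script rather than a shell, the same idea can be sketched in Python. This is only an illustration under stated assumptions: the helper name `env_without_agent` is made up here, and nothing in the log says this is how anyone on the channel actually did it.

```python
import os


def env_without_agent(env=None):
    """Return a copy of the environment with SSH_AUTH_SOCK removed.

    Hypothetical helper: mirrors `unset SSH_AUTH_SOCK` in a shell, but only
    affects child processes that are started with the returned mapping.
    The mapping passed in (or os.environ) is left untouched.
    """
    base = dict(os.environ if env is None else env)
    base.pop("SSH_AUTH_SOCK", None)  # no error if the variable is not set
    return base


# Assumed usage (not executed here): pass the cleaned environment to ssh.
#   import subprocess
#   subprocess.run(["ssh", "-v", "user@tools-login.wmflabs.org"],
#                  env=env_without_agent())
```

With the variable gone, ssh cannot reach the agent at all and has to rely on key files on disk, which is presumably why the hang went away for Ruthgavi.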
[17:01:55] Twice is common enough :) [17:05:22] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL 30.00% of data above the critical threshold [0.0] [17:05:39] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 40.00% of data above the critical threshold [0.0] [17:06:01] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL 20.00% of data above the critical threshold [0.0] [17:07:18] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1210 is CRITICAL 44.44% of data above the critical threshold [0.0] [17:20:59] RECOVERY - Puppet failure on tools-bastion-02 is OK Less than 1.00% above the threshold [0.0] [17:32:21] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold [0.0] [17:35:23] RECOVERY - Puppet failure on tools-exec-catscan is OK Less than 1.00% above the threshold [0.0] [17:35:45] RECOVERY - Puppet failure on tools-exec-1202 is OK Less than 1.00% above the threshold [0.0] [18:45:36] PROBLEM - Puppet failure on tools-static-01 is CRITICAL 20.00% of data above the critical threshold [0.0] [18:48:54] PROBLEM - Puppet failure on tools-static-02 is CRITICAL 50.00% of data above the critical threshold [0.0] [18:53:35] labs-morebots, everything good? [18:53:35] I am a logbot running on tools-exec-1205. [18:53:35] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [18:53:36] To log a message, type !log <msg>. [18:53:39] !log testlabs logging a test [18:53:42] Logged the message, dummy [18:56:03] labs-morebots, you’re updated! How does it feel? [18:56:03] I am a logbot running on tools-exec-1211. [18:56:03] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [18:56:04] To log a message, type !log <msg>. [18:56:10] !log testlabs this will fail [18:57:45] valhallasw`cloud: you’re working on the bot code right now? 
[18:57:51] andrewbogott: in ~/src [18:57:59] which is independent from everything else [18:58:02] ok [18:58:11] so feel free to start/restart/deploy everything debian [18:58:12] I’ll just wait for you to fix it then [18:58:21] PROBLEM - Puppet failure on tools-exec-1213 is CRITICAL 66.67% of data above the critical threshold [0.0] [18:58:44] valhallasw`cloud: does anyone else can check the database problem I found besides jynus (who is away for 4h)? [18:58:54] marmick: I don't know. [18:59:38] Tool-Labs-tools-Morebots, Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1446188 (Dzahn) re-included 1.7.9 in APT repo [18:59:53] PROBLEM - Puppet failure on tools-master is CRITICAL 80.00% of data above the critical threshold [0.0] [19:00:03] RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0] [19:03:56] RECOVERY - Puppet failure on tools-static-02 is OK Less than 1.00% above the threshold [0.0] [19:03:56] right, so the issue is it tries to find a year in July 10 [19:04:00] RECOVERY - Puppet failure on tools-exec-cyberbot is OK Less than 1.00% above the threshold [0.0] [19:04:06] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [0.0] [19:05:36] RECOVERY - Puppet failure on tools-static-01 is OK Less than 1.00% above the threshold [0.0] [19:13:05] YuviPanda: andrewbogott: FYI, started backup of all filesystems to 2001. Let's see how long that takes (all three are done in parallel) [19:13:27] (in a screen session on 1002) [19:17:04] Labs, Labs-Sprint-105, Patch-For-Review: Automate snapshots / backups of labstore - https://phabricator.wikimedia.org/T105027#1446208 (coren) All three are being backed up to 2001 as of 07/10 19:10 in a screen session on labstore1002. 
[19:39:55] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [19:55:48] !log tools puppet agent -tv'ing all exec-12* hosts for adminbot update [19:58:10] hm, maybe I should first do an apt-get update [19:58:11] bah [19:59:55] PROBLEM - Puppet failure on tools-static-02 is CRITICAL 20.00% of data above the critical threshold [0.0] [20:00:23] valhallasw`cloud: i think you guys are doing the same thing twice [20:00:30] andrewbogott: [20:00:49] yeah, valhallasw`cloud, I’m doing this already :) [20:00:50] does it see 1.7.10 yet? [20:00:55] PROBLEM - Puppet failure on tools-master is CRITICAL 30.00% of data above the critical threshold [0.0] [20:01:03] in some places, there's a few README failures [20:01:08] but I'll let andrewbogott do it [20:01:30] we should fix that one too.. hmmm [20:01:37] PROBLEM - Puppet failure on tools-static-01 is CRITICAL 33.33% of data above the critical threshold [0.0] [20:01:37] in the postinst script [20:02:59] -1213 is still on .9, the rest is on .10 [20:03:24] https://gerrit.wikimedia.org/r/#/c/224176/ [20:03:33] we should have done this before the version bump, heh [20:03:55] oh well [20:03:55] PROBLEM - Puppet failure on tools-exec-1206 is CRITICAL 50.00% of data above the critical threshold [0.0] [20:04:36] PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL 30.00% of data above the critical threshold [0.0] [20:05:33] so is that .11 now ? :p [20:05:50] or i can cheat and remove it from reprepro and readd it [20:06:00] but then we also have to do that on the nodes [20:06:06] .10 is installed in most places, so that's probably a bad idea [20:06:42] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 50.00% of data above the critical threshold [0.0] [20:06:52] but yeah, let's just fix it now [20:06:58] then it's not such an issue the next time [20:07:13] ok, it's merged for next time [20:07:36] I'm touching the READMEs for now [20:08:06] ok [20:08:49] wat. [20:08:59] labs-morebots: sup. 
[20:08:59] I am a logbot running on tools-exec-1204. [20:08:59] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [20:09:00] To log a message, type !log <msg>. [20:09:15] !log tools it took three of us, but adminbot is updated! [20:09:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [20:09:24] ooh, it did something :) [20:09:28] \o/ [20:09:43] and it's in recent changes! :-) [20:09:57] I’ll restart the others [20:10:05] you know the qmod -rj trick? [20:10:09] :) [20:10:19] labs-morebots: version [20:10:19] I am a logbot running on tools-exec-1204. [20:10:19] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [20:10:19] To log a message, type !log <msg>. [20:10:21] no, does that restart? [20:10:28] we should have a version command :) [20:10:40] qmod -rj restarts the sge job, yeah [20:10:43] is that logging to SAL or Tools/SAL? :/ [20:10:55] well, that would be easier than what I’m doing [20:11:19] so just qmod -rj 3090 3104 3105 3118 558781b [20:11:21] -b [20:11:45] mutante: ctcp version is implemented, but of course doesn't know the debian version :-) [20:11:58] Krenair: Tools/SAL, it's the labs log bot which does per-project logging [20:12:19] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. 
[20:12:20] :p [20:12:27] oh, right [20:12:30] that's just silly [20:14:36] RECOVERY - Puppet failure on tools-exec-1211 is OK Less than 1.00% above the threshold [0.0] [20:15:57] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [20:16:14] Tool-Labs-tools-Morebots, Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1446299 (Dzahn) 1.7.10 now after fixes by valhallasw https://gerrit.wikimedia.org/r/#/q/project:operations/debs/adminbot,n,z [20:16:35] RECOVERY - Puppet failure on tools-static-01 is OK Less than 1.00% above the threshold [0.0] [20:18:55] RECOVERY - Puppet failure on tools-exec-1206 is OK Less than 1.00% above the threshold [0.0] [20:31:56] PROBLEM - Puppet failure on tools-master is CRITICAL 60.00% of data above the critical threshold [0.0] [20:34:54] RECOVERY - Puppet failure on tools-static-02 is OK Less than 1.00% above the threshold [0.0] [20:41:45] RECOVERY - Puppet failure on tools-exec-1202 is OK Less than 1.00% above the threshold [0.0] [20:41:56] is there any way to create per-host configuration for labs hiera without patching the puppet tepo? [20:42:18] s/tepo/repo/ [20:43:23] RECOVERY - Puppet failure on tools-exec-1213 is OK Less than 1.00% above the threshold [0.0] [20:46:59] SMalyshev: not currently. yuvi is working to make it possible to do it on wikitech [20:49:42] valhallasw`cloud: https://gerrit.wikimedia.org/r/#/c/224182/2/README [21:06:57] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [21:26:16] getting this error on labs: Error: Could not retrieve catalog from remote server: Error 400 on SERVER: No such file or directory - /etc/puppet/modules/labstore/files/projects-nfs-config.yaml at /etc/puppet/manifests/role/labs.pp:50 on node db01.wikidata-query.eqiad.wmflabs [21:26:31] anybody knows what's up with this? 
[21:28:56] file projects-nfs-config.yaml doesn't seem to exist [21:36:05] Labs, operations: puppet error when trying to update labs host - https://phabricator.wikimedia.org/T105556#1446454 (Smalyshev) NEW a:yuvipanda [22:14:47] Labs, operations: puppet error when trying to update labs host - https://phabricator.wikimedia.org/T105556#1446527 (yuvipanda) Do service puppetmaster restart on your puppetmaster and try again? [22:18:58] Labs, operations: puppet error when trying to update labs host - https://phabricator.wikimedia.org/T105556#1446528 (Smalyshev) Open>Resolved Yay! That helped. Sorry, should have thought about restarting it. [22:52:30] SMalyshev: Is that in a project with a self-hosted puppet master? I think I fixed a similar error in beta cluster a couple of days ago by restarting the puppet master there. It was running with stale config [22:52:48] bd808: yes, restart fixed it, thanks [22:53:10] usually it works without restarting, so I didn't think about it... [22:53:25] We should maybe amend the auto-update stuff to restart the puppet master [22:53:47] yeah it is mostly not needed except when things that effect the puppet master itself are changed [22:55:40] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 20.00% of data above the critical threshold [0.0] [22:56:40] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 30.00% of data above the critical threshold [0.0] [22:58:12] PROBLEM - Puppet failure on tools-exec-1404 is CRITICAL 22.22% of data above the critical threshold [0.0] [22:58:22] PROBLEM - Puppet failure on tools-exec-1408 is CRITICAL 33.33% of data above the critical threshold [0.0] [22:59:44] PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL 50.00% of data above the critical threshold [0.0] [23:01:00] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1406 is CRITICAL 50.00% of data above the critical threshold [0.0] [23:02:52] PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL 
20.00% of data above the critical threshold [0.0] [23:04:17] PROBLEM - Puppet failure on tools-webgrid-generic-1404 is CRITICAL 44.44% of data above the critical threshold [0.0] [23:04:21] PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL 66.67% of data above the critical threshold [0.0] [23:04:59] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1405 is CRITICAL 20.00% of data above the critical threshold [0.0] [23:05:01] PROBLEM - Puppet failure on tools-exec-1402 is CRITICAL 40.00% of data above the critical threshold [0.0] [23:06:23] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL 40.00% of data above the critical threshold [0.0] [23:09:19] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL 66.67% of data above the critical threshold [0.0] [23:10:11] PROBLEM - Puppet failure on tools-exec-1407 is CRITICAL 50.00% of data above the critical threshold [0.0] [23:11:17] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 22.22% of data above the critical threshold [0.0] [23:11:35] PROBLEM - Puppet failure on tools-exec-1406 is CRITICAL 30.00% of data above the critical threshold [0.0] [23:12:00] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL 30.00% of data above the critical threshold [0.0] [23:15:56] PROBLEM - Puppet failure on tools-exec-1410 is CRITICAL 40.00% of data above the critical threshold [0.0] [23:16:57] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL 40.00% of data above the critical threshold [0.0] [23:17:23] PROBLEM - Puppet failure on tools-webgrid-generic-1402 is CRITICAL 22.22% of data above the critical threshold [0.0] [23:17:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL 30.00% of data above the critical threshold [0.0] [23:17:57] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1408 is CRITICAL 40.00% of data above the critical threshold [0.0] [23:18:27] PROBLEM - Puppet failure on tools-exec-1409 is CRITICAL 66.67% of data above the critical 
threshold [0.0] [23:19:47] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1409 is CRITICAL 60.00% of data above the critical threshold [0.0] [23:24:46] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1401 is CRITICAL 60.00% of data above the critical threshold [0.0] [23:25:12] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1404 is CRITICAL 22.22% of data above the critical threshold [0.0] [23:25:48] I'm away from my laptop... does someone know what's up the the pile of puppet alerts I just got? [23:26:04] Coren for example? [23:28:36] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 55.56% of data above the critical threshold [0.0] [23:29:56] mobileandrew: what do they say? [23:30:17] I'm guessing they say the same thing as we just got in the channel? [23:30:32] Can someone confirm that nfs still works on tools? Then I'll just go to my movie and not worry. [23:30:50] Kenair, yep. [23:30:54] Krenair: oh, it's my client ignoring the bot [23:31:30] I created /data/project/alex/test [23:31:35] and didn't see any issues [23:31:48] Ok, great. Thank you! [23:31:50] nothing weird on the nfs ganglia graph, which is what I've seen during outages [23:32:21] (I mean, during nfs outages the graphs showed something was clearly wrong. they don't right now.) [23:32:43] So probably it really is just puppet. That can wait. [23:32:45] no one has been yelling about labs being broken [23:33:19] Yeah, I need to spruce up my mobile client so I can see the back scroll. [23:33:39] question: if I edit stuff in my hieradata/labs/ and puppet is ignoring it, what am I missing? Is it supposed to work? 
[23:38:38] its' adminbot :/ [23:39:04] nobody knows why it decided to change this right now [23:39:25] but the fail is like earlier [23:39:57] affects the trusty hosts while precise are actually running it [23:42:22] fixes tools-webgrid-lighttpd-1404 [23:46:00] Labs, Tool-Labs: Cron job updating file running mysql fails with no output - https://phabricator.wikimedia.org/T105565#1446795 (Rillke) NEW [23:47:15] SMalyshev: off the top of my head I don't know why it would be ignored. You can look at hieradata/labs/deployment-prep to see some things that I know work [23:47:31] Labs, Tool-Labs: Cron job updating file running mysql fails with no output - https://phabricator.wikimedia.org/T105565#1446818 (Rillke) Is it necessary to run the mysql client on the grid? It doesn't have to do any heavy processing work.... [23:47:34] I think I dounf the reason... it's .yaml not .yml [23:47:54] the Stupid is strong with me today... [23:48:27] puppet kills braincells [23:48:46] like huffing gold spraypaint [23:50:41] Labs, Tool-Labs: Lost connection to MySQL server during query when executing large query's - https://phabricator.wikimedia.org/T105468#1446824 (Rillke) >>! In T105468#1444809, @jcrespo wrote: > > What I can recommend you is: > ... > ``` > EXPLAIN SELECT DISTINCT rc_user_text, Without the EXPLAIN of cour... [23:55:09] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1404 is OK Less than 1.00% above the threshold [0.0] [23:59:02] !log fixing puppet runs on tools-exec via salt [23:59:03] fixing is not a valid project. [23:59:09] !log tools fixing puppet runs on tools-exec via salt [23:59:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
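The last exchange shows why the first `!log` was rejected: on this channel the bot treats the first word after `!log` as a project name, so "fixing" was looked up as a project. A rough Python sketch of that parsing rule follows. It is not the bot's actual code: `parse_log_command` and the `KNOWN_PROJECTS` set are hypothetical and only reproduce the behaviour visible in the log.

```python
# Hypothetical project list; the real bot presumably looks projects up elsewhere.
KNOWN_PROJECTS = {"tools", "testlabs"}


def parse_log_command(line, projects=KNOWN_PROJECTS):
    """Split '!log <project> <message>' into (project, message).

    Raises ValueError when the line is not a complete !log command or when
    the first word after !log is not in `projects`.
    """
    # maxsplit=2 keeps the message intact, including its internal spaces.
    parts = line.strip().split(None, 2)
    if len(parts) < 3 or parts[0] != "!log":
        raise ValueError("expected: !log <project> <message>")
    _, project, message = parts
    if project not in projects:
        raise ValueError(f"{project} is not a valid project.")
    return project, message
```

Under these assumptions, `!log tools fixing puppet runs on tools-exec via salt` parses to the project `tools` plus the message, while the same line without `tools` raises the "fixing is not a valid project." error seen above.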