[01:17:04] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:35:36] (CR) Ricordisamoa: [C: -1] [WIP] README.rst and Sphinx documentation (1 comment) [labs/tools/ptable] - https://gerrit.wikimedia.org/r/333367 (https://phabricator.wikimedia.org/T99847) (owner: Ricordisamoa)
[02:08:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[03:13:01] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:39:00] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[05:33:01] PROBLEM - Puppet staleness on tools-worker-1014 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [43200.0]
[05:44:00] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:40:02] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[06:40:42] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[06:41:08] RECOVERY - Free space - all mounts on tools-exec-1221 is OK: OK: tools.tools-exec-1221.diskspace._public_dumps.byte_percentfree (No valid datapoints found)
[07:01:15] Labs, Tool-Labs: lighttpd does not correctly close connections (CLOSE_WAIT) - https://phabricator.wikimedia.org/T104799#2960502 (gstrauss-wiki) FYI: lighttpd 1.4.45 is in sid unstable https://buildd.debian.org/status/package.php?p=lighttpd&suite=sid and should be in stretch in another week or so. lightt...
[07:11:24] Labs, Tool-Labs: lighttpd does not correctly close connections (CLOSE_WAIT) - https://phabricator.wikimedia.org/T104799#1427257 (zhuyifei1999) >>! In T104799#2960502, @gstrauss-wiki wrote: > Alternatively, you can use Debian packages from stbuehler (another lighttpd maintainer) > https://build.opensuse.o...
[07:12:17] Labs, Labs-Infrastructure, Patch-For-Review: Tools still shaky; DB replicas to blame? - https://phabricator.wikimedia.org/T127940#2059161 (zhuyifei1999) Is this still an issue?
[07:12:22] Labs, Tool-Labs, DBA: Reset password for database user p50380g50491 - https://phabricator.wikimedia.org/T155902#2960511 (Marostegui) We cannot recover the passwords, as it is hashed. Probably the best shot here is to reset the password and regenerate your .my.cnf. I am sure that is done with a scri...
[07:15:04] <_joe_> !log deployment-prep cherry-picking the move of base to profile::base
[07:15:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[07:15:43] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:41:42] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[08:16:43] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:57:05] Labs, Tool-Labs, DBA: Reset password for database user p50380g50491 - https://phabricator.wikimedia.org/T155902#2960566 (yuvipanda) Open>Resolved a:yuvipanda Users of the pXXXgXXX variety have been deprecated for a long time now. I see there's a replica.my.cnf file for tb-dev with credent...
[08:57:47] Labs, Tool-Labs, DBA: Reset password for database user p50380g50491 - https://phabricator.wikimedia.org/T155902#2960569 (Marostegui) Thanks Yuvi!!
[08:57:59] Labs, Tool-Labs, Tools-Kubernetes: Move kubernetes authentication to using X.509 client certs - https://phabricator.wikimedia.org/T144153#2960570 (yuvipanda) a:yuvipanda>None
[09:10:12] hashar: ping
[09:11:14] friendly12345: pong
[09:13:15] hashar: I noticed that you added me to some kind of whitelist, but I'm not sure what that is. Where do I look at the output of a Puppet run on Jenkins to determine that say, the result is a noop?
[09:14:04] friendly12345: the whitelist is so you get some more Jenkins jobs triggered when you propose patches in Gerrit
[09:14:20] to check whether the result is a noop, one needs access to the puppet catalog compiler
[09:15:14] hashar: Oh. Is that a log-into-a-server thing and not a Jenkins job?
[09:16:57] friendly12345: that is a jenkins job, but it runs on a specific slave
[09:17:02] that has access to some production data
[09:18:39] * friendly12345 takes a look at Jenkins
[09:20:34] friendly12345: just spoke to ops people. It is too easy to screw up the compiler so for now access to it is restricted :]
[09:20:49] then I trust ops to carefully review the change and pass them via the compiler if need be :]
[09:22:36] hashar: Okay.
[09:24:17] hashar: Is there anything that can be done about the multiple 'Either compilation failed or puppetmaster has issues' messages that happen pretty frequently in the wikimedia-operations channel?
[09:30:11] Labs, Operations, kubernetes: docker-engine pulled into our repositories only keeps the latest version - https://phabricator.wikimedia.org/T153416#2960621 (akosiaris) Yeah, I thought about the section workaround too, but at best it's a hack. And tbh, I am not in love with the idea of a "labs" section...
[09:39:47] friendly12345: it's often just the puppetmaster restarting though
[09:40:09] friendly12345: I saw your toollabs lint patches! thank you for making them! I've set aside some time in an hour or so to review and hopefully merge them.
[09:40:09] apologies for taking the time
[09:40:56] Labs, Labs-Infrastructure, Patch-For-Review: Tools still shaky; DB replicas to blame? - https://phabricator.wikimedia.org/T127940#2960636 (Magnus) Open>Resolved a:Magnus In the year (!) since I filed this, I have rewritten CatScan2 as PetScan. The "unknown" server seems to have vanished....
[09:41:32] yuvipanda: The aim is to make it easier to migrate to newer Puppet versions (4) once the linting is fixed up and enforced with a puppet-lint plugin
[09:41:54] * yuvipanda nods
[09:50:26] bd808: ping
[10:23:41] friendly12345: I've cherry-picked both your toollabs patches onto tools puppetmaster, and reviewed them (look ok). I'll wait for a full puppet run cycle, and merge if there's no breakage
[10:27:01] yuvipanda: okay
[10:45:02] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:57:11] friendly12345: merged 'em both
[11:41:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[11:42:41] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[12:10:53] !log video starting v2c worker at encoding03 due to flooding
[12:10:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Video/SAL
[12:17:42] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:37:10] PROBLEM - Free space - all mounts on tools-exec-1221 is CRITICAL: CRITICAL: tools.tools-exec-1221.diskspace._public_dumps.byte_percentfree (No valid datapoints found)tools.tools-exec-1221.diskspace.root.byte_percentfree (<22.22%)
[12:38:03] Labs, Tool-Labs, DBA: Reset password for database user p50380g50491 - https://phabricator.wikimedia.org/T155902#2961051 (Tb) Okay ta. Can you GRANT ALL to s51111 on the databases below please. I'll migrate them to the proper names over the next few weeks and raise a new ticket to drop p50380g50491'...
[12:38:44] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[12:41:05] Labs, Tool-Labs, DBA: Reset password for database user p50380g50491 - https://phabricator.wikimedia.org/T155902#2961052 (yuvipanda) Resolved>Open
[13:13:00] RECOVERY - Puppet staleness on tools-worker-1014 is OK: OK: Less than 1.00% above the threshold [3600.0]
[13:18:43] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:18:51] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Iimog was created, changed by Iimog link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Iimog edit summary: Created page with "{{Tools Access Request |Justification=hosting wikidata games |Completed=false |User Name=Iimog }}"
[13:40:50] Labs, DBA, Operations, netops: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2961118 (Marostegui)
[13:57:02] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure, Patch-For-Review: Puppet fails on integration instances: nfs_mount[home-on-labstoresvc]: umount: /home: not mounted - https://phabricator.wikimedia.org/T155820#2955587 (hashar) p:Triage>High
[14:09:42] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[15:00:46] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2961338 (chasemp)
[15:07:19] PROBLEM - Free space - all mounts on tools-worker-1014 is CRITICAL: CRITICAL: tools.tools-worker-1014.diskspace._var_lib_docker.byte_percentfree (No valid datapoints found)tools.tools-worker-1014.diskspace.root.byte_percentfree (<100.00%)
[15:09:41] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:24:48] Labs, DBA, Operations, netops: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2961118 (Marostegui)
[15:30:34] Labs, Labs-Infrastructure, Operations, ops-eqiad, Wikimedia-Incident: Replace fans (or paste) on labservices1001 - https://phabricator.wikimedia.org/T154391#2961504 (Cmjohnson) @andrew any other issues with this? Can we close the task?
[15:44:33] Labs, DBA, Operations, netops: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2961547 (Marostegui)
[15:59:15] Labs, Labs-Infrastructure, Operations, Wikimedia-Incident: labservices1001 down, suspected overheating - https://phabricator.wikimedia.org/T152340#2961652 (Andrew)
[15:59:18] Labs, Labs-Infrastructure, Operations, ops-eqiad, Wikimedia-Incident: Replace fans (or paste) on labservices1001 - https://phabricator.wikimedia.org/T154391#2961650 (Andrew) Open>Resolved The box has been solid since you worked on it. Unfortunately, the issue we seek to fix is VERY r...
[16:01:07] TabbyCat: pong
[16:01:26] bd808: it's about the centralnotice banner of the technical code of conduct
[16:01:34] it wasn't running properly
[16:01:41] first, it had no banner assigned
[16:01:53] second, the banner had no display settings
[16:02:07] but I think I managed to fix it and should be seen on meta
[16:02:45] I still can't see it probably because of the impression diet, but still, it should be working
[16:02:46] I wouldn't have been of any help with centralnotice anyway :)
[16:02:59] except pointing out some other people to ask for help
[16:03:05] no, it was just so you were aware
[16:03:17] as I guess you're the person in charge of that Code
[16:03:22] or so I thought
[16:03:55] TabbyCat: hi! actually I work a lot on CentralNotice :)
[16:04:03] matt_flaschen has been the main CoC driver
[16:04:24] Ah right Code or Code
[16:05:31] Also, for community CentralNotice campaigns, you could talk to Seddon. He's apparently online (dunno if he's at this computer) on #wikimedia-fundraising
[16:05:42] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:06:41] Also to see what happened centralnoticewise in a browser, you can go to a console and say mw.centralNotice.data
[16:08:30] apparently we had some problems with the steward elections banner not displaying, later displaying, and now sort of
[16:08:56] it's all fixed so I don't think we have to poke anyone now imho :)
[16:16:18] Labs: Request increased quota for etytree labs project - https://phabricator.wikimedia.org/T156021#2961737 (Epantaleo)
[16:22:08] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Iimog was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1374289 edit summary:
[16:22:18] RECOVERY - Free space - all mounts on tools-worker-1014 is OK: OK: tools.tools-worker-1014.diskspace._var_lib_docker.byte_percentfree (No valid datapoints found)
[16:40:30] RECOVERY - Puppet run on tools-worker-1014 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:43:13] yuvipanda: are you a listmod to the labs mailing list?
[16:44:02] Cyberpower678: depends on what you are asking that for
[16:44:16] I sent a response and it got stuck in the queue.
[16:46:30] yuvipanda: ^
[16:47:09] Cyberpower678: are you sure? I just saw a message from you
[16:47:51] Your mail to 'Labs-l' with the subject Re: [Labs-l] [Wikimedia Labs][Announce] NFS (only labs projects)
[16:47:52] maintenance on 2017-01-18 Is being held until the list moderator can review it for approval.
[16:48:07] Maybe someone approved already.
[16:48:37] yuvipanda: anyway, that migration was a complete disaster on the cyberbot-exec-01 project
[16:48:48] bd808: thanks for the thorough amazing review! Will work on improving the patch these days.
[16:49:03] I just found out there's still damage on that project needing fixing.
[16:53:21] Labs, Analytics, Pageviews-API, wikitech.wikimedia.org: wikitech.wikimedia.org missing from pageviews API - https://phabricator.wikimedia.org/T153821#2961864 (Milimetric)
[17:02:55] Labs: Request increased quota for etytree labs project - https://phabricator.wikimedia.org/T156021#2961910 (Andrew) Can you please clarify if you need more CPUs or more RAM, or both, or something else? 'Giga of CPU' doesn't make sense.
[17:03:12] Cyberpower678: the mod message you got was because your email client replied to both the labs-l list (which went through) and the labs-announce list which got held.
[17:03:41] also, you will catch more flies with honey than vinegar
[17:09:45] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[17:10:43] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:12:41] bd808: I did get a message saying that message to labs-l got held too. I got three hold messages.
[17:12:43] bd808: what do you mean?
[17:13:12] !log tools reboot tools-exec-1411 as having serious transient issues
[17:13:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:20:22] andrewbogott: this is an irc line marker because I've looked through historical puppet alerts and I'm not sure there's a common thread, and the base conversion masks some things a bit but after this line we should investigate "flapping" nodes a bit :)
[17:23:53] PROBLEM - Puppet run on tools-exec-1220 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[17:26:31] ^ andrewbogott ran fine for me but took 68.93 seconds. A lot of time spent on apt-cache and /usr/bin/dpkg-query
[17:28:08] Cyberpower678: when you say ' including the .my.cnf file' is missing still what project do you mean?
[17:28:32] Cyberbot-exec-01
[17:28:40] is that an instance or a project?
[17:28:48] chasemp: an instance.
[17:28:53] what project is that?
[17:28:59] Cyberbot
[17:29:16] Labs: Request increased quota for etytree labs project - https://phabricator.wikimedia.org/T156021#2962094 (Epantaleo) I only need more CPU. Does it make sense? Thanks
[17:29:35] Ok, we've never (that I know of) published my.cnf files for the labsdb's outside of Tools, usually people copy over their existing when they migrate out of Tools for capacity or something
[17:29:42] chasemp: for some reason almost all of my bot scripts were gone, the rest had weird permissions or weird owner ids.
[17:29:43] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:30:04] do you recall possibly copying the replica.my.cnf file out of tools when you moved into that project?
[17:30:11] chasemp: it was copied but I didn't think it would get deleted.
[17:31:17] it certainly should not have, and it being not recovered isn't good either, but for getting back to functional, can you try copying the creds from the tools project dir?
[17:33:02] Everything is working now, it was just a pain to fix it. As for the cnf file I copied it back into the instance and everything is working correctly now.
[17:33:37] ah ok, good. I didn't get that impression from the email. Understood that it was a pain, I'm sorry things went awry and you were affected.
[17:33:52] RECOVERY - Puppet run on tools-exec-1220 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:38:20] chasemp: at the time of writing it was still missing and then I realized I'm using my tool labs cnf to access the DB. It's no big deal, but I would like to know what caused it.
[17:39:30] I think the incident report is still in progress but the crux of it is that an error in puppet config caused puppet to archive home directories, and then when we recovered everything not all the contents seem to have been archived correctly by puppet
[17:41:38] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962172 (MoritzMuehlenhoff) I've removed the precise instance of the debdeploy project.
[17:41:57] chasemp: interesting. Thanks. :-)
[17:56:07] Tool-Labs-tools-LTA-Knowledgebase: Automatically fail authentication for any password exceeding 4096 bytes - https://phabricator.wikimedia.org/T155946#2962247 (DatGuy) a:DatGuy
[17:56:47] Tool-Labs-tools-LTA-Knowledgebase: Automatically fail authentication for any password exceeding 4096 bytes - https://phabricator.wikimedia.org/T155946#2959758 (DatGuy)
[17:56:49] Tool-Labs-tools-LTA-Knowledgebase: Fix password hashing - https://phabricator.wikimedia.org/T155934#2962250 (DatGuy)
[18:04:27] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962275 (Andrew) Sent an updated nag email to labs-announce, with an updated list of user-controlled instances. 26 to go.
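Editor's sketch, following up on the 17:20:22 remark about investigating "flapping" nodes: the PROBLEM/RECOVERY lines in this channel log can be tallied per check and host. This is a minimal illustration in Python, assuming the alert lines keep the shape seen in this log; the function name count_flaps and the regular expression are hypothetical and not part of any tool mentioned in the channel.

```python
import re
from collections import Counter

# Assumed alert shape, taken from lines in this log, e.g.
# "[06:40:42] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: ..."
ALERT_RE = re.compile(
    r"\[(?P<time>\d\d:\d\d:\d\d)\] (?P<kind>PROBLEM|RECOVERY) - "
    r"(?P<check>.+?) on (?P<host>\S+) is (?P<state>OK|CRITICAL)"
)


def count_flaps(lines):
    """Count completed PROBLEM -> RECOVERY cycles per (check, host) pair."""
    open_problems = set()
    flaps = Counter()
    for line in lines:
        match = ALERT_RE.search(line)
        if not match:
            continue  # chat messages and other bot output are ignored
        key = (match["check"], match["host"])
        if match["kind"] == "PROBLEM":
            open_problems.add(key)
        elif key in open_problems:
            # A RECOVERY that closes a previously seen PROBLEM is one flap.
            open_problems.discard(key)
            flaps[key] += 1
    return flaps
```

Feeding the function the lines of a day's log and calling most_common() on the result would list the checks that cycled most often. A RECOVERY without a preceding PROBLEM in the same input is deliberately not counted.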
[18:22:13] chasemp: your avatar (the bird one) that gmail shows me reminds me of an inactive dev
[18:24:54] zhuyifei1999_: I'm not sure I know what bird one you mean :)
[18:25:50] https://plus.google.com/u/0/_/focus/photos/public/AIbEiAIAAABDCIjR5OrKltahKCILdmNhcmRfcGhvdG8qKGUzNzE0MGQ5ZGMzODRlMmU5YTQ5M2ZjMDJhMTZjZDRiMzc2ZmE2NzQwAbqYjujHmjM8klsEyHDS0zwAYqN0?sz=64
[18:26:13] ah heh
[18:27:20] chasemp: https://phabricator.wikimedia.org/p/Rillke/ <= /me miss him :/ and he also use a bird avatar
[18:27:57] ah yeah, I've interacted w/ rillke briefly a few times :)
[18:28:23] oh
[18:31:47] Labs: Request increased quota for etytree labs project - https://phabricator.wikimedia.org/T156021#2962380 (Andrew) Not really... can you tell me what instance flavor you'd like to add to the project? I can figure out the CPU/RAM needs from there.
[18:34:44] Labs: Request increased quota for etytree labs project - https://phabricator.wikimedia.org/T156021#2962409 (Epantaleo) Oh, ok now I get it - sorry for the misunderstanding - I need 10G of hard disk memory (not RAM), my instance is etytree-b.etytree.eqiad.wmflabs
[19:07:05] Labs, Analytics, Pageviews-API, wikitech.wikimedia.org: wikitech.wikimedia.org missing from pageviews API - https://phabricator.wikimedia.org/T153821#2962511 (Krenair) @milimetric, you sure that's a duplicate?
[19:12:55] Labs, Tool-Labs, Mail: Attempt to clone tools-mail and dist-upgrade it - https://phabricator.wikimedia.org/T156051#2962529 (yuvipanda)
[19:13:31] Labs, Tool-Labs, Mail: Attempt to clone tools-mail and dist-upgrade it - https://phabricator.wikimedia.org/T156051#2962529 (yuvipanda)
[19:24:57] Labs: Request increased quota for etytree labs project - https://phabricator.wikimedia.org/T156021#2962592 (Andrew) Open>Resolved Ah! Disk space doesn't require a quota change, there was already 80G dedicated to your instance. I've added the puppet class role::labs::lvm::srv to your instance which...
[19:25:00] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2962594 (Andrew)
[19:35:30] !log tools depool tools-webgrid-lighttpd-1201 for snapshotting tests
[19:35:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:36:12] !log tools temporarily shutting down tools-webgrid-lighttpd-1201
[19:36:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:37:25] Labs, Analytics, Pageviews-API, wikitech.wikimedia.org: wikitech.wikimedia.org missing from pageviews API - https://phabricator.wikimedia.org/T153821#2962736 (Milimetric) Oh, no, my fault, I got confused by the assumption around https://wikitech.wikimedia.org/api/rest_v1/ not working. That part...
[19:37:31] Labs, Analytics, Pageviews-API, wikitech.wikimedia.org: wikitech.wikimedia.org missing from pageviews API - https://phabricator.wikimedia.org/T153821#2962738 (Milimetric) duplicate>Open
[19:39:31] PROBLEM - Puppet run on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:42:13] PROBLEM - Host tools-webgrid-lighttpd-1201 is DOWN: CRITICAL - Host Unreachable (10.68.18.45)
[19:42:39] PROBLEM - Puppet run on tools-mail-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[19:43:05] hmm
[19:49:30] RECOVERY - Host tools-webgrid-lighttpd-1201 is UP: PING OK - Packet loss = 0%, RTA = 1.51 ms
[19:52:00] Labs, Analytics, Pageviews-API, wikitech.wikimedia.org: wikitech.wikimedia.org missing from pageviews API - https://phabricator.wikimedia.org/T153821#2962803 (Krenair) Are they even being collected given that wikitech is not behind varnish?
[19:56:27] Labs, Analytics, Pageviews-API, wikitech.wikimedia.org: wikitech.wikimedia.org missing from pageviews API - https://phabricator.wikimedia.org/T153821#2892095 (Nuria) @Krenair if wikitech is not behind varnish pageviews cannot be collected. Correct. Seems that we can close ticket?
[19:58:11] PROBLEM - Host tools-webgrid-lighttpd-1201 is DOWN: CRITICAL - Host Unreachable (10.68.18.45)
[19:59:02] andrewbogott: running narrative on puppet flaps I hit tools-webgrid-lighttpd-1403 directly and it was fine at 58 seconds
[19:59:09] what's our threshold on warning I wonder?
[19:59:16] yeah, they're always fine when I check :(
[19:59:26] warning is they've to fail
[19:59:27] did you see puppet.log?
[19:59:35] RECOVERY - Host tools-webgrid-lighttpd-1201 is UP: PING OK - Packet loss = 0%, RTA = 1.53 ms
[20:00:03] yuvipanda: there are no timestamps :/ but
[20:00:05] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to determined $::labsproject at /etc/puppet/manifests/realm.pp:53 on node tools-webgrid-lighttpd-1403.tools.eqiad.wmflabs
[20:00:05] Warning: Not using cache on failed catalog
[20:00:05] Error: Could not retrieve catalog; skipping run
[20:00:13] could be from so many occasions
[20:00:26] chasemp: the 'configuration version' is a unix timestamp
[20:00:42] oh I guess if it failed there's not that either
[20:00:46] but we can guess by looking at previous one and then this
[20:00:54] chasemp: also same message should be in syslog too and should have ts
[20:01:39] that's the ticket of course thanks
[20:01:39] Jan 23 19:36:52 tools-webgrid-lighttpd-1403 puppet-agent[20738]: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to determined $::labsproject at /etc/puppet/manifests/realm.pp:53 on node tools-webgrid-lighttpd-1403.tools.eqiad.wmflabs
[20:01:53] ^ andrew failure to determine its project at the time of running
[20:02:22] so facter pooping itself?
[20:02:51] I'm not sure that comes from facter
[20:02:54] that fact comes from ... a file in /etc? or something else
[20:02:57] there is
[20:02:57] labsprojectfrommetadata => tools
[20:03:07] but $::labsproject is otherwise filled in I believe
[20:03:38] that could also be symptomatic of issues on the tools master too
[20:16:41] more duh andrewbogott yuvipanda bd808 logs available on the master too :) so this hits only 2 hosts in memory
[20:16:43] 3 tools-mail-01.tools.eqiad.wmflabs
[20:16:44] 3 tools-webgrid-lighttpd-1403.tools.eqiad.wmflabs
[20:17:06] not sure what % of random failures those are but let's keep an eye out for not-those and look into what the crap on failed project lookup when we can
[20:17:07] chasemp: also on logs-01 :D
[20:17:16] so true
[20:17:39] RECOVERY - Puppet run on tools-mail-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:18:09] ah, it was tools-mail-01
[20:18:16] which is why my look at tools-mail's syslog gave me nothing lol
[20:18:57] Maybe it's the metadata service being flaky
[20:19:00] or overloaded
[20:19:29] that's where ::labsproject comes from I think
[20:19:34] RECOVERY - Puppet run on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:21:03] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962882 (chasemp) >>! In T143349#2946955, @Acs wrote: >>>! In T143349#2946796, @chasemp wrote: >> @Qgil and @acs do you know if the instance (admins here https://wikitech.wikimedia.org...
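Editor's sketch of the per-host tally discussed between 20:00:54 and 20:16:44: since the syslog copy of the failure carries a timestamp, catalog failures can be counted per host and the first occurrence noted. This is a minimal Python illustration, assuming lines shaped like the syslog entry quoted at 20:01:39; the function name catalog_failures is hypothetical, and the puppetmaster's own logs (which showed fully qualified host names) may be formatted differently.

```python
import re
from collections import Counter

# Assumed line shape, copied from the syslog entry quoted at 20:01:39:
# "Jan 23 19:36:52 <host> puppet-agent[<pid>]: Could not retrieve catalog ..."
FAILURE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d{1,2} \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"puppet-agent\[\d+\]: Could not retrieve catalog from remote server"
)


def catalog_failures(syslog_lines):
    """Return (failure count per host, first failure timestamp per host)."""
    counts = Counter()
    first_seen = {}
    for line in syslog_lines:
        match = FAILURE_RE.match(line)
        if match:
            counts[match["host"]] += 1
            # Keep only the earliest timestamp seen for each host.
            first_seen.setdefault(match["host"], match["stamp"])
    return counts, first_seen
```

Comparing the first-seen timestamps against the alert times in the channel would show whether the failed catalog retrievals line up with the puppet flaps.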
[20:21:32] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962884 (chasemp)
[20:27:54] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962892 (chasemp) @Stwalkerster @FastLizard4 @DeltaQuad from https://wikitech.wikimedia.org/wiki/Nova_Resource:Account-creation-assistance you seem to the stakeholders for this instanc...
[20:28:08] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962894 (chasemp)
[20:32:34] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962904 (chasemp) Greetings @addshore @petrb from https://wikitech.wikimedia.org/wiki/Nova_Resource:Huggle Do you have plans to convert this instance to Ubuntu Trusty or Debian Jessie...
[20:32:52] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962906 (chasemp)
[20:36:24] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962916 (chasemp)
[20:36:56] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2565984 (chasemp)
[20:38:19] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2565984 (chasemp) @hashar, what's the deal with Integration-publisher.integration.eqiad.wmflabs? Is there a plan to shut it down?
[20:38:34] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962932 (chasemp)
[20:39:43] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2566037 (chasemp) >>! In T143349#2921069, @dschwen wrote: > Please do not remove the fastcci or maps-wma1 instances! They are being used. @dschwen is maps-tiles1 also upgraded now to...
[20:40:43] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962942 (chasemp)
[20:41:39] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Migrate integration-publisher service to use a Jessie instance - https://phabricator.wikimedia.org/T156064#2962943 (hashar)
[20:42:37] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2572112 (hashar) >>! In T143349#2962916, @chasemp wrote: > @hashar, what's the deal with Integration-publisher.integration.eqiad.wmflabs? Is there a plan to shut it down? Overlooked...
[20:42:58] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962963 (chasemp) Greetings @Crazycomputers @DeltaQuad @Tparis from https://wikitech.wikimedia.org/wiki/Nova_Resource:Utrs Do you have plans to convert this instance to Ubuntu Trusty...
[20:43:15] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2962965 (chasemp)
[20:44:28] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2575393 (chasemp)
[20:48:34] maybe related to the metadata service, Horizon is all slow :]
[20:52:43] (PS1) DatGuy: Add #wikimedia-lta channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/333720
[20:53:09] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2963021 (chasemp) Quick update from IRC where I spoke with @Lydia_Pintscher and @yuvipanda in `#wikidata` > Lydia_WMDE: i'll bring it up in tomorrow's team meeting > yuvipanda: wikid...
[20:53:38] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2963025 (chasemp)
[21:00:57] Labs, Tool-Labs: Several users and tools have invalid credentials in replica.my.cnf - https://phabricator.wikimedia.org/T154933#2963037 (chasemp) > > But `danmichaelo` (`u2238`) is still locked out: > > ``` > danmichaelo@tools-bastion-03:~$ sql enwiki > ERROR 1045 (28000): Access denied for user 'u223...
[21:01:48] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2963042 (chasemp)
[21:01:51] Labs, DBA, Wikidata, Performance, and 3 others: Create a new project in labs for testing RedisLock in Wikidata - https://phabricator.wikimedia.org/T155042#2963041 (chasemp)
[21:02:09] (CR) Legoktm: [C: 2] Add #wikimedia-lta channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/333720 (owner: DatGuy)
[21:02:40] (Merged) jenkins-bot: Add #wikimedia-lta channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/333720 (owner: DatGuy)
[21:02:57] (CR) jenkins-bot: Add #wikimedia-lta channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/333720 (owner: DatGuy)
[21:03:14] !log tools.wikibugs Updated channels.yaml to: 5a5aedc2430a5eafad48abfbbd8606e640f34a01 Add #wikimedia-lta channel
[21:03:15] Labs, DBA, Wikidata, Performance, and 3 others: Increase quota for wikidata-dev project - https://phabricator.wikimedia.org/T155042#2931757 (chasemp)
[21:03:35] Labs, DBA, Wikidata, Performance, and 3 others: Increase quota for wikidata-dev project - https://phabricator.wikimedia.org/T155042#2931757 (chasemp) @Andrew I'll +1 this
[21:09:44] hmm
[21:09:47] morebots dead?
[21:11:42] oh, it's stashbot now right
[21:12:09] PROBLEM - Puppet run on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:12:20] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure, Beta-Cluster-reproducible, Puppet: New instance have broken puppet configuration when using puppetmaster standalone - https://phabricator.wikimedia.org/T148929#2963092 (hashar)
[21:12:22] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Migrate integration-publisher service to use a Jessie instance - https://phabricator.wikimedia.org/T156064#2963091 (hashar)
[21:12:44] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Migrate integration-publisher service to use a Jessie instance - https://phabricator.wikimedia.org/T156064#2962943 (hashar) Puppet doesn't pass on the new instance due to Puppet SSL certificates T148929
[21:16:22] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2963107 (hashar) Note that when using a standalone puppet master, Puppet SSL certs are broken and that prevents puppet agent from completing its run. T148929. 100% reproducible when cr...
[21:16:48] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Migrate integration-publisher service to use a Jessie instance - https://phabricator.wikimedia.org/T156064#2963109 (hashar)
[21:16:50] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure, Beta-Cluster-reproducible, Puppet: New instance have broken puppet configuration when using puppetmaster standalone - https://phabricator.wikimedia.org/T148929#2736876 (hashar)
[21:17:58] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Migrate integration-publisher service to use a Jessie instance - https://phabricator.wikimedia.org/T156064#2962943 (hashar) Eventually I manually fixed Puppet and applied `role::ci::publisher::labs` in Horizon. Just have to update the...
[21:19:32] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure, Beta-Cluster-reproducible, Puppet: New instance have broken puppet configuration when using puppetmaster standalone - https://phabricator.wikimedia.org/T148929#2963120 (hashar)
[21:29:53] PROBLEM - Host tools-webgrid-lighttpd-1201 is DOWN: CRITICAL - Host Unreachable (10.68.18.45)
[21:48:44] Labs, Tool-Labs: Several users and tools have invalid credentials in replica.my.cnf - https://phabricator.wikimedia.org/T154933#2963172 (scfc) >>! In T154933#2963037, @chasemp wrote: > […] >> But `danmichaelo` (`u2238`) is still locked out: > […] > ```sudo /usr/local/sbin/maintain-dbusers delete uu2238 -...
[22:21:19] oh my $deity, dupdet is also englishonly...
[22:31:19] Labs, Tool-Labs, Community-Tech-Tool-Labs, Developer-Relations, and 4 others: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#2963322 (bd808) Committee announced: https://lists.wikimedia.org/pipermail/labs-announce/2017-January/000203.html
[22:44:54] Tool-Labs-standards-committee: Figure out how communications and meetings will work for the Tool Labs standards committee - https://phabricator.wikimedia.org/T156075#2963376 (bd808)
[22:46:05] Tool-Labs-standards-committee: Figure out how communications and meetings will work for the Tool Labs standards committee - https://phabricator.wikimedia.org/T156075#2963391 (bd808) I don't want to try and force any particular workflow or practices on the committee, but I am more than happy to help you start...
[22:48:36] Tool-Labs-standards-committee: Figure out how communications and meetings will work for the Tool Labs standards committee - https://phabricator.wikimedia.org/T156075#2963376 (Harej) We should probably get a mailing list. Absent any specific business we should probably plan on having quarterly meetings, at t...
[22:52:44] Hi, I'd like to deploy to beta cluster for the first time. I can't seem to be able to connect to deployment-prep. I'm getting 'channel 0: open failed: administratively prohibited: open failed'
[22:53:10] I'm using 'ProxyCommand ssh -a -W %h:%p -A bsitzmann@primary.bastion.wmflabs.org'
[22:54:03] bearND: probably best to ask in wikimedia-releng as they handle deployment-prep generally
[22:54:15] chasemp: ok, will do. Thanks!
[22:57:39] Tool-Labs-standards-committee: Figure out how communications and meetings will work for the Tool Labs standards committee - https://phabricator.wikimedia.org/T156075#2963421 (bd808) >>! In T156075#2963394, @Harej wrote: > Absent any specific business we should probably plan on having quarterly meetings, at t...
[23:03:25] Tool-Labs-standards-committee: Figure out how communications and meetings will work for the Tool Labs standards committee - https://phabricator.wikimedia.org/T156075#2963376 (Huji) I think quarterly is an absolute minimum. And I think we need a mailing list and an IRC channel. We should probably hold a meet...
[23:05:33] Labs, Tool-Labs: Several users and tools have invalid credentials in replica.my.cnf - https://phabricator.wikimedia.org/T154933#2963438 (chasemp) >>! In T154933#2963172, @scfc wrote: >>>! In T154933#2963037, @chasemp wrote: >> […] >>> But `danmichaelo` (`u2238`) is still locked out: >> […] >> ```sudo /us...
[23:51:09] any of the Labs team members around?
[23:51:47] for what sort of task do you need them?
[23:54:42] thank you! chatting with Madhu now!
[23:55:05] how do i access webserver config on tool labs? i would like to get started with https://redmine.lighttpd.net/projects/1/wiki/Docs_ModStatus for debugging a program