[00:00:03] Labs, Phabricator: Phab-02 sending old stylesheet copies - https://phabricator.wikimedia.org/T94413#1184370 (Negative24) Open>stalled
[00:00:04] what should that be in bigbrotherrc?
[00:00:12] *I* have no idea.
[00:00:17] see also: it’s a terrible system :)
[00:00:25] but yeah, if that’s in a cron
[00:00:29] I bet it’s -once locking being terrible.
[00:00:38] can you paste that cron into the bug as well?
[00:00:44] jstart -N ecmabot-wm -quiet -stderr -mem 1700M node ~/apps/oftn-bot/wm-ecmabot.js
[00:00:54] YuviPanda: I did, months ago :)
[00:01:11] hasn't changed :)
[00:01:21] ah, fair enough.
[00:01:31] I’ll also rewrite jsub at some point not too far in the future...
[00:01:44] if you use webservice, you’ll already see a ‘service.manifest’ file in your tool’s homedir.
[00:01:45] I'll switch this one to bigbrother
[00:01:48] cool
[00:01:59] things on bigbrother will be automatically switched over to the ‘new thing’ when that happens
[00:02:55] YuviPanda: bb seems very sensitive to line breaks. Could you make it ignore empty lines?
[00:03:08] Whenever I have a trailing line break, I get a bunch of e-mails a few hours later
[00:03:12] well, it’s a hacky perl script that uses regexes everywhere...
[00:03:21] it iterates per line though
[00:03:32] so I’d rather not touch it, since every time I’ve touched it earlier I’ve myself broken it
[00:03:43] and instead spend time on the replacement
[00:03:47] k
[00:03:54] (which is tracked at https://phabricator.wikimedia.org/T90561)
[00:03:59] where is bb hosted?
[00:04:30] the code?
[00:04:33] ops/puppet
[00:05:30] web interface address
[00:06:00] ah
[00:06:02] there isn’t one :)
[00:06:45] 2015-04-07 00:04:57 error: /data/project/ecmabot/.bigbrotherrc:2: command not supported
[00:06:47] erm
[00:07:05] well then I won't worry about it
[00:07:06] just when I thought I had it
[00:07:18] nano adds one too many line breaks I guess
[00:07:44] and bb sends e-mail every time it checks, even when the process is up, because it's the line that's supposed to tell it what to look at :-/
[00:41:19] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Yurik was created, changed by Yurik link https://wikitech.wikimedia.org/wiki/Nova+Resource%3aTools%2fAccess+Request%2fYurik edit summary: Created page with "{{Tools Access Request |Justification=osm |Completed=false |User Name=Yurik }}"
[00:41:51] anyone around to do it pls? ^
[00:44:50] hi yurik
[00:44:54] hi YuviPanda
[00:45:49] yurik: you do know that labs != toollabs, right? :D
[00:45:55] (re OSM, etc)
[00:46:09] YuviPanda, https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Connecting_to_OSM_via_the_official_CLI_PostgreSQL
[00:46:23] i tried to ssh to it and it didn't give me access
[00:46:34] yurik: that should work from any labs instance, btw
[00:47:05] YuviPanda, tried psql -h labsdb1004.eqiad.wmnet -U osm gis from osm1, failed
[00:47:18] not sure if you need a password or what not
[00:47:19] psql: FATAL: role "osm" does not exist
[00:47:28] anyway, I’ve added you to the tools project
[00:47:33] thanks!
[00:47:36] yw
[00:47:44] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Yurik was modified, changed by Yuvipanda link https://wikitech.wikimedia.org/w/index.php?diff=152454 edit summary:
[00:48:03] YuviPanda, do you know where we set up the osm db clone in labs btw?
[00:48:13] no idea, no
[00:48:17] Alex is who you need
[00:48:49] and yes, seems like that is outdated - from tools-dev i get psql command not found - in other words postgres is not even set up there
[00:49:08] YuviPanda, which Alex?
[00:49:17] not krenair :)
[00:49:24] the one from the ops team
[00:49:30] thx )
[00:49:32] yeah, that’s highly possible.
[00:49:35] I’ve never played with it
[00:49:39] (CR) Awight: feed #wmt to new channel (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/199330 (owner: John F. Lewis)
[00:49:51] * yurik is going to look for opsy alex
[00:50:11] yurik: akosiaris is who you’re looking for :)
[00:50:17] is probably sleeping tho
[00:50:29] what TZ is he in?
[00:50:49] (PS3) Awight: Use block style YAML [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/196852
[00:51:14] (CR) John F. Lewis: feed #wmt to new channel (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/199330 (owner: John F. Lewis)
[00:51:49] probably greek
[00:52:51] (CR) Awight: feed #wmt to new channel (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/199330 (owner: John F. Lewis)
[00:59:14] Labs, Tool-Labs, Labs-Q4-Sprint-2, Patch-For-Review, ToolLabs-Goals-Q4: Review and productionize service manifest monitor - https://phabricator.wikimedia.org/T95210#1184485 (yuvipanda) @scfc ^ should help with that race condition checking, I think. Take a look? You also have merge rights on that...
[01:48:34] Labs, Tool-Labs, Labs-Q4-Sprint-2, Patch-For-Review, ToolLabs-Goals-Q4: Review and productionize service manifest monitor - https://phabricator.wikimedia.org/T95210#1184533 (scfc) That looks okay to me, but my knowledge about race conditions and symlinks is based on @coren's work in `take`. @cor...
[01:52:19] Labs, Tool-Labs, Labs-Q4-Sprint-2, Patch-For-Review, ToolLabs-Goals-Q4: Review and productionize service manifest monitor - https://phabricator.wikimedia.org/T95210#1184534 (yuvipanda) Totally planning on doing a test suite at least for Manifest :) I am not sure how exactly to test the other part...
[01:54:09] Labs, Tool-Labs, Labs-Q4-Sprint-2, Patch-For-Review, ToolLabs-Goals-Q4: Review and productionize service manifest monitor - https://phabricator.wikimedia.org/T95210#1184535 (yuvipanda) Also do take a look at the other open patches too :) https://gerrit.wikimedia.org/r/#/projects/operations/softwa...
[02:57:57] hello, [[en:User:lowercase sigmabot III]] which is responsible for archiving tons of talk pages, has stopped running, and the operator is inactive and unresponsive. Since it runs on labs, is it possible to give it a nudge and restart it on behalf of the operator?
[03:00:09] hmm, I guess it is running. Just unusually hasn't made its way to some (many) talk pages yet
[03:00:19] Labs, Tool-Labs, Labs-Q4-Sprint-2, ToolLabs-Goals-Q4, Tracking: Replace bigbrother and ssh-cron-thingy with service manifests - https://phabricator.wikimedia.org/T90561#1184549 (yuvipanda)
[03:01:56] RECOVERY - Puppet failure on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[03:07:36] Labs, Tool-Labs, Labs-Q4-Sprint-2, ToolLabs-Goals-Q4: Create debian package for service manifest monitor f - https://phabricator.wikimedia.org/T95255#1184555 (yuvipanda) NEW
[03:08:30] Labs, Tool-Labs, Labs-Q4-Sprint-2, ToolLabs-Goals-Q4: Send metrics from service manifest monitor to graphite - https://phabricator.wikimedia.org/T95256#1184561 (yuvipanda) NEW
[03:08:42] Labs, Tool-Labs, Labs-Q4-Sprint-2, ToolLabs-Goals-Q4: Send metrics from service manifest monitor to graphite - https://phabricator.wikimedia.org/T95256#1184561 (yuvipanda)
[03:13:26] legoktm: aha!
I thought I had to package python3 version of statsd but apparently it already works with 3.4 :D
[03:54:05] Labs, Tool-Labs, Labs-Q4-Sprint-2, Patch-For-Review, ToolLabs-Goals-Q4: Send metrics from service manifest monitor to graphite - https://phabricator.wikimedia.org/T95256#1184585 (yuvipanda) Some more thought needs to be put into how this should be organized, I think. Currently it's organized on a...
[03:54:37] Labs, Tool-Labs, Labs-Q4-Sprint-2, Patch-For-Review, ToolLabs-Goals-Q4: Send metrics from service manifest monitor to graphite - https://phabricator.wikimedia.org/T95256#1184561 (yuvipanda) (you can see current set of stats by looking at tools.tools-bastion.ServiceMonitor.* in graphite.wmflabs.org)
[04:42:12] Tool-Labs, Continuous-Integration: labs-toollabs-debian-glue fails apparently with a timeout - https://phabricator.wikimedia.org/T91247#1184622 (scfc) p:Triage>Low
[04:42:33] Tool-Labs, Wikimania-Hackathon-2015: Conduct a Tool Labs Workshop in Wikimania hackathon - https://phabricator.wikimedia.org/T91061#1184625 (scfc) p:Triage>Normal
[04:43:59] Tool-Labs, Wikimania-Hackathon-2015: Conduct a research tools workshop at wikimania hackathon 2015 - https://phabricator.wikimedia.org/T91062#1184627 (scfc) p:Triage>Normal
[04:44:22] Tool-Labs: Set up lint checks for labs/toollabs - https://phabricator.wikimedia.org/T65687#1184629 (scfc) p:Triage>Low
[04:46:50] Tool-Labs: "add / remove maintainers" links at http://tools.wmflabs.org/ should link directly to management page - https://phabricator.wikimedia.org/T65741#1184630 (scfc) Open>Resolved a:scfc>valhallasw
[04:54:46] Tool-Labs: webservice creates blocking files and jobs when called from a user account with an eponymous tool - https://phabricator.wikimedia.org/T66219#1184634 (scfc) p:Triage>Normal `webservice2` seems to have a similar problem (not tested).
[04:56:39] Tool-Labs: webservice creates blocking files and jobs when called from a user account with an eponymous tool - https://phabricator.wikimedia.org/T66219#1184643 (scfc)
[04:58:02] Tool-Labs: querycache and querycachetwo tables aren't available on labs sql dbs - https://phabricator.wikimedia.org/T65782#1184647 (scfc) p:Triage>Normal
[04:58:30] Tool-Labs: querycache and querycachetwo tables aren't available on labs sql dbs - https://phabricator.wikimedia.org/T65782#690943 (scfc)
[05:00:39] Tool-Labs: Limit number of jobs users can execute in parallel - https://phabricator.wikimedia.org/T67777#1184666 (scfc) p:Triage>Normal a:coren>None
[05:04:09] Tool-Labs: Allocate more space for /tmp on tools-webgrid-{01,02} - https://phabricator.wikimedia.org/T67801#1184679 (scfc) Open>Resolved On all webgrid nodes, `/tmp` is now part of the root partition: ``` [tim@passepartout ~]$ fgrep tools-webgrid- .dsh/group/tools | pdsh -f 1 -w - df -h /tmp tools-web...
[05:04:56] Tool-Labs: Setup an easy to use logrotate based system for rotating tools logs - https://phabricator.wikimedia.org/T68623#1184681 (scfc) p:Triage>Lowest
[05:05:21] Tool-Labs: Track 5xx error stats on Graphite - https://phabricator.wikimedia.org/T69880#1184682 (scfc) p:Triage>Normal
[05:05:34] Tool-Labs: Track labsdb stats on Labs Graphite - https://phabricator.wikimedia.org/T69884#1184683 (scfc) p:Triage>Normal
[05:05:50] Tool-Labs: Track gridengine stats on Graphite - https://phabricator.wikimedia.org/T69881#1184685 (scfc) p:Triage>Normal
[05:06:21] Tool-Labs: Monitor mail system in Graphite - https://phabricator.wikimedia.org/T71072#1184686 (scfc) p:Triage>Normal
[05:07:20] Tool-Labs: sql script does not accept wildcards as parameter - https://phabricator.wikimedia.org/T75595#1184688 (scfc) p:Triage>Normal a:coren>scfc
[05:09:05] Tool-Labs: Multiple queue runners on tools-mail - https://phabricator.wikimedia.org/T74867#1184704 (scfc)
p:Triage>Normal a:coren>None
[07:41:35] Tool-Labs: Unattended upgrades are failing from time to time - https://phabricator.wikimedia.org/T92491#1184862 (scfc) ``` From: root@tools.wmflabs.org (Cron Daemon) Subject: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) To: root@tools.wmflabs.org Date:...
[08:17:31] (CR) Zfilipin: [C: 1] Remove Quality Assurance from -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/202251 (owner: Greg Grossmeier)
[08:21:59] Tool-Labs, Patch-For-Review: Unify proxylistener interacting code across portgrabber / tool-nodejs / tool-uwsgi - https://phabricator.wikimedia.org/T91957#1184930 (scfc) Open>Resolved a:scfc
[08:31:20] (CR) Hashar: [C: -1] "Please replace it by Browser-Tests so we get notification for that new project :)" (1 comment) [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/202251 (owner: Greg Grossmeier)
[09:04:34] Tool-Labs-tools-anagrimes: Export Wiktionnaire in dictionary formats - https://phabricator.wikimedia.org/T93340#1184986 (Darkdadaah) Here is a list of xdxf files created from a fr.wikt dump : [[https://tools.wmflabs.org/anagrimes/data/xdxf/]]. Tested with golddict, some notes : - abbreviations meanings see...
[09:44:24] Labs, Continuous-Integration: integration labs project DNS resolver improperly switched to openstack-designate - https://phabricator.wikimedia.org/T95273#1185067 (hashar) NEW
[09:47:18] I am tired
[09:49:11] hashar: get some rest
[09:59:52] Darkdadaah: it is not like I am being paid to take nap and sleep :D
[10:01:24] Labs, Continuous-Integration: integration labs project DNS resolver improperly switched to openstack-designate - https://phabricator.wikimedia.org/T95273#1185081 (hashar) That is caused by puppet change https://gerrit.wikimedia.org/r/#/c/202278/ which renames the variable from `use_dnsmasq` to `use_dnsmas...
[10:03:48] PROBLEM - Free space - all mounts on tools-webgrid-04 is CRITICAL: CRITICAL: tools.tools-webgrid-04.diskspace._var.byte_percentfree.value (<25.00%)
[10:04:21] !log integration-puppetmaster hacking /etc/resolv.conf to change nameserver from 208.80.154.12 to nameserver 10.68.16.1 and updating domain/server as well
[10:04:21] integration-puppetmaster is not a valid project.
[10:06:11] !log integration-puppetmaster restarted /etc/init.d/nscd to clear DNS cache. hostname --fqdn now yields back integration-puppetmaster.eqiad.wmflabs
[10:06:12] integration-puppetmaster is not a valid project.
[10:07:14] Invalid line 18: allow_ip
[10:07:16] that never stops
[10:07:22] https://tools.wmflabs.org/derivative/ needs restart
[10:10:55] hashar: it's often better to take a nap/rest than forcing oneself to work inefficiently. If you are too tired, it could backfire.
[10:20:10] how do I find which project a host, say parsoid.wmflabs.org, belongs to?
[10:25:09] Labs, Continuous-Integration: integration labs project DNS resolver improperly switched to openstack-designate - https://phabricator.wikimedia.org/T95273#1185128 (hashar) So the problem is VERY nasty. All instances have been switched to the new DNS resolver which causes puppet client to fail. The puppetm...
[10:27:25] Labs: /etc/ssh/userkeys/ubuntu notices for every puppet run on labs instances - https://phabricator.wikimedia.org/T94866#1185135 (scfc) I only noticed T85814 now; trying to draw a line that task's scope is about finding and removing the cause of the files' existence in freshly provisioned instances, this task...
[10:35:12] Labs, Continuous-Integration: integration labs project DNS resolver improperly switched to openstack-designate - https://phabricator.wikimedia.org/T95273#1185151 (scfc) I think that certificate signing nowadays is handled automatically, so you only need two Puppet runs patience? Also, `/etc/resolv.conf`...
[10:37:00] Labs: wikidatawiki.labsdb user database is impressively slow - https://phabricator.wikimedia.org/T95276#1185152 (Magnus) NEW
[10:42:37] Labs: Storage capacity & redundancy expansion (tracking) - https://phabricator.wikimedia.org/T85604#1185163 (mark) @Coren: see questions above, thanks!
[10:45:04] Labs, Patch-For-Review: Replicate data between codfw and eqiad - https://phabricator.wikimedia.org/T85606#1185164 (mark) @Coren: what is the status of this?
[10:52:02] Labs: Storage capacity & redundancy expansion (tracking) - https://phabricator.wikimedia.org/T85604#1185168 (mark)
[10:52:04] Labs, ToolLabs-Goals-Q4: Test labstore switchover - https://phabricator.wikimedia.org/T94607#1185167 (mark)
[10:52:36] Labs, ToolLabs-Goals-Q4: Allow labstores to hot or warm swap in case of failure - https://phabricator.wikimedia.org/T93589#1185169 (mark)
[10:52:37] Labs: Storage capacity & redundancy expansion (tracking) - https://phabricator.wikimedia.org/T85604#950661 (mark)
[10:53:09] Labs, ToolLabs-Goals-Q4: Allow labstores to hot or warm swap in case of failure - https://phabricator.wikimedia.org/T93589#1140690 (mark)
[10:53:11] Labs: Storage capacity & redundancy expansion (tracking) - https://phabricator.wikimedia.org/T85604#950661 (mark)
[11:52:42] Labs, Shinken: shinken has many warnings (?) about "UNKNOWN: execution of the check script exited with exception list index out of range" - https://phabricator.wikimedia.org/T95161#1185248 (Aklapper)
[12:20:39] Labs, Continuous-Integration: integration labs project DNS resolver improperly switched to openstack-designate - https://phabricator.wikimedia.org/T95273#1185320 (hashar) Open>Resolved a:hashar So I have fixed it eventually and that was nasty. On integration-puppetmaster I have: * manually rewro...
[12:20:49] Labs, Continuous-Integration: integration labs project DNS resolver improperly switched to openstack-designate - https://phabricator.wikimedia.org/T95273#1185323 (hashar) p:Triage>Unbreak!
[12:50:11] Labs, Continuous-Integration: integration labs project DNS resolver improperly switched to openstack-designate - https://phabricator.wikimedia.org/T95273#1185402 (Andrew) I don't think this was actually caused by that patch; I suspect that it's another example of https://phabricator.wikimedia.org/T95240 w...
[13:04:24] Labs, Continuous-Integration: integration labs project DNS resolver improperly switched to openstack-designate - https://phabricator.wikimedia.org/T95273#1185433 (thcipriani) @hashar we had the same problem yesterday in the staging project. The hiera hack should have been a no-op on every environment unle...
[13:12:27] Labs: Storage capacity & redundancy expansion (tracking) - https://phabricator.wikimedia.org/T85604#1185451 (coren) >>! In T85604#1167208, @mark wrote: > How much additional space (storage expansion) has been made available by this? An extra 25%, approximately 18T of usable space. In addition, the cleanup r...
[13:17:08] Labs, Patch-For-Review: Replicate data between codfw and eqiad - https://phabricator.wikimedia.org/T85606#1185458 (coren) Now that everything has been demonstrably stable over the Easter long weekend, we're ready to turn replication on with a bit of code review.
[13:53:45] Labs: wikidatawiki.labsdb user database is impressively slow - https://phabricator.wikimedia.org/T95276#1185560 (Magnus) UPDATE: Database has now slowed to a point where the tool (https://tools.wmflabs.org/mix-n-match/ ) is unusable.
[14:11:39] Labs, Incident-20150331-LabsNFS-Filesystem-Switch: Create scripts to help stagger restarts of labs VMs by different criteria - https://phabricator.wikimedia.org/T94613#1185668 (coren)
[14:12:04] Labs, Incident-20150331-LabsNFS-Filesystem-Switch, Labs-Q4-Sprint-1: Create a simple checklist to follow for announcing / doing planned maintenance (on labs) - https://phabricator.wikimedia.org/T94608#1185669 (coren)
[14:16:14] Coren, what about /data/project/phetools/cache/hocr/ my tools can't even read/write in many of its subdir as they are owned by root atm
[14:17:15] phe: Sorry about that; with all the outages last week I didn't want to take any chances. :-( Give me a minute and I'll finish this for you.
[14:19:19] phe: I'll fix the permissions right this minute.
[14:19:34] (in progress)
[14:19:49] Labs, Patch-For-Review: Designate should support split horizon resolution to yield private IP of instances behind a public DNS entry - https://phabricator.wikimedia.org/T95288#1185706 (hashar) NEW
[14:22:03] Labs, Continuous-Integration: integration labs project DNS resolver improperly switched to openstack-designate - https://phabricator.wikimedia.org/T95273#1185732 (hashar) I have filled {T95288} Poked @andrew about the issue on https://lists.wikimedia.org/pipermail/labs-l/2015-April/003592.html
[14:24:32] Labs, Incident-20150331-LabsNFS-Overload, Labs-Q4-Sprint-1, Patch-For-Review: Comprehensive monitoring / alerting for labstore* instances - https://phabricator.wikimedia.org/T94606#1185757 (coren)
[14:24:55] Labs, Incident-20150331-LabsNFS-Overload, ToolLabs-Goals-Q4: Test labstore switchover - https://phabricator.wikimedia.org/T94607#1185760 (coren)
[14:25:01] phe: Permissions should all be correct now?
[14:25:59] Labs, Incident-20150331-LabsNFS-Overload, ToolLabs-Goals-Q4: Reinstall labstore1001 with Jessie - https://phabricator.wikimedia.org/T94609#1185770 (coren)
[14:26:37] Coren, yes
[14:35:53] Labs, Continuous-Integration: Continuous integration should not depend on labs NFS - https://phabricator.wikimedia.org/T90610#1185847 (Krinkle) p:Triage>Low
[14:39:10] Labs, Incident-20150331-LabsNFS-Overload, ToolLabs-Goals-Q4: Test labstore switchover - https://phabricator.wikimedia.org/T94607#1185873 (coren)
[14:46:10] Labs, Tool-Labs: Unable to "Create New Tool" from tools.wmflabs.org webpage - https://phabricator.wikimedia.org/T91246#1185921 (ananthrk) @scfc - I was able to successfully create a new tool today ("ananthrk-test")
[14:48:40] Labs, Tool-Labs, ToolLabs-Goals-Q4: Make sure tools-db is backed up in some form - https://phabricator.wikimedia.org/T88716#1185933 (coren) @scfc: This is DR backups, not partially restorable backups.
[14:50:18] Labs, Tool-Labs: Planned labs maintenance on tools-db: Puppetization + log file change - https://phabricator.wikimedia.org/T94643#1185938 (coren) @springle: Ping? It's now too late to schedule this for this Thursday if you expect any outage at all given we have decided on a week's notice - can you okay a...
[14:51:53] Labs, Tool-Labs, Labs-Q4-Sprint-2, ToolLabs-Goals-Q4: Make sure tools-db is replicated somewhere - https://phabricator.wikimedia.org/T88718#1185957 (coren)
[14:52:11] Labs, Labs-Q4-Sprint-2, ToolLabs-Goals-Q4: Labs NFSv4/idmapd mess - https://phabricator.wikimedia.org/T87870#1185958 (coren)
[14:52:36] Labs, Tool-Labs, Labs-Q4-Sprint-2, Patch-For-Review, ToolLabs-Goals-Q4: Puppetize & fix tools-db - https://phabricator.wikimedia.org/T88234#1185959 (coren)
[15:03:20] Labs: wikidatawiki.labsdb user database is impressively slow - https://phabricator.wikimedia.org/T95276#1186026 (Magnus) Tool seems to be back, but DB speed is still bad.
[15:45:22] YuviPanda: there?
[15:52:53] andrewbogott: Haven't seen him yet.
[15:53:02] Tool-Labs: Unattended upgrades are failing from time to time - https://phabricator.wikimedia.org/T92491#1186190 (BBlack) I think the cron traffic reinforces my point, but perhaps we're miscommunicating on what the point is. Puppet's apt-related operations should be using appropriate apt/dpkg commands/interf...
[15:58:15] Labs, MediaWiki-extensions-OpenStackManager, Tool-Labs, Tool-Labs-tools-Article-request, and 9 others: Labs' Phabricator tags overhaul - https://phabricator.wikimedia.org/T89270#1186219 (Aklapper)
[16:02:13] andrewbogott: Coren just woke
[16:04:08] YuviPanda: relatively many things are broken by that ENC bug (specifically: every instance that uses it). I’m in a meeting with Antoine now, can I tell him you’ll work on that today?
[16:04:50] Well, I guess not ‘broken’ just prematurely opted in :)
[16:05:32] andrewbogott: yes you can :)
[16:05:39] thanks!
[16:05:45] Sorry about that...
[16:06:13] YuviPanda: andrewbogott use_dnsmasq broke CI instances hehe
[16:06:22] It’s not like ‘true’ and true should be different
[16:06:26] but it is all back to normal. I have replied on the labs-l list
[16:06:32] hashar: yes, thanks, you can stop telling me about that now :)
[16:06:51] andrewbogott: I have to basically dig into the LDAP library to see what's up
[16:06:57] and filed a bug about Designate that should support yielding different responses based on the client source. We have a hack in dnsmasq to serve different responses to labs instances vs random internet clients
[16:07:09] andrewbogott: shit happens :) no big deal hehe
[16:07:20] the resolver worked fine by the way
[16:07:36] the only issues were the crazy puppet client certs being renamed and the split horizon issue
[16:07:43] I honestly don’t understand what the custom ENC gets us that we didn’t have already. I take it deployment-prep is using it as well?
[16:07:55] no clue :(
[16:08:21] (CR) Legoktm: [C: 2] Update regex for Wikipedia apps [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/201883 (owner: Alex Monk)
[16:08:40] thcipriani replied on the bug I have filed
[16:08:44] (CR) Legoktm: [C: 2] Add two more projects to #-releng notices [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/200217 (owner: Greg Grossmeier)
[16:08:45] about beta cluster and use_dnsmasq
[16:08:52] (Merged) jenkins-bot: Update regex for Wikipedia apps [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/201883 (owner: Alex Monk)
[16:08:56] (CR) Legoktm: [C: 2] Remove Quality Assurance from -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/202251 (owner: Greg Grossmeier)
[16:08:58] (CR) Legoktm: [C: 2] feed #wmt to new channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/199329 (owner: John F. Lewis)
[16:09:12] (Merged) jenkins-bot: Add two more projects to #-releng notices [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/200217 (owner: Greg Grossmeier)
[16:09:32] (Merged) jenkins-bot: Remove Quality Assurance from -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/202251 (owner: Greg Grossmeier)
[16:09:35] (Merged) jenkins-bot: feed #wmt to new channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/199329 (owner: John F. Lewis)
[16:09:57] andrewbogott hashar the ENC allows us to assign classes to nodes based on regexes, needed for staging auto-setup
[16:10:36] ah, autosetup, that makes sense
[16:11:17] thcipriani: I have barely looked at your reply :/ Was just happy someone knew about it and that beta had no impact :)
[16:11:47] hashar: my reply was kind of a wall o' text :)
[16:13:30] hashar: the issue is that a stage in the puppet config (the ENC) swizzled the use_dnsmasq value from true (a boolean) to ‘true’ (a string) thus causing it to no longer be true.
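The boolean mix-up described in the last message can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the actual ENC code: it assumes the variables arrive as `key=value` strings, and `parse_puppetvars` and `coerce` are hypothetical helper names invented for the example.

```python
def parse_puppetvars(entries):
    """Split 'key=value' strings into a dict; every value stays a string.

    Hypothetical helper: assumes each entry has the form 'key=value'.
    """
    parsed = {}
    for entry in entries:
        key, _, value = entry.partition("=")
        parsed[key] = value
    return parsed


def coerce(value):
    """Map the strings 'true'/'false' to real booleans; leave the rest alone."""
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    return value


raw = parse_puppetvars(["use_dnsmasq=true"])
# The parsed value is the string 'true', so a strict boolean comparison fails.
print(raw["use_dnsmasq"] == True)

# Coercing the known boolean spellings restores a real boolean.
fixed = {key: coerce(value) for key, value in raw.items()}
print(fixed["use_dnsmasq"] == True)
```

Whether coercion belongs in the classifier or in the code that consumes the variable is a design choice; the sketch only shows why a string and a boolean stop comparing equal.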
[16:13:32] thcipriani: text backed by facts :D
[16:13:33] !log tools.wikibugs Updated channels.yaml to: 8a4346f6c5d0f826a9b3099d5f76339d7a64dcad Merge "Remove Quality Assurance from -releng"
[16:13:35] Gerrit-Patch-Uploader, Phabricator: Create a phabricator api method to remove 'patch-for-review' tag from a task - https://phabricator.wikimedia.org/T95307#1186257 (mmodell) NEW a:mmodell
[16:13:36] Logged the message, Master
[16:13:38] It was pretty obscure.
[16:13:58] andrewbogott: yeah I noticed a == true condition
[16:14:10] but I was blaming it on the variable rename :)
[16:14:23] Nah, it was broken before the rename :(
[16:14:27] ah
[16:14:45] hashar: is it the case that beta is now committed to using the new resolver? If so we should make that explicit
[16:14:51] otherwise it’ll switch back when Yuvi fixes that bug
[16:14:52] so in short I just stopped my investigation and used the hiera page to pass the boolean true value to both variable names
[16:15:04] manually reverted the resolv.conf change and it all went back up
[16:15:12] then I "just" had to fight with puppet certificates :)
[16:15:23] oh, I see, you’re using a local puppet hack now?
[16:15:26] andrewbogott: for beta you can ask in the meeting :)
[16:15:40] but we need the DNS split horizon on beta as well
[16:15:41] Gerrit-Patch-Uploader: make gerritbot remove the "patch-for-review" tag once a patch is merged (or abandoned) - https://phabricator.wikimedia.org/T95309#1186292 (mmodell) NEW
[16:15:51] since we have a bunch of $wgXYZ variables pointing to *.beta.wmflabs.org
[16:15:59] so that has to resolve to 10.x.x.x ip
[16:16:04] so no, beta can't migrate
[16:16:44] andrewbogott: with the enc you don't have to use wikitech and recreating instances is super trivial.
[16:17:03] All roles applied to instances are documented and version controlled
[16:19:04] I'll be in the office soon and taking care of this
[16:19:21] Sorry for the issues everyone
[16:19:56] Gerrit-Patch-Uploader, Phabricator: Create a phabricator api method to remove 'patch-for-review' tag from a task - https://phabricator.wikimedia.org/T95307#1186330 (mmodell)
[16:20:12] Gerrit-Patch-Uploader, Phabricator: Create a phabricator api method to remove 'patch-for-review' tag from a task - https://phabricator.wikimedia.org/T95307#1186257 (mmodell)
[16:22:52] Labs, Staging, Patch-For-Review: Labs puppet ENC scrambles booleans - https://phabricator.wikimedia.org/T95240#1186348 (mmodell) fixed?
[16:23:34] Labs, Staging, Patch-For-Review: Labs puppet ENC scrambles booleans - https://phabricator.wikimedia.org/T95240#1186351 (yuvipanda) Nope but dnsmasq errors fixed for now. I'll take a look at the underlying issue today.
[16:38:26] Coren, do you plan to restart the copy (or is it in progress) ? or must I rebuild the missing data?
[16:38:57] phe: I'll babysit it off-hours this evening to save you the trouble.
[16:39:56] Coren, ok, ty
[16:40:42] Hi Coren :)
[16:40:56] The bigbrother replacement is almost ready to go \o/
[16:40:58] o/ YuviPanda.
[16:41:06] Do you have code to review?
[16:41:17] Yes! I added you as a reviewer to a few
[16:41:30] It runs as root so needs careful review
[16:41:44] Coren: see operations/software/tools-manifest
[16:41:59] Lots of open patchsets including some symlink security ones
[16:42:08] kk. I'll make sure to do that today then.
[16:42:22] Coren: thanks :)
[16:42:33] I'm taking a shower I'll brb and then poke at sprint 2
[16:42:38] kk
[16:42:43] Coren: we shouldn't remove previous sprint tags I think.
[16:42:50] Just add them?
[16:42:56] Otherwise the sprint board would be historically inaccurate
[16:42:57] Yeah
[16:43:23] And ideally we only put sprint tags on things that get taken care of
[16:43:31] But that takes a long time before being reality
[16:43:39] So overlapping 2 sprints is usually ok
[16:43:46] Alright brb
[16:48:50] (PS1) Alexandros Kosiaris: add passwords::labs::rabbitmq [labs/private] - https://gerrit.wikimedia.org/r/202434
[16:49:32] Labs, hardware-requests, operations: eqiad: (6) labs virt nodes - https://phabricator.wikimedia.org/T89752#1186497 (Cmjohnson) labvirt1001 wmf4669 u13/14 ge-3/0/8 labvirt1002 wmf4670 u15/16 ge-3/0/20 labvirt1003 wmf4671 u17/18 ge-3/0/21 labvirt1004 wmf4672 u33/34 ge-5/0/18 labvirt1005 wmf4673 u35/36...
[17:22:01] Labs, Labs-Q4-Sprint-2: Slides for the Labs storage presentation - https://phabricator.wikimedia.org/T95317#1186621 (coren) NEW a:coren
[17:36:54] Krinkle: headsup about graphite name changes, email cc’d to you
[17:37:22] Coren: so https://gerrit.wikimedia.org/r/#/projects/operations/software/tools-manifest,dashboards/default is all of them now, I’ll be adding more soon
[17:37:34] Coren: only support for webservices atm, other jobs support coming soon
[17:42:39] YuviPanda: All right, I'm in the middle of a test right now but reading those is next on my list.
[17:42:45] Coren: \o/ cool
[17:44:47] YuviPanda: which email?
[17:45:13] Krinkle: it was to ops@ from godog
[17:45:28] gotit
[17:54:29] YuviPanda: Hm. There's no easy place to review the code in toto, I'll have to pepper comments all over the changesets. :-( I hope that won't be too much of a mess.
[17:54:51] Coren: yeah, but we also were using a phab ticket for general comments unrelated to specific patchsets let me find it
[17:55:00] Coren: https://phabricator.wikimedia.org/T95210
[17:55:08] scfc already commented a bunch
[17:55:23] Coren: but do leave specific ones first, I already fixed a bunch of things scfc pointed out (possibly)
[17:59:29] Also, while I find the pun amusing, I'm not entirely certain 'destiny' is... appropriate. That might just be me being a bit overly PC.
[18:00:21] Coren: yes, I realized that after finding out that manifest destiny did not mean what I thought it meant
[18:00:30] Coren: there’s a patchset renaming it to tools.manifest
[18:00:52] Ah, good. I hadn't reached that changeset yet. :-)
[18:01:06] Coren: :) I don’t think that’s overly PC :)
[18:01:38] Coren: I just assumed manifest destiny meant instances of destiny manifesting itself
[18:01:42] rather than a particular instance
[18:02:36] It's one of those cases of a perfectly good bit of grammatical language ending up being associated with... unfun politics and being ruined for everyone afterwards. :-)
[18:05:06] Coren: yeah
[18:05:22] Coren: but yeah, there’s totally a rename patch :)
[18:08:50] (CR) Alexandros Kosiaris: [C: 2 V: 2] add passwords::labs::rabbitmq [labs/private] - https://gerrit.wikimedia.org/r/202434 (owner: Alexandros Kosiaris)
[18:14:53] YuviPanda: In the meantime, can you review the update https://gerrit.wikimedia.org/r/#/c/199267/ ?
[18:15:09] looking
[18:31:34] Labs, Tool-Labs, Labs-Q4-Sprint-2, Patch-For-Review, ToolLabs-Goals-Q4: Review and productionize service manifest monitor - https://phabricator.wikimedia.org/T95210#1187015 (coren) @scfc: The protections to take against symlink races depend on whether it is possible to create the file or not; whe...
[18:34:14] Labs, Tool-Labs, Labs-Q4-Sprint-2, Patch-For-Review, ToolLabs-Goals-Q4: Review and productionize service manifest monitor - https://phabricator.wikimedia.org/T95210#1187019 (coren) As a note, the code in take.cc is indeed very carefully crafted to avoid symlink shenanigans but wouldn't be //quit... [18:40:21] YuviPanda: And yes, writing safely to the filesystem as root is *hard* :-) [20:12:20] Labs, Staging, Patch-For-Review: Labs puppet ENC scrambles booleans - https://phabricator.wikimedia.org/T95240#1187339 (thcipriani) FWIW puppet's internal ldap isn't that sophisticated: Grabs all `puppetvar` values and splits them on '=' (just like the python enc): https://github.com/puppetlabs/puppet... [20:52:26] Tool-Labs: Unattended upgrades are failing from time to time - https://phabricator.wikimedia.org/T92491#1187496 (scfc) I didn't mean to suggest that clashes were unlikely; on the contrary, I think (and had hoped to express) that they are //very// likely. But it's not only WMF who runs `apt`/`dpkg` every 20... [20:56:01] Labs: wikidatawiki.labsdb user database is impressively slow - https://phabricator.wikimedia.org/T95276#1187514 (scfc) a:Springle [20:58:22] Labs, Patch-For-Review: Designate should support split horizon resolution to yield private IP of instances behind a public DNS entry - https://phabricator.wikimedia.org/T95288#1187528 (scfc) This is a showstopper for #Tool-Labs as well; many tools connect to the proxies from within Labs. [20:59:54] Labs: Designate should support split horizon resolution to yield private IP of instances behind a public DNS entry - https://phabricator.wikimedia.org/T95288#1187538 (scfc) [21:05:52] RECOVERY - Puppet failure on tools-dev is OK: OK: Less than 1.00% above the threshold [0.0] [22:27:36] Labs, Analytics-Engineering: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc.
- https://phabricator.wikimedia.org/T76075#1187826 (Milimetric) @yuvipanda, queries against labsdb are faster, and we saw some back-filling going on, but it's still not fast enough to... [22:39:19] Labs, Analytics-Engineering: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc. - https://phabricator.wikimedia.org/T76075#1187844 (coren) @milimetric: that's actually downright scary. What kind of changes are you noticing (i.e. additions, changes, deletions)? I... [22:44:37] Labs, Analytics-Engineering: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc. - https://phabricator.wikimedia.org/T76075#1187859 (Milimetric) @coren: this problem was observed while pulling data out of analytics-store actually, so it's happening in mediawiki som... [22:59:39] (PS1) Ricordisamoa: Initial commit [labs/tools/ptable] - https://gerrit.wikimedia.org/r/202610 [23:23:03] Labs, Staging, Patch-For-Review: Labs puppet ENC scrambles booleans - https://phabricator.wikimedia.org/T95240#1187958 (yuvipanda) Open>Resolved ```root@staging-palladium:/var/lib/git/operations/puppet# /usr/local/bin/ldap-yaml-enc.py $(facter -p ec2id).$(facter -p domain) classes: [base, 'role::... [23:33:10] Quarry: Quarry sorts by the first column by default - https://phabricator.wikimedia.org/T95369#1187983 (He7d3r) NEW