[02:21:09] PROBLEM - Puppet run on tools-webgrid-lighttpd-1202 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:21:53] PROBLEM - Puppet run on tools-exec-1220 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [02:22:13] PROBLEM - Puppet run on tools-worker-1021 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:24:39] PROBLEM - Puppet run on tools-exec-1215 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:24:43] PROBLEM - Puppet run on tools-exec-1405 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [02:25:24] PROBLEM - Puppet run on tools-exec-1219 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:25:32] PROBLEM - Puppet run on tools-docker-registry-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [02:25:54] PROBLEM - Puppet run on tools-worker-1005 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [02:26:10] PROBLEM - Puppet run on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:26:10] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [02:27:06] PROBLEM - Puppet run on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:27:08] PROBLEM - Puppet run on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:27:38] PROBLEM - Puppet run on tools-webgrid-lighttpd-1204 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [02:28:08] PROBLEM - Puppet run on tools-exec-1419 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:28:10] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:28:14] PROBLEM - Puppet run on 
tools-exec-1203 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [02:28:21] (someone from ops is investigating) [02:28:34] PROBLEM - Puppet run on tools-webgrid-lighttpd-1208 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [02:28:39] PROBLEM - Puppet run on tools-docker-registry-01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [02:29:21] PROBLEM - Puppet run on tools-worker-1004 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:29:29] PROBLEM - Puppet run on tools-exec-1204 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:29:37] PROBLEM - Puppet run on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:29:41] PROBLEM - Puppet run on tools-worker-1022 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [02:29:53] PROBLEM - Puppet run on tools-exec-1211 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [02:31:03] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [02:31:11] PROBLEM - Puppet run on tools-exec-1413 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:31:19] PROBLEM - Puppet run on tools-worker-1019 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [02:32:29] PROBLEM - Puppet run on tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:33:00] PROBLEM - Puppet run on tools-exec-1202 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [02:33:58] PROBLEM - Puppet run on tools-exec-1403 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [02:34:12] PROBLEM - Puppet run on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:34:30] PROBLEM - Puppet run on tools-worker-1014 
is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:34:54] PROBLEM - Puppet run on tools-exec-1213 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [02:35:42] PROBLEM - Puppet run on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [02:35:48] PROBLEM - Puppet run on tools-bastion-05 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [02:37:38] PROBLEM - Puppet run on tools-exec-1417 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:37:41] PROBLEM - Puppet run on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [02:37:43] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:37:45] PROBLEM - Puppet run on tools-proxy-02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [02:38:09] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:38:21] PROBLEM - Puppet run on tools-exec-1206 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:38:35] PROBLEM - Puppet run on tools-exec-1414 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:38:39] PROBLEM - Puppet run on tools-exec-1416 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [02:39:05] PROBLEM - Puppet run on tools-exec-1221 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:39:25] PROBLEM - Puppet run on tools-checker-02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [02:41:07] PROBLEM - Puppet run on tools-worker-1023 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:42:17] PROBLEM - Puppet run on tools-exec-1201 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:42:55] PROBLEM - Puppet run on 
tools-worker-1002 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [02:43:07] PROBLEM - Puppet run on tools-exec-1210 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:43:33] PROBLEM - Puppet run on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [02:43:41] PROBLEM - Puppet run on tools-exec-1216 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [02:44:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [02:44:13] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:44:29] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:44:53] PROBLEM - Puppet run on tools-exec-gift is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [02:44:54] PROBLEM - Puppet run on tools-exec-1214 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [02:44:59] PROBLEM - Puppet run on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [02:45:14] PROBLEM - Puppet run on tools-webgrid-lighttpd-1205 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:45:20] PROBLEM - Puppet run on tools-exec-1412 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:47:24] PROBLEM - Puppet run on tools-exec-1208 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [02:48:12] PROBLEM - Puppet run on tools-webgrid-lighttpd-1209 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:49:02] Striker: Allow easy replication of existing github/bitbucket repos - https://phabricator.wikimedia.org/T143971#2584987 (mmodell) It should be straightforward to implement this in striker by calling this api method: 
https://phabricator.wikimedia.org/conduit/method/diffusion.uri.edit/ [02:56:07] RECOVERY - Puppet run on tools-webgrid-lighttpd-1202 is OK: OK: Less than 1.00% above the threshold [0.0] [02:56:53] RECOVERY - Puppet run on tools-exec-1220 is OK: OK: Less than 1.00% above the threshold [0.0] [02:57:11] RECOVERY - Puppet run on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0] [02:57:40] Labs, Operations: labservices1001 down - https://phabricator.wikimedia.org/T152340#2845112 (fgiunchedi) a:Andrew [02:57:45] Labs, Labs-Infrastructure, Operations: labservices1001 down - https://phabricator.wikimedia.org/T152340#2845114 (Krenair) [02:59:26] Striker, Phabricator, Security-Reviews, Patch-For-Review: Unable to mirror repository from git.legoktm.com into diffusion - https://phabricator.wikimedia.org/T143969#2845116 (mmodell) T146055 improves the situation slightly by isolating the db credentials which are readable by the phd user from t... [03:00:30] RECOVERY - Puppet run on tools-docker-registry-02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:01:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [03:01:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [03:02:06] RECOVERY - Puppet run on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [03:02:06] RECOVERY - Puppet run on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [03:02:36] RECOVERY - Puppet run on tools-webgrid-lighttpd-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [03:03:10] RECOVERY - Puppet run on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [03:03:14] RECOVERY - Puppet run on tools-exec-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [03:03:31] RECOVERY - Puppet run on tools-webgrid-lighttpd-1208 is OK: OK: Less than 
1.00% above the threshold [0.0] [03:03:37] RECOVERY - Puppet run on tools-docker-registry-01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:04:37] RECOVERY - Puppet run on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [03:04:37] RECOVERY - Puppet run on tools-exec-1215 is OK: OK: Less than 1.00% above the threshold [0.0] [03:04:41] RECOVERY - Puppet run on tools-worker-1022 is OK: OK: Less than 1.00% above the threshold [0.0] [03:04:43] RECOVERY - Puppet run on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [03:05:21] RECOVERY - Puppet run on tools-exec-1219 is OK: OK: Less than 1.00% above the threshold [0.0] [03:05:53] RECOVERY - Puppet run on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0] [03:06:05] RECOVERY - Puppet run on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [03:06:21] RECOVERY - Puppet run on tools-worker-1019 is OK: OK: Less than 1.00% above the threshold [0.0] [03:07:29] RECOVERY - Puppet run on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [03:08:10] RECOVERY - Puppet run on tools-exec-1419 is OK: OK: Less than 1.00% above the threshold [0.0] [03:09:22] RECOVERY - Puppet run on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [03:09:30] RECOVERY - Puppet run on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [03:09:32] RECOVERY - Puppet run on tools-worker-1014 is OK: OK: Less than 1.00% above the threshold [0.0] [03:09:54] RECOVERY - Puppet run on tools-exec-1211 is OK: OK: Less than 1.00% above the threshold [0.0] [03:09:56] RECOVERY - Puppet run on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [0.0] [03:10:20] RECOVERY - Puppet run on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [03:10:46] RECOVERY - Puppet run on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0] [03:11:12] 
RECOVERY - Puppet run on tools-exec-1413 is OK: OK: Less than 1.00% above the threshold [0.0] [03:12:41] RECOVERY - Puppet run on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [03:12:59] RECOVERY - Puppet run on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0] [03:13:19] RECOVERY - Puppet run on tools-exec-1206 is OK: OK: Less than 1.00% above the threshold [0.0] [03:13:39] RECOVERY - Puppet run on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [03:13:59] RECOVERY - Puppet run on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [03:14:05] RECOVERY - Puppet run on tools-exec-1221 is OK: OK: Less than 1.00% above the threshold [0.0] [03:14:13] RECOVERY - Puppet run on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [03:16:08] RECOVERY - Puppet run on tools-worker-1023 is OK: OK: Less than 1.00% above the threshold [0.0] [03:17:18] RECOVERY - Puppet run on tools-exec-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [03:17:38] RECOVERY - Puppet run on tools-exec-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [03:17:42] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [03:17:44] RECOVERY - Puppet run on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:17:52] RECOVERY - Puppet run on tools-worker-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [03:18:06] RECOVERY - Puppet run on tools-exec-1210 is OK: OK: Less than 1.00% above the threshold [0.0] [03:18:08] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [03:18:32] RECOVERY - Puppet run on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [03:18:33] RECOVERY - Puppet run on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [03:19:29] RECOVERY - Puppet run on tools-checker-02 is OK: OK: Less than 1.00% 
above the threshold [0.0] [03:19:53] RECOVERY - Puppet run on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0] [03:22:23] RECOVERY - Puppet run on tools-exec-1208 is OK: OK: Less than 1.00% above the threshold [0.0] [03:23:11] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0] [03:23:41] RECOVERY - Puppet run on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0] [03:24:15] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0] [03:24:30] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [03:24:54] RECOVERY - Puppet run on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0] [03:25:00] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [03:25:16] RECOVERY - Puppet run on tools-webgrid-lighttpd-1205 is OK: OK: Less than 1.00% above the threshold [0.0] [03:25:18] Labs, Labs-Infrastructure, Operations: labservices1001 down - https://phabricator.wikimedia.org/T152340#2845101 (Legoktm) > Note that during this time dns from labs instances seemed fine, why toolschecker failed needs investigation Well, it wasn't all fine. CI tests were failing with stuff like: ```... [03:25:20] RECOVERY - Puppet run on tools-exec-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [03:34:03] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [04:49:22] Striker: Allow easy replication of existing github/bitbucket repos - https://phabricator.wikimedia.org/T143971#2845147 (bd808) >>! In T143971#2845099, @mmodell wrote: > It should be straightforward to implement this in striker by calling this api method: https://phabricator.wikimedia.org/conduit/method/diffu... 
[06:02:26] Labs-project-Phabricator: Upgrade phab-01.wmflabs.org - https://phabricator.wikimedia.org/T127617#2845151 (Dzahn) Is it applied on phab-01? Is phab-01 still used and active? [06:29:11] PROBLEM - Puppet run on tools-exec-1419 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [06:52:20] Labs, Phabricator, Puppet: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2845162 (mmodell) @paladox: is this now working on phab-01? [06:53:05] Labs, Phabricator, Puppet: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2845163 (mmodell) a:mmodell>None rather, the main role now works on labs, correct? [07:09:09] RECOVERY - Puppet run on tools-exec-1419 is OK: OK: Less than 1.00% above the threshold [0.0] [08:49:54] Labs, Tool-Labs, DBA: Runing SQL EXPLAIN on Labs - https://phabricator.wikimedia.org/T152341#2845181 (mahmoud) [10:52:42] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [11:21:12] Labs-project-Phabricator: Upgrade phab-01.wmflabs.org - https://phabricator.wikimedia.org/T127617#2845247 (Paladox) Oh nope not applied to phab-01 yet. 
[11:22:50] Labs, Phabricator, Puppet: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2845248 (Paladox) @mmodell I haven't applied this role to phab-01 yet but I have applied it to the phabricator role so looks like it is working on labs (no failures) [11:32:41] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:45:09] Labs, Tool-Labs, DBA: Runing SQL EXPLAIN on Labs - https://phabricator.wikimedia.org/T152341#2845319 (Krenair) [14:45:14] Labs, Tool-Labs, Upstream: Unable to explain queries on replicated databases - https://phabricator.wikimedia.org/T50875#2845322 (Krenair) [15:25:56] !log deployment-prep Found a git-sync-upstream cron on deployment-mx for some reason... commented for now, but wtf was this doing on a MX server? [15:26:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL [16:49:32] Labs, Tool-Labs, Documentation: document the need and usage patterns for special exec hosts - https://phabricator.wikimedia.org/T99067#2845394 (scfc) [16:49:48] Labs, Tool-Labs, Documentation: document the need and usage patterns for special exec hosts - https://phabricator.wikimedia.org/T99067#1284594 (scfc) (`tools-exec-cyberbot` is gone.) [17:13:43] Labs, Tool-Labs, User-bd808: Change Python hashbang to `#! /usr/bin/env python -E -s` for user-facing tools - https://phabricator.wikimedia.org/T147350#2845425 (scfc) [17:15:16] Labs, Discovery, Maps, Maps-data, and 2 others: PostgreSQL query planner bug on labsdb1006 - https://phabricator.wikimedia.org/T145599#2845427 (scfc) [17:19:26] Labs, Tool-Labs: DNS resolution sometimes fails on tools-bastion-03 - https://phabricator.wikimedia.org/T143194#2560168 (scfc) @Samwilson, is this still an issue you encounter from time to time? 
[17:52:17] Labs, Tool-Labs, Pywikibot-weblinkchecker.py: Weblinkchecker should throttle connections to the same host - https://phabricator.wikimedia.org/T152350#2845444 (valhallasw) [18:00:03] Labs, Tool-Labs, Pywikibot-weblinkchecker.py: Weblinkchecker should throttle connections to the same host - https://phabricator.wikimedia.org/T152350#2845460 (valhallasw) [18:04:29] I try to do npm install and Git bash say that don't recognize the command :S [18:04:40] and I need that to update the view stats for the project portals [18:04:54] 'cause it uses gulp to generate the data [18:04:57] any ideas? [18:05:08] mafk: 'git bash'? [18:05:17] i.e. on your local computer? [18:05:22] yes [18:05:52] the repo wikimedia/portals require a lot of stuff to run [18:06:04] Ok, then I don't have to worry about tool labs doing something odd :-). Check your $PATH to check it includes the directory where you installed node? [18:06:27] valhallasw`cloud: npm works on labs? [18:06:50] because if I can run gulp update-stats from there I can save time installing it on my local machine [18:08:04] https://meta.wikimedia.org/wiki/Module:Project_portal/views [18:09:51] I think so, yes [18:10:41] * mafk will test [18:10:49] hope not to break anything [18:35:12] Labs, Tool-Labs, DBA: enwiki_p replica on s1 is corrupted - https://phabricator.wikimedia.org/T134203#2257889 (scfc) @jcrespo: What is the Phabricator task of #DBA that covers the work needed to resolve this task? T147052? If so, could you please add it as a blocking task to this task? That way, q... [18:38:54] Labs, Labs-Kubernetes, Tool-Labs: Tool name too long - https://phabricator.wikimedia.org/T141100#2487172 (scfc) This would also affect: ``` scfc@tools-bastion-03:~$ getent passwd | sed -ne 's/^tools\.\([^:]\{25,\}\):.*$/\1/p;' not-in-the-other-language citation-template-filling perfect-venn-diagram-... 
[18:44:44] Labs, Tool-Labs: About 71 users are missing replica.my.cnf - https://phabricator.wikimedia.org/T140592#2845473 (scfc) [18:45:05] Labs, Tool-Labs: About 71 users are missing replica.my.cnf - https://phabricator.wikimedia.org/T140592#2469834 (scfc) ``` root@tools-bastion-03:~# (cd /home && for dir in *; do if ! [ -f "$dir/replica.my.cnf" ]; then echo "$dir"; fi; done) | wc -l 71 root@tools-bastion-03:~# ``` [19:13:18] blerg: trying to spin up new instances in the ci-staging project. I keep getting prompted for a password when I try to login. Old instances still working fine. I changed the project hiera config, removed the puppetmaster hiera var -- same deal. Is there a known issue? Anything else I can try from my end? [19:13:45] tried a reboot, also tried recreating images with no luck :( [19:16:28] I noticed this start at some point Friday, gave up for the week, revisited today and hit the same issue, if that info is helpful [19:27:06] thcipriani|afk did you try creating the instances through horizon? [19:27:12] or through wikitech. [19:27:52] I was using horizon, switched to wikitech when I hit this wall, same outcome [19:28:31] Oh, so you try to ssh in but get the password dialog? [19:29:00] yup [19:29:32] Do you use ssh proxy. IE set a config file for ssh and do ssh ? [19:30:13] Did you delete and recreate the instance today? [19:30:17] paladox, please [19:30:39] tyler knows what he's doing [19:30:47] thcipriani|afk, can you add me to the project? [19:30:51] Krenair i am only trying to help. [19:30:56] heh, sometimes I do. Krenair yup, sure. [19:33:28] (internet slow at the moment...) [19:34:11] Krenair: you should be an admin now [19:34:16] okay, let's see [19:36:39] building a new instance, watching the console [19:37:01] ci-staging-puppetmaster and ci-staging-jenkins-master seem to be working fine, other instances are not happy. There do seem to be errors in the console... 
[19:37:41] [ *** ] A start job is running for /etc/rc.local Compatibili...23s / no limit) [ *** ] A start job is running for /etc/rc.local Compatibili...24s / no limit) [*** ] A start job is running for /etc/rc.local Compatibili...25s / no limit) 2016-12-04T19:37:12.076336+00:00 ci-staging-alex-testing rc.local[406]: Error: Could not retrieve catalog from remote server: Error 400 [19:37:41] on SERVER: Reading data from Ci-staging failed: TypeError: Data retrieved from Ci-staging is String not Hash at /etc/puppet/manifests/realm.pp:71 on node ci-staging-alex-testing.ci-staging.eqiad.wmflabs [19:38:10] a puppet failure at this stage will probably cause the password prompt you ran into [19:38:33] ew, what is this hiera page [19:38:45] which? [19:39:05] https://wikitech.wikimedia.org/w/index.php?title=Hiera:Ci-staging&diff=1060231&oldid=1052545 [19:39:11] ah, crap, yeah, I just saw that. [19:40:01] okay, deleted the page [19:40:01] I bet that's the whole deal -- thank you Krenair [19:40:07] terminating my instance and trying again [19:40:14] ok [19:41:02] added that page because I was having some hiera problems (that turned out to be puppet problems), instead of deleting, just added an empty hash, which wikitech evidently interpreted as a string. [19:41:50] or, I guess, a list with the string '{}' in it. [19:41:52] yeah [19:42:00] not sure why that happened [19:42:10] maybe we should give projectadmins deletion privileges over their hiera pages [19:43:31] thcipriani|afk, try sshing to ci-staging-alex-test2.ci-staging.eqiad.wmflabs now [19:43:58] Krenair: works [19:44:01] great [19:44:14] blerg, that was a dumb error :) Thank you for the help! [19:44:22] I'll terminate my test2 instance now [19:44:32] appreciated. [19:45:34] done [19:47:00] thanks again! 
I don't know that I would have found that :) [19:48:03] there's a few things that helped me [19:48:35] the first thing to know is that those password prompts are usually triggered by some sort of puppet failure during creation, or an ldap/dns outage [19:49:16] to catch the errors you can just read them straight off the console log. but you sometimes have to go to the page and refresh often to be sure to catch everything, it's limited in the number of lines it'll show annoyingly [19:50:05] once I had the error, git grep on the puppet repo for "Data retrieved from" showed matches in modules/wmflib/lib/hiera/httpcache.rb and modules/wmflib/lib/hiera/mwcache.rb [19:50:25] nice :) [19:50:29] I don't remember why we have two files but I know one of them looks up data from the Hiera: namespace on wikitech [19:51:09] from there it was pretty obvious it didn't like the contents of Hiera:Ci-staging, and indeed the contents looked very strange [19:52:33] yeah, I removed my puppetmaster since I figured puppet failure, but didn't think about the hiera lookups causing issues. [19:53:13] is the Hiera: namespace on wikitech due to be removed at somepoint? [19:53:48] horizon interface has been, at least for this project, nicer to use :) [19:56:01] not sure [19:56:12] we do have too many places to put this data [19:56:28] you can find it in the puppet repository, in wikitech and now also via horizon [19:57:41] puppet repository on a per-instance/project basis should probably be removed due to ops code review practices [19:58:29] we are trying to move stuff away from wikitech but right now those pages are versioned, the system connected to horizon is not [20:00:23] this is true, although that's not a feature I use too often. [20:00:45] do you usually work on something alone? 
[20:01:10] for deployment-prep I'd like to be able to see who did what, why and when [20:01:31] eh, even for beta when I'm investigating breakage much of it is due to puppetmaster moving rather than hieradata moving [20:01:56] but revision control is nice and I've probably use more than once [20:02:26] https://phabricator.wikimedia.org/T152142 Krenair thcipriani|afk [20:02:45] Is the typeerror issue [20:05:37] huh, yeah, in my instance it was: ---\n- '{}' that was causing the error -- which is also a little weird (although, obviously user-error on my part) [20:06:29] thcipriani|afk, speaking of deployment-prep, do you happen to know why I found git-sync-upstream and various other bits and pieces running on deployment-mx? has a puppetmaster previously been installed there or something? [20:06:50] krenair@deployment-mx:/var/lib/git/operations/puppet$ git log -n 1 [20:06:50] commit 9d61e7781195be69dbdf42e302db1135f068d2d6 [20:06:50] Author: dzahn [20:06:50] Date: Thu Jun 25 13:36:37 2015 -0700 [20:06:55] svn: delete svn.wm.org SSL cert [20:07:19] hrm, not that I can remember. That is bizarre. [20:07:22] and labs/private from December 2014 [20:08:26] yeah, deployment-mx (at least some iteration of it -- no idea if it's been rebuilt at any point) has been around for as long as I can remember. [20:09:38] 2 years, 1 month [20:10:05] beaten only by redis0[12], pdf01 and stream [20:10:46] so it runs trusty of course [20:12:59] beta is such a pit of sorrow :( Everytime I pull on a thread the amount of work that needs done comes crashing down. [20:14:16] such an important resource, probably needs full-time attention. 
[20:14:57] 06Labs, 10Labs-Kubernetes, 10Tool-Labs: Aggregate system logs from kubernetes nodes - https://phabricator.wikimedia.org/T141270#2845560 (10scfc) [20:15:00] 06Labs, 10Labs-Kubernetes, 10Tool-Labs: Puppet fails with 'no user syslog' when setting up syslog receiver role on Debian Jessie - https://phabricator.wikimedia.org/T141498#2845556 (10scfc) 05Open>03Resolved a:03fgiunchedi This was fixed by b0fe1c6c283cb2866ce5291289a87210d9b2f577/a1dad77d68b7097700569... [20:15:09] thcipriani|afk, full-time attention by someone with puppet merge rights [20:15:24] I doubt ops would allow it [20:15:28] yes please! [20:17:56] it has been difficult to get trivial things merged. Mostly lack of context, is my guess. [20:18:25] lack of context? [20:18:50] yeah, lack of context for what the patch is and what it affect for folks with merge rights to ops/puppet [20:19:05] *affects [20:19:14] we could explain that easily to anyone who asks [20:19:38] that's never been a blocking issue [20:20:17] 06Labs, 10Tool-Labs: Install debootstrap and fakechroot on tools - https://phabricator.wikimedia.org/T138138#2845566 (10scfc) a:03scfc [20:20:59] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Quietmouse was modified, changed by BryanDavis link https://wikitech.wikimedia.org/w/index.php?diff=1075763 edit summary: [20:24:56] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Database-Queries, 10DBA: Run a Tool Labs query without Timing out - https://phabricator.wikimedia.org/T138111#2845575 (10scfc) 05Open>03Invalid AFAIU T135644, the purpose of the queries have been achieved by some other means. [20:26:51] 06Labs, 10Tool-Labs: Disable xdebug on Tool Labs - https://phabricator.wikimedia.org/T137146#2358771 (10scfc) @tom29739, are you still planning to run your tests? 
[20:31:15] 06Labs, 10Tool-Labs: install php5-readline on bastion and exec hosts - https://phabricator.wikimedia.org/T136519#2845594 (10scfc) a:03scfc [20:35:07] 06Labs, 10Tool-Labs: signpostlab and telegrambot webservices flapping (registering/deregistering) - https://phabricator.wikimedia.org/T133092#2845609 (10scfc) 05Open>03Resolved a:03valhallasw Resolving this task as the webservices are now shut down and thus not flapping. @ResMar, @Amire80, if you need h... [20:45:10] 06Labs, 10Tool-Labs: /etc/cron.daily/logrotate: gzip: stdin: file size changed while zipping - https://phabricator.wikimedia.org/T96007#2845613 (10valhallasw) Sounds good to me. [20:46:17] 06Labs, 10Tool-Labs: toolsbeta grid misconfigured - https://phabricator.wikimedia.org/T136433#2845614 (10scfc) 05Open>03Invalid Currently there is no grid in Toolsbeta :-). [20:54:52] 06Labs, 10Tool-Labs: "webservice" command (with jsub) doesn't work in crontab - https://phabricator.wikimedia.org/T135348#2295987 (10scfc) If I understand the `crontab` of `tools.erwin85` correctly, this was intended to unconditionally restart the webservice hourly. IMHO that concept should not be supported.... [20:55:00] 06Labs, 10Tool-Labs: "webservice" command (with jsub) doesn't work in crontab - https://phabricator.wikimedia.org/T135348#2845630 (10scfc) 05Open>03declined [20:57:12] 06Labs, 10Quarry, 10Tool-Labs: Clarify Tool Labs' rules to see if Quarry and PAWS are allowed to be hosted there - https://phabricator.wikimedia.org/T152212#2841828 (10valhallasw) * Exceptions to this rule can be made on a case-by-case basis. Please contact us with your idea, so that we can discuss possible... [20:59:47] 06Labs, 10Tool-Labs: Disable xdebug on Tool Labs - https://phabricator.wikimedia.org/T137146#2845636 (10tom29739) @scfc: I ran the tests and reported back in T137146#2388908. I found that while xdebug slowed down the requests by a small amount, it was NFS that slowed down the requests the most. 
[21:18:15] 06Labs, 10Tool-Labs, 07Software-Licensing, 07Upstream: Remove or prune cdnjs on tools-static - https://phabricator.wikimedia.org/T128841#2845667 (10scfc) 05Open>03declined Loading external resources is a topic discussed in T129936. Regardless of the conclusions there, the pure existence of `cdnjs` in... [21:24:52] 06Labs, 10Tool-Labs, 10DBA: enwiki_p replica on s1 is corrupted - https://phabricator.wikimedia.org/T134203#2845673 (10jcrespo) [21:24:55] 06Labs, 10Labs-Infrastructure, 10DBA, 13Patch-For-Review: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2845672 (10jcrespo) [21:29:12] 06Labs, 10Tool-Labs: dplbot webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#1718509 (10scfc) ATM, http://tools.wmflabs.org/dplbot/ returns 503 ("No webservice"). There is a pod running: ``` tools.dplbot@tools-bastion-03:~$ kubectl get pod NAME READY... [21:33:35] 06Labs, 10Tool-Labs: dplbot webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#2845677 (10scfc) I have now restarted the webservice with `webservice restart`. Now Redis on `tools-proxy-01` points to the new Kubernetes pod (http://192.168.0.26:8000), while `tools-proxy-02` co... [21:39:00] 06Labs, 10Tool-Labs: dplbot webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#2845679 (10scfc) `/etc/active-proxy` is `tools-proxy-01` on `tools-bastion-03`. 
[21:52:48] Labs, Tool-Labs: Redis replication from tools-proxy-01 to tools-proxy-02 broken - https://phabricator.wikimedia.org/T152356#2845691 (scfc)
[21:54:34] Labs, Tool-Labs: dplbot webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#1718509 (scfc) Open>Resolved a:scfc I've restarted the webservice yet again, and now the entry on `tools-proxy-01` points to http://192.168.0.50:8000, with the entry on `tools-proxy-0...
[22:00:18] Labs, Tool-Labs: Redis replication from tools-proxy-01 to tools-proxy-02 broken - https://phabricator.wikimedia.org/T152356#2845726 (scfc) `/var/log/redis/tcp_6379.log` is full of: ``` [623] 04 Dec 21:58:52.418 * Connecting to MASTER tools-proxy-01:6379 [623] 04 Dec 21:58:52.418 * MASTER <-> SLAVE sync...
[22:12:51] Labs, Tool-Labs: Redis replication from tools-proxy-01 to tools-proxy-02 broken - https://phabricator.wikimedia.org/T152356#2845735 (scfc) I don't understand why this did not pop up earlier. AFAICT, at least since 81325bfbad85fa9b8d775a15968f9c6f991c6c0c (2015-09-29) connections between `tools-proxy-01`...
[22:30:35] Labs, Labs-Infrastructure, DBA: LabsDB replica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767#2845769 (scfc)
[22:30:36] Labs, Labs-Infrastructure, DBA, Epic, Tracking: Labs databases rearchitecture (tracking) - https://phabricator.wikimedia.org/T140788#2845770 (scfc)
[22:30:39] Tool-Labs-tools-Other: Migrate http://toolserver.org/~dispenser/* to Tool Labs - https://phabricator.wikimedia.org/T68868#2845771 (scfc)
[22:30:44] Labs, Tool-Labs, Upstream: Unable to explain queries on replicated databases - https://phabricator.wikimedia.org/T50875#2845764 (scfc) Open>Resolved a:coren Closing this task again (as @chasemp suggested in T141095#2511565). The scope of this task was being unable to explain queries on r...
[22:45:04] Labs, Tool-Labs: Disable xdebug on Tool Labs - https://phabricator.wikimedia.org/T137146#2845789 (Legoktm)
[22:45:07] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2845790 (Legoktm)
[22:45:09] Labs: Create new labs project tools-xdebug-testing - https://phabricator.wikimedia.org/T138097#2845788 (Legoktm) Resolved>declined
[22:50:37] Tool-Labs-tools-Erwin's-tools: For related changes make category changes switchable. - https://phabricator.wikimedia.org/T146594#2845793 (Akoopal) Open>Resolved a:Akoopal This has now been implemented, there is now an option to 'hide category edits' same way as others.
[23:00:11] Labs, Tool-Labs: Redis replication from tools-proxy-01 to tools-proxy-02 broken - https://phabricator.wikimedia.org/T152356#2845802 (scfc) I stand corrected: The `ferm` rule in `modules/toollabs/manifests/proxy.pp` from 113358714b8caedc3b21f9b9395fbaa7e2b2d1fa should still be active.
[23:12:53] Labs, DBA: Querying the logging table on labs is slow - https://phabricator.wikimedia.org/T131266#2845804 (MZMcBride) I tried the trick mentioned by @scfc in T50875#2845764. ``` MariaDB [enwiki_p]> select log_timestamp, log_comment from logging_userindex where log_namespace = 14 and log_title = 'Handhel...
[23:14:16] Labs, Tool-Labs: Redis replication from tools-proxy-01 to tools-proxy-02 broken - https://phabricator.wikimedia.org/T152356#2845808 (scfc) I believe the blocking comes from the security group `webproxy` that does not allow access to port 6379.
[23:52:07] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[23:58:22] Labs, Tool-Labs: Redis replication from tools-proxy-01 to tools-proxy-02 broken - https://phabricator.wikimedia.org/T152356#2845878 (scfc) The truth seems to be much simpler: `/etc/redis/redis-common.conf` has `bind 127.0.0.1`.
When the `redis` module was initially introduced in 9356e710ee01fd39c102237...