[00:18:40] MediaWiki-extensions-OpenStackManager, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team: Update OpenStackManager to use AuthManager - https://phabricator.wikimedia.org/T110288#2287842 (Tgr) I don't see much point in doing it if the extension is about to be abandoned anyway.
[01:04:36] Labs, Labs-Infrastructure, Quarry: Long-running query produces strangely incorrect results - https://phabricator.wikimedia.org/T135087#2287871 (Neil_P._Quinn_WMF)
[01:17:31] Labs, Diamond: labs: diamond: no NFS mount points were found - https://phabricator.wikimedia.org/T135078#2287619 (Krenair) ```krenair@udpmx-01:/usr/share/diamond/collectors$ grep NFS * -r mountstats/mountstats.py:The function of MountStatsCollector is to parse the detailed per-mount NFS Binary file mount...
[01:21:33] Labs, Labs-Infrastructure, Quarry: Long-running Quarry query (querry?) produces strangely incorrect results - https://phabricator.wikimedia.org/T135087#2287926 (Neil_P._Quinn_WMF)
[03:28:33] Labs, Tool-Labs, Community-Tech-Tool-Labs, Diffusion, User-bd808: Create application to manage Diffusion repositories for a Tool Labs project - https://phabricator.wikimedia.org/T133252#2288158 (bd808) >>! In T133252#2266309, @bd808 wrote: > It looks like I just need to get down to the fun wo...
[03:42:42] hi. tools.taxonbot is getting a lot of replication lag on dewiki. What's going on?
[03:45:20] YuviPanda: can you handle the replication lag on dewiki? tools.taxonbot is getting a lot of replication lag.
[03:46:23] v
[03:51:40] andrewbogott: ^
[03:59:56] Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2288166 (doctaxon)
[04:00:15] Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2288179 (doctaxon) p:Triage>Unbreak!
[04:09:48] Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2288181 (doctaxon) dewiki reports high database utilization showing recent changes or user contributions
[04:19:41] Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2288182 (doctaxon) hsbwiki is okay, no lags
[04:27:20] Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2288186 (MZMcBride) ``` mzmcbride@tools-bastion-03:~$ mysql -hdewiki.labsdb dewiki_p -e "select max(rc_timestamp) from recentchanges;" +-------------------+ | max(rc_timestamp...
[04:42:34] Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2288213 (doctaxon) @MZMcBride Try to read or write content by API at tools-bastion-03 on dewiki, and you'll see long lasting replication lag. And look at dewiki user contribut...
[04:51:42] Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2288217 (MZMcBride) Looking at currently, I see: ``` { "batchcomplete": "",...
[04:54:01] Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2288218 (doctaxon) @MZMcBride dewiki user contributions error line gives: Due to high database server lag, changes newer than 98 seconds may not be shown in this list.
[05:13:13] Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2288223 (doctaxon) Looks like it's all okay again. It lasted about 2 hours. What was going on?
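The check MZMcBride ran above (selecting `max(rc_timestamp)` from `recentchanges`) can be turned into a rough lag figure in seconds. The following is only a minimal Python sketch, assuming the query returns MediaWiki's 14-digit UTC timestamp format (YYYYMMDDHHMMSS); the function name `replag_seconds` and the sample values are hypothetical and do not come from the log.

```python
from datetime import datetime, timezone


def replag_seconds(max_rc_timestamp: str, now: datetime) -> float:
    """Return how many seconds the newest replicated change trails `now`."""
    # MediaWiki stores rc_timestamp as a 14-digit UTC string (YYYYMMDDHHMMSS).
    newest = datetime.strptime(max_rc_timestamp, "%Y%m%d%H%M%S")
    newest = newest.replace(tzinfo=timezone.utc)
    return (now - newest).total_seconds()


# Hypothetical sample values, not taken from the log.
now = datetime(2016, 5, 12, 4, 27, 20, tzinfo=timezone.utc)
print(replag_seconds("20160512042000", now))
```

This only approximates replication lag: on a quiet wiki with few recent changes, the newest row can be old even when the replica is fully caught up.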
[07:18:31] @replag
[07:38:23] Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2288166 (Xqt) I don't see a delay on rc or other special pages and dbrepllag seems 0 at the moment. ``` { "batchcomplete": "", "query": { "dbrepllag": [...
[08:01:54] hello
[08:03:55] hello
[08:27:25] Labs, Labs-Infrastructure, Horizon: Horizon web interface lost CSS/JS assets - https://phabricator.wikimedia.org/T135041#2288392 (hashar) Open>Resolved a:hashar Seems fine. Might be related to the Varnish misc cache issue that was going on yesterday. Ex: T134989 T135038
[08:34:52] PROBLEM - Host tools-bastion-01 is DOWN: CRITICAL - Host Unreachable (10.68.17.228)
[10:18:35] Labs, Tool-Labs, Community-Tech-Tool-Labs, Developer-Relations, Documentation: Run a documentation sprint for Labs - https://phabricator.wikimedia.org/T101659#2288539 (Qgil)
[10:19:11] Labs, Tool-Labs, Community-Tech-Tool-Labs, Developer-Relations, Documentation: Run a documentation sprint for Labs - https://phabricator.wikimedia.org/T101659#1344508 (Qgil) >>! In T101659#2285264, @chasemp wrote: > I would be interested. You got it! :)
[10:46:24] Labs, Tool-Labs, Community-Tech-Tool-Labs, Patch-For-Review, User-bd808: tools.merlbot stopped working - https://phabricator.wikimedia.org/T135006#2288603 (Merl) Hi, sorry i am inactive atm and will not be able to access my tools in the next weeks. My mailbox is full of failed cronjobs repor...
[11:12:39] RECOVERY - Puppet run on tools-exec-cyberbot is OK: OK: Less than 1.00% above the threshold [0.0]
[11:40:46] (CR) Lokal Profil: [C: -1] "Minor nitpick with the test but otherwise good to go" (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/287792 (https://phabricator.wikimedia.org/T134727) (owner: Jean-Frédéric)
[11:57:26] (PS3) Jean-Frédéric: Strip wikitext comments out of parsed values in templates [labs/tools/heritage] - https://gerrit.wikimedia.org/r/287792 (https://phabricator.wikimedia.org/T134727)
[11:59:45] (CR) Jean-Frédéric: "Comments addressed." [labs/tools/heritage] - https://gerrit.wikimedia.org/r/287792 (https://phabricator.wikimedia.org/T134727) (owner: Jean-Frédéric)
[12:33:10] (CR) Lokal Profil: [C: 2] "thanks!" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/287792 (https://phabricator.wikimedia.org/T134727) (owner: Jean-Frédéric)
[12:34:04] (Merged) jenkins-bot: Strip wikitext comments out of parsed values in templates [labs/tools/heritage] - https://gerrit.wikimedia.org/r/287792 (https://phabricator.wikimedia.org/T134727) (owner: Jean-Frédéric)
[13:04:30] !log tools.heritage Deployed latest from Git: ebcd48c (T134727)
[13:04:31] T134727: Strip comments from fields during harvest - https://phabricator.wikimedia.org/T134727
[13:04:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL, Master
[13:09:28] bd808: Maybe you can take a look at tools.merlbot again? Seems like the biggest part of the jobs didn't start today
[13:14:26] ah, he already wrote that at the task
[13:51:16] Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2289126 (doctaxon) yes, since 05:10 UTC all is okay again. But I want to know, what was going on there.
[14:57:07] andrewbogott: FYI: qemu updates for stock trusty are now available, but they haven't merged into the trusty-kilo cloud archive
[14:57:24] ok
[15:57:15] Labs, Tool-Labs, Community-Tech-Tool-Labs, Patch-For-Review, User-bd808: tools.merlbot stopped working - https://phabricator.wikimedia.org/T135006#2289502 (bd808) >>! In T135006#2288603, @Merl wrote: > Hi, sorry i am inactive atm and will not be able to access my tools in the next weeks. > >...
[19:59:44] !log phabricator Added BryanDavis (self) as admin for testing diffusion.repository.edit API
[19:59:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL, Master
[20:42:38] !log rcm.cac Updating repos to make the environment ready for some tests
[20:42:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm.cac/SAL, Master
[20:49:11] MediaWiki-extensions-OpenStackManager, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team: Update OpenStackManager to use AuthManager - https://phabricator.wikimedia.org/T110461#2290671 (Anomie)
[21:35:50] Labs, Tool-Labs, Community-Tech-Tool-Labs, Diffusion, User-bd808: Create application to manage Diffusion repositories for a Tool Labs project - https://phabricator.wikimedia.org/T133252#2290871 (bd808) > Repo callsign is "TOOL" (e.g. "TOOL52937" for tools.versions This...
[21:36:38] Labs, Tool-Labs, Community-Tech-Tool-Labs, Diffusion, User-bd808: Create application to manage Diffusion repositories for a Tool Labs project - https://phabricator.wikimedia.org/T133252#2290872 (yuvipanda) Let's ignore callsigns if we can!
[21:36:47] Labs, Tool-Labs, Community-Tech-Tool-Labs, Diffusion, User-bd808: Create application to manage Diffusion repositories for a Tool Labs project - https://phabricator.wikimedia.org/T133252#2290873 (bd808)
[21:50:24] Labs, Tool-Labs, Community-Tech-Tool-Labs, Diffusion, User-bd808: Create application to manage Diffusion repositories for a Tool Labs project - https://phabricator.wikimedia.org/T133252#2290918 (mmodell) If you leave callsign undefined it just names the repositories with a sequential number b...
[22:37:07] (PS1) Andrew Bogott: Add private bits of labsldapconfig [labs/private] - https://gerrit.wikimedia.org/r/288529
[22:39:15] (CR) Andrew Bogott: [C: 2 V: 2] Add private bits of labsldapconfig [labs/private] - https://gerrit.wikimedia.org/r/288529 (owner: Andrew Bogott)
[22:45:31] Labs, Labs-Infrastructure, Quarry: Long-running Quarry query (querry?) produces strangely incorrect results - https://phabricator.wikimedia.org/T135087#2287871 (Krenair) I looked through the history of that query in the database, and have no good explanation for this.
[22:46:51] !log tools deploy k8s master for 1.2.4wmf1
[22:46:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[22:57:21] PROBLEM - Host tools-worker-1004 is DOWN: CRITICAL - Host Unreachable (10.68.16.126)
[22:58:49] !log deploy k8s on all worker nodes
[22:58:50] deploy is not a valid project.
[22:58:56] !log tools deploy k8s on all worker nodes
[22:59:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[22:59:05] !log tools deploy k8s 1.2.4wmf1 on all proxy nodes
[22:59:12] !log restart tools-worker-1004 to attempt bringing it back up
[22:59:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[22:59:13] restart is not a valid project.
[22:59:32] !log tools restart tools-worker-1004 to attempt bringing it back up
[22:59:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[23:00:51] Quarry, Easy: Display time taken to execute a query - https://phabricator.wikimedia.org/T135189#2291243 (Matthewrbowker)
[23:01:43] RECOVERY - Host tools-worker-1004 is UP: PING OK - Packet loss = 0%, RTA = 0.86 ms
[23:06:02] (PS1) Andrew Bogott: Sync up passwords with hiera and the password module [labs/private] - https://gerrit.wikimedia.org/r/288539
[23:06:25] (CR) Andrew Bogott: [C: 2] Sync up passwords with hiera and the password module [labs/private] - https://gerrit.wikimedia.org/r/288539 (owner: Andrew Bogott)
[23:06:28] PROBLEM - Puppet staleness on tools-worker-1004 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [43200.0]
[23:06:32] (CR) Andrew Bogott: [V: 2] Sync up passwords with hiera and the password module [labs/private] - https://gerrit.wikimedia.org/r/288539 (owner: Andrew Bogott)
[23:06:50] PROBLEM - Free space - all mounts on tools-worker-1004 is CRITICAL: CRITICAL: tools.tools-worker-1004.diskspace.root.byte_percentfree (<100.00%)
[23:08:07] Labs: tools-worker-1004 is dead - https://phabricator.wikimedia.org/T134717#2291302 (yuvipanda) Open>Resolved a:yuvipanda Another reboot fixed it, and I'm going to let that be since we don't seem to have the time or energy to debug this :(
[23:11:35] RECOVERY - Puppet staleness on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [3600.0]
[23:12:09] bd808: Any way to specify MediaWiki version with the current Vagrant image? I don't see any config options or methods to modify the ruby files to download 1.26 rather than the latest.
[23:12:35] there is a way... sort of
[23:13:38] * bd808 checks to see if it is a hiera setting or not
[23:15:43] CZauX: I haven't tested this in a long time, but ... you should be able to set a "mediawiki::branch: REL1_25" or similar setting via hieradata/local.yaml
[23:16:05] that has to be done before the first time the mediawiki dir is cloned
[23:16:33] the other way to switch branches is just to manually change the git clones
[23:17:16] The fundraising role actually makes a separate clone of mediawiki/core and sets the branch in hiera
[23:18:05] for 1.26 it would of course be REL1_26 instead of REL1_25
[23:19:03] `vagrant hiera mediawiki::branch REL1_26` would be the way to make that setting from the cli
[23:19:29] if you do that before the first `vagrant up` on a new instance it *should* work
[23:19:46] if it doesn't I'd be willing to help try and figure out why
[23:20:06] I don't see any hiera folder or file with 'local.yaml' on it in the cloned vagrant project after my 1.27 deployment. Is that located somewhere else?
[23:20:24] it doesn't exist by default
[23:20:42] it would be srv/mediawiki-vagrant/puppet/hieradata/local.yaml
[23:20:56] ah-ha, I'll try that out, ty
[23:21:16] if you use the vagrant command to do it it will end up in puppet/hieradata/vagrant-managed.yaml
[23:21:55] RECOVERY - Puppet run on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:33:03] bd808: why is logging into Tool Labs suddenly requiring a password?
[23:37:16] It's requiring a password for me too.
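For reference, the hiera override bd808 describes in the MediaWiki-Vagrant discussion above could look like the fragment below. This is an untested sketch: the key `mediawiki::branch`, the value `REL1_26`, and the file location are taken from the conversation, and per bd808 the setting is only expected to work if it is in place before the first `vagrant up` clones MediaWiki.

```yaml
# puppet/hieradata/local.yaml
# Not present by default; create it by hand inside the mediawiki-vagrant checkout.
mediawiki::branch: REL1_26
```

Running `vagrant hiera mediawiki::branch REL1_26` instead should record the same setting in puppet/hieradata/vagrant-managed.yaml.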
[23:48:14] PROBLEM - Puppet run on tools-worker-1012 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[23:48:54] PROBLEM - Puppet run on tools-grid-master is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[23:49:44] PROBLEM - Puppet run on tools-webgrid-lighttpd-1415 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[23:49:44] PROBLEM - Puppet run on tools-exec-1209 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[23:50:44] PROBLEM - Puppet run on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[23:50:58] That might be why...
[23:54:02] PROBLEM - Puppet run on tools-webgrid-lighttpd-1401 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[23:55:43] so I'm around and I see it
[23:55:48] not sure why yet please standby
[23:56:45] * andrewbogott here, looking
[23:57:01] PROBLEM - Puppet run on tools-exec-1219 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[23:57:01] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[23:57:28] andrewbogott: https://gerrit.wikimedia.org/r/#/c/288536/2/modules/ldap/manifests/role/config.pp or https://gerrit.wikimedia.org/r/#/c/288530/ ?
[23:57:53] yes, probably that second one
[23:58:00] although I tested it to death with the puppet compiler
[23:58:51] and then the exim failures are puppet, not sure if related?
[23:58:58] not related
[23:59:00] but they seem to either be flaky or something
[23:59:01] at least not directly
[23:59:20] the exim thing is a thing bd808 knows about
[23:59:37] what's weird is even the tools-bastion-03 prompts for password w/ root...
[23:59:39] in short, the config that puppet installs is incompatible with the version that's actually installed on labs boxes
[23:59:41] idk wtf