[00:06:12] (CR) BryanDavis: [C: 2] "Wake up and do your job zuul." [labs/toollabs] - https://gerrit.wikimedia.org/r/287690 (owner: BryanDavis)
[00:07:36] (PS2) BryanDavis: www: make some performance optimizations [labs/toollabs] - https://gerrit.wikimedia.org/r/287691
[00:07:38] (PS1) BryanDavis: www: Don't list all tools on the default landing page [labs/toollabs] - https://gerrit.wikimedia.org/r/287869
[00:09:14] (CR) BryanDavis: www: make some performance optimizations (1 comment) [labs/toollabs] - https://gerrit.wikimedia.org/r/287691 (owner: BryanDavis)
[00:12:24] (CR) BryanDavis: "After the parent patches for this are merged I'd like to test via a cherry-pick before merging. Reverting the pick will be easier if I've " [labs/toollabs] - https://gerrit.wikimedia.org/r/287869 (owner: BryanDavis)
[00:17:29] Labs, Tool-Labs, Community-Tech-Tool-Labs, Patch-For-Review, User-bd808: Rewrite jsub in python - https://phabricator.wikimedia.org/T132475#2279131 (scfc) For the tool `dbreps`, `crontab`s fail with: ``` From: root@tools.wmflabs.org (Cron Daemon) Subject: Cron js...
[00:23:34] Labs, Tool-Labs, Operations: toolserver.org certificate to expire 2016-06-30 - https://phabricator.wikimedia.org/T134798#2279145 (Peachey88) There is already a ticket in #procurement for this. I think @Dzahn mentioned on IRC at the time about making this Lets Encrypt.
[00:27:11] Labs, Tool-Labs, Community-Tech-Tool-Labs, Patch-For-Review, User-bd808: Rewrite jsub in python - https://phabricator.wikimedia.org/T132475#2279157 (bd808) >>! In T132475#2279131, @scfc wrote: > For the tool `dbreps`, `crontab`s fail with: > > ``` > From: root@tools.wmflabs.org (Cron Daemon)...
[01:02:58] (PS1) BryanDavis: jsub: Disable abbreviated argument handling [labs/toollabs] - https://gerrit.wikimedia.org/r/287875 (https://phabricator.wikimedia.org/T132475)
[01:03:44] (PS2) BryanDavis: jsub: Disable abbreviated argument handling [labs/toollabs] - https://gerrit.wikimedia.org/r/287875 (https://phabricator.wikimedia.org/T132475)
[01:09:48] Labs, Tool-Labs, Operations: toolserver.org certificate to expire 2016-06-30 - https://phabricator.wikimedia.org/T134798#2279222 (Dzahn) T134363
[01:13:42] (CR) BryanDavis: [C: 2] jsub: Disable abbreviated argument handling [labs/toollabs] - https://gerrit.wikimedia.org/r/287875 (https://phabricator.wikimedia.org/T132475) (owner: BryanDavis)
[01:30:07] has wmflabs hostkey change recently?
[01:30:32] bd808, you always know everything that's going on :)
[01:30:39] yurik: A few weeks ago yes.
[01:30:47] thx Matthew_ !
[01:30:55] Hold on.
[01:31:13] SHA256:aEIm/9XDSpfyXhIzNNcwjV5ijm+qnmSHRSCQr5euVa4?
[01:31:22] yurik: https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/tools-login.wmflabs.org
[01:31:36] oh, wow, we have them online :)
[01:31:37] thansk!
[01:31:42] Yep :)
[01:32:08] Matthew_, hmm, that's not what i'm seeing...
[01:32:22] what host are you sshing to yurik ?
[01:32:34] vem.maps-team.eqiad.wmflabs
[01:32:46] that's your own VM
[01:33:04] right, but doesn't the bastion changes that?
[01:33:04] Oh yeah I don't think that's listed.
[01:33:28] i rebuilt it a while ago, and i have connected to it a few times afterwards
[01:33:29] bastion should be this I think -- https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/primary.bastion.wmflabs.org
[01:33:54] bd808, does changing at bastion affect my key as well?
[01:34:19] you key? no. but the host key that you see from the machine you connect to
[01:34:34] so you need to figure out which host you are actually being warned about
[01:34:47] and then check for it on https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints
[01:35:08] bd808, the warning was about the vem host (my vm), which is strange because i have used it since rebuiding
[01:35:27] i just tried to connect to prime.bastion, the key seems ok
[01:35:29] so should eb fine
[01:35:33] thx for checknig!
[01:35:36] yw
[01:35:59] maybe you left off the "maps-team" part when you used it last too.
[01:36:10] hmm, would it still work/
[01:36:11] ?
[01:36:16] yesh
[01:36:24] s/h//
[01:36:40] our dns is still kind of funky
[01:37:06] yep, without it works (although it still asked me for a conf)... i guess its fine
[01:46:34] Labs, Tool-Labs, User-bd808: jsub problem with -cwd flag - https://phabricator.wikimedia.org/T134836#2279340 (bd808)
[01:50:43] (PS1) BryanDavis: jsub: fix handling of 0-arg qsub arguments [labs/toollabs] - https://gerrit.wikimedia.org/r/287883 (https://phabricator.wikimedia.org/T134836)
[02:31:18] Labs, Tool-Labs, Patch-For-Review, User-bd808: jsub problem with -cwd flag - https://phabricator.wikimedia.org/T134836#2279387 (bd808) p:Triage>High
[02:40:45] (PS3) Yuvipanda: jsub: Disable abbreviated argument handling [labs/toollabs] - https://gerrit.wikimedia.org/r/287875 (https://phabricator.wikimedia.org/T132475) (owner: BryanDavis)
[02:40:54] (CR) Yuvipanda: [C: 2 V: 2] jsub: Disable abbreviated argument handling [labs/toollabs] - https://gerrit.wikimedia.org/r/287875 (https://phabricator.wikimedia.org/T132475) (owner: BryanDavis)
[02:41:10] (PS2) Yuvipanda: jsub: fix handling of 0-arg qsub arguments [labs/toollabs] - https://gerrit.wikimedia.org/r/287883 (https://phabricator.wikimedia.org/T134836) (owner: BryanDavis)
[02:44:55] Labs, Tool-Labs, Patch-For-Review, User-bd808: jsub problem with -cwd flag - https://phabricator.wikimedia.org/T134836#2279394 (bd808) Caused by {T132475}
[02:45:23] (CR) Yuvipanda: [C: 2] jsub: fix handling of 0-arg qsub arguments [labs/toollabs] - https://gerrit.wikimedia.org/r/287883 (https://phabricator.wikimedia.org/T134836) (owner: BryanDavis)
[02:49:34] (PS1) Yuvipanda: Deb version bump [labs/toollabs] - https://gerrit.wikimedia.org/r/287885
[02:50:22] (CR) Yuvipanda: [C: 2 V: 2] Deb version bump [labs/toollabs] - https://gerrit.wikimedia.org/r/287885 (owner: Yuvipanda)
[03:18:24] Labs, Tool-Labs, Patch-For-Review, User-bd808: jsub problem with -cwd flag - https://phabricator.wikimedia.org/T134836#2279419 (bd808) Open>Resolved
[03:20:12] Labs, Tool-Labs, Community-Tech-Tool-Labs, Patch-For-Review, User-bd808: Rewrite jsub in python - https://phabricator.wikimedia.org/T132475#2279420 (bd808) Open>Resolved There may be new problems found but we should track them in new tickets.
[03:49:29] PROBLEM - Host tools-services-01 is DOWN: PING CRITICAL - Packet loss = 100%
[03:54:17] RECOVERY - Host tools-services-01 is UP: PING OK - Packet loss = 0%, RTA = 1.08 ms
[03:54:19] ^ is us
[04:25:12] !log tools Added role::package::builder to tools-services-01
[04:25:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[06:44:13] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:19:21] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:29:23] Labs, Tool-Labs: Simplify and reduce the amount of options jsub supports (Tracking) - https://phabricator.wikimedia.org/T134846#2279594 (yuvipanda)
[07:44:52] (CR) Lokal Profil: [C: -1] "See inline comment" (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/287792 (https://phabricator.wikimedia.org/T134727) (owner: Jean-Frédéric)
[07:56:51] Labs, Labs-Kubernetes, Tool-Labs: Setup NSS inside containers used in Tool Labs - https://phabricator.wikimedia.org/T134748#2279642 (yuvipanda)
[08:34:51] PROBLEM - Host tools-bastion-01 is DOWN: CRITICAL - Host Unreachable (10.68.17.228)
[10:04:23] Labs, Tool-Labs, Tracking: Simplify and reduce the amount of options jsub supports (Tracking) - https://phabricator.wikimedia.org/T134846#2279893 (Danny_B)
[10:23:41] when i want to login to my bot account with pywikibot at lab, it returns Login failed (WrongPass), whats the matter?
[10:24:45] Stang: curious, can you try it on tools-dev.wmflabs.org?
[10:25:21] you mean i should login again at tools-dev.wmflabs.org?
[10:25:32] Labs, Tool-Labs: No API login with login token possible at tools-bastion-02 only - https://phabricator.wikimedia.org/T134262#2279969 (doctaxon) @Anomie , now the same error does occur at bastion-03, but not at bastion-02. So tell me please, what are your further opinions to handle this error?
[10:29:00] solved, thx
[10:29:05] Stang: how did you solve it?
[10:29:23] login again at tools-dev.wmflabs.org
[10:29:49] Formly i use login.tools.wmflabs.org
[10:29:49] hmm ok
[10:49:16] nope, Stang: this time WrongPass error at login-tools.wmflabs.org (bastion-03)
[10:50:52] Labs, Tool-Labs: No API login with login token possible at tools-bastion-02 only - https://phabricator.wikimedia.org/T134262#2280014 (valhallasw) >>! In T134262#2260242, @Anomie wrote: > ConfirmEdit returns a WrongPass error "to confuse the shit out of attackers" when the captcha fails > > (...) > > OTO...
[10:57:02] YuviPanda: within the last 8 minutes the login is possible again
[13:21:25] Labs, Labs-Kubernetes, Tool-Labs: Setup NSS inside containers used in Tool Labs - https://phabricator.wikimedia.org/T134748#2280611 (yuvipanda) It's actually NSS, not PAM. After some experimentation, `libnss-ldapd` which is the recommended setup, works *almost flawlessly* out of the box, except for...
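Checking a host key against the Help:SSH_Fingerprints pages means comparing `SHA256:...` strings like the one pasted at 01:31:13. In practice `ssh-keygen -l -f <file>.pub` prints that value; the sketch below only illustrates what it computes, assuming the OpenSSH convention of unpadded base64 over the SHA-256 digest of the decoded key blob. The function name is hypothetical and this is not Wikimedia tooling.

```python
import base64
import hashlib


def openssh_sha256_fingerprint(pubkey_line: str) -> str:
    """Return the "SHA256:..." fingerprint for one OpenSSH public key line.

    The line is expected to look like "<type> <base64-blob> [comment]",
    e.g. the contents of a host's ssh_host_*_key.pub file.
    """
    parts = pubkey_line.split()
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    blob = base64.b64decode(parts[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints standard base64 of the raw 32-byte digest with the
    # trailing "=" padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

Fed a line from a host's public key file, the result should be the same string the SSH client shows in its host-key warning, which is what tells you which host (bastion or your own instance) you are actually being warned about.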
[13:26:25] Labs, Labs-Kubernetes, Tool-Labs: Setup NSS inside containers used in Tool Labs - https://phabricator.wikimedia.org/T134748#2280615 (yuvipanda) If we go with (2) we've to somehow configure `libnss-ldapd` to not try to install nslcd itself but to just talk to the appropriate socket.
[13:26:42] Labs, Labs-Kubernetes, Tool-Labs: Setup NSS inside containers used in Tool Labs - https://phabricator.wikimedia.org/T134748#2280616 (MoritzMuehlenhoff) The data presented by nslcd is identical to all hosts, so exploring (2) seems best to me.
[13:33:58] (CR) Hashar: "recheck" [labs/toollabs] - https://gerrit.wikimedia.org/r/287885 (owner: Yuvipanda)
[13:34:58] Labs, Tool-Labs: No API login with login token possible at tools-bastion-02 only - https://phabricator.wikimedia.org/T134262#2280640 (Anomie) >>! In T134262#2261233, @doctaxon wrote: > @Anomie: is it possible to monitor the bastion instances to better figure out, what's going on? Maybe, but it wouldn't...
[14:59:16] Labs: pdns trying to resolve wikimedia.org.eqiad.wmflabs - https://phabricator.wikimedia.org/T128123#2281007 (jcrespo)
[14:59:19] Labs, Patch-For-Review: Periodic internal labs dns outages - https://phabricator.wikimedia.org/T124680#2281008 (jcrespo)
[14:59:21] Labs, DBA, Patch-For-Review: Move labs pdns database off of m5-master - https://phabricator.wikimedia.org/T128737#2281005 (jcrespo) Open>Resolved Pdns database has been dropped and all account with grants to it
[15:19:45] Labs, labs-sprint-116, DBA, Patch-For-Review: watchlist table not available on labs - https://phabricator.wikimedia.org/T59617#2281077 (jcrespo) Enwiki will take a bit more than I though- there was the failover, thenthere were multiple issues on db1069, but I think they are now fixed: T134349 and...
[15:22:27] (PS2) Jean-Frédéric: Strip wikitext comments out of parsed values in templates [labs/tools/heritage] - https://gerrit.wikimedia.org/r/287792 (https://phabricator.wikimedia.org/T134727)
[15:23:34] (CR) Jean-Frédéric: "Good point, adjusted, and added test cases for all these." [labs/tools/heritage] - https://gerrit.wikimedia.org/r/287792 (https://phabricator.wikimedia.org/T134727) (owner: Jean-Frédéric)
[15:27:43] Labs, Operations: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2281144 (chasemp)
[15:30:14] Labs, Operations: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2281144 (ArielGlenn) The proposed size of the filesystem for dumps looks fine to me.
[15:30:29] Labs, Operations: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2281144 (yuvipanda) +1 <3. Will we move the content over? There were no guarantees of such, but it might be a nice gesture. No need for it to be complete or consistent.
[15:31:04] Labs, Operations: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2281169 (yuvipanda) Should also remember to soft mount these.
[15:32:24] Labs, Operations: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2281176 (chasemp) I have really no opinion on moving the data over, other than it costs us in maint time obv. We don't have a real strategy on /scratch cleanup so it's all adhoc reasoning.
[15:33:34] PROBLEM - Puppet run on tools-exec-1205 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:35:01] Labs, Operations: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2281199 (yuvipanda) yeah. I'd like for us to just do a simple rsync if possible. If we decide to not do that, we should provide people notice as well. I know that the kiwix project for example is using it for...
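The heritage change at 15:22:27 (gerrit 287792) strips wikitext comments out of parsed template values; the log does not show how the patch does it. A minimal sketch of the idea, assuming values arrive as plain strings and that an unterminated `<!--` comments out the rest of the value, as MediaWiki renders it. `strip_wikitext_comments` is a hypothetical name, not the tool's function.

```python
import re

# Non-greedy, so two comments in one value are removed separately and the
# text between them survives; DOTALL lets a comment span several lines.
# The \Z branch treats an unterminated "<!--" as running to the end.
_COMMENT_RE = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)


def strip_wikitext_comments(value: str) -> str:
    """Remove <!-- ... --> comments from a template parameter value."""
    return _COMMENT_RE.sub("", value).strip()


print(strip_wikitext_comments("Nieuwe Kerk <!-- check spelling -->"))  # prints: Nieuwe Kerk
```

Only the ends are trimmed, so a comment in the middle of a value can leave a doubled space behind; whether to collapse that is a separate choice.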
[15:37:52] Labs, Operations: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2281202 (chasemp) The main thing would be I would want to snapshot the volume, then copy over and then swap out to the new volume as gracefully as possible. Which in some cases is not graceful at all. That me...
[15:39:20] Labs, Operations: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2281206 (yuvipanda) I think that's good enough if we pre-announce it early enough.
[15:39:39] Labs, Operations: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2281207 (chasemp) This will also involve a period of dumps being offline. I think this is mostly a small issue though, anecdotally I don't see too many consumers this morning. But I can't do the shuffle onli...
[15:45:12] Labs, labs-sprint-116, DBA, Patch-For-Review: Make watchlist table available on labs - https://phabricator.wikimedia.org/T59617#2281218 (jcrespo)
[15:47:14] !log paws added bd808 to project and made as admin
[15:47:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL, Master
[15:47:53] YuviPanda: did you get that wikibase wiki to work yet?
[15:48:52] bd808: nope...
[15:49:05] bd808: and unfortunately I've to get on a flight in about 10minutes
[15:49:24] k. I'll look after I get done digging around in the action api data for anomie
[15:51:18] YuviPanda: safe travels
[16:03:41] RECOVERY - Puppet run on tools-exec-1205 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:11:39] Labs: new bootstrap-vz jessie images don't log (and maybe don't start at all) - https://phabricator.wikimedia.org/T133551#2281350 (Andrew) Open>Resolved Upstream patch (a different one from above) was merged: https://github.com/andsens/bootstrap-vz/pull/321
[16:11:48] (PS3) BryanDavis: www: make some performance optimizations [labs/toollabs] - https://gerrit.wikimedia.org/r/287691
[16:12:04] (CR) BryanDavis: [C: 2] www: make some performance optimizations [labs/toollabs] - https://gerrit.wikimedia.org/r/287691 (owner: BryanDavis)
[16:22:04] (PS1) BryanDavis: www: Fix "undefined constant SCRIPT_NAME" [labs/toollabs] - https://gerrit.wikimedia.org/r/287989
[16:22:35] (CR) BryanDavis: [C: 2] www: Fix "undefined constant SCRIPT_NAME" [labs/toollabs] - https://gerrit.wikimedia.org/r/287989 (owner: BryanDavis)
[16:23:30] andrewbogott: Hi! I created the instance for May to host this - http://wikimedia-ui.wmflabs.org/. It just has a static git repo that serves this site. I'm not sure if the UI standardization team still needs this - given that the team has been moved around. I guess Volker would be a good person to ask since May is not longer at WMF
[16:23:45] I can ask him
[16:24:51] andrewbogott: aah just seeing the email
[16:25:08] madhuvishy: yeah, in theory I emailed volker yesterday, but if you can nag that'd be great :)
[16:25:35] oh - i am in NYC - cannot poke him in person but will also poke over email!
[16:26:19] andrewbogott: he just replied :)
[16:28:20] (CR) Merlijn van Deen: [C: 1] "<3" [labs/toollabs] - https://gerrit.wikimedia.org/r/287869 (owner: BryanDavis)
[16:31:27] (PS2) BryanDavis: www: Don't list all tools on the default landing page [labs/toollabs] - https://gerrit.wikimedia.org/r/287869
[16:32:41] (CR) jenkins-bot: [V: -1] www: Don't list all tools on the default landing page [labs/toollabs] - https://gerrit.wikimedia.org/r/287869 (owner: BryanDavis)
[16:33:50] "can't create index cache /var/cache/man/3855: No space left on device"
[16:34:02] those pbuilder jobs don't clean up properly :/
[16:44:49] (CR) BryanDavis: "recheck" [labs/toollabs] - https://gerrit.wikimedia.org/r/287869 (owner: BryanDavis)
[16:52:51] !log wlmjurytool killing gmond on wlmjurytool2014, is this instance used in 2016? puppet fail
[16:52:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wlmjurytool/SAL, Master
[16:54:29] valhallasw`cloud: what do you think? https://tools.wmflabs.org/
[16:55:14] I like it!
[16:55:21] it might break the website check :D
[16:55:28] yeah?
[16:55:30] (I think that checks for 'magnus' in the text)
[16:55:34] heh
[16:56:03] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 3562 bytes in 0.034 second response time
[16:56:09] Should I put a comment in the source?
[16:56:14] lol
[16:56:27] :D Yes.
[16:56:40] small note that actually pages people atm
[16:56:43] Find another string to verify.
[16:56:43] I'm silencing
[16:56:49] bd808: yes :)
[16:57:06] i can adjust the icinga check btw, if needed
[16:57:11] I reverted the cherry-pick
[16:57:24] I'll add a comment to make the check happy
[16:57:39] I'm coming in w/ https://gerrit.wikimedia.org/r/#/c/287723/ pretty soon here
[16:57:43] which will make it not page
[16:59:22] +1 on magnus comment tho :)
[17:01:03] (PS3) BryanDavis: www: Don't list all tools on the default landing page [labs/toollabs] - https://gerrit.wikimedia.org/r/287869
[17:01:08] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 825707 bytes in 5.024 second response time
[17:02:04] (CR) BryanDavis: www: Don't list all tools on the default landing page (1 comment) [labs/toollabs] - https://gerrit.wikimedia.org/r/287869 (owner: BryanDavis)
[17:02:36] I'm going to pick the change again to see if the comment hack works
[17:02:41] you're doing the lords work bd808, or FSM or whomever
[17:02:42] kk
[17:03:02] All hail Baal
[17:03:05] or something
[17:03:22] !log dashiki killed gmond on vitalsigns-01, puppet::self master (T115330)
[17:03:22] T115330: block labs IPs from sending data to prod ganglia - https://phabricator.wikimedia.org/T115330
[17:03:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Dashiki/SAL, Master
[17:03:30]
[17:05:07] actually it just seems to be: check_http_slow!20
[17:05:15] for the "tools-home" check
[17:05:41] mutante: there are two checks, oen in icinga and one in shinken I think?
[17:05:45] it's a case of much confusion
[17:05:53] there may also be a check in catchpoint
[17:05:58] I can't remember where the magnus callout is
[17:06:10] ok, in that case. we wont' get pages from icinga, based on the content and magnus or not
[17:06:20] but maybe we want to make it so
[17:06:28] that it is like the shinken check
[17:07:03] right now the icinga check is just "http works at all and give it 20 seconds to try"
[17:07:35] not hard to change to an actual content check though
[17:07:36] (CR) BryanDavis: "Cherry-picked to /data/project/admin/toollabs for testing." [labs/toollabs] - https://gerrit.wikimedia.org/r/287869 (owner: BryanDavis)
[17:10:26] i missed an undef index apparently :/
[17:11:24] !log language killed gmond on language-dev, self hosted puppetmaster, (T115330)
[17:11:25] T115330: block labs IPs from sending data to prod ganglia - https://phabricator.wikimedia.org/T115330
[17:11:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Language/SAL, Master
[17:12:17] !log language puppet fail on language-dev - cloning from gerrit to /srv/mediawiki/vendor fails, general brokenness
[17:12:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Language/SAL, Master
[17:14:28] !log wikidata-dev killed gmond on property-suggester, self hosted puppetmaster (T115330)
[17:14:29] T115330: block labs IPs from sending data to prod ganglia - https://phabricator.wikimedia.org/T115330
[17:14:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL, Master
[17:15:33] !log wikidata-dev property-suggester: hostname: Name or service not known, Error 400 on SERVER: Too many open files, LDAP Search failed, SNAFU
[17:15:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL, Master
[17:17:42] !log full-text-reference-tool killed gmond, self hosted puppetmaster (T115330)
[17:17:43] T115330: block labs IPs from sending data to prod ganglia - https://phabricator.wikimedia.org/T115330
[17:17:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Full-text-reference-tool/SAL, Master
[17:38:08] (PS1) BryanDavis: www: Another round of undef index fixes [labs/toollabs] - https://gerrit.wikimedia.org/r/287999
[17:38:52] (CR) BryanDavis: [C: 2] www: Don't list all tools on the default landing page [labs/toollabs] - https://gerrit.wikimedia.org/r/287869 (owner: BryanDavis)
[17:39:51] (CR) BryanDavis: [C: 2] www: Another round of undef index fixes [labs/toollabs] - https://gerrit.wikimedia.org/r/287999 (owner: BryanDavis)
[17:40:13] (CR) jenkins-bot: [V: -1] www: Another round of undef index fixes [labs/toollabs] - https://gerrit.wikimedia.org/r/287999 (owner: BryanDavis)
[17:41:22] (CR) BryanDavis: [C: 2] www: Another round of undef index fixes [labs/toollabs] - https://gerrit.wikimedia.org/r/287999 (owner: BryanDavis)
[17:47:22] chasemp, valhallasw`cloud: Imma gonna stop piddling with tools.admin now and do some real work
[17:47:34] :) k
[17:47:53] !log tools.admin Deployed a bunch of www changes including a new landing page
[17:47:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL, Master
[17:48:10] \o/
[17:48:40] bd808: it looks great!
[17:49:25] people may suddenly realize that there are instruction links on that landing page
[17:49:25] wget -O /dev/null https://tools.wmflabs.org/ 0.00s user 0.01s system 1% cpu 0.775 total
[17:49:29] also that :-)
[17:53:20] mutante: about?
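The alert at 16:56:03 comes from a check that fetches the home page and looks for the string 'Magnus', while the icinga "tools-home" check is described as just `check_http_slow!20`, i.e. "does HTTP work within 20 seconds". A rough Python illustration of what a content check adds, following the usual Nagios/Icinga plugin convention of exit status 0 for OK and 2 for CRITICAL. This is not the real `check_http` plugin or the Wikimedia puppet definition; the function names are hypothetical.

```python
import urllib.request
from typing import Tuple

# Conventional Nagios/Icinga plugin exit statuses.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_body(body: str, needle: str) -> Tuple[int, str]:
    """Map an already-fetched page body to a plugin status and message."""
    if needle in body:
        return OK, f"HTTP OK - string '{needle}' found"
    return CRITICAL, f"HTTP CRITICAL - string '{needle}' not found"


def check_url(url: str, needle: str, timeout: float = 20.0) -> Tuple[int, str]:
    """Fetch url within timeout seconds, then look for needle in the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError, HTTPError and socket timeouts are OSErrors
        return CRITICAL, f"HTTP CRITICAL - {exc}"
    return evaluate_body(body, needle)
```

A wrapper script would call `check_url("https://tools.wmflabs.org/", "Magnus")`, print the message and exit with the status. Keeping `evaluate_body` separate makes the string match testable without network access, and it shows why the new landing page tripped the check: the page still returned HTTP 200, but the needle was gone.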
[17:54:02] chasemp: yep
[17:54:50] mutante: got an icinga weirndess I don't quite get if you got a sec
[17:54:51] https://gerrit.wikimedia.org/r/#/c/287723/
[17:54:52] followed by
[17:54:55] https://gerrit.wikimedia.org/r/#/c/287993/
[17:54:58] icinga still says
[17:55:08] i just saw a RECOVERY
[17:55:12] like seconds ago
[17:55:13] Error: Could not find any host matching 'checker.tools.wmflabs.org'
[17:55:19] yeah I fixed it manually to see if that one thing would do it
[17:55:24] I think it's still broken via puppet
[17:55:35] ok, looking
[17:55:37] it did fix it to add the host, I don't understand atm why teh host isn't being added by puppet
[17:56:09] chasemp: i know
[17:56:17] :) that was fast
[17:56:19] we need to move the @monitoring::host
[17:56:21] what's teh deal?
[17:56:24] into the icinga module
[17:56:31] it needs to be realized on neon itself
[17:56:37] ah
[17:56:38] only then will you get the virtual host
[17:56:45] so there was alrady a tools.wmflabs.org @host def there
[17:56:47] the rest can stay there
[17:56:52] but it was non-functional?
[17:56:57] and it worked by chance because of
[17:57:05] defintion here modules/icinga/manifests/monitor/certs.pp
[17:57:06] I wonder?
[17:57:21] yes, that is the case
[17:57:33] i remember adding the certs.pp stuff
[17:57:34] blah ok I was thrown off the the @host in that file
[17:57:40] and the same thing happened to us
[17:58:10] that's why i have also things like
[17:58:18] paws.pp wikidata.pp in there
[17:58:27] because they create virtual hosts
[17:58:37] yep I'm going to coopt paws into a generic toollabs one I think
[17:59:17] yes, that should work
[18:00:06] remember if you put a new file in icinga/manifests/monitor/
[18:00:16] you also have to include it in manifests/role/icinga.pp
[18:00:50] understood, I think I'm with it now, the nonfunctional @host def really threw me off
[18:01:38] *nod* yea, same happened to me and yuvi
[18:02:00] hehe yuvi put that one there too
[18:08:43] mutante: https://gerrit.wikimedia.org/r/#/c/288011/
[18:13:14] Labs, Tool-Labs: No API login with login token possible at tools-bastion-02 only - https://phabricator.wikimedia.org/T134262#2281992 (bd808) >>! In T134262#2280640, @Anomie wrote: > @bd808: Background: Someone is occasionally doing enough failed logins that ConfirmEdit is deciding it needs to show a capt...
[18:29:07] chasemp: i was out for a few.. that looks good to me, +1
[18:29:14] thanks
[18:29:26] only took me 3 commits but hey
[18:30:00] more commits = good for the stats on korma.wmflabs :)
[18:30:31] i looked at that when trying to come with something measurable in self-review
[18:30:33] juking the stats :)
[18:30:43] and at the same time it shows how numbers are meaningless )
[18:31:04] talked about that with others in that "how to write self-review" thing
[18:31:33] somebody said "number of code lines" for devs etc :)
[18:31:59] and there's the icinga recovery :)
[18:33:29] heh
[18:33:41] idk who maintains korma but the ticket surge post phab is fascinating
[18:33:42] http://korma.wmflabs.org/browser/maniphest.html
[18:34:26] :) yes
[18:34:35] i think WMF paid bitergia.com for it
[18:34:42] something-contractor
[18:35:40] They did by the looks of it.
[18:36:32] There's their logo on there and on their website the background image shows WMF instance of it.
[18:36:35] it has interesting stuff in it, i hope it doesnt have to be killed
[18:36:48] yea, i also noticed the laptop with the logo, heh
[18:58:57] chasemp: I think phab makes creating tickets easier than BZ did. There are lots of little nits to pick on about phab but overall it is much nicer than BZ imo
[18:59:51] It's easier to create tickets with phab whenever I've used it.
[18:59:54] also the VE folks (and several other product teams) really ramped up their use of public bug tracking for things
[19:00:39] before phab we were up to 4 different task trackers across the WMF plus tracking on wiki and in google spreadsheets
[19:00:43] it was a mess
[19:05:55] yep agreed
[19:18:55] Labs, Tool-Labs, Tool-Labs-tools-Other: Provide a list of the videos available on video2commons servers - https://phabricator.wikimedia.org/T134914#2282360 (Aklapper)
[19:31:15] I myself probably create way more tasks in Phabricator
[19:31:38] and we log more things publicly overall regardless.
[19:37:39] how do i make sure a bot script keeps running even after i close the ssh session?
[19:37:55] do i just close the session, make a cron job, or what?
[19:38:17] in labs you would run it on the grid
[19:38:37] i think the grid uses cron jobs
[19:38:38] PhilrocWP: you can run it inside a "screen"
[19:38:46] mutante: but not at tools :P
[19:38:53] wow, thought using the grid wasnt nesscary
[19:39:14] you can use either toollabs or regular labs
[19:39:15] depends
[19:39:42] PhilrocWP: everything in tool labs should really run on the grid
[19:39:47] bd808, I don't think I've started filing more VE bugs now that we use Phab
[19:39:52] it would mean you can run it without grid engine but you have to maintain the whole machine yourself
[19:40:48] PhilrocWP: generally speaking running a script on the job grid is as easy as `jsub my_script`
[19:41:23] you may find that you need to add something like `-mem 1G` to give the processes a higher ram quota
[19:44:12] bd808: for my purposes, it would be jstart instead of jsub
[19:45:44] jstart == jsub -continuous ; that will respawn the script if it exits with a non-zero status
[19:46:12] well technically jsub -continuous -once
[19:47:11] wow, never knew the grid was this simple
[19:48:54] there are a lot of fancy qsub options possible but 99% of jobs will never need to use any of them
[19:49:47] so can i just exit the ssh session after i do jstart?
[19:49:50] which is good because we really want to move away from SGE over the next year or so and we will try very hard to make things "just work" on the replacement system
[19:50:07] PhilrocWP: yeah. the grid job is not attached to your shell session
[19:50:45] and what did you mean, "just work?"
[19:53:12] PhilrocWP: we will change the internals of the launcher scripts to run jobs on the replacement system with similar results to the current SGE backend
[19:53:39] so you want good compatibility
[19:53:41] ?
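bd808 describes `jstart` as `jsub -continuous -once`, which respawns the script if it exits with a non-zero status. The sketch below only illustrates that restart policy with a local loop; it is not how jsub or the grid engine implements it, and nothing like it is needed on Tool Labs. `run_continuous`, `max_restarts` and `delay` are hypothetical, and the cap on restarts is an assumption the log does not mention.

```python
import subprocess
import time
from typing import Sequence


def run_continuous(cmd: Sequence[str], max_restarts: int = 5, delay: float = 0.0) -> int:
    """Run cmd, restarting it whenever it exits with a non-zero status.

    Returns 0 as soon as the command exits cleanly, or the last non-zero
    status once max_restarts restarts have been used up.
    """
    restarts = 0
    while True:
        status = subprocess.call(list(cmd))
        if status == 0:
            return 0  # clean exit: do not respawn
        if restarts >= max_restarts:
            return status  # give up after the (hypothetical) restart cap
        restarts += 1
        time.sleep(delay)  # optional pause before respawning
```

The clean-exit case is the important design point: a script that finishes its work and exits 0 is treated as done, while a crash triggers a restart.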
[19:53:46] yes
[19:54:05] as little interruption and extra work for tool maintainers as possible
[19:54:16] ok
[20:32:29] !log paws Fixed hiera config so https://wikidata-pawsbase.wmflabs.org routes correctly
[20:32:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL, Master
[20:36:40] !log paws Visit https://foo-pawsbase.wmflabs.org/ to see list of wikis in wikifarm
[20:36:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL, Master
[21:53:41] !log testlabs Added BryanDavis (myself) as admin
[21:53:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL, Master
[21:58:00] !log ircd created new project, added dzahn and krenair as admins
[21:58:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ircd/SAL, Master
[21:58:30] Labs-project-extdist: ExtensionDistributor Labs service down - https://phabricator.wikimedia.org/T134390#2282981 (Mattflaschen) Open>Invalid Probably just a transient clacks failure.
[21:58:32] !log ircd created instance udpmx-01, created puppet groups and class for mw_rc_irc, configured instance with it
[21:58:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ircd/SAL, Master
[21:58:51] Labs-project-extdist: ExtensionDistributor Labs service down - https://phabricator.wikimedia.org/T134390#2282983 (Mattflaschen)
[22:09:51] Labs, Tool-Labs, Operations: toolserver.org certificate to expire 2016-06-30 - https://phabricator.wikimedia.org/T134798#2283026 (faidon)
[22:11:34] Labs, Tool-Labs, Operations: toolserver.org certificate to expire 2016-06-30 - https://phabricator.wikimedia.org/T134798#2277377 (faidon) @yuvipanda mentioned on the procurement ticket (linked above) that we should use Let's Encrypt. Let's Encrypt does not allow wildcards, so I'm guessing we'd have t...
[22:11:52] Labs, Labs-Kubernetes, Tool-Labs: Setup NSS inside containers used in Tool Labs - https://phabricator.wikimedia.org/T134748#2283044 (yuvipanda) After more discussion with @MoritzMuehlenhoff the options for following (2) are: 1. Patch `libnss-ldapd` source package to build a `libnss-ldapd-plain` bina...
[22:13:01] Labs, Tool-Labs, Operations: toolserver.org certificate to expire 2016-06-30 - https://phabricator.wikimedia.org/T134798#2283060 (yuvipanda) We only need toolserver.org and www.toolserver.org I think.
[22:14:47] YuviPanda: Any way I can get https://quarry.wmflabs.org/query/9103 not to be killed :/
[22:15:47] Josve05a: I see it as running :) but if it takes more than 30m, not much quarry can do. You can maybe explore paws.wmflabs.org
[22:16:02] yeah, it takes longer :/
[22:16:27] Josve05a: post in the 'discuss' section on quarry and ask for help?
[22:16:29] uggh...paws...seems so much more complicted...I can't code.at all
[22:16:44] yeah, it's probably not ready for all that yet.
[22:17:21] haha, I got ERR_TOO_MANY_REDIRECTS on https://paws.wmflabs.org/paws/user/Josve05a
[22:17:40] Josve05a: yah, still a problem for new users. try again, it should work
[22:17:47] it lacks a lot of polish :)
[22:18:14] You dont say? :p
[22:18:35] :) if only there were 3 of me :|
[22:18:46] there's no way to "lift" the 30 min limit on Quarry? xD
[22:19:04] not without more labsdb resources I'm afraid
[22:19:08] you need to optimize the query
[22:19:23] and now I have to get on a flight. brb
[22:22:39] Well, I've tried to optimize it :/
[22:23:40] Wat I want is a list of all ns 0 articles (not disambigs or redirects) with 0 incoming links (alernativly 0 incoming ns-0 links)
[22:23:45] What*
[22:32:23] Labs: confirm that new base labs base image is adequate for kubernetes &c. - https://phabricator.wikimedia.org/T134944#2283185 (Andrew)
[22:34:31] Labs: confirm that new base labs base image is adequate for kubernetes &c. - https://phabricator.wikimedia.org/T134944#2283198 (yuvipanda) Hmm, we're on 3.19 on the k8s hosts because we ran into NFS bugs on 4.2... Not sure if those still exist in 4.4 or what.
[23:10:18] bd808: I can see that you are trying to help :D Thanks
[23:10:54] I don't know if those uncorrelated subqueries will do any better or not
[23:11:40] the problem is that the indexes are built for looking at things in the other direction (outbound usage, not inbound)
[23:12:15] yeah :/
[23:13:08] https://en.wikipedia.org/wiki/Special:LonelyPages stopped working, (or stopped updating), and I want "More" than the offered 500 pages every other week...
[23:21:34] so, how do I find out why a new tool of mine is returning 502 Bad Gateway? I'm not seeing evidence the request reached my service. (using Rack)
[23:21:45] I used portgrabber etc., and my service has been successfully assigned a port.
[23:34:55] YuviPanda: around?
[23:37:56] madhuvishy, YuviPanda: never mind, figured it out. (my service was not binding to 0.0.0.0)
[23:53:37] Labs, Tool-Labs, Operations: toolserver.org certificate to expire 2016-06-30 - https://phabricator.wikimedia.org/T134798#2283509 (Dzahn) So an existing example to copy puppetized LE setup for a misc service is new RT on ununpentium.
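Josve05a's goal at 22:23:40 (namespace-0 articles that are neither redirects nor disambiguation pages and have no incoming namespace-0 links) is an anti-join. Neither the text of quarry query 9103 nor bd808's uncorrelated-subquery version appears in the log, so this sketch uses a correlated `NOT EXISTS` instead and runs it against a tiny in-memory SQLite database. It assumes the 2016-era `page`/`pagelinks` column names (`pl_namespace`, `pl_title`) and that disambiguation pages carry a `disambiguation` row in `page_props`; all rows are invented. It shows the shape of the query only and says nothing about whether it would finish within Quarry's 30-minute limit on the replicas.

```python
import sqlite3

# Toy tables borrowing MediaWiki column names; the data below is invented.
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                   page_title TEXT, page_is_redirect INTEGER);
CREATE TABLE pagelinks (pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT);
CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT);
"""

# Keep ns-0 non-redirects that no ns-0 page links to and that are not
# flagged as disambiguation pages.
LONELY_SQL = """
SELECT p.page_title
FROM page AS p
WHERE p.page_namespace = 0
  AND p.page_is_redirect = 0
  AND NOT EXISTS (
        SELECT 1 FROM pagelinks AS pl
        JOIN page AS src ON src.page_id = pl.pl_from
        WHERE pl.pl_namespace = p.page_namespace
          AND pl.pl_title = p.page_title
          AND src.page_namespace = 0)
  AND NOT EXISTS (
        SELECT 1 FROM page_props AS pp
        WHERE pp.pp_page = p.page_id AND pp.pp_propname = 'disambiguation')
ORDER BY p.page_title
"""


def lonely_pages(conn: sqlite3.Connection) -> list:
    return [row[0] for row in conn.execute(LONELY_SQL)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
    (1, 0, "Linked", 0),
    (2, 0, "Orphan", 0),
    (3, 0, "Redirect", 1),
    (4, 0, "Dab", 0),
    (5, 0, "Hub", 0),
    (6, 2, "UserPage", 0),
    (7, 0, "OnlyUserLinked", 0),
])
conn.executemany("INSERT INTO pagelinks VALUES (?, ?, ?)", [
    (5, 0, "Linked"),          # link from an ns-0 page: Linked is not lonely
    (6, 0, "OnlyUserLinked"),  # link from ns 2 only: still counts as lonely
])
conn.execute("INSERT INTO page_props VALUES (4, 'disambiguation', '')")
print(lonely_pages(conn))  # prints ['Hub', 'OnlyUserLinked', 'Orphan']
```

Dropping the join on `src` (and its `src.page_namespace = 0` condition) gives the stricter "0 incoming links at all" variant, which would also exclude `OnlyUserLinked` here.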