[00:11:14] 06Labs, 06WMF-Legal: Potential ambiguities in the Labs Terms of Use - https://phabricator.wikimedia.org/T140486#2473626 (10jayvdb) If Quarry is only storing "Published queries", "Starred Queries" and "Draft Queries" for each user, then I believe it is compliant. Those items are clearly associated with my user... [00:23:39] 06Labs, 06WMF-Legal: Potential ambiguities in the Labs Terms of Use - https://phabricator.wikimedia.org/T140486#2473636 (10tom29739) Quarry would be non compliant because the ToU classes usernames as Private info. So you *must* show this disclaimer before collecting the private information (in this case the us... [00:52:46] 06Labs, 06WMF-Legal: Potential ambiguities in the Labs Terms of Use - https://phabricator.wikimedia.org/T140486#2473713 (10jayvdb) >>! In T140486#2473636, @tom29739 wrote: > Quarry would be non compliant because the ToU classes usernames as private information. The ToU states that you *must* show this disclaim... [01:06:03] !log tools Upgraded Elasticsearch on tools-elastic-* to 2.3.4 [01:06:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [02:26:56] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 15User-bd808: Modernize the admin tool's codebase - https://phabricator.wikimedia.org/T140254#2473850 (10bd808) @yuvipanda and @valhallasw I think that this is ready to actually use in place of the current admin tool's `www` content. Could the two of you giv... [03:21:14] 06Labs, 10Labs-Infrastructure, 10Labs-project-extdist: legoktm unable to log into extdist-02.eqiad.wmflabs - https://phabricator.wikimedia.org/T140711#2473886 (10Legoktm) [04:47:30] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 10Security-Reviews: Security review of Tool Labs console application - https://phabricator.wikimedia.org/T135784#2473933 (10bd808) >>! In T135784#2472547, @dpatrick wrote: > * Vagrant tools cannot find Python.h; needs to depends on -dev package > * Database... 
[04:48:17] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 10Security-Reviews: Security review of Tool Labs console application - https://phabricator.wikimedia.org/T135784#2473934 (10bd808) >>! In T135784#2472547, @dpatrick wrote: > `striker/tools/models.py`, line 78 > * fix TODO for field lengths https://github.co... [05:07:38] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 10Security-Reviews: Security review of Tool Labs console application - https://phabricator.wikimedia.org/T135784#2473946 (10bd808) >>! In T135784#2472547, @dpatrick wrote: > `striker/labsauth/views.py`, line 48 > * Should session expiry be an ini-based confi... [05:08:15] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 10Security-Reviews: Security review of Tool Labs console application - https://phabricator.wikimedia.org/T135784#2473947 (10bd808) a:03dpatrick [06:54:16] 06Labs, 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2474008 (10madhuvishy) Pasting public gpg key here - if this isn't the best idea, happy to make a new one. ``` -----BEGIN PGP PUBLIC KEY BLOCK----- Ver... [06:59:27] PROBLEM - Puppet run on tools-exec-1404 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [07:34:26] RECOVERY - Puppet run on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [07:51:24] 06Labs, 10Tool-Labs: system wide software request: sbt - https://phabricator.wikimedia.org/T50859#2474058 (10intracer) [09:14:21] 06Labs, 10Labs-Infrastructure, 06Operations: investigate slapd memory leak - https://phabricator.wikimedia.org/T130593#2474575 (10fgiunchedi) freshly restarted slapd ```openldap 16378 1.8 1.2 466556 52048 ? Ssl 09:10 0:03 /usr/sbin/slapd -h ldap:/// ldaps:/// ldapi:/// -g openldap -u openldap -f... 
[10:06:13] 06Labs, 10Labs-Infrastructure, 10Labs-project-extdist: legoktm unable to log into extdist-02.eqiad.wmflabs - https://phabricator.wikimedia.org/T140711#2474682 (10yuvipanda) I can't get in with my root key either, looks like the instance is stuck [10:16:04] 06Labs, 10Labs-Infrastructure, 10Labs-project-extdist: legoktm unable to log into extdist-02.eqiad.wmflabs - https://phabricator.wikimedia.org/T140711#2474689 (10yuvipanda) I've rebooted it (and upgraded the kernel too, since I saw lots of ksoftirqd usage), and you should be able to log in now. [10:17:50] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 13Patch-For-Review, 15User-bd808: Modernize the admin tool's codebase - https://phabricator.wikimedia.org/T140254#2474690 (10yuvipanda) I like it! (and @valhallasw is on vacation for another week I think) [10:26:13] 06Labs, 06WMF-Legal: Potential ambiguities in the Labs Terms of Use - https://phabricator.wikimedia.org/T140486#2474705 (10zhuyifei1999) A tool that edit on a user's behalf via OAuth automatically disclose the username to MediaWiki revision table, thus publicly and permanently storing "This username has used t... 
[10:41:50] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [10:49:25] (03PS1) 10Elukey: Add wmde_secrets and analyticsdeploy scap keyholder keys [labs/private] - 10https://gerrit.wikimedia.org/r/299731 [10:50:02] (03CR) 10Elukey: [C: 032] Add wmde_secrets and analyticsdeploy scap keyholder keys [labs/private] - 10https://gerrit.wikimedia.org/r/299731 (owner: 10Elukey) [10:50:09] ^ seems transient (the puppet failure) [10:50:33] (03CR) 10Elukey: [V: 032] Add wmde_secrets and analyticsdeploy scap keyholder keys [labs/private] - 10https://gerrit.wikimedia.org/r/299731 (owner: 10Elukey) [10:55:13] PROBLEM - Puppet run on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [10:55:29] PROBLEM - Puppet run on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [10:55:50] PROBLEM - Puppet run on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [10:56:21] this string of puppet failures brought to you by: Yuvi Panda [10:58:40] PROBLEM - Puppet run on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [10:58:48] PROBLEM - Puppet run on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [10:59:04] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [10:59:04] PROBLEM - Puppet run on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [10:59:10] PROBLEM - Puppet run on tools-webgrid-lighttpd-1204 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:00:08] PROBLEM - Puppet run on tools-webgrid-lighttpd-1208 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:00:18] PROBLEM - Puppet run on 
tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:01:38] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:03:38] PROBLEM - Puppet run on tools-bastion-05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [11:05:39] PROBLEM - Puppet run on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:06:05] PROBLEM - Puppet run on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:06:49] PROBLEM - Puppet run on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [11:09:14] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:11:50] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [11:12:22] PROBLEM - Puppet run on tools-webgrid-lighttpd-1206 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:12:38] PROBLEM - Puppet run on tools-checker-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [11:14:35] PROBLEM - Puppet run on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [11:14:41] PROBLEM - Puppet run on tools-bastion-02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [11:15:35] PROBLEM - Puppet run on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [11:16:07] PROBLEM - Puppet run on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:17:02] (fix in place now) [11:17:07] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:17:51] PROBLEM 
- Puppet run on tools-webgrid-lighttpd-1205 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:18:09] PROBLEM - Puppet run on tools-webgrid-generic-1405 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:18:21] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [11:18:21] PROBLEM - Puppet run on tools-webgrid-lighttpd-1210 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:18:23] PROBLEM - Puppet run on tools-webgrid-lighttpd-1209 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:18:59] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [11:21:03] PROBLEM - Puppet run on tools-checker-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:21:52] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0] [11:22:18] PROBLEM - Puppet run on tools-webgrid-lighttpd-1202 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:23:06] PROBLEM - Puppet run on tools-webgrid-lighttpd-1415 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:24:20] PROBLEM - Puppet run on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [11:24:32] job-ID prior name user state submit/start at queue slots ja-task-ID [11:24:35] ----------------------------------------------------------------------------------------------------------------- [11:24:39] 8762892 0.30419 lighttpd-i tools.ifttt- r 07/17/2016 01:41:30 webgrid-lighttpd@tools-webgrid 1 [11:24:44] What is going on with my labs tool [11:25:09] I need to stop the lighttpd service running. I need to instead run uwsgi-python [11:25:12] Any help? [11:25:39] webservice stop, webservice uwsgi-python start? 
[11:26:00] PROBLEM - Puppet run on tools-webgrid-lighttpd-1401 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [11:27:14] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:28:26] YuviPanda: I keep getting the lighttpd running [11:28:47] Also, I see a bunch of PROBLEMs spat out by shinken-wm [11:28:57] Is there any problem with labs? [11:29:59] YuviPanda: job-ID prior name user state submit/start at queue slots ja-task-ID [11:30:02] ----------------------------------------------------------------------------------------------------------------- [11:30:05] 8762892 0.30420 lighttpd-i tools.ifttt- r 07/17/2016 01:41:30 webgrid-lighttpd@tools-webgrid 1 [11:30:17] I did not start that process/service [11:32:47] It somehow started on its own. [11:33:47] RECOVERY - Puppet run on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [11:34:03] RECOVERY - Puppet run on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [11:34:05] RECOVERY - Puppet run on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [11:34:11] RECOVERY - Puppet run on tools-webgrid-lighttpd-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [11:35:11] RECOVERY - Puppet run on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [0.0] [11:35:15] RECOVERY - Puppet run on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [11:35:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [11:35:31] RECOVERY - Puppet run on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [11:35:51] RECOVERY - Puppet run on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [11:36:41] RECOVERY - Puppet run on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above 
the threshold [0.0] [11:38:39] RECOVERY - Puppet run on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [0.0] [11:39:38] d3r1ck what is the name of your tool? [11:40:40] RECOVERY - Puppet run on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0] [11:40:56] and no, the shinken stuff is just an annoying puppet failure because of debian packages, and it's all recovered now. It also caused no actual issues outside of noise in channel I guess [11:41:04] RECOVERY - Puppet run on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [11:42:20] YuviPanda: My tool is ifttt-testing [11:42:31] Its misbehaving all of a sudden [11:44:03] d3r1ck look at uwsgi.log, you have syntax errors in your python code [11:44:17] so uwsgi keeps crashing [11:44:22] I've killed the lighttpd job separately (with a 'qdel') [11:44:26] Ohhh, Ok [11:44:29] so it should be ok now if you fix the errors [11:44:35] Thanks, did not even think of that. 
Ops [11:44:43] always look at uwsgi.log first :) [11:44:43] np [11:44:54] * YuviPanda goes afk for a bit [11:46:24] thanks [11:46:48] RECOVERY - Puppet run on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [11:47:22] RECOVERY - Puppet run on tools-webgrid-lighttpd-1206 is OK: OK: Less than 1.00% above the threshold [0.0] [11:50:35] RECOVERY - Puppet run on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [11:52:05] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [11:52:51] RECOVERY - Puppet run on tools-webgrid-lighttpd-1205 is OK: OK: Less than 1.00% above the threshold [0.0] [11:53:11] RECOVERY - Puppet run on tools-webgrid-generic-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [11:53:23] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0] [11:53:23] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0] [11:54:35] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [11:56:05] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [11:57:17] RECOVERY - Puppet run on tools-webgrid-lighttpd-1202 is OK: OK: Less than 1.00% above the threshold [0.0] [11:58:06] RECOVERY - Puppet run on tools-webgrid-lighttpd-1415 is OK: OK: Less than 1.00% above the threshold [0.0] [11:58:22] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [11:58:58] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [12:02:33] 06Labs, 06WMF-Legal: Potential ambiguities in the Labs Terms of Use - https://phabricator.wikimedia.org/T140486#2474831 (10tom29739) >>! In T140486#2474705, @zhuyifei1999 wrote: > IMHO, username shouldn't be private information. 
Rather, the association of a username and other private information (IP address, U... [13:00:33] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Omerfarukdemir was created, changed by Omerfarukdemir link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Omerfarukdemir edit summary: Created page with "{{Tools Access Request |Justification=I don't have any idea now. I just want to check it out and be familiar with this wiki developer things. |Completed=false |User Name=Omerf..." [13:03:03] 06Labs, 06Operations, 10netops: Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR - https://phabricator.wikimedia.org/T136671#2474946 (10faidon) 05Open>03Resolved a:03faidon I'm resolving this, as this was primarily a task for the intermittent bandwidth issue. @yuvipanda, f... [13:07:54] 06Labs, 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2474953 (10elukey) @madhuvishy for the pwstore part I think that you should follow up with @MoritzMuehlenhoff when he'll be back (I think July 25th). No... [13:09:13] 10Labs-project-extdist, 10MediaWiki-extensions-ExtensionDistributor: ExtensionDistributor gives error message "Unable to fetch extension list!" - https://phabricator.wikimedia.org/T140753#2474962 (10Mogigoma) [13:14:12] 06Labs, 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2474988 (10Gehel) Looking at the recent history of the pw repository, at least @Dzahn and @ArielGlenn have recent commits, so they should be able to add... [13:15:29] yuvipanda hi, it seems extensiondistributor isn't working https://phabricator.wikimedia.org/T140753 [13:15:43] I think it generates its list on one of the instances in labs. 
[13:17:02] paladox you should ping the maintainers of that project, in this case legoktm [13:17:15] Oh sorry [13:18:08] np paladox [13:18:22] legoktm hi, it seems ExtensionDistributor isn't creating the list. [13:18:30] for extensions [13:18:33] see https://phabricator.wikimedia.org/T140753 please. [13:19:12] 10Labs-project-extdist, 10MediaWiki-extensions-ExtensionDistributor: ExtensionDistributor gives error message "Unable to fetch extension list!" - https://phabricator.wikimedia.org/T140753#2475005 (10chasemp) p:05Triage>03Normal [13:22:06] It seems the cpu has gone down to almost 0 https://grafana.wikimedia.org/dashboard/db/labs-project-board?var-project=extdist&var-server=All [13:22:40] extdist-02 seems down too. [13:36:42] https://graphite.wikimedia.org/render shows nothing either [13:38:24] YuviPanda: Hey [13:38:30] paladox, extdist-02 was on 100% iowait for ages until about 10:10 this morning [13:38:36] Oh [13:38:36] IFTTT endpoint tests and all my Unit tests are now passing [13:38:43] paladox: Hi [13:38:48] d3r1ck hi [13:38:57] YuviPanda: Succeeded in fixing this weird bug :) [13:39:02] congratulations :) [13:39:04] paladox, and extdist-02's memory is "undefined" [13:39:05] paladox: how has it been? [13:39:10] YuviPanda: Thanks. [13:39:20] d3r1ck it's been fine, how about you [13:39:25] tom29739 this https://extdist.wmflabs.org/dist/extensions/ works [13:39:31] paladox: A lot of debugging these days [13:39:38] Oh, :) [13:40:08] paladox: but its very interesting though [13:40:15] Oh [13:40:17] paladox, if I go to the root of that server, it redirects me to https://www.mediawiki.org/wiki/Special:ExtensionDistributor which says "Unable to fetch extension list!" [13:40:17] it makes my mind to strech :) [13:40:26] Oh [13:40:28] *stretch [13:40:43] Yep, it's been reported here https://phabricator.wikimedia.org/T140753#2475005 [13:40:47] tom29739 ^^ [13:41:01] oh [13:42:37] It looks like it might be having memory issues again too. 
[13:42:56] legoktm could we upgrade the servers to extra large due to them quickly taking up memory please. [13:43:00] and need a lot of storage. [13:47:38] RECOVERY - Puppet run on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0] [13:53:22] Strange, it works for me on my local wiki [13:56:10] 06Labs, 10Tool-Labs, 07Documentation: Wikimedia Labs system admin (sysadmin) documentation sucks - https://phabricator.wikimedia.org/T57946#2475129 (10chasemp) 05Open>03declined Nothing is ever going to happen with this ticket as it's too broad to be useful. We are constantly trying to improve, and help... [14:31:03] RECOVERY - Puppet run on tools-checker-01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:35:01] 06Labs, 13Patch-For-Review: promethium.wikitextexp.eqiad.wmflabs (10.68.16.2, labs baremetal host) has strange DNS A record result, and missing PTR - https://phabricator.wikimedia.org/T139438#2475363 (10AlexMonk-WMF) That commit sorted out the PTR problem, but we still have the strange A record result: ```kren... [14:35:13] 06Labs: promethium.wikitextexp.eqiad.wmflabs (10.68.16.2, labs baremetal host) has strange DNS A record result, and missing PTR - https://phabricator.wikimedia.org/T139438#2475364 (10AlexMonk-WMF) [14:47:13] 06Labs, 10Tool-Labs: Puppet failures in tools-precise-dev - https://phabricator.wikimedia.org/T140696#2475449 (10chasemp) p:05Triage>03High a:05chasemp>03yuvipanda [15:24:15] RECOVERY - Puppet run on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0] [15:25:21] 06Labs, 10Tool-Labs: Puppet failures in tools-precise-dev - https://phabricator.wikimedia.org/T140696#2475711 (10yuvipanda) 05Open>03Resolved Fixed. [15:33:14] YuviPanda: around? [15:33:29] https://www.irccloud.com/pastebin/dgJNBpw2/ [15:34:04] musikanimal fixing [15:34:33] thanks, I tried switching back to gridengine and got the same error [15:34:37] try now, musikanimal [15:34:53] working! 
[15:34:56] that was a fast [15:35:11] I did bastion-03 but not -02 earlier, just did on all of 'em [15:37:17] I assume there's some other Labs stuff going on? http://tools.wmflabs.org/xtools-ec/ is showing broken images [15:37:37] 503's [15:38:38] oh I see [15:38:44] the main xtools project went down again [15:38:44] musikanimal nope, nothing labswide going on atm [15:39:05] xtools-ec and articleinfo haven't gone down since we moved them to k8s I think [15:42:02] yeah those seem fine [15:42:18] I wonder why the main xtools went down again [15:43:28] https://tools.wmflabs.org/static-browser/ has been returning 500s for a few weeks now. [15:43:36] I don't know who maintains it [15:44:24] musikanimal, xtools-ec isn't down for me [15:44:37] yeah it's working now [15:44:52] it wasn't working for a moment because it relies on the main xtools project to be up and running [15:45:03] that's the part I'm unsure about, why the main xtools went down [15:45:41] I added some basic usage stats to xtools, and turns out it's about as popular as Pageviews Analysis [15:46:37] musikanimal do you think you can move the pageviews related tools back to k8s? [15:47:08] over the past week or so, xtools got around 40,000+ page loads from users in 125 different languages [15:47:57] YuviPanda: hmm, I never moved the Pageviews apps back to gridengine? but I see that it is using it somehow [15:48:24] my restart script just runs `webservice restart` [15:48:55] you should stop running restart scripts on k8s :) [15:49:11] well this is when I make an update [15:49:17] so just pull from master and don't run restart? [15:49:18] ah, that restart [15:49:27] why are you restarting? 
it's a php app no [15:49:30] that needs no restart to update itself [15:50:24] I knew that was the case for static assets but didn't realize PHP changes didn't matter either [15:51:09] musikanimal yup, PHP changes don't matter either [15:51:16] there's a 1-second statcache [15:51:18] and that's it [15:51:24] so is it `webservice restart` that magically put it back to gridengine? [15:51:38] webservice --backend=gridengine stop [15:51:45] oops, meant to put that in the terminal [15:51:48] no, if you do a 'webservice stop' and a 'webservice start' [15:51:53] musikanimal I already moved it :D [15:52:19] musikanimal since the default is still gridengine, if you do a 'webservice stop' and 'webservice start' it'll put it back in gridengine [15:53:29] dah okay [15:53:39] well I'll remove that bit from the deploy script [15:53:50] thanks musikanimal :D [15:54:18] YuviPanda, is there any chance you could integrate the git repo functionality of k8s into webservice? [15:54:41] RECOVERY - Puppet run on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:54:53] That'd make PHP webservices much faster and reduce their dependency on NFS. [15:55:17] probably not. the idea with webservice on k8s is to just kill gridengine. 
Any additional stuff would need to come from https://phabricator.wikimedia.org/T136264 [16:01:01] RECOVERY - Puppet run on tools-webgrid-lighttpd-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [16:04:22] RECOVERY - Puppet run on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [16:06:29] 10Tool-Labs-tools-Pageviews: Add URL parameters to show/hide log scale and start Y-axis from zero - https://phabricator.wikimedia.org/T140783#2475849 (10MusikAnimal) [16:08:20] 10Tool-Labs-tools-Pageviews: Make "stepSize" of Y-axis no smaller than one, so only show integers - https://phabricator.wikimedia.org/T140784#2475864 (10MusikAnimal) [16:13:36] RECOVERY - Puppet run on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0] [16:21:51] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:33:19] 06Labs, 10Labs-Infrastructure, 10DBA, 07Tracking: Labs databases rearchitecture - https://phabricator.wikimedia.org/T140788#2475959 (10jcrespo) [16:34:38] 06Labs, 10Labs-Infrastructure, 10DBA, 07Epic, 07Tracking: Labs databases rearchitecture - https://phabricator.wikimedia.org/T140788#2475975 (10jcrespo) [16:34:56] 06Labs, 10Labs-Infrastructure, 10DBA, 13Patch-For-Review: Setup and provision labsdb1009, labsdb1010 and labsdb1011 - https://phabricator.wikimedia.org/T140452#2475978 (10jcrespo) [16:35:02] 06Labs, 10Labs-Infrastructure, 10DBA, 07Epic, 07Tracking: Labs databases rearchitecture - https://phabricator.wikimedia.org/T140788#2475959 (10jcrespo) [16:58:14] 06Labs, 10Labs-Kubernetes, 10Tool-Labs, 06Community-Tech-Tool-Labs, 07Epic: Evaluate Kubernetes based workflow replacement options for SGE - https://phabricator.wikimedia.org/T136264#2476073 (10bd808) [17:10:22] PROBLEM - SSH on tools-worker-1004 is CRITICAL: Server answer [17:27:15] 06Labs, 10Tool-Labs: Webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#2476216 (10russblau) 
05Resolved>03Open Reopening; same symptoms are occurring again. [17:34:12] 06Labs, 10Tool-Labs: Webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#2476239 (10yuvipanda) It seems to be running now - did someone manually start it? [17:35:05] 06Labs, 10Tool-Labs: Webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#1718509 (10tom29739) Works for me. [17:35:29] 06Labs, 10Labs-Kubernetes, 10Tool-Labs: etcd hosts hanging with kernel hang - https://phabricator.wikimedia.org/T140256#2476258 (10yuvipanda) 05Open>03Resolved Closing again until it recurs. [17:38:22] 06Labs, 10Tool-Labs: Webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#2476267 (10russblau) Yes, I had to manually restart it twice today. The automatic webservice restarter is not working. [17:44:22] 06Labs, 10Tool-Labs: Webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#2476309 (10yuvipanda) I moved it to kubernetes and also fixed the issue with the webservice restarter. Can you verify it works fine under kubernetes? (no changes required from your perspective, since it'... [17:47:38] 06Labs, 10DBA, 06Operations: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2476338 (10RobH) [18:38:06] 06Labs, 10Tool-Labs: Webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#2476580 (10russblau) It is down at the moment. "webservice status" says it is running, but "qstat" shows no server process running. [18:44:53] 06Labs, 10Tool-Labs: Webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#2476602 (10yuvipanda) I can see it running now? http://tools.wmflabs.org/dplbot/ Since I switched it to kubernetes, it'll no longer show up in qstat (use 'kubectl get pod' for equivalent). 
[19:26:07] (03CR) 10Dzahn: [C: 04-1] Add #mediawiki-extensions channel for mediawiki extensions [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/292554 (owner: 10Paladox) [19:27:33] (03Abandoned) 10Dzahn: Add #mediawiki-extensions channel for mediawiki extensions [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/292554 (owner: 10Paladox) [19:39:15] 10PAWS: Paws display 502 - Bad gateway error - https://phabricator.wikimedia.org/T140578#2469380 (10Capt_Swing) @yuvipanda I've also been experiencing this issue for the past few days when I try to log in to PAWS. [19:39:42] 10PAWS: Paws display 502 - Bad gateway error - https://phabricator.wikimedia.org/T140578#2476888 (10yuvipanda) @Capt_Swing are you still experiencing it? [20:05:38] (03CR) 10Legoktm: [C: 032] Add more projects to #wikimedia-de-tech [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/299528 (owner: 10Addshore) [20:08:28] cheers legoktm [20:10:37] :) [20:11:47] (03Merged) 10jenkins-bot: Add more projects to #wikimedia-de-tech [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/299528 (owner: 10Addshore) [20:45:58] !log tools.admin Switching web interface to https://phabricator.wikimedia.org/diffusion/1922/tool-admin-web.git [20:46:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL, Master [20:48:16] bd808, tools.wmflabs.org returns 502 Bad Gateway [20:48:22] Is that related to ^ [20:48:27] yeah. working on it [20:49:20] !log tools.admin Reverted to old version running on gridengine [20:49:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL, Master [20:53:14] bd808, is ^ the striker thing you're doing? I heard it's undergoing security review (or passed it?) 
[20:53:45] tom29739: striker is a different app [20:54:17] this is just a rewrite of the php that handles the tools landing and error pages [21:15:56] 06Labs, 10Labs-Infrastructure, 10DBA, 07Epic, 07Tracking: Labs databases rearchitecture - https://phabricator.wikimedia.org/T140788#2477494 (10chasemp) https://phabricator.wikimedia.org/P3514 [21:29:24] 06Labs, 10Labs-Infrastructure, 10DBA: Investigate moving labsdb (replicas) user credential management to 'Striker' (codename) - https://phabricator.wikimedia.org/T140832#2477524 (10chasemp) [21:37:40] 06Labs, 10Labs-Infrastructure, 10DBA, 07Epic, 07Tracking: Labs databases rearchitecture (tracking) - https://phabricator.wikimedia.org/T140788#2477582 (10Danny_B) [21:37:43] 06Labs, 10Labs-Infrastructure, 10DBA, 07Epic, 07Tracking: Labs databases rearchitecture (tracking) - https://phabricator.wikimedia.org/T140788#2475959 (10Danny_B) This should actually rather be a goal (have its own orange tag) instead of a tracking task. [21:57:59] !log tools.admin Switching web interface to https://phabricator.wikimedia.org/diffusion/1922/tool-admin-web.git (take 2) [21:58:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL, Master [21:59:17] ah ha! [22:01:41] or maybe not so ah ha! [22:06:12] !log tools.admin Switching back to old php code again. /admin/tools and error page handling didn't seem to work as expected. Also couldn't get it to work at all as k8s service. 
[22:06:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL, Master [22:30:29] (03PS1) 10Jean-Frédéric: Add two known fields to fr (fr) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/299886 [22:36:16] (03PS1) 10Jean-Frédéric: Harvest Wikidata item in gb-eng (en) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/299890 (https://phabricator.wikimedia.org/T140795) [22:44:59] (03PS1) 10Jean-Frédéric: Harvest Wikidata item for Canada in English ca_(en) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/299891 (https://phabricator.wikimedia.org/T138668) [22:47:11] (03PS2) 10Jean-Frédéric: Harvest Wikidata item in gb-eng (en) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/299890 (https://phabricator.wikimedia.org/T140795) [22:47:33] (03PS2) 10Jean-Frédéric: Harvest Wikidata item for Canada in English ca_(en) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/299891 (https://phabricator.wikimedia.org/T140795) [23:01:16] (03CR) 10Jean-Frédéric: [C: 032] "Nice :)" [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/298174 (owner: 10Lokal Profil) [23:02:05] (03Merged) 10jenkins-bot: Make lines shorter and docstrings standardised [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/298174 (owner: 10Lokal Profil) [23:15:59] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 13Patch-For-Review, 15User-bd808: Modernize the admin tool's codebase - https://phabricator.wikimedia.org/T140254#2477964 (10bd808) I tried to deploy this live to https://tools.wmflabs.org/ and hit some problems: * https://tools.wmflabs.org/admin/tools ret... [23:21:10] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 13Patch-For-Review, 15User-bd808: Modernize the admin tool's codebase - https://phabricator.wikimedia.org/T140254#2477971 (10bd808) Here's how I was testing switching back and forth between the new code and the legacy code: ``` name=Switch tool-admin-web... 
[23:23:40] 06Labs: promethium.wikitextexp.eqiad.wmflabs (10.68.16.2, labs baremetal host) has strange DNS A record result, and missing PTR - https://phabricator.wikimedia.org/T139438#2477980 (10AlexMonk-WMF) @bblack figured out the extra requests are AAAA queries (no thanks to /usr/bin/host), which metaldns will currently... [23:30:04] 06Labs, 13Patch-For-Review: promethium.wikitextexp.eqiad.wmflabs (10.68.16.2, labs baremetal host) has strange DNS A record result, and missing PTR - https://phabricator.wikimedia.org/T139438#2477999 (10AlexMonk-WMF) a:03AlexMonk-WMF [23:36:50] 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Support reverse dns for public labs IPs - https://phabricator.wikimedia.org/T104521#2478015 (10AlexMonk-WMF) So I have a WIP script coming along to do this, but there's a blocker: Designate stores the in-addr.arpa zone under the 'noauth-project' account... W... [23:39:07] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 13Patch-For-Review, 15User-bd808: Modernize the admin tool's codebase - https://phabricator.wikimedia.org/T140254#2478021 (10bd808) p:05Triage>03Normal [23:40:32] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 13Patch-For-Review, 15User-bd808: Modernize the admin tool's codebase - https://phabricator.wikimedia.org/T140254#2458217 (10bd808) [23:40:34] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 15User-bd808: Split OGE grid status data collection out of admin tool - https://phabricator.wikimedia.org/T140251#2478023 (10bd808) 05Open>03Resolved p:05Triage>03Normal https://tools.wmflabs.org/gridengine-status can be used by T140254 or any other... [23:52:29] 06Labs, 10Tool-Labs, 10labs-sprint-117, 06Community-Tech-Tool-Labs, and 6 others: Organize a (annual?) toollabs survey - https://phabricator.wikimedia.org/T95155#2478059 (10egalvezwmf) Hi all - this survey is a great candidate for [[ https://meta.wikimedia.org/wiki/Community_Engagement_Insights | the new c...