[00:06:04] CP678: You can tag methods and constants with the @internal tag if you don't want them to show up, see https://www.phpdoc.org/docs/latest/references/phpdoc/tags/internal.html
[00:06:33] Too slow :3
[00:14:33] RECOVERY - Puppet run on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:23:27] CP678: Around?
[02:59:20] Matthew_, yes
[03:00:22] CP678: Excellent. Do you want the docs to be rebuilt via GitHub Webhook, cron job, or a process continuously checking for new git versions?
[03:00:35] Webhook strikes me as the best, but it'll require more programming.
[03:01:03] Matthew_, I'll leave that in your capable hands.
[03:01:57] Okay.
[03:02:52] Matthew_, you know you have write access, correct?
[03:03:19] I believe so, though I'll probably do pull requests anyway.
[03:03:37] Don't believe. Know.
[03:03:50] :P
[03:05:20] I'm in the organization
[03:05:54] You have write access. Use it. :p
[03:06:37] Okay.
[03:06:46] Oops, dinner. Stepping away for a few
[04:57:42] RECOVERY - Puppet run on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:08:18] Labs, Patch-For-Review: Periodic internal labs dns outages - https://phabricator.wikimedia.org/T124680#1962377 (yuvipanda) I've set up https://grafana.wikimedia.org/dashboard/db/labs-dns-dashboard to have some metrics about our DNS systems now - there are more detailed metrics in graphite (response type c...
[05:11:13] Labs, WM-Bot, Privacy: http://wm-bot.wmflabs.org/browser/ is loading assets from multiple 3rd party domains - https://phabricator.wikimedia.org/T133644#2238084 (yuvipanda)
[05:13:53] chasemp: andrewbogott bd808 https://grafana-admin.wikimedia.org/dashboard/db/labs-dns-dashboard labs DNS dashboard! more info in graphite that we haven't put there.
holmium info coming soon once puppet sets up there
[08:10:11] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure: castor.integration.eqiad.wmflabs unreacheable deadlocking the whole CI - https://phabricator.wikimedia.org/T133652#2238394 (hashar) Seems the /dev/vda disk is stalling somehow :( ``` castor login: [2863440.276096] INFO: task jbd2/vda3-8:113 bloc...
[08:29:06] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure: castor.integration.eqiad.wmflabs unreacheable deadlocking the whole CI - https://phabricator.wikimedia.org/T133652#2238451 (hashar) It won't come back. I am going to create a new instance.
[08:34:53] PROBLEM - Host tools-bastion-01 is DOWN: CRITICAL - Host Unreachable (10.68.17.228)
[08:37:09] Labs, Labs-Infrastructure: wmflabs OpenStack is deadlocked (can't boot or delete instances) - https://phabricator.wikimedia.org/T133654#2238466 (hashar)
[08:37:11] Labs, Tool-Labs, Community-Tech-Tool-Labs, Developer-Relations, Documentation: Run a documentation sprint for Labs - https://phabricator.wikimedia.org/T101659#1344508 (Qgil) Would this task or any of the subtasks be a good candidate for #wikimania-hackathon-2016? We are looking for tasks for...
[08:38:50] Labs, Tool-Labs, labs-sprint-117, Community-Tech-Tool-Labs, and 6 others: Organize a (annual?) toollabs survey - https://phabricator.wikimedia.org/T95155#2238498 (Qgil)
[08:38:53] Labs, Labs-Infrastructure: wmflabs OpenStack is deadlocked (can't boot or delete instances) - https://phabricator.wikimedia.org/T133654#2238466 (hashar) castor2 spawned via wikitech yields an error in the Horizon dashboard: ``` Error: Failed to perform requested operation on instance "castor2", the insta...
[08:43:11] Labs, Labs-Infrastructure: wmflabs OpenStack is deadlocked (can't boot or delete instances) - https://phabricator.wikimedia.org/T133654#2238507 (hashar) From Icinga: labvirt1008 Disk space WARNING 2016-04-26 08:38:43 0d 6h 59m 14s 3/3 DISK WARNING - free space: /var/lib/nova/instances 159698 MB (6% ino...
[08:46:06] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure: castor.integration.eqiad.wmflabs unreacheable deadlocking the whole CI - https://phabricator.wikimedia.org/T133652#2238517 (hashar)
[08:56:41] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure: castor.integration.eqiad.wmflabs unreacheable deadlocking the whole CI - https://phabricator.wikimedia.org/T133652#2238545 (hashar) Labs process got restarted and castor instance managed to spawn. Jenkins refuses to add it back as a slave though :(
[08:56:58] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure: castor.integration.eqiad.wmflabs unreacheable deadlocking the whole CI - https://phabricator.wikimedia.org/T133652#2238550 (yuvipanda)
[08:57:01] Labs, Labs-Infrastructure: wmflabs OpenStack is deadlocked (can't boot or delete instances) - https://phabricator.wikimedia.org/T133654#2238547 (yuvipanda) Open>Resolved a:yuvipanda That seems to have fixed it now! I'm going to file a bug to have a paging check for this.
[08:59:46] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure, Monitoring, Operations: Have a paging check for Nova API accessible - https://phabricator.wikimedia.org/T133656#2238563 (yuvipanda)
[09:07:20] Labs, Wikimedia-Labs-General, Developer-Relations: Community-maintained projects on Labs are hard to track - https://phabricator.wikimedia.org/T64837#2238595 (Qgil) p:High>Low
[09:09:04] Labs, Wikimedia-Labs-General, Developer-Relations: Community-maintained projects on Labs are hard to track - https://phabricator.wikimedia.org/T64837#670997 (Qgil)
[09:09:06] Labs, Tool-Labs-tools-Other, Community-Tech-Tool-Labs, Developer-Relations: Create an authoritative and well promoted catalog of Wikimedia tools - https://phabricator.wikimedia.org/T115650#2238597 (Qgil)
[09:10:24] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure: castor.integration.eqiad.wmflabs unreacheable deadlocking the whole CI - https://phabricator.wikimedia.org/T133652#2238604 (hashar) In Jenkins the castor slave thread seems to be blocked. ``` "Channel reader thread: castor" prio=5 BLOCKED hudso...
[09:13:31] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure: castor.integration.eqiad.wmflabs unreacheable deadlocking the whole CI - https://phabricator.wikimedia.org/T133652#2238618 (hashar) Open>Resolved Root cause was Nova acting weirdly fixed by @yuvipanda which "restarted nova-conductor & sc...
[09:16:19] hello
[09:17:13] I found the X! tools really interesting
[09:20:05] MediaWiki-extensions-OpenStackManager: Special:NovaAddress should list instance IP - https://phabricator.wikimedia.org/T60444#2238650 (hashar) Open>declined OpenStackManager is legacy. We are switching to Horizon which has much more information.
[09:21:54] hey Gianfra
[09:22:14] Gianfra: you should probably talk to musikanimal (who is not here atm) about xtools
[09:24:28] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure, Monitoring, Operations: Have a paging check for Nova API accessible - https://phabricator.wikimedia.org/T133656#2238670 (hashar)
[09:25:00] thank you YuviPanda
[09:25:28] musikanimal is one of the developers?
[09:26:06] Gianfra: I think so yes
[09:28:53] I see he is involved in Github
[09:29:02] yeah
[09:29:57] YuviPanda: I am doing some research and one of the X tools can help me a lot
[10:20:39] RECOVERY - Puppet run on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:28:28] Labs, Horizon, Patch-For-Review: Switch dynamicproxy to point back to IP rather than domain names - https://phabricator.wikimedia.org/T133554#2235590 (chasemp) Instead of doing something custom we could probably do a restart of webservices causing a re-register https://wikitech.wikimedia.org/wiki/No...
[13:48:45] Labs, Horizon, Patch-For-Review: Switch dynamicproxy to point back to IP rather than domain names - https://phabricator.wikimedia.org/T133554#2239442 (AlexMonk-WMF) I don't really know what that is but it looks tools-specific
[13:50:16] Labs, Horizon, Patch-For-Review: Switch dynamicproxy to point back to IP rather than domain names - https://phabricator.wikimedia.org/T133554#2239445 (chasemp) ah, yes I was only thinking about changing webservices in tools en masse :)
[14:17:50] hey guys, how bad would 20 minutes of lag on all of labs be?
[14:18:17] some queries are getting stuck, and I may need an upgrade or lag may get worse
[14:18:41] jynus: forever?
I'm honestly not sure on what weird outcomes we may get from that, but what's the general lag now
[14:18:53] not forever :-)
[14:19:02] for 20 minutes, until it recovers
[14:19:22] whatever it takes me to upgrade the sanitarium
[14:20:17] the other alternative is the current state: 2h on s2 and growing
[14:20:22] my general theory is if you need to do it let's go for it :) I can update the topic here and announce it to be transparent
[14:28:43] it does not affect running queries, but I can continue fixing it every single time or do it once for all
[14:34:02] jynus: heuristically, I would say: 20 mins replag everywhere is better than a day of replag on s2
[14:34:12] so maybe send an email now and do it in an hour or so?
[14:34:39] it is even worse than that, it happened on s1 before and now it is happening on s2
[14:54:51] Does anybody know why https://tools.wmflabs.org/persondata/ does not load at all?
[14:55:26] hm, APPER, the maintainer, has been inactive since the 12th of March
[14:56:35] and looks like there were problems in the past too: https://de.wikipedia.org/wiki/Benutzer_Diskussion:APPER#Personensuche-Tool_defekt_2
[15:00:13] valhallasw`cloud, chasemp that should do it for now, if it happens again, we will upgrade the server
[15:09:11] Tool-Labs-tools-Other: tools.persondata 504's (sockets disabled, connection limit reached) - https://phabricator.wikimedia.org/T133697#2239762 (valhallasw)
[15:17:22] Ah, thanks everybody!
[15:22:43] Tool-Labs-tools-Other: tools.persondata 504's (sockets disabled, connection limit reached) - https://phabricator.wikimedia.org/T133697#2239841 (valhallasw) Here, applebot seems to be the culprit. I tried to poke around in the process per https://derickrethans.nl/what-is-php-doing.html , but that doesn't seem...
[15:42:11] hello!
[15:43:02] could any admin check for me why a process gets "KeyboardInterrupt" even though I do not do it myself?
[15:43:24] marmick: kill signal, either from qdel or because it's using too much memory
[15:43:47] ok
[16:00:33] Labs, WM-Bot, Privacy: http://wm-bot.wmflabs.org/browser/ is loading assets from multiple 3rd party domains - https://phabricator.wikimedia.org/T133644#2239971 (scfc) a:Petrb
[16:17:38] Labs, Tool-Labs, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003 - https://phabricator.wikimedia.org/T133705#2240128 (jcrespo)
[16:27:05] Labs, Tool-Labs, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003 - https://phabricator.wikimedia.org/T133705#2240183 (Magnus) This looks like PetScan, which is using the catscan2 username?
[16:30:26] Labs, Tool-Labs, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003 - https://phabricator.wikimedia.org/T133705#2240202 (Magnus) Yes, it's the catscan2 user. Hard to tell what exactly the problem is without seeing the full query. I'll have a look at the concurrency code...
[16:31:20] Labs, Tool-Labs, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003 - https://phabricator.wikimedia.org/T133705#2240215 (jcrespo) To solve the immediate lag issues, are you ok with me killing the long running queries or can you do it?
[16:32:21] Labs, Tool-Labs, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003 - https://phabricator.wikimedia.org/T133705#2240228 (jcrespo) Yes, that is read/examined rows, not the ones returned. Maybe an index can help speeding them up?
[16:34:34] Labs, Tool-Labs, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003 - https://phabricator.wikimedia.org/T133705#2240250 (Magnus) Yes, please kill them. And the database is the dewiki clone, so I can't create an index there...
[16:39:04] Labs, Tool-Labs, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003 - https://phabricator.wikimedia.org/T133705#2240268 (jcrespo) > I can't create an index there Yes, but I can, and I am open to suggestions :-) That isn't as urgent, though, as fixing the current load p...
[17:00:25] Labs: new bootstrap-vz jessie images don't log (and maybe don't start at all) - https://phabricator.wikimedia.org/T133551#2240329 (Andrew)
[17:04:42] Labs, Tool-Labs, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003 - https://phabricator.wikimedia.org/T133705#2240339 (jcrespo) {F3933342}
[17:15:58] Labs, Tool-Labs, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003 - https://phabricator.wikimedia.org/T133705#2240378 (Magnus) Ah yes, I just fixed that query earlier today, see [[ https://bitbucket.org/magnusmanske/petscan/commits/83b76344afae002ab752d5a1f82e79926d552...
[17:20:57] Labs, Tool-Labs, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003 - https://phabricator.wikimedia.org/T133705#2240389 (jcrespo) Server load dropped a lot after the kill, I set a watchdog for long-running concurrent queries. Maybe we can close this now while I search fo...
[17:32:11] Labs: Missing data on labs replica database - https://phabricator.wikimedia.org/T133715#2240421 (Ragesoss)
[18:01:06] CP678: Can you give me repo creation permissions or create me a repo on "MW-Peachy"?
[18:01:22] What do you need the repo for?
[18:01:43] I want to create a repository as backup to the webhook endpoint I'm creating.
[18:01:52] Name?
[18:01:59] Erm...
[18:02:19] docs-web?
[18:03:44] https://github.com/MW-Peachy/docs-web
[18:03:56] Thank you!
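Editor's sketch for the webhook endpoint discussed above: before rebuilding docs on a delivery, an endpoint would normally check that the request really came from GitHub. This assumes the webhook is configured with a shared secret and that deliveries carry an `X-Hub-Signature-256` header of the form `sha256=<hex HMAC-SHA256 of the raw body>`. The function name, the secret, and the sample payload are hypothetical and are not taken from the docs-web repository.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, header_value: str) -> bool:
    """Return True if header_value matches the HMAC-SHA256 of body under secret."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest compares in constant time, so a mismatch does not
    # reveal how many leading characters were correct.
    return hmac.compare_digest(expected, header_value)


# Hypothetical values, for illustration only.
SECRET = b"example-secret"
BODY = b'{"ref": "refs/heads/master"}'
GOOD_HEADER = "sha256=" + hmac.new(SECRET, BODY, hashlib.sha256).hexdigest()

print(signature_is_valid(SECRET, BODY, GOOD_HEADER))            # matching signature
print(signature_is_valid(SECRET, BODY, "sha256=" + "0" * 64))   # forged signature
```

Only after this check passes would the endpoint go on to trigger the rebuild; the cron job and polling options mentioned earlier in the log would not need it, since they pull rather than accept inbound requests.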
[18:05:26] Labs, Tool-Labs, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003 - https://phabricator.wikimedia.org/T133705#2240527 (Magnus) Open>Resolved a:Magnus Happy hunting, and tell me if you find any more of mine!
[18:15:21] Tool-Labs-tools-Other, DBA: Killed very long transaction that was blocking the replica on labsdb1001 - https://phabricator.wikimedia.org/T133086#2240612 (Kelson) OK, I need to identify the request and probably split it in smaller one.
[18:23:27] (PS1) BryanDavis: [WIP] Rewrite jsub in python [labs/toollabs] - https://gerrit.wikimedia.org/r/285435 (https://phabricator.wikimedia.org/T132475)
[18:40:28] mutante: did grrrit-wm die?
[18:47:40] YuviPanda: Nope, last contrib from 20:47 CEST
[18:47:44] (this minute)
[18:47:58] ah ok :D
[18:48:07] thanks Luke081515
[18:49:41] !log wikilabels successfully deployed new version of wikilabels to the staging
[18:49:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL, Master
[18:49:51] halfak: ^
[18:53:46] Amir1, awesome. Next time, you should include the first 7 chars of the commit checksum that was deployed :)
[18:54:13] hmm
[18:54:15] sure
[18:54:17] merged a change by godog https://gerrit.wikimedia.org/r/#/c/284912/ that should stop ganglia::monitor from running on labs instances. it really made no sense that they were running and sending data to aggregators in prod, while ganglia.wmflabs.org is gone
[18:54:22] sorry if it's not there
[18:54:30] E.g. on sept. 7th, I deployed wikilabels-wikimedia-config:d5f59ce
[18:54:36] so don't be surprised if you see a related change on puppet run
[18:54:42] But I've been ignoring the wikilabels logs ever since >:(
[18:54:45] *:(
[18:54:48] shame on halfak
[18:55:18] mutante: thanks!
[18:55:33] halfak: sustainable halfak, no shame on halfak :)
[18:56:17] * halfak works on his sustainable halfak brain farming practices
[19:31:04] YuviPanda: thank you very much for the labs fix up yesterday!
[19:48:52] Quarry: Quarry task running for a while - https://phabricator.wikimedia.org/T133738#2241097 (Ankit-Maity)
[19:49:00] Quarry: Quarry task running for a while - https://phabricator.wikimedia.org/T133738#2241109 (Ankit-Maity) p:Triage>Normal
[19:58:46] (CR) Tim Landscheidt: "If you replace it in situ (as jsub), the test suite will ensure some basic compatibility." [labs/toollabs] - https://gerrit.wikimedia.org/r/285435 (https://phabricator.wikimedia.org/T132475) (owner: BryanDavis)
[20:20:44] Labs, wikitech.wikimedia.org: WikiPage::something error encountered while adding two users to Tools project at the same time - https://phabricator.wikimedia.org/T133742#2241219 (scfc)
[21:02:34] hi, I'm trying to get some data from enwiki.labsdb enwiki_p. Does anyone know if there is an index on page_title?
[21:03:10] probably not: 18 rows in set (3 min 1.33 sec)
[21:03:30] jynus: ^
[21:08:30] physikerwelt: (page_namespace, page_title)
[21:08:49] (that's just the regular mediawiki index)
[21:09:29] thanks, that reduced the time to 0.12 sec
[21:11:35] physikerwelt: what doc could we write on wiki to make that information more discoverable?
[21:12:11] I've seen the question asked before and wonder if there is a place that anyone looks first before asking here
[21:12:37] It's easier to ask on here for some stuff.
[21:13:31] The search works, but sometimes there's stuff without docs, and sometimes it's named something strange, so the search doesn't pick it up.
[21:13:39] tom29739: sure, but I would be willing to bet that the number of people who have that question is much larger than the number of people who think "oh I'll just ask on irc"
[21:14:10] Some people don't know IRC exists.
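Editor's sketch of why adding page_namespace to the lookup helped, as reported at 21:09 above. It uses an in-memory SQLite table, not the labs MariaDB replicas, so the planner wording differs from what enwiki_p would show. The table is a cut-down, hypothetical stand-in for `page`; only the index columns (page_namespace, page_title) come from the conversation, and the index name and sample title are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title TEXT NOT NULL
    );
    -- Composite index: namespace first, then title.
    CREATE UNIQUE INDEX name_title ON page (page_namespace, page_title);
    """
)


def plan(sql: str, params: tuple) -> str:
    """Return SQLite's query-plan description for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Filtering on the second index column only: the index cannot be seeked.
title_only = plan(
    "SELECT page_id FROM page WHERE page_title = ?", ("Example_title",)
)
# Filtering on both columns, leading column included: the index is seeked.
with_namespace = plan(
    "SELECT page_id FROM page WHERE page_namespace = ? AND page_title = ?",
    (0, "Example_title"),
)

print("title only:    ", title_only)
print("with namespace:", with_namespace)
```

In SQLite the first plan is reported as a SCAN (every entry is visited) and the second as a SEARCH using the composite index. The same leading-column rule applies to composite B-tree indexes generally, which is consistent with the drop from about 3 minutes to 0.12 sec reported in the log.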
[21:14:19] exactly
[21:15:07] I'm not trying to discourage questions here, just hoping that the docs could be made a bit better
[21:15:09] Perhaps it would be good to publicise IRC more.
[21:16:25] If I can't find something in the docs, then I ask on here, but others might just stop using labs, or use worse ways to do the same thing, or break rules, etc.
[21:21:30] bd808: as far as indices go, there's both mediawiki.org (pages on database structure) and wikitech (special indices)
[21:21:54] I'm not sure if we link to the MW docs from wikitech
[21:22:05] Some we do, some we don't.
[21:22:50] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database is pretty lacking
[21:23:09] that should probably at least link to https://www.mediawiki.org/wiki/Manual:Database_layout
[21:23:12] there is a note about revision and logging *tno* having indexes
[21:24:06] ...and https://www.mediawiki.org/wiki/Manual:Logging_table#Schema is not very clear in what the indices are either
[21:25:13] https://phabricator.wikimedia.org/diffusion/MW/browse/master/maintenance/tables.sql is the best source, it seems
[21:48:50] I started a little section on schemas -- https://wikitech.wikimedia.org/w/index.php?title=Help:Tool_Labs/Database&diff=463757&oldid=425147
[21:50:02] * bd808 sneaks a NOFX name drop into wikitech at the same time
[22:03:19] Labs, Discovery, Discovery-Search-Backlog, Operations, hardware-requests: eqiad: (2) Relevance forge servers - https://phabricator.wikimedia.org/T131184#2241466 (Deskana)
[22:14:19] does somebody know the dashboard at grafana where I can find account creations prevented by captcha?
[22:14:52] found it
[22:15:16] Luke081515: grafana.wikimedia.org ?
[22:15:36] yeah, I found it at https://grafana.wikimedia.org/dashboard/db/authentication-metrics?panelId=12&fullscreen
[22:16:41] ok
[23:01:04] PROBLEM - Puppet run on tools-web-static-02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[23:34:45] Hi, are the xml dumps accessible from the labs instances via nfs? I vaguely remember that that was possible in the past
[23:44:10] I think there's a project downloading them