[00:07:03] Labs, Tool-Labs, Patch-For-Review: Move aptly backups to a cron rather than puppet - https://phabricator.wikimedia.org/T150726#2984095 (scfc) Open>Resolved a:scfc @bd808, @yuvipanda: I have added [[https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource:Tools/Admin/Deploy_new_jobut...
[00:24:01] Tool-Labs-tools-Pageviews, I18n: massviews-category-description lego for "category" - https://phabricator.wikimedia.org/T146973#2984122 (MusikAnimal) So I tried adding my own link parser, which I just found out about from the [[ https://github.com/wikimedia/jquery.i18n#extending-the-parser | README ]] (...
[01:49:41] !log tools.stashbot Restarted to pick up some temporary logging added to debug T156652
[01:49:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL
[01:49:45] T156652: Stashbot not seeing tools.precise-tools as a valid project - https://phabricator.wikimedia.org/T156652
[02:04:28] Can an admin please restart the enwp10 webservice, logs show no activity since Jan 19. Maintainers were notified on Jan 23, no response.
[02:05:33] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Labs instance ci-jessie-wikimedia-498353 can not be deleted - https://phabricator.wikimedia.org/T156636#2981991 (bd808) The whole pool is full of instances stuck in delete now. Here's a bit more info: ``` nodepool@labnodepool1001:~$ no...
[02:08:37] !log tools.enwp10 Ran `webservice restart` at request of bamyers99 on irc
[02:08:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.enwp10/SAL
[02:09:06] bamyers99: ^ no idea if that fixed anything, but I bounced the service for you
[02:10:45] @bd808: It is working now, thanks
[02:11:09] "have you tried turning it off and on again?"
[02:42:24] Labs, wikitech.wikimedia.org: Setup TorBlock cron on silver to update exit node list - https://phabricator.wikimedia.org/T156733#2984681 (bd808)
[03:24:25] Labs, Tool-Labs: Bot (written in Mono, running on Labs) has lags during stats reading through API - https://phabricator.wikimedia.org/T147109#2984699 (scfc) If an error occurs when a task is run on the grid (with `jsub`/`cron`), but not interactively on the bastion hosts, this often means that the job re...
[05:57:20] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Fabandy was created, changed by Fabandy link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Fabandy edit summary: Created page with "{{Tools Access Request |Justification=collaborative tools testing, ... |Completed=false |User Name=Fabandy }}"
[06:29:16] PROBLEM - Puppet run on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[06:30:53] Labs, Tool-Labs, kubernetes: `webservice restart` fails for k8s service if an interactive pod is also open - https://phabricator.wikimedia.org/T156738#2984798 (bd808)
[06:32:30] Labs, Tool-Labs, kubernetes: `webservice restart` fails for k8s service if an interactive pod is also open - https://phabricator.wikimedia.org/T156738#2984811 (bd808) It looks like it actually did restart the pod. Does it just fail when trying to check to see if the restart worked?
[06:44:51] Labs, Tool-Labs, kubernetes: `webservice restart` fails for k8s service if an interactive pod is also open - https://phabricator.wikimedia.org/T156738#2984813 (zhuyifei1999)
[06:44:53] Labs, Tool-Labs, Tools-Kubernetes: k8s webservice restart failure with `ValueError: get() more than one object; use filter` - https://phabricator.wikimedia.org/T156626#2984816 (zhuyifei1999)
[06:46:49] Labs, Tool-Labs, Tools-Kubernetes: k8s webservice restart failure with `ValueError: get() more than one object; use filter` - https://phabricator.wikimedia.org/T156626#2981472 (zhuyifei1999) Is it really because of having interactive pod open? Last time I tested: `webservice stop; sleep 2; webservice...
[06:49:07] RECOVERY - Free space - all mounts on tools-exec-1221 is OK: OK: tools.tools-exec-1221.diskspace._public_dumps.byte_percentfree (No valid datapoints found)
[07:01:58] Labs, Tool-Labs, Tools-Kubernetes: k8s webservice restart failure with `ValueError: get() more than one object; use filter` - https://phabricator.wikimedia.org/T156626#2984838 (zhuyifei1999) Doing some code reading: * [[https://github.com/wikimedia/operations-software-tools-webservice/blob/3fc2aaa548...
[07:09:16] RECOVERY - Puppet run on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:18:26] Labs, Tool-Labs, Tools-Kubernetes: k8s webservice restart failure with `ValueError: get() more than one object; use filter` - https://phabricator.wikimedia.org/T156626#2984863 (zhuyifei1999) Hmm indeed both `services` and `deployments` are empty right after stopping, but not the webservice pod that's...
[07:24:22] Labs, Tool-Labs, Tools-Kubernetes: k8s webservice restart failure with `ValueError: get() more than one object; use filter` - https://phabricator.wikimedia.org/T156626#2984864 (zhuyifei1999) Is the logic of `Backend.STATE_STOPPED` being iff all of services, deployments and pods under `webservice_labe...
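Editor's note on T156626: the error text suggests that a lookup expecting exactly one object matched several. The sketch below reproduces that failure mode in plain Python and does not use the real Kubernetes client; `Pod`, `filter_pods`, `get_one`, and the label names are all hypothetical. Whether an open interactive pod is actually the trigger is still being questioned in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    # Hypothetical stand-in for a Kubernetes pod: a name plus its labels.
    name: str
    labels: dict = field(default_factory=dict)


def filter_pods(pods, selector):
    """Return every pod whose labels contain all selector pairs."""
    return [
        p for p in pods
        if all(p.labels.get(key) == value for key, value in selector.items())
    ]


def get_one(pods, selector):
    """Strict lookup: fail unless exactly one pod matches the selector."""
    matches = filter_pods(pods, selector)
    if len(matches) > 1:
        raise ValueError("get() more than one object; use filter")
    if not matches:
        raise LookupError("no object matched")
    return matches[0]


pods = [
    Pod("web-1", {"tool": "example", "kind": "webservice"}),
    Pod("interactive", {"tool": "example", "kind": "shell"}),
]

# A selector that is too broad matches both pods and trips the strict lookup.
try:
    get_one(pods, {"tool": "example"})
    broad_error = None
except ValueError as exc:
    broad_error = str(exc)

# Narrowing the selector returns the single webservice pod.
narrow = get_one(pods, {"tool": "example", "kind": "webservice"})
```

If the cause is indeed a second pod sharing a label, either narrowing the selector or switching the status check to a filter-style call would avoid the exception.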
[07:53:57] Tool-Labs-tools-Pageviews, I18n: massviews-category-description lego for "category" - https://phabricator.wikimedia.org/T146973#2984897 (Nikerabbit) It's not message **re**-use if you only use that one message in that one other message. It's not ideal, and requires good message documentation, but it seem...
[09:49:25] hi, is the beta cluster down?
[09:49:26] I get DNS errors
[09:49:32] for https://meta.wikimedia.beta.wmflabs.org/
[09:50:40] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Labs instance ci-jessie-wikimedia-498353 can not be deleted - https://phabricator.wikimedia.org/T156636#2985545 (hashar)
[09:51:11] it's up again
[09:51:49] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Labs instance ci-jessie-wikimedia-498353 can not be deleted - https://phabricator.wikimedia.org/T156636#2981991 (hashar)
[09:52:15] Labs, Tool-Labs, Wikispeech-WMSE, User-LokalProfil, Wikispeech (Sprint 2017-01-25): New toollabs project not found in SAL - https://phabricator.wikimedia.org/T156127#2985548 (Lokal_Profil) a:Lokal_Profil>None
[09:52:33] Labs, Tool-Labs, Wikispeech-WMSE, User-LokalProfil, Wikispeech (Sprint 2017-01-25): New toollabs project not found in SAL - https://phabricator.wikimedia.org/T156127#2985559 (Lokal_Profil)
[09:53:19] Labs, Tool-Labs, Wikispeech-WMSE, User-LokalProfil, Wikispeech (Sprint 2017-01-25): New toollabs project not found in SAL - https://phabricator.wikimedia.org/T156127#2964914 (Lokal_Profil) For completeness: I tried both "wikispeech" and "Wikispeech"
[10:04:07] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Labs instance ci-jessie-wikimedia-498353 can not be deleted - https://phabricator.wikimedia.org/T156636#2985716 (hashar) Not sure what happened with 508783 but eventually it has been deleted: Logs show that some other instances/proj...
[14:07:36] Labs, Analytics-Tech-community-metrics: http://korma.wmflabs.org/ got erased - https://phabricator.wikimedia.org/T156253#2986278 (Lcanasdiaz) @Aklapper if we want to recover the legacy Bitergia dashboard what we have to do is: * deploying a virtual machine with a web server like apache/nginx * clone this...
[16:15:02] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Labs instance ci-jessie-wikimedia-498353 can not be deleted - https://phabricator.wikimedia.org/T156636#2986837 (bd808) The nodepool issues on 2017-01-30 and 31 were very likely caused by a nova-api failure which itself may or may not...
[16:45:12] Labs, Tool-Labs-tools-Other, Community-Tech-Tool-Labs, Developer-Relations, and 2 others: Create an authoritative and well promoted catalog of Wikimedia tools - https://phabricator.wikimedia.org/T115650#2986989 (RHeigl)
[16:46:07] Labs, Tool-Labs-tools-Other, Community-Tech-Tool-Labs, Developer-Relations, and 2 others: Create an authoritative and well promoted catalog of Wikimedia tools - https://phabricator.wikimedia.org/T115650#1729232 (RHeigl) Would be a great support for new developers
[16:52:46] Labs, DBA, Operations, netops, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2987016 (jcrespo)
[16:56:19] Labs, DBA, Operations, netops, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2987020 (jcrespo)
[16:59:01] Labs, DBA, Operations, netops, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2987028 (jcrespo)
[17:54:45] bd808: do you think it'll make sense (in precise-tools) to exclude the jobs that have been submitted without release=trusty under the same name?
[17:58:19] I mean, submitted more recently than the submission with that parameter. i.e. something like if with: add job to list; else: remove job from list; (after if block) change jobs to tools
[17:59:33] also, is it fine if I add matanya to the maintainers if he agrees? since he originally wanted to volunteer, /me feels bad
[18:01:52] more maintainers is almost always a bonus :)
[18:12:11] zhuyifei1999_: +1 to more maintainers certainly
[18:12:30] I think tracking conversion mid-week is more difficult
[18:12:44] I would like to add job name and last seen as precise dates though
[18:12:50] k (hopefully he agrees)
[18:13:01] matanya: ^
[18:13:12] yeah that's what I wish too
[18:13:34] I did not mean to lick this cookie :) I wanted to help get things moving so I was BOLD
[18:14:27] my latest idea would be like the https://tools.wmflabs.org/?status where instead of tools-exec-1217 etc. it would be each tool
[18:14:59] under it would be the name of the job and when it was last run
[18:15:03] yeah something along that line would be good
[18:17:48] great idea
[18:28:49] zhuyifei1999_: I'm in meetings for a bit more today, but I have a POC script that handles the job name + times seen + last seen data collection
[18:29:04] it should be easy to add that to the current code
[18:31:02] k
[20:04:49] Labs, Tool-Labs, Stewards-and-global-tools: For Tool Labs admins: Notice of heavy use of tool in the next month - https://phabricator.wikimedia.org/T156845#2987636 (MarcoAurelio)
[20:14:30] Labs, Tool-Labs, Tools-Kubernetes: "webservice shell" fails with "No such file or directory" (with php5.6) - https://phabricator.wikimedia.org/T156605#2987655 (scfc)
[20:14:47] Labs: Delete http://wikimedia-ui.wmflabs.org/ instance - https://phabricator.wikimedia.org/T156679#2987657 (scfc)
[20:25:23] Tool-Labs-tools-Other, Wikispeech-WMSE, User-LokalProfil, Wikispeech (Sprint 2017-01-25): New toollabs project not found in SAL - https://phabricator.wikimedia.org/T156127#2987662 (scfc) In which channel did you try `!log wikispeech`? `#wikimedia-labs`?
[20:51:11] jynus, around and have time to talk about https://phabricator.wikimedia.org/T146718 ?
[20:55:14] zhuyifei1999_: go ahead
[20:58:57] halfak, discuss what?
[20:59:16] jynus, whether or not you've had a chance to load that table. :)
[20:59:21] no
[20:59:34] Oh. Anything I could help with?
[20:59:35] I will update as in progress when I start
[20:59:43] Gotcha. Thanks.
[21:00:12] I wanted to make that not depend on me
[21:00:19] which is something I mentioned
[21:01:00] But there are some important things before I get to that: https://phabricator.wikimedia.org/project/view/1060/
[21:03:13] jynus, maybe this is something I could do, myself.
[21:03:30] yes, that is the aim in the end
[21:03:53] If you wanted to, say, for example, delegate management of the 'datasets_p' DB to me, I could do some manual work while we make it easier for others to work from the outside.
[21:03:58] the thing is, permissions and access model is not final on the new servers yet
[21:04:11] Given that I'm staff, it'll just take an access request
[21:04:33] it is not a question of trust
[21:04:45] we do not allow anyone to write on those servers
[21:04:50] except replication
[21:05:06] and there is not a way to replicate between servers
[21:05:19] a solution has to be planned and deployed
[21:05:56] jynus, I was under the impression that you'd be willing to just load the dataset for me when we last chatted.
[21:06:04] yes, and I am
[21:06:43] Oh yeah. So I guess what I'm suggesting isn't clear. Maybe I could do that for this and other datasets until we're able to plan and deploy a better solution.
[21:06:56] are you root?
[21:07:34] I guess I could be if there were a reason for me to pick up the mop
[21:07:54] thing is, there is no admin user or anything, as I said
[21:08:13] new servers are WIP, I offered to load them before they are fully setup
[21:09:56] Gotcha. So it seems like you think that passing me some work wouldn't really solve any problems for you right now. I think I understand that. Please do feel free to reach out to me if you're looking for someone to help with DBA work -- especially around tool labs and researchy datasets.
[21:10:13] one thing that would speed things up
[21:10:20] * halfak leans on the edge of his seat
[21:10:26] is contributing to thinking through a model on how to speed things up
[21:10:31] lol
[21:10:33] in terms of permissions
[21:10:39] I am being serious
[21:10:47] or accounts or something
[21:10:58] the main blocker is the replication model
[21:10:58] Permissions as in "who gets to load datasets into datasets_p"?
[21:11:02] no
[21:11:19] how to load things to a server and that they are available on all
[21:11:31] while maintaining consistency between servers
[21:11:37] and not breaking replication
[21:11:57] we are in a load balanced environment (distributed one)
[21:12:10] which makes things a bit more complex
[21:12:27] just running LOAD DATA doesn't work anymore
[21:12:45] I see. So we'd need a way to make sure that any dataset loaded on DBa is also loaded on DBb or we'll get weird, intermittent query performance.
[21:12:49] so if you have some suggestions, we are more than happy
[21:12:56] Labs: Delete http://wikimedia-ui.wmflabs.org/ instance - https://phabricator.wikimedia.org/T156679#2987766 (Volker_E) Open>Resolved a:madhuvishy
[21:12:58] not performance
[21:13:08] well, performance too
[21:13:08] Well, yeah, I guess I mean failures.
[21:13:10] but also
[21:13:22] As in, if a table exists in DBa and not DBb
[21:13:23] you will connect once and the data is there
[21:13:25] exactly
[21:13:41] Is there a task for this?
[21:13:45] we need a "model" (that is why I talked abstractly)
[21:13:52] there is a meta task
[21:14:09] https://phabricator.wikimedia.org/T140788
[21:14:39] jynus, is this problem only on user-created tables?
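Editor's note: the failure described above (a table that exists on DBa but not on DBb, so a load-balanced client sometimes finds the data and sometimes does not) can at least be detected mechanically. The sketch below is a minimal illustration only, using two in-memory SQLite databases as stand-ins for the two servers; `user_tables`, `missing_tables`, and the table names are hypothetical, and a real check against the labs databases would query their own catalogs instead of `sqlite_master`.

```python
import sqlite3


def user_tables(conn):
    """Return the set of table names defined on one server."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}


def missing_tables(source, target):
    """List tables present on source but absent on target."""
    return sorted(user_tables(source) - user_tables(target))


# Two in-memory databases stand in for DBa and DBb.
db_a = sqlite3.connect(":memory:")
db_b = sqlite3.connect(":memory:")

db_a.execute("CREATE TABLE article_quality (page_id INTEGER, score REAL)")
db_a.execute("CREATE TABLE shared (id INTEGER)")
db_b.execute("CREATE TABLE shared (id INTEGER)")

# Tables a client would find on DBa but not on DBb.
drift = missing_tables(db_a, db_b)
```

This only catches whole tables that are missing; it says nothing about differing rows inside a table that exists on both servers.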
[21:14:42] probably commenting here is the best way https://phabricator.wikimedia.org/T153058
[21:15:03] yes, because replicated ones, well, it is a single channel
[21:15:15] the problem is how to handle multiple channels
[21:15:16] OK gotcha.
[21:15:37] feel free to create a specific task, too
[21:16:01] I intend to do this manually soon
[21:16:05] for now
[21:16:14] but there is always something in the way
[21:17:11] because outages always take priority
[21:17:16] over new functionalities
[21:19:26] Ouch. So this is going to need to handle sub-table updates too.
[21:19:56] ?
[21:20:05] E.g. "UPDATE u34567.mybots_table SET blocked = True WHERE user_id = 123456"
[21:20:29] sorry, I do not understand
[21:20:30] So we can't just duplicate table creates or have a cron job that notices new tables and makes a copy.
[21:20:42] in general, the best way to get me to do something
[21:20:51] is to propose a puppet patch
[21:21:10] Yeah. I'm just thinking through the scope of this problem so that I can capture it in a phab task.
[21:22:01] yes, an n<->n solution is unlikely to work
[21:22:16] maybe some kind of unique poing of write
[21:22:33] but even that is not free of problems
[21:22:43] "unique poing"?
[21:22:44] what prevents someone from importing a large dataset
[21:22:48] *point
[21:22:52] ahh
[21:22:57] and blocking replication for other users
[21:23:06] it is not easy
[21:23:11] which means it is not done yet
[21:23:24] which is the reason we still are in a "doing things manually" phase
[21:23:27] :-(
[21:23:58] how large is your dataset?
[21:24:01] Maybe that's not so bad. Maybe we could let users choose where their user-created tables live.
[21:24:19] and then what happens
[21:24:27] when that specific server goes down
[21:24:34] or I have to upgrade mysql
[21:24:35] ?
[21:24:51] welcome to ops, where service can be guaranteed, but servers can't
[21:24:54] :-)
[21:24:58] 15.6GB to start and 2.1GB per year
[21:25:10] ok
[21:25:14] if you are in a hurry
[21:25:16] jynus, when that server goes down, the tool/analysis stops
[21:25:48] but do you think all users will be happy about that?
[21:26:04] remember the idea is not to solve your specific needs, but most of all users
[21:26:24] one thing you could do *now*
[21:26:33] is load it to tools-db
[21:26:41] jynus, yeah. Thinking from this point of view I'd be more worried about Ops' point of view where one DB going down is a serious event.
[21:26:42] I think it may have enough space
[21:26:54] actually it isn't
[21:27:07] the service going down is the serious event
[21:27:16] but if a server goes down
[21:27:25] and the table is only on that server
[21:27:28] that is a problem
[21:27:41] that is why we need at least a replica to another place
[21:28:10] so, as I said, what I can offer you *now* is tools db
[21:28:18] which probably has enough space now
[21:28:28] until we solve the replica model
[21:28:59] it has enough disk space
[21:29:16] I cannot guarantee the IOPS, which was the whole point of the new labsdbs
[21:29:19] jynus, yeah that's what I was saying
[21:29:43] jynus, I never made an estimate of IOPS for you
[21:29:56] that is ok, load it there now
[21:30:14] and we can stop it if we run out of them
[21:30:20] and later move it back to labsdbs
[21:31:47] Considering this, I'm going to have a problem with not being able to join to replica tables.
[21:34:45] The tools-db doesn't have a set of replicas. That cuts the primary utility of this dataset. I might be able to query one DB to get a set of page_ids and then use that to subset the article quality table...
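Editor's note: the workaround floated in the last message (query one database for a set of page_ids, then use them to subset the article quality table on another server) amounts to doing the join in the application. A minimal sketch, assuming two in-memory SQLite databases as stand-ins for the replica and tools-db; the schemas, sample rows, and the helper `quality_for_titles` are hypothetical.

```python
import sqlite3

# Stand-in for a wiki replica holding the page table.
replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
replica.executemany(
    "INSERT INTO page VALUES (?, ?)",
    [(1, "Alpha"), (2, "Beta"), (3, "Gamma")],
)

# Stand-in for tools-db holding the user-loaded dataset.
tools_db = sqlite3.connect(":memory:")
tools_db.execute("CREATE TABLE article_quality (page_id INTEGER, prediction TEXT)")
tools_db.executemany(
    "INSERT INTO article_quality VALUES (?, ?)",
    [(1, "FA"), (2, "Stub"), (4, "GA")],
)


def quality_for_titles(titles):
    """Emulate a cross-server join: fetch ids on one server, filter on the other."""
    if not titles:
        return []
    marks = ",".join("?" * len(titles))
    ids = [
        page_id
        for (page_id,) in replica.execute(
            f"SELECT page_id FROM page WHERE page_title IN ({marks})", titles
        )
    ]
    if not ids:
        return []
    marks = ",".join("?" * len(ids))
    return tools_db.execute(
        "SELECT page_id, prediction FROM article_quality "
        f"WHERE page_id IN ({marks}) ORDER BY page_id",
        ids,
    ).fetchall()


result = quality_for_titles(["Alpha", "Beta", "Gamma"])
```

The cost is two round trips and an ID list held in memory. A large ID set would probably need to be sent in batches, which is part of why the approach looks less attractive than a server-side join.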
[21:34:50] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Fabandy was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1433062 edit summary:
[21:34:52] I'll look into that, but I'm skeptical.
[21:44:00] Labs, Tool-Labs, Prod-Kubernetes, Tools-Kubernetes: Unify docker code and configuration between tools and prod kubernetes - https://phabricator.wikimedia.org/T156852#2987873 (yuvipanda)
[21:53:55] Labs, Tool-Labs, Tools-Kubernetes: k8s webservice restart failure with `ValueError: get() more than one object; use filter` - https://phabricator.wikimedia.org/T156626#2987937 (yuvipanda) Yeah, I think that logic is bogus. I think two things need to happen: 1. We need to figure out what's wrong with...
[21:57:52] Labs, Tool-Labs, Prod-Kubernetes, Tools-Kubernetes: Unify docker code and configuration between tools and prod kubernetes - https://phabricator.wikimedia.org/T156852#2987951 (yuvipanda) Might have to switch to using CNI for flannel with tools as well.
[21:59:12] Labs, Labs-Infrastructure, DBA, Patch-For-Review: Migrate existing labs users from the old servers, if possible using roles and start maintaining users on the new database servers, too - https://phabricator.wikimedia.org/T149933#2987956 (yuvipanda) Update: the replica script doesn't actually work...
[22:07:51] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Fabandy was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1433450 edit summary:
[22:09:42] (PS1) Jean-Frédéric: Harvest whether an image is geolocated in the image table [labs/tools/heritage] - https://gerrit.wikimedia.org/r/335364
[22:11:02] (CR) Jean-Frédéric: "This was made after a request of User:Braveheart" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/335364 (owner: Jean-Frédéric)
[23:00:09] Labs, Labs-Infrastructure, DBA: Design a method for keeping user-created tables in sync across labsDBs - https://phabricator.wikimedia.org/T156869#2988296 (Halfak)
[23:17:24] !log tools.precise-tools Oops, I broke it. Investigating
[23:17:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.precise-tools/SAL
[23:38:03] Labs, Tool-Labs, Stewards-and-global-tools: For Tool Labs admins: Notice of heavy use of tool in the next month - https://phabricator.wikimedia.org/T156845#2988428 (chasemp) p:Triage>Normal @MarcoAurelio the intent here is very much appreciated. There isn't anything up front that I'm concer...
[23:54:33] !log wikispeech test log
[23:54:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikispeech/SAL
[23:55:12] Tool-Labs-tools-Other, Wikispeech-WMSE, User-LokalProfil, Wikispeech (Sprint 2017-01-25): New toollabs project not found in SAL - https://phabricator.wikimedia.org/T156127#2988495 (chasemp) Open>Invalid Works for me in `wikimedia-labs` ```!log wikispeech test log stashbot Logged the mess...