[00:01:02] Labs, Tool-Labs, Labs-Infrastructure: Can't delete rule in default security group - https://phabricator.wikimedia.org/T112492#1647654 (Krenair)
[00:03:45] andrewbogott, around?
[00:03:49] oh, marked as away
[01:44:37] yuvipanda: so for my mediawiki-docker project I think k8s would be a natural next step; it isn't ready yet, but I would be interested in the beta
[01:45:14] Negative24: sure! put your name in on the ticket?
[01:45:18] Negative24: I think it'll be very interesting
[01:45:42] yuvipanda: just drop in a note?
[01:45:50] I think it will be interesting as well
[01:45:50] yeah
[01:45:55] ok
[01:46:01] but first we need to find a way to run mediawiki in docker nicely
[01:46:11] what to do about mysql? offer it as a data container? not sure
[01:47:42] Labs, Tool-Labs: Kubernetes Beta Signup List - https://phabricator.wikimedia.org/T112824#1647850 (Negative24) 1. mediawiki-docker 2. PHP (and Dockerfile) 3. webservice 4. no 5. no 6. no Not ready for prime time but still interested.
[01:47:59] yuvipanda: ^ done
[01:48:13] I've been serving mysql linked with a data-only container
[01:48:22] right, so then there's the question of where that data gets saved
[01:48:39] that's also a concern of mine
[01:48:52] I've been ignoring it until I can get extensions working right
[01:50:58] yuvipanda: let me flesh out a few more details. I'm sure it'll come up when implementing
[01:51:28] ok :)
[01:51:31] docker-compose is really nice
[01:51:45] it is. I want to get into it more
[05:27:07] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore, ended up corrupted. Need rebuild - https://phabricator.wikimedia.org/T110052#1648007 (Krinkle)
[05:27:36] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore, ended up corrupted. Need rebuild - https://phabricator.wikimedia.org/T110052#1648008 (Krinkle) p: Triage>High
[05:28:27] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore, ended up corrupted. Need rebuild - https://phabricator.wikimedia.org/T110052#1567858 (Krinkle) Status? T110506 seems to be resolved. We're on...
[05:58:33] Labs, Tool-Labs: Kubernetes Beta Signup List - https://phabricator.wikimedia.org/T112824#1648067 (intracer) I have not deployed the tool to Toollabs yet, but going soon # [[ https://commons.wikimedia.org/wiki/Commons:WLX_Jury_Tool | WLX Jury Tool]] and other WLM/WLE related tools # Scala # Webser...
[07:42:13] (CR) Hashar: [C: -1] "@Yuvipanda is that still any useful?" [labs/toollabs] - https://gerrit.wikimedia.org/r/204013 (https://phabricator.wikimedia.org/T93197) (owner: Yuvipanda)
[08:10:12] Labs, Tool-Labs: Decide on Docker image policies for Tool Labs Kubernetes - https://phabricator.wikimedia.org/T112855#1648246 (yuvipanda) NEW
[08:11:09] Labs, Tool-Labs: Decide on Docker image policies for Tool Labs Kubernetes - https://phabricator.wikimedia.org/T112855#1648260 (yuvipanda)
[08:11:53] (Abandoned) Yuvipanda: Do not depend on portgranter [labs/toollabs] - https://gerrit.wikimedia.org/r/204013 (https://phabricator.wikimedia.org/T93197) (owner: Yuvipanda)
[08:12:44] Labs, Tool-Labs: Decide on Docker image policies for Tool Labs Kubernetes - https://phabricator.wikimedia.org/T112855#1648246 (yuvipanda)
[08:49:02] Labs, Tool-Labs: Decide on Docker image policies for Tool Labs Kubernetes - https://phabricator.wikimedia.org/T112855#1648344 (yuvipanda)
[09:05:30] Labs, Tool-Labs: Decide on Docker image policies for Tool Labs Kubernetes - https://phabricator.wikimedia.org/T112855#1648420 (yuvipanda)
[11:05:35] Labs, Maps: maps-warper /mnt vbd partition errored, turned read only and went missing after reboot - https://phabricator.wikimedia.org/T112641#1648641 (Chippyy)
[11:27:42] Labs, Maps: maps-warper /mnt vbd partition errored, turned read only and went missing after reboot - https://phabricator.wikimedia.org/T112641#1648686 (Susannaanas) It is crucial to get enough capacity for warping maps before it is possible to use the tool more widely. We expect to be able to do mass uplo...
[13:09:20] Labs, Beta-Cluster: beta-hhvm.wmflabs.org? - https://phabricator.wikimedia.org/T111657#1648873 (Reedy) Just kill it!
[13:53:07] Labs, Labs-Other-Projects: Create Cyberbot Project on Labs - https://phabricator.wikimedia.org/T112881#1649053 (Cyberpower678)
[13:55:37] Coren, I opened a ticket. ^ Do you need me to provide more info right now?
[13:55:50] * Coren checks.
[13:56:46] Cyberpower678: I'd add an estimate of what resources you need with your current iteration, at least, just to show you've done your homework. :-)
[13:56:47] Hello, is anyone here a member of the phab projects phabricator, WMF-NDA or operations?
[13:57:04] Luke081515: I'm in ops.
[13:57:19] Coren, I don't want to give anyone a heart attack
[13:57:37] Cyberpower678: It's not about heart attacks, it's about planning resources. :-)
[13:57:43] Can you add me to triagers? I want to put this task in the new project, when it is created: https://phabricator.wikimedia.org/T43492
[13:57:45] Cyberpower678: Being upfront is better than getting there and breaking all the things for everyone
[13:59:39] Reedy, Coren: True, but I'm worried my request may sound unreasonable. I don't want to be known as the person who wastes resources.
[13:59:53] Luke081515: I'd leave that to Andre; he's the main bugwrangler and I wouldn't want to step on his toes.
[13:59:59] Coren, check your pm
[14:00:00] Cyberpower678: That's for the people in the know to judge
[14:00:05] Cyberpower678: Hence the importance of showing your work. :-)
[14:00:09] Cyberpower678: could you add what the task is to the ticket? :P
[14:00:09] It's not wasting if you're doing something useful
[14:00:36] /relevant/interesting to the projects etc
[14:01:49] Reedy, addshore: well the task is supposed to attach archive links to sources that are dead and/or tagged dead, while submitting pages still alive to be archived into the wayback machine.
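For context on the exchange above: the bot's core loop is "check a link; if it's dead, swap in an archive; if it's alive, submit it for archiving". A minimal Python sketch of that idea against the Internet Archive's public Wayback availability API follows; the helper names, the use of the requests library, and the example URL are illustrative assumptions, not Cyberbot's actual PHP implementation.

```python
import requests

WAYBACK_API = "https://archive.org/wayback/available"

def is_dead(url, timeout=10):
    # Rough liveness probe: treat network errors and 4xx/5xx as dead.
    # A production bot would retry and distinguish soft-404s.
    try:
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        return resp.status_code >= 400
    except requests.RequestException:
        return True

def closest_snapshot(url):
    # Ask the Wayback Machine availability API for the closest snapshot.
    resp = requests.get(WAYBACK_API, params={"url": url}, timeout=10)
    snap = resp.json().get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None

url = "http://example.com/some-source"          # hypothetical citation URL
if is_dead(url):
    print(closest_snapshot(url) or "no archive found; tag the link as dead")
else:
    print("alive; could be submitted to the Wayback Machine for archiving")
```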
[14:02:12] With that said, it's going to run through millions of pages continuously.
[14:02:17] On several wikis
[14:02:45] It also tags links as dead if it finds they're dead and can't get an archive.
[14:03:08] going to be on enwiki?
[14:03:12] So it's constantly analyzing articles.
[14:03:15] addshore, yes
[14:03:20] and 29 other wikis
[14:03:22] made a brfa yet? ;)
[14:03:30] addshore, it's approved
[14:03:38] link? :O
[14:03:45] Cyberpower678: That sounds useful. So not a "waste" of resources
[14:04:00] it's approved but the code isn't done yet? ;)
[14:04:15] I guess for labs, it'd just be a case of ramping up, rather than spawning thousands of simultaneous requests etc etc
[14:04:21] initially
[14:04:22] Reedy, I'm quite honored actually to be able to provide such a useful bot, and that members of WMF have taken an interest in it.
[14:04:23] Like I said, Cyberpower678, explain what resources you need and how you arrived at those numbers on the ticket. We're all about giving useful stuff what they need to work.
[14:05:13] Cyberpower678: Or if you don't know, be honest about that too. Tell us what you do know
[14:05:16] addshore: AFAIK, he's already got a small scale version running but needs moar resources to (a) catch up and (b) scale to more wikis.
[14:05:18] Alrighty. I came across something new that might reduce my resource usage so I will have to recalculate those numbers.
[14:05:40] You won't be in trouble if you need more etc
[14:06:31] https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/Cyberbot_II_5 ? :)
[14:06:33] The way my bot was set up was to process redirects. Quite wasteful. By removing pages in a query that removes redirects, I can knock off roughly 6-7 million pages according to a query
[14:06:41] addshore, yea
[14:06:55] So https://github.com/cyberpower678/Cyberbot_II/blob/master/deadlink.php ? :)
[14:07:36] addshore, code is somewhat out of date. I'm constantly tinkering with the code.
[14:08:11] My limit can be less than 1G though. :-)
[14:08:40] addshore, on labs the bot has run as low as 350 MB.
[14:09:09] addshore, though on my windows machine, it runs no higher than 128 MB. :p
[14:10:05] addshore, this bot can spawn multiple copies of itself though, which will improve speed.
[14:11:27] addshore, logically that allows the bot to get through Wikipedia in a reasonable amount of time, but it will rapidly build up RAM and CPU usage.
[14:11:49] how many articles are you looking at?
[14:12:08] Almost 5 million
[14:12:12] Was 12 million
[14:12:19] 1GB memory isn't a lot
[14:12:30] Hence, me needing to recalculate my numbers.
[14:12:55] Cyberpower678: where do the 5 million / 12 million come from?
[14:13:39] addshore, 12 million comes from the number of pages in the 0 namespace on enwiki from a DB query.
[14:13:51] ah, so all of them ;)
[14:14:09] addshore, 5 million comes from a more refined query that filters out redirects.
[14:14:28] * Cyberpower678 didn't realize that there are almost 7 million redirects. :O
[14:15:35] That would greatly save a lot of time on enwiki.
[14:15:45] And API queries.
[14:17:17] addshore, fashioning the multi query function is probably the best time saver I came up with. I essentially reduced several hundred curl execs per article to exactly 3.
[14:17:38] multi query function? link? :P
[14:17:40] https://en.wikipedia.org/wiki/Special:Statistics tells you nearly 5M
[14:17:42] ;)
[14:18:18] Reedy, My head was obviously stuck in sand in regards to that. :D
[14:19:19] addshore, check the repo code you pulled up.
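The 12 million / 5 million figures above come straight from the MediaWiki `page` table: namespace 0 includes redirects, and `page_is_redirect` filters them out. A sketch of the two counts against a Tool Labs enwiki replica; pymysql, the `enwiki.labsdb` host alias, and the `~/replica.my.cnf` credentials file follow Tool Labs conventions and are assumptions, not something stated in the log.

```python
import os
import pymysql

# Read-only connection to the enwiki replica (host/credentials assumed).
conn = pymysql.connect(host="enwiki.labsdb", db="enwiki_p",
                       read_default_file=os.path.expanduser("~/replica.my.cnf"))
with conn.cursor() as cur:
    # All mainspace pages, redirects included (~12M at the time of the log).
    cur.execute("SELECT COUNT(*) FROM page WHERE page_namespace = 0")
    total = cur.fetchone()[0]
    # The refined query: excluding redirects drops this to roughly 5M.
    cur.execute("SELECT COUNT(*) FROM page "
                "WHERE page_namespace = 0 AND page_is_redirect = 0")
    articles = cur.fetchone()[0]
print(total, articles, total - articles)  # the gap is the ~7M redirects
```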
[14:19:26] On the bottom.
[14:19:37] lol ?>
[14:19:51] http_parse_headers?
[14:20:08] oh wait, function multiquery
[14:23:13] Reedy, lol?
[14:23:24] Oh
[14:23:26] You shouldn't use ?> at the end of a file
[14:23:27] LOL
[14:23:34] OH
[14:23:38] WHY?
[14:23:44] Gah caps lock
[14:24:11] Reedy, why not?
[14:24:31] https://stackoverflow.com/questions/4410704/why-would-one-omit-the-close-tag
[14:35:05] Reedy, huh. My debugger always adds it automatically for me.
[14:40:03] Labs, Tool-Labs, user-notice: *.wmflabs.org https certificate expired (tools.wmflabs.org) - https://phabricator.wikimedia.org/T112608#1649174 (matej_suchanek)
[14:46:35] yuvipanda: postgres?
[14:59:02] Labs, Labs-Sprint-114, Patch-For-Review: Setup an availability checker for all labsdb hosts - https://phabricator.wikimedia.org/T107449#1649263 (Andrew)
[14:59:26] Labs, Tool-Labs, Labs-Sprint-114, Patch-For-Review, ToolLabs-Goals-Q4: Setup a tools checker service that can check all internal services for availability - https://phabricator.wikimedia.org/T97748#1649264 (Andrew)
[15:12:58] Hey
[15:14:05] Labs, Tool-Labs, Labs-Sprint-114, Patch-For-Review, ToolLabs-Goals-Q4: Setup a tools checker service that can check all internal services for availability - https://phabricator.wikimedia.org/T97748#1251358 (Andrew)
[15:15:36] Labs, Beta-Cluster: beta-hhvm.wmflabs.org? - https://phabricator.wikimedia.org/T111657#1649349 (Krenair) Open>Resolved a: Krenair done
[15:19:14] Does anyone here know a guy called Shilad
[15:19:19] Idk what his nick is
[15:19:26] He was in this channel yesterday
[15:27:43] Labs, Tool-Labs: Packages from the aptly server are not installable - https://phabricator.wikimedia.org/T112699#1649417 (scfc) I agree that signing the packages is better. I'll file subtasks for creating a key and signing the existing packages.
[15:28:24] Labs, Tool-Labs: Sign packages on aptly server with Tool Labs packages key - https://phabricator.wikimedia.org/T112901#1649421 (scfc) NEW
[15:34:28] Labs, Tool-Labs: Create Tool Labs packages key - https://phabricator.wikimedia.org/T112905#1649483 (scfc) NEW
[15:35:41] Labs, MediaWiki-Configuration, wikitech.wikimedia.org: Update wikitech config to allow local logging via Monolog - https://phabricator.wikimedia.org/T106697#1649491 (Krenair) Done in https://gerrit.wikimedia.org/r/#/c/221825/
[15:35:56] Labs, wikitech.wikimedia.org: Update wikitech config to allow local logging via Monolog - https://phabricator.wikimedia.org/T106697#1649492 (Krenair) Open>Resolved a: bd808
[16:09:05] Labs, Tool-Labs: Permission issues and/or failure to load Ruby environment on trusty - https://phabricator.wikimedia.org/T106170#1649619 (coren) I can confirm after quite a bit of testing that (a) the bug can only occur on files created on one client then accessed on another, (b) the symptom goes away aft...
[16:09:28] Labs, Labs-Other-Projects: Create Cyberbot Project on Labs - https://phabricator.wikimedia.org/T112881#1649620 (Cyberpower678) I ran through some sloppy calculations of what it would take to run the bot on the 30 largest wikipedias, in an estimated timeframe, and here's what I arrived at. Personally 3 da...
[16:09:47] Coren, ^
[16:10:32] Yep, that helps.
[16:10:42] Coren, don't have a heart attack. ;-)
[16:11:18] Coren, that's going to need a considerable amount of CPU though.
[16:11:49] Especially if you do want the overkill option.
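Cyberbot's `multiquery` function itself is PHP and lives in the repo linked above; as a rough Python illustration of the same idea (collapsing hundreds of per-article requests into a few batched ones), the MediaWiki API accepts up to 50 titles per `action=query` request for normal users. Continuation handling is omitted for brevity, and the User-Agent string is a placeholder.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
session = requests.Session()
session.headers["User-Agent"] = "example-batch-check/0.1 (contact: ...)"

def batched_extlinks(titles, batch_size=50):
    # Fetch external links for many pages in batches of up to 50 titles
    # instead of issuing one request per page.
    for i in range(0, len(titles), batch_size):
        batch = titles[i:i + batch_size]
        resp = session.get(API, params={
            "action": "query",
            "prop": "extlinks",
            "ellimit": "max",
            "titles": "|".join(batch),
            "format": "json",
        })
        for page in resp.json()["query"]["pages"].values():
            yield page.get("title"), page.get("extlinks", [])

for title, links in batched_extlinks(["Foo", "Bar"]):  # hypothetical titles
    print(title, len(links))
```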
[16:15:25] Labs, Tool-Labs: Permission issues and/or failure to load Ruby environment on trusty - https://phabricator.wikimedia.org/T106170#1649631 (MusikAnimal) Awesome, thank you @coren and @valhallasw!
[16:17:15] Cyberpower678: https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=Virtualization%2520cluster%2520eqiad&tab=m&vn=&hide-hf=false
[16:17:21] We're only using 20% of the total cpu
[16:17:22] xD
[16:17:37] (I know, all the hosts won't run vms)
[16:30:06] Reedy, what are the prod tech specs?
[16:30:18] Of what?
[16:30:59] You have 500+ CPUs. How much power does each have?
[16:36:17] Reedy, ^?
[16:36:56] I believe that's actually CPU cores
[16:37:21] So you have 528 cores. But what's the frequency?
[16:37:28] If you drill down to the actual host...
[16:37:29] https://ganglia.wikimedia.org/latest/?c=Virtualization%20cluster%20eqiad&h=labvirt1001.eqiad.wmnet&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[16:37:33] 48 cores
[16:37:44] I dunno if that includes HT
[16:38:02] I suspect the quickest answer to that is cat /proc/cpuinfo
[16:38:26] Reedy, do it. :D
[16:38:35] You can do it too :P
[16:38:39] tools-login says cpu MHz : 2693.248
[16:38:47] model name : Intel Xeon E312xx (Sandy Bridge)
[16:38:55] But shouldn't you guys know the specs of the CPUs you outfit your services with?
[16:39:14] That's a nice one
[16:39:26] I'm not ops
[16:39:55] And they'll probably not know off the top of their heads either with so many machines
[16:44:47] Labs, Tool-Labs: Permission issues and/or failure to load Ruby environment on trusty - https://phabricator.wikimedia.org/T106170#1649823 (coren) In the meantime, @MusikAnimal, you may be able to work around the issue temporarily by doing a stat() on problematic files before opening them. It sucks, and I...
[16:46:06] yuvipanda: I’m going to lunch, but if you have a moment please look at ~tools.toolschecker/foo.py
[16:46:27] It works fine, except I can see that it is clearly leaking records and yet when I look at the db by hand there are no records. Which makes me question… everything
[17:53:10] Labs, Tool-Labs: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#1650110 (coren) @valhallasw: Possibly silly question, but does webservicewatcher actually handle the case where a job is queued for restart properly? If you get a bunch of jobs all at o...
[18:47:52] Special request: can a labs admin please add myself and ewulczyn to the deployment-prep project?
[18:48:23] awight: That's a normal access request, awight. Do you have a phab task for it?
[18:49:17] awight: Although I'm not sure our usual process works for it. Technically, though, any admin in the project can add you.
[18:49:36] I'd much rather it be one of them, if at all possible.
[18:50:00] Labs, Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, and 3 others: Betawiki EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#1650382 (awight) NEW
[18:50:38] Coren: OK sure, thanks for the pointer!
[18:52:42] Labs, Beta-Cluster, Ops-Access-Requests, operations: Add AWight and EWulczyn to the deployment-prep Nova project - https://phabricator.wikimedia.org/T112927#1650389 (awight) NEW
[18:54:36] Labs, Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, and 3 others: Betawiki EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#1650405 (Ottomata) eventlogging and database have been moved to deployment-eventlogging03 instance. Needed to upgrade from Precise...
[18:57:08] Labs, Tool-Labs: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#1650419 (valhallasw) Ah. Yes, that might be true. Checking one example: ``` tools.gerrit-patch-uploader@tools-bastion-01:~$ qstat job-ID prior name user state submit/st...
[18:57:18] Labs, Beta-Cluster, Ops-Access-Requests, operations: Add AWight and EWulczyn to the deployment-prep Nova project - https://phabricator.wikimedia.org/T112927#1650422 (Krenair) Why is this in #Ops-Access-Requests and #operations?
[18:58:46] Hi, I'm not able to log in to tool labs. It just hangs when I run the ssh command, with no further information.
[18:59:04] Problem at my end?
[18:59:25] Labs, Beta-Cluster, Ops-Access-Requests, operations: Add AWight and EWulczyn to the deployment-prep Nova project - https://phabricator.wikimedia.org/T112927#1650429 (greg) 1) Beta Cluster, not beta labs (please rename your wikipage), see https://wikitech.wikimedia.org/wiki/Labs_labs_labs 2) no nee...
[19:00:59] Niharika, ssh tools-login.wmflabs.org works for me
[19:01:29] Platonides: Thanks. I guess my internet is too slow today.
[19:03:59] Labs, Beta-Cluster: Add AWight and EWulczyn to the deployment-prep Nova project - https://phabricator.wikimedia.org/T112927#1650451 (Krenair) Failed to add Awight to deployment-prep. Failed to add EWulczyn to deployment-prep.
[19:04:07] Labs, Beta-Cluster: Add AWight and EWulczyn to the deployment-prep Nova project - https://phabricator.wikimedia.org/T112927#1650453 (greg) Should be done, now. (And thanks for renaming :) )
[19:04:44] Labs, Beta-Cluster, User-greg: Add AWight and EWulczyn to the deployment-prep Nova project - https://phabricator.wikimedia.org/T112927#1650458 (Krenair) a: greg Ah, looks like @greg got there first
[19:04:48] Labs, Beta-Cluster, User-greg: Add AWight and EWulczyn to the deployment-prep Nova project - https://phabricator.wikimedia.org/T112927#1650462 (Krenair) Open>Resolved
[19:23:43] Labs, Tool-Labs: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#1650595 (scfc) That feels like wrong behaviour of `WebServiceMonitor` to me. If there already is a job in any state, IMHO it shouldn't start another one.
[19:23:56] Coren, can you help me out with a very basic mysql issue?
[19:24:09] andrewbogott: Sure. What be up?
[19:24:49] I just added you to the ‘toolschecker’ tool. If you become toolschecker then you can look at ~/foo.py
[19:25:20] Simple script, creates a record with the epoch timestamp and then (due to a #commented out bit) leaks it
[19:25:43] What am I looking at? What is this intended to do?
[19:25:44] My issue is that when I log into mysql directly for confirmation, I cannot see the records that that script should be creating.
[19:26:04] It’s going to be just a catchpoint test to verify that we can read and write to tools-db
[19:26:23] So… it seems to work fine, except that I’m clearly misunderstanding something fundamental.
[19:26:38] I don't know pymysql but I'm 99% sure you're being bitten by a transaction.
[19:26:48] ah, I need to confirm or commit or something?
[19:26:59] You probably just want to add an explicit commit - or tell the library to commit on close.
[19:27:01] pymysql is poorly documented :(
[19:27:08] ok, I’ll google for something like that; thanks.
[19:27:17] cur.execute("COMMIT") should do it.
[19:27:39] (And I wouldn't be surprised if there wasn't a cur.commit member too)
[19:27:39] I believe there is a conn.commit()
[19:27:45] maybe a cur.commit() too
[19:27:59] yep, that’s all there was to it.
[19:28:17] So, without the commit()… what was happening?
[19:28:23] I could read the record back, so it was going somewhere
[19:28:35] Does the table have a local cache until it’s committed?
[19:28:47] * andrewbogott has done surprisingly little of this
[19:29:29] andrewbogott: While you are within a transaction, you see its results - but until a commit nothing else can.
[19:29:38] ok.
[19:29:43] (Because the transaction is - by definition - atomic as a whole)
[19:30:00] So a commit() can fail, if there’s a collision with another transaction elsewhere?
[19:30:14] Presumably, pymysql defaults to rollback at close - which is not insane since that means that a crashing script won't commit a half-done transaction.
[19:31:00] Am I in a new transaction as soon as I commit, or do I need to re-open or something? I want to verify that the record is really, truly writing to disk.
[19:31:06] Yes. The semantics of a sql commit statement is "behave as though the entire transaction was attempted instantaneously at the time of the commit and either succeed as a whole or fail as a whole"
[19:31:25] You're in a new transaction as soon as you commit or rollback.
[19:31:29] ah, ok. That’s reasonable.
[19:31:30] Thank you!
[19:32:26] np
[19:40:24] Coren: I’d appreciate a review of https://gerrit.wikimedia.org/r/#/c/239182/ (and for that matter, the preceding patch as well.)
[19:41:04] * Coren looks at both
[19:45:47] They both lgtm, but I'd really rather the first patch not include a dummy two-liner testjob.py when you could just invoke /bin/true or /bin/sleep.
[19:45:51] andrewbogott: ^^
[19:46:15] I don’t think /bin/true will work since it exits immediately. But sleep should work.
[19:46:38] Why do you need it to not exit immediately?
[19:46:55] won’t job_running() return false if the job is finished?
[19:47:07] * Coren ponders.
[19:47:24] Yeah, probably, but then you're also courting problems relying on a sleep.
[19:47:37] Why not do '-sync y' instead?
[19:48:12] …because I don’t know what that is :)
[19:48:12] (with an alarm() for timeout instead)
[19:48:17] Hah.
[19:48:28] Hi
[19:48:35] Alarm is going to suck with flask :)
[19:48:36] -sync y to qsub/jsub says "wait until the job completes"
[19:48:42] * yuvipanda feels a bit sick
[19:49:01] yuvipanda: ah. No other reliable way to do a timeout then?
[19:49:04] Ah — in that case, I think my answer is “because I’m trying to test a thing that our users actually do”
[19:49:23] andrewbogott: I tried the postgres user account yesterday but couldn't figure out the right set of permissions to grant it. I was hoping to catch Alex today but didn't...
[19:49:41] yuvipanda: I take it no one really uses postgres, huh?
[19:49:46] andrewbogott: '-sync y' is the same as without, only qsub waits; under the hood it does exactly the same.
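A minimal sketch of the failure mode just diagnosed, assuming pymysql and a hypothetical `heartbeat` table in a hypothetical user database: with autocommit off (pymysql's default), the writer sees its own uncommitted row, but any other session sees nothing until `conn.commit()`. The host alias and credentials path follow Tool Labs conventions and are assumptions.

```python
import os
import pymysql

conn = pymysql.connect(host="tools-db", db="s12345__checker",
                       read_default_file=os.path.expanduser("~/.my.cnf"))
with conn.cursor() as cur:
    cur.execute("INSERT INTO heartbeat (ts) VALUES (%s)", (1442694000,))
    cur.execute("SELECT COUNT(*) FROM heartbeat")
    print(cur.fetchone())    # visible here, inside the open transaction
conn.commit()                # without this, the row is rolled back at close
# Alternative: pymysql.connect(..., autocommit=True)
```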
[19:50:24] andrewbogott: yeah except halfak
[19:50:43] Coren: so could the job just be ‘sleep 600’? Not sure how args are parsed there...
[19:50:57] Yeah, 'sleep 600' would work.
[19:51:08] * andrewbogott tries
[19:51:26] (You might want to /bin/sleep explicitly, but the path certainly includes /bin so no biggie)
[19:54:57] (PS1) Dzahn: add fake private key for tendril.wm [labs/private] - https://gerrit.wikimedia.org/r/239190
[19:55:37] (CR) Dzahn: [C: 2 V: 2] add fake private key for tendril.wm [labs/private] - https://gerrit.wikimedia.org/r/239190 (owner: Dzahn)
[19:56:21] Labs, Tool-Labs: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#1650792 (coren) >>! In T109362#1650595, @scfc wrote: > That feels like wrong behaviour of `WebServiceMonitor` to me. If there already is a job in any state, IMHO it shouldn't start anot...
[20:08:27] Labs, Tool-Labs: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#1650865 (scfc) The problem with "all states but any variation on queued" is T107878. When a restart fails, `WebServiceMonitor` just keeps on trying every 15 s.
[20:12:33] Labs, Tool-Labs: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#1650896 (coren) >>! In T109362#1650865, @scfc wrote: > The problem with "all states but any variation on queued" is T107878. When a restart fails, `WebServiceMonitor` just keeps on tryi...
[20:29:47] Labs, Tool-Labs, Labs-Sprint-114, Patch-For-Review, ToolLabs-Goals-Q4: Setup a tools checker service that can check all internal services for availability - https://phabricator.wikimedia.org/T97748#1650999 (Andrew)
[20:30:33] yuvipanda: is ‘Reading/Writing to all three labsdb replicas works’ different from 'Setup an availability checker for all labsdb hosts’?
[20:30:54] Labs, Labs-sprint-112: Restore some files from /home/gwicke - https://phabricator.wikimedia.org/T110698#1651013 (cscott) 0.4.1all has been pushed to releases.wikimedia.org, and all the installation instructions updated.
[20:31:06] Labs, Labs-sprint-112: Restore some files from /home/gwicke - https://phabricator.wikimedia.org/T110698#1651018 (cscott) Open>Resolved a: cscott
[20:31:47] Labs, Labs-sprint-112: Restore some files from /home/gwicke - https://phabricator.wikimedia.org/T110698#1651021 (cscott) Resolved>Open Whoops, closed the wrong bug. @gwicke, do you still need those files restored, now that T111213 is resolved?
[20:32:29] Labs, Labs-sprint-112: Restore some files from /home/gwicke - https://phabricator.wikimedia.org/T110698#1651029 (cscott)
[20:40:37] andrewbogott: I guess it would include toolsdb too
[20:41:20] so, we have tools-db, labsdb1001, labsdb1002, labsdb1003, labsdb1004, labsdb1005
[20:41:23] Is that all?
[20:41:33] Or is there something else mysterious, ‘labsdb replicas’?
[20:42:52] Coren, so about my bot's resources. Does it seem reasonable?
[20:43:14] yuvipanda: I’m trying to understand how https://phabricator.wikimedia.org/T107449 overlaps with the things enumerated in https://phabricator.wikimedia.org/T97748
[20:43:26] Cyberpower678: "Not insane"?
[20:43:38] Coren, heh
[20:44:25] andrewbogott: brrr it is loading very slowly. No, labsdb replica just refers to labsdb1001, 2 and 3
[20:44:35] ok, so those are redundant... but I need to add a test for writing I guess
[20:44:51] Yup redundant
[20:44:54] Coren, is it possible to get an ETA on when the project can, if at all, get approved.
[20:44:55] ?
[20:45:06] oh, except, tools doesn’t ever write to 1001,2,3,5 does it? only to toolsdb?
[20:45:30] Cyberpower678: I'll bring the subject up Monday at the lab's weekly meeting.
[20:46:02] Cyberpower678: Or you can beg and grovel andrewbogott to look at it now since he's the one of us who best knows what we have available. :-)
[20:47:06] andrewbogott, pleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleasepleaseplease
[20:47:07] :D
[20:47:16] Cyberpower678: link?
[20:48:11] Labs, Tool-Labs, Labs-Sprint-114, Patch-For-Review, ToolLabs-Goals-Q4: Setup a tools checker service that can check all internal services for availability - https://phabricator.wikimedia.org/T97748#1651150 (Andrew)
[20:48:38] andrewbogott, https://phabricator.wikimedia.org/T112881
[20:50:08] Labs, Tool-Labs, Labs-Sprint-114, Patch-For-Review, ToolLabs-Goals-Q4: Setup a tools checker service that can check all internal services for availability - https://phabricator.wikimedia.org/T97748#1251358 (Andrew) I could use guidance for the remaining tasks: - Starting a webservice (lighttpd)...
[20:50:32] andrewbogott, did I grovel enough?
[20:53:21] Labs, Labs-Other-Projects: Create Cyberbot Project on Labs - https://phabricator.wikimedia.org/T112881#1651203 (Andrew) It's not immediately clear to me what VM resources we're talking about. Can you tell me more about what those numbers mean? Will the storage be on NFS or local instance storage? If th...
[20:53:26] andrewbogott: tools can write to them all
[20:54:37] ok
[20:55:03] yuvipanda: are some of those shards of others, or is each a stand-alone system?
[20:55:53] andrewbogott: 1, 2, 3 are replicas of each other for the most part, although they have some standalone user dbs too
[20:55:53] 5 is standalone and so is 4
[20:56:13] um...
[20:56:15] andrewbogott, Actually those numbers are RAM usage.
[20:56:40] Cyberpower678: ok, can you rephrase your request in terms of X instances of Y size?
[20:57:08] What do you mean?
[20:57:35] This is a request for a project. Don't I have the ability to create my own instances?
[20:58:11] andrewbogott, ^
[20:59:25] yes, but I want to know how many resources you need in your project…
[20:59:34] so I can evaluate whether or not we have those resources
[21:00:54] an x-large has 8 CPUs and 16GB of ram, so that would run ~32 instances of your bot, if I understand correctly
[21:02:20] so about six or seven x-large instances?
[21:05:49] Ideally, I would like 170GB of RAM, and enough CPU to simultaneously execute 340 php scripts
[21:06:10] andrewbogott, ^
[21:06:24] I think you can better judge how many scripts can run on a single CPU
[21:08:30] How would Andrew be able to judge that? It's not clear what the bottleneck of your bot is
[21:08:41] it's not even clear what it /does/ ;-)
[21:08:50] Hmm..
[21:09:12] It was always my impression that PHP hardly uses CPU when running scripts.
[21:09:28] And that many PHP scripts can run on a single CPU
[21:09:33] again, that depends on what the script does
[21:09:48] if it does network requests, and the network is the bottleneck, the CPU will not be busy
[21:09:51] valhallasw`cloud, it trolls around the largest 30 wikis
[21:09:57] if you are calculating a million digits of pi....
[21:10:09] Making millions of HTTP requests.
[21:10:49] It simultaneously executes hundreds at a time though.
[21:11:29] measure it?
[21:11:33] sounds like a queue
[21:12:34] run a limited batch on tool labs using 'time <command>', which will show you real/user/sys
[21:12:37] Labs, Labs-Other-Projects: Create Cyberbot Project on Labs - https://phabricator.wikimedia.org/T112881#1651309 (Andrew) > Ideally, I would like 170GB of RAM, and enough CPU to simultaneously... 170 GB is about 50% of the ram of a single hardware node, or about 5% of RAM of all of labs. That...
[21:12:39] Assuming I can cram roughly 50 scripts on a CPU based on the current scripts running on the cyberbot exec node, and factoring CPU gaps, 8 CPUs?
[21:13:47] valhallasw`cloud, andrewbogott ^
[21:14:24] Cyberpower678: the memory/cpu ratio is constant (1 CPU per 2GB memory)
[21:15:09] valhallasw`cloud, with that I would get 85 CPUs. I think that is excessive.
[21:15:57] legoktm, it seems I am attracting your attention. :p
[21:16:15] maybe. anyway, I'm off to bed.
[21:16:33] Cyberpower678: huh?
[21:16:49] You just subscribed to my phab ticket.
[21:21:09] oh
[21:21:10] yeah
[21:21:17] I think the amount of resources you need is crazy
[21:21:29] also why are you running things in parallel?
[21:21:54] legoktm, to speed things up
[21:22:01] but https://www.mediawiki.org/wiki/API:Etiquette#Request_limit
[21:22:10] also, there's no deadline
[21:22:59] I'm trying to find a way to reduce the resource demand
[21:24:08] legoktm, of course. I'm not trying to push it.
[21:24:34] legoktm, I agree the amount of resources to achieve this task is ridiculous
[21:24:57] But that's based on my current calculations.
[21:27:12] legoktm, like I mentioned in the phab ticket, the bot is a constant work in progress.
[21:28:19] I will surely find ways to speed the bot up in serial operation, which will speed parallel operation up, and then I can demand fewer resources for the same speed.
[21:29:25] Gotta go for now
[21:30:40] Labs, Labs-Other-Projects: Create Cyberbot Project on Labs - https://phabricator.wikimedia.org/T112881#1651421 (Cyberpower678) Like I said earlier, the bot is still in development. I will continue to improve resource usage. I must say these are rather high demands, so I will continue to shave GBs off of...
[21:34:54] Labs, Labs-Other-Projects: Create Cyberbot Project on Labs - https://phabricator.wikimedia.org/T112881#1651449 (Legoktm) Running bots in parallel isn't a great idea: https://www.mediawiki.org/wiki/API:Etiquette#Request_limit
[21:59:20] Labs, Labs-Sprint-114, Patch-For-Review: Setup an availability checker for all labsdb hosts - https://phabricator.wikimedia.org/T107449#1651600 (Andrew) Alex, yuvi thinks that only you know how to provide me with a set of creds on labsdb1004. Is that right?
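In the spirit of the availability checker discussed in the tasks above, here is a sketch of a read/write probe that avoids both pitfalls from earlier in the day: the missing commit and the leaked records. The host aliases, database, table, and column names are hypothetical, and pymysql is an assumption; the log does not show the real toolschecker code.

```python
import os
import time
import pymysql

def check_readwrite(host):
    # Write a row stamped with the current epoch time, commit, read it
    # back, then delete it so repeated checks don't leak records.
    conn = pymysql.connect(host=host, db="checker", connect_timeout=5,
                           read_default_file=os.path.expanduser("~/.my.cnf"))
    try:
        now = int(time.time())
        with conn.cursor() as cur:
            cur.execute("INSERT INTO checks (ts) VALUES (%s)", (now,))
            conn.commit()
            cur.execute("SELECT COUNT(*) FROM checks WHERE ts = %s", (now,))
            ok = cur.fetchone()[0] >= 1
            cur.execute("DELETE FROM checks WHERE ts <= %s", (now,))
            conn.commit()
        return ok
    finally:
        conn.close()

for host in ("tools-db", "labsdb1001", "labsdb1004", "labsdb1005"):
    print(host, check_readwrite(host))
```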
[21:59:40] Labs, Labs-Sprint-114, Patch-For-Review: Setup an availability checker for all labsdb hosts - https://phabricator.wikimedia.org/T107449#1651603 (Andrew)
[22:15:46] Labs, Tool-Labs, Labs-Sprint-114, Patch-For-Review, ToolLabs-Goals-Q4: Setup a tools checker service that can check all internal services for availability - https://phabricator.wikimedia.org/T97748#1651702 (Andrew)
[22:30:25] Hi, it looks like the meetbot logs for June 10 are gone, link http://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-06-10-21.02.html in https://phabricator.wikimedia.org/T99268#1355187 is a 404
[22:31:28] now https://meta.wikimedia.org/wiki/Meetbot cautions "Exported minutes are not backed up". But people (me) don't have time for the administrivia of copying and pasting stuff. Can I request these files get backed up?
[22:36:14] marktraceur: hi, you're listed as a maintainer of meetbot, does it squirrel logs away somewhere?
[22:39:59] Labs: Document labs SSH Fingerprints in sha256 format - https://phabricator.wikimedia.org/T112993#1651792 (Platonides) NEW
[22:45:00] shouldn't I be able to view deployment-prep instances on https://wikitech.wikimedia.org/wiki/Special:NovaInstance ?
[22:45:19] Platonides: you might need to log out and in again
[22:45:38] legoktm, why?
[22:46:00] because it's broken :P
[22:46:06] spagewmf: so those are backed up on NFS, and just those logs might be missing for the 9-day period during the NFS outage
[22:46:13] spagewmf: file a bug? Coren, can we still recover these
[22:46:15] ?
[22:46:34] that's silly, I have been a member for more than a year
[22:47:05] yuvipanda, thanks, will do. Do I need to file a general bug to request backup of meetbot's files? Also user tto had the great idea of replaying wm-bot's logs so meetbot can rebuild its fancy logs
[22:47:05] Hey spagewmf, you are welcome!
[22:47:07] well, I still don't see instances...
[22:47:29] wm-bot please replay 2015-06-10 so meetbot will do it's thing
[22:47:37] *its
[22:47:39] hmmm
[22:48:43] spagewmf: so all of tools' files that do not end in .log are backed up daily now
[22:49:04] spagewmf: so I'd say that the 'not backed up' comment is inaccurate
[22:49:40] yuvipanda: that's great news! I'll chip in $25 for a backup tape :-) Can you update that meta page?
[22:51:09] spagewmf: done so
[22:51:20] spagewmf: that comment was accurate at the time it was made
[23:20:19] Labs, Labs-Infrastructure: restore Meetbot logs from around 2015-06 lost in NFS outage - https://phabricator.wikimedia.org/T113000#1651922 (Spage)
[23:20:51] yuvipanda, Coren ^. Thanks!
[23:20:52] Labs, Labs-Other-Projects: Create Cyberbot Project on Labs - https://phabricator.wikimedia.org/T112881#1651927 (Earwig) I don't understand. What kind of work are you doing that requires so much memory?
[23:31:12] Labs, Labs-Other-Projects: Create Cyberbot Project on Labs - https://phabricator.wikimedia.org/T112881#1651968 (Cyberpower678) >>! In T112881#1651927, @Earwig wrote: > I don't understand. What kind of work are you doing that requires so much memory? I want to run the DeadLinksBot in parallel to improve i...
[23:31:46] Labs, Labs-Other-Projects: Create Cyberbot Project on Labs - https://phabricator.wikimedia.org/T112881#1651969 (Cyberpower678) >>! In T112881#1651927, @Earwig wrote: > I don't understand. What kind of work are you doing that requires so much memory? I also intend to run it on 29 other wikis.
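legoktm's API:Etiquette link earlier in the day is about exactly this: MediaWiki asks bots to make requests in series rather than in parallel, and to send `maxlag` so they back off whenever the replica databases are lagged. A hedged Python sketch of such a loop, assuming the requests library; the User-Agent string and page titles are placeholders.

```python
import time
import requests

API = "https://en.wikipedia.org/w/api.php"
session = requests.Session()
session.headers["User-Agent"] = "example-serial-bot/0.1 (contact: ...)"

def api_get(params):
    # One request at a time; maxlag=5 makes the server refuse work when
    # replication lag exceeds 5s, returning a "maxlag" error instead.
    params = dict(params, format="json", maxlag=5)
    while True:
        resp = session.get(API, params=params)
        data = resp.json()
        if data.get("error", {}).get("code") == "maxlag":
            # Server is lagged; wait as instructed, then retry.
            time.sleep(int(resp.headers.get("Retry-After", 5)))
            continue
        return data

for title in ["Foo", "Bar"]:          # process pages one after another,
    api_get({"action": "query",       # not in hundreds of parallel workers
             "prop": "info", "titles": title})
```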
[23:32:17] Labs, Labs-Other-Projects: Create Cyberbot Project on Labs - https://phabricator.wikimedia.org/T112881#1651972 (Cyberpower678) But again, these resource demands are not final. I'm still working on the bot's resource usage.
[23:59:43] Labs, Labs-Other-Projects: Create Cyberbot Project on Labs - https://phabricator.wikimedia.org/T112881#1652025 (Platonides) If I understand correctly, DeadLinksBot parses pages, fetching archived sources or submitting them for archival. That seems a very linear task. Why does it need 500MB per worker? Also,...
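Platonides' 500MB-per-worker question is answerable empirically: run one worker on a limited batch and read the child process's accounting, which also settles the earlier real/user/sys point about whether the workload is CPU- or network-bound. A sketch of one way to do that from Python; the worker command line is illustrative, not Cyberbot's actual CLI.

```python
import resource
import subprocess
import time

start = time.time()
# Hypothetical invocation of a single bot worker on a small test batch.
subprocess.run(["php", "deadlink.php", "--limit", "500"], check=True)
wall = time.time() - start

usage = resource.getrusage(resource.RUSAGE_CHILDREN)
print("peak RSS: %.1f MiB" % (usage.ru_maxrss / 1024.0))  # ru_maxrss is KiB on Linux
print("wall %.1fs, user %.1fs, sys %.1fs" % (wall, usage.ru_utime, usage.ru_stime))
# If user+sys is a small fraction of wall time, the worker mostly waits on
# the network and many workers can share one CPU; if not, budget more CPUs.
```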