[00:13:27] 6Labs, 10hardware-requests, 6operations: eqiad: (6) labs virt nodes - https://phabricator.wikimedia.org/T89752#1134165 (10RobH)
[00:13:41] 6Labs, 10hardware-requests, 6operations: eqiad: (6) labs virt nodes - https://phabricator.wikimedia.org/T89752#1044517 (10RobH) Order placed, lead time is 2-3 weeks for delivery.
[00:14:05] 6Labs, 10hardware-requests, 6operations: eqiad: (6) labs virt nodes - https://phabricator.wikimedia.org/T89752#1134167 (10RobH) p:5Normal>3Low
[00:54:50] hi, is there an easy way to make beta-labs dump certain tables for all of the languages? We are playing with an extension that's already deployed, and I don't want to spend time implementing a proper update
[00:55:03] it's deployed in betalabs, not in prod
[01:14:08] 10Wikimedia-Labs-General, 10Datasets-Archiving, 10Datasets-General-or-Unknown: Make pagecounts-all-sites available in Wikimedia Labs - https://phabricator.wikimedia.org/T93317#1134275 (10Hydriz) 3NEW
[01:14:42] Coren, around? ^^^
[01:15:44] yurik_: I'm probably the wrong one to ask; the beta databases are local to the deployment-prep project and are not using the labs db infrastructure.
[01:16:04] Coren, do you know anyone who might know this?
[01:16:22] is there a list of all DBs?
[01:16:39] i could do it with a simple script
[01:17:05] Hashar for sure. Yuvi also - he worked quite a bit on beta.
[01:19:01] Greg and Chad are also good bets.
[01:19:49] YuviPanda|zzz is asleep (
[01:19:54] thanks though!
[01:20:27] greg-g: ^^
[01:33:59] Coren: Can you help me find out how resource intensive a script I'm writing is? So I can try to work out what it can and can't be used for.
[01:34:39] a930913: As a rule, I probably can, but it's generally easier to add instrumentation yourself. What is your primary concern?
[01:35:28] Coren: It's going to go through the whole wikidata dump on each run.
[01:35:54] Coren: And run many queries to the wikidatawiki db.
[01:36:21] Ideally, I'd like to make it a public-facing webservice that people can run themselves.
[01:37:20] Regardless of how efficient it ends up being, you probably want to make sure you rate limit it. Does it write to the filesystem at all or is it just reading the dump?
[01:38:31] Coren: It currently builds some python arrays in memory, before dumping it into a file.
[01:38:46] And are you reading the dump directly from /public/dumps?
[01:39:06] Coren: Though that could/should probably be made more memory efficient by writing as it's read.
[01:39:34] Coren: No, the dump wasn't there :p It's currently in the tool's userspace.
[01:40:09] If you can, you can ease the load a great deal by using /scratch for the results if they are not long-lived. It's directed at a different controller and a different set of drives - that'd help a lot.
[01:40:46] As for the DB queries, I'm no expert but I know that a little tweaking can make a lot of difference. I'd drop an email to Sean if you want to make sure you aren't making a monster. :-)
[01:41:14] Coren: How not long-lived?
[01:42:45] "Indefinite"; that is, there is no regular wipe, but scratch has no promises of persistence and much less redundancy than the "real" shared storage.
[01:43:17] Unlike /data/project, /data/scratch may be lost in the case of hardware failure and replaced with blank disks.
[01:43:43] OTOH, it's a different channel and is noticeably faster in the general case.
[01:43:43] So I should have a script to clean up after a week or so?
[01:44:01] a930913: That's the best way to be nice to your scratch neighbors. :-)
[01:44:35] The directory is global write + sticky. You can just create a directory there (I recommend your tool name) and use it.
[01:45:51] Coren: If I submit a job for it, how can I record the stats?
[01:46:04] Such as CPU, memory, etc.
[01:46:21] Well, most stats are already collected for you by the gridengine.
You can get them after the fact with 'qacct'.
[01:46:48] Ah, cool. I vaguely remembered that existing.
[01:47:03] I/O use isn't metered though; you can usually estimate it pretty well by subtracting the actual CPU time used from the wall clock.
[01:47:54] Except when "time.sleep(0.1)" is needed to fix a bug XD
[01:48:39] Oh, ha. :-) If the job runs for a long time, you can ssh to the box where it lives and use pidstat - but that only works while it's running.
[01:49:08] Coren: Do you know roughly what order of magnitude of queries per second is acceptable for the db?
[01:50:48] a930913: That depends very much on the query itself. If it ends up doing full table scans, a couple per minute would be a maximum. If it's picking up a single row off an indexed column you can probably get away with dozens per second without it being noticeable.
[01:52:40] Coren: The simple kind I think. 'SELECT term_text FROM wb_terms WHERE term_type="label" AND term_entity_id="%s" AND term_language="en" AND term_entity_type="item"'
[01:53:07] Those are likely to be efficient indeed.
[01:53:10] Label name from ID basically.
[01:53:24] If you can batch them, even more so.
[01:53:45] Hmm.
[01:54:42] Multiple line query, or single long query?
[01:55:48] I think a single query is more efficient because it pays the setup cost just the once - but I would ask a real DBA. :-)
[01:56:51] Coren: How could I do that? Lots of ORs?
[02:01:30] Well, if you have several of term_type for one term_entity_id, or vice versa, for instance, you could do things like 'term_type IN ("a", "b", "c")' etc.
[02:01:47] You probably only want to group by one column.
[02:06:17] Coren: Do you know if it's better to query for one line first and, failing that, get ten, or query the ten every time?
[02:06:47] Not enough to give you a confident answer either way. :-(
[02:10:02] Coren: Thank you for your patience :)
[02:10:20] If you get warnings shortly, don't worry, it's only me overloading the system.
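The batching Coren suggests - one query with an `IN` list instead of one round-trip per entity id - could be sketched like this in Python. The query shape is taken from the `wb_terms` lookup quoted above; the cursor object and batch size are assumptions (any DB-API driver such as MySQLdb/pymysql would fit), and parameters go through the driver rather than manual string interpolation to avoid quoting mistakes.

```python
# Batched version of the single-row wb_terms label lookup discussed above.
# Connection setup is omitted; `cursor` is any DB-API cursor (an assumption).

def build_label_query(entity_ids):
    """Return (sql, params) fetching English item labels for a batch of ids."""
    placeholders = ", ".join(["%s"] * len(entity_ids))
    sql = (
        "SELECT term_entity_id, term_text FROM wb_terms "
        "WHERE term_type = 'label' AND term_language = 'en' "
        "AND term_entity_type = 'item' "
        "AND term_entity_id IN (%s)" % placeholders
    )
    return sql, list(entity_ids)


def fetch_labels(cursor, entity_ids):
    """Run one batched query and return {entity_id: label}."""
    sql, params = build_label_query(entity_ids)
    cursor.execute(sql, params)
    return dict(cursor.fetchall())
```

Batches of a few hundred ids keep the `IN` list at a sane size while still paying the per-query setup cost only once, which is the efficiency Coren describes.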
:p
[02:11:22] Coren: I gave the job 10GB memory to test, is that OK?
[02:12:04] I think it should be, 70% free on that node.
[02:42:59] 6Labs, 10Wikimedia-Labs-Infrastructure, 10Continuous-Integration, 6operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1134376 (10Krinkle)
[03:08:12] Coren: Do you know what unit qstat gives io in?
[03:09:52] The docs just say "The current accumulated IO usage of the job." :/
[03:10:19] a930913: IIRC, it's GB as counted in filesystem blocks in and out
[03:10:42] The problem, I think, is that this may include network i/o
[03:12:01] I guess when it finishes, I can check the ratio of what should to what is.
[03:12:16] Coren: Oh, what time zone is Sean in btw?
[03:12:35] Hm. Eastern Australia I think.
[03:13:50] Australia for sure, but I don't recall which coast.
[05:29:15] https://tools.wmflabs.org/magnustools/multistatus.html the quick intersection service is down
[05:29:24] can someone be of service?
[05:53:00] nobody?
[06:04:48] Hello! Can anyone help me? I have a problem with my bot.
[06:06:11] One file (//tools.wmflabs.org/citing-bot/pmidFns.php) does not work, while the other runs.
[06:42:39] PROBLEM - Puppet failure on tools-webgrid-03 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0]
[06:56:10] 10Tool-Labs: Get rid of portgranter - https://phabricator.wikimedia.org/T93046#1134634 (10yuvipanda) +1 to ^, and we can just loop around if needed (and actually add instrumentation too to detect how often this causes problems).
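Since gridengine does not meter I/O directly, Coren's earlier rule of thumb - subtract CPU time from wall clock to estimate time spent blocked on I/O - is easy to apply to the figures `qacct` reports for a finished job. The field names below are only an assumption about how you would feed the numbers in; the arithmetic is the point.

```python
def estimate_io_wait(wallclock_seconds, cpu_seconds):
    """Crude I/O-wait estimate: wall-clock time minus CPU time.

    Both values would typically come from qacct output for a finished job
    (e.g. its wallclock and cpu figures). Clamped at zero because rounding
    can make reported CPU time slightly exceed the wall clock.
    """
    return max(0.0, wallclock_seconds - cpu_seconds)
```

As noted in the conversation, this is only an approximation: deliberate sleeps, scheduler delays, and network I/O all end up in the same residual.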
[07:01:59] PROBLEM - Puppet failure on tools-login is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:07:44] RECOVERY - Puppet failure on tools-webgrid-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:27:08] RECOVERY - Puppet failure on tools-login is OK: OK: Less than 1.00% above the threshold [0.0]
[08:23:19] https://tools.wmflabs.org/xtools-pages/ looks down
[08:26:34] Nemo_bis: yup. not sure what I can do, though - the webservice seems up and running
[08:26:55] Ok
[08:27:01] Thanks
[08:27:04] the maintainers probably need to poke at something
[08:27:07] Nemo_bis: yw.
[08:27:09] * YuviPanda goes for food
[08:28:22] buon appetito
[08:36:04] 10Tool-Labs: Make webservice2 activities blocking - https://phabricator.wikimedia.org/T93334#1134761 (10yuvipanda) 3NEW
[08:42:11] 6Labs, 10Tool-Labs: Private SSL key got removed on tools-webproxy-01/tools-webproxy-02 and maybe other Labs instances as well - https://phabricator.wikimedia.org/T93212#1134770 (10hashar)
[08:43:48] 6Labs, 10Tool-Labs: Private SSL key got removed on tools-webproxy-01/tools-webproxy-02 and maybe other Labs instances as well - https://phabricator.wikimedia.org/T93212#1134774 (10yuvipanda) Beta has no star.wmflabs.org key, and no other labs instance should have that key either. This needs to go on tools-web...
[10:34:55] 10Tool-Labs-tools-anagrimes: Export Wiktionnaire in dictionary formats - https://phabricator.wikimedia.org/T93340#1134919 (10Darkdadaah) 3NEW a:3Darkdadaah
[10:40:46] 10Tool-Labs-tools-anagrimes: Integrate Anagrimes search forms on fr.wiktionary - https://phabricator.wikimedia.org/T93342#1134941 (10Darkdadaah) 3NEW a:3Darkdadaah
[10:44:22] 10Tool-Labs-tools-anagrimes: Develop a translitteration search - https://phabricator.wikimedia.org/T93343#1134950 (10Darkdadaah) 3NEW a:3Darkdadaah
[10:45:36] 10Tool-Labs-tools-anagrimes: Export Wiktionnaire in dictionary formats - https://phabricator.wikimedia.org/T93340#1134961 (10Darkdadaah) p:5Triage>3Normal
[10:52:18] 10Tool-Labs-tools-anagrimes: Anagrimes: store soft redirects data - https://phabricator.wikimedia.org/T93344#1134980 (10Darkdadaah) 3NEW a:3Darkdadaah
[10:55:40] 10Tool-Labs-tools-anagrimes: Anagrimes: store context data from sense lines - https://phabricator.wikimedia.org/T93346#1134998 (10Darkdadaah) 3NEW a:3Darkdadaah
[13:15:55] 6Labs, 10Continuous-Integration, 6operations: Evaluate options to make puppet errors more visible - https://phabricator.wikimedia.org/T92710#1135199 (10akosiaris) Just making a point here so we avoid future "puppet failed" threads. Puppet itself did not fail in this specific case. A specific resource failed...
[14:07:28] 6Labs, 10Tool-Labs: Private SSL key got removed on tools-webproxy-01/tools-webproxy-02 and maybe other Labs instances as well - https://phabricator.wikimedia.org/T93212#1135258 (10scfc) I was looking at: ``` [tim@passepartout ~/src/operations/puppet]$ git grep privatekey manifests/certs.pp: $privatekey=tru...
[14:19:56] 6Labs, 10Continuous-Integration, 6operations: Evaluate options to make puppet errors more visible - https://phabricator.wikimedia.org/T92710#1135275 (10scfc) Note that this task relates to Labs instances where Icinga can't be used. @yuvipanda had to jump through a lot of hoops to make the existing alerts po...
[14:53:16] andrewbogott, YuviPanda: I'm going to flee early today because I'll have to finish the effing filesystem copy over the weekend. Do you have anything urgent for me before I go?
[14:53:55] Coren: I don’t. I hope the copy is moderately painless :)
[14:54:23] The copy is painless; it's just annoyingly long. People keep writing stuff to disk! :-)
[14:54:41] Coren: it’s easy enough to stop /that/ from happening :)
[14:55:00] Good point. I'll set the filesystem readonly before I leave. Have fun! :-P
[14:55:06] Non-urgently — this needs to accumulate a list of issues that need fixing when we change dns schemes. I welcome your additions: https://phabricator.wikimedia.org/T93087
[14:57:13] I'll poke at it when I get a chance.
[14:58:14] 6Labs, 5Patch-For-Review: Move to a new dns scheme for labs: hostname.projectname.eqiad.wmflabs - https://phabricator.wikimedia.org/T93087#1135425 (10Andrew) - resolv.conf needs to include the new .eqiad.wmflabs domain. - Does the webproxy use dns names or IPs? If the former then that needs fixing
[15:25:03] 6Labs, 10Continuous-Integration, 6operations: Evaluate options to make puppet errors more visible - https://phabricator.wikimedia.org/T92710#1135516 (10akosiaris) >>! In T92710#1135275, @scfc wrote: > Note that this task relates to Labs instances where Icinga can't be used. @yuvipanda had to jump through a...
[15:36:18] Coren: nothing from me either
[15:43:48] andrewbogott: Coren some drafts for next quarter goals at https://etherpad.wikimedia.org/p/toollabs-quarter-goal-q4-2014-15
[15:43:53] or at least, the split off of the primary one
[15:44:04] this doesn’t include the horizon work / OSM replacement work
[16:16:55] 6Labs: Rename keystone role 'projectadmin' to 'admin' - https://phabricator.wikimedia.org/T91830#1135631 (10Andrew) Statement of fact. We need to continue to have a concept of a 'user' who can't create/destroy instances, that'll be done in ldap without any real keystone integration. The hard part is that with...
[16:18:31] 6Labs, 6operations, 10ops-codfw: rack and connect labstore-array4-codfw in codfw - https://phabricator.wikimedia.org/T93215#1135632 (10mark)
[16:19:36] 6Labs: Make sure tools-db is replicated somewhere - https://phabricator.wikimedia.org/T88718#1135636 (10mark) @Coren: can we get an update on this if there has been any progress at all?
[17:41:41] does wikitech really not share the integrated wikimedia identity used elsewhere on wikimedia?
[17:42:07] topaz: nope, separate id and rights, for various reasons
[17:42:09] have I screwed myself up by creating a wikitech account in addition to my SUL wikimedia account?
[17:42:12] ok, just making sure
[17:42:17] tx
[17:46:09] 6Labs, 6Collaboration-Team: Allow wayback machine to crawl flow-tests.wmflabs.org - https://phabricator.wikimedia.org/T93221#1135876 (10EBernhardson) p:5Triage>3High
[17:47:48] 6Labs, 6Collaboration-Team: Allow wayback machine to crawl flow-tests.wmflabs.org - https://phabricator.wikimedia.org/T93221#1135883 (10Mattflaschen) a:3Mattflaschen
[18:43:20] 6Labs, 10Continuous-Integration: Continuous integration should not depend on labs NFS - https://phabricator.wikimedia.org/T90610#1136204 (10greg)
[19:29:33] 6Labs, 7Monitoring, 5Patch-For-Review: Monitor nova services - https://phabricator.wikimedia.org/T90784#1136404 (10Andrew) In addition to process monitoring, something should probably be running 'nova service list' on virt1000 and checking the status there -- in theory that's upgraded via queue messages so w...
[20:35:21] Can anyone help me? I'm getting this error message:
[20:35:22] The URI you have requested, /mono/, is not currently serviced.
[20:35:22] at this page: http://tools.wmflabs.org/mono/
[20:35:54] Mention, PM, or memo me please
[20:37:00] Mono: is the web service running?
[20:39:58] PROBLEM - Puppet failure on tools-webgrid-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:41:07] Mono: run 'webservice2 start' as the tool and it should start then
[21:04:56] RECOVERY - Puppet failure on tools-webgrid-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:59:54] YuviPanda|zzz twentyafterfour: Could I be added to the Phabricator project?
[23:13:50] einyx: Did you figure out your shell access?
[23:24:33] ^d: ^^