[01:32:36] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Edoderoo was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=209603 edit summary:
[01:32:46] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/PetrohsW was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=209604 edit summary:
[01:39:05] Labs, wikitech.wikimedia.org: "Edit with form" missing on a Tools access request page - https://phabricator.wikimedia.org/T118136#1833167 (scfc) And https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Edoderoo. @Krenair: It would be nice if someone with SMW/SMF knowledge could share ho...
[10:42:03] Labs, Tool-Labs, Database: s51078 is executing the same >1h query every 5 minutes - https://phabricator.wikimedia.org/T119695#1833462 (jcrespo) NEW
[10:47:07] Labs, Tool-Labs, Database: s51078 is executing the same >1h query every 5 minutes - https://phabricator.wikimedia.org/T119695#1833470 (jcrespo) Open→Resolved a: jcrespo Throttled to one connection per user.
[10:54:37] Labs, Tool-Labs, Database: s51078 is executing the same >1h query every 5 minutes - https://phabricator.wikimedia.org/T119695#1833485 (jcrespo)
[10:54:38] Labs, Tool-Labs, Database: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#1833484 (jcrespo)
[10:54:56] Labs, Tool-Labs, Database: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#1833486 (jcrespo) p: Triage→High
[10:55:29] Labs, Tool-Labs, Database: tools.joanjoc is executing the same >1h query every 5 minutes - https://phabricator.wikimedia.org/T119695#1833490 (valhallasw)
[11:14:29] Labs, operations: Untangle labs/production roles from labs/instance roles - https://phabricator.wikimedia.org/T119401#1833513 (yuvipanda) I've done this for most things, just a couple left (openldap::labs is a new and sad exception :()
[12:52:13] Labs, Tool-Labs: Setup an icinga instance to monitor tools on tool-labs - https://phabricator.wikimedia.org/T53434#1833695 (zhuyifei1999)
[12:55:46] Tool-Labs-tools-Other, Wikisource: OCR scripts need updating at tools labs by updating the "tesseract-ben" package - https://phabricator.wikimedia.org/T117711#1833698 (Aklapper)
[12:57:27] Labs, operations, Patch-For-Review, Puppet: Self hosted puppetmaster is broken - https://phabricator.wikimedia.org/T119541#1833712 (akosiaris) I set up a new self-hosted puppetmaster environment today and did not encounter this problem.
[13:11:59] Hi, I get this error https://dpaste.de/OQWF while trying to 'announce issue', which sends out Echo notifications to subscribers of a newsletter in vagrant. Tried restarting the redis-server, but it did not... just kept saying "Stopping redis-server: ". Could someone help?
[13:14:01] tinajohnson: well, I got https://dpaste.de/vqUR/raw while trying to log in to wikitech. :\
[14:04:09] !log upgrading zuul on labs to 2.1.0-60-g1cc37f7-wmf3 ( https://review.openstack.org/#/c/249207/2 https://phabricator.wikimedia.org/T97106 )
[14:04:10] upgrading is not a valid project.
[14:04:13] grr
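(The "throttled to one connection per user" fix in T119695 above corresponds to a standard MySQL per-account resource limit. A minimal sketch of how such a throttle and the preceding diagnosis might look, assuming a MySQL 5.5+ labsdb server; the s51078 account name comes from the task title, while the host pattern and the one-hour threshold are illustrative:)

    -- Find queries that have been running for more than an hour.
    SELECT user, host, time, LEFT(info, 100) AS query_head
    FROM information_schema.processlist
    WHERE command = 'Query' AND time > 3600;

    -- Cap the offending account at one concurrent connection.
    GRANT USAGE ON *.* TO 's51078'@'%' WITH MAX_USER_CONNECTIONS 1;

(A per-account cap is coarse, but it stops a single tool from stacking up concurrent long-running queries without blocking it entirely.)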
[17:02:55] Labs, Database: Database replicas: replicate user.user_touched - https://phabricator.wikimedia.org/T92841#1834218 (jcrespo) The importing is taking place now. It will take a while, as we have 5GB of user data per server.
[18:03:01] jynus: Enwiki has a replag of 1:10:58 at the moment
[18:03:11] Luke081515, yes, it is expected
[18:03:13] and it's still growing
[18:03:16] ok
[18:03:17] see server admin log
[18:03:56] ok, thanks
[18:04:10] now that you have exact lag measuring, you are not going to let me pass one, are you? :-)
[18:05:25] is azwiktionary affected too?
[18:05:34] no, only enwiki
[18:05:46] all of them will be eventually
[18:05:51] because azwiktionary has replag > 5 hours
[18:06:00] I doubt it
[18:06:55] https://phabricator.wikimedia.org/P2361
[18:07:16] http://tools.wmflabs.org/betacommand-dev/cgi-bin/replag
[18:07:25] ok, was not the databse problem. Since 14:02 no one edited azwiktionary
[18:07:30] *database
[18:07:35] *not a
[18:07:35] exactly
[18:07:43] that is why my method is more accurate
[18:08:41] Luke081515: like I said, you need to take the numbers with a grain of salt, and know how it's calculated
[18:09:13] Betacommand, can I challenge you to create a similar page with the new table? :-P
[18:09:43] jynus: eventually
[18:09:47] :-)
[18:10:06] jynus: Not in the mood today, and busy all weekend
[18:10:12] of course
[18:10:48] jynus: your method is still not 100%
[18:10:52] why?
[18:11:24] jynus: cases where a query locks a database/table from incoming writes while it works
[18:11:44] I've seen that cause 1-2 hour lags
[18:11:45] when a query locks a table, the whole replication stops
[18:11:56] and so does heartbeat
[18:12:05] jynus: I think that's per database, not per shard
[18:12:10] no
[18:12:16] there is no replication per database
[18:12:25] only shards are replicated
[18:12:36] ah, must have been with the old system
[18:12:36] there is no parallel replication (yet)
[18:12:57] if there was, then I would implement a counter per database
[18:13:16] jynus: I've been using replicas for 10 years now :P
[18:13:30] I can assure you, we are just reusing the setup created on production
[18:13:36] it is very accurate
[18:13:43] only 10 years?
[18:13:54] :-)
[18:14:07] jynus: WMF replicas
[18:14:11] ah!
[18:14:35] so I've seen quite a bit
[18:14:46] we need help on the DBA team
[18:14:52] we are now a group of
[18:14:55] 1 people
[18:15:33] and we are a bit busy, so now I know where I can get help; join #wikimedia-databases
[18:15:46] :.-)
[18:15:49] :-)
[18:16:09] jynus: If you're willing to teach, I'll volunteer my time. I've been doing IT for years.
[18:16:28] sure!
[18:17:03] I hope you can understand why things sometimes go slow at labs; production takes most of my time
[18:17:27] jynus: I remember the days when Brion was the only paid IT person
[18:17:32] true!
[18:17:41] but we have grown a bit also!
[18:17:58] jynus: Hell, I've actually caused a few hiccups in my time
[18:18:05] :-)
[18:18:33] I can still tell you that without volunteer time, this would not work
[18:18:34] trying to clear a 1.2 million item watchlist caused a headache
[18:19:25] that is why the tools are so important
[18:24:37] Luke081515, it should be shrinking now
[18:25:05] but now you have more data to play around with!
[18:25:11] ok, thanks
[18:29:27] Labs, Database: Database replicas: replicate user.user_touched - https://phabricator.wikimedia.org/T92841#1834333 (jcrespo) enwiki has been backfilled; it took 5.81GB of transfer and 1:28:42 (time). Will backfill the rest of the wikis later.
[18:30:45] to go back to the conversation, the "query is executing so the lag is not taken into account" issue is a problem of SHOW SLAVE STATUS' "Seconds_behind_master"
[18:31:19] not of pt-heartbeat; that is why I say it is very accurate
[18:32:18] if a 5-second query was executed on the master, it will show 5 (probably more, e.g. 10, 5 for each slave in between) on the lag
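(For illustration of the pt-heartbeat approach jynus describes: the tool writes a timestamp row on the master every second, and replicas compute lag as the difference between that replicated timestamp and their own clock, which keeps working even while a long query executes. A rough sketch, assuming pt-heartbeat's default table layout with a `ts` column and UTC timestamps; the actual heartbeat_p view schema on labsdb may differ:)

    -- Sketch only: lag per replication channel from a pt-heartbeat table.
    -- Assumes pt-heartbeat defaults (a `ts` timestamp written in UTC).
    SELECT server_id,
           TIMESTAMPDIFF(SECOND, ts, UTC_TIMESTAMP()) AS lag_seconds
    FROM heartbeat.heartbeat;

    -- For contrast, the less accurate number discussed above:
    SHOW SLAVE STATUS;  -- read the Seconds_Behind_Master column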
[18:40:07] jynus: the heartbeat_p.heartbeat is the same from any https://tools.wmflabs.org/bd808-test/
[18:40:19] err.. the same from any db, right?
[18:40:49] that demo I threw up reads it from mysql:dbname=meta_p;host=s7.labsdb
[18:44:27] no
[18:44:40] there are 7 * 3 different replication channels across the servers
[18:45:40] the division is somewhere, let me search for it
[19:13:06] technically that hasn't changed
[19:13:06] I'll play with it tonight. It seems like it shouldn't be too hard to make a nice report
[19:13:06] I could even create a federated table
[19:13:07] * bd808 needs to drive to the turkey eating location now :)
[19:13:07] so that you have the 3 tables on the 3 servers
[19:13:07] :-)
[19:13:07] have fun!
[23:54:04] hi, someone here who can restart xtools?
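(On the "federated table" idea floated at 19:13: MySQL's FEDERATED storage engine, which is disabled by default, lets one server expose a table that actually lives on another server, so the three labsdb hosts' heartbeat tables could be read from a single place. A minimal sketch; the host, credentials, and column names here are hypothetical, not the real labsdb schema:)

    -- Sketch: a FEDERATED table on one host that proxies the heartbeat
    -- table on another. Requires the FEDERATED engine to be enabled.
    CREATE TABLE heartbeat_s7 (
      ts        VARCHAR(26)  NOT NULL,
      server_id INT UNSIGNED NOT NULL,
      PRIMARY KEY (server_id)
    ) ENGINE=FEDERATED
      CONNECTION='mysql://reader:secret@s7.labsdb:3306/heartbeat/heartbeat';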