[01:31:05] 10DBA, 13Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2871541 (10Papaul) I chat with HP once again, the RAID controller is integrated to the main board so they will have to send another tech onsite to: 1- Replace again the main board 2- Update the sys... [06:26:59] 10DBA, 13Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2871740 (10Marostegui) Thanks for all the updates Papaul! This time the RAID got rebuilt just fine, with no predictive failure, so I am going to try to make the server crash again. [06:33:32] 10DBA, 07Epic, 07Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#2871745 (10Marostegui) [07:13:31] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2871781 (10Marostegui) I have started the s7 file transfer from db2068 to dbstore2001 [07:23:18] 10DBA, 10MediaWiki-Database, 06Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2871789 (10Marostegui) >>! In T152761#2869535, @kaldari wrote: > @Marostegui: Apparently, it's taking a long time to complete the... [08:08:05] 10DBA, 13Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2871830 (10Marostegui) Server crashed and power is off. ``` iLO Advanced 2.40 at Dec 02 2015 Server Name: WIN-12 Server Power: Off ``` As always, there is no log as I didn't delete the ones from... 
[08:25:08] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2871849 (10Marostegui) The metawiki database in s7 has one partitioned table (pagelinks) ``` /*!50100 PARTITION BY RANGE (pl_namespace) (PARTITION p_2 VALUES LESS THAN (3) ENGINE = In... [08:42:56] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2871899 (10Marostegui) I have tested the workaround in my local environment and I am facing some issues, with column mismatch (even though I am using same MariaDB version from both se... [08:45:13] marostegui: FYI I've ack'ed db1073 host down on icinga so it doesn't show up on the tactical overview as a host down unhandled [08:47:02] is it down?? [08:47:07] i didn't see that on icinga [08:47:26] ah, I think it was down for maintenance [08:47:48] it was in the unhandled hosts down, I know it's under work for the issues, and has already notification disabled [08:47:58] ah ok ok :) [08:48:02] just ack'ed and linked to the same task ;) [08:48:09] thanks :) [08:48:43] yw :) [09:29:46] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2871958 (10jcrespo) Drop them, but do not add them back after transfer. We want to be able to copy them from dbstore2001 to any other host easily. [09:44:31] 07Blocked-on-schema-change, 10DBA, 07Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#2871964 (10Legoktm) [09:51:10] jynus: hey! I was unavailable for a few days because I lost power and any connectivity due to a cyclone. 
am back today and picking up on the labsdb work again [09:51:29] I'll read through phab mail and get back [09:52:33] just wanted to let you know :) [10:05:35] 10DBA, 10MediaWiki-API, 10MediaWiki-Database, 05MW-1.29-release-notes, and 4 others: ApiQueryExtLinksUsage::run query has crazy limit - https://phabricator.wikimedia.org/T59176#2872005 (10jcrespo) [10:07:35] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2872022 (10jcrespo) [10:21:33] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2872071 (10Legoktm) [10:23:17] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2872076 (10jcrespo) [10:24:20] jynus: I marked that task as a blocker to rolling out the train further. Is there a way we can identify the offending queries? [10:25:45] I wonder if it could be related to populateLocalAndGlobalIds.php running on loginwiki (https://phabricator.wikimedia.org/T152761#2869535) which is on s3 [10:26:36] mmm [10:27:20] we do not have the infrastructure for logging the queries (remember it was determined as not grafana-public ok) [10:27:40] but we can go to each server and see [10:29:16] db or mw servers? [10:29:55] db [10:30:08] I see lots of references to LinkCache [10:30:49] and LinkBatch::doQuery (for Skin::preloadExistence) [10:30:59] but I do not have a history to compare with [10:31:40] hmm, those shouldn't be new queries [10:32:36] I also see WikiExporter::dumpFrom, on a non-dump server [10:34:22] that would be from someone using Special:Export on a wiki [10:34:30] ok [10:34:37] that would be good news [10:34:45] it could explain the extra load [10:34:57] and it should be temporary [10:35:15] did it just happen to be run at the same time as the scap?
[10:35:23] uhh, seems kind of unlikely [10:35:40] if someone is hammering Special:Export, we should block them and ask them to use dumps instead [10:37:10] (or switch to the API) [10:39:51] most used tables right now are heartbeat/heartbeat [10:40:45] actually no [10:41:04] it is afwiki/revision [10:42:03] afwiki isn't on the new MW version yet [10:42:13] I know [10:44:02] there is no activity on testwiki or similar [10:44:33] is mediawiki part of group 0 ? [10:44:36] yes [10:44:53] https://noc.wikimedia.org/conf/highlight.php?file=group0.dblist [10:45:22] CentralAuthUser::importLocalNames [10:45:28] ResourceLoader::preloadModuleInfo [10:46:05] nothing interesting [10:52:59] the only wikis with activity I am seeing now are cebwiki and glwiki [11:11:12] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2872211 (10jcrespo) After a first look, the issue is not 100% clear that this is due to a train deployment, as the issue seems to continue, but there i... [12:55:21] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2872344 (10Marostegui) I will drop them on db2068, then do the transfer and only add them back to db2068 but not to dbstore2001. Thanks! [13:16:55] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2872359 (10jcrespo) Should they even be on db2068? Genuine question (not a blocker for anything you just commented). [13:18:56] 10DBA: Remove partitions from metawiki.pagelinks - https://phabricator.wikimedia.org/T153194#2872361 (10Marostegui) [13:22:54] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2872377 (10Marostegui) >>! In T151552#2872359, @jcrespo wrote: > Should they even be on db2068?
Genuine question (not a blocker for anything you just commented). I checked another sl... [13:25:05] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2872378 (10jcrespo) > I checked another slave and it has partitions Is it on all servers, eqiad and codfw? [13:25:50] phabricator backups failed (only ones that failed this time) [13:27:35] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2872382 (10Marostegui) In the majority of hosts indeed: ``` root@neodymium:/home/marostegui/git/software/dbtools# for i in `cat s7.hosts| grep -v db1069| awk -F " " '{print $1}'`; do... [13:29:29] I know what failed (technically, it didn't fail, we just backed up 0 databases) [13:29:43] will fix after coffee time [13:36:34] 10DBA: Remove partitions from metawiki.pagelinks - https://phabricator.wikimedia.org/T153194#2872386 (10Marostegui) This is now running on db2068: ``` ./software/dbtools/osc_host.sh --host=db2068.codfw.wmnet --port=3306 --db=metawiki --table=pagelinks --method=ddl --no-replicate "remove partitioning" ``` [13:45:19] 10DBA, 10MediaWiki-Database, 13Patch-For-Review, 07PostgreSQL, 07Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#2872476 (10TTO) [14:09:35] 10DBA: Dumps for misc triggers unnecessary pages - https://phabricator.wikimedia.org/T132539#2872518 (10jcrespo) 05Open>03Resolved a:03jcrespo This was fixed on rOPUPa24ef9130b0baeccef030edf679b9f916cd00df6 --dump-slave stopped the slave during the whole backup process, not just the start. 
[14:11:03] fixing a bug: 5 minutes [14:11:04] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2872523 (10Marostegui) I have finished cleaning up the databases from s7 that were transferred earlier from dbstore2001. Also, given the size of m3 (120G) and x1 (176G) it is probabl... [14:11:15] finding the old task that reported the bug: 1 hour [14:11:50] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2872524 (10jcrespo) +1 [14:11:51] XDDDD [14:12:14] consider dumping --no-data and data on separate files [14:12:24] and doing the import already compressed [14:12:49] why do you say —no-data and -data on different files? any bad experience? [14:12:58] but of course it depends on the tables, etc. maybe it is not even worth doing it [14:13:10] oh, just for easier CREATE TABLE editing [14:13:22] more of a convenience than editing a 200GB file [14:13:38] haha true [14:13:46] e.g. either for editing the file [14:13:54] 10DBA, 10MediaWiki-API, 10MediaWiki-Database, 05MW-1.29-release-notes, and 4 others: ApiQueryExtLinksUsage::run query has crazy limit - https://phabricator.wikimedia.org/T59176#2872527 (10Anomie) [14:13:58] or for creating the tables first, altering them on the db [14:14:01] and then doing the inserts [14:14:21] ah for the compression [14:14:30] if it is needed, big if [14:14:49] yeah, I will double check [14:14:58] only s7 is a priority [14:15:10] yep, I expect the partitions to be removed by tomorrow [14:15:18] the others can wait, we do not back them up from dbstore [14:16:15] for m3, which is not in config, can I just stop replication on db2012 for instance? Or do I need to remove it from somewhere? [14:16:53] what do you mean not in config?
[14:17:04] sorry, on db-codfw.php [14:17:51] db2012 is just a mere data redundancy, not service redundancy, I think [14:18:12] that may change soon if they install phab on codfw [14:18:43] but we still need to talk to them about architecture (replication/load balancing technology) [14:18:50] them == phab admins [14:19:10] aah ok, so for now it is just for redundancy [14:19:53] yes, it was on the delayed slave precisely to survive errors [14:20:12] and off-site data availability [14:20:47] there has never been a good plan for misc services on codfw, that could be part of the next dc switchover [14:20:59] I know mutante was interested in working on that [14:21:57] I would like to have all s* and m* on dbstore[12]001 in the end [14:22:08] but we'll see... [14:22:47] as I said, only s* ones are important for now [14:23:45] 07Blocked-on-schema-change, 10DBA: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#2872535 (10jcrespo) [14:29:50] 07Blocked-on-schema-change, 10DBA, 06Collaboration-Team-Triage, 10Flow, and 2 others: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819#2764089 (10Marostegui) I will take this ticket as I am about to finish another schema change tomorrow. However due to the xmas times, t... [14:31:34] flow tables are relatively small [14:32:05] so much that in some cases, just doing an alter on the master with binlog on may not be such a bad idea [14:32:23] oh, that can be a good one [14:32:27] they are on x1 (with some exceptions), which makes those easier [14:32:28] I was trying to see where they are [14:32:31] aha :) [14:32:35] hi! why do I get an empty set using left outer join, am I doing anything wrong ?
[14:32:37] some private ones are locally [14:32:38] MariaDB [dewiki_p]> select page_title, pl_title from page left outer join pagelinks on pl_from = page_id where pl_title = '!distain' and page_is_redirect = 0 and page_namespace = 0 and pl_namespace = 0; [14:35:43] doctaxon, the answer to that question depends on whether you have fully embraced the differences between LEFT join and JOIN, and the differences between ON and WHERE [14:36:24] yes, page_title has to be empty [14:36:25] 07Blocked-on-schema-change, 10DBA, 06Collaboration-Team-Triage, 10Flow, and 2 others: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819#2872571 (10Marostegui) Jaime pointed out that the tables are normally quite small and he's totally right: ``` root@db1031:/srv/sqldata/flo... [14:36:32] but not pl_title [14:36:50] assuming you know that (sometimes it gets complicated), then there are 2 options: there are no rows with those conditions or the rows are wrong/missing [14:39:26] I would manually check the rows you are trying to select from each table, then see what could be wrong; if they are missing, send a bug report for us to check [14:40:45] marostegui, check the mediawiki config, because there are some databases that have those tables locally (which are even smaller) [14:41:08] jynus: flow ones right?
[14:41:52] yes [14:42:01] marostegui: jynus yesterday we (andrew) had to change the id column for the pdns db to be a BIGINT as we had reached the limit of signed int autoincrement and things were failing [14:42:15] that's ok, saw it [14:42:16] this was done in labtest first and then patched in prod due to outage [14:42:29] cool, just wanted to give you a heads up in case of some unforeseen consequence [14:42:48] well, we should have a monitoring check for that [14:43:03] but it is on the pile of "nice things to have" [14:43:08] yessum [14:43:12] but it is not happening soon [14:43:45] also, schema changes like those are owned by andrew [14:43:51] as in, he takes care of them [14:44:17] the issue in other cases is when someone does that and doesn't take care of maintaining it, etc. [14:44:53] so I am more than happy about it [14:45:13] yup, cool, just keeping in touch :) [14:53:37] jynus: both the tables are left joined, left to page, I cannot find any mistake [14:55:59] can you explain in "English" what you are trying to achieve (as verbose as you can), e.g. on a paste, and we can compare it with the sql you sent [15:02:37] jynus: I am searching for pages that link to the page '!distain' on dewiki. These pages should not be redirects and should only be in namespace 0. '!distain' is only an example where there is no such page that links to it, so one column of the result row has to be NULL. [15:03:13] :w [15:03:17] sorry [15:04:54] sorry: ... so one field of the result row has to be NULL. [15:09:47] you want this: SELECT page_id, page_namespace, page_title FROM pagelinks JOIN page ON page_id = pl_from WHERE pl_title = '!distain' and pl_namespace = 0 and pl_from_namespace = 0 and page_is_redirect = 0; [15:09:56] there is no example of such a page [15:10:02] there are 2, but they are redirects [15:10:19] 10DBA: Remove partitions from metawiki.pagelinks - https://phabricator.wikimedia.org/T153194#2872634 (10Marostegui) [15:10:39] Distain!
and Distain [15:11:26] not sure what you mean with the last part [15:12:13] but for what you asked, you do not need left join- unless you want something more sophisticated [15:15:29] jynus: this is only an example, there is a list of pl_title / but with left or right join I never could get a row with one empty field, because there's nothing to find. Something like here in the examples on https://www.techonthenet.com/sql/joins.php [15:20:28] you want to select all titles? or all pagelinks? [15:24:02] all page_title with its pl_title, if there is no pl_title it should be NULL [15:25:05] NULL or an empty field [15:28:10] I think what you are trying to do is a semijoin [15:28:26] tell me more [15:28:35] semijoin? [15:28:43] there are several ways to achieve that [15:28:59] that could be more or less efficient [15:29:18] you want only one example of a single row per page, or all? [15:29:32] *link rows [15:30:16] because if you just want one, you can do the left join and then group by, but that may not be very efficient [15:30:28] an alternative would be to do it in a subquery [15:30:55] so you can tune how deterministic it has to be to short-circuit the semijoin [15:31:32] at this point the better or worse plan depends on the specific features of the mariadb version in use [15:31:54] which I cannot tell you off the top of my head without testing [15:32:06] let's take a short cut [15:32:25] so do the join on a subselect, see if the performance is acceptable [15:33:02] I only need the page_titles that have no link from another page in namespace 0 [15:33:09] ah [15:33:11] no link [15:33:28] link from, not link to [15:33:29] then you are trying to do a subtraction operation [15:33:48] that is with LEFT JOIN ...
IS NULL [15:34:26] see an example here: http://www.gokhanatil.com/2010/10/minus-and-intersect-in-mysql.html [15:37:30] this way: SELECT page_id, page_namespace, page_title FROM pagelinks LEFT JOIN page IS NULL WHERE pl_namespace = 0 and pl_from_namespace = 0 and page_is_redirect = 0; [15:38:06] jynus: ^ [15:38:09] no, you have to apply the IS null to the PK of the outer table [15:38:27] in this case, any column that cannot be null on pagelinks is ok [15:38:37] so that if it is null [15:38:46] you know no results were given [15:39:06] but what do I have to change [15:39:23] you can also do it with the construction WHERE NOT EXISTS + subquery, if that is easier to understand? [15:43:36] SELECT page_title FROM page, pagelinks WHERE NOT EXISTS (SELECT page_title FROM page, pagelinks WHERE page_id = pl_from AND pl_namespace = 0 AND pl_from_namespace = 0 AND page_is_redirect = 0; [15:43:46] jynus: ^ is this better? [15:44:49] no, that would not work [15:44:59] :( [15:47:23] something like this is simpler (I have not added the namespace restrictions) SELECT page_id, page_namespace, page_title FROM page WHERE page_id NOT IN (SELECT pl_from FROM pagelinks) LIMIT 10; [15:47:39] but not necessarily better in performance [15:48:08] as I said, using that or the LEFT JOIN...
IS NULL will depend on how you want to do it [15:48:51] jynus: you said, my left join example was wrong, too [15:49:41] there is a page on the manual with several strategies for semi-joins: https://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html [15:51:45] yeah, your ON and WHERE clauses in your initial query were not ok [15:51:57] you were asking for no rows to be returned there [15:52:21] more examples on https://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/ [15:52:46] again, the specific best option will depend on whether you apply more filters or do the queries in small batches [15:53:07] which you may have to because the *links tables are very tall [15:53:51] if you are going to do lots of queries on those tables, you may want to generate a temporary table with titles that are linked [15:53:57] and then do several queries on that [15:54:08] so you do not have to do the same job many times [15:57:26] can you give me the solution to the left join is null strategy please, to learn this along with the linked manuals? It's very complicated ... [15:58:58] jynus: ^ [16:00:14] I cannot do it right now, sadly, I have important maintenance to take care of [16:01:51] 10DBA, 07Schema-change, 07Tracking: Schema changes for Wikimedia wikis (tracking) - https://phabricator.wikimedia.org/T51188#2872763 (10jcrespo) [16:02:03] 07Blocked-on-schema-change, 10DBA, 10Wikimedia-Site-requests, 06Wikisource, and 3 others: Schema change for page content language - https://phabricator.wikimedia.org/T69223#2872760 (10jcrespo) 05Open>03Resolved This is officially done. I've checked every single wiki we have (20787 databases), and all h...
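The exchange above can be reproduced in miniature. This is a toy sketch using SQLite from Python, not the production MariaDB schema: the table shapes and data are invented for illustration, and namespace/redirect columns are omitted. It shows why the original LEFT JOIN returned an empty set (the WHERE conditions on pagelinks columns reject the NULL-extended rows), and the three equivalent anti-join ("subtraction") strategies discussed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
CREATE TABLE pagelinks (pl_from INTEGER NOT NULL, pl_title TEXT);
INSERT INTO page VALUES (1, 'Linked'), (2, 'Orphan');
-- something links to 'Linked'; nothing links to 'Orphan' or '!distain'
INSERT INTO pagelinks VALUES (1, 'Linked');
""")

# 1) Why the original query was empty: the LEFT JOIN keeps unmatched page
# rows with NULLs, but the WHERE condition on a pagelinks column (pl_title)
# rejects those NULL rows, degrading the LEFT JOIN into a plain inner join.
empty = conn.execute("""
    SELECT page_title, pl_title FROM page
    LEFT JOIN pagelinks ON pl_from = page_id
    WHERE pl_title = '!distain'
""").fetchall()
print(empty)  # [] -- the NULL-extended rows were filtered out

# 2) Three anti-join strategies for "pages nothing links to". All three
# return the same rows; which is fastest depends on the optimizer and
# version, as noted in the discussion.
anti_joins = {
    # NULL on pagelinks' NOT NULL column signals "no match was found"
    "left_join_is_null": """
        SELECT page_title FROM page
        LEFT JOIN pagelinks ON pl_title = page_title
        WHERE pl_from IS NULL""",
    # beware NOT IN if the subquery can return NULLs
    "not_in": """
        SELECT page_title FROM page
        WHERE page_title NOT IN (SELECT pl_title FROM pagelinks)""",
    "not_exists": """
        SELECT page_title FROM page p
        WHERE NOT EXISTS (SELECT 1 FROM pagelinks
                          WHERE pl_title = p.page_title)""",
}
results = {name: [r[0] for r in conn.execute(sql)]
           for name, sql in anti_joins.items()}
print(results)  # every strategy finds only 'Orphan'
```

Which column to correlate on depends on direction: joining on pl_title = page_title finds pages with no incoming links, while page_id NOT IN (SELECT pl_from ...) as in the example above finds pages with no outgoing links. On the real *links tables either form would likely need batching, as the discussion notes.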
[16:15:08] I think the backup scripts may fail on codfw [16:15:20] unless we set up CNAME aliases on codfw [16:15:35] which would not be a really bad idea [16:15:56] s1-master.eqiad.wmnet and s1-master.codfw.wmnet [16:22:25] yeah [16:22:34] I was actually surprised we only had cnames for eqiad [16:28:34] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial setup and provision of labsdb1009, labsdb1010 and labsdb1011 - https://phabricator.wikimedia.org/T140452#2872866 (10chasemp) >>! In T140452#2872812, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https... [16:30:13] 10DBA, 06Operations, 13Patch-For-Review: Throttle mysql backups on dbstore1001 in order to not saturate the node - https://phabricator.wikimedia.org/T134977#2872886 (10jcrespo) a:03jcrespo Putting this in progress so it is on the radar, to check next week if it worked. [16:30:45] 07Blocked-on-schema-change, 10DBA, 10Wikimedia-Site-requests, 06Wikisource, and 3 others: Schema change for page content language - https://phabricator.wikimedia.org/T69223#2539655 (10Nemo_bis) [16:34:30] 10DBA, 10MediaWiki-Database, 06Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2872905 (10Marostegui) I have extended this just in case and in order to avoid bothering our US folks with a page. [16:40:35] 07Blocked-on-schema-change, 10DBA: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#2872926 (10jcrespo) [16:44:11] 07Blocked-on-schema-change, 10DBA: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#2872944 (10jcrespo) This may need the same strategy as T130067#2376998 (thus the T138810 dependency), or maybe we can think of an alternative method wit...
[16:47:19] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial setup and provision of labsdb1009, labsdb1010 and labsdb1011 - https://phabricator.wikimedia.org/T140452#2872950 (10jcrespo) Just one minor comment- you want to comment this on the open T141097 or T147051 or T147052. This is an old ticket ju... [16:49:47] 10DBA: Remove partitions from metawiki.pagelinks - https://phabricator.wikimedia.org/T153194#2872963 (10Marostegui) 05Open>03Resolved [16:49:49] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2872964 (10Marostegui) [17:00:42] 10DBA, 06Labs: Fix broken views in labs DB "ERROR 1356 -- references invalid table(s) or column(s)" - https://phabricator.wikimedia.org/T153213#2872990 (10Halfak) [17:14:31] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2873038 (10Marostegui) I am transferring the s7 files from db2068 now to dbstore2001 once the partitions have been removed from metawiki.pagelinks (T153194). [17:16:43] 10DBA, 06Labs: Fix broken views in labs DB "ERROR 1356 -- references invalid table(s) or column(s)" - https://phabricator.wikimedia.org/T153213#2873056 (10Halfak) Looks like `povwatch_log` has problems too [17:34:39] 10DBA, 06Labs: Fix broken views in labs DB "ERROR 1356 -- references invalid table(s) or column(s)" - https://phabricator.wikimedia.org/T153213#2873113 (10Halfak) Here's a related issue I've filed in SQLAlchemy: https://bitbucket.org/zzzeek/sqlalchemy/issues/3871/metadatareflect-should-skip-tables-that [18:08:37] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2873198 (10Marostegui) I am importing m3 (already compressed) into dbstore2001 now. Replication on db2012 will remain stopped until it is done. 
[18:09:05] 10DBA, 06Labs: Fix broken views in labs DB "ERROR 1356 -- references invalid table(s) or column(s)" - https://phabricator.wikimedia.org/T153213#2872990 (10Krenair) The underlying table simply no longer exists: ```mysql:wikiadmin@db1089 [enwiki]> show create table povwatch_log; ERROR 1146 (42S02): Table 'enwiki... [18:12:16] 10DBA, 06Labs: Fix broken views in labs DB "ERROR 1356 -- references invalid table(s) or column(s)" - https://phabricator.wikimedia.org/T153213#2873207 (10Krenair) Related tickets: T103011, T92739, T54924 [18:14:37] 10DBA, 06Labs, 10Labs-Infrastructure: LabsDB replica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767#2873214 (10Krenair) [18:14:40] 10DBA, 06Labs: Fix broken views in labs DB "ERROR 1356 -- references invalid table(s) or column(s)" - https://phabricator.wikimedia.org/T153213#2873213 (10Krenair) [18:17:22] 10DBA, 06Operations, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2873220 (10jcrespo) MySQLs with no SSL ``` $ sudo salt -C 'G@cluster:mysql' cmd.run 'mysql --skip-ssl -e "SELECT @@ssl_ca"' | grep -c 'NULL' 14 ``` MySQL with expired TLS cert: ``` $ sudo...
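The broken-views thread above (T153213) boils down to views whose underlying table was dropped, as Krenair's `show create table` check confirms. A toy reproduction follows, using SQLite from Python rather than MariaDB, so the error text differs from MariaDB's ERROR 1356, but the mechanism is the same; the table name is borrowed from the ticket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE povwatch_log (id INTEGER);
CREATE VIEW povwatch_log_v AS SELECT id FROM povwatch_log;
DROP TABLE povwatch_log;  -- the view definition is left dangling
""")

# The view still exists in the catalog, but querying it now fails --
# the condition MariaDB reports as ERROR 1356 on the labs replicas.
try:
    conn.execute("SELECT * FROM povwatch_log_v").fetchall()
    error_message = None
except sqlite3.OperationalError as e:
    error_message = str(e)

print(error_message)  # mentions the missing underlying table
```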
[18:18:02] 10DBA: Use tls for dump backup generation - https://phabricator.wikimedia.org/T151583#2873221 (10jcrespo) [18:18:05] 10DBA, 06Operations, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2873222 (10jcrespo) [18:18:18] 10DBA: Use tls for dump backup generation - https://phabricator.wikimedia.org/T151583#2821866 (10jcrespo) [18:18:21] 10DBA, 06Operations, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2203662 (10jcrespo) [18:20:52] 10DBA, 06Operations, 13Patch-For-Review: Restart pending mysql hosts with old TLS cert - https://phabricator.wikimedia.org/T152188#2873250 (10jcrespo) Pending hosts: ``` db1063.eqiad.wmnet db1054.eqiad.wmnet db1067.eqiad.wmnet db1036.eqiad.wmnet db1015.eqiad.wmnet db1021.eqiad.wmnet db1022.eqiad.wmnet db205... [18:21:18] 10DBA, 06Operations, 13Patch-For-Review: Restart pending mysql hosts with old TLS cert - https://phabricator.wikimedia.org/T152188#2873253 (10jcrespo) [18:33:44] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2873332 (10Marostegui) m3 is now replicating finely in dbstore2001 and catching up with the master. [19:11:48] 10DBA, 13Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2873446 (10Marostegui) The files have been transferred and I have started the tablespace importation of s7 into dbstore2001. It will take several hours to complete. Replication on db2... [19:49:11] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2872022 (10mmodell) are there any async jobs running along with group0? This //may// be related: After deploying the train, I noticed a lot of this... 
[19:50:13] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2873732 (10mmodell) p:05Triage>03High @jcrespo: I can revert and see what happens? [19:57:23] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2873772 (10mmodell) [20:15:39] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2873848 (10mmodell) I just reverted, waiting a while to see if it makes any difference in the graphs linked above.. [20:18:32] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2873856 (10mmodell) It doesn't appear to be helping. [20:19:54] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2872022 (10demon) >>! In T153184#2873726, @mmodell wrote: > are there any async jobs running along with group0? > > This //may// be related: > After... [20:24:28] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2873890 (10mmodell) ok so rollback of the train hasn't helped. Any other ideas of what could be causing this? 
[20:41:43] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2872022 (10hashar) On s3 we had the same bump of traffic over the week-end from Dec 10 08:55UTC till Dec 11 20:55UTC: {F5055852 size=full} [21:06:17] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2874115 (10mmodell) removing from deployment blockers because this is apparently unrelated to the new branch. [21:06:28] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2874116 (10mmodell) [21:23:59] 10DBA, 06Release-Engineering-Team, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2874227 (10jcrespo) [21:24:48] 10DBA, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2872022 (10jcrespo) a:03jcrespo Thanks for checking. I really mean it. I will continue investigating to see what is the source of this overhead. [21:28:35] 10DBA, 07Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2874279 (10mmodell) @jcrespo: Glad to help, that's what I'm here for :) [21:42:23] 10DBA: duplicate key problems - https://phabricator.wikimedia.org/T151029#2874323 (10jcrespo) [21:43:43] 10DBA: duplicate key problems - https://phabricator.wikimedia.org/T151029#2805124 (10jcrespo) 05Resolved>03Open This is worrying: ``` MariaDB MARIADB labsdb1001 enwiki > SET GLOBAL innodb_file_per_table = 1; Query OK, 0 rows affected (0.00 sec) MariaDB MARIADB labsdb1001 enwiki > SET GLOBAL innodb_file_fo... 
[21:43:51] 07Blocked-on-schema-change, 10DBA, 10Wikimedia-Site-requests, 06Wikisource, and 3 others: Schema change for page content language - https://phabricator.wikimedia.org/T69223#2874339 (10jcrespo) [21:55:19] jynus: are you still there? [22:59:31] 10DBA, 10MediaWiki-Database, 06Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2874723 (10kaldari) @Marostegui: It's taking much longer than I expected. One thing that I've learned is that I'm terrible at esti...