[05:44:58] DBA, MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar), User-Marostegui: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 (Marostegui) >>! In T210725#5062930, @aaron wrote: > > > By the way, what do you plan to do in...
[05:47:48] DBA, Operations, ops-codfw, procurement: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Marostegui) a:jcrespo→Papaul
[05:48:36] DBA, Operations, ops-codfw, procurement: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Marostegui) Hostnames updated. Racking proposal, is basically one server per row. And as we have 5 servers and 4 rows, we have to place 2 servers within the same row,...
[05:49:42] DBA, Operations, ops-codfw, procurement: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui) a:jcrespo→Papaul
[05:51:43] DBA, Operations, ops-codfw, procurement: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui) Done. Racking: Any 1G rack is fine.
Hostname: db2102.codfw.wmnet
[06:03:50] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (RobH)
[06:04:04] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (RobH)
[06:10:45] Now that I see the icinga alerts about the backups I just realised that we need to change it to indicate whether the dump failed or the snapshot failed, as backups is now generic
[06:20:50] DBA, Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (Marostegui)
[06:20:58] Blocked-on-schema-change, MediaWiki-Database, MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (Marostegui)
[07:16:10] Blocked-on-schema-change, MediaWiki-Database, MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (Marostegui)
[07:16:19] DBA, Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (Marostegui)
[07:26:24] DBA, MediaWiki-Database, WikimediaEditorTasks, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban): Choose DB/Cluster for WikimediaEditorTasks tables - https://phabricator.wikimedia.org/T218302 (Marostegui) @Mholloway was this deployed yesterday? There were errors on logstash: https...
[07:35:21] DBA, Goal, Patch-For-Review: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (jcrespo) Creating a snapshot between dbstore2002 and dbprov2001, using the SSDs for ongoing, and the HDs for latest: * 1h30m for xtrabackup transfer * 4m to...
[07:59:08] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (jcrespo) vlan: private This will be like any other production db-hosts, documented here: https://wikitech.wikimedia.org/wiki/Raid_setup
[08:00:28] DBA, Goal, Patch-For-Review: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (Marostegui) Copying here my comment earlier today in IRC just for the record: `Now that I see the icinga alerts about the backups I just realised that we need...
[08:01:37] Blocked-on-schema-change, MediaWiki-Database, MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (Marostegui) s3 eqiad progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1004 [] db1124...
[08:01:41] DBA, Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (Marostegui) s3 eqiad progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1004 [] db1124 [] db1123 [] db1095 [] db1078 [] db1077 [] db1075
[08:01:59] DBA, Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (Marostegui)
[08:02:13] Blocked-on-schema-change, MediaWiki-Database, MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (Marostegui)
[08:03:01] jynus: are you using db1095 or can I deploy a schema change there?
[08:03:25] DBA, Goal, Patch-For-Review: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (jcrespo) All checks are for dumps now, dumps was not written on the alert because it was confusing for everybody else with actual XML dumps.
I can, however, c...
[08:04:02] I am personally not using it, if it is an eqiad backup source, eqiad backups finished not long ago
[08:04:14] cool, I will deploy it there
[08:08:59] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Marostegui)
[09:00:45] I am going to do a new snapshot test run, did you suggest we do s3 instead?
[09:01:20] yeah, just because it is a special case, as in: lots of files, directories..
[09:01:49] so that is 1.7TB uncompressed
[09:03:00] Should we maybe try to compress all the backup hosts with BLOCK_SIZE=4 instead of 8?
[09:03:13] 871G compressed
[09:03:21] (innodb compressed)
[09:03:26] I think it will fit
[09:05:34] 1.6T on dbstore2002, which is the source it would use, plus backups are still running there, so we may have to choose another today
[09:05:55] sure, it was just an idea
[09:06:02] to play with an extreme case
[09:06:09] I am looking for alternative ideas :-)
[09:06:13] x1, maybe?
[09:06:27] yeah, small, lots of directories…fits well
[09:06:32] 292G
[09:06:43] sounds like a good test
[09:06:49] it will be like s3 but with fewer files
[09:40:39] DBA, MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar), User-Marostegui: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 (Marostegui) I would like to elaborate more on my idea on how to deploy the change to eqiad for t...
[10:19:58] DBA, Wikimedia-Site-requests, Serbian-Sites, Wikimedia-maintenance-script-run: Mass bigdeletion scheduled for sr.wikinews - https://phabricator.wikimedia.org/T212346 (Zoranzoki21) >>! In T212346#4968244, @Zoranzoki21 wrote: >>>! In T212346#4957919, @MarcoAurelio wrote: >> Could I have my botflag...
[10:25:30] DBA: Decommission codfw x1 host - https://phabricator.wikimedia.org/T219493 (Marostegui)
[10:26:09] DBA: Decommission codfw x1 host - https://phabricator.wikimedia.org/T219493 (Marostegui) p:Triage→Normal I am not adding the DCOps tasks yet to avoid unnecessary noise for them as this is not ready to go yet.
[10:26:23] DBA: Decommission 2 codfw x1 hosts db2033 and db2034 - https://phabricator.wikimedia.org/T219493 (Marostegui)
[11:32:33] marostegui: I get a fatal error on mwdebug1002 for hywwiki.
[11:32:34] https://logstash.wikimedia.org/app/kibana#/dashboard/mwdebug1002?_g=h@44136fa&_a=h@b6cb69f
[11:32:45] ExternalStoreDB::fetchBlob master failed to find cluster25/1498395
[11:33:31] checking
[11:34:01] so that database isn't present on x1 (which is one of the errors on logstash)
[11:34:09] is that the new wiki?
[11:34:12] yup
[11:34:22] the addwiki.php worked fine when I ran it
[11:34:32] I guess the process didn't finish correctly? we never create databases manually on x1 for any wiki creation
[11:35:15] so I ran it twice
[11:35:30] the first time it failed with something like this: https://phabricator.wikimedia.org/T212881
[11:35:38] Database is read-only: The database has been automatically locked until the replica database servers become available
[11:35:44] uh?
[11:35:48] let me see
[11:36:02] but I re-ran it and it worked just fine
[11:36:13] but it is still not on x1
[11:36:19] hywwiki right?
[11:36:36] yup
[11:36:39] ERROR 1049 (42000): Unknown database 'hywwiki'
[11:36:41] it is on s3
[11:36:43] but not on x1
[11:36:58] https://phabricator.wikimedia.org/T212881#4893611
[11:37:00] it should not be on x1, the external storage is not found
[11:37:06] I see that Jaime had to create them
[11:37:32] let me see on es
[11:37:50] root@es1015.eqiad.wmnet[hywwiki]> show tables;
[11:37:50] +-------------------+
[11:37:50] | Tables_in_hywwiki |
[11:37:50] +-------------------+
[11:37:50] | blobs_cluster24 |
[11:37:53] +-------------------+
[11:37:54] but the maintenance script fixed it and got merged...
[11:37:55] 1 row in set (0.00 sec)
[11:37:58] on es they are there
[11:38:04] but the error on logstash is on x1 not es
[11:38:31] according to logs, it should go to cluster 25
[11:38:33] https://logstash.wikimedia.org/app/kibana#/dashboard/mwdebug1002?_g=h@44136fa&_a=h@b6cb69f
[11:39:01] it is also on 25
[11:39:08] root@es1017.eqiad.wmnet[hywwiki]> show tables;
[11:39:08] +-------------------+
[11:39:08] | Tables_in_hywwiki |
[11:39:08] +-------------------+
[11:39:08] | blobs_cluster25 |
[11:39:10] +-------------------+
[11:39:13] 1 row in set (0.00 sec)
[11:39:30] that error, is from an x1 server IP
[11:39:34] DBA, Data-Services, Operations, cloud-services-team: Prepare and check storage layer for hywwiki - https://phabricator.wikimedia.org/T212625 (MarcoAurelio) https://hyw.wikipedia.org is now up. However there's: [XJyyFQpAAD0AAEF-ZM8AAAAD] 2019-03-28 11:37:57: Fatal exception of type "MediaWiki\R...
[11:40:08] DBA, Data-Services, Operations, cloud-services-team: Prepare and check storage layer for hywwiki - https://phabricator.wikimedia.org/T212625 (Marostegui) We are investigating on -databases
[11:40:39] Amir1: so there are two errors, the table not found on 25 (and it is there) and the database not found on x1 (and it is not there)
[11:41:26] should it be on x1?
it should be on s3
[11:41:51] I guess the shared tables have to be on x1, like the rest of wikis, but you probably know more than me :)
[11:41:53] the s3 database exists
[11:42:13] but the errors are no more, there are other errors right now, but not those anymore: https://logstash.wikimedia.org/goto/7dbae2858e1dbedbd1b649aab7edd53e
[11:43:32] https://github.com/wikimedia/mediawiki-extensions-WikimediaMaintenance/commits/master/addWiki.php
[11:43:35] Replication to github seems broken
[11:43:46] yeah...
[11:43:58] Reedy: https://phabricator.wikimedia.org/T219264
[11:44:07] Reedy: I guess other repos too
[11:44:07] It's resolved :P
[11:44:12] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/WikimediaMaintenance/+/master/addWiki.php
[11:44:37] Amir1: So to be clear, when I go to that new wiki, I see errors on logstash, but not for x1 or es anymore, just the ones I sent you above
[11:44:58] I can re-run the maintenance on ext1
[11:45:13] Amir1: that is not erroring anymore
[11:45:21] I just tried: https://logstash.wikimedia.org/goto/aa8ab07922913a24dfc87c0e3252629f
[11:45:37] but it's still fataling
[11:45:37] Amir1: But again, I don't know if that new wiki has to show up on x1 or not, so I cannot tell you :(
[11:46:07] Amir1: It is, but with other errors, I don't know what those are
[11:46:21] But it doesn't complain about es or x1 (from what I can see)
[11:46:47] yeah, it complains about bad data
[11:46:54] sorry to ask, but the addwiki issue seems like a mw issue, not really related to databases? Probably should go to operations or if there is any other mw or wmf infrastructure-related channel?
[11:47:14] not that I care, but so other people are aware
[11:47:39] we don't want to force people to enter here for important topics
[11:47:46] jynus: addwiki.php makes the databases
[11:48:19] is it run as root?
[11:48:44] lolno
[11:48:51] Thank goodness no
[11:49:06] so it should be -operations or somewhere else
[11:49:27] I am not saying it is not important, I am saying because it is important, it shouldn't be here
[11:50:24] so more people than us should be made aware
[11:58:59] sorry for being so strict about that, but I "got permission" to create this irc as long as all important stuff would continue going to -operations
[12:53:32] DBA, MediaWiki-Database, WikimediaEditorTasks, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban): Choose DB/Cluster for WikimediaEditorTasks tables - https://phabricator.wikimedia.org/T218302 (Mholloway) @Marostegui, yes, this was caused by a bug fixed by https://gerrit.wikimedia...
[13:59:45] snapshot on x1 was very, very fast, from start of transfer to location on the final place, compressed, 45 minutes
[14:00:00] nice
[14:27:38] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Papaul)
[14:34:13] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Papaul)
[14:36:50] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (1) testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Papaul) switch port information asw-c5-codfw ge-5/0 11
[14:47:47] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Papaul) switch port information db2097 asw-a6-codfw ge-6/0 6 db2098 asw-b6-codfw ge-6/0/0 db2099 asw-c6-codfw ge-6/0/6 db2100 asw-d1-codfw ge-1/0/0 db2101 as...
[15:16:10] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (RobH)
[17:11:23] DBA, Core Platform Team (MCR), Core Platform Team Backlog (Later), Multi-Content-Revisions (Tech Debt), Schema-change: Once MCR is deployed, drop the rev_text_id, rev_content_model, and rev_content_format fields from the revision table - https://phabricator.wikimedia.org/T184615 (Neil_P._Quinn...
[18:04:02] DBA, Data-Services, Operations, cloud-services-team: Prepare and check storage layer for hywwiki - https://phabricator.wikimedia.org/T212625 (Marostegui) This is no longer throwing errors from what I can see. @Ladsgroup you fixed it in the end? I can also see 3 users there already created and we...
[18:05:21] DBA, Data-Services, Operations, cloud-services-team: Prepare and check storage layer for hywwiki - https://phabricator.wikimedia.org/T212625 (Ladsgroup) No it got reverted. It was sending fatal errors.
[18:12:50] DBA, Data-Services, Operations, cloud-services-team: Prepare and check storage layer for hywwiki - https://phabricator.wikimedia.org/T212625 (Marostegui) Ok - let me know when it is attempted again. Thank you!
[18:48:12] DBA, MediaWiki-Database, MediaWiki-Logging, Performance, Schema-change: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961 (Marostegui) Just for the record, we are almost done with unifying the logging table to what is on tables.sql across all...
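
Editor's note on the hywwiki debugging at 11:32-11:40: the ExternalStoreDB error names a cluster and a blob id ("cluster25/1498395"), and the SHOW TABLES output pasted from es1015 and es1017 shows tables named blobs_cluster24 and blobs_cluster25. The Python sketch below only illustrates how one might turn such an error line into the table name to look for. The function name parse_fetch_blob_error is hypothetical, and the blobs_cluster<N> naming is an assumption inferred from the two table listings in this log, not taken from MediaWiki's code.

```python
import re

# Matches the error text quoted in the log, e.g.
# "ExternalStoreDB::fetchBlob master failed to find cluster25/1498395"
_ERROR_RE = re.compile(r"failed to find (cluster(\d+))/(\d+)")


def parse_fetch_blob_error(message):
    """Return (cluster_name, blob_id, assumed_table), or None if no match.

    Hypothetical helper: the table name is built as blobs_cluster<N>,
    an assumption based on the SHOW TABLES output pasted in the log.
    """
    match = _ERROR_RE.search(message)
    if match is None:
        return None
    cluster_name, cluster_number, blob_id = match.groups()
    return cluster_name, int(blob_id), f"blobs_cluster{cluster_number}"


if __name__ == "__main__":
    line = "ExternalStoreDB::fetchBlob master failed to find cluster25/1498395"
    print(parse_fetch_blob_error(line))
```

With the table name in hand, the manual check in the log (running show tables on the es host for that cluster) tells you whether the table exists, which is how the es side was ruled out here.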