[01:40:13] Holy crap, there are 286M echo_event rows on wikidatawiki and 285M of them are orphaned (about to be deleted by this script)
[01:40:39] The script is at 10M deleted rows on wikidata now, but this is going to take a while
[02:05:23] RoanKattouw: is there a ticket for more information?
[04:39:11] Betacommand: T217073
[04:39:13] T217073: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073
[04:39:52] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Catrope) I forgot to log the task number when I logged this: > 2019-02-27 17:05 RoanKattouw: Running foreachwikiindblist dblists/echo.dblist...
[04:40:20] It's still going, wikidata is at 133M now, so almost halfway
[04:40:39] After that it should hopefully finish quickly, there isn't a lot of alphabet left after wikidatawiki, other than zhwiki
[04:42:20] Hah, and the volume on zhwiki is comparatively small: 9.5M rows, of which 7.4M can be deleted
[04:42:35] (on enwiki it was approximately 90M / 60M, and on wikidata it's 286M / 285M)
[04:43:41] Looks like it's doing roughly 40M / hour, so I guess it's got at least 4 more hours to go
[05:49:57] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2033 - https://phabricator.wikimedia.org/T217301 (10Marostegui) p:05Triage→03Normal a:03Papaul @Papaul let's get the disk replaced Thank you
[06:58:02] 10Blocked-on-schema-change, 10DBA, 10AbuseFilter, 10Patch-For-Review: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (10Marostegui)
[07:07:14] 10Blocked-on-schema-change, 10DBA, 10AbuseFilter, 10Patch-For-Review: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (10Marostegui)
[07:35:46] 10Blocked-on-schema-change, 10DBA, 10AbuseFilter, 10Patch-For-Review: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (10Marostegui)
[08:21:11] 10Blocked-on-schema-change, 10DBA, 10AbuseFilter, 10Patch-For-Review: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (10Marostegui) s4 eqiad progress [x] labsdb1011 [x] labsdb1010 [x] labsdb1009 [x] dbstore1004 [x] dbstore1002 [x] db1125 [x] db1121 [x] db1103 [x] db1102 [x] d...
[08:21:33] 10Blocked-on-schema-change, 10DBA, 10AbuseFilter, 10Patch-For-Review: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (10Marostegui)
[08:32:09] 10Blocked-on-schema-change, 10DBA, 10AbuseFilter, 10Patch-For-Review: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (10Marostegui)
[08:59:57] 10Blocked-on-schema-change, 10DBA, 10AbuseFilter, 10Patch-For-Review: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (10Marostegui) s3 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1004 [x] dbstore1002 [] db1124 [x] db1123 [x] db1095 [] db1078 [x] db1075...
[09:08:28] the new backup process fixes the bug (backup_files has a new column); should we redo all backups, run the postprocessing, or just leave it as is, since all backups succeeded anyway?
[09:08:47] or maybe just add the correct size to the column?
[09:09:10] I would suggest running the postprocessing only, as this is a real example of when we'll need it :)
[09:09:18] ok
[09:09:19] btw, db1124 (one of the sanitarium hosts) is now upgraded
[09:09:28] Going to do db1125 in a bit
[09:09:32] thanks!
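For reference, here is a minimal sketch of the kind of batched cleanup described at [01:40:13]–[04:43:41]. It is NOT the actual maintenance script from T217073 (the exact command is truncated in the log above); it assumes the Echo schema where echo_notification.notification_event references echo_event.event_id, and the host, user, batch size, and set of referencing tables are all placeholders.

```python
# Hypothetical sketch only -- not the script run in T217073. Assumes the Echo
# schema where echo_notification.notification_event references
# echo_event.event_id; the real job may also consult other referencing tables
# (e.g. echo_email_batch). Host, user, and batch size are placeholders.
import time
import pymysql

conn = pymysql.connect(host="db-placeholder", user="cleanup", db="wikidatawiki")
BATCH = 1000

while True:
    with conn.cursor() as cur:
        # Grab a batch of echo_event rows that no notification references.
        cur.execute(
            "SELECT e.event_id FROM echo_event e "
            "LEFT JOIN echo_notification n ON n.notification_event = e.event_id "
            "WHERE n.notification_event IS NULL LIMIT %s",
            (BATCH,),
        )
        ids = [row[0] for row in cur.fetchall()]
        if not ids:
            break  # no orphans left
        placeholders = ",".join(["%s"] * len(ids))
        cur.execute(
            f"DELETE FROM echo_event WHERE event_id IN ({placeholders})", ids
        )
    conn.commit()
    time.sleep(1)  # throttle so replication lag stays bounded between batches
```

Deleting in small, throttled batches rather than one giant DELETE is what keeps a multi-hundred-million-row cleanup from lagging replicas, and a steady pace like the ~40M rows/hour observed above is consistent with that approach.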
[09:09:49] and once they are done, I will upgrade the labs hosts (next week)
[09:11:02] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review, 10Schema-change: Dropping page.page_no_title_convert on wmf databases - https://phabricator.wikimedia.org/T86342 (10Marostegui) s7 eqiad progress [x] labsdb1011 [x] labsdb1010 [x] labsdb1009 [x] dbstore1003 [x] dbstore1002 [x] db1125 [x] db1116 [] db11...
[09:15:44] so this is what I did
[09:16:04] I "mv latest/* ongoing"
[09:16:27] then ran sudo -u /usr/local/bin/dump_section.py --config=/etc/mysql/backups_postprocessing.cnf
[09:16:57] and how did it go?
[09:17:15] postprocessing.cnf is the same but with only_postprocess:True
[09:17:19] we need a cheatsheet!
[09:18:58] the start_date is misleading because it set all hosts to 09:15 backups (understandable)
[09:19:15] but the size seem good
[09:19:23] *sizes
[09:20:19] check now says things are ok
[09:20:36] \o/
[09:20:40] do you want to do codfw for practicing purposes? (doesn't have to be now)
[09:20:56] yes, can we do it tomorrow?
[09:21:26] we can do it after lunch maybe?
[09:21:35] during our meeting
[09:21:43] ah sure, that works too
[09:44:38] 10Blocked-on-schema-change, 10DBA, 10AbuseFilter, 10Patch-For-Review: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (10Marostegui)
[09:45:21] 10Blocked-on-schema-change, 10DBA, 10AbuseFilter, 10Patch-For-Review: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (10Marostegui) All done - I will keep an eye on tendril for the next 24h before closing this, to make sure nothing pops up.
[11:50:53] marostegui: how hard would it be to get a dump of the page table onto that test server too for wikidata?
[11:51:42] * marostegui in a meeting
[11:51:45] should be easy
[11:51:46] ack!
[11:51:48] :)
[11:51:49] will get back to you later
[12:02:16] addshore: page should be available from labsdb hosts and from a vslow server, you should have access to both
[12:02:34] I think it is also on dumps, let me check
[12:02:49] jynus: I want to try joins with the tables that are on db1111
[12:03:41] addshore: it is generated automatically: https://dumps.wikimedia.org/wikidatawiki/20190220/wikidatawiki-20190220-page.sql.gz
[12:04:03] aaah right, i can just grab that and load it!
[12:04:06] :D
[12:04:08] :-)
[12:04:12] ty
[12:04:51] not that we cannot help you, but it's way easier if you can self-serve :-)
[15:39:15] I guess we should reimage db1114?
[15:39:24] (chris just took it down for mainboard replacement yay!)
[15:39:37] I can take care of that
[15:40:00] I would like to do the backup snapshot restore tomorrow?
[15:40:01] maybe?
[15:40:19] I can let you do it if you want
[15:40:28] yeah, that'd be nice :)
[15:40:35] cool, then I would not touch it
[15:40:38] unless you want to do it for debugging stuff
[15:40:44] or do you want me to reimage but not provision?
[15:41:01] you decide
[15:41:01] yeah, if we can have that deal :)
[15:41:05] ok
[15:41:09] I mean, I can do the reimage, but I am going to log off in a sec
[15:41:15] I will reimage unless it comes up late
[15:41:16] And then wait for you to do the snapshot restore tomorrow
[15:41:22] excellent, thanks :)
[15:41:45] will comment on the ticket if something gets done
[15:42:26] great!
[15:42:27] thanks :)
[16:56:04] marostegui: db1114 is back... updated idrac and bios as well
[16:57:01] he is not around
[16:57:06] thanks!
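Two sketches for the how-tos discussed above. First, the postprocessing-only rerun jynus walks through at [09:15:44]–[09:17:15], wrapped in Python for repeatability; the sudo user is left as a placeholder because the log omits it, and only the paths and the only_postprocess key quoted in the log are used.

```python
# Cheatsheet-style sketch of the postprocessing-only rerun from [09:15:44].
# The sudo user is a placeholder (the log omits it); paths are as quoted.
import subprocess

# 1. Move the finished dumps back to where the postprocessor expects them.
subprocess.run("mv latest/* ongoing", shell=True, check=True)

# 2. Re-run dump_section.py against a config that is identical to the normal
#    backups config except for only_postprocess: True.
subprocess.run(
    "sudo -u <user> /usr/local/bin/dump_section.py "
    "--config=/etc/mysql/backups_postprocessing.cnf",
    shell=True, check=True,
)
```

Second, the "grab that and load it" step addshore lands on at [12:04:03]: a hypothetical pipeline that streams the auto-generated page dump into the test server (db1111 per the log; the target database name is an assumption).

```python
# Hypothetical: stream the public page-table dump into the test database.
import subprocess

DUMP_URL = ("https://dumps.wikimedia.org/wikidatawiki/20190220/"
            "wikidatawiki-20190220-page.sql.gz")

# curl -> gunzip -> mysql keeps the uncompressed SQL off the local disk.
subprocess.run(
    f"curl -sS {DUMP_URL} | gunzip | mysql -h db1111 wikidatawiki",
    shell=True, check=True,
)
```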
[16:57:18] I will give it a shake
[17:44:23] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2033 - https://phabricator.wikimedia.org/T217301 (10Papaul) a:05Papaul→03Marostegui Disk replacement complete.
[17:45:02] 10DBA, 10Operations, 10ops-eqiad, 10Patch-For-Review: db1114 crashed (HW memory issues) - https://phabricator.wikimedia.org/T214720 (10Cmjohnson) 05Open→03Resolved The motherboard has been replaced, and the idrac and bios have been updated to the latest version. Resolving task; reopen if there are any problems.
[18:23:23] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Catrope) 05Open→03Resolved This finished last night after I went to sleep.
[21:31:45] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10jcrespo) 05Resolved→03Open Please tell us from which set of servers, which tables you deleted rows from, as we agreed on IRC, so we can...
[23:14:13] 10DBA, 10Operations, 10ops-eqiad, 10Patch-For-Review: db1114 crashed (HW memory issues) - https://phabricator.wikimedia.org/T214720 (10jcrespo) @Marostegui I've chosen not to reimage the server because this is right now a backup testing one; I think it is ok if it currently doesn't have the right enwiki data....