[05:07:56] DBA, Patch-For-Review, User-Urbanecm, cloud-services-team (Kanban): Establish process of determining shard for new wikis - https://phabricator.wikimedia.org/T259438 (Marostegui) Thank you Bryan. @Urbanecm I think we are done from the "code" side. I think I am going to add this step to https://wik...
[05:22:09] DBA, Wikimedia-Rdbms, Goal, MW-1.36-notes (1.36.0-wmf.5; 2020-08-18), and 4 others: FY18/19 TEC1.6 Q4: Improve or replace the usage of GTID_WAIT with pt-heartbeat in MW - https://phabricator.wikimedia.org/T221159 (Marostegui) >>! In T221159#6378430, @Krinkle wrote: > @Marostegui FYI - I'm landing...
[05:35:26] DBA: Upgrade m5 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T260324 (Marostegui)
[05:35:50] DBA, Patch-For-Review: Upgrade m5 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T260324 (Marostegui) p:Triage→Medium a:Marostegui
[05:44:18] DBA, Patch-For-Review: Upgrade m5 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T260324 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2135.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202008130544_m...
[06:15:17] DBA: All sorts of random drifts in wikis in s3 - https://phabricator.wikimedia.org/T260111 (Marostegui) p:Triage→Medium
[06:24:14] DBA: Upgrade m5 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T260324 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2135.codfw.wmnet'] ` and were **ALL** successful.
[07:13:46] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui) `muswiki` and `mhwiktionary`...
[07:22:13] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui)
[07:30:05] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui) Wikis in s3 situation: `tes...
[07:30:23] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui)
[08:06:59] DBA, Patch-For-Review, User-Urbanecm, cloud-services-team (Kanban): Establish process of determining shard for new wikis - https://phabricator.wikimedia.org/T259438 (Marostegui) Created https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Wiki_Replicas#Update_wikis_location
[08:21:07] DBA, observability, Epic: Improve database alerting (tracking) - https://phabricator.wikimedia.org/T172492 (Marostegui)
[08:21:09] DBA, Patch-For-Review, User-Kormat: Create prometheus alert to detect lag spikes - https://phabricator.wikimedia.org/T253120 (Marostegui) Open→Resolved Thank you @kormat for working on this, we've wanted to have this alert for a long time!
[08:21:23] 💜
[08:23:16] DBA, Operations, Patch-For-Review, User-Kormat: DBA python layout - https://phabricator.wikimedia.org/T259516 (Kormat) a:Kormat
[08:24:23] Blocked-on-schema-change, DBA, Operations, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat)
[08:51:41] kormat: when you want, if you could have a look at: https://gerrit.wikimedia.org/r/q/topic:%22wmfbackups%22+(status:open%20OR%20status:merged)
[08:51:50] s/want/can/
[08:52:22] I tried not to touch wmfmariadbpy, but it required a small refactoring on one method there
[08:53:17] will do
[08:53:30] that will allow to start removing redundancies
[08:53:35] like the port-> socket matching
[08:53:43] but you may had a previous way to do that
[08:53:53] *previous plan
[08:55:01] don't worry too much about functionality testing for backup part (I am doing that myself), more the general idea and the wmfmariadb parts
[08:56:46] this will also inform the packaging, as backups also use remote execution libraries
[09:40:31] jynus: can you add me as a member of the operations-software-wmfmariadbpy gerrit group, please? https://gerrit.wikimedia.org/r/admin/groups/f948dea7f1f871e879aacb863838a9bcf4e17793,members
[09:44:01] let me see
[09:44:46] I am not sure I can do that myself
[09:44:57] oh, huh
[09:47:05] yeah, I can modify the members of operations-software-transferpy but not that one
[09:48:54] because repo owner is gerrit managers
[09:50:35] well, not the gerrit owner, the group owner
[09:58:10] hmm, ok thanks. i _might_ still be able to make a tag, i guess i'll just test and see.
[09:59:16] you should be
[10:00:09] it inherits permissions from software, which I think gets it from ldap
[10:00:55] that's what i thought too, but:
[10:01:00] `remote: You need 'Create Signed Tag' rights to push a signed tag.`
[10:02:03] create a task to request membership to gerrit admins
[10:03:29] heh, tempting
[10:06:17] huh. there's only a single SRE person in Gerrit Managers (herr.on)
[10:08:21] ask on releng channel, maybe someone in current tz is around
[10:08:56] when I asked the creation of the project, I just arrived so I didn't know much about gerrit management
[10:09:32] so I may not have asked to be created correctly
[10:23:51] screw it, filed https://phabricator.wikimedia.org/T260342 to ask for gerrit-manager access
[10:39:48] marostegui: https://phabricator.wikimedia.org/P12249 this is the final list but for non-abstract tables (around 70% of them)
[10:40:03] I'm running it on abstract tables now, it'll take until tomorrow to finish
[10:40:30] Amir1: should we add that to the task you created?
[10:40:35] that last paste I mean
[10:40:37] DBA: All sorts of random drifts in wikis in s3 - https://phabricator.wikimedia.org/T260111 (Ladsgroup) The final list on non-abstract tables: {P12249}
[10:40:38] ah, you just did :)
[10:40:45] Yup :D
[10:40:52] that's going to be fun XD
[10:44:39] specially for you, I'm already grabbing my popcorn
[10:44:45] hahaha
[10:45:06] honestly we could even automate this one day for small wikis
[10:45:20] kormat: ^
[10:45:24] * marostegui hides
[10:45:38] * kormat hunts down marostegui
[10:45:56] Amir1: yeah, I was talking a few days ago with kormat about schema changes automation, starting with simple changes and small wikis
[10:45:59] If you give me root, I can do it too
[10:46:02] haha
[10:46:04] * Amir1 is always happy to help
[10:46:37] with abstract schema, it'll be a piece of cake
[10:46:52] indeed, should be a lot easier
[10:47:06] but it is still hard for big tables, big wikis, etc
[10:47:23] That's why I was discussing with kormat that we need to start with simple changes, with no many drifts and all that
[10:47:34] yeah
[10:48:20] marostegui: btw, I haven't checked any drift in data type or size of fields yet (anywhere in production). Once MCR is clean, I start on that
[10:48:41] yeah, let's leave that for a second round
[11:51:30] DBA: Remove muswiki and mhwiktionary from s3 - https://phabricator.wikimedia.org/T260112 (Marostegui) No errors so far, for the last 2 days. If all continues like this, on Monday I will remove the tables and the database from s3.
[12:44:09] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui)
[13:28:09] kormat: I've split the 2 patches into 3, I think it is clearer this way
[13:28:27] * kormat blinks
[13:29:37] ah
[13:30:07] I have a nitpick of mine that I don't like to rename/move and edit a file on the same patch (creating many smaller patches)
[13:30:12] but that is me
[13:30:23] unless there is a reason for it
[13:30:46] but I don't treat HEAD all the time as working state (I don't like large patches)
[13:31:04] only at the end of the patch-chain
[13:31:14] responded on gerrit
[13:32:28] my counter argument is none of this is "working" ATM
[13:33:39] but I agree it is an issue
[13:33:51] but I would prefer then to merge all 3 patches into 1
[13:34:11] i would not review that CR
[13:34:24] that's far too much in one go
[13:34:45] I agree, but then we are in a deadlock :-D
[13:34:54] we need manuel to vote
[13:35:54] or give a 3rd way
[13:36:34] can you please check anyway the other patch
[13:36:45] specially touching WMFMariaDBpy class
[13:36:55] I want to know your high level thoughts
[13:37:59] https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/619962/6/wmfmariadbpy/WMFMariaDB.py
[13:38:09] this diff^
[13:38:37] (I don't have an issue to split that away too)
[13:39:15] and basically the structure and dependencies created
[14:02:51] i agree that the repo should not be left in a broken state after a commit
[14:03:02] so in the case of moving files around, that does mean imports need to be fixed in the same commit
[14:03:10] if there were CI, it would otherwise be voting negatively as well
[14:03:34] obviously, don't do more changes than the bare minimum to keep things working as stuff moves around, but do at least that
[14:03:35] but it would vote negatively after that 1 line fix
[14:03:47] that is why I say I would agree, and merge the full change into 1
[14:04:21] other than that, keep changes small and manageable
[14:05:15] my question is- if I import something from puppet to another repo
[14:05:34] do I import it fixed, or do I copy and then fix in place? that is my main question
[14:06:10] you could do that in a separate branch if you really wanted
[14:06:20] ok, that works for me
[14:06:30] but the main dev branches should stay clean
[14:06:36] and work after each commit
[14:06:48] I merge "unchanged" on a separate branch (not head)
[14:06:56] and then I merge to head once fixed
[14:08:02] yes, and that probably means squashed then
[14:08:28] so best of both worlds
[14:08:35] we track better the small changes
[14:08:39] but we never break head
[14:10:00] not that there's really a relationship between the import branch and the original (puppet?) repo of course
[14:11:25] jynus: the question is then how you get your changes into the master branch
[14:11:37] that can involve non-trivial complexity
[14:11:53] for this case?
[14:12:14] it's not clear to me what you actually want to do, here
[14:12:37] I just want to separate the backups scripts on its own module
[14:12:56] that's https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/619958
[14:13:08] importing scripts from puppet is https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/620005/
[14:13:22] the other patches are to "make them work"
[14:13:37] which "them"?
[14:13:40] that is why I wasn't worried about breaking them
[14:13:47] because they were broken in the first place
[14:14:02] these ones https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/619962/7
[14:14:25] I think it would be good to get some CI test coverage in place for the repo
[14:14:32] mark: bigtime
[14:14:47] and then have the rule that tests must always pass for each commit on the main dev branch
[14:14:47] sure, but first let me make them work
[14:16:01] that an packaging is why I am separating them
[14:16:03] *and
[14:16:16] so we have a mostly-self sustained module to test and package
[14:16:31] so your proposal is to import 2 python scripts from puppet into a separate branch, then change the... 6 chars in each to fix the import path, and then merge that into master?
[14:16:48] it is not 6 chars
[14:16:55] it is more than that, see: https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/619962/7
[14:17:17] it is over 300 lines of code per file
[14:17:42] after that is when I consider them "healthy"
[14:17:51] ahh. because they have embedded libraries in them?
[14:18:21] can you see why i am a bit worried to push them to head after just 1 line change?
[14:18:39] but they depend on wmfmariadbpy a lot
[14:18:49] so we reach a "sync" to the current version
[14:18:54] and then we go hand in hand on head
[14:18:59] tests, etc.
[14:19:08] I can even add tests on a branch
[14:19:12] no problem
[14:19:35] but at the very least they should be what I call "executable" :-D
[14:19:52] jynus: i was in the exact same situation in https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/618310
[14:20:26] kormat: and you did single commit, which I +1ed and also ok with
[14:20:31] for me
[14:21:18] let me know what do you think we should do in this case
[14:21:40] i'd do the same thing
[14:21:41] honestly, I don't mind being less than perfect
[14:21:51] given the early stages of development
[14:22:10] this is almost as a "first repo setup"
[14:22:43] ok, let me make a concrete proposal:
[14:22:47] I hear
[14:23:04] - fix the imports in https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/619958/ and let's get it merged
[14:23:19] and then?
[14:23:31] - remove the embedded libraries when copying the scripts in from puppet in https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/620005/, and merge that
[14:23:49] - and finally, refactor the libraries like you're doing in https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/619962/
[14:24:05] ok, so basically that is moving small pieces of 619962 into the other 2, correct?
[14:24:21] the tree is in a working state at all stages, the CRs are self-contained, and easy to review
[14:24:24] yep
[14:24:30] ok, seems fair
[14:24:36] great :)
[14:24:37] but
[14:24:42] with a condition :-P
[14:24:48] please review https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/619962/7
[14:24:56] on its later status :-D
[14:25:08] which would be almost the same but a smaller diff
[14:25:37] I can even split it if you want to modify WMFMariaDB on a previous change
[14:25:58] 0:-)
[14:28:18] the change in that CR to WMFMariaDB.py LGTM
[14:28:43] it'll be easier to review the rest of it when the CRs are in their final form though
[14:28:53] yeah, ofc
[14:29:00] I meant after the changes you requested
[14:29:21] ahh, ok
[14:29:28] i'm happy to review any/all CRs
[14:30:59] I checked and there is no path change on https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/620005
[14:31:19] the only change there is the WMFMariaDB api change
[14:31:45] so that could got right now
[14:31:49] **go
[14:32:28] Should I move the wmfmariadbpy change there?
[14:33:02] let me show it so it is easier to follow
[14:37:25] https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/620005 can come in patch1 or patch2 versions, I think you want patch2, and we can merge that right away
[14:47:54] kormat: the latest version is ready
[14:49:07] i left a couple of minor comments
[14:51:42] fixed them already
[14:55:08] the commit message proposal contained 4 fixes
[14:56:24] let me see, I may have only seen 1
[15:08:37] kormat: so ok to merge all 3 (because only the 2nd is impacting to wmfmariadb) or do I wait for the other 2 reviews?
[15:10:29] i would prefer to get my outstanding CR merged first
[15:10:37] which one?
[15:10:41] https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/619476
[15:10:55] sure, go ahead
[15:11:32] do you need my explicit +1?
[15:11:59] for a change this big, having a review would be good
[15:13:39] I am on it
[15:13:51] but it is only big because it contains changes from transfer.py
[15:32:43] jynus: oh sorry - i meant research after this CR is merged. we'll want it anyway for `wmfbackup`
[15:33:48] that's ok, that was exactly what I suggested
[15:33:57] but see my last comment
[15:34:05] re: docs, yeah seen
[15:34:27] the other is just adding multiple entry points for ci
[15:34:36] if we needed them to be independent
[15:35:16] so the reason I didn't have a look at this before is because I was convinced this was already merged
[15:39:05] ah :)
[16:24:31] DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (RobH)
[16:24:40] DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (RobH)
[16:39:58] DBA, DC-Ops, Operations, ops-codfw: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (RobH)
[16:40:09] DBA, DC-Ops, Operations, ops-codfw: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (RobH)
[20:13:42] DBA, cloud-services-team (Kanban): Add visitingwatchers to watchlist_count - https://phabricator.wikimedia.org/T150547 (Bstorm)
[20:34:44] hello, is a DBA around to drop a table from m2 by any chance?
[20:34:59] i could get root but i'd rather ask first
[20:49:34] PROBLEM - 5-minute average replication lag is over 2s on db1121 is CRITICAL: 3.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1121&var-port=9104&var-dc=eqiad+prometheus/ops
[20:52:30] RECOVERY - 5-minute average replication lag is over 2s on db1121 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1121&var-port=9104&var-dc=eqiad+prometheus/ops
[21:02:37] o/
[21:03:07] Killing my import command.
[21:03:07] dpifke: hi!
[21:03:11] dpifke: hey there
[21:03:18] so there is an alert about replication right before you joined
[21:03:33] Sorry I skipped ahead.
[21:04:05] I saw the table disappear so I recreated it and started reloading the rows.
[21:04:20] That's stopped now, will let you give an all clear before I continue.
[21:04:26] mutante: thanks for pointing that out
[21:04:33] dpifke: ok great, thanks
[21:07:54] dpifke: do you have the schema you used handy?
[21:10:19] Yeah, let me paste it somewhere.
[21:10:31] thanks
[21:11:51] https://phabricator.wikimedia.org/P12255
[21:14:23] ok, phew. replication is working again
[21:14:53] i forgot just how dumb mysql replication is
[21:14:55] yea, icinga recovery in the other channel
[21:15:08] (surprise: it's super super dumb)
[21:15:41] Heh. :)
[21:18:04] ok! everything should be fine now
[21:18:21] dpifke: please proceed
[21:18:30] i'll hang out here for a little while just in case
[21:18:30] Cool, restarting the import.
[21:19:58] Appreciate that. Import is running.
[21:22:30] Confirmed new rows from mwdebug1001 are making it in, so the schema fix solved the issue that prompted this exercise.
[21:22:38] \o/
[21:31:06] Import just completed successfully.
[21:31:15] flawless victory
[21:31:35] Indeed! Thanks for the help.
[21:31:41] you're welcome :)
[21:32:02] :)
[21:36:35] mutante: also, thank you for reaching out :)
[21:38:12] kormat: thank you too. a gut feeling told me to ask because of replication i guess