[07:01:54] 10DBA, 13Patch-For-Review: Fix dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T130128#2960503 (10Marostegui) All good! ``` root@dbstore2001:~# mysql --skip-ssl -e "show all slaves status\G" | egrep "Connection|Seconds" Connection_name: m3 Seconds_Behind_Master: 0...
[07:03:58] 10DBA, 06Operations, 10Wikimedia-General-or-Unknown: Spurious completely empty `image` table row on commonswiki - https://phabricator.wikimedia.org/T155769#2960504 (10Marostegui) >>! In T155769#2956568, @Legoktm wrote: > Let's just delete it? Seems similar to T96233. If you guys consider it is safe to delet...
[07:12:22] 10DBA, 06Labs, 10Tool-Labs: Reset password for database user p50380g50491 - https://phabricator.wikimedia.org/T155902#2960511 (10Marostegui) We cannot recover the passwords, as they are hashed. Probably the best shot here is to reset the password and regenerate your .my.cnf. I am sure that is done with a scri...
[07:48:21] 10DBA, 13Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418#2960526 (10Marostegui) I have deployed the puppet change to enable gtid_domain_id on m4, which is eventlogging. I have manually enabled it after running puppet on those hosts (db1046 and db1047...
[08:31:38] 07Blocked-on-schema-change, 10DBA, 06Collaboration-Team-Triage, 10Flow, and 3 others: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819#2960555 (10Marostegui) I have set up the flowdb (with the current data) on a test environment to make sure that adding the PK on the ma...
[08:57:06] 10DBA, 06Labs, 10Tool-Labs: Reset password for database user p50380g50491 - https://phabricator.wikimedia.org/T155902#2960566 (10yuvipanda) 05Open>03Resolved a:03yuvipanda Users of the pXXXgXXX variety have been deprecated for a long time now. I see there's a replica.my.cnf file for tb-dev with credent...
[08:57:48] 10DBA, 06Labs, 10Tool-Labs: Reset password for database user p50380g50491 - https://phabricator.wikimedia.org/T155902#2960569 (10Marostegui) Thanks Yuvi!!
[09:05:44] 10DBA: Meta ticket: Deploy InnoDB compression where possible - https://phabricator.wikimedia.org/T150438#2960578 (10Marostegui) I have started compression on s2 (on dbstore2001)
[09:42:18] jynus: let me know when you're around, to talk about the toolsdb migration
[10:34:31] jynus, marostegui: wanna talk to you about something when you're around
[10:34:59] paravoid, I am around
[10:35:11] hi
[10:35:15] so, not sure if you saw
[10:35:25] https://phabricator.wikimedia.org/T155875
[10:35:42] the C2 switch rebooted again on Friday night
[10:35:42] no
[10:35:57] it's still being investigated but the cause is largely unknown
[10:36:12] it may be a junos bug or a hardware issue
[10:36:23] if it's the latter we'll replace it (= downtime)
[10:36:24] but that is related to labstore1004 ?
[10:36:29] not to databases?
[10:36:35] it may not happen again, or it may (in which case, downtime)
[10:36:52] labstore1004 is just an interesting host on that switch, but the switch has a bunch of other servers connected to it
[10:37:04] ok
[10:37:09] including...
[10:37:13] 04:32 < volans> S1 master and both S2 API servers
[10:37:13] 04:33 < volans> and all S1 watchlist, recentchanges, contributions, logpager slaves
[10:37:22] https://racktables.wikimedia.org/index.php?page=rack&rack_id=1957 is the full list
[10:39:03] so what are your recommendations, should we avoid using those hosts for critical roles
[10:39:15] or do nothing for now
[10:39:18] yeah, I'd like to see how possible that would be on short notice
[10:39:33] relatively easy
[10:39:51] is an s1 master switchover relatively easy these days?
[10:39:53] that's great to hear :)
[10:40:05] easy doesn't necessarily mean fast
[10:40:36] I have to check each node individually
[10:40:55] ok
[10:40:58] let's do that
[10:41:15] there are 12 nodes
[10:41:41] having the s1 master be connected to a switch that has rebooted twice for unknown reasons in the past 2 weeks doesn't help me sleep at night :P
[10:41:58] I have to see what each of these 12 is doing and make sure they are not used for anything that is not automatically depooled
[10:42:12] ok
[10:42:23] paravoid, I am really, really thankful
[10:42:30] for communicating such an issue
[10:42:38] it makes things much easier for us
[10:42:45] heh
[10:42:47] ok :)
[10:43:02] I will take this as high priority, as not emergency
[10:43:06] *but
[10:43:19] to make sure I do not rush and do things properly
[10:43:34] alright
[10:43:48] so far the issue has been that the switch reboots, so we lose connectivity for something like 5 minutes
[10:44:00] so long?
[10:44:01] that doesn't mean it will be the case if it happens again of course
[10:44:23] if it's a hardware issue it might die entirely, who knows really
[10:44:25] it's all very blurry :(
[10:44:40] so let's prepare for the worst
[10:44:47] strange, 5 minutes should normally create lots of complaints
[10:44:57] but I haven't seen such
[10:45:01] hey guys, sorry, I was having a sandwich, I am back now
[10:45:16] hi ;)
[10:46:44] oh, that one again :(
[10:47:49] will obviously discuss it more tonight at our meeting
[10:48:06] but I wanted you guys to get a heads-up
[10:48:18] while it's still your working day even :)
[10:48:28] thanks :)
[10:48:36] paravoid is there a replacement switch around?
[10:48:38] let's make a plan, if you are around
[10:48:55] marostegui: we do have spare switches yes, but replacing it is not trivial _at all_
[10:49:12] paravoid, maybe we can discuss it with you before doing anything?
[10:49:23] but after the plan is done?
[10:49:39] happy to
[10:49:48] I see :)
[10:49:51] well, not discussing, just like a "this is the plan", ok?
[10:49:59] sure
[10:50:46] marostegui: it's not super difficult, but the way it works in our configuration is that each row's switches (so 8 switches in total) are stacked together to form a so-called "virtual chassis"
[10:51:49] which is proprietary tech that essentially moves the switches' logical plane into a common one
[10:52:12] this means that those switches share state and thus a) have to run the exact same version of the software (so we have to find and install that in the replacement switch)
[10:52:27] and b) replacing it means replacing it in the virtual chassis via some configuration commands etc.
[10:52:51] if this is not a hardware bug but rather a software bug, this may mean that the whole stack's state is corrupted in some way
[10:52:57] so even that replacement won't cut it
[10:53:52] oh, I see. Thanks for the explanation
[10:54:05] Indeed sounds complicated and not easy/fast
[10:55:40] no worries
[10:57:32] paravoid: jaime and myself have our weekly meeting in 3 minutes, so we will discuss this and let you know :)
[10:57:44] ok :)
[10:58:21] marostegui: jynus do discuss the toolsdb stuff too :D (if you haven't already planned to!)
[11:09:51] paravoid, one question
[11:09:57] shoot!
[11:10:07] let's assume we mitigate
[11:10:14] no SPOF, etc.
[11:10:44] we can even, from our POV, lose connectivity of the entire rack
[11:10:53] what are the next steps?
[11:11:06] because depending on that
[11:11:13] we would either relocate servers
[11:11:28] or just depool them for days or weeks
[11:12:05] we haven't decided for sure yet, it can be either one of "wait to see if that happens again" or "replace the switch"
[11:12:27] let me say why I am asking
[11:12:37] and maybe that may help you answer
[11:12:38] how long do you think replacing the switch can take? 1 day? 2 days?
[11:12:38] I'd be more inclined to say to take the risk and just wait a few weeks to see if we get a spontaneous reboot again
[11:12:52] marostegui: nah, maybe half an hour or something like that
[11:12:54] there are servers that we can just stop using
[11:12:56] ah gotcha
[11:13:04] but obviously
[11:13:13] if it will take long
[11:13:20] marostegui: (that's the time to execute, not plan it)
[11:13:30] we will want them in use
[11:13:56] so that will decide between relocating them physically (without hurry)
[11:14:09] or just keeping them down
[11:14:53] in fact, several of them we may want moved anyway
[11:15:08] because they were not chosen wisely for HA
[11:15:10] can't we just keep them pooled if they're slaves?
[11:15:18] we should probably relocate two of them + switchover s1 master, don't you think jynus?
[11:15:31] not if they happen to have the same role and are special slaves
[11:15:53] ^ see manuel's suggestion
[11:15:56] right
[11:16:10] they should depool automatically
[11:16:27] but we would be in a soft degraded state
[11:16:36] right now we have 3 SPOF: we would lose s1 recentchanges service and s2 api, and s1 master
[11:16:46] I use "SPOF"
[11:16:59] because the service would be served by the other slaves
[11:17:03] automatically
[11:17:09] so if we just move one s1 rc server + 1 s2 api we would at least run into degraded service, but not a completely down service
[11:17:23] but when we had issues, because api and rc queries are so costly
[11:17:27] they create issues
[11:17:39] (which is why they are separated in the first place)
[11:17:46] so for 5 minutes it is not an issue
[11:17:55] for longer, it is
[11:18:19] it never goes down, marostegui
[11:18:33] but it switches to main server weight
[11:18:38] but can we overload the other hosts with those queries?
[11:18:49] it depends on the shard
[11:18:58] normally, it is the other way around
[11:19:10] api and rc queries get very slow
[11:19:32] I think s2 would survive, api doesn't require special slaves
[11:19:43] that's true
[11:19:43] rcs will break for s1 for sure, because I tested it
[11:20:11] so the fastest thing is probably to ask christ to move one of them somewhere else?
[11:20:14] *chris
[11:20:22] special slaves are critical for s1, s4 and s5
[11:20:31] yes
[11:20:52] depool, move 2 servers around, switchover db1057
[11:22:34] that rack and the following one had all servers in a row
[11:23:21] I think the other problematic one was D
[11:23:42] D1
[11:23:42] D1: Initial commit - https://phabricator.wikimedia.org/D1
[11:24:38] I would literally move db1052 away, too
[11:24:55] to another rack?
[11:25:45] I have added possible destination racks to the etherpad
[11:25:49] 10DBA, 10Cognate, 10Wikidata, 15User-Addshore, 03WMDE-QWERTY-Team-Board: Cognate DB review - https://phabricator.wikimedia.org/T148988#2960961 (10Addshore) @jcrespo would it be possible for you to confirm the review has happened / close this ticket? :)
[11:26:07] .28
[11:26:12] :-/
[11:27:17] what?
[11:27:23] we do not have a good candidate
[11:27:51] to switch over to s1?
[11:27:59] yeah
[11:28:11] 52- too new
[11:28:15] what about db1063 or db1067? (they would need to get the needed data from s1 though)
[11:28:20] 51-66, special
[11:28:52] 72 .28
[11:29:21] actually, 72 would not be that bad
[11:29:37] maybe we should upgrade the others
[11:29:51] 72 has been api for weeks
[11:30:02] without rewrite, and caused no issues
[11:30:04] 72 is vslow isn't it?
[11:30:28] yes, but that is easy to move
[11:30:43] I would trust 52 more, however
[11:30:47] slower
[11:30:53] and has been the master previously
[11:31:00] so lower risk
[11:31:41] we just need to upgrade the api servers
[11:31:47] it is running RBR remember
[11:31:50] (just saying)
[11:32:04] and?
[11:32:43] nothing, just saying that it is running RBR whereas the current master runs SBR
[11:32:50] I think we have other jessies as masters
[11:33:11] s2, s3, s4
[11:34:57] ok - check out the initial idea on the etherpad
[11:35:04] to see if that sums up all the stuff
[11:35:47] we need to upgrade the apis as a blocker
[11:36:03] I do not trust 28 -> 16 replication
[11:37:16] noted
[12:38:03] 10DBA, 06Labs, 10Tool-Labs: Reset password for database user p50380g50491 - https://phabricator.wikimedia.org/T155902#2961051 (10Tb) Okay ta. Can you GRANT ALL to s51111 on the databases below please. I'll migrate them to the proper names over the next few weeks and raise a new ticket to drop p50380g50491'...
[12:41:05] 10DBA, 06Labs, 10Tool-Labs: Reset password for database user p50380g50491 - https://phabricator.wikimedia.org/T155902#2961052 (10yuvipanda) 05Resolved>03Open
[13:40:52] 10DBA, 06Labs, 06Operations, 10netops: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2961118 (10Marostegui)
[13:55:43] thanks folks
[13:55:47] that's awesome
[13:58:11] ^ these servers will likely be depooled for a few days, right? if so, that sounds like a nice opportunity to reboot these into the latest jessie/trusty kernels, let me know if I can help with that
[14:46:18] marostegui: do you have time to take a quick look at https://phabricator.wikimedia.org/T155902#2961051?
[14:52:10] am I here?
[14:52:26] or is my client lying again?
[14:53:17] yuvipanda: hi you're here
[14:55:18] ok!
[14:55:59] hey yuvipanda - is that something we have to do or you guys (labs) do?
[14:56:56] marostegui: I'm not sure - we haven't done this before for a while, because we moved people off the old style accounts years back. This must've been missed
[14:57:06] marostegui: how about I make the SQL and verify with you before running it myself?
[14:57:59] yuvipanda: sure, I didn't mean like you do it yourself, I was just checking if that is something we, DBAs, do or it is normally done by labs. The SQL isn't too hard :)
[14:58:06] I was not tracking to slack off :)
[14:58:25] marostegui: np! I'm unsure too - usually we're the ones who deal with most access stuff, I guess.
[14:58:51] seems like whoever writes down the logic a +1 from the other is good enough?
[14:58:57] for a happy marriage
[14:59:26] hahaha
[14:59:30] chasemp: haha, everywhere I go people bring up marriage ;)
[14:59:41] heh
[14:59:48] yuvipanda, let's talk labsdb1005
[15:01:18] hi jynus
[15:01:26] hi
[15:02:35] jynus: I've a short meeting starting shortly though.
[15:02:49] ok
[15:04:46] 10DBA, 10MediaWiki-Special-pages, 10Wikimedia-Site-requests, 13Patch-For-Review, 07Wikimedia-log-errors: "Invalid DB key" errors on various special pages - https://phabricator.wikimedia.org/T155091#2961348 (10TTO) >>! In T155091#2959031, @TTO wrote: > We could also do with a script that handles categorie...
[15:08:45] yuvipanda: let's check the grant thingy once you are back from the meeting then :)
[15:15:23] 10DBA, 06Operations, 10ops-eqiad: Move db1051 to row D2 - https://phabricator.wikimedia.org/T156004#2961384 (10Marostegui)
[15:16:39] 10DBA, 06Operations: Reimage db1065 and db1066 - https://phabricator.wikimedia.org/T156005#2961400 (10Marostegui)
[15:20:53] 10DBA, 06Operations, 10ops-eqiad: Move db1051 to row D2 - https://phabricator.wikimedia.org/T156004#2961418 (10Marostegui)
[15:21:51] 10DBA, 06Operations, 10ops-eqiad: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961422 (10Marostegui)
[15:24:39] 10DBA, 06Operations, 10netops: Switchover s1 master db1057 -> db1052 - https://phabricator.wikimedia.org/T156008#2961450 (10Marostegui)
[15:24:50] 10DBA, 06Labs, 06Operations, 10netops: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2961118 (10Marostegui)
[15:26:50] 07Blocked-on-schema-change, 10DBA, 06Collaboration-Team-Triage, 10Flow, and 3 others: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819#2961481 (10Marostegui) I have taken a backup of flowdb - `db1031:/srv/tmp/flowdb_23_jan.sql` and going to start altering the tables now.
[15:27:46] before you start with flow
[15:28:05] how can I help with asw tasks?
[15:28:21] you want to reimage the api hosts?
[15:28:28] or one of them?
[15:28:30] ok
[15:28:32] and I can take care of the other one?
[15:28:37] yes, only one first
[15:28:51] yes, of course
[15:29:47] 10DBA, 06Operations: Reimage db1065 and db1066 - https://phabricator.wikimedia.org/T156005#2961499 (10jcrespo) a:03jcrespo
[15:30:44] thanks!
[15:31:16] https://phabricator.wikimedia.org/T150206#2860862 :-~
[15:32:28] uf :(
[15:32:37] hp again :(
[15:41:48] 10DBA, 06Operations, 10ops-eqiad: Move db1051 to row D2 - https://phabricator.wikimedia.org/T156004#2961542 (10faidon) D 2 is a potentially bad choice, as it will soon become a 10G rack once the T148506 migration is (finally…) done. Any other rack except D 2 & D 7 (new 10G) and D 6-8 (current 10G) is probabl...
[15:42:03] marostegui, I may be wrong, but shouldn't T156008
[15:42:04] T156008: Switchover s1 master db1057 -> db1052 - https://phabricator.wikimedia.org/T156008
[15:42:14] be stalled
[15:42:27] but T156006 ASAP
[15:42:28] T156006: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006
[15:43:04] jynus: yes, I stalled the db1052 one in order to avoid chris having all of a sudden two tasks in high priority
[15:43:15] but yes, we can de-stall it
[15:43:31] ok, what is the other high?
[15:43:48] because I just asked for this to be high, I am a bit lost
[15:43:52] marostegui: I can move all of the servers tomorrow or do you want to schedule for later this week?
[15:44:06] tomorrow would be great :)
[15:44:22] s1-master higher priority than the rest?
[15:44:35] 10DBA, 06Labs, 06Operations, 10netops: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2961547 (10Marostegui)
[15:44:38] 10DBA, 06Operations, 10ops-eqiad: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961546 (10Marostegui) 05stalled>03Open
[15:45:08] perfect....I will get in there early so we can get that done first thing (first thing here)
[15:45:22] marostegui: GRANT ALL PRIVILEGES ON p50380g50491_common.* TO s51111; and repeat for all dbs
[15:45:25] thank you very much, really appreciate it!
[15:45:31] thanks chris!!
[15:45:37] see my comment on T156004 though
[15:45:37] T156004: Move db1051 to row D2 - https://phabricator.wikimedia.org/T156004
[15:45:44] D2 might not be the best choice
[15:45:45] D2: Add .arcconfig for differential/arcanist - https://phabricator.wikimedia.org/D2
[15:45:47] paravoid: yep, I am looking for a new rack
[15:45:51] cool
[15:45:53] yuvipanda: give me a sec :)
[15:46:11] thanks guys, I feel so much better about all this
[15:46:33] paravoid, is there a place to easily know network "anomalies"?
[15:46:35] marostegui and jynus plan for 1400UTC
[15:46:45] jynus: what kind of anomalies? :)
[15:46:49] like dedicated labs racks
[15:46:53] 10DBA, 06Operations, 10ops-eqiad: Move db1051 to row D2 - https://phabricator.wikimedia.org/T156004#2961548 (10Marostegui) Thanks Faidon - let's go for B3 then?
[15:46:54] cmjohnson1: sounds awesome!
[15:47:00] or different speed switches
[15:47:13] ah that kind of thing
[15:47:15] I guess not, no
[15:47:16] or anything "different"
[15:47:26] ok, we will ping you for now :-)
[15:47:31] sorry about that
[15:47:31] 07Blocked-on-schema-change, 10DBA, 06Collaboration-Team-Triage, 10Flow, and 3 others: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819#2961551 (10Marostegui) Both tables have been successfully altered ``` root@PRODUCTION x1[flowdb]> show create table flow_topic_list\G...
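For context on the T149819 notification above, the alter being run would presumably be along these lines — a minimal sketch only, assuming the primary key is built from the table's existing columns; the exact column list is not quoted from the task:

```sql
-- Minimal sketch of the kind of change in T149819 (assumed column list,
-- not taken from the task): add an explicit primary key to flow_topic_list.
ALTER TABLE flow_topic_list
    ADD PRIMARY KEY (topic_list_id, topic_id);
```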
[15:47:48] this case is especially weird
[15:47:54] paravoid, thanks for this
[15:47:56] we're in the middle of a switch transition
[15:48:09] we actually spotted many issues in existing infrastructure
[15:48:14] 07Blocked-on-schema-change, 10DBA, 06Collaboration-Team-Triage, 10Flow, and 3 others: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819#2961579 (10Marostegui) 05Open>03Resolved
[15:48:17] in terms of physical distribution
[15:48:18] 10DBA, 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure, 10MediaWiki-Database, and 2 others: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255#2961580 (10Marostegui)
[15:48:25] which includes changing the position of the 10G switch, so the current status is even more special than usual :(
[15:48:39] 10DBA, 06Operations, 10ops-eqiad: Move db1051 to row B3 - https://phabricator.wikimedia.org/T156004#2961584 (10Marostegui)
[15:48:53] 10DBA, 06Operations, 10ops-eqiad: Move db1051 to row B3 - https://phabricator.wikimedia.org/T156004#2961586 (10faidon) Sure, no objections from a networking/DC design perspective :)
[15:48:56] we will seek your ok for the movements
[15:49:01] \o/
[15:49:26] yuvipanda: labsdb1001 and 1003?
[15:49:51] paravoid, in the future I will ask for a single db server on the 10G, but that is for later
[15:49:57] marostegui: yup
[15:50:00] a bunch of dbs
[15:51:37] 10DBA, 06Operations, 10ops-eqiad: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961587 (10jcrespo) ``` marostegui and jynus plan for 1400UTC ```
[15:52:16] jynus: yeah that's fine, we'll have lots of 10G ports
[15:52:27] jynus: but 10G servers go to specific racks in the DC
[15:52:35] this is just the backup server
[15:52:46] the others do not saturate even 1G
[15:53:00] aka the recovery server
[15:55:40] 07Blocked-on-schema-change, 10DBA, 06Collaboration-Team-Triage, 10Flow, and 3 others: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819#2764089 (10jcrespo)
[15:55:44] 10DBA, 10MediaWiki-Database, 13Patch-For-Review, 07PostgreSQL, 07Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#2961604 (10jcrespo)
[15:55:46] yuvipanda: from the databases he is listing on the ticket, only the *_common exists on labsdb1001 and 1003
[15:56:10] am I missing something?
[15:57:26] 10DBA, 10MediaWiki-Database, 13Patch-For-Review, 07PostgreSQL, 07Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#197580 (10jcrespo)
[15:57:29] 07Blocked-on-schema-change, 10DBA: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#2961632 (10jcrespo)
[15:57:35] marostegui: oh, interesting. let me see where s1 and s3 point to
[15:57:45] marostegui: maybe they were on labsdb1002 which would be dead
[15:58:04] yuvipanda: only p50380g50491_common exists
[15:58:27] ^did you check toolsdb, too?
[15:59:01] No, I thought it was only labs boxes involved. Let me check that. Thanks
[15:59:30] the user didn't specify toolsdb tho
[15:59:32] They are not there either
[15:59:59] oh well. I guess they went away for a really long time.
[16:00:07] I'll respond to them on the bug, marostegui
[16:00:21] yuvipanda: ok thanks - I can grant that for the one that works if you like
[16:00:34] marostegui: if you're already there, please do!
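For reference, the checks and grant discussed in the exchange above would presumably boil down to something like this sketch on labsdb1001/labsdb1003 — only the GRANT itself is quoted from the conversation; the schema-existence query is an illustrative assumption:

```sql
-- Check which of the requested old-style databases actually exist
-- (illustrative; not a command taken from the log).
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name LIKE 'p50380g50491%';

-- The grant yuvipanda proposed, repeated per existing database;
-- only p50380g50491_common existed on labsdb1001 and labsdb1003.
GRANT ALL PRIVILEGES ON `p50380g50491_common`.* TO 's51111';
```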
[16:01:33] you can blame me if that helps
[16:01:39] yuvipanda: done for 1001 and 1003
[16:01:39] :-)
[16:01:42] haha
[16:01:51] marostegui: <3 thanks.
[16:02:01] jynus: blaming the DBA is a time honored wikimedian tradition!
[16:02:12] nah
[16:02:33] we know it is all "because labs" :-P
[16:04:13] jynus: :D
[16:05:16] please tell me when you finish your meeting to give you good news about mysql 5.5
[16:08:52] 10DBA, 06Operations, 10ops-eqiad: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961687 (10Marostegui) Reminder: - we need to silence labsdb1009, 1010 and 1011 as they replicate from it. - we need to stop replication on them - once it is back, we need to repoint them to the new IP
[16:12:11] jynus: I'm finished!
[16:12:17] jynus: we're going into another meeting in 10 minutes tho
[16:12:20] jynus: what's the good news about mysql 5.5?
[16:13:45] 10DBA, 06Operations, 10ops-eqiad: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961719 (10jcrespo) > we need to silence labsdb1009, 1010 and 1011 as they replicate from it. But they replicate from db1094 :-/
[16:14:29] 10DBA, 06Operations, 10ops-eqiad: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961733 (10Marostegui) >>! In T156006#2961719, @jcrespo wrote: >> we need to silence labsdb1009, 1010 and 1011 as they replicate from it. > > But they replicate from db1094 :-/ Gah, sorry! Yes, I meant db109...
[16:14:57] jynus: thanks for that ^ I was looking at 1009,1010 for the previous ticket with yuvi and my mind just moved into that when writing that comment!
[16:16:39] I do not intend to be pedantic, but that could either mean I am not understanding the task
[16:17:01] no no, you are right
[16:17:05] or create a misunderstanding even for yourself, when you are handling hundreds of servers
[16:17:07] db1052 is db1095's master, not labs
[16:18:06] on purpose, those servers do not have alerts precisely for these cases
[16:18:42] I also made a mistake, with db1094-95
[16:19:08] this can lead me to reboot the wrong servers
[16:22:40] 10DBA, 06Operations, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2961764 (10jcrespo)
[16:22:43] 10DBA, 06Operations: Reimage db1065 and db1066 - https://phabricator.wikimedia.org/T156005#2961763 (10jcrespo)
[18:10:36] 10DBA, 06Operations, 10Wikimedia-General-or-Unknown: Spurious completely empty `image` table row on commonswiki - https://phabricator.wikimedia.org/T155769#2962307 (10matmarex) >>! In T155769#2960504, @Marostegui wrote: > If you guys consider it is safe to delete, go ahead, but please remember to use mediawi...
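On the "repoint them to the new IP" step from the T156006 reminder earlier in this log, the usual shape of that operation on each affected replica would presumably be something like the sketch below; host and coordinates are placeholders, and the real procedure might rely on GTID rather than explicit binlog positions:

```sql
-- Run on each replica of the server being moved; all values are placeholders.
STOP SLAVE;
CHANGE MASTER TO
    MASTER_HOST = '<new IP of the moved master>',
    MASTER_LOG_FILE = '<Relay_Master_Log_File from SHOW SLAVE STATUS>',
    MASTER_LOG_POS = <Exec_Master_Log_Pos from SHOW SLAVE STATUS>;
START SLAVE;
```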
[18:14:11] when you come back tomorrow, marostegui: "Synchronized wmf-config/db-eqiad.php: Depool db1065 (duration: 00m 39s)"
[18:14:23] that is a point for transparent failover
[19:32:07] 10DBA, 06Community-Tech: Investigation: Add old and new length columns to cu_changes - https://phabricator.wikimedia.org/T155734#2962680 (10MusikAnimal)
[19:33:27] 10DBA, 10CheckUser, 06Community-Tech: Investigation: Add old and new length columns to cu_changes - https://phabricator.wikimedia.org/T155734#2952710 (10MusikAnimal)
[20:18:51] 10DBA, 10CheckUser, 06Community-Tech: Investigation: Add old and new length columns to cu_changes - https://phabricator.wikimedia.org/T155734#2962873 (10MusikAnimal)
[20:19:22] 10DBA, 10CheckUser, 03Community-Tech-Sprint: Investigation: Add old and new length columns to cu_changes - https://phabricator.wikimedia.org/T155734#2952710 (10MusikAnimal) p:05Triage>03Normal
[20:22:15] 10DBA, 10CheckUser, 03Community-Tech-Sprint: Investigation: Add old and new length columns to cu_changes - https://phabricator.wikimedia.org/T155734#2962885 (10MusikAnimal) @jcrespo Sorry for the ping, hoping this one is a simple yes/no/try-this-instead. For most wikis I don't think altering `cu_changes` wil...
[20:25:16] 10DBA, 10CheckUser, 03Community-Tech-Sprint: Investigation: Add old and new length columns to cu_changes - https://phabricator.wikimedia.org/T155734#2962890 (10jcrespo) This is not a simple question, you are basically asking me what I think on a design of a dataset/schema change I do not know much about. I w...
[21:01:53] 10DBA, 06Labs, 10Wikidata, 07Performance, and 3 others: Create a new project in labs for testing RedisLock in Wikidata - https://phabricator.wikimedia.org/T155042#2963041 (10chasemp)
[21:03:17] 10DBA, 06Labs, 10Wikidata, 07Performance, and 3 others: Increase quota for wikidata-dev project - https://phabricator.wikimedia.org/T155042#2931757 (10chasemp)
[21:03:36] 10DBA, 06Labs, 10Wikidata, 07Performance, and 3 others: Increase quota for wikidata-dev project - https://phabricator.wikimedia.org/T155042#2931757 (10chasemp) @Andrew I'll +1 this
[21:49:58] 10DBA, 10CheckUser, 03Community-Tech-Sprint: Investigation: Add old and new length columns to cu_changes - https://phabricator.wikimedia.org/T155734#2963178 (10MusikAnimal) >>! In T155734#2962890, @jcrespo wrote: > This is not a simple question, you are basically asking me what I think on a design of a datas...
[22:16:03] 10DBA, 10CheckUser, 03Community-Tech-Sprint: Investigation: Add old and new length columns to cu_changes - https://phabricator.wikimedia.org/T155734#2963234 (10MusikAnimal)
[22:24:40] 10DBA, 10CheckUser, 03Community-Tech-Sprint: Investigation: Add old and new length columns to cu_changes - https://phabricator.wikimedia.org/T155734#2963266 (10jcrespo) The following may sound like a noob question, but please have into account I wasn't familiar at all with this extension's persistent layer a...
[23:55:34] 10DBA, 10CheckUser, 03Community-Tech-Sprint: Investigation: Add old and new length columns to cu_changes - https://phabricator.wikimedia.org/T155734#2963551 (10MusikAnimal) >>! In T155734#2963266, @jcrespo wrote: > Why are some columns duplicated from rc, log, revisions? Is it because rows don't correspond 1...
[23:56:38] 10DBA, 10CheckUser, 03Community-Tech-Sprint: Investigation: Add old and new length columns to cu_changes - https://phabricator.wikimedia.org/T155734#2963560 (10MusikAnimal)
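The change being investigated in T155734 — adding old and new length columns to `cu_changes`, mirroring `rc_old_len`/`rc_new_len` on `recentchanges` — would presumably look roughly like the sketch below; the column names and types are assumptions for illustration, not the agreed design or a reviewed patch:

```sql
-- Illustrative only: possible shape of the T155734 schema change.
-- Column names/types are assumed (mirroring rc_old_len/rc_new_len).
ALTER TABLE cu_changes
    ADD COLUMN cuc_old_len INT DEFAULT NULL,
    ADD COLUMN cuc_new_len INT DEFAULT NULL;
```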