[05:01:57] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui)
[05:17:34] DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-09-15) rack/setup/install db1150 (see note on hostname) - https://phabricator.wikimedia.org/T260817 (jcrespo) Any update on this? A week has passed beyond the "Need by" date.
[05:21:53] DBA, Operations, Performance-Team, Platform Engineering, User-Kormat: Remove sections from db configs - https://phabricator.wikimedia.org/T263127 (Marostegui) >>! In T263127#6485437, @daniel wrote: > Contributions queries are somewhat special, we may want to keep them separate in case we want...
[05:28:23] DBA, decommission-hardware, Patch-For-Review: decommission es2014.codfw.wmnet - https://phabricator.wikimedia.org/T262889 (Marostegui)
[05:33:45] DBA, decommission-hardware, Patch-For-Review: decommission es2014.codfw.wmnet - https://phabricator.wikimedia.org/T262889 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `es2014.codfw.wmnet` - es2014.codfw.wmnet (**PASS**) - Downtimed host on Icinga...
[05:47:28] how's tendril doing?
[05:47:35] just finished
[05:47:41] should be back
[05:47:59] DBA: tendril_purge_global_status_log_5m and global_status_log needs more frequent purging - https://phabricator.wikimedia.org/T252331 (Marostegui) Tables purged
[05:58:16] DBA, Wikidata, Wikidata-Campsite: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856 (Marostegui) db2084 got those keys dropped: ` root@db2084.codfw.wmnet[wikidatawiki]> show create table wb_changes; +------------+------------------------------------------------------------...
[06:11:12] DBA, decommission-hardware: decommission es2012.codfw.wmnet - https://phabricator.wikimedia.org/T263613 (Marostegui)
[06:11:18] DBA, decommission-hardware: decommission es2012.codfw.wmnet - https://phabricator.wikimedia.org/T263613 (Marostegui)
[06:11:20] DBA, Patch-For-Review: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Marostegui)
[06:11:44] DBA, decommission-hardware: decommission es2012.codfw.wmnet - https://phabricator.wikimedia.org/T263613 (Marostegui) This host has been depooled: ` [06:08:12] <+logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool es2012 for decommmissioning', diff saved to https://phabricator.wikimedia.o...
[06:31:08] DBA, decommission-hardware: decommission es2018.codfw.wmnet - https://phabricator.wikimedia.org/T263615 (Marostegui)
[06:35:04] DBA, decommission-hardware, Patch-For-Review: decommission es2018.codfw.wmnet - https://phabricator.wikimedia.org/T263615 (Marostegui)
[06:35:47] DBA, decommission-hardware, Patch-For-Review: decommission es2018.codfw.wmnet - https://phabricator.wikimedia.org/T263615 (Marostegui)
[06:35:50] DBA, Patch-For-Review: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Marostegui)
[06:35:51] DBA, decommission-hardware, Patch-For-Review: decommission es2018.codfw.wmnet - https://phabricator.wikimedia.org/T263615 (Marostegui) mysql stopped, let's give it some hours before going ahead with the decommissioning.
[06:35:59] DBA, decommission-hardware, Patch-For-Review: decommission es2012.codfw.wmnet - https://phabricator.wikimedia.org/T263613 (Marostegui) mysql stopped, let's give it some hours before going ahead with the decommissioning.
[07:04:29] DBA, Wikidata, Wikidata-Campsite: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856 (Ladsgroup) So tendril doesn't have anything in sampled queries: https://tendril.wikimedia.org/report/sampled_queries?host=^db&user=wikiuser&schema=wik&hours=1 (maybe it's not sampling from...
[07:06:44] DBA, Growth-Structured-Tasks, Growth-Team: Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (Marostegui) >>! In T261411#6479345, @kostajh wrote: > >> If so: >> - What if the script doesn't run/fails? > > Then we wil...
[07:09:02] DBA, Wikidata, Wikidata-Campsite: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856 (Marostegui) I can also try to live capture some on db2084. >>! In T262856#6486318, @Ladsgroup wrote: > ` > > Mostly are straightforward but I actually found a query on master in the c...
[07:11:06] DBA, Wikidata, Wikidata-Campsite: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856 (Marostegui) ` root@db2084.codfw.wmnet[wikidatawiki]> CREATE INDEX /*i*/wb_changes_change_revision_id ON /*_*/wb_changes (change_revision_id); Query OK, 0 rows affected (10.702 sec) Record...
[07:55:36] now that I can see the new packages working WARNING: the backup commands have changed a bit
[07:55:53] I will update docs ASAP
[07:56:44] instead of remote_backup_mariadb.py it is remote-backup-mariadb
[07:56:59] instead of recover_dump.py, it is recover-dump etc
[08:09:05] DBA, Operations, Performance-Team, Platform Engineering, User-Kormat: Remove sections from db configs - https://phabricator.wikimedia.org/T263127 (daniel) >>! In T263127#6486164, @Marostegui wrote: >>>! In T263127#6485437, @daniel wrote: >> Contributions queries are somewhat special, we may w...
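The 07:56 messages give two examples of the backup command rename (remote_backup_mariadb.py to remote-backup-mariadb, recover_dump.py to recover-dump). A minimal sketch of that pattern follows, assuming the "etc" means every script follows the same rule: drop the `.py` suffix and turn underscores into hyphens. The function name `new_command_name` is hypothetical and is not part of the backup packages; only the two names quoted in the log are confirmed.

```python
def new_command_name(legacy_script: str) -> str:
    """Guess the packaged command name for a legacy backup script.

    Assumption: the rename rule is "strip a trailing .py, replace '_' with '-'",
    generalised from the two examples mentioned in the channel.
    """
    # Remove the .py extension if present; leave other names untouched.
    name = legacy_script[:-3] if legacy_script.endswith(".py") else legacy_script
    # Packaged commands use hyphens where the scripts used underscores.
    return name.replace("_", "-")


if __name__ == "__main__":
    for script in ("remote_backup_mariadb.py", "recover_dump.py"):
        print(f"{script} -> {new_command_name(script)}")
```

For any script not named in the log, the result is only a guess and should be checked against the installed package.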
[08:23:56] DBA, Operations, Performance-Team, Platform Engineering, User-Kormat: Remove sections from db configs - https://phabricator.wikimedia.org/T263127 (jcrespo) Daniel: load groups support may be ok- however some research should be done that they *actually* provide a performance benefit. What was...
[08:32:59] DBA, Wikidata, Wikidata-Campsite: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856 (Marostegui) @Ladsgroup db2084 has half the weight it used to, I am capturing live queries arriving to `wb_changes`, so far I haven't found anything strange with their query plans, or extre...
[08:36:15] DBA, PM: Update the DBA task tracking workflow - https://phabricator.wikimedia.org/T263463 (LSobanski) @mmodell : "The field will show on any task that has a value for the field." - so it only applies if the field value was set at creation time (presumably via a form)? As a side note, looking at the adva...
[09:18:09] DBA, Operations, Performance-Team, Platform Engineering, User-Kormat: Remove sections from db configs - https://phabricator.wikimedia.org/T263127 (ArielGlenn) >>! In T263127#6470471, @Marostegui wrote: > So in the past, the hosts serving those 5 groups used to have different schema partitioni...
[10:02:34] DBA, Wikidata, Wikidata-Campsite: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856 (Marostegui) db2084 is fully repooled, let's monitor its performance
[10:07:59] Blocked-on-schema-change, DBA, Operations, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat) s2 codfw progress: [] db2088.codfw.wmnet [] db2095.codfw.wmnet sanitarium [] db2098.codfw.wmnet dbstore [] db2104.codfw.wmnet [] db2107.c...
[11:26:54] DBA, Operations, netops, ops-eqiad, and 3 others: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (jijiki)
[11:47:59] DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (Marostegui) @wiki_willy do you think it is feasible to have these hosts racked&installed by 30th Oct?
[11:57:01] Blocked-on-schema-change, DBA, Operations, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat)
[12:05:42] marostegui: something doesn't make sense to me here. for 'api' group in s2/codfw, the current weights are db2104: 100 and db2126: 200. and yet grafana says that db2104 is getting about 50% more QPS (14.3K vs 9.1K)
[12:06:06] https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&from=now-1h&to=now&refresh=5m&var-server=db2104&var-port=9104
[12:06:10] https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&from=now-1h&to=now&refresh=5m&var-server=db2126&var-port=9104
[12:06:21] kormat: but db2105 has 500 on main traffic
[12:06:24] and db2126 has 200
[12:06:46] ohh. doh.
[12:08:29] the division between 'main' and groups is somewhat unhelpful
[12:09:27] yeah, sometimes it is easy to forget
[12:10:55] marostegui: i guess there's no way to see how much traffic is due to main vs groups on a given instance, right?
[12:11:26] kormat: no, there is no way to distinguish that
[12:11:43] inconvenient
[12:12:14] another point in favour of getting rid of these groups :)
[12:12:33] kormat: yeah, but the api one will always exist most likely, the others are up for discussion
[12:12:39] actually, i should update the relevant task with that
[12:12:43] ack
[12:13:11] I believe api+vslow,dump will need to be there, the others...to be discussed
[12:13:21] marostegui: in T263127 should the title have been 'groups' instead of 'sections'?
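The 12:05 to 12:13 exchange turns on the fact that a replica's total load is the sum of its share of 'main' traffic and its share of each load group, so comparing only the 'api' weights is misleading. A rough sketch of that arithmetic follows, assuming traffic within each group is split in proportion to weight. The host names and the QPS figures are hypothetical (the log says the real main/group split cannot be observed); only the weight values 100/200 and 500/200 are taken from the conversation.

```python
def expected_qps(traffic, weights):
    """Estimate per-host QPS from per-group traffic and per-group host weights.

    traffic: {group: total QPS sent to that group}            (hypothetical numbers)
    weights: {group: {host: weight}}                          (weights as in a db config)
    Assumption: within a group, queries are spread proportionally to weight.
    """
    totals = {}
    for group, qps in traffic.items():
        group_weights = weights[group]
        weight_sum = sum(group_weights.values())
        for host, weight in group_weights.items():
            # Each host receives weight / weight_sum of that group's traffic.
            totals[host] = totals.get(host, 0.0) + qps * weight / weight_sum
    return totals


# Hypothetical two-host section: host-a has the lower 'api' weight
# but the higher 'main' weight, mirroring the situation discussed.
weights = {
    "main": {"host-a": 500, "host-b": 200},
    "api": {"host-a": 100, "host-b": 200},
}
traffic = {"main": 20000, "api": 3000}  # made-up traffic mix

estimate = expected_qps(traffic, weights)

if __name__ == "__main__":
    for host, qps in sorted(estimate.items()):
        print(f"{host}: {qps:.0f} QPS")
```

With this made-up mix, host-a ends up with more total traffic than host-b despite having half the 'api' weight, which is the effect kormat ran into on Grafana.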
[12:13:21] T263127: Remove sections from db configs - https://phabricator.wikimedia.org/T263127
[12:13:32] yeah, maybe it is clearer
[12:13:54] the mw name is "load group"
[12:13:58] DBA, Operations, Performance-Team, Platform Engineering, User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Kormat)
[12:25:03] jynus: ah, let me change to that then, for even more clarity. thanks :)
[12:25:11] it is ok as it is now
[12:25:16] it was FYI
[12:25:26] sections was indeed confusing
[12:25:28] ok, if that's clear enough
[12:25:30] gotcha
[12:25:55] the other thing I would like to change is where we call sections "shards", as that is technically incorrect in some cases
[12:26:15] yep, agreed. i think that might be in my 'refactor puppet' master task
[12:26:17] and use the mw language (section) or the more correct (replica sets)
[12:26:59] shards for es and pc1 is correct, they are shards, not so much for s* and m* ones
[12:29:01] DBA, Operations, Performance-Team, Platform Engineering, User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Kormat) Another reason to remove load groups where possible is they make it very difficult to predict what effect depooling a db server will have on...
[12:36:25] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui) All sanitarium hosts for all...
[12:36:32] I am partially responsible for spreading shard, because that is what the previous dba called them, so I kept using it until I saw it was confusing
[12:41:38] jynus: is it worth creating a task to review our documentation for these terms and introducing consistent usage?
[12:42:36] I would love to, but I never had the time, and not sure how other people would feel about being "imposed" certain terms
[12:43:30] personally I think it avoids misunderstandings, I remember when manuel came in, we used slightly different terminology for boxes/servers/hosts
[12:43:57] but on the other side I would like to impose "one way" of calling things
[12:44:01] *wouldn't
[12:44:41] and we had a huge issue about what a raw backup was when talking to mark :-D
[12:46:07] manuel and I wrote a Glossary of terms back when the database backups were designed, if that helps as a starting point: https://docs.google.com/document/d/10HorEne5tNNIJ1afkllcxtPyS6eUK61hM-WRowMf66k/edit#heading=h.nza80b7h8bos
[12:46:21] it is a bit outdated now
[12:59:13] DBA, PM: Update the DBA task tracking workflow - https://phabricator.wikimedia.org/T263463 (mmodell) >>! In T263463#6486516, @LSobanski wrote: > @mmodell : "The field will show on any task that has a value for the field." - so it only applies if the field value was set at creation time (presumably via a...
[12:59:37] DBA, Data-Persistence, PM: Update the DBA task tracking workflow - https://phabricator.wikimedia.org/T263463 (LSobanski)
[12:59:40] DBA, Data-Persistence: Create a "how to engage us" process and documentation for Data Persistence - https://phabricator.wikimedia.org/T263456 (LSobanski)
[13:00:48] DBA, Data-Persistence: Clean up DB related pages on Wikitech - https://phabricator.wikimedia.org/T263420 (LSobanski)
[13:03:17] Apologies for the spam, I'm trying out some stuff.
[14:22:13] DBA, Wikidata, Wikidata-Campsite: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856 (Marostegui) @Ladsgroup after a few hours, I have not seen any significant impact on the host's slow queries for now. The host dashboard at https://grafana.wikimedia.org/d/000000273/mysql?o...
[14:55:56] DBA, Operations, Performance-Team, Platform Engineering, User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Krinkle)
[14:57:46] DBA, Operations, Performance-Team, Platform Engineering, User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Krinkle) I've added to the task description: > * [ ] Understanding and agreement on which of these (if any) we need to keep, and why. To be carried...
[14:59:39] DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (wiki_willy) Hi @Marostegui - I think that should be doable. During my sync up with @Cmjohnson and @RobH tomorrow, we'll discuss and see if we can get...
[15:00:00] DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (Marostegui) Thank you!
[15:39:46] DBA, Wikidata, Wikidata-Campsite: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856 (Ladsgroup) Thank you so much! Let me know if I can be of any help.
[16:09:32] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (ifried) @Marostegui Thank you for the approval of Czech Wikipedia! I'm sending an update that Watchlist Expiry is now up on Officewiki and Mediawiki.org, and we'...
[16:18:59] DBA, Jade, Machine Learning Platform (Research): Review real-world query plans and performance for Jade - https://phabricator.wikimedia.org/T212435 (calbon)
[16:42:49] DBA, Jade, Machine Learning Platform: Review real-world query plans and performance for Jade - https://phabricator.wikimedia.org/T212435 (calbon)