[05:21:30] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3668253 (Marostegui)
[05:45:55] Blocked-on-schema-change, DBA, Patch-For-Review: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661#3668262 (Marostegui) I have tested the procedure described at T168661#3663752 with another core host and it worked perfectly.
[06:21:30] DBA, Data-Services, XTools, cloud-services-team: Request to increase active connection quota for user s51187 on analytics.db.svc.eqiad.wmflabs - https://phabricator.wikimedia.org/T177570#3668279 (Marostegui) I gave 30 to s51187 on .web. Let's see how that goes...
[06:23:20] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668283 (Marostegui)
[06:26:33] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668294 (Marostegui)
[06:30:45] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3668296 (Marostegui)
[06:37:09] DBA, Commons, Contributors-Team, MediaWiki-Watchlist, and 11 others: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis - https://phabricator.wikimedia.org/T171027#3668304 (thiemowmde)
[07:00:23] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3668338 (Marostegui)
[07:18:38] DBA, Operations, ops-codfw: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3668357 (jcrespo) Correct me if I am misunderstanding something, but on RAID 10, we can lose a whole mirror group and we would be ok, what we cannot lose is the same disk on the two mirror...
[07:20:15] DBA, Operations, ops-codfw: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3668371 (Marostegui) >>! In T177720#3668357, @jcrespo wrote: > Correct me if I am misunderstanding something, but on RAID 10, we can lose a whole mirror group and we would be ok, what we c...
[07:20:44] DBA, Operations, ops-codfw: db2038 disk with predictive failure - https://phabricator.wikimedia.org/T177720#3668372 (Marostegui) p:Triage>High
[07:29:53] DBA, Operations, ops-codfw: db2038 disk with predictive failure - https://phabricator.wikimedia.org/T177720#3668379 (Marostegui) No, they are actually two different disks indeed by looking at the serials.
[07:30:14] DBA, Operations, ops-codfw: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3668381 (Marostegui)
[07:34:40] the alters finished on labsdb1009
[07:45:40] DBA, Operations, ops-codfw: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3668420 (Papaul) @Marostegui this server is out of warranty 2017-07-10. We need to find out if any of the decommissioned servers have the same disks that we can use.
[07:51:32] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668424 (Marostegui)
[07:57:45] DBA, Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3668442 (Marostegui) >>! In T172679#3650005, @Marostegui wrote: >>>! In T172679#3638670, @Marostegui wrote: >>>>! In T172679#3635468, @jcrespo wrote: >>> Yes, although we may need still an extra...
[08:17:11] DBA, Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3668477 (Marostegui) I am going to change my plans and move db1072 instead of db1066. The reason for that change is that db1072 was cloned from db1052, so we are sure we still have its data on t...
[08:21:06] DBA, Commons, Contributors-Team, MediaWiki-Watchlist, and 11 others: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis - https://phabricator.wikimedia.org/T171027#3668485 (jcrespo) BTW, the decision was already mentio...
[08:21:53] DBA, Commons, Contributors-Team, MediaWiki-Watchlist, and 10 others: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis - https://phabricator.wikimedia.org/T171027#3668488 (jcrespo)
[08:24:45] DBA, Commons, Contributors-Team, MediaWiki-Watchlist, and 11 others: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis - https://phabricator.wikimedia.org/T171027#3668493 (jcrespo) Open>Resolved Notification f...
[08:32:38] jynus: I'm around for the help on disabling it on big wikis
[08:32:50] We already have the patch but it's not merged
[08:33:04] should I merge and deploy?
[08:38:26] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3668560 (Marostegui)
[08:38:28] can I see it?
[08:39:02] I didn't understand which of the ones mentioned on the ticket it was
[08:51:09] jynus: Reedy's patches https://gerrit.wikimedia.org/r/#/c/383014/1
[08:51:13] this is one of them
[08:51:24] then after merging, we just change the config var
[08:52:12] yes, let's do that
[08:52:20] let's try to involve more people too
[08:52:31] in case that breaks more things
[08:52:37] :-)
[08:53:27] although it should not change things until options are deployed, right?
[08:53:46] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668658 (Marostegui)
[08:56:18] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3668664 (Marostegui)
[09:04:46] DBA, Operations, ops-codfw: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3668690 (jcrespo) @Marostegui Should we do a master failover? We had planned it anyway- this is a good excuse. I know the answer is "yes, if we find the time" :-) Maybe I can take care of...
[09:07:39] jynus: yes
[09:07:42] that's correct
[09:07:54] let me start to apply the hotfix
[09:08:05] then make a patch and then deploy
[09:08:05] DBA, Operations, ops-codfw: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3668714 (Marostegui) This is not a master :-) But yes, s6 needs a master failover anyways to decommission db2028 (s6 master) and to finish T169501. There are no alters running on s6 or sch...
[09:12:29] DBA, Operations, ops-codfw: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3668722 (jcrespo) > This is not a master :-) Oh! Easier, then- so just pooling one new server in preparation for the failover.
[09:15:29] jynus: by "analytics-store" do you mean stats machines?
[09:15:57] I think people query from there, yes, but physically I mean dbstore1002
[09:16:32] they are specially prepared databases to retrieve stats :-)
[09:17:02] okay, let me give it a try
[09:19:20] DBA, Operations, ops-codfw: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3668742 (jcrespo) But this doesn't solve our issue, db2038 is not supposed to go away :-/
[09:21:00] per https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB_replicas I should do "mysql -h analytics-store.eqiad.wmnet -A" on stat1006 but it doesn't work and gives me an access denied error
[09:21:06] let me try it on stat1005
[09:21:07] DBA, Operations, ops-codfw: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3668748 (Marostegui) >>! In T177720#3668742, @jcrespo wrote: > But this doesn't solve our issue, db2038 is not supposed to go away :-/ No, only hosts <2030 are supposed to go away. That i...
[09:21:47] it works there, weird
[09:22:07] uff, I cannot tell you about that
[09:22:22] it may be outdated, I think they decommed some stats servers recently
[09:22:48] I will tell analytics team later today (when they wake up, I guess they are mostly US-based)
[09:23:06] in any case, long queries are expected on dbstore1002
[09:23:22] and it doesn't serve production traffic
[09:23:30] so no problem there :-)
[09:23:59] Amir1: today is a holiday in the US as far as I remember btw
[09:24:01] yeah, added the script to a screen there
[09:24:12] ohh, lots of holidays :/
[09:24:21] i am sure they disgree XD
[09:24:44] *disagree
[09:25:15] disagree with what?
[09:25:32] with: lots of holidays
[09:25:33] :)
[09:27:09] Amir1: that doesn't mean you cannot query production- it is ok
[09:27:37] I understand, just not slow queries
[09:28:03] I'm sorry, I wish I had known
[09:39:32] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3668778 (Marostegui) I will not alter labsdb1001/labsdb1003 as they will be decommissioned in around 2 months (hopefully) and they will take ages to...
[09:40:39] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3668780 (Marostegui)
[09:42:32] The patch is up, let me wait for jenkins and then I'll start the deploy process https://gerrit.wikimedia.org/r/383093
[09:43:14] DBA, Operations, Puppet: Switch databases to the future parser - https://phabricator.wikimedia.org/T172498#3668792 (Joe) Open>Resolved
[09:48:15] DBA, Operations, Puppet: Switch databases to the future parser - https://phabricator.wikimedia.org/T172498#3668819 (jcrespo) But we didn't check the parameter changes, did you do that or did it finally work? Why resolve now?
[09:52:45] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668825 (Marostegui)
[10:24:24] DBA, Operations, ops-codfw: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3668922 (jcrespo) db2028 RAID claims it has 558.911 GB disks. db2038 RAID claims it has 600GB, maybe the actual size is the same? In that case the failover could actually help.
[10:30:28] DBA, Operations, ops-codfw: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3668930 (Marostegui) By looking at both hosts' disks serial numbers, they are both 600GB 15k SAS 3.5" so maybe we can exchange them.
@Papaul probably knows better if we can exchange those...
[10:32:45] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668933 (Marostegui)
[11:02:31] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3668976 (Marostegui)
[11:06:39] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3668995 (Marostegui)
[11:20:08] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3669022 (Marostegui)
[12:03:15] DBA: Update change tag indexes - https://phabricator.wikimedia.org/T42867#3669118 (Marostegui) Open>Resolved This is all done. I have checked all the shards and they look consistent. Obviously s3 could have had some stuff slipped, but in general looks good.
[12:03:18] DBA, Wikimedia-Site-requests, Tracking: Database table cleanup (tracking) - https://phabricator.wikimedia.org/T18660#3669121 (Marostegui)
[12:13:29] DBA, Collaboration-Team-Triage, MediaWiki-extensions-CentralAuth, Notifications, and 2 others: CentralAuthCreateLocalAccountJob failing on meta due to Echo deadlocks - https://phabricator.wikimedia.org/T121161#1871031 (Marostegui) Is this still relevant?
[12:43:08] DBA, Readers-Community-Engagement, Community-Liaisons (Oct-Dec 2017), Patch-For-Review: Help communicate read-only time for Commons for schema change required by adding 3D filetype - https://phabricator.wikimedia.org/T176883#3669220 (Marostegui)
[13:27:29] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3669396 (Marostegui)
[14:14:51] DBA, MediaWiki-Watchlist, Wikidata: Purge 90% of rows from recentchanges (and posibly defragment) from commonswiki and ruwiki (the ones with source:wikidata) - https://phabricator.wikimedia.org/T177772#3669507 (jcrespo)
[14:20:59] marostegui: see ^
[14:33:07] I will do some testing on dbstore1002
[14:33:25] with a table copy
[15:27:00] jynus: let me know the results, as I need to depool hosts from commons and wikidata for the other optimizes, I can include it too
[15:27:44] oh, really?
[15:27:48] yep :)
[15:27:51] which ones?
[15:27:58] the pagelinks and templatelinks
[15:28:19] ah
[15:28:21] in fact
[15:28:22] I would say to stop
[15:28:24] you can use db1091
[15:28:27] to test
[15:28:28] it is depooled
[15:28:31] for commons and ruwiki
[15:28:41] I was about to repool it, but I can wait if you want to test there
[15:28:43] we will do rcs at the same time
[15:28:52] I will test on codfw/non-core
[15:29:05] do not want to touch replication on gtid replicas
[15:29:24] It is taking 1 hour to dump rc table on commons :-/
[15:29:29] imagine loading it
[15:31:01] :|
[15:31:14] ok I will stop "my" alters on s4 and s6
[15:31:29] so we can do all at the same time
[15:31:47] You want to use db1091 or should I go ahead and repool it?
[15:33:41] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3669698 (Marostegui)
[15:35:58] DBA, MediaWiki-Platform-Team, Performance-Team, monitoring: Improve database aplication performance monitoring visibility - https://phabricator.wikimedia.org/T177778#3669701 (jcrespo)
[15:38:22] DBA, Patch-For-Review: Refactor prometheus-mysqld-exporter to support multi-instance hosts - https://phabricator.wikimedia.org/T170666#3669714 (jcrespo) This is done except Generate instance list to be monitored automatically from exported resources, which will be separated on its own task.
[15:38:26] DBA, Operations, monitoring, Patch-For-Review, Prometheus-metrics-monitoring: MySQL monitoring with prometheus - https://phabricator.wikimedia.org/T143896#3669716 (jcrespo)
[15:38:28] DBA, Patch-For-Review: Refactor prometheus-mysqld-exporter to support multi-instance hosts - https://phabricator.wikimedia.org/T170666#3669715 (jcrespo) Open>Resolved
[15:39:49] DBA, Operations, monitoring, Patch-For-Review, Prometheus-metrics-monitoring: MySQL metrics monitoring - https://phabricator.wikimedia.org/T143896#2582458 (jcrespo)
[15:42:54] DBA, Operations, monitoring: Generate instance list of database hosts to be monitored automatically from exported resources - https://phabricator.wikimedia.org/T177779#3669724 (jcrespo)
[15:43:36] DBA, monitoring, Epic, Wikimedia-Incident: Improve database alerting (tracking) - https://phabricator.wikimedia.org/T172492#3669760 (jcrespo)
[15:43:38] DBA, Operations, monitoring, Patch-For-Review, Prometheus-metrics-monitoring: MySQL metrics monitoring - https://phabricator.wikimedia.org/T143896#2582458 (jcrespo)
[15:43:41] DBA, MediaWiki-Platform-Team, Performance-Team, monitoring: Improve database aplication performance monitoring visibility - 
https://phabricator.wikimedia.org/T177778#3669759 (jcrespo)
[15:43:54] going to repool db1091, don't want to leave it out for the night as it is one of the 512GB
[15:43:57] DBA, monitoring, Epic, Wikimedia-Incident: Improve database alerting (tracking) - https://phabricator.wikimedia.org/T172492#3500113 (jcrespo)
[15:43:59] DBA, MediaWiki-Platform-Team, Performance-Team, monitoring: Improve database aplication performance monitoring visibility - https://phabricator.wikimedia.org/T177778#3669701 (jcrespo)
[15:54:48] DBA, MediaWiki-Watchlist, Wikidata: Purge 90% of rows from recentchanges (and posibly defragment) from commonswiki and ruwiki (the ones with source:wikidata) - https://phabricator.wikimedia.org/T177772#3669807 (Lydia_Pintscher) @hoo, @Ladsgroup Can you say if there is anything that would speak agains...
[15:56:56] DBA, MediaWiki-Watchlist, Wikidata: Purge 90% of rows from recentchanges (and posibly defragment) from commonswiki and ruwiki (the ones with source:wikidata) - https://phabricator.wikimedia.org/T177772#3669809 (hoo) This shouldn't cause any problems w.r.t Wikibase.
[16:05:06] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3669822 (hoo) After about 4 days we have: ``` mysql:wikiadmin@db1076 [trwiki]> SELECT COUNT(*) FROM wbc_entity_usag...
[16:19:21] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3669843 (hoo) On kowiki currently about 21.7% of all pages having `O` or `X` usages have seen a page link update sin...
[16:19:42] DBA, monitoring, Epic, Wikimedia-Incident: Reduce false positives on database pages - https://phabricator.wikimedia.org/T177782#3669845 (jcrespo)
[16:24:18] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3669875 (jcrespo) What you write is ok, but IF you want our opinion, can you translate that into increase of row sto...
[16:29:08] Could not execute Update_rows_v1 event on table enwiki.archive; Can't find record in 'archive'
[16:29:13] on db1095
[16:44:16] I think it is a problem with a bad archive table on enwiki in db1065
[16:47:45] I will try to failover db1095 master to another host already on ROW, or do some surgery
[17:23:48] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3670043 (hoo) >>! In T151717#3669875, @jcrespo wrote: > What you write is ok, but IF you want our opinion, can you t...
[17:27:15] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3670067 (jcrespo) I have the feeling that these numbers could be meaningless on such small wikis, given the issues o...
[17:50:26] DBA, Patch-For-Review: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807#3175341 (jcrespo) I believe there is data drift on db1065- it didn't break, but db1095 did, because it is using row based replication. The skipped transaction was applied manually with the data adquire...
[17:53:27] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3670109 (jcrespo) What I mean is that the numbers are ok to proceed (not a big deal), but still worried for the larg...
[18:05:10] DBA, MediaWiki-Watchlist, Wikidata: Purge 90% of rows from recentchanges (and posibly defragment) from commonswiki and ruwiki (the ones with source:wikidata) - https://phabricator.wikimedia.org/T177772#3670119 (jcrespo) I am leaving a screen open on dbstore1002 loading a copy of recentchanges from co...
[19:15:25] DBA, Operations, Patch-For-Review, Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3670375 (jrbs) I've rebased (thanks, @Dzahn!) and scheduled this to go out on SWAT on Wednesday.
[19:16:05] DBA, Operations, Support-and-Safety, Patch-For-Review, Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3670376 (jrbs)
[19:29:37] DBA, Collaboration-Team-Triage, MediaWiki-extensions-CentralAuth, Notifications, and 2 others: CentralAuthCreateLocalAccountJob failing on meta due to Echo deadlocks - https://phabricator.wikimedia.org/T121161#3670405 (Tgr) It's [[https://logstash.wikimedia.org/goto/e1d94f4a57b3a0bb055847ac1914a5...
[22:26:38] DBA, Operations, Support-and-Safety, Patch-For-Review, Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3670808 (Reedy) >>! In T174370#3670375, @jrbs wrote: > I've rebased (thanks, @Dzahn!) and scheduled this to go out on SWAT on Wednesd...
[22:33:47] DBA, Operations, Support-and-Safety, Patch-For-Review, Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3670812 (jrbs) >>! In T174370#3670808, @Reedy wrote: >>>! 
In T174370#3670375, @jrbs wrote: >> I've rebased (thanks, @Dzahn!) and sche...
[22:34:21] DBA, Operations, Support-and-Safety, Patch-For-Review, Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3670813 (Reedy) >>! In T174370#3670812, @jrbs wrote: > What do you recommend? (I am absolutely new to this.) Ask me nicely and I'll...
[22:36:15] DBA, Operations, Support-and-Safety, Patch-For-Review, Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3670814 (jrbs) >>! In T174370#3670813, @Reedy wrote: >>>! In T174370#3670812, @jrbs wrote: >> What do you recommend? (I am absolutely...
[22:39:09] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3670831 (hoo) >>! In T151717#3670067, @jcrespo wrote: > I have the feeling that these numbers could be meaningless o...