[05:07:50] 10DBA, 10Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (10Marostegui)
[05:09:35] 10DBA, 10Operations, 10Patch-For-Review: Decommission db1073.eqiad.wmnet - https://phabricator.wikimedia.org/T231892 (10Marostegui)
[05:39:43] 10DBA, 10Patch-For-Review: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4] - https://phabricator.wikimedia.org/T202367 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` dbproxy1017.eqiad.wmnet ` The log can be found in `/var/log/wmf-au...
[06:12:37] 10DBA: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4] - https://phabricator.wikimedia.org/T202367 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbproxy1017.eqiad.wmnet'] ` and were **ALL** successful.
[07:32:52] 10DBA, 10Operations: Migrate MySQLs to use ROW-based replication - https://phabricator.wikimedia.org/T109179 (10jcrespo) @TK-999 Please note that this is an infrastructure limitation, which means it is mostly related to Wikimedia servers, not mediawiki. As I see it, our main limitations are: * Compatibility f...
[07:58:42] 10DBA, 10Operations: Decommission dbproxy1005.eqiad.wmnet - https://phabricator.wikimedia.org/T231967 (10Marostegui)
[07:59:33] 10DBA, 10Operations: Decommission dbproxy1005.eqiad.wmnet - https://phabricator.wikimedia.org/T231967 (10Marostegui) p:05Triage→03Normal Nothing uses dbproxy1005, but I am going to stop haproxy and leave it stopped for some hours before fully decommissioning this host just in case.
[07:59:48] 10DBA, 10Operations: Decommission dbproxy1005.eqiad.wmnet - https://phabricator.wikimedia.org/T231967 (10Marostegui)
[07:59:50] 10DBA: Remove grants for the old dbproxy hosts from the misc databases - https://phabricator.wikimedia.org/T231280 (10Marostegui)
[08:02:51] 10DBA, 10Performance-Team, 10Wikimedia-Rdbms, 10Patch-For-Review: SHOW SLAVE STATUS as a health check should have a low timeout - https://phabricator.wikimedia.org/T129093 (10jcrespo) BTW, I consider this a smaller issue once replication control was migrated to heartbeat- I am guessing some show slave stat...
[08:48:42] 10DBA, 10Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (10Marostegui)
[09:06:26] 10DBA, 10Operations, 10Patch-For-Review, 10User-notice: Switchover m1 primary master: db1063 to db1135: Tuesday 10th September at 16:00 UTC - https://phabricator.wikimedia.org/T231403 (10Trizek-WMF) Added for Tech News, since Etherpad service is quite used, and 16:00 UTC is a common meetings hour.
[09:08:13] 10DBA, 10Operations, 10Patch-For-Review, 10User-notice: Switchover m1 primary master: db1063 to db1135: Tuesday 10th September at 16:00 UTC - https://phabricator.wikimedia.org/T231403 (10Marostegui) >>! In T231403#5464235, @Trizek-WMF wrote: > Added for Tech News, since Etherpad service is quite used, and...
[09:17:49] 10DBA, 10MediaWiki-File-management, 10MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), 10Patch-For-Review, 10Performance-Team (Radar): Drop filejournal table from WMF - https://phabricator.wikimedia.org/T51195 (10Marostegui)
[10:43:42] 10DBA: Investigate possible memory leak on db1115 - https://phabricator.wikimedia.org/T231769 (10Marostegui) These were the figures before I stopped mysql over the last 2 (I gather data every 8 hours, so 3 times a day) days - we can see MySQL memory growing every day: ` 29136 mysql 20 0 0.107t 0.063t 25...
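The db1115 memory-leak comments above and below describe sampling mysqld's memory use every 8 hours (three snapshots a day) and watching the resident set grow between samples. The ticket does not say how the figures are collected; as an illustration only, a minimal Python sketch of such a sampler could look like the following, where the script itself and the log path are hypothetical, not something taken from the ticket:

```python
#!/usr/bin/env python3
# Hypothetical sketch: periodically record the mysqld resident-set size so that
# day-over-day growth like the one reported on db1115 (T231769) can be compared.
# The actual collection method used on the host is not described in the ticket.
import datetime
import subprocess

LOGFILE = "/var/tmp/mysqld-rss.log"  # assumed path, not from the ticket

def snapshot_mysqld_rss():
    # `ps -C mysqld -o rss=` prints the resident set size in KiB, one line per process.
    out = subprocess.run(
        ["ps", "-C", "mysqld", "-o", "rss="],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    rss_gib = sum(int(kib) for kib in out) / (1024 * 1024)
    now = datetime.datetime.now(datetime.timezone.utc).isoformat()
    line = f"{now} mysqld_rss_gib={rss_gib:.3f}"
    with open(LOGFILE, "a") as fh:
        fh.write(line + "\n")
    return line

if __name__ == "__main__":
    # Run from cron every 8 hours to get the three samples a day mentioned above.
    print(snapshot_mysqld_rss())
```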
[12:28:53] 10DBA: Investigate possible memory leak on db1115 - https://phabricator.wikimedia.org/T231769 (10Marostegui) 12:27:04 ` 1630 mysql 20 0 67.313g 0.046t 24936 S 366.7 37.8 394:46.56 mysqld ` So almost 20GB more in less than 2 hours after enabling `event_scheduler`
[14:57:55] marostegui: jynus https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-group=core&var-shard=s8&var-role=All that's me running the script on up to Q2m, it will probably take two days and it's on a screen on mwmaint1002 in case you need to stop it
[14:58:10] for the rest, I will start making a patch on puppet
[14:58:49] I guess there is replication control, right?
[15:00:21] jynus: yes, plus 2 seconds on each 250-item batch for secondary replicas to catch up
[15:00:30] let me know if 2 seconds is not enough
[15:00:31] cool
[15:00:40] then the only worry would be it affecting performance
[15:00:52] but I don't know if there are any probes specifically for s8
[15:01:24] right now it's writes mostly, but we will get there when we are flipping the switch on reads
[15:01:30] (if I understand you correctly)
[15:01:39] oh, I was thinking of the batch job
[15:01:51] as writes have some impact on the overall performance
[15:01:57] that is ok
[15:02:11] just it would be nice to have, if it is significant
[15:02:20] and a good metric for that
[15:02:45] not only for that - I just don't know if there are already uncached wikidata.org metrics
[15:03:17] look at: https://grafana.wikimedia.org/d/000000431/webpagereplay?refresh=15m&orgId=1
[15:03:27] there are for several wikis, but not for wikidata, I think
[15:03:52] well, group1, but you get the idea
[15:05:54] yeah
[15:06:10] nothing actionable, just speaking my mind
[15:08:18] I may create a ticket about commons and wikidata being on group1; not sure if it makes sense in all cases
[20:24:41] 10DBA, 10Performance-Team, 10Wikimedia-Rdbms, 10Patch-For-Review: SHOW SLAVE STATUS as a health check should have a low timeout - https://phabricator.wikimedia.org/T129093 (10Krinkle) p:05Triage→03Normal
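The 14:57-15:01 exchange above describes the maintenance run on s8: the script writes in batches of 250 items and then pauses 2 seconds so that secondary replicas can catch up before the next batch. The actual script running on mwmaint1002 is not shown in this log; the following Python sketch only illustrates that batching pattern, with `items` and `apply_batch` as hypothetical placeholders:

```python
# Illustrative sketch of the batching pattern described in the log: write items
# in batches of 250 and sleep 2 seconds after each batch so secondary replicas
# can catch up. Not the actual maintenance script; apply_batch/items are
# hypothetical placeholders.
import time

BATCH_SIZE = 250
PAUSE_SECONDS = 2  # raise this if replicas still lag, per the discussion above

def run_in_batches(items, apply_batch):
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == BATCH_SIZE:
            apply_batch(batch)          # perform the writes for this batch
            time.sleep(PAUSE_SECONDS)   # give secondary replicas time to catch up
            batch = []
    if batch:
        apply_batch(batch)              # flush the final, smaller batch
```

A fixed sleep is the simplest form of replication control; a production job would more likely poll replica lag and wait until it drops below a threshold, but the log only mentions the fixed 2-second pause.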