[06:02:29] DBA, Operations, ops-codfw: Degraded RAID on db2050 - https://phabricator.wikimedia.org/T216670 (Marostegui) p:Triage→Normal a:Papaul Let's get the disk changed @Papaul - thanks!
[06:15:10] Blocked-on-schema-change, MediaWiki-Database, MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (Marostegui) I don't think this is too useful https://phabricator.wikimedia.org/P8114 :(
[06:25:10] DBA, MediaWiki-API: API problem with usercontribs - https://phabricator.wikimedia.org/T216656 (Marostegui) For s2 we can probably decrease the main traffic weight for the rc replicas (db1103 and db1105) as the other hosts I think will have no problem to assume the traffic, but this is another case where...
[06:29:33] DBA, MediaWiki-API: API problem with usercontribs - https://phabricator.wikimedia.org/T216656 (Marostegui) p:Triage→Normal
[09:31:38] Blocked-on-schema-change, MediaWiki-Change-tagging, Patch-For-Review, User-Ladsgroup: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 (Marostegui)
[09:38:04] DBA, Operations, Patch-For-Review, Performance-Team (Radar): Increase parsercache keys TTL from 22 days back to 30 days - https://phabricator.wikimedia.org/T210992 (Marostegui) In a couple of days it will be a month since I switched the TTL from 22 days to 24. There has not been any issues...
[09:43:46] DBA, Operations, Patch-For-Review, Performance-Team (Radar): Increase parsercache keys TTL from 22 days back to 30 days - https://phabricator.wikimedia.org/T210992 (jcrespo) @Marostegui Did the hit rate increase?
[09:44:33] DBA, Operations, Patch-For-Review, Performance-Team (Radar): Increase parsercache keys TTL from 22 days back to 30 days - https://phabricator.wikimedia.org/T210992 (Marostegui) There is no significant increase that can be seen on the graphs, but also 2 days might be too low to notice something
[10:06:32] Blocked-on-schema-change, MediaWiki-Change-tagging, Patch-For-Review, User-Ladsgroup: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 (Marostegui)
[10:07:16] Blocked-on-schema-change, MediaWiki-Change-tagging, Patch-For-Review, User-Ladsgroup: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 (Marostegui) All the hosts are done except db1067 (s1 master T210713#4967984 ) which I will try a few more times before sta...
[10:24:09] Blocked-on-schema-change, MediaWiki-Change-tagging, Patch-For-Review, User-Ladsgroup: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 (Marostegui)
[10:39:12] DBA, Patch-For-Review: BBU issues on codfw - https://phabricator.wikimedia.org/T214264 (Marostegui)
[10:43:34] DBA, Patch-For-Review: BBU issues on codfw - https://phabricator.wikimedia.org/T214264 (Marostegui)
[11:10:56] Blocked-on-schema-change, DBA, AbuseFilter: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (Marostegui) a:Marostegui As this drift has already created some issues I will try to work on this as a background task, trying to fix hosts slowly but steady. Now that we can...
[11:19:56] Blocked-on-schema-change, DBA, AbuseFilter: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (Marostegui) @Daimona I have done a quick grep on `mediawiki-extensions-AbuseFilter` and on `mediawiki-core` repo to make sure there are no `FORCE INDEX` on any of the following o...
[11:24:23] Blocked-on-schema-change, DBA, AbuseFilter: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (Daimona) @Marostegui I checked as well and there seems to be no FORCE INDEX at all. BTW should you need anything else I can reach out to you on IRC or wherever you want.
[11:24:57] Blocked-on-schema-change, DBA, AbuseFilter: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (Marostegui) Thank you!
[12:31:15] DBA, Analytics, MediaWiki-Database, Research, Wikidata: Improve interlingual links across wikis through Wikidata IDs - https://phabricator.wikimedia.org/T215616 (diego) I think we are talking about three different things: i) page_id -> CurrentWikidataItem: this was my original request, and...
[12:48:58] DBA, Analytics, MediaWiki-Database, Research, Wikidata: Improve interlingual links across wikis through Wikidata IDs - https://phabricator.wikimedia.org/T215616 (JAllemandou) We're on the same page @diego :) I can precompute the table described in ii) if needed, and will surely do it once we...
[13:14:08] so I don't know how to handle logging, but for now the process logs handy errors like
[13:14:17] ERROR:backup:Expecting 1 matching snapshot for s1, found 2
[13:14:24] DEBUG:backup:['/opt/wmf-mariadb101/bin/mariabackup', '--prepare', '--target-dir', 'snapshot.s1.2019-01-22--14-21-28', '--innodb-buffer-pool-size', '10G']
[13:14:53] I think that is good enough for now!
[13:15:17] no, I mean it can do that, but right now it is hardcoded into a debug.log output
[13:15:27] I don't know how to do it for real
[13:16:12] it is almost working
[13:18:44] I have allowed to do double-prepares because --prepare is idempotent
[13:19:04] you lost me on that one :)
[13:20:48] so while doing testing, I prepared a backup twice, but it did nothing and said "completed OK"
[13:21:11] so I will not try to check if xtrabackup --prepare has been run already
[13:22:31] Ah I see
[13:24:02] I will ignore for now the configurable buffer pool size
[13:24:28] ok
[13:24:43] I think, if it was to be made configurable, it should probably be on a mysql config file [xtrabackup]
[13:25:08] but it can be useful if you need to run the dump_section manually no?
[13:25:28] I am fearful of too much configurability
[13:25:41] it makes it complicated
[13:25:49] I am not saying it cannot be added
[13:25:59] just I am first focusing on making it work
[13:26:04] sure thing
[13:26:31] so compressing a backup is no fast operation
[13:26:56] but in order to prepare it has to be decompressed (xbstream), prepared, and compressed again
[13:29:35] compressing it, even in parallel, may take 30 minutes, as it is >1TB of data
[13:30:44] maybe more, especially on dbstore1001
[13:30:53] we may need those ssds
[13:31:24] we'll see the quotes
[13:31:51] I will make you "suffer" it so you can check it for yourself
[13:32:10] I don't need to suffer it, I know we need the SSDs
[13:32:19] I suffer not having them with any operation on all codfw hosts
[13:32:27] s/all/old
[14:05:23] Blocked-on-schema-change, DBA, AbuseFilter: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295 (Marostegui)
[14:37:35] DBA, Operations, ops-eqiad, Patch-For-Review: db1114 crashed (HW memory issues) - https://phabricator.wikimedia.org/T214720 (Marostegui)
[15:27:15] DBA, Operations, ops-codfw: Degraded RAID on db2050 - https://phabricator.wikimedia.org/T216670 (Papaul) a:Papaul→Marostegui disk replaced
[15:28:07] DBA, Operations, ops-codfw: Degraded RAID on db2050 - https://phabricator.wikimedia.org/T216670 (Marostegui) Thanks! ` logicaldrive 1 (3.3 TB, RAID 1+0, Recovering, 2% complete) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Rebuilding) `
[16:09:52] DBA, MediaWiki-API, Patch-For-Review: API problem with usercontribs - https://phabricator.wikimedia.org/T216656 (Marostegui) >>! In T216656#4972232, @gerritbot wrote: > Change 491993 had a related patch set uploaded (by Anomie; owner: Anomie): > [mediawiki/core@master] ApiQueryUserContribs: Only use...
[16:16:08] DBA, MediaWiki-API, Patch-For-Review: API problem with usercontribs - https://phabricator.wikimedia.org/T216656 (Anomie) Here is good. Looking at the code, it looks like it does fall back to the "main" group if there aren't any usable replicas in the specified group.
[16:26:06] DBA, MediaWiki-API, Patch-For-Review: API problem with usercontribs - https://phabricator.wikimedia.org/T216656 (Marostegui) Thanks for checking although now that I think about it, it is pretty much the same thing, it will timeout anyways (as we have seen) :-)
[16:49:39] I am going to do a backup of db1117:s5
[16:50:11] heads up in case it creates some lag or saturation
[16:50:21] sorry, I meant db1117:m5
[17:29:57] Blocked-on-schema-change, MediaWiki-Database, MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (Anomie) >>! In T51199#4970715, @Marostegui wrote: > I don't think this is too useful https://phabricator.wikime...
[18:41:55] DBA, Operations, ops-eqiad, Patch-For-Review: db1114 crashed (HW memory issues) - https://phabricator.wikimedia.org/T214720 (Cmjohnson) @Marostegui I will need to swap DIMM B3 and B7 to the A side. LMK when the server is down and ready
[18:44:50] DBA, Operations, ops-eqiad, Patch-For-Review: db1114 crashed (HW memory issues) - https://phabricator.wikimedia.org/T214720 (jcrespo) I will put it down now (it is out of service, I only need to downtime it on icinga)
[18:48:25] db1118 had some connection issues (only a small spike, but heads up if it reoccurs)
[19:28:44] DBA, Operations, ops-eqiad, Patch-For-Review: db1114 crashed (HW memory issues) - https://phabricator.wikimedia.org/T214720 (Cmjohnson) Before DIMM Swap racadm log /admin1-> racadm getsel Record: 1 Date/Time: 11/04/2017 15:21:07 Source: system Severity: Ok Description: Log cleared...
[19:31:16] DBA, Operations, ops-eqiad, Patch-For-Review: db1114 crashed (HW memory issues) - https://phabricator.wikimedia.org/T214720 (Cmjohnson) @jynus @marostegui I swapped DIMM B3 to A3 and B7 to A7 and cleared the idrac log. Please put some stress on the server and let's monitor.
[19:40:27] DBA, Operations, ops-eqiad, Patch-For-Review: db1114 crashed (HW memory issues) - https://phabricator.wikimedia.org/T214720 (jcrespo) Thanks, I have left it warming the buffer pool/replicating, tomorrow I will create a backup to touch all memory space.
[20:20:50] DBA, Operations, ops-eqiad, Patch-For-Review: db1114 crashed (HW memory issues) - https://phabricator.wikimedia.org/T214720 (Marostegui) @jcrespo maybe we can leave a mydumper running 24x7 on a loop for days on that host: dumping everything, deleting the backups file, dump everything and so forth.
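Editor's note: the dump-and-delete loop suggested in the last message could look roughly like the sketch below. It is only an illustration, not what was actually run on db1114. The function name `stress_dump_loop`, the `base_dir` and `iterations` parameters, and the injectable `run` callable are all hypothetical; connection options for mydumper are deliberately left out and assumed to come from whatever client configuration the host already has.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def _run_checked(cmd: List[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


def stress_dump_loop(
    base_dir: Path,
    iterations: Optional[int] = None,
    run: Callable[[List[str]], None] = _run_checked,
) -> int:
    """Dump everything, delete the dump files, and repeat.

    iterations=None loops until interrupted (the "24x7" case);
    a number bounds the loop, which is handy for a dry run.
    Returns how many dump cycles completed.
    """
    completed = 0
    while iterations is None or completed < iterations:
        # Fresh scratch directory per cycle (hypothetical naming scheme).
        outdir = Path(tempfile.mkdtemp(prefix="stress-dump.", dir=str(base_dir)))
        try:
            # --outputdir is mydumper's option for the dump destination.
            run(["mydumper", "--outputdir", str(outdir)])
        finally:
            # Always remove the dump so the disk does not fill up.
            shutil.rmtree(outdir, ignore_errors=True)
        completed += 1
    return completed
```

Passing a fake `run` callable lets the loop be exercised without touching a database; in real use a failed dump raises and stops the loop, which is probably the desired behaviour when the goal is to surface hardware errors.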