[04:56:55] DBA, Wikidata, wikidata-tech-focus: Rename tmp1 index on wb_terms databases to something more meaningful - https://phabricator.wikimedia.org/T197854 (Marostegui) >>! In T197854#4512107, @Addshore wrote: > Should we push forward with the rename or should we add it to our code base with the name "tmp1"...
[04:58:53] Blocked-on-schema-change, DBA, MediaWiki-Database, Scoring-platform-team, User-Ladsgroup: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 (Marostegui) MySQL doesn't support index renaming, we'd need to drop+create anyways.
[04:59:11] Blocked-on-schema-change, DBA, MediaWiki-Database, Scoring-platform-team, User-Ladsgroup: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 (Marostegui) p:Triage>Normal
[05:12:15] DBA, Operations: rack/setup/install dbproxy101[2-7].eqiad.wmnet - https://phabricator.wikimedia.org/T196690 (Marostegui) Thank you guys! We'll take it from here
[05:17:16] DBA, CX-analytics, Language-2018-July-September: Allow reportupdater scripts access to the cx_translations table in the wikishared database - https://phabricator.wikimedia.org/T201996 (Marostegui) I have commented on your patch.
[06:46:54] DBA, MediaWiki-Authentication-and-authorization: Interface to manage account links to external sites - https://phabricator.wikimedia.org/T173637 (Marostegui) >>! In T173637#4510617, @Anomie wrote: >>>! In T173637#4508568, @Legoktm wrote: >> I like your proposed schema, but I don't think we need the `user...
[07:58:41] DBA, Data-Services, Patch-For-Review: labsdb1004's toolsdb mariadb is lagging behind labsdb1005 - https://phabricator.wikimedia.org/T202055 (jcrespo) Open>Resolved This is all resolved from an infrastructure perspective- tool maintainers, please reopen if some change or question happens on yo...
[08:03:35] DBA, Operations, ops-codfw, Patch-For-Review: db2069 storage crash - https://phabricator.wikimedia.org/T201603 (Marostegui) I have compared the tables: `echo_target_page`, `echo_event` `echo_notification` across all wikis and no differences have been found. So I believe we are good to go
[08:36:01] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui) [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore2001 [] dbstore1002 [] db2094 [] db2089 [] db2084 [x] db2075 T114117#4483970 [] db2066 [] db2059 [] db2052 [] db2038 [] db...
[08:36:17] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui) [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore2001 [] dbstore1002 [] db2094 [] db2089 [] db2084 [x] db2075 T5119...
[08:36:42] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui) [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore2001 [] dbstore1002 [] db2094 [] db2089 [] db2084 [x] db2075 T67448#4483976 [] db2066 [...
[08:37:08] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui) I have altered db1100 and I will leave it there and check if there are any regressions on queries as that host receives reads
[08:58:15] DBA, Operations, ops-codfw, Patch-For-Review: db2069 storage crash - https://phabricator.wikimedia.org/T201603 (jcrespo) Not sure exactly how you checked, but I saw one error in the first 20 wikis I checked: ``` echo angwikiquote | while read db; do echo "$db..."; ./compare.py wikidatawiki echo...
[08:58:59] DBA, Operations, ops-codfw, Patch-For-Review: db2069 storage crash - https://phabricator.wikimedia.org/T201603 (Marostegui) I checked against db2033 only
[09:09:14] DBA, Operations, ops-codfw, Patch-For-Review: db2069 storage crash - https://phabricator.wikimedia.org/T201603 (jcrespo) Let's fix the issue above and let me continue a full check- at the moment there is no rush to repool it, we can reevaluate later. We may find more issues, even if not relevant...
[09:10:33] DBA, Operations, ops-codfw, Patch-For-Review: db2069 storage crash - https://phabricator.wikimedia.org/T201603 (Marostegui) Yeah, looks like there might be inconsistencies eqiad <-> codfw for all hosts :(
[09:29:26] DBA: Failover DB masters in row D - https://phabricator.wikimedia.org/T186188 (Marostegui) Open>stalled Stalling this as the switch upgrade isn't clear now how it will proceed as per the network issues found at T201145
[09:37:52] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, MW-1.32-release-notes (WMF-deploy-2018-05-22 (1.32.0-wmf.5)), and 2 others: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273 (Addshore) Open>stalled Marking as stalled as this is on Hold
[09:39:46] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: synchronize schema on production with what is created on install - https://phabricator.wikimedia.org/T85414 (Addshore)
[09:40:03] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, wikidata-tech-focus: synchronize schema on production with what is created on install - https://phabricator.wikimedia.org/T85414 (Addshore)
[09:43:04] DBA, Wikidata, wikidata-tech-focus: Rename tmp1 index on wb_terms databases to something more meaningful - https://phabricator.wikimedia.org/T197854 (Addshore) I have created T202265 to add the current index to our SQL files. We will hold off on renaming for now, the plan is for this whole table to d...
[09:43:20] DBA, Wikidata, wikidata-tech-focus: Rename tmp1 index on wb_terms databases to something more meaningful - https://phabricator.wikimedia.org/T197854 (Addshore) stalled>declined
[09:43:59] DBA, Wikidata, wikidata-tech-focus: Rename tmp1 index on wb_terms databases to something more meaningful - https://phabricator.wikimedia.org/T197854 (Marostegui) Sounds good to me! Thanks @Addshore
[09:57:24] [=
[09:58:08] addshore: I guess that ticket is also a duplicate of https://phabricator.wikimedia.org/T85414
[09:58:13] (the one you created)
[09:58:27] well, I'll leave the one I created to be just for tmp1
[09:58:36] Sure
[09:58:42] then I'll check over all of the other schemas at some point and possible file other tickets or close the parent
[09:58:48] Sounds good!
[09:58:49] Thank you
[09:58:54] * marostegui meeting now!
[09:59:18] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, wikidata-tech-focus: synchronize schema on production with what is created on install - https://phabricator.wikimedia.org/T85414 (Addshore) The only sub task is currently for tmp1, the next step for this task would be to go through all of th...
[09:59:32] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, wikidata-tech-focus: synchronize schema on production with what is created on install - https://phabricator.wikimedia.org/T85414 (Addshore)
[11:51:09] DBA, Operations, ops-codfw, Patch-For-Review: db2069 storage crash - https://phabricator.wikimedia.org/T201603 (jcrespo) Strange, maybe there is a bug or a race condition? ``` angwikisource... 2018-08-20T09:27:39.462984: row id 269950842/273790891, ETA: 00m22s, 0 chunk(s) found different DIFFEREN...
[11:52:39] DBA, Operations, ops-codfw, Patch-For-Review: db2069 storage crash - https://phabricator.wikimedia.org/T201603 (Marostegui) Maybe it was caught in the middle of a transaction or something?
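Judging from its progress output above ("row id .../..., N chunk(s) found different"), `compare.py` walks a table in primary-key ranges. The sketch below illustrates that general idea only; it is not the wmfmariadbpy code, and the in-memory tables, function names, and chunk size are hypothetical. Each side's rows are grouped into fixed primary-key ranges, each range is hashed, and only ranges whose digests differ are reported. On live hosts each digest would come from a separate read, so a row that changes between the two reads can make a chunk look different when it is not, which is why a flagged chunk is worth re-checking.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a table: primary key -> row values.
Table = Dict[int, Tuple]


def chunk_digests(table: Table, chunk_size: int) -> Dict[int, str]:
    """Group rows into primary-key ranges and hash each range in key order."""
    groups: Dict[int, List[Tuple[int, Tuple]]] = defaultdict(list)
    for pk, row in table.items():
        groups[pk // chunk_size].append((pk, row))
    digests = {}
    for chunk, rows in groups.items():
        h = hashlib.sha256()
        for pk, row in sorted(rows, key=lambda item: item[0]):
            h.update(repr((pk, row)).encode("utf-8"))
        digests[chunk] = h.hexdigest()
    return digests


def different_chunks(source: Table, replica: Table, chunk_size: int = 1000) -> List[Tuple[int, int]]:
    """Return (first_id, last_id) for every primary-key range whose contents differ."""
    a = chunk_digests(source, chunk_size)
    b = chunk_digests(replica, chunk_size)
    ranges = []
    for chunk in sorted(set(a) | set(b)):
        # A range present on only one side also counts as different.
        if a.get(chunk) != b.get(chunk):
            first = chunk * chunk_size
            ranges.append((first, first + chunk_size - 1))
    return ranges


if __name__ == "__main__":
    master = {i: ("event", i * 2) for i in range(1, 2501)}
    replica = dict(master)
    replica[1234] = ("event", -1)  # simulate one drifted row
    print(different_chunks(master, replica))  # [(1000, 1999)]
```

Hashing whole ranges keeps the amount of data compared small; only the flagged ranges need a row-by-row look.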
[11:55:33] DBA, Operations, ops-codfw, Patch-For-Review: db2069 storage crash - https://phabricator.wikimedia.org/T201603 (jcrespo) It could be the missing auto-commit, only taking an effect over WAN: https://gerrit.wikimedia.org/r/#/c/operations/software/wmfmariadbpy/+/449185/
[12:44:45] DBA, Cloud-Services, Operations, Patch-For-Review: db1009 overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (jcrespo) Resolved>Open
[12:47:18] DBA, Cloud-Services, Operations, Patch-For-Review: db1009 overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (jcrespo) ``` MariaDB [(none)]> select user, host, count(*) FROM information_schema.processlist GROUP BY USER, HOST; +-----------------+-------...
[12:48:09] DBA, Cloud-Services, Operations, Patch-For-Review: db1009 overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (Marostegui) p:Normal>High
[12:48:41] DBA, Cloud-Services, Operations, Patch-For-Review: db1009 overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (jcrespo) From manuel https://grafana.wikimedia.org/dashboard/db/mysql?panelId=37&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server...
[12:48:54] DBA, Cloud-Services, Operations, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (jcrespo)
[12:54:14] DBA, Cloud-Services, Operations, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (jcrespo) Causing wikitech access errors, among others: https://logstash.wikimedia.org/goto/4d71579b957ae7e197c04882fa9dcd7c
[13:01:44] DBA, Cloud-Services, Operations, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (aborrero) I asked the DBA team to raise limits for now to avoid contention. We should work on a long term solution to avoid saturating the...
[13:01:51] DBA, Cloud-Services, Operations, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (Marostegui) For now I have done: ``` root@db1073.eqiad.wmnet[(none)]> show global variables like 'max_connections'; +-----------------+---...
[14:04:14] Blocked-on-schema-change, DBA, MediaWiki-Database, Scoring-platform-team, User-Ladsgroup: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 (Anomie) I think they added index renaming in MySQL 5.7. I don't know if MariaDB picked it up yet ([[https://jira.mariadb...
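The two options discussed for T202167 are a direct rename (`ALTER TABLE ... RENAME INDEX old TO new`, which MySQL added in 5.7) and, where the server lacks it, a drop plus re-create with the same columns. The helper below is a hypothetical sketch that only builds the SQL text for either case; it does not connect to a server. The table, index, and column names in the tests are placeholders, not the real definitions of `tmp1` or the rc_this_oldid index, and whether the target MariaDB version accepts `RENAME INDEX` has to be checked separately.

```python
from typing import Sequence


def quote(identifier: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def rename_index_sql(
    table: str,
    old: str,
    new: str,
    columns: Sequence[str],
    supports_rename: bool,
    unique: bool = False,
) -> str:
    """Build one ALTER TABLE statement that renames an index (hypothetical helper)."""
    if supports_rename:
        # Direct rename: the index definition is kept, so the columns are not needed.
        return f"ALTER TABLE {quote(table)} RENAME INDEX {quote(old)} TO {quote(new)};"
    # Fallback: drop the old index and add the new one, with identical columns,
    # in a single ALTER TABLE statement.
    kind = "UNIQUE INDEX" if unique else "INDEX"
    cols = ", ".join(quote(c) for c in columns)
    return f"ALTER TABLE {quote(table)} DROP INDEX {quote(old)}, ADD {kind} {quote(new)} ({cols});"
```

Passing the column list explicitly matters in the fallback case: the new index must match the old definition exactly, or queries that relied on the old index may change plans.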
[15:03:32] DBA, Operations, monitoring: HAproxy on dbproxy hosts lack enough logging - https://phabricator.wikimedia.org/T201021 (Marostegui) The timeout increase was done at: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/450542/
[15:13:36] Blocked-on-schema-change, DBA, MediaWiki-Database, Scoring-platform-team, User-Ladsgroup: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 (jcrespo) I am not saying this is needed, but technically the safe way to rename an index (without T202167#4514755) is to...
[15:15:24] DBA, Operations, monitoring: HAproxy on dbproxy hosts lack enough logging - https://phabricator.wikimedia.org/T201021 (jcrespo) Should we add prometheus-haproxy-exporter in scope of this, too?
[15:16:23] DBA, Operations, monitoring: HAproxy on dbproxy hosts lack enough logging - https://phabricator.wikimedia.org/T201021 (jcrespo) No need, tracked on T191400
[15:25:42] DBA, Operations, monitoring: HAproxy on dbproxy hosts lack enough logging - https://phabricator.wikimedia.org/T201021 (Marostegui) My proposal to get this logging would be to enable: https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#4.2-option%20log-health-checks ``` When this option is...
[15:39:31] so +1 to "option log-health-checks", my only question is if that is enough?
[15:40:07] I want to test it on a sby proxy
[18:40:05] DBA, Cloud-Services, Operations, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (Bstorm) I've more than halved the number of nova workers. I didn't see a big drop in the usage on grafana this time. One thing I haven't...
[18:45:40] DBA, Cloud-Services, Operations, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (Bstorm) The biggest issue overall with the current level was that cloudcontrol1003 has so many cpu cores and worker values in openstack mit...
[19:01:24] DBA, Cloud-Services, Operations, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (Bstorm) The fact that the idle timeout for api database connections is set at an hour by default might be why it didn't drop right away...
[19:32:48] DBA, Cloud-Services, Operations, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (Bstorm) Actually, nova-api db connections are down to 11 :) Looks like the only remaining problem is nova db itself (nova-conductor). Tha...
[19:55:04] DBA, Cloud-Services, Operations, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (Marostegui) Thanks a lot Brooke for getting this fixed. I will go back to 500 as max_connections tomorrow morning as it looks fine now.
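The processlist query pasted at 12:47:18 groups connections by user and host, and Bstorm's notes point at idle connections that can stay open for up to an hour. A minimal sketch of that tally in Python follows, assuming the rows have already been fetched from `information_schema.processlist` as (user, host, command, time) tuples. In that table idle connections show `Command` = `Sleep`, and `Time` is the number of seconds in the current state. Unlike the original query, the sketch strips the `:port` suffix that TCP connections carry in the host column, so one client's connections are counted together. It assumes hostnames or IPv4 addresses. The sample rows and the 300-second threshold in the tests are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# (user, host, command, time_seconds), mirroring information_schema.processlist columns.
ProcessRow = Tuple[str, str, str, int]


def client_host(host: str) -> str:
    """Strip a trailing ':port' (hostnames or IPv4 only) so one client's connections group together."""
    name, sep, port = host.rpartition(":")
    return name if sep and port.isdigit() else host


def idle_connections(rows: Iterable[ProcessRow], min_idle_seconds: int = 0) -> Counter:
    """Count sleeping connections per (user, client host) idle for at least min_idle_seconds."""
    counts: Counter = Counter()
    for user, host, command, seconds in rows:
        if command == "Sleep" and seconds >= min_idle_seconds:
            counts[(user, client_host(host))] += 1
    return counts
```

Filtering on a minimum idle time separates connections that a pool is merely keeping warm from ones that sit unused until a long idle timeout expires, which is the pattern behind the max_connections pressure discussed above.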
[22:32:29] DBA, JADE, Operations, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (kaldari)
[22:33:09] DBA, JADE, Operations, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (kaldari)
[22:45:49] DBA, JADE, Operations, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (stjn) A small comment, sorry if I am asking in the wrong place: in the documentation I didn’t see anywher...
[22:50:13] DBA, JADE, Operations, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight) @stjn Thanks for bringing it up, we do see this kind of abandonment as a possibility. So far, ou...