[07:08:27] DBA, Patch-For-Review: Defragment db1070, db1082, db1087, db1092 - https://phabricator.wikimedia.org/T137191#3127486 (Marostegui) Open>Resolved All these hosts have been defragmented and now using file per table.
[07:12:59] DBA, MediaWiki-Database: Should we bump minimum supported MySQL Version? - https://phabricator.wikimedia.org/T161232#3125897 (Marostegui) Maybe 5.5.8? it is from 2010-12-03 https://dev.mysql.com/doc/relnotes/mysql/5.5/en/news-5-5-8.html
[07:21:12] DBA, MediaWiki-Database: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127496 (Marostegui) @Paladox would you mind using this template to create a schema change ticket? It is a bit easier and clearer for us to...
[07:23:34] Blocked-on-schema-change, DBA, Multimedia, MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3127497 (Marostegui) db1095 and db1064 are done: ``` root@neodymi...
[07:24:18] Blocked-on-schema-change, DBA, Patch-For-Review: *_minor_mime are varbinary(32) on WMF sites, out of sync with varbinary(100) in MW core - https://phabricator.wikimedia.org/T73563#3127498 (Marostegui) db1095 and db1064 are done: ``` root@neodymium:~# for i in db1095 db1064; do echo $i; mysql --skip-s...
[07:44:08] DBA: run pt-tablechecksum on s6 - https://phabricator.wikimedia.org/T160509#3127532 (Marostegui) ruwiki finished too, so the whole shard has been checksumed. ruwiki has a few differences: ``` Differences on db2046 TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY ruwiki.geo_tags 1 0 1 P...
[07:58:08] DBA, Analytics, Analytics-EventLogging, ImageMetrics, Patch-For-Review: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#3127562 (Marostegui) Bad news, the table is there again @Nuria :-( ``` root@EVENTLOGGING m4[log...
[08:18:57] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127584 (Paladox)
[08:21:16] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127586 (Paladox)
[08:26:57] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127591 (Paladox) >>! In T161252#3127496, @Marostegui wrote: > @Paladox would you mind using this template to create a s...
[08:55:57] DBA, MediaWiki-Database, Patch-For-Review, PostgreSQL, Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3127646 (Marostegui) >>! In T17441#1420166, @jcrespo wrote: > This is an updated list for enwiki: > >...
[09:03:56] DBA, Patch-For-Review: run pt-table-checksum on s2 (WAS: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038) - https://phabricator.wikimedia.org/T154485#3127680 (Marostegui)
[09:04:23] DBA, Patch-For-Review: run pt-table-checksum on s2 (WAS: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038) - https://phabricator.wikimedia.org/T154485#3028813 (Marostegui)
[09:04:25] DBA, Operations, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3127690 (Marostegui)
[09:05:04] DBA: run pt-tablechecksum on s5 - https://phabricator.wikimedia.org/T161294#3127691 (Marostegui)
[09:12:55] jynus: LMK if you want to talk re: mysqld-exporter topic we were mentioning yesterday
[09:13:11] yes
[09:13:30] so I am worried about the flexibility of the exporter
[09:13:44] having to recompile and create packages for trivial changes
[09:15:02] and if that fear is right- if we could create an extension to query additional values in a scripting way
[09:15:23] that a simple puppet change could provide those extra values
[09:15:26] I'm assuming you mean if we'd like to run arbitrary queries?
[09:15:46] maybe
[09:15:46] yes
[09:16:23] let me give you an example
[09:16:30] pt-heartbeat is what we use
[09:16:51] but I had to mod it to support multi-dc
[09:17:05] and multi-source
[09:17:26] this is not a one-time change, it could evolve in the future
[09:17:37] it is a trivial change on the source code
[09:17:53] just 2 "ands" on the existing exporter
[09:18:00] but what happens next?
[09:18:07] when we have to change again?
[09:18:24] also, we have to maintain a fork of the original code?
[09:18:43] there are many things that the exporter does not support (because mariadb)
[09:19:04] the change is not that large to throw away the exporter
[09:20:52] heh I see what you mean, if we have to change the queries that the exporter does then yeah I don't see another way but keeping a modified copy on gerrit
[09:21:02] if the changes are generic enough of course we can talk to upstream about it
[09:21:26] I wanted to check with you what is the right way to go
[09:21:38] to work the least
[09:21:50] but at the same time being flexible
[09:21:56] I love that it is compiled
[09:21:57] DBA, Operations, ops-eqiad, Patch-For-Review: db1057 does not react to powercycle/powerdown/powerup commands - https://phabricator.wikimedia.org/T160435#3127734 (Marostegui)
[09:22:07] because probably that means it is more efficient
[09:22:29] but regarding databases, on the WMF we have far from a standard installation
[09:22:37] remember we even have our own packaging
[09:22:50] and our configuration and data is not really standard
[09:23:05] hehhe I'd be surprised otherwise
[09:23:15] most of it comes from wmf needs
[09:23:28] and mariadb, which is not really that well supported
[09:23:36] but also the data we have
[09:24:04] for example, some of the monitoring is not for mysql, but for the queries and data
[09:24:27] then there are the privacy needs, which make us special
[09:25:30] indeed, speaking of which you also were thinking about a private prometheus instance of that kind of data?
[09:25:41] yes
[09:26:02] because remember we were advised not to do it on a public instance
[09:26:04] however
[09:26:28] what I am talking here is for the public part
[09:26:41] we need a special pt-heartbeat query
[09:26:54] and we need multi-source support
[09:27:15] otherwise tendril has more functionality than prometheus
[09:27:21] and that is a regression
[09:27:37] (an important one because replication monitoring is a #1 priority)
[09:27:42] that can be public
[09:28:12] if then I set a private monitoring with a script, that is something I can do
[09:28:55] indeed, I'll open a task about trying the latest version of mysqld_exporter specifically for pt-heartbeat and multi-source replication
[09:29:06] would the latter work out of the box you think?
[09:29:10] not pt-heartbeat
[09:29:20] pt-heartbeat-wmf
[09:29:29] it is a patched pt-heartbeat
[09:29:37] and I can tell you already it is not compatible
[09:30:26] because it involves "and dc='eqiad' ORDER BY ts DESC LIMIT 1"
[09:30:34] because it involves "and dc='codfw' ORDER BY ts DESC LIMIT 1"
[09:31:05] remember aaron's patch you asked about?
[09:31:27] he partially did that because there was no support on prometheus for what he wanted
[09:32:38] ah I see, ok so that'd require a change for sure I think, what about multi-source replication?
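The patched heartbeat lookup mentioned above, which adds a `dc` filter before `ORDER BY ts DESC LIMIT 1`, might look like the minimal sketch below. Only the `dc` filter, the `ts` column and the ordering come from the conversation; the table name `heartbeat.heartbeat`, the `%s` placeholder style and both function names are assumptions for illustration, not the actual pt-heartbeat-wmf or mysqld_exporter code.

```python
from datetime import datetime

# Hypothetical table name, chosen only for this sketch.
HEARTBEAT_TABLE = "heartbeat.heartbeat"


def heartbeat_query(table: str = HEARTBEAT_TABLE) -> str:
    """Build a parameterized query for the newest heartbeat row of one datacenter.

    The datacenter (for example 'eqiad' or 'codfw') is passed separately as a
    query parameter instead of being pasted into the SQL text.
    """
    return f"SELECT ts FROM {table} WHERE dc = %s ORDER BY ts DESC LIMIT 1"


def lag_seconds(newest_heartbeat: datetime, now: datetime) -> float:
    """Replication lag as the age of the newest heartbeat, clamped at zero."""
    return max(0.0, (now - newest_heartbeat).total_seconds())
```

Keeping the datacenter as a parameter means one query text can serve every datacenter, so adding another filter later would be a small change to the query builder.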
[09:32:50] that should probably be upstream
[09:32:57] and it only requires a single change
[09:33:24] if mariadb_version > X SHOW ALL SLAVES STATUS else SHOW SLAVE STATUS
[09:33:31] that should be upstream
[09:34:31] check hosts like db1047 https://grafana.wikimedia.org/dashboard/db/mysql?var-dc=eqiad%20prometheus%2Fops&var-server=db1047
[09:35:36] I am sorry to be so needy
[09:36:15] but mysql is not only a stateful service
[09:36:32] it is a turing-complete development platform
[09:37:58] heheh fair enough, also where we host most/all of the important data
[09:38:29] task is https://phabricator.wikimedia.org/T161296 I was looking at the code and for multi-source replication it seems to do the right thing already
[09:38:44] tries SHOW ALL SLAVES STATUS and falls back to SHOW SLAVE STATUS
[09:38:44] in the new version?
[09:38:55] yeah
[09:38:59] ah
[09:39:18] because in the link above I get no metrics!
[09:42:36] mhh we should debug that, it might be related to the new prometheus servers I just pooled
[09:43:52] DBA: run pt-tablechecksum on s5 - https://phabricator.wikimedia.org/T161294#3127768 (Marostegui) The following tables need to be excluded from the checks as they do not contain a PK: dewiki ``` archive_save categorylinks change_tag click_tracking click_tracking_user_properties cur edit_page_tracking hidden i...
[09:45:33] marostegui, someone changed labs passwords
[09:45:39] what?
[09:46:10] we have a security incident
[10:28:44] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127871 (Reedy) Noting this doesn't need doing on every wiki as some will already have the new index Plus, this needs s...
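The multi-source behaviour discussed earlier (try `SHOW ALL SLAVES STATUS`, fall back to `SHOW SLAVE STATUS`) can be sketched roughly as follows. This is not the mysqld_exporter implementation; the function name `replication_status` is hypothetical, and it assumes a DB-API style cursor whose `execute` raises when the server does not understand the statement.

```python
def replication_status(cursor):
    """Return (statement_used, rows), preferring MariaDB's multi-source form.

    `cursor` is assumed to be any DB-API style cursor. A broad `except` is
    used because the exact error class depends on the driver in use.
    """
    for statement in ("SHOW ALL SLAVES STATUS", "SHOW SLAVE STATUS"):
        try:
            cursor.execute(statement)
        except Exception:
            # Assumption: a server without multi-source replication rejects
            # the first statement, so the plain form is tried next.
            continue
        return statement, cursor.fetchall()
    raise RuntimeError("no replication status statement succeeded")
```

Probing the statement avoids hard-coding a version threshold, at the cost of one failed statement per scrape on servers that only support the plain form.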
[10:33:38] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127876 (jcrespo) I am not sure we should agree to this change, all (or almost all) wikis have the usertext_timestamp, a...
[10:37:38] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127882 (jcrespo) Looking at T154872, this seems like a mediawiki issue, not a production database issue, so I will decl...
[10:38:48] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127884 (jcrespo) Open>declined
[10:43:37] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127890 (Reedy) >>! In T161252#3127876, @jcrespo wrote: > I am not sure we should agree to this change, all (or almost a...
[10:44:58] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127892 (Reedy) And the canonical non prefixed one is... ``` CREATE INDEX /*i*/usertext_timestamp ON /*_*/revision (rev...
[10:51:34] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127908 (jcrespo) T154872#2926839 - yes, mediawiki made an incompatible change with production. They should fix it. I d...
[10:52:08] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127909 (Paladox) As this was declined I guess we can continue eight eh patches I have for moving us to the ar_ prefix....
[10:53:09] DBA, Operations, DC-Switchover-Prep-Q3-2016-17, Patch-For-Review, Wikimedia-Multiple-active-datacenters: Decouple Mariadb semi-sync replication from $::mw_primary - https://phabricator.wikimedia.org/T161007#3127910 (jcrespo) So s/both/master/ ?
[10:53:29] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127914 (Paladox) >>! In T161252#3127908, @jcrespo wrote: > T154872#2926839 - yes, mediawiki made an incompatible chang...
[10:57:58] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127930 (jcrespo) No, you can do (propose?) whatever you want on mediawiki code to fix the other wikis -feel free to sen...
[11:00:22] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127932 (Paladox) We could add a config to configure what index to use, but at the same time deprecate the config. Will...
[11:05:55] Blocked-on-schema-change, DBA, Multimedia, MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3127947 (Marostegui) db2019 is done, that means that the whole co...
[11:16:25] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3127954 (jcrespo) The easiest way is to rename the index for sqlite-only. We were not notified this was going to happen-...
[11:24:20] DBA, Operations, DC-Switchover-Prep-Q3-2016-17, Patch-For-Review, Wikimedia-Multiple-active-datacenters: Decouple Mariadb semi-sync replication from $::mw_primary - https://phabricator.wikimedia.org/T161007#3127963 (jcrespo) Actually, I am not sure that is needed- things flopped because ALL s...
[11:35:28] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3128004 (Reedy) >>! In T161252#3127954, @jcrespo wrote: > The easiest way is to rename the index for sqlite-only. We wer...
[11:38:36] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3128006 (jcrespo) Let's talk in the parent ticket and I will definitely help.
[11:39:48] DBA: run pt-tablechecksum on s5 - https://phabricator.wikimedia.org/T161294#3128011 (Marostegui) Filters enabled on the rc slaves: ``` Replicate_Wild_Ignore_Table: dewiki.__wmf_checksums,wikidatawiki.__wmf_checksums ``` And dsns table populated for s5
[11:42:42] DBA: run pt-tablechecksum on s5 - https://phabricator.wikimedia.org/T161294#3128013 (jcrespo) Heads up! double check lately the list members. db1057 breaking down created some movements on the servers.
[11:44:44] DBA: run pt-tablechecksum on s5 - https://phabricator.wikimedia.org/T161294#3128019 (Marostegui) >>! In T161294#3128013, @jcrespo wrote: > Heads up! double check lately the list members. db1057 breaking down created some movements on the servers. I did! Thanks for the heads up! :)
[12:14:31] DBA: run pt-tablechecksum on s5 - https://phabricator.wikimedia.org/T161294#3127691 (Marostegui) I am running it now on dewiki, but I will NOT leave it running for the weekend.
[12:15:56] Blocked-on-schema-change, DBA, Patch-For-Review: *_minor_mime are varbinary(32) on WMF sites, out of sync with varbinary(100) in MW core - https://phabricator.wikimedia.org/T73563#3128056 (Marostegui) db1056 and db1069 are done: ``` root@neodymium:/home/marostegui/databases_s5# mysql --skip-ssl -hdb1...
[12:16:39] DBA, Patch-For-Review: es2014 revert data compression - https://phabricator.wikimedia.org/T129350#3128057 (jcrespo) Open>Resolved root@es2014[(none)]> SELECT table_schema, table_name FROM information_schema.tables WHERE Engine='InnodB' AND row_format='compressed'; Empty set (6.40 sec)
[12:16:44] Blocked-on-schema-change, DBA, Multimedia, MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3128059 (Marostegui) db1056 and db1069 are done: ``` root@neodymi...
[13:13:24] DBA, MediaWiki-Database, Patch-For-Review, PostgreSQL, Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3128104 (Marostegui) To clarify a bit, the list of tables which are in use and have a UNIQUE key, and we...
[14:30:00] "We think the query pointer is invalid, but we will try to print it anyway. Query: SELECT * FROM user WHERE us_user = ?"
[14:30:29] nothing related to the socket, but it would be a lot of coincidence that both happen at the same exact time
[14:47:35] what's that?
[14:48:25] I think I made labsdb1005 crash
[14:50:19] you sure? it also crashed last night
[14:50:25] oh
[14:51:16] with a different query though
[14:51:31] no, last log is from 170304
[14:51:52] ha, what a coincidence: 70223 21:26:33
[14:51:55] and yesterday was 23 too
[14:51:56] DBA, Analytics, Analytics-EventLogging, ImageMetrics, Patch-For-Review: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#3128269 (Nuria) Note that record is from 20170318 , a timestamp before the blacklisting changes...
[14:51:57] but 03
[14:52:48] DBA, Analytics, Analytics-EventLogging, ImageMetrics, Patch-For-Review: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#3128270 (Marostegui) >>! In T141407#3128269, @Nuria wrote: > Note that record is from 20170318 ,...
[14:56:01] wait it makes no sense
[14:56:07] I changed /tmp
[14:56:32] but the socket for labsdb1005 is on /var/run/mysqld
[14:56:44] what is tmpdir?
[14:57:16] /srv/labsdb/tmp
[14:57:30] was the same before?
[14:58:32] we can check puppet logs to see if it was changed sometime
[14:58:49] https://gerrit.wikimedia.org/r/#/c/341503/
[14:59:36] so I think it is related
[15:00:09] a query with a temporary table may have forced a restart
[15:00:19] after permissions on /tmp changed
[15:00:50] instead of failing the query it makes mysql crash?
[15:00:50] that is ugly
[15:01:02] well
[15:01:04] in a way
[15:01:13] /tmp had the wrong permissions
[15:01:17] to start with
[15:01:53] yep, that is fine, what I say is that it is a bad behaviour, it should fail the query or something but not crash the whole server, no?
[15:02:40] maybe it was in the middle of it
[15:51:26] DBA, Analytics, Analytics-EventLogging, ImageMetrics, Patch-For-Review: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#3128398 (Marostegui) I dropped it on all the hosts ``` root@neodymium:/home/marostegui/databases...
[15:55:19] DBA, Analytics, Analytics-EventLogging, ImageMetrics, Patch-For-Review: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#3128422 (Nuria) mmm.. master? man, bermuda triangle problem. I was expecting this came from the...
[15:55:38] DBA, Analytics-EventLogging, Analytics-Kanban, ImageMetrics, Patch-For-Review: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#3128423 (Nuria)
[15:57:08] DBA, Analytics-EventLogging, Analytics-Kanban, ImageMetrics, Patch-For-Review: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#3128426 (Marostegui) >>! In T141407#3128422, @Nuria wrote: > mmm.. master? man, bermuda t...
[16:32:51] milimetric: meeting https://hangouts.google.com/hangouts/_/wikimedia.org/nuria?authuser=0
[17:14:43] heading out for a bit, I'll be back later maybe or work tomorrow
[17:50:07] DBA, Operations, DC-Switchover-Prep-Q3-2016-17, Patch-For-Review, Wikimedia-Multiple-active-datacenters: Decouple Mariadb semi-sync replication from $::mw_primary - https://phabricator.wikimedia.org/T161007#3128745 (jcrespo) I will set this up on s6-codfw as a test, but next week.