[05:29:32] marostegui: morning, ipblocks_restrictions.ir_type is tinyint(4) in production while it's tinyint(1) in code
[05:29:47] Making a task to investigate
[05:30:25] Amir1: thank you!
[05:30:47] will try to get that fixed before we are back in eqiad
[05:31:37] oh and obviously wikitech is a mess :D
[05:32:32] yeah, wikitech is going to be fun...
[05:33:54] 10DBA: ipblocks_restrictions.ir_type is tinyint(1) in code but tinyint(4) in production - https://phabricator.wikimedia.org/T265321 (10Ladsgroup)
[05:35:09] 10DBA, 10Data-Persistence: Evaluate the impact of changing innodb_change_buffering to inserts - https://phabricator.wikimedia.org/T263443 (10Marostegui) So far, the values haven't increased yet, so going to change it back on pc2009 to `inserts` to see if it goes back to higher values.
[05:35:12] oh, everything there is char instead of binary....
[05:36:27] Can we get a max value for ipblocks_restrictions.ir_type across the fleet?
[05:36:47] Amir1: Sorry, I needed more coffee, I didn't read that it was tinyint; it doesn't make any difference in mysql whether it is tinyint(1) or tinyint(4), really
[05:36:55] it is just a display width for the client
[05:37:02] but internally it is the same for storing the values
[05:37:50] does the max value differ?
[05:38:04] the max value they can store?
[05:38:09] yup
[05:38:11] no, it is essentially tinyint
[05:38:15] similar with char(4) and char(1)
[05:38:38] if not, then (if it's possible and not much work) can it be fixed? since it'll spam the reports
[05:38:38] nope, for int types, the (X) is just a visualization thing
[05:38:58] cool
[05:39:08] Amir1: yeah, I can fix it, but it is not an internal drift :)
[05:40:44] 10DBA, 10Data-Persistence: ipblocks_restrictions.ir_type is tinyint(1) in code but tinyint(4) in production - https://phabricator.wikimedia.org/T265321 (10Marostegui) p:05Triage→03Low Internally tinyint(1) and tinyint(4) are the same for the values stored. It is just a visualization width indication, which...
[05:41:06] Amir1: on which section did you check this?
[05:41:24] anywhere I ran it, I saw it
[05:41:31] haha nice
[05:41:48] s2 and s7 showed up right now
[05:41:54] if you want to be 100% sure
[05:45:04] don't worry, I will check everywhere
[05:45:07] thanks for the ticket
[05:45:59] 10DBA, 10Growth-Structured-Tasks, 10Growth-Team: Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (10Marostegui) >>! In T261411#6536246, @kostajh wrote: >> And the plan is to roll that out everywhere after Q2? > > Well, we cu...
[05:53:47] Amir1: if it is 4 everywhere, it will be easier to fix in code I think
[05:53:53] Definitely less time consuming
[05:54:51] sure
[05:54:58] let's see if it's everywhere
[05:55:04] so far it has been
[05:55:12] yep, I am checking now
[05:59:23] 10DBA, 10Data-Persistence: ipblocks_restrictions.ir_type is tinyint(1) in code but tinyint(4) in production - https://phabricator.wikimedia.org/T265321 (10Marostegui) So tinyint(4) happens on: s1, s2, some wikis of s5 (the old ones, as s5 also holds some new ones), s6, s7, s8. s3 is ongoing
[06:28:33] 10DBA, 10Data-Persistence: ipblocks_restrictions.ir_type is tinyint(1) in code but tinyint(4) in production - https://phabricator.wikimedia.org/T265321 (10Marostegui) s3 has tinyint(4) on all the wikis but the most recent ones, which is a total of 23 out of 905. It is probably less time consuming (and way less...
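The key point in the exchange above is that the (N) on integer columns is only a display width. A minimal sketch against a scratch database (table and values are purely illustrative) showing that tinyint(1) and tinyint(4) accept and store exactly the same range:

```sql
-- Illustrative only: the (N) on integer types is a display width hint,
-- not a storage or range limit, so tinyint(1) and tinyint(4) behave the same.
CREATE TABLE display_width_demo (
  a TINYINT(1) NOT NULL,
  b TINYINT(4) NOT NULL
);

-- Both columns take the full signed tinyint range (-128..127) without error.
INSERT INTO display_width_demo VALUES (127, 127), (-128, -128);

-- Both values come back identical; only clients that honour the display width
-- (typically together with ZEROFILL) would render them differently.
SELECT a, b FROM display_width_demo;
```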
[06:30:34] 10DBA, 10Data-Persistence: ipblocks_restrictions.ir_type is tinyint(1) in code but tinyint(4) in production - https://phabricator.wikimedia.org/T265321 (10Ladsgroup) Sure thing! I can apply it with the abstract schema change (since the schema of that table itself is abstracted now) to make the patch quick and...
[07:33:25] 10DBA, 10Data-Persistence, 10Operations, 10Release-Engineering-Team-TODO, and 2 others: Create integration test env for wmfmariadbpy - https://phabricator.wikimedia.org/T265266 (10hashar) From yesterday's discussion: * we do not run arbitrary images * the CI job usually just runs an image using Docker, the i...
[08:26:56] 10DBA, 10Data-Persistence, 10Schema-change, 10User-Ladsgroup: ipblocks_restrictions.ir_type is tinyint(1) in code but tinyint(4) in production - https://phabricator.wikimedia.org/T265321 (10Marostegui) a:03Ladsgroup Excellent - thank you! Going to assign it to you for now then!
[08:29:02] 10Blocked-on-schema-change, 10DBA, 10Data-Persistence, 10Operations, 10User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (10Kormat)
[09:30:39] elukey: an-test-coord1001 mysql is down?
[09:31:13] marostegui: hola! It is up, is icinga complaining?
[09:31:13] elukey: just saw it at https://grafana.wikimedia.org/d/000000278/mysql-aggregated?viewPanel=4&orgId=1&from=now-2d&to=now&var-site=All&var-group=core&var-shard=s8&var-shard=s1&var-shard=s2&var-shard=s3&var-shard=s4&var-shard=s5&var-shard=s6&var-shard=s7&var-role=All
[09:31:26] so maybe it needs an exporter restart
[09:32:10] ahhh mysql sorry
[09:32:20] Yeah, sorry
[09:32:23] yes I think I have to add perms for the exporter to fetch
[09:32:27] I just did: systemctl restart prometheus-mysqld-exporter@analytics-meta.service
[09:32:28] it's mysql, _everyone_ is sorry
[09:33:56] I am going to add the prometheus user
[09:39:29] I have also restarted prometheus-mysqld-exporter
[09:39:35] and the errors are decreasing
[09:39:38] it will eventually clean up
[09:40:34] elukey: I am wondering why isn't that host running 10.4, btw?
[09:40:42] Ah, cause it is stretch
[09:44:38] exactly yes
[10:08:09] 10DBA, 10Data-Persistence, 10Operations, 10Release-Engineering-Team-TODO, and 2 others: Create integration test env for wmfmariadbpy - https://phabricator.wikimedia.org/T265266 (10JMeybohm) p:05Triage→03Medium
[10:18:50] the exporter is still failing, but in the journal I can't see much
[10:21:01] of course I wasn't using the @analytics-meta one
[10:21:03] * elukey cries
[10:46:51] 10DBA, 10MediaWiki-Logevents, 10Schema-change, 10Wikimedia-database-error: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961 (10LSobanski) @Huji Is this still a problem requiring work?
[10:59:42] 10DBA, 10Operations: Puppetize grants for mysql hosts that are the source of recovery (dbstore, passive misc) - https://phabricator.wikimedia.org/T111929 (10LSobanski) @jcrespo could you weigh in on what is the exact work that needs to happen here?
[11:04:09] 10DBA, 10Growth-Team, 10Operations: Move echo tables from local wiki databases onto extension1 cluster for mediawikiwiki, metawiki, and officewiki - https://phabricator.wikimedia.org/T119154 (10LSobanski) 05Open→03Declined We consider this task to be very risky and with the limited gain suggest against g...
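For the exporter issue above, the usual fix is a dedicated MySQL account that prometheus-mysqld-exporter can use. A hedged sketch of such an account; the user name, host and password are placeholders, and the privilege set follows the exporter's upstream documentation rather than the actual puppetized grants:

```sql
-- Placeholder account for prometheus-mysqld-exporter. The privileges follow
-- the exporter's upstream documentation; the real puppetized grants may differ.
CREATE USER 'prometheus'@'localhost'
  IDENTIFIED BY 'REDACTED'
  WITH MAX_USER_CONNECTIONS 3;

-- PROCESS and REPLICATION CLIENT cover global status and replication metrics;
-- SELECT on performance_schema covers the per-statement/table statistics.
GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'prometheus'@'localhost';
GRANT SELECT ON performance_schema.* TO 'prometheus'@'localhost';
```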
[11:07:33] 10DBA, 10Growth-Structured-Tasks, 10Growth-Team: Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (10kostajh) >>! In T261411#6537727, @Marostegui wrote: >> Good to know. Do we have the same problem if we use the denormalized...
[11:15:08] 10DBA, 10Growth-Structured-Tasks, 10Growth-Team: Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (10Marostegui) Thanks for the comment. Then, I reckon we do need to think about the denormalized approach as the only approach w...
[11:22:45] 10DBA: Monitor the growth of CheckUser databases at large wikis - https://phabricator.wikimedia.org/T265344 (10Urbanecm)
[11:23:31] 10DBA: Monitor the growth of CheckUser tables at large wikis - https://phabricator.wikimedia.org/T265344 (10Urbanecm)
[11:24:02] 10DBA: Monitor the growth of CheckUser tables at large wikis - https://phabricator.wikimedia.org/T265344 (10Urbanecm)
[11:25:11] 10DBA, 10Data-Persistence: Monitor the growth of CheckUser tables at large wikis - https://phabricator.wikimedia.org/T265344 (10Marostegui) p:05Triage→03Medium a:03Marostegui Thanks for creating this task! I will do it for the following wikis: - eswiki - metawiki - ruwiki
[11:26:04] 10DBA, 10Data-Persistence: Monitor the growth of CheckUser tables at large wikis - https://phabricator.wikimedia.org/T265344 (10Marostegui)
[11:35:18] 10DBA, 10Data-Persistence: Monitor the growth of CheckUser tables at large wikis - https://phabricator.wikimedia.org/T265344 (10Marostegui)
[11:36:04] sobanski: what would be the column for in progress on the DBA workboard? Or we only keep track of in-progress on the data-persistence one?
[11:38:17] Yes
[11:38:40] Only on the data-persistence one
[11:39:58] So for new tasks on the DBA tag, should I just use ready?
[11:40:02] (if they are on-going)
[11:51:16] 10DBA, 10Data-Persistence, 10Growth-Structured-Tasks, 10Growth-Team: Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (10Marostegui)
[12:24:02] 10DBA: querycache qc_type and qc_title have different nullabality on s1 only - https://phabricator.wikimedia.org/T265349 (10Ladsgroup)
[12:24:30] It's Christmas ^
[12:27:08] 10DBA: tl_namespace index on templatelinks is unique only in s8 - https://phabricator.wikimedia.org/T250060 (10Ladsgroup) This and imagelinks.il_to are not done in enwikivoyage (s5), I assume they fell into cracks while moving these wikis from s3 to s5?
[12:30:49] 10DBA: querycache qc_type and qc_title have different nullabality on s1 only - https://phabricator.wikimedia.org/T265349 (10LSobanski) p:05Triage→03Medium
[12:46:00] 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (10Papaul) @Marostegui since the server is under warranty, it is best to use a disk that is under warranty as well.
[12:46:03] 10DBA: querycache qc_type and qc_title have different nullabality on s1 only - https://phabricator.wikimedia.org/T265349 (10Marostegui) @Ladsgroup from what I can see, they need to be: ` `qc_type` varbinary(32) NOT NULL, `qc_title` varbinary(255) NOT NULL DEFAULT '', ` And they are the following everywhere...
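A sketch of how drifts like the querycache one above can be confirmed per replica, by comparing information_schema against the expected definitions quoted in the task comment (run once per host, across all wiki databases; the expected types here are taken from that comment):

```sql
-- Drift check sketch for T265349, run on a single replica: list every wiki
-- database where querycache.qc_type / qc_title differ from the definitions
-- quoted in the task comment above.
SELECT table_schema, column_name, column_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'querycache'
  AND (
        (column_name = 'qc_type'  AND (column_type <> 'varbinary(32)'  OR is_nullable <> 'NO'))
     OR (column_name = 'qc_title' AND (column_type <> 'varbinary(255)' OR is_nullable <> 'NO'))
      );
```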
[12:46:50] 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (10Marostegui) @Papaul sounds good, so maybe let's remove the old disk, give it 5 minutes, and then place the new one in?
[12:48:11] 10DBA: tl_namespace index on templatelinks is unique only in s8 - https://phabricator.wikimedia.org/T250060 (10Marostegui) >>! In T250060#6538606, @Ladsgroup wrote: > This and imagelinks.il_to are not done in enwikivoyage (s5), I assume they fell into cracks while moving these wikis from s3 to s5? Most likely,...
[12:48:52] 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (10Papaul) Will do that once on site
[12:49:27] 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (10Marostegui) Going to depool the host just in case, thanks!
[12:49:29] 10DBA: tl_namespace index on templatelinks is unique only in s8 - https://phabricator.wikimedia.org/T250060 (10Ladsgroup) Sure!
[12:50:04] kormat: let me know when you are done with s1 eqiad
[12:50:10] so I can deploy an alter there
[12:50:38] 10DBA: querycache qc_type and qc_title have different nullabality on s1 only - https://phabricator.wikimedia.org/T265349 (10Marostegui) a:03Marostegui
[12:50:44] marostegui: db1124 (sanitarium) is really struggling
[12:51:13] 10DBA: querycache qc_type and qc_title have different nullabality on s1 only - https://phabricator.wikimedia.org/T265349 (10Ladsgroup) Cool! Thanks.
[12:51:17] kormat: yeah, expected as they have less IO available
[12:51:29] hmm. the graph says it's not catching up _at all_
[12:51:33] i wonder if replication is stopped
[12:52:23] marostegui: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&from=now-3h&to=now&refresh=1m&var-server=db1124&var-port=13311&viewPanel=6
[12:52:38] rate of change is steady at -1, i.e. it's not progressing at all
[12:53:12] kormat: of course, cause the schema change is running
[12:53:26] oh. and it's a single database
[12:53:27] so it makes sense it is still lagging behind as the alter isn't online
[12:54:00] ok, now i get it
[12:54:16] it's 5.5G through 8.3G
[12:55:06] I think I will deploy my change there so it queues up
[12:55:22] kick it while it's down, i like it.
[12:55:51] the section is DT'd until tomorrow
[12:56:58] XDDDD
[12:57:41] 10Blocked-on-schema-change, 10DBA, 10Data-Persistence, 10Operations, 10User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (10Kormat)
[12:58:27] 10Blocked-on-schema-change, 10DBA, 10Data-Persistence, 10Operations, 10User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (10Kormat) 05Open→03Stalled All of eqiad is now done. The remaining hosts/sections in codfw will be done after the dc swit...
[12:59:15] 2 months of work to get the backup DC done. i'm glad this wasn't even a difficult schema change :)
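The change being queued above is presumably the querycache fix from T265349; a sketch of what that ALTER could look like, built from the column definitions quoted in the task comment (the statement actually deployed per wiki and per section may differ):

```sql
-- Sketch of the T265349 fix, based on the column definitions quoted in the
-- task comment; the ALTER actually deployed may differ.
ALTER TABLE querycache
  MODIFY qc_type  varbinary(32)  NOT NULL,
  MODIFY qc_title varbinary(255) NOT NULL DEFAULT '';
```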
[12:59:47] 10DBA: querycache qc_type and qc_title have different nullabality on s1 only - https://phabricator.wikimedia.org/T265349 (10Marostegui)
[13:00:18] 10DBA: querycache qc_type and qc_title have different nullabality on s1 only - https://phabricator.wikimedia.org/T265349 (10Marostegui) 05Open→03Stalled I have deployed the change on eqiad, I will do codfw once it is back as standby
[13:00:23] 10DBA, 10Datasets-General-or-Unknown, 10Patch-For-Review, 10Sustainability (Incident Followup), 10WorkType-NewFunctionality: Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves - https://phabricator.wikimedia.org/T104459 (10Marostegui)
[13:08:09] 10Blocked-on-schema-change, 10DBA, 10Data-Persistence, 10Operations, 10User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (10Kormat) 05Stalled→03Resolved
[13:08:21] wait, what
[13:08:38] 10Blocked-on-schema-change, 10DBA, 10Data-Persistence, 10Operations, 10User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (10Kormat) 05Resolved→03Stalled
[13:08:40] fixed :P
[13:54:10] dropping an interesting link https://issues.apache.org/jira/browse/CALCITE-4034
[13:54:50] in theory this means that we (as Analytics) could replace (on paper) the monthly sqoop from labsdb1012 with Hadoop reading innodb files
[13:55:19] no idea about the performance etc.. but it could be an interesting use case to explore
[13:56:48] elukey: i _think_ mariadb has a different on-disk format than mysql
[13:58:13] good point, needs to be checked
[13:58:27] the calcite adapter leverages https://github.com/alibaba/innodb-java-reader but I don't see traces of "mariadb"
[13:59:12] No, ibd files are the same on mysql and mariadb
[14:00:02] elukey: performance is probably better than mysql itself I would say, cause you'd be skipping the mysql layer (mysql optimizer and parsing etc) (I _guess_)
[14:01:15] But that is just me guessing
[14:01:47] it has the expected caveat that if the db hasn't flushed data to disk your reads can be inconsistent
[14:01:53] but who cares about consistency anyway, right?
[14:02:05] kormat: that's a good point indeed
[14:02:07] both of them actually
[14:02:09] XD
[14:02:14] haha
[14:05:49] I am ignorant but do we save snapshots of innodb files on our backup infra? If so a reasonably recent snapshot would be fine for us
[14:05:59] elukey: we do, yeah
[14:06:43] elukey: but you'd still need a mysql up and running
[14:06:43] no?
[14:07:02] yeah querying a live innodb file doesn't seem a great idea at first sight
[14:07:04] marostegui: in theory no, that thing reads innodb files directly
[14:07:31] volans: I tested the memcached plugin and it works _very well_
[14:07:46] that's different
[14:08:23] but it does the same thing, reads the ibd files
[14:08:50] I mean I dunno the implementation of this library they refer to in the task. The memcached plugin uses the innodb API to access the datastore as key:value
[14:08:57] IIRC it doesn't have any query capability
[14:09:10] yeah, that's why I was asking if they need a mysql up and running
[14:09:16] volans: it needs to be tested, the calcite project is a top level apache one and it is very solid
[14:09:33] I am pretty sure that the adapter works well
[14:09:45] but if there are bugs we can surely report them
[14:09:46] famous last words!
[14:10:01] so I don't see how accessing key:value can lead to inconsistent data
[14:10:06] I am also worried about upgrades of mariadb, and possibly changes in innodb file format
[14:10:16] either that key exists or not and has a value or not, can be old but not 'inconsistent'
[14:10:52] the main plus if it works is that we'd process the whole thing into hadoop, without the need of a beefy mariadb to be used only once a month
[14:12:09] not saying it's not promising, just that maybe it is not targeting a moving innodb file from a live replica
[14:12:19] but I guess they will explain the limitations
[14:12:49] volans: I asked about processing a snapshot from the backup infra for that reason :)
[14:13:04] yeah that seems the sanest approach IMHO :)
[14:13:39] we don't care about the timing of fetching data, it only needs to happen right after a month changes
[14:14:13] yes yes pulling from live replicas is not sane for sure
[14:14:33] is it ok if I or Joseph open a task asking some questions?
[15:02:23] 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (10Papaul) new disk in place Status Name State Slot Number Size Security Status Bus Protocol Media Type Hot Spare Remaining Rated Write Endurance Physical Disk 0:1:0 On...
[15:08:09] 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (10Papaul) ` pt1979@es2026:~$ sudo megacli -PDRbld -ShowProg -physdrv[32:2] -aALL Rebuild Progress on Device at Enclosure 32, Slot 2 Completed 3% in 6 Minutes.
[15:40:48] 10DBA, 10Operations, 10ops-codfw, 10User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (10Papaul) Both CPUs replaced, server is back up
[16:24:26] 10DBA, 10Operations, 10ops-codfw, 10User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (10Marostegui) Thank you Papaul, I will start repooling the host tomorrow and see how it goes with load
[17:15:51] 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review: Degraded RAID on es2026 - https://phabricator.wikimedia.org/T263837 (10Papaul) return tracking information {F32383537}
[17:43:47] 10DBA, 10Data-Persistence, 10Growth-Structured-Tasks, 10Growth-Team: Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (10Tgr) We are going to have separate tables for each wiki, right? And, at least with the current approach...
[19:47:13] 10DBA, 10MediaWiki-Logevents, 10Schema-change, 10Wikimedia-database-error: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961 (10Huji) 05Open→03Invalid @LSobanski An index on `log_namespace, log_title, log_timestamp` exists but as pointed out before, th...
[19:47:38] 10DBA, 10MediaWiki-Logevents, 10Schema-change, 10Wikimedia-database-error: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961 (10Huji) 05Invalid→03Open Whoops, closed by accident.
[20:31:02] 10DBA, 10MediaWiki-API: Wrong title in revision from recentchanges log - https://phabricator.wikimedia.org/T264951 (10Naike) #dba Please have a look. If it's not in your wheelhouse, tag the platform engineering team.
[23:23:56] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10ifried) @Marostegui Hello! We have enabled the feature on Persian, French, German, and Czech Wikipedia.
[23:43:54] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): SHOW EXPLAIN FOR via Quarry or sql-optimizer fails often - https://phabricator.wikimedia.org/T265430 (10bd808) Here is my thinking about this being a load balancer issue: * We run haproxy instances that sit between the Cloud VPS/Toolforge users and the...
[23:51:56] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): SHOW EXPLAIN FOR via Quarry or sql-optimizer fails often - https://phabricator.wikimedia.org/T265430 (10bd808) I am not sure if HAProxy actually has deep enough understanding of MySQL/MariaDB to actually let us configure a `balance` method that could ma...
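Background on the SHOW EXPLAIN FOR failures above: the statement has to be issued on the same MariaDB server that is running the query being examined, which is exactly what a per-connection load balancer may not guarantee. A sketch of the normal two-step flow (the connection id is made up):

```sql
-- Sketch of the workflow behind T265430. Both statements must reach the same
-- MariaDB server; if haproxy sends them to different backends, SHOW EXPLAIN
-- FOR fails because the target connection does not exist there.

-- 1) Find the connection id of a long-running query (12345 below is made up).
SELECT id, db, time, state, info
FROM information_schema.processlist
WHERE command = 'Query' AND time > 10;

-- 2) Ask that same server for the plan of the statement it is executing.
SHOW EXPLAIN FOR 12345;
```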