[04:52:50] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (Marostegui)
[04:53:13] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Schema-change: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 (Marostegui)
[04:59:14] DBA, JADE, Operations, Scoring-platform-team, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (Marostegui) >>! In T200297#4466074, @awight wrote: >>>! In T200297#4464608, @Marostegui wrote: >> What does: "our r...
[05:58:05] DBA, Patch-For-Review, Schema-change: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 (Marostegui)
[06:21:29] DBA, Patch-For-Review: Make sure multi-instance slaves page - https://phabricator.wikimedia.org/T200509 (Marostegui)
[06:21:34] DBA, Epic: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423 (Marostegui)
[06:48:40] DBA, Release-Engineering-Team, Epic: Implement a system to automatically deploy schema changes without needing DBA intervention - https://phabricator.wikimedia.org/T121857 (Marostegui) p:Triage>Normal Now that we are getting closer to have a way to automatic depool/repool slaves, I would like...
[07:35:20] DBA, Datasets-General-or-Unknown, Patch-For-Review, Release-Engineering-Team (Watching / External), and 2 others: Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves - https://phabricator.wikimedia.org/T104459 (Marostegui) I am runnin...
[07:46:20] 30 minutes from fully pooled to reimage to pooled back, that has to be a record
[07:46:58] and from those 30 how many doing the commit, waiting for jenkins, deploy, revert, wait for jenkins and deploy again?
:)
[07:48:11] none, I did those in advance
[07:48:17] smart!
[07:49:34] did you have time to think how to solve the instance optional monitoring?
[07:50:02] probably hiera I think
[07:50:41] I haven't thought a lot about it since yesterday evening
[07:50:45] cool
[07:50:50] to hiera, I mean
[07:50:57] I think it is the easier/cleaner
[07:51:13] by default false and if the key exists on the yaml, page
[07:51:19] using your is_critical => auto
[07:51:35] I wonder if we should just configure it per role on hiera, for all of them, not only instance
[07:52:03] I think the style guide says we shouldn't put default values on hiera
[07:53:23] I meant: no line in db1098.yaml -> don't page, if the key exists on the yaml, page
[07:53:28] is that what you also meant?
[07:56:16] DBA, Collaboration-Team-Triage, Growth-Team, MediaWiki-extensions-PageCuration, Schema-change: Drop ptrl_comment in production - https://phabricator.wikimedia.org/T157762 (Marostegui) Dropped from enwiki and all its hosts: [x] labsdb1011 [x] labsdb1010 [x] labsdb1009 [x] dbstore1002 [x] dbst...
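Note on the 07:50-07:53 exchange above: the idea discussed is that a host pages only when a key is present in its hiera yaml, with a per-role setting floated as an alternative. A minimal Python sketch of that first-match lookup follows. It is only an illustration, not the actual Puppet code: the key name `is_critical` is borrowed from the parameter mentioned in the log, while the function names, the dictionary layout, and the assumption that a host value overrides a role value are hypothetical.

```python
from typing import Any, Mapping

# Hypothetical key name, borrowed from the "is_critical" parameter in the log.
KEY = "is_critical"


def lookup(key: str,
           host_data: Mapping[str, Any],
           role_data: Mapping[str, Any],
           default: Any = False) -> Any:
    """Return the first value found: host yaml, then role yaml, then default."""
    for layer in (host_data, role_data):
        if key in layer:
            return layer[key]
    return default


def should_page(host_data: Mapping[str, Any],
                role_data: Mapping[str, Any]) -> bool:
    """Page only if the key resolves to a truthy value; absent key means no page."""
    return bool(lookup(KEY, host_data, role_data))


if __name__ == "__main__":
    # Assumed example data: an empty host file and a host file carrying the key.
    print(should_page({}, {}))
    print(should_page({KEY: True}, {}))
```

With this shape, "no line in db1098.yaml" corresponds to an empty host mapping, and a single role-level key would cover every host in the role without per-host entries.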
[07:56:23] DBA, Collaboration-Team-Triage, Growth-Team, MediaWiki-extensions-PageCuration, Schema-change: Drop ptrl_comment in production - https://phabricator.wikimedia.org/T157762 (Marostegui)
[08:04:37] DBA, Patch-For-Review, Schema-change: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 (Marostegui)
[08:04:42] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (Marostegui)
[09:08:30] I have modified the grafana stats: https://grafana.wikimedia.org/dashboard/db/mysql?panelId=13&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1092&var-port=9104
[09:09:07] I now show the hit ratio for the buffer pool both excluding and including out of band writes
[09:09:28] it should be very similar except on low bandwidth hosts and on buffer pool load
[12:28:49] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for wikimediawiki - https://phabricator.wikimedia.org/T201001 (Urbanecm)
[12:44:50] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for wikimaniawiki - https://phabricator.wikimedia.org/T201001 (Urbanecm)
[12:52:27] DBA, Data-Services, User-Urbanecm: Prepare and check storage layer for wikimaniawiki - https://phabricator.wikimedia.org/T201001 (Marostegui) p:Triage>Normal I have sanitized this wiki and my new user was created correctly with all the sanitization done. It is ready for #cloud-services-team t...
[12:58:07] DBA, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for id_internalwikimedia - https://phabricator.wikimedia.org/T196748 (Marostegui) Open>Resolved a:Marostegui This wiki was created in production: ``` root@db1075.eqiad.wmnet[id_internalwikimedia]> show databases like 'i...
[13:03:26] DBA, Data-Services, Chinese-Sites, User-Urbanecm, cloud-services-team (Kanban): Prepare and check storage layer for zhwikiversity - https://phabricator.wikimedia.org/T199599 (Marostegui) p:Triage>Normal I have sanitized this wiki and my new user was created correctly with all the sani...
[13:04:40] DBA, Data-Services, User-Urbanecm: Prepare and check storage layer for satwiki - https://phabricator.wikimedia.org/T198401 (Marostegui) I have sanitized this wiki and my new user was created correctly with all the sanitization done. It is ready for #cloud-services-team to create the views.
[13:33:47] wait, but isn't 449711 the opposite of what you told me this morning?
[13:34:04] ?
[13:34:51] I even showed you the pseudo code! :p
[13:34:55] Maybe I didn't explain myself right
[13:34:58] Can I write over your patch, if you have it locally (just for showing purposes)
[13:35:14] so you can revert immediately
[13:35:31] Revert what?
[13:35:39] revery to the current state
[13:35:50] *revert to the current patch number
[13:35:58] I want to show you something, but it is not correct
[13:36:39] I think I am not understanding what you mean :)
[13:36:44] You want to write to that patch?
[13:36:57] no, just write over yours
[13:37:02] you have it locally, right?
[13:37:08] yes
[13:37:12] well, and if not, you can download it again
[13:37:26] I want to "break it" so I can ask you a question :-)
[13:38:23] https://puppet-compiler.wmflabs.org/compiler03/11972/
[13:39:04] This looks good (obviously depends on the "auto")
[13:40:22] sorry, I broke it, but I hope you get the idea
[13:41:02] I deleted the other file, that is still needed
[13:41:13] my question is, I thought that is what you wanted
[13:41:38] OR if using hiera, using just 1 parameter, not one per instance
[13:41:49] one for the whole role
[13:43:02] independently of that, I would call the parameter "replication_is_critical" to not confuse it with the other checks (disk, process)
[13:44:06] or do you want to configure it per dc for now, until auto is in production?
[13:44:25] (which I can see, but I would still do only 1 parameter per role)
[13:46:54] What I uploaded is the approach I had in mind with all the complexity it has under it
[13:47:15] so, the draft I sent is what I understood you wanted to do
[13:47:28] (with some extra breakage)
[13:47:33] No, maybe I explained myself incorrectly :)
[13:47:51] But my approach was the other one, which also gives flexibility on a per instance basis for the future
[13:48:21] but do we need per instance? and even if we did, why not making per instance default to auto?
[13:48:53] cause then misc hosts will probably page
[13:49:04] ?
[13:49:12] misc multi instance
[13:49:16] you are editing core_multiinstance
[13:49:18] as they are on eqiad
[13:49:21] not instance
[13:49:45] didn't you set instance is_critical default to false?
[13:49:59] yep
[13:50:11] I don't know, that is the approach I had
[13:50:12] so it is the current state- all default to 0
[13:50:15] I can think of other things
[13:50:35] but it is right, I can see the hiera, but not the per-instance hiera
[13:51:00] is something in core not going to page?
[13:51:11] e.g.
core_test_multiinstance should be set to false
[13:51:35] or maybe you were thinking of misc, in which case I can see it
[13:51:58] Right now something in core alert, but who knows if in the future we need it or if we expand it to misc
[13:52:06] something/everything
[13:52:18] ok, I can see that, but you didn't edit misc_multiinstance
[13:52:51] I didn't do that for now, no, as I want to reduce the scope and make sure we get the core servers with alerting only so far
[13:53:15] can you reupload your patch again?
[13:53:16] anyways, I can change to your approach
[13:53:26] and get rid of the hiera-per-instance key
[13:53:27] I just uploaded so you understood me
[13:53:41] so the hiera thing I am ok with it
[13:53:54] but I wouldn't put per-instance keys
[13:54:03] would set just one at most per role
[13:54:38] based on the fact that right now, we do not plan to change it per-instance case, for core hosts
[13:55:06] I am resending my patch and will amend
[13:55:09] later with your idea
[13:55:20] so you can basically keep core_multiinstance as is
[13:55:32] but add only 1 hiera key on the role
[13:55:44] (or 8 keys, just not 1 per server)
[13:57:15] aka modifying just hieradata/role/common/mariadb/core_multiinstance.yaml and putting there the 8 keys
[13:57:52] or not add anything and just do is_critical => auto
[13:57:55] as you suggested
[13:58:08] hiera is ok to me
[13:58:13] I think it is even better
[13:58:32] I just don't want on the hosts properties that are really due to the role
[13:59:13] I think it is cool it is configurable, I hadn't thought about it
[15:29:37] DBA, Operations, ops-codfw: db2061 disk with predictive failure - https://phabricator.wikimedia.org/T200059 (Papaul) a:Papaul>Marostegui @Marostegui disk replacement complete
[15:32:06] DBA, Operations, ops-codfw: db2061 disk with predictive failure - https://phabricator.wikimedia.org/T200059 (Marostegui) Thanks!
```
physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS, 600 GB, Rebuilding)
```
[15:32:50] DBA, Operations, decommission, ops-codfw: db2064 crashed and totally broken - decommission it - https://phabricator.wikimedia.org/T195228 (Papaul) @robh we do have a 12 disks decom on site. (db2013)
[15:51:31] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (Marostegui)
[15:51:39] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Schema-change: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 (Marostegui)
[15:51:44] DBA, Patch-For-Review, Schema-change: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 (Marostegui)
[16:19:36] DBA, Operations, monitoring: HAproxy on dbproxy hosts lack enough logging - https://phabricator.wikimedia.org/T201021 (jcrespo)
[19:11:28] I'm seeing a lot of errors like this: Error: 1290 The MariaDB server is running with the --read-only option so it cannot execute this statement (10.192.32.8)
[19:12:08] These are coming from job queue exclusively it seems
[19:12:25] JobQueueError from line 828 of /srv/mediawiki/php-1.32.0-wmf.15/includes/jobqueue/JobQueueDB.php: Wikimedia\Rdbms\DBQueryError: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?
[19:13:01] https://logstash.wikimedia.org/goto/d53c9f8625b96c9107ba83dcc72c97ac
[19:15:40] ah these are all coming from labtestweb2001
[19:28:35] DBA, Wikimedia-log-errors: db2037 is read only? - https://phabricator.wikimedia.org/T201082 (mmodell)
[19:32:23] also seeing a few Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.48.25)
[19:50:57] DBA, Wikimedia-log-errors: db2037 is read only?
- https://phabricator.wikimedia.org/T201082 (Marostegui) db2037 is read-only because it is m5 codfw master, and as codfw is the passive DC, nothing should write to it. Whatever is trying to write to it, instead of using m5-master (db1073) probably needs to...
[19:55:28] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install dbproxy101[2-7].eqiad.wmnet - https://phabricator.wikimedia.org/T196690 (Cmjohnson)
[19:56:08] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install dbproxy101[2-7].eqiad.wmnet - https://phabricator.wikimedia.org/T196690 (Cmjohnson) a:Cmjohnson>RobH assigning to @robh to help complete the installation.
[21:20:57] DBA, JADE, Operations, Scoring-platform-team, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (Halfak) @Marostegui, essentially, we need JADE things to [be wiki pages](https://www.mediawiki.org/wiki/Everything_...
[23:20:14] DBA, Data-Services, User-Urbanecm, cloud-services-team (Kanban): Prepare and check storage layer for satwiki - https://phabricator.wikimedia.org/T198401 (bd808)
[23:45:05] DBA, Data-Services, Chinese-Sites, User-Urbanecm, cloud-services-team (Kanban): Prepare and check storage layer for zhwikiversity - https://phabricator.wikimedia.org/T199599 (bd808) I have run `sudo /usr/local/sbin/maintain-replica-indexes --debug --database zhwikiversity` on all 3 wiki repli...