[06:13:38] DBA: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2704823 (Marostegui) db2058 is finished ``` root@neodymium:~# mysql -hdb2058.codfw.wmnet commonswiki -e "show create table revision\G" *************************** 1. row *************************** Table: revision Create Tabl...
[06:58:36] reserve some time today for T147302
[06:58:37] T147302: Prepare and check production and labs-side filtering for olowiki - https://phabricator.wikimedia.org/T147302
[06:58:58] jynus: Yeah, I guess after lunch as chase will be online?
[06:59:05] ok
[06:59:17] (if we want to wait for him)
[07:04:11] jynus: we have the tech ops meeting at 7pm today, btw
[07:22:56] DBA: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837#2704892 (Marostegui) I have finished cleaning up the tables in S4 now. Before dropping them I have searched in the binlogs to see if another `DELETE from` was issued from the master as...
[07:30:41] Interesting
[07:30:58] I have found something that might explain why S2 broke replication
[07:31:03] With hitcounter table
[07:34:49] what?
[07:35:59] So I was going to start working on S5 to get rid of them
[07:36:02] And I went to the master
[07:36:33] And did a select count which issued a DELETE from downstream
[07:36:50] Memory tables… <3
[07:37:59] I was tailing the dbstore1002 log while doing the count on the master, and voilà, it was there
[07:38:58] So clearly, for this case, the drop table needs to be done first on the master to avoid this
[08:06:04] mystery solved
[08:06:44] yep
[08:07:20] I am reading Oracle's documentation as mariadb's one is quite poor for the HEAP engine and yeah, it says that the first time a memory table is used on the master, it uses a DELETE from if it is empty
[08:07:24] to ensure consistency
[08:07:27] lovely
[08:09:25] yeah I remember we found out about it back in April IIRC :-P
[08:10:34] I am tempted to file a bug report for that to be honest, breaking replication with that is a bit silly
[08:10:42] Actually the whole idea of a memory table…is :)
[08:11:29] Not the replication part itself (which is fine) but issuing a DELETE from just like that
[08:14:18] DBA: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837#2704971 (Marostegui) While getting ready to drop the table from S5 master I issued a `SELECT count` from it and I saw that after it finished (0 rows) it issued a DELETE from in the bin...
[08:17:11] jynus: is there a ticket that you could link to regarding the larger issue described in https://phabricator.wikimedia.org/T145412#2704922 ?
[08:17:23] I can't quite grasp what you mean!
[08:19:11] marostegui: looks like the idea behind that is that if a master gets rebooted the slaves will have inconsistent data, ofc it would be nicer to do a drop table if exists; create table probably, but maybe was not done because it's not transactional?
[08:19:19] * volans just thinking out loud
[08:21:49] volans: Yeah, that is the logic behind that. But you can issue that right at the start of the master, instead of waiting for the first time you use the table
[08:22:09] To me it makes no sense that a select can trigger a delete
[08:24:00] yeah, also the slaves could get selects in the meantime and serve obsolete data
[08:25:37] by definition you shouldn't rely on your data if it is stored on a memory table, but from there to issue a delete just like that...
[08:26:29] totally agree
[08:29:57] dbstore1001 is complaining (warning) about free disk, and I was wondering if /tmp/db1064.tar.gz.enc can be deleted (238G) - that was used to reimage db1064 (the first server I reimaged)
[08:30:22] marostegui, yes
[08:30:32] in fact, I think you created that file?
[08:30:38] Yes, I did
[08:30:53] I will get rid of it now
[08:45:03] DBA: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#2197786 (Marostegui) While checking S5 `dewiki` for another issue I found that the revision table has: ``` PRIMARY KEY (`rev_page`,`rev_id`), UNIQUE KEY `rev_id` (`rev_id`), ``` At l...
[09:35:04] marostegui, any ideas on how to call the 10.1 package?
[09:35:22] jynus: You want to include the wmf tag too?
[09:35:48] yes, internal use only - I intend to add the systemd unit with puppet only
[09:35:55] Ah, what you said yesterday
[09:36:21] Well, wmf-mariadb101e from experimental?
[09:37:08] no, no, let's create the final name now
[09:37:13] Ah
[09:37:32] problem is we can use the wmf-mariadb101 and be consistent or wmf-mariadb-10.1 and be correct
[09:37:58] I prefer consistency at this point :)
[09:38:36] so wmf-mysql57 too?
[09:38:47] That would be nice :)
[09:42:28] DBA, Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2705073 (Marostegui) I have been trying https://github.com/giacomolozito/ibdata-shrinker for defragmenting in a small instance. From the initial tests: it looks good and no data was lost. More than the shrin...
[09:45:45] did you check replication?
[09:45:57] maybe that is lost on defragmenting?
[09:46:16] That is exactly what I was doing now with my instance, setting up another one
[10:10:05] marostegui, I have a task for you
[10:10:22] sure
[10:10:31] I have uploaded to dbstore2001
[10:10:40] 3 files in my home
[10:10:55] I want you to break the system using those
[10:11:03] :-)
[10:11:23] of course
[10:11:25] I can do that
[10:11:29] I like breaking things
[10:11:30] :)
[10:11:35] in particular check
[10:11:44] missing or not found dependencies
[10:11:48] tls library
[10:12:05] sure
[10:12:05] in general, them working as intended
[10:12:09] there is one gotcha
[10:12:36] I have not added a systemd unit, so you will have to run mysqld_safe for now from the basedir
[10:12:44] I will work now on systemd on puppet
[10:13:19] I think files will break if there is a 5.7 -> 10/10.1 migration
[10:13:22] (InnoDB)
[10:13:33] but again, feel free to play
[10:13:42] not only for dbstore, thinking about labs, too
[10:14:01] sure, I will do that and get back to you
[10:14:18] hopefully that will help you for the dbstore setup
[10:14:21] too
[10:14:23] sure
[10:14:45] it will help to advance on that
[10:54:39] I have to do a 3-way merge among the mysql and mariadb options and our current mysqld_safe options: https://gerrit.wikimedia.org/r/315228
[11:06:36] What is that for?
[11:09:02] jynus: I am going to get some lunch, but do you want to hangouts later to fill out the etherpad for the meeting after lunch together?
[11:09:08] sure
[11:09:13] see you later
[11:09:32] \o/
[11:11:23] DBA: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2705241 (Marostegui) db2051 finished, so codfw is now consistent (apart from the special slaves) ``` MariaDB MARIADB db2051.codfw.wmnet commonswiki > show create table revision\G *************************** 1. row ******************...
[12:35:48] jynus: I have written some stuff already on the etherpad, let me know when you are around so we can review it and hangout
[12:35:57] I am here
[12:36:01] Ah o/
[12:58:22] marostegui, jynus: I'll review the systemd changes, but currently fairly busy, probably only by Friday
[12:58:36] moritzm, do not worry for now
[12:58:57] forget it unless I explicitly mention you
[12:59:12] it turns out the unit works well
[12:59:27] but we may not be able to use it on mariadb10/5.6
[12:59:58] ok
[13:46:24] jynus: marostegui good morning gents still planning on https://phabricator.wikimedia.org/T147302 today ?
[13:47:32] chasemp: yep, we are on a hangout, we will do it as soon as we are done
[13:47:59] I would like to looky-loo whatever form that takes if you still don't mind
[13:48:02] thanks guys
[13:48:42] sure, we will ping you
[14:10:35] chasemp, we are going to start with olowiki
[14:10:45] but we are going to start with the production side
[14:10:50] not sure if interesting for you
[14:11:02] jynus: ok I'm about, how should I observe / participate?
[14:24:50] DBA, Labs, Labs-Infrastructure, Operations: Prepare and check production and labs-side filtering for olowiki - https://phabricator.wikimedia.org/T147302#2705670 (chasemp) note: we should update this contact info https://wikitech.wikimedia.org/wiki/Add_a_wiki#Start
[15:31:37] DBA, Labs, Labs-Infrastructure, Operations: Prepare and check production and labs-side filtering for olowiki - https://phabricator.wikimedia.org/T147302#2705900 (Marostegui) We have added olowiki filtering. We executed the following command ``` root@neodymium:/home/jynus/software/redactatron/scr...
[16:06:57] DBA, Operations, ops-eqiad: Physically move db1053 to a different rack - https://phabricator.wikimedia.org/T147774#2703076 (Cmjohnson) @Marostegui I can move this server to A2. Give me the go ahead once you have powered off and it's safe to move.
[16:08:24] DBA, Operations, ops-eqiad: Physically move db1053 to a different rack - https://phabricator.wikimedia.org/T147774#2705987 (Marostegui) Thanks Chris - I have been told I have to update DNS with the new IP, is that something you can give me beforehand or it will just dhcp it? Also, given that tomorro...
[16:09:36] DBA, Operations, ops-eqiad: Physically move db1053 to a different rack - https://phabricator.wikimedia.org/T147774#2705989 (Cmjohnson) @Marostegui I will fix the dns for you once it's moved to row A.
[16:10:12] DBA, Operations, ops-eqiad: Physically move db1053 to a different rack - https://phabricator.wikimedia.org/T147774#2705990 (Marostegui) @Cmjohnson Excellent! Thanks! So let's wait till Thursday then
[16:12:27] DBA, Labs, Labs-Infrastructure, Operations: Prepare and check production and labs-side filtering for olowiki - https://phabricator.wikimedia.org/T147302#2706000 (jcrespo) a:jcrespo>chasemp Nominatively assigning to Chase, as the production part is done, but of course feel free to reassign...
[16:28:43] DBA, Operations, ops-codfw, Patch-For-Review: es2015 crashed with no logs - https://phabricator.wikimedia.org/T147769#2702976 (RobH) Seems the Dell tech is asking Papaul for hardware logs: Syslog shows nothing for the hard crash: Oct 10 03:29:34 es2015 puppet-agent[172665]: Retrieving pluginfac...
[16:35:27] DBA, Operations, ops-codfw, Patch-For-Review: es2015 crashed with no logs - https://phabricator.wikimedia.org/T147769#2706079 (jcrespo) From the IDRAC 8 web console: ``` Log: Normal Mon Feb 08 2016 16:08:44 Log cleared. Critical Mon Oct 10 2016 03:52:20 CPU 1 has an internal error (IERR). Lifec...
[16:37:49] DBA, Operations, ops-codfw, Patch-For-Review: es2015 crashed with no os logs (kernel logs or other software ones) - it shuddenly went down - https://phabricator.wikimedia.org/T147769#2706093 (jcrespo)
[16:50:04] DBA, Operations, ops-codfw, Patch-For-Review: es2015 crashed with no os logs (kernel logs or other software ones) - it shuddenly went down - https://phabricator.wikimedia.org/T147769#2706131 (Papaul) Enterprise Service Request Hello Papaun, Thank you for contacting Dell! This issue has bee...
[16:50:43] DBA, Operations, ops-codfw, Patch-For-Review: es2015 crashed with no os logs (kernel logs or other software ones) - it shuddenly went down - https://phabricator.wikimedia.org/T147769#2706134 (Papaul) BIOS: 2.2.5 http://downloads.dell.com/FOLDER03917193M/1/BIOS_PFWCY_WN32_2.2.5.EXE iDRAC-L...
[17:24:38] DBA, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, User-Ladsgroup: Ensure ORES data violating constraints do not affect production - https://phabricator.wikimedia.org/T145356#2706309 (Ladsgroup) ``` ladsgroup@terbium:~$ mwscript extensions/ORES/maintenance/CleanDuplicateScores.php --w...
[17:31:52] DBA, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, User-Ladsgroup: Ensure ORES data violating constraints do not affect production - https://phabricator.wikimedia.org/T145356#2706349 (Ladsgroup) @jcrespo: I ran the maintenance script and then this: {P4196} Is it okay now? Thanks
[17:44:54] DBA, Operations, ops-codfw, Patch-For-Review: es2015 crashed with no os logs (kernel logs or other software ones) - it shuddenly went down - https://phabricator.wikimedia.org/T147769#2706448 (Papaul) Today october 11th I call Dell Support for this issue. Call time 10:52 am call duration = 54 m...
[18:16:03] DBA, Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, MediaWiki-Database, WorkType-NewFunctionality: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255#2706641 (Huji)
[18:50:55] DBA, Operations, ops-codfw, Patch-For-Review: es2015 crashed with no os logs (kernel logs or other software ones) - it shuddenly went down - https://phabricator.wikimedia.org/T147769#2706765 (RobH) a:jcrespo>Cmjohnson Chris, I'll escalate this to our account team, but can you dispatch ov...
[19:31:13] DBA, Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, MediaWiki-Database, WorkType-NewFunctionality: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255#2707037 (RobLa-WMF) Thanks @Huji for noting the SecurePoll strict mode problems (T147875)....
[20:39:44] * AaronSchulz wonders why there are so many UPDATEs on s2 vs s1
[23:52:03] DBA, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review, and 2 others: hidenondamaging=1 query is extremely slow on enwiki - https://phabricator.wikimedia.org/T146111#2707798 (Halfak) Open>Resolved a:Halfak
[23:52:07] DBA, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, User-Ladsgroup: Ensure ORES data violating constraints do not affect production - https://phabricator.wikimedia.org/T145356#2707802 (Halfak)