[07:54:16] DBA: db1117 went away - https://phabricator.wikimedia.org/T208150 (jcrespo) mysql-prometheus-exporter should not run on a multi-instance host; there is mysql-prometheus@m1, @m2 ... That is specified in puppet.
[08:05:44] DBA: db1117 went away - https://phabricator.wikimedia.org/T208150 (Banyek) That was what confused me, because after the host came back, there was this error on icinga, which I don't remember seeing in the past, so I tried running puppet to see if that would fix everything, but it didn't
[08:09:07] DBA, MediaWiki-Logging: Evaluate the need for FORCE INDEX (ls_field_val) [now IGNORE INDEX (ls_log_id)], delete the index hint if not needed anymore - https://phabricator.wikimedia.org/T164382 (jcrespo) p:Triage>Low
[08:11:53] DBA, Operations, monitoring: Create a script to regenerate prometheus mysqld exporter listing that works with puppetdb - https://phabricator.wikimedia.org/T145072 (jcrespo) I think this should be moved to zarcillo mariadb metadata database, and centralize there the active database control (substituti...
[08:33:29] DBA, Operations, ops-eqiad: db1117 went away - https://phabricator.wikimedia.org/T208150 (jcrespo) so the error happens because it is being run manually, which is not a big deal if it errors out; just delete any file you may have added. I ran `systemctl disable
DBA, Operations, ops-eqiad: db1117 went away - https://phabricator.wikimedia.org/T208150 (jcrespo)
[08:52:42] DBA, Operations, ops-eqiad: db1117 went away - https://phabricator.wikimedia.org/T208150 (Banyek) What CLI command did you use to get those logs?
[09:08:17] DBA, CheckUser, Patch-For-Review: The "show ip" action should also provide a distinct list of user-agents for each IP - https://phabricator.wikimedia.org/T170508 (jcrespo) Please note that while I have been asking you to wait, I am genuinely concerned about the performance of the query- even if only...
[09:36:42] DBA, Research, Patch-For-Review: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (jcrespo) We don't share passwords publicly, and you shouldn't need it to actually use it; you should create puppet code that reads it and writes it to a config fi...
[09:56:05] DBA, MediaWiki-API, MediaWiki-Database: prop=revisions API timing out for a specific user and pages they edited - https://phabricator.wikimedia.org/T197486 (jcrespo) https://downloads.mariadb.org/ 10.1.37 not yet considered stable at the time of writing this. While we could deploy something from the...
[14:35:45] Now I am running redact_sanitarium.sh on db2094
[14:39:31] ```/usr/local/sbin/redact_sanitarium.sh -S /run/mysqld/mysqld.s5.sock -d enwikivoyage```
[14:39:40] (and the other databases as well)
[14:52:13] DBA, Operations, ops-eqiad: db1117 went away - https://phabricator.wikimedia.org/T208150 (Cmjohnson) @jcrespo @banyek It is not clearly a RAM issue; it could be a CPU issue as well on CPU1. I will need to do a few things to the server... Swap the supposedly bad DIMM to B side and see if the error fo...
[14:52:20] DBA, Operations, ops-eqiad: db1117 went away - https://phabricator.wikimedia.org/T208150 (Cmjohnson) idrac log ------------------------------------------------------------------------------- Record: 2 Date/Time: 10/27/2018 21:08:57 Source: system Severity: Non-Critical Description: C...
[14:53:42] DBA, Operations, ops-eqiad: db1117 went away - https://phabricator.wikimedia.org/T208150 (Banyek) Sure! Can I shut down the server now for you?
[14:55:47] jynus: after running the redact_sanitarium script, the check_private_data script still says there is private data there
[14:56:33] I am now cleaning up the tables using the UPDATE statements it wrote out, and then I will recheck for private data and see whether the triggers etc. are in place
[14:57:48] Banyek go ahead and shut it down
[14:58:07] ok!
[15:00:06] let me check the logs of db2094
[15:03:52] cmjohnson1: the server is shut down, you can work on it
[15:04:13] okay..great!
[15:04:17] give me about 5 mins
[15:04:31] take all the time you need
[15:10:59] it's powering on banyek
[15:11:09] thanks
[15:11:21] and what's next?
[15:11:30] do we leave it running for a while?
[15:12:03] yes, let's leave it running to see if the errors return, follow or stay the same
[15:12:26] ok, cool, thanks again!
[15:13:04] DBA, Operations, ops-eqiad: db1117 went away - https://phabricator.wikimedia.org/T208150 (jcrespo) > It is not clearly a RAM issue Sorry, my comment was in the context of "it is not a software issue" and "it expresses/reveals itself as a memory error", so we could discard MySQL issues, as it is what...
[15:14:37] banyek: could you upgrade the server before starting the services back up again?
[15:15:01] (kernel, security packages, mariadb)
[15:15:03] that was exactly my plan!
[15:15:06] thanks
[15:15:58] banyek: I don't see the triggers created on db2094, talk to me when you are done
[15:16:08] (probably the execution of the sanitization script failed)
[15:44:05] maintenance on db1117 is finished
[15:44:16] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (Papaul)
[15:45:44] thanks, did you see my comments about db2094?
[15:48:21] Yes!
[15:49:35] I agree that's the issue; I am now running the commands that were in the output of the check scripts, which clean up the database, and I'll take a look at the redact script afterwards
[15:50:19] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (jcrespo) @Papaul: @Banyek will be your contact point as he will be the person in charge of the related goal while Manuel is out.
[15:52:14] banyek: don't do that
[15:52:20] those commands are not supposed to be run
[15:52:38] just make sure the sanitize script works as intended or report what the issue with it is
[15:52:43] I have already been running those for a while
[15:52:45] running the UPDATE
[15:52:51] will just be wasted time
[15:52:56] as it will not create the triggers
[15:53:10] and a different update will be run anyway
[15:53:22] please follow the instructions manuel gave you
[16:01:15] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (Papaul)
[16:09:21] DBA, Operations, ops-codfw, Patch-For-Review, User-Banyek: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (Banyek)
[16:51:50] DBA, User-Banyek: Reimage pc2006 with stretch - https://phabricator.wikimedia.org/T207934 (jcrespo) We should consider declining that and do the work directly on the new hardware: T207259
[17:24:02] DBA, MediaWiki-Special-pages, Datacenter-Switchover-2018: Significant (17x) increase in time spent by updateSpecialPages.php script since datacenter switch over updating commons special pages - https://phabricator.wikimedia.org/T206592 (jcrespo) Please give us the times on eqiad again so we can veri...
[22:14:54] DBA, Operations, ops-codfw, Patch-For-Review, User-Banyek: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (Papaul)