[00:38:16] 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Operations, 10Performance-Team, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3508603 (10Krinkle)
[00:40:15] 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Operations, 10Performance-Team, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3224459 (10Krinkle)
[07:32:03] 10DBA, 10Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3508858 (10jcrespo) > for which wikis would that be, and is there a task for that / for the decision? There is not yet a specific task for that- probably it will be created as a subtask of this o...
[07:43:31] Hey, I'm running a maintenance script, right now the rate is https://grafana.wikimedia.org/dashboard/db/mysql?panelId=2&fullscreen&orgId=1&from=now-1h&to=now&var-dc=eqiad%20prometheus%2Fops&var-server=db1063&var-port=9104, do you want me to slow down?
[07:44:28] is that s5?
[07:45:04] Amir1: ?
[07:45:14] yup
[07:45:15] wikidatawiki
[07:45:26] how long?
[07:45:38] for the next five minutes
[07:45:46] or less
[07:45:58] ok, if at some point you run it for over an hour
[07:46:19] put it on the Deployments page in the "Week of:" section
[07:46:26] Sure
[07:46:36] that is what we look at before doing schema changes
[07:46:40] and where we put those
[07:46:53] Amir1: the best way to see if it affects production
[07:46:58] is looking at:
[07:47:06] https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag?panelId=5&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops
[07:47:12] 1s of lag is ok
[07:47:22] over that is problematic
[07:47:23] I have both tabs open :)
[07:47:30] Okay, noted, thanks!
[07:49:55] 10DBA, 10Wikidata: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460#3508860 (10Ladsgroup) Properties are done now, since the number was small, I thought let's run it with "--deduplicate-terms" flag but it caused the terms in Wikidata to disappear temporarily (as was...
[08:03:58] 10DBA, 10Wikidata: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460#3508867 (10Ladsgroup) Since there are 854M rows in wb_terms right now, my estimation is that it will take 64 days.
[08:04:57] jynus: It'd be nicer to do it in one pass, but it's not a big problem
[08:05:10] Reedy: what?
[08:05:16] the unique -> primary
[08:05:22] ah
[08:05:24] TBH, I think if we do it in two patch sets... I can modify the two sql patches from the first...
[08:05:25] as I said
[08:05:28] So if anyone hasn't run update.php yet
[08:05:36] They don't need to do two separate alters
[08:05:37] I don't care :-D
[08:05:45] Anyone that has, has a patch to fix it for them
[08:05:54] I have already done the alters this time :-)
[08:05:58] heh
[08:06:06] Ooorrrr..... we just squash the two patches
[08:06:07] that is up to the hackers
[08:06:35] I think the question
[08:06:45] is if we maintain update.php for small installs or for large ones
[08:07:00] if it is for small ones, merge it
[08:07:11] It's mostly for small
[08:07:17] But I'm sure some larger ones will exist
[08:07:18] that use it
[08:07:27] maybe document it
[08:07:31] in the changelog
[08:07:34] Yeah
[08:07:46] "this will break stuff, if you have a large install, do this instead"
[08:07:51] Like I say, if we keep it as two commits, I can improve the first in the second
[08:07:59] So they can do the add/drop in one db change etc
[08:08:21] Or we just squash.. Knowing the sql changes are fine for WMF
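A minimal sketch of the "add/drop in one db change" idea being weighed above: fold the removal of the old UNIQUE key and the addition of the new PRIMARY KEY into a single ALTER. The database, table, and index names here are invented for illustration; the actual patch is not shown in this log.

    # Hypothetical one-pass schema change for the unique -> primary swap;
    # all names are placeholders, not the real tables from the patch.
    mysql wikidb -e "ALTER TABLE example_table
        DROP INDEX example_unique_key,
        ADD PRIMARY KEY (example_id);"

On engines and versions where each ALTER copies the whole table, combining the clauses avoids a second full rebuild, which is what running the two alters as one change buys large installs.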
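And to make the earlier lag advice concrete (07:47: "1s of lag is ok, over that is problematic"), a sketch of a lag-aware batch loop. This assumes direct access to the mysqld prometheus exporter on port 9104 (the port in the grafana URLs above) and its mysql_slave_status_seconds_behind_master metric; the replica host and the batch body are hypothetical placeholders.

    #!/bin/bash
    # Sketch: throttle a maintenance loop whenever a watched replica is
    # more than 1s behind, per the rule of thumb quoted above.
    REPLICA="db1234.eqiad.wmnet"   # hypothetical replica to watch

    replica_lag() {
        # Scrape the exporter and print seconds_behind_master as an integer.
        curl -s "http://${REPLICA}:9104/metrics" \
            | awk '/^mysql_slave_status_seconds_behind_master/ { print int($NF) }'
    }

    run_batch() {
        # Placeholder for one bounded chunk of the maintenance script;
        # it should return non-zero when there is no more work to do.
        :
    }

    while run_batch; do
        lag="$(replica_lag)"
        while [ "${lag:-0}" -gt 1 ]; do
            sleep 5              # back off until the replica catches up
            lag="$(replica_lag)"
        done
    done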
[08:08:55] Amir1: quick question?
[08:09:05] sure
[08:09:14] "my estimation is that it will take 64 days."
[08:09:25] is the script easy to kill/stop?
[08:09:44] yup
[08:09:53] and it's easy to pick up where it left off
[08:09:55] I am asking because in 64 days
[08:10:06] most likely, the wmf version will change
[08:10:13] and the master will be failed over
[08:10:25] but I did a linear extrapolation; given that there are caches in mysql, I'm guessing it'll be faster
[08:10:38] I do not care how long it takes
[08:10:48] if it is needed, it is needed, that is not a problem
[08:11:04] I am asking so that the config and software version are regularly reloaded
[08:11:17] yeah, you're right
[08:11:42] so something like: puppetize it to run for 1 hour, then stop, then start again with the latest mediawiki version and config
[08:11:56] also if terbium goes down, etc.
[08:11:58] I'm not starting it yet, but I will soon, I want to put it in screen, can you access mine and kill it if needed?
[08:12:31] if it is a 10+ day script I would ask you to puppetize it- it takes very little
[08:13:20] timeout mwscript on a cron or something
[08:13:59] otherwise releng complains that an older version of mw is active
[08:14:06] and I complain that I cannot change db config
[08:15:05] jynus: the problem is that I need to set where it starts every time it gets killed
[08:15:12] mmm
[08:15:19] how do you know where?
[08:15:28] the last output of the script
[08:15:33] ok :-)
[08:15:43] it says "I processed until row foo"
[08:15:43] mwscript > file
[08:16:51] mwscript $(tail -n 1 file | sed 's/I processed until row//')
[08:17:07] :-)
[08:17:25] stress-free for 60 days
[08:17:32] :D
[08:17:33] okay
[08:17:38] I can help
[08:17:55] I need to go for lunch now, I'll be back and build the patch
[08:18:38] if terbium goes down, you will also be thankful for it :-)
[09:27:51] Yeah
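A minimal sketch of the "timeout mwscript on a cron" plus resume-from-log pattern just discussed, filled in with the rebuildTermSqlIndex.php invocation that appears later in this log. The "I processed until row N" message format is quoted from the conversation; the "--from-id" flag is a hypothetical way to feed the position back in, not a confirmed option of the script, and the 3540s bound is the value proposed later at 11:01.

    #!/bin/bash
    # Sketch: bound each run with timeout, append all output to a log,
    # and let the next run resume from the last position printed.
    LOG=/var/log/wikidata/rebuildTermSqlIndex.log

    # Row from the last "I processed until row N" message; empty on the
    # very first run, so it falls back to 0 below.
    LAST=$(tail -n 1 "$LOG" 2>/dev/null | sed 's/.*I processed until row //')

    # "--from-id" is an assumed resume flag, not the script's known API.
    /usr/bin/timeout 3540 /usr/local/bin/mwscript \
        extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php \
        --wiki wikidatawiki --entity-type=item \
        --from-id "${LAST:-0}" >> "$LOG" 2>&1

Run hourly from cron, this never overlaps itself (59 minutes of work per hour) and always starts under the current mediawiki version and config, which is the property jynus is asking for.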
[10:51:49] jynus: https://gerrit.wikimedia.org/r/#/c/370626/
[10:52:54] let me see
[10:55:15] correct me if I am wrong, but that starts a run tomorrow and never stops it?
[10:55:47] and will start another one next week
[10:58:49] jynus: yeah, this needs to be fixed
[10:59:01] what about
[11:01:18] timeout 3540 command + minute => 0, hour => '*', ...) ?
[11:01:40] how often does it report the status?
[11:02:51] or we can put a lock and kill it on every run
[11:04:15] hmm
[11:04:18] good idea
[11:04:20] alternatively, we can avoid cron, and set up a script that runs the command
[11:04:52] there are many ways
[11:06:11] doesn't really matter
[11:06:28] jynus: one thing, how can I get it to run for the first time?
[11:06:53] we can create the log with a 0 at the start?
[11:06:56] nvm
[11:07:02] let me try
[11:07:24] try a manual run with the command
[11:07:36] and then we can puppetize it when you are happy
[11:08:15] jynus: ladsgroup@terbium:~$ /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki wikidatawiki --entity-type=item >>/var/log/wikidata/rebuildTermSqlIndex.log 2>&1
[11:08:16] -bash: /var/log/wikidata/rebuildTermSqlIndex.log: Permission denied
[11:08:40] let me help
[11:08:45] Thanks
[11:09:32] actually, me creating it will not help because you will have different permissions than puppet
[11:10:02] just temporarily write it to /tmp/rebuildTermSqlIndex.log
[11:10:06] it is ok for a test
[11:11:53] do you know how frequently it reports?
[11:17:11] every five or six minutes
[11:17:28] based on what it reported for properties
[11:17:34] let me compile it, not sure about the cron parameters
[11:17:48] I can run a test to see what it does with items
[11:18:12] yeah, run one sample test redirecting to a file on /tmp/...
[11:18:31] and I can use it as a test
[11:19:38] okay
[11:19:42] let me do that
[11:20:07] also to check there is no weirdness with the kills and child processes, etc
[11:21:05] on it
[11:24:16] nothing in the log yet
[11:29:39] jynus: I think it doesn't flush them out until it gets killed :/
[11:30:16] proposed command $(tail -100 rebuildTermSqlIndex.log | grep -E 'Processed up to page [0-9]+' | sed -E 's/Processed up to page //; s/ \(Q.*//' | tail -1)
[11:30:26] use the single-quote version of the command
[11:30:38] and /usr/bin/timeout, full path
[11:31:00] I am going for lunch, check that the log works as intended
[11:31:05] and we can deploy later
[11:31:10] or when you have time
[11:34:28] okay, have fun
[11:34:30] Amir1: sorry for making you work more, this is something that should already be standardized
[11:34:42] but sadly, it isn't
[11:34:46] jynus: nah, you are completely right, it should be done the right way
[11:35:04] yeah, but it should already be "ready to go"
[11:36:09] if you feel more comfortable with changing it in mediawiki, that is also ok
[11:36:28] e.g. adding a maxexecution parameter or anything
[11:36:35] whatever is easier for you
[11:37:55] Thanks
[11:39:12] jynus: the problem with single quotes is that the command is already inside a string
[11:39:18] so puppet fails
[11:39:47] okok
[13:17:14] I have copied the log to /var/log/wikidata/
[13:55:45] 10DBA: Point labsdb1001 and labsdb1003 to db1095 and db1102 - https://phabricator.wikimedia.org/T166546#3509320 (10jcrespo) a:05Marostegui>03jcrespo
[14:30:10] 10DBA, 10Analytics, 10Contributors-Analysis, 10Chinese-Sites, 10Patch-For-Review: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3509444 (10Milimetric) Good news, the 2017-07_private snapshot finished. I will now start the 2017-07 snapshot process, and if there ar...
[15:01:28] 10DBA, 10Cloud-Services: Unable to drop tables on dewiki.labsdb/s51072__dwl_p - https://phabricator.wikimedia.org/T172784#3509557 (10jcrespo) There seems to be corruption on almost all, if not all, Aria tables; it seems to have internally crashed, not allowing even dropping the table. After I delete physically...
[16:44:14] 10DBA, 10Cloud-Services: Unable to drop tables on dewiki.labsdb/s51072__dwl_p - https://phabricator.wikimedia.org/T172784#3509834 (10jcrespo) @Giftpflanze It should work now, Aria had crashed internally, even if MySQL was still up. My advice to avoid this issue in the future is to convert/create the tables to...
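jcrespo's advice in that comment is cut off above; given the resolution later in the log ("added engine statements"), the fix is presumably to move the Aria tables to another engine, most likely InnoDB (an assumption here, since the truncated comment does not name it). A hypothetical illustration for the tool database, with invented table names:

    # Convert an existing tool-db table off Aria, and give new tables an
    # explicit engine so they do not fall back to the Aria default.
    # InnoDB is an assumption; the table names are placeholders.
    mysql s51072__dwl_p -e "ALTER TABLE example_table ENGINE=InnoDB;"
    mysql s51072__dwl_p -e "CREATE TABLE example_new (id INT PRIMARY KEY) ENGINE=InnoDB;"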
[16:45:31] 10DBA, 10Cloud-Services: Unable to drop tables on dewiki.labsdb/s51072__dwl_p - https://phabricator.wikimedia.org/T172784#3509835 (10jcrespo) @marostegui labsdb1003 misbehaved today, I tried to fix it without rebooting, but it ended up crashing and recovering itself to a better state. I have migrated some key...
[17:04:14] 10DBA, 10Patch-For-Review: Point labsdb1001 and labsdb1003 to db1095 and db1102 - https://phabricator.wikimedia.org/T166546#3509901 (10jcrespo) I was going to move s2, and then T172784 happened. Will try tomorrow.
[17:48:32] 10DBA, 10Cloud-Services: Unable to drop tables on dewiki.labsdb/s51072__dwl_p - https://phabricator.wikimedia.org/T172784#3510005 (10Giftpflanze) 05Open>03Resolved I went through my table creation statements and added engine statements. I dropped four tables that I want to recreate and changed the engine o...
[21:07:10] 10DBA, 10Wikidata, 10Patch-For-Review: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460#3510499 (10Ladsgroup)
[21:07:29] 10DBA, 10Wikidata, 10Patch-For-Review: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460#3465512 (10Ladsgroup) {T172776} needs to be resolved before moving on, otherwise this will make a mess.
[21:34:40] 10DBA, 10RESTBase-API, 10Reading List Service, 10Reading Epics (Synchronized Reading Lists), and 4 others: RfC: Reading List service - https://phabricator.wikimedia.org/T164990#3510600 (10GWicke)