[06:09:07] 10DBA, 10MediaWiki-Cache, 10Performance-Team (Radar), 10User-Marostegui: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 (10Marostegui) a:03Marostegui [06:52:27] 10DBA, 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) MySQL has been stopped on dbstore1002 and won't be started again, as this host will be decommissioned [08:07:30] 10DBA, 10MediaWiki-API, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) After the analyze, they (db2038 and db2059) now use the same plan: ` root@db2038.codfw.wmnet[dewiki]> explain SELECT log_id,log_type,log_action,log... [10:15:00] marostegui: ugh re: T217542 ! I think the easiest for now is to stick with 0.10 [10:15:00] T217542: prometheus-mysqld-exporter package 0.11.0 options changed - https://phabricator.wikimedia.org/T217542 [10:15:28] also for some reason I don't see 0.10 in stretch-wikimedia [10:15:33] godog: I guess so, I did an apt full-upgrade on that host [10:18:09] ok, I'm copying 0.10 into stretch-wikimedia, not sure why it wasn't there already [10:18:20] error: unknown short flag '-c' ? [10:18:28] but I don't see that option [10:21:45] yeah looks like it changed between 0.10 and 0.11 [10:22:37] jynus: it changed from -collect to --collect [10:22:54] godog: thanks :) [10:23:00] oh [10:23:11] well, we can fix it with an if on puppet [10:23:49] the collect.info_schema.tables option looks like it also changed [10:23:51] not worried for stretch, but I guess it will be the new syntax on buster [10:24:02] I will test on db1114 [10:24:13] or were you working on it?
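[Editor's note] The breakage discussed above is the flag-syntax change between prometheus-mysqld-exporter 0.10 and 0.11: older versions took single-dash long options (`-collect.foo`), while 0.11 expects double-dash GNU-style options (`--collect.foo`), hence the `unknown short flag '-c'` error. The "if on puppet" jynus mentions boils down to a version switch; a minimal sketch of that idea in Python (the collector names are illustrative, the version comparison is the point):

```python
def exporter_flags(version: str, collectors: list[str]) -> list[str]:
    """Build collector flags for prometheus-mysqld-exporter.

    Up to 0.10 the exporter used single-dash long options (-collect.foo);
    0.11 switched to double-dash (--collect.foo), which is what the
    "unknown short flag" error above is complaining about.
    """
    major, minor = (int(x) for x in version.split(".")[:2])
    prefix = "--" if (major, minor) >= (0, 11) else "-"
    return [f"{prefix}collect.{c}" for c in collectors]
```

In puppet the equivalent conditional would live in the template rendering the exporter's defaults file; this function only demonstrates the switch.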
[10:24:29] sure, feel free to test :) [10:24:51] also, I may take some time and remove any passwords CC godog [10:24:56] I just reported it and created the ticket, and worked around it to get it up for now [10:25:17] should be fine now btw for full-upgrades on stretch [10:25:17] jynus: that'd be awesome! [10:25:28] fine == 0.10 will be preferred [10:25:29] godog: what do you mean? [10:25:41] godog: I see [10:25:46] The following packages will be DOWNGRADED: [10:25:48] did you pin it, because we may fix it soon [10:26:21] no I've uploaded 0.10 also to stretch-wikimedia, which afaics was missing and only on jessie-wikimedia [10:26:47] when the puppetization is in place we can either upload 0.11 to stretch-wikimedia or remove 0.10 from stretch-wikimedia, up to you [10:26:59] ok [10:27:27] godog: confirmed it now works [10:28:27] sweet [10:28:37] marostegui: I started refactoring the backup classes: https://gerrit.wikimedia.org/r/493664 FYI [10:28:59] Ah nice! [10:29:20] so now it should be trivial to add new backup processes [10:29:59] <3 [10:30:42] I also prepared (but not implemented) the separation of archiving and compressing [10:31:10] where archiving is creating a tar for each database and compressing is creating a tarball with everything [10:35:41] With everything meaning every database on a section on a single tar? [10:37:37] everything means whatever was gotten from a single mysql instance [10:37:42] got it [10:39:28] o/ [10:39:37] got another fun wikidata related question [10:39:54] is it possible to know how much disk space our page content (text) currently uses?
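[Editor's note] The archive/compress split jynus describes (one tar per database, then one compressed tarball with everything taken from a single mysql instance) could be sketched like this; the directory layout and file names are hypothetical, not the actual backup-class implementation in the Gerrit change:

```python
import tarfile
from pathlib import Path

def archive_databases(dump_dir: Path) -> list[Path]:
    """Archiving: one uncompressed tar per per-database dump directory."""
    tars = []
    for db_dir in sorted(p for p in dump_dir.iterdir() if p.is_dir()):
        tar_path = dump_dir / f"{db_dir.name}.tar"
        with tarfile.open(tar_path, "w") as tar:
            tar.add(db_dir, arcname=db_dir.name)
        tars.append(tar_path)
    return tars

def compress_instance(dump_dir: Path, out: Path) -> Path:
    """Compressing: one gzipped tarball with everything from the instance."""
    with tarfile.open(out, "w:gz") as tar:
        tar.add(dump_dir, arcname=dump_dir.name)
    return out
```

Keeping the two steps separate means a single database can be restored from its own tar without unpacking the whole instance tarball.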
:) [10:42:58] define space used, because it may have many interpretations [10:43:32] it may mean logical storage taken by data [10:43:44] it may mean storage including inefficiencies due to metadata [10:44:10] it may mean including table indexes (which can be equal or larger than data size) [10:44:45] it may mean disk actually used, even if most of it is logically empty [10:47:37] in particular, table size info was removed from prometheus for 2 reasons: private data concerns (especially on wikis with low activity) and performance (mostly on s3, with hundreds of thousands of files) [10:48:27] there are plans to include that info on the tendril replacement under NDA, but it is WIP [10:48:35] there may be some things on the current tendril [10:50:11] the problem is they are slow and not properly cached [10:50:55] so, less about the table size (I guess that would be revisions table), more about the text field itself, and actual disk used [10:51:22] oh, so you want to get some stats about the table? [10:51:38] so, I would suggest getting a non-production host [10:52:03] and run https://dev.mysql.com/doc/refman/5.7/en/procedure-analyse.html [10:52:24] it may be a bit intense on a host with many queries [10:52:30] but the text isn't actually in the table right? or will it still show up in the regular mysql state for a table? [10:52:38] text? [10:52:44] oh, you mean like wikicode? [10:53:05] sorry, but I may not be understanding you [10:54:17] tell me what you want to do and I will suggest the best way to do it, if I can [10:57:42] This tells you the average row size, for example (InnoDB stats are not real time, but usually good enough): https://phabricator.wikimedia.org/P8145 [10:57:58] note this is ok to show for enwiki publicly, but not for smaller wikis [11:02:02] jynus: meeting?
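[Editor's note] The interpretations jynus lists map roughly onto columns of `information_schema.tables`; a sketch follows (column meanings per the MySQL docs; the `%s` placeholders assume a DB-API driver, and InnoDB's figures are periodic estimates, not real-time). Note also that the linked `PROCEDURE ANALYSE()` applies to MySQL 5.7; it was removed in 8.0.

```python
# Columns behind the "space used" interpretations discussed above:
SIZE_QUERY = """
SELECT table_rows,      -- estimated number of rows
       avg_row_length,  -- roughly data_length / table_rows
       data_length,     -- the clustered index, i.e. the data itself
       index_length,    -- secondary indexes (can equal or exceed data_length)
       data_free        -- allocated but logically empty space
FROM   information_schema.tables
WHERE  table_schema = %s AND table_name = %s
"""

def size_breakdown(data_length: int, index_length: int, data_free: int) -> dict:
    """Map one row of SIZE_QUERY onto the different meanings of 'space used'."""
    return {
        "data_only": data_length,
        "data_plus_indexes": data_length + index_length,
        "allocated_on_disk": data_length + index_length + data_free,
    }
```

Even `allocated_on_disk` can undercount the real footprint, since file-level overhead outside the tablespace is invisible to `information_schema`.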
[11:17:59] addshore: https://tendril.wikimedia.org/report/schemas?schema=wik [14:20:28] strange, I can see 0.11 on buster, but I cannot find it on our repo's pool [14:24:20] oh, my fault, I was looking at the wikimedia repo, not the debian one [14:31:39] marostegui: you may have missed Reading Infrastructure team for reading lists, which also live on x1 [14:32:17] ah, thanks! [14:32:19] amending! [14:41:23] FYI I am "breaking" db1114 so I can check its 0.11 upgrade [14:41:33] cool! [14:41:50] lots of new options other than the breakage [15:00:46] jynus: I would like to upgrade db1095 (backups) tomorrow morning to 10.1.38 [15:00:47] 10DBA, 10AbuseFilter, 10User-Daimona: Slow queries on abuse_filter_log using afl_action or afl_actions - https://phabricator.wikimedia.org/T217481 (10Daimona) [15:01:13] I am upgrading s2 to 10.1.38 in preparation for the master running 10.1.38 [15:01:21] (I upgraded a bunch of hosts to 10.1.38 on codfw today) [15:01:26] thanks [15:01:34] also dbstore2001, can I do that too? [15:03:36] sure [15:04:20] great, thanks! :) [15:15:26] 10DBA, 10MediaWiki-API, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Anomie) Unfortunately that made db2038 start using `times` rather than `actor_time` for the second query. I was hoping it would go the other way, because we wa... [15:19:11] 10DBA, 10MediaWiki-API, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) And that ^ also happens with 10.1.38, so I guess it is once again one of those "features" of the optimizer.
[15:19:32] 10DBA, 10Operations, 10User-fgiunchedi: Upgrade mysqld_exporter in production - https://phabricator.wikimedia.org/T161296 (10jcrespo) a:03jcrespo [15:29:55] 10DBA, 10MediaWiki-API, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) It is unbelievable the optimizer chooses `times`: ` root@db2038.codfw.wmnet[dewiki]> show explain for 3580770; +------+-------------+---------+----... [15:39:31] marostegui / jynus suddenly global renames take a whole lot of time to complete [15:39:43] we have one simple one running for half an hour already [15:39:50] any issues? [15:43:38] hauskatze: when did that start? [15:43:48] 15:09 [15:43:56] still running a.t.m [15:43:58] 15:09 UTC? [15:44:04] https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress?username=MJL [15:44:06] yup [15:44:18] 16:09 Madrid time :) [15:44:46] I'm not sure if that'd be the batch queries we introduced in renameuser last week?
[15:46:12] So in db-land everything is fine at the moment, we have changed some indexes on the logging table on 3 hosts (1 enwiki, 1 commons, 1 s5), but that is probably unrelated [15:46:38] Those hosts are not even showing up as slow queries [15:47:28] hauskatze: worth asking tgr if that change can make the renames slower [15:49:21] yeah, I can see it progressing however [15:49:31] so it may have failed and restarted some time later [15:49:45] or just went slower because of the change [15:49:57] it could also be wikidata, which is not that small for some users [15:50:28] I would suggest using the ticket with the issues you reported to ask about that [15:50:53] otherwise it may be slower unintentionally, and he can check the logs for you [15:51:51] jynus: I did an eswiki one for a newly registered user with very few edits and it took some minutes, so renames are being slower [15:52:10] I mean, if this is expected and lets the DBs work better I have no concerns [15:52:29] I filed T217561 [15:52:30] T217561: Global renames are very slow - https://phabricator.wikimedia.org/T217561 [15:52:42] * hauskatze snack time [15:53:10] enjoy! :) [15:53:23] * marostegui subscribed to that ticket [15:54:29] 10DBA, 10MediaWiki-API, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Anomie) I think we've analyzed this to the point where we can say that the best short-term fix will be to have the API alter its query to add `IGNORE INDEX (ti... [15:55:41] 10DBA, 10MediaWiki-API, 10Patch-For-Review, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) >>! In T71222#4998200, @Anomie wrote: > I think we've analyzed this to the point where we can say that the best short-term fi...
[16:00:26] 10DBA, 10MediaWiki-API, 10Patch-For-Review, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10jcrespo) Please make sure `times` exist everywhere and with that name, as I know Manuel is in the middle of matching the indexes on that... [16:02:53] 10DBA, 10MediaWiki-API, 10Patch-For-Review, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) >>! In T71222#4998242, @jcrespo wrote: > Please make sure `times` exist everywhere and with that name, as I know Manuel is in... [17:20:01] 10DBA, 10MediaWiki-API, 10Patch-For-Review, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) Confirmed, that index exists everywhere. [17:37:26] 10DBA, 10SDC Engineering, 10Wikidata, 10Core Platform Team (MCR), and 5 others: Deploy MCR storage layer - https://phabricator.wikimedia.org/T174044 (10CCicalese_WMF) [18:13:39] 10DBA, 10MediaWiki-Special-pages: Special:Contribs/89.168.0.0/16 times out when given a date offset - https://phabricator.wikimedia.org/T217588 (10MusikAnimal) [18:14:21] 10DBA, 10MediaWiki-Special-pages: Special:Contribs/89.168.0.0/16 on enwiki times out when given a date offset - https://phabricator.wikimedia.org/T217588 (10MusikAnimal) [18:19:56] 10DBA, 10MediaWiki-Special-pages, 10Performance: Special:Contribs/89.168.0.0/16 on enwiki times out when given a date offset - https://phabricator.wikimedia.org/T217588 (10Reedy) >The ipc_hex_time index on ip_changes is for ipc_hex and ipc_rev_timestamp, the two relevant columns for this query, so I'm not su... [18:27:23] Reedy: I got locked out of that task ^ Can you re-add me? 
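[Editor's note] The short-term fix Anomie lands on for T71222, an `IGNORE INDEX (times)` hint, works by removing the misbehaving index from the optimizer's candidate set so it falls back to `actor_time`. The exact API query is truncated in the log, so the snippet below only illustrates where the hint goes, using naive string surgery (MediaWiki itself passes index hints through its query builder options rather than editing SQL text, and the WHERE clause here is invented):

```python
def with_ignore_index(sql: str, table: str, index: str) -> str:
    """Insert an IGNORE INDEX hint right after the table name in the
    FROM clause. Naive first-occurrence replace, illustration only."""
    return sql.replace(f"FROM {table}",
                       f"FROM {table} IGNORE INDEX ({index})", 1)

query = ("SELECT log_id, log_type, log_action FROM logging "
         "WHERE log_actor = 42 ORDER BY log_timestamp DESC LIMIT 50")
hinted = with_ignore_index(query, "logging", "times")
```

As jcrespo notes right above, such hints only stay safe if the named index exists everywhere under that exact name, which is why the index-matching work had to be confirmed first.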
(or maybe I should ask for WMF-NDA) [18:27:34] lol [18:27:43] Yeah, you should [18:28:09] sorry, I thought you were nda [18:28:11] and [18:28:18] that it kept already added hosts [18:28:51] how can it "drop" existing subscribers? [18:29:09] I dunno! [18:29:17] I guess the rules get set to only WMF-NDA [18:29:25] Whereas for security etc, we allow task author, subscribers and security etc [18:29:30] musikanimal: Should be viewable again to you [18:29:37] oh, I see [18:29:38] yup, thanks! [18:29:48] sorry about that, certainly not intended [18:30:05] "protect as security issue" should do it properly [18:30:12] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Catrope) >>! In T217073#4992540, @jcrespo wrote: > Please tell us from which set of servers, which tables you deleted rows from, as we agree... [18:30:23] yeah, but I thought it would add security, and this technically is not [18:30:31] so I didn't want to bother you [18:31:21] Have you seen the security workboard? :P [18:32:36] I always wonder the same thing... there's no way for someone like me to *create* a task and make it private without adding Security [18:32:42] *wondered [18:32:57] and often it's not a security-related issue [18:33:52] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Marostegui) 05Open→03Resolved Thanks! 
I will create a ticket to get them defragmented [18:34:05] if only someone knew about phabricator to make an option :-D [18:35:04] actually it is that query [18:35:15] WMF-NDA is intended for situations that are nonpublic but not 'security' oriented [18:35:24] at this point [18:35:25] 10DBA: Defragment echo_event tables on x1 - https://phabricator.wikimedia.org/T217591 (10Marostegui) [18:35:25] yes, that is what I used [18:35:29] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Marostegui) [18:35:44] but apparently that removed his access [18:35:46] 10DBA: Defragment echo_event tables on x1 - https://phabricator.wikimedia.org/T217591 (10Marostegui) p:05Triage→03Normal [18:36:10] our onboarding practices for group management leave a bit to be desired :) [18:37:17] I use NDA a lot because queries may contain IPs, but in most cases they are not security related, only private-data [18:38:30] musikanimal: can you see this form? https://phabricator.wikimedia.org/maniphest/task/edit/form/23/ [18:38:44] nope, ha [18:38:59] I'm creating a task to request WMF-NDA now [18:39:13] well, that's interesting... it's meant to be the wmf-nda version of report security issue to my knowledge [18:39:18] This form should be used to submit tasks which contain sensitive / private information. The task will only be visible to yourself and members of WMF-NDA (because they have signed the NDA Agreement.) [18:39:39] my guess is the forms/task-type stuff has muddied these waters [18:40:02] yeah it's saying I need to be in WMF-NDA [18:40:06] musikanimal: I will have a look at the query, but likely low priority as it is probably the combination of options used + existing data, and technically most combinations will work, but there are other, more frequent ones [18:40:53] okay thanks!
[18:41:09] I check a lot of IP ranges and I've never seen it timeout, much less take anywhere close to that long [18:41:30] it is a /16 that probably has lots of records [18:41:45] For what it's worth, that table hasn't had any schema change recently [18:41:49] So the structure hasn't changed [18:41:49] yeah, the timestamp is what does it [18:41:56] And having said that....time to cook! Bye! [18:41:57] I can have it return 500+ results and it's fast [18:42:45] this all scares me a little because I introduced ip_changes :/ [18:43:24] I would talk to the person who is maintaining it right now, I cannot remember who [18:43:34] but we talked recently about optimization [18:49:52] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10jcrespo) Thanks to both! [18:53:02] musikanimal: just to be clear, the protection is a preventive measure to avoid performance or future security problems, and it happens all the time [18:53:22] I am not too worried [18:53:26] sure, I understand [18:53:28] good :) [18:54:04] but we will check it, if it is really not an issue, make it public again, and then add it to the list of perf issues we have [18:54:15] I would be worried if all or most queries were slow [18:54:26] but 1? happens literally every minute [18:54:54] interesting. I didn't realize it was that often!
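[Editor's note] Range scans on ip_changes work because MediaWiki stores each IPv4 address in `ipc_hex` as zero-padded uppercase hexadecimal, so a CIDR block like the one in T217588 becomes `ipc_hex BETWEEN lo AND hi` over the `(ipc_hex, ipc_rev_timestamp)` index. A sketch of the bound computation (IPv4 only; MediaWiki encodes IPv6 differently, with a `v6-` prefix):

```python
import ipaddress

def ipv4_cidr_to_hex_range(cidr: str) -> tuple[str, str]:
    """Return the (low, high) ipc_hex bounds for an IPv4 CIDR block,
    matching MediaWiki's 8-digit uppercase hex encoding of IPv4."""
    net = ipaddress.ip_network(cidr, strict=True)
    return (f"{int(net.network_address):08X}",
            f"{int(net.broadcast_address):08X}")

print(ipv4_cidr_to_hex_range("89.168.0.0/16"))  # ('59A80000', '59A8FFFF')
```

A /16 covers 65,536 addresses; combining such a wide hex range with a date offset is what can tempt the optimizer onto a timestamp-first plan, matching "the timestamp is what does it" above.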
[18:55:02] I've created https://phabricator.wikimedia.org/T217596 by the way [18:56:50] musikanimal: The template looks more like it's for volunteers [18:56:59] yeah that's what I was thinking haha [18:57:14] not sure why staff aren't automatically in this group [18:57:17] yeah, in theory you sign one on contract [18:57:26] but access is only given when it is needed [18:57:28] at least developers who already have production db access [18:58:06] the important part is "when needed", many people don't need access: most don't use phab or even handle code [18:58:57] sure, makes sense [18:59:39] security will agree with me that as a policy, granting it is not an issue, but not giving it to people who don't need it reduces exposure [18:59:50] is there an established way of getting WMF-NDA for staff? [19:00:02] or should I work through the checklist https://phabricator.wikimedia.org/T217596 [19:00:15] I think the one on phab is handled by releng [19:00:42] but the policy for ops ldap, for example, is just requesting it [19:01:04] and justifying it [19:01:34] in theory you have signed it on onboarding [19:01:52] greg-g is probably best placed to ask... He's been adding people most recently (his team members) [19:01:52] I think it's a JFDI [19:02:47] hmm? [19:03:19] I leave you on your own, as I said, the ticket itself I don't think is high priority [19:03:19] wmf-nda: you need to verify them having an NDA. I've been adding my new team members who have an NDA due to being staff [19:03:22] Process for adding staff to WMF-NDA [19:10:28] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10jcrespo) @Catrope One last thing, not sure if you are in charge of that, and obviously not a huge priority, but maybe there should be some c...
[20:42:13] 10DBA, 10Growth-Team, 10Notifications, 10Schema-change: Remove etp_user from echo_target_page in production - https://phabricator.wikimedia.org/T217453 (10Catrope) [20:42:47] 10DBA, 10Growth-Team, 10Notifications, 10Schema-change: Remove etp_user from echo_target_page in production - https://phabricator.wikimedia.org/T217453 (10Catrope) Updated the task description to match the template. Sorry for forgetting, and thanks for the reminder, it's been a few years since I've request... [20:46:09] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Catrope) It's funny that you should mention Wikidata, because that wiki actually had a very very high cleanup rate (something like 274M/276M... [20:47:30] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Catrope) Also if you want to talk more about this, feel free to ping me on IRC any time. It's a somewhat complicated topic and we probably b...
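[Editor's note] The T217073 cleanup discussed throughout this log is the usual orphan-sweep pattern: delete echo_event rows no longer referenced, in bounded batches so replication lag stays manageable. A minimal sketch against an in-memory SQLite schema; the real Echo maintenance script also has to consider other referencing tables (e.g. echo_email_batch) and runs through MediaWiki's maintenance framework, so treat these table and column names as a simplification:

```python
import sqlite3

def delete_orphaned_events(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Delete echo_event rows with no echo_notification pointing at them,
    batch_size rows per statement, until none remain. Returns rows deleted."""
    deleted = 0
    while True:
        cur = conn.execute(
            """DELETE FROM echo_event WHERE event_id IN (
                   SELECT e.event_id
                   FROM echo_event e
                   LEFT JOIN echo_notification n
                          ON n.notification_event = e.event_id
                   WHERE n.notification_event IS NULL
                   LIMIT ?)""",
            (batch_size,))
        conn.commit()
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount
```

Batching, plus a commit between batches, is what keeps a 274M-row sweep like the Wikidata one mentioned above from holding one giant transaction open; the leftover free space afterwards is exactly what the "Defragment echo_event tables on x1" ticket addresses.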