[06:09:07] 10DBA, 10MediaWiki-Cache, 10Performance-Team (Radar), 10User-Marostegui: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 (10Marostegui) a:03Marostegui [06:52:27] 10DBA, 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) MySQL has been stopped on dbstore1002 and won't be started again, as this host will be decommissioned [08:07:30] 10DBA, 10MediaWiki-API, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) After the analyze, they (db2038 and db2059) now use the same plan: ` root@db2038.codfw.wmnet[dewiki]> explain SELECT log_id,log_type,log_action,log... [10:15:00] marostegui: ugh re: T217542 ! I think the easiest for now is to stick with 0.10 [10:15:00] T217542: prometheus-mysqld-exporter package 0.11.0 options changed - https://phabricator.wikimedia.org/T217542 [10:15:28] also for some reason I don't see 0.10 in stretch-wikimedia [10:15:33] godog: I guess so, I did an apt full-upgrade on that host [10:18:09] ok, I'm copying 0.10 into stretch-wikimedia, not sure why it wasn't there already [10:18:20] error: unknown short flag '-c' ? [10:18:28] but I don't see that option [10:21:45] yeah looks like it changed between 0.10 and 0.11 [10:22:37] jynus: it changed from -collect to --collect [10:22:54] godog: thanks :) [10:23:00] oh [10:23:11] well, we can fix it with an if on puppet [10:23:49] the collect.info_schema.tables option looks like it also changed [10:23:51] not worried for stretch, but I guess it will be the new syntax on buster [10:24:02] I will test on db1114 [10:24:13] or were you working on it?
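[Editor's note] The breakage discussed above is the flag-syntax change between prometheus-mysqld-exporter 0.10 and 0.11: older versions took single-dash long options (`-collect.foo`), while 0.11 expects double-dash GNU-style options (`--collect.foo`), hence the `unknown short flag '-c'` error. The "if on puppet" jynus mentions boils down to a version switch; a minimal sketch of that idea in Python (the collector names are illustrative, the version comparison is the point):

```python
def exporter_flags(version: str, collectors: list[str]) -> list[str]:
    """Build collector flags for prometheus-mysqld-exporter.

    Up to 0.10 the exporter used single-dash long options (-collect.foo);
    0.11 switched to double-dash (--collect.foo), which is what the
    "unknown short flag" error above is complaining about.
    """
    major, minor = (int(x) for x in version.split(".")[:2])
    prefix = "--" if (major, minor) >= (0, 11) else "-"
    return [f"{prefix}collect.{c}" for c in collectors]
```

In puppet the equivalent conditional would live in the template rendering the exporter's defaults file; this function only demonstrates the switch.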
[10:24:29] sure, feel free to test :) [10:24:51] also, I may take some time and remove any passwords CC godog [10:24:56] I just reported it and created the ticket, and worked around it to get it up for now [10:25:17] should be fine now btw for full-upgrades on stretch [10:25:17] jynus: that'd be awesome! [10:25:28] fine == 0.10 will be preferred [10:25:29] godog: what do you mean? [10:25:41] godog: I see [10:25:46] The following packages will be DOWNGRADED: [10:25:48] did you pin it, because we may fix it soon [10:26:21] no I've uploaded 0.10 also to stretch-wikimedia, which afaics was missing and only on jessie-wikimedia [10:26:47] when the puppetization is in place we can either upload 0.11 to stretch-wikimedia or remove 0.10 from stretch-wikimedia, up to you [10:26:59] ok [10:27:27] godog: confirmed it now works [10:28:27] sweet [10:28:37] marostegui: I started refactoring the backup classes: https://gerrit.wikimedia.org/r/493664 FYI [10:28:59] Ah nice! [10:29:20] so now it should be trivial to add new backup processes [10:29:59] <3 [10:30:42] I also prepared (but not implemented) the separation of archiving and compressing [10:31:10] where archiving is creating a tar for each database and compressing is creating a tarball with everything [10:35:41] With everything meaning every database on a section on a single tar? [10:37:37] everything means whatever was gotten from a single mysql instance [10:37:42] got it [10:39:28] o/ [10:39:37] got another fun wikidata related question [10:39:54] is it possible to know how much disk space our page content (text) currently uses?
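[Editor's note] The archive/compress split jynus describes (one tar per database, then one compressed tarball with everything taken from a single mysql instance) could be sketched like this; the directory layout and file names are hypothetical, not the actual backup-class implementation in the Gerrit change:

```python
import tarfile
from pathlib import Path

def archive_databases(dump_dir: Path) -> list[Path]:
    """Archiving: one uncompressed tar per per-database dump directory."""
    tars = []
    for db_dir in sorted(p for p in dump_dir.iterdir() if p.is_dir()):
        tar_path = dump_dir / f"{db_dir.name}.tar"
        with tarfile.open(tar_path, "w") as tar:
            tar.add(db_dir, arcname=db_dir.name)
        tars.append(tar_path)
    return tars

def compress_instance(dump_dir: Path, out: Path) -> Path:
    """Compressing: one gzipped tarball with everything from the instance."""
    with tarfile.open(out, "w:gz") as tar:
        tar.add(dump_dir, arcname=dump_dir.name)
    return out
```

Keeping the two steps separate means a single database can be restored from its own tar without unpacking the whole instance tarball.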
:) [10:42:58] define space used, because it may have many interpretations [10:43:32] it may mean logical storage taken by data [10:43:44] it may mean storage including inefficiencies due to metadata [10:44:10] it may mean including table indexes (which can be equal or larger than data size) [10:44:45] it may mean disk actually used, even if most of it is logically empty [10:47:37] in particular, table size info was removed from prometheus for 2 reasons: private data concerns (especially on wikis with low activity) and performance (mostly on s3, with hundreds of thousands of files) [10:48:27] there are plans to include that info on the tendril replacement under NDA, but it is WIP [10:48:35] there may be some things on the current tendril [10:50:11] the problem is they are slow and not properly cached [10:50:55] so, less about the table size (I guess that would be revisions table), more about the text field itself, and actual disk used [10:51:22] oh, so you want to get some stats about the table? [10:51:38] so, I would suggest getting a non-production host [10:52:03] and run https://dev.mysql.com/doc/refman/5.7/en/procedure-analyse.html [10:52:24] it may be a bit intense on a host with many queries [10:52:30] but the text isn't actually in the table right? or will it still show up in the regular mysql state for a table? [10:52:38] text? [10:52:44] oh, you mean like wikicode? [10:53:05] sorry, but I may not be understanding you [10:54:17] tell me what you want to do and I will suggest the best way to do it, if I can [10:57:42] This tells you the average row size, for example (InnoDB stats are not real time, but usually good enough): https://phabricator.wikimedia.org/P8145 [10:57:58] note this is ok to show for enwiki publicly, but not for smaller wikis [11:02:02] jynus: meeting?
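[Editor's note] The interpretations jynus lists map roughly onto columns of `information_schema.tables`; a sketch follows (column meanings per the MySQL docs; the `%s` placeholders assume a DB-API driver, and InnoDB's figures are periodic estimates, not real-time). Note also that the linked `PROCEDURE ANALYSE()` applies to MySQL 5.7; it was removed in 8.0.

```python
# Columns behind the "space used" interpretations discussed above:
SIZE_QUERY = """
SELECT table_rows,      -- estimated number of rows
       avg_row_length,  -- roughly data_length / table_rows
       data_length,     -- the clustered index, i.e. the data itself
       index_length,    -- secondary indexes (can equal or exceed data_length)
       data_free        -- allocated but logically empty space
FROM   information_schema.tables
WHERE  table_schema = %s AND table_name = %s
"""

def size_breakdown(data_length: int, index_length: int, data_free: int) -> dict:
    """Map one row of SIZE_QUERY onto the different meanings of 'space used'."""
    return {
        "data_only": data_length,
        "data_plus_indexes": data_length + index_length,
        "allocated_on_disk": data_length + index_length + data_free,
    }
```

Even `allocated_on_disk` can undercount the real footprint, since file-level overhead outside the tablespace is invisible to `information_schema`.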
[11:17:59] addshore: https://tendril.wikimedia.org/report/schemas?schema=wik [14:20:28] strange, I can see 0.11 on buster, but I cannot find it on our repo's pool [14:24:20] oh, my fault, I was looking at the wikimedia repo, not the debian one [14:31:39] marostegui: you may have missed Reading Infrastructure team for reading lists, which also live on x1 [14:32:17] ah, thanks! [14:32:19] amending! [14:41:23] FYI I am "breaking" db1114 so I can check its 0.11 upgrade [14:41:33] cool! [14:41:50] lots of new options other than the breakage [15:00:46] jynus: I would like to upgrade db1095 (backups) tomorrow morning to 10.1.38 [15:00:47] 10DBA, 10AbuseFilter, 10User-Daimona: Slow queries on abuse_filter_log using afl_action or afl_actions - https://phabricator.wikimedia.org/T217481 (10Daimona) [15:01:13] I am upgrading s2 to 10.1.38 in preparation for the master running 10.1.38 [15:01:21] (I upgraded a bunch of hosts to 10.1.38 on codfw today) [15:01:26] thanks [15:01:34] also dbstore2001, can I do that too? [15:03:36] sure [15:04:20] great, thanks! :) [15:15:26] 10DBA, 10MediaWiki-API, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Anomie) Unfortunately that made db2038 start using `times` rather than `actor_time` for the second query. I was hoping it would go the other way, because we wa... [15:19:11] 10DBA, 10MediaWiki-API, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) And that ^ also happens with 10.1.38, so I guess it is once again one of those "features" of the optimizer.
[15:19:32] 10DBA, 10Operations, 10User-fgiunchedi: Upgrade mysqld_exporter in production - https://phabricator.wikimedia.org/T161296 (10jcrespo) a:03jcrespo [15:29:55] 10DBA, 10MediaWiki-API, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) It is unbelievable the optimizer chooses `times`: ` root@db2038.codfw.wmnet[dewiki]> show explain for 3580770; +------+-------------+---------+----... [15:39:31] marostegui / jynus suddenly global renames take a whole lot of time to complete [15:39:43] we have one simple one running for half an hour already [15:39:50] any issues? [15:43:38] hauskatze: when did that start? [15:43:48] 15:09 [15:43:56] still running a.t.m [15:43:58] 15:09 UTC? [15:44:04] https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress?username=MJL [15:44:06] yup [15:44:18] 16:09 Madrid time :) [15:44:46] I'm not sure if that'd be the batch queries we introduced in renameuser last week?
[15:46:12] So in db-land everything is fine at the moment, we have changed some indexes on the logging table on 3 hosts (1 enwiki, 1 commons, 1 s5), but that is probably unrelated [15:46:38] Those hosts are not even showing up as slow queries [15:47:28] hauskatze: worth asking tgr if that change can make the renames slower [15:49:21] yeah, I can see it progressing however [15:49:31] so it may have failed and restarted some time later [15:49:45] or just went slower because of the change [15:49:57] it could also be wikidata, which is not that small for some users [15:50:28] I would suggest using the ticket with the issues you reported to ask about that [15:50:53] otherwise it may be slower unintentionally, and he can check the logs for you [15:51:51] jynus: I did an eswiki one for a newly registered user with very few edits and it took some minutes, so renames are being slower [15:52:10] I mean, if this is expected and lets the DBs work better I have no concerns [15:52:29] I filed T217561 [15:52:30] T217561: Global renames are very slow - https://phabricator.wikimedia.org/T217561 [15:52:42] * hauskatze snack time [15:53:10] enjoy! :) [15:53:23] * marostegui subscribed to that ticket [15:54:29] 10DBA, 10MediaWiki-API, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Anomie) I think we've analyzed this to the point where we can say that the best short-term fix will be to have the API alter its query to add `IGNORE INDEX (ti... [15:55:41] 10DBA, 10MediaWiki-API, 10Patch-For-Review, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) >>! In T71222#4998200, @Anomie wrote: > I think we've analyzed this to the point where we can say that the best short-term fi...
[16:00:26] 10DBA, 10MediaWiki-API, 10Patch-For-Review, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10jcrespo) Please make sure `times` exist everywhere and with that name, as I know Manuel is in the middle of matching the indexes on that... [16:02:53] 10DBA, 10MediaWiki-API, 10Patch-For-Review, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) >>! In T71222#4998242, @jcrespo wrote: > Please make sure `times` exist everywhere and with that name, as I know Manuel is in... [17:20:01] 10DBA, 10MediaWiki-API, 10Patch-For-Review, 10Performance: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222 (10Marostegui) Confirmed, that index exists everywhere. [17:37:26] 10DBA, 10SDC Engineering, 10Wikidata, 10Core Platform Team (MCR), and 5 others: Deploy MCR storage layer - https://phabricator.wikimedia.org/T174044 (10CCicalese_WMF) [18:13:39] 10DBA, 10MediaWiki-Special-pages: Special:Contribs/89.168.0.0/16 times out when given a date offset - https://phabricator.wikimedia.org/T217588 (10MusikAnimal) [18:14:21] 10DBA, 10MediaWiki-Special-pages: Special:Contribs/89.168.0.0/16 on enwiki times out when given a date offset - https://phabricator.wikimedia.org/T217588 (10MusikAnimal) [18:19:56] 10DBA, 10MediaWiki-Special-pages, 10Performance: Special:Contribs/89.168.0.0/16 on enwiki times out when given a date offset - https://phabricator.wikimedia.org/T217588 (10Reedy) >The ipc_hex_time index on ip_changes is for ipc_hex and ipc_rev_timestamp, the two relevant columns for this query, so I'm not su... [18:27:23] Reedy: I got locked out of that task ^ Can you re-add me? 
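[Editor's note] The short-term fix Anomie lands on for T71222, an `IGNORE INDEX (times)` hint, works by removing the misbehaving index from the optimizer's candidate set so it falls back to `actor_time`. The exact API query is truncated in the log, so the snippet below only illustrates where the hint goes, using naive string surgery (MediaWiki itself passes index hints through its query builder options rather than editing SQL text, and the WHERE clause here is invented):

```python
def with_ignore_index(sql: str, table: str, index: str) -> str:
    """Insert an IGNORE INDEX hint right after the table name in the
    FROM clause. Naive first-occurrence replace, illustration only."""
    return sql.replace(f"FROM {table}",
                       f"FROM {table} IGNORE INDEX ({index})", 1)

query = ("SELECT log_id, log_type, log_action FROM logging "
         "WHERE log_actor = 42 ORDER BY log_timestamp DESC LIMIT 50")
hinted = with_ignore_index(query, "logging", "times")
```

As jcrespo notes right above, such hints only stay safe if the named index exists everywhere under that exact name, which is why the index-matching work had to be confirmed first.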
(or maybe I should ask for WMF-NDA) [18:27:34] lol [18:27:43] Yeah, you should [18:28:09] sorry, I thought you were nda [18:28:11] and [18:28:18] that it kept already added hosts [18:28:51] how can it "drop" existing subscribers? [18:29:09] I dunno! [18:29:17] I guess the rules get set to only WMF-NDA [18:29:25] Whereas for security etc, we allow task author, subscribers and security etc [18:29:30] musikanimal: Should be viewable again to you [18:29:37] oh, I see [18:29:38] yup, thanks! [18:29:48] sorry about that, certainly not intended [18:30:05] "protect as security issue" should do it properly [18:30:12] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Catrope) >>! In T217073#4992540, @jcrespo wrote: > Please tell us from which set of servers, which tables you deleted rows from, as we agree... [18:30:23] yeah, but I thought it would add security, and this technically is not [18:30:31] so I didn't want to bother you [18:31:21] Have you seen the security workboard? :P [18:32:36] I always wonder the same thing... there's no way for someone like me to *create* a task and make it private without adding Security [18:32:42] *wondered [18:32:57] and often it's not a security-related issue [18:33:52] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Marostegui) 05Open→03Resolved Thanks! 
I will create a ticket to get them defragmented [18:34:05] if only someone knew about phabricator to make an option :-D [18:35:04] actually it is that query [18:35:15] WMF-NDA is intended for situations that are nonpublic but not 'security' oriented [18:35:24] at this point [18:35:25] 10DBA: Defragment echo_event tables on x1 - https://phabricator.wikimedia.org/T217591 (10Marostegui) [18:35:25] yes, that is what I used [18:35:29] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Marostegui) [18:35:44] but apparently that removed his access [18:35:46] 10DBA: Defragment echo_event tables on x1 - https://phabricator.wikimedia.org/T217591 (10Marostegui) p:05Triage→03Normal [18:36:10] our onboarding practices for group management leave a bit to be desired :) [18:37:17] I use NDA a lot because queries may contain IPs, but in most cases they are not security related, only private-data [18:38:30] musikanimal: can you see this form? https://phabricator.wikimedia.org/maniphest/task/edit/form/23/ [18:38:44] nope, ha [18:38:59] I'm creating a task to request WMF-NDA now [18:39:13] well, that's interesting... it's meant to be the wmf-nda version of report security issue to my knowledge [18:39:18] This form should be used to submit tasks which contain sensitive / private information. The task will only be visible to yourself and members of WMF-NDA (because they have signed the NDA Agreement.) [18:39:39] my guess is the forms/task-type stuff has muddied these waters [18:40:02] yeah it's saying I need to be in WMF-NDA [18:40:06] musikanimal: I will have a look at the query, but likely low priority as it is probably the combination of options used + existing data, and technically most combinations will work, but there are other, more frequent ones [18:40:53] okay thanks!
[18:41:09] I check a lot of IP ranges and I've never seen it timeout, much less take anywhere close to that long [18:41:30] it is a /16 that probably has lots of records [18:41:45] For what it's worth, that table hasn't had any schema change recently [18:41:49] So the structure hasn't changed [18:41:49] yeah, the timestamp is what does it [18:41:56] And having said that....time to cook! Bye! [18:41:57] I can have it return 500+ results and it's fast [18:42:45] this all scares me a little because I introduced ip_changes :/ [18:43:24] I would talk to the person who is maintaining it right now, I cannot remember who [18:43:34] but we talked recently about optimization [18:49:52] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10jcrespo) Thanks to both! [18:53:02] musikanimal: just to be clear, the protection is a preventive measure to avoid performance or future security problems, and it happens all the time [18:53:22] I am not too worried [18:53:26] sure, I understand [18:53:28] good :) [18:54:04] but we will check it, if it is really not an issue, make it public again, and then add it to the list of perf issues we have [18:54:15] I would be worried if all or most queries were slow [18:54:26] but 1? happens literally every minute [18:54:54] interesting. I didn't realize it was that often!
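[Editor's note] Range scans on ip_changes work because MediaWiki stores each IPv4 address in `ipc_hex` as zero-padded uppercase hexadecimal, so a CIDR block like the one in T217588 becomes `ipc_hex BETWEEN lo AND hi` over the `(ipc_hex, ipc_rev_timestamp)` index. A sketch of the bound computation (IPv4 only; MediaWiki encodes IPv6 differently, with a `v6-` prefix):

```python
import ipaddress

def ipv4_cidr_to_hex_range(cidr: str) -> tuple[str, str]:
    """Return the (low, high) ipc_hex bounds for an IPv4 CIDR block,
    matching MediaWiki's 8-digit uppercase hex encoding of IPv4."""
    net = ipaddress.ip_network(cidr, strict=True)
    return (f"{int(net.network_address):08X}",
            f"{int(net.broadcast_address):08X}")

print(ipv4_cidr_to_hex_range("89.168.0.0/16"))  # ('59A80000', '59A8FFFF')
```

A /16 covers 65,536 addresses; combining such a wide hex range with a date offset is what can tempt the optimizer onto a timestamp-first plan, matching "the timestamp is what does it" above.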
[18:55:02] I've created https://phabricator.wikimedia.org/T217596 by the way [18:56:50] musikanimal: The template looks more like it's for volunteers [18:56:59] yeah that's what I was thinking haha [18:57:14] not sure why staff aren't automatically in this group [18:57:17] yeah, in theory you sign one on contract [18:57:26] but access is only given when it is needed [18:57:28] at least developers who already have production db access [18:58:06] the important part is "when needed", many people don't need access: most don't use phab or even handle code [18:58:57] sure, makes sense [18:59:39] security will agree with me that as a policy, granting it is not an issue, but not giving it to people who don't need it reduces exposure [18:59:50] is there an established way of getting WMF-NDA for staff? [19:00:02] or should I work through the checklist https://phabricator.wikimedia.org/T217596 [19:00:15] I think the one on phab is handled by releng [19:00:42] but the policy for ops ldap, for example, is just requesting it [19:01:04] and justifying it [19:01:34] in theory you have signed it on onboarding [19:01:52] greg-g is probably best placed to ask... He's been adding people most recently (his team members) [19:01:52] I think it's a JFDI [19:02:47] hmm? [19:03:19] I leave you on your own, as I said, the ticket itself I don't think is high priority [19:03:19] wmf-nda: you need to verify them having an NDA. I've been adding my new team members who have an NDA due to being staff [19:03:22] Process for adding staff to WMF-NDA [19:10:28] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10jcrespo) @Catrope One last thing, not sure if you are in charge of that, and obviously not a huge priority, but maybe there should be some c...
[20:42:13] 10DBA, 10Growth-Team, 10Notifications, 10Schema-change: Remove etp_user from echo_target_page in production - https://phabricator.wikimedia.org/T217453 (10Catrope) [20:42:47] 10DBA, 10Growth-Team, 10Notifications, 10Schema-change: Remove etp_user from echo_target_page in production - https://phabricator.wikimedia.org/T217453 (10Catrope) Updated the task description to match the template. Sorry for forgetting, and thanks for the reminder, it's been a few years since I've request... [20:46:09] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Catrope) It's funny that you should mention Wikidata, because that wiki actually had a very very high cleanup rate (something like 274M/276M... [20:47:30] 10DBA, 10Notifications, 10Growth-Team (Current Sprint), 10WorkType-Maintenance: Clean up orphaned echo_event rows again - https://phabricator.wikimedia.org/T217073 (10Catrope) Also if you want to talk more about this, feel free to ping me on IRC any time. It's a somewhat complicated topic and we probably b...
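[Editor's note] The T217073 cleanup discussed throughout this log is the usual orphan-sweep pattern: delete echo_event rows no longer referenced, in bounded batches so replication lag stays manageable. A minimal sketch against an in-memory SQLite schema; the real Echo maintenance script also has to consider other referencing tables (e.g. echo_email_batch) and runs through MediaWiki's maintenance framework, so treat these table and column names as a simplification:

```python
import sqlite3

def delete_orphaned_events(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Delete echo_event rows with no echo_notification pointing at them,
    batch_size rows per statement, until none remain. Returns rows deleted."""
    deleted = 0
    while True:
        cur = conn.execute(
            """DELETE FROM echo_event WHERE event_id IN (
                   SELECT e.event_id
                   FROM echo_event e
                   LEFT JOIN echo_notification n
                          ON n.notification_event = e.event_id
                   WHERE n.notification_event IS NULL
                   LIMIT ?)""",
            (batch_size,))
        conn.commit()
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount
```

Batching, plus a commit between batches, is what keeps a 274M-row sweep like the Wikidata one mentioned above from holding one giant transaction open; the leftover free space afterwards is exactly what the "Defragment echo_event tables on x1" ticket addresses.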