[06:33:08] DBA, AbuseFilter: DBA review of purgeOldLogIPData.php - https://phabricator.wikimedia.org/T186973#3960880 (Marostegui) In addition to what Jaime said, I would suggest we run it (if possible) on s5 or s6 just for start, to try to isolate things in case there are issues before running it on big wikis.
[06:36:05] DBA, Wikimedia-Site-requests: Global rename of Etienfr → Limotecariu: supervision needed - https://phabricator.wikimedia.org/T186937#3962018 (Marostegui) if you want to go ahead now, go for it
[06:46:15] DBA, Patch-For-Review: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599#3962024 (Marostegui)
[08:01:36] DBA, Wikimedia-Site-requests: Global rename of Etienfr → Limotecariu: supervision needed - https://phabricator.wikimedia.org/T186937#3962172 (Cyberpower678) @Marostegui, still here?
[08:02:00] DBA, Wikimedia-Site-requests: Global rename of Etienfr → Limotecariu: supervision needed - https://phabricator.wikimedia.org/T186937#3962174 (Marostegui) yeah! go for it if you like
[08:05:22] DBA, Wikimedia-Site-requests: Global rename of Etienfr → Limotecariu: supervision needed - https://phabricator.wikimedia.org/T186937#3962181 (Cyberpower678) https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Limotecariu
[08:09:12] DBA, Wikimedia-Site-requests: Global rename of Etienfr → Limotecariu: supervision needed - https://phabricator.wikimedia.org/T186937#3962193 (Marostegui) thanks!
[08:36:26] DBA, Patch-For-Review: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599#3962257 (Marostegui)
[08:49:37] DBA, Patch-For-Review: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599#3962298 (Marostegui)
[08:50:57] DBA, Patch-For-Review: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599#3962300 (Marostegui) Open>Resolved This is all done
[09:53:35] Blocked-on-schema-change, DBA, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#3962541 (Marostegui) Progress of s5 (for this initial alter I am doing it on codfw host by host) once we have seen no replication issues or any...
[09:53:57] Blocked-on-schema-change, DBA, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#3962542 (Marostegui) This is how the table looks like after the ALTER: ``` root@db2089.codfw.wmnet[dewiki]> show create table externallinks\G *...
[09:54:17] Blocked-on-schema-change, DBA, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#3962544 (Marostegui)
[09:54:36] Blocked-on-schema-change, DBA, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#2871964 (Marostegui)
[10:28:37] is tendril working for you?
[10:29:13] jynus: WFM
[10:29:19] it came back, it just crashed again
[10:29:26] :(
[10:29:30] it monitored itself well
[10:29:36] lol
[10:29:42] db1011 uptime 50s
[10:30:23] DBA, AbuseFilter: DBA review of purgeOldLogIPData.php - https://phabricator.wikimedia.org/T186973#3962700 (MarcoAurelio) @Marostegui You mean `foreachwikiindblist s5 extensions/AbuseFilter/maintenance/purgeOldLogIPData.php`, then s6 and so on?
[10:34:34] DBA, AbuseFilter: DBA review of purgeOldLogIPData.php - https://phabricator.wikimedia.org/T186973#3962723 (Marostegui) >>! In T186973#3962700, @MarcoAurelio wrote: > @Marostegui You mean `foreachwikiindblist s5 extensions/AbuseFilter/maintenance/purgeOldLogIPData.php`, then s6 and so on? Yeah, at least...
[10:35:16] DBA, AbuseFilter: DBA review of purgeOldLogIPData.php - https://phabricator.wikimedia.org/T186973#3962729 (MarcoAurelio) Totally agree.
[10:39:31] https://www.reddit.com/r/sysadmin/comments/7wuk62/im_not_a_dba_if_you_want_a_dba_go_hire_a_dba/
[10:41:16] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 2 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#3962758 (Marostegui) Progress of s5 (for this initial alter I am doing it on codfw host...
[10:41:29] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 2 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#3962760 (Marostegui) Progress of s5 (for this initial alter I am doing it on codfw host...
[10:42:50] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 2 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#3962766 (Marostegui) This is how the archive table looks like after the alter: ``` root@...
[10:43:25] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 2 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#3962768 (Marostegui)
[10:48:37] DBA, AbuseFilter: DBA review of purgeOldLogIPData.php - https://phabricator.wikimedia.org/T186973#3960880 (Reedy) >>!
In T186973#3961787, @jcrespo wrote: > Please set an ORDER BY, the LIMIT without an order by can lead to different results on masters and replicas- while you can argue that the same thing...
[12:39:15] marostegui: around?
[16:26:10] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964027 (Anomie) p:Triage>High
[16:27:45] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 2 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#3964058 (Anomie) >>! In T185128#3962766, @Marostegui wrote: > This is how the archive ta...
[16:27:49] ^that doesn's soud scary
[16:27:53] indeed
[16:27:57] I mean anomie's
[16:28:20] yeah yeah
[16:28:48] actually, now that I read it
[16:28:49] not scary
[16:28:55] I just clapped prematurely
[16:29:06] :'-(
[16:30:21] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 2 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#3964084 (Marostegui) >>! In T185128#3964058, @Anomie wrote: >>>! In T185128#3962766, @Ma...
[16:30:49] Blocked-on-schema-change, DBA, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#3964085 (Anomie) >>! In T153182#3962542, @Marostegui wrote: > This is how the table looks like after the ALTER: Looks good to me.
[16:34:13] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964102 (Anomie)
[16:36:03] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964027 (Anomie)
[16:38:02] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 2 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#3964130 (Anomie) Combining the schema change to fix T187089 with this change might be be...
[16:39:05] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 2 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#3964142 (Marostegui) >>! In T185128#3964130, @Anomie wrote: > Combining the schema chang...
[16:53:31] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964226 (Anomie) >>! On IRC, @Marostegui wrote: > May I ask for the alter tables needed? The schema changes woul...
[16:55:06] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964241 (Marostegui) >>! In T187089#3964226, @Anomie wrote: >>>! On IRC, @Marostegui wrote: >> May I ask for the...
[16:57:01] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964244 (Marostegui)
[17:02:10] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964265 (Marostegui)
[17:03:14] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964266 (Anomie) Something else to consider would be to add the img_description_id and img_actor columns to the image table ri...
[17:07:49] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964277 (Marostegui) >>! In T187089#3964266, @Anomie wrote: > Something else to consider would be to add the img_description_i...
[17:10:26] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install tendril2001 - https://phabricator.wikimedia.org/T186123#3964293 (Papaul) @jcrespo @Marostegui Hello this has been already a week since last week I have no update if we have to keep the name or not on this system. if you have time...
[17:11:49] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install tendril2001 - https://phabricator.wikimedia.org/T186123#3964305 (jcrespo) @Papaul as you may have heard, we are in a kind of an emergency right now busy on fixing other stuff, this will have to be delayed.
[17:13:24] DBA, Operations, hardware-requests, ops-codfw: Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029 - https://phabricator.wikimedia.org/T184090#3964308 (RobH)
[17:13:30] DBA, Goal, Patch-For-Review: Decommission database hosts <= db2031 (tracking) - https://phabricator.wikimedia.org/T176243#3964311 (RobH)
[17:13:33] DBA, Operations, hardware-requests, ops-codfw: Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029 - https://phabricator.wikimedia.org/T184090#3872109 (RobH) Open>Resolved
[17:15:44] DBA, Operations, hardware-requests, ops-codfw: Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029 - https://phabricator.wikimedia.org/T184090#3964327 (RobH)
[17:50:16] jynus: remember the script we were talking about yesterday? Apparently it has been running cronned on the puppet for some time...
[17:50:32] although I'm not sure if it's been working properly
[17:50:54] let me do some checks
[17:50:55] I'd like you to check, if possible, if there are old afl_ip data on, say, eswiki production database
[17:51:02] it is getting late here
[17:51:08] oh sorry
[17:51:13] I am doing you a favor here
[17:51:33] think this is normally the end of my day, and these weeks have been quite crazy
[17:51:42] with multiple service failures
[17:51:56] jynus: you don't need to, if your job day is over, it's over
[17:52:09] let's do one last thing today
[17:52:23] well, thank you very much
[17:52:37] but please don't make me work, please link me to the task
[17:53:21] there are like 30 I have lately been working on/involved in just today
[17:54:00] https://phabricator.wikimedia.org/T187053 ?
[17:54:04] T187053
[17:54:05] T187053: Setup puppet cron to delete old data daily - https://phabricator.wikimedia.org/T187053
[17:54:55] do you want me to check the cron or the data?
[17:55:13] Hauskatze ^?
[17:55:16] the cron apparently works, but the script was broken mid 2016
[17:55:35] just see if there's data before 20160916011613 and after
[17:55:57] well, I mentioned the order by
[17:56:04] $this->requireExtension ('AbuseFilter') broke the script
[17:56:05] that could contribute to having differences
[17:56:07] ah
[17:56:14] because on extension.json it's named 'Abuse Filter'
[17:56:18] then some monitoring should be in place
[17:56:27] to prevent that in the future
[17:56:29] that's fixed, merged and cherry-picked
[17:56:53] there should be jobs checking things work and complaining if not
[17:56:56] some sort of email to some people when the script fails
[17:56:59] yes
[17:57:03] I do that on labs for my bot
[17:57:06] jobs/crons/icinga
[17:57:25] the platform is there, just no one wrote those/sent a patch
[17:57:51] I said many times we need unit testing for production
[17:58:03] that would not be unit testing, but you get the idea
[17:58:18] similar to what performance has to check latencies
[17:58:30] to validate functionality on production from time to time
[17:59:30] do you have the table definition handy?
[17:59:52] it is an extension, it is not on tables.sql
[18:00:13] select * from abuse_filter_log where afl_timestamp<=20180101000000;
[18:00:24] sorry
[18:00:36] select * from abuse_filter_log where afl_timestamp<=2017111000000;
[18:00:42] mysql -h db1052.eqiad.wmnet enwiki -e "SELECT min(afl_timestamp) FROM abuse_filter_log"
[18:00:45] mmm
[18:00:55] but that I suppose includes sanitized entries
[18:00:56] although you can get zillions of reports
[18:01:03] which is the one to compare to "" ?
[18:01:08] which field
[18:01:25] select * from abuse_filter_log where afl_ip <> "" ORDER BY afl_id limit 1; is what Reedy did
[18:02:13] I don't have the right index, that could take some time
[18:02:23] mysql -h db1052.eqiad.wmnet enwiki -e "SELECT min(afl_timestamp) FROM abuse_filter_log WHERE afl_ip <>''"
[18:02:27] 20160916085759
[18:02:47] so that means the oldest afl_ip row is from September 2016, right?
[18:02:57] with afl_ip <>''
[18:03:09] there are older rows per se
[18:03:11] that is, the last empty row
[18:03:18] non-empty
[18:03:53] so it's been running before, was broken, no one noticed until Huji et al. dug into making those data available to checkusers
[18:04:11] now that the script is fixed it'll clean those rows I guess
[18:04:22] mysql -h db1052.eqiad.wmnet enwiki -e "SELECT * FROM abuse_filter_log WHERE afl_ip <>'' and afl_timestamp >= '20160916085759' ORDER BY afl_timestamp ASC LIMIT 10"
[18:04:42] that gives afl_id =16553796 and next ones
[18:04:53] so it is not a single instance
[18:05:05] was the change of the order by deployed?
[18:05:18] the 'order by' you mean?
[18:05:29] I don't think so but added you and Manuel as reviewers
[18:05:31] there was a missing order by on the purge/cleanup
[18:05:46] let me fetch the patch
[18:06:17] https://gerrit.wikimedia.org/r/#/c/409839/
[18:06:36] for example, the same query gives different results on master and 1 replica
[18:07:18] I think it is just being purged
[18:08:22] with the fix it'll purge the remaining rows, right?
[18:08:23] the idea is correct, but I am not 100% sure that is correct
[18:08:32] I need to test it
[18:08:51] actually
[18:08:58] now that I see it, it does select + update
[18:09:02] so not needed
[18:09:07] I'm going to open a task requesting that some 'critical' scripts, when they crash, notify some people
[18:09:09] I did comment to that effect on the task :)
[18:09:16] sorry, I didn't see that
[18:09:17] purge_checkuser and purge_abusefilter
[18:09:24] It shouldn't hurt anything by being added (and therefore being explicit)
[18:09:27] I thought it did a delete or just an update
[18:09:45] in this case that issue is solved by doing a by id update
[18:09:53] so no need to apply that
[18:10:00] I will abandon it
[18:10:04] would
[18:10:12] better not to add complexity if something is working
[18:10:22] do you need me to check the cron, or is it now working?
[18:10:31] after the other patch?
[18:10:38] jynus: if you want to check it just to be sure?
[18:11:51] $ crontab -u www-data -l | grep purgeOld -> 15 1 * * * /usr/local/bin/foreachwiki extensions/AbuseFilter/maintenance/purgeOldLogIPData.php >/dev/null 2>&1
[18:11:56] Unfortunately we redirect everything to /dev/null...
[18:12:22] well, because why should we, as root, receive that spam?
[18:12:47] I was more meaning redirecting the actual output to a file
[18:12:50] a proper log
[18:12:55] the idea is to check the results rather than the execution
[18:13:11] so checking that there are no old results, and alerting if there are, is more reliable
[18:13:22] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964572 (Marostegui) Now that I think about it, we are only altering the image table on some wikis, not all. That means that...
[18:13:24] the file is not bad, but has the problem of rotation
[18:15:06] and then permissions, etc.
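(Editor's sketch, not part of the channel log.) The "check the results rather than the execution" idea above could look roughly like the following. It assumes some external job has already run the `SELECT min(afl_timestamp) ... WHERE afl_ip <>''` query shown earlier and passes the result in as a string. The function names, the `max_age` retention window, and the `grace` margin are all hypothetical; the log does not state the configured retention period.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# MediaWiki stores timestamps as 14-digit strings, e.g. 20160916085759.
MW_TS_FORMAT = "%Y%m%d%H%M%S"


def parse_mw_timestamp(ts: str) -> datetime:
    """Parse a 14-digit MediaWiki timestamp, treating it as UTC."""
    return datetime.strptime(ts, MW_TS_FORMAT).replace(tzinfo=timezone.utc)


def purge_looks_broken(
    oldest_ip_ts: Optional[str],
    now: datetime,
    max_age: timedelta,                   # hypothetical retention window
    grace: timedelta = timedelta(days=2), # hypothetical slack for a late cron run
) -> bool:
    """Return True if the oldest row that still has afl_ip set is older
    than the retention window plus a grace margin, i.e. the purge job
    does not appear to be doing its work.

    oldest_ip_ts is the result of a query such as
    SELECT min(afl_timestamp) FROM abuse_filter_log WHERE afl_ip <> ''
    and is None when no row has IP data left.
    """
    if oldest_ip_ts is None:
        return False
    return now - parse_mw_timestamp(oldest_ip_ts) > max_age + grace
```

With the value reported in the channel (`20160916085759`) and any retention window of a few months, a check run in February 2018 would flag the purge as broken. A monitoring system such as the ones mentioned above (icinga, prometheus) would be the natural place to wire in a check like this, though how exactly is left open here.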
[18:15:34] so things are working now, right, you do not really need me?
[18:17:48] T187101
[18:17:49] T187101: Setup some alert mechanism when some 'critical' cron jobs fail - https://phabricator.wikimedia.org/T187101
[18:18:07] yell at me, wave hands, punch me, etc.
[18:18:12] I think there has been an abuse of the Operations tag lately
[18:18:23] we will have to ask for money to use it
[18:18:31] :-)
[18:18:44] but it's a bit 'embarrassing' that this broke in 2016 and nobody noticed until now :)
[18:18:53] I don't disagree
[18:19:03] thankfully no one can access that data
[18:19:07] for now
[18:19:13] I am just saying that the mechanism exists: icinga, prometheus, etc.
[18:19:32] I'm not privy to that, I'm just pointing out the perceived problem :)
[18:19:34] just someone has to program it, ideally the non-existent maintainer
[18:19:41] :-)
[18:19:53] so, to sum up, the script is fine and the cron is right, right?
[18:20:51] it should happen every day at 1:15 ?
[18:22:31] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964606 (Anomie) It looks like there are only 33 wikis that don't need the image table altered for this task, all on s3. We co...
[18:22:32] https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/mediawiki/manifests/maintenance/purge_abusefilter.pp
[18:22:35] apparently yes
[18:23:04] when was the patch fixing the code deployed?
[18:23:30] the $this->requireExtension one?
[18:23:34] yes
[18:24:19] https://gerrit.wikimedia.org/r/#/c/409481/ merged 3 days ago
[18:24:38] so it doesn't work?
[18:24:40] but Reedy cherry-picked https://gerrit.wikimedia.org/r/#/c/409903/
[18:24:46] 3 hours ago
[18:24:56] there has been no mediawiki train on production wikis yet
[18:24:58] I said deployed, not merged!
[18:25:00] :-)
[18:25:01] ok
[18:25:06] so it will be fixed this week
[18:25:06] sorry
[18:25:23] it's on wmf20 now so it should be live right?
[18:25:35] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964611 (Anomie) We'd also need to make sure any newly-created wikis got the image table alter, at least until we merge a patc...
[18:25:37] or do they have to scap it?
[18:25:39] well, then in a few hours
[18:25:43] cool
[18:25:50] I'll clean Beta Cluster now
[18:26:06] and I'll try to figure out how to cron that job there too
[18:28:15] and wrt the operations tag, I'm not sure if there's a better tag to ping people working on the wikimedia puppet
[18:28:23] in fact the repo is operations/puppet :-)
[18:29:13] Hauskatze: the fact that something is on operations puppet only means it requires "root" approval
[18:29:37] everything should be on operations puppet, from the point of view of configuration management :-)
[18:30:31] e.g. in an ideal situation, whoever maintains abusefilter will create a patch, and yes, we will review it and approve it, if that makes sense
[18:31:41] but it is the same thing as with mediawiki deploys: anyone can create a patch, only "deployers" can approve it and deploy it, if it makes sense
[18:32:45] jynus: there are no active maintainers for abusefilter - T185154
[18:32:46] T185154: AbuseFilter (and dependencies): code stewardship review - https://phabricator.wikimedia.org/T185154
[18:33:12] I don't blame anyone. AF is really complex
[18:33:21] yet critical for WMF sites now
[18:33:21] that is the problem :-)
[18:33:52] as I said, will disconnect now, see you!
[18:34:37] hopefully I will not have to come back today at 1:15
[18:34:47] because the databases are overloaded
[18:34:54] :-)
[18:34:55] good evening
[18:34:59] and thanks so much
[18:35:10] I did nothing, it was all reedy's work
[18:35:21] if anything, I distracted both of you
[18:42:00] DBA, Wikimedia-Site-requests: Global rename of Etienfr → Limotecariu: supervision needed - https://phabricator.wikimedia.org/T186937#3964676 (Cyberpower678) Open>Resolved a:Cyberpower678 Successfully finished.
[18:42:23] now that's funny
[18:42:26] maurelio@deployment-tin:~$ mwscript extensions/AbuseFilter/maintenance/purgeOldIPLogData.php --wiki=aawiki
[18:42:26] The MediaWiki script file "/srv/mediawiki-staging/php-master/extensions/AbuseFilter/maintenance/purgeOldIPLogData.php" does not exist.
[18:44:05] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#3964686 (Marostegui) Oh, in that case we can certainly add them. So all s3 needs it pretty much :-) Thanks!
[21:24:09] DBA, Operations, Patch-For-Review, codfw-rollout: [RFC] improve parsercache replication and sharding handling - https://phabricator.wikimedia.org/T133523#3965226 (Krinkle)
[21:24:34] DBA, Operations, Patch-For-Review, codfw-rollout: [RFC] improve parsercache replication and sharding handling - https://phabricator.wikimedia.org/T133523#2234475 (Krinkle) @jcrespo Thanks, I'll untag our team for now then. Let me know if there's anything we can do.
[23:16:27] DBA, AbuseFilter, Patch-For-Review: DBA review of purgeOldLogIPData.php - https://phabricator.wikimedia.org/T186973#3965529 (MarcoAurelio) Open>Resolved Given the recent events I think this review is now done. Closing. If mistaken, please reopen.