[19:05:40] halfak: interesting link: https://en.wikipedia.org/wiki/User:Emijrp/Anti-vandalism_bot_census
[21:37:31] halfak, really stupid question you'll know the answer to
[21:37:42] does server time reflect DST, or is it just all UTC, all the time?
[21:41:02] Depends how you are getting at it.
[21:41:05] I assume UTC in specs.
[21:47:04] * Ironholds nods
[21:47:11] "in MediaWiki" is the best I can do.
[21:47:22] Trying to control for TZ changes on the server side when doing the TS localisation
[22:04:39] halfak: busy?
[22:09:08] Helder, this week is meeting week. If you can drop me a message, I'll get to it this evening.
[22:09:28] Ironholds, bad enough problems that we needed to restart
[22:09:43] kk
[22:10:27] halfak: I was polishing my badwords list on https://gist.github.com/he7d3r/1285f6b52e2782d96b9e
[22:11:18] I ran another script, which checked a history dump to get the # of times each stem was removed from a page, and then removed the stems which were never removed
[22:11:39] so, now that list is sorted by the number of removals.
[22:12:23] https://gist.github.com/Ironholds/6d94a14bde61a9c8159a
[22:12:25] at the bottom, there are a few items which were removed only once. They should probably be removed?
[22:13:15] or maybe we should remove all items which have fewer than N (for some N) removals?
[22:14:59] Also, should we remove non-Portuguese words from the list? E.g.: motherfucking was removed on ptwiki 23 times, should it be among the Portuguese badwords?
[22:15:35] more generally, what should we do with badwords from one language if they are used in a wiki in another language?
[22:19:46] another thing: assuming a given stem was removed N times, and in M of these cases the corresponding words were not badwords (i.e., false positives)
[22:20:54] should the stem be removed from the list if M/N is above some threshold? what would be a reasonable threshold? 0.5?
[22:21:35] or do false positives have so little influence on the results that we can keep them in the initial list?
[22:21:57] halfak: I think these are the questions on my mind for now :-)
[22:24:12] * halfak is barely present in this room :(
[22:37:07] no worries
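The two pruning ideas raised above (drop stems with fewer than N removals, and drop stems whose false-positive ratio M/N exceeds a threshold) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script behind the gist: the function names `count_removals` and `prune_stems` are hypothetical, stems are assumed to be lower-case and to match any token starting with them, "removals" are counted as drops in a stem's occurrence count between consecutive revisions, the false-positive counts (M) are assumed to come from separate hand labelling, and the default N = 2 is arbitrary. The 0.5 ratio is only the threshold floated in the chat.

```python
# Hypothetical sketch: prune a badwords stem list by removal counts.
# Assumptions (not from the chat): a page history is a list of revision
# texts in chronological order, a lower-case stem matches any token that
# starts with it, and false-positive counts (M) come from hand labelling.
import re
from collections import Counter


def count_removals(revisions, stems):
    """Count, per stem, how many matching tokens disappeared between
    consecutive revisions of a single page."""
    removals = Counter()
    prev = Counter()
    for text in revisions:
        cur = Counter()
        for token in re.findall(r"\w+", text.lower()):
            for stem in stems:
                if token.startswith(stem):
                    cur[stem] += 1
        # A drop in a stem's occurrence count counts as that many removals.
        for stem, n_before in prev.items():
            n_after = cur[stem]
            if n_before > n_after:
                removals[stem] += n_before - n_after
        prev = cur
    return removals


def prune_stems(removals, false_positives, min_removals=2, max_fp_ratio=0.5):
    """Keep stems removed at least `min_removals` times (N) whose
    false-positive ratio M/N is at most `max_fp_ratio`, most-removed first."""
    kept = []
    for stem, n in removals.most_common():
        if n < max(min_removals, 1):  # also guards against dividing by zero
            continue
        m = false_positives.get(stem, 0)
        if m / n > max_fp_ratio:
            continue
        kept.append(stem)
    return kept
```

For a whole dump, the per-page `Counter`s could be summed before pruning. Lowering `min_removals` or raising `max_fp_ratio` keeps more stems at the cost of more false positives; the question of non-Portuguese words is not addressed by this sketch.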