[06:56:42] (CR) Awight: [C: +1] "Good changes. If it weren't so much effort, I would suggest that we enable checkstyle in a non-voting mode at first... but I don't think " (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681772 (owner: Gehel)
[07:03:58] (PS3) Gehel: Adding checkstyle configuration. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681773
[07:04:31] (PS2) Awight: Rewrite WMDE Tech Wishes reports as native HiveQL [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/681707 (https://phabricator.wikimedia.org/T193169)
[07:04:46] (CR) Awight: "PS 2: manual rebase" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/681707 (https://phabricator.wikimedia.org/T193169) (owner: Awight)
[07:05:19] (CR) Gehel: "> Patch Set 2: Code-Review+1" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681772 (owner: Gehel)
[07:08:32] (CR) Gehel: Fix some checkstyle violations. (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681772 (owner: Gehel)
[07:12:40] Analytics: Analytics coordinator failover improvements - https://phabricator.wikimedia.org/T280905 (elukey)
[07:13:11] (CR) Awight: [C: +1] Fix some checkstyle violations. (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681772 (owner: Gehel)
[07:13:57] Analytics: Analytics coordinator failover improvements - https://phabricator.wikimedia.org/T280905 (elukey)
[07:13:59] Analytics-Clusters: Add analytics-presto.eqiad.wmnet CNAME for Presto coordinator failover - https://phabricator.wikimedia.org/T273642 (elukey)
[07:33:37] (CR) Gehel: [C: +1] "Looks good enough to be merged as a minor cleanup. Andrew: ping me if you think otherwise, or please merge (I don't have +2 rights on this" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681772 (owner: Gehel)
[07:53:00] Quarry: Query that selects multiple columns with the same name never finishes - https://phabricator.wikimedia.org/T280909 (BrandonXLF)
[08:33:13] (PS1) Awight: [WIP] Report on test coverage [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681933
[08:52:12] (CR) Thiemo Kreuz (WMDE): [C: +1] "One confusing nesting level gone. I'm a fan. \o/ Thanks!" (1 comment) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/681707 (https://phabricator.wikimedia.org/T193169) (owner: Awight)
[09:52:47] (CR) Awight: Rewrite WMDE Tech Wishes reports as native HiveQL (1 comment) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/681707 (https://phabricator.wikimedia.org/T193169) (owner: Awight)
[09:56:40] (PS3) Awight: Rewrite WMDE Tech Wishes reports as native HiveQL [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/681707 (https://phabricator.wikimedia.org/T193169)
[10:11:32] Quarry: Handle visiting non-existent query - https://phabricator.wikimedia.org/T280915 (BrandonXLF)
[10:21:56] (CR) Awight: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681933 (owner: Awight)
[10:42:06] (PS2) Awight: [WIP] Report on test coverage [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681933
[10:59:28] (PS3) Awight: [WIP] Report on test coverage [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681933
[11:12:50] (PS4) Awight: [WIP] Report on test coverage [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681933
[11:38:29] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, Readers-Web-Backlog: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (phuedx) >>! In T279382#7026277, @Jdlrobson wrote: > I honestly can't remem...
[12:07:21] (PS5) Awight: [WIP] Report on test coverage [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681933
[12:17:16] (PS6) Awight: [WIP] Report on test coverage [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681933
[12:19:02] (CR) Awight: "I ran into my own naivety about maven. It seems that if I add a report-aggregation in the root pom.xml, which executes during the `verify" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681933 (owner: Awight)
[13:05:05] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, Readers-Web-Backlog: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (Ottomata) FYI, @mforns has begun work on this I think.
[13:12:36] (CR) Ottomata: [C: +2] Fix some checkstyle violations. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681772 (owner: Gehel)
[13:13:31] (CR) Ottomata: "Should I merge this?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681773 (owner: Gehel)
[13:14:28] ottomata: yes, you can merge it
[13:15:08] (CR) Ottomata: [C: +2] Adding checkstyle configuration. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681773 (owner: Gehel)
[13:15:13] :)
[13:41:33] (PS1) Gehel: Sonarqube indexes checkstyle reports. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681965
[13:42:03] ottomata: and since you're around, another one for you ^
[13:49:00] (CR) Ottomata: [C: +2] Sonarqube indexes checkstyle reports. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681965 (owner: Gehel)
[13:49:05] ty!
[14:05:05] (PS1) Gehel: Ensure that maven site generation works. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681989
[14:05:49] ottomata: last one and then I'll stop working for real
[14:06:41] i love it
[14:07:03] wait until you see how much sonar starts complaining :)
[14:07:16] gehel: this publishes javadocs??
[14:07:35] no, there seems to be an issue with javadoc generation
[14:07:39] rats
[14:07:47] for refinery-source specifically?
[14:07:49] or in general?
[14:07:51] but if we fix it, then yes, it publishes a lot of stuff
[14:07:54] i'd love that especially for wikimedia-event-utilities
[14:07:55] for refinery
[14:08:38] that **should** already be the case for event-utilities
[14:09:11] Analytics, Analytics-Kanban, Event-Platform: Rename event_sanitized partition directories to lowercase - https://phabricator.wikimedia.org/T280813 (Ottomata) ah ha, right. The partition changes needed to be done in batches. https://issues.apache.org/jira/browse/HIVE-12077 Setting `SET hive.msck.re...
[14:09:31] Oh no, it's missing that same job
[14:10:48] I'll add it to my other CR
[14:12:07] for an example of what we publish for other projects: https://doc.wikimedia.org/wikidata-query-rdf/query-service-parent/
[14:13:23] the really interesting reports are https://doc.wikimedia.org/wikidata-query-rdf/query-service-parent/common/dependency-updates-report.html and https://doc.wikimedia.org/wikidata-query-rdf/query-service-parent/common/plugin-updates-report.html
[14:13:28] IMHO
[14:16:04] wow nice
[14:16:26] will it show duplicates too? i know we have a lot of those in refinery
[14:17:17] gehel: oo does it do scala doc too?!
[14:17:26] (CR) Ottomata: [C: +2] Ensure that maven site generation works. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681989 (owner: Gehel)
[14:18:32] Duplicates are another story :/. But I have a tool for that too
[14:19:06] I've never looked at scaladoc, but I'm sure there is a solution for it!
[14:23:38] gehel: fyi razzi might ask you for some log4j / log4j2 tips if you have some
[14:23:47] we see some annoying log rotate behavior since upgrading hive
[14:23:48] Analytics, Analytics-Kanban, Event-Platform: Rename event_sanitized partition directories to lowercase - https://phabricator.wikimedia.org/T280813 (mforns) > Setting SET hive.msck.repair.batch.size=5000; before running MSCK REPAIR TABLE seems to do the trick! 👏
[14:24:03] and we think it might have to do with the log4j2 version change? but are really not sure
[14:24:11] and both of us are totally confused when we try to understand log4j configs
[14:31:33] Happy to have a look if you point me in the right direction
[14:34:52] gehel: this is the ticket, but I wouldn't worry about it unless razzi asks you, we haven't really deeply investigated. https://phabricator.wikimedia.org/T279304
[15:30:09] Analytics, Analytics-Kanban, Dumps-Generation, Patch-For-Review: Mention QRank in “Analytics Datasets” - https://phabricator.wikimedia.org/T278416 (Ottomata) @fdans @sascha how's https://gerrit.wikimedia.org/r/c/operations/puppet/+/681994/2/modules/dumps/files/web/html/analytics_index.html look?
[17:20:48] Analytics, Analytics-Kanban, Event-Platform: Rename event_sanitized partition directories to lowercase - https://phabricator.wikimedia.org/T280813 (Ottomata) Ah, found another step. Last week, when I upgraded the Refine job, I inadvertently re-refined a lot of old data, which caused the input to e...
[17:44:20] Analytics, Analytics-Kanban, Event-Platform: Rename event_sanitized partition directories to lowercase - https://phabricator.wikimedia.org/T280813 (Ottomata) Did: `lang=scala import org.wikimedia.analytics.refinery.job.refine._ import com.github.nscala_time.time.Imports._ import org.joda.time.format...
[17:45:53] Analytics, Analytics-Kanban, Event-Platform: Rename event_sanitized partition directories to lowercase - https://phabricator.wikimedia.org/T280813 (Ottomata) Backfilling: ` sudo -u analytics kerberos-run-command analytics refine_event_sanitized_analytics_immediate --since='2021-04-15T14:00:00Z' --un...
[19:29:18] hi all! apologies for the bother on a holiday, looking for guidance here on a few topics, if anyone's about and has some time
[19:30:43] - How to copy a large number of results of a Hive query (that filters on a much larger set of data) to a table in my user database. (I kept getting Java out of heap space errors when I tried)
[19:32:15] - How to link webrequests (event logging requests to be exact, or possibly the entries in an event table) to the initial http request record for the same pageview
[19:33:13] - Suggestions for joining two large datasets on unique datapoints in each dataset
[19:33:49] Note: this is for a personal project, not work planned for the team I'm on... many thanks in advance!
[19:34:21] https://phabricator.wikimedia.org/T280478
[19:48:36] AndyRussG: hi! I've had luck copying over hive data directly, something like `create table awight.foo as select c1, c2 from event.bar where ...`
[19:48:43] Or maybe this is already what you're doing?
[19:49:34] & they say `beeline` is the recommended CLI client at the moment, so probably comes with the best JVM defaults.
[19:50:16] awight: hiiiii! thanks! yeah I tried that, on beeline, hive, spark2-sql and Jupyter notebooks..... 8p
[19:50:58] hehe I expected you were many steps ahead already
[19:55:14] AndyRussG: Maybe it's possible to add the data in batches, then? Or possibly there's an inefficiency in the query...
[20:07:21] (CR) Gehel: [C: -1] "Really good! Getting coverage info would be great!" (9 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681933 (owner: Awight)
[22:19:51] Analytics-Radar, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Performance-Team (Radar): CentralNotice banners shouldn't be served to bots - https://phabricator.wikimedia.org/T252200 (awight) Open→Invalid Unfortunately, search engines penalize this behavior, it's considered bad...
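
A rough sketch of the "add the data in batches" idea from the 19:55:14 message, in case it helps: instead of one large `create table ... as select`, generate one `INSERT INTO` statement per day and run them one at a time. This is only an illustration under assumptions that are not in the log: the source table is assumed to be partitioned by `year`, `month` and `day` columns, the target table is assumed to exist already, and `batched_inserts` is a hypothetical helper name. The table and column names (`awight.foo`, `event.bar`, `c1`, `c2`) are the placeholders from the 19:48:36 message.

```python
from datetime import date, timedelta


def batched_inserts(target, source, columns, start, end, where="TRUE"):
    """Yield one HiveQL INSERT statement per day in [start, end).

    Assumes (hypothetically) that `source` has year/month/day partition
    columns and that `target` already exists with matching columns.
    """
    day = start
    while day < end:
        yield (
            f"INSERT INTO TABLE {target} "
            f"SELECT {', '.join(columns)} FROM {source} "
            f"WHERE year = {day.year} AND month = {day.month} "
            f"AND day = {day.day} AND ({where})"
        )
        day += timedelta(days=1)


if __name__ == "__main__":
    # Print the statements so they can be reviewed before being run
    # one by one in a Hive client such as beeline.
    for stmt in batched_inserts(
        "awight.foo", "event.bar", ["c1", "c2"],
        date(2021, 4, 1), date(2021, 4, 4),
    ):
        print(stmt + ";")
```

Each statement only touches a single day's partition, so a failure can be retried for that day alone; whether this avoids the heap space errors mentioned at 19:30:43 would depend on the actual query.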