[01:15:42] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1747263 (Milimetric) The deployment of this is stalling because of two production issues that we had to jump on this week. Sorry... [05:26:49] Analytics, Analytics-Cluster: Graphoid service requests end up in the TEXT hadoop storage - https://phabricator.wikimedia.org/T116356#1747456 (Yurik) NEW a:Ottomata [05:27:15] Analytics, Analytics-Cluster: Graphoid service requests end up in the TEXT hadoop storage - https://phabricator.wikimedia.org/T116356#1747464 (Yurik) [08:38:27] Analytics-Kanban, RESTBase, Services, RESTBase-API: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1747589 (mobrovac) [08:40:52] Analytics: Publicly available view counter of videos and audio files - https://phabricator.wikimedia.org/T116363#1747593 (Mrjohncummings) NEW a:kevinator [09:24:35] (PS18) Joal: Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 [09:38:11] Analytics-Backlog, Database: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1747671 (jcrespo) ``` (in cleanup) 2015-10-23T09:36:38 Error copying rows from `log`.`MobileWebClickTracking_5929948` to `log`.`_MobileWebClickTracking_5929948_new`: DBD::mysql::st execu... [10:37:03] (PS19) Joal: Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 [11:06:11] Analytics-Backlog, Database: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1747765 (jcrespo) I've finished the alter table (CC @nuria), by doing Toku tables with its online ALTER. This may make selects a bit slower afterwards, but nothing different from what it... 
[11:07:23] Analytics: Publicly available view counter of videos and audio files - https://phabricator.wikimedia.org/T116363#1747767 (Nemo_bis) Can you clarify what you need? This was fixed months ago: https://wikitech.wikimedia.org/wiki/Analytics/Data/Mediacounts See some example data extractions for a couple WMIT pro... [11:08:18] Analytics-Backlog, Database: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1747768 (jcrespo) BTW, editCountBucket is currently NULL for all rows: ``` mysql> SELECT editCountBucket FROM MobileWebWatchlistClickTracking_10720361 LIMIT 1\G *************************... [11:17:35] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0] [11:19:25] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [11:22:59] Analytics-Backlog, Database: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1747777 (mforns) @jcrespo > I've finished the alter table Awesome, thank you! > The idea is now: > a) Setup 2 triggers per table on update and on insert that updates the new column aut... [11:44:37] joal: hello [11:44:40] re https://phabricator.wikimedia.org/T114830 [11:44:44] hi mobrovac [11:44:57] i see you moved it to the 'done' column on your porject [11:45:02] but ... it's not actually done [11:45:10] Arf, sorry [11:45:26] Will get it back in progress then [11:45:35] we yet have to settle on the per-project and per-article routes inside each domain [11:45:38] thnx! 
[11:45:48] np :) [11:46:05] It's milimetric task, I shouldn't have touched it anyway :) [11:46:09] hehehe [11:46:38] let's put that in the too-excited-to-see-it-go-live box :) [11:46:46] indeed ;) [11:47:34] thanks a lot by the way mobrovac for the job done :) [11:47:49] np, my pleasure [11:47:50] that's great [11:48:12] thank you guys for going with RESTBase [11:48:18] we're really excited about that [11:49:27] Analytics-Backlog: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1747807 (mforns) We should implement the auto-purging per schema, not per revision. Because otherwise, every time someone edits a schema, we'd revert it back to the default purging strategy. So the white-l... [11:52:14] * joal is gone, will be back in roughly two hours [12:27:29] joal: YESSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS [12:27:34] hahahaha [12:27:43] I'm gonna pop open some champagne [12:28:12] and start posting the happy news to the more internal folks that need to know [12:45:24] Analytics-Engineering, Community-Tech: [AOI] Add page view statistics to page information pages (action=info) - https://phabricator.wikimedia.org/T110147#1747864 (Milimetric) The API is launched, spec is final, we're still working on loading in all the data (had to switch schemas because it used way more s... [13:16:16] Hey milimetric :D [13:16:27] hey hey [13:16:40] * joal is very happpy :) [13:16:51] likewise man, good fight, good win [13:17:05] Yessir ! [13:17:27] we should tie up all the loose ends [13:18:49] do you have a different load job for per-article-flat? 
[13:19:08] Yes, particularly the per-article end-point which is not the real one but maybe the new one won't work better than the old one so who knows: ) [13:19:28] milimetric: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0048655-150922143436497-oozie-oozi-C/?bundle_job_id=0048654-150922143436497-oozie-oozi-B [13:19:57] btw guys, you cannot deploy your flat table change before https://github.com/wikimedia/ansible-deploy/pull/20 is merged, as the checks would fail [13:20:03] about 3/4 times faster than the previous one [13:20:57] so that's loading oct 1 - 3, new schema [13:21:13] and you're saying it's running 175% the speed of the old one, or 75%? [13:21:40] between 3 and 4 times faster than the old one (wild estimation) [13:22:07] oh! [13:22:17] I thought that 3 was divided by 4 :) [13:22:28] cool, that's good news, means we'll probably see slightly smaller data sizes [13:22:28] yes, it could have made sense :) [13:23:41] thanks mobrovac, we'll probably not be ready anyway. I'll make the PR for it and note in the commit msg that it's blocked by this one. [13:28:57] kk [13:30:10] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1747924 (Ottomata) I'd like an actual timestamp to be part of the framing for all events too. I'm all for a reqid, (although I'... [13:36:24] milimetric: with the rough estimate of already loaded data, I'd say the change in format divides the datasize by 2.5 compared to previous format [13:36:35] I'd have expected way more :( [13:36:38] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1747925 (Jgreen) >>! In T97676#1739669, @awight wrote: > Confirmed that the campaign is intact. All the pipeli... 
[13:37:00] joal: I think sadly that's what gabriel was expecting [13:37:07] ok, so time for plan B [13:37:13] yup [13:37:16] I mean, we should still do this but we need to switch off cassandra fast [13:37:25] Yes [13:37:35] on the other hand, we need to make sure that whatever we switch to will actually compress the data more [13:37:40] I'm not 100% sure druid would... [13:37:46] There still is something I don't get about the data size ... Anyway [13:37:54] although... we CAN store Druid segments in HDFS, so that's nice [13:38:05] milimetric: the cassandra setup buys us 4/5 month [13:38:09] joal: do you have a good idea of the source data size? [13:38:25] right, but we'll fill that up with just the backfilling [13:38:39] milimetric: can find the source yes [13:38:45] But I need more precision :) [13:38:54] as per what you expect :) [13:39:03] sry :) [13:39:40] what I mean is, how big is pageview_hourly for one day vs. how big is pageviews.per.article.flat for the same period? [13:40:03] wrt cassandra, you should also take into account that going with more than 800 GB per instance can cause problems [13:40:04] i know they're not comparable, I'm just wondering [13:40:07] I'll have some estimates in 5 mins [13:40:14] we are currently switching to multiple instances per hw box [13:40:18] mobrovac: wait what :) [13:40:22] haha [13:40:47] wow ... That changes the thing then [13:40:59] so we had our instances reach > 1TB which created certain problems [13:41:07] what problems [13:41:11] not to worry though - we already have a solution in the works [13:41:25] the multiple instance per box thing makes sense, we can use that I gues [13:41:32] but still, we'll run out of space sooner than later [13:41:38] so we need a backup [13:41:59] do you have a growth estimate? [13:42:15] we're still getting that as we generate the new flat structure [13:42:16] as in how much new data per day/month you plan on adding? 
[13:42:31] joal: wait, you have intermediary files generated by the load, right, so I guess the size of those would be more relevant than pageview_hourly [13:42:46] that's what I 'm looking at milimetric [13:43:04] mobrovac: the old structure was almost 200 Gigs per day!! [13:43:20] something doesn't seem right... [13:43:43] how could that be, that seems way big [13:44:03] omg [13:44:21] actually, because of the diversity of the fields, compression couldn't do much [13:44:30] your new flat structure is way better [13:44:37] yeah, agreed [13:44:44] milimetric: between 220M and 250M gzipped compress per hour [13:44:56] joal: the source data? [13:45:13] so we're getting a factor of 1000 increase in data size just from loading it in... [13:45:16] wow [13:45:21] milimetric: plus between 2.2G and 2.5G gzipped compress per day [13:45:29] oh per hour, sorry [13:45:38] so factor of 100 [13:45:54] more or less, something between 50 and 100 [13:46:02] so 8.5G gzipped compress per day [13:46:12] ok, let's see what the new load looks like after it finishes compressing and compacting and everything [13:46:30] wait ... what? How is 8.5 between 2.2 and 2.5? [13:46:39] ohohoh! 
[13:46:40] yeah, but cassandra has a hard time compressing / compacting due to the load we put on it [13:46:51] 2.5 + 220 * 24 [13:47:04] 0.25G * 24 + 2.5 [13:47:18] right, so i mean let's load these 3 days, and then wait to let it take a breath and finish compressing [13:47:31] it's ok if we have to load slowly, but it's not ok if no matter what we do we still run out of space [13:47:43] in that case we need to switch gears crazy fast [13:48:08] It'll probably have finished backfill october by the end of the weekend, and hopefully will do a good clean up after the match :) [13:48:19] ok, so if we had 200G per day before, 200 / 8.5 = 23 [13:48:30] so a factor of 23 explosion when we unzip and load [13:48:44] Let me try something milimetric [13:50:28] Unzipping an example file gives me a factor 10 [13:50:31] milimetric: --^ [13:50:36] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1747951 (mobrovac) >>! In T116247#1747924, @Ottomata wrote: > I'd like an actual timestamp to be part of the framing for all eve... [13:50:41] from 82M to 777M [13:51:02] aha... so the world has stopped making sense [13:51:12] I'm gonna go stare at the ceiling and eat butter [13:51:15] it has for as well :( [13:53:53] milimetric: t [13:54:18] there is a way to earn another ~30% on data size [13:54:30] But it's even more tricky [13:54:41] milimetric: http://blog.librato.com/posts/cassandra-compact-storage [13:55:15] milimetric: as you said, let's wait and see if cassandra compression / compaction helps [13:56:16] milimetric: come back from looking at your ceiling ! I need you ;) [14:10:33] (I'll finish breakfast and read that post) [14:14:52] note the sentence "As the following shows, you can’t create more than one column past the primary key. 
" [14:15:07] Yes I have seen that mobrovac [14:15:08] Analytics: Publicly available view counter of videos and audio files - https://phabricator.wikimedia.org/T116363#1747974 (Sadads) @Nemo_bis - there is no easy way to get at this data through either the commons interface or a tool, I think what he is requesting is a tool or visible stat that helps improve the... [14:15:30] If we go down that path mobrovac, we'll seroalize the data json style [14:15:48] I don't know if it'll be sufficient though [14:18:09] yeah, you're kind of in a strange pickle here [14:18:39] with the old-style tables, you had a lot of the same column values, but they were too short to be significant for compression [14:19:01] on the other hand, with the flat structure you have the problem that there are a lot of numbers there [14:19:08] so again not so good for compression [14:22:28] Analytics: Publicly available view counter of videos and audio files - https://phabricator.wikimedia.org/T116363#1747991 (Mrjohncummings) @sadads Yes this is exactly it, something that is easy to use and can provide metrics for both individual files and collections of files e.g a Commons categories [14:28:37] Analytics: Publicly available view counter of videos and audio files - https://phabricator.wikimedia.org/T116363#1747997 (Sadads) @Mrjohncummings you might want to update your request then, to reflect the need for a tool or more transparent interface for this data. [14:32:46] Analytics: Publicly available and easy to use metrics for plays of video and audio files. - https://phabricator.wikimedia.org/T116363#1748002 (Mrjohncummings) [14:34:51] Analytics: Publicly available and easy to use metrics for plays of video and audio files. - https://phabricator.wikimedia.org/T116363#1748006 (Mrjohncummings) @Sadads done :) Apologies to analytics team if it is still not clear, happy to discuss further. 
[14:35:11] ok, skimmed the article [14:35:20] I'm not super excited about squeezing a little bit more space [14:35:34] milimetric: Yeah, I can hear that [14:35:44] Unless we somehow magically found a factor of 100 increase, we're still going to have to switch [14:35:55] I agree [14:36:09] that's not an exaggeration, we need at least 100 times more space [14:36:13] milimetric: I'd like to test druid and ES [14:36:32] yep, we can compare them both for the same exact amount of dat [14:36:34] *data [14:36:40] if we get 10, it's already not so bad, but a 100 would be better, yes [14:36:50] 10 would not be bad for this year and the next maybe [14:37:00] but this isn't supposed to be obsolete in one year... [14:37:08] and we still want to load 8 years of historical data too [14:37:09] milimetric: I suggest we go 1 full day: 24h + 1D (not relevant for druid) [14:37:22] yes milimetric [14:37:39] you mean test 1 full day, hourly and daily, in terms of size across the 3 different solutions? C*, Druid, ES? [14:37:47] milimetric: yessir [14:37:51] k, agreed [14:38:01] the concern is for per-article stuff, the rest is peanuts [14:38:27] as far as different schemas, I think one easy solution for us would be to checksum all the keys and store that checksum and the viewcounts in one array [14:38:44] so when someone queries, we just checksum their parameters, look for a match and retrieve the right count from the array [14:39:08] so the schema would just be (checksum, view_count_array), or (c, a) :) [14:39:15] milimetric - can be done [14:39:29] that's if we're trying to get extreme and we don't have another solution in place by the time we run out of space again. [14:39:44] I'm not sure this would earn a lot though, cause checksum doesn't deal well with compaction [14:39:57] compression not compaction [14:40:32] yeah, you might be right [14:41:24] anyway, let's copy those 8G of data somewhere in labs and load it up in Druid and ES. 
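[Editor's note: milimetric's (checksum, view_count_array) idea above can be sketched in a few lines. This is a hypothetical illustration of the scheme as floated in the channel, not code that exists anywhere; the key layout, dimension names, and hash choice are all assumptions.]

```python
import hashlib

def row_key(project, article, granularity, timestamp):
    """Checksum all the key dimensions into a single opaque row key."""
    raw = "|".join((project, article, granularity, timestamp))
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()

# Writing: store only (checksum, view_count_array) per row.
table = {}
table[row_key("en.wikipedia", "Foobar", "hourly", "2015102300")] = [120, 98, 455]

# Reading: re-checksum the caller's parameters and look the counts up.
counts = table[row_key("en.wikipedia", "Foobar", "hourly", "2015102300")]
print(counts)  # [120, 98, 455]
```

[As joal notes right after, an opaque checksum column replaces highly repetitive key text with incompressible hash bytes, so the on-disk win is doubtful.]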
I'll set up a druid cluster with their new install instructions and a schema for per-article. [14:41:31] milimetric: I start by generating the test dataset [14:41:44] isn't it already generated for cassandra loading? [14:41:56] After, ES test can be done using docker I think [14:42:13] oh i'll check for a druid docker too [14:42:22] milimetric: deleted straight after successful load, to prevent hdfs filled up [14:42:44] ottomata: you around ? [14:42:59] oh ok [14:43:12] ottomata: could it be possible to install docker on an1004 ? [14:46:09] :o [14:48:09] joal: yes, but i would not want to do it without asking [14:48:32] joal: for druid? [14:48:45] ottomata: for druid and / or ES [14:48:52] test data sizes [14:48:55] Analytics-Tech-community-metrics, DevRel-October-2015: Affiliations and country of resident should be visible in Korma's user profiles - https://phabricator.wikimedia.org/T112528#1748039 (Aklapper) > hard and impossible to verify that such data is correct. I support correcting affiliation but country sou... [14:48:56] aye [14:50:26] So, ticket I guess ? How long do you tink it could ottomata ? [14:53:24] if ops people are cool, it'd be pretty easy [14:53:28] there is a docker package in trusty [14:53:33] yup [14:53:47] I mean, you are ops, and you're cool, sooo ;) [14:53:49] so ja, ticket, and i think i'll try to make the ops meeting on monday [14:53:57] and make them discuss [14:54:17] in the ticket, make sure you explain why you can't use labs [14:55:22] hm, we can use la [14:55:30] but data size is kinda big [14:55:36] Ok, I'll explain :) [14:55:56] I'll assign it to you, which project should I use? [14:56:04] ottomata: --^ [14:56:19] wait, so, why do we need to use prod for this? 
[14:56:32] we're just trying to test the size of the loaded data [14:56:41] so we can copy one day of data out to labs, it's all public anyway [14:56:50] and load it into ES and Druid [14:56:55] yes, it's about 10G [14:57:01] so, that's fine [14:57:09] ok :) [14:57:17] ottomata: sorry for bothering :) [14:57:25] Analytics-Engineering, operations: Install docker on analytics1004.eqiad.wmne - https://phabricator.wikimedia.org/T116383#1748053 (Dzahn) [14:57:46] joal: ottomata: there's a labs NFS -> production thing possible, right? [14:57:48] we can do it that way [14:58:18] milimetric: I think there is folder, yes, but can't remember which one [14:58:38] hm, um i think we made it possible to rsync stuff to labs [14:58:41] it's a thing you have to enable in puppet somehow I think [14:58:42] from prod [14:58:54] yeah, ellery needed it for translation recommender tihngy [14:59:00] but not sure how that's set up [14:59:34] https://wikitech.wikimedia.org/wiki/Analytics/FAQ#How_do_I_transfer_public_files_from_stat_boxes_to_labs.3F [14:59:51] not working yet! yes it is..i think [15:00:36] Analytics-Engineering, operations: Install docker on analytics1004.eqiad.wmne - https://phabricator.wikimedia.org/T116383#1748059 (Krenair) [15:02:24] joal: milimetric Congrats!! :) [15:02:36] anyway, joal, if you work on setting up ES I'll do druid [15:02:44] madhuvishy: yeah!!! :D [15:02:46] What's ES [15:02:50] elastic [15:02:55] Aah [15:03:07] (we're already out of space, the celebration didn't last long0 [15:03:09] ) [15:03:16] Ya ya I read the whole thread [15:03:18] ay... [15:04:07] Analytics-Engineering, operations: Install docker on analytics1004.eqiad.wmne - https://phabricator.wikimedia.org/T116383#1748064 (JAllemandou) Hi Guys, I was thinking to use a spare to test loading a 10G dataset into both Druid and ES, using docker to set up the services. Given the reasonnable size of the... 
[15:04:32] Thanks madhu :) [15:04:38] milimetric: either :) [15:07:53] Analytics, Analytics-Cluster: Graphoid service requests end up in the TEXT hadoop storage - https://phabricator.wikimedia.org/T116356#1748067 (Ottomata) No, it is per varnish cluster. I'd like to expand webrequest_source to many more partition types than it currently has, but at the moment doing so requ... [15:07:59] joal: can you make the cassandra meeting today or not really? [15:08:08] Analytics-Backlog, Analytics-Cluster: Graphoid service requests end up in the TEXT hadoop storage - https://phabricator.wikimedia.org/T116356#1748081 (Ottomata) [15:08:25] nuria: it's at my 1-1 time, and we already killed the previous one :) [15:08:36] nuria: feasible to cancel it again though [15:08:46] joal: nah, 1 on1s are important [15:08:53] joal: let's see if we can move it [15:08:57] k [15:09:54] done [15:10:27] ottomata, hi, around? should we adjust hadoop collection for graphoid? it goes into 'text'/ [15:10:29] thx nuria :) [15:11:24] hey, did you guys register a 'Wikimedia' account on Docker Hub back in March? [15:11:55] milimetric: ^ [15:12:02] ori: doesn't ring any bell to me [15:12:18] ori sorry I forgot to answer that, I saw you asking last time [15:12:28] I wanted to check with others, but yeah, none of us know about it [15:12:39] darn. thanks for checking, though. [15:12:57] we played with docker and decided that all the packages are woefully out of date. So it's a great tool if you want outdated software [15:12:58] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1748095 (Ottomata) > So the producer would store the same time stamp twice? UUID v1 already contains it. Could you provide an ex... [15:13:19] milimetric: I'd mitigate that opinion :) [15:13:35] with "it's easy to use?" 
:) [15:14:08] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1748102 (Ottomata) > topics named something like mw-edit and mw-edit-private perhaps (where the latter contains this extra info)... [15:14:09] well, for pre-loaded stuff, it is, and the out-of-date is not completely true [15:14:16] depends on software I guess [15:14:25] Analytics: create tool to crunch metrics for views (play started) of video and audio files - https://phabricator.wikimedia.org/T116363#1748105 (jeremyb-phone) [15:16:26] Analytics: create tool to crunch metrics for views (play started) of video and audio files - https://phabricator.wikimedia.org/T116363#1748108 (jeremyb-phone) is there a phabricator project like "tool requests"? does analytics use workboards? (@aklapper) [15:18:15] Analytics-Backlog, Analytics-Cluster: Graphoid service requests end up in the TEXT hadoop storage - https://phabricator.wikimedia.org/T116356#1748118 (Yurik) @ottomata, the graphs are coming from the sca cluster, and exposed as two endpoints: https://graphoid.wikimedia.org/www.mediawiki.org/v1/png/Extens... [15:19:53] Analytics-Backlog, Analytics-Cluster: Graphoid service requests end up in the TEXT hadoop storage - https://phabricator.wikimedia.org/T116356#1748124 (Ottomata) What varnishes serve that first endpoint? [15:20:29] Analytics-Tech-community-metrics, DevRel-October-2015: Affiliations and country of resident should be visible in Korma's user profiles - https://phabricator.wikimedia.org/T112528#1748125 (Qgil) >>! In T112528#1748039, @Aklapper wrote: >> hard and impossible to verify that such data is correct. > > I supp... 
[15:22:56] (CR) Ottomata: Update bin/camus to include CamusPartitionChecker (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/248048 (https://phabricator.wikimedia.org/T113252) (owner: Joal) [15:23:05] joal: why java -jar instead of hadoop jar [15:23:08] or...spark-submit [15:23:14] to launch the checker? [15:23:25] milimetric: ottomata it's not a hadoop job :) [15:23:32] its not!? :o [15:23:32] oops sorry milimetric [15:23:36] :) [15:23:42] i thought it was spark, its just scala? [15:23:48] It pure scala [15:23:50] nice! [15:23:51] ha, ok [15:23:59] makes sense, its just looking at a file [15:24:00] That's why :) [15:24:04] right [15:24:06] why involve big ol hadoop [15:24:26] for conf, for instance, but still prefer to keep it separate :) [15:24:30] cool [15:24:38] my other q that I commented then [15:24:40] HMM [15:24:41] well [15:24:42] hm [15:24:46] ok 2 thoughts [15:24:59] 1, i bet you can get classpath that you need from an env vars somehow [15:25:43] e.g. by running /usr/lib/hadoop/libexec/hadoop-config.sh [15:25:47] sourcing it [15:26:16] milimetric, joal: if all you care about is compression, then moving to deflate and larger windows is an option too [15:26:35] and [15:26:35] makes sense ottomata [15:26:41] /usr/lib/spark/bin/compute-classpath.sh [15:26:57] or, adding nodes / reducing the replication factor [15:27:39] joal: milimetric, i think we can grant you one more node of the same type, MAYBE 2 if we don't want a new stat box [15:27:39] gwicke: the main point is, can we do anything to store this data in at least 10 times less space? 
[15:27:51] if the answer is no, then we have to look for an alternative anyway [15:28:25] gwicke: maybe we won't find better alternatives :) [15:28:27] preferably 100 times less space if we want any kind of historical data [15:28:32] that's also possible, sure [15:28:50] Looking for is the first step :) [15:29:09] 3 is doable by reducing replication from 3 to 1 [15:29:30] gwicke: yes, but if a node crashes, data's gone :) [15:29:48] another 2x or *might* be doable with deflate [15:29:51] gwicke: we can recompute and reload, but it's not free :) [15:30:16] but, 100x might be a bit far [15:30:38] this isn't very helpful, but there is about 750G of unallocated space on the volume group that cass is using [15:30:41] you might have to use very optimized storage schemes like delta encoding for that [15:30:47] so we could get you +750G on each of those nodes [15:31:09] even with that I'm pessimistic that it'd give you much over compression [15:31:43] how much data have we loaded thus far? [15:32:10] (PS3) Ottomata: Update oozie diagram to reflect current status [analytics/refinery] - https://gerrit.wikimedia.org/r/248041 (https://phabricator.wikimedia.org/T115993) (owner: Joal) [15:32:16] (CR) Ottomata: [C: 2] Update oozie diagram to reflect current status [analytics/refinery] - https://gerrit.wikimedia.org/r/248041 (https://phabricator.wikimedia.org/T115993) (owner: Joal) [15:32:22] (CR) Ottomata: [V: 2] Update oozie diagram to reflect current status [analytics/refinery] - https://gerrit.wikimedia.org/r/248041 (https://phabricator.wikimedia.org/T115993) (owner: Joal) [15:32:33] nuria: about 1.3T [15:33:08] gwicke: sorry, i was asking how much of our data is loaded though like a month of pageviews? cc milimetric [15:33:12] nuria: btw, i let you review and merge joal's cassandra refinery changes [15:34:06] ottomata: i was looking at partition checker, are you merging that one? 
[15:34:07] nuria: we have about 1/10th of a month in pageview_per_article, and 1 day+ in pageviews_per_article_flat [15:34:18] joal: going from 3x to 2x replication might be a good compromise, though [15:34:29] Analytics-Backlog, Analytics-Cluster: Graphoid service requests end up in the TEXT hadoop storage - https://phabricator.wikimedia.org/T116356#1748152 (Yurik) Might need to ask @mobrovac or @gwicke about that, but I have moved everything to the second endpoint a few days ago, so it is no longer being used. [15:34:32] nuria: i can review those ones, ja [15:34:33] if you do the others [15:34:35] sorry nuria, 1/3 of a month [15:34:50] gwicke: much agreed [15:34:51] but that is like "saving money on bird food" gwicke [15:35:39] if 1/3 of a month is 1.3T [15:35:50] so ottomata, you suggest using hadoop and spark classpath definers, right [15:35:57] ottomata: anything else ? [15:35:59] a reduction of 1/3 is not going to get us very far [15:36:06] joal: ja better to use those to avoid hardcoding the CPs [15:36:19] ottomata: makes sense :) [15:36:22] i think if you just source those files (and set JAVA_HOME first?) in the script or the command maybe [15:36:25] you can avoid that [15:36:32] ottomata: no parameter for classpath then, ok ? [15:36:38] ja, i think is fine. [15:36:49] ok [15:36:49] We'll have a team pow wow and update you with our thoughts cc gwicke [15:36:56] joal: if we want that later, we'll add it, and maybe make it overridable via env var instead [15:36:57] but ja [15:37:12] With the spark one, it's trickier: prints the spark classpath on stdout :) [15:37:36] ok ottomata, anything else for me to change while I'm at it ? [15:39:39] joal: i think not, just the comment about refinery-job jar or camus-job jar :) [15:42:10] milimetric: did i lose it, or is this 100 times more data than this erik z. has on wikistats using way less space [15:42:40] the difference is the hourly resolution [15:43:00] hmm, joal [15:43:01] idea. 
[15:43:04] wikistats has much higher granularity [15:43:05] for sure milimetric and nuria --> wikistats has monthly resolution [15:43:10] it would be really nice to be able to run checker in dry run mode [15:43:16] to ask it if a partition is good [15:43:17] no there are some daily pageviews in some cases [15:43:19] without actually writing a fiel [15:43:20] file [15:43:21] but with fewer dimensions [15:43:33] k milimetric [15:43:53] but still right? From erik's e-mail [15:43:58] https://www.irccloud.com/pastebin/7HDs8TaF/ [15:43:58] ottomata: hm, what would you expect as an outcome of a checker ? [15:44:35] those are daily aggregates, few dimensions sure but sizes are orders of magnitude lower [15:44:42] cc milimetric [15:44:49] milimetric, nuria : wikistats don't handle every pageview, right ? [15:44:52] only top ? [15:45:17] afaik it's every page view, but doesn't include mobile [15:45:31] it's complicated :) [15:45:43] the main point is, it's not hourly and it doesn't include all the dimensions we are [15:45:46] ottomata: throw an exception if the checker fails (ok), plus some log about import file would have been created ? [15:46:16] however, that said, cassandra is adding size on *top* of unzipped data [15:46:18] so that's bad [15:46:30] anyway, I have druid up and I'm writing a loading task now [15:46:35] so we'll know soon enough if that's better [15:47:00] milimetric: have you understood how to copy data to labs ? 
[15:47:16] milimetric: I don't think that's correct; compression is on the entire sstable [15:47:28] which includes identifiers [15:47:35] the main point is, it's not hourly and it doesn't include all the dimensions we are [15:47:54] yeah, that would explain most of it [15:48:25] anyhow more dimensions and granularity isn't necessarily something to boast about -- the gold standard is the system that is already running and has generated tremendous value for users on modest hardware requirements [15:48:31] joal: let me know when you and ottomata are done looking ta the partition checker [15:48:48] k nuria [15:48:49] milimetric: it would be good if you did a calculation of the raw expected data size, assuming binary encoded ints or the like [15:48:56] Analytics-Backlog, Analytics-Cluster: Refactor webrequest_source partitions and oozie jobs - https://phabricator.wikimedia.org/T116387#1748209 (Ottomata) NEW [15:49:03] gwicke: the zipped data per day is 8GB [15:49:05] Analytics-Backlog, Analytics-Cluster: Refactor webrequest_source partitions and oozie jobs - https://phabricator.wikimedia.org/T116387#1748224 (Ottomata) [15:49:06] Analytics-Cluster: Make varnishkafka produce using dynamic topics - https://phabricator.wikimedia.org/T108379#1748225 (Ottomata) [15:49:21] I'd say it's reasonable to expect about the same or better from whatever solution we use [15:49:27] zipped means compressed? [15:49:37] gzipped, yea, like the raw textual data [15:49:47] gwicke: TSV in flat format, gziped [15:50:05] is it getting gzipped again by cassandra? 
[15:50:08] that would add overhead [15:50:10] I don't think your expectations are realistic [15:50:35] no, ori, we're loading in the raw data and letting cassandra do whatever it can to reduce size [15:50:42] gzip isn't so bad at compressing highly regular data like numbers [15:51:00] ottomata: gwicke using column oriented storage might change the thing though [15:51:03] Analytics-Backlog, Analytics-Cluster: Graphoid service requests end up in the TEXT hadoop storage - https://phabricator.wikimedia.org/T116356#1748235 (Yurik) Open>Resolved Update; Per @gwicke, its the parsoid varnish - map to 208.80.154.248 [15:51:15] Analytics-Backlog, Analytics-Cluster: Refactor webrequest_source partitions and oozie jobs - https://phabricator.wikimedia.org/T116387#1748240 (Yurik) [15:51:16] Analytics-Backlog, Analytics-Cluster: Graphoid service requests end up in the TEXT hadoop storage - https://phabricator.wikimedia.org/T116356#1748239 (Yurik) [15:51:48] ottomata: batcave before standup about partition checker dryrun ? [15:52:04] joal: sorry was making a ticket [15:52:07] umm, no real quick [15:52:07] i mean [15:52:09] np ottomata [15:52:17] i think it can just yeah, print out a message about success or failure [15:52:23] OO [15:52:26] MAYBE [15:52:29] maybe this isn't for this task. [15:52:42] but it could print out the statuses in a nicer way than hdfs dfs -cat of those offset files [15:52:42] :) [15:52:46] if you want it to [15:52:48] as another mode [15:52:48] currently, you are using LZ4 compression in Cassandra, and are getting a compression ratio of around 28% [15:52:53] buuuut, maybe not for this task :) [15:53:11] ottomata: makes sense, agreed for a new task ! [15:53:15] Can you create that ? 
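[Editor's note: ori's point that gzip handles highly regular numeric data reasonably well is easy to sanity-check. The sample below is synthetic (random counts in an assumed flat TSV layout), so it compresses worse than real, heavily skewed pageview data, which the channel pegs at roughly 10x.]

```python
import gzip
import random

# Build a synthetic TSV resembling the flat per-article layout discussed:
# project, article, then 24 hourly view counts per row.
random.seed(0)
rows = []
for i in range(10_000):
    counts = [str(random.randint(0, 500)) for _ in range(24)]
    rows.append("en.wikipedia\tArticle_%d\t%s" % (i, "\t".join(counts)))
raw = "\n".join(rows).encode("utf-8")

compressed = gzip.compress(raw)
print(len(compressed) / len(raw))  # well below 1.0 even on random counts
```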
[15:53:22] gwicke: that doesn't seem to make sense with the numbers we're seeing, so I'm open to the possibility of us not understanding something [15:53:33] joal: i don't want to, let's just keep in our minds as a nice thing to have [15:53:37] just an idea [15:53:38] so we have 8GB gzipped, 80GB raw TSV textual, and 200GB loaded into C* [15:53:42] milimetric: what's the raw data size? [15:53:47] joal: uh, but for this task [15:53:52] as in binary ints, not compression? [15:53:56] yeah, make dry run output whether the partition is complete [15:53:57] *no [15:54:01] and where it would have written the file if so [15:54:13] k ottomata [15:54:16] gwicke: not sure about binary... we only know the raw TSV textual [15:54:38] milimetric: is that '200GB' with replication? or without? [15:54:46] ottomata: I believe with [15:54:49] oh so that makes sense [15:55:02] also - I should've said this *way* earlier but I don't think replication makes any sense here [15:55:16] it's no disaster if we lose data, we can just fill from the source where it's redundant in HDFS [15:55:21] oof [15:55:22] milimetric: really ? at least 2, no ? [15:55:24] yeah but it would be painful [15:55:25] at least 2. [15:55:33] i mean, i disagree [15:55:42] how hard would it be for you to switch to daily resolution for now? [15:55:43] we wouldn't lose *all* the data, right? [15:56:04] milimetric: we would lose 1/3 of the data, roughly :) [15:56:29] daily resolution would probably be ok for most of the use cases, but that still doesn't constitute an improvement. Currently people have hourly resolution on stats.grok.se with a mysql db!!! [15:56:30] ahem, an endpoint that needs to be available 100% w/o replication is not possible [15:56:56] agree with milimetric on that cassandra has to MAJORLY improve [15:57:02] nuria: but the replication is there, it's just offline. And this doesn't need to be up 100%, it can take a few days of downtime, no?
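The sizes quoted above can be sanity-checked with quick arithmetic. Note the replication factor of 3 is an assumption inferred from the "lose 1/3 of the data" remark, not a confirmed setting:

```python
raw_tsv_gb = 80      # raw textual TSV for one day (from the discussion)
on_disk_gb = 200     # observed Cassandra footprint for that day
replication = 3      # assumed replication factor, not confirmed

per_replica_gb = on_disk_gb / replication      # data unique to each copy
effective_ratio = per_replica_gb / raw_tsv_gb  # vs. the raw TSV size

print(f"~{per_replica_gb:.0f}GB per replica, "
      f"{effective_ratio:.0%} of raw TSV size")
```

If that assumption holds, each copy is roughly 67GB, about 83% of the raw TSV, which is why the 200GB figure "makes sense" once replication is factored in, even though it looks alarming next to the 8GB gzipped file.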
[15:57:11] current situation with mysql or wikistats [15:57:26] nuria: but the sizes are reasonable if you factor in replication factor [15:57:47] I think one major advantage with Druid is that it can store on HDFS directly, so we wouldn't have either space limitations or replication issues [15:57:48] milimetric: stats.grok.se is daily, not hourly, right ? [15:57:58] to me it looks daily [15:58:00] http://stats.grok.se/en/201510/Foobar [15:58:01] we could probably even go to Druid as a replacement for pageview_hourly actually, that'd be the biggest win [15:58:23] joal: hm... it ingests hourly data, and I know it's possible to query hourly via their rest thing [15:58:34] "rest" used loosely [15:58:38] milimetric: that's for our talk in 2hours ;-P [16:00:59] joal: right, let's make meeting 1 hour [16:02:24] milimetric: omw, sorry [16:04:03] Analytics-Kanban, Patch-For-Review: EventLogging mysql consumer cannot insert events that have a nested json schema that includes a plain "array" {oryx} [5 pts] - https://phabricator.wikimedia.org/T116241#1748255 (Milimetric) [16:06:51] Analytics-Tech-community-metrics, Developer-Relations, DevRel-October-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1748264 (Aklapper) p:Normal>High [16:10:17] ottomata: saw your comments, so i can assume that the logging and email template exist? no need to even ensure? - the assumption is, if the package is installed these things exist [16:10:21] correct? [16:11:02] right. 
[16:11:24] Analytics-Backlog, MediaWiki-API, Reading-Infrastructure-Team, Research-and-Data, Patch-For-Review: Publish detailed Action API request information to Hadoop - https://phabricator.wikimedia.org/T108618#1748281 (bd808) [16:12:15] Analytics-Kanban, Traffic, operations, Patch-For-Review: Flag in x-analytics in varnish any request that comes with no cookies whatsoever {bear} [5 pts] - https://phabricator.wikimedia.org/T114370#1748283 (Nuria) Open>Resolved [16:12:59] Analytics-Kanban, Patch-For-Review, WMF-deploy-2015-10-27_(1.27.0-wmf.4): Incident: EventLogging mysql consumer stopped consuming from kafka {oryx} - https://phabricator.wikimedia.org/T115667#1748285 (Nuria) [16:13:00] Analytics-Kanban, Patch-For-Review: EventLogging mysql consumer cannot insert events that have a nested json schema that includes a plain "array" {oryx} [5 pts] - https://phabricator.wikimedia.org/T116241#1748284 (Nuria) Open>Resolved [16:13:16] Analytics-Backlog, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Add MAPJOIN configuration so in memory mapjoins happen by default in hive [1] - https://phabricator.wikimedia.org/T116202#1748286 (Nuria) Open>Resolved [16:16:21] ottomata: the burrow.yaml file with the consumer_groups list doesn't get picked up :P [16:18:11] picked up where? ja you'd need to include the role class via the role keyword [16:18:17] which you can't do in labs, unless you hardcode a node def [16:18:19] which you can do! [16:18:25] HM [16:18:25] i think [16:18:35] actually, i have no idea how that works. [16:18:41] the role hiera backend is a mystery to me [16:18:47] hmmm [16:18:53] i know in prod [16:18:56] you don't get those [16:19:00] unless your class is included like [16:19:02] i included the role class using wikitech [16:19:03] role analytics::burrow [16:19:05] right [16:19:12] which seems to be fine [16:19:15] because [16:19:21] https://wikitech.wikimedia.org/wiki/Hiera:Analytics [16:19:27] everything here shows up [16:19:30] oh! 
[16:19:31] great [16:19:34] right make it [16:19:37] burrow::consumer_groups [16:19:38] right? [16:19:44] you are setting the var on the module, not the role class [16:20:20] yeah, i'll change that, and maybe it'll show up - but shouldn't it be looked up in hieradata/role/common/analytics/burrow.yaml where it exists? [16:27:10] Analytics-Tech-community-metrics, DevRel-October-2015: Backlogs of open changesets by affiliation - https://phabricator.wikimedia.org/T113719#1748316 (Aklapper) I'm not sure how much the proposed approach will scale if more affiliations/companies were involved. Do we really want to click through all backl... [16:28:07] ottomata: nope, doesn't get picked up from wikitech [16:28:26] just the consumer_groups [16:28:41] everything else is fine [16:29:46] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0] [16:30:15] hm [16:32:09] hm, madhuvishy, maybe it doesn't work if you specify other params too [16:32:11] not sure about that [16:32:19] ottomata: possible [16:32:44] i think, whatever i'm trying to specify for the burrow class through hiera doesn't show up [16:33:02] the ones included from role::analytics::kafka::config are fine [16:38:22] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1748329 (JanZerebecki) If we offer public access to the public events of the past we need to rewrite them according to new event... [16:42:54] ottomata: yup, tried overriding httpserver_port - same thing [16:44:34] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [16:47:42] madhuvishy: can you make a simple test?
if you have a class with 2 parameters, [16:47:50] can you get those parameters filled in if you don't manually specify them [16:47:51] and only do [16:47:59] include [16:48:07] via [16:48:17] ::param: in hiera? [16:48:31] a-team: humble suggestion, let's pair program on things like config files for this data store investigation: https://ide.c9.io/milimetric/workspace [16:48:38] (as we try druid / es / etc.) [16:48:56] I started work on the druid indexer json spec there [16:49:24] ottomata: hmmm, okay [16:50:21] joal: before you're done for the day, leave me the path where I can find 2015-10-01 data (the 8GB we were talking about) [16:50:30] I'll try to load it asap [16:50:32] sure milimetric [16:52:10] Analytics, MediaWiki-extensions-Gadgets: Gadget usage statistics - https://phabricator.wikimedia.org/T21288#1748387 (kaldari) [16:52:30] Analytics, MediaWiki-extensions-Gadgets: Gadget usage statistics - https://phabricator.wikimedia.org/T21288#240606 (kaldari) [16:52:34] Analytics, operations: removed user handrade from access - https://phabricator.wikimedia.org/T114427#1748392 (RobH) Open>Resolved [16:53:41] Analytics-Kanban, RESTBase, Services, RESTBase-API: configure RESTBase pageview proxy to Analytics' cluster {slug} [34 pts] - https://phabricator.wikimedia.org/T114830#1748399 (Milimetric) [16:53:56] (PS6) OliverKeyes: Functions for identifying search engines as referers. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919) [16:54:00] (CR) OliverKeyes: Functions for identifying search engines as referers.
(7 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919) (owner: OliverKeyes) [17:01:18] (PS6) Mforns: Add oozie job to compute browser usage reports [analytics/refinery] - https://gerrit.wikimedia.org/r/246851 (https://phabricator.wikimedia.org/T88504) [17:02:21] milimetric: the hourly computation is almost done (hour 22 going), then doing daily [17:02:23] (CR) Mforns: Add oozie job to compute browser usage reports (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/246851 (https://phabricator.wikimedia.org/T88504) (owner: Mforns) [17:02:45] ottomata: coming to pageview api meeting? [17:02:46] milimetric: the path is: /user/joal/test_per_article/ [17:02:50] madhuvishy: hola? [17:03:19] nuria: yup! [17:03:52] oh [17:03:53] meeting [17:03:54] forgot [17:04:21] ottomata: holaaaa [17:05:23] nuria hello [17:07:34] ottomata: there's a cassandra/druid capacity meeting now, are you joining in? [17:07:55] oh! [17:07:58] there is? [17:07:59] uh [17:08:09] i am about to die of hunger and my friend is asking me to go get lunch [17:08:11] its not on my cal! [17:08:13] wasn't ready [17:08:15] do I need to be there [17:08:16] ? [17:08:17] you should go! [17:08:44] i don't think so, we are figuring out how much space each thing takes, etc [17:11:45] ok cool [17:11:49] getting lunch! [17:19:18] Analytics-Tech-community-metrics, Phabricator, DevRel-November-2015: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#1748539 (Aklapper) >>! In T107254#1524708, @chasemp wrote: > I mean in theory we could do this I think as we have the data: > > `... [17:20:22] Analytics-Tech-community-metrics, Phabricator, DevRel-November-2015: Closed tickets in Bugzilla migrated without closing event? 
- https://phabricator.wikimedia.org/T107254#1748547 (Aklapper) Summarizing issues brought up here: * Older comments in tasks are confusing when they talk about status changes... [17:24:19] Analytics-Tech-community-metrics, DevRel-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1748586 (Aklapper) [17:39:54] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1748667 (GWicke) @ottomata, UUIDs are described in https://en.wikipedia.org/wiki/Universally_unique_identifier. An example for a... [17:40:32] Analytics-Tech-community-metrics, DevRel-October-2015: Affiliations and country of resident should be visible in Korma's user profiles - https://phabricator.wikimedia.org/T112528#1748669 (Aklapper) >>! In T112528#1748125, @Qgil wrote: > I meant impossible to verify by the own users. If I want to check whe... [17:47:14] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1748732 (GWicke) @JanZerebecki: Suppression information would indeed be needed for public access to older events. One option wou... [17:47:34] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0] [17:49:32] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [18:02:05] Analytics-Tech-community-metrics, Phabricator, DevRel-November-2015: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#1748811 (Aklapper) >>! In T107254#1748539, @Aklapper wrote: > So I'm not even sure **if** we would have the data to "fix" this task... 
[18:10:10] need to take a short break, will create tasks in a bit cc milimetric joal madhuvishy mforns_gym [18:10:37] I gotta eat lunch and stuff - I'll do the queries and I'll ping you West Coasters if you wanna pair up on loading druid? [18:13:13] milimetric: sure [18:13:35] i'm super interested, but sad i'll not be fully around next week [18:13:58] madhuvishy: it's ok, we can get a lot done today hopefully [18:14:07] I'm still around for at least a few hours [18:14:12] okay cool [18:26:32] Analytics-Backlog, Analytics-Kanban: Document Cassandra SLAS and storage requirements for daily and hourly data {slug} - https://phabricator.wikimedia.org/T116407#1748914 (Nuria) NEW a:Milimetric [18:26:47] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1748922 (JanZerebecki) As long as a separate public suppression event exists that refers to the old one it sounds fine. [18:28:32] Analytics-Backlog, Analytics-Kanban: Remove loading of hourly data from Cassandra loading scripts and hql {slug} - https://phabricator.wikimedia.org/T116408#1748931 (Nuria) NEW a:JAllemandou [18:28:48] Analytics-Backlog, Analytics-Kanban: Document Cassandra SLAS and storage requirements for daily and hourly data {slug} - https://phabricator.wikimedia.org/T116407#1748938 (Nuria) Maybe here? https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI/Capacity [18:32:25] milimetric: want to look at druid on labs together then?
[18:34:19] nuria: I was just finishing running the queries [18:34:26] aham [18:34:27] joal: you can truncate as you like, we have our results [18:34:33] no matter what you ask for, it takes 0.9 seconds [18:34:41] describe keyspaces, query a day hourly, query a day [18:35:08] I think 0.9 is just how long it takes for cqlsh to start, the data retrieval seems instant [18:35:21] anyway - nuria - I'll grab me some food and be back in 30 minutes [18:35:28] milimetric: ok [18:35:51] (i wrote the results in cloud9 and pasted my scripts in there) [18:38:54] Analytics-Backlog, Analytics-Kanban: Druid testing on labs to asses whether is a suitable Cassandra replacement. {slug} - https://phabricator.wikimedia.org/T116409#1748988 (Dzahn) [18:39:19] Analytics-Backlog, Analytics-Kanban: Druid testing on labs to asses whether is a suitable Cassandra replacement. {slug} - https://phabricator.wikimedia.org/T116409#1748947 (Dzahn) [18:40:42] Analytics-Tech-community-metrics, Phabricator, DevRel-November-2015: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#1748992 (mmodell) we could at least link to old-bugzilla.wikimedia.org/bugNN.html I guess, it doesn't really solve the problem but... [18:41:08] (CR) Nuria: Functions for identifying search engines as referers. (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919) (owner: OliverKeyes) [18:41:56] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1748998 (Eevans) >>! In T116247#1748095, @Ottomata wrote: >> So the producer would store the same time stamp twice? UUID v1 alre... 
[18:49:49] (CR) Nuria: "This is ready to merge but I think we have a lot of changes on next deployment, will wait for @joal to confirm" [analytics/refinery] - https://gerrit.wikimedia.org/r/246851 (https://phabricator.wikimedia.org/T88504) (owner: Mforns) [18:51:37] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1749023 (ellery) Special:BanenrLoader currently does not include the campaign as a query parameter. Special:Re... [18:53:09] does anything special need to be done WRT eventlogging and rolling back to a previous schema? We deployed https://gerrit.wikimedia.org/r/#/c/247853/ several days ago, and i've verified the full update (including the WikimediaEvents.php file) are synced out [18:53:22] but the events coming in are still using the pre-revert schema revision id [18:53:33] the javascript code is right, but the schema revision is wrong, meaning all of them error out [18:56:20] ottomata: figured out the problem, in the erb file i was using @group instead of group which was the local variable for that iteration [19:03:31] nuria: do you know what needs to happen when some rolls back to a previous schema in EL? 
ebernhardson was asking ^ [19:04:04] Analytics-EventLogging, Editing-Department, Improving access, Reading Web Planning, discovery-system: Schema changes - https://phabricator.wikimedia.org/T114164#1749067 (JKatzWMF) a:JKatzWMF>atgo [19:08:52] madhuvishy: yes, they need to update mw code [19:09:07] madhuvishy: same as if it was a roll-forward [19:09:10] cc ebernhardson [19:09:44] if you see events coming in for the old schema (cc ebernhardson ) [19:09:51] nuria: all the mw code was updated, it's just not being delivered to the javascript [19:09:56] is because you have client code sending them, not all mw [19:10:02] are deployed at the same time [19:10:41] it's rolled back in both prod branches, verified via `mwversionsinuse` to see which versions are currently deployed [19:11:06] ebernhardson: did you update the schema that gets passed to javascript ? [19:11:20] ebernhardson: ah sorry [19:11:51] nuria: i think it's something to do with caching, but i can't figure out how the caching is delivering the correct (post-revert) javascript, but the incorrect (pre-revert) schema version [19:12:14] ebernhardson: ah yes, the WikimediaEvents.php is one of those files [19:13:32] ebernhardson: i deal so little with mw code that i forget how this works, let me look at my notes [19:14:36] nuria: thanks [19:15:38] milimetric: table truncated, loading restarted, some more precise timing tested: between 100 and 350 ms per read :) [19:15:55] thx joal, have a nice weekend [19:16:05] Thanks mate [19:16:06] how'd you get that timing? [19:16:08] You too [19:16:17] tracing on in cqlsh [19:16:22] Gives a lot of details [19:16:34] oh awesome [19:16:40] should I re-do all the queries or did you?
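For reference, the tracing joal mentions is a cqlsh built-in; a session looks roughly like this (the keyspace and table names here are placeholders, not the real AQS schema):

```
cqlsh> TRACING ON;
Now Tracing is enabled
cqlsh> SELECT views FROM pageviews.per_article
   ... WHERE project = 'en.wikipedia' AND article = 'Foobar';
```

After the result rows, cqlsh prints a per-stage trace (coordinator and replica activities with elapsed microseconds), which is where per-read timings like the 100-350ms figures come from.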
[19:16:49] I did most of them [19:16:58] and results are not dependent on data type [19:17:09] ok, if you have the results paste them in that summary [19:17:09] probably more of the SSTable being cached or not [19:17:18] no need to format, I can do that [19:18:56] also, only did it for the old format because truncated data first milimetric [19:19:14] sok, no worries [19:19:23] it's pretty fast, is the point [19:19:30] few values added to the doc on cloud9 [19:19:41] the real question is actually how fast will it be with a few months of data [19:19:43] what it says is: it's fast :) [19:19:50] thx [19:19:53] good question [19:20:05] Anyway, gooooOOOOOOOnne ! [19:20:13] Bye a-team, have a good weekend [19:20:22] bye joal! [19:22:30] ebernhardson: ok, this has happened before and I think what is happening has to do with the way mw loads javascript, it works OK if you are moving ahead and loading a "new" module (newer js). In this case because your request (the one that gets formed from the php file) doesn't include new javascript (rolling back a schema you are loading an older js module [19:22:30] than you had prior) you are plain getting the older version. [19:23:09] argh, what a poor explanation. but bottom line cc Krinkle , you need to ask for the help of the guys that really understand the loading of js in mw [19:23:52] ok [19:24:15] ebernhardson: there is nothing "eventlogging" about it i do not think, [19:25:02] ebernhardson: things will work fine if you make a new schema version (instead of rolling back), but yeah i know, not very helpful [19:25:20] that sounds like the easiest solution, thanks! [19:25:33] just making a new revision id with the old content is pretty easy [19:26:00] nuria: Hm.. from your explanation it sounds like everything is good (rollback to older version and thus loading the older version). We've definitely rolled back before without issues.
I don't see anything special about that, unless the EventLogging extension is doing something special in its SchemaModule that doesn't interact well with version decreases. [19:26:00] ResourceLoader supports it fine. [19:28:06] Krinkle: no, rollback to "old version" https://gerrit.wikimedia.org/r/#/c/247853/1/WikimediaEvents.php [19:28:30] Yes, that should be fine. [19:28:35] Krinkle: but they are getting events tagged with the "newer" version. in this case: " 14098806" [19:29:00] Krinkle: yes, basically after deploying that revert no events have made it through the processors [19:29:07] Krinkle: because they all come in with the old schema revision id [19:29:17] old=pre-revert [19:29:25] There is nothing special about this in MediaWiki. The module (EventLogging's ResourceLoaderSchemaModule) is responsible for fetching a schema and giving it a version hash. [19:29:36] see: https://logstash.wikimedia.org/#/dashboard/elasticsearch/eventlogging-errors [19:29:42] Krinkle: if i remember cause those schema versions were converted to timestamps, thus when you revert [19:29:48] That version hash won't literally include the revision number here, and comparison is done based on quality, not greater than or anything like that [19:30:04] equality* [19:30:16] Krinkle: you were asking for an "older" timestamp? thus no update of module will be served [19:30:23] Krinkle: does this ring a bell at all? [19:30:33] We don't use timestamps, not for almost 6 months now. [19:30:44] And even then it was fine because the timestamp was used as cache buster. [19:30:51] Krinkle: ah, ok, my info is about 6 months old [19:31:14] if it goes back to the old version hash, it'll also get the old module (if it is still cached) as it will match the hash, which is good. [19:31:22] Krinkle: how long does this file get cached: https://gerrit.wikimedia.org/r/#/c/247853/1/WikimediaEvents.php [19:31:30] server side [19:31:49] doesn't get cached at all.
[19:31:57] https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/includes/ResourceLoaderSchemaModule.php#L71-L77 [19:31:59] Krinkle: so then, any ideas on how javascript is getting the wrong schema? the url generated in production by doing mw.loader.using(['schema.TestSearchSatisfaction2']) is https://en.wikipedia.org/w/load.php?debug=false&lang=en&modules=schema.TestSearchSatisfaction2&skin=vector&version=519bbb1e2035 [19:32:04] and that has the wrong schema in it [19:32:14] That code immediately results in a different definition summary, thus a different hash, and thus not the same module content anymore. [19:32:40] https://en.wikipedia.org/w/load.php?debug=false&lang=en&modules=schema.TestSearchSatisfaction2&skin=vector&version=blabla [19:32:40] i've double checked on application servers that WikimediaEvents.php was synced out along with the js [19:32:43] That one is working [19:32:52] So it is cached on the old version hash [19:33:07] Likely caused by the race condition in scap. [19:33:17] scpa? [19:33:21] scap is the mw deploy tool [19:33:21] "scap"? [19:33:28] aaahhh [19:33:43] Krinkle: so, what's the fix then? i'm not following how those version=xxx values are generated [19:34:33] You deploy the code change (revision number), it syncs to server A, then, (before syncing to server B) someone views a page, they discover the startup module with the new version hash (based on after the deployment on server A), that user's browser then fetches that url and it goes to say server B (load balancer right), that produces the response with the
[19:34:54] plus some compression [19:35:47] fundamentally caused by cluster syncing not being atomic to the whole cluster [19:35:54] ah ok, it is because yeah the newer content is not available for the 2nd user [19:36:23] and thus he gets latest confused with version on HEAD-1 (if that makes sense) [19:36:48] bd808: but also because you are letting two urls point to the same resource [19:37:17] yes. that is another horrible thing we do [19:37:17] bd808: ebernhardson: https://phabricator.wikimedia.org/T47877 [19:37:49] bd808: cause our system instead of saying '404' gives you whatever is latest [19:37:58] or whatever it "thinks" is latest [19:38:09] ok, that all makes sense. i suppose the right fix here then is make a new revision in meta, and point this at it [19:38:12] nuria: Yeah, you have a user getting the meta data from server A, using that to make a request with version=new, which then randomly ends up on server B and thus "fills" that request in varnish cache with the old code. [19:38:16] (well, the stop-gap fix) [19:38:22] ebernhardson: Yeah, the fix is to roll the dice again. [19:38:56] Related proposals: Have RL verify the version hash and emit short max-age if it's wrong so that it isn't stuck for 30 days. [19:39:13] nuria: relatedly, is there any way to re-process all those failed events with the right schema? [19:39:23] And also: Proper deployment cycle (e.g. depool servers, build critical mass, and then switch over) [19:39:26] (after i fix them coming in) [19:39:40] It doesn't say 404 because the resource exists, just in a different version. The core design flaw is that the versioning of resources is not exposed in the URL for resources that are updated outside of a MW version bump [19:40:24] ebernhardson: no, there isn't [19:40:41] ebernhardson: an easy way, that is, they are not valid [19:40:57] oh well, ok [19:41:33] >bd808> It doesn't say 404 because the resource exists, just in a different version.
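Krinkle's description above — version= is a hash of the JSON-encoded definition summary, compared only for equality — can be sketched as below. This is a simplified illustration, not MediaWiki's actual code: the truncation stands in for the "compression" he mentions, and the revision numbers are only examples:

```python
import hashlib
import json

def make_version_hash(definition_summary: dict) -> str:
    """Derive a short version string from a module's definition summary.
    Any change to the summary (in either direction) changes the hash."""
    blob = json.dumps(definition_summary, sort_keys=True)
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()[:12]

v_new = make_version_hash({"schema": "TestSearchSatisfaction2", "revision": 14098806})
v_old = make_version_hash({"schema": "TestSearchSatisfaction2", "revision": 14098805})

# The server only checks equality, never ordering, so a rollback produces
# a perfectly valid (different) hash -- unless a stale cache has already
# "filled" the URL for that hash with the wrong content.
print(v_new, v_old)
```

This is why the rollback itself was fine in principle: the reverted summary maps back to its old hash, and the failure came from the scap race filling the cached URL with mismatched content, not from the hashing scheme.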
[19:42:35] do not buy it, the resource doesn't know it has a version . every versioned resource is a different one. [19:42:40] but any ways ..... [19:43:02] ebernhardson: please retry, sorry that those events went to waste [19:43:12] ebernhardson: they will be in the all-events log [19:43:22] but not on the database [19:45:51] (CR) Nuria: Functions for identifying search engines as referers. (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919) (owner: OliverKeyes) [19:51:48] does anyone know if dumps.wikimedia.org requests will be in hadoop? :P or should I just go and look? [19:51:51] and lists.wikimedia.org? [19:53:21] addshore: not lists [19:53:23] but dumps maybe, which ones? [19:53:45] well, any, I'll just go have a look :) [19:55:03] hey a-team, what exactly is eventlogging Edit schema [19:55:07] is that just saves from visual editor? [19:55:09] or all edits? [19:55:25] ottomata: it's all edits, there's an "editor" field [19:55:34] that has either visualeditor or wikitext in it [19:55:35] ok cool [19:55:37] perfect [19:55:40] but that schema is much much more [19:55:44] i'm just making some dummy examples for lightning talk [19:55:49] it has timing information for different types of actions [19:55:55] and am saying things like top K edits by wiki, etc. 
[19:55:59] k [19:56:03] i'm just doing dummy queries [19:56:10] think of it as a de-normalized table storing all edit-related activity with a lot of null columns [19:56:14] so just want to make sure i don't say something incorrect [19:56:19] perfect [19:56:23] cool [19:56:44] madhuvishy: I'm in the batcave at nuria's request, but she's disappeared for the moment [19:56:54] so join me in the cave, we can look at this together [19:57:08] ok [19:59:01] Analytics-EventLogging, Editing-Department, Improving access, QuickSurveys, and 2 others: Schema changes - https://phabricator.wikimedia.org/T114164#1749270 (atgo) [20:04:41] so neither lists.wikimedia nor dumps.wikimedia requests go into hadoop :( [20:05:28] i'm not sure what you mean by dumps [20:05:33] but the request data there is generated from hadoop [20:05:54] /hdfs/wmf/data/archive/pagecounts-all-sites [20:05:59] and pagecounts-raw [20:07:49] http://dumps.wikimedia.org ottomata! So I mean requests of dumps.wikimedia.org pages. but the logs do not go into hadoop! [20:08:11] addshore: aah, check out the pageview definition to see what gets in [20:08:20] they should be in the raw logs [20:08:40] hmmm [20:08:48] actually, why is it not in refined [20:09:03] addshore: how are you looking it up? [20:10:37] Ohhh [20:10:38] haha [20:10:49] addshore: no, i don't think dumps.wm.o goes in there [20:11:05] it might go into webrequests, but i'm not sure [20:12:27] couldn't see it in webrequests :( sad times! [20:13:09] there are a lot of things we don't track sadly. mw tarball downloads are another that we really should be counting and displaying somewhere [20:14:10] *agrees* :P [20:14:17] Should I file tickets for all of these ?
;) [20:16:04] ottomata: if they don't get into webrequest, where are they filtered [20:16:23] we don't filter certain domains on refine no [20:16:36] bd808: I can't wait for the api data to land in hadoop :) [20:17:00] :) I'm working on the hard specs for it now [20:18:02] :) It was rather amazing timing, I started looking at trying to use webrequest data to look into the api usage for wikidata, only to find of course POST data is missed, then got the api.log s moved over so I could run some silly scripts on them. [20:18:18] and probably only a week later that patch appears :) or at least I saw it for the first time :D [20:19:37] madhuvishy: i'm not sure how dumps.wm is served [20:19:47] hmmm [20:19:48] but to get into webrequest, there has to be varnish and a varnishkafka instance [20:19:52] yeah [20:19:58] if they are in webrequest, i'd guess in misc [20:20:05] right [20:20:15] addshore: ^, if you do make a ticket [20:23:03] ottomata: which labs project is labstore1003? can you add me? [20:24:50] madhuvishy: it's not a project [20:24:53] it's a physical node [20:25:10] ya ya, but it will be part of some project no? [20:25:27] ottomata: can you add me to it? [20:25:43] If your labs project instances don't already have /public/statistics mounted, then you will need to make a commit to the operations/puppet repository in modules/labstore/files/nfs-mounts.yaml to enable this. [20:25:49] yeah, saw that [20:25:56] i'm not really sure what that means [20:25:58] might have to ask Yuvi [20:26:05] ottomata: are you saying i don't need access to labstore? [20:26:11] ok let me poke him [20:26:14] right [20:26:18] Analytics, Wikimedia-Mailing-lists: Requests to lists.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116429#1749325 (Addshore) NEW [20:26:20] Analytics, Datasets-General-or-Unknown: Requests to dumps.wikimedia.org should end up in hadoop wmf.webrequest via kafka!
- https://phabricator.wikimedia.org/T116430#1749332 (Addshore) NEW [20:26:26] madhuvishy: made a task for each ^^ [20:26:44] addshore: we don't look at Analytics anymore [20:26:48] oh :p [20:26:50] use analytics-backlog [20:26:57] *goes to fix* [20:27:15] Analytics-Backlog, Wikimedia-Mailing-lists: Requests to lists.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116429#1749339 (Addshore) [20:27:16] Analytics-Backlog, Datasets-General-or-Unknown: Requests to dumps.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116430#1749341 (Addshore) [20:27:19] also, I'm not even sure we'll do this, if it needs varnish in front of it to end up in our logs [20:27:38] addshore: i feel like it will be ops [20:27:58] shall I CC ops too? [20:28:42] I'll just wait and see what happens to the tickets :) If it can't be done then it can't be done :/ But it would be great to actually see what is happening there [20:29:08] Analytics-Backlog, Datasets-General-or-Unknown, operations: Requests to dumps.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116430#1749332 (Addshore) [20:29:10] Analytics-Backlog, Wikimedia-Mailing-lists, operations: Requests to lists.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116429#1749325 (Addshore) [20:31:14] addshore: cool, I'll ask in our next backlog grooming if no one comments by then, I don't know much about this stuff [20:31:45] awesome! :) [20:35:50] madhuvishy or others: i'm having unicode issues when querying wmf.pageview_hourly in the Hive CLI, regarding the values of page_title [20:35:58] e.g. https://fr.wikipedia.org/wiki/Wikip��dia:Accueil_principal instead of https://fr.wikipedia.org/wiki/Wikipédia:Accueil_principal [20:36:55] output into a file ("hive ...
> results.txt") and scping it does not help [20:38:58] when running the query on Hue the same page name is displayed fine, but... [20:39:48] .. it still shows this: https://fr.wikipedia.org/wiki/Sp?cial:Search (but at the same time: https://fr.wikipedia.org/wiki/Spécial:Recherche ) [20:39:54] any ideas? [20:50:11] Analytics-EventLogging, Editing-Department, Improving access, QuickSurveys, and 2 others: Schema changes - https://phabricator.wikimedia.org/T114164#1749409 (atgo) [20:50:58] Analytics-EventLogging, Editing-Department, Improving access, QuickSurveys, and 2 others: Schema changes - https://phabricator.wikimedia.org/T114164#1686646 (atgo) [20:51:56] addshore: our pageview infrastructure requires varnish on top of domain [20:52:10] addshore: no varnish , no varnishkafka no pageviews [20:53:36] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1749424 (Jgreen) >>! In T97676#1749023, @ellery wrote: > Special:BanenrLoader currently does not include the ca... [21:01:49] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1749449 (ellery) Hmm. I thought Special:RecordImpression was retired and replaced by Special:BannerLoader. Andy... [21:02:58] (CR) Mforns: [C: 1] Update camus-partition-checker (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247847 (owner: Joal) [21:04:29] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1749452 (Ottomata) Right, but how would you do this in say, Hive? Or in bash? Timestamp logic should be easy and immediate. >... 
[21:06:31] is there a way to raise the heap size on Hue like one can when invoking the Hive CLI (https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Queries#Out_of_Memory_Errors_on_Client )? [21:07:08] (because i'd like to run the above query on Hue instead with less unicode issues than in the CLI) [21:08:45] HaeB: hm, i doubt it. [21:08:47] hehe [21:08:55] HaeB: maybe try beeline instead of hive cli? [21:09:06] madhuvishy: what was that beeline command you found [21:09:06] ? [21:09:46] ottomata: yes, what was the command for beeline again? (you mentioned it before, but..) [21:09:52] beeline -u jdbc:hive2://analytics1027.eqiad.wmnet:10000 [21:11:06] HaeB: ^ [21:12:03] madhuvishy: great, trying it now.... the -f parameter works as with hive? ("... -f query.hql ") [21:12:08] not sure, maybe it will handle those encoding issues better [21:12:13] HaeB: yeah [21:12:14] ok yalls [21:12:17] have a good weekend! [21:12:19] ottomata: byeeee [21:16:37] madhuvishy: ERROR : Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied: user=anonymous, access=WRITE, inode="/user":hdfs:hadoop:drwxrwxr-x [21:16:56] HaeB: aah [21:17:38] HaeB: can you do beeline -u jdbc:hive2://analytics1027.eqiad.wmnet:10000 -n HaeB [21:17:54] assuming HaeB is your shell username [21:18:17] it's tbayer... trying now [21:19:04] HaeB: sorry, we haven't experimented with beeline much, i'll play with it more next week and document it [21:22:54] madhuvishy: it worked! (both for the access issue and the unicode issue... it even has https://fr.wikipedia.org/wiki/Spécial:Recherche correct, the one that was wrong on Hue too) [21:23:03] ooh awesome [21:23:12] great [21:23:14] ..now trying the full query that hit the heap size limit [21:25:34] HaeB: beeline also supports csv i think - don't know if that's useful to you [21:26:23] --outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2] [21:26:53] good to know... in interactive mode too? 
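[editor's note] Pulling the beeline fixes from this exchange together, a sketch assuming the hosts and username mentioned in the log (`query.hql` is a stand-in for whatever file you run; verify flags against your beeline version):

```shell
# Connect to HiveServer2; -n passes your shell username so HDFS writes
# are not attempted as "anonymous" (the cause of the
# AccessControlException above). -f runs a query file, as with hive.
beeline -u jdbc:hive2://analytics1027.eqiad.wmnet:10000 -n tbayer -f query.hql

# The output format can also be chosen up front instead of the default table:
beeline -u jdbc:hive2://analytics1027.eqiad.wmnet:10000 -n tbayer \
    --outputformat=tsv2 -f query.hql
```

Inside an interactive session the equivalent is `!set outputformat csv2` (or any of the formats listed above).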
[21:28:05] madhuvishy: "Error: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (state=08S01,code=0)" [21:28:11] guess i'll just retry [21:28:19] HaeB: aah yes, you'd have to do- !set outputformat csv [21:28:25] i think [21:35:20] madhuvishy: ok, restarted beeline and the pipe is holding up... however, i'm getting the heap error here too now: "Error: java.lang.OutOfMemoryError: Java heap space (state=,code=0)" [21:35:43] HaeB: did you do export HADOOP_HEAPSIZE=1024 [21:36:04] not yet, so that's for beeline too? [21:36:10] ok, trying now [21:37:16] HaeB: yeah [21:37:25] beeline is just an interface [21:37:36] right, ok [21:39:39] madhuvishy: still getting the same heap error :( [21:39:43] after doing this: [21:39:50] tbayer@stat1002:~$ export HADOOP_HEAPSIZE=1024 [21:39:51] tbayer@stat1002:~$ beeline -u jdbc:hive2://analytics1027.eqiad.wmnet:10000 -n tbayer [21:40:04] HaeB: hmmm, let me look at docs [21:40:58] i know 1024 is enough because the query ran successfully via the old CLI (except the unicode issues) [21:45:43] HaeB: yeah I wonder if you have to set it differently for beeline to pick it up [21:50:03] madhuvishy: confirming that it didn't pick up the change: [21:50:06] 0: jdbc:hive2://analytics1027.eqiad.wmnet:100> set env:HADOOP_HEAPSIZE; [21:50:23] ... 
shows this: " env:HADOOP_HEAPSIZE=256 " [21:51:12] HaeB: oh cool [21:51:15] (that's when calling it like this: "tbayer@stat1002:~$ export HADOOP_HEAPSIZE=1024 && beeline -u jdbc:hive2://analytics1027.eqiad.wmnet:10000 -n tbayer") [21:51:32] can you do !set env:HADOOP_HEAPSIZE=1024 inside [21:53:19] that doesn't seem to be the right syntax: [21:53:26] 0: jdbc:hive2://analytics1027.eqiad.wmnet:100> !set env:HADOOP_HEAPSIZE=1024 [21:53:26] Usage: set [21:54:11] and also not like this: [21:54:23] 0: jdbc:hive2://analytics1027.eqiad.wmnet:100> set env:HADOOP_HEAPSIZE=1024; [21:54:33] Error: Error while processing statement: null (state=,code=1) [21:55:29] HaeB: maybe [21:55:34] !set env:HADOOP_HEAPSIZE 1024 [21:57:18] madhuvishy: that results in "Error setting configuration: env:HADOOP_HEAPSIZE: java.lang.IllegalArgumentException: No method matching "setenv:HADOOP_HEAPSIZE" was found in org.apache.hive.beeline.BeeLineOpts." [21:58:14] * HaeB reads http://blog.cloudera.com/blog/2014/02/migrating-from-hive-cli-to-beeline-a-primer/ "Perhaps the most interesting difference between the clients concerns the use of Hive variables." [21:58:27] oh yes, very interesting and entertaining indeed [21:59:46] HaeB: it has a line that says Note that environment variables cannot be set: [22:01:10] HaeB: I tried [22:01:21] set env:HADOOP_HEAPSIZE 1024 [22:01:25] and it didn't complain [22:01:56] hmmm [22:02:07] i see it now [22:02:09] weird [22:02:26] from inside beeline? [22:03:33] no it errors out [22:04:24] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1749626 (GWicke) > Right, but how would you do this in say, Hive? Or in bash? Timestamp logic should be easy and immediate. Yea... 
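[editor's note] As a sanity check, `export` itself is not the problem: an exported variable is inherited by every child process, so the beeline client's environment really does contain HADOOP_HEAPSIZE=1024. A plausible reading of the 256 shown by `set env:HADOOP_HEAPSIZE` (this explanation is an assumption) is that the value is read from a different JVM, e.g. the HiveServer2 daemon on analytics1027, whose environment and heap were fixed when it started and cannot be changed from the client. The inheritance part is easy to verify:

```shell
# Exported variables are inherited by child processes:
export HADOOP_HEAPSIZE=1024
sh -c 'echo "$HADOOP_HEAPSIZE"'
# So if "set env:HADOOP_HEAPSIZE" issued through beeline still reports
# 256, the value is likely coming from another process (such as the
# HiveServer2 daemon), not from the shell that launched beeline.
```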
[22:16:53] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1749660 (GWicke) [22:18:00] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1744407 (GWicke) I went ahead and updated the task description with the current framing / per-event schema. I renamed the `reqi... [22:36:31] Analytics-Backlog, Wikimedia-Mailing-lists, operations: Requests to lists.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116429#1749735 (Dzahn) What are you trying to find out? [22:37:57] Analytics-Backlog, Datasets-General-or-Unknown, operations: Requests to dumps.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116430#1749739 (Dzahn) duplicate of T116429 [22:38:59] Analytics-Backlog, Datasets-General-or-Unknown, operations: Requests to dumps.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116430#1749749 (Dzahn) [22:44:04] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1749757 (AndyRussG) @ellery, sadly we haven't yet been able to ditch Special:RecordImpression for Special:Banne... [22:50:23] Analytics-Backlog, Wikimedia-Mailing-lists, operations: Requests to lists.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116429#1749766 (Dzahn) p:Triage>Normal [23:42:29] madhuvishy: tried various other things, but i'm still getting java.lang.OutOfMemoryError :( [23:43:30] (and wasn't able to change HADOOP_HEAPSIZE... 
[23:44:32] ...there seem to be other relevant variables too https://stackoverflow.com/questions/24070557/what-is-the-relation-between-mapreduce-map-memory-mb-and-mapred-map-child-jav
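[editor's note] The StackOverflow thread above concerns per-job MapReduce memory settings, which (unlike the client-side HADOOP_HEAPSIZE) can be set per session and do reach the tasks that HiveServer2 launches. A hedged sketch: the property names are the standard Hadoop 2 ones, the host and username come from the log, and the values are only illustrative.

```shell
# Per-query memory knobs passed via --hiveconf affect the MapReduce tasks
# themselves, not the local client JVM; values here are illustrative.
beeline -u jdbc:hive2://analytics1027.eqiad.wmnet:10000 -n tbayer \
    --hiveconf mapreduce.map.memory.mb=2048 \
    --hiveconf 'mapreduce.map.java.opts=-Xmx1638m' \
    -f query.hql
# The same properties can also be set inside a session, e.g.:
#   set mapreduce.map.memory.mb=2048;
```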