[00:25:20] Analytics-Backlog, MediaWiki-API, Reading-Infrastructure-Team, Research-and-Data, Patch-For-Review: Publish detailed Action API request information to Hadoop - https://phabricator.wikimedia.org/T108618#1525053 (bd808)
[00:25:41] Analytics-Engineering, Analytics-EventLogging, Need-volunteer, Patch-For-Review: EventLogging calling deprecated SyntaxHighlight_GeSHi::buildHeadItem - https://phabricator.wikimedia.org/T71328#1727042 (Legoktm) Open>Resolved
[00:28:54] legoktm: thanks ^
[00:29:11] yw :)
[00:35:03] Analytics-Cluster, Database: Replicate Echo tables to analytics-store - https://phabricator.wikimedia.org/T115275#1727080 (Neil_P._Quinn_WMF)
[10:53:23] (CR) Joal: [C: 1] "Minor comments inline." (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[11:15:25] Analytics-Engineering, Community-Tech: [AOI] Add page view statistics to page information pages (action=info) - https://phabricator.wikimedia.org/T110147#1727738 (NiharikaKohli) @kaldari, I think this should be in the Blocked column instead of Assigned to other teams.
[12:10:06] !log Stopped daily and monthly mobile unique coordinators
[12:10:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[12:12:18] !log Rerunning daily mobile unique jobs for days 2015-08-[03,04,11,12,12,14,17], 2015-09-16
[12:12:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[12:12:48] !log Restarting daily and monthly mobile unique coordinators with new patch
[12:12:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[12:17:23] !log Refinery deploy needed before restart --> Deploying
[12:17:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[13:07:49] Analytics-Kanban, Mobile-Apps, Patch-For-Review: Investigate and fix inconsistent data in mobile_apps_uniques_daily {hawk} [5 pts] - https://phabricator.wikimedia.org/T114406#1728009 (JAllemandou) Fix deployed, data corrected (highest value kept) for days: - 2015-08-03 - 2015-08-04 - 2015-08-11 - 2015-0...
[13:43:37] (PS8) Joal: Add CassandraXSVLoader to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174)
[13:59:25] joal: hello, yt?
[13:59:31] Hi nuria
[13:59:34] I am !
[14:00:09] joal: is there a workaround for this error i was getting while testing oozie: "SemanticException [Error 10071]: Inserting into a external table is not allowed pageview_hourly"
[14:00:19] cave?
[14:00:27] sure, 2 mins
[14:01:24] joal: omw
[14:01:28] k
[14:05:08] hi a-team!
[14:05:15] Analytics, Services, operations: Automatic monitoring not working for AQS - https://phabricator.wikimedia.org/T115588#1728063 (mobrovac) NEW
[14:05:17] hi!
[14:05:18] Hi a-team :)
[14:05:39] howdy
[14:05:57] Good milimetric :)
[14:06:11] Analytics-Kanban, netops, operations, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1728076 (mobrovac)
[14:06:12] Analytics, Services: restbase is not listening on port 7231 on aqs* - https://phabricator.wikimedia.org/T114742#1728072 (mobrovac) Open>Resolved >>! In T114742#1725185, @Dzahn wrote: > We still have 3 CRITs in Icinga for "Restbase endpoints health" on aqs and there was a comment next to them linking...
[14:18:35] kevinator: not usual to have you this early :)
[14:18:46] kevinator: enjoying home ?
[14:19:03] yes, but I feel like my clock is all messed up.
[14:19:12] :)
[14:19:25] I'm not used to working very much before standup
[14:23:21] Analytics-Backlog: Traffic Breakdown Report - Browser Major Minor Version {lama} - https://phabricator.wikimedia.org/T115590#1728090 (Milimetric) NEW
[14:24:34] Analytics-Backlog: Traffic Breakdown Report - Client OS Major Minor Version - https://phabricator.wikimedia.org/T115591#1728097 (Milimetric) NEW
[14:25:21] Analytics-Backlog: Traffic Breakdown Report - Google Requests - https://phabricator.wikimedia.org/T115592#1728103 (Milimetric) NEW
[14:26:15] Analytics-Backlog: Traffic Breakdown Report - Mime Type {lama} - https://phabricator.wikimedia.org/T115594#1728118 (Milimetric) NEW
[14:27:52] Analytics-Backlog: Traffic Breakdown Report - Target Wiki {lama} - https://phabricator.wikimedia.org/T115595#1728124 (Milimetric) NEW
[14:30:13] Analytics-Backlog: Traffic Breakdown Report - Crawlers {lama} - https://phabricator.wikimedia.org/T115596#1728132 (Milimetric) NEW
[14:32:20] Analytics-Backlog: Traffic Breakdown Report - User Agent Overview {lama} - https://phabricator.wikimedia.org/T115599#1728152 (Milimetric) NEW
[14:34:14] Analytics-Backlog: Traffic Breakdown Report - User Agents Trend {lama} - https://phabricator.wikimedia.org/T115601#1728165 (Milimetric) NEW
[14:34:27] Analytics-Backlog: Traffic Breakdown Report - Browser Trend {lama} - https://phabricator.wikimedia.org/T115602#1728171 (Milimetric) NEW
[14:38:38] Analytics-Backlog: Traffic Breakdown Report - Visiting Country {lama} - https://phabricator.wikimedia.org/T115605#1728191 (Milimetric) NEW
[14:39:15] Analytics-Backlog: Traffic Breakdown Report - Visiting Country per Wiki {lama} - https://phabricator.wikimedia.org/T115607#1728204 (Milimetric) NEW
[14:41:47] Analytics-Backlog: Traffic Breakdown Report - Visiting Country per Wikipedia Language {lama} - https://phabricator.wikimedia.org/T115608#1728210 (Milimetric) NEW
[14:42:34] Analytics-Backlog: Traffic Breakdown Report - Visiting Country per Wiki Trend {lama} - https://phabricator.wikimedia.org/T115609#1728216 (Milimetric) NEW
[14:43:38] Analytics-Backlog: Traffic Breakdown Report - Browsers from Visiting Country {lama} - https://phabricator.wikimedia.org/T115610#1728223 (Milimetric) NEW
[14:45:34] Analytics-Backlog: Traffic Breakdown Report - Device by Site from Visiting Country {lama} - https://phabricator.wikimedia.org/T115612#1728241 (Milimetric) NEW
[14:45:54] Analytics-Backlog: Traffic Breakdown Report - Client OS from Visiting Country {lama} - https://phabricator.wikimedia.org/T115613#1728248 (Milimetric) NEW
[14:46:15] Analytics-Backlog: Traffic Breakdown Report - Google Requests {lama} - https://phabricator.wikimedia.org/T115592#1728254 (Milimetric)
[14:46:25] Analytics-Backlog: Traffic Breakdown Report - Client OS Major Minor Version {lama} - https://phabricator.wikimedia.org/T115591#1728256 (Milimetric)
[14:48:49] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1728261 (ezachte) While waiting for new input for Monthly Pageview Reports (which is coming along, thanks @Milimetric !), I looked...
[14:49:35] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1728262 (Milimetric) FYI - we are starting to prioritize this work for this quarter. We've made tasks for each of the reports listed on the wiki page [1]. These tasks can be found in the #Analytics-Ba...
[15:18:22] joal: do you have 1 sec?
[15:18:27] I do nuria
[15:18:38] Analytics, Services, operations: Automatic monitoring not working for AQS - https://phabricator.wikimedia.org/T115588#1728301 (mobrovac) a:mobrovac
[15:19:02] joal: I am still getting the hive error despite having changed all these in coordinator properties:
[15:19:14] https://www.irccloud.com/pastebin/Z2lH1SOi/
[15:20:24] I must be missing a place where things have to be changed
[15:20:44] joal: i also have -Doozie_directory=/tmp/oozie-nuria/oozie
[15:21:12] where i put my oozie directory in hdfs like:
[15:21:14] hdfs dfs -rmr /tmp/oozie-nuria ; hdfs dfs -mkdir /tmp/oozie-nuria; hdfs dfs -put oozie/ /tmp/oozie-nuria
[15:21:14] nuria: you need to reset hive_site (for sure)
[15:21:35] joal: right, my hive-site is -Dhive_site_xml=/tmp/nuria/hive-site.xml
[15:21:53] nuria: really ?
[15:22:20] nuria: I guess you have it taken from a refinery deploy ?
[15:22:36] joal: yes, i launch the job like:
[15:22:52] oozie job -run -Duser=nuria -Dhive_site_xml=/tmp/nuria/hive-site.xml -Darchive_directory=hdfs://analytics-hadoop/tmp/nuria -Doozie_directory=/tmp/oozie-nuria/oozie -config ./oozie/pageview/hourly/coordinator.properties -Dstart_time=2015-09-05T00:00Z -Dstop_time=2015-09-05T01:00Z
[15:23:17] joal: which is the way i have tested in the past, how else can you pass hive-site?
[15:23:54] Giving it a real deployed one (from refinery/current for instance)
[15:24:04] But if it's an ok version (then that's not it)
[15:24:24] When you have created your tables, you have changed the locations, right ?
[15:24:26] joal: maybe my hive-site is old
[15:24:48] nuria: shouldn't, hive-site has not changed for ages
[15:25:09] joal: location of tables is: LOCATION '/tmp/nuria/whilelist';
[15:25:19] mmm.. typooo
[15:26:45] joal: ok, if all sounds good i will look again, i must have missed one place
[15:26:55] nuria: sounds weird
[15:27:16] There are 3 tables --> 3 different paths ?
[15:27:30] nuria: --^
[15:28:58] joal: 1 path per table you mean?
[15:29:16] there is pageview, whitelist, and unexpected_values
[15:29:22] Must have a path for each
[15:31:14] joal: thank you, changing that
[15:31:20] joal: morning :)
[15:31:38] Have you figured out the class not found error?
[15:31:39] nuria: if you use the same path for each table, hive won't work :)
[15:31:52] joal: ajajam
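A minimal HiveQL sketch of joal's point, with hypothetical table names and placeholder column lists (the real refinery schemas have many more columns): each external test table gets its own HDFS directory, because several external tables pointed at one LOCATION all read and write the same files.

```sql
-- Sketch only: placeholder schemas, not the real refinery DDL.
-- One directory per external table; a shared directory would make
-- all three tables see (and clobber) each other's data.
CREATE EXTERNAL TABLE tmp_pageview_hourly (line string)
LOCATION '/tmp/nuria/pageview_hourly';

CREATE EXTERNAL TABLE tmp_pageview_whitelist (line string)
LOCATION '/tmp/nuria/whitelist';

CREATE EXTERNAL TABLE tmp_pageview_unexpected_values (line string)
LOCATION '/tmp/nuria/unexpected_values';
```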
[15:32:13] hey madhuvishy : class not found solved, now fighting (for at least 2 hours) with classpath ordering
[15:32:46] Joal, aah did you have to split out the inner class to a separate one to solve it?
[15:32:48] madhuvishy: for the class not found, I extracted the reducer class from the CassandraXSVLoader one
[15:32:53] yup
[15:33:32] joal, okay. I'm getting to office, let's catch up when I'm there/after standup!
[15:33:39] sure madhuvishy :)
[15:33:42] safe trip
[15:47:05] nuria, yt?
[15:47:19] yesssir
[15:47:22] hi :]
[15:47:27] milimetric: for fun: https://github.com/mrdoob/three.js
[15:47:34] mforns: hola
[15:47:34] nuria, regarding https://phabricator.wikimedia.org/T88504
[15:48:00] aham
[15:48:06] you say we need 2 oozie jobs, one for mobile and another for desktop?
[15:48:29] ideally yes, one reads from all sources
[15:48:37] other reads from mobile source only
[15:48:41] both produce a file
[15:48:57] I see
[15:50:10] dan just filed similar work items to get these from wikistats but I think delivering the browser report sooner will help a lot of developers
[15:50:19] cc milimetric
[15:50:28] aha
[15:50:42] thx
[15:51:00] I've new thoughts after going through those tasks
[15:51:04] we should talk at standup
[15:51:10] milimetric: yess, wanna share?
[16:02:23] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Setup pipeline for search logs to travel through kafka and camus into hadoop {hawk} [55 pts] - https://phabricator.wikimedia.org/T113521#1728388 (kevinator) Open>Resolved
[16:02:55] Analytics-Kanban: Enforce policy for each schema: Sanitize {tick} [8 pts] - https://phabricator.wikimedia.org/T104877#1728392 (kevinator)
[16:02:56] Analytics-Kanban, Database: Delete obsolete schemas {tick} [5 pts] - https://phabricator.wikimedia.org/T108857#1728391 (kevinator) Open>Resolved
[16:03:52] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1728393 (kevinator) Open>Resolved
[16:04:31] Analytics-Kanban, Patch-For-Review: Create Hadoop Job to load data into cassandra [34 pts] {slug} - https://phabricator.wikimedia.org/T108174#1728398 (kevinator) Open>Resolved
[16:05:03] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1728402 (Ironholds) For what it's worth, as the author of both the R definition and the Java definition I absolutely agree with this approach - with the exceptio...
[16:11:21] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1728417 (Nuria) > If the wiki page and code disagree, the code is wrong. You are right, this should be the approach for a definition with such a high visibility....
[16:11:22] (CR) Milimetric: Archive hourly pageviews by article in wsc format (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[16:19:36] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1728458 (Ironholds) English really needs a word for "yay, we agree!" but in the absence of that I'll just say this :P
[17:12:09] Analytics-Backlog: ==== Immediate Above ==== - https://phabricator.wikimedia.org/T115634#1728670 (Milimetric) NEW
[17:14:26] Analytics-Backlog, Analytics-Cluster: Implement better Webrequest load monitoring {hawk} - https://phabricator.wikimedia.org/T109192#1728684 (Milimetric)
[17:29:07] Analytics-Backlog: Move camus files from refinery to puppet - https://phabricator.wikimedia.org/T113990#1728723 (Nuria) This is now done.
[17:29:16] Analytics-Backlog: Move camus files from refinery to puppet [3] - https://phabricator.wikimedia.org/T113990#1728724 (Nuria)
[17:30:00] Analytics-Backlog, Analytics-Kanban: Move camus files from refinery to puppet [3] - https://phabricator.wikimedia.org/T113990#1681855 (Nuria)
[17:33:07] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1728741 (Jgreen) >>! In T97676#1726675, @ellery wrote: > 1:10 is already much better. Pgheres has a campaign fi...
[17:35:32] mforns: let's talk about browser report in a bit
[17:39:10] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1728747 (awight) @ellery: just noting that there's still an open question for you, which is blocking us. I don...
[17:41:58] * milimetric going to get lunch
[17:51:24] (PS12) Joal: [WIP] Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224
[17:52:29] mforns: let me know when you are back
[17:54:49] Hey a-team, have a good day !
[17:54:56] See you tomorrow
[18:04:05] milimetric: nuria this documentation debate is endless - we only want to remove docs for things like Kraken that never existed right?
[18:04:15] madhuvishy: yes
[18:05:15] nuria: and we'll archive old pages if they are about things we worked on, but don't look at anymore - correct?
[18:05:24] madhuvishy: right
[18:05:28] cool
[18:05:35] madhuvishy: that is what would make sense to me.
[18:09:52] agreed, btw, both that the debate is silly and about what to delete
[18:10:11] yeah
[18:20:50] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728829 (jgbarah) Hi, @Anmolkalia, I've run your code, and I don't see that unicode error you see: ``` $ python mediawiki_analysis.py --database mdb --db-user jgb --db-password XXX --url http...
[18:23:52] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728836 (jgbarah) In fact, after a while, I get an error with your code: ``` $ python mediawiki_analysis.py --database mdb --db-user jgb --db-password XXX --url https://www.wikipedia.org/w None...
[18:24:28] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728837 (Anmolkalia) Hi, @jgbarah, I am getting that error if I run the code again, basically when the database already contains the values that are being retrieved. Meaning, the print statement...
[18:26:09] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728839 (jgbarah) >>! In T114437#1717889, @Anmolkalia wrote: > There seems to be a problem with the print statement in the insert_page function, line 101. It seems to be working if I am using on...
[18:28:10] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728848 (Anmolkalia) >>! In T114437#1728836, @jgbarah wrote: > In fact, after a while, I get an error with your code: > > ``` > $ python mediawiki_analysis.py --database mdb --db-user jgb --db-...
[18:30:19] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728852 (Anmolkalia) >>! In T114437#1728839, @jgbarah wrote: >>>! In T114437#1717889, @Anmolkalia wrote: >> There seems to be a problem with the print statement in the insert_page function, line...
[18:31:24] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728854 (jgbarah) BTW, to easy the process of reviewing the code, maybe you can fork the MediaWikiAnalysis GitHub repository, point me to your fork, and I just clone it. That way I can follow yo...
[18:40:06] madhuvishy: thanks for responding to the thread, but I think this discussion is not very useful
[18:40:18] ya
[18:40:31] the folks chiming in have never read our documentation, they will never read our documentation, and they are just speaking from a theoretical point of view of a marginal use case
[18:40:37] yeah
[18:40:52] our documentation should meet the needs of people that don't know about our world. We'll continue to make it better for those people
[18:41:31] yup!
[18:41:57] k, a-team, see above regarding the documentation discussion. I'm not going to respond on that thread, it's not constructive
[18:47:27] milimetric, btw, are you planning to work on the wikimetrics changes this quarter?
[18:47:50] mforns: yes, we roughly scheduled them to start next week
[18:48:26] I'm going to show Amanda's partner how to develop on wikimetrics, see if he wants to help too
[18:48:39] but I'll make tasks and you can join if you like
[18:49:05] ok
[18:50:00] milimetric, is anyone else willing to work on this?
[18:51:16] I donno :)
[18:51:27] I was thinking of a personal goal for this quarter
[18:51:37] I can only answer for myself - and I am lazy so I like easy tasks that have really excited stakeholders
[18:51:55] hehe
[18:52:07] so I'm happy to work on this. I'm not sure what my personal goals will be, but if you want to work on this, definitely add it as your personal
[18:52:22] if you want, we can chat for a bit so you know what's involved
[18:52:32] mmm, I'll think about this
[18:52:39] k, lemme know
[18:52:45] k, thanks :]
[18:55:38] I'm committed to the puppet changes
[18:55:52] For wikimetrics
[18:56:08] aha
[18:56:11] Happy to help with the other changes too
[19:13:13] nuria, do you think we should remove spiders from the browser usage report?
[19:21:10] mforns, I am not nuria, but given the number of questions I've got about spider traffic counts: no ;p
[19:21:35] hey Ironholds, thanks for chiming in :]
[19:21:41] cool, I'll keep them
[19:21:49] I dunno, wait for nuria, I may also be an edge case :D
[19:22:01] ok
[19:22:03] :]
[19:47:03] Analytics-Cluster, Analytics-Kanban, Easy: PM sees reports on browsers (Weekly or Daily) [8 pts] - https://phabricator.wikimedia.org/T88504#1729092 (mforns) How about something like this for mobile (mobile web + mobile app): ``` os browser view_count iOS 9 Mobile Sa...
[19:56:56] Ironholds: hola
[19:57:02] * Ironholds waves
[19:57:18] mforns: yes, as the browser report is for developers
[19:57:25] mforns: to be efficient triaging bugs
[19:57:25] nuria, ok
[19:57:32] mforns: it does not represent traffic
[19:57:49] Ironholds: we are going to have maybe a better proxy for spiders soon
[19:57:56] Ironholds: if it works i will let you know
[19:58:08] nuria, so, should we include the os's and browser's major version?
[19:58:14] Ironholds: will document it and yarayara
[19:58:39] mforns: let me see again the output of ua-parser
[19:58:54] nuria, there's an example in the task
[19:59:05] https://phabricator.wikimedia.org/T88504
[19:59:33] mforns: aham
[19:59:34] the thing is, like this, the long tail is quite long: 3600 records
[19:59:55] mforns: yes, that is to be expected with mobile fragmentation
[20:00:16] ok, so I'll keep it like this
[20:00:32] so "browser_family" "os_family" "browser_version" "os_version"
[20:00:49] browser_major that is
[20:01:03] https://www.irccloud.com/pastebin/kTAGiA2Z/
[20:01:33] mforns: in the case above
[20:01:55] we will not be reporting device or os-minor or app-version
[20:01:57] nuria, cool!
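The pastebin above isn't preserved here, but for reference, a small Python sketch of the field structure ua-parser exposes (assuming the `ua-parser` Python package behaves like the parser being discussed; the user agent string is just a made-up example): the report columns map onto these family/major values.

```python
# Quick local look at ua-parser output (pip install ua-parser).
from ua_parser import user_agent_parser

ua = ('Mozilla/5.0 (iPhone; CPU iPhone OS 9_0 like Mac OS X) '
      'AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 '
      'Mobile/13A344 Safari/601.1')

parsed = user_agent_parser.Parse(ua)
# Parse() returns nested dicts: 'user_agent', 'os', 'device'.
print(parsed['user_agent']['family'], parsed['user_agent']['major'])  # expected: Mobile Safari 9
print(parsed['os']['family'], parsed['os']['major'])                  # expected: iOS 9
```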
[20:02:10] ok
[20:02:10] we might need to add os-minor later for mobile
[20:02:54] mforns: i would not include mobile app
[20:03:15] mforns: cause in that case what matters are different fields
[20:03:26] mforns: want me to comment on task?
[20:03:28] nuria, aha
[20:03:48] nuria, no it's not needed
[20:04:27] mforns: in this case traffic is tiny compared to desktop but it is really a different report in which browser versions are not so meaningful without app version
[20:05:02] nuria, aha
[20:05:32] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[20:06:01] Analytics-Cluster, Analytics-Kanban, Easy: PM sees reports on browsers (Weekly or Daily) [8 pts] - https://phabricator.wikimedia.org/T88504#1729157 (Nuria) Two changes: 1) let's not include mobile app, just mobile web on the mobile report and mobile+desktop in the overall. 2) let's report percentages
[20:06:09] mforns: also percentages better than views
[20:06:21] mforns: as otherwise devs are going to have to calculate those
[20:06:40] nuria, is this a report to be read by humans then?
[20:06:44] mforns: it is to be understood that (because we are throwing away the longtail) it will not add up to 100%
[20:06:51] mforns: yes, by devs
[20:07:12] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[20:07:23] nuria, oh, we are throwing away the long tail? so where is the threshold?
[20:07:48] mforns: so far i have used less than 0.5%
[20:07:55] mforns: let me verify
[20:08:10] if the report is to be read by humans, better to concatenate all the os and browser info
[20:08:21] ok
[20:08:34] well, it will be a comma (or tab) separated file
[20:09:09] leaving os and browser separated allows for easier pivots in Excel if someone wants to import it, this is what they do now
[20:09:19] oh I see
[20:09:27] Krinkle is offline but he'll get messaged by the ticket
[20:09:32] cool
[20:12:28] mforns: i think we did 0.05%
[20:12:36] nuria, ok
[20:12:40] will do
[20:12:41] mforns: but let's run the reports and see
[20:12:49] ok
[20:57:10] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1729350 (jgbarah) >>! In T89135#1720801, @Aklapper wrote: >>>! In T89135#1720707, @ashitaprasad wrote: >> Can you please guide me to gain some traction and contribute to this p...
[21:02:02] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1729370 (jgbarah) > Yes, if I remove print (pageid+" "+ namespace + " " + title+" was already in the db") and replace it with print (pageid+" was already in the db"), I don't get this problem. T...
[21:04:40] mforns: let me know if you run into oozie trouble
[21:04:46] nuria, sure
[21:05:40] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1729377 (jgbarah) >>! In T114437#1728837, @Anmolkalia wrote: > Hi, @jgbarah, I am getting that error if I run the code again, basically when the database already contains the values that are bei...
[21:05:53] I was going to ask madhuvishy because we combined this way in standup, but I'm still improving the query, when I come to oozie, I'll ask either of you, thx
[21:06:02] nuria, ^
[21:06:20] mforns: sure!
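A hedged HiveQL sketch of the kind of query being discussed, assuming the `wmf.pageview_hourly` schema (its `user_agent_map` and `access_method` fields and year/month/day partitions; the real report job may name things differently), with percentages instead of raw counts and the 0.05% long-tail cutoff from the conversation above:

```sql
-- Sketch only: assumed schema, one example day, mobile web + desktop.
WITH per_agent AS (
  SELECT
    user_agent_map['os_family']      AS os_family,
    user_agent_map['os_major']       AS os_major,
    user_agent_map['browser_family'] AS browser_family,
    user_agent_map['browser_major']  AS browser_major,
    SUM(view_count) AS views
  FROM wmf.pageview_hourly
  WHERE year = 2015 AND month = 10 AND day = 14
    AND access_method IN ('desktop', 'mobile web')   -- no mobile app, per the task
  GROUP BY
    user_agent_map['os_family'],
    user_agent_map['os_major'],
    user_agent_map['browser_family'],
    user_agent_map['browser_major']
),
total AS (
  SELECT SUM(views) AS all_views FROM per_agent
)
-- Percentages won't add up to 100% because the long tail is dropped.
SELECT
  os_family, os_major, browser_family, browser_major,
  ROUND(100 * views / all_views, 2) AS percent_views
FROM per_agent CROSS JOIN total
WHERE 100 * views / all_views >= 0.05
ORDER BY percent_views DESC;
```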
[21:06:48] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1729384 (jgbarah) >>! In T89135#1714299, @Fhocutt wrote: > This looks interesting, and I can help with exploring and working with the MediaWiki API. Please, go ahead!
[21:07:04] mforns, madhuvishy we should probably have a generic workflow for the browser report as in "run this query and dump to a file"
[21:07:19] yeah
[21:07:34] we are gonna append to the end of a file right?
[21:07:35] mforns, madhuvishy w/o the parameterization that the other workflows have
[21:07:55] mmhm
[21:08:03] nuria: boy I wish I hadn't deleted those meetings. This was quite an outage :/
[21:08:04] nuria: it will still need time params no?
[21:08:08] madhuvishy: if i remember appending was hard, it was easier to "create" a weekly file
[21:08:23] data's randomly missing from the whole time the consumer was down
[21:08:25] madhuvishy: yes, but only those
[21:08:37] milimetric: niceee
[21:08:47] milimetric, mmm :/
[21:08:48] mforns: oh!
[21:08:52] madhuvishy: do you know how long data's going to stick around in that kafka topic? The mixed one that mysql consumer consumes?
[21:08:54] madhuvishy, ?
[21:08:59] 7 days
[21:09:08] milimetric: but how...could it be "randomly" missing?
[21:09:14] not sure
[21:09:20] uhhh mforns sorry that was supposed to be for milimetric
[21:09:23] that can only be explained if the duplicate stuff is not deployed
[21:09:26] it's weird, I looked at 3 different schemas and they're all the same
[21:09:30] :]
[21:09:39] so some of the inserts are failing
[21:09:48] like you were saying yesterday, yea
[21:09:57] i'll check puppet
[21:10:05] aha makes sense
[21:10:17] milimetric: you can check the code on /srv, let me see what machine this was deployed to
[21:10:31] nuria: eventlog1001
[21:10:34] nah, replace=true
[21:10:34] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/eventlogging.pp#L233
[21:10:45] eventlog1001.eqiad.wmnet, yep
[21:11:32] yep, checked the config on el1001, it's there, replace=True
[21:11:34] weeeeird
[21:11:58] milimetric: indeed
[21:12:13] andrew and I tested the replace=True very carefully, it did what we figured it would
[21:12:29] milimetric, I have another theory
[21:12:32] k :)
[21:13:00] mforns: let me guess
[21:13:12] underpants gnomes!
[21:13:14] mforns: consumer is starting->inserting
[21:13:23] blowing up with oom
[21:13:26] the thread that was consuming from kafka and queueing for insertion was faster than the thread actually inserting
[21:13:29] starting->inserting again
[21:13:40] right, causing OOMs
[21:13:58] i'll save the log from 10.14 so we can check that
[21:14:17] mmmmm, I was thinking that the queue got too big and the consumer restarted several times, losing what was in the queues
[21:14:43] mforns: which queue is this?
[21:14:54] well if it is too big it'll die with oom, right?
[21:15:05] k, saved logs to /home/milimetric on el1001
[21:15:06] madhuvishy, the queue where all events get queued for sql insertion
[21:15:15] mm hmm
[21:15:26] nuria, yes! I was figuring out what oom was :]
[21:17:21] mforns: let's check the kern log
[21:17:34] mforns, milimetric : what utc time was the outage?
[21:17:52] nuria, yes, the kern log right
[21:17:55] 06:48 UTC to 17:03 UTC
[21:18:03] yesterday 14th?
[21:18:05] that's how long we see insertAttempted = 0
[21:18:29] however, after 17:03 it started attempting a lot of inserts, but never ended up inserting enough to catch up
[21:18:39] if the theory is true, the ooms should start at +-15:30 UTC
[21:19:00] Oct 14 15:52:25 eventlog1001 kernel: [19872077.612410] eventlogging-pr invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[21:19:11] this is the processor
[21:19:12] ta -ta-channnn
[21:19:15] ay ay a
[21:19:16] sorry, 15:03, sorry, not 17:03
[21:19:23] it should be eventlogging-co
[21:19:33] right -co
[21:19:58] Oct 14 15:52:25 eventlog1001 kernel: [19872077.613003] Out of memory: Kill process 10512 (eventlogging-co) score 867 or sacrifice child
[21:19:59] interesting though..... processor died?!
[21:20:16] there you go
[21:20:23] https://www.irccloud.com/pastebin/NfWZhiHK/
[21:21:04] yeah, I was afraid that would happen, I wanted to restart the consumer too but figured it couldn't keep up
[21:21:11] to fully confirm the theory we should have several consumer kills
[21:21:17] during a couple hours
[21:21:31] well, this one death could've killed a TON of events
[21:21:45] so we don't necessarily need multiple deaths
[21:22:06] for it to OOM that big machine with just text :)
[21:22:39] the offsets probably got committed to kafka though and the messages were never re-inserted
[21:22:42] anyway, we have to grab the kafka topic, and re-consume from the offset before 06:00
[21:22:51] has anyone thought about how to do that yet ^?
[21:22:56] I see, but wouldn't just one kill create a hole in the db? Kafka delivers events in order, right?
[21:23:05] like can I start another consumer with a specific offset?
[21:23:36] hmmm, i don't think you can start with specific offset
[21:23:44] mmm
[21:23:59] mforns: yeah, you're right, some data trickles in. hm... maybe that's from the other thread that inserts what's left in the queue after the worker dies?
[21:24:12] we should also modify the eventlogging mysql consumer code to block the kafka reader thread if the queue is too big
[21:24:34] yeah, that makes sense
[21:24:48] mforns: bounded blocking queues
[21:24:57] madhuvishy, aha
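A minimal sketch of the bounded blocking queue idea, with hypothetical names rather than the actual eventlogging module layout: giving the queue a maxsize makes put() block, so the Kafka reader thread gets back-pressure instead of growing the queue until the OOM killer fires.

```python
# Sketch, not the actual eventlogging code (Python 2, matching the
# service at the time). A bounded queue sits between the Kafka reader
# thread and the MySQL inserter thread; put() blocks when the inserter
# falls behind, instead of letting memory grow unbounded.
import threading
import Queue

MAX_QUEUED_EVENTS = 10000  # assumption: tune to memory and batch size
events = Queue.Queue(maxsize=MAX_QUEUED_EVENTS)

def kafka_reader(consume):
    """Reader thread: consume() is a stand-in yielding decoded events."""
    for event in consume():
        events.put(event, block=True)  # back-pressure happens here

def mysql_inserter(insert_batch, batch_size=400):
    """Inserter thread: drain the queue and insert in batches."""
    batch = []
    while True:
        batch.append(events.get())
        if len(batch) >= batch_size:
            insert_batch(batch)
            batch = []

# Hypothetical wiring:
# threading.Thread(target=kafka_reader, args=(my_kafka_consumer,)).start()
# threading.Thread(target=mysql_inserter, args=(my_insert_fn,)).start()
```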
[21:26:31] i wanna try to finish that restbase work, if we have this data for 7 days I'd like to wait until Andrew is around
[21:26:43] milimetric: you can manually pipe the events into the consumer?
[21:26:51] madhuvishy, maybe for now, would be easier to just add a sleep(1) in the kafka-reader thread if the queue is too large
[21:26:54] like, we can use kafkacat
[21:27:04] and read from that using specific offset
[21:27:05] that's a good idea, like backfilling
[21:27:05] oh... and consume from stdin?
[21:27:09] and pipe to consumer
[21:27:21] yeah, kafkacat will let you specify offset
[21:27:30] that makes sense
[21:27:46] our kafka reader handler doesn't do that as far as i know, there's just latest and earliest
[21:28:14] right. I'm not sure how to use kafkacat, investigating
[21:28:26] kafkacat -o -5 -t eventlogging_Blah -b kafka1012:9092
[21:28:42] milimetric: ^
[21:28:44] hm, but it's not on el1001
[21:28:55] hmmm
[21:29:02] stat1002 has it
[21:29:19] right, but it doesn't have access to the mysql machines the consumer needs
[21:29:22] yup
[21:30:12] milimetric: where are the mysql machines?
[21:30:23] stat1003?
[21:30:44] no, i'm not sure how they're set up but they're not reachable from the stat machines
[21:31:03] they're called m4-master.eqiad
[21:31:11] hmmm
[21:31:37] conspicuously completely absent from puppet: https://github.com/wikimedia/operations-puppet/search?utf8=%E2%9C%93&q=m4-master
[21:31:58] secret machines!
[21:32:00] data holes!
[21:32:04] our life is fun :)
[21:32:05] xD
[21:32:31] milimetric: hmmm
[21:33:02] you know, though, we could kafkacat from kafka and just save this period of events to a file
[21:33:11] then we can copy that manually
[21:33:17] we don't have to pipe everything all fancy
[21:33:24] yeah if you want to be safe for now
[21:34:03] and then we can pipe the file to consumer
[21:34:16] yea, but then it would have the same problem
[21:34:20] aha, I think we have around 702 MB per hour uncompressed
[21:34:21] we need a sleep like mforns said
[21:34:57] yeah
[21:35:13] milimetric, if you want I can do that
[21:35:54] mforns: i'd love to test it on one of those test ana machines
[21:36:07] do we still have an1004 or was it repurposed?
[21:36:25] no idea
[21:38:20] mforns: yeah, if you want you can set that up, I'll try to kafkacat this stuff out
[21:38:25] ok
[21:45:42] madhuvishy: this offset parameter seems random :)
[21:45:56] milimetric: why?
[21:46:01] I'm trying all kinds of values for -o and the timestamps of the events are all within a few minutes of now
[21:46:16] everything from 1 to -5000000
[21:46:33] milimetric: hmmm
[21:47:08] but I can do -o beginning and that gets me 7 days ago....
[21:47:33] milimetric: interesting
[21:47:55] i have never tried more than last 10 messages and it's worked fine for that
[21:48:07] there's an option called -c
[21:48:13] to limit count of messages
[21:48:21] wonder if that's in play somehow
[21:48:39] but you mention it's fine if you start at the beginning so dunno
[21:49:39] yeah, I did -c 2
[21:50:17] you pass just a normal integer to -o, and I'd expect the first offset to be 0 or 1 right?
[21:50:30] and then for them to increase as we go.... oh, hm, that wouldn't make sense
[21:50:41] 'cause the "first" message would constantly be different
[21:51:13] is there a way to get the offset for a particular message? the new versions of kafkacat seem to have a format -f param, but ours doesn't
[21:53:15] kaldari: the api is not going to get a public API this week
[21:53:34] kaldari: but next week for sure, or I'll just run it out of my basement or something
[21:53:39] milimetric: there's an offset associated with each message
[21:54:02] right, question is, how do I find out what offset to try :)
[21:55:00] milimetric: -O
[21:55:11] -O gives you the offset along with the message
[21:55:19] aha
[21:55:20] !
[21:56:14] but yeah, in this case it's a little bit of a pain to figure out which offset
[21:57:51] milimetric: piping to file would be like reading from the all-events log right?
[21:58:28] milimetric: cause we have 1 topic per schema so it is likely to be tedious
[21:58:55] no, we have valid-mixed
[21:58:59] that's what the consumer is grabbing
[21:59:50] nuria: the valid-mixed topic only has events that are going to mysql, it skips the ones that are blacklisted (these are only consumed in hadoop now)
[21:59:51] madhuvishy: do you know if pykafka lets you specify an offset to start consuming from? We might want to make the kafka-reader capable of taking that parameter in
[22:00:03] milimetric: I don't think so
[22:00:08] k, sok
[22:00:09] i'll check
[22:00:23] milimetric: ok, it seems that kafkacat will make you issue commands per topic, so it's going to be a lot of commands
[22:00:45] milimetric: but there is an only-valid-events log
[22:00:48] close enough! I got a decent offset:
[22:00:58] kafkacat -o 48850000 -O -c 2 -t eventlogging-valid-mixed -b kafka1012:9092
[22:01:21] nuria: don't have to do per topic because valid-mixed topic has everything
[22:01:57] milimetric: cool
[22:02:06] milimetric: ok, once you get it to work, document it here: https://wikitech.wikimedia.org/wiki/EventLogging/Backfilling
[22:02:32] would be awesome so we do not have to repeat ourselves
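For that wiki page, a sketch of the save-to-a-file plan, using only the kafkacat flags that appear in this log (-b, -t, -o, -c); the offsets and count are this incident's values, and the window's timestamps should be spot-checked before replaying into the consumer:

```bash
# Sketch for the backfill doc: dump a window of eventlogging-valid-mixed
# to a file on a host that has kafkacat (e.g. stat1002), then move the
# file somewhere that can reach m4-master and pipe it into the mysql
# consumer's stdin. Offsets and count below are from this incident.
START_OFFSET=48850000   # first offset around 05:50 UTC (found via -O)
NUM_MESSAGES=1150000    # approximate window size; verify before replaying

kafkacat -b kafka1012:9092 -t eventlogging-valid-mixed \
  -o "$START_OFFSET" -c "$NUM_MESSAGES" > backfill-2015-10-14.json

# Spot-check that the first and last events bracket the outage window:
head -n 1 backfill-2015-10-14.json
tail -n 1 backfill-2015-10-14.json
```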
[22:02:57] milimetric: Thanks for the update. We were finally able to get in touch with the maintainer of the popular pages reports, so that might not be blocked after all. Let me know when you have a better idea of when it will go live.
[22:03:23] that's good news kaldari, you'll be the first to know when it's available
[22:04:21] the only-valid-events.log would work too, but it's more of a pain in the butt to slice the right piece out of that
[22:05:24] milimetric: sure
[22:05:55] milimetric, do we have a task to associate the gerrit change to?
[22:06:14] sorry :) making
[22:06:25] milimetric, oh thanks!
[22:07:44] Analytics-Kanban: Enable EL consumer to deal with a lot of pressure from kafka {oryx} - https://phabricator.wikimedia.org/T115667#1729697 (Milimetric) NEW
[22:07:48] ^
[22:10:22] mforns: ^
[22:10:29] milimetric, oh sorry, I should have created that task instead of bothering you :] I thought you had a task for the incident
[22:10:59] I'll change it and assign it to you
[22:11:14] it's ok, I forgive you :)
[22:11:18] xD
[22:12:01] I was going to make a joke that you owe me and I just remembered I owe you a chess game
[22:12:03] from Mexico
[22:12:16] I have the worst / weirdest debt memory, sorry about that
[22:12:31] mmm we can play on chess.com
[22:15:19] mforns: what's your username?
[22:15:25] marcelrf
[22:15:32] milimetric, ^
[22:17:19] (added)
[22:17:28] btw guys, kafkacat is A BEAST
[22:17:36] it consumed 1.1 million messages in like two seconds
[22:17:51] makes sense the consumer ran out of memory :)
[22:18:38] cool!
[22:21:49] Analytics-Kanban: Incident: EventLogging mysql consumer stopped consuming from kafka {oryx} - https://phabricator.wikimedia.org/T115667#1729727 (mforns) a:Milimetric
[22:24:09] https://gerrit.wikimedia.org/r/246796
[22:24:39] milimetric, madhuvishy, ^
[22:24:47] I haven't tested that yet
[22:28:43] Analytics-Tech-community-metrics, Research consulting, Research-and-Data: Quantifying the "sum of all contributors" - https://phabricator.wikimedia.org/T113406#1729751 (DarTar) @ezachte: any progress on this? Let's talk briefly during our 1:1 tomorrow.
[22:30:57] hey a-team, going to sign off, see ya tomorrow
[22:31:02] nite mforns
[22:31:08] hasta luego
[22:31:20] adeu :]
[22:31:29] Analytics-Backlog, Analytics-Cluster: Use Burrow for Kafka Consumer offset lag monitoring - https://phabricator.wikimedia.org/T115669#1729754 (Ottomata) NEW a:Ottomata
[22:32:49] something's really weird
[22:33:19] if I look at offset 50000000 its timestamp is around 22:00
[22:33:38] if I look at offset 48850000 its timestamp is around 05:50
[22:33:41] so that's the range I want
[22:33:50] but if I subtract, I get 1150000
[22:34:03] and then consume that many events, the last one I have is from 08:00
[22:34:09] which... makes no sense
[22:47:04] k, signing off, too weird, need other brains to bounce tomorrow
[22:54:22] madhuvishy: i can't seem to find it in my email, where was i supposed to let you know about the updated schema? (finished off the patches on our end, after review should be good for a deploy next week)
[23:56:56] Analytics-EventLogging, Editing-Department, Improving access, Reading Web Planning, discovery-system: Schema changes - https://phabricator.wikimedia.org/T114164#1729891 (Jdlrobson)