Possible bug in ExtractArticleConcepts


I encountered a strange problem when using the OpenTextSummarizer on short messages: the key concepts were coming back truncated. For example, I'd get concepts like "financ" and "statu" instead of "finance" and "status".
I traced the problem to the ExtractArticleConcepts method in Grader.cs. In lines 55-62, it decides what to do depending on whether the source article has more than 5 important words. If there are more than 5, it creates a short list of the words whose frequency is greater than or equal to a base frequency. But if there are 5 or fewer important words, it returns a list of the STEMS of these words.
I can't imagine why we'd want whole word concepts in the first case and only the stems in the second. This looks like a bug to me.
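
To make the reported behaviour concrete, here is a minimal sketch in Python. It is not the actual Grader.cs code: the `Word` type, both function names, the `threshold` parameter and the way `base_frequency` is passed in are all assumptions made for illustration. The first function mirrors the branch as described above, and the second shows one possible consistent version that returns whole words in both cases.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical stand-in for an 'important word' entry."""
    value: str      # whole word, e.g. "finance"
    stem: str       # stemmed form, e.g. "financ"
    frequency: int  # number of occurrences in the article


def extract_concepts_as_reported(words: List[Word], base_frequency: int,
                                 threshold: int = 5) -> List[str]:
    """Behaviour as described in the report (an approximation, not the real code)."""
    if len(words) > threshold:
        # Long article: whole words at or above the base frequency.
        return [w.value for w in words if w.frequency >= base_frequency]
    # Short article: stems are returned, which is what truncates the concepts.
    return [w.stem for w in words]


def extract_concepts_consistent(words: List[Word], base_frequency: int,
                                threshold: int = 5) -> List[str]:
    """One possible fix (an assumption): return whole words in both branches."""
    if len(words) > threshold:
        return [w.value for w in words if w.frequency >= base_frequency]
    return [w.value for w in words]


if __name__ == "__main__":
    short_message = [Word("finance", "financ", 3), Word("status", "statu", 2)]
    print(extract_concepts_as_reported(short_message, base_frequency=2))
    # prints ['financ', 'statu']
    print(extract_concepts_consistent(short_message, base_frequency=2))
    # prints ['finance', 'status']
```

With only two important words, the short-article branch is taken, so the first function reproduces the truncated "financ" and "statu" concepts while the second returns the whole words.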
Closed Jan 30, 2011 at 6:52 PM by PatrickBurrows


CleverHuman wrote Jan 30, 2011 at 4:19 AM

I agree. I will take a look at this tomorrow.
