Stable version

May 17, 2009 at 4:47 PM
Edited May 17, 2009 at 5:47 PM

Hello Patrick,

First of all, congratulations on the noble effort of migrating this great tool from C to C#, and most of all on packaging it as a DLL.

My question is whether you consider the current source in source control to be stable, and whether you would create a release from it.

Thanks

Dec 17, 2009 at 9:31 PM

We're using the OpenTextSummarizer in our project as well. I second the motion for an updated stable release!

 

Coordinator
Jan 30, 2011 at 3:22 AM

Hi. Sorry I haven't checked back in here in the last couple years!

Yes, I consider the OTS source code a stable release -- I've been using it for years. If I get time tomorrow I'll go ahead and package it up for download.

Jan 30, 2011 at 5:38 PM

Thanks, Patrick. The OpenTextSummarizer has been working really well for us.

I think having an official release will encourage others to use the project, too.

Coordinator
Jan 30, 2011 at 5:57 PM

Added a stable version release to the downloads section.

Jan 31, 2011 at 4:50 AM

Hi, Patrick,

Thanks for posting this. A couple of quick questions:

1. Do you have a small snippet of sample code showing how this is supposed to be used? What I figured out from playing with the namespace is:

        SummarizerArguments sa = new SummarizerArguments();
        sa.InputString = "this is a test sentence.  What a lovely, but trivial, sentence.";
        SummarizedDocument sd = Summarizer.Summarize(sa);

This fills in the 'concepts' and 'sentences' arrays inside the summarized document.

2. In the above example, I'm ending up with ONE sentence, even though there are 2 sentences.  What is the sentence delimiter?

3. The Linux OTS code has a lot of 'stuff' behind the summary, i.e. word stemming, word scoring, sentence scoring, etc. Is there any way to tap into any of that information in your version? Alternatively, any chance of releasing the source so this can be extended to expose this information?

Thanks!

 

Coordinator
Jan 31, 2011 at 2:42 PM

If you get the source, there are two sample projects in there: one is a WinForm app (called OTSApp) and one is a simple command line test app called OTSTester. There is sample code in both of those apps.

Here is the sample code from the OTSApp:

http://ots.codeplex.com/SourceControl/changeset/view/67575#1753717

One thing it does that your code does not is set the "DisplayLines" property of the SummarizerArguments object.

You must set either "DisplayLines" or "DisplayPercent" to get any real results back. 

If you look at the code for the highlighter, you will see the first line is:

if (args.DisplayPercent == 0 && args.DisplayLines == 0) return;

That means you won't get any sentences back when both are zero. If you are only getting one sentence back, you likely have DisplayLines set to 1.
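
A rough, untested sketch of setting DisplayLines before calling Summarize. It assumes the classes live in an OpenTextSummarizer namespace and that the 'sentences' array is exposed as a Sentences collection of strings; check both names against the source before relying on them:

```csharp
using System;
using OpenTextSummarizer; // assumed namespace; adjust to match the OTS assembly you reference

class DisplayLinesExample
{
    static void Main()
    {
        SummarizerArguments sa = new SummarizerArguments();
        sa.InputString = "This is a test sentence. What a lovely, but trivial, sentence.";

        // Without DisplayLines or DisplayPercent the highlighter returns early
        // and no sentences are selected, so ask for up to two lines here.
        sa.DisplayLines = 2;

        SummarizedDocument sd = Summarizer.Summarize(sa);

        // Sentences is an assumed property name for the 'sentences' array.
        foreach (string sentence in sd.Sentences)
        {
            Console.WriteLine(sentence);
        }
    }
}
```

Setting DisplayPercent instead of DisplayLines should satisfy the same check in the highlighter.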

 

 

As for getting back the stems, word scores, and the scores for the sentences, that is certainly easy enough.
If you change the Article class to public, you can just have the Summarizer.Summarize function skip the
CreateSummarizedDocument call and return the Article. Here is a sample Summarize function which would do that (not tested):

        public static Article Summarize(SummarizerArguments args)
        {
            if (args == null) return null;
            Article article;
            // Use the in-memory string when it is set and no input file is given;
            // the null-safe checks avoid an exception if either property is unset.
            if (!string.IsNullOrEmpty(args.InputString) && string.IsNullOrEmpty(args.InputFile))
            {
                article = ParseDocument(args.InputString, args);
            }
            else
            {
                article = ParseFile(args.InputFile, args);
            }
            Grader.Grade(article);
            Highlighter.Highlight(article, args);
            // Return the graded Article itself instead of passing it through
            // CreateSummarizedDocument(article, args).
            return article;
        }

Out of curiosity, what do you want to do with the stem and score data?

Jan 31, 2011 at 3:09 PM
Thanks, Patrick!
I'm doing a text compare across sets of text, so the stemming is critical to get an apples-to-apples compare. It is also helpful when doing synonym lookups (which are expensive in terms of time). By checking my local cache for the stemmed words, I can save a synonym lookup if I've already looked up a similar word.
On the scoring, I need to have some additional factors that influence the weight of any given word in any given sentence or paragraph - think 'domain-specific rulesets'. By opening up the 'black box' a bit, I can inject additional logic into the process before getting a final summarization to get a more meaningful targeted result.
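
A minimal sketch of the stem-keyed synonym cache described above. StemSynonymCache and both delegates are hypothetical names for illustration, not part of OTS; the stemmer and the synonym service are whatever you plug in:

```csharp
using System;
using System.Collections.Generic;

// Caches synonym lists by word stem so that inflected forms of the same word
// share a single (expensive) lookup. Both delegates are hypothetical hooks.
public class StemSynonymCache
{
    private readonly Func<string, string> stem;                   // word -> stem
    private readonly Func<string, IReadOnlyList<string>> lookup;  // stem -> synonyms
    private readonly Dictionary<string, IReadOnlyList<string>> cache =
        new Dictionary<string, IReadOnlyList<string>>(StringComparer.OrdinalIgnoreCase);

    public StemSynonymCache(Func<string, string> stem,
                            Func<string, IReadOnlyList<string>> lookup)
    {
        this.stem = stem ?? throw new ArgumentNullException(nameof(stem));
        this.lookup = lookup ?? throw new ArgumentNullException(nameof(lookup));
    }

    // Number of times the expensive lookup was actually invoked.
    public int LookupCount { get; private set; }

    public IReadOnlyList<string> GetSynonyms(string word)
    {
        string key = stem(word);
        if (!cache.TryGetValue(key, out IReadOnlyList<string> synonyms))
        {
            // Cache miss: pay for the lookup once and remember it under the stem.
            synonyms = lookup(key);
            cache[key] = synonyms;
            LookupCount++;
        }
        return synonyms;
    }
}
```

With a toy stemmer such as `w => w.ToLowerInvariant().TrimEnd('s')`, asking for "sentence" and then "Sentences" triggers only one lookup, because both words reduce to the same stem.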
Thanks again for doing this port to .NET - you've given a great gift to the community. :)
Paul

Paul Scivetti


Jul 30, 2011 at 6:05 AM

Hi,

Is there any documentation for the Open Text Summarizer? I would like to know which algorithms you used. Please help.