Final report
A. Short description of the function of each subroutine

1.preprocessing()

input:   raw Penn Treebank file

output:  prefix.new01, containing each sentence on one line, terminated by exactly one newline

Function:
Removes unnecessary whitespace and newlines within every sentence, as well as the explanations at the beginning of the file.
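A minimal Python sketch of this step follows. It assumes that the explanatory header lines start with `*x*` (as in the WSJ `.mrg` files) and that words never contain raw parentheses (the Treebank writes them as `-LRB-`/`-RRB-`). The function name `preprocess` is hypothetical, not the project's own code.

```python
import re

def preprocess(lines):
    """Return one whitespace-normalised tree string per sentence.

    Hypothetical helper: header lines are assumed to start with '*x*'.
    """
    sentences, buf, depth = [], [], 0
    for line in lines:
        if depth == 0 and line.startswith("*x*"):
            continue                              # skip the explanatory file header
        for ch in line:
            if depth == 0 and ch != "(":
                continue                          # ignore blank lines/text between trees
            buf.append(ch)
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:                    # a complete sentence tree was read
                    text = " ".join("".join(buf).split())      # collapse whitespace/newlines
                    sentences.append(re.sub(r"\s+\)", ")", text))  # no space before ")"
                    buf = []
    return sentences
```

Writing each returned string followed by a single newline gives the one-sentence-per-line layout described for prefix.new01.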

2.puretext()

input:   prefix.new01, containing each sentence on one line, terminated by exactly one newline

output:  prefix.withs, the plain text, but still with the 's, 'm, n't, 'd, 're problems (these clitics are split off from the preceding word)

Function:
Extracts all the terminals (the text) from the Treebank.
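Terminal extraction can be sketched with a regular expression over the one-line trees, since every terminal sits inside a pre-terminal bracket such as `(NNP John)`. The helper name `terminals` and the choice to skip `-NONE-` empty elements (traces) are assumptions, not taken from the report.

```python
import re

# A terminal is the word inside a pre-terminal bracket such as "(NNP John)".
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

def terminals(tree, skip_empty=True):
    """Return the terminal words of one bracketed sentence, left to right.

    Assumption: empty elements (tag -NONE-, e.g. traces like *T*-1) are not
    part of the text and are skipped by default.
    """
    return [word for tag, word in LEAF.findall(tree)
            if not (skip_empty and tag == "-NONE-")]
```

Joining the result with spaces, e.g. `" ".join(terminals(tree))`, yields one line of text in which clitics such as `n't` are still separate tokens.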


3.sproblem()

input:   prefix.withs 

output:  prefix.pure, without the 's, 'm, n't, 'd, 're problems

Function:
Cleans up those 's, 'm, n't, 'd, 're problems.
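One plausible way to do this cleanup is sketched below, under the assumption that "cleaning" means gluing each split-off clitic back onto the preceding token. Only the five clitics named above are handled, and `reattach_clitics` is a hypothetical name.

```python
# Clitics that the Treebank tokenisation splits off the preceding word.
CLITICS = {"'s", "'m", "n't", "'d", "'re"}

def reattach_clitics(tokens):
    """Glue each clitic token back onto the word before it."""
    out = []
    for tok in tokens:
        if out and tok.lower() in CLITICS:
            out[-1] += tok          # e.g. "do" + "n't" -> "don't"
        else:
            out.append(tok)         # ordinary word, or a clitic with nothing before it
    return out
```

For example, `reattach_clitics("I do n't think John 's car is here".split())` gives `["I", "don't", "think", "John's", "car", "is", "here"]`.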

4.terminalcount()

input:   prefix.withs 

output:  prefix.pure, without the 's, 'm, n't, 'd, 're problems

Function:
Counts the positions of the terminals.
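A sketch of terminal numbering, assuming positions are 1-based word indexes within the sentence and are written as a hypothetical `@N` suffix on each terminal. The report does not specify the actual format, and `number_terminals` is an illustrative name.

```python
import re
from itertools import count

LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

def number_terminals(tree, start=1):
    """Append a position marker '@N' to every terminal of a bracketed tree.

    '@N' and 1-based numbering are illustrative choices; empty elements
    (-NONE-) are assumed to have no position in the text.
    """
    positions = count(start)

    def annotate(m):
        tag, word = m.group(1), m.group(2)
        if tag == "-NONE-":
            return m.group(0)                 # leave empty elements unnumbered
        return f"({tag} {word}@{next(positions)})"

    return LEAF.sub(annotate, tree)
```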

5.phasenprint()

input:   prefix.new01

output:  all the phrases with their position information

Function:
Prints all the phrases with their position information.
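The phrase listing can be sketched by parsing each tree and recording, for every phrase, the span of terminal positions it covers. The tuple layout `(label, begin, end, text)`, the 1-based inclusive positions and the pre-order numbering are illustrative assumptions, as are the names `parse` and `phrases`.

```python
import re

def parse(s):
    """Parse a bracketed Treebank tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                              # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):     # the outermost bracket has no label
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a terminal word
                pos += 1
        pos += 1                              # consume ")"
        return (label, children)

    return read()

def phrases(tree):
    """List (label, begin, end, text) for every phrase, in pre-order.

    begin/end are 1-based, inclusive terminal positions; pre-terminals such
    as (NN dog) are not counted as phrases.
    """
    result = []
    next_pos = 1

    def walk(node):
        nonlocal next_pos
        label, children = node
        is_phrase = bool(label) and any(not isinstance(c, str) for c in children)
        slot = len(result)
        if is_phrase:
            result.append(None)               # reserve the slot to keep pre-order
        begin, words = next_pos, []
        for child in children:
            if isinstance(child, str):
                words.append(child)
                next_pos += 1
            else:
                words.extend(walk(child))
        if is_phrase:
            result[slot] = (label, begin, next_pos - 1, " ".join(words))
        return words

    walk(parse(tree))
    return result
```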


6.goback()

input:   prefix.phrase and prefix.npdepth

output:  the phrases without the terminal position information;
         the npdepth information with phrase indexes

Function:
Removes all the terminal position information, so that the recursive program can later be used to extract the real entities. Also removes the additional information in infile.NPDEPTH.
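If the positions were stored as a hypothetical `@N` suffix on each terminal (an assumed format, not necessarily the one the project used), removing them is a single substitution. The name `strip_positions` is illustrative.

```python
import re

# Hypothetical position marker "@N" at the end of each terminal word.
POSITION = re.compile(r"@\d+(?=\))")

def strip_positions(tree):
    """Remove the terminal position markers, leaving a plain bracketed tree."""
    return POSITION.sub("", tree)
```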

7.subknote()

input:   $infile.back
output:  creates a list of the indexes of the phrases we need

function:

For example, in (NP (NP different parts) (PP of (NP Europe))), both "different parts of Europe" and "Europe" should be marked, since each contributes a distinct discourse entity (DE). However, "different parts" does not contribute a DE, since this subexpression cannot be referred to on its own in subsequent discourse. To avoid finding such undesirable NPs, the system uses a heuristic:

H1: Pass over any NP that is the leftmost child of a top-level NP.

This heuristic is too drastic, though: it also eliminates constructions like (2) (NP (NP the inner brain) and (NP the eyes)). To avoid losing these examples, a second heuristic is added:

H2: H1 does not apply when the NP is a sibling of another NP.

A third heuristic must overrule H1 for the possessor in a possessive construction, such as (3) (NP (NP Chicago's) South Side), where both "Chicago" and "Chicago's South Side" should be extracted:

H3: H1 does not apply when the NP is a possessive form.

Even with these heuristics eliminating the NPs that need not be considered, the system still finds some NPs that cannot be eliminated automatically. Copular constructions such as (4) introduce unnecessary NPs.
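The three heuristics can be sketched as a single tree walk. This is an illustration, not the system's implementation: it assumes "top-level NP" in H1 means the NP's parent, treats an NP as possessive (H3) when its last child is a `POS` pre-terminal, and returns phrase texts instead of indexes. `candidate_nps` and the other helper names are hypothetical.

```python
import re

def parse(s):
    """Parse a bracketed Treebank tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                                      # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                                      # consume ")"
        return (label, children)

    return read()

def is_np(label):
    """NP, including function tags and indices such as NP-SBJ or NP=2."""
    return re.split(r"[-=]", label, maxsplit=1)[0] == "NP"

def leaves(node):
    words = []
    for c in node[1]:
        words.extend([c] if isinstance(c, str) else leaves(c))
    return words

def is_possessive(node):
    """H3 sketch: the NP ends in a POS pre-terminal ('s or ')."""
    last = node[1][-1]
    return not isinstance(last, str) and last[0] == "POS"

def candidate_nps(tree):
    """Return the text of every NP that survives heuristics H1-H3."""
    kept = []

    def walk(node, parent):
        label, children = node
        if is_np(label):
            skip = False
            if parent is not None and is_np(parent[0]):
                siblings = [c for c in parent[1] if not isinstance(c, str) and c is not node]
                h1 = parent[1][0] is node                 # leftmost child of an NP
                h2 = any(is_np(s[0]) for s in siblings)   # sibling of another NP
                h3 = is_possessive(node)                  # possessor
                skip = h1 and not h2 and not h3
            if not skip:
                kept.append(" ".join(leaves(node)))
        for c in children:
            if not isinstance(c, str):
                walk(c, node)

    walk(parse(tree), None)
    return kept
```

On the three example trees (written out with Treebank part-of-speech tags), the sketch keeps "different parts of Europe" and "Europe" but not "different parts"; keeps both conjuncts of (2); and keeps "Chicago 's", including the `POS` token.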

8.npphrase()

input:   prefix.phrase

output:  the NP phrases, each with the begin and end positions of the phrase added at its end

function:

Gets the NP phrases out of all the phrases, extracts the begin and end positions of each phrase we need, and adds them at the end of that phrase.
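A sketch of the NP filter, assuming the phrase list is given as hypothetical `(label, begin, end, text)` tuples and that the output layout `(LABEL text) begin end` is acceptable. One detail worth handling explicitly is that Treebank labels carry function tags and indices (`NP-SBJ`, `NP-PRD`, `NP-SBJ-1`, `NP=2`), so the test compares only the base category.

```python
import re

def is_np(label):
    """NP, including function tags and indices such as NP-SBJ, NP-PRD-1 or NP=2."""
    return re.split(r"[-=]", label, maxsplit=1)[0] == "NP"

def np_phrases(phrases):
    """Keep the NP phrases and append 'begin end' to each one.

    phrases: (label, begin, end, text) tuples (hypothetical input format).
    """
    return [f"({label} {text}) {begin} {end}"
            for label, begin, end, text in phrases if is_np(label)]
```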

9.npprad()

input:   infile.phrasenp
output:  the NP phrases without NP-PRD; list1, the indexes of the phrases we
         don't need; and list3, which must be processed later

function:
Gets rid of the NPs in NP-PRD that we don't need, and produces list1, which contains the indexes of the phrases we don't need.
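A minimal sketch of this split, assuming the NP phrases arrive as hypothetical `(index, label, text)` tuples and reading the description as: every NP-PRD index goes into list1, and the NP-PRD phrases themselves are passed on as list3 for nppad2(). That reading, like the name `split_prd`, is an assumption.

```python
import re

def is_np_prd(label):
    """True for predicate nominals: NP carrying the PRD function tag (NP-PRD, NP-PRD-1, ...)."""
    parts = re.split(r"[-=]", label)
    return parts[0] == "NP" and "PRD" in parts[1:]

def split_prd(nps):
    """nps: (index, label, text) tuples for all NP phrases.

    Returns (kept, list1, list3): the NPs without NP-PRD, the indexes of the
    NP-PRD phrases we drop, and the NP-PRD phrases handed on for later work.
    """
    kept, list1, list3 = [], [], []
    for index, label, text in nps:
        if is_np_prd(label):
            list1.append(index)
            list3.append((index, text))
        else:
            kept.append((index, label, text))
    return kept, list1, list3
```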

Work package 4: some exceptions and tests


10.nppad2()

input:   list3 and all the NP phrases

output:  list4, containing the indexes of the phrases we don't need

function:
For an NP-PRD, all the child nodes of its first sub-node are removed. In addition, if the NP-PRD contains a comma, i.e. the bracket (, ,), the first node after the comma must also be deleted.
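The list4 rule can be sketched on a parsed tree. Phrases are numbered in pre-order (0-based, pre-terminals excluded); "all the child nodes of its first sub-node" is read here as every phrase dominated by the NP-PRD's first child, and "the first node after" a comma as the next sibling phrase. These readings, like the names `list4` and `is_phrase`, are assumptions.

```python
import re

def parse(s):
    """Parse a bracketed Treebank tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                                      # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                                      # consume ")"
        return (label, children)

    return read()

def is_np_prd(label):
    parts = re.split(r"[-=]", label)
    return parts[0] == "NP" and "PRD" in parts[1:]

def is_phrase(node):
    """A labelled node with at least one non-terminal child (pre-terminals excluded)."""
    return (not isinstance(node, str) and bool(node[0])
            and any(not isinstance(c, str) for c in node[1]))

def list4(tree):
    """Indexes (pre-order, 0-based, phrases only) of the nodes removed inside NP-PRD."""
    root = parse(tree)
    order = []

    def number(node):
        if is_phrase(node):
            order.append(node)
        for c in node[1]:
            if not isinstance(c, str):
                number(c)

    number(root)
    index = {id(n): i for i, n in enumerate(order)}

    def below(node):
        found = []
        for c in node[1]:
            if is_phrase(c):
                found.append(index[id(c)])
                found.extend(below(c))
        return found

    drop = set()
    for node in order:
        if not is_np_prd(node[0]):
            continue
        kids = node[1]
        if is_phrase(kids[0]):                        # every phrase under the first child
            drop.update(below(kids[0]))
        for left, right in zip(kids, kids[1:]):       # the phrase right after a comma
            if not isinstance(left, str) and left[0] == "," and is_phrase(right):
                drop.add(index[id(right)])
    return sorted(drop)
```

For "He is the head of IBM, a company", the phrases under "the head of IBM" (indexes 5 to 7) and the appositive "a company" (index 8) end up in list4.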

11.final()


input:   list1, list4 and the NP phrases

output:  the real NPs we need

function:
Gets rid of all the phrases we don't need, i.e. those whose indexes appear in list1 or list4.
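The final filtering step amounts to a set difference on phrase indexes. A sketch, assuming the NP phrases are hypothetical `(index, text)` pairs and `final_nps` is an illustrative name:

```python
def final_nps(nps, list1, list4):
    """Keep only the NP phrases whose index is in neither list1 nor list4.

    nps: (index, text) pairs; the tuple layout is a hypothetical choice.
    """
    drop = set(list1) | set(list4)
    return [(index, text) for index, text in nps if index not in drop]
```

Using a set makes each membership test constant-time, and the original order of the NP phrases is preserved.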

12.nummer()

input:   cf02.realnp, cf02.npdepth2
output:  cf02.info and cf02.attr


B. The files cf02.pure, cf02.info and cf02.attr are the input for another tool, which calls mate.

The results of the project are almost the same (99.3%) as those of another program written in C.