rcf Documentation
rcf is our data format for storing and analyzing code clone data. It defines an extensible schema for code clone data and provides a Java API that eases the development of analysis code. The main ideas behind rcf are:
- Use standards: rcf defines a data model that covers the most common entities reported by clone detectors. Among others, it handles clone pairs, clone classes (often also referred to as clone groups) and fragments (the cloned code regions). The clone data of multiple versions of a system can be stored in one rcf file. The predefined model works with our viewer application cyclone.
- Extend to your needs: You probably need to store some specific data along with your clone data that is not covered by the predefined data model. This requirement was one major reason why we developed rcf. The predefined model can be extended so that rcf stores any additional data you want to annotate your clones with. Extensions do not break compatibility with the cyclone viewer.
- Focus on your objective: Analyzing clone data usually means handling I/O, writing a parser for your clone report, defining a data model, writing analysis code, collecting further data, and so on. rcf was designed to do most of these things so that you can focus on your objective. Once your data has been converted to rcf (many converters are included in the distribution), you can access it in an object-oriented fashion without worrying about technical issues. The conventions of rcf are implemented in the API, so you do not need to take care of adding data the right way; rcf helps you with this.
You can also view rcf clone data with our viewer application cyclone.
How rcf organizes clone data
Each rcf file is initialized with a predefined schema, called the core schema, which models common aspects of code clones. The following UML diagram shows a simplified version of this schema.
Each entity has an id attribute that serves as a unique key. The clientId attributes can be used to store your own identifiers. A Version has a basepath, and all directories store their path relative to this basepath.
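The relation between a version's basepath and the relative paths stored for its directories can be illustrated with plain java.nio.file. This is a minimal sketch, not part of the rcf API: BasepathExample, resolveAbsolute and relativize are hypothetical helper names, and the example basepath "/a/b/c" is only illustrative.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helpers, NOT part of the rcf API. They only illustrate how a
// directory path stored relative to a version's basepath maps to a location on disk.
final class BasepathExample {

    // Combine a version's basepath with a directory's relative path.
    static Path resolveAbsolute(String basepath, String relativeDir) {
        return Paths.get(basepath).resolve(relativeDir).normalize();
    }

    // Compute the relative path that would be stored for a directory of that version.
    static Path relativize(String basepath, String absoluteDir) {
        return Paths.get(basepath).relativize(Paths.get(absoluteDir));
    }

    public static void main(String[] args) {
        System.out.println(resolveAbsolute("/a/b/c", "src/util"));
        System.out.println(relativize("/a/b/c", "/a/b/c/src/util"));
    }
}
```

Storing relative paths means the same clone data stays valid if the system under analysis is moved; only the basepath has to change.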
Extending the schema
The core schema can be extended as needed. Conceptually, rcf consists of three kinds of elements:
- Relations: These model the entities that can be stored in an rcf file. CloneClass is one example of such a relation. You can think of a relation as a list of entries.
- Attributes: Relations have attributes that store values of a defined type. For example, in the core schema the relation CloneClass has an attribute named type of type int.
- Entries: An entry is one instance of a relation. While the relation CloneClass describes the concept of a clone class, an entry of that relation denotes one specific clone class. An entry can hold a value for each attribute defined in the relation's schema.
In rcf it is possible to add arbitrary attributes and even relations. Attribute values can be of the primitive types int, float, boolean and String or reference another entry of any relation. An attribute can hold a scalar value or a list of values of the aforementioned types.
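To make relations, attributes and entries concrete, here is a deliberately simplified in-memory model. It is a sketch of the concepts only, not how rcf is implemented: ToyRelation, ToyAttribute, ToyEntry and the Kind enum are hypothetical names. It shows scalar and list attributes, a reference attribute pointing into another relation, and entries whose values stay unset until assigned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the rcf concepts; NOT how rcf is implemented.
enum Kind { INT, FLOAT, BOOLEAN, STRING, REFERENCE }

final class ToyAttribute {
    final String name;
    final Kind kind;            // declared type (not enforced in this sketch)
    final boolean isList;       // scalar value or list of values
    final ToyRelation target;   // referenced relation, only used for REFERENCE

    ToyAttribute(String name, Kind kind, boolean isList, ToyRelation target) {
        this.name = name;
        this.kind = kind;
        this.isList = isList;
        this.target = target;
    }
}

final class ToyEntry {
    private final ToyRelation relation;
    private final Map<ToyAttribute, Object> values = new HashMap<>(); // missing key = unset

    ToyEntry(ToyRelation relation) { this.relation = relation; }

    // Only attributes from this entry's relation schema may be set.
    void set(ToyAttribute att, Object value) {
        if (!relation.attributes.contains(att)) {
            throw new IllegalArgumentException("not in schema: " + att.name);
        }
        values.put(att, value);
    }

    boolean isSet(ToyAttribute att) { return values.containsKey(att); }

    Object get(ToyAttribute att) {
        if (!isSet(att)) throw new IllegalStateException(att.name + " is not set");
        return values.get(att);
    }
}

final class ToyRelation {
    final String name;
    final List<ToyAttribute> attributes = new ArrayList<>();
    final List<ToyEntry> entries = new ArrayList<>();   // a relation is a list of entries

    ToyRelation(String name) { this.name = name; }

    ToyAttribute addAttribute(String attName, Kind kind, boolean isList, ToyRelation target) {
        ToyAttribute att = new ToyAttribute(attName, kind, isList, target);
        attributes.add(att);
        return att;
    }

    // Creates a new entry with all attribute values unset.
    ToyEntry append() {
        ToyEntry entry = new ToyEntry(this);
        entries.add(entry);
        return entry;
    }
}

final class SchemaDemo {
    public static void main(String[] args) {
        ToyRelation fragments = new ToyRelation("Fragment");
        ToyAttribute numTokens = fragments.addAttribute("numTokens", Kind.INT, false, null);

        ToyRelation cloneClasses = new ToyRelation("CloneClass");
        ToyAttribute type = cloneClasses.addAttribute("type", Kind.INT, false, null);
        // list-valued reference attribute: a clone class refers to its fragments
        ToyAttribute members =
                cloneClasses.addAttribute("fragments", Kind.REFERENCE, true, fragments);

        ToyEntry fragment = fragments.append();
        fragment.set(numTokens, 23);
        ToyEntry cloneClass = cloneClasses.append();
        cloneClass.set(members, List.of(fragment));

        System.out.println(cloneClass.isSet(type));                    // false: never set
        System.out.println(((List<?>) cloneClass.get(members)).size()); // 1
    }
}
```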
The schema is stored in the rcf file itself. This means that no separate schema definition or interfaces need to be shipped together with an rcf file that contains an extended schema.
Using the Java API
rcf can be accessed and modified via our Java library (see the downloads section). The library implements classes and access methods for all relations and attributes defined in the core schema. It also provides generic methods to access relations and attributes added by the user. See the API docs for a complete reference.
The following example shows how an existing rcf can be used to calculate the average token count of all fragments over all versions.
int avgFragmentSize(RCF rcf) {
    // avoid a division by zero for an rcf without fragments
    if (rcf.getFragments().size() == 0)
        return 0;
    int sum = 0;
    // walk all versions, their clone classes and the fragments of each class
    for (Version v : rcf.getVersions()) {
        for (CloneClass cc : v.getCloneClasses()) {
            for (Fragment f : cc.getFragments()) {
                sum += f.getNumTokens();
            }
        }
    }
    // integer division: the average is truncated
    return sum / rcf.getFragments().size();
}
Each relation in the core schema is represented by its own class in the API. The call rcf.getVersions() returns an object of type Versions, which represents the Version relation. All relation objects are iterable, so for (Version v : rcf.getVersions()) iterates over all versions. Objects of type Version represent one entry of the Version relation, that is, one concrete version.
Accessing attributes is straightforward. For all attributes in the core schema, get and set methods exist (e.g. f.getNumTokens()). It is not always suitable to set every attribute for every entry, so attribute values can be left unset. Accessing an unset attribute value raises a ValueNotSetException, which is a RuntimeException and therefore does not have to be caught explicitly. If it is not certain that an attribute value is set for every entry, you can either catch the exception or use a variant of the get method that takes a default value. In our example this is public int getNumTokens(int defaultValue), which returns the given default value if the numTokens attribute is not set.
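The two access styles for possibly unset values can be mimicked with a small stand-in class. This sketch only mirrors the behaviour described above; ToyFragment and ToyValueNotSetException are hypothetical names, not the rcf classes, and the real implementation may store unset values differently.

```java
// Hypothetical stand-ins that mirror the behaviour described above;
// they are NOT the rcf classes.
class ToyValueNotSetException extends RuntimeException {
    ToyValueNotSetException(String attribute) {
        super("value not set: " + attribute);
    }
}

final class ToyFragment {
    private Integer numTokens; // null means "unset"

    void setNumTokens(int value) { numTokens = value; }

    // Throws an unchecked exception if the value is unset.
    int getNumTokens() {
        if (numTokens == null) throw new ToyValueNotSetException("numTokens");
        return numTokens;
    }

    // Returns defaultValue instead of throwing if the value is unset.
    int getNumTokens(int defaultValue) {
        return numTokens == null ? defaultValue : numTokens;
    }
}

final class UnsetDemo {
    public static void main(String[] args) {
        ToyFragment f = new ToyFragment();
        System.out.println(f.getNumTokens(0)); // prints 0 (unset, default used)
        try {
            f.getNumTokens();
        } catch (ToyValueNotSetException e) {
            System.out.println(e.getMessage()); // prints "value not set: numTokens"
        }
        f.setNumTokens(23);
        System.out.println(f.getNumTokens()); // prints 23
    }
}
```

Note that a default of 0 silently lowers an average such as the one computed above; when unset values should be skipped rather than counted, catching the exception (or checking for the default) is the safer choice.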
Besides scalar values, attributes can also contain lists of values of the same type. The call v.getCloneClasses() accesses the list attribute cloneclasses of the Version entry v and returns a List<CloneClass>.
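The nested loops in avgFragmentSize walk the list attributes version -> clone classes -> fragments. The sketch below performs the same traversal with Java streams over hypothetical stand-in records (ToyVersion, ToyCloneClass, ToyFragment; not the rcf classes, Java 16+). It deliberately differs from the loop version in two ways: it counts each fragment once via distinct() in case a fragment is reachable from several clone classes, and it divides by the number of distinct fragments reached (rather than rcf.getFragments().size()) using floating-point arithmetic instead of integer division.

```java
import java.util.List;

// Hypothetical stand-ins for the rcf entry classes (records need Java 16+).
// The id field mirrors the unique id attribute of every rcf entity, so two
// records are equal exactly when they denote the same fragment.
record ToyFragment(int id, int numTokens) {}
record ToyCloneClass(int id, List<ToyFragment> fragments) {}
record ToyVersion(int id, List<ToyCloneClass> cloneClasses) {}

final class AvgTokens {
    // Average token count over all distinct fragments reachable
    // through the list attributes version -> clone classes -> fragments.
    static double averageTokens(List<ToyVersion> versions) {
        return versions.stream()
                .flatMap(v -> v.cloneClasses().stream())
                .flatMap(cc -> cc.fragments().stream())
                .distinct()                    // count a shared fragment only once
                .mapToInt(ToyFragment::numTokens)
                .average()
                .orElse(0.0);                  // no fragments: 0, as in the loop version
    }

    public static void main(String[] args) {
        ToyFragment a = new ToyFragment(1, 20);
        ToyFragment b = new ToyFragment(2, 30);
        ToyFragment c = new ToyFragment(3, 46);
        // b belongs to two clone classes but is counted only once
        ToyVersion v = new ToyVersion(1, List.of(
                new ToyCloneClass(1, List.of(a, b)),
                new ToyCloneClass(2, List.of(b, c))));
        System.out.println(averageTokens(List.of(v))); // prints 32.0
    }
}
```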
Loading an RCF file
To load an RCF file, use the following code:
java.io.File file = new java.io.File("/path/to/rcf");
AbstractPersistenceManager pm = PersistenceManagerFactory.getPersistenceManager(file);
RCF rcf = pm.load(file);
The second line selects a suitable PersistenceManager for the given file. The third line loads the file.
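How PersistenceManagerFactory chooses a manager is not specified here. One common way to build such a factory is to ask each registered manager whether it can handle the file and return the first that accepts it. The sketch below assumes exactly that; ToyPersistenceManager, canLoad, ExtensionManager and the ".xml"/".bin" extensions are all hypothetical and not taken from rcf, and load() merely returns a string in place of a model.

```java
import java.io.File;
import java.util.List;

// Hypothetical interface: NOT the rcf AbstractPersistenceManager API.
interface ToyPersistenceManager {
    boolean canLoad(File file);   // does this manager understand the file's format?
    String load(File file);       // stands in for returning an RCF model
}

final class ToyPersistenceManagerFactory {
    private final List<ToyPersistenceManager> managers;

    ToyPersistenceManagerFactory(List<ToyPersistenceManager> managers) {
        this.managers = managers;
    }

    // Returns the first registered manager that accepts the file.
    ToyPersistenceManager getPersistenceManager(File file) {
        for (ToyPersistenceManager m : managers) {
            if (m.canLoad(file)) return m;
        }
        throw new IllegalArgumentException("no persistence manager for " + file);
    }
}

// Accepts files by a (hypothetical) file name extension.
final class ExtensionManager implements ToyPersistenceManager {
    private final String extension;

    ExtensionManager(String extension) { this.extension = extension; }

    public boolean canLoad(File file) { return file.getName().endsWith(extension); }

    public String load(File file) { return extension + " model of " + file.getName(); }
}

final class LoadDemo {
    public static void main(String[] args) {
        ToyPersistenceManagerFactory factory = new ToyPersistenceManagerFactory(
                List.of(new ExtensionManager(".xml"), new ExtensionManager(".bin")));
        File file = new File("/path/to/clones.bin");
        ToyPersistenceManager pm = factory.getPersistenceManager(file);
        System.out.println(pm.load(file)); // prints ".bin model of clones.bin"
    }
}
```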
Adding & changing data
Most relations provide convenience methods for adding new entries. You should use them, because they ensure that your data complies with the rcf conventions. These methods are prefixed with add. For instance, adding a fragment requires checking whether its file and directory have already been added to the rcf, creating SourcePositions and linking them from the fragment, and so on. Calling Fragments.addFragment() does all of this automatically:
RCF rcf = ...; // an existing rcf model
Version v = rcf.getVersions().getFirstEntry();
Fragments fragments = rcf.getFragments();
// add a fragment in "/path/to/file.c"
// at lines 12:2-16:34 with 23 tokens to version v
Fragment f = fragments.addFragment("/path/to/file.c", v, 12, 16, 2, 34, 23);
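What such a convenience method has to take care of can be sketched with plain maps. ToyModel.addFragment below is hypothetical and much simpler than rcf's Fragments.addFragment(): it ignores versions, reuses an existing directory and file entry for a path (or creates them on first use), and creates start and end source positions for the fragment. The argument order follows our reading of the call above (path, start line, end line, start column, end column, token count), minus the version argument.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; NOT the rcf API. Versions are ignored for brevity.
final class ToyModel {
    record Directory(String path) {}
    record SourceFile(Directory dir, String name) {}
    record SourcePosition(int line, int column) {}
    record Fragment(SourceFile file, SourcePosition start, SourcePosition end, int numTokens) {}

    final Map<String, Directory> directories = new LinkedHashMap<>();
    final Map<String, SourceFile> files = new LinkedHashMap<>();
    final List<Fragment> fragments = new ArrayList<>();

    Fragment addFragment(String path, int startLine, int endLine,
                         int startCol, int endCol, int numTokens) {
        Path p = Paths.get(path);
        String dirPath = p.getParent() == null ? "" : p.getParent().toString();
        // Reuse the directory and file entries if they were added before.
        Directory dir = directories.computeIfAbsent(dirPath, Directory::new);
        SourceFile file = files.computeIfAbsent(p.toString(),
                k -> new SourceFile(dir, p.getFileName().toString()));
        // Create the source positions and link them from the new fragment.
        Fragment f = new Fragment(file,
                new SourcePosition(startLine, startCol),
                new SourcePosition(endLine, endCol),
                numTokens);
        fragments.add(f);
        return f;
    }

    public static void main(String[] args) {
        ToyModel m = new ToyModel();
        m.addFragment("/path/to/file.c", 12, 16, 2, 34, 23);
        m.addFragment("/path/to/file.c", 40, 45, 1, 10, 17);
        // both fragments share one directory and one file entry
        System.out.println(m.files.size() + " file, " + m.fragments.size() + " fragments");
    }
}
```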
Nevertheless, you can also add entries manually. A relation's append() method returns a newly created entry with all values unset; its values are then set with the set methods. The following example shows how a new Version is added manually:
RCF rcf = ...; // an existing rcf model
Versions versions = rcf.getVersions();
Version v = versions.append(); // new entry, all values unset
v.setClientId(1);
v.setBasepath("/a/b/c");
When an rcf model is no longer needed, it must be closed explicitly using the close() method.
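Because the model has to be closed explicitly, it is safest to call close() in a finally block so that it also runs when the analysis throws. The sketch uses a hypothetical ToyRcfModel class and a hypothetical analyzeAndClose helper in place of RCF; whether RCF implements java.lang.AutoCloseable (which would allow try-with-resources) is not stated here, so the sketch sticks to try/finally.

```java
import java.util.function.Function;

// Hypothetical stand-in for an rcf model that must be closed explicitly.
final class ToyRcfModel {
    private boolean closed;

    boolean isClosed() { return closed; }

    // A real model would release files or other resources here.
    void close() { closed = true; }
}

final class ModelUtil {
    // Runs an analysis on the model and closes the model afterwards,
    // even if the analysis throws.
    static <T> T analyzeAndClose(ToyRcfModel model, Function<ToyRcfModel, T> analysis) {
        try {
            return analysis.apply(model);
        } finally {
            model.close();
        }
    }

    public static void main(String[] args) {
        ToyRcfModel model = new ToyRcfModel();
        String result = analyzeAndClose(model, m -> "analysis done");
        System.out.println(result + ", closed: " + model.isClosed()); // prints "analysis done, closed: true"
    }
}
```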
Extending the Schema using the API
The schema can be extended via the API. The following snippet shows how to add a new relation with attributes and how to use it.
RCF rcf = ...; // an existing rcf model
// add a new relation
Relation<Entry> functions = rcf.addRelation("Function");
// add a name attribute
Attribute attName = functions.addScalarAttribute("name", AttributeType.STRING);
// add a length attribute
Attribute attLength = functions.addScalarAttribute("length", AttributeType.INTEGER);
// add start and end attributes that reference SourcePosition entries
SourcePositions sourcePositions = rcf.getSourcePositions();
Attribute attStart = functions.addReferenceAttribute("start", AttributeType.REFERENCE, sourcePositions);
Attribute attEnd = functions.addReferenceAttribute("end", AttributeType.REFERENCE, sourcePositions);
// add a function entry
Entry function = functions.append();
function.setString(attName, "myFunction");
function.setInt(attLength, 10);
function.setEntry(attStart, sourcePositions.append());
function.setEntry(attEnd, sourcePositions.append());
// set the values of the SourcePositions
// ...
In general all relations, attributes and entries can be handled using the generic class types Relation<Entry>, Attribute and Entry. Accessing attribute values requires one to pass the attribute as an Attribute object or as the attribute's name. For relations and attributes that extend the core schema, these generic classes must be used.