I was recently performing I-V measurements of a MOS
(Metal-Oxide-Semiconductor) structure. A full set of measurements
contained a DC biasing voltage, an AC frequency, a small-signal
capacitance and a conductance. I had to change the measurement device
configuration a few times, so the sweep sometimes ran first on
frequency, then on voltage, and sometimes in the reverse order. In
short, I had to deal with many input files with inconsistent column
order. The code to identify this order quickly
The idea of a dataframe is to implement a mix between a matrix and a
cell array. It is like a matrix, where each column contains elements
of the same type; unlike a matrix, different columns may have
different types. Also, each column MUST have a name, and rows MAY have
a name. Moreover, to make it easy to interface with databases, each
row must have a unique identifier. The goal is to make it possible to
use constructs like
    y(:, ["Fr*"; "VB*"; "C"; "G"])
where y is the dataframe and columns are selected by regular
expressions. This way, the translation between names and indexes can
use the full power of regular expressions.
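As a minimal sketch of this name-to-index translation, here is a Python function (select_columns is a hypothetical name, not part of the package) that returns column indexes in the order of the patterns, so that files with different column orders all end up in the same order. It assumes each pattern is anchored at the start of the name, as Python's re.match does; with that rule "C" would also match a name such as "Cp", so the package's actual matching rule may differ.

```python
import re

def select_columns(colnames, patterns):
    """Return the indexes of the columns matching each pattern, in pattern order.

    Assumption: each pattern is anchored at the start of the column name
    (re.match semantics). A column matched by several patterns is kept once.
    """
    selected = []
    for pat in patterns:
        regex = re.compile(pat)
        for idx, name in enumerate(colnames):
            if regex.match(name) and idx not in selected:
                selected.append(idx)
    return selected

# Same patterns, file with columns in an arbitrary order:
print(select_columns(["C", "G", "VBias", "Freq"], ["Fr*", "VB*", "C", "G"]))
# prints [3, 2, 0, 1]
```

Whatever the column order in the input file, the selected columns come back as frequency, bias, C, G.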
A dataframe is a class containing the following members:
_cnt = [0 0] : row count, column count, ... nth dimension count
_name = cell(1, 2) : row names, column names, ...
_ridx = [] : a unique id for each row
_data = cell(0, 0) : a container for each column
_type = cell(0, 0) : the type of each column
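A hypothetical Python mirror of these members, restricted to two dimensions, may help to picture the layout. The add_column method is an assumption for illustration only: it enforces a single type per column and a common length for all columns, and simply numbers the rows 1..n as their unique ids.

```python
class DataFrame:
    """Hypothetical Python mirror of the dataframe members listed above."""

    def __init__(self):
        self._cnt = [0, 0]      # row count, column count
        self._name = [[], []]   # row names, column names
        self._ridx = []         # a unique id for each row
        self._data = []         # one container (a list) per column
        self._type = []         # the type of each column

    def add_column(self, colname, values):
        values = list(values)
        kinds = {type(v) for v in values}
        if len(kinds) > 1:
            raise TypeError("a column must hold elements of a single type")
        if self._cnt[1] == 0:
            # the first column fixes the row count; ids are simply 1..n here
            self._cnt[0] = len(values)
            self._ridx = list(range(1, len(values) + 1))
        elif len(values) != self._cnt[0]:
            raise ValueError("column length differs from the row count")
        self._data.append(values)
        self._name[1].append(colname)
        # record the common element type (None for an empty column)
        self._type.append(kinds.pop().__name__ if kinds else None)
        self._cnt[1] += 1
```

Keeping one list per column, rather than one list per row, is what allows each column to have its own type.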
The constructor can be used as follows:
- no argument: convert the whole workspace to a dataframe (TBD)
- one null argument: return an empty dataframe
- one numeric or cell argument: transform it into a dataframe; tries to
infer column names from the name of the input argument.
- one char array with more than one line: use it as row names
- one single-line char array: take it as the name of a file to read
data from. The expected format is CSV; the reader tries to be careful
with quoted/unquoted strings and also tries to remove leading and
trailing spaces from string entries. It does not try to cope with
things such as separators INSIDE quoted strings.
- supplemental arguments may occur either as pairs (string, value) or
as vectors. In the first case, the string names an optional
parameter whose value is given by the next argument. In the second
case, the argument is right-appended to the dataframe. Valid
optional parameters are
- rownames: a character array with the row names
- unquot: a logical indicating whether strings must be unquoted, default=true
- seeked: a string marking where values start: reading begins at the
first line containing it, and previous lines are skipped.
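The following Python sketch illustrates the supplemental-argument rules and the CSV reading described above, under stated assumptions: option names are compared case-insensitively, unquoted fields that parse as numbers become floats, quoted fields always stay strings, and, as in the text, a separator inside quotes is not handled. The names split_args, parse_field and read_lines are hypothetical.

```python
OPTIONS = ("rownames", "unquot", "seeked")  # option names taken from the text

def split_args(args):
    """Split supplemental constructor args into options and columns to append."""
    opts = {"unquot": True}          # documented default
    extra_columns = []
    i = 0
    while i < len(args):
        arg = args[i]
        if isinstance(arg, str) and arg.lower() in OPTIONS:
            if i + 1 == len(args):
                raise ValueError("option %r has no value" % arg)
            opts[arg.lower()] = args[i + 1]   # (string, value) pair
            i += 2
        else:
            extra_columns.append(arg)         # vector: right-appended
            i += 1
    return opts, extra_columns

def parse_field(raw, unquot):
    value = raw.strip()                       # drop leading/trailing spaces
    if unquot and len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1].strip()            # quoted field: kept as a string
    try:
        return float(value)
    except ValueError:
        return value

def read_lines(lines, sep=",", unquot=True, seeked=None):
    """Naive CSV reader: a separator inside quotes is NOT handled."""
    if seeked is not None:
        # skip every line before the first one containing `seeked`
        start = next((k for k, line in enumerate(lines) if seeked in line), len(lines))
        lines = lines[start:]
    return [[parse_field(f, unquot) for f in line.rstrip("\r\n").split(sep)]
            for line in lines]
```

For instance, read_lines(['"a,b",2']) returns [['"a', 'b"', 2.0]]: the quoted field is split in two, which is exactly the documented limitation.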
- like a single matrix: df(:, 3); df(3, :). If all the results are of
the same type, a matrix is returned, otherwise a dataframe. This
behavior can be inhibited by setting the last argument to
'dataframe': df(3, 3, 'dataframe') returns a one-by-one dataframe
    df(:, ["Fr*"; "VB*"; "C"])
will try to match a column name beginning with "F" followed by zero
or more 'r', thus 'F', 'Fréquence' and 'Freqs'; then a column name
starting with "V" followed by zero or more "B", e.g. "VBias"; then a
column name which is the exact string 'C'.
- by row names: same principle
- either member selector may also be logical:
    df(df.OK=='A', ['C';'G'])
- as a struct: either use one of the column names (df.C), or use
one of the allowed accessors for internal fields: "rownames",
"colnames", "rowcnt", "colcnt", "rowidx", "types". Direct access to
members such as y._type is allowed, but should be restricted to
class members and friends. "types" accepts both numeric and string
arguments, the latter being converted to column order based upon
- as a cell: TODO: define how to fill the cell array with all the
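A hypothetical Python stand-in (the class Frame and its sub method are illustrations only, not the package's API) can make these read-access rules concrete: a logical mask or row indexes select rows, a homogeneous selection comes back as a plain matrix unless a dataframe is explicitly requested, and attribute access resolves the internal-field accessors before column names. Unlike Octave, Python indexes are 0-based, and columns are selected here by exact name rather than by regular expression.

```python
class Frame:
    """Hypothetical stand-in for a dataframe, used only to illustrate read access."""

    def __init__(self, colnames, columns):
        self._name = [[], list(colnames)]            # row names, column names
        self._data = [list(c) for c in columns]      # one list per column
        self._type = [type(c[0]).__name__ if c else None for c in self._data]
        self._cnt = [len(self._data[0]) if self._data else 0, len(self._data)]
        self._ridx = list(range(1, self._cnt[0] + 1))

    def __getattr__(self, key):
        # Only called when normal lookup fails. Internal-field accessors are
        # tried first, then column names, so a column named "types" is shadowed.
        fields = self.__dict__
        accessors = {
            "rownames": lambda: fields["_name"][0],
            "colnames": lambda: fields["_name"][1],
            "rowcnt": lambda: fields["_cnt"][0],
            "colcnt": lambda: fields["_cnt"][1],
            "rowidx": lambda: fields["_ridx"],
            "types": lambda: fields["_type"],
        }
        if key in accessors:
            return accessors[key]()
        if "_name" in fields and key in fields["_name"][1]:
            return fields["_data"][fields["_name"][1].index(key)]
        raise AttributeError(key)

    def sub(self, rows, cols, as_dataframe=False):
        """Emulate df(rows, cols): rows is a logical mask or 0-based indexes."""
        if rows and all(isinstance(x, bool) for x in rows):
            if len(rows) != self._cnt[0]:
                raise IndexError("mask length differs from the row count")
            rows = [i for i, keep in enumerate(rows) if keep]
        idx = [self._name[1].index(c) for c in cols]
        picked = [[self._data[j][i] for i in rows] for j in idx]
        if not as_dataframe and len({self._type[j] for j in idx}) == 1:
            return [list(r) for r in zip(*picked)]   # one type: plain matrix (list of rows)
        return Frame(cols, picked)
```

With f = Frame(["OK", "C", "G"], ...), the Octave expression df(df.OK=='A', ['C';'G']) corresponds to f.sub([ok == "A" for ok in f.OK], ["C", "G"]).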
- as a matrix, using '()': use the same syntax as reading:
    df(df.OK=='?', ['C'; 'G']) = NaN;
Note that elements may only be removed on a full-row or full-column
basis. Removing a single element is not allowed.
- as a struct: either access a column name, as
or access the internal fields through the entry points 'rownames'
and 'colnames', where care is taken to adapt the string widths in
order to make them compatible. The entry point "types", with numeric
or string arguments, casts whole column(s)
    df.types{[3 5]} = 'uint16'
    df.types{"Freq"} = "uint32"
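The assignment rules can be sketched in Python on a plain dict mapping column names to lists; assign, remove, to_uint and set_types are hypothetical helpers. The conversion mimics Octave's integer casts (round half away from zero, saturate to the type's range, NaN becomes 0). As simplifying assumptions, set_types matches a string selector by exact name and only handles unsigned integer targets.

```python
import math

UINT_MAX = {"uint8": 2**8 - 1, "uint16": 2**16 - 1, "uint32": 2**32 - 1}

def assign(cols, row_mask, names, value):
    """Emulate df(mask, names) = value on a dict of column lists."""
    for name in names:
        col = cols[name]
        if len(row_mask) != len(col):
            raise IndexError("mask length differs from the row count")
        for i, keep in enumerate(row_mask):
            if keep:
                col[i] = value

def remove(cols, rows=None, names=None):
    """Emulate df(rows, names) = []; None plays the role of ':'."""
    if rows is not None and names is not None:
        raise ValueError("only full rows or full columns can be removed")
    if names is not None:
        for name in names:
            del cols[name]
    elif rows is not None:
        drop = set(rows)                      # 0-based row positions
        for name in list(cols):
            cols[name] = [v for i, v in enumerate(cols[name]) if i not in drop]
    else:
        cols.clear()                          # df(:, :) = [] empties everything

def to_uint(x, upper):
    """Octave-like integer conversion: NaN -> 0, saturate, round half away from zero."""
    if math.isnan(x):
        return 0
    x = min(max(x, 0.0), float(upper))
    return math.floor(x + 0.5)

def set_types(cols, types, selector, newtype):
    """Emulate df.types{selector} = newtype; selector: 1-based positions or a name."""
    names = list(cols)
    targets = [selector] if isinstance(selector, str) else [names[i - 1] for i in selector]
    for name in targets:
        cols[name] = [to_uint(float(v), UINT_MAX[newtype]) for v in cols[name]]
        types[name] = newtype
```

Refusing a call where both rows and columns are restricted is what implements the "no single-element removal" rule.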
5) other overloaded functions: display, size, numel, cat. The latter
has to be thoroughly tested. In particular, I've put the restriction
that horizontal cat requires the row indexes to be the same for both
elements. For vertical cat, how should we proceed? Require uniqueness
of row indexes, and sorting? Other?
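Here is a Python sketch of the horizontal-cat restriction, plus one possible answer to the vertical-cat question (require unique row indexes, then sort by them). Both functions are hypothetical and work on plain dicts holding '_ridx', '_name' and '_data'; they only illustrate the checks, not the package's actual cat.

```python
def horzcat(a, b):
    """Side-by-side concatenation: row indexes must be identical, in the same order."""
    if a["_ridx"] != b["_ridx"]:
        raise ValueError("horizontal cat requires identical row indexes")
    return {"_ridx": list(a["_ridx"]),
            "_name": a["_name"] + b["_name"],
            "_data": a["_data"] + b["_data"]}

def vertcat(a, b):
    """One possible policy: same columns, unique row indexes, result sorted by index."""
    if a["_name"] != b["_name"]:
        raise ValueError("vertical cat requires the same column names")
    ridx = a["_ridx"] + b["_ridx"]
    if len(set(ridx)) != len(ridx):
        raise ValueError("row indexes must be unique")
    order = sorted(range(len(ridx)), key=ridx.__getitem__)   # positions sorted by row id
    stacked = [ca + cb for ca, cb in zip(a["_data"], b["_data"])]
    return {"_ridx": [ridx[k] for k in order],
            "_name": list(a["_name"]),
            "_data": [[col[k] for k in order] for col in stacked]}
```

Sorting keeps the row ids monotonic after concatenation; the cost is that the original row order of the inputs is lost.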
- the 'load' function is in fact contained inside the constructor;
maybe we should have a specific load function?
- be able to load a dataframe from a URI specification
- write a simple 'save' function
- adding data to a dataframe: R doesn't seem to allow adding rows
to a data.frame; should we follow it?
- implement a 'factor' class for categorical data
- make all functions under statistics/ dataframe-compatible
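For the 'save' item, a simple writer could be as short as the following Python sketch (save_csv is a hypothetical name). It uses the standard csv module with QUOTE_NONNUMERIC, so strings are quoted and numbers are not, matching the quoted/unquoted convention the constructor's reader handles.

```python
import csv
import io

def save_csv(stream, colnames, columns, rownames=None):
    """Write one header line, then one line per row; strings are quoted."""
    writer = csv.writer(stream, quoting=csv.QUOTE_NONNUMERIC)
    header = ([""] if rownames else []) + list(colnames)   # empty corner cell for row names
    writer.writerow(header)
    for i, row in enumerate(zip(*columns)):
        prefix = [rownames[i]] if rownames else []
        writer.writerow(prefix + list(row))

buf = io.StringIO()
save_csv(buf, ["VBias", "OK"], [[0.5, 1.0], ["A", "?"]])
```

Writing to any file-like object (here an io.StringIO) keeps the function independent of where the data ends up.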
Louvain-la-Neuve, July First, 2010.