The present disclosure describes a system that uses a weighting scheme to ensure that market research panels adequately represent segments of a population for whom large-scale data is available on a social media website. The system also ensures that market-sensitive information about social media users held by the social media website is not revealed to panel partners. First, a large number of weighting variables held by the social media website are selected, namely those that have the greatest impact on answers. A principal component analysis (PCA) algorithm is then applied to a dataset containing all of the weighting variables for the full population of interest, in order to obtain principal components (PCs) for the population. Each PC is a linear transformation of all of the weighting variables, in which a weight is assigned to each of the d weighting variables (d = the number of weighting variables). A value of each PC can therefore be computed for every matched panelist, since matched panelists are members of the population of interest. The principal components are calculated using an eigendecomposition of the variance-covariance matrix of the d-dimensional vector of weighting variables. The first few principal components are assessed by how much of the total variance of the population they explain, and the number of PCs sent to panel partners is determined accordingly. Not all of the PCs are sent to panel partners, so as to preserve the privacy of the sensitive dataset. A subset of the PCs is sent to the panel partners, along with the mean value of each PC across the population as a whole. The panel partners then use these principal components to construct weights that draw on some of the information contained in the social media website’s weighting variables, without the panel partners gaining direct access to that information. The panel partners’ own weights are then augmented with the weights computed using the principal components.
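
The PCA step described above can be illustrated with a minimal Python sketch. It is not the disclosure's implementation: the function names `fit_principal_components` and `pc_scores`, the 0.8 explained-variance threshold, and the use of NumPy are all assumptions made for illustration, and the sketch assumes at least two weighting variables (d >= 2). It eigendecomposes the variance-covariance matrix of the weighting variables, keeps only the leading PCs needed to reach the threshold (always withholding at least one), and returns the population mean of each retained PC, which is the kind of summary the disclosure says is shared with panel partners.

```python
import numpy as np

def fit_principal_components(X, variance_threshold=0.8):
    """Hypothetical helper: derive the PCs to share from population data.

    X is an (n, d) array holding d weighting variables for n members of
    the population of interest. The threshold value is an assumption.
    """
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    cov = np.cov(X, rowvar=False)            # d x d variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of PCs reaching the threshold, capped at d - 1 so that
    # the full set of PCs is never released.
    k = int(np.searchsorted(explained, variance_threshold)) + 1
    k = max(1, min(k, d - 1))
    loadings = eigvecs[:, :k]                # one weight per variable, per PC
    pc_means = X.mean(axis=0) @ loadings     # population mean of each kept PC
    return loadings, explained[:k], pc_means

def pc_scores(X, loadings):
    """Value of each retained PC for each row (e.g. each matched panelist)."""
    return np.asarray(X, dtype=float) @ loadings
```

In this sketch, the site would compute `pc_scores` for the matched panelists and pass those scores, together with `pc_means`, to a panel partner, who could then use the population means as calibration targets when building weights. The raw weighting variables never leave the site. How the partner combines these with its own weights is not specified here.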

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.