{"title": "Online and Differentially-Private Tensor Decomposition", "book": "Advances in Neural Information Processing Systems", "page_first": 3531, "page_last": 3539, "abstract": "Tensor decomposition is positioned to be a pervasive tool in the era of big data. In this paper, we resolve many of the key algorithmic questions regarding robustness, memory efficiency, and differential privacy of tensor decomposition. We propose simple variants of the tensor power method which enjoy these strong properties. We propose the first streaming method with a linear memory requirement. Moreover, we present a noise calibrated tensor power method with efficient privacy guarantees. At the heart of all these guarantees lies a careful perturbation analysis derived in this paper which improves upon the existing results significantly.", "full_text": "Online and Differentially-Private Tensor Decomposition\n\nYining Wang\nMachine Learning Department\nCarnegie Mellon University\nyiningwa@cs.cmu.edu\n\nAnimashree Anandkumar\nDepartment of EECS\nUniversity of California, Irvine\na.anandkumar@uci.edu\n\nAbstract\n\nTensor decomposition is an important tool for big data analysis. In this paper, we resolve many of the key algorithmic questions regarding robustness, memory efficiency, and differential privacy of tensor decomposition. We propose simple variants of the tensor power method which enjoy these strong properties. We present the first guarantees for the online tensor power method, which has a linear memory requirement. Moreover, we present a noise calibrated tensor power method with efficient privacy guarantees. At the heart of all these guarantees lies a careful perturbation analysis derived in this paper which improves upon the existing results significantly.\n\nKeywords: Tensor decomposition, tensor power method, online methods, streaming, differential privacy, perturbation analysis.\n\n1 Introduction\n\nIn recent years, tensor decomposition has emerged as a powerful tool to solve many challenging problems in unsupervised [1], supervised [18] and reinforcement learning [4]. Tensors are higher order extensions of matrices which can reveal far greater information compa
red to matrices, while retaining most of the efficiencies of matrix operations [1]. A central task in tensor analysis is the process of decomposing the tensor into its rank-1 components, which is usually referred to as CP (Candecomp/Parafac) decomposition in the literature. While decomposition of arbitrary tensors is NP-hard [13], it becomes tractable for the class of tensors with linearly independent components. Through a simple whitening procedure, such tensors can be converted to orthogonally decomposable tensors. Tensor power method is a popular method for computing the decomposition of an orthogonal tensor. It is simple and efficient to implement, and a natural extension of the matrix power method.\n\nIn the absence of noise, the tensor power method correctly recovers the components under a random initialization followed by deflation. On the other hand, perturbation analysis of tensor power method is much more delicate compared to the matrix case. This is because the problem of tensor decomposition is NP-hard, and if a large amount of arbitrary noise is added to an orthogonal tensor, the decomposition can again become intractable. In [1], guaranteed recovery of components was proven under bounded noise, and the bound was improved in [2]. In this paper, we significantly improve upon the noise requirements, i.e., the extent of noise that can be withstood by the tensor power method.\n\nIn order for tensor methods to be deployed in large-scale systems, we require fast, parallelizable and scalable algorithms. To achieve this, we need to avoid the exponential increase in computation and memory requirements with the order of the tensor; i.e., a naive implementation on a 3rd-order d-dimensional tensor would require O(d^3) computation and memory. Instead, we analyze the online tensor power method that requires only linear (in d) memory and does not form the entire tensor. This is achieved in settings where the tensor is an empirical higher order moment, computed from the stream of data samples. We can avoid explicit construction of the tensor by running online tensor power method directly on i.i.d. data samples.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nWe show that this algorithm correctly recovers tensor components in time\u00b9 \u00d5(nk^2 d) and \u00d5(dk) memory for a r
ank-k tensor and n number of data samples. Additionally, we provide efficient sample complexity analysis.\n\nAs spectral methods become increasingly popular with recommendation system and health analytics applications [29, 17], data privacy is particularly relevant in the context of preserving sensitive private information. Differential privacy could still be useful even if data privacy is not the prime concern [30]. We propose the first differentially private tensor decomposition algorithm with both privacy and utility guarantees via noise calibrated power iterations. We show that under the natural assumption of tensor incoherence, privacy parameters have no (polynomial) dependence on tensor dimension d. On the other hand, straightforward input perturbation type methods lead to far worse bounds and do not yield guaranteed recovery for all values of privacy parameters.\n\n1.1 Related work\n\nOnline tensor SGD. Stochastic gradient descent (SGD) is an intuitive approach for online tensor decomposition and has been successful in practical large-scale tensor decomposition problems [16]. Despite its simplicity, theoretical properties are particularly hard to establish. [11] considered a variant of the SGD objective and proved its correctness. However, the approach in [11] only works for even-order tensors and its sample complexity dependency upon tensor dimension d is poor.\n\nTensor PCA. In the statistical tensor PCA [24] model a d\u00d7d\u00d7d tensor T = v^{\u22973} + E is observed and one wishes to recover component v under the presence of Gaussian random noise E. [24] shows that \u2016E\u2016_op = O(d^{\u22121/2}) is sufficient to guarantee approximate recovery of v and [14] further improves the noise condition to \u2016E\u2016_op = O(d^{\u22121/4}) via a 4th-order sum-of-squares relaxation. Techniques in both [24, 14] are rather complicated and could be difficult to adapt to memory or privacy constraints. Furthermore, in [24, 14] only one component is considered. On the other hand, [25] shows that \u2016E\u2016_op = O(d^{\u22121/2}) is sufficient for recovering multiple components from noisy tensors. However, [25] assumes exact computation of rank-1 tensor approximation, which is NP-hard in general.\n\nNoisy matrix power methods. Our relaxed noise condition analysis for tensor power method is inspired by recent analysis of noisy matrix power me
thods [12, 6]. Unlike the matrix case, tensor decomposition no longer requires spectral gap among eigenvalues and eigenvectors are usually recovered one at a time [1, 2]. This poses new challenges and requires non-trivial extensions of matrix power method analysis to the tensor case.\n\n1.2 Notation and Preliminaries\n\nWe use [n] to denote the set {1, 2, \u00b7\u00b7\u00b7, n}. We use bold characters A, T, v for matrices, tensors, vectors and normal characters \u03bb, \u00b5 for scalars. A pth order tensor T of dimensions d_1, \u00b7\u00b7\u00b7, d_p has d_1 \u00d7 \u00b7\u00b7\u00b7 \u00d7 d_p elements, each indexed by a p-tuple (i_1, \u00b7\u00b7\u00b7, i_p) \u2208 [d_1] \u00d7 \u00b7\u00b7\u00b7 \u00d7 [d_p]. A tensor T of dimensions d \u00d7 \u00b7\u00b7\u00b7 \u00d7 d is super-symmetric or simply symmetric if T_{i_1,\u00b7\u00b7\u00b7,i_p} = T_{\u03c3(i_1),\u00b7\u00b7\u00b7,\u03c3(i_p)} for all permutations \u03c3 : [p] \u2192 [p]. For a tensor T \u2208 R^{d_1\u00d7\u00b7\u00b7\u00b7\u00d7d_p} and matrices A_1 \u2208 R^{m_1\u00d7d_1}, \u00b7\u00b7\u00b7, A_p \u2208 R^{m_p\u00d7d_p}, the multi-linear form T(A_1, \u00b7\u00b7\u00b7, A_p) is an m_1 \u00d7 \u00b7\u00b7\u00b7 \u00d7 m_p tensor defined as\n[T(A_1, \u00b7\u00b7\u00b7, A_p)]_{i_1,\u00b7\u00b7\u00b7,i_p} = \u2211_{j_1\u2208[d_1]} \u00b7\u00b7\u00b7 \u2211_{j_p\u2208[d_p]} T_{j_1,\u00b7\u00b7\u00b7,j_p} [A_1]_{j_1,i_1} \u00b7\u00b7\u00b7 [A_p]_{j_p,i_p}.\nWe use \u2016v\u2016_2 = \u221a(\u2211_i v_i^2) for vector 2-norm and \u2016v\u2016_\u221e = max_i |v_i| for vector infinity norm. We use \u2016T\u2016_op to denote the operator norm or spectral norm of a tensor T, which is defined as \u2016T\u2016_op = sup_{\u2016u_1\u2016_2=\u00b7\u00b7\u00b7=\u2016u_p\u2016_2=1} T(u_1, \u00b7\u00b7\u00b7, u_p). An event A is said to occur with overwhelming probability if Pr[A] \u2265 1 \u2212 d^{\u221210}.\n\nWe limit ourselves to symmetric 3rd-order tensors (p = 3) in this paper. The results can be directly extended to asymmetric tensors since they can first be symmetrized using simple matrix operations (see [1]). Extension to higher-order tensors is also straightforward. A symmetric 3rd-order tensor T is rank-1 if it can be written in the form of\nT = \u03bb \u00b7 v \u2297 v \u2297 v = \u03bb v^{\u22973} \u21d4 T_{i,j,\u2113} = \u03bb \u00b7 v(i) \u00b7 v(j) \u00b7 v(\u2113), (1)\n\n\u00b9 \u00d5 hides poly-logarithmic factors.\n\nAlgorithm 1 Robust tensor power method [1]\n1: Input: symmetric d\u00d7d\u00d7d tensor
 T\u0303, number of components k \u2264 d, number of iterations L, R.\n2: for i = 1 to k do\n3: Initialization: Draw u_0 uniformly at random from the unit sphere in R^d.\n4: Power iteration: Compute u_t = T\u0303(I, u_{t\u22121}, u_{t\u22121}) / \u2016T\u0303(I, u_{t\u22121}, u_{t\u22121})\u2016_2 for t = 1, \u00b7\u00b7\u00b7, R.\n5: Boosting: Repeat Steps 3 and 4 for L times and obtain u^{(1)}_R, \u00b7\u00b7\u00b7, u^{(L)}_R. Let \u03c4^* = argmax_{\u03c4=1}^L T\u0303(u^{(\u03c4)}_R, u^{(\u03c4)}_R, u^{(\u03c4)}_R). Set v\u0302_i = u^{(\u03c4^*)}_R and \u03bb\u0302_i = T\u0303(u^{(\u03c4^*)}_R, u^{(\u03c4^*)}_R, u^{(\u03c4^*)}_R).\n6: Deflation: T\u0303 \u2190 T\u0303 \u2212 \u03bb\u0302_i v\u0302_i^{\u22973}.\n7: end for\n8: Output: Estimated eigenvalue/eigenvector pairs {\u03bb\u0302_i, v\u0302_i}_{i=1}^k.\n\nwhere \u2297 represents the outer product, and v \u2208 R^d is a unit vector (i.e., \u2016v\u2016_2 = 1) and \u03bb \u2208 R^+.\u00b2 A tensor T \u2208 R^{d\u00d7d\u00d7d} is said to have a CP (Candecomp/Parafac) rank k if it can be (minimally) written as the sum of k rank-1 tensors:\nT = \u2211_{i\u2208[k]} \u03bb_i v_i \u2297 v_i \u2297 v_i, \u03bb_i \u2208 R^+, v_i \u2208 R^d. (2)\nA tensor is said to be orthogonally decomposable if in the above decomposition \u27e8v_i, v_j\u27e9 = 0 for i \u2260 j. Any tensor can be converted to an orthogonal tensor through an invertible whitening transform, provided that v_1, v_2, ..., v_k are linearly independent [1]. We thus limit our analysis to orthogonal tensors in this paper since it can be extended to this more general class in a straightforward manner.\n\nTensor Power Method: A popular algorithm for finding the tensor decomposition in (2) is through the tensor power method. The full algorithm is given in Algorithm 1. We first provide an improved noise analysis for the robust power method, improving error tolerance bounds previously established in [1]. We next propose memory-efficient and/or differentially private variants of the robust power method and give performance guarantees based on our improved noise analysis.\n\n2 Improved Noise Analysis for Tensor Power Method\n\nWhen the tensor T has an exact orthogonal decomposition, the power method provably recovers all the components with random initialization and deflation. However, the analysis is more subtle under noise. While matrix perturbation bounds are well understood, it is an open problem in the case of tensors. This is because the problem of tensor decomposition is NP-hard, and becomes tracta
ble only under special conditions such as orthogonality (and more generally linear independence). If a large amount of arbitrary noise is added, the decomposition can again become intractable. In [1], guaranteed recovery of components was proven under bounded noise and we recap the result below.\n\nTheorem 2.1 ([1] Theorem 5.1, simplified version). Suppose T\u0303 = T + \u2206T, where T = \u2211_{i=1}^k \u03bb_i v_i^{\u22973} with \u03bb_i > 0 and orthonormal basis vectors {v_1, \u00b7\u00b7\u00b7, v_k} \u2286 R^d, d \u2265 k, and noise \u2206T satisfies \u2016\u2206T\u2016_op \u2264 \u03f5. Let \u03bb_max, \u03bb_min be the largest and smallest values in {\u03bb_i}_{i=1}^k and {\u03bb\u0302_i, v\u0302_i}_{i=1}^k be outputs of Algorithm 1. There exist absolute constants K_0, C_1, C_2, C_3 > 0 such that if\n\u03f5 \u2264 C_1 \u00b7 \u03bb_min/d, R = \u2126(log d + log log(\u03bb_max/\u03f5)), L = \u2126(max{K_0, k} log(max{K_0, k})), (3)\nthen with probability at least 0.9, there exists a permutation \u03c0 : [k] \u2192 [k] such that\n|\u03bb_i \u2212 \u03bb\u0302_{\u03c0(i)}| \u2264 C_2 \u03f5, \u2016v_i \u2212 v\u0302_{\u03c0(i)}\u2016_2 \u2264 C_3 \u03f5/\u03bb_i, \u2200i = 1, \u00b7\u00b7\u00b7, k.\n\nTheorem 2.1 is the first provably correct result on robust tensor decomposition under general noise conditions. In particular, the noise term \u2206T can be deterministic or even adversarial. However, one important drawback of Theorem 2.1 is that \u2016\u2206T\u2016_op must be upper bounded by O(\u03bb_min/d), which is a strong assumption for many practical applications [28]. On the other hand, [2, 24] show that by using smart initializations the robust tensor power method is capable of tolerating O(\u03bb_min/\u221ad) magnitude of noise, and [25] suggests that such noise magnitude cannot be improved if deflation (i.e., successive rank-one approximation) is to be performed.\n\n\u00b2 One can always assume without loss of generality that \u03bb \u2265 0 by replacing v with \u2212v instead.\n\nIn this paper, we show that the relaxed noise bound O(\u03bb_min/\u221ad) holds even if the initialization of robust TPM is as simple as a vector uniformly sampled from the d-dimensional sphere (Algorithm 1). Our claim is formalized below:\n\nTheorem 2.2 (Improved noise tolerance analysis for robust TPM). Assume the same notation as in Theorem 2.1. Let \u03f5 \u2208 (0, 1/2) be an error tolerance parameter. There exist abs
olute constants K_0, C_0, C_1, C_2, C_3 > 0 such that if \u2206T satisfies\n\u2016\u2206T(I, u^{(\u03c4)}_t, u^{(\u03c4)}_t)\u2016_2 \u2264 \u03f5, |\u2206T(v_i, u^{(\u03c4)}_t, u^{(\u03c4)}_t)| \u2264 min{\u03f5/\u221ak, C_0 \u03bb_min/d} (4)\nfor all i \u2208 [k], t \u2208 [T], \u03c4 \u2208 [L] and furthermore\n\u03f5 \u2264 C_1 \u00b7 \u03bb_min/\u221ak, R = \u2126(log(\u03bb_max d/\u03f5)), L = \u2126(max{K_0, k} log(max{K_0, k})), (5)\nthen with probability at least 0.9, there exists a permutation \u03c0 : [k] \u2192 [k] such that\n|\u03bb_i \u2212 \u03bb\u0302_{\u03c0(i)}| \u2264 C_2 \u03f5, \u2016v_i \u2212 v\u0302_{\u03c0(i)}\u2016_2 \u2264 C_3 \u03f5/\u03bb_i, \u2200i = 1, \u00b7\u00b7\u00b7, k.\n\nDue to space constraints, the proof of Theorem 2.2 is placed in Appendix C. We next make several remarks on our results. In particular, we consider three scenarios with increasing assumptions imposed on the noise tensor \u2206T and compare the noise conditions in Theorem 2.2 with existing results on orthogonal tensor decomposition:\n1. \u2206T does not have any special structure: in this case, we only have |\u2206T(v_i, u_t, u_t)| \u2264 \u2016\u2206T\u2016_op and our noise conditions reduce to the classical one: \u2016\u2206T\u2016_op = O(\u03bb_min/d).\n2. \u2206T is \u201cround\u201d in the sense that |\u2206T(v_i, u_t, u_t)| \u2264 O(1/\u221ad) \u00b7 \u2016\u2206T(I, u_t, u_t)\u2016_2: this is the typical setting when the noise \u2206T follows Gaussian or sub-Gaussian distributions, as we explain in Sec. 3 and 4. Our noise condition in this case is \u2016\u2206T\u2016_op = O(\u03bb_min/\u221ad), strictly improving Theorem 2.1 on robust tensor power method with random initializations and matching the bound for more advanced SVD initialization techniques in [2].\n3. \u2206T is weakly correlated with signal in the sense that \u2016\u2206T(v_i, I, I)\u2016_2 = O(\u03bb_min/d) for all i \u2264 k: in this case our noise condition reduces to \u2016\u2206T\u2016_op = O(\u03bb_min/\u221ak), strictly improving over SVD initialization [2] in the \u201cundercomplete\u201d regime k = o(d). Note that the whitening trick [3, 1] does not attain our bound, as we explain in Appendix B.\n\nFinally, we remark that the log log(1/\u03f5) quadratic convergence rate in Eq. (3) is worsened to log(1/\u03f5) linear rate in Eq. (5). We are not sure whether this is an artifact of our analysis, because similar analysis for the matrix noisy power method [12] also reveals a linear converg
ence rate.\n\nImplications. Our bounds in Theorem 2.2 result in sharper analysis of both memory-efficient and differentially private power methods which we propose in Sec. 3, 4. Using the original analysis (Theorem 2.1) for the two applications, the memory-efficient tensor power method would have sample complexity cubic in the dimension d and for differentially private tensor decomposition the privacy level \u03b5 needs to scale as \u2126\u0303(\u221ad) as d increases, which is particularly bad as the quality of privacy protection e^\u03b5 degrades exponentially with tensor dimension d. On the other hand, our improved noise condition in Theorem 2.2 greatly sharpens the bounds in both applications: for memory efficient decomposition, we now require only quadratic sample complexity and for differentially private decomposition, the privacy level \u03b5 has no polynomial dependence on d. This makes our results far more practical for high-dimensional tensor decomposition applications.\n\nNumerical verification of noise conditions and comparison with whitening techniques. We verify our improved noise conditions for robust tensor power method on simulation tensor data. In particular, we consider three noise models and demonstrate varied asymptotic noise magnitudes at which tensor power method succeeds. The simulation results nicely match our theoretical findings and also suggest, in an empirical way, tightness of noise bounds in Theorem 2.2. Due to space constraints, simulation results are placed in Appendix A.\n\nWe also compare our improved noise bound with those obtained by whitening, a popular technique that reduces tensor decomposition to matrix decomposition problems [1, 21, 28]. We show in Appendix B that, without side information, the standard analysis of whitening based tensor decomposition leads to worse noise tolerance bounds than what we obtained in Theorem 2.2.\n\n3 Memory-Efficient Streaming Tensor Decomposition\n\nTensor power method in Algorithm 1 requires significant storage to be deployed: \u2126(d^3) memory is required to store a dense d\u00d7d\u00d7d tensor, which is prohibitively large in many real-world applications as tensor dimension d could be really high. We show in this section how to compute tensor decomposition in a memory efficient manner, with storage sc
aling linearly in d. In particular, we consider the case when tensor T to be decomposed is a population moment E_{x\u223cD}[x^{\u22973}] with respect to some unknown underlying data distribution D, and data points x_1, x_2, \u00b7\u00b7\u00b7 i.i.d. sampled from D are fed into a tensor decomposition algorithm in a streaming fashion. One classical example is topic modeling, where each x_i represents a document arriving in a stream and consistent estimation of topics can be achieved by decomposing variants of the population moment [1, 3].\n\nAlgorithm 2 displays a memory-efficient tensor decomposition procedure on streaming data points. The main idea is to replace the power iteration step T(I, u, u) in Algorithm 1 with a \u201cdata association\u201d step that exploits the empirical-moment structure of the tensor T to be decomposed and evaluates approximate power iterations from stochastic data samples. This procedure is highly efficient, in that both time and space complexity scale linearly with tensor dimension d:\n\nProposition 3.1. Algorithm 2 runs in O(nkdLR) time and O(d(k + L)) memory, with O(nkR) sample complexity (number of data points processed).\n\nIn the remainder of this section we show Algorithm 2 recovers eigenvectors of the population moment E_{x\u223cD}[x^{\u22973}] with high probability and we derive corresponding sample complexity bounds. To facilitate our theoretical analysis we need several assumptions on the data distribution D. The first natural assumption is the low-rankness of the population moment E_{x\u223cD}[x^{\u22973}] to be decomposed:\n\nAssumption 3.1 (Low-rank moment). The mean tensor T = E_{x\u223cD}[x^{\u22973}] admits a low-rank representation T = \u2211_{i=1}^k \u03bb_i v_i^{\u22973} for \u03bb_1, \u00b7\u00b7\u00b7, \u03bb_k > 0 and orthonormal {v_1, \u00b7\u00b7\u00b7, v_k} \u2286 R^d.\n\nWe also place restrictions on the \u201cnoise model\u201d, which imply that the population moment E_{x\u223cD}[x^{\u22973}] can be well approximated by a reasonable number of samples with high probability. In particular, we consider sub-Gaussian noise as formulated in Definition 3.1 and Assumption 3.2:\n\nDefinition 3.1 (Multivariate sub-Gaussian distribution, [15]). A D-dimensional random variable x belongs to the sub-Gaussian distribution family SG_D(\u03c3) with parameter \u03c3 > 0 if it has zero mean and E[exp(a^\u22a4 x
)] \u2264 exp{\u2016a\u2016_2^2 \u03c3^2/2} for all a \u2208 R^D.\n\nAssumption 3.2 (Sub-Gaussian noise). There exists \u03c3 > 0 such that the mean-centered vectorized random variable vec(x^{\u22973} \u2212 E[x^{\u22973}]) belongs to SG_{d^3}(\u03c3) as defined in Definition 3.1.\n\nWe remark that Assumption 3.2 includes a wide family of distributions that are of practical importance, for example noise that has compact support. Assumption 3.2 also resembles (B, p)-round noise considered in [12] that imposes spherical symmetry constraints onto the noise distribution.\n\nWe are now ready to present the main theorem that bounds the recovery (approximation) error of eigenvalues and eigenvectors of the streaming robust tensor power method in Algorithm 2:\n\nTheorem 3.1 (Analysis of streaming robust tensor power method). Let Assumptions 3.1, 3.2 hold true and suppose \u03f5 < C_1 \u03bb_min/\u221ak for some sufficiently small absolute constant C_1 > 0. If\nn = \u2126\u0303(min{\u03c3^2 d/\u03f5^2, \u03c3^2 d^2/\u03bb_min^2}), R = \u2126(log(\u03bb_max d/\u03f5)), L = \u2126(k log k),\nthen with probability at least 0.9 there exists a permutation \u03c0 : [k] \u2192 [k] such that\n|\u03bb_i \u2212 \u03bb\u0302_{\u03c0(i)}| \u2264 C_2 \u03f5, \u2016v_i \u2212 v\u0302_{\u03c0(i)}\u2016_2 \u2264 C_3 \u03f5/\u03bb_i, \u2200i = 1, \u00b7\u00b7\u00b7, k\nfor some universal constants C_2, C_3 > 0.\n\nCorollary 3.1 is then an immediate consequence of Theorem 3.1, which simplifies the bounds and highlights asymptotic dependencies over important model parameters d, k and \u03c3:\n\nAlgorithm 2 Online robust tensor power method\n1: Input: data stream x_1, x_2, \u00b7\u00b7\u00b7 \u2208 R^d, no. of components k, parameters L, R, n.\n2: for i = 1 to k do\n3: Draw u^{(1)}_0, \u00b7\u00b7\u00b7, u^{(L)}_0 i.i.d. uniformly at random from the unit sphere S^{d\u22121}.\n4: for t = 0 to R \u2212 1 do\n5: Initialization: Set accumulators u\u0303^{(1)}_{t+1}, \u00b7\u00b7\u00b7, u\u0303^{(L)}_{t+1} and \u03bb\u0303^{(1)}, \u00b7\u00b7\u00b7, \u03bb\u0303^{(L)} to 0.\n6: Data association: Read the next n data points; update u\u0303^{(\u03c4)}_{t+1} \u2190 u\u0303^{(\u03c4)}_{t+1} + (1/n)(x_\u2113^\u22a4 u^{(\u03c4)}_t)^2 x_\u2113 and \u03bb\u0303^{(\u03c4)} \u2190 \u03bb\u0303^{(\u03c4)} + (1/n)(x_\u2113^\u22a4 u^{(\u03c4)}_t)^3 for each \u2113 \u2208 [n] and \u03c4 \u2208 [L].\n7: Deflation: For each \u03c4 \u2208 [L], updat
e u\u0303^{(\u03c4)}_{t+1} \u2190 u\u0303^{(\u03c4)}_{t+1} \u2212 \u2211_{j=1}^{i\u22121} \u03bb\u0302_j \u03be_{j,\u03c4}^2 v\u0302_j and \u03bb\u0303^{(\u03c4)} \u2190 \u03bb\u0303^{(\u03c4)} \u2212 \u2211_{j=1}^{i\u22121} \u03bb\u0302_j \u03be_{j,\u03c4}^3, where \u03be_{j,\u03c4} = v\u0302_j^\u22a4 u\u0303^{(\u03c4)}_t.\n8: Normalization: u^{(\u03c4)}_{t+1} = u\u0303^{(\u03c4)}_{t+1} / \u2016u\u0303^{(\u03c4)}_{t+1}\u2016_2, for each \u03c4 \u2208 [L].\n9: end for\n10: Find \u03c4^* = argmax_{\u03c4\u2208[L]} \u03bb\u0303^{(\u03c4)} and store \u03bb\u0302_i = \u03bb\u0303^{(\u03c4^*)}, v\u0302_i = u^{(\u03c4^*)}_R.\n11: end for\n12: Output: approximate eigenvalue and eigenvector pairs {\u03bb\u0302_i, v\u0302_i}_{i=1}^k of E\u0302_{x\u223cD}[x^{\u22973}].\n\nCorollary 3.1. Under Assumptions 3.1, 3.2, Algorithm 2 correctly learns {\u03bb_i, v_i}_{i=1}^k up to O(1/\u221ad) additive error with \u00d5(\u03c3^2 k d^2) samples and \u00d5(dk) memory.\n\nProofs of Theorem 3.1 and Corollary 3.1 are both deferred to Appendix D. Compared to streaming noisy matrix PCA considered in [12], the bound is weaker with an additional 1/k factor in the term involving \u03f5 and 1/d factor in the term that does not involve \u03f5. We conjecture this to be a fundamental difficulty of the tensor decomposition problem. On the other hand, our bounds resulting from the analysis in Sec. 2 have an O(1/d) improvement compared to applying existing analysis in [1] directly.\n\nRemark on comparison with SGD: Our proposed streaming tensor power method is nothing but the projected stochastic gradient descent (SGD) procedure on the objective of maximizing the tensor norm on the sphere. The optimal solution of this coincides with the objective of finding the best rank-1 approximation of the tensor. Here, we can estimate all the components of the tensor through deflation. An alternative method is to run SGD based on a combined objective function to obtain all the components of the tensor simultaneously, as considered in [16, 11]. However, the analysis in [11] only works for even-order tensors and has worse dependency (at least d^9) on tensor dimension d.\n\n4 Differentially private tensor decomposition\n\nThe objective of private data processing is to release data summaries such that any particular entry of the original data cannot be reliably inferred from the released results. Formally speaking, we adopt the popular (\u03b5, \u03b4)-differe
ntial privacy criterion proposed in [9]:\n\nDefinition 4.1 ((\u03b5, \u03b4)-differential privacy [9]). Let M denote all symmetric d-dimensional real third order tensors and O be an arbitrary output set. A randomized algorithm A : M \u2192 O is (\u03b5, \u03b4)-differentially private if for all neighboring tensors T, T\u2032 and measurable set O \u2286 O we have\nPr[A(T) \u2208 O] \u2264 e^\u03b5 Pr[A(T\u2032) \u2208 O] + \u03b4,\nwhere \u03b5 > 0, \u03b4 \u2208 [0, 1) are privacy parameters and probabilities are taken over randomness in A.\n\nSince our tensor decomposition analysis concerns symmetric tensors primarily, we adopt a \u201csymmetric\u201d definition of neighboring tensors in Definition 4.1, as shown below:\n\nDefinition 4.2 (Neighboring tensors). Two d\u00d7d\u00d7d symmetric tensors T, T\u2032 are neighboring tensors if there exists i, j, k \u2208 [d] such that\nT\u2032 \u2212 T = \u00b1symmetrize(e_i \u2297 e_j \u2297 e_k) = \u00b1(e_i \u2297 e_j \u2297 e_k + e_i \u2297 e_k \u2297 e_j + \u00b7\u00b7\u00b7 + e_k \u2297 e_j \u2297 e_i).\n\nAs noted earlier, the above notions can be similarly extended to asymmetric tensors as well as the guarantees for tensor power method on asymmetric tensors. We also remark that the difference of\n\nAlgorithm 3 Differentially private robust tensor power method\n1: Input: tensor T, no. of components k, number of iterations L, R, privacy parameters \u03b5, \u03b4.\n2: Initialization: D = 0, \u03bd = 6\u221a(2 ln(1.25/\u03b4\u2032))/\u03b5\u2032, \u03b4\u2032 = \u03b4/(2K), \u03b5\u2032 = \u03b5/\u221a(K(4 + ln(2/\u03b4))), K = kL(R + 1).\n3: for i = 1 to k do\n4: Initialization: Draw u^{(1)}_0, \u00b7\u00b7\u00b7, u^{(\u03c4)}_0 uniformly at random from the unit sphere in R^d.\n5: for t = 0 to R \u2212 1 do\n6: Power iteration: compute u\u0303^{(\u03c4)}_{t+1} = (T \u2212 D)(I, u^{(\u03c4)}_t, u^{(\u03c4)}_t).\n7: Noise calibration: release u\u0304^{(\u03c4)}_{t+1} = u\u0303^{(\u03c4)}_{t+1} + \u03bd\u2016u^{(\u03c4)}_t\u2016_\u221e^2 \u00b7 z^{(\u03c4)}_t, where z^{(\u03c4)}_t i.i.d. \u223c N(0, I_d).\n8: Normalization: u^{(\u03c4)}_{t+1} = u\u0304^{(\u03c4)}_{t+1} / \u2016u\u0304^{(\u03c4)}_{t+1}\u2016_2.\n9: end for\n10: Compute \u03bb\u0303^{(\u03c4)} = (T \u2212 D)(u^{(\u03c4)}_R, u^{(\u03c4)}_R, u^{(\u03c4)}_R) + \u03bd\u2016u^{(\u03c4)}_R\u2016_\u221e^3 \u00b7 z_\u03c4 and let \u03c4^* = argmax_\u03c4 \u03bb\u0303^{(\u03c4)}.\n11: Deflation: \u03bb\u0302_i = \u03bb\u0303^{(\u03c4^*)}, v\u0302_i = u^{(\u03c4^*)}_R, D \u2190 D + \u02c
6\u03bb_i v\u0302_i^{\u22973}.\n12: end for\n13: Output: eigenvalue/eigenvector pairs {\u03bb\u0302_i, v\u0302_i}_{i=1}^k.\n\n\u201cneighboring tensors\u201d as defined above has Frobenius norm bounded by O(1). This is necessary because an arbitrary perturbation of a tensor, even if restricted to only one entry, is capable of destroying any utility guarantee possible.\n\nIn a nutshell, Definitions 4.1, 4.2 state that an algorithm A is differentially private if, conditioned on any set of possible outputs of A, one cannot distinguish with high probability between two \u201cneighboring\u201d tensors T, T\u2032 that differ only in a single entry (up to symmetrization), thus protecting the privacy of any particular element in the original tensor T. Here \u03b5, \u03b4 are parameters controlling the level of privacy, with smaller \u03b5, \u03b4 values implying stronger privacy guarantee as Pr[A(T) \u2208 O] and Pr[A(T\u2032) \u2208 O] are closer to each other.\n\nAlgorithm 3 describes the procedure of privately releasing eigenvectors of a low-rank input tensor T. The main idea for privacy preservation is the following noise calibration step\nu\u0304_{t+1} = u\u0303_{t+1} + \u03bd\u2016u_t\u2016_\u221e^2 \u00b7 z_t,\nwhere z_t is a d-dimensional standard Normal random variable and \u03bd\u2016u_t\u2016_\u221e^2 is a carefully designed noise magnitude in order to achieve the desired privacy level (\u03b5, \u03b4). One key aspect is that the noise calibration step occurs at every power iteration, which adds to the robustness of the algorithm and achieves sharper bounds. We discuss this at the end of this section.\n\nTheorem 4.1 (Privacy guarantee). Algorithm 3 satisfies (\u03b5, \u03b4)-differential privacy.\n\nProof. The only power iteration step of Algorithm 3 can be thought of as K = kL(R + 1) queries directed to a private data sanitizer which produces f_1(T; u) = T(I, u, u) or f_2(T; u) = T(u, u, u) each time. The \u2113_2-sensitivity of both queries can be separately bounded as\n\u2206_2 f_1 = sup_{T\u2032} \u2016T(I, u, u) \u2212 T\u2032(I, u, u)\u2016_2 \u2264 sup_{i,j,k} 2(|u_i u_j| + |u_i u_k| + |u_j u_k|) \u2264 6\u2016u\u2016_\u221e^2;\n\u2206_2 f_2 = sup_{T\u2032} |T(u, u, u) \u2212 T\u2032(u, u, u)| = sup_{i,j,k} 6|u_i u_j u_k| \u2264 6\u2016u\u2016_\u221e^3,\nwhere T\u2032 = T + symmetrize(e_i \u2297 e_j \u2297 e_k) is some neighboring tensor of T. Thus, applying the Gaussian mechanism [9] we can (\u03b5, \u03b4)-pr
ivately release one output of either f_1(u) or f_2(u) by\nf_\u2113(u) + \u2206_2 f_\u2113 \u00b7 (\u221a(2 ln(1.25/\u03b4))/\u03b5) \u00b7 w,\nwhere \u2113 = 1, 2 and w \u223c N(0, I) are i.i.d. standard Normal random variables. Finally, applying advanced composition [9] across all K = kL(R + 1) private releases we complete the proof of this theorem. Note that both normalization and deflation steps do not affect the differential privacy of Algorithm 3 due to the closure under post-processing property of DP.\n\nThe rest of the section is devoted to discussing the \u201cutility\u201d of Algorithm 3; i.e., to show that the algorithm is still capable of producing approximate eigenvectors, despite the privacy constraints. Similar to [12], we adopt the following incoherence assumptions on the eigenspace of T:\n\nAssumption 4.1 (Incoherent basis). Suppose V \u2208 R^{d\u00d7k} is the stacked matrix of orthonormal component vectors {v_i}_{i=1}^k. There exists constant \u00b5_0 > 0 such that\n(d/k) max_{1\u2264i\u2264d} \u2016V^\u22a4 e_i\u2016_2^2 \u2264 \u00b5_0. (6)\n\nNote that by definition, \u00b5_0 is always in the range of [1, d/k]. Intuitively, Assumption 4.1 with small constant \u00b5_0 implies a relatively \u201cflat\u201d distribution of element magnitudes in T. The incoherence level \u00b5_0 plays an important role in the utility guarantee of Algorithm 3, as we show below:\n\nTheorem 4.2 (Guaranteed recovery of eigenvector under privacy requirements). Suppose T = \u2211_{i=1}^k \u03bb_i v_i^{\u22973} for \u03bb_1 > \u03bb_2 \u2265 \u03bb_3 \u2265 \u00b7\u00b7\u00b7 \u2265 \u03bb_k > 0 with orthonormal v_1, \u00b7\u00b7\u00b7, v_k \u2208 R^d, and suppose Assumption 4.1 holds with \u00b5_0. Assume \u03bb_1 \u2212 \u03bb_2 \u2265 c/\u221ad for some sufficiently small universal constant c > 0. If R = \u0398(log(\u03bb_max d)), L = \u0398(k log k) and \u03b5, \u03b4 satisfy\n\u03b5 = \u2126(\u00b5_0 k^2 log(\u03bb_max d/\u03b4) / \u03bb_min), (7)\nthen with probability at least 0.9 the first eigenpair (\u03bb\u0302_1, v\u0302_1) returned by Algorithm 3 satisfies\n|\u03bb_1 \u2212 \u03bb\u0302_1| = O(1/\u221ad), \u2016v_1 \u2212 v\u0302_1\u2016_2 = O(1/(\u03bb_1 \u221ad)).\n\nAt a high level, Theorem 4.2 states that when the privacy parameter \u03b5 is not too small (i.e., privacy requirements are not too stringent), Algorithm 3 approximately re
covers the largest eigenvalue and eigenvector with high probability. Furthermore, when \u00b5_0 is a constant, the lower bound condition on the privacy parameter \u03b5 does not depend polynomially upon tensor dimension d, which is a much desired property for high-dimensional data analysis. On the other hand, similar results cannot be achieved via simpler methods like input perturbation, as we discuss below:\n\nComparison with input perturbation. Input perturbation is perhaps the simplest method for differentially private data analysis and has been successful in numerous scenarios, e.g. private matrix PCA [10]. In our context, this would entail adding a random Gaussian tensor E directly to the input tensor T before tensor power iterations. By Gaussian mechanism, the standard deviation \u03c3 of each element in E scales as \u03c3 = \u2126(\u03b5^{\u22121} \u221a(log(1/\u03b4))). On the other hand, noise analysis for tensor decomposition derived in [24, 2] and in the subsequent section of this paper requires \u03c3 = O(1/d) or \u2016E\u2016_op = O(1/\u221ad), which implies \u03b5 = \u2126\u0303(d) (cf. Lemma F.9). That is, the privacy parameter \u03b5 must scale linearly with tensor dimension d to successfully recover even the first principal eigenvector, which renders the privacy guarantee of the input perturbation procedure useless for high-dimensional tensors. Thus, we require a non-trivial new approach for differentially private tensor decomposition.\n\nFinally, we remark that a more desirable utility analysis would bound the approximation error \u2016v_i \u2212 v\u0302_i\u2016_2 for every component v_1, \u00b7\u00b7\u00b7, v_k, and not just the top eigenvector. Unfortunately, our current analysis cannot handle deflation effectively as the deflated vector v\u0302_i \u2212 v_i may not be incoherent. Extension to deflated tensor decomposition remains an interesting open question.\n\n5 Conclusion\n\nWe consider memory-efficient and differentially private tensor decomposition problems in this paper and derive efficient algorithms for both online and private tensor decomposition based on the popular tensor power method framework. Through an improved noise condition analysis of robust tensor power method, we obtain sharper dimension-dependent sample complexity bounds for online tensor decomposition and a wider range o
f privacy parameter values for private tensor decomposition while still retaining utility. Simulation results verify the tightness of our noise conditions in principle.\n\nOne important direction of future research is to extend our online and/or private tensor decomposition algorithms and analysis to practical applications such as topic modeling and community detection, where tensor decomposition acts as one critical step for data analysis. An end-to-end analysis of online/private methods for these applications would be theoretically interesting and could also greatly benefit practical machine learning of important models.\n\nAcknowledgement\n\nA. Anandkumar is supported in part by Microsoft Faculty Fellowship, NSF Career award CCF-1254106, ONR Award N00014-14-1-0665, ARO YIP Award W911NF-13-1-0084 and AFOSR YIP FA9550-15-1-0221.\n\nReferences\n\n[1] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models. Journal of Machine Learning Research, 15(1):2773\u20132832, 2014.\n[2] A. Anandkumar, R. Ge, and M. Janzamin. Learning overcomplete latent variable models through tensor methods. In Proc. of COLT, 2015.\n[3] A. Anandkumar, Y.-k. Liu, D. J. Hsu, D. P. Foster, and S. M. Kakade. A spectral algorithm for latent Dirichlet allocation. In NIPS, 2012.\n[4] K. Azizzadenesheli, A. Lazaric, and A. Anandkumar. Reinforcement learning of POMDP\u2019s using spectral methods. In COLT, 2016.\n[5] B. W. Bader and T. G. Kolda. Algorithm 862: Matlab tensor classes for fast algorithm prototyping. ACM Transactions on Mathematical Software, 32(4):635\u2013653, 2006.\n[6] M.-F. Balcan, S. Du, Y. Wang, and A. W. Yu. An improved gap-dependency analysis of the noisy power method. In COLT, 2016.\n[7] L. Birg\u00e9. An alternative point of view on Lepski\u2019s method. Lecture Notes-Monograph Series, pages 113\u2013133, 2001.\n[8] B. Cirel\u2019son, I. Ibragimov, and V. Sudakov. Norms of Gaussian sample functions. Lecture Notes in Mathematics, 550:20\u201341, 1976.\n[9] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211\u2013407, 2014.\n[10] C. Dwork, K. Talwar, A. Thakurta, and L. Zhang. Analyze Gauss: optimal bounds for privacy-preserving principal component analysis. In STOC
, 2014.\n[11] R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping from saddle points\u2014online stochastic gradient for tensor decomposition. In COLT, 2015.\n[12] M. Hardt and E. Price. The noisy power method: A meta algorithm with applications. In NIPS, 2014.\n[13] C. J. Hillar and L.-H. Lim. Most tensor problems are NP-hard. Journal of the ACM (JACM), 60(6):45, 2013.\n[14] S. B. Hopkins, J. Shi, and D. Steurer. Tensor principal component analysis via sum-of-squares proofs. In COLT, 2015.\n[15] D. Hsu, S. M. Kakade, and T. Zhang. A tail inequality for quadratic forms of subgaussian random vectors. Electron. Commun. Probab., 17(52):1\u20136, 2012.\n[16] F. Huang, U. Niranjan, M. U. Hakeem, and A. Anandkumar. Online tensor methods for learning latent variable models. Journal of Machine Learning Research, 16:2797\u20132835, 2015.\n[17] F. Huang, I. Perros, R. Chen, J. Sun, A. Anandkumar, et al. Scalable latent tree model and its application to health analytics. arXiv preprint arXiv:1406.4566, 2014.\n[18] M. Janzamin, H. Sedghi, and A. Anandkumar. Beating the perils of non-convexity: Guaranteed training of neural networks using tensor methods. arXiv preprint arXiv:1506.08473, 2015.\n[19] G. Kamath. Bounds on the expectation of the maximum of samples from a Gaussian. [Online; accessed April, 2016].\n[20] T. G. Kolda and J. R. Mayo. Shifted power method for computing tensor eigenpairs. SIAM Journal on Matrix Analysis and Applications, 32(4):1095\u20131124, 2011.\n[21] V. Kuleshov, A. T. Chaganty, and P. Liang. Tensor factorization via matrix factorization. In AISTATS, 2015.\n[22] B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. Annals of Statistics, pages 1302\u20131338, 2000.\n[23] P. Massart. Concentration inequalities and model selection, volume 6. Springer, 2007.\n[24] A. Montanari and E. Richard. A statistical model for tensor PCA. In NIPS, 2014.\n[25] C. Mu, D. Hsu, and D. Goldfarb. Successive rank-one approximations for nearly orthogonally decomposable symmetric tensors. SIAM Journal on Matrix Analysis and Applications, 36(4):1638\u20131659, 2015.\n[26] G. W. Stewart, J.-g. Sun, and H. B. Jovanovich. Matrix perturbation theory. Academic Press, New York, 1990.\n[27] R. Tomioka and T. Suzuki. Spectral norm of random tensors. arXiv:1407.1870, 2014.\n[28] Y. Wang, H.-Y. Tung, A. J. Smola, and A. A
nandkumar. Fast and guaranteed tensor decomposition via sketching. In NIPS, 2015.\n[29] Y. Wang and J. Zhu. Spectral methods for supervised topic models. In NIPS, 2014.\n[30] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In ICML, 2013.", "award": [], "sourceid": 1769, "authors": [{"given_name": "Yining", "family_name": "Wang", "institution": "Carnegie Mellon University"}, {"given_name": "Anima", "family_name": "Anandkumar", "institution": "UC Irvine"}]}